The HPCRE group currently administers three HPC clusters: Ganymede, Ganymede2, and most recently Juno, our new cluster coming online in early 2025.
Ganymede
This is our primary campus HPC resource. Ganymede, named after Jupiter’s largest satellite, is a cluster built on the condo model. While Ganymede does have significant free-to-use queues available to all UT Dallas researchers, a majority of the computational power is provided by nodes purchased for exclusive group access.
Compute System Specs
Ganymede has over 2,000 CPU cores, 9 TB of memory, and attached storage for computation. It has been the workhorse here at UT Dallas, currently serving over 600 users across many schools, with a wide range of software packages and scientific applications installed and supported.
Ganymede is set up to run only one job per node. When a user submits a job, they will be given exclusive access to the entire node, regardless of how many cores or how much memory is requested. The following partitions are available by default:
Note: The resources listed only account for a fraction of the compute capacity of Ganymede.
| Queue Name | Number of nodes | Cores (CPU Architecture) | Memory | Time Limit ([d-]hh:mm:ss) |
| --- | --- | --- | --- | --- |
| debug | 2 | 16 (Intel Sandy Bridge) | 32 GB | 02:00:00 |
| normal | 110 | 16 (Intel Sandy Bridge) | 32 GB | 4-00:00:00 |
| 128s | 8 | 16 (Intel Sandy Bridge) | 128 GB | 4-00:00:00 |
| 256i | 16 | 20 (Intel Ivy Bridge) | 256 GB | 4-00:00:00 |
| 256h | 1 | 16 (Intel Haswell) | 256 GB | 4-00:00:00 |
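To make the one-job-per-node policy concrete, below is a minimal sketch of a Slurm batch script for the normal queue. Only the partition name, core count, and time limit come from the table above; the job name, executable, and input file are hypothetical placeholders.

```bash
#!/bin/bash
#SBATCH --job-name=ganymede-demo   # hypothetical job name
#SBATCH --partition=normal         # one of the default queues listed above
#SBATCH --nodes=1                  # Ganymede runs one job per node...
#SBATCH --ntasks=16                # ...so the full node (16 cores, 32 GB) is yours regardless of the request
#SBATCH --time=1-00:00:00          # within the normal queue's 4-day limit

# Placeholder executable and input file; replace with your own program.
srun ./my_program input.dat
```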
Ganymede2
Ganymede2 is a condo HPC system supplying a variety of hardware tailored to our researchers' needs. Condos are built and offered to PIs based on their budget, hardware requirements, and compute use cases. Ganymede2 has advanced queueing support as well as support for both GPU and CPU-only nodes.
Compute System Specs
Because of the diversity of condos in Ganymede2, there is no uniform compute node specification. Moreover, this is a growing cluster: whenever a buy-in happens, we recommend installing the latest hardware available at that time. With this progression, the cluster now has 90 CPU nodes and 22 GPU nodes hosting 106 GPUs.
Even though Ganymede2 assets are primarily owned by private research groups, the system has what are called "preempt" queues, which accept job submissions from all Ganymede2 users. These queues (named cpu-preempt and gpu-preempt) are heavily de-prioritized relative to the queue owner's own jobs, so any workload submitted to them should be treated as volatile and should make heavy use of checkpointing. When a preempt job is preempted, it is killed immediately and forcefully; if data isn't being saved continuously to an output file, data loss should be expected.
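As a rough sketch of a preempt-aware submission (not an official template), the script below asks Slurm to requeue the job after preemption and leaves checkpointing to the application itself; the job name, resource numbers, program, and checkpoint file are hypothetical.

```bash
#!/bin/bash
#SBATCH --job-name=preempt-demo    # hypothetical job name
#SBATCH --partition=cpu-preempt    # preempt queue open to all Ganymede2 users
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8          # example request; size it to your workload
#SBATCH --mem=16G
#SBATCH --time=1-00:00:00          # within the 7-day preempt limit
#SBATCH --requeue                  # ask Slurm to requeue the job if it is preempted
#SBATCH --open-mode=append         # keep earlier output when the requeued job restarts

# The application must checkpoint frequently, because a preempted job is
# killed immediately with no grace period. "my_solver" and "state.chk" are
# placeholders for your own program and checkpoint file.
srun ./my_solver --resume-from state.chk --checkpoint-every 300
```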
Ganymede2, unlike its predecessor, allows multiple jobs per node. Nodes can be in a "mixed" state, which indicates that the node is currently processing multiple jobs at once. GPU nodes with multiple GPUs can have individual GPUs assigned to different jobs, or in some cases each GPU can run multiple jobs at the same time (see the example GPU job script after the table below).
The following partitions are available to all users:
| Queue Name | Number of nodes | Cores/Threads (CPU Architecture) | Memory | Time Limit ([d-]hh:mm:ss) | GPUs? | Use Case |
| --- | --- | --- | --- | --- | --- | --- |
| dev | 2 | 64/128 (Ice Lake) | 256 GB | 02:00:00 | No | Code debugging, job submission testing |
| normal | 4 | 64/128 (Ice Lake) | 256 GB | 2-00:00:00 | No | Normal code runs, CPU only |
| cpu-preempt | 8 | Varies | Varies | 7-00:00:00 | No | Volatile CPU job submission |
| gpu-preempt | 6 | Varies | Varies | 7-00:00:00 | Yes (various types) | Volatile GPU job submission |
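Because Ganymede2 nodes can be shared, a GPU job should request only the resources it actually needs. As an illustrative sketch (the job name, resource sizes, and executable are hypothetical, and the exact GPU request syntax may vary by condo), a single-GPU job on the gpu-preempt queue might look like this:

```bash
#!/bin/bash
#SBATCH --job-name=gpu-demo        # hypothetical job name
#SBATCH --partition=gpu-preempt    # GPU preempt queue open to all Ganymede2 users
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8          # request only part of the node; other jobs may share it
#SBATCH --mem=32G
#SBATCH --gres=gpu:1               # one GPU; the node's remaining GPUs stay available to other jobs
#SBATCH --time=12:00:00            # within the 7-day preempt limit
#SBATCH --requeue                  # preempt jobs can be killed at any time

# Placeholder application; like all preempt jobs, it should checkpoint regularly.
srun ./my_gpu_app --checkpoint-dir ./checkpoints
```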
Juno
Juno is the next-generation HPC cluster coming online in spring 2025. Juno has 5,184 CPU cores, 32.25 TB of memory, and 12 GPUs. Detailed specifications are below.
| Node Type | Number of nodes | Specification |
| --- | --- | --- |
| CPU Compute | 72 | 2x AMD EPYC 9334 2.7 GHz 32C/64T 128M cache, 384 GB RAM |
| GPU Compute | 1 | 2x Intel Xeon Platinum 8462Y 2.8 GHz 32C/64T 60M cache, 512 GB RAM, 4x Nvidia HGX H100 80 GB HBM3 GPUs |
| GPU Compute | 4 | 2x AMD EPYC 9534 2.4 GHz 64C/128T 256M cache, 1 TB RAM, 2x Nvidia A30 24 GB GPUs |
| Login | 1 | 2x Intel Xeon Silver 4309Y 2.8 GHz 8C/16T 12M cache, 128 GB RAM |
Networking for all our clusters is based on an ultra-low-latency 100 Gb/s Mellanox InfiniBand high-performance network.
Look for updated information on Juno coming soon!