Use Case 6: More Efficient Use of Resources Through Resource Pools

In current HPC systems, the hardware resources needed for every type of computation must be installed in the compute nodes themselves.  This architectural design wastes resources and limits the resources available to user jobs.  For instance, if a batch job requires six GPUs, 1 TB of on-board memory, and eight CPU cores in each node, then the job cannot run on a cluster whose nodes each provide only four GPUs and four CPU cores.  Conversely, if a batch job needs only two GPUs and two CPU cores on such a node, the remaining two GPUs and two CPU cores are wasted: they cannot be allocated to other batch jobs that need them.

In a new composable HPC system (see the figure below), CPU cores, NVMe memory devices, GPUs, and FPGAs can be allocated from shared hardware pools through aggregated RDMA and CXL fabrics.  Combining hardware resources in this manner gives batch jobs more flexibility and reduces wasted resources.  In addition, if a compute node is using only a single CPU core and leaving a GPU idle, that GPU can be allocated to another compute node that needs it to meet the requirements of its own batch job.
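
The pooling idea can be illustrated with a minimal Python sketch.  It is only a bookkeeping model under stated assumptions, not an implementation of any real composable-fabric or scheduler API: the ResourcePool class, its allocate and release methods, the job names, and the pool sizes are all hypothetical.  The pool is assumed to hold the hardware of two nodes of the kind described earlier (four GPUs and four CPU cores each).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ResourcePool:
    """Hypothetical pool of disaggregated devices, counted per resource type."""
    free: Dict[str, int]
    allocated: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def allocate(self, job: str, request: Dict[str, int]) -> bool:
        """Grant the whole request, or nothing if any resource type is short."""
        for kind, count in request.items():
            if self.free.get(kind, 0) < count:
                return False
        for kind, count in request.items():
            self.free[kind] = self.free.get(kind, 0) - count
        self.allocated[job] = dict(request)
        return True

    def release(self, job: str) -> None:
        """Return a finished job's devices to the pool."""
        for kind, count in self.allocated.pop(job, {}).items():
            self.free[kind] += count


# Assumed pool: the combined hardware of two 4-GPU / 4-core nodes.
pool = ResourcePool(free={"gpus": 8, "cpu_cores": 8})

# A small job takes only what it needs; nothing is stranded inside a node.
small_ok = pool.allocate("small_job", {"gpus": 2, "cpu_cores": 2})

# A six-GPU job exceeds any single 4-GPU node but fits in the pool.
big_ok = pool.allocate("big_job", {"gpus": 6, "cpu_cores": 4})

# All GPUs are now in use, so a further GPU request is refused...
extra_ok = pool.allocate("extra_job", {"gpus": 1, "cpu_cores": 1})

# ...until the small job finishes and its GPUs return to the pool.
pool.release("small_job")
retry_ok = pool.allocate("extra_job", {"gpus": 1, "cpu_cores": 1})
```

In this sketch a request is granted all-or-nothing, which keeps the accounting simple.  A real system would also have to consider fabric topology, latency, and device attachment, none of which is modeled here.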