Use Case 2: Mitigation of Lustre OSS Node Thrashing
Current Lustre parallel filesystems rely on a back-end block storage filesystem to handle transactions with physical storage. The Lustre back-end filesystem can be either ldiskfs or ZFS; ZFS is often the better option because it supports data backups (through snapshots) and software RAID. ZFS uses an Adaptive Replacement Cache (ARC) to hold the most frequently used and most recently used data, and on a Lustre OSS server the ARC occupies a significant portion of system memory. As memory fills up, modern operating systems swap pages out to disk so that running processes and threads still have enough free memory. Under current non-composable architectures, as the ARC fills up, IO threads are placed in a standby state so that memory resources can be time-shared. In practice, under very high load, swapping can become so heavy that IO threads snowball in a wait queue awaiting service. The result is degraded IO performance and, ultimately, Lustre filesystem failure due to thrashing.
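As a rough illustration of how the thrashing condition described above might be detected on an OSS node, the sketch below parses two Linux interfaces: OpenZFS's `/proc/spl/kstat/zfs/arcstats` (the ARC `size` and `c_max` rows) and `/proc/vmstat` (the `pswpin`/`pswpout` swap counters). The detection rule and both thresholds (`arc_fill`, `max_swap_rate`) are hypothetical assumptions chosen for illustration, not values taken from Sunfish or Lustre.

```python
"""Sketch: flag swap-driven memory pressure on a ZFS-backed Lustre OSS.

Assumptions (hypothetical): the node runs OpenZFS on Linux, and thrashing
is approximated as a high swap rate while the ARC sits near its ceiling.
"""


def parse_arcstats(text):
    """Return {name: int} from the kstat table in arcstats."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three columns: name, type, value.
        # The kstat header and the "name type data" line are skipped.
        if len(parts) == 3 and parts[2].lstrip("-").isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def parse_vmstat(text):
    """Return {counter: int} from /proc/vmstat ("name value" per line)."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_pages_per_sec(before, after, interval_s):
    """Pages swapped in plus out per second between two vmstat samples."""
    moved = sum(after[k] - before[k] for k in ("pswpin", "pswpout"))
    return moved / interval_s


def is_thrashing(arc, swap_rate, arc_fill=0.95, max_swap_rate=1000.0):
    """Hypothetical rule: ARC at or near c_max AND heavy swapping."""
    arc_full = arc["size"] >= arc_fill * arc["c_max"]
    return arc_full and swap_rate > max_swap_rate
```

In use, a monitor would read `/proc/vmstat` twice a few seconds apart, read `arcstats` once, and pass the parsed values to `swap_pages_per_sec` and `is_thrashing`. Treating a positive result as the trigger for requesting composed memory is an assumption of this sketch.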
In the figure below, Sunfish dynamically composes additional memory into the OSS node at run time. The memory is drawn from a pool of available memory, which may come from another OSS node or from an available pool of NVMe memory. The ARC can then be expanded dynamically, and waiting IO processes and threads are serviced in a timely manner. Machine learning is used to choose the memory with the best location: the lowest latency, the highest network bandwidth, and the fewest network hops across switches. In this use case, dynamic allocation of memory resources mitigates poor Lustre IO performance and, potentially, Lustre filesystem failure.