OpenFabrics Management Framework
The OpenFabrics Management Framework provides a common framework that simplifies the development of network fabric management applications and machinery. Modern HPC and enterprise computing systems can benefit from a more efficient way to interact with, manage, manipulate, and control high-speed network fabrics. The OpenFabrics Alliance (OFA), together with its partners, the DMTF, SNIA, and the Gen-Z Consortium, is developing a new open-source fabric management framework that provides a unified set of tools to control and monitor multiple network fabric types. The framework lets clients interact with distributed resources and the fabrics that connect them, in order to automate orchestration and maintain complex, fabric-connected systems. Current code work is published at: https://github.com/OFMFWG
The OpenFabrics Management Framework (OFMF) provides clients with a common set of tools to interact with underlying fabrics. Clients can include Message Passing Interface (MPI) applications, Fabric Attached Memory (FAM), workload, resource, and cloud managers, and accelerator cards. Clients can be physical machines, virtual machines, or containers. Any entity (software tool, admin GUI, shell script via CLI) may create a virtual platform, pod, cluster, partition, VLAN, job queue, or subnet to enable one or more workloads to execute.
The OpenFabrics Management Framework is designed for system administrators, application programmers and users, HPC and cloud architecture designers, and other stakeholders involved in the design and deployment of stable, high-speed, network-based computing systems.
The OFMF provides a universal set of RESTful interfaces, tools, and services to manage attached fabrics. It uses a common language, Redfish, to allow clients to manipulate network fabrics and request information about them. Each vendor-specific fabric can be controlled and manipulated through a custom agent that exposes the fabric's services and functions to the OFMF via the Redfish API. Redfish currently has an object called a ‘zone’ that contains a list of the fabric endpoints which may be connected to each other. Zones can be used to enumerate the members of a VLAN or the resources of a virtual platform. The OFMF is designed to be versatile, allowing clients to connect to and interact with underlying high-speed fabrics.
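As a rough illustration of how a zone enumerates its endpoints, the following Python sketch builds a Redfish-style zone body as a plain dictionary. The fabric name, endpoint names, and URI layout are hypothetical, and the property names are only modeled loosely on the Redfish Zone resource. Nothing here is taken from the OFMF code base, and no request is actually sent.

```python
import json


def endpoint_uri(fabric_id: str, endpoint_id: str) -> str:
    """Build a Redfish-style URI for an endpoint (assumed layout)."""
    return f"/redfish/v1/Fabrics/{fabric_id}/Endpoints/{endpoint_id}"


def build_zone(fabric_id: str, name: str, endpoint_ids: list[str]) -> dict:
    """Return a zone body whose links enumerate the member endpoints."""
    return {
        "Name": name,
        "Links": {
            "Endpoints": [
                {"@odata.id": endpoint_uri(fabric_id, eid)} for eid in endpoint_ids
            ]
        },
    }


# Hypothetical fabric and endpoint identifiers, for illustration only.
zone = build_zone("ExampleFabric", "virtual-platform-1", ["server-0", "fam-0"])
print(json.dumps(zone, indent=2))
```

A client would typically submit a body like this to the zone collection of the relevant fabric. The exact URI and required properties depend on the service and on the agent for that fabric.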
A fabric in the ‘Redfish domain’ can be modeled as a group of endpoints, resources, zones, and zones-of-zones. An endpoint is a destination, such as a server connected by a network card or a switch port. A resource is a component that provides services to a fabric. A zone is a set of endpoints and resources that form an integrated unit, such as a collection of remote memory. Finally, a zone-of-zones is a unit, or collection, of zones.
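The logical model described here can be sketched with a few Python data classes. This is only an illustration of the relationships between endpoints, resources, zones, and zones-of-zones. The class names, field names, and example identifiers are assumptions and do not correspond to Redfish schema definitions or OFMF code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Endpoint:
    """A destination on the fabric, e.g. a server NIC or a switch port."""
    name: str


@dataclass(frozen=True)
class Resource:
    """A component that provides a service to the fabric, e.g. memory."""
    name: str


@dataclass
class Zone:
    """A set of endpoints and resources, or (as a zone-of-zones) of zones."""
    name: str
    endpoints: list[Endpoint] = field(default_factory=list)
    resources: list[Resource] = field(default_factory=list)
    zones: list["Zone"] = field(default_factory=list)

    def all_endpoints(self) -> list[Endpoint]:
        """Collect endpoints from this zone and every nested zone."""
        found = list(self.endpoints)
        for child in self.zones:
            found.extend(child.all_endpoints())
        return found


# Hypothetical example: a platform built from a memory zone and a compute zone.
memory_zone = Zone(
    "remote-memory",
    endpoints=[Endpoint("fam-0"), Endpoint("fam-1")],
    resources=[Resource("memory-chunk-0"), Resource("memory-chunk-1")],
)
compute_zone = Zone("compute", endpoints=[Endpoint("server-0")])
platform = Zone("platform", zones=[memory_zone, compute_zone])

print([e.name for e in platform.all_endpoints()])
# prints ['fam-0', 'fam-1', 'server-0']
```

Representing a zone-of-zones with the same class as an ordinary zone keeps the traversal recursive and simple. A real implementation might instead distinguish the two kinds explicitly.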
In the Redfish model there is no notion of physical separation. Thus, a zone of memory, for instance, could be made up of two separate memory chunks that are routable to each other yet are not located in the same endpoint. The model above depicts only the logical resources, not the physical connections. Redfish models the physical fabric topology by associating actual fabric ports and fabric links, as shown below:
In this use case, a client requests a fabric connection between a server and Fabric Attached Memory (FAM) with read/write permissions and no fabric encryption, selecting the connection with the lowest latency, the highest bandwidth, and at least one redundant path in active/active mode.
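One way to picture the selection in this use case is as a filter followed by a ranking. The Python sketch below is an assumption-laden illustration: the `Candidate` fields, the units, and the rule that latency is compared before bandwidth are all hypothetical, since the source does not specify how the two criteria are weighed against each other.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One possible server-to-FAM connection (all fields are illustrative)."""
    name: str
    latency_ns: float
    bandwidth_gbps: float
    redundant_paths: int  # extra active/active paths beyond the primary
    read_write: bool
    encrypted: bool


def choose_connection(candidates: list[Candidate]) -> Optional[Candidate]:
    """Filter on hard requirements, then prefer low latency and high bandwidth."""
    eligible = [
        c for c in candidates
        if c.read_write and not c.encrypted and c.redundant_paths >= 1
    ]
    if not eligible:
        return None
    # Latency is compared first; bandwidth breaks ties (an assumed ordering).
    return min(eligible, key=lambda c: (c.latency_ns, -c.bandwidth_gbps))


# Hypothetical candidate connections.
candidates = [
    Candidate("path-a", 250.0, 100.0, 1, True, False),
    Candidate("path-b", 200.0, 50.0, 0, True, False),   # no redundant path
    Candidate("path-c", 250.0, 200.0, 2, True, False),
    Candidate("path-d", 150.0, 200.0, 1, True, True),   # encrypted
]

best = choose_connection(candidates)
print(best.name if best else "no eligible connection")
# prints path-c
```

Here `path-b` and `path-d` are rejected by the hard requirements, and `path-c` wins over `path-a` because both have the same latency and `path-c` offers more bandwidth.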
Proof-of-Concept using the Gen-Z fabric
The Gen-Z Consortium, the OpenFabrics Alliance, the DMTF, and the Storage Network Industry Association (SNIA) demonstrated key features of the new OFMF and in-band Gen-Z fabric management at the SC21 Supercomputing Conference in St. Louis, Missouri. Further advancements will be demonstrated at the OpenFabrics Alliance 2022 Workshop.
The purpose of the proof-of-concept is to allow a user to work through a GUI to reach across a Gen-Z fabric and assemble combinations of Fabric Attached Memory. The OFMF provides the means for clients to interact with the fabric. The clients consist of a User, a Fabric Attached Memory Manager, and a Composition Manager.
In this use case, a Redfish Composition is used to allocate a zone object that defines a virtual, private network within a larger fabric.
In this use case, a Redfish Composition is used to interact with Slurm and define a zone of nodes within a larger fabric.