OpenFabrics Management Framework
The OpenFabrics Management Framework provides open-source data structures that simplify the development of composable, distributed, disaggregated computer architectures. It contains abstract data structures that represent computer system resources, available network fabric components and their management, current operational conditions of resources, and composed disaggregated computing systems.
Composable disaggregated computing architectures have the potential to mitigate resource thrashing and oversubscription, reduce stranded resources, and dynamically assemble resources to meet client application needs. Modern HPC and enterprise computing systems can benefit from a more efficient way to interact with, manage, manipulate, and control available resources and high-speed network fabrics.
The OpenFabrics Alliance (OFA), together with its partners the DMTF, SNIA, and the CXL Consortium, is developing a new open-source composable computing system framework that provides a unified set of tools to control and monitor both computing resources and multiple network fabric types. The OpenFabrics Management Framework gives clients the ability to interact with distributed resources and connect fabrics in order to automate orchestration and maintain complex, fabric-connected systems. Current code work is published at: https://github.com/OFMFWG
The OpenFabrics Management Framework (OFMF) provides computing system clients with a common set of tools to interact with disaggregated fabrics and resources. Clients can include Message Passing Interface (MPI) applications; Fabric Attached Memory (FAM); workload, resource, and cloud managers; I/O systems; and CPU and accelerator resources. Clients can be physical machines, virtual machines, or containers. Any entity (software tool, admin GUI, or shell script via CLI) may create a virtual platform, pod, cluster, partition, VLAN, job queue, or subnet to enable one or more workloads to execute.
The OpenFabrics Management Framework is designed for system administrators, application programmers and users, HPC and cloud architecture designers, and other stakeholders involved in the design and deployment of stable, high-speed, network-based computing systems.
The OpenFabrics Management Framework (OFMF) provides a universal set of RESTful interfaces, tools, and services to manage fabric-attached resources such as CPUs, Accelerators, and Memory Devices. The OFMF uses the common languages of Redfish and Swordfish to allow clients to gather telemetry information on fabrics and components, request information about fabric attachments, allocate components, and compose disaggregated systems. Each vendor-specific fabric can be controlled and manipulated through a custom agent that is designed to provide its services and functions to the OFMF via the Redfish API. Redfish currently has an object called a ‘zone’ that contains a list of the fabric endpoints which may be connected to each other. Zones can be used to enumerate the members of a VLAN or the resources of a virtual platform. The OFMF is designed to be versatile and to allow clients to connect to and interact with the underlying high-speed fabrics.
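To make the zone concept more concrete, the following is a minimal sketch of what a zone listing its member endpoints might look like as a Redfish-style JSON payload. The property names are modeled loosely on the Redfish Zone resource but are simplified, and the fabric name "CXL", the endpoint names, and both helper functions are assumptions made for illustration; they are not taken from the OFMF code.

```python
import json
from typing import Any, Dict, List


def make_zone(fabric_id: str, zone_id: str, endpoint_ids: List[str]) -> Dict[str, Any]:
    """Build a simplified, Redfish-style Zone payload (illustrative only)."""
    base = f"/redfish/v1/Fabrics/{fabric_id}"
    return {
        "@odata.id": f"{base}/Zones/{zone_id}",
        "Id": zone_id,
        "Name": f"Zone {zone_id}",
        # Endpoints listed here are the ones that may be connected to each other.
        "Links": {
            "Endpoints": [
                {"@odata.id": f"{base}/Endpoints/{ep}"} for ep in endpoint_ids
            ]
        },
    }


def zone_members(zone: Dict[str, Any]) -> List[str]:
    """Return the endpoint IDs referenced by a zone payload."""
    return [
        link["@odata.id"].rsplit("/", 1)[-1]
        for link in zone["Links"]["Endpoints"]
    ]


# Hypothetical fabric and endpoint names, chosen only for this example.
zone = make_zone("CXL", "1", ["Host1", "Mem1", "Mem2"])
print(json.dumps(zone, indent=2))
print(zone_members(zone))
```

A client could enumerate the members of a virtual platform by reading such a payload and following the endpoint links.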
The example below shows a basic fabric model with disaggregated memory components. The memory components can be allocated over fabrics to multiple composed systems. The OpenFabrics Management Framework provides tools and constructs to keep track of the allocation of memory resources, the pathways from CPUs and Accelerators to the memory, and the current running compute states.
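As a rough illustration of the bookkeeping involved, the sketch below tracks how much of a fabric-attached memory component is allocated to each composed system. The `MemoryPool` class, its fields, and the system names are hypothetical and exist only for this example; the OFMF itself expresses this state through Redfish and Swordfish resources.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MemoryPool:
    """A fabric-attached memory component with a fixed capacity in GiB."""
    name: str
    capacity_gib: int
    # Maps composed-system name -> GiB currently allocated to that system.
    allocations: Dict[str, int] = field(default_factory=dict)

    def free_gib(self) -> int:
        """Capacity that is not yet allocated to any composed system."""
        return self.capacity_gib - sum(self.allocations.values())

    def allocate(self, system: str, size_gib: int) -> None:
        """Reserve memory for a composed system, refusing oversubscription."""
        if size_gib <= 0:
            raise ValueError("size_gib must be positive")
        if size_gib > self.free_gib():
            raise ValueError(f"{self.name}: only {self.free_gib()} GiB free")
        self.allocations[system] = self.allocations.get(system, 0) + size_gib

    def release(self, system: str) -> int:
        """Return all memory held by a system to the pool."""
        return self.allocations.pop(system, 0)


pool = MemoryPool("Mem1", capacity_gib=512)
pool.allocate("SystemA", 128)
pool.allocate("SystemB", 256)
print(pool.free_gib())  # 128
```

Rejecting requests that exceed the free capacity is one simple way to avoid the oversubscription that composable architectures aim to mitigate.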
An abstract model in the ‘Redfish/Swordfish domain’ can be described as a group of endpoints, resources, zones, and zones-of-zones. An endpoint can be considered to be a destination, such as a server connected by a network card or a switch port. A resource can be considered to be a component that provides services to a fabric. A zone can be considered to be a set of endpoints and resources that form an integrated unit, such as a collection of remote memory. Finally, a zone-of-zones can be considered to be a collection of zones.
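The four concepts above can be sketched as plain data structures. This is an illustrative model only: the class names mirror the terms used in the text, but the fields and the `endpoint_names` helper are assumptions and do not reproduce the Redfish schema definitions.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Endpoint:
    """A destination on the fabric, e.g. a server NIC or a switch port."""
    name: str


@dataclass(frozen=True)
class Resource:
    """A component that provides a service to the fabric, e.g. memory."""
    name: str
    kind: str


@dataclass
class Zone:
    """A set of endpoints and resources that act as one integrated unit."""
    name: str
    endpoints: List[Endpoint] = field(default_factory=list)
    resources: List[Resource] = field(default_factory=list)


@dataclass
class ZoneOfZones:
    """A collection of zones."""
    name: str
    zones: List[Zone] = field(default_factory=list)

    def endpoint_names(self) -> Set[str]:
        """All endpoint names reachable through the member zones."""
        return {ep.name for zone in self.zones for ep in zone.endpoints}


# Hypothetical names: a remote-memory zone and a host zone grouped together.
mem_zone = Zone(
    "remote-memory",
    endpoints=[Endpoint("Mem1"), Endpoint("Mem2")],
    resources=[Resource("Mem1-DIMMs", "memory"), Resource("Mem2-DIMMs", "memory")],
)
host_zone = Zone("hosts", endpoints=[Endpoint("Host1")])
platform = ZoneOfZones("virtual-platform", zones=[mem_zone, host_zone])
print(sorted(platform.endpoint_names()))  # ['Host1', 'Mem1', 'Mem2']
```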
In the Redfish model there is no notion of physical separation. Thus a zone of memory, for instance, could be made up of two separate memory chunks that are routable to each other yet are not located in the same endpoint. The model above depicts only the logical resources, not the physical connections. Redfish models the physical fabric topology by associating actual fabric ports and fabric links, as shown below:
The OpenFabrics Management Framework (OFMF) architecture is shown in the diagram below.
On the left side of the architecture diagram, clients access the abstract representation of the compute system through 'interfaces'. These interfaces provide up-to-date monitoring of the disaggregated components, event monitoring, composition policies, and current composition states. In turn, the interfaces use RESTful connections to the OFMF toolsets to gather information on resource inventory, current resource configurations, fabric configurations, access control, and events and logs.
On the right side of the diagram, fabric resources communicate with the OFMF through 'Agents'. Fabric Agents communicate with Fabric Managers and convert their information into 'Redfish' abstractions that are communicated back to the OFMF.
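The translation step performed by an Agent might look roughly like the sketch below, which converts a record from a fabric manager into a Redfish-style endpoint description. The native record format (the "guid", "label", and "up" keys), the function name, and the fabric name are all invented for this example; a real agent would use the vendor's own fabric manager API and the full Redfish schema.

```python
from typing import Any, Dict


def to_redfish_endpoint(fabric_id: str, native: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a hypothetical vendor record into a Redfish-style Endpoint."""
    # "guid", "label", and "up" are made-up vendor fields, not a real API.
    endpoint_id = native["guid"]
    return {
        "@odata.id": f"/redfish/v1/Fabrics/{fabric_id}/Endpoints/{endpoint_id}",
        "Id": endpoint_id,
        "Name": native.get("label", endpoint_id),
        "Status": {
            # Map the vendor's boolean link state to a Redfish-style state string.
            "State": "Enabled" if native.get("up", False) else "Disabled",
        },
    }


native_record = {"guid": "0x1A2B", "label": "Memory module 1", "up": True}
endpoint = to_redfish_endpoint("CXL", native_record)
print(endpoint["@odata.id"])  # /redfish/v1/Fabrics/CXL/Endpoints/0x1A2B
```

Keeping the vendor-specific parsing inside the agent means the OFMF and its clients only ever see the common Redfish representation.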