The State of High-Performance Fabrics: A Chat with the OpenFabrics Alliance


This article was originally published on insideHPC on February 11, 2019.

In this special guest feature, Paul Grun and Doug Ledford from the OpenFabrics Alliance describe industry trends in the fabrics space, its current state of affairs, and emerging applications.

The global high-performance computing (HPC) market is growing and its applications are constantly evolving. These systems rely on networks, often referred to as fabrics, to link servers together, forming the communications backbone of modern HPC systems. These fabrics need to be high-speed and highly scalable to efficiently run advanced computing applications. Often, there is also a requirement that the software that runs these fabrics be open source. It turns out that this description of high-performance fabrics is increasingly applicable to environments outside classical HPC, even as HPC continues to serve as the bellwether for the future of commercial and enterprise computing. Fortunately, the mission of the OpenFabrics Alliance (OFA) has recently been updated to include accelerating the development of advanced fabrics and, importantly, furthering their adoption in fields beyond traditional HPC.

To learn more, we sat down with the leadership of the OFA, Chair Paul Grun and Vice Chair Doug Ledford.

insideHPC: What are the applications of high-speed fabrics that benefit end users today? How has the OFA contributed to these applications?

Paul Grun: Originally, ‘high-performance fabrics’ were associated with large, exotic HPC machines. But in the modern world, these fabrics, which are based on technologies designed to improve application efficiency, performance, and scalability, are becoming more and more common in the commercial sphere because of the increasing demands being placed on commercial systems. Scale-out environments like cloud infrastructure, performance-oriented environments like the financial services industry, and applications like data analytics, large databases, machine learning, and others are increasingly adopting these fabrics just to remain competitive. The acceptance of these fabrics in the commercial sphere is significant; one measure of that significance is that the software driving these fabrics, originally developed by the OFA, has been adopted and embraced by the Linux community. And in many cases, it’s been adopted as part of standard Linux distributions.

As far as the OFA’s contributions go, many people aren’t aware that the software driving these technologies was developed by the OpenFabrics Alliance beginning around 2004. As an alliance of fabric vendors, developers, and consumers, we tend to toil behind the scenes, so many people aren’t familiar with our organization. That said, if you are using a high-performance fabric such as InfiniBand, RoCE, iWARP, Omni-Path, or others, you are already using software originally developed by the OpenFabrics Alliance, even if you don’t realize it.

Doug Ledford: I think it’s also worthwhile to differentiate between modern fast interconnects and high-performance fabrics that include some sort of Remote Direct Memory Access (RDMA) capability. For instance, you can have 200 Gb Ethernet, but when it comes to processing the data coming off a network pipe at that speed, modern-day CPUs simply can’t keep up. RDMA-enabled high-speed fabrics have always been about the trade-off between CPU resources and memory consumption. They allow whoever configures a machine to add extra memory, use RDMA technology, and reap the benefit of increased CPU capacity for processing the data they send and receive over the network. This means that the application gets the benefit of the CPU horsepower, instead of spending that horsepower on communications chores. Given that increases in maximum CPU speed have largely stagnated in recent years (core counts have gone up, but the amount of work a single core can perform has not gone up much) while network speeds are increasing dramatically, RDMA technologies in all their various forms are becoming more and more relevant to applications across the board. As Paul pointed out, the OFA helped to create the primary RDMA software stack in use today, and the Alliance and its member companies are still very actively involved in the ongoing evolution of those technologies.
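
As a rough illustration of why CPUs struggle at these speeds, the sketch below (back-of-the-envelope arithmetic, not from the interview) works out how much time a single core would have per message on a 200 Gb/s link. The 4 KiB message size and 3 GHz clock are assumptions chosen only for illustration, and protocol framing overhead is ignored.

```python
# Back-of-the-envelope per-message time budget at a given line rate.
# Assumptions (illustrative only): 4 KiB messages, a 3 GHz core,
# no framing/protocol overhead, one core handling the whole stream.

def per_message_budget(line_rate_bps: float, msg_bytes: int, cpu_hz: float):
    """Return (messages per second, nanoseconds per message, CPU cycles per message)."""
    msgs_per_sec = line_rate_bps / (msg_bytes * 8)  # bits per second / bits per message
    ns_per_msg = 1e9 / msgs_per_sec                 # time available per message
    cycles_per_msg = cpu_hz / msgs_per_sec          # clock cycles available per message
    return msgs_per_sec, ns_per_msg, cycles_per_msg


if __name__ == "__main__":
    rate, ns, cycles = per_message_budget(200e9, 4096, 3e9)
    print(f"{rate:,.0f} msgs/s, {ns:.2f} ns/msg, {cycles:.0f} cycles/msg")
```

Under those assumptions the link delivers about 6.1 million messages per second, leaving roughly 164 ns, or about 490 cycles, per message for any protocol processing and data copying done in software. RDMA offloads much of that work to the network adapter.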

insideHPC: Who are the major contributors to this technology? How do they participate in the OFA?

Paul Grun: The OFA was originally formed as OpenIB in 2004 with a mission to prevent the splintering of a nascent technology, which Doug identified as RDMA, that was intended to produce a dramatic acceleration in fabrics and lead to breakthroughs in the performance and scalability of large systems. As you might imagine, our members typically include key network vendors, major OEMs, and system integrators. End users of network services have also become an increasingly important part of the OFA, since it is their applications that provide the requirements that continue to drive innovation in the fabric space. An important point, though: the OFA benefits from the engagement of major players in the network and system space who are not actually members of the Alliance. This is because the OFA is a relentlessly ‘open’ organization. Naturally, we hope that those organizations will see value in our work and choose to become members.

As far as how they participate, there are a number of avenues. First and foremost is joining the Alliance as a ‘Promoter Member’ and taking a seat on the Board with the intention of guiding the OFA’s activities. Second is participating in one of the Alliance’s technical working groups, such as the OpenFabrics Interfaces Working Group or the Enterprise Working Group. Another avenue is taking advantage of the Alliance’s Interoperability and Logo Testing Program, or the newly emerging Distro Testing Program.

Given that a big part of our mission is to foster collaboration among providers, consumers, and vendors of high-performance fabrics, yet another route to participation is the annual OFA Workshop, taking place March 19-21 this year in Austin, Texas. This is really the only industry event of its kind focused on the role that networks play in driving forward scalable systems in HPC, the cloud, and commercial enterprises.

Doug Ledford: Hardware vendors are one major contributor, as are the National Labs that have used this software for many years. Almost all of the entities that work in this space are involved with the OFA in one way or another. Even those that aren’t directly involved interact with our members who participate in other upstream projects (the Linux kernel, rdma-core, libfabric, etc.). So, at least indirectly, the OFA and its members work with pretty much everyone in this space.

insideHPC: Where do you see HPC penetrating in the next five years? How does this influence fabrics innovation?

Paul Grun: As I noted above, it’s becoming clear that the dependency on high-performance networks is growing beyond the traditional niche of HPC. As it always has been, HPC will continue to be the bellwether for high-speed networking going forward, but it is quickly becoming apparent that the commercial and enterprise spaces are not far behind. Looking out into the future, we see fabrics playing a key role in the adoption of, for example, persistent memory architectures that enable significant performance advances in graph analytics, which depends on fast access to large graph structures over a network. And as Doug mentioned, the advent of multicore computing and the increasing use of GPUs and other acceleration technologies all increase the importance of the fabric as a central actor in systems. And again, as the emphasis shifts away from compute technology per se and toward data, the network becomes an integral part of how customers store, access, and manipulate data. In other words, the ability to manipulate data across a network becomes not only a realistic possibility but a requirement.

As far as innovation goes, we are seeing a trend toward consumers requiring support for a range of different fabrics and fabric types. System owners and application developers don’t necessarily want to be wedded to a particular network technology; they want applications to be portable to machines underpinned by different types of high-performance networks. That says something about how network consumers need to access their network service. We are finding that fabric vendors are looking for help in ensuring that an application can use common methods to access the fabric regardless of the underlying technology, and this is exactly where the OFA can help. For example, over the past few years, we’ve released a new family of APIs, known collectively as OpenFabrics Interfaces (OFI), with the specific objective of making the APIs in the family ‘fabric agnostic’: we’ve abstracted the details of the fabric away from the network consumer while enabling the network vendors to differentiate themselves in their fabric offerings, all without sacrificing performance. Think back to the pivotal role played by TCP in driving the adoption of Ethernet. The software provided by the OFA does the same thing for high-performance networks.
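
The idea of a fabric-agnostic API can be sketched in a few lines. The model below is a hypothetical, simplified illustration and not the libfabric API (libfabric itself is a C library whose providers are discovered with calls such as `fi_getinfo()`): the application states the capabilities it needs, and whichever provider offers them serves the request. The capability strings loosely echo libfabric flags such as `FI_MSG` and `FI_RMA`; every class and function name here is invented for the example.

```python
# Toy model of a fabric-agnostic API: applications request capabilities,
# and any provider offering them can be used. All names are hypothetical
# and only loosely echo libfabric concepts; this is not libfabric's API.
from dataclasses import dataclass, field
from typing import Protocol, Sequence


class Provider(Protocol):
    """What the application may rely on, whatever fabric sits underneath."""
    name: str
    capabilities: frozenset

    def send(self, dest: str, payload: bytes) -> int: ...


@dataclass
class LoopbackProvider:
    """Stand-in for one vendor's fabric; it just records what it 'sends'."""
    name: str
    capabilities: frozenset
    sent: list = field(default_factory=list)

    def send(self, dest: str, payload: bytes) -> int:
        self.sent.append((dest, payload))
        return len(payload)


def select_provider(providers: Sequence[Provider], required: frozenset) -> Provider:
    """Return the first provider that offers every required capability."""
    for provider in providers:
        if required <= provider.capabilities:
            return provider
    raise LookupError(f"no provider offers {sorted(required)}")


def application(providers: Sequence[Provider]) -> str:
    """Application logic: names capabilities, never a specific fabric."""
    fabric = select_provider(providers, frozenset({"msg", "rma"}))
    fabric.send("peer0", b"hello")
    return fabric.name


if __name__ == "__main__":
    providers = [
        LoopbackProvider("vendor_a", frozenset({"msg"})),
        LoopbackProvider("vendor_b", frozenset({"msg", "rma", "atomic"})),
    ]
    print(application(providers))  # vendor_b: the only one offering "rma"
```

Because the application names only capabilities, a vendor can add extra features (here, `"atomic"`) to differentiate its provider without any change to application code.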

Doug Ledford: In the next five years, I anticipate RDMA-enabled fabrics making significant inroads into cloud infrastructure. One of the ways cloud infrastructure is organized is as a large number of small service entities. These entities need to be able to pass messages quickly from one service to another, and RDMA technology enables that message passing with the least CPU overhead possible. I expect anything that improves the performance of an RDMA fabric for message passing between different services, such as containerization support, to influence fabric design for the next five years.

insideHPC: What is the OFA’s strategy for keeping up with new areas and applications of HPC?

Paul Grun: Offense is always the best defense. We keep up with new areas by constantly innovating in those areas. Or rather, by supporting our members who most certainly do create innovation.

One of the roles played by the OFA is to create an environment where collaboration and development can occur in ‘neutral territory’. Take the development of libfabric, for example. Libfabric is the first member of the OFI family I mentioned above; it was developed as a collaborative effort between network developers and the consumers of networks because everybody came to recognize the value in having a ‘standard’ API that both delivers high performance to the consumer and supports a vendor’s unique offerings. For example, the MPI community played a crucial role in articulating the requirements that drove the development of libfabric. So, by focusing on the needs of the user community first, we ensure that we stay on the leading edge, delivering networking software stacks designed from the outset to meet the needs of the consumers of networks. This differs from the traditional method of developing network software, where historically a new network or network technology is developed along with its own unique network stack. With any luck, that stack works exactly the way the consumer would like to see it work. Sometimes it does, but in other instances the resulting network appears to the consumer like a compromise that the consumer would rather not make. Here in the OFA, we instead ask the consumers what network features should be exposed through the API and then provide an environment where network vendors can deliver products that provide those services.

Doug Ledford: When OpenIB was first formed, as Paul pointed out, its goal was to prevent the splintering of a nascent technology, RDMA and InfiniBand. That particular need hasn’t changed. Various vendors are always innovating and trying to reach the next level of HPC system performance. In the never-ending quest for the fastest interconnect, or the lowest CPU utilization, new features are written and the standard APIs must be extended. OFA members working upstream help to make sure that the core APIs that end users have been programming to for well over 10 years remain just as viable today as they always have been.

insideHPC: What are some of the challenges that you see around the corner in 2019?

Paul Grun: 2019 is going to be another big year for the OFA. This year we are focusing on a few items.

First, we are focused on core technology, continuing to enhance our existing OFI family, and libfabric in particular, by adding, for example, features to support GPU co-processing, smartNICs, and other techniques that reduce the burden on the network. We are also working hard on a collaboration project with SNIA (the Storage Networking Industry Association) to develop so-called ‘use cases’ for Remote Persistent Memory, which we suspect will significantly influence server and data center architecture in the years ahead. These use cases will serve as the framework to guide us in developing new network features designed to support these new persistent memory technologies. The effort to develop these use cases is a good example of our requirements-driven approach to API development. By understanding the requirements of applications that want to access Remote Persistent Memory, we are in a much better position to propose APIs to meet those requirements.

Second, we are putting a lot of emphasis this year on expanding our existing Interop and Logo Testing Program. We want to build on the foundation of the existing program to bring more value to the industry as a whole and to our members in particular. We’re doing this by looking for synergies among the elements that make up our existing program, and by constantly asking the community which elements it values most. Is it the regular debug events that allow vendors and distros to come together to test their latest innovations, or is it a logo awarded by an independent testing agency such as the OFA? Or perhaps both.

Lastly, in 2018 we rolled out a new Mission Statement, which broadened our mission to ensure we are looking beyond HPC, and to be sure we are inclusive of all manner of high-performance network technologies. The new Mission Statement, which is to “accelerate the development and adoption of advanced fabrics for the benefit of the advanced networks ecosystem”, is the culmination of an effort to move the OFA beyond its original mission, which we had accomplished. As I described it above, that mission was essentially to foster the development and adoption of networks based on the InfiniBand Architecture. So, 2019 is the third year of an ambitious program to re-imagine and expand our role in the industry, and to put our efforts into making the industry, and our alliance members, as successful as possible.

Doug Ledford: I see two major challenges in 2019 and beyond. The first is containers. The ability to use RDMA technology from a container is a high priority for many companies, but support for it is incomplete in existing Linux kernels. There are complex security issues around containers that have not yet been solved in the RDMA space. The standard networking stack has solved those issues, and it would be lovely if the RDMA stack could copy its work. But because of the drastic differences between standard network devices and RDMA devices, this simply is not possible. For some of the RDMA networks, we can’t even reuse the SELinux security models. A standard network interface can be secured under SELinux using rules that relate to IP addresses and TCP/UDP ports. For an InfiniBand interface, this simply doesn’t work, as many InfiniBand communications don’t involve an IP address. There is still a lot of work to be done to complete the container namespace support for the RDMA stack in the Linux kernel.
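
The point about SELinux can be made concrete with a deliberately simplified model. The sketch below is not SELinux's policy engine; the rule table, label name, and classes are hypothetical. It only shows that a lookup keyed on IP address and port has nothing to match when a packet is addressed purely with InfiniBand identifiers such as a LID (local identifier), queue pair number, and partition key.

```python
# Deliberately simplified model (not SELinux itself) of an IP/port-keyed
# policy lookup. The rule table and label are hypothetical; LID, QP number
# and P_Key are real InfiniBand addressing concepts.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    ip_addr: Optional[str] = None   # set for ordinary TCP/IP traffic
    port: Optional[int] = None      # TCP/UDP port
    ib_lid: Optional[int] = None    # InfiniBand local identifier
    ib_qpn: Optional[int] = None    # InfiniBand queue pair number
    ib_pkey: Optional[int] = None   # InfiniBand partition key


# Hypothetical policy: (IP address, port) -> security label.
IP_PORT_RULES = {("10.0.0.5", 5201): "allowed_t"}


def label_for(pkt: Packet) -> Optional[str]:
    """Find a label the way an IP/port-based rule would: by address and port only."""
    if pkt.ip_addr is None or pkt.port is None:
        return None  # no IP address or port, so no rule can ever match
    return IP_PORT_RULES.get((pkt.ip_addr, pkt.port))


if __name__ == "__main__":
    tcp = Packet(ip_addr="10.0.0.5", port=5201)
    ib = Packet(ib_lid=17, ib_qpn=0x1234, ib_pkey=0xFFFF)
    print(label_for(tcp))  # allowed_t
    print(label_for(ib))   # None
```

A security model for native InfiniBand traffic therefore has to be expressed in terms of InfiniBand's own identifiers rather than borrowed from the IP stack.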

The second challenge is making sure that iWARP and RoCE devices are interoperable within their RDMA type. There is only one remaining vendor in the InfiniBand space and one vendor in the Omni-Path space, but there are multiple vendors in both the iWARP and RoCE spaces. Many of these vendors have just entered the fray, and still others are planning to enter it. It will be important in the coming months to make sure that all of these iWARP and RoCE devices can simply be dropped into an existing infrastructure with other brands of devices already in place and “just work”. While large InfiniBand and Omni-Path clusters are usually bought as a single unit and then used until end of life, we anticipate that RoCE and iWARP usage will be very different: these devices will be used throughout the data center, where the makeup of the connected devices can change at any time. This presents a new challenge for RDMA fabrics, and interoperability testing will be vital to success.
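
Because interoperability is required within each RDMA type, every new vendor has to be tested against every existing one, so the test matrix grows quadratically: n vendors give n(n+1)/2 unordered pairings when same-vendor pairs are included. A minimal sketch of enumerating such a matrix, with hypothetical vendor names and the simplifying assumption that one pairing per vendor pair is enough:

```python
# Enumerate the vendor pairings an interoperability test plan would cover.
# Pairings are formed only within one RDMA type (e.g. RoCE with RoCE),
# and include same-vendor pairs. All vendor names are hypothetical.
from itertools import combinations_with_replacement


def interop_pairs(vendors):
    """Every unordered pair of vendors, including a vendor with itself."""
    return list(combinations_with_replacement(sorted(vendors), 2))


def interop_matrix(vendors_by_type):
    """Map each RDMA type to its pairings; no pairing crosses types."""
    return {rdma_type: interop_pairs(v) for rdma_type, v in vendors_by_type.items()}


if __name__ == "__main__":
    plan = interop_matrix({
        "roce": ["vendor_r1", "vendor_r2", "vendor_r3"],
        "iwarp": ["vendor_i1", "vendor_i2"],
    })
    for rdma_type, pairs in plan.items():
        print(rdma_type, len(pairs), pairs)
```

With three RoCE vendors and two iWARP vendors this yields 6 and 3 pairings respectively; adding a fourth RoCE vendor raises its count to 10.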

Paul Grun, Chair, OpenFabrics Alliance, is a senior technologist in Cray’s Advanced Technology Development group. During his 40-year career he has been intimately involved in all aspects of server I/O, beginning with storage for large mainframe systems, turning to high-performance network architecture, and now focusing on applying I/O technology to building large-scale systems at Cray. His association with advanced networking technology stretches back to the genesis of InfiniBand when, as a member of Intel’s Server Architecture Lab, he contributed to the creation, development, and productization of the notion of high-performance networks, going on to represent Intel to the InfiniBand Trade Association (IBTA). In the IBTA he has served as chair of the Technical Working Group, as chair and principal author for the RoCE (RDMA over Converged Ethernet) specification, and on the IBTA’s Steering Committee. He has been influential in the OpenFabrics Alliance since its inception and is currently serving as the OFA’s Chair, as well as Co-Chair of the OpenFabrics Interfaces Working Group.

Doug Ledford, Vice Chair, OpenFabrics Alliance, is a Principal Software Engineer at Red Hat, Inc. He studied Computer Sciences at Southwest Missouri State University, where he was introduced to Linux back when it only ran from floppy disks. Doug led the team that built one of Springfield, Missouri’s first Internet Service Providers, using Linux for all of the servers. He became the de facto maintainer of the aic7xxx SCSI driver, and then a member of the Linux Maintenance Project. Now at Red Hat, Doug has worked in a number of different areas in the Linux kernel, including SCSI drivers, the SCSI mid layer, audio drivers, network drivers, MD software RAID, and the kernel’s RDMA stack, as well as the user space portion of the RDMA stack. Doug is currently an upstream maintainer of the Linux kernel’s RDMA stack and of the rdma-core user space project, as well as a Red Hat Subject Matter Expert for the MD software RAID stack and the RDMA stack.

Registration is now open for the OpenFabrics Workshop, which takes place March 19-21 in Austin, Texas.