This post was originally written by Micheal Zimmerman, Bitfusion CEO. March 21, 2018
Programmable, pooled data center infrastructure has shaped the development of new platforms over the last few years. Unpredictable resource consumption, and the need to provision compute, storage, and networking on demand and at capacity whenever the need arises, drove the hyperconvergence trend. While compute, storage, and networking are well understood and can be provisioned on demand (with platforms such as Cisco HyperFlex and Cisco UCS), machine learning and AI workloads brought a new variable to the equation: GPUs. Today GPUs are not part of the hyperconverged architecture; they must be hard-configured to a workload, with no elasticity, flexibility, or sharing across users. Essentially, GPUs are in the same place storage was 15 years ago.
Bitfusion has been working on this problem for three years. We are pleased to announce that our FlexDirect GPU hypervisor supports virtualization and the AI-attached network. With Bitfusion, one common, shared pool of GPUs (either GPU blades or a cluster of GPU servers) allows any AI workload (be it a container, VM, bare metal, or other) to attach remotely over Ethernet, on demand and in real time, to any number of GPUs and execute. Essentially, we are creating a GPU-attached network, much like a storage-attached network. With Bitfusion, the ability to scale out GPU resources and build a consumption-based infrastructure is extended to AI. GPUs are now part of the elastic, on-demand, shared resources of the data center. In addition, the Bitfusion technology will extend to support other AI acceleration technologies such as FPGAs.
Bitfusion and Cisco are collaborating to implement this vision and extend the unique value proposition of Cisco’s UCS series of servers, HyperFlex products, and networking technologies to include GPUs. GPU resources are assigned to workloads (in VMs) the same way storage, compute, and network resources are. Bitfusion allows the network to assign a fraction of a GPU, one GPU, or several GPUs (anywhere in the network, on the same hardware or on disparate hardware) to a workload, execute the workload, and release the GPUs back to the pool. We see Cisco in a unique position to innovate with the AI-attached network, given its expertise in low-latency, high-bandwidth networking. Creating a highly efficient shared AI cluster depends heavily on network performance.
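To make the assign, execute, and release cycle concrete, here is a minimal Python sketch of a shared GPU pool. It is a toy model for illustration only, not the FlexDirect API: the `GpuPool` class, its `allocate`, `release`, and `attach` methods, and the GPU names are all hypothetical, and the placement rule (a fractional request must fit on one device, larger requests take whole idle devices) is an assumption made to keep the example short.

```python
from contextlib import contextmanager
from fractions import Fraction


class GpuPool:
    """Toy model of a shared GPU pool (hypothetical; not the FlexDirect API)."""

    def __init__(self, gpu_ids):
        # Free capacity per GPU, as a fraction of one whole device.
        self.free = {gpu_id: Fraction(1) for gpu_id in gpu_ids}

    def allocate(self, amount):
        """Reserve a fraction of one GPU, or a whole number of GPUs."""
        amount = Fraction(amount)
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount < 1:
            # Assumption: a fractional request must fit on a single device.
            for gpu_id, free in self.free.items():
                if free >= amount:
                    self.free[gpu_id] -= amount
                    return {gpu_id: amount}
            raise RuntimeError("no GPU has enough free capacity")
        if amount.denominator != 1:
            raise ValueError("requests above one GPU must be whole numbers")
        # Whole-GPU requests take devices that are completely idle.
        idle = [g for g, free in self.free.items() if free == 1]
        if len(idle) < amount:
            raise RuntimeError("not enough idle GPUs in the pool")
        lease = {g: Fraction(1) for g in idle[: int(amount)]}
        for gpu_id in lease:
            self.free[gpu_id] = Fraction(0)
        return lease

    def release(self, lease):
        """Return every share in the lease to the pool."""
        for gpu_id, share in lease.items():
            self.free[gpu_id] += share

    @contextmanager
    def attach(self, amount):
        """Hold a lease for the duration of a workload, then release it."""
        lease = self.allocate(amount)
        try:
            yield lease
        finally:
            self.release(lease)


if __name__ == "__main__":
    pool = GpuPool(["gpu0", "gpu1", "gpu2"])
    # One workload takes half a GPU while another takes two whole GPUs.
    with pool.attach(Fraction(1, 2)) as half, pool.attach(2) as pair:
        print("half-GPU lease:", half)
        print("two-GPU lease:", pair)
    print("free after release:", pool.free)
```

The context manager mirrors the lifecycle described above: capacity is taken from the pool only while the workload runs and is returned even if the workload fails. A real system would also have to account for network placement and latency, which this sketch ignores.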