Elastic ML/AI Deployment Principles

The machine learning (ML) market is still in its infancy, with many moving parts, and is expected to grow into a trillion-dollar market. The infrastructure supporting ML draws in several hardware technologies beyond GPUs, such as AI ASICs and co-processors, that promise performance gains of 50x or more. For any emerging hardware to succeed at scale, virtualization is needed. Virtualization means that the hardware can be shared, abstracted, and presented to the upper software stack as logical, isolated entities. Said differently, the hardware can be partitioned to serve more than one application instance.
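As a rough illustration of what "partitioned and isolated" means, here is a minimal Python sketch. It is a toy model only: the `PhysicalDevice` and `LogicalDevice` classes, their fields, and the memory sizes are hypothetical names invented for this example, and the sketch says nothing about how any real virtualization product accounts for resources.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogicalDevice:
    """An isolated slice of a physical device, owned by one application instance."""
    owner: str
    memory_mib: int
    parent: str


@dataclass
class PhysicalDevice:
    """A hypothetical accelerator with a fixed amount of memory (in MiB)."""
    name: str
    memory_mib: int
    allocated_mib: int = 0
    partitions: list = field(default_factory=list)

    @property
    def free_mib(self) -> int:
        return self.memory_mib - self.allocated_mib

    def partition(self, owner: str, memory_mib: int) -> LogicalDevice:
        """Carve out a logical device; refuse requests that would oversubscribe."""
        if memory_mib <= 0:
            raise ValueError("partition size must be positive")
        if memory_mib > self.free_mib:
            raise MemoryError(f"{self.name}: not enough free memory")
        self.allocated_mib += memory_mib
        logical = LogicalDevice(owner=owner, memory_mib=memory_mib, parent=self.name)
        self.partitions.append(logical)
        return logical


# One physical device shared by two application instances.
gpu = PhysicalDevice(name="gpu0", memory_mib=16384)
training = gpu.partition("training-job", 8192)
inference = gpu.partition("inference-job", 4096)
print(gpu.free_mib)  # 4096
```

Each application sees only its own `LogicalDevice`; the bookkeeping that keeps the slices from overlapping stays inside the virtualization layer.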

There are a few architecture principles that will ease the deployment and development of a new ML virtualization technology. The technology:

  • should be agnostic to the upper layers of the software stack. This means that the existing ML models will run, unchanged, on top of the virtualization layer. No additional dependencies, drivers, patches or modifications will be needed. The ML applications run “as is”.
  • should be agnostic to the lower layers of the software stack. This means that no changes will be needed to the operating system, hypervisor or environment. The new ML virtualization will ‘slide in’ to the stack with a fast and painless installation.
  • should be agnostic to the network layer. Particularly where the ML virtualization supports disaggregation (attaching remote ML hardware to the CPU workspace over the network), it must work with the pre-installed network layer, whether it is Ethernet, InfiniBand or RoCE. As a side requirement, this also means that the network APIs or sockets should be used as-is with no add-ons or modifications.
  • should operate on any hardware: any generic server can host the new ML hardware.
  • should minimize any performance penalty. This requirement is a general guideline for all virtualization technologies, not only for ML/AI.
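The first and third principles can be pictured with a small Python sketch: application code written against one interface runs unchanged whether the accelerator is local or reached remotely. Everything here is an assumption made for illustration. `Accelerator`, `LocalAccelerator`, `RemoteAccelerator` and `score` are hypothetical names, the "network" is a plain function call, and this is not a description of how Bitfusion itself is implemented.

```python
from typing import Protocol, Sequence


class Accelerator(Protocol):
    """The interface the application is written against (hypothetical)."""
    def dot(self, a: Sequence[float], b: Sequence[float]) -> float: ...


class LocalAccelerator:
    """Stands in for hardware attached directly to the server."""
    def dot(self, a: Sequence[float], b: Sequence[float]) -> float:
        return sum(x * y for x, y in zip(a, b))


class RemoteAccelerator:
    """Stands in for disaggregated hardware on another machine.

    The forwarding step is simulated by a direct call; a real system would
    carry the request over whatever network is already installed.
    """
    def __init__(self, backend: Accelerator) -> None:
        self._backend = backend
        self.calls_forwarded = 0

    def dot(self, a: Sequence[float], b: Sequence[float]) -> float:
        self.calls_forwarded += 1
        # Copy the arguments, as a real transport would serialize them.
        return self._backend.dot(list(a), list(b))


def score(device: Accelerator, weights: Sequence[float], features: Sequence[float]) -> float:
    """Application code: identical whichever device it is handed."""
    return device.dot(weights, features)


weights, features = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
local_result = score(LocalAccelerator(), weights, features)
remote_result = score(RemoteAccelerator(LocalAccelerator()), weights, features)
print(local_result, remote_result)  # 32.0 32.0
```

The point of the design is that `score` never learns where the computation happened, which is the property the "runs as is" principle asks for.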

One obvious outcome of these principles is that the same ML virtualization architecture will operate in any data center setting: cloud, private, edge, carrier, etc.

Bitfusion virtualization, now being integrated into vSphere, adheres to all of these architecture principles. It can therefore be installed and initialized in all Linux and x86 server environments without requiring modifications to other components of the software stack. Further, it will operate with any CUDA-based physical GPU and with any pre-installed network.

Bitfusion software was designed as a user-space virtualization technology. The vision driving it was to push virtualization higher in the stack, thereby minimizing changes to, and proprietary handling of, drivers, kernels, and hardware devices. Taking a broader view of the industry's evolution, it is clear that containers, Kubernetes and other software stacks all reside above the OS, unlike the prior generation, which was closer to the metal. Building virtualization in user space, with the full capabilities of the Linux OS available, allows new hardware, networking stacks, and PCIe devices to be supported more quickly. This is another big plus for the Bitfusion architecture.

Using vSphere and Bitfusion is not only the right choice for ML infrastructure virtualization, but also the best forward-looking architecture selection for upcoming support of more AI-specific hardware.

Questions or comment? Contact us as
