
Announcing vSphere Bitfusion – Elastic Infrastructure for AI/ML Workloads

(Update: Missed the launch event for Bitfusion? Nope, you didn’t! Please watch the recording!)

I am excited to announce vSphere Bitfusion today, with delivery to customers by the end of July 2020. This has been VMware's goal since the acquisition of Bitfusion late in 2019. Bitfusion is now an integrated feature of vSphere 7. The Bitfusion client side also works with vSphere 6.7 and later environments, while the server side requires vSphere 7. From a packaging perspective, vSphere Bitfusion will be an add-on feature for the vSphere Enterprise Plus edition.

Our customers' deployments of AI/ML apps have been growing by leaps and bounds, and more of those apps are being put on VMware than ever before. We want to accelerate this trend, and we now have an optimized platform that allows hardware accelerators, such as GPUs, to be used in a way that has never been offered before. vSphere Bitfusion also includes a vCenter Server plugin that allows management and configuration from within the vCenter UI.

Let me also address some of the key questions.

What is vSphere Bitfusion?

vSphere Bitfusion delivers elastic infrastructure for AI/ML workloads by creating pools of hardware accelerator resources. The best-known accelerators today are GPUs, which vSphere can now use to create AI/ML cloud pools that are available on demand. GPUs can now be used efficiently across the network and driven to the highest possible levels of utilization. In other words, Bitfusion allows GPUs to be shared in much the same way vSphere allowed CPUs to be shared many years ago. The result is an end to isolated islands of inefficiently used resources. End users and service providers (for example, those wanting to offer GPU as a Service) are going to see big benefits from this new feature.

Figure: How Bitfusion shares GPUs

What operating systems does Bitfusion run on?

Bitfusion runs on Linux for both the client and server components. The client side supports Red Hat Enterprise Linux, CentOS Linux, and Ubuntu Linux, while the server side runs on vSphere 7 as a virtual appliance built on VMware's Photon OS.

Does Bitfusion work for desktops, too?

No. This Linux-based technology is for AI/ML apps running machine learning software such as TensorFlow or PyTorch, and it does not apply to graphics or rendering.

Do I have the right workload or environment for Bitfusion?

Walk through the following questions to see if the environment you operate is a good fit:

  • Are you running CUDA applications?

Bitfusion is built for CUDA applications, which use NVIDIA's CUDA API to access GPU acceleration. Bitfusion shares GPUs by intercepting CUDA calls, which means that it does NOT address VDI or screen graphics use cases. It is intended for AI/ML applications built on AI/ML software such as PyTorch and TensorFlow, and it works well in ML environments that focus on training and inference.

  • Are you hoping to address low GPU utilization (idle GPUs) or inefficient GPU use (apps using only a portion of a GPU's compute)?

Utilization and efficiency are the major benefits of Bitfusion, delivering greater value from the investment in GPU hardware.

  • Can you meet the networking requirements of at least 10 Gbps of bandwidth and 50 microseconds or less of latency between the application nodes and the GPU nodes?
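The checklist above can be turned into a quick self-assessment. The sketch below is not a VMware tool; the `Environment` fields and the `bitfusion_fit` function are hypothetical names chosen for illustration. The only numbers it uses are the ones stated in the checklist: at least 10 Gbps of bandwidth and no more than 50 microseconds of latency between application nodes and GPU nodes.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the networking question in the checklist.
MIN_BANDWIDTH_GBPS = 10.0   # at least 10 Gbps between app nodes and GPU nodes
MAX_LATENCY_US = 50.0       # at most 50 microseconds of latency


@dataclass
class Environment:
    """Hypothetical description of an environment; field names are illustrative."""
    runs_cuda_apps: bool               # apps use NVIDIA's CUDA API (e.g. PyTorch, TensorFlow)
    low_or_inefficient_gpu_use: bool   # idle GPUs, or apps using only part of a GPU
    bandwidth_gbps: float              # measured app-node <-> GPU-node bandwidth
    latency_us: float                  # measured app-node <-> GPU-node latency


def bitfusion_fit(env: Environment) -> List[str]:
    """Return the checklist items the environment does not satisfy.

    An empty list means every question in the checklist was answered "yes".
    """
    issues = []
    if not env.runs_cuda_apps:
        issues.append("not running CUDA applications (VDI/graphics are out of scope)")
    if not env.low_or_inefficient_gpu_use:
        issues.append("no low-utilization or inefficient-GPU-use problem to address")
    if env.bandwidth_gbps < MIN_BANDWIDTH_GBPS:
        issues.append(f"bandwidth {env.bandwidth_gbps} Gbps is below {MIN_BANDWIDTH_GBPS} Gbps")
    if env.latency_us > MAX_LATENCY_US:
        issues.append(f"latency {env.latency_us} us exceeds {MAX_LATENCY_US} us")
    return issues


if __name__ == "__main__":
    # A hypothetical training cluster on a 25 Gbps network with 30 us latency.
    env = Environment(runs_cuda_apps=True, low_or_inefficient_gpu_use=True,
                      bandwidth_gbps=25.0, latency_us=30.0)
    print(bitfusion_fit(env))  # prints []
```

Returning the list of unmet items, rather than a single yes/no, makes it clear which part of the environment would need to change before Bitfusion is a good fit.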

Why is Bitfusion interesting to customers?

We have seen a number of use cases for Bitfusion across many different verticals. GPUs can be used in many different ways, but just about everyone agrees that AI/ML workloads require this type of resource. In addition, the inability to share these resources is a consistent challenge, as highlighted above in the overview. vSphere Bitfusion creates the shared model that IT organizations are looking for when it comes to GPU resources for AI/ML workloads.

Some of the verticals and use cases we have seen are as follows:

  • Financial services – identifying the best candidates for a loan or mortgage, and risk analysis across the board
  • Retail – understanding if store shelves are full or empty, using images to spot if shoppers are hurt on the floor, or helping to spot fraud at the checkout stand
  • COVID-19 – locating ventilators around the hospital and checking whether they are available, and vaccine research
  • Manufacturing and Shipping – determining whether trucks transporting goods are full or empty without opening them as they enter or exit a facility, and counting purchased animals as they enter the meatpacking plant
  • Higher Education – students having access to GPU resources for their engineering assignments both in and outside the classroom
  • Cool, new stuff – autonomous vehicles and smart city projects
  • Classic research applications using PyTorch and TensorFlow AI/ML software and models


Where can I find more information?

On June 2, 2020, we are holding an event with Dell to introduce Bitfusion. Please join us! If you cannot make the live broadcast, the recording will be available for replay afterwards.

At the event, VMware will discuss not only Bitfusion but also how we are working with Dell to deliver specific AI/ML solutions built on features like Bitfusion and on VMware Cloud Foundation.

VMware also has two additional blog posts on this announcement, on our AI/ML blog:

Can't wait to get started with Bitfusion? Send us a note at [email protected] and we can help you with the next steps to get your journey started.

Other Helpful Links:

Go Bitfusion!

As always, thank you.

– Mike Adams, Senior Director, CPBU, AI/ML Market Development