With the recent announcement and GA of vSphere Bitfusion, what better way to learn about all the features and use cases than with a #vSphereChat? Joining us to answer ten rapid-fire questions were our very own experts: Jim Brogan (@brogan_record), Mike Adams (@mikej_adams), and Niels Hagoort (@nhagoort). From operating systems to GPU partitioning to vSphere integrations, they covered it all! Keep reading to check out any of the tweets you may have missed.
A1-1. Just as VMware introduced compute resource sharing in its early days, VMware vSphere Bitfusion introduces GPU sharing for ML applications such as TensorFlow and PyTorch. Bitfusion shares GPUs in two ways:
A1-2. Remote access: clients can allocate GPUs from pools of GPU servers across the network, then run their ML application with no modification. CUDA API calls are intercepted and run on the remote GPUs.
A1-3. GPU partitioning: Bitfusion can allocate an arbitrarily sized slice of a GPU, allowing multiple applications and clients to share a physical GPU concurrently. An important aspect of this sharing is that it is done dynamically; no machines need to be spun up or down. GPUs are deallocated and returned to the pool when an application or session completes.
A1-4. Bitfusion has a vCenter GUI plug-in for management and visibility of the GPUs in the pool.
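The remote-access mode described above rests on a general technique: intercepting a library's API calls on the client and forwarding them over the network to the machine that actually owns the hardware, so the application runs unmodified. Here is a minimal toy sketch of that interception-and-forwarding pattern in Python. All class, method, and operation names are hypothetical illustrations; the real product intercepts the CUDA APIs, not a Python object.

```python
# Toy illustration of API-call interception and remote forwarding,
# the pattern Bitfusion applies to CUDA calls. Everything here is
# hypothetical and in-process; a real system ships calls across the network.

class RemoteGPUServer:
    """Stands in for a machine elsewhere on the network that owns a GPU."""

    def execute(self, op, *args):
        # A real server would launch the corresponding kernel on a GPU.
        if op == "row_sums":  # pretend GPU operation: sum each row
            return [sum(row) for row in args[0]]
        raise NotImplementedError(op)


class InterceptingProxy:
    """Client-side shim: the application calls it like a local library,
    but every call is captured and forwarded to the remote server."""

    def __init__(self, server):
        self._server = server

    def __getattr__(self, op):
        # Intercept any attribute access and turn it into a remote call.
        def forward(*args):
            return self._server.execute(op, *args)
        return forward


gpu = InterceptingProxy(RemoteGPUServer())
result = gpu.row_sums([[1, 2], [3, 4]])  # looks local, runs "remotely"
print(result)  # -> [3, 7]
```

The key property the sketch shows is that the caller's code does not change; only the library boundary is intercepted, which is why TensorFlow and PyTorch applications can use remote GPUs without modification.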
A2. Bitfusion works on RHEL 7, CentOS 7, Ubuntu 16.04, and Ubuntu 18.04.
A3. We work with many types of partners to promote and utilize Bitfusion. First, we have had a long-standing relationship with NVIDIA and work together to utilize GPUs. We also work with Dell and many of their server models that contain GPUs (the C4140, for example).
A4-1. On the one hand, we don’t really focus on particular use cases or verticals, because PyTorch and TensorFlow applications don’t. On the other, we do focus on PyTorch and TensorFlow themselves, though other applications also work.
A4-2. But on the “third” hand, some of the exciting use-cases and verticals we like are image recognition and classification, risk analysis, GPUaaS, loss prevention, financial services, retail, manufacturing, automotive, and Higher Ed/Research.
A4-3. And looking at infrastructure use cases, rather than apps, edge computing is a particularly tough or expensive place to populate with high GPU counts, so sharing on the edge is very interesting.
A4-4. I should mention that Bitfusion works for both training and inference.
A5-1. The principal problem is that you can’t buy GPUs for everyone who wants or needs them. They are expensive and tied to a single machine. Until now, they were hard to share.
A5-2. It’s hard to get good numbers, but on average they would seem to sit idle 85% of the time. With Bitfusion GPU sharing, everyone gets what they need.
A6-1. The first answer is always aimed at admins who have limited budgets and want to get more use out of the GPUs they already own.
A6-2. Many AI and machine learning (ML) apps do so much computation that they run forever if you do not have a GPU for hardware acceleration.
A6-3. On the other hand, when an expensive GPU is dedicated to a single machine, it is very difficult to keep it busy. Users have other work to do in between runs, and they go home in the evening.
A6-4. Even production environments can be very bursty. So sharing can increase the utilization.
A6-5. But there are benefits for the users too. Users a) don’t have to coordinate with each other to share GPUs; b) they don’t have to shut down machines to pass GPUs to other machines; c) they don’t have to port their applications;
A6-6. d) they can use more GPUs than they could previously afford; e) they can experiment with more GPU models than they would previously have access to (e.g. T4 vs. V100)
A7-1. Some apps do not use all of a GPU’s resources. Some models are small, some inference jobs may leave lots of headroom. Partitioning GPUs lets multiple applications share the same GPU concurrently.
A7-2. A paper we released with GA gives a detailed inference use case.
A7-3. Another interesting benefit: give a large partition, say 75% of a GPU, to a long-running application, but leave 25% of the capacity for smaller applications that can sneak in without waiting for the long-running app to complete.
A7. Exactly! Increased efficiency. A use-case could be to support more users in the test and development phase.
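The 75%/25% scenario above can be modeled as simple fractional bookkeeping: a GPU is a budget of capacity, sessions claim fractions of it, and capacity returns to the pool when a session finishes. This is a toy sketch of that idea, not Bitfusion's actual scheduler; all names and the capacity model are illustrative assumptions.

```python
# Toy fractional-GPU allocator illustrating the partitioning idea:
# a GPU is a budget of capacity (1.0 == the whole GPU); sessions claim
# fractions and return them on completion. Purely illustrative.

class GPU:
    def __init__(self):
        self.free = 1.0      # fraction of the GPU still unallocated
        self.sessions = {}   # session name -> fraction held

    def allocate(self, name, fraction):
        if fraction > self.free + 1e-9:
            return False     # not enough headroom; caller must wait
        self.free -= fraction
        self.sessions[name] = fraction
        return True

    def release(self, name):
        self.free += self.sessions.pop(name)


gpu = GPU()
assert gpu.allocate("long_training_job", 0.75)  # big partition
assert gpu.allocate("small_inference", 0.25)    # sneaks in alongside it
assert not gpu.allocate("another_job", 0.10)    # GPU is full; must wait
gpu.release("small_inference")                  # small job finishes first
assert gpu.allocate("another_job", 0.10)        # now fits immediately
```

The point of the sketch is the last two lines: because allocation is dynamic, the small job's capacity returns to the pool as soon as it completes, without anyone waiting on the long-running job.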
A8-1. Bitfusion registers a plug-in with vCenter. vCenter, in turn, manages the machines using Bitfusion. It authorizes and configures clients to use Bitfusion servers. It can expand or modify the pool of GPU servers.
A8-2. It displays allocation and utilization statistics, history, and charts. It can terminate sessions, set limits on clients, set idle timeouts — all to help with fair use of the resources.
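One of the fair-use controls mentioned above, idle timeouts, comes down to a simple policy: reclaim any session whose GPU has been inactive longer than a configured threshold. Here is a hypothetical sketch of that reclamation check; the function, the session structure, and the 30-minute value are illustrative assumptions, not VMware's implementation or defaults.

```python
# Toy sketch of an idle-timeout policy, like one an administrator might
# configure via the vCenter plug-in. All names and values are hypothetical.

import time

IDLE_TIMEOUT_S = 30 * 60  # example: reclaim GPUs idle for 30 minutes


def reclaimable(sessions, now):
    """Return the ids of sessions whose last GPU activity is older
    than the idle timeout, so their GPUs can go back to the pool."""
    return [sid for sid, last_active in sessions.items()
            if now - last_active > IDLE_TIMEOUT_S]


now = time.time()
sessions = {
    "alice": now - 10,       # active ten seconds ago: keep
    "bob": now - 45 * 60,    # idle for 45 minutes: reclaim
}
print(reclaimable(sessions, now))  # -> ['bob']
```

A policy like this keeps a forgotten interactive session from pinning an expensive GPU overnight, which is exactly the utilization problem described in A5 and A6.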
A8. The beauty of the vSphere integration is that Bitfusion is completely managed from within the vSphere Client. No need to log into another UI!!
A9. I would say: Efficiency! Both from a cost and a performance perspective!!
A9-1. I like that one too. Sharing always leads to better efficiency.
A9-2. The ability to share GPUs. Everyone we talk to wants a shared service or GPU as a Service capability.
A10-1. It would be a lot of fun to compare Bitfusion to a Maserati or something, but like other infrastructure technologies, a truer analogy isn’t as exciting, even though it may be extremely useful or important.
A10-2. Bitfusion is more like a rental agency with a garage full of Maseratis or passenger buses. It lets a lot of people use vehicles a lot more economically.
A10. Tough question. <insert any hypercar here> Maybe a high performance, remote controlled car??
Whether you followed along in real time or caught up with this recap, we hope you enjoyed our latest vSphere Tweet Chat. Stay tuned to our Twitter account (@VMwarevSphere) for details about our next chat. Have a topic idea? Reach out to us anytime on Twitter!