
VMware Runs on vSAN – in a Big Way!

Originally posted here: How VMware IT Uses vSAN as Part of its Hyper-Converged Infrastructure Strategy

VMware IT team members Chanh Chi, Jerry Griffin, Tom Ralph, Larry Wong, and Zaigui Wang contributed to this article.

vSAN has had tremendous customer adoption since its release, especially in large-scale deployments.  Customers often want to know if VMware uses vSAN in its own mission-critical environments.

The answer is: absolutely. Some of the larger vSAN production deployments have been implemented by VMware IT. In this blog, we share how VMware runs its own mission-critical systems on vSAN as part of a hyper-converged infrastructure (HCI) platform.

Fast Configurations for Development/Test

VMware product development faces the same challenges as other development shops: giving developers easy access to environments so they can build and test code. We tackle this problem by using a virtualized infrastructure-as-a-service (IaaS) platform, an internal private cloud built by our engineers for orchestrating ongoing build deployments for development, testing, quality assurance (QA), and other activities.

Previously, capacity restrictions on LUNs made configurations difficult to execute, and storage configurations were customized to each project. Now vSAN exposes the full aggregate space of the entire cluster, which gives our users far more flexibility: they can access the storage resources they need as they need them.

Our IaaS platform is based on a self-service model. Developers can rapidly select and deploy configurations directly from the build repository. Configurations can be as small as a single virtual machine (VM) or as large as an entire cluster of hosts for system testing. By running these configurations on vSAN, our developers can quickly and easily deploy VMs on demand. vSAN has helped eliminate infrastructure as a bottleneck and relieved the capacity management burden.


Increased Flexibility

IT was an early adopter of vSAN. We have used many different deployment options, including software do-it-yourself (DIY) projects and engineered appliances. IT uses both VxRail and ReadyNode in this environment. Dell EMC VxRail provides a streamlined hardware, operating system, and software stack for implementing vSAN, while ReadyNode is used for quickly deploying vSAN on any HCI appliance.

IT uses vSAN in three areas: (1) critical infrastructure services, such as log management, telecom, voice over IP, MySQL, and Oracle databases; (2) footprint infrastructure services, such as DNS, DHCP, network security, LDAP, syslog, and Active Directory domain controllers; and (3) network intelligence and other monitoring software.

IT must constantly weigh decisions around improving service while controlling costs. The promise of HCI was compelling enough that since vSAN 5.5, IT has accelerated its use of vSAN. For example, the Dell EMC VxRail HCI appliance uses vSAN storage. Using VxRail, we can upgrade all the nodes, vCenter, and vSAN with one click. Having just one appliance to manage simplifies our operations and lowers our capital and operational costs.

Supporting the Private Cloud

A third vSAN use case is VMware’s cloud environment, which supports both development/test and private lab space as well as public-facing workloads: education/training services, Global Support Services, internal VDI, and the Hands-on-Labs (HOL).

One of the most interesting use cases is the HOL, a highly resilient environment that has been built using most of the VMware software stack. Users access the HOL to evaluate the features and functionality of VMware products without installing them.

HOL utilizes a hybrid cloud environment that leverages both on-premises and public cloud environments. Its operation across clouds is completely invisible to users, who may be served by one of four different cloud environments located in different parts of the world.  Regardless of where the workloads are hosted, they share common storage policies, thanks to the abstraction offered by vSAN.

The VMware cloud team was initially running the HOL on a traditional legacy configuration, with servers separate from the external SAN. The legacy architecture could scale as needed, but the variable workload pattern and increasing consumption made it unpredictable. The annual VMworld user conference is a good example of the challenges presented by this workload. The HOL needs to deliver hundreds of VMs per minute to up to 500 concurrent users, and it sets up and tears down more than 100,000 VMs during the five-day event. Individual labs range in size from a few hundred GB to multiple TB.

Initial testing with all-flash vSAN for the HOL delivered the scale and stability that were needed. Tests blew away everything else we had tried at that point. The traditional legacy configuration of discrete components, offering about 150,000 IOPS at peak, was replaced by an HCI running vSAN and delivering more than 620,000 IOPS in an on-premises environment. The HCI environment not only scaled across multiple private and public cloud instances but also delivered faster, more stable performance.
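
To put the HOL figures in perspective, here is a rough back-of-the-envelope calculation in Python. It uses only the numbers quoted in this article; the function and constant names are ours, and the per-minute average assumes the event load is spread evenly around the clock, which is a simplification rather than how the HOL actually behaves.

```python
# Back-of-the-envelope arithmetic using only figures quoted in the article.
# All names here are illustrative, not part of any VMware tool or API.

LEGACY_PEAK_IOPS = 150_000   # "about 150,000 IOPS at peak"
VSAN_PEAK_IOPS = 620_000     # "more than 620,000 IOPS"
VMS_DURING_EVENT = 100_000   # "more than 100,000 VMs" set up and torn down
EVENT_DAYS = 5               # "five-day event"


def iops_speedup(before: int, after: int) -> float:
    """Ratio of the new peak IOPS to the old peak IOPS."""
    return after / before


def average_vms_per_minute(total_vms: int, days: int) -> float:
    """Average VM deployments per minute.

    Assumption: the load is spread evenly over 24 hours a day.
    """
    minutes = days * 24 * 60
    return total_vms / minutes


if __name__ == "__main__":
    speedup = iops_speedup(LEGACY_PEAK_IOPS, VSAN_PEAK_IOPS)
    avg_rate = average_vms_per_minute(VMS_DURING_EVENT, EVENT_DAYS)
    print(f"Peak IOPS improvement: roughly {speedup:.1f}x")
    print(f"Average deployment rate: roughly {avg_rate:.1f} VMs per minute")
```

The gap between this round-the-clock average and the "hundreds of VMs per minute" the HOL must deliver at peak illustrates how bursty the workload is, which is why the legacy configuration was hard to size predictably.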

Scale and Performance

As the first adopter of its own products, VMware has firsthand experience in deploying all of them. The vSAN footprint includes more than 27,000 VMs and close to 7 PB of provisioned vSAN capacity.
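
As a rough illustration of what that footprint means per VM, the sketch below divides the quoted capacity by the quoted VM count. The names are ours, and the calculation assumes decimal units (1 PB = 1,000,000 GB). Note that the result is an average of provisioned vSAN capacity, not a measurement of how much usable space any individual VM consumes.

```python
# Average provisioned vSAN capacity per VM, from the article's round figures.
# Assumption: decimal units, i.e. 1 PB = 1,000,000 GB.

PROVISIONED_PB = 7    # "close to 7 PB of vSAN capacity"
VM_COUNT = 27_000     # "more than 27,000 VMs"


def provisioned_gb_per_vm(capacity_pb: float, vm_count: int) -> float:
    """Return the average provisioned capacity per VM, in GB."""
    gb_per_pb = 1_000_000
    return capacity_pb * gb_per_pb / vm_count


if __name__ == "__main__":
    per_vm = provisioned_gb_per_vm(PROVISIONED_PB, VM_COUNT)
    print(f"Roughly {per_vm:.0f} GB of provisioned capacity per VM on average")
```

Because the article gives "close to" 7 PB and "more than" 27,000 VMs, the true average is likely somewhat lower than this figure.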

The growth is based on vSAN’s proven ability to support highly critical systems in large production environments. By deploying early releases of vSAN in large HCI environments, IT can identify issues that R&D can fix before the product is generally released.  This helps our customers have confidence in using vSAN in their large-scale deployments.

VMware on VMware blogs are written by IT subject matter experts sharing stories about IT’s transformation journey using VMware products and services in a global production environment.  Visit our portal to learn more.


3 comments have been added so far

  1. Nice article. I would like to hear how your Oracle instances are built and licensed. Are those on separate clusters where number of sockets is minimised or what is your strategy here?

  2. I know this is to showcase vSAN – but we rarely get a lot of information from VMware on VMware as far as architecture and design. So please entertain my questions 🙂
    1. Business customers have to drive density and cost efficiency. Why are your Nimbus and IT Production environments so sparse? Assuming 2 socket nodes and you’re at 6 and 3:1 socket ratio. Real world, we couldn’t support this density without significant resource waste. The Private Cloud environment is significantly denser!
    2. The spare capacity in the 2 environments above, have you considered bolstering your HOL capability with this spare capacity? Sandbox to use overhead HA capacity?
    3. What is your vCenter: host ratio? Are you testing real-world scalability and performance of your management products? Is this being used for VMware Validated Designs? All of this infrastructure could fit on 2 vCenters, how many are you using?
    4. I know you outlined the workloads being hosted on the VMware Private Cloud – but 620,000 IOPS does not seem like enough across 540 hosts and 25,000 VMs! Unless most of this is on-demand provisioning, VDI or similar. That is only 130 IOPS per TB capacity. I’m assuming you have performance tiers, all-flash mixed in and similar – but it still should be significantly higher, right?
    5. With the HOL churn and similar VDI churn, how are you handling product maximums with vRealize Operations and vRealize Business? These have been a pain point for VMs that are temporary. Are you providing cost metrics back for these temporary systems that may be online less than an hour?

    Look forward to seeing more about VMware on VMware! Thank you!
