
vSAN has had tremendous customer adoption since its release, and people are often quite interested to hear about environments that have deployed it successfully at very large scale. What about VMware itself? Do we use vSAN for mission-critical environments? Absolutely. One of the larger production deployments of vSAN has been done by VMware itself, and I’m quite pleased to share a few of the details for these critical environments.

While there are of course many environments here at VMware where we use vSAN (my own group probably has around a dozen clusters deployed), there are a few major deployments I’d like to highlight today to show not only our confidence in vSAN as a hyperconverged infrastructure platform, but also that we run mission-critical systems on vSAN at a very impressive scale.

Let’s take a look at some of these use cases we have for vSAN HCI at VMware.

“Nimbus”

For product development at VMware, we face the same challenges as any other development shop: how do we give developers easy access to environments they can use to build, sandbox, and test code? One way we tackle this problem is with a virtualized infrastructure-as-a-service platform called Nimbus. This is an internal private cloud built by our engineers to orchestrate ongoing build deployments for development, testing, QA, and the other activities involved in creating VMware software. Within that environment, our developers have self-service access so they can rapidly deploy configurations directly from our build repository, a task that was previously made more difficult by LUNs acting as hard capacity boundaries. A deployment in this environment can be as small as a single VM or as large as an entire cluster of hosts needed for extensive system tests, so these environments vary considerably in size. vSAN, which offers the full aggregate space of the entire cluster, provides a lot more flexibility in this regard. Deploying Nimbus on vSAN has helped reduce the management burden associated with capacity planning in a LUN-based environment, freeing up our developers to get to work as quickly as they need without the infrastructure itself becoming a bottleneck.
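
If you’re curious what keeping an eye on that aggregate capacity can look like in practice, here’s a minimal sketch, not our internal Nimbus tooling, that uses the open-source pyVmomi SDK to report total and free space for every vSAN datastore a vCenter Server manages. The hostname and credentials shown are placeholders for illustration only.

```python
# Minimal sketch: report capacity and free space for every vSAN datastore
# visible to a vCenter Server. Assumes pyVmomi is installed; the endpoint
# and credentials below are placeholders, not real systems.
import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim


def report_vsan_capacity(host, user, pwd):
    """Print total and free space for each vSAN datastore in a vCenter."""
    context = ssl._create_unverified_context()  # lab convenience; verify certs in production
    si = SmartConnect(host=host, user=user, pwd=pwd, sslContext=context)
    try:
        content = si.RetrieveContent()
        # A container view enumerates all datastores under the root folder.
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.Datastore], True)
        for ds in view.view:
            if ds.summary.type.lower() == "vsan":
                total_tb = ds.summary.capacity / 1024 ** 4
                free_tb = ds.summary.freeSpace / 1024 ** 4
                print(f"{ds.name}: {total_tb:.1f} TB total, {free_tb:.1f} TB free")
    finally:
        Disconnect(si)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    report_vsan_capacity("vcenter.example.com", "readonly@vsphere.local", "secret")
```

Because a vSAN datastore spans its entire cluster, each line of output corresponds to one cluster’s aggregate capacity.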

The current vSAN deployment that supports Nimbus consists of:

  • 10 separate clusters
  • 148 hosts
  • 1,800 virtual machines
  • 876 TB of storage

This critical aspect of our business, development and testing, has been dramatically simplified by running on vSAN, giving our developers the ability to quickly and easily deploy VMs on demand.


IT Production

Our IT groups have likewise found increased flexibility with vSAN and now have numerous deployments of vSAN for our IT infrastructure itself, and this footprint is growing dramatically.  IT was an early adopter of vSAN from the very first vSAN 5.5 releases, and has used many different deployment options, including software “DIY” builds and engineered appliances with vSAN. We are currently using both VxRail and vSAN ReadyNodes in this environment to provide critical infrastructure services such as log management, telecom, VoIP, MySQL and Oracle databases, Footprint Infrastructure Services (DNS, DHCP, Infoblox, LDAP, Syslog, AD Domain Controllers), other monitoring services such as ThousandEyes appliances, and VMware campus Wi-Fi.

Decisions around reducing cost and management overhead, as well as environmental considerations, are common to all IT organizations, and VMware is no exception.  The promise of reducing these costs with HCI was immediately compelling to our IT group, and from vSAN 5.5 on they have been accelerating their use of vSAN.  Seeing the average cluster deployment shrink from 10U of rack space to 4U with VxRail showed immediate benefit, and adopting HCI with vSAN has helped them operationally with, for example, the ability to move away from “forklift upgrades”.

What, then, does the VMware IT vSAN footprint look like?

  • 53 clusters (predominantly VxRail)
  • 255 hosts
  • 1,556 virtual machines
  • 1,620 TB of storage


VMware’s Private Cloud

This is another cloud environment, albeit with a very interesting architecture and a highly visible set of workloads. On top of offering private lab space for groups and teams and assisting with other internal tasks like test/dev, our internal cloud environment also serves extremely important public-facing workloads: Education Services runs our class environments here; Global Support Services uses it to reproduce customer environments; VMware runs an internal VDI instance to serve our users across the globe; and, very visibly, our Hands-on Labs run on a genuine hybrid cloud environment that federates across geographical boundaries, with instances both on-premises and in partnership with public cloud environments.

This federation is completely invisible to users: when participating in an HoL, a user may be served by one of four different cloud environments, located in different parts of the world and hosted either directly by VMware or by one of our partners, all with common storage policies thanks to the abstraction offered by vSAN.

This is a highly resilient environment built using most of the VMware software stack, including NSX, vCloud Director, Horizon View and App Volumes, and the vRealize suite of products, among other components, and of course, vSAN!

Our cloud team initially ran this environment on a traditional legacy configuration, with servers discrete from an external SAN.  With this legacy architecture they were reaching the scale they needed, but found that the increasing consumption and variable workload pattern of the VMware Hands-on Labs could tax the systems unpredictably.  For example, during VMworld the HoL can deliver many hundreds of VMs per minute, serve over a thousand concurrent users, and deploy well over 100,000 VMs for the event, with individual labs ranging from a few hundred GB to multiple TB in size.

As an internal service provider, they were intrigued by the technology of vSAN and were eager to assist with development by helping with scale and usability testing of early versions.

Initial testing with all-flash vSAN for the Hands-on Labs environment of the cloud blew away everything else they had tried up to that point.

In moving from a traditional legacy configuration of discrete components to a hyperconverged infrastructure running vSAN, they went from a storage platform offering about 150,000 IOPS at peak to vSAN delivering over 620,000 IOPS in an on-premises environment, roughly a fourfold improvement.

The current footprint of vSAN in this environment?

  • 56 clusters
  • 540 hosts
  • 25,000 virtual machines
  • 4,724 TB of storage


Totals

Keeping count of these environments?  Let’s recap the vSAN deployments:

                Hosts   Clusters  VMs       TB Capacity
Nimbus          148     10        1,800     876
IT Production   255     53        1,556     1,620
Internal Cloud  540     56        25,000    4,724

At VMware, in three of our major IT environments, we have a vSAN footprint of 119 clusters across 943 hosts.

This is serving almost 30,000 VMs and 7.2 petabytes of data!

Does vSAN scale well for large customer environments?  Can you run critical systems on it?  Can you save time and money while increasing IT flexibility?  Our internal teams put these questions to the test, found the answer to each was yes, and the result is a vSAN implementation that rivals some of our largest customer deployments!

One last thing: If you’re heading to VMworld and are interested in more details on how VMware uses vSAN in its private cloud SDDC, you’re going to want to attend session STO3190BU!  Lots of great information in there from Henry Bauer (DevOps core services) and Kris Groh (vSAN product management).

-Ken