I recently had an opportunity to present at vForum 2013 in Mumbai, the Financial Capital of India. With more than 3,000 participants and two days of events, it was definitely one of the biggest customer events in India. Along with my team, I represented VMware Professional Services and presented on the following topic: “Architecting vSphere Environments – Everything you wanted to know!”
When we finalized the topic, I realized that presenting this topic in 45 minutes is next to impossible. With the amount of complexity that goes into Architecting a vSphere Environment, one could easily write an entire book. However, the task at hand was to keep it to the length of a presentation.
As I started planning the slides, I decided to look at the architectural decisions, which in my experience are the Most Important Ones, since they can make or break the virtual infrastructure. My other criterion was to ensure I talk about the Grey Areas where I always see uncertainty. This uncertainty can transform a good design into a bad one.
At the end I was able to come out with a final presentation which was received very well by the attendees. I thought of sharing the content with the entire community through this blog post. This is part 1, where I will give you some key design considerations for designing vSphere Clusters.
Before I begin, I also want to give the credit to a number of VMware experts in the community. Their books, blogs and the discussions I have had with them in the past helped me in creating this content. This includes books and blogs by Duncan, Frank, Forbes Guthrie, Scott Lowe, Cormac Hogan and some fantastic discussions with Michael Webster earlier this year.
Before we begin here is a small graphical disclaimer:
The message behind the slide above is to create vSphere Clusters based on the purpose they need to fulfill in the IT landscape of your organization.
The management cluster refers here to a 2- to 3-host ESXi host used by the IT team to primarily host all the workloads that are used to build up a vSphere Infrastructure. This includes VMs such as vCenter Server, Database Server, vCOps, SRM, vSphere Replication Appliance, VMA Appliance, Chargeback Manager, etc. This cluster can also host other infrastructure components such as active directory, backup servers, anti-virus etc. This approach has multiple benefits such as:
- Security due to isolation of management workloads from production workloads. This gives complete control to the IT team on the workloads, which are critical to manage the environment.
- Ease of upgrading the vSphere Environment and related components without impacting the production workloads.
- Ease of troubleshooting issues within these components since the resources such as compute, storage, and network are isolated and dedicated for this cluster.
- The number of ESXi hosts in a cluster will impact your consolidation ratios in most cases. As a rule of thumb, you will always consider one ESXi host in a 4-node cluster for HA failover (assuming), but you could also do the same on a 8-node cluster, which ideally saves one ESXi host for you for running additional workloads. Yes, the HA calculations matter and they can be either on the basis of slot size or percentage of resources.
- Always consider at least one host as a failover limit per 8 to 10 ESXi servers. So in a 16 node cluster, do not stick with only one host for failover, look for at least taking this number to two. This is to ensure that you cover the risk as much as possible by providing an additional node for failover scenarios.
- Setting up large clusters comes with its benefits, such as higher consolidation ratios etc. But they might have a downside as well if you do not have enterprise-class or rightly sized storage in your infrastructure. Remember, if a datastore is presented to a 16-node or a 32-node cluster, and if the VMs on that datastore are spread across the cluster, chances are you might get into contention for SCSI locking. If you are using VAAI, this will be reduced by ATS; however, try to start small and grow gradually to see if your storage behavior is not being impacted.
- Having separate ESXi servers for DMZ workloads is OLD SCHOOL. This was done to create physical boundaries between servers. This practice is a true burden carried over from the physical world to the virtual. It’s time to shed that load and make use of mature technologies, such as VLANs to create logical isolation zones between internal and external networks. Worst case, you might want to use separate network cards and physical network fabric, but you can still run on the same ESXi server, giving you better consolidation ratios and ensuring the level of security required in an enterprise.
Quick Tip: Ensure that this cluster is a minimum 2-node cluster for vSphere HA to protect workloads in case one host goes down. A 3-node management cluster would be ideal, since you would have the option of running maintenance tasks on ESXi servers without having to disable HA. You might want to consider using VSAN for this infrastructure as this is the primary use case that both Rawlinson & Cormac suggest. Remember, VSAN is in beta right now, so make your choices accordingly.
As the name suggests this cluster would host all your production workloads. This cluster is the heart of your organization as it hosts the business applications, databases, and web services. This is what gives you the job of being a VMware architect or a virtualization admin.
Here are a few pointers to keep in mind while creating production clusters:
- The number of ESXi hosts in a cluster will impact you consolidation ratios in most of the cases. As a rule of thumb, you will always consider one ESXi host in a 4-node cluster for HA failover (assuming), but you could also do the same on a 8-node cluster, which ideally saves one ESXi host for you for running additional workloads. Yes, the HA calculations matter and they can be either on the basis of slot size or percentage of resources.
- Always consider at least 1 host as a failover limit per 8 to 10 ESXi servers. So in a 16 node cluster, do not stick with only 1 host for failover, look for at least taking this number to 2. This is to ensure that you cover the risk as much as possible by providing additional node for failover scenarios
- Setting up large clusters comes with their benefits such as higher consolidation ratios etc., they might have a downside as well if you do not have the enterprise class or rightly sized storage in your infrastructure. Remember, if a Datastore is presented to a 16 Node or a 32 Node cluster, and on top of that, if the VMs on that datastore are spread across the cluster, chances that you might get into contention for SCSI locking. If you are using VAAI this will be reduced by ATS, however try to start with small and grow gradually to see if your storage behavior is not being impacted.
Having separate ESXI servers for DMZ workloads is OLD SCHOOL. This was done to create physical boundaries between servers. This practice is a true burden which is carried over from physical world to virtual. It’s time to shed that load and make use of mature technologies such as VLANs to create logical isolation zones between internal and external networks. In worst case, you might want to use separate network cards and physical network fabric but you can still run on the same ESXi server which gives you better consolidation ratios and ensures the level of security which is required in an enterprise.
They sound fancy but the concept of island clusters is fairly simple: run islands of ESXi servers (small groups) that can host workloads with special license requirements. Although I do not appreciate how some vendors try to apply illogical licensing policies on their applications, middle-ware and databases, this is a great way of avoiding all the hustle and bustle created by sales folks. Some examples of island clusters would include:
- Running Oracle Databases/Middleware/Applications on their dedicated clusters. This will not only ensure that you are able to consolidate more and more on a small cluster of ESXi hosts and save money but also ensures that you zip the mouth of your friendly sales guy by being in what they think is license compliance.
- I have customers who have used island clusters of operating systems such as Windows. This also helps you save on those datacenter, enterprise, or standard editions of Windows OS.
- Another important benefit of this approach is that it helps ESXi use the memory management technique of Transparent Page Sharing (TPS) more efficiently since there are chances that you are running a lot of duplicate pages spawned by these VMs in the physical memory of your ESXi servers. I have seen this reach 30 percent and can be fetched in a vCenter Operations Manager report if you have that installed in your virtual infrastructure.
With this I would close this article. I was hoping to give you a quick scoop in all these parts, but this article is now four pages. I hope this helps you make the right choices for your virtual infrastructure when it comes to vSphere Clusters.
This post originally appeared on Sunny Dua’s vXpress blog, where you can find follow-up posts 2 and 3. Sunny Dua is a Senior Technology Consultant for VMware’s Professional Services Organization, focused on India and SAARC countries.