
Tag Archives: Jonathan McDonald

VMware Validated Design for SDDC 3.0 – Now Available!

By Jonathan McDonald

I mentioned all the fun details of the VMware Validated Design in my previous blog post. I am happy to report that we have just released the next revision, version 3.0. It takes what everyone already knew and loved about the previous version—and makes it better!

In case you have not heard of VMware Validated Designs, they are a construct used to build a reference design that:

  • Is built by expert architects who have many years of experience with the products, as well as integrations
  • Allows repeatable deployment of the end solution, which has been tested to scale
  • Integrates with the development cycle, so that if an issue is identified during integration and scale testing, it can be quickly fixed by the developers before the products are released

All in all, this is an amazing project that I am excited to have worked on, and I am happy to finally talk about it publicly!

What’s New with the VMware Validated Design for SDDC 3.0?

There are quite a lot of changes in this version of the design. I am not going to go into every detail in this blog, but here is an overview of the major ones:

  • Full Dual Region Support—Previously, the VMware Validated Design mentioned dual sites, but there was implementation guidance for a single site only. In this release we provide full guidance and support for configuring a dual-region environment.
  • Disaster Recovery Guidance—With the addition of dual-region support, guidance is needed for disaster recovery. This includes installation, configuration, and operational guidance for VMware Site Recovery Manager and vSphere Replication. Operationally, plans are created not only to fail over and fail back the management components between sites, but also to test those plans.
  • Reduced minimum footprint with a 2-pod design—Prior versions of the VMware Validated Design focused on a 3-pod architecture, which called for a minimum of 12 ESXi hosts:
    • 4 for management
    • 4 for compute
    • 4 for the NSX Edge cluster

In this release, the default configuration is a 2-pod design, which collapses the compute and Edge clusters. This allows for a minimum footprint of 8 ESXi hosts:

  • 4 for management
  • 4 for shared Edge and compute functions

This marks a significant reduction in size for small or proof-of-concept installations, which can later be expanded to a full 3-pod design if required.

  • Updated bill of materials—The bill of materials has been updated to include new versions of many software components, including NSX for vSphere and vRealize Log Insight. In addition, Site Recovery Manager and vSphere Replication have been added to support the new design.
  • Upgrade Guidance—As a result of the updated bill of materials, guidance is provided for each component that needs upgrading in this revision. This guidance will continue to grow as products are released and incorporated into the design.

The good news is that the actual architecture has not changed significantly. As always, if a particular component design does not fit the business or technical requirements for whatever reason, it can be swapped out for another similar component. Remember, the VMware Validated Design for SDDC is one way of putting an architecture together that has been rigorously tested to ensure stability, scalability, and compatibility. Our design has been created to ensure the desired outcome will be achieved in a scalable and supported fashion.

Let’s take a more in-depth look at some of the changes.

Virtualized Infrastructure

The SDDC virtual infrastructure has not changed significantly. Each site consists of a single region, and the design can be expanded with additional regions. Each region includes:

  • A management pod
  • A shared edge and compute pod

[Figure: management pod and shared edge and compute pod]

This is a standard design practice that has been tested in many customer environments. The purpose of each pod is as follows.

Management Pod

Management pods run the virtual machines that manage the SDDC. These virtual machines host:

  • vCenter Server
  • NSX Manager
  • NSX Controller
  • vRealize Operations
  • vRealize Log Insight
  • vRealize Automation
  • Site Recovery Manager
  • And other shared management components

All management, monitoring, and infrastructure services are provisioned to a vCenter Server cluster with vSphere High Availability enabled, which provides high availability for these critical services. Permissions on the management cluster limit access to administrators only, protecting the virtual machines that run the management, monitoring, and infrastructure services.

Shared Edge and Compute Pod

The shared edge and compute pod runs the required NSX services to enable north-south routing between the SDDC and the external network and east-west routing inside the SDDC. This pod also hosts the SDDC tenant virtual machines (sometimes referred to as workloads or payloads). As the SDDC grows, additional compute-only pods can be added to support a mix of different types of workloads for different types of SLAs.

Disaster Recovery and Data Protection

Nobody wants a disaster to occur, but in case something does happen, you need to be prepared. The VMware Validated Design for SDDC 3.0 includes guidance on using VMware products and technologies for both data protection and disaster recovery.

Data Protection Architecture

vSphere Data Protection is used as the backup solution for the architecture. It allows the virtual machines involved in the solution to be backed up and restored, which helps you meet company policies for recovery as well as data retention. The design spans both regions, and looks as follows:

[Figure: vSphere Data Protection architecture]

Disaster Recovery

In addition to backups, the design includes guidance on using Site Recovery Manager to protect the management components. This includes a design that covers both regions, and guidance on using vSphere Replication to replicate the data between sites. It also details how to create protection groups as well as recovery plans to ensure the management components can be failed over between sites, including the vRealize Automation and vRealize Operations Manager VMs where appropriate.

The architecture is shown as follows:
[Figure: vRealize components replicated between regions]

The Cloud

Of course, no SDDC is complete without a cloud platform, and the design still includes familiar guidance on installing the cloud components as well. vRealize Automation remains part of the design and has not changed significantly, other than adding multi-region support. It is a big piece, but I did want to show the conceptual design of the architecture here, because it provides a high-level overview of the components, user types, and operations in workload provisioning.

[Figure: conceptual design of workload provisioning]

The beauty here is that the design has been tried and tested at scale as part of the Validated Design. This allows issues to be identified and fixed before the platform is ever deployed.

Monitoring and Operational Procedures

Finally, last but not least, what design is complete without proper monitoring and operational procedures? The VMware Validated Design for SDDC includes a great design for both vRealize Operations Manager and vRealize Log Insight. In addition, it goes into the different practices for backing up, restoring, and operating the actual cloud that has been built. It doesn't go as far as a formal operational transformation for the business, but it does a great job of showing how many standard practices can be used as a basis for defining what you—as a business owner—need in order to operate a cloud.

To show a bit of the design: vRealize Operations Manager contains functional elements that collaborate for data analysis and storage, and it supports the creation of clusters of nodes with different roles:

[Figure: vRealize Operations Manager node roles, including remote collectors]

Overall, this is a really powerful platform that revolutionizes the way that you see the environment.

Download It Now!

Hopefully, this overview of the changes in the new VMware Validated Design for SDDC 3.0 has been useful. There is much more to the design than just the few items I’ve told you about in this blog, so I encourage you to check out the Validated Designs webpage for more details.

In addition—if you are interested—VMware Professional Services are available to help with the installation and configuration of a VMware Validated Design as well.

I hope this helps you in your architectural design discussions to show that integration stories are not only possible, but can make your experience deploying an SDDC much easier.

Look for me and other folks from the Professional Services Engineering team and the Integrated Systems Business Unit at VMworld Europe. We are happy to answer any questions you have about VMware Validated Designs!


Jonathan McDonald is a Technical Solutions Architect for the Professional Services Engineering team. He currently specializes in developing architecture designs for core virtualization and software-defined storage, as well as providing best practices for upgrades and health checks for vSphere environments.

VMware Validated Design for SDDC 2.0 – Now Available

By Jonathan McDonald

Recently I have been involved in a rather cool project inside VMware, aimed at validating and integrating all the different VMware products. The most interesting customer cases I see are related to this work because oftentimes products work independently without issue—but together can create unique problems.

To be honest, it is really difficult to solve some of the problems when integrating many products together. Whether we are talking about integrating a ticketing system, building a custom dashboard for vRealize Operations Manager, or even building a validation/integration plan for Virtual SAN to add to existing processes, there is always the question, “What would the experts recommend?”

The goal of this project is to provide a reference design for our products, called a VMware Validated Design. The design is a construct that:

  • Is built by expert architects who have many years of experience with the products as well as the integrations
  • Allows repeatable deployment of the end solution, which has been tested to scale
  • Integrates with the development cycle, so if there is an issue with the integration and scale testing, it can be identified quickly and fixed by the developers before the products are released

All in all, this has been an amazing project that I’ve been excited to work on, and I am happy to be able to finally talk about it publicly!

Introducing the VMware Validated Design for SDDC 2.0

The first of these designs—under development for some time—is the VMware Validated Design for SDDC (Software-Defined Data Center). The first release was internal to VMware only, but on July 21, 2016, version 2.0 was released and is now available to everyone! This design not only builds the foundation for a solid SDDC infrastructure platform using VMware vSphere, Virtual SAN, and VMware NSX, but also builds on that foundation using the vRealize product suite (vRealize Operations Manager, vRealize Log Insight, vRealize Orchestrator, and vRealize Automation).

The outcome of the VMware Validated Design for SDDC is a system that enables an IT organization to automate the provisioning of common, repeatable requests and to respond to business needs with more agility and predictability. Traditionally, this has been referred to as Infrastructure as a Service (IaaS); however, the VMware Validated Design for SDDC extends the typical IaaS solution to include a broader and more complete IT solution.

The architecture is based on a number of layers and modules, which allows interchangeable components to be part of the end solution or outcome, such as the SDDC. If a particular component design does not fit the business or technical requirements for whatever reason, it should be able to be swapped out for another similar component. The VMware Validated Design for SDDC is one way of putting an architecture together that has been rigorously tested to ensure stability, scalability, and compatibility. Ultimately, however, the system is designed to ensure the desired outcome will be achieved.

The conceptual design is shown in the following diagram:

[Figure: VMware Validated Design conceptual design]

As you can see, the design brings a lot more than just implementation details. It includes many common “day two” operational tasks such as management and monitoring functions, business continuity, and security.

To simplify such a complex design, it has been broken up into:

  • A high-level Architecture Design
  • A Detailed Design with all the design decisions included
  • Implementation guidance

Let’s take an in-depth look.


Virtualization and VMware Virtual SAN … the Old Married Couple

Don’t Mistake These Hyper-Converged Infrastructure Technologies as Mutually Exclusive

By Jonathan McDonald

I have not posted many blogs recently, as I've been in South Africa. I have, however, been hard at work on the latest release of VMware vSphere 6.0 Update 2 and VMware Virtual SAN 6.2. Some amazing features are included that will make life a lot easier and add some exciting new functionality to your hyper-converged infrastructure. I will not get into those features in this post, because I want to talk about one of the bigger non-technical questions that I get from customers and consultants alike, one that is not directly tied to the technology or architecture of the products: the idea that you can go into an environment and just do Virtual SAN, which from my experience is not true. I would love to know if your thoughts and experiences have shown you the same thing.

Let me first tell those of you who are unaware of Virtual SAN that I am not going to go into great depth about the technology. The key is that, as a platform, it is hyper-converged, meaning it is included with the ESXi hypervisor. This makes it radically simple to actually configure—and, more importantly, use—once it is up and running.

My hypothesis is that 80 to 90 percent of what you have to do to design for Virtual SAN focuses on the virtualization design, and not so much on Virtual SAN itself. This is not to say the Virtual SAN design is not important, but virtualization has to be integral to the design when you are building for it. To prove this, take a look at the standard tasks involved when creating the design for the environment:

  1. Hardware selection, racking, configuration of the physical hosts
  2. Selection and configuration of the physical network
  3. Software installation of the VMware ESXi hosts and VMware vCenter server
  4. Configuration of the ESXi hosts
    • Networking (for management traffic and VMware vSphere vMotion, at a minimum)
    • Disks
    • Features (VMware vSphere High Availability, VMware vSphere Distributed Resource Scheduler, VMware vSphere vMotion, at a minimum)
  5. Validation and testing of the configuration

If I add the Virtual SAN-specific tasks in, you have a holistic view of what is required in most greenfield configurations:

  1. Configuration of the Virtual SAN network
  2. Turning on Virtual SAN
  3. Creating new policies (optional, as the default is in place once configured)
  4. Testing Virtual SAN

As you can see from my first point, the majority of the work is actually virtualization and not Virtual SAN. In fact, as I write this, I am even more convinced of my hypothesis. The first three tasks alone are really the heavy hitters for time spent. As a consultant or architect, you need to focus on these tasks more than anything. Notice above that I mention "configure" in regard to Virtual SAN, not installation; this is because it is already a hyper-converged element installed with ESXi. Once you get the environment up and running with the ESXi hosts installed, Virtual SAN needs no further installation, simply configuration. You turn it on with a simple wizard, and, as long as you have focused on the supportability of the hardware and the underlying design, you will be up and running quickly. Virtual SAN is that easy.
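To give a sense of just how little there is to it, here is a minimal command-line sketch of the same steps the wizard performs, run from an ESXi shell. Treat this as an illustration only: the vmkernel interface (vmk1) and the device IDs are placeholders, not values from any real environment.

    # tag a vmkernel interface for Virtual SAN traffic (vmk1 is an assumption)
    esxcli vsan network ipv4 add -i vmk1
    # create the cluster on the first host, then join the remaining hosts to its UUID
    esxcli vsan cluster new
    esxcli vsan cluster join -u <sub-cluster-uuid>
    # claim disks into a disk group (one SSD for cache, one or more capacity disks)
    esxcli vsan storage add -s naa.<ssd-id> -d naa.<capacity-id>
    # confirm membership and cluster state
    esxcli vsan cluster get

In practice you would only script this for corner cases such as bootstrapping; the wizard remains the normal path.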

Many of the arguments I get are interesting as well. Some of my favorites include:

  • “The customer has already selected hardware.”
  • “I don’t care about hardware.”
  • “Let’s just assume that the hardware is there.”
  • “They will be using existing hardware.”

My response is always that you should care a great deal about the hardware. In fact, this is by far the most important part of a Virtual SAN engagement. With Virtual SAN, if the hardware is not on the VMware compatibility list, then it is not supported. By not caring about hardware, you risk data loss and the loss of all VMware support.

If the hardware is already chosen, you should ensure that the hardware being proposed, added, or assumed as in place is proper. Get the bill of materials or the quote, and go over it line-by-line if that’s what’s needed to ensure that it is all supported.
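As a quick sanity check from the hosts themselves, a couple of commands available in the ESXi shell help confirm what hardware is actually in place before you compare it against the compatibility list:

    # list the storage adapters and the drivers in use, for comparison against the VMware compatibility list
    esxcli storage core adapter list
    # query each disk for Virtual SAN eligibility (vdq ships with ESXi)
    vdq -q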

Although the hardware selection is slightly stricter than with an average design, it is much the same as any traditional virtualization engagement in how you approach it. Virtual SAN Ready Nodes are a great approach and make this much quicker and simpler, as they offer a variety of pre-configured hardware to meet the needs of Virtual SAN. Along with the Virtual SAN TCO Calculator, they make the painful process of hardware selection a lot easier.

Another argument I hear is “If I am just doing Virtual SAN, that is not enough time.” Yes, it is. It really, really is. I have been a part of multiple engagements for which the first five tasks above are already completely done. All we have to do is come in and turn on Virtual SAN. In Virtual SAN 6.2, this is made really easy with the new wizard:

[Screenshot: the Configure Virtual SAN wizard]

Even with the inevitable network issues (not lying here; every single time there is a problem with networking), environmental validation, performance testing, failure testing, and testing of virtual machine creation workflows, I have never seen this piece take more than a week for a single cluster, regardless of the size of the configuration. In many cases, after three days everything is up and running, and it is purely customer validation that is taking place. As a consultant or architect, don't be afraid of the questions customers ask about performance and failures. Virtual SAN provides mechanisms to easily test the environment as well as see what "normal" looks like.

Here are two other arguments I hear frequently:

  • “We have never done this before.”
  • “We don’t have the skillset.”

These claims are probably not 100 percent accurate. If you have used VMware, or you are a VMware administrator, you are probably aware of the majority of what you have to do here. It is Virtual SAN, specifically, where the knowledge needs to be grown. I suggest training, or a review of VMworld presentations on Virtual SAN, to get familiar with this piece of technology and its related terminology. VMware offers training that will get you up to speed on hyper-converged infrastructure technologies and the new features of VMware vSphere 6.0 Update 2 and Virtual SAN 6.2.

For more information, check out the free training courses that VMware offers on these topics.

In addition, most of the best practices you will see are familiar, since they are vCenter- or ESXi-related. Virtual SAN Health gives an amazing overview that is frequently refreshed, so any issues you may be seeing are reported there. It also takes a lot of the guesswork out of the configuration tasks: as you can see from the screenshot below, many, if not all, of the common misconfigurations are flagged.

[Screenshot: Virtual SAN Health]
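If you prefer the command line, a quick cross-check of the cluster state is also available from any host; a minimal sketch:

    # show the Virtual SAN cluster state, this host's role (master/backup/agent), and member count
    esxcli vsan cluster get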

In any case, I hope I have made the argument that Virtual SAN is mostly a virtualization design that just doesn't use traditional SANs for storage. Hyper-converged infrastructure is truly bringing change to many customers. This is, of course, just my opinion, and I will let you judge for yourself.

Virtual SAN has quickly become one of my favorite new technologies that I have worked with in my time at VMware, and I am definitely passionate about people using it to change the way they do business. I hope this helps in any engagements that you are planning as well as to prioritize and give a new perspective to how infrastructure is being designed.


Jonathan McDonald is a Technical Solutions Architect for the Professional Services Engineering team. He currently specializes in developing architecture designs for core virtualization and software-defined storage, as well as providing best practices for upgrades and health checks for vSphere environments.

Virtual SAN Stretch Clusters – Real World Design Practices (Part 2)

By Jonathan McDonald

This is the second part of a two-part blog series, as there was just too much detail for a single blog. For Part 1, see: http://blogs.vmware.com/consulting/2016/01/virtual-san-stretch-clusters-real-world-design-practices-part-1.html.

As I mentioned at the beginning of the last blog, I want to start off by saying that all of the details here are based on my own personal experiences. This is not meant to be a comprehensive guide to setting up stretch clustering for Virtual SAN, but rather a set of pointers to show the type of detail most commonly asked for. Hopefully it will help you prepare for any projects of this type.

Continuing on with the configuration, the next set of questions regarded networking!

Networking, Networking, Networking

With sizing and configuration behind us, the next step was to enable Virtual SAN and set up the stretch clustering. As soon as we turned it on, however, we got the infamous “Misconfiguration Detected” message for the networking.

In almost all engagements I have been a part of, this has been a problem, even though the networking team said it was already set up and configured. This always becomes a fight, but it gets easier with the new Health UI and its multicast checks. Generally, when multicast is not configured properly, you will see something similar to the screenshot shown below.

[Screenshot: multicast misconfiguration warning in the Virtual SAN health checks]

It definitely makes the conversation with the networking team easier, and the added bonus is that there is no messy command-line syntax needed to validate the configuration. I can honestly say the health interface is one of the best features introduced for Virtual SAN!
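That said, if you want to double-check the multicast configuration yourself before going back to the networking team, the following can be run from an ESXi shell. Treat this as a sketch: vmk2 as the Virtual SAN vmkernel interface is an assumption, and the port shown is the default agent-group multicast port.

    # list the Virtual SAN vmkernel interfaces, including their multicast group addresses
    esxcli vsan network list
    # watch for multicast heartbeats arriving on the Virtual SAN interface (vmk2 is an assumption)
    tcpdump-uw -i vmk2 -n udp port 23451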

Once we had this configured properly, the cluster came online and we were able to finish configuring it, including stretch clustering, the proper vSphere High Availability settings, and the affinity rules.

The final question that came up on the networking side was about the recommendation that L3 be the preferred communication mechanism to the witness host. The big issue when using L2 is the potential that traffic could be redirected through the witness site in the case of a failure, and the witness link has a substantially lower bandwidth requirement. A great description of this concern is in the networking section of the Stretched Cluster Deployment Guide.

In any case, the networking configuration is definitely more complex in stretched clustering, because the network spans multiple sites. It is therefore imperative that it is configured correctly, not only to ensure that performance is at peak levels, but also to ensure there is no unexpected behavior in the event of a failure.

High Availability and Provisioning

All of this talk finally led to the conversation about availability. The beautiful thing about Virtual SAN is that with the "failures to tolerate" setting, you can configure the cluster to tolerate between one and three failures, with redundant copies of the data placed accordingly, depending on what is configured in the policy. Gone are the long conversations of trying to design this into a solution with proprietary hardware or software.

A difference with stretch clustering is that the maximum "failures to tolerate" is one. This is because we have three fault domains: the two sites and the witness. Logically, when you look at it, it makes sense: more than one failure cannot be tolerated with only three fault domains. The idea here is that there is a full copy of the virtual machine data at each site. This allows for failover in case an entire site fails, as components are stored according to site boundaries.
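You can confirm the policy defaults from any host in the cluster; hostFailuresToTolerate is the attribute that "failures to tolerate" maps to under the covers:

    # print the default Virtual SAN storage policy values, including hostFailuresToTolerate
    esxcli vsan policy getdefault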

Of course, high availability (HA) needs to be aware of this. The way this is configured from a vSphere HA perspective is to use the percentage-based admission control policy and set both CPU and memory to 50 percent:

[Screenshot: vSphere HA admission control set to 50 percent for CPU and memory]

This may seem like a LOT of resources, but when you think of it from a site perspective, it makes sense: if an entire site fails, the virtual machines from the failed site will be able to restart on the surviving site without issues.

The question came up as to whether we could allow more than 50 percent of resources to be used. Yes, you can configure the cluster so more than half is consumed, but there might be an issue if there is a failure, as not all virtual machines may start back up. This is why it is recommended that 50 percent of resources be reserved. If you do want to run virtual machines on more than 50 percent of the resources, it is still possible, but not recommended. That configuration generally consists of setting a priority on the most important virtual machines, so HA will start up as many as possible, beginning with the most critical ones. Personally, I recommend not going above 50 percent utilization for a stretch cluster.

An additional question came up about using host and virtual machine affinity rules to control the placement of virtual machines. Unfortunately, assignment to these groups is not easy during the provisioning process and did not fit easily into the virtual machine provisioning practices used in the environment. vSphere Distributed Resource Scheduler (DRS) does a good job of ensuring balance, but more control was needed than simply relying on DRS to balance the load. The end goal was that, during provisioning, placement in the appropriate site could happen automatically, rather than requiring manual steps by staff.

This discussion boiled down to the need for a change to provisioning practices. Currently, it is a manual configuration change, but it is possible to use automation such as vRealize Orchestrator to automate deployment appropriately. This is something to keep in mind when working with customers to design a stretch cluster, as changes to provisioning practices may be needed.

Failure Testing

Finally, after days of configuration and design decisions, we were ready to test failures. This is always interesting, because the conversation varies between customers. Some require very strict testing and want to test every scenario possible, while others are OK doing less. After talking it over, we decided on the following plan:

  • Host failure in the secondary site
  • Host failure in the primary site
  • Witness failure (both network and host)
  • Full site failure
  • Network failures
    • Witness to site
    • Site to site
  • Disk failure simulation
  • Maintenance mode testing

This was a good balance of tests to show exactly what the different failures look like. Prior to starting, I always go over the health status windows for Virtual SAN as it updates very quickly to show exactly what is happening in the cluster.

The customer was really excited about how seamlessly Virtual SAN handles errors. The key is to prepare operationally and ensure the comfort level is high with handling the worst-case scenario. When starting off, host and network failures look very similar, but showing this is important, so I suggested running through several similar tests just to ensure the tests are accurate.

As an example, one of the most commonly requested failure tests (which many organizations don't test properly) is simulating what happens if a disk in a disk group fails. Simply pulling a disk out of the server does not replicate what would happen if a disk actually failed, as a completely different mechanism is used to detect this. You can use the following commands to properly simulate a disk failing by injecting an error. Follow these steps:

  1. Identify the disk device in which you want to inject the error. You can do this by using a combination of the Virtual SAN Health User Interface, and running the following command from an ESXi host and noting down the naa.<ID> (where <ID> is a string of characters) for the disk:
     esxcli vsan storage list
  2. Navigate to /usr/lib/vmware/vsan/bin/ on the ESXi host.
  3. Inject a permanent device error to the chosen device by running:
    python vsanDiskFaultInjection.pyc -p -d <naa.id>
  4. Check the Virtual SAN Health User Interface. The disk will show as failed, and the components will be relocated to other locations.
  5. Once the re-sync operations are complete, remove the permanent device error by running:
    python vsanDiskFaultInjection.pyc -c -d <naa.id>
  6. Once completed, remove the disk from the disk group and uncheck the option to migrate data. (This is not a strict requirement because data has already been migrated as the disk officially failed.)
  7. Add the disk back to the disk group.
  8. Once this is complete, all warnings should be gone from the health status of Virtual SAN.
    Note: Be sure to acknowledge and reset any alarms to green.

After performing all the tests in the above list, the customer had a very good feeling about the Virtual SAN implementation and their ability to operationally handle a failure should one occur.

Performance Testing

Last, but not least, was performance testing. Unfortunately, while I was onsite for this one, the 10G networking was not available. I would not recommend using a gigabit network for most configurations, but since we were not yet in full production mode, we went through many of the performance tests and got an excellent baseline of what the performance would look like on the gigabit network.

Briefly, because I could write an entire book on performance testing, the quickest and easiest way to test performance is with the Proactive Tests menu which is included in Virtual SAN 6.1:

[Screenshot: the Proactive Tests menu]

It provides a really good mechanism to test the most common types of workloads, all the way from a basic test to a stress test. In addition, using Iometer for testing (based on environmental characteristics) can be very useful.

In this case, to give you an idea of performance test results, we were pretty consistently getting a peak of around 30,000 IOPS with the gigabit network with 10 hosts in the cluster. Subsequently, I have been told that once the 10G network was in place, this actually jumped up to a peak of 160,000 IOPS for the same 10 hosts. Pretty amazing to be honest.

I will not get into the ins and outs of testing, as it very much depends on the area you are testing. I did want to show, however, that it is much easier to perform a lot of the testing this way than it was using the previous command line method.

One final note I want to add in the performance testing area: one of the key things (other than pure "my VM goes THISSSS fast" type tests) is to test the performance of rebalancing in the case of maintenance mode or failure scenarios. This can be done from the Resyncing Components menu:

[Screenshot: the Resyncing Components menu]

Boring by default perhaps, but when you either migrate data in maintenance mode or change a storage policy, you can see what the impact of resyncing components will be. Resyncing will show either when an additional disk stripe is created for an object, or when data is fully migrated off a host going into maintenance mode. The compliance screen will look like this:

[Screenshot: resyncing components in progress]

This can represent a significant amount of time, and the view is incredibly useful when testing normal workflows, such as data being migrated during the enter-maintenance-mode workflow. Full migrations of data can be incredibly expensive, especially if the disks are large, or if you are using gigabit rather than 10G networks. Oftentimes, convergence can take a significant amount of time and bandwidth, so this allows customers to plan for the amount of data to be moved while in maintenance mode, or in the case of a failure.

Well, that is what I have for this blog post. Again, this is obviously not an exhaustive list of all the decision points or anything like that; it's just where we had the most discussions that I wanted to share. I hope this gives you an idea of the challenges we faced, and can help you prepare for the decisions you may face when implementing stretch clustering for Virtual SAN. This is truly a pretty cool feature, and it provides an excellent addition to the ways business continuity and disaster recovery plans can be designed for an environment.


Jonathan McDonald is a Technical Solutions Architect for the Professional Services Engineering team. He currently specializes in developing architecture designs for core virtualization and software-defined storage, as well as providing best practices for upgrades and health checks for vSphere environments.

Virtual SAN Stretch Clusters – Real World Design Practices (Part 1)

By Jonathan McDonald

This is part one of a two-part blog series, as there was just too much detail for a single blog. I want to start off by saying that all of the details here are based on my own personal experiences. This is not meant to be a comprehensive guide for setting up stretch clustering for Virtual SAN, but a set of pointers to show the type of detail that is most commonly asked for. Hopefully it will help prepare you for any projects that you are working on.

Most recently in my day-to-day work I was asked to travel to a customer site to help with a Virtual SAN implementation. It was not until I got on site that I was told that the idea for the design was to use the new stretch clustering functionality that VMware added to the Virtual SAN 6.1 release. This functionality has been discussed by other folks in their blogs, so I will not reiterate much of the detail from them here. In addition, the implementation is very thoroughly documented by the amazing Cormac Hogan in the Stretched Cluster Deployment Guide.

What this blog is meant to be is a guide to some of the most important design decisions that need to be made. I will focus on the most recent project I was part of; however, the design decisions are pretty universal. I hope that the detail will help people avoid issues such as the ones we ran into while implementing the solution.

A Bit of Background

For anyone not aware of the stretch clustering functionality, I wanted to provide a brief overview. Most of the details you already know about Virtual SAN still hold true. What it really amounts to is a configuration that allows two sites of hosts connected by a low-latency link to participate in a Virtual SAN cluster, together with an ESXi host or witness appliance that exists at a third site. This cluster is an active/active configuration that provides a new level of redundancy, such that if one of the two sites fails, the other site will immediately be able to recover the failed site's virtual machines using VMware High Availability.

The configuration looks like this:

[Figure: stretched Virtual SAN cluster]

This is accomplished by using fault domains and is configured directly from the fault domain configuration page for the cluster:

[Screenshot: fault domain configuration for the cluster]

Each site is its own fault domain, which is why the witness is required. The witness functions as the third fault domain and is used to host the witness components for the virtual machines in both sites. In Virtual SAN stretched clusters, there is only one witness host in any configuration.

[Figure: witness host in a stretched Virtual SAN cluster]

For deployments that manage multiple stretched clusters, each cluster must have its own unique witness host.

The nomenclature used to describe a Virtual SAN Stretched Cluster configuration is X+Y+Z, where X is the number of ESXi hosts at data site A, Y is the number of ESXi hosts at data site B, and Z is the number of witness hosts at site C.

Finally, with stretch clustering, the current maximum configuration is 31 nodes (15 + 15 + 1 = 31). The minimum supported configuration is 1 + 1 + 1 = 3 nodes, which can be configured as a two-host Virtual SAN cluster with the witness appliance as the third node.

With all these considerations, let’s take a look at a few of the design decisions and issues we ran into.

Hosts, Sites and Disk Group Sizing

The first question that came up—as it almost always does—is about sizing. This customer initially used the Virtual SAN TCO Calculator for sizing and the hardware was already delivered. Sounds simple, right? Well perhaps, but it does get more complex when talking about a stretch cluster. The questions that came up regarded the number of hosts per site, as well as how the disk groups should be configured.

Starting off with the hosts, one of the big things discussed was the possibility of having more hosts in the primary site than in the secondary. For stretch clusters, an identical number of hosts in each site is a requirement. This makes it a lot easier from a decision standpoint, and when you look closer the reason becomes obvious: with a stretched cluster, you have the ability to fail over an entire site. Therefore, it is logical to have identical host footprints.

With disk groups, however, the decision point is a little more complex. Normally, my recommendation here is to keep everything uniform. Thus, if you have 2 solid state disks and 10 magnetic disks, you would configure 2 disk groups with 5 disks each. This prevents unbalanced utilization of any one component type, regardless of whether it is a disk, disk group, host, network port, etc. To be honest, it also greatly simplifies much of the design, as each host/disk group can expect an equal amount of love from vSphere DRS.

In this configuration, though, it was not so clear because one additional disk was available, so the division of disks cannot be equal. After some debate, we decided to keep one disk as a “hot spare,” so there was an equal number of disk groups—and disks per disk group—on all hosts. This turned out to be a good thing; see the next section for details.

In the end, much of this is the standard approach to Virtual SAN configuration, so other than site sizing, there was nothing really unexpected.

Booting ESXi from SD or USB

I don't want to get too in-depth on this, but briefly: when you boot an ESXi 6.0 host from a USB device or SD card, Virtual SAN trace logs are written to RAMdisk, and these logs are not persistent. This actually serves to preserve the life of the device, as the amount of data being written can be substantial. When running in this configuration, these logs are automatically offloaded to persistent media during shutdown or a system crash (PANIC). However, if you have more than 512 GB of RAM in the hosts, the boot device is unlikely to have enough space to store this volume of data, because these devices are generally not that large. As a result, logs, Virtual SAN trace logs, or core dumps may be lost or corrupted because of insufficient space, and the ability to troubleshoot failures will be greatly limited.

So, in these cases it is recommended to configure a drive for the core dump and scratch partitions. This is also the only supported method for handling Virtual SAN traces when booting ESXi from a USB stick or SD card.

That being said, when we were in the process of configuring the hosts in this environment, we saw the "No datastores have been configured" warning message pop up, meaning persistent storage had not been configured. This triggered the whole discussion. The error appears as follows in the vSphere Web Client:

[Screenshot: "No datastores have been configured" warning in the vSphere Web Client]

In the vSphere Client, this error also comes up when you click the Configuration tab:

[Screenshot: the same warning on the Configuration tab in the vSphere Client]

The spare disk turned out to be useful because we were able to use it to configure the ESXi scratch dump and core dump partitions. This is not to say we were seeing crashes, or even expected to; in fact, we saw no unexpected behavior in the environment up to this point. Rather, since this was a new environment, we wanted to ensure we’d have the ability to quickly diagnose any issue, and having this configured up-front saves significant time in support. This is of course speaking from first-hand experience.

In addition, syslog was set up at this time to export logs to an external source. Whether using the syslog service that is included with vSphere, or vRealize Log Insight (an amazing tool if you have not used it), we made sure the environment was set up to quickly identify the source of any problem that might arise.
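As a rough sketch of what this host-side configuration looks like from the ESXi shell, assuming a local datastore named datastore1, a hypothetical Log Insight host, and a partition on the spare disk for core dumps (exact options can vary by build):

    # send logs to persistent storage and to a remote syslog target
    esxcli system syslog config set --logdir=/vmfs/volumes/datastore1/logs --loghost=tcp://loginsight.example.com:514
    esxcli system syslog reload
    # point the core dump at a partition on the spare disk (device and partition number are assumptions)
    esxcli system coredump partition set --partition=naa.<spare-disk-id>:1
    esxcli system coredump partition set --enable=true
    # redirect Virtual SAN traces to persistent storage as well
    esxcli vsan trace set --path=/vmfs/volumes/datastore1/vsantraces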

For more details and step-by-step instructions, see the related VMware KB articles.

I guess the lesson here is that when you are designing your Virtual SAN cluster, remember that having persistence available for logs, traces, and core dumps is a best practice. If you have a large memory configuration, the easiest approach is to install ESXi, along with the scratch and core dump partitions, on a hard drive. This also simplifies post-installation tasks, and will ensure you can collect all the information support might require to diagnose issues.

Witness Host Placement

The witness host was the next piece we designed. Officially, the witness must be in a distinct third site in order to properly detect failures. It can be either a full host or a virtual appliance residing outside of the Virtual SAN cluster. The cool thing is that if you use an appliance, it actually appears differently in the Web Client:

[Screenshot: witness appliance as it appears in the Web Client]

For the witness host in this case, we decided to use the witness appliance rather than a full host. This way, it could be migrated easily, because the networking to the third site was not yet set up. As a result, for the initial implementation while I was onsite, the witness was local to one of the sites, and would be migrated as soon as the networking was in place. This is definitely not a recommended configuration, but for testing—or for a non-production proof-of-concept—it does work. Keep in mind that a site failure may not be properly detected unless the cluster is properly configured.

With this, I conclude Part 1 of this blog series; hopefully, you have found this useful. Stay tuned for Part 2!


Jonathan McDonald is a Technical Solutions Architect for the Professional Services Engineering team. He currently specializes in developing architecture designs for core virtualization and software-defined storage, as well as providing best practices for upgrades and health checks for vSphere environments.

 

VMware Certificate Authority, Part 3: My Favorite New Feature of vSphere 6.0 – The New!

By Jonathan McDonald

In the last blog, I left off right after the architecture discussion. To be honest, this was not because I wanted to, but because I couldn't say anything more about it at the time. As of September 10, vSphere 6.0 Update 1 has been released with some fantastic new features in this area that make the configuration of customized certificates even easier. What is shown at this point is a tech preview; however, it shows the direction that development is headed. It is amazing when things just work out, and with a little bit of love, an incredibly complex area becomes much easier.

This release includes a UI for configuration of the Platform Services Controller. The new interface can be accessed by navigating to:

https://psc.domain.com/psc

When you first navigate here, a first-time setup screen may be shown:

[Screenshot: Platform Services Controller first-time setup]

To set up the configuration, log in with a Single Sign-On administrator account, and the actual setup will run and complete in short order. Subsequently, when you log in, the screen is plain and similar to the login of the vSphere Web Client:

[Screenshot: Platform Services Controller login screen]

After login, the interface appears as follows:

[Screenshot: the Platform Services Controller interface]

As you can see, it provides a ton of new and great functionality, including a GUI for installation of certificates! I will not be talking about the other features except to say there is some pretty fantastic content in there, including the single sign-on configuration, as well as appliance-specific configurations. I only expect this to grow in the future, but it is definitely amazing for a first start.

Let's dig into the certificate stuff.

Certificate Store

When navigating to the Certificate Store link, you can see all of the different certificate stores that exist on the VMware Certificate Authority system:

[Screenshot: the certificate stores]

This gives the option to view all the different stores on the system, as well as to view the details of each entry and to add or remove entries:

[Screenshot: certificate entry details]
This is very useful when troubleshooting a configuration or for auditing/validating the different certificates that are trusted on the system.

Certificate Authority

Next up: the Certificate Authority option, which shows a view similar to the following:

[Screenshot: the Certificate Authority view]

This area shows the active, revoked, and expired certificates, as well as the root certificate, for the VMware Certificate Authority. It also provides the option to show the details of each certificate for auditing or review purposes:

[Screenshot: certificate details]

In addition to providing a review, the Root Certificate tab allows you to replace the root certificate:

[Screenshot: the Root Certificate tab]

When you do, you are prompted to input the new certificate and private key:

[Screenshot: replacing the root certificate]

Once processed, the new certificate will show up in the list.

Certificate Management

Finally, and by far the most complex, is the Certificate Management screen. When you first click this, you will need to enter the Single Sign-On credentials for the server you want to connect to. In this case, it is the local Platform Services Controller:

[Screenshot: Certificate Management login]

Once logged in, the interface looks as follows:

[Screenshot: the Certificate Management interface]

Don't worry: the user or server selection is not a one-time thing, and it can be changed by clicking the logout button. This interface allows the machine certificates and solution user certificates to be viewed, renewed, and changed as appropriate.

If the Renew button is clicked, the certificate will be renewed from the VMware Certificate Authority:

[Screenshot: renewing a certificate]

Once complete, the following message is presented:

[Screenshot: renewal confirmation message]

Replacing a certificate is similar to the process of replacing the root certificate:

[Screenshot: replacing a certificate]

Remember that the root certificate must be valid (or replaced first), or the installation will fail. Finally, the last screenshot I will show is the Solution Users screen:

[Screenshot: the Solution Users screen]

The notable difference here is the Renew All button, which allows all of the solution user certificates to be renewed at once.

This new interface for certificates is the start of something amazing, and I can't wait to see the continued development in the future. Although it is still a tech preview, from my own testing it seems to work very well. Of course, my environment is a pretty clean one with little complexity; more complex environments can sometimes show unexpected results.

For further details on the exact steps you should take to replace the certificates (including all of the command-line steps, which are still available as per my last blog), see Replacing default certificates with CA signed SSL certificates in vSphere 6.0 (2111219).

I hope this blog series has been useful to you – it is definitely something I am passionate about, so I can write about it for hours! I will write next about my experiences at VMworld, and will hopefully help address the most common concerns I heard from customers while there.


Jonathan McDonald is a Technical Solutions Architect for the Professional Services Engineering team. He currently specializes in developing architecture designs for core virtualization and software-defined storage, as well as providing best practices for upgrades and health checks for vSphere environments.

 

VMware Certificate Authority, Part 2: My Favorite New Feature of vSphere 6.0 – The Architecture

By Jonathan McDonald

Picking up where Part 1 left off, I will now discuss the architecture decisions I have seen commonly used for the VMware Certificate Authority. This comes from many conversations with customers, VMware Professional Services, VMware Engineering, and even VMware Support. In addition to these sources, I recently participated in many conversations at VMworld, where I spoke at several sessions and manned the VMworld booth. I came away with a better appreciation of the complexities, and got to talk with some fantastic people about their environments.

Architecture of the Environment

Getting back to the conversation, the architecture of the environment is incredibly important to design up front. This allows an administrator to avoid much of the complexity while keeping the environment secure. That being said, from my current experience I have seen three different ways that environments are most frequently configured:

  • VMware Certificate Authority as the root CA in the default configuration
  • VMware Certificate Authority used but operating as a subordinate CA
  • Hybrid model using custom Machine SSL certificates, but using VMware Certificate Authority in its default configuration

Before we get into them, however, keep in mind that, as I mentioned in my previous blog series regarding the architecture changes for vSphere 6.0, there are two basic Platform Services Controller architectures that should be considered when designing your certificate infrastructure.

Be sure to note up front whether an external or embedded Platform Services Controller is to be used, as this is quite important: a separate Machine SSL endpoint certificate is required for each system.

This means that on an embedded system a single certificate is required as shown below:

[Figure: embedded Platform Services Controller with a single Machine SSL certificate]

Or, for an external environment, two or more will be needed depending on the size of the infrastructure, as can be seen in the following figure.

[Figure: external Platform Services Controller requiring two or more Machine SSL certificates]

For further details on the different platform services controller architectures, and to become familiar with them before proceeding, see http://blogs.vmware.com/consulting/2015/03/vsphere-datacenter-design-vcenter-architecture-changes-vsphere-6-0-part-1.html.

Using VMware Certificate Authority as the Root CA in the Default Configuration

This is by far the most common configuration I have seen deployed. Of course, it is also the default, which explains why. Personally, I fully support and recommend using this configuration in almost all circumstances. The beauty of it is that it takes very little configuration to be fully secured. Why change something if you do not need to, right? By default, after installation, everything is already configured to use VMware Certificate Authority: certificates have already been issued and deployed to all solutions, and to ESXi hosts as they are added to vCenter Server.

In this situation the only thing required to secure the environment is to download and install the root certificate (or chain) from VMware Certificate Authority.

[Screenshot: downloading the root certificate]

Note: When you download this file in vSphere 6.0, it is simply called ‘download.’ This is a known issue in some browsers. It is actually a ZIP file, which contains the root certificate chain.

Once downloaded and extracted, the certificate(s) are the file(s) ending in .0. To install, simply rename .0 to .cer and double-click to import into the Windows certificate store. Repeat the procedure for all certificates in the chain. The certificates should be installed into the Local Machine's Trusted Root Certification Authorities or Intermediate Certification Authorities stores, respectively. If using Firefox, import the certificates into its own store to ensure the chain is trusted.
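If there are many Windows clients, the same import can be scripted from an elevated command prompt; the file name below is an assumption based on the extracted download:

    rem rename the extracted root certificate and add it to the Local Machine Trusted Root store
    ren cacert.0 cacert.cer
    certutil -addstore -f Root cacert.cer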

The next time you go to the web page, it will show as trusted, as in the following screenshot.

[Screenshot: the site showing as trusted in the browser]

This can potentially take some time if there are many clients that need the certificates imported; however, this is the easiest (and default) deployment model.

Using VMware Certificate Authority as a Subordinate CA

This mode is less commonly used, but it is the second most common deployment type I have seen. It takes a bit of work, but essentially it allows you to integrate VMware Certificate Authority into an existing certificate infrastructure. The big benefit is that you will issue certificates fully signed within the existing hierarchy, and in many cases no installation of the certificate chain is required. The downside is that you will need a subordinate CA certificate to implement this configuration, and I have seen cases where issuing one is simply not allowed by policy. This is where the hybrid configuration comes into play, as discussed next.

To configure this, use the command-line utility called certificate-manager.

[Screenshot: the certificate-manager utility]
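For reference, on vSphere 6.0 the utility is found in the following locations (run it with administrative privileges):

    # on a Windows-based vCenter Server
    "C:\Program Files\VMware\vCenter Server\vmcad\certificate-manager"
    # on the vCenter Server Appliance
    /usr/lib/vmware-vmca/bin/certificate-manager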

Once launched, Option 2 (Replace VMCA Root Certificate with Custom Signing Certificate and replace all certificates) is used. The first part of the process is to generate the private key and the certificate request that will be submitted to the certificate authority. To do this, select Option 1:

[Screenshot: generating the certificate signing request]

Once generated, submit the request to the CA for issuance of the certificate, and collect the root certificate chain. For more details, see the KB article Configuring VMware vSphere 6.0 VMware Certificate Authority as a subordinate Certificate Authority (2112016).
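Before feeding the collected files back into certificate-manager, it is worth sanity-checking that the issued certificate actually chains to the root you collected; a minimal sketch with hypothetical file names:

    # verify that the issued subordinate CA certificate validates against the collected root chain
    openssl verify -CAfile root64.cer vmca_signing.cer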

When the new certificate has been collected, it is a matter of providing this new certificate, as well as the key, to the manager utility. If the certificate-manager screen is still open, select Option 1 to continue; otherwise, select Option 2 and then Option 2 again. You will be prompted to provide all the details for the certificate being issued. Once complete, services are stopped and restarted:

[Screenshot: services restarting after the root certificate is replaced]

After this, the machine certificate (aka the reverse proxy certificate) can be regenerated from the VMware Certificate Authority for the vCenter Server(s) in the environment. This is done by selecting Option 3 from the menu:

[Screenshot: Option 3, replacing the machine certificate]

This will prompt for a Single Sign-On administrator password, as well as the Platform Services Controller IP if the system is a distributed installation. It will then prompt you to enter the information for the certificate and restart the vCenter services. At this point, the server's reverse proxy certificate has been replaced.

The next step is to replace the solution user certificates by running Option 6 from certificate-manager:

[Screenshot: Option 6, replacing the solution user certificates]

This completes the configuration of the custom certificate authority certificates for the vCenter components, but we are not quite done yet.

The final step is to replace the ESXi host certificates. This is done directly from each host’s configuration in the Web Client. The first step here is to set the details for the certificate in the advanced settings for the vCenter Server:

[Screenshot: vCenter Server advanced settings for certificate management]
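The settings in question are the vpxd.certmgmt.certs.cn.* advanced options on the vCenter Server; the values shown here are placeholders for illustration:

    vpxd.certmgmt.certs.cn.country = US
    vpxd.certmgmt.certs.cn.state = California
    vpxd.certmgmt.certs.cn.localityName = Palo Alto
    vpxd.certmgmt.certs.cn.organizationName = Example Corp
    vpxd.certmgmt.certs.cn.organizationalUnitName = IT
    vpxd.certmgmt.certs.cn.email = admin@example.com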

When complete, navigate to an ESXi host and select Manage > Certificates. On this screen, click Renew. This will regenerate the certificate for this host.

[Screenshot: renewing an ESXi host certificate]

In any case, this is the most complex of the architectures shown, but it is also the most integrated with the environment. It provides an additional level of security that may be required to satisfy regulatory compliance requirements.

Using Hybrid Configurations for the Reverse Proxy – but VMware Certificate Authority for All Other Certificates

Hybrid configurations are the middle ground in this discussion. They strike a balance between security and complexity, and in many cases satisfy the security team as well. The biggest issues I have seen that require such a configuration are:

  • Granting or purchasing a Subordinate CA certificate is not possible due to policy or cost.
  • Issuing multiple certificates to the same server, one for each service, is not something that can be done due to policy or regulatory requirements.
  • Controls to approve or deny certificate requests are required.

In these cases, although it may not be possible to fully integrate the VMware Certificate Authority into the existing CA hierarchy, it is still possible to provide added levels of security by leaving the VMware Certificate Authority in its default configuration and replacing the Machine certificate only. You can do this by using Option 1 from the certificate-manager tool.

[Figure: JMcDonald 12]

When configured, the environment will use the corporate CA certificate for external, user-facing communication, because everything now goes through the reverse proxy. Behind the proxy, components are still secured by certificates from the VMware Certificate Authority. The configuration should look like this:

[Figure: JMcDonald 13]
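You can confirm which certificate the reverse proxy presents externally without leaving your desk. A small Python sketch (the host name is a placeholder, and the cryptography package is a third-party dependency):

```python
# Inspect the certificate the reverse proxy presents on port 443.
# In a hybrid configuration the issuer should be the corporate CA,
# while internal solution users remain on VMCA-issued certificates.
import ssl

from cryptography import x509

pem = ssl.get_server_certificate(("vcenter.example.local", 443))  # placeholder host
cert = x509.load_pem_x509_certificate(pem.encode())

print("Subject:", cert.subject.rfc4514_string())
print("Issuer: ", cert.issuer.rfc4514_string())
print("Expires:", cert.not_valid_after)
```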

As you can see, the architectures are varied and provide quite a bit of configuration flexibility.

Which Configuration Method Should You Use?

The only remaining question is: which method is best for you? As stated in a previous section, my personal preference is to follow the keep-it-simple methodology and use the default configuration. It is the simplest option; the only requirement is to install the root certificate on clients, rather than regenerating custom certificates and then modifying the configuration.
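For that default approach, the trusted root bundle can be pulled straight from the vCenter landing page and distributed to clients. A hedged sketch in Python; the /certs/download path is my recollection of the "Download trusted root CA certificates" link in vSphere 6.x, so verify it against your build, and the host name is a placeholder:

```python
# Fetch the VMCA trusted-root bundle so it can be installed into the
# client trust store. The URL path is an assumption based on the vSphere
# 6.x landing-page link; adjust if your build differs.
import ssl
import urllib.request

url = "https://vcenter.example.local/certs/download"  # placeholder host, assumed path
ctx = ssl._create_unverified_context()  # the root is not trusted yet; that is the point

with urllib.request.urlopen(url, context=ctx) as resp:
    with open("vmca_roots.zip", "wb") as f:
        f.write(resp.read())

print("Saved vmca_roots.zip; install its contents into the client trust store")
```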

Obviously, where policy or regulatory compliance is concerned, there may be a need to integrate with an existing certificate infrastructure. Although more complex than doing nothing, this is much easier than it was in prior versions.

Hopefully all of this information is useful to you when designing or configuring your vSphere environments. As you can see, the complexity has been dramatically reduced, and it only promises to get better. I can't wait to complete the next blog in this series, part 3, which will provide even more detail to make all of this even simpler.


 

Jonathan McDonald is a Technical Solutions Architect for the Professional Services Engineering team. He currently specializes in developing architecture designs for core Virtualization, and Software-Defined Storage, as well as providing best practices for upgrading and health checks for vSphere environments.

VMware Certificate Authority – My Favorite New Feature of vSphere 6.0 – Part 1 – The History

By Jonathan McDonald

Anyone who knows me knows I have a particular passion (however misplaced that may be…) for the VMware Certificate Story. This multi-part blog series will discuss a bit of background on the certificate story, what you need to know about its architectural design in an environment, and some new features you may not know about.

Let me start off by saying this passion began several years ago when I was in Global Support Services and realized that too few people had an understanding of certificates. I am not even talking about certificates in the context of VMware, but in general. This was compounded when we released vSphere 5.1, because strict certificate checking was enabled.

A Bit of History

Although certificates were used for securing communication prior to vSphere 5.1, they were self-signed, and no verification was performed to ensure a certificate was valid. The certificate could, for example, be expired, or be used for multiple services at the same time (such as the vCenter Server service and the vCenter Inventory Service). This is obviously not good practice, but it was nevertheless allowed.

When vCenter Single Sign-On was released with vSphere 5.1, it enforced strict certificate checking. This covered not only certificate uniqueness, but also information such as the validity period of each certificate. If any component was not using a unique and valid certificate, it would not be accepted when the different services were registered as solutions in Single Sign-On. This turned out to be a pretty large issue, as upgrades would fail with very little detail as to why.

That being said, if all services in vCenter 5.1 and 5.5 have their certificate replaced, seven unique certificates are required:

  • vCenter Single Sign-On
  • vCenter Inventory Service
  • vCenter Server
  • vSphere Web Client
  • vSphere Web Client Log Browser
  • vCenter Update Manager
  • vCenter Orchestrator

The process to change the certificates was not straightforward and caused a significant amount of trouble for customers and Global Support Services alike. This is when we raised it as a concern internally and helped get a short-, medium-, and long-term plan in place to make it easier to replace certificates when required. The plan was as follows:

  • Short term – We ensured the KB articles relating to certificate replacement were accurate and easy to follow.
  • Medium term – We helped in the development of the SSL Certificate Automation Tool, which dramatically reduced the number of steps and made it fairly easy to replace the certificates.
  • Long term – We forced focus on the issue so a solution could be built into the product.

Before I moved from VMware Support to Professional Services Engineering, we had released the tool, and the larger plan was in place. Here are two blog posts I wrote about the tool:

http://blogs.vmware.com/kb/2013/04/introducing-the-vcenter-certificate-automation-tool-1-0.html

http://blogs.vmware.com/kb/2013/05/ssl-certificate-automation-tool-version-1-0-1.html

With vSphere 6.0, the long-term solution finally comes to fruition with the introduction of the VMware Certificate Authority, which solves many of the problems seen in earlier releases.

Introduction to the VMware Certificate Authority

With vSphere 6.0, the base product installs an internal certificate authority (CA) called the VMware Certificate Authority. It is part of the Platform Services Controller installation and changes the architecture significantly for the better. The default certificates are no longer self-signed; rather, they are issued and signed by the VMware Certificate Authority.

This works in one of two ways:

  • VMware Certificate Authority acts as the root certificate authority. This is the default configuration and allows for an out-of-the-box configuration that is fully signed. All clients need to do is trust the root certificate, and communication is fully trusted.
  • VMware Certificate Authority acts as an Intermediate CA, integrating into an existing CA infrastructure in the environment. This allows for certificates to be issued that are already trusted throughout the environment.

In both modes it acts the same way, granting certificates not only to the solutions connected to the management infrastructure, but to ESXi hosts as well. This occurs when the solution or host is added to vCenter Server. By default, communication is secure and trusted, and therefore everything on the management network that was previously difficult to secure is now trusted.

Introduction to the VMware Endpoint Certificate Store

In addition to the certificate authority itself, vSphere 6 certificates are now managed and stored in a "wallet" called the VMware Endpoint Certificate Store (VECS). The benefit is that certificates and private keys are no longer stored on disk in various locations; they are centrally managed in VECS on every vSphere node. This greatly simplifies configuration of the environment, because you no longer need to update trusts when certificates are replaced; VECS does it automatically.

VECS is installed as part of every Platform Services Controller installation, in both embedded and external configurations.

[Figure: JMcDonald Certificate Authority 1]

The following certificate stores are used:

  • The Machine Certificates store contains the Machine SSL Certificate and private key, which is used for the Reverse Proxy, discussed next.
  • The Root CA Certificates store contains trusted root certificates and revocation lists, from any VMware Certificate Authority in the environment, or the third-party certificate authority being used. Solutions use this store to verify certificates.
  • The Solution User Certificates store contains the certificates and private keys for any solutions such as vCenter and the vSphere Web Client.

A single location for all certificates is a welcome change to the previous versions.
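If you want to look inside the wallet yourself, the bundled vecs-cli utility enumerates the stores and their entries. Here is a small Python wrapper as a sketch; the binary path shown is the appliance default, so treat the path and the output handling as assumptions for your particular install:

```python
# Enumerate VECS stores and their entries with the bundled vecs-cli tool.
# The binary path is the appliance default; Windows installs differ.
import subprocess

VECS_CLI = "/usr/lib/vmware-vmafd/bin/vecs-cli"  # assumed appliance path

stores = subprocess.run([VECS_CLI, "store", "list"],
                        capture_output=True, text=True, check=True)
for store in stores.stdout.split():
    print("==", store)
    entries = subprocess.run([VECS_CLI, "entry", "list", "--store", store, "--text"],
                             capture_output=True, text=True)
    print(entries.stdout[:400])  # first part of each entry listing
```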

The Reverse Proxy – (Machine SSL Certificate)

Finally, before we get into the recommended architectures, the Reverse Proxy is the last thing I want to introduce. The change here addresses one of the biggest problems seen in previous versions of vCenter: the number of installed services that require SSL communication. To be honest, the real challenge is not the number of services, but rather getting signed certificates for all of them from the SSL administrator for the same host.

To combat this, solution users were consolidated in vCenter 6.0 to four: vpxd, vpxd-extension, vsphere-webclient, and machine. Where possible, the various listening ports on the vCenter Server have been replaced with a single reverse web proxy, which uses the newly created Machine SSL Certificate to secure communication. All communication to the different services is routed through the reverse proxy to the appropriate service based on the type of request. This can be seen in the figure below.

[Figure: JMcDonald Certificate Authority 2]

It is still possible to change the certificates of the solution users behind it; however, these are only used internally and do not necessarily need to be changed. More on this in the next part of this series.

With all of this background detail out of the way, I think it is a good place to pause. Part 2 of this article will discuss the architecture decision points and some configuration details. On another note, I am actually going to be off to VMworld very soon and will be manning the VMware booth, as well as speaking at several sessions! It is unlikely I will get part 2 done before then, but if you have any questions look for me (downstairs at the Solution Exchange, in the VMware Booth, at the Technology Advisor station), and we can talk more on this and other topics with the other members of my team!


Jonathan McDonald is a Technical Solutions Architect for the Professional Services Engineering team. He currently specializes in developing architecture designs for core Virtualization, and Software-Defined Storage, as well as providing best practices for upgrading and health checks for vSphere environments.

VMworld Preview: Just Because You COULD, Doesn’t Mean You SHOULD – VMware vSphere 6.0 Architecture

By Jonathan McDonald

Have you noticed the ever-changing VMware vSphere architecture with the introduction of new services and technologies? If you answered yes, you already know that the architectural configuration details of an environment are critically important to the Software-Defined Data Center (SDDC). The foundation of the SDDC starts with vSphere; architected correctly, it will be much more than a platform for the environment.

At VMworld in San Francisco I will discuss lessons learned from our VMware Professional Services team. This discussion will bring real-world experience to light so that common issues can be addressed prior to the deployment of the solution, rather than after the fact.

Here is an example of what we will dive into: There are different architectures for the Platform Services Controller, from an embedded node to a maximum-sized configuration, as shown in the figure below.

[Figure: JMcDonald VMworld Blog]

To use Enhanced Linked Mode, however, it is important to understand which architectures are correct and supported, so that a design can be configured accordingly. This ensures the chances of failure are minimized from the beginning.

To learn more, attend my session, INF4712, on Wednesday, September 2, from 8:00 to 9:00 AM, or on Thursday, September 3, at 1:30 PM.


Jonathan McDonald is a Technical Solutions Architect for the Professional Services Engineering team. He currently specializes in developing architecture designs for core Virtualization, and Software-Defined Storage, as well as providing best practices for upgrading and health checks for vSphere environments.

vSphere Datacenter Design – vCenter Architecture Changes in vSphere 6.0 – Part 2

By Jonathan McDonald

In Part 1, the different deployment modes for vCenter and Enhanced Linked Mode were discussed. In Part 2 we finish this discussion by addressing the different platforms, high availability, and recommended deployment configurations for vCenter.

Mixed Platforms

Prior to vSphere 6.0, there was no interoperability between vCenter for Windows and the Linux-based vCenter Server Appliance. After a platform was chosen, a full reinstall was required to change to the other. The vCenter Server Appliance was also limited in features and functionality.

With vSphere 6.0, the two are functionally the same, and all features are available in either deployment mode. With Enhanced Linked Mode, the two versions of vCenter are interchangeable, which allows you to mix vCenter for Windows and vCenter Server Appliance configurations.

The following is an example of a mixed platform environment:

[Figure: JMcDonald pt 2 (1)]

This mixed platform environment provides flexibility that has never been possible with the vCenter Platform.

As with any environment, the way it is configured is based on the size of the environment (including expected growth) and the need for high availability. These factors will generally dictate the best configuration for the Platform Services Controller (PSC).

High Availability

Providing high availability protection to the Platform Services Controller adds an additional level of overhead to the configuration. When using an embedded Platform Services Controller, protection is provided in the same way that vCenter is protected, as it is all a part of the same system.

Availability of vCenter is critical due to the number of solutions requiring continuous connectivity, as well as to ensure the environment can be managed at all times. Whether it is a standalone vCenter Server, or embedded with the Platform Services Controller, it should run in a highly available configuration to avoid extended periods of downtime.

Several methods can be used to provide high availability for the vCenter Server system. The decision depends on how much downtime can be tolerated, whether automated failover is required, and whether budget is available for additional software components.

The following table lists methods available for protecting the vCenter Server system and the vCenter Server Appliance when running in embedded mode.

Redundancy Method                                                         Protects vCenter Server system?   Protects vCenter Server Appliance?
Automated protection using vSphere HA                                     Yes                               Yes
Manual configuration and manual failover (for example, a cold standby)    Yes                               Yes
Automated protection using Microsoft Clustering Services (MSCS)           Yes                               No

If high availability is required for an external Platform Services Controller, protection is provided by adding a secondary backup Platform Services Controller, and placing them both behind a load balancer.

The load balancer must support Multiple TCP Port Balancing, HTTPS Load Balancing, and Sticky Sessions. VMware has tested several load balancers, including F5 and NetScaler; however, it does not directly support these products. See the vendor documentation for configuration details for any load balancer used.

Here is an example of this configuration using a primary and a backup node.

[Figure: JMcDonald pt 2 (2)]

With vCenter 6.0, connectivity to the Platform Services Controller is stateful, and the load balancer is used only for its failover capability. Active-active connectivity to both nodes at the same time is not recommended; it risks corrupting the data between nodes.

Note: Although it is possible to have more than one backup node, it is normally a waste of resources and adds complexity to the configuration for little gain. Unless more than a single node is expected to fail at the same time, there is very little benefit to configuring a tertiary backup node.

Scalability Limitations

Before deciding on a vCenter configuration, consider the following scalability limits for the different configurations; they can have an impact on the end design.

Scalability                                                                Maximum
Number of Platform Services Controllers per domain                         8
Maximum PSCs per vSphere site, behind a single load balancer               4
Maximum objects within a vSphere domain (users, groups, solution users)    1,000,000
Maximum number of VMware solutions connected to a single PSC               4
Maximum number of VMware products/solutions per vSphere domain             10

Deployment Recommendations

Now that you understand the basic configuration details for vCenter and the Platform Services Controller, you can put it all together in an architecture design. The choice of a deployment architecture can be a complex task depending on the size of the environment.

The following are some recommendations for deployment. But please note that VMware recommends virtualizing all the vCenter components because you gain the benefits of vSphere features such as VMware HA. These recommendations are provided for virtualized systems; physical systems need to be protected appropriately.

  • For sites that will not use Enhanced Linked Mode, use an embedded Platform Services Controller.
    • This provides simplicity in the environment, including a single pane-of-glass view of all servers, while at the same time reducing the administrative overhead of configuring the environment for availability.
    • High availability is provided by VMware HA. The failure domain is limited to a single vCenter Server, as there is no dependency on external component connectivity to the Platform Services Controller.
  • For sites that will use Enhanced Linked Mode, use external Platform Services Controllers.
    • This configuration uses external Platform Services Controllers and load balancers (recommended for high availability). The number of controllers depends on the size of the environment, as summarized in the sketch after this list:
      • If there are two to four VMware solutions – You will only need a single Platform Services Controller if the configuration is not designed for high availability; two Platform Services Controllers behind a single load balancer will be required for high availability.
      • If there are four to eight VMware solutions – Two Platform Services Controllers must be linked together if the configuration is not designed for high availability; four will be required for a high-availability configuration behind two load balancers (two behind each load balancer).
      • If there are eight to ten VMware solutions – Three Platform Services Controllers must be linked together if the configuration is not designed for high availability; six will be required for high availability, configured behind three load balancers (two behind each load balancer).
    • High availability is provided by having multiple Platform Services Controllers behind a load balancer for failure protection. In addition, all components are still protected by VMware HA. This limits the failure implications of a single Platform Services Controller, assuming they are running on different ESXi hosts.
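To make the pattern in those bullets concrete, here is a toy Python helper encoding the same sizing guidance. The function name and the (PSC count, load balancer count) return shape are my own; the numbers come straight from the recommendations above:

```python
# Toy helper encoding the PSC sizing guidance above; not a VMware tool.
def psc_count(solutions: int, ha: bool) -> tuple:
    """Return (platform_services_controllers, load_balancers)."""
    if not 1 <= solutions <= 10:
        raise ValueError("a vSphere domain supports at most 10 solutions")
    if solutions <= 4:
        return (2, 1) if ha else (1, 0)
    if solutions <= 8:
        return (4, 2) if ha else (2, 0)
    return (6, 3) if ha else (3, 0)

print(psc_count(6, ha=True))    # -> (4, 2): four PSCs behind two load balancers
print(psc_count(3, ha=False))   # -> (1, 0): a single PSC, no load balancer
```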

With these deployment recommendations in hand, hopefully the process of choosing a design for vCenter and the Platform Services Controller will be dramatically simplified.

This concludes this blog series. I hope this information has been useful and that it demystifies the new vCenter architecture.

 


Jonathan McDonald is a Technical Solutions Architect for the Professional Services Engineering team. He currently specializes in developing architecture designs for core Virtualization, and Software-Defined Storage, as well as providing best practices for upgrading and health checks for vSphere environments.