
VMware Cloud Foundation: Digging Deeper into the Architecture

VMware Cloud Foundation Overview
This has been an exciting time for the IT industry. At VMworld US 2016 (August 29th, 2016) we had the announcement of VMware Cloud Foundation becoming an integral part of IBM SoftLayer, and then we had the news of the strategic partnership between VMware and Amazon Web Services (AWS) (October 13th, 2016). VMware Cloud Foundation is a shift in cloud infrastructure that enables the Software-Defined Data Center (SDDC). This is significant because what we know as the SDDC, with technology such as VMware Horizon, NSX and Virtual SAN, can now be consumed and offered by service providers in a unique way.

At the core are SDDC Manager and its lifecycle management (LCM) capabilities, which allow fully automated deployment, configuration, patching and upgrades. But what does the architecture look like behind VMware Cloud Foundation? Let’s take a closer look. Continue reading

Automated Deployments of vRealize Automation for vCloud Air Network

In the previous blog post “Leveraging vRealize CloudClient with vRealize Automation deployments for vCAN”, we explored the use of VMware vRealize® CloudClient for the automated configuration of VMware vRealize Automation™ on a per-tenant basis to speed up the deployment of per-tenant instances in a service provider environment. This method relied on a manual installation of the vRealize Automation infrastructure components. However, the release of vRealize Automation 7.1 provides built-in silent installation capabilities that further reduce the time to value of vRealize Automation deployments.


Overview of vRealize Automation for SPs

While vRealize Automation is typically implemented in enterprise private cloud environments, service providers also have an interest in providing services based on vRealize Automation to customers on a per-tenant basis, as well as using it for the management of their internal infrastructure. Customers benefit from this by experiencing an expedited time to value, while also being able to offload the maintenance and management overhead of the private cloud infrastructure to a trusted VMware vCloud® Air™ Network service provider of their choice. Some of the common deployment models that service providers use for vRealize Automation are:

  • Internal Operations – Single tenant deployment of vRealize Automation by the service provider for internal operations users.
  • Dedicated Customer Private Cloud – Single tenant deployment of vRealize Automation with the optional use of multiple business groups. Customer manages user access and catalog content.
  • Fully Managed Service Offering – Service offering that leverages multiple business groups and/or tenants and is managed fully by the vCloud Air Network service provider on behalf of the customer.

At a platform level, each of these models enables the consumption of single and multiple data centers provided by the service provider, while the Dedicated Private Cloud and the Managed Service offering provide customers the capability to consume on-premises compute resources.

Continue reading

VMware Horizon Client (PCoIP & Blast) Connection Workflow

Since I published the Horizon 7 Network Ports diagram with the latest release of Horizon 7, I’ve been frequently asked about the connection flow between the Horizon Client and the virtual desktop. VMware Horizon supports RDP, PCoIP and now Blast Extreme. I’ll start with PCoIP and then we’ll look at Blast Extreme.

The connection flow of the Horizon Client is largely the same with Horizon 7, Horizon Air or Horizon DaaS. There may be differences in external load-balancing, Security Server or Access Point, and external URL configuration, but for this post I’ll focus on the Horizon Client itself and the aforementioned protocols.

A colleague asked me a very good question that I’d also like to address: how does Access Point know which VM to connect to?

Access Point doesn’t need to know which ESXi host is running the VM. When the list of entitled desktops is returned to the client (see 1b below), the client also receives the external URL of the Access Point appliance; this is where the Horizon Client > Access Point connection is established over HTTPS (TCP 443). This could be a VIP on the load balancer, or an externally facing IP for each of the Access Point appliances, depending on the configuration (see Method 3 of Mark’s article).

When the user launches the chosen desktop pool, Access Point communicates with the Connection Server over HTTPS (TCP 443) to receive the desktop IP address. The role of the PCoIP Gateway on the Access Point appliance is then to forward the PCoIP connection to the IP address of the Horizon Agent.

Note: In the past, Security Server used JMS, IPsec and AJP13, but Access Point doesn’t use these protocols (JMS is still used on the Connection Servers). If you refer to my Horizon 7 Network Ports diagram, you’ll see I’ve drawn these with dotted lines to show this.

Tunneled Connections (PCoIP)

VMware Horizon PCoIP Connection Flow
Continue reading

Enterprise Application Migration Technologies – Finding the Right Fit


When looking at the adoption of public or hybrid cloud, one of the primary considerations must be how to migrate existing workloads to the target platform. Choosing the right migration tool(s) will prove critical when coaching customers, mainly their IT and application owners, through this challenge. There are many VMware vCloud® Air™ Network architectures that can provide workload mobility where capabilities, like hybrid cloud networking enabled by VMware NSX®, and other solutions, such as VMware Site Recovery Manager™, might be in place. Enterprise migration technologies, however, span a much broader scope than moving applications hosted on physical or virtual infrastructure to a cloud architecture. Specifically, these tools address the enterprise architecture features required to discover, plan, and execute a migration, while allowing for scheduling and system-level dependencies.

VMware offers tools that address many of these needs, and some have been described in the VMware vCloud Architecture Toolkit™ for Service Providers (vCAT-SP) blog and white paper. As stated in the vCAT-SP documentation for migration, no single offering will meet all requirements for migrating workloads to the cloud, and the purpose of this series of blogs is to allow VMware Technology Partners to discuss their solutions and advocate for why they might be the best choice in many situations. Many standard forms of analysis will apply to the evaluation of enterprise migration technologies, including common items such as pricing, support, or strategic direction. This series of blogs will focus on the more technical aspects, such as ease of deployment and usage, versatility, reliability, scalability, and security. The blog entries will also cover optimal use cases addressed by the partner solutions, often with customer references.

The first blog in this series is with VMware Technology Partner ATADATA, and in particular their enterprise migration solution built around the ATAvision and ATAmotion products. The combination of these two offerings fits into the “Discover & Assess, Job Scheduling, Workload Migration, Application Verification” lifecycle described in the blog and vCAT-SP documentation referenced above. The first three letters of the ATADATA name stand for “any to any”, and their deployment model, shown in the following figure, indicates their abstraction from the underlying physical, virtual, or cloud infrastructures that are part of an enterprise migration. This capability enables their technology not only to support many platforms (see ATADATA supported platforms), but also to provide a consistent abstraction of underlying details for migrating between sources and targets of any supported type.
Continue reading

vRealize Automation Configuration with CloudClient for vCloud Air Network

As a number of vCloud Air Network service providers start to enhance their existing hosting offerings, VMware is seeing demand from service providers to offer a dedicated vRealize Automation implementation to their end customers, enabling them to offer application services, heterogeneous cloud management and provisioning in a self-managed model.

This blog post details an implementation option where the vCloud Air Network service provider can offer “vRealize Automation as a Service” hosted in a vCloud Director vApp, with some additional automated configuration. This allows the service provider to offer vRealize Automation to their customers based out of their existing multi-tenancy IaaS platforms and achieve high levels of efficiency and economies of scale.

“vRealize Automation as a Service”

During a recent proof of concept demonstrating such a configuration, a vCloud Director organization virtual data center (Org vDC) was configured for tenant consumption. Within this Org vDC, a vApp containing a simple installation of vRealize Automation was deployed, consisting of a vRealize Automation appliance and one Windows Server hosting the IaaS components and an instance of Microsoft SQL Server. With vRealize Automation successfully deployed, the instance was customized by leveraging vRealize CloudClient via Microsoft PowerShell scripts. Using this method for configuring the tenant within vRealize Automation reduced the deployment time for vRealize Automation instances, while ensuring that the vRealize Automation tenant configuration was consistent and conformed to the pre-determined naming standards and conventions required by the provider.

vRaaS vCAN Operations
Continue reading

Deep Dive Architecture Comparison of DaaS & VDI, Part 2

In part 1 of this blog series, I discussed the Horizon 7 architecture and a typical single-tenant deployment using Pods and Blocks. In this post I will discuss the Horizon DaaS platform architecture and how this offers massive scale for multiple tenants in a service provider environment.

Horizon DaaS Architecture

The fundamental difference with the Horizon DaaS platform is its multi-tenancy architecture. There are no Connection or Security Servers, but there are some commonalities. I mentioned Access Point previously; this was originally developed for Horizon Air, and is now a key component of both Horizon 7 and DaaS for remote access.


Horizon DaaS Architecture

If you take a look at the diagram above you’ll see these key differences. Let’s start with the management appliances.
Continue reading

Deep Dive Architecture Comparison of DaaS & VDI, Part 1

In this two-part blog series, I introduce the architecture behind Horizon DaaS and the recently announced Horizon 7. From a service provider point of view, the Horizon® family of products offers massive scale for both single-tenant deployments and multi-tenant service offerings.

Many of you are very familiar with the term Virtual Desktop Infrastructure (VDI), but I don’t think the term does any justice to the evolution of the virtual desktop. VDI can have very different meanings depending on who you are talking to. Back in 2007 when VMware acquired Propero, which soon became VDM (then View and Horizon), VDI was very much about brokering virtual machines running a desktop OS to end-users using a remote display protocol. Almost a decade later, VMware Horizon is vastly different and it has matured into an enterprise desktop and application delivery platform for any device. Really… Horizon 7 is the ultimate supercar of VDI compared to what it was a decade ago.

I’ve read articles that compare VDI to DaaS, but they all seem to skip this evolution of VDI and compare it to the traditional desktop broker of the past. DaaS, on the other hand, provides the platform of choice for service providers offering Desktops as a Service. The DaaS platform (formerly Desktone) was acquired by VMware in October 2013. In fact, I remember the day of the announcement because I was working on a large VMware Horizon deployment for a service provider at the time.

For this blog post I’d like to start by comparing the fundamental architecture of the Horizon DaaS platform with that of Horizon 7, which was announced in February 2016. This article is aimed at consultants and architects wishing to learn more about the DaaS platform.
Continue reading

Managed Security Services Maturity Model for vCloud Air Network Service Providers


We’ve all heard about the many successful cyber-attacks carried out in various industries. Rather than cite a few examples to establish background, I would encourage you to review Verizon’s annual report, the Data Breach Digest. This report gives critical insight for understanding how the most pervasive attacks are executed and what to protect against to impede or prevent them. In order to provide a sound architecture and operational model for this purpose of protection, let’s look at some universal principles that have emerged as a result of forensics from these events. Those principles are time and space. Space, in this case, is cyberspace and involves the moving digital components of the target systems that must be compromised to execute a successful attack. Time involves events that may occur at network or CPU speed, but it is the ability to trap those events and put them into a human context, in terms of minutes, hours, or days, that allows security operations to respond. The combination of unprotected attack vectors, already compromised components of the system, and the inability to spot them creates what are known as “blind spots” and “dwell time”, where an attacker can harvest additional information and potentially expand to other attack vectors.

While all of that is hopefully easy to understand, we have to face the reality that many attacks still occur through compromised credentials obtained by social engineering. These credentials provide enough privilege to establish a foothold for the command and control used in a cyber-attack. For this reason, we want to employ one of the core principles of the Managed Security Services Maturity Model, known as Zero Trust, or the idea that every action must have specific authentication, authorization and accounting (AAA) defined. By subscribing to this maturity model as a VMware vCloud® Air™ Network service provider, you will uncover ways in which you can leverage features, such as VMware NSX® Distributed Firewall and micro-segmentation, putting you well on the road to offering services that can help customers address potential blind spots and reduce dwell time, thereby taking control and ownership of their cyber risk posture. No matter how nefarious a rogue entry into target systems is, or what escalated privilege was acquired, the Managed Security Services Maturity Model will limit the kind of lateral movement necessary to conduct consistent ongoing attacks, or what is known as an advanced persistent threat (APT). Although not all occurrences are APTs, by understanding the methods used in these most advanced attacks, we can isolate and protect the aspects of the system required to execute a “kill chain,” which essentially allows ownership of a system in undetectable ways.

Managed Security Services Maturity Model

Cyber security, in its entirety, is a vast concept that cannot be done justice by a small set of blog articles and white papers. However, given the expansive nature of cyber-threats in this day and age, along with the ratio of successful attacks, information technology needs to continually seek out new approaches. One approach is to create as much of an IT environment as possible from known patterns and templates of installed technologies that can be deployed with a high fidelity of audit information to measure their collective effectiveness against cyber-threats. This turns on its head the idea of protecting environments against an exponentially exploding number of threats with greater diversity in the areas frequently attacked, and instead refines deployed environments to accept only activities that are well defined, with results that are well understood. Simply put, measure what you can trust. If it can’t be measured, it can’t be trusted.

Once again, this approach touches on a large concept, but it is finite in nature in that its definition seeks to gain the control needed to deliver sustainable security operations for customers. To further illustrate this point, let’s think about what a control is and what the maturity model affords the operator in pursuit of their target vision. First is the idea of a “control”, which, simply put in cyber security terms, means defining a behavior that can be measured. This could be architecture patterns expected from the provider layer, such as data privacy or geo-location, or automation and orchestration of security operations. Second is the maturity model itself, which has prerequisites for executing on specific rungs of the model, along with providing operational and security benefits. One output of each rung of the maturity model is the potential set of services to be offered to aid in the completion of the customer’s target cyber security vision.

Enter the Managed Security Services Maturity Model, which encodes the methodology for capturing each customer’s ideal approach and provides five different maturity “layers” that aid vCloud Air Network service providers in delivering highly secure hybrid cloud environments. Looking at Figure 1, we can see that the ideas of time and “geometry” (networks and boundaries we have defined), along with the provider (below the horizontal blue line) and consumer (operating system and application runtimes) layers, provide us the cyber dimensions we seek to define and measure.


Figure 1. Managed Security Services Maturity Model

Like most capability maturity models, when starting from the bottom we can often borrow attributes and patterns for services from the layers above. Generally, however, we need to accomplish the prerequisites for the upper layers (Orchestrated and above) to truly be considered operating at that layer. Often, there are issues of completeness where we must perform these prerequisite tasks a number of times in the design of our architecture and operations to have mobility to upper levels. For instance, to complete the Automated level, you should plan to automate on the order of about a dozen elements, although your mileage may vary.

You may find more work to be done moving up the levels as you determine the right composition and critical mass of controls appropriate to deliver for targeted customer profiles. In the case of our maturity model, we will bind several concepts at each level to ultimately achieve the Zen-like “Advanced” layer 5, where we truly realize the completeness of the vision to own cyber security for our customers. A big responsibility to be sure, but perhaps a bigger opportunity to change the game from the status quo. The offering of managed services composed of facets from all levels is not for everyone but there is plenty of room to add value from all layers.

We have defined the following layers for the Managed Security Services Maturity Model:

  1. Basic

At this level, we introduce VMware NSX, VXLAN, and the Distributed Firewall to the hybrid cloud environment. This allows us to create controlled boundaries and security policies that can be applied in an application-centric fashion, resulting in focused operating contexts for security operations.

  2. Automated

At this level, we want to automate the behavior of the system with regard to controls. This will prompt security operations with events generated by discrete controls and their performance against established measurements or tolerances. The goal is to automate as many controls as possible in order to become Orchestrated.

  3. Orchestrated

After we have many controls automated, we want to make them recombinant in ways that allow for controlling the space, or the “geometry”, along with coordinating events, information, automated reactions, and so on, which will allow us to drive down response times. These combinations will result in “playbooks,” or collections of controls assembled in patterns that are used to combat cyber threats.

  4. Lifecycle

Taking on full lifecycle responsibility means just that. We might monitor in-guest security aspects like anti-virus/malware or vulnerability scanning in discrete, automated, and even orchestrated ways at previous levels. This level, however, is about actually taking ownership of operating systems and perhaps even application runtimes within the customer virtual machines. By extending managed services to include what is inside the virtual machines themselves, it is possible to take ownership of all facets of cyber security regarding applications in the hybrid cloud.

  5. Advanced

At the Advanced level, we must be able to leverage all previous levels in such a way that managed services can be deployed to remediate a cyber-threat or execute on a risk management plan to help address security issues of all types. Additionally, we want our resulting cyber toolkit derived from the maturity model to become portable, in appliance form, where managed security services can be delivered anywhere in the hybrid cloud network.

In the upcoming series of blog posts describing VMware vCloud Architecture Toolkit for Service Providers (vCAT-SP) reference architecture design blueprints and use cases for each maturity level, vCloud Air Network service providers can help customers visualize what it will take to both architect and operate managed security services used to augment the hybrid cloud delivery model.

Eliminating Blind Spots and Reducing Dwell Time

The cyber defense strategies that are devised based on achieving levels of the maturity model focus on defining individual elements within the system. Management user interfaces, ports, session authentication, as well as virtual machine file systems, network communications, and so on, should be defined to allow alignment of controls. In addition, the provisioning of networks between the resources that consume services and those that provide them, such as management components like VMware vCloud Director® or VMware vCenter™, DNS, or Active Directory, and the logging of network components (including those that serve end-user applications to their communities), should also occur in as highly automated a fashion as possible.

In this way, human-centric, error-prone activities can be eliminated from consideration as potential vulnerabilities, although automated detection of threats by discrete components across cyber dimensions is still expected. A high-level example of how we expect these discrete, automated controls to behave is described by Gartner, which defines the concept of a “cloud security gateway” as “the ability to interject enterprise security policies as the cloud-based resources are accessed”. By defining controls for system elements and their groupings in this way, we can form a fully identified inventory of what is being managed, by whom, and where it resides. Likewise, by understanding and quantifying the controls in the system that are applied collectively to these elements, we can begin to measure and score their effectiveness. This harmonization is critical to delivering consistency in the enforcement mechanisms we rely on across both sides of the hybrid cloud, creating the foundation of trust.

Despite our efforts to inventory all elements within systems, attacks will still arrive from the outside world in the user portions of the application stack, for example, through SQL injection or using cross-site scripting techniques. The threat of compromised insider privileged users will still be present as will “social engineering” methods of obtaining passwords. However, the “escape” of a rogue, privileged user to a realm from which they can continue their attack has been minimized. We have taken the elements of time and space and defined them to our advantage, creating a high security prison effect and requiring new vulnerability exploits to be executed for each step in the kill chain.

Because attackers generally deal with a limited budget and limited time in which to execute a successful attack, often even our simplest security approaches are enough to make us the safest house on the block. Also, because all activities that occur within the environment are well known, effectively generating high-confidence indicators or signals and very little noise as a sensor, anomalies are easy to spot. Given the presentation of those anomalies, and playbooks already available to address many adverse operating conditions, you are providing customers the ability to deliver a credible response to threats, something that many lack today.


The goal of vCloud Air Network service providers and their partners should be identifying cyber security challenges that customers face, as well as which meaningful, coarsely grained packages of managed services can be offered to help tackle those challenges. By aligning with the Managed Security Services Maturity Model, providers can leverage the VMware SDDC and VMware NSX software-defined networking and security capabilities to deliver something truly unique in the enterprise IT industry—a secure hybrid cloud. By further aligning these capabilities and services with those of application migration and DevOps (stay tuned for blogs on those and other subjects), and taking ownership of the full lifecycle of security, the potential of effectively remediating existing threats becomes possible. Together, we can help customers evaluate their risk profile, as well as understand how these techniques can minimize attack points and vectors and reduce response times, while increasing effectiveness in fighting cyber threats.

What you’ll see throughout the Managed Security Services Maturity Model is the creation of a “ubiquity” of security controls across each data center participating in the hybrid cloud. This ubiquity will allow for a consistent, trusted foundation from which the performance of the architecture and operations can be measured. Individual policies can then be constructed across this trusted foundation relative to specific security contexts consisting of applications and their users as well as administrators and their actions, leaving very little room for threats to go unnoticed. As these policies are enforced by the controls of the trusted foundation, cyber security response becomes more agile because all components are performing in a well understood fashion. Think of military special forces training on a “built for purpose” replica of an area they plan to assault to minimize unexpected results. Security operators can now be indoctrinated and immersed, knowing what scenes are expected to play out instead of constantly looking for the needle in the haystack. This will also ultimately create the ideal conditions for helping to rationalize unfettered consumption of elastic resources while also fulfilling the vision and realizing the potential of the hybrid cloud.

Streamlining VMware vCloud Air Network Customer Onboarding with VMware NSX Edge Services

When migrating private cloud workloads to a public or hosted cloud provider, the methods used to facilitate customer onboarding can present some of the most critical challenges. The cloud service provider requires a method for onboarding tenants that reduces the need for additional equipment or contracts, which often create barriers for customers moving enterprise workloads onto a hosting or public cloud offering.

Customer Onboarding Scenarios

When a service provider is preparing for customer onboarding, there are a few options that can be considered. Some of the typical onboarding scenarios are:

  • Migration of live workloads
  • Offline data transfer of workloads
  • Stretching on-premises L2 networks
  • Remote site and user access to workloads

One of the most common scenarios is workload migration. For some implementations, this means migrating private cloud workloads to a public cloud or hosted service provider’s infrastructure. One path to migration leverages VMware vSphere® vMotion® to move live VMs from the private cloud to the designated CSP environment. In situations where this is not feasible, service providers can supply options for the offline migration of on-premises workloads where private cloud workloads that are marked for migration are copied to physical media, shipped to the service provider, and then deployed within the public cloud or hosted infrastructure. In some cases, migration can also mean the ability to move workloads between private cloud and CSP infrastructure on demand.

Continue reading

vCenter Server Scalability for Service Providers

Designing and architecting monster vCloud Air Network service provider environments takes VMware technology to its very limits, in terms of both scalability and complexity. vCenter Server, and its supporting services, such as SSO, are at the heart of the vSphere infrastructure, even in cloud service provider environments where a Cloud Management Platform (CMP) is employed to abstract the service presentation away from vCenter Server.

Meeting service provider scalability requirements with vCenter Server requires optimization at every level of the design, in order to implement a robust technical platform that can scale to its very limits whilst also maintaining operational efficiency and supportability.

This article outlines design considerations and best-practice recommendations for optimizing Microsoft Windows vCenter Server instances, in order to maximize the operational performance of your vCenter ecosystem; this is particularly pertinent when scaling beyond 400 host servers. Each item listed below should be addressed in the context of the target environment and properly evaluated before implementation, as there is no one solution that optimizes all vCenter Server instances.

The following is simply a list of recommendations that should, to some extent, improve performance in large service provider environments. This blog targets the Windows variant of vCenter Server 5.x and 6.x with a Microsoft SQL database, which is still the most commonly deployed configuration.

Warning: Some of the procedures and tasks outlined in this article are potentially destructive to data, and therefore should only be undertaken by experienced personnel once all appropriate safeguards, such as backed up data and a tested recovery procedure, are in place.


Part 1 – vCenter Server Operational Optimization

vCenter Server Sizing
vCloud Air Network service providers must ensure that the vCenter virtual system(s) are sized accordingly, based on their inventory size. Where vCenter components are separated and distributed across multiple virtual machines, ensure that all systems meet the sizing recommendations set out in the installation and configuration documentation.

vSphere 5.5: https://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html
vSphere 6.0: https://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-6-pubs.html
vSphere 5.1: http://kb.vmware.com/kb/2021202

Distribute vCenter Services across multiple virtual machines (vSphere 5.5)
In vSphere 5.5, depending on inventory size, multiple virtual machines can be used to accommodate the different vCenter roles. VMware recommends separating VMware vCenter Server, the SSO Server, Update Manager and SQL Server for flexibility during maintenance and to improve scalability of the vCenter management ecosystem. The new architecture of vCenter Server 6.0 simplifies the deployment model, but also reduces design and scaling flexibility, with only two component roles to deploy.

Dedicated Management Cluster
For anything other than the smallest of environments, VMware recommends separating all vSphere management components onto a dedicated out-of-band management cluster. The primary benefits of management component separation include:

  • Facilitating quicker troubleshooting and problem resolution as management components are strictly contained in a relatively small and manageable cluster.
  • Providing resource isolation between workloads running in the production environment and the actual systems used to manage the infrastructure.
  • Separating the management components from the resources they are managing.

vCenter to Host operational latency
The number of network hops between the vCenter Server and the ESXi host affects operational latency. The ESXi host should reside as few network hops away from the vCenter Server as possible.

vCenter to SQL Server operational latency
The number of network hops between the vCenter Server and the SQL database also affects operational latency. Where possible, vCenter should reside on the same network segment as the supporting database. If appropriate, configure a DRS affinity rule to ensure that the vCenter Server and database server reside on the same ESXi host, reducing latency still further.
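As an illustrative sketch only, such a rule could be created with VMware PowerCLI. The cluster and VM names below (Mgmt-Cluster, vcenter01, vcdb-sql01) are hypothetical placeholders:

# Sketch – keep the vCenter Server VM and its database VM together on the same host (names are placeholders)
Connect-VIServer -Server vcenter01.example.local
$vms = Get-VM -Name 'vcenter01', 'vcdb-sql01'
New-DrsRule -Cluster (Get-Cluster -Name 'Mgmt-Cluster') -Name 'vCenter-with-DB' -KeepTogether $true -VM $vms -Enabled $true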

Java Max Heap Size 
vCloud Air Network service providers must ensure that the maximum heap size for the Java virtual machine (JVM) is set correctly based on the inventory size. Confirm that the JVM heap settings for vCenter Server, the Inventory Service, SSO and the Web Client are correct, and monitor the web services to verify. vSphere 5.1 & 5.5: http://kb.vmware.com/kb/2021302

Concurrent Client Connections
Whilst not always easy, attempt to limit the number of clients connected to vCenter Server, as this affects its performance. This is particularly the case for the traditional Windows C# client.

Performance Monitoring
Employ a performance monitoring tool to ensure the health of the vCenter ecosystem and to help troubleshoot problems when they arise. Where appropriate, configure a vRealize Operations (vROps) custom dashboard for the vCenter and management components, and ensure that appropriate alerts and notifications exist on the performance monitoring tools.

Virtual disk type
All vCenter Server virtual machine VMDKs should be provisioned in the eagerZeroedThick format. This provides approximately a 10-20 percent performance improvement over the other two disk formats.

vCenter vNIC type
vCloud Air Network service providers should employ the VMXNET3 paravirtualized network adaptor to maximise network throughput and efficiency, and to reduce latency.

ODBC Connection
Ensure that the vCenter and VUM ODBC connections are configured with the minimum permissions required for daily operations. Additional permissions are typically required during installation and upgrade activities, but not for day to day operations. Please refer to the Service Account Permissions provided below.

vCenter Logs Clean Up
vCenter Server has no automated way of purging old vCenter log files. These files can grow and consume a significant amount of disk space on the vCenter Server. Consider a scheduled task, run every three to six months, to delete or move log files older than the period of time defined by business requirements.

For instance, the VBScript below can be used to clean up old log files from vCenter Server. This script deletes files that are older than a fixed number of days, defined on line 9, from the path set on line 6. The VBScript can be configured to run as a scheduled task using the Windows Task Scheduler.

Dim Fso
Dim Directory
Dim Modified
Dim Files
Set Fso = CreateObject("Scripting.FileSystemObject")
Set Directory = Fso.GetFolder("C:\ProgramData\VMware\VMware VirtualCenter\Logs\")
Set Files = Directory.Files
For Each Modified in Files
If DateDiff("D", Modified.DateLastModified, Now) > 180 Then Modified.Delete
Next
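To automate the clean-up, a scheduled task can then be registered for the script. As a hedged example, assuming the script above is saved to the hypothetical path C:\Scripts\Purge-vCenterLogs.vbs:

# Example only – register a monthly task that runs the purge script (path, schedule and account are placeholders)
schtasks.exe /Create /TN "Purge vCenter Logs" /TR "cscript.exe //B C:\Scripts\Purge-vCenterLogs.vbs" /SC MONTHLY /ST 02:00 /RU SYSTEM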

For more information, refer to KB article: KB1021804 Location of vCenter Server log files.
For additional information on modifying logging levels in vCenter please refer to KB1004795 and KB1001584.

Note: Once a log file reaches a maximum size it is rotated and numbered similar to component-nnn.log files and they may be compressed.

Statistics Levels
The statistics collection interval determines the frequency at which statistic queries occur, the length of time statistical data is stored in the database, and the type of statistical data that is collected.

As historical performance statistics can take up to 90% of the vCenter Server database size, they are the primary factor in the performance and scalability of the vCenter Server database. Retaining this performance data allows administrators to view the collected historical statistics, through the performance charts in the vSphere Web Client, through the traditional Windows client, or through command-line monitoring utilities, for up to one year after the data was first ingested into the database.

You must ensure that the statistics collection intervals and levels are set as conservatively as possible so that the system does not become overloaded. For instance, you could set a new database data retention period of 60 days and configure the database not to retain performance data beyond 60 days. At the same time, it is equally important to ensure that the retention of this historical data meets the service provider’s data compliance requirements.
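As a sketch, the configured statistics intervals and their retention can be reviewed (and, if required, adjusted) with PowerCLI. The interval name and retention value below are illustrative only, so confirm the exact names and supported parameters from the Get-StatInterval output on your vCenter Server and PowerCLI version before changing anything:

# Sketch – review the statistics intervals and their retention (storage time)
Connect-VIServer -Server vcenter01.example.local   # hypothetical vCenter Server name
Get-StatInterval | Select-Object Name, SamplingPeriodSecs, StorageTimeSecs

# Illustrative change – reduce the storage time of the monthly interval to 60 days (verify the interval name first)
Set-StatInterval -Interval (Get-StatInterval -Name 'Past Month') -StorageTimeSecs (60*60*24*60)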

As this statistics data consumes such a large proportion of the database, proper management of these vCenter Server statistics is an important consideration for overall database health. This is achieved by processing the data through a series of rollup jobs, which prevent the database server from becoming overloaded. This is a key consideration for vCenter Server performance and is addressed in more detail in Part 2 of this article.

Task and Events Retention
Operational teams should ensure that the task and events retention levels are set as conservatively as possible, whilst still meeting the service provider’s data retention and compliance requirements. Every time a task or event is executed via vCenter Server, it is stored in the database. For example, a task is created when a user powers a virtual machine on or off, and an event is generated when something occurs, such as the vCPU usage for a VM changing to red.

vCenter Server has a database retention policy setting that allows you to specify how long vCenter Server tasks and events are kept before they are deleted. This correlates to a database rollup job that purges the data from the database after the selected period of time. Whilst these tables consume a relatively small amount of database space compared to the statistical data, it is good practice to consider this option for further database optimization. For instance, by default, vCenter Server is configured to store task and event data for 180 days. However, it might be possible, based on the service provider’s compliance requirements, to configure vCenter Server not to retain event and task data in the database beyond 60 days.
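A hedged PowerCLI sketch for inspecting and lowering these values is shown below; it assumes the vCenter advanced settings event.maxAge and task.maxAge (with the corresponding maxAgeEnabled flags) back the database retention policy, so validate the setting names against your vCenter Server version before applying anything:

# Sketch – inspect and set task/event database retention (values are illustrative)
Connect-VIServer -Server vcenter01.example.local   # hypothetical vCenter Server name
Get-AdvancedSetting -Entity $global:DefaultVIServer -Name '*maxAge*'

# Retain tasks and events for 60 days instead of the 180-day default
Get-AdvancedSetting -Entity $global:DefaultVIServer -Name 'event.maxAge' | Set-AdvancedSetting -Value 60 -Confirm:$false
Get-AdvancedSetting -Entity $global:DefaultVIServer -Name 'task.maxAge' | Set-AdvancedSetting -Value 60 -Confirm:$false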

vCenter Server Backup Best Practice
In addition to scheduling regular backups of the vCenter Server database, the backups for the vCenter Server should also include the SSL certificates and license key information.


Part 2 – SQL DB Server Operational Optimization (for vCenter Server)

SQL Database Server Disk Configuration
The vCenter Server database data file (.mdf) generates mostly random I/O, while the database transaction logs (.ldf) generate mostly sequential I/O. The traffic for these files is almost always simultaneous, so it is preferable to keep them on two separate storage resources that don’t share disks or I/O. Therefore, where a large service provider inventory demands it, operational teams should ensure that the vCenter Server database uses separate drives for data and logs which, in turn, are backed by different physical disks.

tempDB Separation
For large service provider inventories, place tempDB on a different drive, backed by different physical disks than the vCenter database files or transaction logs.

Reduce Allocation Contention in SQL Server tempDB database
Consider using multiple data files to increase the I/O throughput to tempDB. Configure a 1:1 alignment between tempDB files and vCPUs (up to eight) by spreading tempDB across at least as many equally sized files as there are vCPUs.

For instance, where 4 vCPUs exist on the SQL Server, create three additional tempDB data files and make them all equally sized. They should also be configured to grow in equal amounts. After changing the configuration, a restart of the SQL Server instance is required. For more information, refer to: http://support.microsoft.com/kb/2154845
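For illustration, the additional files could be created with T-SQL issued from PowerShell via Invoke-Sqlcmd (SQL Server module). The instance name, drive, file sizes and growth values below are placeholders and should be sized for your environment:

# Sketch – add three extra, equally sized tempDB data files (instance name, paths and sizes are placeholders)
Import-Module SqlServer   # or the older SQLPS module on legacy systems
$query = @"
ALTER DATABASE [tempdb] ADD FILE (NAME = N'tempdev2', FILENAME = N'T:\TempDB\tempdev2.ndf', SIZE = 4096MB, FILEGROWTH = 512MB);
ALTER DATABASE [tempdb] ADD FILE (NAME = N'tempdev3', FILENAME = N'T:\TempDB\tempdev3.ndf', SIZE = 4096MB, FILEGROWTH = 512MB);
ALTER DATABASE [tempdb] ADD FILE (NAME = N'tempdev4', FILENAME = N'T:\TempDB\tempdev4.ndf', SIZE = 4096MB, FILEGROWTH = 512MB);
"@
Invoke-Sqlcmd -ServerInstance 'vcdb-sql01' -Query $query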

Database Connection Pool
vCenter Server starts, by default, with a database connection pool of 50 threads. This pool is then dynamically sized according to the vCenter Server’s workload. If high load is expected due to a large inventory, the size of the pool can be increased to 128 threads, although this will increase the memory consumption and load time of the vCenter Server. To change the pool size, edit the vpxd.cfg file, adding the entry below, where ‘128’ is the number of connection threads to be configured.

<vpxd>
  <odbc>
    <maxConnections>128</maxConnections>
  </odbc>
</vpxd>

Table Statistics
Update the statistics of the SQL tables and indexes on a regular basis for better overall performance of the database. Create a SQL Server Agent job to carry out this task, or alternatively make it part of a vSphere database maintenance plan: http://sqlserverplanet.com/dba/update-statistics
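A minimal sketch of such a job step, assuming the default vCenter database name VCDB and a hypothetical SQL Server instance name, might be:

# Sketch – refresh statistics on all tables in the vCenter database (instance and database names are placeholders)
Invoke-Sqlcmd -ServerInstance 'vcdb-sql01' -Database 'VCDB' -Query 'EXEC sp_updatestats;'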

Index Fragmentation (Not Applicable to vCenter 5.1 or newer)
Check for fragmentation of index objects and recreate the indexes if needed. This happens with vCenter Server due to the statistics rollups. Defragment once fragmentation exceeds 30%. See KB1003990.

Note: With the new enhancements and design changes made in the vCenter Server 5.1 database and later, this is no longer applicable or required.

Database Recovery Model
Depending on your vCenter database backup methodology, consider setting the database to the SIMPLE recovery model. This model will reduce the disk space needed for the transaction logs as well as decrease I/O load.

Choosing the Recovery Model for a Database: http://msdn.microsoft.com/en-us/library/ms175987(SQL.90).aspx
How to view or Change the Recovery Model of a Database in SQL Server Management Studio: http://msdn.microsoft.com/en-us/library/ms189272(SQL.90).aspx
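As a sketch, and only once the backup strategy has been confirmed to support it, the recovery model could be switched as follows (database and instance names are placeholders):

# Sketch – switch the vCenter database to the SIMPLE recovery model
Invoke-Sqlcmd -ServerInstance 'vcdb-sql01' -Query 'ALTER DATABASE [VCDB] SET RECOVERY SIMPLE;'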

Virtual Disk Type
Where the vCenter Server database server is a virtual machine, ensure that all VMDKs are provisioned in the eagerZeroedThick format. This option provides approximately a 10-20 percent performance improvement over the other two disk formats.

Verify SQL Rollup Jobs
Ensure that all the SQL Server Agent rollup jobs were created on the SQL Server during the vCenter Server installation. For instance:

  • Past Day stats rollup
  • Past Week stats rollup
  • Past Month stats rollup

For the full set of stored procedures and jobs, refer to the appropriate article below. Where necessary, recreate the MSSQL Agent rollup jobs. Note that detaching, attaching, importing, or restoring a database to a newer version of MSSQL Server does not automatically recreate these jobs. To recreate these jobs, if missing, refer to KB1004382.

KB 2033096 (vSphere 5.1, 5.5 & 6.0): http://kb.vmware.com/kb/2033096
KB 2006097 (vSphere 5.0): http://kb.vmware.com/kb/2006097

Also, ensure that these jobs reference the vCenter Server database, and not the master or some other database. If the jobs reference any other database, you must delete and recreate them.
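One quick way to confirm that the rollup jobs exist and are enabled is to query the msdb job catalog, as in this hedged sketch (job name patterns vary slightly between vCenter versions, so adjust the filter as needed):

# Sketch – list SQL Agent jobs whose names suggest vCenter statistics rollups
Invoke-Sqlcmd -ServerInstance 'vcdb-sql01' -Database 'msdb' -Query "SELECT name, enabled, date_created FROM dbo.sysjobs WHERE name LIKE '%rollup%' ORDER BY name;"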

Ensure database jobs are running correctly
Monitor scheduled database jobs to ensure they are running correctly. For more information, refer to KB article: Checking the status of vCenter Server performance rollup jobs: KB2012226

Verify MSSQL Permissions
Ensure that the required local SQL and AD permissions are in place and align with the principle of least privilege (see Part 3 below). If necessary, truncate all unrequired performance data from the database (see Truncate all performance data from vCenter Server, below). For more information, refer to the KB article Reducing the size of the vCenter Server database when the rollup scripts take a long time to run (KB1007453).

Truncate all performance data from vCenter Server
As discussed in Part 1, to truncate all performance data from vCenter Server 5.1 and 5.5:

Warning: This procedure permanently removes all historical performance data. Be sure to take a backup of the database/schema before proceeding.

  1. Stop the VMware VirtualCenter Server service. Note: Ensure that you have a recent backup of the vCenter Server database before continuing.
  2. Log in to the vCenter Server database using SQL Management Studio.
  3. Copy and paste the contents of the SQL_truncate_5.x.sql script (available from the link below) into SQL Management Studio.
  4. Execute the script to delete the data.
  5. Restart the vCenter Server services.

For truncating data in vCenter Server and vCenter Server Appliance 5.1, 5.5, and 6.0, see Selective deletion of tasks, events, and historical performance data in vSphere 5.x and 6.x (2110031)
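Assuming the Windows vCenter Server service (service name vpxd) and the KB script saved locally, the overall workflow could be scripted loosely as follows; this is only a sketch, and the backup warning above still applies:

# Sketch – stop vCenter Server, run the KB truncation script against the database, then restart (paths and names are placeholders)
Stop-Service -Name 'vpxd'   # VMware VirtualCenter Server service
Invoke-Sqlcmd -ServerInstance 'vcdb-sql01' -Database 'VCDB' -InputFile 'C:\Scripts\SQL_truncate_5.x.sql' -QueryTimeout 65535
Start-Service -Name 'vpxd'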

Shrink Database
After purging historical data from the database, optionally shrink the database. This is an online procedure to reduce the database size and free up space on the VMDK; however, this activity will not in itself improve performance. For more information, refer to: Shrinking the size of the VMware vCenter Server SQL database (KB1036738)

For further information on Shrinking a Database, refer to: http://msdn.microsoft.com/en-us/library/ms189080.aspx
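A minimal sketch, assuming the vCenter database is named VCDB and a target of 10 percent free space remaining, might be:

# Sketch – shrink the vCenter database after purging historical data (names and target are placeholders)
Invoke-Sqlcmd -ServerInstance 'vcdb-sql01' -Query 'DBCC SHRINKDATABASE (VCDB, 10);' -QueryTimeout 65535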

Rebuilding indexes to Optimize the performance of SQL Server
Configure a regular maintenance job to rebuild the indexes. See KB2009918.

To rebuild the vCenter Server database indexes:

  1. For a vCenter Server 5.1 or 5.5 database, download and extract the .sql files from the 2009918_rebuild_51.zip file attached to KB2009918.
  2. Back up your vCenter Server database before proceeding. For more information, see Backing up and restoring vCenter Server 4.x and 5.x (1023985).
  3. Connect to the vCenter Server database using SQL Server Management Studio. These steps must be performed against the vCenter database, not the master database.
  4. Execute the extracted .sql file to create the REBUILD_INDEX stored procedure.
  5. Execute the stored procedure that was created in the previous step: execute REBUILD_INDEX
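Equivalently, once the stored procedure has been created, it can be invoked from PowerShell along these lines (instance and database names are placeholders):

# Sketch – run the stored procedure created from the KB2009918 script against the vCenter database
Invoke-Sqlcmd -ServerInstance 'vcdb-sql01' -Database 'VCDB' -Query 'EXECUTE REBUILD_INDEX;' -QueryTimeout 65535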

VMware recommends a fill factor of 70% for the four VPX_HIST_STAT tables. If this recommended fill factor is too high for the resources available on the database server, the server will need to spend time splitting pages, which equates to additional I/O.

If you are experiencing high, unexplained I/O in the environment, monitor the SQL Server Access Methods object, specifically the Page Splits/sec counter. Page splits are expensive and cause your tables to perform more poorly due to fragmentation. Therefore, the fewer page splits you have, the better your system will perform.

By decreasing the fill factor in your indexes, you increase the amount of empty space on each data page. The more empty space there is, the fewer page splits you will experience. On the other hand, having too much unnecessary empty space can also hurt performance, because less data is stored per page, which means more disk I/O is required to read tables and less data can be stored in the buffer cache.

High Page Splits/sec will result in the database being larger than necessary and having more pages to read during normal operations.
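As a quick, hedged way to sample this counter on the database server (a default SQL Server instance is assumed; named instances expose a different counter set name):

# Sketch – sample Page Splits/sec every 5 seconds, 12 times, on a default SQL Server instance
Get-Counter -Counter '\SQLServer:Access Methods\Page Splits/sec' -SampleInterval 5 -MaxSamples 12 |
    Select-Object -ExpandProperty CounterSamples |
    Select-Object Timestamp, CookedValue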

To determine where growth is occurring in the VMware vCenter Server database, refer to: http://kb.vmware.com/kb/1028356

For troubleshooting VPX_HIST_STAT table sizes in VMware vCenter Server 5, refer to: KB2038474

To reduce the size of the vCenter Server database when the rollup scripts take a long time to run, refer to: KB1007453

Monitor Database Growth
Service provider operational teams should monitor vCenter Server database growth over a period of time to ensure the database is functioning as expected. For more information, refer to KB article: Determining where growth is occurring in the vCenter Server database KB1028356
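For a first, rough view of overall size and free space, something as simple as the following sketch can be scheduled (instance and database names are placeholders); the KB article above provides more detailed per-table queries:

# Sketch – report overall size and unallocated space for the vCenter database
Invoke-Sqlcmd -ServerInstance 'vcdb-sql01' -Database 'VCDB' -Query 'EXEC sp_spaceused;'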

Schedule and verify regular database backups
The vCenter, SSO, VUM and SRM servers are by themselves stateless. The databases are far more critical, since they store all the configuration and state information for each of the management components. These databases must be backed up nightly, and the restore process for each database needs to be tested periodically.

Operational teams should ensure that a schedule of regular backups exists for the vCenter database and, based on the requirements of the business, periodically restore and mount databases from backup onto a non-production system to verify that a clean recovery is possible, should database corruption or data loss occur in the production environment.

Create a Maintenance Plan for vSphere databases
Work with the DBAs to create daily and weekly database maintenance plans. For instance:

  • Check Database Integrity
  • Rebuild Index
  • Update Statistics
  • Back Up Database (Full)
  • Maintenance Cleanup Task



Part 3 – Service Account Permissions (Least Privilege)

vCenter Service Account
Required by the ODBC connection for access to the database, the vCenter service account must be configured with db_owner privileges on the vCenter database for normal operational use. However, the vCenter database account used to make the ODBC connection also requires the db_owner role on the MSDB system database during installation or upgrade of the vCenter Server. This permission facilitates the installation of the SQL Agent jobs for the vCenter statistics rollups.

Typically, the DBA should only grant the vCenter service account the db_owner role on the MSDB System database when installing or upgrading vCenter, then revoke that role when these activities are complete.
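A hedged example of that grant-and-revoke cycle, using a hypothetical service account DOMAIN\svc-vcenter and the sp_addrolemember/sp_droprolemember procedures for compatibility with older SQL Server versions, is:

# Sketch – temporarily grant the vCenter service account db_owner on msdb for an install/upgrade, then revoke it
Invoke-Sqlcmd -ServerInstance 'vcdb-sql01' -Database 'msdb' -Query "EXEC sp_addrolemember 'db_owner', 'DOMAIN\svc-vcenter';"

# ...perform the vCenter Server installation or upgrade...

Invoke-Sqlcmd -ServerInstance 'vcdb-sql01' -Database 'msdb' -Query "EXEC sp_droprolemember 'db_owner', 'DOMAIN\svc-vcenter';"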

RSA_DBA (vSphere 5.1 Only)
Only required for SSO 5.1, the RSA_DBA account is a local SQL account that is used for creating the schema (DDL) and requires db_owner permissions.

RSA_USER (vSphere 5.1 Only)
Only required for SSO 5.1, the RSA_USER account reads and writes data (DML only).

VUM Service Account
Although it is installed on a 64-bit operating system, VUM is a 32-bit application and requires a 32-bit ODBC connection, created from “C:\Windows\SysWOW64\odbcad32.exe”. The VUM service account must be provided with the db_owner permission on the VUM database. The installation of vCenter Update Manager 5.x and 6.x with a Microsoft SQL back-end database also requires the ODBC connection account to temporarily have db_owner permissions on the MSDB system database. This was a new requirement in vSphere 5.0.

As with the vCenter service account, the DBA would typically only grant the VUM service account the db_owner role on the MSDB system database during an installation of, or upgrade to, the VUM component of vCenter Server. This permission should then be revoked when that task has been completed.