
VMware Horizon Client (PCoIP & Blast) Connection Workflow

Since I published the Horizon 7 Network Ports diagram with the latest release of Horizon 7, I’ve been frequently asked about the connection flow between the Horizon Client and the virtual desktop. VMware Horizon supports RDP, PCoIP and now Blast Extreme. I’ll start with PCoIP and then we’ll look at Blast Extreme.

The connection flow of the Horizon Client is largely the same with Horizon 7, Horizon Air or Horizon DaaS. There may be differences in external load-balancing, Security Server or Access Point, and external URL configuration, but for this post I’ll focus on the Horizon Client itself and the aforementioned protocols.

Tunneled Connections (PCoIP)

VMware Horizon PCoIP Connection Flow
Continue reading

Enterprise Application Migration Technologies – Finding the Right Fit

Introduction

When looking at the adoption of public or hybrid cloud, one of the primary considerations must be how to migrate existing workloads to the target platform. Choosing the right migration tool(s) is critical when guiding customers, particularly their IT and application owners, through this challenge. Many VMware vCloud® Air™ Network architectures can provide workload mobility where capabilities such as hybrid cloud networking enabled by VMware NSX®, or other solutions such as VMware Site Recovery Manager™, are in place. Enterprise migration technologies, however, span a much broader scope than moving applications hosted on physical or virtual infrastructure to a cloud architecture. Specifically, these tools address the enterprise architecture features required to discover, plan, and execute migration, while allowing for scheduling and system-level dependencies.

VMware offers tools that address many of these needs, and some have been described in the VMware vCloud Architecture Toolkit™ for Service Providers (vCAT-SP) blog and white paper. As stated in the vCAT-SP documentation for migration, these offerings will not meet every requirement for migrating workloads to the cloud, and the purpose of this series of blogs is to allow VMware Technology Partners to discuss their solutions and advocate for why they might be the best choice in many situations. Many standard forms of analysis will apply to the evaluation of enterprise migration technologies, including common items such as pricing, support, or strategic direction. This series of blogs will focus on the more technical aspects, such as ease of deployment and usage, versatility, reliability, scalability, and security. The blog entries will also cover optimal use cases addressed by the partner solutions, often with customer references.

The first blog in this series is with VMware Technology Partner ATADATA, focusing on their enterprise migration solution built around the ATAvision and ATAmotion products. The combination of these two offerings fits into the “Discover & Assess, Job Scheduling, Workload Migration, Application Verification” lifecycle described in the blog and vCAT-SP documentation referenced above. The first three letters of the ATADATA name stand for “any to any,” and their deployment model, shown in the following figure, illustrates their abstraction from the underlying physical, virtual, or cloud infrastructures that are part of an enterprise migration. This capability enables their technology not only to support many platforms (see ATADATA supported platforms), but also to provide a consistent abstraction of underlying details for migrating between sources and targets of any supported type.

Deployment

Deploying ATAmotion is as simple as establishing the central management console and preparing network/firewall communication between source systems and the target cloud infrastructure, where individual copy processes rendezvous with auto-provisioned machines on the target platform of choice. In addition, the ATAmotion management console (see the migration demos at the bottom of this link) is simplified by being geared toward the administrator who understands the source environment and by providing configurable “connectors” that abstract particulars such as VMware vCloud Director® APIs and networks.

Figure: ATADATA deployment architecture (vCloud)

Scalability

The ATADATA architecture also facilitates another important feature, illustrated as “Direct Encrypted Data Transfer from Source to Target” below the bottom green line of the previous figure. This means that data is not staged when migrated between source and target machines, allowing ATAmotion to scale to very large transfers of source data. In addition, ATADATA’s support for a wide variety of operating systems (OS types) allows for the greatest compatibility with existing source operating systems to be migrated, as well as the capability to upgrade versions of Windows within the migration (ATAtransform).

Because of the scalability of ATAmotion with its proprietary multi-threaded copy engine, even when the migration data path is encrypted, you can bulk migrate groups of machines based on discovered dependencies. This bulk migration happens as a single wave due to integration across ATADATA modules.

Discovery

But what about discovery of those existing source application environments? Perhaps more importantly, how can you uncover the dependencies of the application services running in those operating system instances? That’s where ATAvision comes in. The following figure shows some of the features of ATAvision, including the ability to build a repository of application affinity and dependency maps, which is accomplished without agents, using encrypted stored administrative credentials (sudo for Linux and local administrator for Windows).

ATAvision

Agility

By combining the broadest support for sources and targets, discovery of application layer components and their interactions, and simplified deployment and management, service providers can focus on what matters—adding value for their customers. Migration to the cloud presents a tremendous opportunity for enterprise architecture transformation, bringing control of not only architectures but operations as well. Leveraging ATADATA’s integrated solution for handling the myriad tasks involved in migrating enterprise applications to the cloud can eliminate much of the opacity that might exist between enterprise IT and application owners. By consolidating this outlook across these aspects and concerns, service providers and solution delivery partners are able not only to de-risk the migration to cloud, but also to leave customers with more control and streamlined operational models to take full advantage of paradigms like hybrid cloud.

Example Use Case

In complex enterprise application migration scenarios, it is now possible to take an existing application topology, SAP HANA as an example, and inject best practices from the vendor. The SAP HANA Software Defined Network Requirements can be leveraged in the creation of the target cloud environment that will support the enterprise application. By aligning items like firewall rulesets and providing API extensibility for integrating orchestration of the target VMware SDDC, including VMware NSX, additional value can be added in the form of prescribed security controls aligned to best practices and regulations in the target cloud environment.

After an extensive vetting process, ATAmotion was recently selected by a VMware-based service provider as the migration platform of choice to onboard SAP HANA for Fortune 200 clients. The challenges that the ATAmotion offering was able to meet included the ability to scale to large datasets over the wire, bulk workload migration, extensive Linux distribution support, and an enterprise focus suited to business applications such as SAP HANA.

Conclusion

As previously stated, VMware has many native migration tools and utilities available and we encourage you to utilize these products for the appropriate use cases. However, enterprise scale cloud migration might require the adoption of third-party technologies that have been purpose built for the requisite tasks. Based on the points made above, ATADATA has a robust and versatile offering for most, if not all, scenarios likely to be encountered during enterprise workload migrations.

By carefully selecting migration tools capable of addressing the most complex of enterprise application requirements, VMware vCloud Air Network providers, as well as other partners and customers, can rest assured that there is ample opportunity to add value to customer environments, forging relationships that maximize the value of migration to the cloud.

 

vRealize Automation Configuration with CloudClient for vCloud Air Network

As a number of vCloud Air Network service providers start to enhance their existing hosting offerings, VMware is seeing demand from service providers to offer a dedicated vRealize Automation implementation to their end customers, enabling them to offer application services, heterogeneous cloud management, and provisioning in a self-managed model.

This blog post details an implementation option where the vCloud Air Network service provider can offer “vRealize Automation as a Service” hosted in a vCloud Director vApp, with some additional automated configuration. This allows the service provider to offer vRealize Automation to their customers based out of their existing multi-tenancy IaaS platforms and achieve high levels of efficiency and economies of scale.

“vRealize Automation as a Service”

During a recent Proof of Concept demonstrating such a configuration, a vCloud Director Organization vDC was configured for tenant consumption. Within this Org vDC, a vApp containing a simple installation of vRealize Automation was deployed, consisting of a vRealize Automation Appliance and one Windows Server hosting the IaaS components and an instance of Microsoft SQL Server. With vRealize Automation successfully deployed, the instance was customized by leveraging vRealize CloudClient via Microsoft PowerShell scripts. Using this method for configuring the tenant within vRealize Automation reduced the deployment time for vRealize Automation instances while ensuring that the tenant configuration was consistent and conformed to the pre-determined naming standards and conventions required by the provider.

vRaaS vCAN Operations

vRealize Automation

To reduce the complexity of the implementation, vRealize Automation was deployed within a vApp using the simple install method, as this was determined to meet the anticipated user and workload requirements for the actual consumers.  In this solution we leveraged the minimal installation, where each instance of vRealize Automation consists of:

  • vRealize Automation Appliance – The vRealize Automation Appliance provides the vRealize Automation portal, Identity services, and vRealize Orchestrator.
  • Windows IaaS Server – The Windows IaaS server includes an instance of Microsoft SQL Server, the vRealize Automation Model Manager, the vRealize Automation Manager Service, the DEM Orchestrator, and a DEM Worker.

Figure: Tenant consumption

vRealize Automation Deployment

Deployment of vRealize Automation can be carried out either by manual configuration or by scripted installation.  With the new installation wizard introduced in vRealize Automation 7, the level of effort required for a Simple Install of vRealize Automation has been reduced.  For the purposes of this discussion we will assume that the manual installation method for vRealize Automation deployment is used.

 

vRealize Automation Configuration

An important place to introduce automation in this process is the configuration of vRealize Automation itself: the creation of Fabric Groups and Business Groups, as well as Blueprint and Entitlement configuration. vCloud Air Network operations admins can choose to script these configuration steps by leveraging vRealize CloudClient.  CloudClient is a Java-based command-line utility that uses the vRealize Automation API and can be used to configure vRealize Automation as well as to display and export its configuration details.  Let’s take a look at some of the considerations when using CloudClient and Microsoft PowerShell to carry out post-configuration tasks.

PowerShell Script and Cloud Client

When using CloudClient for scripting, one of the first steps is to create and configure the cloudclient.properties file.  This file contains the environment variables to be used when calling CloudClient for scripting tasks.  Please refer to the CloudClient documentation for detailed steps on the creation and configuration of the cloudclient.properties file.

The PowerShell script is configured to accept a set of parameters passed on the command line when the script is executed.  One benefit of doing this up front is that the script will already be configured for remote execution from an orchestration engine such as vRealize Orchestrator.

param(
$idStoreDomain,
$idStoreBaseDn,
$idStoreLoginUserDn,
$idStoreDcUrl,
$tenantName,
$customerPrefix,
$credsUserName, 
$credsPassword, 
$ComputeResourceName
)
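
For example, the script might be invoked as follows. The script file name (Configure-vRATenant.ps1) and all parameter values are simply placeholders for your environment:

.\Configure-vRATenant.ps1 -idStoreDomain "corp.local" `
    -idStoreBaseDn "dc=corp,dc=local" `
    -idStoreLoginUserDn "cn=svc-vra,ou=ServiceAccounts,dc=corp,dc=local" `
    -idStoreDcUrl "ldap://dc01.corp.local:389" `
    -tenantName "tenant01" `
    -customerPrefix "cust01" `
    -credsUserName "svc-vra@corp.local" `
    -credsPassword "VMware1!" `
    -ComputeResourceName "Cluster01"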

 
CloudClient Environment Variables

While the cloudclient.properties file contains settings that can be leveraged for scripted execution of CloudClient commands, these settings are static and may need to be changed during the scripted execution of some CloudClient commands to ensure the correct credentials are used for successful execution.  For example, while most CloudClient commands require credentials with the Tenant Administrator role, other commands, such as “vra identitystore add”, require the System Administrator account (administrator@vsphere.local) for successful execution.

To address this in the PowerShell script, we prefix CloudClient environment variables with “$env:” followed by the name of the CloudClient environment variable to be updated.  Here is an example of updating the required environment variables to run CloudClient as the administrator@vsphere.local user.

##---------------------------------------------------------------------------
## Set the Environment Variables for use with the Add Identity Source Command
##---------------------------------------------------------------------------

$env:CLOUDCLIENT_SESSION_KEY="administrator"
$env:vra_server="vra01.corp.local"
$env:vra_username="administrator@vsphere.local"
$env:vra_tenant="vsphere.local"
$env:vra_password="VMware1!"

 

After the credentials for the administrator@vsphere.local account have been set, we can proceed to execute the CloudClient commands to add the required accounts to the Tenant Administrator and IaaS Administrator roles:

###-------------------------------------------------------------------------
### 'vra tenant identitystore add' Section - Add Identity Store to 
###  vsphere.local Tenant
###-------------------------------------------------------------------------


## Construct 'vra identitystore add' Command
## ($CMD is assumed to hold the path to the CloudClient executable, for example cloudclient.bat)
$idStoreAddCommand = $CMD + " vra tenant identitystore add --tenantname " + $tenantName + " --name "+ $idStoreDomain + " --domain " + 
$idStoreDomain + " --groupbasedn " + $idStoreBaseDn + " --userdn " + $idStoreLoginUserDn + " --password " + $credsPassword + 
" --type AD --url " + $idStoreDcUrl + " --userbasedn " + $idStoreBaseDn

## Print 'vra identitystore add' Command to screen and then execute
Write-Host $idStoreAddCommand
Invoke-Expression $idStoreAddCommand


###-------------------------------------------------------------------------
### 'vra tenant admin update' Section - Update Infrastructure Admin role for 
###  vsphere.local Tenant
###-------------------------------------------------------------------------

## Declare IaaS Admin Group
$iaasGroup = $customerPrefix + "-iaasadmin@" + $idStoreDomain

## Construct 'vra tenant admin update' Command
$tenantUpdateCommand = $CMD + " vra tenant admin update --tenantname " + 
$tenantName + " --role IAAS_ADMIN --action ADD --users " + $iaasGroup

## Print 'vra tenant admin update' Command to screen and then execute
Write-Host $tenantUpdateCommand
Invoke-Expression $tenantUpdateCommand 

 

In the above example, we declare variables to construct the CloudClient commands "vra tenant identitystore add" and "vra tenant admin update" with the desired parameters, of which the former requires the administrator@vsphere.local credentials.  We then use the "Invoke-Expression" PowerShell cmdlet to run the resulting CloudClient commands.

Once we have completed the necessary commands to update the Tenant and IaaS Administrator roles, we can update the environment credential variables before executing the remaining tenant configuration commands:

##------------------------------------------------------------------
## Set Environment Variables for use with the rest of the commands
##------------------------------------------------------------------

$env:CLOUDCLIENT_SESSION_KEY="configurationadmin"
$env:vra_server="vra01.corp.local"
$env:vra_username="configurationadmin@vsphere.local"
$env:vra_tenant="vsphere.local"
$env:vra_password="VMware1!" 

 

At this point, additional scripting can be created to continue the customer-specific configuration of the tenant, such as:

  • Creation of the customer’s vCloud Director Organization vDC as an Endpoint
  • Fabric Group creation
  • Machine prefix
  • Business Group creation

Additionally, Services, Entitlements, and the required Actions can be created for the consumption of pre-created Converged Blueprints backed by standard templates offered by the vCloud Air Network Service Provider.  Once the necessary scripted tasks have been completed, reservations are created manually and the vRealize Automation instance can be turned over to the customer.
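
The additional configuration steps listed above can be scripted by following the same pattern of constructing a CloudClient command string and executing it with Invoke-Expression. The minimal sketch below adds a Business Group for the tenant; note that the sub-command and flag names shown here ("vra business-group add", "--tenantname", "--name") are assumptions that should be verified against the help output of your CloudClient version, and the group name simply reuses the customer prefix convention from earlier in the script.

## Declare the Business Group name based on the customer prefix
$businessGroupName = $customerPrefix + "-businessgroup"

## Construct the (assumed) 'vra business-group add' Command
$bgAddCommand = $CMD + " vra business-group add --tenantname " + $tenantName +
" --name " + $businessGroupName

## Print the Command to screen and then execute
Write-Host $bgAddCommand
Invoke-Expression $bgAddCommand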

Conclusion

In this post we have explored some basic examples of using CloudClient and PowerShell to script the configuration of vRealize Automation.  This powerful tool can also be used by vCloud Air Network partners to automate the configuration of vRealize Automation instances on a per customer basis, creating a “vRealize Automation as a Service” (vRAaaS) offering that is managed by the service provider, combining the multi-tenancy of vCloud Director with the unique self-service portal experience of vRealize Automation.

Deep Dive Architecture Comparison of DaaS & VDI, Part 2

In part 1 of this blog series, I discussed the Horizon 7 architecture and a typical single-tenant deployment using Pods and Blocks. In this post I will discuss the Horizon DaaS platform architecture and how this offers massive scale for multiple tenants in a service provider environment.

Horizon DaaS Architecture

The fundamental difference with the Horizon DaaS platform is its multi-tenancy architecture. There are no Connection Servers or Security Servers, but there are some commonalities. I mentioned Access Point previously; it was originally developed for Horizon Air, and is now a key component of both Horizon 7 and DaaS for remote access.

 

Horizon DaaS Architecture

If you take a look at the diagram above you’ll see these key differences. Let’s start with the management appliances.
Continue reading

Deep Dive Architecture Comparison of DaaS & VDI, Part 1

In this two-part blog series, I introduce the architecture behind Horizon DaaS and the recently announced Horizon 7. From a service provider point of view, the Horizon® family of products offers massive scale for both single-tenant deployments and multi-tenanted service offerings.

Many of you are very familiar with the term Virtual Desktop Infrastructure (VDI), but I don’t think the term does any justice to the evolution of the virtual desktop. VDI can have very different meanings depending on who you are talking to. Back in 2007 when VMware acquired Propero, which soon became VDM (then View and Horizon), VDI was very much about brokering virtual machines running a desktop OS to end-users using a remote display protocol. Almost a decade later, VMware Horizon is vastly different and it has matured into an enterprise desktop and application delivery platform for any device. Really… Horizon 7 is the ultimate supercar of VDI compared to what it was a decade ago.

I’ve read articles that compare VDI to DaaS, but they all seem to skip this evolution of VDI and compare DaaS to the traditional desktop broker of the past. DaaS, on the other hand, provides the platform of choice for service providers offering Desktops as a Service. VMware acquired the DaaS platform (formerly Desktone) in October 2013. In fact, I remember the day of the announcement because I was working on a large VMware Horizon deployment for a service provider at the time.

For this blog post I’d like to start by comparing the fundamental architecture of the Horizon DaaS platform with that of Horizon 7, which was announced in February 2016. This article is aimed at consultants and architects wishing to learn more about the DaaS platform.
Continue reading

Managed Security Services Maturity Model for vCloud Air Network Service Providers

Introduction

We’ve all heard about the many successful cyber-attacks carried out in various industries. Rather than cite a few examples to establish background, I would encourage you to review the annual report from Verizon called the Data Breach Digest. This report gives critical insight for understanding how the most pervasive attacks are executed and what to protect against to impede or prevent them. In order to provide a sound architecture and operational model for this protection, let’s look at some universal principles that have emerged as a result of forensics from these events. Those principles are time and space. Space, in this case, is cyberspace and involves the moving digital components of the target systems that must be compromised to execute a successful attack. Time involves events that may occur at network or CPU speed, but it is the ability to trap those events and put them into a human context, in terms of minutes, hours, or days, where security operations can respond. The combination of unprotected attack vectors, already compromised components of the system, and the inability to spot them creates what are known as “blind spots” and “dwell time,” where an attacker can harvest additional information and potentially expand to other attack vectors.

While all of that is hopefully easy to understand, we have to face the reality that many attacks still occur by using compromised credentials obtained through social engineering. These credentials provide enough privilege to establish a foothold for the command and control used in a cyber-attack. For this reason, we want to employ one of the core principles of the Managed Security Services Maturity Model, known as Zero Trust, or the idea that every action must have specific authentication, authorization, and accounting (AAA) defined. By subscribing to this maturity model as a VMware vCloud® Air™ Network service provider, you will uncover ways in which you can leverage features, such as VMware NSX® Distributed Firewall and micro-segmentation, putting you well on the road to offering services that can help customers address potential blind spots and reduce dwell time, thereby taking control and ownership of their cyber risk posture. No matter how nefarious a rogue entry into target systems is, or what escalated privilege was acquired, the Managed Security Services Model will limit the kind of lateral movement necessary to conduct consistent ongoing attacks, or what is known as an advanced persistent threat (APT). Although not all occurrences are APTs, by understanding the methods used in these most advanced attacks, we can isolate and protect aspects of the system required to execute a “kill chain,” essentially allowing ownership of a system in undetectable ways.

Managed Security Services Maturity Model

Cyber security, in its entirety, is a vast concept not to be given justice with a small set of blog articles and white papers. However, given the expansive nature of cyber-threats in this day and age, along with the ratio of successful attacks, information technology needs to continually seek out new approaches. One approach is to create as much of an IT environment as possible from known patterns and templates of installed technologies that can be deployed with a high fidelity of audit information to measure their collective effectiveness against cyber-threats. This turns on its head the idea of protecting environments against an exponentially exploding number of threats with greater diversity in the areas frequently attacked, and instead refines deployed environments to accept only activities that are well defined, with results that are well understood. Simply put, measure what you can trust. If it can’t be measured, it can’t be trusted.

Once again, this approach touches on a large concept, but it is finite in nature in that its definition seeks to gain the control needed to deliver sustainable security operations for customers. To further illustrate this point, let’s think about what a control is and what the maturity model affords the operator in pursuit of their target vision. First is the idea of a “control,” which, simply put in cyber security terms, means defining a behavior that can be measured. This could be architecture patterns expected from the provider layer, such as data privacy or geo-location, or automation and orchestration of security operations. Second is the maturity model itself, which has prerequisites for executing on specific rungs of the model, along with providing operational and security benefits. One output of each rung of the maturity model is the potential set of services to be offered to aid in the completion of the customer’s target cyber security vision.

Enter the Managed Security Services Maturity Model, which encodes the methodology for capturing each customer’s ideal approach and provides five different maturity “layers” that aid vCloud Air Network service providers in delivering highly secure hybrid cloud environments. Looking at Figure 1, we can see that the ideas of time and “geometry” (networks and boundaries we have defined), along with the provider (below the horizontal blue line) and consumer (operating system and application runtimes) layers, provide us the cyber dimensions we seek to define and measure.


Figure 1. Managed Security Services Maturity Model

Like most capability maturity models, when starting from the bottom we can often borrow attributes and patterns for service from the layers above. Generally, however, we need to accomplish the prerequisites for the upper layers (Orchestrated and above) to truly be considered operating at that layer. Often, there are issues of completeness where we must perform these prerequisite tasks n number of times in the design of our architecture and operations to have mobility to the upper levels. For instance, to complete the Automated level, you should plan to automate on the order of about a dozen elements, although your mileage may vary.

You may find more work to be done moving up the levels as you determine the right composition and critical mass of controls appropriate to deliver for targeted customer profiles. In the case of our maturity model, we will bind several concepts at each level to ultimately achieve the Zen-like “Advanced” layer 5, where we truly realize the completeness of the vision to own cyber security for our customers. A big responsibility to be sure, but perhaps a bigger opportunity to change the game from the status quo. The offering of managed services composed of facets from all levels is not for everyone but there is plenty of room to add value from all layers.

We have defined the following layers for the Managed Security Services Maturity Model:

  1. Basic

At this level, we introduce VMware NSX, VXLAN, and the Distributed Firewall to the hybrid cloud environment. This allows us to create controlled boundaries and security policies that can be applied in an application-centric fashion, resulting in focused operating contexts for security operations.

  2. Automated

At this level, we want to automate the behavior of the system with regard to controls. This will prompt security operations with events generated by discrete controls and their performance against established measurements or tolerances. The goal is to automate as many controls as possible to become Orchestrated.

  3. Orchestrated

After we have many controls automated, we want to make them recombinant in ways that allow for controlling the space, or the “geometry”, along with coordinating events, information, automated reactions, and so on, which will allow us to drive down response times. These combinations will result in “playbooks,” or collections of controls assembled in patterns that are used to combat cyber threats.

  4. Lifecycle

Taking on full lifecycle responsibility means just that. We might monitor in-guest security aspects like anti-virus/malware or vulnerability scanning in discrete, automated, and even orchestrated ways in previous levels. This level, however, is about actually taking ownership of operating systems and perhaps even application runtimes within the customer virtual machines. By extending managed services to include what is inside the virtual machines themselves, it is possible to take ownership of all facets of cyber security regarding applications in the hybrid cloud.

  5. Advanced

At the Advanced level, we must be able to leverage all previous levels in such a way that managed services can be deployed to remediate a cyber-threat or execute on a risk management plan to help address security issues of all types. Additionally, we want our resulting cyber toolkit derived from the maturity model to become portable, in appliance form, where managed security services can be delivered anywhere in the hybrid cloud network.

In the upcoming series of blog postings that describe VMware vCloud Architecture Toolkit for Service Providers (vCAT-SP) reference architecture design blueprints and use cases for each maturity level, vCloud Air Network service providers can help customers visualize what it will take to both architect and operate managed security services used to augment the hybrid cloud delivery model.

Eliminating Blind Spots and Reducing Dwell Time

The cyber defense strategies that are devised based on achieving levels of the maturity model focus on defining individual elements within the system. Management user interfaces, ports, and session authentication, as well as virtual machine file systems, network communications, and so on, should be defined to allow alignment of controls. In addition, the provisioning of networks between the resources that consume services and those that provide them, such as management components like VMware vCloud Director® or VMware vCenter™, DNS, Active Directory, and the logging of network components (including those that serve end-user applications to their communities), should also occur in as highly automated a fashion as possible.

In this way, human-centric, error-prone activities can be eliminated from consideration as potential vulnerabilities, although automated detection of threats by discrete components across cyber dimensions is still expected. A high-level example of how we expect these discrete, automated controls to behave is described by Gartner, who defines the concept of a “cloud security gateway” as “the ability to interject enterprise security policies as the cloud-based resources are accessed”. By defining controls for system elements and their groupings in this way, we can form a fully identified inventory of what is being managed, by whom, and where it resides. Likewise, by understanding and quantifying the controls in the system that are applied collectively to these elements, we can begin to measure and score their effectiveness. This harmonization is critical to delivering consistency in the enforcement mechanisms we rely on across both sides of the hybrid cloud, creating the foundation of trust.

Despite our efforts to inventory all elements within systems, attacks will still arrive from the outside world in the user portions of the application stack, for example, through SQL injection or using cross-site scripting techniques. The threat of compromised insider privileged users will still be present as will “social engineering” methods of obtaining passwords. However, the “escape” of a rogue, privileged user to a realm from which they can continue their attack has been minimized. We have taken the elements of time and space and defined them to our advantage, creating a high security prison effect and requiring new vulnerability exploits to be executed for each step in the kill chain.

Because attackers generally deal with a limited budget and time in which to execute a successful attack, often even our simplest security approaches are enough to make us the safest house on the block. Also, because all activities that occur within the environment are likely to be well known, the environment effectively generates high-confidence indicators, or signals, with very little noise as a sensor, and anomalies are easy to spot. Given the presentation of those anomalies, and playbooks already available to address many adverse operating conditions, you are providing customers the ability to deliver a credible response to threats, something that many lack today.

Conclusion

The goal of vCloud Air Network service providers and their partners should be identifying cyber security challenges that customers face, as well as which meaningful, coarsely grained packages of managed services can be offered to help tackle those challenges. By aligning with the Managed Security Services Maturity Model, providers can leverage the VMware SDDC and VMware NSX software-defined networking and security capabilities to deliver something truly unique in the enterprise IT industry—a secure hybrid cloud. By further aligning these capabilities and services with those of application migration and DevOps (stay tuned for blogs on those and other subjects), and taking ownership of the full lifecycle of security, the potential of effectively remediating existing threats becomes possible. Together, we can help customers evaluate their risk profile, as well as understand how these techniques can minimize attack points and vectors and reduce response times, while increasing effectiveness in fighting cyber threats.

What you’ll see throughout the Managed Security Services Maturity Model is the creation of a “ubiquity” of security controls across each data center participating in the hybrid cloud. This ubiquity will allow for a consistent, trusted foundation from which the performance of the architecture and operations can be measured. Individual policies can then be constructed across this trusted foundation relative to specific security contexts consisting of applications and their users as well as administrators and their actions, leaving very little room for threats to go unnoticed. As these policies are enforced by the controls of the trusted foundation, cyber security response becomes more agile because all components are performing in a well understood fashion. Think of military special forces training on a “built for purpose” replica of an area they plan to assault to minimize unexpected results. Security operators can now be indoctrinated and immersed, knowing what scenes are expected to play out instead of constantly looking for the needle in the haystack. This will also ultimately create the ideal conditions for helping to rationalize unfettered consumption of elastic resources while also fulfilling the vision and realizing the potential of the hybrid cloud.

Streamlining VMware vCloud Air Network Customer Onboarding with VMware NSX Edge Services

When migrating private cloud workloads to a public or hosted cloud provider, the methods used to facilitate customer onboarding can present some of the most critical challenges. The cloud service provider requires a method for onboarding tenants that reduces the need for additional equipment or contracts, which often create barriers for customers when moving enterprise workloads onto a hosting or public cloud offering.

Customer Onboarding Scenarios

When a service provider is preparing for customer onboarding, there are a few options that can be considered. Some of the typical onboarding scenarios are:

  • Migration of live workloads
  • Offline data transfer of workloads
  • Stretching on-premises L2 networks
  • Remote site and user access to workloads

One of the most common scenarios is workload migration. For some implementations, this means migrating private cloud workloads to a public cloud or hosted service provider’s infrastructure. One path to migration leverages VMware vSphere® vMotion® to move live VMs from the private cloud to the designated CSP environment. In situations where this is not feasible, service providers can supply options for the offline migration of on-premises workloads where private cloud workloads that are marked for migration are copied to physical media, shipped to the service provider, and then deployed within the public cloud or hosted infrastructure. In some cases, migration can also mean the ability to move workloads between private cloud and CSP infrastructure on demand.
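
Where live migration is feasible and both environments are reachable, the migration itself can be driven with PowerCLI. The following is a minimal sketch only, assuming PowerCLI 6.5 or later, connections to both the on-premises and provider vCenter Servers, and placeholder names for the VM, host, datastore, and port group; it is not a complete onboarding procedure.

# Connect to the on-premises and provider vCenter Server instances (placeholder names)
$srcVC = Connect-VIServer -Server "vcenter-onprem.corp.local"
$dstVC = Connect-VIServer -Server "vcenter.provider.example.com"

# Select the workload and the destination compute, storage, and network resources
$vm       = Get-VM -Name "app01" -Server $srcVC
$destHost = Get-VMHost -Name "esxi01.provider.example.com" -Server $dstVC
$destDS   = Get-Datastore -Name "provider-ds01" -Server $dstVC
$destPG   = Get-VDPortgroup -Name "tenant-app-network" -Server $dstVC

# Perform the live cross-vCenter migration, remapping the VM network adapter to the target port group
Move-VM -VM $vm -Destination $destHost -Datastore $destDS `
    -NetworkAdapter (Get-NetworkAdapter -VM $vm) -PortGroup $destPG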

Providing Connectivity for Customer Onboarding

When contemplating the method of migrating workloads during customer onboarding, one of the critical enablers of the migration effort is network connectivity. Some enterprise workloads might require access to Layer 2 segments that exist on-premises, while others might require access to essential underlying infrastructure systems, such as Active Directory/LDAP, DNS, CMDBs, and other systems that cannot be moved due to internal policy mandates or the perceived increased cost and management of duplicating these components. In some cases this access is needed only during the migration, while in others it might be required over a longer period of time, for example with workloads that require on-premises Active Directory or LDAP access.

NSX Edge Service for vCloud Air Network Onboarding

VMware NSX® brings many of the benefits of SDN to vCloud Air Network service providers, including increased security through micro-segmentation, reduced hardware costs, and programmatic control of networking functions and services. VMware NSX provides a useful set of features for vCloud Air Network partners and customers alike. In addition to these capabilities, VMware NSX also provides essential services, such as Virtual Private Networking (VPN) and Network Address Translation, that can be leveraged to facilitate the onboarding of customers to a vCloud Air Network partner infrastructure.

One example of the built-in VPN functionality in VMware NSX is the Layer 2 virtual private network (L2VPN) service. With the L2VPN service, vCloud Air Network providers implementing VMware NSX can provide customers with the ability to extend Layer 2 networks from their on-premises data centers to the VMware NSX environment of their chosen vCloud Air Network service provider. This functionality can even be extended to customers that need the benefits of hosting enterprise workloads on the public cloud, but have not yet implemented VMware NSX within their on-premises data centers.

This powerful VMware NSX Edge™ service provides an SDN solution for workloads that must be migrated to a vCloud Air Network public cloud or hosting provider, while maintaining original IP address and L2 connectivity. The L2VPN feature of VMware NSX is also an efficient way to enable long-distance vSphere vMotion between on-premises and vCloud Air Network hybrid clouds. The combination of these features can be leveraged in scenarios for a one-time vSphere vMotion based migration or ongoing workload mobility between on-premises and the vCloud Air Network infrastructure. For one approach to accomplishing the migration of live workloads, see the Live Workload Mobility to a vCloud Air Network IaaS Provider blog.

Example of vCloud Air Network Workload Migration with L2VPN

Figure: NSX to standalone L2VPN

There are also situations in which a customer will simply need the ability to remotely access their workloads that have been migrated or deployed to their chosen vCloud Air Network partner. Customers want an encrypted connection to maintain the security standards they expect when accessing these critical enterprise workloads. In these scenarios, VMware NSX Edge™ can provide IPsec and SSL VPN services to extend site-to-site and remote user-to-site access to workloads residing behind the NSX Edge appliance within the vCloud Air Network provider’s data center. Additionally, VPN services such as IPsec can be leveraged to enable workloads that are deployed to a vCloud Air Network service provider to access on-premises systems that will not be moved during migration efforts.

Example of Remote Management of vCloud Air Network Workloads with SSL VPN

Figure: SSL VPN-Plus

Conclusion

Using VMware NSX, vCloud Air Network partners are able to take an SDN approach to streamline the onboarding of customer workloads to vCloud Air Network public cloud and hosting environments. From extending L2 networks from a customer’s on-premises data center to a vCloud Air Network powered hosting provider, to enabling remote access to deployed workloads, VMware NSX can be leveraged to assist with customer onboarding without the need for additional hardware. In turn, customers of VMware vCloud Air Network partners benefit by being able to efficiently migrate existing workloads to a vCloud Air Network partner’s vSphere-based infrastructure.

This blog post has outlined some of the features that VMware NSX can provide to VMware vCloud Air Network partners and their end customers for onboarding. Stay tuned for follow-up posts that expand on these use cases and for additional ways that VMware vCloud Air Network partners can ease the path for customer onboarding with VMware NSX.

vCenter Server Scalability for Service Providers

Designing and architecting monster vCloud Air Network service provider environments takes VMware technology to its very limits, in terms of both scalability and complexity. vCenter Server, and its supporting services, such as SSO, are at the heart of the vSphere infrastructure, even in cloud service provider environments where a Cloud Management Platform (CMP) is employed to abstract the service presentation away from vCenter Server.

Meeting service provider scalability requirements with vCenter Server requires optimization at every level of the design, in order to implement a robust technical platform that can scale to its very limits whilst also maintaining operational efficiency and supportability.

This article outlines design considerations around optimization of Microsoft Windows vCenter Server instances and best practice recommendations, in order to maximize operational performance of your vCenter ecosystem, which is particularly pertinent when scaling over 400 host servers. Each item listed below should be addressed in the context of the target environment, and properly evaluated before implementation, as there is no one solution to optimize all vCenter Server instances.

The following is simply a list of recommendations that should, to some extent, improve performance in large service provider environments. This blog targets the Windows variant of vCenter Server 5.x and 6.x with a Microsoft SQL database, which is still the most commonly deployed configuration.

Warning: Some of the procedures and tasks outlined in this article are potentially destructive to data, and therefore should only be undertaken by experienced personnel once all appropriate safeguards, such as backed up data and a tested recovery procedure, are in place.

 

Part 1 – vCenter Server Operational Optimization

vCenter Server Sizing
vCloud Air Network service providers must ensure that the vCenter virtual system(s) are sized accordingly, based on their inventory size. Where vCenter components are separated and distributed across multiple virtual machines, ensure that all systems meet the sizing recommendations set out in the installation and configuration documentation.

vSphere 5.5: https://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html
vSphere 6.0: https://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-6-pubs.html
vSphere 5.1: http://kb.vmware.com/kb/2021202

Distribute vCenter Services across multiple virtual machines (vSphere 5.5)
In vSphere 5.5, depending on inventory size, multiple virtual machines can be used to accommodate different vCenter roles. VMware recommends separating VMware vCenter, SSO Server, Update Manager and SQL for flexibility during maintenance and to improve scalability of the vCenter management ecosystem. The new architecture of vCenter 6 simplifies the deployment model, but also reduces design and scaling flexibility, with only two component roles to deploy.

Dedicated Management Cluster
For anything other than the smallest of environments, VMware recommends separating all vSphere management components onto a separate out-of-band management cluster. The primary benefits of management component separation include:

  • Facilitating quicker troubleshooting and problem resolution as management components are strictly contained in a relatively small and manageable cluster.
  • Providing resource isolation between workloads running in the production environment and the actual systems used to manage the infrastructure.
  • Separating the management components from the resources they are managing.

vCenter to Host operational latency
The number of network hops between the vCenter Server and the ESXi host affects operational latency. The ESXi host should reside as few network hops away from the vCenter Server as possible.

vCenter to SQL Server operational latency
The number of network hops between the vCenter Server and the SQL database also affects operational latency. Where possible, vCenter should reside on the same network segment as the supporting database. If appropriate, configure a DRS affinity rule to ensure that the vCenter Server and database server reside on the same ESXi host, reducing latency still further.
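
As a minimal PowerCLI sketch (the cluster and virtual machine names below are placeholders), such a keep-together rule could be created as follows:

# Keep the vCenter Server and its database server on the same ESXi host (placeholder names)
$cluster = Get-Cluster -Name "Management-Cluster"
$vms     = Get-VM -Name "vcenter01", "vcenter-sql01"

New-DrsRule -Cluster $cluster -Name "vCenter-DB-KeepTogether" -KeepTogether $true -VM $vms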

Java Max Heap Size 
vCloud Air Network service providers must ensure that the maximum heap size for the Java virtual machine is set correctly based on the inventory size. Confirm that the JVM heap settings for vCenter Server, the Inventory Service, SSO, and the Web Client are configured appropriately, and monitor the web services to verify. vSphere 5.1 & 5.5: http://kb.vmware.com/kb/2021302

Concurrent Client Connections
Whilst not always easy, attempt to limit the number of clients connected to vCenter Server, as this affects its performance. This is particularly the case for the traditional Windows C# client.

Performance Monitoring
Employ a performance monitoring tool to ensure the health of the vCenter ecosystem and to help troubleshoot problems when they arise. Where appropriate, configure a vROps Custom Dashboard for vCenter/Management components. Also ensure appropriate alerts and notifications on performance monitoring tools exist.

Virtual disk type
All vCenter Server virtual machine VMDKs should be provisioned in the eagerZeroedThick format. This provides approximately a 10-20 percent performance improvement over the other two disk formats.
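
When adding disks with PowerCLI, for example, the format can be specified explicitly. The VM name and capacity below are placeholders:

# Add an eagerZeroedThick disk to the vCenter Server virtual machine (placeholder values)
New-HardDisk -VM (Get-VM -Name "vcenter01") -CapacityGB 100 -StorageFormat EagerZeroedThick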

vCenter vNIC type
vCloud Air Network service providers should employ the VMXNET3 paravirtualized network adapter to maximize network throughput and efficiency, and to reduce latency.

ODBC Connection
Ensure that the vCenter and VUM ODBC connections are configured with the minimum permissions required for daily operations. Additional permissions are typically required during installation and upgrade activities, but not for day to day operations. Please refer to the Service Account Permissions provided below.

vCenter Logs Clean Up
vCenter Server has no automated way of purging old vCenter log files. These files can grow and consume a significant amount of disk space on the vCenter Server. Consider a scheduled task, run every three to six months, to delete or move log files older than the period of time defined by business requirements.

For instance, the VBscript below can be used to clean up old log files from vCenter. This script deletes files that are older than a fixed number of days, defined in line 9, from the path set in line 6. This VBscript can be configured to run as a scheduled task using the windows task scheduler.

Dim Fso
Dim Directory
Dim Modified
Dim Files
Set Fso = CreateObject("Scripting.FileSystemObject")
Set Directory = Fso.GetFolder("C:\ProgramData\VMware\VMware VirtualCenter\Logs\")
Set Files = Directory.Files
For Each Modified in Files
If DateDiff("D", Modified.DateLastModified, Now) > 180 Then Modified.Delete
Next
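
Where PowerShell is preferred over VBScript, a functionally similar sketch, using the same log path and 180-day threshold as the script above, might look like this:

# Delete vCenter Server log files older than 180 days (same path and threshold as the VBScript above)
$logPath = "C:\ProgramData\VMware\VMware VirtualCenter\Logs\"
Get-ChildItem -Path $logPath -File |
    Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-180) } |
    Remove-Item -Force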

For more information, refer to KB article: KB1021804 Location of vCenter Server log files.
For additional information on modifying logging levels in vCenter please refer to KB1004795 and KB1001584.

Note: Once a log file reaches a maximum size it is rotated and numbered similar to component-nnn.log files and they may be compressed.

Statistics Levels
The statistics collection interval determines the frequency at which statistic queries occur, the length of time statistical data is stored in the database, and the type of statistical data that is collected.

As historical performance statistics can take up to 90% of the vCenter Server database size, they are the primary factor in the performance and scalability of the vCenter Server database. Retaining this performance data allows administrators to view the collected historical statistics, through the performance charts in the vSphere Web Client, through the traditional Windows client, or through command-line monitoring utilities, for up to one year after the data was first ingested into the database.

You must ensure that statistics collection times are set as conservatively as possible so that the system does not become overloaded. For instance, you could set a new database data retention period of 60 days and configure the database not to retain performance data beyond 60 days. At the same time, it is equally important to ensure that the retention of this historical data meets the service provider’s data compliance requirements.
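
The currently configured collection intervals can be reviewed with PowerCLI before any changes are made. The sketch below simply lists the intervals for the connected vCenter Server; adjustments can then be made through the vSphere Web Client under vCenter Server Settings > Statistics, or with the Set-StatInterval cmdlet.

# Review the configured statistics collection intervals on the connected vCenter Server
Get-StatInterval | Format-Table -AutoSize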

As this statistics data consumes such a large proportion of the database, proper management of these vCenter Server statistics is an important consideration for overall database health. This is achieved by the processing of this data through a series of rollup jobs, which stop the database server from becoming overloaded. This is a key consideration for vCenter Server performance and is addressed in more detail in Part 2 of this article.

Task and Events Retention
Operational teams should ensure that the Task and Events retention levels are set as conservatively as possible, whilst still meeting the service provider’s data retention and compliance requirements. Every time a task or event is executed via vCenter, it is stored in the database. For example, a task is created when a user powers a virtual machine on or off, and an event is generated when something occurs, such as the vCPU usage for a VM changing to red.

vCenter Server has a Database Retention Policy setting that allows you to specify how long vCenter Server tasks and events should be kept before they are deleted. This correlates to a database rollup job that purges the data from the database after the selected period of time. Whilst these tables consume a relatively small amount of database space compared to statistical data, it is good practice to consider this option for further database optimization. For instance, by default vCenter is configured to store tasks and events data for 180 days. However, it might be possible, based on the service provider’s compliance requirements, to configure vCenter not to retain event and task data in the database beyond 60 days.
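
The retention policy can also be inspected and adjusted programmatically. The sketch below uses PowerCLI advanced settings against the connected vCenter Server; the setting names event.maxAge and task.maxAge (together with their corresponding maxAgeEnabled settings) are the commonly documented vpxd keys, but verify them against your vCenter Server version before applying a change such as the 60-day example shown here.

# Inspect, and then set, the task and event retention period on the connected vCenter Server
$vc = $global:DefaultVIServer

Get-AdvancedSetting -Entity $vc -Name "event.maxAge", "task.maxAge" |
    Format-Table Name, Value -AutoSize

Get-AdvancedSetting -Entity $vc -Name "event.maxAge", "task.maxAge" |
    Set-AdvancedSetting -Value 60 -Confirm:$false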

vCenter Server Backup Best Practice
In addition to scheduling regular backups of the vCenter Server database, the backups for the vCenter Server should also include the SSL certificates and license key information.

 

Part 2 – SQL DB Server Operational Optimization (for vCenter Server)

SQL Database Server Disk Configuration
The vCenter Server database data file (mdf) generates mostly random I/O, while the database transaction logs (ldf) generate mostly sequential I/O. The traffic for these files is almost always simultaneous, so it is preferable to keep these files on two separate storage resources that don’t share disks or I/O. Therefore, where a large service provider inventory demands it, operational teams should ensure that the vCenter Server database uses separate drives for data and logs which, in turn, are backed by different physical disks.

tempDB Separation
For large service provider inventories, place tempDB on a different drive, backed by different physical disks than the vCenter database files or transaction logs.

Reduce Allocation Contention in SQL Server tempDB database
Consider using multiple data files to increase the I/O throughput to tempDB. Configure 1:1 alignment between TempDB files and vCPUs (up to eight) by spreading tempDB across at least as many equal sized files as there are vCPUs.

For instance, where 4 vCPUs exist on the SQL server, create three additional tempDB data files, and make them all equally sized. They should also be configured to grow in equal amounts. After changing the configuration, a restart of the SQL Server instance is required. For more information please refer to: http://support.microsoft.com/kb/2154845
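
As an illustration, assuming a four-vCPU SQL Server instance and a dedicated tempDB volume mounted at T:\ (both placeholders), the additional files could be added with Invoke-Sqlcmd from the SqlServer PowerShell module:

# Add three equally sized tempDB data files alongside the default file (placeholder instance name and paths)
$tsql = @"
ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdb2.ndf', SIZE = 1024MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = 'T:\TempDB\tempdb3.ndf', SIZE = 1024MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev4, FILENAME = 'T:\TempDB\tempdb4.ndf', SIZE = 1024MB, FILEGROWTH = 512MB);
"@
Invoke-Sqlcmd -ServerInstance "vcenter-sql01" -Query $tsql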

Database Connection Pool
vCenter Server starts, by default, with a database connection pool of 50 threads. This pool is then dynamically sized according to the vCenter Server’s workload. If high load is expected due to a large inventory, the size of the pool can be increased to 128 threads; note that this will also increase the memory consumption and load time of the vCenter Server. To change the pool size, edit the vpxd.cfg file, adding the entry shown below, where ‘128’ is the number of connection threads to be configured.

<vpxd>
  <odbc>
    <maxConnections>128</maxConnections>
  </odbc>
</vpxd>

Table Statistics
Update the statistics of the SQL tables and indexes on a regular basis for better overall performance of the database. Create a SQL Server Agent job to carry out this task or, alternatively, make it part of a vSphere database maintenance plan. http://sqlserverplanet.com/dba/update-statistics

Index Fragmentation (Not Applicable to vCenter 5.1 or newer)
Check for fragmentation of index objects and recreate indexes if needed. This happens with vCenter due to statistics rollups. Defragment when fragmentation exceeds 30%. See KB1003990.

Note: With the new enhancements and design changes made in the vCenter Server 5.1 database and later, this is no longer applicable or required.

Database Recovery Model
Depending on your vCenter database backup methodology, consider setting the database to the SIMPLE recovery model. This model will reduce the disk space needed for the transaction logs as well as decrease I/O load.

Choosing the Recovery Model for a Database: http://msdn.microsoft.com/en-us/library/ms175987(SQL.90).aspx
How to view or Change the Recovery Model of a Database in SQL Server Management Studio: http://msdn.microsoft.com/en-us/library/ms189272(SQL.90).aspx
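
If, after reviewing the backup methodology, the SIMPLE model is chosen, the change is a single T-SQL statement, shown here via Invoke-Sqlcmd. The instance and database names are placeholders (VCDB is simply a common default name for the vCenter Server database):

# Switch the vCenter Server database to the SIMPLE recovery model (placeholder instance and database names)
Invoke-Sqlcmd -ServerInstance "vcenter-sql01" -Query "ALTER DATABASE [VCDB] SET RECOVERY SIMPLE;"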

Virtual Disk Type
Where the vCenter Server database server is a virtual machine, ensure that all VMDKs are provisioned in the eagerZeroedThick format. This option provides approximately a 10-20 percent performance improvement over the other two disk formats.

Verify SQL Rollup Jobs
Ensure all the SQL Agent rollup jobs have been created on the SQL server during the vCenter Server Installation. For instance:

  • Past Day stats rollup
  • Past Week stats rollup
  • Past Month stats rollup

For the full set of stored procedures and jobs please refer to the appropriate article below. Where necessary, recreate MSSQL agent rollup jobs. Note that detaching, attaching, importing, and restoring a database to a newer version of MSSQL Server does not automatically recreate these jobs. To recreate these jobs, if missing, please refer to: KB1004382.

KB 2033096 (vSphere 5.1, 5.5 & 6.0): http://kb.vmware.com/kb/2033096
KB 2006097 (vSphere 5.0): http://kb.vmware.com/kb/2006097

Also, ensure that these jobs reference the vCenter Server database, and not the master or some other database. If they reference any other database, you must delete and recreate the jobs.

Ensure database jobs are running correctly
Monitor scheduled database jobs to ensure they are running correctly. For more information, refer to the KB article Checking the status of vCenter Server performance rollup jobs (KB2012226).

Verify MSSQL Permissions
Ensure that the required local SQL and Active Directory permissions are in place and align with the principle of least privilege (see Part 3 below). If necessary, truncate unrequired performance data from the database (see Purging Historical Statistical Performance Data). For more information, refer to the KB article Reducing the size of the vCenter Server database when the rollup scripts take a long time to run (KB1007453).

Truncate all performance data from vCenter Server
As discussed in Part 1, to truncate all performance data from vCenter Server 5.1 and 5.5:

Warning: This procedure permanently removes all historical performance data. Ensure to take a backup of the database/schema before proceeding.

  1. Stop the VMware VirtualCenter Server service. Note: Ensure that you have a recent backup of the vCenter Server database before continuing.
  2. Log in to the vCenter Server database using SQL Management Studio.
  3. Copy and paste the contents of the SQL_truncate_5.x.sql script (available from the link below) into SQL Management Studio.
  4. Execute the script to delete the data.
  5. Restart the vCenter Server services.

For truncating data in vCenter Server and vCenter Server Appliance 5.1, 5.5, and 6.0, see Selective deletion of tasks, events, and historical performance data in vSphere 5.x and 6.x (2110031)

Shrink Database
After purging historical data from the database, optionally shrink the database. This is an online procedure that reduces the database size and frees up space on the VMDK; however, this activity will not in itself improve performance. For more information, refer to Shrinking the size of the VMware vCenter Server SQL database (KB1036738).

For further information on Shrinking a Database, refer to: http://msdn.microsoft.com/en-us/library/ms189080.aspx
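As an illustration only, assuming the database is named VCDB, the shrink can be performed with DBCC SHRINKDATABASE, leaving some free space to avoid immediate regrowth:

-- Shrink the database, leaving 10 percent free space in the files
DBCC SHRINKDATABASE (VCDB, 10);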

Rebuilding Indexes to Optimize SQL Server Performance
Configure a regular maintenance job to rebuild the vCenter database indexes. See KB2009918.

  1. Back up your vCenter Server database before proceeding. For more information, see Backing up and restoring vCenter Server 4.x and 5.x (1023985).
  2. For a vCenter Server 5.1 or 5.5 database, download and extract the .sql files from the 2009918_rebuild_51.zip file attached to KB2009918.
  3. Connect to the vCenter Server database using SQL Server Management Studio. These steps must be performed against the vCenter database, not the master database.
  4. Execute the .sql file to create the REBUILD_INDEX stored procedure.
  5. Execute the stored procedure created in the previous step: execute REBUILD_INDEX
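Outside of the KB-supplied stored procedure, a DBA can also rebuild indexes directly with T-SQL. The following is a minimal sketch only; the table name is one of the vCenter historical statistics tables and should be adjusted for your schema:

USE VCDB;   -- assumes the default vCenter database name
GO
-- Rebuild all indexes on one of the historical statistics tables (repeat per table as required)
ALTER INDEX ALL ON dbo.VPX_HIST_STAT1 REBUILD;
GO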

VPX_HIST_STAT Table Sizes
VMware recommends a fill factor of 70 percent for the four VPX_HIST_STAT tables. If the fill factor is set too high for the insert activity on the database server, SQL Server must spend time splitting pages as new rows arrive, which equates to additional I/O.

If you are experiencing high, unexplained I/O in the environment, monitor the SQL Server Access Methods counter Page Splits/sec. Page splits are expensive and cause tables to perform more poorly because of fragmentation, so the fewer page splits you have, the better your system will perform.

By decreasing the fill factor in your indexes, you increase the amount of empty space on each data page. The more empty space there is, the fewer page splits you will experience. On the other hand, too much unnecessary empty space can also hurt performance, because less data is stored per page, which means more disk I/O is needed to read tables and less data can be held in the buffer cache.

A high Page Splits/sec value results in the database being larger than necessary and in more pages to read during normal operations.
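To observe page split activity and, where appropriate, apply the recommended fill factor, something like the following sketch can be used. The index name is a placeholder; identify the actual indexes on the VPX_HIST_STAT tables before rebuilding:

-- Page Splits/sec is a cumulative counter; sample it over an interval to derive a rate
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Access Methods%'
  AND counter_name LIKE 'Page Splits/sec%';

-- Rebuild an index with a 70 percent fill factor (index and table names are examples only)
ALTER INDEX IX_VPX_HIST_STAT1 ON dbo.VPX_HIST_STAT1
REBUILD WITH (FILLFACTOR = 70);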

To determine where growth is occurring in the VMware vCenter Server database, refer to: http://kb.vmware.com/kb/1028356

For troubleshooting VPX_HIST_STAT table sizes in VMware vCenter Server 5, refer to: KB2038474

To reduce the size of the vCenter Server database when the rollup scripts take a long time to run, refer to: KB1007453

Monitor Database Growth
Service provider operational teams should monitor vCenter Server database growth over a period of time to ensure the database is functioning as expected. For more information, refer to KB article: Determining where growth is occurring in the vCenter Server database KB1028356

Schedule and verify regular database backups
The vCenter, SSO, VUM, and SRM servers are, by themselves, stateless. The databases are far more critical because they store all of the configuration and state information for each of the management components. These databases must be backed up nightly, and the restore process for each database must be tested periodically.

Operational teams should ensure that a regular backup schedule exists for the vCenter database and, based on business requirements, periodically restore and mount databases from backup onto a non-production system to verify that a clean recovery is possible should database corruption or data loss occur in the production environment.

Create a Maintenance Plan for vSphere databases
Work with the DBAs to create a daily and weekly database maintenance plan, for instance (a simplified T-SQL sketch follows the warning below):

  • Check Database Integrity
  • Rebuild Index
  • Update Statistics
  • Back Up Database (Full)
  • Maintenance Cleanup Task

Warning: DO NOT SHRINK DB IN MAINTENANCE PLAN UNLESS THERE IS A SPECIFIC REQUIREMENT TO RECLAIM DISK SPACE: http://msdn.microsoft.com/en-us/library/ms189080.aspx
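A simplified T-SQL outline of these tasks might look like the following. The database name and backup path are placeholders, and in practice the tasks are normally implemented as a SQL Server maintenance plan or SQL Agent job rather than an ad hoc script:

USE VCDB;
GO
DBCC CHECKDB (VCDB);                                        -- check database integrity
EXEC sp_updatestats;                                        -- update statistics
-- index rebuilds are covered in the section above
BACKUP DATABASE VCDB TO DISK = 'B:\Backups\VCDB_full.bak';  -- full backup (path is an example only)
GO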

 

Part 3 – Service Account Permissions (Least Privilege)

vCenter Service Account
The vCenter service account is used by the ODBC connection to access the database and must be configured with the db_owner role on the vCenter database for normal operational use. In addition, during installation or upgrade of vCenter Server, the account used for the ODBC connection requires the db_owner role on the MSDB system database. This permission facilitates the installation of the SQL Agent jobs used for the vCenter statistic rollups.

Typically, the DBA should grant the vCenter service account the db_owner role on the MSDB system database only while installing or upgrading vCenter, and revoke that role when these activities are complete.
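For instance, the DBA might grant and later revoke the role as follows. The service account name is a placeholder, and sp_addrolemember is used here because it is available on the older SQL Server versions supported by vCenter 5.x:

USE msdb;
GO
-- Grant before the vCenter installation or upgrade (account name is an example only)
EXEC sp_addrolemember 'db_owner', 'DOMAIN\svc-vcenter';
GO
-- Revoke once the installation or upgrade is complete
EXEC sp_droprolemember 'db_owner', 'DOMAIN\svc-vcenter';
GO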

RSA_DBA (vSphere 5.1 Only)
Required only for SSO 5.1, the RSA_DBA account is a local SQL account that is used for creating the schema (DDL) and requires the db_owner role.

RSA_USER (vSphere 5.1 Only)
Required only for SSO 5.1, the RSA_USER account is a local SQL account that reads and writes data (DML only).

VUM Service Account
VUM requires a 32-bit ODBC DSN, created with C:\Windows\SysWOW64\odbcad32.exe, even on a 64-bit operating system. The VUM service account must be granted the db_owner role on the VUM database. The installation of vCenter Update Manager 5.x and 6.x with a Microsoft SQL back-end database also requires the ODBC connection account to temporarily hold the db_owner role on the MSDB system database. This was a new requirement in vSphere 5.0.

As with the vCenter service account, the DBA would typically grant the VUM service account the db_owner role on the MSDB system database only during an installation or upgrade of the VUM component of vCenter, and revoke the permission when that task is complete.

Leveraging Virtual SAN for Highly Available Management Clusters

A pivotal element in every Cloud Service Provider service plan is the class of service offered to tenants. The number of moving parts in a data center raises legitimate questions about the reliability of each component and its influence on the overall solution. Cloud infrastructure and services are built on the traditional three pillars of compute, networking, and storage, assisted by security and availability technologies and processes.

The Cloud Management Platform (CMP) is the management foundation for VMware vCloud® Air Network™ providers with a critical set of components that deliver a resilient environment for vCloud consumers.

This blog post highlights how a vCloud Air Network provider can leverage VMware Virtual SAN™ as a cost effective, highly available storage solution for cloud services management environments, and how the availability requirements set by the business can be achieved.

Management Cluster

A management cluster is a group of hosts joined together and reserved for powering the components that provide infrastructure management services to the environment, some of which include the following:

  • VMware vCenter Server™ and database, or VMware vCenter Server Appliance™
  • VMware vCloud Director® cells and database
  • VMware vRealize® Orchestrator™
  • VMware NSX® Manager™
  • VMware vRealize Operations Manager™
  • VMware vRealize Automation™
  • Optional infrastructure services to adapt the service provider offering (LDAP, NTP, DNS, DHCP, and so on)

To help guarantee predictable reliability, steady performance, and separation of duties as a best practice, a management cluster should be deployed over an underlying layer of dedicated compute and storage resources without having to compete with business or tenant workloads. This practice also simplifies the approach for data protection, availability, and recoverability of the service components in use on the management cluster.

[Figure: Leveraging Virtual SAN for highly available management clusters (1)]

Rationale for a Software-Defined Storage Solution

The use of traditional storage devices in the context of the Cloud Management Platform requires the purchase of dedicated hardware to provide the necessary workload isolation, performance, and high availability.

In the case of a Cloud Service Provider, the cost and management complexity of these assets would most likely be passed on to the consumer through the service costs, with the risk of producing a less competitive offering. Virtual SAN can dramatically reduce cost and complexity for this dedicated management environment. Some of the key benefits include the following:

  • Reduced management complexity because of the native integration with VMware vSphere® at the hypervisor level and access to a common management interface
  • Independence from shared or external storage devices, because Virtual SAN abstracts the hosts' locally attached storage and presents it as a uniform datastore to the virtual machines
  • Granular, virtual machine-centric policies that allow you to tune performance on a per-workload basis

Availability as a Top Requirement

Availability is defined as "the degree to which a system or component is operational and accessible when required for use" [IEEE 610]. It is commonly calculated as a percentage and often measured in terms of the number of 9s.

Availability = Uptime / (Uptime + Downtime)

To calculate the overall availability of a complex system, multiply the availability percentages of its components.

Overall Availability = Element#1(availability %) * Element#2(availability %) * … * Element#n(availability %)
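For example, if a management stack depends on three components that each provide 99.99% availability, the overall availability is approximately 0.9999 × 0.9999 × 0.9999 ≈ 99.97%, which corresponds to roughly 2.6 hours of potential downtime per year (the figures are illustrative only).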

 

Number of 9s | Availability % | Downtime per year | System/component inaccessible
1 | 90% | 36.5 days | Over 5 weeks per year
2 | 99% | 3.65 days | Less than 4 days per year
3 | 99.9% | 8.76 hours | About 9 hours per year
4 | 99.99% | 52.56 minutes | About 1 hour per year
5 | 99.999% | 5.26 minutes | About 5 minutes per year
6 | 99.9999% | 31.5 seconds | About half a minute per year

When defining the level of service for its offering, the Cloud Service Provider takes this data into account to compute the expected availability of the systems provided. In this way, the vCloud consumer can correctly plan the placement of their own workloads according to workload criticality and business needs.

In a single or multi-tenant scenario, because the management cluster is transparent to the vCloud consumers, the class of service for this set of components is critical to delivering a resilient environment. If Service Level Agreements are defined between the Cloud Service Provider and the vCloud consumers, the availability of the CMP should match, or at least be comparable to, the highest requirement defined across those SLAs, to maintain both the management cluster and the resource groups in the same availability zone.

Virtual SAN and High Availability

To support a critical management cluster, the underlying SDS solution must fulfill strict high availability requirements. Some of the key elements of Virtual SAN include the following:

  • Distributed architecture implementing a software-based data redundancy, similar to hardware-based RAID, by mirroring the data, not only across storage devices, but also across server hosts for increased reliability and redundancy
  • Data management based on data containers: logical objects carrying their own data and metadata
  • Intrinsic cost advantage by leveraging commodity hardware (physical servers and locally-attached flash or hard disks) to deliver mission critical availability to the overlying workloads
  • Seamless ability to scale out capacity and performance by adding more nodes to the Virtual SAN cluster, or to scale up by adding new drives to the existing hosts
  • Tiered storage functionality through the combination of storage policies, disk group configurations, and heterogeneous physical storage devices

Virtual SAN allows a storage policy configuration that defines the number of failures to tolerate (FTT), which determines how many copies of each virtual machine object are stored across the cluster (FTT + 1 copies). This policy can increase or decrease the level of redundancy of the objects and their degree of tolerance to the loss of one or more nodes in the cluster.
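For example, with FTT set to 1, each object is stored as two copies, so a 100 GB virtual disk consumes roughly 200 GB of raw Virtual SAN capacity in exchange for tolerating the failure of a single host (the figures ignore the small witness components and are illustrative only).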

Virtual SAN also supports and integrates VMware vSphere® High Availability (HA) features, including the following:

  • In the case of a physical host failure, vSphere HA restarts the affected virtual machines on the remaining hosts
  • VMware vSphere Fault Tolerance (FT) provides continuous availability for virtual machines (applications) up to a limited size of 4 vCPUs and 64 GB RAM
  • VMware vSphere Data Protection™ provides a combination of backup and restore features for both virtual machines and applications

[Figure: Leveraging Virtual SAN for highly available management clusters (2)]

Architecture Example

This example provides a conceptual system design for implementing a CMP with basic resiliency, supported by Virtual SAN, in a cloud service provider scenario. The key elements of this design include the following:

  • Management cluster located in a single site
  • Two fault domains identified by the rack placement of the servers
  • A Witness to achieve a quorum in case of a failure, deployed on a dedicated virtual appliance (a Witness Appliance is a customized nested ESXi host designed to store objects and metadata from the cluster, pre-configured and available for download from VMware)
  • Full suite of management products, including optional CSP-related services
  • Virtual SAN failures to tolerate (FTT) policy set to 1 (two copies per object)
  • vSphere High Availability feature enabled for the relevant workloads

This example is a starting point that can provide an overall availability close to four 9s (99.99%). Virtual SAN can provide higher availability by increasing the number of copies per object (FTT) and the number of fault domains.

Some of the availability metrics for computing overall availability are variable and lie outside the scope of this blog post, but they can be summarized as the following:

  • Rack (power supplies, cabling, top of rack network switches, and so on)
  • Host (physical server and hardware components)
  • Hard disks MTBF (both SSD and spindle)
  • Hard disks capacity and performance (influence rebuild time)
  • Selection of the FTT, which influences the required capacity across the management cluster

[Figure: Leveraging Virtual SAN for highly available management clusters (3)]

The complete architecture example will be documented and released as part of the VMware vCloud Architecture Toolkit™ for Service Providers in Q1 2016.

 

Migration Strategies for vCloud Air Network Service Providers

As a vCloud Air Network service provider, building and offering hybrid cloud services to customers based on the SDDC is only half of the battle. Making sure that customers can consume that service smoothly becomes a critical area of focus. The less friction this cloud migration process has, the faster the customer's time to value and the service provider's time to revenue.

To address this rather broad subject, VMware is publishing a new document, “Migration Strategies for Hybrid Cloud,” in the VMware vCloud Architecture Toolkit™ for Service Providers (vCAT-SP). This blog introduces high-level concepts from that document. These concepts are meant to help both service providers and customers alike understand the opportunities and challenges when undergoing migration to a hybrid cloud scenario. Because this topic covers a vast area of information, the document only covers a few of the use cases available. Many of the more advanced use cases are accomplished through VMware Technology Partners, so stay tuned to the vCAT blog for additional information on how these solutions can be leveraged for migration to the hybrid cloud.


Figure 1. High Level Tool Categories for Migration

Looking at the figure above, we can see there are four main categories of tools available to accomplish the different phases of the migration workflow. While each category might be delivered as a discrete, single-purpose tool, it is often the case that a single tool functions in more than one category. It is also quite likely, for most migration use cases, that operators must coordinate activities between the tools in a workflow. Some of these capabilities, or in the best case all of them, are integrated to help the parties involved carry out the significant number of steps required for each migration instance. Leveraging the SDDC and its APIs provides the opportunity to automate as many of these steps as possible, and many of the available tools facilitate some level of this automation.

More often than not, however, the governance of migration projects, or even programs, should be addressed through a Migration Center of Excellence (COE). This Migration COE, typically hosted by the service provider, contains one or more instances of this tool chain, constructed to allow customers and potentially other partners to come together and understand all of the variations that may drive migrations. Too often there is a rush to the workload migration tools themselves to relocate applications to the cloud without considering the pitfalls, risks, and even upside potential revealed by looking at the problem holistically. Specifically, we want customers to leverage the SDDC along with other VMware and Technology Partner solutions to introspect current application architectures, as well as to plan, and perhaps automate, the target tenant consumption of the service provider. The Migration COE allows us to visualize the "best fit" combination of tools and processes to plan the customer migration experience. The more information that can be applied to the process, the better.

By gleaning all of the potential information about source infrastructure and applications, we can create a repository of knowledge to plan migrations. The more virtualization and cloud-oriented solutions that are installed on the customer premises, such as VMware NSX® or VMware vRealize®, the more “migration ready” applications under the management of that infrastructure become. This is due to the ubiquity of the target hybrid cloud architectures, both built on the VMware SDDC. The primary function of the discovery and assessment tools is to ascertain the dependencies of the applications at a functional technology level. Examples of this might be DNS, PKI, or other authentication/authorization services, such as LDAP, that need to be made available to the application in its new post-migration home. Determining these dependencies will go far in planning for the serialization and parallelization of follow-on tasks related to migration and help to feed the three downstream task types—job scheduling, workload migration, and application verification. A great example of this customer-centric approach to discovery and assessment leverages VMware NSX and VMware vRealize Log Insight™. Once configured, the solution provides visualization of network activities through the Log Insight NSX for vSphere Content Pack v3, including application component interaction through networks and ports as described in this video.

Another important topic discussed in the migration document is workload mobility. There are a number of ways to provide hybrid cloud network connectivity (some are described in the blog Streamlining VMware vCloud Air Network Customer Onboarding with VMware NSX Edge Services), and many ways in which customers understand the concept of workload mobility. Because of the SDDC abstraction, many concepts discussed in the vCAT-SP use the terms “underlay” and “overlay”. While there is an obvious requirement for Layer 3 network connectivity to each site, the architecture will depend on the VMware software capabilities available at each site. Customers may choose VMware vSphere® metro clusters, a disaster avoidance scenario using VMware vSphere Replication™, or disaster recovery with VMware Site Recovery Manager™.

The Migration COE may include recommended methods based on any or all of these capabilities to help determine which may be appropriate in which situations. The hybrid network types in the previous paragraph provide workload mobility in the SDDC portion of the underlay and require VMkernel ports and operations. Migration solutions discussed in the vCAT-SP migration document, however, focus on the overlay consumption of hybrid cloud networks provided by VMware NSX in the creation of target environment capabilities that facilitate acceptable application characteristics in the new hybrid cloud location. An example is the creation of VMware NSX Distributed Firewall policies for application-centric micro-segmentation, as described in the vCAT-SP blog, Micro-Segmentation with NSX for vCloud Air Network Service Providers. Because the overall cost of labor in a migration can exceed 50 percent, as described by the pro forma cost model in this Forrester brief and detailed in this blog, migration becomes the linchpin of the entire process of acquiring new customers and recognizing revenue from the services they consume. Choosing the right combination of tools and labor is therefore in the critical path to making sure migrations function optimally.

Another critical facet that might sit outside of the Migration COE is capacity planning. The different methods used for workload mobility require specific underlay network capabilities, mainly bandwidth/throughput and latency, to achieve their goals. More information on underlay networking for hybrid cloud can be found in the vCAT-SP document, Architecting a Hybrid Mobility Strategy with VMware vCloud Air Network. It is important to understand that the entire phenomenon of workload mobility, including migration, is a numbers game, and not just one of network performance. Customers will demand an understanding of how the application will be managed for performance and maintenance in the new environment, perhaps through SLAs, which in turn feeds the forecast of the service provider's TCO for the hosting environment. Provider compute/storage/network infrastructure must be provisioned in time to accommodate new tenant migration activities, including potential shared transfer storage, along with ongoing performance requirements. Some of the main drivers for the application cutover itself can be related to Recovery Point Objectives and Recovery Time Objectives, perhaps requiring the introduction of a hardware storage replication scheme into the mix. Consider also the operational lead times for deploying and making these items ready for consumption, and the potential ROI from automating as many tasks as possible.

Finally, one of the key reasons a service provider would encourage its customers to collect the fullest amount of data possible is to use that data to predict which customer workloads come with a "stickiness" toward new services offered by the service provider and its partners. The ability to digest and manage all of this data in an effective, holistic way provides agility, creating a migration "funnel" of activities that fully leverages, but does not exceed, available capacities. This is achieved while also sustaining transparency to stakeholders, which is very powerful when a new journey is undertaken. Because vCloud Air Network offerings are built on the VMware SDDC, you can be confident that they offer the greatest compatibility and ease of both migration and the mapping of new operational procedures based on best practices in the vCAT-SP.