
vRealize Automation Configuration with CloudClient for vCloud Air Network

As a number of vCloud Air Network service providers enhance their existing hosting offerings, VMware is seeing demand from service providers to offer a dedicated vRealize Automation implementation to their end customers, enabling those customers to consume application services, heterogeneous cloud management, and provisioning in a self-managed model.

This blog post details an implementation option where the vCloud Air Network service provider can offer “vRealize Automation as a Service” hosted in a vCloud Director vApp, with some additional automated configuration. This allows the service provider to offer vRealize Automation to customers from their existing multi-tenant IaaS platforms and achieve high levels of efficiency and economies of scale.

“vRealize Automation as a Service”

During a recent proof of concept demonstrating such a configuration, a vCloud Director Organization vDC was configured for tenant consumption. Within this Org vDC, a vApp containing a simple installation of vRealize Automation was deployed, consisting of a vRealize Automation Appliance and one Windows Server hosting the IaaS components and an instance of Microsoft SQL. With vRealize Automation successfully deployed, the instance was customized using vRealize CloudClient driven by Microsoft PowerShell scripts. Using this method to configure the tenant within vRealize Automation reduced the deployment time for vRealize Automation instances while ensuring that the tenant configuration was consistent and conformed to the pre-determined naming standards and conventions required by the provider.

vRaaS vCAN Operations

vRealize Automation

To reduce the complexity of the implementation, vRealize Automation was deployed within a vApp using the simple install method, which was determined to meet the anticipated user and workload requirements of the intended consumers. In this solution we leveraged the minimal installation, where each instance of vRealize Automation consists of:

  • vRealize Automation Appliance – The vRealize Automation Appliance provides the vRealize Automation portal, Identity services, and vRealize Orchestrator.
  • Windows IaaS Server – The Windows IaaS server includes an instance of Microsoft SQL, the vRealize Automation Model Manager, the vRealize Automation Manager Service, the DEM Orchestrator, and a DEM Worker.

Tenant Consumption

vRealize Automation Deployment

Deployment of vRealize Automation can be carried out either through manual configuration or through a scripted installation. With the new installation wizard introduced in vRealize Automation 7, the level of effort required for a Simple Install of vRealize Automation has been reduced. For the purposes of this discussion, we will assume that the manual installation method is used.

 

vRealize Automation Configuration

An important place to introduce automation in this process is the configuration of vRealize Automation itself: the creation of Fabric Groups and Business Groups, as well as Blueprint and Entitlement configuration. vCloud Air Network operations admins can choose to script these configuration steps by leveraging vRealize CloudClient. CloudClient is a Java-based command line utility that leverages the vRealize Automation API and can be used to configure vRealize Automation as well as display and export its configuration details. Let’s take a look at some of the considerations when using CloudClient and Microsoft PowerShell to carry out post-configuration tasks.

PowerShell Script and CloudClient

When using CloudClient for scripting, one of the first steps is to create and configure the cloudclient.properties file. This file contains the environment variables to be used when calling CloudClient for scripting tasks. Refer to the CloudClient documentation for detailed steps on the creation and configuration of the cloudclient.properties file.
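As an illustration only, a minimal cloudclient.properties might look like the following; the property names shown here simply mirror the environment variables used later in this script, so verify the exact keys against the CloudClient documentation for your version.

vra_server=vra01.corp.local
vra_tenant=vsphere.local
vra_username=configurationadmin@vsphere.local
vra_password=VMware1!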

The PowerShell script is configured to accept a set of parameters supplied in line when the script is executed. One benefit of doing this up front is that the script is already prepared for remote execution from an orchestration engine such as vRealize Orchestrator.

param(
$idStoreDomain,
$idStoreBaseDn,
$idStoreLoginUserDn,
$idStoreDcUrl,
$tenantName,
$customerPrefix,
$credsUserName, 
$credsPassword, 
$ComputeResourceName
)
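The $CMD variable referenced throughout the script holds the path to the CloudClient launcher and is not defined by the param block above. A minimal sketch follows, assuming a hypothetical Windows installation path; adjust it to your own CloudClient location.

## Path to the CloudClient launcher (hypothetical install location)
$CMD = "C:\CloudClient\bin\cloudclient.bat"

The script (here given the hypothetical name Configure-vRATenant.ps1) could then be invoked with its parameters supplied in line, for example:

.\Configure-vRATenant.ps1 -tenantName "customer01" -customerPrefix "cust01" `
    -idStoreDomain "corp.local" -idStoreBaseDn "dc=corp,dc=local" `
    -idStoreLoginUserDn "cn=svc-vra,ou=Service Accounts,dc=corp,dc=local" `
    -idStoreDcUrl "ldap://dc01.corp.local:389" `
    -credsUserName "svc-vra@corp.local" -credsPassword "VMware1!" `
    -ComputeResourceName "Cluster01"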

 
CloudClient Environment Variables

While the cloudclient.properties file contains settings that can be leveraged for scripted execution of CloudClient commands, these settings are static and may need to be changed during the scripted execution of some CloudClient commands to ensure the correct credentials are used. For example, while most CloudClient commands require credentials with the Tenant Administrator role, other commands, such as “vra tenant identitystore add”, require the System Administrator account (administrator@vsphere.local) for successful execution.

To address this in the PowerShell script, we prefix CloudClient environment variables with “$env:” followed by the name of the CloudClient environment variable to be updated. Here is an example of updating the required environment variables to run CloudClient as the administrator@vsphere.local user.

##---------------------------------------------------------------------------
## Set the Environment Variables for use with the Add Identity Source Command
##---------------------------------------------------------------------------

$env:CLOUDCLIENT_SESSION_KEY="administrator"
$env:vra_server="vra01.corp.local"
$env:vra_username="administrator@vsphere.local"
$env:vra_tenant="vsphere.local"
$env:vra_password="VMware1!"

 

After the credentials for the administrator@vsphere.local account have been set, we can proceed to execute the CloudClient commands to add the required accounts to the Tenant Administrator and IaaS Administrator roles:

###-------------------------------------------------------------------------
### 'vra tenant identitystore add' Section - Add Identity Store to 
###  vsphere.local Tenant
###-------------------------------------------------------------------------


## Construct 'vra identitystore add' Command
$idStoreAddCommand = $CMD + " vra tenant identitystore add --tenantname " + $tenantName + " --name "+ $idStoreDomain + " --domain " + 
$idStoreDomain + " --groupbasedn " + $idStoreBaseDn + " --userdn " + $idStoreLoginUserDn + " --password " + $credsPassword + 
" --type AD --url " + $idStoreDcUrl + " --userbasedn " + $idStoreBaseDn

## Print 'vra identitystore add' Command to screen and then execute
Write-Host $idStoreAddCommand
Invoke-Expression $idStoreAddCommand


###-------------------------------------------------------------------------
### 'vra tenant admin update' Section - Update Infrastructure Admin role for 
###  vsphere.local Tenant
###-------------------------------------------------------------------------

## Declare IaaS Admin Group
$iaasGroup = $customerPrefix + "-iaasadmin@" + $idStoreDomain

## Construct 'vra tenant admin update' Command
$tenantUpdateCommand = $CMD + " vra tenant admin update --tenantname " + 
$tenantName + " --role IAAS_ADMIN --action ADD --users " + $iaasGroup

## Print 'vra tenant admin update' Command to screen and then execute
Write-Host $tenantUpdateCommand
Invoke-Expression $tenantUpdateCommand 

 

In the above example, we declare variables to construct the CloudClient commands “vra tenant identitystore add” and “vra tenant admin update” with the desired parameters, of which the former requires the administrator@vsphere.local credentials. We then use the “Invoke-Expression” PowerShell cmdlet to run the resulting CloudClient commands.
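Because this build-and-invoke pattern repeats for every CloudClient call, it can optionally be wrapped in a small helper function. The following is a sketch, not part of the original script, that preserves the same Write-Host and Invoke-Expression behavior:

function Invoke-CloudClientCommand {
    param(
        [string]$Arguments   # Everything after the CloudClient launcher path
    )
    ## Echo the full command for logging, then execute it
    $command = "$CMD $Arguments"
    Write-Host $command
    Invoke-Expression $command
}

## Example: the tenant admin update shown above becomes a single call
Invoke-CloudClientCommand -Arguments ("vra tenant admin update --tenantname $tenantName " +
    "--role IAAS_ADMIN --action ADD --users $iaasGroup")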

Once we have completed the necessary commands to update the Tenant and IaaS Administrator roles, we can update the environment credential variables for the configuration of the remaining vRealize Automation tenant constructs:

##------------------------------------------------------------------
## Set Environment Variables for use with the rest of the commands
##------------------------------------------------------------------

$env:CLOUDCLIENT_SESSION_KEY="configurationadmin"
$env:vra_server="vra01.corp.local"
$env:vra_username="configurationadmin@vsphere.local"
$env:vra_tenant="vsphere.local"
$env:vra_password="VMware1!" 

 

At this point, additional scripting can be created to continue the customer-specific configuration of the tenant, such as:

  • Creation of the customer’s vCloud Director Organization vDC as an Endpoint
  • Fabric Group creation
  • Machine prefix
  • Business Group creation

Additionally, Services, Entitlements, and the required Actions can be created for the consumption of pre-created Converged Blueprints backed by standard templates offered by the vCloud Air Network service provider. Once the necessary scripted tasks have been completed, reservations are created manually and the vRealize Automation instance can be turned over to the customer.

Conclusion

In this post we have explored some basic examples of using CloudClient and PowerShell to script the configuration of vRealize Automation.  This powerful tool can also be used by vCloud Air Network partners to automate the configuration of vRealize Automation instances on a per customer basis, creating a “vRealize Automation as a Service” (vRAaaS) offering that is managed by the service provider, combining the multi-tenancy of vCloud Director with the unique self-service portal experience of vRealize Automation.

Deep Dive Architecture Comparison of DaaS & VDI, Part 2

In part 1 of this blog series, I discussed the Horizon 7 architecture and a typical single-tenant deployment using Pods and Blocks. In this post I will discuss the Horizon DaaS platform architecture and how this offers massive scale for multiple tenants in a service provider environment.

Horizon DaaS Architecture

The fundamental difference with the Horizon DaaS platform is its multi-tenancy architecture. There are no Connection or Security servers, but there are some commonalities. I mentioned Access Point previously; it was originally developed for Horizon Air and is now a key component of both Horizon 7 and DaaS for remote access.

 

Horizon DaaS Architecture

If you take a look at the diagram above you’ll see these key differences. Let’s start with the management appliances.

Deep Dive Architecture Comparison of DaaS & VDI, Part 1

In this two-part blog series, I introduce the architecture behind Horizon DaaS and the recently announced Horizon 7. From a service provider point of view, the Horizon® family of products offers massive scale for both single-tenant deployments and multi-tenant service offerings.

Many of you are very familiar with the term Virtual Desktop Infrastructure (VDI), but I don’t think the term does any justice to the evolution of the virtual desktop. VDI can have very different meanings depending on who you are talking to. Back in 2007 when VMware acquired Propero, which soon became VDM (then View and Horizon), VDI was very much about brokering virtual machines running a desktop OS to end-users using a remote display protocol. Almost a decade later, VMware Horizon is vastly different and it has matured into an enterprise desktop and application delivery platform for any device. Really… Horizon 7 is the ultimate supercar of VDI compared to what it was a decade ago.

I’ve read articles that compare VDI to DaaS, but they all seem to skip this evolution of VDI and compare DaaS to the traditional desktop broker of the past. DaaS, on the other hand, provides the platform of choice for service providers offering Desktops as a Service. The DaaS platform came to VMware through the acquisition of Desktone in October 2013. In fact, I remember the day of the announcement because I was working on a large VMware Horizon deployment for a service provider at the time.

For this blog post, I’d like to start by comparing the fundamental architecture of the Horizon DaaS platform with that of Horizon 7, which was announced in February 2016. This article is aimed at consultants and architects wishing to learn more about the DaaS platform.

Managed Security Services Maturity Model for vCloud Air Network Service Providers

Introduction

We’ve all heard about the many successful cyber-attacks carried out in various industries. Rather than cite a few examples to establish background, I would encourage you to review the annual report from Verizon called the Data Breach Digest. This report gives critical insight for understanding how the most pervasive attacks are executed and what to protect against to impede or prevent them. In order to provide a sound architecture and operational model for this purpose of protection, let’s look at some universal principles that have emerged as a result of forensics from these events. Those principles are time and space. Space, in this case, is cyberspace and involves the moving digital components of the target systems that must be compromised to execute a successful attack. Time involves events that may occur at network or CPU speed, but it is the ability to trap those events and put them into a human context, in terms of minutes, hours, or days, where security operations can respond. The combination of unprotected attack vectors, already compromised components of the system, and the inability to spot them creates what are known as “blind spots” and “dwell time,” where an attacker can harvest additional information and potentially expand to other attack vectors.

While all of that is hopefully easy to understand, we have to face the reality that many attacks still occur by using compromised credentials obtained through social engineering. These credentials provide enough privilege to establish a foothold for command and control used in a cyber-attack. For this reason, we want to employ one of the core principles of the Managed Security Services Maturity Model, known as Zero Trust, or the idea that every action must have specific authentication, authorization and accounting (AAA) defined. By subscribing to this maturity model as a VMware vCloud® Air™ Network service provider, you will uncover ways in which you can leverage features, such as VMware NSX® Distributed Firewall and micro-segmentation, putting you well on the road to offering services that can help customers address potential blind spots and reduce dwell time, thereby taking control and ownership of their cyber risk posture. No matter how nefarious a rogue entry into target systems is, or what escalated privilege was acquired, the Managed Security Services Model will limit the kind of lateral movement necessary to conduct consistent ongoing attacks, or what is known as an advanced persistent threat (APT). Although not all occurrences are APTs, by understanding the methods used in these most advanced attacks, we can isolate and protect aspects of the system required to execute a “kill chain,” essentially allowing ownership of a system in undetectable ways.

Managed Security Services Maturity Model

Cyber security, in its entirety, is a vast concept not to be given justice with a small set of blog articles and white papers. However, given the expansive nature of cyber-threats in this day and age, along with the ratio of successful attacks, information technology needs to continually seek out new approaches. One approach is to create as much of an IT environment as possible from known patterns and templates of installed technologies that can be deployed with a high fidelity of audit information to measure their collective effectiveness against cyber-threats. This turns on its head the idea of protecting environments against an exponentially exploding number of threats with greater diversity in the areas frequently attacked, and instead refines deployed environments to accept only activities that are well defined, with results that are well understood. Simply put, measure what you can trust. If it can’t be measured, it can’t be trusted.

Once again, this approach touches on a large concept, but it is finite in nature in that its definition seeks to gain the control needed to deliver sustainable security operations for customers. To further illustrate this point, let’s think about what a control is and what the maturity model affords the operator in pursuit of their target vision. First is the idea of “control,” which, simply put in cyber security terms, means defining a behavior that can be measured. This could be architecture patterns expected from the provider layer, such as data privacy or geo-location, or automation and orchestration of security operations. Second is the maturity model itself, which has prerequisites for executing on specific rungs of the model, along with providing operational and security benefits. One output of each rung of the maturity model is the potential set of services to be offered to aid in the completion of the customer’s target cyber security vision.

Enter the Managed Security Services Maturity Model, which encodes the methodology for capturing each customer’s ideal approach and provides five different maturity “layers” that aid vCloud Air Network service providers in delivering highly secure hybrid cloud environments. Looking at Figure 1, we can see that the ideas of time and “geometry” (networks and boundaries we have defined), along with the provider (below the horizontal blue line) and consumer (operating system and application runtimes) layers, provide us the cyber dimensions we seek to define and measure.


Figure 1. Managed Security Services Maturity Model

Like most capability maturity models, when starting from the bottom we can often borrow attributes and patterns for service from the layers above. Generally, however, we need to accomplish the prerequisites for the upper layers (Orchestrated and above) to truly be considered operating at that layer. Often, there are issues of completeness where we must perform these prerequisite tasks n number of times in the design of our architecture and operations to have mobility to upper levels. For instance, to complete the Automation level, you should plan to automate on the order of about a dozen elements although your mileage may vary.

You may find more work to be done moving up the levels as you determine the right composition and critical mass of controls appropriate to deliver for targeted customer profiles. In the case of our maturity model, we will bind several concepts at each level to ultimately achieve the Zen-like “Advanced” layer 5, where we truly realize the completeness of the vision to own cyber security for our customers. A big responsibility to be sure, but perhaps a bigger opportunity to change the game from the status quo. The offering of managed services composed of facets from all levels is not for everyone but there is plenty of room to add value from all layers.

We have defined the following layers for the Managed Security Services Maturity Model:

  1. Basic

At this level, we introduce VMware NSX, VXLAN, and the Distributed Firewall to the hybrid cloud environment. This allows us to create controlled boundaries and security policies that can be applied in an application-centric fashion, resulting in focused operating contexts for security operations.

  2. Automated

At this level, we want to automate the behavior of the system with regard to controls. This will prompt security operations with events generated by discrete controls and their performance against established measurements or tolerances. The goal is to automate as many controls as possible to become Orchestrated.

  3. Orchestrated

After we have many controls automated, we want to make them recombinant in ways that allow for controlling the space, or the “geometry”, along with coordinating events, information, automated reactions, and so on, which will allow us to drive down response times. These combinations will result in “playbooks,” or collections of controls assembled in patterns that are used to combat cyber threats.

  4. Lifecycle

Taking on full lifecycle responsibility means just that. We might monitor in-guest security aspects like anti-virus/malware or vulnerability scanning in discrete, automated, and even orchestrated ways in previous levels. This level, however, is about actually taking ownership of operating systems and perhaps even application runtimes within the customer virtual machines. By extending managed services to include what is inside the virtual machines themselves, it is possible to take ownership of all facets of cyber security regarding applications in the hybrid cloud.

  5. Advanced

At the Advanced level, we must be able to leverage all previous levels in such a way that managed services can be deployed to remediate a cyber-threat or execute on a risk management plan to help address security issues of all types. Additionally, we want our resulting cyber toolkit derived from the maturity model to become portable, in appliance form, where managed security services can be delivered anywhere in the hybrid cloud network.

In the upcoming series of blog posts that describe VMware vCloud Architecture Toolkit for Service Providers (vCAT-SP) reference architecture design blueprints and use cases for each maturity level, vCloud Air Network service providers can help customers visualize what it will take to both architect and operate managed security services used to augment the hybrid cloud delivery model.

Eliminating Blind Spots and Reducing Dwell Time

The cyber defense strategies that are devised based on achieving levels of the maturity model focus on defining individual elements within the system. Management user interfaces, ports, session authentication, as well as virtual machine file systems, network communications, and so on, should be defined to allow alignment of controls. In addition, the provisioning of networks between the resources that consume services and those that provide them, such as management components like VMware vCloud Director® or VMware vCenter™, DNS, or Active Directory, and the logging of network components (including those that serve end-user applications to their communities), should also occur in as highly automated a fashion as possible.

In this way, human-centric, error-prone activities can be eliminated from consideration as potential vulnerabilities, although automated detection of threats by discrete components across cyber dimensions is still expected. A high-level example of how we expect these discrete, automated controls to behave is described by Gartner, who defines the concept of a “cloud security gateway” as “the ability to interject enterprise security policies as the cloud-based resources are accessed”. By defining controls for system elements and their groupings in this way, we can form a fully identified inventory of what is being managed, by whom, and where it resides. Likewise, by understanding and quantifying the controls in the system that are applied collectively to these elements, we can begin to measure and score their effectiveness. This harmonization is critical to delivering the consistency in the enforcement mechanisms we can rely on across both sides of the hybrid cloud, creating the foundation of trust.

Despite our efforts to inventory all elements within systems, attacks will still arrive from the outside world in the user portions of the application stack, for example, through SQL injection or using cross-site scripting techniques. The threat of compromised insider privileged users will still be present as will “social engineering” methods of obtaining passwords. However, the “escape” of a rogue, privileged user to a realm from which they can continue their attack has been minimized. We have taken the elements of time and space and defined them to our advantage, creating a high security prison effect and requiring new vulnerability exploits to be executed for each step in the kill chain.

Because attackers generally deal with a limited budget and time in which to execute a successful attack, oftentimes even our simplest security approaches are enough to make us the safest house on the block. Also, because all activities that occur within the environment are likely to be well known, the environment effectively generates high-confidence indicators or signals and very little noise as a sensor, so anomalies are easy to spot. Given the presentation of those anomalies and the playbooks already available to address many adverse operating conditions, you are providing customers the ability to deliver a credible response to threats, something that many lack today.

Conclusion

The goal of vCloud Air Network service providers and their partners should be identifying cyber security challenges that customers face, as well as which meaningful, coarsely grained packages of managed services can be offered to help tackle those challenges. By aligning with the Managed Security Services Maturity Model, providers can leverage the VMware SDDC and VMware NSX software-defined networking and security capabilities to deliver something truly unique in the enterprise IT industry—a secure hybrid cloud. By further aligning these capabilities and services with those of application migration and DevOps (stay tuned for blogs on those and other subjects), and taking ownership of the full lifecycle of security, the potential of effectively remediating existing threats becomes possible. Together, we can help customers evaluate their risk profile, as well as understand how these techniques can minimize attack points and vectors and reduce response times, while increasing effectiveness in fighting cyber threats.

What you’ll see throughout the Managed Security Services Maturity Model is the creation of a “ubiquity” of security controls across each data center participating in the hybrid cloud. This ubiquity will allow for a consistent, trusted foundation from which the performance of the architecture and operations can be measured. Individual policies can then be constructed across this trusted foundation relative to specific security contexts consisting of applications and their users as well as administrators and their actions, leaving very little room for threats to go unnoticed. As these policies are enforced by the controls of the trusted foundation, cyber security response becomes more agile because all components are performing in a well understood fashion. Think of military special forces training on a “built for purpose” replica of an area they plan to assault to minimize unexpected results. Security operators can now be indoctrinated and immersed, knowing what scenes are expected to play out instead of constantly looking for the needle in the haystack. This will also ultimately create the ideal conditions for helping to rationalize unfettered consumption of elastic resources while also fulfilling the vision and realizing the potential of the hybrid cloud.

Streamlining VMware vCloud Air Network Customer Onboarding with VMware NSX Edge Services

When migrating private cloud workloads to a public or hosted cloud provider, the methods used to facilitate customer onboarding can present some of the most critical challenges. The cloud service provider requires a method for onboarding tenants that reduces the need for additional equipment or contracts, which often create barriers for customers moving enterprise workloads onto a hosting or public cloud offering.

Customer Onboarding Scenarios

When a service provider is preparing for customer onboarding, there are a few options that can be considered. Some of the typical onboarding scenarios are:

  • Migration of live workloads
  • Offline data transfer of workloads
  • Stretching on-premises L2 networks
  • Remote site and user access to workloads

One of the most common scenarios is workload migration. For some implementations, this means migrating private cloud workloads to a public cloud or hosted service provider’s infrastructure. One path to migration leverages VMware vSphere® vMotion® to move live VMs from the private cloud to the designated CSP environment. In situations where this is not feasible, service providers can supply options for the offline migration of on-premises workloads where private cloud workloads that are marked for migration are copied to physical media, shipped to the service provider, and then deployed within the public cloud or hosted infrastructure. In some cases, migration can also mean the ability to move workloads between private cloud and CSP infrastructure on demand.

Providing Connectivity for Customer Onboarding

When contemplating the method of migrating workloads during customer onboarding, one of the critical enablers of the migration effort is network connectivity. Some enterprise workloads might require access to Layer 2 segments that exist on-premises, while others might require access to essential underlying infrastructure systems, such as Active Directory/LDAP, DNS, CMDBs, and other systems that cannot be moved due to internal policy mandates, or the perceived increased cost and management of duplicating these components. In some cases, this access is needed only during the migration, while in others, the access might be required over a longer period of time. For example, with workloads that require on-premises Active Directory or LDAP access.

NSX Edge Service for vCloud Air Network Onboarding

VMware NSX® brings many of the benefits of SDN to vCloud Air Network service providers, including increased security through micro-segmentation, reduced hardware costs, and programmatic control of networking functions and services. VMware NSX provides a useful set of features for vCloud Air Network partners and customers alike. In addition to these capabilities, VMware NSX also provides essential services such as Virtual Private Networking (VPN) and Network Address Translation (NAT) that can be leveraged to facilitate the onboarding of customers to a vCloud Air Network partner infrastructure.

One feature example of VMware NSX built-in VPN functionality is the Layer 2 virtual private network (L2VPN) service. With the L2VPN service, vCloud Air Network providers implementing VMware NSX can provide customers with the ability to extend Layer 2 networks from their on-premises data centers to the VMware NSX environment of their chosen vCloud Air Network service provider. This functionality can even be extended to customers that need the benefits of hosting enterprise workloads on the public cloud, but have not yet implemented VMware NSX within their on-premises data centers.

This powerful VMware NSX Edge™ service provides an SDN solution for workloads that must be migrated to a vCloud Air Network public cloud or hosting provider, while maintaining original IP address and L2 connectivity. The L2VPN feature of VMware NSX is also an efficient way to enable long-distance vSphere vMotion between on-premises and vCloud Air Network hybrid clouds. The combination of these features can be leveraged in scenarios for a one-time vSphere vMotion based migration or ongoing workload mobility between on-premises and the vCloud Air Network infrastructure. For one approach to accomplishing the migration of live workloads, see the Live Workload Mobility to a vCloud Air Network IaaS Provider blog.

Example of vCloud Air Network Workload Migration with L2VPN


There are also situations in which a customer will simply need the ability to remotely access their workloads that have been migrated or deployed to their chosen vCloud Air Network partner. Customers want an encrypted connection to maintain the security standards they expect when accessing these critical enterprise workloads. In these scenarios, VMware NSX Edge™ can provide IPsec and SSL VPN services to extend site-to-site and remote user-to-site access to workloads residing behind the NSX Edge appliance within the vCloud Air Network provider’s data center. Additionally, VPN services such as IPsec can be leveraged to enable workloads that are deployed to a vCloud Air Network service provider to access on-premises systems that will not be moved during migration efforts.

Example of Remote Management of vCloud Air Network Workloads with SSL VPN


Conclusion

Using VMware NSX, vCloud Air Network partners are able to take an SDN approach to streamline the onboarding of customer workloads to vCloud Air Network public cloud and hosting environments. From extending L2 networks from a customer’s on-premises data center to a vCloud Air Network-powered hosting provider, to enabling remote access to deployed workloads, VMware NSX can be leveraged to assist with customer onboarding without the need for additional hardware. In turn, customers of VMware vCloud Air Network partners benefit by being able to efficiently migrate existing workloads to a vCloud Air Network partner’s vSphere-based infrastructure.

This blog post has outlined some of the features that VMware NSX can provide to VMware vCloud Air Network partners and their end customers for onboarding. Stay tuned for follow-up posts that expand on these use cases and for additional ways that VMware vCloud Air Network partners can ease the path for customer onboarding with VMware NSX.

vCenter Server Scalability for Service Providers

Designing and architecting monster vCloud Air Network service provider environments takes VMware technology to its very limits, in terms of both scalability and complexity. vCenter Server, and its supporting services, such as SSO, are at the heart of the vSphere infrastructure, even in cloud service provider environments where a Cloud Management Platform (CMP) is employed to abstract the service presentation away from vCenter Server.

Meeting service provider scalability requirements with vCenter Server requires optimization at every level of the design in order to implement a robust technical platform that can scale to its very limits, whilst also maintaining operational efficiency and supportability.

This article outlines design considerations around optimization of Microsoft Windows vCenter Server instances and best practice recommendations, in order to maximize operational performance of your vCenter ecosystem, which is particularly pertinent when scaling over 400 host servers. Each item listed below should be addressed in the context of the target environment, and properly evaluated before implementation, as there is no one solution to optimize all vCenter Server instances.

The following is simply a list of recommendations that should, to some extent, improve performance in large service provider environments. This blog targets the Windows variant of vCenter Server 5.x and 6.x with a Microsoft SQL database, which is still the most commonly deployed configuration.

Warning: Some of the procedures and tasks outlined in this article are potentially destructive to data, and therefore should only be undertaken by experienced personnel once all appropriate safeguards, such as backed up data and a tested recovery procedure, are in place.

 

Part 1 – vCenter Server Operational Optimization

vCenter Server Sizing
vCloud Air Network service providers must ensure that the vCenter virtual system(s) are sized accordingly, based on their inventory size. Where vCenter components are separated and distributed across multiple virtual machines, ensure that all systems meet the sizing recommendations set out in the installation and configuration documentation.

vSphere 5.5: https://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html
vSphere 6.0: https://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-6-pubs.html
vSphere 5.1: http://kb.vmware.com/kb/2021202

Distribute vCenter Services across multiple virtual machines (vSphere 5.5)
In vSphere 5.5, depending on inventory size, multiple virtual machines can be used to accommodate different vCenter roles. VMware recommends separating VMware vCenter, SSO Server, Update Manager and SQL for flexibility during maintenance and to improve scalability of the vCenter management ecosystem. The new architecture of vCenter 6 simplifies the deployment model, but also reduces design and scaling flexibility, with only two component roles to deploy.

Dedicated Management Cluster
For anything other than the smallest of environments, VMware recommends separating all vSphere management components onto a separate out-of-band management cluster. The primary benefits of management component separation include:

  • Facilitating quicker troubleshooting and problem resolution as management components are strictly contained in a relatively small and manageable cluster.
  • Providing resource isolation between workloads running in the production environment and the actual systems used to manage the infrastructure.
  • Separating the management components from the resources they are managing.

vCenter to Host operational latency
The number of network hops between the vCenter Server and the ESXi host affects operational latency. The ESXi host should reside as few network hops away from the vCenter Server as possible.

vCenter to SQL Server operational latency
The number of network hops between the vCenter Server and the SQL database also affects operational latency. Where possible, vCenter should reside on the same network segment as the supporting database. If appropriate, configure a DRS affinity rule to ensure that the vCenter Server and database server reside on the same ESXi host, reducing latency still further.
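As an illustration, a VM-VM “keep together” affinity rule could be created with VMware PowerCLI; the cluster and VM names below are placeholders, and an existing Connect-VIServer session is assumed.

## Keep the vCenter Server and its SQL Server VM on the same host to reduce latency
$vms = Get-VM -Name "vc01", "sql01"
New-DrsRule -Cluster (Get-Cluster -Name "MGMT-Cluster") -Name "vc-sql-keep-together" `
    -KeepTogether $true -VM $vms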

Java Max Heap Size 
vCloud Air Network service providers must ensure that the maximum heap size for the Java virtual machine is set correctly based on the inventory size. Confirm that the JVM heap settings for vCenter Server, the Inventory Service, SSO, and the Web Client are configured appropriately, and monitor the web services to verify. vSphere 5.1 & 5.5: http://kb.vmware.com/kb/2021302

Concurrent Client Connections
Whilst not always easy, attempt to limit the number of clients connected to vCenter Server, as this affects its performance. This is particularly the case for the traditional Windows C# client.

Performance Monitoring
Employ a performance monitoring tool to ensure the health of the vCenter ecosystem and to help troubleshoot problems when they arise. Where appropriate, configure a vROps Custom Dashboard for vCenter/Management components. Also ensure appropriate alerts and notifications on performance monitoring tools exist.

Virtual disk type
All vCenter Server virtual machine VMDKs should be provisioned in the eagerZeroedThick format. This provides approximately a 10-20 percent performance improvement over the other two disk formats.
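For example, when adding a new disk to the vCenter Server virtual machine with PowerCLI, the format can be specified explicitly; the VM name and capacity below are placeholders, and a Connect-VIServer session is assumed.

## Add a 100 GB eagerZeroedThick disk to the vCenter Server VM
New-HardDisk -VM (Get-VM -Name "vc01") -CapacityGB 100 -StorageFormat EagerZeroedThick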

vCenter vNIC type
vCloud Air Network service providers should employ the VMXNET3 paravirtualized network adapter to maximize network throughput and efficiency and to reduce latency.

ODBC Connection
Ensure that the vCenter and VUM ODBC connections are configured with the minimum permissions required for daily operations. Additional permissions are typically required during installation and upgrade activities, but not for day to day operations. Please refer to the Service Account Permissions provided below.

vCenter Logs Clean Up
vCenter Server has no automated way of purging old log files. These files can grow and consume a significant amount of disk space on the vCenter Server. Consider a scheduled task, run every three to six months, to delete or move log files older than the period of time defined by business requirements.

For instance, the VBScript below can be used to clean up old log files from vCenter. This script deletes files that are older than a fixed number of days, defined in line 9, from the path set in line 6. The VBScript can be configured to run as a scheduled task using the Windows Task Scheduler.

Dim Fso
Dim Directory
Dim Modified
Dim Files
Set Fso = CreateObject("Scripting.FileSystemObject")
Set Directory = Fso.GetFolder("C:\ProgramData\VMware\VMware VirtualCenter\Logs\")
Set Files = Directory.Files
For Each Modified in Files
If DateDiff("D", Modified.DateLastModified, Now) > 180 Then Modified.Delete
Next

For more information, refer to KB article: KB1021804 Location of vCenter Server log files.
For additional information on modifying logging levels in vCenter please refer to KB1004795 and KB1001584.

Note: Once a log file reaches a maximum size it is rotated and numbered similar to component-nnn.log files and they may be compressed.

Statistics Levels
The statistics collection interval determines the frequency at which statistic queries occur, the length of time statistical data is stored in the database, and the type of statistical data that is collected.

As historical performance statistics can take up to 90% of the vCenter Server database size, they are the primary factor in the performance and scalability of the vCenter Server database. Retaining this performance data allows administrators to view the collected historical statistics, through the performance charts in the vSphere Web Client, through the traditional Windows client, or through command-line monitoring utilities, for up to 1 year after the data was first ingested into the database.

You must ensure that statistics collection intervals and levels are set as conservatively as possible so that the system does not become overloaded. For instance, you could set a new DB data retention period of 60 days and configure the database not to retain performance data beyond 60 days. At the same time, it is equally important to ensure that the retention of this historical data meets the service provider’s data compliance requirements.

As this statistics data consumes such a large proportion of the database, proper management of these vCenter Server statistics is an important consideration for overall database health. This is achieved by the processing of this data through a series of rollup jobs, which stop the database server from becoming overloaded. This is a key consideration for vCenter Server performance and is addressed in more detail in Part 2 of this article.

Task and Events Retention
Operational teams should ensure that the Task and Events retention levels are set as conservatively as possible, whilst still meeting the service provider’s data retention and compliance requirements. Every time a task or event is executed via vCenter, it is stored in the database. For example, a task is created when a user powers a virtual machine on or off, and an event is generated when something occurs, such as the vCPU usage for a VM changing to red.

vCenter Server has a Database Retention Policy setting that allows you to specify how long vCenter Server tasks and events are retained before being deleted. This correlates to a database rollup job that purges the data from the database after the selected period of time. Whilst these tables consume a relatively small amount of database space compared to statistical data, it is good practice to consider this option for further database optimization. For instance, by default, vCenter is configured to store tasks and events data for 180 days. However, it might be possible, based on the service provider’s compliance requirements, to configure vCenter not to retain event and task data in the database beyond 60 days.

vCenter Server Backup Best Practice
In addition to scheduling regular backups of the vCenter Server database, the backups for the vCenter Server should also include the SSL certificates and license key information.

 

Part 2 – SQL DB Server Operational Optimization (for vCenter Server)

SQL Database Server Disk Configuration
The vCenter Server database data file (mdf) generates mostly random I/O, while the database transaction logs (ldf) generate mostly sequential I/O. The traffic for these files is almost always simultaneous, so it is preferable to keep them on two separate storage resources that do not share disks or I/O. Therefore, where a large service provider inventory demands it, operational teams should ensure that the vCenter Server database uses separate drives for data and logs which, in turn, are backed by different physical disks.

tempDB Separation
For large service provider inventories, place tempDB on a different drive, backed by different physical disks than the vCenter database files or transaction logs.

Reduce Allocation Contention in SQL Server tempDB database
Consider using multiple data files to increase the I/O throughput to tempDB. Configure 1:1 alignment between tempDB files and vCPUs (up to eight) by spreading tempDB across at least as many equally sized files as there are vCPUs.

For instance, where 4 vCPUs exist on the SQL Server, create three additional tempDB data files and make them all equally sized. They should also be configured to grow in equal amounts. After changing the configuration, a restart of the SQL Server instance is required. For more information, refer to: http://support.microsoft.com/kb/2154845
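A hedged sketch of this change using PowerShell and the SQL Server module’s Invoke-Sqlcmd is shown below; the instance name, file path, and sizes are placeholders and should be agreed with the DBA.

## Add three additional, equally sized tempDB data files (for a 4 vCPU SQL Server)
Import-Module SqlServer   # older systems may use the SQLPS module instead
$instance = "SQL01"
1..3 | ForEach-Object {
    $query = "ALTER DATABASE tempdb ADD FILE (NAME = tempdev$_, " +
             "FILENAME = 'T:\TempDB\tempdev$_.ndf', SIZE = 4096MB, FILEGROWTH = 512MB);"
    Invoke-Sqlcmd -ServerInstance $instance -Query $query
}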

Database Connection Pool
vCenter Server starts, by default, with a database connection pool of 50 threads. This pool is then dynamically sized according to the vCenter Server’s workload. If high load is expected due to a large inventory, the size of the pool can be increased to 128 threads, although this will increase the memory consumption and load time of the vCenter Server. To change the pool size, edit the vpxd.cfg file, adding the following, where ‘128’ is the number of connection threads to be configured.

<vpxd>
  <odbc>
    <maxConnections>128</maxConnections>
  </odbc>
</vpxd>

Table Statistics
Update the statistics of the SQL tables and indexes on a regular basis for better overall performance of the database. Create a SQL Agent job to carry out this task or, alternatively, make it part of a vSphere database maintenance plan. http://sqlserverplanet.com/dba/update-statistics

Index Fragmentation (Not Applicable to vCenter 5.1 or newer)
Check for fragmentation of index objects and recreate indexes if needed. This happens with vCenter due to statistics rollups. Defragment or rebuild indexes when fragmentation exceeds 30%. See KB1003990.

Note: With the new enhancements and design changes made in the vCenter Server 5.1 database and later, this is no longer applicable or required.

Database Recovery Model
Depending on your vCenter database backup methodology, consider setting the database to the SIMPLE recovery model. This model will reduce the disk space needed for the transaction logs as well as decrease I/O load.

Choosing the Recovery Model for a Database: http://msdn.microsoft.com/en-us/library/ms175987(SQL.90).aspx
How to view or Change the Recovery Model of a Database in SQL Server Management Studio: http://msdn.microsoft.com/en-us/library/ms189272(SQL.90).aspx

Virtual Disk Type
Where the vCenter Server database server is a virtual machine, ensure that all VMDKs are provisioned in the eagerZeroedThick format. This option provides approximately a 10-20 percent performance improvement over the other two disk formats.

Verify SQL Rollup Jobs
Ensure all the SQL Agent rollup jobs have been created on the SQL server during the vCenter Server Installation. For instance:

  • Past Day stats rollup
  • Past Week stats rollup
  • Past Month stats rollup

For the full set of stored procedures and jobs please refer to the appropriate article below. Where necessary, recreate MSSQL agent rollup jobs. Note that detaching, attaching, importing, and restoring a database to a newer version of MSSQL Server does not automatically recreate these jobs. To recreate these jobs, if missing, please refer to: KB1004382.

KB 2033096 (vSphere 5.1, 5.5 & 6.0): http://kb.vmware.com/kb/2033096
KB 2006097 (vSphere 5.0): http://kb.vmware.com/kb/2006097

Also, ensure that the database referenced by each job is the vCenter Server database, and not the master or some other database. If these jobs reference any other database, you must delete and recreate the jobs.

Ensure database jobs are running correctly
Monitor scheduled database jobs to ensure they are running correctly. For more information, refer to KB article: Checking the status of vCenter Server performance rollup jobs: KB2012226

Verify MSSQL Permissions
Ensure that the local SQL and AD permissions required are in place, and align with the principle of least privilege (see below). If necessary, truncate all unrequired performance data from the database (Purging Historical Statistical Performance Data). For more information, refer to KB article: Reducing the size of the vCenter Server database when the rollup scripts take a long time to run KB1007453

Truncate all performance data from vCenter Server
As discussed in Part 1, to truncate all performance data from vCenter Server 5.1 and 5.5:

Warning: This procedure permanently removes all historical performance data. Ensure to take a backup of the database/schema before proceeding.

  1. Stop the VMware VirtualCenter Server service. Note: Ensure that you have a recent backup of the vCenter Server database before continuing.
  2. Log in to the vCenter Server database using SQL Management Studio.
  3. Copy and paste the contents of the SQL_truncate_5.x.sql script (available from the link below) into SQL Management Studio.
  4. Execute the script to delete the data.
  5. Restart the vCenter Server services.

For truncating data in vCenter Server and vCenter Server Appliance 5.1, 5.5, and 6.0, see Selective deletion of tasks, events, and historical performance data in vSphere 5.x and 6.x (2110031)

Shrink Database
After purging historical data from the database, optionally shrink the database. This is an online procedure to reduce the database size and to free up space on the VMDK, however, this activity will not in itself improve performance. For more information, refer to: Shrinking the size of the VMware vCenter Server SQL database KB1036738

For further information on Shrinking a Database, refer to: http://msdn.microsoft.com/en-us/library/ms189080.aspx

Rebuilding indexes to Optimize the performance of SQL Server
Configure regular maintenance job to rebuild indexes. KB2009918

  1. For a vCenter Server 5.1 or 5.5 database, download and extract the .sql files from the 2009918_rebuild_51.zip file attached to KB2009918.
  2. Back up your vCenter Server database before proceeding. For more information, see Backing up and restoring vCenter Server 4.x and 5.x (1023985).
  3. Connect to the vCenter Server database using SQL Server Management Studio. These steps must be performed against the vCenter database and not the master database.
  4. Execute the .sql file to create the REBUILD_INDEX stored procedure.
  5. Execute the stored procedure created in the previous step: execute REBUILD_INDEX

VPX_HIST_STAT Table Sizes
VMware recommends a fill factor of 70% for the four VPX_HIST_STAT tables. If the fill factor is too high for the resources available on the database server, SQL Server will spend time splitting pages, which equates to additional I/O.

If you are experiencing high unexplained I/O in the environment, monitor the SQL Server Access Methods object: Page Splits/sec. Page splits are expensive and cause your tables to perform more poorly due to fragmentation. Therefore, the fewer page splits you have, the better your system will perform.
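A quick way to sample this counter is with PowerShell’s Get-Counter; the example below assumes the local, default SQL Server instance (named instances expose the counter under \MSSQL$<InstanceName>:Access Methods).

## Sample Page Splits/sec every 15 seconds, 20 times, on the local default instance
Get-Counter -Counter '\SQLServer:Access Methods\Page Splits/sec' `
    -SampleInterval 15 -MaxSamples 20 |
    Select-Object -ExpandProperty CounterSamples |
    Select-Object Timestamp, CookedValue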

By decreasing the fill factor in your indexes, what you are doing is increasing the amount of empty space on each data page. The more empty space there is, the fewer page splits you will experience. On the other hand, having too much unnecessary empty space can also hurt performance because it means that less data is stored per page, which means it takes more disk I/O to read tables, and less data can be stored in the buffer cache.

High Page Splits/sec will result in the database being larger than necessary and having more pages to read during normal operations.

To determine where growth is occurring in the VMware vCenter Server database, refer to: http://kb.vmware.com/kb/1028356

For troubleshooting VPX_HIST_STAT table sizes in VMware vCenter Server 5, refer to: KB2038474

To reduce the size of the vCenter Server database when the rollup scripts take a long time to run, refer to: KB1007453

Monitor Database Growth
Service provider operational teams should monitor vCenter Server database growth over a period of time to ensure the database is functioning as expected. For more information, refer to KB article: Determining where growth is occurring in the vCenter Server database KB1028356

Schedule and verify regular database backups
The vCenter, SSO, VUM, and SRM servers are themselves stateless. The databases are far more critical because they store all the configuration and state information for each of the management components. These databases must be backed up nightly, and the restore process for each database needs to be tested periodically.

Operational teams should ensure that a schedule of regular backups exists for the vCenter database and, based on the requirements of the business, periodically restore and mount databases from backup onto a non-production system to ensure that a clean recovery is possible should database corruption or data loss occur in the production environment.

Create a Maintenance Plan for vSphere databases
Work with the DBAs to create daily and weekly database maintenance plans. For instance:

  • Check Database Integrity
  • Rebuild Index
  • Update Statistics
  • Back Up Database (Full)
  • Maintenance Cleanup Task

Warning: DO NOT SHRINK DB IN MAINTENANCE PLAN UNLESS THERE IS A SPECIFIC REQUIREMENT TO RECLAIM DISK SPACE: http://msdn.microsoft.com/en-us/library/ms189080.aspx

 

Part 3 – Service Account Permissions (Least Privilege)

vCenter Service Account
Required by the ODBC connection for access to the database, the vCenter service account must be configured with the db_owner role on the vCenter database for normal operational use. However, the vCenter database account used to make the ODBC connection also requires the db_owner role on the MSDB system database during installation or upgrade of the vCenter Server. This permission facilitates the installation of the SQL Agent jobs for the vCenter statistics rollups.

Typically, the DBA should only grant the vCenter service account the db_owner role on the MSDB System database when installing or upgrading vCenter, then revoke that role when these activities are complete.

RSA_DBA (vSphere 5.1 Only)
Required only for SSO 5.1, the RSA_DBA account is a local SQL account used for creating the schema (DDL) and requires db_owner permissions.

RSA_USER (vSphere 5.1 Only)
Required only for SSO 5.1, the RSA_USER account reads and writes data (DML only).

VUM Service Account
Although it is installed only on 64-bit Windows, VUM is a 32-bit application and requires a 32-bit ODBC connection created from “C:\Windows\SysWOW64\odbcad32.exe”. The VUM service account must be granted the db_owner permission on the VUM database. The installation of vCenter Update Manager 5.x and 6.x with a Microsoft SQL back-end database also requires the ODBC connection account to temporarily have db_owner permissions on the MSDB system database. This was a new requirement in vSphere 5.0.

As with the vCenter service account, typically the DBA would only grant the VUM service account the db_owner role for the MSDB System database during an install or upgrade to the VUM component of vCenter. This permission should then be revoked when that task has been completed.

Leveraging Virtual SAN for Highly Available Management Clusters

A pivotal element in each Cloud Service Provider service plan is the class of service being offered to the tenants. The number of moving parts in a data center raises legitimate questions about the reliability of each component and its influence on the overall solution. Cloud infrastructure and services are built on the traditional three pillars: compute, networking, and storage, assisted by security and availability technologies and processes.

The Cloud Management Platform (CMP) is the management foundation for VMware vCloud® Air Network™ providers with a critical set of components that deliver a resilient environment for vCloud consumers.

This blog post highlights how a vCloud Air Network provider can leverage VMware Virtual SAN™ as a cost effective, highly available storage solution for cloud services management environments, and how the availability requirements set by the business can be achieved.

Management Cluster

A management cluster is a group of hosts joined together and reserved for powering the components that provide infrastructure management services to the environment, some of which include the following:

  • VMware vCenter Server™ and database, or VMware vCenter Server Appliance™
  • VMware vCloud Director® cells and database
  • VMware vRealize® Orchestrator™
  • VMware NSX® Manager™
  • VMware vRealize Operations Manager™
  • VMware vRealize Automation™
  • Optional infrastructure services to adapt the service provider offering (LDAP, NTP, DNS, DHCP, and so on)

To help guarantee predictable reliability, steady performance, and separation of duties as a best practice, a management cluster should be deployed over an underlying layer of dedicated compute and storage resources without having to compete with business or tenant workloads. This practice also simplifies the approach for data protection, availability, and recoverability of the service components in use on the management cluster.

Figure: Leveraging Virtual SAN for highly available management clusters (1)

Rationale for a Software-Defined Storage Solution

The use of traditional storage devices in the context of the Cloud Management Platform requires the purchase of dedicated hardware to provide the necessary workload isolation, performance, and high availability.

In the case of a Cloud Service Provider, the cost and management complexity of these assets would most likely be passed on to the consumer through the service costs, at the risk of producing a less competitive offering. Virtual SAN can dramatically reduce cost and complexity for this dedicated management environment. Some of the key benefits include the following:

  • Reduced management complexity because of the native integration with VMware vSphere® at the hypervisor level and access to a common management interface
  • Independence from shared or external storage devices, because it abstracts the hosts’ locally attached storage and presents it as a uniform datastore to the virtual machines
  • Granular virtual machine-centric policies that allow performance to be tuned on a per-workload basis

Availability as a Top Requirement

Availability is defined as “The degree to which a system or component is operational and accessible when required for use” [IEEE 610]. It is commonly calculated as a percentage and often measured in terms of the number of 9s.

Availability = Uptime / (Uptime + Downtime)

To calculate the overall availability of a complex system, the availability percentages of each component are multiplied together as factors.

Overall Availability = Element#1(availability %) * Element#2(availability %) * … * Element#n(availability %)
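As a quick illustration of these formulas, the short Python sketch below multiplies a set of purely hypothetical component availabilities and converts the result into expected downtime per year.

```python
# Minimal sketch: combine per-component availability percentages (serial
# dependency) and express the result as downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60

def overall_availability(component_percentages):
    """Multiply the availability percentage of each component in the chain."""
    overall = 1.0
    for pct in component_percentages:
        overall *= pct / 100.0
    return overall * 100.0

# Hypothetical example: three serially dependent elements.
components = [99.999, 99.99, 99.999]
availability = overall_availability(components)
downtime_minutes = (1 - availability / 100.0) * MINUTES_PER_YEAR

print(f"Overall availability: {availability:.4f}%")
print(f"Expected downtime:    {downtime_minutes:.1f} minutes per year")
```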

 

Number of 9s | Availability % | Downtime per year | System/component inaccessible
1            | 90%            | 36.5 days         | Over 5 weeks per year
2            | 99%            | 3.65 days         | Less than 4 days per year
3            | 99.9%          | 8.76 hours        | About 9 hours per year
4            | 99.99%         | 52.56 minutes     | About 1 hour per year
5            | 99.999%        | 5.26 minutes      | About 5 minutes per year
6            | 99.9999%       | 31.5 seconds      | About half a minute per year

When defining the level of service for its offering, the Cloud Service Provider will take this data into account and compute the expected availability of the systems provided. In this way, the vCloud consumer is able to correctly plan the positioning of their own workloads depending on their criticality and the business needs.

In a single-tenant or multi-tenant scenario, because the management cluster is transparent to the vCloud consumers, the class of service for this set of components is critical for delivering a resilient environment. If any Service Level Agreement (SLA) is defined between the Cloud Service Provider and the vCloud consumers, the level of availability for the CMP should match, or at least be comparable to, the highest requirement defined across the SLAs, so that the management cluster and the resource groups remain in the same availability zone.

Virtual SAN and High Availability

To support a critical management cluster, the underlying SDS solution must fulfill strict high availability requirements. Some of the key elements of Virtual SAN include the following:

  • Distributed architecture implementing a software-based data redundancy, similar to hardware-based RAID, by mirroring the data, not only across storage devices, but also across server hosts for increased reliability and redundancy
  • Data management based on data containers: logical objects carrying their own data and metadata
  • Intrinsic cost advantage by leveraging commodity hardware (physical servers and locally-attached flash or hard disks) to deliver mission critical availability to the overlying workloads
  • Seamless ability to scale out capacity and performance by adding more nodes to the Virtual SAN cluster, or to scale up by adding new drives to the existing hosts
  • Tiered storage functionality through the combination of storage policies, disk group configurations, and heterogeneous physical storage devices

Virtual SAN allows a storage policy configuration that defines the number of failures to tolerate (FTT), which determines how many copies of the virtual machine objects are stored across the cluster (FTT + 1 copies with mirroring). This policy can increase or decrease the level of redundancy of the objects and their degree of tolerance to the loss of one or more nodes of the cluster.
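As a simple illustration of the mirroring behavior described above, the following Python sketch maps an FTT value to the resulting number of object copies, the minimum number of hosts required, and the raw capacity consumed; the virtual machine size is a hypothetical input.

```python
# Minimal sketch: effect of the failures-to-tolerate (FTT) policy value with
# mirroring. The VM size is hypothetical and is used only for the capacity math.
def vsan_mirroring_footprint(ftt, vm_size_gb):
    copies = ftt + 1               # data copies stored across the cluster
    min_hosts = 2 * ftt + 1        # hosts required for the copies plus witnesses
    raw_gb = vm_size_gb * copies   # raw datastore capacity consumed
    return copies, min_hosts, raw_gb

for ftt in (1, 2, 3):
    copies, hosts, raw = vsan_mirroring_footprint(ftt, vm_size_gb=100)
    print(f"FTT={ftt}: {copies} copies, at least {hosts} hosts, ~{raw} GB raw for a 100 GB VM")
```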

Virtual SAN also supports and integrates with VMware vSphere® availability features, including the following:

  • In case of a physical host failure, vSphere HA restarts the affected virtual machines on the remaining hosts
  • VMware vSphere Fault Tolerance (FT) provides continuous availability for virtual machines (applications) up to a limited size of 4 vCPUs and 64 GB RAM
  • VMware vSphere Data Protection™ provides a combination of backup and restore features for both virtual machines and applications

Figure: Leveraging Virtual SAN for highly available management clusters (2)

Architecture Example

This example provides a conceptual system design for implementing a CMP with basic resiliency in a cloud service provider scenario, supported by Virtual SAN. The key elements of this design include the following:

  • Management cluster located in a single site
  • Two fault domains identified by the rack placement of the servers
  • A Witness to achieve a quorum in case of a failure, deployed on a dedicated virtual appliance (a Witness Appliance is a customized nested ESXi host designed to store objects and metadata from the cluster, pre-configured and available for download from VMware)
  • Full suite of management products, including optional CSP-related services
  • Virtual SAN failures to tolerate (FTT) policy set to 1 as a general rule (two copies per object)
  • vSphere High Availability feature enabled for the relevant workloads

This example is a starting point that can provide an overall availability close to four 9s (99.99%). Virtual SAN can provide greater availability by increasing the number of failures to tolerate (and therefore copies per object) and the number of fault domains.

Some of the availability metrics for computing overall availability are variable and lie outside the scope of this blog post, but they can be summarized as the following:

  • Rack (power supplies, cabling, top of rack network switches, and so on)
  • Host (physical server and hardware components)
  • Hard disk MTBF (both SSD and spindle)
  • Hard disk capacity and performance (which influence rebuild time)
  • Selection of the FTT, which influences the required capacity across the management cluster

Figure: Leveraging Virtual SAN for highly available management clusters (3)

The complete architecture example will be documented and released as part of the VMware vCloud Architecture Toolkit for Service Providers in Q1 2016.

 

Migration Strategies for vCloud Air Network Service Providers

As a vCloud Air Network service provider, building and offering hybrid cloud services to customers based on the SDDC is only half of the battle. Making sure that they are able to consume that service with fluidity becomes a critical area of focus. The less friction this Cloud Migration process has, the faster the customer time to value and service provider time to revenue become.

To address this rather broad subject, VMware is publishing a new document, “Migration Strategies for Hybrid Cloud,” in the VMware vCloud Architecture Toolkit™ for Service Providers (vCAT-SP). This blog introduces high-level concepts from that document. These concepts are meant to help both service providers and customers alike understand the opportunities and challenges when undergoing migration to a hybrid cloud scenario. Because this topic covers a vast area of information, the document only covers a few of the use cases available. Many of the more advanced use cases are accomplished through VMware Technology Partners, so stay tuned to the vCAT blog for additional information on how these solutions can be leveraged for migration to the hybrid cloud.


Figure 1. High Level Tool Categories for Migration

Looking at the figure above, we can see there are four main categories of tools available to accomplish different phases of the migration workflow. While each category could be delivered discretely in a single tool, it is often the case that a single tool functions in more than one category. It is also quite likely for most migration use cases that operators must coordinate activities between the tools in a workflow. In the best case, some or all of these capabilities are integrated to help the parties involved carry out the significant number of steps each migration instance requires. Leveraging the SDDC and its APIs provides the opportunity to automate as many of these steps as possible, and many of the available tools facilitate some level of this automation.

More often than not, however, the governance of the migration projects, or perhaps even programs, should be addressed with a Migration Center of Excellence (COE). This Migration COE, typically hosted by the service provider, contains one or more instances of this tool chain, constructed to allow customers and potentially other partners to come together and understand all of the variations that may drive migrations. Too often there is a rush to the workload migration tools themselves to relocate applications to the cloud without having considered the pitfalls, risks, and even upside potential offered by looking at the problem holistically. Specifically, we want customers to leverage the SDDC along with other VMware and Technology Partner solutions to introspect current application architectures as well as plan, and perhaps automate, the target tenant consumption of the service provider. The Migration COE allows us to visualize the “best fit” combination of tools and processes to plan the customer migration experience. The more information that can be applied to the process, the better.

By gleaning all of the potential information about source infrastructure and applications, we can create a repository of knowledge to plan migrations. The more virtualization and cloud-oriented solutions that are installed on the customer premises, such as VMware NSX® or VMware vRealize®, the more “migration ready” applications under the management of that infrastructure become. This is due to the ubiquity of the target hybrid cloud architectures, both built on the VMware SDDC. The primary function of the discovery and assessment tools is to ascertain the dependencies of the applications at a functional technology level. Examples of this might be DNS, PKI, or other authentication/authorization services, such as LDAP, that need to be made available to the application in its new post-migration home. Determining these dependencies will go far in planning for the serialization and parallelization of follow-on tasks related to migration and help to feed the three downstream task types—job scheduling, workload migration, and application verification. A great example of this customer-centric approach to discovery and assessment leverages VMware NSX and VMware vRealize Log Insight™. Once configured, the solution provides visualization of network activities through the Log Insight NSX for vSphere Content Pack v3, including application component interaction through networks and ports as described in this video.

Another important topic discussed in the migration document is workload mobility. There are a number of ways to provide hybrid cloud network connectivity (some are described in the blog Streamlining VMware vCloud Air Network Customer Onboarding with VMware NSX Edge Services), and many ways in which customers understand the concept of workload mobility. Because of the SDDC abstraction, many concepts discussed in the vCAT-SP use the terms “underlay” and “overlay”. While there is an obvious requirement for Layer 3 network connectivity to each site, the architecture will depend on the VMware software capabilities available at each site. Customers may choose VMware vSphere® metro clusters, a disaster avoidance scenario using VMware vSphere Replication™, or disaster recovery with VMware Site Recovery Manager™.

The Migration COE may include recommended methods based on any or all of these capabilities to help understand which may be appropriate in which situations. The hybrid network types in the previous paragraph provide workload mobility in the SDDC portion of the underlay and require VMkernel ports and operations. Migration solutions discussed in the vCAT-SP migration document, however, focus on the overlay consumption of hybrid cloud networks provided by VMware NSX, creating target environment capabilities that preserve acceptable application characteristics in the new hybrid cloud location. One example is the creation of VMware NSX Distributed Firewall policies for application-centric micro-segmentation, as described in the vCAT-SP blog, Micro-Segmentation with NSX for vCloud Air Network Service Providers. Because labor can account for more than 50% of the overall cost of a migration, as described by the pro forma cost model in this Forrester brief and detailed in this blog, migration becomes the lynchpin for the entire process of acquiring and recognizing new customers consuming the services offered. Choosing the right combination of tools and labor is therefore on the critical path to making sure migrations function in an optimal fashion.

Another critical facet that might be outside of the Migration COE is capacity planning. The different methods used in workload mobility require specific underlay network capabilities to achieve their goals, mainly bandwidth/throughput and latency. More information on underlay networking for hybrid cloud can be found in the vCAT-SP document, Architecting a Hybrid Mobility Strategy with VMware Cloud Air Network. It is important to understand that the entire phenomenon of workload mobility, including migration, is a numbers game, and not just one of network performance. Customers will demand an understanding of how the application will be managed for performance and maintenance in the new environment, perhaps through SLAs, which will be used to forecast the service provider’s TCO for the hosting environment. Provider compute, storage, and network infrastructure must be provisioned in time to accommodate new tenant migration activities, including potential shared transfer storage, along with ongoing performance requirements. Some of the main drivers for the application cutover itself can be related to Recovery Point Objectives and Recovery Time Objectives, perhaps requiring the introduction of a hardware storage replication scheme into the mix. Consider also the operational lead times for deploying and making these items ready for consumption, and the potential ROI from automating as many tasks as possible.

Finally, one of the key reasons a service provider would drive their customers to collect the fullest amount of data possible is to leverage it to predict which customer workloads come with a “stickiness” to new services offered by the service provider and their partners. The ability to digest and manage all of this data in an effective, holistic way provides agility, creating a migration “funnel” of activities that fully leverages, but does not exceed, available capacities. This is achieved while also sustaining transparency to stakeholders, which is very powerful when a new journey is undertaken. Because vCloud Air Network offerings are built on the VMware SDDC, you can be confident that they will offer the greatest compatibility and ease of both migration and mapping of new operational procedures based on best practices in the vCAT-SP.

Micro-Segmentation with NSX for vCloud Air Network Service Providers


Introduction

As a VMware vCloud® Air™ Network service provider running your cloud with VMware software, you’re probably familiar with technologies such as VMware NSX® and how they can be used to accomplish huge paradigm shifts within the enterprise data center. Micro-segmentation is one of the phenomena brought about by VMware NSX that facilitates one of these shifts—software-defined networking and security. Owning and operating a VMware powered data center means you are also likely seeking to leverage differentiators in the VMware platform to offer new, value-add services to your customers. What might not be clear, however, is how to take a killer feature like micro-segmentation and build differentiating use cases into the platform that can help customers and other partners in solving many challenges.

This is the first in a series of blog posts designed to help vCloud Air Network partners to do just that—offer new, differentiated services that leverage software-defined networking and security. These blog posts serve as a vehicle to introduce several forms of information. First will be the published reference architectures that match the subjects of these blogs, in this case, micro-segmentation. Second, use cases based on the reference architectures will be provided. Last, the Managed Security Services Maturity Model will offer the opportunity to provide increasingly enhanced security-related services to our customers by positioning those use cases within the maturity model that are the best fit. A separate blog on the maturity model is forthcoming.

Understanding Industry Challenges

Micro-segmentation is the ability to provide segmentation at a micro, or per-VM, level. Micro-segmentation may employ different mechanisms for different components of the virtual machine; in this blog we are discussing the virtual network component. In days past, segmentation was achieved by means of physical separation of the servers (and their network interfaces) in order to filter traffic between the tiers of an application. This, of course, is inefficient at best in a cloud computing environment, although many customers and service providers are left to do just that in the name of security, compliance, and so on. In the purest sense, then, micro-segmentation is about bringing functionally equivalent segmentation to the virtualization layer, effectively allowing virtual machines to exist in an isolated security context while consuming shared resources.

One of the fundamental challenges solved by micro-segmentation is East/West traffic in the data center. Simply put, micro-segmentation provides the ability to apply network-centric controls to virtual machines without “hairpinning” traffic, that is, taking all packets between every virtual machine and passing them through centralized firewall technologies to be filtered. This legacy approach creates immense operational challenges for managing physical network components, including VLANs, cabling, and the overall throughput of the security devices. From a security perspective, any traffic that cannot use the hairpinning method of transport falls outside of policies, creating “blind spots” through which cyber threats can communicate. While many vendors make virtual versions of their firewall and other security appliances, performance suffers due to serialization of network traffic across many contexts in the virtualization stack.

To address many of these challenges, VMware NSX introduced VXLAN and Distributed Firewall to the mix. VXLAN extends virtual Layer 2 subnets, known as “overlay” networks, over any physical Layer 3 routed network, also known as the “underlay” networks. In addition, VMware NSX now provides a stateful, virtual firewall running in the VMware ESXi™ hypervisor memory space, right next to where the network traffic is serialized from the physical network interface. This provides not only tremendous performance benefits, but also the ability to deal with firewall tuples that are no longer bound only to the “old school” mechanisms of TCP ports, IP addresses, and so on. VMware NSX Distributed Firewall now includes next-generation features like Active Directory security identifiers, and dynamic groups of VMware vSphere® objects, where policies can be enforced independent of, or in addition to, network configurations of protected virtual machines. What is perhaps most important, no matter where those protected vSphere objects might reside in terms of ESXi hosts across a hybrid cloud, they will be protected by those policies enforced within the hypervisor space prior to being serialized for network I/O. To level set readers in understanding these concepts, see this short video:

VMware NSX Hybrid Cloud Networks and Micro-Segmentation

While this awesome new capability opens many opportunities for VMware and vCloud Air Network partners to offer something truly unique in the industry, the ways to deploy the micro-segmentation pattern must be addressed. To evaluate the critical path items, first consider the potential deployment models and types of managed services that can be offered to aid in adoption of this new method of deploying firewall security into the hybrid cloud. Prerequisite to understanding the ideal deployment model for micro-segmentation will be the planning of how to deliver the “underlay network” or the Layer 3 path from the vCloud Air Network service provider data center to the customer premises. Once this is understood, the types of VXLAN networks, along with potential Layer 3 routes, will need to be prescribed for both underlay and overlay. This approach will be decided by each service provider but does have implications as to how the NSX Distributed Firewall and micro-segmentation will be implemented.

For more background, remember that in vSphere 6 and VMware NSX 6.2, as detailed in the blog “Live Workload Mobility to a vCloud Air Network IaaS Provider”, VMware introduced features critical to the delivery of a hybrid cloud network. First was the ability for a VMware vSphere Distributed Switch™ to exist within a VXLAN network across VMware vCenter™ instances (VMware NSX Manager™ now supports up to eight vCenter instances). In addition was the ability to perform cross-vCenter VMware vSphere vMotion® operations, which also synchronize vSphere Distributed Switch definitions across participating vCenter instances. However, this does not come without its drawbacks. In this scenario, the VMware NSX Distributed Firewall is restricted to the aforementioned legacy, or “old school”, network security tuples known as Universal Security Groups. These Universal Security Groups provide the potential for shared management of policies, and assurance that migrated workloads come with a policy collection that is transportable across these domain boundaries (from private to public cloud). Note: Universal Services/Service Groups replicate Universal object states.

Deployment Models and Managed Services

Given the new paradigms introduced by VMware NSX Distributed Firewall, along with the myriad ways in which hybrid cloud networks can be architected and deployed, it becomes increasingly necessary to generate “line of sight” through not only the on-boarding process but also the process of taking ownership of workloads with regard to firewall policies. A critical exercise is to decide questions such as whether you will support long-distance vSphere vMotion and, if so, whether it is a one-time activity or can occur only during particular time windows. To further illustrate this point, see Figure 1 below. In this case, up to eight vCenter instances are enlisted in a replication scheme to synchronize universal object types between them. This allows the inventory to stay updated relative to virtual machine location, network connectivity, and the distributed firewall rules that will be applied.


Figure 1. Multi vCenter Synchronized VMware NSX Universal Objects

While this provides the most freedom relative to workload mobility, and perhaps even elastic consumption in some cases, it does so at a loss of some of the more advanced security groupings used to dynamically enforce policies that will be discussed in future blogs. All is not lost, however, because advanced groupings and policy application are not excluded from participation. They are simply bound to a single vCenter in scope, and therefore, to a single NSX Manager on whichever side of the hybrid cloud they may lie. Because the Security Group option is available as a Universal object type, you can still group virtual machines for application of policies. However, those rules become static as opposed to the dynamic ones that are used to orchestrate many NSX security related operations.
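To make the distinction between local and universal groups concrete, here is a minimal Python sketch that lists security groups from an NSX Manager through the NSX for vSphere REST API. The manager address, credentials, and scope identifiers (globalroot-0 for local, universalroot-0 for universal) are assumptions; verify the endpoints against the API guide for your NSX release.

```python
# Minimal sketch: list NSX security groups in the local and universal scopes.
# Host, credentials, and scope IDs are hypothetical; certificate verification
# is disabled for lab use only.
import requests
import xml.etree.ElementTree as ET

NSX_MANAGER = "nsxmgr.example.local"  # hypothetical NSX Manager address
AUTH = ("admin", "password")          # hypothetical credentials

def list_security_groups(scope_id):
    url = f"https://{NSX_MANAGER}/api/2.0/services/securitygroup/scope/{scope_id}"
    resp = requests.get(url, auth=AUTH, verify=False)
    resp.raise_for_status()
    root = ET.fromstring(resp.text)
    return [sg.findtext("name") for sg in root.iter("securitygroup")]

print("Local security groups:    ", list_security_groups("globalroot-0"))
print("Universal security groups:", list_security_groups("universalroot-0"))
```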

As you will see in the upcoming blogs, this full VMware NSX security context is critical for delivering increasingly greater value in terms of security functions that you are able to offload for your customers as a managed service. While the network boundary between data centers is eliminated and the firewall and its pertinent rule set move into each ESXi host for enforcement, a boundary remains between the private and public sides of the hybrid cloud. This boundary is no longer necessarily only a network boundary but also a management boundary, consisting of objects with a universal context. The freedom given by operations such as long-distance vSphere vMotion migration of virtual machines across these boundaries requires an understanding of how to take ownership of more facets of the customer workload that can benefit from security controls implemented by the provider, such as filesystem encryption, vulnerability scanning, or operating system patching, to name a few. This philosophy becomes critical in the delivery of a managed service where disruptive networking and security technology is employed.

Conclusion

This situation opens up opportunities to take ownership of security services management, such as firewall, along with the greatly simplified positioning of micro-segmentation, through a managed service. This will require careful coordination of items such as workload migration and application of security policy via Universal or standard NSX security groups. By defining optimal policies for each of the VMware NSX security realms and providing administrator sessions for customers to manage Universal objects (as Advanced Networking Services will do for VMware vCloud Director®), VMware wants vCloud Air Network partners to become Centers of Excellence for customers, conveying the delivery of advanced security capabilities.

Given the nature of the shared responsibility that is required, many of the challenges in delivering micro-segmentation to the hybrid cloud are not unique. However, the opportunities relative to operationalizing security in a hybrid cloud model with your customers are numerous. Managing the relationship with your customers becomes an integral part of how future security-based services will be offered. This relationship management, now requiring even more diligence regarding what the expectations should be on all sides, includes strictly defined, measurable parameters for all security services to be delivered. With VMware NSX, its Distributed Firewall, and micro-segmentation, VMware is well on the way to delivering network security and operations in a way that changes the very nature of these concerns for the hybrid cloud from impediment to asset. All that is left is understanding and mapping the value in ways that can be effectively executed upon to reduce risk and to realize the hybrid cloud vision. Stay tuned for future blog posts here on the vCAT blog that will show you how to do just that.

 

Live Workload Mobility to a vCloud Air Network IaaS Provider

Solution Introduction

VMware vCloud Air Network providers are uniquely positioned to become a seamless extension of their existing customers’ on-premises data centers, offering a true unified hybrid cloud experience for applications and cloud infrastructure management.

With the introduction of NSX 6.2 and vSphere 6.0, VMware introduced the concept of cross-vCenter networking and security between vCenter Server instances that are within 150 ms RTT of each other. This raises some excellent opportunities for vCloud Air Network providers to offer live workload mobility and business continuity services as an extension of their end-customers’ on-premises data centers.

This blog post introduces a solution that VMware’s vCloud Air Network partners can offer to enable live workload mobility between an end-customer’s on-premises data center and a VMware vCloud Air Network provider. A follow-up blog post in the coming weeks will explain how vCloud Air Network providers can also very easily introduce business continuity services for their customers on top of this solution.

The full solution will be published as part of vCloud Architecture Toolkit for Service Providers during the first quarter of 2016.

Key Business Drivers

  • To provide a seamless extension to the end-customer’s data center, enabling ease of migration between customer and provider data centers.
  • To provide additional ‘burstable’ capacity to end-customers to support emerging projects, based on business demand.
  • To provide consistent security policy enforcement and micro-segmentation for all end-customer workloads, whether based on-premises or within the hosting provider’s data center.
  • To provide a managed mobility service to end-customers, where the provider executes mobility requests.
  • To offer a self-service workload mobility, disaster recovery and disaster avoidance solution to the end-customers.

Assumptions

  • Network connectivity between datacenters is established and out of scope for this blog post.
  • vMotion networks are configured at both provider and customer data centers.

Architecture Overview

The design below highlights a vCloud Air Network provider managed solution, where an end-customer data center is connected to a vCloud Air Network provider data center via a federated vSphere and NSX management domain. This architecture introduces NSX “Universal Objects”, which are objects that span vCenter Server instances. The following sections highlight the required management components and the NSX universal objects that have been configured, with basic configuration considerations.

Figure: Workload Mobility Architecture

Software Bill of Materials

Management Components

  • VMware vCenter Server at each site with mirrored release versions:
    • Both vCenter Server instances should be members of the same SSO domain for operations carried out through the UI. However, separate SSO domains can be supported if vMotion operations are executed through the API with appropriate authentication (see the sketch after this list).
  • VMware NSX Manager at each site, paired with their local vCenter Server:
    • The primary NSX Manager is hosted in the provider data center, and the secondary NSX Manager in the end-customer’s data center.
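For the API-driven case mentioned above, the following Python (pyVmomi) sketch outlines a cross-vCenter relocation that authenticates against the destination vCenter with a ServiceLocator. All host names, object names, credentials, and the certificate thumbprint are hypothetical placeholders, and error handling and network adapter remapping are omitted; treat this as an outline rather than a supported procedure.

```python
# Minimal sketch: cross-vCenter vMotion through the vSphere API with pyVmomi.
# All names and credentials below are hypothetical placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only: skip certificate checks

# Source (on-premises) and destination (provider) vCenter Server instances.
src_vc = SmartConnect(host="vc-onprem.example.local",
                      user="administrator@vsphere.local", pwd="***", sslContext=ctx)
dst_vc = SmartConnect(host="vc-provider.example.local",
                      user="administrator@vsphere.local", pwd="***", sslContext=ctx)

def find(content, vimtype, name):
    """Return the first managed object of the given type with the given name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.Destroy()

src_content = src_vc.RetrieveContent()
dst_content = dst_vc.RetrieveContent()

vm = find(src_content, vim.VirtualMachine, "app01")                  # VM to migrate
dst_host = find(dst_content, vim.HostSystem, "esx01.provider.local")
dst_ds = find(dst_content, vim.Datastore, "vsanDatastore")
dst_dc = find(dst_content, vim.Datacenter, "Provider-DC")

# The ServiceLocator tells the source vCenter how to authenticate against the
# destination vCenter; the SSL thumbprint must match the destination certificate.
spec = vim.vm.RelocateSpec()
spec.host = dst_host
spec.pool = dst_host.parent.resourcePool
spec.datastore = dst_ds
spec.folder = dst_dc.vmFolder
spec.service = vim.ServiceLocator(
    instanceUuid=dst_content.about.instanceUuid,
    url="https://vc-provider.example.local",
    sslThumbprint="AA:BB:CC:...",  # destination vCenter certificate thumbprint
    credential=vim.ServiceLocatorNamePassword(
        username="administrator@vsphere.local", password="***"),
)

task = vm.RelocateVM_Task(spec, vim.VirtualMachine.MovePriority.defaultPriority)
print("Cross-vCenter vMotion task started:", task.info.key)

Disconnect(src_vc)
Disconnect(dst_vc)
```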

Control Plane Components

Data Plane Components

  • Universal Transport Zone – controls the hosts that a universal object can span; this needs to be configured across both vCenter Server instances (vCloud Air Network provider and on-premises).
  • Universal Logical Distributed Router – provides east-west routing between universal logical switches.
  • Universal Logical Switches – Layer 2 segments that span the universal transport zone; this is where the provider and customer attach the virtual machine networks (see the sketch below).
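As an illustration of how a universal logical switch might be created programmatically, the following Python sketch posts a logical switch (virtualwire) creation request to the primary NSX Manager under a universal transport zone. The manager address, credentials, and transport zone name are assumptions; confirm the endpoints against the NSX for vSphere API guide for your release.

```python
# Minimal sketch: create a logical switch in a universal transport zone on the
# primary NSX Manager. Host, credentials, and names are hypothetical, and
# certificate verification is disabled for lab use only.
import requests
import xml.etree.ElementTree as ET

NSX_MANAGER = "nsxmgr-primary.example.local"  # hypothetical primary NSX Manager
AUTH = ("admin", "password")                  # hypothetical credentials

# 1. Look up the transport zone (vdnScope) by name to find its object ID.
scopes = requests.get(f"https://{NSX_MANAGER}/api/2.0/vdn/scopes",
                      auth=AUTH, verify=False)
scopes.raise_for_status()
scope_id = next(s.findtext("objectId")
                for s in ET.fromstring(scopes.text).iter("vdnScope")
                if s.findtext("name") == "Universal-TZ")  # hypothetical TZ name

# 2. Create the logical switch (virtual wire) within that transport zone.
payload = """<virtualWireCreateSpec>
  <name>ULS-Web-Tier</name>
  <description>Universal logical switch for the web tier</description>
  <tenantId>customer-a</tenantId>
</virtualWireCreateSpec>"""

resp = requests.post(f"https://{NSX_MANAGER}/api/2.0/vdn/scopes/{scope_id}/virtualwires",
                     data=payload, auth=AUTH, verify=False,
                     headers={"Content-Type": "application/xml"})
resp.raise_for_status()
print("Created logical switch with ID:", resp.text)  # the API returns the new ID
```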

Service Offerings

This solution has several potential service offerings that the vCloud Air Network provider can offer to their end-customers:

  • Hosted Virtual Infrastructure – the provider can offer their existing hosted virtual infrastructure portfolio as the foundation offering, with the scale and distribution the end-customer requires for new initiatives or to support migration.
  • Network Connectivity between provider and end-customer – with support for latencies of up to 150 ms, the options the provider can offer their end-customers range from directly connected networks to VPN connectivity across the Internet, leveraging NSX services such as L2 VPN, SSL VPN, or IPsec VPN.
  • Advanced Hybrid Networking Services – the provider can offer their end-customers additional hybrid software-defined networking services, including NAT, DHCP, firewall, routing (dynamic or static), and load balancing services.
  • Portable Security Services – the provider, or end-customer, can build security policies and groups with dynamic membership, which work at a per-VM level across the provider and end-customer’s data centers.
  • Live Workload Mobility Services – with this architecture, the hosting provider can enable live workload mobility services between the end-customer and the provider data centers.
  • Disaster Avoidance Services – with this architecture, the provider can build true hybrid applications, maintaining Layer 2 network connectivity between application components hosted on-premises and with the provider.

Conclusion

As outlined above, by including VMware NSX 6.2 in a vCloud Air Network provider’s hosting portfolio, the service provider can offer a unified hybrid platform that enables the provider to become a strategic extension of the end-customer’s data center. By extending network and security services across these data centers, numerous use cases around workload mobility, disaster avoidance, and disaster recovery are enabled; these will be covered in more detail in a follow-up blog post.

For more information on how a vCloud Air Network Provider can leverage long-distance vMotion to enhance their user experience, please refer to the vCAT-SP document: Architecting a Hybrid Mobility Strategy for vCloud Air Network.