Author Archives: Danilo Feroce

Migrating VMware vCloud Director vApps across Distributed Virtual Switches

An interesting topic that came to our attention is how to migrate VMware vCloud Director® vApps from one distributed virtual switch to another. Drawing on his field experience, one of our consultants, Aleksander Bukowinski, has shared a detailed procedure that minimizes the service disruptions such a move can cause. Aleksander has also authored a whitepaper on this topic that will soon be available in VMware Partner Central. The paper also covers in detail an additional use case with Cisco Nexus 1000V and provides PowerShell and API call samples.

Depending on the connectivity mode, vApps in vCD fall into five types: directly connected, routed, connected to routed vApp networks, isolated, and fenced. The migration does not require shutting down the vApps. It can cause brief network outages for VMs connected to a vCloud Director Edge Gateway, and no outage at all for VMs on isolated networks with no dependency on the Edge.

Protecting workloads in the cloud with minimal effort through VMware vCloud Availability

Among the many challenges an organization and its IT department face every day, service availability is particularly critical: the survival of a business depends on the technologies its services are built on. At the same time, legislation in many countries puts continuous pressure on every organization to maintain an appropriate plan to protect and secure its data and services.

Historically, every large enterprise has planned and built its own approach to dealing with disasters, large or small, in the way that best suits its business: backups, hardware redundancy, host clustering, data mirroring, replication, and geographically distributed sites are just a few of the technologies and strategies used to address the problem.

Over the years, some of these technologies have been commoditized. Still, for some of them, the cost of implementation has been an overwhelming capital expense for many small and medium organizations. In addition, managing and organizing the software, hardware, and storage components involved requires expertise.

In this context, a great opportunity has materialized for cloud service providers. The market has grown more confident in cloud-based services that offer more cost-effective, subscription-based access to resources. Disaster recovery as a service (DRaaS) is a highly desirable offering for all organizations, and particularly for those for which planning and building their own secondary data center site, to make their services more robust and resilient to local disasters, raises concerns or financial exposure.

Leveraging Virtual SAN for Highly Available Management Clusters

A pivotal element of every Cloud Service Provider service plan is the class of service offered to the tenants. The number of moving parts in a data center raises legitimate questions about the reliability of each component and its influence on the overall solution. Cloud infrastructure and services are built on the traditional three pillars of compute, networking, and storage, supported by security and availability technologies and processes.

The Cloud Management Platform (CMP) is the management foundation for VMware vCloud® Air Network™ providers with a critical set of components that deliver a resilient environment for vCloud consumers.

This blog post highlights how a vCloud Air Network provider can leverage VMware Virtual SAN™ as a cost-effective, highly available storage solution for cloud services management environments, and how the availability requirements set by the business can be met.

Management Cluster

A management cluster is a group of hosts joined together and reserved for powering the components that provide infrastructure management services to the environment, some of which include the following:

  • VMware vCenter Server™ and database, or VMware vCenter Server Appliance™
  • VMware vCloud Director® cells and database
  • VMware vRealize® Orchestrator™
  • VMware NSX® Manager™
  • VMware vRealize Operations Manager™
  • VMware vRealize Automation™
  • Optional infrastructure services to adapt the service provider offering (LDAP, NTP, DNS, DHCP, and so on)

As a best practice, and to help ensure predictable reliability, steady performance, and separation of duties, a management cluster should run on a dedicated layer of compute and storage resources so that it does not compete with business or tenant workloads. This practice also simplifies data protection, availability, and recoverability for the service components running on the management cluster.


Rationale for a Software-Defined Storage Solution

The use of traditional storage devices in the context of the Cloud Management Platform requires the purchase of dedicated hardware to provide the necessary workload isolation, performance, and high availability.

For a Cloud Service Provider, the cost and management complexity of these assets would most likely be passed on to the consumer through service costs, with the risk of producing a less competitive offering. Virtual SAN can dramatically reduce the cost and complexity of this dedicated management environment. Some of the key benefits include the following:

  • Reduced management complexity because of the native integration with VMware vSphere® at the hypervisor level and access to a common management interface
  • Independence from shared or external storage devices, because it abstracts the hosts' locally attached storage and presents it as a uniform datastore to the virtual machines
  • Granular, virtual machine-centric policies, which allow you to tune performance on a per-workload basis

Availability as a Top Requirement

Availability is defined as “The degree to which a system or component is operational and accessible when required for use” [IEEE 610]. It is commonly expressed as a percentage, and often measured in terms of the number of 9s.

Availability = Uptime / (Uptime + Downtime)
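The formula and the "number of 9s" notation can be sketched in a few lines of Python. The function names and the sample year are illustrative only, and the number of 9s is taken here as -log10(1 - Availability), so that 99.9% corresponds to 3.

```python
import math


def availability(uptime: float, downtime: float) -> float:
    """Availability = Uptime / (Uptime + Downtime); both values in the same unit."""
    total = uptime + downtime
    if total <= 0:
        raise ValueError("uptime + downtime must be positive")
    return uptime / total


def number_of_nines(a: float) -> float:
    """Express availability as a (possibly fractional) count of 9s: -log10(1 - A)."""
    if not 0.0 <= a < 1.0:
        raise ValueError("availability must be in [0, 1)")
    return -math.log10(1.0 - a)


# Hypothetical year of 8,760 hours that includes 8.76 hours of downtime.
a = availability(8760 - 8.76, 8.76)
print(f"{a:.3%} ~ {number_of_nines(a):.1f} nines")
```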

To calculate the overall availability of a complex system whose components must all be operational at the same time (components in series), multiply the availability of each component:

Overall Availability = Element#1(availability %) * Element#2(availability %) * … * Element#n(availability %)
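For components in series, the formula is a plain product. The minimal Python sketch below uses hypothetical figures to show that chaining even highly available components lowers the overall result; the function name is illustrative.

```python
from math import prod


def overall_availability(components: list[float]) -> float:
    """Multiply the availability of components that must all be up (in series)."""
    for a in components:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability out of range: {a}")
    return prod(components)


# Hypothetical chain: three components at 99.99% each.
chain = [0.9999, 0.9999, 0.9999]
print(f"{overall_availability(chain):.4%}")  # 99.9700%
```

Three "four 9s" components in series already drop below four 9s overall.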


Number of 9s | Availability % | Downtime/year  | System/component inaccessible
1            | 90%            | 36.5 days      | Over 5 weeks per year
2            | 99%            | 3.65 days      | Less than 4 days per year
3            | 99.9%          | 8.76 hours     | About 9 hours per year
4            | 99.99%         | 52.56 minutes  | About 1 hour per year
5            | 99.999%        | 5.26 minutes   | About 5 minutes per year
6            | 99.9999%       | 31.5 seconds   | About half a minute per year
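The downtime column follows directly from the availability figure: yearly downtime equals (1 - availability) multiplied by the length of a year. A minimal Python sketch, assuming a 365-day year as the table does; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years, like the table


def downtime_minutes_per_year(nines: int) -> float:
    """Yearly downtime in minutes for an availability of `nines` 9s (3 -> 99.9%)."""
    unavailability = 10.0 ** -nines  # equals 1 - availability
    return unavailability * MINUTES_PER_YEAR


for n in range(1, 7):
    availability_pct = 100 * (1 - 10.0 ** -n)
    print(f"{n}  {availability_pct:.{max(n - 2, 0)}f}%  "
          f"{downtime_minutes_per_year(n):,.2f} min")
```

For four 9s this gives 52.56 minutes, and for six 9s about 0.53 minutes (31.5 seconds), matching the table.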

When defining the service level of its offering, the Cloud Service Provider takes this data into account to compute the expected availability of the systems it provides. The vCloud consumer can then correctly plan where to place its own workloads according to their criticality and business needs.

In both single-tenant and multi-tenant scenarios, the management cluster is transparent to the vCloud consumers, so the class of service of its components is critical for delivering a resilient environment. If Service Level Agreements are defined between the Cloud Service Provider and its vCloud consumers, the availability of the CMP should match, or at least be comparable to, the highest requirement across those SLAs, so that the management cluster and the resource groups remain in the same availability zone.

Virtual SAN and High Availability

To support a critical management cluster, the underlying software-defined storage (SDS) solution must meet strict high availability requirements. Some of the key characteristics of Virtual SAN include the following:

  • Distributed architecture implementing software-based data redundancy, similar to hardware-based RAID, by mirroring data not only across storage devices but also across server hosts for increased reliability and redundancy
  • Data management based on data containers: logical objects carrying their own data and metadata
  • Intrinsic cost advantage from leveraging commodity hardware (physical servers and locally attached flash or hard disks) to deliver mission-critical availability to the workloads running on top
  • Seamless ability to scale out capacity and performance by adding more nodes to the Virtual SAN cluster, or to scale up by adding new drives to the existing hosts
  • Tiered storage functionality through the combination of storage policies, disk group configurations, and heterogeneous physical storage devices

Virtual SAN storage policies define the number of failures to tolerate (FTT), that is, how many host, disk, or network failures a virtual machine object can survive. With mirroring, an FTT value of n stores n + 1 copies of each object across the cluster. Raising or lowering FTT increases or decreases the redundancy of the objects and their tolerance to the loss of one or more cluster nodes.
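The relationship between FTT and the resources it consumes can be sketched as follows, assuming mirroring (RAID-1): tolerating n failures needs n + 1 full replicas, and 2n + 1 hosts or fault domains so that a majority survives n failures. The names are illustrative, and the capacity figure is a simplification that ignores witness components and the free space reserved for rebuilds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorLayout:
    ftt: int                # number of failures to tolerate
    replicas: int           # full copies of each object: FTT + 1
    min_fault_domains: int  # hosts or fault domains needed: 2 * FTT + 1
    raw_capacity_gb: float  # raw capacity consumed by the data copies only


def mirror_layout(ftt: int, object_size_gb: float) -> MirrorLayout:
    """Sketch of RAID-1 (mirroring) requirements for a given FTT value."""
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    replicas = ftt + 1
    return MirrorLayout(
        ftt=ftt,
        replicas=replicas,
        min_fault_domains=2 * ftt + 1,
        raw_capacity_gb=replicas * object_size_gb,
    )


# Hypothetical 100 GB virtual disk with FTT = 1.
print(mirror_layout(1, 100.0))
```

With FTT = 1, the sketch yields two copies, three fault domains, and 200 GB of raw capacity for a 100 GB disk.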

Virtual SAN also supports and integrates with VMware vSphere® availability features, including the following:

  • In case of a physical host failure, VMware vSphere High Availability (HA) restarts the affected virtual machines on the remaining hosts
  • VMware vSphere Fault Tolerance (FT) provides continuous availability for virtual machines (applications) up to a limited size of 4 vCPUs and 64 GB RAM
  • VMware vSphere Data Protection™ provides a combination of backup and restore features for both virtual machines and applications


Architecture Example

This example outlines a conceptual system design for a CMP in a cloud service provider scenario, with basic resiliency and backed by Virtual SAN. The key elements of this design include the following:

  • Management cluster located in a single site
  • Two fault domains identified by the rack placement of the servers
  • A witness to achieve quorum in case of a failure, deployed as a dedicated virtual appliance (the Witness Appliance is a preconfigured nested ESXi host, available for download from VMware, that stores the witness components, which hold object metadata rather than virtual machine data)
  • Full suite of management products, including optional CSP-related services
  • Virtual SAN storage policy with FTT set to 1 as the general rule (two copies per object)
  • vSphere High Availability feature enabled for the relevant workloads

This example is a starting point that can provide an overall availability close to four 9s (99.99%). Virtual SAN can deliver higher availability by increasing the number of copies per object (FTT) and the number of fault domains.
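Why more copies and more fault domains help can be illustrated with a simplified quorum model. A Virtual SAN object remains accessible while more than half of its votes and at least one full replica are reachable. The sketch below assumes one vote per fault domain (FTT + 1 replicas plus FTT witnesses), independent failures, and the same hypothetical availability for every fault domain. It is an illustration of the principle, not Virtual SAN's exact placement logic, and the function name is hypothetical. For FTT = 1 this matches the example design: two data fault domains plus the witness.

```python
from math import comb


def object_availability(ftt: int, fd_availability: float) -> float:
    """Probability that a mirrored object stays accessible (simplified quorum model).

    Simplifying assumptions:
      * the object spans 2 * FTT + 1 fault domains with one vote each:
        FTT + 1 full replicas plus FTT witness components;
      * fault domains fail independently and share the same availability;
      * the object is accessible while a majority (FTT + 1) of votes is up.
        With only FTT witnesses, any majority includes at least one replica.
    """
    if ftt < 0 or not 0.0 <= fd_availability <= 1.0:
        raise ValueError("invalid FTT or availability")
    domains = 2 * ftt + 1
    quorum = ftt + 1
    a = fd_availability
    # Sum the binomial probabilities of having `quorum` or more fault domains up.
    return sum(
        comb(domains, up) * a**up * (1 - a) ** (domains - up)
        for up in range(quorum, domains + 1)
    )


# Hypothetical 99.9% availability per fault domain (rack plus host).
for ftt in (1, 2):
    print(f"FTT={ftt}: {object_availability(ftt, 0.999):.7%}")
```

With a hypothetical 99.9% per fault domain, FTT = 1 gives roughly 99.9997% for the storage object alone. That figure is only one factor of the overall product; rack, network, and management components still multiply in, and rebuild windows are ignored.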

The availability figures needed to compute the overall availability vary from one environment to another and lie outside the scope of this blog post, but the main factors can be summarized as follows:

  • Rack (power supplies, cabling, top of rack network switches, and so on)
  • Host (physical server and hardware components)
  • Drive MTBF (both SSDs and spinning disks)
  • Drive capacity and performance (which influence rebuild time)
  • Selection of the FTT, which influences the required capacity across the management cluster
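Drive MTBF and rebuild time combine through the standard steady-state estimate A = MTBF / (MTBF + MTTR), where MTTR here stands for the time needed to replace a component and rebuild its data. A minimal Python sketch for a single component; the MTBF and MTTR values are hypothetical placeholders, not vendor figures. With FTT of 1 or more, a shorter rebuild mainly shortens the window in which an object runs without redundancy.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Classic steady-state estimate: A = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical component with a 50,000-hour MTBF, compared under a short
# and a long repair/rebuild window.
for mttr in (4, 24):
    a = steady_state_availability(50_000, mttr)
    downtime = (1 - a) * HOURS_PER_YEAR
    print(f"MTTR {mttr:>2} h: availability {a:.5%}, ~{downtime:.2f} h downtime/year")
```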


The complete architecture example will be documented and released as part of the VMware vCloud Architecture Toolkit for Service Providers in Q1 2016.