
Category Archives: Business Continuity & Disaster Recovery

Pop Quiz: Take the DR Challenge for a Chance to Win a Pass to VMworld 2015!

Have you mastered your organization’s disaster recovery plan? Think you know the answers to, “When did we last test our plan?” or “How much would an hour of downtime cost?” If so, we want to give you a chance to put your knowledge to the test.

Last year, we launched the first DR Challenge and invited our audience to test their disaster recovery know-how. The participation was so great we’ve decided to bring it back! Sign up for this year’s DR Challenge and try your hand at three quizzes designed to test your mastery of cloud-based disaster recovery facts. A perfect score enters you into a drawing for a free pass to VMworld 2015!

The challenge begins February 12th and runs through March 25th. You’ll have roughly two weeks to take each quiz. Register now to get started:

Quiz Challenge 1: Feb. 12 – Feb. 25   

Quiz Challenge 2: Feb. 26 – Mar. 11

Quiz Challenge 3: Mar. 12 – Mar. 25

You’re free to take the quiz as many times as you want, but you only need to ace it once to be entered to win the grand prize. (We have faith you can do it, but just in case, we’ll also offer hints along the way.)

Click here to sign up for the DR Challenge. We’ll announce the winners at the end of each challenge, so stay tuned on our social channels. Good luck, everyone!

To learn more about vCloud Air Disaster Recovery, visit us at vCloud.VMware.com.

Four Ways a BC/DR Plan Can Help Your SMB – Part 3: Software-Based Replication

By: vExpert Gregg Robertson

A Business Continuity and Disaster Recovery (BC/DR) plan is something every business, no matter how big or small, should be thinking about and implementing. Whilst I was preparing for my VCAP-DCD, and even for my VCDX attempt, BC/DR was a very important topic, as two of the infrastructure qualities of AMPRS (Availability, Manageability, Performance, Recoverability and Security) are availability and recoverability.

In my daily role as a consultant, BC/DR is a core component of every virtualization design, whether it is data center virtualization, end-user computing or hybrid cloud. In this four-part blog series, I am going to cover four ways BC/DR can help your small/midsized business (SMB) using solutions that are available to you. In this third blog, I will cover the benefits of the automated software-based replication that is built into VMware vSphere.

Automated Software-Based Replication

The release of VMware vSphere 5.1 brought the availability of vSphere Replication (VR), which was previously only available with VMware Site Recovery Manager 5.0. VR is a software-based replication engine that works at the host level rather than the array level. Identical hardware is not required between sites. In fact, customers can run their VMs on any type of storage they choose at their site, even local storage on the vSphere hosts, and VR will still work. It provides simple and cost-efficient replication of applications to a failover site. VR is delivered with vSphere Essentials Plus and higher editions, and also comes bundled with vCenter Site Recovery Manager. This offers protection and simple recoverability to the vast majority of VMware customers at no additional cost.

vSphere Replication allows single-site replication and protection. This is perfect for SMB organizations that have a local campus with a single cluster spanning two floors of a building, where recovery takes place within a nearby datacenter. If a floor loses power and the primary hosts and disks are unreachable, the administrator can simply point to the replica VMDK within VR and choose to recover it. To set this up, the administrator deploys a single VR Appliance, which acts as the replication manager and as the recipient and distributor of changed blocks. The admin then configures a VM and one or more of its VMDK files for replication, giving the local VR Appliance as the target and selecting a different datastore for the replica of the VM. The vSphere Replication Agent on the vSphere 5.x host that runs the VM then starts tracking changes to disk as they are written and, in accordance with the configured recovery point objective (RPO), sends the changed blocks to the VR Appliance. The VR Appliance passes the changed-block bundle via Network File Copy (NFC) to a host, which writes the blocks to the replica VMDK.
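The change-tracking flow described above can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not how VR is actually implemented: the `ReplicatedDisk` class and its `write` and `sync` methods are hypothetical names invented for illustration, and in a real deployment each sync would be driven by the configured RPO.

```python
class ReplicatedDisk:
    """Toy model of a source disk whose writes are tracked for replication.

    All names here are hypothetical; this is not a VMware API.
    """

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks   # contents of the source disk
        self.replica = [b""] * num_blocks  # contents of the replica disk
        self.dirty = set()                 # block indices changed since last sync

    def write(self, index, data):
        """Write a block on the source and remember that it changed."""
        self.blocks[index] = data
        self.dirty.add(index)

    def sync(self):
        """Copy only the changed blocks to the replica; return how many were sent."""
        sent = len(self.dirty)
        for index in self.dirty:
            self.replica[index] = self.blocks[index]
        self.dirty.clear()
        return sent


disk = ReplicatedDisk(num_blocks=8)
disk.write(2, b"a")
disk.write(5, b"b")
disk.write(2, b"c")   # rewriting block 2 does not create a second dirty entry
sent = disk.sync()    # one replication cycle ships two blocks, not three writes
```

The point of the sketch is that only blocks changed since the previous cycle cross the wire, which is what keeps this style of replication light enough for modest links between sites.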

VR is also a perfect fit for IT managers looking to protect virtual machines in ROBO scenarios.

In this model, hosts at remote sites are not managed by distributed vCenter Server instances, but from a central ‘head office’ datacenter. A single vCenter Server instance manages both local vSphere instances and remote clusters or hosts.

VMs from multiple remote sites need to be replicated to the central office in this scenario. At the remote sites, as long as the hosts run vSphere 5.x, no changes are necessary: they have the necessary vSphere Replication components built into the kernel.

At the head office datacenter, at least one vSphere Replication Appliance must be deployed to manage the replication of all the VMs (both remote and local targets). This single appliance will usually be sufficient to handle the incoming replications, but sometimes customers will want to isolate replication traffic by source, or will need to scale up the number of recipient servers to handle more incoming replications.

In that case, administrators can deploy more VR Servers (not the full VR Appliance, of which there is only one per vCenter) to isolate the incoming replication traffic or to adjust for scale.

Each VR Server can be used as a dedicated target for one or more remote sites.

Within the main datacenter, the VR Servers will pass the incoming replication data to the recovery cluster via Network File Copy for committing to local replica copies of the remote VMs.


Four Ways BC/DR Can Help Your SMB – Part 2: Automated High Availability

By: Gregg Robertson, vExpert

Business Continuity and Disaster Recovery (BC/DR) is something every business, no matter how big or small, should be thinking about and planning for. Whilst I was preparing for my VCAP-DCD, and even for my VCDX attempt, BC/DR was a very important topic, as availability and recoverability are two of the AMPRS infrastructure qualities (Availability, Manageability, Performance, Recoverability and Security) that every design should address.

In my daily role as a consultant, BC/DR is a core component of every virtualization design, whether it is data center virtualization, end-user computing or hybrid cloud. In this four-part blog series, I am going to cover four different ways BC/DR can help you with your small/midsized business (SMB) IT infrastructure. In this second blog, I will cover the benefits of the automated high availability that is built into VMware vSphere.

Automated High Availability for SMBs

BC/DR requirements can be met with features that have been part of vSphere for years, like VMware High Availability (HA). Since vSphere 5.0, HA has been rebuilt from the ground up to use the Fault Domain Manager (FDM) agent instead of the legacy AAM (Legato Automated Availability Manager) agent. The new agent brings higher resiliency and less complexity: HA can be enabled with as few as five clicks and installed onto ESXi hosts in seconds rather than the minutes it took previously. HA protects the virtual machines running on your hosts from host isolation and host failure by restarting the virtual machines from the affected host on the remaining working hosts, thereby bringing your applications and solutions back online as soon as possible. The FDM agent also allows partitioned hosts to elect a master node within the partitioned section and maintain the uptime of the virtual machines on the affected hosts. In addition, HA provides extra checks, namely datastore heartbeating and additional isolation addresses, to ensure that hosts are indeed non-responsive before it restarts their virtual machines.

HA can also restart a virtual machine if an application inside it fails, through the use of application monitoring. By utilizing the appropriate SDK or an application that supports VMware application monitoring, HA can set up customized heartbeats for your applications.

vSphere HA has several advantages over traditional failover solutions, including:

Minimal setup – After a vSphere HA cluster is set up, all virtual machines in the cluster get failover support without additional configuration.

Reduced hardware cost and setup – The virtual machine acts as a portable container for the applications and it can be moved among hosts. Administrators avoid duplicate configurations on multiple machines. When you use vSphere HA, you must have sufficient resources to fail over the number of hosts you want to protect with vSphere HA. However, the vCenter Server system automatically manages resources and configures clusters.

Increased application availability – Any application running inside a virtual machine has access to increased availability. Because the virtual machine can recover from hardware failure, all applications that start at boot have increased availability without increased computing needs, even if the application is not itself a clustered application. vSphere HA also protects against guest operating system crashes by monitoring and responding to VMware Tools heartbeats and restarting nonresponsive virtual machines.

Distributed Resource Scheduler (DRS) and vMotion integration – If a host fails and virtual machines are restarted on other hosts, DRS can provide migration recommendations or migrate virtual machines for balanced resource allocation. If one or both of the source and destination hosts of a migration fail, vSphere HA can help recover from that failure.

High Availability Overview

Fault Domain Manager Agent

HA’s architecture is fairly simple: the FDM agent is installed on each ESXi host within a vSphere cluster that has HA enabled. As of vSphere 5.0, there is only a single master node, and all the remaining hosts within the cluster are slaves, which report their health to the master node as well as the vCenter Server. This is unlike HA in versions prior to vSphere 5.0, which used primary and secondary nodes, constrained you to a limit of five primary nodes and required at least one primary node to be available. The diagram below shows a simplistic view of the FDM agent on each host and the allocation of the master and slave roles to the hosts.


As of vSphere 5.0, HA uses two different heartbeat mechanisms to ensure the health of the ESXi hosts within the HA-enabled cluster. The first of these is datastore heartbeating, a new feature as of vSphere 5.0. Datastore heartbeating adds an additional check in which HA uses the existing VMFS file system locking mechanism, which creates a heartbeat region. In the heartbeat region, at least one file per host is kept open on each selected heartbeat datastore (the default is two datastores). HA checks whether the heartbeat region has been updated; if it has, the host still has storage connectivity and therefore the virtual machines on the host don’t need to be restarted elsewhere. The diagram below shows the selection of three datastores and that currently, only two of the hosts within the cluster are attached to the two datastores. Good design practice is to let HA select the datastores, as HA will choose the datastores with the most connected hosts and, if applicable, both NFS and FC/iSCSI datastores, for added resiliency.
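The "most connected hosts" preference can be sketched in a few lines of Python. This is only an illustration of that one criterion: the function name and the host and datastore labels are hypothetical, and HA's real selection also weighs other factors, such as the mix of datastore types mentioned above.

```python
def pick_heartbeat_datastores(connectivity, count=2):
    """Return the `count` datastores that the most hosts are connected to.

    `connectivity` maps a datastore name to the set of hosts that can see it.
    Hypothetical helper for illustration; not part of any VMware API.
    """
    ranked = sorted(
        connectivity,
        key=lambda name: len(connectivity[name]),
        reverse=True,
    )
    return ranked[:count]


connectivity = {
    "ds1": {"host1", "host2", "host3"},
    "ds2": {"host1"},
    "ds3": {"host1", "host2"},
}
chosen = pick_heartbeat_datastores(connectivity)  # ds1 and ds3 are seen by the most hosts
```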

The other method is the standard network heartbeat: slaves use the heartbeat network to talk to the master, and the master sends a heartbeat to the slaves, as I mentioned earlier in this blog. When a slave stops receiving heartbeats from its master, it tries to ascertain whether it is itself isolated or partitioned, or whether the master is isolated or has failed. To learn more about the various states of isolated, partitioned and failed hosts, see the vSphere documentation on host failure types and detection, which describes them perfectly, as does the vSphere 5.1 Clustering Deepdive book by Duncan Epping and Frank Denneman.
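How the two heartbeat mechanisms combine can be sketched as a small decision function. This is a deliberately simplified model: the function name and state labels are hypothetical, and the real FDM logic distinguishes isolated from partitioned hosts using further checks (such as isolation addresses) that are not modeled here.

```python
def classify_host(network_heartbeat_ok, datastore_heartbeat_ok):
    """Simplified view of how a host's state might be inferred from its heartbeats.

    Hypothetical helper for illustration; not the actual FDM algorithm.
    """
    if network_heartbeat_ok:
        # Network heartbeats are arriving, so the host is considered healthy.
        return "healthy"
    if datastore_heartbeat_ok:
        # No network heartbeat, but the heartbeat region is still being updated:
        # the host is alive with storage access, so its VMs need no restart elsewhere.
        return "isolated_or_partitioned"
    # Neither heartbeat is present: treat the host as failed and restart its VMs.
    return "failed"


state = classify_host(network_heartbeat_ok=False, datastore_heartbeat_ok=True)
```

The second check is what prevents a pure network problem from being mistaken for a dead host.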

High Availability Installation

The installation of HA is actually as simple as ticking the box to Turn ON vSphere HA during the creation of a vSphere cluster or by going into the settings of an existing cluster and enabling HA from within the cluster setting panel as shown below:

Selecting Enable admission control lets the admission control mechanism reserve a determined percentage of resources, or a number of hosts’ worth of resources, for failover. I won’t go into all the different options and permutations, as there are many, but the capabilities and settings of HA are defined and explained in depth in the vSphere 5.1 Clustering Deepdive book by Duncan Epping and Frank Denneman.
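The idea behind a percentage-based reservation can be shown with a rough Python sketch. It is an assumption-laden simplification rather than HA's actual calculation: it considers CPU only, the function and parameter names are hypothetical, and the figures are made up for the example.

```python
def can_power_on(total_capacity_mhz, reserved_mhz, new_vm_mhz, failover_pct):
    """Return True if powering on a VM still leaves failover_pct of capacity free.

    Hypothetical, CPU-only model of percentage-based admission control.
    """
    usable_mhz = total_capacity_mhz * (1 - failover_pct / 100)
    return reserved_mhz + new_vm_mhz <= usable_mhz


# A cluster with 10,000 MHz in total, keeping 25% back for failover,
# has 7,500 MHz available for VM reservations.
small_vm_ok = can_power_on(10000, 6000, 1000, 25)  # 7,000 MHz fits within 7,500
large_vm_ok = can_power_on(10000, 6000, 2000, 25)  # 8,000 MHz would eat into the reserve
```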

Conclusion: High availability benefits for SMBs

VMware vSphere High Availability can deliver three nines (99.9%) of availability, which is the sweet spot for SMB customers looking for automated and intelligent failover of their virtual machines when hosts are lost or fail. The multiple heartbeating checks, together with the assurance via admission control that resources are set aside for one or more host failures, mean HA is a brilliant solution for SMBs.
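To put three nines in perspective, the downtime that an availability target permits per year is simple arithmetic, sketched here in Python. The function name is hypothetical and the calculation assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def allowed_downtime_hours(availability_pct):
    """Hours of downtime per year permitted by an availability target."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


three_nines = allowed_downtime_hours(99.9)
print(round(three_nines, 2))  # 8.76
```

In other words, 99.9% availability still allows for a little under nine hours of downtime across a year.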

Look out for the third part of this series, where I will be covering how BC/DR through the usage of vSphere Replication can help your SMB.


Gregg Robertson is a senior consultant, professional blogger, vExpert 2011 – 2014, VCAP5-DCA/DCD, VCP-Cloud, VCP 3/4/5, VMware communities moderator and co-host of the EMEA vBrownbag weekly webinars/podcasts. Gregg’s blog, TheSaffaGeek, started as a place to write down fixes plus VMware certification links and resources, but has quickly found a large following of readers and subscribers.


Follow VMware SMB on Facebook, Twitter, Spiceworks and Google+ for more blog posts, conversation with your peers, and additional insights on IT issues facing small to midmarket businesses.

* HA Architecture diagram from vSphere 5.0 Clustering Deepdive book by Duncan Epping and Frank Denneman

The Evolution of IT Disaster Recovery

Like all things in the data center, disaster recovery approaches have evolved to account for the volume and types of content that are now an everyday reality. Traditional manual disaster recovery solutions can’t keep up with these changes, making automation necessary.

These approaches took a big step forward with the adoption of virtualization. Back in 2001, many organizations were turning to virtualization to enable business continuity and disaster recovery, as well as to save money, increase IT efficiency, and improve business agility.

And then came the growing challenges stemming from rising DR requirements at the storage layer. By 2004, replication could account for 30 percent of a DR solution’s cost.[i]

Today the storage picture looks even more daunting. Demand for data storage is expected to grow 41 percent in the next two years.[ii] This increase is putting an even bigger burden on IT to determine how to handle this increased need for storage with budgets that are remaining flat.

So how can you keep pace with it all? Look to VMware Virtual SAN™, the industry leader in hyper-converged software-defined storage for virtual environments. With its radically simple approach to storage, Virtual SAN is an evolutionary step forward that can deliver big savings in capital expenditures.

How big? Direct procurement of Virtual SAN can cost 50 percent less than traditional SAN storage.[iii] And you can save even more by automating your DR processes with VMware vCenter® Site Recovery Manager™, which provides automated orchestration and non-disruptive testing of centralized recovery plans for your virtualized applications. It can help you reduce your DR management costs by more than 50 percent.

Clearly, the combination of VMware Virtual SAN and vCenter Site Recovery Manager is an important step forward in the ongoing evolution of IT disaster recovery.

For a closer look at the topics discussed here, check out the new VMware infographic covering The Evolution of IT Disaster Recovery.


[i] Forrester Research, “The Total Economic Impact of VMware vCenter Site Recovery Manager,” May 2013.

[ii] IDC, “Worldwide Quarterly Disk Storage Systems Forecast,” 2013.

[iii] Gartner Inc., “Competitive Profiles & Vendors/Resellers,” 2013.


Why Virtualizing Business-Critical Applications is a Team Sport

Post by vExpert Michael Webster

I’ve been helping companies of all sizes virtualize their business-critical applications for about 8 years now. A lot has changed over that time, especially the capability of VMware Hypervisors to handle the most demanding workloads, as you can see from the image below. There are almost no limits to the type of applications that are now good candidates for virtualization from a technical perspective.