vCloud Director 1.5 Multisite Cloud Considerations

By: Massimo Re Ferre', vCloud Architect

This is a repost from Massimo’s personal blog, IT 2.0 – Next Generation IT Infrastructures.

In the last few months, among other things, I have been working on a document called vCloud Director 1.5 Multisite Cloud Considerations. Being able to deploy vCloud Director 1.5 across different sites is something our customers and service provider partners have been asking us about a lot.

Some of these customers and partners have decided to deploy independent vCloud Director instances in different “sites”; others have wanted more clarity on how far they could stretch a single vCloud Director instance across multiple “sites”. Of course, both approaches have advantages and disadvantages.

We have never been very clear about the supportability boundaries other than “a single vCD instance can only be implemented in a single site”. What is a single site anyway? Is it a rack? Is it a building? Is it a campus? Is it a city? Is it a region? What is it? In this paper we have tried to clarify those boundaries. We have also provided some supportability guidelines.

In the document, we have described the various components that comprise a vCloud environment and we have classified them in macro areas such as provider workloads, user workload clusters and user workloads.

[Figure: VCD15MultisiteCloudConsiderations1]

In a nutshell, throughout the document, we have tried to clarify and classify different MAN and WAN scenarios based on network connectivity characteristics (namely latency). We have determined, in our vCD parlance, what would constitute a single site deployment (over a MAN) and what would constitute a multisite deployment (over WAN). We have determined 20 ms of latency to be “our” threshold between what we can support and what we cannot support with this specific vCloud Director 1.5 release.
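To make the threshold concrete, here is a minimal sketch of how one might check a link against it. It is not taken from the document and it is not a VMware tool: the function names, the use of a TCP handshake as a stand-in for one round trip, and the choice to judge a link by its worst sample are all my assumptions for illustration.

```python
import socket
import time

# The 20 ms RTT threshold quoted in this post for vCloud Director 1.5.
THRESHOLD_MS = 20.0


def measure_tcp_rtt_ms(host, port=443, timeout=2.0):
    """Approximate one round trip by timing a TCP handshake to host:port.

    Assumption: handshake time is close to one network round trip.
    Passing an IP address avoids adding DNS lookup time to the result.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return (time.perf_counter() - start) * 1000.0


def within_supported_rtt(samples_ms, threshold_ms=THRESHOLD_MS):
    """Return True if every RTT sample is at or below the threshold.

    Assumption: the worst sample is used, as a conservative reading of
    "up to 20 ms (RTT)"; a median or percentile would be another choice.
    """
    if not samples_ms:
        raise ValueError("need at least one RTT sample")
    return max(samples_ms) <= threshold_ms
```

For example, you could collect ten samples with `measure_tcp_rtt_ms` against the vCenter Server that manages a remote Provider vDC (the address is yours to supply) and pass the list to `within_supported_rtt`. Treat the result as a rough indication only, since a handshake timing is not a formal latency measurement.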

The document gets into a lot more detail and many more scenarios, but the two major takeaways are:

  • It is not possible to stretch the provider workloads, that is, the software modules that comprise your VMware vCloud (e.g. vCD cells, the vCD database, the NFS share, etc.).
  • It is possible to have Provider vDCs that are located up to 20 ms (RTT) from the provider workloads.

This picture summarizes one of the supported scenarios:

[Figure: VCD15MultisiteCloudConsiderations2]

In the doc we call out and describe more precisely other supported scenarios (such as stretched clusters) and the caveats associated with them. The following are the scenarios we are taking into account:

[Figure: VCD15MultisiteCloudConsiderations3]

It is important to understand that, when we talk about a distributed vCloud environment, we are not necessarily referring to DR of the end-user workloads. This is really about how a Service Provider can allow an end user to spin up workloads in a distributed environment. This doesn’t necessarily mean that the SP is responsible for failing over those workloads in the other data centers. If you want to know more about how to build a resilient vCloud architecture you should read this link.

Towards the end of the document we have summarized the supportability statements associated with distributing compute resources in a vCloud setup. In the current version of the doc the summary looks like this:

[Figure: VCD15MultisiteCloudConsiderations4]

If you are evaluating a multisite vCloud Director 1.5 deployment you may want to give this document a read. Note that it isn’t published externally on VMware.com but it is available through your VMware representative.

I’d be interested to hear any questions, comments, or feedback you may have. 

Massimo currently works at VMware as a Staff Systems Engineer, vCloud Architect. He works with Service Providers and Outsourcers to help them shape their Public Cloud services roadmap based on VMware cloud technologies. Massimo also blogs about Next Generation IT Infrastructures on his personal blog, IT 2.0.

4 thoughts on “vCloud Director 1.5 Multisite Cloud Considerations”

  1. Got a quick question for you. You mentioned that there is a 20ms RTT latency.
    So, if my math is correct 20ms translates roughly into just 10Mhz or 1.5Mbs for an MTU size of 1500 (std enet.) – yes?
    I’m trying to put that number into a real link speed needed for the WAN or MAN network.
    The more I think about this, the more I think that this RTT really talks about the speed at which vCD communicates with vCenter via API calls. Like, “please start a VM”, etc. If that is so, then this 20ms is perhaps based upon a timer in vCD. So if vCenter doesn’t come back with a response within that time frame, it will time out?
    If that is so, then this RTT is really based upon message RTTs and not just simply enet packet latency. So, if that’s the case… what is the actual link speed needed in Mhz or Ghz between the two sites??

  2. I am not familiar with measuring networking latency in Mhz to be honest. When we debated these boundaries internally we determined that both latency and bandwidth were important parameters. However, bandwidth was too dependent on what the cloud consumer was doing for us to shoot for absolute numbers. Deploying one small vApp once in a while across a WAN is different from continuously deploying ten big vApps in parallel.
    Latency, as you pointed out, is an easier number to describe in absolute terms as it tends to depict the behavior of the underlying platform and how the different pieces interact with each other. From experience, engineering decided to draw a line at 20ms because we know that if you stretch (the components we allow to stretch) within that boundary it will work. Consider this a conservative number. There isn’t any hard limit as such.
    As I said, I am not familiar with turning that latency number (expressed in ms) into other metrics like Ghz. If you could provide more background about your question we can see whether we can provide a better answer.
    Thanks.