Duncan Epping

About Duncan Epping

Duncan Epping is Principal Architect at VMware (R&D, Integration Engineering) and is focused on vCloud / vSphere architecture and integration. He was among the first VMware certified design experts (VCDX 007). He is the co-author of several books, including best seller vSphere 5.1 Clustering Technical Deepdive. He is the owner and main author of the leading virtualization blog yellow-bricks.com.

New Beta Program offering: VMware Hosted Beta

Many of you have probably participated in one of the many beta programs VMware has offered in the last couple of years. I personally participated in various beta programs when I was a customer / partner and I always loved going through the various exercises. The challenging part for me was always finding the time to set up the environment.

Recently VMware started offering a new way to participate in the evaluation and feedback of VMware’s developing products. The VMware Beta Program is now offering a Hosted Beta, providing registered users access to pre-built online lab environments with guided workflows to get a closer look at the latest and greatest VMware technologies without the need to build out infrastructure onsite.

This hosted technology is based on the same technologies used for the Hands-On Labs (HOL) at VMworld, providing a fully built environment to explore intricate product features while requiring nothing more than an HTML5 compliant browser and the latest View Client.

In my opinion this is a great opportunity to test-drive products and provide VMware with your feedback on the features still under development. On top of that this will allow you to spend 1-2 hour blocks to get acquainted with new technology, without the need to be on-site. You can do this at the office, or at home with just a connection to the internet.

If you are interested and want to learn more about the VMware Beta Program you can go here: http://communities.vmware.com/community/vmtn/beta

If you are interested in joining the VMware Beta Program you can either work with your VMware account team or submit a participation request form found here: http://communities.vmware.com/community/beta/betainterest

AWS Integration Survey

VMware is improving integration with Amazon Web Services and would like your input. vCloud Automation Center (based on our DynamicOps acquisition), Cloud Foundry, and vFabric Application Director already support provisioning on AWS, and we are interested in increasing support for AWS (through enhancements to vCAC, in particular). Do any teams in your company or organization use AWS? Are you interested in managing AWS usage better? If you could spare 5-10 minutes to answer some brief questions on your AWS usage — or a few seconds to let us know you don’t use AWS — it will guide our initiatives around AWS support and help prioritize enhancements to vCAC. Our survey is here:

http://vmdev.info/aws-survey

Technical Marketing Speaking Engagements – November 2012 – #tmupdate

After I saw Cormac posting his speaking engagements I figured why not do this for the whole team. I made a list for November; if you are in the area, make sure to stop by and attend!

Cormac Hogan:

Duncan Epping:

Frank Denneman:

Alan Renouf:

William Lam:

Justin King:

Jeff Hunter:

Mark Achtemichuk:

Jim Senicka:

Autoscaling in vSphere survey!

As the infrastructure layer matures, IT problems and bottlenecks are moving closer to the application layer. Autoscaling of application workloads is a feature that is important to customers building private clouds. VMware is uniquely positioned to deliver autoscaling of the infrastructure layer using insights into how applications are performing. With technologies like Hyperic APM and orchestration technologies like vCO and DynamicOps, VMware already owns key pieces of the puzzle for providing a powerful autoscaling solution. We are looking for feedback on what you feel is important for your applications.
We have created this survey and would very much appreciate your feedback. It doesn’t take longer than 10 minutes to fill out, so please take the time.

 

Should I use das.failuredetectiontime or das.config.fdm.isolationPolicyDelaySec?

Today I received a question around the use of das.failuredetectiontime and das.config.fdm.isolationPolicyDelaySec. Should you be using these advanced settings?

Short answer: no.

Long answer: This myth has been floating around for a long time. Many people were under the impression that das.failuredetectiontime needed to be configured when you started adding additional isolation addresses. This is not the case!

I just received an email asking whether the same applies to vSphere 5.1, now that “das.config.fdm.isolationPolicyDelaySec” has been introduced as a replacement for das.failuredetectiontime. The answer remains “no, you should not be using this”, unless there are specific networking requirements to do so. There is absolutely no need to increase the failure detection time / isolation policy delay when multiple isolation addresses are configured. Isolation addresses are pinged in parallel, which means no additional time is needed to complete this process.
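
To illustrate why parallel pings do not add time, here is a minimal Python sketch. It is not how vSphere HA is implemented; `fake_ping`, `check_isolation` and `timed_check` are hypothetical names, the addresses are documentation placeholders, and the ping is simulated with a sleep that always times out.

```python
import asyncio
import time


async def fake_ping(address: str, timeout_s: float) -> bool:
    # Hypothetical stand-in for a ping that never gets a reply:
    # it simply waits for the full timeout and reports failure.
    await asyncio.sleep(timeout_s)
    return False


async def check_isolation(addresses, timeout_s: float) -> bool:
    # All addresses are pinged concurrently, so the total wait is
    # roughly one timeout, not one timeout per address.
    results = await asyncio.gather(*(fake_ping(a, timeout_s) for a in addresses))
    return not any(results)  # isolated only if no address replied


def timed_check(addresses, timeout_s: float = 0.2):
    start = time.perf_counter()
    isolated = asyncio.run(check_isolation(addresses, timeout_s))
    return isolated, time.perf_counter() - start


if __name__ == "__main__":
    isolated, elapsed = timed_check(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
    print(f"isolated={isolated}, elapsed={elapsed:.2f}s")
```

With three addresses and a 0.2 second simulated timeout, the check finishes in roughly 0.2 seconds rather than 0.6, which is the reason adding isolation addresses does not call for a longer delay.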

In summary, it is not recommended to use these advanced settings unless you have specific networking requirements to do so.

A tweet says more than a 1000 words

Sometimes a 140 character tweet says more than a 1000 words…

If you are not using it today, enable it… it could also save your weekend!

@DuncanYB at VMworld 2012 Barcelona

I figured I would follow the infamous Cormac Hogan with a post around my VMworld activities. This is what my schedule looks like currently. If you want to meet me or have a discussion about anything VMware related, I highly recommend the Expert 1:1 or Group Discussions! Note that you can register for the Group Discussions but you will need to sign up for the 1:1 at the show itself.

NetApp and IBM both vMSC certified…

By Duncan Epping, Principal Architect.

More and more storage vendors are certifying their stretched cluster solutions. Recently NetApp and IBM have both been added to the VMware HCL. Below you can find the links to the KB articles which describe the supported configurations and the tested scenarios.

If you are about to deploy a stretched cluster, I highly recommend reading these KB articles. Get as familiar as you can with the failure scenarios described and test them over and over again. This is key for operating a stretched cluster and will also give you a deep understanding of how your environment responds to failures.

Admission control: “used slots” exceeds “total slots”

By Duncan Epping, Principal Architect.

On the VMTN forum today someone asked how it was possible that the “used slots” exceeded the “total slots”. This is what their environment showed in vCenter:

HA Advanced Runtime Info:
Slot size                          4000Mhz
                                   4 vCPUs,
                                   4232MB
Total Slots in Cluster             16
Used Slots                         66
Available Slots                    0
Total Powered on vms in Cluster    66
Total Hosts in cluster             2
Total good host                    2

You can imagine this person was very surprised to see this. How can you have 66 slots used and only 16 total slots available in your cluster? There are two possible explanations:

  1. Admission Control is disabled
  2. A reservation was set on a virtual machine after all virtual machines were powered on, skewing the numbers

Let’s tackle number 1 first. If you disable admission control, the vSphere UI still shows the slot size, the number of slots and so on; HA just does not enforce them, so you can power on more VMs than there are slots.

With regards to the second explanation it might be easier to give an example:

Just imagine you have 2 hosts, HA does its calculations and you have 100 slots available. You power on 100 VMs. Now you set a reservation on a VM; this reservation changes the slot size. HA does its calculations again based on the new slot size, which results in only 25 slots being available. However, you have already used 100 slots. In other words, you now have 25 total slots and 100 used slots.
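
The arithmetic in this example can be sketched in a few lines of Python. This is a simplified model, not the actual HA algorithm: it ignores memory overhead and any capacity held back for failover, and the host sizes, slot sizes and function names are illustrative assumptions chosen to reproduce the 100 and 25 figures.

```python
def slots_on_host(cpu_mhz: int, mem_mb: int, slot_cpu_mhz: int, slot_mem_mb: int) -> int:
    # A host can hold as many slots as its scarcer resource allows.
    return min(cpu_mhz // slot_cpu_mhz, mem_mb // slot_mem_mb)


def total_slots(hosts, slot_cpu_mhz: int, slot_mem_mb: int) -> int:
    return sum(slots_on_host(cpu, mem, slot_cpu_mhz, slot_mem_mb) for cpu, mem in hosts)


# Two hypothetical hosts as (CPU capacity in MHz, memory capacity in MB).
hosts = [(20000, 49152), (20000, 53248)]

# Assumed slot size before any reservation is set: 32 MHz / 1024 MB.
before = total_slots(hosts, slot_cpu_mhz=32, slot_mem_mb=1024)

# A 4096 MB memory reservation on one VM raises the slot size for every VM.
after = total_slots(hosts, slot_cpu_mhz=32, slot_mem_mb=4096)

used_slots = 100  # the 100 VMs that were powered on earlier keep running

if __name__ == "__main__":
    print(f"total slots before: {before}, after: {after}, used: {used_slots}")
```

The powered-on VMs are not re-admitted when the slot size changes, which is why the used count can end up higher than the recalculated total.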

vSphere HA isolation response… which to use when?

By Duncan Epping, Principal Architect.

A while back I wrote this article about a split brain scenario with vSphere HA. Although we have multiple techniques to mitigate these scenarios, it is always better to prevent them. I had already blogged about this before but I figured it wouldn’t hurt to get this out again and elaborate on it a bit more.

First some basics…

What is an “Isolation Response”?

The isolation response refers to the action that vSphere HA takes when the heartbeat network is isolated. The heartbeat network is usually the management network of an ESXi host. When a host does not receive any heartbeats, it triggers the response after a set number of seconds. So when exactly? That depends on whether the host is a slave or a master. This is the timeline:

Isolation of a slave

  • T0 – Isolation of the host (slave)
  • T10s – Slave enters “election state”
  • T25s – Slave elects itself as master
  • T25s – Slave pings “isolation addresses”
  • T30s – Slave declares itself isolated and “triggers” isolation response

Isolation of a master

  • T0 – Isolation of the host (master)
  • T0 – Master pings “isolation addresses”
  • T5s – Master declares itself isolated and “triggers” isolation response
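
The two timelines above can be written down as data, which makes it easy to see how long each role takes to reach the isolation response. This is only a sketch that restates the timings listed in this post; the names `TIMELINES`, `seconds_until_isolation_response` and `events_by` are hypothetical.

```python
# (seconds after isolation, event), as listed in the timelines above.
TIMELINES = {
    "slave": [
        (0, "host isolated"),
        (10, "enters election state"),
        (25, "elects itself as master"),
        (25, "pings isolation addresses"),
        (30, "declares itself isolated, triggers isolation response"),
    ],
    "master": [
        (0, "host isolated"),
        (0, "pings isolation addresses"),
        (5, "declares itself isolated, triggers isolation response"),
    ],
}


def seconds_until_isolation_response(role: str) -> int:
    # The last entry in each timeline is the isolation response itself.
    return TIMELINES[role][-1][0]


def events_by(role: str, elapsed_s: int):
    # Everything that has happened on the host up to the given moment.
    return [event for t, event in TIMELINES[role] if t <= elapsed_s]


if __name__ == "__main__":
    for role in TIMELINES:
        print(role, seconds_until_isolation_response(role))
```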

What are my options?

Today there are three options for the isolation response. The response is what the host does with the virtual machines running on it once it has validated that it is isolated.

  1. Power off – When a network isolation occurs all VMs are powered off. It is a hard stop.
  2. Shut down – When a network isolation occurs all VMs running on that host are shut down via VMware Tools. If this is not successful within 5 minutes a “power off” will be executed.
  3. Leave powered on – When a network isolation occurs on the host the state of the VMs remains unchanged.

Now that we know what the options are, which one should you use? Well, this depends on your environment. Are you using iSCSI/NAS? Do you have a converged network infrastructure? We’ve put the most common scenarios in a table.

| Likelihood that host will retain access to VM datastores | Likelihood that host will retain access to VM network | Recommended isolation policy | Explanation |
|---|---|---|---|
| Likely | Likely | Leave Powered On | VM is running fine so why power it off? |
| Likely | Unlikely | Either Leave Powered On or Shutdown | Choose shutdown to allow HA to restart VMs on hosts that are not isolated and hence are likely to have access to storage |
| Unlikely | Likely | Power Off | Use Power Off to avoid having two instances of the same VM on the VM network |
| Unlikely | Unlikely | Leave Powered On or Power Off | Leave Powered On if the VM can recover from the network/datastore outage if it is not restarted because of the isolation, and Power Off if it likely can’t |
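
The table above is effectively a lookup keyed on two yes/no questions, which can be sketched as follows. The function and variable names are hypothetical and the policy strings are copied from the table; this is an illustration of the decision matrix, not a vSphere API.

```python
from typing import List

# Key: (host likely keeps datastore access, host likely keeps VM network access).
RECOMMENDED_POLICY = {
    (True, True): ["Leave Powered On"],
    (True, False): ["Leave Powered On", "Shutdown"],
    (False, True): ["Power Off"],
    (False, False): ["Leave Powered On", "Power Off"],
}


def recommend_isolation_policy(keeps_datastore_access: bool,
                               keeps_vm_network: bool) -> List[str]:
    # Returns every policy the table lists for the given scenario; where
    # two are listed, the Explanation column says how to choose.
    return RECOMMENDED_POLICY[(keeps_datastore_access, keeps_vm_network)]


if __name__ == "__main__":
    # Converged network with IP storage: an isolated host probably loses both.
    print(recommend_isolation_policy(False, False))
```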

But why is it important? Well, just imagine you pick “leave powered on” in a converged network environment with iSCSI storage. Chances are fairly high that when the host management network is isolated, so are the virtual machine network and the storage for your virtual machines. In that case, having the virtual machines restarted will reduce the amount of “downtime” from an “application / service” perspective.

I hope this helps you make the right decision for the vSphere HA isolation response. Although it is just a small part of what vSphere HA does, it is important to understand the impact a wrong decision can have.