Tag Archives: vsphere

Writing Performant Tagging Code: Tips and Tricks for PowerCLI

vSphere 5.1 introduced an inventory tagging feature that has been available in all later versions of vSphere, including vSphere 6.7. Tags let datacenter administrators organize different vSphere objects like datastores, virtual machines, hosts, and so on. This makes it easier to sort and search for objects that share a tag, among other things. For example, you might use tags to track a group of VMs that all have the same operating system.

Writing code to use tags can be challenging in large-scale environments: a straightforward use of VMware PowerCLI cmdlets may result in poor performance, and while direct Tagging Service APIs are faster, the documentation can be difficult to understand. In this blog, we show some practical examples of using PowerCLI and Tagging Service APIs to perform tag-related operations. We include some simple measurements to show the performance improvements when using the Tagging Service vs. cmdlets. The sample performance numbers are for illustrative purposes only. We describe the test setup in the Appendix.

1. Connecting to PowerCLI and the Tagging Service

In this document, when we write “PowerCLI cmdlets,” we mean calls like Get-Tag, or Get-TagCategory. To access this API, simply open a PowerShell terminal and log in:

Connect-VIServer <vCenter server IP or FQDN> -User <username> -Pass <password>

When we write “Tagging Service APIs,” we are referring to calls that are satisfied by the Tagging Service. To access such APIs from a PowerShell terminal, you must log in both to the vCenter Server and to the Tagging Service:

# Login to vCenter
Connect-VIServer <VC server IP or FQDN> -User <username> -Pass <password>

# Login to the tagging server (known as the CIS server), which contains the Tagging Service
Connect-CISserver <VC server IP or FQDN> -User <username> -Pass <password>

# Get a handle to the Tagging Service APIs
$taggingAPI = Get-CisService com.vmware.cis.tagging.tag

# Get a handle to the tag assignment APIs
$tagAssignmentAPI = Get-CisService com.vmware.cis.tagging.tag_association

To access built-in help for the tagging APIs, append .help to the handle returned by Get-CisService. The example below shows the command, followed by the documentation that it prints.

# List available tag assignment method calls, using $tagAssignmentAPI from above
PS H:\> $tagAssignmentAPI.help

Documentation: The {@name TagAssociation} {@term service} provides {@term operations} to attach, detach, and query tags.

Operations: List<com.vmware.cis.tagging.tag_association.object_to_tags> list_attached_tags_on_objects(List<com.vmware.vapi.std.dynamic_ID> object_ids):

Fetches the {@term list} of {@link ObjectToTags} describing the input object identifiers and the tags attached to each object. To invoke this {@term operation}, you need the read privilege on each input object. The {@link ObjectToTags#tagIds} will only contain those tags for which you have the read privilege.

List<id> list_attachable_tags(com.vmware.vapi.std.dynamic_ID object_id): 

Fetches the {@term list} of attachable tags for the given object, omitting the tags that have already been attached. Criteria for attachability is calculated based on tagging cardinality ({@link CategoryModel#cardinality}) and associability ({@link CategoryModel#associableTypes}) constructs. To invoke this {@term operation}, you need the read privilege on the input object. The {@term list} will only contain those tags for which you have read privileges.

… <output truncated> …

2. High level differences between cmdlets and Tagging Service calls

2.1 Names vs. IDs

When using cmdlets, it is customary to use the name of a tag or category. For example, you might write the following to get the tag whose name is “john”.

PS H:\> $tag = Get-Tag -Name “john”

To get all tags whose names start with “john”, use the “*” wildcard:

PS H:\> $tagList = Get-Tag -Name “john*”

The Tagging Service, in contrast, typically requires IDs instead of names. These IDs persist throughout the lifetime of the tag or category, so they can be cached on creation and used throughout the lifetime of your scripts.

Example: Here is some sample code to retrieve the ID of a tag category given its name:

# We will find the ID of the tag category named "john"
$tagCatName = 'john'

# Get a handle to all tag category API calls
$allCategoryMethodSvc = Get-CisService com.vmware.cis.tagging.category

# List all category IDs
$allCats = $allCategoryMethodSvc.list()

# Iterate over categories to find the desired category (name = $tagCatName)
foreach ($cat in $allCats) {
      if (($allCategoryMethodSvc.get($cat.value)).name -eq $tagCatName) {
            # Set $catID to the ID of this category and stop searching
            $catID = $cat.value
            break
      }
}

Example:  Here is some code to find the tag ID of the tag named “john”.

$tagName = "john"

# Get a handle to all tag API calls
$allTagMethodSvc = Get-CisService com.vmware.cis.tagging.tag

# Return a list of all tag IDs in the system
$allTags = $allTagMethodSvc.list()

# Iterate over tag IDs to find the tag whose name is $tagName
foreach ($tag in $allTags) {
      if (($allTagMethodSvc.get($tag.value)).name -eq $tagName) {
            $tagID = $tag.value
            break
      }
}

If you know the name of the category, you can improve on the previous code by searching tags within this category.

# Assume that $catID was retrieved as specified above
$tagsForCatID = $allTagMethodSvc.list_tags_for_category($catID)
foreach ($tag in $tagsForCatID) {
      if (($allTagMethodSvc.get($tag.value)).name -eq $tagName) {
            $tagID = $tag.value
            break
      }
}

For performance reasons, it is a good idea to store a mapping of IDs to names. With such a map, you avoid the need to iterate over each tag ID or category ID whenever you need a tag or category name. We give an example of making such a map below under example 4 (see createTagNameIdMap).

2.2 Tag and category specifications for Tagging Service calls

When creating a tag or a category using cmdlets, there are many default parameters, and others can be specified on the command line. With Tagging Service calls, a “spec” must be created and used.

Here is an example of creating a tag category with “multiple” cardinality. (Multiple cardinality means that multiple tags from this category can be applied to a specific object at a time. For example, a category named “Owners” may have tags named “Alice” and “Bob”, and a given VM may have both “Alice” and “Bob” assigned to it. In contrast, single cardinality means that only one tag from a given category can be used on a specific object. For example, a category named “OS” may have tags named “Linux” and “Windows.” A given VM would have only one of these tags assigned.)

Creating a tag category using cmdlets

# Create tag category named ‘testCat’
$catName = ‘testCat’
New-TagCategory -Name $catName -Cardinality Multiple

Creating a tag category using the Tagging Service directly

$catName = ‘testCat’

# Get a handle to the category methods:
$allCategoryMethodSVC = Get-CisService com.vmware.cis.tagging.category

# Create a spec for the category
$catSpec = $allCategoryMethodSVC.Help.create.create_spec
$catSpec.cardinality = ‘MULTIPLE’
$catSpec.associable_types = ‘’

# NOTE: In vSphere 6.5, the category_id field should not be used.
# In vSphere 6.7, please set the category_id to ‘’.
$catSpec.category_id = ‘’
$catSpec.description = ‘’
$catSpec.name = $catName

# Now create the tag category
$catObject = $allCategoryMethodSVC.create($catSpec)

3. Performance considerations

In general, Tagging Service calls are faster than cmdlets, and the difference becomes larger as the inventory size or number of tags grows. There are two main reasons for this. First, a cmdlet presents a simpler interface to the user, but it often must make multiple backend calls to retrieve information that a single direct Tagging Service call can return. Second, cmdlets often use names rather than IDs in their invocations (e.g., Get-Tag <tagName> rather than Get-Tag -Id <tag-id>). Most of the tagging information is indexed by ID in the backend, so calls that use an ID are faster than calls that use a name.

4. Examples using cmdlets and Tagging Service calls

In the following examples, we show how to retrieve information using both cmdlets and Tagging Service calls. We also show sample performance numbers for a sample inventory with 3200 VMs (see the Appendix for details on the experimental setup). As noted above, the API for the Tagging Service is more efficient, though it requires using IDs rather than names. As a result, where possible, we suggest storing the ID/name mapping for tags and tag categories in a local file or data structure for fast lookup.
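As a rough illustration of storing the ID/name mapping in a local file, the sketch below writes a name-to-ID hashtable to a JSON file and reads it back. Save-NameIdMap and Import-NameIdMap are hypothetical helper names (they are not part of PowerCLI), and the sample ID is a made-up placeholder; in practice the hashtable would come from a function such as createTagNameIdMap.

```powershell
# Hypothetical helper (not part of PowerCLI): write a name-to-ID map to disk as JSON.
function Save-NameIdMap {
    param (
        [Parameter(Mandatory)] [hashtable] $Map,
        [Parameter(Mandatory)] [string] $Path
    )
    # A hashtable is passed down the pipeline as a single object,
    # so ConvertTo-Json emits one JSON object with one property per name.
    $Map | ConvertTo-Json | Set-Content -Path $Path -Encoding UTF8
}

# Hypothetical helper: read the JSON file back into a hashtable.
function Import-NameIdMap {
    param ([Parameter(Mandatory)] [string] $Path)
    $map = @{}
    # ConvertFrom-Json returns a PSCustomObject; copy its properties into a hashtable.
    $json = Get-Content -Path $Path -Raw | ConvertFrom-Json
    foreach ($prop in $json.PSObject.Properties) {
        $map[$prop.Name] = $prop.Value
    }
    return $map
}

# Sample data with a made-up placeholder ID.
$sampleMap = @{ 'testTag' = 'example-tag-id-1' }
$cacheFile = Join-Path ([System.IO.Path]::GetTempPath()) 'tagNameIdMap.json'

Save-NameIdMap -Map $sampleMap -Path $cacheFile
$restored = Import-NameIdMap -Path $cacheFile
```

A cache like this goes stale when tags are created or deleted, so a script that relies on it would need to rebuild the file after such changes.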

4.1 Global variables

In the examples that follow, we use a number of variables repeatedly. We define those variables here.

Cmdlet variables

$vsphereUnderTest = '<FQDN of the vCenter Server>'
$user = '<username for the vCenter under test>'
$pass = '<password for the vCenter under test>'
$catName = 'testCat'
$tagName = 'testTag'
$allVMS = Get-VM

Tagging Service variables

$vsphereUnderTest = '<FQDN of the vCenter Server>'
$user = '<username for the vCenter under test>'
$pass = '<password for the vCenter under test>'
$catName = 'testCat'
$tagName = 'testTag'
$vmNames = 'testVM_1', 'testVM_2', 'testVM_3', 'testVM_4', 'testVM_5'
$allVMs = Get-VM

4.2 Code Samples 

Example 1: Connecting to vCenter and the Tagging Service

In this example, we show how to connect to vCenter and the Tagging Service. For the Tagging Service, as described above, you must log in to both the vCenter and the “CIS” server.

1A: Cmdlets
Connect-VIServer -Server $vsphereUnderTest -User $user -Password $pass
1B: Tagging Service (requires login to both vCenter and the CIS server)
# Connect to vCenter
Connect-VIServer -Server $vsphereUnderTest -User $user -Password $pass
# Also connect to the CIS server
Connect-CISServer -Server $vsphereUnderTest -User $user -Password $pass

Discussion: Tagging Service requires logging in to both vCenter and the Tagging Service server (called the CIS server). When using the Tagging Service, you must also retrieve a handle to the appropriate Tagging Service methods. Here are some examples:

Getting method handles for various Tagging Services
# Category methods 
$allCategoryMethodSVC = Get-CisService com.vmware.cis.tagging.category 

# Tagging methods 
$allTagMethodSVC = Get-CisService com.vmware.cis.tagging.tag 

# Tag association methods 
$allTagAssociationMethodSVC = Get-CisService com.vmware.cis.tagging.tag_association

Example 2: Creating a tag category

In this example, we create a tag category. Recall that the category name $catName is defined above.

2A: Cmdlets
New-TagCategory -Name $catName -Cardinality Multiple
Average time to create tag category in our setup: 200 ms
2B: Tagging Service
# Create a spec for the category: 
$catSpec = $allCategoryMethodSVC.Help.create.create_spec 
$catSpec.cardinality = ‘MULTIPLE’ 
$catSpec.associable_types = ‘’
$catSpec.category_id = ‘’ 
$catSpec.description = ‘’ 
$catSpec.name = $catName 

# Now create the tag category 
$catObject = $allCategoryMethodSVC.create($catSpec)
   Average time to create tag category in our setup: 53 ms

Programming note: You don’t need to set the description, but if you don’t, the default value will be set to: @{Documentation=The description of the category.}

WARNING: If you are connected to a 6.7 vCenter, you need to specify the category_id; if you don’t, you will see an invalid_argument error.

Discussion: In the Tagging Service example, $catObject = $allCategoryMethodSVC.create($catSpec) creates the tag category, and the object returned is saved as $catObject. $catObject.Value is the ID of the category that was created.

Example 3: Creating a tag under a tag category

In this example, we create a tag under a tag category. As a convenience, we use the following function to get a category ID given the category’s name. We need the ID because the Tagging Service uses IDs, not names.

function Get-CategoryIdFromName {
    Param ($inputCatName)
    $allCategoryMethodSvc = Get-CisService com.vmware.cis.tagging.category
    $allCats = @()
    $allCats = $allCategoryMethodSvc.list()

    foreach ($cat in $allCats) {
        # Compare name of input category to current category. Return if match.
        if (($allCategoryMethodSvc.get($cat.value)).name -eq $inputCatName) {
            return $cat.value
        }
    }
    # no match
    return $null
}

Note: if you need to get category IDs from names multiple times, it is better to create a mapping of names to IDs. The following function creates a map of category names to IDs.

function createCategoryNameIdMap {
    $allCategoryMethodSvc = Get-CisService com.vmware.cis.tagging.category
    $allCats = $allCategoryMethodSvc.list()
    $catNameIdMap = @{}
    foreach ($cat in $allCats) {
        $catName = $allCategoryMethodSvc.get($cat.value).name
        # Map the category name to its ID
        $catNameIdMap[$catName] = $cat.value
    }
    return $catNameIdMap
}


Creating a tag under a tag category

3A: Cmdlets
New-Tag -Name $tagName -Category $catName
Average time to create a tag in our setup: 112 ms
3B: Tagging Service
# Use function defined above to get category ID from category name
$catID = Get-CategoryIdFromName $catName

# First create a tag spec.
$spec = $allTagMethodSVC.Help.create.create_spec

$spec.name = $tagName
$spec.description = ''
$spec.tag_id = ''
# Get-CategoryIdFromName already returns the ID value, so use it directly
$spec.category_id = $catID

# Now create the tag
$tagObject = $allTagMethodSVC.Create($spec)
Average time to create a tag in our setup: 64 ms

Programming note: You don’t need to set the description, but if you don’t, the default value will be set to: @{Documentation=The description of the tag.}

WARNING: If you are connected to a 6.7 vCenter, you need to specify the tag_id; if you don’t, you will see an invalid_argument error.

Discussion: The Tagging Service example uses the function Get-CategoryIdFromName to get the category ID. This line of code can be eliminated if you save the category object $catObject from Example 2B. If you do this, you will also need to change the line $spec.category_id = $catID to $spec.category_id = $catObject.value.

When we create the tag, we save the object that is returned by $allTagMethodSVC.Create($spec) in $tagObject.

Example 4: Associating a tag with a VM

In this example, we associate a tag with a VM (also known as “attaching a tag to a VM”). The Tagging Service APIs, as mentioned above, use tag IDs instead of tag names. As a convenience, we use the following function to get a tag ID given the tag’s name.

function Get-TagIdFromName {
    Param ($a)
    $allTagMethodSvc = Get-CisService com.vmware.cis.tagging.tag
    $allTags = @()
    $allTags = $allTagMethodSvc.list()
    foreach ($tag in $allTags) {
        # Compare name of input tag to current tag. Return if match.
        if (($allTagMethodSvc.get($tag.value)).name -eq $a) {
            return $tag.value
        }
    }
    # no match
    return $null
}

In addition to using the tag ID instead of the tag name, the tag attachment API requires a specially-created VM object, as shown in the example below.

Attaching a tag to one VM

4A: Cmdlets
$tagArray = Get-Tag -Category $catName
New-TagAssignment -tag $tagArray[0] -entity $allVMS[0]
Average time to attach a tag in our setup: 2717 ms
4B: Tagging Service
# Pick a VM to attach the tag to
$testVM = Get-VM -Name $vmNames[0]

# The Tagging Service needs a VM object as an argument.
# We recommend doing this once and storing the result rather than recreating it each time you need it.
$VMInfo = $testVM.ExtensionData.MoRef
$vmid = New-Object PSObject -Property @{
    id = $VMInfo.value
    type = $VMInfo.Type
}

# The Tagging Service uses tag IDs, not names, so get the ID from the name using the function above.
$tagId = Get-TagIdFromName($tagName)

# Now attach the tag to the VM.
$allTagAssociationMethodSVC.attach($tagId, $vmid)
Average time to execute the attach() call in our setup: 35 ms

Programming note: $tagId uses our Get-TagIdFromName convenience function.

Note: If you need to get tag IDs from names multiple times, it is probably better to create a mapping of names to IDs. The following function creates a map of tag names to IDs:

function createTagNameIdMap {
    $allTagMethodSvc = Get-CisService com.vmware.cis.tagging.tag
    $allTags = $allTagMethodSvc.list()
    $tagNameIdMap = @{}
    foreach ($tag in $allTags) {
        $tagName = $allTagMethodSvc.get($tag.value).name
        # Map the tag name to its ID
        $tagNameIdMap[$tagName] = $tag.value
    }
    return $tagNameIdMap
}

To use the function above, you could do the following (we assume that there is a tag named “test tag”):

$tmap = createTagNameIdMap
$testTagId = $tmap[“test tag”]

Example 5: Get the tags assigned to a VM (1 tag associated with the VM)

In this example, we get the tags assigned to a VM. The cmdlet assumes that we have an array of VMs $allVMs (defined above in “Global Variables”), and it gets the association to the first one. The Tagging Service needs a VM ID object rather than a VM name. We use the $vmid object created in the previous example. We assume there is one tag associated with the VM.

5A: Cmdlets
# Pick the first VM in our list of VMs ($allVMs)
Get-TagAssignment -Entity $allVMS[0]
Average time to get tag assignment in our setup: 2610 ms
5B: Tagging Service
# Assume we have the VM object $vmid from the previous example, and assume we have only 1 tag association.
$tagID = $allTagAssociationMethodSVC.list_attached_tags($vmid)
Average time to get tag assignment in our setup: 49 ms

Programming note: The above code will return the tag ID. If you need more information about the tag, pass the ID to $allTagMethodSVC.get. If you want the name, read the name property of the object that get returns.

Example 6: Assign one tag to 1000 VMs

In this example, we assign a single tag to 1000 VMs. In both the cmdlet case and the Tagging Service case, we assume an array of VMs ($allVMs), as defined above. For the Tagging Service, we create an array of VM ID objects from this $allVMs array. We also use the $tagID created in the previous example.

6A: Cmdlets
# Pick 1000 VMs from $allVMs and assign the first tag in $tagArray to each VM.
$useTheseVMs = $allVMS[0..999] 
$useTheseVMs | New-TagAssignment -Tag $tagArray[0]
Average time to attach 1k VMs to one tag in our setup: 2727842 ms
6B: Tagging Service
# Use $allVMs from Global Variables above.
$useTheseVMIDS = @()

# Create VM objects for all VMs.
# This should be done once for all VMs, not every time
# you want to do an association.
for ($i = 0; $i -lt 1000; $i++) {
    $VMInfo = $allVMS[$i].extensiondata.moref
    $useTheseVMIDS += New-Object PSObject -Property @{
        id = $VMInfo.value
        type = $VMInfo.type
    }
}

# Assume we have $tagId from the previous example.
$allTagAssociationMethodSVC.attach_tag_to_multiple_objects($tagID, $useTheseVMIDS)

Once VM objects have been created, the average time to attach 1k VMs to one tag: 18438 ms
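Since several examples build the same id/type object from a VM's MoRef, that step could be wrapped in a small pipeline function. The sketch below is one possible shape; ConvertTo-TaggingObjectId is a hypothetical name, not a PowerCLI cmdlet, and the stand-in MoRef objects exist only so that the snippet runs without a vCenter connection. Collecting the pipeline output in one assignment also avoids growing an array with += inside a loop, which copies the array on every iteration.

```powershell
# Hypothetical helper: convert objects that have Type and Value properties
# (such as $vm.ExtensionData.MoRef) into the id/type objects used above.
function ConvertTo-TaggingObjectId {
    param (
        [Parameter(Mandatory, ValueFromPipeline)] $MoRef
    )
    process {
        [PSCustomObject]@{
            id   = $MoRef.Value
            type = $MoRef.Type
        }
    }
}

# Stand-in MoRefs (illustrative values only).
$fakeMoRefs = 1..3 | ForEach-Object {
    [PSCustomObject]@{ Type = 'VirtualMachine'; Value = "vm-$_" }
}

# Build all ID objects in a single pipeline.
$vmIdObjects = @($fakeMoRefs | ConvertTo-TaggingObjectId)
```

With a live connection, the input would presumably be something like $allVMS[0..999] | ForEach-Object { $_.ExtensionData.MoRef }, and the resulting array could be stored and reused across Tagging Service calls.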

Example 7: List the VMs assigned to one tag

In this example, we retrieve the VMs assigned to one tag. In the cmdlet case, we assume that there is an array of 1000 tags $tagArray, and we find the VMs assigned to the first tag. In the Tagging Service case, we need to use a tag ID instead of a name, so we use $tagID from the previous example.

7A: Cmdlets
# Get VMs attached to the first tag in $tagArray.
$vmsAssignedToTag = Get-VM -Tag $tagArray[0]
Average time to search for the VMs: 25844 ms
7B: Tagging Service
# Assume we have $tagID from earlier tests. 
$vmsAssignedToTag = $allTagAssociationMethodSVC.list_attached_objects($tagId)
Average time to search for the VMs: 411 ms

Programming note: The return value $vmsAssignedToTag is a list of PSCustomObjects. To find the type, use $vmsAssignedToTag.type, and to find the ID, use $vmsAssignedToTag.id.

Note: The return value of the list_attached_objects method is a list of VM objects. If you want to get the ID of a given VM for subsequent Tagging Service calls, you must do the following:

# Get the first VM in the list.
$vm_0 = $vmsAssignedToTag[0]

# Get the type. In this case, it is VirtualMachine.
$v_type = $vm_0.type

# Get the id. In this case, it will be something like ‘vm-222’.
$v_id = $vm_0.id

# Construct an ID from the type and ID.
# The result is of the form ‘VirtualMachine-vm-222’.
$VirtualMachineId = $v_type + "-" + $v_id

# This ID can now be used to retrieve additional information about VMs.
$vm_0_get = Get-VM -Id $VirtualMachineId
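The same type-plus-ID construction can be applied to every returned object at once. The following is a minimal sketch; Get-InventoryIdString is a hypothetical helper name, and the sample objects stand in for the output of list_attached_objects so that the snippet runs without a vCenter connection.

```powershell
# Hypothetical helper: build an ID string of the form '<type>-<id>'
# for each object that has type and id properties.
function Get-InventoryIdString {
    param (
        [Parameter(Mandatory, ValueFromPipeline)] $AttachedObject
    )
    process {
        '{0}-{1}' -f $AttachedObject.type, $AttachedObject.id
    }
}

# Stand-ins for the objects returned by list_attached_objects (illustrative values).
$sampleObjects = @(
    [PSCustomObject]@{ type = 'VirtualMachine'; id = 'vm-222' },
    [PSCustomObject]@{ type = 'VirtualMachine'; id = 'vm-223' }
)

# One ID string per attached object.
$vmIdStrings = @($sampleObjects | Get-InventoryIdString)
```

The resulting strings have the same form as $VirtualMachineId above, so they should be usable with Get-VM -Id in the same way.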

Example 8: Attach 1000 tags to one VM

In this example, we attach 1000 tags to a single VM. We assume that you have already created 1000 tags, for example, by running the code in Example 3 in a loop.

8A: Cmdlets
$useTheseTags = $tagArray[0..999]
foreach ($tag in $useTheseTags) {
  New-TagAssignment -Entity $allVMS[1] -Tag $tag
}
Average time to attach 1000 tags to 1 VM in our setup: 3236949 ms
8B: Tagging Service
# Get list of tags you want to attach
$tagArray = $allTagMethodSVC.list()
$useTheseTags = $tagArray[0..999]

# Create a VM object to attach the tags to. We re-use a VM from our list above.
# For best performance, create these VM objects once and store them: do not
# create them every time you wish to do an association.
$VMInfo = ($allVMS[0]).extensiondata.moref
$vmid = New-Object PSObject -Property @{
    id = $VMInfo.value
    type = $VMInfo.type
}

# Now do the actual association.
$allTagAssociationMethodSVC.attach_multiple_tags_to_object($vmid, $useTheseTags)
Once VM object has been created, average time to attach 1000 tags to 1 VM in our setup: 1511 ms

Example 9: List all the tags associated with a VM (1000 associations)

In this example, we list all tags associated with a VM. In our test, we use a VM that has 1000 tags associated with it. In both the cmdlet case and the Tagging Service case, we use the $allVMs array above, and pick the first VM.

9A: Cmdlets
Get-TagAssignment -Entity $allVMs[0]
Average time to get tag assignments in our setup: 3715 ms
9B: Tagging Service
# Create VM object. Do this once and store it, rather than
# creating it every time you need to retrieve tag associations.
$VMInfo = $allVMs[0].extensiondata.moref
$vmid = New-Object PSObject -Property @{
   Id = $VMInfo.value
   Type = $VMInfo.type
}
$tagIDArray = $allTagAssociationMethodSVC.list_attached_tags($vmid)
Average time to get tag assignments after object has been created: 63 ms

Programming note: The above code will return an array of tag IDs. If you need to get more information about the tags, use $allTagMethodSVC.get for each tag ID.

5. Takeaways

As you can see, the cmdlets are typically shorter and more intuitive to write than the Tagging Service scripts. However, the performance of Tagging Service calls can be substantially better than cmdlets. For small inventories or small numbers of tags, cmdlet performance is likely to be adequate. For larger inventories, and for better performance, we recommend using Tagging Service calls.
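To put the sample numbers side by side, the short script below computes the ratio of cmdlet time to Tagging Service time for Examples 2 through 9. The timings are copied from the examples above; the variable and property names are ours. In this setup the ratio ranges from under 2x for creating a tag to more than 2000x for attaching 1000 tags to one VM.

```powershell
# Average times in milliseconds, copied from Examples 2 through 9.
$timings = @(
    [PSCustomObject]@{ Example = '2: create tag category';      CmdletMs = 200;     ServiceMs = 53 },
    [PSCustomObject]@{ Example = '3: create tag';               CmdletMs = 112;     ServiceMs = 64 },
    [PSCustomObject]@{ Example = '4: attach tag to one VM';     CmdletMs = 2717;    ServiceMs = 35 },
    [PSCustomObject]@{ Example = '5: get tags on VM (1 tag)';   CmdletMs = 2610;    ServiceMs = 49 },
    [PSCustomObject]@{ Example = '6: one tag to 1000 VMs';      CmdletMs = 2727842; ServiceMs = 18438 },
    [PSCustomObject]@{ Example = '7: list VMs for one tag';     CmdletMs = 25844;   ServiceMs = 411 },
    [PSCustomObject]@{ Example = '8: 1000 tags to one VM';      CmdletMs = 3236949; ServiceMs = 1511 },
    [PSCustomObject]@{ Example = '9: get tags on VM (1000)';    CmdletMs = 3715;    ServiceMs = 63 }
)

# Speedup = cmdlet time divided by Tagging Service time, rounded to one decimal.
$speedups = foreach ($t in $timings) {
    [PSCustomObject]@{
        Example = $t.Example
        Speedup = [math]::Round($t.CmdletMs / $t.ServiceMs, 1)
    }
}

$speedups | Format-Table -AutoSize
```

As noted earlier, these numbers are illustrative only; the Tagging Service timings for Examples 4, 6, 8, and 9 also exclude the one-time cost of creating the VM ID objects.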

Appendix: Experimental environment

For these examples, we used a testbed that had 32 hosts and 3200 VMs. These measurements were done after creating 10 tag categories with 1000 tags per category (for a total of 10,000 tags in the system). Actual performance for your environments will vary with inventory configuration and the configuration of the vCenter appliance.

PowerShell version 5.1.14409.1005
VMware PowerCLI 10.1.0 build 8346946
vSphere 6.7 GA
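
The article does not describe how the averages were collected. One plausible way to gather comparable numbers in your own environment is to time a script block several times with the built-in Measure-Command cmdlet and average the results. Measure-AverageMs and its parameters are hypothetical names used for this sketch, and the Start-Sleep call is a stand-in workload so that the snippet runs without a vCenter connection.

```powershell
# Hypothetical helper: run a script block several times and return
# the mean elapsed time in milliseconds.
function Measure-AverageMs {
    param (
        [Parameter(Mandatory)] [scriptblock] $Action,
        [ValidateRange(1, 100000)] [int] $Iterations = 5
    )
    $total = 0.0
    for ($i = 0; $i -lt $Iterations; $i++) {
        # Measure-Command returns a TimeSpan for a single run of the block.
        $total += (Measure-Command -Expression $Action).TotalMilliseconds
    }
    return $total / $Iterations
}

# Stand-in workload. With a live connection, the block might instead wrap a call
# such as $allTagAssociationMethodSVC.list_attached_tags($vmid).
$avgMs = Measure-AverageMs -Action { Start-Sleep -Milliseconds 20 } -Iterations 3
```

Repeating a read-only call is straightforward, but calls that change state (creating or attaching tags) would need cleanup between iterations to measure the same operation each time.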

About the Authors

Joseph Zuk is a staff-level automation test engineer working at VMware within the Performance Engineering Automation team. He focuses on at-scale performance testing of vCenter. He has been using PowerCLI for setup of his automation testbed since 2011.

Ravi Soundararajan is a principal engineer in the Performance Engineering group at VMware. He works on vCenter performance and scalability–from the UI, to the server, to the database, to the hypervisor management agents. He has been at VMware since 2003, and he has presented on the topic of vCenter Performance at VMworld from 2013-2018. His Twitter handle is @vCenterPerfGuy.

First VMmark 3.1 Publications, Featuring New Cascade Lake Processors

VMmark is a free tool used by hardware vendors and others to measure the performance, scalability, and power consumption of virtualization platforms.  If you’re unfamiliar with VMmark 3.x, each tile is a grouping of 19 virtual machines (VMs) simultaneously running diverse workloads commonly found in today’s data centers, including a scalable Web simulation, an E-commerce simulation (with backend database VMs), and standby/idle VMs.

As Joshua mentioned in a recent blog post, we released VMmark 3.1 in February, adding support for persistent memory, improving workload scalability, and better reflecting secure customer environments by increasing side-channel vulnerability mitigation requirements.

I’m happy to announce that today we published the first VMmark 3.1 results.  These results were obtained on systems meeting our industry-leading side-channel-aware mitigation requirements, thus continuing the benchmark’s ability to provide an indication of real-world performance.

Some mitigations for recently-discovered side-channel vulnerabilities (i.e., Spectre, Meltdown, and L1TF) incur significant performance impacts, while others have little or no impact.  Today’s VMmark results demonstrate that even when additional mitigations are in place, ESXi hosts using the new 2nd-Generation Intel® Xeon® Scalable processors obtain higher VMmark scores than comparable 1st-Generation Intel Xeon Scalable processors.  This is due to processor design improvements that reduce (or even negate) the performance impact of security mitigations, by mitigating some of the security vulnerabilities in hardware rather than in software.

These results, from Fujitsu, span all three VMmark publication categories:

  1. Performance Only (9.02 @ 9 tiles)
  2. Performance with Server Power (6.3290 @ 9 tiles)
  3. Performance with Server and Storage Power (3.5013 @ 9 tiles)

So, how does this new performance result with Cascade Lake processors compare to the previous generation with Skylake processors?  Hopefully a graph is worth a thousand words 😊…

[Figure: Fujitsu Skylake to Cascade Lake graph]

As you can see, Fujitsu was able to achieve a higher score, while being able to run an additional tile (19 more VMs) and still meeting strict Quality-of-Service (QoS) compliance requirements imposed by the VMmark benchmark harness.

Industry-Leading Side-Channel Mitigation Requirements
Given the numerous security vulnerabilities recently identified, we set a high bar in VMmark 3.1 that requires all applicable security mitigations in benchmarked environments to best represent secure, real-world customer environments.

These are the current security mitigation requirements for VMmark 3.1:

[Table: VMmark 3.1 security mitigations]

Note: If “N/A” is listed, that vulnerability does not apply to that portion of the stack.

For more information about VMmark, please visit the VMmark product page.

If you have any questions or feedback, please leave us a comment below.  Thanks!

Oracle Database Performance with VMware Cloud on AWS

You’ve probably already heard about VMware Cloud on Amazon Web Services (VMC on AWS). It’s the same vSphere platform that has been running business critical applications for years, but now it’s available on Amazon’s cloud infrastructure. Following up on the many tests that we have done with Oracle databases on vSphere, I was able to get some time on a VMC on AWS setup to see how Oracle databases perform in this new environment.

It is important to note that VMC on AWS is vSphere running on bare metal servers in Amazon’s infrastructure. The expectation is that performance will be very similar to “regular” onsite vSphere, with the added advantage that the hardware provisioning, software installation, and configuration are already done and the environment is ready to go when you log in. The vCenter interface is the same, except that it references the Amazon instance type for the server.

Our VMC on AWS instance is made up of four ESXi hosts. Each host has two 18-core Intel Xeon E5-2686 v4 (aka Broadwell) processors and 512 GB of RAM. In total, the cluster has 144 cores and 2 TB of RAM, which gives us lots of physical resources to utilize in the cloud.

In our test, the database VMs were running Red Hat Enterprise Linux 7.2 with Oracle 12c. To drive a load against the database VMs, a single 18 vCPU driver VM was running Windows Server 2012 R2, and the DVD Store 3 test workload was also set up on the cluster. A 100 GB DS3 test database was created on each of the Oracle database VMs. During testing, the number of threads driving load against the databases was increased until maximum throughput was achieved, which was at around 95% CPU utilization. The total throughput across all database servers for each test is shown below.


In this test, the DB VMs were configured with 16 vCPUs and 128 GB of RAM. In the 8 VM test case, a total of 128 vCPUs were allocated across the 144 cores of the cluster. In addition, the cluster was running the 18 vCPU driver VM, vCenter, vSAN, and NSX. This makes the 12 VM test case interesting: there were 192 vCPUs for the DB VMs, plus 18 vCPUs for the driver. Hyperthreading clearly helps, allowing performance to continue to scale even though more vCPUs are allocated than there are physical cores.
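
The vCPU arithmetic behind this observation can be checked with a few lines of PowerShell. The sketch uses only the figures stated in this post (4 hosts, two 18-core processors per host, 16 vCPUs per DB VM, one 18 vCPU driver VM); it leaves out vCenter, vSAN, and NSX because their vCPU usage is not given, and the variable names are ours.

```powershell
# Physical cores in the cluster: 4 hosts x 2 sockets x 18 cores.
$hostCount      = 4
$socketsPerHost = 2
$coresPerSocket = 18
$physicalCores  = $hostCount * $socketsPerHost * $coresPerSocket

$vcpusPerDbVm = 16
$driverVcpus  = 18

# Compare allocated vCPUs (DB VMs plus driver VM) to physical cores
# for the 8 VM and 12 VM test cases.
$overcommit = foreach ($dbVmCount in 8, 12) {
    $dbVcpus    = $dbVmCount * $vcpusPerDbVm
    $totalVcpus = $dbVcpus + $driverVcpus
    [PSCustomObject]@{
        DbVms         = $dbVmCount
        DbVcpus       = $dbVcpus
        TotalVcpus    = $totalVcpus
        PhysicalCores = $physicalCores
        Ratio         = [math]::Round($totalVcpus / $physicalCores, 2)
    }
}

$overcommit | Format-Table -AutoSize
```

With the driver VM included, the 8 VM case already allocates slightly more vCPUs than the cluster has cores, and the 12 VM case allocates 210 vCPUs against 144 cores.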

The performance itself represents scaling very similar to what we have seen with Oracle and other database workloads with vSphere in recent releases. The cluster was able to achieve over 370 thousand orders per minute with good scaling from 1 VM to 12 VMs. We also recently published similar tests with SQL Server on the same VMC on AWS cluster, but with a different workload and more, smaller VMs.

UPDATE (07/30/2018): The whitepaper detailing these results is now available here.

SQL Server Performance of VMware Cloud on AWS

In the past, I’ve always benchmarked performance of SQL Server VMs on vSphere with “on-premises” infrastructure.  Given the skyrocketing interest in the cloud, I was very excited to get my hands on VMware Cloud on AWS – just in time for Amazon’s AWS Summit!

A key question our customers have is: how well do applications (like SQL Server) perform in our cloud?  Well, I’m happy to report that the answer is great!

VMware Cloud on AWS Environment

First, here is a screenshot of what my vSphere-powered Software-Defined Data Center (SDDC) looks like:

[Screenshot: vSphere Client - VMware Cloud on AWS]

This screenshot shows several notable items:

  • The HTML5-based vSphere Client interface should be very familiar to vSphere administrators, making the move to the cloud extremely easy.
  • This SDDC instance was auto-provisioned with 4 ESXi hosts and 2TB of memory, all of which were pre-configured with vSAN storage and NSX networking.
    • Each host is configured with two CPUs (Intel Xeon Processor E5-2686 v4); each socket contains 18 cores running at 2.3GHz, resulting in 144 physical cores in the cluster. For more information, see the VMware Cloud on AWS Technical Overview
  • Virtual machines are provisioned within the customer workload resource pool, and vSphere DRS automatically handles balancing the VMs across the compute cluster.

Benchmark Methodology

To measure SQL Server database performance, I used HammerDB, an open-source database load testing and benchmarking tool.  It implements a TPC-C like workload, and reports throughput in TPM (Transactions Per Minute).

To measure how well performance scaled in this cloud, I started with a single 8 vCPU, 32GB RAM VM for the SQL Server database.  To drive the workload, I created a 4 vCPU, 4GB RAM HammerDB driver VM.  I then cloned these VMs to measure 2 database VMs being driven simultaneously:

[Screenshot: HammerDB and SQL Server VMs in VMware Cloud on AWS]

I then doubled the number of VMs again to 4, 8, and finally 16.  As with any benchmark, these VMs were completely driven up to saturation (100% load) – “pedal to the metal”!


So, how did the results look?  Well, here is a graph of each VM count and the resulting database performance:

As you can see, database performance scaled great; when running 16 8-vCPU VMs, VMware Cloud on AWS was able to sustain 6.7 million database TPM!

I’ll be detailing these benchmarks more in an upcoming whitepaper, but wanted to share these results right away.  If you have any questions or feedback, please leave me a comment!

UPDATE (07/25/2018): The whitepaper detailing these results is now available here.

ESX IP Storage Troubleshooting Best Practice White Paper

We have published an ESX IP Storage Troubleshooting Best Practice white paper in which we recommend that vSphere customers deploying ESX IP storage over 10G networks include 10G packet capture systems as a best practice to ensure network visibility.

The white paper explores the challenges and alternatives for packet capture in a vSphere environment with IP storage (NFS, iSCSI) datastores over a 10G network, and explains why traditional techniques for capturing packet traces on 1G networks will suffer from severe limitations (capture drops and inaccurate timestamps) when used for 10G networks. Although commercial 10G packet capture systems are commonly available, they may be beyond the budget of some vSphere customers. We present the design of a self-assembled 10G packet capture solution that can be built using commercial components relatively inexpensively. The self-assembled solution is optimized for common troubleshooting scenarios where short duration packet captures can satisfy most analysis requirements.

Our experience troubleshooting a large number of IP storage issues has shown that the ability to capture and analyze packet traces in an ESX IP storage environment can significantly reduce the mean time to resolution for serious functional and performance issues. When reporting an IP storage problem to VMware or to a storage array vendor, an accompanying packet trace file is a great piece of evidence that can significantly reduce the time required by the responsible engineering teams to identify the problem.

Performance of SQL Server 2017 for Linux VMs on vSphere 6.5

Microsoft SQL Server has long been one of the most popular applications to run on vSphere virtual machines.  Last year there was quite a bit of excitement when Microsoft announced it was bringing SQL Server to Linux.  Interest in SQL Server for Linux has remained strong over the past year, and at Microsoft Ignite last month Microsoft announced that it is now officially launched and generally available.

VMware and Microsoft have collaborated to validate and support the functionality and performance scalability of SQL Server 2017 on vSphere-based Linux VMs.  The results of that work show that SQL Server 2017 for Linux installs easily and performs very well in VMware vSphere virtual machines. vSphere is a great environment for trying out the new Linux version of SQL Server while still getting great performance.

Using CDB, a cloud database benchmark developed by the Microsoft SQL Server team, we were able to verify that the performance of SQL Server for Linux in a vSphere virtual machine was similar to other non-virtualized and virtualized operating systems or platforms.

Our initial reference test size was relatively small, so we wanted to test larger sizes to see how well SQL Server 2017 for Linux performed as the VM size was scaled up.  For the test, we used a four-socket Intel Xeon E7-8890 v4 (Broadwell)-based server with 96 cores (24 cores per socket).  The initial test began with a 24 virtual CPU VM to match the number of physical cores of a single socket.  Additional tests were run by increasing the size of the VM by 24 vCPUs for each test until, in the final test, the VM had 96 total vCPUs.  We configured the virtual machine with 512 GB of RAM and separate log and data disks on an SSD-based Fibre Channel SAN.  We used the same best practices for SQL Server for Linux that we normally use for the Windows version, as documented in our published best practices guide for SQL Server on vSphere.
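The VM sizes in this scale-up test step through the server one socket's worth of cores at a time. A minimal sketch of that progression, using the core counts given above (the variable names are illustrative):

```python
TOTAL_CORES = 96
CORES_PER_SOCKET = 24

# One VM size per test: each step adds the core count of one more socket.
vm_sizes = list(range(CORES_PER_SOCKET, TOTAL_CORES + 1, CORES_PER_SOCKET))

for vcpus in vm_sizes:
    sockets = vcpus // CORES_PER_SOCKET
    print(f"{vcpus} vCPUs -> matches the physical cores of {sockets} socket(s)")
```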

The results showed that SQL Server 2017 for Linux scaled very well as the additional vCPUs were added to the virtual machine. SQL Server 2017 for Linux is capable of scaling up to handle very large databases on VMware vSphere 6.5 Linux virtual machines.

Skylake Update – Oracle Database Performance on vSphere 6.5 Monster Virtual Machines

We were able to get one of the new four-socket Intel Skylake-based servers and run some more tests. Specifically, we used the Xeon Platinum 8180 processors with 28 cores each. The new data has been added to the Oracle Monster Virtual Machine Performance on VMware vSphere 6.5 whitepaper. Please check out the paper for the full details and context of these updates.

The generational testing in the paper now includes a fifth generation with a 112 vCPU virtual machine running on the Skylake-based server. The performance gain from the initial 40 vCPU VM on Westmere-EX to the Skylake-based 112 vCPU VM is almost 4x.

The performance gained from Hyper-Threading was also updated and shows a 27% performance gain from the use of Hyper-Threads. The test was conducted by running two 112 vCPU VMs at the same time so that all 224 logical threads are active. The total throughput from the two VMs is then compared with the throughput from a single VM.
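The comparison described above can be written as a simple ratio. The sketch below is one way to express it; the function name and the throughput values are hypothetical placeholders chosen to reproduce a 27% gain, not measurements from the paper.

```python
def hyperthreading_gain(single_vm_throughput: float,
                        two_vm_total_throughput: float) -> float:
    """Fractional gain of two concurrent VMs over a single VM.

    The single VM keeps one thread per core busy; the two VMs together
    keep all logical threads busy.
    """
    if single_vm_throughput <= 0:
        raise ValueError("single_vm_throughput must be positive")
    return two_vm_total_throughput / single_vm_throughput - 1.0


# Hypothetical throughput values in arbitrary units.
gain = hyperthreading_gain(single_vm_throughput=100.0,
                           two_vm_total_throughput=127.0)
print(f"Hyper-Threading gain: {gain:.0%}")
```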

My colleague David Morse has also updated his SQL Server monster virtual machine whitepaper with Skylake data.

Episode 3: Performance Comparison of Native GPU to Virtualized GPU and Scalability of Virtualized GPUs for Machine Learning

In our third episode of machine learning performance with vSphere 6.x, we look at the virtual GPU vs. the physical GPU. In addition, we extend the performance results of machine learning workloads using VMware DirectPath I/O (passthrough) vs. NVIDIA GRID vGPU that have been partially addressed in previous episodes:

Machine Learning with Virtualized GPUs

Performance is one of the biggest concerns that keeps high performance computing (HPC) users from choosing virtualization as the solution for deploying HPC applications despite virtualization benefits such as reduced administration costs, resource utilization efficiency, energy saving, and security. However, with the constant evolution of virtualization technologies, the performance gaps between bare metal and virtualization have almost disappeared, and, in some use cases, virtualized applications can achieve better performance than running on bare metal because of the intelligent and highly optimized resource utilization of hypervisors. For example, a prior study [1] shows that vector machine applications running on a virtualized cluster of 10 servers have a better execution time than running on bare metal.

Virtual GPU vs. Physical GPU

To understand the performance impact of machine learning with GPUs using virtualization, we used a complex language modeling application—predicting next words given a history of previous words using a recurrent neural network (RNN) with 1500 Long Short Term Memory (LSTM) units per layer, on the Penn Treebank (PTB) dataset [2, 3], which has:

  • 929,000 training words
  • 73,000 validation words
  • 82,000 test words
  • 10,000 vocabulary words

We tested three cases:

  • A physical GPU installed on bare metal (this is the “native” configuration)
  • A DirectPath I/O GPU inside a VM on vSphere 6
  • A GRID vGPU (that is, an M60-8Q vGPU profile with 8GB memory) inside a VM on vSphere 6

The VM in the last two cases has 12 virtual CPUs (vCPUs), 60GB RAM, and 96GB SSD storage.

The benchmark was implemented using TensorFlow [4], which was also used for the implementation of the other machine learning benchmarks in our experiments. We used CUDA 7.5, cuDNN 5.1, and CentOS 7.2 for both native and guest operating systems. These test cases were run on a Dell PowerEdge R730 server with dual 12-core Intel Xeon E5-2680 v3 processors at 2.50 GHz (24 physical cores, 48 logical cores with hyperthreading enabled), 768 GB memory, and an SSD (1.5 TB). This server also had two NVIDIA Tesla M60 cards (each has two GPUs) for a total of 4 GPUs, where each had 2048 CUDA cores, 8GB memory, 36 x H.264 video 1080p 30 streams, and could support 1–32 GRID vGPUs whose memory profiles ranged from 512MB to 8GB. This experimental setup was used for all tests presented in this blog (Figure 1, below).

Figure 1. Testbed configurations for native GPU vs. virtual GPU comparison

The results in Figure 2 (below) show the relative execution times of DirectPath I/O and GRID vGPU compared to native GPU. Virtualization introduces a 4% overhead, and the performance of DirectPath I/O and GRID vGPU is similar. These results are consistent with prior studies of virtual GPU performance with passthrough where overheads in most cases are less than 5% [5, 6].

Figure 2. DirectPath I/O and NVIDIA GRID vs. native GPU
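Relative execution time here means each configuration's run time divided by the native run time. A minimal sketch of that normalization follows; the run times are hypothetical placeholders chosen to reproduce a 4% overhead, and the function name is illustrative.

```python
from typing import Dict


def relative_execution_times(times: Dict[str, float],
                             baseline: str = "native") -> Dict[str, float]:
    """Divide every execution time by the baseline configuration's time."""
    base = times[baseline]
    return {name: seconds / base for name, seconds in times.items()}


# Hypothetical execution times in seconds, not measured values.
times = {"native": 100.0, "DirectPath I/O": 104.0, "GRID vGPU": 104.0}

for name, rel in relative_execution_times(times).items():
    print(f"{name}: {rel:.2f}x native ({rel - 1:+.0%} overhead)")
```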

GPU vs. CPU in a Virtualization Environment

One important benefit of using a GPU is shorter training times for machine learning tasks, which has boosted the results of AI research and development in recent years. In many cases, it helps to reduce execution times from weeks/days to hours/minutes. We illustrate this benefit in Figure 3 (below), which shows the training time with and without vGPU for two applications:

  • RNN with PTB (described earlier)
  • CNN with MNIST: a handwriting recognizer that uses a convolutional neural network (CNN) on the MNIST dataset [7].

From the results, we see that training RNN on PTB with the CPU took 7.9 times longer than with a vGPU (Figure 3-a).  Training CNN on MNIST with the CPU took 10.1 times longer than with a vGPU (Figure 3-b). The VM used in this test has 1 vGPU, 12 vCPUs, 60 GB memory, and 96 GB of SSD storage, and the test setup is similar to that of the above experiment.

Figure 3. Normalized training time of PTB, MNIST with and without vGPU
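To get a feel for what these ratios mean in practice, the sketch below converts a CPU-only training time into an estimated vGPU training time. The ratios are the ones reported above for this specific test setup; applying them to other job lengths is an assumption, and the function name and the 24-hour figure are hypothetical.

```python
# CPU-to-vGPU training time ratios reported above for this test setup.
CPU_TO_VGPU_RATIO = {"RNN on PTB": 7.9, "CNN on MNIST": 10.1}


def estimated_vgpu_hours(workload: str, cpu_hours: float) -> float:
    """Estimate vGPU training time, assuming the reported ratio carries over."""
    return cpu_hours / CPU_TO_VGPU_RATIO[workload]


# Hypothetical CPU-only job length.
cpu_hours = 24.0
for workload in CPU_TO_VGPU_RATIO:
    hours = estimated_vgpu_hours(workload, cpu_hours)
    print(f"{workload}: {cpu_hours:.0f} h on CPU -> about {hours:.1f} h with a vGPU")
```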

As the test results show, we can successfully run machine learning applications in a vSphere 6 virtualized environment, with training times similar to those of the same applications running in a native (non-virtualized) configuration with physical GPUs.

But what about a passthrough scenario? How does a machine learning application run in a vSphere 6 virtual machine using a passthrough to the physical GPU vs. using a virtualized GPU? We present our findings in the next section.

Comparison of DirectPath I/O and GRID vGPU

We evaluate the performance, scalability, and other benefits of DirectPath I/O and GRID vGPU. We also provide some recommendations on the best use cases for each virtual GPU solution.


To compare the performance of DirectPath I/O and GRID vGPU, we benchmarked them with RNN on PTB, and CNN on MNIST and CIFAR-10. CIFAR-10 [8] is an object classification application that categorizes RGB images of 32×32 pixels into 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. MNIST is a handwriting recognition application. Both CIFAR-10 and MNIST use a convolutional neural network. The language model predicts the next word from the history of previous words using a recurrent neural network, and it is trained on the Penn Treebank (PTB) dataset.

Figure 4. Performance comparison of DirectPath I/O and GRID vGPU

The results in Figure 4 (above) show the comparative performance of the two virtualization solutions, in which DirectPath I/O achieves slightly better performance than GRID vGPU. This improvement is due to the passthrough mechanism of DirectPath I/O adding minimal overhead to GPU-based workloads running inside a VM. In Figure 4-a, DirectPath I/O is about 5% faster than GRID vGPU for MNIST, and they have the same performance with PTB. For CIFAR-10, DirectPath I/O can process about 13% more images per second than GRID vGPU. We use images per second for CIFAR-10 because it is a frequently used metric for this dataset. The VM in this experiment has 12 vCPUs, 60GB RAM, and one GPU (either DirectPath I/O or GRID vGPU).


We look at two types of scalability: user and GPU.

User Scalability

In a cloud environment, multiple users can share physical servers, which helps to better utilize resources and save cost. Our test server with 4 GPUs can serve up to 4 users who each need a GPU. Alternatively, a single user can have four VMs, each with a vGPU.  The number of virtual machines run per machine in a cloud environment is typically high to increase utilization and lower costs [9]. Machine learning workloads are typically much more resource intensive, which is why we limit our 4-GPU test system to at most 4 users.

Figure 5. Scaling the number of VMs with vGPU on CIFAR-10

Figure 5 (above) presents the scalability of users on CIFAR-10 from 1 to 4 where each uses a VM with one GPU, and we normalize images per second to that of the DirectPath I/O – 1 VM case (Figure 5-a).  Similar to the previous comparison, DirectPath I/O and GRID vGPU show comparable performance as the number of VMs with GPUs scales. Specifically, the performance difference between them is 6%–10% for images per second and 0%–1.5% for CPU utilization. This difference is not significant when weighed against the benefits that vGPU brings. Because of its flexibility and elasticity, vGPU is a good option for machine learning workloads. The results also show that the two solutions scale linearly with the number of VMs both in terms of execution time and CPU resource utilization. The VMs used in this experiment have 12 vCPUs, 16GB memory, and 1 GPU (either DirectPath I/O or GRID vGPU).

GPU Scalability

For machine learning applications that need to build very large models or in which the datasets cannot fit into a single GPU, users can use multiple GPUs to distribute the workloads among them and speed up the training task further. On vSphere, applications that require multiple GPUs can use DirectPath I/O passthrough to configure VMs with as many GPUs as required. This capability is limited for CUDA applications using GRID vGPU because only 1 vGPU per VM is allowed for CUDA computations.

We demonstrate the efficiency of using multiple GPUs on vSphere by benchmarking the CIFAR-10 workload and using the metric of images per second (images/sec) to compare the performance of CIFAR-10 on a VM with different numbers of GPUs scaling from 1 to 4 GPUs.

From the results in Figure 6 (below), we found that the number of images processed per second improves almost linearly with the number of GPUs on the host (Figure 6-a). At the same time, CPU utilization also increases linearly (Figure 6-b). This result shows that machine learning workloads scale well on the vSphere platform. In the case of machine learning applications that require more GPUs than the physical server can support, we can use the distributed computing model with multiple distributed processes using GPUs running on a cluster of physical servers. With this approach, both DirectPath I/O and GRID vGPU can be used to enhance scalability with a very large number of GPUs.

Figure 6. Scaling the number of GPUs per VM on CIFAR-10
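One way to quantify "almost linearly" is scaling efficiency: measured throughput divided by the throughput that perfect linear scaling from the 1-GPU case would give. The sketch below assumes that definition; the images/sec values are hypothetical placeholders, not the measured results, and the function name is illustrative.

```python
from typing import Dict


def scaling_efficiency(images_per_sec: Dict[int, float]) -> Dict[int, float]:
    """Map GPU count to throughput relative to ideal linear scaling.

    A value of 1.0 means perfectly linear scaling from the 1-GPU case.
    """
    base = images_per_sec[1]
    return {gpus: rate / (gpus * base)
            for gpus, rate in sorted(images_per_sec.items())}


# Hypothetical images/sec by number of GPUs in the VM.
measured = {1: 1000.0, 2: 1950.0, 4: 3800.0}

for gpus, eff in scaling_efficiency(measured).items():
    print(f"{gpus} GPU(s): {eff:.0%} of ideal linear scaling")
```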

How to Choose Between DirectPath I/O and GRID vGPU

For DirectPath I/O

From the above results, we can see that DirectPath I/O and GRID vGPU have similar performance and low overhead compared to the performance of native GPU, which makes both good choices for machine learning applications in virtualized cloud environments. For applications that require short training times and use multiple GPUs to speed up machine learning tasks, DirectPath I/O is a suitable option because this solution supports multiple GPUs per VM. In addition, DirectPath I/O supports a wider range of GPU devices, and so can provide a more flexible choice of GPU for users.


For GRID vGPU

When each user needs a single GPU, GRID vGPU can be a good choice. This configuration provides a higher consolidation of virtual machines and leverages the benefits of virtualization:

  • GRID vGPU allows the flexible use of the device because vGPU supports both shared GPU (multiple users per physical machine) and dedicated GPU (one user per physical GPU). Mixing and switching among machine learning, 3D graphics, and video encoding/decoding workloads using GPUs is much easier and allows for more efficient use of the hardware resource. Using GRID solutions for machine learning and 3D graphics allows cloud-based services to multiplex the GPUs among more concurrent users than the number of physical GPUs in the system. This contrasts with DirectPath I/O, which is the dedicated GPU solution, where the number of concurrent users is limited to the number of physical GPUs.
  • GRID vGPU reduces administration cost because its deployment and maintenance do not require a server reboot, so no downtime is required for end users. For example, changing the vGPU profile of a virtual machine does not require a server reboot. Any change to the DirectPath I/O configuration requires a server reboot. GRID vGPU’s ease of management reduces the time and the complexity of administering and maintaining the GPUs. This benefit is particularly important in a cloud environment where the number of managed servers would be very large.
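The guidance in this section can be condensed into a small decision helper. This is only a sketch of the rules of thumb stated above, not an official selection tool; the function name and parameters are hypothetical.

```python
def suggest_gpu_mode(gpus_per_vm: int, gpu_supported_by_grid: bool = True) -> str:
    """Suggest a virtual GPU solution based on the rules of thumb above."""
    if gpus_per_vm < 1:
        raise ValueError("gpus_per_vm must be at least 1")
    if gpus_per_vm > 1:
        # CUDA applications on GRID vGPU are limited to one vGPU per VM.
        return "DirectPath I/O"
    if not gpu_supported_by_grid:
        # DirectPath I/O supports a wider range of GPU devices.
        return "DirectPath I/O"
    # One GPU per user: vGPU adds sharing and easier administration.
    return "GRID vGPU"


print(suggest_gpu_mode(gpus_per_vm=4))
print(suggest_gpu_mode(gpus_per_vm=1))
print(suggest_gpu_mode(gpus_per_vm=1, gpu_supported_by_grid=False))
```

Checking the multi-GPU case first reflects the one hard limit mentioned in the text; the remaining cases are preferences rather than constraints.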


Our tests show that virtualized machine learning workloads on vSphere with vGPUs offer near bare-metal performance.


  1. Jaffe, D. Big Data Performance on vSphere 6. (August 2016). http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/bigdata-perf-vsphere6.pdf.
  2. Zaremba, W., Sutskever, I., Vinyals, O.: Recurrent Neural Network Regularization. In: arXiv:1409.2329 (2014).
  3. Taylor, A., Marcus, M., Santorini, B.: The Penn Treebank: An Overview. In: Abeille, A. (ed.). Treebanks: the state of the art in syntactically annotated corpora. Kluwer (2003).
  4. TensorFlow Homepage, https://www.tensorflow.org
  5. Vu, L., Sivaraman, H., Bidarkar, R.: GPU Virtualization for High Performance General Purpose Computing on the ESX hypervisor. In: Proc. of the 22nd High Performance Computing Symposium (2014).
  6. Walters, J.P., Younge, A.J., Kang, D.I., Yao, K.T., Kang, M., Crago, S.P., Fox, G.C.: GPU Passthrough Performance: A Comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL Applications. In: Proceedings of 2014 IEEE 7th International Conference on Cloud Computing (2014).
  7. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. In: Proceedings of the IEEE, 86(11):2278-2324 (November 1998).
  8. Multiple Layers of Features from Tiny Images, https://www.cs.toronto.edu/~kriz/cifar.html
  9. Pandey, A., Vu, L., Puthiyaveettil, V., Sivaraman, H., Kurkure, U., Bappanadu, A.: An Automation Framework for Benchmarking and Optimizing Performance of Remote Desktops in the Cloud. In: To appear in Proceedings of the 2017 International Conference on High Performance Computing & Simulation (2017).

Updated – SQL Server VM Performance with vSphere 6.5, October 2017

Back in March, I published a performance study of SQL Server performance with vSphere 6.5 across multiple processor generations.  Since then, Intel has released a brand-new processor architecture: the Xeon Scalable platform, formerly known as Skylake.

Our team was fortunate enough to get early access to a server with these new processors inside – just in time for generating data that we presented to customers at VMworld 2017.

Each Xeon Platinum 8180 processor has 28 physical cores (pCores), and with four processors in the server, there were a whopping 112 pCores on one physical host!  As you can see, that extra horsepower provides nice database server performance scaling:

Generational SQL Server VM Database Performance

For more details and the test results, take a look at the updated paper:
Performance Characterization of Microsoft SQL Server on VMware vSphere 6.5

Introducing DRS DumpInsight

In an effort to provide a more insightful user experience and to help understand how vSphere DRS works, we recently released a fling: DRS Dump Insight.

DRS Dump Insight is a service portal where users can upload drmdump files. It provides a summary of the DRS run, with a breakdown of all the possible moves along with the changes in ESX host resource consumption before and after the DRS run.

Users can get answers to questions like:

  • Why did DRS make a certain recommendation?
  • Why is DRS not making any recommendations to balance my cluster?
  • What recommendations did DRS drop due to cost/benefit analysis?
  • Can I get all the recommendations made by DRS?
