Author Archives: Ravi Soundararajan

About Ravi Soundararajan

Ravi Soundararajan is a Principal Engineer in the Performance Group at VMware. He works on vCenter performance and scalability, from the UI to the server to the database to the hypervisor management agents. He has been at VMware since 2003, and he has presented on the topic of vCenter Performance at VMworld from 2013-2017. His Twitter handle is @vCenterPerfGuy.

Writing Performant Tagging Code: Tips and Tricks for PowerCLI

vSphere 5.1 introduced an inventory tagging feature that has been available in all later versions of vSphere, including vSphere 6.7. Tags let datacenter administrators organize different vSphere objects like datastores, virtual machines, hosts, and so on. This makes it easier to sort and search for objects that share a tag, among other things. For example, you might use tags to track a group of VMs that all have the same operating system.

Writing code to use tags can be challenging in large-scale environments: a straightforward use of VMware PowerCLI cmdlets may result in poor performance, and while direct Tagging Service APIs are faster, the documentation can be difficult to understand. In this blog, we show some practical examples of using PowerCLI and Tagging Service APIs to perform tag-related operations. We include some simple measurements to show the performance improvements when using the Tagging Service vs. cmdlets. The sample performance numbers are for illustrative purposes only. We describe the test setup in the Appendix.

1. Connecting to PowerCLI and the Tagging Service

In this document, when we write “PowerCLI cmdlets,” we mean calls like Get-Tag, or Get-TagCategory. To access this API, simply open a PowerShell terminal and log in:

Connect-VIServer <vCenter server IP or FQDN> -User <username> -Pass <password>

When we write “Tagging Service APIs,” we are referring to calls that are satisfied by the Tagging Service. To access such APIs from a PowerShell terminal, you must log in both to the vCenter Server and to the Tagging Service:

# Login to vCenter
Connect-VIServer <VC server IP or FQDN> -User <username> -Pass <password>

# Login to the tagging server (known as the CIS server), which contains the Tagging Service
Connect-CISserver <VC server IP or FQDN> -User <username> -Pass <password>

# Get a handle to the Tagging Service APIs
$taggingAPI = Get-CisService com.vmware.cis.tagging.tag

# Get a handle to the tag assignment APIs
$tagAssignmentAPI = Get-CisService com.vmware.cis.tagging.tag_association

To access built-in help for the tagging APIs, append .help to the service handle. The example below shows the command, followed by the documentation it returns.

# List available tag assignment method calls, using $tagAssignmentAPI from above
PS H:\> $tagAssignmentAPI.help

Documentation: The {@name TagAssociation} {@term service} provides {@term operations} to attach, detach, and query tags.

Operations: List<com.vmware.cis.tagging.tag_association.object_to_tags> list_attached_tags_on_objects(List<com.vmware.vapi.std.dynamic_ID> object_ids):

Fetches the {@term list} of {@link ObjectToTags} describing the input object identifiers and the tags attached to each object. To invoke this {@term operation}, you need the read privilege on each input object. The {@link ObjectToTags#tagIds} will only contain those tags for which you have the read privilege.

List<id> list_attachable_tags(com.vmware.vapi.std.dynamic_ID object_id): 

Fetches the {@term list} of attachable tags for the given object, omitting the tags that have already been attached. Criteria for attachability is calculated based on tagging cardinality ({@link CategoryModel#cardinality}) and associability ({@link CategoryModel#associableTypes}) constructs. To invoke this {@term operation}, you need the read privilege on the input object. The {@term list} will only contain those tags for which you have read privileges.

… <output truncated> …

2. High level differences between cmdlets and Tagging Service calls

2.1 Names vs. IDs

When using cmdlets, it is customary to use the name of a tag or category. For example, you might write the following to get the tag whose name is “john”.

PS H:\> $tag = Get-Tag -Name "john"

To get all tags whose names begin with “john”, you would use a “*” wildcard:

PS H:\> $tagList = Get-Tag -Name "john*"

The Tagging Service, in contrast, typically requires IDs instead of names. These IDs persist throughout the lifetime of the tag or category, so they can be cached on creation and used throughout the lifetime of your scripts.

Example: Here is some sample code to retrieve the ID of a tag category given its name:

# We will find the ID of the tag category named "john"
$tagCatName = 'john'

# Get a handle to all tag category API calls
$allCategoryMethodSvc = Get-CisService com.vmware.cis.tagging.category

# List all category IDs
$allCats = $allCategoryMethodSvc.list()

# Iterate over categories to find the desired category (name = $tagCatName)
foreach ($cat in $allCats) {
      if (($allCategoryMethodSvc.get($cat.value)).name -eq $tagCatName) {
            # Set $catID to the ID of this category
            $catID = $cat.value
            break
      }
}

Example: Here is some code to find the tag ID of the tag named “john”.

$tagName = "john"

# Get a handle to all tag API calls
$allTagMethodSvc = Get-CisService com.vmware.cis.tagging.tag

# Return a list of all tag IDs in the system.
$allTags = $allTagMethodSvc.list()

# Iterate over tag IDs to find the tag whose name is $tagName
foreach ($tag in $allTags) {
      if ($allTagMethodSvc.get($tag.value).name -eq $tagName) {
            $tagID = $tag.value
            break
      }
}

If you know the name of the category, you can improve on the previous code by searching tags within this category.

# Assume that $catID was retrieved as specified above
$tagsForCatID = $allTagMethodSvc.list_tags_for_category($catID)
foreach ($tag in $tagsForCatID) {
      if ($allTagMethodSvc.get($tag.value).name -eq $tagName) {
            $tagID = $tag.value
            break
      }
}

For performance reasons, it is a good idea to store a mapping of names to IDs. With such a map, you avoid the need to iterate over every tag ID or category ID whenever you need the ID for a given tag or category name. We give an example of making such a map below under example 4 (see createTagNameIdMap).

2.2 Tag and category specifications for Tagging Service calls

When creating a tag or a category using cmdlets, there are many default parameters, and others can be specified on the command line. With Tagging Service calls, a “spec” must be created and used.

Here is an example of creating a tag category with “multiple” cardinality. (Multiple cardinality means that multiple tags from this category can be applied to a specific object at a time. For example, a category named “Owners” may have tags named “Alice” and “Bob”, and a given VM may have both “Alice” and “Bob” assigned to it. In contrast, single cardinality means that only one tag from a given category can be used on a specific object. For example, a category named “OS” may have tags named “Linux” and “Windows.” A given VM would have only one of these tags assigned.)

Creating a tag category using cmdlets

# Create tag category named 'testCat'
$catName = 'testCat'
New-TagCategory -Name $catName -Cardinality Multiple

Creating a tag category using the Tagging Service directly

$catName = 'testCat'

# Get a handle to the category methods:
$allCategoryMethodSVC = Get-CisService com.vmware.cis.tagging.category

# Create a spec for the category
$catSpec = $allCategoryMethodSVC.Help.create.create_spec
$catSpec.cardinality = 'MULTIPLE'
$catSpec.associable_types = ''

# NOTE: In vSphere 6.5, the category_id field should not be used.
# In vSphere 6.7, please set the category_id to ''.
$catSpec.category_id = ''
$catSpec.description = ''
$catSpec.name = $catName

# Now create the tag category
$allCategoryMethodSVC.create($catSpec)
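To make the cardinality rule described above concrete, here is a minimal sketch that models it with plain PowerShell data. It does not call vCenter: the function name Test-TagAttachable and its parameters are hypothetical, and the real attachability check is performed by the Tagging Service, which also takes associable types into account.

```powershell
# Illustrative model only: decides whether a tag from a category could be added
# to an object, given the tags from that same category already on the object.
function Test-TagAttachable {
    param(
        [ValidateSet('SINGLE', 'MULTIPLE')]
        [string]$Cardinality,
        [string[]]$AttachedTagsFromCategory = @(),
        [string]$NewTag
    )
    # A tag that is already attached cannot be attached again.
    if ($AttachedTagsFromCategory -contains $NewTag) { return $false }
    # SINGLE cardinality: at most one tag from the category per object.
    if ($Cardinality -eq 'SINGLE' -and $AttachedTagsFromCategory.Count -ge 1) { return $false }
    return $true
}

# "Owners" is MULTIPLE: a VM tagged "Alice" can also be tagged "Bob".
Test-TagAttachable -Cardinality 'MULTIPLE' -AttachedTagsFromCategory 'Alice' -NewTag 'Bob'

# "OS" is SINGLE: a VM tagged "Linux" cannot also be tagged "Windows".
Test-TagAttachable -Cardinality 'SINGLE' -AttachedTagsFromCategory 'Linux' -NewTag 'Windows'
```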

3. Performance considerations

In general, Tagging Service calls are faster than cmdlets, and the difference becomes larger as the inventory size or number of tags grows. There are two main reasons for this. First, a cmdlet, while presenting a simpler interface to the user, often must make multiple backend calls to retrieve the same information that one might need to retrieve with a direct Tagging Service call. Second, cmdlets often use names rather than IDs in their invocations (e.g., Get-Tag <tagName> rather than Get-Tag -Id <tag-id>). Most of the tagging information is indexed by ID in the backend, so calls that use an ID are faster than calls that use a name.

4. Examples using cmdlets and Tagging Service calls

In the following examples, we show how to retrieve information using both cmdlets and Tagging Service calls. We also show sample performance numbers for a sample inventory with 3200 VMs (see the Appendix for details on the experimental setup). As noted above, the API for the Tagging Service is more efficient, though it requires using IDs rather than names. As a result, where possible, we suggest storing the ID/name mapping for tags and tag categories in a local file or data structure for fast lookup.
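As one possible way to keep the name-to-ID mapping in a local file, the sketch below saves a hashtable as JSON and reads it back. The file name and the tag IDs are made up for illustration; in practice the map would be built from the Tagging Service. The load step avoids ConvertFrom-Json -AsHashtable so that it also works on Windows PowerShell 5.1.

```powershell
# Hypothetical name-to-ID map (the IDs are placeholders, not real tag IDs).
$tagNameIdMap = @{
    'testTag' = 'urn:vmomi:InventoryServiceTag:00000000-0000-0000-0000-000000000001:GLOBAL'
    'prod'    = 'urn:vmomi:InventoryServiceTag:00000000-0000-0000-0000-000000000002:GLOBAL'
}

# Assumed cache location; choose any path that suits your scripts.
$cacheFile = Join-Path ([System.IO.Path]::GetTempPath()) 'tagNameIdMap.json'

# Save the map to disk.
$tagNameIdMap | ConvertTo-Json | Set-Content -Path $cacheFile -Encoding UTF8

# Load it back. ConvertFrom-Json returns an object, so copy its
# properties into a hashtable for fast lookup by name.
$loadedMap = @{}
$json = Get-Content -Path $cacheFile -Raw | ConvertFrom-Json
foreach ($prop in $json.PSObject.Properties) {
    $loadedMap[$prop.Name] = $prop.Value
}

# Look up an ID by name without calling the Tagging Service.
$loadedMap['testTag']
```

A cached file can go stale if tags are created or deleted outside your scripts, so it is worth rebuilding it when a lookup misses.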

4.1 Global variables

In the examples that follow, we use a number of variables repeatedly. We define those variables here.

Cmdlet variables

$vsphereUnderTest = '<FQDN of the vCenter Server>'
$user = '<username for the vCenter under test>'
$pass = '<password for the vCenter under test>'
$catName = 'testCat'
$tagName = 'testTag'
$allVMs = Get-VM

Tagging Service variables

$vsphereUnderTest = '<FQDN of the vCenter Server>'
$user = '<username for the vCenter under test>'
$pass = '<password for the vCenter under test>'
$catName = 'testCat'
$tagName = 'testTag'
$vmNames = 'testVM_1', 'testVM_2', 'testVM_3', 'testVM_4', 'testVM_5'
$allVMs = Get-VM

4.2 Code Samples 

Example 1: Connecting to vCenter and the Tagging Service

In this example, we show how to connect to vCenter and the Tagging Service. For the Tagging Service, as described above, you must log in to both the vCenter and the “CIS” server.

1A: Cmdlets
Connect-VIServer -Server $vsphereUnderTest -User $user -Password $pass
1B: Tagging Service (requires login to both vCenter and the CIS server)
Connect-VIServer -Server $vsphereUnderTest -User $user -Password $pass
# Also connect to the CIS server
Connect-CisServer -Server $vsphereUnderTest -User $user -Password $pass

Discussion: Tagging Service requires logging in to both vCenter and the Tagging Service server (called the CIS server). When using the Tagging Service, you must also retrieve a handle to the appropriate Tagging Service methods. Here are some examples:

Getting method handles for various Tagging Services
# Category methods 
$allCategoryMethodSVC = Get-CisService com.vmware.cis.tagging.category 

# Tagging methods 
$allTagMethodSVC = Get-CisService com.vmware.cis.tagging.tag 

# Tag association methods 
$allTagAssociationMethodSVC = Get-CisService com.vmware.cis.tagging.tag_association

Example 2: Creating a tag category

In this example, we create a tag category. Recall that the category name $catName is defined above.

2A: Cmdlets
New-TagCategory -Name $catName -Cardinality Multiple
Average time to create tag category in our setup: 200 ms
2B: Tagging Service
# Create a spec for the category:
$catSpec = $allCategoryMethodSVC.Help.create.create_spec
$catSpec.cardinality = 'MULTIPLE'
$catSpec.associable_types = ''
$catSpec.category_id = ''
$catSpec.description = ''
$catSpec.name = $catName

# Now create the tag category
$catObject = $allCategoryMethodSVC.create($catSpec)
Average time to create tag category in our setup: 53 ms

Programming note: You don’t need to set the description, but if you don’t, the default value will be set to: @{Documentation=The description of the category.}

WARNING: If you are connected to a 6.7 vCenter, you need to specify the category_id; if you don’t, you will see an invalid_argument error.

Discussion: In the Tagging Service example, $catObject = $allCategoryMethodSVC.create($catSpec) creates the tag category, and the object returned is saved as $catObject. $catObject.Value is the ID of the category that was created.

Example 3: Creating a tag under a tag category

In this example, we create a tag under a tag category. As a convenience, we use the following function to get a category ID given the category’s name. We need the ID because the Tagging Service uses IDs, not names.

function Get-CategoryIdFromName {
    Param ($inputCatName)
    $allCategoryMethodSvc = Get-CisService com.vmware.cis.tagging.category
    $allCats = $allCategoryMethodSvc.list()

    foreach ($cat in $allCats) {
        # Compare name of input category to current category. Return if match.
        if (($allCategoryMethodSvc.get($cat.value)).name -eq $inputCatName) {
            # Return the ID object; its .value property holds the ID string.
            return $cat
        }
    }
    # no match
    return $null
}

Note: if you need to get category IDs from names multiple times, it is better to create a mapping of names to IDs. The following function creates a map of category names to IDs.

function createCategoryNameIdMap {
    $allCategoryMethodSvc = Get-CisService com.vmware.cis.tagging.category
    $allCats = $allCategoryMethodSvc.list()
    $catNameIdMap = @{}
    foreach ($cat in $allCats) {
        $catName = $allCategoryMethodSvc.get($cat.value).name
        $catNameIdMap.Add($catName,$cat)
    }
    return $catNameIdMap
}

Creating a tag under a tag category

3A: Cmdlets
New-Tag -Name $tagName -Category $catName
Average time to create a tag in our setup: 112 ms
3B: Tagging Service
# Use function defined above to get category ID from category name
$catID = Get-CategoryIdFromName($catName)

# First create a tag spec.
$spec = $allTagMethodSVC.Help.create.create_spec

$spec.name = $tagName
$spec.description = ''
$spec.tag_id = ''
$spec.category_id = $catID.value

# Now create the tag
$tagObject = $allTagMethodSVC.Create($spec)
Average time to create a tag in our setup: 64 ms

Programming note: You don’t need to set the description, but if you don’t, the default value will be set to: @{Documentation=The description of the tag.}

WARNING: If you are connected to a 6.7 vCenter you need to specify the tag_id; if you don’t, you will see an invalid_argument error.

Discussion: The Tagging Service example uses the function Get-CategoryIdFromName to get the category ID. This line of code can be eliminated if you save the category object $catObject from example 2b. If you do this, you will also need to change the line $spec.category_id = $catID.value to $spec.category_id = $catObject.value.

When we create the tag, we save the object that is returned by $allTagMethodSVC.Create($spec) in $tagObject.

Example 4: Associating a tag with a VM

In this example, we associate a tag with a VM (also known as “attaching a tag to a VM”). The Tagging Service APIs, as mentioned above, use tag IDs instead of tag names. As a convenience, we use the following function to get a tag ID given the tag’s name.

function Get-TagIdFromName {
    Param ($a)
    $allTagMethodSvc = Get-CisService com.vmware.cis.tagging.tag
    $allTags = @()
    $allTags = $allTagMethodSvc.list()
    foreach ($tag in $allTags) {
        if (($allTagMethodSvc.get($tag.value)).name -eq $a) {
            return $tag.value
        }
    }
    return $null
}

In addition to using the tag ID instead of the tag name, the tag attachment API requires a specially-created VM object, as shown in the example below.

Attaching a tag to one VM

4A: Cmdlets
$tagArray = Get-Tag -Category $catName
New-TagAssignment -tag $tagArray[0] -entity $allVMS[0]
Average time to attach a tag in our setup: 2717 ms
4B: Tagging Service
# Pick a VM to attach the tag to
$testVM = Get-VM -Name $vmNames[0]

# The Tagging Service needs a VM object as an argument.
# We recommend doing this once and storing the result rather than recreating it each time you need it.
$VMInfo = $testVM.ExtensionData.MoRef
$vmid = New-Object PSObject -Property @{
    id = $VMInfo.value
    type = $VMInfo.Type
}

# The Tagging Service uses tag IDs, not names, so get the ID from the name using the method above.
$tagId = Get-TagIdFromName($tagName)

# Now attach the tag to the VM.
$allTagAssociationMethodSVC.attach($tagId, $vmid)
Average time to execute the attach() call in our setup: 35 ms

Programming note: $tagId uses our Get-TagIdFromName convenience function.

Note: If you need to get tag IDs from names multiple times, it is probably better to create a mapping of names to IDs. The following function creates a map of tag names to IDs:

function createTagNameIdMap {
    $allTagMethodSvc = Get-CisService com.vmware.cis.tagging.tag
    $allTags = $allTagMethodSvc.list()
    $tagNameIdMap = @{}
    foreach ($tag in $allTags) {
        $tagName = $allTagMethodSvc.get($tag.value).name
        $tagNameIdMap.Add($tagName,$tag)
    }
    return $tagNameIdMap
}

To use the function above, you could do the following (we assume that there is a tag named “test tag”):

$tmap = createTagNameIdMap
$testTagId = $tmap["test tag"]
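One caveat with createTagNameIdMap: a hashtable's Add method throws when the same key is added twice, so the function assumes that every tag name is unique across the whole system. If your environment reuses a tag name in more than one category, one possible variation is to key the map by category name and tag name together. The sketch below is an assumption-laden example, not part of the original measurements: the function name New-QualifiedTagNameIdMap and the "category/tag" key format are hypothetical, while list(), get(), and list_tags_for_category() are the same Tagging Service calls used earlier.

```powershell
function New-QualifiedTagNameIdMap {
    param(
        $TagService,       # e.g. Get-CisService com.vmware.cis.tagging.tag
        $CategoryService   # e.g. Get-CisService com.vmware.cis.tagging.category
    )
    $map = @{}
    foreach ($cat in $CategoryService.list()) {
        $catName = $CategoryService.get($cat.value).name
        # Only look at the tags that belong to this category.
        foreach ($tag in $TagService.list_tags_for_category($cat.value)) {
            $tagName = $TagService.get($tag.value).name
            # Key format "category/tag" is an arbitrary choice for this sketch.
            $map["$catName/$tagName"] = $tag
        }
    }
    return $map
}

# Usage with a live connection (requires Connect-CisServer first):
# $qmap = New-QualifiedTagNameIdMap `
#     -TagService (Get-CisService com.vmware.cis.tagging.tag) `
#     -CategoryService (Get-CisService com.vmware.cis.tagging.category)
# $testTagId = $qmap['testCat/testTag']
```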

Example 5: Get the tags assigned to a VM (1 tag associated with the VM)

In this example, we get the tags assigned to a VM. The cmdlet assumes that we have an array of VMs $allVMs (defined above in “Global Variables”), and it gets the association to the first one. The Tagging Service needs a VM ID object rather than a VM name. We use the $vmid object created in the previous example. We assume there is one tag associated with the VM.

5A: Cmdlets
# Pick the first VM in our list of VMs ($allVMs)
Get-TagAssignment -Entity $allVMS[0]
Average time to get tag assignment in our setup: 2610 ms
5B: Tagging Service
# Assume we have the VM object $vmid from the previous example, and assume we have only 1 tag association.
$tagID = $allTagAssociationMethodSVC.list_attached_tags($vmid)
Average time to get tag assignment in our setup: 49 ms

Programming note: The above code will return the tag ID. If you need to get more information about the tag, use the following line:

$allTagMethodSVC.get($tagID.value)

If you want the name, then use this:

$allTagMethodSVC.get($tagID.value).name

Example 6: Assign one tag to 1000 VMs

In this example, we assign a single tag to 1000 VMs. In both the cmdlet case and the Tagging Service case, we assume an array of VMs ($allVMs), as defined above. For the Tagging Service, we create an array of VM ID objects from this $allVMs array. We also use the $tagID created in the previous example.

6A: Cmdlets
# Pick 1000 VMs from $allVMs and assign the first tag in $tagArray to each VM.
$useTheseVMs = $allVMS[0..999] 
$useTheseVMs | New-TagAssignment -Tag $tagArray[0]
Average time to attach one tag to 1,000 VMs in our setup: 2727842 ms (about 45 minutes)
6B: Tagging Service
# Use $allVMs from Global Variables above.
$useTheseVMIDS = @()

# Create VM objects for all VMs.
# This should be done once for all VMs, not every time
# you want to do an association.
for ($i = 0; $i -lt 1000; $i++) {
    $VMInfo = $allVMS[$i].extensiondata.moref
    $useTheseVMIDS += New-Object PSObject -Property @{
        id = $VMInfo.value
        type = $VMInfo.type
    }
}

# Assume we have $tagId from the previous example.
$allTagAssociationMethodSVC.attach_tag_to_multiple_objects($tagID, $useTheseVMIDS)

Once VM objects have been created, the average time to attach one tag to 1,000 VMs in our setup: 18438 ms (about 18 seconds)
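The loop in 6B builds the array with +=, which copies the array on every iteration. As an alternative, here is a small pipeline helper that builds the same id/type objects in one pass. The function name ConvertTo-TaggingObjectId is hypothetical, and the stand-in VM objects below exist only so that the sketch runs without a vCenter connection.

```powershell
function ConvertTo-TaggingObjectId {
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        $VM
    )
    process {
        # Same fields as the examples above: the managed object reference
        # supplies both the ID (e.g. 'vm-222') and the type.
        $moRef = $VM.ExtensionData.MoRef
        [pscustomobject]@{
            id   = $moRef.Value
            type = $moRef.Type
        }
    }
}

# Stand-in objects shaped like Get-VM output (values are made up).
$fakeVMs = 1..3 | ForEach-Object {
    [pscustomobject]@{
        Name          = "testVM_$_"
        ExtensionData = [pscustomobject]@{
            MoRef = [pscustomobject]@{ Value = "vm-$_"; Type = 'VirtualMachine' }
        }
    }
}

$vmIdObjects = @($fakeVMs | ConvertTo-TaggingObjectId)
$vmIdObjects
```

With a live connection, the equivalent of the loop above would be $useTheseVMIDS = @($allVMs[0..999] | ConvertTo-TaggingObjectId).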

Example 7: List the VMs assigned to one tag

In this example, we retrieve the VMs assigned to one tag. In the cmdlet case, we assume that there is an array of 1000 tags $tagArray, and we find the VMs assigned to the first tag. In the Tagging Service case, we need to use a tag ID instead of a name, so we use $tagID from the previous example.

7A: Cmdlets
# Get VMs attached to the first tag in $tagArray.
$vmsAssignedToTag = Get-VM -Tag $tagArray[0]
Average time to search for the VMs: 25844 ms
7B: Tagging Service
# Assume we have $tagID from earlier tests. 
$vmsAssignedToTag = $allTagAssociationMethodSVC.list_attached_objects($tagId)
Average time to search for the VMs: 411 ms

Programming note: The return value $vmsAssignedToTag is a list of PSCustomObjects. For each element, the type property gives the object type and the id property gives the object ID.

Note: The list_attached_objects method returns object identifiers (a type and an ID), not full VM objects. If you want to pass a given VM to subsequent cmdlet calls such as Get-VM, you must do the following:

# Get the first VM in the list.
$vm_0 = $vmsAssignedToTag[0]

# Get the type. In this case, it is VirtualMachine.
$v_type = $vm_0.type

# Get the id. In this case, it will be something like ‘vm-222’.
$v_id = $vm_0.id

# Construct an ID from the type and ID.
# The result is of the form ‘VirtualMachine-vm-222’.
$VirtualMachineId = $v_type + "-" + $v_id

# This ID can now be used to retrieve additional information about VMs.
$vm_0_get = Get-VM -Id $VirtualMachineId
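Because tags can be attached to other object types as well (hosts and datastores, for example), the list may contain more than VMs. The following sketch filters the list down to VMs and builds the IDs for all of them at once. The sample list is made up so that the snippet runs without a vCenter connection; with a real result from list_attached_objects, only the last two statements are needed.

```powershell
# Stand-in for the list returned by list_attached_objects (values are made up).
$attachedObjects = @(
    [pscustomobject]@{ type = 'VirtualMachine'; id = 'vm-222' }
    [pscustomobject]@{ type = 'HostSystem';     id = 'host-10' }
    [pscustomobject]@{ type = 'VirtualMachine'; id = 'vm-223' }
)

# Keep only VMs and build the 'VirtualMachine-vm-NNN' form shown above.
$vmIds = @(
    $attachedObjects |
        Where-Object { $_.type -eq 'VirtualMachine' } |
        ForEach-Object { '{0}-{1}' -f $_.type, $_.id }
)
$vmIds

# With a live connection, the whole list can then be resolved in one call:
# $vms = Get-VM -Id $vmIds
```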

Example 8: Attach 1000 tags to one VM

In this example, we attach 1000 tags to a single VM. We assume that you have already created 1000 tags, for example, by running the code in Example 3 in a loop.

8A: Cmdlets
$useTheseTags = $tagArray[0..999]
foreach ($tag in $useTheseTags) {
  New-TagAssignment -Entity $allVMs[0] -Tag $tag
}
Average time to attach 1000 tags to 1 VM in our setup: 3236949 ms (about 54 minutes)
8B: Tagging Service
# Get list of tags you want to attach
$tagArray = $allTagMethodSVC.list()
$useTheseTags = $tagArray[0..999]

# Create a VM object to attach the tags to. We re-use a VM from our list above.
# For best performance, create these VM objects once and store them: do not
# create them every time you wish to do an association.
$VMInfo = ($allVMs[0]).extensiondata.moref
$vmid = New-Object PSObject -Property @{
    id = $VMInfo.value
    type = $VMInfo.type
}

# Now do the actual association.
$allTagAssociationMethodSVC.attach_multiple_tags_to_object($vmid, $useTheseTags)
Once the VM object has been created, average time to attach 1000 tags to 1 VM in our setup: 1511 ms

Example 9: List all the tags associated with a VM (1000 associations)

In this example, we list all tags associated with a VM. In our test, we use a VM that has 1000 tags associated with it. In both the cmdlet case and the Tagging Service case, we use the $allVMs array above, and pick the first VM.

9A: Cmdlets
Get-TagAssignment -Entity $allVMs[0]
Average time to get tag assignments in our setup: 3715 ms
9B: Tagging Service
# Create VM object. Do this once and store it, rather than
# creating it every time you need to retrieve tag associations.
$VMInfo = $allVMs[0].extensiondata.moref
$vmid = New-Object PSObject -Property @{
    id = $VMInfo.value
    type = $VMInfo.type
}
$tagIDArray = $allTagAssociationMethodSVC.list_attached_tags($vmid)
Average time to get tag assignments after the object has been created: 63 ms

Programming note: The above code will return an array of tag IDs. If you need to get more information about the tags, use $allTagMethodSVC.get for each tag ID.
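When many tag IDs need to be turned into names, each get call is another round trip, so it can help to remember names that have already been looked up. Below is a minimal sketch of that idea. The function name Resolve-TagName and its parameters are hypothetical; the only Tagging Service call it makes is the get method shown above.

```powershell
function Resolve-TagName {
    param(
        $TagService,              # handle such as $allTagMethodSVC
        $TagIds,                  # ID objects such as those in $tagIDArray
        [hashtable]$Cache = @{}   # ID string -> tag name, reused across calls
    )
    foreach ($tagId in $TagIds) {
        $key = [string]$tagId.value
        if (-not $Cache.ContainsKey($key)) {
            # One get() call per tag ID that has not been seen before.
            $Cache[$key] = $TagService.get($key).name
        }
        # Emit the (possibly cached) name.
        $Cache[$key]
    }
}

# Usage with a live connection:
# $idToName = @{}
# $tagNames = Resolve-TagName -TagService $allTagMethodSVC -TagIds $tagIDArray -Cache $idToName
```

Passing the same hashtable on later calls lets repeated lookups skip the Tagging Service entirely, at the cost of possibly stale names if tags are renamed.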

5. Takeaways

As you can see, the cmdlets are typically shorter and more intuitive to write than the Tagging Service scripts. However, the performance of Tagging Service calls can be substantially better than cmdlets. For small inventories or small numbers of tags, cmdlet performance is likely to be adequate. For larger inventories, and for better performance, we recommend using Tagging Service calls.

Appendix: Experimental environment

For these examples, we used a testbed that had 32 hosts and 3200 VMs. These measurements were done after creating 10 tag categories with 1000 tags per category (for a total of 10,000 tags in the system). Actual performance for your environments will vary with inventory configuration and the configuration of the vCenter appliance.
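This post does not describe how the averages were collected. If you want to take comparable measurements in your own environment, one simple approach is to wrap a call in Measure-Command and average over several runs. The helper below is a sketch under that assumption; the name Measure-AverageMs and the default of five runs are arbitrary choices.

```powershell
function Measure-AverageMs {
    param(
        [Parameter(Mandatory = $true)]
        [scriptblock]$Action,
        [int]$Runs = 5
    )
    $total = 0.0
    for ($i = 0; $i -lt $Runs; $i++) {
        # Measure-Command returns a TimeSpan for one execution of the script block.
        $total += (Measure-Command -Expression $Action).TotalMilliseconds
    }
    return [math]::Round($total / $Runs, 1)
}

# Stand-in workload so the sketch runs anywhere.
$avgMs = Measure-AverageMs -Action { Start-Sleep -Milliseconds 20 } -Runs 3
$avgMs

# With a live connection, the same helper could wrap a call from the examples:
# Measure-AverageMs -Action { $allTagAssociationMethodSVC.list_attached_tags($vmid) }
```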

Versions:
PowerShell version 5.1.14409.1005
VMware PowerCLI 10.1.0 build 8346946
vSphere 6.7 GA

About the Authors

Joseph Zuk is a staff-level automation test engineer working at VMware within the Performance Engineering Automation team. He focuses on at-scale performance testing of vCenter. He has been using PowerCLI for setup of his automation testbed since 2011.

Ravi Soundararajan is a principal engineer in the Performance Engineering group at VMware. He works on vCenter performance and scalability–from the UI, to the server, to the database, to the hypervisor management agents. He has been at VMware since 2003, and he has presented on the topic of vCenter Performance at VMworld from 2013-2018. His Twitter handle is @vCenterPerfGuy.

vCenter performance improvements from vSphere 6.5 to 6.7: What does 2x mean?

In a recent blog, the VMware vSphere team shared the following performance improvements in vSphere 6.7 vs. 6.5:

Moreover, with vSphere 6.7 vCSA delivers phenomenal performance improvements (all metrics compared at cluster scale limits, versus vSphere 6.5):
2X faster performance in vCenter operations per second
3X reduction in memory usage
3X faster DRS-related operations (e.g. power-on virtual machine)

As senior engineers within the VMware Performance and vSphere teams, we are writing this blog to provide more details regarding these numbers and to explain how we measured them. We also briefly explain some of the technical details behind these improvements.

Cluster Scale

Let us first explain what “all metrics compared at cluster scale limits” means. What is cluster scale? Here, it is an environment that includes a vCenter server that is configured for the largest vSphere cluster that VMware currently supports, namely 64 hosts and 8,000 powered-on VMs. This setup represents a high-consolidation environment with 125 VMs per host. Note that this is different from the setup used in our previous blog about vCenter 6.5 performance improvements. The setup in that blog was our datacenter scale environment, which used the largest number of supported hosts (2,000) and VMs (25,000), so the numbers from that blog should not be compared to these numbers.

2x and 3x

Let us now explain some of the performance numbers quoted. We produced the numbers by measuring workload runs in our cluster scale setup.

2x vCenter Operations Per Second, vSphere 6.7 vs. 6.5, cluster scale limits. We measure operations per second using an internal benchmark called vcbench. We describe vcbench below under “Benchmark Details.” One of the outputs of this workload is management operations (for example, clone, powerOn, vMotion) performed per second.

  • In 6.5, vCenter was capable of performing approximately 8.3 vcbench operations per second (described below under “Benchmark Details”) in the cluster-scale testbed.
  • In 6.7, vCenter is now capable of performing approximately 16.7 vcbench operations per second.

3x reduction in memory usage. In addition to our vcbench workload, we also include a simplified workload that simply executes a standard workflow: create a VM, power it on, power it off, and delete it. The rapid powerOn and powerOff of VMs in this setup puts more load on the DRS subsystem than the typical vcbench test.

  • In 6.5, the core vCenter process (vpxd) used on average about 10 GB to complete the workflow benchmark (described below under “Benchmark Details”).
  • In 6.7, the core vCenter process used approximately 3 GB to complete this run, while also achieving higher churn (that is, more workflow ‘create/powerOn/powerOff/delete’ cycles completed within the same time period).

3x faster DRS-related operations. In our vcbench workload, we measure not just the overall operations per second, but also the average latencies of individual operations like powerOn (which exercises the majority of the DRS software stack). We issue many concurrent operations, and we measure latency under such load.

  • In 6.5, the average latency of an individual powerOn during a vcbench run was 9.5 seconds.
  • In 6.7, the average latency of an individual powerOn during a vcbench run was 2.8 seconds.

The latencies above reflect the fact that a cluster has 8,000 VMs and many operations in flight at once. As a result, individual operations are slower than if they were simply run on a single-host, single-VM environment.
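For readers who want to check how the headline figures follow from the measurements, the arithmetic can be reproduced in a few lines. The values below are copied from the bullets above; the variable and property names are just for this sketch.

```powershell
$measurements = @(
    [pscustomobject]@{ Metric = 'vcbench operations per second'; V65 = 8.3;  V67 = 16.7; HigherIsBetter = $true }
    [pscustomobject]@{ Metric = 'vpxd memory (GB)';              V65 = 10.0; V67 = 3.0;  HigherIsBetter = $false }
    [pscustomobject]@{ Metric = 'powerOn latency (s)';           V65 = 9.5;  V67 = 2.8;  HigherIsBetter = $false }
)

$results = foreach ($m in $measurements) {
    # For throughput, improvement = new / old; for memory and latency, old / new.
    if ($m.HigherIsBetter) { $ratio = $m.V67 / $m.V65 } else { $ratio = $m.V65 / $m.V67 }
    [pscustomobject]@{
        Metric      = $m.Metric
        Improvement = [math]::Round($ratio, 1)
    }
}
$results | Format-Table -AutoSize
```

The ratios work out to roughly 2.0, 3.3, and 3.4, which is where the rounded 2x and 3x figures come from.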

What does this mean to you as a customer?

As a result of these improvements, customers in high-consolidation environments will see reduced resource consumption due to DRS and reduced latency to generate vMotions for load balancing. Customers will also see faster initial placement of VMs.

Brief Deep Dive into Improvements

Before we describe the improvements, let us first briefly explain how DRS works, at a very high level.

When powering on a VM, vCenter must determine where to place the VM. This is called initial placement. Many subsystems, including DRS and policy management, must be consulted to determine valid hosts on which this VM can run. This phase is called constraint check. Once DRS determines the host on which a VM should be powered on, it registers the VM onto that host and issues the powerOn. This initial placement requires a snapshot of the inventory: by snapshot, we mean that DRS records the current configuration of hosts and VMs in the cluster.

In addition to balancing during initial placement, every five minutes, DRS re-examines the load of the cluster and performs a series of computations to generate vMotions that help balance the load across hosts. This phase is called periodic rebalancing. Periodic rebalancing requires an examination of the historical utilization statistics for each host and VM (for example, over the previous hour) in order to determine proper placement.

Finally, as VMs get moved around, the used capacity in resource pools changes. The vCenter server periodically exchanges messages called SpecSyncs with each host to push down the most recent resource pool configuration. The SpecSync operation requires traversing a host’s resource pool structure and changing it to make sure it matches vCenter’s configuration.

With this understanding in mind, let us now give some technical details behind the improvements above. As with our previous blog about vCenter performance improvement, we describe changes in terms of rocks (that is, somewhat large changes to entire subsystems) and pebbles (smaller individual changes throughout the code base).

Rocks

The three main rocks that we address in 6.7 are simplified initial placement, faster resource divvying, and faster SpecSyncs.

Simplified initial placement. As mentioned above, initial placement relies on a snapshot of the current state of the inventory. Creating this snapshot can be a heavyweight operation, requiring a number of data copies and locking of host and cluster data structures to ensure a consistent view of the data. In 6.7, we moved to a lightweight online approach that keeps the state up-to-date in a continuous manner, avoiding snapshots altogether. With this approach, we significantly reduce locking demands and significantly reduce the number of times we need to copy data. In some highly-contended clusters, this reduced the initial placement time from seconds down to milliseconds.

Faster (and more frequent) resource divvying. Divvying is the act of determining the resource allocations for each VM. Every five minutes, the state of the cluster is examined, and divvying is performed, followed by rebalancing (using vMotion). To make the divvying phase faster, we performed a number of optimizations.

  • We changed the approach to examining historical usage statistics. Instead of storing metrics for every VM and every host over an hour, we aggregated the data, which allowed us to store a smaller number of metrics per host. This dramatically reduced memory usage and simplified the computation of the desired load for each host.
  • We restructured the code to remove compatibility checks (for example, those that help to determine which VMs can run on which hosts) during this divvying phase. In 6.5 and earlier, divvying a load also involved various host/VM compatibility calculations. Now, we store the compatibility matrix and update it when compatibility changes, which is typically infrequent.
  • We have also done significant code refactoring (described below under “Pebbles”) to this code path.
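The idea of storing compatibility results rather than recomputing them during every divvying pass can be sketched as a small cache. This is an illustrative Python sketch under stated assumptions, not the DRS code: `CompatibilityCache`, the `check` predicate, and the per-host invalidation hook are hypothetical names, and the sketch fills the matrix lazily, whereas the text only says that the matrix is stored and updated when compatibility changes.

```python
class CompatibilityCache:
    """Hypothetical cache of (vm, host) compatibility results."""

    def __init__(self, check):
        self._check = check   # assumed expensive predicate: (vm, host) -> bool
        self._matrix = {}     # (vm, host) -> cached result
        self.checks_run = 0   # counts how often the expensive check ran

    def is_compatible(self, vm, host):
        key = (vm, host)
        if key not in self._matrix:
            self.checks_run += 1
            self._matrix[key] = self._check(vm, host)
        return self._matrix[key]

    def invalidate_host(self, host):
        # Call when something that affects compatibility changes on a host.
        for key in [k for k in self._matrix if k[1] == host]:
            del self._matrix[key]
```

With this shape, repeated lookups cost a dictionary access, and the expensive check runs again only after an explicit invalidation.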

By implementing these changes and making divvying faster in 6.7, we are now able to perform divvying more frequently: once per minute instead of once every five minutes. As a result, resources flow more quickly between resource pools, and we are better able to enforce fairness guarantees in DRS clusters.

Note that periodic load balancing (through vMotion) still occurs every five minutes.

Faster SpecSyncs. To perform a SpecSync, vCenter sends a resource pool configuration to a host. The host must examine this configuration and create a list of changes required to bring that host in sync with vCenter. In 6.5 and earlier, depending on the number of VMs, creating this list of changes could result in hundreds of operations on a host, and the runtime was highly variable. In 6.7, we made this computation more deterministic, reducing the number of operations and lowering the latency accordingly.
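The "list of changes" step can be pictured as a diff between the configuration vCenter sends and the configuration the host currently has. The Python sketch below is only an illustration of that idea: the function name, the dictionary representation of resource pools, and the create/update/delete operations are assumptions, and the actual host-side algorithm is not described in this post.

```python
def diff_resource_pools(desired, current):
    """Return an ordered list of (operation, pool, settings) changes.

    `desired` and `current` map a pool name to its settings. Sorting the
    pool names makes the output order the same on every run.
    """
    changes = []
    for pool in sorted(desired.keys() - current.keys()):
        changes.append(("create", pool, desired[pool]))
    for pool in sorted(desired.keys() & current.keys()):
        if desired[pool] != current[pool]:
            changes.append(("update", pool, desired[pool]))
    for pool in sorted(current.keys() - desired.keys()):
        changes.append(("delete", pool, None))
    return changes
```

Pools whose settings already match produce no entry, so the number of operations scales with what actually changed rather than with the total number of pools.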

Pebbles

In addition to the changes above, we also performed a number of optimizations throughout our code base.

Code Refactoring. In 6.5 and before, admission control decisions were made by multiple independent subsystems within vCenter (for example, DRS would be responsible for some decisions, and HA would make others). In 6.7, we simplified our code such that all admission control decisions are handled by a single module within DRS. Consolidating the duplicated code simplifies debugging and reduces resource usage.

Finer-grained locks. In 6.7, we continued to make strides in reducing the scope of our locks. We introduced finer-grained locks so that DRS would not have to lock an entire VM to examine certain pieces of state. We made similar improvements to both hosts and clusters.
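As a rough illustration of what narrowing lock scope means, the Python sketch below gives a VM record one lock per group of state instead of a single lock for the whole object. The class and its fields are hypothetical and are not taken from vCenter.

```python
import threading


class VmRecord:
    """Hypothetical VM record with one lock per independent piece of state."""

    def __init__(self, name):
        self.name = name
        self._power_lock = threading.Lock()   # guards power state only
        self._usage_lock = threading.Lock()   # guards usage statistics only
        self._powered_on = False
        self._cpu_mhz = 0

    def set_power(self, on):
        with self._power_lock:
            self._powered_on = on

    def is_powered_on(self):
        with self._power_lock:
            return self._powered_on

    def record_usage(self, cpu_mhz):
        with self._usage_lock:
            self._cpu_mhz = cpu_mhz

    def usage(self):
        with self._usage_lock:
            return self._cpu_mhz
```

A reader of usage statistics never waits on a thread that is changing the power state, which is the effect the finer-grained locks are meant to achieve.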

Removal of unnecessary classes, maps, and sets. In refactoring our code, we were able to remove a number of classes and thereby reduce the number of copies of data in our system. The maps and sets that were needed to support these classes could also be removed.

Preferring integers over strings. In many situations, we replaced strings and string comparisons with integers and integer comparisons. This dramatically reduces memory allocation overhead and speeds up comparisons, reducing CPU usage.
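One common way to replace strings with integers is to assign each distinct string a small integer ID once and to use that ID everywhere afterward. The Python sketch below shows the pattern. The `IdTable` class and the example names are assumptions, and the post does not say that vCenter uses this particular technique.

```python
class IdTable:
    """Hypothetical table that maps each distinct string to a stable integer."""

    def __init__(self):
        self._ids = {}     # string -> integer ID
        self._names = []   # integer ID -> string

    def intern(self, name):
        """Return the ID for `name`, assigning the next free ID if it is new."""
        if name not in self._ids:
            self._ids[name] = len(self._names)
            self._names.append(name)
        return self._ids[name]

    def name_of(self, ident):
        return self._names[ident]
```

After interning, hot code paths can compare and hash integers, and the original string is needed only when a value has to be shown to a user or written to a log.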

Benchmark Details

We measure Operations Per Second (OPS) using a VMware benchmark called vcbench. (For more details about vcbench, see “Benchmarking Details” in vCenter 6.5 Performance: what does 6x mean?) Briefly, vcbench is a Java-based application that takes a runlist as input, which is a list of management operations to perform. These operations include powering on VMs, cloning VMs, reconfiguring VMs, and migrating VMs, among many other operations. We chose the operations based on an analysis of typical customer management scenarios. The vcbench application uses vSphere APIs to issue these operations to vCenter. The benchmark creates multiple threads and issues operations on those threads in parallel (up to 32). We measure the operations per second by taking the total number of operations performed in a given timeframe (say, 1 hour), and dividing it by the time interval.
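The OPS calculation itself is simple arithmetic. The Python sketch below is not vcbench (which is Java-based and issues real vSphere API calls); it only mimics the shape of the measurement, with a hypothetical runlist of callables executed on at most 32 threads. The function names and the example numbers are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 32  # upper bound on parallel threads, as stated in the text


def run_runlist(runlist, max_threads=MAX_THREADS):
    """Run every callable in `runlist` in parallel; return (count, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(lambda op: op(), runlist))
    elapsed = time.perf_counter() - start
    return len(results), elapsed


def ops_per_second(total_ops, interval_seconds):
    """Total operations completed divided by the length of the interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return total_ops / interval_seconds
```

For example, a hypothetical run that completes 36,000 operations in one hour (3,600 seconds) yields `ops_per_second(36000, 3600)`, or 10 OPS.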

Our workflow benchmark is very similar to vcbench. The main difference is that more operations are issued per host at a time. In vcbench, one operation is issued per host at a time, while in workflow, up to 8 operations are issued per host at a time.
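The difference between the two benchmarks comes down to a per-host concurrency limit: 1 for vcbench and up to 8 for workflow. A limit like this could be enforced with one semaphore per host, as in the Python sketch below. The `PerHostLimiter` class is hypothetical and is not how either benchmark is actually written.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class PerHostLimiter:
    """Hypothetical limiter: at most `per_host_limit` operations per host."""

    def __init__(self, per_host_limit):
        self._limit = per_host_limit
        self._guard = threading.Lock()     # protects the dictionaries below
        self._slots = {}                   # host -> semaphore
        self._in_flight = defaultdict(int)
        self.peak = defaultdict(int)       # highest concurrency seen per host

    def _slot(self, host):
        with self._guard:
            if host not in self._slots:
                self._slots[host] = threading.BoundedSemaphore(self._limit)
            return self._slots[host]

    def run(self, host, op):
        with self._slot(host):             # wait for a free slot on this host
            with self._guard:
                self._in_flight[host] += 1
                self.peak[host] = max(self.peak[host], self._in_flight[host])
            try:
                return op()
            finally:
                with self._guard:
                    self._in_flight[host] -= 1
```

Under these assumptions, `PerHostLimiter(1)` would correspond to the vcbench behavior and `PerHostLimiter(8)` to the workflow behavior, with calls to `run` submitted from a pool such as `ThreadPoolExecutor`.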

In many cases, the size of the VM has an impact on operational latency. For example, a VM with a lot of memory (say, 32 GB) and large disks (say, 100 GB) will take longer to clone because more memory will need to be snapshotted, and more disk data will need to be copied. To minimize the impact of the disk subsystem in our measurements, we use small VMs (< 4 GB memory, < 8 GB disk).

Because we limit ourselves to 32 threads per vCenter in this single-cluster setup, throughput numbers are smaller than for our datacenter-at-scale setups (2,000 hosts; 25,000 VMs), which use up to 256 concurrent threads per vCenter.

Summary

In this blog, we have described some of the performance improvements in vCenter from 6.5 to 6.7. A variety of improvements to DRS have led to improved throughput and reduced resource usage for our vcbench workload in a cluster-scale setup of 64 hosts and 8,000 powered-on VMs. Many of these changes also apply to larger datacenter-scale setups, although the scope of improvement may not be as pronounced.

Acknowledgments

The vCenter improvements described in this blog are the results of thousands of person-hours from vCenter developers, performance engineers, and others throughout VMware. We are deeply grateful to them for making this happen.

Authors

Zhelong Pan is a senior staff engineer in the Distributed Resource Management Team at VMware. He works on cluster management, including shared resource allocation, VM placement, and load balancing. He is interested in performance optimizations, including virtualization performance and management software performance. He has been at VMware since 2006.

Ravi Soundararajan is a principal engineer in the Performance Group at VMware. He works on vCenter performance and scalability, from the UI to the server to the database to the hypervisor management agents. He has been at VMware since 2003, and he has presented on the topic of vCenter Performance at VMworld from 2013-2017. His Twitter handle is @vCenterPerfGuy.

vCenter 6.5 Performance: what does 6x mean?

At the VMworld 2016 Barcelona keynote, CTO Ray O’Farrell proudly presented the performance improvements in vCenter 6.5. He showed the following slide:

Slide from Ray O’Farrell’s keynote at VMworld 2016 Barcelona, showing 2x improvement in scale from 6.0 to 6.5 and 6x improvement in throughput from 5.5 to 6.5.

As a senior performance engineer who focuses on vCenter, and as one of the presenters of VMworld Session INF8108 (listed in the top-right corner of the slide above), I have received a number of questions regarding the “6x” and “2x scale” labels in the slide above. This blog is an attempt to explain these numbers by describing (at a high level) the performance improvements for vCenter in 6.5. I will focus specifically on the vCenter Appliance in this post.
