
As a VMware {code} Coach, I wanted to share a situation I recently experienced where I responded to a request quickly and then had to reevaluate and revise my approach.

A customer was looking to map some Linux virtual disks on a vVol datastore to their respective Pure Storage FlashArray volumes.

Looking at the request, I quickly determined that the new mechanism my colleague Cody Hosterman blogged about with the release of vSphere 7 wasn’t going to be sufficient. This was because the customer was on an older version of vSphere, and the new functionality wasn’t available to them.

I quickly put a script together in my minimally used lab and sent it to my customer.

It ran without a hitch in my environment, largely because my array was mostly empty.

My first attempt

I decided to take another route and accomplish the same goals with an approach suited to an older vSphere release (6.5 in this case). The steps I needed to accomplish included:

  • Determine which vmdks reside on a vVol datastore
  • Determine which of those are attached to a given VM
  • Determine which specific VM hard disk it is, and the SCSI ID
  • Map that SCSI back to a VM’s hard disk, and then retrieve the vVol’s name on FlashArray
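The shape of that first approach looked something like the sketch below. This is not the actual HardDiskToVvol.ps1 script (which is linked at the end of the post) — the cmdlets are standard PowerCLI, but the variable names and structure are mine for illustration:

```powershell
# Sketch of the first (inefficient) approach: enumerate every vmdk on
# every vVol datastore, then keep only those attached to the target VM.
# Assumes an active PowerCLI session (Connect-VIServer) and a $vmName.

$vm = Get-VM -Name $vmName

# Gather ALL hard disks on ALL vVol datastores - expensive at scale
$vvolDatastores = Get-Datastore | Where-Object { $_.Type -eq "VVOL" }
$allVvolDisks   = Get-HardDisk -Datastore $vvolDatastores

# Only now filter down to the disks that belong to our VM
$vmVvolDisks = $allVvolDisks | Where-Object { $_.Parent.Name -eq $vm.Name }

foreach ($disk in $vmVvolDisks) {
    # For a vVol-backed disk, the backing object ID is the vVol UUID
    $vvolUuid = $disk.ExtensionData.Backing.BackingObjectId
    "{0} -> vVol UUID {1}" -f $disk.Name, $vvolUuid
}
```

The cost is in the second step: every vmdk on every vVol datastore is retrieved before any filtering happens.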

My code was a little crude and not particularly efficient.
I used the `lsscsi` command in my CentOS guest to return the SCSI ID for comparison.
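For the in-guest piece, PowerCLI's Invoke-VMScript cmdlet can run `lsscsi` inside the guest when VMware Tools is present. A minimal sketch, with variable names of my own choosing and an illustrative sample line in the comments:

```powershell
# Run lsscsi inside the Linux guest via VMware Tools.
# Assumes an active PowerCLI session and a $vmName; prompts for guest credentials.
$vm        = Get-VM -Name $vmName
$guestCred = Get-Credential -Message "Guest OS credentials for $($vm.Name)"

$result = Invoke-VMScript -VM $vm -GuestCredential $guestCred `
    -ScriptType Bash -ScriptText "lsscsi"

# A typical lsscsi line looks like:
#   [0:0:1:0]  disk  VMware  Virtual disk  2.0  /dev/sdb
# In the [host:channel:target:lun] tuple, the target number generally lines
# up with the virtual disk's unit number on its virtual SCSI controller.
$result.ScriptOutput
```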

The above script, HardDiskToVvol.ps1, wasn’t efficient because it enumerated all of the vmdks on all vVol datastores, and then matched those that were attached to the specific VM.

Room for improvement

Upon hearing back from the customer, I learned that the script took a significant amount of time in their environment. Why was it so slow? Surely it shouldn’t have been.

After digging a little deeper, I realized the error of my ways and adjusted my approach.

In my test/lab environment, the process wasn’t especially slow, but keep in mind that I only had a few vVols. In an environment with a significant number of vmdks on a vVol datastore, it could be quite slow.

So I then approached it a little differently:

  • Query the VM for the individual vmdks
  • Determine if those vmdks resided on a vVol datastore
  • Determine which specific VM hard disk it is, and the SCSI ID
  • Map that SCSI back to a VM’s hard disk, and then retrieve the vVol’s name on FlashArray
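The revised steps above can be sketched like this. Again, this is an illustration of the approach rather than the actual HardDiskToVvol2.ps1 (linked at the end of the post); the PowerCLI cmdlets are real, but the variable names and output format are mine:

```powershell
# Sketch of the optimized approach: start from the VM's own disks and
# filter down to those backed by a vVol datastore.
# Assumes an active PowerCLI session (Connect-VIServer) and a $vmName.

$vm = Get-VM -Name $vmName

foreach ($disk in ($vm | Get-HardDisk)) {
    # Check this disk's backing datastore type instead of scanning datastores
    $dsId = $disk.ExtensionData.Backing.Datastore.ToString()
    $ds   = Get-Datastore -Id $dsId
    if ($ds.Type -ne "VVOL") { continue }

    # SCSI ID = controller bus number : disk unit number (e.g. SCSI 0:1)
    $ctrl = $vm.ExtensionData.Config.Hardware.Device |
        Where-Object { $_.Key -eq $disk.ExtensionData.ControllerKey }
    $scsiId = "{0}:{1}" -f $ctrl.BusNumber, $disk.ExtensionData.UnitNumber

    # For a vVol-backed disk, the backing object ID is the vVol UUID, which
    # can then be matched to its FlashArray volume (e.g. with Pure's
    # PowerShell tooling)
    $vvolUuid = $disk.ExtensionData.Backing.BackingObjectId
    "{0}  SCSI({1})  vVol UUID {2}" -f $disk.Name, $scsiId, $vvolUuid
}
```

The loop touches only the handful of disks attached to one VM, regardless of how many vmdks live on the vVol datastore.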

By only looking at the individual vmdks attached to the specific VM, the process is much faster, especially when a significant number of vmdks reside on a vVol datastore. I also added a prompt for VM guest credentials, which are needed to invoke `lsscsi` in the guest.

The resulting output looks something like this:

In a very large environment the difference can be very significant. Consider the first script being run against an environment with hundreds of vmdks residing on a vVol datastore: it would put each of those hundreds of vmdks in an array, and then have to check the VM’s vmdks against that list.

The HardDiskToVvol2.ps1 script is more efficient because it uses the properties of the individual disks and their datastore backing, rather than querying datastores for all the vmdks and only selecting those connected to the requested VM.

The second script simply checks the VM’s vmdks, determines whether they are on a vVol-backed datastore, and then performs the same operations. In my example, the VM has only 2 vmdks that meet these criteria. The second script runs significantly faster because it retrieves the properties of only two vmdks and their datastore backings.


Basically, script 1 was a Saturday-night quick script run against a mostly bare environment. Script 2 takes a more scalable, optimized approach that should behave consistently in any environment.

While my first attempt met the need, the lesson is to always look for opportunities to streamline and optimize code.

The above scripts are also on my site: https://www.jasemccarty.com/blog/powershell-match-linux-vmdk-on-vvols-to-flasharray-volume/