I’m a VMware Technical Account Manager (TAM), and many of my customers run their VMs on a storage array with a Fibre Channel (FC) connection. Connections to the storage array can go through a traditional FC storage network or use FCoE over a high-speed Ethernet network. FCoE converges the FC and IP networks into one.
VMware has some great documents you can refer to for a complete understanding of storage virtualization and FC storage best practices. From my engagements with my customers, here are some of the common issues we’ve uncovered together:
- There should be only one LUN per datastore. If the datastore is running low on space, extend that particular LUN from the storage array, and then increase the datastore size from vCenter. Do not create another LUN and add it to the datastore as an extent. I have seen customers with multiple extents on a datastore, which causes inconsistent performance. This can be checked from RVTools.
- All LUNs presented to a host, or to all hosts within a cluster, must be consistent. This means the LUN ID (SCSI ID) of a given LUN must be the same on every host. I have seen customers with different LUN IDs for the same LUN presented to different hosts. This has caused ESXi to mount the datastore as snap-xxxx. If you see that a datastore has been renamed to snap-xxxx, be cautious. A different LUN ID may have caused it.
- Check with your storage array vendor for the recommended path selection policy (PSP). A commonly used PSP is Round Robin. The default PSP depends on how the array is claimed by the host and is often MRU. To change the PSP to Round Robin, follow this KB. The PSP can be verified from RVTools.
- When configuring Round Robin, check the storage array vendor's recommendation for the IOPS setting, which is the number of I/Os sent down one path before switching to the next. By default, IOPS is set to 1000. VMware has a KB recommending IOPS=1 for Round Robin. In addition, if you have LUNs used as Raw Device Mappings (RDMs) for Microsoft clustering, the recommended setting is IOPS=200.
- All LUNs presented to the hosts within a cluster should ideally have a minimum of four paths for redundancy. You can refer here for more information. I have seen customers with datastores that had six paths on one host and two paths on another host, which created inconsistency.
- Mark all RDM LUNs used for Microsoft clustering or Red Hat clustering as “perennially reserved” as per KB. Customers often ask me why their ESXi hosts take more than an hour to reboot. Unmarked RDM LUNs that are held under a cluster's SCSI reservation are a common cause, because the host keeps trying to scan them during boot.
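Some of the checks above can also be run directly from PowerCLI. The following is a minimal sketch rather than a definitive procedure. It assumes PowerCLI is installed and that you are already connected with Connect-VIServer. "cluster name" is a placeholder, and the IOPS value of 1 in the last step is only an example, so use the value your array vendor recommends. The last step uses -WhatIf, so it only prints what it would change.

```powershell
# Assumption: Connect-VIServer has already been run in this session.
# 'cluster name' is a placeholder for your own cluster.
$cluster = Get-Cluster 'cluster name'

# 1. List VMFS datastores that span more than one extent (LUN).
$cluster | Get-Datastore |
    Where-Object { $_.Type -eq 'VMFS' } |
    Select-Object Name, @{N='ExtentCount';E={ @($_.ExtensionData.Info.Vmfs.Extent).Count }} |
    Where-Object { $_.ExtentCount -gt 1 }

# 2. Count the paths to each non-local LUN on every host.
#    Compare the counts across hosts to spot an inconsistent path layout.
foreach ($vmhost in ($cluster | Get-VMHost)) {
    Get-ScsiLun -VmHost $vmhost -LunType disk |
        Where-Object { -not $_.IsLocal } |
        Select-Object VMHost, CanonicalName,
            @{N='PathCount';E={ @(Get-ScsiLunPath -ScsiLun $_).Count }}
}

# 3. Dry run: show which non-local LUNs would be switched to Round Robin.
#    Remove -WhatIf only after confirming the policy and the IOPS value
#    (1 here is an example) with your storage array vendor.
$cluster | Get-VMHost | Get-ScsiLun -LunType disk |
    Where-Object { -not $_.IsLocal -and $_.MultipathPolicy -ne 'RoundRobin' } |
    Set-ScsiLun -MultipathPolicy RoundRobin -CommandsToSwitchPath 1 -WhatIf
```

Steps 1 and 2 are read-only. LUNs used as RDMs for clustering may need a different IOPS value, as noted above, so review the dry-run list before applying anything.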
As a TAM, one of the many configuration settings I always check is the LUN (datastore) settings, to make sure they are set consistently. There are many tools we can use to check, RVTools being the most popular. However, if you have hundreds or thousands of datastores and a huge virtual infrastructure, checking them by hand could be a nightmare.
Luckily, I’ve found an easy way! Here is a simple PowerCLI script that you can use to catch some of the misconfigured settings on the LUNs presented to an ESXi host. The script exports the following information for every LUN into a CSV file that you can open in Excel for easy reference:
- VMHost – ESXi host name
- Cluster – Cluster name
- CanonicalName – LUN canonical name (the device identifier, such as an naa. ID)
- CapacityGB – LUN size in GB
- Vendor – Storage vendor (for hosts connecting to multiple arrays)
- MultipathPolicy – PSP
- CommandsToSwitchPath – IOPS setting for Round Robin
- LunType – Array disk or local storage controller
- IsLocal – True if it is a local disk
- IsSsd – True if the disk is of SSD type
- LunID – LUN ID
- Perennial Reservation – True if the LUN is marked as perennially reserved (typically an RDM disk used for VM clustering)
- Datastore – Datastore name
Here is an output generated by the script from a cluster. Yes, this cluster was purposely created for SQL DB clustering:
I made minor changes to the script by adding the datastore name and the perennial reservation setting. This information helps the customer distinguish the RDM LUNs used for clustering from the normal datastores used for VMs. You can run the script against a host or a cluster. If you run it against a cluster, you can build a pivot table in tabular form, with the LUN as the column, the host as the row, and the LUN ID as the value. This helps you validate that the LUN IDs are set consistently across hosts.
Below is an output for one of my customers shown in a pivot table. Evidently, three hosts had different LUN IDs presented for the same LUN.
Here is the script I used:
# Build a lookup table that maps each LUN canonical name to its datastore name
# LucD – https://communities.vmware.com/t5/user/viewprofilepage/user-id/256147
Write-Host "Building Datastores Table" -ForegroundColor Yellow
$dsTab = @{}
foreach ($ds in (Get-Cluster "cluster name" | Get-Datastore | Where-Object { $_.ExtensionData.Summary.MultipleHostAccess })) {
    $ds.ExtensionData.Info.Vmfs.Extent | ForEach-Object {
        if ($_.DiskName) { $dsTab[$_.DiskName] = $ds.Name }
    }
}
Write-Host "Gather LUN info" -ForegroundColor Yellow
# Shane – https://www.virtuallyshane.com/posts/powercli-to-get-lun-id-number-info
# Get LUN info for every host in the cluster.
# Local LUNs are not filtered out here; use the IsLocal column to exclude them.
Get-Cluster "cluster name" | Get-VMHost | Get-ScsiLun | Sort-Object VMHost |
    Select-Object VMHost, @{N='Cluster';E={Get-Cluster -VMHost $_.VMHost}}, CanonicalName, CapacityGB, Vendor, MultipathPolicy, CommandsToSwitchPath, LunType, IsLocal, IsSsd,
    @{N='LunID';E={
        # The LUN ID is the last field of the path runtime name, for example vmhba1:C0:T0:L5
        $esxcli = Get-EsxCli -VMHost $_.VMHost -V2
        $esxcli.storage.nmp.path.list.Invoke(@{'device'=$_.CanonicalName}).RuntimeName.Split(':')[-1].TrimStart('L')}},
    @{N='Perennial Reservation';E={
        $esxcli = Get-EsxCli -VMHost $_.VMHost -V2
        $esxcli.storage.core.device.list.Invoke(@{'device'=$_.CanonicalName}).IsPerenniallyReserved}},
    @{N='Datastore';E={$dsTab[$_.CanonicalName]}} |
    # Sort on the LunID column defined above (the original sorted on a nonexistent LUN property)
    Sort-Object -Property {[int]$_.LunID} |
    Export-Csv "C:\Host\LUN.csv" -NoTypeInformation
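As an alternative to building the pivot table by hand, the exported CSV can be checked for mismatched LUN IDs with a few lines of PowerShell. This is a minimal sketch, not part of the original script. The function name Find-InconsistentLunId is hypothetical, and it assumes the CSV contains the VMHost, CanonicalName, LunID, IsLocal and Datastore columns produced by the script above.

```powershell
# Hypothetical helper (not part of PowerCLI): returns one row for every
# non-local LUN that is presented with more than one LUN ID across hosts.
function Find-InconsistentLunId {
    param(
        [Parameter(Mandatory)]
        [object[]]$Rows   # rows as returned by Import-Csv on the exported file
    )

    $Rows |
        Where-Object { $_.IsLocal -ne 'True' } |      # skip local disks
        Group-Object -Property CanonicalName |        # one group per LUN
        ForEach-Object {
            # Distinct LUN IDs seen for this LUN across all hosts
            $ids = @($_.Group | Select-Object -ExpandProperty LunID | Sort-Object -Unique)
            if ($ids.Count -gt 1) {
                [pscustomobject]@{
                    CanonicalName = $_.Name
                    Datastore     = ($_.Group | Select-Object -First 1).Datastore
                    LunIDs        = $ids -join ','
                    Hosts         = ($_.Group | ForEach-Object { "$($_.VMHost)=$($_.LunID)" }) -join '; '
                }
            }
        }
}

# Example usage against the file written by the script above:
# Find-InconsistentLunId -Rows (Import-Csv 'C:\Host\LUN.csv') | Format-Table -AutoSize
```

An empty result means every shared LUN has the same LUN ID on all hosts in the export. A LUN for which the script could not determine the ID on some host will also be listed, so treat each row as something to verify rather than as a confirmed misconfiguration.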
I was able to use the output to walk through the configuration with my customer and make sure all the settings were set correctly and consistently. I hope this quick and simple script is useful to you in identifying any inconsistencies in your configuration. Please try this script in a test environment before using it in production.
Reach out to your VMware technical account manager if you ever have questions. We’re happy to help!