Best Practice: How to correctly remove a LUN from an ESX host
Yes, at first glance, you may be forgiven for thinking that this subject hardly warrants a blog post. But for those of you who have suffered the consequences of an All Paths Down (APD) condition, you'll know why this is so important.
Let's recap on what APD actually is.
APD is when there are no longer any active paths to a storage device from the ESX host, yet the host continues to try to access that device. When hostd tries to open a disk device, it sends a number of commands, such as read capacity and read requests to validate the partition table. If the device is in APD, these commands will be retried until they time out. The problem is that hostd is responsible for a number of other tasks as well, not just opening devices. One task is ESX to vCenter communication, and if hostd is blocked waiting for a device to open, it may not respond to these other tasks in a timely fashion. One consequence is that you might observe your ESX hosts disconnecting from vCenter.
We have made a number of improvements to how we handle APD conditions over the last number of releases, but prevention is better than cure, so I wanted to use this post to highlight once again the best practices for removing a LUN from an ESX host and avoiding APD:
ESX/ESXi 4.1
Improvements in 4.1 mean that hostd now checks whether a VMFS datastore is accessible before issuing I/Os to it. This is an improvement, but it doesn't help with I/Os that are already in flight when an APD occurs. The best practices for removing a LUN from an ESX 4.1 host, as described in detail in KB 1015084, are as follows:
- Unregister all objects from the datastore including VMs and Templates
- Ensure that no 3rd party tools are accessing the datastore
- Ensure that no vSphere features, such as Storage I/O Control, are using the device
- Mask the LUN from the ESX host by creating new rules in the PSA (Pluggable Storage Architecture)
- Physically unpresent the LUN from the ESX host using the appropriate array tools
- Rescan the SAN
- Clean up the rules created earlier to mask the LUN
- Unclaim any paths left over after the LUN has been removed
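As a rough sketch of how the masking, rescan and clean-up steps above map onto the 4.1 command line, the script below prints the commands in order instead of running them. The rule number, adapter, channel, target, LUN and NAA id are placeholder assumptions, and the `run` helper and function names are hypothetical; KB 1015084 remains the authoritative source for the syntax on your build.

```shell
#!/bin/sh
# Dry-run sketch of the ESX/ESXi 4.1 MASK_PATH sequence.
# Every value below is a hypothetical placeholder; substitute your own.
RULE=192
ADAPTER=vmhba0
CHANNEL=0
TARGET=0
LUN=2
DEVICE=naa.6005nnnnnnnn0019

# Print each command by default; set DRY_RUN=0 on a real host to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

mask_lun() {
    # Add a MASK_PATH claim rule for one path and load it into the VMkernel.
    run esxcli corestorage claimrule add --rule "$RULE" -t location \
        -A "$ADAPTER" -C "$CHANNEL" -T "$TARGET" -L "$LUN" -P MASK_PATH
    run esxcli corestorage claimrule load
    # Reclaim the device so that the new rule is applied to its paths.
    run esxcli corestorage claiming reclaim -d "$DEVICE"
}

cleanup_mask() {
    # Run only after the LUN has been unpresented on the array.
    run esxcfg-rescan "$ADAPTER"
    run esxcli corestorage claimrule delete --rule "$RULE"
    run esxcli corestorage claimrule load
    # Unclaim the leftover path, then re-run the claim rules.
    run esxcli corestorage claiming unclaim -t location \
        -A "$ADAPTER" -C "$CHANNEL" -T "$TARGET" -L "$LUN"
    run esxcli corestorage claimrule run
}

mask_lun
echo "# ...unpresent the LUN using the array tools here..."
cleanup_mask
```

A device with several paths needs one rule (and one unclaim) per path, so in practice the add and unclaim commands would be repeated for each adapter, channel and target combination.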
Now this is a rather complex set of instructions to follow. Fortunately, we have made things a little easier with 5.0.
ESXi 5.0
The first thing to mention in 5.0 is that we have introduced a new Permanent Device Loss (PDL) condition. This can help alleviate some of the conditions which previously caused APD, but you could still run into APD if you don't correctly remove a LUN from the ESX host. There are details in the post about the enhancements made in the UI and the CLI to make the removal of a LUN easier. But there are KB articles that go into even greater detail.
To avoid the rather complex set of instructions that you needed to follow in 4.1, VMware introduced new detach and unmount operations to the vSphere UI & the CLI.
As per KB 2004605, to avoid an APD condition in 5.0, all you need to do now is to detach the device from the ESX host. This will automatically unmount the VMFS volume first. If there are objects still using the datastore, you will be informed. You no longer have to mess about creating and deleting rules in the PSA to do this safely. The steps now are:
- Unregister all objects from the datastore including VMs and Templates
- Ensure that no 3rd party tools are accessing the datastore
- Ensure that no vSphere features, such as Storage I/O Control or Storage DRS, are using the device
- Detach the device from the ESX host; this will also initiate an unmount operation
- Physically unpresent the LUN from the ESX host using the appropriate array tools
- Rescan the SAN
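For those who prefer the CLI, here is a minimal sketch of the detach and rescan steps above using the 5.0 `esxcli storage` namespace. It prints the commands rather than executing them. The datastore label and NAA id are placeholder assumptions, the `run` helper and function names are hypothetical, and the explicit unmount before the detach is included only to make the order of operations visible; refer to KB 2004605 for the definitive procedure.

```shell
#!/bin/sh
# Dry-run sketch of the ESXi 5.0 unmount/detach sequence.
LABEL=MyDatastore            # hypothetical datastore label
DEVICE=naa.6005nnnnnnnn0019  # placeholder NAA id

# Print each command by default; set DRY_RUN=0 on a real host to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

detach_lun() {
    # Unmount the VMFS volume, then detach the device (state off).
    run esxcli storage filesystem unmount -l "$LABEL"
    run esxcli storage core device set --state=off -d "$DEVICE"
    # Check that the device now appears in the detached list.
    run esxcli storage core device detached list
}

after_unpresent() {
    # Run only after the LUN has been unpresented on the array.
    run esxcli storage core adapter rescan --all
    # Remove the stale entry for the decommissioned device.
    run esxcli storage core device detached remove -d "$DEVICE"
}

detach_lun
echo "# ...unpresent the LUN using the array tools here..."
after_unpresent
```

Keeping the array-side step between the two functions mirrors the list above: the host lets go of the device before the LUN disappears, which is the point of the whole exercise.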
This KB article is very good since it also tells you which features (Storage DRS, Storage I/O Control, etc) may prevent a successful unmount and detach.
Please pay particular attention to these KB articles if/when you need to unpresent a LUN from an ESX host.
Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage
### Masking LUN paths using PowerCLI (Tested on ESXi 4.1)
Connect-VIServer -Server MyESX -Protocol https -User root -Password password!!
$esxcli = Get-EsxCli
# List all paths, then the paths for one device. Sample output of the first command:
#   naa.6005nnnnnnnn0019 vmhba34:C0:T0:L0
$esxcli.nmp.path.list() | Select Device, RuntimeName | Sort RuntimeName
$esxcli.nmp.path.list("naa.6005nnnnnnnn0019")
$esxcli.corestorage.claimrule.list()
# Starting rule number, and the adapter/channel/target/LUN combinations to mask
$Prule=157
$Padapter="vmhba34"
$Pchannels=0,1
$Ptargets=0,1
$Pluns=0,1,2
# One MASK_PATH rule per path; load the rule, unclaim the path, then run the rules
ForEach ($Plun in $Pluns) {
    ForEach ($Pchannel in $Pchannels) {
        ForEach ($Ptarget in $Ptargets) {
            Write-Host "Add Rule: $Prule for ${Padapter}:C${Pchannel}:T${Ptarget}:L${Plun}"
            $esxcli.corestorage.claimrule.add($Padapter,$null,$Pchannel,$null,$null,$null,$null,$Plun,$null,"MASK_PATH",$Prule,$Ptarget,$null,"location",$null)
            $esxcli.corestorage.claimrule.load()
            $esxcli.corestorage.claiming.unclaim($Padapter,$Pchannel,$null,$null,$null,$Plun,$null,$null,$null,$Ptarget,"location",$null)
            $esxcli.corestorage.claimrule.run()
            $Prule++
        }
    }
}
$esxcli.corestorage.claimrule.list()
# --> Perform LUN masking at the storage level
### Unmasking the path
# Reuses $esxcli, $Padapter, $Pchannels and $Ptargets from the masking step
$Prule=157
$Pluns=0,1,2
ForEach ($Plun in $Pluns) {
    ForEach ($Pchannel in $Pchannels) {
        ForEach ($Ptarget in $Ptargets) {
            Write-Host "Delete Rule: $Prule for ${Padapter}:C${Pchannel}:T${Ptarget}:L${Plun}"
            $esxcli.corestorage.claimrule.delete($null,$Prule)
            $esxcli.corestorage.claimrule.load()
            $esxcli.corestorage.claiming.unclaim($Padapter,$Pchannel,$null,$null,$null,$Plun,$null,$null,$null,$Ptarget,"location",$null)
            $Prule++
        }
    }
}
$esxcli.corestorage.claimrule.run()
$esxcli.corestorage.claimrule.list()
Posted by: Oscar Madrigal | 11/22/2011 at 11:50 PM
Thanks for sharing this Oscar.
Posted by: Chogan | 11/23/2011 at 01:56 AM
Specific to ESXi 4.1, is the procedure the same when decommissioning storage? We are decommissioning an existing datastore and will Storage vMotion our VMs to new datastores. If you could please clarify.
Also, I have not understood the last section of KB 1015084:
"Run this command to unclaim each path to the masked device:
# esxcli corestorage claiming unclaim -t location -A vmhba0 -C 0 -T 0 -L 2
This ensures that all paths to the device are unclaimed successfully before running the claim. Update vmhba, controller, target and LUN number as required."
Why do I have to unclaim the path when the device is never going to come back?
Posted by: Techstarts | 11/23/2011 at 05:49 PM
Thanks for commenting Techstarts.
Yes, I would use the same procedure to decommission storage.
My understanding is that you unclaim the device in order to remove it completely from the VMkernel. This means that the VMkernel will no longer try to look for that device while you are in the process of removing/decommissioning it from the host.
If the KB is unclear, please leave a comment in the KB and ask them to clarify. These comments are all monitored, and should result in more detail getting added to the KB.
Posted by: Chogan | 11/24/2011 at 12:41 AM
How does this relate to using RDM? I wrote back in 2009 how annoying vSphere 4.0's new "intelligence" was with regard to removing LUNs (in my case RDMs) and re-adding them (I did this for SAN based snapshot stuff).
http://www.techopsguys.com/2009/08/18/its-not-a-bug-its-a-feature/
I have worked around the "feature" by using iSCSI directly in the guests, bypassing vSphere entirely. This is not an ideal solution when I have hardware accelerated fibre channel available to every system.
Basically my process was:
1) unmount RDM from within guest
2) remove LUN out from under ESX
3) delete snapshot assigned to LUN
4) create new snapshot
5) create new LUN (same number as before)
6) export to ESX
7) re-mount volume in guest (i.e. no unmapping from guest, no guest reboot/shutdown etc)
This was part of an automated process that would take a R/W snapshot from say a production Oracle or MySQL database and swing it around to another system for use.
In ESX 3.0 and 3.5 this worked flawlessly hundreds of times. In vSphere 4.0 (haven't tried 4.1 or 5.0 yet) this broke because vSphere assigned a new UUID to the new snapshot even though it was on the same LUN # with RDM! The docs for RDM explicitly state RDM is good for leveraging SAN snapshots and/or for doing backups, both of which I used it for, and it broke horribly in 4.0. vSphere should stay out of trying to micromanage RDMs because they are just that - raw devices.
I had similar UUID troubles (which I document in the above post) when dealing with NFS as well, since vSphere 4.0 (again haven't tried 4.1 or 5.0 yet) assigned a UUID based on the IP address of the NFS cluster (of which there were many). So if I relied upon DNS to access my NFS (which seems common) I could not do things like vMotion, because host #1 may be using IP #1 and host #2 may be using IP #3, and vSphere would see those different IPs as different UUIDs (even though it is the same file system, same NAS cluster) and prevent vMotion from occurring.
Can you revise your post to include instructions on how to do the same thing with RDM ? Preferably something that is simple.
I filed support cases on both of these issues back in 2009 but got nowhere.
Maybe it will 'just work' in 4.1 and/or 5.0, though I'm not holding my breath based on what support told me back then.
thanks
Posted by: nate | 11/29/2011 at 04:59 PM
Hello Nate,
Thanks for the comment.
My best guess for this behaviour is that in vSphere 4.0 we moved away from using the archaic vmhbaC:T:L:P naming convention to using actual SCSI identifiers like the NAA id. The older method would allow you to remove a LUN and present a different LUN back, and have ESX 3.x and earlier think it was the same device. As you've now observed, this is no longer the case, with each snapshot getting allocated its own NAA id. This mechanism prevents original and snapshot copies of a VMFS being presented to the same ESX host at the same time and causing other problems.
The behaviour in 4.0 will be the same in 4.1 & 5.0, as we now use NAA ids to uniquely identify SCSI devices.
To work around this issue, could you not just remove the mapping file after unpresenting the old LUN/snapshot and recreate it when the new LUN/snapshot is presented? This way the mapping file will then have the correct reference (older mapping files used the vmhbaC:T:L:P which is why it worked).
Posted by: Chogan | 11/30/2011 at 03:38 AM
In vSphere 5.0, it is worthwhile to note that prior to an unmount using VC, it does an exhaustive check, and only when everything checks out "Green" will it give you an option to proceed with the unmount.
Big thanks to the team for making this an easier task rather than going thru hoops (masking and stuff as in ESX 4.x).
Posted by: Ravi | 01/20/2012 at 03:40 PM
Yes - good call Ravi. The VMware engineering teams have done a lot of good work in this area for sure.
Posted by: Chogan | 01/23/2012 at 02:12 AM