I have 3 hosts (version 5.0, build 1476327) that still show the datastores as present even though the backing device is now gone. Oddly, other hosts in the same cluster are fine. I'm mildly concerned about the hosts crashing due to an APD condition. Unfortunately, evacuating the hosts is probably not an option due to licensing. Can anyone advise on the best/safest course of action?
Ouch, and yes, you should be concerned. Even though they have done some substantial work on APD handling since ESXi 4.x, the risk is still there. Think about clusters and how you would react if kneecapped. Hopefully your hosts haven't zombied.
Check responsiveness by running commands via telnet or the console. If it's unresponsive, you may have to RDP into the guests on the hosts and shut them down, then cold boot them. Run an RVTools export of which VMs are where, or make a list, in case you lose some vmx files.
If they are responsive, vacate the host and bounce it.
I had my storage team yank some drives from one of my clusters without checking with me first recently, and I put together this little ditty from some good posts on the matter. It's a practice they will never repeat after the outage they caused.
I'd give credit where it's due on this one, but I never tracked the original posts. As always, thank you...
Best Practice: How to correctly remove a LUN from an ESX host
Yes, at first glance, you may be forgiven for thinking that this subject hardly warrants a blog post. But for those of you who have suffered the consequences of an All Paths Down (APD) condition, you'll know why this is so important.
Let's recap on what APD actually is.
APD is when there are no longer any active paths to a storage device from the ESX, yet the ESX continues to try to access that device. When hostd tries to open a disk device, a number of commands such as read capacity and read requests to validate the partition table are sent. If the device is in APD, these commands will be retried until they time out. The problem is that hostd is responsible for a number of other tasks as well, not just opening devices. One task is ESX to vCenter communication, and if hostd is blocked waiting for a device to open, it may not respond in a timely enough fashion to these other tasks. One consequence is that you might observe your ESX hosts disconnecting from vCenter.
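To make the "no active paths" condition concrete, here is a minimal sketch that scans text shaped like the output of `esxcli storage core path list` and reports devices where no path is active. The sample text, path names, and device IDs are made up for illustration, and the parser assumes each path block lists `Device:` before `State:`. Treat it as a rough aid for spotting candidates, not as a definitive APD check.

```python
from collections import defaultdict

# Illustrative sample only: shaped like `esxcli storage core path list`
# output, trimmed to the fields used here. All IDs are hypothetical.
SAMPLE = """\
fc.example-path-1
   Runtime Name: vmhba2:C0:T0:L1
   Device: naa.600000000000000000000000000000a1
   State: dead
fc.example-path-2
   Runtime Name: vmhba3:C0:T0:L1
   Device: naa.600000000000000000000000000000a1
   State: dead
fc.example-path-3
   Runtime Name: vmhba2:C0:T0:L2
   Device: naa.600000000000000000000000000000b2
   State: active
fc.example-path-4
   Runtime Name: vmhba3:C0:T0:L2
   Device: naa.600000000000000000000000000000b2
   State: dead
"""


def devices_without_active_paths(path_list_text):
    """Return sorted device IDs for which no path reports State: active."""
    states = defaultdict(list)
    device = None
    for raw in path_list_text.splitlines():
        line = raw.strip()
        if line.startswith("Device:"):
            # Remember which device the current path block belongs to.
            device = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and device is not None:
            states[device].append(line.split(":", 1)[1].strip())
            device = None
    return sorted(d for d, s in states.items() if "active" not in s)


if __name__ == "__main__":
    for dev in devices_without_active_paths(SAMPLE):
        print("No active paths:", dev)
```

In the sample, the first device has two dead paths and would be flagged, while the second still has one active path and would not.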
We have made a number of improvements to how we handle APD conditions over the last number of releases, but prevention is better than cure, so I wanted to use this post to highlight once again the best practices for removing a LUN from an ESX host and avoid APD:
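The steps from the original post are not reproduced in this thread. As a rough sketch only, the sequence commonly documented for ESXi 5.x is: unmount the datastore, detach the device, and rescan once the storage team has unpresented the LUN. The helper below just builds those `esxcli` command lines in order and executes nothing. The function name, datastore label, and device ID are hypothetical, and the exact steps should be checked against VMware's documentation for your build before use.

```python
def lun_removal_commands(datastore_label, device_id):
    """Build the ordered esxcli command lines; nothing is executed here."""
    return [
        # 1. Unmount the VMFS datastore (VMs already moved off or unregistered).
        ["esxcli", "storage", "filesystem", "unmount", "-l", datastore_label],
        # 2. Detach the backing device so the host stops issuing I/O to it.
        ["esxcli", "storage", "core", "device", "set",
         "--state=off", "-d", device_id],
        # 3. Only after the LUN is unpresented on the array: rescan adapters.
        ["esxcli", "storage", "core", "adapter", "rescan", "--all"],
    ]


if __name__ == "__main__":
    # Hypothetical label and device ID, for illustration only.
    for cmd in lun_removal_commands("old_datastore", "naa.hypothetical0001"):
        print(" ".join(cmd))
```

Keeping the steps as data rather than running them directly makes it easy to review the order with the storage team first, which is the whole point of the exercise.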
Just a quick thought. Where do you see these datastores? Maybe they show up because one or more of the VMs have active snapshots, where something (e.g. the CD-ROM) used files from that datastore at the time the snapshot was created.
I'm pretty sure the VMs were removed from the inventory prior to yanking the plug on the storage. It was a bit of a communication breakdown between me and the storage guy (more my fault tbh). I'm just wondering what the consequences would be of right-clicking the datastore and unmounting it as I usually would?
Is there a procedure for dealing with this sort of scenario? It's not ideal, I know. I know the normal procedure very well and have done it a thousand times. The trouble is that downtime is difficult, and so are vMotions, due to silly individuals purchasing the wrong licences.