VMware Cloud Community
lcinfrastructur
Contributor

Cannot delete/unmount datastore from one of ESXi hosts

Hi team,

We have an environment with the following configuration:

4 ESXi 5.5 U2 hosts (Standard license)

Dell EqualLogic iSCSI datastores

We are trying to delete one of our datastores. It has been successfully removed from 3 of the hosts, but one of them has lost communication with the datastore and will not let us delete the pointer to it:

Call "HostDatastoreSystem.RemoveDatastore" for object "datastoreSystem-49" on vCenter Server "VCENTER" failed.

Operation failed, diagnostics report: Unable to query live VMFS state of volume.: No such file or directory

- Storage I/O Control is not enabled

- It's not being used for datastore heartbeating

- Completely empty

- An HBA rescan doesn't help

- Unmount/delete do not work from either vCenter or the host

- Restarting the management agents doesn't help

- The syslog path points to a correct, active location for the ESXi host

- Can't use partitioning tools, as there is no VMFS disk on the ESXi side, as per the commands below
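
For reference, the rescan and management agent restart mentioned above were attempted with the usual ESXi 5.5 CLI calls, along these lines (a sketch only):

~ # esxcli storage core adapter rescan --all      (rescan all HBAs on the host)
~ # /etc/init.d/hostd restart                     (restart the management agents)
~ # /etc/init.d/vpxa restart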

When querying the ESXi host directly, we do not even get the /vmfs path:

~ # esxcfg-scsidevs -m

naa.603be83f7d6eaba6fb5e35baa20500e7:1                            :1                                                        55efba86-7d62f6d6-703e-549f35a01922  0  DSNAME

~ # ls /vmfs/devices/disks/

ls: /vmfs/devices/disks/naa.603be83f7d6eaba6fb5e35baa20500e7:1: No such file or directory

ls: /vmfs/devices/disks/naa.603be83f7d6eaba6fb5e35baa20500e7: No such file or directory
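
For completeness, the state of the underlying device on the affected host can be checked roughly like this (a sketch; on a LUN in PDL the device's Status field typically shows it as dead/off):

~ # esxcli storage core device list -d naa.603be83f7d6eaba6fb5e35baa20500e7
~ # esxcli storage filesystem list                (the volume shows as unmounted/inaccessible here, if it is listed at all)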

Any ideas on how to solve this other than rebooting? With only 4 hosts in our environment, putting one into maintenance mode and rebooting it has a high impact.

Thanks

Neus

6 Replies
a_p_
Leadership

Please clarify what exactly you did. "Deleting" a datastore will delete the VMFS partition on the LUN, and it can only be done once, because the other hosts use the same LUN. What you would usually do is either delete the datastore on one host (or unmount it from all hosts), then detach the LUN on all hosts, and finally unpresent/delete the LUN on the storage system. After that you may, if you want, clean up the detached LUNs from the command line (see e.g. http://kb.vmware.com/kb/2004605).
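
For reference, the detach/cleanup part of that workflow looks roughly like this from the ESXi shell on each host (a sketch only, using the device name and datastore label from the original post; these are the standard esxcli calls referenced in the KB):

~ # esxcli storage filesystem unmount -l DSNAME                                            (unmount by label; -u takes the UUID instead)
~ # esxcli storage core device set --state=off -d naa.603be83f7d6eaba6fb5e35baa20500e7     (detach the LUN)
~ # esxcli storage core device detached list                                               (confirm it now shows as detached)
~ # esxcli storage core device detached remove -d naa.603be83f7d6eaba6fb5e35baa20500e7     (remove the stale detached entry once the LUN is unpresented on the array)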

With only 4 hosts in our system, setting one in maintenance mode and rebooting is highly impacting

This is not related to the issue, but if that is the case, I highly recommend that you consider adding resources (e.g. additional hosts, more RAM, ...) to mitigate that risk in case of e.g. a host failure.

André

lcinfrastructur
Contributor

Ok, sorry for not being clear enough. When I say "delete" I mean the "Delete" option in vSphere (right-click on the datastore, Delete) after unmounting from all hosts.

ESX2 lost connectivity to the datastore; ESX1, ESX3 and ESX4 didn't. ESX1, 3 and 4 unmounted and detached the LUN correctly. ESX2 threw the error I pasted in my previous post.

I unmounted and deleted the LUN from all hosts. It disappeared from all of them except ESX2 because, again, it cannot access the path.

In theory PDL in 5.5 is handled automatically and the storage should be removed from the environment, but with ESX2 this is not the case.

Image attached for reference.

Regards,

Neus

soulman_yu
Contributor

Just a silly question, or two

Did you follow the steps in the KB André sent you?

Do you maybe have some VMs with snapshots? Sometimes, when you take a snapshot, the VM keeps information about the datastore it used to be on; then you do a Storage vMotion and forget about it, but when you try to remove that datastore, it will not let you, even when you have moved all VMs off it (if you follow the steps in the above-mentioned KB). You have to remove all snapshots before that datastore is released and can be removed.

So, if you do not have snapshots and did follow the steps, then disregard my questions.
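
In case it helps, a quick way to check for leftover snapshots from the ESXi shell (a sketch; run it on a host that still sees the volume, and <vmid> stands for the ID returned by the first command, DSNAME for the datastore label):

~ # vim-cmd vmsvc/getallvms                           (list registered VMs and their IDs)
~ # vim-cmd vmsvc/snapshot.get <vmid>                 (show snapshots for a given VM)
~ # find /vmfs/volumes/DSNAME -name "*-delta.vmdk"    (look for snapshot delta files left on the datastore)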

Thanks

S

adrianych
Enthusiast

I think I have faced this issue before, as I am also using Dell EQL connected to 6 ESXi hosts.

But logically, you do not have to remove or delete a datastore more than once. vCenter will sync mounted datastores (unless it's a local HDD on the physical ESXi host).

There are a few main scenarios where the delete fails:

- The vCenter database fails to update the changes, or updates slowly

- The LUN was removed from the Dell EQL prematurely (deleted from one ESXi host, then set offline in the Dell EQL)

You can try the following (please choose one at a time, and refrain from performing other vCenter-related tasks while these are carried out):

i. Use a Domain, Data Center or Cluster level "rescan for datastores" to try to sync all of the ESXi hosts (a per-host CLI equivalent is sketched after the KB links below).

ii. Remount the LUN (if it was only set offline) and do a Domain, Data Center or Cluster level "rescan for datastores". Then delete or unmount from one ESXi host and run the Domain, Data Center or Cluster level "rescan for datastores" again.

iii. Purge and clean up your vCenter database (there was also a SQL script to run, but I lost the link; also temporarily reduce log retention to 1 or 3 days to reduce the size), then run item i.

VMware KB: Purging old data from the database used by VMware vCenter Server

VMware KB: Reducing the size of the vCenter Server database when the rollup scripts take a long time...
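
For item i, a per-host equivalent from the ESXi shell is roughly the following (a sketch; vmkfstools -V forces a re-probe of the VMFS volumes and prints nothing on success):

~ # esxcli storage core adapter rescan --all      (rescan all HBAs on this host)
~ # vmkfstools -V                                 (refresh the VMFS volume list)
~ # esxcfg-scsidevs -m                            (verify which VMFS volumes the host now sees)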

adrianych
Enthusiast

Sometimes, due to the complexity of the multipathing or round-robin "multi-channel" setup we like to use, there may be delays.

The best practice for Dell EQL, which I have followed, is to delete or unmount from only one host, then use vCenter to rescan datastores at the Cluster level and then at the Data Center level. This should propagate to the other hosts.

lcinfrastructur
Contributor

If you read my initial post you will see there is no VMFS mount point; it fails when trying to access it, so the KB cannot be applied. There was no detach option in my scenario, nor anything in the KB that could be of any help. Also, the LUN has not been removed from the storage side yet, so that is not the case either.

Anyway, in case it's useful to somebody, I have solved it by emptying and then unmounting a different datastore. This seems to have triggered some kind of refresh in processes that do not get refreshed by a management agent restart, and it has finally cleared up / deleted the stale storage from the troublesome host.
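
For anyone hitting the same thing, the commands from my first post can be rerun to confirm the stale volume is really gone after such a refresh (a sketch):

~ # esxcfg-scsidevs -m                  (the stale DSNAME line should no longer appear)
~ # esxcli storage filesystem list      (the volume should no longer be listed or mounted)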

Thanks for your help. The info on the round-robin policy has been very useful; I have indeed experienced this slowness in the past. Good to know!

Neus
