MRoushdy
Hot Shot

ESXi 6.0 loses datastores after rebooting

Hello,

A customer of mine has four ESXi hosts that are connected to a SAN and boot from it. One of these hosts had hardware failures (failed HBAs), and the problem described below still occurs after replacing the faulty parts with new ones:

  • On this host, all of the datastores attached from the SAN are detected only as devices, not as datastores, although they work fine on the other three hosts. If we try to add a datastore, the only available option is to format the device, so we used the following command to mount the datastores forcibly:

esxcfg-volume -M <the name of the datastore>

After adding the datastores everything works fine, but if the host is rebooted, all of the datastores are gone again.

What can we do other than re-installing ESXi?

Thanks,

vEXPERT - VCAP-DCV - Blog: arabitnetwork.com | YouTube: youtube.com/c/MohamedRoushdy
9 Replies
Paltelkalpesh
Enthusiast

The steps below might work for you.

Manually add the datastores again, then remove the VMs from the inventory.

Reboot.

Re-add the datastores, specifying "Assign new signature". This takes a few seconds for each one.

Re-add the VMs by browsing the datastore and adding each .vmx file.

Reboot. The datastores are there. YES.

npadmani
Virtuoso

Datastore not mounted after reboot of ESXi5.5

Have a look at the thread above and see whether it helps. It is for version 5.5 and is not marked as answered, but many people have contributed to that discussion.

Narendra Padmani VCIX6-DCV | VCIX7-CMA | VCI | TOGAF 9 Certified
Paltelkalpesh
Enthusiast

Did you check the logs while rebooting the host? They might help you trace the issue.

MRoushdy
Hot Shot

Thanks npadmani,

but what I am facing is not fixed by rescanning. The devices are already present, but they cannot be added unless I use the command I mentioned.

npadmani
Virtuoso

VMware KB: Persistent mount of a snapshot LUN may not persist across reboots in VMware ESXi 5.0/5.1/...

What you are facing is very similar to what the KB above describes, although that KB does not cover version 6.0.

You mentioned that there was an HBA failure and that this started happening after replacing it.

I have a feeling that those LUNs are now being recognised as replicas or snapshots of the original LUNs.

Since you have already used the esxcfg-volume -M command to force-mount those datastores, see what you get when you run the following command on that host:

esxcli storage vmfs snapshot list
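
If the output has to be relayed by someone else on site, it may help to reduce it to the few fields that matter. Below is a minimal Python sketch that groups the indented "Key: Value" lines under the unindented line above them. It assumes the output is laid out that way (one unindented identifier line per volume, then indented fields); the sample text, UUID and volume label are made up for illustration and are not taken from this environment.

```python
def parse_snapshot_list(text):
    """Group indented 'Key: Value' lines under the unindented line above them."""
    volumes = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if not line[0].isspace():
            # An unindented line is assumed to start a new volume record.
            current = {"id": line.strip()}
            volumes.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return volumes


# Illustrative sample only: the layout is an assumption, the values are invented.
SAMPLE = """\
00000000-00000000-0000-000000000000
   Volume Name: shared_ds_01
   VMFS UUID: 00000000-00000000-0000-000000000000
   Can mount: true
   Reason for un-mountability:
   Can resignature: true
   Reason for non-resignaturability:
   Unresolved Extent Count: 1
"""

for vol in parse_snapshot_list(SAMPLE):
    print(vol["Volume Name"], "mountable:", vol["Can mount"])
```

Any volume that shows up in this list is one the host currently treats as a snapshot or replica, which is the thing to confirm first.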

MRoushdy
Hot Shot

Hello everyone,

I have an update, but there is a difficulty: the customer site is in another city and a junior engineer is doing the troubleshooting on my behalf, so I can't ask him to run all of the commands suggested here. Let me first tell you what I have today:

  1. The engineer couldn't attach and mount the volumes.
  2. The engineer re-installed ESXi on a new boot LUN.
  3. He remounted all of the shared datastores on the server.
  4. The issue remains!
  5. One more thing: he attached the same shared datastores to an older IBM server running ESXi 5.5. When adding the datastores from the GUI, that server asked to format the datastores, which means that there's something wrong with the LUNs.

Does this have anything to do with storage snapshots on the SAN? (I am not sure whether the customer is taking snapshots.) We get a warning message while adding the datastore, which indicates that there's something wrong with the "signature".

What do you think, guys?

MRoushdy
Hot Shot

Hello again,

I have found two entries in vmkernel.log indicating that snapshot LUNs are detected:

2016-06-05T13:29:35.501Z cpu3:32987)LVM: 10060: Device eui.58c2323c1f94000c:1 detected to be a snapshot:

2016-06-05T13:29:35.519Z cpu3:32987)LVM: 10060: Device eui.58c2323c1f940006:1 detected to be a snapshot:

I will ask the site engineer to run the commands for mounting the VMFS datastores that are detected as snapshots, but do you know why VMware (or rather, the hosts here) detects VMFS volumes as snapshots?
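
For anyone who wants to pull these entries out of a longer log, here is a minimal Python sketch. The regular expression is written against the two lines quoted above only, so treat the pattern as an assumption rather than a description of every LVM message format; the number after the colon is taken to be the partition.

```python
import re

# Pattern derived from the two log lines quoted in this post (an assumption,
# not an exhaustive description of vmkernel LVM messages).
SNAPSHOT_RE = re.compile(r"LVM: \d+: Device (\S+):(\d+) detected to be a snapshot")


def snapshot_devices(log_text):
    """Return sorted, de-duplicated (device, partition) pairs flagged as snapshots."""
    found = set()
    for line in log_text.splitlines():
        match = SNAPSHOT_RE.search(line)
        if match:
            found.add((match.group(1), int(match.group(2))))
    return sorted(found)


LOG = """\
2016-06-05T13:29:35.501Z cpu3:32987)LVM: 10060: Device eui.58c2323c1f94000c:1 detected to be a snapshot:
2016-06-05T13:29:35.519Z cpu3:32987)LVM: 10060: Device eui.58c2323c1f940006:1 detected to be a snapshot:
"""

for device, partition in snapshot_devices(LOG):
    print(device, "partition", partition)
```

The device identifiers it returns are the ones to compare against the device list on the hosts where the datastores mount normally.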

zangirolami
Contributor

Hello friend, did you manage to solve your problem in ESXi 6.0? The same thing happened to me yesterday.


MRoushdy
Hot Shot

Hello Zangirolami,

I responded to your message a week ago, but it seems that it wasn't delivered. Anyway, yes, I've solved the issue.

It's all about the ordering of the LUNs and the LUN ID. When you assign a LUN to a server, it gets an ID on that server. A shared LUN/datastore must have the same LUN ID on all hosts sharing it: if the shared LUN has ID "5" on one host, all of the other hosts that share that LUN must also see it with ID "5".

The LUN gets its ID when it is assigned to a server from the SAN side, so keep the LUN ID the same and your issue will be solved. It's all about the LUN signature: when the LUN ID is the same, the signature is the same. If the ID differs across the hosts, the signature will also differ, and one or more hosts will ask to format the LUN before they can access it.
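
To make the check concrete, here is a minimal Python sketch that flags any shared device whose LUN ID is not identical on every host. The host names and LUN IDs in the inventory are hypothetical; in practice you would fill them in by hand from each host (for example from the LUN field that "esxcli storage core path list" prints per path) or from the SAN's host mapping view.

```python
from collections import defaultdict


def inconsistent_luns(host_luns):
    """Return {device: {host: lun_id}} for devices whose LUN ID differs between hosts."""
    by_device = defaultdict(dict)
    for host, luns in host_luns.items():
        for device, lun_id in luns.items():
            by_device[device][host] = lun_id
    # Keep only devices that are presented with more than one distinct LUN ID.
    return {
        device: hosts
        for device, hosts in by_device.items()
        if len(set(hosts.values())) > 1
    }


# Hypothetical inventory: host names and LUN IDs are invented for illustration.
# The device names are the two reported in vmkernel.log earlier in this thread.
HOST_LUNS = {
    "esxi-01": {"eui.58c2323c1f94000c": 5, "eui.58c2323c1f940006": 6},
    "esxi-02": {"eui.58c2323c1f94000c": 5, "eui.58c2323c1f940006": 6},
    "esxi-03": {"eui.58c2323c1f94000c": 5, "eui.58c2323c1f940006": 6},
    "esxi-04": {"eui.58c2323c1f94000c": 7, "eui.58c2323c1f940006": 6},
}

for device, hosts in inconsistent_luns(HOST_LUNS).items():
    print(device, "has mismatched LUN IDs:", hosts)
```

In this made-up inventory only the first device is reported, because the fourth host sees it as LUN 7 while the others see it as LUN 5. The fix is then made on the SAN side by presenting it with the same ID to every host.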
