A customer of mine has four ESXi hosts that are connected to a SAN and boot from it. One of these hosts had some hardware failures (HBA failures), and what I am about to describe still happens after replacing the faulty parts with new ones. The datastores do not show up on that host unless I add them manually with:
esxcfg-volume -M <the name of the datastore>
After adding the datastores everything works fine, but if the host is rebooted, all of the datastores are gone again.
What can we do instead of reinstalling ESXi?
The steps below might work for you.
Manually added the datastores again, then removed the VMs from inventory.
Re-add the datastores, specifying "Assign new signature". Takes a few seconds for each.
Re-add the VMs using Browse Datastore, adding the .vmx files.
Reboot. Datastores are there. YES.
Take a look at the above thread and see if it's any help. It is for version 5.5 and not marked as answered, but many people have contributed to that discussion.
But what I am facing is not fixed by rescanning. The devices are already present, but the datastores cannot be added unless I use the command I mentioned.
What you are facing is very similar to what they explain in the above KB, although that KB is not for version 6.0.
You mentioned that there was an HBA failure and that this started happening after the HBA was replaced.
I have a feeling that those LUNs are now being recognised as replicas or snapshots of the original LUNs.
You have already used the esxcfg-volume -M command to force-mount those datastores, so see what you get when you run the following command on that host:
esxcli storage vmfs snapshot list
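If that list does show your volumes, the two usual ways forward are to mount them with the existing signature or to resignature them. Here is a minimal sketch of both, assuming the volume label is the one reported by the list command. The function names and the ESXCLI variable are my own, added only so the commands can be dry-run with echo first; they are not part of esxcli.

```shell
# Command to invoke; override with ESXCLI=echo to print instead of run.
ESXCLI=${ESXCLI:-esxcli}

# Mount a volume that was detected as a snapshot, keeping its existing signature.
# $1 is the VMFS volume label shown by "esxcli storage vmfs snapshot list".
mount_keep_signature() {
    "$ESXCLI" storage vmfs snapshot mount -l "$1"
}

# Write a new signature to the volume instead of keeping the old one.
# This corresponds to "Assign new signature" in the client, so VMs on the
# volume have to be re-registered afterwards.
resignature_volume() {
    "$ESXCLI" storage vmfs snapshot resignature -l "$1"
}
```

For example, "ESXCLI=echo; mount_keep_signature datastore1" only prints the command line, which is handy when someone else is typing it on site. The label datastore1 is just a placeholder.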
I have an update, but I am facing a difficulty: the customer site is in another city and a junior engineer is doing the troubleshooting on my behalf, so I can't ask him to run all of the commands here. Let me first tell you what I have today:
Does this have anything to do with storage snapshots on the SAN? (I am not sure whether the customer is taking snapshots.) We get a warning message while adding the datastore, which indicates that there's something wrong with the "signature".
What do you think, guys?
I have found two log entries in vmkernel.log indicating that snapshot LUNs are detected:
2016-06-05T13:29:35.501Z cpu3:32987)LVM: 10060: Device eui.58c2323c1f94000c:1 detected to be a snapshot:
2016-06-05T13:29:35.519Z cpu3:32987)LVM: 10060: Device eui.58c2323c1f940006:1 detected to be a snapshot:
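In case it helps anyone checking for the same thing, here is a rough sketch for pulling the affected device IDs out of the log. It assumes the message format is exactly as in the two entries above; the function name snapshot_devices is made up for illustration.

```shell
# Print the unique device IDs that the LVM layer reported as snapshots.
# Reads the log file(s) given as arguments, or stdin if none are given.
snapshot_devices() {
    # Keep only the token between "Device " and " detected to be a snapshot".
    sed -n 's/.*LVM: [0-9]*: Device \([^ ]*\) detected to be a snapshot.*/\1/p' "$@" |
        sort -u
}
```

On the host this would be run as "snapshot_devices /var/log/vmkernel.log". Each device is listed once even if the message repeats after every rescan.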
I will ask the site engineer to run the commands for mounting the VMFS datastores that were detected as snapshots, but do you know why VMware, or let's say the hosts here, detect VMFS volumes as snapshots?
I responded to your message a week ago, but it seems that it wasn't delivered. Anyway, yes, I've solved the issue.
It's all about the ordering of LUNs and the LUN ID. When you assign a LUN to a server, it gets an ID on that server, so a shared LUN/datastore must have the same LUN ID on every host that shares it. In other words, if the shared LUN has ID "5" on one host, all of the other hosts that share that LUN must also see it with ID "5".
The LUN gets its ID when it is assigned to a server from the SAN side, so keep the LUN ID the same and your issue will be solved. It's all about the LUN signature: when the LUN ID is the same, the signature is the same; if the ID differs across the hosts, the signature will also differ, and you will have to format the LUN from one or more hosts to be able to access it.
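To check this without logging in to the array, the LUN IDs each host sees can be compared from saved command output. The sketch below assumes you have saved the output of "esxcli storage core path list" from each host into a text file, and that the output contains "Device:" and "LUN:" lines for every path. The function names and file names are hypothetical.

```shell
# Reduce saved "esxcli storage core path list" output to "device lun" pairs.
lun_map() {
    awk '$1 == "Device:" { dev = $2 }
         $1 == "LUN:" && dev != "" { print dev, $2; dev = "" }' "$1" | sort -u
}

# Print every device that both hosts see but under different LUN IDs.
# $1 and $2 are the saved outputs from the two hosts.
lun_mismatches() {
    { lun_map "$1" | sed 's/^/A /'; lun_map "$2" | sed 's/^/B /'; } |
        awk '$1 == "A" { a[$2] = $3 }
             $1 == "B" && ($2 in a) && a[$2] != $3 {
                 print $2 ": LUN " a[$2] " vs LUN " $3
             }'
}
```

Running "lun_mismatches host1.txt host2.txt" prints nothing when the two hosts agree. Any line it does print names a device whose LUN ID should be corrected on the SAN side.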