I am truly hoping that someone can help us unravel the mystery my group is currently trying to resolve. It doesn't make much sense, but I will do my best to explain in succinct terms:
Late in the afternoon, across a span of three hours, three successive virtual machines reported that they had lost their underlying disks; the console message prompted us to mount an ISO file to install an operating system, as if the disks were blank. Three hours later, once the support team had begun triage, one of our engineers performed a rescan of the HBAs, and vCenter reported that it was removing a number of datastores. My interpretation is that vCenter believed it no longer had access to the underlying LUNs and therefore removed the datastores. At some point in the troubleshooting we could see that the underlying LUNs were still presented to the ESX hosts (4.0), but even more interestingly, when navigating to "Add Storage", the LUNs appeared as unformatted disks ready to be added as new datastores. I should mention here that our back-end storage array is an HP XP series model.
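For reference, the checks we ran from the ESX 4.0 service console looked roughly like the following; the HBA name and datastore name are placeholders for your own:

    # Rescan one HBA for added/removed LUNs (this is the step that
    # triggered vCenter's datastore removal in our case)
    esxcfg-rescan vmhba1

    # List the SCSI devices the host can see -- this confirmed the
    # LUNs were still presented from the array
    esxcfg-scsidevs -c

    # Show VMFS volume-to-device mappings -- the affected datastores
    # were missing from this output
    esxcfg-scsidevs -m

    # Query the attributes of a surviving VMFS volume for comparison
    vmkfstools -P /vmfs/volumes/<datastore-name>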
We engaged VMware support and HP storage support for hours upon hours, but no resolution was found. VMware did a data dump of one of the LUNs that had held all of the OS VMDKs of the affected VMs and saw that zeros had been written to the blocks.
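If anyone wants to verify something similar, the start of a LUN can be dumped straight from the service console. Something along these lines should work, assuming standard GNU dd/od in the service console and your devices appearing under /vmfs/devices/disks/ (the naa identifier is a placeholder):

    # Dump the first 2 MB of the raw device and inspect it. od collapses
    # runs of identical bytes into a "*", so a fully zeroed region is
    # obvious at a glance. A healthy VMFS LUN would show an MBR partition
    # table in sector 0 and VMFS metadata further in.
    dd if=/vmfs/devices/disks/naa.60060e80xxxxxxxx bs=1M count=2 2>/dev/null | od -A x -t x1z | less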
We are struggling to determine the cause of what happened. If anyone has any ideas, I would be extraordinarily grateful for your feedback.
Check your SAN zoning and LUN masking to see whether any hosts that shouldn't have access to those LUNs do. For example, some versions of Windows will write a signature to disks they don't recognize, and that has been known to kill VMFS datastores. They usually ask first, but I wouldn't trust it.
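A rough way to check for that from the ESX side is to dump sector 0 of the LUN and look at the MBR fields (device name is a placeholder):

    # Dump the MBR (first 512 bytes) of the LUN
    dd if=/vmfs/devices/disks/naa.60060e80xxxxxxxx bs=512 count=1 2>/dev/null | od -A x -t x1z

    # What to look for in the output:
    #   offset 0x1b8-0x1bb  disk signature -- Windows writes a non-zero
    #                       value here when it "initializes" a disk
    #   offset 0x1c2        partition type of the first partition -- a
    #                       VMFS LUN should show 0xfb
    #   offset 0x1fe-0x1ff  the 55 aa boot signature

That said, a Windows signature alone wouldn't zero whole blocks, so if your dump really is all zeros end to end, something wrote over far more than the MBR.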