VMware Cloud Community
HarisB
Contributor

Problem with VMFS storage

Hi all,

I have a test server with a RAID 10 disk config and 5-6 VMs on it, using local storage on a 6-port SATA controller with 4 SATA drives. A few days ago one of the disks developed some bad blocks, so I pulled it out and ran the server for a couple of days on 3 disks, eventually replacing the failed drive with another SATA drive. For some reason the controller didn't like the new drive. It should start the rebuild process automatically, but that didn't happen, and I suspect it's because the replacement was slightly smaller than the original drives (they're all nominally 80 GB, but real capacity can be a hundred MB or so less, and the array can't be rebuilt if there isn't enough space). I've seen this exact same controller (Dell 2610SA) rebuild RAID 10 arrays before just fine.

Then I abandoned that, pulled the replacement drive out, and started the server, which got to the "freeing unused kernel memory" line of startup and hung there. I tried rebooting a few times, same thing. At this point (as before) the RAID status is "degraded", but the array seems to be working to some extent, since it at least tries to boot.

So I got a bigger drive today and connected it, and the controller didn't like that one either. I tried a few things, nothing helped.

I then decided to leave the RAID in its degraded state, reinstall ESX without touching the VMFS partition, get it up and running, move the VMs off to another server, and then torch the whole thing and reinitialize the array to get it back into shape. This is where it got interesting, and where I am stuck now. During the reinstall I got several messages stating "partition signature is invalid" for the sda disk, from /tmp/sda to some other folder. These gave me the option to ignore, which I did, and after going through the partition menu I checked "leave VMFS partition intact", formatted the others, and everything went fine. I don't remember whether I got a message for the VMFS partition or not.

After the setup I connected to the box just fine using VC, but now I get "The VMware ESX Server does not have persistent storage" under the configuration tab. I do, however, see the "ESX4:storage1" VMFS partition I created a long time ago, where my VMs used to be. The capacity, used, and free values are correct, nothing out of the ordinary. It looks just fine. However, when I browse the datastore, I can see only 3 files (vpxa-..), and no trace of the folders where my VMs used to be. If I go to the console and into /vmfs/volumes/, there is nothing there. On the configuration tab, "Location" shows "/vmfs/volumes/47b2..." where that partition used to be.

So my question is this: I suspect the signature on the VMFS partition got screwed up somehow, and I would like to know if there is a way to restore it. The 5 VMs I have there are not critical, but it would take me a couple of days to rebuild them from scratch.
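In case it helps, here is what I was planning to check from the service console, assuming the degraded array still shows up as /dev/sda (commands from memory, so double-check the syntax on your build):

```shell
# Check whether the VMFS partition entry itself survived the reinstall;
# a VMFS partition should be listed with partition Id "fb"
fdisk -l /dev/sda

# See whether the volume is mounted at all
ls -l /vmfs/volumes/

# vdf is the service-console df wrapper that also understands /vmfs mounts
vdf -h
```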

Any suggestions and advices welcome.

Thanks

Haris

5 Replies
ronmanu07
Enthusiast

Hi HarisB,

I had the same problem a couple of months back, and I think I tried to clone the affected VM I had, and it worked... I think!

But I cannot remember clearly whether that worked or not. If it doesn't, then from my experience you will have to rebuild these VMs.

Good luck!

HarisB
Contributor

That won't work; my VMs show as "Orphaned" and there are only 3 files in the datastore, nothing related to the VMs.

Rumple
Virtuoso

In the advanced settings of VC you will see an option to resignature LUNs. I would give that a shot.
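If you prefer the service console, the same setting can be toggled with esxcfg-advcfg; roughly like this on ESX 3.x (option path from memory, so verify it on your build):

```shell
# Enable volume resignaturing (the LVM.EnableResignature advanced option)
esxcfg-advcfg -s 1 /LVM/EnableResignature

# Verify it now reads 1
esxcfg-advcfg -g /LVM/EnableResignature

# Rescan VMFS volumes so a resignatured volume gets picked up
vmkfstools -V

# Set it back to 0 afterwards so future rescans don't resignature anything else
esxcfg-advcfg -s 0 /LVM/EnableResignature
```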

HarisB
Contributor

Hi,

Found the option under VC Host configuration > Advanced Settings > LVM > LVM.EnableResignature and changed it to 1. Nothing changed from what I can see; I tried rescanning storage a few times, and when browsing the datastore afterwards I can still see the same 3 files.

After rebooting the server the problem is still there, but now I see 6 files; all are vpxa*.log plus one vpxa-index. Under path they show []/var/log/vmware/vpx.

One interesting thing: if I try to "Add Extent" on my VMFS I get the error message "The request refers to an object that no longer exists or has never existed", and then it proceeds to the screen where you are supposed to specify the extent. On another host that works correctly I don't get that message, and the config is almost identical.
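From the service console I also tried to map out what the host actually sees (if anyone wants to compare; vmhba0 is just my guess for the local controller):

```shell
# Map vmhba paths to service-console devices and VMFS volume UUIDs
esxcfg-vmhbadevs -m

# List all detected disks/LUNs
esxcfg-vmhbadevs

# Rescan the adapter the local array hangs off
esxcfg-rescan vmhba0
```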

Thanks

sgilstrap
Contributor

I hope you have resolved this by now. My case doesn't seem to be entirely related to what caused your problem, but in the interest of helping others, here is my contribution:

We had the same symptoms after upgrading ESX from 3.0.1 to 3.5.

"The VMware ESX Server does not have persistent storage" message appeared at the top of the configuration tab. I noticed that the iSCSI LUNs still showed up under the storage section, but their device information was wrong - they pointed to device IDs from a different ESX host's LUN. When working in the storage adapters screen, the targets would not show up, nor would they discover. Likewise, I could not view or edit the LUN properties, I would be met by errors indicating the path was bad. The funny part is I found a defect in the VI Client that had me fooled for a bit. When referring to another host's configuration tab, I noticed that if I had the iSCSI software adapter highlighted when clicking back to the defective host, the previous target details (from the good host) still remained on the screen. I have a suspicion that there is a software defect that caused the device IDs to get translated over to the haywire host. It couldn't be coincidence that the device IDs were from the particular host that I looked at for reference.

In the interest of removing the erroneous device ID information without destroying the machines, our first step was to remove (not disconnect, actually remove) the host from Virtual Center and log into it directly by pointing the VI Client at the server and logging in as root. We disabled the iSCSI software adapter initiator via the adapter properties (this requires a reboot of the ESX server). We then stripped the initiator from the LUN as well as unmapping it. Reversing the process as though creating the connection from scratch should have done the trick, but we found that the iSCSI software client port was disabled under the firewall properties (in the security profile section of the configuration tab). Who knows why the port was disabled, but after turning it on we were able to reach targets through the initiator. Once the targets discovered, we simply re-added the host to the VC.
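For anyone doing the same from the service console, the steps looked roughly like this (vmhba40 is the usual name of the software iSCSI adapter on 3.5, so adjust as needed; commands from memory):

```shell
# Open the software iSCSI client port in the ESX firewall
# (this was the part that was mysteriously disabled in our case)
esxcfg-firewall -e swISCSIClient

# Re-enable the software iSCSI initiator
esxcfg-swiscsi -e

# Rescan the software iSCSI adapter so targets and LUNs are rediscovered
esxcfg-rescan vmhba40
```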
