RAID 1 Recovered: Can no longer mount VMFS (cannot see datastore2)

oconshaw · 02-02-2011

ESXi 4.0: After a power outage, my RAID 1 disk array failed to come up properly for datastore2. Re-initializing the array with the Adaptec RAID controller's "Quick Init" procedure (which rebuilds the partition on the drives and leaves the data partition alone) seemed to resolve the failure and let me bring the VMSERVER back online. However, I can't "see" the 600GB datastore volume (datastore2) that is supposed to be there.

I've tried the procedures here:

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1002281&sl...

here:

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1020140&sl...

and here:

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1011387

I've also read through more posts than I can count trying to resolve this. The procedure for recovering the lost partition described in the first link above seems to work and did make the partition table "viewable" from the command line, but the last step, running vmkfstools -V to rediscover the VMFS, didn't help. The vSphere Client still doesn't see the volume, and from the command line df doesn't show the partition. It's as if the partition is there, but the OS can't mount it.

The device I'm trying to get working is: /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0

I'm at my wits' end on this. I've rescanned, rebooted, and tried to mount the volume in the vSphere Client, but the only option offered warns that continuing will delete the current disk layout and lose all data.

** I hope one of you can help to get this working so I can at least mount the volume and get my data. **

Also, one thing I did notice -- in following this doc: http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1011387

I was trying to mount and resignature the volume from the command line. It seems like it should work, but esxcfg-volume -l shows nothing. And if I wanted to force-mount, I can't, because I don't know the VMFS UUID or label to give it. So maybe I need to somehow relabel the volume? I don't know; I'm just grasping at straws at this point.

Thanks in advance for reading this long message and for any help you can provide.

It's almost there. The output of the following commands may help someone more skilled to pinpoint the problem:

~ # fdisk -l

Disk /dev/disks/mpx.vmhba1:C0:T1:L0: 750.0 GB, 750035927040 bytes
255 heads, 63 sectors/track, 91186 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

                          Device Boot      Start         End      Blocks Id System
/dev/disks/mpx.vmhba1:C0:T1:L0p1             1     91186  732451481  fb VMFS

Disk /dev/disks/mpx.vmhba1:C0:T0:L0: 146.5 GB, 146590924800 bytes
64 heads, 32 sectors/track, 139800 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

                          Device Boot      Start         End      Blocks Id System
/dev/disks/mpx.vmhba1:C0:T0:L0p1             5       900    917504    5 Extended
/dev/disks/mpx.vmhba1:C0:T0:L0p2           901      4995   4193280    6 FAT16
/dev/disks/mpx.vmhba1:C0:T0:L0p3          4996    139800 138040320   fb VMFS
/dev/disks/mpx.vmhba1:C0:T0:L0p4   *         1         4      4080    4 FAT16 <32M
/dev/disks/mpx.vmhba1:C0:T0:L0p5             5       254    255984    6 FAT16
/dev/disks/mpx.vmhba1:C0:T0:L0p6           255       504    255984    6 FAT16
/dev/disks/mpx.vmhba1:C0:T0:L0p7           505       614    112624   fc VMKcore
/dev/disks/mpx.vmhba1:C0:T0:L0p8           615       900    292848    6 FAT16

Partition table entries are not in disk order
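Because fdisk reports in cylinder units here, the exact start of the rebuilt VMFS partition on mpx.vmhba1:C0:T1:L0 isn't visible directly, but it can be worked out from the geometry line and the Blocks column. A minimal sketch of that arithmetic, assuming the partition ends on the last sector of its End cylinder and that Blocks counts 1 KiB units (two 512-byte sectors; a trailing "+" would mean one extra sector). The helper name partition_sectors is hypothetical.

```python
# Geometry from "255 heads, 63 sectors/track" in the fdisk -l output above.
SECTOR_SIZE = 512
SECTORS_PER_CYLINDER = 255 * 63  # 16065 sectors = 8225280 bytes per cylinder


def partition_sectors(end_cyl: int, blocks_kib: int, plus: bool = False) -> tuple[int, int]:
    """Return (first_sector, last_sector) for an fdisk row shown in cylinder units.

    ASSUMPTION: the partition ends on the last sector of its (1-based) End cylinder.
    Blocks are 1 KiB, i.e. two 512-byte sectors; a trailing "+" in the Blocks
    column means one extra sector, which the plus flag accounts for.
    """
    last = end_cyl * SECTORS_PER_CYLINDER - 1
    count = blocks_kib * 1024 // SECTOR_SIZE + (1 if plus else 0)
    first = last - count + 1
    return first, last


# Partition 1 on mpx.vmhba1:C0:T1:L0: End 91186, Blocks 732451481 (no "+").
print(partition_sectors(91186, 732451481))  # (128, 1464903089)
```

Under those assumptions the rebuilt partition spans sectors 128 through 1464903089, i.e. it starts at the 128-sector (64 KiB) offset ESX normally uses for VMFS partitions, which fdisk still displays as cylinder 1.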

~ # ls /vmfs/volumes
4a528785-626bfbe1-da52-00145e5a5a6b Hypervisor2                          datastore1
4a528785-a09b1431-e90c-00145e5a5a6b Hypervisor3                          f4af117d-e768366c-60a8-44390c9dec9b
Hypervisor1                          c2a427e4-2d317086-fef9-b5750d88536c f5ba2674-b560f63d-48a3-ef125fce46e0

~ # ls /vmfs/devices/lvm/
4a528785-6f27cf74-4d9e-00145e5a5a6b

~ # ls /vmfs/devices/disks/
mpx.vmhba1:C0:T0:L0                   mpx.vmhba1:C0:T0:L0:8                 vml.0000000000766d686261313a303a30:5
mpx.vmhba1:C0:T0:L0:1                 mpx.vmhba1:C0:T1:L0                   vml.0000000000766d686261313a303a30:6
mpx.vmhba1:C0:T0:L0:2                 mpx.vmhba1:C0:T1:L0:1                 vml.0000000000766d686261313a303a30:7
mpx.vmhba1:C0:T0:L0:3                 vml.0000000000766d686261313a303a30    vml.0000000000766d686261313a303a30:8
mpx.vmhba1:C0:T0:L0:4                 vml.0000000000766d686261313a303a30:1 vml.0000000000766d686261313a313a30
mpx.vmhba1:C0:T0:L0:5                 vml.0000000000766d686261313a303a30:2 vml.0000000000766d686261313a313a30:1
mpx.vmhba1:C0:T0:L0:6                 vml.0000000000766d686261313a303a30:3
mpx.vmhba1:C0:T0:L0:7                 vml.0000000000766d686261313a303a30:4

~ # esxcfg-scsidevs -c
Device UID           Device Type      Console Device                              Size      Plugin Display Name
mpx.vmhba1:C0:T0:L0 Direct-Access    /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0     139800MB NMP     Local ServeRA Disk (mpx.vmhba1:C0:T0:L0)
mpx.vmhba1:C0:T1:L0 Direct-Access    /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0     715290MB NMP     Local ServeRA Disk (mpx.vmhba1:C0:T1:L0)
naa.5005076a02148d2d Enclosure Svc Dev/vmfs/devices/genscsi/naa.5005076a02148d2d 0MB       NMP     Local IBM-ESXS Enclosure Svc Dev (naa.5005076a02148d2d)

~ # esxcfg-scsidevs -a
vmhba0 pata_serverworks link-n/a ide.vmhba0                              (0:8.1) ServerWorks Serverworks HT1000 IDE/PATA Controller
vmhba1 aacraid           link-n/a pscsi.vmhba1                            (5:0.0) Adaptec ServeRAID 8k/8k-l8
vmhba2 sata_svw          link-n/a sata.vmhba2                             (21:14.0) ServerWorks BCM5785 [HT1000] SATA (Native SATA Mode)
vmhba32 pata_serverworks link-n/a ide.vmhba32                             (0:8.1) ServerWorks Serverworks HT1000 IDE/PATA Controller
vmhba33 sata_svw          link-n/a sata.vmhba33                            (21:14.0) ServerWorks BCM5785 [HT1000] SATA (Native SATA Mode)
vmhba34 sata_svw          link-n/a sata.vmhba34                            (21:14.0) ServerWorks BCM5785 [HT1000] SATA (Native SATA Mode)
vmhba35 sata_svw          link-n/a sata.vmhba35                            (21:14.0) ServerWorks BCM5785 [HT1000] SATA (Native SATA Mode)

~ # esxcfg-scsidevs -u
Primary UID                                                     Other UID
mpx.vmhba1:C0:T0:L0                                             vml.0000000000766d686261313a303a30
mpx.vmhba1:C0:T1:L0                                             vml.0000000000766d686261313a313a30
naa.5005076a02148d2d                                            vml.020d0000005005076a02148d2d565343373136
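The sizes in the fdisk -l and esxcfg-scsidevs -c listings can be cross-checked to confirm that the controller still presents the whole logical drive after the Quick Init. A short sketch, assuming esxcfg-scsidevs reports "MB" as mebibytes (2^20 bytes); the dictionaries just hold numbers copied from the output above.

```python
# Sizes copied from the fdisk -l and esxcfg-scsidevs -c output above.
FDISK_BYTES = {
    "mpx.vmhba1:C0:T0:L0": 146590924800,
    "mpx.vmhba1:C0:T1:L0": 750035927040,
}
SCSIDEVS_MB = {
    "mpx.vmhba1:C0:T0:L0": 139800,
    "mpx.vmhba1:C0:T1:L0": 715290,
}

MIB = 1024 * 1024  # ASSUMPTION: esxcfg-scsidevs "MB" means 2**20 bytes


def sizes_agree(device: str) -> bool:
    """True if the fdisk byte count equals the esxcfg-scsidevs size exactly."""
    return FDISK_BYTES[device] == SCSIDEVS_MB[device] * MIB


for dev in sorted(FDISK_BYTES):
    print(dev, FDISK_BYTES[dev] // MIB, sizes_agree(dev))
```

Both devices match exactly (139800 MB and 715290 MB), which suggests the logical drive itself came through the Quick Init at full size and the problem lies in the partition table or the VMFS metadata rather than in the array.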

oconshaw · 02-04-2011

Any ideas?

DSTAVERT · 02-04-2011

You might want to see if there is anything helpful at http://sanbarrow.com/sickbay.html

-- David -- VMware Communities Moderator

oconshaw · 02-04-2011

Thanks for the suggestion. I'm looking at that resource, but I don't see anything that looks like it will help with my particular problem. I'll keep looking. If you or anyone else has any further suggestions I would be very grateful. --Thanks.

DSTAVERT · 02-04-2011

I would suggest a VMware support call. Anything you try at this point risks destroying the data. I would image the disk before going any further.

-- David -- VMware Communities Moderator
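Imaging is usually done with dd from a Linux machine that has the drive attached. Purely to illustrate the idea (read the source only, write everything to a separate file, and keep a checksum so later experiments can be run on verified copies), a minimal Python sketch might look like this. The function name image_device and the paths in the usage line are hypothetical, and the target needs at least 750 GB of free space.

```python
import hashlib


def image_device(src: str, dst: str, chunk: int = 4 * 1024 * 1024) -> str:
    """Copy src (e.g. a block device) byte for byte into the file dst.

    The source is opened read-only. Returns the SHA-256 of everything copied
    so the image can be re-verified later. Read errors are not retried.
    """
    digest = hashlib.sha256()
    with open(src, "rb") as source, open(dst, "wb") as target:
        while True:
            block = source.read(chunk)
            if not block:
                break
            target.write(block)
            digest.update(block)
    return digest.hexdigest()
```

For example, image_device("/dev/sdb", "/mnt/backup/datastore2.img") on the Linux box, where /dev/sdb is a hypothetical name for the attached drive; any recovery attempts can then be made against copies of the image instead of the original disk.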

oconshaw · 02-06-2011

Thanks again for the suggestion. I tried calling support, but they say not until I buy a licensed version. I thought per-incident support was available?

I've emailed sales to see if they can provide some guidance; I called, but nobody was there (it's Sunday; they're probably watching the Super Bowl). The website suggests that per-incident support is available:

http://store.vmware.com/store?Action=DisplayPage&Env=BASE&Locale=en_US&SiteID=vmware&id=ProductDetai...

But what do I know? I'd hate to pay for an incident and then be told "tough luck, you still have to buy the product." Not that buying is a bad thing, but as far as the product goes, the free version does what my small business needs.

Anyway, for anyone industrious enough to review this thread, I found a few more details that may be helpful. I tried taking the drive offline and attaching it to another Linux box, using the open source drivers at http://code.google.com/p/vmfs/ and also vmfs-tools as described here: http://planetvm.net/blog/?p=1592

...but no luck. The open source drivers were pretty cool, though, since I could connect via SSH from one Linux box to the ESXi box as follows:

user1@user1-virtual-machine:~/Downloads/vmfs_r95$ java -jar fvmfs.jar ssh://root:password@192.168.1.111/dev/disks/mpx.vmhba1:C0:T0:L0 info

Might come in handy with scripting backups and such down the road...

That worked against the VMFS partition that holds the good datastore, but it didn't recognize the bad datastore as a valid VMFS partition. Neither did the same tools when run against the drive attached directly to another machine.
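One way to check whether the VMFS metadata survived at all, independently of the partition table, is to scan a raw image of the drive for the VMFS volume-header signature and work back to where the partition must have started. The sketch below scans an image file, never the live disk. The magic value 0xC001D00D and the 1 MiB header offset are assumptions taken from the open source vmfs-tools sources, not from VMware documentation, so verify them against the good datastore first; find_vmfs_starts is a hypothetical helper.

```python
import struct

# ASSUMPTION: constants as found in the open source vmfs-tools sources
# (volume-info magic 0xC001D00D stored little-endian, 1 MiB into the VMFS
# partition). Not from VMware documentation; verify on a known-good datastore.
VOLINFO_MAGIC = struct.pack("<I", 0xC001D00D)
VOLINFO_OFFSET = 0x100000
SECTOR = 512


def find_vmfs_starts(path: str, limit_bytes: int, chunk: int = 4 * 1024 * 1024) -> list[int]:
    """Scan the first limit_bytes of a raw disk image, one sector at a time.

    Every sector that begins with the magic is reported as the sector where a
    VMFS partition would have to start for its header to land there.
    """
    if chunk % SECTOR:
        raise ValueError("chunk must be a multiple of the sector size")
    starts = []
    with open(path, "rb") as image:  # read-only access to the image file
        pos = 0
        while pos < limit_bytes:
            data = image.read(min(chunk, limit_bytes - pos))
            if not data:
                break
            for off in range(0, len(data) - 3, SECTOR):
                if data[off:off + 4] == VOLINFO_MAGIC and pos + off >= VOLINFO_OFFSET:
                    starts.append((pos + off - VOLINFO_OFFSET) // SECTOR)
            pos += len(data)
    return starts
```

For example, find_vmfs_starts("datastore2.img", 16 * 1024**3) scans the first 16 GiB of a hypothetical image file. If the reported start differs from where the current partition table places partition 1, the VMFS structures may well be intact and the rebuilt partition simply points at the wrong offset; if nothing is found (while the same scan does find the good datastore's header), the metadata itself may be damaged.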

Something I did notice, though: I have another machine that is exactly the same (different VMs, but carved out and provisioned identically). I compared the fdisk -l output for the disk in question on both machines. On the good machine, the disk has a FAT16 partition as well as the VMFS partition (not sure "partition" is the accurate word here; I hope you understand my meaning). On the other machine, the disk in question has no FAT16 partition. So maybe when I rebuilt the VMFS partition following one of the knowledge base articles mentioned earlier in this thread, it grabbed a little too much? Below is the good disk from one machine compared with the bad disk on the other:

Good Disk on Machine #1

~ # fdisk -l

Disk /dev/disks/mpx.vmhba1:C0:T1:L0: 750.0 GB, 750035927040 bytes
255 heads, 63 sectors/track, 91186 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

                          Device Boot      Start         End      Blocks Id System
/dev/disks/mpx.vmhba1:C0:T1:L0p1             1       523   4193280    6 FAT16
Partition 1 does not end on cylinder boundary
/dev/disks/mpx.vmhba1:C0:T1:L0p2           523     91187 728263648+ fb VMFS

-----------------------------------------------------------------------------------------------------------------

Bad Disk on Machine #2 (note no FAT16)

~ # fdisk -l

Disk /dev/disks/mpx.vmhba1:C0:T1:L0: 750.0 GB, 750035927040 bytes
255 heads, 63 sectors/track, 91186 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

                          Device Boot      Start         End      Blocks Id System
/dev/disks/mpx.vmhba1:C0:T1:L0p1             1     91186  732451481  fb VMFS
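If the two machines really were provisioned identically, the good disk's fdisk figures can be turned into sector offsets and compared with the rebuilt partition on the bad disk. The sketch below assumes the FAT16 partition on the good disk starts at sector 63 (the traditional DOS start, which the cylinder view does not show) and that the VMFS partition follows it with no gap; the asserts check that this assumed layout reproduces every cylinder number fdisk printed. The helper cylinder_of is hypothetical.

```python
SECTOR = 512
SECTORS_PER_CYLINDER = 255 * 63          # 16065, from "255 heads, 63 sectors/track"
DISK_SECTORS = 750035927040 // SECTOR    # both 750 GB arrays: 1464913920 sectors


def cylinder_of(sector: int) -> int:
    """1-based cylinder number fdisk prints for a given sector."""
    return sector // SECTORS_PER_CYLINDER + 1


# Good disk (machine #1).
# ASSUMPTION: FAT16 starts at sector 63 and VMFS follows it with no gap.
fat16_start = 63
fat16_sectors = 4193280 * 2              # Blocks column is in 1 KiB units
fat16_last = fat16_start + fat16_sectors - 1
good_vmfs_start = fat16_last + 1
good_vmfs_sectors = 728263648 * 2 + 1    # trailing "+" means one extra sector
good_vmfs_last = good_vmfs_start + good_vmfs_sectors - 1

# Bad disk (machine #2): p1 ends with cylinder 91186 and has no "+".
bad_vmfs_last = 91186 * SECTORS_PER_CYLINDER - 1
bad_vmfs_start = bad_vmfs_last - 732451481 * 2 + 1

# The assumed layout reproduces the cylinder numbers fdisk printed.
assert (cylinder_of(fat16_start), cylinder_of(fat16_last)) == (1, 523)
assert (cylinder_of(good_vmfs_start), cylinder_of(good_vmfs_last)) == (523, 91187)
assert good_vmfs_last == DISK_SECTORS - 1  # VMFS runs to the end of the disk

shift = good_vmfs_start - bad_vmfs_start
print("good VMFS starts at sector", good_vmfs_start)  # 8386623
print("bad VMFS starts at sector", bad_vmfs_start)    # 128
print(f"difference: {shift} sectors ({shift * SECTOR / 2**30:.2f} GiB)")
```

If these assumptions hold, the original VMFS partition began at sector 8386623, about 4 GiB further into the disk than the rebuilt one at sector 128. That would explain why vmkfstools -V and the open source tools find no VMFS volume there, and it fits the suspicion that the rebuild "grabbed a little too much". Treat it as a hypothesis to confirm on an image of the disk before changing the partition table.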

Thanks in advance for any light you may shed on this issue.
