ESXi 4.0: After a power outage, the RAID 1 array that holds datastore2 failed to come up properly. Re-initializing the array with the Adaptec RAID controller's "Quick Init" procedure (which rebuilds the partition on the drives and leaves the data partition alone) seemed to fix the failure and let me bring the VMSERVER back online. However, I can't "see" the 600GB datastore volume (datastore2) that is supposed to be there.
I've tried the procedures here:
here:
and here:
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1011387
I've also read through more posts than I can count trying to resolve this. The procedure for recovering the lost partition described in the first link above seems to work, and it did make the partition table visible from the command line. But the last step, running vmkfstools -V to rediscover the VMFS volume, didn't help: the vSphere Client still doesn't see the volume, and df at the command line doesn't list it. It's as if the partition is there but ESXi can't mount it.
The device I'm trying to get working is: /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0
I'm at my wits' end. I've rescanned, rebooted, and tried to mount the volume in the vSphere Client, but the only option it offers warns that continuing will delete the current disk layout and lose all data.
** I hope one of you can help to get this working so I can at least mount the volume and get my data. **
One more thing I noticed while following this doc: http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1011387
I tried to mount and resignature the volume from the command line. It seems like that should work, but esxcfg-volume -l shows nothing. I can't force-mount it either, because I don't know the VMFS UUID or label, so maybe I need to relabel the volume somehow. I don't know; I'm just grasping at straws at this point.
Thanks in advance for reading this long message and for any help you can provide.
It feels like it's almost there. The output of the following commands may help someone more experienced pinpoint the problem:
~ # fdisk -l
Disk /dev/disks/mpx.vmhba1:C0:T1:L0: 750.0 GB, 750035927040 bytes
255 heads, 63 sectors/track, 91186 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device                           Boot   Start     End       Blocks  Id  System
/dev/disks/mpx.vmhba1:C0:T1:L0p1            1   91186    732451481  fb  VMFS

Disk /dev/disks/mpx.vmhba1:C0:T0:L0: 146.5 GB, 146590924800 bytes
64 heads, 32 sectors/track, 139800 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device                           Boot   Start     End       Blocks  Id  System
/dev/disks/mpx.vmhba1:C0:T0:L0p1            5     900       917504   5  Extended
/dev/disks/mpx.vmhba1:C0:T0:L0p2          901    4995      4193280   6  FAT16
/dev/disks/mpx.vmhba1:C0:T0:L0p3         4996  139800    138040320  fb  VMFS
/dev/disks/mpx.vmhba1:C0:T0:L0p4 *          1       4         4080   4  FAT16 <32M
/dev/disks/mpx.vmhba1:C0:T0:L0p5            5     254       255984   6  FAT16
/dev/disks/mpx.vmhba1:C0:T0:L0p6          255     504       255984   6  FAT16
/dev/disks/mpx.vmhba1:C0:T0:L0p7          505     614       112624  fc  VMKcore
/dev/disks/mpx.vmhba1:C0:T0:L0p8          615     900       292848   6  FAT16

Partition table entries are not in disk order
~ # ls /vmfs/volumes
4a528785-626bfbe1-da52-00145e5a5a6b Hypervisor2 datastore1
4a528785-a09b1431-e90c-00145e5a5a6b Hypervisor3 f4af117d-e768366c-60a8-44390c9dec9b
Hypervisor1 c2a427e4-2d317086-fef9-b5750d88536c f5ba2674-b560f63d-48a3-ef125fce46e0
~ # ls /vmfs/devices/lvm/
4a528785-6f27cf74-4d9e-00145e5a5a6b
~ # ls /vmfs/devices/disks/
mpx.vmhba1:C0:T0:L0 mpx.vmhba1:C0:T0:L0:8 vml.0000000000766d686261313a303a30:5
mpx.vmhba1:C0:T0:L0:1 mpx.vmhba1:C0:T1:L0 vml.0000000000766d686261313a303a30:6
mpx.vmhba1:C0:T0:L0:2 mpx.vmhba1:C0:T1:L0:1 vml.0000000000766d686261313a303a30:7
mpx.vmhba1:C0:T0:L0:3 vml.0000000000766d686261313a303a30 vml.0000000000766d686261313a303a30:8
mpx.vmhba1:C0:T0:L0:4 vml.0000000000766d686261313a303a30:1 vml.0000000000766d686261313a313a30
mpx.vmhba1:C0:T0:L0:5 vml.0000000000766d686261313a303a30:2 vml.0000000000766d686261313a313a30:1
mpx.vmhba1:C0:T0:L0:6 vml.0000000000766d686261313a303a30:3
mpx.vmhba1:C0:T0:L0:7 vml.0000000000766d686261313a303a30:4
~ # esxcfg-scsidevs -c
Device UID            Device Type        Console Device                              Size      Plugin  Display Name
mpx.vmhba1:C0:T0:L0   Direct-Access      /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0     139800MB  NMP     Local ServeRA Disk (mpx.vmhba1:C0:T0:L0)
mpx.vmhba1:C0:T1:L0   Direct-Access      /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0     715290MB  NMP     Local ServeRA Disk (mpx.vmhba1:C0:T1:L0)
naa.5005076a02148d2d  Enclosure Svc Dev  /vmfs/devices/genscsi/naa.5005076a02148d2d  0MB       NMP     Local IBM-ESXS Enclosure Svc Dev (naa.5005076a02148d2d)
~ # esxcfg-scsidevs -a
vmhba0  pata_serverworks  link-n/a  ide.vmhba0    (0:8.1)   ServerWorks Serverworks HT1000 IDE/PATA Controller
vmhba1  aacraid           link-n/a  pscsi.vmhba1  (5:0.0)   Adaptec ServeRAID 8k/8k-l8
vmhba2  sata_svw          link-n/a  sata.vmhba2   (21:14.0) ServerWorks BCM5785 [HT1000] SATA (Native SATA Mode)
vmhba32 pata_serverworks  link-n/a  ide.vmhba32   (0:8.1)   ServerWorks Serverworks HT1000 IDE/PATA Controller
vmhba33 sata_svw          link-n/a  sata.vmhba33  (21:14.0) ServerWorks BCM5785 [HT1000] SATA (Native SATA Mode)
vmhba34 sata_svw          link-n/a  sata.vmhba34  (21:14.0) ServerWorks BCM5785 [HT1000] SATA (Native SATA Mode)
vmhba35 sata_svw          link-n/a  sata.vmhba35  (21:14.0) ServerWorks BCM5785 [HT1000] SATA (Native SATA Mode)
~ # esxcfg-scsidevs -u
Primary UID           Other UID
mpx.vmhba1:C0:T0:L0   vml.0000000000766d686261313a303a30
mpx.vmhba1:C0:T1:L0   vml.0000000000766d686261313a313a30
naa.5005076a02148d2d  vml.020d0000005005076a02148d2d565343373136
Any ideas?
You might want to see if there is anything helpful at http://sanbarrow.com/sickbay.html
Thanks for the suggestion. I'm looking at that resource, but I don't see anything that looks like it will help with my particular problem. I'll keep looking. If you or anyone else has any further suggestions I would be very grateful. --Thanks.
I would suggest a VMware support call. Anything you try at this point risks destroying the file system. I would image the disk before going any further.
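Imaging the disk first means every later experiment can be tried against a copy instead of the only original. Below is a minimal sketch, assuming the disk is attached to a Linux box with GNU dd and cmp; the function name image_disk and the device path /dev/sdX are hypothetical placeholders. The image has to go to a different disk with at least 715290 MB free.

```shell
#!/bin/sh
# Hypothetical helper: copy a raw disk to an image file, then verify the copy.
# Usage: image_disk SOURCE_DEVICE IMAGE_FILE   (e.g. /dev/sdX, an assumption)
image_disk() {
    src=$1
    dst=$2
    # Only reads the source. noerror keeps going past unreadable blocks;
    # sync pads short or failed reads with zeros so later data keeps its offset.
    # bs is given in bytes (1 MiB) for dd versions without size suffixes.
    dd if="$src" of="$dst" bs=1048576 conv=noerror,sync || return 1
    # Byte-for-byte check; cmp -s is silent and exits non-zero on any difference.
    cmp -s "$src" "$dst"
}
```

For example: image_disk /dev/sdX /mnt/spare/datastore2.img && echo verified (both paths hypothetical). Because conv=sync also pads a short final block, the verification only passes when the source size is a multiple of 1 MiB; mpx.vmhba1:C0:T1:L0 reports 750035927040 bytes, which is exactly 715290 MiB. If the disk has unreadable sectors, GNU ddrescue is better suited than dd.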
Thanks again for the suggestion. I tried calling support, but they said not until I buy a licensed version. I thought per-incident support was available?
I've emailed sales to see whether they can provide some guidance. I called, but nobody was there (it's Sunday; they're probably watching the Super Bowl). The website suggests that per-incident support is available:
But what do I know? I'd hate to pay for an incident and then be told "tough luck, you still have to buy the product." Not that buying is a bad thing, but the free version does everything my small business needs.
Anyway, for anyone industrious enough to review this thread, I found a few more details that may be helpful. I tried taking the drive offline and attaching it to another Linux box, and tried both the open-source drivers at http://code.google.com/p/vmfs/ and the vmfs-tools described here: http://planetvm.net/blog/?p=1592
...but no luck. The open-source drivers were pretty cool, though, because I could connect over SSH from a Linux box to the ESXi host like this:
user1@user1-virtual-machine:~/Downloads/vmfs_r95$ java -jar fvmfs.jar ssh://root:password@192.168.1.111/dev/disks/mpx.vmhba1:C0:T0:L0 info
Might come in handy with scripting backups and such down the road...
That worked against the VMFS partition that holds the good datastore, but it didn't recognize the bad datastore as a valid VMFS partition. The same tools gave the same result when I ran them against the drive attached directly to another machine.
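Whether any VMFS metadata survives at a given partition offset can be checked read-only. A minimal sketch, assuming the drive is attached to a Linux box as in the experiment above, and assuming (from the open-source vmfs-tools code, not from VMware documentation) that a VMFS3 volume header sits 1 MiB past the start of its partition and begins with the little-endian magic 0xc001d00d. The function names and /dev/sdX are hypothetical.

```shell
#!/bin/sh
# ASSUMPTION (from vmfs-tools, not VMware docs): the VMFS3 header starts
# 1 MiB = 2048 sectors after the partition's first sector, and its first
# four bytes are the little-endian magic 0xc001d00d, i.e. 0d d0 01 c0.
VMFS_MAGIC_BYTES=0dd001c0
HEADER_OFFSET_SECTORS=2048

# Succeeds if the magic appears where a partition starting at sector $2
# of device or image $1 would keep its header. Only reads from $1.
has_vmfs_header() {
    sig=$(dd if="$1" bs=512 skip=$(( $2 + HEADER_OFFSET_SECTORS )) count=1 2>/dev/null \
        | od -A n -t x1 -N 4 | tr -d ' \n')
    [ "$sig" = "$VMFS_MAGIC_BYTES" ]
}

# Prints every candidate partition start sector in [$2, $3] whose
# header location holds the magic bytes.
scan_for_header() {
    s=$2
    while [ "$s" -le "$3" ]; do
        if has_vmfs_header "$1" "$s"; then
            echo "$s"
        fi
        s=$(( s + 1 ))
    done
}
```

For example, has_vmfs_header /dev/sdX 128 tests a partition starting at sector 128, the 64 KiB-aligned offset commonly used for VMFS3 partitions, and scan_for_header /dev/sdX 8385930 8401994 tries every start sector in cylinder 523 (with 16065 sectors per cylinder), where the VMFS partition on the identical good disk begins. A hit only shows that a header is present there, not that the rest of the volume is intact.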
One thing I did notice, though: I have another machine that is configured exactly the same way, with different VMs but carved out and provisioned identically. Comparing the fdisk -l output for the same disk on both machines, the good machine's disk has a FAT16 partition as well as the VMFS partition. (I'm not sure "partition" is the accurate word here, but I hope my meaning is clear.) On the other machine, the disk in question has no FAT16 partition. So maybe when I rebuilt the VMFS partition following one of the knowledge base articles mentioned earlier in this thread, it grabbed a little too much? Below is the good disk from one machine compared with the bad disk from the other:
Good Disk on Machine #1
~ # fdisk -l
Disk /dev/disks/mpx.vmhba1:C0:T1:L0: 750.0 GB, 750035927040 bytes
255 heads, 63 sectors/track, 91186 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device                           Boot   Start     End       Blocks  Id  System
/dev/disks/mpx.vmhba1:C0:T1:L0p1            1     523      4193280   6  FAT16
Partition 1 does not end on cylinder boundary
/dev/disks/mpx.vmhba1:C0:T1:L0p2          523   91187    728263648+ fb  VMFS
-----------------------------------------------------------------------------------------------------------------
Bad Disk on Machine #2 (note no FAT16)
~ # fdisk -l
Disk /dev/disks/mpx.vmhba1:C0:T1:L0: 750.0 GB, 750035927040 bytes
255 heads, 63 sectors/track, 91186 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device                           Boot   Start     End       Blocks  Id  System
/dev/disks/mpx.vmhba1:C0:T1:L0p1            1   91186    732451481  fb  VMFS
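The two listings can be turned into sector offsets to make the comparison concrete. The sketch below uses only the geometry fdisk reports (16065 sectors per cylinder, 512-byte sectors, Blocks in KiB); the function names are hypothetical. It assumes the bad disk's partition ends exactly on a cylinder boundary, which fdisk implies by printing no boundary warning for it, and it needs a shell with 64-bit arithmetic (bash, or dash on a 64-bit system).

```shell
#!/bin/sh
# Geometry fdisk -l reports for mpx.vmhba1:C0:T1:L0 on both hosts:
# 255 heads x 63 sectors/track = 16065 sectors per cylinder, 512-byte sectors.
SECTORS_PER_CYL=16065
SECTOR_SIZE=512

# First sector (0-based) of a 1-based fdisk cylinder number.
cyl_first_sector() {
    echo $(( ($1 - 1) * SECTORS_PER_CYL ))
}

# Start sector of a partition that ends on the last sector of cylinder $1
# and holds $2 fdisk "Blocks" (1 KiB each, i.e. 2 sectors). Only valid when
# fdisk printed no "+" after Blocks and no cylinder-boundary warning.
start_from_end() {
    last_sector=$(( $1 * SECTORS_PER_CYL - 1 ))
    echo $(( last_sector - $2 * 2 + 1 ))
}

# Bad disk, p1: End 91186, Blocks 732451481.
echo "bad disk VMFS starts at sector $(start_from_end 91186 732451481)"
# prints: bad disk VMFS starts at sector 128

# Good disk, p2: Start is cylinder 523, so it begins somewhere inside it.
good=$(cyl_first_sector 523)
echo "good disk VMFS starts at or after sector $good ($(( good * SECTOR_SIZE )) bytes)"
# prints: good disk VMFS starts at or after sector 8385930 (4293596160 bytes)
```

With these figures, the recreated VMFS partition on the bad disk starts at sector 128 (64 KiB in), while the VMFS partition on the good disk starts no earlier than sector 8385930, roughly 4 GiB in, just after its 4 GB FAT16 partition. If the bad disk originally had the same layout as the good one, its VMFS metadata would sit about 4 GiB past where the recreated partition points, which would be consistent with vmkfstools -V finding nothing. This is an inference from the two listings, not proof of the original layout.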
Thanks in advance for any light you may shed on this issue.