Two days ago, my ESX 3.5 server suffered a pretty major failure. I went into the datacenter to find several failed disks, with lots of blinking orange lights. I was unable to repair it at the scene, so I pulled the server.
Anyway, the hardware is a Dell PE2800 with:
3 x 15K 73GB disks as a RAID5 array
3 x 10K 300GB disks as a RAID5 array
2 x 10K 73GB disks as HotSpares.
The first array had ESX installed (/dev/sda) and the remaining space, approx 120GB as VMFS storage1 (/dev/sda3).
The second array was all as VMFS storage2 (/dev/sdb1).
When I powered up again, all the disks appeared healthy, although the RAID controller automatically went through a recheck/rebuild on both arrays. When it came back it did not boot ESX; it hung at the GRUB _ (blinking underscore) and refused to move on. This was a good sign, I thought, because at least it knew about the partitions and got as far as looking in /boot (suggesting the data was intact).
After lots of searching here and elsewhere, I determined I would have to reinstall ESX, making sure I retained all VMFS stores. I installed ESX 3.5U5 successfully.
However, only the second array - storage2 /dev/sdb1 - is recognised by ESX as a VMFS store, and I'm able to see the VM data in it ok.
storage1 does not appear.
Disk /dev/sda: 146.5 GB, 146548981760 bytes
255 heads, 63 sectors/track, 17816 cylinders, total 286228480 sectors
Units = sectors of 1 * 512 = 512 bytes
Device    Boot    Start        End     Blocks   Id  System
/dev/sda1  *         63     417689     208813+  83  Linux
/dev/sda2         417690   17189549    8385930   83  Linux
/dev/sda3       17189550  280703744  131757097+  fb  Unknown
/dev/sda4      280703745  286214039    2755147+   f  Win95 Ext'd (LBA)
/dev/sda5      280703808  284896709    2096451   83  Linux
/dev/sda6      284896773  286005194     554211   82  Linux swap
/dev/sda7      286005258  286214039     104391   fc  Unknown
vmhba0:0:0 /dev/sda
vmhba0:1:0 /dev/sdb
Disk vmhba0:0:0 /dev/sda (139760MB) has 1 paths and policy of Fixed
Local 2:14.0 vmhba0:0:0 On active preferred
Disk vmhba0:1:0 /dev/sdb (572160MB) has 1 paths and policy of Fixed
Local 2:14.0 vmhba0:1:0 On active preferred
So far so good.
vmhba0:1:0:1 /dev/sdb1 48642536-75d5e1f9-92d3-001143ec7a00
# ls -la /vmfs/devices/disks/vmhba0:0:0:3
-rw------- 1 root root 134919267840 Nov 24 11:20 /vmfs/devices/disks/vmhba0:0:0:3
Again promising, since this is the correct size and location.
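The "correct size" claim can be checked with a little arithmetic. A minimal Python sketch, using only the sector numbers from the fdisk listing and the byte count from ls above (the function name is just for illustration):

```python
SECTOR_SIZE = 512  # bytes, from the "Units" line of the fdisk output


def partition_bytes(start_sector: int, end_sector: int,
                    sector_size: int = SECTOR_SIZE) -> int:
    """Size in bytes of a partition whose start and end sectors are both inclusive."""
    return (end_sector - start_sector + 1) * sector_size


# /dev/sda3 as listed by fdisk (Id fb)
sda3_bytes = partition_bytes(17189550, 280703744)

# Size that `ls -la` reports for /vmfs/devices/disks/vmhba0:0:0:3
ls_reported_bytes = 134919267840

print("sda3 bytes from fdisk:", sda3_bytes)
print("matches ls output:    ", sda3_bytes == ls_reported_bytes)

# Whole disk: total sectors from fdisk, compared with the MB figure for vmhba0:0:0
disk_bytes = 286228480 * SECTOR_SIZE
print("disk size in MiB:     ", disk_bytes // (1024 * 1024))
```

The partition works out to exactly the 134919267840 bytes that ls shows, and the whole disk to the 139760MB reported for vmhba0:0:0, so the table and the device node agree with each other.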
So I know the partition table has not been damaged. Most solutions in this forum relate to repairing the partition table - so that doesn't apply here.
# hexdump -C vmhba0:0:0:3 | less
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
0000a400 30 ff 2f ff 30 ff 53 4f 48 00 00 00 00 00 00 00 |0./.0.SOH.......|
0000a410 41 00 72 00 70 00 68 00 69 00 63 00 20 00 47 00 |A.r.p.h.i.c. .G.|
0000a420 79 00 6f 00 6b 00 61 00 69 00 6c 00 65 00 6e 00 |y.o.k.a.i.l.e.n.|
0000a430 6d 00 65 00 6e 00 74 00 61 00 69 00 20 00 48 00 |m.e.n.t.a.i. .H.|
0000a440 65 00 61 00 76 00 79 00 20 00 4a 00 49 00 53 00 |e.a.v.y. .J.I.S.|
0000a450 00 00 00 00 00 00 00 00 41 00 72 00 70 00 68 00 |........A.r.p.h.|
0000a460 69 00 63 00 20 00 47 00 79 00 6f 00 6b 00 61 00 |i.c. .G.y.o.k.a.|
0000a470 69 00 6c 00 65 00 6e 00 6d 00 65 00 6e 00 74 00 |i.l.e.n.m.e.n.t.|
0000a480 61 00 69 00 20 00 4c 00 69 00 67 00 68 00 74 00 |a.i. .L.i.g.h.t.|
Now that doesn't look so good - but if I search for data that I know is inside one of the VMs (such as my email address) then I can find that - which gives me some glimmer of hope.
00c9f110 b8 ff ff ff 3a 00 70 00 73 00 65 00 72 00 76 00 |....:.p.s.e.r.v.|
00c9f120 65 00 72 00 3a 00 70 00 67 00 72 00 65 00 67 00 |e.r.:.p.g.r.e.g.|
00c9f130 67 00 40 00 70 00 67 00 72 00 65 00 67 00 67 00 |g.@.p.g.r.e.g.g.|
00c9f140 34 00 3a 00 2f 00 4f 00 53 00 53 00 43 00 56 00 |4.:./.O.S.S.C.V.|
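The text in these dumps is stored as UTF-16LE (every character followed by a 00 byte), so a plain ASCII grep will miss it. A rough Python sketch of the same kind of search over a raw partition image, read in chunks so a 130GB file never has to fit in memory. The function name and the demo address are made up for illustration:

```python
import io
from typing import BinaryIO, Iterator


def find_utf16le(stream: BinaryIO, text: str,
                 chunk_size: int = 1 << 20) -> Iterator[int]:
    """Yield the absolute byte offset of every UTF-16LE occurrence of `text`."""
    if not text:
        raise ValueError("search text must not be empty")
    needle = text.encode("utf-16-le")
    overlap = len(needle) - 1  # bytes carried over so matches spanning chunks are seen
    tail = b""
    base = 0  # absolute offset of the first byte of tail + chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        pos = buf.find(needle)
        while pos != -1:
            yield base + pos
            pos = buf.find(needle, pos + 1)
        # The tail is shorter than the needle, so no match is reported twice.
        tail = buf[-overlap:] if overlap else b""
        base += len(buf) - len(tail)


# Demo on an in-memory "image"; a real run would pass open(image_path, "rb").
demo = io.BytesIO(b"\x00" * 100 + "user@example.com".encode("utf-16-le") + b"\x00" * 50)
for offset in find_utf16le(demo, "user@example.com", chunk_size=16):
    print(f"found at {offset:#010x}")
```

The offsets it prints line up with the left-hand column of hexdump -C, which makes it easy to jump to the surrounding data.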
Rescanning doesn't give me anything more in Storage (SCSI Target 0 - the 130GB partition is listed in Storage Adapters ok - but it always was)... but if I go into Storage / Add Storage, it does show me the 130GB partition but that would (and claims to) destroy all the data on the partition if I were to add it.
I also tried the Resignaturing instructions, but this did not make any difference.
I've temporarily put vsftpd on the box and am copying down /vmfs/devices/disks/vmhba0:0:0:3 so at least I have the file outside of ESX. The VMFS partition only has a single (100GB) VM.
Thoughts? and thanks.
PG
Forgot to add this:
Nov 24 11:44:22 core01 vmkernel: 0:12:28:04.723 cpu3:1035)SCSI: 863: GetInfo for adapter vmhba0, , max_vports=0, vports_inuse=0, linktype=0, state=0, failreason=0, rv=-1, sts=bad001f
Nov 24 11:44:22 core01 vmkernel: 0:12:28:04.724 cpu3:1035)ScsiScan: 398: Path 'vmhba0:C0:T0:L0': Vendor: 'MegaRAID' Model: 'LD 0 RAID5 139G' Rev: '516A'
Nov 24 11:44:22 core01 vmkernel: 0:12:28:04.724 cpu3:1035)ScsiScan: 399: Type: 0x0, ANSI rev: 2
Nov 24 11:44:22 core01 vmkernel: 0:12:28:04.724 cpu3:1035)ScsiUid: 776: Path 'vmhba0:C0:T0:L0' does not support VPD Serial Id page.
Nov 24 11:44:22 core01 vmkernel: 0:12:28:04.724 cpu3:1035)ScsiUid: 847: Path 'vmhba0:C0:T0:L0' does not support VPD Device Id page.
Nov 24 11:44:22 core01 vmkernel: 0:12:28:04.724 cpu3:1035)ScsiScan: 524: Path 'vmhba0:C0:T0:L0': No standard UID: Failure
Nov 24 11:44:22 core01 vmkernel: 0:12:28:04.724 cpu3:1035)ScsiScan: 398: Path 'vmhba0:C0:T1:L0': Vendor: 'MegaRAID' Model: 'LD 1 RAID5 572G' Rev: '516A'
Hi,
Run the following from the Service Console:
- esxcfg-volume --list
If this returns the missing data store then try:
- esxcfg-volume --mount <VMFS UUID|label>
If storage 1 does not stay mounted over a restart then run the above command but use --persistent-mount <VMFS UUID|label>
Let me know if any of this works.
Thanks and kind regards.
Thanks for your reply. However I don't have an esxcfg-volume command - I believe that came in with ESX 4.x, whereas I am using ESX 3.5.
Sorry about that. Forgot what forum I was in.
Given that you have tried the LVM.EnableResignature and SCSI.CompareLUNNumber options, there do not appear to be many other options available to you.
I was wondering if it is possible to attach the VMkernel log (/var/log/vmkernel) from after the ESX server boots? I wouldn't mind looking through it to see if anything jumps out.
Kind regards.
I'm afraid I posted the vmkernel log already (see first comment)...
I couldn't wait on rebuilding the machine, so I ended up taking a backup of the VMFS disk with dd and moving it off the ESX box. I was then able to extract the /usr UFS partition and, knowing roughly what data I wanted, page through it and rebuild the missing files.
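For anyone attempting the same thing, carving a region out of a dd image is just a seek and a bounded copy. A minimal Python sketch follows; the function name, file names, offset and length are all placeholders, since where the guest partition actually starts inside the image has to be found by inspection:

```python
import os
import tempfile


def extract_region(image_path: str, out_path: str, offset: int, length: int,
                   chunk_size: int = 1 << 20) -> int:
    """Copy `length` bytes starting at byte `offset` of the image into a new file.

    Returns the number of bytes actually copied, which is less than `length`
    if the image ends early.
    """
    copied = 0
    with open(image_path, "rb") as src, open(out_path, "wb") as dst:
        src.seek(offset)
        while copied < length:
            chunk = src.read(min(chunk_size, length - copied))
            if not chunk:  # hit the end of the image
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


# Demo on a small throwaway file standing in for the real image.
workdir = tempfile.mkdtemp()
image = os.path.join(workdir, "vmfs.img")      # placeholder name
carved = os.path.join(workdir, "usr-ufs.img")  # placeholder name
with open(image, "wb") as f:
    f.write(b"A" * 4096 + b"B" * 8192 + b"C" * 4096)

n = extract_region(image, carved, offset=4096, length=8192, chunk_size=1000)
print("bytes copied:", n)
```

Working on a copy like this keeps the original image untouched, so a more thorough recovery can still be tried on it later.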
I still have the VMFS image, so if anything magical crops up in the future, I'll try a more thorough restoration of the contents.
Thanks