3.0.2 (as of yet unpatched) - We keep losing our L... - VMware Technology Network VMTN

So... We've had this happen about 5 times now, and HP support has been consistently useless. The lvl 1 tech will remove and re-add all of our iSCSI SW initiator settings, rescan the HBA fifty times, refresh the storage configuration ten times, and then try to format one of our production LUNs before I yell at him. We then get to their lvl 2 tech support, where they get the log files off the host and tell me that they know what the problem is (the partition tables are gone) but don't know how to fix it. They get VMware techs on the phone, and then the VMware techs use partedUtil to fix the problem.

That's entirely too much running around (and this shouldn't be happening anyway).

So, here's the environment:

-Two ESX 3.0.2 hosts, unpatched (FQDNs are esx01.countygp.ab.ca and esx02.countygp.ab.ca)... HP DL380 G5

-One MSA 1510i iSCSI SAN (it is in the HCL)

Since ESX caches the partition tables, I have our systems running on one host. However, I am unable to patch, as doing so will end up removing the cached partition tables and we'll be down... again. One host works; I rebooted the other and found our partition tables gone again.

esx01 is not functional, esx02 is functional (for now).

VMKwarnings for esx01 are attached in a txt file called vmkwarnings.

esx02 shows the following for fdisk -l

# fdisk -l

Disk /dev/sda: 124.5 GB, 124552151040 bytes
255 heads, 63 sectors/track, 15142 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device             Boot    Start      End      Blocks   Id  System
/dev/sda1                      1     4960    39841136   fb  Unknown

Disk /dev/sdb: 193.2 GB, 193270579200 bytes
255 heads, 63 sectors/track, 23497 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device             Boot    Start      End      Blocks   Id  System
/dev/sdb1                      1    23497   188739588+  fb  Unknown

Disk /dev/sdc: 124.5 GB, 124552151040 bytes
64 heads, 32 sectors/track, 118782 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device             Boot    Start      End      Blocks   Id  System
/dev/sdc1                      1   118782   121632752   fb  Unknown

Disk /dev/sdd: 274.8 GB, 274878955008 bytes
64 heads, 32 sectors/track, 262144 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device             Boot    Start      End      Blocks   Id  System
/dev/sdd1                      1   262144   268435392   fb  Unknown

Disk /dev/sde: 700.0 GB, 700080717312 bytes
255 heads, 63 sectors/track, 85113 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device             Boot    Start      End      Blocks   Id  System
/dev/sde1                      1    85113   683670108+  fb  Unknown

Disk /dev/sdf: 1030.7 GB, 1030793199104 bytes
64 heads, 32 sectors/track, 983040 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device             Boot    Start      End      Blocks   Id  System
/dev/sdf1                      1   983040  1006632896   fb  Unknown

Disk /dev/sdg: 42.9 GB, 42949017600 bytes
255 heads, 63 sectors/track, 5221 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device             Boot    Start      End      Blocks   Id  System
/dev/sdg1                      1     5221    41937618+  fb  Unknown

Disk /dev/cciss/c0d0: 73.3 GB, 73372631040 bytes
255 heads, 63 sectors/track, 8920 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device             Boot    Start      End      Blocks   Id  System
/dev/cciss/c0d0p1  *           1       13      104391   83  Linux
/dev/cciss/c0d0p2             14      650     5116702+  83  Linux
/dev/cciss/c0d0p3            651     8584    63729855   fb  Unknown
/dev/cciss/c0d0p4           8585     8920     2698920    f  Win95 Ext'd (LBA)
/dev/cciss/c0d0p5           8908     8920      104391   fc  Unknown
/dev/cciss/c0d0p6           8585     8653      554179+  82  Linux swap
/dev/cciss/c0d0p7           8654     8907     2040223+  83  Linux

Partition table entries are not in disk order

#
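
For reference, the partitions that fdisk labels "fb Unknown" are the VMFS volumes (fdisk just has no name for type fb), and fc is the VMkernel core dump partition. Below is a minimal sketch for pulling only the VMFS entries out of a listing like the one above, so the working host's layout can be saved somewhere before anything is rebooted. The function name list_vmfs_partitions is made up for illustration, and the sketch assumes the column layout fdisk prints here (device, optional boot flag, start, end, blocks, Id).

```shell
# Read `fdisk -l` output on stdin and print "device start end" for every
# partition whose type Id is fb (VMFS). The function name is hypothetical.
list_vmfs_partitions() {
    awk '
        $1 ~ /^\/dev\// {
            # A "*" in column 2 is the boot flag; skip it so columns line up.
            off = ($2 == "*") ? 1 : 0
            id = $(5 + off)
            if (id == "fb") print $1, $(2 + off), $(3 + off)
        }
    '
}
```

On the host this could be run as `fdisk -l | list_vmfs_partitions > /tmp/esx02-vmfs.txt` (the file name is only an example). It reads the listing and never writes to a disk.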

Can anyone help me? Why is this happening... how can I prevent it from continuing to happen? I promise I'll patch, I just need to get these tables up again so that I can reboot my other host and not have the shit hit the fan.

3 Replies

By my count you have about 82 patches available to you for ESX 3.0.2. Maybe one of the fixes is in those patches. Is your HP MSA SAN firmware up to the proper version as well? I've seen some reports of the lower-tier HP SANs having availability issues even though they are in the HCL.

Ben

Yes, I'm in the process of downloading them all now (yay for select all and the Java-based downloader).

While they are downloading, however, do you know how I can rebuild the partition tables?

Since they're all showing up the way they should be in my other host, would it not stand to reason that they should be the same in the host currently offline?
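
One low-risk way to check that assumption is to save the `fdisk -l` output from both hosts and compare the two before changing anything. A rough sketch follows, assuming both listings have already been saved to files. The function names are hypothetical, and partitions are matched on block count and type Id rather than device name, since the /dev/sdX names are not guaranteed to line up between hosts. It only reports differences and does not touch any disk.

```shell
# Print "blocks id" for each partition line in a saved `fdisk -l` listing.
partition_signatures() {
    awk '
        $1 ~ /^\/dev\// {
            off = ($2 == "*") ? 1 : 0    # skip the boot flag column if present
            print $(4 + off), $(5 + off)
        }
    ' "$1" | sort
}

# Print signatures found in the first listing but not in the second.
missing_on_second() {
    good=$(mktemp) && bad=$(mktemp) || return 1
    partition_signatures "$1" > "$good"
    partition_signatures "$2" > "$bad"
    comm -23 "$good" "$bad"
    rm -f "$good" "$bad"
}
```

For example, `missing_on_second esx02-fdisk.txt esx01-fdisk.txt` (example file names) would print one line per partition that the working host sees and the broken host does not.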

oh wow...

I removed all of my discovery IP addresses, disabled the iSCSI HBA, rebooted the host, re-enabled the HBA, re-added the discovery IPs, re-scanned the HBA, and then refreshed my storage, and all my LUNs are showing up again.

Kind of a pain, but I guess that's the easiest it has ever been.

Since I hate to create a question and not give points to someone, would you happen to know why this would fix it?

(and in the patch list there are several relating to iSCSI...)
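
For anyone who wants to script the sequence that worked here (remove the discovery addresses, disable the software initiator, reboot, re-enable, re-add, rescan), below is a dry-run sketch for the service console. Everything specific in it is an assumption: the adapter name vmhba40, the discovery address 192.0.2.10 (a documentation placeholder, not from this thread), and the exact flags, which are written from memory of the ESX 3.x esxcfg-swiscsi, vmkiscsi-tool, and esxcfg-rescan tools and should be checked against each tool's help output before a real run. By default it only prints the commands.

```shell
# Dry-run sketch of the disable / reboot / re-enable sequence.
# ASSUMPTIONS: adapter name, discovery address, and command flags are
# placeholders to verify on the host; nothing here comes from VMware docs.
HBA="${HBA:-vmhba40}"                  # hypothetical software iSCSI adapter
TARGET_IP="${TARGET_IP:-192.0.2.10}"   # hypothetical discovery address
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands, anything else = run

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Steps to take before rebooting the host.
before_reboot() {
    run vmkiscsi-tool -D -r "$TARGET_IP" "$HBA"   # remove the discovery address
    run esxcfg-swiscsi -d                         # disable the software initiator
}

# Steps to take once the host is back up.
after_reboot() {
    run esxcfg-swiscsi -e                         # re-enable the software initiator
    run vmkiscsi-tool -D -a "$TARGET_IP" "$HBA"   # re-add the discovery address
    run esxcfg-rescan "$HBA"                      # rescan the adapter for LUNs
}
```

Calling `before_reboot`, rebooting by hand, and then calling `after_reboot` mirrors the order described above. Keeping the reboot out of the script is deliberate, so nothing restarts a production host unattended.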