VMware Cloud Community
DanielJWoodhous
Contributor

Missing VMFS Volume

Hi everybody,

One of my clients has recently had a few issues with an HP MSA2012i where one of the disks in the array died (RAID 6, so this wasn't an issue) and was replaced with a new one. The array started to rebuild the disk as it should. However, due to some unknown issue, it then took the array offline. The client brought the array back up and all was well, both from a usability and a rebuild point of view.

After rebuilding the disk, one of the volumes disappeared from the VMware ESX servers (two ESX servers running 3.5 Update 2). The volume (LUN 1), which was called DRSANSATA1, exists on the same physical RAID set as another LUN (LUN 0), called DRSANSATA0.

From the VMware console (and CLI), DRSANSATA0 can be seen and used as normal. DRSANSATA1 is missing.

There are now no issues with the SAN RAID set. Host Mapping on the SAN is correct.

Within Storage Adapters, LUN 1 can be seen and all the paths are correct. Performing a Rescan does nothing.

Within Storage, if you select 'Add Storage', you can see LUN 1, and the 'Current Disk Layout' even reports that it is 'VMFS' formatted. Again, performing a Refresh does nothing.

Any ideas on how to re-associate this volume with ESX without blowing it away?

Many thanks in advance,

Daniel

19 Replies
Lightbulb
Virtuoso

Not really familiar with the MSA2000 series. You may want to check this out though

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=...

mcowger
Immortal

Is it possible it got a new SCSI VPID and is therefore being seen as a snapshot?

--Matt VCDX #52 blog.cowger.us

Lightbulb
Virtuoso

For the LUN ID to change wouldn't there need to be a controller type mismatch or something similar (Not really a super storage guy)? This is a new piece of hardware so maybe there is an issue that HP is not aware of or has not fessed up to.

I think you might want to call HP before setting EnableResignature and trying to bring the volume back online.

mcowger
Immortal

That entirely depends on the storage array, and some are better than others. The MSAs are well known to be in the 'others' group :)

--Matt VCDX #52 blog.cowger.us

mike_laspina
Champion

Hi,

Yes, it's possible to bring it back, provided it is not badly corrupted.

Does the host that is still running see that storage and properly access it?

What does esxcfg-mpath -l report on the host with the issue?

Also check the vmkwarning logs,

e.g.

cat /var/log/vmkwarning

http://blog.laspina.ca/ vExpert 2009

DanielJWoodhous
Contributor

Hmm, that's exactly what I thought after replacing a controller in another MSA2000 two weeks ago and finding that the LUN ID had changed.

However, I've already tried the workaround of setting LVM.DisallowSnapshotLUN to 0. Still no joy.

DanielJWoodhous
Contributor

Thank you for your response. The paths all seem to be correct, as well as the preferred path.

I had checked the logs earlier and nothing was being reported. However, I have just rebooted one of the servers and now we have something to work on. The issue is more or less what I suspected, and what I was hoping wouldn't happen. Here's the message:

Jan 15 16:40:23 mhgsvesx00789 vmkernel: 0:00:01:16.196 cpu3:1040)WARNING: Vol3: 611: Couldn't read volume header from 4907489e-a8e1739e-a009-00215aaa8368: Address temporarily unmapped

Any ideas on how to rescue a VMFS header?

Many thanks,

Daniel

DanielJWoodhous
Contributor

Now we all know HP equipment is the best! Shame that this is a rebadged Dot Hill product then :)

mike_laspina
Champion

Yes. But I'm not sure it will help you.

Here's how it was done.

http://communities.vmware.com/message/1033534

http://blog.laspina.ca/ vExpert 2009

SuryaVMware
Expert

Can you post the fdisk -lu output from one of the hosts?

Also run the command vmkfstools -V and post the last 4-5 lines in the /var/log/vmkernel.

-Surya

DanielJWoodhous
Contributor

Hi Surya

Here are the outputs you requested: -

fdisk -lu

Disk /dev/sda: 2000.3 GB, 2000375775232 bytes
255 heads, 63 sectors/track, 243198 cylinders, total 3906983936 sectors
Units = sectors of 1 * 512 = 512 bytes

Device     Boot  Start  End         Blocks      Id  System
/dev/sda1        128    -387991427  1953487871  fb  Unknown

Disk /dev/sdb: 2000.3 GB, 2000375775232 bytes
255 heads, 63 sectors/track, 243198 cylinders, total 3906983936 sectors
Units = sectors of 1 * 512 = 512 bytes

Device     Boot  Start  End         Blocks      Id  System
/dev/sdb1        128    -387991427  1953487871  fb  Unknown

Disk /dev/cciss/c0d0: 73.3 GB, 73372631040 bytes
255 heads, 63 sectors/track, 8920 cylinders, total 143305920 sectors
Units = sectors of 1 * 512 = 512 bytes

Device             Boot  Start      End        Blocks     Id  System
/dev/cciss/c0d0p1  *     63         208844     104391     83  Linux
/dev/cciss/c0d0p2        208845     20691719   10241437+  83  Linux
/dev/cciss/c0d0p3        20691720   23968979   1638630    82  Linux swap
/dev/cciss/c0d0p4        23968980   143299799  59665410   f   Win95 Ext'd (LBA)
/dev/cciss/c0d0p5        23969043   32162129   4096543+   83  Linux
/dev/cciss/c0d0p6        32162193   40355279   4096543+   83  Linux
/dev/cciss/c0d0p7        40355343   48548429   4096543+   83  Linux
/dev/cciss/c0d0p8        48548493   143090954  47271231   fb  Unknown
/dev/cciss/c0d0p9        143091018  143299799  104391     fc  Unknown

/dev/sdb is the affected LUN.

As for doing a Rescan, the only messages presented are as follows: -

Jan 15 23:12:10 mhgsvesx00788 vmkernel: 0:03:54:51.775 cpu2:1041)WARNING: Res3: 1053: resource 4 (cluster 9) already freed by another host: This may be a non-issue

Jan 16 01:08:13 mhgsvesx00788 vmkernel: 0:05:50:54.548 cpu3:1041)WARNING: Vol3: 611: Couldn't read volume header from 4907489e-a8e1739e-a009-00215aaa8368: Address temporarily unmapped
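For anyone who wants to check whether both hosts are complaining about the same volume, here is a rough Python sketch that pulls the host name and volume UUID out of the "Couldn't read volume header" warnings. The log lines are the ones quoted in this thread; the regular expression and the function name hosts_by_volume are my own assumptions about the line layout, not anything VMware documents.

```python
import re

# Lines copied from this thread (one from each host, plus the unrelated Res3 warning).
LOG_LINES = [
    "Jan 15 16:40:23 mhgsvesx00789 vmkernel: 0:00:01:16.196 cpu3:1040)WARNING: Vol3: 611: "
    "Couldn't read volume header from 4907489e-a8e1739e-a009-00215aaa8368: Address temporarily unmapped",
    "Jan 15 23:12:10 mhgsvesx00788 vmkernel: 0:03:54:51.775 cpu2:1041)WARNING: Res3: 1053: "
    "resource 4 (cluster 9) already freed by another host: This may be a non-issue",
    "Jan 16 01:08:13 mhgsvesx00788 vmkernel: 0:05:50:54.548 cpu3:1041)WARNING: Vol3: 611: "
    "Couldn't read volume header from 4907489e-a8e1739e-a009-00215aaa8368: Address temporarily unmapped",
]

# Assumed layout: "<Mon> <day> <time> <host> vmkernel: ... Couldn't read volume header from <uuid>:"
PATTERN = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+vmkernel:.*"
    r"Couldn't read volume header from (?P<uuid>[0-9a-f-]+):"
)


def hosts_by_volume(lines):
    """Map each volume UUID to the set of hosts that failed to read its header."""
    result = {}
    for line in lines:
        match = PATTERN.search(line)
        if match:
            result.setdefault(match.group("uuid"), set()).add(match.group("host"))
    return result


if __name__ == "__main__":
    for uuid, hosts in hosts_by_volume(LOG_LINES).items():
        print(uuid, sorted(hosts))
```

On the lines above, both hosts map to the same UUID, which suggests the problem is with the volume itself rather than with one host's view of it.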

Daniel

DanielJWoodhous
Contributor

Yep, that's exactly what I was worried about.

Not sure whether to fix this partition or blow it away. It only holds replicated VMs anyway, so it's not important. However, I might have a little fun first!

SuryaVMware
Expert

Would it be possible for you to send a dd dump of both sda and sdb? I need the first 100K.

dd if=/dev/sd<X> of=/tmp/sd<X>.out bs=1024 count=100

If you are not comfortable posting the dump here you can PM me.
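As a rough illustration of what the start of such a dump contains: the first 512-byte sector of a disk with a classic MBR holds the partition table (four 16-byte entries at offset 446, followed by the 0x55AA signature). The Python sketch below decodes those entries with unsigned 32-bit fields. The function names and the synthetic demo sector are hypothetical; the demo values are reconstructed from the fdisk output posted above, not taken from a real dump.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446   # partition table starts here in a classic MBR
ENTRY_SIZE = 16


def parse_mbr(first_sector: bytes):
    """Return the non-empty primary partition entries found in an MBR sector."""
    if len(first_sector) < SECTOR_SIZE or first_sector[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    partitions = []
    for index in range(4):
        offset = TABLE_OFFSET + index * ENTRY_SIZE
        entry = first_sector[offset:offset + ENTRY_SIZE]
        # status, CHS start, type, CHS end, start LBA, sector count (little-endian, unsigned)
        status, _chs_first, ptype, _chs_last, start, count = struct.unpack("<B3sB3sII", entry)
        if ptype == 0:
            continue  # unused slot
        partitions.append({
            "index": index + 1,
            "bootable": status == 0x80,
            "type": ptype,
            "start": start,
            "end": start + count - 1,
            "sectors": count,
        })
    return partitions


def demo_sector() -> bytes:
    """Build a synthetic MBR: type 0xfb, start 128, 3906975742 sectors (assumed values)."""
    entry = struct.pack("<B3sB3sII", 0, b"\0\0\0", 0xFB, b"\0\0\0", 128, 3906975742)
    return bytes(TABLE_OFFSET) + entry + bytes(3 * ENTRY_SIZE) + b"\x55\xaa"


if __name__ == "__main__":
    # For a real dump, read the first sector instead, e.g. open("/tmp/sdb.out", "rb").read(512)
    for part in parse_mbr(demo_sector()):
        print(part)
```

Because the fields are read as unsigned values, the end sector comes out as a plain positive number even on a disk close to 2 TB.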

-Surya

mike_laspina
Champion

Device     Boot  Start  End         Blocks      Id  System
/dev/sda1        128    -387991427  1953487871  fb  Unknown

Device     Boot  Start  End         Blocks      Id  System
/dev/sdb1        128    -387991427  1953487871  fb  Unknown

This does not look good. You should not have a negative value for the ending sector.

Is the volume backed by /dev/sda1 functioning correctly?

http://blog.laspina.ca/ vExpert 2009

DanielJWoodhous
Contributor

Yes, this device is working perfectly (DRSANSATA0).

mike_laspina
Champion

Disk /dev/sdd: 2199.0 GB, 2199023190016 bytes
255 heads, 63 sectors/track, 267349 cylinders, total 4294967168 sectors
Units = sectors of 1 * 512 = 512 bytes

Device     Boot  Start  End    Blocks       Id  System
/dev/sdd1        128    -5612  2147480778+  fb  Unknown

I just created a 2TB LUN at home and it appears that if the raw device is near 2TB the partition end block value is displayed as a negative integer.

So that does not look like a real issue.
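A quick way to sanity-check this is to assume the End column is the real end sector printed as a signed 32-bit integer, and see whether the numbers then agree with the Blocks column. The Python sketch below does that arithmetic for the values posted in this thread. The helper names are mine, and the signed 32-bit reading is an assumption about how this fdisk build prints the field, not something confirmed from its source.

```python
def as_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as its unsigned 32-bit equivalent."""
    return value & 0xFFFFFFFF


def blocks_1k(start: int, end: int):
    """Return (1 KiB blocks, has_odd_sector) for an inclusive 512-byte sector range.

    fdisk marks an odd trailing sector with a '+' after the block count.
    """
    sectors = end - start + 1
    return sectors // 2, sectors % 2 == 1


if __name__ == "__main__":
    # (device, Start, End as printed, Blocks as printed) taken from the outputs in this thread
    rows = [
        ("/dev/sdb1", 128, -387991427, "1953487871"),
        ("/dev/sdd1", 128, -5612, "2147480778+"),
    ]
    for device, start, printed_end, printed_blocks in rows:
        end = as_unsigned32(printed_end)
        blocks, odd = blocks_1k(start, end)
        computed = f"{blocks}{'+' if odd else ''}"
        print(device, "end sector:", end, "blocks:", computed, "printed:", printed_blocks)
```

For both partitions the recomputed block count matches the printed Blocks value, which is consistent with the partition table being intact and only the display being off.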

http://blog.laspina.ca/ vExpert 2009

SuryaVMware
Expert

Mike,

It's not a negative value; it's just that fdisk cannot show the complete value and uses a "-" in between the start and end blocks.

Try sfdisk -luB; this should show you the correct value.

-Surya

mike_laspina
Champion

Thanks,

Looks like it's just a field reporting width issue with fdisk.

http://blog.laspina.ca/ vExpert 2009

SuryaVMware
Expert

Sorry, wrong post. Removing it.

-Surya
