VMware Cloud Community
pjapk
Contributor

Main vmfs hosting VMs "disappeared"

I've had an ESX 3.5 lab box up and running quite happily for over a year on an ML110, with a dedicated boot disk and a set of 4 x 500GB disks in a local RAID-5 array on an E200 controller for VM/template storage.

However, recently I started getting errors on a physical SBS 2003 box which has a DC as a VM and discovered that the VMFS hosting the VMs had "disappeared". It sees the storage, but when I try to add it (in the hope that it just needs re-adding and will pick up the contents again) it says the disk is empty - i.e. no partitions on it!

Rebooting the host didn't make a difference so, after searching here, I hoped that it had simply lost its partition table, which would need re-creating. However, I'm having trouble doing this as it seems to only "half-see" the storage.

Following posts here I've tried the following, but fdisk won't play ball:

# esxcfg-vmhbadevs

vmhba1:0:0 /dev/cciss/c0d0

vmhba1:1:0 /dev/cciss/c0d1

(Sees both volumes - boot disk & RAID5 array)

# fdisk -lu /dev/cciss/c0d0

Disk /dev/cciss/c0d0: 160.0 GB, 160005980160 bytes
255 heads, 63 sectors/track, 19452 cylinders, total 312511680 sectors
Units = sectors of 1 * 512 = 512 bytes

Device             Boot  Start      End        Blocks     Id  System
/dev/cciss/c0d0p1  *     63         208844     104391     83  Linux
/dev/cciss/c0d0p2        208845     10442249   5116702+   83  Linux
/dev/cciss/c0d0p3        10442250   307098539  148328145  fb  Unknown
/dev/cciss/c0d0p4        307098540  312496379  2698920    f   Win95 Ext'd (LBA)
/dev/cciss/c0d0p5        307098603  308207024  554211     82  Linux swap
/dev/cciss/c0d0p6        308207088  312287534  2040223+   83  Linux
/dev/cciss/c0d0p7        312287598  312496379  104391     fc  Unknown

(Sees contents of boot disk)

# fdisk -lu /dev/cciss/c0d1

#

(Doesn't see anything on second disk)

# fdisk /dev/cciss/c0d1

Unable to read /dev/cciss/c0d1

#

(Doesn't seem to want to manage disk with fdisk)

What makes it even more frustrating is that I'm currently 3500 miles away with VERY limited access! I can get a remote desktop to home in the evenings but I can't get physical access for two weeks. I have no means of checking that the RAID-5 array is working, but the fact that it sees a disk of around 1.4TB available when I go to add storage within VirtualCenter says to me that the array must be functioning.
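As a rough check on that size, here is a minimal Python sketch of the RAID-5 arithmetic. It assumes the usual rule that RAID-5 gives up one disk's worth of space to parity and that each disk is a nominal 500 x 10^9 bytes; the helper name `raid5_usable_bytes` is made up for illustration. The byte count it is compared against is the one fdisk prints for /dev/cciss/c0d1 further down the thread.

```python
def raid5_usable_bytes(disk_count, disk_bytes):
    """Usable capacity of a RAID-5 set: one disk's worth of space holds parity."""
    if disk_count < 3:
        raise ValueError("RAID-5 needs at least three disks")
    return (disk_count - 1) * disk_bytes


# Assumption: nominal decimal capacity of each of the four 500GB disks.
nominal_bytes = raid5_usable_bytes(4, 500 * 10**9)

# Size fdisk reports for /dev/cciss/c0d1 later in this thread.
reported_bytes = 1500222873600

print(nominal_bytes)                      # 1500000000000
print(round(reported_bytes / 2**40, 2))   # 1.36 (binary terabytes)
```

So a volume of "around 1.4TB" is about what four 500GB disks in RAID-5 should present. That fits the controller still exposing the logical drive, but it says nothing about the health of the member disks.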

Can anyone suggest anything else I can do (remotely) to try to get this volume recognised?

Regards,

Paul

22 Replies
RParker
Immortal

It would appear the partition is unrecognized, so you can log in with PuTTY on that machine.

Change to root to do this, then run fdisk -l to list the current partitions. Figure out which disk holds the partition that isn't recognized by ESX (sda, sdb, etc.).

Then type fdisk /dev/sdb (or whatever disk it is).

Delete the partition (this is non-destructive, it won't delete ANY data unless you format). Type 'm' for the menu to see a list of commands; I think it's 'd' to delete a partition.

Then you have to be in expert mode, which I believe is 'x'.

Then type 'n' to create a new partition, tell the beginning sector to start at 1024. Type 't' to set partition type as 'fb'. Then save the changes.

Then rescan and see if that works. Otherwise you can resignature the volume. There are posts on the forums about enabling resignaturing; I don't remember exactly where it is in the VI Client settings, but a search should find it.

pjapk
Contributor

Thanks for the reply but perhaps I wasn't quite clear enough. If you look at my original post you'll see that I've actually tried running fdisk but seem unable to "connect" to the second (main) disk in order to make any changes to its partition.

Paul

athlon_crazy
Virtuoso

Run "fdisk -l" without specifying a device. That should list all the partitions recognized by your system.

VMware newbie..

Zen Systems Sdn Bhd

http://www.no-x.org

TomHowarth
Leadership

What have you changed? Have you upgraded your ESXi? How is /dev/sdb attached to your machine: is it on the same controller or a different one? If it is a different controller, have you verified that the controller is fully functional with its tools?

If you found this or any other answer useful please consider the use of the Helpful or correct buttons to award points

Tom Howarth VCP / VCAP / vExpert
VMware Communities User Moderator
Blog: http://www.planetvm.net
Contributing author on VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment
Contributing author on VCP VMware Certified Professional on VSphere 4 Study Guide: Exam VCP-410

pjapk
Contributor

I'll try fdisk -l alone when I get back to my personal machine later (currently at work).

Nothing at all has changed for at least 6 months, probably nearer 8 months in fact. For the last 16 months I've been away from home 90% of the time working overseas (only home every 6 - 8 weeks) so the box just sits there and runs, doing its thing. The last uptime on it prior to this issue was 167 days, and that reboot was for a power cut!

Connections are via SmartArray E200 controller with BBWC.

I haven't (as yet) been able to check controller status due to being away, although I'm debating whether I can talk my partner through pointing out pertinent messages!

However, as the space is available for presentation when I try to add (create) a vmfs then I'm assuming (dangerous sometimes I know) that the controller sees the disks.

Regards,

Paul

pjapk
Contributor

fdisk -l doesn't appear to reveal any more than fdisk -ul used in the original post - output is identical.

pjapk
Contributor

Managed to talk my other half through a reboot and watching out for SmartArray errors (iLO card on order!). It appears to hang for some time on detecting disks with "Initializing...", so it looks like it'll now have to wait until I'm home in a couple of weeks :(

Thanks anyway folks. Hopefully it's recoverable once I'm in front of it...

Paul

pjapk
Contributor

I'm now home and have managed to get hands-on. Turns out two of the four disks in the RAID-5 array were "failing"! (s**t luck I guess!)

Anyway, I say "failing" as I think I've managed to get some replacement disks in and the array rebuilt (from the array's perspective) before all was lost.

Basically, I'm now in a situation whereby the array is reporting good but ESX is not mounting the volume. Output from fdisk is as follows:

# fdisk -l

Disk /dev/cciss/c0d0: 160.0 GB, 160005980160 bytes
255 heads, 63 sectors/track, 19452 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device             Boot  Start  End    Blocks     Id  System
/dev/cciss/c0d0p1  *     1      13     104391     83  Linux
/dev/cciss/c0d0p2        14     650    5116702+   83  Linux
/dev/cciss/c0d0p3        651    19116  148328145  fb  Unknown
/dev/cciss/c0d0p4        19117  19452  2698920    f   Win95 Ext'd (LBA)
/dev/cciss/c0d0p5        19117  19185  554211     82  Linux swap
/dev/cciss/c0d0p6        19186  19439  2040223+   83  Linux
/dev/cciss/c0d0p7        19440  19452  104391     fc  Unknown

Disk /dev/cciss/c0d1: 1500.2 GB, 1500222873600 bytes
255 heads, 63 sectors/track, 182391 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device             Boot  Start  End     Blocks       Id  System
/dev/cciss/c0d1p1        1      182391  1465055643+  fb  Unknown
#

# fdisk -lu /dev/cciss/c0d1

Disk /dev/cciss/c0d1: 1500.2 GB, 1500222873600 bytes
255 heads, 63 sectors/track, 182391 cylinders, total 2930122800 sectors
Units = sectors of 1 * 512 = 512 bytes

Device             Boot  Start  End          Blocks       Id  System
/dev/cciss/c0d1p1        128    -1364855882  1465055643+  fb  Unknown
#
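
One detail in the sector listing above is worth a sanity check: the negative End value. The figures are consistent with fdisk printing the end sector as a signed 32-bit number, which wraps for anything past sector 2147483647. Below is a minimal Python sketch of that arithmetic, using only numbers from the two listings. The helper name `to_int32` is made up for illustration, and the 32-bit explanation is an inference from the numbers, not something fdisk itself reports.

```python
def to_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


SECTORS_PER_CYLINDER = 255 * 63   # heads * sectors/track from the fdisk header
last_cylinder = 182391            # End column of the cylinder-unit listing
start_sector = 128                # Start column of the sector-unit listing

# Last sector of the last cylinder, counting sectors from 0.
true_end = last_cylinder * SECTORS_PER_CYLINDER - 1

# fdisk's Blocks column is in 1 KiB units; a trailing '+' marks an odd sector count.
blocks, odd = divmod(true_end - start_sector + 1, 2)

print(true_end)                               # 2930111414
print(to_int32(true_end))                     # -1364855882
print(f"{blocks}{'+' if odd else ''}")        # 1465055643+
```

If that reading is right, the partition really runs from sector 128 to sector 2930111414, and the negative number is a display artifact rather than damage in the partition table.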

It's the second volume giving me grief. ESX clearly sees a partition there, but how can I get it mounted? Trying to "add" it as storage within the GUI just wants to vape the disk. Even if I can just get it mounted far enough that I can copy all the data off to other disks, to then wipe completely & copy back over, that'd be fantastic!

So, what can I do to get this volume mounted in such a way that I can access it?

Further help/pointers would be MOST gratefully received!

kjb007
Immortal

When you have existing datastores, you don't use Add Storage; as you've seen, it will wipe your data. After the disk is installed and ESX can see it, going to the Storage Adapters section and running a rescan of the HBAs should bring that datastore back online. If for some reason you cannot see it, check your logs to see if they mention snapshot or resignature. That would mean that the disk is seen but has been disabled. If this is the case, go to the ESX Configuration tab, Advanced Settings, LVM, and set LVM.DisallowSnapshotLun to 0. Then rescan. You should see your datastore.

-KjB

VMware vExpert

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB

pjapk
Contributor

I had already tried scanning for datastores to no avail. Just tried changing the snapshot advanced setting & rescanning again, still not showing.

I'm starting to wonder if it's worth sticking in a spare disk & throwing a temp install at it to see if that'll pick up the volume.

Will have a scan through some logfiles anyway just in case there's useful info in there...

pjapk
Contributor

(Thought I'd posted an update earlier so no idea where that went...)

Having looked through the logs I found the following:

# more /var/log/vmkernel

May 12 19:19:58 esx vmkernel: 0:01:12:37.203 cpu1:1034)Config: 414: "DisallowSnapshotLun" = 0, Old Value: 1, (Status: 0x0)

May 12 19:20:11 esx vmkernel: 0:01:12:49.457 cpu1:1033)WARNING: Vol3: 607: Couldn't read volume header from 4809f785-bb51bfff-2bd6-001b78578b76: Address temporarily unmapped

and then:

# grep -i snap /var/log/vmkernel*

/var/log/vmkernel:May 12 19:19:58 esx vmkernel: 0:01:12:37.203 cpu1:1034)Config: 414: "DisallowSnapshotLun" = 0, Old Value: 1, (Status: 0x0)

#

I've seen another thread hereabouts where another identically sized volume was created from which the partition header could be copied with dd but I don't have the storage to create such a temp volume.

Is there anything else I can do? It's SOO frustrating having it "nearly" there but not quite!! I have until Saturday night to get it fixed, which is when I go back overseas for another month :(

kjb007
Immortal

Since you have the fdisk output, you can try re-creating the partitions exactly as they are listed there. This is dangerous, however, and could cause serious issues if the options are incorrect.

-KjB

VMware vExpert

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB

pjapk
Contributor

I did start the process documented around here for recreating partitions with exact same settings but fdisk basically said the partition already existed so I stopped at that point.

It really seems as though it's in some sort of "limbo" state!

kjb007
Immortal

You will have to remove the existing and re-create. Is that what you did?

-KjB

VMware vExpert

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB

pjapk
Contributor

Ah, no, hadn't tried deleting. Is it (reasonably) safe to delete as long as the exact same partition is re-created?

Just did a quick google but it's difficult to find such info in a sea of partitioning queries...

kjb007
Immortal

Partitioning is simply marking the beginning and end of sections of disk. So fdisk can be dangerous, as you're messing with the layout of the disk itself, but as long as you don't rewrite or format filesystems on top of the partitions, you should be OK. If you have support, you can open an SR and work with VMware directly to help solve your issue. Since you're in limbo, this is another thing to try out. Hopefully you have a backup of the VM data that's stored on that datastore, just in case.
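
To illustrate the point about a partition being only a start and an end: a classic MBR partition entry is 16 bytes holding a boot flag, a type byte, CHS fields, a 32-bit starting sector and a 32-bit sector count. The Python sketch below packs such an entry for the values in the fdisk listing earlier in the thread (start sector 128, type fb, 1465055643+ blocks). The helper name `pack_mbr_entry` and the placeholder CHS bytes are assumptions for illustration; real fdisk computes the CHS fields from the geometry, so this is not a byte-for-byte copy of what sits on the disk.

```python
import struct


def pack_mbr_entry(part_type, start_lba, sector_count, bootable=False):
    """Pack one 16-byte MBR partition entry (little-endian)."""
    chs_placeholder = b"\xfe\xff\xff"   # assumption: CHS fields are not modelled here
    return struct.pack(
        "<B3sB3sII",
        0x80 if bootable else 0x00,     # boot flag
        chs_placeholder,                # CHS address of the first sector
        part_type,                      # partition type, 0xfb is the VMFS id
        chs_placeholder,                # CHS address of the last sector
        start_lba,                      # first sector (LBA), unsigned 32-bit
        sector_count,                   # number of sectors, unsigned 32-bit
    )


# 1465055643+ blocks of 1 KiB: two sectors per block plus one odd sector.
sector_count = 2 * 1465055643 + 1

original = pack_mbr_entry(0xFB, 128, sector_count)
recreated = pack_mbr_entry(0xFB, 128, sector_count)
wrong_start = pack_mbr_entry(0xFB, 63, sector_count)

print(len(original))              # 16
print(original == recreated)      # True
print(original == wrong_start)    # False
```

The entry describes where the data lives without touching it, which is why deleting and re-creating with identical values can be harmless, and why a different start sector leaves the table pointing at the wrong place even though the data itself is unchanged.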

-KjB

VMware vExpert

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB

pjapk
Contributor

I bit the bullet & tried deleting/re-creating the partition info but still no joy. Still get the following in vmkernel log file:

May 14 10:20:16 esx vmkernel: 0:00:00:18.955 cpu0:1024)Mod: 936: Starting load for module: nfsclient R/O length: 0x12000 R/W length: 0x1000 Md5sum: 9ad109134aab31c4e81efa710a367550

May 14 10:20:16 esx vmkernel: 0:00:00:19.071 cpu1:1035)Mod: 1373: Module nfsclient: initFunc: 0x9a2694 text: 0x999000 data: 0x29022c0 bss: 0x2902b40 (writeable align 32)

May 14 10:20:16 esx vmkernel: 0:00:00:19.071 cpu1:1035)Mod: 1389: modLoaderHeap avail before: 7810232

May 14 10:20:16 esx vmkernel: 0:00:00:19.071 cpu1:1035)World: vm 1064: 895: Starting world nfsRemountHandler with flags 1

May 14 10:20:16 esx vmkernel: 0:00:00:19.072 cpu1:1035)FSS: 307: Registered fs nfs, module 13, fsTypeNum 0xb00f

May 14 10:20:16 esx vmkernel: 0:00:00:19.150 cpu1:1035)WARNING: Vol3: 607: Couldn't read volume header from 4809f785-bb51bfff-2bd6-001b78578b76: Address temporarily unmapped

Unfortunately I don't have support for this install any longer nor do I have backups - in fact, one of the VMs WAS a backup machine for other machines I have.

I can't find anything else to try atm...

kjb007
Immortal

Do you see any errors in your vmkernel, vmkwarning file, and your messages file now?

-KjB

VMware vExpert

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB

pjapk
Contributor

Most useful looking info from vmkernel logfile is above in previous post, vmkwarning entries are as follows:

May 14 10:10:20 esx vmkernel: 1:16:02:58.931 cpu0:1024)VMNIX: WARNING: VMnix: 766: kdev_t=0x6810 nsects=2930122800 > Service Console max.

May 14 10:10:22 esx vmkernel: 1:16:03:00.961 cpu0:1024)VMNIX: WARNING: VMnix: 766: kdev_t=0x6810 nsects=2930122800 > Service Console max.

May 14 10:12:20 esx vmkernel: 1:16:04:58.580 cpu0:1034)WARNING: Vol3: 607: Couldn't read volume header from 4809f785-bb51bfff-2bd6-001b78578b76: Address temporarily unmapped

May 14 10:12:20 esx vmkernel: 1:16:04:58.757 cpu0:1033)WARNING: Vol3: 607: Couldn't read volume header from 4809f785-bb51bfff-2bd6-001b78578b76: Address temporarily unmapped

May 14 10:12:20 esx vmkernel: 1:16:04:59.139 cpu1:1035)WARNING: Vol3: 607: Couldn't read volume header from 4809f785-bb51bfff-2bd6-001b78578b76: Address temporarily unmapped

May 14 10:16:28 esx vmkernel: 1:16:09:07.153 cpu1:1034)WARNING: Vol3: 607: Couldn't read volume header from 4809f785-bb51bfff-2bd6-001b78578b76: Address temporarily unmapped

May 14 10:16:29 esx vmkernel: 1:16:09:07.238 cpu1:1033)WARNING: Vol3: 607: Couldn't read volume header from 4809f785-bb51bfff-2bd6-001b78578b76: Address temporarily unmapped

May 14 10:16:29 esx vmkernel: 1:16:09:07.264 cpu1:1033)WARNING: Vol3: 607: Couldn't read volume header from 4809f785-bb51bfff-2bd6-001b78578b76: Address temporarily unmapped

May 14 10:20:16 esx vmkernel: 0:00:00:03.317 cpu0:1024)VMNIX: WARNING: VmkDev: 2679: kdev_t=0x6810 nsects=2930122800 > Service Console max.

May 14 10:20:16 esx vmkernel: 0:00:00:04.809 cpu1:1035)WARNING: Vol3: 607: Couldn't read volume header from 4809f785-bb51bfff-2bd6-001b78578b76: Address temporarily unmapped

May 14 10:20:16 esx vmkernel: 0:00:00:19.150 cpu1:1035)WARNING: Vol3: 607: Couldn't read volume header from 4809f785-bb51bfff-2bd6-001b78578b76: Address temporarily unmapped

May 14 10:20:16 esx vmkernel: 0:00:00:27.614 cpu1:1034)WARNING: Vol3: 607: Couldn't read volume header from 4809f785-bb51bfff-2bd6-001b78578b76: Address temporarily unmapped

Nothing related that I can see in the messages logfile. Only mention of VMFS I can find since latest reboot is:

May 14 10:20:14 esx vmkload_mod: Using /usr/lib/vmware/vmkmod/vmfs2

May 14 10:20:14 esx vmkload_mod: Module load of vmfs2 succeeded.

May 14 10:20:14 esx vmware: Loading VMkernel module vmfs2 succeeded

May 14 10:20:15 esx vmware-late: Restoring NAS volumes succeeded

May 14 10:20:15 esx esxcfg-swiscsi: Software iSCSI is not enabled. Doing nothing.

The fact that it says "Address temporarily unmapped" implies to me that there must be a way to re-map it, but I can't find out how.
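
For anyone wading through the log excerpts above, here is a small Python sketch that tallies the warnings. The sample lines are copied from this post; the function name `summarize_warnings` is made up for illustration. It also checks whether the nsects figure in the Service Console warning matches the disk size fdisk reports (1500222873600 bytes at 512 bytes per sector).

```python
import re
from collections import Counter

# Sample lines copied from the vmkwarning excerpt in this post.
SAMPLE = """\
May 14 10:10:20 esx vmkernel: 1:16:02:58.931 cpu0:1024)VMNIX: WARNING: VMnix: 766: kdev_t=0x6810 nsects=2930122800 > Service Console max.
May 14 10:12:20 esx vmkernel: 1:16:04:58.580 cpu0:1034)WARNING: Vol3: 607: Couldn't read volume header from 4809f785-bb51bfff-2bd6-001b78578b76: Address temporarily unmapped
May 14 10:20:16 esx vmkernel: 0:00:00:03.317 cpu0:1024)VMNIX: WARNING: VmkDev: 2679: kdev_t=0x6810 nsects=2930122800 > Service Console max.
May 14 10:20:16 esx vmkernel: 0:00:00:04.809 cpu1:1035)WARNING: Vol3: 607: Couldn't read volume header from 4809f785-bb51bfff-2bd6-001b78578b76: Address temporarily unmapped
"""

HEADER_RE = re.compile(r"Couldn't read volume header from ([0-9a-f-]+): (.+)$")
NSECTS_RE = re.compile(r"nsects=(\d+)")


def summarize_warnings(text):
    """Count volume-header read failures per (uuid, reason) and collect nsects values."""
    header_failures = Counter()
    nsects_values = set()
    for line in text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            header_failures[(header.group(1), header.group(2))] += 1
        nsects = NSECTS_RE.search(line)
        if nsects:
            nsects_values.add(int(nsects.group(1)))
    return header_failures, nsects_values


header_failures, nsects_values = summarize_warnings(SAMPLE)

for (uuid, reason), count in header_failures.items():
    print(uuid, reason, count)

# Does the sector count in the warning match the byte size fdisk reports?
print(all(n * 512 == 1500222873600 for n in nsects_values))   # True
```

In the excerpts above, every "Couldn't read volume header" warning names the same volume UUID with the same reason, and the sector count in the Service Console warning matches the full size of the array.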

(I do appreciate your pointers!)
