VMware Cloud Community
Hompie
Contributor

ESX 3.0.1: VMFS3 lun disappeared - VM gone - Analysis tips?

Hi,

We are running ESX 3.0.1. During a VMotion, one of the virtual machines reported that it could not access its .vmx file.

After several attempts, and since we couldn't see an error, we stopped the VM and rebooted it. After that it didn't boot anymore.

At the same time we lost access to the LUN. It is still visible with a LUN ID, but there is no longer a VMFS3 file system mapped to it.

So we see the LUN as 100% free, and we can create a new VMFS3 on it if we want.

In other words, the VMFS3 is gone. How, we don't know.

/var/log/messages shows nothing of interest.

The vmkernel log shows:

Nov 16 11:56:45 HOST001 vmkernel: 6:00:32:12.497 cpu2:1067)World: vm 1067: 3864: Killing self with status=0x0:Success

Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.562 cpu5:1037)LVM: 2294: Could not open device vmhba1:0:0:1, vol [4552c6d7-8e1ea291-a708-00137258, 4552c6d7-8e1ea291-a708-00137258b8ba, 1]: Failure

Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.723 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0

Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.723 cpu5:1037)WARNING: Fil3: 1564: Failed to reserve volume f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0

Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.723 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 2 4552c6d8 8b2a5aa 1300125b bab85872 4 1 0 0 0 0 0

Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.723 cpu5:1037)LVM: 2294: Could not open device vmhba1:0:0:1, vol [4552c6d7-8e1ea291-a708-00137258, 4552c6d7-8e1ea291-a708-00137258b8ba, 1]: Failure

Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.729 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0

Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.729 cpu5:1037)WARNING: Fil3: 1564: Failed to reserve volume f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0

Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.729 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 2 4552c6d8 8b2a5aa 1300125b bab85872 4 1 0 0 0 0 0

Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.734 cpu5:1037)LVM: 2294: Could not open device vmhba1:0:0:1, vol [4552c6d7-8e1ea291-a708-00137258, 4552c6d7-8e1ea291-a708-00137258b8ba, 1]: Failure

Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.741 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0

The VM has already been restored on another LUN for the time being, so we have some time to analyse the problem. But the LUN really is empty. What happened?
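For anyone who wants to triage a log like the one quoted above, here is a minimal Python sketch that counts vmkernel entries per module and per status code. The regular expressions are only inferred from the handful of lines in this post, so treat the field layout as an assumption rather than a documented format. The names `summarize` and `summarize_file` are hypothetical helpers, not part of any VMware tool.

```python
import re
from collections import Counter

# Layout assumed from the quoted lines:
#   "<syslog prefix> vmkernel: <uptime> cpuN:WORLD)[WARNING: ]MODULE: <rest>"
LINE_RE = re.compile(
    r"vmkernel: (?P<uptime>\S+) cpu(?P<cpu>\d+):(?P<world>\d+)\)"
    r"(?P<warning>WARNING: )?(?P<module>\w+): (?P<rest>.*)"
)
# Matches both "status 0xbad000e" and "status=0x0" as seen in the post.
STATUS_RE = re.compile(r"status[ =](0x[0-9a-fA-F]+)")


def summarize(lines):
    """Return (entries per module, entries per status code, warning count)."""
    modules = Counter()
    statuses = Counter()
    warnings = 0
    for raw in lines:
        match = LINE_RE.search(raw)
        if match is None:
            continue  # not a vmkernel line in the assumed layout
        modules[match.group("module")] += 1
        if match.group("warning"):
            warnings += 1
        status = STATUS_RE.search(match.group("rest"))
        if status:
            statuses[status.group(1)] += 1
    return modules, statuses, warnings


def summarize_file(path):
    """Summarize a saved copy of the log; the path is supplied by the caller."""
    with open(path, errors="replace") as log:
        return summarize(log)
```

Run against the excerpt above, this would group the repeated LVM, FSS and Fil3 failures, which makes it easier to see that every error points at the same device and volume.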


Accepted Solutions
MikeTedescucci
Enthusiast

Call support. The same thing happened to me: the partition table of the file system was corrupt. I rebuilt the filesystem with a few easy commands and all of my VMs were back. If you would like, I can post the commands that I performed, but it would be best to get the info from support.

Mike

23 Replies
vangoge
Hot Shot

Hi Hompie,

Have you done an unattended install of an ESX server, using a kickstart file, on a host that could see the same LUN?

Gert

Hompie
Contributor

Hi,

No, this is an installation which was done months ago, and then upgraded two weeks ago to 3.0.1.

There are four LUNs on the system, and one of them got lost. VMotion never had problems before, and I don't think VMotion is the cause. The VMFS simply disappeared.

sebek
Enthusiast

Can you see that LUN under Storage Adapters in the Configuration tab?

Hompie
Contributor

Hi,

I can see the LUN, but I do indeed suspect that the partition table is corrupt.

Hompie
Contributor

Hi,

Please post the commands; I'm still waiting for feedback from support. :)

vangoge
Hot Shot

Hi, I have seen this at one of our clients.

Check offline for the document.

Hompie
Contributor

Hi,

The problem is already solved.

Hompie
Contributor

Hi,

After we logged a call with VMware, they solved the problem remotely.

They advise always opening a call, and note that this is not part of regular support, but they will try to help.

Apparently the VMFS partition table was destroyed. This can be caused by Windows writing a signature to the disk.

We don't have any Windows hosts attached to the LUN, so that is very weird, but I can imagine it went wrong during the 3.0.1 upgrade.

Anyhow, they solved it using fdisk, and advised us not to try this at home: always call support.
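As a read-only illustration of what "partition table destroyed" means, here is a minimal Python sketch that decodes the first 512-byte sector of a disk in the standard MBR layout and reports whether any entry carries partition type 0xfb, the type ID used for VMFS. It does not reproduce the fdisk steps VMware support performed, which are not listed in this thread, and it never writes to the disk. The helper names `parse_mbr` and `read_first_sector` are hypothetical, and the path you pass in (a device node or a saved copy of the sector) is up to you.

```python
import struct

SECTOR_SIZE = 512
DISK_SIGNATURE_OFFSET = 440   # 4-byte disk signature in the standard MBR layout
PARTITION_TABLE_OFFSET = 446  # four 16-byte partition entries start here
ENTRY_SIZE = 16
VMFS_TYPE = 0xFB              # MBR partition type ID used for VMFS


def parse_mbr(sector):
    """Decode an MBR sector into its signature fields and non-empty entries."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    partitions = []
    for index in range(4):
        offset = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        ptype = sector[offset + 4]
        # Bytes 8..15 of an entry: starting LBA and sector count, little-endian.
        start_lba, sectors = struct.unpack_from("<II", sector, offset + 8)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append({
                "index": index + 1,
                "type": ptype,
                "start_lba": start_lba,
                "sectors": sectors,
                "is_vmfs": ptype == VMFS_TYPE,
            })
    return {
        "has_boot_signature": sector[510:512] == b"\x55\xaa",
        "disk_signature": struct.unpack_from("<I", sector, DISK_SIGNATURE_OFFSET)[0],
        "partitions": partitions,
    }


def read_first_sector(path):
    """Read (never write) the first sector of a device or image file."""
    with open(path, "rb") as disk:
        return disk.read(SECTOR_SIZE)
```

If `parse_mbr(read_first_sector(path))` returns an empty `partitions` list for a LUN that used to hold a VMFS volume, that is consistent with the symptom described here: the LUN is visible but shows as 100% free. Treat that as a reason to call support rather than as a cue to start editing the table yourself.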

jagroep
Contributor

Hi,

I have the same problem. I lost a LUN on ESX.

Can you please post the instructions here, or mail me the instructions to **jagroep71*at*hotmail.com**

Please leave out the asterisks in the email address.

Thanks,

Jagroep

kimono
Expert

Hi, is it possible to get this document as well? Our test environment ran into this yesterday.

kim

/kimono/
ICT-Freak
Enthusiast

Hi, can you please send me the document? My e-mail address is afokkema {at} gmail dot com

daniel_uk
Hot Shot

Wouldn't it be easier, if VMware is reading this, for them to post it as a KB article?

VirtualKenneth
Virtuoso

Same problem here. Still no commands published by anyone?

kimono
Expert

I fixed it by holding my breath and using fdisk. I'd have to dig up the commands in the office (I'm home sick with the flu at the moment).

/kimono/
VirtualKenneth
Virtuoso

Thanks, I've already contacted VMware support. They did some fancy tricks with fdisk, but unfortunately the LUN was really damaged and fdisk couldn't save me anymore.

doctormiru
Enthusiast

We had the same issue after reinstalling a host.

VMware support helped me fix the lost partition table information using fdisk. The root cause was the SAN cable, which was plugged in during an unattended setup of ESX. The ESX host does not care about existing VMFS volumes and destroys the partition tables. According to VMware, this is fixed in 3.0.2.

If anyone likes, I can post the required "dangerous" fdisk commands for this horror scenario.

Regards

Michael

etamir
Enthusiast

Please post those commands, just for reference.

Thanks.

maxl80
Contributor

I have the same problem here with three 3.0.1 servers. Two still operate on the LUNs, but one can't get access to the VMFS3 LUN, so the filesystem can't have been destroyed, only corrupted. Any chance of getting my hands on the fdisk parameters for fixing this?

Regarding the comment that it should be fixed in 3.0.2: I tried reinstalling one of the four servers with 3.0.2 and still have no access to the LUN. I read that it could have been caused by the fibre cable?!
