Hi,
ESX 3.0.1. While doing a VMotion one of the machines reported a problem about not being able to access the vmx file.
After several attempts, and since we couldn't see an error, we stopped the VM and rebooted the host, after which it didn't boot anymore.
At the same time we lost access to the LUN: it is still visible with a LUN ID, but there is no VMFS3 file system mapped to it anymore.
So we see the LUN as 100% free, and could create a new VMFS3 on it if we wanted.
In other words, the VMFS3 is gone. How, we don't know.
/var/log/messages shows nothing of interest.
The vmkernel log shows:
Nov 16 11:56:45 HOST001 vmkernel: 6:00:32:12.497 cpu2:1067)World: vm 1067: 3864: Killing self with status=0x0:Success
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.562 cpu5:1037)LVM: 2294: Could not open device vmhba1:0:0:1, vol [4552c6d7-8e1ea291-a708-00137258, 4552c6d7-8e1ea291-a708-00137258b8ba, 1]: Failure
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.723 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.723 cpu5:1037)WARNING: Fil3: 1564: Failed to reserve volume f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.723 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 2 4552c6d8 8b2a5aa 1300125b bab85872 4 1 0 0 0 0 0
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.723 cpu5:1037)LVM: 2294: Could not open device vmhba1:0:0:1, vol [4552c6d7-8e1ea291-a708-00137258, 4552c6d7-8e1ea291-a708-00137258b8ba, 1]: Failure
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.729 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.729 cpu5:1037)WARNING: Fil3: 1564: Failed to reserve volume f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.729 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 2 4552c6d8 8b2a5aa 1300125b bab85872 4 1 0 0 0 0 0
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.734 cpu5:1037)LVM: 2294: Could not open device vmhba1:0:0:1, vol [4552c6d7-8e1ea291-a708-00137258, 4552c6d7-8e1ea291-a708-00137258b8ba, 1]: Failure
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.741 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0
The VM has already been restored on another LUN for the time being, so we have some time to analyse the problem, but the LUN is really empty. What happened?
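For anyone hitting the same symptoms, a few read-only checks from the ESX 3.x service console can narrow down whether only the partition table or the VMFS itself is gone. This is a sketch; the device names (vmhba1:0:0, /dev/sdb) are placeholders for whatever your host shows:

```
# Map VMkernel device names to service console devices and VMFS volumes
esxcfg-vmhbadevs -m

# Inspect the partition table of the suspect LUN (read-only).
# A healthy VMFS LUN normally carries one partition of type "fb" (VMware VMFS).
fdisk -lu /dev/sdb

# Rescan the HBA and refresh VMFS volumes without touching the disk
esxcfg-rescan vmhba1
vmkfstools -V
```

If fdisk shows an empty or foreign partition table while the LUN size is correct, the data may still be intact and only the table is gone.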
Call support. The same thing happened to me: the partition table of the file system was corrupt. I rebuilt the file system with a few easy commands and all of my VMs were back. If you like, I can post the commands I used, but it would be best to get the info from them.
Mike
Hi Hompie,
Have you done an unattended install of an ESX server, using a kickstart file, on a host that could see the same LUN?
Gert
Hi,
No, this installation was done months ago and then upgraded to 3.0.1 two weeks ago.
There are four LUNs on the system, and one of them got lost. VMotion never had problems before, and I also don't think it's VMotion related. The VMFS simply disappeared.
Can you see that LUN under Storage Adapters on the Configuration tab?
Hi,
I can see the LUN, but indeed I suspect that the partition table is corrupt.
Hi,
Please post the commands; I'm still waiting for feedback from support.
Hi, I have seen this at a client of ours.
I'll check offline for the document.
Hi,
The problem is already solved.
Hi,
After logging a call with VMware, they solved the problem remotely.
They advise to always open a call, and note that while this is not part of regular support, they will try to help.
Apparently the VMFS partition table was destroyed; this can be caused by Windows writing a signature to the disk.
We don't have any Windows hosts attached to the LUN, so that is very weird, but I can imagine it went wrong during the 3.0.1 upgrade.
Anyhow, they solved it using fdisk, and advised not to try this at home; always call support.
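For reference, this is roughly the shape of the fix support performs in the "table wiped, data intact" case on ESX 3.x. Everything below is a hedged sketch, not a recipe: the device name (/dev/sdb), the single-partition layout, and the 128-sector start offset are assumptions that only hold for a LUN originally partitioned by ESX 3.x with its default alignment, and writing a wrong table destroys the volume for good.

```
# DANGEROUS: only with VMware support on the line, and only if the VMFS
# data itself is intact and just the partition table entry was wiped.
fdisk /dev/sdb
#   n              new primary partition 1, spanning the whole disk
#   t, fb          set the partition type to fb (VMware VMFS)
#   x              enter expert mode
#   b, 1, 128      move the start of partition 1 to sector 128
#                  (the alignment ESX 3.x used when it created the VMFS)
#   w              write the table and exit
vmkfstools -V      # rescan; the volume should reappear under /vmfs/volumes
```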
Hi,
I have the same problem. I lost a Lun on ESX.
Can you please post the instructions here, or mail me the instructions to **jagroep71*at*hotmail.com**
Please leave out the asterisks in the email address.
Thanks,
Jagroep
Hi, is it possible to get this document as well? Our test environment faced this yesterday.
kim
Hi can you please send me the document? My e-mail address is afokkema \{at} gmail dot com
Wouldn't it be easier, if VMware are reading this, for them to post it as a KB article?
Same problem here; still no commands published by anyone?
I fixed it by holding my breath and using fdisk. I'd have to dig up the commands at the office (home sick with flu at the moment).
Thanks, I've already contacted VMware support... they did some fancy tricks with fdisk, but unfortunately the LUN was really damaged and fdisk couldn't save me anymore.
We had the same issue after reinstalling a host.
VMware support helped me fix the lost partition table information using fdisk. The root cause was the SAN cable, which was still plugged in during the unattended setup of ESX. The ESX installer does not care about existing VMFS volumes and destroys the partition tables. According to VMware this is fixed in 3.0.2.
If anyone likes, I can post the required "dangerous" fdisk commands for this case of horror.
Regards
Michael
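To avoid the reinstall scenario Michael describes, the usual precaution is to unplug or zone out the SAN cables during an unattended install, or to pin the kickstart partitioning to the local disk only. A hedged sketch of what that might look like in an ESX 3.x kickstart file; the directives are standard Anaconda kickstart, and sda as the local disk is an assumption for your hardware:

```
# Only clear and partition the local disk; never let the installer
# touch the SAN LUNs.
clearpart --all --drives=sda
part /boot --fstype ext3 --size=100 --ondisk=sda
part /     --fstype ext3 --size=5000 --ondisk=sda
```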
Please post those commands, just for reference...
Thanks.
I have the same problem here with three 3.0.1 servers. Two still operate on the LUNs, but one can't get access to the VMFS3 LUN, so the file system can't be destroyed, only corrupted. Any chance of getting hold of the fdisk parameters for fixing this?
Regarding the comment that it should be fixed in 3.0.2: well, I tried to reinstall one of the four servers with 3.0.2 and still no access to the LUN. I read it could have been caused by the fibre cable?!