Hi,
ESX 3.0.1. While doing a VMotion one of the machines reported a problem about not being able to access the vmx file.
After several attempts, and since we couldn't see an error, we stopped the VM and rebooted the host, after which it didn't boot anymore.
At the same time we lost access to the LUN: it is still visible with a LUN ID, but there is no VMFS3 file system mapped to it anymore.
So we see the LUN as 100% free, and could create a new VMFS3 on it if we wanted.
In other words, the VMFS3 is gone. How, we don't know.
/var/log/messages shows nothing of interest.
The vmkernel log shows:
Nov 16 11:56:45 HOST001 vmkernel: 6:00:32:12.497 cpu2:1067)World: vm 1067: 3864: Killing self with status=0x0:Success
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.562 cpu5:1037)LVM: 2294: Could not open device vmhba1:0:0:1, vol [4552c6d7-8e1ea291-a708-00137258, 4552c6d7-8e1ea291-a708-00137258b8ba, 1]: Failure
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.723 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.723 cpu5:1037)WARNING: Fil3: 1564: Failed to reserve volume f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.723 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 2 4552c6d8 8b2a5aa 1300125b bab85872 4 1 0 0 0 0 0
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.723 cpu5:1037)LVM: 2294: Could not open device vmhba1:0:0:1, vol [4552c6d7-8e1ea291-a708-00137258, 4552c6d7-8e1ea291-a708-00137258b8ba, 1]: Failure
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.729 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.729 cpu5:1037)WARNING: Fil3: 1564: Failed to reserve volume f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.729 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 2 4552c6d8 8b2a5aa 1300125b bab85872 4 1 0 0 0 0 0
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.734 cpu5:1037)LVM: 2294: Could not open device vmhba1:0:0:1, vol [4552c6d7-8e1ea291-a708-00137258, 4552c6d7-8e1ea291-a708-00137258b8ba, 1]: Failure
Nov 16 11:57:23 HOST001 vmkernel: 6:00:32:50.741 cpu5:1037)FSS: 343: Failed with status 0xbad000e for f530 28 1 4552c6d8 8b2a5aa 1300125b bab85872 0 0 0 0 0 0 0
The VM has already been restored on another LUN for the time being, so we have some time to analyse the problem, but the LUN is really empty. What happened?
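For anyone hitting the same symptoms, a few read-only checks from the ESX 3.x service console can narrow down whether only the partition table or the VMFS itself is gone. This is a sketch; the device names (vmhba1:0:0, /dev/sdb) are placeholders for whatever your host shows:

```
# Map VMkernel device names to service console devices and VMFS volumes
esxcfg-vmhbadevs -m

# Inspect the partition table of the suspect LUN (read-only).
# A healthy VMFS LUN normally carries one partition of type "fb" (VMware VMFS).
fdisk -lu /dev/sdb

# Rescan the HBA and refresh VMFS volumes without touching the disk
esxcfg-rescan vmhba1
vmkfstools -V
```

If fdisk shows an empty or foreign partition table while the LUN size is correct, the data may still be intact and only the table is gone.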
Call support. The same thing happened to me: the partition table of the file system was corrupt. I rebuilt the file system with a few easy commands and all of my VMs were back. If you like, I can post the commands I used, but it would be best to get the info from them.
Mike
Hi Hompie,
Have you done an unattended install of an ESX server, using a kickstart file, on a host that could see the same LUN?
Gert
Hi,
No, this installation was done months ago and then upgraded to 3.0.1 two weeks ago.
There are four LUNs on the system, and one of them got lost. VMotion never had problems before, and I also don't think it's VMotion related. The VMFS simply disappeared.
Can you see that LUN under Storage Adapters on the Configuration tab?
Hi,
I can see the LUN, but indeed I suspect that the partition table is corrupt.
Hi,
Please post the commands; I'm still waiting for feedback from support.
Hi, I have seen this at a client of ours.
I'll check offline for the document.
Hi,
The problem is already solved.
Hi,
After logging a call with VMware, they solved the problem remotely.
They advise to always open a call, and note that while this is not part of regular support, they will try to help.
Apparently the VMFS partition table was destroyed; this can be caused by Windows writing a signature to the disk.
We don't have any Windows hosts attached to the LUN, so that is very weird, but I can imagine it went wrong during the 3.0.1 upgrade.
Anyhow, they solved it using fdisk, and advised not to try this at home; always call support.
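For reference, this is roughly the shape of the fix support performs in the "table wiped, data intact" case on ESX 3.x. Everything below is a hedged sketch, not a recipe: the device name (/dev/sdb), the single-partition layout, and the 128-sector start offset are assumptions that only hold for a LUN originally partitioned by ESX 3.x with its default alignment, and writing a wrong table destroys the volume for good.

```
# DANGEROUS: only with VMware support on the line, and only if the VMFS
# data itself is intact and just the partition table entry was wiped.
fdisk /dev/sdb
#   n              new primary partition 1, spanning the whole disk
#   t, fb          set the partition type to fb (VMware VMFS)
#   x              enter expert mode
#   b, 1, 128      move the start of partition 1 to sector 128
#                  (the alignment ESX 3.x used when it created the VMFS)
#   w              write the table and exit
vmkfstools -V      # rescan; the volume should reappear under /vmfs/volumes
```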
Hi,
I have the same problem. I lost a Lun on ESX.
Can you please post the instructions here, or mail me the instructions to **jagroep71*at*hotmail.com**
Please leave out the asterisks in the email address.
Thanks,
Jagroep
Hi, is it possible to get this document as well? Our test environment faced this yesterday.
kim
Hi can you please send me the document? My e-mail address is afokkema \{at} gmail dot com
Wouldn't it be easier, if VMware are reading this, for them to post it as a KB article?
Same problem here; still no commands published by anyone?
I fixed it by holding my breath and using fdisk. I'd have to dig up the commands at the office (home sick with flu at the moment).
Thanks, I've already contacted VMware support... they did some fancy tricks with fdisk, but unfortunately the LUN was really damaged and fdisk couldn't save me anymore.
We had the same issue after reinstalling a host.
VMware support helped me fix the lost partition table information using fdisk. The root cause was the SAN cable, which was still plugged in during the unattended setup of ESX. The ESX installer does not care about existing VMFS volumes and destroys the partition tables. According to VMware this is fixed in 3.0.2.
If anyone likes, I can post the required "dangerous" fdisk commands for this case of horror.
Regards
Michael
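To avoid the reinstall scenario Michael describes, the usual precaution is to unplug or zone out the SAN cables during an unattended install, or to pin the kickstart partitioning to the local disk only. A hedged sketch of what that might look like in an ESX 3.x kickstart file; the directives are standard Anaconda kickstart, and sda as the local disk is an assumption for your hardware:

```
# Only clear and partition the local disk; never let the installer
# touch the SAN LUNs.
clearpart --all --drives=sda
part /boot --fstype ext3 --size=100 --ondisk=sda
part /     --fstype ext3 --size=5000 --ondisk=sda
```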
Please post those commands, just for reference...
Thanks.
I have the same problem here with three 3.0.1 servers. Two still operate on the LUNs, but one can't get access to the VMFS3 LUN, so the file system can't be destroyed, only corrupted. Any chance of getting hold of the fdisk parameters for fixing this?
Regarding the comment that it should be fixed in 3.0.2: well, I tried to reinstall one of the four servers with 3.0.2 and still no access to the LUN. I read it could have been caused by the fibre cable?!