VMware Cloud Community
AramiS_1970
Contributor

ESX 4 boot fails at vsd-mount

Hi,

We have two ESX4 hosts connected to an IBM DS3400 storage system.

With the FC cables plugged in, both ESX hosts stop booting at vsd-mount.

With the FC cables unplugged, the hosts boot fine.

After an ESX host is up and running, if I plug in the FC cables I can see and successfully work with my storage LUNs, and can start all VMs as well.

On every reboot I have to unplug the FC cables to allow ESX to come up! The problem is the same on both servers.

The problem happens in all cases, with or without any LUN presented to the ESX.

Any ideas?

Thanks.

krishnaprasad
Hot Shot

When you face this issue, you can get into a maintenance shell, right? Can you check the logs in /var/log and post them in this thread?
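
For example, something along these lines (a rough sketch; the exact log file names can vary by build):

# ls /var/log                # see which logs exist on this host
# less /var/log/messages     # service console messages around the failed boot
# less /var/log/vmkernel     # VMkernel log, if the host got far enough to write one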

goodserg
Contributor

I have the same boot problem!

krishnaprasad
Hot Shot

vsd-mount tries to load the service console VMDK file during bootup.

Can you see a vsd-mount script in your /etc/rc.d/rc3.d directory?
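
Something like this should show it, if it is there:

# ls /etc/rc.d/rc3.d | grep -i vsd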

goodserg
Contributor

I have another server with the same problem.

This is the result of the ls command:

# ls
K10psacct             K86nfslock     S12restorecond   S56xinetd
K15vmware-webAccess   K87portmap     S12syslog        S58ntpd
K20nfs                K89netplugd    S13mcstrans      S62vmware-late
K35winbind            K89rdisc       S18rpcidmapd     S90crond
K50netconsole         K92iptables    S19rpcgssd       S97vmware-vmkauthd
K50snmpd              S00vmkstart    S19slpd          S98mgmt-vmware
K50snmptrapd          S01vmware      S21wsman         S98sfcbd-watchdog
K69rpcsvcgssd         S08ip6tables   S26lm_sensors    S99local
K74nscd               S09firewall    S44acpid         S99vmware-autostart
K75netfs              S10network     S55sshd          S99vmware-vpxa

mbaharde
Contributor

Same issue here with a CLARiiON array... anybody have any updates or any luck getting past this?

kschurig
Contributor

Same here with a Dell PE 710 server and an HP MSA 1000 SAN.

joukom
Contributor

At least one 100% reproducible case for this error is to install ESX4 on a RAID array on a PERC5 controller (with the service console installed on a datastore on the same controller) and then swap the controller for a PERC6. The controller imports the RAID array OK and starts booting OK, but fails at vsd-mount.

I've done this several times, and apparently something changes in the volume setup. Both controllers use the same megaraid_sas driver, the LUN path is preserved, and so is the VMFS ID. In the recovery console I can see the datastore device under /vmfs/devices/disks, but /vmfs/volumes is empty. My guess is that when the controller ID is different, ESX will not mount the datastore.
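
Roughly what I ran in the recovery console:

# ls /vmfs/devices/disks   # the datastore device shows up here
# ls /vmfs/volumes         # ...but this comes back empty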

Does anyone know if there is a CLI command to import the datastore? I know it could be done via the vSphere Client, but as the system does not get past vsd-mount, I cannot connect to it with the client.
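
One candidate I haven't been able to verify is esxcfg-volume, which is supposed to handle VMFS volumes that ESX 4 treats as snapshots (which may be exactly what happens when the controller ID changes). A rough, untested sketch:

# esxcfg-volume -l           # list VMFS volumes detected as snapshots
# esxcfg-volume -M <uuid>    # persistently mount one, keeping its signature

If someone has tried this from the recovery shell, I'd like to hear whether it works.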

I've got one ESX server that has VM datastores on the same LUN where ESX itself is installed, and I should swap that RAID controller. Unless someone knows a way to get past this issue, I'll have to move the VMs to another datastore first, as apparently ESX4 cannot be installed to a LUN while preserving the datastore (ESX3.5 could do that).

- J

dl9msr
Contributor

Hello forum,

I have the same problem with an IBM x3250 after building a new RAID 1 array on the LSI controller.

What can I do?

goodserg
Contributor

If you use a MegaRAID or LSI controller, try installing the latest VMware update (it fixes a boot issue with the megaraid_sas driver).

joukom
Contributor

> If you use a MegaRAID or LSI controller, try installing the latest VMware update (it fixes a boot issue with the megaraid_sas driver).

I tried installing both patches available for ESX4, but that does not help. I guess you are referring to the RAID cache flush problem that was patched (at least I could not find any patches whose release notes mention boot problems), but the cache was flushed OK before I swapped the PERC5 for the PERC6 (I restarted the system and let it boot until the PERC had initialized, which also flushes the cache, just to be on the safe side).

The datastore itself is intact. I eventually replaced the RAID cards anyway: I booted the machine with a USB-installed ESX4i, used Storage VMotion to move the VMs to another RAID array, reinstalled ESX4 on the original array and moved the VMs back. So there's nothing wrong with the datastore; ESX is just unable to mount the service console VMDK if anything changes on the physical path to the VMFS.

- J

dl9msr
Contributor

I downloaded patch ESX400-200906409-BG; I hope you mean this patch.

I have no network connection to the host because the boot fails.

How can I install the patch?

joukom
Contributor

I thought he meant ESX400-200907401-BG. But AFAIK that corrects a bug that causes the megaraid_sas driver not to flush the write cache at shutdown. So if you have broken something by letting the host shut down and then removing the RAID controller battery / letting it run dry, the damage is already done (i.e. some data in the controller cache memory was not written to the disk and is now lost).

If that is the case, the disk is now corrupted, and installing the patch will not fix it anymore; it will just prevent it from happening again.

In my experience (I tried it several times by intentionally causing the boot failure by swapping the RAID cards), the error is always identically the same. That's also why I don't believe it results from data not being written to the disk - that would probably lose different data each time, so the symptoms would also differ.

I'd suggest you install ESX4i Installable to a USB stick, boot the server from that and see if you can recover the data from the disk (if there's a datastore, most likely you can read it and save all the VMs).

- J

dl9msr
Contributor

My ESX 4 host was running well on drive 0 without a RAID configuration.

I shut down the host and built a new RAID 1 with resync in the LSI Configuration menu.

After restarting the ESX 4 host, the boot fails at vsd-mount.

joukom
Contributor

So this supports my theory that if anything changes in the RAID volume topology (in earlier cases connecting/disconnecting cables, in my case replacing the RAID controller, and in your case adding a RAID volume), ESX is unable to mount the underlying VMFS that contains the service console VMDK, and thus halts the boot process.

It looks like the best option is to open a support call with VMware; they might be able to provide instructions on how the boot process can be fixed (and hopefully supply a patch that prevents this from happening).

I got past the situation by reinstalling ESX, and currently I don't have any spare PERC6 cards to experiment with any further, so I'm not going to open a support issue at the moment. But if someone else is suffering from this and can open one, I'm anxious to see the answers, just to be future-proof.

- J

dl9msr
Contributor

Hello joukom

Thanks for your help.

I have no shared storage, and if I reinstall ESX 4 I will lose all my data.

I'll open a support call with VMware.

Do you know how I can install the ESX zip patch from the console?
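
Would something like this work from the service console? Just a guess on my part, using the bundle I downloaded earlier; please correct the syntax if it is wrong:

# esxupdate --bundle=ESX400-200906409-BG.zip update   # assuming the downloaded zip keeps this name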

AramiS_1970
Contributor

Hello,

Any answer from VMware support? The problem is still the same for everybody, it seems...

My ESX hosts are updated with the latest patches, and to bring them up I still have to disconnect the FC cables and plug them back in after boot...

Thanks

paolo

kschurig
Contributor

I am out of the office from 14.09.09 until 02.10.09.

E-mails will not be forwarded and can only be answered after my return.

My stand-in for PS-Support is Mr. Thomas Mueller; please send e-mails to ps-support@t-systems.com.

Kind regards,

Kai-Uwe Schurig

AramiS_1970
Contributor

English please!

Triclaw
Contributor

Hi All,

Having the same issue, already for the second time. Because we upgraded from 3.5 to 4.0, I was able to boot with 3.5 again. Then I cleaned out what I could find of 4.0 and upgraded once more to 4.0. That worked out all right, and I could save all the data. Painful, but at least I still have all my VMs.

Regards,

T.
