VMware Cloud Community
techpaul
Contributor

ESX 3.5 Slow boot Loading VMkernel qla2300_707_vmw.o

I installed Update 4 for ESX 3.5 via Update Manager. After all the required updates were installed, the ESX host performed its final reboot but did not come back up. On the console I could see it was stuck at the line "Loading VMkernel qla2300_707_vmw.o". I restarted the offending host a few times, but it still would not get past that message.

A couple of articles on the VMware community site suggest that this step can take up to 30 minutes and that you should just sit tight and wait. I waited 45 minutes and it was still stuck at the same screen.

It eventually boots up after around 1hr 30 mins!

I have installed all the latest firmware updates for the hardware, an HP BL460c G1.

I also ran the command below after reading an article describing a fix. It did nothing to speed up the boot, which still takes around 1 hr 30 mins.

esxcfg-advcfg -s 10 /Scsi/ConflictRetries

Has anyone else experienced this problem? I applied Update 4 to all 5 of the DR ESX hosts without any issues; this is the first ESX host in our prod environment to have Update 4 installed.

Thanks

P

14 Replies
DillonMiller
Contributor

One thing you may want to check is the free space in the log folder on the system.

If you run the command "df -h" and look for /var/log, just make sure it has enough space. I had an ESX box do the same thing when it had no space left for logs. The output of df -h looks like this:

Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda2   4.9G  3.0G  1.7G   65%   /
/dev/sda1   99M   26M   69M    28%   /boot
none        132M  0     132M   0%    /dev/shm
/dev/sda6   2.0G  126M  1.7G   7%    /var/log
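
For anyone who wants to run this check across several hosts, here is a minimal sketch that parses `df -h` output and lists mount points at or above a usage threshold. The function names (`parse_df`, `nearly_full`) and the 90% threshold are assumptions made for illustration, not anything ESX provides; the sample text is the output shown above.

```python
# Sketch only: parse_df, nearly_full and the 90% threshold are illustrative choices.
SAMPLE = """\
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 4.9G 3.0G 1.7G 65% /
/dev/sda1 99M 26M 69M 28% /boot
none 132M 0 132M 0% /dev/shm
/dev/sda6 2.0G 126M 1.7G 7% /var/log
"""


def parse_df(text):
    """Turn `df -h` output into a list of dicts, one per filesystem row."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly six fields and a Use% value ending in '%'.
        # The header (seven tokens) and wrapped lines are skipped.
        if len(parts) != 6 or not parts[4].endswith("%"):
            continue
        try:
            use_pct = int(parts[4].rstrip("%"))
        except ValueError:
            continue
        rows.append({"filesystem": parts[0], "use_pct": use_pct, "mount": parts[5]})
    return rows


def nearly_full(rows, threshold=90):
    """Return the mount points whose usage is at or above the threshold."""
    return [r["mount"] for r in rows if r["use_pct"] >= threshold]


if __name__ == "__main__":
    print(nearly_full(parse_df(SAMPLE)))
```

An empty list means no filesystem in the output is close to full, which is the case for the sample above.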

techpaul
Contributor

Hi, I appreciate the speedy response. Here is the output from the df -h command.

Is /var the log directory you are referring to when you mention /var/log?

# df -h
Filesystem         Size  Used  Avail  Use%  Mounted on
/dev/cciss/c0d0p2  5.0G  2.9G  1.9G   61%   /
/dev/cciss/c0d0p1  244M  31M   201M   14%   /boot
/dev/cciss/c0d0p3  2.0G  124M  1.8G   7%    /opt
none               391M  0     391M   0%    /dev/shm
/dev/cciss/c0d0p5  2.0G  33M   1.9G   2%    /tmp
/dev/cciss/c0d0p6  2.0G  298M  1.6G   16%   /var

vm_arch
Enthusiast

What is the actual model of the HBA in the server?

I seem to recall reading somewhere on here about several models of QLogic HBA (especially ones OEMed by Dell) that, for one reason or another, would appear to run under the qla2300_707 driver but weren't actually supported and had major reliability issues (such as yours).

techpaul
Contributor

Actual model is:

QLogic QMH2462 4Gb FC HBA for HP c-Class BladeSystem

thanks

pironet
Enthusiast

Hi Paul,

The trick shown at deinoscloud.wordpress.com works in the scenario where you have entries like these in your vmkernel log files:

SCSI: vm 1043: 5522: Sync CR at 64
SCSI: vm 1043: 5522: Sync CR at 48
SCSI: vm 1043: 5522: Sync CR at 32
SCSI: vm 1043: 5522: Sync CR at 16
SCSI: vm 1043: 5522: Sync CR at 0
WARNING: SCSI: 5532: Failing I/O due to too many reservation conflicts

Check your vmkernel log files and, if you like, paste them here ;)
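
If the log is long, a small script can count these entries for you. The following is only a sketch: the function name `summarize_conflicts` is made up for illustration, and the two patterns are taken directly from the log lines quoted above, so other builds may word the messages differently.

```python
import re

# Patterns copied from the quoted log excerpt; treat them as assumptions
# about the message wording rather than a complete list.
SYNC_CR = re.compile(r"Sync CR at (\d+)")
FAILING = "Failing I/O due to too many reservation conflicts"

SAMPLE = """\
SCSI: vm 1043: 5522: Sync CR at 64
SCSI: vm 1043: 5522: Sync CR at 48
SCSI: vm 1043: 5522: Sync CR at 32
SCSI: vm 1043: 5522: Sync CR at 16
SCSI: vm 1043: 5522: Sync CR at 0
WARNING: SCSI: 5532: Failing I/O due to too many reservation conflicts
"""


def summarize_conflicts(lines):
    """Count 'Sync CR' countdown lines and failed-I/O warnings in log lines."""
    sync_cr = 0
    failed_io = 0
    for line in lines:
        if SYNC_CR.search(line):
            sync_cr += 1
        if FAILING in line:
            failed_io += 1
    return {"sync_cr": sync_cr, "failed_io": failed_io}


if __name__ == "__main__":
    print(summarize_conflicts(SAMPLE.splitlines()))
```

To use it on a real host you would pass in the lines of your vmkernel log file instead of `SAMPLE`. Zero counts suggest the reservation-conflict scenario does not apply.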

Rgds,

Didier

I wish I was a virtual machine :) http://deinoscloud.wordpress.com
kjb007
Immortal

How many LUNs do you have zoned/masked to the ESX host? What LUN IDs are associated with them? If you have any IDs higher than 256, you will see problems with the Fibre Channel scan. What happens when you run an HBA rescan once the server has booted? Does that take a long time to complete as well?

-KjB

VMware vExpert

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
DillonMiller
Contributor

By default /var/log is its own partition, but in your case it will be a subdirectory under /var. However, your df output doesn't show the problem I was having, so it's something else.

davismisbehavis
Enthusiast

Did you ever manage to resolve this issue? We have just upgraded a number of DL580 G5 servers to 3.5 Update 4, and it takes about 1.5 hours to get past the following stage of boot-up:

Loading VMkernel 'HBA driver name', i.e. Loading VMkernel qla2300_707_vmw.o (options: ")

We are running HP / QLogic HBAs, and they're shown as ISP2432s within ESX when the server eventually starts up.

I'd appreciate any assistance anyone can offer on this problem, as these servers are part of our production environment.

pironet
Enthusiast

The following trick saved my day in my environment: esxcfg-advcfg -s 10 /Scsi/ConflictRetries

Read the complete post at http://deinoscloud.wordpress.com/2009/08/12/do-you-suffer-slow-boot-up-with-your-esx-host/

Cheers,

Didier

I wish I was a virtual machine :) http://deinoscloud.wordpress.com
davismisbehavis
Enthusiast

Hi Didier,

Thanks for the prompt response. I did stumble upon your blog post when looking into the issue. The problem I have is that I'm not seeing the SCSI conflicts in the logs that you detail in your post.

I'd be wary of making the change without fully understanding the issue, so I've logged a call with VMware support to look into this. I've read a couple of other posts where people reported that minor differences in the hardware revision of the HBA caused them issues. This might explain why one server was fine and the other one was not.

I'll report back with the findings when I've worked through it with VMware and HP.

Cheers

D.Misbehavis

pironet
Enthusiast

Hi Davis, any news from the ticket you raised?

Cheers,

I wish I was a virtual machine :) http://deinoscloud.wordpress.com
davismisbehavis
Enthusiast

Hi

As it turned out, we have unsupported storage. We use HP MSA 1000s, which are on the 3.5 U4 HCL; however, due to their firmware revision they show up as Compaq MSA 1000s, which are not supported on the HCL.

VMware tech support basically stated that this was the cause of the issue. However, one of our servers boots up fine without any delay, and it has the same HBA hardware revision and firmware release, as well as the same ESX build and patch level, as one of the servers that doesn't.

We are moving to EMC CX4 in January, and according to the logs the MSAs appear to be working without issue. We are going to stick with it and run the risk for the next month or so.

pironet
Enthusiast

Hi Davis,

Perhaps a dumb question, but is it possible to upgrade your MSA storage to the latest HP firmware?

Rgds,

Didier

I wish I was a virtual machine :) http://deinoscloud.wordpress.com
davismisbehavis
Enthusiast

I'm currently working with HP on that one and still awaiting an answer.

The MSA 1000 is listed on the HCL as supported with V7.x of the firmware. This is the latest revision of the active/active firmware, so my thinking is that it would resolve the supportability issue.

Whether it would solve the slow boot is another question. As I mentioned, one of the hosts boots without issue, and it sees the MSA as a Compaq one in the /proc/scsi/scsi file, the same as the 2 slow hosts. We need to get it supported first and then revisit with VMware.
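
To compare what each host reports, a minimal sketch like the one below can pull the vendor, model and revision fields out of /proc/scsi/scsi text. The function name `scsi_devices` is an assumption for illustration, and the sample text is invented (the vendor, model and revision strings are placeholders, not copied from our hosts); it only follows the usual layout of that file.

```python
import re

# Matches the "Vendor: ... Model: ... Rev: ..." line of each device entry.
ENTRY = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:\s*(\S*)")

# Invented sample in the usual /proc/scsi/scsi layout; values are placeholders.
SAMPLE = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: COMPAQ   Model: MSA1000 VOLUME   Rev: 1.00
  Type:   Direct-Access                    ANSI SCSI revision: 04
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: HP       Model: MSA VOLUME       Rev: 7.00
  Type:   Direct-Access                    ANSI SCSI revision: 04
"""


def scsi_devices(text):
    """Return one dict per device with its vendor, model and revision strings."""
    return [
        {"vendor": m.group(1), "model": m.group(2), "rev": m.group(3)}
        for m in ENTRY.finditer(text)
    ]


if __name__ == "__main__":
    for dev in scsi_devices(SAMPLE):
        print(dev["vendor"], "|", dev["model"], "|", dev["rev"])
```

Running this against the file from a fast host and a slow host would show whether they really report the same vendor string and firmware revision.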
