VMware Cloud Community
JustyC
Enthusiast

ESXi 5.0 hosts hang on restart\shutdown


Having issues when we restart or shut down hosts from the console. The hosts (ESXi 5.0 U2) do not complete either process. We've tried 3 hosts and they all exhibit the same condition. Messages saying that a restart or shutdown has been started are displayed, but the hosts do not complete either process. After a very long period of time the displayed messages go away. The uptime command shows the hosts have not been cycled. Continuous pings to the hosts show no drops, though the vSphere Client does lose its connection. Host hardware is HP ProLiant 585 G7.

We turned off usbarbitrator and disabled the data mover and vmfs3. Our belief is that it is somehow related to the SAN. We use EMC VNX. Hosts have to be manually powered off\on. Not good! Has anyone found a resolution to this problem?

7 Replies
jrmunday
Commander

I haven't seen this behaviour before, but there are some obvious questions:

  1. What has changed in your environment? I'm assuming this worked fine before.
  2. Are you fully patched, including firmware, BIOS, etc.?
  3. Does the issue persist if you disconnect your EMC storage? This could rule out the storage.
  4. Are there any clues in the logs to give an indication of what's going wrong?
vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
marcelo_soares
Champion

Have you verified that your BIOS version is compatible with this ESXi version? On the VMware HCL I see that only BIOS A16 is supported for 5.0 U2.

Also, it would be good to test the process with the SAN disconnected, as mentioned by jrmunday, and to check the FLARE version on the VNX for HCL inconsistencies.

Additionally, are you seeing errors in the /var/log/vmkernel.log file while the reboot process is going on? It would be good to redirect the ESXi logs to a syslog server and check them after the problem occurs.

Marcelo Soares
JustyC
Enthusiast

Our network people rebooted all core Cisco switches while the VMware environment was up and running production. Needless to say, we had many alerts and problems. We also had SAN storage issues afterward, some of which existed before the switch reboots. These were high latency and lost-access-to-volume events. EMC rebooted the VNX SPs, turned off auto tier and updated their code. They say all that needed to be done has been done. I updated the host firmware (including BIOS) to the latest on HP's site. The hosts were all built using the ESXi download from HP. The host hardware is monitored by HP Insight Manager. There are no reported hardware issues.

marcelo_soares
Champion

So, as far as I can see, this was a one-time issue? Can you reproduce it in your environment now that everything is working fine? Since a lot of variables changed around the time you tested, it is very difficult to tell what caused the issue. SAN connection problems often cause this, but that is just a guess.

If you have the opportunity, redirect the host logs to a syslog server, place this host in maintenance mode, reboot it, and check whether the behavior repeats. That would be the only chance to know what is going on (or not).

Marcelo Soares
JustyC
Enthusiast

Performed some testing. Powered off one of the hosts (had to, because shutdown didn't work). Disconnected the SAN iSCSI cables. Powered the host back up and it was at the ESXi splash page about 12 minutes later. Issued a shutdown and the host shut down as expected in a few minutes. Will test again this morning. As mentioned, we had turned off usbarbitrator and the data mover, as these have been reported to increase reboot times. We may need to turn the data mover on again since the SAN hardware supports this feature. Will start looking at the host logs.

JustyC
Enthusiast

Testing again this morning. Shutdown or Restart commands issued at the console do not complete. But when the host is manually powered on, it comes up quickly with iSCSI connected. The vmksummary log shows:

2013-09-18T09:30:07Z bootstop: Host is powering off (When I issued the console shutdown request)

2013-09-18T10:23:55Z bootstop: Host has booted (after waiting 53 minutes I manually powered off\on host because no shutdown had taken place)

2013-09-18T10:26:32Z bootstop: Host is rebooting
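
For reference, the delay implied by these entries can be worked out with a short script. This is only a sketch: it assumes each entry has the form "<UTC timestamp> bootstop: <message>" exactly as quoted above (with the annotations in parentheses removed), and the function names are made up for illustration. Note that "Host has booted" is only written once the host is back up, so the figure covers the hang plus the manual power cycle and boot.

```python
from datetime import datetime

# The three vmksummary entries quoted above, annotations removed.
LOG_LINES = [
    "2013-09-18T09:30:07Z bootstop: Host is powering off",
    "2013-09-18T10:23:55Z bootstop: Host has booted",
    "2013-09-18T10:26:32Z bootstop: Host is rebooting",
]

STOP_MESSAGES = ("Host is powering off", "Host is rebooting")


def parse_entry(line):
    """Split one entry into (timestamp, message)."""
    stamp, _, message = line.partition(" bootstop: ")
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ"), message


def shutdown_gaps(lines):
    """Return the delay between each stop request and the next boot entry."""
    gaps = []
    pending = None  # timestamp of a stop request with no boot seen yet
    for line in lines:
        when, message = parse_entry(line)
        if message in STOP_MESSAGES:
            pending = when
        elif message == "Host has booted" and pending is not None:
            gaps.append(when - pending)
            pending = None
    return gaps


for gap in shutdown_gaps(LOG_LINES):
    print(gap)  # prints 0:53:48
```

The last entry ("Host is rebooting") has no matching boot entry yet, so it produces no gap.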

vmware_tam_neto
Contributor

I see this is an old post but I stumbled across it while trying to find a VMware KB related to what I ended up doing to fix hung hosts after initiating a shutdown/reboot command...

We are also running VMware ESXi 5.0 Update 2, with some of the hosts on U2 Express Patch 5. We occasionally have the same problem on reboot. We have no SANs; each host has internal storage only. Intermittently on reboot (we never really shut down), the host would hang for as long as we let it.

Anyway, when a host hangs after running "reboot" at the troubleshooting console, running this command gets the host to reboot within a few moments: "/etc/init.d/hostd stop"

Hope this helps,

Aaron
