VMware Cloud Community
CinciTech
Enthusiast

VM locks up, won't power off. Looking for logs.

I have a whitebox ESXi 6.0 server with just a couple of VMs, mostly for testing purposes, but it also hosts a Windows 2012 VM used to store home movies and stream them for family viewing, so when this goes down I hear about it. I also have a vCenter appliance and one other VM (Windows 2012). I upgraded this host from ESXi 5.5 U2 to ESXi 6.0 U2 a month or so ago. I also installed a driver VIB for a couple of NICs which were supported in my 5.5 install but not in 6.0.

Since the upgrade, the VM has locked up such that RDP doesn't work. The console screen is viewable from the vSphere interface (both thick client and web client), but it is frozen, with the time of death still showing on the Windows login screen. The first time this happened I tried to restart the guest OS from the vSphere interface (no luck) and then tried to power off the VM (this also did not work; the command hung at 0% in 'recent tasks'). At this point I could give it more commands, but the recent tasks panel would eventually say the VM couldn't process further commands. I then shut down my appliance and other VM and tried to restart the host, and ultimately found that the host had to be powered off via the hardware button. It's pretty serious for one VM to bring down the host, so I assume there may be a driver issue involved.

Now it has happened a second time. Upon finding my VM frozen, I immediately tried to power off the VM from the vSphere web interface (just in case the restart command was what hung things). Again the 'recent tasks' panel shows the power-off task stuck at 0%. For the moment I can only connect to the host remotely, so I am limited to SSH and whatever the vSphere web interface will let me do (later tonight I can log into the shell at the console or power down if need be).

With all that said, where can I go on the host to look up logs to help me find out what may have happened to allow a VM to bring down my host?

3 Replies
klapkaj
Enthusiast

Maybe the drivers installed after the upgrade are causing something to fail in networking?

If this isn't a production host, try a fresh 6.0 U2 install. If ESXi is currently installed on the same storage that holds the datastore, install ESXi to an SD card or USB stick instead.

It takes about half an hour, and then you will know whether this is caused by the upgrade.

If the host doesn't go down, you can view the logs from the console with ALT+F12.

CinciTech
Enthusiast

I also think it's possible that the NIC driver is the culprit, although I want to verify with logs rather than just labeling a problem without research and proper diagnosis. Unfortunately I don't remember whether the first lock-up occurred before or after I installed the drivers (it's a Realtek 8169 chipset, and I'm pretty miffed that VMware has decided to just blacklist an otherwise-working piece of hardware, but that's a rant for another forum...).

I will try ALT+F12 when I get back in front of the host console.  Is there nowhere I can access these logs via SSH or vSphere?

klapkaj
Enthusiast

Logs are stored at /var/log/....

There are lots of log files.
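
Since there are so many files, a small helper can narrow things down over SSH. This is only a rough sketch, not an official VMware tool: the function name scan_logs and the default keyword pattern are made up for illustration, and it assumes the shell on the host has grep with -E support. On ESXi 6.0, vmkernel.log and hostd.log in that directory are usually the first places to look, but treat that as a starting point rather than a diagnosis.

```shell
#!/bin/sh
# Sketch: list lines in *.log files that mention common trouble keywords.
# scan_logs and its default pattern are hypothetical helpers for illustration.

# scan_logs [DIR] [PATTERN]
#   DIR      directory to scan (defaults to /var/log)
#   PATTERN  extended regex, matched case-insensitively
scan_logs() {
    dir="${1:-/var/log}"
    pattern="${2:-error|fail|timeout|lost|reset}"
    for f in "$dir"/*.log; do
        # Skip the literal glob when no .log files exist.
        [ -f "$f" ] || continue
        # Passing /dev/null as an extra file forces grep to print the file name.
        grep -inE "$pattern" /dev/null "$f" || true
    done
}

# Example usage (the patterns are only guesses at what might be relevant):
#   scan_logs /var/log
#   scan_logs /var/log 'vmnic|link'
```

Each hit is printed as file:line-number:text, so you can compare timestamps in the matches against the time the VM froze.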

As for Realtek: yes, but on the other hand a supported Intel or Broadcom (now QLogic) NIC is not that expensive, and Realtek isn't a big player in the world of industry-standard servers. Realtek hardware is targeted at a different segment.
