VMware Cloud Community
Tomson74
Contributor

Host Randomly Shuts Down With PSOD

Greetings everyone.

I am having a problem with one of my hosts, which is running VMware ESXi, 5.5.0, 2068190 (image profile HP-ESXi-5.5.0-Update2-iso-5.77.3).

The server is an HP ProLiant DL360e Gen8. I should add that I am new to VMware. I have one other host that is not having this issue; that host is also an HP ProLiant DL360e Gen8, running:

VMware ESXi, 5.1.0, 1065491 Image Profile HP-ESXi-5.1.0-standard-iso

Each host has three VMs on it.

The issue used to happen once a week; it is now happening every two weeks. I grabbed the system logs, and I also browsed to http://<hostname or ip address>/host to see what I could find.

To be honest, I am not sure which file to look at. I did look at vpxa.log, syslog.log, sysboot.log, vmkwarning.log and vmkeventd.log.

Any help would be appreciated.

I did contact VMware support; however, at the time (about a month ago) they told me that the data was "inconclusive".

Thanks again.

17 Replies
cykVM
Expert

Hi Tomson74,

Maybe you have a problem similar to mine; see the long discussion (with no real solution) here: HP Proliant DL380e Gen8, HP OEM VMWare ESXi 5.5 Update 2 keeps crashing (PSOD)

OK, I have a DL380e Gen8, but if the hardware you run (storage controller, NICs) is identical, it might be a similar issue.

If possible, take a screenshot of the PSOD if you have a built-in iLO card and access to the console.

The only solution for me was to go back to VMware ESXi 5.5 Update 1 (HP customized).

If your server is still under warranty, you may also contact HP support and point them to this thread or mine.

cykVM

brunofernandez1

Do you have a picture of the PSOD?

Maybe with that information we can do more...

Regards from Switzerland, B. Fernandez http://vpxa.info/
Tomson74
Contributor

Thank you both for replying. Here is the picture of the PSOD.

cykVM
Expert

That picture is a bit oddly cropped ;)

But it's the same "No heartbeat" PSOD I got with 5.5 Update 2. See the discussion I linked above.

Could you give some more details on the hardware in use (CPU, RAM, storage controller, NICs) and the driver versions?

Tomson74
Contributor

Sorry about the picture; I took it with my phone, lol.

I looked at that thread and want to roll back to Update 1.

That said, I do have a question. I have three VMs on that host.

(We don't have a SAN here, a long story about that lol)

If I export an OVF/OVA of the VMs, roll back to Update 1, and import the OVF/OVA files, would that restore the VMs the way they were?

Also, I ran into trouble when I tried booting from a USB drive to install ESXi: the installer didn't see the local hard drive, just the USB drive. Is this typical?

Thanks again.

cykVM
Expert

You should use the HP customized VMware installation image for 5.5 Update 1 if you also use an HP Smart Array B320i/B120i. The vanilla (genuine VMware) installation image does not include a driver for that storage controller (the 'hpvsa' driver is only available on the customized images).

So you may have used an original VMware installation image for your USB install test.
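To confirm which image is actually installed, one option is to run `esxcli software vib list` in the ESXi Shell (or over SSH) and look for the hpvsa driver VIB. The minimal Python sketch below parses that command's text output after you save it to a file on your workstation. It assumes the driver VIB's name contains `hpvsa` (on HP images it usually shows up as `scsi-hpvsa`, but treat that as an assumption) and that each data row starts with the Name and Version columns; `find_vib` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def find_vib(vib_list_output: str, needle: str = "hpvsa") -> Optional[Tuple[str, str]]:
    """Return (name, version) of the first VIB whose name contains `needle`.

    Assumes the `esxcli software vib list` layout where each data row starts
    with the VIB name followed by its version. The header and the dashed
    separator line are skipped naturally because their first column never
    contains the needle.
    """
    for line in vib_list_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and needle in parts[0]:
            return parts[0], parts[1]
    return None  # no matching VIB: most likely a vanilla (non-HP) image
```

For example, `find_vib(open("vibs.txt").read())` (the file name is a placeholder) returning `None` would suggest the installed image has no hpvsa driver, which matches the "local disk not visible" symptom.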

Tomson74
Contributor

OK, awesome, that makes sense. I will do that.


If I export an OVF/OVA of the VMs, roll back to Update 1, and import the OVF/OVA files, would that restore the VMs the way they were?



I promise I'm done after this ;)

cykVM
Expert

No problem, ask as many questions as you like ;)

To be honest, I have never tried that. I would take a backup of the VMs first, but the datastore should generally not be touched if you install 5.5 U1 over U2.

You could then re-import the VMs into your inventory directly from that datastore.
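If you go the re-register route, it helps to have a list of the VM configuration files on the datastore before and after the reinstall. The minimal Python sketch below lists every `.vmx` file under a datastore directory. The root path you pass in is a placeholder (on the host itself, datastores are mounted under `/vmfs/volumes/`), it assumes Python 3 is available wherever you run it, and `find_vmx_files` is a hypothetical helper name.

```python
import os
from typing import List


def find_vmx_files(datastore_root: str) -> List[str]:
    """Return the sorted paths of all .vmx files below datastore_root.

    Each .vmx is a VM's configuration file; files such as .vmxf, .vmdk or
    .vmx.lck are ignored because they do not end in ".vmx".
    """
    found = []
    for root, _dirs, files in os.walk(datastore_root):
        for name in files:
            if name.lower().endswith(".vmx"):
                found.append(os.path.join(root, name))
    return sorted(found)
```

Each path in the result can then be added back with "Add to Inventory" in the vSphere Client datastore browser or, from the ESXi Shell, with `vim-cmd solo/registervm <path-to-vmx>`.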

If you have more than one host available, you could even vMotion the VMs to another host and move them back after installing U1.

Tomson74
Contributor

Awesome! Thank you very much!

I will reply to this thread once I complete the tasks during a scheduled maintenance window.

Tomson74
Contributor

Well, I successfully rolled back to HP VMware 5.5 Update 1, and this past Friday the host went down again.

I am lost on what to do next.

cykVM
Expert

Was it the same PSOD with "no heartbeat" or another one?

How long had the host been running with Update 1?

Tomson74
Contributor

Yes, same error. The host had been running for five days on Update 1.

cykVM
Expert

Have you already tried updating the BIOS and all firmware on the server? In particular, running the iLO card with outdated firmware might cause this.

I would also run a memory test with e.g. memtest and check whether the server is overheating due to dust.

Tomson74
Contributor

Not yet. I am going to look into that for sure.

I know I used to get Broadband Management errors when I was on Update 2; those have since disappeared.

cykVM
Expert

Could you list the hardware in use (CPU(s), RAM, storage controller (B120i, B320i or another) and NICs)?

Another thing to consider is the time on your host: is there possibly a time drift? If you use NTP for automatic time, is it configured correctly and working?

Does the system health status in the vSphere Client (host Configuration tab) show any warnings or errors?

And last but not least, dig through the VMware log files for any traces of what is causing this.
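For the NTP question, a single SNTP query is enough to check whether the NTP server the host is pointed at actually answers, and how far a clock is from it. The minimal Python sketch below does that from any machine that can reach the server on UDP port 123. Note that it measures the clock of the machine running it, not the ESXi host, so compare its result with the host time shown in the vSphere Client. The function names are hypothetical and the server name you pass in is a placeholder; the offset formula is the standard NTP one.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (seconds + 2**-32 fraction) to Unix time."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Standard NTP offset: t1/t4 are local send/receive times, t2/t3 are the
    server's receive/transmit times. Positive means the local clock is behind."""
    return ((t2 - t1) + (t3 - t4)) / 2


def query_offset(server: str, timeout: float = 5.0) -> float:
    """Send one SNTP request (version 3, client mode) and return the offset in seconds."""
    packet = b"\x1b" + 47 * b"\0"  # 0x1b = LI 0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(packet, (server, 123))
        data, _ = sock.recvfrom(48)
        t4 = time.time()
    # Bytes 32-39 hold the server receive timestamp, bytes 40-47 the transmit timestamp.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", data[32:48])
    return clock_offset(t1, ntp_to_unix(rx_s, rx_f), t4=t4, t3=ntp_to_unix(tx_s, tx_f))
```

For example, `print(query_offset("pool.ntp.org"))` prints the offset in seconds; a timeout instead of a result suggests the server is unreachable, which would also affect the host if it uses the same server.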

Tomson74
Contributor

Where can I view the logs, and which specific logs should I look at?

HP ProLiant DL360e Gen8

CPU: Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz

NICs: 4 (teamed)

Memory: 31.84 GB total (12.49 GB used, 19.35 GB free)

Storage controller: B320i

cykVM
Expert

I would first take a closer look at the log files through the vSphere Client (Home -> Administration -> System Logs, or the "View" menu -> Administration -> System Logs).

You can even download them from there and view them on your admin workstation with an external editor.
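Once the logs are downloaded, one quick way to narrow them down is to search every `.log` file for heartbeat- or panic-related lines; vmkernel.log and vmkwarning.log are usually the first places to look for kernel-level problems. The minimal Python sketch below does that search. The keyword list is an assumption about what is worth searching for, not an official VMware list, and `scan_logs` is a hypothetical helper name.

```python
import os
import re
from typing import Iterable, Iterator, Tuple

# Assumed keywords for a "no heartbeat" PSOD hunt; adjust as needed.
DEFAULT_PATTERNS = ("heartbeat", "PSOD", "BlueScreen", "panic")


def scan_logs(log_dir: str, patterns: Iterable[str] = DEFAULT_PATTERNS) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for every line matching any pattern.

    Walks log_dir recursively, reads only files ending in ".log" and matches
    case-insensitively.
    """
    regex = re.compile("|".join(patterns), re.IGNORECASE)
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            if not name.endswith(".log"):
                continue
            path = os.path.join(root, name)
            # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if regex.search(line):
                        yield name, lineno, line.rstrip("\n")
```

Usage, with a placeholder folder name: `for name, lineno, line in scan_logs("esx-logs"): print(f"{name}:{lineno}: {line}")`. Looking at the timestamps of the hits just before each crash is usually more informative than the total count.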

What did you mean by the "Broadband Management errors" you got with Update 2?

Did you set your BIOS power management options according to HP's recommendations ("HP Static High Performance Mode" in the BIOS and "High performance" within the VMware configuration)?
