Hello,
we have 3 ESXi servers. 2 of them run ESXi 6.7, 14320388. One of them produced a PSOD 2 days ago. After a reboot, everything seemed to be fine, then 6 hours later it crashed again.
I moved all VMs off of the host to another host that's also running ESXi 6.7, 14320388. That host then crashed with a PSOD after about 10 hours - this time I took a screenshot of the PSOD screen (attached). I reset the server and it started up fine. They both had an uptime of more than 400 days.
Not sure whether this is important, but both servers have their system on an SD card - we are planning to replace it with local disks.
Hardware diagnostics reveal no hardware issue.
What would be the next steps to find out what's going on, and what would you recommend to prevent these issues?
Thanks
Have a look at this
Hello.
While your current patch level is Update 3 (14320388), I recommend upgrading to 16713306 (6.7 P03/2020-08-20), which fixes certain SSD issues.
For security reasons, it is always advisable to update to the latest available patch level.
Attached is a link where you can get the patches:
https://customerconnect.vmware.com/patch
If you used a custom image from the server manufacturer to install VMware vSphere, you should check the manufacturer's web site to see if they already have a newer image for version 6.7 that you can upgrade to.
If you have several ESXi hosts, best practice is to upgrade only one at first, verify that it runs well for some time, and then upgrade the others.
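If it helps with tracking which hosts still need the patch, here is a minimal sketch in Python that reads the output of `esxcli system version get` and compares the build number against the 16713306 target mentioned above. The function names and the sample text are my own illustration, not something VMware ships, and the comparison only makes sense between builds of the same release line (6.7 here).

```python
import re

# Build recommended above (6.7 P03); change it if you target a newer patch.
TARGET_BUILD = 16713306


def parse_build(version_output: str) -> int:
    """Extract the numeric build from `esxcli system version get` output."""
    match = re.search(
        r"^\s*Build:\s*(?:Releasebuild-)?(\d+)\s*$",
        version_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no 'Build:' line found in the given text")
    return int(match.group(1))


def needs_patch(version_output: str, target: int = TARGET_BUILD) -> bool:
    """Return True if the host's build is older than the target build."""
    return parse_build(version_output) < target


# Illustrative sample only, modelled on the build from the first post.
SAMPLE = """\
   Product: VMware ESXi
   Version: 6.7.0
   Build: Releasebuild-14320388
   Update: 3
"""

if __name__ == "__main__":
    print(parse_build(SAMPLE), needs_patch(SAMPLE))
```

You would paste in the text each host returns over SSH. Once a host has been patched, `needs_patch` should return False for it.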
Thank you so much for the suggestions.
Indeed, we use a custom image from HP. I'll get the most recent update and install it once the hard drives are in.
In the meantime I'll update just the network drivers and see whether the PSOD returns.
Much appreciated
As a rule, I never let my ESX servers go more than 180 days w/o a reboot... 400 days is a bit excessive