We have installed a brand-new HP ProLiant ML350 running ESXi 6.5.0 [Releasebuild 4564106 x86_64].
It has now crashed twice with a complete VMware host hang and a purple screen of death (PSOD). The crashes tended to happen overnight. We run Veeam backup on all VMs.
I really need to find the cause, as each crash takes down 3 customers' VMs.
The screenshot of the message is below. It starts:
Spin Count exceeded - possible deadlock
The rest is shown in the screenshot.
We have just updated to the latest ESXi version 6.5.0 (Build 5146846), in the hope that it resolves the issue.
What logs might help us narrow down the cause?
Or perhaps someone could help us analyse the Support Bundle or inspect the logs?
Thanks in advance.
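Once a support bundle has been downloaded and unpacked locally, a quick first step is to search every log in it for the PSOD message and look at the surrounding lines. The sketch below assumes the bundle has already been extracted to a local directory (the script name `find_psod.py` and the default pattern are illustrative, not part of any VMware tool). Rotated logs are often gzip-compressed, so files ending in `.gz` are decompressed on the fly.

```python
import gzip
import os
import re
import sys
from typing import List, Tuple

# Illustrative default: the first line of the PSOD message, matched case-insensitively.
DEFAULT_PATTERN = r"spin count exceeded"


def _open_text(path: str):
    """Open a log file as text, transparently decompressing *.gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def find_matches(root: str, pattern: str = DEFAULT_PATTERN,
                 context: int = 5) -> List[Tuple[str, int, List[str]]]:
    """Return (path, 1-based line number, context lines) for each matching line under root."""
    regex = re.compile(pattern, re.IGNORECASE)
    results = []
    for dirpath, dirs, files in os.walk(root):
        dirs.sort()  # walk subdirectories in a stable order
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            try:
                with _open_text(path) as fh:
                    lines = fh.read().splitlines()
            except (OSError, EOFError):
                continue  # unreadable, corrupt or truncated file: skip it
            for i, line in enumerate(lines):
                if regex.search(line):
                    lo, hi = max(0, i - context), i + context + 1
                    results.append((path, i + 1, lines[lo:hi]))
    return results


if __name__ == "__main__":
    # Usage: python find_psod.py <extracted-bundle-dir> [regex]
    if len(sys.argv) > 1:
        pattern = sys.argv[2] if len(sys.argv) > 2 else DEFAULT_PATTERN
        for path, lineno, ctx in find_matches(sys.argv[1], pattern):
            print(f"== {path}:{lineno}")
            print("\n".join(ctx))
    else:
        print("usage: find_psod.py <directory> [regex]")
```

Printing a few lines either side of each hit makes it easier to see what the host was doing just before the panic, for example whether a backup job was running at the time.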
Hello,
Can you upload a higher resolution screenshot? It is not possible to read all of the output.
Are all of the PSOD backtraces identical? If not then upload any that have any different output.
You should also make sure you have the latest iLO firmware and drivers installed, as older iLO drivers are known to cause some PSODs in 6.5:
kb.vmware.com/kb/2148123
Bob
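To check whether several PSOD backtraces are identical, it helps to compare only the symbolic frames rather than the raw hex addresses, which can differ from one crash to the next. The sketch below assumes each frame contains text of the form `Symbol@module#nover+0xOFFSET`, as on typical ESXi purple screens; that format, the script name `compare_backtraces.py` and the symbol names in any test input are assumptions, so adjust the regex if your output looks different.

```python
import difflib
import re
import sys
from typing import List

# Assumed frame format, e.g. "0x...:[0x...]Symbol@vmkernel#nover+0x58 stack: 0x...".
# Only "Symbol@module#nover+0x58" is kept, because raw addresses vary between crashes.
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*@[\w.#]+\+0x[0-9a-fA-F]+)")


def frames(text: str) -> List[str]:
    """Extract the symbolic frames, in order, from a pasted or extracted backtrace."""
    return FRAME_RE.findall(text)


def compare(a: str, b: str) -> List[str]:
    """Return unified-diff lines between the frames of two backtraces (empty if identical)."""
    return list(difflib.unified_diff(frames(a), frames(b),
                                     "crash1", "crash2", lineterm=""))


if __name__ == "__main__":
    # Usage: python compare_backtraces.py crash1.txt crash2.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1], errors="replace") as f1, \
             open(sys.argv[2], errors="replace") as f2:
            diff = compare(f1.read(), f2.read())
        print("\n".join(diff) if diff else "Backtraces match (same frames, same order).")
    else:
        print("usage: compare_backtraces.py <crash1.txt> <crash2.txt>")
```

If the frames match, the crashes most likely share one code path; if they differ, each distinct backtrace is worth posting.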
PS: I know we updated all of the drivers about two weeks ago, because these HP servers have a fan issue where the fans change pitch randomly and in sudden jumps. That seems to be a general, ongoing issue with the ML350. I'm not sure whether we updated iLO as well, although I think so; I'll ask my colleague when they are back in work tomorrow.
Also, you mention PSOD backtraces: are those what I posted, or can I find them somewhere?
Check out this VMware KB article on extracting the dump file after a PSOD.