I have two identical hosts that will randomly go "Not Responding" at different times.
I pulled a vmkernel log and cut out everything aside from the BlueScreen area.
How can I determine the cause from this message?
6.5.0 VMKernel 4564106
I am having this exact same issue. The build is identical, which makes me believe it could be an HP problem. Can you advise if you found a solution? I am finding that this issue happens when the server is under high load.
I've got the same issue.
I have a Dell R900: 4x Xeon 7460 CPUs, 128GB RAM, 5x 2GB SATA hard drives (RAID-5 configuration on Dell PERC-6), latest Dell BIOS.
I installed ESXi 6.5 a couple of months ago and spun up about a dozen VMs. Had it running for about a month, then got a PSOD: "spin count exceeded - possible deadlock on PCPU 1".
After rebooting, all my VMs were corrupted. I couldn't find anything on how to recover them, so I scrapped them.
I happened to see that ESXi 6.5 Update 1 was available, so I grabbed it, installed fresh and created my VMs again. I had what appeared to be a stable build this time, so I went about building Windows system templates for various OSes based on some applications we'd be setting up. It took dozens of hours of downloads and patches from Microsoft to get the VMs as current as possible.
Then today I got hit with another PSOD, and of course all the running VMs are dead and won't start: "cannot open blah/blah/blah.vmdk" error.
We're being instructed to move from Hyper-V to VMware for several servers, but in 10 years of running Hyper-V, I've never had a server blue screen, and in the very rare event of having to cold-power-cycle a Hyper-V host (i.e. shutdown or restart simply isn't working), the VMs would still be in a restartable state when the host came back up. I know that VMware is the market leader and is supposed to be more reliable and simpler in architecture, but I'm not sold on the reliability factor yet. If a PSOD means that all running VMs get hosed, then this has a very high "screw this noise" factor.
1) What exactly is a "spin count exceeded - possible deadlock on PCPUx" error, and how do you troubleshoot it?
2) How do you recover a failed VM after a PSOD?
Please don't refer to vMotion or vCenter, as we have neither of those.
Many thanks in advance.
I've got the same issue on my HP ProLiant DL360 Gen9 server. Have you managed to get a resolution? We applied firmware updates, but still nothing.
As for recovering from the failed state, beekerc: I have used vmkfstools from an SSH session to the host.
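For anyone wondering what that might look like, here is a rough sketch rather than a verified recovery procedure. It only builds the vmkfstools command lines you would then run by hand over SSH. I am assuming the `-e` (chain consistency) and `-x check` / `-x repair` options are available on your build, so check the vmkfstools help output on the host first. The function name, datastore name and VM name are made up for illustration.

```python
import shlex


def vmdk_check_commands(vmdk_path, repair=False):
    """Build vmkfstools command lines for checking one VMDK descriptor.

    Nothing is executed here; the strings are meant to be reviewed and
    then run manually in an SSH session on the ESXi host.
    Assumption: the -e and -x options exist on the installed build.
    """
    quoted = shlex.quote(vmdk_path)  # protect paths containing spaces
    commands = [
        f"vmkfstools -e {quoted}",        # check snapshot chain consistency
        f"vmkfstools -x check {quoted}",  # check the disk for errors
    ]
    if repair:
        # Only attempt a repair after taking a copy of the VM folder.
        commands.append(f"vmkfstools -x repair {quoted}")
    return commands


if __name__ == "__main__":
    # Hypothetical path; substitute your own datastore and VM names.
    for cmd in vmdk_check_commands("/vmfs/volumes/datastore1/myvm/myvm.vmdk"):
        print(cmd)
```

Run the check commands first and only pass `repair=True` once you have a backup copy of the VM's files, since a repair modifies the disk in place.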