Hello!
4 VMs were restarted recently by HA, in VM Events I see this:
The VM using the config file {1} contains the host physical page {2} which was scheduled for immediate retirement. To avoid system instability the VM is forcefully powered off.
I suppose ESXi wanted to swap RAM page belonging to this VMs and failed due to timeout, then it just decided to restart VM on another host. Am I right?
How can we prevent these issues from happening again? Could this be HW related?
It ended up to be RAM module error, which showed up on IDRAC 5h later after incident, ordered replacemnt from Dell.
---------------------------------------------------------------------------------------------------------
Was it helpful? Let us know by completing this short survey here.
It ended up to be RAM module error, which showed up on IDRAC 5h later after incident, ordered replacemnt from Dell.
---------------------------------------------------------------------------------------------------------
Was it helpful? Let us know by completing this short survey here.
Totally curious but was this as a result of pro-active HA?
i also faced this issue with one of infra, where one VM got restarted by ha on some other host.
I checked and found that the the application monitor was on the Guest OS and while some app got crashed it sent the NMI to the Guest with the help of VMtools to restart it, in order to avoid corruption. Need to check with Guest OS dump for more deep dive into this issue.
(not a hypervisor/vmware issue)