One of our ESXi hosts running version 6.7 has a recurring issue, roughly once a week, where it randomly loses contact with vSphere. The guest VMs are a basic domain controller, a SQL server, and a couple of Citrix VDA servers, as well as the VMware vCenter Server Appliance.
It starts out showing that the domain controller's CPU is spiking and alerting with a red status, then shows each VM disconnecting one by one, after which the host becomes unreachable. The errors shown are "cannot synchronize host" and "Host [hostname] is not responding." I'm sure there's some issue with the domain controller randomly having its CPU spike, but that shouldn't cause the rest of the VMs and the entire host to become unreachable.
The issue tends to happen around 3 AM and typically resolves itself after a few hours. The uptime on the host never indicates that it rebooted or shut down.
Has anyone else run into this issue? Any assistance is appreciated.
wabash2015,
Such issues may occur if the management VMkernel adapter shares its physical NIC with a service that consumes all the bandwidth, such as vMotion, backup, or similar. Please check whether this may be the issue in your environment.
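If it helps, the check can be sketched in a few lines of Python. This is only an illustration under assumptions: the inventory below is hypothetical and would be filled in by hand from the host's networking view, and the function name shared_management_uplinks is made up for this example. It does not query the host itself.

```python
from collections import defaultdict

# Hypothetical inventory (assumption): which services each VMkernel adapter
# carries and which physical uplinks it uses. Replace with your host's values.
VMKERNEL_ADAPTERS = {
    "vmk0": {"services": {"Management"}, "uplinks": {"vmnic0"}},
    "vmk1": {"services": {"vMotion"}, "uplinks": {"vmnic1"}},
}


def shared_management_uplinks(adapters):
    """Return {uplink: [other services]} for uplinks that carry Management
    traffic together with at least one other service."""
    services_by_uplink = defaultdict(set)
    for adapter in adapters.values():
        for uplink in adapter["uplinks"]:
            services_by_uplink[uplink] |= adapter["services"]
    return {
        uplink: sorted(services - {"Management"})
        for uplink, services in services_by_uplink.items()
        if "Management" in services and services != {"Management"}
    }


if __name__ == "__main__":
    conflicts = shared_management_uplinks(VMKERNEL_ADAPTERS)
    if conflicts:
        for uplink, others in conflicts.items():
            print(f"{uplink}: Management shares this NIC with {', '.join(others)}")
    else:
        print("Management does not share an uplink with other services.")
```

An empty result only means that no other VMkernel service is on the same NIC. VM port group traffic such as backups on the same uplink would have to be added to the inventory as its own entry to be counted.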
Lars
Thanks for the response.
I've reviewed this, and the only service on vmnic0 is the management service.
If you are sure that a specific VM, such as your DC, is using excessive resources, limit its CPU/memory through a resource pool.
That's something I will try.
Regardless, even if the VM's CPU is spiking, it shouldn't cause the host to become unreachable from the vCenter Server Appliance, right? It also shouldn't cause all of the VMs to go into a disconnected state.
Normally yes, you shouldn't see this situation whenever a VM demands more CPU ...
Does this occur always, most of the time, or only during higher peaks?
Please monitor your CPU usage metrics with more detail after reading the following link: