Dear VMware Community,
I have serious trouble with Windows 2016/2019 VM guests on ESXi 6.7. Every 3 to 5 days, several VMs become unresponsive and hang. The only way to bring a VM back to life is a reset. The log is clean. It seems that the VM hangs without any error and VMware Tools becomes unavailable.
2019-03-10T19:02:44.185Z| vcpu-3| I125: CDROM: Emulate GET CONFIGURATION RT 1 starting feature 0
2019-03-10T19:32:44.507Z| vcpu-2| I125: CDROM: Emulate GET CONFIGURATION RT 1 starting feature 0
2019-03-10T20:02:44.827Z| vcpu-4| I125: CDROM: Emulate GET CONFIGURATION RT 1 starting feature 0
2019-03-10T20:32:45.146Z| vcpu-2| I125: CDROM: Emulate GET CONFIGURATION RT 1 starting feature 0
2019-03-10T21:02:45.472Z| vcpu-3| I125: CDROM: Emulate GET CONFIGURATION RT 1 starting feature 0
2019-03-10T21:32:45.790Z| vcpu-4| I125: CDROM: Emulate GET CONFIGURATION RT 1 starting feature 0
2019-03-10T21:37:58.652Z| vmx| I125: GuestRpc: Got RPCI vsocket connection 270003, assigned to channel 1.
2019-03-10T21:37:59.654Z| vmx| I125: GuestRpc: Got error for channel 1 connection 270004: Remote disconnected
2019-03-10T21:37:59.654Z| vmx| I125: GuestRpc: Closing channel 1 connection 270004
2019-03-10T21:50:26.983Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2019-03-10T21:50:39.976Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2019-03-10T21:50:39.976Z| vcpu-0| I125: Tools: Running status rpc handler: 1 => 0.
2019-03-10T21:50:39.976Z| vcpu-0| I125: Tools: Changing running status: 1 => 0.
2019-03-10T21:50:46.987Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2019-03-10T21:50:46.987Z| vmx| I125: GuestRpc: app toolbox's second ping timeout; assuming app is down
2019-03-10T21:50:46.988Z| vmx| I125: GuestRpc: Reinitializing Channel 0(toolbox)
2019-03-10T21:50:46.988Z| vmx| I125: GuestMsg: Channel 0, Cannot unpost because the previous post is already completed
2019-03-11T00:24:50.221Z| vmx| I125: GuestRpc: Got RPCI vsocket connection 280003, assigned to channel 1.
2019-03-11T00:24:51.221Z| vmx| I125: GuestRpc: Got error for channel 1 connection 280004: Remote disconnected
2019-03-11T00:24:51.221Z| vmx| I125: GuestRpc: Closing channel 1 connection 280004
2019-03-11T03:11:43.674Z| vmx| I125: GuestRpc: Got RPCI vsocket connection 290003, assigned to channel 1.
2019-03-11T03:11:44.675Z| vmx| I125: GuestRpc: Got error for channel 1 connection 290004: Remote disconnected
2019-03-11T03:11:44.675Z| vmx| I125: GuestRpc: Closing channel 1 connection 290004
2019-03-11T05:58:39.366Z| vmx| I125: GuestRpc: Got RPCI vsocket connection 300003, assigned to channel 1.
2019-03-11T05:58:40.366Z| vmx| I125: GuestRpc: Got error for channel 1 connection 300004: Remote disconnected
2019-03-11T05:58:40.366Z| vmx| I125: GuestRpc: Closing channel 1 connection 300004
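To pin down when a guest actually stopped responding, a log like the one above can be scanned for the Tools timeout messages. The following is only a minimal sketch: it assumes the line layout shown in this excerpt (`timestamp| thread| level: message`), and the function names and the list of marker strings are my own choices for illustration, not anything VMware ships.

```python
import re
from datetime import datetime, timezone

# Matches vmware.log lines in the layout seen in this thread, e.g.
# 2019-03-10T21:50:39.976Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\|\s*"
    r"(?P<thread>[^|]+)\|\s*(?P<level>\w+):\s*(?P<msg>.*)$"
)

# Message fragments taken from the excerpts in this thread (assumed to be
# the interesting ones; extend as needed).
HANG_MARKERS = (
    "Tools heartbeat timeout",
    "GuestRpcSendTimedOut",
    "second ping timeout",
)


def parse_line(line):
    """Return (timestamp, thread, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
    return ts.replace(tzinfo=timezone.utc), m.group("thread").strip(), m.group("msg")


def find_hang_events(lines):
    """Collect every parsed line whose message contains one of the markers."""
    events = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        if any(marker in parsed[2] for marker in HANG_MARKERS):
            events.append(parsed)
    return events


def first_heartbeat_timeout(lines):
    """Timestamp of the first 'Tools heartbeat timeout' line, or None."""
    for ts, _thread, msg in find_hang_events(lines):
        if "Tools heartbeat timeout" in msg:
            return ts
    return None
```

Running `first_heartbeat_timeout(open("vmware.log"))` against each affected VM and comparing the timestamps across hosts may show whether the hangs cluster around a common event such as a backup window.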
This problem occurs on several ESXi machines. The configuration is similar.
Intel S2600 Board (with newest firmware)
LSI MegaRAID Controller (with newest firmware)
Intel Xeon Processors
ESXi 6.7.0 Update 1 (Build 11675023)
Am I the only one having this problem? 🙂
Any ideas?
Thanks in advance
Greetings from Germany
Hello Bernhard
I have a similar issue with Windows 10 1809 clients. We did an in-place upgrade from builds 1709 and 1803 to Windows 10 1809.
After a few days the VMs also become unresponsive and hang. The logs are clean and only a reset helps.
2019-05-27T17:30:02.594Z| vmx| I125: GuestRpc: Got error for channel 1 connection 1111: Remote disconnected
2019-05-27T21:27:39.396Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2019-05-27T21:27:46.225Z| vmx| I125: GuestRpcSendTimedOut: message to vdiagent timed out.
2019-05-27T21:27:46.225Z| vmx| I125: ToolsGetAppGenericName: vdiagent status not set
2019-05-27T21:27:53.295Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2019-05-27T21:27:53.295Z| vcpu-0| I125: Tools: Running status rpc handler: 1 => 0.
2019-05-27T21:27:53.295Z| vcpu-0| I125: Tools: Changing running status: 1 => 0.
2019-05-27T21:27:59.399Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2019-05-27T21:27:59.399Z| vmx| I125: GuestRpc: app toolbox's second ping timeout; assuming app is down
2019-05-27T21:27:59.400Z| vmx| I125: GuestRpc: Reinitializing Channel 0(toolbox)
2019-05-27T21:27:59.400Z| vmx| I125: GuestMsg: Channel 0, Cannot unpost because the previous post is already completed
2019-05-27T21:28:06.229Z| vmx| I125: GuestRpcSendTimedOut: message to vdiagent timed out.
2019-05-27T21:28:06.229Z| vmx| I125: GuestRpc: app vdiagent's second ping timeout; assuming app is down
2019-05-27T21:28:06.229Z| vmx| I125: ToolsGetAppGenericName: vdiagent status not set
2019-05-27T21:28:06.229Z| vmx| I125: GuestRpc: Reinitializing Channel 1(vdiagent)
2019-05-27T21:28:06.229Z| vmx| I125: GuestMsg: Channel 1, Cannot unpost because the previous post is already completed
2019-05-27T21:28:06.229Z| vmx| I125: ToolsGetAppGenericName: vdiagent status not set
2019-05-27T21:28:58.295Z| vcpu-0| I125: Tools: Running status rpc handler: 0 => 1.
2019-05-27T21:28:58.295Z| vcpu-0| I125: Tools: Changing running status: 0 => 1.
2019-05-27T21:29:19.295Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
We are on VMware ESXi, 6.5.0, 9298722.
Did you find a solution to your problem?
thanks
Hi bernhardgmeiner,
"Am I the only one having this problem? 🙂"
I haven't seen this problem before. The first thing I would check is the power-saving settings in the BIOS of the host. Make sure C1 states are disabled (often you can find a power profile that you can set to Full Performance).
Lars
I'm also running into the same issue. It happens on random VMs; some share a common disk/datastore. The ESXi host needs to be restarted.
ESXi 6.7 U3
OSes that have frozen: CentOS 7, Windows 7 and Server 2012 R2. I noticed that VMware Tools stops running and the VM's CPU usage in ESXi drops to 0%.
2020-01-24T10:49:30.746Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2020-01-24T19:20:04.099Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2020-01-24T19:20:24.100Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2020-01-24T19:20:24.100Z| vmx| I125: GuestRpc: app toolbox's second ping timeout; assuming app is down
2020-01-24T19:20:24.100Z| vmx| I125: GuestRpc: Reinitializing Channel 0(toolbox)
2020-01-24T19:20:24.101Z| vmx| I125: GuestMsg: Channel 0, Cannot unpost because the previous post is already completed
2020-01-24T19:21:06.989Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2020-01-24T19:21:26.990Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2020-01-24T19:21:26.990Z| vmx| I125: GuestRpc: app toolbox-dnd's second ping timeout; assuming app is down
2020-01-24T19:21:26.990Z| vmx| I125: GuestRpc: Reinitializing Channel 1(toolbox-dnd)
2020-01-24T19:21:26.990Z| vmx| I125: GuestMsg: Channel 1, Cannot unpost because the previous post is already completed
2020-01-24T22:20:26.324Z| svga| I125: MKSScreenShotMgr: Taking a screenshot
2020-01-24T22:20:59.209Z| svga| I125: MKSScreenShotMgr: Taking a screenshot
2020-01-24T22:21:04.169Z| svga| I125: MKSScreenShotMgr: Taking a screenshot
2020-01-24T23:38:50.357Z| svga| I125: MKSScreenShotMgr: Taking a screenshot
2020-01-24T23:48:19.752Z| svga| I125: MKSScreenShotMgr: Taking a screenshot
2020-01-24T23:48:57.749Z| svga| I125: MKSScreenShotMgr: Taking a screenshot
2020-01-24T23:55:41.910Z| svga| I125: MKSScreenShotMgr: Taking a screenshot
What version of VMware Tools do you have?
Hello Bernhard,
Please update ESXi 6.7 with the latest updates and the guest with the latest VMware Tools. What kind of VMFS do you use, VMFS 5 or VMFS 6? Are any applications running inside the Windows Server?
I switched all my VMs over from E1000 to VMXNET3 and so far so good. I will keep an eye on it.
Hi,
I faced the same issue. Do you have a solution?
Regards,
SagiK
Hello Bernhard,
Do your virtual machines have E1000 vNICs configured? If so, try changing them to VMXNET3.
Did you ever open a VMware support case, and if you did, can we have the case number? We have been battling this since April of 2016 and it is extremely difficult to replicate.
Some other information, like the AV solution and backup solution used, could also help us point to a cause.
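To check many VMs for E1000 adapters without clicking through each one, the `.vmx` files can be scanned. This is a rough sketch under some assumptions: that the adapter type is stored as `ethernetN.virtualDev = "..."` in the `.vmx` file, and that a NIC with no `virtualDev` line (where the default depends on the guest OS type) can be ignored. The function names are made up for illustration.

```python
import re

# Assumed .vmx syntax, one setting per line, for example:
#   ethernet0.virtualDev = "e1000"
VMX_NIC_RE = re.compile(
    r'^\s*(ethernet\d+)\.virtualDev\s*=\s*"([^"]*)"\s*$', re.IGNORECASE
)


def nic_types(vmx_text):
    """Map each NIC name (e.g. 'ethernet0') to its configured adapter type."""
    nics = {}
    for line in vmx_text.splitlines():
        m = VMX_NIC_RE.match(line)
        if m:
            nics[m.group(1).lower()] = m.group(2).lower()
    return nics


def emulated_nics(vmx_text):
    """Return the NICs that use an emulated Intel adapter (E1000 or E1000E)."""
    return sorted(
        name for name, dev in nic_types(vmx_text).items()
        if dev in ("e1000", "e1000e")
    )
```

Feeding the contents of each VM's `.vmx` file to `emulated_nics` gives a quick list of adapters that are candidates for the switch to VMXNET3. NICs without an explicit `virtualDev` entry are not reported, so treat an empty result as a hint rather than proof.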
My current case is 20129838706
If you share the case, we can look for similarities and get to the bottom of this.