Contributor

ESXi 6.7 Windows 2016/2019 freeze

Dear VMware Community,

I have serious trouble with Windows 2016/2019 VM guests on ESXi 6.7. Every 3 to 5 days, several VMs become unresponsive and hang. The only way to bring a VM back to life is a reset. The log is clean. It seems that the VM hangs without any error and VMware Tools becomes unavailable.

2019-03-10T19:02:44.185Z| vcpu-3| I125: CDROM: Emulate GET CONFIGURATION RT 1 starting feature 0

2019-03-10T19:32:44.507Z| vcpu-2| I125: CDROM: Emulate GET CONFIGURATION RT 1 starting feature 0

2019-03-10T20:02:44.827Z| vcpu-4| I125: CDROM: Emulate GET CONFIGURATION RT 1 starting feature 0

2019-03-10T20:32:45.146Z| vcpu-2| I125: CDROM: Emulate GET CONFIGURATION RT 1 starting feature 0

2019-03-10T21:02:45.472Z| vcpu-3| I125: CDROM: Emulate GET CONFIGURATION RT 1 starting feature 0

2019-03-10T21:32:45.790Z| vcpu-4| I125: CDROM: Emulate GET CONFIGURATION RT 1 starting feature 0

2019-03-10T21:37:58.652Z| vmx| I125: GuestRpc: Got RPCI vsocket connection 270003, assigned to channel 1.

2019-03-10T21:37:59.654Z| vmx| I125: GuestRpc: Got error for channel 1 connection 270004: Remote disconnected

2019-03-10T21:37:59.654Z| vmx| I125: GuestRpc: Closing channel 1 connection 270004

2019-03-10T21:50:26.983Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.

2019-03-10T21:50:39.976Z| vcpu-0| I125: Tools: Tools heartbeat timeout.

2019-03-10T21:50:39.976Z| vcpu-0| I125: Tools: Running status rpc handler: 1 => 0.

2019-03-10T21:50:39.976Z| vcpu-0| I125: Tools: Changing running status: 1 => 0.

2019-03-10T21:50:46.987Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.

2019-03-10T21:50:46.987Z| vmx| I125: GuestRpc: app toolbox's second ping timeout; assuming app is down

2019-03-10T21:50:46.988Z| vmx| I125: GuestRpc: Reinitializing Channel 0(toolbox)

2019-03-10T21:50:46.988Z| vmx| I125: GuestMsg: Channel 0, Cannot unpost because the previous post is already completed

2019-03-11T00:24:50.221Z| vmx| I125: GuestRpc: Got RPCI vsocket connection 280003, assigned to channel 1.

2019-03-11T00:24:51.221Z| vmx| I125: GuestRpc: Got error for channel 1 connection 280004: Remote disconnected

2019-03-11T00:24:51.221Z| vmx| I125: GuestRpc: Closing channel 1 connection 280004

2019-03-11T03:11:43.674Z| vmx| I125: GuestRpc: Got RPCI vsocket connection 290003, assigned to channel 1.

2019-03-11T03:11:44.675Z| vmx| I125: GuestRpc: Got error for channel 1 connection 290004: Remote disconnected

2019-03-11T03:11:44.675Z| vmx| I125: GuestRpc: Closing channel 1 connection 290004

2019-03-11T05:58:39.366Z| vmx| I125: GuestRpc: Got RPCI vsocket connection 300003, assigned to channel 1.

2019-03-11T05:58:40.366Z| vmx| I125: GuestRpc: Got error for channel 1 connection 300004: Remote disconnected

2019-03-11T05:58:40.366Z| vmx| I125: GuestRpc: Closing channel 1 connection 300004
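
To compare freeze times across several VMs, the heartbeat timeouts can be pulled out of each vmware.log. Below is a minimal Python sketch, assuming every line follows the `timestamp| thread| level: message` layout seen in the excerpt above. The function names `parse_line` and `heartbeat_timeouts` are made up for illustration and are not part of any VMware tool.

```python
from datetime import datetime, timezone

# Three lines copied from the excerpt above, used as sample input.
SAMPLE = """\
2019-03-10T21:50:26.983Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2019-03-10T21:50:39.976Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2019-03-10T21:50:46.988Z| vmx| I125: GuestRpc: Reinitializing Channel 0(toolbox)
"""


def parse_line(line):
    """Split one log line into (timestamp, thread, message), or return None."""
    parts = line.strip().split("| ", 2)
    if len(parts) != 3:
        return None
    stamp, thread, rest = parts
    try:
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        return None
    when = when.replace(tzinfo=timezone.utc)  # the trailing "Z" means UTC
    # Drop the level prefix (for example "I125: ") when it is present.
    level, sep, message = rest.partition(": ")
    if not sep:
        message = rest
    return when, thread, message


def heartbeat_timeouts(lines):
    """Return the timestamps of all 'Tools heartbeat timeout' entries."""
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and "Tools heartbeat timeout" in parsed[2]:
            hits.append(parsed[0])
    return hits


if __name__ == "__main__":
    for when in heartbeat_timeouts(SAMPLE.splitlines()):
        print(when.isoformat())
```

Running the same function over the logs of every affected VM gives a list of timestamps that can be lined up against backup windows, AV scans or host events.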

This problem occurs on several ESXi machines. The configuration is similar on all of them:

Intel S2600 Board (with newest firmware)

LSI MegaRAID Controller (with newest firmware)

Intel Xeon Processors

ESXi 6.7.0 Update 1 (Build 11675023)

Am I the only one having this problem? 🙂

Any ideas?

Thanks in advance

Greets from Germany

9 Replies
Contributor

Hello Bernhard,

I have a similar issue with Windows 10 1809 clients. We did an in-place upgrade from builds 1709 and 1803 to 1809.

After a few days the VMs also become unresponsive and hang. The logs are clean and only a reset helps.

2019-05-27T17:30:02.594Z| vmx| I125: GuestRpc: Got error for channel 1 connection 1111: Remote disconnected

2019-05-27T21:27:39.396Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.

2019-05-27T21:27:46.225Z| vmx| I125: GuestRpcSendTimedOut: message to vdiagent timed out.

2019-05-27T21:27:46.225Z| vmx| I125: ToolsGetAppGenericName: vdiagent status not set

2019-05-27T21:27:53.295Z| vcpu-0| I125: Tools: Tools heartbeat timeout.

2019-05-27T21:27:53.295Z| vcpu-0| I125: Tools: Running status rpc handler: 1 => 0.

2019-05-27T21:27:53.295Z| vcpu-0| I125: Tools: Changing running status: 1 => 0.

2019-05-27T21:27:59.399Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.

2019-05-27T21:27:59.399Z| vmx| I125: GuestRpc: app toolbox's second ping timeout; assuming app is down

2019-05-27T21:27:59.400Z| vmx| I125: GuestRpc: Reinitializing Channel 0(toolbox)

2019-05-27T21:27:59.400Z| vmx| I125: GuestMsg: Channel 0, Cannot unpost because the previous post is already completed

2019-05-27T21:28:06.229Z| vmx| I125: GuestRpcSendTimedOut: message to vdiagent timed out.

2019-05-27T21:28:06.229Z| vmx| I125: GuestRpc: app vdiagent's second ping timeout; assuming app is down

2019-05-27T21:28:06.229Z| vmx| I125: ToolsGetAppGenericName: vdiagent status not set

2019-05-27T21:28:06.229Z| vmx| I125: GuestRpc: Reinitializing Channel 1(vdiagent)

2019-05-27T21:28:06.229Z| vmx| I125: GuestMsg: Channel 1, Cannot unpost because the previous post is already completed

2019-05-27T21:28:06.229Z| vmx| I125: ToolsGetAppGenericName: vdiagent status not set

2019-05-27T21:28:58.295Z| vcpu-0| I125: Tools: Running status rpc handler: 0 => 1.

2019-05-27T21:28:58.295Z| vcpu-0| I125: Tools: Changing running status: 0 => 1.

2019-05-27T21:29:19.295Z| vcpu-0| I125: Tools: Tools heartbeat timeout.

We are on VMware ESXi, 6.5.0, 9298722.

Did you find a solution to your problem?

Thanks

Champion

Hi bernhardgmeiner,

Am I the only one having this problem? 🙂

I haven't seen this problem before. The first thing I would check is the power-saving settings in the BIOS of the host. Make sure C1 states are disabled (often you can find a power profile that you can set to Full Performance).

Lars

Contributor

I'm also running into the same issue. It happens on random VMs, some of which share a common disk/datastore. The ESXi host needs to be restarted.

ESXi 6.7 U3

OSes that have frozen: CentOS 7, Windows 7 and Server 2012 R2. I noticed that VMware Tools stops running and the CPU usage of the VM shown in ESXi drops to 0%.

2020-01-24T10:49:30.746Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.

2020-01-24T19:20:04.099Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.

2020-01-24T19:20:24.100Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.

2020-01-24T19:20:24.100Z| vmx| I125: GuestRpc: app toolbox's second ping timeout; assuming app is down

2020-01-24T19:20:24.100Z| vmx| I125: GuestRpc: Reinitializing Channel 0(toolbox)

2020-01-24T19:20:24.101Z| vmx| I125: GuestMsg: Channel 0, Cannot unpost because the previous post is already completed

2020-01-24T19:21:06.989Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.

2020-01-24T19:21:26.990Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.

2020-01-24T19:21:26.990Z| vmx| I125: GuestRpc: app toolbox-dnd's second ping timeout; assuming app is down

2020-01-24T19:21:26.990Z| vmx| I125: GuestRpc: Reinitializing Channel 1(toolbox-dnd)

2020-01-24T19:21:26.990Z| vmx| I125: GuestMsg: Channel 1, Cannot unpost because the previous post is already completed

2020-01-24T22:20:26.324Z| svga| I125: MKSScreenShotMgr: Taking a screenshot

2020-01-24T22:20:59.209Z| svga| I125: MKSScreenShotMgr: Taking a screenshot

2020-01-24T22:21:04.169Z| svga| I125: MKSScreenShotMgr: Taking a screenshot

2020-01-24T23:38:50.357Z| svga| I125: MKSScreenShotMgr: Taking a screenshot

2020-01-24T23:48:19.752Z| svga| I125: MKSScreenShotMgr: Taking a screenshot

2020-01-24T23:48:57.749Z| svga| I125: MKSScreenShotMgr: Taking a screenshot

2020-01-24T23:55:41.910Z| svga| I125: MKSScreenShotMgr: Taking a screenshot

Hot Shot

What version of VMware Tools do you have?

Enthusiast

Hello Bernhard,

Please update ESXi 6.7 to the latest patch level and the guests to the latest VMware Tools. What kind of VMFS do you use, VMFS 5 or VMFS 6? Are any applications running inside the Windows Server guests?

Best regards Patrick https://www.vcloudnine.de
Contributor

  • Compatibility: ESXi 6.7 and later (VM version 14)
  • VMware Tools: Running, version: 11265 (Current)

I switched all my VMs over from E1000 to VMXNET3 and so far so good. I will keep an eye on it.

Contributor

Hi,

I faced the same issue. Do you have a solution?

Regards,

SagiK

Commander

Hello Bernhard,

Do your virtual machines have E1000 vNICs configured? If so, try changing them to VMXNET3.
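
One way to check many VMs for E1000 adapters without clicking through each one is to scan the .vmx files. Below is a minimal Python sketch, assuming the adapter type is stored as `ethernetN.virtualDev = "..."` in the .vmx file. The function name `emulated_nics` and the sample text are made up for illustration. An adapter without an explicit `virtualDev` line is not reported, so an empty result does not prove that no E1000 adapter is in use.

```python
import re

# Matches lines such as: ethernet0.virtualDev = "e1000"
_VIRTUAL_DEV = re.compile(
    r'^\s*(ethernet\d+)\.virtualDev\s*=\s*"([^"]*)"',
    re.IGNORECASE | re.MULTILINE,
)

# Hypothetical .vmx fragment used as sample input.
SAMPLE_VMX = '''\
ethernet0.present = "TRUE"
ethernet0.virtualDev = "e1000"
ethernet1.present = "TRUE"
ethernet1.virtualDev = "vmxnet3"
'''


def emulated_nics(vmx_text):
    """Return {adapter: device} for adapters explicitly set to an E1000 variant."""
    found = {}
    for adapter, device in _VIRTUAL_DEV.findall(vmx_text):
        if device.lower() in ("e1000", "e1000e"):
            found[adapter.lower()] = device.lower()
    return found


if __name__ == "__main__":
    for adapter, device in sorted(emulated_nics(SAMPLE_VMX).items()):
        print(f"{adapter}: {device}")
```

The sketch only reads the configuration text. Changing the adapter type is still done in the VM settings while the VM is powered off.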

Contributor

Did you ever open a VMware support case, and if you did, can we have the case number? We have been battling this since April of 2016 and it is extremely difficult to replicate.

Other information, such as the AV and backup solutions in use, could also help us pinpoint a cause.

My current case is 20129838706

If you share the case, we can look for similarities and get to the bottom of this.
