VMware Cloud Community

Windows Server 2016 Freezes, can unfreeze with VM snapshot or NIC disconnect/connect

Hello all,

We are having some weird issues with some of our Windows Server 2016 VMs that run a web service: they freeze at random. The console locks up and the network becomes unreachable. CPU usage locks in at its last reported value (not always 100%) and simply flatlines for the duration of the freeze. VMware Tools is unreachable (we know a freeze is happening because the IP/DNS/Tools status stops reporting), so at first the only option was a hard reset or power off. We then discovered that taking a quick snapshot without memory, or disconnecting and reconnecting the NIC, causes the VM to unfreeze and pick up right where it left off. VMware Tools comes back and the system seems normal until its next episode, which may be hours away.
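
To make the flatline symptom concrete, here is a minimal sketch of how it could be flagged from CPU usage samples exported from the performance charts. It is plain Python and does not call any VMware API; the function name `find_flatline`, the `min_run` threshold, and the sample values are all made up for illustration.

```python
from typing import Optional, Sequence


def find_flatline(
    samples: Sequence[float], min_run: int = 5, tolerance: float = 0.0
) -> Optional[int]:
    """Return the start index of the first run of at least `min_run`
    consecutive samples that stay within `tolerance` of the run's first
    value, or None if there is no such run.

    Hypothetical helper: `min_run` and `tolerance` are assumed tuning knobs.
    """
    run_start = 0
    for i in range(1, len(samples) + 1):
        # Extend the current run while the value has not moved.
        if i < len(samples) and abs(samples[i] - samples[run_start]) <= tolerance:
            continue
        # The run ended (or the data ended); check whether it was long enough.
        if i - run_start >= min_run:
            return run_start
        run_start = i
    return None


# Made-up CPU usage percentages: normal variation, then stuck at 47%.
cpu_usage = [12.0, 30.0, 47.0, 47.0, 47.0, 47.0, 47.0, 47.0]
print(find_flatline(cpu_usage))
```

An alert built on something like this would at least give a timestamp for the start of each freeze, which could then be lined up against host and guest logs.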

How does one begin to troubleshoot this? It is not affecting all of our Windows servers, primarily just the ones running this service. Even leaving Task Manager open on the console while the VM freezes has not gotten us any closer to a solution. Is there some way to see what the VM's CPU is doing when this freeze occurs? I know we can pause the VM in this state and get a memory dump, but is that even helpful? What is it about taking a snapshot or editing the NIC that would cause the VM to unfreeze?

We have 48 Windows Server 2016 VMs in this cluster and this issue is affecting about 16 of them. There are 8 hosts in the cluster and it doesn't seem to matter which host the VMs are on.
