YllowDnk
Enthusiast

Win2008R2 VM - Tools appear as "Not Running"


Hello,

We have a 3-node cluster with 3 x ESXi 6.0 (build 5050593), managed by vCSA 6.5.0.5300. A Windows 2008 R2 VM in this cluster (12 vCPUs, 64 GB RAM, terminal server with 200 users, ~4 TB HDD, 1 x NIC (VMXNET3), HW version 10, Tools version 10249) has a problem with VMware Tools, and I'm unable to figure out the root cause.

Problem:

- After some days (usually 1-7), the Summary tab of the VM shows VMware Tools as "Not Running" in:

               - vSphere Web Client
               - new HTML5 Client
               - vSphere Client connected directly to the host the VM is running on (which may suggest that this is not a vCenter problem?)

- Within the guest OS, "services.msc" shows VMware Tools as running, and the VMware Tools icon in the taskbar shows "VMware Tools service is running"?!

- I thought about the VMXNET3 NIC in use: it is pingable and the terminal sessions have no problems on any day, so the Tools must be running.
- As far as I know, VMXNET3 adapters are no longer reachable if the Tools are not running.

- More than 15 other Win2K8R2 VMs in the same cluster have the same Tools and HW version and are not affected, and never have been.

So this looks like a faulty display in the Summary tab. So far so good, but the real problem is Veeam v9.5, which queries the vCenter API and reads the status "VMware Tools not running" for this VM. If the Tools are not running, the Veeam guest agent is unable to start, and the hot backup fails.
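For anyone who wants to check the reported state outside the clients: the vSphere API exposes it as the `guest.toolsRunningStatus` property of a VM. Whether Veeam reads exactly this property is an assumption on my part. A minimal sketch of the check, using a stand-in object instead of a real pyVmomi `vim.VirtualMachine` (the VM name `ts01` is hypothetical):

```python
from types import SimpleNamespace

# Values of guest.toolsRunningStatus in the vSphere API.
RUNNING = "guestToolsRunning"
NOT_RUNNING = "guestToolsNotRunning"
EXECUTING_SCRIPTS = "guestToolsExecutingScripts"


def tools_reported_running(vm):
    """Return True if the host reports Tools as running for this VM object.

    `vm` is anything exposing `.guest.toolsRunningStatus`, for example a
    pyVmomi vim.VirtualMachine. A stand-in object is used below instead.
    """
    status = vm.guest.toolsRunningStatus
    return status in (RUNNING, EXECUTING_SCRIPTS)


# Stand-in for the affected VM (hypothetical name, not a real API object).
affected = SimpleNamespace(
    name="ts01",
    guest=SimpleNamespace(toolsRunningStatus=NOT_RUNNING),
)
print(affected.name, tools_reported_running(affected))
```

With a real connection, the same function could be called on the VM object returned by pyVmomi, which would show whether the API itself reports the stale status.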

What I've done so far:

--> opened a VMware support ticket (so far support has not found the cause)

--> if I restart the Tools service or reboot --> VMware Tools appear as "running" within the guest OS and in the Summary tab, but the problem reappears within 1-7 days

followed https://kb.vmware.com/kb/2063887:
--> followed the steps carefully --> solved the issue for 5 days --> problem reappeared (in the past there were also some weeks where the Tools had no problems for 6 or 7 days, so I'm not sure whether reinstalling the Tools did anything)

VMware support told me to follow https://kb.vmware.com/kb/2149642
--> edited the vmx --> problem not solved

followed https://kb.vmware.com/kb/1007873

- I tried to enable Tools debug logging, but the logging does not really work, or I'm doing something wrong.

tools.conf:

[logging]
log = true

vmtoolsd.level = debug
vmtoolsd.handler = file
vmtoolsd.data = c:/temp/vmtoolsd.log

maxOldLogFiles = 50
maxLogSize = 10

But every generated log is smaller than 10 MB and covers only 15 minutes. Am I missing something in tools.conf? Any tips on how to get a full 24-hour log?

Any help would be nice,

Thanks in Advance


YllowDnk
Enthusiast

Hello,

short update:

Meanwhile we found out that excessive terminal server traffic kills the Tools debug logging, and the only workaround is to write the Tools debug logs directly into the virtual machine log (changed vmtoolsd.handler = file to vmtoolsd.handler = vmx). After this change, logging works for this VM.

We updated Tools from version 10.0.9 --> 10.1.7 --> problem not solved

In the logs we cannot see any problems except for:

2017-06-26T17:02:14.809Z| vcpu-0| I125: Tools: Tools heartbeat timeout.

2017-06-26T17:02:23.185Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.

2017-06-26T17:02:23.185Z| vmx| I125: GuestRpc: app toolbox's second ping timeout; assuming app is down

2017-06-26T17:02:23.186Z| vmx| I125: GuestRpc: Reinitializing Channel 1(toolbox)

2017-06-26T17:02:23.186Z| vmx| I125: GuestMsg: Channel 1, Cannot unpost because the previous post is already completed

2017-06-26T17:02:30.565Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.

2017-06-26T17:02:30.565Z| vmx| I125: GuestRpc: app toolbox-dnd's second ping timeout; assuming app is down

2017-06-26T17:02:30.566Z| vmx| I125: GuestRpc: Reinitializing Channel 4(toolbox-dnd)

2017-06-26T17:02:30.566Z| vmx| I125: GuestMsg: Channel 4, Cannot unpost because the previous post is already completed

2017-06-26T17:02:49.722Z| vcpu-4| I125: GuestMsg: channel 5: wrong cookie, discarding message.

2017-06-26T17:02:49.722Z| vcpu-4| I125: GuestMsg: channel 5: wrong cookie, discarding message.

2017-06-26T17:02:49.728Z| vcpu-10| I125: Guest: *** WARNING: GuestInfo collection interval longer than expected; actual=79 sec, expected=30 sec. ***

2017-06-27T17:01:52.718Z| vcpu-0| I125: Tools: Tools heartbeat timeout.

2017-06-27T17:01:58.804Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.

2017-06-27T17:01:58.804Z| vmx| I125: GuestRpc: app toolbox's second ping timeout; assuming app is down

2017-06-27T17:01:58.805Z| vmx| I125: GuestRpc: Reinitializing Channel 5(toolbox)

2017-06-27T17:01:58.805Z| vmx| I125: GuestMsg: Channel 5, Cannot unpost because the previous post is already completed

2017-06-27T17:02:00.289Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.

2017-06-27T17:02:00.289Z| vmx| I125: GuestRpc: app toolbox-dnd's second ping timeout; assuming app is down

2017-06-27T17:02:00.289Z| vmx| I125: GuestRpc: Reinitializing Channel 3(toolbox-dnd)

2017-06-27T17:02:00.289Z| vmx| I125: GuestMsg: Channel 3, Cannot unpost because the previous post is already completed

2017-06-27T17:02:07.665Z| vcpu-0| I125: Guest: [   debug] [vmsvc:vmtoolsd] CNTService::HandlerEx(14)

2017-06-27T17:02:27.718Z| vcpu-0| I125: Tools: Tools heartbeat timeout.

2017-06-27T17:02:29.768Z| vcpu-10| I125: Guest: *** WARNING: GuestInfo collection interval longer than expected; actual=64 sec, expected=30 sec. ***

2017-06-29T17:01:42.570Z| vcpu-0| I125: Tools: Tools heartbeat timeout.

2017-06-29T17:01:51.386Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.

2017-06-29T17:01:51.386Z| vmx| I125: GuestRpc: app toolbox's second ping timeout; assuming app is down

2017-06-29T17:01:51.387Z| vmx| I125: GuestRpc: Reinitializing Channel 2(toolbox)

2017-06-29T17:01:51.387Z| vmx| I125: GuestMsg: Channel 2, Cannot unpost because the previous post is already completed

2017-06-29T17:01:54.484Z| vcpu-2| I125: Guest: [   debug] [vmsvc:vmtoolsd] CNTService::HandlerEx(14)

2017-06-29T17:01:56.590Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.

2017-06-29T17:01:56.590Z| vmx| I125: GuestRpc: app toolbox-dnd's second ping timeout; assuming app is down

2017-06-29T17:01:56.590Z| vmx| I125: GuestRpc: Reinitializing Channel 4(toolbox-dnd)

2017-06-29T17:01:56.590Z| vmx| I125: GuestMsg: Channel 4, Cannot unpost because the previous post is already completed

2017-06-29T17:02:14.570Z| vcpu-0| I125: Tools: Tools heartbeat timeout.

2017-06-29T17:02:18.803Z| vcpu-8| I125: GuestMsg: Channel 4, Protocol error, state: 0

2017-06-29T17:02:18.803Z| vcpu-8| I125: GuestMsg: Cannot close channel 4: it is not opened

2017-06-29T17:02:18.806Z| vcpu-6| I125: Guest: *** WARNING: GuestInfo collection interval longer than expected; actual=82 sec, expected=30 sec. ***

This happens after Veeam's snapshot process. I guess this is a normal warning during or after a snapshot. Is this a problem? Is there maybe a way to raise the expected=30 sec setting to a higher value?
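To see whether the timeouts always fall in the same window, the log can be grouped by day. This is only a rough sketch: the regular expression assumes the line prefix shown in the excerpt above, and the two message fragments are copied from it. The function name `heartbeat_events` is my own.

```python
import re
from collections import defaultdict

# Matches the vmware.log prefix, e.g.
# "2017-06-26T17:02:14.809Z| vcpu-0| I125: Tools: Tools heartbeat timeout."
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}):\d{2}\.\d+Z\|\s*[^|]+\|\s*\w+:\s*(.*)$"
)

# Message fragments taken from the excerpt above.
MARKERS = (
    "Tools heartbeat timeout",
    "GuestInfo collection interval longer than expected",
)


def heartbeat_events(lines):
    """Return {date: [(HH:MM, marker), ...]} for lines containing a marker."""
    events = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # blank line or unexpected format
        date, hhmm, message = match.groups()
        for marker in MARKERS:
            if marker in message:
                events[date].append((hhmm, marker))
    return dict(events)


sample = [
    "2017-06-26T17:02:14.809Z| vcpu-0| I125: Tools: Tools heartbeat timeout.",
    "2017-06-26T17:02:23.185Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.",
    "2017-06-27T17:01:52.718Z| vcpu-0| I125: Tools: Tools heartbeat timeout.",
]
for day, hits in heartbeat_events(sample).items():
    print(day, hits)
```

In the excerpt above, every event falls between 17:01 and 17:02 UTC on each of the three days, which fits the snapshot window.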

Regards

sean_wang
VMware Employee

Hi YllowDnk,

How many partitions/disks are on the server?

Thanks

YllowDnk
Enthusiast

Hello,

Sorry for the late reply. 3 disks: 3.5 TB (2 partitions), 2 TB (2 partitions), 600 GB (1 partition). However:

- Two weeks ago we upgraded vCSA to version 6.5 U1 and Tools to 10.1.7 - problem not solved

- I couldn't believe it, but... after upgrading Tools to version 10.1.10... the problem was solved 🙂

VMware Support and I are not able to identify the root cause... but at the moment, vCSA 6.5 U1 and/or Tools version 10.1.10 has solved the issue.

Best Regards

YllowDnk

YllowDnk
Enthusiast

PS: In addition to the thread "VMware Tools 10.1.10 - System tray icon no longer available":

I suspect that my problem had something to do with multiple vmtoolsd.exe processes (one process for every logged-in RDP user). One change in 10.1.10 is that there is only one vmtoolsd.exe process, regardless of how many RDP users are logged in... VMware didn't mention this change in the release notes of Tools version 10.1.10 :-(. Maybe because they don't want to talk about an "issue" where multiple processes are generated, one for every RDP user 😉
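If someone wants to check this on their own terminal server, the number of vmtoolsd.exe processes per session can be counted from the CSV output of `tasklist /FO CSV /NH /FI "IMAGENAME eq vmtoolsd.exe"`. A minimal sketch follows; the sample output is invented for illustration (it is not from my server), and `count_per_session` is just a name I picked.

```python
import csv
import io
from collections import Counter


def count_per_session(tasklist_csv, image="vmtoolsd.exe"):
    """Count processes named `image` per Windows session number."""
    counts = Counter()
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Expected columns: Image Name, PID, Session Name, Session#, Mem Usage
        if len(row) >= 4 and row[0].lower() == image:
            counts[int(row[3])] += 1
    return dict(counts)


# Invented sample: one service instance plus one instance per RDP session.
sample = (
    '"vmtoolsd.exe","1520","Services","0","14,208 K"\n'
    '"vmtoolsd.exe","6032","RDP-Tcp#12","3","9,112 K"\n'
    '"vmtoolsd.exe","7744","RDP-Tcp#15","4","9,060 K"\n'
)
per_session = count_per_session(sample)
print(len(per_session), "sessions running vmtoolsd.exe:", per_session)
```

On Tools versions before 10.1.10 I would expect roughly one entry per logged-in user plus the service instance; on 10.1.10 only the service instance, if my assumption above is right.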

Release Notes 10.1.10:

Resolved Issues

  • VMware Tools uninstaller is unable to stop VMware Tools service
    While uninstalling VMware Tools in a Linux guest operating system, VMware Tools uninstaller is unable to stop vmtoolsd service. This issue occurs in Linux distributions such as Ubuntu 15.04 and later, RHEL7 and later, and SLES12 and later. This issue is resolved in this release.
  • Installing VMware Tools on a 64-bit Windows virtual machine might result in an error
    After you install VMware Tools on a 64-bit Windows virtual machine, when the virtual machine boots up, the system might display the following error:
    VMware Tools unrecoverable error: (vthread-4)
    Exception 0xc0000005 (access violation) has occurred. This issue is resolved in this release.
  • Mouse movements in RDP sessions to Windows virtual machines are affected by MKS console mouse movements
    If an administrator uses the vSphere Client to open a console to a Windows virtual machine on which multiple users are logged in through terminal sessions, their mouse movements might become synchronized with the mouse movements of the administrator. This issue is resolved in this release.
  • VMware Tools upgrade on power cycle fails on Windows operating system
    VMware Tools upgrade on power cycle fails to complete on Windows operating system. This issue is caused by a corrupted manifest file. This issue is resolved in this release.
  • VMware Tools upgrade fails if /tmp is mounted as noexec
    An upgrade of VMware Tools fails on a Linux system where /tmp is mounted with the option noexec. This issue occurs because the upgrade binary cannot be executed from /tmp directory. This issue is resolved in this release.
  • Quiesced snapshot fails on a Japanese Windows Server 2008 R2 in vSphere
    After upgrading VMware Tools on Japanese Windows Server 2008 R2 to VMware Tools 10.1.0 or later, VMware Tools service on NT service process fails while taking a quiesced snapshot. This issue is resolved in this release.
  • Quiesced snapshots of Windows Server 2012 and Windows Server 2012 R2 virtual machines with VMware Tools 10.1.0 fails with an error
    Quiesced snapshots of Windows Server 2012 and Windows Server 2012 R2 virtual machines with VMware Tools 10.1.0 fails with an error. This issue occurs when VMware Tools service fails to respond, which automatically results in the change of status in the virtual machines. This issue is resolved in this release.
  • VMware Tools re-installation in repair mode triggers a warning
    VMware Tools re-installation in repair mode triggers a warning message similar to the following:
    "setup failed to install physical disk driver automatically...." This issue is resolved in this release.
  • Connecting to View fails with a black screen intermittently
    An issue that resulted in black screen to appear while connecting to View with Horizon View Agent hosted on ESXi 6.5 is fixed in this release. This issue is resolved in this release.
  • Upgrading VMware Tools to 10.1.0 in a Windows guest operating system results in system event log
    After upgrading VMware Tools to 10.1.0 results in system event log with 10010 error event by DCOM on the Windows Guest operating system. This issue is resolved in this release.
  • WMI performance adapter service fails on windows guest operating systems
    WMI performance adapter service (wmiapsrv.exe) fails on virtual machines running Windows 10 and Windows Server 2016. This issue is resolved in this release.