Hello,
We have a 3-node cluster with 3 x ESXi 6.0 (Build 5050593) managed by vCSA 6.5.0.5300. A Windows 2008 R2 VM in this cluster (12 x vCPU and 64 GB RAM, terminal server with 200 users, ~4 TB HDD, 1 x NIC (VMXNET3), HW version 10, Tools version 10249) has a problem with VMware Tools, and I'm unable to figure out the root cause.
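For reference, the Tools version number that vSphere reports (10249 here) appears to pack major, minor and patch level into one integer. The following is a minimal sketch, assuming the layout major*1024 + minor*32 + patch; the function name is made up for illustration.

```python
def decode_tools_version(number):
    """Split an internal Tools version number into major.minor.patch.

    Assumes the layout major*1024 + minor*32 + patch.
    """
    major, rest = divmod(number, 1024)
    minor, patch = divmod(rest, 32)
    return f"{major}.{minor}.{patch}"


if __name__ == "__main__":
    # 10249 is the value quoted for the affected VM.
    print(decode_tools_version(10249))
```

Under that assumption, 10249 corresponds to Tools 10.0.9.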
Problem:
- After some days (usually 1-7), the Summary tab of the VM shows VMware Tools as "Not running" in:
  - vSphere Web Client
  - new HTML5 Client
  - vSphere Client connected directly to the host the VM is running on (which may indicate that this is not a vCenter problem?)
- Within the guest OS, "services.msc" shows VMware Tools as running, and the VMware Tools icon in the taskbar shows "VMware Tools service is running".
- I thought about the VMXNET3 NIC in use: it is pingable and there are no problems with the terminal sessions day to day, so the Tools appear to be running.
- As far as I know, VMXNET3 adapters are no longer reachable if the Tools are not running.
- More than 15 other Win2K8R2 VMs in the same cluster have the same Tools and HW version and are not affected, and never have been.
So this seems to be a buggy or faulty view in the Summary tab. So far so good, but the real problem is Veeam v9.5, which queries the vCenter API and reads the status "VMware Tools not running" for this VM. If the Tools are reported as not running, the Veeam guest agent is unable to start, and the hot backup fails.
What I've done so far:
--> opened a VMware support ticket (so far support has not found the cause)
--> if I restart the Tools service or reboot --> VMware Tools appear as "running" within the guest OS and in the Summary tab, but the problem reappears within 1-7 days
followed https://kb.vmware.com/kb/2063887:
--> followed the steps carefully --> solved the issue for 5 days --> problem reappeared (in the past there were also weeks where the Tools had no problems for 6 or 7 days, so I'm not sure whether reinstalling the Tools did anything)
VMware support told me to follow https://kb.vmware.com/kb/2149642
--> edited vmx --> problem not solved
followed https://kb.vmware.com/kb/1007873
- I tried to enable Tools debug logging, but the logging does not really work, or I'm doing something wrong.
tools.conf:
[logging]
log = true
vmtoolsd.level = debug
vmtoolsd.handler = file
vmtoolsd.data = c:/temp/vmtoolsd.log
maxOldLogFiles = 50
maxLogSize = 10
But every generated log is smaller than 10 MB and covers only 15 minutes. Am I missing something in tools.conf? Any tips on how to get a full 24-hour log?
Any help would be nice,
Thanks in Advance
Hello,
Short update:
In the meantime we found out that excessive terminal server traffic kills the Tools debug logging, and the only workaround is to write the Tools debug logs directly into the virtual machine logs (changed vmtoolsd.handler = file to vmtoolsd.handler = vmx). After this change, logging works for this VM.
We updated Tools from 10.0.9 to 10.1.7 --> problem not solved
In the logs we cannot see any problems except:
2017-06-26T17:02:14.809Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2017-06-26T17:02:23.185Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2017-06-26T17:02:23.185Z| vmx| I125: GuestRpc: app toolbox's second ping timeout; assuming app is down
2017-06-26T17:02:23.186Z| vmx| I125: GuestRpc: Reinitializing Channel 1(toolbox)
2017-06-26T17:02:23.186Z| vmx| I125: GuestMsg: Channel 1, Cannot unpost because the previous post is already completed
2017-06-26T17:02:30.565Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2017-06-26T17:02:30.565Z| vmx| I125: GuestRpc: app toolbox-dnd's second ping timeout; assuming app is down
2017-06-26T17:02:30.566Z| vmx| I125: GuestRpc: Reinitializing Channel 4(toolbox-dnd)
2017-06-26T17:02:30.566Z| vmx| I125: GuestMsg: Channel 4, Cannot unpost because the previous post is already completed
2017-06-26T17:02:49.722Z| vcpu-4| I125: GuestMsg: channel 5: wrong cookie, discarding message.
2017-06-26T17:02:49.722Z| vcpu-4| I125: GuestMsg: channel 5: wrong cookie, discarding message.
2017-06-26T17:02:49.728Z| vcpu-10| I125: Guest: *** WARNING: GuestInfo collection interval longer than expected; actual=79 sec, expected=30 sec. ***
2017-06-27T17:01:52.718Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2017-06-27T17:01:58.804Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2017-06-27T17:01:58.804Z| vmx| I125: GuestRpc: app toolbox's second ping timeout; assuming app is down
2017-06-27T17:01:58.805Z| vmx| I125: GuestRpc: Reinitializing Channel 5(toolbox)
2017-06-27T17:01:58.805Z| vmx| I125: GuestMsg: Channel 5, Cannot unpost because the previous post is already completed
2017-06-27T17:02:00.289Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2017-06-27T17:02:00.289Z| vmx| I125: GuestRpc: app toolbox-dnd's second ping timeout; assuming app is down
2017-06-27T17:02:00.289Z| vmx| I125: GuestRpc: Reinitializing Channel 3(toolbox-dnd)
2017-06-27T17:02:00.289Z| vmx| I125: GuestMsg: Channel 3, Cannot unpost because the previous post is already completed
2017-06-27T17:02:07.665Z| vcpu-0| I125: Guest: [ debug] [vmsvc:vmtoolsd] CNTService::HandlerEx(14)
2017-06-27T17:02:27.718Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2017-06-27T17:02:29.768Z| vcpu-10| I125: Guest: *** WARNING: GuestInfo collection interval longer than expected; actual=64 sec, expected=30 sec. ***
2017-06-29T17:01:42.570Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2017-06-29T17:01:51.386Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2017-06-29T17:01:51.386Z| vmx| I125: GuestRpc: app toolbox's second ping timeout; assuming app is down
2017-06-29T17:01:51.387Z| vmx| I125: GuestRpc: Reinitializing Channel 2(toolbox)
2017-06-29T17:01:51.387Z| vmx| I125: GuestMsg: Channel 2, Cannot unpost because the previous post is already completed
2017-06-29T17:01:54.484Z| vcpu-2| I125: Guest: [ debug] [vmsvc:vmtoolsd] CNTService::HandlerEx(14)
2017-06-29T17:01:56.590Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2017-06-29T17:01:56.590Z| vmx| I125: GuestRpc: app toolbox-dnd's second ping timeout; assuming app is down
2017-06-29T17:01:56.590Z| vmx| I125: GuestRpc: Reinitializing Channel 4(toolbox-dnd)
2017-06-29T17:01:56.590Z| vmx| I125: GuestMsg: Channel 4, Cannot unpost because the previous post is already completed
2017-06-29T17:02:14.570Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2017-06-29T17:02:18.803Z| vcpu-8| I125: GuestMsg: Channel 4, Protocol error, state: 0
2017-06-29T17:02:18.803Z| vcpu-8| I125: GuestMsg: Cannot close channel 4: it is not opened
2017-06-29T17:02:18.806Z| vcpu-6| I125: Guest: *** WARNING: GuestInfo collection interval longer than expected; actual=82 sec, expected=30 sec. ***
This happens after Veeam's snapshot process. I guess this is a normal warning during or after the snapshot process. Is this a problem? Is there maybe a way to raise the expected=30 sec setting to a higher value?
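The timestamps in the excerpt above can be checked for a pattern with a few lines of Python. This is only a sketch: it assumes the vmware.log line layout shown above (timestamp | thread | message), the function name is made up for illustration, and the sample is a shortened copy of the lines above.

```python
from collections import defaultdict
from datetime import datetime

# Shortened copy of the vmware.log lines quoted above.
SAMPLE = """\
2017-06-26T17:02:14.809Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2017-06-26T17:02:23.185Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox timed out.
2017-06-27T17:01:52.718Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2017-06-27T17:02:27.718Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2017-06-29T17:01:42.570Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
2017-06-29T17:02:14.570Z| vcpu-0| I125: Tools: Tools heartbeat timeout.
"""


def heartbeat_timeouts(text):
    """Group 'Tools heartbeat timeout' entries by date.

    Returns a dict mapping 'YYYY-MM-DD' to a list of 'HH:MM:SS' strings (UTC).
    """
    by_day = defaultdict(list)
    for line in text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3 or "Tools heartbeat timeout" not in parts[2]:
            continue
        stamp = datetime.strptime(parts[0].strip(), "%Y-%m-%dT%H:%M:%S.%fZ")
        by_day[stamp.date().isoformat()].append(stamp.strftime("%H:%M:%S"))
    return dict(by_day)


if __name__ == "__main__":
    for day, times in sorted(heartbeat_timeouts(SAMPLE).items()):
        print(day, ", ".join(times))
```

In the quoted excerpt every heartbeat timeout falls between 17:01 and 17:03 UTC on its day, which fits the observation that the messages follow the Veeam snapshot.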
Regards
Hi YllowDnk,
how many partitions/disks are on the server?
Thanks
Hello,
Sorry for the late reply. 3 disks: 3.5 TB (2 partitions), 2 TB (2 partitions), 600 GB (1 partition). However:
- Two weeks ago we upgraded vCSA to version 6.5 U1 / Tools 10.1.7 - problem not solved
- I couldn't believe it, but... after upgrading Tools to version 10.1.10... the problem was solved 🙂
VMware support and I are not able to identify the root cause... But at the moment, vCSA 6.5 U1 and/or Tools version 10.1.10 has solved the issue.
Best Regards
YllowDnk
PS: In addition to the thread "VMware Tools 10.1.10 - System tray icon no longer available":
I suspect that my problem had something to do with multiple vmtoolsd.exe processes (one process for every logged-in RDP user). One change in 10.1.10 is that there is only one vmtoolsd.exe process regardless of how many RDP users are logged in... VMware didn't mention this change in the release notes of Tools version 10.1.10 :-(. Maybe because they don't want to talk about an "issue" when multiple processes are generated, one for every RDP user 😉
Release Notes 10.1.10: