Hi all,
We are hitting an issue in our environment that may not be resolvable, but I wanted to reach out and see whether anyone has found a way to override this behaviour.
We currently have VM Monitoring enabled in our cluster. We would like to reboot VMs whose VMware Tools status shows "not running". The key point is that by most definitions the system is still available when this happens (it can be pinged, reached over RDP, etc.), but we would still like these VMs rebooted, since a reboot ultimately resolves the issue we see (e.g. being unable to connect to the affected machines via Citrix).
Duncan points out in the post below that, in order to avoid false positives, storage/network I/O activity is checked after heartbeats have failed, to "double check" that there really is a problem with the VM.
VM Monitoring (aka VM HA) heartbeat - Yellow Bricks
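To make the behaviour described above concrete, here is a rough sketch of the decision as I understand it from the post: a reset is only considered once heartbeats have stopped, and it is then held back if the VM still shows recent storage/network I/O. This is not VMware's implementation. The names `VmObservation`, `should_reset`, `failure_interval`, `io_window` and `check_io`, and the default numbers, are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class VmObservation:
    # Seconds since the last VMware Tools heartbeat was received.
    seconds_since_heartbeat: float
    # Seconds since any storage or network I/O was last observed.
    seconds_since_io: float


def should_reset(obs, failure_interval=30.0, io_window=120.0, check_io=True):
    """Return True if the VM would be reset under this simplified model.

    failure_interval and io_window are illustrative values only.
    check_io=False models the "override" asked about in this thread.
    """
    heartbeat_lost = obs.seconds_since_heartbeat >= failure_interval
    if not heartbeat_lost:
        return False
    if not check_io:
        # Heartbeat loss alone is enough when the I/O check is skipped.
        return True
    # Otherwise only reset if the VM has also been quiet on I/O.
    return obs.seconds_since_io >= io_window
```

In this model, a VM whose Tools have stopped but which is still serving RDP traffic never reaches the reset branch while `check_io` is true, which matches what we are seeing.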
What I was wondering is whether there is a way to override the storage/network I/O activity check and thus reboot the VMs as soon as VMware Tools stops running on them?
Many tx for any thoughts
"...is there a way to override the storage/network IO activity check and thus reboot the vm's as soon as the VMware tools stops running on them?..."
You can always use a process-supervision tool inside the VM for this (perhaps in addition to VM Monitoring). It acts independently and allows much finer control, e.g. it can first try (re)starting VMware Tools before rebooting. Even a BSOD or kernel panic can be handled by the VM itself. It can actually be more robust, since it does not depend on the VMware infrastructure...
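A minimal sketch of what such an in-guest watchdog could look like, assuming a Windows guest where the VMware Tools service is named `VMTools` (please verify the name on your own machines). The function names, retry count and wait time are my own placeholders, not part of any VMware tooling. It tries to restart the service first and only reboots if that does not help.

```python
import subprocess
import time

# Assumption: Windows service name for VMware Tools; check with `sc query`.
SERVICE = "VMTools"


def tools_running():
    # `sc query` reports the service state; "RUNNING" appears when it is up.
    out = subprocess.run(["sc", "query", SERVICE], capture_output=True, text=True)
    return "RUNNING" in out.stdout


def restart_tools():
    # Ask the service control manager to start the service again.
    subprocess.run(["sc", "start", SERVICE], capture_output=True, text=True)


def reboot_guest():
    # Immediate restart of the guest OS.
    subprocess.run(["shutdown", "/r", "/t", "0"])


def supervise_once(is_running, restart, reboot, attempts=2,
                   wait_seconds=10.0, sleep=time.sleep):
    """Run one check. Returns 'ok', 'restarted' or 'rebooted'."""
    if is_running():
        return "ok"
    for _ in range(attempts):
        restart()
        sleep(wait_seconds)  # give the service time to come up
        if is_running():
            return "restarted"
    # Restart attempts did not help, so fall back to a reboot.
    reboot()
    return "rebooted"
```

Run from a scheduled task every few minutes, the call would be `supervise_once(tools_running, restart_tools, reboot_guest)`. Passing the three actions in as arguments keeps the logic easy to dry-run with fakes before letting it reboot anything. If, as in the original question, only a full reboot clears the Citrix problem, `attempts=0` skips the restart step entirely.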
tx yes kinda was thinking as much in terms of perhaps looking more "inward" into the VM itself - we have a few such tools internally so will see what we can do.
Tx for the feedback!