VMware Horizon Community
KyleCompassion
Enthusiast

All VMs on a host going Agent Unreachable randomly

I've been trying to figure this issue out since late December. Tickets with VMware aren't getting far, so now I'm hoping the forums can point me in the right direction.

We've now had 5 instances since December where we leave for the day with all VMs fine (0 problem desktops) and come in the next morning to find every VM on one host in Agent Unreachable. The VMs are still running and responsive, can be logged into via the console, have valid IPs, etc. If I vMotion them to another host, they immediately come out of Agent Unreachable without needing a reboot, service restarts, etc. I can then vMotion them back onto the original host and they continue to work without issue.

This has only ever happened after hours, vCOps doesn't indicate any issues/alerts/anomalies, and the 5 occurrences have hit 4 different hosts. I don't see any odd metrics in the performance graphs for CPU, RAM, disk, or networking, and our SAN metrics don't show anything weird happening.

I thought maybe vShield or MOVE were doing something, but those logs aren't indicating anything either. The VM logs don't have anything super obvious in them. I did see a couple of events where the WSNM service is restarting, but it restarts successfully each time. The fact that the VMs are responsive throws me off the vShield line of thinking, where a similar situation happens but the VM itself is unresponsive.

Has anyone seen behavior like this and have a "check out X" kind of direction to point me?

3 Replies
mpryor
Commander

> I did see a couple of events where the WSNM service is restarting

While the VM is idle on the host? That is probably significant: the service should always be running. Are you able to attach a log bundle, generated while the VM is seen as unavailable, to the thread?

jdre2134
Contributor

I had pretty much the same issue. It appears that it was being caused by Kaspersky Security for Virtualization on my end. Every time I shut down the SVM for that particular host, all the agents came back out of the unreachable state.

Junny321
Contributor

jdre2134, did you find a way to solve the problem? Manually rebooting the Kaspersky security appliance isn't a solution :)
