VMware Cloud Community
jrapine
Enthusiast

vMotion Live Migrations on High-Load Guests

Wondering if anyone has experienced symptoms similar to the following:

A guest VM (Windows Server 2008 R2) has high memory and CPU load and is live migrated to a different host within the cluster. No storage migration is involved. Shortly after the migration completes, with no errors recorded in vSphere, CPU usage spikes to near 100%, VM performance begins to degrade, and the VM must be rebooted to return to expected performance levels. At this point we're not even sure VMware is the cause, as load on the affected guests is already quite high. We're trying to determine whether the live migration could be exacerbating the issue.

I've opened a ticket with support, but all they could do was check over our vMotion setup. We were unable to reproduce the issue for them.

7 Replies
weinstein5
Immortal

Welcome to the Community. I have only seen this type of behavior when vMotioning a heavily loaded machine to a host that is already close to its limit. Was the vMotion manual or triggered by DRS?

jrapine
Enthusiast

This was triggered by DRS. What would you define as "close to the limit" in such a scenario? Are we talking memory or CPU, or both?

weinstein5
Immortal

It could be one or the other, or both, but since DRS triggered the vMotion, that is not it: DRS would not have moved the VM if there were insufficient resources. What workload is this VM carrying?

jrapine
Enthusiast

It's a Windows Server 2008 R2 XenApp 6.5 server with 30+ sessions. CPU usage is typically 25-65%, with RAM usage at 75-90%.

Shingeki
Contributor

Did you check your non-paged pool size? There's a chance that you have a memory leak somewhere. Also check your VMware Tools version; if it is outdated, update it.

williambishop
Expert

I have seen this behaviour in the past; it's usually related to high memory usage on the target host (90%+). Given time, it will calm back down, but that can take a while. Is there a reason you're running these so close to the edge?

--"Non Temetis Messor."
jrapine
Enthusiast

Thanks all for the replies. I was finally able to reproduce the issue for support, and after digging through the vmkernel.log file, support determined that it was caused by lingering problems following an All Paths Down (APD) event. They recommended that we reboot the hosts that were affected by the APD event. We haven't seen a recurrence since rebooting those hosts.
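
For anyone who wants to check whether their own hosts logged an APD event, the following is a minimal Python sketch that filters a copy of vmkernel.log for lines mentioning APD. It is not the procedure support used. The function names (`find_apd_lines`, `scan_log`) are hypothetical, and the match pattern is an assumption: it simply looks for the standalone token "APD" or the phrase "All Paths Down", so it may miss entries worded differently or pick up unrelated ones.

```python
import re
from typing import Iterable, List

# Assumption: APD-related entries contain the standalone token "APD" or the
# phrase "All Paths Down". The exact wording can differ between ESXi versions.
APD_PATTERN = re.compile(r"\bAPD\b|All Paths Down", re.IGNORECASE)


def find_apd_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention an All Paths Down condition."""
    return [line.rstrip("\n") for line in lines if APD_PATTERN.search(line)]


def scan_log(path: str) -> List[str]:
    """Read a log file and return its APD-related lines."""
    # errors="replace" keeps the scan going if the log holds undecodable bytes.
    with open(path, "r", errors="replace") as handle:
        return find_apd_lines(handle)
```

A typical use would be to copy the log off the host (on ESXi it normally lives at /var/log/vmkernel.log) and call `scan_log("vmkernel.log")`, then compare the timestamps of any matches against the times the guests began to degrade.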
