We have a lightly loaded SQL VM that has issues whenever we VMotion it. During normal operation I can ping the VM at <1ms, but as soon as the VMotion starts, ping times jump to anywhere from 250ms to 3000+ms, with many dropped packets. We can VMotion every other VM in this cluster without any impact on network latency, apart from maybe one dropped packet. With this VM, the problem happens every time, no matter which ESX hosts in the cluster we move it between. The hosts have a dedicated VMotion network.
VM:
Windows 2003 64-bit
E1000 network adapter
4GB of RAM
Tools status is OK
Default resource settings

Hosts:
16 cores (4 quad-core AMD Opteron 8354)
128GB of RAM
DMX SAN storage (not sure of the exact details, but I can get them)
We have VMs under much heavier load with the same configuration, and they VMotion without any issues. I've spoken with a VMware escalation engineer, who said this is expected when VMotioning a SQL VM with 4GB or more of RAM.
Has anyone seen this before? Any suggestions? If you need more info to help, please let me know and I can try to get it. Thanks.
Does the R905 have onboard Broadcom 5708s?
We recently moved to Dell M600 blades in DR, which have the 5708s. Immediately afterwards, our 4-CPU/4GB x64 DR domain controller started doing the same thing. I figured it was a blade issue, so I rebooted all the blades one at a time, VMotioning the puppy around. Every time, its latency spiked. None of the other VMs did that, although all of the others were 32-bit. This past weekend I installed the latest patches VMware released. After patching and rebooting the first blade, I was able to VMotion the x64 server to the newly patched box with no issues. I then patched the rest of the cluster, and so far everything VMotions great there, with no lag. Very weird, since the patches aren't even for the Broadcom 5708s.