Windows Server 2008 R2 VM network latency - ESX 4.0 Update 1
We have a Windows Server 2008 R2 VM running an IBM Rational ClearCase v7.1.2 VOB server on a small production cluster of three Dell M610 blades running ESX 4.0 Update 1 (build 208167) in a DRS/HA cluster. The ClearCase server VM has been performing inconsistently when delivering file data to clients. Some requests complete in the expected amount of time, while others take about three times as long.

We have been looking at network performance and have found some very odd behavior. When we ping the VM continuously for a while, we see bursts of latency where a single ping may take more than 1000 ms to return. We have checked the physical switches and found no misconfigurations or bottlenecks. In fact, we get the same results pinging the VM from another VM on the same host (same vSwitch) as we get pinging it from our desktops.

The VM has a single E1000 vNIC, 4 vCPUs and 8 GB of memory. It has the VMware Tools installed from ESX 4.0 Update 1; we have not applied the latest VMware patches or Tools updates. IPv6 was disabled in the Windows registry, and the system has been rebooted since that change. esxtop shows 0% RDY for the VM's CPUs, and the VM is using only 2.5 GB of memory.

Any ideas on how we can investigate the network performance further?
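One way to make a long continuous ping run easier to analyze is to save its output and pull out the latency bursts automatically, so they can be compared against other scheduled activity on the host. The sketch below is a hypothetical helper, not part of any VMware or ClearCase tooling. It assumes Windows `ping -t` style lines (`time=1204ms`, `time<1ms`, `Request timed out.`) or Linux style lines (`time=0.345 ms`), and it treats a timeout as an infinitely slow reply. The function names `parse_rtt` and `find_bursts` and the 1000 ms default threshold are illustrative choices.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches "time=1204ms" / "time<1ms" (Windows) and "time=0.345 ms" (Linux).
TIME_RE = re.compile(r"time[=<]([\d.]+)\s*ms")
# Assumed markers for lost replies in Windows ping output.
TIMEOUT_MARKERS = ("Request timed out", "Destination host unreachable")


def parse_rtt(line: str) -> Optional[float]:
    """Return the round-trip time in ms, inf for a lost reply,
    or None for lines that are not replies (headers, statistics)."""
    match = TIME_RE.search(line)
    if match:
        return float(match.group(1))
    if any(marker in line for marker in TIMEOUT_MARKERS):
        return float("inf")
    return None


def find_bursts(lines: Iterable[str],
                threshold_ms: float = 1000.0) -> List[Tuple[int, int, float]]:
    """Group consecutive samples slower than threshold_ms into bursts.

    Returns (first_sample_index, sample_count, worst_rtt_ms) per burst.
    The index counts reply/timeout samples, not seconds.
    """
    bursts: List[Tuple[int, int, float]] = []
    start: Optional[int] = None
    worst = 0.0
    index = 0
    for line in lines:
        rtt = parse_rtt(line)
        if rtt is None:
            continue  # skip non-sample lines
        if rtt > threshold_ms:
            if start is None:
                start, worst = index, rtt  # a new burst begins
            else:
                worst = max(worst, rtt)  # burst continues
        elif start is not None:
            bursts.append((start, index - start, worst))  # burst ended
            start = None
        index += 1
    if start is not None:  # capture ended while still inside a burst
        bursts.append((start, index - start, worst))
    return bursts


sample = """Pinging 10.0.0.5 with 32 bytes of data:
Reply from 10.0.0.5: bytes=32 time<1ms TTL=128
Reply from 10.0.0.5: bytes=32 time=1204ms TTL=128
Request timed out.
Reply from 10.0.0.5: bytes=32 time=1ms TTL=128
"""
print(find_bursts(sample.splitlines()))  # prints [(1, 2, inf)]
```

Because the sample index is not a timestamp (a timed-out request takes longer than a normal one-second ping interval), it only gives a rough position in the capture; recording wall-clock times alongside the ping output would make it easier to line bursts up with events such as backup jobs.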
Wow, how things have changed. We wrestled with these issues for a long time and finally found that the problem was related to an issue with removing snapshots on ESX 4.0 and 4.1. We had been using VDR for backups, and it takes a snapshot of each VM before starting the backup process. The ClearCase server usually accumulated fairly large deltas during the day, so the backup would take a few hours and then try to remove the snapshot. When this happened, ClearCase clients that were active at the time would lose network connectivity to the server because of a bug in ESX that has been fixed in later 4.1 updates and in ESXi 5. Be careful using VM backup appliances that rely on snapshots unless you have upgraded past ESX 4.0/4.1.