VMware Cloud Community
bean3178
Contributor

Periodic Guest Ping timeouts

I have a support case open on this - no resolution yet so figured I'd ask here.

We have 2 VMs that were migrated using VMware Converter, both from identical physical hardware. The VMs periodically, and seemingly at random, time out for 20 seconds or so. This happens 10-20 times per day. Other VMs don't seem to have this problem. The host stays online with no problems, and no errors are logged on the host. There are no interface errors or log entries on the switches. I can still get into the VM via the service console while ping is timing out. In the guest OS, no errors are logged, the NIC link still shows as up, etc. I can't ping out from the guest OS during a timeout either. There was nothing wrong with the physical hardware. It's a bit old (dual PIII system), but stable. VMware Tools is installed. I went into the guest OS Device Manager, displayed all hidden (old) devices and removed them. That didn't help either.

Any thoughts?
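
For anyone chasing a similar symptom, it helps to turn raw ping results into timestamped outage windows that can be lined up against host and switch logs. The following is only a minimal sketch: the `Outage` type, the `find_outages` function, and the sample format (timestamp in seconds, reply received or not) are hypothetical names chosen for illustration, not part of any VMware tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Outage:
    """One contiguous run of failed probes (hypothetical helper type)."""
    start: float  # timestamp of the first failed probe
    end: float    # timestamp of the first reply after the run, or of the last failure

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_outages(samples: Iterable[Tuple[float, bool]],
                 min_duration: float = 0.0) -> List[Outage]:
    """Group consecutive failed probes into outage windows.

    samples: (timestamp_seconds, reply_received) pairs in time order.
    min_duration: windows shorter than this are dropped.
    """
    outages: List[Outage] = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            # The outage ends at the first successful reply.
            if ts - start >= min_duration:
                outages.append(Outage(start, ts))
            start = None
    # A run of failures that is still open when the samples end.
    if start is not None and last_fail - start >= min_duration:
        outages.append(Outage(start, last_fail))
    return outages


# Made-up samples, one probe per second: two lost replies, then recovery.
samples = [(0.0, True), (1.0, False), (2.0, False), (3.0, True), (4.0, False)]
for outage in find_outages(samples, min_duration=1.0):
    print(f"outage from t={outage.start} to t={outage.end} ({outage.duration}s)")
```

Feeding this from a once-per-second ping loop gives a list of windows whose start times can be compared with switch and host log timestamps.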

7 Replies
wpatton
Expert

I assume we are talking ESX 3.5 or ESXi 3.5? If so, have you reviewed Performance for both of them? When we have had similar behavior, typically the storage queue depth or CPU time was high and basically the system was just not capable of responding.

*Disclaimer: VMware Employee* If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".
bean3178
Contributor

Sorry, talking about ESX 3.5 Update 2.

Both of these VMs are idle. They are powered on with no load. The first VM is consuming 70 MHz and 16% guest memory, and the second is at 18 MHz and 1% guest memory. I have tried setting CPU and memory reservations, but that didn't help. Support mentioned a possible problem with the underlying hardware, but the old server is still online and functional (with the NIC disconnected). We haven't had any hardware problems.

wpatton
Expert

I wouldn't mess too much with reservations just yet. Are these configured as SMP or single vCPU?

Also, what does the storage performance look like?

*Disclaimer: VMware Employee* If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".
bean3178
Contributor

The physical machines were dual PIII 1 GHz. I've tried both 1 and 2 vCPUs - the same issue occurs. This is a new cluster that is at ~1% usage overall. Though it's a new install, I'm confident everything is configured correctly. Disk usage is at 633 KBps average for the entire host. CPU is at 418 MHz. We're using an iSCSI SAN with a Dell MD3000i for storage. All VMs on the host are on the same LUN. A conversion was also done from a physical machine using different hardware, and that VM is working fine. We also have a RHEL VM up that isn't having any problems.

bean3178
Contributor

So, today I created 2 new VMs and installed a fresh OS. I migrated client/application settings to the new VMs and deleted the old ones. To my surprise, the new VMs are still timing out. This time, during a timeout, I pinged the vNIC IP from inside the VM and got a ping response. However, I could not ping the gateway. So, at this point the problem looks like it's either with the vSwitch or with our Cisco 3750s (we're homed into 2 separate ones using trunks). I guess I'll try disabling the vSwitch load balancing and see if that fixes the problem. The 2 VMs are on different hosts and also on different VLANs. Very odd.
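
The reasoning above (own vNIC IP answers, gateway does not) can be written down as a small decision helper. This is a hedged sketch, not a diagnostic tool: `localize_fault` is a hypothetical function, and the extra probe of a peer VM on the same vSwitch and VLAN is an assumption added here for illustration; it was not part of the test described in the post. It relies on the fact that traffic between two VMs on the same vSwitch and VLAN does not leave the host.

```python
def localize_fault(own_ip_ok: bool, same_vswitch_peer_ok: bool, gateway_ok: bool) -> str:
    """Rough guess at where connectivity is lost, given three ping results
    taken from inside the guest during a timeout.

    own_ip_ok:            ping to the guest's own vNIC IP
    same_vswitch_peer_ok: ping to another VM on the same vSwitch and VLAN
                          (assumed extra probe; a failure here could also
                          be the peer's own fault)
    gateway_ok:           ping to the default gateway
    """
    if not own_ip_ok:
        return "guest: network stack or vNIC driver"
    if not same_vswitch_peer_ok:
        return "host: vSwitch or port group"
    if not gateway_ok:
        return "uplink: NIC teaming, trunk, or physical switch"
    return "no fault seen in this sample"


# The case described above: own vNIC IP answers, gateway does not.
# The peer result is assumed True here purely for illustration.
print(localize_fault(own_ip_ok=True, same_vswitch_peer_ok=True, gateway_ok=False))
```

A single sample proves little on its own; the point is to record all three results at the same moment during a timeout so the layers can be ruled out one at a time.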

rahstan
Contributor

Hello bean3178.

Did you ever solve this issue?

I am seeing a similar if not identical issue.

Any help would be appreciated.

Snr_Whippy
Contributor

I'm also experiencing the timeouts you describe. I have 2 VMs on the one host but, weirdly enough, I don't get timeouts on both guests at the same time.

It's only affecting the busier server.

I'm using Cisco 3750s as well. Any link?

All looks relatively quiet. The users experience weird stop-start performance with the server.

Any clues? Did anyone work it out?
