ESX 3.0.2 - ping latency

admin · ‎08-06-2007

Hey all,

We have been having trouble since upgrading to ESX 3.0.2. I upgraded one machine last Friday and did not notice the problem immediately. Today I patched another ESX host, and once the boot sequence was complete I saw this strange behavior.

Normally all our servers have a ping latency of around 1ms (or less), as shown below:

Pinging esx1.uz.kuleuven.ac.be [172.22.3.34] with 32 bytes of data

Reply from 172.22.3.34: bytes=32 time=1ms TTL=63

Reply from 172.22.3.34: bytes=32 time<1ms TTL=63

...

Reply from 172.22.3.34: bytes=32 time<1ms TTL=63

Ping statistics for 172.22.3.34:

Packets: Sent = 24, Received = 24, Lost = 0 (0% loss),

Approximate round trip times in milli-seconds:

Minimum = 0ms, Maximum = 1ms, Average = 0ms

the 2 hosts that are running 3.0.2 have these response times:

Pinging playvmware.uz.kuleuven.ac.be [172.22.1.79] with 32 bytes of data

Reply from 172.22.1.79: bytes=32 time=1ms TTL=63

Reply from 172.22.1.79: bytes=32 time=10ms TTL=63

Reply from 172.22.1.79: bytes=32 time=9ms TTL=63

Reply from 172.22.1.79: bytes=32 time=108ms TTL=63

Reply from 172.22.1.79: bytes=32 time=10ms TTL=63

...

Reply from 172.22.1.79: bytes=32 time<1ms TTL=63

Reply from 172.22.1.79: bytes=32 time=9ms TTL=63

Reply from 172.22.1.79: bytes=32 time=8ms TTL=63

...

Reply from 172.22.1.79: bytes=32 time=2ms TTL=63

Reply from 172.22.1.79: bytes=32 time=33ms TTL=63

Reply from 172.22.1.79: bytes=32 time=2ms TTL=63

Reply from 172.22.1.79: bytes=32 time=11ms TTL=63

Ping statistics for 172.22.1.79:

Packets: Sent = 33, Received = 33, Lost = 0 (0% loss),

Approximate round trip times in milli-seconds:

Minimum = 0ms, Maximum = 108ms, Average = 8ms

Can someone verify this? Or are we the only ones with this issue? It seems odd to me that two boxes that were just upgraded show the same problem.
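A minimal sketch, in Python, of how the reply lines in the two outputs above could be compared numerically. The regular expression and the function names are illustrative assumptions, not part of any VMware or Windows tooling. `time<1ms` is counted as 0 ms and the average uses integer division, which matches the "Minimum = 0ms ... Average = 0ms" summary above but is still an assumption about how ping rounds. The sample holds only the replies visible in the post (the `...` lines are not reconstructed), so its average will not match the 8ms of the full 33-packet summary.

```python
import re

# Matches the latency field of a Windows-style reply line,
# e.g. "time=10ms" or "time<1ms".
REPLY_TIME_RE = re.compile(r"time([=<])(\d+)ms")


def parse_reply_times(ping_output):
    """Return the round-trip times (ms) found in ping output.

    Assumption: "time<1ms" is counted as 0 ms.
    """
    times = []
    for operator, value in REPLY_TIME_RE.findall(ping_output):
        times.append(0 if operator == "<" else int(value))
    return times


def summarize(times):
    """Min / max / truncated average, similar to ping's own summary."""
    return {
        "count": len(times),
        "min": min(times),
        "max": max(times),
        "avg": sum(times) // len(times),  # integer division (assumption)
    }


# Only the replies that are visible in the post for 172.22.1.79.
VISIBLE_REPLIES_302 = """
Reply from 172.22.1.79: bytes=32 time=1ms TTL=63
Reply from 172.22.1.79: bytes=32 time=10ms TTL=63
Reply from 172.22.1.79: bytes=32 time=9ms TTL=63
Reply from 172.22.1.79: bytes=32 time=108ms TTL=63
Reply from 172.22.1.79: bytes=32 time=10ms TTL=63
Reply from 172.22.1.79: bytes=32 time<1ms TTL=63
Reply from 172.22.1.79: bytes=32 time=9ms TTL=63
Reply from 172.22.1.79: bytes=32 time=8ms TTL=63
Reply from 172.22.1.79: bytes=32 time=2ms TTL=63
Reply from 172.22.1.79: bytes=32 time=33ms TTL=63
Reply from 172.22.1.79: bytes=32 time=2ms TTL=63
Reply from 172.22.1.79: bytes=32 time=11ms TTL=63
"""

if __name__ == "__main__":
    print(summarize(parse_reply_times(VISIBLE_REPLIES_302)))
```

Feeding the saved output of a 3.0.1 host and a 3.0.2 host through the same function would give two summaries that can be compared side by side.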

bertdb · ‎08-06-2007

Tom, are you sure this is a _problem_?

Question: are you pinging from a service console? Just as in a VM, timing in the service console is unreliable at small time scales. Try pinging from a physical machine, and post the measurements from there.

bertdb · ‎08-06-2007

One additional question: do you see differences in reply latency on the VMkernel interfaces as well?

admin · ‎08-06-2007

the ping goes from a physical machine (e.g. my laptop, or the desktop PC of another IT colleague) to the service console of the ESX host. i've also tried pinging the same console from a vm (the vm is NOT on this esx) and i'm getting the same latency times

a ping to the vmkernel seems to 'work' as normal

Pinging playvmware-kernel.uz.kuleuven.ac.be [172.22.3.21] with 32 bytes of data

Reply from 172.22.3.21: bytes=32 time<1ms TTL=63

...

Reply from 172.22.3.21: bytes=32 time<1ms TTL=63

Ping statistics for 172.22.3.21:

Packets: Sent = 43, Received = 43, Lost = 0 (0% loss),

Approximate round trip times in milli-seconds:

Minimum = 0ms, Maximum = 1ms, Average = 0ms

Message was edited by:

TomVDB

bertdb · ‎08-06-2007

OK, if it's not just the measurement, then it looks like the service console is responding in a "chunky" way because it is not getting a constant stream of CPU cycles. Has the CPU reservation for the service console changed?

admin · ‎08-06-2007

nope.. on both systems nothing was changed before or after installing the patch..

amr77 · ‎08-07-2007

We are experiencing the same problem after upgrading from 3.0.1 (no patches) to 3.0.2.

We are running on Dell PowerEdge 1955 blades (Broadcom 5708 chipset) connected to PowerConnect 5316M within the chassis. These then connect to our core network via Cisco 6509E switches.

I noticed the slow ping response times while I was investigating a problem with poor outbound transfer speed from the service console to an smbfs mount.

The VMkernel port for VMotion shares the same physical NIC as the service console. VMkernel pings seem normal (<1ms) and VMotion times are comparable to those before the upgrade. All Virtual Machine traffic uses the other NIC, and data transfer rates for the VMs are OK.

After some tests we are getting the following (all servers within same chassis and on the same VLAN so traffic should be internal to 5316M switches):

Outbound

cp to an smbfs mount on a physical Windows server - 0.4MB/s

scp to another 3.0.2 ESX server - 20MB/s

ftp to Physical Windows server - 60MB/s

Inbound

psftp from Physical Windows server to ESX - 10MB/s
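To put the rates above in perspective, here is a rough Python sketch that works out how long a file would take at each measured rate. The four rates are the ones listed above; the 1024 MB file size and the names in the code are hypothetical, chosen only for illustration.

```python
# Measured service console transfer rates from the list above, in MB/s.
RATES_MB_S = {
    "cp to smbfs mount (outbound)": 0.4,
    "scp to another 3.0.2 ESX server (outbound)": 20,
    "ftp to physical Windows server (outbound)": 60,
    "psftp from physical Windows server (inbound)": 10,
}

# Hypothetical file size, not taken from the thread.
FILE_SIZE_MB = 1024


def transfer_seconds(size_mb, rate_mb_s):
    """Time in seconds to move size_mb megabytes at rate_mb_s MB/s."""
    return size_mb / rate_mb_s


if __name__ == "__main__":
    for label, rate in RATES_MB_S.items():
        minutes = transfer_seconds(FILE_SIZE_MB, rate) / 60
        print(f"{label}: {minutes:.1f} min")
```

At 0.4MB/s the smbfs copy is 150 times slower than the 60MB/s ftp transfer over the same NIC, which is why the smbfs case stands out.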

Strangely, when we initiate an inbound or outbound transfer the ping response times drop to <1ms. Once the transfer has finished they deteriorate again to an 8ms average (pinging from a physical Windows server and from other ESX service consoles).

Our servers have been upgraded from 3.0.0 to 3.0.1 and now to 3.0.2, and their firmware has never been upgraded since they were implemented just under a year ago. To rule that out I took another 1955 blade with the latest firmware and installed a fresh copy of 3.0.2: we get exactly the same poor ping times and smbfs transfer rates.

Also, to rule out the PowerConnect 5316M chassis switches, I swapped them for a pass-through module so the blade connects directly to our Cisco switches, and again we get the same results.

I am surprised nobody else has noticed this but maybe it is related to a certain network chipset and the driver updates in 3.0.2.

What hardware / network chipset do you use?

admin · ‎08-07-2007

i opened a service request with vmware this morning and a tech rep called me back in the afternoon.

he has updated a system of his own and experienced the same problem... he is seeing the ping latencies on his system too, so it looks like a 'bug' in 3.0.2

although he tells me that it's not a big problem, i consider it a severe one, as i don't want to upgrade my production esx hosts with a possible service console problem. so he is now looking into what might have changed.

bfrank81 · ‎08-07-2007

Has anyone reached out to support concerning this issue? I was preparing to perform a fresh install of 3.0.2 on all of my ESX hosts (4 x Dell 2950).

I'm thinking now maybe I should wait.

amr77 · ‎08-07-2007

I believe the 2950s use the Broadcom 5708 chipset so I would certainly test the install on a spare server if you have one to see if you have the same problem.

I tested 3.0.2 out on a non-Live server but it did not occur to me to test copying files to smbfs mounts which we only actively use in the Live environment. Luckily the Virtual Machine network seems unaffected.

I have not raised a service request yet as I have only just finished testing the various scenarios and configs.

TomVDB, do you have a reference number so I can relate my case to your issue? Also, what is your hardware / network chipset? I would like to see if we have anything in common.

admin · ‎08-07-2007

i've seen the problem on 2 hosts at this point, a dell 6850 and a dell 2650 (both test machines).. my 4 other test servers are waiting to get the update but now i'm waiting on vmware to solve this

the case number i've got is 192332201.

Network on the 6850 is a bond - BCM5704 and intel 8254nxx

Network on the 2650 is a BCM5703

while typing this i'm going to test with the 6850 and disable the broadcom card to see if it's broadcom related... -> doesn't seem to have any effect, the ping latency stays

Ridiz · ‎08-07-2007

I'm seeing the same ping latency increase (~5-10ms) after upgrading an HP DL385 G2 running ESX 3.0.1 (42829) to 3.0.2. The service console is on the integrated NICs, which are Broadcom BCM5708s.

admin · ‎08-07-2007

ok so it seems to be a 3.0.2 problem where most of us see it happen with broadcom nics (although in my test the intel card also had the problem...) so if somebody can test this with only intel cards then i'm pretty sure it has nothing to do with broadcom but definitely is a vmware esx-console problem.

i don't know what you guys think about it, but i would like to see this fixed before upgrading any other machines...

bertdb · ‎08-07-2007

Tom, can you check whether you see the problem with larger ping packets as well? (The pings you posted use 32 bytes of data; try 500, 1000 or 1500 bytes.)
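A small Python sketch of how the suggested packet-size test could be scripted. It only builds the command lines: `-n`/`-l` are the count and payload-size options of the Windows ping, `-c`/`-s` the equivalents of the Linux ping. The function name, the default count of 10 and the use of 172.22.1.79 (one of the 3.0.2 hosts pinged earlier in the thread) as target are illustrative assumptions.

```python
import platform


def build_ping_command(host, payload_bytes, count=10, system=None):
    """Return the ping argument list for the given payload size.

    system defaults to the local platform; pass "Windows" or "Linux"
    explicitly to build a command for another machine.
    """
    system = system or platform.system()
    if system == "Windows":
        # -n: number of echo requests, -l: payload size in bytes
        return ["ping", "-n", str(count), "-l", str(payload_bytes), host]
    # Linux (and other Unix-like) ping: -c count, -s payload size
    return ["ping", "-c", str(count), "-s", str(payload_bytes), host]


if __name__ == "__main__":
    # Payload sizes suggested in the post above. A 1500-byte payload plus
    # ICMP and IP headers exceeds a 1500-byte MTU, so it gets fragmented.
    for size in (500, 1000, 1500):
        print(" ".join(build_ping_command("172.22.1.79", size, system="Windows")))
```

Each resulting list can be passed to `subprocess.run(..., capture_output=True, text=True)` on the machine doing the pinging, and the captured output compared per packet size.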

admin · ‎08-07-2007

no improvements when using larger ping packets

alhamad · ‎08-07-2007

I just upgraded my servers to 3.0.2 and I am experiencing the same issue. It actually gets worse when I ping with larger packets. However, I pinged two VMs and there seems to be no effect on them.

thanifan · ‎08-07-2007

same here, HP BL45P and BL25P all G1's, patch panels in the chassis to 1gb cisco catalysts, all in the same subnet

3.0.2 varies between 1ms and 19ms, usually stays around 9ms, while 3.0.1 (patched or not) is 1ms constantly

however, the VM's on both versions respond in 1ms always so this little thing by itself isn't going to keep me from upgrading

acaaew · ‎08-07-2007

I am another one who is having this problem. My Dell 1950 with 3.0.1 has no problem. I installed 3.0.2 from scratch on another 1950 and it is showing the slow ping times. It also seems like the first reply takes the longest, and sometimes it times out. Both boxes are using the onboard Broadcom NICs.

doubleH · ‎08-07-2007

count me in as well. DL385 g2, but I am NOT using the onboard broadcom nics for the service console. i am using HP NC360T nics which look like intel nics with the 82571EB chipset.

Net1Pro · ‎08-07-2007

Same here after upgrading to 3.0.2. Several DL380G5s and DL 385G2s.

Pings on most servers stay at an average of about 8ms. Every once in a while I get a 1 or 2ms response.

One server, though, is firmly at 8ms, no more, no less, even though the VMs on this host get <1ms.

There is definitely something going on here.