VMware Cloud Community
admin
Immortal

ESX 3.0.2 - ping latency

Hey all,

We have been having trouble since upgrading to ESX 3.0.2. I upgraded one machine last Friday and did not notice the problem immediately; today I patched another ESX host, and once it had finished booting I saw this strange behavior.

Normally all our servers have a ping latency of around 1ms (or less), as shown below:

Pinging esx1.uz.kuleuven.ac.be [172.22.3.34] with 32 bytes o
Reply from 172.22.3.34: bytes=32 time=1ms TTL=63
Reply from 172.22.3.34: bytes=32 time<1ms TTL=63
Reply from 172.22.3.34: bytes=32 time<1ms TTL=63
...
Reply from 172.22.3.34: bytes=32 time<1ms TTL=63
Reply from 172.22.3.34: bytes=32 time<1ms TTL=63
Ping statistics for 172.22.3.34:
Packets: Sent = 24, Received = 24, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 1ms, Average = 0ms

The two hosts running 3.0.2 have these response times:

Pinging playvmware.uz.kuleuven.ac.be [172.22.1.79] with 32
Reply from 172.22.1.79: bytes=32 time=1ms TTL=63
Reply from 172.22.1.79: bytes=32 time=10ms TTL=63
Reply from 172.22.1.79: bytes=32 time=10ms TTL=63
Reply from 172.22.1.79: bytes=32 time=9ms TTL=63
Reply from 172.22.1.79: bytes=32 time=108ms TTL=63
Reply from 172.22.1.79: bytes=32 time=10ms TTL=63
...
Reply from 172.22.1.79: bytes=32 time<1ms TTL=63
Reply from 172.22.1.79: bytes=32 time=9ms TTL=63
Reply from 172.22.1.79: bytes=32 time=8ms TTL=63
Reply from 172.22.1.79: bytes=32 time=8ms TTL=63
...
Reply from 172.22.1.79: bytes=32 time=2ms TTL=63
Reply from 172.22.1.79: bytes=32 time=33ms TTL=63
Reply from 172.22.1.79: bytes=32 time=2ms TTL=63
Reply from 172.22.1.79: bytes=32 time=11ms TTL=63
Ping statistics for 172.22.1.79:
Packets: Sent = 33, Received = 33, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 108ms, Average = 8ms

Can someone verify this, or are we the only ones with this issue? It seems odd to me to have two freshly upgraded boxes showing the same problem.
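For anyone who wants to compare hosts without eyeballing the output, here is a minimal Python sketch that pulls the latencies out of Windows-style ping reply lines like the ones above. The names `parse_latencies` and `summarize` are made up for illustration, and counting `time<1ms` as 0 is an assumption (it matches the "Minimum = 0ms" that ping itself reports).

```python
import re
from statistics import mean

# Matches the latency field of a reply line, e.g. "time=10ms" or "time<1ms".
REPLY_RE = re.compile(r"time([=<])(\d+)ms")

def parse_latencies(ping_output):
    """Return one latency in ms per reply line; "time<1ms" is counted as 0 (assumption)."""
    latencies = []
    for line in ping_output.splitlines():
        match = REPLY_RE.search(line)
        if match is None:
            continue  # header, blank line, "..." or statistics line
        op, value = match.groups()
        latencies.append(0 if op == "<" else int(value))
    return latencies

def summarize(latencies):
    """Return (minimum, maximum, mean) in ms."""
    return min(latencies), max(latencies), mean(latencies)

# A few reply lines copied from the 3.0.2 host above.
sample = """\
Reply from 172.22.1.79: bytes=32 time=1ms TTL=63
Reply from 172.22.1.79: bytes=32 time=10ms TTL=63
Reply from 172.22.1.79: bytes=32 time=108ms TTL=63
Reply from 172.22.1.79: bytes=32 time<1ms TTL=63
"""
print(summarize(parse_latencies(sample)))
```

Feeding it the full output of a longer run from each host should make the difference between the 3.0.1 and 3.0.2 machines easy to compare side by side.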

bertdb
Virtuoso

Tom, are you sure this is a _problem_?

Question: are you pinging from a service console? Just as in a VM, timing in the service console is unreliable at small time scales. Try pinging from a physical machine and post those measurements.

bertdb
Virtuoso

One additional question: do you see differences in reply latency on the VMkernel interfaces as well?

admin
Immortal

The ping goes from a physical machine (e.g. my laptop, or the desktop PC of an IT colleague) to the service console of the ESX host. I've also tried pinging the same console from a VM (the VM is NOT on this ESX host) and I get the same latency times.

A ping to the VMkernel interface seems to work as normal:

Pinging playvmware-kernel.uz.kuleuven.ac.be [172.22.3.21] with 32 bytes of data
Reply from 172.22.3.21: bytes=32 time<1ms TTL=63
Reply from 172.22.3.21: bytes=32 time<1ms TTL=63
Reply from 172.22.3.21: bytes=32 time<1ms TTL=63
Reply from 172.22.3.21: bytes=32 time<1ms TTL=63
Reply from 172.22.3.21: bytes=32 time<1ms TTL=63
Reply from 172.22.3.21: bytes=32 time<1ms TTL=63
Reply from 172.22.3.21: bytes=32 time<1ms TTL=63
...
Reply from 172.22.3.21: bytes=32 time<1ms TTL=63
Reply from 172.22.3.21: bytes=32 time<1ms TTL=63
Ping statistics for 172.22.3.21:
Packets: Sent = 43, Received = 43, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 1ms, Average = 0ms

Message was edited by: TomVDB

bertdb
Virtuoso

OK, if it's not just the measurement, then it looks like the service console is responding in a "chunky" way because it isn't getting a constant stream of CPU cycles. Has the CPU reservation for the service console changed?
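One rough way to put a number on "chunky" is to look at how much consecutive replies differ, rather than only at the average. The sketch below is just an illustration: `jitter_report` is a made-up helper, the sample lists are the first replies quoted earlier in the thread (with `time<1ms` counted as 0), and a high figure only shows that replies are uneven, not why.

```python
from statistics import mean, pstdev

def jitter_report(latencies_ms):
    """Summarise how unevenly replies arrive.

    Returns the mean latency, the population standard deviation, and the
    mean absolute difference between consecutive replies, all in ms.
    """
    if len(latencies_ms) < 2:
        raise ValueError("need at least two samples")
    steps = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return {
        "mean_ms": mean(latencies_ms),
        "stdev_ms": pstdev(latencies_ms),
        "mean_step_ms": mean(steps),
    }

# First replies quoted above; "time<1ms" is counted as 0 (assumption).
host_301 = [1, 0, 0, 0, 0]          # esx1, not yet upgraded
host_302 = [1, 10, 10, 9, 108, 10]  # playvmware, upgraded to 3.0.2

print(jitter_report(host_301))
print(jitter_report(host_302))
```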

admin
Immortal

Nope, nothing was changed on either system before or after installing the patch.

amr77
Contributor

We are experiencing the same problem after upgrading from 3.0.1 (no patches) to 3.0.2.

We are running on Dell PowerEdge 1955 blades (Broadcom 5708 chipset) connected to PowerConnect 5316M within the chassis. These then connect to our core network via Cisco 6509E switches.

I noticed the slow ping response times while I was investigating a problem with poor outbound transfer speed from the service console to an smbfs mount.

The VMkernel port for VMotion shares the same physical NIC as the service console. VMkernel pings seem normal (<1ms) and VMotion times are comparable to those before the upgrade. All virtual machine traffic uses the other NIC, and data transfer rates for the VMs are OK.

After some tests we are getting the following (all servers within same chassis and on the same VLAN so traffic should be internal to 5316M switches):

Outbound

cp to a smbfs mount to Physical Windows server - 0.4MB/s

scp to another 3.0.2 ESX server - 20MB/s

ftp to Physical Windows server - 60MB/s

Inbound

psftp from Physical Windows server to ESX - 10MB/s
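For anyone wanting to reproduce figures like the ones above, a copy can be timed and converted to MB/s with a few lines of Python. This is only a sketch under assumptions: `copy_rate_mb_s` is a made-up helper, it needs Python 3 (so it would be run from a machine that has it, which the ESX 3.x service console may not), 1 MB is taken as 10**6 bytes, and caching can make short copies look faster than the link really is.

```python
import os
import shutil
import tempfile
import time

def copy_rate_mb_s(src, dst):
    """Copy src to dst and return the observed rate in MB/s (1 MB = 10**6 bytes)."""
    size = os.path.getsize(src)
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very fast copies.
    return size / 1_000_000 / elapsed if elapsed > 0 else float("inf")

# Self-contained demo with a 1 MB temporary file; for a real test, dst would be
# a path on the mounted share (for example the smbfs mount).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.bin")
    dst = os.path.join(tmp, "dst.bin")
    with open(src, "wb") as f:
        f.write(os.urandom(1_000_000))
    print(f"{copy_rate_mb_s(src, dst):.1f} MB/s")
```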

Strangely, when we start an inbound or outbound transfer the ping response times drop to <1ms. Once the transfer finishes they deteriorate again to an 8ms average (pinging from the physical Windows server and from other ESX service consoles).

Our servers have been upgraded from 3.0.0 to 3.0.1 and now to 3.0.2, and their firmware has not been upgraded since they were implemented just under a year ago. So I took another 1955 blade with the latest firmware and installed a fresh copy of 3.0.2: we get exactly the same poor ping times and smbfs transfer rates.

Also, to rule out the PowerConnect 5316M chassis switches, I swapped them for a pass-through module so the blades connect directly to our Cisco switches, and again we get the same results.

I am surprised nobody else has noticed this, but maybe it is related to a particular network chipset and the driver updates in 3.0.2.

What hardware / network chipset do you use?

admin
Immortal

I opened a service request with VMware this morning and a tech rep called me back in the afternoon.

He has updated a system of his own and experienced the same problem: he sees the ping latencies on his system too, so it looks like a 'bug' in 3.0.2.

Although he tells me it's not a big problem, I consider it severe, as I don't want to upgrade my production ESX hosts with a possible service console problem. So he is now looking into whether he can find something that might have changed.

bfrank81
Enthusiast

Has anyone reached out to support concerning this issue? I was preparing to perform a fresh install of 3.0.2 on all of my ESX hosts (4 x Dell 2950).

I'm thinking now maybe I should wait.

amr77
Contributor

I believe the 2950s use the Broadcom 5708 chipset so I would certainly test the install on a spare server if you have one to see if you have the same problem.

I tested 3.0.2 out on a non-Live server but it did not occur to me to test copying files to smbfs mounts which we only actively use in the Live environment. Luckily the Virtual Machine network seems unaffected.

I have not raised a service request yet as I have only just finished testing the various scenarios and configs.

TomVDB, do you have a reference number so I can relate my case to your issue? Also, what is your hardware / network chipset? I would like to see if we have anything in common.

admin
Immortal

I've seen the problem on two hosts so far, a Dell 6850 and a Dell 2650 (both test machines). My four other test servers are due to get the update, but now I'm waiting for VMware to solve this :)

The case number I've got is 192332201.

Network on the 6850 is a bond - BCM5704 and Intel 8254nxx

Network on the 2650 is a BCM5703

While typing this I'm going to test on the 6850 with the Broadcom card disabled, to see if it's Broadcom related... -> it doesn't seem to have any effect; the ping latency stays.

Message was edited by: TomVDB

Ridiz
Enthusiast

I'm seeing the same ping latency increase (~5-10ms) after upgrading an HP DL385 G2 running ESX 3.0.1 (42829) to 3.0.2. The service console is on the integrated NICs, which are Broadcom BCM5708s.

admin
Immortal

OK, so it seems to be a 3.0.2 problem that most of us see with Broadcom NICs (although in my test the Intel card also had the problem...). If somebody can test this with only Intel cards, I'm pretty sure it will show that this has nothing to do with Broadcom and is definitely a VMware ESX console problem.

I don't know what you guys think, but I would like to see this fixed before upgrading any other machines...

bertdb
Virtuoso

Tom, please check whether you see the problem with larger ping packets as well (the default payload of Windows ping is 32 bytes, as in your output; try 500, 1000 or 1500 bytes).
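In case it helps with running the larger-packet test from different machines, here is a small sketch that builds the ping command for a given payload size. `ping_command` is a made-up helper; the flags are the usual ones (`-n` count and `-l` size on Windows, `-c` count and `-s` size on Linux), and the count of 20 is an arbitrary choice.

```python
import platform

def ping_command(host, payload_bytes, count=20, system=None):
    """Build the argument list for a ping with the given payload size.

    Windows ping uses -n (count) and -l (payload size);
    Linux ping uses -c (count) and -s (payload size).
    """
    system = system or platform.system()
    if system == "Windows":
        return ["ping", "-n", str(count), "-l", str(payload_bytes), host]
    return ["ping", "-c", str(count), "-s", str(payload_bytes), host]

# Print the Windows commands for the three suggested sizes.
for size in (500, 1000, 1500):
    print(" ".join(ping_command("playvmware.uz.kuleuven.ac.be", size, system="Windows")))
```

Each list can be handed to `subprocess.run` as-is. Note that on a standard 1500-byte MTU a 1500-byte payload will be fragmented once the ICMP and IP headers are added, so that run also exercises fragmentation.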

admin
Immortal

No improvement when using larger ping packets :(

alhamad
Enthusiast

I just upgraded my servers to 3.0.2 and I am experiencing the same issue. It actually gets worse when I ping with larger packets. However, I pinged two VMs and they seem to be unaffected.

thanifan
Enthusiast

Same here: HP BL45P and BL25P, all G1s, with patch panels in the chassis to 1Gb Cisco Catalysts, all in the same subnet.

3.0.2 varies between 1ms and 19ms and usually stays around 9ms, while 3.0.1 (patched or not) is constantly at 1ms.

However, the VMs on both versions always respond in 1ms, so this little thing by itself isn't going to keep me from upgrading.

acaaew
Contributor

I am another one having this problem. My Dell 1950 with 3.0.1 has no problem. I installed 3.0.2 from scratch on another 1950 and it's having the slow ping times. Also, it seems like the first reply takes the longest and sometimes it times out. Both boxes are using the onboard Broadcom NICs.

doubleH
Expert

Count me in as well. DL385 G2, but I am NOT using the onboard Broadcom NICs for the service console. I am using HP NC360T NICs, which appear to be Intel NICs with the 82571EB chipset.

Net1Pro
Enthusiast

Same here after upgrading to 3.0.2. Several DL380 G5s and DL385 G2s.

Pings on most servers stay at an average of about 8ms. Every once in a while I get a 1 or 2ms response.

One server, though, is firmly at 8ms, no more, no less, even though the VMs on this host get <1ms.

There is definitely something going on here.
