VMware Cloud Community
admin
Immortal

ESX 3.0.2 - ping latency

Hey all,

We have been having trouble since upgrading to ESX 3.0.2. I upgraded one machine last Friday and did not notice the problem immediately. Today I patched another ESX host, and once the boot sequence was complete I saw this strange behavior.

Normally all our servers have a ping latency of around 1 ms (or less), as shown below:

Pinging esx1.uz.kuleuven.ac.be [172.22.3.34] with 32 bytes o
Reply from 172.22.3.34: bytes=32 time=1ms TTL=63
Reply from 172.22.3.34: bytes=32 time<1ms TTL=63
Reply from 172.22.3.34: bytes=32 time<1ms TTL=63
...
Reply from 172.22.3.34: bytes=32 time<1ms TTL=63
Reply from 172.22.3.34: bytes=32 time<1ms TTL=63

Ping statistics for 172.22.3.34:
Packets: Sent = 24, Received = 24, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 1ms, Average = 0ms

The 2 hosts that are running 3.0.2 have these response times:

Pinging playvmware.uz.kuleuven.ac.be [172.22.1.79] with 32
Reply from 172.22.1.79: bytes=32 time=1ms TTL=63
Reply from 172.22.1.79: bytes=32 time=10ms TTL=63
Reply from 172.22.1.79: bytes=32 time=10ms TTL=63
Reply from 172.22.1.79: bytes=32 time=9ms TTL=63
Reply from 172.22.1.79: bytes=32 time=108ms TTL=63
Reply from 172.22.1.79: bytes=32 time=10ms TTL=63
...
Reply from 172.22.1.79: bytes=32 time<1ms TTL=63
Reply from 172.22.1.79: bytes=32 time=9ms TTL=63
Reply from 172.22.1.79: bytes=32 time=8ms TTL=63
Reply from 172.22.1.79: bytes=32 time=8ms TTL=63
...
Reply from 172.22.1.79: bytes=32 time=2ms TTL=63
Reply from 172.22.1.79: bytes=32 time=33ms TTL=63
Reply from 172.22.1.79: bytes=32 time=2ms TTL=63
Reply from 172.22.1.79: bytes=32 time=11ms TTL=63

Ping statistics for 172.22.1.79:
Packets: Sent = 33, Received = 33, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 108ms, Average = 8ms
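For anyone who wants to compare their own hosts against the two runs above, here is a minimal Python sketch that pulls the round-trip times out of Windows-style ping output and summarizes them. The function names are made up for illustration, and counting `time<1ms` as 0 ms is an assumption (it matches the "Minimum = 0ms" that ping prints in its own summary). The sample lines are copied from the second run above.

```python
import re
from statistics import mean
from typing import Dict, List

# Matches the RTT field of a Windows ping reply line,
# e.g. "time=10ms" or "time<1ms".
RTT_PATTERN = re.compile(r"time([=<])(\d+)ms")


def parse_rtts(ping_output: str) -> List[int]:
    """Return the round-trip times (in ms) found in the ping output.

    Assumption: a "time<1ms" reply is counted as 0 ms.
    """
    rtts = []
    for sign, value in RTT_PATTERN.findall(ping_output):
        rtts.append(0 if sign == "<" else int(value))
    return rtts


def summarize(rtts: List[int]) -> Dict[str, float]:
    """Minimum, maximum and mean of a non-empty list of RTTs."""
    return {"min": min(rtts), "max": max(rtts), "avg": mean(rtts)}


if __name__ == "__main__":
    # A few reply lines taken from the 3.0.2 host output above.
    sample = (
        "Reply from 172.22.1.79: bytes=32 time=1ms TTL=63\n"
        "Reply from 172.22.1.79: bytes=32 time=10ms TTL=63\n"
        "Reply from 172.22.1.79: bytes=32 time=108ms TTL=63\n"
        "Reply from 172.22.1.79: bytes=32 time<1ms TTL=63\n"
    )
    print(summarize(parse_rtts(sample)))
```

Feeding it the full output of `ping` from an affected and an unaffected host gives two summaries that can be compared side by side.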

Can someone verify this? Or are we the only ones with this issue? It seems odd to me that two boxes that were just upgraded are showing the same problem.

187 Replies
Anders
Expert

Hi all.

FYI: we have reproduced the problem; the latency, SMB and backup issues seem to be related.

Looks like a bug due to a change in behavior.

A bit of a blunder, as the COS performance is supposed to be improved. :D

- Anders

thickclouds
Enthusiast

Has VMware confirmed this as a bug? If so, will they release a fix?

Charlie Gautreaux vExpert http://www.thickclouds.com

vigneng
Contributor

I opened an SR yesterday and got a call back. They confirmed that an engineer is working on the problem and said they think it has to do with interrupt handling. They did not give a timeline for the fix.

They had me run one test: I started a large scp file copy to the host, and the ping times dropped to under 1 ms during the copy. Once it finished, they jumped right back up. I also noticed that pinging out from the host shows no slowdown; it appears to affect only traffic going to the server.

It is still a big problem. With my lab management software, the ping times are causing timeouts when it talks to the agent on the ESX hosts.

Need a fix soon!!

lholling
Expert

Hi Anders

Thanks for letting us know.

Is there some way that we can have a reference for the problem internally so that when we create an SR we can just be added to the list of customers waiting for the fix?

Leonard...

---- Don't forget if the answers help, award points

vigneng
Contributor

My SR # is: 193351021

I guess you could tell them to look at my SR and ask them to add you to the customers needing the fix. They did say they would keep the ticket open until they had a fix. Hope that helps.

markus_herbert
Enthusiast

I have 4 identical ESX 3.0.2 hosts. My service console network is a separate network. I pinged each host from my VirtualCenter server.

esx1 1-2 ms

esx2 1-5 ms

esx3 5-65 ms

esx4 4-8 ms

The problem seems to depend on the (interrupt?) load of the ESX host.

I'm also interested in a fix for this problem.
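As a quick way to see which of the four hosts stand out, here is a small Python sketch using the ranges listed above. The 5 ms cut-off in the usage line is an arbitrary assumption for illustration, not a VMware figure, and the function name is made up.

```python
from typing import Dict, List, Tuple

# Ping ranges (min_ms, max_ms) as reported in this post.
observed: Dict[str, Tuple[int, int]] = {
    "esx1": (1, 2),
    "esx2": (1, 5),
    "esx3": (5, 65),
    "esx4": (4, 8),
}


def hosts_above(ranges: Dict[str, Tuple[int, int]], limit_ms: int) -> List[str]:
    """Return hosts whose worst observed RTT exceeds limit_ms, worst first."""
    slow = [(host, worst) for host, (_, worst) in ranges.items() if worst > limit_ms]
    slow.sort(key=lambda item: item[1], reverse=True)
    return [host for host, _ in slow]


if __name__ == "__main__":
    # 5 ms is a hypothetical threshold chosen only for this example.
    print(hosts_above(observed, 5))
```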

Anders
Expert

lholling wrote: "Hi Anders. Thanks for letting us know. Is there some way that we can have a reference for the problem internally so that when we create an SR we can just be added to the list of customers waiting for the fix?"

Hi.

Your TSE should know the PR number; we're a bit reluctant to share those.

If he can't find it, ask him to e-mail me.

I'm sure this will be a public fix, as a lot of people seem to be hit by this.

- Anders

dcoz
Hot Shot

I have performed quite a few VMware Converter migrations from 2.5 to 3.0.2, and the ping time looks OK only when we kick off a constant stream of data. Otherwise, like the rest of you, we see really bad response times.

Hopefully there will be a fix soon.

Anders
Expert

Hi all.

A fix is cooking in engineering, but might take some time to bake properly.

Looks like it will not make the next patch release cycle.

- Anders

Reedy2642
Contributor

Having the same problem, over 10ms latency sometimes.

This is a brand new VMware farm with 10 NICs in each server. It is fine on all the other interfaces (VMotion, VM, iSCSI, etc.), but the SC is painfully slow!

Going to ring up support now. Ideally we need the fix before the next patch release... Anyone got any firm info as to when 3.1 is out?!

thechicco
Enthusiast

Hey all,

Got this issue as well.

Upgraded 2 x 3.0.1 hosts.

Dell 2900s, with a mix of Broadcom and Intel NICs.

Anders
Expert

I meant our monthly patch release cycle.

So I hope you won't have to wait for 3.0.3/3.1 etc.

Given the high profile of this bug, it might be released immediately once QA have given their blessing.

- Anders

jccoca
Hot Shot

In the monthly patch release only ESX-1001732 is about network problems and I'm not sure that it solves the problem.

Anders
Expert

"Looks like it will not make the next patch release cycle."

- Anders

vigneng
Contributor

That is too bad. I have already downgraded to 3.0.1, and now I am doing my work on XenSource.

admin
Immortal

I don't think this issue is bad enough to justify moving to XenSource... nothing can be that bad, surely? ;)

vigneng
Contributor

I need to test certain guest OSes that are supported by 3.0.2, but because of the management software we use and the console problem, I had to move that work to Xen. Luckily, the management software supports more than one virtualization technology.

I also just received an update to my SR about this. Quote: "...it looks like the code has been merged and it will tentatively be in the Esx 3.0.2 September patch. This is not set in stone..."

bolsen
Enthusiast

Same issues here on Intel/Broadcom cards. (IBM 3650)

Slow responses from the service console, normal responses from the vmkernel.

Perhaps ICMP packets are being given a low priority?

SLynched
Contributor

Is there an ETA for this patch? I have a few customers with the same ping-time-response problem.
