Hey all,
We have been having trouble since upgrading to ESX 3.0.2. I upgraded one machine last Friday and did not notice the problem immediately. Today I patched another ESX host, and after the boot sequence completed I saw this strange behavior.
Normally all our servers have a ping latency of around 1 ms (or less), as shown below:
Pinging esx1.uz.kuleuven.ac.be [172.22.3.34] with 32 bytes of data:
Reply from 172.22.3.34: bytes=32 time=1ms TTL=63
Reply from 172.22.3.34: bytes=32 time<1ms TTL=63
Reply from 172.22.3.34: bytes=32 time<1ms TTL=63
...
Reply from 172.22.3.34: bytes=32 time<1ms TTL=63
Reply from 172.22.3.34: bytes=32 time<1ms TTL=63
Ping statistics for 172.22.3.34:
Packets: Sent = 24, Received = 24, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 1ms, Average = 0ms
The two hosts that are running 3.0.2 have these response times:
Pinging playvmware.uz.kuleuven.ac.be [172.22.1.79] with 32 bytes of data:
Reply from 172.22.1.79: bytes=32 time=1ms TTL=63
Reply from 172.22.1.79: bytes=32 time=10ms TTL=63
Reply from 172.22.1.79: bytes=32 time=10ms TTL=63
Reply from 172.22.1.79: bytes=32 time=9ms TTL=63
Reply from 172.22.1.79: bytes=32 time=108ms TTL=63
Reply from 172.22.1.79: bytes=32 time=10ms TTL=63
...
Reply from 172.22.1.79: bytes=32 time<1ms TTL=63
Reply from 172.22.1.79: bytes=32 time=9ms TTL=63
Reply from 172.22.1.79: bytes=32 time=8ms TTL=63
Reply from 172.22.1.79: bytes=32 time=8ms TTL=63
...
Reply from 172.22.1.79: bytes=32 time=2ms TTL=63
Reply from 172.22.1.79: bytes=32 time=33ms TTL=63
Reply from 172.22.1.79: bytes=32 time=2ms TTL=63
Reply from 172.22.1.79: bytes=32 time=11ms TTL=63
Ping statistics for 172.22.1.79:
Packets: Sent = 33, Received = 33, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 108ms, Average = 8ms
Can someone verify this, or are we the only ones with this issue? It seems odd to me that two boxes, both just upgraded, show the same problem.
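For anyone who wants to compare hosts the same way, here is a minimal Python sketch that pulls the round-trip times out of Windows ping output like the above and summarizes them. The function names are made up for illustration, and treating `time<1ms` as 0 ms is an assumption chosen to match the "Minimum = 0ms" line that ping itself prints.

```python
import re

# Matches the latency field of a Windows ping reply,
# e.g. "time=10ms" or "time<1ms".
REPLY_TIME = re.compile(r"time([=<])(\d+)ms")


def parse_ping_times(output):
    """Return round-trip times in ms; "time<1ms" is counted as 0 (assumption)."""
    times = []
    for line in output.splitlines():
        match = REPLY_TIME.search(line)
        if match:
            operator, value = match.groups()
            times.append(0 if operator == "<" else int(value))
    return times


def summarize(times):
    """Return (minimum, maximum, integer average), or None if there were no replies."""
    if not times:
        return None
    return min(times), max(times), sum(times) // len(times)


# A few reply lines copied from the 3.0.2 host output above.
sample = """\
Reply from 172.22.1.79: bytes=32 time=1ms TTL=63
Reply from 172.22.1.79: bytes=32 time=10ms TTL=63
Reply from 172.22.1.79: bytes=32 time=108ms TTL=63
Reply from 172.22.1.79: bytes=32 time<1ms TTL=63
"""

print(summarize(parse_ping_times(sample)))
```

Saving the ping output of each host to a file and feeding it through `parse_ping_times` should make it easy to see which hosts show the spikes.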
Hi all.
FYI: we have reproduced the problem; the latency, SMB and backup issues seem related.
Looks like a bug due to a change in behavior.
A bit of a blunder, as COS performance is supposed to be improved. :smileygrin:
\- Anders
Has VMware confirmed this as a bug? If so, will they release a fix?
I entered an SR yesterday and got a call back. They confirmed they have an engineer working on the problem. They mentioned they think it has to do with interrupt handling. They did not give a timeline for the fix.
They had me run one test: I started a large scp file copy to the host, and the ping times dropped to under 1 ms during the copy. Once it finished, they jumped right back up. I also noticed that pinging from the host outward shows no slowdown; it appears to affect only traffic going to the server.
It is still a big problem. With my lab management software, the ping times are causing some timeouts when talking to the agent on the ESX hosts.
Need a fix soon!!
Hi Anders
Thanks for letting us know.
Is there some way that we can have a reference for the problem internally so that when we create an SR we can just be added to the list of customers waiting for the fix?
Leonard...
My SR # is: 193351021
I guess you could tell them to look at my SR and ask them to add you to the customers needing the fix. They did say they would keep the ticket open until they had a fix. Hope that helps.
I have 4 identical ESX 3.0.2 hosts. My service console network is a separate network. I pinged each host from my VirtualCenter server:
esx1 1-2 ms
esx2 1-5 ms
esx3 5-65 ms
esx4 4-8 ms
The problem seems to depend on the (interrupt?) load of the ESX host.
I'm also interested in a fix for this problem.
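To keep an eye on which hosts are affected, the figures above can be checked with a short Python sketch like the one below. The 5 ms cut-off and the function names are assumptions for illustration only, not a VMware figure.

```python
import re

# Ping ranges as reported in the post above (lowest-highest in ms per host).
REPORT = """\
esx1 1-2 ms
esx2 1-5 ms
esx3 5-65 ms
esx4 4-8 ms
"""

# One report line: host name, then "low-high ms".
LINE = re.compile(r"^(\S+)\s+(\d+)-(\d+)\s*ms$")


def parse_ranges(report):
    """Map host name -> (lowest, highest) latency in ms."""
    ranges = {}
    for line in report.splitlines():
        match = LINE.match(line.strip())
        if match:
            host, low, high = match.groups()
            ranges[host] = (int(low), int(high))
    return ranges


def hosts_over(ranges, threshold_ms):
    """Hosts whose worst-case latency exceeds threshold_ms, worst first."""
    slow = [(host, high) for host, (_, high) in ranges.items() if high > threshold_ms]
    return sorted(slow, key=lambda item: item[1], reverse=True)


# THRESHOLD_MS is an assumed cut-off, chosen only for this example.
THRESHOLD_MS = 5
print(hosts_over(parse_ranges(REPORT), THRESHOLD_MS))
```

With that cut-off, esx3 and esx4 are flagged while esx1 and esx2 are not, which fits the idea that the latency depends on how busy each host is.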
Hi.
Your TSE should know the PR number; we're a bit reluctant to share those.
If he can't find it, ask him to e-mail me.
I'm sure this will be a public fix as a lot of people seem to be hit by this.
\- Anders
I have performed quite a few VMware Converter migrations from 2.5 to 3.0.2, and the ping time looks OK only when we kick off a constant stream of data. Otherwise, like the rest of you, we see really bad response times.
Hopefully there will be a fix soon.
Hi all.
A fix is cooking in engineering, but might take some time to bake properly.
Looks like it will not make the next patch release cycle.
\- Anders
Having the same problem, with over 10 ms latency sometimes.
This is a brand new VMware farm with 10 NICs in each server. It is fine on all the other interfaces (VMotion, VM, iSCSI, etc.), but the SC is painfully slow!
Going to ring up support now. Ideally we need the fix before the next patch release... Has anyone got any firm info as to when 3.1 is out?!
Hi,
Maybe 3.1 will come at the end of October:
http://www.ntpro.nl/blog/archives/149-VMware-ESX-3.1-whats-new.html
http://www.virtualization.info/2007/08/vmware-esx-server-31-virtualcenter-21.html
http://www.vmachine.de/mambo/index.php?option=com_content&task=view&id=331&Itemid=1
http://www.vmware.com/community/thread.jspa?threadID=90011
Regards
Wolfgang
Hey all,
Got this issue as well.
Upgraded 2 x 3.0.1 hosts.
Dell 2900s, a mix of Broadcoms and Intels.
In the monthly patch release only ESX-1001732 is about network problems and I'm not sure that it solves the problem.
That is too bad. I have already downgraded to 3.0.1, and now I am doing my work on XenSource.
I don't think this issue is bad enough to justify moving to XenSource....nothing can be that bad surely?
I need to test certain guest OSes that are supported by 3.0.2, but because of the management software we use and the console problem, I had to move that work to Xen. Luckily, the management software supports more than one virtualization technology.
I also just received an update to my SR about this. Quote: "...it looks like the code has been merged and it will tentatively be in the Esx 3.0.2 September patch. This is not set in stone..."
Same issues here on Intel/Broadcom cards (IBM 3650).
Slow responses from the service console, normal responses from the VMkernel.
Perhaps ICMP packets are given a low priority?
Is there an ETA for this patch? I have a few customers with the same ping-time-response problem.