VMware Cloud Community
admin
Immortal

ESX 3.0.2 - ping latency

Hey all,

We are experiencing trouble after upgrading to ESX 3.0.2. I upgraded one machine last Friday and did not notice the problem immediately; today I patched another ESX host, and after the boot sequence completed I saw this strange behavior.

Normally all our servers have a ping latency of around 1 ms (or less), as shown below:

Pinging esx1.uz.kuleuven.ac.be [172.22.3.34] with 32 bytes of data:

Reply from 172.22.3.34: bytes=32 time=1ms TTL=63

Reply from 172.22.3.34: bytes=32 time<1ms TTL=63

Reply from 172.22.3.34: bytes=32 time<1ms TTL=63

...

Reply from 172.22.3.34: bytes=32 time<1ms TTL=63

Reply from 172.22.3.34: bytes=32 time<1ms TTL=63

Ping statistics for 172.22.3.34:

Packets: Sent = 24, Received = 24, Lost = 0 (0% loss),

Approximate round trip times in milli-seconds:

Minimum = 0ms, Maximum = 1ms, Average = 0ms

The two hosts that are running 3.0.2 have these response times:

Pinging playvmware.uz.kuleuven.ac.be [172.22.1.79] with 32 bytes of data:

Reply from 172.22.1.79: bytes=32 time=1ms TTL=63

Reply from 172.22.1.79: bytes=32 time=10ms TTL=63

Reply from 172.22.1.79: bytes=32 time=10ms TTL=63

Reply from 172.22.1.79: bytes=32 time=9ms TTL=63

Reply from 172.22.1.79: bytes=32 time=108ms TTL=63

Reply from 172.22.1.79: bytes=32 time=10ms TTL=63

...

Reply from 172.22.1.79: bytes=32 time<1ms TTL=63

Reply from 172.22.1.79: bytes=32 time=9ms TTL=63

Reply from 172.22.1.79: bytes=32 time=8ms TTL=63

Reply from 172.22.1.79: bytes=32 time=8ms TTL=63

...

Reply from 172.22.1.79: bytes=32 time=2ms TTL=63

Reply from 172.22.1.79: bytes=32 time=33ms TTL=63

Reply from 172.22.1.79: bytes=32 time=2ms TTL=63

Reply from 172.22.1.79: bytes=32 time=11ms TTL=63

Ping statistics for 172.22.1.79:

Packets: Sent = 33, Received = 33, Lost = 0 (0% loss),

Approximate round trip times in milli-seconds:

Minimum = 0ms, Maximum = 108ms, Average = 8ms

Can someone verify this, or are we the only ones with this issue? It seems odd to me that two boxes, both just upgraded, are experiencing the same problem.

187 Replies
bertdb
Virtuoso

Hmm, one of the reports higher up suggests that when the SC is receiving a lot of packets (during a file transfer), the response time gets better, but when the SC is receiving very few packets, the response time varies.

That sounds like the priority of delivering/sending SC network packets is dynamic: a ping (plus an SSH connection) leaves the priority very low, but very active network connections like a file transfer (whichever protocol you use) make the priority go up (and the latency go down).
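
One quick way to probe that theory is to compare the average RTT while the SC is idle against the average while a big file transfer to the SC is running. A minimal sketch, assuming a Linux or busybox `ping` on the machine doing the pinging; `HOST` is a placeholder for your service console address:

```shell
# Measure the average ping RTT to a host by parsing the summary line
# (works for both the iputils "rtt min/avg/max/mdev = ..." line and the
# busybox "round-trip min/avg/max = ..." line).
HOST=127.0.0.1   # placeholder: substitute your ESX service console IP
avg=$(ping -c 5 "$HOST" | awk -F'=' '/avg/ {split($2, a, "/"); print a[2]}')
echo "average RTT to $HOST: $avg ms"
```

Run it once while the SC is idle and once while a large scp to the host is active in a second terminal; under the theory above, the busy run should report the lower average.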

alex2801
Contributor

I have 2 new ESX servers, both HP DL380 G5. When I ping the server that has no VMs, the ping response is approx 6 ms, but when I ping the server that is currently doing a P2V conversion, the ping response is 1 ms!

bfrank81
Enthusiast

I use esxRanger to back up all of my VMs daily. It pulls the data across the network to a Windows server; from there we push it to tape and send it off-site.

Does anyone know if this problem would cause those backup times to take longer?

DavidCOM
Contributor

Has anyone got a fix or reply about this? I'm having the same issue.

admin
Immortal

I think we need to be patient. They're working on my case; as soon as I get an update, I'll post it on this board.

MinEZ
Contributor

We have the same problem after upgrading to 3.0.2.

I just reinstalled version 3.0.1 and the latency is back to <1 ms.

We wanted to see whether this was why we were unable to create a datastore after the LUN had been discovered.

The creation of the datastore fails with a timeout.

Perhaps this problem belongs in a different thread, but if anyone has a suggestion...

shassing
Contributor

This Monday we decided to make the update to version 3.0.2.

Everything worked out well except for one thing:

The backup performance is terribly slow, no higher than 200 MB/min!

Before Monday we ran at 1600 MB/min.

We are using Backup Exec 10d with the Unix remote agent, and installed the RALUS (Remote Agent for Linux and Unix) on our ESX server 5.0.

Firewall settings are OK; we opened TCP out: 6101 and 6102, and TCP in: 8192-8198. On the Backup Exec side we opened 1025-65535.

What more could I do?

Does this sound familiar?

Argyle
Enthusiast

I can confirm the same thing on an HP BL25p G1 with the latest firmware from HP Firmware CD 7.80.

bertdb
Virtuoso

You have checked your link speed, right? Throughput dropping nearly tenfold smells like a 1G->100M speed difference somewhere.
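
As a sanity check, the negotiated speed can be read straight from the OS. A minimal sketch for a generic Linux box (on the ESX 3.x service console, `esxcfg-nics -l` reports the same speed/duplex information):

```shell
# Print each interface's negotiated link speed as reported by the kernel.
# Loopback/virtual interfaces may not report one, hence the "n/a" fallback.
for dev in /sys/class/net/*; do
  name=$(basename "$dev")
  speed=$(cat "$dev/speed" 2>/dev/null || echo "n/a")
  echo "$name: $speed Mb/s"
done
```

A physical NIC showing 100 instead of 1000 here would explain an eightfold backup slowdown on its own.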

bertdb
Virtuoso

MinEZ, it seems very unlikely that these two issues are related. The ping latency issue is strange, very strange, and begs for an explanation, but it isn't really a problem; it doesn't keep the service console from responding to connections.

Your storage problem can be investigated further by someone with service console command-line experience (esxcfg-vmhbadevs, fdisk, vmkfstools).

bertdb
Virtuoso

Higher latency isn't directly linked to lower bandwidth, but the two _could_ be connected.

Anyone remember the article that calculated the bandwidth of sending a truck full of DAT tapes from the east coast to the west coast of the USA? Impressive bandwidth at the time, but the latency...
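
The arithmetic is easy to redo with made-up but plausible numbers (500 DAT tapes of 40 GB each and a 48-hour coast-to-coast drive; all figures here are assumptions for illustration):

```shell
# Back-of-envelope "truck full of tapes" throughput (illustrative numbers).
tapes=500        # assumed number of tapes in the truck
gb_per_tape=40   # assumed capacity per DAT tape, in GB
hours=48         # assumed drive time coast to coast
total_mbits=$((tapes * gb_per_tape * 8 * 1000))  # payload in megabits
seconds=$((hours * 3600))
echo "~$((total_mbits / seconds)) Mb/s sustained, at 48 hours of latency"
# prints: ~925 Mb/s sustained, at 48 hours of latency
```

Huge sustained bandwidth, dreadful latency: exactly the distinction being made here.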

Sanderm
Contributor

I am having the same problem. I upgraded six Dell 2950 servers from 3.0.1 (no patches) to 3.0.2 and am seeing slow ping times. With 3.0.1, pings responded within 1 millisecond; now a ping takes around 6 milliseconds.

jhanekom
Virtuoso

Definitely not limited to Broadcom: I just did a test on 3.0.2 with an ML350 G5 server with an additional Intel-based card installed.

In my own case there don't appear to be any serious issues beyond the cosmetic ping latency: I did a network speed test and still get near wire-speed throughput from the service console.

Unfortunately I only have a 100Mbps setup in my test lab, but I get ~80Mbps+ throughput on both the e1000 and bnx2-based cards, though throughput seems to vary quite a bit between tests.

I don't have any baseline for what performance was like on 3.0.1.


shassing
Contributor

Thanks for your reply.

We are not using a blade server; we have an IBM x235.

shassing
Contributor

No, everything is fixed at 1000 full duplex.

waldorfo2
Contributor

Thanks amr77 for pointing me to this thread.

I am having exactly the same issue.

While slow ping might be considered a cosmetic issue, VERY slow Samba transfers are something more serious for us. It takes 10 minutes to copy a 50 MB file (roughly 0.08 MB/s).

The funny thing is that scp copies work perfectly. I have not tried FTP yet.

The problem exists on 3 different ESX servers (FSC RX300 and Dell 6950). I tried both Intel and Broadcom NICs. It's definitely ESX-related, as I tried different Samba servers as well.

meistermn
Expert

Have you tried a completely fresh install?

Maybe there is a difference between a new installation and an upgrade from 3.0.1 to 3.0.2.

amr77
Contributor

As per my earlier post - I installed from scratch on a new server with the latest firmware and got the same results.

VMware support emailed me this afternoon stating that they are gathering service requests on this issue and will contact me with further information.

shassing
Contributor

No, all network adapters are running at 1000 full duplex.

Strange thing: we also noticed the ping latency, but we don't know what the effective impact is.

MinEZ
Contributor

Today I reinstalled ESX 3.0.1 and... I can create the datastores again.

The latency probably causes a timeout in my situation.

So for me, no 3.0.2 until this problem is solved.
