trexima
Contributor

Low VM-VM speeds on same host

Hello,

I'm troubleshooting quite low VM-to-VM Ethernet speeds on our Enterprise Plus hosts. The topology is 3x blade servers connected via 16G FC to the same storage, with a distributed switch as the network backbone.

I'm trying to move a 10GB file between VM1 and VM2, which are on the same subnet and on the same host within the cluster. Both machines have one VMXNET3 adapter each, run Windows Server 2019, and both report a "10.0 Gbps" Ethernet connection in their status.

The maximum speed I get when transferring data is about 1.3 Gbps, which is quite low.

Any ideas where to even start looking for hiccups? Thank you.

PS: Not sure whether this belongs in vSphere or NSX, but since this is, by my guess, a networking issue, I'm posting it here. Feel free to correct me if I'm wrong.

 

Peter

13 Replies
p0wertje
Enthusiast

Hi,

 

You could try 'iperf'. You can run it in server mode on one end and client mode on the other, so you can test the network in isolation.
But VM to VM on the same host you should get high speeds. Maybe storage is the issue here.
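The server/client test above could look something like this; the IP address and duration are placeholders, and this assumes iperf3 is installed in both guests:

```shell
# On VM2 (the receiver), start iperf3 in server mode:
iperf3 -s

# On VM1 (the sender), run the client against VM2's IP for 10 seconds:
iperf3 -c 192.168.1.20 -t 10

# Optionally add -R to reverse the direction and test the other way:
iperf3 -c 192.168.1.20 -t 10 -R
```

This takes storage out of the picture entirely, since iperf3 only generates network traffic from memory.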

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT
shank89
Hot Shot

SSH onto the host and use esxtop to check utilisation. 

 

Run esxtop and then press 'n' for the networking view, or see the link below to check disk utilisation.

https://kb.vmware.com/s/article/1008205
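If you prefer a non-interactive capture, esxtop also has a batch mode; the interval and sample count below are just example values:

```shell
# Interactive: run esxtop, then press 'n' (network) or 'd' (disk adapter)
esxtop

# Batch mode: 10 samples at 5-second intervals, written to CSV for later review
esxtop -b -d 5 -n 10 > esxtop-capture.csv
```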

Shashank Mohan

VCAP-NV 2020 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au
trexima
Contributor

Hi,

I've run iperf3; the speed is about 1.8 Gbit/s from host to host:

trexima_1-1614001261979.png

 

18 Gbit/s over local loopback:

trexima_0-1614001183020.png

 

trexima
Contributor

Aaand here's the report from esxtop networking:

trexima_1-1614001601428.png

Seems rather "lazy" at first glance.

trexima
Contributor

And here esxtop networking during iperf3:

trexima_0-1614001859813.png

 

It is consistent with the iperf reports, roughly 1800 Mbit/s, which is really low considering the 10 Gbit connectivity.

p0wertje
Enthusiast

Does the result stay the same when you try more sessions?

For example, iperf3 -P 10 (if I remember right) will use 10 parallel streams instead of 1.


Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT
trexima
Contributor

Yes, the results are the same. With -P 10 it creates 10 streams of max 180 Mbit/s each, resulting in 1800 Mbit/s total.

p0wertje
Enthusiast

Do you also have the issue with a normal port group, or only with an NSX segment?
Did you check all the settings on the physical switches? No issues there? Maybe some QoS or something on the blades?

 

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT
Sreec
VMware Employee

I'm trying to move 10GB file between VM1 and VM2 which are on the same subnet, on the same "host"

Keeping the host-to-host bandwidth issue aside, VM-to-VM on the same host and subnet should give you good bandwidth, as it is in-memory switching.

1. Is the issue specific to these Windows 2019 machines?  

2. Have you tested with any other OS? 

3. How are you moving the files from VM to VM? Any specific software, or a normal copy & paste operation?

Cheers,
Sree | CKA|CKAD|VCIX-3X| VCAP-4X| VExpert 5x
shank89
Hot Shot

Have you checked the disk utilization by running esxtop and pressing 'd'? You should see something similar to the image below.

shank89_0-1614026210445.png

 

Shashank Mohan

VCAP-NV 2020 | VCP-DCV2019 | CCNP Specialist

https://lab2prod.com.au
trexima
Contributor

Hi there,

trexima_1-1614071055604.png

This is disk usage from esxtop

@p0wertje: I don't know whether there's a difference between an NSX and a normal port group. We have vSphere 7 Enterprise Plus in a cluster with HA and vMotion across 3 Dell R640 blades in a Dell FX2 chassis, with 10 GbE IOMs connected to a distributed switch. Both machines are currently on the same blade.

The reason I came across this issue in the first place is that I noticed low transfer speeds from this blade setup to another site on a 10 Gbit network during backups, so I started digging into where the problem might be.

So I performed a simple copy test from one vhost to another and this issue surfaced.

Then I performed an iperf3 test in order to rule out possible storage bottlenecks (the storage is an all-SSD ME4024 connected over 16 Gb Fibre Channel, but even that shouldn't be a problem: CrystalDiskMark on the host shows very nice numbers, 1450 MB/s read and 1580 MB/s write, which are near the technical caps of our storage).

I'm not aware of any QoS enabled on the DSwitch, IOMs or network. How can I check that?

 

We have only default MTUs on our DSwitch, no jumbo frames or any other non-default values. But that should not be a problem in this case...
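For what it's worth, the configured MTU and uplink state can be checked from an SSH session on the host; these are standard esxcli calls, though the exact output columns vary by version:

```shell
# List distributed switches with their configured MTU
esxcli network vswitch dvs vmware list

# Check the physical uplinks: link speed, duplex, MTU
esxcli network nic list
```

Network I/O Control (the DSwitch-level QoS) is visible in the vSphere Client in the distributed switch's settings, under resource allocation.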

p0wertje
Enthusiast

Hi,

 

Like @Sreec mentions: when the VMs are on the same host, you should get high numbers because of the in-memory switching.
If you do not get high numbers from VM to VM on the same host, please check the reply from @Sreec.
Maybe you should open a support case with VMware so they can look into the issue?

What I mean by the difference between a 'normal port group' and an 'NSX port group':
Normal: does not leverage an NSX segment (VLAN-backed or Geneve-backed), but is an ordinary port group you create from within vCenter (which is VLAN-based).

Cheers,
p0wertje | VCIX6-NV | JNCIS-ENT
trexima
Contributor

OK, tried CentOS to CentOS:

[SUM] 0.00-10.00 sec 10.9 GBytes 9.34 Gbits/sec 5614 sender
[SUM] 0.00-10.00 sec 10.9 GBytes 9.34 Gbits/sec receiver

Now this looks more like it. This means the problem might lie within Windows.
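Since CentOS-to-CentOS hits near line rate, one place to compare would be the VMXNET3 offload settings inside the Windows guests. A rough sketch using standard Windows cmdlets (the adapter name "Ethernet0" is an example, not necessarily yours):

```powershell
# Show RSS and RSC state for the vmxnet3 adapter (adapter name is an example)
Get-NetAdapterRss -Name "Ethernet0"
Get-NetAdapterRsc -Name "Ethernet0"

# Dump all advanced properties (offloads, buffers) to compare between the two VMs
Get-NetAdapterAdvancedProperty -Name "Ethernet0" | Format-Table DisplayName, DisplayValue
```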
