VMware Cloud Community
AlexMercerTheGr
Contributor

Poor Infiniband performance on VMware ESXi 5.1 u2

Hello,

I have a really weird problem with the InfiniBand connection between my ESXi hosts.

Here is my setup :

HP C7000 with BL685c G1 blades and an HP 4x DDR IB Switch Module. The blades are running VMware ESXi 5.1.0 U2 (custom HP image). I have also installed the Mellanox drivers (MLNX-OFED-ESX-1.8.1.0) and ib-opensm as a VIB on each of the hosts (http://www.hypervisor.fr/?p=4662).

Here are the vmnics :

# esxcli network nic list | grep 10G

vmnic_ib0  0000:047:00.0  ib_ipoib  Up    20000  Full    00:23:7d:94:d8:7d  4092  Mellanox Technologies MT25418 [ConnectX VPI - 10GigE / IB DDR, PCIe 2.0 2.5GT/s]

vmnic_ib1  0000:047:00.0  ib_ipoib  Up    20000  Full    00:23:7d:94:d8:7e  1500  Mellanox Technologies MT25418 [ConnectX VPI - 10GigE / IB DDR, PCIe 2.0 2.5GT/s]

I have created a VMkernel port and a vSwitch; both the port group and the vSwitch are set up for a 4k MTU. I have also configured mlx4_core to support a 4k MTU:

# esxcli system module parameters list -m=mlx4_core | grep mtu_4k

mtu_4k                  int           1       configure 4k mtu (mtu_4k > 0)


I have also set opensm to support a 4k MTU, using a partitions.conf file with the following content: Default=0x7fff,ipoib,mtu=5:ALL=full;
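For reference, the mtu=5 value in partitions.conf is an InfiniBand MTU code rather than a byte count. A minimal sketch of the mapping follows; the function names ib_mtu_bytes and ipoib_mtu are made up for illustration and are not part of opensm or the Mellanox tools. The 4 bytes subtracted are the IPoIB encapsulation header, which is why the vmnic reports 4092 rather than 4096.

```shell
#!/bin/sh
# Map an InfiniBand MTU code (as written in opensm's partitions.conf) to bytes.
# ib_mtu_bytes is a hypothetical helper name used only in this sketch.
ib_mtu_bytes() {
    case "$1" in
        1) echo 256 ;;
        2) echo 512 ;;
        3) echo 1024 ;;
        4) echo 2048 ;;
        5) echo 4096 ;;
        *) echo "unknown IB MTU code: $1" >&2; return 1 ;;
    esac
}

# IPoIB adds a 4-byte encapsulation header, so the usable IP MTU
# is 4 bytes smaller than the InfiniBand MTU.
ipoib_mtu() {
    bytes=$(ib_mtu_bytes "$1") || return 1
    echo $((bytes - 4))
}

ipoib_mtu 5   # prints 4092
```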

And here is the problem. When I am using MTU=1500

/opt/iperf/bin # ./iperf -s

------------------------------------------------------------

Server listening on TCP port 5001

TCP window size: 64.0 KByte (default)

------------------------------------------------------------

[  4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 61140

[ ID] Interval       Transfer     Bandwidth

[  4]  0.0-10.0 sec  3.98 GBytes  3.42 Gbits/sec

[  5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 58854

[  5]  0.0-10.0 sec  4.53 GBytes  3.89 Gbits/sec

[  4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 51600

[  4]  0.0-10.0 sec  3.66 GBytes  3.15 Gbits/sec

[  5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 60066

[  5]  0.0-10.0 sec  4.52 GBytes  3.88 Gbits/sec

[  4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 50728

[  4]  0.0-10.0 sec  4.71 GBytes  4.04 Gbits/sec

[  5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 58792

[  5]  0.0-10.0 sec  4.54 GBytes  3.90 Gbits/sec

MTU=2000

/opt/iperf/bin # ./iperf -s

------------------------------------------------------------

Server listening on TCP port 5001

TCP window size: 64.0 KByte (default)

------------------------------------------------------------

[  4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 62523

[ ID] Interval       Transfer     Bandwidth

[  4]  0.0-10.0 sec  5.35 GBytes  4.59 Gbits/sec

[  5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 56491

[  5]  0.0-10.0 sec  5.43 GBytes  4.66 Gbits/sec

[  4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 63144

[  4]  0.0-10.0 sec  4.41 GBytes  3.79 Gbits/sec

[  5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 53978

[  5]  0.0-10.0 sec  4.43 GBytes  3.81 Gbits/sec

[  4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 61886

[  4]  0.0-10.0 sec  5.38 GBytes  4.62 Gbits/sec

MTU=4092

/opt/iperf/bin # ./iperf -c 192.168.13.39

------------------------------------------------------------

Client connecting to 192.168.13.39, TCP port 5001

TCP window size: 75.5 KByte (default)

------------------------------------------------------------

[  3] local 192.168.13.36 port 50673 connected with 192.168.13.39 port 5001

[ ID] Interval       Transfer     Bandwidth

[  3]  0.0-79.5 sec  8.00 GBytes   864 Mbits/sec

/opt/iperf/bin # ./iperf -c 192.168.13.39

------------------------------------------------------------

Client connecting to 192.168.13.39, TCP port 5001

TCP window size: 75.5 KByte (default)

------------------------------------------------------------

[  3] local 192.168.13.36 port 49604 connected with 192.168.13.39 port 5001

[ ID] Interval       Transfer     Bandwidth

[  3]  0.0-79.5 sec  8.00 GBytes   864 Mbits/sec

/opt/iperf/bin # ./iperf -c 192.168.13.39

------------------------------------------------------------

Client connecting to 192.168.13.39, TCP port 5001

TCP window size: 35.5 KByte (default)

------------------------------------------------------------

[  3] local 192.168.13.36 port 58764 connected with 192.168.13.39 port 5001

[ ID] Interval       Transfer     Bandwidth

[  3]  0.0-79.5 sec  8.00 GBytes   864 Mbits/sec

All the testing was done with iperf. Any suggestions as to why I get slower connection speeds with MTU=4092 than with MTU=2000? AFAIK the speed should increase as the MTU gets larger (I can see this trend in the difference between MTU=1500 and MTU=2000).
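To put rough numbers on that trend, the runs above can be averaged per MTU. This is only a sketch: the bandwidth figures are copied by hand from the iperf output in this post, and avg is a throwaway helper name.

```shell
#!/bin/sh
# Print the arithmetic mean of all arguments, rounded to two decimals.
# avg is a hypothetical helper name used only in this sketch.
avg() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR > 0) printf "%.2f\n", sum / NR }'
}

# Bandwidth values in Gbits/sec, taken from the iperf runs above.
echo "MTU 1500: $(avg 3.42 3.89 3.15 3.88 4.04 3.90) Gbits/sec"
echo "MTU 2000: $(avg 4.59 4.66 3.79 3.81 4.62) Gbits/sec"
echo "MTU 4092: $(avg 0.864 0.864 0.864) Gbits/sec"
```

With these inputs the averages come out at about 3.71, 4.29 and 0.86 Gbits/sec, so the step from 1500 to 2000 helps and the step to 4092 costs roughly a factor of five.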

Any input is welcome :)

3 Replies
grace27
Enthusiast

Hi

Welcome to communities.

It means that MTU=2000 is the best configuration for this hardware. If you increase the MTU further, it may or may not work.

AlexMercerTheGr
Contributor

Well, it seems that all the hardware in my configuration supports MTU=4k: the HCAs support it, the switch supports it, the current version of VMware (ESXi 5.1 U2) supports it, and the subnet manager is set to support it. So why would the best configuration of the hardware be an MTU of 2k? It doesn't make any sense. And I am not really seeing the expected speeds of ~10G. I would be happy with achieving speeds of 6-8G, but I am still more in the area of 3-4. So my guess is that it is the MTU size. The optimum for Ethernet over InfiniBand would be a 4k MTU, even though I would see slightly weaker performance when I measure it with iperf. In real-life situations it would be better; that's what all the documentation says. The question is, why doesn't it work?

mlxali
Enthusiast

Probably not all of the relevant components have the 4K MTU configuration applied:

a. Check that the vSwitch is also configured for a 4K MTU (you might have configured only the uplink; the vSwitch needs to match as well. Check the -m flag).

b. Make sure that opensm has applied the configuration (you may need to restart the SM or the IB switch; check the SM user manual).
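One way to run these checks from the ESXi shell is sketched below. Treat it as a starting point under assumptions: vSwitch1 is a placeholder name, the peer address is the iperf server from this thread, and icmp_payload is a helper invented for this example. The script only touches the host when esxcfg-vswitch is present, and the command that changes the MTU is left commented out.

```shell
#!/bin/sh
# Assumed values: vSwitch1 is a placeholder; 192.168.13.39 is the iperf server above.
VSWITCH="vSwitch1"
PEER="192.168.13.39"
MTU=4092

# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
# icmp_payload is a hypothetical helper name.
icmp_payload() {
    echo $(($1 - 28))
}

if command -v esxcfg-vswitch >/dev/null 2>&1; then
    esxcfg-vswitch -l    # check the MTU column for the IB vSwitch
    esxcfg-vmknic -l     # check the MTU column for the VMkernel interface
    # esxcfg-vswitch -m "$MTU" "$VSWITCH"   # uncomment to set the vSwitch MTU
    # -d sets the don't-fragment bit, -s sets the ICMP payload size
    vmkping -d -s "$(icmp_payload "$MTU")" "$PEER"
else
    echo "not an ESXi shell; would ping with a payload of $(icmp_payload "$MTU") bytes"
fi
```

If the don't-fragment ping fails at this size but works at a smaller one, some hop in the path is still using a smaller MTU.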
