Hello,
I have a really weird problem with the InfiniBand connection between my ESXi hosts.
Here is my setup:
An HP C7000 with BL685c G1 blades and an HP 4x DDR IB Switch Module. The blades are running VMware ESXi 5.1.0 U2 (custom HP image), and I have also installed the Mellanox drivers (MLNX-OFED-ESX-1.8.1.0) and ib-opensm as a VIB on each of the hosts (http://www.hypervisor.fr/?p=4662).
Here are the vmnics:
# esxcli network nic list | grep 10G
vmnic_ib0 0000:047:00.0 ib_ipoib Up 20000 Full 00:23:7d:94:d8:7d 4092 Mellanox Technologies MT25418 [ConnectX VPI - 10GigE / IB DDR, PCIe 2.0 2.5GT/s]
vmnic_ib1 0000:047:00.0 ib_ipoib Up 20000 Full 00:23:7d:94:d8:7e 1500 Mellanox Technologies MT25418 [ConnectX VPI - 10GigE / IB DDR, PCIe 2.0 2.5GT/s]
I have created a VMkernel port and a vSwitch, and both the port group and the vSwitch are set up for a 4K MTU. I have also configured the mlx4_core module to support a 4K MTU:
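For completeness, the MTU settings were applied along these lines (the vSwitch and VMkernel interface names here are examples only, not necessarily the ones from my setup; the commands are the standard esxcli network namespace in 5.1):

```shell
# vSwitch MTU ("vSwitch1" is an example name)
esxcli network vswitch standard set -v vSwitch1 -m 4092
# VMkernel interface MTU ("vmk1" is an example name)
esxcli network ip interface set -i vmk1 -m 4092
# Verify both took effect
esxcli network vswitch standard list
esxcli network ip interface list
```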
# esxcli system module parameters list -m=mlx4_core | grep mtu_4k
mtu_4k int 1 configure 4k mtu (mtu_4k > 0)
I have also set opensm to support a 4K MTU via a partitions.conf file with the following content: Default=0x7fff,ipoib,mtu=5:ALL=full;
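As an aside, the mtu=5 in partitions.conf is an opensm MTU code rather than a byte count. A quick sanity check of where the 4092 figure comes from (the 4-byte IPoIB encapsulation header is why the interface MTU is 4092 and not 4096):

```shell
# opensm partition MTU codes: 1=256, 2=512, 3=1024, 4=2048, 5=4096 bytes
MTU_CODE=5
IB_MTU=$((256 << (MTU_CODE - 1)))   # mtu=5 -> 4096-byte IB MTU
IPOIB_HDR=4                         # 4-byte IPoIB encapsulation header
IPOIB_MTU=$((IB_MTU - IPOIB_HDR))   # effective IPoIB interface MTU
echo "$IB_MTU $IPOIB_MTU"
```

This prints 4096 4092, matching the MTU shown for vmnic_ib0 above.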
And here is the problem. When I am using MTU=1500:
/opt/iperf/bin # ./iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 61140
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 3.98 GBytes 3.42 Gbits/sec
[ 5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 58854
[ 5] 0.0-10.0 sec 4.53 GBytes 3.89 Gbits/sec
[ 4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 51600
[ 4] 0.0-10.0 sec 3.66 GBytes 3.15 Gbits/sec
[ 5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 60066
[ 5] 0.0-10.0 sec 4.52 GBytes 3.88 Gbits/sec
[ 4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 50728
[ 4] 0.0-10.0 sec 4.71 GBytes 4.04 Gbits/sec
[ 5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 58792
[ 5] 0.0-10.0 sec 4.54 GBytes 3.90 Gbits/sec
With MTU=2000:
/opt/iperf/bin # ./iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 62523
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 5.35 GBytes 4.59 Gbits/sec
[ 5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 56491
[ 5] 0.0-10.0 sec 5.43 GBytes 4.66 Gbits/sec
[ 4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 63144
[ 4] 0.0-10.0 sec 4.41 GBytes 3.79 Gbits/sec
[ 5] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 53978
[ 5] 0.0-10.0 sec 4.43 GBytes 3.81 Gbits/sec
[ 4] local 192.168.13.39 port 5001 connected with 192.168.13.36 port 61886
[ 4] 0.0-10.0 sec 5.38 GBytes 4.62 Gbits/sec
And with MTU=4092:
/opt/iperf/bin # ./iperf -c 192.168.13.39
------------------------------------------------------------
Client connecting to 192.168.13.39, TCP port 5001
TCP window size: 75.5 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.13.36 port 50673 connected with 192.168.13.39 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-79.5 sec 8.00 GBytes 864 Mbits/sec
/opt/iperf/bin # ./iperf -c 192.168.13.39
------------------------------------------------------------
Client connecting to 192.168.13.39, TCP port 5001
TCP window size: 75.5 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.13.36 port 49604 connected with 192.168.13.39 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-79.5 sec 8.00 GBytes 864 Mbits/sec
/opt/iperf/bin # ./iperf -c 192.168.13.39
------------------------------------------------------------
Client connecting to 192.168.13.39, TCP port 5001
TCP window size: 35.5 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.13.36 port 58764 connected with 192.168.13.39 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-79.5 sec 8.00 GBytes 864 Mbits/sec
All the testing has been done with iperf. Any suggestions as to why I get slower connection speeds with MTU=4092 than with MTU=2000? AFAIK the speed should increase with a larger MTU (I can see this trend in the difference between MTU=1500 and MTU=2000).
Any input is welcome
Hi
Welcome to communities.
It means that MTU=2000 is the best configuration for this hardware, and if you increase the MTU beyond that it may or may not work.
Well, it seems that all the hardware in my configuration supports a 4K MTU - the HCAs support it, the switch supports it, the current version of VMware (ESXi 5.1 U2) supports it, and the subnet manager is set to support it. So why would the best configuration of the hardware be an MTU of 2K? It doesn't make any sense. And I am not really seeing the expected speeds of ~10G. I would be happy with achieving speeds of 6-8G, but I am still more in the area of 3-4G. So my guess is the MTU size; the optimum for IP over InfiniBand would be a 4K MTU, even though I might see slightly weaker performance when I measure it with iperf. In real-life situations it would be better - that's what all the documentation says. The question is: why doesn't it work?
Probably not all relevant components have the 4K MTU configuration applied:
a. Check that the vSwitch is also configured for a 4K MTU (you might have configured the uplink only; the vSwitch needs to be aligned as well - check the -m flag).
b. Make sure that opensm actually applied the configuration (you may need to restart the SM or the IB switch; check the SM user manual).
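One more thing worth ruling out before blaming the MTU: a single iperf stream with the default ~64K TCP window often cannot fill an IPoIB link regardless of MTU. Something along these lines (standard iperf 2.x client flags; the window size and stream count are example values) would show whether the link itself can go faster:

```shell
# Larger TCP window and several parallel streams; a single
# default-window stream rarely saturates an IPoIB link.
./iperf -c 192.168.13.39 -w 512k -P 4 -t 30
```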