Hi,
We have two ESXi hosts, both originally running 5.5u1 build 1623367. We've upgraded one of the hosts to 5.5u2 build 2302651, but now our IPoIB devices are missing and we get some strange-looking NICs instead. I have removed the VMware Mellanox drivers and installed the VIBs from Mellanox (MLNX-OFED-ESX-1.9.10.0-10EM-550.0.0.1331820.zip). Below is the listing of the NICs, first on 5.5u1 and then on 5.5u2:
5.5u1:
# esxcli network nic list
Name PCI Device Driver Link Speed Duplex MAC Address MTU Description
--------- ------------- -------- ---- ----- ------ ----------------- ---- ------------------------------------------------------------------------------
vmnic0 0000:002:00.0 igb Up 1000 Full 00:25:90:c7:f2:00 1500 Intel Corporation I350 Gigabit Network Connection
vmnic1 0000:002:00.1 igb Down 0 Half 00:25:90:c7:f2:01 1500 Intel Corporation I350 Gigabit Network Connection
vmnic2 0000:081:00.0 ixgbe Up 10000 Full 90:e2:ba:3f:cd:20 9000 Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic3 0000:081:00.1 ixgbe Down 0 Half 90:e2:ba:3f:cd:21 1500 Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic_ib0 0000:006:00.0 ib_ipoib Up 40000 Full 00:02:c9:2c:d2:f9 1500 Mellanox Technologies MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]
vmnic_ib1 0000:006:00.0 ib_ipoib Up 40000 Full 00:02:c9:2c:d2:fa 1500 Mellanox Technologies MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]
5.5u2:
# esxcli network nic list
Name PCI Device Driver Link Speed Duplex MAC Address MTU Description
------------ ------------- ------- ---- ----- ------ ----------------- ---- ------------------------------------------------------------------------------
vmnic0 0000:002:00.0 igb Up 1000 Full 00:25:90:c7:f1:48 1500 Intel Corporation I350 Gigabit Network Connection
vmnic1 0000:002:00.1 igb Down 0 Half 00:25:90:c7:f1:49 1500 Intel Corporation I350 Gigabit Network Connection
vmnic1000402 0000:006:00.0 mlx4_en Down 0 Half 00:02:c9:2c:d3:0d 1500 Mellanox Technologies MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]
vmnic2 0000:081:00.0 ixgbe Up 10000 Full 90:e2:ba:3f:cf:30 9000 Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic3 0000:081:00.1 ixgbe Down 0 Half 90:e2:ba:3f:cf:31 1500 Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic4 0000:006:00.0 mlx4_en Down 0 Full 00:02:c9:2c:d3:0c 1500 Mellanox Technologies MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]
As you can see, on 5.5u1 the ib_ipoib driver was used by default, but now on 5.5u2 the mlx4_en driver has taken over. Do I have to reconfigure this manually? What are the steps? Also, why the strange-looking vmnic name (vmnic1000402)?
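For completeness, the driver swap on the upgraded host was done roughly like this (the depot path is just where I stage driver bundles; the exact inbox VIB names may vary by build):
# esxcli software vib remove -n net-mlx4-en
# esxcli software vib remove -n net-mlx4-core
# esxcli software vib install -d /vmfs/volumes/meta2/updates/MLNX-OFED-ESX-1.9.10.0-10EM-550.0.0.1331820.zip
# reboot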
Any help is appreciated.
Thanks,
Tom
I've just noticed that the ib_ipoib driver is not loaded on the upgraded host:
5.5u1:
# vmkload_mod -l | grep ib_ipoib
ib_ipoib 0 140
# echo $?
0
5.5u2:
# vmkload_mod -l | grep ib_ipoib
# echo $?
1
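Checking the installed VIBs and modules points the same way (output not pasted here, but on the upgraded host there is simply no ipoib module present, so vmkload_mod ib_ipoib or esxcli system module load -m ib_ipoib can't help until a VIB providing it is installed):
# esxcli software vib list | grep -i ipoib
# esxcli system module list | grep ib_ipoib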
Does anyone know how to load this driver and get IPoIB reconfigured?
Regards,
Tom
Also, the old VIB depot (1.8.2.4) includes an ib_ipoib driver, but the new one (1.9.10.0) doesn't!
# unzip -l /vmfs/volumes/meta2/updates/MLNX-OFED-ESX-1.8.2.4-10EM-500.0.0.472560.zip
Archive: /vmfs/volumes/meta2/updates/MLNX-OFED-ESX-1.8.2.4-10EM-500.0.0.472560.zip
Length Date Time Name
-------- ---- ---- ----
321 03-06-14 09:22 index.xml
205 03-06-14 09:22 vendor-index.xml
11092 03-06-14 09:22 metadata.zip
38334 03-06-14 09:22 vib20/net-ib-cm/Mellanox_bootbank_net-ib-cm_1.8.2.4-1OEM.500.0.0.472560.vib
30708 03-06-14 09:22 vib20/net-ib-umad/Mellanox_bootbank_net-ib-umad_1.8.2.4-1OEM.500.0.0.472560.vib
54194 03-06-14 09:22 vib20/scsi-ib-srp/Mellanox_bootbank_scsi-ib-srp_1.8.2.4-1OEM.500.0.0.472560.vib
92208 03-06-14 09:22 vib20/net-mlx4-ib/Mellanox_bootbank_net-mlx4-ib_1.8.2.4-1OEM.500.0.0.472560.vib
98600 03-06-14 09:22 vib20/net-ib-ipoib/Mellanox_bootbank_net-ib-ipoib_1.8.2.4-1OEM.500.0.0.472560.vib
28628 03-06-14 09:22 vib20/net-ib-sa/Mellanox_bootbank_net-ib-sa_1.8.2.4-1OEM.500.0.0.472560.vib
127648 03-06-14 09:22 vib20/net-mlx4-core/Mellanox_bootbank_net-mlx4-core_1.8.2.4-1OEM.500.0.0.472560.vib
40898 03-06-14 09:22 vib20/net-ib-mad/Mellanox_bootbank_net-ib-mad_1.8.2.4-1OEM.500.0.0.472560.vib
55290 03-06-14 09:22 vib20/net-ib-core/Mellanox_bootbank_net-ib-core_1.8.2.4-1OEM.500.0.0.472560.vib
-------- -------
578126 12 files
~ # unzip -l /vmfs/volumes/meta2/updates/MLNX-OFED-ESX-1.9.10.0-10EM-550.0.0.1331820.zip
Archive: /vmfs/volumes/meta2/updates/MLNX-OFED-ESX-1.9.10.0-10EM-550.0.0.1331820.zip
Length Date Time Name
-------- ---- ---- ----
321 10-19-14 12:40 index.xml
205 10-19-14 12:40 vendor-index.xml
11494 10-19-14 12:40 metadata.zip
52894 10-19-14 12:40 vib20/net-ib-core/Mellanox_bootbank_net-ib-core_1.9.10.0-1OEM.550.0.0.1331820.vib
42490 10-19-14 12:40 vib20/net-ib-mad/Mellanox_bootbank_net-ib-mad_1.9.10.0-1OEM.550.0.0.1331820.vib
19510 10-19-14 12:40 vib20/net-ib-addr/Mellanox_bootbank_net-ib-addr_1.9.10.0-1OEM.550.0.0.1331820.vib
83084 10-19-14 12:40 vib20/net-mlx4-ib/Mellanox_bootbank_net-mlx4-ib_1.9.10.0-1OEM.550.0.0.1331820.vib
38066 10-19-14 12:40 vib20/net-ib-cm/Mellanox_bootbank_net-ib-cm_1.9.10.0-1OEM.550.0.0.1331820.vib
71676 10-19-14 12:40 vib20/net-mlx4-en/Mellanox_bootbank_net-mlx4-en_1.9.10.0-1OEM.550.0.0.1331820.vib
121304 10-19-14 12:40 vib20/net-mlx4-core/Mellanox_bootbank_net-mlx4-core_1.9.10.0-1OEM.550.0.0.1331820.vib
26252 10-19-14 12:40 vib20/net-ib-umad/Mellanox_bootbank_net-ib-umad_1.9.10.0-1OEM.550.0.0.1331820.vib
29786 10-19-14 12:40 vib20/net-ib-sa/Mellanox_bootbank_net-ib-sa_1.9.10.0-1OEM.550.0.0.1331820.vib
36248 10-19-14 12:40 vib20/net-rdma-cm/Mellanox_bootbank_net-rdma-cm_1.9.10.0-1OEM.550.0.0.1331820.vib
55928 10-19-14 12:40 vib20/scsi-ib-iser/Mellanox_bootbank_scsi-ib-iser_1.9.10.0-1OEM.550.0.0.1331820.vib
-------- -------
589258 14 files
Do I have the wrong driver package? Why is the net-ib-ipoib vib missing in 1.9.10.0?
Kind regards,
Tom
I've restored the MLNX-OFED-ESX-1.8.2.4-10EM-500.0.0.472560.zip drivers to the system to get it working again. ESXi still complains that the bundle isn't signed, so it has to be cajoled into place with --no-sig-check, but at least the ib_ipoib driver is back in place and vmnic_ib0 and vmnic_ib1 are back. The storage adapters also look good: vmhba_mlx4_0.1.1 and vmhba_mlx4_0.2.1 are back and show 53 paths/targets each.
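For anyone repeating this, after clearing out the 1.9.10 Mellanox VIBs the old bundle goes back on like this (followed by a reboot):
# esxcli software vib install -d /vmfs/volumes/meta2/updates/MLNX-OFED-ESX-1.8.2.4-10EM-500.0.0.472560.zip --no-sig-check
# reboot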
Looks like the update is finally done although I would have preferred the more recent Mellanox drivers. I have a support ticket open with Mellanox so hopefully they can explain. Maybe the new driver pack (1.9.10) I'm looking at isn't correct at all for what I'm trying to achieve.
1.8.2.4 is the latest version from Mellanox supporting Infiniband mode (including IPoIB). The newer ones (1.9.x.x, both from VMware and from Mellanox) support only Ethernet mode for these devices. Basically, when you update your hosts to a newer release of ESXi, you need to re-install 1.8.2.4 while removing the mlx4_en VIB. If you don't do the latter, vSphere HA is not going to work.
Thanks mpogr, it looks like I summed that up all right. It's good to hear it confirmed from someone else, though.
BTW, we are currently using SRP as well. The SRP drivers are also gone from the 1.9.10 driver pack, which defaults to iSER for the vmhbas instead.
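For reference, the storage side can be checked with the standard commands; this is how the vmhba_mlx4 adapters and path counts mentioned above can be confirmed:
# esxcli storage core adapter list
# esxcli storage core path list | grep "Adapter:" | grep -c vmhba_mlx4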
Now a few more questions come up:
Hi!
I also have some trouble with Infiniband. I have four ESXi 5.5 hosts with MT26428 adapters on board. I also bought an IS5022 switch and MC2206130-00A cables. However, the switch port indicators are off and the vSphere Client also shows no link (see the image in the attachment). What could be the problem?
I don't think anyone but Mellanox can tell you whether Infiniband mode is being dropped or not, so I'd suggest you ask them.
As for moving from SRP to iSER, I guess this is primarily a question of what equipment you currently have/are planning to have. I have several Connect-X2 cards that support 40 Gbps only in Infiniband mode (and 10 Gbps in Ethernet mode) and a very old Voltaire switch (10 Gbps), which doesn't support Ethernet mode at all. So, for me, there is no other choice: I pretty much have to use Infiniband mode and SRP. I have 4 ESXi hosts connecting to a CentOS 7 storage server (with two dual-port Connect-X2 cards) directly using QSFP-QSFP cables, so I can use SRP at the full 40 Gbps speed for storage. For the inter-ESXi (e.g. vMotion) network, I use the second ports on the hosts, connected to the switch using QSFP-CX4 cables at 10 (8) Gbps over IPoIB. I could potentially switch to the newer VMware/Mellanox drivers on the ESXi hosts and move to Ethernet mode and iSER for storage, but then I'd lose speed (10 Gbps instead of 40) and wouldn't be able to establish the inter-ESXi network at all (because my switch doesn't support Ethernet mode), so there is no point in doing that.
It doesn't seem like the Mellanox drivers for VMware allow separate per-port configuration on dual-port Connect-X2 cards (they do allow that for Connect-X3 ones), which is a pity, as such configuration is possible on Linux and Windows.
It looks like you have a Connect-X2 card (same as mine) which is configured in Ethernet mode. Please note that in that mode its speed is limited to 10 Gbps, rather than the 40 Gbps that is possible in Infiniband mode. Your switch seems to be configured in Infiniband mode, which is why the card doesn't connect. You can do one of the following:
This problem is caused by the inbox vSphere driver for the Mellanox Ethernet adapter.
Therefore you can't upgrade in place from the previous ESXi version to the new one.
The problem comes from a core driver version mismatch between the VPI HCA (Infiniband) driver and the Ethernet driver.
This is very serious...
Here is a step-by-step guide for installing the new vSphere VPI driver 1.8.2.4 on an ESXi host.
* New installation
01. Enter maintenance mode
02. Uninstall the inbox driver from the ESXi console or SSH console
esxcli software vib remove -n net-mlx4-en
esxcli software vib remove -n net-mlx4-core
03. Reboot the ESXi host
04. Place the vSphere 1.8.2.4 VPI driver bundle in the path "/var/log/vmware"
05. Install vSphere 1.8.2.4 VPI driver
esxcli software vib install -d /var/log/vmware/MLNX-OFED-ESX-1.8.2.4-10EM-500.0.0.472560.zip --no-sig-check
* This driver package was built for vSphere ESXi 5.1, so you must use the --no-sig-check option!
06. Reboot the ESXi host
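A quick way to verify the result after the reboot (the same commands used earlier in this thread):
# esxcli software vib list | grep Mellanox
# vmkload_mod -l | grep ib_ipoib
# esxcli network nic list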
To yzennezy:
I can't understand how you were able to upgrade from ESXi 5.5u1 build 1623367 to ESXi 5.5u2 build 2302651.
If your ESXi host had the vSphere VPI driver installed, the ESXi installer should show a warning that you can't upgrade because of the core driver version difference mentioned above.
Therefore I think you must uninstall all Mellanox drivers, reboot your ESXi host, and finally reinstall the vSphere VPI driver 1.8.2.4 with the procedure above and reboot the ESXi host again.
* Your log shows your driver isn't the Mellanox VPI driver but the Ethernet one.
If you want to uninstall all Mellanox drivers from your ESXi host, run these commands on your ESXi console or SSH shell.
* These commands must be run in this exact order because of driver dependencies between the modules.
01. Uninstall all of the Mellanox drivers on the ESXi host
esxcli software vib remove -n scsi-ib-srp
esxcli software vib remove -n net-ib-ipoib
esxcli software vib remove -n net-mlx4-ib
esxcli software vib remove -n net-ib-umad
esxcli software vib remove -n net-ib-cm
esxcli software vib remove -n net-ib-sa
esxcli software vib remove -n net-memtrack
esxcli software vib remove -n net-mlx4-en
esxcli software vib remove -n net-mlx4-core
esxcli software vib remove -n net-ib-mad
esxcli software vib remove -n net-ib-core
02. Reboot the ESXi host
03. Check your Mellanox driver status
esxcli software vib list | grep Mellanox
04. If you can't find any Mellanox drivers on your ESXi host, reinstall the vSphere VPI driver 1.8.2.4 with the procedure above
Good luck!