yzennezy
Enthusiast

Infiniband IPoIB missing after upgrade to 5.5u2 build 2302651

Hi,

We have two ESXi hosts. Both were running 5.5u1 build 1623367. We've upgraded one of the hosts to 5.5u2 build 2302651, but now our IPoIB devices are missing and we get some strange-looking NICs instead. I have removed the VMware Mellanox drivers and installed the VIBs from Mellanox (MLNX-OFED-ESX-1.9.10.0-10EM-550.0.0.1331820.zip). Below is the NIC listing, first on 5.5u1 and then on 5.5u2:

5.5u1:

# esxcli network nic list

Name       PCI Device     Driver    Link  Speed  Duplex  MAC Address         MTU  Description
---------  -------------  --------  ----  -----  ------  -----------------  ----  ------------------------------------------------------------------------------
vmnic0     0000:002:00.0  igb       Up     1000  Full    00:25:90:c7:f2:00  1500  Intel Corporation I350 Gigabit Network Connection
vmnic1     0000:002:00.1  igb       Down      0  Half    00:25:90:c7:f2:01  1500  Intel Corporation I350 Gigabit Network Connection
vmnic2     0000:081:00.0  ixgbe     Up    10000  Full    90:e2:ba:3f:cd:20  9000  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic3     0000:081:00.1  ixgbe     Down      0  Half    90:e2:ba:3f:cd:21  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic_ib0  0000:006:00.0  ib_ipoib  Up    40000  Full    00:02:c9:2c:d2:f9  1500  Mellanox Technologies MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]
vmnic_ib1  0000:006:00.0  ib_ipoib  Up    40000  Full    00:02:c9:2c:d2:fa  1500  Mellanox Technologies MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]

5.5u2:

# esxcli network nic list

Name          PCI Device     Driver   Link  Speed  Duplex  MAC Address         MTU  Description
------------  -------------  -------  ----  -----  ------  -----------------  ----  ------------------------------------------------------------------------------
vmnic0        0000:002:00.0  igb      Up     1000  Full    00:25:90:c7:f1:48  1500  Intel Corporation I350 Gigabit Network Connection
vmnic1        0000:002:00.1  igb      Down      0  Half    00:25:90:c7:f1:49  1500  Intel Corporation I350 Gigabit Network Connection
vmnic1000402  0000:006:00.0  mlx4_en  Down      0  Half    00:02:c9:2c:d3:0d  1500  Mellanox Technologies MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]
vmnic2        0000:081:00.0  ixgbe    Up    10000  Full    90:e2:ba:3f:cf:30  9000  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic3        0000:081:00.1  ixgbe    Down      0  Half    90:e2:ba:3f:cf:31  1500  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
vmnic4        0000:006:00.0  mlx4_en  Down      0  Full    00:02:c9:2c:d3:0c  1500  Mellanox Technologies MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s]

As you can see, on 5.5u1 the ib_ipoib driver was used by default, but now on 5.5u2 the mlx4_en driver has taken over. Do I have to reconfigure this manually? What are the steps? And why the strange-looking vmnic name (vmnic1000402)?
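
In case it matters, the driver swap mentioned above was done roughly like this (a sketch rather than the exact commands; the inbox VIB names on your build may differ):

# remove the inbox VMware Mellanox Ethernet VIBs, then reboot
esxcli software vib remove -n net-mlx4-en
esxcli software vib remove -n net-mlx4-core
# install the Mellanox OFED 1.9.10.0 bundle from the datastore, then reboot again
esxcli software vib install -d /vmfs/volumes/meta2/updates/MLNX-OFED-ESX-1.9.10.0-10EM-550.0.0.1331820.zip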

Any help is appreciated.

Thanks,

Tom

yzennezy
Enthusiast

I've just noticed that the ib_ipoib driver is not loaded on the upgraded host:

5.5u1:

# vmkload_mod -l | grep ib_ipoib
ib_ipoib                 0    140
# echo $?
0

5.5u2:

# vmkload_mod -l | grep ib_ipoib
# echo $?
1

Does anyone know how to load this driver and get IPoIB reconfigured?
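
I assume the obvious thing to try is to load the module by hand, something along these lines, assuming the module is actually present on the host:

# check whether any VIB providing ib_ipoib is installed at all
esxcli software vib list | grep -i ipoib
# attempt to load the module manually
esxcli system module load -m ib_ipoib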

Regards,
Tom

yzennezy
Enthusiast

Also, the old VIB depot (1.8.2.4) includes an ib_ipoib driver, but the new one (1.9.10.0) doesn't!

# unzip -l /vmfs/volumes/meta2/updates/MLNX-OFED-ESX-1.8.2.4-10EM-500.0.0.472560.zip
Archive:  /vmfs/volumes/meta2/updates/MLNX-OFED-ESX-1.8.2.4-10EM-500.0.0.472560.zip
  Length     Date   Time    Name
--------    ----   ----    ----
      321  03-06-14 09:22   index.xml
      205  03-06-14 09:22   vendor-index.xml
    11092  03-06-14 09:22   metadata.zip
    38334  03-06-14 09:22   vib20/net-ib-cm/Mellanox_bootbank_net-ib-cm_1.8.2.4-1OEM.500.0.0.472560.vib
    30708  03-06-14 09:22   vib20/net-ib-umad/Mellanox_bootbank_net-ib-umad_1.8.2.4-1OEM.500.0.0.472560.vib
    54194  03-06-14 09:22   vib20/scsi-ib-srp/Mellanox_bootbank_scsi-ib-srp_1.8.2.4-1OEM.500.0.0.472560.vib
    92208  03-06-14 09:22   vib20/net-mlx4-ib/Mellanox_bootbank_net-mlx4-ib_1.8.2.4-1OEM.500.0.0.472560.vib
    98600  03-06-14 09:22   vib20/net-ib-ipoib/Mellanox_bootbank_net-ib-ipoib_1.8.2.4-1OEM.500.0.0.472560.vib
    28628  03-06-14 09:22   vib20/net-ib-sa/Mellanox_bootbank_net-ib-sa_1.8.2.4-1OEM.500.0.0.472560.vib
   127648  03-06-14 09:22   vib20/net-mlx4-core/Mellanox_bootbank_net-mlx4-core_1.8.2.4-1OEM.500.0.0.472560.vib
    40898  03-06-14 09:22   vib20/net-ib-mad/Mellanox_bootbank_net-ib-mad_1.8.2.4-1OEM.500.0.0.472560.vib
    55290  03-06-14 09:22   vib20/net-ib-core/Mellanox_bootbank_net-ib-core_1.8.2.4-1OEM.500.0.0.472560.vib
--------                   -------
   578126                   12 files

~ # unzip -l /vmfs/volumes/meta2/updates/MLNX-OFED-ESX-1.9.10.0-10EM-550.0.0.1331820.zip
Archive:  /vmfs/volumes/meta2/updates/MLNX-OFED-ESX-1.9.10.0-10EM-550.0.0.1331820.zip
  Length     Date   Time    Name
--------    ----   ----    ----
      321  10-19-14 12:40   index.xml
      205  10-19-14 12:40   vendor-index.xml
    11494  10-19-14 12:40   metadata.zip
    52894  10-19-14 12:40   vib20/net-ib-core/Mellanox_bootbank_net-ib-core_1.9.10.0-1OEM.550.0.0.1331820.vib
    42490  10-19-14 12:40   vib20/net-ib-mad/Mellanox_bootbank_net-ib-mad_1.9.10.0-1OEM.550.0.0.1331820.vib
    19510  10-19-14 12:40   vib20/net-ib-addr/Mellanox_bootbank_net-ib-addr_1.9.10.0-1OEM.550.0.0.1331820.vib
    83084  10-19-14 12:40   vib20/net-mlx4-ib/Mellanox_bootbank_net-mlx4-ib_1.9.10.0-1OEM.550.0.0.1331820.vib
    38066  10-19-14 12:40   vib20/net-ib-cm/Mellanox_bootbank_net-ib-cm_1.9.10.0-1OEM.550.0.0.1331820.vib
    71676  10-19-14 12:40   vib20/net-mlx4-en/Mellanox_bootbank_net-mlx4-en_1.9.10.0-1OEM.550.0.0.1331820.vib
   121304  10-19-14 12:40   vib20/net-mlx4-core/Mellanox_bootbank_net-mlx4-core_1.9.10.0-1OEM.550.0.0.1331820.vib
    26252  10-19-14 12:40   vib20/net-ib-umad/Mellanox_bootbank_net-ib-umad_1.9.10.0-1OEM.550.0.0.1331820.vib
    29786  10-19-14 12:40   vib20/net-ib-sa/Mellanox_bootbank_net-ib-sa_1.9.10.0-1OEM.550.0.0.1331820.vib
    36248  10-19-14 12:40   vib20/net-rdma-cm/Mellanox_bootbank_net-rdma-cm_1.9.10.0-1OEM.550.0.0.1331820.vib
    55928  10-19-14 12:40   vib20/scsi-ib-iser/Mellanox_bootbank_scsi-ib-iser_1.9.10.0-1OEM.550.0.0.1331820.vib
--------                   -------
   589258                   14 files

Do I have the wrong driver package? Why is the net-ib-ipoib VIB missing from 1.9.10.0?
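
For what it's worth, the same comparison can also be done with esxcli instead of unzip, with something like:

# list the VIBs contained in each offline bundle
esxcli software sources vib list -d /vmfs/volumes/meta2/updates/MLNX-OFED-ESX-1.8.2.4-10EM-500.0.0.472560.zip
esxcli software sources vib list -d /vmfs/volumes/meta2/updates/MLNX-OFED-ESX-1.9.10.0-10EM-550.0.0.1331820.zip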

Kind regards,
Tom

yzennezy
Enthusiast

I've restored the MLNX-OFED-ESX-1.8.2.4-10EM-500.0.0.472560.zip drivers to the system to get it working again. ESXi still complains that the bundle isn't signed, so it has to be cajoled into place with --no-sig-check, but at least the ib_ipoib driver is back in place and vmnic_ib0 and vmnic_ib1 have reappeared. The storage adapters also look good: vmhba_mlx4_0.1.1 and vmhba_mlx4_0.2.1 are back and show 53 paths/targets each.

It looks like the update is finally done, although I would have preferred the more recent Mellanox drivers. I have a support ticket open with Mellanox, so hopefully they can explain. Maybe the new driver pack (1.9.10) I'm looking at isn't right for what I'm trying to achieve at all.
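
For anyone hitting the same thing, the rollback boiled down to roughly the following (a sketch; the 1.9.10.0 VIB names are taken from the depot listing above and removed dependents-first):

# remove the 1.9.10.0 VIBs, dependent modules first, core last
esxcli software vib remove -n scsi-ib-iser
esxcli software vib remove -n net-rdma-cm
esxcli software vib remove -n net-mlx4-ib
esxcli software vib remove -n net-ib-umad
esxcli software vib remove -n net-ib-cm
esxcli software vib remove -n net-ib-sa
esxcli software vib remove -n net-ib-addr
esxcli software vib remove -n net-mlx4-en
esxcli software vib remove -n net-mlx4-core
esxcli software vib remove -n net-ib-mad
esxcli software vib remove -n net-ib-core
reboot
# install the 1.8.2.4 bundle; ESXi 5.5 complains about the signature, hence --no-sig-check
esxcli software vib install -d /vmfs/volumes/meta2/updates/MLNX-OFED-ESX-1.8.2.4-10EM-500.0.0.472560.zip --no-sig-check
reboot
# afterwards the IPoIB uplinks and the IB storage adapters should be back
esxcli network nic list
esxcli storage core adapter list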

mpogr
Enthusiast

1.8.2.4 is the latest version from Mellanox that supports Infiniband mode (including IPoIB). The newer ones (1.9.x.x, both from VMware and from Mellanox) support only Ethernet mode for these devices. Basically, whenever you update your hosts to a newer release of ESXi, you need to re-install 1.8.2.4 and remove the mlx4_en VIB. If you don't do the latter, vSphere HA will not work.
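
In other words, after each ESXi update the procedure is roughly this (adjust the bundle path to wherever you keep it; a sketch, not a verified script):

# get rid of the Ethernet-only mlx4_en VIB so vSphere HA keeps working
esxcli software vib remove -n net-mlx4-en
# re-install the 1.8.2.4 OFED bundle; ESXi 5.5 complains about the signature, so skip the check
esxcli software vib install -d /path/to/MLNX-OFED-ESX-1.8.2.4-10EM-500.0.0.472560.zip --no-sig-check
reboot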

yzennezy
Enthusiast

Thanks mpogr, it looks like I summed that up correctly. It's good to hear it confirmed by someone else, though.

BTW, we are currently using SRP as well. The SRP drivers are also gone from the 1.9.10 driver pack, which provides iSER vmhbas instead.

Now a few more questions come up:

  • Is support for the Infiniband mode drivers being dropped?
  • Should I be moving to iSER and Ethernet mode for all my ESXi hosts?
anthony2005
Contributor

Hi!

I also have some trouble with Infiniband. I have four ESXi 5.5 hosts with MT26428 adapters on board. I also bought an IS5022 switch and MC2206130-00A cables. However, the switch port indicators are off and the vSphere Client also shows no link (see the attached image). What could be the problem?

mt26428.JPG

mpogr
Enthusiast

I don't think anyone but Mellanox can tell you whether Infiniband mode is being dropped or not, so I'd suggest asking them.

As for moving from SRP to iSER, I guess this is primarily a question of what equipment you currently have or are planning to have. I have several ConnectX-2 cards that support 40 Gbps only in Infiniband mode (and 10 Gbps in Ethernet mode) and a very old Voltaire switch (10 Gbps), which doesn't support Ethernet mode at all. So, for me, there is no other choice: I pretty much have to use Infiniband mode and SRP. I have 4 ESXi hosts connecting directly to a CentOS 7 storage server (with two dual-port ConnectX-2 cards) using QSFP-QSFP cables, so I can use SRP at the full 40 Gbps for storage. For the inter-ESXi (e.g. vMotion) network I use the second ports on the hosts, connected to the switch with QSFP-CX4 cables at 10 (8) Gbps over IPoIB. I could potentially switch to the newer VMware/Mellanox drivers on the ESXi hosts and move to Ethernet mode and iSER for storage, but then I'd lose speed (10 Gbps instead of 40) and wouldn't be able to establish the inter-ESXi network at all (because my switch doesn't support Ethernet mode), so there is no point in doing that.

It doesn't seem like the Mellanox drivers for VMware allow separate per-port configuration on dual-port ConnectX-2 cards (they do allow it for ConnectX-3 ones), which is a pity, as such a configuration is possible on Linux and Windows.

mpogr
Enthusiast

It looks like you have a ConnectX-2 card (same as mine) which is configured in Ethernet mode. Please note that in that mode its speed is limited to 10 Gbps, rather than the 40 Gbps possible in Infiniband mode. Your switch seems to be configured in Infiniband mode, which is why the card doesn't connect. You can do one of the following:

  • Keep using the newest drivers and reconfigure your switch for Ethernet mode. This will work, but will also limit your speed to 10 Gbps.
  • Switch to 1.8.2.4 drivers from Mellanox (as described above in this thread), which will allow you to use your card in Infiniband mode. This way you can reach the 40 Gbps speed. I guess you'd be better off this way.
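
Either way, once the host is back up you can confirm which driver stack the card ended up with and whether the link is up, roughly:

# which Mellanox VIBs are installed (1.8.2.4 VPI stack vs. inbox/1.9.x Ethernet stack)
esxcli software vib list | grep -i mellanox
# which driver is bound to each uplink and its link state
esxcli network nic list
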
Jae-Hoon_Choi
Enthusiast

This problem is caused by the inbox vSphere driver for the Mellanox Ethernet adapter.

Therefore you can't simply upgrade an existing ESXi host to the new version.

The problem comes from the different core driver versions used by the VPI HCA (Infiniband) driver and the Ethernet driver.

This is very serious...

Here is a step-by-step guide for installing the vSphere 1.8.2.4 VPI driver on an ESXi host.

* New installation

01. Enter maintenance mode

02. Uninstall inbox driver on ESXi console or SSH console

esxcli software vib remove -n net-mlx4-en

esxcli software vib remove -n net-mlx4-core

03. Reboot the ESXi host

04. Place the vSphere 1.8.2.4 VPI driver bundle in the path "/var/log/vmware"

05. Install vSphere 1.8.2.4 VPI driver

esxcli software vib install -d /var/log/vmware/MLNX-OFED-ESX-1.8.2.4-10EM-500.0.0.472560.zip --no-sig-check

* This latest driver was built only for vSphere ESXi 5.1, so you must use the --no-sig-check option!

06. Reboot the ESXi host
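
A quick way to verify the result after the reboot is something like:

esxcli software vib list | grep Mellanox
vmkload_mod -l | grep ib_ipoib
esxcli network nic list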

To yzennezy:

I can't understand how you were able to upgrade from ESXi 5.5u1 build 1623367 to ESXi 5.5u2 build 2302651.

If your ESXi host had the vSphere VPI driver installed, the ESXi installer should show a warning that you can't upgrade, because of the core driver version difference mentioned above.

Therefore I think you must uninstall all Mellanox drivers, reboot your ESXi host, then reinstall the vSphere VPI driver 1.8.2.4 using the procedure above and reboot the ESXi host again.

     * Your output shows that your driver isn't the Mellanox VPI driver but the Ethernet one.

If you want to uninstall all Mellanox drivers from your ESXi host, run these commands on your ESXi console or SSH shell.

     * These commands must be run in this exact order because of the driver dependencies between the modules.

01. Uninstall all of the Mellanox drivers on the ESXi host

esxcli software vib remove -n scsi-ib-srp

esxcli software vib remove -n net-ib-ipoib

esxcli software vib remove -n net-mlx4-ib

esxcli software vib remove -n net-ib-umad

esxcli software vib remove -n net-ib-cm

esxcli software vib remove -n net-ib-sa

esxcli software vib remove -n net-memtrack

esxcli software vib remove -n net-mlx4-en

esxcli software vib remove -n net-mlx4-core

esxcli software vib remove -n net-ib-mad

esxcli software vib remove -n net-ib-core

02. Reboot the ESXi host

03. Check your Mellanox driver status

esxcli software vib list | grep Mellanox

04. If you can't find any Mellanox drivers on your ESXi host, reinstall the vSphere VPI driver 1.8.2.4 using the procedure above

Good luck!

