VMware Cloud Community
Joe_Papa
Enthusiast

PSoD on ESXi 6.0 referencing Mellanox drivers

My HP ML350 G6 server ran ESXi 6.0 flawlessly for months. A few days ago it started PSoD-ing every night. The PSoD references a Mellanox driver, mlx4_core, which after some digging I determined is an OEM driver library from HP. Some people solved this issue by removing the Mellanox libraries and replacing them with ones signed by VMware instead of HP. In my case I don't think I even have any Mellanox hardware in my server. So, can I just remove them?

5 Replies
scott28tt
VMware Employee

@Joe_Papa 

Moderator: Moved to ESXi Discussions. Note also that ESXi 6.0 is out of support and no longer receives updates.



Although I am a VMware employee, I contribute to VMware Communities voluntarily (i.e., not in any official capacity).
Joe_Papa
Enthusiast

Thanks for putting me in the right subforum. I know my situation isn't exactly cutting edge, but I'm hoping someone has some good advice for me. 🙂

ashilkrishnan
VMware Employee

Hi @Joe_Papa ,

As Scott mentioned, it is an end-of-life product. If you are sure that these drivers are not being used, you can safely uninstall them.

Determine network/storage driver and firmware 

Determine the storage/network driver being used 

This command should help you remove the driver: esxcli software vib remove -n vibname
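As a rough sketch of how that command line could be assembled for more than one VIB, the Python helper below builds the argument list without running anything. The function names are hypothetical, "vibname" is the same placeholder used above, and the sketch assumes that esxcli accepts a repeated -n option and a --dry-run flag on this command; it is worth confirming both against the esxcli help on the host before relying on it.

```python
import shlex


def build_vib_remove_command(vib_names, dry_run=True):
    """Return the argv list for an 'esxcli software vib remove' call.

    Hypothetical helper: it only builds the command, it never runs it.
    """
    if not vib_names:
        raise ValueError("at least one VIB name is required")
    argv = ["esxcli", "software", "vib", "remove"]
    if dry_run:
        # Assumption: --dry-run reports what would change without removing anything.
        argv.append("--dry-run")
    for name in vib_names:
        # Assumption: -n may be repeated, once per VIB name.
        argv += ["-n", name]
    return argv


def as_shell_line(argv):
    """Join argv into a single line that can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # "vibname" is a placeholder, not a real VIB.
    print(as_shell_line(build_vib_remove_command(["vibname"])))
```

Printing the dry-run line first and reviewing it before dropping the flag keeps the removal a deliberate step.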

Hope that helps

Joe_Papa
Enthusiast

I dug into the iLO 2 logs a bit and found this error associated with the PSoD.

Critical  PCI Bus  01/04/2021 10:11  01/04/2021 10:11  1  Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 3, Function 0, Error status 0x00000010)

This is what the server device list looks like. 

ML350 G6 Devices.JPG

Am I correct that the device listed in the error message is the Intel Corporation PCI Express Root Port 3?
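One way to read the "Error status" value is as a bitmask. The sketch below assumes, without confirmation from the iLO log itself, that the value is the raw PCI Express AER Uncorrectable Error Status register; the table lists only the commonly documented bits of that register, and the function name is hypothetical.

```python
# Bit positions in the PCIe AER Uncorrectable Error Status register.
# Assumption: the iLO "Error status" field carries this register unchanged.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
}


def decode_uncorrectable_status(status):
    """Return the names of all bits set in a 32-bit status value."""
    names = []
    for bit in range(32):
        if status & (1 << bit):
            names.append(UNCORRECTABLE_BITS.get(bit, f"bit {bit} (not in table)"))
    return names


if __name__ == "__main__":
    # 0x00000010 has only bit 4 set.
    print(decode_uncorrectable_status(0x00000010))  # ['Data Link Protocol Error']
```

Under that assumption, the logged value points at a link-level fault on the reported device rather than at a specific driver, though the driver involved in the PSoD may still be where the fault surfaces.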

Joe_Papa
Enthusiast

I looked at the VIB list on my server and found the following driver VIBs that include the "mlx4" naming.

There are 3 from Mellanox. 

  1. net-mlx4-core
  2. net-mlx4-en

And 3 from VMware.

  1. nmlx4-core
  2. nmlx4-en
  3. nmlx4-rdma

The PSoD just says that "mlx4-core" is not provided and is required by "mlx4-en". 
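To keep the two sets of VIBs apart when deciding what to remove, a small sketch like the one below can sort names by prefix. The grouping rule (net-mlx4- for the Mellanox-supplied VIBs, nmlx4- for the VMware ones) is an assumption drawn only from the names listed in this post, and the function name is hypothetical.

```python
def group_mlx4_vibs(vib_names):
    """Sort VIB names into groups by name prefix.

    Assumption: 'net-mlx4-' marks the Mellanox-supplied VIBs and
    'nmlx4-' marks the VMware-supplied ones, as listed in this post.
    """
    groups = {"mellanox": [], "vmware": [], "other": []}
    for name in vib_names:
        if name.startswith("net-mlx4-"):
            groups["mellanox"].append(name)
        elif name.startswith("nmlx4-"):
            groups["vmware"].append(name)
        else:
            groups["other"].append(name)
    return groups


if __name__ == "__main__":
    listed = ["net-mlx4-core", "net-mlx4-en", "nmlx4-core", "nmlx4-en", "nmlx4-rdma"]
    for source, names in group_mlx4_vibs(listed).items():
        print(source, names)
```

A prefix match is only a convenience here; the vendor column of the VIB list on the host remains the authoritative source.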

ML350 G6 current used drivers.JPG
