HP Proliant DL vmhba32 dead path condition

cjmorrison · ‎01-21-2020

We've been having lots of issues with our HP Proliants (various generations) losing the path to the 8 GB internal SD cards where ESXi is loaded.

Typically a reboot corrects this condition: once the host is rebooted, it picks up the SD card where ESXi is loaded and boots without issues. If not, we usually have our NOC re-seat the internal SD cards.

When this happens, the bootbank vmfs volume where ESXi resides becomes unavailable and shows up in red when running ls -la /bootbank:

lrwxrwxrwx 1 root root 49 Apr 10 2019 bootbank -> /vmfs/volumes/a7caa3ee-95aa4a08-2db1-7b6495be01fc
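For anyone who wants to check this from a script instead of eyeballing the color in ls, here is a minimal sketch. It assumes the red entry corresponds to a symlink whose target can no longer be reached. link_status is a hypothetical helper name, not something shipped with ESXi.

```shell
#!/bin/sh
# link_status PATH: report whether a symlink's target can still be reached.
# Hypothetical helper for illustration; not part of ESXi.
link_status() {
    if [ ! -L "$1" ]; then
        echo "not-a-symlink"   # PATH is missing or is not a symbolic link
    elif [ -e "$1" ]; then
        echo "ok"              # -e follows the link, so the target resolves
    else
        echo "dangling"        # the link exists but its target is unreachable
    fi
}
```

On a host this would be called as link_status /bootbank, and anything other than "ok" would suggest the boot device is gone.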

Here's our current ESXi info:

[root@vmesx004:/dev/disks] vmware -vl

VMware ESXi 6.5.0 build-8294253

VMware ESXi 6.5.0 Update 2

I've reviewed KB2144283 and have updated the iLO from 2.3 to 2.61, but the same condition exists.


The only thing that changes is the iLO now shows this condition for the internal SD card.

At login screen: iLO Self-Test reports a problem with: Embedded Flash/SD-CARD. View details on Diagnostics page.

Controller firmware revision 2.09.00 Embedded media manager failed initialization

The above error points to this article: https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04996097

As I mentioned, I already updated the iLO to 2.61, and we usually never have to perform a NAND format. The reboot usually gets the SD card back online.

I've tried stopping and restarting the USB arbitration service and then rescanning adapters, but this didn't help.

Here's some output from the last few lines of vmkernel.log:

Line 90: 2020-01-19T20:19:45.778Z cpu26:65897)ScsiPath: 6787: Path vmhba32:C0:T0:L0 could not be unclaimed from plugin, status Busy. Continue path unclaiming

Line 91: 2020-01-19T20:19:45.779Z cpu26:65897)WARNING: ScsiScan: 1925: Could not delete path vmhba32:C0:T0:L0
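To see how often each path shows up in these messages across a whole log, something like the following could be used. This is only a sketch: dead_path_warnings is a hypothetical function name, it matches only the two message texts quoted above, and it assumes the log is passed as a plain text file.

```shell
#!/bin/sh
# dead_path_warnings LOGFILE: for every "could not be unclaimed" or
# "Could not delete path" message, print the path name and how many
# times it was reported. Hypothetical helper, not an ESXi tool.
dead_path_warnings() {
    awk '
        /could not be unclaimed|Could not delete path/ {
            # Pull out the vmhbaN:Cx:Ty:Lz token from the matching line.
            if (match($0, /vmhba[0-9]+:C[0-9]+:T[0-9]+:L[0-9]+/)) {
                count[substr($0, RSTART, RLENGTH)]++
            }
        }
        END { for (p in count) print p, count[p] }
    ' "$1"
}
```

For example, dead_path_warnings /var/log/vmkernel.log would print one line per affected path followed by its count.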

Here's some additional output, in case it helps.

[root@vmesx004:/dev/disks] ls -l ./mpx*

-rw------- 1 root root 2008023040 Jan 21 22:32 ./mpx.vmhba32:C0:T0:L0

-rw------- 1 root root 4161536 Jan 21 22:32 ./mpx.vmhba32:C0:T0:L0:1

-rw------- 1 root root 262127616 Jan 21 22:32 ./mpx.vmhba32:C0:T0:L0:5

-rw------- 1 root root 262127616 Jan 21 22:32 ./mpx.vmhba32:C0:T0:L0:6

-rw------- 1 root root 115326976 Jan 21 22:32 ./mpx.vmhba32:C0:T0:L0:7

-rw------- 1 root root 299876352 Jan 21 22:32 ./mpx.vmhba32:C0:T0:L0:8

I was debating trying these commands next, but I wasn't sure whether they would help, and I don't want to risk impacting the production environment with running VMs.

esxcli system module set --enabled=false --module=usb

esxcli system module set --enabled=true --module=usb

Any ideas on whether it's possible to fix this vmhba32 dead path condition without rebooting the host or re-seating the SD cards?

For some reason this is plaguing a lot of our hosts and causing issues, since we need maintenance windows to migrate VMs off before rebooting a host.

Seems like we should be able to correct this without rebooting the host...???

Appreciate any insight or suggestions!

If we can find a fix or work-around this would save us a lot of grief!

cjmorrison · ‎02-02-2020

Bump, has anyone encountered this issue?

If so, I'd appreciate any fixes that don't require a reboot or a re-seat of the SD card.

Thx

NathanosBlightc · ‎02-02-2020

I had a similar issue, and nothing fixed it, not even upgrading the iLO version. Re-seating the SD card did fix it, but after a while it happened again. So I decided to replace the SD card with one from a different vendor and model, and then everything worked fine. We used the same SD model on other ESXi hosts on a similar platform (DL380 G7), but the problem appeared on only some of them. The only thing I can suggest (if you can't change your SD card) is to move every write operation (such as ESXi log generation) from the SD card to other partitions.
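As a rough illustration of moving log writes off the SD card, the sketch below prints the commands for pointing ESXi syslog at a directory on a persistent datastore. It deliberately does not run them, so they can be reviewed first. syslog_redirect_plan is a hypothetical helper, and the datastore path in the usage example is an assumption, not taken from this environment.

```shell
#!/bin/sh
# syslog_redirect_plan LOGDIR: print (do not run) the commands that would
# point ESXi syslog at LOGDIR. Hypothetical helper for illustration only.
syslog_redirect_plan() {
    logdir=$1
    # Only accept a datastore path, so logs cannot land back on a ramdisk
    # or on the SD card by mistake.
    case "$logdir" in
        /vmfs/volumes/*) ;;
        *) echo "error: expected a path under /vmfs/volumes/" >&2; return 1 ;;
    esac
    echo "mkdir -p $logdir"
    echo "esxcli system syslog config set --logdir=$logdir"
    echo "esxcli system syslog reload"
    echo "esxcli system syslog config get"   # confirm the new setting
}
```

Usage, with an assumed datastore name: syslog_redirect_plan /vmfs/volumes/datastore1/logs. The printed lines can then be run by hand on the host once they look right.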


cjmorrison · ‎02-03-2020

Thanks for this information, Amin. I'll probably look into a different SD vendor and also research setting up a separate partition for the log files. We have a lot more HP Proliants in our environment than Dell PowerEdge servers. I don't know if we load ESXi differently on our Dells, but I haven't seen this issue once on those. We have been plagued by this issue on our HP Proliant Gen 8 and Gen 9 servers, and probably soon on our Gen 10s. I'll post back if I find a good workaround, or with whatever we end up doing to prevent this recurring issue.
