VMware Cloud Community
kriegtiger
Enthusiast

ESXi 7u1 does not recognize my HBA or drives after boot

I have a home ESXi system that has been working well for a year through power outages, manual reboots, etc.

Today I shut it down to add memory (raising it from 48 GB to the motherboard max of 64 GB), and after booting back up it refuses to read my HBA for the datastore, so all of my VMs are now invalid.

I am stumped and could really use some assistance; this has effectively killed my entire home network. Any help is greatly appreciated.

Here's a quick output of the devices. vmhba0 is the onboard SATA controller, which has no drives attached; the system boots off a small NVMe M.2 drive. vmhba1 at the bottom is my add-in HBA. It initializes just fine during boot and registers the datastore volume, but once ESXi loads, even though the command line still detects the device as vmhba1, the Storage -> Adapters tab shows no vmhba1 and no drive under it. Running 'df' also shows literally nothing.
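(For anyone checking something similar from the shell, these standard esxcli commands show the same state as the UI; sketch only, output omitted:)

# storage adapters the VMkernel actually registered
esxcli storage core adapter list
# mounted filesystems (what 'df' should be showing)
esxcli storage filesystem list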

I did notice that the date/time had been reset back to 2009 and I had to force an NTP sync; I have not rebooted since then. Could that possibly cause this?
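(Checking the clock state is straightforward with standard commands; sketch only:)

# hardware (RTC) clock vs. VMkernel system time
esxcli hardware clock get
esxcli system time get
# current NTP configuration
esxcli system ntp get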

[root@hnpvmh01:~] uname -a
VMkernel hnpvmh01.localdomain 7.0.1 #1 SMP Release build-16850804 Sep 4 2020 11:20:43 x86_64 x86_64 x86_64 ESXi

[root@hnpvmh01:~] esxcli software vib list | egrep -i "bcm|lsi"|sort
bnxtnet 216.0.78.0-1OEM.700.1.0.15843807 BCM VMwareCertified 2021-03-24
bnxtroce 216.0.67.0-1OEM.700.1.0.15843807 BCM VMwareCertified 2021-03-24
lsi-mr3 7.713.08.00-1OEM.700.1.0.15843807 BCM VMwareCertified 2021-03-24
lsi-msgpt2 20.00.06.00-3vmw.701.0.0.16850804 VMW VMwareCertified 2021-03-24
lsi-msgpt3 17.00.10.00-2vmw.701.0.0.16850804 VMW VMwareCertified 2021-03-24
lsi-msgpt35 14.00.00.00-1OEM.700.1.0.15843807 BCM VMwareCertified 2021-03-24
lsuv2-lsiv2-drivers-plugin 1.0.0-4vmw.701.0.0.16850804 VMware VMwareCertified 2021-03-24
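(The driver VIBs are clearly installed. For anyone checking something similar, these standard commands show whether the module actually loaded and which driver each HBA is bound to; sketch only, output omitted:)

# loaded kernel modules matching the LSI drivers
esxcli system module list | grep -i lsi
# HBAs with the driver each one is bound to
esxcfg-scsidevs -a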

[root@hnpvmh01:~] lspci

0000:00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers
0000:00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [PCIe RP[0000:00:01.0]]
0000:00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x8) [PCIe RP[0000:00:01.1]]
0000:00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530
0000:00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller
0000:00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1
0000:00:17.0 SATA controller: Intel Corporation Sunrise Point-H AHCI Controller [vmhba0]
0000:00:1b.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #17 [PCIe RP[0000:00:1b.0]]
0000:00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 [PCIe RP[0000:00:1c.0]]
0000:00:1c.4 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 [PCIe RP[0000:00:1c.4]]
0000:00:1d.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 [PCIe RP[0000:00:1d.0]]
0000:00:1f.0 ISA bridge: Intel Corporation Z170 Chipset LPC/eSPI Controller
0000:00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller
0000:00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus
0000:00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V [vmnic0]
0000:02:00.0 RAID bus controller: Broadcom ThinkSystem RAID 530-8i PCIe 12Gb Adapter [vmhba1]
0000:05:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller

[root@hnpvmh01:~] df
Filesystem Bytes Used Available Use% Mounted on
[root@hnpvmh01:~]

11 Replies
a_p_
Leadership

Some thoughts:

  • take a look at the vmkernel.log and look for related entries (see the sketch below)
  • remove the newly added RAM again, just to see if this is related
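
A minimal sketch of that first check (assuming the adapter alias is vmhba1):

# entries mentioning the adapter in the main kernel log
grep -i vmhba1 /var/log/vmkernel.log
# warnings are also collected separately
grep -i vmhba1 /var/log/vmkwarning.log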

André

kriegtiger
Enthusiast

There's only one line referencing the device in vmkernel.log. A grep across the whole log directory turned up a little more info, sorted by timestamp.

[root@hnpvmh01:/var/log] grep vmhba1 *|sort -t: -k2
vmkernel.log:2022-07-30T18:04:37.775Z cpu0:1049047)PCI: 1024: 0000:02:00.0 named 'vmhba1' (was '')
vmkdevmgr.log:2022-07-30T18:04:37Z vmkdevmgr[1049047]: InheritVmklinuxAliases: not a vmnic alias vmhba1
vmkdevmgr.log:2022-07-30T18:04:37Z vmkdevmgr[1049047]: PciBus: Not committing alias vmhba1 for busAddress s00000001.00

It seems like such a weird issue for memory to cause. I'm reluctant to pull the memory back out because the system is a little finicky, and it took a few tries of reseating and booting to get it to come up in the first place.

kriegtiger
Enthusiast

I believe I see the problem. It looks like device ordering/naming got mixed up. Looking at the device names/aliases, vmnic0 matches the actual PCI hardware address, but vmhba1 doesn't.

[root@hnpvmh01:~] localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias list
Bus type Bus address Alias
-------- ------------------- -----
pci p0000:00:1f.6 vmnic0
pci s00000001.00 vmhba1
logical pci#p0000:00:1f.6#0 vmnic0
logical pci#s00000001.00#0 vmhba1
[root@hnpvmh01:~] lspci|egrep "vmnic|vmhba"
0000:00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V [vmnic0]
0000:02:00.0 RAID bus controller: Broadcom ThinkSystem RAID 530-8i PCIe 12Gb Adapter [vmhba1]

So now the fun part is figuring out how to rename/re-alias that device so it lines up correctly.

kriegtiger
Enthusiast

I attempted to modify the aliases using the following commands, then did a clean shutdown and rebooted. Nothing changed.

[root@hnpvmh01:~] localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias list
Bus type Bus address Alias
-------- ------------------- -----
pci p0000:00:1f.6 vmnic0
pci s00000001.00 vmhba1
logical pci#p0000:00:1f.6#0 vmnic0
logical pci#s00000001.00#0 vmhba1
[root@hnpvmh01:~] localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --bus-type logical --alias vmhba1 --bus-address "pci#p0000:02:00.0#0"
[root@hnpvmh01:~] localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --bus-type pci --alias vmhba1 --bus-address p0000:02:00.0
[root@hnpvmh01:~] localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias list

From what I understand, this would suggest that a host profile manages the aliases rather than the stateful system itself. However, this is a single standalone ESXi host, and I have not found anywhere to even look at a host profile, let alone export/edit/apply one.

kriegtiger
Enthusiast

OK, maybe I'm barking up the wrong tree. The 'esxcli hardware pci list' command shows the HBA just fine, with the same 's' bus address shown above.

[root@hnpvmh01:/etc] esxcli hardware pci list
...
0000:02:00.0
Address: 0000:02:00.0
Segment: 0x0000
Bus: 0x02
Slot: 0x00
Function: 0x0
VMkernel Name: vmhba1
Vendor Name: Broadcom
Device Name: ThinkSystem RAID 530-8i PCIe 12Gb Adapter
Configured Owner: VMkernel
Current Owner: VMkernel
Vendor ID: 0x1000
Device ID: 0x0017
SubVendor ID: 0x1d49
SubDevice ID: 0x0500
Device Class: 0x0104
Device Class Name: RAID bus controller
Programming Interface: 0x00
Revision ID: 0x01
Interrupt Line: 0x0a
IRQ: 255
Interrupt Vector: 0x00
PCI Pin: 0x00
Spawned Bus: 0x00
Flags: 0x3001
Module ID: -1
Module Name: None
Chassis: 0
Physical Slot: 1
Slot Description: PCIEX8_2
Device Layer Bus Address: s00000001.00
Passthru Capable: true
Parent Device: PCI 0:0:1:1
Dependent Device: PCI 0:2:0:0
Reset Method: Function reset
FPT Sharable: true
...

But I'm still guessing the problem centers on this message:

2022-07-30T23:32:03Z vmkdevmgr[1049045]: PciBus: Not committing alias vmhba1 for busAddress s00000001.00

I just don't know what to do about it. 

kriegtiger
Enthusiast

Apparently my hunch was wrong. I re-enabled the onboard SATA controller (originally vmhba0) to compare. Everything looks fine for it, and it even gets the same 'not committing' message, so that message apparently isn't the problem. I'm at a total loss.

[root@hnpvmh01:/var/log] grep vmhba vmkdevmgr.log
2022-07-31T00:03:15Z vmkdevmgr[1049044]: PciBus: Not committing alias vmhba1 for busAddress s00000001.00
2022-07-31T00:03:15Z vmkdevmgr[1049044]: PciBus: Not committing alias vmhba0 for busAddress p0000:00:17.0
2022-07-31T00:03:15Z vmkdevmgr[1049044]: InheritVmklinuxAliases: not a vmnic alias vmhba1
2022-07-31T00:03:15Z vmkdevmgr[1049044]: InheritVmklinuxAliases: not a vmnic alias vmhba0
2022-07-31T00:05:02Z vmkdevmgr[1049105]: Set alias 'vmhba0' for device 0x2823430554405287
2022-07-31T00:05:02Z vmkdevmgr[1049105]: Set alias 'vmhba0' for device 0x2823430554405287

 

kriegtiger
Enthusiast

Minor update: I gave up and put the original memory back in all the original positions. No change.

kriegtiger
Enthusiast

Tried moving the card up one slot. The system recognized the change, deleted the old alias, and created a new PCI alias, but no logical alias. Trying to add/store the alias manually doesn't appear to work.

kriegtiger
Enthusiast

Freaking finally. The 'alias store' command didn't appear to change anything at first, but after a reboot it actually picked it up, and ESXi decided to use it. I still have no idea why it didn't like it the first time, but whatever; it's working now.

--- AFTER moving the card over a slot on the motherboard: the address info changed, and there is no logical entry for vmhba1 ---

[root@hnpvmh01:~] localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --bus-type logical --alias vmhba1 --bus-address "pci#p0000:01:00.0#0"
[root@hnpvmh01:~] localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias list
Bus type Bus address Alias
-------- ------------------- -----
pci p0000:01:00.0 vmhba1
pci p0000:00:1f.6 vmnic0
pci p0000:00:17.0 vmhba0
logical pci#p0000:00:1f.6#0 vmnic0
logical pci#p0000:00:17.0#0 vmhba0

--- 

Rebooted the system, and now it comes back with the following. The address changed again, but at least ESXi decided to read and work with it:

[root@hnpvmh01:~] localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias list
Bus type Bus address Alias
-------- ------------------- -----
pci p0000:00:1f.6 vmnic0
pci s00000010.00 vmhba1
pci p0000:00:17.0 vmhba0
logical pci#p0000:00:1f.6#0 vmnic0
logical pci#s00000010.00#0 vmhba1
logical pci#p0000:00:17.0#0 vmhba0

'df' output, which used to be blank:

[root@hnpvmh01:~] df
Filesystem Bytes Used Available Use% Mounted on
VMFS-6 886105440256 595687636992 290417803264 67% /vmfs/volumes/HW-RAID
VMFS-6 999922073600 7844397056 992077676544 1% /vmfs/volumes/SSD non-RAID
VMFS-L 128580583424 3574595584 125005987840 3% /vmfs/volumes/OSDATA-605b5d90-f689a704-e5d9-305a3a072261
vfat 4293591040 179175424 4114415616 4% /vmfs/volumes/BOOTBANK1
vfat 4293591040 65536 4293525504 0% /vmfs/volumes/BOOTBANK2


F*** me, what a cluster-F this was for just adding memory.

kriegtiger
Enthusiast

CORRECT SOLUTION:

So I replaced my motherboard because the onboard NIC died. I still had the same issues and was about to rip out what little hair I had left. I went digging around and decided that rather than searching for the device name (vmhba1), I would search the logs for the driver itself: lsi_mr3.

This was the tipping point. I found all of these messages:

[root@hnpvmh01:/var/log] grep lsi *
shell.log:2022-08-12T22:02:00Z shell[1051142]: [root]: grep lsi *
vmkdevmgr.log:2022-08-12T21:49:32Z vmkdevmgr[1049084]: DriverMap: Parsing map: /etc/vmware/default.map.d/lsi_mr3.map
vmkdevmgr.log:2022-08-12T21:49:32Z vmkdevmgr[1049084]: DriverMap: Parsing map: /etc/vmware/default.map.d/lsi_msgpt35.map
vmkdevmgr.log:2022-08-12T21:49:32Z vmkdevmgr[1049084]: DriverMap: Parsing map: /etc/vmware/default.map.d/lsi_msgpt2.map
vmkdevmgr.log:2022-08-12T21:49:32Z vmkdevmgr[1049084]: DriverMap: Parsing map: /etc/vmware/default.map.d/lsi_msgpt3.map
vmkdevmgr.log:2022-08-12T21:49:32Z vmkdevmgr[1049084]: DriverMap: Parsing map: /etc/vmware/default.map.d/lsi_mr3.map
vmkdevmgr.log:2022-08-12T21:49:32Z vmkdevmgr[1049084]: DriverMap: Parsing map: /etc/vmware/default.map.d/lsi_msgpt35.map
vmkdevmgr.log:2022-08-12T21:49:32Z vmkdevmgr[1049084]: DriverMap: Parsing map: /etc/vmware/default.map.d/lsi_msgpt2.map
vmkdevmgr.log:2022-08-12T21:49:32Z vmkdevmgr[1049084]: DriverMap: Parsing map: /etc/vmware/default.map.d/lsi_msgpt3.map
vmkdevmgr.log:2022-08-12T21:49:32Z vmkdevmgr[1049084]: Spinning up thread for binding driver lsi_mr3 in parallel.
vmkdevmgr.log:2022-08-12T21:49:32Z vmkdevmgr[1049084]: Module 'lsi_mr3' load by uid=0 who=root successful
vmkdevmgr.log:2022-08-12T21:49:33Z vmkdevmgr[1049084]: Device 0x1a5e43055520247b was not bound to driver lsi_mr3 (module 43) for bus=pci addr=p0000:01:00.0 id=100000171d490500010400
vmkdevmgr.log:2022-08-12T21:49:35Z vmkdevmgr[1049136]: Spinning up thread for binding driver lsi_mr3 in parallel.
vmkdevmgr.log:2022-08-12T21:51:18Z vmkdevmgr[1049136]: Device 0x1a5e43055520247b was not bound to driver lsi_mr3 (module 43) for bus=pci addr=p0000:01:00.0 id=100000171d490500010400
vmkdevmgr.log:2022-08-12T21:51:18Z vmkdevmgr[1049136]: Spinning up thread for binding driver lsi_mr3 in parallel.
vmkdevmgr.log:2022-08-12T21:53:02Z vmkdevmgr[1049136]: Device 0x1a5e43055520247b was not bound to driver lsi_mr3 (module 43) for bus=pci addr=p0000:01:00.0 id=100000171d490500010400
vmkernel.log:2022-08-12T21:49:30.282Z cpu0:1048576)VisorFSTar: 1855: lsimr3.v00 for 0x5663f bytes
vmkernel.log:2022-08-12T21:49:30.283Z cpu0:1048576)VisorFSTar: 1855: lsimsgpt.v00 for 0x9483b bytes
vmkernel.log:2022-08-12T21:49:30.326Z cpu0:1048576)VisorFSTar: 1855: lsi_msgp.v00 for 0x78a68 bytes
vmkernel.log:2022-08-12T21:49:30.327Z cpu0:1048576)VisorFSTar: 1855: lsi_msgp.v01 for 0x7fdb8 bytes
vmkernel.log:2022-08-12T21:49:32.688Z cpu7:1049106)Loading module lsi_mr3 ...
vmkernel.log:2022-08-12T21:49:32.688Z cpu7:1049106)Elf: 2052: module lsi_mr3 has license ThirdParty
vmkernel.log:2022-08-12T21:49:32.691Z cpu7:1049106)lsi_mr3: 7.713.08.00
vmkernel.log:2022-08-12T21:49:32.692Z cpu7:1049106)Device: 194: Registered driver 'lsi_mr3' from 43
vmkernel.log:2022-08-12T21:49:32.692Z cpu7:1049106)Mod: 4845: Initialization of lsi_mr3 succeeded with module ID 43.
vmkernel.log:2022-08-12T21:49:32.692Z cpu7:1049106)lsi_mr3 loaded successfully.
vmkernel.log:2022-08-12T21:49:32.696Z cpu7:1049106)lsi_mr3: mfi_AttachDevice:863: mfi: Attach Device.
vmkernel.log:2022-08-12T21:49:32.696Z cpu7:1049106)lsi_mr3: mfi_AttachDevice:871: mfi: mfiAdapter Instance Created(Instance Struct Base_Address): 0x43008ac012a0
vmkernel.log:2022-08-12T21:49:32.696Z cpu7:1049106)lsi_mr3: mfi_SetupIOResource:379: mfi bar: 0.
vmkernel.log:2022-08-12T21:49:32.696Z cpu7:1049106)lsi_mr3: fusion_init:1608: RDPQ mode supported
vmkernel.log:2022-08-12T21:49:32.696Z cpu7:1049106)lsi_mr3: fusion_init:1644: fusion_init Allocated MSIx count 1 MaxNumCompletionQueues 1
vmkernel.log:2022-08-12T21:49:32.696Z cpu7:1049106)lsi_mr3: fusion_init:1656: Dual QD not exposed:disable_dual_qd=0
vmkernel.log:2022-08-12T21:49:32.696Z cpu7:1049106)lsi_mr3: fusion_init:1716: maxSGElems 64 max_sge_in_main_msg 8 max_sge_in_chain 64
vmkernel.log:2022-08-12T21:49:32.696Z cpu7:1049106)lsi_mr3: fusion_init:1778: fw_support_ieee = 67108864.
vmkernel.log:2022-08-12T21:49:33.696Z cpu7:1049106)WARNING: lsi_mr3: fusion_init:1785: Failed to Initialise IOC
vmkernel.log:2022-08-12T21:49:33.697Z cpu7:1049106)lsi_mr3: fusion_cleanup:1871: mfi: cleanup fusion.
vmkernel.log:2022-08-12T21:49:33.697Z cpu7:1049106)WARNING: lsi_mr3: mfi_FirmwareInit:2227: adapter init failed.
vmkernel.log:2022-08-12T21:49:33.697Z cpu7:1049106)WARNING: lsi_mr3: mfi_AttachDevice:915: mfi: failed to init firmware.
vmkernel.log:2022-08-12T21:49:33.697Z cpu7:1049106)lsi_mr3: mfi_FreeAdapterResources:680: mfi: destroying timer queue.
vmkernel.log:2022-08-12T21:49:33.697Z cpu7:1049106)lsi_mr3: mfi_FreeAdapterResources:691: mfi: destroying locks.
vmkernel.log:2022-08-12T21:49:33.697Z cpu7:1049106)WARNING: lsi_mr3: mfi_AttachDevice:948: Failed - Failure
vmkernel.log:2022-08-12T21:49:35.215Z cpu5:1049137)lsi_mr3: mfi_AttachDevice:863: mfi: Attach Device.
vmkernel.log:2022-08-12T21:49:35.215Z cpu5:1049137)lsi_mr3: mfi_AttachDevice:871: mfi: mfiAdapter Instance Created(Instance Struct Base_Address): 0x43008ac012a0
vmkernel.log:2022-08-12T21:49:35.215Z cpu5:1049137)lsi_mr3: mfi_SetupIOResource:379: mfi bar: 0.
vmkernel.log:2022-08-12T21:49:35.215Z cpu5:1049137)WARNING: lsi_mr3: mfiCheckFwReady:1914: megasas: FW in FAULT state!!
vmkernel.log:2022-08-12T21:49:35.215Z cpu5:1049137)WARNING: lsi_mr3: mfi_FirmwareInit:2217: FW not in READY state
vmkernel.log:2022-08-12T21:51:18.427Z cpu2:1049137)WARNING: lsi_mr3: mfiDoChipReset:4078: megaraid_sas: Diag reset adapter never cleared!
vmkernel.log:2022-08-12T21:51:18.427Z cpu2:1049137)WARNING: lsi_mr3: mfi_AttachDevice:915: mfi: failed to init firmware.
vmkernel.log:2022-08-12T21:51:18.427Z cpu2:1049137)lsi_mr3: mfi_FreeAdapterResources:680: mfi: destroying timer queue.
vmkernel.log:2022-08-12T21:51:18.427Z cpu7:1049137)lsi_mr3: mfi_FreeAdapterResources:691: mfi: destroying locks.
vmkernel.log:2022-08-12T21:51:18.427Z cpu7:1049137)WARNING: lsi_mr3: mfi_AttachDevice:948: Failed - Failure
vmkernel.log:2022-08-12T21:51:18.959Z cpu5:1049175)lsi_mr3: mfi_AttachDevice:863: mfi: Attach Device.
vmkernel.log:2022-08-12T21:51:18.959Z cpu5:1049175)lsi_mr3: mfi_AttachDevice:871: mfi: mfiAdapter Instance Created(Instance Struct Base_Address): 0x43008ac012a0
vmkernel.log:2022-08-12T21:51:18.959Z cpu5:1049175)lsi_mr3: mfi_SetupIOResource:379: mfi bar: 0.
vmkernel.log:2022-08-12T21:51:18.959Z cpu5:1049175)WARNING: lsi_mr3: mfiCheckFwReady:1914: megasas: FW in FAULT state!!
vmkernel.log:2022-08-12T21:51:18.959Z cpu5:1049175)WARNING: lsi_mr3: mfi_FirmwareInit:2217: FW not in READY state
vmkernel.log:2022-08-12T21:53:02.152Z cpu2:1049175)WARNING: lsi_mr3: mfiDoChipReset:4078: megaraid_sas: Diag reset adapter never cleared!
vmkernel.log:2022-08-12T21:53:02.152Z cpu2:1049175)WARNING: lsi_mr3: mfi_AttachDevice:915: mfi: failed to init firmware.
vmkernel.log:2022-08-12T21:53:02.152Z cpu2:1049175)lsi_mr3: mfi_FreeAdapterResources:680: mfi: destroying timer queue.
vmkernel.log:2022-08-12T21:53:02.152Z cpu2:1049175)lsi_mr3: mfi_FreeAdapterResources:691: mfi: destroying locks.
vmkernel.log:2022-08-12T21:53:02.152Z cpu2:1049175)WARNING: lsi_mr3: mfi_AttachDevice:948: Failed - Failure
vmkwarning.log:2022-08-12T21:49:33.696Z cpu7:1049106)WARNING: lsi_mr3: fusion_init:1785: Failed to Initialise IOC
vmkwarning.log:2022-08-12T21:49:33.697Z cpu7:1049106)WARNING: lsi_mr3: mfi_FirmwareInit:2227: adapter init failed.
vmkwarning.log:2022-08-12T21:49:33.697Z cpu7:1049106)WARNING: lsi_mr3: mfi_AttachDevice:915: mfi: failed to init firmware.
vmkwarning.log:2022-08-12T21:49:33.697Z cpu7:1049106)WARNING: lsi_mr3: mfi_AttachDevice:948: Failed - Failure
vmkwarning.log:2022-08-12T21:49:35.215Z cpu5:1049137)WARNING: lsi_mr3: mfiCheckFwReady:1914: megasas: FW in FAULT state!!
vmkwarning.log:2022-08-12T21:49:35.215Z cpu5:1049137)WARNING: lsi_mr3: mfi_FirmwareInit:2217: FW not in READY state
vmkwarning.log:2022-08-12T21:51:18.427Z cpu2:1049137)WARNING: lsi_mr3: mfiDoChipReset:4078: megaraid_sas: Diag reset adapter never cleared!
vmkwarning.log:2022-08-12T21:51:18.427Z cpu2:1049137)WARNING: lsi_mr3: mfi_AttachDevice:915: mfi: failed to init firmware.
vmkwarning.log:2022-08-12T21:51:18.427Z cpu7:1049137)WARNING: lsi_mr3: mfi_AttachDevice:948: Failed - Failure
vmkwarning.log:2022-08-12T21:51:18.959Z cpu5:1049175)WARNING: lsi_mr3: mfiCheckFwReady:1914: megasas: FW in FAULT state!!
vmkwarning.log:2022-08-12T21:51:18.959Z cpu5:1049175)WARNING: lsi_mr3: mfi_FirmwareInit:2217: FW not in READY state
vmkwarning.log:2022-08-12T21:53:02.152Z cpu2:1049175)WARNING: lsi_mr3: mfiDoChipReset:4078: megaraid_sas: Diag reset adapter never cleared!
vmkwarning.log:2022-08-12T21:53:02.152Z cpu2:1049175)WARNING: lsi_mr3: mfi_AttachDevice:915: mfi: failed to init firmware.
vmkwarning.log:2022-08-12T21:53:02.152Z cpu2:1049175)WARNING: lsi_mr3: mfi_AttachDevice:948: Failed - Failure

So I went digging and found a thread about a different RAID card based on the same adapter. They mentioned turning off the CSM (Compatibility Support Module) setting in the BIOS under the boot settings. Sure enough, turning it off FIXED EVERYTHING. Thank goodness that nightmare is finally over. ESXi booted lightning fast, and my datastore and VMs are all present.
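
(For anyone verifying a fix like this, a few standard commands to confirm the adapter actually came back; sketch only, output omitted:)

# the driver module should now be loaded
vmkload_mod -l | grep lsi_mr3
# the adapter should be listed with its driver
esxcli storage core adapter list
# and the VMFS datastores should be mounted again
esxcli storage filesystem list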

 

D_G_Tal
Enthusiast

Thanks for sharing!
