VMware Cloud Community
ctucci
Enthusiast

Attempted update to 7.0b, /bootbank now missing.

I have a 2-node vSAN cluster with Dell R730xd hosts running ESXi 7.0. Today I put one of the hosts into maintenance mode to start updating both of them to 7.0b. When I ran the update from vCenter, it stopped with an error, and the esxupdate log suggested the bootbank could not be found. The host has ESXi installed on an SD card via the Dell Dual SD Module, and iDRAC reports both SD cards in the mirror as healthy and working normally.

I rebooted the host. It boots fine and reconnects to vCenter, but it now shows "No coredump target has been configured. Host core dumps cannot be saved.", and retrying the update still produces the same error in the esxupdate log. I'm still learning vSphere, but I would deduce that something is wrong with the bootbank partition on the SD card, even though the system still boots fine.

I also tried upgrading ESXi with a 7.0b ISO on a USB installer, but when the installer gets to scanning devices, it quickly fails with a "UnicodeDecodeError", saying it cannot "decode byte 0x99 in position 5" because of an "Invalid Start Byte". My guess is that when the installer scans the SD cards for an existing ESXi install, whatever bootbank issue is there makes the USB installer error out.

Either way, how can I find out why I am suddenly having what looks like a bootbank issue, and can I repair it somehow?

Post title edited: After further review of the event logs, it appears the /bootbank symlinks only went missing after the attempted update to 7.0b via Lifecycle Manager in vCenter. See the posts below.

EDIT: For anyone finding this in the future: this was only resolved by reinstalling ESXi and copying the state.tgz file from the bootbank of the old install to the new one. First, to make sure state.tgz contained the latest configuration, I manually created the symlink /bootbank pointing to /vmfs/volumes/BOOTBANK1. That allowed "auto-backup.sh" to run and refresh the stale state.tgz before I copied it off the host for use with the new install. Once the host was back up as normal, I retried the update from 7.0 to 7.0b, this time by booting the 7.0b ISO, and it worked. I'm not attempting it with Lifecycle Manager again, since that caused this issue the first time.
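The symlink step in that EDIT can be sketched as follows. This is a minimal, hypothetical Python version of what the poster did by hand, assuming a Python interpreter is available in the ESXi Shell; the function names `ensure_bootbank_link` and `run_auto_backup` are illustrative, and the default target `/vmfs/volumes/BOOTBANK1` is simply the bank the poster chose in this thread, not a universal rule.

```python
import os
import subprocess


def ensure_bootbank_link(link: str = "/bootbank",
                         target: str = "/vmfs/volumes/BOOTBANK1") -> bool:
    """Create `link` -> `target` only if `link` is absent.

    Returns True if the symlink was created, False if something already
    exists at `link` (a working link, a dangling link, or a real file).
    """
    if os.path.lexists(link):
        return False  # never overwrite an existing entry
    if not os.path.isdir(target):
        raise FileNotFoundError(f"bootbank target {target} does not exist")
    os.symlink(target, link)
    return True


def run_auto_backup(script: str = "/sbin/auto-backup.sh") -> None:
    """Run the ESXi config backup script; raises if it exits non-zero."""
    subprocess.run([script], check=True)
```

Checking with `os.path.lexists` rather than `os.path.exists` means a dangling `/bootbank` is left alone instead of being silently replaced, so you can inspect it first. Once the link exists and `run_auto_backup()` has refreshed state.tgz, the poster copied state.tgz off the host. A link created this way lives only in the in-memory root filesystem, which is consistent with the symlinks disappearing on reboot later in the thread.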

11 Replies
Lalegre
Virtuoso

Hey ctucci,

If you "cd" into /bootbank, can you list the files? Also, have you checked the free space?

Please run these two commands to identify whether one of the partitions is full:

  • df -h
  • vdf -h
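The "is a partition full?" check can be made mechanical. This is a minimal Python sketch that scans pasted `df -h` text for filesystems at or above a usage threshold; the function name `full_filesystems` and the 90% default are illustrative choices, and it assumes the standard `df -h` column order (Filesystem, Size, Used, Available, Use%, Mounted on).

```python
from typing import List, Tuple


def full_filesystems(df_output: str, threshold: int = 90) -> List[Tuple[str, int]]:
    """Return (mount point, Use%) for each filesystem at or above `threshold` percent.

    Assumes the standard `df -h` column order:
    Filesystem  Size  Used  Available  Use%  Mounted on
    """
    flagged = []
    for line in df_output.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or truncated line
        pct_text = fields[4].rstrip("%")
        if not pct_text.isdigit():
            continue  # header row ("Use%") or a line in an unexpected layout
        use_pct = int(pct_text)
        mount_point = " ".join(fields[5:])  # tolerate spaces in the mount path
        if use_pct >= threshold:
            flagged.append((mount_point, use_pct))
    return flagged
```

Fed the `df -h` output posted in the next reply, this flags nothing at the default threshold: the fullest filesystem there is the vSAN datastore at 49%, and both bootbanks sit at 17%.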
ctucci
Enthusiast

Hi Lalegre,

If I run "cd /bootbank", it says there is no such directory. The only place I see bootbank files is under "/vmfs/volumes/BOOTBANK1" and "/vmfs/volumes/BOOTBANK2". Both contain files: BOOTBANK1 has more recently modified files, while the files in BOOTBANK2 were last modified a couple of months ago.
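The comparison described there (which bank holds the more recently modified files) can be sketched in Python. This is a heuristic only, mirroring the manual check in this post; it is not necessarily how ESXi itself decides which bank is active. The names `newest_mtime` and `most_recent_bootbank` are hypothetical, and running it on the host assumes Python is available in the ESXi Shell.

```python
import os
from typing import Dict, Optional


def newest_mtime(directory: str) -> float:
    """Latest modification time among regular files directly inside `directory`."""
    with os.scandir(directory) as entries:
        times = [
            entry.stat().st_mtime
            for entry in entries
            if entry.is_file(follow_symlinks=False)
        ]
    return max(times, default=0.0)


def most_recent_bootbank(candidates: Dict[str, str]) -> Optional[str]:
    """Return the label of the existing candidate directory with the newest file.

    Heuristic: mirrors comparing BOOTBANK1 and BOOTBANK2 by hand.
    Returns None if none of the candidate paths is a directory.
    """
    existing = {label: path for label, path in candidates.items() if os.path.isdir(path)}
    if not existing:
        return None
    return max(existing, key=lambda label: newest_mtime(existing[label]))
```

For example, `most_recent_bootbank({"BOOTBANK1": "/vmfs/volumes/BOOTBANK1", "BOOTBANK2": "/vmfs/volumes/BOOTBANK2"})` would, on the host described here, be expected to report BOOTBANK1.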

If I run "df -h" I get:

Filesystem    Size    Used Available Use% Mounted on
VMFS-6      558.2G    5.3G    553.0G   1% /vmfs/volumes/esxi-03-LDS01
VMFS-L       12.2G    1.6G     10.6G  13% /vmfs/volumes/LOCKER-5jjcbe6e-96115ffe-50e1-72g11c1cu7q6
vfat       1023.8M  173.5M    850.3M  17% /vmfs/volumes/BOOTBANK1
vfat       1023.8M  173.6M    850.2M  17% /vmfs/volumes/BOOTBANK2
vsan         16.4T    8.1T      8.3T  49% /vmfs/volumes/VSAN01-Datastore

If I run "vdf -h" I get:

Tardisk              Space  Used
vmx.v00               116M  116M
vim.v00               139M  139M
tpm.v00                24K   22K
sb.v00                171M  171M
s.v00                  68M   68M
dcism.v00               1M    1M
vmware_p.v00            7M    7M
bnxtnet.v00           688K  685K
bnxtroce.v00          324K  323K
brcmfcoe.v00            2M    2M
brcmnvme.v00          124K  123K
elxiscsi.v00          548K  546K
elxnet.v00            636K  635K
i40en.v00             604K  602K
i40iwn.v00            484K  480K
iavmd.v00             196K  195K
igbn.v00              320K  319K
iser.v00              260K  259K
ixgben.v00            524K  520K
lpfc.v00                2M    2M
lpnic.v00             636K  635K
lsi_mr3.v00           348K  344K
lsi_msgp.v00          484K  482K
lsi_msgp.v01          552K  549K
lsi_msgp.v02          512K  511K
mtip32xx.v00          256K  252K
ne1000.v00            636K  633K
nenic.v00             264K  261K
nfnic.v00             576K  573K
nhpsa.v00             612K  611K
nmlx4_co.v00          784K  781K
nmlx4_en.v00          732K  730K
nmlx4_rd.v00          340K  338K
nmlx5_co.v00            1M    1M
nmlx5_rd.v00          292K  288K
ntg3.v00              116K  115K
nvme_pci.v00          120K  117K
nvmerdma.v00          168K  164K
nvmxnet3.v00          196K  193K
nvmxnet3.v01          172K  168K
pvscsi.v00            124K  121K
qcnic.v00             300K  297K
qedentv.v00             3M    3M
qedrntv.v00             2M    2M
qfle3.v00               2M    2M
qfle3f.v00              1M    1M
qfle3i.v00            368K  367K
qflge.v00             500K  498K
rste.v00              828K  825K
sfvmk.v00             648K  647K
smartpqi.v00          364K  362K
vmkata.v00            204K  202K
vmkfcoe.v00          1008K 1006K
vmkusb.v00              1M    1M
vmw_ahci.v00          236K  234K
crx.v00                12M   12M
elx_esx_.v00            2M    2M
btldr.v00               1M    1M
esx_dvfi.v00          488K  484K
esx_ui.v00             14M   14M
esxupdt.v00             1M    1M
tpmesxup.v00           12K   11K
weaselin.v00            2M    2M
loadesx.v00            56K   53K
lsuv2_hp.v00           72K   70K
lsuv2_in.v00           28K   26K
lsuv2_ls.v00            1M    1M
lsuv2_nv.v00           16K   13K
lsuv2_oe.v00           16K   13K
lsuv2_oe.v01           16K   13K
lsuv2_oe.v02           16K   13K
lsuv2_sm.v00           56K   54K
native_m.v00            2M    2M
qlnative.v00            2M    2M
vdfs.v00               12M   12M
vmware_e.v00          188K  187K
vsan.v00               46M   46M
vsanheal.v00            7M    7M
vsanmgmt.v00           21M   21M
xorg.v00                3M    3M
state.tgz             112K  110K
vmware_f.v00           31M   31M
imgdb.tgz               1M    1M

-----

Ramdisk               Size  Used Available Use% Mounted on
root                   32M    4M   27M  13% --
etc                    28M  812K   27M   2% --
opt                    32M    8K   31M   0% --
var                    48M  752K   47M   1% --
tmp                   256M    9M  246M   3% --
iofilters              32M    0B   32M   0% --
shm                  1024M    0B 1024M   0% --
crx                  1024M    0B 1024M   0% --
configstore            32M   52K   31M   0% --
configstorebkp         32M   52K   31M   0% --
hostdstats           1479M    4M 1474M   0% --
ctucci
Enthusiast

Lalegre

Also, the host's event log shows a warning roughly every hour saying "Bootbank cannot be found at path '/bootbank'.", even though I can navigate to /vmfs/volumes/BOOTBANK1 and /vmfs/volumes/BOOTBANK2 and the files are there, as mentioned. If I list "/", bootbank and altbootbank are missing, whereas they do exist on my other hosts.

Looking further at the event log, the missing /bootbank warnings only started after I rebooted the host following yesterday's failed update attempt. All I did was remediate with the 7.0b baseline: it failed with this error, I tried applying the update once more, it failed with the same error, and then I rebooted the host, after which /bootbank was missing.
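A quick way to confirm exactly what is (or is not) in "/" is to inspect both entries directly. This is a minimal Python sketch, assuming Python is available in the ESXi Shell; `bootbank_link_status` is a hypothetical helper, and the `root` parameter exists only so the check can be pointed at a test directory.

```python
import os
from typing import Dict


def bootbank_link_status(root: str = "/") -> Dict[str, str]:
    """Describe the bootbank and altbootbank entries under `root`."""
    status = {}
    for name in ("bootbank", "altbootbank"):
        path = os.path.join(root, name)
        if os.path.islink(path):
            target = os.readlink(path)
            # os.path.exists follows the link, so False means the target is gone
            state = "ok" if os.path.exists(path) else "dangling"
            status[name] = f"{state} -> {target}"
        elif os.path.exists(path):
            status[name] = "present (not a symlink)"
        else:
            status[name] = "missing"
    return status
```

On the affected host as described, this would be expected to report both entries as "missing", while a healthy host shows each one as a working link into /vmfs/volumes.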

[Image: Annotation 2020-08-29 104543.jpg]

Lalegre
Virtuoso

There seems to be a whole discussion about this issue with SD cards on Dell and HPE servers. Not finding the SD card could point to a hardware issue, but since iDRAC shows the SD cards as healthy, we can assume they are both okay.

I have a few questions to see if we can find the issue:

  • How does the SD card appear under Storage Devices? Are all paths up?
  • Was the SD card recommended by the vendor? Do you know if it is fully supported?
  • Are your drivers and firmware up to date?

If all of that checks out, then the problem is related to the version you are upgrading to, possibly in combination with the SD card, the drivers you are using, firmware, etc.

ctucci
Enthusiast

Hello,

I saw that discussion too, but it seemed different, since the hardware checks out okay AND, even though /bootbank is missing, the host still boots fine when I reboot it. The VMware KB articles I found on the missing /bootbank error say the symlink goes missing when the boot device genuinely can't be accessed. But I can reach it via the long path under /vmfs/volumes, so is it possible that only the symlinks in "/" pointing to each bootbank are missing?

1.) vCenter shows the boot SD card as up on that host, it reports the size, the partitions (4 total), and path with a green icon next to it.

2.) The SD card itself is supported as far as I know. I am using the same SD cards on the other hosts and there seem to be no issues there.

3.) Drivers and firmware are on the same versions on both hosts and are listed on the HCL as supported for ESXi 7.0.

To confirm: I can't roll back the 7.0b update because it never actually completed. I'm still on 7.0, the first public build. I'm afraid to update the other host before I know what happened with this one: if there is a problem or bug in the 7.0b update, I don't want to trigger the same issue on the other host and put both hosts in the vSAN cluster at risk of a boot issue.

Lalegre
Virtuoso

Hey ctucci,

Ah, you are running ESXi 7, which changed the system storage layout: vSphere 7 - ESXi System Storage Changes - VMware vSphere Blog. However, if you list the root path, you should see something like this:

[Image: Screenshot_4.png]

What is your output there?

ctucci
Enthusiast

Hi Lalegre,

Here is what I get. Since the symlinks are not there, yet the bootbanks show up under /vmfs/volumes, the system does boot, and the boot SD card checks out okay, is it possible the only issue here is that the symlinks are missing? If so, it's strange that the attempt to update to 7.0b would cause that. Is there a script that runs at boot to create those symlinks in "/" that might have been affected? As you can see in my previous post, the event log error that appears when the update fails mentions VIBs relating to the bootbank, so I wonder if there is a bug in the update process.

[Image: Screen Shot 2020-08-30 at 8.05.23 AM.png]

Lalegre
Virtuoso

It is particularly strange that both bootbank and altbootbank disappeared. There is a known issue where the bootbank points to /tmp, but that is not your case. From here, the only thing I can suggest is to try the following procedure: remount Esxi boot bank – Nick's computer on the cloud, or to reinstall ESXi.

You can back up and then restore the configuration by following the steps in this KB: VMware Knowledge Base

Of course, you can also wait for someone else in the community who may have a solution for your issue.

ctucci
Enthusiast

That localcli command to restore the bootbanks returns "Error, Invalid Parameter: bootbanks". Unfortunately, I can't back up the config via that method either; running it returns:

(vmodl.fault.SystemError) {
   faultCause = (vmodl.MethodFault) null,
   faultMessage = <unset>,
   reason = "Internal error"
   msg = "Received SOAP response fault from [<cs p:000000fd472457f0, TCP:localhost:8307>]: backupConfiguration
A general system error occurred: Internal error"
}

I tried recreating the bootbank and altbootbank symlinks manually. That does create them in "/", but the backup command still fails, and after I reboot ESXi the symlinks are gone again.

toaday
Contributor

Did you ever get this resolved? I'm experiencing a very similar issue, but I am able to perform backups once I recreate the /bootbank and /altbootbank symlinks.

fataktor
Contributor

Is there a solution to this? I've raised it with VMware, but they have not been able to provide a solution yet (the case is still open).

It's frustrating, as the upgrade from ESXi 6.7 to 7.0.3 went fine on 3 out of 4 identical hosts (Dell R730) in the same cluster.
