PatrickDLong's Posts

I have a fleet of hosts running ESXi 7.0 U3c - these hosts all have motherboard-based micro-SD card boot devices. As part of the recommended mitigations to reduce I/O to this type of boot device configuration, I have from the beginning (even back when these hosts were on ESXi 6.5/6.7) always configured productLocker to point to a shared storage location. In addition to reducing I/O to the boot device, this is a convenient way to keep all VMs' Tools installations current and consistent regardless of which host they run on, because VMware Tools is backward and forward compatible with the host version. When a new Tools version is released, I simply replace the files in my shared storage location (and restart the management agents on the hosts) so that guests' Tools versions are compared against the newly uploaded version.

The desired productLocker symlink is shown in the ls -n output:

productLocker -> /vmfs/volumes/5e2b25a0-281972a8-507b-5cb901ffba10/SharedLocker

After upgrading these hosts from 7.0U2d to 7.0 U3c (using the vendor-custom image profile), whenever I run any subsequent VUM update to patch drivers, etc. and reboot, the productLocker value reverts to:

productLocker -> /SharedLocker

where the SharedLocker text is in red, indicating it is an invalid or inaccessible location. I can reset this using the API, like so:

$esxName = '<hostname>.<domain>.net'
$dsName = '<Datastore label>'
$dsFolder = 'SharedLocker'
$esx = Get-VMHost -Name $esxName
$ds = Get-Datastore -Name $dsName
$oldLocation = $esx.ExtensionData.QueryProductLockerLocation()
$location = "/$($ds.ExtensionData.Info.Url.TrimStart('ds:/'))$dsFolder"
$esx.ExtensionData.UpdateProductLockerLocation($location)
Write-Host "Tools repository moved from"
Write-Host $oldLocation
Write-Host "to"
Write-Host $location

which resets the value to the desired location, and this setting DOES then persist across reboots from that point on... but if I then apply ANY additional driver update using VUM, it resets the productLocker symlink back to /SharedLocker on the next reboot. Has anyone else seen this behavior, or does anyone know why this happens and how to prevent it? Note that I do NOT see this behavior on my few hosts that have local high-endurance bootable media. And yes, I will be retrofitting all my hosts with M.2 disks, but I'd prefer not to have to deal with this in the interim every time I patch a host.
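Since each VUM remediation/reboot cycle appears to revert the symlink, it may be worth resetting it fleet-wide after every patch run rather than host by host. Below is a minimal PowerCLI sketch of that idea, built on the same QueryProductLockerLocation / UpdateProductLockerLocation calls used above; the cluster name, datastore name, and folder name are placeholders for illustration, not my actual values:

# Placeholders - substitute your own cluster, datastore and folder names
$clusterName = '<Cluster name>'
$dsName      = '<Datastore label>'
$dsFolder    = 'SharedLocker'

$ds = Get-Datastore -Name $dsName
# Build the datastore filesystem path, e.g. /vmfs/volumes/<uuid>/SharedLocker
$location = "/$($ds.ExtensionData.Info.Url.TrimStart('ds:/'))$dsFolder"

foreach ($esx in Get-Cluster -Name $clusterName | Get-VMHost) {
    $current = $esx.ExtensionData.QueryProductLockerLocation()
    if ($current -ne $location) {
        Write-Host "$($esx.Name): moving Tools repository from $current to $location"
        $esx.ExtensionData.UpdateProductLockerLocation($location) | Out-Null
    }
}
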
@mbartle Sorry you are continuing to see issues. From a prior gig (a 100% Dell compute shop) I am quite familiar with the 11G/12G IDSDM device, and I assume the 13G/14G iteration would be similar. I remember them being fairly stable, with the exception of a handful of times when one of the two SD cards would inexplicably unseat itself and I would have to click it back into position and re-mirror the pair - I have to assume those cards were not fully seated from the factory and slight vibration over time worked them loose. I currently work in a 100% HPE compute shop that is 100% virtualized - all servers are diskless with a single microSD in a socket on the motherboard, both Gen9 and Gen10 models. This is of course only anecdotal, but I upgraded in excess of 50 servers from 7.0U2a to 7.0U2c on the day the patch was released and I have not experienced any issues at all since then on any server running 7.0U2c. I'm satisfied enough with the boot device stability at this point that I am going to resume my upgrade cadence for the remaining servers on 6.7. Perhaps someone else with current Dell hardware using IDSDM as the boot device can chime in here with their experience with 7.0U2c? I would not expect 7.0U2d to provide any additional stability in this regard beyond what is provided in 7.0U2c, since the only fixlist item in the Release Notes indicates:

PR 2824750: ESXi hosts in a cluster on Dell EMC PowerFlex might intermittently fail with a purple diagnostic screen due to a PCPU preemption error. The management of persistent memory in a cluster on Dell EMC PowerFlex with NVMe drives added as RDM devices might inconsistently update a PCPU preemption counter in ESXi hosts in the cluster. As a result, ESXi hosts might intermittently fail with a purple diagnostic screen.

There are myriad posts on this forum and others regarding issues with 7.0U2 and SD-card or USB-based boot devices, due to the unthrottling of I/O writes to boot media introduced in U2 (which explains why you had no issues with U1) as well as other factors. These issues should have been resolved in the latest patch release, so I'm not sure why you are seeing this if you are on 7.0U2c build 18426014, but you should ensure that your iDRAC is at the latest version AND that the IDSDM is running the latest firmware - and additionally consider replacing these IDSDM boot devices with a BOSS card or other high-endurance media, per Dell's recommendation for ESXi 7.x hosts here: https://www.dell.com/support/manuals/en-us/vmware-esxi-7.x/vmware_esxi_7.0_gsg/getting-started-with-vmware-vsphere?guid=guid-c18ba369-c295-40ea-b289-f82b4cd5270a

This is related to the Virtual NIC function in your iLO. Please see the following two articles for resolution: https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-a00117965en_us and https://kb.vmware.com/s/article/83593

@A13x appreciate your thoughts, but one of your comments IMO highlights part of why so many VMware administrators are quite upset about this issue - WE were the QA. "the errors and log spew would have been detected after upgrading to make you stop the roll out." ...by VMware QA, who would have stopped the 7.0 U2a release before GA. There, I fixed it for you. :-) I honestly don't think most people would have a problem with a major configuration supportability change like this given enough warning and procurement-cycle runway to spec replacement hosts with actual disks - but that type of change should only happen at a MAJOR release version. That is not what happened here - the VMFS-L formatting change occurred at 7.0 GA, true enough, but no one running 7.0U1 is having any issues with USB or SD card boot devices that I am aware of; something clearly changed in the I/O profile of the vmkusb driver released with 7.0U2 that is causing issues with that class of devices. Thankfully I've seen no issues so far on the diskless hosts I've patched with U2c.

Your statement regarding a full and complete rebuild of a host is also a bit confusing. I happen to agree that reinstalling from scratch is the cleanest method to upgrade between major versions - we've probably both been in this game long enough to see quite a few support issues caused by artifacts left over from previous installations. But installing 7.0 from scratch would not have saved a diskless host from this issue eventually being triggered by patching to 7.0U2. If you mean "rebuild" in the sense of retrofitting existing hosts with additional hardware - that might make sense for some smaller implementations, but for larger environments in multiple fully-remote data centers the expense - in new physical equipment, travel, man-hours, and opportunity cost - would be simply staggering. I should be able to upgrade my current hardware to the latest available version so long as I am compliant with VMware's HCL. If VMware wants to stop supporting installation on USB/SD card media, they need to give PLENTY of notice of a change like that coming in a future major release AND figure out a way to incorporate that information into the HCL when selecting servers from various manufacturers.

After waiting an interminable length of time for the patch I'm trying to move on - I've wasted enough time playing whack-a-mole with this issue, and with U2c I'm rather enjoying not waking up every morning and having to check in on which hosts can no longer see their boot device.

Thanks for the tip, @LucianoPatrão. Duncan pointed out the same via Twitter. I agree that the ESXi 7.0U2c patch shows up under the ESXi 7.0 (not 7.0.0) dropdown, but this is COMPLETELY inconsistent with the rest of the patch site for 7.x. For example, for vCenter patches, the upgrade for VC-7.0U2c (along with VC-7.0U2, VC-7.0U2a, and VC-7.0U2b) is clearly shown in its own 7.0.2 dropdown selection, NOT as links under the VC 7.0.0 selection. Now you may say, fine, but VC is a different product altogether. Sure, but now let's look at ESXi (Embedded and Installable) on the patch site. The five patch links for version 7.0.1 (VMware-ESXi-7.0U1, VMware-ESXi-7.0U1a, VMware-ESXi-7.0U1b, VMware-ESXi-7.0U1c, and VMware-ESXi-7.0U1d) all appear under a dedicated dropdown selection of the same name, 7.0.1 - NOT under the 7.0.0 dropdown and certainly not under the 7.0 dropdown. Why do the 7.0U1a/b/c/d patches appear under a distinct 7.0.1 dropdown, but the 7.0U2a/c patches do NOT appear under a distinct 7.0.2 dropdown? It is inconsistent and confusing. IMO, the 7.0 Update 2 "c" patch release is a modified version of the 7.0 Update 2 release, NOT a modified version of the 7.0 release, and as such it should follow the convention established by the 7.0.1 patch downloads and have its own specific 7.0.2 dropdown selection. VMware should either create a separate 7.0.2 selection in the dropdown for the 7.0U2a/c patches, or go back to the method used for 6.7 and prior releases, where ALL patches and updates show up under the major revision number selection in the dropdown.

@vmrulz Confirmed - I also no longer see it in the dropdown on https://customerconnect.vmware.com/patch/ . Curious - at this point I am unsure why it does not show up, so I will withhold further comment until the reason is known. I have opened a case with GSS seeking more information, and I reached out to some VMware notables on Twitter for comment. Perhaps @LucianoPatrão knows something?

The issues that trigger the problem are myriad, so I cannot say for sure that you will definitely encounter it or definitely NOT encounter it. In my experience, VMs on an affected host continue to run but they cannot be vMotioned off to other hosts (not a problem for you since you have just a single standalone host). SolarWinds has trouble checking on VM status, and other activities that partially rely on the host (like backups) can throw errors or fail. ESXi 7.0U2a is the problem version - and since you installed that version onto a new USB stick holding the HPE 32GB MicroSD card, there is no prior version to easily roll back to; it would require a reinstall to get back to either 7.0U1 or 6.7 U3 <latest>. From what I have read, EITHER a post-U2a patch containing a fix for this issue OR the full U3 release will possibly arrive in August - so you could roll the dice and just keep an eye on it. I want to stress that this info is hearsay and I have no personal insights to share.

If you DO encounter the issues noted above or other guest weirdness, the first thing I would do is SSH to the host and run ls -n to list the root directory contents. If the command does not complete (or completes but shows bootbank and/or altbootbank in red text), you will need to follow the steps in the article I linked above to regain control of your host. Once you have done those steps, the host should return to normal operation. The process recommends rebooting your host once you regain control, but I have had success even WITHOUT rebooting the host (i.e. no downtime for the VMs). Of course it is up to you to decide whether reloading a prior release for stability is worth the extra effort, or if you can tolerate the risk of possible - but easily recoverable - host instability until a patch is released.

@Leon_Straathof I think the issue (aside from the "endurance" qualities of the physical media connected via USB, whether internal or external) is also related to the throughput, queue depth, and transfer mode supported by the USB controller itself. Many of the systems on VMware's compatibility list for 7.x still use USB 2.0 controllers on the motherboard, meaning that their I/O capabilities to those types of boot devices are severely limited by the controller. I can't see why USB3-connected devices would not continue to be supported as boot devices; and while I've seen plenty of documentation describing the change in boot-device preference, I haven't seen any public documentation from VMware on WHY they are pushing this change. In particular, I've seen nothing from VMware that addresses the interface limitations of USB2 vs the capabilities of USB3, other than the small quote I pulled from KB 83963 in a prior post in this thread: "USB devices have a small queue depth". This is a pretty good read: https://www.linkedin.com/pulse/usb-30-compared-20-all-implementations-equal-dennis-mungai/

The current issues with 7.0U2a are NOT specific to micro-SD cards per se. The issue affects any 7.0U2a boot device connected to a USB controller - like the USB stick you are currently using as your boot device, as well as many major system manufacturers' motherboards that include an SD/micro-SD slot (which is actually connected to the USB controller) on the motherboard. The impact of the issues can be minimized (but not entirely eliminated) by relocating the Scratch location, Coredump location, and VMtools bits location to <not-the-boot-device>, which you should be doing anyway if booting from a USB stick, but even with these mitigations I still encounter the problem periodically. The workaround to regain control of a host in the throes of this failure is well documented by @LucianoPatrão here: https://www.provirtualzone.com/vsphere-7-update-2-loses-connection-with-sd-cards-workaround/ and in many other posts on this forum and others.

Booting from USB-based devices is still a supported method that VMware is calling "Legacy", but booting from a high-endurance, high-throughput device like local SSD, M.2, etc. is the preferred "Long Term Support" option at this point. I have hundreds of servers in the field using microSD boot devices and will continue to operate them that way as long as it is supported by VMware - and by then those servers will probably fall off the compatibility list for current versions anyway - but I am impatiently waiting for U3 or the patch which contains the fix for this issue. New systems I order will be spec'd with local disks, mostly because I don't have the time or energy to play whack-a-mole again, fixing host issues for months while waiting for a fix, the next time some update or patch completely changes I/O patterns and does not get thoroughly QA'd against a major portion of the installed hardware base.

@TAB405ALZ There are two separate but related issues with ESXi 7.0 U2 and SD-card (or other USB-based) boot device media.

Issue #1 - loss of connectivity to USB-based boot devices, APD to the boot device filesystem:
KB 83450 - ESXi hosts experiences All Paths Down events on USB based SD Cards while using the vmkusb driver https://kb.vmware.com/s/article/83450
KB 83963 - Bootbank cannot be found at path '/bootbank' errors being seen after upgrading to ESXi 7.0 U2 https://kb.vmware.com/s/article/83963 - "USB devices have a small queue depth and due to a race condition in the ESXi storage stack, some I/O operations might not get to the device. Such I/Os queue in the ESXi storage stack and ultimately time out."
A host suffering from this condition can usually be brought back under control, in order to perform remediation steps, using the procedures outlined here: https://www.provirtualzone.com/vsphere-7-update-2-loses-connection-with-sd-cards-workaround/ - credit to Luciano Patrao (@LucianoPatrão).

Issue #2 - corruption of USB-based boot media devices due to continuous and high-volume I/O:
KB 83376 - VMFS-L Locker partition corruption on SD cards in ESXi 7.0 https://kb.vmware.com/s/article/83376
KB 2149257 - High frequency of read operations on VMware Tools image may cause SD card corruption https://kb.vmware.com/s/article/2149257

Other generally informative related info/links/KBs:
If you want to continue using SD-card or other USB-based boot media, you can reduce your chances of encountering this issue only by minimizing I/O going to that device: ensure that Scratch is set to a SAN datastore or local high-endurance media, and redirect references to the local VMware Tools bits on the boot device to another location - either by copying VMware Tools to a RAMdisk or by using a shared productLocker (both linked below). Even so, this will not 100% eliminate the issue; you will need to apply U3 when it is released. It should contain an updated vmkusb driver that hopefully will resolve these issues.
https://blogs.vmware.com/vsphere/2020/05/vsphere-7-esxi-system-storage-changes.html
https://blogs.vmware.com/vsphere/2020/07/vsphere-7-system-storage-when-upgrading.html
Redirect VMware Tools to a SAN datastore - see the section "Steps to set up /productLocker symlink" in KB 2129825 - Installing and upgrading the latest version of VMware Tools on existing hosts https://kb.vmware.com/s/article/2129825
Redirect VMware Tools to a RAMdisk - KB 83782 - ToolsRamdisk option is not available with ESXi 7.0.x releases https://kb.vmware.com/s/article/83782
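For the scratch and Tools-redirection mitigations above, here is a minimal PowerCLI sketch, assuming a hypothetical host esx01.example.net and a pre-created per-host folder on a SAN datastore; it uses the ScratchConfig.ConfiguredScratchLocation and UserVars.ToolsRamdisk advanced settings (the latter per KB 83782, which requires 7.0 U1 or later) - verify both names against your build before running anything:

$esx = Get-VMHost -Name 'esx01.example.net'   # hypothetical host name

# Point scratch at persistent shared storage instead of the SD/USB boot device
# (the target folder must already exist; the change takes effect after a reboot)
Get-AdvancedSetting -Entity $esx -Name 'ScratchConfig.ConfiguredScratchLocation' |
    Set-AdvancedSetting -Value '/vmfs/volumes/SharedDatastore01/.locker-esx01' -Confirm:$false

# Have the host copy the VMware Tools image into a RAM disk at boot, so guest
# Tools operations stop generating reads against the boot device
Get-AdvancedSetting -Entity $esx -Name 'UserVars.ToolsRamdisk' |
    Set-AdvancedSetting -Value 1 -Confirm:$false
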
@A13x Do you care to share either your source or your confidence level for your statement "due next month"? I will point out that the OP (employee) statement "recommending the install of P3 in July sometime" was clearly either incorrect from the outset or invalidated as the date approached, and Duncan clarified this in his response to complaints on this thread after nothing was released on July 15, as had been widely speculated here and elsewhere: https://communities.vmware.com/t5/ESXi-Discussions/SD-Boot-issue-Solution-in-7-x/m-p/2857776/highlight/true#M277012 "release dates are typically not shared, mainly as they change based on various aspects. In this case your source was/is wrong." It seems exceedingly clear to me that VMware is not going to make any official statement regarding a release date for this patch, and speculation on release dates only serves to improperly set expectations, justified or not.

At least one of the following statements must be true regarding 7.0U2:
- VMware didn't realize the severity of the impact of the change in I/O profile to USB-based boot devices (likely)
- VMware didn't realize the volume of their install base using USB-based boot devices (unlikely)
- VMware didn't anticipate the blowback that this issue would cause from both clients and hardware vendors, and the unbridled anticipation of a forthcoming patch release (likely)

No serious person would have taken VMware's reclassification of USB-based boot devices in 7.x as "Legacy", with other high-endurance methods now "Preferred", to actually mean "USB-based boot devices are now at risk of catastrophic failure in your vSphere 7.0U2 environment". I have valued simplicity in my VMware hosts going back to the GSX days - the fewer hardware components the better. I worked for YEARS to get all spinning disks out of my hosts to eliminate the most common failure point (aside from the occasional failed DIMM), only to have VMware pull a complete 180 with vSAN, which of course requires plentiful local disks. Oh well, AFA vendors got my $ instead of my host compute vendor and I've never regretted the decision. My entire vSphere environment runs on a large number of top-tier (read: the orange company) FC AFA storage arrays that are unbelievably easy to manage, and I've replaced only one single disk in an AFA over 6+ years. I REALLY don't want to get back in the business of installing controllers and local disks on my hosts unless it's absolutely necessary.

@sbd27 indicated "just buy RAID cards and mirrored SD disks...cost difference is about 1%", which may be true for the up-front cost, but *certainly* not for the TCO of the entire environment. There would be three additional components (potential points of failure) in every host in my environment, all of which would require regular firmware upgrades and break/fix management. I have a 300-node production environment split across remote data centers. 300 RAID cards, plus 600 SSDs, plus the travel costs and labor hours associated with installing all of that hardware, not to mention the labor required to reload ESXi on all of those hosts with their shiny new "Preferred" boot devices - and the opportunity cost of all of the other business projects that will sit idle while my staff and I accomplish all of this rigamarole - the total cost of an effort like this is ASTRONOMICAL.

@lukaslang I'm very interested to see where you read that your 8 GB HPE microSD card in the BL460c "is not officially supportet by HPE for ESXi 7", as I have many of the same blades. According to the ESXi 7.0 Hardware Requirements doc, https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.esxi.install.doc/GUID-DEB8086A-306B-4239-BF76-E354679202FC.html, 8 GB micro-SD boot devices should meet the requirement, albeit as a "Legacy" storage upgrade scenario requiring use of an additional high-endurance device (which you have already said you do by relocating scratch, productLocker, and LogDump to FC SAN, as do I), per Niels' blog post here: https://blogs.vmware.com/vsphere/2020/07/vsphere-7-system-storage-when-upgrading.html

I'm patiently awaiting the U3 patch like everyone else, but I am HIGHLY skeptical that it will document the precise root causes of this issue and the exact methods the patch uses to mitigate them. Maybe VMware will surprise me with transparency.

@einstein-a-go-g this debug analysis looks really interesting, although I admit I have no idea what I'm looking at on your real-time graph - can you explain a bit about the visualization, and have you been able to run old/new versions of ESXi to compare? I'm also interested in your statement, regarding I/O to the boot device, that you've read that the "majority is from the clustering service (vCLS) VMs". I have suspected vCLS played a role as well - in fact I suggested this very thing in one of my comments on @LucianoPatrão's blog article about this subject - but I haven't seen that written anywhere else. Can you point to your sources for that information? I'd like to read more.

@MarvinHuffaker It is unclear from your post - after installing 7.0.2 cleanly (I assume using the latest custom Dell .iso "VMware-VMvisor-Installer-7.0.0.update02-17867351.x86_64-DellEMC_Customized-A03.iso", Release Date: 2021-05-27, yes?), did you perform any of the recommended steps to minimize I/O to your SD card boot device listed a few posts up, or did you just let it run without redirecting scratch and productLocker? I have done those mitigations (I'm unable to roll back without a reinstall due to subsequent .vib installations) on my HPE servers, which are a mix of Gen9 blades and Gen10 Synergy compute modules. All use HPE-branded quality micro-SD cards; I still see very sporadic I/O hitting the device in esxtop, but I have not seen the issue in nearly a month. My impression is that this sporadic I/O is likely heartbeating or keepalive traffic, based on the small volume and the infrequency at which I see it. This is a helpful post with details about SD card ratings https://communities.vmware.com/t5/ESXi-Discussions/SD-Boot-issue-Solution-in-7-x/m-p/2852027/highlight/true#M276515 ...and a link to supported media http://partnerweb.vmware.com/programs/server_docs/Approved%20Flash%20Devices.pdf - AFAIK, this information is not searchable on the VMware Compatibility Guide.

@IRIX201110141
1 - Regain control of hosts by following the steps in this blog post to rescan storage: https://www.provirtualzone.com/vsphere-7-update-2-loses-connection-with-sd-cards-workaround/
2 - Remediate hosts by minimizing I/O to the low-endurance SD boot media: move the scratch partition (host logging, etc.) off the SD boot device onto local high-endurance media or SAN shared storage
3 - Move the VMware Tools references off the SD boot device by enabling UserVars.ToolsRamdisk or by redirecting the productLocker symlink to a shared storage location (a quick PowerCLI audit sketch for checking steps 2 and 3 follows below)
4 - Wait for the 7.0 U3 patch in July or August
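And here is the read-only audit sketch referenced in step 3 - it simply reports where scratch and productLocker currently point on each connected host, so you can spot anything still pointing at the boot device. The setting and method names are the standard advanced option and API call used elsewhere in this thread, but treat this as an illustration rather than a tested tool:

foreach ($esx in Get-VMHost | Where-Object { $_.ConnectionState -eq 'Connected' }) {
    # Current scratch location as resolved by the host
    $scratch = (Get-AdvancedSetting -Entity $esx -Name 'ScratchConfig.CurrentScratchLocation').Value
    # Current productLocker path
    $locker  = $esx.ExtensionData.QueryProductLockerLocation()
    [pscustomobject]@{
        Host          = $esx.Name
        Scratch       = $scratch
        ProductLocker = $locker
    }
}
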
@bo_busillo this is fantastic! Are you aware of any way to programmatically retrieve this information (or in fact any information at all - manufacturer, serial #, model #, etc.) from the SD or micro-SD media installed in a server without physically removing the card to look at it? The lsusb -v command only provides details about the USB hub/reader device, not the media inserted in it, and I'm unaware of any other commands that might return this type of information. It's not visible in iLO, and my quotes for BL460c Gen9 servers purchased in 2016 and SY660 Gen10 servers just purchased in 2021 show the same vendor P/N, although they are most certainly NOT the same physical card. Last time I worked on one of my HPE Synergy SY660 compute modules I did take a picture of the card - thanks to your info, now I know what all the symbols and numbers mean ;-) Thank you!

@BohdanKotelyak From your screenshot it appears your filesystem is reachable, because it appears in blue. If the links to bootbank and altbootbank appeared in red, it would mean that your filesystem is not reachable, likely an APD situation to the boot device. There are many issues with ESXi 7 (particularly U2) running on USB-based boot media, including motherboard-mounted SD cards, etc. Most of the issues are related to the host losing connectivity to the boot device filesystem intermittently or permanently, and more serious cases have resulted in boot device corruption. If your host cannot see bootbank and altbootbank, you can likely recover from this without rebooting by following @LucianoPatrão's excellent blog post here: https://www.provirtualzone.com/vsphere-7-update-2-loses-connection-with-sd-cards-workaround/ I would recommend that you evacuate one of these hosts, then make sure you have redirected scratch to a persistent location like local disk or a SAN volume, and relocate your VMware Tools bits either to a RAMdisk or to another location like shared storage. These actions will reduce the amount of I/O going to your boot device and allow for greater stability. Then move some VMs back to the host and retry your backup operations - let us know if you still have issues. Again, this may not be the cause of your issue, but if your scratch (and VMware Tools) location is still pointing to the USB device, you are effectively overwhelming the I/O capabilities of the device and you will have a myriad of issues.

@coolsport00  Agree with everything you said in your post about the historical push to simplify installs by using SD/USB and removing spinning rust failure points.
@barnette08  Only from GSS, and I have had zero luck getting them to give it to me, despite being part of a VERY large company with an ELA.  Good luck!