VMware Cloud Community
bo_busillo
VMware Employee

SD Boot issue Solution in 7.x

Issue

The host goes into an unresponsive state with the error "Bootbank cannot be found at path '/bootbank'", and the boot device is in an APD state.

This issue occurs when the boot device stops responding and enters an APD (All Paths Down) state. In some cases, the host becomes unresponsive and shows as disconnected in vCenter.

As of 7.0 Update 1, the format of the ESX-OSData boot data partition has changed. Instead of FAT, it uses a new format called VMFS-L. This new format allows much more frequent and faster I/O to the partition, and the resulting level of read and write traffic is overwhelming and corrupting many less capable SD cards.

We have come across a lot of customers reporting bootbank errors (hosts booting from SD cards) and hosts going into an unresponsive state on ESXi 7.

Our VMware engineering team is gathering information for a fix, and a new vmkusb driver version is available for testing. The current workaround is to install version 2 of the vmkusb driver and monitor the host.

The action plan for a long-term resolution is to replace the SD card(s) with a capable device or disk, per the best practices in the Installation Guide.

The VMware ESXi Installation and Setup Guide for version 7.0 Update 2, page 12, specifically says that the ESX-OSData partition "must be created on high-endurance storage devices".

https://docs.vmware.com/en/VMware-vSphere/7.0/vsphere-esxi-702-installation-setup-guide.pdf

You can also refer to the following KB article:

Reference: https://kb.vmware.com/s/article/83376?lang=en_US

Resolution

VMware engineering has a fix that will be included in the next release, 7.0.2 P03, which is planned for sometime in July 2021.

175 Replies
barnette08
Expert

What about systems with local SD boot but /scratch placed on a remote datastore? That would relieve some of the SD I/O, wouldn't it? Or is it not enough to make a difference? That's the default installation we use.

LucianoPatrão

Hi,

Finally, some information about this big issue (though not official).

Meanwhile, I wrote up some workaround steps so that customers can get their servers back online without needing to reboot VMs.

https://www.provirtualzone.com/vsphere-7-update-2-loses-connection-with-sd-cards-workaround/

LP

Luciano Patrão

VCP-DCV, VCAP-DCV Design 2023, VCP-Cloud 2023
vExpert vSAN, NSX, Cloud Provider, Veeam Vanguard
Solutions Architect - Tech Lead for VMware / Virtual Backups

________________________________
If helpful Please award points
Thank You
Blog: https://www.provirtualzone.com | Twitter: @Luciano_PT
bo_busillo
VMware Employee

What's the make/model of your SD cards, or how old are they?

What's your current environment, and what are your upgrade plans?

LucianoPatrão

Hi,

Even if you move the .locker to a datastore (which should be best practice when using SD cards), we still get the issue with Update 2.
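
For reference, pointing a host's scratch location (.locker) at a datastore goes through the /ScratchConfig/ConfiguredScratchLocation advanced setting. The Python sketch below only builds the ESXi shell commands for one host; it assumes the esxcli form of the setting (check it against the VMware KB on configuring a persistent scratch location), the helper name and the datastore/host names are hypothetical placeholders, and it gives each host its own .locker-<hostname> directory because hosts should not share one. The new location only takes effect after the host reboots, and as said above, this reduces writes to the SD card but did not prevent the issue on Update 2.

```python
import shlex


def scratch_relocation_commands(datastore: str, hostname: str) -> list[str]:
    """Build ESXi shell commands that point one host's scratch location at a datastore.

    Hypothetical helper: datastore and hostname are placeholders supplied by the caller.
    """
    if not datastore or not hostname or "/" in hostname:
        raise ValueError("datastore and hostname must be non-empty; hostname must not contain '/'")

    # One directory per host: hosts should not share the same scratch directory.
    scratch_dir = f"/vmfs/volumes/{datastore}/.locker-{hostname}"
    # Quote the path so datastore names containing spaces survive the shell.
    quoted = shlex.quote(scratch_dir)

    return [
        # Create the per-host directory on the datastore first.
        f"mkdir {quoted}",
        # Set the configured scratch location (assumed esxcli syntax);
        # it only takes effect after the host reboots.
        "esxcli system settings advanced set "
        f"-o /ScratchConfig/ConfiguredScratchLocation -s {quoted}",
    ]


if __name__ == "__main__":
    # Example: print the commands to paste into the ESXi shell of host "esx01".
    for cmd in scratch_relocation_commands("datastore1", "esx01"):
        print(cmd)
```

The sketch only prints commands instead of running them, so they can be reviewed before being pasted into each host's shell.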

barnette08
Expert

Any ideas where to get version-2 of vmkusb driver?

coolsport00
Enthusiast

Honestly... there should've been more publicity on this. The guide you reference also states on pg. 16 that SD cards *can* be used, so maybe that should be removed. You all (VMware, that is) promoted SD cards so heavily back in the day, and rightfully so. They're awesome! The install was small (still is); SD is fast (generally); with the exception of no disk redundancy, it's a great way to run ESXi. So ok... VMW changes things up a bit for their boot partitions... fine. Tech changes. We technologists get that. But a LOT of orgs run ESXi on SDs. As such, this change should've been made very public, suggesting orgs work towards moving away from SDs and explaining what the repercussions would be if they don't. Just allowing orgs/customers who use SDs to have hosts go down is pretty crappy... unless of course you all didn't do your due diligence QA'ing and didn't notice hosts crashing.

sysadmin84
Enthusiast

It's also worth mentioning that Dell, while not recommending SD cards anymore, never stated that they're not supported anymore:
"The Boot Optimized Storage Solution (BOSS) card is the preferred non-HDD or SSD device for VMware ESXi 7.0 installation. The Dell Internal Dual SD Module (IDSDM) install is no longer recommended due to write endurance issues with the SD flash media."

Source: https://www.dell.com/support/manuals/de-de/vmware-esxi-7.x/vmware_esxi_7.0_gsg/getting-started-with-...

coolsport00
Enthusiast

Thanks for sharing, sysadmin84. I'd always used HPE servers until last year. For our refresh cycle last year, we got a good deal on Dells and I really like them. Since v7 was already out then, going by what you posted, the Dell reps who built my server specs probably should have known this and added BOSS modules instead of the IDSDMs. But of course they didn't. Communication is a wonderful thing... and maybe there needs to be more of it on this issue among all the hardware vendors, along with recommendations to customers for boot hardware moving forward.

Cheers!

LucianoPatrão

Dell and HPE still sell servers with SD cards for vSphere 7, regardless of the statement in that document.

Talk to a vendor and they will tell you nothing about it and sell you the servers with that configuration anyway. Besides, the problem for most customers is not the new servers but the thousands of ESXi hosts already installed on SD cards today.

And moving from SD cards to local disks or other boot devices is not cheap. For hundreds of servers, this change will cost any company thousands.

LucianoPatrão


@barnette08 wrote:

Any ideas where to get version-2 of vmkusb driver?


Only VMware support can provide you with that vmkusb driver.

PatrickDLong
Enthusiast

@barnette08 Only from GSS, and I've had zero luck getting them to give it to me, despite being part of a VERY large company with an ELA. Good luck!

PatrickDLong
Enthusiast

@coolsport00 I agree with everything you said in your post about the historical push to simplify installs by using SD/USB and removing spinning-rust failure points.

barnette08
Expert

Thanks. I spoke with @bo_busillo offline and he confirmed that GSS was only using it to isolate the issue and that it is not part of a workaround. The true fix is coming in an upcoming patch release.

bo_busillo
VMware Employee

Correct. GSS does NOT want the workaround to be used as a permanent fix and recommends installing P03 sometime in July.

einstein-a-go-g
Hot Shot

I would like to know: for all the existing client installs of 6.5 and 6.7 on SD cards, using Dell IDSDM and HPE mirrored technology, are UPGRADES to ESXi 7.0 now NOT SUPPORTED or BROKEN?

Basically, is it new server or NEW VIRGIN INSTALL time?

Let me know?

einstein-a-go-g
Hot Shot

A recent client took delivery of Dell EMC R740s with Dell IDSDM modules. The client refused to pay the bill and asked Dell to collect them all unless Dell EMC upgraded them to BOSS for FREE!

Oh, and Dell came through with free upgrades!

coolsport00
Enthusiast

I bought new Dells with IDSDMs and installed v7. All good. I upgraded via vLCM to v7U1c and all was (and still is) good. The problem seems to be specific to v7U2a. If you have yet to pay the bill, I wouldn't until Dell replaces the IDSDMs with modules geared towards the new I/O requirements of vSphere.

einstein-a-go-g
Hot Shot

Yes, "no longer recommended"! Our clients loved that printout from the Dell engineers who visited for a recent delivery of new Dell EMC R740s with IDSDM modules!

einstein-a-go-g
Hot Shot

Dell EMC sent out engineers to replace all the IDSDM modules with BOSS cards for FREE at two sites, and it took the engineers a week!

But I'm not allowed to mention the client's name!