VMware Cloud Community
srodenburg
Expert

Image based updates config corrupt. How to reset.

Hello,

My company took over the account of a vSAN customer where things had been done wrong. I inherited a 4-node vSAN stretched cluster (2+2+1) which is now stable, but the cluster is configured for image-based updating (IBU). The IBU config is totally messed up: only the witness is visible in the host list, the 4 vSAN hosts are not there, the thing throws errors left and right, and it's a smoldering mess. I don't know the history and I don't know what happened.

The cluster is a production cluster and CANNOT be shut down for maintenance. vCenter lives on the vSAN datastore.

I would like to achieve two things:
A. Somehow fix the cluster's LCM config so that the 4 nodes plus the witness appear as healthy in the IBU config, thus enabling normal updating of the vSAN hosts, which is long overdue (I think it runs 7.0 U2).
B. If possible, go back to normal baseline-based updating (BBU). I know that going BBU -> IBU is a one-way trip, but it is so broken right now and I strongly prefer BBU anyway (discussing this preference is out of scope for this thread).

I know how to reset the LCM database (KB 2147284), but my question is: will this revert that cluster's LCM config (IBU) to factory defaults, including standard BBU?


Accepted Solutions
srodenburg
Expert

To close this topic: resetting the vLCM DB did not help. VMware Support got involved at one point and the case got escalated, but they could not fix the config either.

The only way to straighten things out was to do a full vSAN data evacuation of one vSAN node, then move it out of the vSAN cluster into a temporary cluster, then move it back into the vSAN cluster, which made it appear normally in LCM. Repeat for all 4 nodes (which takes a lot of time because of all the data evacuation) until everything is back to normal.
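
For anyone repeating this, the per-host sequence can be written out as a dry-run checklist. This is only a planning sketch in Python: it makes no vCenter calls, the host and cluster names are hypothetical, and the final "exit maintenance mode" step is an assumption that is not spelled out above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    host: str
    action: str


def plan_reseat(hosts, temp_cluster="temp-cluster", vsan_cluster="vsan-cluster"):
    """Return the ordered checklist, handling strictly one host at a time."""
    steps = []
    for host in hosts:
        steps += [
            Step(host, "enter maintenance mode with full data evacuation"),
            Step(host, f"move host into {temp_cluster}"),
            Step(host, f"move host back into {vsan_cluster}"),
            Step(host, "confirm host is listed in the cluster's vLCM image view"),
            # Assumption: the host must leave maintenance mode and vSAN must
            # finish resyncing before the next host is evacuated.
            Step(host, "exit maintenance mode and wait for vSAN resync"),
        ]
    return steps


if __name__ == "__main__":
    # Hypothetical host names for a 2+2 stretched cluster.
    for number, step in enumerate(plan_reseat(["esx01", "esx02", "esx03", "esx04"]), 1):
        print(f"{number:2d}. [{step.host}] {step.action}")
```

All steps for one host are emitted before the next host starts, which mirrors the one-node-at-a-time approach that keeps the production cluster running.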

Over the years, I learned to hate image-based updating with a passion. It is fine as long as it works, but Sonja Henie's Tutu!! you are so screwed when something goes wrong. Conversely, problems with baseline-based updating are easy to fix.

6 Replies
srodenburg
Expert

Nobody?

RajeevVCP4
Expert

Are you saying the hosts are not visible in the cluster?

Did you connect any host through the UI, and did you check where it is connected: at the datacenter level or in the cluster?

Rajeev Chauhan
VCIX-DCV6.5/VSAN/VXRAIL
Lalegre
Virtuoso

Hello @srodenburg,

Regarding your first point, I see you mentioned version 7.0 U2. Double-check whether this is your scenario, as the ability to manage the witness host with vLCM was added in vSAN 7.0 U3 and vCenter 7.0 U3.

Regarding your second point, switching back to baselines is not possible without creating a new cluster: once you are in image mode, you stay in image mode. You could try resetting the vLCM database. This will remove the custom baselines and will not modify the custom images at all, but it is worth a try as there may be something conflicting there.

srodenburg
Expert

"You are saying hosts are not visible in cluster ?"

Sorry, I should have been clearer: only the witness is visible in the "LCM image based updating member/host list" (in the cluster's update screen).
All hosts are connected to vCenter normally.
In the cluster's update screen, the witness is still there, but the normal vSAN hosts in that cluster have disappeared (I don't know when or how, as it happened before we took over this account).

@Lalegre
I checked and it's a 7.0 U3c system. I mixed it up with another cluster.
I'll try resetting the vLCM DB and see what happens. I'll take a proper snapshot of the vCenter VM beforehand, just in case the thing explodes in my face...

Lalegre
Virtuoso

@srodenburg,

Honestly, I have never faced an issue where some hosts from the same cluster do not show up in the vLCM image view. If you have an active support contract, it would be worth opening a case.

Well, maybe the DB reset alone is enough.
