VMware Cloud Community
Sharantyr3
Enthusiast

VMware Lifecycle Manager: many issues (design, bugs, HA down, ...)

Hello there,

We recently migrated from 6.7 to 8, and I noticed the yellow banner at the top of the Updates section stating that Update Manager will be deprecated and that VMware recommends moving to Lifecycle Manager images.

So I jumped in, thank God on a non-critical cluster, and that is where the problems began.

Please also note that I have an open support ticket with VMware about this, but there is no progress on the bugs yet; support seems to be discovering Lifecycle Manager along with me.

First of all, let's talk about bugs.

Bugs

Dell OpenManage integration

I have Dell servers and use the Dell OpenManage Enterprise appliance, and I saw that Lifecycle Manager can work with it to manage firmware. Great!

So I installed the OpenManage vCenter plugin and registered it with my vCenter; the plugin is now present in vCenter and working.

But when I try to integrate it with Lifecycle Manager, it just ... times out:

Sharantyr3_0-1676362879665.png

Figuring out why it doesn't work is now a new journey, as VMware support will say it's Dell's fault and vice versa. Which log files should I check? If anyone has ideas here... thanks!

Images are buggy

I have a custom driver deployed with VMware Update Manager (Broadcom bnxtnet 222.0.155.0-1OEM.700.1.0.15843807) that is newer than the version bundled in the 8.0a - 20842819 image. This results in an "incompatible" status in Lifecycle Manager, since it can't downgrade components. OK.

But the "Vendor addon" Dell addon for PowerEdge Servers 800-A00 contains an even newer driver version (Broadcom NetXtreme-E VMKAPI network and RoCE driver for VMware ESXi v 222.1.175.0-1OEM), so with it selected the result becomes "out of compliance", which seems logical because it wants to update the driver.
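The two statuses described above can be illustrated with a small Python sketch. It is only a model of the behaviour observed in this post, not Lifecycle Manager's actual comparison logic: the function names are hypothetical, and it assumes that comparing the dotted numeric part before the first "-" is enough to order these driver versions.

```python
def version_key(vib_version: str) -> tuple:
    """Return the dotted numeric part before the first '-' as a tuple of ints."""
    numeric_part = vib_version.split("-", 1)[0]
    return tuple(int(piece) for piece in numeric_part.split("."))


def compliance_status(installed: str, in_image: str) -> str:
    """Classify a host component against the desired image (illustrative only)."""
    installed_key, image_key = version_key(installed), version_key(in_image)
    if installed_key > image_key:
        return "incompatible"        # applying the image would be a downgrade
    if installed_key < image_key:
        return "out of compliance"   # applying the image would be an upgrade
    return "compliant"


installed = "222.0.155.0-1OEM.700.1.0.15843807"
print(compliance_status(installed, "216.0.50.0-66"))     # version bundled in the base image
print(compliance_status(installed, "222.1.175.0-1OEM"))  # version in the Dell addon
```

With the versions quoted in this post, the first call models the downgrade case and the second models the pending-upgrade case.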

Yet in the logs, together with support, we found that the image was trying to integrate version 216.0.50.0-66, which is the version bundled by default. It throws errors in /var/log/esxupdate.log about the XML descriptor not being valid, or something similar.
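For anyone digging through a copy of that log, a minimal Python sketch of a keyword filter might help narrow things down. It makes no assumption about the real line format of esxupdate.log and does a plain case-insensitive substring match; the function name and the sample lines are invented placeholders, not real log output.

```python
from typing import Iterable, List


def matching_lines(lines: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Return lines containing any keyword, case-insensitively, in original order."""
    lowered = [keyword.lower() for keyword in keywords]
    return [line.rstrip("\n") for line in lines
            if any(keyword in line.lower() for keyword in lowered)]


# Placeholder text only: these are NOT real esxupdate.log lines.
sample = [
    "placeholder: something about bnxtnet 216.0.50.0-66\n",
    "placeholder: unrelated message\n",
    "placeholder: XML descriptor problem\n",
]
for line in matching_lines(sample, ["bnxtnet", "descriptor"]):
    print(line)
```

On a real system the same function could be fed an open file handle for a local copy of the log instead of the sample list.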

Adding the driver manually by selecting it under "components" solved the problem; the error message is gone.

This makes no sense, because the driver is already included in the Dell package.

I double-checked by removing it from the "components" list and, bingo, the image is still OK.

So something is definitely wrong here, but we move on.

vSAN compatibility isn't there

I tested this on a small 2-node cluster. When you migrate to Lifecycle Manager, it automatically adds the witness to the scope of the image.

OK, why not.

But the problem is that Lifecycle Manager seems unable to talk to the witness correctly.

With my base image, I hit a VMware Tools downgrade issue: I had upgraded the VMware Tools version on the ESXi hosts with Update Manager, so the installed version is newer than the one in the default ESXi image.

As a result, my 2 ESXi hosts and my witness showed an "incompatible" status, because downgrades are not allowed.

You solve this by adding the latest VMware Tools, "VMware Tools Async Release", under "components".

But! The witness does not get the change.

My 2 ESXi hosts picked up the change and no longer reported a downgrade error, but the witness did not. It never picked up the manually selected VMware Tools async "component" and kept showing "incompatible", because Lifecycle Manager would not allow a VMware Tools downgrade...

I had to solve this by uninstalling the VMware Tools light VIB from the witness!!

Why force the witness into the image if Lifecycle Manager can't handle it correctly?

VMware HA

The VMware HA package (the fdm VIB) makes the ESXi hosts incompatible because Lifecycle Manager doesn't want to remove it. There are articles on the internet saying this is normal and that vCenter will redeploy the HA VIB after remediation.

Can't you add a few lines of code to Lifecycle Manager to do this automatically for this extremely specific VIB?

DRS is mandatory

On this small 2-node vSAN cluster, I have a Standard license. So no DRS.

When you try to remediate the cluster, here is what you get:

Sharantyr3_0-1676365273581.png

Please note that I made another attempt by manually putting the first ESXi host into maintenance mode and trying to remediate only that host. Guess what:

Sharantyr3_1-1676367382457.png

So what, I have converted my vSAN cluster to Lifecycle Manager and am now forced to upgrade my licenses?

Design issues

VMware HA

When you migrate to a Lifecycle Manager image, you will most probably hit issues before reaching a working image, and fixing them requires remediating all of your hosts. That is 100% guaranteed, because the VMware HA package (fdm) will ALWAYS have to be uninstalled before you remediate.

But guess what, you think you have time? NOPE

When you migrate to Lifecycle Manager, these issues will ... KILL HA!!

HA will be inoperative, all VMs will be marked "is not ha protected", and here is what you will see in your cluster:

Sharantyr3_2-1676366312949.png

Until you have solved all the issues with your image and then remediated all your ESXi hosts (reboots needed, I guess, and we all know how long that can take), you are not HA protected. Only once all of this is done, and if you are lucky enough that it works in the end, will you be HA protected again.

I'm not a big fan of this integration / requirement / new design. What, a host is out of compliance or lacks a specific VIB, so HA gets disabled on the entire cluster??

The support engineer tried to solve this by uninstalling the VMware fdm VIB from my ESXi hosts, but the Lifecycle Manager image is still not compliant and still won't remediate, and now the HA status shows:

Sharantyr3_1-1676366118457.png

Please note that disconnecting / reconnecting the ESXi hosts, and rebooting them, does not change this.

Enabling / disabling HA has no effect at all.

No way back for us, vSAN users

In case you didn't know: once you have migrated to a Lifecycle Manager image, you are not allowed to go back to Update Manager.

For a vSAN cluster, this is a disaster, I guess?

Also, for such an immature product, please allow us to roll back.

And next time, guys, do not trust yellow banners:

Sharantyr3_4-1676366837524.png

Don't do it!! Wait!!

2 Replies
lamw
Community Manager

JFYI - Thanks for the feedback. I've shared this thread with the vLCM PM.

Sharantyr3
Enthusiast

Hello and thanks!

We made progress with the VMware engineer.

Following this KB to reset the Update Manager database fixed the HA issue (??)

https://kb.vmware.com/s/article/2147284

Then, as it's a 2-node cluster, HA was blocking updates because of insufficient reserved resources. This was resolved by disabling HA admission control.
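One way to see why a 2-node cluster runs into this is a deliberately simplified host-count model, sketched below in Python. It is not the actual HA admission control algorithm, which works on CPU and memory reservations and has several policies; the function name and parameters are hypothetical, and the only assumption is that tolerating N host failures needs at least N + 1 hosts still in service.

```python
def maintenance_keeps_failover_capacity(total_hosts: int,
                                        failures_to_tolerate: int = 1) -> bool:
    """Simplified check: can one host leave service without breaking failover?

    Assumes failover for N host failures needs at least N + 1 hosts in service.
    Real admission control also looks at resource reservations.
    """
    hosts_left = total_hosts - 1  # one host goes into maintenance mode
    return hosts_left >= failures_to_tolerate + 1


for hosts in (2, 3, 4):
    print(hosts, maintenance_keeps_failover_capacity(hosts))
```

In this model a 2-node cluster can never take a host out of service while still tolerating one host failure, which matches the need to relax admission control during remediation.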

I'm still stuck on the fact that, when remediating an ESXi host, Lifecycle Manager tries to put it into maintenance mode using DRS, even if the host is already in maintenance mode.

As these hosts have a Standard license, DRS is not available and remediation fails.

About the witness issue: I think the problem is that the Lifecycle Manager image does not push additional components to the witness. That is OK, I guess, because the witness does not need custom drivers (unless you have a physical witness host). So removing the recent VMware Tools VIB was the key to the solution here.

I will post progress when possible.
