VMware Cloud Community
franckehret
Enthusiast

vCenter patch 7.0.3a causing HA election loop

Hi there,

 

I installed the latest vCenter patch this morning to fix the Lifecycle access issue, but I ran into another one...

After the update and reboot, I noticed HA alerts on all my hosts, and looking at the tasks, I saw that the HA election was looping.

I tried disabling HA and re-enabling it once all agents were uninstalled, but without success; I had to roll back.

 

PS: This is in my lab and I'm a VMUG user, so I won't open a ticket, but I wanted to take the temperature and maybe warn others... 😉

howesh
Contributor

Running into the same issue. I cannot enable HA after the 7.0.3.00100 patch. vCenter 7.0.3 has been a major flop, with Lifecycle Manager rights issues, HA issues, and problems installing ESXi updates. I will definitely wait a few patches before updating my system.

franckehret
Enthusiast

Yep, a flop, that's correct! 😁

I also have ESXi update issues, and Lifecycle access issues when I use an AD account.

Well, we can say now "there is room for improvement"!

-xxxxxxxxxxxxx-
Contributor

Saved me a heap of stress and rollbacks. Thanks for posting!

howesh
Contributor

OK, it seems my whole issue comes down to one VIB, "i40enu". There seem to be multiple versions of it, which causes the HA install and Lifecycle update issues. I ran the script below (I did not write it) and it allowed me to patch my systems. I had to run it multiple times: once to get the Lifecycle updates to complete and again to get HA working. I ran it on my Dell R750s and Fujitsu servers so that the HA software could install on them. Running the script fixed my issues with 7.0.3, and all seems stable now on the latest 7.0.3.00100. I would have waited a while before going to 7.0.3 if I had known about the issues I would run into.

 

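# Requires VMware PowerCLI and an active Connect-VIServer session.
$VCHosts = Get-VMHost
foreach ($VCHost in $VCHosts)
{
    # Get the esxcli (V2) interface for this host
    $esxcli_v2 = Get-EsxCli -VMHost $VCHost -V2
    $vib_remove_args = $esxcli_v2.software.vib.remove.CreateArgs()

    # Remove the i40enu driver VIB
    $vib_remove_args.vibname = "i40enu"
    $esxcli_v2.software.vib.remove.Invoke($vib_remove_args)

    # Remove the brcmnvmefc VIB (this step fails on hosts where it is not installed)
    $vib_remove_args.vibname = "brcmnvmefc"
    $esxcli_v2.software.vib.remove.Invoke($vib_remove_args)
}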

franckehret
Enthusiast

Hi,

I'll give it a try (well, starting to use scripts was not in my plans...)! Thanks for sharing. 😉

It seems the script also removes brcmnvmefc; do you know why?

howesh
Contributor

Not sure why it runs that extra removal. All 6 of my servers reported that the step failed, as that VIB is not on my systems. You could remove that step if you want. I left the script as is, as I am not really good at writing scripts.
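
Since the brcmnvmefc removal fails on hosts where that VIB is not installed, one option is to check what is present before removing anything. Below is a minimal sketch for the ESXi shell, not a tested fix. It assumes `awk` is available there and that the first column of `esxcli software vib list` is the VIB name; `vib_installed` is a hypothetical helper name, not part of esxcli.

```shell
#!/bin/sh
# Sketch: only remove the VIBs that are actually installed on this host.
# vib_installed (hypothetical helper) reads `esxcli software vib list` output
# on stdin and succeeds if the first column matches the given name exactly.
vib_installed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if ! command -v esxcli >/dev/null 2>&1; then
    echo "esxcli not found; run this in the ESXi shell"
else
    for vib in i40enu brcmnvmefc; do
        if esxcli software vib list | vib_installed "$vib"; then
            echo "Removing $vib"
            esxcli software vib remove --vibname="$vib"
        else
            echo "$vib is not installed, skipping"
        fi
    done
fi
```

The exact match on the first column matters here: a plain substring search for i40en would also match i40enu.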

howesh
Contributor

Also, I used PowerShell to connect to each server (Connect-VIServer -Server XXXXX -User root -Password XXXXXX -Force) and then ran the script. Make sure the host is in maintenance mode. After I ran the script, I disconnected from the server to get ready for the next one (Disconnect-VIServer -Server XXXXXXX).
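
For anyone working from the ESXi shell instead of PowerCLI, the maintenance-mode advice could be checked along these lines. This is only a rough sketch, assuming `esxcli system maintenanceMode get` prints `Enabled` or `Disabled` and that running VMs have already been shut down or migrated; `is_maintenance` is a hypothetical helper name.

```shell
#!/bin/sh
# is_maintenance (hypothetical helper): succeeds if the first line on stdin
# is exactly "Enabled".
is_maintenance() {
    read -r state
    [ "$state" = "Enabled" ]
}

if ! command -v esxcli >/dev/null 2>&1; then
    echo "esxcli not found; run this in the ESXi shell"
elif esxcli system maintenanceMode get | is_maintenance; then
    echo "Host is already in maintenance mode"
else
    # Assumption: VMs on this host were shut down or migrated beforehand.
    esxcli system maintenanceMode set --enable true
fi
```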

wsasaramago
Contributor

Good hint there @howesh ...
But that fix relates to ESXi VIBs and not to vCenter itself, at least not directly.
@franckehret Did you have a chance to work out what the issue was with HA after the vCenter 7.0.3a update?

I updated my lab to 7.0.3a and had no issues at all!

And my hosts are not even compatible with 7.0.2 Build 18538813, which I'm running on half of my lab.

So I suspect a VIB or firmware version needs a fix here. I would definitely take a look at it.

 

Cheers & Thanks!

Always Learning. Always having Fun. Thank You!
franckehret
Enthusiast

Hi,

Not yet, so I still don't know what caused HA to fail. I'll proceed with patching the ESXi hosts first.

I removed the problematic VIB from one host and patching then worked (that is with vCenter 7.0 U3, not U3a), and I'll probably patch all my other hosts before trying the vCenter update.

But for 4 hosts, as I'm not comfortable with PowerShell and the "engineering" time would be greater, I just used the following command on each host:

esxcli software vib remove --vibname=i40enu

So my process is:

  • removal of i40enu
  • reboot
  • staging of all missing updates
  • reboot
  • verify host is ok
  • when all hosts are patched, reboot vCenter and recheck patch compliance
  • if compliance is OK, try the vCenter update with HA on

And if I'm lucky and i40enu was the problem, vCenter should patch without HA error... I'll let you know! 😉

(Otherwise, I'll try to patch it with HA off, and try to re-enable HA after vCenter update)

wsasaramago
Contributor

Hi @franckehret ..

Just need to confirm one thing:

What failed: was it vCenter HA (VCHA) or vSphere Availability (HA)?

This makes a big difference. Anyway, it's also good to keep ESXi up to date and to keep VIB and firmware versions aligned, i.e. compatible.

Cheers and keep us posted!

Always Learning. Always having Fun. Thank You!
franckehret
Enthusiast

vSphere Availability (HA) 😉

I usually patch pretty quickly after releases, but U3 just f***** the whole thing!

(I even tried to completely reset Lifecycle, but it didn't help)

howesh
Contributor

It was vSphere Availability, not vCenter HA. It was failing to install the vSphere HA software on each host that had the i40enu VIB installed.

franckehret
Enthusiast

Well, in the end, I had to redo the VIB removal once again after I patched vCenter... 😑

I just hope I don't have to do that ever again... Because it is reeeeaaaallyyy boring!

Quick question for you all: has anybody seen "Host has lost time synchronization" errors on hosts since vSphere 7 U3? I checked with a command in the host console and time is fine and syncing (as it is on a lot of other non-vSphere servers), so it is clearly a bug.

Let me know! 😉

howesh
Contributor

NTP failing is an issue I am having too. Make sure you patch your hosts with the latest version of 7.0.3, as 3 of my hosts hit a PSOD. All three crashed at the same time. It was a new bug in 7.0.3 that was just fixed. They also state that they renamed i40enu back to i40en to fix a bunch of issues. Another set of nightmares for me with 7.0.3.

Turranius
Contributor

I would like to add one more thing related to this. I have 4 test machines with Areca RAID cards. The machines are HP 8200, HP z420 and HP z440.

The cards are

1 Areca Arc-1212

2 Areca Arc-1260

1 ARC-1880IX-12

On two of the machines, with the Arc-1212 and the Arc-1260, as soon as I updated from ESXi-7.0U2d-18538813 to ESXi-7.0U3-18644231, the Areca cards still showed up but with 0 targets on them. In other words, it did not show any of the array(s) on the cards and the datastores were gone.

The weird thing is that it worked on one z420 with an Arc-1260 but not on another z420, also with an Arc-1260. I could update those just fine and the datastores showed up.

I used the latest Areca drivers for VMware 7*: arcmsrn 2.00.00.06-1OEM.650.0.0.4598673 arc VMwareCertified 2021-10-25

After troubleshooting with Areca back and forth, they could not replicate the error at all. We tried both the component and vib drivers. No difference. As soon as I updated to 7.0.3, the arrays did not show up. Also, I could not shut down the ESXi host as it always hung on "shutting down device drivers".

But, after finding this thread, I wondered if it could be related, so I uninstalled the i40enu driver, using
esxcli software vib remove --vibname=i40enu

After a restart, I once again tried updating the ESXi host. Update Manager complained about some crap so I updated from shell instead.
esxcli software profile update -p ESXi-7.0U3a-18825058-standard -d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml --no-hardware-warning

After a restart, everything still looked fine, so something about the i40enu problem also affected my Areca drivers for some reason.
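
Before committing to a shell update like the one above, the same command can be run with `--dry-run` to see which VIBs would change. This is a sketch reusing the depot URL and profile name from this post; `DEPOT` and `PROFILE` are just illustrative variable names, and enabling the `httpClient` firewall ruleset is an assumption about what the host needs in order to reach the online depot.

```shell
#!/bin/sh
# Values taken from the update command in this post.
DEPOT="https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml"
PROFILE="ESXi-7.0U3a-18825058-standard"

if ! command -v esxcli >/dev/null 2>&1; then
    echo "esxcli not found; run this in the ESXi shell"
else
    # Assumption: outbound HTTP(S) from the host must be allowed to reach the depot.
    esxcli network firewall ruleset set -e true -r httpClient
    # Report what would be installed or removed without changing the host.
    esxcli software profile update -p "$PROFILE" -d "$DEPOT" --dry-run
fi
```

If the dry run looks sane, dropping `--dry-run` performs the actual update, after which the host needs a reboot.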

 

franckehret
Enthusiast

Yep, this update is really f***** up... I hope they fix those problems anytime soon, but at some point, we are also losing trust... 😉

franckehret
Enthusiast

Hi there,

I don't know where you all are at, but I noticed that some ESXi patches (7.0.3a) were released in the meantime.

I tried updating directly, no success; I tried again after removing the i40enu VIB and rebooting, no success either!

I also tried to create a baseline with only these 2 patches, no success...

[Screenshot: franckehret_0-1636020786007.png]

So now, I don't know... any success on your side? 😉

dragans2
Contributor

Remove again:

  • esxcli software vib remove -n i40enu
  • reboot
  • CHECK COMPLIANCE
    • Host Security Patches (Predefined)
    • Critical Host Patches (Predefined)
  • REMEDIATE

Done. It worked like this for me.

franckehret
Enthusiast

Well, it didn't work on the first host I tried, but my process also differed slightly: after the struggle, I removed the "Non-Critical Host Patches" baseline from all my hosts.

Then all could remediate... with the mentioned procedure of course, not the "easy/normal way"... 😁

PS: I didn't put the non-critical baseline back for now. Do you guys use it normally?
