VMware Cloud Community
dcolpitts
Enthusiast

VC 8.0u2 OVA fails to boot on AMD 7343 procs with kernel panic related to fips mode

When deploying VC 8.0u2 on AMD 7343-based hosts, the VM gets a kernel panic as soon as the OVA finishes deploying and the VM powers on (at about 0.8s into the boot, every time).

Specifically:  Kernel panic - not syncing: alg: self-tests for ecdh-generic (ecdh) failed in fips mode!

[Screenshot: 2023.10.08 - 09.07.51 - SNAGIT - 0026.jpg]

Upgrading an existing VC 8.0u1 to 8.0u2 on the same hosts also results in the same kernel panic upon the reboot.

I'm a VMware partner and MSP, and I'm able to replicate this at two separate customer locations, each running multiple hosts in their clusters that are HPE ProLiant DL325 Gen10 Plus v2 with AMD EPYC 7343 16-Core Processors.

A quick Google turns up this Red Hat KB that appears to be very similar in nature:  https://access.redhat.com/solutions/6995154 

So the question becomes, how do we work around this with VC?

dcc

10 Replies
dcolpitts
Enthusiast

So it appears that if you disable FIPS in the GRUB boot options, the 8.0u2 VC boots fine.  To do this at boot, press e at the Photon boot logo screen immediately after powering on the VC, change "fips=1" to "fips=0", then press F10 to boot.  To change it permanently, SSH into the VCSA and edit /boot/photon.cfg with vi.  It also appears that applying 8.0u2 as a patch updates photon.cfg to re-enable FIPS, so you'll need to disable it again afterward.
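
For anyone who would rather script the permanent change than hand-edit in vi, here is a minimal Python sketch of the same "fips=1" to "fips=0" substitution. It only transforms text that you pass in. The function name disable_fips is hypothetical, and the sample line is made up for illustration; it is not the actual contents of /boot/photon.cfg. Back up the file before writing anything back.

```python
import re


def disable_fips(config_text: str) -> str:
    """Return config_text with each standalone fips=1 argument changed to fips=0.

    The word boundaries keep look-alikes such as "nofips=1" or "fips=10"
    untouched.
    """
    return re.sub(r"\bfips=1\b", "fips=0", config_text)


# Made-up sample line, NOT copied from a real photon.cfg.
sample = "ro quiet fips=1 audit=1"
print(disable_fips(sample))
```

Since the 8.0u2 patch appears to re-enable FIPS, the same substitution would have to be repeated after patching.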

[Screenshot: 2023.10.08 - 21.00.14 - SNAGIT - 0027.jpg]

So far, I've only found this to affect HPE ProLiant DL325 Gen10 Plus v2 nodes with the AMD 7343 processors (with BIOS build v2.80 from July 31, 2023; v2.84 from August 17, 2023 is available, and I just haven't gotten these nodes updated yet, but will tomorrow).  I have other HPE ProLiant DL325 Gen10 Plus v2 nodes with the AMD 7413 processors and there are no issues.

dcc

CedricMenzi
Contributor

Hey!

Were you able to update the BIOS and test whether that was the issue?

Thanks!

dcolpitts
Enthusiast

Yes, I did upgrade the host's firmware to 2.84, which is the most current available from HPE, and it made no difference.

dcc

CedricMenzi
Contributor

OK, so you were not able to solve the problem? I currently have a case open with HPE VMware Support.

dcolpitts
Enthusiast

Cool... I've been so busy, it's just not a priority for me right now... But eventually it will be, so if you don't mind, please share the results here (or send me a private message if you want).

dcc

CedricMenzi
Contributor

I was just able to solve the issue. You have to change fips=1 to fips=0 in the file /boot/grub/grub.cfg and then reboot. (See Screenshot)
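
To confirm which setting the appliance actually booted with after the change, one option is to look at the kernel command line. A rough Python sketch, assuming a Linux guest where /proc/cmdline holds the booted command line; the function names are hypothetical and the sample string is made up. If "fips=" appears more than once, this sketch simply reports the last occurrence, which is an assumption about precedence.

```python
from typing import Optional


def fips_setting(cmdline: str) -> Optional[str]:
    """Return the value of the last fips= argument, or None if it is absent."""
    value = None
    for arg in cmdline.split():
        if arg.startswith("fips="):
            value = arg[len("fips="):]
    return value


def read_booted_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Made-up sample; on the appliance use fips_setting(read_booted_cmdline()).
sample = "ro quiet fips=0"
print(fips_setting(sample))
```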

GabGo
Contributor

I can confirm the issue on HPE DL385 Gen10 Plus v2 with AMD EPYC 74F3 CPUs. We are still on BIOS v2.72, but from what I read here, v2.84 does not solve the problem.

It seems that security features with ESXi on AMD are a bad combination these days since I have another ongoing issue with crashing Windows VMs and VBS.

Disabling FIPS mode is just a workaround. Is there any news from HPE?

Kinnison
Commander

Hello,


Sorry if I'm intruding; perhaps this KB article could explain the circumstances of the problem you are experiencing:
https://kb.vmware.com/s/article/95172?lang=en_US&queryTerm=vcenter%208.0u2


Not that consulting it solves the problem, but at least it can be useful to avoid the same thing happening to other customers / users.


Regards,
Ferdinando

GabGo
Contributor

We opened a case through HPE, which is our reseller for VMware licenses. The ETA for a fix is the first quarter of 2024. Very disappointing, and I have no idea why this is taking so long. In the past, issues regarding another big CPU vendor were fixed within days.

We cannot apply the workarounds mentioned in the KB so we are stuck (again). Hopefully there are no critical vCenter or ESXi patches in the meantime which we cannot apply until the fix arrives.

GabGo
Contributor

Hi,

I just want to confirm that vCenter update 8.0u2b fixes the issue.
