When deploying VC 8.0u2 on AMD 7343 based hosts, as soon as the OVA finishes deploying and the VM powers on (at about 0.8s into the boot every time), it gets a kernel panic.
Specifically: Kernel panic - not syncing: alg: self-tests for ecdh-generic (ecdh) failed in fips mode!
Upgrading an existing VC 8.0u1 to 8.0u2 on the same hosts also results in the same kernel panic upon the reboot.
I'm a VMware partner and MSP, and I'm able to replicate this at two separate customer locations, each running multiple hosts in their clusters that are HPE ProLiant DL325 Gen10 Plus v2 with AMD EPYC 7343 16-Core Processors.
A quick Google turns up this Red Hat KB that appears to be very similar in nature: https://access.redhat.com/solutions/6995154
So the question becomes, how do we work around this with VC?
dcc
So it appears that if you disable FIPS in the GRUB boot options, the 8.0u2 VC boots fine. To do this at boot, press e at the Photon boot logo screen immediately after powering on the VC, change "fips=1" to "fips=0", then press F10 to boot. To make the change permanent, SSH into the VCSA and edit /boot/photon.cfg with vi. It also appears that when you apply 8.0u2 as a patch, the patch updates photon.cfg to re-enable FIPS, so you'll need to disable it again afterwards.
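For anyone who has to repeat the permanent edit after every patch, here is a minimal Python sketch of the same "fips=1" to "fips=0" substitution. It is only an illustration, not a VMware-supported tool: the helper names (disable_fips, needs_fix, patch_file) and the sample command line are made up for this example, and it assumes photon.cfg carries the parameter as a plain "fips=1" token, as described above.

```python
import re
import shutil
from pathlib import Path

# Matches a standalone "fips=1" kernel parameter, but not e.g. "fips=10".
_FIPS_ON = re.compile(r"(?<![\w.-])fips=1(?![\w.-])")


def needs_fix(cfg_text: str) -> bool:
    """True if the config text still contains a standalone fips=1."""
    return _FIPS_ON.search(cfg_text) is not None


def disable_fips(cfg_text: str) -> str:
    """Return cfg_text with every standalone fips=1 replaced by fips=0."""
    return _FIPS_ON.sub("fips=0", cfg_text)


def patch_file(path: str) -> bool:
    """Rewrite the file in place, keeping a .bak copy. Returns True if changed."""
    cfg = Path(path)
    text = cfg.read_text()
    if not needs_fix(text):
        return False
    shutil.copy2(cfg, cfg.with_name(cfg.name + ".bak"))  # keep a backup first
    cfg.write_text(disable_fips(text))
    return True


# Hypothetical sample line; the real contents of photon.cfg may differ.
sample = "photon_cmdline=init=/lib/systemd/systemd ro loglevel=3 quiet fips=1\n"
patched = disable_fips(sample)
```

On the appliance this would be called as patch_file("/boot/photon.cfg") from a root shell, assuming a Python 3 interpreter is available there. Since the 8.0u2 patch reportedly sets fips=1 again, the same call would need to be repeated after patching.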
So far, I've only found this to affect HPE ProLiant DL325 Gen10 Plus v2 nodes with the AMD 7343 processors (with BIOS build v2.80 from July 31, 2023; v2.84 from August 17, 2023 is available, I just haven't gotten these nodes updated yet, but will tomorrow). I have other HPE ProLiant DL325 Gen10 Plus v2 nodes with the AMD 7413 processors and there are no issues.
dcc
Hey!
Were you able to update the BIOS and test whether that was the issue?
Thanks!
Yes, I did upgrade the hosts' firmware to 2.84, which is the most current available from HPE, and it made no difference.
dcc
Ok, so you were not able to solve the problem? I currently have a case open with HPE VMware Support.
Cool... I've been so busy, it's just not a priority for me right now... But eventually it will be, so if you don't mind, please share the results here (or send me a private message if you want).
dcc
I can confirm the issue on HPE DL385 Gen10 Plus v2 with AMD EPYC 74F3 CPUs. We are still on BIOS v2.72, but from what I read here, v2.84 does not solve the problem.
It seems that security features with ESXi on AMD are a bad combination these days since I have another ongoing issue with crashing Windows VMs and VBS.
Disabling FIPS mode is just a workaround. Is there any news from HPE?
Hello,
Sorry if I'm intruding, perhaps this KB article "could" explain the circumstances of the problem you are experiencing:
https://kb.vmware.com/s/article/95172?lang=en_US&queryTerm=vcenter%208.0u2
Not that consulting it solves the problem, but at least it can be useful to avoid the same thing happening to other customers / users.
Regards,
Ferdinando
We opened a case through HPE, which is our reseller for VMware licenses. The ETA for a fix is the first quarter of 2024. Very disappointing, and I have no idea why this is taking so long. In the past, issues affecting another big CPU vendor were fixed within days.
We cannot apply the workarounds mentioned in the KB so we are stuck (again). Hopefully there are no critical vCenter or ESXi patches in the meantime which we cannot apply until the fix arrives.
Hi,
I just want to confirm that vCenter update 8.0u2b fixes the issue.