VMware Cloud Community
kolibri76
Contributor

Upgrade ESXi 7.0.1 - Fatal CPU mismatch

I am trying to upgrade my homelab host from ESXi 7.0.0 (Build 16324942) to ESXi 7.0.1 (Build 17168206) and get the attached purple screen after rebooting the host. The CPU is an Intel Atom C2750, which should still be supported according to the HCL.

Any ideas?

Attachment: Screenshot 2020-11-30 at 18.30.26.png

 

best regards

Martin

 

30 Replies
mchaker
VMware Employee

Thank you so much for the detailed response. That's interesting that the platform ID differs between CPUs -- I'll look more into Intel documentation sometime for the other bits' meanings.

Furthermore, thank you for sharing that boot parameter -- it worked! 😊

vbondzio
VMware Employee

Thanks for taking the time Tim!

Just to add, we only introduced cpuUniformityHardCheckPanic in 7.0 U2 (exactly for situations like this); it was not available in U1.
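
For readers who edit boot.cfg by hand, here is a minimal Python sketch of appending a boot option to that file. It assumes that kernel options belong on the existing `kernelopt=` line rather than on a line of their own; the helper name `add_kernel_option` is hypothetical and not part of any VMware tooling. As noted above, the option itself only exists from 7.0 U2 onward, and it only suppresses the panic; it does not fix a genuine hardware mismatch.

```python
def add_kernel_option(boot_cfg: str, option: str) -> str:
    """Return boot.cfg text with `option` appended to the kernelopt= line.

    Assumption: options are space-separated on a single kernelopt= line.
    If no such line exists, one is added at the end of the file.
    """
    prefix = "kernelopt="
    out = []
    found = False
    for line in boot_cfg.splitlines():
        if line.startswith(prefix):
            found = True
            current = line[len(prefix):].split()
            if option not in current:  # keep the edit idempotent
                current.append(option)
            line = prefix + " ".join(current)
        out.append(line)
    if not found:
        out.append(prefix + option)
    return "\n".join(out) + "\n"


sample = "kernel=/b.b00\nkernelopt=cdromBoot runweasel\nmodules=/jumpstrt.gz\n"
print(add_kernel_option(sample, "cpuUniformityHardCheckPanic=FALSE"))
```

The sketch only transforms text; writing the result back to the boot media, and keeping a backup of the original file, is left to the reader.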

vmezen
Contributor

Hello!
Compatibility problem with Intel Xeon Gold 6252

Not long ago we got a new server with two sockets of Intel Xeon Gold 6252 (2.1 GHz, 24 cores, 48 threads) on an S2600WF board.

When we install ESXi 6.7 U2 or U3, it recognizes the server as 4 sockets with 12 cores per socket, regardless of whether hyperthreading is on or off. If we disable one core, it is recognized as 2 sockets with 23 cores per socket, so we get 46 cores, again regardless of whether hyperthreading is on or off. ESXi starts but doesn't work properly.

When we install ESXi 7.0 U1, it doesn't even recognize the processors at the installation stage. We get a Purple Screen of Death with the error "HW feature incompatibility detected: cannot start" (attached below). It seems as if ESXi doesn't support the Xeon Gold 6252, but the compatibility guide says that the Xeon Gold 6200 family is supported.

We have no problems with the Xeon Gold 6138 on the same S2600WF board, bought earlier.

I tried to modify boot.cfg as shown below, but without result:

bootstate=0
title=Loading ESXi installer
timeout=5
prefix=
kernel=/b.b00
kernelopt=cdromBoot runweasel
cpuUniformityHardCheckPanic=FALSE
modules=/jumpstrt.gz --- /useropts.gz --- /features.gz --- /k.b00 --- /uc_intel.b00 --- /uc_amd.b00 --- /uc_hygon.b00 --- /procfs.b00 --- /vmx.v00 --- /vim.v00 --- /tpm.v00 --- /sb.v00 --- /s.v00 --- /bnxtnet.v00 --- /bnxtroce.v00 --- /brcmfcoe.v00 --- /brcmnvme.v00 --- /elxiscsi.v00 --- /elxnet.v00 --- /i40en.v00 --- /i40iwn.v00 --- /iavmd.v00 --- /icen.v00 --- /igbn.v00 --- /iser.v00 --- /ixgben.v00 --- /lpfc.v00 --- /lpnic.v00 --- /lsi_mr3.v00 --- /lsi_msgp.v00 --- /lsi_msgp.v01 --- /lsi_msgp.v02 --- /mtip32xx.v00 --- /ne1000.v00 --- /nenic.v00 --- /nfnic.v00 --- /nhpsa.v00 --- /nmlx4_co.v00 --- /nmlx4_en.v00 --- /nmlx4_rd.v00 --- /nmlx5_co.v00 --- /nmlx5_rd.v00 --- /ntg3.v00 --- /nvme_pci.v00 --- /nvmerdma.v00 --- /nvmxnet3.v00 --- /nvmxnet3.v01 --- /pvscsi.v00 --- /qcnic.v00 --- /qedentv.v00 --- /qedrntv.v00 --- /qfle3.v00 --- /qfle3f.v00 --- /qfle3i.v00 --- /qflge.v00 --- /rste.v00 --- /sfvmk.v00 --- /smartpqi.v00 --- /vmkata.v00 --- /vmkfcoe.v00 --- /vmkusb.v00 --- /vmw_ahci.v00 --- /crx.v00 --- /elx_esx_.v00 --- /btldr.v00 --- /esx_dvfi.v00 --- /esx_ui.v00 --- /esxupdt.v00 --- /tpmesxup.v00 --- /weaselin.v00 --- /loadesx.v00 --- /lsuv2_hp.v00 --- /lsuv2_in.v00 --- /lsuv2_ls.v00 --- /lsuv2_nv.v00 --- /lsuv2_oe.v00 --- /lsuv2_oe.v01 --- /lsuv2_oe.v02 --- /lsuv2_sm.v00 --- /native_m.v00 --- /qlnative.v00 --- /vdfs.v00 --- /vmware_e.v00 --- /vsan.v00 --- /vsanheal.v00 --- /vsanmgmt.v00 --- /tools.t00 --- /xorg.v00 --- /gc.v00 --- /imgdb.tgz --- /imgpayld.tgz
build=7.0.1-0.25.17325551
updated=0

We have an ESXi 6.7 license.

The firmware on the S2600WF is updated to the latest version.

Please help if you can!

Sorry if this is the wrong place to ask. If it is not too much trouble, please tell me where to ask.

vbondzio
VMware Employee

Yeah, so this is an example where the check found an unsupported configuration. You should get the HW vendor to replace it.

Basically the message says that a couple of cores on the 2nd socket claim to be Cascade Lake (0x50657) while CPU 0 on the first socket is Cascade Lake B-0 (0x50656); the microcode revision is also different on those cores. What is most likely happening here is that the whole 2nd socket is a different CPU, i.e. they shipped the server with two slightly different ones.
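
To see how close the two reported values are, they can be decoded as CPUID signatures. The following is a small sketch, assuming both numbers are the EAX value returned by CPUID leaf 1 and using the standard Intel bit layout for family, model and stepping; the function name `decode_cpuid_signature` is made up for this illustration.

```python
from typing import NamedTuple


class CpuSignature(NamedTuple):
    family: int
    model: int
    stepping: int


def decode_cpuid_signature(eax: int) -> CpuSignature:
    """Split a CPUID leaf 1 EAX value into display family, model and stepping."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return CpuSignature(family, model, stepping)


for sig in (0x50656, 0x50657):
    d = decode_cpuid_signature(sig)
    print(f"{sig:#x}: family {d.family:#x}, model {d.model:#x}, stepping {d.stepping}")
```

Under that assumption, both values decode to family 0x6, model 0x55 and differ only in the stepping (6 versus 7), which fits the description of two slightly different CPUs.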

The option "cpuUniformityHardCheckPanic=FALSE" was only introduced in 7.0 U2, and you really should not use it in this case.

vmezen
Contributor

Thank you for the answer.

But why do ESXi 6.7 and ESXi 7.0 U2 (tested several minutes ago) see it the same way, as 4 sockets with 12 cores each and without hyperthreading (enabling or disabling it makes no difference), and not as 2 sockets with 24 cores each (48 threads)?

Is it connected to the difference in microcode revision?

The main goal is to use ESXi 6.7, because we have a license for 3 servers. ESXi 7.0 U1 and ESXi 7.0 U2 were used only for testing.

vbondzio
VMware Employee

Are you sure ESXi sees 4 sockets and not 4 NUMA nodes? If it is the latter, then you most likely have SNC (Sub NUMA Clustering) enabled in the BIOS. This should be unrelated to the difference of the CPUs though.
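
The SNC explanation matches the reported numbers. As a rough arithmetic sketch, assuming SNC splits each socket into two NUMA domains (the helper `numa_layout` is hypothetical and only illustrates the counting):

```python
from typing import Tuple


def numa_layout(sockets: int, cores_per_socket: int,
                snc_domains_per_socket: int = 2) -> Tuple[int, int]:
    """Return (NUMA nodes, cores per node) for an evenly split socket."""
    if cores_per_socket % snc_domains_per_socket:
        raise ValueError("cores do not split evenly across SNC domains")
    numa_nodes = sockets * snc_domains_per_socket
    cores_per_node = cores_per_socket // snc_domains_per_socket
    return numa_nodes, cores_per_node


# Two 24-core sockets with SNC enabled: 4 NUMA nodes of 12 cores each.
print(numa_layout(2, 24))
# The same server with SNC disabled: 2 NUMA nodes of 24 cores each.
print(numa_layout(2, 24, snc_domains_per_socket=1))
```

So "4 x 12" is what a 2-socket, 24-core-per-socket host looks like when it is counted per NUMA node instead of per socket.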

I'm not sure I understand your license question though. You can use 6.7, but you should still get the HW vendor to replace the non-matching CPU.

edit: just saw the screenshot, I'm not sure whether the ESXi host client is correct here. Can you SSH to the host and run "sched-stats -t ncpus"?

vmezen
Contributor

Thank you very much! In fact, NUMA was enabled in the BIOS, but the setting was hidden on the fourth level of the menu and I didn't notice it.

StanthewiZZZARD
Contributor

@TimMann 

 

My homelab runs on a Supermicro X11 motherboard.
It worked perfectly.
I updated to the latest 7.0.3.
Still working.

Today (3 days after the update):

While moving a file: PSOD.
CPU mismatch with a SINGLE CPU.

I can't reinstall ESXi; I get the same PSOD with 7.0.2.

The hardware is dead, no?

Thanks for the help

vbondzio
VMware Employee

@StanthewiZZZARD just updating here as per your post: this sounds like a HW issue. See https://old.reddit.com/r/vmware/comments/ub45yx/psod_is_the_mobo_or_cpu_dead/i63xwy3/

StanthewiZZZARD
Contributor

I agree

A CPU is on its way. Hope it's not the mobo.

thanks

StanthewiZZZARD
Contributor

Found a refurbished CPU (E3-1220).

The host is back to life.
The VMs too.

You can't imagine how I feel (homelab with data).
