VMware Communities
MScarpulla
Contributor

NMI on Host System

I am running VMware Workstation Pro 15.5.2.

I had a perfectly running system using dual Xeon E5-2620 V3 processors running 4 guests.  I changed the processors to dual Xeon E5-2660 V3, and now, starting any guest OS will blue screen the host with an NMI.  Next, I turned ON full VMware logging for the guest and started the guest again.  It runs perfectly normally, no NMI.

Only with the Xeon E5-2660 V3 processors: VMware logging turned OFF = NMI, VMware logging turned ON = normal operation.

Can anyone explain what is happening here?  I can run with full logging turned on, but isn't there a performance hit to do this?
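
One way to narrow this down is to compare the guest's .vmx file before and after toggling logging, to see exactly which settings changed. The following is only a minimal sketch: it assumes the .vmx file is plain text made of key = "value" lines, and the function names parse_vmx and diff_vmx are hypothetical helpers, not part of any VMware tool.

```python
def parse_vmx(text):
    """Parse .vmx-style 'key = "value"' lines into a dict.

    Assumption: one setting per line; lines starting with '#' are skipped.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def diff_vmx(before, after):
    """Return {key: (old_value, new_value)} for every key that differs.

    A key missing on one side is reported with None for that side.
    """
    old, new = parse_vmx(before), parse_vmx(after)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }
```

Save a copy of the .vmx file with logging off and another with logging on, read both into strings, and pass them to diff_vmx. Whatever shows up in the result is the set of settings that differs between the crashing and the working configuration.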

6 Replies
wila
Immortal

Hi,

"... starting any guest OS will blue screen the host with an NMI."

Are you starting the guest OS from boot or is it a resume sort of start?

Make sure your guest OS is shut down completely before running it under another CPU, as there might be registers or functionality available in the old processor that are not available in the new one. Resuming on a different processor can trigger weird side effects.
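
A quick way to check whether a guest was left suspended rather than shut down is to look for suspended-state files in the VM's folder. This is only a rough sketch: it assumes that a .vmss file in the folder suggests a suspended guest, and find_suspend_files is a hypothetical helper name.

```python
from pathlib import Path


def find_suspend_files(vm_dir):
    """Return the sorted names of .vmss files found directly in vm_dir.

    Assumption: a .vmss file suggests the guest was suspended
    instead of being shut down completely.
    """
    return sorted(
        entry.name
        for entry in Path(vm_dir).iterdir()
        if entry.is_file() and entry.suffix.lower() == ".vmss"
    )
```

If the list is not empty, resume the guest on the old configuration if that is still possible, or discard the suspended state, and then start the guest from a cold boot.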

The same goes for the host: it cannot be in any sort of power resume mode when you swap the processors. Power should be removed when upgrading the processors.

If that still doesn't help then you might have to reset your BIOS/firmware settings back to factory defaults.

Also make sure to upgrade to the latest BIOS/firmware available for your host.

--

Wil

| Author of Vimalin. The virtual machine Backup app for VMware Fusion, VMware Workstation and Player |
| More info at vimalin.com | Twitter @wilva
dariusd
VMware Employee

NMIs are often (but not always) the result of a hardware problem.

If your host has an event log (either in its BIOS/EFI firmware or through a DRAC/iLO or other Service Processor), check if there are more details logged there.  It seems unlikely that an NMI would end up in the Windows Event Log, but it might be worth checking there too just in case Windows managed to record some details somewhere as it BSODed.
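
For the Windows side, the System log can be queried from a script with the built-in wevtutil tool. The sketch below only builds the command line. The helper name build_query is hypothetical, and the event IDs used in the example (1001, commonly logged after a bug check, and 41, commonly logged after an unexpected restart) are assumptions that may need adjusting for your system.

```python
def build_query(event_ids, max_events=10):
    """Build a wevtutil command that lists recent System-log events.

    event_ids: event IDs to match (the caller's choice, see the note above).
    max_events: maximum number of events to return, newest first.
    """
    id_filter = " or ".join(f"EventID={event_id}" for event_id in event_ids)
    return [
        "wevtutil", "qe", "System",
        f"/q:*[System[({id_filter})]]",  # XPath filter on the event ID
        f"/c:{max_events}",              # limit the number of results
        "/rd:true",                      # reverse direction: newest first
        "/f:text",                       # human-readable output
    ]
```

On the Windows host, the resulting list can be passed to subprocess.run(..., capture_output=True, text=True) from an elevated prompt, and the output checked for entries around the time of the blue screen.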

I'll second Wil's suggestion to check that the host's firmware is up-to-date, since support for newer CPUs is often added in host firmware updates.  Weird stability issues (potentially including NMIs) can result if you install CPUs that the system and its firmware do not officially support.

Thanks,

--

Darius

MScarpulla
Contributor

I tried these things.

1) I upgraded the BIOS.

2) Erased the computer completely, and installed a fresh new Windows Server 2019 with all patches.

3) Installed the latest VMware Workstation 15.5.2.

4) Created a new fresh VM from scratch of Windows Server 2019

5) NMI during the install process.

6) The Windows system log indicates hal.dll was the failure point. Sometimes it is another system file, but it is almost always hal.dll.

The weird thing is, if I turn on VMware logging it will NOT cause an NMI.  Figure that one out and I think you will solve the problem.

Going to try ESXi 7 next, but it doesn't support my onboard RAID controller, so I will be losing some capability.

dariusd
VMware Employee

How did you go with ESXi 7?

What model of system/motherboard is it?  Have you checked that the system/motherboard actually supports the CPUs you have installed?  (Proper compatibility is a lot more involved than whether the CPU sockets are compatible with the CPUs...)

If your host has a service processor (Dell DRAC, HP iLO, etc...), the service processor's event log might include more detail about the cause of the NMI.

Thanks,

--

Darius

MScarpulla
Contributor

Bottom line: bad or incompatible processors (probably bad, in the PCI Express lane area).  These processors (E5-2660 V3) pass every test I can find, and they operate normally as long as VMware Workstation 15.5 is not running any VMs.  Run any VM and boom: NMI.  My old processors from the same family (E5-2620 V3), which just have fewer cores, will run as many VMs as I can load into memory with no issues ever.

MScarpulla
Contributor

Oh, and I did try ESXi 7.0.  I installed it from USB onto one of the many hard drives I have lying around, not over the original OS.  This didn't blow up the machine, but 1) I didn't run it long enough to be certain, 2) it does not recognize my RAID controller, and 3) it does not recognize my TP-Link network cards, so even if it worked, it would be a sub-optimal solution without buying additional VMware-compatible hardware.
