VMware Cloud Community
elozano69
Contributor

NMI IPI: Panic requested by another pcpu

Has anyone experienced this pink screen?

16 Replies
SupreetK
Commander

Going by the functions reported in the stack, it looks like the host crashed in the ahci driver module during extensive logging. Do you have any logs set to verbose or debug level? Have you made any other recent changes to your environment?

Cheers,

Supreet

elozano69
Contributor

Actually, this is a new installation, and this pink screen is happening every 2-3 days. I have not set any logs to the levels you mention.

daphnissov
Immortal

What hardware are you running ESXi on here?

elozano69
Contributor

It is a PowerEdge T640.

daphnissov
Immortal

Have you checked that the BIOS and firmware on this server are completely up to date?

hjherron
Contributor

Hello, I was wondering if you found a resolution to this issue? We are running VMware ESXi 6.7 on a PowerEdge T640, and it just started presenting this very same error.

Yves_
Contributor

Not sure if this is the same, but at least it begins with the same issue...

I am running a newer Intel server which is having "the same" issue...

pittrider
Contributor

Dear Sir,

I found this same screen.

After 14 days of using ESXi, I got this screen.

Thank you

NisitS__75415643.jpg

paulboniotti198
Contributor

Did you find the solution for this PSOD?

bouke
Hot Shot

I experienced the exact same issue. In my case it was caused by a faulty driver/controller/device (AHCI / DVDROM). I solved it by disabling the 'vmw_ahci' and 'ahci' drivers, since I didn't use the DVDROM anyway. My SSD and HDD are on a separate RAID controller, so if that's the same in your case, run:

esxcli system module set --enabled=false --module=vmw_ahci

esxcli system module set --enabled=false --module=ahci

and reboot the server. The servers that had the same problem are stable now.
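
To check whether the modules are really disabled, something like the following may help. It is only a sketch: it assumes `esxcli system module list` prints the columns Name, Is Loaded and Is Enabled in that order, and the helper name `ahci_module_state` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `esxcli system module list` style output on stdin
# (assumed columns: Name, Is Loaded, Is Enabled) and prints one line for each
# of the two AHCI modules named in this thread.
ahci_module_state() {
    awk '$1 == "vmw_ahci" || $1 == "ahci" { print $1 " loaded=" $2 " enabled=" $3 }'
}

# On the ESXi host itself, feed the helper the live module list.
if command -v esxcli >/dev/null 2>&1; then
    esxcli system module list | ahci_module_state
fi
```

Each module that is listed should show enabled=false once the change has been applied. Running the same `esxcli system module set` commands with --enabled=true and rebooting reverts the change.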

I also wrote an article about this on my blog, feel free to read:

Solving PSOD 'Panic Requested by another PCPU' - Jume - My Virtualization Blog

Oh no, another Virtualisation signature...
ITCharleston
Contributor

Thank you for this post.

After crashing 3 times in 60 minutes, the server has been stable for 24 hours now.

scott28tt
VMware Employee

Moderator note: Moved to ESXi


-------------------------------------------------------------------------------------------------------------------------------------------------------------

Although I am a VMware employee I contribute to VMware Communities voluntarily (ie. not in any official capacity)
VMware Training & Certification blog
rabbitsnake
Contributor

I have a PSOD very similar to yours, however I am running a Cisco UCS blade. Did you find out anything about your PSOD? Cisco pointed me here, but I don't have any AHCI drivers installed.

snavmesx13.png

ElizabethFoster
Contributor

I just got this same PSOD on a UCS blade running firmware 4.04(d) and ESXi 6.7 Update 3. Did you find anything for this?

pastedImage_0.png

dariusd
VMware Employee

Hi ElizabethFoster,

Your PSOD screen shows Memory Controller Read Error messages, which suggest hardware problems. If the firmware is up to date and supported, I would next recommend running hardware diagnostics.

--

Darius

DanL2
Contributor

I know this is an old post, but it's what comes up when you Google the issue. There's a KB article that covers exactly what is going on, https://kb.vmware.com/s/article/67560, but it is exceptionally poorly written.

There are two ways to fix this: either remove/replace the CD/DVD drive, or disable AHCI, assuming AHCI isn't being used for anything else. Unfortunately this bug can't be fixed via firmware updates, so these drives are trash.

The fix is exactly what bouke posted: disable AHCI using the following commands:
esxcli system module set --enabled=false --module=vmw_ahci
esxcli system module set --enabled=false --module=ahci

If you want to confirm you have the problematic drive, you can run this PowerCLI command to find your CD/DVD drive model.
Get-VMHost | Where-Object { $_ | Get-ScsiLun -LunType cdrom } | Select-Object Name, @{N="Vendor";E={($_ | Get-ScsiLun -LunType cdrom).Vendor}}, @{N="Model";E={($_ | Get-ScsiLun -LunType cdrom).Model}}

If it says DU-8A5LH then it's a ticking time bomb waiting to PSOD at any moment (even when not using the drive).
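
If you only have the ESXi shell and no PowerCLI, a rough equivalent might look like the sketch below. It assumes the drive shows up in `esxcli storage core device list` with its model string in the output (it may not if the AHCI driver is already disabled), and the helper name `has_du8a5lh` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if the device listing on stdin mentions
# the DU-8A5LH model string, non-zero otherwise.
has_du8a5lh() {
    grep -q 'DU-8A5LH'
}

# On the ESXi host itself, check the live device list.
if command -v esxcli >/dev/null 2>&1; then
    if esxcli storage core device list | has_du8a5lh; then
        echo "DU-8A5LH drive present"
    else
        echo "DU-8A5LH not found in device list"
    fi
fi
```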
