evoicefire
Contributor

ESXi crashes to a blank screen: "PCPU didn't have a heartbeat", NMI IPI RIPOFF

Hi,

I'm running into an unusual issue. Every two days or so my system hangs (crashes?): no PSOD or anything, just a blank screen until I force a reboot.

This is an ESXi whitebox with a Ryzen 2600 CPU, an ASRock Rack X470D4U board, and a 16 GB Kingston DIMM (on the motherboard's QVL).

I am running ESXi-7.0.0-16324942-standard, which is the latest build that I can find.

memtest comes back clean, and I have made sure that RDRAND is returning sensible values. I'm at a bit of a loss. Below is the vmkernel.log output from when the issue occurs. Any hints would be great.

2020-10-05T23:45:46.161Z cpu6:264546)WARNING: Heartbeat: 767: PCPU 4 didn't have a heartbeat for 7 seconds; *may* be locked up.

2020-10-05T23:45:46.161Z cpu4:262285)ALERT: NMI: 694: NMI IPI: RIPOFF(base):RBP:CS [0x368bd2(0x420035c00000):0x451a0469b9b0:0xf48] (Src 0x1, CPU4)

2020-10-05T23:45:46.161Z cpu4:262285)0x451a0469b918:[0x420035f68bd1]CpuSched_PcpuLoadGet@vmkernel#nover+0x26 stack: 0x43007f0025a8

2020-10-05T23:45:46.161Z cpu4:262285)0x451a0469b920:[0x420035f69844]CpuSchedMigrateGoodness@vmkernel#nover+0x5e1 stack: 0xff

2020-10-05T23:45:46.161Z cpu4:262285)0x451a0469b9c0:[0x420035f6b30c]CpuSched_VcpuMigrateBestPcpu@vmkernel#nover+0x4f5 stack: 0x27a043797c562

2020-10-05T23:45:46.161Z cpu4:262285)0x451a0469bcf0:[0x420035f6b7bd]CpuSched_VcpuWakeupMigrateUnified@vmkernel#nover+0x5e stack: 0x27a043797c562

2020-10-05T23:45:46.161Z cpu4:262285)0x451a0469bd20:[0x420035f55b0e]CpuSchedVcpuMakeReady@vmkernel#nover+0xdf stack: 0x451a151a1900

2020-10-05T23:45:46.161Z cpu4:262285)0x451a0469bd40:[0x420035f55bbc]CpuSchedWorldWakeup@vmkernel#nover+0x8d stack: 0x27a043797c562

2020-10-05T23:45:46.161Z cpu4:262285)0x451a0469bd70:[0x420035f55e31]CpuSchedForceWakeupInt@vmkernel#nover+0x82 stack: 0x3c

2020-10-05T23:45:46.161Z cpu4:262285)0x451a0469bd90:[0x420035d0b750]Timer_BHHandler@vmkernel#nover+0x1f9 stack: 0x4519c0200560

2020-10-05T23:45:46.161Z cpu4:262285)0x451a0469be20:[0x420035cbb531]BH_Check@vmkernel#nover+0x6e stack: 0x0

2020-10-05T23:45:46.161Z cpu4:262285)0x451a0469bea0:[0x420035f58079]CpuSched_SafePreemptionPoint@vmkernel#nover+0x16 stack: 0x1f

2020-10-05T23:52:10.098Z cpu1:262848)DVFilter: 6344: Checking disconnected filters for timeouts

2020-10-06T00:02:09.154Z cpu7:262848)DVFilter: 6344: Checking disconnected filters for timeouts

2020-10-06T00:10:46.301Z cpu2:262793)WARNING: Heartbeat: 767: PCPU 1 didn't have a heartbeat for 7 seconds; *may* be locked up.

2020-10-06T00:10:46.301Z cpu1:262285)ALERT: NMI: 694: NMI IPI: RIPOFF(base):RBP:CS [0x33d79(0x420035c00000):0x451a0469bdf8:0xf48] (Src 0x1, CPU1)

2020-10-06T00:10:46.301Z cpu1:262285)0x451a0469bdd8:[0x420035c33d78]NRandomHwrngRdrand@vmkernel#nover+0x9 stack: 0x0

2020-10-06T00:10:46.301Z cpu1:262285)0x451a0469bde0:[0x420035c21366]extract_buf@vmkernel#nover+0x33 stack: 0x8

2020-10-06T00:10:46.301Z cpu1:262285)0x451a0469bea0:[0x420035c21b77]extract_entropy_user@vmkernel#nover+0x5c stack: 0x451a0469bef0

2020-10-06T00:10:46.301Z cpu1:262285)0x451a0469bf00:[0x420035dab142]VmMemCow_PShareUpdateCache@vmkernel#nover+0xab stack: 0x100000

2020-10-06T00:10:46.301Z cpu1:262285)0x451a0469bf70:[0x420035f7bcd0]MemSchedEst_PShareLoop@vmkernel#nover+0x161 stack: 0x0

2020-10-06T00:10:46.301Z cpu1:262285)0x451a0469bfe0:[0x420035f5e2f9]CpuSched_StartWorld@vmkernel#nover+0x82 stack: 0x0

2020-10-06T00:10:46.301Z cpu1:262285)0x451a0469c000:[0x420035cc44c3]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0

2020-10-06T00:11:16.465Z cpu1:264535)WARNING: Heartbeat: 767: PCPU 5 didn't have a heartbeat for 7 seconds; *may* be locked up.

2020-10-06T00:11:16.465Z cpu5:262285)ALERT: NMI: 694: NMI IPI: RIPOFF(base):RBP:CS [0xab974(0x420035c00000):0x451a0469bdf8:0xf48] (Src 0x1, CPU5)

2020-10-06T00:11:16.465Z cpu5:262285)0x451a0469bd40:[0x420035cab973]SHA1Transform@vmkernel#nover+0x16c stack: 0xb580e14d071beb39

2020-10-06T00:11:16.465Z cpu5:262285)0x451a0469bde0:[0x420035c213b6]extract_buf@vmkernel#nover+0x83 stack: 0x8

2020-10-06T00:11:16.465Z cpu5:262285)0x451a0469bea0:[0x420035c21b77]extract_entropy_user@vmkernel#nover+0x5c stack: 0x1f

2020-10-06T00:11:16.465Z cpu5:262285)0x451a0469bf00:[0x420035dab142]VmMemCow_PShareUpdateCache@vmkernel#nover+0xab stack: 0x200000

2020-10-06T00:11:16.465Z cpu5:262285)0x451a0469bf70:[0x420035f7bcd0]MemSchedEst_PShareLoop@vmkernel#nover+0x161 stack: 0x0

2020-10-06T00:11:16.465Z cpu5:262285)0x451a0469bfe0:[0x420035f5e2f9]CpuSched_StartWorld@vmkernel#nover+0x82 stack: 0x0

2020-10-06T00:11:16.465Z cpu5:262285)0x451a0469c000:[0x420035cc44c3]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
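
To see at a glance which PCPUs stalled and where, the heartbeat warnings and the first frame of each NMI backtrace can be pulled out of vmkernel.log. The following is only a minimal sketch, not a VMware tool: the function name summarize_lockups is made up for illustration, it assumes the log format shown in the excerpt above, and it assumes a shell with awk is available wherever the log is analysed.

```shell
#!/bin/sh
# Print one line per heartbeat warning: timestamp, stalled PCPU, and the
# first backtrace frame that follows the matching NMI alert.
# summarize_lockups is a hypothetical helper name, not an ESXi command.
summarize_lockups() {
    awk '
        # e.g. "... WARNING: Heartbeat: 767: PCPU 4 didn.t have a heartbeat ..."
        /Heartbeat: .*PCPU [0-9]+ didn.t have a heartbeat/ {
            ts = $1
            for (i = 1; i < NF; i++) if ($i == "PCPU") pcpu = $(i + 1)
            want_frame = 0
            next
        }
        # The NMI alert marks the start of the backtrace for that PCPU.
        /ALERT: NMI:/ { want_frame = 1; next }
        # First frame looks like: 0x...:[0x...]Function@vmkernel#nover+0x26
        want_frame && /@vmkernel/ {
            frame = $0
            sub(/^.*\]/, "", frame)   # drop everything up to the closing bracket
            sub(/@.*$/, "", frame)    # drop the module name and offset
            print ts, "PCPU", pcpu, "stalled in", frame
            want_frame = 0
        }
    ' "$1"
}

# Usage (the path is an assumption; adjust it to where the log lives):
#   summarize_lockups /var/log/vmkernel.log
```

Run against the excerpt above, this would list PCPU 4 in CpuSched_PcpuLoadGet, PCPU 1 in NRandomHwrngRdrand and PCPU 5 in SHA1Transform. The last two are both reached through extract_entropy_user, which is the kind of pattern this summary is meant to make visible.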

evoicefire
Contributor

bump

deganl
Contributor

I have exactly the same PSOD here.
Did you find a resolution?

It always happens when I try to delete one special VM, so I think this could be a storage firmware problem?

VMware ESXi, 7.0.1, 17168206

continuum
Immortal

Check your "one special VM": if running hexdump -C <file> against any of its files fails with "bad file descriptor", then that is your problem.
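
That check can be run over every file in the VM's directory in one pass. This is a rough sketch under assumptions: check_vm_files is a hypothetical helper name, the directory in the usage line is a placeholder, and it assumes the local hexdump accepts -n to limit how many bytes are read. If -n is not supported, every file would be reported with a usage error; in that case drop -n and interrupt each dump by hand after a few seconds.

```shell
#!/bin/sh
# Try to read the first 512 bytes of every regular file in a directory
# and report the files that hexdump cannot read.
# check_vm_files is a hypothetical helper name, not an ESXi command.
check_vm_files() {
    dir=$1
    bad=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Keep only stderr; the dump itself is discarded.
        # -n 512 (assumed to be supported) stops after the first 512 bytes.
        if ! err=$(hexdump -C -n 512 "$f" 2>&1 >/dev/null) || [ -n "$err" ]; then
            echo "UNREADABLE: $f: $err"
            bad=1
        fi
    done
    # Exit status 0 means every file could be read.
    return "$bad"
}

# Usage (placeholder path; substitute the VM's real directory):
#   check_vm_files /vmfs/volumes/<datastore>/<vm-directory>
```

A file reported here with "bad file descriptor" is the candidate for the problem described above. Files that read cleanly are only known to be readable at their start, not healthy throughout.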

 

Ulli

Do you need support with a recovery problem ? - send a message via skype "sanbarrow"
deganl
Contributor

I identified the flat.vmdk as the faulty file.

hexdump ran for a few hours and now gives me the following error:
hexdump: Srv-Orbis2-flat.vmdk: Invalid argument.

continuum
Immortal

There is no point in running hexdump -C against the flat.vmdk for more than a few seconds.

With the VM powered off, now run:

vmkfstools -p 0 name-flat.vmdk > /tmp/mapping.txt

If that runs without any error messages and populates the mapping.txt file, your vmdk should be in good enough shape to clone to a different datastore; you can then rebuild the current datastore from scratch.
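
The two conditions above (no error messages, and a populated mapping.txt) can be checked mechanically. This is a sketch under assumptions: dump_mapping is a hypothetical wrapper name, the file names in the usage line are placeholders, and vmkfstools itself is only available on the ESXi host.

```shell
#!/bin/sh
# Run "vmkfstools -p 0" on a flat vmdk and report whether the mapping
# dump finished cleanly and produced any output.
# dump_mapping is a hypothetical wrapper name, not an ESXi command.
dump_mapping() {
    vmdk=$1
    out=$2
    # Mapping goes to $out, any error messages go to $out.err.
    if ! vmkfstools -p 0 "$vmdk" > "$out" 2> "$out.err"; then
        echo "vmkfstools failed, see $out.err"
        return 1
    fi
    if [ -s "$out.err" ]; then
        echo "vmkfstools printed error messages, see $out.err"
        return 1
    fi
    if [ ! -s "$out" ]; then
        echo "mapping file $out is empty"
        return 1
    fi
    echo "mapping written to $out"
}

# Usage (placeholder names, as in the command above):
#   dump_mapping name-flat.vmdk /tmp/mapping.txt
```

A non-zero exit status from this wrapper corresponds to the "does not work" case, where cloning the vmdk should not be attempted as-is.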

If that does not work, this turns into a recovery project.
In most cases I could probably do that via a remote session - call me via skype if you need assistance.

 

Ulli
