VMware Cloud Community
PBTBgray
Contributor

ESXi 6.5 U1 VFRC PSOD?

I just upgraded one of our ESXi hosts from 6.0 P5 to 6.5 U1 and ran about 15 VMs on it for about a week, and everything was fine. I then migrated 3 more hosts with VFRC enabled, and about 8 hours later I got the PSOD below. Has anyone ever come across anything like this? I've got a service request open with support, but so far they have not been able to give me a concrete answer.

ESXI Crash.png

vijayrana968
Virtuoso

What is the make and model of this server, and what is its CPU configuration?

Fred_vBrain
Enthusiast

Can you give me the SR number?

Fred | vBrain.info | vExpert 2014-2022
PBTBgray
Contributor

SR 17549907708

Thanks Manfred

PBTBgray
Contributor

This came in from support today:

#0  Panic_WithBacktrace (sbt=sbt@entry=0x4300c8081e28, fmt=fmt@entry=0x41802ca63538 "PCPU %d: no heartbeat (%u/%u IPIs received)") at bora/vmkernel/main/panic.c:135
135        Panic_SaveRegs();
(gdb) bt
#0  Panic_WithBacktrace (sbt=sbt@entry=0x4300c8081e28, fmt=fmt@entry=0x41802ca63538 "PCPU %d: no heartbeat (%u/%u IPIs received)") at bora/vmkernel/main/panic.c:135
#1  0x000041802c8b3736 in HeartbeatHandleLockup (lockedUpInMS=49000, i=51) at bora/vmkernel/reliability/heartbeat.c:818
#2  HeartbeatCheckPCPU (timestampInMS=<optimized out>, i=51) at bora/vmkernel/reliability/heartbeat.c:716
#3  Heartbeat_DetectCPULockups (data=<optimized out>, timestamp=<optimized out>) at bora/vmkernel/reliability/heartbeat.c:517
#4  0x000041802c6fd40c in TimerBHHandlerLoop (list=0x43910afb6f50, curTC=994112000013552, t=0x43910afb6000) at bora/vmkernel/main/timer.c:2618
#5  Timer_BHHandler (unused=unused@entry=0x0) at bora/vmkernel/main/timer.c:2727
#6  0x000041802c6b176b in BHCheckBegin (canReschedule=1 '\001') at bora/vmkernel/main/bh.c:996
#7  BH_DrainAndDisableInterrupts (canReschedule=1 '\001') at bora/vmkernel/main/bh.c:1094
#8  0x000041802c6d3372 in IntrCookie_VmkernelInterrupt (vector=239, vectorData=vectorData@entry=0, fullFrame=fullFrame@entry=0x439153e9bc50) at bora/vmkernel/main/intrCookie.c:3958
#9  0x000041802c72e93d in IDTHandleInterrupt (fullFrame=0x439153e9bc50) at bora/vmkernel/main/idt.c:1288
#10 IDT_IntrHandler (fullFrame=0x439153e9bc50) at bora/vmkernel/main/idt.c:1311
#11 0x000041802c73d044 in gate_entry ()
#12 0x000041802c68b9c2 in CPU_StiMwaitInstr (hints=0, extensions=0) at bora/vmkernel/hardware/x86/cpu_int_arch.h:136
#13 Power_ArchSetCState (state=<optimized out>, c1type=<optimized out>) at bora/vmkernel/hardware/x86/power_arch.c:379
#14 0x000041802c67796c in Power_HaltPCPU (now=<optimized out>, c1type=<optimized out>) at bora/vmkernel/hardware/power.c:961
#15 0x000041802c8c49d3 in CpuSchedIdleHaltStart () at bora/vmkernel/sched/cpusched.c:12546
#16 CpuSchedIdleLoopInt () at bora/vmkernel/sched/cpusched.c:12746
#17 0x000041802c8c728a in CpuSchedBusyWait (mySchedPcpu=<optimized out>) at bora/vmkernel/sched/cpusched.c:12835
#18 CpuSchedTryBusyWait (prevIRQL=0 '\000', idleVcpu=0x439140da7100, nowStart=994111992008397, schedPcpu=0x418046c00080) at bora/vmkernel/sched/cpusched.c:7751
#19 CpuSchedChooseAndSwitch (prevIRQL=0 '\000', nhccNow=16392214628518, now=<optimized out>, schedPcpu=0x418046c00080, prev=0x439140da7100) at bora/vmkernel/sched/cpusched.c:7936
#20 CpuSchedDispatch (prevIRQL=prevIRQL@entry=0 '\000', prevState=prevState@entry=2147483648) at bora/vmkernel/sched/cpusched.c:8097
#21 0x000041802c8c8502 in CpuSchedWait (event=..., waitType=CPUSCHED_WAIT_NET, actionWakeupSet=0x0, queue=0x0, cookie=<optimized out>) at bora/vmkernel/sched/cpusched.c:9694
#22 0x000041802c8c85d5 in CpuSched_NoEvqWait (waitType=waitType@entry=CPUSCHED_WAIT_NET) at bora/vmkernel/sched/cpusched.c:9764
#23 0x000041802c8324a2 in NetPollWorldCallback (data=0x4300d10fd980) at bora/vmkernel/net/vmkapi_net_poll.c:605
#24 0x000041802c8c91b5 in CpuSched_StartWorld (destWorld=<optimized out>, previous=<optimized out>) at bora/vmkernel/sched/cpusched.c:10780
#25 0x0000000000000000 in ?? () from /build/storage61/release/bora-5969303/build/linux64/bora/build/esx/release/vmkmod-vmkernel64/chardevs
(gdb)
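
As an aside for anyone triaging a similar dump: a trace like the one above is easier to scan once it is reduced to function, source file, and line. The following is a minimal Python sketch under stated assumptions, not a VMware tool. The names `Frame`, `parse_backtrace`, `subsystem_counts`, and `SAMPLE` are hypothetical, and the regular expression assumes the gdb `bt` line format seen in this thread. Frames without an `at file:line` suffix (such as `gate_entry ()`) are skipped.

```python
import re
from collections import Counter
from typing import List, NamedTuple


class Frame(NamedTuple):
    num: int    # frame number from the "#N" prefix
    func: str   # function name
    path: str   # source path as printed by gdb
    line: int   # source line number


# Assumed frame format: "#N  [0xADDR in ]Func (args...) at path:line"
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+"
    r"(?:0x[0-9a-fA-F]+\s+in\s+)?"
    r"(?P<func>[A-Za-z_]\w*)\s*\("
    r".*?\)\s+at\s+(?P<path>\S+):(?P<line>\d+)\s*$"
)


def parse_backtrace(text: str) -> List[Frame]:
    """Return the frames that carry source information, in trace order."""
    frames = []
    for raw in text.splitlines():
        match = FRAME_RE.match(raw.strip())
        if match:
            frames.append(
                Frame(
                    int(match.group("num")),
                    match.group("func"),
                    match.group("path"),
                    int(match.group("line")),
                )
            )
    return frames


def subsystem_counts(frames: List[Frame]) -> Counter:
    """Count frames per directory directly under 'vmkernel' in the path."""
    counts = Counter()
    for frame in frames:
        parts = frame.path.split("/")
        if "vmkernel" in parts[:-1]:
            idx = parts.index("vmkernel")
            # Use the component after 'vmkernel' unless it is the file itself.
            if idx + 1 < len(parts) - 1:
                counts[parts[idx + 1]] += 1
                continue
        # Fallback: the file's parent directory, if there is one.
        counts[parts[-2] if len(parts) > 1 else ""] += 1
    return counts


# A few frames copied from the trace in this thread.
SAMPLE = """\
#0  Panic_WithBacktrace (sbt=sbt@entry=0x4300c8081e28, fmt=fmt@entry=0x41802ca63538 "PCPU %d: no heartbeat (%u/%u IPIs received)") at bora/vmkernel/main/panic.c:135
#1  0x000041802c8b3736 in HeartbeatHandleLockup (lockedUpInMS=49000, i=51) at bora/vmkernel/reliability/heartbeat.c:818
#11 0x000041802c73d044 in gate_entry ()
#23 0x000041802c8324a2 in NetPollWorldCallback (data=0x4300d10fd980) at bora/vmkernel/net/vmkapi_net_poll.c:605
"""

if __name__ == "__main__":
    for frame in parse_backtrace(SAMPLE):
        print(frame.num, frame.func, f"{frame.path}:{frame.line}")
```

Pasting the full trace into `parse_backtrace` and passing the result to `subsystem_counts` gives a rough view of which vmkernel areas appear on the stack. It only summarizes the trace and says nothing about root cause.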

Analysis:

These events resemble an ongoing PR with the engineering team, where the issue appears to be an lsi_mr3 low-memory allocation failure. I do see that the server is installed with lsi_mr3 driver version 6.910.18.00-1vmw. If this PSOD keeps happening at frequent intervals, we would request that you update to the latest lsi_mr3 driver, which is available from the URL below, along with its corresponding compatible firmware version (25.5.2.0001):

PBTBgray
Contributor

Dell R920

4 x Intel(R) Xeon(R) CPU E7-4820 v2 @ 2.00GHz
