VMware Cloud Community
trungtien157199
Contributor

PSOD on Server BL660c G9 with ESXi 6.5 U1

We have a problem with one of our ESXi hosts. Details:

- Blade BL660c G9

   + CPU: E5-4650v4

   + NIC:  HPE FlexFabric 10Gb 2P 536FLB

- OS: ESXi 6.5 U1

   + Install image: VMware-ESXi-6.5.0-Update1-6765664-HPE-650.U1.10.1.5.26-Oct2017.iso (release date: 3-NOV-2017)

- Purple Screen of Death (PSOD) screenshot

pastedImage_1.png

- Log dump (file attached)

2017-11-30T02:08:37.211Z cpu24:335917)@BlueScreen: PCPU 6: no heartbeat (3/3 IPIs received)

2017-11-30T02:08:37.211Z cpu24:335917)Code start: 0x41800c000000 VMK uptime: 21:19:54:29.304

2017-11-30T02:08:37.211Z cpu24:335917)Saved backtrace from: pcpu 6 Heartbeat NMI

2017-11-30T02:08:37.211Z cpu24:335917)0x4391bcf1b800:[0x41800c87aeb0]__raw_spin_failed@com.vmware.driverAPI#9.2+0x0 stack: 0x4394dd227100

2017-11-30T02:08:37.217Z cpu24:335917)base fs=0x0 gs=0x418046000000 Kgs=0x0

2017-11-30T02:08:37.106Z cpu6:69534)NMI: 663: NMI IPI: We Halt. RIPOFF(base):RBP:CS [0x87aeb0(0x41800c000000):0x1:0x4010] (Src 0x1, CPU6)

2017-11-30T02:08:09.105Z cpu6:69534)NMI: 689: NMI IPI: RIPOFF(base):RBP:CS [0x87aeb5(0x41800c000000):0x1:0x4010] (Src 0x1, CPU6)

2017-11-30T02:07:56.103Z cpu6:69534)NMI: 689: NMI IPI: RIPOFF(base):RBP:CS [0x87aeb2(0x41800c000000):0x1:0x4010] (Src 0x1, CPU6)

2017-11-15T13:46:19.236Z cpu103:310875)Attempting to install an image profile bypassing signing and acceptance level verification. This may pose a large security risk.

2017-11-30T02:08:37.219Z cpu24:335917)Backtrace for current CPU #24, worldID=335917, fp=0x0

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169b8c0:[0x41800c0ed451]PanicvPanicInt@vmkernel#nover+0x545 stack: 0x41800c0ed451, 0x0, 0x43924169b968, 0x4300c80eec98, 0x100000000

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169b960:[0x41800c0ed536]Panic_WithBacktrace@vmkernel#nover+0x56 stack: 0x43924169b9d0, 0x43924169b980, 0x43924169ba0c, 0x41800c0ea46d, 0x6

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169b9d0:[0x41800c2b3736]Heartbeat_DetectCPULockups@vmkernel#nover+0x4be stack: 0x418000000018, 0x600, 0xbf68, 0x30c2fec08, 0x216

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169ba50:[0x41800c0fd40c]Timer_BHHandler@vmkernel#nover+0xdc stack: 0xeb972a7400000, 0x439109c30000, 0x0, 0x41800c0fdfa4, 0xef

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169bad0:[0x41800c0b176b]BH_DrainAndDisableInterrupts@vmkernel#nover+0x7b stack: 0x43924169bbb0, 0xef000000ff, 0x0, 0x4180460004c8, 0xffffffffffffffff

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169bb60:[0x41800c0d3372]IntrCookie_VmkernelInterrupt@vmkernel#nover+0xc6 stack: 0xef, 0x20, 0x1, 0x41800c12e93d, 0x43924169bc40

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169bb90:[0x41800c12e93d]IDT_IntrHandler@vmkernel#nover+0x9d stack: 0x418046000640, 0x41800c13d044, 0x4018, 0x4018, 0x0

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169bbb0:[0x41800c13d044]gate_entry_@vmkernel#nover+0x0 stack: 0x0, 0x20, 0x0, 0x0, 0x0

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169bc70:[0x41800c08b9c2]Power_ArchSetCState@vmkernel#nover+0x106 stack: 0x7fffffffffffffff, 0x418046000000, 0x418046000080, 0x41800c2c4a13, 0x0

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169bca0:[0x41800c2c4a13]CpuSchedIdleLoopInt@vmkernel#nover+0x39b stack: 0x1000000, 0x418046000120, 0x0, 0x180000000, 0x4392416a7100

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169bd10:[0x41800c2c72ca]CpuSchedDispatch@vmkernel#nover+0x114a stack: 0x410000000001, 0x4391734a7480, 0x418046000108, 0x418046000120, 0x439140c27100

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169be40:[0x41800c2c8542]CpuSchedWait@vmkernel#nover+0x27a stack: 0x1004309a4196608, 0xeb971dbe1514f, 0x6200000001, 0x0, 0x0

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169bec0:[0x41800c2c88b0]CpuSchedTimedWaitInt@vmkernel#nover+0xa8 stack: 0x430900002001, 0x6200000001, 0x4309a27a2fc0, 0x41000005202d, 0x10cd800000000

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169bf30:[0x41800c2c8aa6]CpuSched_EventQueueTimedWait@vmkernel#nover+0x36 stack: 0x4309a27a2fc0, 0x41800c0c9fac, 0x4309a27a3000, 0x3b, 0x4301a1521050

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169bf50:[0x41800c0c9fac]helpFunc@vmkernel#nover+0x564 stack: 0x4301a1521050, 0x0, 0x0, 0x0, 0x51

2017-11-30T02:08:37.219Z cpu24:335917)0x43924169bfe0:[0x41800c2c91f5]CpuSched_StartWorld@vmkernel#nover+0x99 stack: 0x0, 0x0, 0x0, 0x0, 0x0
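
For anyone triaging a dump like this one, here is a minimal Python sketch that pulls the symbol, module, and offset out of each backtrace frame and lists the frames that are not in the vmkernel module itself. The regular expression and the names `FRAME_RE`, `parse_frames`, and `non_vmkernel_frames` are illustrative assumptions based only on the line format visible in this post; this is not a VMware tool, and it is no substitute for having VMware support analyze the full dump.

```python
import re

# Assumed frame format, taken from the lines in this post:
#   ...)0x<frame ptr>:[0x<address>]<symbol>@<module>+0x<offset> stack: ...
FRAME_RE = re.compile(
    r"\)(?P<fp>0x[0-9a-f]+):\[(?P<addr>0x[0-9a-f]+)\]"
    r"(?P<symbol>[^@\s]+)@(?P<module>[^+\s]+)\+(?P<offset>0x[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return a list of (symbol, module, offset) for every backtrace frame found."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (match.group("symbol"), match.group("module"), int(match.group("offset"), 16))
            )
    return frames


def non_vmkernel_frames(frames):
    """Keep only frames whose module name does not start with 'vmkernel'."""
    return [frame for frame in frames if not frame[1].startswith("vmkernel")]


# Two lines copied from the dump above.
SAMPLE = """\
2017-11-30T02:08:37.211Z cpu24:335917)0x4391bcf1b800:[0x41800c87aeb0]__raw_spin_failed@com.vmware.driverAPI#9.2+0x0 stack: 0x4394dd227100
2017-11-30T02:08:37.219Z cpu24:335917)0x43924169b960:[0x41800c0ed536]Panic_WithBacktrace@vmkernel#nover+0x56 stack: 0x43924169b9d0, 0x43924169b980, 0x43924169ba0c, 0x41800c0ea46d, 0x6
"""

if __name__ == "__main__":
    for symbol, module, offset in non_vmkernel_frames(parse_frames(SAMPLE)):
        print(f"{symbol} in {module} at +{offset:#x}")
```

In the lines posted here, the only frame outside vmkernel is the one saved from PCPU 6 (`__raw_spin_failed` in `com.vmware.driverAPI#9.2`), so that is the frame to point out when opening a support case.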

14 Replies
daphnissov
Immortal

What power policy are you using on this blade? What's the BIOS version?

trungtien157199
Contributor

Hi daphnissov,

Bios version is:  System ROM I38 v2.40 (02/17/2017)

Power policy: Do you mean the "power policy" under "Enclosure Firmware Management"? I have not enabled Enclosure Firmware Management, so no power policy is set there.

pastedImage_0.png

daphnissov
Immortal

Can you show a screenshot of the power policy in vCenter? It is on the host under Configure, in the Hardware section.

trungtien157199
Contributor

Yes, here is the screenshot.

pastedImage_1.png

trungtien157199
Contributor

Can anyone here help me resolve this problem?

lukaslang
Enthusiast

I know that your 10 Gb adapter is not listed in this HPE KB article, but it could be something to try:

https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00030080en_us&hprpt_id=HPGL_ALERTS_199...

SureshKumarMuth
Commander

[0x41800c87aeb0]__raw_spin_failed@com.vmware.driverAPI#9.2+0x0 stack: 0x4394dd227100

To me this looks like a disk- or controller-related issue. Because this is the latest version of ESXi, it may be a new issue. Since you have the dump, please open a ticket with VMware so they can analyze it and find the root cause.

Also check that the firmware and drivers for the local SCSI controller, BIOS, HBA, and NIC are up to date and compatible with ESXi 6.5 U1.

Regards,
Suresh
https://vconnectit.wordpress.com/
daphnissov
Immortal

I'd recommend you open a case with VMware at this point.

Finikiez
Champion

Did this happen once or multiple times?

I would recommend checking the IML log for this blade for any hardware errors and running hardware tests.

trungtien157199
Contributor

Hi,

"Did this happen once or multiple times?"

This PSOD has only appeared once.

"I would recommend checking the IML log for this blade for any hardware errors and running hardware tests."

Yes, it may be a hardware problem, but I cannot tell whether the PSOD was caused by the hardware or by the OS version.

Finikiez
Champion

It might also be an issue with drivers such as hp-ilo or agents such as hp-ams.

I also recommend checking whether these versions are up to date.
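
One way to do that check is to capture the output of `esxcli software vib list` on the host and filter it for those names. The sketch below is only an illustration under stated assumptions: it assumes the name is the first column and the version the second, the helper name `matching_vibs` is made up, and the sample rows (including the `X.Y.Z` versions and the date) are placeholders, not values from this host. The VIBs in a given HPE image may also be named differently from hp-ilo and hp-ams, so adjust the keywords as needed.

```python
def matching_vibs(vib_list_text, keywords=("hp-ilo", "hp-ams")):
    """Return (name, version) pairs whose VIB name contains one of the keywords.

    Assumes whitespace-separated columns with the VIB name first and the
    version second, as in the output of `esxcli software vib list`.
    """
    matches = []
    for line in vib_list_text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(parts) < 2 or parts[0] == "Name" or set(parts[0]) == {"-"}:
            continue
        name, version = parts[0], parts[1]
        if any(keyword in name.lower() for keyword in keywords):
            matches.append((name, version))
    return matches


# Placeholder sample only: these rows are invented to show the column layout.
SAMPLE_VIB_LIST = """\
Name     Version  Vendor  Acceptance Level  Install Date
-------  -------  ------  ----------------  ------------
hp-ams   X.Y.Z-1  HPE     PartnerSupported  YYYY-MM-DD
hp-ilo   X.Y.Z-2  HPE     PartnerSupported  YYYY-MM-DD
net-foo  X.Y.Z-3  VMW     VMwareCertified   YYYY-MM-DD
"""

if __name__ == "__main__":
    for name, version in matching_vibs(SAMPLE_VIB_LIST):
        print(f"{name}: {version}")
```

The versions it reports can then be compared by hand against the versions HPE lists for the ESXi build in use.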

trungtien157199
Contributor

"drivers like hp-ilo or agents like hp-ams"

Could you show me where in the log file there is a problem with hp-ilo or hp-ams?

Finikiez
Champion

For that I would need the full vmkernel.log from the host :)

dariushibmca
Contributor

Hi,

With HP servers it is always a good idea to start by following the VMware HP recipe.

http://vibsdepot.hpe.com/hpq/recipes/Apr2017VMwareRecipeSPP201704_22.0.pdf
