VMware Cloud Community
ITStanG
Contributor

PSOD ESXi 6 - PCPU 1 locked up. Failed to ack TLB invalidate

Hello everyone,

On my whitebox ESXi 6.0U3 host in my home lab, I'm having a problem with a PSOD. This is the second time it has happened; the last time was more than a month ago.

I looked at the vmkernel-log file from the diagnostic dump, but I don't understand what could be causing this.
I hope someone here can shed some light on the situation, thanks!

2017-12-24T05:40:41.850Z cpu2:32999)VMware ESXi 6.0.0 [Releasebuild-6765062 x86_64]

PCPU 1 locked up. Failed to ack TLB invalidate (total of 2 locked up, PCPU(s): 0,1). 

2017-12-24T05:40:41.850Z cpu2:32999)cr0=0x8001003d cr2=0x1c2f8740080 cr3=0xcd83f000 cr4=0x216c
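
In case it helps anyone comparing dumps, here is a minimal Python sketch that pulls the PCPU numbers out of the lockup message. The pattern is written against the single message quoted above, so the regex and the `parse_lockup` helper are assumptions for illustration, not a documented log format.

```python
import re

# Pattern inferred from the one message quoted in this thread (assumption);
# other ESXi builds may word the message differently.
LOCKUP_RE = re.compile(
    r"PCPU (\d+) locked up\. Failed to ack TLB invalidate "
    r"\(total of (\d+) locked up, PCPU\(s\): ([\d,]+)\)"
)

def parse_lockup(line):
    """Return (first_pcpu, total_locked, [pcpus]) or None if the line does not match."""
    m = LOCKUP_RE.search(line)
    if m is None:
        return None
    pcpus = [int(p) for p in m.group(3).split(",")]
    return int(m.group(1)), int(m.group(2)), pcpus

msg = ("PCPU 1 locked up. Failed to ack TLB invalidate "
       "(total of 2 locked up, PCPU(s): 0,1).")
print(parse_lockup(msg))  # (1, 2, [0, 1])
```

Collecting these tuples across several PSODs would show whether the same PCPUs are involved each time, which is one data point when weighing hardware against software.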

Log and photo attached.

5 Replies
daphnissov
Immortal

First, read and understand this KB if you haven't already. Second, understand that with whitebox servers (i.e. unsupported hardware) your results may be unpredictable with stability not guaranteed. This is one of many possible side effects.

ITStanG
Contributor

I did read the article beforehand, but could not extract any useful information other than:

The Failed to ack TLB Invalidate is caused by either a hardware or a software issue.

I would just like to know whether someone can extract relevant information from the log to determine if hardware or software is at fault.

I understand the consequences of a whitebox ESXi, but I have been running them for years in my homelab.

Thanks

ITStanG
Contributor

Let's hope it was a software bug.

I updated ESXi 6.0U3 to version 6.0.0-3.79.6921384.

admin
Immortal

You could check the firmware and driver compatibility. Can you try upgrading the ESXi build and see whether the PSOD recurs?

Regards,

Randhir

ITStanG
Contributor

Another PSOD today. #PF Exception 14

It all seems to point toward a hardware issue, so I'm going to need to do some memory and CPU testing.

2018-01-22T20:00:01.318Z cpu0:422337)World: 9762: PRDA 0x418040000000 ss 0x0 ds 0x10b es 0x10b fs 0x0 gs 0x13b

2018-01-22T20:00:01.318Z cpu0:422337)World: 9764: TR 0x4020 GDT 0x43944e0a1000 (0x402f) IDT 0x4180310ca000 (0xfff)

2018-01-22T20:00:01.318Z cpu0:422337)World: 9765: CR0 0x80010031 CR3 0x16da26000 CR4 0x42768

2018-01-22T20:00:01.322Z cpu0:422337)Backtrace for current CPU #0, worldID=422337, rbp=0x4308c7b5ac70

2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bbf0:[0x418031347e7f]PT_GetNextLevel@vmkernel#nover+0x1b stack: 0x4308c7b5ac70, 0x43944e0

2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bc20:[0x418031347f78]PT_GetL1Table@vmkernel#nover+0x24 stack: 0x0, 0x1d, 0x0, 0x3ffffffff

2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bc30:[0x418031648446]UserPT_LookupPageTable@<None>#<None>+0x4e stack: 0x0, 0x3fffffffff,

2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bc80:[0x4180315e09e1]UserMem_HandleMapFault@<None>#<None>+0x865 stack: 0x418040901e00, 0x

2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bec0:[0x4180315c6f82]User_Exception@<None>#<None>+0x126 stack: 0x0, 0x43944e09bf30, 0x439

2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bf10:[0x418031055953]Int14_PF@vmkernel#nover+0x17f stack: 0x0, 0x4180310c8067, 0x0, 0x13b

2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bf30:[0x4180310c8067]gate_entry_@vmkernel#nover+0x0 stack: 0x0, 0xa5a5c8b, 0xfff35d94, 0x

2018-01-22T20:00:01.323Z cpu0:422337)VMware ESXi 6.0.0 [Releasebuild-7504637 x86_64]

#PF Exception 14 in world 422337:hostd-probe IP 0x418031347e7f addr 0x6e2e8d

PTEs:0x6e2f04027;0x24ae88027;0x0;

2018-01-22T20:00:01.323Z cpu0:422337)cr0=0x80010031 cr2=0x6e2e8d cr3=0x16da26000 cr4=0x42768

2018-01-22T20:00:01.323Z cpu0:422337)frame=0x43944e09bb30 ip=0x418031347e7f err=2 rflags=0x10297

2018-01-22T20:00:01.323Z cpu0:422337)rax=0x6e2f04 rbx=0xa5a5 rcx=0xffff81016da26001

2018-01-22T20:00:01.323Z cpu0:422337)rdx=0xa5a5 rbp=0x4308c7b5ac70 rsi=0x6e2f04

2018-01-22T20:00:01.323Z cpu0:422337)rdi=0x3 r8=0x43006200e180 r9=0xffff8101c8ea2

2018-01-22T20:00:01.323Z cpu0:422337)r10=0xffff8101c8ea2d28 r11=0x0 r12=0x43944e09be58

2018-01-22T20:00:01.323Z cpu0:422337)r13=0x3fffffffff r14=0x0 r15=0x4308c7b5ac70

2018-01-22T20:00:01.323Z cpu0:422337)pcpu:0 world:422337 name:"hostd-probe" (U)

2018-01-22T20:00:01.323Z cpu0:422337)pcpu:1 world:35576 name:"vmm0:BackupSvr" (V)

2018-01-22T20:00:01.323Z cpu0:422337)pcpu:2 world:35599 name:"vmm0:ARES" (V)

2018-01-22T20:00:01.323Z cpu0:422337)pcpu:3 world:35580 name:"vmm3:BackupSvr" (V)

2018-01-22T20:00:01.323Z cpu0:422337)pcpu:4 world:422336 name:"python" (U)

2018-01-22T20:00:01.323Z cpu0:422337)pcpu:5 world:35579 name:"vmm2:BackupSvr" (V)

2018-01-22T20:00:01.323Z cpu0:422337)pcpu:6 world:35578 name:"vmm1:BackupSvr" (V)

2018-01-22T20:00:01.323Z cpu0:422337)pcpu:7 world:35647 name:"vmm1:vCenterApp" (V)

2018-01-22T20:00:01.323Z cpu0:422337)@BlueScreen: #PF Exception 14 in world 422337:hostd-probe IP 0x418031347e7f addr 0x6e2e8d

PTEs:0x6e2f04027;0x24ae88027;0x0;

2018-01-22T20:00:01.324Z cpu0:422337)Code start: 0x418031000000 VMK uptime: 7:08:07:11.719

2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bbf0:[0x418031347e7f]PT_GetNextLevel@vmkernel#nover+0x1b stack: 0x4308c7b5ac70

2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bc20:[0x418031347f78]PT_GetL1Table@vmkernel#nover+0x24 stack: 0x0

2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bc30:[0x418031648446]UserPT_LookupPageTable@<None>#<None>+0x4e stack: 0x0

2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bc80:[0x4180315e09e1]UserMem_HandleMapFault@<None>#<None>+0x865 stack: 0x418040901e00

2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bec0:[0x4180315c6f82]User_Exception@<None>#<None>+0x126 stack: 0x0

2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bf10:[0x418031055953]Int14_PF@vmkernel#nover+0x17f stack: 0x0

2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bf30:[0x4180310c8067]gate_entry_@vmkernel#nover+0x0 stack: 0x0

2018-01-22T20:00:01.326Z cpu0:422337)base fs=0x0 gs=0x418040000000 Kgs=0x0
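
For anyone reading the dump above, a small Python sketch may help pick out the two pieces that are easiest to interpret: the `err=2` field and the backtrace frames. It assumes `err` is the standard x86 page-fault error code (the bit meanings come from the x86 architecture, not from VMware documentation), and the frame regex is inferred only from the lines in this post. `decode_pf_error` and `parse_frame` are hypothetical helper names.

```python
import re

# x86 page-fault error-code bits: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set in a page-table entry", None),
    (0x10, "instruction fetch", None),
]

def decode_pf_error(err):
    """Describe the architecturally defined bits of a page-fault error code."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if err & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out

# Frame layout inferred from the backtrace lines in this post (assumption):
# [address]symbol@module+offset
FRAME_RE = re.compile(r"\[(0x[0-9a-f]+)\](\w+)@(\S+?)\+(0x[0-9a-f]+)")

def parse_frame(line):
    """Return (address, symbol, module, offset) as strings, or None."""
    m = FRAME_RE.search(line)
    return m.groups() if m else None

print(decode_pf_error(2))
frame = ("0x43944e09bbf0:[0x418031347e7f]PT_GetNextLevel@vmkernel#nover+0x1b "
         "stack: 0x4308c7b5ac70")
print(parse_frame(frame))
```

Under that assumption, `err=2` reads as a write to a non-present page while in supervisor mode. That by itself does not say whether hardware or software is at fault.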
