Hello everyone,
On my whitebox ESXi 6.0U3 host in my home lab, I am getting a PSOD. This is the second time it has happened; the last time was more than a month ago.
I looked at the vmkernel log file from the diagnostic dump, but I don't understand what could be causing this.
I hope someone here can shed some light on the situation, thanks!
2017-12-24T05:40:41.850Z cpu2:32999)VMware ESXi 6.0.0 [Releasebuild-6765062 x86_64]
PCPU 1 locked up. Failed to ack TLB invalidate (total of 2 locked up, PCPU(s): 0,1).
2017-12-24T05:40:41.850Z cpu2:32999)cr0=0x8001003d cr2=0x1c2f8740080 cr3=0xcd83f000 cr4=0x216c
Log and photo attached.
First, read and understand this KB if you haven't already. Second, understand that with whitebox servers (i.e. unsupported hardware) your results may be unpredictable with stability not guaranteed. This is one of many possible side effects.
I did read the article beforehand, but could not extract any useful information other than:
The Failed to ack TLB Invalidate is caused by either a hardware or a software issue.
I would just like to know whether someone can extract enough relevant information from the log to conclude if hardware or software is at fault.
I understand the consequences of running a whitebox ESXi host, but I have been running them for years in my homelab.
Thanks
Let's hope it was a software bug.
I updated ESXi 6.0U3 to version 6.0.0-3.79.6921384.
You could check the firmware and driver compatibility. Can you try upgrading the ESXi build and see whether the PSOD recurs?
Regards,
Randhir
Another PSOD today: #PF Exception 14.
It all seems to point toward a hardware issue; I'm going to need to do some memory and CPU testing.
2018-01-22T20:00:01.318Z cpu0:422337)World: 9762: PRDA 0x418040000000 ss 0x0 ds 0x10b es 0x10b fs 0x0 gs 0x13b
2018-01-22T20:00:01.318Z cpu0:422337)World: 9764: TR 0x4020 GDT 0x43944e0a1000 (0x402f) IDT 0x4180310ca000 (0xfff)
2018-01-22T20:00:01.318Z cpu0:422337)World: 9765: CR0 0x80010031 CR3 0x16da26000 CR4 0x42768
2018-01-22T20:00:01.322Z cpu0:422337)Backtrace for current CPU #0, worldID=422337, rbp=0x4308c7b5ac70
2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bbf0:[0x418031347e7f]PT_GetNextLevel@vmkernel#nover+0x1b stack: 0x4308c7b5ac70, 0x43944e0
2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bc20:[0x418031347f78]PT_GetL1Table@vmkernel#nover+0x24 stack: 0x0, 0x1d, 0x0, 0x3ffffffff
2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bc30:[0x418031648446]UserPT_LookupPageTable@<None>#<None>+0x4e stack: 0x0, 0x3fffffffff,
2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bc80:[0x4180315e09e1]UserMem_HandleMapFault@<None>#<None>+0x865 stack: 0x418040901e00, 0x
2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bec0:[0x4180315c6f82]User_Exception@<None>#<None>+0x126 stack: 0x0, 0x43944e09bf30, 0x439
2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bf10:[0x418031055953]Int14_PF@vmkernel#nover+0x17f stack: 0x0, 0x4180310c8067, 0x0, 0x13b
2018-01-22T20:00:01.322Z cpu0:422337)0x43944e09bf30:[0x4180310c8067]gate_entry_@vmkernel#nover+0x0 stack: 0x0, 0xa5a5c8b, 0xfff35d94, 0x
2018-01-22T20:00:01.323Z cpu0:422337)VMware ESXi 6.0.0 [Releasebuild-7504637 x86_64]
#PF Exception 14 in world 422337:hostd-probe IP 0x418031347e7f addr 0x6e2e8d
PTEs:0x6e2f04027;0x24ae88027;0x0;
2018-01-22T20:00:01.323Z cpu0:422337)cr0=0x80010031 cr2=0x6e2e8d cr3=0x16da26000 cr4=0x42768
2018-01-22T20:00:01.323Z cpu0:422337)frame=0x43944e09bb30 ip=0x418031347e7f err=2 rflags=0x10297
2018-01-22T20:00:01.323Z cpu0:422337)rax=0x6e2f04 rbx=0xa5a5 rcx=0xffff81016da26001
2018-01-22T20:00:01.323Z cpu0:422337)rdx=0xa5a5 rbp=0x4308c7b5ac70 rsi=0x6e2f04
2018-01-22T20:00:01.323Z cpu0:422337)rdi=0x3 r8=0x43006200e180 r9=0xffff8101c8ea2
2018-01-22T20:00:01.323Z cpu0:422337)r10=0xffff8101c8ea2d28 r11=0x0 r12=0x43944e09be58
2018-01-22T20:00:01.323Z cpu0:422337)r13=0x3fffffffff r14=0x0 r15=0x4308c7b5ac70
2018-01-22T20:00:01.323Z cpu0:422337)pcpu:0 world:422337 name:"hostd-probe" (U)
2018-01-22T20:00:01.323Z cpu0:422337)pcpu:1 world:35576 name:"vmm0:BackupSvr" (V)
2018-01-22T20:00:01.323Z cpu0:422337)pcpu:2 world:35599 name:"vmm0:ARES" (V)
2018-01-22T20:00:01.323Z cpu0:422337)pcpu:3 world:35580 name:"vmm3:BackupSvr" (V)
2018-01-22T20:00:01.323Z cpu0:422337)pcpu:4 world:422336 name:"python" (U)
2018-01-22T20:00:01.323Z cpu0:422337)pcpu:5 world:35579 name:"vmm2:BackupSvr" (V)
2018-01-22T20:00:01.323Z cpu0:422337)pcpu:6 world:35578 name:"vmm1:BackupSvr" (V)
2018-01-22T20:00:01.323Z cpu0:422337)pcpu:7 world:35647 name:"vmm1:vCenterApp" (V)
2018-01-22T20:00:01.323Z cpu0:422337)@BlueScreen: #PF Exception 14 in world 422337:hostd-probe IP 0x418031347e7f addr 0x6e2e8d
PTEs:0x6e2f04027;0x24ae88027;0x0;
2018-01-22T20:00:01.324Z cpu0:422337)Code start: 0x418031000000 VMK uptime: 7:08:07:11.719
2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bbf0:[0x418031347e7f]PT_GetNextLevel@vmkernel#nover+0x1b stack: 0x4308c7b5ac70
2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bc20:[0x418031347f78]PT_GetL1Table@vmkernel#nover+0x24 stack: 0x0
2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bc30:[0x418031648446]UserPT_LookupPageTable@<None>#<None>+0x4e stack: 0x0
2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bc80:[0x4180315e09e1]UserMem_HandleMapFault@<None>#<None>+0x865 stack: 0x418040901e00
2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bec0:[0x4180315c6f82]User_Exception@<None>#<None>+0x126 stack: 0x0
2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bf10:[0x418031055953]Int14_PF@vmkernel#nover+0x17f stack: 0x0
2018-01-22T20:00:01.324Z cpu0:422337)0x43944e09bf30:[0x4180310c8067]gate_entry_@vmkernel#nover+0x0 stack: 0x0
2018-01-22T20:00:01.326Z cpu0:422337)base fs=0x0 gs=0x418040000000 Kgs=0x0
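For anyone trying to read a dump like the one above, here is a rough Python sketch that pulls the key fields out of the `#PF Exception 14` summary line and decodes the `err=` value as an x86 page-fault error code. The bit meanings are the architectural ones (present, write, user, reserved, instruction fetch); the function names `decode_pf_error` and `summarize_psod` are made up for this example, and I am assuming `err=` is printed as a small decimal integer, which the log does not confirm. It will not tell you whether hardware or software is at fault, it only makes the fault details easier to read.

```python
import re

# Architectural x86 page-fault error-code bits.
# Tuple layout: (mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set in a paging entry", None),
    (0x10, "instruction fetch", None),
]

# Matches the summary line format seen in the pasted log.
SUMMARY_RE = re.compile(
    r"#PF Exception 14 in world (?P<world>\d+):(?P<name>\S+) "
    r"IP (?P<ip>0x[0-9a-fA-F]+) addr (?P<addr>0x[0-9a-fA-F]+)"
)
# Assumption: err= is a decimal integer (single digits read the same in hex).
ERR_RE = re.compile(r"\berr=(?P<err>\d+)")


def decode_pf_error(err):
    """Return human-readable descriptions for a page-fault error code."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if err & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


def summarize_psod(lines):
    """Collect world, IP, faulting address and decoded error code."""
    info = {}
    for line in lines:
        m = SUMMARY_RE.search(line)
        if m and "world" not in info:  # keep the first summary line only
            info.update(m.groupdict())
        m = ERR_RE.search(line)
        if m and "err" not in info:
            info["err"] = int(m.group("err"))
            info["access"] = decode_pf_error(info["err"])
    return info


if __name__ == "__main__":
    sample = [
        "#PF Exception 14 in world 422337:hostd-probe IP 0x418031347e7f addr 0x6e2e8d",
        "2018-01-22T20:00:01.323Z cpu0:422337)frame=0x43944e09bb30 "
        "ip=0x418031347e7f err=2 rflags=0x10297",
    ]
    for key, value in summarize_psod(sample).items():
        print(key, "=", value)
```

Under that assumption, `err=2` in this dump would read as a write to a not-present page while running in supervisor mode, at the address reported in `addr` (which matches `cr2`).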