VMware Cloud Community
meimeiriver
Enthusiast

Failed to ack TLB invalidate PSOD

I'm running ESXi 6 U2, and over the last 2 days I've had 2 PSODs (Pink Screen of Death) like the one below. I ran memtest86 for nearly 2 days and it found no errors. What could cause this, other than a rather unlikely physical CPU fault?

I also get a strange warning about "unable to remove deleted USB storage adapters." Probably not related, but you never know.

Thanks.

-------

2016-07-25T17:55:44.536Z cpu8:33031)@BlueScreen: PCPU 11 locked up. Failed to ack TLB invalidate (total of 2 locked up, PCPU(s): 10,11).

2016-07-25T17:55:44.536Z cpu8:33031)Code start: 0x418012800000 VMK uptime: 2:23:17:11.353

2016-07-25T17:55:44.537Z cpu8:33031)0x4390c839bbd0:[0x418012877afa]PanicvPanicInt@vmkernel#nover+0x37e stack: 0x4390c839bc68

2016-07-25T17:55:44.537Z cpu8:33031)0x4390c839bc60:[0x418012877dc5]Panic_NoSave@vmkernel#nover+0x4d stack: 0x4390c839bcc0

2016-07-25T17:55:44.537Z cpu8:33031)0x4390c839bcc0:[0x41801288be15]TLBGetLockedCPUBacktraces@vmkernel#nover+0x25d stack: 0x9

2016-07-25T17:55:44.537Z cpu8:33031)0x4390c839be80:[0x41801288c106]TLBDoInvalidate@vmkernel#nover+0x21a stack: 0x4390c99a7000

2016-07-25T17:55:44.537Z cpu8:33031)0x4390c839bed0:[0x418012dd0f30]UserMem_CartelFlush@<None>#<None>+0xc0 stack: 0x0

2016-07-25T17:55:44.537Z cpu8:33031)0x4390c839bf50:[0x418012e40e26]UserMemTouchedEstimationLoop@<None>#<None>+0x1d2 stack: 0x30bc4a367a

2016-07-25T17:55:44.537Z cpu8:33031)0x4390c839bfd0:[0x418012a14a3e]CpuSched_StartWorld@vmkernel#nover+0xa2 stack: 0x0

2016-07-25T17:55:44.542Z cpu8:33031)base fs=0x0 gs=0x418042000000 Kgs=0x0

2016-07-22T18:43:07.979Z cpu8:33697)Warning: unable to remove deleted USB storage adapters

2016-07-22T18:43:07.979Z cpu8:33697)Warning: unable to remove deleted USB storage adapters

2016-07-25T17:55:44.542Z cpu8:33031)vmkernel            0x0 .data 0x0 .bss 0x0

2016-07-25T17:55:44.542Z cpu8:33031)chardevs            0x418012db7000 .data 0x417fc0000000 .bss 0x417fc00003c0
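
For anyone who wants to compare several of these dumps side by side, here is a minimal sketch in Python (not a VMware tool) that pulls the locked-up PCPU numbers and the call-stack symbols out of lines like the ones above. The function names and the regular expressions are my own assumptions, based only on the format of this one log, so they may need adjusting for other builds.

```python
import re

# Headline of the PSOD: captures the comma-separated list after "PCPU(s):".
# Pattern is an assumption derived from the single log shown in this thread.
LOCKED_RE = re.compile(r"@BlueScreen: .*?PCPU\(s\): ([\d,\s]+)\)")

# One backtrace frame: [0xADDR]Symbol@module#version+0xOFFSET
FRAME_RE = re.compile(r"\[(0x[0-9a-f]+)\]([^@\s]+)@(\S+?)#\S+?\+(0x[0-9a-f]+)")


def locked_pcpus(log_text):
    """Return the PCPU numbers reported as locked up, or [] if none found."""
    match = LOCKED_RE.search(log_text)
    if not match:
        return []
    return [int(n) for n in match.group(1).split(",") if n.strip()]


def backtrace_symbols(log_text):
    """Return (symbol, module) pairs in the order they appear in the log."""
    return [(m.group(2), m.group(3)) for m in FRAME_RE.finditer(log_text)]
```

Feeding it the text of a saved dump (for example `locked_pcpus(open("psod.txt").read())`, where `psod.txt` is a hypothetical file name) makes it easier to see whether the same PCPUs and the same call path show up every time.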

4 Replies
basteku73
Enthusiast

Hi,

Have you read this KB: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10202... ?

It looks like the "Failed to ack TLB invalidate" PSOD can be caused by either a hardware or a software issue.

Try this solution:

"..... Disable USB devices option in BIOS and then power on the host . (At this time if there are stale entries will be gone). Then go back to USB option in BIOS and enable them and test your server status.

All you require is 2 reboots downtime."

Regards,

Sebastian

meimeiriver
Enthusiast

Thanks for replying. I read that KB. :) But my ESXi boots from USB, so disabling USB devices in the BIOS would just stop my server from starting. I guess I don't really understand their proposed solution.

meimeiriver
Enthusiast

Anyone? I ran stresslinux (bootable CPU test, amongst others) for a day too. No errors found.

Also, I don't think the stale USB warning is really relevant here: it's the CPU that is allegedly misbehaving.

N.B. That 'Failed to ack TLB invalidate' PSOD apparently shows up a lot with ESXi. Maybe the problem is really in ESXi itself?

meimeiriver
Enthusiast

*bump*
