VMware Cloud Community
EdZ314
Enthusiast

ESXi 5.5, Dell R815 new BIOS and PSOD TLB ACK invalidate

I'd like to hear from people who have observed PSODs of type "TLB Ack Invalidate" on Dell PE R815 servers (or similar). I've personally observed 12 occurrences of this across multiple Dell servers with BIOS 3.2.1 over the last several months. Now there is a new BIOS version, 3.2.2, which references a fix for a known stability issue when using the R815 with ESX and AMD 6300 series processors. Has this been widely observed, or have I just been unlucky to run into so many issues? If you have seen it, is there any publicly available information about this known issue?

VMware support documents state that this type of PSOD is related to a hardware issue. When I hear "hardware issue", I think hardware failure, but the BIOS update suggests it has more to do with how the hardware functions than with a failure. I'd appreciate your thoughts on how to categorize this sort of problem.

7 Replies
Alistar
Expert

Hello,

we do not use this hardware in our environment, but I'd just like to give you a little insight into this issue.

The PSOD involves the translation lookaside buffer (TLB), a cache inside the CPU that speeds up the translation of virtual addresses to physical addresses. The BIOS upgrade comes into play when an issue is reported with the CPU's microcode, because a BIOS update can deliver updated microcode. It seems there is a miscommunication somewhere along the bus or within the CPU itself, and the host then crashes to maintain data integrity. This appears to be a purely code-related issue, with nothing to fear hardware-wise. Try updating the BIOS on your hosts and give it a test drive.
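To make the "failed to ack" idea concrete, here is a toy Python model of a TLB invalidation round. It is only a sketch of the general technique (one CPU asks the others to drop a cached translation and waits for acknowledgements); the `Cpu` class and `shootdown` function are made up for illustration and do not reflect how ESXi implements this internally.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """Hypothetical CPU with a tiny TLB (virtual page -> physical frame)."""
    cpu_id: int
    responsive: bool = True
    tlb: dict = field(default_factory=dict)

    def handle_invalidate(self, vpage):
        """Drop the cached translation and acknowledge, if the CPU responds."""
        if not self.responsive:
            return False  # no ack: models a locked-up CPU
        self.tlb.pop(vpage, None)
        return True


def shootdown(cpus, vpage):
    """Ask every CPU to invalidate vpage; return the ids that never acked."""
    return [cpu.cpu_id for cpu in cpus if not cpu.handle_invalidate(vpage)]


# Four CPUs all caching the same translation; CPU 3 is stuck.
cpus = [Cpu(i, tlb={0x1000: 0x9000}) for i in range(4)]
cpus[3].responsive = False
missing = shootdown(cpus, 0x1000)
```

In this model `missing` lists the CPUs that never acknowledged. A real hypervisor cannot safely continue while any CPU may still hold a stale translation, which is why a missing acknowledgement ends in a crash rather than a warning.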

Stop by my blog if you'd like 🙂 I dabble in vSphere troubleshooting, PowerCLI scripting and NetApp storage - and I share my journeys at http://vmxp.wordpress.com/
admin
Immortal

This may be the result of AMD erratum 815.  See http://support.amd.com/TechDocs/48063_15h_Mod_00h-0Fh_Rev_Guide.pdf.

EdZ314
Enthusiast

Thanks for the information about the TLB and the BIOS recommendations. I have been referred to two VMware KBs describing issues that are fixed in a recent ESXi 5.5 patch:

VMware KB: VMware ESXi 5.x host fails with a purple diagnostic screen with an unexpected function fl...

VMware KB: VMware ESXi 5.x host fails with a purple diagnostic screen with an unexpected function fl...

The fixes are included in the post-5.5 U2 patch below:

VMware KB: VMware ESXi 5.5, Patch ESXi-5.5.0-20141004001-standard

We're going to try both and see what happens. Also, please note that although the TLB ACK invalidate has been consistently described as a hardware-related problem (CPU not responding) in my support cases, some VMware documents clearly state that a CPU can also go into a non-responsive state for an extended period because of an ESXi software issue.

admin
Immortal

The referenced KB articles concern AMD erratum 815, which I mentioned above. The ESXi patch contains the necessary microcode update to work around this erratum. If the problem is resolved by either the ESXi patch or the BIOS update, then the likely cause was AMD erratum 815.

EdZ314
Enthusiast

Update from Dell: BIOS version 3.2.2, released recently, addresses a known issue with PSOD events on ESXi 5.5. We've now deployed it to about 40 out of 60 Dell R815 servers (along with ESXi 5.5 patch3). So far, there has been one PSOD with the new BIOS version in a period of about one month. If anyone else has had this issue on this model of Dell server, I'd like to hear whether you have tried the new BIOS and whether you have seen any further PSODs after applying it.
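For anyone tracking a partial rollout like this, a small Python sketch along these lines can list which hosts are still below BIOS 3.2.2. The host names and the inventory dictionary are hypothetical; how you collect the BIOS versions (iDRAC, PowerCLI, a spreadsheet) is up to you.

```python
def parse_version(text):
    """Turn a dotted version such as '3.2.1' into a comparable tuple (3, 2, 1)."""
    return tuple(int(part) for part in text.split("."))


def hosts_needing_update(inventory, minimum="3.2.2"):
    """Return the sorted host names whose BIOS version is below the minimum."""
    floor = parse_version(minimum)
    return sorted(
        name for name, bios in inventory.items() if parse_version(bios) < floor
    )


# Hypothetical inventory: host name -> installed BIOS version.
inventory = {"esx-01": "3.2.1", "esx-02": "3.2.2", "esx-03": "3.1.0"}
pending = hosts_needing_update(inventory)
```

Comparing tuples of integers rather than raw strings avoids the usual trap where "3.2.10" sorts before "3.2.2".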

gferreyra
Enthusiast

Hi there.

We experienced a PSOD a few days ago. Dell R815, ESXi 5.5 build 2302651, BIOS 3.2.2.

PCPU 23 locked up. Failed to ack TLB.

VMware: "Hardware error".

The server is fine.

Still waiting for another opinion.

Cheers!

admin
Immortal

Do you have details on the PSOD?
