VMware Communities
ac2014
Contributor

[Hardware Error]: Machine check events logged (since upgrading to Workstation 11)

Hello,

CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

Host: Debian 7 (64 bit)

I've noticed these log entries appearing sporadically after upgrading to Workstation 11:

/var/log/kern.log:

kernel: [220792.810891] [Hardware Error]: Machine check events logged

I then installed mcelog, which recorded the following.

/var/log/mcelog:

Hardware event. This is not a software error.
MCE 0
CPU 1 BANK 0
TIME 1418310100 Thu Dec 11 10:01:40 2014
MCG status:
MCi status:
Corrected error
Error enabled
MCA: Unknown Error 5
STATUS 90000040000f0005 MCGSTATUS 0
MCGCAP c09 APICID 2 SOCKETID 0
CPUID Vendor Intel Family 6 Model 60

Does anyone else see these messages?

Thank you,

Alex.

6 Replies
admin
Immortal

Yes; it seems to be fairly common. Try a Google search for 90000040000f0005 or 0x90000040000f0005. A lot of software seems to induce this exception: VMware products, Hyper-V, QEMU, ...

ac2014
Contributor

Thank you for your reply.

Yes, I did find the same articles while searching Google.

However, I didn't see these log messages while running Workstation 10.

Do you know why Workstation 11 is causing these events?

Thank you,

Alex.

admin
Immortal

Workstation 10 also induces these events on some processors, so I don't think it's anything new in Workstation 11. We don't know what causes this event, but as your mcelog output indicates, it is a hardware event rather than a software error. On a positive note, the error always appears to be corrected.

ac2014
Contributor

Thank you.

admin
Immortal

This is Intel erratum HSD131.  From http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-famil...:

HSD131. Spurious Corrected Errors May be Reported

Problem: Due to this erratum, spurious corrected errors may be logged in the IA32_MC0_STATUS register with the valid field (bit 63) set, the uncorrected error field (bit 61) not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and an MCA Error Code (bits [15:0]) of 0x0005. If CMCI is enabled, these spurious corrected errors also signal interrupts.

Implication: When this erratum occurs, software may see corrected errors that are benign. These corrected errors may be safely ignored.

Workaround: None identified.

Status: For the steppings affected, see the Summary Table of Changes
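
To check whether a given mcelog entry matches this signature, the STATUS value can be decoded using the bit positions quoted in the erratum text. The following is only a rough sketch: the function names `decode_mc_status` and `matches_hsd131` are made up for illustration, and it looks only at the four fields the erratum mentions plus the bank number (IA32_MC0_STATUS corresponds to BANK 0 in the mcelog output).

```python
def decode_mc_status(status: int) -> dict:
    """Extract the fields named in erratum HSD131 from a 64-bit MCi_STATUS value."""
    return {
        "valid": bool((status >> 63) & 1),               # bit 63
        "uncorrected": bool((status >> 61) & 1),         # bit 61
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits [31:16]
        "mca_error_code": status & 0xFFFF,               # bits [15:0]
    }


def matches_hsd131(status: int, bank: int) -> bool:
    """Return True if the entry has the signature described in HSD131.

    Hypothetical helper: it checks only what the erratum text states,
    nothing else about the machine check record.
    """
    fields = decode_mc_status(status)
    return (
        bank == 0
        and fields["valid"]
        and not fields["uncorrected"]
        and fields["model_specific_code"] == 0x000F
        and fields["mca_error_code"] == 0x0005
    )


if __name__ == "__main__":
    # STATUS and BANK taken from the mcelog entry in the first post.
    status = 0x90000040000F0005
    print(decode_mc_status(status))
    print("Matches HSD131 signature:", matches_hsd131(status, bank=0))
```

For the entry in the first post (STATUS 90000040000f0005, BANK 0), all four fields line up with the erratum description.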

ac2014
Contributor

Thank you for posting the link to Intel erratum HSD131.
