VMware Communities
Cornuvia
Contributor

WHEA Event 19. Workstation 10, incompatible with Core i7-4800MQ?

Hello all,

I have a brand new Dell Latitude E6540 notebook with a Core i7-4800MQ CPU @ 2.7 GHz running Windows 7 Professional 64-bit. All drivers and Windows patches are up to date.


Whenever I run VMware Workstation 10 (I develop software inside a virtual machine), I get at least one error message *per minute* in the Windows event log of the host machine.

The error message reads "Event 19, WHEA-Logger, a corrected hardware error has occurred. Reported by component: Processor Core. Error source: Corrected Machine Check. Error type: Internal parity error. Processor ID: 0". (The Processor ID in the error message varies.)

The error *only* happens when I work inside a virtual machine. Otherwise, the notebook is perfectly stable. I ran the Intel Processor Diagnostic Tool, which did a two-hour burn-in stress test with no errors. Nothing is overclocked, and the processor doesn't overheat either. The RAM tests fine.


I have contacted Dell Pro Support. They first advised a factory restore of the operating system. I did that (it took me all day to reinstall my apps), but the error persisted. A BIOS update didn't change anything either. All drivers and service packs are up to date too. Then they sent me a technician. First the mainboard was replaced, then the CPU, then the RAM. But the error still persists.

Given that all hardware has been replaced and the software is up to date, I am at my wits' end. I no longer believe that the hardware is faulty. Do I have to live with this error message forever?


Thanks for your attention,

Arthur


9 Replies
admin
Immortal

This sounds like Intel erratum HSM102: Processor May Experience a Spurious LLC-Related Machine Check During Periods of High Activity. See http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-famil....

Cornuvia
Contributor

I forgot to mention: the "MciStat" field in the WHEA error message contains the value 0x90000040000f0005.

admin
Immortal

Does the WHEA error message indicate which bank this MCI status is from?

AChanceFind
Contributor

I am seeing the same error on a Dell Precision M4800 mobile workstation. I have the same CPU, an i7-4800MQ @ 2.70 GHz. I am on Windows 7 Professional 64-bit running VMware Player 6.0.1 build 1379776.

I too have run several hours' worth of stress diagnostics and was unable to reproduce the error except when running VMware Player (sometimes with it just running in the background with an "idle" Ubuntu VM).

When I've tried to trigger the error by running a lot of video sessions in the VM, it does not happen. It seems random: some days I go without any of these errors, other days I'll have 1 or 2, and on other days they'll appear in clusters of several, so it does not really seem related to "high activity" for me.

I have not noticed any ill effects such as freezing or performance hits. I found some other forum chatter suggesting this might be an overclocking issue.

I noticed the MCABank is always 0, but the processor IDs are different each time.

Any clues would be appreciated.  Thanks!

Here is the full error:

Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          3/3/2014 3:40:17 PM
Event ID:      19
Task Category: None
Level:         Warning
Keywords:
User:          LOCAL SERVICE
Computer:      PrecisionM4800
Description:
A corrected hardware error has occurred.

Reported by component: Processor Core
Error Source: Corrected Machine Check
Error Type: Internal parity error
Processor ID: 1

The details view of this entry contains further information.

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-WHEA-Logger" Guid="{C26C4F3C-3F66-4E99-8F8A-39405CFED220}" />
    <EventID>19</EventID>
    <Version>0</Version>
    <Level>3</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2014-03-03T23:40:17.382291400Z" />
    <EventRecordID>49948</EventRecordID>
    <Correlation ActivityID="{DF8A3291-FF9B-438C-80AE-652D7D1918CD}" />
    <Execution ProcessID="2204" ThreadID="16684" />
    <Channel>System</Channel>
    <Computer>PrecisionM4800</Computer>
    <Security UserID="S-1-5-19" />
  </System>
  <EventData>
    <Data Name="ErrorSource">1</Data>
    <Data Name="ApicId">1</Data>
    <Data Name="MCABank">0</Data>
    <Data Name="MciStat">0x90000040000f0005</Data>
    <Data Name="MciAddr">0x0</Data>
    <Data Name="MciMisc">0x0</Data>
    <Data Name="ErrorType">12</Data>
    <Data Name="TransactionType">256</Data>
    <Data Name="Participation">256</Data>
    <Data Name="RequestType">256</Data>
    <Data Name="MemorIO">256</Data>
    <Data Name="MemHierarchyLvl">256</Data>
    <Data Name="Timeout">256</Data>
    <Data Name="OperationType">256</Data>
    <Data Name="Channel">256</Data>
    <Data Name="Length">864</Data>

richard612
Enthusiast

Have a look at my post here and see if it helps.

Cornuvia
Contributor

- System
- Provider
[ Name] Microsoft-Windows-WHEA-Logger
[ Guid] {C26C4F3C-3F66-4E99-8F8A-39405CFED220}
EventID 19
Version 0
Level 3
Task 0
Opcode 0
Keywords 0x8000000000000000
- TimeCreated
[ SystemTime] 2014-03-10T07:45:50.208060200Z
EventRecordID 39669
- Correlation
[ ActivityID] {75A0F07A-6A48-4B6B-82AB-219458685B0D}
- Execution
[ ProcessID] 2720
[ ThreadID] 2536
Channel System
Computer Arthur-PC
- Security
[ UserID] S-1-5-19

- EventData
ErrorSource 1
ApicId 7
MCABank 0
MciStat 0x90000040000f0005
MciAddr 0x0
MciMisc 0x0
ErrorType 12
TransactionType 256
Participation 256
RequestType 256
MemorIO 256
MemHierarchyLvl 256
Timeout 256
OperationType 256
Channel 256
Length 864
RawData435045521002FFFFFFFF0300020000000200000060030000312D07000A030E140000000000000000000000000000000000000000000000000000000000000000BDC407CF89B7184EB3C41F732CB57131B18BCE2DD7BD0E45B9AD9CF4EBD4F890644E8DEC333CCF0100000000000000000000000000000000000000000000000058010000C00000000102000001000000ADCC7698B447DB4BB65E16F193C4F3DB0000000000000000000000000000000002000000000000000000000000000000000000000000000018020000400000000102000000000000B0A03EDC44A19747B95B53FA242B6E1D0000000000000000000000000000000002000000000000000000000000000000000000000000000058020000080100000102000000000000011D1E8AF94257459C33565E5CC3F7E80000000000000000000000000000000002000000000000000000000000000000000000000000000057010000000000000002080000000000C30603000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000700000000000000000000000000000000000000000000000000000000000000000000000000000003000000000000000700000000000000C306030000081007FFFBFA7FFFFBEBBF000000000000000000000000000000000000000000000000000000000000000001000000010000005DF537C1343CCF0107000000000000000000000000000000000000000000000005000F0040000090000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Cornuvia
Contributor

I notice something peculiar: the error only seems to occur when I run 32-bit VMs, not 64-bit ones. I also believe that the errors start after VMware Tools has been installed inside the VM.

I was testing a lengthy (4-hour) unattended network-based 32-bit Windows installation with lots of applications in a VM. The installation of each application in the VM is logged (with a timestamp) in a log file.

The timestamp of the first WHEA error on the host machine coincides *exactly* with the timestamp in my log file where VMware Tools finished installing in the VM and made the VM reboot.
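Editor's note: lining up host events with a guest-side log, as described above, means putting both on the same clock. The `SystemTime` attribute in the event XML is UTC and carries more fractional digits than Python's `datetime` holds. The following is only a sketch: it uses the event posted earlier in the thread, where the local Date (3:40:17 PM) and the SystemTime (23:40:17Z) are both visible, and it assumes a fixed UTC-8 host offset inferred from those two values. The function name and the offset are illustrative and will differ on other hosts.

```python
from datetime import datetime, timedelta, timezone

def parse_system_time(value):
    """Parse an Event Viewer SystemTime string (UTC) into an aware datetime."""
    stamp, _, fraction = value.rstrip("Z").partition(".")
    # datetime stores microseconds only, so keep the first six fractional digits.
    micros = fraction[:6].ljust(6, "0")
    parsed = datetime.strptime(f"{stamp}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)

# Assumption: the host clock was at UTC-8 when the event was logged.
HOST_OFFSET = timezone(timedelta(hours=-8))

event_utc = parse_system_time("2014-03-03T23:40:17.382291400Z")
event_local = event_utc.astimezone(HOST_OFFSET)
print(event_local.strftime("%Y-%m-%d %H:%M:%S"))  # 2014-03-03 15:40:17
```

Once both timestamps are in the same zone, they can be compared directly against the lines of the installation log.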

misha256
Contributor

I'm experiencing the same issue with Workstation 10.0.4 on a fresh Windows 7 x64 host install with the latest BIOS, drivers, and updates. The PC is a new HP EliteDesk 800 Mini with an Intel i7-4785T CPU. So far I have only worked with Windows XP 32-bit guests.

Based on what I have read here so far, I will do some further tests and report back.

Excerpt from the Event Viewer:

  <System>
    <Provider Name="Microsoft-Windows-WHEA-Logger" Guid="{C26C4F3C-3F66-4E99-8F8A-39405CFED220}" />
    <EventID>19</EventID>
    <Version>0</Version>
    <Level>3</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2014-11-04T21:36:20.230309100Z" />
    <EventRecordID>9680</EventRecordID>
    <Correlation ActivityID="{EAE38998-3319-44CB-87AF-E1283FC323C5}" />
    <Execution ProcessID="1536" ThreadID="2300" />
    <Channel>System</Channel>
    <Computer>mc1004b</Computer>
    <Security UserID="S-1-5-19" />
  </System>
  <EventData>
    <Data Name="ErrorSource">1</Data>
    <Data Name="ApicId">3</Data>
    <Data Name="MCABank">0</Data>
    <Data Name="MciStat">0x90000040000f0005</Data>
    <Data Name="MciAddr">0x0</Data>
    <Data Name="MciMisc">0x0</Data>
    <Data Name="ErrorType">12</Data>
    <Data Name="TransactionType">256</Data>
    <Data Name="Participation">256</Data>
    <Data Name="RequestType">256</Data>
    <Data Name="MemorIO">256</Data>
    <Data Name="MemHierarchyLvl">256</Data>
    <Data Name="Timeout">256</Data>
    <Data Name="OperationType">256</Data>
    <Data Name="Channel">256</Data>
    <Data Name="Length">864</Data>
  </EventData>
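
Editor's note: the fields compared across the reports in this thread are `ApicId`, `MCABank`, and `MciStat`. Below is a minimal sketch of pulling them out of an `<EventData>` fragment like the one above with Python's standard `xml.etree.ElementTree`. The helper names are illustrative, and the fragment is trimmed to three fields. A complete `<Event>` export declares a default XML namespace, so its tag names would have to be qualified accordingly; this sketch does not handle that case.

```python
import xml.etree.ElementTree as ET

# EventData fragment trimmed to three fields; values copied from the excerpt above.
EVENT_DATA = """
<EventData>
  <Data Name="ApicId">3</Data>
  <Data Name="MCABank">0</Data>
  <Data Name="MciStat">0x90000040000f0005</Data>
</EventData>
"""

def parse_event_data(xml_text):
    """Return a dict mapping each Data element's Name attribute to its text."""
    root = ET.fromstring(xml_text.strip())
    return {d.get("Name"): (d.text or "") for d in root.iter("Data")}

def summarize(fields):
    """Convert the APIC ID, MCA bank, and raw status value to integers."""
    return {
        "apic_id": int(fields["ApicId"]),
        "bank": int(fields["MCABank"]),
        "status": int(fields["MciStat"], 16),  # hex string with 0x prefix
    }

summary = summarize(parse_event_data(EVENT_DATA))
print(summary["apic_id"], summary["bank"], hex(summary["status"]))
```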

misha256
Contributor

This is erratum HSM142. The events report MciStat = 0x90000040000f0005, which is bang on:

HSM142: Spurious Corrected Errors May be Reported

Due to this erratum, spurious corrected errors may be logged in the IA32_MC0_STATUS register with the valid field (bit 63) set, the uncorrected error field (bit 61) not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and an MCA Error Code (bits [15:0]) of 0x0005. If CMCI is enabled, these spurious corrected errors also signal interrupts.

When this erratum occurs, software may see corrected errors that are benign. These corrected errors may be safely ignored.
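
Editor's note: the bit positions quoted above can be checked against the reported value by hand. This is a small sketch of that check, not an official decoder. It extracts only the four fields the erratum text names, and the function names are illustrative.

```python
def decode_mci_status(status):
    """Split a machine-check status value into the fields quoted in the erratum."""
    return {
        "valid": bool((status >> 63) & 1),               # bit 63
        "uncorrected": bool((status >> 61) & 1),         # bit 61
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits [31:16]
        "mca_error_code": status & 0xFFFF,               # bits [15:0]
    }

def matches_erratum_signature(status, bank):
    """True if the value has the signature described for IA32_MC0_STATUS (bank 0)."""
    f = decode_mci_status(status)
    return (
        bank == 0
        and f["valid"]
        and not f["uncorrected"]
        and f["model_specific_code"] == 0x000F
        and f["mca_error_code"] == 0x0005
    )

fields = decode_mci_status(0x90000040000F0005)
print(fields)
print(matches_erratum_signature(0x90000040000F0005, bank=0))  # True
```

All three events posted in this thread report MCABank 0 with this same MciStat value, so each one passes the check.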

The same issue exists on 4th generation desktop processors as erratum HSD131. It looks like Workstation exposes the erratum quite nicely under some circumstances. I wouldn't be surprised if other VM software does the same.

http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-famil...

http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-famil...

M
