Barreleye
Contributor

WHEA-Logger warnings only with VMWare Player

Since building my new Haswell Windows 7 x64 system on 6/26, I've found several warnings like this one in my Event Viewer log, and I've determined they coincide exactly with the times I ran VMs for Windows 8.1 and 7:

*****

Log Name:      System

Source:        Microsoft-Windows-WHEA-Logger

Date:          7/18/2013 1:31:03 PM

Event ID:      19

Task Category: None

Level:         Warning

Keywords:     

User:          LOCAL SERVICE

Computer:      ---

Description:

A corrected hardware error has occurred.

Reported by component: Processor Core

Error Source: Corrected Machine Check

Error Type: Internal parity error

Processor ID: 2

*****

The computer has been in heavy use 16 hours a day since I built it on 6/26. However, these warnings have been generated only during the times I was running VMs for Windows 8.1 and 7, and those times have been infrequent and of short duration, about a half dozen runs lasting five minutes or less each. This is documented conclusively by the VMWare logfiles, for the warning times are sandwiched between the VM start and stop times that appear in the logs. There were two such warnings when I discovered the issue on 7/14. While I can't consistently repro it, I managed to get two new warnings out of several attempts within the first few minutes of starting the VMs. I have no idea what aspect of using the VMs triggers it. I run the VMs in windows and never mess with the Unity mode. They are vanilla installs, new copies of VMs I built on my i5 750 system and archived for easy restoration, and basically I've just been verifying they work.
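
The cross-check described here, where each warning time falls between a VM's start and stop times, can be scripted. This is a minimal sketch, assuming the session times have already been copied out of the VMware logs by hand. The session window below is a hypothetical placeholder; only the 7/18/2013 1:31:03 PM warning time comes from the event shown above, and the function names are illustrative.

```python
from datetime import datetime

# Timestamp layout used by the Event Viewer entry quoted in this post.
FMT = "%m/%d/%Y %I:%M:%S %p"


def parse(ts):
    """Parse a timestamp such as '7/18/2013 1:31:03 PM'."""
    return datetime.strptime(ts, FMT)


def warnings_inside_sessions(warnings, sessions):
    """Map each warning time to True if it lies inside any (start, stop) VM session."""
    return {
        w: any(start <= w <= stop for start, stop in sessions)
        for w in warnings
    }


# Hypothetical VM session window (placeholder, not taken from the real logs).
sessions = [(parse("7/18/2013 1:28:00 PM"), parse("7/18/2013 1:34:00 PM"))]

# The one warning time that actually appears in the post.
warnings = [parse("7/18/2013 1:31:03 PM")]

result = warnings_inside_sessions(warnings, sessions)
for when, inside in result.items():
    print(when, "inside a VM session" if inside else "outside all VM sessions")
```

Any warning reported as outside all sessions would argue against the VM being the trigger.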

My system is as follows, and I am not now overclocking, nor have I ever overclocked anything:

i5 4670 and Hyper 212 Evo

Asus Gryphon Z87, BIOS 1007 and now 1206

Crucial Ballistix Sport 8 GB 1.35v DDR3-1600 DIMMs x 2, 16 GB total

Intel 4600 graphics -> monitor

Nvidia GT430 -> TV

Seasonic X650 PSU

Samsung 830 SSD boot drive, two 2 TB WD Green drives, all TrueCrypted. The VMs are on the SSD.

Windows 7 x64 w/ SP1, fully patched, installed 6/26

VMWare Player 5.0.2 build-1031769

I've observed Processor ID values of 0, 2, 4, and 6, which I don't understand as the CPU is quad-core with no hyperthreading.
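
One plausible reading, offered as an assumption rather than a confirmed answer: the WHEA "Processor ID" is the core's APIC ID, not a 0-3 core index. On Intel parts the lowest APIC ID bit typically distinguishes the two hyperthreads of a core, so with hyperthreading absent the four cores would show up as 0, 2, 4, and 6. The mcelog record posted later in this thread (CPU 3, APICID 6) is consistent with that. A minimal sketch of the mapping, where `split_apic_id` and the single SMT bit are illustrative assumptions:

```python
def split_apic_id(apic_id, smt_bits=1):
    """Split an APIC ID into (core index, thread index).

    Assumes the lowest `smt_bits` bits select the hardware thread
    and the remaining bits select the core (an assumption, see above).
    """
    core = apic_id >> smt_bits
    thread = apic_id & ((1 << smt_bits) - 1)
    return core, thread


# The four Processor ID values reported in the post.
for apic_id in (0, 2, 4, 6):
    core, thread = split_apic_id(apic_id)
    print(f"APIC ID {apic_id} -> core {core}, thread {thread}")
```

Under that assumption the four observed values map to cores 0 through 3, each on thread 0.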

The system is stable on Prime95, Aida64, and Intel Linpack, plus MemTest86+ 5.0 RC1. The stress tests do not provoke the warning, nor do they cause excessively high temps. CPU-intensive multitasking that uses all four cores such as transcoding video with Handbrake while doing network file copies or downloading from the Internet while watching video or browsing the web does not cause the warnings. The only thing that causes the warnings is running a VM in VMWare Player, and it tends to happen within the first few minutes, when CPU usage is nil.

I've had no BSODs or abnormal behavior of any kind; this new system has been great. That said, these warnings are still troubling. From googling, I've learned that overclockers commonly observe WHEA-Logger warnings when stressing the system under extreme overclocking, and they say the solution is usually more vCore. However, I'm not overclocking, and I don't think the 4670 is capable of it anyway. Besides, the warnings occur when my system is basically idle and only when running a VM in VMWare Player. They don't occur at any other times.

Has anyone heard of a problem like this with VMWare Player? Are there any Haswell users who don't observe this problem with VMWare Player?

15 Replies
Barreleye
Contributor

Anyone?

I found one similar report dated a couple of years ago. The "Error Type" is given as "Cache Hierarchy Error" instead of my "Internal parity error", but it's otherwise the same:


WHEA-Logger Warning only when running VMware Works... - TOSHIBA FORUMS

itstomd
Contributor

I too have the same issues as you list. I have tried many things to get rid of them. I have 2 machines, one an Asus 4750k and one an MSI 4750k, and both have the same issue. I am not having any crashes or anything yet.

It seems to me that when virtual machines run on this new chipset, there must be something new internal to the chipset to support VMs. I think this is happening within the VM, and when it's busy and conditions are just right, it logs that error.

I think the error has always been there, and when it happens, even at the VM level, the OS sees it and logs it.

I would not doubt that "correctable" errors like this could come from the VM side and would be normal (because VMs are sensitive to timing). If you had these errors without running a VM, then I think it would be a concern.

So we would need the Intel folks to tell us what is different about this chipset in regard to the way it runs VMs.

I see that in the MSI BIOS I can select a "VM" priority or something to that effect, which I have never seen in a BIOS before. So for sure this is some new thing Intel added to the chipset, and Windows is seeing this error bleed over from the VM side.

I hate the errors too; they make me think the system is not running right. But since two machines with the same chipset and different motherboards have the SAME issue, it's got to be a chipset issue.

At least that's my thought on this error.

(This chipset paired with an SSD sure is fast, though, real fast.)


Barreleye
Contributor

Thanks for the reply. Good to know it's not just my system! After another couple of weeks (more than six weeks total by now), the warnings continue to be limited to running VMs, and there's been no abnormal behavior. If I get an answer from Intel, I will update the thread.

Barreleye
Contributor

My last warning was on 7/29, and I didn't upgrade to VMWare Player 6 until 9/12. I figure a Windows Update must have silently addressed the warnings in the interim, as I've been using the same VMs and same usage pattern all this time.

Kheper
Contributor

Hello,

I'm experiencing the same behaviour with VMware Player 6, but on a Linux host running the latest kernel (3.12) with a Xeon E3-1275v3 (Haswell). It occurs only while running VMs; for instance, running FreeBSD 9.2 and compiling some ports causes the kernel to output MCE internal parity errors. I'm not getting any errors outside VMware Player; memtest and the Prime95 torture tests run without errors.

Kheper
Contributor

As an update, I have also tried Windows 8 with all the latest updates, and I get exactly the same errors as you with VMware Player 6.0. In my case I can easily trigger them with a FreeBSD guest while compiling some ports. So the problem occurs on both Windows and Linux, and it seems to be an architectural issue with the Haswell CPU that virtualization software and operating system kernels may not yet fully support. I have yet to see a WHEA-Logger internal parity error (Windows) or an MCE internal parity error (Linux) outside the context of running a VM. All I can say is that this is quite frustrating. So far I have swapped the RAM for other sticks without success, and soon I will be swapping the PSU. I don't think my E3-1275v3, which is basically an i7-4770, is damaged, nor the motherboard, a SuperMicro X10SAE. The three of us have the same errors on different CPUs and motherboards, but with the same or a similar Haswell chipset. I'll keep you updated on the situation.

Kheper
Contributor

As a quick update, the PSU swap didn't change anything. I'm also getting exactly the same errors on a brand-new server with months of uptime and completely different hardware: a Haswell E3-1220v3, which is roughly equivalent to an i5-4570, on an Asus P9D-4L. It might be a BSD problem, or not. I hope this gets fixed soon.

itstomd
Contributor

So these errors are now gone after updating Windows 8. Which patch did it? I have no idea. It's Windows 8 Pro, and it's been running for over 3 months with no errors (so the update was at least 3 months ago).

It's an i5-4670K on an MS-7821 (gaming motherboard),

running a Plextor SSD and a RocketRAID 622 with two 1 TB drives...

I'm going to guess that these errors occur within the VM and are normal, but that Windows 8 somehow read the "virtual" CPU errors as well (somehow they leaked out), and that was before the patch...

Also note that this is not specific to VMware; these errors also occurred using VirtualBox...


Kheper
Contributor

Hey, thanks for the reply.

Well, it's also happening under qemu/kvm, even with the -cpu haswell flag. I hope it will get fixed for Linux; I'm still getting errors with the latest kernel, 3.13-rc1.

Barreleye
Contributor

Barreleye wrote:

My last warning was on 7/29, and I didn't upgrade to VMWare Player 6 until 9/12. I figure a Windows Update must have silently addressed the warnings in the interim, as I've been using the same VMs and same usage pattern all this time.

I wrote the above on 9/29, and I just got my first WHEA-Logger warning since 7/29. It happened while I was running XBMC Gotham in a Windows 7 32-bit VM to test a new skin, while the host machine was converting a video in Handbrake with all 4 cores at nearly 100%. Aside from ordinary updates, my system is the same as I described in the OP, and VMWare Player is now version 6.0.1 build-1379776. The system has remained solid as a rock all this time.

Interesting note by itstomd about VirtualBox being affected as well. Guess it's just some Haswell-VM weirdness.

saftsack
Contributor

Hello,

I don't use VMWare, but I had the same issue (details below). At first I thought it was a hardware error, so I swapped everything except the operating system and the guest system being emulated (an old installation of Windows Server 2003). I got the same error with the new hardware, so I decided to track down the cause. I'm sorry to say I didn't fully find it, but I did find a way to get rid of the warnings.

Like Kheper, I use qemu/kvm, and I did the following:

I installed qemu-1.7.0 and replaced "-machine pc-i440fx-1.6" with "-machine pc-i440fx-1.7". After this change I never got another warning.

From a quick look at qemu's source code, I saw that "gigabyte_align" isn't true in the 1.7 version. Maybe this is a good hint for further investigation. It affects this line, which looks suspicious:

ram_addr_t lowmem = gigabyte_align ? 0xc0000000 : 0xe0000000;
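
To make the effect of that line concrete: it picks the boundary below which guest RAM is mapped under 4 GiB, 0xc0000000 (3 GiB) when gigabyte_align is true and 0xe0000000 (3.5 GiB) otherwise. A minimal sketch in Python, assuming RAM beyond the boundary is simply relocated above 4 GiB; `split_guest_ram` is a hypothetical helper for illustration, not a qemu function:

```python
GIB = 1 << 30


def split_guest_ram(ram_size, gigabyte_align):
    """Return (bytes mapped below 4 GiB, bytes mapped above 4 GiB).

    Mirrors the quoted lowmem selection; the split logic around it is
    an assumption made for illustration.
    """
    lowmem = 0xC0000000 if gigabyte_align else 0xE0000000
    if ram_size >= lowmem:
        return lowmem, ram_size - lowmem
    return ram_size, 0


# Compare both settings for a guest with 4 GiB of RAM.
for align in (True, False):
    below, above = split_guest_ram(4 * GIB, align)
    print(f"gigabyte_align={align}: below 4G = {below:#x}, above 4G = {above:#x}")
```

Under this assumption the two settings give a guest with more than 3 GiB of RAM a different physical memory layout, which is the kind of difference that would be worth investigating.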

Best regards,

Oliver

Hardware event. This is not a software error.

MCE 7

CPU 3 BANK 0

TIME 1390267908 Tue Jan 21 02:31:48 2014

MCG status:

MCi status:

Corrected error

Error enabled

MCA: Internal parity error

STATUS 90000040000f0005 MCGSTATUS 0

MCGCAP c09 APICID 6 SOCKETID 0

CPUID Vendor Intel Family 6 Model 60
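
The STATUS value in the record above can be unpacked by hand. This is a minimal sketch, assuming the IA32_MCi_STATUS bit layout from Intel's Software Developer's Manual (bit 63 valid, bit 62 overflow, bit 61 uncorrected, bit 60 enabled, bits 52:38 corrected error count, bits 31:16 model-specific code, bits 15:0 MCA error code). The function name is illustrative.

```python
def decode_mci_status(status):
    """Decode selected fields of an IA32_MCi_STATUS value (layout assumed as above)."""

    def bit(n):
        return bool((status >> n) & 1)

    return {
        "valid": bit(63),
        "overflow": bit(62),
        "uncorrected": bit(61),
        "enabled": bit(60),
        "misc_valid": bit(59),
        "addr_valid": bit(58),
        "context_corrupt": bit(57),
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "mca_error_code": status & 0xFFFF,  # bits 15:0
    }


# STATUS value copied from the mcelog record above.
fields = decode_mci_status(0x90000040000F0005)
for name, value in fields.items():
    print(f"{name}: {value}")
```

For this record the sketch reports a valid, enabled, corrected error with a corrected-error count of 1 and MCA error code 0x0005, which matches the "Corrected error", "Error enabled", and "Internal parity error" lines that mcelog printed.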

richard612
Enthusiast

Haswell owner here.  I'm getting these errors along with some occasional host machine hangs.  I'm on Workstation 10 rather than VMware Player, though.  The system passes memtest86+, prime95, CPUburn, etc.

I've been tweaking settings and I'm zeroing in on the C1E (Enhanced C1) power saving state being the culprit.  Not 100% sure yet, but if you can disable individual processor c-states in your BIOS then give disabled C1E a try.

Kheper
Contributor

As an update, I haven't received a single error since my last post on 25 November 2013, running qemu 1.7 with the following setting: qemu-system-x86_64 -cpu Haswell

I'm running only 64-bit guests, 8 of them, 24/7: all the popular Linux distros and all three major BSDs. No errors since then with qemu and 64-bit guests.

As for the BIOS settings mentioned above, I had already tried disabling C-states and was still getting errors, even with a Haswell-compatible PSU. The problem seems to be solved with qemu using the -cpu Haswell flag and running only 64-bit guests.

boistordu
Contributor

I have encountered the same problem with every VirtualBox version at high CPU usage on my i7-4790K, under Windows 8.1 Pro and also under Debian and Ubuntu, so it is not OS-related. It has also occurred on different motherboards from ASRock, the Z97 Extreme6 and the Z87M formula one.

I've used VirtualBox because of boinc.berkeley.edu, which does not work with VMware or qemu. I informed them of this problem in September 2014; they weren't aware of it, and I don't think they communicate with VirtualBox or other virtualization vendors.

So if I understand correctly, the problem with QEMU has disappeared since 1.7? What about VMware?

VirtualBox clearly still has the same problem as of 4.3.28.

I should also say that I informed ASRock of this problem. They ran some tests for 3 days and didn't find anything, but of course I don't know whether they would have had to create and use several customized VMs, plus VMs from atlas@home etc., to encounter the same bug.

boistordu
Contributor

I also have a question: did you notice whether the CPU was damaged in any way? Or was it just a bug that disappeared once the virtualization software was removed?
