VMware Cloud Community
rDale
Enthusiast

McAfee + vmotion = BSOD

Wondering if anyone else is running into this. I just updated some virtual machines running on ESX 3.5 U1 hosts with the newest VirusScan Enterprise 8.5.0i. The machines run fine during their normal operations on the ESX host, but the moment I VMotion them they blue screen with mfehidk.sys PAGE_FAULT_IN_NONPAGED_AREA.

Simply stopping all the McAfee services stops the blue screening.

35 Replies
Zathras
Contributor

Hello,

I'd like to add:

My Windows 2003 Standard 64-bit VMs with McAfee VirusScan v8.5 Patch 6 also give a BSOD on a VMotion.

The processors I am running on are AMD Opteron 2356 quad-cores.

The same machines on Intel Xeon dual cores give no problem at all.

So it seems to be a combination of processor type, VirusScan 8.5, and 64-bit Windows.

I hope there will be a fix soon. Oh, the attachment you get to VMotion: "Sure, we can work on that server, we'll just vmo... oh, sorry, that is not possible at the moment".

Good luck finding a solution, all!

wvanaa
Contributor

We have had the same problem and have a fix!!!

Our environment:

- Several sites, each site has one cluster (HA & DRS), all running ESX 3.5U2 with all latest patches to date and HP Insight Agents v8.1

- One central Virtual Center server covering all sites

- Main site has 1 Enhanced VMotion Cluster with DRS and HA for several HP DL385G1 and several HP DL585G5 hosts

- All servers have 1 dedicated 1000Mb (gigabit) NIC for Service Console

- All servers have 1 dedicated 1000Mb (gigabit) NIC for VMotion that is the failover NIC for the Service Console

- All servers have 2 additional NICs each with its own port group that fails over to the opposite port group. All virtual servers run through these 2 NICs.

- All servers at the main site have the latest BIOS to date downloaded from the HP web site.

- Main site servers DL385G1 have the "No-Execute Page Protection" still ENABLED (default)

- Main site servers DL585G5 have the "No-Execute Page Protection" ENABLED to allow Enhanced VMotion with the DL385G1 hosts

- As a corporate standard, we utilize McAfee ePolicy Orchestrator v4.x (latest) so all servers are running the McAfee Agent v4.0.0.1180; System Compliance Profiler v2.0.0.189; Product Coverage Report v4.0.0.1180

- As a corporate standard, we have McAfee VirusScan Enterprise v8.5i with Patch 5 (re-released build from McAfee) rolling out to all servers EXCEPT Windows 2000 Citrix servers; Patch 6 is still in test and had NO impact

Issues:

- VMotion of ANY Windows 2003 Service Pack X 64-bit virtual machine or Windows 2008 Service Pack 1 64-bit virtual machine from HP DL585G5 to another HP DL585G5 or to HP DL385G1 caused a BSOD referencing mfehidk.sys.

- VMotion of the same virtual machine from HP DL385G1 to another HP DL385G1 worked without issue

- VMotion of a Windows 200x 64-bit virtual machine with McAfee VirusScan Enterprise 8.5i uninstalled caused larger than normal latency (100 to 256ms ping times for 28 pings) during VMotion

Solution:

- Disable AMD Virtualization BIOS setting and all is well!!! Or at least this problem goes away!
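
For anyone who wants to compare the latency figure from the issues list above (100 to 256ms over 28 pings) before and after changing the BIOS setting, here is a minimal sketch. It assumes the output of a Windows `ping -t` against the guest was captured to text during the VMotion. The function name `summarize_ping`, the 100 ms threshold, and the sample addresses are illustrative assumptions, not part of the original tests.

```python
import re
from statistics import mean

# Matches the round-trip time in Windows-style ping replies,
# e.g. "time=156ms" or "time<1ms" (the latter is counted as 1 ms).
_TIME_RE = re.compile(r"time[=<](\d+)\s*ms", re.IGNORECASE)


def summarize_ping(lines, threshold_ms=100):
    """Summarize round-trip times from captured ping output lines."""
    times = []
    lost = 0
    for line in lines:
        match = _TIME_RE.search(line)
        if match:
            times.append(int(match.group(1)))
        elif "timed out" in line.lower():
            lost += 1
    return {
        "replies": len(times),
        "lost": lost,
        "min_ms": min(times) if times else None,
        "max_ms": max(times) if times else None,
        "avg_ms": round(mean(times), 1) if times else None,
        # Replies at or above the threshold, i.e. the "larger than normal" ones.
        "slow_replies": sum(1 for t in times if t >= threshold_ms),
    }


if __name__ == "__main__":
    # Made-up sample lines in the Windows ping format (documentation address).
    sample = [
        "Reply from 192.0.2.10: bytes=32 time<1ms TTL=128",
        "Reply from 192.0.2.10: bytes=32 time=100ms TTL=128",
        "Reply from 192.0.2.10: bytes=32 time=256ms TTL=128",
        "Request timed out.",
    ]
    print(summarize_ping(sample))
```

Running the same capture with AMD Virtualization enabled and disabled gives two summaries that can be compared directly, rather than eyeballing the ping window.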

Triple78
Contributor

In the HP integrated VMware ESX Server 3i 'Getting Started' documentation it states:

"For CPU specific virtualization capabilities, you can select Intel Virtualization Technology or AMD Virtualization. You must perform this step for supporting 64bit Windows operating systems."

So what are the issues that we will potentially encounter by switching this option off????

wvanaa
Contributor

Just for clarification, this is the BIOS setting within the newer servers called "AMD Virtualization". This is NOT a software option within VMware ESX 3.5 or ESXi.

I am finishing up DISABLING this single "AMD Virtualization" BIOS setting after moving around many virtual servers without incident.

Triple78
Contributor

Yes, that's right, in the BIOS: Adv Options > Processor Options > AMD Virtualization.

Just stating that the HP documentation says to enable this if the host is going to have VMs on it or be a member of a farm. Just wondering what might not work or perform properly if this setting is disabled.

wvanaa
Contributor

Fair enough. My understanding is that AMD-V takes some of the functions of the hypervisor and moves them into the CPU. In our tests today, we had EXTREME latency when VMotioning ANY 64-bit OS from an AMD-V enabled host to ANY other host (AMD-V enabled or NOT). Couple that with McAfee VirusScan 8.5i and I got a BSOD.

After DISABLING the feature, all functions of ESX remain; the "performance gain" of AMD-V is the ONLY "loss"...

Zathras
Contributor

I can confirm that disabling AMD Virtualisation fixes the VMotion issue. The 64-bit guests also migrate much more smoothly.

It's a real shame that we have to disable this, though, since the AMD Virtualisation features seem very nice to have.

But I'd rather have a working environment than a fast one that crashes machines.

I hope it's all because the AMD Virtualisation extensions are new and still buggy. Can't wait to see a patch from VMware for this.

vmware_lic
Enthusiast

Thank you for this feedback. I have tried the same thing, disabling the AMD-V feature in the BIOS, and I am able to VMotion Windows 64-bit guests.

I consider this a temporary workaround for now.

I have an open call with VMware regarding this issue and will update once I get some feedback.

I would like to know what I am losing (features, performance, or whatever it is) by disabling this in the BIOS.

Thanks

pizaro
Enthusiast

VMware has made an official KB for acknowledgement and workaround of this issue - please refer to KB 1007072 (http://kb.vmware.com/kb/1007072) for more information.

As for any side effects of disabling AMD-V/AMD Virtualization: realistically, you would probably notice little to no performance degradation.

-wen

mike_laspina
Champion

Hello,

This appears to be a fairly common event with some of McAfee's products. Since it happens during a VMotion, I would suspect a conflict between VMware Tools and mfehidk.sys.

Possible workarounds:

Try turning off buffer overflow protection.

Try disabling the VMware Tools balloon driver (Device Manager -> View -> Show hidden devices).

Never mind, I should have read a little further. Maybe I should carry the AMD info to the McAfee forum.

http://blog.laspina.ca/ vExpert 2009

admin
Immortal

Dear VMware Customers,

We have identified the root cause of the issue and will provide a fix in an upcoming patch release, which will be available shortly.

Thanks for your support and best regards,

VMware ESX Product Team

bertdb
Virtuoso

As far as I can see, this issue has now been addressed in patch ESX350-200809404-SG:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100708...

vmware_lic
Enthusiast

Thank you for all the feedback. I have applied the fix in the link below, and my initial testing of VMotion with 64-bit Windows guests looks very promising.

FYI, I have re-enabled AMD Virtualization in the BIOS. Prior to the fix, the guests would BSOD if this was enabled.

So far so good.

Thanks All

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100708...

joergriether
Hot Shot

Hi,

I have to revive this thread, because we experienced very strange behaviour on a W2008 x64 machine on ESX 3.5 U2 with VirusScan 8.5 installed. The machine was VMotioned at some point in the past and was rarely used (a terminal server for testing purposes). Today we discovered that the whole Windows installation was totally f***ed up. During reboot it found about 1000 file system corruptions and tried to fix them. Windows came up, but nearly all services were dead and produced failures. When checking the event logs, I discovered that it all started with McAfee Framework errors.

ESX hosts are Dell 2950s; the SAN is EqualLogic.

Has anyone out there seen something similar, and/or does anyone have an explanation?

best regards

Joerg

tfranklin
Contributor

I have the same issue too, and I am not running any McAfee software. I have about 10 Win2k8 x64 VMs, and when they VMotion they blue screen. I am working to apply all outstanding updates to my hosts as of this post to see if this helps. Hosts are Dell 2950s with 16GB RAM and dual quad-core Xeons. The SAN is EqualLogic.

jareed
Contributor

I have been trying to track down a very similar issue, and the common thread is that Windows 2008 64-bit on any ESX 3.5 release below U4 blue screens on VMotion. After moving to the latest ESX 3.5 updates, the issues went away, and this is NOT an issue in vSphere 4.0.
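
If you keep an inventory export, the combination described above can be turned into a quick triage filter. This is only a rough sketch of the pattern reported in this thread, not a VMware support statement. The `VmProfile` record, its field names, and the sample VM names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmProfile:
    """Hypothetical inventory record; field names are illustrative only."""
    name: str
    guest_os: str         # e.g. "Windows 2008"
    guest_is_64bit: bool
    esx_version: str      # e.g. "3.5"
    esx_update: int       # 2 for "U2", 0 for the base release


def matches_reported_pattern(vm: VmProfile) -> bool:
    """Return True if the VM fits the combination reported above:
    a 64-bit Windows 2008 guest on ESX 3.5 below Update 4."""
    return (
        vm.guest_is_64bit
        and vm.guest_os.strip().lower().startswith("windows 2008")
        and vm.esx_version == "3.5"
        and vm.esx_update < 4
    )


if __name__ == "__main__":
    # Made-up inventory entries for illustration.
    inventory = [
        VmProfile("ts-test01", "Windows 2008", True, "3.5", 2),
        VmProfile("web01", "Windows 2008", True, "3.5", 4),
        VmProfile("file01", "Windows 2003", False, "3.5", 2),
    ]
    for vm in inventory:
        if matches_reported_pattern(vm):
            print(f"{vm.name}: test VMotion in a maintenance window first")
```

Earlier posts in the thread also report the BSOD on Windows 2003 64-bit guests with VirusScan 8.5, so the OS check here is deliberately narrow and may need widening for your environment.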
