VMware Communities
ben_m1
Contributor

Can't boot any Linux 3.x kernels in WS7/Player3 on Intel i5. Any ideas?

Because we use ACE, we're tied to using Workstation 7.1.6 / Player 3.1.6.

We've got several PCs with latest-generation Intel processors (i7-3770, i5-3450, i5-3570), and the problem occurs on all of them (but not on older Core 2 Duos, etc.). If I create a new VM and try to boot a stock Ubuntu install ISO for 11.10 or newer, the boot fails almost instantaneously with a message saying "The CPU has been disabled by the guest operating system. You will need to power off or reset the virtual machine at this point". The VM console is completely black.

Looking inside the VM log files, everything is normal until these lines:

Jun 27 15:35:39.398: vcpu-0| X86Fault_Warning: vmcore/vmm64/cpu/interp.c:398: cs:eip=0x60:0xc0135838 fault=13
Jun 27 15:35:39.398: vcpu-0| Vix: [3416 vmxCommands.c:9705]: VMAutomation_HandleCLIHLTEvent. Do nothing.
Jun 27 15:35:39.398: vcpu-0| Exiting on CLI;HLT at 0x60:0xc0685ff5

After which the log shows a clean shutdown.
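For anyone comparing their own logs, here is a rough Python sketch that pulls the two addresses out of lines like the ones above. The regular expressions are only inferred from the two lines quoted in this post, so they are an assumption about the log format and not a documented VMware format. Fault vector 13 is the x86 general protection fault (#GP).

```python
import re

# The two relevant log lines quoted in the post above.
SAMPLE_LOG = """\
Jun 27 15:35:39.398: vcpu-0| X86Fault_Warning: vmcore/vmm64/cpu/interp.c:398: cs:eip=0x60:0xc0135838 fault=13
Jun 27 15:35:39.398: vcpu-0| Exiting on CLI;HLT at 0x60:0xc0685ff5
"""

# Patterns inferred from the sample lines only (an assumption, not a spec).
FAULT_RE = re.compile(r"cs:eip=0x[0-9a-fA-F]+:(0x[0-9a-fA-F]+)\s+fault=(\d+)")
HALT_RE = re.compile(r"Exiting on CLI;HLT at 0x[0-9a-fA-F]+:(0x[0-9a-fA-F]+)")


def extract_addresses(log_text):
    """Return (fault_eip, fault_vector, halt_eip); each is None if not found."""
    fault = FAULT_RE.search(log_text)
    halt = HALT_RE.search(log_text)
    fault_eip = int(fault.group(1), 16) if fault else None
    fault_vector = int(fault.group(2)) if fault else None
    halt_eip = int(halt.group(1), 16) if halt else None
    return fault_eip, fault_vector, halt_eip


if __name__ == "__main__":
    eip, vector, halt = extract_addresses(SAMPLE_LOG)
    print(hex(eip), vector, hex(halt))
```

The two addresses it returns are the ones to look up in the kernel's System.map.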

Looking at the System.map for the kernel in question, the 0xc0135838 address corresponds to "native_write_cr4". I deduce that this triggers the 'early_protection_fault' routine in the Linux kernel, which eventually double-faults while trying to call 'printk' and enters the 'hlt_loop' function, which in the kernel we're using is at the 0xc0685ff5 address listed in the VM log.
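The System.map lookup can be done by hand, but a small script helps when trying several kernels. This is a minimal sketch, assuming the usual "address type name" line layout of System.map. The sample excerpt is illustrative: the start address given for native_write_cr4 and the placeholder symbol are invented for the example, and only the hlt_loop address is taken from the post.

```python
import bisect

# Illustrative System.map excerpt. The native_write_cr4 start address and the
# placeholder symbol are invented; only the hlt_loop address is from the post.
SAMPLE_MAP = """\
c0135830 T native_write_cr4
c0135840 T placeholder_next_symbol
c0685ff5 t hlt_loop
"""


def load_symbols(text):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols, address):
    """Return (name, offset) for the last symbol starting at or below address."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address is below the first symbol in the map
    start, name = symbols[index]
    return name, address - start


if __name__ == "__main__":
    table = load_symbols(SAMPLE_MAP)
    print(lookup(table, 0xC0135838))
    print(lookup(table, 0xC0685FF5))
```

With a real System.map, pass the file contents to load_symbols instead of SAMPLE_MAP. The lookup picks the nearest symbol at or below the address, so it cannot tell whether the address is actually past the end of that function.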

I've tried every single VM parameter I can think of or find on the net in the VMX file, including systematically flipping every single CPUID bit. Running the install on another PC and then copying the built VM over hits the exact same problem (with different cs:eip offsets depending on the kernel used). As far as I can tell, this happens with any Linux kernel newer than about 3.0. I've also tried flipping every remotely relevant setting in the host PC's BIOS (disabling VT-x, CPU cores, etc.).

I've also tried pretty much every Linux kernel parameter for disabling subsystems or drivers, changing earlyprintk, etc., with no joy.

There's nothing relevant-looking in the host OS (Win 7/x64) logs. I've reinstalled everything more times than I care to think about. This also happens with the previous versions of WS and Player (7.1.5/3.1.5).

Does anyone have any ideas, or has anyone seen anything like this before?

7 Replies
ben_m1
Contributor

Long story short, this is because of SMEP (Supervisor Mode Execution Protection) support being added to the Linux kernel, which makes the kernel set the SMEP bit (bit 20) in the CR4 register, which kills the VM.

The "nosmep" kernel parameter doesn't fix this, as the kernel then clears the bit in the register (to disable SMEP), and on Win7+ this faults (as the bit is already set to 1, and setting it to 0 is a supervisor operation).
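To make the bit arithmetic concrete, here is a small Python sketch of the two CR4 values involved. It only models the register as an integer; the starting value 0x6f0 is a made-up example and not a value read from any real machine.

```python
# SMEP is bit 20 of CR4, as described above.
CR4_SMEP = 1 << 20  # 0x100000


def smep_enabled(cr4):
    """True if the SMEP bit is set in the given CR4 value."""
    return bool(cr4 & CR4_SMEP)


def with_smep(cr4):
    """CR4 value a kernel with SMEP support would write: bit 20 set."""
    return cr4 | CR4_SMEP


def without_smep(cr4):
    """CR4 value written when booting with "nosmep": bit 20 cleared."""
    return cr4 & ~CR4_SMEP


if __name__ == "__main__":
    example_cr4 = 0x6F0  # hypothetical starting value, for illustration only
    print(hex(with_smep(example_cr4)))
    print(hex(without_smep(with_smep(example_cr4))))
```

Either way the guest ends up writing CR4, which is why "nosmep" alone doesn't avoid the fault described above.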

Theoretically, using CPUID masking to clear the SMEP feature bit (bit 7 of EBX for leaf 7, subleaf 0 of the CPUID data) should fix this, but the CPUID masking in VMware doesn't seem to support masking this bit.

The only solution I've found is to recompile the kernel, commenting out most of setup_smep in arch/x86/kernel/cpu/common.c.
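If you can boot a working Linux on the same hardware (a live CD on the bare metal, or a patched guest), a quick way to see whether the CPU advertises SMEP is to look for the "smep" flag in /proc/cpuinfo. The sketch below does that on text you pass in, and also shows what masking bit 7 of the leaf 7 EBX value would look like on a plain integer. The EBX value and the cpuinfo excerpt in the example are invented for illustration.

```python
# SMEP is reported in bit 7 of EBX for CPUID leaf 7, subleaf 0.
CPUID7_EBX_SMEP = 1 << 7


def ebx_has_smep(ebx):
    """True if the SMEP feature bit is set in a leaf 7 EBX value."""
    return bool(ebx & CPUID7_EBX_SMEP)


def mask_smep(ebx):
    """Return the EBX value with the SMEP feature bit cleared (32-bit)."""
    return ebx & ~CPUID7_EBX_SMEP & 0xFFFFFFFF


def cpuinfo_has_smep(cpuinfo_text):
    """Check the first 'flags' line of /proc/cpuinfo-style text for 'smep'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return "smep" in value.split()
    return False


if __name__ == "__main__":
    example_ebx = 0x281  # hypothetical EBX value with bit 7 set
    print(ebx_has_smep(example_ebx), hex(mask_smep(example_ebx)))
    sample = "flags\t\t: fpu vme smep erms\n"  # invented cpuinfo excerpt
    print(cpuinfo_has_smep(sample))
```

On a real Linux system, pass the contents of /proc/cpuinfo to cpuinfo_has_smep.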

Anyone got any ideas on how to do the CPUID masking?

ttinker
Contributor

Ben,

I'm having the exact same issue with this on two different VMs, one Ubuntu and one Fedora. Windows VMs and old Linux VMs are not affected.

Could you expand on your solution? What did you comment out in setup_smep? Thanks.

Todd

admin
Immortal

ben_m wrote:


Anyone got any ideas on how to do the CPUID masking?

Workstation 7 doesn't know about leaf 7, so you can't mask it.

ben_m1
Contributor

ttinker wrote:

I'm having the exact same issue with this on two different VMs, one Ubuntu and one Fedora. Windows VMs and old Linux VMs are not affected.

Could you expand on your solution? What did you comment out in setup_smep? Thanks.

Hi Todd,

I've attached the somewhat quick and dirty patch I've used. A cleaner patch would probably be to insert the line

setup_clear_cpu_cap(X86_FEATURE_SMEP);

before the "if" statement, as that would be more resistant against bitrot.

Ben

ttinker
Contributor

My solution was to upgrade to Workstation 8. I haven't had any problems since.

TilmanS
Contributor

Any chance the fix might make it into an update for ESXi 4, which still has this problem?

TimHansen201110
Contributor

VMware Player 5 is an option too. I know it's not the solution for ben_m, but for other circumstances it might be.

VMware Workstation 7.1.6 wouldn't run the Ubuntu 11.10 VM I moved to my new PC hardware (Core i7 3770 running Win 7 x64). My old PC, a Core 2 Duo, ran the VM just fine under Workstation 7.0.1. I didn't have a pressing need, other than this problem, to upgrade to Workstation 8 or 9, so I tried VMware Player 5.0.2, and it's running the VM just fine.
