jcp0wermac
Enthusiast

ESXi 5.5 and Nested KVM

Hello, I'm wondering if someone here can give me a hand. I am running Fedora 20 with Linux kernel version 3.15.5-200.fc20.x86_64+debug. I have configured the VM with the following parameters, along with "expose hardware assisted virtualization to guest OS":

hypervisor.cpuid.v0 = "FALSE"

monitor.virtual_mmu = "hardware"

monitor.virtual_exec = "hardware"

cpuid.1.ecx="----:----:----:----:----:----:--h-:----"

cpuid.80000001.ecx.amd = "-----------------------------H--"

cpuid.8000000a.eax.amd="hhhh:hhhh:hhhh:hhhh:hhhh:hhhh:hhhh:hhhh"

cpuid.8000000a.ebx.amd="hhhh:hhhh:hhhh:hhhh:hhhh:hhhh:hhhh:hhhh"

cpuid.8000000a.edx.amd="hhhh:hhhh:hhhh:hhhh:hhhh:hhhh:hhhh:hhhh"

vcpu.hotadd = FALSE

apic.xapic.enabled = "FALSE"
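The cpuid mask strings above are easier to check when converted to bit numbers. The following is a minimal sketch, assuming the usual convention that the leftmost character of a mask is bit 31 and the rightmost is bit 0, and that colons are only separators. The helper name `set_bits` is made up for this illustration and is not part of any VMware tool.

```python
def set_bits(mask: str) -> dict[int, str]:
    """Map bit index -> flag character for every non-'-' position in a mask."""
    bits = mask.replace(":", "")  # colons only group the bits in fours
    if len(bits) != 32:
        raise ValueError(f"expected 32 bit positions, got {len(bits)}")
    # Assumption: leftmost character is bit 31, rightmost is bit 0.
    return {31 - i: ch for i, ch in enumerate(bits) if ch != "-"}


# The cpuid.1.ecx mask from the settings above.
print(set_bits("----:----:----:----:----:----:--h-:----"))  # {5: 'h'}
```

Under that assumption the cpuid.1.ecx mask touches bit 5, which is the VMX bit on Intel CPUs, while bit 2 of leaf 80000001 ECX is the SVM bit on AMD CPUs.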

I am running ESXi 5.5 (build 1331820) on a DL165 G5 (Opteron 2352) with vhv.allow = "TRUE", and I cannot get a virtual machine to run on nested KVM. I get the following error when trying to start a virtual machine:

2014-07-18 18:31:48.425+0000: starting up

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/bin/qemu-kvm -name vm1 -S -machine pc-i440fx-1.6,accel=kvm,usb=off -cpu Opteron_G2,+osvw,+3dnowprefetch,+misalignsse,+sse4a,+abm,+cr8legacy,+extapic,+3dnow,+3dnowext,+pdpe1gb,+fxsr_opt,+mmxext,+popcnt,+x2apic,+vme -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 49d20968-c59f-4f4b-a692-13f52c59bfd2 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/cirros-0.3.2-x86_64-disk.img,if=none,id=drive-ide0-0-0,format=qcow2 -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=25,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:0d:26:2f,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1 -chardev spicevmc,id=charredir2,name=usbredir -device 
usb-redir,chardev=charredir2,id=redir2 -chardev spicevmc,id=charredir3,name=usbredir -device usb-redir,chardev=charredir3,id=redir3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -S -s

Domain id=4 is tainted: custom-argv

char device redirected to /dev/pts/1 (label charserial0)

KVM internal error. Suberror: 1

emulation failure

EAX=00000000 EBX=000f195c ECX=000f6178 EDX=00000402

ESI=000f22fd EDI=00000000 EBP=000f6178 ESP=00006fb8

EIP=40000000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0

ES =0010 00000000 ffffffff 00409300 DPL=0 DS   [-WA]

CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]

SS =0010 00000000 ffffffff 00409200 DPL=0 DS   [-W-]

DS =0010 00000000 ffffffff 00409300 DPL=0 DS   [-WA]

FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]

GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]

LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT

TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy

GDT=     000f6900 00000037

IDT=     000f693e 00000000

CR0=60000011 CR2=00000000 CR3=00000000 CR4=00000000

DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000

DR6=00000000ffff0ff0 DR7=0000000000000400

EFER=0000000000000000

Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

[root@viropspaw003 qemu]#

[root@viropspaw003 ~]# cat /proc/cpuinfo

...

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl tsc_reliable nonstop_tsc pni cx16 x2apic popcnt lahf_lm svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw npt svm_lock

...
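A quick way to confirm that the guest actually sees the AMD virtualization flags is to parse the flags line. This is a minimal sketch: the function names are made up for this illustration, and the sample text is an abbreviated copy of the flags line above rather than the full output.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags of the first CPU listed in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str, wanted=("svm", "npt")) -> list[str]:
    """List the wanted flags that the CPU does not advertise."""
    have = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in have]


# Abbreviated sample taken from the flags line above.
sample = "flags : fpu vme de pse lahf_lm svm extapic osvw npt svm_lock"
print(missing_flags(sample))  # []
```

On the guest itself, the same functions can be fed the contents of /proc/cpuinfo instead of the sample string.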

I also tried disabling KVM nesting and NPT, with no luck:

systool -m kvm_amd -v

Module = "kvm_amd"

...

  Parameters:

    nested              = "0"

    npt                 = "0"
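The same parameter values can be read straight from sysfs, where the kernel exposes loaded-module parameters under /sys/module/<module>/parameters. A minimal sketch follows; the `module_params` helper and its `root` argument are made up for this illustration.

```python
from pathlib import Path


def module_params(module: str, root: str = "/sys/module") -> dict[str, str]:
    """Read every readable parameter of a loaded kernel module from sysfs."""
    param_dir = Path(root) / module / "parameters"
    if not param_dir.is_dir():
        return {}  # module not loaded, or it exposes no parameters
    params = {}
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            continue  # some parameters are not readable by every user
    return params


# Prints an empty dict when kvm_amd is not loaded.
print(module_params("kvm_amd"))
```

Note that the values reflect what the module was loaded with; changing `nested` or `npt` generally requires reloading kvm_amd with the new options.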

Attached is the vmware.log. I also attached the KVM trace-cmd logs, in case they provide some insight.

I have also tried CentOS 7 (3.10.0-123.el7.x86_64) with nested KVM, which has no issues running the same virtual machine. I assume something changed between kernel versions that causes the problem.

Thanks,

Joe

3 Replies
admin
Immortal

This is arguably a bug in kvm. See this thread for a discussion of the patch: http://www.spinics.net/lists/kvm/msg105618.html

Datto
Expert

There may be several other things to look at, but first, a couple of ideas:

1) Some Opteron 2352 versions have the TLB bug and are therefore problematic for the capability that nested virtualization requires. A 2352 with B2 stepping (OS2352WAL4BGD) likely has the TLB bug; a 2352 with B3 stepping (OS2352WAL4BGH) probably does not.

2) If you're still using virtual hardware version 9 for your ESXi VM, I would create another ESXi VM, upgrade that second one to virtual hardware version 10, then use the web client to go into the CPU settings of the ESXi 5.5 VM and check the box for "expose hardware virtualization to". It's best to see whether you can get a 64-bit Windows VM to start without errors on your nested ESXi VM. If your ESXi host was built without any command-line startup values manually inserted and a 64-bit Windows VM starts without errors, then you're probably okay, and Fedora is the problem I would look at next.

Datto

Datto
Expert

Ahh, just saw JMattson's comment. Yes, look where he pointed first, but I'd still check that CPU as well, just to be sure you don't have one of the TLB-bug CPUs.

Datto
