Hi,
I have recently been experimenting with how VMware handles NMIs in nested virtualization, and I have found that VMware's behavior is incorrect in a corner case.
The version of VMware and my host OS are:
To reproduce this bug:
The source code of the VMDK file can be found at: https://github.com/lxylxy123456/uberxmhf/blob/9eb50d71910b2586c11f92462f37096a7066502b/xmhf/src/xmhf...
Actual output (on VMware):
...
Detecting environment
End detecting environment
Experiment: 17
Enter host, exp=17, state=0
hlt_wait() begin, source = EXIT_NMI_H (5)
Inject NMI
Interrupt recorded: EXIT_NMI_H (5)
At instruction: 0x00000000
VM-exit reason: 0x00000012
hlt_wait() end
hlt_wait() begin, source = EXIT_TIMER_H (6)
Inject interrupt
Interrupt recorded: EXIT_TIMER_H (6)
hlt_wait() end
hlt_wait() begin, source = EXIT_TIMER_H (6)
Inject NMI
Inject interrupt
Interrupt recorded: EXIT_TIMER_H (6)
hlt_wait() end
Leave host
Interrupt recorded: EXIT_NMI_H (5)
At instruction: 0x00000000
VM-exit reason: 0x0000000a
CPU(0x02): key press: 65, guest=1
source: EXIT_VMEXIT (7)
exit_source: EXIT_NMI_H (5)
rip: 0x082013df
exit_rip: 0x08208488
TEST_ASSERT '0 && (exit_source == source)' failed, line 372, file lhv-guest.c
Expected output (reproducible on real Intel hardware):
...
Detecting environment
End detecting environment
Experiment: 17
Enter host, exp=17, state=0
hlt_wait() begin, source = EXIT_NMI_H (5)
Inject NMI
Interrupt recorded: EXIT_NMI_H (5)
At instruction: 0x00000000
VM-exit reason: 0x00000012
hlt_wait() end
hlt_wait() begin, source = EXIT_TIMER_H (6)
Inject interrupt
Interrupt recorded: EXIT_TIMER_H (6)
hlt_wait() end
hlt_wait() begin, source = EXIT_TIMER_H (6)
Inject NMI
Inject interrupt
Interrupt recorded: EXIT_TIMER_H (6)
hlt_wait() end
Leave host
Interrupt recorded: EXIT_VMEXIT (7)
CPU(0x01): key press: 250, guest=1
Enter host, exp=17, state=1
iret_wait() begin, source = EXIT_MEASURE (1)
iret_wait() end
Leave host
Experiment: 1
... (endless)
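For readers comparing the two logs, the "VM-exit reason" values can be decoded with a small helper. This is only a sketch: it assumes the printed values are the raw VMCS exit-reason field, and the table lists just a subset of the basic exit reasons from the Intel SDM. The names `BASIC_EXIT_REASONS` and `decode_exit_reason` are made up for this example and are not part of LHV.

```python
# Subset of the basic VM-exit reasons defined in the Intel SDM.
BASIC_EXIT_REASONS = {
    0: "Exception or NMI",
    1: "External interrupt",
    7: "Interrupt window",
    8: "NMI window",
    10: "CPUID",
    12: "HLT",
    18: "VMCALL",
}


def decode_exit_reason(raw: int) -> str:
    """Return a readable name for a raw 32-bit exit-reason value."""
    basic = raw & 0xFFFF                   # bits 15:0 hold the basic exit reason
    entry_failure = bool(raw & (1 << 31))  # bit 31 flags a VM-entry failure
    name = BASIC_EXIT_REASONS.get(basic, f"unknown ({basic})")
    return f"{name} (VM-entry failure)" if entry_failure else name


# The two values that appear in the serial output above.
for raw in (0x00000012, 0x0000000A):
    print(f"{raw:#010x} -> {decode_exit_reason(raw)}")
```

Under that assumption, 0x12 decodes to VMCALL and 0x0a to CPUID. What LHV does on each of these exits is not stated in this post.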
Explanation:
The VMDK file (d.vmdk) contains a micro-hypervisor called LHV. Assume VMware runs at L0, LHV runs at L1, and the guest of LHV runs at L2.
The code in LHV performs an experiment (called "Experiment 17" in serial output) on CPU 0 to test the behavior of NMI blocking. The experiment steps are:
The expected behavior is:
However, on VMware, the behavior appears to be:
It appears that VMware's implementation is incorrect: the NMI is blocked in L2, but Intel's SDM says that NMIs are always unblocked in L2. Quote from the Intel SDM:
The following items describe the use of bit 3 (blocking by NMI) in the interruptibility-state field if the “virtual NMIs” VM-execution control is 1:
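As a rough illustration of the field the quote refers to, the guest interruptibility-state bits can be decoded as below. The bit positions follow the Intel SDM's layout of that VMCS field. The helper names `decode_interruptibility` and `bit3_meaning` are hypothetical and do not come from LHV or VMware.

```python
# Low bits of the guest interruptibility-state VMCS field (Intel SDM).
INTERRUPTIBILITY_BITS = {
    0: "blocking by STI",
    1: "blocking by MOV SS",
    2: "blocking by SMI",
    3: "blocking by NMI",
}


def decode_interruptibility(state: int) -> list[str]:
    """List the blocking conditions whose bits are set in `state`."""
    return [name for bit, name in INTERRUPTIBILITY_BITS.items()
            if state & (1 << bit)]


def bit3_meaning(virtual_nmis: bool) -> str:
    """Bit 3 changes meaning when the "virtual NMIs" control is 1."""
    return "virtual-NMI blocking" if virtual_nmis else "NMI blocking"


print(decode_interruptibility(0b1000))  # only bit 3 set
print(bit3_meaning(virtual_nmis=True))
```

The point of the second helper is that, with "virtual NMIs" set to 1, bit 3 no longer describes blocking of physical NMIs, which is the distinction the bug report relies on.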
Could you please fix this implementation problem in VMware? Thank you.
It would be best to also specify your host CPU model.
Virtual-interrupt delivery and process posted interrupts are features that are available on Xeon processors but not on Intel desktop or mobile CPUs, so they might affect what you see in your experiments.
You can check vmware.log for two entries:
"Process posted interrupts" (Ivy Bridge and newer) and "Virtual-interrupt delivery" (Haswell and newer). A value of { 0 } means the feature is not available on the CPU; {0,1} means it is.
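If you want to check many entries at once, a short script can read that notation. The sketch below assumes each line has already had any vmware.log timestamp prefix stripped, so it looks like "Process posted interrupts { 0 }". The names `parse_control` and `can_enable` are invented for this example.

```python
import re

# Matches "<control name> { 0 }" or "<control name> {0,1}".
LINE_RE = re.compile(
    r"^\s*(?P<name>.+?)\s*\{\s*(?P<values>[01](?:\s*,\s*[01])*)\s*\}\s*$"
)


def parse_control(line):
    """Return (name, allowed_values), or None if the line is not a control."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    values = {int(v) for v in match.group("values").split(",")}
    return match.group("name"), values


def can_enable(line):
    """True when the control may be set to 1, i.e. the feature is offered."""
    parsed = parse_control(line)
    return parsed is not None and 1 in parsed[1]


for line in ("Process posted interrupts { 0 }", "Virtual NMIs {0,1}"):
    print(line, "->", "available" if can_enable(line) else "not available")
```

Lines that do not end in a brace group, such as "VMCS revision ID 1", are simply skipped by returning None.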
Thanks for the reply. I have edited the bug report to include my host CPU model: Intel(R) Core(TM) i5-7600 CPU @ 3.50GHz
Since this bug concerns non-maskable interrupts (NMIs), I think virtual-interrupt delivery and posted interrupts are likely unrelated. According to the SDM, these features apply only to maskable interrupts.
Just FYI, here are the guest VMX capabilities printed in vmware.log:
Guest VT-x Capabilities:
Basic VMX Information (0x00d8100000000001)
VMCS revision ID 1
VMCS region length 4096
VMX physical-address width natural
SMM dual-monitor mode no
VMCS memory type WB
Advanced INS/OUTS info yes
True VMX MSRs yes
Exception Injection ignores error code no
True Pin-Based VM-Execution Controls (0x0000003f00000016)
External-interrupt exiting {0,1}
NMI exiting {0,1}
Virtual NMIs {0,1}
Activate VMX-preemption timer { 0 }
Process posted interrupts { 0 }
True Primary Processor-Based VM-Execution Controls (0xfff9fffe04006172)
Interrupt-window exiting {0,1}
Use TSC offsetting {0,1}
HLT exiting {0,1}
INVLPG exiting {0,1}
MWAIT exiting {0,1}
RDPMC exiting {0,1}
RDTSC exiting {0,1}
CR3-load exiting {0,1}
CR3-store exiting {0,1}
Activate tertiary controls { 0 }
CR8-load exiting {0,1}
CR8-store exiting {0,1}
Use TPR shadow {0,1}
NMI-window exiting {0,1}
MOV-DR exiting {0,1}
Unconditional I/O exiting {0,1}
Use I/O bitmaps {0,1}
Monitor trap flag {0,1}
Use MSR bitmaps {0,1}
MONITOR exiting {0,1}
PAUSE exiting {0,1}
Activate secondary controls {0,1}
Secondary Processor-Based VM-Execution Controls (0x00553cfe00000000)
Virtualize APIC accesses { 0 }
Enable EPT {0,1}
Descriptor-table exiting {0,1}
Enable RDTSCP {0,1}
Virtualize x2APIC mode {0,1}
Enable VPID {0,1}
WBINVD exiting {0,1}
Unrestricted guest {0,1}
APIC-register virtualization { 0 }
Virtual-interrupt delivery { 0 }
PAUSE-loop exiting {0,1}
RDRAND exiting {0,1}
Enable INVPCID {0,1}
Enable VM Functions {0,1}
Use VMCS shadowing { 0 }
ENCLS exiting { 0 }
RDSEED exiting {0,1}
Enable PML { 0 }
EPT-violation #VE {0,1}
Conceal VMX from PT { 0 }
Enable XSAVES/XRSTORS {0,1}
PASID translation { 0 }
Mode-based execute control for EPT {0,1}
Sub-page write permissions for EPT { 0 }
PT uses guest physical addresses { 0 }
Use TSC scaling { 0 }
Enable UMWAIT and TPAUSE { 0 }
Enable ENCLV in VMX non-root mode { 0 }
Enable EPC Virtualization Extensions { 0 }
Bus lock exiting { 0 }
Notification VM exits { 0 }
Tertiary Processor-Based VM-Execution Controls (0x0000000000000000)
LOADIWKEY exiting no
Enable HLAT no
Enable Paging-Write no
Enable Guest Paging Verification no
Enable IPI Virtualization no
True VM-Exit Controls (0x003fefff00036dfb)
Save debug controls {0,1}
Host address-space size {0,1}
Load IA32_PERF_GLOBAL_CTRL { 0 }
Acknowledge interrupt on exit {0,1}
Save IA32_PAT {0,1}
Load IA32_PAT {0,1}
Save IA32_EFER {0,1}
Load IA32_EFER {0,1}
Save VMX-preemption timer { 0 }
Clear IA32_BNDCFGS { 0 }
Conceal VMX from processor trace { 0 }
Clear IA32_RTIT MSR { 0 }
Clear IA32_LBR_CTL MSR { 0 }
Clear user-interrupt notification vector { 0 }
Load CET state { 0 }
Load PKRS { 0 }
True VM-Entry Controls (0x0000d3ff000011fb)
Load debug controls {0,1}
IA-32e mode guest {0,1}
Entry to SMM { 0 }
Deactivate dual-monitor mode { 0 }
Load IA32_PERF_GLOBAL_CTRL { 0 }
Load IA32_PAT {0,1}
Load IA32_EFER {0,1}
Load IA32_BNDCFGS { 0 }
Conceal VMX from processor trace { 0 }
Load IA32_RTIT MSR { 0 }
Load user-interrupt notification vector { 0 }
Load CET state { 0 }
Load IA32_LBR_CTL MSR { 0 }
Load PKRS { 0 }
VPID and EPT Capabilities (0x00000f0106714141)
R=0/W=0/X=1 yes
Page-walk length 3 yes
EPT memory type WB yes
2MB super-page yes
1GB super-page no
INVEPT support yes
Access & Dirty Bits yes
Advanced VM exit information for EPT violations yes
Supervisor shadow-stack control no
Type 1 INVEPT yes
Type 2 INVEPT yes
INVVPID support yes
Type 0 INVVPID yes
Type 1 INVVPID yes
Type 2 INVVPID yes
Type 3 INVVPID yes
Miscellaneous VMX Data (0x00000000400401e0)
TSC to preemption timer ratio 0
VM-Exit saves EFER.LMA yes
Activity State HLT yes
Activity State shutdown yes
Activity State wait-for-SIPI yes
Processor trace in VMX no
RDMSR SMBASE MSR in SMM no
CR3 targets supported 4
Maximum MSR list size 512
VMXOFF holdoff of SMIs no
Allow all VMWRITEs no
Allow zero instruction length yes
MSEG revision ID 0
VMX-Fixed Bits in CR0 (0x0000000080000021/0x00000000ffffffff)
Fixed to 0 0xffffffff00000000
Fixed to 1 0x0000000080000021
Variable 0x000000007fffffde
VMX-Fixed Bits in CR4 (0x0000000000002000/0x00000000003727ff)
Fixed to 0 0xffffffffffc8d800
Fixed to 1 0x0000000000002000
Variable 0x00000000003707ff
VMCS Enumeration (0x000000000000005a)
Highest index 0x2d
VM Functions (0x0000000000000001)
Function 0 (EPTP-switching) supported.
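The { 0 } and {0,1} entries can be cross-checked against the raw values in parentheses. In the VMX control capability MSRs, bits 31:0 report the allowed 0-settings (a control may be 0 only if its bit there is 0) and bits 63:32 report the allowed 1-settings (a control may be 1 only if its bit there is 1). The sketch below applies this to the pin-based value from the log. The function name `allowed_settings` is made up for illustration.

```python
# Pin-based VM-execution control bit positions (Intel SDM).
PIN_BASED_BITS = {
    0: "External-interrupt exiting",
    3: "NMI exiting",
    5: "Virtual NMIs",
    6: "Activate VMX-preemption timer",
    7: "Process posted interrupts",
}


def allowed_settings(msr: int, bit: int) -> set[int]:
    """Return the set of values (0 and/or 1) the control at `bit` may take."""
    low_bit = (msr >> bit) & 1          # 1 here means the control must be 1
    high_bit = (msr >> (32 + bit)) & 1  # 1 here means the control may be 1
    allowed = set()
    if low_bit == 0:
        allowed.add(0)
    if high_bit == 1:
        allowed.add(1)
    return allowed


TRUE_PINBASED = 0x0000003F00000016  # value printed in the log above

for bit, name in PIN_BASED_BITS.items():
    print(f"{name}: {sorted(allowed_settings(TRUE_PINBASED, bit))}")
```

For this value the result agrees with the log: the first three controls can be 0 or 1, while the VMX-preemption timer and posted interrupts can only be 0.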
I would suggest that you open a service request with VMware if you want to make sure VMware takes some action on this. This forum is not connected to VMware technical support or development engineers. Posting a bug report here does not guarantee that VMware will see it or act on it. VMware employees do not regularly monitor this forum, and if they do, it's on their own time.
Thanks for letting me know. Interesting. I posted this bug here because I saw https://www.vmware.com/support/policies/defect.html . Do I need something special (e.g. an active support agreement) to open a service request? When I try to open a service request, I do not see anything under "Supported Products".
If you don't have a service contract, you can purchase per-incident support from the VMware Online Store. That's essentially a "credit" or "voucher" that lets you open a service request. When you open a service request online, it should then ask you for the "serial number" or something of that ilk that represents the per-incident support you purchased in the store.
If you are within 30 days of purchasing a new or upgrade license, complimentary support is included and you can open a support request. The bad thing about that is that most folks don't find things that need attention until after those 30 days are up.
