lxylxy123456
Contributor

Bug: NMI incorrectly blocked in guest if host blocks NMI and injects interrupt to guest

Hi,

I have recently been experimenting with how VMware handles NMIs in nested virtualization, and I found that VMware's behavior is incorrect in a corner case.

The version of VMware and my host OS are:

  • Product: VMware(R) Workstation 17 Pro
  • Version: 17.0.1 build-21139696
  • Host OS: Debian 11, kernel version 5.10.0-21-amd64
  • Host CPU: Intel(R) Core(TM) i5-7600 CPU @ 3.50GHz

To reproduce this bug:

  1. Create a new virtual machine, choose "Other" as guest OS
  2. Open "Edit Virtual Machine Settings"
  3. Remove the default disk and add the attached VMDK file (d.vmdk) instead
  4. Change number of processors from 1 to 2
  5. Enable "Virtualize Intel VT-x/EPT or AMD-V/RVI"
  6. Add a serial port to the virtual machine and make sure you can read its output (e.g. use an output file)
  7. Keep other configurations as default (for me, it is 256 MB memory, ...)
  8. Start the virtual machine
  9. Observe the serial port output

The source code of the VMDK file can be found at: https://github.com/lxylxy123456/uberxmhf/blob/9eb50d71910b2586c11f92462f37096a7066502b/xmhf/src/xmhf...

Actual output (on VMware):

...
Detecting environment
End detecting environment
Experiment: 17
  Enter host, exp=17, state=0
    hlt_wait() begin, source =  EXIT_NMI_H   (5)
      Inject NMI
      Interrupt recorded:       EXIT_NMI_H   (5)
      At instruction:           0x00000000
      VM-exit reason:           0x00000012
    hlt_wait() end
    hlt_wait() begin, source =  EXIT_TIMER_H (6)
      Inject interrupt
      Interrupt recorded:       EXIT_TIMER_H (6)
    hlt_wait() end
    hlt_wait() begin, source =  EXIT_TIMER_H (6)
      Inject NMI
      Inject interrupt
      Interrupt recorded:       EXIT_TIMER_H (6)
    hlt_wait() end
  Leave host
      Interrupt recorded:       EXIT_NMI_H   (5)
      At instruction:           0x00000000
      VM-exit reason:           0x0000000a
CPU(0x02): key press: 65, guest=1

source:      EXIT_VMEXIT  (7)
exit_source: EXIT_NMI_H   (5)
rip:         0x082013df
exit_rip:    0x08208488
TEST_ASSERT '0 && (exit_source == source)' failed, line 372, file lhv-guest.c

Expected output (reproducible on real Intel hardware):

...
Detecting environment
End detecting environment
Experiment: 17
  Enter host, exp=17, state=0
    hlt_wait() begin, source =  EXIT_NMI_H   (5)
      Inject NMI
      Interrupt recorded:       EXIT_NMI_H   (5)
      At instruction:           0x00000000
      VM-exit reason:           0x00000012
    hlt_wait() end
    hlt_wait() begin, source =  EXIT_TIMER_H (6)
      Inject interrupt
      Interrupt recorded:       EXIT_TIMER_H (6)
    hlt_wait() end
    hlt_wait() begin, source =  EXIT_TIMER_H (6)
      Inject NMI
      Inject interrupt
      Interrupt recorded:       EXIT_TIMER_H (6)
    hlt_wait() end
  Leave host
      Interrupt recorded:       EXIT_VMEXIT  (7)
CPU(0x01): key press: 250, guest=1
  Enter host, exp=17, state=1
    iret_wait() begin, source = EXIT_MEASURE (1)
    iret_wait() end
  Leave host
Experiment: 1
... (endless)

Explanation:

The VMDK file (d.vmdk) contains a micro-hypervisor called LHV. Assume VMware runs in L0, LHV runs in L1, the guest of LHV runs in L2.

The code in LHV performs an experiment (called "Experiment 17" in serial output) on CPU 0 to test the behavior of NMI blocking. The experiment steps are:

  1. Prepare state such that the CPU is currently in L1 (LHV), and NMI is blocked
  2. An NMI interrupt arrives at the CPU. However, since NMI is blocked, the NMI interrupt handler of L1 is not invoked
  3. Modify VMCS to make sure that L2 has virtual NMIs enabled (NMI exiting = 1, Virtual NMIs = 1), and L2 blocks NMI (Blocking by NMI = 1)
  4. Modify VMCS to inject a normal interrupt (vector 0x21) to L2 at VM entry
  5. VM entry to L2
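
Steps 3 and 4 can be sketched as plain bit arithmetic on the VMCS field values. This is only an illustration, not the LHV source: the bit positions are the ones documented in the Intel SDM, while the helper names `entry_interruption_info` and `experiment_17_fields` are hypothetical, and the sketch only computes the values that would be written with VMWRITE.

```python
# Bit positions follow the Intel SDM; all names in this sketch are hypothetical.
PIN_NMI_EXITING = 1 << 3              # pin-based controls: "NMI exiting"
PIN_VIRTUAL_NMIS = 1 << 5             # pin-based controls: "virtual NMIs"
INTR_STATE_BLOCKING_BY_NMI = 1 << 3   # guest interruptibility state, bit 3

INTR_TYPE_EXTERNAL = 0                # interruption type: external interrupt
INTR_INFO_VALID = 1 << 31             # VM-entry interruption info: valid bit


def entry_interruption_info(vector, intr_type=INTR_TYPE_EXTERNAL):
    """Build a VM-entry interruption-information value (vector, type, valid)."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    return INTR_INFO_VALID | (intr_type << 8) | vector


def experiment_17_fields(pin_controls=0, interruptibility=0):
    """Return the field values that steps 3 and 4 describe."""
    return {
        # Step 3: NMI exiting = 1, Virtual NMIs = 1
        "pin_based_controls": pin_controls | PIN_NMI_EXITING | PIN_VIRTUAL_NMIS,
        # Step 3: Blocking by NMI = 1
        "guest_interruptibility": interruptibility | INTR_STATE_BLOCKING_BY_NMI,
        # Step 4: inject external interrupt vector 0x21 at VM entry
        "vm_entry_interruption_info": entry_interruption_info(0x21),
    }


if __name__ == "__main__":
    for name, value in experiment_17_fields().items():
        print(f"{name}: {value:#010x}")
```

A real hypervisor would OR these bits into the values it already uses (and respect the capability MSRs) rather than start from zero as the defaults here do.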

The expected behavior is:

  • 6. Immediately after VM entry, L2's interrupt 0x21 handler is invoked
  • 7. A VM exit then happens immediately because of the NMI that has been pending since step 2

However, on VMware, the behavior appears to be:

  • 6. Immediately after VM entry L2's interrupt 0x21 handler is invoked
  • 7. After executing some instructions, L2 executes the CPUID instruction, which causes VM exit to L1
  • 8. Immediately after VM exit, L1's NMI interrupt handler is executed
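
The "VM-exit reason" values in the two logs can be decoded with a small lookup. This is a sketch that assumes the serial output prints the raw VMCS exit-reason field; the table lists only a few basic exit reasons from the Intel SDM, and the function name is hypothetical.

```python
# A few basic VM-exit reasons from the Intel SDM (not a complete table).
BASIC_EXIT_REASONS = {
    0: "Exception or NMI",
    1: "External interrupt",
    10: "CPUID",
    12: "HLT",
    18: "VMCALL",
}


def decode_exit_reason(raw):
    """Return a name for the basic exit reason stored in bits 15:0 of raw."""
    basic = raw & 0xFFFF
    return BASIC_EXIT_REASONS.get(basic, f"unknown ({basic})")


if __name__ == "__main__":
    for raw in (0x00000012, 0x0000000A):
        print(f"{raw:#010x}: {decode_exit_reason(raw)}")
```

Under that assumption, 0x0000000a in the failing run is a CPUID exit, which is consistent with step 7 of the VMware behavior described above.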

VMware's implementation appears to be incorrect: it keeps NMIs blocked in L2 after VM entry, whereas Intel's SDM says that, when "virtual NMIs" is 1, the "blocking by NMI" bit does not block NMIs in VMX non-root operation. Quote from Intel SDM:

The following items describe the use of bit 3 (blocking by NMI) in the interruptibility-state field if the “virtual NMIs” VM-execution control is 1:

  • The bit’s value does not affect the blocking of NMIs after VM entry. NMIs are not blocked in VMX non-root operation (except for ordinary blocking for other reasons, such as by the MOV SS instruction, the wait-for-SIPI state, etc.)
  • ...
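
The rule quoted above can be written as a toy model of what should happen to the NMI that is pending when L2 is entered. This is a simplified sketch, not hypervisor code: the function names are hypothetical, it ignores the "ordinary blocking for other reasons" (MOV SS, wait-for-SIPI, etc.), and the branch for "virtual NMIs" = 0 is my simplified assumption rather than part of the quote.

```python
def nmi_blocked_after_entry(virtual_nmis, blocking_by_nmi):
    """Is a physical NMI blocked in the guest right after VM entry?"""
    if virtual_nmis:
        # Quoted SDM text: with virtual NMIs = 1 the bit does not block NMIs.
        return False
    # Simplified assumption for virtual NMIs = 0: the bit blocks NMIs.
    return blocking_by_nmi


def pending_nmi_outcome(nmi_exiting, virtual_nmis, blocking_by_nmi):
    """Describe what happens to an NMI that is pending at VM entry."""
    if virtual_nmis and not nmi_exiting:
        # VM entry requires NMI exiting = 1 whenever virtual NMIs = 1.
        raise ValueError("virtual NMIs = 1 requires NMI exiting = 1")
    if nmi_blocked_after_entry(virtual_nmis, blocking_by_nmi):
        return "stays pending"
    return "VM exit to L1" if nmi_exiting else "delivered to L2"


if __name__ == "__main__":
    # Experiment 17: NMI exiting = 1, Virtual NMIs = 1, Blocking by NMI = 1
    print(pending_nmi_outcome(True, True, True))
```

With the settings from experiment 17 the model gives "VM exit to L1", which is the behavior seen on real hardware; the VMware log instead looks like the "stays pending" case.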

Could you please fix this implementation problem in VMware? Thank you.

bluefirestorm
Champion

It is best to also specify your host CPU model.

Virtualised interrupt delivery and posted process interrupts are features that are available in Xeon processors but not in Intel desktop or mobile CPUs. So these features might affect what you see in your experiments.

You can check the vmware.log

Look for "Process posted interrupts" (from Ivy Bridge and newer) and "Virtual-interrupt delivery" (Haswell and newer). A value of { 0 } means the feature is not available on the CPU; {0,1} would mean it is.

lxylxy123456
Contributor

Thanks for the reply. I have edited the bug report to include my host CPU model: Intel(R) Core(TM) i5-7600 CPU @ 3.50GHz

Since this bug concerns non-maskable interrupts (NMIs), I think virtualised interrupt delivery and posted process interrupts are likely unrelated. According to the SDM, these features apply only to maskable interrupts.

Just FYI, here are the guest VMX capabilities printed in vmware.log:

Guest VT-x Capabilities:
Basic VMX Information (0x00d8100000000001)
  VMCS revision ID                           1
  VMCS region length                      4096
  VMX physical-address width           natural
  SMM dual-monitor mode                     no
  VMCS memory type                          WB
  Advanced INS/OUTS info                   yes
  True VMX MSRs                            yes
  Exception Injection ignores error code    no
True Pin-Based VM-Execution Controls (0x0000003f00000016)
  External-interrupt exiting               {0,1}
  NMI exiting                              {0,1}
  Virtual NMIs                             {0,1}
  Activate VMX-preemption timer            { 0 }
  Process posted interrupts                { 0 }
True Primary Processor-Based VM-Execution Controls (0xfff9fffe04006172)
  Interrupt-window exiting                 {0,1}
  Use TSC offsetting                       {0,1}
  HLT exiting                              {0,1}
  INVLPG exiting                           {0,1}
  MWAIT exiting                            {0,1}
  RDPMC exiting                            {0,1}
  RDTSC exiting                            {0,1}
  CR3-load exiting                         {0,1}
  CR3-store exiting                        {0,1}
  Activate tertiary controls               { 0 }
  CR8-load exiting                         {0,1}
  CR8-store exiting                        {0,1}
  Use TPR shadow                           {0,1}
  NMI-window exiting                       {0,1}
  MOV-DR exiting                           {0,1}
  Unconditional I/O exiting                {0,1}
  Use I/O bitmaps                          {0,1}
  Monitor trap flag                        {0,1}
  Use MSR bitmaps                          {0,1}
  MONITOR exiting                          {0,1}
  PAUSE exiting                            {0,1}
  Activate secondary controls              {0,1}
Secondary Processor-Based VM-Execution Controls (0x00553cfe00000000)
  Virtualize APIC accesses                 { 0 }
  Enable EPT                               {0,1}
  Descriptor-table exiting                 {0,1}
  Enable RDTSCP                            {0,1}
  Virtualize x2APIC mode                   {0,1}
  Enable VPID                              {0,1}
  WBINVD exiting                           {0,1}
  Unrestricted guest                       {0,1}
  APIC-register virtualization             { 0 }
  Virtual-interrupt delivery               { 0 }
  PAUSE-loop exiting                       {0,1}
  RDRAND exiting                           {0,1}
  Enable INVPCID                           {0,1}
  Enable VM Functions                      {0,1}
  Use VMCS shadowing                       { 0 }
  ENCLS exiting                            { 0 }
  RDSEED exiting                           {0,1}
  Enable PML                               { 0 }
  EPT-violation #VE                        {0,1}
  Conceal VMX from PT                      { 0 }
  Enable XSAVES/XRSTORS                    {0,1}
  PASID translation                        { 0 }
  Mode-based execute control for EPT       {0,1}
  Sub-page write permissions for EPT       { 0 }
  PT uses guest physical addresses         { 0 }
  Use TSC scaling                          { 0 }
  Enable UMWAIT and TPAUSE                 { 0 }
  Enable ENCLV in VMX non-root mode        { 0 }
  Enable EPC Virtualization Extensions     { 0 }
  Bus lock exiting                         { 0 }
  Notification VM exits                    { 0 }
Tertiary Processor-Based VM-Execution Controls (0x0000000000000000)
  LOADIWKEY exiting                          no
  Enable HLAT                                no
  Enable Paging-Write                        no
  Enable Guest Paging Verification           no
  Enable IPI Virtualization                  no
True VM-Exit Controls (0x003fefff00036dfb)
  Save debug controls                      {0,1}
  Host address-space size                  {0,1}
  Load IA32_PERF_GLOBAL_CTRL               { 0 }
  Acknowledge interrupt on exit            {0,1}
  Save IA32_PAT                            {0,1}
  Load IA32_PAT                            {0,1}
  Save IA32_EFER                           {0,1}
  Load IA32_EFER                           {0,1}
  Save VMX-preemption timer                { 0 }
  Clear IA32_BNDCFGS                       { 0 }
  Conceal VMX from processor trace         { 0 }
  Clear IA32_RTIT MSR                      { 0 }
  Clear IA32_LBR_CTL MSR                   { 0 }
  Clear user-interrupt notification vector { 0 }
  Load CET state                           { 0 }
  Load PKRS                                { 0 }
True VM-Entry Controls (0x0000d3ff000011fb)
  Load debug controls                      {0,1}
  IA-32e mode guest                        {0,1}
  Entry to SMM                             { 0 }
  Deactivate dual-monitor mode             { 0 }
  Load IA32_PERF_GLOBAL_CTRL               { 0 }
  Load IA32_PAT                            {0,1}
  Load IA32_EFER                           {0,1}
  Load IA32_BNDCFGS                        { 0 }
  Conceal VMX from processor trace         { 0 }
  Load IA32_RTIT MSR                       { 0 }
  Load user-interrupt notification vector  { 0 }
  Load CET state                           { 0 }
  Load IA32_LBR_CTL MSR                    { 0 }
  Load PKRS                                { 0 }
VPID and EPT Capabilities (0x00000f0106714141)
  R=0/W=0/X=1                               yes
  Page-walk length 3                        yes
  EPT memory type WB                        yes
  2MB super-page                            yes
  1GB super-page                             no
  INVEPT support                            yes
  Access & Dirty Bits                       yes
  Advanced VM exit information for EPT violations   yes
  Supervisor shadow-stack control            no
  Type 1 INVEPT                             yes
  Type 2 INVEPT                             yes
  INVVPID support                           yes
  Type 0 INVVPID                            yes
  Type 1 INVVPID                            yes
  Type 2 INVVPID                            yes
  Type 3 INVVPID                            yes
Miscellaneous VMX Data (0x00000000400401e0)
  TSC to preemption timer ratio      0
  VM-Exit saves EFER.LMA           yes
  Activity State HLT               yes
  Activity State shutdown          yes
  Activity State wait-for-SIPI     yes
  Processor trace in VMX            no
  RDMSR SMBASE MSR in SMM           no
  CR3 targets supported              4
  Maximum MSR list size            512
  VMXOFF holdoff of SMIs            no
  Allow all VMWRITEs                no
  Allow zero instruction length    yes
  MSEG revision ID                   0
VMX-Fixed Bits in CR0 (0x0000000080000021/0x00000000ffffffff)
  Fixed to 0        0xffffffff00000000
  Fixed to 1        0x0000000080000021
  Variable          0x000000007fffffde
VMX-Fixed Bits in CR4 (0x0000000000002000/0x00000000003727ff)
  Fixed to 0        0xffffffffffc8d800
  Fixed to 1        0x0000000000002000
  Variable          0x00000000003707ff
VMCS Enumeration (0x000000000000005a)
  Highest index                   0x2d
VM Functions (0x0000000000000001)
  Function  0 (EPTP-switching) supported.
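
The { 0 } and {0,1} annotations in this dump can be reproduced from the raw capability values in parentheses. The sketch below assumes the usual SDM layout for the VMX control capability MSRs (low 32 bits: controls that must be 1; high 32 bits: controls that may be 1). The function name is hypothetical, and the bit numbers are the SDM positions of the controls named in the log.

```python
def allowed_settings(msr_value, bit):
    """Return the set of values (0 and/or 1) a control bit may take."""
    allowed0 = msr_value & 0xFFFFFFFF   # bit set here: control must be 1
    allowed1 = msr_value >> 32          # bit set here: control may be 1
    values = set()
    if not (allowed0 >> bit) & 1:
        values.add(0)
    if (allowed1 >> bit) & 1:
        values.add(1)
    return values


# Raw values copied from the log above.
TRUE_PINBASED = 0x0000003F00000016
SECONDARY_PROCBASED = 0x00553CFE00000000

if __name__ == "__main__":
    print("NMI exiting:", allowed_settings(TRUE_PINBASED, 3))
    print("Virtual NMIs:", allowed_settings(TRUE_PINBASED, 5))
    print("Process posted interrupts:", allowed_settings(TRUE_PINBASED, 7))
    print("Virtual-interrupt delivery:", allowed_settings(SECONDARY_PROCBASED, 9))
```

This matches the dump: "NMI exiting" and "Virtual NMIs" can be set by L1, while "Process posted interrupts" and "Virtual-interrupt delivery" are not exposed to it.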

 

Technogeezer
Immortal

I would suggest that you open a service request with VMware if you want to make sure VMware takes some action on this. This forum is not connected to VMware technical support or development engineers. Posting a bug report here does not guarantee that VMware will see it or act on it. VMware employees do not regularly monitor this forum, and if they do, it's on their own time.

 

- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides
lxylxy123456
Contributor

Thanks for letting me know. Interestingly, I posted this bug here because I saw https://www.vmware.com/support/policies/defect.html . Do I need something special (e.g. an active support agreement) to open a service request? When I try to open a service request, I do not see anything under "Supported Products".

Technogeezer
Immortal

If you don't have a service contract, you can purchase per-incident support from the VMware Online Store. That's essentially a "credit" or "voucher" for you to open a service request. Opening a service request online should then ask you for the "serial number" or something of that ilk that represents the per-incident support that you purchased in the store.

If you are within 30 days of purchasing a new or upgrade license, complimentary support is included, so you can open a support request. The bad thing about that is that most folks don't find things that need attention until after those 30 days are up.

- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides