We're experiencing an issue with nested virtualization: our custom hypervisor runs nested inside a VMware VM.
The issue occurs on VMware version 7.0.3 (20395099), running on hardware VMware ESXi, 7.0.2, 18538813 (ProLiant DL360 Gen10). It does not occur on VMware version 6.7.0 (8170161), running on hardware VMware ESXi, 6.7.0, 8169922 (PowerEdge R430).
The problem:
Our hypervisor gets stuck in an infinite loop of EPT access violation faults, even though we believe we update the EPT tables correctly to avoid the fault. Detecting this situation and doing a spurious EPT entry update (writing the same value that is already there) is a workaround that fixes the problem.
Example scenario:
1) We have a VM ("Fred") running in our hypervisor ("uxen"), on top of a VMware VM.
2) "Fred" is starting up. Initially all EPT entries are invalid (not present), since we populate them on demand.
3) "Fred" accesses a 4k page at guest physical address 0x0000002fc44000.
4) The EPT page table entry (EPTE) corresponding to this gpa currently holds the value 0x6fffffffffe330; the low bits being "0" indicates that the entry is currently not present.
5) Uxen correctly receives an EPT access violation fault for gpa 0x0000002fc44000 and updates the EPTE from 0x6fffffffffe330 to 0x75a00377, making the EPTE present.
6) "Fred" resumes execution
7) Uxen incorrectly receives exactly the same EPT access violation fault for gpa 0x0000002fc44000. Because it tracks that it has already updated the EPTE, it does not update it again. "Fred" resumes execution again, and at this point we have an infinite loop of identical EPT access violation faults.
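To make the "not present" versus "present" distinction in steps (4) and (5) concrete, here is a minimal C sketch of the check on the low bits. It assumes the usual Intel EPT layout in which bits 2:0 are the read/write/execute permissions and that mode-based execute control is not in use. The names EPTE_PERM_MASK and epte_is_present are made up for illustration and are not taken from uxen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bits 2:0 of an EPT entry are the read/write/execute
 * permissions, and mode-based execute control is disabled. */
#define EPTE_PERM_MASK 0x7ULL

/* Hypothetical helper: an entry with none of R/W/X set is "not present". */
static bool epte_is_present(uint64_t epte)
{
    return (epte & EPTE_PERM_MASK) != 0;
}
```

With the values from the scenario, epte_is_present(0x6fffffffffe330) is false (low three bits are 0) and epte_is_present(0x75a00377) is true (low three bits are all set).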
It should be noted that we don't do an INVEPT after the EPTE update at step (7). It is not required, since we're updating the entry from non-present to present. However, adding spurious INVEPTs there does not fix the problem.
The fix/workaround on the uxen side is to modify step (7) to detect whether the fault is exactly identical to the one just before it and, if so, to write the EPTE again, storing 0x75a00377 even though that value is already there.
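Roughly, the workaround looks like the C sketch below. This is a simplified illustration, not the actual uxen code: the struct last_ept_fault, the function handle_repeated_ept_fault, and the idea of comparing the gpa plus the exit qualification are assumptions chosen to show the shape of the check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU record of the last EPT violation we handled. */
struct last_ept_fault {
    bool     valid;
    uint64_t gpa;            /* faulting guest physical address */
    uint64_t qualification;  /* exit qualification of that fault */
};

/* Returns true if this fault is an exact repeat of the previous one.
 * In that case the EPTE is rewritten with the value it already holds. */
static bool handle_repeated_ept_fault(struct last_ept_fault *last,
                                      volatile uint64_t *epte,
                                      uint64_t gpa, uint64_t qualification)
{
    bool repeat = last->valid &&
                  last->gpa == gpa &&
                  last->qualification == qualification;

    /* Remember this fault for the next comparison. */
    last->valid = true;
    last->gpa = gpa;
    last->qualification = qualification;

    if (repeat) {
        uint64_t cur = *epte;
        *epte = cur;  /* spurious store of the identical value */
    }
    return repeat;
}
```

The pointer is volatile so that the compiler does not drop the store of an unchanged value. In a real hypervisor the record would also have to be cleared whenever a different exit occurs in between, which is left out here.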
Thus we believe there might be a bug (a race?) on the VMware side, and that in some cases the original EPTE update is lost. The problem typically happens while "Fred" is booting, and it reproduces in about 1 in 100 reboots.