After installing Oracle Linux in a VM on an ESXi 6.7 U3 host, I ran into this issue:
ESXi 6.7 U3 crashes with a purple screen (PSOD) whenever a new Oracle Linux 7 U9 VM is booted with the UEK kernel.
When I boot the same VM with the RHCK kernel, everything runs fine.
Has anyone seen something like this?
Yeah, I've never seen anything like it before.
Our support contract has expired, and I'm trying to renew it.
This weekend I tried migrating the .vmdk to a datastore on a different storage system, and the issue happened again.
So I migrated the .vmdk to a local datastore on the server and, magically, nothing went wrong. I then migrated the .vmdk back to the storage-system datastore and the error came back. To finish off my weekend, I left the .vmdk on the local datastore, and the VM has been up ever since without any problems.
So that does seem to confirm that the PSOD prints relevant context, i.e. that it is related to the qfle3i driver. Check whether this happens with the most up-to-date async driver too; I saw some potentially related issues, but I can't say anything definitive without a dump.
Is this really the same VM and the only difference is the kernel used? I.e. same VMX / disks etc.?
Just for some extra "search-ability" and to explain further: the PSOD was caused by an NMI (non-maskable interrupt), so most likely some device detected a fatal fault and crashed the host. In most scenarios those are caused by hardware issues, but they can originate in software too. I've found a similar failure pattern reported before, only one or maybe two instances, so it's pretty rare. I can't be sure, though, because that would require looking at the actual core dump. That being said, what I found _might_ be addressed in the latest async driver for 6.7 and _probably_ in the inbox driver for 7.0 (U1).
Yes, it's the same VM; when the boot menu appears, I can choose which kernel to use.
And yes, you were probably right about the qfle3i driver: after installing the VIB from this KB: https://kb.vmware.com/s/article/56357, my lab host ran the Oracle Linux VM on UEK without any problems.
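For anyone else hitting this, here's roughly how I checked the driver before and after applying the VIB. These are standard `esxcli` commands run from an SSH session on the host; the depot file name and datastore path are just examples, so substitute the bundle you actually downloaded from the KB:

```shell
# Check which qfle3i driver VIB is currently installed on the host
esxcli software vib list | grep -i qfle3

# Show details of the loaded qfle3i module (version, build info)
vmkload_mod -s qfle3i

# Put the host into maintenance mode before updating the driver
esxcli system maintenanceMode set --enable true

# Update the driver from the offline bundle (example path, use your own)
esxcli software vib update -d /vmfs/volumes/datastore1/qfle3-driver-bundle.zip

# Reboot the host for the new driver to take effect, then verify again
reboot
```

After the reboot, running the `vib list` command again should show the newer driver version. Remember to take the host out of maintenance mode afterwards.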
Now I need to schedule a maintenance window to remediate the prod hosts and check whether the issue is resolved.
For now, thank you so much for your idea and your help!
After applying the patch to the cluster nodes, I'll come back here and post updated feedback.
I think the issue in that KB is a different one from what you experienced, but as long as that updated driver also ships with a fix for what you saw, it doesn't really make a difference :-). Fingers crossed it's gone in prod too!