VMware Cloud Community
adelianurlinap
Contributor

VMware ESXi 7 on top of KVM: VM won't start

Hello, I have successfully deployed VMware ESXi 7 on top of KVM using CPU host-passthrough. SVM, hardware virtualization, etc. are already configured.

~ # esxcfg-info | grep "HV"
         |----HV Support............................................3

But the VM on top of ESXi cannot start, and fails with this log:

MONITOR PANIC: Invalid VMCB.
VMware Workstation unrecoverable error: (vcpu-0)
vcpu-0:Invalid VMCB.

Information:
- libvirt 8.0.0
- OS Kernel 4.18.0-516.el8.x86_64
- BIOS Model name: AMD EPYC 7401P 24-Core Processor

Any help would be much appreciated.

bluefirestorm
Champion

If I understand correctly, you are running ESXi as a VM inside the KVM hypervisor, and the VM inside ESXi is failing to start with the panic.

Did you add

vmx.allowNested = "TRUE"

to the vmx configuration of the VM that is being powered up inside ESXi?
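If it helps, here is a minimal sketch of setting that option from a shell instead of editing the file by hand. The helper name `set_vmx_option` is made up for illustration and is not a VMware tool. It assumes the VM is powered off, that `awk` is available, and that the key is spelled exactly as it appears in the file.

```shell
# Hypothetical helper: set (or replace) a key in a .vmx file.
# Assumption: keys are compared exactly as written, including case.
set_vmx_option() {
    vmx_file=$1
    key=$2
    value=$3
    tmp="${vmx_file}.tmp.$$"

    # Copy every line except an existing assignment of this key.
    awk -v k="$key" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # ignore leading whitespace
            split(line, parts, /[ \t]*=/)     # parts[1] is the key name
            if (parts[1] != k) print $0
        }
    ' "$vmx_file" > "$tmp" || return 1

    # Append the new assignment in the usual key = "value" form.
    printf '%s = "%s"\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$vmx_file"
}
```

A call would look like `set_vmx_option /path/to/your-vm.vmx vmx.allowNested TRUE`, where the path is a placeholder for the VM's own vmx file. ESXi may need to reload the VM's configuration before the change takes effect.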

adelianurlinap
Contributor

Hello! Thank you so much for your reply.

I already added

vmx.allowNested = "TRUE"

to the vmx configuration of the VM and to /etc/vmware/config on the ESXi host. But the VM inside ESXi still fails to start with this log:


2023-11-30T05:12:05.119Z| vcpu-0| I005: AMD-V enabled.
2023-11-30T05:12:05.119Z| vcpu-2| I005: AMD-V enabled.
2023-11-30T05:12:05.120Z| vcpu-1| I005: AMD-V enabled.
2023-11-30T05:12:05.120Z| vcpu-3| I005: AMD-V enabled.
2023-11-30T05:12:05.139Z| vcpu-1| W003: MONITOR PANIC: vcpu-0:Invalid VMCB.
2023-11-30T05:12:05.139Z| vcpu-1| I005: Core dump with build build-16555998
2023-11-30T05:12:05.139Z| vcpu-2| I005: Exiting vcpu-2
2023-11-30T05:12:05.139Z| vcpu-0| I005: Exiting vcpu-0
2023-11-30T05:12:05.139Z| vcpu-3| I005: Exiting vcpu-3
2023-11-30T05:12:05.144Z| vcpu-1| I005: Writing monitor `vmmcores.gz`
2023-11-30T05:12:05.144Z| vcpu-1| W003: Dumping core for vcpu-0
2023-11-30T05:12:05.144Z| vcpu-1| I005: VMK Stack for vcpu 0 is at 0x45389ed13000
2023-11-30T05:12:05.144Z| vcpu-1| I005: Beginning monitor coredump
2023-11-30T05:12:05.746Z| vcpu-1| I005: End monitor coredump
2023-11-30T05:12:05.747Z| vcpu-1| W003: Dumping core for vcpu-1
2023-11-30T05:12:05.747Z| vcpu-1| I005: VMK Stack for vcpu 1 is at 0x45389cb13000
2023-11-30T05:12:05.747Z| vcpu-1| I005: Beginning monitor coredump
2023-11-30T05:12:06.074Z| mks| W003: Panic in progress... ungrabbing
2023-11-30T05:12:06.074Z| mks| I005: MKS: Release starting (Panic)
2023-11-30T05:12:06.074Z| mks| I005: MKS: Release finished (Panic)
2023-11-30T05:12:06.322Z| vmx| I005: MKSVMX: Vigor requested a screenshot
2023-11-30T05:12:06.322Z| svga| I005: MKSScreenShotMgr: Taking a screenshot
2023-11-30T05:12:06.359Z| vcpu-1| I005: End monitor coredump
2023-11-30T05:12:06.360Z| vcpu-1| W003: Dumping core for vcpu-2
2023-11-30T05:12:06.360Z| vcpu-1| I005: VMK Stack for vcpu 2 is at 0x453886393000
2023-11-30T05:12:06.360Z| vcpu-1| I005: Beginning monitor coredump
2023-11-30T05:12:06.733Z| vmx| I005: MKSVMX: Vigor requested a screenshot
2023-11-30T05:12:06.733Z| svga| I005: MKSScreenShotMgr: Taking a screenshot
2023-11-30T05:12:06.921Z| vmx| I005: MKSVMX: Vigor requested a screenshot
2023-11-30T05:12:06.922Z| svga| I005: MKSScreenShotMgr: Taking a screenshot
2023-11-30T05:12:06.967Z| vcpu-1| I005: End monitor coredump
2023-11-30T05:12:06.967Z| vcpu-1| W003: Dumping core for vcpu-3
2023-11-30T05:12:06.967Z| vcpu-1| I005: VMK Stack for vcpu 3 is at 0x4538a0013000
2023-11-30T05:12:06.967Z| vcpu-1| I005: Beginning monitor coredump
2023-11-30T05:12:07.570Z| vcpu-1| I005: End monitor coredump
2023-11-30T05:12:08.644Z| vcpu-1| W003: A core file is available in "/vmfs/volumes/65644445-1e6e8dbc-f8a3-525400b594e9/ubuntu-jammy-iso/vmx-zdump.000"
2023-11-30T05:12:08.644Z| vcpu-1| I005: Msg_Post: Error
2023-11-30T05:12:08.644Z| vcpu-1| I005: [msg.log.error.unrecoverable] VMware ESX unrecoverable error: (vcpu-1)
2023-11-30T05:12:08.644Z| vcpu-1| I005+ vcpu-0:Invalid VMCB.
2023-11-30T05:12:08.644Z| vcpu-1| I005: [msg.panic.haveLog] A log file is available in "/vmfs/volumes/65644445-1e6e8dbc-f8a3-525400b594e9/ubuntu-jammy-iso/vmware.log".
2023-11-30T05:12:08.644Z| vcpu-1| I005: [msg.panic.requestSupport.withoutLog] You can request support.
2023-11-30T05:12:08.644Z| vcpu-1| I005: [msg.panic.requestSupport.vmSupport.vmx86]
2023-11-30T05:12:08.644Z| vcpu-1| I005+ To collect data to submit to VMware technical support, run "vm-support".
2023-11-30T05:12:08.644Z| vcpu-1| I005: [msg.panic.response] We will respond on the basis of your support entitlement.
2023-11-30T05:12:08.644Z| vcpu-1| I005: ----------------------------------------

If I add

vmx.allowNested = TRUE
hv.assumeEnabled = TRUE

the log shows the same panic:

2023-11-30T05:12:05.139Z| vcpu-1| W003: MONITOR PANIC: vcpu-0:Invalid VMCB.
2023-11-30T05:12:08.644Z| vcpu-1| I005: [msg.log.error.unrecoverable] VMware ESX unrecoverable error: (vcpu-1)
2023-11-30T05:12:08.644Z| vcpu-1| I005+ vcpu-0:Invalid VMCB.

 

 

bluefirestorm
Champion

I am not familiar with hv.assumeEnabled or what it does (does it even apply to VMware ESXi?).

Anyway, also try adding (or modifying) this in the VM's vmx file:
vcpu.hotAdd = "FALSE"

Did you enable nested virtualisation at the KVM level? I am not familiar with KVM. But what does

cat /sys/module/kvm_amd/parameters/nested

show on the KVM host?

You should make sure that KVM allows nested virtualisation as well. Refer to the KVM documentation for that, such as https://docs.fedoraproject.org/en-US/quick-docs/using-nested-virtualization-in-kvm/ or whatever KVM documentation applies to your environment.
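As a rough sketch, the check above could be wrapped in a small function so it works for both AMD and Intel hosts. The function name `kvm_nested_state` and the optional second argument (an alternative sysfs root, only there to make it testable) are my own invention. I am assuming the parameter reads `1` or `Y` when nesting is on, which is what the KVM documentation describes.

```shell
# Hypothetical helper: report whether nested virtualisation is enabled
# for a given KVM module (kvm_amd or kvm_intel).
kvm_nested_state() {
    module=$1
    root=${2:-/sys/module}    # assumption: default sysfs location
    param="$root/$module/parameters/nested"

    # The file is missing when the module is not loaded.
    if [ ! -r "$param" ]; then
        echo "unknown"
        return 0
    fi

    case "$(cat "$param")" in
        1|Y|y) echo "enabled" ;;
        *)     echo "disabled" ;;
    esac
}
```

On the KVM host, `kvm_nested_state kvm_amd` prints `enabled`, `disabled`, or `unknown`. Note that `enabled` only says the module parameter is set; it does not prove that a particular guest hypervisor will run correctly on top of it.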

Also, next time please attach log(s) as file(s) instead of pasting them as text. It is quite difficult and annoying to navigate large blocks of log text. If you want to paste large log text, at least put it inside a "Spoiler" so it can be condensed/expanded as needed.
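For a quick look, the handful of lines that matter can also be pulled out of vmware.log with grep. This is only a sketch; the function name `panic_lines` is made up, and the patterns are simply the strings that appear in the log posted above.

```shell
# Hypothetical helper: print only the panic-related lines of a vmware.log.
# The patterns are taken from the messages seen in this thread.
panic_lines() {
    grep -E 'MONITOR PANIC|unrecoverable error|Invalid VMCB' "$1"
}
```

Running `panic_lines vmware.log` in the VM's directory would reduce the log to the panic and error messages, which is usually enough for a first post.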

 

bluefirestorm
Champion

Anyway, it looks like the "Invalid VMCB" panic is caused by a bug in newer Linux kernel versions that broke nested virtualisation of ESXi inside KVM on AMD CPUs.
See this:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2008583

adelianurlinap
Contributor

I've tried adding

vcpu.hotAdd = "FALSE"

but unfortunately, the VM still won't start and the logs remain the same as before:

[msg.log.error.unrecoverable] VMware ESX unrecoverable error: (vcpu-1)
vcpu-0:Invalid VMCB.

Yes, KVM allows nested virtualization. The server is running OpenStack on top of KVM, and the VMs on top of that are running correctly. So, I believe nested virtualization is configured correctly.

cat /sys/module/kvm_amd/parameters/nested
1

Regarding the bug you mentioned, I've seen that too, but they mention kernel versions 5 and 6, while my kernel is 4.18.0-516.el8.x86_64. Is it possible that the bug also exists in kernel version 4?

Anyway thank you for your response and help.

bluefirestorm
Champion

Different Linux distros use different numbering for their kernel versions. Even though the numbering differs, they share the same kernel code, or at least large chunks of it.
The reference to versions 5.19 to 6.x is for Ubuntu; these are the Ubuntu kernel version numbers that were affected by the bug. It mentions that Ubuntu's 5.15.x kernel is fine. If you scroll to the last post (27 Nov 2023), it mentions that CentOS 9 Stream with 5.14, which uses a different numbering from Ubuntu, has the same problem.

But basically it looks like a Linux kernel bug from 2021:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=174a921b6975ef959dd82e...

Given your kernel version (4.18.x), I assume that you are running RHEL 8.x. On Ubuntu, a 4.1x kernel would date from around 5 years ago, with Ubuntu 18.04. I don't know which RHEL kernel version is equivalent to Ubuntu's 5.15.
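To illustrate the numbering point, here is a small sketch that splits a kernel release string, as printed by `uname -r`, into the upstream base version and the distro-specific part. The function names are made up. The takeaway is only that the base number by itself does not tell you which fixes or regressions a distro kernel carries; the suffix identifies the distro build.

```shell
# Hypothetical helpers: split a `uname -r` string at the first hyphen.

# Upstream base version, e.g. 4.18.0
kernel_base() {
    printf '%s\n' "${1%%-*}"
}

# Distro-specific build suffix, e.g. 516.el8.x86_64 (empty if there is none)
kernel_distro_suffix() {
    case "$1" in
        *-*) printf '%s\n' "${1#*-}" ;;
        *)   printf '\n' ;;
    esac
}
```

With the release string from this thread, `kernel_base 4.18.0-516.el8.x86_64` gives `4.18.0` and `kernel_distro_suffix 4.18.0-516.el8.x86_64` gives `516.el8.x86_64`. The `el8` part is what points to a RHEL 8 family kernel, so the distro's own changelog is the place to check for a given fix or regression, not the 4.18 number.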

 

adelianurlinap
Contributor

Got it, so this is most likely a kernel bug... I'll try to find another way. Thank you so much for your help!
