VMware Cloud Community
jsbowden
Contributor

What magic spell do I need to cast to pass an HPE H240 SAS controller to Windows Server 2012 R2/2016 and not have the VM hard lock?

I think the title conveys the message:

Version:

6.5.0 Update 2 (Build 8294253)

Model:

ProLiant ML350 Gen9

bluefirestorm
Champion

I assume you are referring to PCIe passthrough of the SAS controller card to the VM, and that the VM occasionally freezes/locks up.

From pages 6-7 of the ML350 Gen9 QuickSpecs,

https://h20195.www2.hpe.com/v2/getpdf.aspx/c04375628.pdf#page=6&zoom=auto,-271,226

it looks like the slots are assigned to specific CPUs.

So if the card is placed in slots 1-4, try restricting the VM's vCPUs to the logical processors in socket 1 (and to those in socket 2 for slots 5-9) using the Scheduling Affinity setting. For example, if the system has dual CPUs with 6 cores/12 threads each, set the affinity to 0-11 if the card is in slots 1-4 and to 12-23 if it is in slots 5-9; see the sketch below.
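
That setting ends up in the VM's .vmx as sched.cpu.affinity. A rough sketch for a 2-socket box with 6 cores/12 threads per socket (I believe the value is a comma-separated list of logical CPU numbers, so double-check the exact format on your build):

sched.cpu.affinity = "0,1,2,3,4,5,6,7,8,9,10,11" for a card in slots 1-4 (socket 1), or

sched.cpu.affinity = "12,13,14,15,16,17,18,19,20,21,22,23" for a card in slots 5-9 (socket 2).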

The theory I have in mind is that if there is no electrical path/circuitry between the slot and the bus while the VM is running on the "wrong" logical CPUs, it will fail to communicate with the card, causing the lock-up.

As for magic spells, you can try the Amazing Mumford's line "Ala peanut butter sandwiches" but YMMV :smileylaugh:

jsbowden
Contributor

Not occasionally.  On boot, like clockwork, the VM will hard lock and power off.  Windows Server 2012 R2 and Server 2016 both exhibit the same behavior.  I can install Linux onto the same VM and things work just fine.  But that doesn't solve my problem, as I need the device plugged into that controller to be available in Windows (an HP 6250 LTO-6 SAS drive).

jsbowden
Contributor

I don't know if or how much it matters, but here is the more complete version string:

You are running HPE Customized Image ESXi 6.5.0 Update 2 version 650.U2.10.3.0 released on June 2018 and based on ESXi 6.5.0 Update 2 Vmkernel Release Build 8294253.

pwolf
Enthusiast

The fact that the PCI slots are connected to a specific processor only matters if you have one processor installed; then you can only use half of the slots. As soon as both processor sockets are populated, each processor can access all PCI slots.

Regarding the other point: on ESXi 6.0.0 this configuration does work, with the difference that the tape drive is external and the controller has external SAS ports (an H241 HBA).

Does this behaviour occur at each boot of the VM? And at which point in the installation process does it occur?

What is the boot sequence of your Windows VM? Do you use EFI or BIOS virtual firmware?

jsbowden
Contributor

Every single time.  I even tried Windows 10 Enterprise 1709 out of curiosity and observed the same behavior.  This system is dual-processor, but I have all the expansion cards in the low-numbered slots just for convenience (the cable run on that side of the system, you know, exists).  I originally tried passing through the integrated 440ar controller and saw this behavior, but since that controller is explicitly not supported for Windows (HPE provides a driver, but does not support that configuration), I thought putting in a known-working controller and passing that through to the VM would solve my problem.  I really miss being able to pass a single device to a VM instead of having to pass the entire slot/controller (I have physical Windows servers with H24[0|1]s in them).

jsbowden
Contributor

To answer your other questions: the hard lock and VM power-off happen during initial kernel load.  The system is in UEFI mode (this is a hard security requirement and I cannot change it, as I am required to run Secure Boot).  The VM I'm attempting to add this to is third in the Autostart sequence.  I could change that, but ESXi ignores the change (pick your sequence carefully, because it's a known problem that you're then stuck with it).
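
For reference, on the .vmx side that should correspond to something like the following (assuming the 6.5 option names):

firmware = "efi"

uefi.secureBoot.enabled = "TRUE"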

jsbowden
Contributor

Log dump:

2018-10-30T20:44:19.342Z| vcpu-0| E105: PANIC: VERIFY bora/devices/pcipassthru/pciPassthru.c:913

2018-10-30T20:44:20.885Z| vcpu-0| W115: A core file is available in "/vmfs/volumes/5b4e1d21-74a72e02-9f45-5820b109d6ce/Test VM/vmx-zdump.002"

2018-10-30T20:44:20.886Z| mks| W115: Panic in progress... ungrabbing

2018-10-30T20:44:20.886Z| mks| I125: MKS: Release starting (Panic)

2018-10-30T20:44:20.886Z| mks| I125: MKS: Release finished (Panic)

2018-10-30T20:44:20.895Z| vcpu-0| I125: Writing monitor file `vmmcores.gz`

2018-10-30T20:44:20.899Z| vcpu-0| W115: Dumping core for vcpu-0

2018-10-30T20:44:20.899Z| vcpu-0| I125: CoreDump: dumping core with superuser privileges

2018-10-30T20:44:20.899Z| vcpu-0| I125: VMK Stack for vcpu 0 is at 0x43923fc93000

2018-10-30T20:44:20.899Z| vcpu-0| I125: Beginning monitor coredump

2018-10-30T20:44:21.663Z| vcpu-0| I125: End monitor coredump

2018-10-30T20:44:21.663Z| vcpu-0| W115: Dumping core for vcpu-1

2018-10-30T20:44:21.663Z| vcpu-0| I125: CoreDump: dumping core with superuser privileges

2018-10-30T20:44:21.663Z| vcpu-0| I125: VMK Stack for vcpu 1 is at 0x43923fe93000

2018-10-30T20:44:21.663Z| vcpu-0| I125: Beginning monitor coredump

2018-10-30T20:44:21.888Z| mks| W115: Panic in progress... ungrabbing

2018-10-30T20:44:21.888Z| mks| I125: MKS: Release starting (Panic)

2018-10-30T20:44:21.888Z| mks| I125: MKS: Release finished (Panic)

2018-10-30T20:44:22.309Z| vcpu-0| I125: End monitor coredump

2018-10-30T20:44:22.310Z| vcpu-0| W115: Dumping core for vcpu-2

2018-10-30T20:44:22.310Z| vcpu-0| I125: CoreDump: dumping core with superuser privileges

2018-10-30T20:44:22.310Z| vcpu-0| I125: VMK Stack for vcpu 2 is at 0x43923ff13000

2018-10-30T20:44:22.310Z| vcpu-0| I125: Beginning monitor coredump

2018-10-30T20:44:22.890Z| mks| W115: Panic in progress... ungrabbing

2018-10-30T20:44:22.890Z| mks| I125: MKS: Release starting (Panic)

2018-10-30T20:44:22.890Z| mks| I125: MKS: Release finished (Panic)

2018-10-30T20:44:22.907Z| vcpu-0| I125: End monitor coredump

2018-10-30T20:44:22.908Z| vcpu-0| W115: Dumping core for vcpu-3

2018-10-30T20:44:22.908Z| vcpu-0| I125: CoreDump: dumping core with superuser privileges

2018-10-30T20:44:22.908Z| vcpu-0| I125: VMK Stack for vcpu 3 is at 0x43923ff93000

2018-10-30T20:44:22.908Z| vcpu-0| I125: Beginning monitor coredump

2018-10-30T20:44:23.505Z| vcpu-0| I125: End monitor coredump

2018-10-30T20:44:23.892Z| mks| W115: Panic in progress... ungrabbing

2018-10-30T20:44:23.892Z| mks| I125: MKS: Release starting (Panic)

2018-10-30T20:44:23.892Z| mks| I125: MKS: Release finished (Panic)

2018-10-30T20:44:24.266Z| vcpu-0| I125: Printing loaded objects

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6DC3603000-0x6DC46CD7A4): /bin/vmx

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E04CC8000-0x6E04CDF1CC): /lib64/libpthread.so.0

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E04EE5000-0x6E04EE6F00): /lib64/libdl.so.2

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E050E9000-0x6E050F1D08): /lib64/librt.so.1

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E05304000-0x6E0559920C): /lib64/libcrypto.so.1.0.2

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E057CA000-0x6E0583318C): /lib64/libssl.so.1.0.2

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E05A3E000-0x6E05B5237C): /lib64/libX11.so.6

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E05D59000-0x6E05D6801C): /lib64/libXext.so.6

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E05F69000-0x6E0604D341): /lib64/libstdc++.so.6

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E0626C000-0x6E062EC21C): /lib64/libm.so.6

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E064EF000-0x6E06503BC4): /lib64/libgcc_s.so.1

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E06705000-0x6E06865C74): /lib64/libc.so.6

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6DC4AA7000-0x6DC4AC47D8): /lib64/ld-linux-x86-64.so.2

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E06A70000-0x6E06A8A634): /lib64/libxcb.so.1

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E06C8C000-0x6E06C8D95C): /lib64/libXau.so.6

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E0725C000-0x6E072F151C): /usr/lib64/vmware/plugin/objLib/upitObjBE.so

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E0750A000-0x6E076608D4): /usr/lib64/vmware/plugin/objLib/vsanObjBE.so

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E078F7000-0x6E0790BF94): /lib64/libz.so.1

2018-10-30T20:44:24.266Z| vcpu-0| I125: [0x6E07D56000-0x6E07D611D0): /lib64/libnss_files.so.2

2018-10-30T20:44:24.266Z| vcpu-0| I125: End printing loaded objects

2018-10-30T20:44:24.266Z| vcpu-0| I125: Backtrace:

2018-10-30T20:44:24.266Z| vcpu-0| I125: Backtrace[0] 0000006e098db480 rip=0000006dc3c816f7 rbx=0000006dc3c811f0 rbp=0000006e098db4a0 r12=0000000000000000 r13=0000000000000001 r14=0000000000000001 r15=0000000000000400

2018-10-30T20:44:24.266Z| vcpu-0| I125: Backtrace[1] 0000006e098db4b0 rip=0000006dc37c271c rbx=0000006e098db4d0 rbp=0000006e098db9b0 r12=0000006dc49260f0 r13=0000000000000001 r14=0000000000000001 r15=0000000000000400

2018-10-30T20:44:24.266Z| vcpu-0| I125: Backtrace[2] 0000006e098db9c0 rip=0000006dc38c1147 rbx=0000006dc5110ab0 rbp=0000006e098dba40 r12=0000000000000002 r13=0000000000000002 r14=0000000000000001 r15=0000000000000400

2018-10-30T20:44:24.266Z| vcpu-0| I125: Backtrace[3] 0000006e098dba50 rip=0000006dc38c11cf rbx=0000000000000003 rbp=0000006e098dbab0 r12=0000000000000000 r13=0000006dc5110b74 r14=0000006dc5110ab0 r15=0000006dc5110b10

2018-10-30T20:44:24.266Z| vcpu-0| I125: Backtrace[4] 0000006e098dbac0 rip=0000006dc38c1bb4 rbx=0000000000000002 rbp=0000006e098dbb10 r12=0000006dc5110ab0 r13=0000000000000001 r14=0000006e098dbb3c r15=0000006dc5110b58

2018-10-30T20:44:24.266Z| vcpu-0| I125: Backtrace[5] 0000006e098dbb20 rip=0000006dc391778c rbx=0000000000000031 rbp=0000006e098dbb70 r12=0000000000300004 r13=0000006e07fd7020 r14=0000000000000000 r15=0000006dc51116e0

2018-10-30T20:44:24.266Z| vcpu-0| I125: Backtrace[6] 0000006e098dbb80 rip=0000006dc3bcf661 rbx=0000006dc4a3a2e8 rbp=0000006e098dbbb0 r12=0000006dc478d580 r13=0000000000000168 r14=0000006dc4eacbb0 r15=0000000000000000

2018-10-30T20:44:24.266Z| vcpu-0| I125: Backtrace[7] 0000006e098dbbc0 rip=0000006dc3bf7376 rbx=000000000000012d rbp=0000006e098dbc00 r12=0000006dc49260f0 r13=0000006dc4a31bc0 r14=0000006dc490bfe0 r15=0000000000000000

2018-10-30T20:44:24.266Z| vcpu-0| I125: Backtrace[8] 0000006e098dbc10 rip=0000006dc3bcf748 rbx=0000000000000000 rbp=0000006e098dbc20 r12=0000006e098dbc40 r13=0000006dc4f64a90 r14=0000006dc4cc5040 r15=0000000000000003

2018-10-30T20:44:24.266Z| vcpu-0| I125: Backtrace[9] 0000006e098dbc30 rip=0000006dc3c6cc25 rbx=0000006dc4a3b8e0 rbp=0000006e098dbd80 r12=0000006e098dbc40 r13=0000006dc4f64a90 r14=0000006dc4cc5040 r15=0000000000000003

2018-10-30T20:44:24.266Z| vcpu-0| I125: Backtrace[10] 0000006e098dbd90 rip=0000006e04ccfcfc rbx=0000000000000000 rbp=0000000000000000 r12=0000032579e69a40 r13=0000006e098dc9c0 r14=0000006dc4cc5040 r15=0000000000000003

2018-10-30T20:44:24.266Z| vcpu-0| I125: Backtrace[11] 0000006e098dbea0 rip=0000006e067d6ead rbx=0000000000000000 rbp=0000000000000000 r12=0000032579e69a40 r13=0000006e098dc9c0 r14=0000006dc4cc5040 r15=0000000000000003

2018-10-30T20:44:24.266Z| vcpu-0| I125: Backtrace[12] 0000006e098dbea8 rip=0000000000000000 rbx=0000000000000000 rbp=0000000000000000 r12=0000032579e69a40 r13=0000006e098dc9c0 r14=0000006dc4cc5040 r15=0000000000000003

2018-10-30T20:44:24.266Z| vcpu-0| I125: SymBacktrace[0] 0000006e098db480 rip=0000006dc3c816f7 in function (null) in object /bin/vmx loaded at 0000006dc3603000

2018-10-30T20:44:24.266Z| vcpu-0| I125: SymBacktrace[1] 0000006e098db4b0 rip=0000006dc37c271c in function (null) in object /bin/vmx loaded at 0000006dc3603000

2018-10-30T20:44:24.266Z| vcpu-0| I125: SymBacktrace[2] 0000006e098db9c0 rip=0000006dc38c1147 in function (null) in object /bin/vmx loaded at 0000006dc3603000

2018-10-30T20:44:24.266Z| vcpu-0| I125: SymBacktrace[3] 0000006e098dba50 rip=0000006dc38c11cf in function (null) in object /bin/vmx loaded at 0000006dc3603000

2018-10-30T20:44:24.266Z| vcpu-0| I125: SymBacktrace[4] 0000006e098dbac0 rip=0000006dc38c1bb4 in function (null) in object /bin/vmx loaded at 0000006dc3603000

2018-10-30T20:44:24.266Z| vcpu-0| I125: SymBacktrace[5] 0000006e098dbb20 rip=0000006dc391778c in function (null) in object /bin/vmx loaded at 0000006dc3603000

2018-10-30T20:44:24.266Z| vcpu-0| I125: SymBacktrace[6] 0000006e098dbb80 rip=0000006dc3bcf661 in function (null) in object /bin/vmx loaded at 0000006dc3603000

2018-10-30T20:44:24.266Z| vcpu-0| I125: SymBacktrace[7] 0000006e098dbbc0 rip=0000006dc3bf7376 in function (null) in object /bin/vmx loaded at 0000006dc3603000

2018-10-30T20:44:24.266Z| vcpu-0| I125: SymBacktrace[8] 0000006e098dbc10 rip=0000006dc3bcf748 in function (null) in object /bin/vmx loaded at 0000006dc3603000

2018-10-30T20:44:24.267Z| vcpu-0| I125: SymBacktrace[9] 0000006e098dbc30 rip=0000006dc3c6cc25 in function (null) in object /bin/vmx loaded at 0000006dc3603000

2018-10-30T20:44:24.267Z| vcpu-0| I125: SymBacktrace[10] 0000006e098dbd90 rip=0000006e04ccfcfc in function (null) in object /lib64/libpthread.so.0 loaded at 0000006e04cc8000

2018-10-30T20:44:24.267Z| vcpu-0| I125: SymBacktrace[11] 0000006e098dbea0 rip=0000006e067d6ead in function clone in object /lib64/libc.so.6 loaded at 0000006e06705000

2018-10-30T20:44:24.267Z| vcpu-0| I125: SymBacktrace[12] 0000006e098dbea8 rip=0000000000000000

2018-10-30T20:44:24.267Z| vcpu-0| I125: Msg_Post: Error

2018-10-30T20:44:24.267Z| vcpu-0| I125: [msg.log.error.unrecoverable] VMware ESX unrecoverable error: (vcpu-0)

2018-10-30T20:44:24.267Z| vcpu-0| I125+ VERIFY bora/devices/pcipassthru/pciPassthru.c:913

2018-10-30T20:44:24.267Z| vcpu-0| I125: [msg.panic.haveLog] A log file is available in "/vmfs/volumes/5b4e1d21-74a72e02-9f45-5820b109d6ce/Test VM/vmware.log".

2018-10-30T20:44:24.267Z| vcpu-0| I125: [msg.panic.requestSupport.withoutLog] You can request support.

2018-10-30T20:44:24.267Z| vcpu-0| I125: [msg.panic.requestSupport.vmSupport.vmx86]

2018-10-30T20:44:24.267Z| vcpu-0| I125+ To collect data to submit to VMware technical support, run "vm-support".

2018-10-30T20:44:24.267Z| vcpu-0| I125: [msg.panic.response] We will respond on the basis of your support entitlement.

2018-10-30T20:44:24.267Z| vcpu-0| I125: ----------------------------------------

2018-10-30T20:44:24.269Z| vcpu-0| I125: Exiting

pwolf
Enthusiast

So in fact you are adding this additional device to an already existing Windows Server 2016 VM? Did you try installing a fresh VM with the device already passed through before installation?

My working VM has a virtual BIOS and no Secure Boot.

jsbowden
Contributor

The install disk won't boot with the device passed through to the VM, so yes, I was attempting to add it after the fact.  What should have happened is that Windows would see it as a new device and either add a driver (if it has one) or list it as an unknown device.

Turns out that you cannot pass a device through to a Windows VM with the firmware set to UEFI.  This is apparently a known problem, though good luck finding any reference to it anywhere in the documentation.

I have a hard requirement for all systems to run UEFI and Secure Boot, so I'm not sure what I'm going to do yet.

bluefirestorm
Champion

Turns out that you cannot pass a device through to a Windows VM with the firmware set to UEFI.  This is apparently a known problem, though good luck finding any reference to it anywhere in the documentation.

I don't know where you read that it is a known problem to pass PCIe devices through to Windows VMs with virtual UEFI. To be frank, that is false. There are numerous instances of PCIe passthrough of GPUs to Windows VMs with virtual UEFI (whether VMware HCL GPUs such as the Nvidia Tesla or non-qualified consumer GPU cards). In some cases, using virtual EFI resolves problems that a VM with virtual BIOS and PCIe passthrough would encounter (e.g. the VM fails to boot, or can only boot with less than 32GB of virtual RAM, etc.).

Since the VM is using virtual UEFI, you could try setting the BAR addresses for PCIe devices above the 4GB address range (if you have not already done so) by setting

pciPassthru.use64bitMMIO = "TRUE"
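
There is also pciPassthru.64bitMMIOSizeGB for sizing that 64-bit MMIO window, though I have mostly seen it matter for large-BAR GPUs rather than SAS HBAs, so treat the value below as a guess:

pciPassthru.64bitMMIOSizeGB = "64"

Both go into the VM's advanced configuration parameters in the UI, or directly into the .vmx with the VM powered off.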

If that does not resolve the problem, it would be better to raise a support case with VMware, as from the partial vmware.log text you pasted it looks like the vmx process crashed.

jsbowden
Contributor

You are correct; going back and looking through all the other posts I can find with this problem, they all involve storage controllers, but apparently it's not a huge secret that passing storage controllers through in EFI mode is broken.

I already have that pciPassthru option set.

I also set DiskMaxIOSize to the smallest value reported by any of the storage controllers on the system (command below).  This did not help.  There is apparently no solution for this.
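
For anyone following along, that is the host-wide advanced setting; I set it with something along the lines of the following, where the value is in KB and should match whatever the smallest controller in your system reports (4096 here is only an example):

esxcli system settings advanced set -o /Disk/DiskMaxIOSize -i 4096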
