Contributor

ESXi 6.5, pci.ids and PCI passthrough with a PCI bridge

Hi, I am trying to configure an ESXi 6.5 host to use some National Instruments PCI cards in a virtualized environment. I have had a lot of problems so far, caused by the design of my system: the computer on which ESXi is installed is also the client machine used to run some P2V VMs that access National Instruments PCI cards through passthrough.

The setup works fine: I have access to ESXi, my master VM, and my P2V VMs on the same machine (and the same screen).
I'm using a PCIe-Q370-r11 motherboard with a PXE-13S-r50 backplane (8 PCI slots divided between two bridges, 4 PCIe slots).
Currently I'm using 4 PCI and 2 PCIe slots:
- Quadro P2000 GPU and USB controller, passed through to the master VM

- 4 National Instruments PCI cards, passed through to the P2V VM.

Problem: passthrough works fine on all VMs, and I can start all of them. But in the P2V VMs, the PCI cards don't work. Windows knows they're there, the drivers seem to be installed correctly, and the National Instruments software also detects them correctly.

When I run lspci on the ESXi host, those cards are marked as 'Class FF00' (Unassigned class). They are all behind the same PCI bridge. The card names are correct, and their IDs seem OK too. But the cards' serial numbers won't show up in my VM.
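
For context, the "Class FF00" value printed by lspci comes from the class-code register in each card's own PCI configuration space; pci.ids only turns those numbers into names. A minimal Python sketch of that decoding, assuming the usual split of the 16-bit value into a base class (high byte) and subclass (low byte); the name table is a small hand-picked excerpt of pci.ids, not the whole file:

```python
# Hand-picked excerpt of base-class names as they appear in pci.ids.
BASE_CLASS_NAMES = {
    0x01: "Mass storage controller",
    0x02: "Network controller",
    0x03: "Display controller",
    0x06: "Bridge",
    0x0C: "Serial bus controller",
    0xFF: "Unassigned class",  # PCI spec: device does not fit any defined class
}


def decode_class(class_code: int) -> tuple[int, int]:
    """Split a 16-bit PCI class value (as printed by lspci) into (base, subclass)."""
    return (class_code >> 8) & 0xFF, class_code & 0xFF


def describe(class_code: int) -> str:
    """Return the pci.ids-style name of the base class, or a hex fallback."""
    base, _subclass = decode_class(class_code)
    return BASE_CLASS_NAMES.get(base, f"base class 0x{base:02X}")


if __name__ == "__main__":
    print(describe(int("FF00", 16)))  # -> Unassigned class
```

In other words, editing pci.ids would only change the label that gets printed, not the value the card itself reports.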


I don't understand why it's behaving like this.
The PCI cards are not unknown to ESXi, but for some reason they are not used correctly by the VMs.

Maybe a problem with the PCI bridge's drivers? Or an incorrect pci.ids?

If anybody has had this issue before, I'd be happy to hear about it!

Thanks for reading the entire post despite my terrible English.

Virtuoso

Are all 4 PCI cards assigned to the same VM? Or are they assigned to different VMs (i.e. 1 card each in 1 VM)?

The following text is from https://kb.vmware.com/s/article/2142307

You can read the other details in that article, but I've quoted below the part that might be applicable to your scenario.

PCI Functions behind legacy PCI Bridges

VMware strongly recommends that PCI Functions assigned for VMDirectPath I/O be placed behind PCI Express root ports or switch downstream ports.

VMware discourages VMDirectPath I/O assignment of PCI Functions behind conventional PCI bridges or PCIe-to-PCI/PCI-X bridges. PCI Functions behind PCIe to PCI/PCI-X bridges or PCI conventional bridges must be collectively assigned for VMDirectPath I/O to the same virtual machine.

These bridges take ownership of PCI transactions sent by PCI Functions behind them by placing the bridge’s PCI requester ID on the transactions. This forces the ESXi host to program IOMMU translations using the PCI Bridge’s requester ID, implying that all PCI Functions behind the bridges must be placed in the same IOMMU domain and therefore be collectively assigned to the same virtual machine.
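
The grouping rule quoted above can be expressed as a quick consistency check. A minimal Python sketch, assuming you have already written down which legacy bridge each PCI function sits behind and which VM (if any) it is passed through to; functions left on the host are treated as a separate owner, and all addresses in the example except 0000:03:0f.0 are hypothetical:

```python
from collections import defaultdict

HOST = "<host>"  # owner used for functions that are not passed through


def split_bridges(behind_bridge: dict[str, str], assigned_vm: dict[str, str]) -> list[str]:
    """Return bridges whose downstream PCI functions do not all go to one owner.

    behind_bridge: PCI function address -> address of the legacy bridge above it
    assigned_vm:   PCI function address -> VM name (missing means host-owned)
    """
    owners: dict[str, set[str]] = defaultdict(set)
    for function, bridge in behind_bridge.items():
        owners[bridge].add(assigned_vm.get(function, HOST))
    # Functions behind one bridge share its requester ID, so more than one owner is a conflict.
    return sorted(bridge for bridge, vms in owners.items() if len(vms) > 1)


if __name__ == "__main__":
    # Hypothetical topology: four NI cards behind one bridge, all given to one VM.
    topology = {f"0000:03:0{slot}.0": "0000:02:00.0" for slot in "cdef"}
    together = {function: "P2V-VM" for function in topology}
    print(split_bridges(topology, together))  # -> []
```

An empty list means every function behind each bridge ends up in the same VM, which is what the KB article requires.
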
Contributor

Hi, thanks for your answer.

Yes, all of these PCI cards are assigned to the same P2V VM, as they were used by the original machine.

Unfortunately, I have seen this VMware quote many times during my research...

These PCI cards work in their original machine (32-bit Windows), and I also got them working with 32-bit Windows 10 (their drivers don't support 64-bit). So there is no problem when running Windows without virtualization.

I also did a P2V of the original machine, and later of the Windows 10 machine. Each time, the problems begin once they become VMs. Passthrough seems to work fine, as I have no problem starting them with the PCI cards attached. There is no visible problem at first sight: the drivers are correctly installed, as Device Manager and the National Instruments software confirm. But looking deeper, when I try to use the cards through the NI software, nothing happens.

I noticed that the serial numbers are not visible from the VM. The cards are detected, but for some reason it seems like ESXi is losing some information when it passes them through to the VM.

I'm currently looking for information about how to update the "pci.ids" file. Maybe it's totally useless, but I haven't tested it yet. I have also heard of the "simple.map" file, but I don't know what effect it could have.

Virtuoso

I don't know what you mean by pci.ids and simple.map. There should be a /etc/vmware/passthru.map file.

This is an old document (ESXi 4), https://www.vmware.com/pdf/vsp_4_vmdirectpath_host.pdf, but it details the passthru.map parameters. It also mentions some .vmx entries that might be necessary (e.g. switching to IOAPIC mode). Maybe you have already seen this old document.
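
For reference, the passthru.map on ESXi hosts describes each entry with four whitespace-separated columns (vendor ID, device ID, reset method, fptShareable), with ffff usable as a device-ID wildcard; check the comment header of the file on your own host before relying on this. A minimal Python sketch that parses such lines and looks up a device. The NI vendor ID 1093 is real, but the entry itself is a hypothetical example, not a recommended setting, and the first-match rule is an assumption of this sketch rather than documented ESXi behaviour:

```python
from dataclasses import dataclass
from typing import Optional

# Reset methods as listed in the passthru.map header comments (verify on your host).
RESET_METHODS = {"flr", "d3d0", "link", "bridge", "default"}
WILDCARD = 0xFFFF


@dataclass
class PassthruEntry:
    vendor_id: int
    device_id: int  # WILDCARD matches any device of this vendor
    reset_method: str
    fpt_shareable: str


def parse_passthru_map(text: str) -> list[PassthruEntry]:
    """Parse passthru.map-style text, skipping blank lines and # comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        vendor, device, reset, shareable = line.split()
        if reset not in RESET_METHODS:
            raise ValueError(f"unknown reset method {reset!r} in {raw!r}")
        entries.append(PassthruEntry(int(vendor, 16), int(device, 16), reset, shareable))
    return entries


def lookup(entries: list[PassthruEntry], vendor_id: int, device_id: int) -> Optional[PassthruEntry]:
    """Return the first entry matching the device (exact ID or wildcard)."""
    for entry in entries:
        if entry.vendor_id == vendor_id and entry.device_id in (device_id, WILDCARD):
            return entry
    return None


if __name__ == "__main__":
    sample = "# vendor device resetMethod fptShareable\n1093 ffff bridge false\n"
    print(lookup(parse_passthru_map(sample), 0x1093, 0x1234))
```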

Have you tried creating a new VM instead of using P2V? Is there any difference if the VM uses virtual EFI or virtual BIOS?

With P2V, some devices might become "ghost" devices in the VM, and it may be necessary to uninstall them from Device Manager (View > Show hidden devices, then uninstall).

Contributor

Hello there!

I tried with a new VM and with a P2V VM last week: same results.

Same thing with BIOS and EFI, on both the VM and the host.

There are no "ghost" devices here. As I said, all PCI devices are correctly displayed in Device Manager; they just don't work, as if the data cannot travel from the VM to the host and back when I try to use the cards with my software, as if it's trapped between the VM and the host.

Indeed, I hadn't seen this document before, but I had already tried almost everything it mentions.

The only thing I haven't tried is playing with the passthru.map file, as you said, so I guess I'll investigate that today.

I'll let you know if I find something interesting.

Contributor

So, I ran new tests today, using the "passthru.map" file and other things, but still with no effect. I found these lines in my ESXi logs, where the PCI card I'm currently testing (0000:03:0f.0) shows errors. If anyone has any idea what these lines mean...

2020-10-20T13:09:57.459Z| vmx| I125: PCIPassthru: Device 0000:01:00.0 barIndex 0 type 2 realaddr 0xa2000000 size 16777216 flags 0

2020-10-20T13:09:57.459Z| vmx| I125: PCIPassthru: Device 0000:01:00.0 barIndex 1 type 3 realaddr 0x90000000 size 268435456 flags 12

2020-10-20T13:09:57.459Z| vmx| I125: PCIPassthru: Device 0000:01:00.0 barIndex 3 type 3 realaddr 0xa0000000 size 33554432 flags 12

2020-10-20T13:09:57.459Z| vmx| I125: PCIPassthru: Device 0000:01:00.0 barIndex 5 type 1 realaddr 0x4000 size 128 flags 1

2020-10-20T13:09:57.459Z| vmx| I125: PCIPassthru: Device has PCI Express Cap Version 2(size 60)

2020-10-20T13:09:57.459Z| vmx| I125: PCIPassthru: Registered a PCI device for 0000:01:00.0 vIRQ 0x11, physical MSI = Enabled (vmmInt = Enabled), IntrPin = 1

2020-10-20T13:10:01.464Z| vmx| I125: PCIPassthru: Device 0000:01:00.1 barIndex 0 type 2 realaddr 0xa3080000 size 16384 flags 0

2020-10-20T13:10:01.464Z| vmx| I125: PCIPassthru: Device has PCI Express Cap Version 2(size 60)

2020-10-20T13:10:01.464Z| vmx| I125: PCIPassthru: Registered a PCI device for 0000:01:00.1 vIRQ 0x12, physical MSI = Enabled (vmmInt = Enabled), IntrPin = 2

2020-10-20T13:10:05.468Z| vmx| I125: PCIPassthru: Device 0000:03:0f.0 barIndex 0 type 2 realaddr 0xa3502000 size 4096 flags 0

2020-10-20T13:10:05.468Z| vmx| I125: PCIPassthru: Device 0000:03:0f.0 barIndex 1 type 2 realaddr 0xa3500000 size 8192 flags 0

2020-10-20T13:10:05.468Z| vmx| I125: PCIPassthru: PCI device 0000:03:0f.0 is marked wrong PCIe

2020-10-20T13:10:05.468Z| vmx| I125: PCIPassthru: Registered a PCI device for 0000:03:0f.0 vIRQ 0x13, physical MSI = Disabled (vmmInt = Disabled), IntrPin = 1

2020-10-20T13:10:09.471Z| vmx| I125: PCIPassthru: Device 0000:05:00.0 barIndex 0 type 3 realaddr 0xa3300000 size 4096 flags 4

2020-10-20T13:10:09.471Z| vmx| I125: PCIPassthru: Device has PCI Express Cap Version 2(size 60)

2020-10-20T13:10:09.471Z| vmx| I125: PCIPassthru: Registered a PCI device for 0000:05:00.0 vIRQ 0x10, physical MSI = Disabled (vmmInt = Disabled), IntrPin = 1

2020-10-20T13:10:10.473Z| vmx| I125: PCIPassthru: Device 0000:00:08.0 barIndex 0 type 3 realaddr 0xa3642000 size 4096 flags 4

2020-10-20T13:10:10.473Z| vmx| I125: PCIPassthru: PCI device 0000:00:08.0 is marked wrong PCIe

2020-10-20T13:10:10.473Z| vmx| I125: PCIPassthru: Registered a PCI device for 0000:00:08.0 vIRQ 0x11, physical MSI = Enabled (vmmInt = Enabled), IntrPin = 1

2020-10-20T13:10:10.474Z| vmx| I125: Ethernet0 MAC Address: 00:0c:29:60:19:c0

2020-10-20T13:10:10.474Z| vmx| I125: USB: Initializing 'xHCI' host controller

2020-10-20T13:24:49.249Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:03:0f.0

2020-10-20T13:24:51.771Z| mks| I125: SOCKET 5 (121) Received websocket close frame with empty status code

2020-10-20T13:24:51.771Z| mks| I125: SOCKET 5 (121) Sending websocket close frame, status code = 1000

2020-10-20T13:24:51.771Z| mks| I125: SOCKET 5 (121) VNC Remote Disconnect: socket closed.

2020-10-20T13:24:51.771Z| mks| I125: MKSControlMgr: Remove VNC connection 0

2020-10-20T13:24:57.255Z| vmx| I125: Tools_SetGuestResolution: Sending rpcMsg = Resolution_Set 1024 793

2020-10-20T13:24:57.272Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:05:00.0

2020-10-20T13:25:05.279Z| svga| I125: MKSScreenShotMgr: Taking a screenshot

2020-10-20T13:25:05.294Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:01:00.0

2020-10-20T13:25:13.302Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:01:00.1

2020-10-20T13:25:21.313Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:00:08.0

2020-10-20T13:25:23.333Z| vcpu-0| I125: SVGA: Registering MemSpace at 0xe8000000(0xe8000000) and 0xf9000000(0xf9000000)

2020-10-20T13:25:23.333Z| vcpu-0| I125: SVGA: Unregistering MemSpace at 0xe8000000(0xe8000000) and 0xf9000000(0xf9000000)

2020-10-20T13:25:23.365Z| vcpu-0| I125: SVGA: Registering IOSpace at 0x1040

2020-10-20T13:25:23.365Z| vcpu-0| I125: SVGA: Unregistering IOSpace at 0x1040

2020-10-20T13:25:23.366Z| vcpu-0| I125: AHCI: Tried to enable/disable IO space.

2020-10-20T13:25:23.366Z| vcpu-0| I125: PCIBridge4: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.366Z| vcpu-0| I125: pciBridge4:1: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.366Z| vcpu-0| I125: pciBridge4:2: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.366Z| vcpu-0| I125: pciBridge4:3: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.366Z| vcpu-0| I125: pciBridge4:4: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.366Z| vcpu-0| I125: pciBridge4:5: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.366Z| vcpu-0| I125: pciBridge4:6: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.366Z| vcpu-0| I125: pciBridge4:7: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.366Z| vcpu-0| I125: PCIBridge5: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.366Z| vcpu-0| I125: pciBridge5:1: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.367Z| vcpu-0| I125: pciBridge5:2: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.367Z| vcpu-0| I125: pciBridge5:3: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.367Z| vcpu-0| I125: pciBridge5:4: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.367Z| vcpu-0| I125: pciBridge5:5: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.367Z| vcpu-0| I125: pciBridge5:6: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.367Z| vcpu-0| I125: pciBridge5:7: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.367Z| vcpu-0| I125: PCIBridge6: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.367Z| vcpu-0| I125: pciBridge6:1: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.367Z| vcpu-0| I125: pciBridge6:2: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.367Z| vcpu-0| I125: pciBridge6:3: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.367Z| vcpu-0| I125: pciBridge6:4: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.367Z| vcpu-0| I125: pciBridge6:5: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.367Z| vcpu-0| I125: pciBridge6:6: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.367Z| vcpu-0| I125: pciBridge6:7: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.368Z| vcpu-0| I125: PCIBridge7: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.368Z| vcpu-0| I125: pciBridge7:1: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.368Z| vcpu-0| I125: pciBridge7:2: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.368Z| vcpu-0| I125: pciBridge7:3: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.368Z| vcpu-0| I125: pciBridge7:4: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.368Z| vcpu-0| I125: pciBridge7:5: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.368Z| vcpu-0| I125: pciBridge7:6: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.368Z| vcpu-0| I125: pciBridge7:7: ISA/VGA decoding enabled (ctrl 001C)

2020-10-20T13:25:23.372Z| vcpu-0| I125: SVGA: Registering IOSpace at 0x1040

2020-10-20T13:25:23.372Z| vcpu-0| I125: SVGA: Registering MemSpace at 0xe8000000(0xe8000000) and 0xf9000000(0xf9000000)

2020-10-20T13:25:23.375Z| svga| I125: SVGA enabling SVGA

2020-10-20T13:25:24.009Z| vcpu-0| I125: Tools: Running status rpc handler: 0 => 1.

2020-10-20T13:25:24.009Z| vcpu-0| I125: Tools: Changing running status: 0 => 1.
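
Pulling the BAR lines out of a long vmware.log by hand is error-prone. A minimal Python sketch that extracts the "PCIPassthru: Device ... barIndex ... realaddr ... size" lines shown above and renders each BAR as an address range; it deliberately ignores the type and flags fields, whose exact meaning is not documented in this thread, and the regular expression is written only against the lines quoted here:

```python
import re
from collections import defaultdict

# Matches lines like:
# PCIPassthru: Device 0000:03:0f.0 barIndex 0 type 2 realaddr 0xa3502000 size 4096 flags 0
BAR_RE = re.compile(
    r"PCIPassthru: Device (?P<dev>[0-9a-fA-F:.]+) barIndex (?P<bar>\d+) "
    r"type \d+ realaddr (?P<addr>0x[0-9a-fA-F]+) size (?P<size>\d+)"
)


def parse_bars(log_text: str) -> dict[str, list[tuple[int, int, int]]]:
    """Map each device to a list of (bar index, first address, last address)."""
    bars: dict[str, list[tuple[int, int, int]]] = defaultdict(list)
    for match in BAR_RE.finditer(log_text):
        start = int(match["addr"], 16)
        size = int(match["size"])
        bars[match["dev"]].append((int(match["bar"]), start, start + size - 1))
    return dict(bars)


def format_bars(bars: dict[str, list[tuple[int, int, int]]]) -> list[str]:
    """Render the parsed BARs as human-readable lines."""
    return [
        f"{dev} BAR{bar}: 0x{first:08x}-0x{last:08x} ({last - first + 1} bytes)"
        for dev, ranges in bars.items()
        for bar, first, last in ranges
    ]


if __name__ == "__main__":
    sample = ("vmx| I125: PCIPassthru: Device 0000:03:0f.0 barIndex 0 type 2 "
              "realaddr 0xa3502000 size 4096 flags 0")
    for line in format_bars(parse_bars(sample)):
        print(line)
```

To scan a real log, pass the file's full contents (for example from `open("vmware.log").read()`) to `parse_bars`.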

Virtuoso

2020-10-20T13:10:05.468Z| vmx| I125: PCIPassthru: PCI device 0000:03:0f.0 is marked wrong PCIe

From page 4 of that document, section "PCIPassthru Error Message":

the error is resolved by adding the following to the .vmx file

pciPassthruX.virtualDev = "pci"

where X is the index of the passthrough entry for the respective NI PCI device.
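
Since the change has to be repeated for every NI card behind the bridge, generating the lines helps avoid missing one. A minimal Python sketch; which pciPassthruN index belongs to which NI card has to be read from the VM's existing .vmx, so the indices in the example are hypothetical:

```python
def vmx_lines(indices: list[int], options: dict[str, str]) -> list[str]:
    """Build .vmx lines such as pciPassthru0.virtualDev = "pci" for each index."""
    return [
        f'pciPassthru{index}.{key} = "{value}"'
        for index in sorted(indices)
        for key, value in options.items()
    ]


if __name__ == "__main__":
    # Hypothetical: the four NI cards are pciPassthru2..pciPassthru5 in this VM.
    for line in vmx_lines([2, 3, 4, 5], {"virtualDev": "pci"}):
        print(line)
```

Other per-device keys can be emitted the same way by adding them to `options`.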

Contributor

Yes, I noticed it too; unfortunately, changing the option had no effect at all.

Virtuoso

Did the MMIO address change after changing the virtualDev to "pci"?

What is device 0000:01:00.0? Is that the Quadro P2000 GPU or some other device?

If you still have an EFI VM, you could try adding

pciPassthru.use64bitMMIO="TRUE"

I don't know whether this will have an effect, as the guest OS you mentioned is 32-bit Windows 10. If it does, it should at least move the PCIe MMIO addresses above 4 GB for the devices that are capable (the Quadro P2000 likely is), leaving more of the address range below 4 GB for the PCI devices.

For now, 0xa3500000 and 0xa3502000 are just above the 2.5 GB mark.

2020-10-20T13:10:05.468Z| vmx| I125: PCIPassthru: Device 0000:03:0f.0 barIndex 0 type 2 realaddr 0xa3502000 size 4096 flags 0

2020-10-20T13:10:05.468Z| vmx| I125: PCIPassthru: Device 0000:03:0f.0 barIndex 1 type 2 realaddr 0xa3500000 size 8192 flags 0
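
The "just above 2.5 GB" remark is easy to check, since 0xa0000000 is exactly 2.5 GiB. A minimal Python sketch converting the two BAR addresses of 0000:03:0f.0 from the log lines above and confirming both lie below the 4 GiB boundary:

```python
GIB = 1 << 30  # 1 GiB in bytes


def to_gib(address: int) -> float:
    """Express a physical address in GiB."""
    return address / GIB


if __name__ == "__main__":
    for address in (0xA3502000, 0xA3500000):
        print(f"0x{address:08x} = {to_gib(address):.3f} GiB, below 4 GiB: {address < 4 * GIB}")
    # Both lines print 2.552 GiB and True.
```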

Do you know what MMIO addresses were used in the physical PC?

You should be able to find out using:

msinfo32 -> Hardware Resources, or

HWiNFO32 or HWiNFO64 -> Bus -> System Resources

Contributor

Yes, 0000:01:00 is the P2000.

Setting pciPassthru.use64bitMMIO="TRUE" seems to have absolutely no effect.

I can find out the old MMIO addresses on the physical machine, but I know for sure that they are different, because the ESXi host's backplane uses PCIe-to-PCI bridges, which automatically remap the devices behind them.

virtualDev = "pci" does seem to have an effect: before this change, nothing happened when using the National Instruments software.

Now, doing the same thing, I get a beautiful Windows blue screen. So, although it's still not functional, it's a big step forward!

Virtuoso

For virtualDev = "pci", make sure you apply the change to all the NI PCI cards, since they are all behind the same PCIe-to-PCI bridge.

An alternative to 64-bit MMIO is to remove all other passthrough devices from the VM configuration except the NI PCI cards; maybe even try one PCI card at a time and see whether the allocated MMIO address is different.

The Quadro P2000 occupies a lot of address space (304 MB starting at 0x90000000):

2020-10-20T13:09:57.459Z| vmx| I125: PCIPassthru: Device 0000:01:00.0 barIndex 0 type 2 realaddr 0xa2000000 size 16777216 flags 0

2020-10-20T13:09:57.459Z| vmx| I125: PCIPassthru: Device 0000:01:00.0 barIndex 1 type 3 realaddr 0x90000000 size 268435456 flags 12

2020-10-20T13:09:57.459Z| vmx| I125: PCIPassthru: Device 0000:01:00.0 barIndex 3 type 3 realaddr 0xa0000000 size 33554432 flags 12

2020-10-20T13:09:57.459Z| vmx| I125: PCIPassthru: Device 0000:01:00.0 barIndex 5 type 1 realaddr 0x4000 size 128 flags 1
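
The 304 MB figure is the sum of the P2000's three memory BARs. A minimal Python sketch with the values copied from the log lines above; it also shows that they form one contiguous block ending at 0xa2ffffff, just below the NI card's BARs at 0xa3500000:

```python
MIB = 1 << 20

# Memory BARs of 0000:01:00.0 (Quadro P2000) copied from the log: bar -> (address, size).
# BAR5 (128 bytes at 0x4000) appears to be an I/O-port BAR and is left out.
P2000_MEMORY_BARS = {
    0: (0xA2000000, 16777216),
    1: (0x90000000, 268435456),
    3: (0xA0000000, 33554432),
}


def summarize(bars: dict[int, tuple[int, int]]) -> tuple[int, int, int]:
    """Return (total bytes, lowest address, highest address) covered by the BARs."""
    total = sum(size for _, size in bars.values())
    lowest = min(address for address, _ in bars.values())
    highest = max(address + size for address, size in bars.values()) - 1
    return total, lowest, highest


if __name__ == "__main__":
    total, lowest, highest = summarize(P2000_MEMORY_BARS)
    print(f"{total // MIB} MiB in 0x{lowest:08x}-0x{highest:08x}")  # 304 MiB in 0x90000000-0xa2ffffff
```

The span equals the total here only because the three BARs happen to be adjacent; with gaps between them, the covered range would be larger than the sum.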

Maybe removing it temporarily from the VM passthrough will let the NI PCI cards negotiate a different address range that the NI software would prefer.

Are there any configuration settings in the NI software regarding IRQ/MMIO addresses?

Contributor

Yes, I'm currently testing passthrough of a single NI PCI card on a new VM (a native 32-bit Windows 10 install, since I was able to confirm the cards work with this OS on the same machine that runs ESXi).

So I'm not using the P2000 today, just the PCI-5412 (0000:03:0f.0), with the results described in my last post.

I'm taking some screenshots now to show you more precisely what it looks like.

Contributor

So here are 4 screenshots:

- The NI-MAX software I'm using. This is where I get the BSOD, when I hit the "Reset" button

- The BSOD with its error code; it seems to be driver-related, but I don't know how yet

- The ESXi .vmx file for this VM

- The part of the ESXi log in which the BSOD appears

Still investigating the addresses!

Virtuoso

I assume the serial number FFFFFFFF is incorrect, whereas previously it was blank? So the BSOD appears when you click "Reset" in the NI software? And before, without virtualDev = "pci", there was no response?

The screenshot of the ESXi vmware.log is like a text version of the BSOD. It shows the bug check code (0x1E = KMODE_EXCEPTION_NOT_HANDLED) with exception 0xC0000005 (STATUS_ACCESS_VIOLATION) at address 0x8ec77db8, and the BSOD screen shows nisrcdk.dll as the trigger.

What is the BAR address when only this single NI card is in the VM?

Going back to the previous log:

2020-10-20T13:10:05.468Z| vmx| I125: PCIPassthru: Registered a PCI device for 0000:03:0f.0 vIRQ 0x13, physical MSI = Disabled (vmmInt = Disabled), IntrPin = 1

Did you also add

pciPassthruX.msiEnabled = "FALSE"

since, according to the log, the card is not using MSI.
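
The MSI state and virtual IRQ can be read straight from the "Registered a PCI device" lines. A minimal Python sketch; the regular expression is written against the log lines quoted in this thread and may need adjusting for other ESXi builds:

```python
import re
from typing import Optional

REG_RE = re.compile(
    r"Registered a PCI device for (?P<dev>\S+) vIRQ (?P<virq>0x[0-9a-fA-F]+), "
    r"physical MSI = (?P<msi>Enabled|Disabled)"
)


def parse_registration(line: str) -> Optional[dict]:
    """Return device, virtual IRQ (as an int) and physical MSI state, or None."""
    match = REG_RE.search(line)
    if match is None:
        return None
    return {
        "device": match["dev"],
        "virq": int(match["virq"], 16),
        "msi": match["msi"] == "Enabled",
    }


if __name__ == "__main__":
    line = ("PCIPassthru: Registered a PCI device for 0000:03:0f.0 vIRQ 0x13, "
            "physical MSI = Disabled (vmmInt = Disabled), IntrPin = 1")
    print(parse_registration(line))  # {'device': '0000:03:0f.0', 'virq': 19, 'msi': False}
```

Devices reported with `'msi': False` are the ones the msiEnabled = "FALSE" suggestion is aimed at, and the vIRQ comes out in decimal (0x13 = 19).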

Contributor

About your questions (serial number FFFFFFFF instead of blank, BSOD when clicking "Reset", and no response before virtualDev = "pci"): you got it.

Here are some new screenshots that may be useful.

I found something weird in Windows System Information: the driver that seems to trigger the BSOD has a strange file path (\??\c:\windows...). It's the first time I've seen this, and Google doesn't seem to know anything about it either.

Virtuoso

The question marks in the path are fine. If the path were a problem, the driver would not have loaded, so it could not have triggered a BSOD.

The question marks are explained at https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file (the "\??\" form shown in System Information is the closely related NT native path prefix, commonly seen in driver paths):

For file I/O, the "\\?\" prefix to a path string tells the Windows APIs to disable all string parsing and to send the string that follows it straight to the file system.

It looks like IRQ 19 in the VM matches the log (vIRQ 0x13), since hex 0x13 = decimal 19.

For the MMIO addresses, I think they should be listed under either I/O or Memory in msinfo32.

Did you get a chance to add pciPassthruX.msiEnabled = "FALSE" to the .vmx?

Contributor

OK, I was worried about this strange path.

Yes, msiEnabled is set to FALSE.

Here are the addresses for the PCI-5412:

- 0xFE2FD000 - 0xFE2FDFFF

- 0xFE2FE000 - 0xFE2FFFFF

Both ranges belong to the same device, and both are marked "Status = OK".

Virtuoso

The sizes match, but the addresses don't:

| vmx| I125: PCIPassthru: Device 0000:03:0f.0 barIndex 0 type 2 realaddr 0xa3502000 size 4096 flags 0

| vmx| I125: PCIPassthru: Device 0000:03:0f.0 barIndex 1 type 2 realaddr 0xa3500000 size 8192 flags 0

0xa3502000-0xa3502FFF    4K

0xa3500000-0xa3501FFF    8K

0xFE2FD000 - 0xFE2FDFFF    4K

0xFE2FE000 - 0xFE2FFFFF    8K
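
The comparison above can be scripted so that the ranges are paired by size rather than by listing order. A minimal Python sketch; note that the log values are labelled realaddr, which suggests host-side addresses, while msinfo32 shows the ranges as the guest sees them:

```python
def parse_range(text: str) -> tuple[int, int]:
    """Parse "0xFE2FD000 - 0xFE2FDFFF" into (first, last) addresses."""
    first, last = (int(part.strip(), 16) for part in text.split("-"))
    return first, last


def range_size(r: tuple[int, int]) -> int:
    """Number of bytes covered by an inclusive (first, last) range."""
    return r[1] - r[0] + 1


def pair_by_size(log_ranges: list[tuple[int, int]],
                 guest_ranges: list[tuple[int, int]]) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Pair ranges by ascending size (assumes both lists describe the same BARs)."""
    return list(zip(sorted(log_ranges, key=range_size), sorted(guest_ranges, key=range_size)))


if __name__ == "__main__":
    log_side = [(0xA3502000, 0xA3502FFF), (0xA3500000, 0xA3501FFF)]
    guest_side = [parse_range("0xFE2FD000 - 0xFE2FDFFF"), parse_range("0xFE2FE000 - 0xFE2FFFFF")]
    for log_r, guest_r in pair_by_size(log_side, guest_side):
        print(f"{range_size(log_r) // 1024}K: log 0x{log_r[0]:08X}, guest 0x{guest_r[0]:08X}, "
              f"size match: {range_size(log_r) == range_size(guest_r)}, "
              f"address match: {log_r[0] == guest_r[0]}")
```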

Is there any way to set the I/O address using the NI software?

Contributor

Unfortunately, there isn't.

Virtuoso

The last time I had to fiddle with I/O addresses for devices was about 20 years ago, so I don't know exactly how it works in Windows 10.

In Windows 10 Device Manager, it looks like the I/O address and/or IRQ can be changed on the Resources tab if the "Use automatic settings" box can be unchecked (the COM1 port seems changeable). Check whether that is available for the NI PCI card and, if so, match the address to what is shown in vmware.log.
