VMware Cloud Community
ciuly
Contributor

ESXi pci passthrough device limits

Hello,

I have an ESXi 5.1.0 host with 8 PCI passthrough devices configured:

- 2 go to one particular VM

- 4 to another

- 1 to a 3rd one and

- 1 to the 4th one.

According to https://kb.vmware.com/s/article/1010789 this should be all good.

The problem is that after powering up the first 2 VMs (6 passed-through devices in total), I can only power up (and have the device passed through OK to) one of the 3rd/4th VMs, but not both.

The device appears to be passed through and shows up in the guest OS, but I get various errors while the driver tries to load it. If I start only one of the two, all is good.

So the question: is there some way of having all 8 devices passed through OK? Something like the pciHole setting, or whatever other trickery would work.

In case it matters, the devices are: 1 video card (which shows up as 2 devices), 5 USB controllers (a mix of onboard and PCIe cards), and 1 onboard NIC.

There are also 4 mapped raw LUNs, in case that also counts towards something.

Thanks.

PS: I tried to upgrade to 6.0 when it came out, but it didn't recognize some card I had, so upgrading is not a real option here. I think I had the same issue with 5.5.

VMDirectPath with ATI GPU document https://docs.google.com/spreadsheet/ccc?key=0Aqp_xYBwP_Y7dE5EclhtaDdIV09lNWxfODd1alRUTlE
7 Replies
Finikiez
Champion

You have probably hit the limit on the amount of memory that can be used for PCI passthrough:

VMware Knowledge Base

ESXi 5.1 and 5.5

  • The maximum supported size for a single PCI BAR is 1GB.
  • The combined size of all the BARs in the PCI Function must not exceed 3.75GB.
  • The BAR space consumed by other PCI devices in the virtual machine further limits this, since the combined size of all PCI BARs in the virtual machine must be 3.75GB or less. On ESXi 5.1 and 5.5, virtual machines use legacy BIOS firmware, which maps BARs below the 4GB address boundary; the limits arise from BAR alignment requirements and memory reservations made by the BIOS.
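As a rough way to check a device against these limits, you can total the memory BAR sizes that a full `lspci -vvv` reports for it. Note this is a sketch assuming the Linux `lspci` output format (`Region N: Memory at ... [size=...]` lines); ESXi's bundled `lspci` does not print BAR regions, so you would run this against output captured from a Linux live image on the same hardware. The sample lines below are hypothetical.

```python
import re

# Unit suffixes used by lspci's [size=...] annotations.
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def total_bar_bytes(lspci_vvv_output):
    """Sum memory BAR sizes from Linux-style `lspci -vvv` output.

    Matches lines such as:
      Region 0: Memory at f0000000 (64-bit, prefetchable) [size=256M]
    I/O port regions are ignored, since only memory BARs count
    against the passthrough BAR-space limit.
    """
    total = 0
    for m in re.finditer(r"Region \d+: Memory at \S+ .*?\[size=(\d+)([KMG]?)\]",
                         lspci_vvv_output):
        size, unit = int(m.group(1)), m.group(2)
        total += size * UNITS.get(unit, 1)
    return total

sample = """\
Region 0: Memory at f0000000 (64-bit, prefetchable) [size=256M]
Region 2: Memory at f8000000 (64-bit, non-prefetchable) [size=128K]
"""
print(total_bar_bytes(sample))  # 268566528 bytes (256 MiB + 128 KiB)
```

Summing this per device, across all devices assigned to one VM, gives the figure to compare against the 3.75GB combined limit.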

Can you check the vmkernel log when you try to start the other VMs and work with the PCI cards?

bluefirestorm
Champion

I don't think it has anything to do with PCI holes and BAR sizes. PCI holes and BAR sizes are per VM; they are not shared between VMs.

I suspect the two devices in VM #3 and VM #4 are behind the same PCI bridge.

There is no -t option in ESXi's lspci (in Linux, the -t option gives a tree view of the topology). But you could check from the ESXi host client UI whether the two devices passed through to VM #3 and VM #4 are behind the same PCI bridge. Alternatively, you could look at the output of lspci -v or lspci -p and see whether the two devices have the same PCI bus number.

If they are behind the same bridge, you should assign both devices to the same VM. You could try adding the device from VM #3 to VM #4, or the device from VM #4 to VM #3, and see whether the passthrough is then OK (assuming there is no conflict between the two devices within the same VM), so that only VM #3 or only VM #4 has both devices.
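As a rough illustration of the bus-number check above, the first column of `lspci -p` (Segment:Bus:Device.Function) can be parsed and devices grouped by bus; two passthrough devices landing on the same non-zero bus are candidates for sitting behind the same bridge. This is a sketch assuming that column layout, with hypothetical sample lines:

```python
import re
from collections import defaultdict

# PCI address column as printed by `lspci -p`: Segment:Bus:Device.Function,
# e.g. "00:06:00.0". Header and continuation lines won't match and are skipped.
ADDR = re.compile(r"^([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-9a-f])")

def group_by_bus(lspci_p_output):
    """Map bus number -> list of device addresses found on that bus."""
    by_bus = defaultdict(list)
    for line in lspci_p_output.splitlines():
        m = ADDR.match(line.strip())
        if m:
            by_bus[int(m.group(2), 16)].append(m.group(0))
    return dict(by_bus)

sample = """\
00:00:19.0 8086:1503 1462:7751
00:06:00.0 1033:0194 ffff:ffff
"""
print(group_by_bus(sample))
# {0: ['00:00:19.0'], 6: ['00:06:00.0']} -> different buses
```

Devices on bus 00 hang directly off the root complex, while devices on other buses sit behind a PCIe root port or bridge, so same-bus devices are the ones worth double-checking.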

Mirekmal
Contributor

What is your motherboard? Some use shared PCIe lanes: some lanes are dedicated to slots, others are shareable, and once one slot uses them they become unavailable to the others. I'm not sure how ESXi handles this, but my guess is that once a VM initializes a particular lane, it is no longer available to others (if shared).

ciuly
Contributor

Thanks for the info. Any idea how I can calculate those BARs, or whether there is a log entry somewhere that tells me the BAR for each PCI device?

I don't think I've done much since I posted; I know I removed that device from the list, added it back, and rebooted the host a few times. The weird thing is that now, when I try to start the 4th VM, I get an error dialog with "Device 0:25.0 is not a passthrough device" (25 decimal is 0x19, so this is the device at 00:19.0).

The vmkernel.log entries for this operation are:

2018-03-07T15:31:16.256Z cpu5:14114)Config: 347: "SIOControlFlag2" = 1, Old Value: 0, (Status: 0x0)
2018-03-07T15:31:16.289Z cpu7:77105)MemSched: vm 77105: 7756: extended swap to 8192 pgs
2018-03-07T15:31:16.433Z cpu5:77105)World: vm 77106: 1421: Starting world vmm0:gateway with flags 8
2018-03-07T15:31:16.433Z cpu5:77105)Sched: vm 77106: 6416: Adding world 'vmm0:gateway', group 'host/user/pool2', cpu: shares=-1 min=-1 minLimit=-1 max=-1, mem: shares=-1 min=262144 minLimit=-1 max=-1
2018-03-07T15:31:16.433Z cpu5:77105)Sched: vm 77106: 6431: renamed group 122514 to vm.77105
2018-03-07T15:31:16.433Z cpu5:77105)Sched: vm 77106: 6448: group 122514 is located under group 870
2018-03-07T15:31:16.434Z cpu5:77105)MemSched: vm 77105: 7756: extended swap to 23117 pgs
2018-03-07T15:31:16.473Z cpu5:77105)VSCSI: 3781: handle 8219(vscsi0:0):Creating Virtual Device for world 77106 (FSS handle 879596)
2018-03-07T15:31:16.481Z cpu5:77105)VMKPCIPassthru: 4471: Can not set device 00:19.0 for passthrough
2018-03-07T15:31:16.487Z cpu1:77105)VSCSI: 6343: handle 8219(vscsi0:0):Destroying Device for world 77106 (pendCom 0)
2018-03-07T15:31:16.522Z cpu5:4956)Config: 347: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)

I removed the device from the VM, rebooted the VM (it boots OK), added the PCI device back to the VM, and the error still happens.

I cannot reboot the host today; I will try that tomorrow morning. But this error is weird, I've never received such an error before.

ciuly
Contributor

>> I suspect the two devices in VM #3 and VM #4 are behind the same PCI bridge.

Unlikely. The device in the 3rd is a USB PCIe card; the device in the 4th is the onboard LAN.

device info

3rd

00:06:00.0 USB controller Serial bus controller: NEC Corporation uPD720200 USB 3.0 Host Controller

         Class 0c03: 1033:0194

4th

00:00:19.0 Ethernet controller Network controller: Intel Corporation 82579V Gigabit Network Connection

         Class 0200: 8086:1503

# lspci -p
Se:Bu:De.F Vend:Dvid Subv:Subd ISA/irq/Vect P M Module       Name
                               Spawned bus
00:00:00.0 8086:0100 1462:7751 255/   /     @ V
00:00:01.0 8086:0101 0000:0000  11/ 11/0x78 A V              PCIe RP[00:00:01.0]
                                01
00:00:01.1 8086:0105 0000:0000  11/ 11/0x78 A V              PCIe RP[00:00:01.1]
                                02
00:00:14.0 8086:1e31 1462:7751  11/ 11/0x78 A V
00:00:16.0 8086:1e3a 1462:7751  11/ 11/0x78 A V
00:00:19.0 8086:1503 1462:7751   4/  4/0xa0 A P
00:00:1a.0 8086:1e2d 1462:7751  11/ 11/0x78 A P
00:00:1c.0 8086:1e10 0000:0000  11/ 11/0x78 A V              PCIe RP[00:00:1c.0]
                                03
00:00:1c.2 8086:1e14 0000:0000   5/  5/0x98 C V              PCIe RP[00:00:1c.2]
                                04
00:00:1c.3 8086:1e16 0000:0000   3/  3/0xa8 D V              PCIe RP[00:00:1c.3]
                                05
00:00:1c.4 8086:1e18 0000:0000  11/ 11/0x78 A V              PCIe RP[00:00:1c.4]
                                06
00:00:1c.5 8086:1e1a 0000:0000  10/ 10/0x88 B V              PCIe RP[00:00:1c.5]
                                07
00:00:1c.6 8086:1e1c 0000:0000   5/  5/0x98 C V              PCIe RP[00:00:1c.6]
                                08
00:00:1d.0 8086:1e26 1462:7751  11/ 11/0xb0 A P
00:00:1f.0 8086:1e44 1462:7751 255/   /     @ V
00:00:1f.2 8086:1e02 1462:7751   3/  3/0xa8 B V ahci         vmhba0
00:00:1f.3 8086:1e22 1462:7751   5/   /     C V
00:01:00.0 1002:5b62 1002:0b02  11/ 11/0x78 A V
00:01:00.1 1002:5b72 1002:0b03 255/   /     @ V
00:02:00.0 1002:6779 1462:2125  10/ 10/0xc8 A P
00:02:00.1 1002:aa98 1462:aa98   5/  5/0x31 B P
00:04:00.0 8086:10d3 8086:a01f   5/  5/0x98 A V e1000e       vmnic0
00:05:00.0 1b21:1042 174c:2104   3/  3/0xa8 A P
00:06:00.0 1033:0194 ffff:ffff  11/ 11/0x78 A P
00:07:00.0 1095:3531 1095:3531  10/ 10/0xd0 A P              vmhba1
00:08:00.0 1b21:0612 1462:7751   5/  5/0x98 A V ahci         vmhba32

ciuly
Contributor

Thanks for that idea, but the device that fails passthrough is the onboard NIC, which shouldn't be on any shared PCIe lane.

ciuly
Contributor

I have no explanation for what went wrong or how it got "fixed".

Basically, I just rebooted the host this morning, allowed all the VMs to power on and initialize as usual, and then started the 4th VM. All is good this time. I had done exactly the same thing quite a few times this past week.

Thank you all for your time.
