Hello,
I have an ESXi 5.1.0 host with 8 devices configured for PCI passthrough:
- 2 go to one particular VM
- 4 to another
- 1 to a 3rd one and
- 1 to the 4th one.
According to https://kb.vmware.com/s/article/1010789 this should be all good.
The problem is that once the first 2 VMs are powered up (6 passed-through devices in total), I can only power up one of the 3rd/4th VMs with its device passed through OK, but not both.
With both running, the devices appear to be passed through and show up in the guest OS, but I get various errors while the driver tries to load them. If I only start either one of the two, all is good.
So the question: is there some way of having all 8 devices passed through ok? Something like the pciHole setting or whatever other trickery that would work.
In case it matters, the devices are: 1 video card (which shows up as 2 devices), 5 USB controllers (mix of onboard and pcie cards) and 1 onboard NIC
There are also 4 mapped raw luns in case that also counts towards something.
Thanks.
PS I tried to upgrade to 6.0 when it came out but it didn't recognize some card I had, so upgrading is not a real option here. I think I had the same issue with 5.5.
Probably you hit the limit of memory you can use for PCI passthrough
ESXi 5.1 and 5.5
Can you check the vmkernel log when you try to start the other VMs and use the PCI cards?
I don't think it has anything to do with PCI holes and BAR sizes. PCI holes and BAR sizes are per VM. It is not shared between VMs.
I suspect the two devices in VM #3 and VM #4 are behind the same PCI bridge.
There is no -t option in the lspci of ESXi (in Linux the -t option gives a tree view output). But you could check from the ESXi host client UI whether the two devices that are passed through to VM#3 and VM#4 are behind the same PCI bridge. Alternatively, you could list out lspci -v or lspci -p and see whether the two devices in VM#3 and VM#4 have the same PCIe bus number.
If they are behind the same bridge, you should be assigning both devices to the same VM. You could try adding the device from VM #3 to VM #4, or the device from VM #4 to VM #3, so that only one of the two VMs has both devices, and see if the passthrough is OK (assuming there is no conflict between the two devices within the same VM).
What is your motherboard? Some use shared PCI slots, e.g. some PCI lanes are dedicated to slots and some are shareable, and if one slot uses the shared lanes they become unavailable to the others. Not sure how ESXi handles this, but I guess that once a VM initializes a particular lane it is no longer available to other VMs (if shared).
Thanks for the info. Any idea how I can calculate those BARs, or if there is some log entry somewhere that tells me the BAR for each PCI device?
I don't think I've done much since I posted. I know I removed that device from the list, added it back, and rebooted the host a few times. The weird thing is that now, when I try to start the 4th VM, I get an error dialog with "Device 0:25.0 is not a passthrough device"
the vmkernel.log entry for this operation is
2018-03-07T15:31:16.256Z cpu5:14114)Config: 347: "SIOControlFlag2" = 1, Old Value: 0, (Status: 0x0)
2018-03-07T15:31:16.289Z cpu7:77105)MemSched: vm 77105: 7756: extended swap to 8192 pgs
2018-03-07T15:31:16.433Z cpu5:77105)World: vm 77106: 1421: Starting world vmm0:gateway with flags 8
2018-03-07T15:31:16.433Z cpu5:77105)Sched: vm 77106: 6416: Adding world 'vmm0:gateway', group 'host/user/pool2', cpu: shares=-1 min=-1 minLimit=-1 max=-1, mem: shares=-1 min=262144 minLimit=-1 max=-1
2018-03-07T15:31:16.433Z cpu5:77105)Sched: vm 77106: 6431: renamed group 122514 to vm.77105
2018-03-07T15:31:16.433Z cpu5:77105)Sched: vm 77106: 6448: group 122514 is located under group 870
2018-03-07T15:31:16.434Z cpu5:77105)MemSched: vm 77105: 7756: extended swap to 23117 pgs
2018-03-07T15:31:16.473Z cpu5:77105)VSCSI: 3781: handle 8219(vscsi0:0):Creating Virtual Device for world 77106 (FSS handle 879596)
2018-03-07T15:31:16.481Z cpu5:77105)VMKPCIPassthru: 4471: Can not set device 00:19.0 for passthrough
2018-03-07T15:31:16.487Z cpu1:77105)VSCSI: 6343: handle 8219(vscsi0:0):Destroying Device for world 77106 (pendCom 0)
2018-03-07T15:31:16.522Z cpu5:4956)Config: 347: "SIOControlFlag2" = 0, Old Value: 1, (Status: 0x0)
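Side note: the dialog's 0:25.0 and the log's 00:19.0 look like the same device, if the dialog prints the device number in decimal and vmkernel.log prints it in hex (0x19 = 25). A minimal sketch of that conversion, assuming that is how the two are formatted; the helper name `dialog_to_lspci` is made up for illustration and segment 0 is an assumption:

```python
def dialog_to_lspci(addr, segment=0):
    """Convert a 'bus:dev.fn' address with decimal fields into the
    'seg:bus:dev.fn' hex form that lspci prints.

    Assumption: the error dialog uses decimal fields and the PCI
    segment is 0.
    """
    bus, rest = addr.split(":")
    dev, fn = rest.split(".")
    return f"{segment:02x}:{int(bus):02x}:{int(dev):02x}.{int(fn):x}"


print(dialog_to_lspci("0:25.0"))
```

If that reading is right, the device being refused is the onboard NIC at 00:00:19.0.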
I removed the device from the VM, rebooted the vm, boots ok, added the pci device back to the VM and still this error happens.
I cannot reboot the host today, will try that tomorrow morning. But this error is weird, I've never received such an error before.
>> I suspect the two devices in VM #3 and VM #4 is behind the same PCI bridge.
unlikely. The device in 3rd is a usb pcie card. The device in 4th is the onboard LAN.
device info
3rd
00:06:00.0 USB controller Serial bus controller: NEC Corporation uPD720200 USB 3.0 Host Controller
Class 0c03: 1033:0194
4th
00:00:19.0 Ethernet controller Network controller: Intel Corporation 82579V Gigabit Network Connection
Class 0200: 8086:1503
# lspci -p
Se:Bu:De.F Vend:Dvid Subv:Subd ISA/irq/Vect P M Module Name
Spawned bus
00:00:00.0 8086:0100 1462:7751 255/ / @ V
00:00:01.0 8086:0101 0000:0000 11/ 11/0x78 A V PCIe RP[00:00:01.0]
01
00:00:01.1 8086:0105 0000:0000 11/ 11/0x78 A V PCIe RP[00:00:01.1]
02
00:00:14.0 8086:1e31 1462:7751 11/ 11/0x78 A V
00:00:16.0 8086:1e3a 1462:7751 11/ 11/0x78 A V
00:00:19.0 8086:1503 1462:7751 4/ 4/0xa0 A P
00:00:1a.0 8086:1e2d 1462:7751 11/ 11/0x78 A P
00:00:1c.0 8086:1e10 0000:0000 11/ 11/0x78 A V PCIe RP[00:00:1c.0]
03
00:00:1c.2 8086:1e14 0000:0000 5/ 5/0x98 C V PCIe RP[00:00:1c.2]
04
00:00:1c.3 8086:1e16 0000:0000 3/ 3/0xa8 D V PCIe RP[00:00:1c.3]
05
00:00:1c.4 8086:1e18 0000:0000 11/ 11/0x78 A V PCIe RP[00:00:1c.4]
06
00:00:1c.5 8086:1e1a 0000:0000 10/ 10/0x88 B V PCIe RP[00:00:1c.5]
07
00:00:1c.6 8086:1e1c 0000:0000 5/ 5/0x98 C V PCIe RP[00:00:1c.6]
08
00:00:1d.0 8086:1e26 1462:7751 11/ 11/0xb0 A P
00:00:1f.0 8086:1e44 1462:7751 255/ / @ V
00:00:1f.2 8086:1e02 1462:7751 3/ 3/0xa8 B V ahci vmhba0
00:00:1f.3 8086:1e22 1462:7751 5/ / C V
00:01:00.0 1002:5b62 1002:0b02 11/ 11/0x78 A V
00:01:00.1 1002:5b72 1002:0b03 255/ / @ V
00:02:00.0 1002:6779 1462:2125 10/ 10/0xc8 A P
00:02:00.1 1002:aa98 1462:aa98 5/ 5/0x31 B P
00:04:00.0 8086:10d3 8086:a01f 5/ 5/0x98 A V e1000e vmnic0
00:05:00.0 1b21:1042 174c:2104 3/ 3/0xa8 A P
00:06:00.0 1033:0194 ffff:ffff 11/ 11/0x78 A P
00:07:00.0 1095:3531 1095:3531 10/ 10/0xd0 A P vmhba1
00:08:00.0 1b21:0612 1462:7751 5/ 5/0x98 A V ahci vmhba32
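To make the "same bus number" check less error-prone than reading the listing by eye, here is a rough sketch that groups the passthrough devices by bus. It assumes the Se:Bu:De.F line format shown above, and it assumes that a P in the M column marks a passthrough device and a V marks one owned by the vmkernel (that is my reading of the columns). The function names are made up for illustration:

```python
import re
from collections import defaultdict

# One device line: address, vendor:device, subvendor:subdevice,
# two IRQ/vector fields, interrupt pin, then the mode column (V or P).
LINE_RE = re.compile(
    r"^([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])\s+"
    r"([0-9a-f]{4}:[0-9a-f]{4})\s+[0-9a-f]{4}:[0-9a-f]{4}\s+"
    r"\S+\s+\S+\s+([A-D@])\s+([VP])\b",
    re.IGNORECASE,
)


def parse_lspci_p(text):
    """Return one dict per device line; other lines (headers,
    spawned bus numbers) are skipped."""
    devices = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        seg, bus, dev, fn, ids, _pin, mode = m.groups()
        devices.append({
            "addr": f"{seg}:{bus}:{dev}.{fn}",
            "bus": bus,
            "ids": ids,
            "passthrough": mode.upper() == "P",  # assumption: P = passthrough
        })
    return devices


def passthrough_by_bus(devices):
    """Map bus number -> addresses of passthrough devices on that bus."""
    by_bus = defaultdict(list)
    for d in devices:
        if d["passthrough"]:
            by_bus[d["bus"]].append(d["addr"])
    return dict(by_bus)


SAMPLE = """\
00:00:19.0 8086:1503 1462:7751 4/ 4/0xa0 A P
00:00:1c.4 8086:1e18 0000:0000 11/ 11/0x78 A V PCIe RP[00:00:1c.4]
06
00:04:00.0 8086:10d3 8086:a01f 5/ 5/0x98 A V e1000e vmnic0
00:06:00.0 1033:0194 ffff:ffff 11/ 11/0x78 A P
"""

print(passthrough_by_bus(parse_lspci_p(SAMPLE)))
```

On the listing above this puts 00:00:19.0 on bus 00 and 00:06:00.0 on bus 06 (spawned by root port 00:00:1c.4), so going by bus number alone the two devices do not look like they sit behind the same bridge.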
Thanks for that idea, but the device that fails passthrough is the onboard NIC. That shouldn't be on any shared PCIe lane.
I have no explanation for what went wrong or how it got "fixed".
Basically I just rebooted the host this morning, allowed all the VMs to power on and initialize as usual, and then started the 4th VM. All is good this time, even though I've done this exact thing quite a few times this past week.
Thank you all for your time.
