VMware Cloud Community
brayxu
Contributor

Passing through Tesla k80 Issue...

After adding an NVIDIA Tesla K80M as a PCI pass-through device to a guest VM, the guest failed to start.

Here is the related message in the vmware.log.

vmx| I120: PCIPassthru: total number of pages needed (4206592) exceeds limit (917504), failing
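
For scale, the two page counts in this message can be converted to sizes. The sketch below assumes 4 KiB pages, which the log does not state; pages_to_gib is just an illustrative helper name.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages (not stated in the log)


def pages_to_gib(pages: int) -> float:
    """Convert a page count to GiB under the page-size assumption above."""
    return pages * PAGE_SIZE_BYTES / 2**30


needed_gib = pages_to_gib(4206592)  # "total number of pages needed"
limit_gib = pages_to_gib(917504)    # "limit"

print(f"needed: {needed_gib:.2f} GiB, limit: {limit_gib:.2f} GiB")
# prints "needed: 16.05 GiB, limit: 3.50 GiB"
```

Under that assumption the device wants roughly 16 GiB of mappings against a limit of 3.5 GiB, which is why a small change to the 32-bit PCI hole cannot satisfy it.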

I have added pciHole.start = "2048" to the vmx file, but it has no effect.

Here is the vmx file and log file, thanks!

14 Replies
vmclouds
Enthusiast

Looks like a memory issue. As per the documentation, the processor and platform need to support I/O DMA remapping.  http://us.download.nvidia.com/Windows/Quadro_Certified/350.12/350.12-win8-win7-winvista-quadro-grid-... Can you please check that?

One more thing to try: set a memory reservation for the VM and then try again.

Regards, Rajn https://virtualtraces.wordpress.com/
Linjo
Leadership

There are a few things to try, but let's start with the simple ones:

Upgrade the BIOS on the host to the latest version.

Upgrade the firmware on the GPU to the latest version.

In the BIOS, look for a setting similar to “Enable >4G Decode”, “Enable 64-bit MMIO”, or “Above 4G Decoding”.

     It should be set to “Disabled”.

Install the latest ESXi patches.

Create a new VM using hardware version 10 and reserve all memory for the VM. (No need to edit the vmx file anymore.)

Install Windows 7.

Pass through the GPU.


If there are still problems, post the vmware.log again.

Best regards, Linjo Please follow me on twitter: @viewgeek If you find this information useful, please award points for "correct" or "helpful".
keshavkant
Enthusiast

Please check whether Nviaxxxxx is compatible with the current version of ESXi.

brayxu
Contributor

Thanks for http://us.download.nvidia.com/Windows/Quadro_Certified/350.12/350.12-win8-win7-winvista-quadro-grid-...

1. In this document, the K80 is listed as supported for device passthrough with ESXi.

2. In the "Known Issues": VMware • PCI I/O hole may need to be changed for Windows 64-bit VMs. Windows 64-bit VMs may require that you edit the VM configuration file to configure a larger PCI I/O hole for the GPU.

   I have set the PCI I/O hole to 2048, but that does not solve the problem.

brayxu
Contributor

Hi, what is Nviaxxxxx? Thanks!

brayxu
Contributor

Hi, the machine is a Dell R730, and its BIOS is the newest version.

In the BIOS, I set "Memory Mapped I/O above 4GB" to disabled. After that, the R730 could not boot successfully.

So I cannot go on to the next step. Thanks!

By the way, the R730 has two Tesla K80 cards installed. Each Tesla K80 has two GPU chips and 24GB of graphics memory.

Perhaps it cannot support two K80s or 24GB of memory?

brayxu
Contributor

Attached is the R730's error screen.

GeertUGent
Contributor

Hi brayxu,

Did you manage to solve this problem, or is it still open?

I have the same hardware (R730 + one Tesla K80) and I'm interested to hear if anyone has solved this problem by now (although I don't have high hopes after looking around on the internet for a very long time).

Thanks,

Geert.

JoshSimons
VMware Employee

A previous version of this post included advice to add two VMX file entries (efi.legacyBoot.enabled and efi.bootOrder) as part of the solution. These two settings should NOT be used. Instead, follow the directions below.

--------

You should be able to pass a single GPU (that is, half of a K80) to a VM running on ESX 6 by creating an EFI-bootable VM, doing an EFI installation of your guest OS, and then adding the following to the VM's VMX file.

pciPassthru.use64bitMMIO="TRUE"

Trying to pass more than one of these GPUs into the same VM will currently hit a platform memory limit and the VM will fail to boot. (NOTE: This limit has been removed in ESX 6.5).
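
As a minimal sketch of applying the pciPassthru.use64bitMMIO entry above, the helper below sets a key in the text of a VMX file, replacing an existing entry or appending a new one. The function name set_vmx_option and the sample VMX text are hypothetical and only for illustration; this is plain text manipulation, not a VMware API, and it assumes the VM is powered off while its VMX file is edited.

```python
def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key = "value"` set exactly once.

    Hypothetical helper: replaces an existing line for `key` (exact name
    match) or appends a new line if the key is not present.
    """
    new_line = f'{key} = "{value}"'
    out = []
    replaced = False
    for line in vmx_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name == key:
            if not replaced:  # keep one entry, drop any duplicates
                out.append(new_line)
                replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Illustrative input only, not a complete VMX file.
sample = 'firmware = "efi"\npciPassthru.use64bitMMIO = "FALSE"\n'
updated = set_vmx_option(sample, "pciPassthru.use64bitMMIO", "TRUE")
print(updated, end="")
```

Working on a copy of the VMX file and keeping the original as a backup is a sensible precaution with this kind of edit.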

A smaller card like the K2 does not have this issue: GPGPU Blog Entry

If the above does not work for you, send me email directly at "simons at vmware dot com". In either case, please share your experience with others on the thread.

And if you have any other questions about running HPC applications in a VMware environment, I'd be happy to hear from you directly.

If you are interested in learning more of what we've been doing related to HPC, you can check out our HPC entries on the VMware CTO blog site here: HPC Blog Entries

Josh Simons

High Performance Computing

Office of the CTO

VMware, Inc.

shumy
Contributor

Hello, I have the same issue. When I apply these lines to the vmx of a Windows 10 Pro VM, the machine no longer starts.

When using just pciPassthru.use64bitMMIO="TRUE" I can detect the new hardware in Windows 10, but the NVIDIA installation of "356.54-tesla-desktop-win10-64bit-international-whql.exe" never finishes.

israeldias
Contributor

This works, thanks for the tip.

asvinp
Contributor

Hi, I was wondering how you got it to work?

I have a Tesla P100 GPU and I'm trying to passthrough to a VM on ESXi 6 which is on a Dell PowerEdge R730.

Adding the parameter doesn't seem to work for me. The GPU can be added to the vSphere passthrough list (Advanced Settings). After that I set up a Win 10 VM, installed it with EFI, and used your parameter, "pciPassthru.use64bitMMIO". Windows sees an unknown 3D Video Controller (both before and after installing VMware Tools). The NVIDIA Tesla driver doesn't install: it says the version of Windows is not supported and the graphics card can't be found, even though the driver was from NVIDIA for Win 10 and the GPU was added as a PCI device to the VM. I'd truly appreciate any help, as I couldn't find much information online.

cpsaicleidos
Contributor

I'm running into this same issue.  After adding a Tesla V100 GPU, I get:

vmx| | I005: PCIPassthru: Device 0000:c8:00.0 barIndex 0 type 2 realaddr 0xe8000000 size 16777216 flags 0
vmx| | I005: PCIPassthru: Device 0000:c8:00.0 barIndex 1 type 3 realaddr 0x38d800000000 size 17179869184 flags 12
vmx| | I005: PCIPassthru: Device 0000:c8:00.0 barIndex 3 type 3 realaddr 0x38dc00000000 size 33554432 flags 12
vmx| | I005: PCIPassthru: Device has PCI Express Cap Version 2(size 60)
vmx| | I005: PCIPassthru: Registered a PCI device for 0000:c8:00.0 vIRQ 0x11, physical MSI = Enabled (vmmInt = Enabled), IntrPin = 1
vmx| | I005: PCIPassthru: total number of pages needed (4206592) exceeds limit (917504), failing
vmx| | I005: Module 'DevicePowerOn' power on failed.
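
The "pages needed" figure in this log can be reproduced from the three BAR sizes printed just above it. This is a small worked check, assuming 4 KiB pages (the log does not state the page size); the BAR sizes are copied from the log lines and the names are illustrative.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages (not stated in the log)

# BAR sizes in bytes, keyed by barIndex, copied from the log lines above.
bar_sizes_bytes = {
    0: 16777216,      # 16 MiB
    1: 17179869184,   # 16 GiB
    3: 33554432,      # 32 MiB
}


def pages_needed(sizes) -> int:
    """Total number of pages required to map all given BAR sizes."""
    return sum(size // PAGE_SIZE_BYTES for size in sizes)


total_pages = pages_needed(bar_sizes_bytes.values())
print(total_pages)  # prints 4206592, matching the log
```

Almost all of the total comes from the 16 GiB BAR at barIndex 1, so the limit of 917504 pages (3.5 GiB under the same assumption) is exceeded by that one region alone.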

Any ideas?

kaushik_ray
Contributor

You need to add another parameter for this to work; refer to the article below.

pciPassthru.64bitMMIOSizeGB = ????

https://earlruby.org/2022/02/calculating-the-value-for-64bitmmiosizegb/
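
The value is left as ???? above. As a rough sketch only: a commonly cited rule of thumb (not stated in this thread, so treat it as an assumption and check it against the linked article) is to add up the memory in GB of all GPUs passed to the VM and round up to the next power of two above that total. The function name mmio_size_gb is hypothetical.

```python
def mmio_size_gb(gpu_memory_gb: int, gpu_count: int = 1) -> int:
    """Smallest power of two strictly greater than the total GPU memory.

    Assumption: the rule of thumb "sum GPU memory in GB, then round up to
    the next power of two" is read here as strictly greater than the total.
    """
    total_gb = gpu_memory_gb * gpu_count
    size = 1
    while size <= total_gb:
        size *= 2
    return size


# Example: one 16 GB GPU, and two 16 GB GPUs in the same VM.
print(mmio_size_gb(16))      # prints 32
print(mmio_size_gb(16, 2))   # prints 64
```

Under that assumption, a single 16 GB GPU would give pciPassthru.64bitMMIOSizeGB = "32", used together with pciPassthru.use64bitMMIO = "TRUE".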
