VMware Cloud Community
benevida
Contributor

Can't Load NVIDIA 15.2 vGPU Driver in ESXi 8 Update 1

I’m trying to set up my lab.

Platform: AMD Ryzen Threadripper PRO 5955WX
GPU: NVIDIA RTX A5000
OS: VMware vSphere ESXi 8.0 Update 1

I signed up for an evaluation account and downloaded the drivers a month ago. I followed the guide to install the VIBs for the vGPU driver and the management daemon.

NVD_bootbank_NVD-VMware_ESXi_8.0.0_Driver_525.105.14-1OEM.800.1.0.20613240.vib
NVD_bootbank_nvdgpumgmtdaemon_525.105.14-1OEM.700.1.0.15843807.vib
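Before digging further, it can help to confirm that the two VIBs belong together. Below is a minimal Python sketch that splits each file name above into vendor, package name, driver version and release tag. It assumes the layout <vendor>_bootbank_<name>_<version>-<release>.vib that both names follow; parse_vib_filename and versions_match are hypothetical helpers, not part of any NVIDIA or VMware tool. Both files carry driver version 525.105.14. The release tags differ (1OEM.800.1.0.20613240 vs 1OEM.700.1.0.15843807), which appears to reflect the ESXi release each package was built against.

```python
import re
from typing import List, NamedTuple

# Assumed file-name layout: <vendor>_bootbank_<name>_<version>-<release>.vib
VIB_PATTERN = re.compile(
    r"^(?P<vendor>[^_]+)_bootbank_(?P<name>.+)_"
    r"(?P<version>\d+\.\d+\.\d+)-(?P<release>.+)\.vib$"
)


class VibInfo(NamedTuple):
    vendor: str
    name: str
    version: str
    release: str


def parse_vib_filename(filename: str) -> VibInfo:
    """Split a VIB file name into vendor, package name, driver version and release tag."""
    match = VIB_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized VIB file name: {filename}")
    return VibInfo(**match.groupdict())


def versions_match(filenames: List[str]) -> bool:
    """True when every VIB in the list carries the same driver version string."""
    return len({parse_vib_filename(f).version for f in filenames}) == 1


vibs = [
    "NVD_bootbank_NVD-VMware_ESXi_8.0.0_Driver_525.105.14-1OEM.800.1.0.20613240.vib",
    "NVD_bootbank_nvdgpumgmtdaemon_525.105.14-1OEM.700.1.0.15843807.vib",
]
for vib in vibs:
    info = parse_vib_filename(vib)
    print(info.name, info.version, info.release)
print("driver versions match:", versions_match(vibs))
```

This only checks the names; it says nothing about whether the driver actually loads.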

After installing both VIBs, I took the host out of maintenance mode and restarted it. To test the installation, I first ran '/etc/init.d/nvdGpuMgmtDaemon status' and got the expected output, 'daemon_nvdGpuMgmtDaemon is running'. Then I ran nvidia-smi.

I got the error: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."
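The two checks described above (daemon status, then nvidia-smi) can be wrapped in a small script so the result is easy to read after each reboot. This is a minimal Python sketch, assuming Python 3 on the host and the command paths from the post. collect and diagnose are hypothetical helpers, and the matched strings are copied from the messages quoted here, so output from other driver releases may differ.

```python
import subprocess
from typing import Tuple

# Strings taken from the messages quoted in this thread (assumed stable across releases).
DAEMON_OK = "daemon_nvdGpuMgmtDaemon is running"
SMI_DRIVER_ERROR = "communicate with the NVIDIA driver"


def collect() -> Tuple[str, str]:
    """Run the two checks from the post on the ESXi host; returns (daemon, nvidia-smi) output."""
    outputs = []
    for cmd in (["/etc/init.d/nvdGpuMgmtDaemon", "status"], ["nvidia-smi"]):
        # check=False: keep the captured text even if the command exits non-zero
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        outputs.append(result.stdout + result.stderr)
    return outputs[0], outputs[1]


def diagnose(daemon_status: str, smi_output: str) -> str:
    """Map the two outputs to a rough next step."""
    if DAEMON_OK not in daemon_status:
        return "management daemon not running: check the nvdgpumgmtdaemon VIB"
    if SMI_DRIVER_ERROR in smi_output:
        return "daemon up but driver unreachable: check SR-IOV in the BIOS and in ESXi"
    return "driver reachable: nvidia-smi responded"
```

On the host, print(diagnose(*collect())) would print one line. With the outputs quoted in this thread it returns the SR-IOV hint, which reflects the fix found in this thread rather than a general rule.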

I tried uninstalling and reinstalling the VIBs twice, but it didn't help. What should I do to troubleshoot this?

Thanks.


Accepted Solutions
benevida
Contributor

The deed is done! After more digging, I found that SR-IOV was not enabled in the BIOS. Enabling it there did not completely solve the problem: I also had to enable SR-IOV in the PCI device (hardware) configuration in ESXi. Once I enabled SR-IOV for the GPU I wanted and rebooted the host, presto, the error went away and nvidia-smi gave me what I needed. The only caveat is that I still get the 'NVIDIA: Device Groups generation failed.' alert on startup. Does anyone know how to fix that?
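The fix order described in this post can be written down as a small checklist. This is a hypothetical Python sketch, not an NVIDIA or VMware procedure: HostState and remaining_steps are made-up names, and the three flags simply record what was found in this thread (SR-IOV off in the BIOS, SR-IOV off for the GPU in ESXi, no reboot after the change).

```python
from typing import List, NamedTuple


class HostState(NamedTuple):
    bios_sriov: bool             # SR-IOV enabled in the system BIOS
    esxi_sriov: bool             # SR-IOV enabled for the GPU in ESXi's PCI device settings
    rebooted_since_change: bool  # host restarted after the last setting change


def remaining_steps(state: HostState) -> List[str]:
    """Return the outstanding steps, in the order that worked in this thread."""
    steps = []
    if not state.bios_sriov:
        steps.append("enable SR-IOV in the BIOS")
    if not state.esxi_sriov:
        steps.append("enable SR-IOV for the GPU in ESXi's PCI device configuration")
    # Any pending setting change (or an unrebooted earlier change) needs a reboot
    if steps or not state.rebooted_since_change:
        steps.append("reboot the host")
    steps.append("run nvidia-smi to confirm the driver responds")
    return steps
```

For a host where the BIOS setting is already on but the ESXi setting is not, remaining_steps(HostState(True, False, False)) returns the ESXi step, a reboot, and the nvidia-smi check.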
