I’m trying to set up my lab.
Platform: AMD Ryzen Threadripper PRO 5955WX
GPU: NVIDIA RTX A5000
OS: VMware vSphere ESXi 8.0 Update 1
I signed up for an evaluation account and downloaded the drivers a month ago. I followed the guide to install the VIBs for the vGPU driver and the management daemon.
NVD_bootbank_NVD-VMware_ESXi_8.0.0_Driver_525.105.14-1OEM.800.1.0.20613240.vib
NVD_bootbank_nvdgpumgmtdaemon_525.105.14-1OEM.700.1.0.15843807.vib
After installing both, I took the host out of maintenance mode and rebooted it. To test the install, I first ran ‘/etc/init.d/nvdGpuMgmtDaemon status’ and received the expected output ‘daemon_nvdGpuMgmtDaemon is running’. Then I ran nvidia-smi…
I got the error ‘NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.’
I tried uninstalling and reinstalling the VIBs twice, which didn’t help. What should I do to troubleshoot?
Thanks.
The deed is done! After more digging, I found that SR-IOV was not enabled in the BIOS. Enabling it there did not completely solve the problem: I also had to enable SR-IOV for the GPU in the PCI hardware configuration of ESXi. Once I had enabled SR-IOV for the video card I wanted and rebooted the system, presto, the error went away and nvidia-smi gave me what I needed. The only caveat is that I still get the 'NVIDIA: Device Groups generation failed.' alert on startup. Does anyone know how I can fix that?
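For anyone retracing these steps, here is a rough sketch of the checks I would run from the ESXi shell after the reboot. It assumes the VIB names contain "NVD" (as in the file names above) and that the kernel module shows up as "nvidia" in the module list. The check helper is a hypothetical wrapper of my own, not part of ESXi or the NVIDIA tools. It only reports whether each piece is in place; it does nothing about the 'Device Groups generation failed' alert.

```shell
#!/bin/sh
# Post-reboot sanity checks for the NVIDIA vGPU host driver on ESXi.
# "check" is a hypothetical helper: it runs a command quietly and
# prints OK or FAIL next to a label. A missing command counts as FAIL.
check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf 'OK    %s\n' "$label"
    else
        printf 'FAIL  %s\n' "$label"
    fi
}

# 1. The driver and management daemon VIBs are installed
#    (assumes both VIB names contain "NVD", case-insensitive).
check "NVIDIA VIBs installed" sh -c 'esxcli software vib list | grep -qi nvd'

# 2. The NVIDIA kernel module is loaded
#    (assumes the module is listed with "nvidia" in its name).
check "nvidia kernel module loaded" sh -c 'vmkload_mod -l | grep -qi nvidia'

# 3. The GPU is visible on the PCI bus.
check "GPU visible on PCI bus" sh -c 'lspci | grep -qi nvidia'

# 4. The management daemon reports that it is running.
check "management daemon running" /etc/init.d/nvdGpuMgmtDaemon status

# 5. nvidia-smi can talk to the driver (this was the failing step).
check "nvidia-smi reaches the driver" nvidia-smi
```

In my case steps 1 and 4 would have passed even before the fix, since the VIBs were installed and the daemon was running, while step 5 failed until SR-IOV was enabled in both the BIOS and ESXi. Running nvidia-smi on its own afterwards shows the full output.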