GregElement
Contributor

View 6.1 w/ Nvidia K1 Error booting up machine after Crash "the amount of Graphics resources available in the parent resource pool is Insufficient"

Hey guys,

We are running View 6.1 on ESXi 6.0 with NVIDIA GRID K1 cards in vGPU mode, using the K160Q profile. Today we had a strange issue.

One of the machines on the host crashed and needed to be rebooted, but we were unable to reboot it due to the following error:

An error was received from the ESX host while powering on VM xxxxxxxxx. Failed to start the virtual machine. Module DevicePowerOn power on failed.

Could not initialize plugin '/usr/lib64/vmware/plugin/libnvidia-vgx.so' for vGPU 'grid_k160q'. No graphics device is available for vGPU 'grid_k160q'.

Afterwards we tried to power off and power on a different virtual desktop on the same host and received the same error. The K1 GRID card was not at capacity: it has 8 available slots for desktops, and only 6 were in use.

Has anyone else seen anything like this, or have any idea how to fix it?

Thanks

8 Replies
Pooran98
Enthusiast

We are getting a similar error, but with a different GRID profile on our K1 GRID cards:

Power On virtual machine Could not initialize plugin '/usr/lib64/vmware/plugin/libnvidia-vgx.so' for vGPU 'grid_k140q'.

This occurred after updating ESXi from 6.0 to 6.0U1 and View from 6.1 to 6.2.1.

This error initially occurred on two of our hosts after we reinstalled ESXi on both. One host is now working fine, but the second is still throwing the same error.

We have an SR open with VMware.

rellis123
Enthusiast

We are looking at doing a proof-of-concept shortly using the K1 Grid cards, and this type of issue really, really worries me. We could not have an issue in production where desktops would not reboot.

Were you able to work around this issue by taking the affected machines out of K1 GRID virtualisation and reverting to standard VGA?

Did you get a resolution via support?

Any extra information would be really appreciated.

RichardEnerBank
Contributor

Had the same problem. Support was stumped, HP was stumped. Long story short, you have to use the "VMware vSphere 6 Desktop Host License" for this to work. Using "Enterprise" doesn't work because it doesn't list vGPU as a supported feature. This tripped us up quite a bit, since we had been using vDGA and vSGA for some time with no issue. We have View but not a very big environment, so we didn't have dedicated VDI hosts.

Hope this helps.

YeskeJA
Contributor

Any Support resolution to this?

We recently upgraded to Horizon View 6.2.1 with the NVIDIA GRID K1s (1.0) and driver v354.56. Dedicated VDI cluster with Hosts licensed under 'VMware vSphere 6 Desktop Host.(VMs)'.

Anything higher than a K120Q profile results in "the amount of Graphics resources available in the parent resource pool is Insufficient" and fails to start the VM.

erikl86
Contributor

Hi Greg,

I had the same problem. I reinstalled some servers for resource reasons. When I migrated a VM to a reinstalled host, I got the same error message you have when booting up the VM. I updated the ESXi host to the latest patch level and this solved the problem for me. I updated from 6.0.0 build 3073146 to 6.0.0 build 3380124.

Hope this works for you.

Erik

JJaX
Contributor

Same issue as you. I can use K120Q without any problems.

I tried creating a new template for K140Q and I get the same error on boot.

I am going to try upgrading to ESXi 6.0 U1b on a host to see if it resolves it.

Update:

Upgrading my host did not resolve my issue.

However, I learned that a GRID K1 has 4 GPUs and each GPU can only serve one type of vGPU profile or passthrough.

If you SSH (e.g. with PuTTY) to your host and run "nvidia-smi", you can see which VMs are using which GPU.

In my case, I had all 4 GPUs in use with VMs using a K120Q profile, so a VM using a K140Q profile would not boot.

I hope this helps.
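
To make the constraint above concrete, here is a minimal Python sketch of a GRID K1 board where each physical GPU can host only one vGPU profile type at a time. It is only a toy model, not how ESXi or the NVIDIA vGPU manager is implemented. The per-GPU limits in the table are assumptions that should be checked against Table 1 of the NVIDIA GRID vGPU User Guide, and the "least loaded GPU first" placement is a modelling assumption chosen so that VMs spread across all four GPUs, as in the situation described above. The names `PhysicalGpu` and `place_vm` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# ASSUMPTION: per-physical-GPU limits for GRID K1 profiles.
# Verify against Table 1 of the NVIDIA GRID vGPU User Guide.
MAX_VGPUS_PER_GPU: Dict[str, int] = {
    "grid_k120q": 8,
    "grid_k140q": 4,
    "grid_k160q": 2,
}


@dataclass
class PhysicalGpu:
    """One of the four physical GPUs on a GRID K1 board (toy model)."""
    profile: Optional[str] = None            # profile currently hosted, None if idle
    vms: List[str] = field(default_factory=list)

    def can_host(self, profile: str) -> bool:
        # An idle GPU accepts any profile; a busy GPU accepts only the
        # profile it already serves, and only while it has free slots.
        if self.profile is None:
            return True
        return self.profile == profile and len(self.vms) < MAX_VGPUS_PER_GPU[profile]


def place_vm(gpus: List[PhysicalGpu], vm: str, profile: str) -> Optional[int]:
    """Place a VM on the least loaded eligible GPU; return its index or None.

    ASSUMPTION: spreading VMs across GPUs is a modelling choice here.
    """
    candidates = [i for i, gpu in enumerate(gpus) if gpu.can_host(profile)]
    if not candidates:
        return None  # corresponds to "No graphics device is available"
    best = min(candidates, key=lambda i: len(gpus[i].vms))
    gpus[best].profile = profile
    gpus[best].vms.append(vm)
    return best


# Four K120Q desktops end up on four different GPUs...
board = [PhysicalGpu() for _ in range(4)]
placements = [place_vm(board, f"desktop-{n}", "grid_k120q") for n in range(4)]

# ...so a K140Q desktop has nowhere to go, even though the board is far from full.
blocked = place_vm(board, "desktop-k140q", "grid_k140q")
print(placements, blocked)
```

In this model the K140Q desktop would only power on after every K120Q VM on at least one physical GPU had been powered off or moved, which frees that GPU to take a different profile.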

ITVisionIT
Enthusiast

Hello,

I had a similar issue and noticed when running "nvidia-smi" that one of the GPUs was in an ERR state. I am going to attempt a reboot of the host to see if this clears it; if not, I will start to assume that one of the GPUs on the card is toast.
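
As a rough illustration of why a single GPU in an error state matters, the arithmetic below uses only numbers reported in this thread: a GRID K1 has 4 physical GPUs, and the original poster reported 8 K160Q slots per card, which works out to 2 per GPU. The function name `usable_vgpu_slots` is hypothetical, and this is not a confirmed diagnosis of the original problem, just a sketch of how capacity shrinks when one GPU drops out.

```python
def usable_vgpu_slots(physical_gpus: int, failed_gpus: int, vgpus_per_gpu: int) -> int:
    """Return how many vGPU slots remain when some physical GPUs are unavailable."""
    if not 0 <= failed_gpus <= physical_gpus:
        raise ValueError("failed_gpus must be between 0 and physical_gpus")
    return (physical_gpus - failed_gpus) * vgpus_per_gpu


K1_GPUS = 4                      # physical GPUs on one GRID K1, as noted in the thread
K160Q_PER_GPU = 8 // K1_GPUS     # 8 slots per card reported in the thread -> 2 per GPU

healthy = usable_vgpu_slots(K1_GPUS, 0, K160Q_PER_GPU)
degraded = usable_vgpu_slots(K1_GPUS, 1, K160Q_PER_GPU)
print(healthy, degraded)
```

With one GPU unavailable, the card would offer 6 K160Q slots instead of 8, so a host already running 6 such desktops would have no room for a seventh.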

iooy
Contributor

Hello

My environment uses GRID K2 on vSphere 6.5, but I think the cause is the same.

Please refer to the following KB:

“Could not initialize plugin '/usr/lib64/vmware/plugin/libnvidia-vgx.so' for vGPU "profile_name"” error when powering on the VM (2149193)

https://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=2149193&s...

Also see "Maximum vGPUs per GPU" and "Maximum vGPUs per Board" in "Table 1 GRID K1 Virtual GPU types", or "Figure 3 Example vGPU configurations on GRID K2", in the guide at the following URL:

http://images.nvidia.com/content/grid/pdf/GRID-vGPU-User-Guide.pdf
