VMware Horizon Community
ITVisionIT
Enthusiast

ESXi 7.0.1: NVIDIA GRID X.Org service stopping

Hello,

Quick overview of my Horizon View cluster environment, which contains 4 identical hosts:

UCSC-C240-M5SX
(2) NVIDIA Tesla T4 GPU in each host
ESXi - 7.0.1, 17168206
ESXi VIB from NVIDIA - 450.89

I am able to configure a VM with vGPU resources, and when I run "nvidia-smi" I see the allocation; everything appears to work as expected inside the VM. However, when I attempt a live vMotion I get the following error:

One or more devices (pciPassthru0) required by VM are not available on host.

If I shut down the VM, move it, and power it on, everything is fine. I see that the X.Org service crashes on the destination host a few minutes after it starts. As long as that service stays running (which it does not for long), live vMotion of VMs with vGPU resources assigned works.

I also noticed that the X.Org service is stopped and will not start on the other hosts in the cluster. I have tried uninstalling and reinstalling the VIB, which has not helped. I opened a case with VMware, and the tech said all the logs I sent look good. We have a session tomorrow to troubleshoot further, but I am at a loss right now. In the past, when the X.Org service would not start, the cause was the VIB, which needed to be reinstalled.
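
For anyone comparing hosts in a similar setup, here is a minimal sketch of how the per-host checks described above could be collected in one pass. It only builds the ssh command lines and does not run them. It assumes SSH is enabled on each ESXi host and that the X.Org init script lives at /etc/init.d/xorg; the host names and the helper name build_check_commands are hypothetical.

```python
import shlex

# Checks to run on each ESXi host.
# Assumption: SSH is enabled and these commands are present on your build.
HOST_CHECKS = [
    "/etc/init.d/xorg status",                    # is the X.Org service running?
    "esxcli software vib list | grep -i nvidia",  # which NVIDIA VIB is installed?
    "nvidia-smi",                                 # does the host driver see the GPUs?
]


def build_check_commands(hosts, user="root"):
    """Return one ssh command line per (host, check) pair, without running anything."""
    commands = []
    for host in hosts:
        for check in HOST_CHECKS:
            # Quote the remote command so pipes are evaluated on the ESXi host.
            commands.append(f"ssh {user}@{host} {shlex.quote(check)}")
    return commands


if __name__ == "__main__":
    # Hypothetical host names; replace with the hosts in your cluster.
    for line in build_check_commands(["esx01.example.com", "esx02.example.com"]):
        print(line)
```

Running the printed commands against the source and destination hosts and comparing the results side by side may show whether the hosts differ in service state or VIB version.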

If anyone is running a similar configuration I am interested in any support that can be provided.

Thank you

2 Replies
rfl-itadmins
Contributor

Hi ITVisionIT,

Were you able to resolve this issue?

Please advise.

rped2k
Contributor

 

I get the same error when trying to vMotion a VDI desktop that has a vGPU attached. Did you ever solve this?
