VMware Cloud Community
link_shahzad
Contributor

VGPU configuration with esxi 6.7

Hi all, I need help installing a Tesla T4 GPU for use with a VM on ESXi 6.7. The host runs with a free-version license key, and I am trying to install VCSA for the configuration. The VCSA I downloaded is the trial version. At the first stage of the installation it gets stuck at 80 percent. Can anyone help me with this?

 

10 Replies
nagesh_u
Enthusiast

Can you please give more information on the error message? It would be great if you could post a screenshot.

berndweyand
Expert

This requires Enterprise Plus licenses (or evaluation mode for 60 days).

You need to install the VIB on the host and the guest drivers in the VM. Which NVIDIA version do you want to install?

https://docs.nvidia.com/grid/

link_shahzad
Contributor

Thanks for the reply! I have installed the VIB file "NVIDIA-VMware-470.63-1OEM.670.0.0.8169922.x86_64.vib" on ESXi 6.7, which is licensed (key provided by customer support), and the NVIDIA license is registered on a trial basis (90 days). After that I tried to install both VCSA 6.7 and 7.0 with trial (60 days) registration (vSphere server for sharing the vGPU with VMs and other configuration, without passing the NVIDIA card through). At the first stage of the VCSA installation I get stuck at 80%, where the RPM installation hangs. Please help me.

Second question: is it possible to install only one VM on ESXi 6.7 and, without installing the vSphere server, share all 16 GB of the Tesla T4 card with that VM?

berndweyand
Expert

So you have trouble deploying the vCenter appliance?

Is ESXi 6.7 running with the GPU manager?

To your second question: it is not possible to use NVIDIA GRID without vCenter. With the Host Client you are not able to add a PCI device to the VM.

link_shahzad
Contributor

Can anyone tell me whether the NVIDIA trial license (90 days) gives only 1 license, for either the Windows installation or the ESXi VIB installation? When I allot all 16 parts of the Tesla T4 card to the VM, it does not work and the VM reboots. When I allot 1 part of the card, the VM works smoothly. So it seems the NVIDIA driver is not working properly under the 90-day trial. Please help me in this regard. The PuTTY output follows.

 

login as: root
Using keyboard-interactive authentication.
Password:
The time and date of this login have been sent to the system logs.

WARNING:
All commands run on the ESXi shell are logged and may be included in
support bundles. Do not provide passwords directly on the command line.
Most tools can prompt for secrets or accept them from standard input.

VMware offers supported, powerful system administration tools. Please
see www.vmware.com/go/sysadmintools for details.

The ESXi Shell can be disabled by an administrative user. See the
vSphere Security documentation for more information.
[root@localhost:~] nvida-smi
-sh: nvida-smi: not found
[root@localhost:~] dmesg | grep NVIDIA
2021-09-22T10:16:15.636Z cpu10:2100477)ALERT: NVIDIA: module load failed during VIB install/upgrade.
2021-09-22T10:16:15.645Z cpu8:2100478)NVIDIA: Starting vGPU Services.
2021-09-22T10:16:15.659Z cpu33:2100481)NVIDIA: Starting Xorg service.
2021-09-22T10:16:20.959Z cpu40:2102613)NVIDIA: Starting the DCGM node engine.
[root@localhost:~] nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

[root@localhost:~]
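
The dmesg output above already contains the key line: the NVIDIA kernel module failed to load, which would explain why nvidia-smi cannot talk to the driver. As a minimal sketch only, the same check could be scripted against a saved copy of the log. The function name `find_nvidia_alerts` is made up for illustration, and the log layout is assumed to match the lines pasted above.

```python
import re

# Assumed vmkernel log layout, taken from the lines pasted above:
#   <timestamp> cpu<N>:<world id>)[ALERT: ]NVIDIA: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+cpu\d+:\d+\)(?P<alert>ALERT:\s+)?NVIDIA:\s+(?P<msg>.*)$"
)


def find_nvidia_alerts(dmesg_text):
    """Return (timestamp, message) pairs for NVIDIA lines flagged as ALERT."""
    alerts = []
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("alert"):
            alerts.append((match.group("ts"), match.group("msg")))
    return alerts


# Two of the lines from the session above, used as sample input.
sample = """\
2021-09-22T10:16:15.636Z cpu10:2100477)ALERT: NVIDIA: module load failed during VIB install/upgrade.
2021-09-22T10:16:15.645Z cpu8:2100478)NVIDIA: Starting vGPU Services.
"""

for ts, msg in find_nvidia_alerts(sample):
    print(ts, msg)
```

Only the ALERT line is reported. The "Starting ..." service messages are informational and are skipped.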

berndweyand
Expert

OK, what server hardware do you have?

Which ESXi version do you have? Which build number? NVIDIA requires 6.7 build 17167734 or 7.0 U2 minimum.

Is the BIOS/firmware updated?

BIOS settings: is SR-IOV enabled?

Try to disable the onboard graphics.

 

link_shahzad
Contributor

My hardware is a Dell PowerEdge R740, and the installed ESXi (updated) is ESXi-6.7.0-8169922-standard (VMware, Inc.).
BIOS Version/Date: Dell Inc. 2.11.2, 4/21/2021
SM BIOS Version: 3.2
Embedded Controller Version: 255.255
BIOS Mode: Legacy
Base Board Manufacturer: Dell Inc.
Base Board Product: 06WXJT
Base Board Version: A02
Platform Role: Enterprise Server

To your next question (BIOS settings: SR-IOV enabled?): yes, it is enabled, and the onboard graphics are disabled.

 

 

IRIX201110141
Champion

Our Dell servers need a special hardware specification to run the NVIDIA M60:

  1. 1100 Watt PSU or greater
  2. The CPU options are limited because of smaller heatsinks
  3. GPU Option kit because cables are "special"

Most important of all are the special BIOS settings. Without them nvidia-smi will never work, and that is the first step to get the whole thing running!

Notes:
When using the NVIDIA A100 there is a memory limit (1 TB) for the server.
I can't remember whether the T4 is supported by Dell for the R740. I can take a look at the support matrix if needed.

Regards,
Joerg

IRIX201110141
Champion

Please verify the following and reboot if you need to change it:

BIOS Integrated Devices
User Accessible USB Ports: All Ports On
iDRAC Direct USB Port: On
SR-IOV Global Enable: Disabled
I/O Snoop HoldOff Response: 2K Cycles
Empty Slot Unhide: Disabled
OS Watchdog Timer: Disabled
Memory Mapped I/O above 4GB: Enabled
Memory Mapped I/O Base: 56TB
Internal USB Port: On
Integrated Network Card 1: Enabled
Embedded Video Controller: Enabled
I/OAT DMA Engine: Disabled
Current State of Embedded Video Controller: Enabled

 

Otherwise I have to check my installation docs. Please try it and run nvidia-smi again.

berndweyand
Expert

Yes, the T4 is supported by Dell. I have 20 R740s running with 3 T4s each, but I have NVIDIA GPU software 8 installed, not 13.

The memory mapping setting of 56TB isn't needed anymore since BIOS 2.x (I don't know the exact version).

@link_shahzad please update your ESXi. Your version is 6.7 GA from 2018, and NVIDIA supports build 17167734 or newer.

Also, it looks like you installed from the standard ISO and not from the Dell customized ISO. Please reinstall or update with this ISO: https://customerconnect.vmware.com/de/downloads/details?downloadGroup=OEM-ESXI67U3-DELLEMC&productId...
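
The build numbers quoted in this thread can be compared directly. Below is a minimal sketch, assuming the image profile string has the form reported earlier (ESXi-6.7.0-8169922-standard) and taking the minimum 6.7 build 17167734 from the posts above. The helper names `build_from_profile` and `meets_minimum` are hypothetical, and vendor-customized profile names may not follow this pattern.

```python
import re

# Minimum ESXi 6.7 build quoted in this thread for the NVIDIA software in use.
# This sketch only covers the 6.7 case; 7.0 would need its own minimum.
MIN_BUILD_67 = 17167734


def build_from_profile(profile):
    """Split a name like 'ESXi-6.7.0-8169922-standard' into (version, build)."""
    match = re.match(r"^ESXi-(\d+\.\d+\.\d+)-(\d+)-", profile)
    if match is None:
        raise ValueError(f"unrecognised image profile: {profile!r}")
    return match.group(1), int(match.group(2))


def meets_minimum(profile, min_build=MIN_BUILD_67):
    """Return True if the profile's build number is at least min_build."""
    _version, build = build_from_profile(profile)
    return build >= min_build


print(meets_minimum("ESXi-6.7.0-8169922-standard"))  # False
```

Build 8169922 is lower than 17167734, so the check fails for the profile reported in this thread.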