Solved: Re: Compute mode vs graphics

OleWeel · ‎02-16-2018

Hi,

I need some help understanding compute mode.

Today we have a test system with an Nvidia M60 card. We have been running it in graphics mode and have added several different profiles to ESXi VMs, and that has been working well.

Now we have a customer who wants to test this system in compute mode instead, so we changed it with the mode switch command, and that seems OK. What I don't understand is whether we should add profiles to the VMs or whether they should use the regular VMware VGA card. They are going to test some applications on CentOS; I am not quite sure which applications, but they said they use compute mode.

The reason I am wondering about the profiles is that I tried to boot one of the VMs after we changed the ESXi host to compute mode, and it failed to boot with the error "Could not initialize plugin '/usr/lib64/vmware/plugin/libnvidia-vgx.so' for vGPU".

Please explain what I am missing here, and the concept around compute mode.

Thanks for reply.

Regards

Andreas


bluefirestorm · ‎02-16-2018

Have a look at both the VMware KB and the Nvidia KB articles, which seem to describe the problem you are facing.

https://kb.vmware.com/kb/2149193

http://nvidia-esp.custhelp.com/app/answers/detail/a_id/4106/~/having-problems-with-new-m6%2Fm60-like...

OleWeel · ‎02-17-2018

Hi,

Thanks for the reply, but as I understand it, that article applies if you are having problems with graphics mode.

My question is about compute mode, and my M60 card is configured for compute mode, as you can see from the Class 0302:

[root@esx05:~] lspci -n | grep 10de

0000:03:00.0 Class 0302: 10de:13f2 [vmgfx0]

0000:04:00.0 Class 0302: 10de:13f2 [vmgfx1]

[root@esx05:~]

What I don't understand is how to configure a VM to use compute mode. For example, is a VM able to run compute mode applications as long as it is deployed on an ESXi host that has an M60 card running in compute mode? Or do I have to configure profiles, or something else? To me it seems like the profiles are only for graphics mode.

It would be great if someone could educate me on this.

Regards

Andreas


bluefirestorm · ‎02-17-2018

Yes, neither KB gives advice on how to configure "Compute Mode" properly. But the lspci output does show that the card is in "compute mode", given that it shows class 0302.

Did you add a virtual video adapter to the VM that uses the VMware video driver? From the looks of it, the VM is looking for a graphics card to boot up with, and it is looking for the Nvidia card that was configured for video display passthrough, which is now in compute mode.

On a physical PC/laptop, a consumer GeForce card can function as both at the same time: it drives the graphics display while its compute capability can still be used. An application would (usually) use CUDA for the compute capability on the Nvidia card.
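The class-code check on the lspci output can be scripted. The following is only a minimal sketch: it assumes the ESXi-style `lspci -n` line format shown earlier in the thread, and that PCI class 0300 (VGA controller) corresponds to graphics mode and 0302 (3D controller) to compute mode, as the gpumodeswitch guide describes. The function name `nvidia_gpu_modes` is made up for illustration.

```python
import re

# PCI class codes: 0300 = VGA controller, 0302 = 3D controller.
# Mapping them to "graphics"/"compute" is an assumption based on the
# gpumodeswitch user guide linked in this thread.
MODE_BY_CLASS = {"0300": "graphics", "0302": "compute"}

# Matches ESXi-style `lspci -n` lines such as:
#   0000:03:00.0 Class 0302: 10de:13f2 [vmgfx0]
LINE_RE = re.compile(
    r"^(?P<addr>\S+)\s+Class\s+(?P<cls>[0-9a-fA-F]{4}):\s+"
    r"(?P<vendor>[0-9a-fA-F]{4}):(?P<device>[0-9a-fA-F]{4})"
)


def nvidia_gpu_modes(lspci_text):
    """Return {pci_address: mode} for NVIDIA (vendor ID 10de) devices."""
    modes = {}
    for line in lspci_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("vendor").lower() == "10de":
            cls = match.group("cls").lower()
            modes[match.group("addr")] = MODE_BY_CLASS.get(cls, "unknown")
    return modes


# Sample input copied from the lspci output posted above.
SAMPLE = """\
0000:03:00.0 Class 0302: 10de:13f2 [vmgfx0]
0000:04:00.0 Class 0302: 10de:13f2 [vmgfx1]
"""

if __name__ == "__main__":
    for addr, mode in nvidia_gpu_modes(SAMPLE).items():
        print(addr, mode)
```

Both GPUs of the M60 are reported separately, so a mixed setup (one GPU per mode) would show up as two different entries.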

OleWeel · ‎02-17-2018

Thanks for fast reply.

Sorry, I lack knowledge when it comes to this, so please educate me.

I have set the Nvidia M60 card to compute mode, as you can see on the ESXi host, but what I am not sure about is the video settings when deploying a VM that should use compute mode.

Should I configure it like this ?

Or should I configure it like this ?

I don't know whether I need to use vGPU profiles in compute mode, or whether that is not the correct way.

Thanks again for reply.

Regards Andreas

bluefirestorm · ‎02-17-2018

I think you know more about GPU passthrough than I do. It does not seem easy to find documentation about GPU passthrough for use as a compute node.

But I would think you need to add a virtual graphics adapter that does not use the M60 passthrough. The CentOS VM should then boot up using the vmwgfx driver instead of trying to load an Nvidia driver that looks for the M60 card, which is already in compute mode. Perhaps you could also add the M60 as "Other device" for passthrough so that the CentOS VM sees it as well, and then try to verify with lspci inside the CentOS VM.

OleWeel · ‎02-17-2018

Hi,

Well, I don't understand the whole picture.

To me it seems like I have to do the following:

1. Set the ESXi host in Compute mode

- This is done

2. Configure the M60 in pass-through mode

- This has now been done; it was not done in my earlier posts. I believe it has to be configured in pass-through mode, but I am not sure. As you can see from the image below, the VM has the M60 card assigned.

As you can see from the image below, I added the M60 card as a PCI device and not as a shared PCI device, as I believe this is correct since it is pass-through.

No error so far, but when I now try to boot the VM I get the following error:

I guess there are some simple steps that I am missing or doing wrong.

For example, what I don't know is whether I should use compute mode with pass-through, or whether pass-through will only work with graphics mode.

Regards Andreas

bluefirestorm · ‎02-17-2018

Is the error still the same?

The differences between "Compute" and "Graphics" mode are documented here

http://docs.nvidia.com/grid/latest/grid-gpumodeswitch-user-guide/index.html#compute-and-graphics-mod...

So in "graphics mode" there is no ECC and a smaller BAR address space (256 MB vs 8 GB), while in "compute mode" legacy mode is disabled. With legacy mode disabled, the card will probably not work with a VM whose virtual firmware is BIOS. A large BAR address space will require the VM to use EFI as its virtual firmware and to use 64-bit MMIO.

https://kb.vmware.com/s/article/2142307

pciPassthru.use64bitMMIO="TRUE"
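To see whether a given VM already meets these two conditions, its .vmx file can be checked with a few lines of Python. This is only a sketch under stated assumptions: `pciPassthru.use64bitMMIO` is the setting quoted above, while `firmware = "efi"` is assumed to be how the .vmx file records EFI virtual firmware. The helper names `parse_vmx` and `missing_settings` are made up for illustration.

```python
import re

# Settings implied above for a large-BAR (compute mode) passthrough GPU.
# `pciPassthru.use64bitMMIO` comes from this thread; `firmware = "efi"`
# is an assumption about the .vmx representation of EFI firmware.
REQUIRED = [("firmware", "efi"), ("pciPassthru.use64bitMMIO", "TRUE")]

# A .vmx file is a list of lines of the form: key = "value"
VMX_LINE = re.compile(r'^\s*([^#=\s]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text):
    """Parse .vmx text into a dict with lower-cased keys."""
    settings = {}
    for line in text.splitlines():
        match = VMX_LINE.match(line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


def missing_settings(vmx_text):
    """Return the required lines that are absent or set to another value."""
    settings = parse_vmx(vmx_text)
    return [
        f'{key} = "{want}"'
        for key, want in REQUIRED
        if settings.get(key.lower(), "").lower() != want.lower()
    ]


if __name__ == "__main__":
    # Hypothetical .vmx fragment for a VM that still uses BIOS firmware.
    bios_vm = 'firmware = "bios"\nmemsize = "8192"\n'
    for line in missing_settings(bios_vm):
        print("missing:", line)
```

The check only reports what is missing. It does not edit the file, since changing the firmware of a VM with an installed OS can leave it unbootable.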

I have a hunch that it is not an absolute requirement that the Tesla M60 be set to "Compute Mode" for the compute capabilities to work (just as CUDA applications work on GeForce cards that are also used as the display device in physical PCs/laptops). If you know the GPU compute application uses CUDA, you can download and try any of the CUDA samples from the Nvidia website (although I think you have to set up a build environment and build the samples). Or you could ask the users to supply you with CUDA samples as a test, to check whether the CentOS VM recognises the card as a CUDA compute device aside from its use as a display device.

OleWeel · ‎02-17-2018

Hi,

Not quite the same error message, but the VM will not start.

If I configure the ESXi host in compute mode, configure the M60 card as shared, add it as a shared PCI device, and select vGPU profile Q8, the VM fails to boot.

If I configure the ESXi host in compute mode, configure the M60 card as pass-through, and add it as a PCI device, the VM fails to boot.

If I configure the ESXi host in graphics mode, configure the M60 card as shared, add it as a shared PCI device, and select vGPU profile Q8, the VM boots OK, but I am not sure whether it can then be used with CUDA applications.

If I configure the ESXi host in graphics mode, configure the M60 card as pass-through, and add it as a PCI device, the VM boots OK, but I am not sure whether it can then be used with CUDA applications.

When I configured compute mode, I did not change anything specific. I only ran the command gpumodeswitch --gpumode compute and rebooted the host. Do you believe I have to do these things as shown below?

When I right-click the VM and check the firmware setting, it says "BIOS" and not "EFI".

This is really beyond my knowledge... hehe.

You are referring to CUDA applications/samples from Nvidia. Is there an easy way to test whether the VM is configured for CUDA, for example a simple .exe application that will start and say "Yes, this machine will work with CUDA applications"? Hehe.

Regards Andreas

bluefirestorm · ‎02-17-2018

You can't switch the virtual firmware of an existing VM between BIOS and EFI, as the installed OS won't be bootable anymore.

It looks like you can set the mode of the individual GPUs in the Tesla M60 separately.

http://docs.nvidia.com/grid/latest/grid-gpumodeswitch-user-guide/index.html#switching-individual-gpu...

As an alternative, you can create a new CentOS VM that uses EFI virtual firmware.

You can test the existing CentOS VM (BIOS virtual firmware) with one GPU of the Tesla M60 set to graphics mode, while you use the other GPU in compute mode with the CentOS VM that has EFI virtual firmware. This way you can move forward with testing GPU compute on the existing VM while you create another CentOS VM using EFI. You might end up being able to compare the performance of the GPU compute application (if there is any difference) with the Tesla M60 in graphics mode versus compute mode.

The CUDA SDK has some samples, but a build environment has to be set up and the samples have to be built. But given that CUDA applications work on lower-priced GeForce cards acting as both display and CUDA compute device, it is unlikely that a much more expensive Tesla M60 would lack capabilities that a lower-priced GeForce card has. A lot of the ready-to-download-and-install demos from Nvidia are rendering demonstrations (so not necessarily CUDA compute) and are usually available only for Windows platforms.
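A quick check that needs no build environment is to call the CUDA driver library directly from Python, which CentOS normally ships. The sketch below assumes the Nvidia driver is installed in the guest so that `libcuda.so.1` exists; it uses the CUDA driver API calls `cuInit`, `cuDeviceGetCount`, `cuDeviceGet` and `cuDeviceGetName`. The function name `probe_cuda` is made up for illustration, and a passing result only shows that the driver sees a CUDA device, not that a particular application will run.

```python
import ctypes

CUDA_SUCCESS = 0  # CUresult value returned by driver API calls on success


def probe_cuda(lib_name="libcuda.so.1"):
    """Return a list of CUDA device names, or None if the driver is unusable.

    Assumes the Nvidia guest driver provides `lib_name`; the default is the
    usual Linux name of the CUDA driver library.
    """
    try:
        cuda = ctypes.CDLL(lib_name)
    except OSError:
        return None  # driver library is not installed in this guest

    if cuda.cuInit(0) != CUDA_SUCCESS:
        return None  # library present, but the driver could not initialise

    count = ctypes.c_int(0)
    if cuda.cuDeviceGetCount(ctypes.byref(count)) != CUDA_SUCCESS:
        return None

    names = []
    for ordinal in range(count.value):
        device = ctypes.c_int(0)  # CUdevice is a plain int handle
        name = ctypes.create_string_buffer(128)
        if (cuda.cuDeviceGet(ctypes.byref(device), ordinal) == CUDA_SUCCESS
                and cuda.cuDeviceGetName(name, len(name), device) == CUDA_SUCCESS):
            names.append(name.value.decode())
    return names


if __name__ == "__main__":
    devices = probe_cuda()
    if devices is None:
        print("CUDA driver not available in this VM")
    elif not devices:
        print("CUDA driver loaded, but no CUDA devices found")
    else:
        for device_name in devices:
            print("CUDA device:", device_name)
```

Running this inside the CentOS VM in each of the four host configurations would show which of them actually expose the M60 as a CUDA device.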
