VMware Cloud Community
defkev
Contributor

ESXi 5.5 Build 1474528 - VMDirectPath ATI Bonaire XT - Code 43 after VM Reboot -> OK after Host Reboot -> rinse and repeat

Hi folks,

I am currently scratching my head over a really weird problem, and I am making zero progress on it beyond a really painful workaround.

Let me elaborate on the topic, just for the sake of it:

I was running an ATI HD4870 with VMDirectPath in a Windows 7 SP1 x64 guest under ESXi 5.1u2 with VT-d for a couple of months without any issues.

Lately I decided to upgrade the card to something more recent and energy efficient, and was initially aiming for an NVIDIA GTX 750Ti. That was despite the fact that I hadn't had much success getting an NVIDIA consumer card to work (a GTX 480, if I recall correctly) before setting up the ATI HD4870 under 5.1u2. I didn't have much success with NVIDIA this time either. In fact, everything I saw while testing the 750Ti was déjà vu (Code 43, no picture, NVIDIA Control Panel APPCRASH, incomplete stats in GPU-Z because the card was not completely initializing, etc.), pretty much what I "learned" about ESXi with NVIDIA consumer cards while trying to get the GTX 480 to work a good year ago... bad memories.

So I was again looking towards ATI, an HD7790 with a Bonaire XT this time.

Since everything was already in place from my previous setup (IRQs aligned, primary/secondary display controller configured), I didn't have much trouble getting the ATI card up and passed through to a fresh vmx-09 VM with Windows 7 x64 SP1 as the guest OS, followed by:

* installing VMware Tools (custom installation without the SVGA driver, as I recalled getting a BSOD while installing ATI Catalyst if the vGPU wasn't running Microsoft WDDM, even though this seems to be fixed in ESXi 5.5)

* guest reboot #1

* installing ATI Catalyst

* guest reboot #2

* installing VMware SVGA-driver (to get the ATI card activated)

* guest reboot #3

After all this, I...

* attached a monitor to the ATI card

* set it as the primary display

* disabled the vmrc console (to fully utilize the dedicated gpu)

just like I did with my previous setup, and in fact everything seemed to work just like before.

Now here comes the bummer:

Whenever I reboot the VM, the monitor goes blank and the vmrc console is reactivated. In the Device Manager of the VM, the ATI card is marked with a yellow exclamation mark and the dreaded Code 43 "Windows has detected an issue and disabled the hardware" curse. So far the only way to get the card working again is to reboot the entire host/hypervisor, at least until I reboot the VM again and the process starts all over again.

Things I have tried so far without any sign of success:

* Disabling the Onboard GPU

* Reserving all Memory to the VM

* Playing around with the pcie.hole parameter (even though I am only using 2GB vRAM in this VM)

* Switching the interrupt mode from MSI to IOAPIC (pciPassthru0.msiEnabled = "FALSE")

* Setting the reset method of the card to flr/d3d0/link/bridge using /etc/vmware/passthru.map (as I initially thought this had to be some problem with ESXi not "properly" resetting the card during VM reboot)

Looking into the vmk*.log files in /var/log doesn't provide any usable information either: no errors are logged, and as far as the hypervisor is concerned, the card is working completely fine.

So apart from rolling back to ESXi 5.1u2 (without any guarantee of success), I am running out of options.

Any ideas are much appreciated.

Thanks for reading.

Cheers!

8 Replies
dariusd
VMware Employee

It's a long shot, but try adding this to your VM's .vmx file:

   pciPassthru0.opromEnabled = "TRUE"

This takes us further into uncharted territory, so it might even make things horribly worse, but it also just might get it working...
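If you end up juggling several of these passthrough knobs (opromEnabled here, msiEnabled from the list in the first post), hand-editing the .vmx gets error-prone. Below is a minimal Python sketch, assuming the .vmx consists of simple `key = "value"` lines and that you edit it while the VM is powered off. `set_vmx_option` is a hypothetical helper, not a VMware tool, and treating keys case-insensitively is an assumption.

```python
"""Set or replace a key in .vmx text (a sketch; assumes simple key = "value" lines)."""
import re

# Matches lines like: pciPassthru0.present = "TRUE"
_VMX_LINE = re.compile(r'^\s*([A-Za-z0-9_.:]+)\s*=\s*"(.*)"\s*$')


def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with key = "value" set exactly once.

    An existing entry for the key (compared case-insensitively, an assumption)
    is replaced in place and duplicates are dropped; otherwise the line is appended.
    """
    new_line = f'{key} = "{value}"'
    out = []
    replaced = False
    for line in vmx_text.splitlines():
        match = _VMX_LINE.match(line)
        if match and match.group(1).lower() == key.lower():
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # skip the old entry (and any duplicates of it)
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    vmx = 'displayName = "win7"\npciPassthru0.present = "TRUE"\n'
    print(set_vmx_option(vmx, "pciPassthru0.opromEnabled", "TRUE"), end="")
```

Replacing in place rather than always appending keeps a single authoritative line per key, so flipping the option back to "FALSE" later doesn't leave two contradictory entries behind.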

Cheers,

--

Darius

defkev
Contributor

No luck with that shot, unfortunately.

* added the parameter to the VM

* installed Catalyst and rebooted

* installed the SVGA driver and rebooted

and got a picture on the attached monitor as usual, but this time, instead of becoming the secondary display, the vmrc console went black (mouse/keyboard still working, just like when disabling the SVGA display). Unfortunately, after rebooting the VM I am back to Code 43.

Furthermore, if I disable and re-enable the ATI card in Device Manager after the reboot, the Windows Application event log contains two informational Desktop Window Manager events:

Code 9002 - DWM was unable to start

and

Code 9007 - DWM was unable to start because WDDM is not in use

Which, IMHO, makes no sense, since the VMware SVGA driver reports WDDM version 1.0.

But on the other hand, if the VMware SVGA were running XDDM, it would explain why the ATI card cannot be activated, since all cards need to run the same driver model; that's only a blind shot, though.

defkev
Contributor

OK, never mind. I just installed a Windows XP Professional SP3 guest with Tools-Light from 5.5 Update 1 (without the VMware SVGA II driver) and the behavior stays the same: the display works until the first VM reboot after installing the ATI driver, and after that reboot I get Code 10 "Cannot start the hardware".

Again the only way to get the card working (for a single VM boot of course) is by restarting the Host.

Furthermore, I also updated ESXi 5.5 to Update 1 (now running build 1623387), but the update really only seems to be security related.

This pretty much negates my previous assumption that the VMware SVGA driver is the culprit and boils it down to a problem with the specific hypervisor and/or version.

defkev
Contributor

Finally I am making "some" progress.

After downgrading the host to the previously known-to-work version 5.1u2, only to realize that it shows the exact same behavior with the HD7790 (Code 43 after VM reboot), I did a completely fresh reinstall of 5.5 and gave passthru.map another try.

While setting resetMethod to bridge, link and flr didn't make any difference, with d3d0 I am now getting an interrupt-related PSOD in the final stage of the Windows boot loader after rebooting the VM.

So adding:

   # ATI
   1002 ffff d3d0 false

to /etc/vmware/passthru.map (and rebooting the host) now gives me:

[Screenshot: PSOD on ESXi 5.5.0 build 1331820 with the passthru.map entry 1002 ffff d3d0 false]

after rebooting the VM with the passed-through HD7790.

As before, the initial boot of the VM is working fine regardless of these settings.
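Since each of these reset-method experiments means editing one line of /etc/vmware/passthru.map and rebooting the host, a typo costs a full host reboot cycle. Below is a small Python sketch for sanity-checking such a file beforehand. It assumes the four whitespace-separated columns seen in the entry above (vendor ID, device ID, reset method, and a true/false flag assumed here to be named fptShareable), treats `ffff` as a wildcard device ID, and assumes an exact device match takes precedence over the wildcard. `parse_passthru_map` and `reset_method_for` are hypothetical helpers, not ESXi tools.

```python
"""Minimal passthru.map parser (a sketch; the column layout is an assumption)."""
from dataclasses import dataclass
from typing import List, Optional

# Reset methods tried in this thread; "default" is assumed to be valid too.
VALID_RESET_METHODS = {"flr", "d3d0", "link", "bridge", "default"}
WILDCARD_DEVICE = "ffff"  # assumed to match any device ID of the vendor


@dataclass
class PassthruEntry:
    vendor_id: str       # 4-digit hex PCI vendor ID, e.g. "1002" for ATI/AMD
    device_id: str       # 4-digit hex PCI device ID, or "ffff" as wildcard
    reset_method: str    # one of VALID_RESET_METHODS
    fpt_shareable: bool  # fourth column, "true"/"false"


def _check_hex_id(value: str, lineno: int) -> str:
    value = value.lower()
    if len(value) != 4 or any(c not in "0123456789abcdef" for c in value):
        raise ValueError(f"line {lineno}: {value!r} is not a 4-digit hex ID")
    return value


def parse_passthru_map(text: str) -> List[PassthruEntry]:
    """Parse entries, skipping blank lines and '#' comments."""
    entries: List[PassthruEntry] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        vendor, device, method, shareable = fields
        method, shareable = method.lower(), shareable.lower()
        if method not in VALID_RESET_METHODS:
            raise ValueError(f"line {lineno}: unknown reset method {method!r}")
        if shareable not in ("true", "false"):
            raise ValueError(f"line {lineno}: last column must be true/false")
        entries.append(PassthruEntry(_check_hex_id(vendor, lineno),
                                     _check_hex_id(device, lineno),
                                     method, shareable == "true"))
    return entries


def reset_method_for(entries: List[PassthruEntry], vendor_id: str,
                     device_id: str) -> Optional[str]:
    """Exact device match wins over the vendor wildcard (assumed precedence)."""
    vendor_id, device_id = vendor_id.lower(), device_id.lower()
    wildcard: Optional[str] = None
    for e in entries:
        if e.vendor_id != vendor_id:
            continue
        if e.device_id == device_id:
            return e.reset_method
        if e.device_id == WILDCARD_DEVICE and wildcard is None:
            wildcard = e.reset_method
    return wildcard


if __name__ == "__main__":
    table = parse_passthru_map("# ATI\n1002 ffff d3d0 false\n")
    print(reset_method_for(table, "1002", "1234"))  # d3d0 via the wildcard
```

The check only catches malformed lines; it says nothing about whether a given reset method actually works with a particular card, which, as this thread shows, you can only find out by trying.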

defkev
Contributor

Another update:

After the HD7790/Bonaire XT didn't prove itself to be very "virtualization friendly" no matter what, I got myself an HD7770 (with a Cape Verde XT). This one is working absolutely fine under ESXi 5.5u1 with the latest Catalyst driver 13.12 in a Windows 7 x64 guest, even after rebooting the VM multiple times, and without any specific changes to the *.vmx, modifications of passthru.map or other voodoo.

I am going to mark the thread as assumed answered, as changing the hardware solves the problem overall, and I am starting to feel all alone in here anyway.

Sandvika
Contributor

Definitely not alone. I have blundered into the same pit with inadequate research prior to purchase and build. ASUS NVIDIA GTX 660; Motherboard ASUS Z9PA-D8 Dual Socket 2011 with iKVM; 2 x Intel Xeon E5-2620 v2; 64GB ECC RAM.

I can get past the Code 43 in Win 7 by disabling the VMware SVGA, but no monitors get picked up. This was supposed to be an all-in-one home sandpit environment, but having entered the bear pit I'm in the claws of this dilemma. I now appreciate the underlying issues (basically legacy support) that make PCI pass-through of GPUs challenging, and am wondering whether to go down the expensive NVIDIA Quadro route to improve my chances. I had chosen ESXi out of familiarity, but at this point, since the issue is pass-through and an important goal is HD video editing with storage on the NAS appliance in the same box, I figure that whichever hypervisor provides the best pass-through will be the one to go for. I have my work cut out, it seems!

Sandvika
Contributor

SUCCESS! On the strength of defkev's experience kindly reported here, and of others both here and on a certain other popular hypervisor's forum reporting success with this family of cards, I ordered the ASUS AMD Radeon HD 7870, because I want to use the 2 x DVI outputs for a dual-headed workstation. There was a minor delay with this card because it has two PCI-E power connectors, whereas the NVIDIA has just the one, and I needed to get an adaptor to provide the second.

This time there was no problem installing the AMD driver package in Win 7, even with the VMware SVGA in place, but again no monitors got picked up, and I got an error from the AMD software saying that no graphics drivers were loaded, even though they checked out fine in Device Manager. This was still more promising than the NVIDIA scenario, however, so I persisted in tinkering with the virtual graphics card configuration.

To cut to the chase, it appears that the VMware SVGA virtual adapter is ESSENTIAL to get the VM to boot and then switch to the hardware, but VMware Tools needs to be (re)installed AFTER the AMD driver and software package is in place. The monitors attached to the HD 7870 were picked up and leapt into life during the boot after the Tools were reinstalled, and have been fine since. Evidently the SVGA is the primary display: it shows the virtual BIOS and then the Win 7 start-up screen, and identifies itself as "1" in the display properties, with the twin monitors being "2" and "3". However, "disconnecting" the SVGA in the Win 7 display configuration leaves the twin heads working and the mouse movement restricted to their limits. Unlike others, I have not found it necessary to introduce scripts to disable the SVGA adapter during boot and re-enable it during shutdown. Doing without them has the odd effect of leaving the Win 7 start-up screen on the SVGA display for the duration. I have a hunch that any BSOD would also get directed there, and with my hybrid VM now working nicely I'll make a copy and break it to find out.

My last challenge is passing through the keyboard and mouse via a USB controller. I have failed to pass through one of the motherboard's onboard USB controllers as an interim measure, so I am waiting for a PCI-E riser cable to arrive to give me access to the PCI-E x8 slot obstructed by the heat sink and fans of the graphics card, where I will connect and hopefully pass through a 4-port USB 3.0 controller. I have more options to try with the motherboard, but if I get nowhere with that I might get started by removing the ASUS MIO sound card (which passes through fine) and putting the USB controller in its slot for the time being.

[Update] I was able to pass through the motherboard's onboard USB 3.0 controller, but not one of the onboard USB 2.0 controllers. I have, however, been able to pass through individual USB 2.0 devices on the controller that I could not pass through. So keyboard and mouse, scanners and printer are working fine in the guest, in addition to the GPU and sound card. I've yet to try the video capture device, but it is also passed through and configured!

ermockler
Contributor

I am having a similar but different issue.

Running ESXi 6.0 (beta) on an Asrock H77M mini ITX.

I had a Quadro FX3800 in it running VMs: Win7, SteamOS & Android KitKat.

Win7 & SteamOS work fine. I used a USB->SPDIF adapter piped into the Quadro for sound.

I can only get a picture from the Android VM right after rebooting the host, no matter what. The Win7 & SteamOS VMs work whether they run after or before the Android one, but the Android VM must run first or there is no display.

And SPDIF wasn't working in Android either; I had to use the analog output of the USB sound device.

So I got a Quadro 2000 from CL, and swapped it out.

Now Win7 still works fine as before (I am using scripts to disable & re-enable the VMware SVGA, here and above).

The Android does not work as the first VM.  It only works AFTER I boot the Win7.

So I run Win7, then Android, and if I shut down Android, I have to run Win7 again to get display output from Android.

This card has sound, so I don't need the USB ->SPDIF anymore. Of course it works in Win7, but not Android yet.

I know the sound issues are Android related, so ignore that.

But it seems the card needs to be in a "state" that I only happened upon. It's funny how it takes different measures to get it into that state. I am convinced it has something to do with the video RAM on the card and how it's left. Android is different since there are no VMware Tools, but SteamOS is the same. I haven't tried SteamOS with this card yet, since there is quite a bit of brain damage involved.

So I thought I would add my experience here, maybe all this adds up for someone......
