VMware Horizon Community
jslarouche
Enthusiast

VM reset stuck at 95%

Is anyone else seeing a VM stick at 95% when it is reset through View Manager? If I reboot the ESX host, the VM goes into an invalid state in Virtual Center. Does anyone know why this is happening?

mgmn
Enthusiast

I had this happen to me just this morning. I had to rebuild the VM; luckily it was just a test system and not someone's main system.

davesherman
Enthusiast

I have seen this quite a few times; it has been happening since the beta. I would recommend opening an SR right away (I have an SR open with VMware right now regarding this issue). I'm not sure what causes the VM to hang, but in my case the only option I had was to crash the VM (I received an mks handshake error on the VM console). From the perspective of the users who have reported it, it seems to happen when they are reconnecting to an existing PCoIP session (maybe a different resolution/resizing issue?).

Just one tip though, if you crash the machine with the vm-support utility from the service console (vm-support -x to get the world ID, then vm-support -X <wid>), it will bring the machine down somewhat more gracefully than an ESX reboot, and I have seen it eliminate the need to reboot ESX. It will also create a log bundle for you to submit to VMware when you open an SR.

jslarouche
Enthusiast

Well, that did the trick. That will certainly be a short-term workaround. I've poked our Sales Engineer to see if there is any internal KB on the issue. If I find anything I'll pass it along. We are currently running a small POC environment. Looks like VMware needs to fix the bug.

MattEcc
Contributor

VMware has been able to reproduce this issue in house and while we're still working on a fix, we have a somewhat crude workaround that may be helpful for anyone encountering this issue.

The root issue itself is very sporadic, but it can occur whenever PCoIP connections are established or the guest resolution changes (typically in response to maximizing, restoring or otherwise changing the PCoIP client window size). This wedges the VM so that it becomes unresponsive and, if reset, gets stuck at 95%.

The workaround is to apply the registry key change specified below to the Windows XP guest (we do not yet know whether the issue affects Vista or Win7, nor whether the registry workaround works for them). This registry change will disable "screen blanking" in the VC console when a PCoIP session is active. This unfortunately means that someone with VC console access can "snoop" on a PCoIP session. This can create very real privacy issues and is a major limitation of the workaround that should be considered carefully by anyone choosing to apply it.

We're actively working on a real fix for this, but don't have an ETA at present.

Registry fix:

HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware SVGA DevTap

"NoBlankOnAttach"=dword:00000001

To revert the fix:

HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware SVGA DevTap

"NoBlankOnAttach"=dword:00000000

This workaround has only undergone light testing and is provided directly from engineering without official support (although we're more than happy to stay on top of this thread to help handle any issues or questions you may hit).

-Matt

zzmax65
Enthusiast

Hi Matt,

We have the same problem, and only when resizing the desktop in PCoIP. But I cannot find the registry key; do I need to add it?

thanks in advance

Max

MattEcc
Contributor

Yes. Sorry, I probably should have been more explicit with the registry instructions.

You can take those lines and put each into its own xyz.reg file; those files can then be run directly to achieve the desired effect.

Or you can create the key manually if it does not exist.

I'm attaching two example registry files to save anyone the trouble as well.

SVGADevTap-NoBlankOnAttach.reg enables the workaround

SVGADevTap-BlankOnAttach.reg restores the default behavior and disables the workaround
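
For anyone who would rather generate the two files than download them, here is a minimal Python sketch. It is not the content of the attached files, which is not shown in this thread. It assumes the standard .reg layout (a "Windows Registry Editor Version 5.00" header followed by the key in square brackets), and the function names build_reg_text and write_reg_file are made up for illustration.

```python
# Sketch only: builds .reg file text for the NoBlankOnAttach value.
# The key path and value name come from the post above; the header line and
# the bracketed key are standard .reg syntax and are an assumption here.
KEY_PATH = r"HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware SVGA DevTap"


def build_reg_text(enable_workaround: bool) -> str:
    """Return .reg text that sets NoBlankOnAttach to 1 (workaround) or 0 (default)."""
    value = 1 if enable_workaround else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        f'"NoBlankOnAttach"=dword:{value:08x}',
        "",
    ]
    # .reg files conventionally use Windows (CRLF) line endings.
    return "\r\n".join(lines)


def write_reg_file(path: str, enable_workaround: bool) -> None:
    """Write the .reg text as UTF-16 LE with a BOM, the usual encoding for 5.00 files."""
    with open(path, "w", encoding="utf-16-le", newline="") as handle:
        handle.write("\ufeff" + build_reg_text(enable_workaround))
```

Calling write_reg_file("SVGADevTap-NoBlankOnAttach.reg", True) would produce a file that enables the workaround, and write_reg_file("SVGADevTap-BlankOnAttach.reg", False) one that reverts it. Importing either file in the XP guest creates the key if it does not already exist.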

zzmax65
Enthusiast

ok, it works... thanks a lot!

Ciao Max

virtualroe
Contributor

During our testing we have seen the same type of problem, though when we experienced it we ended up killing some of the processes via the console and bouncing the ESX server. So I am curious whether this workaround will address this in the short term and minimize the need for those additional steps.

I made this registry change on one VM and did some checking. I do see what you mean that you can "snoop" on the PCoIP session. I also noticed that the person "snooping" can take control of the session. (In our case, we were able to interrupt the session by moving the IE browser and opening a website.) We can live with the visibility of the PCoIP session during testing if it's a read-only view, but in this case it does not appear to be. Is there no way to disable keyboard/mouse control via the console in those circumstances?

MattEcc
Contributor

If you encounter this issue and are unsatisfied with the workaround of disabling screen blanking, please contact VMware support.

Message was edited by: rickblythe

mattwilson
Contributor

We have had this happen a number of times in our test and now production environments. Sometimes the machine process can be killed and then the VM can be powered back on. Other times the VM files are left locked and a host reboot is required before the VM can be powered back on. In at least one instance the previous two steps didn't fix the issue and the VM was rendered unusable. We have had a case open with VMware for over a week now and are grilling them hard, as this presents instability in the environment.

I'll post any updates I receive from support.

admin
Immortal

All, I thought I had posted a response to this weeks ago, but I guess I never finished and submitted it.

This problem was identified and patched back in December. There is a hot patch available through VMware support. The patch was also included in the VMware patch update in January, available through VUM. Whether you apply the full patch update or find this patch individually, you can apply it with VUM. I do not have the patch description handy but will see if I can find it.

WP

mattwilson
Contributor

VMware support confirmed that a patch released in January should fix the issue. It's quite a major patch: 400+ MB!

We'll plan on deploying this next week.

Here are the patch details:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=101629...

Released on Jan 5th 2010

ncolyer
Enthusiast

I believe our environment is having this same issue. I opened a separate thread and two support cases with VMware about it.

Occasionally our View desktops just hit 100% CPU and we have to reset them because the user can't get back in.

We are trying these hotfixes to see if they help. It seems VMware released another one on March 3rd, which can be found at

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=101746...

ncolyer
Enthusiast

Still a problem.

ESX Server build is at 236512.

Another VM froze last night using the PCoIP protocol. The only way to get it back was to reset it again through vCenter, so the problem with VMs freezing still exists. We never had this problem until we moved them from our ESX 3.5 hosts to this one. We currently have about 6 VMs running on the one host. All these VMs were imported from our old ESX 3.5 (3-host) cluster.

VMware Support has been very unhelpful with this. We have opened multiple incidents with them and not one engineer there mentioned the hotfixes above.

csimwong
Contributor

Has anyone been OK after either applying the registry workaround or installing the patch released in January?

I'm having very similar issues after upgrading our hosts to ESX 4.0 and migrating our VMs to View 4.0 using PCoIP.

We opened an SR this week with Support and the fault group is analyzing the dump files and logs that we were able to generate.

We have not engaged the View Support team as we didn't know if it was related. I am considering contacting support to mention this community thread. Is this considered an ESX/PCoIP issue or do I need to get in touch with the View Support team?

Thanks.

mgmn
Enthusiast

Everything has been working fine for my View environment after the hosts were updated to that patch earlier this year. No further stuck VMs thankfully.

jslarouche
Enthusiast

Everything has been working for us since we rolled out the new patches. Knock on wood.
