VMware Horizon Community
nary4484
Enthusiast

Application in View VM crashes less after ESXi host reboot?

There is an in-house application in our environment that has a crashing issue. It runs in a floating pool (~90 VMs) that uses a few Tesla M10 GPUs (the app uses a 3D PDF viewer). They have been tracking the crashes for a couple of months now and have recently noticed a trend: whenever the hosts are restarted, the application crashes less for about a week, and then the crash rate slowly gets worse. I know it's not normal to reboot hosts, and I have a hard time believing that this is "fixing" the crash issue.

A couple of weeks ago I began restarting the VMs every night in an effort to reduce the crash rate. This didn't help. A couple of issues outside my control required us to restart these hosts in the past two months, and they noticed that crash rates were lower after each host restart.
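One way to check whether the trend is real rather than anecdotal is to bucket each crash by how many days had passed since the most recent host reboot. The following is only a rough sketch: the function name is hypothetical, and it assumes the crash and reboot dates have already been exported from whatever tracking is in place.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date


def crashes_by_days_since_reboot(crash_dates, reboot_dates):
    """Count crashes keyed by days elapsed since the most recent host reboot.

    Both arguments are iterables of datetime.date (assumed input format).
    Crashes that predate the first known reboot are skipped.
    """
    reboots = sorted(reboot_dates)
    counts = Counter()
    for crash in crash_dates:
        # Index of the first reboot strictly after the crash date.
        idx = bisect_right(reboots, crash)
        if idx == 0:
            continue  # no known reboot on or before this crash
        counts[(crash - reboots[idx - 1]).days] += 1
    return dict(sorted(counts.items()))
```

Raw counts can mislead if the gaps between reboots differ a lot, so it is probably worth dividing each bucket by the number of days (or user sessions) that actually fell into it before drawing conclusions.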

I guess my question is: could there be any validity to this? Even though I delete, recompose, and restart VMs frequently, is there something else that gets reset when the host restarts?

Sorry if this is making little to no sense. I've been tasked with explaining why the app crashes less after a host reboot, and I'm at a loss. I don't have a lot of experience managing ESXi hosts and clusters. I mostly manage our View environment and generally don't have to interact much with the hosts because they just work!

2 Replies
techguy129
Expert

Is the app running on a VM that has a vGPU profile associated with it? Is it just the app that crashes, or the whole VM? vGPUs rely heavily on memory, and I've had crashing issues in my environment when using certain vGPU profiles. We raised the profile to a higher setting and that resolved our crashes. Rebooting the host resets the host memory as well as the vGPU memory. If this is a constant problem, I would investigate the host logs as well as the VM's vmware.log file for any signs of issues.
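As a starting point for that log review, here is a minimal sketch that pulls out lines worth a closer look from a copy of a host or VM log file. The keyword list is an assumption on my part, not a confirmed set of messages, so adjust it to whatever actually shows up in your logs. The function names are hypothetical too.

```python
import re
from pathlib import Path

# Assumed search terms (not confirmed log messages); edit to suit your logs.
KEYWORDS = ("vmiop", "nvrm", "xid", "out of memory")


def find_suspect_lines(lines, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines containing any keyword."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def scan_log(path, keywords=KEYWORDS):
    """Scan a log file that has been copied off the host or datastore."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with Path(path).open(errors="replace") as handle:
        return find_suspect_lines(handle, keywords)
```

Running `scan_log` against logs taken shortly after a host reboot and again a week or two later, then comparing the hits, may show whether memory-related messages build up over time.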

nary4484
Enthusiast

Yes, it does have a vGPU profile associated with it. I'm curious about your statement that rebooting the host "resets the host memory as well as the vGPU memory"... I think this could very well be part of our issue.

Any idea where I would go to read more about host memory and vGPU memory and how they behave with a host reboot vs. a guest OS restart?
