I'm using Workstation Pro 17 for my job on a daily basis. Every once in a while the host machine unceremoniously turns off and restarts (no shutdown, no BSOD). This exclusively happens when working with Workstation Pro. The system runs perfectly stable otherwise. Furthermore, every unexpected power cycle happened during a very specific workload: building a .NET solution using Visual Studio within the VM. This workload utilizes almost all of the VM's CPU resources and reserves a good amount of RAM as well. The crash is obviously associated with a high-load scenario. To clarify: Building does not always crash my host, but every crash occurred during a build.
Prior to updating to Workstation Pro 17.5 my system crashed once every couple of weeks (9 times over the course of 6 months). It was annoying, but workable. The problem has become much more severe after I updated to Workstation Pro 17.5, indicating a potential problem on VMware's side. I just recovered from my third crash since yesterday. This is beyond annoying, it's an outright blocker and led me to finally seek assistance.
Host
OS: Windows 11 Pro (latest updates)
CPU: 7950x (16 cores, 32 threads)
GPU: 7900XTX
RAM: 64GB
VM
OS: Windows 11 Business (latest updates)
CPU: half of the 7950x (16 virtual cores, 16 threads)
GPU: acceleration enabled
RAM: 32GB
Unfortunately, neither the Windows Event Viewer (host & guest) nor the Workstation Pro log files give any indication as to what the problem is. Has anyone had a similar experience? Is there something I could do to further diagnose the problem, e.g. set up more verbose logging?
If you're using it for your job, don't mess around here in the forums and expect that your issue can be addressed by the volunteer users (not VMware employees or engineering) who try to help other users. Bugs don't get fixed here.
My recommendation is to open a support request (yes, you'll probably have to pay for it) with VMware so this gets officially put in their queue and you get the attention of their support and engineering staff.
Thank you, I appreciate your recommendation, but I strongly disagree on the notion that I should pay VMware for the privilege of reporting a regression with their latest update. My understanding is... was that, yes, of course bugs should be reported here. That's how I see it being used anyway, considering half of the latest posts are bug reports in one form or another. Are you telling me those will never get escalated?
@Veilenus wrote: Thank you, I appreciate your recommendation, but I strongly disagree on the notion that I should pay VMware for the privilege of reporting a regression with their latest update. My understanding is... was that, yes, of course bugs should be reported here. That's how I see it being used anyway, considering half of the latest posts are bug reports in one form or another. Are you telling me those will never get escalated?
The mistaken impression is that this is a VMware product support board. It's not. It's an end-user to end-user community board with very little active participation by VMware employees. Nothing posted here is guaranteed to be looked at by VMware employees (once in a while you might get lucky and get VMware to notice, but most of the time it's crickets from them).
I agree that VMware does its users a disservice by not giving them a way to report a defect (regression or otherwise) in its desktop virtualization software other than a paid support offering. My point is that, given the way VMware seems to work, the only sure way of getting "escalation" or someone to actually look at a problem is unfortunately paying for support.
I hope you won't shoot the messenger.
I genuinely appreciate you for taking the time to clear up that misconception for me. I'm definitely disappointed in VMware, but I'm sorry if I gave you the impression that I was angry at you. Kudos, Technogeezer!
Hi,
Not saying it is your problem, but the times I have seen host crashes when using VMware products were when my hardware had issues.
For example faulty RAM can trigger this type of error. Most of the time you see no problems, but every now & then when it hits that faulty chip.. strange things happen. Running virtual machines can be extra taxing for hardware and take it just a bit further than most other software will.
If you have the time, I would run hardware tests overnight to rule out the hardware as the cause.
Once you've done that, the next step would be to see whether the host has created crash dump files and inspect those for clues. There used to be a website where you could upload them for analysis, but it no longer exists (I just checked: yes, osronline has taken down that service), which means that beyond the standard tools you're on your own.
You can still use BlueScreenView from NirSoft or WinDbg.
See: https://www.wikihow.com/Read-Dump-Files
Good luck!
--
Wil
Thank you, @wila. I had run tests, most notably Prime95, after I built the machine earlier this year to avoid this exact kind of problem. Back then everything was okay, but I will do another extended test in due course. Thanks again for the recommendation.
Update
Prime95 (blend profile) ran for 18 hours straight. No errors. No crashes. This result is representative of the computer's stability (as long as I'm not using a Workstation Pro VM to build software).
Unfortunately, no dump files exist. Looking for them was one of the first things I did back when the problem started. I have just double-checked: no dump files to be found.
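For anyone who wants to repeat that check, here is a minimal Python sketch. It assumes the dump settings are still at their Windows defaults, i.e. a full or kernel dump at %SystemRoot%\MEMORY.DMP and small dumps under %SystemRoot%\Minidump; the function name find_dump_files is made up for illustration. If the dump path was changed in the system settings, this will not find anything.

```python
import os
from pathlib import Path


def find_dump_files(system_root):
    """Return dump files found in the default Windows locations, newest first.

    Assumes default settings: MEMORY.DMP directly under the system root and
    small dumps (*.dmp) in the Minidump subfolder.
    """
    root = Path(system_root)
    candidates = []

    full_dump = root / "MEMORY.DMP"
    if full_dump.is_file():
        candidates.append(full_dump)

    minidump_dir = root / "Minidump"
    if minidump_dir.is_dir():
        candidates.extend(
            p for p in minidump_dir.iterdir() if p.suffix.lower() == ".dmp"
        )

    # Newest first, so a dump from the most recent crash is listed at the top.
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    dumps = find_dump_files(os.environ.get("SystemRoot", r"C:\Windows"))
    if not dumps:
        print("No dump files found in the default locations.")
    for dump in dumps:
        print(dump, dump.stat().st_size, "bytes")
```

Reading those folders may need an elevated prompt. An empty result is consistent with what is described above: the machine power-cycles without a blue screen, so Windows may never get the chance to write a dump.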
Hi,
I'd say that excludes the "it might be a hardware issue" theory for the most part. It's always possible that some combination in the hardware configuration is triggering this, but at least it isn't due to failing hardware.
As for the lack of dump files, I guess you already verified the dump file settings. AFAICR they are set by default, but it doesn't hurt to check. https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/enabling-a-kernel-mode-dump-file
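As a rough way to verify those settings without clicking through the dialogs, the sketch below reads the CrashControl registry key that backs them. It uses Python's standard winreg module and therefore only runs on the Windows host; the helper names describe_dump_setting and read_crash_control are made up for illustration, and it assumes the CrashDumpEnabled and DumpFile values are present, as they are on a default install.

```python
import sys

CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

# Meaning of the CrashDumpEnabled value.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump",
    7: "automatic memory dump",
}


def describe_dump_setting(value):
    """Translate a CrashDumpEnabled value into a readable description."""
    return DUMP_TYPES.get(value, f"unknown ({value})")


def read_crash_control():
    """Read the dump type and dump file path from the registry (Windows only)."""
    import winreg  # imported here so the module still loads on other platforms

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        enabled, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
        dump_file, _ = winreg.QueryValueEx(key, "DumpFile")
    return enabled, dump_file


if __name__ == "__main__":
    if sys.platform == "win32":
        enabled, dump_file = read_crash_control()
        print("Dump type:", describe_dump_setting(enabled))
        print("Dump file:", dump_file)
    else:
        print("This check only works on a Windows host.")
```

Keep in mind that a dump is only written when Windows actually bug-checks. If the machine simply loses power and restarts, these settings can be correct and there will still be no dump file.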
Re. VMware Support, I think we all agree that it is a bit weird that it's not available for issues like this, but it is their choice to make. By default you get 30 days of email support; only if you explicitly buy support can you get them to look into issues like this.
Sorry I wasn't of more help.
--
Wil
Oh.. maybe one more thing to check.
Is your VM running on the VMware hypervisor, or is it using the Windows Hyper-V APIs?
In other words, if you search the vmware.log of the VM for the words "monitor mode", does it say "ULM" or "CPL0"?
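If the log is long, a small script can do the search. This is only a sketch: the exact layout of the line in vmware.log is not assumed, it just picks out any line containing "monitor mode" (case-insensitive) and reports whether "ULM" or "CPL0" appears on it. The function name monitor_mode_lines is made up, and the path to vmware.log has to be passed on the command line.

```python
import re
import sys
from pathlib import Path

# The two tokens to look for on a "monitor mode" line.
MODE_PATTERN = re.compile(r"\b(ULM|CPL0)\b")


def monitor_mode_lines(log_text):
    """Return (line, mode) pairs for every line mentioning "monitor mode".

    mode is "ULM", "CPL0", or None when neither token is on that line.
    """
    results = []
    for line in log_text.splitlines():
        if "monitor mode" in line.lower():
            match = MODE_PATTERN.search(line)
            results.append((line.strip(), match.group(1) if match else None))
    return results


if __name__ == "__main__":
    if len(sys.argv) > 1:
        text = Path(sys.argv[1]).read_text(encoding="utf-8", errors="replace")
        hits = monitor_mode_lines(text)
        if not hits:
            print('No "monitor mode" line found.')
        for line, mode in hits:
            print(f"{mode or 'unrecognized'}: {line}")
    else:
        print("Usage: python check_monitor_mode.py <path to vmware.log>")
```

As far as I know, "ULM" means the VM runs through the Windows Hyper-V APIs and "CPL0" means it runs on VMware's own hypervisor.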
--
Wil
Thank you for checking back, @wila.
It is "CPL0" as per vmware.log. Furthermore, Hyper-V and all of its sub-features are unchecked in the "Windows Features" settings.
One more thing I would like to mention: I'm making use of nested virtualization (Android emulator) which required some changes to the host's default settings: disabling "Core Isolation" and "Virtualization Based Security" are just the two off the top of my head. I can't remember exactly, but I was following a guide that ultimately did the trick. Sorry, I'm definitely no expert when it comes to virtualization. Maybe those changes somehow play into the problem.
Hi,
For now I can only come up with basic things for you to try, nothing noteworthy. I do not expect it to be fixed by switching over to Hyper-V (but I've been wrong before).
Yet another thing to try is disabling 3D acceleration for the VM, to see whether the issue has something to do with the GPU. Without more crash details it's really just grasping at straws and throwing random things at the wall to see if something sticks.
--
Wil
