VMware Communities
WhiteKnight
Hot Shot

Windows 10x64 VM crashes several times a day - why?

After updating VMware Workstation to version 15.5.2 build-15785246, I'm now setting up a new VM. While setting it up today, VMware Workstation has already halted the VM three times. Why?

Attached please find a short excerpt of the current vmware.log. Perhaps it may shed some light on why VMware Workstation is crashing VMs so regularly on my machine?

Your help is very much appreciated.



[VMware]: Workstation 17 Pro; --
[host]: Windows 10x64 host; --
[guests]: Windows 10x64, Windows 8x64.
7 Replies
dariusd
VMware Employee

It looks like the vmware.log extract was collected without gathering debug information, so there is very little information to suggest what might have gone wrong.

Could you please set the Gathering Debugging Information option to "Full" and run your VM until it next fails, then gather the vmware.log which results from that?  We've exchanged emails before, so you can send it to me directly if you would prefer not to post it here.

--

Darius

WhiteKnight
Hot Shot

Hi Darius,

thanks so much for offering your valuable help!

I set the logging option to "Full", but everything seems to have run fine since then; the VM hasn't crashed yet.

I'm now considering reverting the logging detail level to see if that causes the VM to crash again.

May I take the chance to pitch a proposal for improving the logging:

Does your code apply log level filtering before or after preparing the string to be logged?

So, does your logging code rather look like this:

void log(int severity, wchar_t const *const text)
{
  if (severity >= _logLevel) append(text, _logFilePath);
}

... or like this:

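// needs <cstdarg> and <cwchar>
void log(int severity, wchar_t const *const fmt, ...)
{
  if (severity >= _logLevel)
  {
    // compose the logging string only once we know it will be written
    wchar_t text[1024];
    va_list args;
    va_start(args, fmt);
    vswprintf(text, 1024, fmt, args);
    va_end(args);

    append(text, _logFilePath);
  }
}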

If it's the first alternative, i.e. if the resulting string to log already exists in memory, it may be worthwhile to keep a decent number of log entries in an in-memory circular buffer (e.g. wchar_t const *_buffer[1024];) that is dumped to a separate file when a crash occurs:

void log(int severity, wchar_t const *const text)
{
  // adds logging information to ring buffer
  _ringBuffer.add(text);

  if (severity >= _logLevel) append(text, _logFilePath);
}



class RingBuffer
{
private:
  // pointers to the most recent entries; the strings must stay alive
  // for as long as they are referenced here
  wchar_t const *_buffer[1024] = {};
  unsigned int _offset = 0u;

public:
  // called by the logging function
  void add(wchar_t const *const text)
  {
    _buffer[_offset] = text;
    _offset = (_offset + 1u) % 1024u;
  }

  // called when a crash occurs; writes the entries oldest first
  void dumpToFile() const
  {
    unsigned int offset = _offset;

    do
    {
      if (_buffer[offset]) append(_buffer[offset], _logFilePath);

      offset = (offset + 1u) % 1024u;
    } while (offset != _offset);
  }
};

So, the ring buffer would hold the most recent 1024 logging entries, no matter whether they were logged or not. When a crash occurs, the ring buffer's strings would be written to a separate logging file. (The above pseudo C++ code is an oversimplification, of course.)

This amendment to logging would cost almost nothing in regard to product performance. But it would make sure that there's always a full debugging log available when a crash occurs.
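
To make the proposal above concrete, here is a minimal, self-contained sketch in standard C++. It is an illustration under assumptions, not anyone's actual logging code: the names Severity, RingBuffer, Logger, written() and crashDump() are hypothetical, entries are copied into std::wstring objects so that they outlive the caller's buffer, and an in-memory vector stands in for the log file.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical severity scale: a higher value means a more important entry.
enum Severity { Debug = 0, Info = 1, Warning = 2, Error = 3 };

// Fixed-capacity ring buffer holding the most recent N log entries.
template <std::size_t N>
class RingBuffer
{
  static_assert(N > 0, "capacity must be positive");

public:
  // Stores a copy of the entry, overwriting the oldest one once full.
  void add(std::wstring text)
  {
    _entries[_next] = std::move(text);
    _next = (_next + 1) % N;
    if (_count < N) ++_count;
  }

  // Returns the stored entries, oldest first.
  std::vector<std::wstring> snapshot() const
  {
    assert(_count <= N);
    std::vector<std::wstring> result;
    result.reserve(_count);
    std::size_t index = (_next + N - _count) % N;
    for (std::size_t i = 0; i < _count; ++i)
    {
      result.push_back(_entries[index]);
      index = (index + 1) % N;
    }
    return result;
  }

private:
  std::array<std::wstring, N> _entries;
  std::size_t _next = 0;
  std::size_t _count = 0;
};

// Remembers every entry in the ring buffer, but "writes" only the
// entries at or above the configured log level.
template <std::size_t N>
class Logger
{
public:
  explicit Logger(Severity logLevel) : _logLevel(logLevel) {}

  void log(Severity severity, const std::wstring &text)
  {
    _recent.add(text);                                    // always remembered
    if (severity >= _logLevel) _written.push_back(text);  // filtered output
  }

  // Stand-in for the regular log file.
  const std::vector<std::wstring> &written() const { return _written; }

  // What a crash handler would write to the separate crash file.
  std::vector<std::wstring> crashDump() const { return _recent.snapshot(); }

private:
  Severity _logLevel;
  RingBuffer<N> _recent;
  std::vector<std::wstring> _written;
};
```

With a Logger<3> at level Warning, logging four entries of mixed severity leaves only the Warning-or-higher ones in written(), while crashDump() returns the last three entries regardless of severity. Note that copying each string is exactly the kind of per-call cost that may matter in performance-sensitive code.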



dariusd
VMware Employee

Hmmm.  In some performance-sensitive locations we do use ring buffers in a way broadly similar to what you describe, but we largely depend on being able to switch the troubleshooting option on to capture more information when needed and otherwise keep the performance impact as close to zero as we can.  Even the overhead of maintaining a ring buffer (to be possibly dumped later) is too great for a lot of our code.

The Virtual Machine Monitor (VMM) is especially performance-sensitive, and that's where all the interesting stuff happens during a Triple Fault, which is what is happening in your VM.  Indeed, turning Troubleshooting to Full will enable the monitor ring buffer, which is what I would particularly be looking for in the vmware.log if/when this issue next reproduces with troubleshooting enabled.  The VM should triple-fault and then hit a debug assertion, at which point it will dump the monitor ring buffer into vmware.log as well as potentially dumping other guest state to disk before terminating.  (The non-troubleshooting builds would quietly go ahead and reboot the VM, but the troubleshooting builds will simply stop.)

I'll leave it up to you to decide whether to run with the troubleshooting build for a while longer or switch back to the non-troubleshooting build.  There is really no actionable information in the logs from the non-troubleshooting build, though.
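
For illustration only, here is a minimal C++ sketch of the general pattern described above: a trace buffer that costs a single branch when troubleshooting is off, stores small raw records when it is on, and formats them only when a dump is requested. This is not VMware's implementation; TraceRecord, TraceBuffer, the record layout, and the capacity of 4 are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical raw trace record: no string formatting happens when it is stored.
struct TraceRecord
{
  std::uint32_t eventId;
  std::uint64_t value;
};

class TraceBuffer
{
public:
  explicit TraceBuffer(bool enabled) : _enabled(enabled) {}

  // Hot path: one branch when disabled, one small store when enabled.
  void record(std::uint32_t eventId, std::uint64_t value)
  {
    if (!_enabled) return;
    _records[_next] = TraceRecord{eventId, value};
    _next = (_next + 1) % kCapacity;
    if (_count < kCapacity) ++_count;
  }

  // Cold path: formats the records, oldest first, only when a dump is requested
  // (for example after a failed assertion).
  std::vector<std::string> dump() const
  {
    std::vector<std::string> lines;
    std::size_t index = (_next + kCapacity - _count) % kCapacity;
    for (std::size_t i = 0; i < _count; ++i)
    {
      const TraceRecord &r = _records[index];
      lines.push_back("event " + std::to_string(r.eventId) +
                      " value " + std::to_string(r.value));
      index = (index + 1) % kCapacity;
    }
    return lines;
  }

private:
  static constexpr std::size_t kCapacity = 4;  // tiny, for illustration only
  bool _enabled;
  std::array<TraceRecord, kCapacity> _records{};
  std::size_t _next = 0;
  std::size_t _count = 0;
};
```

A disabled TraceBuffer dumps nothing, however many events were recorded; an enabled one keeps only the most recent kCapacity records.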

Thanks,

--

Darius

WhiteKnight
Hot Shot

"Even the overhead of maintaining a ring buffer (to be possibly dumped later) is too great for a lot of our code."

I see. So your code first determines whether to log an item and only then, if the condition is true, composes the string to be logged? (I would have guessed you'd take this approach, as it's the fastest option; yet because it's cumbersome to implement everywhere, many programmers choose to compose the string regardless and leave it to the logging function to decide whether to output it.)

Because otherwise there would be virtually no overhead: it wouldn't make a difference whether the composed string is deleted by the calling function immediately or by the ring buffer when a new item is added (which just means deleting a previous string and setting a pointer to the new string object).
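
The two call-site styles contrasted above can be sketched as follows. This is a hypothetical illustration, not code from any product: Logger, logText, logFormat, and the formatCalls counter are names invented for the example, a std::wstring stands in for the log file, and messages are assumed to fit into 256 wide characters.

```cpp
#include <cassert>
#include <cwchar>
#include <string>

// Hypothetical severity scale: a higher value means a more important entry.
enum Severity { Debug = 0, Info = 1, Warning = 2, Error = 3 };

struct Logger
{
  Severity logLevel = Warning;
  std::wstring output;   // stand-in for the log file
  int formatCalls = 0;   // counts how often a string was composed

  // Composes a message; the counter makes the formatting cost observable.
  template <typename... Args>
  std::wstring format(const wchar_t *fmt, Args... args)
  {
    ++formatCalls;
    wchar_t buffer[256];
    int written = std::swprintf(buffer, 256, fmt, args...);
    return written < 0 ? std::wstring() : std::wstring(buffer, written);
  }

  // Style A: the caller has already composed the text; filtering happens afterwards.
  void logText(Severity severity, const std::wstring &text)
  {
    if (severity >= logLevel) output += text + L"\n";
  }

  // Style B: filter first, compose the text only if it will be written.
  template <typename... Args>
  void logFormat(Severity severity, const wchar_t *fmt, Args... args)
  {
    if (severity < logLevel) return;
    output += format(fmt, args...) + L"\n";
  }
};
```

With style A, a Debug message is formatted and then thrown away; with style B, the same call returns before any formatting takes place. A ring buffer of ready-made strings only comes for free in style A, because in style B the string never exists for filtered-out entries.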



dariusd
VMware Employee

That's right... We avoid formatting the string unless we know it is going to be logged.

Any luck with gathering any debug logs from the failure?

--

Darius

WhiteKnight
Hot Shot

The VM hasn't crashed since, I'm afraid. Yesterday evening I reverted the logging level to the default, and now I'm waiting for the crash to occur again. If it does occur now, is there anything I could provide that may shed some light on this?



dariusd
VMware Employee

The vmware.log with troubleshooting set to Full is just about the only useful information to collect, I'm afraid.  Without the troubleshooting option set, all we see is a VM which got into a state where its guest OS cannot possibly continue to run, so we forced the machine to reset in exactly the same way as would happen if the same condition occurred on a physical machine.

Without a troubleshooting-enabled vmware.log, we're down to the usual troubleshooting procedure for a PC which unexpectedly reboots:

  1. Check the guest's event log.  The type of crash seen here is unlikely to leave anything in the log other than a complaint about an unexpected reboot, but it's worth checking just in case it has the answer!
  2. Ensure that your guest OS has all updates installed.
  3. Inspect the list of drivers installed in the guest OS, particularly looking at drivers which are not from the OS vendor.  Consider uninstalling any guest software which includes device drivers.
  4. Attempt to boot the guest into Safe Mode and see if the failure still occurs.
  5. Run hardware diagnostics on the host.

That's about all I can think of.  Good luck!

--

Darius
