VMware Communities
WTruitt
Contributor

[Issue] VMware Fusion 11.5.2 shuts down agents randomly on macOS Catalina 10.15.3

The reported errors and the scraped logs are below. E105 is the first error reported in the given time period.

2020-04-25T11:32:50.643-06:00| worker-7235712| E105: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7FAEA783BA70; workerLibLock)

2020-04-25T11:32:50.643-06:00| worker-7235712| I125: Panic: can't get userlevel lock.

2020-04-25T11:32:50.643-06:00| worker-7235712| W115:

2020-04-25T11:32:50.643-06:00| worker-7235712| W115+ The core dump limit is set to ZERO; no core dump should be expected

2020-04-25T11:32:50.643-06:00| worker-7235712| I125: Backtrace:

2020-04-25T11:32:50.644-06:00| worker-7235712| I125: Backtrace[0] 0000700003272a00 rip=000000010c1bdbcb rbx=0000000000000000 rbp=0000700003272ee0 r12=0000000000000001 r13=0000000000000000 r14=000000010d0418a8 r15=0000000000000016

..

..

..

2020-04-25T11:32:50.650-06:00| worker-7235712| I125: [113B66000-113CC3000): /System/Library/Components/CoreAudio.component/Contents/MacOS/CoreAudio

2020-04-25T11:32:50.650-06:00| worker-7235712| I125: [7FFF4B745000-7FFF4B797000): /System/Library/PrivateFrameworks/AudioSession.framework/Versions/A/AudioSession

2020-04-25T11:32:50.650-06:00| worker-7235712| I125: [7FFF4B797000-7FFF4B7C5000): /System/Library/PrivateFrameworks/AudioSession.framework/libSessionUtility.dylib

2020-04-25T11:32:50.650-06:00| worker-7235712| I125: End printing loaded objects

2020-04-25T11:32:50.650-06:00| worker-7235712| I125: Writing monitor `vmmcores.gz`

2020-04-25T11:32:50.659-06:00| worker-7235712| W115: Dumping core for vcpu-0

2020-04-25T11:32:50.659-06:00| worker-7235712| I125: Beginning monitor coredump

2020-04-25T11:32:50.746-06:00| mks| W115: Panic in progress... ungrabbing

2020-04-25T11:32:50.746-06:00| mks| I125: MKS: Release starting (Panic)

2020-04-25T11:32:50.746-06:00| mks| I125: MKS: Release finished (Panic)

2020-04-25T11:32:50.779-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1c (0) Bad address

2020-04-25T11:32:50.779-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1d (0) Bad address

2020-04-25T11:32:50.779-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1e (0) Bad address

2020-04-25T11:32:50.779-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1f (0) Bad address

2020-04-25T11:32:51.034-06:00| worker-7235712| I125: End monitor coredump

2020-04-25T11:32:51.034-06:00| worker-7235712| W115: Dumping core for vcpu-1

2020-04-25T11:32:51.034-06:00| worker-7235712| I125: Beginning monitor coredump

2020-04-25T11:32:51.145-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1c (0) Bad address

2020-04-25T11:32:51.145-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1d (0) Bad address

2020-04-25T11:32:51.145-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1e (0) Bad address

2020-04-25T11:32:51.145-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1f (0) Bad address

2020-04-25T11:32:51.399-06:00| worker-7235712| I125: End monitor coredump

2020-04-25T11:32:51.399-06:00| worker-7235712| W115: Dumping core for vcpu-2

2020-04-25T11:32:51.399-06:00| worker-7235712| I125: Beginning monitor coredump

2020-04-25T11:32:51.504-06:00| worker-7235712| I125: CoreDump error: Read, page 0xc33 (0x784660) Bad address

2020-04-25T11:32:51.511-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1c (0) Bad address

2020-04-25T11:32:51.511-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1d (0) Bad address

2020-04-25T11:32:51.511-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1e (0) Bad address

2020-04-25T11:32:51.511-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1f (0) Bad address

2020-04-25T11:32:51.765-06:00| worker-7235712| I125: End monitor coredump

2020-04-25T11:32:51.765-06:00| worker-7235712| W115: Dumping core for vcpu-3

2020-04-25T11:32:51.765-06:00| worker-7235712| I125: Beginning monitor coredump

2020-04-25T11:32:51.870-06:00| worker-7235712| I125: CoreDump error: Read, page 0xc33 (0x245777) Bad address

2020-04-25T11:32:51.878-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1c (0) Bad address

2020-04-25T11:32:51.878-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1d (0) Bad address

2020-04-25T11:32:51.878-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1e (0) Bad address

2020-04-25T11:32:51.878-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1f (0) Bad address

2020-04-25T11:32:52.133-06:00| worker-7235712| I125: End monitor coredump

2020-04-25T11:32:52.133-06:00| worker-7235712| W115: Dumping core for vcpu-4

2020-04-25T11:32:52.133-06:00| worker-7235712| I125: Beginning monitor coredump

2020-04-25T11:32:52.245-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1c (0) Bad address

2020-04-25T11:32:52.245-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1d (0) Bad address

2020-04-25T11:32:52.245-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1e (0) Bad address

2020-04-25T11:32:52.245-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1f (0) Bad address

2020-04-25T11:32:52.499-06:00| worker-7235712| I125: End monitor coredump

2020-04-25T11:32:52.499-06:00| worker-7235712| W115: Dumping core for vcpu-5

2020-04-25T11:32:52.499-06:00| worker-7235712| I125: Beginning monitor coredump

2020-04-25T11:32:52.603-06:00| worker-7235712| I125: CoreDump error: Read, page 0xc33 (0x5ae3c2) Bad address

2020-04-25T11:32:52.611-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1c (0) Bad address

2020-04-25T11:32:52.611-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1d (0) Bad address

2020-04-25T11:32:52.611-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1e (0) Bad address

2020-04-25T11:32:52.611-06:00| worker-7235712| I125: CoreDump error: Read, page 0xd1f (0) Bad address

2020-04-25T11:32:52.865-06:00| worker-7235712| I125: End monitor coredump

2020-04-25T11:32:56.796-06:00| worker-7235712| I125: Exiting

A little research tells me that the error is thrown at line 312 (#L312) of ulCondVar.c in open-vm-tools (master branch of vmware/open-vm-tools on GitHub).

But I can't find the root cause of why this is happening. The previous instance of this issue occurred on April 14, so this is becoming more regular in its occurrence.
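
For anyone reading the PANIC line: a minimal sketch for decoding the "failure 22" value, assuming it is an errno-style code returned by the underlying condition-variable wait (an assumption based on the function name, not something the log states). The helper name `describe_errno` is made up for illustration.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Translate a numeric errno-style code into its symbolic name and message."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} = {name} ({os.strerror(code)})"


# 22 is the value printed after "failure" in the PANIC line.
print(describe_errno(22))
```

On macOS and Linux, code 22 maps to EINVAL, which would mean the wait call was handed an argument it considered invalid. That narrows the question but does not explain why it happens.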

23 Replies
nancyz
VMware Employee

Hi WTruitt ,

Which kind of guest OS are you using?

dariusd
VMware Employee

That's an unusual failure!

Which model of Mac are you using?

My first suggestion would be to set troubleshooting to Hang/Crash if you have not already done so.  Your virtual machine will run somewhat more slowly, but it is much more likely to produce a more useful error message if the problem occurs again.

It would be great if you could provide some more of the vmware.log.  Here's the minimum which can be useful for this sort of a failure:

  • The first line of the log file.
  • At least a few more lines leading up to the PANIC line.
  • The entire "Backtrace" and "SymBacktrace" sections of the log.
  • The entire "loaded objects" list.
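
The minimum excerpt listed above could be pulled out of a large vmware.log with something like the following. This is a rough sketch, not a VMware tool: the function name `extract_panic_excerpt`, the `context` parameter and the placeholder first line are all made up for illustration. It keeps the first line, a few lines before the first PANIC, and everything after it (which is where the backtrace and loaded-objects list appear in the log above).

```python
from typing import List


def extract_panic_excerpt(lines: List[str], context: int = 20) -> List[str]:
    """Return the first log line plus everything from `context` lines
    before the first PANIC line through the end of the log."""
    panic_idx = next((i for i, line in enumerate(lines) if "PANIC:" in line), None)
    if panic_idx is None:
        return []  # no panic recorded in this log
    start = max(panic_idx - context, 0)
    excerpt = lines[start:]
    if start > 0:
        # Keep the very first line and mark the gap that was skipped.
        excerpt = [lines[0], "..."] + excerpt
    return excerpt


# Placeholder lines; only the PANIC line is copied from this thread.
log_lines = [
    "<first line of vmware.log>",
    "<unrelated line>",
    "<line just before the panic>",
    "2020-04-25T11:32:50.643-06:00| worker-7235712| E105: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7FAEA783BA70; workerLibLock)",
    "2020-04-25T11:32:50.643-06:00| worker-7235712| I125: Backtrace:",
]
for line in extract_panic_excerpt(log_lines, context=1):
    print(line)
```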

To make it easier for us to work with, please consider providing the log (full or partial) using the Attach facility in the lower-right corner when composing a reply, instead of copying-and-pasting it into the reply itself.

Thanks,

--

Darius

WTruitt
Contributor

Thanks, dariusd, for such a quick response. The VM is hosted on a 2013 Mac Pro (see Apple's "Mac Pro 2013 - Tech Specs" page).

The exact specs of our hardware are attached.

I noticed the issue on the 25th, but I've attached logs from the 22nd in case something from a few days earlier could have been the trigger.

WTruitt
Contributor

nancyz, the physical machine is running macOS 10.15.3 Catalina, while the VM is running macOS 10.14.5 Mojave.

dariusd
VMware Employee

I would not have called my response "quick" by any means, at about a week after you posted.  Unfortunately I missed this thread when you first posted it... I only noticed it when nancyz replied.  :(

The logs you provided are still unfortunately lacking much in the way of useful information.  Not any fault of yours, it's just a factor of the stuff that has ended up in the log file.

I'll reiterate my request to set the troubleshooting option for your VM(s) to Hang/Crash... that will be by far the best opportunity we have for obtaining useful information.  If you are unable to use that option for whatever reason, please let me know and we can explore other options.

Thanks,

--

Darius

dariusd
VMware Employee

I have no idea why my previous reply went to the wrong spot in this thread.  Sigh.

One more request for you.  Can you please post (in a reply here) the output of the following command:

   kextstat -l | grep -vF com.apple

Thanks in advance,

--

Darius

WTruitt
Contributor

Hi dariusd, here is the output of kextstat -l from both the VM and the physical machine hosting that VM.

VM logs:

   97    0 0xffffff7f80dcc000 0xd000     0xd000     com.vmware.kext.VMwareGfx (0806.83.93) D6614AF0-9EE3-3A63-B9F7-81E2F0FC6ECF <93 13 12 8 6 5 3 1>

  136    0 0xffffff7f82d25000 0xa000     0xa000     com.vmware.kext.vmhgfs (0806.83.93) 794E5642-5ACD-390E-B7E8-6CCCF46D49FB <6 5 3 1>

Physical Machine logs:

  165    0 0xffffff7f8499e000 0x15000    0x15000    com.vmware.kext.vmnet (1579.44.94) 19F76B68-99AD-376C-97FB-B08C4F47552F <6 5 3 1>

  166    0 0xffffff7f849b3000 0x13000    0x13000    com.vmware.kext.vmx86 (1579.44.94) 58E468C7-0B9F-3B53-B11C-1C084B5C262D <8 6 5 3 1>

  167    0 0xffffff7f849c6000 0x7000     0x7000     com.vmware.kext.vmioplug.18.7.0 (18.7.0) D901D91B-6725-3146-8C0B-E5664701C8AD <60 6 5 3 1>
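
For readers unfamiliar with the command dariusd asked for: `grep -vF com.apple` drops every line containing the literal string `com.apple`, leaving only third-party kernel extensions. A small sketch of the same filter follows. The function name `non_apple_kexts` is made up, the first sample line is a made-up Apple entry added for contrast, and unlike grep this version also drops blank lines.

```python
from typing import List


def non_apple_kexts(kextstat_output: str) -> List[str]:
    """Keep only the non-blank lines that do not contain 'com.apple'."""
    return [
        line
        for line in kextstat_output.splitlines()
        if line.strip() and "com.apple" not in line
    ]


# One hypothetical Apple entry plus two lines copied from the output above.
sample = """\
    1  100 0xffffff7f80a00000 0x1000     0x1000     com.apple.kpi.example (0.0.0) <hypothetical entry>
  165    0 0xffffff7f8499e000 0x15000    0x15000    com.vmware.kext.vmnet (1579.44.94) 19F76B68-99AD-376C-97FB-B08C4F47552F <6 5 3 1>
  166    0 0xffffff7f849b3000 0x13000    0x13000    com.vmware.kext.vmx86 (1579.44.94) 58E468C7-0B9F-3B53-B11C-1C084B5C262D <8 6 5 3 1>
"""

for line in non_apple_kexts(sample):
    print(line)
```

Only the two com.vmware lines survive the filter, which matches what was posted: no third-party kexts other than VMware's own appear on either machine.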

WTruitt
Contributor

The crash happened again on May 3, before I could enable the troubleshooting option dariusd suggested. Attaching the complete log here. I don't know if this will be any different from the old log of April 25, but it shows that the crashes happen quite frequently.

WTruitt
Contributor

dariusd, since these VMs are production VMs, I'm waiting for a maintenance window to enable the Hang/Crash option. Once I do that and get a crash, I'll attach the logs.

dariusd
VMware Employee

Thanks.  I'm still very perplexed by this failure and would be very interested to see what happens with the Hang/Crash option selected.  Note that there is a chance that this will make the VM fail sooner after power-on than it did previously...  If that happens, please just collect the vmware.log generated by the Hang/Crash run, then turn off the Troubleshooting option and go back to normal operation until we figure out what is happening.  The most important part is that I end up seeing the vmware.log from the Hang/Crash run.

Also... Since it's a production VM and Fusion is crashing, I would encourage you to make sure you have good backups of the VM and/or its contents.  Hopefully you already have that all under control.  As our forum regulars will point out, AutoProtect is not a "backup", particularly when dealing with hypervisor crashes.

Do you know if this failure correlates with a move from an earlier Fusion version up to version 11.5.2, or with the update of macOS from some earlier version to 10.15.3?  If there is such a correlation, do you know which version you were running previously (where the problem was not observed)?

Thanks,

--

Darius

WTruitt
Contributor

dariusd, here is the log after Hang/Crash was enabled. The crash happened on June 9, 2020. A sneak peek:

2020-06-09T13:47:43.554-06:00| worker-4| E105: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7FD51EE433B0; workerLibLock)

2020-06-09T13:47:43.554-06:00| worker-4| I125: Panic: can't get userlevel lock.

You may want to ignore the logs from May in the attached file.

dariusd
VMware Employee

Thanks for the update.  Unfortunately, the new log really doesn't provide much more useful information (EDITED to add: This is absolutely not any fault of yours!).  For such an unusual failure, that is truly a surprise.

I will do a bit more investigation here and get back to you.  Could you please keep the corefile (/cores/core.94362) for a while?  It will probably be a very large file, but it might prove invaluable in understanding what is happening.  Don't try to send it through just yet... just keep it aside if possible – i.e. if you don't need to reclaim the storage space... or haven't already done so...

Thanks,

--

Darius

WTruitt
Contributor

Saved the /cores/core.94362. It's huge - 1.7G

WTruitt
Contributor

dariusd, another crash on June 22. This is a cumulative log, so searching for June 22 will help.

WTruitt
Contributor

Could this be related to processors or memory? We have 4 processor cores and 4096 MB of RAM allocated to this VM.

Mikero
Community Manager

Sometimes (but not always) it can be related to faulty hardware. Have you done a memtest?

-
Michael Roy - Product Marketing Engineer: VCF
WTruitt
Contributor

No. Could you guide me here?

Mikero
Community Manager

How to use Apple Diagnostics on your Mac - Apple Support

or

How to use Apple Hardware Test on your Mac - Apple Support

depending on the model of your Mac.

-
Michael Roy - Product Marketing Engineer: VCF
Setheck
Contributor

Has anyone found a resolution to this? I am seeing the same thing, but on VMware Fusion 11.5.6.

The hosts on which I have seen this occur are running macOS 10.14.4, 10.14.5, and 10.15.7.

Since January 2021 (the last 3 months) I have seen 10 separate instances across 10 different hosts. I believe a hardware issue is very unlikely at this point.

In every instance the VM just stops, and the guest OS is left in a 'crashed', powered-down state. When you start it back up, it detects an unclean shutdown.

And the vmware.log on the host is always very similar... (see below)

I have also created a support request, and sent support information, but I would love any answers or any direction...

These are also production hosts...

 

Errors I'm seeing...

"2021-01-15T15:28:51.148-08:00| worker-12531694| E001: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7FA6B46EF210; workerLibLock)
2021-01-15T15:28:51.148-08:00| worker-12531694| I005: Panic: can't get userlevel lock."


"2021-01-13T00:32:09.646-08:00| worker-531216| E001: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7FC04E52E260; workerLibLock)
2021-01-13T00:32:09.646-08:00| worker-531216| I005: Panic: can't get userlevel lock."


"2021-02-03T14:05:37.443-08:00| worker-24136677| E001: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7FF4BFD98460; workerLibLock)
2021-02-03T14:05:37.443-08:00| worker-24136677| I005: Panic: can't get userlevel lock."


"2021-02-05T00:38:13.723-08:00| worker-630581| E001: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7F97BB410F80; workerLibLock)
2021-02-05T00:38:13.723-08:00| worker-630581| I005: Panic: can't get userlevel lock."


"2021-02-05T15:00:38.877-08:00| worker-9571961| E001: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7FD43AC5D510; workerLibLock)
2021-02-05T15:00:38.877-08:00| worker-9571961| I005: Panic: can't get userlevel lock."


"2021-02-25T09:00:12.954-08:00| worker-7075291| E001: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7FA52206A8D0; workerLibLock)
2021-02-25T09:00:12.954-08:00| worker-7075291| I005: Panic: can't get userlevel lock."


"2021-03-03T00:34:30.635-08:00| worker-7738092| E001: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7FC433F27960; workerLibLock)
2021-03-03T00:34:30.635-08:00| worker-7738092| I005: Panic: can't get userlevel lock."


"2021-03-08T01:21:28.896-08:00| worker-22647803| E001: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7FBAC5D7CDC0; workerLibLock)
2021-03-08T01:21:28.896-08:00| worker-22647803| I005: Panic: can't get userlevel lock."


"2021-03-09T04:17:39.694-08:00| worker-7144844| E001: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7FBA3D031090; workerLibLock)
2021-03-09T04:17:39.694-08:00| worker-7144844| I005: Panic: can't get userlevel lock."


"2021-03-11T09:44:01.212-08:00| worker-9657756| E001: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7F81FCAE33D0; workerLibLock)
2021-03-11T09:44:01.212-08:00| worker-9657756| I005: Panic: can't get userlevel lock."
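
To compare occurrences like the ones listed above across many hosts, the PANIC lines can be parsed into fields. The following is a rough sketch, with the regular expression inferred only from the log lines quoted in this thread (not from any documented log format), and with `PANIC_RE` and `parse_panics` being made-up names.

```python
import re
from collections import Counter
from typing import Dict, List

# Pattern inferred from the PANIC lines quoted in this thread.
PANIC_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+[+-]\d{2}:\d{2})\|\s*"
    r"(?P<thread>[^|\n]+)\|\s*"
    r"(?P<level>[A-Z]\d{3}): PANIC: MXUserWaitInternal: failure (?P<code>\d+) "
    r"on condVar \((?P<addr>0x[0-9A-Fa-f]+); (?P<lock>[^)]+)\)"
)


def parse_panics(log_text: str) -> List[Dict[str, str]]:
    """Return one dict per MXUserWaitInternal PANIC line found in the text."""
    return [m.groupdict() for m in PANIC_RE.finditer(log_text)]


# Two of the occurrences quoted above.
sample = """\
2021-01-15T15:28:51.148-08:00| worker-12531694| E001: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7FA6B46EF210; workerLibLock)
2021-01-15T15:28:51.148-08:00| worker-12531694| I005: Panic: can't get userlevel lock.
2021-01-13T00:32:09.646-08:00| worker-531216| E001: PANIC: MXUserWaitInternal: failure 22 on condVar (0x7FC04E52E260; workerLibLock)
2021-01-13T00:32:09.646-08:00| worker-531216| I005: Panic: can't get userlevel lock.
"""

panics = parse_panics(sample)
# Count occurrences per (failure code, lock name) pair.
print(Counter((p["code"], p["lock"]) for p in panics))
for p in panics:
    print(p["ts"], p["thread"], p["addr"])
```

In every occurrence quoted in this thread, the failure code (22) and the lock name (workerLibLock) are the same, while the thread name and condVar address differ.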

 
