VMware Communities
antnythr
Contributor

VMWare Player crashing on Windows Guest Shutdown

For the last 10 days or so, every time I shut down or restart Windows (Windows guest, Arch host), VMware immediately freezes and the process cannot be killed. I end up doing a hard reboot because I can't even shut down the host cleanly, although apart from VMware the system stays fully responsive until then. The following shows up when I run dmesg:

[ 1954.873606] BUG: unable to handle kernel paging request at ffffffffc057f14e

[ 1954.873651] IP: report_bug+0x94/0x120

[ 1954.873666] PGD 23ba0c067

[ 1954.873666] P4D 23ba0c067

[ 1954.873677] PUD 23ba0e067

[ 1954.873688] PMD 32ed01067

[ 1954.873700] PTE 800000032edf7161

[ 1954.873730] Oops: 0003 [#1] PREEMPT SMP

[ 1954.873744] Modules linked in: dm_mod dax ctr ccm cmac rfcomm vmnet(O) ppdev parport_pc parport fuse vmw_vsock_vmci_transport vsock vmw_vmci vmmon(O) arc4 bnep bbswitch(O) intel_rapl dell_wmi x86_pkg_temp_thermal intel_powerclamp sparse_keymap iTCO_wdt coretemp iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc dell_laptop dell_smbios dcdbas aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf btusb psmouse btrtl btbcm i915 pcspkr btintel bluetooth i2c_i801 iwlmvm snd_hda_codec_hdmi mac80211 ecdh_generic snd_hda_codec_realtek snd_hda_codec_generic input_leds evdev led_class joydev mousedev mac_hid iwlwifi snd_hda_intel atl1c lpc_ich cfg80211 drm_kms_helper snd_hda_codec uvcvideo videobuf2_vmalloc drm snd_hda_core

[ 1954.873979]  videobuf2_memops snd_hwdep rfkill videobuf2_v4l2 snd_pcm videobuf2_core intel_gtt syscopyarea videodev snd_timer mei_me sysfillrect snd sysimgblt fb_sys_fops media mei soundcore i2c_algo_bit shpchp thermal wmi battery dell_smo8800 video tpm_tis button tpm_tis_core ac tpm sch_fq_codel ip_tables x_tables ext4 crc16 jbd2 fscrypto mbcache sd_mod hid_generic usbhid hid serio_raw atkbd libps2 xhci_pci xhci_hcd ahci libahci libata scsi_mod ehci_pci ehci_hcd usbcore usb_common i8042 serio

[ 1954.874130] CPU: 3 PID: 5334 Comm: vmx-vcpu-0 Tainted: G           O    4.12.6-1-ARCH #1

[ 1954.874156] Hardware name: Dell Inc.  Dell System XPS 15Z/0MFNCV, BIOS A12 09/07/2012

[ 1954.874184] task: ffff89daea7a8e40 task.stack: ffffa34688da4000

[ 1954.874204] RIP: 0010:report_bug+0x94/0x120

[ 1954.874219] RSP: 0018:ffffa34688da7b10 EFLAGS: 00010202

[ 1954.874237] RAX: 0000000000000907 RBX: ffffa34688da7c78 RCX: ffffffffc057f144

[ 1954.874260] RDX: 0000000000000001 RSI: 0000000000000ed2 RDI: 0000000000000001

[ 1954.874283] RBP: ffffa34688da7b30 R08: ffffa34688da8000 R09: 000000000000034e

[ 1954.874306] R10: ffffffffa6a06a80 R11: ffff89daea7a8e00 R12: ffffffffc0532e34

[ 1954.874329] R13: ffffffffc0577366 R14: 0000000000000004 R15: ffffa34688da7c78

[ 1954.874353] FS:  00007fb1dd387700(0000) GS:ffff89daffac0000(0000) knlGS:0000000000000000

[ 1954.874379] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[ 1954.874398] CR2: ffffffffc057f14e CR3: 000000032aea2000 CR4: 00000000000406e0

[ 1954.874421] Call Trace:

[ 1954.874439]  ? ext4_set_page_dirty+0x44/0x60 [ext4]

[ 1954.874459]  fixup_bug+0x2e/0x50

[ 1954.874474]  do_trap+0x119/0x150

[ 1954.874487]  do_error_trap+0x89/0x110

[ 1954.874504]  ? ext4_set_page_dirty+0x44/0x60 [ext4]

[ 1954.874522]  ? __enqueue_entity+0x6c/0x70

[ 1954.874537]  ? enqueue_entity+0x109/0x740

[ 1954.874553]  do_invalid_op+0x20/0x30

[ 1954.874567]  invalid_op+0x1e/0x30

[ 1954.874582] RIP: 0010:ext4_set_page_dirty+0x44/0x60 [ext4]

[ 1954.874600] RSP: 0018:ffffa34688da7d28 EFLAGS: 00010246

[ 1954.874618] RAX: 017fff0000001068 RBX: ffffdee80b6ad200 RCX: 0000000000000000

[ 1954.874642] RDX: 0000000000000000 RSI: 0000000000000041 RDI: ffffdee80b6ad200

[ 1954.874665] RBP: ffffa34688da7d28 R08: ffff89daef2ce240 R09: 0000000180400037

[ 1954.874688] R10: ffffa34688da7d48 R11: ffff89daea7a8e00 R12: 0000000000000041

[ 1954.874711] R13: 0000000000000001 R14: ffff89dace1dc860 R15: 0000000000000000

[ 1954.874745]  set_page_dirty+0x5b/0xb0

[ 1954.874767]  qp_release_pages+0x64/0x80 [vmw_vmci]

[ 1954.874792]  qp_host_unregister_user_memory.isra.20+0x27/0x80 [vmw_vmci]

[ 1954.874825]  vmci_qp_broker_detach+0x245/0x3f0 [vmw_vmci]

[ 1954.874854]  vmci_host_unlocked_ioctl+0x18e/0xa30 [vmw_vmci]

[ 1954.874887]  do_vfs_ioctl+0xa5/0x600

[ 1954.874908]  ? __fget+0x6e/0x90

[ 1954.874927]  SyS_ioctl+0x79/0x90

[ 1954.874947]  entry_SYSCALL_64_fastpath+0x1a/0xa5

[ 1954.874970] RIP: 0033:0x7fb34cf8f8b7

[ 1954.874990] RSP: 002b:00007fb1dd37eb28 EFLAGS: 00003246 ORIG_RAX: 0000000000000010

[ 1954.875029] RAX: ffffffffffffffda RBX: 00007fb1dd386580 RCX: 00007fb34cf8f8b7

[ 1954.875066] RDX: 00007fb1dd37eb40 RSI: 00000000000007aa RDI: 0000000000000065

[ 1954.875106] RBP: 0000000000000006 R08: 0000000000000002 R09: 00007fb1dd37ecd0

[ 1954.875147] R10: 00007fb1dd386bfe R11: 0000000000003246 R12: 00007fb34b8be010

[ 1954.875191] R13: 00000000747ff0ec R14: ffffffffffffffff R15: 0000000000000006

[ 1954.875236] Code: 74 59 0f b7 41 0a 4c 63 69 04 0f b7 71 08 89 c7 49 01 cd 83 e7 01 a8 02 74 15 66 85 ff 74 10 a8 04 ba 01 00 00 00 75 26 83 c8 04 <66> 89 41 0a 66 85 ff 74 49 0f b6 49 0b 4c 89 e2 45 31 c9 49 89

[ 1954.875365] RIP: report_bug+0x94/0x120 RSP: ffffa34688da7b10

[ 1954.876980] CR2: ffffffffc057f14e

[ 1954.886476] ---[ end trace cdf1cc11c1557bdd ]---

I've now tried kernels 4.12.{3,5,6,8}, with VMWare 12.5.6 as well as 12.5.7. The issue also persists with existing and clean installs of Windows 10.

I can't think of anything else that's changed, and everything has worked fine for a very long time up to this point.

Any ideas as to what might be going on?

8 Replies
antnythr
Contributor

It's not an ideal solution, but I fixed the issue by downgrading my kernel to 4.9-LTS...

No idea what the actual problem was. I was under the impression VMWare Player worked with 4.12 (and it did for me on a number of versions), and then all of a sudden no version of 4.12 would work.

mfvianna
Contributor

(NOTE: Jump straight to answer 5 if you want a summarized list of the 3 workarounds I have identified so far for this issue. Continue here if you want some history on the progress and how I arrived at each one.)

I have had the same issue since probably kernel 4.10.  Sometimes it just throws the ext4_set_page_dirty+0x44/0x60 exception in dmesg, while other times it freezes my box the same way you describe (I also cannot perform a clean shutdown).

Unfortunately, downgrading the kernel is not a viable option for me.  Fedora upgrades packages very often, and even if I install kernel 4.9, I cannot compile the vmware modules because they require gcc-6.3 when compiling them for kernel 4.9.

So, I started looking for alternatives.  Since the exception explicitly mentions the ext4_set_page_dirty kernel routine, I guessed that if the VM files were on a different fs the issue might not occur, so I created a separate partition for the VM files and formatted it with xfs at first and then with btrfs.  In both cases it worked: no more ext4_set_page_dirty exception and no more crashes (tested many times already, although I only found this workaround today).
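Before moving the VM files, it may help to confirm which filesystem they currently live on. A minimal sketch, assuming a df that supports the -T flag (GNU coreutils and BusyBox both do); the function name fs_type is made up for illustration:

```shell
#!/bin/sh
# Print the filesystem type of the mount that holds the given path.
# Assumes a df that supports -T; -P keeps each entry on a single line.
fs_type() {
  df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# Example: fs_type /home/marcelo/vmware
```

If this reports ext4 for the directory holding the VM files, the workaround described above would apply.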

I guess the issue is caused by some race condition in the ext4 code since kernel 4.10, but I'm not a kernel specialist.  I'm guessing this because the chance of a crash (as opposed to the exception in dmesg, which always happens regardless of the crash) increases if I'm running any disk-intensive task while the shutdown is performed.  I'm not sure whether VMware or the kernel developers should fix it, and this certainly requires a lot of debugging.  In any case, a good starting point would be to look at the kernel changes in the ext4 fs code between 4.9 and 4.10.

What surprises me most is that I can't find many references to this issue on Google.  This makes me believe that it is not affecting many people.  Just checking for similarities: I'm using a Fedora 27 host with a Windows 7 Pro guest installed on two different computers, and both have the same issue.  On one of these machines the guest OS is an almost fresh install of Win7 (just a few days old) with minimal additional software.  What they have in common is that they both have Norton Antivirus installed (I didn't try to reproduce the issue without NAV).

I hope the different filesystem is a viable alternative for you or whoever reads this post, at least until VMware or the kernel developers fix it.  This is a very annoying and critical bug.

mfvianna
Contributor

Actually, there is an even quicker, easier and less radical solution than the separate non-ext4 partition workaround I mentioned previously.

Apparently, bindfs (fuse bindfs) is enough to trick VMware so that it doesn't execute any ext4-specific code (probably optimizations) that triggers the exception.  Consequently there is no need to create separate partitions (which would require pre-allocating space as well as an adventurous spirit to try btrfs or xfs).

Indeed, bindfs even allows a directory to be mounted on top of itself, so you can mount the directory where your VM files are on top of itself before running vmware workstation or vmplayer.  The VM engine won't even notice the change or ask if you moved or copied the VM.

Important note: bindfs is not the same as "mount -B" or "mount --bind".  bindfs is a user-level filesystem.  A native Linux bind mount will not work as a workaround because it still identifies the file system as ext4.

The only "issue" is that using bindfs requires root permissions unless it is run with the --no-allow-other option and, unfortunately, this option prevents the workaround from working with vmware.  In any case, it is also possible to add an fstab entry specifying fuse.bindfs as the FS type, or to use sudo.

In my case I just created a script that runs "sudo mount -o multithreaded -t fuse.bindfs /home/marcelo/vmware /home/marcelo/vmware" before running vmplayer.
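For the fstab route, the entry would presumably mirror the mount command above: the directory as both source and mount point, fuse.bindfs as the type, and multithreaded as the option. The helper below only prints such a line so it can be reviewed before being added to /etc/fstab by hand. The function name is hypothetical, and the trailing "0 0" fields (no dump, no fsck) are an assumption rather than something stated in this thread:

```shell
#!/bin/sh
# Print an /etc/fstab line that bindfs-mounts a directory on top of itself.
# The option mirrors "-o multithreaded" from the mount command above.
bindfs_fstab_line() {
  printf '%s %s fuse.bindfs multithreaded 0 0\n' "$1" "$1"
}

# Example: bindfs_fstab_line /home/marcelo/vmware
```

Paths containing spaces would need extra escaping in fstab, which this sketch does not handle.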

Note: You may need to install bindfs on your system.  As far as I know, bindfs is not installed by default by any distro.

Note 2: There is a small performance impact for long multi-file tasks, like scanning entire disks for viruses, which can take a few minutes longer, but nothing perceptible in daily use.  Some performance improvement is also available with the -o multithreaded option at mount time.  (For me, the performance impact is so small that it is certainly outweighed by not needing another fs type in a separate partition, which is the only other workaround I have found so far.)

mfvianna
Contributor

Here is a script I wrote to automate the bindfs mounting process (it has to be adapted to whatever directory your VM files reside in):

Note: Allow mount and umount for your user in the /etc/sudoers file if you don't want the script to keep asking for your password (if you are not familiar with sudo, research how to do this properly to avoid opening security holes in your system).

Note 2: This script uses vmplayer; change that if you use Workstation instead.

---- CUT HERE ----

#!/bin/bash
# Directory holding the VM files; adapt this to your setup.
VMDIR=/home/marcelo/vmware

#export VMWARE_USE_SHIPPED_LIBS=force

# Mount the directory on top of itself through bindfs unless that is already done.
if ! mount | grep -q "$VMDIR on $VMDIR"; then
  if ! sudo mount -o multithreaded -t fuse.bindfs "$VMDIR" "$VMDIR" > /dev/null; then
    echo "Bindfs mount failed. Please check if bindfs is correctly installed."
    exit 1
  fi
fi

# Run the player, passing along any command-line arguments.
vmplayer "$@"

# Remove the bindfs mount once the player exits.
sudo umount "$VMDIR" > /dev/null
exit 0

---- CUT HERE ----

mfvianna
Contributor

In addition to the two workarounds I provided above, I've also dug a little further and found that the issue doesn't happen only when the machine is shut down, but whenever VMware Tools exits, which also happens during shutdown (actually, right after the user requests the shutdown).

Continuing the investigation, I isolated the VMCI driver in VMware Tools as the component that triggers the issue (by uninstalling this specific driver).  Later, I found a better and less radical approach: disable the vmci virtual hardware instead of uninstalling its driver from VMware Tools.

The VMCI virtual hardware enables two features: 1) it allows two virtual machines to communicate with each other, bypassing the network layer; 2) it allows the use of shared folders (mounting host files in the guest OS).  I use neither of them (I use actual samba exports on a physical server to share files instead).  If you don't need either of these features, this third workaround is for you.  The procedure is very simple: with the virtual machine stopped, first make a backup copy of your .vmx file, then edit it and change the line containing:

vmci0.present = "TRUE"

                    to

vmci0.present = "FALSE"

Note: It is important to back up the .vmx file because after the first boot with vmci disabled, other options will also be changed automatically (like the PCI slot of the vmci hardware being changed to -1), so the backup will allow you to re-enable it in the future if you want.
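The backup-then-edit procedure can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name disable_vmci and the .bak suffix are made up here, and the line is assumed to be spelled exactly as shown above (uppercase "TRUE"). Run it only with the virtual machine stopped:

```shell
#!/bin/sh
# disable_vmci VMX_FILE
# Copy VMX_FILE to VMX_FILE.bak, then rewrite VMX_FILE with
# vmci0.present set to "FALSE". Refuses to overwrite an existing backup.
disable_vmci() {
  vmx=$1
  if [ -e "$vmx.bak" ]; then
    echo "Backup $vmx.bak already exists; not touching $vmx" >&2
    return 1
  fi
  cp -p "$vmx" "$vmx.bak" || return 1
  # Read from the backup and write the edited copy over the original.
  sed 's/^vmci0\.present[[:space:]]*=[[:space:]]*"TRUE"/vmci0.present = "FALSE"/' \
    "$vmx.bak" > "$vmx"
}

# Example: disable_vmci "/path/to/your-vm.vmx"
```

To re-enable VMCI later, copy the .bak file back over the .vmx file while the VM is stopped.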

In summary there are currently 3 workarounds available:

1)  Use a file system other than ext4 for holding the virtual machine files (like btrfs, xfs or jfs)

     trade off: requires a separate partition just for the VM files (or a reformatting of the current one);

2) bindfs (fuse) mount the directory containing the virtual machine files on top of itself before turning on the virtual machine

    trade off: a small hit in performance;

3) disable the vmci virtual hardware in the .vmx file

    trade off: cannot use shared folders

Pedulla57
Contributor

Good stuff, so this thread should not die. :)

I've been struggling with this for a while and glad I found this explanation.

I'm running a licensed VMWP 12 on Mint 18.1 Mate, kernel 4.4.0-138-generic.

I can confirm that turning OFF Shared Folders before I shut the VM down ends the problem of the host freezing.

I like the samba share suggestion, but Clem decided to remove samba from Mint Mate.  When I try to install it, there is an issue with samba-libs wanting an older version:

Depends: samba-libs (= 2:4.3.11+dfsg-0ubuntu0.16.04.16) but 2:4.3.11+dfsg-0ubuntu0.16.04.17 is to be installed

But that's a problem for another forum ;)

Wondering if there is a command line option to turn off the Shared Folders and then execute the shutdown?

This way I could script up the safe shutdown.

wila
Immortal

Hi,

Regarding scripting a safe shutdown.

You could use vmrun and pass it the "disableSharedFolders" option.

Then follow that up with the vmrun "stop" command (with the "soft" option) to shut down your VM.

You could create a similar script to start the VM and have it enable shared folders on boot.

See:

VIX API Documentation
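The two scripts described above could be sketched as a pair of shell functions. This is a sketch, not a tested recipe: the function names and the VMRUN override variable are made up for illustration, and depending on the product vmrun may also need a host-type flag such as -T, which is left out here:

```shell
#!/bin/sh
# Command used to reach vmrun; overridable for testing (assumption).
VMRUN=${VMRUN:-vmrun}

# Turn shared folders off first, then ask the guest for a soft shutdown.
vm_safe_stop() {
  "$VMRUN" disableSharedFolders "$1" && "$VMRUN" stop "$1" soft
}

# Start the VM, then turn shared folders back on.
vm_start_shared() {
  "$VMRUN" start "$1" && "$VMRUN" enableSharedFolders "$1"
}

# Example: vm_safe_stop "/path/to/your-vm.vmx"
```

Note that this only protects shutdowns started through the script; shutting down from inside the guest skips the disableSharedFolders step.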

--
Wil

| Author of Vimalin. The virtual machine Backup app for VMware Fusion, VMware Workstation and Player |
| More info at vimalin.com | Twitter @wilva
postcert
Contributor

For those who are lazy

vmrun disableSharedFolders "$VMX_PATH"
vmrun stop "$VMX_PATH" soft

This worked exactly twice for me; then I instinctively shut down my VM from inside Windows and lost work on the host.

Setting up a Samba share is by far the safest/easiest/fastest :)

Thanks a ton for finding out the root cause, mfvianna!
