VMware Communities
DickHinSC
Contributor

VMware Workstation 14.1.2 build-8497320 locks up

VMware Workstation 14.1.2 build-8497320 locks up and becomes "non-responsive"; the vmware.exe service reports "waiting on I/O". The Workstation window fades out (two "vmware.DMP" files are available). The host can't kill either the app or the service, so the only solution is a complete host power-off and restart. Often the guest VMs are damaged "beyond repair", so Windows 10 Pro removes all apps, keeps user data, and installs a fresh, out-of-the-box copy of Windows 10 WITHOUT any settings for the local environment. UGLY and time-consuming.

This happens when switching to another Workstation VM tab, even when only one VM is powered on and the others are shut down. The powered-on VM is damaged, as are all powered-on VMs (see above).

The same VMs run GREAT in 3 active instances of VMware Player, and it's easy to switch between the VMs (and the host).

Dell Precision Tower 5820, Intel Xeon W-2125, 64 GB RAM, MR9440-8i SCSI Disk Device (RAID 5), Windows 10 Pro for Workstations 1803 (build 17134.165). All Dell drivers and BIOS are current. All of Dell's system tests pass.

I've been a Workstation user since the version 6 days. This is the first time I've run into non-responsive lockups that require a host power-off.

Any help would be greatly appreciated.

bonnie201110141
VMware Employee

From the developer's analysis:

The most immediately apparent thing in the logs is that the VM is spending a *lot* of time waiting for disk I/O to complete on the host. Sorted by time-to-I/O-completion:

2018-08-15T02:46:29.927-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 12.279 seconds (ok)

2018-08-15T15:30:28.582-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 13.278 seconds (ok)

2018-08-15T07:58:32.302-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 13.571 seconds (ok)

2018-08-15T07:58:32.302-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 13.571 seconds (ok)

2018-08-15T07:58:32.303-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 13.571 seconds (ok)

2018-08-15T07:58:32.303-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 13.571 seconds (ok)

2018-08-15T07:58:32.304-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 13.571 seconds (ok)

2018-08-15T07:58:32.305-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 13.571 seconds (ok)

2018-08-15T07:58:32.302-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 13.572 seconds (ok)

2018-08-15T07:58:32.304-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 13.572 seconds (ok)

2018-08-15T07:58:32.301-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 13.573 seconds (ok)

2018-08-15T07:58:32.303-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 13.574 seconds (ok)

2018-08-15T03:13:34.661-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 14.348 seconds (ok)

2018-08-15T04:12:37.051-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 14.384 seconds (ok)

2018-08-15T04:32:31.292-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 14.613 seconds (ok)

2018-08-15T05:09:20.116-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 15.101 seconds (ok)

2018-08-15T03:30:12.098-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 15.558 seconds (ok)

2018-08-15T04:31:30.906-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 16.127 seconds (ok)

2018-08-15T05:09:20.116-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 17.317 seconds (ok)

2018-08-15T04:11:03.417-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 21.030 seconds (ok)

2018-08-15T15:30:28.585-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 22.144 seconds (ok)

2018-08-15T15:30:28.583-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 22.145 seconds (ok)

2018-08-15T15:30:28.584-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 22.145 seconds (ok)

2018-08-15T15:30:28.585-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 22.148 seconds (ok)

2018-08-15T15:30:28.584-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 22.152 seconds (ok)

2018-08-15T15:30:28.583-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 22.153 seconds (ok)

2018-08-15T15:30:28.582-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 22.154 seconds (ok)

2018-08-15T04:32:10.387-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 25.243 seconds (ok)

2018-08-15T04:18:54.204-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 26.931 seconds (ok)

2018-08-15T04:38:18.615-04:00| vcpu-1| I125: scsi0:0: Command WRITE(10) took 44.075 seconds (ok)

2018-08-15T04:38:18.614-04:00| vcpu-1| I125: scsi0:0: Command WRITE(10) took 56.063 seconds (ok)

2018-08-15T04:38:18.614-04:00| vcpu-1| I125: scsi0:0: Command WRITE(10) took 56.063 seconds (ok)

2018-08-15T04:38:18.614-04:00| vcpu-1| I125: scsi0:0: Command WRITE(10) took 65.199 seconds (ok)

2018-08-15T04:38:18.614-04:00| vcpu-1| I125: scsi0:0: Command WRITE(10) took 66.061 seconds (ok)

2018-08-15T04:38:18.614-04:00| vcpu-1| I125: scsi0:0: Command WRITE(10) took 75.544 seconds (ok)

2018-08-15T02:10:55.739-04:00| vcpu-0| I125: scsi0:0: Command WRITE(10) took 348.559 seconds (ok)

2018-08-15T02:10:55.739-04:00| vcpu-0| I125: scsi0:0: Command WRITE(10) took 356.550 seconds (ok)

2018-08-15T02:10:55.739-04:00| vcpu-0| I125: scsi0:0: Command WRITE(10) took 376.094 seconds (ok)

2018-08-15T02:10:55.739-04:00| vcpu-0| I125: scsi0:0: Command WRITE(10) took 382.512 seconds (ok)

2018-08-15T02:10:55.739-04:00| vcpu-0| I125: scsi0:0: Command WRITE(10) took 384.303 seconds (ok)

2018-08-15T02:10:55.739-04:00| vcpu-0| I125: scsi0:0: Command WRITE(10) took 384.431 seconds (ok)
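For anyone who wants to pull the same list out of their own logs, here is a minimal sketch of that "sort by time-to-I/O-completion" step. It assumes only that the lines look exactly like the ones quoted above; the function name `slow_scsi_commands` and the 10-second default threshold are hypothetical choices for illustration, not a VMware tool.

```python
import re
from typing import Iterable, List, Tuple

# Matches log lines of the form quoted above, e.g.
# 2018-08-15T02:46:29.927-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 12.279 seconds (ok)
SLOW_CMD_RE = re.compile(
    r"^(?P<ts>[^|]+)\|\s*(?P<thread>[^|]+)\|.*?"
    r"(?P<dev>scsi\d+:\d+): Command (?P<cmd>\S+) took (?P<secs>\d+(?:\.\d+)?) seconds"
)

Entry = Tuple[float, str, str, str, str]  # (seconds, timestamp, thread, device, command)


def slow_scsi_commands(lines: Iterable[str], threshold: float = 10.0) -> List[Entry]:
    """Collect SCSI commands that took at least `threshold` seconds,
    sorted by time-to-completion, shortest first (as in the list above)."""
    hits: List[Entry] = []
    for line in lines:
        m = SLOW_CMD_RE.match(line)
        if m is None:
            continue  # not a "Command ... took N seconds" line
        secs = float(m.group("secs"))
        if secs >= threshold:
            hits.append((secs, m.group("ts"), m.group("thread"), m.group("dev"), m.group("cmd")))
    hits.sort(key=lambda e: e[0])
    return hits


if __name__ == "__main__":
    sample = [
        "2018-08-15T02:46:29.927-04:00| vmx| I125: scsi0:0: Command WRITE(10) took 12.279 seconds (ok)",
        "2018-08-15T02:10:55.739-04:00| vcpu-0| I125: scsi0:0: Command WRITE(10) took 384.431 seconds (ok)",
        "hypothetical unrelated line that the regex should skip",
    ]
    for secs, ts, thread, dev, cmd in slow_scsi_commands(sample):
        print(f"{secs:9.3f}s  {ts}  {thread:7}  {dev}  {cmd}")
```

To scan a real VM log (vmware.log in the VM's folder), pass an open file instead of the sample list, e.g. `with open("vmware.log", encoding="utf-8", errors="replace") as f: print(slow_scsi_commands(f))`.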

Q1) Does the MegaRAID management utility installed on the host OS (or in the host's firmware) reveal the health of the MegaRAID 9440-8i controller and its attached storage devices, or display an event log for the controller, or provide statistics for the controller-perceived I/O throughput and I/O latency while the VMs are running? (I'm not familiar with that controller or its available diagnostic features.)

Q2) It looks like there is a newer driver available for MegaRAID 9440-8i. The host is currently using driver version 7.705.7.0 (February 2018), and version 7.706.2.0 (May 2018) is currently available on the Broadcom website (but not yet on the Dell site):

   https://www.broadcom.com/products/storage/raid-controllers/megaraid-9440-8i

Can the customer try upgrading the host's MegaRAID 9440-8i driver and see if that helps?

DickHinSC
Contributor

bonnie201110141,

Sorry I missed seeing this post from you sooner. I hadn't noticed that this thread had moved to a new page on the Community site.

Q1) The system passed all built-in BIOS tests, as well as Dell's SupportAssist. I haven't found a health or event log inside the RAID controller. I'll keep looking.

Q2) BIOS: I'm running 1.6.2, but I see that 1.7.1 is available.

     Intel Rapid Storage Technology enterprise Driver and Management Console: I'm running 5.3.1.19, but I see that 5.3.1.1031 is available.

     At this point, I hesitate to install the device manufacturer's driver until it's been vetted by Dell (I'll check with Dell's service group; the system is under warranty). I'll also look at the Broadcom site; perhaps they describe what the changes fix.

Juxxi
Contributor

The problem still persists: VMware Workstation 14.1.3 build-9474260 with 1 Linux guest (Ubuntu) open, Windows 10 host 17134.228. After 12 hours of runtime, the host OS becomes unresponsive and the only option is to cold-reboot the host system.

@VMware: any ideas so far?! Just out of curiosity, how many of you have disabled the Spectre/Meltdown mitigations even though the CPU/BIOS supports them (e.g. through the Windows registry)? I have (in the host OS), and I wondered whether that might be related to the issue.

DickHinSC
Contributor

I just uploaded via FTP "DickHinSC 2018-08-28 Slow Suspend.zip". As the name implies, it captures a very slow guest suspend. The host ran for two and a half days with only 2 Windows 7 guests, both doing very little. I wanted to restart the host and the VMs so I would have a fresh system to record an important radio program in real time. One of the guests shut down in the expected time. The other took over half an hour: WAY TOO LONG.

Finally, I was able to restart the host and run the slow-suspending guest. It started in a very reasonable amount of time, and today I tested suspend (good speed), shutdown (good speed), and power-on from shutdown (good speed).

So something is WRONG. As also reported in other posts, over time (hours or days) Workstation 14.1.3 slows way down and eventually locks up, and it can even lock the host. This release is NOT READY for PRODUCTION USE.

All Dell drivers and Microsoft updates are current on the host, and all Microsoft updates are current in the guests. That leaves VMware's code as the likely culprit. I've been fighting this release 14 problem since EARLY JUNE, and it's now LATE AUGUST. It's about time VMware delivered a working, reliable, dependable Workstation. All releases from version 6 through 12 were EXCELLENT; version 14 and its support are NOT.

How about a realistic timeline for delivering a working product?!

Juxxi
Contributor

ANYONE at @VMware!?

What is the status of this issue? It seems that heavy guest disk I/O eventually crashes the host OS (both Windows and Linux). The issue is easy to reproduce: just install a guest OS and run synthetic disk I/O for 24 hours. The host OS begins to accumulate stalled I/O requests until the host system crashes.

Can you please respond urgently?
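Juxxi doesn't say which tool generated the synthetic load (Susie asks about that further down). As a rough illustration only, assuming Python 3 is available in the guest, the sketch below does sustained synchronous writes and records the latency of each one, so stalls seen inside the guest can be compared with the "Command WRITE(10) took N seconds" entries in vmware.log. The function name, file name, block size, file size, and 5-second stall threshold are all hypothetical choices, not Juxxi's actual test.

```python
import os
import time
from typing import List


def synthetic_write_load(path: str, duration_s: float, block_size: int = 1 << 20,
                         file_blocks: int = 256, stall_s: float = 5.0) -> List[float]:
    """Repeatedly overwrite a fixed-size file in `block_size` chunks, forcing each
    write to disk with fsync, for roughly `duration_s` seconds.

    Returns the latency (seconds) of every write+fsync and prints any that
    exceed `stall_s`, so stalls are visible from inside the guest.
    """
    block = os.urandom(block_size)  # one random block, reused for every write
    latencies: List[float] = []
    deadline = time.monotonic() + duration_s
    with open(path, "wb") as f:
        i = 0
        while time.monotonic() < deadline:
            f.seek((i % file_blocks) * block_size)  # wrap around within the file
            start = time.monotonic()
            f.write(block)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to commit it to the (virtual) disk
            elapsed = time.monotonic() - start
            latencies.append(elapsed)
            if elapsed >= stall_s:
                print(f"write #{i} took {elapsed:.3f} s")
            i += 1
    return latencies
```

A 24-hour run along the lines Juxxi describes would be something like `lat = synthetic_write_load("stress.bin", 24 * 3600)` followed by `print(max(lat), sum(lat) / len(lat))`. Using fsync on every write keeps the guest's page cache from hiding host-side stalls, at the cost of a lighter load than a dedicated benchmark tool would generate.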

DickHinSC
Contributor

I've just uploaded via FTP "DickHinSC More Lockups.zip". It contains:

vmware-vmx.DMP,

14-1-3 Screenshot_MORE LOCKUP.docx, with screenshots of the host showing Task Manager and its services. Note the performance screens that show Disk at 100% and an average response time of 1+ to 3+ seconds, and the causes of the delays (Waiting on I/O). [On page 64, Juxxi pointed out the I/O waits stacking up.]

Three vmsupports...zip, one for each invocation of Workstation 14.1.3,

and

some Workstation logs.

THIS PROBLEM:

There were two Workstation invocations, each with one Win7 guest, and one Workstation invocation with two Win10 guests. All had been in "idle mode" for about 12 hours overnight, just sitting there. In the morning the Workstation windows were non-responsive. Fortunately, parts of the host still functioned well enough to let the host shut down the guests gracefully, even though the guests couldn't shut themselves down.

Note that all Dell drivers and BIOS are current, and so are the Microsoft Windows Updates.

I sincerely wish y'all would issue an update that fixes these problems.  WS 14 has been out for a full year.  That's way too long to still be having these problems.

Jake_99
Contributor

Adding to the pile.

I have an older PC with VMware Workstation (14.1.2, latest build):

          6-core Intel Extreme CPU (X99),

          48GB of DDR3 RAM,

          dual Nvidia 970 graphics cards,

          SSDs used exclusively for VMs,

          Win10 1803,

          ASUS Rampage III Black (BIOS 0602),

          watercooled.

I use Workstation a lot (every day), usually with 3-4 VMs active. No CPU lockups ever occurred on the 6-core machine, but RAM/CPU limitations were hampering larger GNS3 simulations.

I built a new system to take over.

New system:

     18-core (7980XE) CPU

     128GB DDR4

     Nvidia 1080Ti

     Samsung NVMe 3x 1TB

     Win10 1803

     ASUS Rampage VI Extreme

     VMware Workstation 14.1.2 & 15

     Watercooled

CONSTANT CPU lockups, both HARD and soft. They occur on any and every VM, and building VMs causes lockups. The same VM that works fine on the older system will NOT work on the new system. VM rebuilds TAKE FOREVER because of the lockups, and cloning is almost a week-long activity. The host system seems to be performing normally; I only see these issues within VMware Workstation. I upgraded to Workstation 15: same problems. I upgraded the VMs to 15: same problems.

I'm currently trying to build a fresh Ubuntu 18 server VM, and it's been going for 4 hours! It should take 15 minutes, tops.

     I've tried overclocking: it doesn't help.

     Power settings in Win10 are set to High Performance.

     CPUID monitoring indicates nothing abnormal.

     Temps are ~30 °C.

My GNS3 VM on the older system was set to 6 vCPUs. Maxing out one simulation would push that to 100%, which forced me to shut down boxes in the simulation, but there were no CPU lockups. On the new system, the SAME VM, upgraded to 12 vCPUs and 48GB RAM, has CONSTANT CPU lockups with only ~6-12% vCPU utilization within the VM.

It's not caused by CPU oversubscription: it happens at every vCPU count, and increasing the number of vCPUs doesn't help.

AngryNapkin
Contributor

Any more news?

It's been months.

DickHinSC
Contributor

No news. Same with WS 15, so there's no need to buy it. I'm just about ready to stop holding my breath.

VMware Workstation 12.5.9 and earlier was much better. I never had lockups, and problems were fixed promptly. I could run several guests simultaneously and leave them and the host running for over a month. Not so now. Very disappointing. I guess we WS 14 and 15 users just aren't respected customers. A shame.

Since "upgrading" to WS 14 in early June 2018, I've gotten no useful work done, much less trustworthy, reliable operation. If I need to get something done, right beforehand I shut down the guest (I run only one guest), exit WS, restart the host, run WS as admin, power on the one VM, and pray. I paid for and expected better.

Susie201110141
VMware Employee

Hi Juxxi,

What kind of synthetic disk I/O did you run to reproduce it? Could you share more details so we can try to reproduce it locally?

DickHinSC
Contributor

I have a support log for you, but the FTP site rejects my logon. Please advise.

DickHinSC
Contributor

I uploaded support file "DickHinSC_vmsupport-2018-10-22-14-04.zip".

One invocation of WS 14.1.3 with 5 guests (2 Win7, 1 Win 8.1, 2 Win 10). All had been sitting overnight with no apps running, except one of the Win7 guests, which had Chrome open with many tabs, though only the Gmail tab was selected. All 5 had their screens blanked by the inactivity timeout. After about 12+ hours in this state, I moved the mouse to turn on the screens. I clicked on the black Win7 screen, and the Workstation title immediately said "Non-Responsive" and everything took on a cloudy white tint.

Note that the host's Task Manager was running and showing on a 2nd display. Its Performance tab showed an "Average response time" of 2 to 7 seconds for Disk 0 (C: and D:). The host system is on C:, the VMs on D:. I clicked on the host's File Explorer, and when I tried to view the files on D:, it too went cloudy white and its title said "Non-Responsive".

I left the system in this state and came back to it about 90 minutes later. The cloudy white screens had turned normal, and I was able to shut down all VMs successfully, collect support data, reboot the host, and then restart the Win7 guest I'd been trying to use some 100 minutes earlier.

Please advise when an update that truly FIXES this long-standing problem is available.

AngryNapkin
Contributor

It seems they have abandoned 14 only to push out 15. Bugs still exist in 14? Just upgrade to 15. Ridiculous!

Last time I give vmware any of my money!

Word to the wise: just use VirtualBox. It's faster and has fewer bugs (it doesn't lock up and cause kernel panics in either Windows or Linux).