12 Replies Latest reply on Apr 25, 2018 10:13 AM by Tango2018

    VM's Lock Up

    Tango2018 Novice

      I have two VM's, one new, one which has been used for some time on my Windows 10 high-powered server (HP Z820).

       

      I am experiencing random 'lock up' issues where VMware Workstation 12 appears to freeze, so that no VM can be accessed, even one that is in use. This sometimes happens when copying and pasting files to and from the VM's; at other times it seems almost random, and it can lock up when highlighting files on the desktop of a VM.

       

      I have tried:

      - Creating new VM's (with both VM 12 and 14 configuration)

      - Installing Windows and VMWare Workstation updates

      - Switching from a dynamic to fixed/allocated space disk setup

      - Disabling AntiVirus and Security

       

      but the issue still seems to happen.

       

      Attached are logs of the VM's from a short time ago, where I copied a file to one VM, which appeared to be successful, but then when I clicked on the desktop of the VM everything locked up for about 6-10 minutes. The whole program was locked, so I could not select the other running VM either. If I minimised VMware I could use the host server without issues.

       

      I would be very grateful for any help,

       

      Thanks,

      Liam

        • 1. Re: VM's Lock Up
          bluefirestorm Master

          You could try disabling the Meltdown patch first at the VM level. The Meltdown patch hits systems with CPUs that don't have the INVPCID instruction harder in performance for situations where there is intensive I/O (disk or network). The INVPCID instruction is not available in the CPU that you have in the Z820 (Ivy Bridge 2.6GHz E5-2650 v2) as it was introduced in the Haswell generation of Intel CPUs. You could also try disabling Meltdown at the host level as well.

           

          You can follow this Microsoft article on the steps on how to disable Meltdown (in Switch/Registry settings section).

           

          https://support.microsoft.com/en-us/help/4073119/protect-against-speculative-execution-side-channel-vulnerabilities-in

           

          From the looks of the article it disables both CVE-2017-5715 (Spectre) and CVE-2017-5754 (Meltdown).

           

          The Meltdown patch does not rely on the microcode update. Only the Spectre patch relies on the microcode update. Your system already had the microcode even with your earlier post VMWare Not Working (Stuck at Windows Logo, or intermittently boots but locks up) but since you used 12.1.1 the Spectre microcode was not exposed to the VM (hence the Spectre patch would have been as good as disabled). Versions 12.5.9 and 14.x expose the Spectre microcode update to the VM.

           

          2018-03-08T16:46:21.408Z| vmx| I125: hostCPUID level 00000007, 0: 0x00000000 0x00000281 0x00000000 0x0c000000
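
          As a rough illustration of how that log line can be read, here is a minimal Python sketch. It assumes the four values are logged in EAX, EBX, ECX, EDX order (an assumption about the log format, not something the log states). On CPUID leaf 7, INVPCID is reported in EBX bit 10, and the Spectre-related microcode features IBRS/IBPB and STIBP in EDX bits 26 and 27.

```python
import re

# Log line quoted above. The register order EAX, EBX, ECX, EDX is an assumption.
LINE = ("2018-03-08T16:46:21.408Z| vmx| I125: hostCPUID level 00000007, 0: "
        "0x00000000 0x00000281 0x00000000 0x0c000000")

CPUID_RE = re.compile(
    r"hostCPUID level ([0-9a-fA-F]+), (\d+):((?: 0x[0-9a-fA-F]+){4})"
)

def parse_cpuid_line(line):
    """Return (leaf, subleaf, eax, ebx, ecx, edx) from a hostCPUID log line."""
    m = CPUID_RE.search(line)
    if m is None:
        raise ValueError("not a hostCPUID line")
    regs = [int(x, 16) for x in m.group(3).split()]
    return (int(m.group(1), 16), int(m.group(2)), *regs)

def has_bit(value, bit):
    """True if the given bit is set in value."""
    return bool((value >> bit) & 1)

leaf, subleaf, eax, ebx, ecx, edx = parse_cpuid_line(LINE)
features = {
    "INVPCID": has_bit(ebx, 10),    # CPUID.(EAX=7,ECX=0):EBX bit 10
    "IBRS/IBPB": has_bit(edx, 26),  # CPUID.(EAX=7,ECX=0):EDX bit 26
    "STIBP": has_bit(edx, 27),      # CPUID.(EAX=7,ECX=0):EDX bit 27
}
print(features)
```

          Under that assumption, the line shows no INVPCID but does show the Spectre microcode features on the host.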

          • 2. Re: VM's Lock Up
            Tango2018 Novice

            Hi,

             

            Thank you very much for the detailed reply. I have just run these commands on the HP Z820 server itself and on both VM's mentioned. I'll restart all of them, see if the issue persists, and post an update here.

             

            Fingers crossed, thanks!

            • 3. Re: VM's Lock Up
              Tango2018 Novice

              Hi,

               

              I have applied the fix you mentioned, but we still seem to be having intermittent lock ups. For instance, I was using the host computer and switched back to the VM inside the application, but when I clicked on the desktop of the VM, it all locked up for about 40 seconds. Similarly, I was copying some files from one area of the VM to another and it locked up for a few minutes.

               

              Any other ideas what may be causing this, please?

               

              Thank you for your time and support

              • 4. Re: VM's Lock Up
                bluefirestorm Master

                You have many logical and physical drives. From what I can make out from the log

                 

                drive C - system drive

                drive D - VM has some shared folder

                drive E - where the VM resides

                drive H - SCSI CD-ROM drive (it appears to be SCSI but VM setup is virtual SATA CD ROM)

                drive Y - shared folder for VM

                 

                2 x ASMedia USB 3.0 external drives, but from the looks of it these drives are connected to USB 2.0??? (Correct me if I am wrong).

                 

                What I suggest now is, if the VM is on an external USB drive (assuming drive E is on one of the external USB 3.0 drives), try running the VM from an internal disk instead of a USB disk.

                Also try to disable the virtual CD/DVD drive from the VM or change it to SCSI instead of SATA. One of the logs shows long read/write times to the virtual SCSI disk of the VM; the same log also seems to contain some virtual CD/DVD drive errors.

                 

                2018-03-26T00:17:04.729+01:00| vmx| I125: scsi0:0: Command WRITE(10) took 9.742 seconds (ok)

                2018-03-26T00:17:36.100+01:00| vmx| I125: scsi0:0: Command READ(10) took 2.372 seconds (ok)

                2018-03-26T00:17:39.303+01:00| vmx| I125: scsi0:0: Command READ(10) took 2.369 seconds (ok)

                2018-03-26T00:18:02.151+01:00| vmx| I125: scsi0:0: Command READ(10) took 1.948 seconds (ok)

                2018-03-26T00:18:04.527+01:00| vmx| I125: scsi0:0: Command READ(10) took 2.369 seconds (ok)

                2018-03-26T00:18:24.286+01:00| vmx| I125: scsi0:0: Command READ(10) took 3.214 seconds (ok)

                2018-03-26T00:18:33.138+01:00| vmx| I125: scsi0:0: Command READ(10) took 2.460 seconds (ok)

                2018-03-26T00:18:33.141+01:00| vmx| I125: scsi0:0: Command READ(10) took 2.462 seconds (ok)

                2018-03-26T00:18:53.462+01:00| vmx| I125: scsi0:0: Command READ(10) took 2.367 seconds (ok)

                2018-03-26T00:18:54.623+01:00| vmx| I125: scsi0:0: Command READ(10) took 1.161 seconds (ok)

                2018-03-26T00:18:54.623+01:00| vmx| I125: scsi0:0: Command READ(10) took 1.158 seconds (ok)

                2018-03-26T00:19:01.930+01:00| vmx| I125: scsi0:0: Command WRITE(10) took 3.569 seconds (ok)

                2018-03-26T00:19:12.551+01:00| vmx| I125: scsi0:0: Command READ(10) took 3.214 seconds (ok)

                2018-03-26T00:19:18.336+01:00| vmx| I125: scsi0:0: Command READ(10) took 2.366 seconds (ok)

                2018-03-26T00:19:23.139+01:00| vmx| I125: scsi0:0: Command WRITE(10) took 3.401 seconds (ok)

                2018-03-26T00:22:11.701+01:00| vmx| I125: scsi0:0: Command READ(10) took 2.427 seconds (ok)

                2018-03-26T00:22:16.873+01:00| vmx| I125: scsi0:0: Command READ(10) took 3.222 seconds (ok)

                2018-03-26T00:22:30.635+01:00| vmx| I125: scsi0:0: Command READ(10) took 2.383 seconds (ok)

                2018-03-26T00:22:34.374+01:00| vmx| I125: scsi0:0: Command READ(10) took 3.729 seconds (ok)

                2018-03-26T00:22:34.375+01:00| vmx| I125: scsi0:0: Command READ(10) took 2.384 seconds (ok)

                2018-03-26T00:24:37.619+01:00| vmx| I125: scsi0:0: Command READ(10) took 111.876 seconds (ok)

                2018-03-26T00:24:37.619+01:00| vmx| I125: scsi0:0: Command READ(10) took 111.875 seconds (ok)

                2018-03-26T00:24:42.231+01:00| vmx| I125: scsi0:0: Command READ(10) took 1.119 seconds (ok)

                 

                Errors on the virtual CD/DVD drive

                2018-03-26T00:17:36.925+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 3.195 seconds (ok)

                2018-03-26T00:17:39.306+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 2.375 seconds (ok)

                2018-03-26T00:17:41.803+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 2.490 seconds (ok)

                2018-03-26T00:17:47.520+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 2.932 seconds (ok)

                2018-03-26T00:18:04.529+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 2.371 seconds (ok)

                2018-03-26T00:18:14.453+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 3.441 seconds (ok)

                2018-03-26T00:18:17.761+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 3.299 seconds (ok)

                2018-03-26T00:18:23.437+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 2.368 seconds (ok)

                2018-03-26T00:18:30.672+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 2.626 seconds (ok)

                2018-03-26T00:18:33.141+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 2.460 seconds (ok)

                2018-03-26T00:18:42.223+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 2.365 seconds (ok)

                2018-03-26T00:18:54.625+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 1.159 seconds (ok)

                2018-03-26T00:18:58.348+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 3.713 seconds (ok)

                2018-03-26T00:19:00.725+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 2.369 seconds (ok)

                2018-03-26T00:19:14.932+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 2.375 seconds (ok)

                2018-03-26T00:19:23.139+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 1.034 seconds (ok)

                2018-03-26T00:22:08.179+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 2.486 seconds (ok)

                2018-03-26T00:22:11.704+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 3.514 seconds (ok)

                2018-03-26T00:22:24.734+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 1.188 seconds (ok)

                2018-03-26T00:22:27.132+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 2.392 seconds (ok)

                2018-03-26T00:24:37.619+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 114.249 seconds (ok)
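
                A minimal sketch for pulling the worst offenders out of a long vmware.log, assuming the lines follow the "Command ... took N seconds" pattern seen in the excerpts above. The function name and the 10-second threshold are arbitrary choices for illustration.

```python
import re

# Matches lines such as:
#   ...| vmx| I125: scsi0:0: Command READ(10) took 111.876 seconds (ok)
SLOW_CMD = re.compile(
    r"^(?P<ts>\S+)\|\s*\S+\|\s*\S+:\s*(?P<dev>\w+:\d+):\s*"
    r"Command (?P<cmd>.+?) took (?P<secs>\d+(?:\.\d+)?) seconds"
)

def slow_commands(lines, threshold=10.0):
    """Yield (timestamp, device, command, seconds) for commands taking at least threshold seconds."""
    for line in lines:
        m = SLOW_CMD.search(line.strip())
        if m and float(m.group("secs")) >= threshold:
            yield m.group("ts"), m.group("dev"), m.group("cmd"), float(m.group("secs"))

# Three lines copied from the excerpts above.
sample = [
    "2018-03-26T00:22:34.375+01:00| vmx| I125: scsi0:0: Command READ(10) took 2.384 seconds (ok)",
    "2018-03-26T00:24:37.619+01:00| vmx| I125: scsi0:0: Command READ(10) took 111.876 seconds (ok)",
    "2018-03-26T00:24:37.619+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 114.249 seconds (ok)",
]

hits = list(slow_commands(sample))
for hit in hits:
    print(hit)
```

                To scan a whole log, the same function can be fed an open file object instead of the sample list.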

                • 5. Re: VM's Lock Up
                  Tango2018 Novice

                  Hi,

                   

                  Thanks for the quick reply, some follow up notes to help:

                   

                  drive C - system drive  //Yes I keep the VMware Installation and Host OS separate to the VM Drive

                  drive D - VM has some shared folder   //Now disabled

                  drive E - where the VM resides  //Yes this is a separate SSD

                  drive H - SCSI CD-ROM drive (it appears to be SCSI but VM setup is virtual SATA CD ROM)  //Disabled, the VM should not have access to this, the only 'removable devices' the VM has access to is the Network Adapter

                  drive Y - shared folder for VM  //Yes, I had to disable this otherwise it would not boot

                   

                  2 x ASMedia USB 3.0 external drives, but from the looks of it these drives are connected to USB 2.0??? (Correct me if I am wrong).

                  // These are backup drives, the VM's are not running from these and can be disconnected

                   

                  What I suggest now is if the VM is on external USB drive (assuming drive E is on one of the external USB 3.0 drives), try running the VM off from an internal disk instead of a USB disk.

                  // VM is on an internal SSD drive not USB

                   

                  Also try to disable the virtual CD/DVD drive from the VM or change it to SCSI instead of SATA.

                  // CD/DVD is not connected. When first creating a VM I tried to install from DVD but it didn't work so I used an ISO, which is also disconnected

                   

                   

                  It's interesting the logs are showing CD/DVD access as no VM has access to these. I've disabled by unchecking the 'connected' and 'connected at power on' in the device status of the Hardware Settings. Is this enough, or is there another place I need to do this?

                   

                  Thanks again for your continued help,

                  • 6. Re: VM's Lock Up
                    bluefirestorm Master

                    In the VM1 log (where the virtual CD errors were logged, one taking 114.249 seconds, which is almost 2 minutes; combined with the other two SCSI reads of about 111 seconds each, that is almost 6 minutes), the virtual CD/DVD drive was connected.

                     

                    2018-03-26T00:05:05.015+01:00| vmx| I125: DICT        sata0:1.deviceType = "cdrom-raw"

                    2018-03-26T00:05:05.015+01:00| vmx| I125: DICT          sata0:1.fileName = "auto detect"

                    2018-03-26T00:05:05.015+01:00| vmx| I125: DICT           sata0:1.present = "TRUE"

                    2018-03-26T00:05:05.016+01:00| vmx| I125: DICT       sata0.pciSlotNumber = "36"

                    2018-03-26T00:05:05.016+01:00| vmx| I125: DICT        sata0:1.autodetect = "TRUE"

                    2018-03-26T00:05:05.016+01:00| vmx| I125: DICT    sata0:1.startConnected = "TRUE"

                     

                    2018-03-26T00:24:37.619+01:00| vmx| I125: scsi0:0: Command READ(10) took 111.876 seconds (ok)

                    2018-03-26T00:24:37.619+01:00| vmx| I125: scsi0:0: Command READ(10) took 111.875 seconds (ok)

                    2018-03-26T00:24:37.619+01:00| vmx| I125: sata0:1: Command *UNKNOWN (0x4a)* took 114.249 seconds (ok)

                     

                    Unchecking the "Connect on startup" should disable the virtual CD/DVD from the VM.
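
                    To check what a given vmware.log recorded for the virtual CD/DVD, the DICT lines can be filtered by device prefix. This is only a sketch: the function name is made up, and it assumes the DICT lines have the `key = "value"` shape shown in the excerpt above.

```python
import re

DICT_RE = re.compile(r'DICT\s+(?P<key>\S+)\s*=\s*"(?P<value>.*)"')

def device_settings(lines, prefix):
    """Collect DICT key/value pairs whose key starts with the given prefix."""
    settings = {}
    for line in lines:
        m = DICT_RE.search(line)
        if m and m.group("key").startswith(prefix):
            settings[m.group("key")] = m.group("value")
    return settings

# DICT lines copied from the VM1 log excerpt.
log = [
    '2018-03-26T00:05:05.015+01:00| vmx| I125: DICT        sata0:1.deviceType = "cdrom-raw"',
    '2018-03-26T00:05:05.015+01:00| vmx| I125: DICT          sata0:1.fileName = "auto detect"',
    '2018-03-26T00:05:05.015+01:00| vmx| I125: DICT           sata0:1.present = "TRUE"',
    '2018-03-26T00:05:05.016+01:00| vmx| I125: DICT       sata0.pciSlotNumber = "36"',
    '2018-03-26T00:05:05.016+01:00| vmx| I125: DICT        sata0:1.autodetect = "TRUE"',
    '2018-03-26T00:05:05.016+01:00| vmx| I125: DICT    sata0:1.startConnected = "TRUE"',
]

cd = device_settings(log, "sata0:1.")
connected_at_power_on = (
    cd.get("sata0:1.present") == "TRUE"
    and cd.get("sata0:1.startConnected") == "TRUE"
)
print(cd)
print("CD/DVD connected at power on:", connected_at_power_on)
```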

                     

                    What is the internal drive controller on the SSD that you use in the Z820? Have you also configured RAID?

                     

                    Have you tried using virtual SATA in the VM instead of the default SCSI? With version 14, there is also support for virtual NVMe.

                     

                    As for the removable devices, I assume you meant inside the VM. Although I don't think it should make any difference to the lock up, you can add the following lines to the vmx configuration so that any additional SCSI, SATA, Ethernet devices inside the VM will not be ejectable.

                     

                    ahci.port.hotplug.enabled = "FALSE"

                    devices.hotPlug = "FALSE"
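
                    If you would rather script the change than edit the file by hand, something like the sketch below could work. It is an illustration only: the helper name and the sample vmx text are made up, it assumes one `key = "value"` entry per line, and the .vmx file should only be edited while the VM is powered off (keep a backup copy).

```python
import re

def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with key = "value" set, replacing an existing entry if present."""
    new_line = f'{key} = "{value}"'
    # Match an existing line for this key without crossing line boundaries.
    pattern = re.compile(rf'^[ \t]*{re.escape(key)}[ \t]*=[^\r\n]*', re.MULTILINE)
    if pattern.search(vmx_text):
        return pattern.sub(lambda _: new_line, vmx_text)
    if vmx_text and not vmx_text.endswith("\n"):
        vmx_text += "\n"
    return vmx_text + new_line + "\n"

# Hypothetical vmx content used only to exercise the helper.
vmx = 'displayName = "VM1"\ndevices.hotPlug = "TRUE"\n'
for key in ("ahci.port.hotplug.enabled", "devices.hotPlug"):
    vmx = set_vmx_option(vmx, key, "FALSE")
print(vmx)
```

                    Replacing an existing entry rather than always appending avoids ending up with two conflicting lines for the same key.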

                     

                    The one thing different with Ivy Bridge Xeon CPUs and later is that they have virtual interrupt delivery capability. The intention of virtual interrupt delivery is to avoid a VMEXIT when an interrupt is generated inside the VM (such as from the virtual NIC), so the VM does not have to go back to the hypervisor, which saves host CPU cycles. I don't think regular Intel desktop and laptop chips have this capability; so far I have seen it only in vmware.log files from hosts with Ivy Bridge or newer Xeons.

                     

                    2018-03-26T00:05:05.038+01:00| vmx| I125:   Virtual-interrupt delivery               {0,1}

                     

                    From your first post, does the boot up still get stuck on Windows logo? I just want to know whether this should be taken as part of the symptoms of an overall problem (or set of problems).

                     

                    If we take the VM1 log at face value and assume the nearly 6 minutes of reads and SATA unknown commands mark the time the lockup occurred, the problem might lie in the virtual disk controller or in the host disk controller.

                    • 7. Re: VM's Lock Up
                      Tango2018 Novice

                      Unchecking the "Connect on startup" should disable the virtual CD/DVD from the VM.

                      //Apologies, you are correct: on VM1 the CD was connected as 'auto detect'. I have disabled this. It must have been a default setting when making the new VM

                       

                      What is the internal drive controller on the SSD that you use in the Z820? Have you also configured RAID?

                      //We don't have a RAID. The internal drive controller (if I've identified it correctly from Device Manager) is 'Intel(R) C600 Series Chipset SATA AHCI Controller' and 'Standard Dual Channel PCI IDE Controller'

                       

                      Have you tried using virtual SATA in the VM instead of the default SCSI? With version 14, there is also support for virtual NVMe.

                      //No, I've not tried this. Can I convert the existing VM? I'm happy to try if I can also convert back so as not to lose anything. What would be the advantages of this?

                       

                      From your first post, does the boot up still get stuck on Windows logo? I just want to know whether this should be taken as part of the symptoms of an overall problem (or set of problems).

                      //Yes, in part. I transferred a VM from a Z600 to this Z820 and at first it appeared to boot. After turning it off and on again, I experienced the issue where the VM was stuck at the Windows 10 screen and, after a few attempts, looped to recovery mode. I tried so many different combinations and could only get it to boot in safe mode. I tried disabling driver signing, malware, removed AV from the host, adjusted CPU's and RAM, and disabled share. I found a combination where it works, but I'm not 100% sure what was stopping it and fear changing the settings.

                       

                      I am happy to try anything you think may work. If I take the same VM and run on the Z600 it seems to work without issues.

                       

                      Thanks,

                      • 8. Re: VM's Lock Up
                        Tango2018 Novice

                        Is there any benefit to allowing the VM to use the SSD directly as a storage drive, as opposed to it generating a vmdk?

                        • 9. Re: VM's Lock Up
                          bluefirestorm Master

                          Given that the same VM works without issues on an older Z600, it is unlikely to make any difference to change the virtual disk controller.

                           

                          If you do want to try, I think you should be able to change from SCSI to SATA or NVMe directly, as Windows 10 has native support for SATA and NVMe controllers. For NVMe, the hardware compatibility has to be version 14. It would be a different story if the guest OS was Windows 7, as it does not have native NVMe support (a hotfix download is required).

                           

                          As for the host controller, the Intel C600 is part of the overall C600 chipset, which usually comes with Intel RSTe, which has software RAID capability. What is the mode setting for the controller on the Z820: RAID or AHCI? If you switch from one to the other, the host won't be able to boot up. If it is RAID, the Intel RSTe software allows you to set up software RAID within the Windows host. If you have time, it will be better to set it to AHCI instead of RAID (which means re-installing the host OS as well). You might also want to disable any power saving in the EFI/BIOS of the Z820, especially for the SATA controllers.

                           

                          Looking at the Z820 specifications, there looks to be some optional LSI RAID controllers.

                           

                          Alternatively, you could also try to run the VM from an external USB drive and see if the lock-ups disappear. (Yes, I know, it is a 180 degree turn from the earlier post).

                           

                          As to using SSD as raw disk for a VM, you will lose the ability to have snapshots. I haven't used raw disks seriously with VMware VMs so my experience with it

                          • 10. Re: VM's Lock Up
                            Tango2018 Novice

                            Hi,

                             

                            Thanks again for the reply and assistance so far.

                             

                            I am looking into making a new VM with SATA or NVMe as suggested in the new VMware Workstation 14 configuration. Few other bits of information below:

                             

                            What is the mode setting at the Z820 for the controller, is it RAID or AHCI?

                            I don't have a RAID setup so it will be AHCI

                             

                            You might also want to disable any power saving at the EFI/BIOS of the Z820 especially for the SATA controllers.

                            Already done also when setting up the HP Z820 as it can cause issues with it turning on at set times automatically (BIOS power on)

                             

                            Alternatively, you could also try to run the VM from an external USB drive and see if the lock-ups disappear. (Yes, I know, it is a 180 degree turn from the earlier post).

                            I've done this before, but happy to try again. Previously it kept locking up or losing connection so I stopped running even temporary VM's from USB's and instead copied them to the internal drive.

                             

                            I have been trying to use the VM for programming this morning and again it keeps locking up. I'm sure it's not anything inside the VM itself. I have attached an updated log in the hope it may reveal more information and would really appreciate any further thoughts.

                             

                            Thanks again,

                            • 11. Re: VM's Lock Up
                              bluefirestorm Master

                              I don't see much else in the vmware.log. Are there any System Event log entries or Application Log entries on the host around the time of the lock ups to indicate anything?

                               

                              At the moment, the suspicion I have is with the storage or the storage controller. If the setting is RAID, even though you haven't configured RAID, it is possible that the storage controller driver is behaving differently. At the very least install the Intel RSTe driver/software either from the Intel or HP website.

                               

                              For this latest log you are using an old version of VMware Tools

                              2018-03-30T07:32:09.657+01:00| vmx| I125: DISKUTIL: scsi0:0 : max toolsVersion = 10246, type = 1

                              2018-03-30T07:32:38.549+01:00| vcpu-5| I125: Guest: vm3d: SVGA WDDM Full Display driver, Version: 8.15.01.0033, Build Number: 3167660
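
                              For reference, the toolsVersion number can be turned into a dotted version. The sketch below assumes the packing used in the open-vm-tools headers (major * 1024 + minor * 32 + base); treat that as an assumption if your build differs.

```python
def decode_tools_version(n):
    """Split a packed tools version into (major, minor, base)."""
    major = n >> 10          # bits 10 and above
    minor = (n >> 5) & 0x1F  # bits 5-9
    base = n & 0x1F          # bits 0-4
    return major, minor, base

def encode_tools_version(major, minor, base):
    """Inverse of decode_tools_version."""
    return (major << 10) + (minor << 5) + base

major, minor, base = decode_tools_version(10246)
print(f"{major}.{minor}.{base}")
print(encode_tools_version(10, 2, 0))
```

                              Under that assumption, 10246 decodes to 10.0.6, and version 10.2.0 would show up as 10304.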

                               

                              Although unlikely to have any bearing on the lock ups, it is better to update the VMware Tools to version 10.2. There were some fixes to shared folders (HGFS, host guest file system) due to the way Windows 10 1709 changed how files are handled.

                               

                              2018-03-30T07:32:09.861+01:00| vcpu-0| I125: HGFSPublish: publishing 0 shares

                              2018-03-30T07:32:46.624+01:00| vcpu-3| I125: HGFileCopyCreateSessionCB: Successfully created the session.

                              2018-03-30T12:54:49.736+01:00| vmx| I125: Progress -1% (msg.HGFileCopy.preparewrite)

                              2018-03-30T12:54:49.837+01:00| vcpu-1| I125: HGFileCopyCreateSessionCB: Successfully created the session.

                              2018-03-30T12:54:49.837+01:00| vcpu-1| I125: Progress 0% (msg.HGFileCopy.WriteFile)

                              2018-03-30T12:54:49.838+01:00| vcpu-1| I125: Progress 0% (msg.HGFileCopy.WriteFile)

                              2018-03-30T12:54:49.843+01:00| vcpu-1| I125: Progress 0% (msg.HGFileCopy.WriteFile)

                              2018-03-30T12:54:49.847+01:00| vcpu-1| I125: Progress 1% (msg.HGFileCopy.WriteFile)

                              2018-03-30T12:54:49.850+01:00| vcpu-1| I125: Progress 1% (msg.HGFileCopy.WriteFile)

                              2018-03-30T12:54:49.861+01:00| vcpu-1| I125: Progress 6% (msg.HGFileCopy.WriteFile)

                               

                              2018-03-30T07:33:19.745+01:00| vmx| I125: DISKLIB-LIB   : numIOs = 50000 numMergedIOs = 12514 numSplitIOs = 35

                              2018-03-30T07:38:01.433+01:00| vmx| I125: DISKLIB-LIB   : numIOs = 100000 numMergedIOs = 27240 numSplitIOs = 53

                              2018-03-30T07:39:26.048+01:00| vmx| I125: DISKLIB-LIB   : numIOs = 150000 numMergedIOs = 35057 numSplitIOs = 60

                              2018-03-30T07:42:55.819+01:00| vmx| I125: DISKLIB-LIB   : numIOs = 200000 numMergedIOs = 38563 numSplitIOs = 70

                              2018-03-30T08:33:38.916+01:00| vmx| I125: DISKLIB-LIB   : numIOs = 250000 numMergedIOs = 48433 numSplitIOs = 74

                              2018-03-30T11:27:24.845+01:00| vmx| I125: DISKLIB-LIB   : numIOs = 300000 numMergedIOs = 56709 numSplitIOs = 85

                              2018-03-30T13:00:18.224+01:00| vmx| I125: DISKLIB-LIB   : numIOs = 350000 numMergedIOs = 67644 numSplitIOs = 91

                              2018-03-30T13:13:16.576+01:00| vmx| I125: DISKLIB-LIB   : numIOs = 400000 numMergedIOs = 78340 numSplitIOs = 110

                              2018-03-30T13:37:59.227+01:00| vmx| I125: DISKLIB-LIB   : numIOs = 450000 numMergedIOs = 87300 numSplitIOs = 119

                              2018-03-30T13:51:16.162+01:00| vmx| I125: DISKLIB-LIB   : numIOs = 500000 numMergedIOs = 93735 numSplitIOs = 126

                              2018-03-30T14:04:02.485+01:00| vmx| I125: DISKLIB-LIB   : numIOs = 550000 numMergedIOs = 122815 numSplitIOs = 165

                              • 12. Re: VM's Lock Up
                                Tango2018 Novice

                                Hi bluefirestorm,

                                 

                                Thank you for your reply, and apologies for the delay; I have been away.

                                 

                                I reached out to the original workstation builders and they concur with your assessment that it may be a fault with the RAID or drive controller. They have picked up the workstation and will be replacing affected components and retesting.

                                 

                                Once I receive it back, I'll rerun all the tests to see if the same issues are occurring.

                                 

                                Thank you again for your help so far, I should know more in a week's time.

                                 

                                Thanks again