11 Replies Latest reply on Feb 20, 2014 1:39 AM by amencheng

    Strange VM behavior High %USED, High %IDLE, absolutely zero activity on vm

    ewhite7 Novice

      After upgrading from 5.0 to 5.5, I am seeing strange behavior on a few VMs.  All of them show really high CPU %USED and %IDLE, and they also hang at 65% when using vMotion.

      If I log into the guest OS, there is no activity.  I mean 0.0%us.  The proxies are CentOS 5.2 Linux; the Jasper is Windows.  The Jasper has nothing but the default Windows processes running and is completely idle.

       

      7:07:29pm up 1 day  3:44, 572 worlds, 10 VMs, 16 vCPUs; CPU load average: 0.73, 0.72, 0.73

      PCPU USED(%):  59  70  54  57  51  53  73  71 AVG:  61

      PCPU UTIL(%):  68  81  64  70  56  61  82  82 AVG:  71

       

       

            ID      GID NAME             NWLD   %USED    %RUN    %SYS   %WAIT %VMWAIT    %RDY   %IDLE  %OVRLP   %CSTP  %MLMTD  %SWPWT

        512777   512777 cl-ivrproxy         8  141.94  139.26    1.16  646.57    0.34   14.05  187.08    0.62    0.00    0.00    0.00

         66472    66472 Jasper (windows     7  140.34  138.41    0.04  554.68    0.11    6.78  197.28    0.37    0.00    0.00    0.00

        438541   438541 cl-proxy            7  139.26  137.58    0.48  550.05    0.52   12.28   91.20    0.53    0.00    0.00    0.00

       

      I have another host with a few more VMs behaving the same way.  The rest of the VMs on the hosts are behaving fine.

      The Windows VM has two vCPUs and is using the multiprocessor HAL.

      The two Linux proxies are single vCPU, except that I changed one to two vCPUs to see if it made a difference.  No difference.

       

      Any thoughts?

        • 1. Re: Strange VM behavior High %USED, High %IDLE, absolutely zero activity on vm
          ewhite7 Novice

          Also, these same VMs give this error when trying to access console through vCenter:

          "Unable to connect to the MKS: Error connecting to  /bin/vmx process."

          • 2. Re: Strange VM behavior High %USED, High %IDLE, absolutely zero activity on vm
            zXi_Gamer Master

            "Unable to connect to the MKS: Error connecting to  /bin/vmx process."

            • It usually happens when a VM has been moved to a different host: the vmx process and the other worlds belonging to the VM are then owned by the destination host targeted by the migration.
            • If the migration fails, the vmx process is handed back to the source host and the vmx process created on the destination is killed.

             

            Now, coming to the issue at hand: on the host where you can see the VMs in esxtop, can you press "e", type in the GID of one VM (say, the Jasper), and let us know whether you can see the vmx process running under it?

            • 3. Re: Strange VM behavior High %USED, High %IDLE, absolutely zero activity on vm
              ewhite7 Novice

              Wow, that little trick is helpful.

              Looks like mks is the culprit.  I can only assume that svga is also related to the virtual console.

               

              Any advice on how to set it straight?

               

                    ID      GID NAME             NWLD   %USED    %RUN    %SYS   %WAIT %VMWAIT    %RDY   %IDLE  %OVRLP   %CSTP  %MLMTD  %SWPWT

                 67008    66472 vmx                 1    0.17    0.11    0.06   99.82       -    0.09    0.00    0.00    0.00    0.00    0.00

                 67012    66472 vmast.67011         1    0.01    0.01    0.00  100.00       -    0.00    0.00    0.00    0.00    0.00    0.00

                 67014    66472 vmx-vthread-5:J     1    0.00    0.00    0.00  100.00       -    0.00    0.00    0.00    0.00    0.00    0.00

                 67206    66472 vmx-mks:Jasper      1   75.09   74.50    0.00   19.12       -    6.40    0.00    0.27    0.00    0.00    0.00

                 67207    66472 vmx-svga:Jasper     1   59.06   57.57    0.00   27.76       -   14.69    0.00    0.21    0.00    0.00    0.00

                 67208    66472 vmx-vcpu-0:Jasp     1    0.87    0.86    0.00   98.14    0.09    1.01   98.05    0.01    0.00    0.00    0.00

                 67209    66472 vmx-vcpu-1:Jasp     1    0.75    0.74    0.00   98.49    0.09    0.79   98.39    0.01    0.00    0.00    0.00
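A quick way to read a per-world listing like the one above is to rank the worlds by %USED. The Python sketch below does that for the rows pasted from this post. It assumes the column order shown in the header (ID, GID, NAME, NWLD, %USED, ...) and that world names contain no spaces; the function names are made up for illustration.

```python
# Sketch: rank the worlds of one VM group by %USED from pasted esxtop text.
# Assumes whitespace-separated columns: ID GID NAME NWLD %USED ...

ESXTOP_ROWS = """\
67008 66472 vmx              1  0.17  0.11 0.06  99.82    -  0.09  0.00 0.00 0.00 0.00 0.00
67012 66472 vmast.67011      1  0.01  0.01 0.00 100.00    -  0.00  0.00 0.00 0.00 0.00 0.00
67014 66472 vmx-vthread-5:J  1  0.00  0.00 0.00 100.00    -  0.00  0.00 0.00 0.00 0.00 0.00
67206 66472 vmx-mks:Jasper   1 75.09 74.50 0.00  19.12    -  6.40  0.00 0.27 0.00 0.00 0.00
67207 66472 vmx-svga:Jasper  1 59.06 57.57 0.00  27.76    - 14.69  0.00 0.21 0.00 0.00 0.00
67208 66472 vmx-vcpu-0:Jasp  1  0.87  0.86 0.00  98.14 0.09  1.01 98.05 0.01 0.00 0.00 0.00
67209 66472 vmx-vcpu-1:Jasp  1  0.75  0.74 0.00  98.49 0.09  0.79 98.39 0.01 0.00 0.00 0.00
"""


def parse_worlds(text):
    """Return a list of (world name, %USED) tuples, one per data row."""
    worlds = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its first field is not a number).
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        worlds.append((fields[2], float(fields[4])))
    return worlds


def rank_by_used(worlds):
    """Sort worlds by %USED, highest first."""
    return sorted(worlds, key=lambda w: w[1], reverse=True)


worlds = parse_worlds(ESXTOP_ROWS)
ranked = rank_by_used(worlds)
total = sum(used for _, used in worlds)

for name, used in ranked:
    print(f"{name:<18} {used:6.2f}")
print(f"{'sum of worlds':<18} {total:6.2f}")
```

On this sample the vmx-mks and vmx-svga worlds account for almost all of the %USED in the group, while both vCPU worlds stay under 1%.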

              • 4. Re: Strange VM behavior High %USED, High %IDLE, absolutely zero activity on vm
                zXi_Gamer Master

                looks like mks is the culprit.  I can only assume that the svga is also related to the virtual console.

                Close enough. mks is just a process; we need to find out what called it and why its owner reported that it was unable to connect to the MKS. svga is likewise a process that handles the mks thread.

                 

                Any advice on how to set it straight?

                Are you opening the console from the source or the destination host? That said, I have also seen such issues when:

                1. I open the console of  a VM

                2. Migrate the VM to another host

                3. The remote console gets stuck for a considerable time and returns the error.

                 

                This is because the remote console session opened against the source host no longer exists: the VM has moved to the destination.

                 

                Also, another workaround is to restart the VM if you are unable to get any remote console.

                • 5. Re: Strange VM behavior High %USED, High %IDLE, absolutely zero activity on vm
                  ewhite7 Novice

                  More information:

                  - restarting the VM results in the same state.

                  - migrating with VM powered on results in hung vMotion at 65%

                  - migrating with VM powered off works.

                  - starting VM on another host results in same state

                  - removing the VM from inventory, and re-adding it through datastore manager results in same state ( Have to "esxcli vm process kill" to unlock it before I can re-add it.)

                   

                  I had the thought that if I moved them around until I found the original host, it might fix it.  Didn't seem to work that way.

                   

                  The only thing I found that resolved this strange state was to remove the VM from inventory, create a new VM, copy the VMDK files over to the new VM's directory, and add the disks to the new VM's configuration.

                  I would rather have a more graceful solution, and I am really curious about how this could happen.

                   

                  I have 6 VMs stuck in this strange state.  It all started towards the end of my upgrade from 5.0 to 5.5.  Everything seemed fine.  I brought up the last of my 4 hosts, and tried to vMotion some VMs around to organize my machines.  These 6 all stuck at 65% and never recovered.

                  • 6. Re: Strange VM behavior High %USED, High %IDLE, absolutely zero activity on vm
                    zXi_Gamer Master

                    These 6 all stuck at 65% and never recovered.

                    The only thing we can do is check vmkernel.log to find out what is hogging resources while the vMotion sits at 65%, for example a VM with too many snapshots or heavy I/O. If you have a valid support contract, raise the issue with VMware tech support.

                    • 7. Re: Strange VM behavior High %USED, High %IDLE, absolutely zero activity on vm
                      ewhite7 Novice

                      Created support request yesterday before creating this discussion. 

                      Still waiting for VMware to contact me.

                       

                      Thanks for your help.

                      • 8. Re: Strange VM behavior High %USED, High %IDLE, absolutely zero activity on vm
                        ewhite7 Novice

                        Root cause turned out to be entries in the VMX files for these machines that vSphere 5.5 did not like.

                        I believe the template that was used to create this handful of machines started off life as a Lab Manager VM a long time ago.

                        As such, it had many more vmx options than a typical VM, some of which were mks parameters.

                        vSphere 5.0 did not seem to mind, but 5.5 has problems with them.

                         

                        Removing these seemed to fix the problem.
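For anyone hunting for such entries, here is a minimal sketch of how one might list candidate keys in a .vmx file. It assumes the usual key = "value" line format of vmx files and filters on the key prefix "mks."; that prefix is only a guess based on the description above, and the sample key mks.placeholderOption and the display name are invented purely for illustration, not real option names from the affected VMs.

```python
# Sketch: list .vmx keys that start with a given prefix (for example "mks.").
# The sample content is made up; mks.placeholderOption is NOT a real option.

def parse_vmx(text):
    """Parse key = "value" lines into a dict; lines without '=' are skipped."""
    entries = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        entries[key.strip()] = value.strip().strip('"')
    return entries


def keys_with_prefix(entries, prefixes):
    """Return the sorted keys whose name starts with any prefix (case-insensitive)."""
    wanted = tuple(p.lower() for p in prefixes)
    return sorted(k for k in entries if k.lower().startswith(wanted))


SAMPLE_VMX = '''\
displayName = "example-vm"
memsize = "3072"
svga.vramSize = "8388608"
mks.placeholderOption = "TRUE"
'''

suspects = keys_with_prefix(parse_vmx(SAMPLE_VMX), ["mks."])
print(suspects)
```

To check a real VM, read its .vmx file into a string and pass that to parse_vmx instead of SAMPLE_VMX. Keep a backup copy of the file before removing anything.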

                        • 9. Re: Strange VM behavior High %USED, High %IDLE, absolutely zero activity on vm
                          pl1ght Lurker

                          Your post saved the day.  We just replaced 8 hosts with 5.5, many running Linux VMs.  Apparently Lab Manager was used here before I started; we were experiencing the same issues you described and were at a loss.  Just wanted to bump with a thank you.  It solved our issues.

                          • 10. Re: Strange VM behavior High %USED, High %IDLE, absolutely zero activity on vm
                            amencheng Novice

                            Dear All,

                             

                            I compared the vmx files of the VMs with high CPU load against the VM template, and they are almost the same. Can you give me some keywords for the extra/unwanted entries in the vmx file?

                             

                            Thank you very much!

                            • 11. Re: Strange VM behavior High %USED, High %IDLE, absolutely zero activity on vm
                              amencheng Novice

                              This is the content of the vmx file of a VM with high CPU load:

                               

                              .encoding = "UTF-8"

                              config.version = "8"

                              virtualHW.version = "8"

                              pciBridge0.present = "TRUE"

                              pciBridge4.present = "TRUE"

                              pciBridge4.virtualDev = "pcieRootPort"

                              pciBridge4.functions = "8"

                              pciBridge5.present = "TRUE"

                              pciBridge5.virtualDev = "pcieRootPort"

                              pciBridge5.functions = "8"

                              pciBridge6.present = "TRUE"

                              pciBridge6.virtualDev = "pcieRootPort"

                              pciBridge6.functions = "8"

                              pciBridge7.present = "TRUE"

                              pciBridge7.virtualDev = "pcieRootPort"

                              pciBridge7.functions = "8"

                              vmci0.present = "TRUE"

                              hpet0.present = "TRUE"

                              nvram = "HO-SRV-FLE-02.nvram"

                              virtualHW.productCompatibility = "hosted"

                              powerType.powerOff = "soft"

                              powerType.powerOn = "hard"

                              powerType.suspend = "hard"

                              powerType.reset = "soft"

                              displayName = "HO-SRV-FLE-02"

                              extendedConfigFile = "HO-SRV-FLE-02.vmxf"

                              numvcpus = "4"

                              cpuid.coresPerSocket = "2"

                              scsi0.present = "TRUE"

                              scsi0.sharedBus = "none"

                              scsi0.virtualDev = "lsisas1068"

                              memsize = "3072"

                              scsi0:0.present = "TRUE"

                              scsi0:0.fileName = "HO-SRV-FLE-02.vmdk"

                              scsi0:0.deviceType = "scsi-hardDisk"

                              ide1:0.present = "TRUE"

                              ide1:0.deviceType = "atapi-cdrom"

                              ide1:0.startConnected = "FALSE"

                              ethernet0.present = "TRUE"

                              ethernet0.virtualDev = "e1000"

                              ethernet0.networkName = "Subnet 5"

                              ethernet0.addressType = "generated"

                              svga.vramSize = "8388608"

                              guestOS = "windows7-64"

                              uuid.location = "56 4d 2f cc e1 58 a1 7f-5e 50 93 e4 b9 9b 91 be"

                              uuid.bios = "56 4d 29 cb 3e 0d db 37-da 42 74 ab 0b d6 fd f6"

                              vc.uuid = "52 bb 2c a1 bd b7 98 30-45 1a 8e 16 cb e3 cd 92"

                              tools.upgrade.policy = "manual"

                              ethernet0.generatedAddress = "00:0c:29:d6:fd:f6"

                              vmci0.id = "198639094"

                              tools.syncTime = "FALSE"

                              annotation = "for file checking"

                              cleanShutdown = "FALSE"

                              replay.supported = "FALSE"

                              unity.wasCapable = "TRUE"

                              sched.swap.derivedName = "/vmfs/volumes/4e987720-8c12d07e-c8d6-782bcb4f76fe/HO-SRV-ANZ-02/HO-SRV-ANZ-02-be4d538a.vswp"

                              replay.filename = ""

                              scsi0:0.redo = ""

                              pciBridge0.pciSlotNumber = "17"

                              pciBridge4.pciSlotNumber = "21"

                              pciBridge5.pciSlotNumber = "22"

                              pciBridge6.pciSlotNumber = "23"

                              pciBridge7.pciSlotNumber = "24"

                              scsi0.pciSlotNumber = "160"

                              ethernet0.pciSlotNumber = "32"

                              vmci0.pciSlotNumber = "33"

                              scsi0.sasWWID = "50 05 05 6b 3e 0d db 30"

                              ethernet0.generatedAddressOffset = "0"

                              hostCPUID.0 = "0000000b756e65476c65746e49656e69"

                              hostCPUID.1 = "000206c220200800029ee3ffbfebfbff"

                              hostCPUID.80000001 = "0000000000000000000000012c100800"

                              guestCPUID.0 = "0000000b756e65476c65746e49656e69"

                              guestCPUID.1 = "000206c200020800829822031fabfbff"

                              guestCPUID.80000001 = "00000000000000000000000128100800"

                              userCPUID.0 = "0000000b756e65476c65746e49656e69"

                              userCPUID.1 = "000206c220200800029822031fabfbff"

                              userCPUID.80000001 = "00000000000000000000000128100800"

                              evcCompatibilityMode = "FALSE"

                              vmotion.checkpointFBSize = "8388608"

                              ide1:0.clientDevice = "TRUE"

                              floppy0.present = "FALSE"

                              softPowerOff = "FALSE"

                              toolsInstallManager.lastInstallError = "0"

                              toolsInstallManager.updateCounter = "2"

                              tools.remindInstall = "FALSE"

                              sched.cpu.affinity = "4,5,6,7"

                              sched.mem.affinity = "all"
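Comparing a listing like this against a template by eye is error-prone, so a key-by-key diff may help. The Python sketch below is one way to do it, under stated assumptions: both files use the key = "value" format shown above, the VM sample reuses four lines from this listing, and the template content is hypothetical. The function names are invented for the example.

```python
# Sketch: diff two .vmx files key by key.
# TEMPLATE_VMX is hypothetical; VM_VMX reuses a few lines from the listing above.

def parse_vmx(text):
    """Parse key = "value" lines into a dict; lines without '=' are skipped."""
    entries = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        entries[key.strip()] = value.strip().strip('"')
    return entries


def diff_vmx(vm, template):
    """Return (keys only in vm, keys only in template, keys whose values differ)."""
    only_in_vm = sorted(set(vm) - set(template))
    only_in_template = sorted(set(template) - set(vm))
    changed = sorted(k for k in set(vm) & set(template) if vm[k] != template[k])
    return only_in_vm, only_in_template, changed


VM_VMX = '''\
numvcpus = "4"
memsize = "3072"
svga.vramSize = "8388608"
sched.cpu.affinity = "4,5,6,7"
'''

TEMPLATE_VMX = '''\
numvcpus = "2"
memsize = "3072"
svga.vramSize = "8388608"
'''

only_in_vm, only_in_template, changed = diff_vmx(
    parse_vmx(VM_VMX), parse_vmx(TEMPLATE_VMX)
)
print("only in VM:      ", only_in_vm)
print("only in template:", only_in_template)
print("different values:", changed)
```

Per-VM identifiers such as uuid.bios or ethernet0.generatedAddress will always differ between two machines, so they can be ignored when reading the result. A key showing up in the diff is only a candidate for closer inspection, not proof that it causes the CPU load.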