6 Replies Latest reply on Aug 15, 2019 5:35 AM by timrojaz

    vSAN Witness Appliance high CPU Usage

    beezleinc Novice

      Hi.  I have a question about Witness Appliance CPU usage...

       

      I have a 6.7 witness appliance running on an ESXi 6.7U1 (free) host, and it is steadily consuming roughly half of the host's CPU even though there is very little disk activity.  This seems excessive to me.

       

      The vSAN is healthy, so I know the witness is working OK. Are there any other data points on witness CPU usage that I can compare to? Is there any way to check the witness to see why it is running wide open? I haven't tried rebooting the witness yet, but that is my next step.

       

      FYI, I actually have two witness appliances running on the same ESXi host and both exhibit the same high CPU usage, so my ESXi host is running at almost 100% CPU.

       

      Pics of the host CPU usage view plus the witness appliance view of the CPU attached.

       

      TIA.

        • 1. Re: vSAN Witness Appliance high CPU Usage
          TheBobkin Virtuoso
          vExpert, VMware Employees

          Hello @beezleinc

           

           

          "it is consuming roughly half of the host's CPU"

          Half of what? Half a potato might not be much but half a whale could be a lot

           

          "even while there is virtually very little disk activity"

          A Witness isn't just a storage container for witness components; it is a minimal ESXi host with its own services and modules to run. Also, because witness components are small (16MB), you might not notice much disk activity, but that doesn't mean it isn't changing hundreds of components at any one time.

           

          "any other data points on witness cpu usage that I can compare to? "

          Look at this as you would any other appliance (or even a home computer with Task Manager): open esxtop and look at which processes are using what percentage of which resources.

           

          "I haven't tried rebooting the witness yet but that is my next step."

          Potentially, if it has a long uptime or some process is stuck, rebooting might help, but you should look at this more granularly, e.g. which processes use how much of which resource before and after.

           

          How much/many vCPUs, RAM, storage and components do these Witnesses have?

          Anything else running on this ESXi Free host and what are the physical server specs?

           

           

          Bob

          • 2. Re: vSAN Witness Appliance high CPU Usage
            beezleinc Novice

            Hi Bob.   I've logged into both witness esxi appliances and esxtop simply shows the 'system' process consuming a steady CPU %USED of 45 and a %RDY of ~50-60 which, from what I've been reading, indicates cpu contention.... but the host esxi shows each witness 'world' with a very low %RDY which indicates it's getting all the CPU it's requesting(?)

             

            The ESXi host is a Dell PowerEdge with 2 x 4-core CPUs (Intel(R) Xeon(R) CPU X5667 @ 3.07GHz) and 128GB RAM.

            Each witness was installed with the 'normal' setting from the OVF: 16GB RAM, 2 vCPUs. (There are a couple of other low-use VMs on the box as well, but it is well underutilized from a memory or disk perspective.)

             

            Anyway,  Just thought I'd throw it out there and would be curious to know how this compares with other witness appliances.  This may be just the normal idle state of an embedded esxi within an esxi, idk.

            Thx,

            -a

             

            esxtop from within one of the witness ESXi appliances

             

            4:11:05pm up 55 days 21:09, 590 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.24, 0.23, 0.22

            PCPU USED(%):  60 9.7 AVG:  35

            PCPU UTIL(%): 100  10 AVG:  55

             

             

                  ID      GID NAME             NWLD   %USED    %RUN    %SYS   %WAIT %VMWAIT    %RDY   %IDLE  %OVRLP   %CSTP  %MLMTD  %SWPWT

                   1        1 system            170   45.61  193.84    0.00 16974.81       -   59.31    0.00   18.09    0.00    0.00    0.00

            88798951 88798951 esxtop.13388107     1    5.48    5.77    0.00   95.40       -    0.06    0.00    0.01    0.00    0.00    0.00

               11350    11350 hostd.2099283      27    1.04    1.04    0.00 2700.00       -    0.07    0.00    0.00    0.00    0.00    0.00

               15968    15968 vpxa.2099911       38    0.13    0.13    0.00 3800.00       -    0.05    0.00    0.00    0.00    0.00    0.00

              222771   222771 sh.2136210          1    0.10    0.10    0.00  100.00       -    0.41    0.00    0.00    0.00    0.00    0.00

               18963    18963 dcui.2100333        4    0.08    0.08    0.00  400.00       -    0.18    0.00    0.00    0.00    0.00    0.00

              223299   223299 python.2136276     33    0.08    0.08    0.00 3300.00       -    0.02    0.00    0.00    0.00    0.00    0.00

            ...
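A note on reading the group rows above: the `system` row reports 170 worlds and a %WAIT of 16974.81, which is close to 170 x 100, so the group figures appear to be summed across all worlds in the group. Under that assumption, a summed %RDY of 59.31 is small once spread over 170 worlds. The sketch below is only that arithmetic on values copied from the output above; the helper name `per_world` is made up for illustration, and an average can hide a single world with high ready time.

```python
# Values copied from the witness esxtop output above:
# (group name, NWLD, %USED, %RDY, %WAIT)
WITNESS_ROWS = [
    ("system", 170, 45.61, 59.31, 16974.81),
    ("hostd.2099283", 27, 1.04, 0.07, 2700.00),
    ("vpxa.2099911", 38, 0.13, 0.05, 3800.00),
]


def per_world(group_value, nwld):
    """Average a group-level esxtop figure over the worlds in the group.

    Assumes the group row is a sum over its worlds (hypothetical helper).
    """
    return group_value / nwld


for name, nwld, used, rdy, wait in WITNESS_ROWS:
    print(
        f"{name:16s} %RDY/world={per_world(rdy, nwld):6.2f}  "
        f"%WAIT/world={per_world(wait, nwld):7.2f}"
    )
```

On these numbers the `system` group averages well under 1% ready time per world, so the 50-60 %RDY figure on its own does not necessarily indicate contention.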

             

             

            esxtop from within the host esxi

             

            4:19:38pm up 63 days 14:38, 680 worlds, 4 VMs, 7 vCPUs; CPU load average: 0.52, 0.54, 0.54

            PCPU USED(%):  71 9.6  23  54  23  24  23  20  10 5.6 4.5  47 8.2 8.0 3.8  63 AVG:  25

            PCPU UTIL(%):  74  15  39  59  36  38  37  29  10 6.1 5.8  44 8.6 8.1 5.1  60 AVG:  30

            CORE UTIL(%):  80      82      61      57      15      47      16      63     AVG:  53

             

             

                  ID      GID NAME             NWLD   %USED    %RUN    %SYS   %WAIT %VMWAIT    %RDY   %IDLE  %OVRLP   %CSTP  %MLMTD  %SWPWT

              358419   358419 hostd.2169563      37  126.64  139.63    0.00 3552.60       -   23.59    0.00    0.25    0.00    0.00    0.00

              368325   368325 abs-vsan-witnes    13  115.03  111.46    0.06 1193.98    0.00    0.24   89.24    0.11    0.00    0.00    0.00

              370468   370468 abs-vsan-witnes    13  111.59  104.88    0.08 1200.57    0.00    0.16   96.01    0.11    0.00    0.00    0.00

                   1        1 system            309   33.21 1232.67    0.00 29402.59       -  414.86    0.00   14.78    0.00    0.00    0.00

            2714598  2714598 absvc1             11   10.47   10.57    0.06 1093.81    0.02    0.37  190.07    0.07    0.00    0.00    0.00

            2754165  2754165 esxtop.2648390      1    5.29    4.99    0.01   95.46       -    0.00    0.00    0.01    0.00    0.00    0.00

              498795   498795 WitnessRouter      10    0.24    0.25    0.01 1000.00    0.00    0.03  100.85    0.00    0.00    0.00    0.00
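To put the host-side numbers in proportion, the rough sketch below parses three rows copied from the host output above and expresses the two witness VMs' combined %USED as a share of host capacity. It assumes %USED is in units of "percent of one PCPU" and that capacity is either the 16 logical PCPUs listed on the PCPU USED line or the 8 cores on the CORE UTIL line; hyperthreading accounting makes both figures approximations. The function names are illustrative only.

```python
HEADER = (
    "ID GID NAME NWLD %USED %RUN %SYS %WAIT %VMWAIT "
    "%RDY %IDLE %OVRLP %CSTP %MLMTD %SWPWT"
).split()

# Rows copied from the host esxtop output above (whitespace collapsed).
HOST_ROWS = """\
358419 358419 hostd.2169563 37 126.64 139.63 0.00 3552.60 - 23.59 0.00 0.25 0.00 0.00 0.00
368325 368325 abs-vsan-witnes 13 115.03 111.46 0.06 1193.98 0.00 0.24 89.24 0.11 0.00 0.00 0.00
370468 370468 abs-vsan-witnes 13 111.59 104.88 0.08 1200.57 0.00 0.16 96.01 0.11 0.00 0.00 0.00
"""


def parse_rows(text):
    """Split each line on whitespace and pair the fields with HEADER."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(HEADER):  # skip anything that is not a data row
            rows.append(dict(zip(HEADER, fields)))
    return rows


def share_of_host(rows, name_prefix, cpu_count):
    """Summed %USED of matching groups as a percentage of cpu_count CPUs."""
    used = sum(float(r["%USED"]) for r in rows if r["NAME"].startswith(name_prefix))
    return used / cpu_count


rows = parse_rows(HOST_ROWS)
logical_share = share_of_host(rows, "abs-vsan-witnes", 16)  # 16 logical PCPUs
core_share = share_of_host(rows, "abs-vsan-witnes", 8)      # 8 physical cores
print(f"witnesses: {logical_share:.1f}% of 16 PCPUs, {core_share:.1f}% of 8 cores")
```

Either way, each 2-vCPU witness is using a little over one full PCPU's worth of time, which matches the ~111-115 %USED shown for each `abs-vsan-witnes` group.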

             

            • 3. Re: vSAN Witness Appliance high CPU Usage
              beezleinc Novice

              FYI, I finally got around to rebooting my witness appliances and the high CPU load has been cured for now.   Both of my witness appliances exhibited the same high CPU trigger at the same time.

               

              This is the one-month graph of witness appliance (W/A) CPU usage from my Virtual Center.   Weird.

               

              I'll post back if it happens again.

               

               

              • 4. Re: vSAN Witness Appliance high CPU Usage
                timrojaz Lurker

                Hi,

                 

                Did you get any recurrence of this? We have around 400 witnesses and found that over the past month 60% of them have crept up in CPU usage. It all manifests the same way you describe here, and a reboot seems to clear the CPU hog.

                I've opened a ticket with VMware and I'll see where it goes, but I wanted to hear what your experience has been.

                • 5. Re: vSAN Witness Appliance high CPU Usage
                  bmrkmr Novice

                  See my comment in

                  High CPU utilization on witness ESXi appliance caused connectivity error

                   

                  looks like this can happen from time to time...

                  • 6. Re: vSAN Witness Appliance high CPU Usage
                    timrojaz Lurker

                    Thanks, I have an open ticket with VMware already.

                    We have 250+ VMs that I've rebooted in the past week. Every one had been up for 49 days and then hit a CPU spike.
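A side calculation on the 49-day figure: an unsigned 32-bit counter of milliseconds wraps after roughly 49.7 days, which is a well-known cause of problems that show up at this uptime in software generally. Nothing in this thread confirms that this is what happens inside the witness appliance; the sketch below only shows the arithmetic, and `wrap_days` is a made-up helper name.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def wrap_days(bits, ticks_per_day=MS_PER_DAY):
    """Days until an unsigned counter of the given width wraps around."""
    return 2 ** bits / ticks_per_day


print(f"32-bit millisecond counter wraps after {wrap_days(32):.2f} days")
```

If the vendor ticket points elsewhere, this coincidence can simply be disregarded.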