Hi. I have a question about Witness Appliance CPU usage...
I have a 6.7 witness appliance running on an ESXi 6.7 U1 (free) host, and it is steadily consuming roughly half of the host's CPU even though there is very little disk activity. This seems excessive to me.
The vSAN is healthy, so I know the witness is working OK. Are there any other data points on witness CPU usage that I can compare against? Is there any way to check the witness to see why it is running wide open? I haven't tried rebooting the witness yet, but that is my next step.
FYI, I actually have two witness appliances running on the same ESXi host, and both show the same high CPU usage, so my ESXi host is running at almost 100% CPU.
Pics of the host CPU usage view plus the witness appliance view of the CPU attached.
TIA.
Hello @beezleinc
"it is consuming roughly half of the host's CPU"
Half of what? Half a potato might not be much, but half a whale could be a lot. :)
"even while there is virtually very little disk activity"
A witness isn't just a storage container for witness components; it is a minimal ESXi host with its own services and modules to run. Also, since witness components are small (16 MB), you might not notice a huge amount of disk activity, but that doesn't mean it isn't changing hundreds of components at any one time.
"any other data points on witness cpu usage that I can compare to? "
Look at this as you would any other appliance (or even a home computer with Task Manager): open esxtop and check which processes are using what percentage of which resources.
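As a sketch of that "what is using what" pass, the Python below parses text copied from the esxtop CPU screen (in the same layout as the esxtop outputs pasted later in this thread) and ranks groups by %USED. The function names and `SAMPLE` are hypothetical; the parser assumes whitespace-separated columns and no spaces inside NAME, so other layouts would need adjusting. esxtop's batch mode (`esxtop -b`) writes CSV instead, which may be easier to post-process over time.

```python
# Hypothetical helper: rank groups from a pasted esxtop CPU panel by a metric.
# Assumes the layout pasted in this thread: a header row starting with
# "ID GID NAME", whitespace-separated columns, and no spaces inside NAME.

SAMPLE = """\
ID GID NAME NWLD %USED %RUN %SYS %WAIT %VMWAIT %RDY %IDLE %OVRLP %CSTP %MLMTD %SWPWT
1 1 system 170 45.61 193.84 0.00 16974.81 - 59.31 0.00 18.09 0.00 0.00 0.00
88798951 88798951 esxtop.13388107 1 5.48 5.77 0.00 95.40 - 0.06 0.00 0.01 0.00 0.00 0.00
11350 11350 hostd.2099283 27 1.04 1.04 0.00 2700.00 - 0.07 0.00 0.00 0.00 0.00 0.00
"""


def to_num(value):
    """Convert an esxtop cell to float; '-' (metric not applicable) becomes None."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_cpu_panel(text):
    """Return one dict per group row; summary lines (uptime, PCPU ...) are skipped."""
    header = None
    groups = []
    for line in text.splitlines():
        fields = line.split()
        if fields[:3] == ["ID", "GID", "NAME"]:
            header = fields
        elif header and len(fields) == len(header):
            row = dict(zip(header, fields))
            name = row.pop("NAME")
            groups.append({"NAME": name, **{k: to_num(v) for k, v in row.items()}})
    return groups


def top_by(groups, metric="%USED", n=5):
    """Highest values of `metric` first; missing values sort as 0."""
    return sorted(groups, key=lambda g: g.get(metric) or 0.0, reverse=True)[:n]


if __name__ == "__main__":
    for g in top_by(parse_cpu_panel(SAMPLE), n=3):
        print(f"{g['NAME']:<20} %USED={g['%USED']:6.2f} %RDY={g['%RDY']:6.2f}")
```

Rows that don't match the header's column count (the uptime line, PCPU lines, or a `...` elision) are simply skipped, so a whole screen can be pasted in.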
"I haven't tried rebooting the witness yet but that is my next step."
Rebooting might help if it has a long uptime or some process is stuck, but you should be interested in looking at this more granularly, e.g. which processes are using how much of which resource before and after.
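A minimal sketch of that before/after comparison, assuming you have already pulled each group's %USED into a dict (by hand or with a parser like the one above). The function names are hypothetical, and stripping the numeric suffix from names such as `hostd.2099283` rests on the assumption that the suffix is a process ID that changes across a reboot.

```python
import re
from collections import defaultdict


def base_name(group_name):
    # Group names in this thread look like "hostd.2099283". The numeric suffix is
    # assumed to be a process id that changes across reboots, so strip it.
    return re.sub(r"\.\d+$", "", group_name)


def by_base(snapshot):
    """Sum %USED per base name (e.g. several 'sh.<pid>' groups collapse to 'sh')."""
    totals = defaultdict(float)
    for name, used in snapshot.items():
        totals[base_name(name)] += used
    return totals


def diff_used(before, after, min_change=1.0):
    """before/after map esxtop group NAME -> %USED.

    Returns (name, before, after, delta) tuples whose |delta| >= min_change,
    largest change first.
    """
    b, a = by_base(before), by_base(after)
    changes = []
    for name in b.keys() | a.keys():
        old, new = b.get(name, 0.0), a.get(name, 0.0)
        if abs(new - old) >= min_change:
            changes.append((name, old, new, new - old))
    return sorted(changes, key=lambda c: abs(c[3]), reverse=True)


if __name__ == "__main__":
    # "before" values are from the witness esxtop output in this thread;
    # "after" values are made up purely to show the call.
    before = {"system": 45.61, "hostd.2099283": 1.04, "vpxa.2099911": 0.13}
    after = {"system": 5.0, "hostd.2100001": 1.10, "vpxa.2100002": 0.12}
    for name, old, new, delta in diff_used(before, after):
        print(f"{name:<10} {old:6.2f} -> {new:6.2f} ({delta:+.2f})")
```

The `min_change` cutoff keeps small fluctuations (like hostd moving by a few hundredths of a percent) out of the report so the group that actually changed stands out.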
How many vCPUs and components, and how much RAM and storage, do these witnesses have?
Is anything else running on this ESXi Free host, and what are the physical server specs?
Bob
Hi Bob. I've logged into both witness ESXi appliances, and esxtop simply shows the 'system' process consuming a steady CPU %USED of 45 and a %RDY of ~50-60, which, from what I've been reading, indicates CPU contention. But the host ESXi shows each witness 'world' with a very low %RDY, which indicates it's getting all the CPU it's requesting(?)
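In esxtop, the %RDY shown for a VM's group is summed over that group's worlds, so it is usually divided by the VM's vCPU count before comparing it with a threshold. The sketch below applies that to the host-side witness figures pasted later in this thread. The function names are hypothetical, and the 10% per-vCPU threshold is an assumption (a commonly quoted rule of thumb, not an official limit).

```python
# Rough check of host-side ready time per vCPU for each VM.
# Assumption: the 10% per-vCPU threshold is a commonly quoted rule of thumb,
# not an official VMware limit.
RDY_WARN_PER_VCPU = 10.0


def ready_per_vcpu(group_rdy, vcpus):
    """esxtop sums %RDY over a VM group's worlds; divide by vCPUs to approximate per-vCPU ready time."""
    return group_rdy / vcpus


def flag_ready(vms, threshold=RDY_WARN_PER_VCPU):
    """vms: iterable of (group id, group %RDY, vCPU count) -> (id, per-vCPU %RDY, over threshold?)."""
    results = []
    for gid, rdy, vcpus in vms:
        per_vcpu = ready_per_vcpu(rdy, vcpus)
        results.append((gid, per_vcpu, per_vcpu >= threshold))
    return results


if __name__ == "__main__":
    # Group IDs and %RDY from the host-side esxtop output in this thread;
    # 2 vCPUs per witness as stated for the "normal" OVF size.
    witnesses = [(368325, 0.24, 2), (370468, 0.16, 2)]
    for gid, per_vcpu, hot in flag_ready(witnesses):
        print(f"{gid}: {per_vcpu:.2f}% ready per vCPU, contention suspected: {hot}")
```

Both witnesses come out far below the threshold on the host side, which is consistent with the observation that the host is giving them the CPU they ask for; it says nothing by itself about what the nested ESXi's own scheduler reports.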
The ESXi host is a Dell PowerEdge with 2 x 4-core CPUs (Intel(R) Xeon(R) CPU X5667 @ 3.07GHz) and 128 GB RAM.
Each witness was installed with the 'normal' setting from the OVF: 16 GB RAM, 2 vCPUs. (There are a couple of other low-use VMs on the box as well, but it is well under-utilized from a memory or disk perspective.)
Anyway, I just thought I'd throw it out there; I'd be curious to know how this compares with other witness appliances. This may just be the normal idle state of an embedded ESXi within an ESXi, I don't know.
Thx,
-a
esxtop from within one of the witness ESXi appliances:
4:11:05pm up 55 days 21:09, 590 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.24, 0.23, 0.22
PCPU USED(%): 60 9.7 AVG: 35
PCPU UTIL(%): 100 10 AVG: 55
ID GID NAME NWLD %USED %RUN %SYS %WAIT %VMWAIT %RDY %IDLE %OVRLP %CSTP %MLMTD %SWPWT
1 1 system 170 45.61 193.84 0.00 16974.81 - 59.31 0.00 18.09 0.00 0.00 0.00
88798951 88798951 esxtop.13388107 1 5.48 5.77 0.00 95.40 - 0.06 0.00 0.01 0.00 0.00 0.00
11350 11350 hostd.2099283 27 1.04 1.04 0.00 2700.00 - 0.07 0.00 0.00 0.00 0.00 0.00
15968 15968 vpxa.2099911 38 0.13 0.13 0.00 3800.00 - 0.05 0.00 0.00 0.00 0.00 0.00
222771 222771 sh.2136210 1 0.10 0.10 0.00 100.00 - 0.41 0.00 0.00 0.00 0.00 0.00
18963 18963 dcui.2100333 4 0.08 0.08 0.00 400.00 - 0.18 0.00 0.00 0.00 0.00 0.00
223299 223299 python.2136276 33 0.08 0.08 0.00 3300.00 - 0.02 0.00 0.00 0.00 0.00 0.00
...
esxtop from the physical ESXi host:
4:19:38pm up 63 days 14:38, 680 worlds, 4 VMs, 7 vCPUs; CPU load average: 0.52, 0.54, 0.54
PCPU USED(%): 71 9.6 23 54 23 24 23 20 10 5.6 4.5 47 8.2 8.0 3.8 63 AVG: 25
PCPU UTIL(%): 74 15 39 59 36 38 37 29 10 6.1 5.8 44 8.6 8.1 5.1 60 AVG: 30
CORE UTIL(%): 80 82 61 57 15 47 16 63 AVG: 53
ID GID NAME NWLD %USED %RUN %SYS %WAIT %VMWAIT %RDY %IDLE %OVRLP %CSTP %MLMTD %SWPWT
358419 358419 hostd.2169563 37 126.64 139.63 0.00 3552.60 - 23.59 0.00 0.25 0.00 0.00 0.00
368325 368325 abs-vsan-witnes 13 115.03 111.46 0.06 1193.98 0.00 0.24 89.24 0.11 0.00 0.00 0.00
370468 370468 abs-vsan-witnes 13 111.59 104.88 0.08 1200.57 0.00 0.16 96.01 0.11 0.00 0.00 0.00
1 1 system 309 33.21 1232.67 0.00 29402.59 - 414.86 0.00 14.78 0.00 0.00 0.00
2714598 2714598 absvc1 11 10.47 10.57 0.06 1093.81 0.02 0.37 190.07 0.07 0.00 0.00 0.00
2754165 2754165 esxtop.2648390 1 5.29 4.99 0.01 95.46 - 0.00 0.00 0.01 0.00 0.00 0.00
498795 498795 WitnessRouter 10 0.24 0.25 0.01 1000.00 0.00 0.03 100.85 0.00 0.00 0.00 0.00
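One way to put numbers on the earlier "half of what?" question is to use the host-side figures above. In esxtop, a %USED of 100 corresponds to one PCPU, so a 2-vCPU witness tops out near 200. The sketch below is plain arithmetic on the pasted values; the constant and function names are hypothetical, and treating the 16 PCPU entries as the hyper-threads of the 8 physical cores is an assumption.

```python
# Worked arithmetic for "half of what?", using the host-side esxtop figures
# from this thread. In esxtop, %USED = 100 corresponds to one physical CPU
# (PCPU), so a 2-vCPU VM can reach roughly 200.

WITNESS_USED = {368325: 115.03, 370468: 111.59}  # group ID -> %USED (host view)
VCPUS_PER_WITNESS = 2  # "normal" OVF size per this thread
HOST_PCPUS = 16  # entries on the host's PCPU USED(%) line (assumed hyper-threads)


def share_of_allocation(used_pct, vcpus):
    """Percent of the VM's own vCPU capacity that it is using."""
    return used_pct / (vcpus * 100) * 100


def share_of_host(used_pcts, pcpus):
    """Percent of all PCPUs listed by esxtop used by the given groups together."""
    return sum(used_pcts) / (pcpus * 100) * 100


if __name__ == "__main__":
    for gid, used in WITNESS_USED.items():
        print(f"witness {gid}: {share_of_allocation(used, VCPUS_PER_WITNESS):.1f}% of its 2 vCPUs")
    both = share_of_host(WITNESS_USED.values(), HOST_PCPUS)
    print(f"both witnesses: {both:.1f}% of the {HOST_PCPUS} PCPUs")
```

On these numbers each witness is using a little over half of its own 2 vCPUs, while the two together account for roughly 14% of the 16 PCPUs esxtop lists, so "half" depends heavily on which denominator is meant.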
FYI, I finally got around to rebooting my witness appliances, and the high CPU load has been cured for now. Both witness appliances started showing the high CPU usage at the same time.
This is the one-month graph of the witness appliances' CPU usage from my Virtual Center. Weird.
I'll post back if it happens again.
Hi,
Did you get any recurrence of this? We have around 400 witnesses and found that over the past month 60% of them have crept up in CPU usage. It all manifests the same way you have mentioned here, and a reboot seems to have cleared the CPU hog.
I've opened a ticket with VMware and I'll see where it goes, but I wanted to see what your experience has been like.
See my comment in "High CPU utilization on witness ESXi appliance caused connectivity error"; it looks like this can happen from time to time...
Thanks, I already have an open ticket with VMware.
We have 250+ VMs that I've rebooted in the past week. Every one had been up for 49 days and then hit a CPU spike.
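Until there is a fix, a hypothetical helper like the one below could flag witnesses whose esxtop header shows an uptime beyond the ~49 days reported here, so they can be scheduled for a reboot. The regex assumes the header format pasted in this thread (e.g. `4:11:05pm up 55 days 21:09, ...`), and the 49-day threshold is only this thread's observation, not a documented limit.

```python
import re

# Assumes the esxtop header format seen in this thread, e.g.
# "4:11:05pm up 55 days 21:09, 590 worlds, ..."; the "N days" part is optional.
UPTIME_RE = re.compile(r"\bup\s+(?:(\d+)\s+days?,?\s+)?(\d+):(\d+)")

# Observed in this thread before the CPU spike; not a documented limit.
UPTIME_WARN_DAYS = 49


def uptime_days(header_line):
    """Return uptime in (fractional) days, or None if the header doesn't match."""
    m = UPTIME_RE.search(header_line)
    if m is None:
        return None
    days = int(m.group(1) or 0)
    return days + int(m.group(2)) / 24 + int(m.group(3)) / 1440


def needs_attention(header_line, warn_days=UPTIME_WARN_DAYS):
    """True when the parsed uptime has reached the warning threshold."""
    up = uptime_days(header_line)
    return up is not None and up >= warn_days


if __name__ == "__main__":
    header = "4:11:05pm up 55 days 21:09, 590 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.24, 0.23, 0.22"
    print(f"{uptime_days(header):.2f} days, reboot suggested: {needs_attention(header)}")
```

A header that doesn't match returns `None` rather than guessing, so unexpected formats show up as "unknown" instead of silently passing.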