frozenak
Enthusiast

TKG Supervisor Cluster Constant Disk IO

Is it expected for the supervisor cluster VMs to constantly be writing to disk at roughly 8-16MB/s? Each one is writing at that rate.

Spun up workload management in my homelab, and since the supervisor cluster was deployed there's constant writing to disk, with VMs reporting 4,000 write ops/sec and >30MB/s being written to disk by just these 3 VMs.
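
For a sense of scale, here is a back-of-the-envelope conversion of those sustained rates into daily volume. It is only arithmetic on the figures quoted above; the helper name is made up for illustration and decimal units (1 TB = 1,000,000 MB) are assumed.

```python
# Rough conversion of a sustained write rate into data written per day.
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_mb_per_s):
    """Convert a sustained rate in MB/s to TB/day (decimal units assumed)."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1_000_000


# 16 MB/s is the per-VM peak quoted above, 30 MB/s the combined figure.
for rate in (16, 30):
    print(f"{rate} MB/s sustained is about {daily_volume_tb(rate):.2f} TB/day")
```

At 30 MB/s that works out to roughly 2.59 TB written per day, which is why this matters on homelab storage.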

root@420631715fd538d484265a9a44e451cb [ /usr ]# /usr/bin/python3 /usr/bin/dstat
You did not select any stats, using -cdngy by default.
Color support is disabled as python-curses is not installed.
/usr/bin/dstat:515: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
if isinstance(self.val[name], collections.Sequence) and not isinstance(self.val[name], six.string_types):
--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read writ| recv send| in out | int csw
14 10 41 35 0| 210k 16M| 0 0 | 0 0 | 17k 47k
10 9 43 37 0| 0 16M| 27k 59k| 0 0 | 15k 50k
13 9 42 35 0| 0 14M| 39k 100k| 0 0 | 16k 45k
8 11 45 35 0| 0 15M| 13k 38k| 0 0 | 18k 48k
12 9 45 34 0| 0 15M| 35k 95k| 0 0 | 17k 49k
11 12 43 34 0| 0 15M| 29k 63k| 0 0 | 21k 43k

 

root@42068944986ef297cc64a8a2a234c0dd [ /usr/bin ]# /usr/bin/python3 /usr/bin/dstat
You did not select any stats, using -cdngy by default.
Color support is disabled as python-curses is not installed.
/usr/bin/dstat:515: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
if isinstance(self.val[name], collections.Sequence) and not isinstance(self.val[name], six.string_types):
--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read writ| recv send| in out | int csw
10 9 44 37 0| 162k 16M| 0 0 | 0 0 | 15k 41k
12 10 39 38 0| 0 16M| 16k 7118B| 0 0 | 16k 38k
7 7 44 41 0| 0 16M| 147k 58k| 0 0 | 17k 40k
8 8 45 39 0| 0 15M| 22k 9355B| 0 0 | 19k 41k
9 9 45 38 0| 0 16M| 137k 38k| 0 0 | 19k 40k
9 10 42 39 0| 0 16M| 28k 11k| 0 0 | 17k 36k
7 8 44 41 0| 0 16M| 56k 26k| 0 0 | 17k 39k
8 9 43 39 0| 0 16M| 22k 10k| 0 0 | 17k 37k

 

root@4206a24f8bfc148c68b68e56e223fa7f [ ~ ]# /usr/bin/python3 /usr/bin/dstat
You did not select any stats, using -cdngy by default.
Color support is disabled as python-curses is not installed.
/usr/bin/dstat:515: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
if isinstance(self.val[name], collections.Sequence) and not isinstance(self.val[name], six.string_types):
--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read writ| recv send| in out | int csw
20 16 38 26 0| 163k 9567k| 0 0 | 0 0 | 10k 27k
18 15 40 27 0| 0 10M| 50k 21k| 0 0 |7850 26k
18 17 38 27 0| 0 9292k| 23k 11k| 0 0 |8936 25k
15 15 41 29 0| 0 8816k| 52k 23k| 0 0 | 10k 26k
14 14 41 31 0| 0 9464k| 29k 15k| 0 0 | 11k 26k
23 15 34 28 0| 0 9700k| 50k 22k| 0 0 | 12k 27k
17 17 38 28 0| 0 9024k| 25k 10k| 0 0 | 11k 26k
18 12 40 30 0| 0 9696k| 52k 24k| 0 0 | 11k 26k
18 17 38 27 0| 0 8968k| 25k 13k| 0 0 | 11k 26k
21 18 33 27 0| 0 9312k| 42k 18k| 0 0 | 13k 26k
36 24 20 20 0| 0 10M| 22k 10k| 0 0 | 15k 33k
15 13 43 28 0| 0 8524k| 55k 24k| 0 0 | 10k 25k
17 19 37 27 0| 0 8888k| 27k 15k| 0 0 | 11k 27k
14 16 45 25 0| 0 9880k| 46k 20k| 0 0 | 11k 33k
13 17 45 26 0| 0 10M|6903B 4904B| 0 0 |9881 32k

Confirmed by checking the statistics on both my storage and the VMs.

https://imgur.com/a/7xqKz1Q

I have not yet figured out how to determine what is writing or where it is writing to, but that seems a bit much, no? 16MB/sec is a lot of text if it's just logging, and it's all writes.
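
One possible way to narrow down which process is doing the writing is to sample the per-process I/O counters the Linux kernel exposes in /proc/<pid>/io. The sketch below is untested on a supervisor VM. It assumes it is run as root, that the kernel has task I/O accounting enabled, and the 2-second sampling window and function names are arbitrary choices for illustration.

```python
#!/usr/bin/env python3
"""Rank processes by bytes written over a short sampling window (Linux only)."""
import os
import time


def read_write_bytes(proc_root="/proc"):
    """Return {pid: (command name, cumulative write_bytes)} for readable processes."""
    stats = {}
    if not os.path.isdir(proc_root):
        return stats
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                fields = dict(line.strip().split(": ", 1) for line in f if ": " in line)
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
            stats[int(entry)] = (name, int(fields["write_bytes"]))
        except (OSError, KeyError, ValueError):
            continue  # process exited, or we lack permission to read it
    return stats


def top_writers(before, after, interval, limit=10):
    """Per-process write rates (bytes/s) between two samples, highest first."""
    rates = []
    for pid, (name, written) in after.items():
        if pid in before:
            delta = written - before[pid][1]
            rates.append((delta / interval, pid, name))
    return sorted(rates, reverse=True)[:limit]


if __name__ == "__main__":
    INTERVAL = 2  # seconds; arbitrary sampling window
    first = read_write_bytes()
    time.sleep(INTERVAL)
    second = read_write_bytes()
    for rate, pid, name in top_writers(first, second, INTERVAL):
        print(f"{rate / 1e6:8.2f} MB/s  pid={pid:<7} {name}")
```

Processes running inside containers also appear in the host's /proc, so a logging agent running as a pod should show up in this list under its own command name.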

I've also deployed these numerous times with the same results. With the "large" deployment size, I think it was a touch lower.

Joffer77
Contributor

I got an answer back on my SR. It was a simple fix for us:

You can disable fluentbit logging as follows:

1. SSH into the VCSA.
2. Edit /etc/vmware/wcp/wcpsvc.yaml on the VCSA and change
     logging_fluentbit_enabled: true => logging_fluentbit_enabled: false
3. Restart the WCP service: service-control --restart wcp
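
If you would rather script step 2 than edit the file by hand, something like the following could work. It is a minimal sketch, not something from the SR: the key name and file path come from the steps above, while the function names, the .bak backup, and the assumption that the key sits on its own line as "logging_fluentbit_enabled: true" are mine.

```python
import re
import shutil
import sys

# Matches the flag only when it is currently set to true on its own line.
FLAG = re.compile(
    r"^([ \t]*logging_fluentbit_enabled:[ \t]*)true[ \t]*$", re.MULTILINE
)


def disable_fluentbit(yaml_text):
    """Return (updated_text, lines_changed) with the flag flipped from true to false."""
    return FLAG.subn(r"\g<1>false", yaml_text)


def patch_file(path):
    """Back up the file to <path>.bak, then rewrite it with the flag disabled."""
    with open(path) as f:
        original = f.read()
    updated, changed = disable_fluentbit(original)
    if changed:
        shutil.copy2(path, path + ".bak")
        with open(path, "w") as f:
            f.write(updated)
    return changed


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print(f"lines changed: {patch_file(sys.argv[1])}")
    else:
        print("usage: disable_fluentbit.py /etc/vmware/wcp/wcpsvc.yaml")
```

The script only touches the file. The wcp service still has to be restarted as in step 3 for the change to take effect.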

 

From almost 8000 IOPS to ~250 IOPS:

Joffer77_0-1614347104848.png

 

frozenak
Enthusiast

Looks like the default logging level was finally fixed in 7.0 U2a.

Just upgraded my vCenter and deployed workload management again. I'm no longer seeing the excessive disk activity.

Reviewing the YAML for vmware-system-logging shows a log level of error. I did not check what the setting was before the upgrade, but per a blog post on virten.net, one workaround is to manually edit the YAML to lower the log level and then roll out the change.

https://www.virten.net/2021/02/vsphere-with-tanzu-supervisorcontrolplanevm-excessive-disk-write-io/

 

Current YAML settings:

kubectl edit configmaps fluentbit-config --namespace vmware-system-logging

fluent-bit.conf: |
  [SERVICE]
      Flush         1
      Log_Level     error
      Daemon        off
      Parsers_File  parsers.conf
      Plugins_File  plugins.conf
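
To check the level without opening an editor, the [SERVICE] section can be parsed out of the config text. This is a rough sketch: the helper name is made up, and the sample text is just the fragment pasted above, not a full config.

```python
def service_log_level(conf_text):
    """Return the Log_Level value from the [SERVICE] section, or None if absent."""
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].upper()
            continue
        parts = line.split(None, 1)
        if section == "SERVICE" and len(parts) == 2 and parts[0].lower() == "log_level":
            return parts[1].strip()
    return None


# Sample taken from the fragment above.
SAMPLE = """\
[SERVICE]
    Flush         1
    Log_Level     error
    Daemon        off
    Parsers_File  parsers.conf
    Plugins_File  plugins.conf
"""

print(service_log_level(SAMPLE))  # prints "error"
```

On a live cluster the text could presumably be pulled from the "fluent-bit.conf" key in the output of kubectl get configmap fluentbit-config --namespace vmware-system-logging -o json, but I have not wired that part up here.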