Here is a sample output of esxtop on a host running no VMs at all.
9:58:51am up 61 days 13:51, 597 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.08, 0.09, 0.11
PCPU USED(%): 61 0.2 0.9 0.2 0.4 0.3 0.3 0.1 AVG: 8.0
PCPU UTIL(%): 83 0.3 1.3 0.3 0.3 0.4 0.2 0.1 AVG: 10
CORE UTIL(%): 83 1.6 0.7 0.3 AVG: 21
ID GID NAME NWLD %USED %RUN %SYS %WAIT %VMWAIT %RDY %IDLE %OVRLP %CSTP %
1 1 system 208 45.28 783.63 0.00 19653.99 - 39.23 0.00 13.77 0.00
14004098 14004098 esxtop.4079612 1 3.74 2.84 0.02 95.57 - 0.00 0.00 0.01 0.00
14002722 14002722 sh.4079405 1 0.07 0.06 0.00 98.36 - 0.00 0.00 0.00 0.00
14003090 14003090 python.4079451 29 0.05 0.06 0.00 2854.05 - 0.01 0.00 0.00 0.00
6062 6062 hostd.2098611 31 0.03 0.04 0.00 3050.82 - 0.01 0.00 0.00 0.00
7488 7488 net-lbt.2098818 1 0.03 0.04 0.00 98.38 - 0.00 0.00 0.00 0.00
9880 9880 vpxa.2099137 38 0.03 0.02 0.00 3739.81 - 0.01 0.00 0.00 0.00
5555 5555 hostdCgiServer. 12 0.01 0.02 0.00 1180.95 - 0.01 0.00 0.00 0.00
2198 2198 net-lacp.209774 3 0.01 0.01 0.00 295.24 - 0.01 0.00 0.00 0.00
14002354 14002354 sshd.4079350 1 0.01 0.01 0.00 98.41 - 0.00 0.00 0.00 0.00
6231 6231 rhttpproxy.2098 27 0.01 0.00 0.00 2657.21 - 0.00 0.00 0.00 0.00
8 8 helper 154 0.00 0.00 0.00 15153.51 - 0.00 0.00 0.00 0.00
2320 2320 vmkiscsid.20977 2 0.00 0.01 0.00 196.83 - 0.00 0.00 0.00 0.00
5178 5178 vmware-usbarbit 1 0.00 0.00 0.00 98.41 - 0.00 0.00 0.00 0.00
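One thing worth keeping in mind when reading this table (my understanding of esxtop's grouped view, so treat it as an assumption): group rows aggregate the counters of all their NWLD member worlds, which is why %WAIT can be far above 100%. A mostly-idle world waits roughly 98.4% of the sample window here, so for the python group (NWLD=29) we'd expect about:

```shell
# Rough sanity check of a grouped %WAIT value: NWLD worlds, each waiting
# ~98.4% of the sample window (the per-world value visible elsewhere in
# the table), summed across the group.
nwld=29
per_world_wait_tenths=984                      # ~98.4%, in tenths of a percent
echo $(( nwld * per_world_wait_tenths / 10 ))  # close to the 2854.05 shown above
```

That matches the 2854.05 shown for python, and the same logic explains the system group's 19653.99 across 208 worlds.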
There is a fairly high %USED on the system world, but that alone isn't enough to troubleshoot.
So the next thing I do is press the e key to expand the system world.
This is what I get:
10:01:00am up 61 days 13:53, 597 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.08, 0.09, 0.10
PCPU USED(%): 2.0 1.7 8.1 5.5 3.3 0.7 0.5 0.1 AVG: 2.7
PCPU UTIL(%): 3.5 2.6 11 7.0 2.7 0.8 0.9 0.1 AVG: 3.6
CORE UTIL(%): 6.0 18 3.5 1.0 AVG: 7.1
ID GID NAME NWLD %USED %RUN %SYS %WAIT %VMWAIT %RDY %IDLE %OVRLP %CSTP %
2097153 1 idle1 1 0.00 97.28 0.00 0.00 - 1.12 0.00 0.39 0.00
2097154 1 idle2 1 0.00 93.76 0.00 0.00 - 4.65 0.00 1.87 0.00
2097155 1 idle3 1 0.00 95.47 0.00 0.00 - 2.94 0.00 1.43 0.00
2097156 1 idle4 1 0.00 95.89 0.00 0.00 - 2.52 0.00 0.02 0.00
2097157 1 idle5 1 0.00 97.75 0.00 0.00 - 0.65 0.00 0.03 0.00
2097158 1 idle6 1 0.00 97.83 0.00 0.00 - 0.57 0.00 0.03 0.00
2097159 1 idle7 1 0.00 98.30 0.00 0.00 - 0.10 0.00 0.00 0.00
2097160 1 vmkEventAsyncMs 1 0.00 0.00 0.00 98.40 - 0.00 0.00 0.00 0.00
2097161 1 fastslab 1 0.00 0.00 0.00 98.39 - 0.01 0.00 0.00 0.00
2097162 1 SVGAConsole 1 0.00 0.00 0.00 98.39 - 0.01 0.00 0.00 0.00
2097163 1 debugtermlivedu 1 0.00 0.00 0.00 98.40 - 0.00 0.00 0.00 0.00
2097164 1 logSysAlert 1 0.00 0.00 0.00 98.40 - 0.00 0.00 0.00 0.00
2097165 1 serialLogger 1 0.00 0.00 0.00 98.40 - 0.00 0.00 0.00 0.00
2097166 1 tlbflushcount 1 0.00 0.00 0.00 98.40 - 0.00 0.00 0.00 0.00
Not very useful, so I'll sort by %USED by pressing the U key:
10:01:55am up 61 days 13:54, 566 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.08, 0.09, 0.10
PCPU USED(%): 0.3 0.1 5.1 0.2 0.0 53 0.5 0.1 AVG: 7.5
PCPU UTIL(%): 0.4 0.1 4.2 0.3 0.0 75 0.6 0.1 AVG: 10
CORE UTIL(%): 0.5 4.4 75 0.1 AVG: 20
Sort by %used
ID GID NAME NWLD %USED %RUN %SYS %WAIT %VMWAIT %RDY %IDLE %OVRLP %CSTP %
2097153 1 idle1 1 0.00 100.84 0.00 0.00 - 0.08 0.00 0.02 0.00
2097154 1 idle2 1 0.00 96.73 0.00 0.00 - 4.18 0.00 0.00 0.00
2097155 1 idle3 1 0.00 100.77 0.00 0.00 - 0.14 0.00 0.01 0.00
2097156 1 idle4 1 0.00 100.88 0.00 0.00 - 0.04 0.00 0.00 0.00
2097157 1 idle5 1 0.00 70.16 0.00 0.00 - 30.75 0.00 12.81 0.00
2097158 1 idle6 1 0.00 100.51 0.00 0.00 - 0.40 0.00 0.07 0.00
2097159 1 idle7 1 0.00 100.84 0.00 0.00 - 0.07 0.00 0.00 0.00
2097160 1 vmkEventAsyncMs 1 0.00 0.00 0.00 100.00 - 0.00 0.00 0.00 0.00
2097161 1 fastslab 1 0.00 0.00 0.00 100.00 - 0.01 0.00 0.00 0.00
2097162 1 SVGAConsole 1 0.00 0.00 0.00 100.00 - 0.01 0.00 0.00 0.00
2097163 1 debugtermlivedu 1 0.00 0.00 0.00 100.00 - 0.00 0.00 0.00 0.00
2097164 1 logSysAlert 1 0.00 0.00 0.00 100.00 - 0.00 0.00 0.00 0.00
2097165 1 serialLogger 1 0.00 0.00 0.00 100.00 - 0.00 0.00 0.00 0.00
2097166 1 tlbflushcount 1 0.00 0.00 0.00 100.00 - 0.00 0.00 0.00 0.00
As you can see, sorting doesn't really work here. You might think that's because almost everything is at zero, but if I shrink the terminal font so that many more lines fit on screen, I find some interesting entries way down the list:
ID GID NAME NWLD %USED %RUN %SYS %WAIT %VMWAIT %RDY %IDLE %OVRLP %CSTP %
[...]
2097277 1 CpuSchedRealloc 1 0.07 0.06 0.00 98.32 - 0.00 0.00 0.00 0.00 0.00 0.00
2097278 1 CpuMetricsLoadH 1 0.01 0.01 0.00 98.37 - 0.00 0.00 0.00 0.00 0.00 0.00
2097279 1 CpuSchedExtende 1 0.00 0.00 0.00 98.37 - 0.01 0.00 0.00 0.00 0.00 0.00
2097280 1 PktSlabMemorySt 1 0.00 0.00 0.00 98.37 - 0.01 0.00 0.00 0.00 0.00 0.00
2097305 1 DCFlushCaches 1 0.00 0.00 0.00 98.38 - 0.00 0.00 0.00 0.00 0.00 0.00
2097306 1 OCFlush 1 50.76 36.98 0.00 61.40 - 0.00 0.00 0.00 0.00 0.00 0.00
2097341 1 bcflushd 1 0.00 0.00 0.00 98.38 - 0.00 0.00 0.00 0.00 0.00 0.00
2097358 1 VSCSIPoll 1 0.06 0.06 0.00 98.31 - 0.02 0.00 0.00 0.00 0.00 0.00
2097369 1 Storage-APD 1 0.00 0.00 0.00 98.38 - 0.00 0.00 0.00 0.00 0.00 0.00
2097414 1 serialSwitcher 1 0.00 0.00 0.00 98.38 - 0.00 0.00 0.00 0.00 0.00 0.00
2097415 1 logterm 1 0.00 0.00 0.00 98.38 - 0.00 0.00 0.00 0.00 0.00 0.00
2097416 1 logterm-scroll 1 0.00 0.00 0.00 98.38 - 0.00 0.00 0.00 0.00 0.00 0.00
2097417 1 memMap-adj 1 0.00 0.00 0.00 98.38 - 0.00 0.00 0.00 0.00 0.00 0.00
2097418 1 pmemArs 1 0.00 0.00 0.00 98.38 - 0.00 0.00 0.00 0.00 0.00 0.00
2097423 1 NRandomHwrng 1 0.00 0.00 0.00 98.38 - 0.00 0.00 0.00 0.00 0.00 0.00
[...]
So I have two questions:
- Why can't we sort while expanded into a world in esxtop?
- What is OCFlush, and why would it be using so much CPU while no VMs are running?
Why does it matter? This is actually a home lab, with ESXi running on a spare MacBook Pro; the bottom line is that the fans are making too much noise and I want to fix it!
ESX 6.7.0 (Build 8169922)
I just found a similar issue with an idea of the cause (though not the solution):
https://www.reddit.com/r/vmware/comments/9q73fk/esxi_system_pid_1_ocflush_cpu_usage_after_49days/
Apparently OCFlush goes rogue after 2^32 milliseconds (~49.7 days), and my machine has 61 days of uptime.
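The 49-day figure is easy to verify as the classic 32-bit millisecond tick-counter wrap:

```shell
# A 32-bit millisecond counter wraps at 2^32 ms = 4294967296 ms.
# Convert that to whole days.
wrap_ms=4294967296
echo $(( wrap_ms / 1000 / 86400 ))   # ~49.7 days, truncated to 49
```

So any host with more than ~49.7 days of uptime would be past the wrap point, which fits both the Reddit report and my 61-day uptime.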
Did you find a solution?
No solution? I have this on 3 different hardware platforms. All different versions of 6.7.
Any idea what OCFlush is used for?
Not yet. Maybe it's an issue with 6.7 only?
I have support with VMware on 2 servers experiencing the issue. After sending them logs I got the following:
I checked the logs and found that both your servers are not running on the recommended custom image. There is no known bug reported about the CPU thread usage on 6.7.
Request you to update the servers using their respective custom images, to have the customized driver and agents installed on the server.
Please install the latest build available for 6.7
While I admit I made the mistake of not installing the custom builds from Dell and HP, it does seem odd to hit this exact problem on three completely different hardware platforms (if you include mine at home).
It seems like they are not aware of the problem and don't intend to fix it.
Hello Yanni,
Perhaps you have to upgrade to U2 first.
Can you please tell us what hardware you are using? It looks like a hardware issue.
Thanks @k72kostas,
That's on totally unsupported hardware, an Intel NUC system (home lab), so I'm not complaining too much, but other people seem to have the same issue on enterprise gear.
I just applied U2, will report in 49 days 🙂
I got this working with 6.5.0 Update 1 (Build 5969303) only! Since in my case it's a home lab issue too, that is a sufficient workaround for me.
Both my hosts just hit 50 days and are not experiencing the issue. 6.7 13006603
Can anyone else confirm? I remember playing around with a couple of potential settings, but now I can't remember what they were. I'm hoping it was the version I installed 50 days ago that made the difference.
Hi all,
I'm investigating again what caused my home lab's CPU temperature and fan speed to suddenly go up without any significant change in virtual machine CPU load.
Expanding the system process in esxtop showed the OCFlush process, and that's how I found this thread.
System uptime is 75 days, and the change occurred on Jun 2, which corresponds to 51 days of uptime.
[root@esxi:~] vmware -vl
VMware ESXi 6.7.0 build-10302608
VMware ESXi 6.7.0 Update 1
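Given the theory above, a quick way to reason about whether a host is past the danger zone is to compare its uptime against the 2^32 ms wrap point. A minimal sketch (UPTIME_DAYS is a hard-coded stand-in for my host's 75-day uptime; on a live host you'd read the real uptime instead):

```shell
# Hypothetical check: has this host's uptime passed the 32-bit
# millisecond wrap (~49.7 days)? The 75 below stands in for the
# uptime reported by the hypervisor.
UPTIME_DAYS=75
WRAP_SECONDS=$(( 4294967296 / 1000 ))   # 2^32 ms expressed in seconds
if [ $(( UPTIME_DAYS * 86400 )) -gt "$WRAP_SECONDS" ]; then
  echo "past the 32-bit millisecond wrap"
fi
```

By that measure my host crossed the threshold around day 50, which lines up with the Jun 2 change at 51 days of uptime.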
Well, let's try the update and then wait 51 days... but I would like to see a solution, or at least an explanation of what OCFlush is.
According to this post, High CPU usage by system process after 49 days of uptime (OCFlush), it may be fixed in 6.7 U2. But I'm unable to upgrade due to a 'No space left on device' error, which makes no sense. It seems I'm not the only one: Re: Vmware Esxi Update No Space Left On Device Error
Did you ever get a resolution on this? I have the same issue on my Dell R740 with the Dell image of ESXi 7.