VMware Cloud Community
yannbizeul
Enthusiast

High OCFlush %USED with no VMs running; esxtop %USED sort not working inside an expanded world

Here is a sample output of esxtop on a host running no VMs at all.

9:58:51am up 61 days 13:51, 597 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.08, 0.09, 0.11

PCPU USED(%):  61 0.2 0.9 0.2 0.4 0.3 0.3 0.1 AVG: 8.0

PCPU UTIL(%):  83 0.3 1.3 0.3 0.3 0.4 0.2 0.1 AVG:  10

CORE UTIL(%):  83     1.6     0.7     0.3     AVG:  21

      ID      GID NAME             NWLD   %USED    %RUN    %SYS   %WAIT %VMWAIT    %RDY   %IDLE  %OVRLP   %CSTP  %

       1        1 system            208   45.28  783.63    0.00 19653.99       -   39.23    0.00   13.77    0.00 

14004098 14004098 esxtop.4079612      1    3.74    2.84    0.02   95.57       -    0.00    0.00    0.01    0.00  

14002722 14002722 sh.4079405          1    0.07    0.06    0.00   98.36       -    0.00    0.00    0.00    0.00  

14003090 14003090 python.4079451     29    0.05    0.06    0.00 2854.05       -    0.01    0.00    0.00    0.00  

    6062     6062 hostd.2098611      31    0.03    0.04    0.00 3050.82       -    0.01    0.00    0.00    0.00  

    7488     7488 net-lbt.2098818     1    0.03    0.04    0.00   98.38       -    0.00    0.00    0.00    0.00  

    9880     9880 vpxa.2099137       38    0.03    0.02    0.00 3739.81       -    0.01    0.00    0.00    0.00  

    5555     5555 hostdCgiServer.    12    0.01    0.02    0.00 1180.95       -    0.01    0.00    0.00    0.00  

    2198     2198 net-lacp.209774     3    0.01    0.01    0.00  295.24       -    0.01    0.00    0.00    0.00  

14002354 14002354 sshd.4079350        1    0.01    0.01    0.00   98.41       -    0.00    0.00    0.00    0.00  

    6231     6231 rhttpproxy.2098    27    0.01    0.00    0.00 2657.21       -    0.00    0.00    0.00    0.00  

       8        8 helper            154    0.00    0.00    0.00 15153.51       -    0.00    0.00    0.00    0.00 

    2320     2320 vmkiscsid.20977     2    0.00    0.01    0.00  196.83       -    0.00    0.00    0.00    0.00  

    5178     5178 vmware-usbarbit     1    0.00    0.00    0.00   98.41       -    0.00    0.00    0.00    0.00  

You can see a fairly high %USED for the system world, but this view alone is not enough to troubleshoot.

So next I use the e key to expand the system world.

This is what I get:

10:01:00am up 61 days 13:53, 597 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.08, 0.09, 0.10

PCPU USED(%): 2.0 1.7 8.1 5.5 3.3 0.7 0.5 0.1 AVG: 2.7

PCPU UTIL(%): 3.5 2.6  11 7.0 2.7 0.8 0.9 0.1 AVG: 3.6

CORE UTIL(%): 6.0      18     3.5     1.0     AVG: 7.1

      ID      GID NAME             NWLD   %USED    %RUN    %SYS   %WAIT %VMWAIT    %RDY   %IDLE  %OVRLP   %CSTP  %

2097153        1 idle1               1    0.00   97.28    0.00    0.00       -    1.12    0.00    0.39    0.00  

2097154        1 idle2               1    0.00   93.76    0.00    0.00       -    4.65    0.00    1.87    0.00  

2097155        1 idle3               1    0.00   95.47    0.00    0.00       -    2.94    0.00    1.43    0.00  

2097156        1 idle4               1    0.00   95.89    0.00    0.00       -    2.52    0.00    0.02    0.00  

2097157        1 idle5               1    0.00   97.75    0.00    0.00       -    0.65    0.00    0.03    0.00  

2097158        1 idle6               1    0.00   97.83    0.00    0.00       -    0.57    0.00    0.03    0.00  

2097159        1 idle7               1    0.00   98.30    0.00    0.00       -    0.10    0.00    0.00    0.00  

2097160        1 vmkEventAsyncMs     1    0.00    0.00    0.00   98.40       -    0.00    0.00    0.00    0.00  

2097161        1 fastslab            1    0.00    0.00    0.00   98.39       -    0.01    0.00    0.00    0.00  

2097162        1 SVGAConsole         1    0.00    0.00    0.00   98.39       -    0.01    0.00    0.00    0.00  

2097163        1 debugtermlivedu     1    0.00    0.00    0.00   98.40       -    0.00    0.00    0.00    0.00  

2097164        1 logSysAlert         1    0.00    0.00    0.00   98.40       -    0.00    0.00    0.00    0.00  

2097165        1 serialLogger        1    0.00    0.00    0.00   98.40       -    0.00    0.00    0.00    0.00  

2097166        1 tlbflushcount       1    0.00    0.00    0.00   98.40       -    0.00    0.00    0.00    0.00  

Not really useful, so I sort by %USED using the U key:

10:01:55am up 61 days 13:54, 566 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.08, 0.09, 0.10

PCPU USED(%): 0.3 0.1 5.1 0.2 0.0  53 0.5 0.1 AVG: 7.5

PCPU UTIL(%): 0.4 0.1 4.2 0.3 0.0  75 0.6 0.1 AVG:  10

CORE UTIL(%): 0.5     4.4      75     0.1     AVG:  20

Sort by %used

      ID      GID NAME             NWLD   %USED    %RUN    %SYS   %WAIT %VMWAIT    %RDY   %IDLE  %OVRLP   %CSTP  %

2097153        1 idle1               1    0.00  100.84    0.00    0.00       -    0.08    0.00    0.02    0.00  

2097154        1 idle2               1    0.00   96.73    0.00    0.00       -    4.18    0.00    0.00    0.00  

2097155        1 idle3               1    0.00  100.77    0.00    0.00       -    0.14    0.00    0.01    0.00  

2097156        1 idle4               1    0.00  100.88    0.00    0.00       -    0.04    0.00    0.00    0.00  

2097157        1 idle5               1    0.00   70.16    0.00    0.00       -   30.75    0.00   12.81    0.00  

2097158        1 idle6               1    0.00  100.51    0.00    0.00       -    0.40    0.00    0.07    0.00  

2097159        1 idle7               1    0.00  100.84    0.00    0.00       -    0.07    0.00    0.00    0.00  

2097160        1 vmkEventAsyncMs     1    0.00    0.00    0.00  100.00       -    0.00    0.00    0.00    0.00  

2097161        1 fastslab            1    0.00    0.00    0.00  100.00       -    0.01    0.00    0.00    0.00  

2097162        1 SVGAConsole         1    0.00    0.00    0.00  100.00       -    0.01    0.00    0.00    0.00  

2097163        1 debugtermlivedu     1    0.00    0.00    0.00  100.00       -    0.00    0.00    0.00    0.00  

2097164        1 logSysAlert         1    0.00    0.00    0.00  100.00       -    0.00    0.00    0.00    0.00  

2097165        1 serialLogger        1    0.00    0.00    0.00  100.00       -    0.00    0.00    0.00    0.00  

2097166        1 tlbflushcount       1    0.00    0.00    0.00  100.00       -    0.00    0.00    0.00    0.00  

As you can see, it doesn't really work. You might think that's because nearly everything is at zero, but if I shrink the terminal font to display many more lines, I find non-zero entries far down the list:

      ID      GID NAME             NWLD   %USED    %RUN    %SYS   %WAIT %VMWAIT    %RDY   %IDLE  %OVRLP   %CSTP  %

[...]

2097277        1 CpuSchedRealloc     1    0.07    0.06    0.00   98.32       -    0.00    0.00    0.00    0.00    0.00    0.00

2097278        1 CpuMetricsLoadH     1    0.01    0.01    0.00   98.37       -    0.00    0.00    0.00    0.00    0.00    0.00

2097279        1 CpuSchedExtende     1    0.00    0.00    0.00   98.37       -    0.01    0.00    0.00    0.00    0.00    0.00

2097280        1 PktSlabMemorySt     1    0.00    0.00    0.00   98.37       -    0.01    0.00    0.00    0.00    0.00    0.00

2097305        1 DCFlushCaches       1    0.00    0.00    0.00   98.38       -    0.00    0.00    0.00    0.00    0.00    0.00

2097306        1 OCFlush             1   50.76   36.98    0.00   61.40       -    0.00    0.00    0.00    0.00    0.00    0.00

2097341        1 bcflushd            1    0.00    0.00    0.00   98.38       -    0.00    0.00    0.00    0.00    0.00    0.00

2097358        1 VSCSIPoll           1    0.06    0.06    0.00   98.31       -    0.02    0.00    0.00    0.00    0.00    0.00

2097369        1 Storage-APD         1    0.00    0.00    0.00   98.38       -    0.00    0.00    0.00    0.00    0.00    0.00

2097414        1 serialSwitcher      1    0.00    0.00    0.00   98.38       -    0.00    0.00    0.00    0.00    0.00    0.00

2097415        1 logterm             1    0.00    0.00    0.00   98.38       -    0.00    0.00    0.00    0.00    0.00    0.00

2097416        1 logterm-scroll      1    0.00    0.00    0.00   98.38       -    0.00    0.00    0.00    0.00    0.00    0.00

2097417        1 memMap-adj          1    0.00    0.00    0.00   98.38       -    0.00    0.00    0.00    0.00    0.00    0.00

2097418        1 pmemArs             1    0.00    0.00    0.00   98.38       -    0.00    0.00    0.00    0.00    0.00    0.00

2097423        1 NRandomHwrng        1    0.00    0.00    0.00   98.38       -    0.00    0.00    0.00    0.00    0.00    0.00

[...]
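Since the U key does not reorder the expanded view, one workaround is to copy the rows out of the terminal and sort them offline. The following is only a minimal sketch, not an esxtop feature: it assumes whitespace-separated rows laid out as in the output above (ID, GID, NAME, NWLD, %USED, ...) and world names without spaces. The function names `parse_rows` and `top_by_used` are made up for illustration.

```python
# A few rows copied from the expanded system-world view above (trailing columns trimmed).
SAMPLE = """\
      ID      GID NAME             NWLD   %USED    %RUN    %SYS   %WAIT
2097277        1 CpuSchedRealloc     1    0.07    0.06    0.00   98.32
2097305        1 DCFlushCaches       1    0.00    0.00    0.00   98.38
2097306        1 OCFlush             1   50.76   36.98    0.00   61.40
2097358        1 VSCSIPoll           1    0.06    0.06    0.00   98.31
"""


def parse_rows(text):
    """Return (name, used) pairs; assumes %USED is the fifth column."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, blank lines and "[...]" markers: data rows start with a numeric ID.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        try:
            used = float(fields[4])
        except ValueError:
            continue
        rows.append((fields[2], used))
    return rows


def top_by_used(text, limit=5):
    """Sort the parsed rows by %USED, highest first."""
    return sorted(parse_rows(text), key=lambda row: row[1], reverse=True)[:limit]


for name, used in top_by_used(SAMPLE):
    print(f"{name:<16} {used:6.2f}")
```

With the sample rows, OCFlush comes out on top, which is what the interactive sort should have shown.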

So I have two questions:

- Why can't we sort when inside a world in esxtop?

- What is OCFlush, and why would it take so much CPU while no VMs are running?

Why does it matter? This is actually a home lab, and ESX is running on a spare MacBook Pro. The bottom line is that the fans are making too much noise and I want to fix this!

ESX 6.7.0 (Build 8169922)

12 Replies
yannbizeul
Enthusiast

I just found a similar issue, with an idea of the cause (though not the solution):

https://www.reddit.com/r/vmware/comments/9q73fk/esxi_system_pid_1_ocflush_cpu_usage_after_49days/

Apparently OCFlush goes rogue after 2^32 milliseconds (about 49.7 days). My machine has 61 days of uptime.
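The 49-day figure is easy to check. The sketch below only does the arithmetic. It assumes the trigger is a 32-bit millisecond counter wrapping around, which the linked post suggests but nothing in this thread confirms, and the helper names are hypothetical.

```python
from datetime import timedelta

# Assumption: an unsigned 32-bit counter of milliseconds wraps after 2**32 ticks.
WRAP_MS = 2 ** 32


def wrap_period() -> timedelta:
    """Time until a 32-bit millisecond counter overflows."""
    return timedelta(milliseconds=WRAP_MS)


def has_wrapped(uptime_days: float) -> bool:
    """True if the given uptime is longer than the wrap period."""
    return timedelta(days=uptime_days) > wrap_period()


period = wrap_period()
print(period)                                     # days plus hours:minutes:seconds
print(round(period.total_seconds() / 86400, 2))   # the same period in days
print(has_wrapped(61))                            # the 61-day uptime from the first post
```

The period works out to a little under 49.75 days, so a host with 61 days of uptime is already past it.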

prdtn
Contributor

Did you find a solution?

mgittelman
Contributor

No solution? I have this on 3 different hardware platforms, all running different versions of 6.7.

Any idea what OCFlush is used for?

prdtn
Contributor

Not yet. Maybe it's an issue with 6.7 only?

mgittelman
Contributor

I have support with VMware on 2 servers experiencing the issue.  After sending them logs I got the following:

I checked the logs and found that both your servers are not running on the recommended custom image. There is no known bug reported about the CPU thread usage on 6.7.

Request you to update the servers using their respective custom images, to have the customized driver and agents installed on the server.

Please install the latest build available for 6.7

While I guess I made the mistake of not installing custom builds from Dell and HP, it does seem odd to have this exact problem on 3 completely different hardware platforms (if you include mine at home).

It seems like they are not aware of a problem and not intending to fix it.

k72kostas
Contributor

Hello Yanni,

Perhaps you have to upgrade to U2 first.

Can you please tell us which hardware you are using? It seems like a hardware issue.

Konstantinos Kaminaris
yannbizeul
Enthusiast

Thanks @k72kostas,

That's on totally unsupported hardware, an Intel NUC system (home lab), so I'm not complaining too much, but other people seem to have the same issue on enterprise gear.

I just applied U2, will report in 49 days 🙂

prdtn
Contributor

I got this working with 6.5.0 Update 1 (Build 5969303) only! Because in my case it's a home lab issue too, this is a sufficient workaround for me.

mgittelman
Contributor

Both my hosts just hit 50 days and are not experiencing the issue. They are on 6.7, build 13006603.

Can anyone else confirm? I remember playing around with a couple of potential settings, but now I can't remember what they were. I'm hoping it was the version I installed 50 days ago that made the difference.

ivanerben
Enthusiast

Hi all,

I'm investigating again why my home lab's CPU temperature and fan speed suddenly went up, without any significant change in virtual machine CPU load.

pastedImage_0.png

Expanding the system process in esxtop showed the OCFlush process, which is how I found this thread.

pastedImage_1.png

System uptime is 75 days. The change occurred on Jun 2, which corresponds to 51 days of uptime.

[root@esxi:~] vmware -vl

VMware ESXi 6.7.0 build-10302608

VMware ESXi 6.7.0 Update 1

Well, let's try an update and then wait 51 days... but I would like to see a solution, or an explanation of what OCFlush is.

ivanerben
Enthusiast

According to the post "High CPU usage by system process after 49 days of uptime (OCFlush)", this may be fixed in 6.7 U2. But I'm unable to upgrade due to a 'No space left on device' error, which makes no sense. It seems I'm not the only one; see "Re: Vmware Esxi Update No Space Left On Device Error".

someone-au
Contributor

Did you get a resolution on this? I have the same issue on my Dell R740 with the Dell image of ESXi 7.
