Can anyone explain what this process is and why it is consuming so much CPU? It's running like this on two of the three nodes in my vSAN cluster, which have no VMs running on the vSAN datastore at the moment.
This is from the Essential vSAN Guide: Link
Maybe you can run the command mentioned at the end of the paragraph to view the logs. Thank you, Zach.
Maybe I'm entering the command wrong but this is what I'm getting when I try to enter the command:
[root@localhost:/vmfs/volumes/b9e102de-807c1230/esxi01/vsantraces] zcat vsanlog.gz | /usr/lib/vmware/vsan/bin/vsanTraceReader.py > vsanlog.txt
zcat: vsanlog.gz: No such file or directory
No wallclock entry detected. There might be a mismatch between the reader version and the tracefile.
zcat vsanlog.gz | /usr/lib/vmware/vsan/bin/vsanTraceReader.py > vsanlog.txt
I think you have the file name incorrect. When I do an ls of /vsantraces, I get something like:
vsantraces--2016-05-01T03h04m32s385.gz
vsantraces--2016-05-01T09h52m55s956.gz
vsantraces--2016-05-01T17h58m30s717.gz
vsantraces--2016-05-02T01h33m18s481.gz
vsantraces--2016-05-02T10h11m50s871.gz
vsantraces--2016-05-02T19h33m12s753.gz
vsantraces--2016-05-03T05h14m30s680.gz
vsantraces--2016-05-03T14h10m49s522.gz
vsantraces.index
vsantracesUrgent--2016-03-14T22h30m35s502.gz
vsantracesUrgent--2016-03-28T15h34m12s728.gz
vsantracesUrgent--2016-04-12T12h11m17s552.gz
vsantracesUrgent--2016-04-28T19h09m36s639.gz
vsantracesUrgent.index
So my command was: zcat vsantraces--2016-05-03T14h10m49s522.gz | /usr/lib/vmware/vsan/bin/vsanTraceReader.py > test.txt
It took a while, and the file is ~366 MB. Here is a sample:
2016-05-03T18:15:57.762856 [197240107] [cpu26] [] DOMTraceVsanServerDispatchStats:2190: {'index': 2, 'avgQdepth': 1, 'maxExecutionTimeUs': 44, 'maxLatencyUs': 68, 'numActivations': 313, 'avgLatencyUs': 14, 'numOpInstantiations': 108, 'avgExecutionTimeUs': 16, 'role': 'DOM_ROLE_OWNER', 'maxQdepth': 12}
2016-05-03T18:15:57.762856 [197240108] [cpu26] [] DOMTraceDomServerDispatchStats:2147: {'statsType': 'write', 'index': 2, 'maxExecutionTimeUs': 44, 'maxLatencyUs': 68, 'numActivations': 207, 'avgLatencyUs': 28, 'role': 'DOM_ROLE_OWNER', 'avgExecutionTimeUs': 11}
2016-05-03T18:15:57.762857 [197240109] [cpu26] [] DOMTraceDomServerDispatchStats:2147: {'statsType': 'read', 'index': 2, 'maxExecutionTimeUs': 0, 'maxLatencyUs': 0, 'numActivations': 24, 'avgLatencyUs': 0, 'role': 'DOM_ROLE_OWNER', 'avgExecutionTimeUs': 0}
2016-05-03T18:15:57.762857 [197240110] [cpu26] [] DOMTraceDomServerDispatchStats:2147: {'statsType': 'gwe', 'index': 2, 'maxExecutionTimeUs': 0, 'maxLatencyUs': 0, 'numActivations': 0, 'avgLatencyUs': 0, 'role': 'DOM_ROLE_OWNER', 'avgExecutionTimeUs': 0}
2016-05-03T18:15:57.762857 [197240111] [cpu26] [] DOMTraceDomServerDispatchStats:2147: {'statsType': 'other obj op', 'index': 2, 'maxExecutionTimeUs': 0, 'maxLatencyUs': 0, 'numActivations': 63, 'avgLatencyUs': 0, 'role': 'DOM_ROLE_OWNER', 'avgExecutionTimeUs': 0}
2016-05-03T18:15:57.762857 [197240112] [cpu26] [] DOMTraceDomServerDispatchStats:2147: {'statsType': 'cmmds', 'index': 2, 'maxExecutionTimeUs': 0, 'maxLatencyUs': 0, 'numActivations': 0, 'avgLatencyUs': 0, 'role': 'DOM_ROLE_OWNER', 'avgExecutionTimeUs': 0}
2016-05-03T18:15:57.762858 [197240113] [cpu26] [] DOMTraceDomServerDispatchStats:2147: {'statsType': 'resync', 'index': 2, 'maxExecutionTimeUs': 0, 'maxLatencyUs': 0, 'numActivations': 0, 'avgLatencyUs': 0, 'role': 'DOM_ROLE_OWNER', 'avgExecutionTimeUs': 0}
2016-05-03T18:15:57.762858 [197240114] [cpu26] [] DOMTraceDomServerDispatchStats:2147: {'statsType': 'other base op', 'index': 2, 'maxExecutionTimeUs': 0, 'maxLatencyUs': 0, 'numActivations': 19, 'avgLatencyUs': 0, 'role': 'DOM_ROLE_OWNER', 'avgExecutionTimeUs': 0}
2016-05-03T18:15:57.762914 [197240115] [cpu26] [] DOMTraceVsanServerDispatchStats:2190: {'index': 1, 'avgQdepth': 1, 'maxExecutionTimeUs': 26, 'maxLatencyUs': 23, 'numActivations': 241, 'avgLatencyUs': 4, 'numOpInstantiations': 72, 'avgExecutionTimeUs': 5, 'role': 'DOM_ROLE_OWNER', 'maxQdepth': 6}
2016-05-03T18:15:57.762914 [197240116] [cpu26] [] DOMTraceDomServerDispatchStats:2147: {'statsType': 'write', 'index': 1, 'maxExecutionTimeUs': 26, 'maxLatencyUs': 23, 'numActivations': 114, 'avgLatencyUs': 7, 'role': 'DOM_ROLE_OWNER', 'avgExecutionTimeUs': 4}
2016-05-03T18:15:57.762914 [197240117] [cpu26] [] DOMTraceDomServerDispatchStats:2147: {'statsType': 'read', 'index': 1, 'maxExecutionTimeUs': 0, 'maxLatencyUs': 0, 'numActivations': 18, 'avgLatencyUs': 0, 'role': 'DOM_ROLE_OWNER', 'avgExecutionTimeUs': 0}
2016-05-03T18:15:57.762914 [197240118] [cpu26] [] DOMTraceDomServerDispatchStats:2147: {'statsType': 'gwe', 'index': 1, 'maxExecutionTimeUs': 0, 'maxLatencyUs': 0, 'numActivations': 0, 'avgLatencyUs': 0, 'role': 'DOM_ROLE_OWNER', 'avgExecutionTimeUs': 0}
Thank you, Zach.
Nothing jumps out at me, other than it seems to be logging every read/write. That might be normal. It doesn't look like you can set a logging level for vSAN traces. There are these two commands:
vsan trace get
vsan trace set --reset
I don't think they help, though; they only set where the traces are stored, how often to roll them, etc.
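One thing that might make that 366 MB dump easier to work with: the payload at the end of each decoded line is a Python dict literal, so something like this rough sketch could flag only the slow dispatch-stats entries. This is keyed to the sample lines you posted (the field names like avgLatencyUs are just what the reader happened to emit, not a documented format), so treat it as a starting point:

```python
import ast

def high_latency(lines, threshold_us=20):
    """Return the dispatch-stats payloads whose avgLatencyUs exceeds threshold_us."""
    hits = []
    for line in lines:
        if 'DispatchStats' not in line or '{' not in line:
            continue
        # Everything from the first '{' onward is a dict literal;
        # ast.literal_eval parses it without executing any code.
        payload = ast.literal_eval(line[line.index('{'):])
        if payload.get('avgLatencyUs', 0) > threshold_us:
            hits.append(payload)
    return hits

# Usage, e.g.:
#   with open('test.txt') as f:
#       for entry in high_latency(f):
#           print(entry)
```

At a 20 µs threshold, only the 'write' entry with avgLatencyUs 28 from your sample would be flagged.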
I think you're at the end of my help. Thank you, Zach.
Is there anywhere else you could recommend I pose this question to? I only have a VMUG Advantage license so I don't get support with that.
Sorry, not really. Hopefully Duncan or Cormac will comment.
Believe it or not, running "esxcli vsan trace set --reset true" fixed my issue. I too was seeing most of my CORE UTIL % eaten up on my 3 NUC lab hosts, and after running that command and re-checking esxtop, vsanTraceReader disappeared from the top CPU consumers. My consumed CPU in the H5 client went from a 61-76% average to ~16% for each of my hosts.