VMware Cloud Community
DerekFowler
Contributor

esxtop

Hi,

Apologies if this is not the correct place; hopefully it is. (vSphere 6.7)

I am trying to extract data from esxtop. I have a PS script that hauls all of the data to CSV files, then a bash script that filters out the stuff we don't need (both inherited from someone no longer with the company). I have no idea about bash, so I was hoping for some pointers.

I need to add further filters so it can show more details. I would like to add NUMA N%L and disk latency (KAVG), plus a few others.

This is what I have... again, if this is totally the wrong place, apologies.

esxoutputdirectoty="esxtop-gatherer-results"
outputcsv="output.csv"

if [ ! -f "/tmp/filter-output.csv" ];then
echo "vcpuname, %ready result, status" > "${outputcsv}"
else
echo "{outputcsv} already exists"

fi

for FILE in "${esxitopdirectory"/*;do

collistcount=$(head -1 $FILE | tr ',''\n'|nl| sed -e 's/^[[:space:]]*//'|sed -r 's/\s/,/'|grep "Ready"| grep "vcpu"|wc -1)
if [ "$collistcount" !="0" ];then
collist=$(head -1 $FILE | tr '\n'|nl| sed -e 's'/^[[:space:]]*//'|sed -r 's/\s+/,/'|grep "Ready"|grep "vcpu")
colnumber=$(echo -en "$collist"|awk -F "," 'print $1')

echo "$(collist)"
echo "==========================================="
while IFS=read -r line
do
echo ""
echo "Line: $line"
vcpuname=$(cat $FILE | awk -F "," -v var=$line 'BEGIN{ans=var} {print $ans}' | head -1)
echo "$(vcpuname)"
echo "===================================="
readyresult=$(cat $FILE | awk -F "," -v var=$line 'BEGIN{ans=var} {print $ans}' | tail -n +2 | tr -d '"'|sort -r |head -1)
readyresultin=$(echo "${readyresult}" | tr -dc '0-9')
if [ "${readyresultin}" -ge "499" ];then
echo "$readyresult FAIL"
echo "${vcpuname],$}readyresult},FAIL" >> "${outputcsv}"
else
echo "${readyresult} PASS"
echo "${vcpuname},$[readyresult},PASS" >> "${outputcsv}"
fi
echo "==========================================="
done <<<"${colnumber}"
else
echo "skipping $FILE no virtual machines found"
fi
done


Accepted Solutions
vbondzio
VMware Employee

You just have to look at the column headers, then match using (simple) regular expressions for the egrep.

Let's dissect it:

(vbondzio)-(jobs:0)-(~/batch_script_tmp)
(! 993)-> esxtopBatchFile=foo.csv

(vbondzio)-(jobs:0)-(~/batch_script_tmp)
(! 994)-> head -1 ${esxtopBatchFile} | tr ',' '\n' | nl | sed -e 's/^[[:space:]]*//' | sed -r 's/\s+/,/;' | wc -l
40886


There are 40886 possible lines (one per column), a bit much to read through, so you might want to "grep -i" for the stuff that you care about. Note that some of the counters have different names in esxtop batch compared to live esxtop. Let's understand that line though:

head -1 will return the first line of the csv, i.e. the column headers, all other lines (rows) are values
tr will "translate" commas to newlines, i.e. making rows out of the columns
nl will add a line number count in front of the line
the first sed will remove the leading white space before the line numbers
the second sed will change the first white space after the numbers to a single comma
etc.
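
The steps above can be wrapped in a small helper so the header lookup is reusable. This is only a sketch: the function name list_columns is made up for illustration, and it assumes that no column header contains a comma inside its quotes.

```shell
#!/usr/bin/env bash
# list_columns FILE [PATTERN]   (hypothetical helper, not part of esxtop)
# Print "number,header" for every column of an esxtop batch CSV.
# If PATTERN is given, keep only headers matching it (case-insensitive).
# Assumption: no header contains a comma inside its quotes.
list_columns() {
    local file=$1 pattern=${2:-}
    head -1 "$file" \
        | tr ',' '\n' \
        | nl \
        | sed -e 's/^[[:space:]]*//' \
        | sed -r 's/\s+/,/' \
        | grep -i -- "$pattern"
}

# Example call (foo.csv is the batch file used above):
#   list_columns foo.csv "virtual disk" | head
```

An empty pattern matches every line, so calling it without a second argument lists all columns.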

You can discover all of that by just manually stepping through the script (use a head at the end to only look at 10 rows instead of thousands):

(vbondzio)-(jobs:0)-(~/batch_script_tmp)
(! 995)-> head -1 ${esxtopBatchFile} | tr ',' '\n' | nl | sed -e 's/^[[:space:]]*//' | sed -r 's/\s+/,/;' | grep "Virtual Disk" | head
32268,"\\hostname.domain.tld\Virtual Disk(VMNAME)\FailedIOs"
32269,"\\hostname.domain.tld\Virtual Disk(VMNAME)\TotalIOs"
32270,"\\hostname.domain.tld\Virtual Disk(VMNAME)\Latency"
32271,"\\hostname.domain.tld\Virtual Disk(VMNAME)\Commands/sec"
32272,"\\hostname.domain.tld\Virtual Disk(VMNAME)\Reads/sec"
32273,"\\hostname.domain.tld\Virtual Disk(VMNAME)\Writes/sec"
32274,"\\hostname.domain.tld\Virtual Disk(VMNAME)\MBytes Read/sec"
32275,"\\hostname.domain.tld\Virtual Disk(VMNAME)\MBytes Written/sec"
32276,"\\hostname.domain.tld\Virtual Disk(VMNAME)\Average MilliSec/Read"
32277,"\\hostname.domain.tld\Virtual Disk(VMNAME)\Average MilliSec/Write"

(vbondzio)-(jobs:0)-(~/batch_script_tmp)
(! 996)-> head -1 ${esxtopBatchFile} | tr ',' '\n' | nl | head # sed -e 's/^[[:space:]]*//' | sed -r 's/\s+/,/;' | grep "Virtual Disk"
     1  "(PDH-CSV 4.0) (UTC)(0)"
     2  "\\hostname.domain.tld\Memory\Memory Overcommit (1 Minute Avg)"
     3  "\\hostname.domain.tld\Memory\Memory Overcommit (5 Minute Avg)"
     4  "\\hostname.domain.tld\Memory\Memory Overcommit (15 Minute Avg)"
     5  "\\hostname.domain.tld\Physical Cpu Load\Cpu Load (1 Minute Avg)"
     6  "\\hostname.domain.tld\Physical Cpu Load\Cpu Load (5 Minute Avg)"
     7  "\\hostname.domain.tld\Physical Cpu Load\Cpu Load (15 Minute Avg)"
     8  "\\hostname.domain.tld\Physical Cpu(0)\% Processor Time"
     9  "\\hostname.domain.tld\Physical Cpu(1)\% Processor Time"
    10  "\\hostname.domain.tld\Physical Cpu(2)\% Processor Time"

(vbondzio)-(jobs:0)-(~/batch_script_tmp)
(! 997)-> head -1 ${esxtopBatchFile} | tr ',' '\n' | nl | sed -e 's/^[[:space:]]*//' | head # sed -r 's/\s+/,/;' | grep "Virtual Disk"
1       "(PDH-CSV 4.0) (UTC)(0)"
2       "\\hostname.domain.tld\Memory\Memory Overcommit (1 Minute Avg)"
3       "\\hostname.domain.tld\Memory\Memory Overcommit (5 Minute Avg)"
4       "\\hostname.domain.tld\Memory\Memory Overcommit (15 Minute Avg)"
5       "\\hostname.domain.tld\Physical Cpu Load\Cpu Load (1 Minute Avg)"
6       "\\hostname.domain.tld\Physical Cpu Load\Cpu Load (5 Minute Avg)"
7       "\\hostname.domain.tld\Physical Cpu Load\Cpu Load (15 Minute Avg)"
8       "\\hostname.domain.tld\Physical Cpu(0)\% Processor Time"
9       "\\hostname.domain.tld\Physical Cpu(1)\% Processor Time"
10      "\\hostname.domain.tld\Physical Cpu(2)\% Processor Time"

(vbondzio)-(jobs:0)-(~/batch_script_tmp)
(! 998)-> head -1 ${esxtopBatchFile} | tr ',' '\n' | nl | sed -e 's/^[[:space:]]*//' | sed -r 's/\s+/,/;' | head # grep "Virtual Disk"
1,"(PDH-CSV 4.0) (UTC)(0)"
2,"\\hostname.domain.tld\Memory\Memory Overcommit (1 Minute Avg)"
3,"\\hostname.domain.tld\Memory\Memory Overcommit (5 Minute Avg)"
4,"\\hostname.domain.tld\Memory\Memory Overcommit (15 Minute Avg)"
5,"\\hostname.domain.tld\Physical Cpu Load\Cpu Load (1 Minute Avg)"
6,"\\hostname.domain.tld\Physical Cpu Load\Cpu Load (5 Minute Avg)"
7,"\\hostname.domain.tld\Physical Cpu Load\Cpu Load (15 Minute Avg)"
8,"\\hostname.domain.tld\Physical Cpu(0)\% Processor Time"
9,"\\hostname.domain.tld\Physical Cpu(1)\% Processor Time"
10,"\\hostname.domain.tld\Physical Cpu(2)\% Processor Time"
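
Once a column number is known, the values in that column can be pulled out of the data rows. The sketch below prints the numeric maximum of one column. The function name column_max is hypothetical, and it assumes fields contain no embedded commas. Note that it compares values as numbers in awk, whereas the posted script used "sort -r | head -1", which sorts as text and would rank "9.50" above "10.20".

```shell
#!/usr/bin/env bash
# column_max FILE COLUMN   (hypothetical helper)
# Print the largest numeric value in column COLUMN of a CSV,
# skipping the header row. Prints nothing if there are no data rows.
# Assumption: fields contain no embedded commas.
column_max() {
    local file=$1 col=$2
    awk -F ',' -v col="$col" '
        NR == 1 { next }                  # row 1 holds the column headers
        {
            v = $col
            gsub(/"/, "", v)              # values are quoted in the batch file
            if (v == "") next             # ignore empty cells
            if (!seen || v + 0 > max) { max = v + 0; seen = 1 }
        }
        END { if (seen) print max }
    ' "$file"
}

# Example call, using a column number found with the pipeline above:
#   column_max foo.csv 32270
```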


Ok, I think we have everything covered and can mark this topic as solved 🙂 I hope you can replace that script with a better solution soon!

Cheers,

Valentin


P.S.
As far as donations go, preferably something that is matched, e.g. if your employer has a policy that matches up to a certain yearly amount and a tool to make that easy (e.g. Bright Funds), use that. Any cause will do: animal welfare, climate, "anything to make the world a better place" really.

If you want to optimize for impact, check whether the orgs under https://www.givewell.org/charities/top-charities have UK arms.

Or if that is too much work, I'm pretty sure MÉDECINS SANS FRONTIÈRES is registered in pretty much any country 🙂

vbondzio
VMware Employee

OK, before we look at that hacky "parser", let me ask you a few questions first: why esxtop? Why not use PowerCLI? Ready and KAVG can just as easily be retrieved via "Get-Stat". Whether you really need N%L is a whole other discussion, but that could also be done via "Get-Esxtop".

DerekFowler
Contributor

Hi, 

Thank you for the reply... for varying reasons I am at the moment restricted to using esxtop.  

I was using N%L mainly as an example, but I would be interested to know why you question whether or not it is required.

I don't think I mentioned that I already have a full esxtop output in CSV (I have a PS script that logs onto each host, then saves the esxtop data to a CSV). The bash script interrogates the CSV files and just pulls out the relevant data. Then we run perfmon on the output of the bash script.

Does that make sense?

Thank you
vbondzio
VMware Employee

The requirement for an esxtop batch / csv file is clear from the script, which seemingly was either OCRed, typed off a screenshot or vandalized, because it definitely doesn't run the way it is pasted in your description. Just looking at the first couple of lines: the first variable has a typo, I'm pretty sure the for loop would not be happy with the content of that variable nor the syntax, wc doesn't take 1 as an argument, tr in the first branch needs two arguments and there is an extra single quote that doesn't belong there.
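
For reference, one reading of the posted script with those transcription problems corrected could look like the sketch below. It is not the fixed version linked later in the thread, and several things are assumptions: the function name check_ready, passing the directory and output file as arguments, the use of the output file itself for the "already exists" check, and the 4.99 default threshold (the posted script strips the decimal point and compares against 499). It also swaps the text sort for a numeric maximum in awk, and assumes no header or value contains an embedded comma.

```shell
#!/usr/bin/env bash
# check_ready DIR OUTPUTCSV [THRESHOLD]   (hypothetical wrapper around the posted flow)
# For every file in DIR, find the vCPU "Ready" columns, take the highest
# value seen in each, and append "name,value,PASS|FAIL" to OUTPUTCSV.
check_ready() {
    local esxtopdirectory=$1 outputcsv=$2 threshold=${3:-4.99}
    local file collist colnumber vcpuname readyresult status

    if [ ! -f "$outputcsv" ]; then
        echo "vcpuname,%ready result,status" > "$outputcsv"
    else
        echo "${outputcsv} already exists"
    fi

    for file in "$esxtopdirectory"/*; do
        [ -f "$file" ] || continue
        # Number the header columns and keep only the vCPU ready ones.
        collist=$(head -1 "$file" | tr ',' '\n' | nl \
            | sed -e 's/^[[:space:]]*//' | sed -r 's/\s+/,/' \
            | grep "Ready" | grep "vcpu")
        if [ -z "$collist" ]; then
            echo "skipping $file, no virtual machines found"
            continue
        fi
        # Each line of collist is "columnnumber,header".
        printf '%s\n' "$collist" | while IFS=, read -r colnumber vcpuname; do
            # Numeric maximum of that column, header row skipped.
            readyresult=$(awk -F ',' -v col="$colnumber" '
                NR > 1 { v = $col; gsub(/"/, "", v); if (v + 0 > max) max = v + 0 }
                END { print max + 0 }' "$file")
            # Floating point comparison done in awk, not with [ -ge ].
            if awk -v r="$readyresult" -v t="$threshold" 'BEGIN { exit !(r >= t) }'; then
                status=FAIL
            else
                status=PASS
            fi
            echo "${vcpuname},${readyresult},${status}" >> "$outputcsv"
        done
    done
}

# Example call, using the names from the original post:
#   check_ready esxtop-gatherer-results output.csv
```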

Hence why I don't think it is a good idea to extend something that would have to be fixed first, let alone made more extendable. If you already have a PS script that gets the batch file, why not do the analysis in something that you can understand and actually extend? I'm assuming this is run in WSL, or PS is run on a Linux box? I mean, I don't care for what reasons, but this is a pretty odd "pipeline".

While this is the right forum to ask questions about esxtop, batch output, interpretation etc., I don't think it is the right forum for combing through a (non-functioning) bash script. I'm sure it would be easy enough to fix, but is it worth it?

With regards to N%L, you'd need a lot more context to decide whether e.g. some clients being intermittently remote is an issue or not. Alerting on a snapshot in time of a value, like this script is trying to do, just doesn't have much value. NUMA migrations are as normal as vMotions done by DRS.

DerekFowler
Contributor

Thank you for the reply.

Because of the environment I work in, I am very limited as to what I can use, and esxtop is the tool I need to use.

The bash script does run on a Linux machine. It actually does work and gathers the required info, which at the moment is only vCPU. I was just hoping to add in a couple more counters.

The PS script that gathers the data ends up with a CSV file that is too large to open; that's one of the reasons for the bash script.

And yes, there probably are typos. I had to type out the script into this window (can't copy/paste), which is probably why it doesn't look right.

I am more just looking for pointers on what maybe needs to be added so I can get the relevant stats out, rather than just a massive CSV that's hard to read 🙂

Many Thanks

Derek
vbondzio
VMware Employee

I found this on my "maybe" pile this Sunday and spent half an hour fixing the script (https://pastebin.com/1WWFVeu0). I'm not sure where the "off by one column" bug could have been introduced by a typo; are you sure that the "vcpuname" variable in your working output contained "Ready" and not "Idle"?

Anyhow, to answer your question, you'd have to change half the script since it is written just for the "vcpu ready" check and is not really extendable to other metrics. You could "copy" most of the script for different counters, which should be possible with any scripting experience. Expanding it for an arbitrary metric / pass / fail is probably not feasible if you don't have at least beginner level shell / bash skills.

There are probably some other existing projects in a language that is easier to work with for you. E.g. just searching for "esxtop csv parser" came up with this: https://github.com/cesirx/vSphere-esxtop-parser/blob/master/esxtop_parser.py

Let me write you one example for DAVG and that should give you an idea of the greps / matches to replace and the fail / pass check value formatting.

DerekFowler
Contributor

Hi vbondzio,

Thank you again. The column reads %ready result.

I sort of guessed that the script would need a pile of stuff added 🙂 and my scripting experience is very limited.

That py script looks OK at first glance. (Your pastebin link gives me a redirect loop, which is down to my work... I got it another way.)

If you could do me an example, that would be great 😄

Del
vbondzio
VMware Employee

Hi Derek,

https://pastebin.com/WQPEtLsp is better but still bad, lipstick on a pig.

I didn't change too much from the actual flow but made it somewhat extendable and hopefully easier to read. Just call it with a defined "metric" and it will produce the respective output csv for it.

Let me know if something isn't clear.

Cheers,

Valentin

P.S.
While I was just doing this on the side, I did spend a bit over an hour of my free time. If this is useful, I'd appreciate it if you could donate a couple of bucks to your favorite charity.

DerekFowler
Contributor

Hi Valentin,

Thank you for that... I will certainly donate to a charity... Thank you for all your help.

I've just tried running it and I'm getting an error (looking to see if I can work it out):
line 2: '$'\r: command not found

line 3: syntax error near unexpected token '$'in \r ''

line 3: case $arg1

Is that the output file?

Del

Derek
DerekFowler
Contributor

Fixed it, had to run

sed -i 's/\r//' esk.sh

Sorry, last Q... I have multiple esxtop.csv files, will this loop through them all?

Answering my own Q: it looks like it does.
vbondzio
VMware Employee

It does but there was a tiny bug because I didn't test it with multiple files, works now: https://pastebin.com/raw/b4y9yBRs

It looks for all csv files in the local directory and expects a local "output" folder. It doesn't check for duplicates or anything so re-running against the same set of files will just append the same content again. Make sure to delete everything in the output folder before doing a run with new batch files!

Let me say it one last time, this is a very coarse tool, it basically checks the maximum value at any point over the duration of the esxtop batch and alerts on an arbitrary threshold. I.e. a vCPU running with 1% ready all the time but blip'ing to 5% once will alert, a vCPU at 4.5% at all times will not. You might want to increase the sample length of the esxtop batch collection to ~ a minute to flatten minor intermittent spikes.
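
To see how much the maximum hides, one option is to count how many samples exceed the threshold rather than only checking the peak. This is a sketch of that idea, not something the linked script does. The function name samples_over is hypothetical, and it assumes fields contain no embedded commas.

```shell
#!/usr/bin/env bash
# samples_over FILE COLUMN LIMIT   (hypothetical helper)
# Print "OVER TOTAL": the number of samples in COLUMN that are greater
# than LIMIT, followed by the total number of samples (header skipped).
samples_over() {
    awk -F ',' -v col="$2" -v limit="$3" '
        NR == 1 { next }                  # skip the header row
        {
            v = $col
            gsub(/"/, "", v)              # strip the quotes around the value
            total++
            if (v + 0 > limit + 0) over++
        }
        END { print over + 0, total + 0 }
    ' "$1"
}

# Example call: how many samples of column 12 were above 4.99?
#   samples_over foo.csv 12 4.99
```

A single blip then shows up as one sample out of many, which is easier to judge than a bare FAIL.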

Also, don't copy / paste through any Windows tools that add carriage returns 🙂
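
A quick way to check a script for carriage returns before running it, and to strip them if needed, might look like this. Both function names are made up for illustration. The strip step uses tr rather than the sed command quoted earlier in the thread, and it removes every carriage return in the file, not only those at line ends.

```shell
#!/usr/bin/env bash
# has_crlf FILE: succeed if FILE contains at least one carriage return.
has_crlf() {
    [ "$(tr -cd '\r' < "$1" | wc -c)" -gt 0 ]
}

# strip_crlf FILE: print FILE to stdout with all carriage returns removed.
strip_crlf() {
    tr -d '\r' < "$1"
}

# Example (script.sh is a placeholder name):
#   if has_crlf script.sh; then strip_crlf script.sh > script.unix.sh; fi
```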

DerekFowler
Contributor

Hi,

Thank you again...

I do have multiple time samples; my esxtop runs at a couple of different time samples.

I was getting an error, "line 58 integer expression expected". I haven't quite figured out yet what that means 🙂

Not C&P in Windows 🙂

Del
vbondzio
VMware Employee

Are you using the updated version from my last post? Because that is what I fixed (removed quotes so that the files are passed one by one and not in one big string). Open the script with e.g. "less -N" and then check what is on line 58. In this case, the expression is expecting an integer, which it isn't getting because the data fed to it is wrong due to the aforementioned bug.
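
"integer expression expected" is what the shell's test command reports when an operand of -ge (or -gt, -lt, ...) is not an integer. A guard in front of the comparison turns that into something easier to spot. The sketch below is an illustration only; the function names and the SKIP result are assumptions, not part of the linked script.

```shell
#!/usr/bin/env bash
# is_integer VALUE: succeed only if VALUE is a non-empty string of digits.
is_integer() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# check_threshold VALUE LIMIT: print FAIL if VALUE >= LIMIT, PASS if not,
# and SKIP when VALUE is not an integer, instead of letting [ ] error out.
check_threshold() {
    if ! is_integer "$1"; then
        echo "SKIP"
        return 0
    fi
    if [ "$1" -ge "$2" ]; then
        echo "FAIL"
    else
        echo "PASS"
    fi
}

# Example: check_threshold "$readyresultin" 499
```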

DerekFowler
Contributor

Hi Valentin,

I was running old script with that error, now running new script & it seems to be running great... thank you very much...

I guess I just need to change the ready/vdiskread/vdiskwrite to the ones I want, or just add extra lines below using the same format. Is there a list of bits I can add?

Once again, thank you for all your help... Is there a charity you would prefer? I'll get some money to them. I'm UK based, so if they have a UK arm, that would be great, avoiding exchange rates 🙂

Derek
DerekFowler
Contributor

OK, I'll sort something 🙂

And once more, thank you for all your help.

It has helped a lot 🙂

Del