VMware Cloud Community
mazdajai
Contributor

Capture hung processes with Get-Stat

We have an environment (500+ VMs) where processes often hang. One indication we have observed is CPU utilization that stays flat over a period of time.

For example, if a VM shows a steady 40% CPU utilization over a whole day, it is safe to assume that a bad process is running, because an idle process should consume minimal CPU.

That being said, is it possible to query the VMs and return those with flat CPU utilization? I was thinking of using Get-Stat and looking for non-zero values, but that doesn't seem to work. Any thoughts?

LucD
Leadership

This will report all VMs whose average CPU usage over the last day is 40% or higher:

# Report VMs whose average CPU usage over the last day is at or above the threshold
$vms = Get-VM
$stat = 'cpu.usage.average'
$start = (Get-Date).AddDays(-1)
$threshold = 40

Get-Stat -Entity $vms -Stat $stat -Start $start |
Group-Object -Property EntityId | %{
  New-Object PSObject -Property @{
    VM = $_.Group[0].Entity.Name
    # Only samples with an empty Instance are averaged
    CPU = [math]::Round(($_.Group | Where {$_.Instance -eq ''} | Measure-Object -Property Value -Average | Select -ExpandProperty Average),1)
  }
} |
where {$_.CPU -ge $threshold} |
Select VM,CPU
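The grouping and averaging logic can be tried without a vCenter connection. The following is only a sketch: the sample objects, VM names, and values are made up, and they merely imitate the few Get-Stat properties the script relies on (entity, Instance, Value). It assumes, as the filter in the script suggests, that an empty Instance marks the VM-wide value.

```powershell
# Hypothetical samples shaped loosely like Get-Stat output.
# VM names and values are invented for illustration only.
$samples = @(
    [pscustomobject]@{ Entity = 'vm-a'; Instance = '';  Value = 45.0 }
    [pscustomobject]@{ Entity = 'vm-a'; Instance = '';  Value = 55.0 }
    [pscustomobject]@{ Entity = 'vm-a'; Instance = '0'; Value = 99.0 }  # non-empty instance, ignored
    [pscustomobject]@{ Entity = 'vm-b'; Instance = '';  Value = 2.0 }
    [pscustomobject]@{ Entity = 'vm-b'; Instance = '';  Value = 4.0 }
)
$threshold = 40

$report = $samples |
    Where-Object { $_.Instance -eq '' } |
    Group-Object -Property Entity |
    ForEach-Object {
        # One output object per VM with its rounded average
        [pscustomobject]@{
            VM  = $_.Name
            CPU = [math]::Round(($_.Group | Measure-Object -Property Value -Average).Average, 1)
        }
    } |
    Where-Object { $_.CPU -ge $threshold }

$report   # only vm-a (average 50) passes the threshold
```

Filtering on Instance before grouping gives the same averages as filtering inside each group, and it keeps the per-VM block shorter.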


Blog: lucd.info  Twitter: @LucD22  Co-author PowerCLI Reference

mazdajai
Contributor

Is it possible to capture utilization that stayed constant over a fixed duration?

Should be reported:

For example: 2pm = 20%, 3pm = 20%, 4pm = 20%, 5pm = 20%

Should not be reported:

For example: 2pm = 0%, 3pm = 40%, 4pm = 0%, 5pm = 40%

LucD
Leadership

Let me see if I get this right: you want hourly intervals, and then report only the VMs that have the same value for all the intervals?


LucD
Leadership

As a matter of fact, this is a very interesting question, and it touches on the world of statistics, one of my favorite subjects.

Since your cpu.usage.average values for each hour will hardly ever be exactly the same, you need a value that expresses how close together the values are.

In statistics, the standard deviation is mostly used for this.

In the following script I use a function based on the Welford algorithm to calculate it. Note that the function returns the sample variance, which is the square of the standard deviation.

The script uses a threshold of 5 on that value to determine whether the CPU percentages are "close" together, which indicates a rather constant CPU usage.

Play around with the threshold.
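For reference, the update steps in the function below correspond to the usual Welford recurrences. This is a sketch in standard notation, not taken from the post: x_n is the n-th sample, the running mean and M_2 both start at 0, and the last line is the sample variance that the function returns.

```latex
\begin{align*}
\delta_n   &= x_n - \bar{x}_{n-1} \\
\bar{x}_n  &= \bar{x}_{n-1} + \frac{\delta_n}{n} \\
M_{2,n}    &= M_{2,n-1} + \delta_n \,(x_n - \bar{x}_n) \\
s^2        &= \frac{M_{2,n}}{n-1} \qquad (n \ge 2)
\end{align*}
```

Taking the square root of s^2 gives the standard deviation, so a threshold of 5 on s^2 corresponds to roughly 2.2 percentage points of standard deviation.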

# Welford's online algorithm. Returns the sample variance
# (the square of the standard deviation), or 0 for fewer than 2 samples.
function Get-WelfordStdVar{
  # [double[]] instead of [int[]] keeps the fractional part of the CPU percentages
  param([double[]]$Samples)
  $n = $mean = $M2 = 0
  $Samples | %{
    $n++
    $delta = $_ - $mean
    $mean += ($delta/$n)
    $M2 += ($delta*($_ - $mean))
  }
  if($n -lt 2){0}
  else{
    $M2/($n-1)
  }
}

$vms = Get-VM
$stat = 'cpu.usage.average'
$start = (Get-Date).AddDays(-1)
$threshold = 5

Get-Stat -Entity $vms -Stat $stat -Start $start |
Group-Object -Property EntityId,{$_.Timestamp.Hour} | %{
  # One object per VM and hour of the day, holding the average CPU usage
  New-Object PSObject -Property @{
    VM = $_.Group[0].Entity.Name
    Hour = $_.Group[0].Timestamp.Hour
    CPU = [math]::Round(($_.Group | Where {$_.Instance -eq ''} | Measure-Object -Property Value -Average | Select -ExpandProperty Average),1)
  }
} |
Group-Object VM | %{
  # Report only the VMs whose hourly averages stay close together
  $values = $_.Group | Sort-Object -Property Hour | %{$_.CPU}
  $stdvar = Get-WelfordStdVar -Samples $values
  if($stdvar -le $threshold){
    $_ | Select Name,@{N='StdVar';E={[math]::Round($stdvar,2)}},@{N='CPU';E={[string]::Join('/',$values)}}
  }
}
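To see how the threshold separates the two patterns from the question earlier in the thread, here is a minimal, self-contained sketch. Get-SampleVariance is a hypothetical stand-in written for this example (same Welford update, double-typed input), and the two series are the 2pm to 5pm values from the question.

```powershell
# Hypothetical helper for this example: Welford's update, returning the sample variance.
function Get-SampleVariance {
    param([double[]]$Samples)
    $n = 0; $mean = 0.0; $m2 = 0.0
    foreach ($x in $Samples) {
        $n++
        $delta = $x - $mean
        $mean += $delta / $n
        $m2 += $delta * ($x - $mean)
    }
    if ($n -lt 2) { 0 } else { $m2 / ($n - 1) }
}

$flat  = 20, 20, 20, 20   # constant usage, should be reported
$spiky = 0, 40, 0, 40     # same average, but not constant
$threshold = 5

$flatVar  = Get-SampleVariance -Samples $flat
$spikyVar = Get-SampleVariance -Samples $spiky

"flat:  variance {0}, reported: {1}" -f [math]::Round($flatVar, 2),  ($flatVar  -le $threshold)
"spiky: variance {0}, reported: {1}" -f [math]::Round($spikyVar, 2), ($spikyVar -le $threshold)
```

Both series average 20%, so an average-only check cannot tell them apart. The variance is 0 for the flat series and about 533.33 for the spiky one, so only the flat series falls under the threshold.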



mazdajai
Contributor

"the same value for all the intervals" - Yes. This is a symptoms of hung process therefore we are looking to correlate it with vmware.

I know a bit of R but don't know anything about it in the powershell world. Will take a look.

mazdajai
Contributor

Very interesting to compute this using Welford's algorithm.

I have run this for a couple of days and found it extremely useful!

Can you suggest where I should put a filter to show only VMs whose $stat value is greater than 20, so that idle VMs are filtered out?


Name  StdVar  CPU
----  ------  ---
vm1   0.14    0.5/0.5/1.4/0.5/0.5/0.6/0.5/0.5/0.5/0.5/0.5/0.5/0.5/1.4/0.5/0.5/0.5/0.5/0.5/1.4/0.5/0.5/0.5/0.5


Thanks!

LucD
Leadership

Try it like this. It adds a condition that the average CPU usage must be above 20%. This block replaces the last Group-Object block of the script:

Group-Object VM | %{
  $values = $_.Group | Sort-Object -Property Hour | %{$_.CPU}
  # Average of the hourly values for this VM (note $_.Group, not $_Group)
  $avg = $_.Group | Measure-Object -Property CPU -Average | Select -ExpandProperty Average
  $stdvar = Get-WelfordStdVar -Samples $values
  if($avg -gt 20 -and $stdvar -le $threshold){
    $_ | Select Name,@{N='StdVar';E={[math]::Round($stdvar,2)}},@{N='CPU';E={[string]::Join('/',$values)}}
  }
}
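The combined condition can be checked offline as well. The sketch below is illustrative only: the three VM names and their hourly values are invented, Get-SampleVariance is a hypothetical helper using the same Welford update, and $minAverage stands in for the hard-coded 20.

```powershell
# Hypothetical helper for this example: Welford's update, returning the sample variance.
function Get-SampleVariance {
    param([double[]]$Samples)
    $n = 0; $mean = 0.0; $m2 = 0.0
    foreach ($x in $Samples) {
        $n++
        $delta = $x - $mean
        $mean += $delta / $n
        $m2 += $delta * ($x - $mean)
    }
    if ($n -lt 2) { 0 } else { $m2 / ($n - 1) }
}

# Invented hourly CPU averages for three made-up VMs
$hourly = @{
    'vm-idle'  = 0.5, 0.5, 1.4, 0.5      # flat but idle
    'vm-hung'  = 40.1, 39.8, 40.3, 40.0  # flat and busy
    'vm-burst' = 5.0, 80.0, 10.0, 60.0   # busy but not flat
}
$threshold  = 5    # upper bound on the variance
$minAverage = 20   # lower bound on the average, to skip idle VMs

$flagged = foreach ($name in ($hourly.Keys | Sort-Object)) {
    $values = $hourly[$name]
    $avg = ($values | Measure-Object -Average).Average
    $var = Get-SampleVariance -Samples $values
    if ($avg -gt $minAverage -and $var -le $threshold) {
        [pscustomobject]@{
            Name     = $name
            Average  = [math]::Round($avg, 2)
            Variance = [math]::Round($var, 2)
        }
    }
}

$flagged   # only vm-hung satisfies both conditions
```

The idle VM is dropped by the average check and the bursty VM by the variance check, which is the behavior asked for in the question above.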


mazdajai
Contributor

Works perfectly. Thanks a lot, LucD!
