Checking ESXi hardware for problems using PowerCLI

NucleusVM · ‎10-06-2015

Is there a way to check all the ESXi hosts in a vCenter for hardware issues?

Currently I have to go to each ESXi host and click on the "Hardware Status" tab to see if there are any errors.

It would be much faster if I could just run a script that outputs a report (HTML or CSV) and check that instead.

I currently have an ESXi server with a memory issue, so it's a good opportunity to test a script.

Thanks

LucD · ‎10-06-2015

Try something like this

foreach($esx in Get-VMHost){
    # Health status system of this host, retrieved through the vSphere API
    $hs = Get-View -Id $esx.ExtensionData.ConfigManager.HealthStatusSystem
    # Keep only the non-green sensors, skipping the roll-up entries
    $hs.Runtime.SystemHealthInfo.NumericSensorInfo |
        Where-Object {$_.HealthState.Label -ne 'Green' -and $_.Name -notmatch 'Rollup'} |
        Select-Object @{N='Host';E={$esx.Name}},Name,@{N='Health';E={$_.HealthState.Label}}
}
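To get the HTML or CSV report asked for above, the rows produced by the loop can be captured in a variable and written out with the standard Export-Csv and ConvertTo-Html cmdlets. This is only a sketch: Export-HealthReport is a hypothetical helper name, and it assumes the rows carry the Host, Name and Health properties selected in the script.

```powershell
# Hypothetical helper (not part of PowerCLI): writes report rows to a CSV
# file and an HTML file. Assumes each row has Host, Name and Health properties.
function Export-HealthReport {
    param(
        [object[]] $Rows = @(),
        [Parameter(Mandatory)] [string] $CsvPath,
        [Parameter(Mandatory)] [string] $HtmlPath
    )
    # CSV report, without the type information header line
    $Rows | Export-Csv -Path $CsvPath -NoTypeInformation
    # HTML report with one table row per sensor
    $Rows |
        ConvertTo-Html -Property Host, Name, Health -Title 'ESXi hardware health' |
        Set-Content -Path $HtmlPath
}

# Usage (needs a PowerCLI session connected to vCenter):
#   $report = foreach($esx in Get-VMHost){ <body of the script above> }
#   Export-HealthReport -Rows $report -CsvPath .\health.csv -HtmlPath .\health.html
```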

Blog: lucd.info Twitter: @LucD22 Co-author PowerCLI Reference

NucleusVM · ‎10-06-2015

Under the "Health" column the output is always "Unknown". Shouldn't it say something like "Healthy" or "Faulty"?

LucD · ‎10-06-2015

Does it also show "unknown" in the vSphere client or Web client ?

NucleusVM · ‎10-06-2015

When I open the "Hardware Status" tab on the vSphere Client I get a list of 772 sensors.

If there are any alarms, they are shown there.

LucD · ‎10-06-2015

The script can filter out "Unknown" as well

foreach($esx in Get-VMHost){
    $hs = Get-View -Id $esx.ExtensionData.ConfigManager.HealthStatusSystem
    $hs.Runtime.SystemHealthInfo.NumericSensorInfo |
        Where-Object {$_.HealthState.Label -notmatch "Green|Unknown" -and $_.Name -notmatch 'Rollup'} |
        Select-Object @{N='Host';E={$esx.Name}},Name,@{N='Health';E={$_.HealthState.Label}}
}

NucleusVM · ‎10-06-2015

Now it doesn't produce any output. It's like everything is unknown.

Get-VMHost lists 43 ESXi servers, and one of them is the one with the faulty RAM.

LucD · ‎10-06-2015

Run the script, without the Where-clause, against that specific ESXi host, just to check what comes out.

foreach($esx in Get-VMHost -Name '<faulty-vmhost>'){
    $hs = Get-View -Id $esx.ExtensionData.ConfigManager.HealthStatusSystem
    $hs.Runtime.SystemHealthInfo.NumericSensorInfo |
        Select-Object @{N='Host';E={$esx.Name}},Name,@{N='Health';E={$_.HealthState.Label}},Rollup
}
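A quick way to see how the sensors on that host are spread across the health states (for example, whether everything really reports "Unknown") is to group them by label. A minimal sketch: Get-SensorHealthSummary is a hypothetical helper name, and it assumes it is handed the NumericSensorInfo array used in the script above.

```powershell
# Hypothetical helper (not part of PowerCLI): counts sensors per health label.
# Assumes each sensor object exposes HealthState.Label, as NumericSensorInfo does.
function Get-SensorHealthSummary {
    param(
        [object[]] $Sensors = @()
    )
    $Sensors |
        Group-Object -Property { $_.HealthState.Label } |
        Sort-Object -Property Count -Descending |
        Select-Object @{N='Health';E={$_.Name}}, Count
}

# Usage (needs a PowerCLI session connected to vCenter):
#   $hs = Get-View -Id $esx.ExtensionData.ConfigManager.HealthStatusSystem
#   Get-SensorHealthSummary -Sensors $hs.Runtime.SystemHealthInfo.NumericSensorInfo
```

The total of the Count column is also the number of sensors the API returned, which makes it easy to compare against the figure shown on the Hardware Status tab.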

NucleusVM · ‎10-06-2015

Now everything under the "Health" column is listed as green (which is an improvement), but there is nothing to indicate that there's a memory problem.

LucD · ‎10-06-2015

When you open the Memory line with the alarm, which sensor shows the error ?

The top alarm is a rollup, which the script didn't show.

NucleusVM · ‎10-06-2015

This is what it shows me. And when I scroll down to check all the DIMMs none of them has an alert.

LucD · ‎10-06-2015

Ok, that seems to be a roll-up issue then.

Since the original script skipped the roll-ups, you didn't see it.

NucleusVM · ‎10-06-2015

Yes, but now it doesn't skip the rollups, right?

And again, the output of the script shows no indications that there is a problem with the server's memory.

LucD · ‎10-06-2015

Could this be an issue that only manifests itself in the vSphere client ?

Did you already restart your vSphere client ?

Or try with the Web client ?

NucleusVM · ‎10-06-2015

I get the same thing in the web client

LucD · ‎10-06-2015

This is definitely an issue, but not really PowerCLI related as I see it.

Would a reset of the sensors be an option ?

NucleusVM · ‎10-06-2015

The target here is that I want to replace the manual checks with this PowerCLI script.

If vCenter shows an error and PowerCLI doesn't pick it up, then PowerCLI is not a reliable solution and can't be used.

I've reset the sensors and vSphere client still shows the faulty memory.

The other thing I've noticed is that vSphere client tells me that there are 768 sensors, but the output of the script only lists 395 lines. Could this be relevant?

LucD · ‎10-07-2015

This is not a PowerCLI issue, all the more so since we obtain the sensor data directly from the vSphere API.

The other way of obtaining the sensor readouts, via CIM SMASH, returns approximately the same number of sensors.

In fact, if you Export the Hardware Status Sensors to an XML file, you will notice that the number of entries also is approximately the same as the number returned by the earlier script.

To me it looks as if the number of sensors shown on the page is off, or they calculate that number in a different way.

NucleusVM · ‎10-07-2015

Could there be other parameters that can be added to the script that will display additional data?

Any chance we're just not querying all the hardware on the server?

NucleusVM · ‎10-07-2015

OK, it seems there were other elements that were not added to the script, and that's why it's not displaying the faulty memory module. I tried the below, and it showed me that it can actually detect the problem.

Can you help me add this to the script, and anything else it could be missing?
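One part of the vSphere API that the scripts above never read is Runtime.HardwareStatusInfo on the same HealthStatusSystem object. It holds MemoryStatusInfo, CpuStatusInfo and StorageStatusInfo entries, each with a Name and a Status label. The sketch below assumes this is the kind of missing data meant here, which the thread does not confirm; Get-HardwareStatusRow is a hypothetical helper name.

```powershell
# Hypothetical helper (not part of PowerCLI): flattens the HardwareStatusInfo
# entries of a HostHealthStatusSystem view into report rows.
# Assumption: the memory fault is reported here rather than in NumericSensorInfo.
function Get-HardwareStatusRow {
    param(
        [Parameter(Mandatory)] [string] $HostName,
        [Parameter(Mandatory)] $HealthSystem
    )
    $info = $HealthSystem.Runtime.HardwareStatusInfo
    foreach ($category in 'MemoryStatusInfo', 'CpuStatusInfo', 'StorageStatusInfo') {
        # A category that is not populated is simply skipped
        foreach ($element in $info.$category) {
            [pscustomobject]@{
                Host     = $HostName
                Category = $category
                Name     = $element.Name
                Health   = $element.Status.Label
            }
        }
    }
}

# Usage (needs a PowerCLI session connected to vCenter):
#   foreach($esx in Get-VMHost){
#       $hs = Get-View -Id $esx.ExtensionData.ConfigManager.HealthStatusSystem
#       Get-HardwareStatusRow -HostName $esx.Name -HealthSystem $hs |
#           Where-Object {$_.Health -ne 'Green'}
#   }
```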
