VMware Cloud Community
esxi1979
Expert

HA errors: fixing them from PowerCLI

I upgraded vCenter to the latest patch.

HA has not been working on some nodes since then.

Solution

Check the fdm version:

/tmp/scratch/log # esxcli software vib list |grep fdm

vmware-fdm                     5.5.0-2646482                          VMware           VMwareCertified   2015-07-17

Remove it:

/tmp/scratch/log # esxcli software vib remove -n vmware-fdm

Reboot the ESXi host in question, because the removal output says a reboot is required:

Removal Result

   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.

   Reboot Required: true

   VIBs Installed:

   VIBs Removed: VMware_bootbank_vmware-fdm_5.5.0-3252642

   VIBs Skipped:

Then just take the host out of maintenance mode, and HA should fix itself automatically.

Actions for PowerCLI automation:

1. Identify the servers in the cluster that have a failed HA agent.

2. In sequence (not in parallel), put one host in maintenance mode.

3. On that node, run: esxcli software vib remove -n vmware-fdm

4. Reboot it.

5. Take it out of maintenance mode.

6. Repeat the same actions on each ESXi host that has this issue.

Can we accomplish this with PowerCLI?

Thanks

esxi1979
Expert

FYI, within a cluster, random ESXi hosts have this HA problem.

I have multiple clusters and hence many ESXi hosts.

I also tried basic things like disabling and re-enabling HA in each cluster.

LucD
Leadership

If you only want to key on the VIB version, you can do something like this:

$tgtName = 'vmware-fdm'

$tgtVersion = '5.5.0-2646482'

$dryrun = $false

$force = $false

$maintenencemode = $false

$noliveinstall = $false

Get-Cluster | %{

    Get-VMHost -Location $_ | %{

        $esxcli = Get-EsxCli -VMHost $_

        $vib = $esxcli.software.vib.list() | where{$_.Name -eq $tgtName}

        if($vib.Version -eq $tgtVersion){

            Set-VMHost -VMHost $_ -State Maintenance -Evacuate -Confirm:$false

            $esxcli.software.vib.remove($dryrun,$force,$maintenencemode,$noliveinstall,$vib.Name)

            Restart-VMHost -VMHost $_ -Evacuate -Confirm:$false

            $esx = Get-VMHost -Name $_.Name

            while($esx.PowerState -ne 'PoweredOn'){

                sleep 10

                $esx = Get-VMHost -Name $esx.Name

            }

            Set-VMHost -VMHost $esx -State Connected -Confirm:$false

        }

    }

}


Blog: lucd.info  Twitter: @LucD22  Co-author PowerCLI Reference

esxi1979
Expert

Thanks!!!

LucD, you made it so easy.

A few small things I observed (sorry, at times it is difficult to explain an issue in writing):

1. I will change

if($vib.Version -eq $tgtVersion){

            $esxcli.software.vib.remove($dryrun,$force,$maintenencemode,$noliveinstall,$vib.Name)

            Set-VMHost -VMHost $_ -State Maintenance -Evacuate -Confirm:$false

...


to

if($vib.Version -eq $tgtVersion){

    Set-VMHost -VMHost $_ -State Maintenance -Evacuate -Confirm:$false

            $esxcli.software.vib.remove($dryrun,$force,$maintenencemode,$noliveinstall,$vib.Name)

        ...

So I will put the host in maintenance mode first, then remove the FDM agent.

2.

The servers where I need to remove FDM and reboot are the ones matching the criteria below:

pastedImage_0.png

So filtering on $tgtVersion = '5.5.0-2646482' may not be ideal.


Kindly suggest.


LucD
Leadership

1) Yes, you are right.

I updated the script to set the host first in maintenance mode before the VIB is removed.

2) What is the full message of the HA agent?

What does it say when you click the balloon next to the message?


esxi1979
Expert

LucD

Here is the info

pastedImage_1.png

LucD
Leadership

Can you check if the following displays the ESXi hosts with the HA error?

foreach($cluster in Get-cluster){

    ($cluster.ExtensionData.RetrieveDasAdvancedRuntimeInfo()).DasHostInfo.HostDasState |

    Select @{N='Cluster';E={$cluster.Name}},Name,RuntimeState

}


esxi1979
Expert

That did not work.

But I tested the following, and it works:

Get-VMHost |

Select Name,@{N='State';E={$_.ExtensionData.Runtime.DasHostState.State}} |sort state

I get output like this:

State

-----

connectedToMaster

connectedToMaster

fdmUnreachable

fdmUnreachable

fdmUnreachable

fdmUnreachable

fdmUnreachable

fdmUnreachable

fdmUnreachable

master

LucD
Leadership

Then you could try it like this:

$tgtName = 'vmware-fdm'

$dryrun = $false

$force = $false

$maintenencemode = $false

$noliveinstall = $false

Get-Cluster | %{

    Get-VMHost -Location $_ |

    where{$_.ExtensionData.Runtime.DasHostState.State -eq 'fdmUnreachable'} | %{

        $esxcli = Get-EsxCli -VMHost $_

        Set-VMHost -VMHost $_ -State Maintenance -Evacuate -Confirm:$false

        $esxcli.software.vib.remove($dryrun,$force,$maintenencemode,$noliveinstall,$tgtName)

        Restart-VMHost -VMHost $_ -Evacuate -Confirm:$false

        $esx = Get-VMHost -Name $_.Name

        while($esx.PowerState -ne 'PoweredOn'){

            sleep 10

            $esx = Get-VMHost -Name $esx.Name

        }

        Set-VMHost -VMHost $esx -State Connected -Confirm:$false

    }

}


esxi1979
Expert

Thanks, LucD.

I will try your code and get back to you.

esxi1979
Expert

LucD

What I found is that the code does not wait for the current ESXi host to come back up; it goes ahead and puts the next one in maintenance mode and reboots it. Please suggest. Thanks.

        Restart-VMHost -VMHost $_ -Evacuate -Confirm:$false

        $esx = Get-VMHost -Name $_.Name

        while($esx.PowerState -ne 'PoweredOn'){

            sleep 10

            $esx = Get-VMHost -Name $esx.Name

        }

...

LucD
Leadership

That is exactly what the While-loop should be taking care of.

Does that mean the While-loop with the sleep is not working?

I would need to see some debugging info to analyse further.

Try putting some Write-Output lines in the code to check what is happening.


esxi1979
Expert

I also found new statuses now: hosts that came back up but still show a bad status.

(Get-VMHost -location xxx|Select Name,@{N='State';E={$_.ExtensionData.Runtime.DasHostState.State}} |sort state)

 

State

-----

connectedToMaster

connectedToMaster

connectedToMaster

fdmUnreachable

fdmUnreachable

fdmUnreachable

fdmUnreachable

initializationError

master

uninitialized

So far we have these states:

initializationError/connectedToMaster/fdmUnreachable/master/uninitialized

So filtering on anything other than "connectedToMaster" or "master" seems to be the correct thing to do, LucD.

esxi1979
Expert

I will try adding some Write-Output lines and will let you know.

esxi1979
Expert

LucD

I believe this is what happened. Below is the output for hosts xxx and yyy:

xxx... Maintenance PoweredOn  8    3523   46288       4.646     383.966   5.5.0
xxx.. Connected   PoweredOn  8   3523   46288       4.646     383.966   5.5.0
YYY.. Maintenance PoweredOn  8    9948   46288     107.401     383.966   5.1.0

So maybe the sleep duration needs to be longer.

After the reboot is issued, it takes some time for the status to change to powered off.

LucD
Leadership

That could indeed be the case.

Perhaps put an extra sleep before the While-loop.
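As an alternative to a fixed extra sleep, here is a minimal sketch of a two-phase wait: first wait until the host is seen going down, then wait until it is back. Wait-Condition is a hypothetical helper, not a PowerCLI cmdlet, and the ConnectionState values in the commented usage (NotResponding while rebooting, Maintenance once back) are an assumption about how vCenter reports a host that was rebooted in maintenance mode.

```powershell
# Hypothetical helper (not part of PowerCLI): poll a condition until it is true
# or until the timeout expires. Returns $true on success, $false on timeout.
function Wait-Condition {
    param(
        [Parameter(Mandatory)][scriptblock]$Condition,
        [int]$TimeoutSeconds = 1800,
        [int]$IntervalSeconds = 10
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while (-not (& $Condition)) {
        if ((Get-Date) -ge $deadline) { return $false }
        Start-Sleep -Seconds $IntervalSeconds
    }
    return $true
}

# Assumed usage after Restart-VMHost (requires a vCenter connection, untested here):
# $name = $_.Name
# Wait-Condition { (Get-VMHost -Name $name).ConnectionState -eq 'NotResponding' } | Out-Null
# Wait-Condition { (Get-VMHost -Name $name).ConnectionState -eq 'Maintenance' } | Out-Null
```

Checking the return value of each call would let the script stop instead of moving on to the next host when a reboot takes longer than the timeout.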

To test for multiple conditions you can do

where{'fdmUnreachable','uninitialized' -contains $_.ExtensionData.Runtime.DasHostState.State} | %{
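The inverse test suggested earlier in the thread (anything other than master or connectedToMaster) could be sketched as below. Test-HAAgentNeedsRepair is a hypothetical helper name. Note that this form also matches any state not listed in this thread, and the empty-state guard assumes hosts in clusters without HA report no state.

```powershell
# Hypothetical helper: decide from the DasHostState.State string whether
# the HA agent on a host should be repaired.
function Test-HAAgentNeedsRepair {
    param(
        [string]$State,
        # Healthy states as observed in this thread
        [string[]]$HealthyStates = @('master', 'connectedToMaster')
    )
    # No state at all (assumed: HA not enabled for this host) is not treated as broken
    if ([string]::IsNullOrEmpty($State)) { return $false }
    return $HealthyStates -notcontains $State
}

# Assumed usage in the pipeline (requires a vCenter connection, untested here):
# Get-VMHost -Location $_ |
#     where { Test-HAAgentNeedsRepair $_.ExtensionData.Runtime.DasHostState.State } | %{ ... }
```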


esxi1979
Expert

Maybe a small issue: it never went into the while loop. I added the Write-Output lines below:

==

Get-Cluster $cluster   | %{

    Get-VMHost -Location $_ |

    where{$_.ExtensionData.Runtime.DasHostState.State -eq 'fdmUnreachable'} | %{

        $esxcli = Get-EsxCli -VMHost $_

        Set-VMHost -VMHost $_ -State Maintenance -Evacuate -Confirm:$false

Write-Output "$_ is now in maint mode  "

        $esxcli.software.vib.remove($dryrun,$force,$maintenencemode,$noliveinstall,$tgtName)

Write-Output "$_ is now being rebooted  "

        Restart-VMHost -VMHost $_ -Evacuate -Confirm:$false

        $esx = Get-VMHost -Name $_.Name

Write-Output " $_ is the focus now"

write-output " Power status of $($esx) is $($esx.PowerState)  "

sleep 600 ## For some reason it is not going into the while loop below

        while( ($esx.PowerState) -ne 'PoweredOn'){

            sleep 400

            Write-Output " Waited another 400 seconds for $esx to come up"

            $esx = Get-VMHost -Name $esx.Name

        }

       

        Write-Output " $esx must be in PoweredOn status now"

        Set-VMHost -VMHost $esx -State Connected -Confirm:$false

    }

}

===

After the reboot, the box comes up in normal mode, and I basically need to reconfigure the HA agent manually. I found two ways.

The simple one:

Set-Cluster -Cluster $cluster -HAEnabled:$false

Set-Cluster -Cluster $cluster -HAEnabled:$true

The other one, with a function:

function ReconfigureHA {

process {

if ( $_ -isnot [VMware.VimAutomation.ViCore.Types.V1.Inventory.VMHost] ) {

Write-Error "VMHost expected, skipping object in pipeline."

return

}

# Ask vCenter to reconfigure the HA (FDM) agent on this host

$vmhostView = $_ | Get-View

$vmhostView.ReconfigureHostForDAS_Task()

}

}

Get-VMHost -Name xxxx  | ReconfigureHA

==

is it possible to incorporate the function in above code ?

Thanks
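One possible way to fold the reconfigure step into the loop, as an untested sketch: wrap the call in a pipeline-aware function and invoke it once the host has been taken out of maintenance mode. Invoke-HAReconfigure is a hypothetical name; Get-View and ReconfigureHostForDAS_Task() are the same calls used in the function above.

```powershell
# Sketch only: ask vCenter to reconfigure the HA (FDM) agent on each host piped in.
function Invoke-HAReconfigure {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        $VMHost
    )
    process {
        # Get-View returns the host view that exposes ReconfigureHostForDAS_Task()
        $view = Get-View -VIObject $VMHost
        $view.ReconfigureHostForDAS_Task()
    }
}

# Assumed placement at the end of the per-host block (requires a vCenter connection):
# Set-VMHost -VMHost $esx -State Connected -Confirm:$false
# Get-VMHost -Name $esx.Name | Invoke-HAReconfigure
```

The task method returns as soon as the task is submitted, so the script would continue to the next host while the HA agent is still being configured on this one.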

esxi1979
Expert

Maybe changing

   $esx = Get-VMHost -Name $_.Name

to

   $esx = Get-VMHost -Name $_

is needed.

I cannot test it now, as everything is fixed in my environment.
