I upgraded vCenter to the latest patch.
Since then, HA is not working on some nodes.
Solution
Check the FDM version:
/tmp/scratch/log # esxcli software vib list |grep fdm
vmware-fdm 5.5.0-2646482 VMware VMwareCertified 2015-07-17
Remove it:
/tmp/scratch/log # esxcli software vib remove -n vmware-fdm
Reboot the ESXi host in question, because the removal output says a reboot is required:
Removal Result
Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
Reboot Required: true
VIBs Installed:
VIBs Removed: VMware_bootbank_vmware-fdm_5.5.0-3252642
VIBs Skipped:
Then just take it out of maintenance mode, and HA should fix itself automatically.
Steps to automate with PowerCLI:
1. Identify the hosts in the cluster that have a failed HA agent.
2. In sequence (not in parallel), put one host in maintenance mode.
3. On that host, run esxcli software vib remove -n vmware-fdm
4. Reboot it.
5. Take it out of maintenance mode.
6. Repeat the same steps on every ESXi host that has this issue.
Can we accomplish this with PowerCLI?
Thanks
FYI, within a cluster, random ESXi hosts have this HA problem.
I have multiple clusters and hence many ESXi hosts.
I also tried basic things like disabling and re-enabling HA in each cluster.
If you only want to select the hosts based on the VIB version, you can do something like this:
$tgtName = 'vmware-fdm'
$tgtVersion = '5.5.0-2646482'
$dryrun = $false
$force = $false
$maintenencemode = $false
$noliveinstall = $false
Get-Cluster | %{
Get-VMHost -Location $_ | %{
$esxcli = Get-EsxCli -VMHost $_
$vib = $esxcli.software.vib.list() | where{$_.Name -eq $tgtName}
if($vib.Version -eq $tgtVersion){
Set-VMHost -VMHost $_ -State Maintenance -Evacuate -Confirm:$false
$esxcli.software.vib.remove($dryrun,$force,$maintenencemode,$noliveinstall,$vib.Name)
Restart-VMHost -VMHost $_ -Evacuate -Confirm:$false
$esx = Get-VMHost -Name $_.Name
while($esx.PowerState -ne 'PoweredOn'){
sleep 10
$esx = Get-VMHost -Name $esx.Name
}
Set-VMHost -VMHost $esx -State Connected -Confirm:$false
}
}
}
Blog: lucd.info Twitter: @LucD22 Co-author PowerCLI Reference
LucD, thanks!!!
You made it so easy.
A few small things I observed (sorry, at times it's difficult to explain an issue in writing):
1. I will change this:
if($vib.Version -eq $tgtVersion){
$esxcli.software.vib.remove($dryrun,$force,$maintenencemode,$noliveinstall,$vib.Name)
Set-VMHost -VMHost $_ -State Maintenance -Evacuate -Confirm:$false
...
to this:
if($vib.Version -eq $tgtVersion){
Set-VMHost -VMHost $_ -State Maintenance -Evacuate -Confirm:$false
$esxcli.software.vib.remove($dryrun,$force,$maintenencemode,$noliveinstall,$vib.Name)
...
So I will put the host in maintenance mode first, then remove the FDM agent.
2. The servers where I need to remove FDM and reboot are identified by the criteria below, so selecting on $tgtVersion = '5.5.0-2646482' may not be ideal.
Kindly suggest.
1) Yes, you are right.
I updated the script to set the host first in maintenance mode before the VIB is removed.
2) What is the full message of the HA agent?
What does it say when you click the balloon next to the message?
LucD, here is the info:
Can you check if the following displays the ESXi hosts with the HA error?
foreach($cluster in Get-cluster){
($cluster.ExtensionData.RetrieveDasAdvancedRuntimeInfo()).DasHostInfo.HostDasState |
Select @{N='Cluster';E={$cluster.Name}},Name,RuntimeState
}
That did not work.
But I tested the following, and it works:
Get-VMHost |
Select Name,@{N='State';E={$_.ExtensionData.Runtime.DasHostState.State}} |sort state
I get output like this:
State
-----
connectedToMaster
connectedToMaster
fdmUnreachable
fdmUnreachable
fdmUnreachable
fdmUnreachable
fdmUnreachable
fdmUnreachable
fdmUnreachable
master
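With many clusters and hosts, it can help to count how many hosts sit in each HA state instead of scrolling through a sorted list. A minimal sketch, assuming the objects come from the Get-VMHost | Select Name,@{N='State';...} pipeline above; the host names and states in $hostStates are hypothetical stand-ins so the snippet runs without a vCenter connection.

```powershell
# Hypothetical sample data shaped like the output of:
#   Get-VMHost | Select Name,@{N='State';E={$_.ExtensionData.Runtime.DasHostState.State}}
$hostStates = @(
    [pscustomobject]@{ Name = 'esx01'; State = 'master' }
    [pscustomobject]@{ Name = 'esx02'; State = 'connectedToMaster' }
    [pscustomobject]@{ Name = 'esx03'; State = 'connectedToMaster' }
    [pscustomobject]@{ Name = 'esx04'; State = 'fdmUnreachable' }
    [pscustomobject]@{ Name = 'esx05'; State = 'fdmUnreachable' }
)

# One row per HA state: how many hosts are in it, and which ones
$summary = $hostStates |
    Group-Object -Property State |
    Sort-Object -Property Name |
    Select-Object Name, Count, @{N='Hosts';E={($_.Group.Name) -join ','}}

$summary | Format-Table -AutoSize
```

Against a live vCenter, you would pipe the real Get-VMHost | Select ... output into Group-Object instead of $hostStates.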
Then you could try it like this:
$tgtName = 'vmware-fdm'
$dryrun = $false
$force = $false
$maintenencemode = $false
$noliveinstall = $false
Get-Cluster | %{
Get-VMHost -Location $_ |
where{$_.ExtensionData.Runtime.DasHostState.State -eq 'fdmUnreachable'} | %{
$esxcli = Get-EsxCli -VMHost $_
Set-VMHost -VMHost $_ -State Maintenance -Evacuate -Confirm:$false
$esxcli.software.vib.remove($dryrun,$force,$maintenencemode,$noliveinstall,$tgtName)
Restart-VMHost -VMHost $_ -Evacuate -Confirm:$false
$esx = Get-VMHost -Name $_.Name
while($esx.PowerState -ne 'PoweredOn'){
sleep 10
$esx = Get-VMHost -Name $esx.Name
}
Set-VMHost -VMHost $esx -State Connected -Confirm:$false
}
}
Thanks, LucD.
I will try your code and get back to you.
LucD, what I found is that the code does not wait for the current ESXi host to come up; it goes ahead and puts the next host in maintenance mode and reboots it. Please suggest. Thanks
Restart-VMHost -VMHost $_ -Evacuate -Confirm:$false
$esx = Get-VMHost -Name $_.Name
while($esx.PowerState -ne 'PoweredOn'){
sleep 10
$esx = Get-VMHost -Name $esx.Name
}
...
That is exactly what the While-loop should be taking care of.
Does that mean the While-loop with the sleep is not working?
I would need to see some debugging info to analyse this further.
Try putting some Write-Output lines in the code to check what is happening.
I also found a new status: a host that came up but is still showing a bad status.
(Get-VMHost -location xxx|Select Name,@{N='State';E={$_.ExtensionData.Runtime.DasHostState.State}} |sort state)
State
-----
connectedToMaster
connectedToMaster
connectedToMaster
fdmUnreachable
fdmUnreachable
fdmUnreachable
fdmUnreachable
initializationError
master
uninitialized
So far we have these states:
initializationError/connectedToMaster/fdmUnreachable/master/uninitialized
So it seems that selecting the hosts whose state is neither "connectedToMaster" nor "master" would be the correct thing to do, LucD?
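The "neither connectedToMaster nor master" test can be written with -notcontains, so any new bad state (fdmUnreachable, uninitialized, initializationError, ...) is picked up without listing it. A minimal sketch on hypothetical sample data; one caveat is that a host in a cluster with HA disabled has no DasHostState, so its state is $null and it would also be selected.

```powershell
# States treated as healthy; every other state is selected for remediation
$healthyStates = 'master', 'connectedToMaster'

# Hypothetical sample data shaped like the Get-VMHost | Select output above
$hostStates = @(
    [pscustomobject]@{ Name = 'esx01'; State = 'master' }
    [pscustomobject]@{ Name = 'esx02'; State = 'connectedToMaster' }
    [pscustomobject]@{ Name = 'esx03'; State = 'fdmUnreachable' }
    [pscustomobject]@{ Name = 'esx04'; State = 'initializationError' }
    [pscustomobject]@{ Name = 'esx05'; State = 'uninitialized' }
)

# Keep only the hosts whose state is not in the healthy list
$toFix = $hostStates | Where-Object { $healthyStates -notcontains $_.State }
$toFix | Select-Object Name, State
```

In the PowerCLI loop, the same test would read where{$healthyStates -notcontains $_.ExtensionData.Runtime.DasHostState.State}, ideally only for clusters returned by Get-Cluster | where{$_.HAEnabled}.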
I will try adding some Write-Output lines and let you know.
LucD, I believe this is what happened. Below is the output for hosts xxx and yyy:
xxx... Maintenance | PoweredOn | 8 | 3523 | 46288 | 4.646 | 383.966 | 5.5.0 |
xxx.. Connected | PoweredOn | 8 | 3523 | 46288 | 4.646 | 383.966 | 5.5.0 |
YYY.. Maintenance | PoweredOn | 8 | 9948 | 46288 | 107.401 | 383.966 | 5.1.0 |
So maybe the sleep duration needs to be longer.
After the reboot is issued, it takes some time for the status to change to powered off.
That could indeed be the case.
Perhaps put an extra sleep before the While-loop.
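A fixed extra sleep is a guess about how long vCenter needs to notice the reboot. An alternative is a small polling helper with a timeout, used twice: first to wait until the host drops off, then until it is back. A minimal sketch; Wait-Condition is a hypothetical helper, not a PowerCLI cmdlet, and the commented usage assumes vCenter reports a rebooting host with ConnectionState NotResponding.

```powershell
# Hypothetical helper: poll a condition until it returns $true or the timeout expires.
# Returns $true on success, $false on timeout.
function Wait-Condition {
    param(
        [Parameter(Mandatory)][scriptblock]$Condition,
        [int]$TimeoutSeconds = 1800,
        [int]$IntervalSeconds = 10
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        if (& $Condition) { return $true }
        Start-Sleep -Seconds $IntervalSeconds
    }
    # One last check once the deadline has passed
    return [bool](& $Condition)
}

# Possible use right after Restart-VMHost (assumption: the rebooting host shows as NotResponding):
# Wait-Condition { (Get-VMHost -Name $esx.Name).ConnectionState -eq 'NotResponding' } -TimeoutSeconds 600 | Out-Null
# Wait-Condition { (Get-VMHost -Name $esx.Name).ConnectionState -ne 'NotResponding' } -IntervalSeconds 30
```

Because each check fetches the host again inside the scriptblock, the loop never tests a stale object obtained before the reboot started.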
To test for multiple conditions, you can do:
where{'fdmUnreachable','uninitialized' -contains $_.ExtensionData.Runtime.DasHostState.State} | %{
Maybe it's a small issue, but it never went into the while loop. I added the Write-Output lines below:
==
Get-Cluster $cluster | %{
Get-VMHost -Location $_ |
where{$_.ExtensionData.Runtime.DasHostState.State -eq 'fdmUnreachable'} | %{
$esxcli = Get-EsxCli -VMHost $_
Set-VMHost -VMHost $_ -State Maintenance -Evacuate -Confirm:$false
Write-Output "$_ is now in maint mode "
$esxcli.software.vib.remove($dryrun,$force,$maintenencemode,$noliveinstall,$tgtName)
Write-Output "$_ is now being rebooted"
Restart-VMHost -VMHost $_ -Evacuate -Confirm:$false
$esx = Get-VMHost -Name $_.Name
Write-Output "$_ is the focus now"
Write-Output "Power status of $($esx) is $($esx.PowerState)"
sleep 600
## $esx was read before the sleep and still says PoweredOn, so refresh it;
## otherwise the while loop below tests the stale value and is never entered
$esx = Get-VMHost -Name $esx.Name
while( ($esx.PowerState) -ne 'PoweredOn'){
Write-Output "Waiting another 400 seconds for $esx to come up"
sleep 400
$esx = Get-VMHost -Name $esx.Name
}
Write-Output "$esx must be in PoweredOn status now"
Set-VMHost -VMHost $esx -State Connected -Confirm:$false
}
}
===
After the reboot, the box comes up in normal mode, and I also need to reconfigure the HA agent manually. I found two ways.
The simple one:
Set-Cluster -Cluster $cluster -HAEnabled:$false
Set-Cluster -Cluster $cluster -HAEnabled:$true
The other one, with a function:
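function ReconfigureHA {
param([Parameter(ValueFromPipeline=$true)]$VMHost)
# process block: runs once per host coming down the pipeline
process {
# Client20.VMHostImpl only exists in old PowerCLI; test the VMHost interface type instead
if ( $VMHost -isnot [VMware.VimAutomation.ViCore.Types.V1.Inventory.VMHost] ) {
Write-Error "VMHost expected, skipping object in pipeline."
return
}
$vmhostView = $VMHost | Get-View
$vmhostView.ReconfigureHostForDAS_Task()
}
}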
Get-VMHost -Name xxxx | ReconfigureHA
==
Is it possible to incorporate the function into the code above?
Thanks
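One way to fold the HA reconfiguration into the loop is to wrap all per-host steps in a single pipeline function, so hosts are handled strictly one after another. A minimal sketch, not tested against a live vCenter: Repair-FdmHost is a hypothetical name, the 10-minute and $TimeoutSeconds limits are arbitrary, and it assumes vCenter reports a rebooting host as NotResponding. It uses the same Get-EsxCli positional vib.remove call and the same ReconfigureHostForDAS_Task API call already shown in this thread.

```powershell
# Hypothetical helper: remediate ONE host per pipeline object.
function Repair-FdmHost {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]$VMHost,
        [int]$TimeoutSeconds = 1800
    )
    process {
        $name = $VMHost.Name
        $esxcli = Get-EsxCli -VMHost $VMHost
        Set-VMHost -VMHost $VMHost -State Maintenance -Evacuate -Confirm:$false | Out-Null
        # Positional arguments as used above: dryrun, force, maintenancemode, noliveinstall, vibname
        $esxcli.software.vib.remove($false, $false, $false, $false, 'vmware-fdm') | Out-Null
        Restart-VMHost -VMHost $VMHost -Confirm:$false | Out-Null

        $start = Get-Date
        # Phase 1 (assumption): wait up to 10 minutes for vCenter to mark the host NotResponding
        while ((Get-VMHost -Name $name).ConnectionState -ne 'NotResponding' -and
               ((Get-Date) - $start).TotalMinutes -lt 10) {
            Start-Sleep -Seconds 15
        }
        # Phase 2: wait until the host responds again, within $TimeoutSeconds overall
        while ((Get-VMHost -Name $name).ConnectionState -eq 'NotResponding' -and
               ((Get-Date) - $start).TotalSeconds -lt $TimeoutSeconds) {
            Start-Sleep -Seconds 30
        }
        $esx = Get-VMHost -Name $name
        if ($esx.ConnectionState -eq 'NotResponding') {
            # Stop the whole run rather than take the next host down
            throw "$name did not come back within $TimeoutSeconds seconds"
        }
        Set-VMHost -VMHost $esx -State Connected -Confirm:$false | Out-Null
        # Same vSphere API call as the ReconfigureHA function above
        $esx.ExtensionData.ReconfigureHostForDAS_Task() | Out-Null
        Write-Output "$name remediated"
    }
}
```

It could then be fed from the filter used earlier, for example Get-Cluster $cluster | Get-VMHost | where{'fdmUnreachable','uninitialized' -contains $_.ExtensionData.Runtime.DasHostState.State} | Repair-FdmHost. Using throw instead of Write-Error is a deliberate choice: if one host does not return, the remaining hosts are left untouched.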
Maybe changing
$esx = Get-VMHost -Name $_.Name
to
$esx = Get-VMHost -Name $_
is needed.
I cannot test it now, as everything is fixed in my environment.