I have been running a script in "local.sh" on some of our hosts that does several things, amongst which is a task to bring a host out of maintenance mode.
We have been using this script for some years, starting at a time when we were using vSphere 6.5.
We are now on 6.7u2 and on one of our systems the script has started playing up.
Especially this bit:
MaintenanceModeStatus=$(esxcli system maintenanceMode get)
case "$MaintenanceModeStatus" in
  Enabled)
    logger -s "AUTO-START : Exiting Maintenance Mode"
    # assumed exit command
    esxcli system maintenanceMode set --enable false
    if [ $? -ne 0 ]; then
      logger -s "AUTO-START : Maintenance Mode exit failed"
    fi ;;
  Disabled)
    logger -s "AUTO-START : Already out of Maintenance Mode" ;;
  *)
    logger -s "AUTO-START : Invalid MaintenanceMode status - $MaintenanceModeStatus" ;;
esac
We have started seeing log entries for an "Invalid MaintenanceMode status" appearing for both hosts that use the script on one of the systems.
$MaintenanceModeStatus is coming back as "Connection error".
Putting a sleep delay before this section of the script seems to help, but we would like to understand why.
When "local.sh" is run, is it possibly the case that not all services in VCSA are fully ready, and that calls to get information may come back with invalid, empty or unexpected values?
I might try that.
I am also going to propose that we put a test script in local.sh that makes lots of maintenance mode status requests in a tight loop and logs the results. Then we can hopefully baseline the readiness time of the host and either put in a fixed delay or change the request to loop until the response is "Enabled" or "Disabled", as we expect. In the latter case, though, I am not sure whether it would be prudent to put a limit on the loop, i.e. would we need to cover off the possibility that we never get the expected responses?
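A bounded version of that loop might look something like the sketch below. It is only a sketch under stated assumptions: the function name wait_for_mm_status and the variables MM_STATUS_CMD, MM_MAX_TRIES and MM_DELAY are made up for illustration, and the retry count and delay are placeholders that would have to come from the baseline measurements. Each unexpected answer is logged with its attempt number, so the same loop can double as the readiness probe.

```shell
#!/bin/sh
# Hypothetical names and placeholder values; none of these come from the original script.
MM_STATUS_CMD=${MM_STATUS_CMD:-"esxcli system maintenanceMode get"}
MM_MAX_TRIES=${MM_MAX_TRIES:-30}   # give up after this many attempts
MM_DELAY=${MM_DELAY:-10}           # seconds to wait between attempts

# Prints "Enabled" or "Disabled" and returns 0 once a valid status is seen.
# Returns 1 if no valid status arrives within MM_MAX_TRIES attempts.
wait_for_mm_status() {
    try=1
    while :; do
        status=$($MM_STATUS_CMD 2>&1)
        case "$status" in
            Enabled|Disabled)
                echo "$status"
                return 0
                ;;
        esac
        logger -s "AUTO-START : attempt $try/$MM_MAX_TRIES got unexpected status - $status"
        [ "$try" -ge "$MM_MAX_TRIES" ] && break
        try=$((try + 1))
        sleep "$MM_DELAY"
    done
    return 1
}
```

In local.sh this could replace the direct call, for example MaintenanceModeStatus=$(wait_for_mm_status), with the existing case statement left as it is. If the limit is reached, the variable is empty and the "Invalid MaintenanceMode status" branch still fires, so the never-ready case stays visible in the log.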
I suppose in reality we could remove the check altogether, and simply request a maintenance mode exit whenever the script runs. That assumes that the request can't suffer the same "Connection error" of course.
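If the exit request can hit the same error, one option is to retry it a limited number of times. The helper below is a hypothetical sketch (the name retry and its argument order are invented here). It only looks at the exit status, so it cannot tell a connection error from any other failure, including whatever esxcli returns when the host is already out of maintenance mode; that case would need checking on a real host first.

```shell
#!/bin/sh
# retry MAX DELAY COMMAND... : hypothetical helper, not part of the original script.
# Runs COMMAND until it succeeds or MAX attempts have been made.
retry() {
    max=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        [ "$n" -ge "$max" ] && return 1
        logger -s "AUTO-START : attempt $n/$max failed, retrying in ${delay}s"
        n=$((n + 1))
        sleep "$delay"
    done
}
```

Usage would be along the lines of: retry 10 15 esxcli system maintenanceMode set --enable false || logger -s "AUTO-START : Maintenance Mode exit failed" (the 10 attempts and 15 second delay are placeholders).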
Would we need to cover off the possibility that we never get the expected responses?
Maybe somehow ...
Please test two ways:
1. Try this instead of esxcli: vimsh -n -e /hostsvc/maintenance_mode_exit (vim-cmd has only an enter option, not exit)
2. Run other esxcli / localcli system commands (instead of maintenanceMode) while the host is in maintenance mode and check whether they work
Not really keen to go the "localcli" route. But will try it if permitted to do so.
I suspect at the moment that the issue I am seeing is simply to do with timing. The script is trying to do things before the host is really ready. All this worked flawlessly under ESXi 6.5 for nearly two years. ESXi 6.7 definitely behaves differently in subtle ways.
Finally, vim-cmd most definitely does have an exit maintenance mode command; I use it quite often:
Copied from SSH session just now:
[root@MollESXi:~] vim-cmd hostsvc/
Commands available under hostsvc/:
advopt/             enable_ssh                refresh_firewall
autostartmanager/   firewall_disable_ruleset  refresh_services
datastore/          firewall_enable_ruleset   reset_service
datastorebrowser/   get_service_status        runtimeinfo
firmware/           hostconfig                set_hostid
net/                hosthardware              standby_mode_enter
rsrc/               hostsummary               standby_mode_exit
storage/            login                     start_esx_shell
summary/            logout                    start_service
vmotion/            maintenance_mode_enter    start_ssh
connect             maintenance_mode_exit     stop_esx_shell
cpuinfo             pci_add                   stop_service
disable_esx_shell   pci_remove                stop_ssh
disable_ssh         queryconnectioninfo       task_list
enable_esx_shell    querydisabledmethods      updateSSLThumbprintsInfo
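For anyone wanting to do the whole check with vim-cmd, a rough sketch follows. It assumes that the output of vim-cmd hostsvc/hostsummary contains a line of the form "inMaintenanceMode = true," (or false), which should be verified on the host in question. The function name vim_mm_state and the variable VIM_SUMMARY_CMD are invented for this example, and there is no guarantee that vim-cmd is ready any earlier in the boot than esxcli.

```shell
#!/bin/sh
# Hypothetical names; the summary command can be overridden for testing.
VIM_SUMMARY_CMD=${VIM_SUMMARY_CMD:-"vim-cmd hostsvc/hostsummary"}

# Prints "true" or "false" and returns 0 if the state could be read.
# Prints "unknown" and returns 1 otherwise (e.g. the host is not ready yet).
vim_mm_state() {
    # Assumed output format: a line containing "inMaintenanceMode = true," or "... = false,"
    state=$($VIM_SUMMARY_CMD 2>/dev/null |
        sed -n 's/.*inMaintenanceMode = \([a-z]*\).*/\1/p' | head -n 1)
    case "$state" in
        true|false)
            echo "$state"
            ;;
        *)
            echo "unknown"
            return 1
            ;;
    esac
}
```

The exit itself would then be something like: [ "$(vim_mm_state)" = "true" ] && vim-cmd hostsvc/maintenance_mode_exit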