Hi there,
I am running ESXi 7.0 Update 3 on a Dell server. Autostart is enabled under Host > Manage > Autostart (see settings in the screenshot below). The issue is that as soon as ESXi powers on, it attempts to start the VM. However, the disk is not yet available, so it throws the error "Failed - The attempted operation cannot be performed in the current state (Powered off)."
How can I delay the start of this VM until the disk comes up and the VM is available? Why does it not wait for the start delay I have set (300 sec)?
Thank you in advance!
Edit to add solution:
Thank you @StephenMoll for the help! Using @StephenMoll's code (marked as the answer), I was able to delay ESXi from attempting autostart until the datastore was available. Having the script run in local.sh stops ESXi from initiating its autostart until after rc.local has completed. I did not even have to start the VMs in the script - that is still done by ESXi autostart.
As mentioned in another comment, the code needs to be placed in /etc/rc.local.d/local.sh
local.sh only runs when UEFI Secure Boot is Disabled in the BIOS.
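For anyone finding this later, here is roughly the shape of the wait loop as a self-contained sketch (the datastore path, attempt count, and delay are placeholders; logger is wrapped and the esxcli call is guarded so the sketch is inert off an ESXi host):

```shell
#!/bin/sh
# Sketch of the wait loop for /etc/rc.local.d/local.sh. The datastore
# path, attempt count, and delay are placeholders - adjust for your host.

say() { logger -s "$1" 2>/dev/null || echo "$1" >&2; }

wait_for_datastore() {
    ds_path="$1"    # e.g. /vmfs/volumes/datastore1
    tries="$2"      # maximum number of attempts
    delay="$3"      # seconds to sleep between attempts
    say "Checking for $ds_path"
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if [ -L "$ds_path" ]; then
            say "$ds_path accessible"
            return 0
        fi
        say "$ds_path not accessible - HBA Rescan triggered"
        sleep "$delay"
        # esxcli only exists on the ESXi host itself
        if command -v esxcli >/dev/null 2>&1; then
            esxcli storage core adapter rescan --all --type=all
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# On the host, the tail of local.sh would then be something like:
# wait_for_datastore "/vmfs/volumes/datastore1" 5 60
# exit 0
```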
I am doing a top level reply.
I have been working with Dell and made sure all firmware is up to date. The issue still persists: ESXi is trying to autostart the VM before the datastore is available. Is there any way to delay ESXi starting?
Are you using UEFI secure booting?
If not, have you tried the suggestion of adding a delay to the local.sh script?
I believe I am. I tried adding a script before, but it is erased on reboot.
Interesting. I did not know how the system would behave with respect to the local.sh file when running UEFI Secure Boot. I had sort of expected the file to simply be ignored, not restored to default on each boot. When I have used local.sh in the past, it has just behaved like a normal persistent file, albeit I think I had to wait a sufficient amount of time for changes to be copied to the bootbank before attempting a restart of the host.
I wonder if booting in legacy mode is a workaround you could try; perhaps it would allow you to create a persistent local.sh file with a delay in it?
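On the waiting point: as I recall (worth verifying on your build), ESXi runs the config backup to the bootbank on a periodic timer, and the same script can be run by hand to force the copy straight away:

```shell
# Force the ESXi config sync to the bootbank instead of waiting for the
# periodic run. The /sbin/auto-backup.sh path is from memory - verify it
# exists on your build. Guarded so this is harmless on a non-ESXi machine.
sync_bootbank() {
    if [ -x /sbin/auto-backup.sh ]; then
        /sbin/auto-backup.sh
    else
        echo "auto-backup.sh not found (not an ESXi host?)" >&2
    fi
}

# On the host: sync_bootbank, then reboot without losing local.sh edits.
```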
How long would I have to wait?
I just looked and there is no local.sh on the server, only local.tgz and local.tgz.ve
I developed a script for one of our systems that checks for datastore access, and stays in a loop until the datastore is contactable.
We had to do this because the way the system was powered up, was via a big single switch. The hosts would boot up quicker than the SAN, resulting in hosts booting without datastores being connected.
The script did much more than that, but it has a snippet much like this, which will make up to 5 attempts, a minute apart, to see a chosen datastore:
DefaultDS="{Put_datastore_name_here}"

logger -s "Checking for $DefaultDS"
for ATTEMPT in 1 2 3 4 5
do
    if [ ! -L "/vmfs/volumes/$DefaultDS" ]
    then
        # Not visible yet: wait a minute, force an HBA rescan, try again
        logger -s "$DefaultDS not accessible - HBA Rescan triggered"
        sleep 60s
        esxcli storage core adapter rescan --all --type=all
    else
        logger -s "$DefaultDS accessible"
        break
    fi
done
Oh well if the file doesn't exist then I guess this technique is not going to be of use to you.
Guessing this is ESXi 8? I haven't had a chance to play with that yet.
Where did you put this script?
I tried adding a startup script before, but it was erased on startup. Maybe I am putting it in the wrong place.
/etc/rc.local.d/local.sh
Does the script run automatically in that folder?
It does when I've used it, but we are not using secure boot.
If you are watching the ESXi boot screen, when the progress bar is almost at the end you will see the point when the local.sh script is called. It's about the last thing you see reported before the host boot sequence is complete.
I found local.sh in that folder you mentioned in the other comment. But it says it does not run on UEFI secure boot. I think that is enabled in the iDRAC (I am not positive where I saw it). Would it hurt anything to disable it so local.sh runs?
Just made another comment, but I'll post again here.
I found local.sh in that folder you mentioned in the other comment. But it says it does not run on UEFI secure boot. I think that is enabled in the iDRAC (I am not positive where I saw it). Would it hurt anything to disable it so local.sh runs?
Do I really need UEFI Secure Boot?
Whether you feel the need to have Secure Boot enabled depends on your environment. I would like to have it enabled, but it would give us a number of engineering challenges, this being one of them. As well as needing hosts to rescan until the datastores are ready, the other thing we have to do is bring hosts out of maintenance mode on boot-up and automatically start some designated VMs.
So secure boot is supposed to ensure the integrity of the host from the server hardware, through the hypervisor up to the guests.
I suppose that if you are confident you have mitigated the risk of unauthorised changes in other ways, it may be acceptable to turn off Secure Boot for the time being, if the benefit of a clean autostart outweighs the inconvenience of having to manually intervene.
I would like to leave it on, but for the sake of testing I have turned it off.
After running your script to make sure the datastore is available, how do you start the VM?
Our script starts VMs because our hosts are clustered, and clustering disables VM Autostart in ESXi.
If your host is standalone (not clustered), you should still be able to use the autostart setup you were using before.
You can start VMs from the script using:
vim-cmd vmsvc/power.on {VMID}
The VMID is the unique ID number the host uses to identify VMs. You can determine these by using:
vim-cmd vmsvc/getallvms
This will spit out a list of registered VMs on the host with VMIDs alongside the VM names.
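Strung together, the two commands end up looking something like this ("myvm" is just a placeholder name, and the awk field match assumes the VM name has no spaces in it):

```shell
# Pull a VMID out of `vim-cmd vmsvc/getallvms` output by VM name.
# getallvms columns are: Vmid  Name  File  Guest OS  Version  Annotation
vmid_for_name() {
    # $1 = VM name; reads getallvms-style output on stdin.
    # Assumes the VM name contains no spaces.
    awk -v name="$1" '$2 == name { print $1 }'
}

# On the host:
# VMID=$(vim-cmd vmsvc/getallvms | vmid_for_name "myvm")
# [ -n "$VMID" ] && vim-cmd vmsvc/power.on "$VMID"
```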
I am doing some testing now. I will report back soon.
Good luck.
After a boot up, if you examine the syslog file, and search (grep) for the "$DefaultDS not accessible - HBA Rescan triggered" strings, you would be able to determine how many re-scans were done before the datastore became visible to the host.
If it is consistently less than 5, you could tweak the maximum number of ATTEMPTS and the sleep time to speed up overall boot time. For example, if the datastore is only really taking 10 seconds to become ready, the script as it stands would be consistently adding about 50 seconds to the overall boot time. With a sleep time of 5 seconds and a maximum of 5 attempts, the script would only ever overshoot the moment the datastore becomes ready by about 5 seconds (around 25 seconds in total in the worst case where it never appears).
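The counting itself is just a grep over the logged string; a sketch, reading on stdin so it works on whichever log file you point it at (on the host that would typically be /var/log/syslog.log):

```shell
# Count how many rescan attempts the local.sh loop logged before the
# datastore became visible.
count_rescans() {
    grep -c "not accessible - HBA Rescan triggered"
}

# On the host:
# count_rescans < /var/log/syslog.log
```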
On the other hand, some of these servers take sooooooo long to POST and boot up, a minute either way doesn't seem to make much of a difference! LOL!