Hi there,
I am running ESXi 7.0 Update 3 on a Dell server. Autostart is enabled under Host > Manage > Autostart (see settings in screenshot below). The issue is that as soon as ESXi powers on, it attempts to start the VM. However, the disk is not yet available, so it throws the error "Failed - The attempted operation cannot be performed in the current state (Powered off)."
How can I delay the start of this VM until the disk is up and the VM is available? Why will it not wait for the start delay I have set (300 seconds)?
Thank you in advance!
Edit to add solution:
Thank you @StephenMoll for the help! Using @StephenMoll's code (marked as the answer), I was able to delay ESXi from attempting autostart until the datastore was available. Having the script run in local.sh stops ESXi from initiating its autostart until after rc.local has completed. I did not even have to start the VMs in the script - that is still done by ESXi autostart.
As mentioned in another comment, the code needs to be placed in /etc/rc.local.d/local.sh.
Note that local.sh only runs when UEFI Secure Boot is disabled in the BIOS.
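The exact script marked as the answer isn't reproduced in this thread, but a minimal sketch of the idea might look like the following. The datastore path and the 5-second poll interval are assumptions; substitute your own datastore name.

```shell
#!/bin/sh
# Sketch for /etc/rc.local.d/local.sh: ESXi defers autostart until this
# script returns, so blocking here delays VM autostart.

# Poll until the given path exists or the timeout (in seconds) elapses.
# Returns 0 when the path appears, 1 on timeout.
wait_for_path() {
    path="$1"
    timeout="$2"
    waited=0
    while [ ! -e "$path" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 5
        waited=$((waited + 5))
    done
    return 0
}

# In local.sh you would then call something like
# (datastore name is an example):
# wait_for_path "/vmfs/volumes/datastore1" 300
```

With the call at the bottom uncommented, the script blocks up to 300 seconds waiting for the datastore mount point before letting boot (and therefore autostart) continue.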
I do not think I can put a VM on the BOSS card. There does not appear to be a way to reach it via ESXi.
I had tried before to use scripting to delay the boot, but it seems ESXi overwrites all the files and does not save any changes I make.
Unfortunately, no it does not. The problem is that the VM is not even attempting to Autostart because it is trying to start before the datastore is available.
I am replying to this again, because I am not sure it went through the first time.
This solution does not work because the VM is completely failing to autostart in the first place. The issue is how to delay the start of the VM until the datastore is available.
I wouldn't expect it to, unless you were able to get the dummy onto the boot device somehow.
What environment are we talking about? vSphere 6.7, 7.0?
Does the host boot with BIOS or EFI?
/etc/rc.local.d/local.sh is an editable script location that is persistent. We use it on some of our systems. However, I believe it becomes disabled or ignored if using EFI boot and/or TPM attestation.
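If I remember correctly, ESXi 7.x ships a helper script that reports the Secure Boot state from the shell, which would confirm whether local.sh is being skipped (path is from memory, so verify it on your host):

```shell
# Report whether UEFI Secure Boot is enabled on this ESXi host
/usr/lib/vmware/secureboot/bin/secureBoot.py -s
```

If this prints "Enabled", changes to local.sh will not take effect.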
I may have found an issue here in our hardware configuration (BOSS boot + SSD datastore). Our datastore was using the High Performance Plug-in (HPP), which is typically reserved for NVMe devices. I confirmed that our SSD and BOSS card were both running under HPP with 'esxcli storage hpp device list'. When I used 'esxcli storage nmp device list', no devices were shown.
The vmkernel logs indicate that the HPP plug-in does not bring the local datastore online until 20+ seconds after boot. I then changed the plugin to the Native Multipathing Plugin (NMP). After the change, the SSD shows up in the NMP device list.
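For anyone wanting to make the same switch, it is done through core claim rules. The commands below are a sketch: the rule number and naa.* device identifier are placeholders, so check the existing rules for a free rule number and verify the exact option syntax with 'esxcli storage core claimrule add --help' on your build.

```shell
# List the current multipathing claim rules
esxcli storage core claimrule list --claimrule-class=MP

# Add a rule claiming the SSD for NMP instead of HPP
# (rule number and naa.* device ID are placeholders)
esxcli storage core claimrule add --rule 102 --type device \
    --device naa.xxxxxxxxxxxxxxxx --plugin NMP --claimrule-class MP

# Load the new rules and reclaim the device so the change takes effect
esxcli storage core claimrule load --claimrule-class MP
esxcli storage core claiming reclaim --device naa.xxxxxxxxxxxxxxxx
```

Afterward, 'esxcli storage nmp device list' should show the device.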
I found this article related to HPP vs NMP, which does say that slower SSDs should not be set to use HPP:
https://4sysops.com/archives/vmware-vsphere-nvme-high-performance-plug-in-hpp/
Some other VMware documentation on the different plugins:
That said, unfortunately, this change has not solved my auto-start issue. But I thought it was worth mentioning in case your configuration is similar.
I filed two feature requests for you:
1. ability to delay the auto-start of the first VM!
2. ESXi not trying to auto-start a VM until the resources are accessible (storage in this case)
As always, no guarantees if and when this would potentially make it into the product.
Thank you! This is a major issue in this application. Without being able to Autostart, I will have to manually start these VMs any time power is lost.
That is why vSphere HA exists, but I am guessing you have a single host and no shared storage.
I do have two servers, but there is no shared storage. They are meant to operate independently for redundancy.
You could add a pause to the local.sh script; modifying it is fully supported, as mentioned here:
https://kb.vmware.com/s/article/2043564
You could use something like:
read -t 60 -p "waiting for 60 seconds"
I already mentioned the use of local.sh, but I wasn't sure it was a definite possibility for them.
If they are using EFI secure boot, won't it be ignored?
Our deployment is similar -- single or redundant server configuration with no shared storage.
I see that VMware has now released ESXi 7.0 Update 3i, and the release notes mention more datastore fixes. Hopefully, Dell's custom image will be out soon so I can try that as a resolution.
https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-esxi-70u3i-release-notes.html
Thank you all for your ideas and help -- this one's been challenging!
I tried adding a delay to the local.sh file, but the file is apparently not executed on our server. I briefly saw a warning about UEFI Secure Boot, but I didn't have a chance to read it before it flashed by on screen.
I see the exact same behavior as I did previously, even with the local.sh modifications. My VMs show up as "invalid" until 20-30+ seconds have passed; then they magically appear in the list. This explains why they aren't auto-started. But I still don't know why the datastore isn't available for so long.
The HPP/NMP plugin theory made sense, but it didn't change the behavior. Something is just delaying the availability of the datastore on the SSD.
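In case it helps anyone chasing the same symptom, the state can be checked from the ESXi shell with standard esxcli/vim-cmd commands; the vmid placeholder below is whatever 'getallvms' reports for your VM:

```shell
# Confirm whether the VMFS datastore is mounted yet
esxcli storage filesystem list

# List registered VMs; entries backed by an unavailable datastore
# show up as invalid until storage comes online
vim-cmd vmsvc/getallvms

# Once the datastore is up, re-read a VM's config (vmid from getallvms)
vim-cmd vmsvc/reload <vmid>
```

Running the first two commands right after boot, and again once the VMs appear, shows how long the datastore takes to come online.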
Working with Dell, I think we found the solution for our server.
We have a PERC H355 RAID card that controls the SSD drive in our server. Dell recently (28 Nov 2022) updated the firmware for this H355 RAID card. The current version is 52.21.0-4606, A02.
https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=n3mp5
After updating this firmware and rebooting, ESXi loads normally -- and, all my VMs auto-start as expected. So, in my case at least, it was a Dell PERC firmware issue.
Hope this helps solve your issue, too!
Thanks for sharing that! It will help others running into this issue.
@nsharpaus Could you please go into a little more detail on how you updated the firmware? Did you do it through ESXi or one of your guests?
Hopefully once I try this, it will resolve my issue too.
I updated the firmware for the PERC H355 RAID Controller through the iDRAC/LC using the Windows DUP, per the instructions on the firmware update page:
iDRAC/LC:
1. Download the Windows DUP executable from https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=n3mp5
2. Upload the DUP to iDRAC or LC
3. Submit the job and reboot the system
It's not the driver in the VM/ESXi, but rather the firmware on the actual, physical RAID card that needs updating.
How do you run the DUP on the server? Do you run it in the guest OS?
I also just finished working with Dell. They said my RAID was up to date, but iDRAC and BIOS were not. After updating both, the problem STILL persists.
Any other thoughts?