VMware Cloud Community
SCPlus
Contributor

ESXi Delay First VM Autostart

Hi there,

I am running ESXi 7.0 Update 3 on a Dell server. Autostart is enabled under Host > Manage > Autostart (see settings in the screenshot below). The issue is that as soon as ESXi powers on, it attempts to start the VM. However, the disk is not yet available, so it throws the error "Failed - The attempted operation cannot be performed in the current state (Powered off)."

How can I delay the start of this VM until the disk starts and the VM is available?  Why will it not wait for the start delay I have set (300 sec)?

[Screenshot of Autostart settings: SCPlus_0-1668716820103.png]

 

Thank you in advance!

 

Edit to add solution:

 

Thank you @StephenMoll for the help! Using @StephenMoll's code (marked as the answer), I was able to delay ESXi from trying to autostart until the datastore was available. Having the script running in local.sh stops ESXi from initiating its autostart until after local.sh has completed. I did not even have to start the VMs in the script; that is still done by ESXi autostart.

As mentioned in another comment, the code needs to be placed in /etc/rc.local.d/local.sh

local.sh only runs when UEFI Secure Boot is disabled in the BIOS.
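
For anyone landing on this post first, the idea can be sketched roughly as below. This is not @StephenMoll's script (see the accepted answer for that). It is a minimal illustration that assumes the datastore appears as a directory under /vmfs/volumes once it is mounted. The function name `wait_for_path` and the datastore name `datastore1` used in the note afterwards are placeholders, not values from this thread.

```shell
#!/bin/sh
# Sketch only: poll until a directory exists or a timeout expires.
# Arguments: <path> <timeout in seconds> <poll interval in seconds>
wait_for_path() {
    path="$1"
    timeout="$2"
    interval="$3"
    waited=0
    while [ ! -d "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            # Give up so the boot sequence is never blocked indefinitely.
            echo "local.sh: gave up waiting for $path after ${waited}s"
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "local.sh: $path available after ${waited}s"
    return 0
}
```

In local.sh, a call such as `wait_for_path /vmfs/volumes/datastore1 300 5` would go above the final `exit 0` line. Because autostart did not begin until local.sh had finished in my case, the wait is what holds autostart back; the timeout is there so a missing datastore cannot hang the boot forever.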

61 Replies
SCPlus
Contributor

I do not think I can put a VM on the BOSS card. There does not appear to be a way to reach it from ESXi.

I had tried before to use scripting to delay the boot, but it seems ESXi overwrites all the files and does not save any changes I make.

SCPlus
Contributor

Unfortunately, no it does not.  The problem is that the VM is not even attempting to Autostart because it is trying to start before the datastore is available.

SCPlus
Contributor

I am replying to this again, because I am not sure it went through the first time.

This solution does not work because the VM is completely failing to Autostart in the first place.  The issue is to delay the start of the VM until the datastore is available.

StephenMoll
Expert

I wouldn't expect it to, unless you were able to get the dummy onto the boot device somehow.

What environment are we talking about? vSphere 6.7, 7.0?

Does the host boot with BIOS or EFI?

 

/etc/rc.local.d/local.sh is an editable script location that is persistent. We use it on some of our systems. However, I believe it becomes disabled or ignored if using EFI boot and/or TPM assurance.

nsharpaus
Contributor

I may have found an issue in our hardware configuration (BOSS boot device + SSD datastore). Our datastore was using the High-Performance Plug-in (HPP), which is typically reserved for NVMe devices. I confirmed that our SSD and BOSS card were both claimed by HPP with 'esxcli storage hpp device list'. When I ran 'esxcli storage nmp device list', no devices were shown.

The vmkernel logs indicate that HPP does not bring the local datastore online until 20+ seconds after boot. I then changed the SSD to use the Native Multipathing Plug-in (NMP). After the change, the SSD appears in the NMP device list.
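
To make the check above easy to repeat, here is a small sketch that runs the two listing commands quoted in this post back to back. The helper name `show_plugin_devices` is made up for illustration, and the snippet only lists devices; it does not change which plug-in claims them.

```shell
#!/bin/sh
# Sketch only: list the devices claimed by each multipathing plug-in.
# Uses the two esxcli commands mentioned above; meant for the ESXi shell.
show_plugin_devices() {
    plugin="$1"
    if ! command -v esxcli >/dev/null 2>&1; then
        echo "esxcli not found; run this on the ESXi host"
        return 1
    fi
    echo "== Devices claimed by $plugin =="
    esxcli storage "$plugin" device list
}

# Stop after the first failure (for example, when esxcli is unavailable).
for plugin in hpp nmp; do
    show_plugin_devices "$plugin" || break
done
```

A device that appears under one heading and not the other tells you which plug-in currently owns it.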

I found this article related to HPP vs NMP, which does say that slower SSDs should not be set to use HPP:

https://4sysops.com/archives/vmware-vsphere-nvme-high-performance-plug-in-hpp/ 

Some other VMware documentation on the different plugins:

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.storage.doc/GUID-9DED1F73-7375-4957...

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.storage.doc/GUID-F7B60A5A-D077-4E37... 

That said, unfortunately, this change has not solved my auto-start issues. But, I thought it was worth mentioning if your configuration was similar.

depping
Leadership

I filed two feature requests for you:

1. ability to delay the auto-start of the first VM!

2. ESXi not trying to auto-start a VM until the resources are accessible (storage in this case)

As always, no guarantees if and when this would potentially make it into the product.

SCPlus
Contributor

Thank you!  This is a major issue in this application.  Without being able to Autostart, I will have to manually start these VMs any time power is lost.

depping
Leadership

That is why vSphere HA exists, but I am guessing you have a single host and no shared storage.

SCPlus
Contributor

I do have two servers, but there is no shared storage.  They are meant to operate independently for redundancy.

depping
Leadership

You could add a pause to the local.sh script, which is fully supported for modification, as mentioned here:

https://kb.vmware.com/s/article/2043564

You could use something like:

read -t 60 -p "waiting for 60 seconds"
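
One caveat, stated as an assumption rather than something tested in this thread: `read` waits on standard input, so if local.sh runs without an attached terminal it may return straight away instead of pausing. A plain `sleep` avoids that dependency. The helper name `boot_pause` below is purely illustrative.

```shell
#!/bin/sh
# Sketch only: fixed pause that does not depend on standard input.
# Argument: <seconds to pause>
boot_pause() {
    seconds="$1"
    echo "local.sh: pausing ${seconds}s before continuing"
    sleep "$seconds"
}
```

Calling `boot_pause 60` above the final `exit 0` in local.sh would give the same 60-second delay as the `read` line.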

StephenMoll
Expert

I already mentioned the use of local.sh, but I wasn't sure it was a definite possibility for them. 

If they are using EFI secure boot, won't it be ignored?

nsharpaus
Contributor

Our deployment is similar -- single or redundant server configuration with no shared storage.

I see that VMware has now released ESXi 7.0 Update 3i, and it mentions more datastore fixes. Hopefully, Dell's custom version will be out soon so we can try that as a resolution.

https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-esxi-70u3i-release-notes.html 

Thank you all for your ideas and help -- this one's been challenging!

nsharpaus
Contributor

I tried adding a delay to the local.sh file, but the file is apparently not executed on our server. I briefly saw a warning about UEFI secure boot, but I didn't have a chance to read it on screen as it flashed by.
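
Since the warning flashed by too quickly to read, it may be easier to query the Secure Boot state from the ESXi shell. The sketch below assumes the status tool lives at /usr/lib/vmware/secureboot/bin/secureBoot.py and accepts `-s`, which is how I recall it on recent ESXi releases; please verify the path and flag on your build before relying on it. The function name `check_secure_boot` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: report the UEFI Secure Boot state from the ESXi shell.
# ASSUMPTION: tool path and the -s (status) flag; verify on your ESXi build.
SECUREBOOT_TOOL="/usr/lib/vmware/secureboot/bin/secureBoot.py"

check_secure_boot() {
    if [ -x "$SECUREBOOT_TOOL" ]; then
        "$SECUREBOOT_TOOL" -s
    else
        echo "secureBoot.py not found; run this on the ESXi host"
    fi
}

check_secure_boot
```

If it reports that Secure Boot is enabled, that would be consistent with local.sh being skipped, as noted elsewhere in this thread.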

I see the exact same behavior as I did previously, even with the local.sh file modifications. My VMs show up as "invalid" until 20-30+ seconds have passed; then, they magically show up in the list. This explains why they aren't auto-started. But, I still don't know why the datastore just isn't available for a long time.

The HPP/NMP plugin theory made sense, but it didn't change the behavior. Something is just delaying the availability of the datastore on the SSD.

nsharpaus
Contributor

Working with Dell, I think we found the solution for our server.

We have a PERC H355 RAID card that controls the SSD drive in our server. Dell just recently updated the firmware for this H355 RAID card on 28 Nov 2022. The current version is 52.21.0-4606, A02.

https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=n3mp5 

After updating this firmware and rebooting, ESXi loads normally -- and, all my VMs auto-start as expected. So, in my case at least, it was a Dell PERC firmware issue.

Hope this helps solve your issue, too!

depping
Leadership

Thanks for sharing that! That will help others running into this issue.

SCPlus
Contributor

@nsharpaus Could you please go into a little more detail on how you updated the firmware? Did you do it through ESXi or one of your guests?

Hopefully once I try this, it will resolve my issue too.

nsharpaus
Contributor

I updated the firmware for the PERC H355 RAID Controller through the iDRAC/LC using the Windows DUP, per the instructions on the firmware update page:

IDRAC/LC
1. Download the Windows DUP Executable from https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=n3mp5 
2. Upload DUP to IDRAC or LC
3. Submit job and reboot the system

It's not the driver in the VM/ESXi, but rather the firmware on the actual, physical RAID card that needs updating.

SCPlus
Contributor

How do you run DUP on the server?  Do you run it in the Guest OS?

SCPlus
Contributor

I also just finished working with Dell. They said my RAID was up to date, but iDRAC and BIOS were not. After updating both, the problem STILL persists. 

Any other thoughts?

SCPlus
Contributor

I also worked with Dell support. They said my RAID was up to date, but iDRAC and BIOS were not. After updating both, I STILL have the problem. Any other ideas?
