spchurchill
Contributor

Delay VM power-up until SAN is ready after power loss

Hi,

I'm trying to work out how to delay VMs starting up after a major power failure. We recently had a power cut which lasted longer than our UPS could support, so our SAN and the ESX hosts lost power. When the power came back on, all of the hardware came back online correctly, but the SAN took longer to become ready than the ESX hosts (as would be expected). As soon as the hosts were ready, HA tried to start some VMs, but the SAN wasn't ready so it didn't work. Obviously we'll get onto site as soon as possible in this situation to do some things manually, but it'd be nice if it recovered as much as possible on its own.

I know how to change the restart priority of the VMs (Cluster > Settings > HA > VM Options), but this is only relative to the other VMs in the list, so if you set them all to Low they will still come on as fast as if you'd set them all to High. What I'm after is a feature that tells HA to wait a further 5 minutes after the host is ready before starting any VMs, because other supporting infrastructure might not be ready yet. I guess the "other supporting infrastructure" could apply to all sorts of bits like network switches and WAN links etc. (although they're unlikely to take as long as the host to come back online).
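To make the behaviour I'm describing concrete, here is a rough Python sketch: poll the storage until it answers, then wait a further grace period before anything starts VMs. This is not an existing HA feature. The function names, the host name and the port in the usage note are all assumptions for illustration; a TCP probe like this only makes sense for IP-based storage such as iSCSI, and a port that accepts connections does not prove the LUNs are actually presented yet.

```python
import socket
import time


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_storage(host, port, max_wait=1800, interval=15, grace=300,
                     probe=port_is_open, sleep=time.sleep):
    """Poll until the storage answers, then wait an extra grace period.

    Returns True if the storage became reachable within roughly
    max_wait seconds, False if we gave up. probe and sleep are
    parameters so the logic can be exercised without a real SAN.
    """
    waited = 0
    while not probe(host, port):
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
    # Storage answered: hold off a little longer (the "further 5 minutes").
    sleep(grace)
    return True
```

Something like `wait_for_storage("san.example.local", 3260)` (hypothetical host name, 3260 being the usual iSCSI port) would then gate whatever step powers the VMs on.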

I'm aware this feature might not exist and this might be impossible but I thought it was worth asking.

Thanks,

Sam

8 Replies
chill
Hot Shot

What about the Virtual Machine Startup/Shutdown option under the Configuration tab of the ESX host? You can delay the startup for as long as you want. Will this help with your scenario? This is assuming that you are using VirtualCenter.

If you find this information helpful or correct, please consider awarding points.
spchurchill
Contributor

Thanks Chill, I think that's very close to what I want, but I get the feeling it isn't actually related to HA: we don't have any of the VMs set to start automatically at the moment, and yet HA still tries to start them.

Interestingly, I was looking at this setting just the other day on VMware Server, and the delay seems to apply after the specified VM's startup rather than before, which means that it's still impossible to delay the first one. (I'm happy to be corrected on this if I've got it wrong!) In that case I was trying to delay the startup of a VM which needed access to a SQL server running on the Windows 2003 host machine. SQL had not started and was not yet responsive by the time the service on the VM started and tried to get its data.

korpy
Enthusiast

Hi,

On some systems you can set a startup delay in the BIOS settings. I've used this to solve this problem.

regards -frank-

spchurchill
Contributor

That sounds like a good plan Korpy, I'll have a look at doing that.

spchurchill
Contributor

I've had a look at the BIOS boot delay settings, but it occurs to me that the delay will apply every time the VM boots rather than just when HA is starting it after a failure. When I'm restarting VMs during normal working hours I don't want to sit through this delay every time, as I know the SAN will be up and running at that point.

I do find this a bit strange, as I thought HA was one of the fundamental advantages of using VMware VI, but it doesn't seem to recover after this sort of failure. What do other people do in this situation, or do you simply not expect your environment to recover after a power outage?

Thanks, Sam

Tigerstolly
Enthusiast

Hi,

I've done this in the past with APC's managed rack PDUs. They can be programmed to supply power first to the SAN and Ethernet switch infrastructure, and then to the servers after, say, 120 seconds.

This will delay the startup of the physical hosts, so it's a physical solution to a virtual problem.

edit....

This was the product I used:

http://www.apcc.com/products/family/index.cfm?id=70&tab=features#anchor1

spchurchill
Contributor

Thanks, that's a good idea. I rang APC about it, but they didn't seem to understand what I was asking for and started saying that PowerChute sorted that out. I'll see what I can work out and get budget for.

I'm also investigating putting the delay into the BIOS of the ESX servers that we have (HP DL380 G5), but I think it may be limited to about a minute.

jpaf
Contributor

Hi,

I know this is an old thread, but has anyone found a solution to this? Following numerous recent power outages, my organization has run into the same issue: some ESX servers depend on SANs and cannot boot until the SANs have booted first. Sometimes the server just sits there until someone realizes something is down and manually reboots it. It would be much appreciated if the server could repeatedly try to boot if it is unable to do so initially. On some servers the host boots but some VMs don't.
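For the case where the host boots but some VMs don't, the kind of "keep trying" behaviour I have in mind would look roughly like the Python sketch below. It is only an illustration of the retry logic: `power_on` is a hypothetical callable (whatever you use to start a VM, returning True on success), not a real VMware API, and the attempt count and delay are arbitrary.

```python
import time


def power_on_with_retries(vm_names, power_on, attempts=5, delay=60,
                          sleep=time.sleep):
    """Try to start each VM, retrying only the ones that failed.

    power_on is a hypothetical callable taking a VM name and returning
    True on success. Returns the list of VMs that never started.
    """
    pending = list(vm_names)
    for attempt in range(attempts):
        # Keep only the VMs whose power-on attempt failed this round.
        pending = [vm for vm in pending if not power_on(vm)]
        if not pending:
            break
        if attempt < attempts - 1:
            sleep(delay)  # give the storage time to come up before retrying
    return pending
```

Anything still in the returned list after the last attempt would need someone to look at it manually.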

The servers we have are mostly HP ProLiant DL380p Gen8 servers.
