VMware Cloud Community
Dryv
Enthusiast

HA vs Single Physical Server

Hi Guys,

I need a bit of theory here, please, specifically about what happens in a power outage.

So, first, VMware HA. If a single ESXi server completely loses power and simply powers off hard, a VM (if configured correctly) will restart on another host in the cluster. I believe the VM will then be in a crash-consistent state.

Now compare that with the same workload deployed not as a VM but on a single physical server (no SAN this time, just local disk). Again, pull the power and let it power off hard. Then restore power and let the server start; again, it should come back in a crash-consistent state.

In this particular abrupt power-off scenario, am I better off running the server as a VM or as a single physical server? In other words, is one likely to be more crash-consistent than the other and more likely to recover from the abrupt power-off? The fear is a blue screen when power is restored, since both went down abruptly.

As usual, thanks!

D

MKguy
Virtuoso

As far as I'm aware, ESXi does not apply any kind of software (write) caching or other transparent transformation of VM IO, such as reordering or dedupe. The VM's IO requests are passed to the block storage device as-is (extremely large IOs may be split, but the default split size is so large that practically no application issues IOs that big).
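To make the crash-consistency point concrete, here is a toy Python model. It is not VMware code, and every class and function name in it is hypothetical. `PassThroughDisk` behaves the way the post above describes ESXi: a write is acknowledged only once it is on stable media. `WriteBackCache` is an assumed layer of the kind ESXi reportedly does not have: it acknowledges writes while they are still in RAM and flushes them out of order. After a simulated power cut, only the pass-through path still holds every acknowledged write in the order it was issued.

```python
from dataclasses import dataclass, field


@dataclass
class PassThroughDisk:
    """Each write reaches stable media before the guest sees it complete."""
    media: list = field(default_factory=list)

    def write(self, block: str) -> None:
        self.media.append(block)  # persisted first, then acknowledged

    def power_loss(self) -> list:
        return list(self.media)   # nothing volatile, so nothing is lost


@dataclass
class WriteBackCache:
    """Hypothetical layer that acknowledges writes while they are still in
    RAM and later flushes them in a different order (newest first here)."""
    media: list = field(default_factory=list)
    volatile: list = field(default_factory=list)

    def write(self, block: str) -> None:
        self.volatile.append(block)  # acknowledged but not yet persisted

    def partial_flush(self, n: int) -> None:
        for _ in range(min(n, len(self.volatile))):
            self.media.append(self.volatile.pop())  # pops the newest write

    def power_loss(self) -> list:
        self.volatile.clear()        # RAM contents vanish with the power
        return list(self.media)


def acked_writes_survived(acked: list, survived: list) -> bool:
    """True if every acknowledged write is on media, in the order issued."""
    return survived[: len(acked)] == acked


if __name__ == "__main__":
    writes = ["journal-entry", "data-block", "commit-record"]

    disk = PassThroughDisk()
    for w in writes:
        disk.write(w)
    print(acked_writes_survived(writes, disk.power_loss()))   # True

    cache = WriteBackCache()
    for w in writes:
        cache.write(w)
    cache.partial_flush(1)  # only "commit-record" reaches the media
    print(acked_writes_survived(writes, cache.power_loss()))  # False
```

In the write-back case the commit record survives without the journal entry it commits, which is exactly the sort of state a guest filesystem cannot sort out on its own. With a pass-through path, the guest sees the same ordering guarantees it would get from local disk on a physical box.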

Also, just to note: HA doesn't really matter in this case. It would simply re-power-on the VM automatically from the same storage device, instead of you doing the same manually.

But it should be obvious that the additional virtualization layer adds complexity and more opportunities for failures or inconsistencies. Although VMFS is a journaling filesystem, a power failure might still corrupt the VMFS layer even where the guest OS filesystem and the application would have come through fine.
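For anyone unsure what journaling actually buys you, here is a minimal sketch of a generic write-ahead metadata journal in Python. It is illustrative only and makes no claim about how VMFS lays out or replays its journal; `JournaledStore` and its `crash` parameter are hypothetical. Recovery replays transactions that reached their commit record and discards an uncommitted tail, so each metadata update ends up either fully applied or not applied at all.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JournaledStore:
    metadata: dict = field(default_factory=dict)  # stands in for on-disk structures
    journal: list = field(default_factory=list)   # append-only log records

    def update(self, key: str, value: str, crash: Optional[str] = None) -> None:
        """One transaction: log the intent, write a commit record, apply.
        crash="before_commit" or "before_apply" simulates a power cut there."""
        self.journal.append(("set", key, value))
        if crash == "before_commit":
            return                      # no commit record reached the disk
        self.journal.append(("commit",))
        if crash == "before_apply":
            return                      # committed, but not yet applied in place
        self.metadata[key] = value
        self.journal.clear()            # checkpoint: log entries no longer needed

    def recover(self) -> None:
        """Replay committed transactions; drop any uncommitted tail."""
        pending = []
        for record in self.journal:
            if record[0] == "set":
                pending.append(record)
            elif record[0] == "commit":
                for _, key, value in pending:
                    self.metadata[key] = value
                pending.clear()
        self.journal.clear()


if __name__ == "__main__":
    store = JournaledStore()
    store.update("a", "1")
    store.update("b", "2", crash="before_commit")
    store.recover()
    print(store.metadata)  # {'a': '1'}
    store.update("c", "3", crash="before_apply")
    store.recover()
    print(store.metadata)  # {'a': '1', 'c': '3'}
```

Note the built-in assumption: replay is only correct if the journal records reached the disk in the order they were written and were durable when acknowledged. Damage outside the journal's model, such as a misbehaving controller or a torn write to a structure the journal does not cover, is not something replay can fix, which is one way corruption can still slip through.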

Our data recovery guru continuum can probably tell you more than a few horror stories of broken VMFS volumes (due to various causes, including but not limited to power failures).

In the end, it comes down to the reliability of the hardware, the resilience of the application software and sheer luck. I can't stress this enough: this is why you always need to have backups.

-- http://alpacapowered.wordpress.com
Dryv
Enthusiast

Hi MKguy,

Thanks for the detailed response. The case being made against me is about the complexity of virtualising single-server app servers, which currently have zilch application-level resilience. My position is that in a power outage, the scenario everyone has started to hold onto (they are against virtualising), we are in no worse position by virtualising, only a better one, given we benefit from all the other features of a properly designed virtual platform. I found this article, which makes a pretty good case that we get the same IO consistency as on physical:

VMware KB: Storage IO crash consistency with VMware products

In a crash I can't guarantee that the VM won't blue screen or that the app won't get corrupted! But I also don't think they can guarantee that on physical hardware either. Can they really argue they are better off with their single-server app setups?

Completely agree, backups are key.

MKguy
Virtuoso

Sadly, I know exactly what kind of people you have to deal with. The best you can do is an apples-to-apples comparison: build two identical setups, one virtual and one physical, and compare how they behave in a realistic failure scenario (while you're at it, you can also showcase vMotion/HA and how they help reduce downtime). Then have management decide which way to go and have them clearly communicate that decision to everyone.
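One way to make that realistic failure scenario measurable is a pull-the-plug test run identically on the physical box and on the VM. The Python sketch below is hypothetical (script and file names are placeholders, not part of any VMware tool) and assumes `os.fsync` is honored end to end, i.e. no volatile cache anywhere in the stack acknowledges flushes early. The writer prints each record number only after `fsync` returns, so watch its output from another machine; after power is restored, `verify` should report a number at least as high as the last one printed.

```python
import os
import sys


def write_records(path: str, count: int) -> None:
    """Append numbered records, forcing each one to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for i in range(count):
            os.write(fd, f"{i:010d}\n".encode())
            os.fsync(fd)             # returns once the storage stack reports it durable
            print(i, flush=True)     # watch this from another machine (e.g. over SSH)
    finally:
        os.close(fd)


def verify(path: str) -> int:
    """Return the last intact record number; raise ValueError on gaps or garbage."""
    last = -1
    with open(path, "rb") as f:
        for line in f:
            if not line.endswith(b"\n"):
                break                # torn final record: fine, it was never acknowledged
            value = int(line)        # raises ValueError on garbage such as zeroed blocks
            if value != last + 1:
                raise ValueError(f"expected record {last + 1}, found {value}")
            last = value
    return last


if __name__ == "__main__" and len(sys.argv) == 3:
    # hypothetical usage: plugtest.py write FILE  |  plugtest.py verify FILE
    mode, target = sys.argv[1], sys.argv[2]
    if mode == "write":
        write_records(target, 10_000_000)
    else:
        print("last intact record:", verify(target))
```

Run the writer on the system under test, cut power mid-run, reboot, then run the verify mode against the same file. A gap, garbage, or a last intact record lower than the last number printed means an acknowledged write was lost. Repeating the sequence several times on each platform gives a like-for-like comparison rather than a single anecdote.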

As you mentioned, nobody can guarantee there won't be data corruption or other issues after a hard power failure, but this goes for virtual and physical systems alike. We all know Murphy's law and how it slaps us when we least expect it. Your tests may run fine and an actual outage later could still cause issues. But again, it's impossible to say that wouldn't have happened on a physical setup.

If they are really that worried about corruption from power failures, maybe they should invest in a decent UPS first, regardless of whether the application runs virtual or physical.
