Re: ESX SAN Recovery Best Practices

dquintero · 07-24-2008

Hi, I'm a newbie in VMware ESX implementation services, and although I have covered all the basics, there are certain aspects that are still unknown to me.

I was wondering if anyone could help me with this. Right now I'm working with a client that had a major SAN failure (the storage system suddenly went down). After we helped get the storage back up, he found out that one of his virtual machines had been corrupted, either by the process he used to bring it up after the failure or by the failure itself.

In short, he tried to start the virtual machine and it failed, saying that the disk wasn't found. He then located the disk manually and started the VM, and although it started okay, its state didn't match his latest snapshot. Next he tried to recover an older snapshot, which made things worse because that snapshot wouldn't start, so he ended up losing two days' worth of business data.

So, after all this, my question is: best-practice-wise, what is the right procedure to follow when facing this kind of problem?

Regards

Daniel Quintero

whynotq · 07-24-2008

May I say, firstly, welcome to the family....

As far as your query goes, I'd start by implementing a backup policy for the VMs: analyse your RTO (recovery time objective) and RPO (recovery point objective) requirements and build the policy around them. I wouldn't rely solely on crash-consistent snapshots to roll back to, although they are usually good enough. It seems strange that one machine was affected while the others were OK. SAN failures are fairly rare, though, and it is hard to say what this particular VM was doing at the time; it may have been attempting a VMotion when the SAN went down.

The only thing I'd have tried would have been an fsck from the ESX host before the VMs were brought back up.
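The advice to build the backup policy around the RPO can be made concrete with a little arithmetic. The sketch below is a hypothetical illustration, not a VMware tool: it assumes backups start at a fixed interval and that a recovery point only becomes usable once its backup has finished, so the worst-case data loss is the backup interval plus the backup duration. The function names and the 4-hour target are made up for this example.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Hypothetical helper: upper bound on lost data for a periodic backup schedule.

    Assumes a backup starts every `backup_interval` and its recovery point only
    becomes usable once the backup finishes. If the failure hits just before the
    next backup completes, everything since the previous backup's start is lost.
    """
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta, rpo: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """Hypothetical helper: True if the schedule's worst case fits within the RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


if __name__ == "__main__":
    rpo = timedelta(hours=4)          # example target, not taken from the thread
    duration = timedelta(hours=1)     # assumed time for one backup to complete
    for interval in (timedelta(hours=24), timedelta(hours=2)):
        loss = worst_case_data_loss(interval, duration)
        ok = meets_rpo(interval, rpo, duration)
        print(f"backup every {interval}: worst-case loss {loss}, meets RPO: {ok}")
```

Under these assumptions, a nightly backup that takes an hour leaves up to 25 hours of data at risk, so it would not meet a 4-hour RPO, while a 2-hour schedule would.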

dquintero · 07-25-2008

Hehe, thanks for the warm welcome.

Another question: is there a white paper, article or document that covers in more depth how ESX handles partial faults, such as the loss of an HBA, the loss of a path to the SAN, the loss of a SAN controller, etc.?

Daniel Quintero

kjb007 · 07-25-2008

See these:

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
