I'd like to have some feedback on what we are trying to achieve here...
We have two distant rooms on the same site for disaster recovery, with ESX host(s) in each room, each with its own SAN. We would like the best possible solution for an Oracle database with transparent failover, which means live replication between the two rooms.
Our current idea is to use RAC and ASM, with an ASM disk group containing disks from the SAN in each room. The point is to avoid any kind of SAN replication and let Oracle handle everything. I would then configure the Oracle virtual servers in rooms #1 and #2 with disks from both the SAN in room #1 and the SAN in room #2, all in the same ASM disk group.
The way I understand this, Oracle ASM will handle the "replication" and if Room #1 burns down, RAC will allow business to continue transparently.
Am I daydreaming?
What about having disks from two SANs in different rooms in one ASM disk group? Is this viable?
Any other ideas / possibilities to solve our issue?
Thank you very much!
Do you really want RAC on top of VMware though? Both are techniques for partitioning hardware and so I don't really see the benefit. Yes, RAC runs nicely on ESXi lab servers but I'm not sure I'd want to have the two technologies together in production.
If it were physical RAC, that certainly is feasible, though there have been some ASM bugs affecting these cross-site systems. Don't forget you'll need a small server at a third site for the voting disk to prevent split-brain scenarios.
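For a physical extended cluster, the usual layout is a normal-redundancy disk group with one failure group per room, plus a quorum failure group at the third site for the voting file. A rough sketch (disk paths and names below are invented for illustration, using 11.2-style syntax):

```sql
-- Sketch only: disk strings and group names are made up.
-- One failure group per room, so ASM mirrors every extent across the two SANs;
-- the QUORUM failgroup at the third site holds only the voting file.
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP room1 DISK '/dev/mapper/san1_disk1', '/dev/mapper/san1_disk2'
  FAILGROUP room2 DISK '/dev/mapper/san2_disk1', '/dev/mapper/san2_disk2'
  QUORUM FAILGROUP site3 DISK '/dev/mapper/site3_quorum_disk'
  ATTRIBUTE 'compatible.asm' = '11.2';
```

With that layout, losing everything in room #1 still leaves a full mirror of the data on SAN #2, and the quorum disk at the third site lets the surviving node keep the cluster majority.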
You do also need to consider whether a RAC cluster is justified (i.e. do you need horizontal scalability and as little unplanned downtime as possible?). Otherwise you might be better with VMware and Data Guard (or 3rd party standby management) or even using vMotion etc.
I clearly understand your points. This is the solution our Oracle DBAs are used to, and I just presented my case this way.
Note I know nothing about Oracle.
To my understanding, RAC / CRS is quite similar to Windows clustering, and I agree with you it may not make sense when used under vSphere.
I am more concerned about having the database portion replicated in real time. That is why I described my scenario with an ASM disk group containing disks from two different sites. Those two sites (computer rooms) will be connected by FC and fast Ethernet, as if they were the same room. Thinking about this concept, I was wondering whether the data is effectively written to all the disks of the group, and, if room #1 (SAN #1 and ESXi #1) dies, whether the database will be in a crash-consistent state when I start the VM in room #2 with the disks from SAN #2.
You talked about "some ASM bugs affecting these cross site systems". Do you know where I can find this information?
Again, my main issue is replicating this data to the other site with no loss of information. With Data Guard, won't a substantial amount of time be needed to turn the standby into an active database and effectively replace my production site?
Thanks again for any advice.
Sorry not to reply sooner - I must have missed the email.
I don't have the exact bug number for the ASM issue, but there should be some notes on it in My Oracle Support. I've just searched MOS for "ASM Extended Cluster" and nothing obvious has jumped out, but I know from some user group sessions I attended that there were issues (maybe fixed now), so it would be worth spending an hour or two digging. It was related to split-brain issues if you lose the interconnect, I think.
Switching over to a standby managed by Data Guard can be pretty quick (say, a minute or so). Of course you will lose all your connected sessions (compared to a RAC cluster, where you might lose, say, half of them), but if you don't need RAC for scalability reasons, Data Guard is a far less complex solution (and well understood by most DBAs). Switching the primary back to the other site these days is also much easier using flashback and then rolling forward. Plus, with Data Guard you have the benefit of a standby database you can check, take backups from, etc.
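To give a feel for it, a planned role change with a physical standby is only a couple of statements (this is a simplified sketch of the manual, non-broker steps; with the broker it's a single SWITCHOVER command). For no loss of information you'd also want synchronous redo transport:

```sql
-- Sketch: manual switchover without the broker (steps simplified;
-- the old primary also needs a restart as a standby afterwards).
-- On the current primary:
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN;

-- On the old standby (now becoming primary):
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
ALTER DATABASE OPEN;

-- Zero data loss needs a synchronous protection mode, e.g. on the primary:
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;
```

In Maximum Availability mode, redo is shipped synchronously, so a failover should not lose committed transactions, which addresses the "no loss of information" requirement above.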
These are all just my opinions of course; others may disagree (or suggest a VMware way of doing things like vMotion etc) 😉