I currently have 1 SAN with 2 x SAN switches and 3 ESXi 5 hosts in a building.
We have purchased a second SAN with 2 x SAN switches and placed it in building 2 on the same campus. Both SANs are fibre based and have been cross-connected by linking SAN 1 Switch 1 to SAN 2 Switch 1, and SAN 1 Switch 2 to SAN 2 Switch 2.
Currently I am using SAN 1 as our production SAN, with all live VMs running on it, while SAN 2 is the store for our VDR appliances. So essentially SAN 1 holds production data and SAN 2 acts as a backup SAN.
What I would like to do going forward is put 3 ESXi 5 hosts in building 2. At this point I would like to do the following:
For any VM which has its own mechanism of replication, simply create another separate VM on SAN 2. This would cover, for example, DCs, DFS servers, Exchange DAG members, etc. In theory this provides the highest level of availability, because the loss of anything in building 1 should still leave the application available via the infrastructure in building 2.
For any VM which is just a single instance (for example, a third-party application server), run it on SAN 1 with the option to move it to SAN 2.
At this point, have VDR appliances in each building which back up to a backup area on the other building's SAN.
Now here is where I am getting stuck. The SANs I have (EMC VNX) support replication, although we have not purchased that licence yet, so this is an option to pursue (and possibly also use it for the backup LUNs and run fewer VDR appliances). Alternatively, I have the simple restore from a VDR backup onto the building 2 SAN.
My questions revolve around how I should do this from both a VMware point of view, and a networking point of view.
Should I create 2 separate clusters in VMware, or use DRS groups and have a single cluster? Two separate clusters seem the most logical and easiest to maintain, but what would I lose?
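For context on the single-cluster route: DRS host/VM groups can pin building 1's VMs to building 1's hosts with a "should run" rule, so HA can still restart them in building 2 after a failure. A rough sketch using recent PowerCLI cmdlets (the cluster, host, and VM names here are placeholders, not anything from my environment):

```
# Assumes an existing session: Connect-VIServer vcenter.example.local

# Group the hosts in each building
New-DrsClusterGroup -Cluster "Campus" -Name "Bldg1-Hosts" -VMHost "esx1","esx2","esx3"
New-DrsClusterGroup -Cluster "Campus" -Name "Bldg2-Hosts" -VMHost "esx4","esx5","esx6"

# Group the VMs that should normally live in building 1
New-DrsClusterGroup -Cluster "Campus" -Name "Bldg1-VMs" -VM "app01","app02"

# "ShouldRunOn" keeps these VMs on building 1 hosts under normal operation,
# but HA is allowed to violate the rule during a failure
New-DrsVMHostRule -Cluster "Campus" -Name "Bldg1-Affinity" `
    -VMGroup "Bldg1-VMs" -VMHostGroup "Bldg1-Hosts" -Type ShouldRunOn
```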
Also, the networking is proving confusing. Currently we have a flat L2 network, so originally this was not going to be an issue at all; however, the network team have decided to implement a geographical IP scheme, allocating a class B per building and dividing it into lots of class C networks. So essentially building 2 will have a different IP range.
After some research I can see I have a few options, from simply allocating a "floating" range and stretching the VLAN (this seems fairly simple on the ProCurve switches we have), to using things such as VXLAN, OTV, etc. (which I do not fully understand, nor do I seem to have the kit for). Can anyone comment on this?
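For reference, the "simple" option on ProCurve kit is just tagging the same VLAN ID on the inter-building uplinks on both sides. Something like the following, where the VLAN ID, name, and port identifiers are made up for illustration:

```
; Building 1 switch -- tag the stretched VLAN on the uplink to building 2
vlan 100
   name "VM-Stretched"
   tagged A1
   exit

; Building 2 switch -- same VLAN ID, tagged on its uplink back to building 1
vlan 100
   name "VM-Stretched"
   tagged B1
   exit
```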
Thinking about stretching the VLAN, my obvious question is what happens with the gateway? I was thinking of some kind of VRRP setup to eliminate this weakness, but I have also seen suggestions around having a second gateway on the stretched end and simply reconfiguring the gateway of the server being moved (without changing its IP, etc.).
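The VRRP idea would mean a router in each building sharing one virtual gateway address on the stretched VLAN. On ProCurve models that support VRRP it would look roughly like this (exact syntax varies by model and firmware, and all addresses here are assumptions):

```
; Enable VRRP globally, then configure under the stretched VLAN
router vrrp

vlan 100
   ip address 10.1.100.2 255.255.255.0
   vrrp vrid 1
      backup
      virtual-ip-address 10.1.100.1 255.255.255.0
      priority 200
      enable
      exit
   exit
```

The building 2 router would carry the same VRID and virtual IP with a lower priority, so servers keep 10.1.100.1 as their gateway regardless of which building they (or the active router) are in.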
The other question around this is whether I can move a single server, as opposed to the entire IP range.
The replication you are talking about seems more akin to DR/site recovery, but you seem to be describing on-demand agility between buildings.
If you are talking about vMotion between the buildings, you'll need a R/W copy of your datastores on both sides to achieve this, even if your L2 network is extended. EMC VPLEX or IBM SVC can do this (stretched cluster storage).
But if you're saying you can take a hit for the time it takes to make a destination LUN R/W (manually or automated), then the next step is how to deal with an IP change. If you want to make the process as automated as possible you might need to look at SRM, but that requires 2 vCenter Servers.
Talk to your network folks about static-NAT-type functionality, where your VMs' IPs can stay the same even when they move to a different subnet. VMware vShield also has this NAT capability, by the way.
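To illustrate the static NAT idea in generic Cisco IOS syntax (addresses and interfaces are invented for the example): a VM that has been re-addressed into the building 2 range can still be reached on its old building 1 address via a one-to-one mapping at the router.

```
! Building 2 router: clients still reach the VM on its old building 1
! address (10.1.100.10), statically mapped to its new building 2 address.
interface GigabitEthernet0/0
 ip nat inside
!
interface GigabitEthernet0/1
 ip nat outside
!
ip nat inside source static 10.2.100.10 10.1.100.10
```

The obvious caveat is that every such move needs a NAT entry (and DNS either keeps the old address or is updated), so this suits moving a handful of servers, not a whole range.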