VMware Cloud Community
andrebrownjm
Contributor

SRM with LeftHand SAN in Multi-site configuration

I'd appreciate it if anyone with HP/LeftHand experience could comment on the following.

I've got a LeftHand SAN (two nodes) which is currently configured in a multi-site setup with one management group. From my research I have discovered that I need to have two management groups to get the SRA to recognize the SAN as being capable of replication.

My first question is this: can I have a multisite configuration with two management groups with only two LeftHand nodes? I've seen references to using VSA to create a second management group. Does the VSA management group need to have one of the nodes added, or does it not matter?

My second question is related, but of a different nature. In the same multi-site setup, is it possible to have two ESXi hosts on two different networks connected to the same volume? Here is the setup:

Site 1 ESXi:

VM and management network: 192.168.101.x

iSCSI network: 192.168.98.x

Site 2 ESXi:

VM and management network: 192.168.201.x

iSCSI network: 192.168.206.x

SAN VIPs:

192.168.98.11

192.168.206.11

Since the SAN is set up as a multi-site SAN, can I have both ESXi hosts connected to the same volume? I've found that I can connect one or the other, but not both at the same time. And sometimes, after connecting and disconnecting one host, I can't connect the second.

Thanks for any help.

The answer you seek is *+5,2*3,2

14 Replies
andrebrownjm
Contributor

OK, I got the answer to the "two subnets, one volume" scenario. You can get the details here. The summary is this: whenever a SAN volume is accessed by an iSCSI initiator, the volume becomes bound to the virtual IP through which the access was initiated, and therefore it is also bound to the subnet of that initiator and virtual IP. The VMware iSCSI initiator does not support accessing targets outside its own subnet. Therefore, two servers cannot be connected to the same volume using initiators on two different subnets.
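
To make the constraint concrete, here is a minimal Python sketch using the addresses from my first post. It assumes every network uses a /24 mask (the posts only show "x" for the last octet), and the host labels are just names I made up for the example. It lists which SAN VIP falls inside each host's iSCSI subnet.

```python
import ipaddress

# Assumption: all networks from the original post are /24.
ISCSI_NETS = {
    "site1-esxi": ipaddress.ip_network("192.168.98.0/24"),
    "site2-esxi": ipaddress.ip_network("192.168.206.0/24"),
}
SAN_VIPS = [
    ipaddress.ip_address("192.168.98.11"),
    ipaddress.ip_address("192.168.206.11"),
]

def reachable_vips(net, vips):
    """Return the VIPs that sit inside the initiator's own subnet."""
    return [str(v) for v in vips if v in net]

for host, net in ISCSI_NETS.items():
    print(host, reachable_vips(net, SAN_VIPS))
```

Each host only sees the VIP on its own subnet, so whichever host connects first binds the volume to a VIP the other host cannot use.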

However, there is a workaround: create an additional VMkernel port on each host for the other VLAN, so that each host has VMkernel ports on both subnets.
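
A rough Python sketch of why the extra VMkernel port helps. The host-side addresses (.21) are hypothetical and the /24 masks are an assumption; only the two VIPs come from the original post.

```python
import ipaddress

# VIPs from the original post.
VIPS = ["192.168.98.11", "192.168.206.11"]

def vips_on_local_subnets(vmk_interfaces, vips):
    """Return the VIPs that share a subnet with at least one VMkernel port."""
    nets = [ipaddress.ip_interface(i).network for i in vmk_interfaces]
    return [v for v in vips
            if any(ipaddress.ip_address(v) in n for n in nets)]

# Hypothetical Site 1 host, before and after adding the second port.
before = ["192.168.98.21/24"]
after = ["192.168.98.21/24", "192.168.206.21/24"]

print(vips_on_local_subnets(before, VIPS))
print(vips_on_local_subnets(after, VIPS))
```

With a port on each subnet, either VIP is local to the host, so it no longer matters which VIP the volume was bound to first.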

The answer you seek is *+5,2*3,2

hutchingsp
Enthusiast

Did you manage to try this yet?

I'm looking at a P4000 multi-site setup and trying to figure all this out from the documentation. In the scenario you mentioned, I'm not clear what happens if the storage in a site goes down: do the vSphere hosts automatically connect to the LUNs on the remaining VIP?

Thanks.

andrebrownjm
Contributor

Two nodes configured in a "multi-site" setup will not work. There have to be two management groups with two separate volume lists, and a Remote Copy setup between them.

When there is a failure, storage or otherwise, at the Protected Site (the main site) you initiate a failover and SRM will map the recovery host to the Remote Copied volume, promote the volume to a real volume, and then boot the VMs on it.

The answer you seek is *+5,2*3,2

mbasso19
Contributor

Did you manage to get this scenario running in the end?

I've got a similar scenario: I bought a multi-site LeftHand solution (4 nodes) and I am wondering what the best practice is to get SRM working on it. The aim is to have Network RAID implemented between all nodes, and therefore synchronous replication, but if we need to configure 2 management groups and 2 clusters with remote snapshots between the two, we cannot have "live" replication.

How did you sort it out?

Cheers

MB

andrebrownjm
Contributor

Hi MB,

Yes, I got it working in the end. But unlike you, I didn't have the luxury of 4 nodes :) I ended up setting up a single node at each site. So the only replication I have is with Remote Copy. I don't get Network RAID.

With 4 nodes, you can have an ideal setup. The key is that you need two management groups so that you can set up Remote Copy/Snapshots between them. It's the remote snapshots that are recognized by SRM. With 4 nodes, you can set up two sites with two nodes per site, and Network RAID between the nodes at each site.

MANAGEMENT GROUP 1/CLUSTER 1 - NODE 1 @ Site 1, Node 2 @ Site 1

MANAGEMENT GROUP 2/CLUSTER 2 - NODE 3 @ Site 2, Node 4 @ Site 2

Network RAID would take place between NODE 1 and NODE 2, and between NODE 3 and NODE 4.
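
As a sanity check, the layout above can be written down as data and tested against the two rules from this thread: Remote Copy (which is what SRM recognizes) needs two management groups, and Network RAID needs at least two nodes in a group. This is only a sketch; the short names MG1/MG2 and the function names are made up for the example, and it treats each management group as one cluster, as in the layout above.

```python
from collections import defaultdict

# Node placement as listed above: (node, management group, site).
FOUR_NODE = [
    ("NODE 1", "MG1", "Site 1"),
    ("NODE 2", "MG1", "Site 1"),
    ("NODE 3", "MG2", "Site 2"),
    ("NODE 4", "MG2", "Site 2"),
]
# The original two-node multi-site setup: a single management group.
TWO_NODE_MULTISITE = [
    ("NODE 1", "MG1", "Site 1"),
    ("NODE 2", "MG1", "Site 2"),
]

def nodes_by_group(layout):
    """Group node names by management group."""
    groups = defaultdict(list)
    for node, group, _site in layout:
        groups[group].append(node)
    return dict(groups)

def remote_copy_possible(layout):
    """Remote Copy between groups needs at least two management groups."""
    return len(nodes_by_group(layout)) >= 2

def network_raid_possible(layout):
    """Network RAID needs at least two nodes in every group."""
    return all(len(n) >= 2 for n in nodes_by_group(layout).values())

for name, layout in [("4-node", FOUR_NODE), ("2-node", TWO_NODE_MULTISITE)]:
    print(name, remote_copy_possible(layout), network_raid_possible(layout))
```

The two-node multi-site layout fails the first check, which matches what I ran into with the SRA.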

Let me know how it goes.

The answer you seek is *+5,2*3,2

mbasso19
Contributor

Yeah, you see that's the problem here...

Cluster1@SITE1 creates Remote Snapshots to Cluster2@Site2 --> it's asynchronous replication (not live)

How close together can you schedule the incremental snapshots? Can you do it every minute or so?

As we have a gigabit link between the 2 sites with latency <1ms, we were hoping to use SRM to fire up machines intelligently and use Network RAID across the board (in a single cluster), but I am finding out the hard way that SRM/SRA won't understand this configuration :(

hutchingsp
Enthusiast

If you have same subnet connectivity you can put all the nodes on the same subnet.

mbasso19
Contributor

Spanning the VLAN/subnet across the two sites would not be a problem. Are you suggesting we ditch SRM and use HA instead? If that's the case, how would the vCenter placement work?

Please advise further.

hutchingsp
Enthusiast

Ah sorry, I don't know the SRM part of that, but with P4000 you can have a cluster span multiple sites and use the same VIP address.

mbasso19
Contributor

Yep, got that, but how do you manage the DR scenario and plan for vCenter placement?

I take it you'd need 2 x vCenter servers (1 x Prod, 1 x DR).

You'd need 1 VMware cluster only, which includes the ESX hosts in production and the ESX hosts in DR.

You'd need to manage the HA process, as you don't want your servers to flip to the DR site, or to have machines placed on the DR ESX hosts, in normal circumstances.

Please advise.

hutchingsp
Enthusiast

First thing is we don't have SRM, so this is how I'm planning to do it when the kit arrives.

Site A - Some P4000 units and servers and a dedicated switch for iSCSI

|

|dedicated fibre link

|

Site B - Some P4000 units and servers and a dedicated switch for iSCSI

Let's say the VIP of the P4000 cluster is 192.168.1.1 and that spans both sites.

In Site A (our primary) we'd also run a FOM on local storage on one of the ESX boxes, so as to keep quorum if the link goes down (to avoid split brain).

We'd have a FOM configured but not switched on at Site B in case we lost Site A.
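
To show why the FOM matters, here is a small Python sketch. It assumes quorum means a strict majority of the managers in the management group (that is my reading of the P4000 documentation, so treat it as an assumption), and it uses a hypothetical two nodes per site, each running a manager.

```python
def has_quorum(managers_reachable, managers_total):
    """Assumed rule: a strict majority of managers must be reachable."""
    return managers_reachable > managers_total // 2

def after_link_failure(nodes_a, nodes_b, fom_at_a):
    """Which site keeps quorum when the inter-site link goes down."""
    fom = 1 if fom_at_a else 0
    total = nodes_a + nodes_b + fom
    return {
        "Site A": has_quorum(nodes_a + fom, total),
        "Site B": has_quorum(nodes_b, total),
    }

# Hypothetical sizing: two storage nodes per site.
print(after_link_failure(2, 2, fom_at_a=True))
print(after_link_failure(2, 2, fom_at_a=False))
```

With the FOM at Site A, that site holds 3 of 5 managers and stays up while Site B stops serving, which avoids split brain. Without a FOM, each side holds 2 of 4 and neither has a majority.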

In our case we would have a single P4000 cluster and a single vSphere cluster as we don't have SRM so would want HA to keep an eye on things.

I'm not sure if we're talking about different scenarios here though.

mbasso19
Contributor

Yes, but how do you place the vCenters in your scenario? You'll need two for the solution to work.

Please advise.

hutchingsp
Enthusiast

Surely you only need one? If a site fails, your shared storage is still available in the other site, so you can start vCenter on a server in that site (assuming your vCenter is a VM and is on the P4000).

andrebrownjm
Contributor

On the vCenter side of the issue, you will need two vCenter Servers. vCenter manages the whole SRM failover process, so you need one configured and running at the recovery site. There is no way around that.

If you want the synchronous failover, then here is one possibility:

MANAGEMENT GROUP 1 - NODE 1, NODE 2 - Network RAID between these

MANAGEMENT GROUP 2 - NODE 3, NODE 4 - Network RAID between these

Then you would configure Remote Snapshots between the two management groups (required for SRM).

SITE A - NODE 1, NODE 3 - One node from each management group

SITE B - NODE 2, NODE 4 - One node from each management group

With the above setup, you get synchronous replication between the sites, as well as the Snapshots required by SRM. The result is multiple levels of redundancy - network RAID between sites, and SRM.

In answer to your question about HA, the "official" response is that it's not "designed" for DR scenarios.

And on the LeftHand replication, the minimum period for replication is 30 minutes.
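
To put a number on what that means for SRM, here is a back-of-the-envelope Python sketch. It assumes the worst-case data loss is roughly the schedule interval plus the time needed to copy the changed data across. The 10 GB of changes per interval is a made-up figure, the link speed is the gigabit link mentioned earlier in the thread, and protocol overhead is ignored.

```python
def copy_seconds(changed_bytes, link_bits_per_second):
    """Time to ship one incremental snapshot, ignoring protocol overhead."""
    return changed_bytes * 8 / link_bits_per_second

def worst_case_rpo_seconds(interval_seconds, changed_bytes, link_bits_per_second):
    """Data written just after a snapshot is only safe at the remote site
    once the NEXT snapshot has been taken and fully copied."""
    return interval_seconds + copy_seconds(changed_bytes, link_bits_per_second)

# 30-minute schedule (from this post), hypothetical 10 GB of changes, 1 Gbit/s link.
rpo = worst_case_rpo_seconds(30 * 60, 10e9, 1e9)
print(f"worst case: about {rpo / 60:.1f} minutes of data")
```

So even with a fast link, the schedule interval dominates: you should plan for losing a bit more than 30 minutes of writes in a failover based on Remote Snapshots alone.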

The answer you seek is *+5,2*3,2