terrible_towel
Contributor

iSCSI SAN selection help needed

Hello,

I'm looking for advice on back-end storage solutions for ESX servers.

What can give me a datastore that is visible to all my ESX hosts and is protected from:

  • Controller failure or planned upgrade outages (clustered controllers?)

  • File system corruption (mirrored LUNs?)

  • Disk failures (RAID-DP)

In the past I've been bitten by controller failures and file system corruption that took multiple hours to recover from.

I'm willing to pay for double the disk space and 2 controllers.

I've talked to NetApp about a clustered solution, but I'm not sure that is what I'm looking for. I've used NetApp for a long time and trust them, but I can't get stuck doing a 12-hour WAFL_Check on the ESX datastore LUNs...

I want a setup where ESX sees a single iSCSI target with one or more LUNs, and that target is really a cluster of 2 controllers using 2 separate sets of disks that are kept in sync. If the primary controller goes down, the secondary takes over without my having to do anything on the ESX side.

Sorry for the vagueness... but my head is starting to spin while doing this research.

Thanks.

16 Replies
azn2kew
Champion

You can look at the following iSCSI solutions:

LeftHand Networks

iStor

NetApp

Compellent

EqualLogic

FalconStor

3PAR

SANRAD

Your best bet is to contact their sales reps for details about their solutions and then compare the vendors on performance, price, flexibility and stability. If possible, I would look at EqualLogic, LeftHand and NetApp. NetApp is also really good as an NFS solution.

If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!

Regards,

Stefan Nguyen

iGeek Systems Inc.

VMware, Citrix, Microsoft Consultant

Kevin_Gao
Hot Shot

I'm using 2 LeftHand NSM2120s. My critical VMFS volumes are replicated to both LeftHand units. With this setup, even if an entire SAN dies, the failure is transparent to ESX and it keeps running your VMs as if everything is normal. To conserve space, you have the option NOT to replicate every volume, e.g. volumes dedicated to testing.

Another single point of failure is your iSCSI switches, so you may also want to set up 2 switches for redundancy.

EllettIT
Enthusiast

I'm an EQ user, but beyond them I'd take a look at Compellent. I recently heard a sales pitch in their favour, and as far as that goes they look interesting. Off the top of my head, I know they support clustered controllers, but I couldn't speak to how they do it.

Linky:

http://www.compellent.com/Demos-and-Downloads/Datasheets.aspx

terrible_towel
Contributor

Hi Kevin,

Thanks for the reply. Can you give me more details about your setup? Do the ESX servers see both of your arrays or only one? How is the replication done: do you take VM snapshots to make sure you get consistent VMDK files? Does the replication happen continuously? You can PM me with an answer if you want.

Thanks,

Paul

terrible_towel
Contributor

Thanks for the replies.

I'm still looking for an answer on how to recover from a back-end disk storage failure.

If I've got an iSCSI array with one big LUN that is used for all my VMs, and that array dies, what are the ways to recover? Say recovery time can be up to 1 hour.

I am hoping to find an array cluster in an active/passive setup, with the passive head being a mirror of the first. The array-to-array copies would need to be done by the arrays themselves. Then, if the active array fails, the passive array would take over the identity of the active one. There would be a small outage: I'd probably need to rescan the iSCSI HBAs on the ESX servers and then reboot all the VMs.

Can this be done?

I've talked to NetApp, but they get all tied up with VMware SRM and their SnapManager for VI products, which seem like overkill. I'm not looking for DR site recovery or a way to recover files from VMs via NetApp snapshots...

Paul

Kevin_Gao
Hot Shot

Remember: your ESX servers see an abstracted volume. Your SANs actually form a cluster (with its own cluster IP). When your ESX host talks to the SAN, it doesn't talk to the individual SANs per se, but rather to the cluster address. Let's say I have a cluster of 4 SANs and I create a volume that's replicated to all 4: ESX will only see one volume. That particular LUN just happens to be highly available (it survives the loss of up to 3 SANs).
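
A rough way to see why a 4-way replicated volume survives the loss of 3 SANs, while an unreplicated volume striped across units does not, is to model the volume as chunks whose copies sit on distinct nodes. The Python sketch below assumes a simple round-robin placement and hypothetical node names `san1` to `san4`; it is an illustration of replication levels, not LeftHand's actual placement algorithm.

```python
from itertools import combinations


def place_chunks(nodes, num_chunks, replication):
    """Hypothetical round-robin placement: chunk i is copied to `replication`
    consecutive nodes, so every copy of a chunk sits on a different node."""
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication level must be between 1 and the node count")
    return [
        {nodes[(i + k) % len(nodes)] for k in range(replication)}
        for i in range(num_chunks)
    ]


def volume_available(placement, failed):
    """The volume stays online only if every chunk still has a live copy."""
    return all(copies - failed for copies in placement)


def worst_case_tolerance(nodes, placement):
    """Largest k such that ANY k simultaneous node failures keep the volume online."""
    tolerated = 0
    for k in range(1, len(nodes)):
        if all(volume_available(placement, set(down)) for down in combinations(nodes, k)):
            tolerated = k
        else:
            break
    return tolerated


nodes = ["san1", "san2", "san3", "san4"]
for r in (4, 2, 1):
    print(r, worst_case_tolerance(nodes, place_chunks(nodes, num_chunks=8, replication=r)))
```

With four nodes this reports 3, 1 and 0 tolerated failures for replication levels 4, 2 and 1. The last case is the "striped but not replicated" volume: losing any single unit takes part of the data with it.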

The actual setup is easy and goes something like this:

1) Create your volume (LUN).

2) Tell it how many times it should be actively replicated (e.g. pick 2 if you have 2 SANs and the volume is important to you).

3) I also make sure the volumes use thin provisioning, then go ahead and create the volume.

4) Tell your ESX server to talk to the SAN cluster and rescan for new storage. Then create a new datastore.

Once you've created your datastore on this LUN, even if you lose a controller or a whole SAN, your VMs running on that datastore won't notice anything.

You don't need to take snapshots for any of this; all the replication happens live. LeftHand also has a witness server that can run in an ESX environment to monitor your SANs (in case you have an even number of SANs). It's a tiny 256MB VM that you can download from LeftHand's site, and it's good to have to avoid split-brain situations when failures occur.
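
The split-brain point comes down to majority voting. The Python sketch below shows a generic strict-majority quorum rule; it is an assumption about why a witness helps, not LeftHand's documented logic, and the node names are hypothetical. With 2 SANs a network split leaves 1 vote on each side, so neither side can safely keep serving; adding a witness as a third voter lets exactly one side keep a majority.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Strict majority: more than half of all configured voters must be reachable."""
    return 2 * votes_present > total_votes


def serving_partitions(partitions, voters):
    """Return the partitions that may keep serving I/O after a network split.
    At most one partition can hold a strict majority, which is what stops
    two halves of the cluster from both accepting writes (split-brain)."""
    total = len(voters)
    return [p for p in partitions if has_quorum(len(p & voters), total)]


# Two SANs, no witness: a 1-vs-1 split leaves nobody with a majority.
print(serving_partitions([{"san1"}, {"san2"}], {"san1", "san2"}))

# Two SANs plus a witness VM: the side that can still see the witness keeps running.
print(serving_partitions([{"san1", "witness"}, {"san2"}], {"san1", "san2", "witness"}))
```

In this model the no-witness case favours safety over availability: both halves stop rather than risk two diverging copies of the volume.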

For offsite replication, I can create another volume and tell it to replicate remotely to another volume (different subnet, etc.). I took a VCP course last month, and my instructor told me that the new VMware SRM course runs on LeftHand SANs.

Hope this helps you understand it all better.

jguidroz
Hot Shot

We're about to install two LeftHand Networks NSM 2120s. The other feature that drove us toward LeftHand is the ability to locate each node of the SAN in a different datacenter while it is still visible as one SAN to the ESX hosts.

Kevin_Gao
Hot Shot

Yup, remote replication is a great feature. We're deploying 2 more units at a remote site that will replicate to the office I'm in at the block level, which saves bandwidth. :)
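
As a rough illustration of why block-level replication saves bandwidth: only the blocks that changed since the last cycle need to cross the WAN, not whole VMDK files. The Python sketch below finds changed blocks by comparing per-block SHA-256 hashes. The 4 KiB block size and the hashing approach are assumptions for illustration (an array would more likely track written blocks directly); this is not how LeftHand implements remote copies.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def block_hashes(data, block_size=BLOCK_SIZE):
    """SHA-256 digest of each fixed-size block (the last block may be short)."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks of `new` that differ from `old`."""
    old_hashes = block_hashes(old, block_size)
    delta = {}
    for idx, digest in enumerate(block_hashes(new, block_size)):
        if idx >= len(old_hashes) or old_hashes[idx] != digest:
            start = idx * block_size
            delta[idx] = new[start:start + block_size]
    return delta


def apply_delta(replica, delta, new_length, block_size=BLOCK_SIZE):
    """Rebuild the updated image on the remote side from the old replica plus the delta."""
    buf = bytearray(replica[:new_length].ljust(new_length, b"\0"))
    for idx, block in delta.items():
        start = idx * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)


old = bytes(4 * BLOCK_SIZE)     # 4 zeroed blocks already at the remote site
new = bytearray(old)
new[5000] = 1                   # touch one byte inside block 1
new = bytes(new)
delta = changed_blocks(old, new)
print(sorted(delta))            # only block 1 has to be sent
assert apply_delta(old, delta, len(new)) == new
```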

terrible_towel
Contributor

Thank you for pointers to LeftHand. I have started looking at them and really like what I see.

Paul

arthurvino1
Contributor

I don't like the fact that LeftHand only has 1 controller per NSM 2120 unit.

One basically has to purchase 2 units in a mirror and ends up with only 1 unit of usable space.

How does a 3-unit purchase work in terms of usable space? Kind of like RAID 5?

Also, I was told to be careful not to stripe a data volume over multiple NSM units, since a single unit failure would then cause data loss.

Any pointers would be greatly appreciated.

Kevin_Gao
Hot Shot

Both of our NSM2120's have 2 controllers each.

The mirroring is done at a volume (LUN) level not at the device level.

If you have 3 units and want maximum protection, set the replication level for your "critical" LUNs/volumes to "3-way". If 2 of your units die, a copy of the volume is still on the last unit. If a volume is not set to be replicated, then yes, you can lose data.
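
Here are some numbers for the usable-space question. If every replicated volume is a full copy per replica, as described above, then each GB of a volume at replication level r consumes r GB of raw cluster space. The math is a straight division rather than RAID-5-style (n-1)/n. The Python sketch below makes several assumptions: full-copy replication only (no parity-based schemes), "raw" meaning the space left on each unit after its own local RAID, and no per-node balancing. The capacities are placeholders.

```python
def usable_if_all_volumes_at(raw_per_unit_gb, units, replication):
    """Usable space if every volume uses the same replication level."""
    if not 1 <= replication <= units:
        raise ValueError("replication level must be between 1 and the unit count")
    return raw_per_unit_gb * units / replication


def raw_needed(volumes):
    """volumes: iterable of (size_gb, replication_level) pairs.
    Each volume consumes size * replication of raw cluster space."""
    return sum(size_gb * replication for size_gb, replication in volumes)


# Placeholder: 3 units with 1000 GB each after local RAID.
for r in (1, 2, 3):
    print(f"{r}-way: {usable_if_all_volumes_at(1000, 3, r):.0f} GB usable")

# Mixing levels: a 500 GB critical volume at 3-way plus an 800 GB test volume unreplicated.
mix = [(500, 3), (800, 1)]
print(raw_needed(mix), "GB raw needed of", 3 * 1000)
```

Because replication is set per volume, only the volumes that really need 3-way protection pay the 3x cost.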

Hope this helps you.

arthurvino1
Contributor

NSM2120s have 1 controller each, not 2. It's listed on their website.

Kevin_Gao
Hot Shot

You're right about the website - it does say one.

Both of our units appear to have 2 (I attached a fresh screenshot).

arthurvino1
Contributor

Kevin,

That's interesting. I think it's just 2 RAID 5 sets?

If LH had 2 controllers, my purchase decision would be much easier.

How do you like the units?

terrible_towel
Contributor

Does anyone have any experience with LeftHand using SATA instead of SAS drives for an iSCSI target for ESX?

In our particular environment, the vast majority of our VMs' local disks only hold the OS. All application programs and data come from NFS or CIFS shares on other systems. In this case I think SATA would be fine. We may have slower boot-ups, cloning operations, etc., but normal steady-state operations should not put much load on the iSCSI datastore. I am currently using an old NetApp via NFS as the datastore; I have 30 VMs running and routinely see <100 NFS ops/sec during normal operations.

I'm thinking of two 2060 units with 6 SATA drives each. The volume would be mirrored between the 2 units. I think we will end up with 75-100 VMs on this datastore.

Any comments ?

Thanks,

Paul

Kevin_Gao
Hot Shot

Hmmm, you could be right. In our setup, we have 2 local LeftHands replicating to each other (for the critical LUNs). Then the mission-critical LUNs are also replicated to 2 LeftHands at a remote site. So I never really paid attention to how many controllers these things have. I'll ask our LeftHand contact later and see what he says. :)

As far as SATA is concerned, we never tried it but did think about it (for the remote SANs). We ended up going with 15K RPM SAS drives on all 4 units. Anyway, whether you go with SATA or SAS, it seems to be a "recommended practice" not to exceed 16 active VMs per datastore on an iSCSI SAN.
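
Applying the 16-active-VMs-per-datastore rule of thumb to the estimate of 75-100 VMs turns the single big datastore into several smaller ones. The small Python sketch below does that arithmetic and spreads the VMs as evenly as possible. It treats the 16-VM figure quoted above as a guideline to plug in, not a fixed limit.

```python
import math


def datastores_needed(total_vms, max_vms_per_datastore=16):
    return math.ceil(total_vms / max_vms_per_datastore)


def spread_vms(total_vms, max_vms_per_datastore=16):
    """Split VMs across the minimum number of datastores, as evenly as possible."""
    if total_vms <= 0:
        return []
    count = datastores_needed(total_vms, max_vms_per_datastore)
    base, extra = divmod(total_vms, count)
    return [base + 1 if i < extra else base for i in range(count)]


for vms in (75, 100):
    print(vms, "VMs ->", spread_vms(vms))
```

For 75 VMs this gives 5 datastores of 15 VMs. For 100 VMs it gives 7 datastores of 14 or 15 VMs each.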

How's your NetApp? What's the exact model, and how many drives do you have? I was always curious about those, as we're using a cheap NFS store right now (for backup and template storage).
