VMware Cloud Community
tlyczko
Enthusiast

Seeking SAN recommendations/info with this starting config

This is for a SAN with three ESXi hosts and 30-40 VMs. It's already been determined a SAN works best for us.

Our measured IOPS across the three ESXi hosts are 1294 at the 95th percentile, 1770 at the 99th percentile, and 2507 at peak; measured throughput was 92.


We have considerably more reads than writes but the write portion could increase in years to come.

We need at least 4 TB to start, with (I think) 10-20% annual growth, so the SAN need not be fully populated at purchase...

We're probably not able to use SATA disks except in a mixed configuration (e.g. X number of SAS 10k drives, Y number of SATA drives), as I'm told that with these IOPS numbers a SATA-only array would run at up to 80% IOPS utilization. That's not a lot of headroom, to say nothing of the load on the SAN's processor.
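
In case it helps anyone sanity-check my numbers, below is the rough back-of-envelope sizing math I've been using. The per-drive IOPS figures, the 70/30 read/write split, and the RAID 10 write penalty are my assumptions for illustration, not measured values:

```python
# Rough spindle-count sizing from measured front-end IOPS.
# Assumed (not measured): ~75 IOPS per 7.2k SATA drive, ~140 per
# 10k SAS drive, a 70/30 read/write split, and RAID 10 (write penalty 2).
import math

FRONT_END_IOPS = 1770      # our 99th-percentile measurement
READ_RATIO = 0.70          # assumed read share
WRITE_PENALTY = 2          # RAID 10: two back-end writes per front-end write

def backend_iops(front_end):
    """Translate front-end IOPS into back-end disk IOPS."""
    reads = front_end * READ_RATIO
    writes = front_end * (1 - READ_RATIO) * WRITE_PENALTY
    return reads + writes

def drives_needed(per_drive_iops, headroom=0.70):
    """Drives required to keep utilization at or below `headroom`."""
    return math.ceil(backend_iops(FRONT_END_IOPS) / (per_drive_iops * headroom))

for name, per_drive in (("7.2k SATA", 75), ("10k SAS", 140)):
    print(f"{name}: ~{drives_needed(per_drive)} drives for <=70% utilization")
```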

I am interested in reading comments from any participants about the pros/cons of HP P2000/P4000, HDS HUS 110, EMC 3100/3150, and Dell EQL SANs...

My interests are comparative performance (general info; I understand you can't be super-specific), thin provisioning implementation and usage, ease of adding disks to a storage group/pool (or an additional tray/node), what management software is included, and how the support plans work, if possible.

Price will also matter; I will get quotes from our vendors after reading any comments people are willing and able to offer.

Thank you, Tom

6 Replies
mikeyb79
Enthusiast

I can go on and on about how happy I am with Dell EqualLogic SANs. I have two: a PS4000E I stuck in a branch office, and a PS5000E I inherited through an acquisition. The ease of use and the all-in-one licensing model really sold the first one for me, so when I found out I was getting a second I was pretty pleased. As long as you follow the best-practices documentation for VMware, you'll see great performance on something like a PS4100X (24x 10k SAS spindles, 2 arrays per group). You can buy a half-populated array and add capacity as you need it; our PS4000E is half-filled, and the PS5000E recently went from 8 to 16 drives as part of an upgrade. SAN HQ is great software, too: intuitive, with a cool dashboard page for IOPS and latency monitoring.

I would track down a Dell partner in your area and have them set you up with a DPACK benchmark; they can use that to size your storage environment appropriately in terms of IOPS and capacity. You can read more about it here: http://content.dell.com/us/en/business/d/sb360/dpack-sc.

As an ex-HPer, I can't say I was overly fond of their offerings at the low end and midrange of the market (P2000, P4000; we always went for the EVA upsell), and I had horrible support experiences with IBM on our DS4000 series, which I would be afraid of spilling over into their DS3000 range.

NetApp plays in that range as well; you can buy a FAS2220 for reasonable money nowadays. Just watch out for the pricing on the software features, and I'm not fond of the OnCommand GUI (the OnTAP CLI is nice, though).

Josh26
Virtuoso

The question here is: what servers do you have?

I wouldn't put a Dell SAN behind an HP server, or vice versa. Of course it *should* work, but you're in vendor hell when there's a problem and each of them blames the other.

Edit: This type of thread is very common here. You will nearly always get a glowing review of a certain SAN that can be traced straight back to a vendor affiliation, so beware.

mikeyb79
Enthusiast

Let me qualify my experiences a bit more (and hopefully try to dispel the multivendor-hell myth somewhat):

We have servers from Dell (R310, R610), HP (DL160/380 G5 and G6), IBM (older x336 and HS21/22 blades) and Cisco (C200 M2) in our server rooms.

We've not had a problem connecting any of our hosts to any of our storage devices, over a mix of Brocade (IBM-branded) and Cisco Fibre Channel switches (to NetApp and IBM DS series arrays) and a mix of HP, Cisco, and Dell IP switches for iSCSI and NAS protocols (NetApp, EqualLogic, Openfiler, and QNAP devices).

QLogic 2462, 2472, and 2562 HBAs plus Broadcom and Intel NICs provide connectivity to our various SAN fabrics. We've had no issues getting support from three of the four server vendors above, regardless of what sits behind them for SAN or IP connectivity.

Our remote sites consist of:

1- HP DL160s, Dell PowerConnect 6200s, EqualLogic PS5000;

2- Dell R610s, HP ProCurve 2524-Gs, EqualLogic PS4000.

And I work for a farm equipment manufacturer. But I did work for HP about 5 years ago.

TheVMinator
Expert

I will weigh in here with the disclaimer that I used to work in NetApp technical support.

As far as the support-center finger-pointing issue: NetApp traditionally distinguished themselves on support in trying to compete with EMC, and five years ago you would have been unlikely to experience finger pointing. As NetApp grows, however, they are definitely becoming more stringent about supporting only what they think is a NetApp issue. From the mouths of many ex-EMCers there, NetApp is still a bit more generous than EMC.

NetApp is now pushing customers to buy premium support plans rather than standard ones, and there is a noticeable difference in the experience with a premium plan, which is much more expensive. There are frequently cases where it is not clear who should own an issue (server vendor vs. storage vendor), especially at the beginning of the case; with a premium plan you are much less likely to run into finger-pointing and buck-passing, and you will usually get the benefit of the doubt. Five years ago NetApp was willing to actually log in to your physical or virtual server and fix a problem in your Windows or VMware configuration that was stopping it from connecting to storage, even knowing the issue was 100% on the host side. I don't think you will see that level of generosity in general anymore, even with a premium plan. I don't know how Dell and HP compare support-wise.

As far as thin provisioning, I think the WAFL file system in Data OnTap does a pretty good job in that category, with advanced features that can be turned on and off simply by entering a license key. Everything is already built into one operating system, ready to be turned on. You don't have one OS for small systems and another for enterprise systems, as with EMC.
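
To make the thin-provisioning point concrete, here is a minimal sketch of the arithmetic behind overcommitment. The volume sizes and the 80% alert threshold are invented for illustration; they are not NetApp defaults:

```python
# Thin provisioning in a nutshell: you can provision more logical
# capacity than you physically own, as long as actual consumption
# stays safely below the physical pool. All numbers below are invented.

PHYSICAL_POOL_TB = 4.0

# (provisioned TB, actually written TB) per volume -- hypothetical
volumes = [(2.0, 0.6), (1.5, 0.4), (3.0, 0.9)]

provisioned = sum(p for p, _ in volumes)
consumed = sum(c for _, c in volumes)

print(f"Provisioned: {provisioned:.1f} TB "
      f"({provisioned / PHYSICAL_POOL_TB:.0%} of physical pool)")
print(f"Consumed:    {consumed:.1f} TB "
      f"({consumed / PHYSICAL_POOL_TB:.0%} of physical pool)")

# The operational rule: alert well before consumption reaches the pool,
# because a completely full thin pool takes volumes offline.
if consumed / PHYSICAL_POOL_TB > 0.8:
    print("WARNING: thin pool above 80% -- add disks or shelves now")
```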

When you are planning for performance needs, I will offer one word of caution: choose your drive type carefully. Research the type of drive you are going to get and its manufacturer, and make sure there are no known issues with that manufacturer's drives. Beware of getting SATA drives when you need SAS. Do your own homework thoroughly when calculating your IOPS and throughput needs and which drive types will meet them, and check what any technical salesperson says against an objective standard if you can find one. Don't pressure a salesperson to deliver a given performance solution at a certain price; you might get a solution designed to that price that doesn't meet your performance goals. Having an entire storage system on your hands two years from now whose drives all have to be upgraded at huge cost, because a compromise was made in the initial design, is a very bad thing. Adding capacity and adding shelves is easy, and everyone understands the need. But you don't want to be throwing away a few hundred SATA drives in two years, replacing them with SAS drives, and being the one who made that wrong decision and wasted the money. I have seen environments that worked well at the beginning grind to a halt within two years when performance demands exploded on SATA drives.
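
To put rough numbers behind that warning, here is a hedged sketch of how a growing write fraction erodes an array's effective IOPS ceiling. The per-drive IOPS figures and RAID write penalties are textbook approximations, not any vendor's specs:

```python
# How a rising write fraction erodes deliverable front-end IOPS.
# Per-drive IOPS and RAID write penalties are textbook approximations.

PER_DRIVE = {"7.2k SATA": 75, "10k SAS": 140}
WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4, "RAID 6": 6}

def effective_iops(n_drives, per_drive, write_frac, penalty):
    """Max front-end IOPS at 100% back-end utilization."""
    raw = n_drives * per_drive
    # Each front-end write costs `penalty` back-end IOs.
    return raw / ((1 - write_frac) + write_frac * penalty)

N_DRIVES = 24  # e.g. one fully populated 24-slot tray
for raid, pen in WRITE_PENALTY.items():
    for disk, iops in PER_DRIVE.items():
        today = effective_iops(N_DRIVES, iops, 0.30, pen)  # 30% writes now
        later = effective_iops(N_DRIVES, iops, 0.50, pen)  # 50% writes later
        print(f"{disk:9s} {raid:7s}: {today:5.0f} -> {later:5.0f} IOPS")
```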

As far as price, I think NetApp will tend to cost more than Dell or HP, although I haven't priced them specifically.

As far as performance monitoring, in my opinion the command-line performance tools in OnTap are a bit cryptic and require training to decipher. If you choose NetApp, and performance monitoring and reporting are important to you, and you want something intuitive and easy to use, plan to invest in the extra GUI-based NetApp software that doesn't come with OnTap.

Josh26
Virtuoso

To add some non-biased comments to this...

"As far as the support-center finger-pointing issue..."

Thank you for these comments. I do think the worst of it is the "Dell SAN, HP server" scenario. Those two groups don't get along, and I've had experiences where an issue simply refused to be investigated on either side when that happens.

Once you introduce someone like NetApp, which doesn't do servers and which everyone is "supposed" to work with, I'd imagine it gets a lot better.

"When you are planning for performance needs, I will offer one word of caution: choose your drive type carefully."

I can't stress enough how much I agree with this.

I see so much garbage on this forum from people stressing over RAID levels and LUN configuration because "performance is important to us", and then it turns out to be some Openfiler machine running SATA disks.

TheVMinator
Expert

One more thought on the finger-pointing issue. Anyone who works with storage day in and day out will eventually hit a situation where, troubleshooting something like connectivity or performance, you have to determine whether the problem is on the host side, the storage side, the storage fabric, inside the OS, in the application, or some combination of these. It won't always be possible to know accurately and immediately exactly where the problem is.

If you call any storage vendor's support often enough, eventually you will hit a situation where the problem looks like storage initially but, with more information, turns out not to be on the storage side. In that case the vendor is probably going to redirect you to the support of wherever they think the problem is. For example: a VMware host can't connect to a Fibre Channel LUN. You verify the initiators are logged in and that the LUNs are online, read-write, and mapped properly; you can see the LUNs in the storage adapters pane in the VMware client, but you can't see the contents of the LUNs, and the VMs are down and can't access their hard disks. At that point there is nothing more to fix on the storage array; you have to start working on the physical server and within VMware. Unless you have an unusually generous storage vendor, you are probably going to have to contact the manufacturer of your server hardware and/or VMware, no matter which storage vendor you are talking to.

Whenever people have to make this kind of second call to another company, it usually ends up being called "finger-pointing". But what is finger-pointing, really? I think it is not necessarily having to call another vendor, but having to call another vendor when you should not have had to. I have certainly seen support reps make those kinds of mistakes on the storage side as well as on the host and hypervisor side. Mistaken judgment calls happen in any support center, and the goal is to minimize them. The question is whether they made a redirection they SHOULD have made or one they SHOULDN'T have, whether they jumped to the conclusion that the problem was somewhere else before making a thorough enough analysis of the facts. A good support rep knows when to redirect a call and how to explain why it is necessary, but it will still be perceived negatively as finger-pointing a lot of the time.
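
To make that Fibre Channel example concrete, here is a toy sketch of the triage reasoning. The function and its inputs are hypothetical simplifications I made up, not any vendor's actual decision tree:

```python
# Toy triage of the FC example above: which facts point at the array,
# and which point at the host/hypervisor. Purely illustrative logic.

def triage(initiator_logged_in, lun_online_mapped,
           lun_visible_on_host, datastore_readable):
    """Return which support queue the evidence points to."""
    if not (initiator_logged_in and lun_online_mapped):
        return "storage vendor: array-side config incomplete"
    if not lun_visible_on_host:
        return "fabric: check zoning, switches, multipathing"
    if not datastore_readable:
        # The array has presented the LUN and the host can see it,
        # yet the contents are inaccessible -- host/hypervisor side.
        return "server vendor / VMware: host-side problem"
    return "no fault found at this layer"

# The scenario from the post: initiators logged in, LUNs online and
# mapped, visible in the storage adapters pane, contents unreadable.
print(triage(True, True, True, False))
```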

I've seen back-and-forth issues between NetApp and Microsoft, and between NetApp and VMware, and it can happen between NetApp and just about any other vendor. I think the same is true for any storage company you deal with. Especially if you are dealing with a complex performance issue in a large environment, you are pretty much guaranteed to have to involve multiple vendors and multiple support contacts.

As far as drive-type selection, as he said, it is a serious decision that can affect your job down the road. I will elaborate with an example of what I have commonly seen: your environment of 40 VMs runs for two years and then is suddenly down. All VMs are down. The whole company is down: every domain controller, critical infrastructure server, and application server, with thousands of users affected. The cost of the outage is thousands of dollars per hour. After a thorough performance analysis, which takes time, it is determined that your SATA drives could not handle an unexpected performance spike during a certain time of year. You have to buy a large number of new drives at a large price, and the data has to be moved over to them. There is no quick fix; the visibility of the issue rises to the top of the company, and it takes days to fully resolve. Then the question gets asked: "If we needed SAS drives to support the performance of this environment, who was responsible for the decision to use SATA?" The culprit turns out to be the storage admin, and possibly the technical salesman, but ultimately the responsibility gets put on you. That is the person you DON'T want to be two years from now, and it happens often enough to take serious heed.
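
As a rough illustration of why such environments grind to a halt rather than degrade gracefully, here is a simple M/M/1 queueing sketch. The 8 ms service time is an assumed figure; the shape of the curve is the point:

```python
# Why overloaded disks "grind to a halt": queueing delay is nonlinear.
# Simple M/M/1 approximation; all numbers below are illustrative.

SERVICE_MS = 8.0   # assumed average service time of one 7.2k SATA IO

def response_time_ms(utilization, service_ms=SERVICE_MS):
    """M/M/1 response time: service / (1 - utilization)."""
    if utilization >= 1.0:
        return float("inf")   # demand exceeds capacity: queue grows unbounded
    return service_ms / (1.0 - utilization)

for util in (0.50, 0.70, 0.80, 0.90, 0.95, 0.99):
    print(f"{util:.0%} busy -> ~{response_time_ms(util):6.1f} ms per IO")

# At 50% busy an IO takes ~16 ms; at 95% busy the same IO takes ~160 ms.
# A seasonal spike that pushes SATA drives past their ceiling doesn't
# slow things down by 10%; it multiplies latency until VMs time out.
```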
