VMware Cloud Community
mwheeler1982
Enthusiast

Single vs Dual HBA in ESX Host

I just have a quick question...

Right now, we have two ESX 3.0.1 hosts with dual HBAs. In July, we are adding two more hosts. Is it really worth it to buy dual HBAs for my hosts when I have that many? Isn't HA there to cover host failures?

In the past, we have been buying our HBAs from our SAN vendor. They are a little bit more expensive than buying from some random vendor off of pricewatch, but they give lifetime replacement, so it's not that bad of a deal. We have standardized on QLogic cards if that matters.

If I switched the hosts to have only one HBA, I would probably keep a spare card lying around just in case one died... but is it really worth it to buy dual HBAs for my hosts, eat up ports on my fibre switch, etc., waiting around for the rare case that a card dies?

What are your experiences?

14 Replies
Ken_Cline
Champion

Well, that's a question that only you can answer. You have to weigh the risk of a failure - and the associated downtime - against the cost of the HBA and supporting infrastructure. If an outage that costs your company $100,000 could have been avoided by spending $3,000 on infrastructure, then you'll buy the extra HBA and fibre port.
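
As a back-of-the-envelope illustration, here is a quick expected-loss comparison in Python using the example figures above. The failure probability and service life are made-up assumptions; plug in your own estimates.

```python
# Risk math for the redundancy decision, using the example figures above.
# p_failure_per_year and years_in_service are purely assumed values.
outage_cost = 100_000      # cost of one outage the extra path would prevent ($)
hba_cost = 3_000           # HBA + switch port + cabling ($)
p_failure_per_year = 0.02  # assumed annual chance of losing the single path
years_in_service = 4       # assumed server lifetime

expected_loss = outage_cost * p_failure_per_year * years_in_service
print(f"Expected loss without redundancy: ${expected_loss:,.0f}")
print(f"Cost of redundancy:               ${hba_cost:,.0f}")
print("Buy the second HBA" if expected_loss > hba_cost else "Skip it")
```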

Ken Cline VMware vExpert 2009 VMware Communities User Moderator Blogging at: http://KensVirtualReality.wordpress.com/
mwheeler1982
Enthusiast

Thanks for the response, Ken

We're a university, so we're a bit different from the private sector in that sense - we can't really equate downtime to $$$. Also, our main administrative system cannot be virtualized, as it runs on AIX.

I mainly asked this question to find out what others out there are doing.

It seems to me that memory is more likely to fail than anything else... and there's not much you can do about that.

esiebert7625
Immortal

I think it's a good idea. If I am paying $20,000-plus for a server that will take the place of 12+ servers, paying an extra $1,200 for an additional HBA is good insurance. I know that with HA you're covered if an HBA or path to the SAN fails, but it means a hard crash of all your VMs and possible data loss and corruption. If you have the money, I would say do it; if money is tight, spend it on something more worthwhile, like additional memory for the server. Depending on your SAN environment, the chances of a failure can be very small. I recently had a new 585 take a dump with MCE errors and a PSOD; HP came out and replaced the memory and processor board. You are correct that the likelihood of something else failing is greater. I just like to cover all my bases and use as much redundancy as possible.

My 2 cents...

mwheeler1982
Enthusiast

There is one other thing I hadn't completely thought through...

We're booting our ESX hosts from the SAN... so, since the current version of ESX cannot handle multipathing for the boot volume, the whole box is going to go down anyway if the HBA fails.

I guess that's a downside to booting from the SAN!
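
For anyone in the same boat, a quick way to spot single-pathed LUNs is to check the path count per disk. Below is a minimal audit sketch, assuming the classic ESX 3.x `esxcfg-mpath -l` output style where each disk line reads like "Disk vmhba1:0:1 ... has 2 paths ..."; verify the exact format on your own hosts before relying on it.

```python
# Flag LUNs that ESX sees through only one path.
# Assumes "esxcfg-mpath -l" prints lines like
#   "Disk vmhba1:0:1 /dev/sdb (10240MB) has 2 paths ..."
# -- check your hosts' actual output; this parsing is an assumption.
import re
import subprocess

output = subprocess.run(["esxcfg-mpath", "-l"],
                        capture_output=True, text=True, check=True).stdout

for line in output.splitlines():
    match = re.search(r"Disk (\S+).*has (\d+) paths", line)
    if match:
        disk, count = match.group(1), int(match.group(2))
        status = "SINGLE PATH!" if count < 2 else "ok"
        print(f"{disk}: {count} path(s) {status}")
```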

evanstra
Contributor

Oh... this was a very heated discussion for us. We just bought a bunch of new 385 G2 and 585 G2 servers to replace our aging 380 G2s (current environment).

For my 'sandbox' ESX server I spec'ed a single dual-port HBA, because it is just that... a test/sandbox box, and it is a 385 with limited slots (but more than a 380).

For the production environment I spec'ed two HBAs per server. When the order came in, I had a single dual-port HBA for the 585's. The manager who changed the order stated that in his umpteen years with the company he had never seen an HBA failure cause downtime... and since the dual-port HBAs were cheaper than two single-port cards and accomplished the "same thing," he made the executive decision!

I'm sure when we have the first crash, it will be more along the lines of why didn't "I" spec the hardware right.

...went to lunch with our SAN guy not long after the order came in, and he was chuckling about it. We've had HBA failures before, and the only reason the manager didn't really know about them was that the second HBA failed over exactly the way it was designed to!!

So, to my point: I'm a firm believer in two HBAs in a production/critical box!

Eric

esiebert7625
Immortal

Ha, that's funny... you have two distinct and different paths all the way to the SAN, but a fibre card that is a single point of failure...

FredPeterson
Expert

Make a diagram for the guy... showing three rivers... two of them have two bridges and one has one. Then ask him what happens to traffic when bridge A, bridge B, etc. goes down.

mreferre
Champion

Some people see a dual-path architecture as a means to provide redundancy at the SAN/switch level rather than at the box level (i.e., "I am concerned about a switch failing" vs. "I am concerned about an HBA failing"). It really boils down to your own design/requirements, but dual-port HBAs have a place in some circumstances...

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info
glynnd1
Expert

In answer to mwheeler1982's question: you always want two paths out of the server. Granted, there is an additional cost associated with this, and depending on your SLAs or environment you can maybe skimp on it.

As for one dual-port card vs. two single-port cards: yes, two is better than one, but in some environments this can be difficult to do. The typical 2U server has two (DL320) or three (DL380, Dell 2950) expansion slots, which leaves you balancing network and storage connections. If you have only two slots, you have no choice but to go with a dual-port card - unless you run VMotion, the Service Console, and your VMs over the two on-board NICs; do-able... but tight. With three slots, if you use single-port HBAs you may be forced into a quad-port NIC if you need many network connections - see the sketch below. This is, of course, a limit of small but powerful servers.
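
To make the slot arithmetic concrete, here is a toy sketch of the trade-off; the slot and port counts are illustrative assumptions, not vendor specs.

```python
# Toy slot math for a three-slot 2U box (DL380-class).
# All counts are illustrative assumptions.
expansion_slots = 3
onboard_nic_ports = 2
nic_ports_needed = 6   # say, Service Console + VMotion + VM traffic

def fits(hba_cards, nic_cards, ports_per_nic):
    """Does this card mix fit the slots and meet the NIC-port need?"""
    return (hba_cards + nic_cards <= expansion_slots and
            onboard_nic_ports + nic_cards * ports_per_nic >= nic_ports_needed)

# Two single-port HBAs leave one slot, so it must hold a quad-port NIC:
print(fits(hba_cards=2, nic_cards=1, ports_per_nic=4))  # True
# Two dual-port NICs will not fit alongside two single-port HBAs:
print(fits(hba_cards=2, nic_cards=2, ports_per_nic=2))  # False
```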

I know one of the big server makers is coming out with a 2U, dual quad-core, 128 GB RAM server with four on-board NICs that they are targeting at the virtualization market - no doubt we'll find some other limitation.

bretti
Expert

In our environment we've made it a standard part of our server order to put two HBAs in each host. It was a long argument with management over the extra $$$, but in the end it has paid off for us.

Besides the fact that hardware failures do occur, there is also human error to deal with. Misconfigurations, improper zoning, poor cabling, accidents, etc.

In our case, the most common failure we have had was on the FC switch itself. There have been six cases that I can remember where one of the four "quads" on a blade went out, taking down all four ports on that quad. To get it fixed, we had to replace the blade. Before going dual-connected, that meant downtime, plus extra time and effort when replacing the failed blade. Now that the hosts are dual-connected, we can do maintenance whenever we need to, without downtime.

I would recommend dual HBAs in every host, unless the VMs hosted there can be down for extended periods of time on short notice.

williambishop
Expert

Make sure you document it now; in a year, when everyone (even him) has forgotten that he did this, the peckerhead will deny he ever did such a thing...

CYA.

--"Non Temetis Messor."
sheetsb
Enthusiast

With fully redundant dual paths, maintenance becomes less of a problem. If you need to upgrade firmware or work on one switch in one fabric, there should be no downtime needed for your ESX hosts. We have redundant switches, controllers, HBAs, etc., and we, too, are a university. Most of the redundancy is required for our hospital and financial applications, but the same need applies now that we have 200 VMs in operation.

Bill S.

murreyaw
Enthusiast

Keep with the dual. Do you have dual switches as well? Without them, it doesn't really matter. Personally, I have seen more switch failures than individual card failures.

MattG
Expert

We only install ESX with 2 x single-port HBAs in production. We use dual-port HBAs in our dev servers.

Like others have said, it is not the HBA failure that you are insuring against, but rather the paths to the SAN. Think of the case where the SAN team needs to do maintenance on the fibre switches. Typically they are redundant, and the team will make changes to one at a time. If you are not redundant, you will either go down hard or need to take your ESX server down before the change.
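
A minimal model of that point, with made-up host and fabric names: take one fabric down for maintenance and see which hosts keep a live path.

```python
# Two independent fabrics; hosts are either dual- or single-attached.
# Host and fabric names are illustrative only.
paths = {
    "esx01": [("hba0", "fabricA"), ("hba1", "fabricB")],  # dual-attached
    "esx02": [("hba0", "fabricA")],                       # single-attached
}

def survives(host, down_fabric):
    """True if the host keeps at least one live path to the SAN."""
    return any(fabric != down_fabric for _hba, fabric in paths[host])

for host in paths:
    print(host, "survives fabricA maintenance:", survives(host, "fabricA"))
# esx01 survives; esx02 goes down hard along with its only fabric.
```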

-MattG
If you find this information useful, please award points for "correct" or "helpful".