jacquesdp
Contributor

Physical Server Configurations

Hi all,

We are in the planning phase of virtualizing our infrastructure, and are currently stuck on the question of whether we should invest in a couple of really powerful machines or a blade solution with more, less powerful blades. I am sure other companies have faced this choice and I would be grateful for any comments.

Thanks!

Jacques

36 Replies
TomHowarth
Leadership

OMG, that post is almost as old as my time on the forums :) It just goes to show, I am still the whippersnapper around these here parts.

Tom Howarth VCP / VCAP / vExpert
VMware Communities User Moderator
Blog: http://www.planetvm.net
Contributing author on VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment
Contributing author on VCP VMware Certified Professional on VSphere 4 Study Guide: Exam VCP-410

mreferre
Champion

Tom,

Ever heard someone say "I was doing this before you were born!"? ;)

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info

Rodos
Expert

I have to chime in here, mainly at Tom and Ken.

Tom, I have seen customers buy two enclosures just for redundancy, even after I tried to convince them it was not worth the money initially. It does happen. They are very stable and really have no single points of failure. If you are happy with a single SAN you should be happy with a single blade chassis, well a modern one anyway.

However, I have seen one go pffft. It is possible to damage the backplane and cause a few power issues; in that case you have damaged the non-mechanical hardware and you have to shut down to replace it. This was on an HP p-Class; I assume it could happen on a c-Class or any other brand too. How do you do this? You take a blade and shove it into the enclosure like you are trying to drive the main stake into the ground for a large circus tent, then you do it over and over again. This can occur when you have a tech who is convinced of his own genius as well as his amazing strength. "It can't be my configuration of the SAN switches, the blade is just not seated right, that's why it can't see the storage, let me just slam it into the enclosure yet another time to convince you of my total stupidity". You probably have more chance of severing a power cord while unmounting/mounting a standard server in the rack, blowing all the power and taking everything down, than stuffing a blade enclosure.

I'm with Ken on this one. I have grown to love blades, especially with shorty from HP (I still have my shirt from VMworld when they released it, anyone remember them?). Previously it was hard to get a lot of NICs in them, but these days you can get enough, and most customers don't need them.

I think in the lower and upper bands you have a real need for either servers or blades, but most people fit in the middle where either is fine. It's going to be business and comfort-factor decisions that drive it rather than straight technology; both have their pros and cons.

Rodos

Consider the use of the helpful or correct buttons to award points. Blog: http://rodos.haywood.org/

jacquesdp
Contributor

Thanks everyone for your comments...

Yes what we plan to do is to have boot and storage volumes on the SAN. I am assuming that for VMotion to work we need to have at least the boot volumes there. I am just not sure what will give the best performance (i.e. should we RAID 0+1 or RAID5 for boot?). The storage volume will vary according to application.

Thanks!

Jacques

mreferre
Champion

This is by no means meant to downplay blades (nor a specific blade vendor). However, I have to agree with Tom that the chassis (when there is only one) is indeed something that might cause issues.

See this example:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c01519680&dimid=1012424238&di...

>HP has identified a potential, yet extremely rare issue with HP BladeSystem c7000 Enclosure 2250W Hot-Plug Power Supplies manufactured prior to March 20, 2008.

>This issue is extremely rare; however, if it does occur, the power supply may fail and this may result in the unplanned shutdown of the enclosure, despite redundancy, and the enclosure may become inoperable.

I am not implying that everybody who buys blades should buy 2 x chassis, but it is a fact that, the chassis being a piece of technology itself, it could experience issues of its own.

I guess what I am trying to say is that it boils down to a compromise... 2 mainframes are more reliable than 1 mainframe, yet you don't usually buy 2 mainframes for redundancy reasons if you only need one (1 mainframe is considered to be redundant and resilient enough). Obviously, I am not even trying to say that a blade chassis (especially the HP chassis ;) ) is as reliable as a mainframe...

It's a fact that tightly coupled systems carry inherently higher risk than loosely coupled systems. There is always going to be a certain degree of risk when you push the bar and try to get the max out of the technology. It is not very different from situations where coupling two independent physical hosts into a single High Availability cluster has sometimes caused more issues than it could potentially solve.

Blades are loosely coupled systems when it comes to the logical layout (i.e. independent x86 servers) but are tightly coupled when it comes to the physical layout and this might cause "rare" circumstances where this is a problem (see article above).

Massimo.

jacquesdp
Contributor

Hi Guys,

We currently have 5 chassis spread over 4 locations. Currently these chassis are used to house terminal servers. They have been in place for a good number of years (4 or so) and I think the only failure we had was one specific blade losing a drive. So in terms of reliability either they are extremely good or we are extremely lucky (I would like to think it is the former). But can't the same argument be applied to having one big server to house lots of VMs? The server will also have multiple single points of failure, just as the chassis is one. Also, we plan to keep a chassis with PSUs in stock, so repairs can be done without calling in 3rd parties or waiting for parts that are not immediately available.

Just another thing here: has anyone had experience virtualizing terminal servers? Of our 150 machines there are probably 70 terminal servers, each with 4 cores and 4-8 GB RAM. They are running near capacity with about 40 users each. Again, I suppose if we want to use blades we should at least be able to carry two of these per blade, otherwise it does not make sense.
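As a rough sizing aid, the figures above can be turned into a back-of-the-envelope calculation. This is only a sketch: it assumes each terminal server would become one VM with the same cores and RAM, it ignores hypervisor overhead, and `servers_per_blade` is a hypothetical consolidation ratio rather than a measured value.

```python
import math

# Figures quoted in the post above.
TERMINAL_SERVERS = 70
CORES_PER_SERVER = 4
RAM_GB_RANGE = (4, 8)        # each server has 4-8 GB RAM
USERS_PER_SERVER = 40


def farm_totals():
    """Aggregate demand of the existing terminal server farm."""
    return {
        "cores": TERMINAL_SERVERS * CORES_PER_SERVER,
        "ram_gb_min": TERMINAL_SERVERS * RAM_GB_RANGE[0],
        "ram_gb_max": TERMINAL_SERVERS * RAM_GB_RANGE[1],
        "users": TERMINAL_SERVERS * USERS_PER_SERVER,
    }


def blades_needed(servers_per_blade):
    """Blades required at a given (hypothetical) consolidation ratio."""
    return math.ceil(TERMINAL_SERVERS / servers_per_blade)


def blade_requirements(servers_per_blade):
    """Cores and RAM one blade must supply, ignoring hypervisor overhead."""
    return {
        "cores": servers_per_blade * CORES_PER_SERVER,
        "ram_gb_max": servers_per_blade * RAM_GB_RANGE[1],
    }


if __name__ == "__main__":
    print(farm_totals())
    print(blades_needed(2), blade_requirements(2))
```

At two terminal servers per blade, each blade would have to supply at least 8 cores and up to 16 GB of RAM for the guests alone, which gives a lower bound for the blade specification under these assumptions.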

Thanks!

Jacques

mreferre
Champion

Well, Jacques, that underlines what I was saying. While the chassis could have potential issues, you have experienced none (as expected) because the level of risk associated with a single chassis is very low.

It is in fact much lower than the level of risk associated with having a single big ESX server (where you have more points of failure and, more importantly, a single hypervisor running). That's why it's usually not suggested to have a single big ESX server (but at least 2 or 3 or more). Typically the discussion is "should I use 4 x 4-way servers or should I use 8 x 2-way servers?"; the discussion is typically not "should I use 4 x 4-way or should I use 1 x 16-way?".
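To put numbers on the "fewer big hosts vs. more small hosts" trade-off, here is a minimal sketch of the capacity lost when a single host fails. It assumes identical hosts and evenly spread load, which is a simplification; the function names are illustrative only.

```python
from fractions import Fraction


def capacity_lost_on_one_failure(hosts):
    """Fraction of total capacity lost when one of `hosts` equal hosts fails."""
    if hosts < 1:
        raise ValueError("need at least one host")
    return Fraction(1, hosts)


def max_safe_utilisation(hosts):
    """Highest average utilisation that still fits on the surviving hosts."""
    return 1 - capacity_lost_on_one_failure(hosts)


# Three layouts with the same 16 sockets in total.
LAYOUTS = {"1 x 16-way": 1, "4 x 4-way": 4, "8 x 2-way": 8}

if __name__ == "__main__":
    for name, hosts in LAYOUTS.items():
        lost = float(capacity_lost_on_one_failure(hosts))
        safe = float(max_safe_utilisation(hosts))
        print(f"{name}: lose {lost:.1%} on one failure, run at most {safe:.1%}")
```

Under these assumptions a single 16-way host leaves no headroom at all after a failure, while the 4-host and 8-host layouts lose a quarter and an eighth of their capacity respectively.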

As I said, there is however a difference between a single big ESX server and a single big mainframe. With all due respect to the VMware technology... we are not quite there.

Again and again, it is a trade-off: using fewer (bigger) physical servers has pros and cons, just as using more (smaller) physical servers does. This is a document I wrote 4 years ago on the subject; while the technology has changed, most of the criteria have not: http://www.redbooks.ibm.com/abstracts/redp3953.html?Open

TS has never been a shining workload on ESX. In fact, I think it's one of the few workloads that is known to perform very poorly. Typically the problem is that you exhaust CPU resources while supporting only a fraction of the users you would support on a physical system. 70 TS looks like a big farm to me, and it really depends on what bottleneck limits you to 40 users. If it's CPU, I doubt you'll have any luck moving onto VI3. The best thing would be to create the 71st server on the ESX farm and see what happens for real.

Massimo.

Rodos
Expert

>HP has identified a potential, yet extremely rare issue with HP BladeSystem c7000 Enclosure 2250W Hot-Plug Power Supplies manufactured prior to March 20, 2008.

>This issue is extremely rare; however, if it does occur, the power supply may fail and this may result in the unplanned shutdown of the enclosure, despite redundancy, and the enclosure may become inoperable.

Massimo, is that really playing fair? :) If I were to search through the support docs for many SANs, we could find numerous issues in hardware and software (I have seen that with EMC) that require a full SAN outage to repair (even with redundant SPs). Is this a reason to buy two SANs? (Some people do.) But I am not playing fair either: buying two SANs is a lot more drastic than simply getting a chassis or not getting one in the first place. But you get my point. Can we see a concentric circle coming?

Rodos

mreferre
Champion

I stated upfront I didn't want to bash HP with that (while I can say I laughed the first time I saw it months ago). I just brought it in for the purpose (and a data point) of the discussion.

The SAN discussion is interesting but it's tricky. You are somewhat forced to buy a single SAN vs. two SANs simply because dealing with 2 SANs (for HA reasons) is not transparent and not simple by any means. So while I agree that there are more potential outages with single SANs than there are with single chassis, most customers will have to accept that 1 SAN is good enough, but the same customers might argue that, since they have an easy way out of the single chassis "issue", they won't go with it (i.e. they will go with 2 x chassis or standard rack mounted servers).

Massimo.

TomHowarth
Leadership

The thing is, Mass, I am not used to finding many people older than me still doing this line of work 😄

If you found this or any other answer useful please consider the use of the Helpful or correct buttons to award points

LoneStarVAdmin
Enthusiast

I agree with Ken - the level of redundancy in the blade chassis (backplane, power supply, network & fibre modules, OBA, and cooling) makes it a highly available alternative to rack mount servers, provided your SAN fabric and Ethernet cabling were properly planned. We are running HP c7000s with 14 active BL490c blades, dual Xeon L5420, 32GB RAM, and no internal drives. Of the 6 power supply units installed, two have never had to be utilized and the chassis operates at around 1/3 power. We tested disconnecting an FC and Ethernet VC module and the failover to the other modules worked seamlessly.

Ken_Cline
Champion

>I stated upfront I didn't want to bash HP with that (while I can say I laughed the first time I saw it months ago). I just brought it in for the purpose (and a data point) of the discussion.

I have to agree w/Massimo here. We've known each other for a long time (and had good fun at each other's expense ;) ) - but, in general, he's not one to play the vendor card.

>The SAN discussion is interesting but it's tricky. You are somewhat forced to buy a single SAN vs. two SANs simply because dealing with 2 SANs (for HA reasons) is not transparent and not simple by any means. So while I agree that there are more potential outages with single SANs than there are with single chassis, most customers will have to accept that 1 SAN is good enough, but the same customers might argue that, since they have an easy way out of the single chassis "issue", they won't go with it (i.e. they will go with 2 x chassis or standard rack mounted servers).

I think that a Cisco switch might be a better comparison. If you look at a 6500 series switch, there is a tremendous amount of redundancy built into the chassis. Many customers are quite comfortable with only one of those beasts, but most "enterprise" customers will opt for two to provide redundancy. I think the same is true for the blade chassis. Most SMB or SME customers are going to be willing to accept the (minimal) risk associated with a single chassis. The customer who has a multi-million dollar IT budget is much more likely to deploy more than one - both because they are concerned about availability and because they simply have enough demand to require more than one to satisfy the workload.

Ken Cline

Technical Director, Virtualization

Wells Landers

TVAR Solutions, A Wells Landers Group Company

VMware Communities User Moderator

Ken Cline VMware vExpert 2009 VMware Communities User Moderator Blogging at: http://KensVirtualReality.wordpress.com/

Ken_Cline
Champion

>Yes what we plan to do is to have boot and storage volumes on the SAN.

>I am assuming that for VMotion to work we need to have at least the boot volumes there.

Are you referring to the host boot volume? If so, it doesn't matter where it lives for VMotion. If it's the VM boot volume, then yes - it (and all other VM volumes) must reside on a shared volume for VMotion to work.
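The storage condition described here can be modelled in a few lines. This is a toy sketch, not a VMware API: the host names, datastore names and the `can_vmotion` helper are all hypothetical, and it checks only the shared-storage rule, not any other VMotion requirement.

```python
def can_vmotion(vm_volumes, source, target, visible):
    """True if every datastore holding a VM volume is visible to both hosts.

    vm_volumes: mapping of volume name -> datastore name (hypothetical model)
    visible:    mapping of host name -> set of datastores that host can see
    """
    shared = visible[source] & visible[target]
    return set(vm_volumes.values()) <= shared


# Each host boots from its own local disk; that does not affect the check.
VISIBLE = {
    "esx1": {"local-esx1", "san-lun1", "san-lun2"},
    "esx2": {"local-esx2", "san-lun1", "san-lun2"},
}

vm_on_san = {"boot": "san-lun1", "data": "san-lun2"}
vm_on_local = {"boot": "local-esx1", "data": "san-lun2"}

if __name__ == "__main__":
    print(can_vmotion(vm_on_san, "esx1", "esx2", VISIBLE))
    print(can_vmotion(vm_on_local, "esx1", "esx2", VISIBLE))
```

In this model the VM with every volume on the SAN passes the check, while the VM whose boot volume sits on a host-local datastore does not.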

>I am just not sure what will give the best performance (i.e. should we RAID 0+1 or RAID5 for boot?). The storage volume will vary according to application.

In most cases, the choice of RAID level isn't going to make much difference to performance, particularly if it's simply a boot volume.

mreferre
Champion

Ken,

thanks for the first part... good example on the second..

Massimo.

admin
Immortal

I agree with Rodos. I have also seen a customer with two c7000 blade chassis, 6 blades in each. A firmware issue affected all switch modules simultaneously, instantly isolating all blades in the same chassis. Because they were the first 6 blades built, it took down all 5 primary HA agents. The VMs powered down and never powered back up. Because of this, I recommend using two chassis and limiting cluster size to 8 nodes to ensure that the 5 primary nodes will never all reside on the same chassis.

My point is that blades are a good solution but require special planning and configuration to do right.
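The reasoning behind the 8-node limit can be checked mechanically. The sketch below assumes cluster nodes are split evenly (round-robin) across two chassis and uses the figure of 5 primary HA nodes given above; the function names and host naming are made up for illustration.

```python
from collections import Counter

HA_PRIMARIES = 5  # number of primary HA nodes, as stated in the post


def chassis_at_risk(node_to_chassis, primaries=HA_PRIMARIES):
    """Chassis that hold enough cluster nodes to contain every primary."""
    counts = Counter(node_to_chassis.values())
    return sorted(c for c, n in counts.items() if n >= primaries)


def even_layout(cluster_size, chassis=("A", "B")):
    """Spread cluster nodes round-robin across the chassis (an assumption)."""
    return {f"esx{i + 1}": chassis[i % len(chassis)] for i in range(cluster_size)}


if __name__ == "__main__":
    for size in (8, 9, 12):
        print(size, chassis_at_risk(even_layout(size)))
```

With 8 nodes each chassis holds 4, so no single chassis can contain all 5 primaries. With 12 nodes (6 per chassis, as in the incident described) either chassis could end up holding all of them.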

Rodos
Expert

Virtek, great point!

You don't need to limit your cluster size, just reconfigure HA and it will spread the primaries and secondaries out again. It's a really good point though, so good that I did a blog entry on it and quoted you, hope you don't mind.

Rodos

admin
Immortal

Thanks for the reference in the blog. You are absolutely right, there are several ways to eliminate this SPOF. My actual recommendation to the customer was a choice to either limit cluster size (eliminating the potential for human error) or redistribute HA primary nodes.
