VMware Cloud Community
rayvd
Enthusiast

Linking IBM x Series servers. Advantages?

I'm considering the IBM x Series (x3850 M2, to be exact) to power our virtualization infrastructure. One somewhat appealing feature of these servers is that you can tether them together into one "node," so to speak (someone give me the real name for this technology!).

My question is, what sort of advantage would 4 systems tethered together and appearing as one node to ESX give me over four separate systems running four separate copies of ESX? Aside from manageability, is there any performance benefit? There don't seem to be any licensing cost advantages to doing it this way, either.

I like the IBM servers, I'm just not sure that this feature gives me anything....

Thanks...

13 Replies
gary1012
Expert

You'll get numerous opinions on whether to grow it tall or grow it wide. IMHO, it's a matter of personal preference. If you elect to link these units together, there are some best practices to address NUMA and node interleaving that you should be aware of. See pg 14 here.
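To make those two knobs concrete: node interleaving is the BIOS-level setting (leave it disabled so ESX sees the box as NUMA rather than one flat memory space), while per-VM placement goes through .vmx advanced options. Below is a minimal sketch in Python of scanning a VM's .vmx for that sort of NUMA-related option; numa.nodeAffinity and the sched.*.affinity keys are documented ESX advanced settings, but the helper name and the datastore path are just illustrative assumptions, not anything from the doc linked above.

    # Minimal sketch: list NUMA-related tuning options set in a VM's .vmx.
    # numa.nodeAffinity and sched.*.affinity are documented ESX advanced
    # settings; the file path below is only an example.
    def numa_settings(vmx_path):
        wanted = ("numa.nodeaffinity", "sched.mem.affinity", "sched.cpu.affinity")
        found = {}
        with open(vmx_path) as f:
            for line in f:
                key, _, value = line.partition("=")
                if key.strip().lower() in wanted:
                    found[key.strip()] = value.strip().strip('"')
        return found

    # Hypothetical datastore path; substitute your own VM's .vmx.
    print(numa_settings("/vmfs/volumes/datastore1/myvm/myvm.vmx"))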

Community Supported, Community Rewarded - Please consider marking questions answered and awarding points to the correct post. It helps us all.
AndrewSt
Enthusiast

The advantage of the 3850 M2 isn't clear to me.

We just had a sales call to discuss this with an IBM vendor. They said that one advantage is the ability to add a node, and ESX Server (as long as you have the licenses) will recognize the increased resources and begin using them without a reboot. I don't know if I believe that.

So, scale vertically instead of horizontally. Personal preference, plus the fact that the 3850 M2 ends up costing $2-4k more than the competitors (Dell, HP), which is hard to justify on the bottom line. Although, if we can verify that nodes can be added/removed from a running ESX server without a reboot, suddenly that uptime value becomes important (a quick check for this is sketched below).
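If we get eval hardware, this would be easy to test. A quick sanity check, sketched in Python: run it in the service console before and after attaching a node and compare. The assumption here is classic ESX (not ESXi), where the service console exposes /proc/vmware/sched/ncpus with the CPU counts the VMkernel sees; treat that path as an assumption to verify on your build.

    # Sketch: print the CPU summary the VMkernel reports. Run before and
    # after attaching a node; if the count goes up without a reboot, the
    # vendor's claim holds. Assumes the classic ESX service console, where
    # /proc/vmware/sched/ncpus exists (an assumption to verify).
    def vmkernel_cpu_summary(path="/proc/vmware/sched/ncpus"):
        with open(path) as f:
            return f.read().strip()

    print("VMkernel CPUs:", vmkernel_cpu_summary())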

-Andrew Stueve

-Remember, if you found this or any other answer useful, please consider the use of the Helpful or Correct buttons to award points
rayvd
Enthusiast

That would definitely be a plus -- especially given that these machines would be targeted for an HA environment.

Other reasons I've heard to use the x Series are that a) it's the fastest of the bunch (Dell, HP, IBM) and b) it uses less power, thanks to regular DIMMs vs. fully buffered ones. I'm not sure how much of this is smoke and how much of an improvement we'd really see in reality.

rayvd
Enthusiast

One other thought (brought up by a co-worker): does NUMA support only come with Enterprise? If it runs on Foundation and does support the "hot swap" addition and removal of nodes, could this be used as a poor man's VMotion? How robust is ESX's response to an unplanned removal of a NUMA node?

kcollo
Contributor

Currently, our setup is running 3 x3850 M2 servers as the main cluster, and the tier-2 cluster will soon be replaced with 3850s as well. The only problem I have had with them is that in ESX, the RAID card battery will throw an alert: VC will show it as an error, even though the status shows fully charged. This has been just a minor annoyance. Other than that, they have been performing amazingly. The servers have now been upgraded to 128 GB of RAM. We run them standalone, so they are not tethered together, and they connect over FC to 2 different storage arrays.

For one more benefit that I don't think the Dells will give you, google "3850 snoop cache". My favorite part is how modular the server is: separate memory boards for RAM, and 2 hot-swap PCI-X card slots. Here are a few links to posts referencing VMware and the 3850. When the replacement tier-2 servers come in, I do want to try to "tether" them.

http://blog.colovirt.com/2008/11/17/ibm-3850-m2-and-vmware/

http://blog.colovirt.com/2008/12/11/vmware-hardware-ibm-3850-m2-statistics/

Kevin Goodman

Linux / SAN / Virtualization

kevin@colovirt.com

http://blog.colovirt.com

meistermn
Expert

It is IBM's EX4 chipset, which has a Level 4 cache. HP, Dell, and Sun all have the same standard chipset.

The x3850 M2 also has a NUMA architecture for Intel servers. HP, Dell, and Sun do not have that; they only have NUMA architectures on their AMD servers.

The Intel architecture will change with the Nehalem servers.

It would be interesting to see new VMmark results with 64 or 96 cores from 4 x 3950 M2 (which is essentially an x3850 M2).

If you look at the VMmark results, you will see that the 16-core AMD Shanghai systems (e.g. HP DL585 G5 or Dell PowerEdge R905) scale best.

24 cores from Intel are not faster than 16 cores from AMD, and the AMD parts are only running at 2.7 GHz. So bad for Intel; Intel needs its new Nehalem platform.

32 cores from Intel and AMD don't scale well against 16 cores from AMD. AMD needs HyperTransport; Intel needs the Nehalem Beckton platform.

mreferre
Champion

Rayvd,

first off, there is no x86 operating system as of today (that I am aware of) that allows hot-plugging CPU resources on the fly. This is true for ESX as well, so adding a second node to the master would not work (software-wise). Plus, plugging in a new node requires a reconfiguration of the scalability configuration (through the RSA II interface), which requires the two nodes to boot together to "merge" into an 8-socket monolithic single system image.

In a nutshell, there are two main advantages in my opinion:

- Potential cable reduction. Given that the number of HBAs/NICs is not (likely) the bottleneck in your host, adding a separate 4S server to the cluster would force you to cable it to the backbone (FC/Eth). Scaling up an existing 3850 M2 node with another one lets you increase the number of VMs your cluster supports with the same number of ESX servers and without requiring new ports on the switches.

- Manageability, as you noticed: the fewer systems you have, the easier it is to manage them (this obviously has a downside, which is that each system becomes more critical; this is where the religious discussions usually come into play).

FYI, I am working on a presentation for VMworld Europe 2009 that discusses this very (religious) thing: scale up vs. scale out. See this: http://it20.info/blogs/main/archive/2009/02/04/175.aspx.

I'd like you to see the draft of the presentation so that you can give me your feedback (if you want to do so please send me a private email at my e-mail address).

Thanks. Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info
mreferre
Champion

I forgot to mention that if you use six-core CPUs you cannot merge 2 x 4S chassis, as VMware is limited to 32 pCPUs (vs. the 48 you would get out of such a setup: 2 chassis x 4 sockets x 6 cores).

This will change.

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info
rayvd
Enthusiast

Thanks for the reply. I am reading through your paper right now.

I'm curious what would happen if a member node in a NUMA system failed or went offline. What would happen to the ESX environment? Would all VMs crash, and would the ESX environment need to be reconfigured for just one node?

I'm just trying to get a feel for whether or not I absolutely need VMotion (it doesn't sound like NUMA can substitute for it, but as you describe in your paper, an inherently more HA server may make it less of a requirement).

So my advantages on the x Series are more along the lines of performance and power usage right now (vs. the R900-class servers in the Dell world).

Thanks!

richardmcmahon
Contributor

It would be interesting to know what happens if the master node needs maintenance work. I would expect this would mean the whole server would need to be shut down. If that is the case, you would need to build at least 2 sets of x Series boxes and still need VMotion to maintain uptime on your VMs, since pure HA would only crash-restart the VMs. A 2-node VM cluster would be a better architecture, since you are not putting all your eggs on a single node, even though it is made up of several discrete chassis.

Hope this is helpful

Richard

mreferre
Champion

The master role is typically associated with the chassis that holds the boot disks. If the master chassis fails, you won't be able to boot anyway. There have been discussions on how to implement a fully redundant config (i.e. SAN boot with a pair of FCAs distributed across different chassis), but the complexity associated with doing (and supporting) that is big (I won't get into the details).

I suggest you look at the 8S 3950 M2 as a BIG brick (compared to smaller bricks such as 2S / 4S) if you really want/need to scale up. I wouldn't say the 3950 M2 has inherent features that allow you to get rid of standard VMware features.

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info
dpomeroy
Champion

Massimo,

Do you ever get sick of doing the "Scale Up vs. Scale Out" presentations? :)

Don Pomeroy

VMware Communities User Moderator

mreferre
Champion

:)

Welcome back Don....

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info