VMware Cloud Community
siranar
Contributor

Planning for a big setup

Hi Everyone,

We are running one of the bigger dating sites on the internet, and are currently around the 200th most visited website worldwide. We have over 1 million unique hits per day, and about 120 servers.

I am considering moving to a fully virtualised infrastructure, based on blades and centralised storage. After reading a lot of information, we are heading in the direction of Dell/NetApp. More precisely, we are considering buying 10 Dell blade chassis with 160 blades in total. I am also considering running VMware on systems that are 100% loaded (we would be running just one VM per ESX host). I would love to hear some feedback on doing this and whether it's a good or bad idea. The extra price doesn't really matter - it's the comfort and manageability that do. Ideally we are going to make all 160 blades exactly the same - dual Xeons with 16 GB of RAM (RAM is cheap nowadays). I will briefly explain what we have right now and what we require:

mysql servers: 20 dual xeons with 4 15k rpm drives configured in raid 0, they are mostly at 100% load, and so are the disks (we need to get more servers)

webservers: 40 webservers, dual xeons with two 15K RPM drives, plus an NFS share which they are all connected to - moderate load

30 random machines which are running various things

6 machines that are running load balancers to spread the load for the above mysql and webservers

To rephrase, my main questions are:

Running VMware one-to-one (one VM per host) - yes or no? (price isn't an issue)

We are getting pricing for a NetApp system with about 200 disks (15K RPM drives) - should we use NFS for VMware?

Any remarks on the above and/or recommended best practices? If you think NetApp/Dell/NFS is a bad choice, please let us know. What we really want is flexibility, so that when there is more load on the site and we grow, expansion is a breeze. Also, rebooting and debugging remotely is a must, since driving to the datacenter is a waste of time and resources.

Thanks in advance!

25 Replies
PhilipArnason
Enthusiast

We are a 100% Dell shop and are in the midst of migrating data to NetApp storage. I really like NFS rather than the alternatives due to easier manageability. People will tell you that using iSCSI as a mount for your VMs offers the same performance, but we noticed significant performance advantages using NFS. SQLIO and Microsoft's Jetstress were our benchmarks.

We have space galore in our datacenter, so we didn't bother with blades. In an environment where you are short on rack space, I suppose they make sense. Dell has worked very well for us. I would strongly advise you to put more than 16 GB of RAM in your servers no matter what you get. In our environment we always run out of memory before CPU, despite the fact that all of our servers are running with 32 GB of RAM. I understand you are quite a bit more CPU intensive than us, but nevertheless I predict you will find you can go beyond one VM per host with VMware, despite the fact that you are running at 100% now. If you are getting quad-core, dual-processor servers, you are almost certainly going to be underutilizing them, because the maximum number of virtual CPUs a VM can have is four. The best way to figure this kind of stuff out is to just play around with it. Perhaps virtualize one of your SQL servers and then check the load of both the guest and the host.

As you go through the planning process, EACH ELEMENT will need to be tested for load. This means the load on the disks, the load on the network if you are using NFS, CPU load, memory load, and so on. This can't be stressed enough, and you are unlikely to get a simple yes or no answer out of anybody here. If the network ends up being too loaded, you can probably still use NFS by using 10 Gb NICs, or by using EtherChannel to bond NICs together. Both VMware and NetApp support 10 Gb NICs.
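For what it's worth, here is a rough back-of-the-envelope sketch (Python, with made-up IOPS and block-size figures you would replace with your own measurements) of the kind of network math I mean when sizing NFS links:

# Rough NFS bandwidth estimate per ESX host (illustrative numbers only).
GBIT = 1_000_000_000  # bits per second in one gigabit

def nfs_bandwidth_gbit(iops, block_size_kb, protocol_overhead=1.10):
    # protocol_overhead is an assumed fudge factor for TCP/IP + NFS headers.
    bytes_per_sec = iops * block_size_kb * 1024 * protocol_overhead
    return bytes_per_sec * 8 / GBIT

# Assumed workload for one busy guest: 3,000 IOPS of 16 KB requests.
need = nfs_bandwidth_gbit(iops=3000, block_size_kb=16)
print(f"~{need:.2f} Gbit/s per host")  # ~0.43 Gbit/s under these assumptions
print("fits a single 1 Gb NIC" if need < 1 else "plan for EtherChannel or 10 Gb")

Run the same numbers against your measured peaks rather than averages; that is usually what pushes people to 10 Gb or bonded links.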

At any rate, VMware, Dell, and NetApp should work well for your project if properly sized. Let us know how the testing goes - it sounds like an exciting project.

Philip Arnason

Texiwill
Leadership

Hello,

mysql servers: 20 dual xeons with 4 15k rpm drives configured in raid 0, they are mostly at 100% load, and so are the disks (we need to get more servers)

If these are at 100% load then they may not be good candidates for virtualization. It is possible, but most likely you will want a high-end SAN solution and not an NFS/iSCSI solution. How many VMs are you planning to expand this to? You want to look at memory, disk, network, and CPU utilization to make this determination.
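To make that concrete, here is a minimal sketch of the kind of per-server screen I mean; the thresholds are placeholders, not recommendations, and you would replace them with whatever headroom you decide you need:

# Crude virtualization-candidacy screen; thresholds are assumptions, not guidance.
def is_candidate(cpu_pct, mem_pct, disk_iops, net_mbit,
                 max_cpu=70, max_mem=75, max_iops=2500, max_net=500):
    # A server pegged near 100% on CPU or disk fails the screen and needs
    # either a dedicated host or a storage redesign before virtualizing.
    return (cpu_pct <= max_cpu and mem_pct <= max_mem
            and disk_iops <= max_iops and net_mbit <= max_net)

print(is_candidate(cpu_pct=100, mem_pct=60, disk_iops=4000, net_mbit=200))  # False: the MySQL case
print(is_candidate(cpu_pct=35, mem_pct=50, disk_iops=300, net_mbit=80))     # True: a lightly loaded box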

webservers: 40 webservers, dual xeons with two 15K RPM drives, plus an NFS share which they are all connected to - moderate load

Again, these may not make good candidates; look at CPU, disk, network, and memory utilization. You will need to state what 'moderate' means.

30 random machines which are running various things

These most likely make fine candidates.

6 machines that are running load balancers to spread the load for the above mysql and webservers

These tend to be heavily loaded, so they are not necessarily good candidates.

Running VMware one-to-one (one VM per host) - yes or no? (price isn't an issue)

Not sure; it depends on whether or not these are good candidates.

We are getting pricing for a NetApp system with about 200 disks (15K RPM drives) - should we use NFS for VMware?

A 4 or 8 Gb FC SAN in this case. You will be I/O bound.
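As a rough illustration of why I say I/O bound (the per-spindle figure is a ballpark rule of thumb, not a vendor spec, and the RAID penalty is an assumption):

# Ballpark aggregate IOPS for a 200-spindle, 15K RPM array (illustrative only).
SPINDLES = 200
IOPS_PER_15K_DISK = 180      # rough rule of thumb for one 15K RPM drive
RAID_WRITE_PENALTY = 2       # e.g. mirrored RAID; parity RAID is worse for writes

read_iops = SPINDLES * IOPS_PER_15K_DISK
write_iops = read_iops / RAID_WRITE_PENALTY
print(f"theoretical reads:  ~{read_iops:,} IOPS")       # ~36,000
print(f"theoretical writes: ~{write_iops:,.0f} IOPS")   # ~18,000
# Twenty MySQL servers already saturating local RAID 0 sets will eat a large share
# of this, which is why both the spindle count and the transport matter.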

Any remarks on the above and/or recommended best practices? If you think NetApp/Dell/NFS is a bad choice, please let us know. What we really want is flexibility, so that when there is more load on the site and we grow, expansion is a breeze. Also, rebooting and debugging remotely is a must, since driving to the datacenter is a waste of time and resources.

Well DRAC cards will help here....

What you have is a very large installation, and ESX will work, but you need to plan this rollout very carefully. I would run quite a few tests. First, take your system and compress it into ESX as a test case. 100% utilized systems do not always make good candidates... Read everything you can from the http://communities.vmware.com/community/vmtn/general/performance forum. There is a thread on how they got 2 million IOPS for a database. See how that was done.

As for testing, I would take your suite of applications and determine how they respond. Use more than one ESX host to achieve this. Also see how the load behaves when you have many VMs versus very few. The split of applications across ESX hosts makes a difference as well.

I support a site that gets 202 million queries a day on 5 hosts... It is designed to go to 3 times that amount. In essence, I would NOT virtualize this, as the network I/O is VERY intensive. I would test it on perhaps an 8-way quad-core server (HP has one) with minimally 128 GB of memory and a dedicated pNIC for each VM. Blades do not have enough pNICs for this setup.

Note that you will need a minimum of 8 pNICs per host to support the basic functionality, and I would suggest 4-way quad-core blades for your tests.


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.

CIO Virtualization Blog: http://www.cio.com/blog/index/topic/168354

As well as the Virtualization Wiki at http://www.astroarch.com/wiki/index.php/Virtualization

Ken_Cline
Champion

Edward, using a one VM per host ratio (as the OP suggests) will address many of the issues that you raise. The OP will be limiting the performance of each host (since a VM can use only 4 vCPUs and 64 GB RAM), but, if he's willing to accept that limitation and scale out as opposed to scaling up, he should be OK. His stated purpose for virtualizing is to gain the flexibility of VMs, and he states that the additional licensing costs are not a real concern. Given that, a well-architected solution should be able to scale out to support his needs. The biggest limitation is going to be the storage subsystem - and I agree with you that I would recommend an FC setup with either 4 Gb or 8 Gb fabric going to a robust backend. I'd suggest something that will scale well beyond what he sees as the original requirement, because chances are good the environment will grow faster than anticipated...

Ken Cline

Technical Director, Virtualization

Wells Landers

VMware Communities User Moderator

Ken Cline, VMware vExpert 2009, VMware Communities User Moderator. Blogging at: http://KensVirtualReality.wordpress.com/
ChrisDearden
Expert

It sounds like consolidation isn't the aim for this project, but rather the flexibility of provisioning new guests from a template and the HA features of ESX. With such a low guest-per-host density, running machines with vSMP shouldn't be a problem, even with zero CPU overcommitment. You may find that you can get more than one guest per host.

How much memory do your webservers usually use? They might be better candidates for consolidation, especially if you have hosts running quad-core CPUs (you didn't mention the age of your current hardware - is it multi-core?). I think FC would possibly be a better choice for the storage, as the other posters have mentioned, simply for the higher I/O. Building the environment out gradually, with plenty of validation at each stage, is clearly the way forward here.

If this post has been useful , please consider awarding points. @chrisdearden http://jfvi.co.uk http://vsoup.net
Texiwill
Leadership

Hello,

You may actually be able to get 2 VMs per host. But even so, network I/O and disk I/O will be an issue. You could also get more VMs per host if you were to split up the load among the VMs differently. It would take a fairly intelligent load balancer to do that, though.

The key would be to never overcommit a single resource. So if the blades were 4-way quad-cores, you may be able to place one 4-vCPU VM, three 2-vCPU VMs, or seven 1-vCPU VMs, granted the proper amount of physical memory and networking. You then end up with storage being the major issue.
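A trivial sketch of that no-overcommit arithmetic (the reserved-core allowance is an assumption; the figures I gave above are deliberately more conservative, because memory, networking, and the service console also take their share):

# Raw core-count ceiling for no-overcommit packing (illustrative only).
def max_vms(host_cores, vcpus_per_vm, reserved_cores=2):
    # reserved_cores is an assumed allowance for the service console/hypervisor.
    usable = host_cores - reserved_cores
    return max(usable // vcpus_per_vm, 0)

HOST_CORES = 16  # a 4-way quad-core blade
for size in (4, 2, 1):
    print(f"{size}-vCPU VMs per host: {max_vms(HOST_CORES, size)}")
# Prints 3, 7 and 14 respectively; real limits land lower once memory and pNICs are counted.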

I would want to see more utilization numbers and come up with a set of tests that could be used to stress the system before I would purchase anything.


Best regards,

Edward L. Haletky

VMware Communities User Moderator

siranar
Contributor

Hey Guys,

Thanks for the answers so far. Is there a reason why you want to do FC instead of NFS? We have heard so many good things about VMware on NFS, and it seems a perfect fit. I don't think I/O would be a problem as long as we have enough disks in the backend?

I'll try to come up with some current utilization numbers soon.

Thanks.

PhilipArnason
Enthusiast

We are running NFS and I'm extremely happy with it. The only downside that I can think of is that there is more of a fiber community out there, even though I believe it is a bit of a dying technology. I prefer to be bleeding edge in this regard, but others may wish to stick with fiber for the community support....

Philip Arnason

Texiwill
Leadership

Hello,

VMware did a performance test and got 2 million IOPS with Fibre Channel, granted the array was huge, with a large number of disks. That is an extremely impressive number, and it truly depends on the number of spindles. However, given that this is possible and that we are talking about a disk-intensive activity, fibre is generally the best approach.

Yet, as I stated, create a test set of VMs that you can then use to verify that everything will work, and have a bake-off between the storage vendors. You may be surprised. Your test should include enough web servers and database instances to really mimic your environment. You can then use this to determine the workload on each system and which storage works best for your VM load.
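If it helps, here is a very simplistic sketch of a probe you could start a bake-off with; the mount point is a placeholder, and a real test should replay something close to your actual MySQL and web workload rather than a synthetic loop:

# Minimal write latency/throughput probe against a mounted datastore
# (NFS, iSCSI or FC - it only sees a filesystem path). Illustrative only.
import os, time, statistics

def write_probe(path, block_kb=16, blocks=2000):
    buf = os.urandom(block_kb * 1024)
    latencies = []
    with open(path, "wb", buffering=0) as f:
        for _ in range(blocks):
            t0 = time.perf_counter()
            f.write(buf)
            os.fsync(f.fileno())  # force the write to the array, not the page cache
            latencies.append(time.perf_counter() - t0)
    os.remove(path)
    mb_per_s = block_kb / 1024 * blocks / sum(latencies)
    return mb_per_s, statistics.median(latencies) * 1000

mb_per_s, med_ms = write_probe("/mnt/candidate_datastore/probe.bin")  # placeholder path
print(f"~{mb_per_s:.1f} MB/s, median write latency {med_ms:.2f} ms")

Run it (or a real benchmarking tool driven the same way) against each vendor's array with the same VM layout, and compare latency under load, not just throughput.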

Some swear by NFS, others by iSCSI, and others by Fibre Channel. Only testing your environment on each of these will tell you what works best for your load.

As always with virtualization, there is a lot of 'it depends'.


Best regards,

Edward L. Haletky

VMware Communities User Moderator

azn2kew
Champion

Our client is extremely happy with their NFS/NetApp solution; the performance is great and it is very flexible. If you're looking to maximize this deployment, your main concerns will always be storage and network bandwidth. You can check out Neterion's product, a 10 Gb solution that allows you to manage bandwidth per application server or allocate the whole 10 Gb to the server itself. I believe it costs around US$1,000 per card; check them out for details. This will work really nicely for an NFS solution, especially with high-end NetApp gear. Also, if you talk with NetApp and VMware integrators, they will give you their best test reports and grant some demo gear if you request it. If your deployment project is fairly large, then asking NetApp for a live demo onsite isn't a problem.

If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!

Regards,

Stefan Nguyen

iGeek Systems Inc.

VMware, Citrix, Microsoft Consultant

meistermn
Expert

What happens if the 10 Gbit card is fully utilized? I think CPU 1, core 1 will use all of its resources. What does esxtop report for CPU 1?

williambishop
Expert

I'm going to throw in here. Since you'll be going from dual Xeons to quad-core Xeons (if I'm reading correctly), you will get more bang per server. Even using a 1-to-1 approach (and I know people who do, simply because management is extremely easy afterward), you will see a lot of benefits. Just being able to migrate a VM off for maintenance makes it worth the process alone. HA is icing on the cake.

Storage. Obviously, with the kind of load you're looking at, fibre is king. I see sales going up and performance going up, and no one I know who is actually in storage sees fibre going away anytime soon (in fact, it's commonly understood that for the enterprise environment it will go higher). 8 Gb fibre? You're not going to beat that with anything in the Ethernet world. No one seems to get that Ethernet means IP traffic: windowing, small payloads, retransmits... The more I hang around here, the more it surprises me when someone says "iSCSI or NFS will be faster". No, it will not. Not a chance. When performance is the number one criterion, it's FC all the way. Both of the others have their place, and if you needed to archive images to storage for radiology or the like, then yes, either of the others would be perfect. But if balls-to-the-wall performance is what you're after, you're going to be looking at FC. As already stated, price is no object, so do it right.

So don't discard fibre; in fact, I would recommend it. Make sure you configure small block transfer sizes, put 2 or even 4 paths to each server, set up load balancing, and watch the data fly.

You're really only looking at about 50 servers or fewer to carry the current load. If you do blades, you can probably get everything you need in a single rack, including the storage (but I would recommend spreading it out for power, weight, and heat purposes). In any case, this is a medium-sized setup, but with a good budget I'm sure you will be happy with the results either way you go. Test anyway; that is the key to a successful project.

--"Non Temetis Messor."
siranar
Contributor

What do you think about that, guys? ;-)

They say fibre, iSCSI, and NFS are almost comparable.

williambishop
Expert

I'd say you didn't read the entire document, or didn't read it carefully.

1. NFS is within 7-9% in one category (the category right under it is more telling). If you look in detail, you'll see that the other protocols (depending on the workload size) used upwards of 40-70% more CPU to get within 7-9% of FC (nearly 10% faster can be a crucial detail in a setup that requires performance, such as the one presented in this thread).

That CPU hit affects the VMs, which brings performance down. Keep in mind they didn't have these boxes loaded with other VMs or apps. Do you think you will get the same number of VMs on that host now? No, you'll get 40-70% less workload on the box compared to the FC transport alone.
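To put numbers on that argument (the overhead percentages are simply the 40-70% range quoted above, and the per-VM and host CPU figures are assumptions for illustration):

# How storage-protocol CPU overhead eats consolidation headroom (illustrative).
def vms_supported(host_cpu_pct, per_vm_app_cpu_pct, storage_overhead_pct):
    # Each VM costs its own application CPU plus extra CPU burned by the protocol.
    per_vm_total = per_vm_app_cpu_pct * (1 + storage_overhead_pct / 100)
    return int(host_cpu_pct // per_vm_total)

HOST = 90   # usable host CPU, assumed, leaving some hypervisor headroom
APP = 10    # assumed CPU cost per VM for the workload itself

print("FC baseline        :", vms_supported(HOST, APP, 0))    # 9
print("Ethernet, +40% CPU :", vms_supported(HOST, APP, 40))   # 6
print("Ethernet, +70% CPU :", vms_supported(HOST, APP, 70))   # 5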

I would also point out that for NFS and iSCSI, Ethernet is already set to a suitable size for 4 and 8 KB requests, and I don't see where they configured the FC setup (it's not specifically mentioned) to run smaller sizes. The higher you go, the bigger the difference is between FC and Ethernet-based protocols (go past 16 KB and you start calculating factors of increase, not percentages of a single factor).

We've been pointing this out, and this PDF actually bears it out... You get the same 4-lane highway, but with NFS or iSCSI you don't get the low latency, the low CPU utilization, and certainly not the performance above 8 KB that you get with fibre. So it's your choice: the Escort or the Porsche. The same road, different cars. The price difference is not enough for me to crawl behind the wheel of the Ford.

They took one bit of the paper, focused on it (for sales purposes, obviously), and presented that piece as the summary. Read closely through it again - don't just read their summary - then post your thoughts on it; make sure you read what they want to gloss over. They admitted very clearly that NFS is "a lower cost alternative". Their future is in NFS (which is an EXCELLENT protocol - don't think I'm dissing either protocol, I'm not) and iSCSI. Most of them are.

--"Non Temetis Messor."
Texiwill
Leadership

Hello,

This depends entirely on the load you are running, the fabric (switching network) involved, and the SAN/NAS in use. There are some extremely high-end SANs that will outperform most NFS/iSCSI servers today. But again, that depends on configuration more than anything.

I would not make a choice like this based on documentation alone. Have a bake-off between your various storage vendors using the load mix you choose. See which gives the best performance, fastest failover, and best business continuity capabilities.

I would devise a test suite using virtualized versions of your existing infrastructure, with enough load generators to really test the system. Start low, then really crank up the number of hits. Try multiple VMs per blade as well. But note, you may want to switch from blades to something else: if you use NFS you will need more network ports than most blades offer, plus most likely 10 Gb ports. You may be pushed out to 1U or 2U servers.
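As one concrete way to 'start low, then crank up the hits', here is a bare-bones sketch of a ramping load generator; the URL and concurrency levels are placeholders, and a real harness should replay representative queries against both the web and database tiers:

# Bare-bones ramping HTTP load generator (illustrative; URL is a placeholder).
import time, urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET = "http://test-frontend.example/"  # hypothetical entry point to the virtualized stack

def hit(url):
    t0 = time.perf_counter()
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()
    return time.perf_counter() - t0

def ramp(url, levels=(10, 50, 100, 200), requests_per_level=500):
    for conc in levels:
        with ThreadPoolExecutor(max_workers=conc) as pool:
            latencies = list(pool.map(hit, [url] * requests_per_level))
        print(f"{conc:4d} concurrent: avg {sum(latencies) / len(latencies) * 1000:.1f} ms")

ramp(TARGET)

Watch esxtop on the hosts and the array's own statistics while this runs; the point is to see where CPU, network, or storage saturates first at each step.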


Best regards,

Edward L. Haletky

VMware Communities User Moderator

siranar
Contributor

Thanks for your enlightening reply. The advantage of NetApp would be that we can do both Fibre Channel and NFS tests natively once we acquire the unit.

Do you have good experiences with NetApp and fibre? Or would you recommend against it?

williambishop
Expert

Actually, I love NetApp, but you stated budget was not an issue and maximum performance was the requirement.

--"Non Temetis Messor."
siranar
Contributor

Yes, 100% true. But I just wanted to make sure that fibre on NetApp runs as well as fibre from other storage vendors. :-)

So I assume you have good experiences with NetApp + fibre then?

siranar
Contributor

By fibre I mean Fibre Channel, of course. :-)

williambishop
Expert

I haven't done fibre on NetApp, but I've never heard any of my fellow storage geeks speak poorly of NetApp, and my Ethernet experience with it has been great. Ethernet is the future of storage; it's just not the Porsche that FC is... yet.

NetApp builds good storage; I don't think you would be disappointed (especially since most of it is FC internally).

--"Non Temetis Messor."