Suggestions, advice, etc. sought for how to justify upgrading host servers and SAN

tlyczko · 09-12-2013

I must report to our CFO, whose main interest and focus is finance, not IT.

I know we should upgrade our servers and SAN, and I would like ideas and suggestions on how to justify it beyond the following valid reasons:

1) Two servers are almost 5 years old, but they are already outdated and not on the HCL: they are HP DL390 G5 servers with DDR2 RAM, which is barely made or sold now. They are also almost due for a new round of Care Packs, and the same goes for the SAN.

2) The SAN remains on the HCL but needs a complete new set of disks to accommodate future storage; the existing disks are 300 GB 15k drives. 600 GB or 900 GB drives are possible, but they are only 10k drives, which reduces available IOPS. The SAN is an HP P2000 G3 iSCSI array, which is otherwise fine. It could be repurposed for a DR site.

3) One server is an HP DL360 G7, which could continue as a host server if need be...

4) The servers are not in a true N+1 setup: if the G7 dies, the two G5s cannot handle all the existing VMs, even with the absolute minimum number of VMs running. One G5 plus the G7 might work...

5) Upgrading to more current hardware enables using newer VMware features...

7) We are getting nearer to the SAN's storage limits. No over-provisioning messages have appeared, and as many VMs as possible are thin provisioned, but VMs and appliances seem to grow larger over time...

8) We are on VMware 5.0 U2; we would skip straight to 5.5...
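To put numbers on item 2, a common back-of-the-envelope approach is to multiply a per-drive IOPS figure by the drive count and discount writes by the RAID write penalty. The sketch below compares the current 300 GB 15k drives with 900 GB 10k drives on both usable capacity and front-end IOPS. The per-drive IOPS values (180 and 140), the 24-drive shelf, the two 12-drive RAID 5 groups, and the 70% read mix are all assumptions for illustration, not measured figures from this array.

```python
# Rule-of-thumb random IOPS per spindle. ASSUMED figures; use your vendor's numbers.
IOPS_PER_DRIVE = {"15k": 180, "10k": 140}

# Back-end I/Os generated by one front-end write for common RAID levels.
RAID_WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def functional_iops(drives: int, rpm: str, raid: str, read_fraction: float) -> float:
    """Front-end IOPS a set of drives can sustain for a given read/write mix."""
    raw = drives * IOPS_PER_DRIVE[rpm]
    write_fraction = 1.0 - read_fraction
    return raw * read_fraction + raw * write_fraction / RAID_WRITE_PENALTY[raid]


def usable_gb(drives: int, size_gb: int, raid: str, group_size: int) -> int:
    """Usable capacity when the drives are split into equal RAID groups."""
    groups = drives // group_size
    data_drives = {"raid10": group_size // 2,
                   "raid5": group_size - 1,
                   "raid6": group_size - 2}[raid]
    return groups * data_drives * size_gb


# Hypothetical comparison: 24 drives, two 12-drive RAID 5 groups, 70% reads.
for rpm, size in (("15k", 300), ("10k", 900)):
    print(rpm, size, usable_gb(24, size, "raid5", 12),
          round(functional_iops(24, rpm, "raid5", 0.7)))
```

With these assumed inputs the 10k option roughly triples usable space but delivers about a fifth fewer IOPS, which is the trade-off described in item 2; plug in your own measured read/write mix before presenting it.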

What other things should I consider and discuss to justify upgrading?

I'm sure I'm missing a lot of things to discuss...

Thank you, Tom

TheVMinator · 09-18-2013

Compare support for VAAI-based features on your current SAN with support on the new SAN. Enumerate all VASA and storage-related features (VADP, XCOPY, etc.) and compare how your old and new arrays support them. Spell out the performance and operational advantages.
Compare monitoring support on your old and new servers. Look at which capabilities are available for bringing hardware-level information into vCenter for monitoring through HP Insight Control, and at how newer servers can surface more information and facilitate monitoring and response.
Why are they not in an N+1 setup? That sounds like a problem of not having enough servers rather than not having new enough servers. If the issue is not having enough servers, then adding new servers to the existing cluster means you have to enable EVC on the cluster to be able to vMotion and use DRS across old and new hosts. If you are out of cluster resources, it is better to bring all the hardware to the same specs than to try to implement EVC to make old and new hosts work in the same cluster.
As far as getting nearer to the SAN's storage limits: are you looking only at space, or have you evaluated limits in terms of IOPS, bandwidth, throughput, and response time? I would evaluate ALL of these, know when you are going to hit the limit on each one, and present that date to your boss. You might run out of IOPS before you run out of space, and that creates an OUTAGE. Managers need to hear the word outage when a purchasing decision can prevent it.
The same goes for your drive speeds: know the aggregate limits of your disks and when you will run out of IOPS capacity as more VMs are provisioned.
If you are using a DR site, look at support for features such as array-based replication. Compare the remote replication features of the old and new arrays and translate them into meaningful business terms.
Consider whether you should thin provision at the VM level or at the array level. Study doing it at the array level on an array that supports it, as there are advantages to doing it on the array in some scenarios, depending on your environment and operational model. If the current array does not support it, make a case for it.
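One way to know when you are going to hit the limit on each resource is to project every metric forward at its observed growth rate and see which one runs out first. This is a minimal sketch assuming compound monthly growth; the metric names, current values, limits, and growth rates are hypothetical placeholders for numbers you would measure with vCenter performance charts or a capacity planning tool. Response time does not grow this predictably, so it is left out.

```python
import math


def months_until_limit(current: float, limit: float, monthly_growth: float) -> float:
    """Months until `current` reaches `limit`, assuming compound monthly growth."""
    if current >= limit:
        return 0.0
    if monthly_growth <= 0:
        return math.inf  # flat or shrinking demand never reaches the limit
    return math.log(limit / current) / math.log(1 + monthly_growth)


def first_limit_hit(metrics: dict) -> tuple:
    """metrics maps name -> (current, limit, monthly_growth); return the earliest limit."""
    runways = {name: months_until_limit(*vals) for name, vals in metrics.items()}
    name = min(runways, key=runways.get)
    return name, runways[name]


# Hypothetical figures; replace with measured peaks and tested array limits.
metrics = {
    "space_tb":        (6.0, 7.2, 0.02),
    "peak_iops":       (2800, 3300, 0.03),
    "throughput_mbps": (220, 400, 0.03),
}
for name, vals in metrics.items():
    print(f"{name}: {months_until_limit(*vals):.1f} months")
print(first_limit_hit(metrics))
```

With these made-up numbers, IOPS runs out in under six months, well before space does: exactly the kind of date worth putting in front of a manager.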

tlyczko · 09-18-2013

Many thanks for the detailed reply!!

It will take a while to review and apply it, particularly collecting the data. ☺

On the N+1 part: one server has 96 GB and the other two have 32 GB apiece, and it's not practical to buy more memory for the older two because it's old, expensive DDR2 RAM.
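Given these RAM figures (96 GB in the G7, 32 GB in each G5), a quick memory-only N+1 check shows which single host failures the cluster can ride out. This is a sketch: the 80 GB of VM memory demand and the 2 GB per-host hypervisor reserve are hypothetical, and CPU is ignored entirely.

```python
# RAM per host as described in the thread; the VM demand figure below is hypothetical.
HOSTS_GB = {"DL360 G7": 96, "G5 #1": 32, "G5 #2": 32}


def single_failure_check(hosts_gb: dict, vm_demand_gb: float,
                         reserve_per_host_gb: float = 0.0) -> dict:
    """For each host, True if the surviving hosts can hold all VM memory when it fails."""
    total = sum(hosts_gb.values())
    survivors = len(hosts_gb) - 1
    return {
        failed: total - size - survivors * reserve_per_host_gb >= vm_demand_gb
        for failed, size in hosts_gb.items()
    }


result = single_failure_check(HOSTS_GB, vm_demand_gb=80, reserve_per_host_gb=2)
print(result)                        # which single failures the cluster survives
print("N+1:", all(result.values()))  # N+1 only if every single failure is survivable
```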

Thank you again, Tom

TheVMinator · 09-18-2013

In the case of the compute cluster design, I would step back and think through your design. If you haven't already, check out Duncan Epping's "vSphere 5 Clustering Technical Deepdive". Use his arguments on HA slot sizes as the basis for developing a solid compute cluster HA design; his explanation will help make the case for consistently sized servers in a cluster. If you can't fail over all your VMs due to a lack of N+1 redundancy, document this as a risk in a clearly presented email. You can state that any VMs that can't be failed over (for example, if your largest server goes down) will experience an outage, and state the business impact of those VMs being down in terms of time and money, if known. Then present a solid cluster design that requires the new hardware going forward, and put in writing the risks of not implementing it with the required hardware.
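The HA slot-size logic can be approximated in a few lines for the "host failures the cluster tolerates" admission control policy: the slot is the largest CPU reservation and the largest memory reservation plus overhead, each host holds as many slots as fit, and the hosts with the most slots are assumed to fail. This sketch simplifies real HA behavior (which also subtracts host overhead, among other details). The 32 MHz default CPU slot is recalled for vSphere 5.x and should be checked, and the host capacities, per-VM overhead, and VM list are hypothetical.

```python
from dataclasses import dataclass

# vSphere 5.x uses 32 MHz as the CPU slot when no VM has a CPU reservation
# (stated from memory; confirm against the HA documentation for your version).
DEFAULT_CPU_SLOT_MHZ = 32


@dataclass
class VM:
    cpu_res_mhz: int
    mem_res_mb: int
    mem_overhead_mb: int  # per-VM virtualization overhead (hypothetical value below)


@dataclass
class Host:
    name: str
    cpu_mhz: int  # CPU capacity usable by VMs
    mem_mb: int   # memory capacity usable by VMs


def slot_size(vms):
    """Slot = largest CPU reservation, and largest memory reservation plus overhead."""
    cpu = max([DEFAULT_CPU_SLOT_MHZ] + [v.cpu_res_mhz for v in vms])
    mem = max(v.mem_res_mb + v.mem_overhead_mb for v in vms)
    return cpu, mem


def failover_check(hosts, vms, host_failures=1):
    """Return (ok, slots left) after losing the hosts that hold the most slots."""
    cpu_slot, mem_slot = slot_size(vms)
    per_host = sorted((min(h.cpu_mhz // cpu_slot, h.mem_mb // mem_slot) for h in hosts),
                      reverse=True)
    remaining = sum(per_host[host_failures:])
    return remaining >= len(vms), remaining


# Hypothetical cluster shaped like the one in the thread: one large host, two small ones.
hosts = [Host("G7", 20000, 96 * 1024),
         Host("G5-1", 16000, 32 * 1024),
         Host("G5-2", 16000, 32 * 1024)]
# 30 VMs; a single VM with a 4 GB memory reservation inflates the memory slot.
vms = [VM(0, 4096, 150)] + [VM(0, 0, 150) for _ in range(29)]
print(slot_size(vms), failover_check(hosts, vms))
```

Because the largest host is the one assumed to fail, only the two small hosts' slots remain, and one large reservation makes every slot expensive. Both effects feed the argument for consistently sized hosts.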

To help you collect data on things like IOPS and do predictive analysis, capacity planning software will help; the Enterprise trial version of vCOps (vCenter Operations Manager), or something similar, will get you started. vCenter Server data alone might be very hard to put together in a meaningful, convincing way in this scenario. Try the vCOps "what if" scenarios as a presentable explanation of what happens when your environment has 20% more VMs 6 months from now on your existing, unupgraded servers and storage array. If one exists, get the HP storage adapter for vCOps to bring the array's information into the same view. That may help your case for new storage hardware.

Josh26 · 09-18-2013

For a finance person, you'll need to work with him on calculating the cost of downtime.

At some point he'll say, "OK, so if that server fails and has no warranty, I'll need to spend $x to buy a part, and we'll be offline for one day." You say, "But they don't make them any more, so what you will have is a surprise hunt through eBay, which may or may not be fruitful. If it isn't, you're in for a surprise new server whether you want to buy one or not, and it may take weeks to get sorted out."

He'll inevitably feel that being without computers isn't so bad, so you start to ask, "Can we actually sell a product during that downtime, or are you paying the salaries of people who are just twiddling their thumbs?"
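The question in that exchange can be turned into a simple hourly model: idle payroll plus sales that cannot be made. All inputs here are hypothetical, and the two fractions are exactly the figures to agree on with finance; using gross margin instead of revenue is often more defensible.

```python
def downtime_cost_per_hour(staff_affected: int, loaded_hourly_wage: float,
                           productivity_lost: float, revenue_per_hour: float,
                           revenue_lost: float) -> float:
    """Rough hourly outage cost: idle payroll plus sales that cannot be made.

    productivity_lost and revenue_lost are fractions (0 to 1) agreed with the business.
    """
    idle_payroll = staff_affected * loaded_hourly_wage * productivity_lost
    lost_sales = revenue_per_hour * revenue_lost
    return idle_payroll + lost_sales


# Hypothetical inputs; the CFO should supply or sign off on every one of them.
hourly = downtime_cost_per_hour(staff_affected=120, loaded_hourly_wage=40,
                                productivity_lost=0.75, revenue_per_hour=5000,
                                revenue_lost=0.5)
print(hourly, hourly * 8 * 5)  # one hour, and one working week (5 days x 8 h)
```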

regnak2012 · 09-19-2013

Hi Tom,

It's a bit hard to talk finance when you're a techie. See if your reseller can help you put together a TCO case he will understand! The Care Packs that keep that equipment under maintenance go up after 3 years and keep rising after that, so get some figures and graphs showing the point at which you would be better off buying new now rather than running the old kit for another 2-3 years...
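A TCO comparison like this boils down to comparing cumulative costs year by year and finding where the lines cross. The yearly figures below are invented for illustration (they are not HP Care Pack prices); the reseller's real quotes would replace them.

```python
from itertools import accumulate


def break_even_year(keep_old: list, buy_new: list):
    """First year (1-based) in which buying new has cost no more in total, else None."""
    for year, (old, new) in enumerate(zip(accumulate(keep_old), accumulate(buy_new)), start=1):
        if new <= old:
            return year
    return None


# Hypothetical yearly costs in dollars, NOT real HP pricing.
# Keep old: Care Pack renewals rising each year, plus growing break/fix effort.
keep_old = [9000, 11000, 14000, 17000, 21000]
# Buy new: purchase up front, support bundled for 3 years, then renewals.
buy_new = [40000, 0, 0, 3000, 4000]
print(list(accumulate(keep_old)), list(accumulate(buy_new)))
print("break-even year:", break_even_year(keep_old, buy_new))
```

Plotting the two cumulative lists gives the "at what point" graph described above.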

If you have NO maintenance on key kit, then find out what the lead time is like on a replacement, add delivery, build time, testing, and out-of-hours overtime to replace the most critical or expensive component, and throw that on top of the outage cost to the business. Can people work without their IT servers? Go to the business and ask the managers directly, and ask them for costs...
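The no-maintenance scenario can be costed the same way: add up lead time, delivery, build, and testing to get the outage length, then add the business impact, the overtime, and the part itself. Every number in this sketch is hypothetical, and all hours are counted as business hours affected.

```python
def unsupported_failure(lead_time_h: float, delivery_h: float, build_h: float,
                        test_h: float, downtime_cost_per_h: float,
                        overtime_h: float, overtime_rate: float,
                        replacement_price: float) -> tuple:
    """Outage length and total cost of replacing a failed part with no maintenance contract."""
    outage_h = lead_time_h + delivery_h + build_h + test_h
    cost = (outage_h * downtime_cost_per_h   # business impact while down
            + overtime_h * overtime_rate     # out-of-hours staff time
            + replacement_price)             # the part or server itself
    return outage_h, cost


# Hypothetical: 3 business days to source a part, 1 day shipping,
# 8 h rebuild, 4 h testing, all expressed in business hours.
print(unsupported_failure(lead_time_h=24, delivery_h=8, build_h=8, test_h=4,
                          downtime_cost_per_h=2000, overtime_h=12,
                          overtime_rate=60, replacement_price=8000))
```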

Keep a record of issues, rate them by severity, downtime, etc., and show whether the systems are becoming less stable over time. Calculate the actual hours spent on fixing versus operational activity, and put a dollar figure on it!
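An issue log convinces best when it shows a trend. A minimal sketch, assuming each record is (date, severity, fix hours) on a made-up 1-3 severity scale: count incidents per calendar quarter (including quarters with none), fit a least-squares slope to see whether incidents are rising, and price the fix hours at a hypothetical hourly rate.

```python
from datetime import date


def quarter_index(d: date) -> int:
    """Consecutive integer per calendar quarter, so empty quarters are easy to fill."""
    return d.year * 4 + (d.month - 1) // 3


def incidents_per_quarter(incidents, min_severity=1):
    """Incident counts for every quarter from the first to the last incident."""
    idx = [quarter_index(d) for d, severity, _ in incidents if severity >= min_severity]
    if not idx:
        return []
    first = min(idx)
    counts = [0] * (max(idx) - first + 1)
    for i in idx:
        counts[i - first] += 1
    return counts


def trend(values):
    """Least-squares slope per period; a positive value means things are getting worse."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def fix_cost(incidents, hourly_rate):
    """Staff cost of the hours spent fixing rather than doing operational work."""
    return sum(hours for _, _, hours in incidents) * hourly_rate


# Hypothetical log: (date, severity 1-3, hours spent fixing)
incidents = [
    (date(2012, 2, 10), 2, 2), (date(2012, 8, 3), 1, 1), (date(2012, 9, 20), 3, 4),
    (date(2013, 1, 15), 2, 3), (date(2013, 2, 2), 3, 6), (date(2013, 3, 28), 1, 2),
    (date(2013, 5, 6), 2, 5), (date(2013, 6, 11), 3, 8), (date(2013, 8, 30), 2, 3),
]
counts = incidents_per_quarter(incidents)
print(counts, round(trend(counts), 2), fix_cost(incidents, hourly_rate=55))
```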

Then use this to justify a proper DR setup: if you get budget for new hardware, add DR to it, so that you are selling a business benefit along with the hardware refresh. Just pick a really warm climate for the datacenter, somewhere with a beach and remote Wi-Fi, and you're set!

Mike
