Sorry this is long; I just wanted to paint a picture of my environment and the issues that keep popping up. If you do read this, please reply and help. THANKS!
I need some feedback from all you VMware pros out there on running MS SQL databases and other apps in VMware ESX 4. We have been a VMware shop for 7 years now, and we have grown over the years and been through several upgrades. I'm a VCP and have been pretty much the sole administrator of our environment the entire time. I am not, however, a SQL DBA, so I consider myself fairly unknowledgeable when it comes to tuning and working with databases, particularly MS SQL, though I am not a total noob. This creates fairly constant friction with vendors and app guys.
Every other month or so I go through the usual routine of considering going physical with all of our busiest app servers and leaving VMware for the smaller servers, such as print servers. So far, however, we have always managed to keep virtualizing and growing. But every day I get the usual routine from app people and non-technical types about adding MORE CPUs and RAM every time there is a perception of slowness. Sometimes they come to me right out of the box wanting 8 CPUs and 8GB of RAM on every new server, but 2 weeks after moving into production I look at the performance numbers and see 5% CPU usage and 3GB of consumed memory. There are several boxes that do spike up for a few hours while a report runs, and if the numbers show we're bottlenecking I will add the resources. Typically I insist they start small and let us attempt to size a box correctly, but it's often a case of me versus the mob, and I am forced to 'give in'.
I do have a few allies, but I'm increasingly outnumbered as we have grown. I HAVE read the manuals and white papers and understand the majority of the docs out there. I have done my own testing and experimenting with various settings and have a fairly good working environment from my point of view. I have seen several very busy servers run amazingly well, which accounts for why we continue to stay virtual. However, I have seen times when the environment does not appear to perform as well as I expect, and there's always room for improvement. I feel that overcrowding and over-allocation contribute in a number of cases.
I have 3 blade environments, but in this case I will narrow it to 2 of the host types and give a quick rundown of what we're running.
The first is an HP c3000 blade enclosure with 8 BL460c blades: dual quad-core Intel boxes with 32GB RAM, 4Gb FC connections, and 1Gb Ethernet each.
The second is an HP c7000 blade enclosure with 4 BL680c blades: quad six-core Intel boxes with 40GB RAM, 8Gb FC connections, and 10Gb Ethernet each.
My SAN is a Compellent system with approximately 64 disks in 4 enclosures (shelves of disks); half are FC disks and the lower tier is SATA. Benchmark results depend on how the VM is set up, but in general I see 150-200MB/s in throughput tests, and roughly 3,000-4,000+ IOPS when my settings are geared for IOPS. Different tools report differently, so there is always some wiggle room.
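For what it's worth, those two numbers aren't contradictory: throughput and IOPS are tied together by the I/O block size, so the same array looks very different depending on which one the benchmark is geared for. A quick back-of-the-envelope sketch (the 8KB and 64KB block sizes below are just illustrative assumptions, not my actual test settings):

```python
# Rough relation: throughput (MB/s) = IOPS * block size (in MB).
# Block sizes here are illustrative assumptions, not actual test settings.

def throughput_mb_s(iops: float, block_kb: float) -> float:
    """Throughput implied by an IOPS figure at a given block size."""
    return iops * block_kb / 1024.0

# Small random I/O (OLTP-style test): high IOPS, modest MB/s.
print(throughput_mb_s(4000, 8))    # 4,000 IOPS at 8KB  -> 31.25 MB/s

# Large sequential I/O: far fewer, bigger I/Os hit the 150-200 MB/s range.
print(throughput_mb_s(3000, 64))   # 3,000 IOPS at 64KB -> 187.5 MB/s
```

So when a vendor benchmarks with large sequential reads and I benchmark with small random I/O, we can both be "right" about the same SAN.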
We just added the new c7000 blades, so I have just started to migrate. Before the upgrade, on the 8 x 8-way boxes I had 113 VMs, 275GB of RAM allocated, and 260 vCPUs allocated. With 256GB of physical RAM we had 275GB allocated, not too terrible an over-allocation, though I had to plead to upgrade physical RAM from 192GB when ballooning was starting to cause me misery. I do feel that having 64 physical CPUs and allocating 260 virtual CPUs seems a bit much, but the overall ready values and total CPU overhead of the hosts are not too bad, aside from occasional spikes.
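Here's the quick math I do when eyeballing overcommit on that old cluster, using the figures above (a sketch only; the ratios are mine, not official thresholds):

```python
# Overcommit ratios for the old 8-host cluster (figures from the post).
hosts = 8
pcpus = hosts * 8             # 64 physical cores (dual quad-core per host)
vcpus_allocated = 260
phys_ram_gb = hosts * 32      # 256 GB after the upgrade from 192 GB
ram_allocated_gb = 275

cpu_ratio = vcpus_allocated / pcpus         # ~4.06 vCPUs per core
ram_ratio = ram_allocated_gb / phys_ram_gb  # ~1.07x RAM overcommit

print(f"vCPU:pCPU = {cpu_ratio:.2f}:1")
print(f"RAM overcommit = {ram_ratio:.2f}x")
```

The RAM side is barely overcommitted; the ~4:1 vCPU ratio is the number I keep an eye on via ready time.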
Now for the 'get to the point' moment: I have once again been confronted with the 'we can't virtualize MS SQL' mantra. I can agree that physical servers DO run faster in benchmark tests, especially with direct-attached storage. However, we do officially run several databases on VMs and some on physical boxes, and yes, both have performance issues. I have done several benchmark tests myself and have even come up with a few tricks that can make my SAN smoke, if I sacrifice a little fault tolerance in the setup of the .vmdk. What I don't have is a ton of MS SQL virtualization experience to go with it. I will repeat: I have seen several DB boxes run just fine and operate to the satisfaction of users and admins. The issue often revolves around the 'when to go physical' question.
On several occasions vendors, DBAs, and app types have at one point or another whined that virtualization was the 'root cause' of the problem. Only after some painful weeks/months of going back and forth, jumping through hoops (and 'giving in' to increased resources in the face of irrationality), trying to prove that I have given them enough resources and that performance was within tolerable ranges, do they dig deeper and find their problem was some lame configuration issue. THIS HAS HAPPENED MANY TIMES and is quite frustrating!
I would like some feedback and experiences from those of you with the knowledge to steer me toward the answers I need to arm myself with when these issues arise over and over. I have searched for good docs and white papers on running databases on VMware (sizing of volumes, CPU counts, RAM considerations, IOPS and throughput) but have never really come up with real-world info.
Thanks again for feedback and reading this novel!