vheff
Enthusiast

Virtual machine technology getting unfairly blamed for performance issues!

Hi all,

I wonder how many of you have had the same troubles I'm having in my organization? I work for a government body, and we have our own IT development department. Over the past 8 months, we've been working on a high-profile project to provide a web-based portal, and as the infrastructure project lead it was my decision to go with VMware VI3. Since VI3 was introduced into our environment back in February this year, a couple of the lead developers haven't been too happy about using a VMware environment for their applications. However, I did manage to convince them that we can provide them with guaranteed resources using VI3 with resource pools and DRS.

A few times now when they've hit performance issues with their application, they've blamed our VMware environment and asked me to investigate. The virtual servers for this project are very well spec'd by the way: two web servers with 4 GB RAM each and 3 GHz Intel Xeon CPUs. When I produced some reports for them, they were shocked to see that the maximum memory utilisation on each server was no more than 15%. The CPU was a similar story, averaging less than 2%. Shortly after I produced these performance reports, it all went quiet and suddenly they had no performance issues! I later found out that it was a bug in their application.

This week, it's déjà vu! They have performance issues with the application again and they're blaming VMware, saying it doesn't have enough hardware resources available. I spent 5 hours last night investigating, testing, etc. and found no performance issues with RAM, CPU, network or disk at all! In fact, the virtual servers are over-spec'd in my opinion, and will never hit resource issues anyway. Even under their stress testing, memory utilisation didn't reach 50%.

It makes me angry, as I feel that they're very quick to blame a technology they don't understand, and it's my reputation at risk as I 'sold' them VMware VI3 for the project. Has anyone else experienced similar attitudes towards virtualization technology?

13 Replies
TomHowarth
Leadership

I find this to be common practice. Developers will always blame their tools and not their own code. Just keep putting the performance information under their noses and remember to copy in management. If need be, request an independent audit at their project's expense to prove your point. You have nothing to lose, as your stats back you up.

The last time this happened to me was on an SBC transformation project. We later found out that the issue had always been there, even when the clients were running thick. Offer them a P2V of the application server and see if the performance issue is still there. MS do it, and 9 times out of 10 they have egg on their faces.

I have rarely found a performance issue on VI3 that is caused by virtualisation per se. It is usually caused by under-specification of the VM hosts or inappropriate virtualised guests; from what you are saying, this seems not to be the case.

If you found this or any other post helpful please consider the use of the Helpfull/Correct buttons to award points

Kind Regards

Tom,

Tom Howarth VCP / VCAP / vExpert
VMware Communities User Moderator
Blog: http://www.planetvm.net
Contributing author on VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment
Contributing author on VCP VMware Certified Professional on VSphere 4 Study Guide: Exam VCP-410

vheff
Enthusiast

Thanks for the response, I'm glad I'm not alone here. It's a good suggestion regarding the use of external auditors, but I just hope it won't come to that! In fact, they even said themselves that another virtual web server which is in production is performing fine. They didn't realise that this server is actually a virtual machine running on the same ESX host!

oreeh
Immortal

They didn't realise that this server is actually a virtual machine running on the same ESX host!

Maybe you should tell them ;)

vheff
Enthusiast

I have told them. The ball is back in their court now ;)

derekn
Enthusiast

I run into this issue all the time. I deal with purchasing a lot of specialized banking apps, and most vendors say they don't support VMware. I ask why, and they always point to virtualization. I think you have hit the nail on the head: 9 times out of 10 it's sloppy coding, and they feel I have to justify getting a dedicated machine for their apps. I usually avoid that decision and just place these apps on a VM, and they never seem to notice. I usually just give the response of "yes, it's on a physical server". ;)

go easy...

-go easy

TLKern
Enthusiast

Welcome to the VM club. After 30 years of running virtual machines, I still haven't found a way to keep people from blaming VM technology for their poor performance. And when the project works out great, you and your right VM decision will never be mentioned in the reports to upper management.

My advice is to accept it, be prepared for it, don't take it too personally. Those people aren't worth the ulcers. (I knew I should have bought stock in Tums)

vheff
Enthusiast

30 years? Wow. I'm curious, I've only been working with virtual machines for 5 years. I'd love to know the history behind it all, and where / how it all started. Are you referring to mainframes?

petedr
Virtuoso

I see it all the time as well. It is always easier to point at someone else (in this case the virtualization) than to look at your own application when it comes to performance issues. Then, as the VMware admin, you have to prove that the server being a VM is not the cause of the performance issue. This is common in a lot of areas, I think. When I ran an Oracle environment I got the same thing from the developers there: the performance problems must be because the database is not tuned correctly, when in fact it was usually bad SQL.

www.thevirtualheadline.com www.liquidwarelabs.com

TomHowarth
Leadership

Those are the babies. LPARs, I love 'em

TLKern
Enthusiast

Yes, mainframes have had VM technology for quite a while. I logged onto my first virtual machine on June 23, 1976. I haven't stopped since (sort of in a rut here). Back then, we were running 3 production virtual machines modelling the world's weather and a half dozen interactive virtual machines to edit FORTRAN, look at output, submit jobstreams, etc. All on an Amdahl 470V6 with 1.5 MB of memory.

meistermn
Expert

1.) Have you looked for a high rate of context switches? (Use Process Explorer.)

source :

This is what I'd like to talk about: Memory hardware assists. With the new code-named Barcelona quad-core CPU due to be available in a few weeks with volume shipments in a few months, AMD is going to provide support for what they refer to as "Nested Page Tables" (or NPT for short) which is nothing but memory virtualization support.

A year ago at VMworld 2007, Sr Director R&D Jack Lo provided an illuminating session on the matter: VMware and Hardware Assist Technology (Intel-VT and AMD-V). This session provided a very interesting insight into the mechanisms that VMware is using today for memory virtualization (i.e. Shadow Page Tables), which are basically a software "fake" that allows guest OSes to pretend to have full control of the memory address space provided to them, while in reality it is the hypervisor that maintains full control of it. In fact, if you think about it, in a standard x86 world only one OS runs on the system, and it is that OS that keeps control of the hardware resources. In a virtual environment this stack is "screwed up", since the OS doesn't run on real hardware (and there are many OSes running on the system), so the hypervisor needs to create this software re-mapping of physical resources into the guest space. Mr. Lo also touched on future hardware assist technologies that should provide a performance boost in this area, and AMD NPT was in fact mentioned. The good thing is that the "future" at some point becomes the "present", and here we are.

The whole idea is that the processor itself can now keep track of these two levels of memory space (i.e. the one that the hypervisor sees and the one that each guest OS sees) without any software re-mapping being done within the hypervisor, as it is the CPU that maintains these multiple mappings in registers built into the silicon. What VMware has been suggesting lately is that while their "software binary translation" performs better than its silicon counterparts Intel-VT and AMD-V for CPU operations, Nested Page Tables will give a performance boost compared to their own "software shadow page tables" for memory operations. Without getting into the specifics, you should rest assured that VMware is going to pick up NPT support in future releases of the hypervisor in a timely manner. And no, if you were wondering, ESX 3.0.2 (which is the current version as of today) won't support NPT.
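To make the two levels of mapping a bit more concrete, here is a toy sketch in Python. It is only an illustration: the dictionaries, page numbers and function names are made up, and real page tables are multi-level structures walked by the MMU, not dictionaries. It shows why a precomputed shadow table has to be rebuilt by the hypervisor when the guest changes its own mappings, while a nested walk simply follows both tables at lookup time.

```python
# Toy model of two-level memory translation. All "addresses" are page numbers.
# Names and values are hypothetical, chosen only for illustration.

# Guest OS page table: guest-virtual page -> guest-"physical" page.
guest_page_table = {0: 10, 1: 11, 2: 12}

# Hypervisor mapping: guest-"physical" page -> real host page.
host_page_table = {10: 700, 11: 305, 12: 42}


def build_shadow_table(guest_pt, host_pt):
    """Software approach: the hypervisor precomputes the combined mapping."""
    return {gva: host_pt[gpa] for gva, gpa in guest_pt.items()}


def nested_lookup(gva, guest_pt, host_pt):
    """Hardware-assisted approach: follow both tables at translation time."""
    return host_pt[guest_pt[gva]]


shadow = build_shadow_table(guest_page_table, host_page_table)
print(shadow[1])                                            # 305
print(nested_lookup(1, guest_page_table, host_page_table))  # 305

# The guest edits its own page table. The shadow copy is now stale until the
# hypervisor traps the change and rebuilds it; the nested walk needs no fix-up.
guest_page_table[1] = 12
print(shadow[1])                                            # still 305 (stale)
print(nested_lookup(1, guest_page_table, host_page_table))  # 42
shadow = build_shadow_table(guest_page_table, host_page_table)
print(shadow[1])                                            # 42 after the rebuild
```

The rebuild step is the part that costs hypervisor CPU time in the software approach, and it is the part that hardware support is meant to remove.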

So when is this supposed to show big improvements? As always with performance-related things, it really depends on what you are doing. For the vast majority of CPU-intensive and/or IO-intensive workloads NPT won't make much of a difference. There are however some workloads that might gain huge performance benefits. Typically these are applications with specific memory patterns. This does not necessarily mean virtual machines with big memory footprints, but specifically virtual machines with a very high number of "context switches". A context switch occurs whenever a thread needs to hand control to another thread; at a high level, when this occurs the OS needs to save the volatile state of the exiting thread and load the previously saved volatile state of the next thread to be executed. On a standard physical system this is a procedure that the OS handles with the support of the processor, while in a virtual environment the guest OS tries to do the same, but instead of getting hardware support to achieve the context switch, the hypervisor traps the request and re-works it to fit the real system resources (well, what happens is more complex, but you get the point). This generates overhead, especially if you consider that you normally get hundreds if not thousands of context switches per second on a Windows system. NPT is all about getting rid of this software re-mapping and allowing a much more streamlined path from the guest to the physical resource without the hypervisor acting as the "man in the middle".
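If you want to put a number on the context switch rate yourself, a minimal sketch follows. It assumes a Linux guest, where the kernel exposes a cumulative counter on the `ctxt` line of `/proc/stat`; the function names are my own. On Windows, the equivalent figure is the Performance Monitor counter `System\Context Switches/sec`.

```python
import time


def parse_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def switches_per_second(count_before, count_after, interval_seconds):
    """Average context switches per second between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return (count_after - count_before) / interval_seconds


def sample_rate(interval_seconds=1.0, path="/proc/stat"):
    """Take two live samples on a Linux system and return the rate."""
    with open(path) as f:
        before = parse_ctxt(f.read())
    time.sleep(interval_seconds)
    with open(path) as f:
        after = parse_ctxt(f.read())
    return switches_per_second(before, after, interval_seconds)
```

Calling `sample_rate(5.0)` inside the guest gives the average rate over five seconds, which can then be compared against the "hundreds if not thousands" baseline mentioned above.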

I came across a situation a while back with one of our biggest customers reporting "performance" issues in a particular virtualized workload. This was an in-house built COM+ application. During the analysis it turned out that the system under stress at peak hours was generating between 20,000 and 30,000 context switches per second, which is obviously well above the average number of context switches you would find on a Windows box. Interestingly, the problem brought to my attention was not that the response time was unacceptable or that the application didn't scale. The problem was that the virtual machine(s) in question were performing fine (in terms of response time) but CPU usage was absolutely abnormal: where a 2-CPU physical system running the same workload showed an average of 5-10% CPU utilization with peaks in the range of 20-30%, the same workload in a 2-CPU VM showed an average of 30-40% CPU utilization with peaks in the range of 70-80%. And this was obviously not an overcommitted ESX host: it was an 8-way system and this was the only VM running on it at test time. My current speculation is that this workload imposes an extreme overhead on the hypervisor layer due to the very high number of context switches, and this in turn causes very high CPU utilization to handle the re-mapping. This is a circumstance where NPT would/could/should be a life saver (based on my speculations, of course).
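As a rough back-of-the-envelope check on those figures, the sketch below works out how much extra CPU time each context switch would have to cost if all of the additional utilization in the VM were down to context-switch handling. That attribution is only the speculation stated above, not a measured fact, and the midpoints of the quoted ranges are used purely for illustration.

```python
def cpu_seconds_per_switch(phys_util, vm_util, n_cpus, switches_per_sec):
    """Extra CPU time per context switch, if ALL extra load is switch overhead."""
    # Extra CPU-seconds consumed per second of wall-clock time.
    extra_cpu_seconds = (vm_util - phys_util) * n_cpus
    return extra_cpu_seconds / switches_per_sec


# Midpoints of the quoted ranges: 5-10% physical, 30-40% virtual,
# 20,000-30,000 context switches per second, on a 2-CPU machine.
overhead = cpu_seconds_per_switch(0.075, 0.35, 2, 25_000)
print(f"{overhead * 1e6:.0f} microseconds per context switch")
```

With these inputs the estimate comes out at roughly 22 microseconds per switch, which gives a feel for why tens of thousands of switches per second can add up to a visible CPU difference.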

2.) How do you configure Process Explorer to show context switches?

Look at the figures; the rest is in German.

3.) High kernel times caused by the application

Configure this in Task Manager.

4.) Maybe useful for testing

mreferre
Champion

Have I seen this? I have yet to see an enterprise (SMB is another matter) where I don't see this pattern.

BTW, this is the same reason why, by default, ISVs will tell you that "virtual machines are not supported". Think about it: for internal developers and ISVs, a hypervisor is just an additional layer that 1) won't buy them anything and 2) will actually introduce yet another variable into the overall picture. You bet they don't like it: x86 systems are so cheap that they will want you to buy "a new one" instead of sharing one. Yes, it will be a systems management nightmare, but who cares ..... they won't manage those systems ..... that's on you .... they don't care.

That's why it's important to go up to the person who has (management role) visibility of both applications and infrastructure. He/she will evaluate the pros and cons (and usually virtualization will win).

I find this whole story a little bit frustrating ...... at the end of the day they want to use their innovative software development technologies on our infrastructure and we don't say anything ..... can you imagine if we started telling them "no, you can't use AJAX on my infrastructure ..... C is the only programming language we support on our systems"? They need to understand that innovation happens at multiple levels ..... they just need to accept it ... and they will in the long run.

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info

tom_e_reynolds
Enthusiast

Keep the performance reports running, so when they start to cry wolf, you already have the data to justify. Eventually it will pass.

Also, don't tell them more than they need to know. I had a similar experience where the silly developers were comparing the "poor" performance of the virtualized box to the "great" performance of a "real" server. They just stormed out of the room when I showed them that both boxes were on the same server, as they were both virtualized!

Keep notes on your ROI. How much are you saving in dollars? That's what management will be watching in the long run.
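For the "keep the performance reports running" part, something as small as the following sketch can turn raw utilisation samples into a summary you can put in front of people. The sample values, the 80% threshold and the function name are all made up for illustration; the samples would come from whatever export your monitoring tool provides.

```python
from statistics import mean


def summarize(samples, threshold=80.0):
    """Summarize utilisation samples (percentages) for a report."""
    if not samples:
        raise ValueError("no samples")
    peak = max(samples)
    return {
        "average": round(mean(samples), 1),
        "peak": peak,
        # Number of samples above the threshold.
        "over_threshold": sum(1 for s in samples if s > threshold),
        # True if the peak ever exceeded the threshold.
        "constrained": peak > threshold,
    }


# Hypothetical CPU readings (percent) for one VM over a test run.
cpu_samples = [1.5, 2.0, 1.8, 2.4, 1.9]
report = summarize(cpu_samples)
print(report)
```

A report where `constrained` is false for CPU and memory across the whole test window is the kind of data that answers the "not enough hardware resources" claim before it is even raised.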
