VMware Cloud Community
stevan
Contributor

Big performance difference virt. Linux vs virt. Windows guest and physical

Hello all

I have a small question:

I have ESX 3.0.1 running over here with several hundred VMs. Now my big problem is that when I run performance tests against an HTTPD server, I get the following values:

Virtual Linux vs physical Linux:

The physical system delivers about 3.5 times more connections per second than the virtual Linux system.

Virtual Windows vs physical Windows:

The physical system delivers about 0.3 times more (i.e. roughly 30% more) connections per second than the virtual Windows system.

Virtual Linux vs virtual Windows:

Windows outperforms Linux by a factor of about 2.7.

Physical Linux vs physical Windows:

Linux outperforms Windows by a factor of about 1.2. Using lighter HTTP servers on Linux leaves Windows even further behind, but that is not important here.

I don't care whether Linux is faster or slower than Windows. And I know that you cannot compare them 100%. I know as well that physical cannot be compared to virtual, and that many factors play a role in the performance. But I have run those tests many times and the result is more or less the same. So it must be something in VI3 that makes a virtual Linux system too slow. I don't know what it is, but I have a hard time accepting that the difference between the virtual Linux guest and physical Linux is so huge, especially since those tests were done under very special conditions (one VM having one ESX server exclusively).
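To make clear what I mean by connections per second: each run simply counts how many complete HTTP request/response cycles per second the server handles. Below is a very simplified sketch of that kind of measurement, only as an illustration and not the actual load generator I use; host, port and duration are placeholders, and a real test would of course run many clients in parallel.

#!/usr/bin/env python3
"""Simplified sketch of a connections-per-second measurement.

Illustration only, not the real benchmark tool: TARGET_HOST, TARGET_PORT
and DURATION are placeholders, and this runs a single client thread.
"""
import http.client
import time

TARGET_HOST = "10.0.0.10"   # hypothetical address of the HTTPD under test
TARGET_PORT = 80
DURATION = 30               # seconds per measurement run

def measure():
    deadline = time.time() + DURATION
    connections = 0
    while time.time() < deadline:
        # One full connection per request: connect, GET, read, close.
        conn = http.client.HTTPConnection(TARGET_HOST, TARGET_PORT, timeout=5)
        conn.request("GET", "/")
        conn.getresponse().read()
        conn.close()
        connections += 1
    return connections / DURATION

if __name__ == "__main__":
    print("connections per second: %.1f" % measure())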

Does anyone know any special trick to make the Linux guest perform better? Or does anyone know the reason for this enormous difference?

5 Replies
epping
Expert

can you see where the contention is, why is it going slower??

stevan
Contributor

can you see where the contention is, why is it going slower??

No. Currently I don't see the reason for the slowness. It is very difficult to spot the problem, but I would like to know what the reason for it is. I know that we have a problem with the underlying HW (the NIC) and VI3 not playing nicely together. It is a bug in VI3 and we are in escalation with VMware (we have some sort of workaround but no real fix for that NIC issue). But even if I test on the loopback interface, I still get slow results.

There are so many things to look at. It is not that easy to say with 100% certainty that VI3 is responsible for the performance hit. However... today I tested W2K3 again and it does not seem to be as fast as the last time one of our team members ran the test. In fact, today I am faster with a Linux guest than with a W2K3 guest. But I have to confess that I took the extreme approach now: I installed Gentoo in one VM and tried to max out the speed as much as I could. And this Gentoo VM is at least 2 to 6 times faster (depending on what I test in the HTTP stack) than Red Hat AS 4 update 2.

I really don't care so much about the fastest of the fastest. I know that a well-tuned Gentoo system will more or less always leave a Red Hat system in the dust; that is not my point. My biggest point was that the W2K3 system showed almost no difference between virtual and physical (which I proved today not to be the case), while RH4u2 physical vs. virtual differed by a factor of at least 3.5 and up.

I strongly suspect the VMware Tools to be the problem, but I cannot say that with 100% certainty.

Using Gentoo as a test option was a good thing (although it was anything but easy to get the vmxnet and other drivers to work under Gentoo with a 2.6.20.1 kernel). I could definitely rule out some myths. I think that someone reading this message will ask themselves what my problem is, and say that I should be happy with getting double the speed by using Gentoo. But that is not an option. I was hoping to get closer to the speed of the physical system (and to tag the RH system as the problem, at least in the virtual environment). But I am not getting there, not even with Gentoo as the only VM in one ESX blade. The physical system is still twice as fast as the virtual one. I would love to see at most a factor of 1.3 in that constellation (one ESX server and only one VM on that ESX server), but I don't see that. Currently I see a penalty of 50% and more when I use a virtual system.

And I do just simple HTTP stuff. Nothing fancy, nothing big in CPU consumption and nothing heavy on IO. The only big usage is memory, but I have 16GB on that ESX server and my 32-bit VM only uses around 4GB. So memory is not the problem. It can't be, not if the VM is the only VM on the ESX server. It has the whole ESX server for itself; it does not need to share anything with another VM. Only the ESX server and one VM. Why then this insane difference in performance? What is going on? What is the problem? How do I spot the problem? How do I isolate certain problem areas?
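One thing I still want to try in order to spot the problem is recording esxtop in batch mode from the service console (something like esxtop -b -d 5 -n 60 > capture.csv) and then looking at the CPU ready time of the VM's resource group. A rough sketch of how such a capture could be post-processed is below; I am assuming here that the interesting columns contain "% Ready" in their header, which may differ between ESX versions, so treat it only as a starting point.

#!/usr/bin/env python3
"""Rough sketch: scan an esxtop batch-mode capture for high CPU ready time.

Assumes the capture was made with something like `esxtop -b -d 5 -n 60 > capture.csv`
and that the columns of interest contain '% Ready' in their header (column
naming varies between ESX versions).
"""
import csv
import sys

def max_ready(path):
    worst = {}
    with open(path, newline="") as f:
        rows = csv.reader(f)
        header = next(rows)
        # Pick out every column whose header mentions CPU ready time.
        ready_cols = [i for i, name in enumerate(header) if "% Ready" in name]
        for row in rows:
            for i in ready_cols:
                try:
                    value = float(row[i])
                except (ValueError, IndexError):
                    continue
                if value > worst.get(header[i], 0.0):
                    worst[header[i]] = value
    return worst

if __name__ == "__main__":
    for column, value in sorted(max_ready(sys.argv[1]).items()):
        print("%6.1f  %s" % (value, column))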

Any help or hint would be appreciated.

CWedge
Enthusiast

What kind of system are you using?

On the ESX side, if you are using a 4-CPU host, you'll want to manually assign 4 vCPUs to the VM and set their affinity to the respective physical processors to get the best performance.
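In the VM's .vmx file that comes down to entries along these lines (just a sketch; the CPU numbers are only an example, and the exact option names should be double-checked against the ESX 3 documentation):

numvcpus = "4"
sched.cpu.affinity = "0,1,2,3"

The VI Client also exposes the scheduling affinity under the VM's CPU resource settings, so you don't have to edit the file by hand.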

stevan
Contributor

What kind of system are you using?

HP Blades (BL25p and BL45p)

On the ESX side, if you are using a 4-CPU host, you'll want to manually assign 4 vCPUs to the VM and set their affinity to the respective physical processors to get the best performance.

Hmm... I was hoping I wouldn't have to go that way. But I will give it a try and see whether it changes the performance drastically.

Thanks for your answer.

BigHug
Enthusiast

The VM is single vCPU, right? The CPU penalty is around 0-6% according to VMware's reports, and I got similar results (3-8%). The network overhead is also 3-6% according to VMware. The I/O will probably take some hits; I haven't really measured it yet. Since you stress-test the HTTP server, if the server log is being written there might be heavy I/O.

I'd bet on the network. Some people have problems pumping data through the network of a VM. You might need to break the setup down into components and test them one by one. I would test the network first.
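For the network piece, a raw TCP transfer between the VM and a physical client already tells you whether the virtual NIC path is the bottleneck, before the HTTP stack even comes into play. A minimal sketch is below (port and transfer size are arbitrary placeholders); run it in server mode inside the VM and in client mode on a physical box.

#!/usr/bin/env python3
"""Minimal raw TCP throughput check, to separate the network path from the
HTTP stack.  Run `... server` inside the VM and `... client <vm-ip>` on a
physical client (port and transfer size are arbitrary placeholders)."""
import socket
import sys
import time

PORT = 5001
CHUNK = 64 * 1024          # 64 KiB per send
TOTAL = 512 * 1024 * 1024  # transfer 512 MiB per run

def server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(1)
    conn, addr = srv.accept()
    received = 0
    while True:
        data = conn.recv(CHUNK)
        if not data:
            break
        received += len(data)
    print("received %d MiB from %s" % (received // (1024 * 1024), addr[0]))

def client(host):
    sock = socket.create_connection((host, PORT))
    payload = b"x" * CHUNK
    sent = 0
    start = time.time()
    while sent < TOTAL:
        sock.sendall(payload)
        sent += len(payload)
    sock.close()
    elapsed = time.time() - start
    print("%.1f MB/s" % (sent / elapsed / 1e6))

if __name__ == "__main__":
    if sys.argv[1] == "server":
        server()
    else:
        client(sys.argv[2])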
