devinnate
Contributor

ESXi 4.0: poor (one-quarter) performance of a Linux shell-script workload - specific example given


Hi everyone,

Basics:

Our VMware environment is ESXi 4.0 on an IBM x3650 with dual quad-core CPUs, 32 GB of RAM, and six 450 GB 15k RPM local drives. There is a single VMware guest, which is idle except for this test. The guest is a fully patched RHEL 5.3 x86_64 host. Best practices have been followed: the current release of VMware Tools is installed, the vmxnet3 driver is used, the kernel boot parameters are notsc divider=10, all six drives are in a RAID 10 array, and the I/O elevator is set to noop.
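For anyone who wants to double-check the same tuning from inside the guest, here is a quick sketch (the sda device name is just an example; substitute your own disk):

```shell
#!/bin/sh
# Confirm the kernel boot parameters and the active I/O elevator
# described above, from inside the RHEL guest.
grep -Eo 'notsc|divider=10' /proc/cmdline
cat /sys/block/sda/queue/scheduler   # the active elevator appears in [brackets]
```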

For performance comparison, the above VMware environment is facing off against a five-year-old IBM x336 with an identical copy of RHEL installed.

Scenario:

We have some shell scripts which perform a rather complicated software install. Without going into too much detail, the scripts make a ton of calls to subprograms (e.g. VAR=$(echo $line | cut -f1 -d:)). This script runs horribly slowly in the new, otherwise idle VMware environment: less than half, perhaps only a third, of the speed of the five-year-old physical server. Since the operation takes 15 minutes at the best of times, it now takes 30-60 minutes to complete on a brand new server.

As a test case, I wrote this script (yes, I know "let i=i+1" would be faster, but that would not be reflective of the actual workload):

#!/bin/sh

i=0
while [ $i -lt 1000 ]
do
    i=$(echo "$i+1" | bc)
done

I then run this script (which simply increments i from 0 to 1000) and time it. I time it with an external timer because I know VMware can skew in-guest clocks a bit. Nevertheless, the results paint the picture: the script takes about 8.5 seconds on the guest OS and about 2.25 seconds on the physical host. Basically, it's dramatically slower.

I've been reading the VI 3 and vSphere 4 performance best practices documents. I've read that binary translation could potentially be causing a problem, or that fork() into /bin/sh is slow under VMware (some strange article re Zap?), but I'm curious whether anyone has had similar experiences and/or recommendations. For the Linux guys out there, on both RHEL and other distributions, does the above script also take 8-10 seconds for you?

For the VMware folks, is there anything about bash (/bin/sh) that makes it slow, especially the $() substitutions which invoke /bin/sh over and over? I tried replacing the $(...) assignment above with "let i=i+1" and the script completes in less than a second, and I tried adding a /bin/true into the execution mix and it had minimal impact on speed (trying to test the fork() / exit() combo). I know re-writing this particular program/script is an option, so please don't write to say to do that.
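For anyone wanting to reproduce the comparison, here is roughly the pair of loops I'm describing in one runnable sketch (coarse one-second resolution via date, since in-guest clocks can't be fully trusted anyway; requires bc):

```shell
#!/bin/sh
# Same 0-to-1000 loop two ways: shell-builtin arithmetic (no fork) versus
# a $() command substitution that forks a subshell and execs bc each pass.

start=$(date +%s)
i=0
while [ $i -lt 1000 ]; do i=$((i + 1)); done
echo "builtin arithmetic: $(( $(date +%s) - start )) s"

start=$(date +%s)
i=0
while [ $i -lt 1000 ]; do i=$(echo "$i+1" | bc); done
echo "fork-per-iteration: $(( $(date +%s) - start )) s"
```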

Any feedback welcome, thanks

1 Solution

Accepted Solutions
drummonds
Hot Shot

MMU-intensive operations like fork/exec run much faster on AMD CPUs with RVI, which we support with ESX 3.5, and Intel CPUs with EPT, which require vSphere.
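(A quick way to see whether a particular chip has these features, from any Linux install on the same hardware: recent kernels expose Intel EPT as the 'ept' CPU flag and AMD RVI/NPT as 'npt' in /proc/cpuinfo. A rough sketch; flag names can vary with kernel version:)

```shell
#!/bin/sh
# Look for hardware-assisted MMU virtualization flags in /proc/cpuinfo:
# 'ept' = Intel Extended Page Tables, 'npt' = AMD Nested Page Tables (RVI).
if grep -qw ept /proc/cpuinfo 2>/dev/null; then
    echo "Intel EPT present"
elif grep -qw npt /proc/cpuinfo 2>/dev/null; then
    echo "AMD RVI/NPT present"
else
    echo "no hardware MMU virtualization flag found"
fi
```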

Scott

More information on my blogs and on Twitter:

http://communities.vmware.com/blogs/drummonds
http://vpivot.com
http://twitter.com/drummonds

5 Replies
drummonds
Hot Shot

The test that you are running spends the majority of its time in the fork() and exec() system calls. We call tests that focus on system calls microbenchmarks. It is possible to find microbenchmarks that take even ten times longer in a virtual environment than on physical hardware.

But system calls are a very small part of real applications. Code other than system calls usually runs in user space at speeds that match native. So consider as an example a hypothetical application that runs in 100 s and spends 2% of its time in system calls, and suppose those system calls take 500% of their native time when virtualized. The user code will still take 98 s, but the system calls that previously took 2 s will now take 10 s. The application now takes 108 s, which is still better than 90% of native speed.

In short, it is possible to find microbenchmarks that show huge slowdowns. But these tests are not representative of real applications.

Scott

devinnate
Contributor

Thanks for the reply. While I totally agree with the general concept you describe, the problem is that I wrote the posted test case specifically to demonstrate what a real-life application is doing, and the substantial performance decrease we're seeing in that real app.

Re-worded: are you, or the VMware community, aware of any tuning parameters or options which would make fork() and exec(), or fork() into /bin/sh, operate faster? This particular app makes hundreds of thousands of calls like that (and yes, I appreciate that re-writing the app is an important longer-term fix); I'm wondering whether any of the several hundred or thousand options in ESXi would help.

Oh, as an interesting note: the more vCPUs I add to the machine, the slower the test case becomes. That is consistent with the VMware docs (and no, I cannot demote this machine to a single vCPU; other aspects of the application need the extra cores).

Thanks, more feedback will be appreciated.

devinnate
Contributor

Hi Scott,

Thanks for the info; it was extremely helpful. It led me to an excellent document from VMware's VROOM blog about Intel EPT and AMD RVI. The test cases presented in that document appear nearly identical to those I presented.

I have 2 follow up questions:

1. When you mention vSphere above, I assume ESXi 4.0 has the feature at all levels, from free through Enterprise Plus?

2. Are you (or anyone) aware of any good documentation on specific operations/functions that are MMU-intensive or which are known to be slow on VMware without EPT?

I would not have known that fork and exec are MMU-intensive.

Thanks very much.

drummonds
Hot Shot

> 1. When you mention vSphere above, I assume ESXi 4.0 has the feature at all levels, from free through Enterprise Plus?

Correct.

> 2. Are you (or anyone) aware of any good documentation on specific operations/functions that are MMU-intensive or which are known to be slow on VMware without EPT?

We have not generated any good documentation on this, unfortunately. The rule of thumb for heavy MMU activity is to watch for many processes. Applications like XenApp instantiate a handful of processes for each desktop, so a large number of desktops increases the process count linearly, which can increase context switches (a type of MMU activity) much faster than that. Linux scripts like yours that loop quickly and invoke shell commands also create and destroy many processes; 'configure' and 'make' are prime examples of this behavior. Some applications (Apache, and I believe Oracle DB) can be run in thread mode or process mode. Lacking their own memory space, threads make context switches less painful, so threaded mode is desirable when RVI or EPT is not present. We believe the threaded modes of these applications are superior, but we have not quantified the difference.
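(If you want to see this process churn on a running Linux guest, the kernel keeps a cumulative fork counter; sampling it is a quick gauge. A rough sketch, relying on the 'processes' line in /proc/stat:)

```shell
#!/bin/sh
# The 'processes' line in /proc/stat counts forks since boot; sample it
# twice, one second apart, to estimate the process-creation rate.
before=$(awk '/^processes/ {print $2}' /proc/stat)
sleep 1
after=$(awk '/^processes/ {print $2}' /proc/stat)
echo "forks/sec: $((after - before))"
```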

Scott
