VMware Cloud Community
ineya2
Contributor

vmware ESX and slow syscalls

We have two physical machines in the company. Both have the same hardware configuration, running the same CPU:

Intel(R) Xeon(R) CPU E5420 @ 2.50GHz

One runs regular Linux; on the second we have ESX, version 4.

On ESX we run a Linux guest that should be almost identical to the Linux on the first machine.

The kernel version is (a bit old these days, but needed because of an old project):

Linux x 2.4.21-53.ELhugemem #1 SMP Wed Nov 14 03:46:17 EST 2007 i686 i686 i386 GNU/Linux

The problem is that the virtualized Linux runs slower. I have read that the overhead should be ~8%, which is something I could live with. But this performance hit is visible to the naked eye.

I made 2 test programs:

The first just does intensive work in user space (a giant loop counting numbers). Here the performance hit is around 8-10%, which is fine.

The second program makes syscalls, calling close(0) in a loop. This is where things stop being pretty:

Linux running on real HW:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.65    0.963257          10    100002     99999 close
  0.15    0.001403          33        43        41 open
  0.14    0.001368          34        40        36 stat64
  0.06    0.000566         566         1           execve
  0.00    0.000027           5         5           old_mmap
  0.00    0.000007           4         2           fstat64
  0.00    0.000006           6         1           read
  0.00    0.000006           6         1           munmap
  0.00    0.000004           4         1           uname
  0.00    0.000003           3         1           brk
------ ----------- ----------- --------- --------- ----------------
100.00    0.966647                100097    100076 total

real    0m4.613s
user    0m0.760s
sys     0m3.730s


Linux running on ESX:

Process 14702 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 77.76   18.206772         182    100002     99999 close
  3.01    0.703602      703602         1           execve
  2.99    0.700382      700382         1           set_thread_area
  2.99    0.700337      700337         1           munmap
  2.99    0.700328      700328         1           uname
  2.99    0.700123      700123         1           read
  2.99    0.700108      700108         1           brk
  2.14    0.500571      100114         5           old_mmap
  1.71    0.400229      200115         2           fstat64
  0.43    0.100360       33453         3         1 open
------ ----------- ----------- --------- --------- ----------------
100.00   23.412812                100018    100000 total

real    0m48.434s
user    0m5.410s
sys     0m40.610s

The machine running on ESX spent roughly ten times as long (48.4 s vs 4.6 s) doing the same thing.

Any ideas why this is happening? It seems that the user/kernel transition is very expensive for some reason.

ineya2
Contributor

I attached the tables as a file, because they broke in the previous post.

admin
Immortal

That's a surprisingly large slowdown. Using binary translation, system calls should run about 2000 cycles more than native. See this ASPLOS paper.

For this workload, I would recommend reconfiguring your VM to use hardware-assisted virtualization, which runs system calls at native speed. Your Xeon E5420 should support VT-x.
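If the client UI doesn't expose the setting, the monitor's execution mode can also be forced in the VM's .vmx configuration file. A sketch (option names are taken from VMware's monitor-mode documentation; verify them against your ESX build):

```
monitor.virtual_exec = "hardware"
monitor.virtual_mmu = "software"
```

On an E5420 there is no EPT, so a hardware-assisted MMU is unavailable regardless; only the execution mode matters here.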

ineya2
Contributor

I'm looking at /proc/cpuinfo and I can't see vmx. Is this a reliable way to check whether VT-x is currently enabled?

I'm definitely going to check the BIOS settings regarding VT-x on Monday.

I was also thinking that, since "sysenter" seems to be missing from cpuinfo too, int 0x80 might be the root cause.

This is what I see inside the guest (virtualized) linux:

processor : 1

vendor_id : GenuineIntel

cpu family : 6

model : 7

model name : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz

stepping : 10

cpu MHz : 2493.779

cache size : 6144 KB

physical id : 1

siblings : 4

core id : 7

cpu cores : 4

runqueue : 7

fdiv_bug : no

hlt_bug : no

f00f_bug : no

coma_bug : no

fpu : yes

fpu_exception : yes

cpuid level : 13

wp : yes

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm nx lm

bogomips : 4980.73

admin
Immortal

The guest will never report vmx, because we do not virtualize VT-x. Sysenter is there; it's called "sep".

Use esxcfg-info on the host to see if VT-x is enabled. Look for HV Support. A value of 3 means that the system supports VT-x and VT-x is enabled in the BIOS.

agesen
VMware Employee

As a follow-up to the comment that Jim made earlier

"That's a surprisingly large slowdown. Using binary translation, system calls should run about 2000 cycles more than native. See this ASPLOS paper."

it is, in a way, not surprising that system calls run very slowly when using the hugemem kernel with BT.

The hugemem kernel (and only this kernel) uses separate address spaces for kernel and user space. As a consequence, every system call requires two address space changes: the first on the way into the kernel, and the second on the way back from the kernel to user space. For this reason, the hugemem kernel is also known as the 4g/4g kernel, since it provides a full 4 GB address space to both user mode and kernel space.

Other kernels, including Windows and "normal" versions of Linux, map the kernel into the top 2 GB (or 1 GB, it depends) of the address space. This allows system calls to proceed without a change of address space, and as a result they are much faster.

The hugemem kernel's approach slows down system calls even natively. In a VM, the slowdown can be further amplified because the address space change (%cr3 assignment) is itself slower than native (unless you run with RVI/EPT support).

Enabling VT-x will help some, but it will not fix all the performance problems. Running on an Intel CPU with EPT, or an AMD CPU with RVI, will help even more. If this is not possible, you should probably change the kernel.

By the way, VMware does not support guests that run with the hugemem kernel (because it is too slow, not because we know of any correctness problems with it).

Hope this helps,

Ole

ineya2
Contributor

Ole: Thank you, your reply is very helpful; I completely missed that the kernel uses the 4g/4g split.

From what I could find, EPT support starts with the Nehalem microarchitecture, and the E5420 is Core.

We are running the hugemem kernel because we have more than 4 GB of RAM. So I'm thinking of using the hugemem kernel config as a base, but switching from the 4g/4g to the 3g/1g split.

agesen
VMware Employee

You are correct that EPT starts with Nehalem; Core (2) has no EPT.

Regarding your comment

"We are running hugemem kernel because we have more than 4g of RAM. So, I'm thinking to use hugemem kernel config as base, but switching from 4g/4g to 3g/1g split."

let me first point out that kernels other than hugemem (e.g., bigsmp) can address up to 64 GB of memory, using PAE in 32-bit mode. Novell has some verbiage here that you can use:

http://www.novell.com/coolsolutions/tip/16262.html

I have not personally tried to switch the hugemem kernel to a 3/1 split (I didn't even know it could be done), so I can't say whether this will help you or not. But if it doesn't, the bigsmp kernel seems to meet your needs for memory addressability beyond 4 GB (and it is supported by VMware).

Best of luck,

Ole

ineya2
Contributor

By switching away from the bigsmp kernel, the performance of the (virtual build) machine increased by 25-35%. Compared to real HW, the virtualization overhead is now about 4-15% (depending on how the build is done: clearmake vs. make).

ineya2
Contributor

Oh, not bigsmp... I meant hugemem :)
