
    Poor LDAP write performance on RHEL 5 VM

    lockenkeyster Enthusiast

      We currently have an SR open, but I thought a post to the community might also turn something up. A customer is running their LDAP server in a Red Hat Cluster setup. They are looking to migrate to our ESX environment and are seeing very poor write throughput in initial testing. Here are some numbers from the current environment:

      Avg r=1004.60/thr (502.30/sec), total= 5023
      Avg r= 677.00/thr (338.50/sec), total= 3385
      Avg r= 595.00/thr (297.50/sec), total= 2975
      Avg r= 726.60/thr (363.30/sec), total= 3633
      Avg r= 954.80/thr (477.40/sec), total= 4774
      Avg r= 888.00/thr (444.00/sec), total= 4440
      Avg r= 887.20/thr (443.60/sec), total= 4436

      and here is what the same test shows on their VM:

      Avg r= 40.20/thr ( 20.10/sec), total= 201
      Avg r= 19.60/thr ( 9.80/sec), total= 98
      Avg r= 13.00/thr ( 6.50/sec), total= 65
      Avg r= 14.00/thr ( 7.00/sec), total= 70
      Avg r= 11.00/thr ( 5.50/sec), total= 55
      Avg r= 33.60/thr ( 16.80/sec), total= 168
      Avg r= 12.40/thr ( 6.20/sec), total= 62
      Avg r= 58.00/thr ( 29.00/sec), total= 290
      Avg r= 17.20/thr ( 8.60/sec), total= 86

      The fibre-attached storage is coming from the same tier on the same SAN, so there are no differences there. I have tried a normal VMFS datastore .vmdk, a virtual RDM, and a physical RDM; none shows any significant improvement over the others. Our SAN administrator tells me that there was very little IOPS activity on the LUN and parity group at the time of testing, and the switch ports show little activity, so I believe the bottleneck is further upstream.
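
      To narrow down where the writes are stalling, one thing I can try is watching device latency from the host side while the test runs. Assuming the VMware Remote CLI is installed somewhere, resxtop should show it against ESXi (the hostname below is a placeholder):

      # Connect resxtop (part of the VMware Remote CLI) to the ESXi host.
      resxtop --server esx01.example.com
      # Then press 'd' for the disk view: high DAVG/cmd implicates the
      # fabric/array, while high KAVG/cmd points at the host/vmkernel side.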

      Read/search performance on the VM is fine; in fact, it is better than in the existing environment:

      Avg r=10414.40/thr (5207.20/sec), total= 52072
      Avg r=10835.60/thr (5417.80/sec), total= 54178
      Avg r=10909.00/thr (5454.50/sec), total= 54545

      Interestingly, running the suggested command "time dd if=/dev/zero of=test count=500000" appears to show better throughput on the VM:

      On the VM:

      time dd if=/dev/zero of=test count=500000
      500000+0 records in
      500000+0 records out
      256000000 bytes (256 MB) copied, 2.21985 seconds, 115 MB/s

      real 0m2.258s
      user 0m0.164s
      sys 0m2.047s

      On the existing environment:

      time dd if=/dev/zero of=test count=500000
      500000+0 records in
      500000+0 records out
      256000000 bytes (256 MB) copied, 7.06046 seconds, 36.3 MB/s

      real 0m7.098s
      user 0m0.150s
      sys 0m6.458s

       
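      I realize that without a sync flag, dd is writing 512-byte blocks through the page cache, so the numbers above mostly measure memory rather than disk, while every LDAP write has to be committed to stable storage. A synchronous variant should be closer to what the directory server actually does on each write; the exact invocation below is just a sketch (oflag is supported by the GNU coreutils dd shipped with RHEL 5):

      # Synchronous variant: flush each 512-byte block to disk as it is written.
      # A smaller count is used, since synchronous writes are far slower.
      time dd if=/dev/zero of=test count=50000 oflag=dsync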

      Has anyone seen something like this before?

       

      Our environment:

      ESXi 3.5 U4
      VirtualCenter 2.5 U4
      Dell PowerEdge M600
      8 CPUs (Intel Xeon E5440 @ 2.83GHz)
      32GB RAM
      ISP2432-based 4Gb Fibre Channel to PCI Express HBA
      Broadcom NetXtreme II BCM5708 1000Base-SX
      Sun 9990 SAN
      Dual-ported Fibre Channel disks in a RAID5 (7+1) configuration

      The customer's VM:

      RHEL 5 (64-bit)
      VMware Tools installed
      4 vCPU
      ~5.5GB vRAM

      LDAP installation:
      Sun-Java(tm)-System-Directory/6.3.1 B2008.1121.0522 (32-bit)
      Database Cache: 500M
      Entry Cache: 1.5G
      Max File Descriptors: 8192
      Max # Threads: 1000
      Maximum # of Persistent Searches: 30
      All_IDs Index Threshold: 10,000
      # of Entries Currently in the Directory: 1.2 million