VMware Cloud Community
bsAG2010
Contributor

Debian 6 "Squeeze" guest bad disk i/o performance

Hi,

We installed a Debian 6 guest on an ESXi 4.1 host.

The storage is attached via iSCSI LUNs.

At first the guest system was installed with the Debian 5 profile.

After the slow installation we changed the guest profile to Other Linux 2.6.x kernel, which slightly improved the disk performance.

When copying files within the guest, or to or from the guest, disk performance is poor.

The average file copy speed is around 500 kb/s.

While large files or many small files are being copied (for backup), Apache stops responding and Munin shows disk latency peaks of up to 700m

What can I do to improve the i/o performance?

All the other guests on the same host and the same storage run Windows and perform well.

On those you can copy files at 40 MB/s with no noticeable effect on the other guests.

7 Replies
bsAG2010
Contributor

Does anybody have an idea?

J1mbo
Virtuoso

What numbers do you see with iperf between two Debian-6 VMs, and between a Debian-6 VM and a Debian-6 physical machine?

What disk performance does dd report, for example:

  • dd if=/dev/zero of=test bs=1048576 count=2048 to write a 2 GB test file
  • dd if=test of=/dev/null bs=1048576 to read back the file just created

Also, when transferring the files, what does top show in the CPU counters?
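One caveat with plain dd runs like the two above: without a sync, the reported write rate can reflect the guest's page cache rather than the disk, and reading back a file that was just written may be served largely from RAM. Below is a minimal sketch of a variant that tries to keep the cache out of the numbers. It assumes GNU dd (coreutils) in the guest; the function names write_test and read_test are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any tool).

# write_test FILE [COUNT_MIB]
# Write COUNT_MIB MiB of zeros; conv=fdatasync makes dd flush the data to
# disk before it prints the transfer rate, so the cache does not inflate it.
write_test() {
    dd if=/dev/zero of="$1" bs=1048576 count="${2:-2048}" conv=fdatasync
}

# read_test FILE
# Read the file back with O_DIRECT so the page cache is bypassed.
# Not every filesystem supports O_DIRECT (tmpfs, for example, on many kernels).
read_test() {
    dd if="$1" of=/dev/null bs=1048576 iflag=direct
}
```

Usage would be something like `write_test test 2048` followed by `read_test test`, run on the filesystem that actually lives on the iSCSI-backed virtual disk.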

bsAG2010
Contributor

Hi J1mbo,

I ran the tests several times.

1.
root@xxx:/dev# dd if=/dev/zero of=test bs=1048576 count=2048
1429+0 records in
1429+0 records out
1498415104 bytes (1.5 GB) copied, 718.453 s, 2.1 MB/s

root@xxx:/dev# dd if=test of=/dev/null bs=1048576
1429+0 records in
1429+0 records out
1498415104 bytes (1.5 GB) copied, 36.8582 s, 40.7 MB/s

2.
root@xxx:/dev# dd if=/dev/zero of=test bs=1048576 count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 51.0351 s, 21.0 MB/s
root@xxx:/dev# dd if=test of=/dev/null bs=1048576
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 2.07941 s, 516 MB/s

3.
root@xxx:/dev# dd if=/dev/zero of=test bs=1048576 count=1536
1536+0 records in
1536+0 records out
1610612736 bytes (1.6 GB) copied, 23.4687 s, 68.6 MB/s
root@xxx:/dev# dd if=test of=/dev/null bs=1048576
1536+0 records in
1536+0 records out
1610612736 bytes (1.6 GB) copied, 2.62495 s, 614 MB/s

4.
root@xxx:/dev# dd if=/dev/zero of=test bs=1048576 count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 239.942 s, 9.0 MB/s
root@xxx:/dev# dd if=test of=/dev/null bs=1048576
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 10.624 s, 202 MB/s

5.
root@xxx:/dev# dd if=/dev/zero of=test bs=1048576 count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 52.552 s, 40.9 MB/s
root@xxx:/dev# dd if=test of=/dev/null bs=1048576
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 2.34233 s, 917 MB/s

6.
root@xxx:/dev# dd if=/dev/zero of=test bs=1048576 count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 6.31328 s, 340 MB/s

7.
root@xxx:/dev# dd if=/dev/zero of=test bs=1048576 count=2536
2536+0 records in
2536+0 records out
2659188736 bytes (2.7 GB) copied, 527.574 s, 5.0 MB/s
root@xxx:/dev# dd if=test of=/dev/null bs=1048576
2536+0 records in
2536+0 records out
2659188736 bytes (2.7 GB) copied, 3.47001 s, 766 MB/s

CPU load is between 2 and 10 % (the guest has 4 assigned CPUs).
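A low overall CPU figure does not say much about disk stalls, because time spent waiting for I/O is counted separately as iowait. As a rough sketch, assuming the standard Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal), the iowait share can be read out like this; the function name iowait_pct is hypothetical:

```shell
#!/bin/sh
# iowait_pct [FILE]
# Print the percentage of CPU time spent in iowait, taken from the
# aggregate "cpu" line of a /proc/stat-style file (default: /proc/stat).
# Only fields 2..9 (user through steal) are summed, since guest time is
# already included in user time.
iowait_pct() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        if (total > 0) printf "%.1f\n", 100 * $6 / total
    }' "${1:-/proc/stat}"
}
```

The counters in /proc/stat are cumulative since boot, so this gives a long-term average. For a live view during a copy, the wa value in top or the wa column of `vmstat 1` shows the same thing per interval.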

If I copy files to the guest via Samba, it starts at roughly 20 MB/s.

After 2 minutes this happens:

Screenshot: http://img851.imageshack.us/img851/1906/copye.jpg

It drops to 2 MB/s and keeps decreasing.

A few minutes later the CPU load goes up to 25 % and the guest takes a long time to respond.

Screenshot: http://img806.imageshack.us/img806/8072/picsrw.jpg

When the copy process is aborted, the CPU load goes down and the server responds normally.
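A transfer that starts fast and then collapses after a while can look like this when the guest first buffers writes in RAM and then has to wait for slow storage to absorb them. That is only an assumption about what might be going on here, not a confirmed diagnosis. One way to check it would be to watch the Dirty and Writeback counters in /proc/meminfo while the copy runs; the helper name dirty_kb below is made up:

```shell
#!/bin/sh
# dirty_kb [FILE]
# Print the Dirty and Writeback values (in kB) from a /proc/meminfo-style
# file (default: /proc/meminfo). WritebackTmp is deliberately not matched.
dirty_kb() {
    awk '/^(Dirty|Writeback):/ { print $1, $2 }' "${1:-/proc/meminfo}"
}
```

Running `while sleep 1; do dirty_kb; done` during the Samba copy would show whether Dirty keeps growing before throughput drops. The related thresholds can be inspected with `sysctl vm.dirty_ratio vm.dirty_background_ratio`.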

J1mbo
Virtuoso

What SAN and switches is this running on?  How is the iSCSI component configured in the host?

Some ideas - does the storage switch port stats show any issues (incorrect speed, late collisions, CRC etc)?  Is the SAN LUN configured for write-back caching?  Are there any SAN snapshots on the LUN?  Does the SAN give any network stats (retransmits for example)?

bsAG2010
Contributor

Hi,

The SAN is running Open-E VSS6 6.0up55.8101.5087 64bit

with an Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01) attached.

The switch is a Nortel 10 Gb Uplink Ethernet Switch Module 10.0.1.18 installed in an IBM BladeCenter.

The host is an IBM blade with 4 NICs attached. The connection to the SAN uses the second onboard NIC.

The ports all look clean to me (speed, no collisions, ...).

Write-back caching is disabled for this LUN.

No SAN-Snapshots for this LUN.

The SAN network statistics only show received/transmitted counters.

No retransmits.

J1mbo
Virtuoso

How does it look if the LUN is set to write-back caching?

bsAG2010
Contributor

Hi,

Just to let you know, we found the solution to our "problem".

It looks like the blade server that hosts the Debian 6 guest has a hardware problem.

If more than one VM guest is running on it, the performance of the Debian 6 server breaks down.

We moved the Debian 6 guest to another blade server and everything runs smoothly. No more gaps in Munin.

Regards

bsAG2010
