VMware Cloud Community
JaroF
Enthusiast

ESXi disk access slows down overnight

I originally posted this in the ESXi 3.5 forum, but I think this is probably a better place for it.

I have a very strange problem. I am running 8 virtual machines on VMware ESX Server 3i 3.5.0 build-158874, on an Intel quad-core machine with 8 GB RAM. Storage is on a PERC 6 RAID controller with 256 MB cache and a BBU. It ran without trouble for about three months. Only after I added the last VM (a FreeBSD-based shared storage server) did I notice that almost every day, usually overnight, file transfers drop from about 60 Mb/s to about 65 k/s. It effectively stalls.

If I check network speed from the guest with iperf, it is still around 750 Mb/s, so the problem is most probably disk access. I checked with esxtop and nothing looks wrong; everything is almost idle.
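To separate disk from network inside the guest without touching the network at all, one option is to time a sequential read of a large existing file on the affected datastore and compare the result with the iperf figure (MB/s × 8 ≈ Mb/s). This is a minimal sketch, assuming Python 3 is installed in the guest; the function name, block size and byte limit are illustrative choices. A file that was read recently may be served from the guest's page cache, which would overstate disk speed, so pick a large file that has not been touched lately.

```python
import time


def read_throughput_mb_s(path, block_size=1024 * 1024, max_bytes=None):
    """Sequentially read `path`; return (bytes_read, decimal megabytes per second).

    Reads `block_size` chunks until EOF, or until at least `max_bytes`
    have been read. Data already in the OS page cache will inflate the rate.
    """
    total = 0
    start = time.monotonic()
    # buffering=0 skips Python's own buffer layer; the OS cache still applies
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            chunk = f.read(block_size)
            if not chunk:
                break  # end of file
            total += len(chunk)
    elapsed = max(time.monotonic() - start, 1e-9)  # avoid division by zero
    return total, total / elapsed / 1e6
```

For example, `read_throughput_mb_s("/data/some-large-file", max_bytes=2 * 1024**3)` (hypothetical path) reads at most about 2 GB. Running it once while things are fast and again during a stall shows whether disk reads alone collapse.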

If I restart the guest in question, it hangs at the boot stage and comes up very, very slowly. If I shut the guest down and then power it back on, everything works fine until it happens again.

I removed one network card (of 4) that was sharing an IRQ with the RAID controller (are there really only 16 interrupts, 0-15? I thought APIC was enabled in ESXi?), hoping it would help, but it did not. I feel it happens a little less often, but that is only an impression, since it cannot be triggered by any user action.

I tried recreating the guest from scratch, reusing the existing data of course, since there is about 800 GB of it, but the result was the same. The system disk for this VM is separate, so I created a new one and installed from an ISO image; once the configuration was restored, the problem came back. The data is on two different datastores, and both become very slow.

Interestingly, another guest (a MindTouch wiki virtual machine) slows down at the same time. Instead of downloading an attachment at 30 Mb/s, I get 1 Mb/s at most. After the problem guest is restarted, the wiki machine starts working normally again without my touching it at all.

I also upgraded to ESXi 4.0 (actually a clean install), hoping that would fix it, but it did not.

Current list of devices (with IRQs):

~ # lspci -p
Bus:Sl.F Vend:Dvid Subv:Subd ISA/irq/Vec P M Module       Name   Spawned bus
00:00.00 8086:29a0 1043:81ea               V
00:01.00 8086:29a1 0000:0000 11/ 11/0x69 A V                     001
00:28.00 8086:283f 0000:0000 11/ 11/0x69 A V                     004
00:28.01 8086:2841 0000:0000 15/ 15/0x71 B V                     003
00:28.02 8086:2843 0000:0000 10/ 10/0x79 C V                     002
00:30.00 8086:244e 0000:0000               V                     005
00:31.00 8086:2810 1043:81ec               V
00:31.03 8086:283e 1043:81ec  5/   /     C V
00:31.05 8086:2825 1043:81ec  5/  5/0x81 B V ata_piix     vmhba1
01:00.00 1000:0060 1028:1f0a 11/ 11/0x69 A V megaraid_sas vmhba2
02:00.00 8086:107d 8086:1082 10/ 10/0x99 A V e1000e       vmnic0
03:00.00 8086:10b9 8086:1083 15/ 15/0xa1 A V e1000e       vmnic1
05:01.00 8086:107c 8086:1376  3/  3/0x89 A V e1000        vmnic2
05:02.00 5333:8811 0000:0000  0/  0/0x91 A V
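Which devices still share an interrupt after the NIC was removed can be checked mechanically by grouping the rows of this listing by interrupt vector (the hex value after the second slash). This is a minimal sketch in Python; the regex assumes exactly the column layout shown above, and the function name is hypothetical. It is only a reading aid and says nothing about whether sharing causes the slowdown. Against this listing it reports a single shared vector, 0x69, used by the megaraid_sas controller (vmhba2) and the two bridges at 00:01.00 and 00:28.00.

```python
import re
from collections import defaultdict

# Assumed row layout:
# <bus:slot.func> <vend:dev> <subv:subd> <isa>/ <irq>/<vector> <pin> <mode> [module] [name]
ROW = re.compile(r"^(\S+)\s+\S+\s+\S+\s+\d*/\s*(\d+)/(0x[0-9a-fA-F]+)")


def shared_vectors(listing):
    """Map each interrupt vector used by more than one device to its device labels."""
    groups = defaultdict(list)
    for raw in listing.splitlines():
        line = raw.strip()
        m = ROW.match(line)
        if not m:
            continue  # header, blank line, bare bus-number line, or row without a vector
        addr, _irq, vector = m.groups()
        rest = line[m.end():].split()  # [pin, mode, module?, name?, spawned bus?]
        # Keep module/name, drop the numeric spawned-bus column that bridges print
        extras = [tok for tok in rest[2:] if not tok.isdigit()]
        groups[vector.lower()].append(" ".join([addr] + extras))
    return {vec: devs for vec, devs in groups.items() if len(devs) > 1}
```

To use it, copy the `lspci -p` output into a file on any machine with Python and call `shared_vectors(open("lspci.txt").read())` (the file name is hypothetical).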

Most interesting is that a Windows Server 2008 VM running Exchange 2007 is on the same host, and its performance is not affected; it works well all the time. I even created a share on it and copied a few GB of data back and forth, and it was fine.

All the datastores are on the PERC 6/E RAID controller, and if the controller were the problem, I would expect all the guests to show the same symptoms.

I also replaced the power supply with a new one, but nothing changed.

Is there some way I can catch the event that causes the slowdown?

I think I have a conflict somewhere, but I cannot find it. When the machine is slow, esxtop looks normal; nothing is running high.
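One way to catch the event is to run a small probe inside the affected guest that times a fixed-size write at regular intervals and logs a timestamp whenever throughput falls well below its recent level. Those timestamps can then be compared with overnight jobs and with the host's logs, and the same probe can run in the wiki VM to see whether both degrade at the same moment. This is a minimal sketch in Python, assuming Python 3 in the guest. The probe file path, write size, interval, and the "slow" rule (below a fraction of the median of recent normal samples) are arbitrary choices, not values from this thread. It writes and calls `fsync` rather than reading, so repeated probes are not simply served from the page cache. The data is random so that a compressing filesystem cannot shrink it.

```python
import os
import statistics
import time
from collections import deque


class SlowdownDetector:
    """Flags samples below `ratio` x the median of recent normal samples."""

    def __init__(self, window=20, ratio=0.1, min_samples=5):
        self.history = deque(maxlen=window)  # recent normal samples only
        self.ratio = ratio
        self.min_samples = min_samples

    def add(self, mb_per_s):
        """Record one throughput sample; return True if it counts as a slowdown."""
        slow = (len(self.history) >= self.min_samples
                and mb_per_s < self.ratio * statistics.median(self.history))
        if not slow:
            # Only normal samples update the baseline, so a long stall keeps being flagged
            self.history.append(mb_per_s)
        return slow


def probe_write(path, size=8 * 1024 * 1024):
    """Write `size` random bytes to `path`, fsync, and return decimal MB/s."""
    data = os.urandom(size)  # random bytes so compression cannot shrink the write
    start = time.monotonic()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # force the data down to the virtual disk
    return size / max(time.monotonic() - start, 1e-9) / 1e6


def watch(path, interval_s=60):
    """Loop forever, printing a timestamped line for every slow sample."""
    detector = SlowdownDetector()
    while True:
        rate = probe_write(path)
        if detector.add(rate):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), f"SLOW: {rate:.2f} MB/s", flush=True)
        time.sleep(interval_s)
```

For example, `watch("/data/probe.bin")` (hypothetical path on the slow datastore) can be started in the background with its output redirected to a log file. The first slow timestamp of the night is the one to line up against cron jobs, backups, or snapshot activity.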
