devzero
Expert

Linux guests stutter under load

Please also see http://communities.vmware.com/message/533875

I still have no explanation for why this happens and just don't want this thread to be forgotten.

OK, it's an issue with SATA disks, but what's causing the problem has yet to be explained.

I see more and more people using SATA disks attached to SAS controllers (for cheap test boxes), so I'm wondering whether I'm alone with this...

Repost:

If I run "dd if=/dev/zero of=/test.dat bs=1024k" inside a Linux VM, the VM begins to stutter very soon: "vmstat 1" doesn't print a line every second but takes at least 5-10 times longer (time seems "jumpy").

After a while (when the disk gets full), dd reports a write rate that looks quite nice, but only from the VM's point of view (time is relative, as Einstein tells us).

If I measure the real wall-clock time for that ~5 GB write and calculate the average, write performance is horrible: roughly a factor of 10 worse than what I expect, or than what I saw when booting Knoppix on the same box before installing ESX.
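For reference, one way to measure the real rate rather than trusting dd's own report is to time the whole run from inside the guest, including a final flush (a rough sketch; the count and the file path are just placeholders):

time sh -c "dd if=/dev/zero of=/test.dat bs=1024k count=5000 && sync"

Dividing the ~5 GB written by the elapsed wall-clock time gives the throughput the VM actually achieved.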

This is on a server with an integrated LSI Logic SAS controller and IME (Integrated Mirroring).

I tried this on my home system and could see a similar effect.

My home system has an LSI Logic MegaRAID 150-4 controller (SATA), and to my surprise I was able to solve this issue (sluggishness and slow writes) completely by changing the controller's caching policy from write-through to write-back. It works and performs pretty well here at home; I get roughly native disk speed from inside a VM!

The problem with the SAS system and IME is:

It seems that I cannot tune the caching policy of that controller!

http://www.lsilogic.com/files/docs/techdocs/storage_stand_prod/PCISCSICont/Software/ir_ug.pdf

which says:

"The IM firmware disables disk write caching. This is done to increase data integrity, so that the disk write log stored in NVSRAM is always valid. If disk write caching were enabled (not recommended), the disk write log could be invalid."

while this "stuttering" happens, the VM is nearly unusable and timing is completely crazy....

Can someone tell, why this happens?

Can this be tuned somehow ?

Maybe VMFS3.DoSyncDiskActions in advanced settings?
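For what it's worth, a hedged sketch of how that option could be checked and toggled from the ESX service console, assuming it is exposed under /VMFS3/ (I haven't verified that it changes anything for this problem):

esxcfg-advcfg -g /VMFS3/DoSyncDiskActions    # show the current value
esxcfg-advcfg -s 0 /VMFS3/DoSyncDiskActions  # set it to 0 (or 1), if supported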

nzsteve
Enthusiast

Do you get the same problem with Windows VMs? I may be oversimplifying, but it sounds like the problem is just a slow disk subsystem.

The best write rate I've had out of the LSI 1064 / 1068 IME in testing with ESX is 2 MB/s. Far too slow to run VMs off.

Enabling write-back on a MegaRAID SATA-4 controller is essentially enabling the cache, meaning that writes go to cache before being committed to disk. Any write under 64 MB on this card (the size of the cache) will be super fast, as it's stored in RAM. Write-through disables the cache and commits straight to disk.
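A quick way to see that effect is to time a write that fits inside the 64 MB cache against one that doesn't (a rough sketch; file names and sizes are placeholders, and the trailing sync makes sure the data has really been handed to the controller):

time sh -c "dd if=/dev/zero of=small.dat bs=1024k count=32 && sync"      # well under the 64 MB cache
time sh -c "dd if=/dev/zero of=big.dat bs=1024k count=1024 && sync"      # forces the controller to destage to disk

With write-back enabled, the first should finish almost instantly while the second drops back toward raw disk speed.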

The IME cards don't have a cache like this, so the performance will be low, similar to flashier HBAs with their cache disabled.

Does that help?

Steve

devzero
Expert

I didn't try Windows yet, but I wonder why that would make a difference.

"but it sounds like the problem is just a slow disk subsystem"

How do you define a slow disk subsystem?

Having SAS disks attached would give better performance: more IOPS, more throughput, better behaviour under parallel workloads.

OK, now I have SATA attached, which is slower.

So I would expect my VM to perform "not quite as well" as with SAS.

But why do the VMs stutter and become unusable due to severe timing issues on the same controller?

"The best write rate I've had out of the LSI 1064 / 1068 IME in testing with ESX is 2 MB/s. Far too slow to run VMs off."

That's weird, because these controllers perform quite well with SATA disks when I run Linux natively on them.

"Does that help?"

Let's say it's an attempt :)

But I don't think it's a satisfactory explanation for the issue I see.

Roland

devzero
Expert

Any news on this?

Since the 1064 + SATA is a supported scenario now, I gave it another try with ESX 3.5, and I really wonder why I get these weird Linux timing issues, which keep me from taking the system into production and buying a license.

devzero
Expert

Update:

- The problem exists when the onboard LSI Logic controller has its write cache disabled AND the VM's virtual SCSI controller is also set to LSI Logic.

- The problem does not exist when the VM's virtual SCSI controller is BusLogic, or when the write cache is enabled for the onboard (physical) LSI Logic controller (this cannot be set through the BIOS; you have to use the appropriate add-on tool). See the sketch below for the relevant VM setting.
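For reference, the virtual SCSI controller type lives in the VM's .vmx file as scsi0.virtualDev (a minimal sketch; change it only while the VM is powered off, and at your own risk):

scsi0.present = "TRUE"
scsi0.virtualDev = "buslogic"    # or "lsilogic"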

Any clue what the issue is here?

How can I analyze this problem?

I tried esxtop, but I don't get any meaningful output from it.
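In case it helps, the esxtop disk view is where the latency breakdown lives (a rough pointer; field names as I remember them from ESX 3.x):

esxtop    # press "d" for the disk-adapter view; watch DAVG/cmd (device latency), KAVG/cmd (vmkernel latency) and QUED (queued commands)

A persistently high QUED with a modest DAVG/cmd would point at a queue-depth problem rather than at slow disks.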

aqualityplace
Contributor

I have the same problem. I have had to disable the cache on my system. I had a couple of power outages which destroyed two MySQL databases; physical disks would also go offline for no apparent reason. IMO you can only enable the cache if the RAID controller has an onboard battery backup; UPS protection is not enough.

Haven't tried changing the virtual SCSI type from LSI. I will give this a try on a new VM to see if it helps. My machines don't stutter, but disk I/O seems terrible. For example, I can download files from the internet quicker than I can make a copy of them locally!

Not sure if this can be fixed with a firmware update?

devzero
Expert

I'm wondering why this problem isn't reported more often.

The LSI Logic 1064/1068 is a very common controller, and SATA disks attached to it shouldn't be too exotic either.

Rumple
Virtuoso

If you do not have a BBU on the controller (SATA, SAS or SCSI), then you are always going to have horrible write performance, as the controller is going to default to write-through, which waits for an ack from the disk saying the data was written successfully before sending the next bit of data.

With write-back, the controller stores the data in its cache and assumes it will get written eventually.

An example where I've seen this: a typical LSI controller with no BBU and 146 GB U320 SCSI disks in RAID 1. Creating a 1 GB file using dd took 11 minutes (roughly 1.5 MB/s). Enabling write-back (for testing) brought that down to 57 seconds (roughly 18 MB/s).

I see this all the time on SATA controllers like the SRCS16. In Windows, a disk queue of 2 or higher is noticeable as a performance issue (normally it should be in the 0.02-1 range). Without a BBU on that SATA controller, a simple 500 Kbps download from the internet has driven the disk queue as high as 20!

I've also seen SATA without a BBU cause Exchange/SQL corruption, because the data and transaction logs are written too slowly or not at all, causing the database to crap out...

NEVER use a controller without a BBU.

Rumple
Virtuoso

I've looked at all the documentation and I don't think you can attach a BBU to these controllers. They are nice-looking, economical controllers, but surprisingly there is no option to use a BBU with them. I have a Dell PERC SATA controller (2610SA) that also doesn't have a BBU option, and while I can run VMs on it, I do see performance issues with it and only use it for lightly used test VMs.

Paul_Lalonde
Commander

I completely concur... ESX does no read/write caching at all within the VMkernel; this is left to the disk controller. Whereas an HBA may work fine in a real physical Windows or Linux server, with OS filesystem caching effectively substituting for HBA write cache, it will totally TANK inside an ESX server. ESX VMkernel device drivers specifically do not enable disk write-back caching for controllers with no BBU installed.

Paul

aqualityplace
Contributor

I tried building a VM using the BusLogic card; it made no difference. What I have decided to do is enable write-back on the controller just for the VMFS logical drive (I didn't realise you could enable it for specific LUNs). You can do this using the MegaRAID megamgr tool available from LSI, which you can run from the ESX console. I have enabled backups of all MySQL databases every hour. If the power goes, I won't lose ESX; VMs with open files may have some problems, but at least I can restore the MySQL DB from the last hour.

Enabling write-back increased performance by about 6x in a simple file-copy test.

I will monitor to see whether the physical drives get marked offline; this is the problem I had when turning on the cache before. There is also an option to turn off the cache for each physical disk. Current config:

Logical drive 1: write-through

Logical drive 2: write-back

All physical disks: cache off

devzero
Expert

Write cache enabled or not, the question is:

Why do SAS disks work without issues while SATA shows these weird performance hiccups?

OK, SAS has more IOPS and possibly more throughput, but why should that matter?

It seems to me there must be some tunable parameter (some queue or buffer depth) under the hood that we need to touch, but which one?

ESX supports this controller and it also supports SATA disks now, so what do I need to do to make it work, or at least, why does it show this behaviour?

Rumple
Virtuoso

I believe the support model for SATA is the following, or at least that is how I've read it in the past.

SATA is supported for the ESX operating system (vmkernel) with specific controllers that are on the HCL (others that emulate SCSI devices, like MegaRAID controllers, will also work).

SATA is supported for VMFS when used behind something like a SAN that provides the interface between the disks and the ESX kernel. Directly connecting SATA disks to a controller is not supported for VMFS due to the obvious performance and reliability issues.

If you have a MegaRAID 300-series controller, you can add a BBU and get acceptable performance with write-back enabled, which will probably also work quite well.

SATA only uses a subset of the capabilities of the SAS interface (which is why they are interchangeable on the same card, although not mixed together).

devzero
Expert

That still doesn't explain why SATA has issues and SAS doesn't.

devzero
Expert

I think this is related: http://communities.vmware.com/message/938188 , "Poor Linux I/O performance / long machine pauses on recent kernels?"

devzero
Expert

I have now seen this issue on three independent setups with different types of LSI 1064/68-based controllers and different SATA disks, and I still have no clue what the problem is.

Is there no expert around who knows some details?

devzero
Expert

Digging into Linux driver details is often very enlightening :)

I spent some more time on this today and came across mpt_can_queue and mpt_reply_depth.

I set both to 64.

Problem solved!

After also switching the write cache for the LSI controller on, my SATA ESX box performs like hell (write rate inside the VM >50 MB/s).

I assume the default setting of mpt_can_queue=128 is good for SAS but not for SATA, as SATA cannot handle as many IOPS as SAS. So I assume the queue is constantly being flooded, and this causes the weird timing behaviour.

See:

http://www.lsi.com/support/downloads/hbas/fibre_channel/software_drivers/linux/FC_Linux_ReadMe.txt

5.1 Queue Depth

"The mpt_can_queue parameter specifies the maximum IO depth per HBA port. This value can range from 32 to 1024, however the value should not be configured to exceed the combined I/O queue depths of the attached devices.

insmod ./mptbase.ko mpt_can_queue=128

The reply depth parameter specifies the Reply Queue depth for the HBA. In general, the reply queue depth should be greater than or equal to the mpt_can_queue depth.

insmod ./mptbase.ko mpt_reply_depth=128"
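For anyone wanting to try this: on a plain Linux box the options would normally go into /etc/modprobe.conf rather than being passed to insmod by hand, and on the ESX host itself the equivalent should be esxcfg-module (a hedged sketch; the exact vmkernel module name varies, so substitute whatever mpt module esxcfg-module -l lists):

options mptbase mpt_can_queue=64 mpt_reply_depth=64               # /etc/modprobe.conf on a plain Linux host, then rebuild the initrd
esxcfg-module -s "mpt_can_queue=64 mpt_reply_depth=64" mptscsi    # ESX service console; "mptscsi" is an assumed module name
esxcfg-boot -b                                                    # rebuild the ESX boot configuration so the options survive a reboot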

If this solves a problem for you, please send me a cheque. I'll send banking details on request :D

dominic7
Virtuoso

Please send banking details. I am a foreign national and have recently come into a large sum of money, for your help in this transaction....
