In my first trials of VMware Server (my first real use of a VMware product on a Linux host) I experienced VERY poor performance from Windows XP and Vista guests with more than 512MB of memory. Guests would lock up for more than 60 seconds at a time, even when doing simple things like opening a command prompt.
Searching these forums, I found that this seems to be an issue many users have experienced with VMware products on Linux hosts. I found many recommendations for tweaking settings, but none of them solved the issue for me.
However, the advice and experiences I found did help me greatly in tracing the problem, and in finding a solution that has significantly improved the speed of my guests. Therefore I wanted to share my solution, in the hope that it can also be of help to others.
Before going further, I should note that this solution has worked for me on the following host configuration. It may or may not be of use with different configurations (in particular 64-bit host OSes).
Host System:
- Dell T740, Quad core Xeon (E5420), 4GB RAM, 4x SATA disks
- Gentoo Linux, 32 bit, Kernel 2.6.24-gentoo-r4 with HIGHMEM64G enabled
- One disk is dedicated to virtual machines, to avoid other processes causing delays in disk access.
- XFS Filesystem for VMs
- All VMs configured with debugging and logging disabled (running vmware-vmx, not vmware-vmx-debug)
Symptoms:
- Windows guest performance was inversely proportional to the amount of RAM allocated (the opposite of what one might expect!). More memory resulted in hangs of more than 60s in the guest OS, with no load on the host or guest and no other guests running; just opening a Windows command prompt took more than 60s, for example.
- Excessive disk thrashing (on the VMware disk) whenever a guest hung.
- Top showing high iowaits.
- During the guest pauses the host system showed no noticeable slowdown; only the guest VM process was affected.
The solution (for the impatient - more detail on why I think this works below):
Set the following values in /proc/sys/:
vm/swappiness = 0
vm/overcommit_memory = 0
vm/dirty_background_ratio = 2
vm/dirty_ratio = 100 * This is the real key
vm/dirty_expire_centisecs = 9000 * Not recommended if you have an unreliable power source!
vm/dirty_writeback_centisecs = 3000
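If you would rather apply these from a script than by hand, here is a minimal Python sketch. It just writes each value into the matching file under /proc/sys, which is also what sysctl -w does. The names VM_SETTINGS, apply_vm_settings and read_vm_settings are my own, for illustration only; it needs to run as root, and values set this way are lost at reboot, so you would still want them in /etc/sysctl.conf (or a boot script) to make them stick.

```python
from pathlib import Path

# The values from this post; keys are paths relative to /proc/sys.
VM_SETTINGS = {
    "vm/swappiness": 0,
    "vm/overcommit_memory": 0,
    "vm/dirty_background_ratio": 2,
    "vm/dirty_ratio": 100,
    "vm/dirty_expire_centisecs": 9000,     # 9000 centiseconds = 90 seconds
    "vm/dirty_writeback_centisecs": 3000,  # 3000 centiseconds = 30 seconds
}


def apply_vm_settings(settings, root="/proc/sys"):
    """Write each value to <root>/<key>. Needs root when root is /proc/sys."""
    base = Path(root)
    for key, value in settings.items():
        (base / key).write_text(f"{value}\n")


def read_vm_settings(keys, root="/proc/sys"):
    """Read the current values back as integers, to confirm they took effect."""
    base = Path(root)
    return {key: int((base / key).read_text().strip()) for key in keys}
```

As root, apply_vm_settings(VM_SETTINGS) followed by read_vm_settings(VM_SETTINGS) should show the new values. The root argument is only there so the functions can be tried against a scratch directory first.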
The following did not appear to make a difference alone, but I have left them set anyway:
In /etc/vmware/config:
prefvmx.minVmMemPct = "100"
prefvmx.allVMMemoryLimit = "3200"
prefvmx.useRecommendedLockedMemSize = "TRUE"
svga.enableOverlay = "FALSE"
MemTrimRate = "0"
sched.mem.pshare.enable = "FALSE"
MemAllowAutoScaleDown = "FALSE"
mainMem.useNamedFile = "TRUE" * Not FALSE as recommended by others!
Run at boot:
blockdev --setra 16384 /dev/sdb * This is the disk with my VMs on. Other drives are set differently for my requirements.
echo anticipatory > /sys/block/sdb/queue/scheduler * Again, sdb is the disk with my VMs. Other drives are using CFQ.
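For reference, the same two settings can also be made through sysfs. blockdev --setra counts in 512-byte sectors, so 16384 sectors is 16384 * 512 bytes = 8 MiB, which is read_ahead_kb = 8192. A minimal Python sketch, assuming the VM disk is sdb and that your kernel has the anticipatory scheduler built in; the function names are hypothetical:

```python
from pathlib import Path

SECTOR_BYTES = 512  # blockdev --setra / --getra count in 512-byte sectors


def setra_sectors_to_kb(sectors):
    """Convert a blockdev --setra value into the units of read_ahead_kb."""
    return sectors * SECTOR_BYTES // 1024


def tune_vm_disk(dev, readahead_sectors=16384, scheduler="anticipatory",
                 sysfs_root="/sys/block"):
    """Sysfs version of the two boot commands, for one disk such as 'sdb'."""
    queue = Path(sysfs_root) / dev / "queue"
    (queue / "read_ahead_kb").write_text(f"{setra_sectors_to_kb(readahead_sectors)}\n")
    (queue / "scheduler").write_text(f"{scheduler}\n")


def active_scheduler(dev, sysfs_root="/sys/block"):
    """The scheduler file lists every available scheduler; the active one is in [brackets]."""
    text = (Path(sysfs_root) / dev / "queue" / "scheduler").read_text()
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None
```

As root, tune_vm_disk("sdb") should do the same job as the two commands above, and active_scheduler("sdb") is a quick way to check the change actually took.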
Personally, I am not happy with just changing settings without some understanding of what I am changing and why. So for those like myself, here is a brief explanation of how and why I reached the above solution.
Further investigation and other posts on the forums led me to the conclusion that the pauses were the result of iowaits while the disk-backed memory files (VMEM files) were being updated. I saw many recommendations to set:
mainMem.useNamedFile = "FALSE"
The justification given in many posts was that this turns off the VMEM file. However, this is not strictly true: it just uses an unlinked (deleted) file in $TMPDIR instead. I learned this the hard way when my /tmp partition filled up :-( Even after setting $TMPDIR to a directory on my VMware disk, I still did not get a great performance boost, and the VMs still had the pauses. Other users investigating this had reported that Linux seemed to delay writing back changes when the file was unlinked. I'm not sure whether this has changed in newer kernels, is different in the XFS driver compared with the ext3 used by others, or is related to my 32-bit HIGHMEM64G configuration, but it didn't seem to make a difference for me. To avoid delays when suspending and resuming guests, I set this back to TRUE.
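To see why an unlinked file can still fill up /tmp, here is a small Python sketch of the general technique: create a file, unlink it straight away, and keep it mapped. This only illustrates the mechanism; I have not seen VMware's code, and the function names are my own. The directory entry disappears, so ls shows nothing, but the file's data stays on the filesystem until the mapping and the file descriptor are closed.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def unlinked_backing_file(size, directory=None):
    """Create a file, unlink it and map it (illustration only, not VMware's code)."""
    fd, path = tempfile.mkstemp(dir=directory)  # directory=None honours $TMPDIR
    os.unlink(path)             # the name vanishes from the directory...
    os.ftruncate(fd, size)      # ...but the inode lives on while fd is open
    mem = mmap.mmap(fd, size)   # shared, read/write mapping by default on Unix
    return fd, mem


def touch_every_page(mem, value=b"\x01"):
    """Write one byte per page, which dirties every page of the mapping."""
    for offset in range(0, len(mem), PAGE):
        mem[offset:offset + 1] = value
```

The caller is responsible for mem.close() and os.close(fd); only then can the kernel actually free the space.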
I followed recommendations for tweaking host performance with the other /etc/vmware/config options above, though alone they did not help. Other forum posts document the purpose of these well, so I will not repeat it here.
Next I found recommendations to set the following in /proc/sys:
vm.swappiness=0
vm.overcommit_memory=1
vm.dirty_background_ratio=5
vm.dirty_ratio=10
vm.dirty_expire_centisecs=1000
These values did not help me; in fact, the vm.dirty_background_ratio and vm.dirty_ratio values were the same as the defaults for my kernel anyway. However, they did lead me to investigate these options further. I discovered:
vm.dirty_background_ratio:
Defines the percentage of memory that can become dirty before a background flushing of the pages to disk starts.
Until this percentage is reached no pages are flushed to disk.
When the flushing starts, then it's done in the background without disrupting any of the running processes in the foreground.
vm.dirty_ratio
Defines the percentage of memory which can be occupied by dirty pages before a forced flush starts.
If the percentage of dirty pages reaches this number, then the processes writing data become synchronous:
they are not allowed to continue until the I/O operation they have requested is actually performed and the data is on disk.
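To turn these percentages into something more concrete, here is a rough Python sketch that estimates the two thresholds from the MemTotal line of /proc/meminfo. Treat the result as an upper bound: as far as I know the kernel applies the ratios to the memory it considers "dirtyable" rather than to MemTotal, so the real thresholds will be somewhat lower. The function names are my own.

```python
def mem_total_kb(meminfo_text):
    """Return MemTotal (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def dirty_thresholds_mb(total_kb, background_ratio, dirty_ratio):
    """Rough (background flush, forced flush) thresholds in MiB; an upper bound."""
    total_mb = total_kb / 1024
    return total_mb * background_ratio / 100, total_mb * dirty_ratio / 100
```

For a nominal 4 GiB (4194304 kB) host, the default ratios of 5 and 10 give about 205 MiB before background flushing starts and about 410 MiB before writers are made to wait; my values of 2 and 100 give about 82 MiB and 4096 MiB. Pass it the contents of /proc/meminfo to get your own machine's figures.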
Now, by my understanding (and I am by no means an expert on this, so please correct me if I have this wrong), every VM's RAM is mapped (with mmap, in 1MB blocks) to a file on disk.
<aside> I cannot claim to understand the reason for this. My host machine's RAM is not disk-backed (no, the host's swap does not count; the guest OSes can have swap working in the same way as the host's), so why, when I tell VMware to give a guest 2GB of my RAM and not to put any of it in swap, does it back it with a file on disk? I have yet to see a convincing explanation of this apparent madness (speeding up suspend/resume is not a good reason in my book; I can live without it.)</aside>
So, now imagine the scenario: my Windows guest thinks it has 2GB of RAM, which it merrily accesses as it sees fit, running apps, caching disk and so on. However, every time the guest writes to its RAM, the Linux host sees a new dirty page that must then be written back to the mmap'ed file on disk. 2GB of RAM in the VM is a lot of pages on the host, so the dirty page count can increase rapidly even when the guest does simple tasks (e.g. caching disk reads, which the guest kernel believes is improving performance, when in fact it is slowing things down by causing the Linux host to perform slow disk writes as a result!)
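Some rough arithmetic makes the point. This minimal Python sketch assumes 4 KiB pages, the kernel default vm.dirty_ratio of 10 applied to all 4 GiB of host RAM (a simplification; the kernel actually works from a smaller "dirtyable" figure), and nothing else on the host dirtying memory. The function names are hypothetical.

```python
PAGE_BYTES = 4096  # assumed x86 page size
GIB = 1024 ** 3


def pages(n_bytes):
    """Number of whole pages in n_bytes."""
    return n_bytes // PAGE_BYTES


def guest_fraction_to_hit(host_ram_bytes, dirty_ratio, guest_ram_bytes):
    """Fraction of guest RAM that must be dirty before host dirty pages
    reach dirty_ratio percent of host RAM (assuming no other dirty data)."""
    limit_pages = pages(host_ram_bytes) * dirty_ratio / 100
    return limit_pages / pages(guest_ram_bytes)
```

A 2GB guest is pages(2 * GIB) = 524288 pages, and guest_fraction_to_hit(4 * GIB, 10, 2 * GIB) comes out at about 0.2: the guest only has to dirty roughly 400MB of its memory before the host forces synchronous writes. With vm.dirty_ratio = 100 the result is 2.0, i.e. the guest could dirty all of its RAM without reaching the limit.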
OK, so we have the idea: the guest is merrily changing RAM and creating dirty pages. Now the Linux host kernel needs to decide when to commit these dirty pages back to disk. This can happen in two ways: in the background by a pdflush process, while other processes continue to run, or synchronously, causing any processes dirtying RAM (e.g. vmware-vmx) to wait for the dirty page writes to complete. Clearly the synchronous option is going to have severe consequences for the guest VM process. To me, this completely explains the huge and regular pauses in guests with large memory allocations.
Having reached this conclusion, I decided the best way to manage writing back the vmem files was to:
1) Avoid synchronous writes, and have all writing done by the background pdflush process.
2) Avoid accumulating so many dirty pages that a long synchronous write becomes unavoidable.
3) Avoid updating the file too often. Guest RAM can change fast, and we do not want to cause a disk write for every change.
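If you want to test this theory on your own host, the kernel reports the current amount of dirty memory and memory under writeback in the Dirty and Writeback lines of /proc/meminfo. Below is a small Python sketch (function names are my own) that samples them. If the theory is right, I would expect Dirty to climb towards the vm.dirty_ratio threshold just before a guest stalls; I haven't captured such a trace myself, so treat this as a diagnostic idea rather than a result.

```python
import time


def dirty_writeback_kb(meminfo_text):
    """Extract the Dirty and Writeback counters (in kB) from /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("Dirty", "Writeback"):  # exact match, so WritebackTmp is skipped
            values[key] = int(rest.split()[0])
    return values["Dirty"], values["Writeback"]


def watch(interval=1.0, count=10, path="/proc/meminfo"):
    """Print Dirty/Writeback every `interval` seconds, `count` times."""
    for _ in range(count):
        with open(path) as f:
            dirty, writeback = dirty_writeback_kb(f.read())
        print(f"Dirty: {dirty:>8} kB   Writeback: {writeback:>8} kB")
        time.sleep(interval)
```

Running watch() in a terminal on the host while opening a command prompt in the guest should show whether the stalls line up with Dirty peaking.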
Now, in my understanding, each of these seems to relate directly to a /proc/sys parameter:
1) This calls for a high value for vm.dirty_ratio
I am sure there are good reasons why this is set as low as 10% in recent kernels; however, for the configuration discussed above, a 2GB RAM guest very quickly dirties enough pages to reach 10%. Therefore I maxed this out at 100%, in the belief that suspending processes for disk writes should be a last resort. Maybe a slightly lower value would be better, but 100% has been working well for me so far. I'd welcome others' thoughts on this.
2) This calls for a low value for vm.dirty_background_ratio
From my tests the background pdflush process does not appear to adversely affect the performance of the system, so I set this down to 2%. My theory is that the earlier background writes are started, the lower the chances of reaching vm.dirty_ratio and forcing synchronous writes.
3) This can be controlled by vm/dirty_expire_centisecs
I set this to 9000, causing Linux to wait 90 seconds before committing changes to disk. It is important to remember that this affects the whole system, not just the VMware vmem files, and caching writes for 90 seconds could well be considered a 'bad thing', especially if you value data integrity and are not running off a good UPS. From what I can tell, this value is ignored once vm.dirty_background_ratio is reached, and thus it does not work against objective number 1.
As I said before, this works for me. I am not an expert, and there may be good reasons for not using these settings (please let me know!). What I do know is that the above configuration has restored my confidence in VMware Server on Linux (for a brief period I was considering trying a Windows host OS!)
I hope this information is of use to others.
Regards,
Graham