I've got a production VMWare Server 1.0.4 host that performed "ok" at first, but I've since gotten steadily increasing complaints from users about sluggishness. The server hosts six XP guests and three Linux guests (one FC4, two RHEL4). The XP guests are replacements for physical workstations, which folks RDP into to get their work done. The RHEL4 machines host a live web server and the development sandbox for that server. Most of the time only one or two of the XP workstations are active, and the web server sees very, very light use.
The 1.x VMWare Server is deployed on the following:
Dell PowerEdge 1800
Dual 3GHz Xeon (hyperthreaded)
320G SATA in RAID1 using software RAID
OpenSuSE 10.1 (x64)
Given that I had made some design mistakes in the original deployment that were certainly causing performance issues, and that we really needed a warm spare for this machine, I built a whitebox on which I am running tests and benchmarks:
2.5GHz Xeon Quad-Core
750G SATA in RAID1
The drives are on the LSI MPT/1064e controller on the mainboard. So far, my testing has shown Ubuntu 8.04 to be the performance leader among the supported x64 host OSes for RC2. Software RAID seems to beat RAID on the MPT controller by a noticeable margin, at least on block writes/rewrites (bonnie++ 1.03). For the curious, the md setup is sketched below.
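The software array is nothing exotic; it was built roughly like this (device names are illustrative, not my exact layout):

    # mirror two SATA partitions into the array that holds the VM store
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
    mkfs.ext3 /dev/md0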
I took a recent backup of all the VM's from the deployed 1.x server and performed the following optimizations on the XP guests:
Converted all vmdk files from growable to preallocated (see the sketch after this list)
Switched the virtual disks from IDE to LSI Logic SCSI
Defragged the virtual disks using JkDefrag within the guests
Installed the RC2 VMWare Tools (did not upgrade from Virtual Hardware 4, though)
Applied the standard, oft-cited vmx tweaks (MemTrimRate=0, etc.)
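For anyone retracing these steps: the preallocation was done with vmware-vdiskmanager, and the rest are .vmx edits. A sketch with example file names (the exact tweak list varies from posting to posting):

    # convert a growable vmdk to preallocated (-t 2 = preallocated, single file)
    vmware-vdiskmanager -r winxp-growable.vmdk -t 2 winxp.vmdk

    # in the guest's .vmx: present the disk on the LSI Logic adapter instead
    # of IDE (XP needs the LSI driver installed before the switch, and the
    # vmdk descriptor's ddb.adapterType has to match)
    scsi0.present = "TRUE"
    scsi0.virtualDev = "lsilogic"
    scsi0:0.present = "TRUE"
    scsi0:0.fileName = "winxp.vmdk"

    # the oft-cited memory tweaks
    MemTrimRate = "0"
    sched.mem.pshare.enable = "FALSE"
    MemAllowAutoScaleDown = "FALSE"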
On the host system, I did the following optimizations:
The standard /etc/sysctl.conf VMWare tweaks (vm.swappiness=0, et al)
The partition with the VM's is separate from the OS partition, and sits on the faster part of the disk (i.e., the start)
The VM partition is ext3 with data=writeback, and set for noatime.
After the above vmdk tunings were complete, I loaded the VM's onto the VM partition fresh, so host-side fragmentation should be minimal
Kernel is booted with "elevator=deadline nohz=off"
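Concretely, those host-side settings boil down to a few lines of config (the /vm mount point and device name are just my layout; adjust to taste):

    # /etc/sysctl.conf: keep the host from swapping guest memory
    vm.swappiness = 0

    # /etc/fstab: the VM store, writeback journaling, no atime updates
    /dev/md0  /vm  ext3  noatime,data=writeback  0  2

    # appended to the kernel line in GRUB's menu.lst
    elevator=deadline nohz=off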
So ok, here I am at the end of the road in terms of knobs I can tweak. I have a fast host machine, with the exception that it doesn't have a $5000 SCSI disk subsystem. Oodles of RAM. But the performance still sucks.
I didn't go whole hog on getting every disk metric, but the basics tell enough of the story. Run on the host against the VM partition, here are the numbers:
bonnie++ shows block I/O at 79MB/s write, 40MB/s rewrite, and 102MB/s read
hdparm -t shows 107MB/s read
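For the record, those numbers came from invocations along these lines (paths are illustrative; the bonnie++ file size is set to twice the 8G of RAM so the page cache can't flatter the results):

    # block I/O against the VM partition, run as an unprivileged user
    bonnie++ -d /vm/bench -s 16384 -u nobody

    # raw sequential-read timing of the underlying array
    hdparm -t /dev/md0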
Pretty decent, don't you think? Well, in the RHEL4 guest, the bonnie++ numbers are very different:
16MB/s write, 7MB/s rewrite, 9MB/s read
Results are similar on the done-everything-I-can XP guests:
DiskTT shows 11MB/s write, 10MB/s read on a 2G test file
I know there's some apples-and-oranges in comparing bonnie++ with DiskTT, but that can't explain a difference this vast. Even worse, while DiskTT is running the host's load average spikes into the 6-8 range, mostly due to iowaits, and everything on the server, including the other guests, is all but stopped until DiskTT is done.
I factored out contention between the guests by rerunning the test with only one XP guest powered on. Nope: same speeds, same suspended animation of the server for the duration. So this isn't a server suffering from simple overload.
So I tweak, Google, tweak, Google, tweak, and Google some more. I get the XP guests working reasonably well, but any time the disk sees even moderate use the entire server slows to a crawl. I even completely disabled the paging file on the XP guests, which helps mitigate the really-slow-after-some-idle-time problem, since Windows is then limited in what it can swap out to the ultra-ultra-slow disk it's dealing with.
Finally, FINALLY, I hit this posting:
And that, my friends, was the magic bullet. Sort of. I took a lot of info from that thread and took one extra step:
mainMem.useNamedFile = FALSE (puts the vmem file in /tmp)
/tmp is mounted as tmpfs with a 12G limit
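In config terms the whole fix is tiny (the 12G cap is my choice, sized to cover the guests' combined RAM with headroom):

    # in each guest's .vmx: don't back guest RAM with a named file on disk
    mainMem.useNamedFile = "FALSE"

    # /etc/fstab: make /tmp RAM-backed so the vmem file never touches disk
    tmpfs  /tmp  tmpfs  defaults,size=12g  0  0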
As the poster correctly surmises, the problem seems to lie in the memory-mapped file that VMWare Server insists on creating and keeping up to date. The original poster managed to eke out decent performance by optimizing disk writes to keep the kernel from getting saturated, but that didn't work well for me. Nope, I had to go and force the damned memory-mapped file back into RAM. Once I did that, DiskTT gave me this happy news:
64MB/s write, 97MB/s read
Even better, the load average stayed in the low 1's, with iowait in the teens during writes and negligible during reads. Performance on the workstations was near-native, even with the full complement of guests booted up.
While I'm happy that I think I finally have a solution, the fact that this underlying problem exists has been driving me a bit batty for the past 24 hours. As stated in the aforementioned posting:
I cannot claim to understand the reason for this - my host machine's RAM is not disk-backed - so why, when I tell VMWare to give a guest 2GB of my RAM, and not to put any in swap, does it back it with a file on disk? I have yet to see a convincing explanation of this apparent madness.
That's putting it rather nicely, if you ask me. At this point I'd have some choicer words for it, but yes, madness indeed.
I'm now really glad I put 8G of RAM in this box. On the production server, the reported numbers from the MUI almost never get above 1G total usage for all running VM's, with a top spike of around 3-4G. 8G was beginning to look like a waste.
With the tmpfs solution (the only one that has worked for me), I now have a situation where VMWare forces gratuitously wasteful RAM usage. Why? Because while VMWare is extremely good at minimizing how much host RAM a guest's working set occupies, the vmem file is always allocated at the guest's full RAM size. So while a 1G guest might be eating less than 100MB while idle, the file up in /tmp (and because that's tmpfs, it's likely to be in RAM) is still going to be a full 1G in size. On top of that, a single page update within the guest can result in numerous operations on the host: at least one write to copy the page into the RAM backing tmpfs, plus all the overhead of getting it out of the VM and through the filesystem layer.
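If you want to watch the waste happen, tmpfs usage on the host tells the story; the backing file may be created and immediately unlinked, so lsof is the surer way to spot it:

    # tmpfs consumption jumps by the guest's full RAM size at power-on
    df -h /tmp

    # list deleted-but-open files on /tmp; the vmem backing shows up under
    # the vmware-vmx processes
    lsof +L1 /tmp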
Observations welcome. I'd love a solution that isn't so byzantine and distasteful, so I'm all ears. But this is looking like the way I'll be forced to go.