VMware Communities
befungle
Contributor

Hosting System memory countdown issue - Fusion related?

I am using Fusion on an Apple Xserve to host a couple of VMware servers. This server was built specifically for this task, and its specifications are detailed below. The only public services active on this server are two CentOS 5 64-bit Linux systems and a small MySQL database with nominal use.

Over the past three weeks, I have been trying to diagnose and correct a slow but painful death that my server keeps suffering. Once the server is booted, available memory is consumed at a rate of a few hundred MB a day. This consumption is consistent, but has proved incredibly elusive to resolve. The VMware processes are obviously the top memory users, but the server has four times the memory the VMs request (the VMs request 4 GB in total; the server has 16 GB installed and running).

The other heavy memory user is the kernel, whose usage seems to be related to VMware.

I have run multiple reviews and tests with co-workers as well. Thus far no one has found a way to stop the continuous loss of memory, which eventually results in disk thrashing and failure (if the server is not rebooted before the loss becomes critical).
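For anyone wanting to put a number on a "few hundred MB a day" trend like this, here is a minimal Python sketch. It fits a straight line through timestamped free-memory readings and projects how long until free memory runs out. The function names and the sample readings are hypothetical, invented for illustration; they are not measurements from this server.

```python
from datetime import datetime


def leak_rate_mb_per_day(samples):
    """Least-squares slope of free memory (MB) against time (days).

    samples: list of (datetime, free_mb) pairs.
    A negative result means free memory is shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0 = samples[0][0]
    xs = [(t - t0).total_seconds() / 86400.0 for t, _ in samples]
    ys = [float(mb) for _, mb in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    denom = sum((x - mean_x) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("samples must cover more than one point in time")
    numer = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return numer / denom


def days_until_exhausted(free_mb_now, rate_mb_per_day):
    """Days left before free memory reaches zero; None if it is not shrinking."""
    if rate_mb_per_day >= 0:
        return None
    return free_mb_now / -rate_mb_per_day


# Hypothetical daily readings (dates and values are made up).
samples = [
    (datetime(2000, 1, 1, 9, 0), 6000),
    (datetime(2000, 1, 2, 9, 0), 5700),
    (datetime(2000, 1, 3, 9, 0), 5400),
    (datetime(2000, 1, 4, 9, 0), 5100),
]

rate = leak_rate_mb_per_day(samples)
print("rate (MB/day):", rate)
print("days left:", days_until_exhausted(samples[-1][1], rate))
```

A straight-line fit is only a rough model. If the loss comes in steps or levels off, the projection will be misleading.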

Below I will include as much relevant information as I can, along with a current "top" readout taken roughly midway between reboots. Note that all servers are reporting healthy operation and the systems have been checked for viruses and hacking. There are no errors on the OS X console and nothing outside normal operational messages on the Linux systems.

System Specs:

OS X Server 10.5.8

Processor 2x2.8 Quad Core Xeon

Memory 16 GB 800 MHz DDR2 FB-DIMM

Hard drive has 279 GB total with 56 GB free.

VMWARE Fusion Version: 2.0.5

Top Results

Processes: 77 total, 6 running, 71 sleeping... 362 threads 16:33:28

Load Avg: 0.35, 0.51, 0.52 CPU usage: 0.94% user, 3.07% sys, 95.99% idle

MemRegions: num = 14958, resident = 312M + 0 private, 149M shared.

PhysMem: 4070M wired, 1077M active, 6619M inactive, 11G used, 4633M free.

VM: 7389M + 0 973682(0) pageins, 76(0) pageouts

Swap: 14M + 50M free Purgeable: 12M 12544(0) pages purged

PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE VSIZE

410 vmware-vmx 6.8% 2:43:39 25 162 2286 11M 13M 1535M 1854M

0 kernel_tas 1.3% 33:26.72 63 2 1244 111M 0 310M 289M

364 vmware-vmx 9.9% 9:38:56 34 180 5588 37M 14M 96M- 1906M

332 vmware 0.8% 23:59.23 19 235 567 26M+ 26M 54M 268M

129 WindowServ 0.1% 1:29.94 11 198 590 12M+ 19M 34M+ 175M+

I need advice and suggestions if anyone can help. I've exhausted my knowledge of computers and OS X trying to resolve this. While the blame could lie with OS X or MySQL, review of those systems does not point that way. I believe this is Fusion related, but I need help understanding and resolving it. Thanks in advance for any insight!
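As a sanity check on the top readout above, the PhysMem line can be split into its parts and added up. This is a minimal Python sketch; the helper name `parse_physmem` is hypothetical, and it assumes the line keeps the "amount, unit, label" layout shown in this post.

```python
import re

# Conversion factors to megabytes for the unit suffixes top prints.
UNIT_MB = {"K": 1.0 / 1024, "M": 1.0, "G": 1024.0}


def parse_physmem(line):
    """Return a dict such as {'wired': 4070.0, ...} in MB from a PhysMem line."""
    fields = {}
    for amount, unit, name in re.findall(r"(\d+)([KMG])\s+(\w+)", line):
        fields[name] = int(amount) * UNIT_MB[unit]
    return fields


line = "PhysMem: 4070M wired, 1077M active, 6619M inactive, 11G used, 4633M free."
mem = parse_physmem(line)

# wired + active + inactive should roughly match the rounded "used" figure.
in_use = mem["wired"] + mem["active"] + mem["inactive"]
total = in_use + mem["free"]
print("in use (MB):", in_use)
print("total (MB):", total)
```

The parts sum to 11766 MB in use and 16399 MB overall, which is consistent with the 16 GB installed. Logging these fields over time would show whether the loss is going to wired, active, or inactive memory.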

8 Replies
WilliamReid
Enthusiast

Hi there,

Please attach your support logs for review.

While Fusion is in focus, choose Help > Collect Support Information and attach the resulting .tgz file.

befungle
Contributor

Can you confirm which logs you are looking for? VMWARE or OSX (or both)? I have full access to all areas, but want to make sure you get what you're looking for.

WilliamReid
Enthusiast

The Support Tool grabs both...

befungle
Contributor

Thanks for the help. I misunderstood the request and did not previously know of the "support tool". I have been using VMware since the early beta in 2007, but have very rarely had a problem. This one is quite hard to track down. I am also pursuing a review with MySQL and Apple, but that research has not provided any useful information. Attached is the requested data. Thanks for any insight you have.

WilliamReid
Enthusiast

Hi there,

I managed to take a quick look over the logs on your system and here's what I've come up with so far.

From the system log on the Mac

I'm seeing some messages regarding leaking, desyncing and vmmon, which I will forward to our devs for further input.

I see you're running a few things on the Mac host itself.

Mysqld

Samba

Cups

As far as your VMs go, here's what I am seeing in those logs.

mysql001/Webserver001

I'm seeing some read/write delays - this might be due to something running, missing tools, or things running on the host causing delays for guests.

I'm also seeing mbuf errors, which might be caused by applications on the host saturating your link... torrent client? High-load servers? Backups?

Some things to try to see if anything improves...

If you're running at init 5, change to 3 to disable X, which you shouldn't need for a web and MySQL server.

Trim down on services that are not needed for your run level.

I'm not sure what settings you have for the actual MySQL server itself or the httpd (assuming Apache?) server, so I can't advise much there.

I'm assuming you have sufficient swap space created for each guest?

You might want to watch memory loads on the guest VMs. Do you really need 2 GB each?
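On the init 5 to 3 suggestion: on a SysV-init guest such as CentOS 5, the default runlevel is set by the `id:N:initdefault:` line in /etc/inittab. Below is a minimal Python sketch that rewrites that line in a copy of the file's text. The helper name is hypothetical, and it deliberately works on a string so nothing on the guest is touched until you choose to write the result back.

```python
import re


def set_default_runlevel(inittab_text, level):
    """Return inittab text with the initdefault runlevel changed to `level`.

    Only runlevels 1 to 5 are accepted, since 0 (halt) and 6 (reboot)
    must never be the default.
    """
    if level not in (1, 2, 3, 4, 5):
        raise ValueError("runlevel must be between 1 and 5")
    pattern = re.compile(r"^id:\d:initdefault:", re.MULTILINE)
    new_text, count = pattern.subn("id:%d:initdefault:" % level, inittab_text)
    if count == 0:
        raise ValueError("no initdefault line found")
    return new_text


# Small stand-in for /etc/inittab contents (illustrative only).
sample = "# Default runlevel.\nid:5:initdefault:\nsi::sysinit:/etc/rc.d/rc.sysinit\n"
print(set_default_runlevel(sample, 3))
```

The change takes effect at the next boot of the guest. Keep a backup of the original file before replacing it.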

befungle
Contributor

Thank you for a very comprehensive assessment. I will review these recommendations with my server team. Some items mentioned were on the board as candidates for further review, but the observed "leak" seemed heavy enough that I thought I should involve public discussion to research.

I am traveling and working much harder than a person should. But I will work through each of your points and report back to the thread on any positive results.

As you mentioned involving development, I will keep a lookout for any observations or recommendations you may have to forward from them.

Thanks again. I'm not sure if my use of Fusion is common. But my chosen platform is the Xserve, so advanced VMware options are not available to me (sadly). Maybe in the future. It would make my business run much better to have less consumer-focused tools on the Mac variants.

WilliamReid
Enthusiast

Not a problem. In the meantime I've sent the questionable items to the devs for further feedback and will let you know my findings.

Wm

befungle
Contributor

Update on this issue. I can now report that the three-week trend of slow but continuous memory loss has stabilized for over 24 hours.

Shortly after (or before) posting my original question here, I had attempted a couple of things. Since they also involved a reboot of the virtual machines, the results could not immediately be confirmed. However, I began to notice throughout the day yesterday (and confirmed this morning) that the memory is holding steady (at about 200 MB above the figure in the top output posted on the 11th).

Because this is a production machine, I am not able to easily validate my "fix" to confirm which action helped. So I will outline below the two things that may or may not have had an impact.

1. The VMware performance setting was changed from "Optimize for Virtual Machine" to "Optimize for OS X". This setting does not indicate that a restart is required (though that seems logical) and did NOT have any impact when first toggled. The machines were then restarted as described below.

2. My normal start process for this server is to load the VMs automatically on boot, meaning that immediately after OS X loads, the VMs open and begin crunching memory and the hard drive. On the 11th, all VMs were shut down cleanly and closed, and VMware was turned off. All non-operational OS X systems were turned off. Then each was slowly started up and memory was observed during the load. This continued with the VMs, starting them one by one and holding until the first was completely up and stabilized before starting the next.

This test was another effort to isolate what is responsible for the problem, but it is almost impossible to judge since the initial memory load for a VM can take significant time. The end result, however, is that free memory stabilized at around 4.8 GB and has stayed happily there for 2 days.

Was it the Optimize toggle? Was it the controlled boot? I'm not sure, and again I welcome insight or opinions. Because having an automated restart process is pretty important, I don't want to declare the issue fixed just yet. But having seemingly stopped the memory countdown for 48 hours is very nice!
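If the controlled boot turns out to be what helped, it could be automated. Here is a minimal Python sketch that starts the VMs one at a time and waits for free memory to settle before starting the next. It assumes the `vmrun` tool that ships with Fusion, at a path that may differ on your host. The thresholds, the function names, and the `free_mb` probe are hypothetical choices for illustration, not tested settings.

```python
import time

# Assumed install location of vmrun for Fusion; adjust to match the host.
VMRUN = "/Library/Application Support/VMware Fusion/vmrun"


def start_command(vmx_path):
    """Build the vmrun command line that starts one VM without a GUI window."""
    return [VMRUN, "-T", "fusion", "start", vmx_path, "nogui"]


def staggered_start(vmx_paths, run, free_mb, settle_mb=50, interval=60,
                    max_checks=30, sleep=time.sleep):
    """Start each VM in turn, waiting until free memory stops moving.

    run:      callable that executes a command list (e.g. subprocess.check_call)
    free_mb:  callable returning the host's current free memory in MB
    A VM counts as settled once two consecutive readings, taken `interval`
    seconds apart, differ by at most `settle_mb`. After `max_checks`
    readings the loop gives up waiting and moves on to the next VM.
    """
    for vmx in vmx_paths:
        run(start_command(vmx))
        previous = free_mb()
        for _ in range(max_checks):
            sleep(interval)
            current = free_mb()
            if abs(current - previous) <= settle_mb:
                break
            previous = current


# Example wiring (not executed here; the paths and probe are placeholders):
#   import subprocess
#   staggered_start(["/path/to/first.vmx", "/path/to/second.vmx"],
#                   run=subprocess.check_call, free_mb=my_free_memory_probe)
```

Passing `run`, `free_mb` and `sleep` in as arguments keeps the logic testable without touching real VMs. It does not by itself tell you whether the Optimize toggle or the boot order made the difference.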
