VMware Cloud Community
peetz
Leadership

vpxd.exe memory leak?

Hi all,

we recently re-installed our vCenter server with version 2.5 U4 and moved the database from a separate Oracle platform to a SQL 2005 installation that now runs on the same machine (a 4-core server with 4 GB RAM running 32-bit Windows 2003).

Since then we have noticed that the vpxd.exe process continuously uses more and more memory. It starts with about 120 MB (working set), which has risen to 340 MB now after about 60 hours. When we first noticed the problem, the process was consuming more than 1.5 GB of RAM and finally failed with an "Out of memory" exception.
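
As a rough illustration of what these numbers imply, here is a minimal Python sketch that extrapolates the growth linearly. It assumes the leak is roughly linear (nothing in the thread confirms that) and takes 1.5 GB as 1536 MB; the function names are made up for this example.

```python
# Two-point linear extrapolation of vpxd.exe working-set growth.
# Assumption: growth is roughly linear, which the thread does not confirm.

def growth_rate_mb_per_hour(start_mb: float, end_mb: float, hours: float) -> float:
    """Average growth in MB per hour between two working-set readings."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (end_mb - start_mb) / hours


def hours_until(limit_mb: float, current_mb: float, rate_mb_per_hour: float) -> float:
    """Hours until the working set reaches limit_mb at a constant rate."""
    if rate_mb_per_hour <= 0:
        return float("inf")  # not growing, so the limit is never reached
    return max(0.0, (limit_mb - current_mb) / rate_mb_per_hour)


# Readings from the post: about 120 MB at start, 340 MB after about 60 hours.
rate = growth_rate_mb_per_hour(120, 340, 60)
# 1.5 GB is taken as 1536 MB, the level seen before the earlier crash.
remaining = hours_until(1536, 340, rate)
print(f"{rate:.1f} MB/hour, about {remaining / 24:.1f} days until 1.5 GB")
# prints "3.7 MB/hour, about 13.6 days until 1.5 GB"
```

If the leak rate varies with load, a two-point estimate like this will be off, so it is only useful as an order-of-magnitude check.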

We have 16 hosts and about 400 VMs connected to the vCenter instance.

What vpxd.exe RAM usage are you seeing and do you also notice it increasing?

Thank you for any comments.

Andreas

Twitter: @VFrontDe, @ESXiPatches | https://esxi-patches.v-front.de | https://vibsdepot.v-front.de
22 Replies
pointdexter
Contributor

I've got the same problem except my usage does not climb as fast as yours. I only have 90 or so VMs in my clusters.

Have you had any luck figuring out what is wrong?

peetz
Leadership

Hi,

thanks for your reply. No, I have not yet found the root cause of the issue. However, I'm glad to hear that I'm not alone. I expected to see more comments here from people who are seeing the same problem.

Maybe we can get closer to the cause if we compare our configurations and look for commonalities. This is our configuration:

A (physical) Windows 2003 server that hosts the VirtualCenter service and the UpdateManager service. The database (MS SQL 2005) is also on the same box. I guess that it is a very common setup. We use all the enterprise features: DRS, HA and EVC. We have not defined any resource pools.

Now some details that might not be so common: We use several self-developed Perl scripts (using the VI SDK) that query VirtualCenter, modify custom attributes that we defined in VirtualCenter and so on. And we have raised the statistics level of the 5-minute interval from 1 to 3. The reason for this is that we are currently evaluating the VKernel Capacity and Bottleneck Analysis appliance, which requires this setting. The issue might be related to this modification; we have not yet tried to undo it. That would be a next step.

Are there any similarities to your setup?

Andreas

pointdexter
Contributor

Ours is a VM with 2 CPUs and 4 GB of RAM. It also does everything: HA, DRS, EVC. 2 clusters with about 90 VMs, no resource pools or anything like that.

I have also done some more checking and found that our 3rd-party monitoring system shows that the memory usage for our management server was leaking for months before U4. It logs only 6 months' worth of performance data, so I don't know how long this could have been going on, but my guess is that it has always been happening.

mfish-ibc
Contributor

We have the same problem here. I have 5 clusters across 2 DCs, about 600 VMs and about 75 hosts. When I start the VC service, it jumps to about 150 MB or so, then by the end of the day it is at around 300 or 400 MB of RAM, and a few days later it is at 1.5 GB of RAM or higher. The Virtual Center service is set to auto-restart, so I am not sure when the issue really began. It became much more obvious to us once we implemented VDI with the Citrix broker, which freaks out if VC goes away even for a minute. The Citrix side of things alerted us to examine VC a lot closer, and I found this memory leak condition.

I built a brand new, clean Virtual Center server and DB for the VDI stuff to get them off our problematic VC environment. Since moving their 8 hosts and 400 VMs to their own VC and database, they have been great and VC has been up for weeks. No crashes, and they hover around 280 MB of RAM for vpxd.exe.

Ours on the other hand continued to have the same issue even after I got the ESX hosts for VDI off of there.

Last night I did the following to try to address the high vpxd.exe memory usage (but to no avail):

1.) Deployed a clean Windows 2003 server with SP2.

2.) Installed a clean VC 2.5 U4 and pointed it to a blank DB (just so the binaries would be laid down and a fully functional 2.5 U4 would be running on this new VC server).

3.) Went to our EXISTING (problem) environment and ran an in-place upgrade to U4 (which ran the DB upgrade wizard) to get the environment to U4 (so the VC service etc. was on U4 now).

4.) Shut down the EXISTING problem VC server and gave its IP address and server name to the newly deployed 2003 server with the blank 2.5 U4.

5.) Changed the ODBC entry/connection of the new clean 2.5 U4 server to point to my prod VC DB (where all my clusters etc. are stored).

6.) Fired up the new box and made sure all ESX hosts in all clusters reconnected OK (I needed to manually click reconnect on them, but that was the only real issue).

I have noticed that RAM usage is still ramping up, and it has only been 12 hours since I went through all this. Keep in mind that this is a clean front-end VC server, but the back-end DB is still the same. The DB I am on has been upgraded in place since version 1.3 (1.3 to 1.4 to 1.5 to 2.0 to 2.1 to 2.5) over the past 4 years.

QUESTION: Is anyone else who is having this problem also running on a back-end DB that has been upgraded in place through the years (preserving clusters, permissions etc.), or is this also happening to people who have had a fresh new DB deployed and are on U4?

Any help or info. anyone else has had with this problem would be appreciated.

Thanks

Matt

RLI27
Contributor

Hi all

I have the same problem and can confirm the same upgrade path, from VC 2.0.x through VC 2.5 U4.

Any DB experts out there? Cheers

Roger

peetz
Leadership

I did not point this out in my original post, but we started with a clean new VC database and a clean new VC installation (2.5 U4) on a freshly installed Windows 2003 server. We re-created the clusters and all configuration in VC, then removed the hosts from the old VC instance and added them to the new one.

So, no updates and no database upgrades involved here.

mfish-ibc
Contributor

Wow, so even going from a clean install straight to this version with a fresh, new DB you still see the memory leak? Interesting.

grog
Contributor

I am having the same issue. I have about 50 hosts and about 1000 vms. I have tried everything down to reinstalling from scratch and nothing seems to work. I am basically at the point where I just wait and see what VMware comes out with next.

Any suggestions would be greatly appreciated...

Thanks,

Marc

mfish-ibc
Contributor

I have a ticket open with them now, and at this point they are only having me capture perfstats with performance counters and alerts to watch vpxd.exe go up. I have a bunch of counters added. By the end of the day today, the process should have crept up enough in RAM for me to stop the counters and send them the results. If I get anything useful from them, especially a fix, I will post it here. Thanks for the replies, everyone.
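
For anyone who wants a simple log alongside the Windows performance counters, here is a minimal Python sketch that samples the memory usage of vpxd.exe through the built-in `tasklist` command and appends it to a CSV file. It is an illustration under assumptions, not what support asked for: it assumes Python 3.7 or later is available on a machine that can see the process, it treats the "Mem Usage" column of `tasklist` as a stand-in for the working set, and the function names, interval and file path are made up.

```python
import csv
import io
import re
import subprocess
import time


def parse_mem_usage_kb(tasklist_csv: str) -> int:
    """Sum the 'Mem Usage' column (in KB) over all rows of tasklist CSV output."""
    total_kb = 0
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if len(row) < 5:
            continue  # skips blank lines and "INFO: No tasks ..." messages
        # The value looks like "123,456 K"; the thousands separator
        # depends on the locale, so keep only the digits.
        digits = re.sub(r"\D", "", row[4])
        if digits:
            total_kb += int(digits)
    return total_kb


def sample_process_kb(image_name: str = "vpxd.exe") -> int:
    """Run tasklist for one image name and return its memory usage in KB."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_mem_usage_kb(result.stdout)


def log_samples(path: str, interval_s: int = 300, count: int = 12) -> None:
    """Append `count` timestamped samples to a CSV file, one every interval_s seconds."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for _ in range(count):
            writer.writerow([time.strftime("%Y-%m-%d %H:%M:%S"), sample_process_kb()])
            f.flush()  # keep the log usable even if the script is interrupted
            time.sleep(interval_s)
```

Calling `log_samples("vpxd_mem.csv")` would record one hour of 5-minute samples; the resulting file can be charted or compared against the counter logs.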

Matt

mainx
Contributor

I have been experiencing a memory leak using VCS 2.5 U4 as well. The vpxd.exe process leaks about 100 - 150 MB of memory per day. When the vpxd.exe process gets to around 1.5 - 1.6 GB of memory usage, it crashes with an out of memory exception. I'm wondering if this memory leak is the result of VCB backups. Is anybody seeing this memory leak and not using VCB? My environment consists of 1600 VMs on 70 ESX 3.5 servers. We are running MS SQL 2005 as our database on a separate dedicated host. We are also using VMware View 3 Manager, which accounts for about 200 of our VMs.

I have a VMware support ticket open for this issue and I am currently testing an engineering build that is supposed to fix a memory leak in the vpxd.exe process, but so far it doesn't seem to help at all. I'll keep everybody posted if I hear anything back from VMware or make any further progress tracking down this issue.

Jason

peetz
Leadership

... no VCB backups here, and no VMware View.

After updating to VC 2.5 Update 5 the problem is still here.

Andreas

mainx
Contributor

So far the instrumental build is up to 1.2 GB of memory, so it doesn't look like it's helping. The instrumental build I'm using should core dump in a couple of days, and then hopefully VMware can track down where the memory leak is occurring.

It's interesting that you're seeing the issue without VCB, which leads me to believe it's some internal VCS task either for VMs or ESX servers.

mfish-ibc
Contributor

I am still battling with this memory leak issue. The funny thing is, we migrated to a new DC about 2 months ago (migrated all VMs over from the old DC to the new DC using SRM and EMC SRDF). Anyway, the new DC was set up with a new VC 2.5 U4 and DB (since we used SRM), and new hosts were out there. Recently I saw this problem creep back into the picture: vpxd.exe starts off using about 250 - 300 MB, then leaks memory to over a GB and eventually crashes.

In our situation I think I have narrowed it down to DRS. I disabled HA and DRS across the board in all clusters in this new VC, restarted the VC service (vpxd.exe) and let it run for a few days with DRS and HA disabled. Memory usage held steady at 300 MB.

I then re-enabled only DRS on my one main prod cluster (consisting of 27 ESXi 3.5 hosts and 3 full-blown ESX 3.5 hosts). Within 24 hours of turning DRS on, the memory utilization went over 1 GB and the service failed.

I then turned DRS BACK OFF on that cluster (so no HA or DRS was enabled at all) and restarted vpxd.exe again. The service ran fine all day Friday and over the weekend to now (Monday morning).

My next test, which I started a few hours ago, was to turn on HA but NOT DRS on the cluster. So far, with HA re-enabled, memory usage is holding steady around 320 MB, which is fine for our VC, which has 600 VMs and about 70 hosts.

My suggestion to anyone seeing this problem, to check whether you see the same symptoms as me, is to disable DRS across all clusters in your VC, stop and then start the VC service, and see if the memory leak issue is still there after a few days. I am curious to see if anyone's symptoms besides ours go away when DRS is DISABLED. Let me know. I will post if I find out anything more.
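
To compare runs with and without DRS in a less eyeball-driven way, one option is to fit a straight line to the memory samples from each run and look at the slope. The following Python sketch does that with a plain least-squares fit. The sample values are made up for illustration (they only loosely mimic the behaviour described above), and the 1 MB/hour threshold and the function names are arbitrary assumptions.

```python
def slope_mb_per_hour(samples):
    """Least-squares slope of (hours_since_start, working_set_mb) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def looks_like_leak(samples, threshold_mb_per_hour=1.0):
    """True if memory grows faster than the (arbitrary) threshold."""
    return slope_mb_per_hour(samples) > threshold_mb_per_hour


# Made-up illustration values, NOT measurements from this thread.
drs_off = [(0, 300), (24, 310), (48, 305), (72, 320)]   # roughly steady
drs_on = [(0, 300), (8, 520), (16, 780), (24, 1050)]    # climbing quickly

print(looks_like_leak(drs_off))  # False
print(looks_like_leak(drs_on))   # True
```

A fit over several days of samples is less sensitive to short spikes than comparing two single readings, which matters when normal activity makes the working set fluctuate.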

Matt

mfish-ibc
Contributor

Did you see my most recent post about what I found with DRS seeming to cause the memory leak at our site? I posted a pretty detailed description of what I did to narrow it down to DRS. Just wondering if anyone else has tried disabling DRS to see whether the leak continues.

peetz
Leadership

We have migrated to vSphere 4 (re-installed vCenter from scratch with a new database) and have not seen the leak since then.

We used DRS before (with 3.5, while observing the leak) and still use it. However, I can no longer test the 3.5 behaviour without DRS, because it's all on 4.0 now.

Andreas

legonima
Contributor

Hello,

We are experiencing a memory leak on our vCenter 4.1 (HP DL380, 8 GB RAM, Windows Server 2008 R2), managing 77 VMs spread across 8 ESXi 4.1 servers. Physical RAM is completely filled after a couple of minutes.

The problem does not disappear even after disabling both DRS and HA... Did anyone experience the same issue? Did you solve it? Thanks for your help...

BobWarrington1
Contributor

I have also observed a memory leak issue with vpxd.exe since upgrading to vCenter 4.1. In my case I run vCenter 4.1 (along with UM) on a physical server running Windows 2008 R2 as well as SQL Server 2008. Restarting the service frees up the memory, but obviously this is not an acceptable solution. Since the Update Manager plug-in also fails periodically, I have kept this build in the lab until it is stable enough to see the light of day in production. My test environment consists of 22 hosts and approximately 1500 VMs, and yes, I use DRS.

murphyslaw1978b
Contributor

Unfortunately for me, I'm running vCenter 4.1 in production, and it's not very stable. I've rebuilt it once due to a crash, and it takes 30 minutes to start up. RAM usage on Tomcat gets up to 4 GB, and vpxd can go as high as 6 GB.

psyclone1976
Contributor

Same problem here on vSphere 4.1. Has anyone found a fix for this yet?
