VMware Cloud Community
vinny95
Contributor

Postgres using 99% CPU

Hi,

I have vCloud Usage Meter (vCU) 3.3.2, which ran fine for several weeks.

For the past week, the appliance has been at 100% CPU because of postgres:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5991 postgres  20   0  102m  47m  35m R 99.9  1.3   3:03.04 postgres
    1 root      20   0 10548  844  712 S  0.0  0.0   0:00.37 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.06 ksoftirqd/0

There is no network issue, disk usage is fine, and data collection is still occurring (I can check the Monitor tab), but I can't view any reports and the automatic reports are no longer emailed.

When I try to generate a report, it is canceled automatically:

Starting report “Monthly Usage.”

Production of the Monthly Usage was canceled.

I tried to delete old data, but it looks like nothing happened. An hour later I rebooted the appliance, and Postgres is still loading the vCU appliance.

Can you advise?

regards,

vinny


Accepted Solutions
dbriccetti
Hot Shot

Usage Meter 3.4 stores much less data in the database so this problem should not occur after you upgrade.

10 Replies
IamTHEvilONE
Immortal

I'm thinking you'll need to contact support about diagnosing this unless dbriccetti has seen something like this before.

I think the Usage Meter application itself shows up as a Java process, which is why it works fine for most things, but it seems odd that the DB is claiming all the cycles.

dbriccetti
Hot Shot

Hi. Sorry to hear about this. If you like, you can log in to the Usage Meter appliance, and issue this command (actually an alias) to start the PostgreSQL client:

sql

then paste in this:

SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
       pg_stat_get_backend_activity(s.backendid) AS current_query
    FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;

and let us know the result.

Restarting the Usage Meter, from root

service tomcat restart

may clear up the problem. If not, I would recommend a reboot.

Dave Briccetti

Lead developer, vCloud Usage Meter

vinny95
Contributor

Here is the result of the sql command :

usgmtr=> SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,pg_stat_get_backend_activity(s.backendid) AS current_query FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;

 procpid | current_query
---------+---------------
   23176 | Select +
         |   sum(capped("Sample2"."vmBillingMemory",$1)) as "c0" +
         | From +
         |   "SampleToLicense" "SampleToLicense1", +
         |   "Sample" "Sample2", +
         |   "Collection" "Collection3", +
         |   "Vm" "Vm4", +
         |   "HostLatest" "HostLatest5", +
         |   "License" "License6" +
         | Where +
         |   ((("SampleToLicense1"."vmId" in ((Select +
         |      "CustomerVm9"."vmId" as "CustomerVm9_vmId" +
         |    From +
         |      "CustomerVm" "CustomerVm9", +
         |      "CustomerVmRule" "CustomerVmRule10" +
         |    Where +
         |      (("CustomerVmRule10"."createdBy" = $2) and ("CustomerVm9"."customerVmRuleId" = "CustomerVmRule10"."id")) +
         |   ) )) and (betweenco("SampleToLicense1"."start",$3,$4) and "SampleToLicense1"."reportExclusionReason" is null)) and ("SampleToLicense1"."sampleId" = "Sample2"."id"))
    2813 | SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,pg_stat_get_backend_activity(s.backendid) AS current_query FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;

I tried the service restart, but there are still no reports:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
 3283 postgres  20   0  102m  45m  35m R 33.6  1.3   0:05.49 postgres
 3013 postgres  20   0  102m  47m  35m R 33.3  1.3   1:44.18 postgres
23176 postgres  20   0  102m  47m  35m R 32.9  1.3 968:08.62 postgres

I now have 3 postgres processes. The appliance is still gathering stats, but no reports are displayed.

vinny
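
For readers comparing the two top listings in this thread, here is a minimal Python sketch that adds up %CPU across the postgres rows. It assumes the standard 12-column top layout shown above; postgres_cpu is a hypothetical helper, not something shipped with Usage Meter. On the listing above, the three backends together still account for roughly one full CPU, and PID 23176 (the report query from the SQL output) has survived the tomcat restart.

```python
def postgres_cpu(top_output):
    """Return (pid, %CPU) pairs for rows owned by the postgres user."""
    rows = []
    for line in top_output.splitlines():
        fields = line.split()
        # Process rows have 12 columns:
        # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skips the header and blank lines
        if fields[1] == "postgres":
            rows.append((int(fields[0]), float(fields[8])))
    return rows


# The listing posted after the tomcat restart.
sample = """\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
 3283 postgres  20   0  102m  45m  35m R 33.6  1.3   0:05.49 postgres
 3013 postgres  20   0  102m  47m  35m R 33.3  1.3   1:44.18 postgres
23176 postgres  20   0  102m  47m  35m R 32.9  1.3 968:08.62 postgres
"""

backends = postgres_cpu(sample)
total = round(sum(cpu for _, cpu in backends), 1)
print(backends, total)
```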

dbriccetti
Hot Shot

Thanks. That’s a query for running a report.

I should have suggested the following:

service tomcat stop
service vpostgres restart
service tomcat start

Or, just restart the appliance.
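
If you want to script that sequence, the following is a rough Python sketch that runs the three commands in the order given above and stops at the first failure. The wrapper name restart_usage_meter and the injectable run parameter are assumptions made for illustration; only the three service commands come from this thread. As written it performs a dry run that just prints the commands.

```python
import subprocess

# Order taken from the reply above: stop Tomcat, restart PostgreSQL, start Tomcat.
RESTART_STEPS = [
    ["service", "tomcat", "stop"],
    ["service", "vpostgres", "restart"],
    ["service", "tomcat", "start"],
]


def restart_usage_meter(run=subprocess.check_call):
    """Run each step in order; check_call raises if a step exits non-zero."""
    for cmd in RESTART_STEPS:
        run(cmd)


# Dry run: print the commands instead of executing them.
restart_usage_meter(run=lambda cmd: print(" ".join(cmd)))
```

Calling restart_usage_meter() with no argument would execute the commands for real, which needs root on the appliance.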

Have you tried the 3.4 beta? Because it saves only changes, and not hourly VM samples, reports are much faster.

Dave

vinny95
Contributor

I just tried restarting the services, and the behaviour is still the same: as soon as tomcat starts, postgres grabs all the CPU cycles.

I installed this appliance a month ago. It was not doing this then, at least not during the first week when I tested it.

As for 3.4, it's a beta, and I usually do not use a beta app in a prod environment. But you certainly know it better than I do, so if you think it can be used, I can give it a try.

vinny

dbriccetti
Hot Shot

How about a reboot, then?

I am suggesting 3.4 not for production reporting, but to run against your production environment to see how well it does for you. We want to discover any problems prior to release, and those of you with large or otherwise uncommon environments can be especially helpful to us.

vinny95
Contributor

I have already rebooted several times, but the behaviour is still the same.

vinny

pauwel
Contributor

Hi guys,

I have had the exact same issue you guys are having.

I just managed to fix it.

After I opened a case, VMware sent me this: http://kb.vmware.com/kb/2070066

This corrupted my appliance and rendered it useless, and I had to revert to the snapshot.

What I did to fix it:

1) Take a snapshot of the appliance.

2) While the postgres process is still at about 100% CPU, nothing will work in terms of database trimming.

--> On the reports page, choose a report other than the monthly report.

--> Fill in the details and run it with the Browse button.

--> Immediately afterwards, run the same one again, but with the Export button.

----> You should see a Java error come up, and the CPU goes back down to normal.

3) Back up your appliance (clone, backup, export to OVF, etc.).

4) Go to the support page and click the option to remove old data.

---> Keep the default of 1095 days, type CONFIRM and run it. It should take a while and then finish with success. (My install had only a little over 3 years of data, so if you have more, try with more than 1095.)

WHY? Because going directly to 90 fails and corrupted my appliance...

--> Keep trimming down by 50 at a time until you reach 90 days (more than 90 failed and corrupted my appliance...).

-------> Yes, I know it will take a very long time to do this.

5) Run the report -> with success.

Hope you guys get the same result...

Kr,

Pauwel

zettaserve
Contributor

I had this same issue after upgrading from 3.3.2 to 3.3.3. In my case I deleted down to 1095 days, then deleted in 100-day lots down to 300, after which my reports ran successfully.

I don't know exactly how much data I had, but I've been a member of the program since 2010 so close to 5 years worth of data.
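
The stepwise trimming described in the last two replies can be sketched in Python as a list of "days to keep" values to enter, one run of the old-data removal per value. This is only an illustration under stated assumptions: retention_schedule is a hypothetical helper, it does not talk to the appliance, and the start, target and step values are the ones the posters reported working for them.

```python
def retention_schedule(start_days, target_days, step_days):
    """List the 'days to keep' values to enter, one trim run per value."""
    if step_days <= 0 or target_days > start_days:
        raise ValueError("need step_days > 0 and target_days <= start_days")
    values = list(range(start_days, target_days - 1, -step_days))
    if values[-1] != target_days:
        values.append(target_days)  # final, shorter step lands exactly on the target
    return values


# pauwel: start at the default 1095, then go down 50 at a time to 90.
print(retention_schedule(1095, 90, 50))

# zettaserve: start at 1095, then go down 100 at a time to 300.
print(retention_schedule(1095, 300, 100))
```

With a step of 50 this comes to 22 runs, which matches the warning above that the process takes a very long time.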

dbriccetti
Hot Shot

Usage Meter 3.4 stores much less data in the database so this problem should not occur after you upgrade.
