VMware Beta Community
rickvanvliet
Enthusiast

authorize every minute

In our VCD event logs we see this message every minute:
User 'clusterauthor' (83137b58-3506-416b-a31d-e68962bce07b) authorize

together with this event:

OAuth token created for client ${oauthToken.clientName}(${oauthToken.clientId})
Event Id: urn:vcloud:audit:f06957fc-5c8d-4dc6-8747-af8d88181f86
Type: token
 
As a result, the VCD database filled up the disk and VCD stopped.
 
Are others having the same issue, or is this a known issue?
ccalvetbeta
Enthusiast

Hi,
I have a similar issue: vCloud Director reports "database full".

In my case, the "audit_trail" table in the "vcloud" database of the vCloud Director PostgreSQL instance is now 160 GB. The database size has been increased multiple times, but that can't be a long-term solution.
Stopping the vApps associated with the "beta" Tanzu Kubernetes cluster stops new log entries from being generated.

I also see many "Access token created ..." events, but I am not sure whether they are related to beta or legacy CSE clusters.
However, I think the events consuming the most space are those of type "definedEntity/modify ''beta006'' (9ebee87d-9d05-4c3f-b8e7-01ea477ac48c)" (beta006 is one of the clusters created with CSE beta), because their Details are very large. See attached file.


Solution attempted so far:
I have already reduced the activity log history to keep and show to 20 days under "Administration" > "General settings" > "Activity logs", but entries older than 20 days in audit_trail do not seem to be removed.
I assume a script is responsible for cleaning up old events; if so, and someone knows how to start it manually, please let me know.
I am not even sure that would help, because I assumed these settings apply to the "audit_event" table, and when I looked at that table it contained no rows.
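A quick way to check whether the 20-day retention is being applied to either table is to count rows and look at the oldest timestamps. This is only a sketch: it assumes the event_time column already used in the queries later in this thread, and it only counts audit_event because that table's columns aren't shown here.

```sql
-- Row count and time range in audit_trail
SELECT count(*)        AS total_rows,
       min(event_time) AS oldest,
       max(event_time) AS newest
FROM audit_trail;

-- audit_trail rows older than the 20-day activity log retention
SELECT count(*) AS older_than_20_days
FROM audit_trail
WHERE event_time < now() - interval '20 days';

-- Confirm whether audit_event is really empty
SELECT count(*) AS audit_event_rows FROM audit_event;
```

A non-zero older_than_20_days count would suggest the activity log setting is not purging audit_trail.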

Questions:
Is it expected that clusters created with CSE beta generate events, and therefore many rows in the "audit_trail" table?
(Note: it is possible that Cloud Director is or was configured with advanced settings I am not aware of, such as extra logging added during a previous support call.)
What is the best way to clean up the "audit_trail" table?
Are the "Activity logs" settings supposed to affect the "audit_trail" table? If yes, how can the cleanup script be started manually?
What would be the impact of deleting the oldest rows from "audit_trail" with SQL commands?
If it can be done without breaking Cloud Director, it would be an easy workaround.



agoel
VMware Employee

Thanks for bringing this up. We are looking into it and may come back with some questions. Could you share which VCD version you are using?

akrishnakuma
VMware Employee

Hi @ccalvetbeta @rickvanvliet,

Thanks for this report; we will fix the repeated logins as a top priority.

Based on our understanding, an audit trail entry for a login should be about 1 KB, so even this many logins should not use up the database to that extent (at one login per minute, that is roughly 1.4 MB per day, or about 0.5 GB per year). We therefore suspect that something else is driving the size up to the 160 GB mentioned.

Do you have a sense of which tables could be large in this database? What operations do you perform frequently, and at what scale do you run?
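One way to answer the table-size question is to query PostgreSQL's own statistics views. This sketch, assuming you are connected to the vcloud database, shows the total database size and the ten largest tables, counting indexes and TOAST storage (where large text payloads live).

```sql
-- Total size of the current database
SELECT pg_size_pretty(pg_database_size(current_database())) AS db_size;

-- Ten largest tables, including indexes and TOAST data
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```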

 

ccalvetbeta
Enthusiast

Hi @akrishnakuma @agoel 

As mentioned, I doubt the login events are the ones consuming the most space in my case.
Database: vcloud
Table: audit_trail
(I am wondering whether this audit_trail growth is expected or due to previous troubleshooting on this Cloud Director instance; I don't have full knowledge of its history.)

[Screenshot]


The odd thing is that when I look at the first rows, they are always the same: they are never purged, so the database keeps growing.

[Screenshot]

If I look at the latest rows (I have just built a new cluster), I see a lot of "modify" events, and I think they are the ones filling the database because their payload is large. (I can't display the payload with the query because it breaks all the formatting.)

[Screenshot]

Example of such an event:

[Screenshot]


(Details of a similar event could be seen attached to my previous post)
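To find the heaviest rows without printing the large payload, one option is to report each row's approximate stored size with pg_column_size instead of selecting the details column. A sketch that reuses only the column names from the queries in this thread:

```sql
-- Largest individual audit_trail rows by approximate stored size,
-- without returning the payload itself
SELECT id,
       event_type,
       event_time,
       pg_column_size(a.*) AS row_bytes
FROM audit_trail AS a
ORDER BY row_bytes DESC
LIMIT 10;
```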

My immediate concern is how to clean up this database. Could I just run a query to remove, for example, the first (oldest) 1000 rows?
Note: this Cloud Director database is no longer in a cluster.
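PostgreSQL's DELETE has no LIMIT clause, so removing the oldest N rows needs a subquery. A minimal sketch, assuming id uniquely identifies rows in audit_trail (as the SELECT queries in this thread suggest) and that a database backup has been taken first:

```sql
BEGIN;

-- Delete the 1000 oldest audit_trail rows; the subquery selects them by event_time
DELETE FROM audit_trail
WHERE id IN (
    SELECT id
    FROM audit_trail
    ORDER BY event_time
    LIMIT 1000
);

-- Review the reported row count before committing; run ROLLBACK instead to undo
COMMIT;
```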

 

ccalvetbeta
Enthusiast

Update: I just discovered with another SQL query that there are older logs in this table.

[Screenshot]

So just using "limit 5" does not display the first entries (without an ORDER BY, PostgreSQL returns rows in no guaranteed order).

By instead using
SELECT id, event_type, event_time, org_member_id, tenant_id FROM audit_trail ORDER BY event_time LIMIT 100;
I get the real first events, which already seem related to the beta, so the growth probably started when the beta and its clusters were first deployed.

[Screenshot]

 




aritrasen
VMware Employee

select event_type, count(*) as num from audit_trail group by event_type order by num desc

This should give you the aggregates without relying on limits, and the query should be fairly fast: I can run it in under 10 seconds on a 30 GB database.
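The same aggregation can be extended to show approximate storage per event type, which would help confirm whether the definedEntity "modify" events or the login/token events dominate. A sketch using pg_column_size on each row; the sizes are approximate, and it will be slower than the plain count because large payloads have to be read.

```sql
-- Row count and approximate stored size per event type
SELECT event_type,
       count(*) AS num,
       pg_size_pretty(sum(pg_column_size(a.*))) AS approx_size
FROM audit_trail AS a
GROUP BY event_type
ORDER BY sum(pg_column_size(a.*)) DESC;
```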

aritrasen
VMware Employee

KB to clean up the audit table:
https://kb.vmware.com/s/article/2106123
The KB is a little old; for PostgreSQL, the following should work:
DELETE from audit_trail WHERE event_time < '2022-09-01 06:00:00.000';
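A variant of the same DELETE, written relative to the current time so it matches the 20-day activity log retention mentioned in the thread, and wrapped in a transaction so the row count can be checked first. The 20-day interval is only an example value:

```sql
BEGIN;

-- How many rows would be removed (20 days is an example, not a VCD default)
SELECT count(*) FROM audit_trail
WHERE event_time < now() - interval '20 days';

DELETE FROM audit_trail
WHERE event_time < now() - interval '20 days';

-- Run ROLLBACK instead if the count looks wrong
COMMIT;
```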

ccalvetbeta
Enthusiast

Hi,
I managed to remove rows.
However, the table is still reported at the same size.
Could you please let me know what the next step should be?
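This is expected PostgreSQL behaviour: DELETE only marks rows as dead, so the table file keeps its size. A plain VACUUM makes that space reusable for new rows but usually does not shrink the file. VACUUM FULL rewrites the table and returns the space to the operating system, but it holds an exclusive lock on the table for the whole run and needs enough free disk for a new copy of the remaining rows, so it is probably safest to run it with the VCD cells stopped. A sketch:

```sql
-- Size before
SELECT pg_size_pretty(pg_total_relation_size('audit_trail')) AS size_before;

-- Rewrite the table to reclaim disk space. Cannot run inside a transaction
-- block, and holds an ACCESS EXCLUSIVE lock on audit_trail until it finishes.
VACUUM (FULL, VERBOSE, ANALYZE) audit_trail;

-- Size after
SELECT pg_size_pretty(pg_total_relation_size('audit_trail')) AS size_after;
```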

Is it normal that CSE beta uses audit_trail and not only audit_event?

ccalvetbeta
Enthusiast

Hi, any update on this topic?

Cevic1
Contributor

Hi, I have this problem too. How can I fix it? vCloud Director version 10.4.1, CSE version 4.0.1.

rickvanvliet
Enthusiast

I believe it's fixed in a newer version of CSE.
