I recently discovered that in VMware ESXi 7.0, event logging was changed to include SSH login and logout events. These events are captured and stored in the /storage/seat partition of the vCenter appliance. Our environment is busy enough that SSH connections to the ESXi hosts generate a very large number of events, which are filling up the vCenter database (see attached image). I can see a huge number of esx.audit.ssh.session.opened and esx.audit.ssh.session.closed events in the vCenter database.
Of course this brought our vCenter appliance down and I followed the instructions from https://kb.vmware.com/s/article/2119809 to reduce the disk space usage of the /storage/seat partition. I also increased the disk space for the vCenter appliance /storage/seat partition per https://kb.vmware.com/s/article/2145603 .
Will there be an option to filter out specific events such as SSH events in future releases of ESXi? In the meantime, can I create some type of a cron job to regularly purge these specific types of events from the database?
Did you find a solution for your problem? I have the same issue...
Unfortunately not - I continue to periodically monitor the partition sizes on each of my VCSA appliances and follow KB 2119809 to reduce the space usage of the /storage/seat partition when needed. I'm concerned that this issue is low on the priority list and may even have been engineered on purpose since VMware competes with Nutanix.
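The periodic partition check described above could be scripted roughly as follows. This is only a minimal sketch: the function names fs_usage_pct and check_fs_usage and the default threshold of 80% are made up for illustration, and it only reports usage; it does not purge anything.

```shell
#!/bin/sh
# Print the used-space percentage (integer, no % sign) of the filesystem holding $1.
fs_usage_pct() {
    # -P gives POSIX one-line-per-filesystem output; field 5 is "Capacity", e.g. "83%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: returns 0 if usage is below the threshold, 1 if it is at or
# above it, and 2 if usage could not be determined.
check_fs_usage() {
    path="$1"
    threshold="${2:-80}"   # assumed default, pick whatever suits your environment
    pct=$(fs_usage_pct "$path")
    if [ -z "$pct" ]; then
        echo "ERROR: could not read usage for $path" >&2
        return 2
    fi
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARNING: $path is ${pct}% full (threshold ${threshold}%)"
        return 1
    fi
    echo "OK: $path is ${pct}% full"
    return 0
}
```

On the appliance this would be called as check_fs_usage /storage/seat 80, for example from a cron job, with the non-zero exit status used to trigger a mail or alert.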
Not really sure what that would have to do with Nutanix? Sounds like this would affect any vCenter depending on the number of SSH sessions. It's more likely they don't expect a large number of SSH sessions, since most things can be done with other tools.
Yes, I have the same problem. Nutanix hosts generate these SSH connections, but I haven't yet found a way to disable auditing for SSH connections...
This issue was reported a few months back by a customer whose Nutanix Controller VMs kept logging in to the hosts and rapidly creating the sessions below.
esx.audit.ssh.session.closed and esx.audit.ssh.session.opened.
SSH to VCSA: cd to /storage/seat/vpostgres and run du -shc * and share output.
Can you connect to VCDB (/opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres) and run the query below:
SELECT COUNT(EVENT_ID) AS NUMEVENTS, EVENT_TYPE, USERNAME FROM VPXV_EVENT_ALL GROUP BY EVENT_TYPE, USERNAME ORDER BY NUMEVENTS DESC LIMIT 5;
Note: this query can take some time.
I am quite certain Nutanix is making these connections and filling up VCDB faster. Last I remember, VMware Engineering was asking for Nutanix's involvement to explain why so many connections are made.
I truncated the vpx_event* tables this morning, and this is the query result for now:
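If you would rather run the query non-interactively and save the output to share, something like the following might help. It is a sketch only: top_events_sql is a hypothetical helper that just prints the query above with a configurable LIMIT.

```shell
#!/bin/sh
# Hypothetical helper: print the event-count query from this thread with a
# configurable LIMIT (defaults to 5, as in the original query).
top_events_sql() {
    limit="${1:-5}"
    # Accept only digits so nothing unexpected ends up in the SQL text.
    case "$limit" in
        ''|*[!0-9]*) echo "limit must be a non-negative integer" >&2; return 1 ;;
    esac
    printf '%s\n' "SELECT COUNT(EVENT_ID) AS NUMEVENTS, EVENT_TYPE, USERNAME FROM VPXV_EVENT_ALL GROUP BY EVENT_TYPE, USERNAME ORDER BY NUMEVENTS DESC LIMIT ${limit};"
}
```

On the VCSA, using the psql path given above, it could be run as /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -c "$(top_events_sql 10)" with the output redirected to a file of your choice.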
13259 vim.event.UserLogoutSessionEvent root
13259 vim.event.UserLoginSessionEvent root
Almost 1 lakh (100,000) in a day is too high. I suppose Nutanix needs to explain why they are doing so many logins and logouts.
The reason why this is happening is because VMware changed their logging behaviour in vSphere 7.
This could affect any other platform in theory but I guess less likely. Nutanix uses SSH excessively for communication between the CVM and the hypervisor. I am not sure why VMware decided to start logging this and I do not know whether you can disable these events from logging.
The workaround is to increase the SEAT partition and/or reduce retention.
Also, set up vCenter alerts to monitor health.
I have attached a Nutanix KB that explains it in more detail.
The recommendation is always to keep the SSH service down on the ESXi hosts and only bring it up for ad-hoc tasks.
Good that VMware started logging this information. This was long pending.
I would say reduce retention. Increasing the SEAT partition would increase the storage requirement for a future upgrade, but you can still do it if space is not a concern.
You can't disable the SSH service on a Nutanix-backed ESXi cluster. SSH is required for the CVM to communicate with ESXi. I am not saying it is a good or bad decision, it is just a fact.
Increasing any of the drives of the VCSA forces a move to a larger deployment model in a future upgrade, and if you enlarge them enough it goes to the maximum size. So it is more than just space; it is CPU and memory too.
I don't recommend enlarging the drives unless you have no other choice; we have seen negative consequences of doing so.
Nutanix Consistency Checker (NCC) runs its health check about every minute or so and uses SSH to run its host health collection routines. The more Nutanix hosts you have managed by VC, the more login/logout events it will capture. IMO we should request a feature to add SEAT granularity by event type to VC so we can capture just what we need.
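To get a feel for the numbers, here is a rough back-of-the-envelope estimate. It assumes exactly one SSH session per host per minute (the "about every minute or so" above) and one opened plus one closed event per session; the function name and the 35-host figure are made up for illustration.

```shell
#!/bin/sh
# Estimate SSH session events per day.
# $1 = number of hosts
# $2 = SSH sessions per host per minute (assumed to be 1 if omitted)
ssh_events_per_day() {
    hosts="$1"
    sessions_per_min="${2:-1}"
    # Each session produces one "opened" and one "closed" event; 1440 minutes per day.
    echo $(( hosts * sessions_per_min * 2 * 1440 ))
}

ssh_events_per_day 1     # single host: prints 2880
ssh_events_per_day 35    # hypothetical 35-host environment: prints 100800
```

Under those assumptions, a few dozen hosts are already enough to reach the order of 1 lakh events a day mentioned earlier in the thread.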