VMware Cloud Community
daphnissov
Immortal

Build query/alert help: No logs received from given hosts/agents

I'm struggling to figure out how to build this as a query and then as an alert. My previous post is related to this issue. The problem I'm trying to address is a host (or several hosts) that stops sending logs to Log Insight for one reason or another. The reasons vary: the agent or syslog daemon fails, a firewall rule change blocks communication, and so on. For certain critical infrastructure pieces where log data is the only stateful data that needs to be preserved, having logs is immensely important. For other systems, having logs is similarly important for troubleshooting and postmortem analysis.

The goal is to build a query/alert that detects when one or more hosts stop sending logs to vRLI, and then to pass that alert to vROps, associating it with the object that has stopped logging. This could be an ESXi host, vCenter, switch, or VM with the vRLI agent installed. Creating a 1:1 mapping between host and alert is simple, but it does not scale well and is a maintenance nightmare.

Logically, I'd like to create a user-defined tag with a certain value, apply that key-value pair to an agent definition, and build a query/alert that fires for any system carrying that tag when no logs are seen from it for a given time period. So far, I'm not finding a way to make this happen other than creating an alert for each and every system that should be "watched". I welcome any thoughts or ideas on how to accomplish this goal.

4 Replies
admin
Immortal

There are two ways to do this:

1. Use a special tag, as you requested. Create a new VIP (Admin \ Cluster page) and put a tag on the VIP, e.g. product=sensitive. Then configure all the sensitive sources (i.e. the ones you want to be notified about if they stop sending logs) to send logs to this VIP. All logs from these sources will then get the tag product=sensitive.

Then, in Interactive Analytics, create a query for the count of events with a filter on the product field, using the 'contains' operator and the value 'sensitive'. Run the query for the last few minutes. Now select the red bell icon on the right and choose the option to create an alert from the query. In the Threshold section, select 'less than' with a count of 1, so the alert fires when zero matches are found in the last '15 mins' (you can choose a different time window as desired, and the count can be 10, 1000, or another number). Finally, select the notification option for when the alert is triggered, e.g. email, webhook, or vROps, and save your alert.

You will now be notified if no events arrive from your sources in the last 15 minutes, or whatever time you configured. (Note that this alert fires only if there are no events from any of the sources, so if one source stops sending while others are still sending, it will not fire.) If you want to watch a particular source, you can create a similar alert for that one source.
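To make that limitation concrete, here is a minimal Python sketch of what an aggregate "count of events is below a threshold" check does. It is not Log Insight code: the function name `aggregate_alert_fires`, the event tuples, and the extra IP addresses are assumptions made up for illustration.

```python
from datetime import datetime, timedelta

def aggregate_alert_fires(events, now, window, threshold=1):
    """Return True when fewer than `threshold` events fall inside the window.

    `events` is a list of (source, timestamp) tuples, a hypothetical stand-in
    for all events carrying the shared tag.
    """
    cutoff = now - window
    recent = [e for e in events if cutoff <= e[1] <= now]
    return len(recent) < threshold

now = datetime(2024, 1, 1, 12, 0)
window = timedelta(minutes=15)

# Three tagged sources; the third has been silent for 40 minutes.
events = [
    ("10.11.12.13", now - timedelta(minutes=2)),
    ("10.11.12.14", now - timedelta(minutes=5)),
    ("10.11.12.15", now - timedelta(minutes=40)),
]

one_silent = aggregate_alert_fires(events, now, window)  # other sources still log
all_silent = aggregate_alert_fires([], now, window)      # nothing arrives at all
print(one_silent, all_silent)
```

With one silent source the total count is still above zero, so the check stays quiet; it only trips when every tagged source goes silent.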

2. The second option is to create one alert per log source. Creating the alert is the same as described above, but you do not need the VIP and tag portion of the steps. Your query will simply look like this:

Count of events over time, filtered on source (you can use hostname if desired, but I recommend source) with the 'contains' operator and the value '10.11.12.13'. Run the query for the last 5 minutes, then follow the same steps as above, selecting the red bell icon to create the alert.

You should do this separately for each sensitive log source you want to be notified about, so that you do not miss any. The alert text will then show the exact source that stopped sending logs, so you know where to start troubleshooting.

This option does create the overhead of maintaining multiple alerts, but it will NOT create email spam if you select your threshold correctly. Choose a window long enough that you are not notified about missing logs before LI has had a chance to ingest them. For example, if you have 10 vCenters integrated with Log Insight, I would set the threshold to 30 minutes.

Hope this helps.

daphnissov
Immortal

Thanks for your reply, yogitap. I came up with essentially the same options, although instead of creating a VIP and applying a tag there, I just added a tag to an agent group in the "Common" field. The two approaches do essentially the same thing, but the problem that renders either one useless is that the alert fires only when *all* messages are dropped; it cannot detect that one out of, say, thirty sources has stopped logging. I'll file a feature request over on the Log Insight page because I believe this is important to have in the product.
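For reference, the behavior I'm after boils down to a set difference between the systems that should be logging and the systems actually seen in the window. A rough Python sketch of that logic, assuming the events could be exported as (source, timestamp) pairs; the function name `silent_sources`, the watched list, and the extra addresses are hypothetical, and nothing here is an existing Log Insight feature:

```python
from datetime import datetime, timedelta

def silent_sources(watched, events, now, window):
    """Return the watched sources with no events inside the window."""
    cutoff = now - window
    seen = {src for src, ts in events if cutoff <= ts <= now}
    return set(watched) - seen

now = datetime(2024, 1, 1, 12, 0)
window = timedelta(minutes=15)

# Hypothetical list of systems carrying the "watched" tag.
watched = ["10.11.12.13", "10.11.12.14", "10.11.12.15"]

# Hypothetical exported events; the third source last logged 40 minutes ago.
events = [
    ("10.11.12.13", now - timedelta(minutes=2)),
    ("10.11.12.14", now - timedelta(minutes=5)),
    ("10.11.12.15", now - timedelta(minutes=40)),
]

missing = sorted(silent_sources(watched, events, now, window))
print(missing)
```

The key point is that the list of expected sources has to come from somewhere other than the events themselves, since a silent source produces nothing to group on.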

admin
Immortal

I believe there is a feature request for this already. Please search for it and upvote it if possible.

daphnissov
Immortal

Thanks, I found it out there.