We have several ESXi 5.0 hosts that are unexpectedly rebooting (PSOD), and we would like to use Log Insight to quickly gauge how widespread this is across the fleet and how frequently it occurs.
Unfortunately we have not been able to generate a successful query via Interactive Analysis to return any hits.
Here is a sample of actual events from the vobd.log on a given host:
2013-07-01T14:29:21.790Z: [UserLevelCorrelator] 96423176us: [esx.audit.host.boot] Host has booted.
2013-07-01T14:29:22.469Z: [UserLevelCorrelator] 97102185us: [vob.user.host.coredump] An unread host kernel core dump has been found.
I have tried setting exclusions specifically for the host name in question, free-text searching for 'core dump', etc., all to no avail.
User ignorance?
It sounds like you are doing it correctly. Have you tried searching for either 'esx.audit.host.boot' or 'Host has booted'? Does either return results? What do you have the time range set to? Perhaps expand the time range? Are you sure the hosts in question are logging to Log Insight? ESXi 5.0 has a known issue: versions prior to 5.0 Update 2 on UDP, and all versions on TCP, can stop remote logging (i.e. logging to Log Insight) if a remote syslog destination becomes unavailable. When this occurs you need to reload the syslog process on the ESXi hosts (the configure-esxi script on the Log Insight virtual appliance can do this for you through use of the -r flag).
I do not have an environment that has the core dump message, but I do have one where the 'Host has booted' message appears, and searching for either 'esx.audit.host.boot' or 'Host has booted' does return results there. I hope this helps!
Thank you for the excellent suggestions.
I did confirm that my time range was not a limiting factor (searches were for 'All Time') and we had previously cycled the syslog process locally via 'esxcli system syslog reload'.
I tried your search phrase suggestions and the latter returned hits. Unfortunately, they were for three different hosts, not the one we know for certain was PSOD-ing.
From a timing perspective, we added Log Insight as a syslog target after the PSOD occurred, so perhaps this is the limiting factor. Host name queries confirm that the host is sending some of its logs, but it seems that not all of them were sent.
Should we expect to be able to drop this particular vobd.log onto any ESX host that is configured with Log Insight and have that log become searchable? (Is there any value in sending you our specific log to test this scenario?)
If you configured the ESXi host(s) in question to send remote syslog to Log Insight after the PSOD occurred, then this would be expected behavior. Configuring any device to send remote syslog applies only to events that occur after the configuration has been put in place. The next time the event occurs you should be able to search for it in Log Insight (you could even create a dashboard or configure an alert beforehand). Log messages would only be dropped in the case of network connectivity issues or hitting some maximum on the remote syslog target. In terms of ESXi, the only thing to keep in mind is the remote syslog bug described earlier in this thread. I hope this helps!
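For events that happened before remote syslog was configured, one option is to scan copies of vobd.log directly rather than through Log Insight. The following is only a rough sketch: it assumes every line follows the same layout as the two sample lines quoted in the question, and the names `VOB_LINE`, `find_events`, and `count_events` are made up for illustration. Note also that an "unread host kernel core dump" event is a hint that a PSOD happened earlier, not proof of when it happened.

```python
import re
import sys
from collections import Counter

# Assumed layout, based only on the two sample lines in this thread:
# <timestamp>: [<component>] <uptime>us: [<event id>] <message>
VOB_LINE = re.compile(
    r"^(?P<timestamp>\S+?): \[(?P<component>[^\]]+)\] (?P<uptime_us>\d+)us: "
    r"\[(?P<event_id>[^\]]+)\] (?P<message>.*)$"
)

# Event IDs copied from the sample lines; extend as needed.
EVENTS_OF_INTEREST = {"esx.audit.host.boot", "vob.user.host.coredump"}

SAMPLE_LINES = [
    "2013-07-01T14:29:21.790Z: [UserLevelCorrelator] 96423176us: "
    "[esx.audit.host.boot] Host has booted.",
    "2013-07-01T14:29:22.469Z: [UserLevelCorrelator] 97102185us: "
    "[vob.user.host.coredump] An unread host kernel core dump has been found.",
]


def find_events(lines):
    """Yield (timestamp, event_id, message) for each line of interest."""
    for line in lines:
        match = VOB_LINE.match(line.strip())
        if match and match.group("event_id") in EVENTS_OF_INTEREST:
            yield (
                match.group("timestamp"),
                match.group("event_id"),
                match.group("message"),
            )


def count_events(lines):
    """Count how often each event ID of interest appears."""
    return Counter(event_id for _, event_id, _ in find_events(lines))


if __name__ == "__main__":
    # Usage: python scan_vobd.py host1-vobd.log host2-vobd.log ...
    # (file names are placeholders for log copies pulled from each host)
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            counts = count_events(handle)
        print(path, dict(counts))
```

Running it against one log copy per host gives a quick per-host count of boot and core dump events, which is roughly the fleet-wide view the original question was after for the period before Log Insight was receiving logs.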
That explains it - thank you for the clarification!
Great! Can you please mark this question as answered?
