vmsyslog filling up vCenter database


    In our datacenter, a few customers' vCenter databases were filling up at an alarming rate. After checking in MSSQL which table was the culprit, we found that the VPX_EVENT and VPX_EVENT_ARG tables kept growing. Sure enough, in vCenter we saw a constant flow of these alerts:

     

    [vmsyslog1.jpg — screenshot of the repeated alerts in vCenter]

    Also in the vmkernel.log file we saw similar messages:

     

    2014-11-18T22:52:03.354Z cpu40:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 1 log messages

    2014-11-18T22:52:03.355Z cpu40:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 1 log messages

    2014-11-18T22:55:03.383Z cpu42:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 756 log messages

    2014-11-18T22:55:03.384Z cpu42:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 1 log messages

    2014-11-18T22:55:03.386Z cpu42:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 1 log messages

    2014-11-18T22:55:03.395Z cpu37:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 852 log messages

    2014-11-18T22:58:03.419Z cpu44:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 991 log messages

    2014-11-18T22:58:03.422Z cpu37:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 992 log messages

    2014-11-18T22:58:03.424Z cpu37:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 2 log messages

    2014-11-18T23:01:03.454Z cpu44:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 1090 log messages

    2014-11-18T23:01:03.467Z cpu36:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 1102 log messages

    2014-11-18T23:01:03.468Z cpu36:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 3 log messages

    2014-11-18T23:04:03.482Z cpu44:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 1073 log messages

    2014-11-18T23:04:03.494Z cpu38:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 1153 log messages

    2014-11-18T23:07:03.521Z cpu38:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 525 log messages

    2014-11-18T23:07:03.532Z cpu38:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 893 log messages

    2014-11-18T23:10:03.554Z cpu37:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 718 log messages

    2014-11-18T23:10:03.566Z cpu37:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 796 log messages

    2014-11-18T23:10:03.568Z cpu37:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 1 log messages

    2014-11-18T23:10:03.568Z cpu37:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 1 log messages

    2014-11-18T23:13:03.589Z cpu44:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 939 log messages

    2014-11-18T23:13:03.595Z cpu40:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 987 log messages

    2014-11-18T23:16:03.628Z cpu46:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 811 log messages

    2014-11-18T23:16:03.629Z cpu44:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 926 log messages

    2014-11-18T23:19:03.659Z cpu38:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 844 log messages

    2014-11-18T23:19:03.670Z cpu46:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 846 log messages

    2014-11-18T23:19:03.671Z cpu46:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 1 log messages

    2014-11-18T23:22:03.690Z cpu42:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 868 log messages

    2014-11-18T23:22:03.691Z cpu42:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 1 log messages

    2014-11-18T23:22:03.704Z cpu36:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 948 log messages

    2014-11-18T23:25:03.723Z cpu42:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 609 log messages

    2014-11-18T23:25:03.741Z cpu46:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 585 log messages

    2014-11-18T23:28:03.760Z cpu42:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 756 log messages

    2014-11-18T23:28:03.774Z cpu40:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 790 log messages

    2014-11-18T23:31:03.794Z cpu42:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 882 log messages

    2014-11-18T23:31:03.806Z cpu40:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 963 log messages

    2014-11-18T23:31:03.808Z cpu40:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 1 log messages

    2014-11-18T23:34:03.828Z cpu42:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 719 log messages

    2014-11-18T23:37:03.851Z cpu39:3167699)ALERT: vmsyslog logger 10.122.126.71:514 lost 1616 log messages

    2014-11-18T23:37:03.862Z cpu36:3167700)ALERT: vmsyslog logger 10.122.126.5:514 lost 761 log messages

    ...

     

    An issue with vmsyslog, it seems. Our syslog configuration, however, looked perfectly fine:

     

    (/etc/vmsyslog.conf)

    [DEFAULT]

    loghost = <none>

    default_timeout = 180

    logdir_unique = false

    rotate = 8

    logdir = <none>

    size = 1024

     

     

    [vmsyslog]

    loghost = tcp://10.122.126.71:514, tcp://10.122.126.5:514

    rotate = 8

    size = 1024
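
The same loghost settings can also be inspected and applied from the ESXi shell instead of editing the file directly (a sketch; the IPs are the collector addresses from our environment):

```shell
# Show the current syslog configuration (loghost, rotate, size, etc.)
esxcli system syslog config get

# Point vmsyslogd at the two collectors over TCP
esxcli system syslog config set --loghost='tcp://10.122.126.71:514,tcp://10.122.126.5:514'

# Apply the new configuration
esxcli system syslog reload
```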

     

    Digging a bit deeper into the syslog error log revealed this:

     

    (/var/log/.vmsyslogd.err)

    2014-11-18T23:37:03.861Z vmsyslog.msgQueue        : ERROR ] 10.122.126.5:514 - lost 761 log messages

    2014-11-18T23:37:03.866Z vmsyslog.loggers.network : ERROR   ] 10.122.126.5:514 - socket error : [Errno 32] Broken pipe

    2014-11-18T23:40:03.895Z vmsyslog.msgQueue        : ERROR ] 10.122.126.5:514 - lost 1045 log messages

    2014-11-18T23:40:03.899Z vmsyslog.loggers.network : ERROR   ] 10.122.126.5:514 - socket error : [Errno 32] Broken pipe

    2014-11-18T23:43:03.933Z vmsyslog.msgQueue        : ERROR ] 10.122.126.5:514 - lost 661 log messages

    2014-11-18T23:43:03.937Z vmsyslog.loggers.network : ERROR   ] 10.122.126.5:514 - socket error : [Errno 32] Broken pipe

    2014-11-18T23:46:03.969Z vmsyslog.msgQueue        : ERROR ] 10.122.126.5:514 - lost 761 log messages

    2014-11-18T23:46:03.973Z vmsyslog.loggers.network : ERROR   ] 10.122.126.5:514 - socket error : [Errno 32] Broken pipe

    2014-11-18T23:49:04.002Z vmsyslog.msgQueue        : ERROR ] 10.122.126.5:514 - lost 573 log messages

    2014-11-18T23:49:04.006Z vmsyslog.loggers.network : ERROR   ] 10.122.126.5:514 - socket error : [Errno 32] Broken pipe

    2014-11-18T23:52:04.036Z vmsyslog.msgQueue        : ERROR ] 10.122.126.5:514 - lost 665 log messages

    2014-11-18T23:52:04.040Z vmsyslog.loggers.network : ERROR   ] 10.122.126.5:514 - socket error : [Errno 32] Broken pipe
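
A "Broken pipe" (errno 32) means the remote side closed the TCP connection while vmsyslogd was still writing to it. You can reproduce the write outside of vmsyslogd with a minimal TCP sender, to rule out the ESXi side (a sketch; host and port taken from the vmsyslog.conf above):

```python
import socket

def send_test_syslog(host, port, timeout=5.0):
    # One RFC 3164-style line; syslog over TCP is newline-framed.
    msg = b"<14>Nov 19 00:00:00 esxhost vmsyslog-test: connectivity check\n"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(msg)

# Example, against the collectors from our config:
# send_test_syslog("10.122.126.5", 514)
# send_test_syslog("10.122.126.71", 514)
```

If this raises `BrokenPipeError` or `ConnectionResetError`, the collector is accepting connections but dropping them, which points the investigation at the receiving side.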


    At this stage, I started to wonder whether the receiving side was actually healthy. We use VMware Syslog Collector to capture these logs, and although the service appeared to be running, we noticed this in its debug log:


    2014-11-19 06:26:27,345 - sc.twisted - Log observer <bound method DefaultObserver._emit of <twisted.python.log.DefaultObserver instance at 0x01095F80>> failed.

    Traceback (most recent call last):
      File "twisted\python\context.pyo", line 59, in callWithContext
      File "twisted\python\context.pyo", line 37, in callWithContext
      File "twisted\internet\selectreactor.pyo", line 154, in _doReadOrWrite
      File "twisted\python\log.pyo", line 221, in err
    --- <exception caught here> ---
      File "twisted\python\log.pyo", line 292, in msg
      File "twisted\python\log.pyo", line 623, in _emit
    exceptions.IOError: [Errno 9] Bad file descriptor

    2014-11-19 06:26:27,348 - sc.core - TCP connection lost from 192.168.243.200:49420
    2014-11-19 06:26:27,348 - sc.log - Error cleaning up state: '192.168.243.200'
    2014-11-19 06:26:47,546 - sc.core - New TCP connection from IPv4Address(TCP, '192.168.243.1', 58438)
    2014-11-19 06:26:47,549 - sc.twisted - Unhandled Error

    Traceback (most recent call last):
      File "twisted\python\log.pyo", line 84, in callWithLogger
      File "twisted\python\log.pyo", line 69, in callWithContext
      File "twisted\python\context.pyo", line 59, in callWithContext
      File "twisted\python\context.pyo", line 37, in callWithContext
    --- <exception caught here> ---
      File "twisted\internet\selectreactor.pyo", line 146, in _doReadOrWrite
      File "twisted\internet\tcp.pyo", line 455, in doRead


    After some digging around to see how this service actually works (it is basically a Python script), I noticed that the bundled Twisted framework was quite old (v8.2). I replaced SyslogCollectorLibrary.zip (the whole Python 2.6 environment, compressed) with a version in which I included the latest Twisted release for Python 2.6 (v12.0). I stopped the service, replaced the zip file, and restarted the service, and almost immediately the flow of events stopped.
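
To sanity-check which files you are actually swapping before and after replacing the archive, you can list the twisted/ members bundled inside the zip. A minimal sketch; the zip filename comes from this installation, and the internal layout (a top-level twisted/ package of compiled .pyo files) is an assumption about how the library is packaged:

```python
import zipfile

def bundled_twisted_members(zip_path):
    """List the twisted/* files bundled in the collector's library zip."""
    # zip_path may be a filename or any file-like object.
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(n for n in zf.namelist() if n.startswith("twisted/"))

# Example (path is environment-specific):
# members = bundled_twisted_members("SyslogCollectorLibrary.zip")
# print(len(members), members[:5])
```

Comparing the member lists of the original and the rebuilt zip is a quick way to confirm that only the Twisted package changed and the rest of the Python 2.6 environment is intact.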

     

    I've attached the modified zip file so that others who encounter this issue can check whether it helps.