Solved: Host dropping vmsyslog logs

casgrain2 · ‎09-20-2019

Hi,

I get frequent alarm emails from a host in my cluster that look like this:

Target: esx2014-m-2

Stateless event alarm

Alarm Definition:

([Event alarm expression: Host error] OR [Event alarm expression: Host warning])

Event details:

Issue detected on esx2014-m-2: vmsyslog logger 192.168.20.210:10514 lost 1671391 log messages

(2019-09-20T07:56:00.704Z cpu4:3545365)

And the /var/log/.vmsyslogd.err looks like this:

2019-09-20T13:23:31.763Z vmsyslog.loggers.file : ERROR ] Failed to spawn onrotate call

Traceback (most recent call last):

File "/build/mts/release/bora-13635690/bora/build/esx/release/vmvisor/sys-boot/lib64/python3.5/site-packages/vmsyslog/loggers/file.py", line 379, in writeLog

File "/build/mts/release/bora-13635690/bora/build/esx/release/vmvisor/sys-boot/lib64/python3.5/subprocess.py", line 676, in __init__

File "/build/mts/release/bora-13635690/bora/build/esx/release/vmvisor/sys-boot/lib64/python3.5/subprocess.py", line 1228, in _execute_child

OSError: [Errno 28] No space left on device

2019-09-20T13:38:26.408Z vmsyslog.msgQueue : ERROR ] 192.168.20.210:10514 - lost 1671391 log messages

The port isn't standard, but I have a custom firewall rule to allow outbound traffic on it. The same setup works without problems on my other hosts, and the SIEM does receive some logs from this host, so the loss seems intermittent.

Any idea how to resolve this? The host is running 6.5 U2.
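To see how often a host reports loss, the "lost N log messages" lines can be pulled out of a copy of .vmsyslogd.err. The following is only a minimal sketch: the regular expression is inferred from the single msgQueue line quoted above, and the function name `lost_messages` is hypothetical. Whether the reported count is cumulative or per-event is not stated in this thread, so the sketch just records each value as logged.

```python
import re
from collections import defaultdict

# Pattern inferred from the one line quoted in this thread, e.g.:
# 2019-09-20T13:38:26.408Z vmsyslog.msgQueue : ERROR ] 192.168.20.210:10514 - lost 1671391 log messages
LOST_RE = re.compile(
    r"^(?P<ts>\S+)\s+vmsyslog\.msgQueue\s*:\s*ERROR\s*\]\s*"
    r"(?P<target>\S+)\s+-\s+lost\s+(?P<count>\d+)\s+log messages"
)

def lost_messages(lines):
    """Return {remote_target: [(timestamp, lost_count), ...]} for matching lines."""
    found = defaultdict(list)
    for line in lines:
        match = LOST_RE.match(line.strip())
        if match:
            found[match.group("target")].append(
                (match.group("ts"), int(match.group("count")))
            )
    return dict(found)

# Sample input copied from the post; non-matching lines are ignored.
sample = [
    "2019-09-20T13:23:31.763Z vmsyslog.loggers.file : ERROR ] Failed to spawn onrotate call",
    "OSError: [Errno 28] No space left on device",
    "2019-09-20T13:38:26.408Z vmsyslog.msgQueue : ERROR ] 192.168.20.210:10514 - lost 1671391 log messages",
]
result = lost_messages(sample)
print(result)
```

Running the same function over the error files of several hosts gives a quick side-by-side view of which remote targets are affected.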

Vijay2027 · ‎11-02-2019

This issue is documented in the 6.5 U3 release notes, under the resolved issues section:

VMware ESXi 6.5 Update 3 Release Notes

Limitations in the network message buffer (msgBuffer) size might cause loss of log messages

The size of the network msgBuffer is 1 MB but the main msgBuffer supports up to 25 000 lines of any length, which is at least 3 MB. If the network is slow, the write thread to the network msgBuffer is faster than the reader thread. This leads to loss of log messages with an alert: lost XXXX log messages.

This issue is resolved in this release. The size of msgBuffer is increased to 3 MB.
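The behaviour described in the release note, a writer that fills a byte-limited buffer faster than the network reader drains it, can be illustrated with a small deterministic model. This is a sketch under stated assumptions and not how vmsyslogd is implemented: the function `simulate`, the 128-byte line size and the per-tick rates are all hypothetical. The line size is chosen because 25 000 lines in 3 MB works out to roughly 126 bytes per line.

```python
from collections import deque

def simulate(capacity_bytes, line_bytes, produced_per_tick, sent_per_tick, ticks):
    """Count lines dropped when a byte-limited buffer fills faster than it drains."""
    buffer = deque()
    used = 0
    dropped = 0
    for _ in range(ticks):
        # Writer side: enqueue new lines, dropping any that do not fit.
        for _ in range(produced_per_tick):
            if used + line_bytes > capacity_bytes:
                dropped += 1
            else:
                buffer.append(line_bytes)
                used += line_bytes
        # Reader side: a slow network sender drains only a few lines per tick.
        for _ in range(min(sent_per_tick, len(buffer))):
            used -= buffer.popleft()
    return dropped

MB = 1024 * 1024
# Hypothetical load: 5000 lines written per tick, 1000 lines sent per tick.
small = simulate(1 * MB, 128, produced_per_tick=5000, sent_per_tick=1000, ticks=10)
large = simulate(3 * MB, 128, produced_per_tick=5000, sent_per_tick=1000, ticks=10)
print("dropped with 1 MB buffer:", small)
print("dropped with 3 MB buffer:", large)
```

In this model the larger buffer absorbs a longer burst before anything is lost, but if the writer stays faster than the reader indefinitely, messages are eventually dropped with either size.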

Please mark my comment as the Correct Answer if this solution resolved your problem


NathanosBlightc · ‎09-20-2019

Hi

First of all, my friend, the log says there is no space left on the (storage) device that serves as the local repository for the log files:

OSError: [Errno 28] No space left on device

Can you check that, please? I think it is not related to network communication (port and firewall) but to the space allocated on the ESXi host's datastore.
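For reference, the number in the traceback maps to the standard ENOSPC error code. The short Python sketch below shows the mapping and how such an exception could be recognised in code; the helper `is_out_of_space` is hypothetical and not part of vmsyslog. Note that the code by itself does not say which device or filesystem ran out of space.

```python
import errno
import os

code = 28
print(errno.errorcode[code])  # symbolic name for error number 28
print(os.strerror(code))      # message text supplied by the C library

def is_out_of_space(exc):
    """True if the exception is an OSError carrying ENOSPC (errno 28)."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOSPC

# Build an exception like the one in the traceback and classify it.
example = OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
print(is_out_of_space(example))
```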

Please mark my comment as the Correct Answer if this solution resolved your problem

casgrain2 · ‎10-28-2019

Hi,

Most articles online do refer to a storage issue, but after checking, there is plenty of space on all locations:

[root@esx2014-m-2:~] df -h
Filesystem  Size    Used    Available  Use%  Mounted on
NFS         8.2T    4.6T    3.6T       56%   /vmfs/volumes/Installation (NAS-M)
NFS         8.2T    4.8T    3.4T       59%   /vmfs/volumes/Installation (NAS-01-M)
VMFS-5      550.8G  5.6G    545.2G     1%    /vmfs/volumes/esx2014-m-2-DS
VMFS-5      9.8T    6.7T    3.1T       68%   /vmfs/volumes/MTL-SAS-Lun-1
VMFS-5      10.9T   6.8T    4.1T       63%   /vmfs/volumes/MTL-SAS-Lun-0
VMFS-5      1.5T    760.1G  729.1G     51%   /vmfs/volumes/MTL-SSD-Lun-2
vfat        249.7M  170.0M  79.7M      68%   /vmfs/volumes/ee7b7e77-eeac5e12-b31d-50430caa0bb5
vfat        249.7M  170.0M  79.7M      68%   /vmfs/volumes/34dccf76-dc06e314-741f-c134f8876e3d
vfat        4.0G    42.5M   4.0G       1%    /vmfs/volumes/5d2a3c76-3d416958-0871-a0369f49bf74
vfat        285.8M  172.9M  112.9M     60%   /vmfs/volumes/53f43c33-8b79ad7c-9b93-ecf4bbc0c76c

Unless “device” refers to something else I’m clueless as to why I get this error.

Cheers,
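A check like the one above can be automated when several hosts need comparing. The sketch below parses `df -h` text in the column layout shown in this post and lists volumes above a usage threshold. The function name `parse_df` and the 90% threshold are assumptions for illustration; the sample rows are copied from the output above.

```python
def parse_df(text):
    """Parse `df -h` output into dicts; the mount path may contain spaces."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("Filesystem"):
            continue  # skip blank lines and the header row
        # Split into at most six fields so "Installation (NAS-M)" stays intact.
        fs, size, used, avail, pct, mount = line.split(None, 5)
        rows.append({
            "fs": fs,
            "size": size,
            "used": used,
            "avail": avail,
            "use_pct": int(pct.rstrip("%")),
            "mount": mount,
        })
    return rows

sample = """\
Filesystem Size Used Available Use% Mounted on
NFS 8.2T 4.6T 3.6T 56% /vmfs/volumes/Installation (NAS-M)
VMFS-5 550.8G 5.6G 545.2G 1% /vmfs/volumes/esx2014-m-2-DS
vfat 249.7M 170.0M 79.7M 68% /vmfs/volumes/ee7b7e77-eeac5e12-b31d-50430caa0bb5
"""

rows = parse_df(sample)
THRESHOLD = 90  # hypothetical cut-off for "nearly full"
nearly_full = [row["mount"] for row in rows if row["use_pct"] >= THRESHOLD]
print(nearly_full)
```

With the figures posted here no volume crosses the threshold, which matches the observation that none of the listed volumes is close to full.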

Vijay2027 · ‎10-28-2019

There is a known issue with limitations in the network message buffer size that might cause loss of log messages.

A fix is available in 6.5 U3.

Can you attach the vmsyslogd.err log file?

casgrain2 · ‎10-29-2019

Hi Vijay2027,

Odd that only 1 of my 3 hosts, identical in version and hardware, is affected, no?

Anyhow, here is the log file in question.

I'm running a custom ESX ISO and the manufacturer released 6.5 U3 recently, so I should be able to upgrade at some point if that's the actual solution.

Cheers,

Vijay2027 · ‎10-29-2019

You are hitting the same issue as I mentioned in my previous post.

2019-10-29T09:47:46.016Z vmsyslog.main : CRITICAL] Dropping messages due to log stress (qsize = 25000)

casgrain2 · ‎10-29-2019

Fair enough, but why would only one host out of 3 be affected?

If this were a defect in the release I would expect all hosts to be affected, no? Is there any setting I could tweak?

Do you have documentation on this issue? I can't seem to find anything documented online.

EDIT: it seems the other hosts' .vmsyslogd.err files contain that error message as well, but mostly they contain these instead of the Python error:

vmsyslog.main : ERROR ] reloading (65990)

Vijay2027 · ‎10-29-2019

I am not sure if there is any documentation available on this issue.

Open an SR with GSS, citing the log line (2019-10-29T09:47:46.016Z vmsyslog.main : CRITICAL] Dropping messages due to log stress (qsize = 25000)), to have the fix validated.

And I am not aware of any workaround.

casgrain2 · ‎10-29-2019

I have an SR open for this, referencing this post. We'll see what they say.

You seem to reply faster than support, so thanks for that, mate.

Vijay2027 · ‎11-01-2019

Did you get confirmation from GSS, or is there a workaround available?

casgrain2 · ‎11-01-2019

So far they have confirmed this should be fixed in U3; that is all I got out of them.

