VMware Cloud Community
casgrain2
Contributor

Host dropping vmsyslog logs

Hi,

I get frequent alarm emails about a host in my cluster that look like this:

Target: esx2014-m-2

Stateless event alarm

Alarm Definition:

([Event alarm expression: Host error] OR [Event alarm expression: Host warning])

Event details:

Issue detected on esx2014-m-2: vmsyslog logger 192.168.20.210:10514 lost 1671391 log messages

(2019-09-20T07:56:00.704Z cpu4:3545365)

And the /var/log/.vmsyslogd.err looks like this:

2019-09-20T13:23:31.763Z vmsyslog.loggers.file    : ERROR   ] Failed to spawn onrotate call

Traceback (most recent call last):

  File "/build/mts/release/bora-13635690/bora/build/esx/release/vmvisor/sys-boot/lib64/python3.5/site-packages/vmsyslog/loggers/file.py", line 379, in writeLog

  File "/build/mts/release/bora-13635690/bora/build/esx/release/vmvisor/sys-boot/lib64/python3.5/subprocess.py", line 676, in __init__

  File "/build/mts/release/bora-13635690/bora/build/esx/release/vmvisor/sys-boot/lib64/python3.5/subprocess.py", line 1228, in _execute_child

OSError: [Errno 28] No space left on device

2019-09-20T13:38:26.408Z vmsyslog.msgQueue        : ERROR   ] 192.168.20.210:10514 - lost 1671391  log messages

The port isn't standard, but I have a custom firewall rule to allow outbound traffic, and it works without problems for the other hosts. The SIEM does receive some logs from this host, so the loss seems intermittent.

Any idea how to resolve this? The host is running 6.5 U2.


Accepted Solutions
Vijay2027
Expert

This issue has been documented in 6.5U3 release notes under resolved issues section:

VMware ESXi 6.5 Update 3 Release Notes

Limitations in the network message buffer (msgBuffer) size might cause loss of log messages

The size of the network msgBuffer is 1 MB but the main msgBuffer supports up to 25 000 lines of any length, which is at least 3 MB. If the network is slow, the write thread to the network msgBuffer is faster than the reader thread. This leads to loss of log messages with an alert: lost XXXX log messages.

This issue is resolved in this release. The size of msgBuffer is increased to 3 MB.
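
The release note above describes a bounded buffer that fills faster than it drains. A toy model of that effect is sketched below. It is not vmsyslogd's actual implementation: the function `simulate`, the 120-byte line length (chosen so that 25,000 lines add up to 3 MB), and the write and read rates are all assumptions made for illustration.

```python
from collections import deque


def simulate(capacity_bytes, messages, writes_per_tick, reads_per_tick):
    """Feed messages into a bounded buffer faster than it is drained.

    Returns (delivered, lost). A message that does not fit is dropped.
    """
    pending = deque(messages)   # messages waiting to be written
    buffer = deque()            # hypothetical network buffer
    used = 0                    # bytes currently held in the buffer
    delivered = lost = 0
    while pending or buffer:
        # Writer side: enqueue a burst, dropping whatever does not fit.
        for _ in range(min(writes_per_tick, len(pending))):
            msg = pending.popleft()
            if used + len(msg) > capacity_bytes:
                lost += 1
            else:
                buffer.append(msg)
                used += len(msg)
        # Reader side: a slow network drains fewer messages per tick.
        for _ in range(min(reads_per_tick, len(buffer))):
            used -= len(buffer.popleft())
            delivered += 1
    return delivered, lost


lines = [b"x" * 120] * 25_000            # assumed: 25,000 lines, 3 MB total
small = simulate(1_000_000, lines, 500, 100)   # 1 MB buffer
large = simulate(3_000_000, lines, 500, 100)   # 3 MB buffer
print("1 MB buffer (delivered, lost):", small)
print("3 MB buffer (delivered, lost):", large)
```

In this model the larger buffer can hold the whole backlog, so nothing is dropped, while the smaller one overflows whenever the writer outpaces the reader for long enough.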

Please mark my comment as the Correct Answer if this solution resolved your problem

11 Replies
NathanosBlightc
Commander

Hi

First of all, my friend, the error says there is no space left on the (storage) device that serves as the local repository for the log files:

OSError: [Errno 28] No space left on device

Can you check it, please? I think it's not related to network communication (port and firewall) but to the space allocated on the ESXi host's datastore.
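
As a general POSIX note, errno 28 is ENOSPC, and a filesystem can report it when it runs out of inodes as well as when it runs out of bytes. The sketch below shows one way to check both from Python. The helper name `space_report` is made up for this example, and the code has not been verified on the ESXi shell.

```python
import errno
import os
import tempfile


def space_report(path):
    """Return free bytes and free inodes for the filesystem holding path."""
    st = os.statvfs(path)
    return {
        "free_bytes": st.f_bavail * st.f_frsize,  # space usable by non-root
        "free_inodes": st.f_favail,               # inodes usable by non-root
        "total_inodes": st.f_files,
    }


# Symbolic name of the error code seen in the traceback.
print(errno.errorcode[errno.ENOSPC])
# Report for the temp directory; pass any mount point of interest instead.
print(space_report(tempfile.gettempdir()))
```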

Please mark my comment as the Correct Answer if this solution resolved your problem
casgrain2
Contributor

Hi,

Most articles online do point to a storage issue, but after checking, there is plenty of space on all locations:

[root@esx2014-m-2:~] df -h
Filesystem    Size    Used  Available  Use%  Mounted on
NFS           8.2T    4.6T       3.6T   56%  /vmfs/volumes/Installation (NAS-M)
NFS           8.2T    4.8T       3.4T   59%  /vmfs/volumes/Installation (NAS-01-M)
VMFS-5      550.8G    5.6G     545.2G    1%  /vmfs/volumes/esx2014-m-2-DS
VMFS-5        9.8T    6.7T       3.1T   68%  /vmfs/volumes/MTL-SAS-Lun-1
VMFS-5       10.9T    6.8T       4.1T   63%  /vmfs/volumes/MTL-SAS-Lun-0
VMFS-5        1.5T  760.1G     729.1G   51%  /vmfs/volumes/MTL-SSD-Lun-2
vfat        249.7M  170.0M      79.7M   68%  /vmfs/volumes/ee7b7e77-eeac5e12-b31d-50430caa0bb5
vfat        249.7M  170.0M      79.7M   68%  /vmfs/volumes/34dccf76-dc06e314-741f-c134f8876e3d
vfat          4.0G   42.5M       4.0G    1%  /vmfs/volumes/5d2a3c76-3d416958-0871-a0369f49bf74
vfat        285.8M  172.9M     112.9M   60%  /vmfs/volumes/53f43c33-8b79ad7c-9b93-ecf4bbc0c76c

Unless "device" refers to something else, I'm clueless as to why I get this error.

Cheers,

Vijay2027
Expert

There is a known issue with limitations in the network message buffer size that might cause loss of log messages.

A fix is available in 6.5 U3.

Can you attach the vmsyslogd.err log file?

casgrain2
Contributor

Hi Vijay2027,

It's odd that only 1 of my 3 hosts, identical in version and hardware, is affected, no?

Anyhow, here is the log file in question.

I'm running a custom ESX ISO, and the manufacturer released 6.5 U3 recently, so I should be able to upgrade at some point if that's the actual solution.

Cheers,

Vijay2027
Expert

You are hitting the same issue I mentioned in my previous post. Your log shows it:

2019-10-29T09:47:46.016Z vmsyslog.main            : CRITICAL] Dropping messages due to log stress (qsize = 25000)
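
To see how often a host reports these conditions, the two message types quoted in this thread can be tallied with a short script. This is a sketch only: the regular expressions are derived from the two sample lines in this thread and may not match other vmsyslogd versions, and the names `summarize`, `LOST_RE`, and `STRESS_RE` are made up for this example.

```python
import re

# Patterns derived from the two log lines quoted in this thread.
LOST_RE = re.compile(
    r"^(\S+) vmsyslog\.msgQueue\s*: ERROR\s*\] (\S+) - lost (\d+)\s+log messages"
)
STRESS_RE = re.compile(
    r"^(\S+) vmsyslog\.main\s*: CRITICAL\] "
    r"Dropping messages due to log stress \(qsize = (\d+)\)"
)


def summarize(lines):
    """Return the last reported lost count per target and the stress event count."""
    last_lost = {}
    stress_events = 0
    for line in lines:
        m = LOST_RE.match(line)
        if m:
            # Keep the most recent figure; whether it is cumulative is unknown.
            last_lost[m.group(2)] = int(m.group(3))
        elif STRESS_RE.match(line):
            stress_events += 1
    return {"last_lost": last_lost, "stress_events": stress_events}


SAMPLE = [
    "2019-09-20T13:38:26.408Z vmsyslog.msgQueue        : ERROR   ] "
    "192.168.20.210:10514 - lost 1671391  log messages",
    "2019-10-29T09:47:46.016Z vmsyslog.main            : CRITICAL] "
    "Dropping messages due to log stress (qsize = 25000)",
]
summary = summarize(SAMPLE)
print(summary)
```

To run it against a real file, pass an open file handle for /var/log/.vmsyslogd.err to `summarize` instead of `SAMPLE`.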

casgrain2
Contributor

Fair enough, but why would only one host out of 3 be affected?

If this were a defect in the release, I would expect all hosts to be affected, no? Is there any setting I could tweak?

Do you have documentation on this issue? I can't seem to find anything online.

EDIT: it seems the other hosts' .vmsyslogd.err files contain that error message as well, but mostly lines like this instead of the Python error:

     vmsyslog.main            : ERROR   ] reloading (65990)

Vijay2027
Expert

I am not sure if there is any documentation available around this issue.

Open an SR with GSS, quoting this log line (2019-10-29T09:47:46.016Z vmsyslog.main            : CRITICAL] Dropping messages due to log stress (qsize = 25000)), to have the fix validated.

I am not aware of any workaround.

casgrain2
Contributor

I have an SR open for this, referencing this post. We'll see what they say.

You seem to reply faster than support, so thanks for that, mate :)

Vijay2027
Expert

Did you get confirmation from GSS, or is there a workaround available?

casgrain2
Contributor

So far they have confirmed that this should be fixed in U3; that is all I got out of them.
