Hi,
I get frequent emails about a host in my cluster that look like this:
Target: esx2014-m-2
Stateless event alarm
Alarm Definition:
([Event alarm expression: Host error] OR [Event alarm expression: Host warning])
Event details:
Issue detected on esx2014-m-2: vmsyslog logger 192.168.20.210:10514 lost 1671391 log messages
(2019-09-20T07:56:00.704Z cpu4:3545365)
And the /var/log/.vmsyslogd.err looks like this:
2019-09-20T13:23:31.763Z vmsyslog.loggers.file : ERROR ] Failed to spawn onrotate call
Traceback (most recent call last):
File "/build/mts/release/bora-13635690/bora/build/esx/release/vmvisor/sys-boot/lib64/python3.5/site-packages/vmsyslog/loggers/file.py", line 379, in writeLog
File "/build/mts/release/bora-13635690/bora/build/esx/release/vmvisor/sys-boot/lib64/python3.5/subprocess.py", line 676, in __init__
File "/build/mts/release/bora-13635690/bora/build/esx/release/vmvisor/sys-boot/lib64/python3.5/subprocess.py", line 1228, in _execute_child
OSError: [Errno 28] No space left on device
2019-09-20T13:38:26.408Z vmsyslog.msgQueue : ERROR ] 192.168.20.210:10514 - lost 1671391 log messages
The port isn't standard, but I have a custom firewall rule to allow outbound traffic on it, and it works without problems for the other hosts. The SIEM does receive some logs from this host, so the loss seems intermittent.
Any idea how to resolve this? The host is running 6.5 U2.
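For anyone who wants to track how often this happens, here is a minimal sketch that pulls the "lost N log messages" entries out of a copy of .vmsyslogd.err. The regular expression is only inferred from the log lines quoted above, and the names (`LostEvent`, `find_lost_events`) are made up for illustration, so treat it as a starting point rather than a definitive parser.

```python
import re
from collections import namedtuple

# Hypothetical record type for one "lost N log messages" entry.
LostEvent = namedtuple("LostEvent", ["timestamp", "target", "lost"])

# Pattern inferred from the sample lines in this thread, not from any
# documented vmsyslogd log format.
LOST_RE = re.compile(
    r"^(?P<ts>\S+)\s+vmsyslog\.msgQueue\s*:\s*ERROR\s*\]\s*"
    r"(?P<target>\S+)\s+-\s+lost\s+(?P<lost>\d+)\s+log messages"
)

def find_lost_events(lines):
    """Return one LostEvent per matching line, in file order."""
    events = []
    for line in lines:
        match = LOST_RE.match(line.strip())
        if match:
            events.append(
                LostEvent(match.group("ts"), match.group("target"),
                          int(match.group("lost")))
            )
    return events

# Sample input copied from the log excerpt in this post.
sample = [
    "2019-09-20T13:23:31.763Z vmsyslog.loggers.file : ERROR ] Failed to spawn onrotate call",
    "Traceback (most recent call last):",
    "OSError: [Errno 28] No space left on device",
    "2019-09-20T13:38:26.408Z vmsyslog.msgQueue : ERROR ] 192.168.20.210:10514 - lost 1671391 log messages",
]

events = find_lost_events(sample)
for event in events:
    print(event.timestamp, event.target, event.lost)
```

To run it against a real file, pass an open file object instead of `sample`, for example `find_lost_events(open("vmsyslogd.err"))` on a copy of the log.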
This issue has been documented in 6.5U3 release notes under resolved issues section:
VMware ESXi 6.5 Update 3 Release Notes
Limitations in the network message buffer (msgBuffer) size might cause loss of log messages
The size of the network msgBuffer is 1 MB but the main msgBuffer supports up to 25 000 lines of any length, which is at least 3 MB. If the network is slow, the write thread to the network msgBuffer is faster than the reader thread. This leads to loss of log messages with an alert: lost XXXX log messages.
This issue is resolved in this release. The size of msgBuffer is increased to 3 MB.
Please mark my comment as the Correct Answer if this solution resolved your problem
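The mechanism in the release note (a writer that outpaces the reader of a fixed-size buffer) can be sketched with a small simulation. This is not vmsyslogd's code: the 128-byte message size, the write and read rates, and the function name `simulate_lost` are all hypothetical values chosen only to show how the buffer size changes when drops begin.

```python
MB = 1024 * 1024

def simulate_lost(capacity_bytes, msg_bytes, writes_per_tick, reads_per_tick, ticks):
    """Count messages dropped by a bounded buffer over a number of ticks."""
    max_msgs = capacity_bytes // msg_bytes  # how many messages fit at once
    queued = 0
    lost = 0
    for _ in range(ticks):
        # Writer side: accept what fits in the buffer, drop the rest.
        accepted = min(writes_per_tick, max_msgs - queued)
        lost += writes_per_tick - accepted
        queued += accepted
        # Reader side: a slow network drains at most reads_per_tick messages.
        queued -= min(queued, reads_per_tick)
    return lost

# Hypothetical load: 128-byte messages, 1000 written and 800 sent per tick.
lost_1mb = simulate_lost(1 * MB, 128, 1000, 800, 60)
lost_3mb = simulate_lost(3 * MB, 128, 1000, 800, 60)
print("1 MB buffer lost:", lost_1mb)
print("3 MB buffer lost:", lost_3mb)
```

With these made-up rates the 1 MB buffer fills and starts dropping during the run, while the 3 MB buffer does not. If the writer stays faster than the reader indefinitely, any finite buffer fills eventually, so a larger buffer mainly buys tolerance for longer bursts.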
Hi
First of all, the error says there is no space left on the (storage) device that serves as the local repository for the log files:
OSError: [Errno 28] No space left on device
Can you check that, please? I don't think this is related to network communication (port and firewall); it looks like a matter of allocated space on the ESXi host's datastore.
Hi,
Most articles online do point to a storage issue, but I checked and there is plenty of space on all locations:
[root@esx2014-m-2:~] df -h
Filesystem Size Used Available Use% Mounted on
NFS 8.2T 4.6T 3.6T 56% /vmfs/volumes/Installation (NAS-M)
NFS 8.2T 4.8T 3.4T 59% /vmfs/volumes/Installation (NAS-01-M)
VMFS-5 550.8G 5.6G 545.2G 1% /vmfs/volumes/esx2014-m-2-DS
VMFS-5 9.8T 6.7T 3.1T 68% /vmfs/volumes/MTL-SAS-Lun-1
VMFS-5 10.9T 6.8T 4.1T 63% /vmfs/volumes/MTL-SAS-Lun-0
VMFS-5 1.5T 760.1G 729.1G 51% /vmfs/volumes/MTL-SSD-Lun-2
vfat 249.7M 170.0M 79.7M 68% /vmfs/volumes/ee7b7e77-eeac5e12-b31d-50430caa0bb5
vfat 249.7M 170.0M 79.7M 68% /vmfs/volumes/34dccf76-dc06e314-741f-c134f8876e3d
vfat 4.0G 42.5M 4.0G 1% /vmfs/volumes/5d2a3c76-3d416958-0871-a0369f49bf74
vfat 285.8M 172.9M 112.9M 60% /vmfs/volumes/53f43c33-8b79ad7c-9b93-ecf4bbc0c76c
Unless “device” refers to something else, I have no idea why I get this error.
Cheers,
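One general point on "device": Errno 28 is ENOSPC, and an operating system can return it when a filesystem runs out of inodes as well as when it runs out of blocks, while df -h only reports block usage. Note also that the traceback above ends in subprocess's _execute_child, so the error was raised while spawning a process rather than while writing the log file. Whether either detail explains this case is not established in the thread. The sketch below only shows how to read both numbers for a path with Python's `os.statvfs`; `space_report` is a hypothetical helper, and it is an assumption that this call behaves the same way on an ESXi host.

```python
import errno
import os

def space_report(path):
    """Return free bytes and inode counts for the filesystem holding path."""
    st = os.statvfs(path)
    return {
        "free_bytes": st.f_bavail * st.f_frsize,  # blocks available to non-root users
        "free_inodes": st.f_favail,               # inodes available to non-root users
        "total_inodes": st.f_files,
    }

# Symbolic name of the error seen in the traceback.
print(errno.errorcode[errno.ENOSPC])

# "." is used so the sketch runs anywhere; on the host the path of interest
# would be the directory that holds the logs.
report = space_report(".")
print(report)
```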
There is a known issue with limitations in the network message buffer size that might cause loss of log messages.
A fix is available in 6.5 U3.
Can you attach the vmsyslogd.err log file?
Hi Vijay2027,
Isn't it odd that only 1 of my 3 hosts, identical in version and hardware, is affected?
Anyhow, here is the log file in question.
I'm running a custom ESX ISO, and the manufacturer released 6.5 U3 recently, so I should be able to upgrade at some point if that's the actual solution.
Cheers,
You are hitting the same issue as I mentioned in my previous post.
2019-10-29T09:47:46.016Z vmsyslog.main : CRITICAL] Dropping messages due to log stress (qsize = 25000)
Fair enough, but why would only one host out of 3 be affected?
If this were a defect in the release, I would expect all hosts to be affected, no? Is there any setting I could tweak?
Do you have documentation on this issue? I can't seem to find anything documented online.
EDIT: it seems the other hosts' .vmsyslogd.err files contain that error message as well, but mostly these instead of the Python error:
vmsyslog.main : ERROR ] reloading (65990)
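To compare hosts on more than a quick skim, one option is to tally the entries in each host's .vmsyslogd.err by logger and level. This is a rough sketch: the pattern is guessed from the lines quoted in this thread (with or without a leading timestamp), and `tally` is a hypothetical helper name.

```python
import re
from collections import Counter

# Optional timestamp, then "vmsyslog.<logger> : LEVEL ]" as seen in this thread.
ENTRY_RE = re.compile(r"^(?:\S+\s+)?(vmsyslog\.[\w.]+)\s*:\s*(\w+)\s*\]")

def tally(lines):
    """Count log entries per (logger, level) pair; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = ENTRY_RE.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# Sample lines copied from the excerpts posted in this thread.
sample = [
    "2019-09-20T13:23:31.763Z vmsyslog.loggers.file : ERROR ] Failed to spawn onrotate call",
    "OSError: [Errno 28] No space left on device",
    "2019-09-20T13:38:26.408Z vmsyslog.msgQueue : ERROR ] 192.168.20.210:10514 - lost 1671391 log messages",
    "2019-10-29T09:47:46.016Z vmsyslog.main : CRITICAL] Dropping messages due to log stress (qsize = 25000)",
    "vmsyslog.main : ERROR ] reloading (65990)",
]

counts = tally(sample)
for (logger, level), n in sorted(counts.items()):
    print(logger, level, n)
```

Running the same tally on a copy of the file from each host gives a side-by-side view of which message types dominate where.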
I am not sure if there is any documentation available on this issue.
Open an SR with GSS, quoting the log line (2019-10-29T09:47:46.016Z vmsyslog.main : CRITICAL] Dropping messages due to log stress (qsize = 25000)), to have the fix validated.
And I am not aware of any workaround.
I have an SR open for this, referencing this post. We'll see what they say.
You seem to reply faster than support, so thanks for that, mate.
Did you get confirmation from GSS, or is there a workaround available?
So far they have confirmed this should be fixed in U3; that is all I got out of them.