Hello VMware Community,
we are running into the following problem.
We have built a new LogInsight cluster (size Large, 3 nodes) with an ILB (integrated load balancer) for our VMware environment. It is the syslog target for roughly 1000 hosts (ESXi/vSphere). During the basic configuration of the ESXi hosts, we noticed that they log the following error:
[root@esxi:~] less /var/log/.vmsyslogd.err
vmsyslog.loggers.network : ERROR ] vRLI.ilb.dns:514 - socket error : [Errno 32] Broken pipe
vmsyslog.loggers.network : ERROR ] Error shutting down socket.
Error message in the web UI: The host "vRLI.ilb.dns:514" has become unreachable. Remote logging to this host has stopped.
The network path and the firewall configuration have already been checked, and neither appears to be the source of the error:
[root@esxi:~] nc -z vRLI.ilb.dns 514
Connection to vRLI.ilb.dns 514 port [tcp/shell] succeeded!
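Note that `nc -z` only confirms that a TCP handshake completes at that moment; it does not show whether an established syslog connection is reset later. A minimal sketch for repeating the probe against several targets in one go follows. `check_targets` is a hypothetical helper name, and the `-w 5` timeout is an assumption about the local `nc` build:

```shell
# Hypothetical helper: probe each "host:port" argument with nc -z and
# print one OK/FAIL line per target. Returns non-zero if any probe fails.
check_targets() {
    rc=0
    for target in "$@"; do
        host=${target%:*}     # everything before the last colon
        port=${target##*:}    # everything after the last colon
        # -w 5: give up after 5 seconds (assumed to be supported by this nc)
        if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
            echo "OK   $host:$port"
        else
            echo "FAIL $host:$port"
            rc=1
        fi
    done
    return "$rc"
}

# Example, using the target from the error message above:
# check_targets vRLI.ilb.dns:514
```

Running this in a loop while the "Broken pipe" error occurs would show whether new connections are refused at that time or whether only existing connections are affected.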
We therefore suspected that the problem lies in the kernel configuration of the TCP stack and made the following changes in /etc/sysctl.d on the LogInsight nodes:
# Max buffer size: 2^28 = 268,435,456 bytes (256 MiB)
# Provide adequate buffer memory.
# rmem_max and wmem_max are TCP max buffer size
# settable with setsockopt(), in bytes
# tcp_rmem and tcp_wmem are per socket in bytes.
# tcp_mem is for all TCP streams, in 4096-byte pages.
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
net.core.rmem_default = 1638400
net.core.wmem_default = 1638400
net.ipv4.tcp_rmem = 4096 1638400 268435456
net.ipv4.tcp_wmem = 4096 1638400 268435456
# This server might have 1500 clients simultaneously, so:
net.ipv4.tcp_mem = 4096 1638400 268435456
# Disable TCP SACK (TCP Selective Acknowledgement),
# DSACK (duplicate TCP SACK), and FACK (Forward Acknowledgement)
net.ipv4.tcp_sack = 0
net.ipv4.tcp_dsack = 0
net.ipv4.tcp_fack = 0
#
net.ipv4.tcp_max_syn_backlog = 100000
net.core.somaxconn = 100000
net.core.netdev_max_backlog = 100000
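One detail in these settings is worth a second look: as the comment above says, `net.ipv4.tcp_mem` is counted in pages, not bytes, so the same three numbers mean something different there than in `tcp_rmem`/`tcp_wmem`. A small sketch of the arithmetic, assuming the 4096-byte page size from the comment (`pages_to_bytes` is a made-up helper name):

```shell
# Hypothetical helper: convert a tcp_mem page count to bytes.
# The page size defaults to 4096, as stated in the sysctl comment;
# `getconf PAGESIZE` reports the real value on a given node.
pages_to_bytes() {
    pages=$1
    page_size=${2:-4096}
    echo $((pages * page_size))
}

# The three tcp_mem values from the config, expressed in bytes:
for pages in 4096 1638400 268435456; do
    echo "$pages pages = $(pages_to_bytes "$pages") bytes"
done
```

With 4096-byte pages, the three values work out to 16 MiB, 6.25 GiB and 1 TiB, so it may be worth checking whether page units were intended when the byte values were reused here.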
Even after changing the buffer sizes and the other TCP parameters on the LogInsight nodes, we still see dropped SYNs on the LogInsight side and the socket error on the ESXi hosts:
root@LogInsight[ / ]# netstat -s | grep "SYNs to LISTEN"
59848 SYNs to LISTEN sockets dropped
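This counter is cumulative, so a single reading does not show whether SYNs are still being dropped after the sysctl change. Dropped SYNs on a listener often point to a full accept queue, and the effective queue limit is the smaller of `net.core.somaxconn` and the backlog the listening process itself requests. A rough sketch for checking both follows. The helper names are hypothetical, and the availability of `ss` on the appliance is an assumption:

```shell
# Hypothetical helper: read `netstat -s` output on stdin and print the
# "SYNs to LISTEN sockets dropped" counter (0 if the line is absent).
syn_drops() {
    awk '/SYNs to LISTEN sockets dropped/ { n = $1 }
         END { print (n == "" ? 0 : n) }'
}

# Hypothetical helper: take two readings INTERVAL seconds apart and
# print how many SYNs were dropped in between.
syn_drops_delta() {
    interval=${1:-60}
    before=$(netstat -s | syn_drops)
    sleep "$interval"
    after=$(netstat -s | syn_drops)
    echo $((after - before))
}

# Hypothetical helper: read `ss -ltn` output on stdin and print, for each
# listener on the given port, the current accept-queue length (Recv-Q)
# and the backlog limit in effect (Send-Q).
listen_queue() {
    port=$1
    awk -v port="$port" '
        $1 == "LISTEN" && $4 ~ (":" port "$") {
            print $4, "queued=" $2, "limit=" $3
        }'
}

# Examples:
# syn_drops_delta 60
# ss -ltn | listen_queue 514
```

If the reported limit stays far below 100000, the listener is requesting a smaller backlog than `somaxconn` allows, and raising the sysctl alone would not change the queue size.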
Is this a known issue, and if so, is there a workaround?
Regards,
Garimos
Which ESXi version is in use?
Have you checked the syslog global dir configuration? Is the IP of LogInsight visible?
"Have you checked the syslog global dir configuration?"
--> ESXi version 7.0U3
[root@esxi:~] esxcli system syslog config get
Allow Vsan Backing: false
Check Certificate Revocation List: false
Dropped Log File Rotation Size: 100
Dropped Log File Rotations: 10
Enforce SSLCertificates: true
Local Log Output: /scratch/log
Local Log Output Is Configured: false
Local Log Output Is Persistent: true
Local Logging Default Rotation Size: 1024
Local Logging Default Rotations: 8
Log Level: error
Log To Unique Subdirectory: false
Message Queue Drop Mark: 90
Remote Host: tcp://loginsight:514
Remote Host Connect Retry Delay: 180
Remote Host Maximum Message Length: 1024
Strict X509Compliance: false
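Connectivity checks are easiest to trust when they use exactly the target string that vmsyslogd is configured with (here the short name `loginsight`, while the other checks in this thread use `loginsight.dns.com` and `vRLI01.dns.com`). A minimal sketch that extracts the configured targets from the output above; `remote_targets` is a hypothetical helper name:

```shell
# Hypothetical helper: read `esxcli system syslog config get` output on
# stdin and print each configured remote target without its scheme,
# one per line (typically "host:port"; no port if none was configured).
remote_targets() {
    sed -n 's/^ *Remote Host: *//p' |
        tr ',' '\n' |
        sed -e 's#^[a-z]*://##' -e '/^$/d'
}

# Example: probe every configured target with nc
# (simple split on ":", so this does not handle IPv6 literals):
# esxcli system syslog config get | remote_targets |
#     while IFS=: read -r host port; do nc -z "$host" "$port"; done
```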
"Is the IP of LogInsight visible?"
[root@esxi:~] nslookup loginsight.dns.com
Server: [DNS-Server]
Address: [DNS-Server]:53
loginsight.dns.com canonical name = vRLI01.dns.com
Name: vRLI01.dns.com
Address: 1.2.3.4
loginsight.dns.com canonical name = vRLI01.dns.com
[root@esxi:~] nslookup 1.2.3.4
Server: [DNS-Server]
Address: [DNS-Server]:53
Non-authoritative answer:
1.2.3.4.in-addr.arpa name = vRLI01.dns.com
[root@esxi:~] nc -z 1.2.3.4 514
Connection to 1.2.3.4 514 port [tcp/shell] succeeded!
[root@esxi:~] nc -z vRLI01.dns.com 514
Connection to vRLI01.dns.com 514 port [tcp/shell] succeeded!
[root@esxi:~] nc -z loginsight.dns.com 514
Connection to loginsight.dns.com 514 port [tcp/shell] succeeded!
Check again here whether your integration with vCenter is set up correctly:
https://www.vladan.fr/how-to-setup-vmware-vrealize-log-insight/
The vSphere integration is fully configured; the vCenters were also connected via syslog (vcsa:5480/syslog). Because of the current errors, I have removed the syslog configuration from the hosts for the time being, since the "Unreachable" error is flooding the UI.
Have you checked the network settings? I mean the firewall ports.
The syslog port is open in the network configuration for all instances, and connectivity works in principle; see the earlier netcat checks from the ESXi hosts.
Strange, to be honest.