VMware Cloud Community
Garimos
Enthusiast

ESXi Host unreachable Syslog (vRLI)

Hello VMware Community,

We are seeing the following problem:

For our VMware environment, we recently built a new LogInsight cluster (size Large, 3 nodes) with the integrated load balancer (ILB). It is the syslog target for about 1000 ESXi/vSphere hosts. During the basic configuration of the ESXi hosts, we noticed that they log the following error:

[root@esxi:~] less /var/log/.vmsyslogd.err
vmsyslog.loggers.network : ERROR ] vRLI.ilb.dns:514 - socket error : [Errno 32] Broken pipe
vmsyslog.loggers.network : ERROR ] Error shutting down socket.

Error message in the web UI: The host "vRLI.ilb.dns:514" has become unreachable. Remote logging to this host has stopped.

We have already checked the network and the firewall configuration; neither is the source of the error.

[root@esxi:~] nc -z vRLI.ilb.dns 514
Connection to vRLI.ilb.dns 514 port [tcp/shell] succeeded!

As a result, we suspected the error was in the kernel configuration for the TCP stack and made the following changes in /etc/sysctl.d (on the LogInsight nodes):

# Max buffer size: 2^28 = 268,435,456 bytes (256 MiB, ~268 MB)
# Provide adequate buffer memory.
# rmem_max and wmem_max are TCP max buffer size
# settable with setsockopt(), in bytes
# tcp_rmem and tcp_wmem are per socket in bytes.
# tcp_mem is for all TCP streams, in 4096-byte pages.
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
net.core.rmem_default = 1638400
net.core.wmem_default = 1638400
net.ipv4.tcp_rmem = 4096 1638400 268435456
net.ipv4.tcp_wmem = 4096 1638400 268435456

# This server might have 1500 clients simultaneously, so:
net.ipv4.tcp_mem = 4096 1638400 268435456

# Disable TCP SACK (TCP Selective Acknowledgement),
# DSACK (duplicate TCP SACK), and FACK (Forward Acknowledgement)
net.ipv4.tcp_sack = 0
net.ipv4.tcp_dsack = 0
net.ipv4.tcp_fack = 0

# Enlarge the SYN, accept, and device input backlogs
net.ipv4.tcp_max_syn_backlog = 100000
net.core.somaxconn = 100000
net.core.netdev_max_backlog = 100000
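
A note on units for the settings above: as the comment says, net.ipv4.tcp_mem is counted in pages, while the other buffer settings are in bytes. Reusing the byte values for tcp_mem therefore sets a far higher limit than 256 MiB. A small sketch of the conversion, assuming a 4096-byte page size (getconf PAGESIZE shows the real value); the function names are made up for illustration:

```shell
#!/bin/sh
# Assumption: 4096-byte pages. Verify on the node with: getconf PAGESIZE
PAGE_SIZE=4096

# Convert a tcp_mem value (counted in pages) to bytes.
pages_to_bytes() {
    echo $(( $1 * PAGE_SIZE ))
}

# Convert a byte budget to the page count that tcp_mem expects.
bytes_to_pages() {
    echo $(( $1 / PAGE_SIZE ))
}

# The "max" field as posted, interpreted as pages:
pages_to_bytes 268435456    # prints 1099511627776 (1 TiB)

# Page count that corresponds to a 256 MiB budget:
bytes_to_pages 268435456    # prints 65536
```

Whether a 256 MiB system-wide TCP budget is appropriate for this cluster is a separate question; the sketch only shows the unit conversion.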

Even after changing the buffer sizes and the other TCP parameters on the LogInsight nodes, we still see dropped packets there and the socket error on the ESXi hosts.

root@LogInight[ / ]# netstat -s | grep "SYNs to LISTEN"
59848 SYNs to LISTEN sockets dropped
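
The counters printed by netstat -s are cumulative since boot, so a single reading does not show whether drops are still occurring. A minimal sketch that samples the counter twice and prints the difference; the helper names and the 60-second interval are arbitrary choices, not anything from the LogInsight documentation:

```shell
#!/bin/sh
# Read `netstat -s` output on stdin and print the
# "SYNs to LISTEN sockets dropped" counter.
syn_drops() {
    awk '/SYNs to LISTEN sockets dropped/ { print $1 }'
}

# Difference between two samples; a value above 0 means
# SYNs were dropped during the sampling interval.
drop_delta() {
    echo $(( $2 - $1 ))
}

# Usage on a LogInsight node:
#   before=$(netstat -s | syn_drops)
#   sleep 60
#   after=$(netstat -s | syn_drops)
#   drop_delta "$before" "$after"
```

It may also be worth looking at ss -ltn on the node: for listening sockets, Send-Q shows the backlog actually in effect, which is the smaller of net.core.somaxconn and the value the application passes to listen(). Raising somaxconn alone does not help if the application requests a small backlog.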


Is this a known issue and, if so, is there a workaround?

Regards,

Garimos


10 Replies
maksym007
Expert

Which ESXi version is in use? 

Have you checked the syslog global dir configuration? Is the IP of loginsight visible?

Garimos
Enthusiast

"Have you check syslog global dir configuration?"

--> ESXi version: 7.0 U3. The syslog configuration is as follows:

[root@esxi:~] esxcli system syslog config get
Allow Vsan Backing: false
Check Certificate Revocation List: false
Dropped Log File Rotation Size: 100
Dropped Log File Rotations: 10
Enforce SSLCertificates: true
Local Log Output: /scratch/log
Local Log Output Is Configured: false
Local Log Output Is Persistent: true
Local Logging Default Rotation Size: 1024
Local Logging Default Rotations: 8
Log Level: error
Log To Unique Subdirectory: false
Message Queue Drop Mark: 90
Remote Host: tcp://loginsight:514
Remote Host Connect Retry Delay: 180
Remote Host Maximum Message Length: 1024
Strict X509Compliance: false

 

"If IP of loginsight is visible?"

[root@esxi:~] nslookup loginsight.dns.com
Server: [DNS-Server]
Address: [DNS-Server]:53

loginsight.dns.com canonical name = vRLI01.dns.com
Name: vRLI01.dns.com
Address: 1.2.3.4

loginsight.dns.com canonical name = vRLI01.dns.com

[root@esxi:~] nslookup 1.2.3.4
Server: [DNS-Server]
Address: [DNS-Server]:53

Non-authoritative answer:
1.2.3.4.in-addr.arpa name = vRLI01.dns.com

 

[root@esxi:~] nc -z 1.2.3.4 514
Connection to 1.2.3.4 514 port [tcp/shell] succeeded!


[root@esxi:~] nc -z vRLI01.dns.com 514
Connection to vRLI01.dns.com 514 port [tcp/shell] succeeded!


[root@esxi:~] nc -z loginsight.dns.com 514
Connection to loginsight.dns.com 514 port [tcp/shell] succeeded!
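
A single successful nc -z only shows that one handshake completed. If the server drops SYNs only under load, failures would be intermittent, so repeating the check may be more telling. A rough sketch; count_failures is a hypothetical helper, and because a dropped SYN is retransmitted and can still succeed, the count may under-report:

```shell
#!/bin/sh
# Run a command N times and print how many runs exited non-zero.
# Usage: count_failures N command [args...]
count_failures() {
    n=$1
    shift
    fails=0
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails"
}

# Usage on an ESXi host (50 attempts chosen arbitrarily):
#   count_failures 50 nc -z loginsight.dns.com 514
```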

 

 

maksym007
Expert

Check again here whether your vCenter integration was set up correctly:

https://www.vladan.fr/how-to-setup-vmware-vrealize-log-insight/ 

Garimos
Enthusiast

[Attached screenshot: vSphereInte.png]

The vSphere integration is fully configured; the vCenters were also connected via syslog (vcsa:5480/syslog). Because of the current errors, I have removed the host configuration for the time being, since the "Unreachable" error is flooding the UI.

 

 

maksym007
Expert

Have you checked the network settings? I mean the firewall ports.

Garimos
Enthusiast

The syslog ports are open for all instances and basic connectivity works; see the earlier netcat checks from the ESXi hosts.

maksym007
Expert

Strange, to be honest.

schmitz3michael
Contributor

@Garimos do you have a solution to your problem?
I get the same messages, and so far I can't find what is causing them.


best regards
Michael

Garimos
Enthusiast

Hi @schmitz3michael,

In our case we had reached the maximum supported number of open TCP sessions. The solution was to scale out the cluster from 3 nodes to 5 nodes.

Another option is to switch to UDP for the transmission of logs.
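
On the ESXi side, the switch to UDP could look roughly like the sketch below. It assumes the same loginsight target name and port 514 shown in the earlier esxcli output; loghost_uri is a hypothetical helper, and the esxcli calls run only where esxcli is present. UDP holds no session open on the LogInsight side, but it also gives no delivery guarantee, so messages can be lost silently.

```shell
#!/bin/sh
# Hypothetical helper: build a syslog loghost URI (protocol://host:port).
loghost_uri() {
    echo "$1://$2:$3"
}

# "loginsight" and 514 are taken from the Remote Host line posted earlier;
# adjust them to your own environment.
TARGET=$(loghost_uri udp loginsight 514)

# Only attempt the change where esxcli exists (i.e. on an ESXi host).
if command -v esxcli >/dev/null 2>&1; then
    # Point remote logging at the UDP target.
    esxcli system syslog config set --loghost="$TARGET"
    # Make the syslog daemon pick up the new configuration.
    esxcli system syslog reload
    # Confirm that Remote Host now shows the udp:// target.
    esxcli system syslog config get
fi
```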

schmitz3michael
Contributor

Thanks @Garimos, that helped us! We currently have a small deployment with a single syslog host. We have now increased the maximum number of TCP connections on this host.
