VMware Cloud Community
thepj
Enthusiast

Log Insight Workers failing to join

SR: 14493589706

Log Insight Master 2.0.3

Log Insight Workers 2.0.1

Description:

I am having issues joining workers to the master installation. The workers are located in a geographically separate vCenter. Both VLANs can talk to each other, and the only ACL in place is for port 22.

The error that is given is:

"Failed to grant the membership to a cluster java.net.ConnectException: Connection refused"

I see in the docs that TCP ports 59778, 12543, 16520, and 16580 need to be open, and there are no ACLs in place for them. I also noticed that 'service iptables status' reported no firewall! So, I am out of ideas on how to troubleshoot this 😞

-Patrick


Accepted Solutions
sflanders
Commander

The best practice / supported way is to deploy a LI cluster in the same LAN/DC. In terms of ports, more than port 22 is needed so my guess is you have an ACL issue. Have a look at the security guide for required ports: Log Insight Firewall Recommendations

Hope this helps! === If you find this information useful, please award points for "correct" or "helpful". ===

18 Replies
thepj
Enthusiast

I already read that article, and I stated in the original post that I have already checked for ACLs! The worker that is on the same port group, in the same datacenter, is working fine. The ports I listed are FROM the document you linked, specifically the ones required for a worker.

Is there a currently documented best practice or supported method? I have read everything in the VMware vCenter Log Insight 2.0 Documentation Center, and nothing shows how to set it up or what is suggested as best practice beyond just saying "load balancing".

sflanders
Commander

Well, LB is separate from creating a cluster. The LB is for handling traffic sent to the cluster (e.g. syslog traffic). Creating a cluster is straightforward, and since it works with nodes in the same DC, I would suspect it is a firewall issue. One thing you can try is logging into the master and running: telnet <ipOfWorker> <portFromSecurityDoc> for each required port to see if any are not open. The issue would not be on the virtual appliance; it would be on the network between the master and the worker. You might also try looking at the logs in /storage/var/loginsight on both the worker and the master to see what additional information is presented about the issue.
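
Where telnet is not available, the same reachability check can be sketched with a few lines of Python 3 run from a machine on the master's side of the network. This is only an illustration, not a VMware tool: the port list is the one quoted in the original post, and the function name and the command-line argument for the worker address are assumptions made for the example. A "refused" result usually means something answered with a reset (nothing listening, or a device actively rejecting), while a "timeout" usually points at packets being silently dropped along the path.

```python
import socket
import sys

# Worker-related TCP ports quoted in the original post (from the security guide).
PORTS = (59778, 12543, 16520, 16580)


def check_port(host, port, timeout=3.0):
    """Try one TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # A reset came back: no listener, or something actively rejected us.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: often a filter dropping packets.
        return "timeout"
    except OSError as exc:
        # Anything else (no route, name resolution failure, ...).
        return f"error: {exc}"


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python3 portcheck.py <ipOfWorker>")
    else:
        for port in PORTS:
            print(f"{sys.argv[1]}:{port} -> {check_port(sys.argv[1], port)}")
```

Running it in both directions (master to worker and worker to master) helps show whether the path is blocked one way only.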

thepj
Enthusiast

Telnet is not installed on, nor does it come with, Log Insight 😞 Also, the only firewall that would be in the way between the locations would be iptables on this VM. Since there is no installation of iptables on the Log Insight VM either, the issue is not a firewall or ACL. The worker even gets a denied response from the master, so I know they can talk, and they can ping each other.

(Attached screenshot: telnet.JPG)

Side note: it looks like the root account is set to expire 365 days from the time the password was changed, so I guess I will be setting a reminder for that!

sflanders
Commander

Hmm, I thought telnet was included, but you are correct. I noticed you stated the worker is running an older version than the master. Was the worker ever joined to the master? Perhaps you could try upgrading the worker to the same version as the master before joining (though if the same configuration worked within the DC, then the version is not the issue). Next up would be looking at the logs on the master and worker.

You mentioned root account expires in 365 days - can you paste where you see this?

thepj
Enthusiast

I will try deploying another one that is 2.0.3 this weekend.

Regarding the root expiration, you can run "cat /etc/shadow" and it will show the expiration at 365 days. I am no longer VPN'ed into that environment, but anyone can cat /etc/shadow and see theirs too 🙂
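
For anyone reading that output: in the standard /etc/shadow layout, the third field is the day of the last password change (counted in days since 1970-01-01) and the fifth field is the maximum password age in days, which is where a value like 365 would appear. That is password aging, separate from the account-expiry date in the eighth field. A minimal Python 3 sketch of turning those two fields into a calendar date follows; the sample line and the function name are made up for illustration and are not taken from a Log Insight appliance.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def password_expiry(shadow_line):
    """Return the date by which the password must be changed, or None.

    Expects one line in /etc/shadow format:
    name:hash:lastchange:min:max:warn:inactive:expire:reserved
    """
    fields = shadow_line.strip().split(":")
    # Empty lastchange or max fields mean password aging is not in effect.
    if len(fields) < 5 or not fields[2] or not fields[4]:
        return None
    last_change = EPOCH + timedelta(days=int(fields[2]))
    return last_change + timedelta(days=int(fields[4]))


if __name__ == "__main__":
    # Hypothetical entry: last changed on day 16200, maximum age 365 days.
    sample = "root:*:16200:0:365:7:::"
    print("password must be changed by:", password_expiry(sample))
```

Reading the real file requires root, so in practice the line would be copied out of the "cat /etc/shadow" output and passed to the function.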

thepj
Enthusiast

I tried the 2.0.3 OVA this weekend, and it has the same problem. There are no ACLs or firewalls blocking traffic between them.

sflanders
Commander

I am looking into the SR. A support bundle for the master node AND the worker node will be needed so please ensure they are uploaded to the SR if they are not already.

thepj
Enthusiast

Yep! I already uploaded the one from the master.

I have a WebEx with the support rep this afternoon and will update here with what we find.

sflanders
Commander

Cool! Can you also upload the worker support bundle?

thepj
Enthusiast

Sure! Will need to get SSH enabled, and create the support bundle via CLI!

thepj
Enthusiast

I lied, I can't.

(Attached screenshot: log.JPG)

This is a fresh 2.0.3 install. I have tried this in two different locations, and both have the same error. These are workers that have not been joined to anything.

sflanders
Commander

That is okay, it should still continue to run and complete - it may take several minutes. Unfortunately, it is not very verbose as it generates the support bundle.

thepj
Enthusiast

Turns out it was, in fact, an ACL. Our Networking department swore up and down that it wasn't, but in the end, it was.

I spoke to the rep this morning. Our hope was a configuration with one worker in each location (3 separate physical sites). He is following up to see whether the workers hold the data locally, so that when a query is run, it queries the worker's local cache of logs rather than sending the logs over the VPN to the master.

sflanders
Commander

It's always a network problem 🙂 I am glad you were able to get it resolved. Data is stored locally, though the master must receive query information from workers.

If your question is answered, can you please mark this question as answered?

thepj
Enthusiast

How does archiving work in this scenario, then? Will future releases of Log Insight be able to archive to different locations for the workers? If so, that would be awesome.

sflanders
Commander

Currently archiving is configured on the master and applies to all nodes. In short, all nodes archive to the same location. There is a roadmap item to separate this out, but I would encourage you to open a discussion on loginsight.vmware.com as community voted items will be prioritized!

sflanders
Commander

Thanks for opening! Again, please note that a geo-located Log Insight is not supported today. When it is, handling archiving based on location should also be addressed.
