I'm doing some testing on the upgrade procedure from 3.0 to 3.3.1. When I connect to the master node IP address and import the pak file, after a while it tells me the upgrade is successful and to monitor the cluster status. I click on the cluster status and see the second node is now upgrading. After a few minutes the admin web page for the master node becomes unresponsive! When I connect to the third node and look at the cluster status, all nodes are showing version 3.0 and status connected. It seems the upgrade is hung or something!
Give it a few minutes. For a 3-node cluster, the most I'd wait is about 30 minutes. From what you describe, it looks like the upgrade of the master was successful, but the worker node upgrades hit a snag and a rollback was initiated.
If the cluster does not recover on its own, we can do a quick WebEx tomorrow and take a look.
That was the second time I tried it, definitely more than 30 minutes each time. It's only upgrading the second node though, so 30 minutes seems way too long!
Agreed that it is too long. Upgrades were made faster in 3.3, so if you upgrade from 3.0 -> 3.3 it will be slow, and then from 3.3 -> 3.3.1 it should be fast, as the fixes went into LI 3.3.
Further testing this morning, over 1 hour now! Something isn't right! Ticket logged.
The issue was vShield App blocking ports between the LI nodes on the vDS. Once these were opened up, the upgrade completed. However, there is a new issue now that the nodes have been upgraded to 3.3.1: event forwarding is no longer working. Are there any known issues with this? I guess it's cert related. It was working fine in 3.0.
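For anyone hitting the same firewall symptom, a minimal sketch of a TCP reachability check that could be run from a machine on the same network segment as the nodes. The host and port values are placeholders, not the actual Log Insight inter-node ports, which should be taken from the product documentation.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True (reachable) or False (blocked or closed)."""
    return {port: port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Hypothetical worker node address and port list, for illustration only.
    for port, ok in check_ports("192.0.2.10", [443, 9000], timeout=1.0).items():
        print(port, "open" if ok else "blocked or closed")
```

A False result only shows that the connection did not succeed from where the script ran; it does not tell a firewall block apart from a service that is simply not listening.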
We have not seen this outside of the internal clusters. Can you share a bit more detail and possibly a support bundle?
I can see the following in the runtime.log on the master node: Failed connection with <forwarding_IP_address> javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
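That exception means the client could not build a chain from the server's certificate to any CA it trusts. A rough way to reproduce the same check outside the appliance is sketched below. It is a Python analogue using a PEM bundle rather than the JVM's cacerts keystore, and the host, port and bundle file name are hypothetical.

```python
import socket
import ssl


def verify_tls(host, port, cafile=None, timeout=5.0):
    """Try a TLS handshake, trusting only the CAs in cafile (PEM).

    Returns (True, "") if the certificate chain and host name validate,
    otherwise (False, reason). With cafile=None the system defaults are used.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True, ""
    except ssl.SSLCertVerificationError as exc:
        # Raised when no valid path to a trusted CA exists, or the name mismatches.
        return False, exc.verify_message
    except (ssl.SSLError, OSError) as exc:
        return False, str(exc)


if __name__ == "__main__":
    # Hypothetical forwarding destination and CA bundle (Root + Sub CA in PEM).
    ok, reason = verify_tls("forwarder.example.com", 9543, cafile="ca-chain.pem")
    print("trusted" if ok else "handshake failed: " + reason)
```

If the handshake succeeds with a bundle containing the Root and Sub CA but the appliance still fails, that points at the appliance's own trust store rather than at the destination's certificate.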
Is there a custom certificate that you were using that is now messed up post-upgrade, perhaps?
Yes, the Root and Sub CA certs were imported to the Log Insight appliance. All was well when I added two more nodes to the 3.0 cluster, but now I notice it's broken.
I have now resolved this issue in 3.3.1. There is a new Java version in 3.3.1 (jre1.8.0_71), which doesn't know anything about the previously configured cacerts keystore.
Resolution (On each Cluster Node)