4 Replies Latest reply on Oct 24, 2019 7:59 AM by dinish

    CockroachDB cluster issue

    dinish Novice

      I am facing an issue while launching a CockroachDB cluster.

       

      There seems to be an issue with the certificates created by Kubernetes and the database cluster instances not being able to connect to each other.

       

      I followed the instructions for setting up the cluster at https://github.com/cockroachdb/cockroach/tree/master/cloud/kubernetes#secure-mode-1

      The manifest used is essentially this one: https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/cockroachdb-statefulset-secure.yaml

      After approving the certificates (kubectl certificate approve ...), the logs point to certificate verification problems:

      W191003 12:56:16.464941 85 vendor/google.golang.org/grpc/clientconn.go:1304 grpc: addrConn.createTransport failed to connect to {cockroachdb-0.cockroachdb:26257 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"ca\")". Reconnecting...
      W191003 12:56:16.465089 46 vendor/google.golang.org/grpc/server.go:603 grpc: Server.Serve failed to complete security handshake from "x.x.x.x:42218": remote error: tls: bad certificate
      I191003 12:56:16.465308 80 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322 [n1] circuitbreaker: gossip 0.0.0.0:26257->cockroachdb-0.cockroachdb:26257 tripped: initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"ca\")"
      I191003 12:56:16.465377 80 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:447 [n1] circuitbreaker: gossip 0.0.0.0:26257->cockroachdb-0.cockroachdb:26257 event: BreakerTripped
      W191003 12:56:16.465442 80 gossip/client.go:122 [n1] failed to start gossip client to cockroachdb-0.cockroachdb:26257: initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"ca\")"
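When a pod produces a lot of output like this, it can help to split the log into entries and count which TLS errors actually occur. The following is only a rough sketch in Python: the regular expressions are an assumption based on the entry layout visible in the excerpt above (severity letter, date, time, goroutine id, source file and line, message), and the function names are made up for illustration.

```python
import re
from collections import Counter

# Zero-width marker for the start of an entry: severity letter, yymmdd, hh:mm:ss.micros.
ENTRY_START = re.compile(r"(?=\b[IWEF]\d{6} \d{2}:\d{2}:\d{2}\.\d{6} )")

# One full entry, as laid out in the excerpt above (assumed layout, not an official grammar).
ENTRY = re.compile(
    r"(?P<severity>[IWEF])(?P<date>\d{6}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) "
    r"(?P<goroutine>\d+) (?P<source>\S+:\d+) (?P<message>.*)",
    re.DOTALL,
)


def split_entries(text):
    """Split raw log text into a list of dicts, one per log entry."""
    entries = []
    for chunk in ENTRY_START.split(text):
        match = ENTRY.match(chunk.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def count_tls_failures(text):
    """Count entries by the kind of TLS failure mentioned in the message."""
    counts = Counter()
    for entry in split_entries(text):
        if "certificate signed by unknown authority" in entry["message"]:
            counts["unknown authority"] += 1
        elif "tls: bad certificate" in entry["message"]:
            counts["bad certificate"] += 1
    return counts
```

Feeding it the saved output of kubectl logs for one pod gives a quick picture of whether every failure is the same "unknown authority" problem or whether something else is mixed in.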

      This manifest works just fine in GCP. I would appreciate any help fixing this issue.

        • 1. Re: CockroachDB cluster issue
          daphnissov Guru
          vExpert, Community Warriors

          I'm still playing around with this and trying to identify the cause. I see the same thing as you. My guess is that it comes down to how PKS issues the CA for the Kubernetes cluster (and therefore how the node certs get assigned, since the controller manager uses this cert as its root). Passing the --insecure flag should get it going, but then you'd have to secure it later on. If I have more time to keep poking, I'll do so.

          • 2. Re: CockroachDB cluster issue
            dinish Novice

            daphnissov, thanks for your efforts.

             

            From what I can see, the CA that signs the CSRs (--cluster-signing-cert-file=/var/vcap/jobs/kube-controller-manager/config/cluster-signing-ca.pem) is different from the CA certificate that CockroachDB symlinks (-symlink-ca-from=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt) into -certs-dir=/cockroach-certs. Is that what is causing the issue?
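One quick way to check that suspicion is to compare fingerprints of the two CA files. Below is a minimal sketch in Python, assuming both PEM files have been copied somewhere readable; the function names are illustrative and not part of any CockroachDB or Kubernetes tooling. Identical fingerprints mean the same certificate. A mismatch strongly suggests two different CAs, though it is not proof on its own, since a CA certificate can be re-issued with the same key.

```python
import hashlib
import re
import ssl

# Matches each certificate block in a PEM file, which may contain several certificates.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def cert_fingerprints(pem_text):
    """Return the SHA-256 fingerprints of every certificate in a PEM bundle."""
    fingerprints = set()
    for block in PEM_BLOCK.findall(pem_text):
        der = ssl.PEM_cert_to_DER_cert(block)  # strips the armor and base64-decodes
        fingerprints.add(hashlib.sha256(der).hexdigest())
    return fingerprints


def shares_ca(signing_ca_pem, trusted_bundle_pem):
    """True if at least one signing certificate also appears in the trusted bundle."""
    return bool(cert_fingerprints(signing_ca_pem) & cert_fingerprints(trusted_bundle_pem))
```

To use it, read the contents of cluster-signing-ca.pem and of the service account ca.crt into strings and pass them to shares_ca; a False result would be consistent with the handshake errors in the logs.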

             

            I also tried the bring-your-own-certs approach with pre-created certificates, and that works.

            • 3. Re: CockroachDB cluster issue
              daphnissov Guru
              Community Warriors, vExpert

              Yes, I'm betting that's the reason for the failure. The CSRs are signed with one cert and the CA cert is another, hence the lack of trust. In PKS, the cluster signing cert file is generated at the time the cluster is built, but the root CA file is generated when PKS itself is stood up. Because both are essentially self-signed, if I were going to do this in a secured way, I'd just bring my own certificates anyhow.

              • 4. Re: CockroachDB cluster issue
                dinish Novice

                daphnissov, thanks. I'm proceeding with my own certificates.