Description
Reported by @meridional
Setup: we create multi-region (MR) clusters in CockroachDB Cloud (CC) using one Kubernetes cluster per region. The nodes in each region share a single SRV hostname that represents their gRPC port, and we use the following flags for nodes to discover each other:
--listen-addr 0.0.0.0:26258
--http-addr 0.0.0.0:8080
--sql-addr 0.0.0.0:26257
--advertise-addr cockroachdb-42bqr.cockroachdb.us-east4.svc.cluster.local:26258
--advertise-sql-addr cockroachdb-42bqr.cockroachdb.us-east4.svc.cluster.local:26257
--join _grpc._tcp.cockroachdb.asia-southeast1.svc,_grpc._tcp.cockroachdb.us-east4.svc,_grpc._tcp.cockroachdb.us-west2.svc
The flags above are taken from one node of a 3-region cluster running the image http://us-docker.pkg.dev/cockroach-cloud-images/cockroachdb/cockroach:v22.2.7.
The DNS records are set up using headless Kubernetes services (clusterIP: None) with the following spec:
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: sql
    port: 26257
    protocol: TCP
    targetPort: 26257
  - name: grpc
    port: 26258
    protocol: TCP
    targetPort: grpc
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  publishNotReadyAddresses: true
  selector:
    crdb.cockroachlabs.com/cluster: cockroachdb
    svc: cockroachdb
  sessionAffinity: None
  type: ClusterIP
These services answer both SRV and A lookups for the names we use in the --join flag, such as _grpc._tcp.cockroachdb.asia-southeast1.svc. We don't control how or when the records are generated, but our best guess is that Kubernetes populates them once the CRDB pods are running and have been assigned a pod IP.
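For reference, the Kubernetes DNS specification publishes an SRV record for each named Service port under _<port-name>._<protocol>.<service>.<namespace>.svc.<cluster-domain>, which is how the grpc port in the spec above maps to the _grpc._tcp. prefix in the --join names. Go's net.LookupSRV(service, proto, name) queries the same _service._proto.name form. The sketch assumes cockroachdb is the service name and the region (e.g. asia-southeast1) is the namespace, an inference from the hostnames above rather than something the report states; srvName is a hypothetical helper.

```go
package main

import "fmt"

// srvName builds the SRV query name Kubernetes DNS publishes for a named
// Service port: _<port>._<proto>.<service>.<namespace>.svc
// The cluster domain (e.g. cluster.local) is omitted, as in the --join flag.
func srvName(portName, proto, service, namespace string) string {
	return fmt.Sprintf("_%s._%s.%s.%s.svc", portName, proto, service, namespace)
}

func main() {
	// Assumption: service "cockroachdb" in namespace "asia-southeast1".
	fmt.Println(srvName("grpc", "tcp", "cockroachdb", "asia-southeast1"))
	// Prints: _grpc._tcp.cockroachdb.asia-southeast1.svc
	//
	// The equivalent standard-library lookup (performs real DNS, so not run here):
	//   net.LookupSRV("grpc", "tcp", "cockroachdb.asia-southeast1.svc")
}
```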
The issue: if region A's nodes start running before region B's nodes (so B's DNS records do not exist yet), region A tries to join B on port 26257, which is B's SQL port rather than its gRPC port 26258, and gets stuck in a retry loop even after B's nodes are up. Restarting the nodes in region A fixes the issue. Logs from region A when the issue happens:
W230410 16:42:17.662906 54727 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 1935 ‹[core]›‹[Channel #5841 SubChannel #5842] grpc: addrConn.createTransport failed to connect to {›
W230410 16:42:17.662906 54727 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 1935 +‹ "Addr": "_grpc._tcp.cockroachdb.asia-southeast1.svc:26257",›
W230410 16:42:17.662906 54727 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 1935 +‹ "ServerName": "_grpc._tcp.cockroachdb.asia-southeast1.svc:26257",›
W230410 16:42:17.662906 54727 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 1935 +‹ "Attributes": null,›
W230410 16:42:17.662906 54727 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 1935 +‹ "BalancerAttributes": null,›
W230410 16:42:17.662906 54727 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 1935 +‹ "Type": 0,›
W230410 16:42:17.662906 54727 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 1935 +‹ "Metadata": null›
W230410 16:42:17.662906 54727 google.golang.org/grpc/grpclog/component.go:41 ⋮ [-] 1935 +‹}. Err: connection error: desc = "transport: authentication handshake failed: context deadline exceeded"›
W230410 16:42:17.663353 108 server/init.go:420 ⋮ [n?] 1936 outgoing join rpc to ‹_grpc._tcp.cockroachdb.asia-southeast1.svc:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: context deadline exceeded"›
Logs from region B:
E230410 16:40:02.963899 178402 1@server/server_sql.go:1464 ⋮ [n1,client=‹10.16.0.11:33866›] 4295 serving SQL client conn: message size ‹352 MiB› bigger than maximum allowed message size ‹16 MiB›
10.16.0.11 is the IP of a CRDB pod in region A.
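The 352 MiB figure is consistent with region A's gRPC client starting a TLS handshake against region B's SQL port, where the server reads the first four bytes of the connection as a big-endian pgwire length prefix. This is our interpretation rather than something the logs state. A TLS record starts with content type 0x16 (handshake) followed by the record version, which Go's crypto/tls sets to 0x0301 for the initial ClientHello; the fourth byte is the high byte of the record length, assumed here to be 0x00 (0x01 gives the same MiB value). pgwireLengthPrefix is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// maxMessageSize mirrors the 16 MiB limit quoted in the region B log.
const maxMessageSize = 16 << 20

// pgwireLengthPrefix returns how a pgwire server interprets the first four
// bytes of a connection: a big-endian 32-bit message length.
func pgwireLengthPrefix(first4 []byte) uint32 {
	return binary.BigEndian.Uint32(first4)
}

func main() {
	// Start of a TLS record carrying a ClientHello: content type 0x16,
	// record version 0x03 0x01, then the high byte of the 16-bit record
	// length (assumed 0x00 here).
	header := []byte{0x16, 0x03, 0x01, 0x00}
	size := pgwireLengthPrefix(header)
	fmt.Printf("%d bytes = %d MiB, over limit: %v\n", size, size>>20, size > maxMessageSize)
	// Prints: 369295616 bytes = 352 MiB, over limit: true
}
```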
Jira issue: CRDB-27433