Description
Today the flag --sql-advertise-addr is offered for symmetry with --advertise-addr.
We're not too happy about the status quo. There is more work needed.
Background
This flag is necessary at all only because of the following specific combination:
- if the cluster is multi-DC, the private address used to start up each node is inadequate for establishing connections from another DC
- for this reason, we also store a separate "public" address, provided by the operator via the flag --advertise-addr for the RPC port; this is used internally, e.g. for node-node connections
- for symmetry with the RPC port, we also implement --sql-advertise-addr for when the SQL listener is split from RPC, which we'd like to make a security best practice
- (we're also implementing a separate tenant advertise addr for when the tenant server is split off in a multi-tenant config)
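To make the distinction between these addresses concrete, here is a minimal Go sketch. The type nodeAddrs and its fields are hypothetical and only loosely modeled on the idea of a node descriptor; this is not the actual roachpb.NodeDescriptor definition. The fallback from the advertised SQL address to the advertised RPC address is likewise an assumption made for illustration.

```go
package main

import "fmt"

// nodeAddrs is a hypothetical, simplified stand-in for the address fields
// of a node descriptor. It is NOT the real roachpb.NodeDescriptor.
type nodeAddrs struct {
	RPCListen    string // private address the RPC listener binds to
	RPCAdvertise string // public address peers dial (cf. --advertise-addr)
	SQLAdvertise string // public SQL address (cf. --sql-advertise-addr); may be empty
}

// rpcDialAddr returns the address a peer should use for node-node RPCs:
// the advertised address if set, otherwise the listen address.
func (n nodeAddrs) rpcDialAddr() string {
	if n.RPCAdvertise != "" {
		return n.RPCAdvertise
	}
	return n.RPCListen
}

// sqlDialAddr returns the address to use for SQL connections.
// Assumption: without a separate advertised SQL address, SQL is reached
// at the same address as RPC.
func (n nodeAddrs) sqlDialAddr() string {
	if n.SQLAdvertise != "" {
		return n.SQLAdvertise
	}
	return n.rpcDialAddr()
}

func main() {
	n := nodeAddrs{
		RPCListen:    "10.0.0.5:26257",
		RPCAdvertise: "n1.dc1.example.com:26257",
		SQLAdvertise: "n1.dc1.example.com:26258",
	}
	fmt.Println("RPC:", n.rpcDialAddr())
	fmt.Println("SQL:", n.sqlDialAddr())
}
```

The host names and ports are placeholders. The point is only that a node in another DC must dial the advertised address, never the private listen address.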
Pros of the feature
- in the node descriptor (roachpb.NodeDescriptor): all advertised addresses are stored there.
- the RPC adv addr (not SQL) is pulled from node descs for node-node connections. This is necessary for normal cluster operation.
- inside SQL, the SQL adv addr is used to populate the vtable crdb_internal.node_build_info. This might be used by 3rd party automation to discover the addresses of SQL servers, e.g. to automate setting up load balancers. We're not sure (it's also not documented).
- cockroach debug zip needs to establish SQL connections to every node in the cluster, in addition to RPC connections; so even if debug zip is run on one of the nodes' VMs, it may still need to reach out to other DCs using the nodes' public addresses. Currently debug zip discovers the RPC and SQL public addresses by looking at the node descriptor on the first node it connects to. That's necessary for proper operation of debug zip, but the need will go away once "Ability to obtain debug.zip directly via http endpoint" (#51008) is addressed.
So in summary, while a separate RPC adv address is necessary for proper operation, the utility of a separate SQL adv address is relatively small.
Cons of the feature
- users/customers come along and ask "How do I configure this properly?", which creates a support burden.
- the feature is only relevant for clusters that split the RPC and SQL listeners onto separate TCP addresses/ports. This is not the default config in CockroachDB.
- SQL client apps do not need an "advertised address" to be configured server-side. SQL apps learn the address of the CockroachDB nodes via the postgres connection URL, and/or the load balancer.
- we don't even use the advertised SQL addr in cockroach gen haproxy, so our own recommended configuration generator for load balancers doesn't use this feature properly
Ways forward
There are two possible ways forward, with different action items:
- remove the concept:
  - address "Ability to obtain debug.zip directly via http endpoint" (#51008) and ensure that all the debug zip logic (which becomes server-side) operates using RPCs, without the need for node-to-node SQL connections.
  - explain in docs that SQL load balancers must be configured using the public addresses, and add a check in the cockroach gen haproxy command that all nodes are reachable using the selected address
  - deprecate the flag --sql-advertise-addr and eventually remove it.

  (Note: in this approach, it would still be possible to split RPC and SQL listeners; we just wouldn't need to configure or store an advertised SQL address server-side.)
OR
- double down on the feature:
  - make cockroach gen haproxy use the advertised SQL address
  - document how to use separate SQL and RPC listeners by default (and maybe make this the default config)
  - explain in docs how the "advertised SQL address" is important and why, and how to configure it
Note that in this second approach, until we have more comprehensive deployment docs, it is confusing to mention the option to users upfront, as it raises more questions than it answers. It may therefore be a good idea to hide the flag until then.
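For the second approach, making the load balancer generator prefer the advertised SQL address could be sketched as follows. This is a hedged illustration, not the actual cockroach gen haproxy implementation or its output format: the lbNode type, the haproxyServerLines function, and the server naming are all hypothetical. Only the general HAProxy "server <name> <address> check" line syntax is taken as given.

```go
package main

import (
	"fmt"
	"strings"
)

// lbNode is a hypothetical view of the per-node data that a load balancer
// config generator would need.
type lbNode struct {
	ID           int
	RPCAdvertise string // advertised RPC address
	SQLAdvertise string // advertised SQL address; empty if not configured
}

// haproxyServerLines emits one HAProxy "server" line per node, preferring
// the advertised SQL address and falling back to the advertised RPC
// address when no separate SQL address is known (an assumption).
func haproxyServerLines(nodes []lbNode) string {
	var b strings.Builder
	for _, n := range nodes {
		addr := n.SQLAdvertise
		if addr == "" {
			addr = n.RPCAdvertise
		}
		fmt.Fprintf(&b, "    server cockroach%d %s check\n", n.ID, addr)
	}
	return b.String()
}

func main() {
	nodes := []lbNode{
		{ID: 1, RPCAdvertise: "n1.example.com:26257", SQLAdvertise: "n1.example.com:26258"},
		{ID: 2, RPCAdvertise: "n2.example.com:26257"},
	}
	fmt.Print(haproxyServerLines(nodes))
}
```

With this fallback, clusters that do not split the SQL and RPC listeners would see no change in the generated configuration.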
Jira issue: CRDB-3971