---
authors: Edoardo Spadolini ([email protected])
state: draft
---

# RFD 213 - Relay, a lightweight tier 2 proxy

## Required Approvers

* Engineering: @rosstimothy
* Product: @klizhentas

## What

A new cluster component, called Relay, that acts as a lightweight proxy: it receives reverse tunnel connections from agents and connections from clients, and routes client connections to resources without those connections having to go back and forth through the broader control plane.

## Why

Geographically distributed Teleport proxies are quite good at reducing latency when accessing resources in large, widespread clusters. However, for workloads such as large data transfers where the originator and the receiver of the connection are physically in the same datacenter, on a high performance network with no ingress/egress costs, it can be very beneficial to arrange for connections not to cross the internet (or some other slow and/or expensive network link) just to reach the Teleport control plane and immediately get forwarded back through the same link.

On-prem setups can sometimes use trusted clusters in such scenarios (they're not supported at all in Teleport Cloud) to keep connections limited to a given datacenter, office, or geographical region (from here on out, a "subenvironment"). However, a trusted cluster setup requires a full, independent Teleport leaf cluster with its own set of permissions and with separate auditing and recordings; using the leaf cluster directly with credentials from the parent cluster is only partially supported, and getting credentials from the leaf generally requires duplicating users, SSO connectors or bots.

This RFD proposes an alternative that's entirely focused on connection topology: a peripheral agent service that serves as a lightweight, tier 2 proxy, called a Relay. A Teleport instance running the Relay service will itself connect to the control plane like any other agent; it will receive and manage reverse tunnel connections from agents in the same subenvironment, and it will receive and (passively) route connections from clients to those agents.

## Details

### Relays, relay groups and the intended relay deployment

A Relay is a Teleport instance running the `relay_service`, with host credentials for the builtin role `Relay` (just like the host credentials used by every other Teleport service). The relay runs:

* a "relay tunnel" server (similar in functionality to the Proxy service's reverse tunnel server), which handles connections from agents that wish to provide connectivity to their own services through the relay;
* servers that provide connectivity from clients in the subenvironment to the tunneled agents: a gRPC `teleport.transport.v1.TransportService` server for SSH, like the proxy does, and an SNI-based forwarder for Kubernetes access;
* a "relay peering" server, used by other relay instances of the same group to forward client connections when a tunnel is not directly available.

The tunnel listener also serves some configuration data to the agents; for now, this consists of the relay group name and the target connection count. Implicitly, the configuration also includes the address of the tunnel load balancer. Relays will also advertise their group name and their individual peering address in their heartbeats, to be used by other relays to route client connections when a tunnel is not available locally. No configuration needs to be given to clients for now, so there's no API centered around that.

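To make the two paragraphs above more concrete, the sketch below shows what a relay's configuration file could look like; the `relay_service` section and every field name in it are hypothetical and only meant to illustrate the kind of settings involved (group name, target connection count, and the various listeners), not the final configuration surface.

```yaml
# Hypothetical teleport.yaml for a Relay instance; the relay_service fields are
# illustrative assumptions, not part of this RFD's committed API surface.
version: v3
teleport:
  proxy_server: teleport.example.com:443        # control plane, used for joining and Teleport API access
relay_service:
  enabled: true
  relay_group: dc1                              # group name served to agents and advertised in heartbeats
  target_connection_count: 2                    # how many distinct relays each agent should keep tunnels to
  tunnel_listen_addr: 0.0.0.0:3040              # agent-facing tunnel listener (behind the tunnel load balancer)
  client_listen_addr: 0.0.0.0:3041              # client-facing SSH transport and Kubernetes SNI listeners
  peer_listen_addr: 0.0.0.0:3042                # relay peering listener, for relays of the same group
  peer_public_addr: relay-1.dc1.example.internal:3042  # individual peering address advertised in heartbeats
```
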
Relays will route connections from clients to agents within the same subenvironment. Client connections from the subenvironment to the relays will never be forwarded to the broader cluster if the agent that the client is trying to reach is not connected to the relay group. All Teleport API access (agent joining, heartbeats, audit logging and session recording upload, client login), as well as use of the Teleport web UI and the regular reverse tunnels from agents to the control plane, will use the regular Teleport control plane directly and will not be handled or forwarded by the relays. This is mostly to keep the required permissions for the `Relay` role as narrow as possible, but it also simplifies the implementation and avoids adding another potential point of failure for internal connectivity.

This RFD will detail support for (agentful) SSH and Kubernetes; other protocols are out of scope. What _can_ be supported in the future varies based on the protocol and the class of client used to access it: everything should be supportable through a local `tsh proxy` (and/or Teleport Connect and/or Teleport VNet), direct client connections for Database access might be possible, and App access seems harder to support in its direct client connection mode, as it's heavily tied to the web UI.

### Agent configuration and behavior

Agents will be configured with a `proxy_server` address pointed at the usual Proxy public address (which is still going to be used for the initial joining process and for all Teleport API access), but they can optionally also include a `relay_server` address pointing at the relay tunnel address. If such an address is specified, agents will connect to it to fetch the relay group configuration data (from any of the relays). They will then open tunnel connections until they've successfully opened tunnels to as many distinct relay instances as directed by the target connection count. Relays that are being gracefully terminated will report themselves as such to the connected agents, and agents will not count those relays toward the connection quota. Agents will include the relay group name and the list of relay host IDs they're connected to in their resources' heartbeats (the same way that proxy IDs are advertised in heartbeats today when proxy peering is enabled).

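For illustration, a minimal agent configuration following this scheme could look like the sketch below: `proxy_server` is an existing field, `relay_server` is the new optional field proposed here (its exact name and placement may still change), and the addresses are placeholders.

```yaml
# Agent teleport.yaml sketch: the agent joins and heartbeats through the control
# plane as usual, and additionally opens tunnels through the relay tunnel
# load balancer of its subenvironment.
version: v3
teleport:
  proxy_server: teleport.example.com:443          # control plane: joining, API access, regular reverse tunnels
  relay_server: relay-tunnel.dc1.example.com:443  # proposed field: relay tunnel load balancer of the local group
ssh_service:
  enabled: true
```
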
The discovery data will be re-fetched periodically to allow for changes to the target connection count. Scaling the connection count down will likely require a new rollout of relays, since tunnel connections are not dropped unless the relay is shutting down, but scaling the count up will take effect quickly. Agents will assume that hitting the tunnel load balancer address will eventually reach enough distinct relays, and it is the responsibility of the environment to run enough relay instances in the group for that to be the case.

As part of future development, agents could be allowed to keep track of relay instances and connect to specific ones individually; this will require keeping agents up to date about the relay instances in the group, through an event fanout system similar to the one used by the regular reverse tunnel.

As part of future development, agents could be configured to only open tunnels to the relay group and not open the regular reverse tunnels to the control plane. This should require explicit configuration in the agents (to prevent an outdated, misconfigured or malicious relay from restricting access to resources from the broader control plane), as well as support for tunnel connectivity in the relay and proxy services (including logic on how to route connections to an agent that's advertising relay-only connectivity).

### Client behavior

A relay address (the address of the load balancer in front of the "client" servers of the relay service deployment) can be specified at `tsh login` time, and it can be explicitly specified and/or disabled as part of the invocation of `tsh ssh`, `tsh proxy ssh`, `tsh kube login` or `tsh proxy kube`, with either a command line option or an environment variable; as usual, the command line option overrides the environment variable, which overrides the configuration stored at login time. If no relay address is specified (and relay use is not explicitly disabled), the cluster will be able to provide a default relay address for the user at login time; in the first implementation, this default relay address is going to be chosen by taking the value of a specific user trait, which can be configured in the list of traits for local users, passed to Teleport at login time by an SSO connector, or set by matching on other traits through a login rule. The logic to choose the default relay address will be implemented in the Auth Service, not on the client side, so it will be possible to consistently change it without requiring client updates.

Access through Machine ID will need explicit configuration as part of the "output" or "service" in use, and any need to rapidly switch between relay and non-relay routing can be served by creating two copies of a given output or service. At least in the first implementation of the client tooling side of this feature, there will be no support for automatic switching or fallback between relay and non-relay connection routing, and no safeguards against connecting through the broader control plane even when a relay could've been used. This is a deliberate choice to prioritize the ability to guarantee that connections intended to stay within a given subenvironment don't accidentally bounce through the control plane.

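For example, a Machine ID setup that needs both routing modes might define two otherwise identical outputs, one routed through the relay and one not; the `relay` field in the sketch below is purely hypothetical, since this RFD doesn't specify the exact `tbot` configuration knob.

```yaml
# Hypothetical tbot.yaml fragment: two copies of the same identity output,
# differing only in whether connections are routed through the relay group.
outputs:
  - type: identity
    destination:
      type: directory
      path: /opt/machine-id/direct
  - type: identity
    relay: relay.dc1.example.com:443   # hypothetical field, not an existing tbot option
    destination:
      type: directory
      path: /opt/machine-id/relay
```
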
Manually specifying a relay address at login time in Teleport Connect is left for future design work, but possible options include an "Advanced..." button in the "Enter cluster address" modal, or some UI element to change settings for the active cluster after logging in. Editing the profile configuration file is also an option (albeit one that we don't necessarily want to encourage or support), as is using `tsh ssh --relay <relayaddr> <user>@<host>` in a terminal tab instead of choosing a server to connect to in the list of resources.

For SSH, the existing proxy transport protocol will also be implemented by relays, which will run the same connection router that's used by the proxy to resolve a connection target into a host ID. Unlike the actual proxies, however, relays will only ever route connections to resources that are available within the same relay group and will never open TCP connections to direct-dial agents; to avoid giving relays any more privileges than they strictly require, agentless SSH servers and proxy recording mode are not going to be supported. For consistency with the router in the Teleport proxy, host ID resolution will happen against the full SSH inventory, without filtering for nodes available in the subenvironment, and the usual rules will apply in case of an ambiguous match. The list of SSH servers shown with `tsh ls` will be updated to show `<-relay (<relayname>)` rather than `<-tunnel` for nodes connected to a relay, and similar UI changes will be applied to the web UI and Connect.

For Kubernetes access through a relay we will tweak the protocol to include the destination cluster name as part of the SNI of the connection, and we will tweak the host-side credentials of Kubernetes service agents to also include a wildcard DNS SAN (with a suffix to be determined) to allow unmodified Kubernetes API clients to validate the server certificate of the Kubernetes service. Connections coming from the regular control plane of the cluster will remain unmodified. The relay will check the inventory of Kubernetes clusters based on the name extracted from the SNI, pick a host serving the requested cluster, and forward the TLS connection along without terminating it; similarly to SSH, the resolution will happen against the full inventory, and if the resolved host ID is not connected to the relay group then the connection will fail.

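The Go sketch below illustrates one plausible shape of that routing step: extract the cluster name from the SNI, resolve it against the full inventory, and only forward to a host that is reachable through the relay group. The SNI suffix and the `Inventory`/`TunnelSet` interfaces are assumptions made for illustration; the real suffix is still to be determined.

```go
package relay

import (
	"fmt"
	"strings"
)

// Hypothetical SNI suffix; the actual suffix is left to be determined.
const kubeSNISuffix = ".kube-relay.example.internal"

// Inventory is a stand-in for the relay's read-only view of the cluster inventory.
type Inventory interface {
	// KubeServersByCluster returns the host IDs advertising the named Kubernetes cluster.
	KubeServersByCluster(name string) []string
}

// TunnelSet is a stand-in for the tunnels reachable through this relay group.
type TunnelSet interface {
	// Reachable reports whether the host ID is connected to this relay or to a
	// peer relay of the same group (per the host's heartbeat).
	Reachable(hostID string) bool
}

// routeKubeBySNI maps the SNI of an incoming TLS connection to the host ID of a
// Kubernetes agent reachable through the relay group. The TLS connection itself
// is forwarded without termination, so no credentials are handled here.
func routeKubeBySNI(sni string, inv Inventory, tunnels TunnelSet) (string, error) {
	if !strings.HasSuffix(sni, kubeSNISuffix) {
		return "", fmt.Errorf("unexpected SNI %q", sni)
	}
	cluster := strings.TrimSuffix(sni, kubeSNISuffix)

	// Resolution happens against the full inventory, mirroring the proxy router.
	candidates := inv.KubeServersByCluster(cluster)
	if len(candidates) == 0 {
		return "", fmt.Errorf("kubernetes cluster %q not found", cluster)
	}
	// Fail rather than falling back to the control plane if no serving host is
	// connected to the relay group.
	for _, hostID := range candidates {
		if tunnels.Reachable(hostID) {
			return hostID, nil
		}
	}
	return "", fmt.Errorf("kubernetes cluster %q is not connected to this relay group", cluster)
}
```
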
### Connection routing and relay peering

Clients connecting to a relay will (implicitly or explicitly) specify a target, which will be resolved to a target host ID that's hopefully known to be available through some of the relays of the group. If the relay chosen by the load balancer is one of them (i.e. if the relay serving the connection request has a tunnel for that host ID), the connection will just go through the reverse tunnel; otherwise, the connection will be forwarded to one of the listed relays, together with the original client source address, through a mechanism similar to the one used by proxy peering. Seeing as latency between relays is not a factor, however, it's likely going to be better to open an individual connection for each forwarded client connection rather than do the same multiplexing over a shared preexisting channel.

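A minimal sketch of that decision is shown below, assuming hypothetical `localTunnels` and `peerDialer` interfaces; note that each forwarded client connection gets its own connection to the peer relay rather than being multiplexed over a shared channel.

```go
package relay

import (
	"context"
	"fmt"
	"net"
)

// localTunnels is a stand-in for the reverse tunnels held by this relay instance.
type localTunnels interface {
	Has(hostID string) bool
	Dial(hostID string) (net.Conn, error)
}

// peerDialer is a stand-in for the relay peering client; each forwarded client
// connection opens its own connection to the peer relay, since inter-relay
// latency is not a concern.
type peerDialer interface {
	DialThroughPeer(ctx context.Context, peerHostID, targetHostID string, clientSrc net.Addr) (net.Conn, error)
}

// dialTarget connects to a resolved target host ID, preferring a local reverse
// tunnel and falling back to the relays advertised in the target's heartbeat.
// It never falls back to routing through the broader control plane.
func dialTarget(ctx context.Context, targetHostID string, advertisedRelayIDs []string, local localTunnels, peers peerDialer, clientSrc net.Addr) (net.Conn, error) {
	if local.Has(targetHostID) {
		return local.Dial(targetHostID)
	}
	for _, peerID := range advertisedRelayIDs {
		conn, err := peers.DialThroughPeer(ctx, peerID, targetHostID, clientSrc)
		if err == nil {
			return conn, nil
		}
	}
	return nil, fmt.Errorf("host %s is not reachable through this relay group", targetHostID)
}
```
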
The initial implementation won't include the ability for connections from the control plane to go through the relay, so the behavior of those connections is left unchanged.

### Security considerations

Relays will be limited to passive connection routing, and will never be in possession of cleartext data or credentials. They will be allowed to fetch and watch the full resource inventory in the cluster to make routing decisions, but the routing decisions will only expose information that's equivalent (or similar enough) to what is exposed today as a result of routing for SSH (and Desktops, soon) in the proxy. The existence of a given Kubernetes cluster by name is something that's currently not revealed to users who don't have access to that cluster, but it's the same information disclosure that we have accepted for SSH servers and desktops, and it allows us to avoid granting every Relay Kubernetes impersonation powers similar to those of the Proxy instances.

The source IP information will be the one seen by the relay (or rather by the load balancer in front of the relay, forwarding the client IP information through a PROXY header, as is currently supported for most Teleport listeners), which means that the same IP address might be seen in multiple unrelated audit log events even when it belongs to different sources in different subenvironments that just happened to use the same private IP address. This, however, is essentially the dual of seeing the same public client IP from different machines behind NAT; and seeing as no local IP address will ever "leak" to the broader control plane, the use of relays will result in more granular and precise logging information.

Credentials with source IP pinning will not be usable through the relay, since the only source IP accessible at login time (let alone plausibly trustable by the control plane) is the public one as seen from the proxy, which is not the one visible to the relay.

Relay instances will use a listener that should be reachable by agents, some listeners that should be reachable by clients, and a listener that should be reachable by other relays of the same group, so network rules can be put in place to restrict connectivity separately for each port and/or load balancer. All connections handled by the relay are authenticated through mTLS, using certificates signed by the host and user CAs of the cluster, except for Kubernetes access connections, which are forwarded without termination to a Kubernetes agent (which will, in turn, use mTLS to authenticate and authorize the connection).