RFD 0225: In-Band MFA for SSH Sessions 🔒🧑‍💻 #59141

cthach · 2025-09-15T16:25:39Z

What

An RFD proposing to move multi-factor authentication (MFA) enforcement for SSH from out-of-band to in-band, as part of SSH session establishment.

The proposal includes new gRPC endpoints and an MFA verification layer on top of SSH, with a migration and deprecation plan for backward compatibility.

See more background and motivations in this comment.

Proofs of Concept

There were multiple PoCs done for this RFD.

~~The first one where TransportService did MFA enforcement can be found here~~

The second and latest iteration, where the SSH service at the Teleport Agent performs MFA enforcement within the SSH protocol, is here.

Signed-off-by: Chris Thach <[email protected]>

rosstimothy · 2025-09-16T19:34:36Z

rfd/0224-in-band-mfa-ssh-sessions.md

+All SSH traffic destined for target nodes will be proxied through the Proxy service, which will handle authentication,
+authorization, and session management. Direct SSH connections to nodes will deprecated and removed after the [transition
+period](#backward-compatibility).


What do you mean by direct SSH connections to nodes here? Is this connections to direct dial nodes?

I believe so? It is the case where a user is able to dial directly to the node's SSH server if they're on the same network as the node without access to Proxy or Auth services and the SSH server is exposed on the network.

Now that I think about it after our conversation yesterday, do we even support that?

Connections to direct dial nodes still go through the Proxy and are not direct connections from tsh to the Teleport SSH agent.

It is the case where a user is able to dial directly to the node's SSH server if they're on the same network as the node without access to Proxy or Auth services and the SSH server is exposed on the network

This is an infrequent but supported use case today, though it would likely be better solved via the Relay server. That raises the question, though: how will this work for users connecting via the Relay server? Will the Relay server call out to the Proxy/Decision service to mediate the MFA ceremony?

I thought about this. Either it continues to be handled mostly on the client, or it moves to the control plane and is more deeply embedded in the protocol. I'm leaning towards the latter since it is architecturally consistent with the Proxy<->Client model we're moving towards in this RFD.

Isn't the Relay a proxy anyway? Couldn't we just implement TransportServiceV2 there also for consistency?

There are other ways we can solve this that we discussed, like following WebRTC patterns and embracing more peer-to-peer, but that is too much of a shift from where we are and where we have already started heading.

Isn't the Relay a proxy anyway? Couldn't we just implement TransportServiceV2 there also for consistency?

Yes, the Relay will need to implement TransportServiceV2. It already implements TransportService to facilitate the connections. However, it will likely not have a local Decision service running like the Proxy does, so it will likely have to broker the MFA ceremony with the Proxy/Auth somehow to uphold the guarantees that we want w.r.t. in-band per-session MFA.

I believe you missed one key difference: in the proposed change, per-session MFA certificates are no longer required to convey that MFA was completed for a session.

I'm in agreement about removing per-session MFA certificates, the flow I presented has the same outcome. The Proxy is in the position to accept/deny the connection.

In this aspect, the difference between our two proposals is whether:

the Client presents the mfa response to Proxy, Proxy validates it with Auth

the Client presents the mfa response to Auth with a challenge ID attached, Auth validates it, Auth sends a validation success to the Proxy using the challenge ID

To be clear, I still don't see any benefit to the more complex StartAuthenticateChallenge and CompleteAuthenticateChallenge flow currently. It will just make the implementation more difficult and require more backend/Auth resources - to track/watch/reply to Proxy challenge requests in a headless-like implementation.

Another thing to keep in mind is that there can be multiple Auth servers, so I don't believe you can guarantee that the Client is handling the challenge on the same Auth Server that Proxy sent the StartAuthenticateChallenge request to. To achieve this type of flow you would need to create a watcher for the Challenge ID on the backend rather than any inter-process communication. You could create this watcher in the Auth server as part of StartAuthenticateChallenge, or make StartAuthenticateChallenge a non streaming endpoint and have the Proxy create the watcher after starting the challenge. Headless takes the latter approach.
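To make the watcher idea concrete, here is a minimal Go sketch. It assumes a shared backend that every Auth server can write to; `Backend`, `Watch`, and `Complete` are hypothetical names for illustration and are not Teleport's backend API. The point is only that the Proxy waits on the challenge ID in shared storage rather than on a particular Auth server process.

```go
package main

import (
	"fmt"
	"sync"
)

// Backend is a hypothetical stand-in for the shared cluster backend that every
// Auth server reads and writes. It is not Teleport's backend API.
type Backend struct {
	mu       sync.Mutex
	watchers map[string][]chan bool
}

func NewBackend() *Backend {
	return &Backend{watchers: make(map[string][]chan bool)}
}

// Watch registers interest in a challenge ID and returns a channel that
// receives the validation result once any Auth server records it.
func (b *Backend) Watch(challengeID string) <-chan bool {
	ch := make(chan bool, 1) // buffered so Complete never blocks on a send
	b.mu.Lock()
	defer b.mu.Unlock()
	b.watchers[challengeID] = append(b.watchers[challengeID], ch)
	return ch
}

// Complete is called by whichever Auth server validated the client's MFA
// response. It notifies every watcher registered for that challenge ID.
func (b *Backend) Complete(challengeID string, ok bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.watchers[challengeID] {
		ch <- ok
	}
	delete(b.watchers, challengeID)
}

func main() {
	backend := NewBackend()

	// The Proxy starts a challenge via Auth server A, then watches the ID.
	result := backend.Watch("challenge-123")

	// The client completes the ceremony against Auth server B, which shares
	// only the backend with Auth server A.
	go backend.Complete("challenge-123", true)

	fmt.Println("MFA verified:", <-result)
}
```

This sketch ignores expiry and the case where the result is written before the watcher is registered; a real implementation would have to handle both.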

The client solves the challenge and sends the MFAAuthenticateResponse back to the Proxy

What if the connection is being brokered by a Relay and not a Proxy? Do we want to give possession of the MFAAuthenticateResponse to the Relay?

The crux of this problem and proposed solution boils down to the fact that the trust level of a Proxy and a Relay are different. The Relay intentionally has as few permissions as required to proxy end user connections to their destinations.

@rosstimothy There's some similar conversation happening in a few threads on this PR about the security model of Relay vs. Proxy. It sounds like an important thing for us to hash out.

Should we pull that discussion into a single place?

Also, do you have some detail / docs on the intentions with Relay so that we can all be on the same page?

What if the connection is being brokered by a Relay and not a Proxy? Do we want to give possession of the MFAAuthenticateResponse to the Relay?

Good question. It's true that the StartAuthenticateChallenge flow is more controlled and prevents an attacker on the relay service from stealing an MFA response, but the fact that an MFA response can be stolen by an attacker in different scenarios is built into our threat model. Traditionally we don't treat an MFA response as a full-blown secret. This is the reason we have scoped MFA challenges, so that even if an attacker manages to steal an MFA response within its 5-minute expiration window (including with reuse enabled), it cannot be used to perform unrelated MFA actions. Note that even to use the user session MFA response, the attacker would also need an active key/cert for the user.

I do however see the potential of StartAuthenticateChallenge. If we want to move away from scoped MFA challenges to "action" MFA challenge, tying an MFA challenge to an exact action being carried out, this would be one direction to go.

Another direction which avoids the complexity of the watcher/etc would be an ACTION MFA scope and a new ChallengeExtensions.ActionID uuid field:

Relay sends EvaluateSSHAccess (stream?) request to Decision Service

Decision Service (EvaluateSSHAccess) generates a random action UUID and sends it back to the Relay to indicate that MFA is required.

Relay sends the MFA action UUID to the client

The client performs the MFA ceremony with the ACTION scope and ActionID field set. The action ID is stored in the challenge on the backend.

The client sends the MFA response to the Relay who sends it back to Decision Service through the EvaluateSSHAccess stream.

The Decision Service validates the MFA response and checks that the scope and action ID are correct using rpc ValidateAuthenticateChallenge.

The Decision Service returns an access permit with MFA verified to the Relay.

In this flow, we ensure the following:

The Decision Service ensures that the MFA response provided by the client/relay was intended for that specific call to EvaluateSSHAccess via the uuid and scope matching

Since the Relay is the Policy Enforcement Point, this technically is not necessary. The Relay could produce the random UUID itself and validate it itself. For example, instead of being a streaming rpc, EvaluateSSHAccess could return an MFA required error. The Relay would then generate the UUID and kick off the MFA flow. It would then provide the MFA response and action UUID to the Decision Service to validate that it matches.

The MFA response provided to the Relay can not be used for anything other than completing that exact EvaluateSSHAccess request. A stolen ACTION scoped MFA challenge is useless even if the thief has full user credentials.

The Relay Service does not require any additional permissions; it just needs EvaluateSSHAccess.

It also limits the changes needed to just adding a new scope and challenge extension field. It should be much easier to implement.
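The ACTION scope flow above can be sketched in Go roughly as follows. Everything here is hypothetical: `Scope`, `StoredChallenge`, `MFAResponse`, and `ValidateForAction` are illustrative names, and the cryptographic verification of the response itself is left out. The sketch only shows the scope and action ID matching that makes a stolen response useless for any other request.

```go
package main

import (
	"errors"
	"fmt"
)

// All names below are hypothetical and exist only for this illustration.
type Scope string

const ScopeAction Scope = "ACTION"

// StoredChallenge models what the backend keeps for an issued challenge.
type StoredChallenge struct {
	Scope    Scope
	ActionID string
	Used     bool
}

// MFAResponse models the client's answer, reduced to the challenge it answers.
type MFAResponse struct {
	ChallengeID string
}

type DecisionService struct {
	challenges map[string]*StoredChallenge
}

// ValidateForAction accepts the response only if it answers an ACTION scoped
// challenge that was created for exactly this action ID and is not yet used.
func (d *DecisionService) ValidateForAction(resp MFAResponse, actionID string) error {
	c, ok := d.challenges[resp.ChallengeID]
	switch {
	case !ok:
		return errors.New("unknown challenge")
	case c.Used:
		return errors.New("challenge already used")
	case c.Scope != ScopeAction:
		return errors.New("wrong scope")
	case c.ActionID != actionID:
		return errors.New("action ID mismatch")
	}
	c.Used = true
	return nil
}

func main() {
	d := &DecisionService{challenges: map[string]*StoredChallenge{
		"ch-1": {Scope: ScopeAction, ActionID: "action-A"},
	}}
	resp := MFAResponse{ChallengeID: "ch-1"}

	// A response replayed against a different EvaluateSSHAccess call fails.
	fmt.Println("other action:", d.ValidateForAction(resp, "action-B"))
	// The same response succeeds once for the action it was issued for.
	fmt.Println("same action: ", d.ValidateForAction(resp, "action-A"))
	fmt.Println("reuse:       ", d.ValidateForAction(resp, "action-A"))
}
```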

tldr; having the Relay service possess a user's scoped MFA challenge response to execute an EvaluateSSHAccess request would not be a major departure from our current MFA security model. Even if we think Relay possession of the MFA challenge response must be avoided, I think there are simpler ways to do this, and this would likely be better tackled in a separate RFD / follow up to scoped MFA challenges (I volunteer as tribute).

We can continue discussion on the MFA flow in this new thread.

edits: clarifications now that I understand the Relay is the PDP.


strideynet · 2025-09-23T12:17:19Z

rfd/0224-in-band-mfa-ssh-sessions.md

+### Overview
+
+All SSH traffic destined for target nodes will be proxied through the Proxy service, which will handle authentication,
+authorization, and session management. Direct SSH connections to nodes will deprecated and removed after the [transition


I may be missing something on my first read through of the RFD, but it's not entirely clear to me why direct SSH connections to nodes must be deprecated as part of this work. It would be nice to understand this better, as this is a strategy that we have recommended to customers fairly recently for non-human connections. I'm guessing this is because we want to shift the Node to only accepting/understanding Permits?

Great question! I need to reword this section as I've learned a lot since I wrote this part of the RFD. More background is definitely needed. TODO for me. The use of Permits is nice, but not the main reason.

The main driver for this change is security gaps in how Per-session MFA is implemented. Part of this was hinted at in the Why section but, to cut to the point, our Per-session MFA implementation was completely ineffective against the remote authentication bypass attack against a node in CVE-2025-49825. In this particular CVE, MFA checks were completely bypassed (and therefore useless) by leveraging a vulnerability in node trust. Technically it also exposed a vulnerability in our MFA implementation so...

As a direct action item from CVE-2025-49825, the goal now is to close the security gaps that should have prevented the remote authentication bypass in the first place. The whole point of MFA is that we require a second factor of auth. An attacker should not be able to authenticate with just a single credential (e.g., client cert) if Per-session MFA is enabled.

With the way we have implemented Per-session MFA with SSH, the MFA assertion is embedded into the client certificate via an extension, i.e., two factors of auth are combined into a single credential. That means our MFA implementation is effectively a Single Factor Authentication (SFA) implementation.
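A toy Go model of that argument, for readers who want to see it spelled out. The `Cert` type, the `mfa-verified` extension key, and both functions are simplified assumptions for illustration, not the real certificate handling. It shows that when the MFA assertion lives inside the certificate, whoever can produce the certificate also produces the "second factor".

```go
package main

import "fmt"

// Cert is a simplified stand-in for an SSH user certificate.
type Cert struct {
	User       string
	Extensions map[string]string
}

// legacyAllow models the behaviour described above: the MFA assertion travels
// inside the certificate, so the certificate alone decides access.
// The extension key is an assumption made for this sketch.
func legacyAllow(c Cert) bool {
	return c.User != "" && c.Extensions["mfa-verified"] != ""
}

// inBandAllow models the proposed split: a certificate is necessary, but MFA
// must be proven separately for this specific connection.
func inBandAllow(c Cert, mfaVerifiedForConn bool) bool {
	return c.User != "" && mfaVerifiedForConn
}

func main() {
	// An attacker who can forge certificates simply sets the extension.
	forged := Cert{User: "alice", Extensions: map[string]string{"mfa-verified": "device-1"}}

	fmt.Println("legacy, forged cert only: ", legacyAllow(forged))
	fmt.Println("in-band, forged cert only:", inBandAllow(forged, false))
}
```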

Additionally, it's hard to really enforce authz or security policies when you have 1000s or more node agents, all possibly running different Teleport versions and possibly enforcing this policy differently. What if we need to make a policy or security update to the enforcement logic? How do we ensure consistency and correctness? Rolling out a security patch for vulnerable node agents was another pain point of CVE-2025-49825.

This is why this RFD proposes we move from a distributed model to a centralized one, so that the critical enforcement points are easier to secure, thereby reducing the platform's attack surface. This is pretty consistent with our product philosophy here at Teleport, now that I think about it 🤔. We're also splitting the MFA assertion from the credential itself and embedding it deeper into the protocol, e.g., SSH, Desktop, etc. (they should not be combined!!! 🔓). We would now force all connections through the Proxy (or a delegate TBD), so we can make sure this policy is consistently and correctly enforced for all access.

Unfortunately a potential casualty of this architectural move is we lose direct SSH access to the node. I say potentially because I'm not absolutely certain that we can't make it work while preserving the spirit of what we're trying to achieve. If you or anyone have any ideas, I'm all ears. From what I can tell, the SSH protocol doesn't allow us to natively perform MFA enforcement that easily integrates with the Teleport platform.

That being said, it's not a total loss AFAIK. For non-humans, they can still use tsh or tbot. If a Teleport user wants high throughput or traffic to flow locally instead of routed through the Proxy, I believe they will still get that with the upcoming Relay service.

Hopefully this helps! If not, happy to answer more questions and/or brainstorm with you.

P.S. Did you happen to see this thread? There is some more info there.

Credits to @rosstimothy for explaining all of this to me multiple times before I finally got it into my 🧠

Thanks for the detailed response.

That being said, it's not a total loss AFAIK. For non-humans, they can still use tsh or tbot

Yep - in this case I am talking about cases where the customers are using tbot.

I believe they will still get that with the upcoming Relay service.

I appreciate that the Relay service will allow for similar performance, but we should be cognizant that this will be a pretty big breaking change, and real work for customers to switch away from something that works for them today.

Unfortunately a potential casualty of this architectural move is we lose direct SSH access to the node. I say potentially because I'm not absolutely certain that we can't make it work while preserving the spirit of what we're trying to achieve. If you or anyone have any ideas, I'm all ears. From what I can tell, the SSH protocol doesn't allow us to natively perform MFA enforcement that easily integrates with the Teleport platform.

For non-human use-cases, we can't perform MFA ceremonies anyway. Would it be possible to retain the direct connection ability, but add the limitation that this will not function if Per-Session MFA is required?

No problem!

Just to be completely clear, we're both talking about this use case, right?

It is the case where a user is able to dial directly to the node's SSH server if they're on the same network as the node without access to Proxy or Auth services and the SSH server is exposed on the network

If yes, I don't think it will be possible to retain this ability, as the node would still be acting as an MFA enforcement point since it needs logic to decide whether Per-Session MFA is required by policy before it goes to enforce that policy (e.g., invoke Auth or Decision service -> decide if MFA must be enforced for this specific request).

The idea is we remove this decision making from the node altogether and centralize enforcement at a single point, the Proxy (or a delegate). The vision is that the node would not need to handle any of this, nor would the clients, like they currently do.

I'm not writing it off as impossible though, as I think this is a valuable use case. I'm happy to put this in as a future consideration into the RFD and we can come back with a v2 that improves on this design, thoughts?

Just to be completely clear, we're both talking about this use case, right?

Yeah - that's correct.

The idea is we remove this decision making from the node altogether and centralize enforcement at a single point, the Proxy (or a delegate). The vision is that the node would not need to handle any of this, nor would the clients, like they currently do.

Ok cool - I think I understand the reasoning/background now.

tsh ssh only supports connections via the Proxy today - it does not do any direct dialing. For that use case one would need to connect via OpenSSH.

tsh ssh only supports open connections via the Proxy today - it does not do any direct dialing. For that use case one would need to connect via OpenSSH.

This makes a lot of sense. Thanks. I recall you mentioning this to me, but it didn't click until now when you mapped specific commands to the different implementations 😄

If yes, I don't think it will be possible to retain this ability, as the node would still be acting as an MFA enforcement point since it needs logic to decide whether Per-Session MFA is required by policy before it goes to enforce that policy (e.g., invoke Auth or Decision service -> decide if MFA must be enforced for this specific request).

I don't see why the node would have to be acting as a PDP for this; asking the remote PDP service about the incoming connection could happily come back with "MFA needed".

the SSH protocol doesn't allow us to natively perform MFA enforcement that easily integrates with the Teleport platform

SSH is literally the only protocol where it's expected that clients spawn a binary of our choosing and speak with it rather than open a network connection, so if we can't do MFA enforcement in SSH we have little hope for it elsewhere, fwiw.

I don't see why the node would have to be acting as a PDP for this; asking the remote PDP service about the incoming connection could happily come back with "MFA needed".

Yes, this is the proposed plan so far. See refs to EvaluateSSHAccess for more context.

SSH is literally the only protocol where it's expected that clients spawn a binary of our choosing and speak with it rather than open a network connection, so if we can't do MFA enforcement in SSH we have little hope for it elsewhere, fwiw.

The current proposal is that we do MFA enforcement at the control plane level. Are you saying that we should go a level deeper into the user's SSH network connection to do the enforcement?


rob-picard-teleport · 2025-10-03T19:21:09Z

rfd/0225-in-band-mfa-ssh-sessions.md

+
+  Proxy->>Client: Send ClusterDetails
+
+  Client->>Proxy: Establish SSH connection


I'm new to this part of the codebase, so I'll ask a dumb question.

One of the primary goals of this RFD is to update our threat model such that being able to forge certificates should not allow you to establish an SSH connection if that resource is configured for per-session MFA.

In this step, what context from the prior steps is used to tie session establishment explicitly to the MFA success, and how is forgery of that prevented?

If you could point me to the right place in code (transport service?) that would be super helpful in wrapping my head around this flow.

I think I've got it. This is all happening over a single stream, so the proxy would be validating the state of the stream is appropriate.
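As a rough illustration of "validating the state of the stream", here is a small Go state machine. The frame names and the `proxyStream` type are hypothetical and do not reflect the TransportService wire format; the sketch only shows that SSH frames are refused until the same stream has passed through the MFA step.

```go
package main

import "fmt"

type streamState int

const (
	awaitingTarget streamState = iota
	awaitingMFA
	established
)

// proxyStream tracks one client stream. Frame kinds are hypothetical strings.
type proxyStream struct {
	state       streamState
	mfaRequired bool
}

// handle accepts a frame only if it is valid for the current stream state.
func (s *proxyStream) handle(frame string) error {
	switch {
	case s.state == awaitingTarget && frame == "dial_target":
		if s.mfaRequired {
			s.state = awaitingMFA
		} else {
			s.state = established
		}
		return nil
	case s.state == awaitingMFA && frame == "mfa_response":
		// A real implementation would validate the response with Auth here.
		s.state = established
		return nil
	case s.state == established && frame == "ssh":
		return nil
	}
	return fmt.Errorf("frame %q not allowed in state %d", frame, s.state)
}

func main() {
	s := &proxyStream{mfaRequired: true}
	for _, f := range []string{"dial_target", "ssh", "mfa_response", "ssh"} {
		fmt.Printf("%-12s -> %v\n", f, s.handle(f))
	}
}
```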

I'm new to this part of the codebase, so I'll ask a dumb question.

No such thing as dumb questions. I'm new to the codebase too. It took me quite some time to wrap my head around what is done where.

One of the primary goals of this RFD is to update our threat model such that being able to forge certificates should not allow you to establish an SSH connection if that resource is configured for per-session MFA.

Correct!

In this step, what context from the prior steps is used to tie session establishment explicitly to the MFA success, and how is forgery of that prevented?

If you could point me to the right place in code (transport service?) that would be super helpful in wrapping my head around this flow.

Once you understand the high-level flow and you see how "MFA success" is encoded into the certificate, we can move over to enforcement.

Eventually the SSH server HandleConnection (primary entrypoint for all SSH conns) method on the node will parse the client certificate that was presented and evaluate access based on the client cert. Checking if the MFA requirement was satisfied is one of many checks based on authz policy.

If you keep diving down the 🐇 🕳️, you'll eventually end up where the magic happens.

how is forgery of that prevented?

Circling back to this. The RFD proposes the removal of these per-session MFA SSH certificates. Instead of using temp certificates to convey MFA satisfaction, we move this information to be conveyed via the control plane (e.g., Transport service, Proxy Router, etc).

Like I mentioned earlier in my motivations comment, doing this allows us to separate two factors of auth that were combined into one credential, mitigating against an attacker that is able to forge these certificates and bypass MFA. In other words, an attacker having access to just the SSH certificate (one factor) won't be enough when MFA is required; they'll need to compromise the control plane to provide the second factor.

rosstimothy · 2025-10-07T15:56:41Z

rfd/0225-in-band-mfa-ssh-sessions.md

+`CompleteAuthenticateChallenge` RPC with the challenge ID and complete the challenge. Once the client completes the MFA
+challenge, the `TransportService` will receive the pass/fail result and `ProxySSH` will unblock and proceed accordingly.
+
+If the MFA verification fails, the stream is immediately terminated. Similarly, any connectivity issues with the Proxy


What kind of error message does a user get in this case?

Thanks! Added clarification here: 270307b.

I'm thinking we can provide more context in addition to the AccessDenied or InternalServer errors. Not sure if we need to go into that detail in the RFD. Happy to go to that level though if you think it's needed.

rosstimothy · 2025-10-07T15:59:10Z

rfd/0225-in-band-mfa-ssh-sessions.md

+- `StartAuthenticateChallenge`: Only the Proxy service is permitted to invoke this RPC, allowing it to initiate MFA
+  challenges on behalf of users. Direct user access to this RPC will be denied.


How does this work for connections that are routed through a Relay instead of a Proxy?

I expect connections routed via the Relay should behave the same as via the Proxy. From what I can tell, they share the same v1 TransportService implementation.

We'll need to implement the v2 TransportService and then have Proxy and Relay import that new v2 implementation. The Relay will now also need to dial Auth in order to invoke the new MFAService to initiate MFA challenges.

I updated the RFD in 1e4568e to try to clarify that the Relay will need changes that mirror Proxy.

rosstimothy · 2025-10-07T16:03:11Z

rfd/0225-in-band-mfa-ssh-sessions.md

+    // SSH payload
+    Frame ssh = 2;


Is forwarding the SSH agent frames no longer required?

No longer required according to this comment by @espadolini

rosstimothy · 2025-10-07T16:05:08Z

rfd/0225-in-band-mfa-ssh-sessions.md

+    // Final response indicating the result of the MFA challenge.
+    bool success = 2;


Is there any other data that we might want to provide when a challenge is completed successfully?

I tried to keep this message as minimal as possible to avoid sharing any potentially sensitive information beyond what is needed, since the callers (Proxy/Relay) are expected to have a reduced privilege set.

That being said, your comment did make me realize that there was no way of conveying errors that happen between the User <-> Auth service. I added a message for the result of that interaction in b0181b2.

rosstimothy · 2025-10-07T16:09:16Z

rfd/0225-in-band-mfa-ssh-sessions.md

+Per-session MFA SSH certificates are not required in the new design except for backwards compatibility with legacy
+clients. They were previously used to convey session metadata and enforce MFA at the Teleport Agent. With the new
+architecture, the Proxy and Auth service handle these responsibilities directly. Support for per-session MFA SSH
+certificates via `ProxySSH` RPC will initially be retained during the transition period to ensure backward compatibility
+with existing clients (see [Backward Compatibility](#backward-compatibility)).


Per-session MFA SSH certificates are not required in the new design except for backwards compatibility with legacy clients.

Is compatibility only something that concerns clients? What happens if an older SSH agent is still around that only knows about MFA SSH certificates?

Great catch. I added some handling for legacy agents in 7583f96

rob-picard-teleport · 2025-10-08T14:53:56Z

rfd/0225-in-band-mfa-ssh-sessions.md

+    end
+  end
+
+  Proxy->>Node: Dial target host


How does the node know whether MFA was required for the session?

My understanding is that we have the auth server complete the MFA challenge so that we don't have to elevate the level of trust given to the proxy and relay servers.

If the decision service lives in proxy / relay though, they can just decide not to do MFA, unless the node itself has some concept of "Wait, MFA should be required for this session, but I don't see a stapled permit."

I forgot to update this diagram after some previous changes. I did so in this commit.

How does the node know whether MFA was required for the session?

It should no longer be concerned with this since it happens before it receives the incoming dial from Proxy / Relay.

My understanding is that we have the auth server complete the MFA challenge so that we don't have to elevate the level of trust given to the proxy and relay servers.

Correct

If the decision service lives in proxy / relay though, they can just decide not to do MFA, unless the node itself has some concept of "Wait, MFA should be required for this session, but I don't see a stapled permit."

Going back to trust, in this new model, we're moving away from node-level MFA session checks and having the node trust that the Decision / Proxy / Relay services did their jobs correctly.

If the Decision / Proxy / Relay services decide not to enforce MFA when they are supposed to for a session, that is an issue, and it is a known risk as defined in the Access Control Decision API.

For proxy that makes sense, but this would be a new issue with relay right? Based on the comments elsewhere it seems like the relay isn't meant to have that level of authority, and is just supposed to be more like a network router.

Relay shares the exact same TransportService as Proxy does. As proposed, it will eventually share the same v2 version of TransportService later.

I think Relay being just a pure network router isn't true, because the Transport v1 that it currently implements does more than that. For example, it has access to the user's auth context and eventually does authz checks on it. Maybe a desired future state?

Are we all OK with just giving the same level of access to Relay as Proxy has to initiate MFA challenges with Auth? @espadolini @tigrato @rosstimothy would love your input here too.

Are we all OK with just giving the same level of access to Relay as Proxy has to initiate MFA challenges with Auth?

For what it's worth, I don't have a strong opinion on whether Relay should have the same security model as Proxy. I just want to make sure we're all on the same page as to whether it does, and more importantly that our customers understand the security model too.

Going back to trust, in this new model, we're moving away from node-level MFA session checks and having the node trust that the Decision / Proxy / Relay services did their jobs correctly.

If the Decision / Proxy / Relay services decide not to enforce MFA when they are supposed to for a session, that is an issue, and it is a known risk as defined in the Access Control Decision API.

From gathering context and speaking briefly with @espadolini, it does not sound like the Relay Service is intended to be a Policy Enforcement Point, and even more so not the Policy Decision Point. The Relay service is only meant to forward connections to the target node in the same capacity that the Proxy does today. The contradiction between the Relay service taking the responsibility of forwarding connections and the Proxy taking responsibility as the PEP, and enforcing that through connection forwarding decisions, has not been addressed yet as far as I can tell.

This whole concern seems to be beyond the scope of this RFD, so we shouldn't make assumptions about the Relay becoming a Policy Enforcement or Decision Point. In the current state of the Decision and Relay services, it does not seem like the Transport service is the right place to enforce in-band MFA.

We need to instead find a way to enforce MFA on the Node as it is today, just without the MFA certs. For example, we could do this by extending the SSH protocol to handle in-band MFA authorization, with the node being the one to start / validate MFA challenges.

Moved to #59141 (comment)


klizhentas

Product: approved since this RFD has UX/Product changes.


Joerger · 2025-10-09T18:43:57Z

rfd/0225-in-band-mfa-ssh-sessions.md

+A new MFA service will be introduced to handle MFA challenges and responses instead of continuing to introduce new RPCs
+to the legacy AuthService. Existing MFA related RPCs in the Auth service can eventually be migrated to this new MFA
+service in a future effort.
+


Re: MFA flow discussion. Context:

broad mfa flow discussion

relay mfa security discussion

Thanks, @Joerger and @espadolini, for the context on the Relay service. I agree that, given the purpose of the Relay service, it doesn't make sense to make it a PEP or a PDP.

We need to instead find a way to enforce MFA on the Node as it is today, just without the MFA certs. For example, we could do this by extending the SSH protocol to handle in-band MFA authorization, with the node being the one to start / validate MFA challenges.

@rosstimothy and I initially wrote this off before I started the RFD because of possible issues with extending the SSH protocol (e.g., client compatibility) and problems rolling out authz updates to the agents (one of this RFD's goals).

I'm going to do a deep dive to see if this is a feasible path. I know we implement a custom SSH conn handler. Need to see how much it can be extended and any possible issues that may come from it.

I did a quick PoC of extending golang.org/x/crypto/ssh server to support MFA in-band here. It proves that we can chain multiple authentication methods and require them all to succeed in order to grant a session to a user. I did the PoC with golang.org/x/crypto/ssh since it's the library we use for our Teleport SSH server.
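For readers who have not seen multi-step SSH authentication, the idea of chaining methods can be modelled in plain Go. This is not the PoC code and it deliberately does not use golang.org/x/crypto/ssh; `authChain`, `authMethod`, and the result strings are hypothetical. It loosely mirrors the "partial success" flag of SSH_MSG_USERAUTH_FAILURE in RFC 4252: a method can pass without the session being granted until every method in the chain has passed.

```go
package main

import "fmt"

// authMethod is one step in the chain, e.g. "publickey" followed by "mfa".
type authMethod struct {
	name   string
	verify func(credential string) bool
}

// authChain requires every method to succeed, in order.
type authChain struct {
	methods []authMethod
	next    int
}

// Attempt processes one authentication request. It returns "success" once
// every method has passed, "partial" when a method passed but more remain,
// and "failure" when the wrong method or a bad credential is presented.
func (c *authChain) Attempt(method, credential string) string {
	if c.next >= len(c.methods) {
		return "success"
	}
	m := c.methods[c.next]
	if m.name != method || !m.verify(credential) {
		return "failure"
	}
	c.next++
	if c.next == len(c.methods) {
		return "success"
	}
	return "partial"
}

func main() {
	chain := &authChain{methods: []authMethod{
		{name: "publickey", verify: func(c string) bool { return c == "valid-cert" }},
		{name: "mfa", verify: func(c string) bool { return c == "valid-mfa-response" }},
	}}

	fmt.Println(chain.Attempt("mfa", "valid-mfa-response"))   // skipping ahead
	fmt.Println(chain.Attempt("publickey", "valid-cert"))     // first factor
	fmt.Println(chain.Attempt("mfa", "valid-mfa-response"))   // second factor
}
```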

Now that we know it's possible, I'm going to extend our own implementation to make it work with tsh (another PoC). Will post an update next week.

For that to work we definitely need an MFA challenge-response mechanism that is safe for the client to do when speaking to the agent rather than the control plane; the obvious one - which is unfortunately really tied to the SSH protocol itself, but maybe we can make the interaction with our MFA system somewhat generic - is that the challenge and response should be tied to the session identifier (which is available in x/crypto/ssh in server auth callbacks but not in client ones except VERY indirectly and hackily) so that both parties know that the MFA challenge and response is tied to a specific SSH connection and any attack involving MITM or reuse will just not work.

@cthach This is super interesting. I think this is a good case for Doyensec to review once you have an implementation ready as well.

For that to work we definitely need a MFA challenge response mechanism that is safe for the client to do when speaking to the agent rather than the control plane; the obvious one - which is unfortunately really tied to the SSH protocol itself, but maybe we can make the interaction with our MFA system somewhat generic - is that the challenge and response should be tied to the session identifier (which is available in x/crypto/ssh in server auth callbacks but not in client ones except VERY indirectly and hackily) so that both parties know that the MFA challenge and response is tied to a specific SSH connection and any attack involving MITM or reuse will just not work.

Just want to acknowledge this great point and mention I'm merging this with @Joerger's proposed path and some learnings from the PoC. I should have the next iteration ready for review in the next few days (aiming for ASAP lol).

Thanks all for your continued patience and feedback!

cthach · 2025-10-16T22:42:22Z

rfd/0225-in-band-mfa-ssh-sessions.md

+
+#### Decision Service
+
+The Decision service will be updated to support evaluating SSH access requests with MFA challenge responses. This


@fspmarshall I would love to get your thoughts on this new approach before I open this for review.

What are your thoughts on the Decision service accepting the MFA challenge response before it issues a permit? I noticed that this is a common pattern in our APIs. The Decision service would be responsible for calling Auth to validate the response.

I also added a structured response to callers (per your suggestion) to more robustly inform them that a permit was denied because MFA wasn't done, compared to just an error message string on how it is implemented today.

This is a tricky question, and I've been talking it over with some others. There are pros and cons.

On the plus side, as you point out, representing the need for MFA as an error/Denial is the common standard across teleport right now. Because of that, going this way (with MFA requirement being expressed via a Denial) is definitely the easier path forward from a development perspective since that's already how the ssh decision logic currently works. And, once we accept that the PDP represents MFA requirements as a Denial, then I see how it flows logically that one might also want the PDP to validate the MFA response as part of the Permit path.

However, I have some misgivings about both the structure of representing MFA requirement as part of the denial, and about making the PDP be the response validator.

On the broader subject of how to represent the MFA requirement I have two main misgivings:

One of our goals in the decision service is to have it eventually be a tool for policy introspection. I.e. an auditing/testing tool, not just an internal implementation detail of teleport's access-control. In that context, MFA requirement being enforced by a Denial isn't as desirable. Say, for example, that I have a user named alice who can ssh into node.example.com, but only with MFA. If I ask the decision service, "can alice ssh into node.example.com?", getting a Denial decision and needing to re-run the query with something like "can alice ssh into node.example.com with --mfa-verified=true?" to see her allowed access is very cumbersome. A much better experience would be to receive a single answer that says something like "yes, alice can ssh into node.example.com, but only if she provides MFA". I.e. having the MFA requirement be expressed as part of a conditional allow decision makes a lot more sense from a UX perspective than having it be a special-case denial.

Needing to "re-run" a decision after having already taken action based on a previous incarnation of that decision is IMO a weaker model overall. Roles/configuration may change between calls to the decision service. If we make a decision and that decision requires us to enforce certain conditions, it is a cleaner behavioral model to have the parameters of the allowed access and the conditions being enforced originate from the same "configuration state", rather than enforcing conditions derived from one configuration state, then deriving parameters of access from another. Additionally, any stateful element associated with decisions (e.g. rate-limiters, logging, etc) become easier to work with if decisions have a 1-to-1 relationship with access attempts.

On the more specific subject of making validation part of the PDP, I think this conflicts somewhat with the philosophical model of the PDP. One of the key design goals of the PDP is to provide robust logical isolation between "decision" logic and "enforcement" logic. This is why the PDP APIs accept a description of a user identity, rather than validating a user certificate, and why we don't try to, for example, increment the max_connections semaphore inside of the PDP. Separating out enforcement/authentication/validation/etc from core decision logic makes decision logic easier to audit, easier to test, more portable, etc. I believe the same holds true for MFA challenge validation. MFA challenge validation is a responsibility that I would prefer to live outside of the PDP if possible, to help keep decision logic more cleanly isolated from enforcement logic.

In a perfect world, my preference would be that the PDP API represented MFA requirements as a condition within a Permit decision, and that enforcement-side logic acted upon that condition. Something like:

    rsp, _ := pdp.EvaluateSSHAccess(ctx, &EvaluateSSHAccessRequest{...})
    if rsp.GetDenial() != nil {
        rejectAccessAttempt()
        return
    }
    if rsp.GetPermit().GetRequirePerSessionMfa() {
        if err := doMFACeremony(); err != nil {
            rejectAccessAttempt()
            return
        }
    }
    continueWithAccess()

I'm open to the idea that some of this might be more implementation churn than we want. Making the PDP represent MFA requirements as part of the permit would require updating the internals of services.AccessChecker to allow for a mode where MFA requirement gets expressed as a parameter rather than an error. However I do feel that this would result in a better system overall.

cthach · 2025-10-17T18:23:14Z

rfd/0225-in-band-mfa-ssh-sessions.md

+However, when connecting to _multiple SSH hosts_ as part of a single user action (e.g., `tsh ssh root@env=example
+uptime`), the user may need to complete the MFA challenge multiple times depending on each target host's MFA
+requirements.
+
+This is due to the fact that the current design only evaluates MFA requirements _once_ on the first host matching the
+label, and if MFA was required, the per-session MFA certificate was used for all subsequent hosts without further MFA
+checks. Moving to in-band MFA enforcement means that each target host will independently evaluate MFA requirements
+during session establishment, which increases security.


@rosstimothy or anyone

I'm not sure we can avoid this without somehow sharing a MFA challenge between different hosts. This opens us up to replay attacks if we allow sharing a single MFA challenge.

Is this how we accomplished "single MFA challenge for many hosts" with per-session MFA certificates like in the current implementation? If yes, would we like to continue making that security tradeoff in favor of a better UX?

The other option we discussed was some sort of meta-session that is platform-wide, but that can easily blow up the scope of this RFD, so ruling it out like we agreed.
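To illustrate the replay concern, here is a rough sketch of challenges that are single-use and bound to one host. It is an assumption-laden toy, not how Teleport stores challenges: `challengeStore`, `issue`, and `redeem` are hypothetical names. A response tied to a challenge issued for one host cannot be redeemed at another host or redeemed twice, which is exactly the property a shared challenge would give up.

```go
package main

import (
	"fmt"
	"sync"
)

// challengeKey ties a challenge ID to the host it was issued for.
type challengeKey struct{ host, id string }

// challengeStore tracks outstanding challenges (hypothetical, in-memory only).
type challengeStore struct {
	mu      sync.Mutex
	pending map[challengeKey]bool
}

func newChallengeStore() *challengeStore {
	return &challengeStore{pending: make(map[challengeKey]bool)}
}

// issue records a challenge as outstanding for exactly one host.
func (s *challengeStore) issue(host, id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending[challengeKey{host, id}] = true
}

// redeem consumes a challenge. It returns false if the challenge was never
// issued for this host or has already been used.
func (s *challengeStore) redeem(host, id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	k := challengeKey{host, id}
	if !s.pending[k] {
		return false
	}
	delete(s.pending, k)
	return true
}

func main() {
	store := newChallengeStore()
	store.issue("node-1", "challenge-abc")

	fmt.Println(store.redeem("node-2", "challenge-abc")) // false: wrong host
	fmt.Println(store.redeem("node-1", "challenge-abc")) // true: first use
	fmt.Println(store.redeem("node-1", "challenge-abc")) // false: replay
}
```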

💥 (it's a link)

Honestly I didn't realize we allowed users to connect to multiple hosts with a single MFA cert like that, nice catch. We could try to adopt that same tsh db exec flow, but not sure exactly how that fits in to the new flow. Just wanted to share the additional context before a deeper review.

doc: Add RFD for In-Band MFA for SSH Sessions

b8b83e0

Signed-off-by: Chris Thach <[email protected]>

cthach self-assigned this Sep 15, 2025

cthach added documentation security Security Issues rfd Request for Discussion server-access size/md no-changelog Indicates that a PR does not require a changelog entry labels Sep 15, 2025

fix: Add words to cspell config

27cfa45

Signed-off-by: Chris Thach <[email protected]>

cthach changed the title ~~doc: Add RFD for In-Band MFA for SSH Sessions 🔒🧑‍💻~~ WIP: doc: Add RFD for In-Band MFA for SSH Sessions 🔒🧑‍💻 Sep 15, 2025

cthach added 5 commits September 15, 2025 17:47

refactor: Improve clarity and consistency

3baaf36

Signed-off-by: Chris Thach <[email protected]>

docs: clarify session certs

607e9e7

Signed-off-by: Chris Thach <[email protected]>

docs: add updates to DialRequest message

64f2646

Signed-off-by: Chris Thach <[email protected]>

docs: add versions and permit

fdd5898

Signed-off-by: Chris Thach <[email protected]>

docs: remove session SSH cert from design

9b58fd7

Signed-off-by: Chris Thach <[email protected]>

rosstimothy reviewed Sep 16, 2025

cthach added 5 commits September 17, 2025 15:13

docs: Rename to ProxySSH. Add approvers.

907f708

Signed-off-by: Chris Thach <[email protected]>

Merge remote-tracking branch 'origin/master' into rfd/0224-in-band-mfa-ssh-sessions

80792b4

docs: update security to call out risk of new Auth RPC. Remove SSH cert mention.

312ee9e

Signed-off-by: Chris Thach <[email protected]>

docs: enhance TransportServiceV2 description to clarify in-band MFA handling and connection flow

45e3683

Signed-off-by: Chris Thach <[email protected]>

docs: Teleport clients, not agents

cecdefc

Signed-off-by: Chris Thach <[email protected]>

strideynet reviewed Sep 23, 2025

cthach added 7 commits September 26, 2025 14:15

docs: rename to TransportServiceV2 to just TransportService in the v2 package

c15c076

Signed-off-by: Chris Thach <[email protected]>

docs: client must send dial_target first

31b3b50

Signed-off-by: Chris Thach <[email protected]>

docs: introduce new MFAService

2bfd689

Signed-off-by: Chris Thach <[email protected]>

docs: improve consistency

83ccc70

Signed-off-by: Chris Thach <[email protected]>

docs: update client and web terminal to use v2 TransportService with fallback to v1

1fd8c19

Signed-off-by: Chris Thach <[email protected]>

docs: extend dependencies for Decision

2b5382e

Signed-off-by: Chris Thach <[email protected]>

Merge remote-tracking branch 'origin/master' into rfd/0224-in-band-mfa-ssh-sessions

6d600e1

rosstimothy reviewed Sep 30, 2025

doc: make security section easier to read and add conn downgrade mitigation

829d63b

Signed-off-by: Chris Thach <[email protected]>

cthach mentioned this pull request Oct 3, 2025

Per-session MFA does not work in Recording Proxy mode #8843

Open

rob-picard-teleport reviewed Oct 3, 2025

rosstimothy reviewed Oct 7, 2025

rob-picard-teleport reviewed Oct 8, 2025

cthach and others added 4 commits October 8, 2025 11:47

Merge branch 'master' into rfd/0224-in-band-mfa-ssh-sessions

136c435

docs: remove new vs legacy client from diagram

7bcfd7b

Signed-off-by: Chris Thach <[email protected]>

docs: return an AccessDenied/InternalServer error on failures

270307b

Signed-off-by: Chris Thach <[email protected]>

docs: clarify relay also gets upgraded responsibilities and perms

1e4568e

Signed-off-by: Chris Thach <[email protected]>

klizhentas approved these changes Oct 8, 2025

cthach added 2 commits October 8, 2025 19:44

docs: return result struct on completion of StartAuthenticateChallenge

b0181b2

Signed-off-by: Chris Thach <[email protected]>

docs: handle legacy agents

7583f96

Signed-off-by: Chris Thach <[email protected]>

Joerger reviewed Oct 9, 2025

cthach marked this pull request as draft October 14, 2025 17:35

cthach changed the title ~~RFD 0225: In-Band MFA for SSH Sessions 🔒🧑‍💻~~ RFD 0225: In-Band MFA for SSH Sessions 🔒🧑‍💻 - DO NOT REVIEW 🛑 Oct 14, 2025

cthach changed the title ~~RFD 0225: In-Band MFA for SSH Sessions 🔒🧑‍💻 - DO NOT REVIEW 🛑~~ WIP: RFD 0225: In-Band MFA for SSH Sessions 🔒🧑‍💻 - DO NOT REVIEW 🛑 Oct 14, 2025

cthach added 3 commits October 15, 2025 14:22

Merge remote-tracking branch 'origin/master' into rfd/0224-in-band-mfa-ssh-sessions

97aa82b

refactor: Do MFA enforcement at SSH service within SSH protocol

a4c928b

Signed-off-by: Chris Thach <[email protected]>

docs: Add non-goals, handle edge cases, populate security, test, rough implementation, and handle backwards compatibility

21afa57

Signed-off-by: Chris Thach <[email protected]>

cthach commented Oct 16, 2025

cthach added 3 commits October 17, 2025 10:54

fix: enum DenialMetadataReason should have suffix

5a7f8ae

Signed-off-by: Chris Thach <[email protected]>

fix: clarity that running commands on multiple hosts may need multiple MFA challenges

8db7abf

Signed-off-by: Chris Thach <[email protected]>

docs: add more details to ValidateAuthenticateChallengeResponse

8b4fd8a

Signed-off-by: Chris Thach <[email protected]>

cthach marked this pull request as ready for review October 17, 2025 18:11

cthach changed the title ~~WIP: RFD 0225: In-Band MFA for SSH Sessions 🔒🧑‍💻 - DO NOT REVIEW 🛑~~ RFD 0225: In-Band MFA for SSH Sessions 🔒🧑‍💻 Oct 17, 2025

github-actions bot requested review from GavinFrazar and rosstimothy October 17, 2025 18:12

cthach commented Oct 17, 2025

docs: switch to JSON for keyboard-interactive question

88e13d5

Signed-off-by: Chris Thach <[email protected]>

cthach force-pushed the rfd/0224-in-band-mfa-ssh-sessions branch from 39a9315 to 88e13d5 Compare October 17, 2025 18:24


		Proxy->>Client: Send ClusterDetails

		Client->>Proxy: Establish SSH connection

		- `StartAuthenticateChallenge`: Only the Proxy service is permitted to invoke this RPC, allowing it to initiate MFA
		challenges on behalf of users. Direct user access to this RPC will be denied.

		// Final response indicating the result of the MFA challenge.
		bool success = 2;

