Skip to content

Conversation

@emelialei88
Copy link
Collaborator

@emelialei88 emelialei88 commented Oct 24, 2025

This PR introduces several refactors and enables authentication:

  • Adds an Authenticator class to encapsulate authentication logic.
  • Migrates connection initialization logic from InitialConnectionHandler to InitialConnectionContext.

@emelialei88 emelialei88 requested a review from a team as a code owner October 24, 2025 19:53
@emelialei88 emelialei88 marked this pull request as draft October 24, 2025 20:58
@678098 678098 self-assigned this Nov 6, 2025
@678098 678098 self-requested a review November 6, 2025 16:05
@emelialei88 emelialei88 force-pushed the authn/change-negotiator branch 2 times, most recently from d2c4c40 to ea2450d Compare November 12, 2025 18:51
@emelialei88 emelialei88 marked this pull request as ready for review November 12, 2025 19:38
Copy link
Collaborator

@dorjesinpo dorjesinpo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interim review.
we have pairs of the following:
decodeInitialConnectionMessage,
scheduleRead,
readCallback
readBlob,
processBlob,
complete
... in both InitialConnectionHandler and InitialConnectionContext.

The point of FSM-like state management is that everything happens in handleEvent and there is only one place where we call complete
Otherwise, we can't rely on controlling actions by the state

Is it possible to engage handleEvent immediately in InitialConnectionHandler::handleInitialConnection and call InitialConnectionContext::complete from handleEvent only.

We seem to have InitialConnectionContext::handleInitialConnection but we call InitialConnectionHandler::handleInitialConnection instead which does not seem to support auth?

context));

// Register as observer of the channel to get the 'onClose'
bsl::weak_ptr<InitialConnectionContext> context_wp =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation behind the use of weak_ptr?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we have InitialConnectionContext bind to onClose, which means it won't be destroyed until the channel closes. It's works but I was hoping to clean it up once the initial connection part finishes, to make the lifetime match with the semantic.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since there is d_initialConnectionContextCache now, maybe we do not need to bind bsl::shared_ptr<InitialConnectionContext> anymore?

bslmt::LockGuard<bslmt::Mutex> guard(&d_mutex); // LOCKED

if (d_state != AuthenticationState::e_AUTHENTICATING) {
errorDescription << "State not AUTHENTICATING (was " << d_state << ")";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was -> is

@dorjesinpo dorjesinpo assigned emelialei88 and unassigned dorjesinpo Nov 14, 2025
@emelialei88 emelialei88 force-pushed the authn/change-negotiator branch 2 times, most recently from a009820 to e4183da Compare November 17, 2025 15:58
Copy link
Collaborator Author

@emelialei88 emelialei88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Response

context));

// Register as observer of the channel to get the 'onClose'
bsl::weak_ptr<InitialConnectionContext> context_wp =
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we have InitialConnectionContext bind to onClose, which means it won't be destroyed until the channel closes. It's works but I was hoping to clean it up once the initial connection part finishes, to make the lifetime match with the semantic.

Copy link
Collaborator

@dorjesinpo dorjesinpo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interim review

// BDE
#include <bdlbb_pooledblobbufferfactory.h>
#include <bsl_iostream.h>
#include <bsl_memory.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just double checking if this is necessary

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indirectly included in mqbmock_cluster.h but maybe we shouldn't depend on indirect includes. Needed for shared_ptr.

d_scheduler_p,
d_allocators.get("Authenticator")),
d_allocator_p);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not creating SessionNegotiator below in the same way (vs. using raw pointer and then bslma::ManagedPtr)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setAdminCommandEnqueueCallback is a function for SessionNegotiator, but not exposed for mqbnet::Negotiator.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very minor (since we do not throw exceptions) but what I meant is

    bslma::ManagedPtr<mqbnet::Negotiator> negotiatorMp(
        new (*d_allocator_p)
        SessionNegotiator(&d_bufferFactory,
                          d_dispatcher_mp.get(),
                          d_statController_mp->clientsStatContext(),
                          &d_blobSpPool,
                          d_scheduler_p,
                          d_allocators.get("SessionNegotiator"),
        d_allocator_p);
    negotiatorMp
        ->setAdminCommandEnqueueCallback(
            bdlf::BindUtil::bind(&Application::enqueueCommand,
                                 this,
                                 bdlf::PlaceHolders::_1,    // source
                                 bdlf::PlaceHolders::_2,    // cmd
                                 bdlf::PlaceHolders::_3,    // onProcessedCb
                                 bdlf::PlaceHolders::_4));  // fromReroute

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I understand. But mqbnet::Negotiator doesn't expose an interface setAdminCommandEnqueueCallback.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not a problem (we could do bslma::ManagedPtr<mqbnet::Negotiator> negotiatorMp(impl); where impl is bslma::ManagedPtr<mqbnet::Negotiator>). The problem is setClusterCatalog below.
This is ok since we do not throw exceptions, do early returns

}

void InitialConnectionContext::handleEvent(
int statusCode,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the meaning of statusCode is not clear. It seems that all errors are communicated as InitialConnectionEvent::e_ERROR and TCPSessionFactory::initialConnectionComplete just needs statusCode != 0 (which is not enforced by InitialConnectionEvent::e_ERROR case).
Unless we want to communicate exact value to bmqio::Channel::close for some reason, the code can be simplified by removing statusCode argument.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the errors that occur during initial connection don’t print the error code or logs until initialConnectionComplete is called. If we were to follow this convention, it still feels necessary to pass the error code through handleEvent to initialConnectionComplete so it can be properly logged.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. But d_state is primary and statusCode is secondary. On the line 710, there is no check for statusCode, if it is 0, there will be no complete call.

Maybe, we should rely on InitialConnectionState::e_FAILED and InitialConnectionState::e_NEGOTIATED being the only final states and call complete as soon as we reach them. Assuming, any other state is transitory and awaits for more event(s).

context));

// Register as observer of the channel to get the 'onClose'
bsl::weak_ptr<InitialConnectionContext> context_wp =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since there is d_initialConnectionContextCache now, maybe we do not need to bind bsl::shared_ptr<InitialConnectionContext> anymore?

} // close mutex lock guard // UNLOCK

// Keep track of active channels, for logging purposes
++d_nbActiveChannels;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving ++d_nbActiveChannels changes the semantics of it. It used to track all channels, including incomplete ones.

Copy link
Collaborator Author

@emelialei88 emelialei88 Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be the race condition we discussed earlier, though I don’t recall the exact scenario. All the tests are passing now — possibly because the fuzz tests have sped up and no longer reproduce the issue. Definitely a good reminder for me to document these cases better next time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall having any problems with d_nbActiveChannels, we do not seem to check it, only log

Copy link
Contributor

@chrisbeard chrisbeard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave the authenticator interfaces a first pass, looking good overall, some comments about the implementation details and minor design bits.

Comment on lines 152 to 153
/// The authentication message received.
bmqp_ctrlmsg::AuthenticationMessage d_authenticationMessage;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation for storing this for the lifetime of the connection? We should not store this material once authentication completes. Can that be avoided?

Comment on lines 155 to 156
/// The encoding type used for sending a message. It should match with the
/// encoding type of the received message.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this comment feels like it can be made clearer, perhaps something like:

// The encoding type used for sending authentication responses. The value matches
// the encoding type set by the client during authentication and ensures that all
// responses are sent using the same encoding.

const bsl::shared_ptr<mqbplug::AuthenticationResult>& value);

// NOTE: AuthenticationMessage and encodingType are set only when
// AuthenticationState is e_AUTHENTICATED. authenticationMessage() and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the state mentioned in this comment correct? e_AUTHENTICATED feels wrong here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two functions are called only during reauthentication. The state should be e_AUTHENTICATED at that time.

Comment on lines 77 to 89
/// Produce and send outbound authentication message with the specified
/// `context`. Return 0 on success, or a non-zero error code and populate
/// the specified `errorDescription` with a description of the error
/// otherwise.
virtual int authenticationOutbound(
bsl::ostream& errorDescription,
const bsl::shared_ptr<AuthenticationContext>& context) = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method feels a little out of place. Maybe that means design-wise we should rethink this, leave it out for now, or documentation on the method and interface should better define its purpose.

Comment on lines 67 to 68
const int k_MIN_THREADS = 1; // Minimum number of threads in the thread pool
const int k_MAX_THREADS = 3; // Maximum number of threads in the thread pool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be configuration-driven.

Comment on lines +1029 to +1040
context->setAuthenticationMessage(authenticationMessage);
context->setAuthenticationEncodingType(
event.authenticationEventEncodingType());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment here, we should prefer to not store the authentication message for the lifetime of the connection.

int rc = rc_SUCCESS;
bsl::string error;

// Setup error handler based on whether this is initial authn or reauthn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Setup error handler based on whether this is initial authn or reauthn
// Set up error handler based on whether this is initial authn or reauthn

bsl::optional<bdlb::ScopeExitAny> scopeGuard;

if (isReauthn) {
// For reauthentication: setup error guard to handle failures
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// For reauthentication: setup error guard to handle failures
// For reauthentication: set up error guard to handle failures

channel));
}
else {
// For initial authentication: setup state machine transition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// For initial authentication: setup state machine transition
// For initial authentication: set up state machine transition

}

bmqu::MemOutStream errorStream;
errorStream << "Authentication timeout after " << lifetime << " ms";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error could be made clearer. If I understand correctly, it's trying to convey that the client did not reauthenticate within before the lifetime expired?

Copy link
Collaborator

@dorjesinpo dorjesinpo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few comments, mostly minor

d_scheduler_p,
d_allocators.get("Authenticator")),
d_allocator_p);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very minor (since we do not throw exceptions) but what I meant is

    bslma::ManagedPtr<mqbnet::Negotiator> negotiatorMp(
        new (*d_allocator_p)
        SessionNegotiator(&d_bufferFactory,
                          d_dispatcher_mp.get(),
                          d_statController_mp->clientsStatContext(),
                          &d_blobSpPool,
                          d_scheduler_p,
                          d_allocators.get("SessionNegotiator"),
        d_allocator_p);
    negotiatorMp
        ->setAdminCommandEnqueueCallback(
            bdlf::BindUtil::bind(&Application::enqueueCommand,
                                 this,
                                 bdlf::PlaceHolders::_1,    // source
                                 bdlf::PlaceHolders::_2,    // cmd
                                 bdlf::PlaceHolders::_3,    // onProcessedCb
                                 bdlf::PlaceHolders::_4));  // fromReroute


BSLS_ASSERT_SAFE(
d_initialConnectionContextCache.contains(initialConnectionContext_p));
bsl::shared_ptr<InitialConnectionContext> initialConnectionContext_sp =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the purpose of this seems to be to keep the object alive until this method returns since there is no other reference left. Is that correct? If so, it is worth a comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a comment for d_initialConnectionContextCache in mqbnet::tcpsessionfactory.h.

if (scheduleRc != 0) {
rc = (scheduleRc * 10) + rc_SCHEDULE_REAUTHN_FAILED;
error = scheduleErrStream.str();
return; // RETURN
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is better still to respond (with an error)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like failing to schedule reauthentication is more an internal error than something to notify the client.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should always respond to request, client logic may depend on it. In this case, it looks like an internal error and the response shoudl indicate an error.

mqbnet::AuthenticationState::e_AUTHENTICATING // state
);

context->setAuthenticationContext(authenticationContext);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant something like BSLS_ASSERT_SAFE(!d_authenticationCtxSp); in InitialConnectionContext::setAuthenticationContext

} // close mutex lock guard // UNLOCK

// Keep track of active channels, for logging purposes
++d_nbActiveChannels;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall having any problems with d_nbActiveChannels, we do not seem to check it, only log

@dorjesinpo dorjesinpo assigned emelialei88 and unassigned dorjesinpo Nov 20, 2025
@emelialei88 emelialei88 force-pushed the authn/change-negotiator branch from f1e6e18 to d7bfeb3 Compare November 21, 2025 19:25
Copy link
Collaborator

@dorjesinpo dorjesinpo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still think we always need to respond to auth request even in the case of internal error.
A minor point - since d_initialConnectionContextCache, there no need for bsl::enable_shared_from_this<InitialConnectionContext>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants