Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Background tryAdd functionality in TransactionQueue #4617

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

bboston7
Copy link
Contributor

@bboston7 bboston7 commented Jan 9, 2025

This is a draft change that will resolve #4316 when it is complete. The change makes TransactionQueue thread safe and runs the tryAdd function in the background when the feature is enabled. The implementation closely follows the
design document I wrote. The implementation still requires the main thread to re-broadcast the transactions (for now). I've opened this PR for visibility / early feedback on the implementation.

This change is very much a work in progress, with the following tasks remaining:

  • Fix catchup. I seem to have broken catchup in rebasing these changes on master. I need to figure out what is going on there and fix it.
  • Rebase on top of parallel ledger close changes
  • Fix failing tests. These are failing because they don't update TransactionQueues new snapshots correctly.
  • Rigorous testing, both for correctness and performance.
  • Address all TODOs I added
  • Improve documentation
  • I'd like to take a look at pushing the cut-point out a bit to enable flooding in the background as well. If this is a relatively simple change, I'd like to roll it into this PR. If it looks hairy, then I'll leave it for a separate change later.

@bboston7 bboston7 requested a review from marta-lokhova January 9, 2025 19:27
result = mTransactionQueue.tryAdd(tx, submittedFromSelf);
if (mApp.getConfig().BACKGROUND_TX_QUEUE && !submittedFromSelf)
{
mApp.postOnOverlayThread(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping we could move the whole flow Peer::recvTransaction to the background to avoid going to the main thread. Right now the flow is:

  • Transactions is read on the overlay thread
  • Transaction is sent to an appropriate queue on the main thread
  • Adding to the tx queue is done on the overlay thread.

I think the switch between threads will prevent us from observing performance benefits, because txs will be stuck in the main thread scheduler. Ideally, we can bypass the scheduler, and add to the queue directly.

mApp.postOnOverlayThread(
[this, tx]() { mTransactionQueue->tryAdd(tx, false); },
"try add tx");
result.code = TransactionQueue::AddResultCode::ADD_STATUS_UNKNOWN;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will go away if you move this flow to the background completely, but I wanted to point out that we don't want to introduce "unknown" add status. Right now the status is used for two purposes: to decide if the transaction should be flooded, and to return something to the user submitting the transaction. The latter one is important, because downstream systems depend on submission results. We should structure the code such that the submission result is decided synchronously during transaction submission. Note that for local transaction submission (in CommandHandler::tx), we can do it on the main thread for this first iteration.

};

// TODO: Docs
class ImmutableValidationSnapshot : public ValidationConnector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind ImmutableValidationSnapshot is definitely a step in the right direction! I think to ensure correctness and safety, we'd need to take it a step further, and store all these configs inside of the snapshot object. Otherwise, right now callers have to manage both the snapshot and soroban config/protocol version, and make sure those are refreshed at the same time, and do not drift. I opened #4607 to clean that up, since I encountered the same thing in #4543. If we implement #4607, then we'll have a canonical way of accessing read-only immutable ledger state, such that callers don't need to worry about inconsistencies.

Copy link
Contributor

@marta-lokhova marta-lokhova Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: if we go the route described above, we should be able to remove ImmutableValidationSnapshot argument from the whole checkValid flow, and use LedgerSnapshot instead to retrieve SorobanConfig/LCL/etc.

src/herder/TransactionQueue.cpp Show resolved Hide resolved
src/herder/TransactionQueue.cpp Outdated Show resolved Hide resolved
src/herder/TransactionQueue.h Show resolved Hide resolved
@@ -116,6 +133,17 @@ AppConnector::now() const
return mApp.getClock().now();
}

VirtualClock::system_time_point
AppConnector::system_now() const
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

system clock access via chrono should be thread-safe. Virtual time wasn't thread-safe, but it should be now as of latest master (added in #4543 )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize the comment in AppConnector about thread-safety was a bit misleading, because it doesn't account for virtual time (it was initially used for overlay TCP peer only, which doesn't use virtual time, but now that we're expanding its use cases, the comment is no longer accurate)

src/herder/TransactionQueue.h Outdated Show resolved Hide resolved
src/herder/TransactionQueue.cpp Outdated Show resolved Hide resolved
@@ -266,6 +270,7 @@ isDuplicateTx(TransactionFrameBasePtr oldTx, TransactionFrameBasePtr newTx)
bool
TransactionQueue::sourceAccountPending(AccountID const& accountID) const
{
std::lock_guard<std::recursive_mutex> guard(mTxQueueMutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any insights into lock contention with the current implementation of tx queue synchronization? Might be useful to turn these into RECURSIVE_LOCK_GUARD, so you can observe thread blocking in Tracy.

@marta-lokhova
Copy link
Contributor

marta-lokhova commented Jan 10, 2025

thanks for putting this together! I'm excited about moving more of transaction processing to the background. It also seems like making tx queue thread-safe is not as dangerous/scary as we thought, which is great. My two main suggestions are to consolidate all read-only ledger state into a single object to avoid footguns, and re-work recvTransaction flow a little bit to remove main thread completely (this way we'll have a synchronous flow independent of main thread, from when the message is popped off the wire all the way to tx queue insertion)

This is a *draft* change that will resolve stellar#4316 when it is complete.
The change makes `TransactionQueue` thread safe and runs the `tryAdd`
function in the background when the feature is enabled. The
implementation closely follows the
[design document](https://docs.google.com/document/d/1pU__XfEp-rR-17TNsuj-VhY6JfyendaFSYLTiq6tIj4/edit?usp=sharing)
I wrote.  The implementation still requires the main thread to
re-broadcast the transactions (for now). I've opened this PR for
visibility / early feedback on the implementation.

This change is very much a work in progress, with the following tasks
remaining:

* [ ] Fix catchup. I seem to have broken catchup in rebasing these
      changes on master. I need to figure out what is going on there and fix
      it.
* [ ] Fix failing tests. These are failing because they don't update
      `TransactionQueue`s new snapshots correctly.
* [ ] Rigorous testing, both for correctness and performance.
* [ ] I'd like to take a look at pushing the cut-point out a bit to
      enable flooding in the background as well. If this is a relatively
      simple change, I'd like to roll it into this PR. If it looks hairy,
      then I'll leave it for a separate change later.
This is what was breaking catchup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants