WIP: Background `tryAdd` functionality in TransactionQueue #4617

bboston7 · 2025-01-09T19:27:33Z

This is a draft change that will resolve #4316 when it is complete. The change makes `TransactionQueue` thread-safe and runs the `tryAdd` function in the background when the feature is enabled. The implementation closely follows the [design document](https://docs.google.com/document/d/1pU__XfEp-rR-17TNsuj-VhY6JfyendaFSYLTiq6tIj4/edit?usp=sharing) I wrote. The implementation still requires the main thread to re-broadcast the transactions (for now). I've opened this PR for visibility / early feedback on the implementation.

This change is very much a work in progress, with the following tasks remaining:

* Fix catchup. I seem to have broken catchup in rebasing these changes on master. I need to figure out what is going on there and fix it.
* Rebase on top of the parallel ledger close changes.
* Fix failing tests. These are failing because they don't update `TransactionQueue`'s new snapshots correctly.
* Rigorous testing, both for correctness and performance.
* Address all TODOs I added.
* Improve documentation.
* I'd like to take a look at pushing the cut-point out a bit to enable flooding in the background as well. If this is a relatively simple change, I'd like to roll it into this PR. If it looks hairy, then I'll leave it for a separate change later.

marta-lokhova · 2025-01-10T00:36:42Z

src/herder/HerderImpl.cpp

-        result = mTransactionQueue.tryAdd(tx, submittedFromSelf);
+        if (mApp.getConfig().BACKGROUND_TX_QUEUE && !submittedFromSelf)
+        {
+            mApp.postOnOverlayThread(


I was hoping we could move the whole `Peer::recvTransaction` flow to the background to avoid going to the main thread. Right now the flow is:

1. The transaction is read on the overlay thread.
2. The transaction is sent to an appropriate queue on the main thread.
3. Adding to the tx queue is done on the overlay thread.

I think the switch between threads will prevent us from observing performance benefits, because txs will be stuck in the main thread scheduler. Ideally, we can bypass the scheduler, and add to the queue directly.
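To illustrate the "add to the queue directly" idea, here is a minimal sketch, assuming a mutex-protected queue. `SketchTxQueue` and `receiveOnBackgroundThreads` are hypothetical names made up for this example; they are not stellar-core types, and the real `tryAdd` does validation that is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a thread-safe transaction queue; not the real
// stellar-core TransactionQueue.
class SketchTxQueue
{
  public:
    // Safe to call from any thread: the mutex serializes access to mTxs.
    bool
    tryAdd(std::string const& tx)
    {
        std::lock_guard<std::mutex> guard(mMutex);
        mTxs.push_back(tx);
        return true;
    }

    std::size_t
    size() const
    {
        std::lock_guard<std::mutex> guard(mMutex);
        return mTxs.size();
    }

  private:
    mutable std::mutex mMutex;
    std::vector<std::string> mTxs;
};

// Simulates several background threads that each "read" transactions and add
// them straight to the queue, without posting to a main-thread scheduler.
std::size_t
receiveOnBackgroundThreads(SketchTxQueue& queue, int numThreads,
                           int txsPerThread)
{
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t)
    {
        workers.emplace_back([&queue, t, txsPerThread]() {
            for (int i = 0; i < txsPerThread; ++i)
            {
                queue.tryAdd("tx-" + std::to_string(t) + "-" +
                             std::to_string(i));
            }
        });
    }
    for (auto& w : workers)
    {
        w.join();
    }
    return queue.size();
}
```

The point of the sketch is only that the thread that reads the message is also the thread that inserts it, so no transaction waits behind unrelated main-thread work.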

marta-lokhova · 2025-01-10T00:42:10Z

src/herder/HerderImpl.cpp

+            mApp.postOnOverlayThread(
+                [this, tx]() { mTransactionQueue->tryAdd(tx, false); },
+                "try add tx");
+            result.code = TransactionQueue::AddResultCode::ADD_STATUS_UNKNOWN;


I think this will go away if you move this flow to the background completely, but I wanted to point out that we don't want to introduce "unknown" add status. Right now the status is used for two purposes: to decide if the transaction should be flooded, and to return something to the user submitting the transaction. The latter one is important, because downstream systems depend on submission results. We should structure the code such that the submission result is decided synchronously during transaction submission. Note that for local transaction submission (in CommandHandler::tx), we can do it on the main thread for this first iteration.
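A rough sketch of what "decided synchronously" could look like, assuming the add happens on the calling thread. The enum values, `SketchQueue`, `SubmissionOutcome`, and `submitTransaction` are all hypothetical and only loosely modeled on the names in this thread; the real result codes and flooding logic may differ.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical result codes for illustration only.
enum class AddResultCode
{
    PENDING,
    DUPLICATE
};

// Both consumers of the status get their answer from the same call:
// the submitter (code) and the flooding logic (shouldFlood).
struct SubmissionOutcome
{
    AddResultCode code;
    bool shouldFlood;
};

// Hypothetical queue that only tracks which hashes it has seen.
class SketchQueue
{
  public:
    AddResultCode
    tryAdd(std::string const& txHash)
    {
        bool inserted = mSeen.insert(txHash).second;
        return inserted ? AddResultCode::PENDING : AddResultCode::DUPLICATE;
    }

  private:
    std::set<std::string> mSeen;
};

// The result is computed on the calling thread before returning, so the
// caller never observes an "unknown" status.
SubmissionOutcome
submitTransaction(SketchQueue& queue, std::string const& txHash)
{
    AddResultCode code = queue.tryAdd(txHash);
    return SubmissionOutcome{code, code == AddResultCode::PENDING};
}
```

With this shape, a duplicate submission gets a definite answer immediately and is not flooded again, which is the property downstream systems rely on.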

marta-lokhova · 2025-01-10T00:54:56Z

src/transactions/TransactionFrameBase.h

+};
+
+// TODO: Docs
+class ImmutableValidationSnapshot : public ValidationConnector


The idea behind `ImmutableValidationSnapshot` is definitely a step in the right direction! I think to ensure correctness and safety, we'd need to take it a step further and store all these configs inside the snapshot object. Otherwise, callers have to manage both the snapshot and the Soroban config/protocol version, and make sure they are refreshed at the same time and do not drift. I opened #4607 to clean that up, since I encountered the same thing in #4543. If we implement #4607, we'll have a canonical way of accessing read-only immutable ledger state, so callers don't need to worry about inconsistencies.

Note: if we go the route described above, we should be able to remove the `ImmutableValidationSnapshot` argument from the whole `checkValid` flow and use `LedgerSnapshot` instead to retrieve `SorobanConfig`/LCL/etc.
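One way to picture the consolidation is sketched below, assuming a single immutable struct that is swapped as a whole. `SketchLedgerSnapshot`, its fields, and `SnapshotHolder` are hypothetical; this is not the design in #4607, just an illustration of why bundling prevents drift.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical bundle of read-only state. The field names are placeholders
// for ledger state, protocol version, and Soroban config.
struct SketchLedgerSnapshot
{
    std::uint32_t ledgerSeq;
    std::uint32_t protocolVersion;
    std::uint32_t sorobanMaxTxSize;
};

class SnapshotHolder
{
  public:
    explicit SnapshotHolder(SketchLedgerSnapshot initial)
        : mCurrent(std::make_shared<SketchLedgerSnapshot>(initial))
    {
    }

    // Replaces every field together, so readers can never pair a new
    // ledger with an old config.
    void
    update(SketchLedgerSnapshot next)
    {
        std::shared_ptr<SketchLedgerSnapshot const> fresh =
            std::make_shared<SketchLedgerSnapshot>(next);
        std::lock_guard<std::mutex> guard(mMutex);
        mCurrent = std::move(fresh);
    }

    // Readers keep the returned pointer for the duration of one validation;
    // a concurrent update() does not change what they see.
    std::shared_ptr<SketchLedgerSnapshot const>
    get() const
    {
        std::lock_guard<std::mutex> guard(mMutex);
        return mCurrent;
    }

  private:
    mutable std::mutex mMutex;
    std::shared_ptr<SketchLedgerSnapshot const> mCurrent;
};
```

A validation routine would then take only the snapshot pointer and read the protocol version and config from it, instead of receiving them as separate arguments.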


marta-lokhova · 2025-01-10T01:19:44Z

src/main/AppConnector.cpp

@@ -116,6 +133,17 @@ AppConnector::now() const
    return mApp.getClock().now();
 }

+VirtualClock::system_time_point
+AppConnector::system_now() const
+{


System clock access via chrono should be thread-safe. Virtual time wasn't thread-safe, but it should be as of latest master (added in #4543).

I realize the comment in `AppConnector` about thread-safety was a bit misleading, because it doesn't account for virtual time. It was initially used for the overlay TCP peer only, which doesn't use virtual time, but now that we're expanding its use cases, the comment is no longer accurate.
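As a small illustration of the system-clock half of this comment, the sketch below reads `std::chrono::system_clock::now()` from several threads without any locking. `sampleSystemClock` is a hypothetical helper; it says nothing about the virtual clock, whose thread-safety depends on the stellar-core implementation.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

using SystemTimePoint = std::chrono::system_clock::time_point;

// Each thread reads the wall clock independently. No application-level lock
// is taken around the call to system_clock::now().
std::vector<SystemTimePoint>
sampleSystemClock(std::size_t numThreads)
{
    std::vector<SystemTimePoint> samples(numThreads);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < numThreads; ++i)
    {
        // Each thread writes only its own slot, so the vector itself needs
        // no synchronization either.
        workers.emplace_back([&samples, i]() {
            samples[i] = std::chrono::system_clock::now();
        });
    }
    for (auto& w : workers)
    {
        w.join();
    }
    return samples;
}
```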


marta-lokhova · 2025-01-10T01:56:54Z

src/herder/TransactionQueue.cpp

@@ -266,6 +270,7 @@ isDuplicateTx(TransactionFrameBasePtr oldTx, TransactionFrameBasePtr newTx)
 bool
 TransactionQueue::sourceAccountPending(AccountID const& accountID) const
 {
+    std::lock_guard<std::recursive_mutex> guard(mTxQueueMutex);


Any insights into lock contention with the current implementation of tx queue synchronization? Might be useful to turn these into RECURSIVE_LOCK_GUARD, so you can observe thread blocking in Tracy.
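For a quick signal without a profiler, contention can be approximated by counting acquisitions that had to wait. The sketch below is an assumption-laden stand-in, not Tracy's API and not the `RECURSIVE_LOCK_GUARD` macro: `ContentionCountingGuard` and `demoContention` are hypothetical names, and the count is only approximate because `try_lock` is allowed to fail spuriously.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical guard that counts how often the mutex was already held when
// a thread tried to take it.
class ContentionCountingGuard
{
  public:
    ContentionCountingGuard(std::mutex& m,
                            std::atomic<std::uint64_t>& contended)
        : mLock(m, std::try_to_lock)
    {
        if (!mLock.owns_lock())
        {
            // The fast path failed: record the contention, then block.
            contended.fetch_add(1, std::memory_order_relaxed);
            mLock.lock();
        }
    }

  private:
    std::unique_lock<std::mutex> mLock; // released on destruction
};

// Forces exactly one contended acquisition and returns the counter.
std::uint64_t
demoContention()
{
    std::mutex m;
    std::atomic<std::uint64_t> contended{0};
    m.lock(); // hold the mutex so the worker is guaranteed to find it taken
    std::thread worker([&m, &contended]() {
        ContentionCountingGuard guard(m, contended);
    });
    // Wait until the worker has recorded that it found the mutex held.
    while (contended.load() == 0)
    {
        std::this_thread::yield();
    }
    m.unlock();
    worker.join();
    return contended.load();
}
```

A profiler-instrumented lock gives much richer data (who blocked whom, and for how long); a counter like this only tells you whether contention is happening at all.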

marta-lokhova · 2025-01-10T02:01:05Z

Thanks for putting this together! I'm excited about moving more of transaction processing to the background. It also seems like making the tx queue thread-safe is not as dangerous/scary as we thought, which is great. My two main suggestions are to consolidate all read-only ledger state into a single object to avoid footguns, and to rework the `recvTransaction` flow a little to remove the main thread completely. This way we'll have a synchronous flow independent of the main thread, from when the message is popped off the wire all the way to tx queue insertion.



bboston7 requested a review from marta-lokhova January 9, 2025 19:27

marta-lokhova requested changes Jan 10, 2025


marta-lokhova reviewed Jan 10, 2025


bboston7 added 6 commits January 15, 2025 16:32

Restore protocol version check in `SorobanTransactionQueue::getMaxQueueSizeOps`

5c77e4d

Fix missing soroban network config issue

ab96235

This is what was breaking catchup

recursive_mutex -> mutex

3ccb6e1

Remove redundant passing around of protocol version

ff35c02

Fix error introduced in rebase

6410694

bboston7 force-pushed the bg-tx-queue branch from 1697978 to 6410694 Compare January 16, 2025 00:57

bboston7 added 3 commits January 16, 2025 13:47

Fix tx queue tests

139dc27

Fix integration with bg ledger close

992bccd

Typo

18e03a9
