-
Notifications
You must be signed in to change notification settings - Fork 979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Background tryAdd
functionality in TransactionQueue
#4617
base: master
Are you sure you want to change the base?
Conversation
result = mTransactionQueue.tryAdd(tx, submittedFromSelf); | ||
if (mApp.getConfig().BACKGROUND_TX_QUEUE && !submittedFromSelf) | ||
{ | ||
mApp.postOnOverlayThread( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping we could move the whole flow Peer::recvTransaction to the background to avoid going to the main thread. Right now the flow is:
- Transactions is read on the overlay thread
- Transaction is sent to an appropriate queue on the main thread
- Adding to the tx queue is done on the overlay thread.
I think the switch between threads will prevent us from observing performance benefits, because txs will be stuck in the main thread scheduler. Ideally, we can bypass the scheduler, and add to the queue directly.
mApp.postOnOverlayThread( | ||
[this, tx]() { mTransactionQueue->tryAdd(tx, false); }, | ||
"try add tx"); | ||
result.code = TransactionQueue::AddResultCode::ADD_STATUS_UNKNOWN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will go away if you move this flow to the background completely, but I wanted to point out that we don't want to introduce "unknown" add status. Right now the status is used for two purposes: to decide if the transaction should be flooded, and to return something to the user submitting the transaction. The latter one is important, because downstream systems depend on submission results. We should structure the code such that the submission result is decided synchronously during transaction submission. Note that for local transaction submission (in CommandHandler::tx), we can do it on the main thread for this first iteration.
}; | ||
|
||
// TODO: Docs | ||
class ImmutableValidationSnapshot : public ValidationConnector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea behind ImmutableValidationSnapshot is definitely a step in the right direction! I think to ensure correctness and safety, we'd need to take it a step further, and store all these configs inside of the snapshot object. Otherwise, right now callers have to manage both the snapshot and soroban config/protocol version, and make sure those are refreshed at the same time, and do not drift. I opened #4607 to clean that up, since I encountered the same thing in #4543. If we implement #4607, then we'll have a canonical way of accessing read-only immutable ledger state, such that callers don't need to worry about inconsistencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: if we go the route described above, we should be able to remove ImmutableValidationSnapshot argument from the whole checkValid flow, and use LedgerSnapshot instead to retrieve SorobanConfig/LCL/etc.
@@ -116,6 +133,17 @@ AppConnector::now() const | |||
return mApp.getClock().now(); | |||
} | |||
|
|||
VirtualClock::system_time_point | |||
AppConnector::system_now() const | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
system clock access via chrono should be thread-safe. Virtual time wasn't thread-safe, but it should be now as of latest master (added in #4543 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize the comment in AppConnector about thread-safety was a bit misleading, because it doesn't account for virtual time (it was initially used for overlay TCP peer only, which doesn't use virtual time, but now that we're expanding its use cases, the comment is no longer accurate)
src/herder/TransactionQueue.cpp
Outdated
@@ -266,6 +270,7 @@ isDuplicateTx(TransactionFrameBasePtr oldTx, TransactionFrameBasePtr newTx) | |||
bool | |||
TransactionQueue::sourceAccountPending(AccountID const& accountID) const | |||
{ | |||
std::lock_guard<std::recursive_mutex> guard(mTxQueueMutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any insights into lock contention with the current implementation of tx queue synchronization? Might be useful to turn these into RECURSIVE_LOCK_GUARD, so you can observe thread blocking in Tracy.
thanks for putting this together! I'm excited about moving more of transaction processing to the background. It also seems like making tx queue thread-safe is not as dangerous/scary as we thought, which is great. My two main suggestions are to consolidate all read-only ledger state into a single object to avoid footguns, and re-work recvTransaction flow a little bit to remove main thread completely (this way we'll have a synchronous flow independent of main thread, from when the message is popped off the wire all the way to tx queue insertion) |
This is a *draft* change that will resolve stellar#4316 when it is complete. The change makes `TransactionQueue` thread safe and runs the `tryAdd` function in the background when the feature is enabled. The implementation closely follows the [design document](https://docs.google.com/document/d/1pU__XfEp-rR-17TNsuj-VhY6JfyendaFSYLTiq6tIj4/edit?usp=sharing) I wrote. The implementation still requires the main thread to re-broadcast the transactions (for now). I've opened this PR for visibility / early feedback on the implementation. This change is very much a work in progress, with the following tasks remaining: * [ ] Fix catchup. I seem to have broken catchup in rebasing these changes on master. I need to figure out what is going on there and fix it. * [ ] Fix failing tests. These are failing because they don't update `TransactionQueue`s new snapshots correctly. * [ ] Rigorous testing, both for correctness and performance. * [ ] I'd like to take a look at pushing the cut-point out a bit to enable flooding in the background as well. If this is a relatively simple change, I'd like to roll it into this PR. If it looks hairy, then I'll leave it for a separate change later.
This is what was breaking catchup
This is a draft change that will resolve #4316 when it is complete. The change makes
TransactionQueue
thread safe and runs thetryAdd
function in the background when the feature is enabled. The implementation closely follows thedesign document I wrote. The implementation still requires the main thread to re-broadcast the transactions (for now). I've opened this PR for visibility / early feedback on the implementation.
This change is very much a work in progress, with the following tasks remaining:
TransactionQueue
s new snapshots correctly.