Make futures send #560


Closed · wants to merge 12 commits

Conversation

@LLFourn (Contributor) commented Mar 8, 2022

Draft PR that will eventually fix #559 #165 #521

Don't review yet; this is just here so coauthors @thomaseizinger @nickfarrow @notmandatory can keep track.

LLFourn and others added 11 commits February 24, 2022 20:39
Although somewhat convenient to have, coupling the Wallet with
the blockchain trait causes development friction and complexity.
What if sometimes the wallet is "offline" (no access to the blockchain)
but sometimes it's online?
The only thing the Wallet needs the blockchain for is to sync.
But not all applications will even use the sync method and the sync
method doesn't require the full blockchain functionality.
So we instead pass the blockchain in when we want to sync.
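
A rough sketch of the shape this gives (all names and signatures here are illustrative, not the final API):

```rust
// Illustrative sketch: the wallet no longer owns a blockchain; a backend is
// only borrowed for the duration of a sync.
fn sync_when_online<B: Blockchain, D: Database>(
    wallet: &Wallet<D>, // no blockchain type parameter anymore
    blockchain: &B,     // passed in only when we actually want to sync
) -> Result<(), Error> {
    wallet.sync(blockchain, SyncOptions::default())
}
```

Everything else (address derivation, transaction building, signing) keeps working with no blockchain in scope.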

- To further reduce the coupling with blockchain I removed the get_height call from `new` and just use the height of the
last sync in the database.
- I split up the blockchain trait a bit into subtraits, to make supporting verify_tx easier.
- The current options are awkward and it would be good if we could introduce more in the future without breaking changes.
For some reason while doing the "remove blockchain from wallet" PR I
changed the tests around in what I thought was a benign way. But it
meant (apparently) that both parties in "test_sync_double_receive" were
using the same "blockchain". This meant that when the blockchain was RPC
they both imported their addresses to it and got each other's results
when syncing. This bugged out the sync and this commit fixes that.
You can do this with ensure_addresses_cached if you really want to.
@johncantrell97 (Contributor)

Probably not directly relevant for this PR, but @notmandatory thought it might be helpful to share some recent thoughts I had about syncing a wallet when working with LDK and BDK.

LDK provides the lightning-block-sync utility (https://docs.rs/lightning-block-sync/0.0.105/lightning_block_sync/) which helps bring and keep your channel state in sync with the chain by polling a BlockSource. I was thinking since I'm already using this for ldk it would be awesome to use the same interface for keeping the bdk wallet/database in sync.

I guess one way to look at it would be to implement BlockSource for a Blockchain and then use the Blockchain to keep channel state in sync.

The other way was to implement ldk's chain::Listen interface for either bdk's Wallet or Database objects so they could be fed blocks directly using lightning-block-sync utility and then use those blocks to update the database without needing a Blockchain at all.

I guess you could actually do both of these ideas. The latter is an alternative 'sync' method where the user is in charge of pushing blocks to the wallet instead of just calling "sync".

I'm not sure if these ideas are relevant to this work but thought I'd share just in case.

@thomaseizinger (Contributor) commented Mar 8, 2022

They are somewhat relevant, thanks for sharing!

Our sync interface will center around scripts. You should be able to track a lightning channel by taking the multi-sig or punish transaction script and "syncing" that :)

@LLFourn (Contributor, Author) commented Mar 9, 2022

> The other way was to implement ldk's chain::Listen interface for either bdk's Wallet or Database objects so they could be fed blocks directly using lightning-block-sync utility and then use those blocks to update the database without needing a Blockchain at all.

I don't think it makes sense to feed blocks to the database directly. What if the block you are feeding is not the child of the last block you fed? In LDK they have a BlockSource which must be able to provide any block given its header hash, which makes this problem easy to solve.

I think what you are after is being able to sync quickly every time there's a new block. Indeed our work here should make this much easier. So the API we are aiming for using esplora or electrum etc would be:

```rust
let sync_session = wallet.start_sync(options)?;
let update = esplora.sync(&sync_session)?;
wallet.apply_update(update)?;
```

So with a more real-time block approach we could make this something like:

```rust
loop {
    let update = block_source.look_for_updates(addresses).await;
    wallet.apply_update(update);
}
```

So every time block_source finds new transactions related to the addresses it creates a wallet update which is applied immediately, and then it goes back to looking.

The tricky thing will be updating addresses as the wallet produces more of them. But there are many ways you could go about this and it will depend on your application.

@johncantrell97 (Contributor)

Yeah, that API would work for an esplora/electrum-style implementation.

As for making sure you feed blocks in order, all you need to do is track the last block synced. LDK's "lightning-block-sync" module takes an object that implements Listen and its last known good BlockHash, and can bring it to tip and keep it there.

So you just need to implement logic for updating the wallet database for the block connected and block disconnected events.
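
A minimal sketch of what that might look like (the Listen signatures follow the lightning 0.0.105 docs linked above; the wallet-side pieces are hypothetical):

```rust
use bitcoin::{Block, BlockHeader};
use lightning::chain::Listen;

// Hypothetical wrapper that feeds blocks straight into a wallet database.
struct WalletListener { /* handle to the wallet's database */ }

impl Listen for WalletListener {
    fn block_connected(&self, block: &Block, height: u32) {
        // block connected: scan block.txdata for transactions paying our
        // scripts and record them as confirmed at `height`
    }

    fn block_disconnected(&self, header: &BlockHeader, height: u32) {
        // block disconnected (reorg): un-confirm anything recorded at `height`
    }
}
```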

I ended up going this route for now, but it's a bit of a hack given the limitations of the current bdk api.

I'll watch this PR; hopefully once it's finalized it will allow for a cleaner integration.

@LLFourn marked this pull request as draft March 10, 2022 21:17
@LLFourn (Contributor, Author) commented Mar 10, 2022

We've sketched some further changes to the syncing on the ureq esplora implementation. Here's an interesting question that came up, along with some proposed answers:

  1. How do we replace the Progress trait?
  2. @afilini mentioned that we could just have a Box<dyn FnMut(usize)> passed through and then there's no need for a trait (see the sketch below).
  3. @thomaseizinger mentioned that we could track progress by items being consumed from the iterator.
  4. Alternatively we could return an iterator of transactions associated with each address from the ureq implementation, which would make it easy to track progress. It would also mean we could get rid of last_active_index tracking from the ureq implementation.
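
A minimal sketch of option 2 (all names hypothetical):

```rust
// Hypothetical sketch: a plain boxed callback in place of the Progress trait.
pub type Progress = Box<dyn FnMut(usize) + Send>;

fn sync_scripts<S>(scripts: &[S], mut progress: Option<Progress>) {
    for (done, _script) in scripts.iter().enumerate() {
        // ... query the backend for this script's history ...
        if let Some(cb) = progress.as_mut() {
            cb(done + 1); // report how many scripts have been processed
        }
    }
}
```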

@LLFourn
Copy link
Contributor Author

LLFourn commented Mar 15, 2022

Point 4 above doesn't really work since we need to know additional information, like the block height/hash the information is valid for. We can't return any results until we know this was consistent from start to finish.

I think most of the work done so far here should be replaced with an incremental update approach, where a full sync is just what happens when you use a stop gap to find updates. Here's some more detail:

```rust
// Generated by the descriptor watcher
struct SyncSession<I> {
    checkpoints: Vec<(u32, BlockHash)>,
    scripts: I, // an iterator of scripts
}

// Generated by the blockchain
struct SyncUpdate {
    transactions: Vec<Transaction>,
    checkpoint_invalidate: Option<(u32, BlockHash)>,
    new_checkpoint: (u32, BlockHash),
}
```

This is missing some detail that's in #165, but the point is that when you start a sync you give the blockchain a list of checkpoints (height and hash) that you know about. The blockchain will then try to invalidate one or more of the checkpoints by finding the oldest checkpoint that is no longer valid (the block doesn't exist). It also returns a new checkpoint.

The descriptor watcher then deletes transactions if a checkpoint was invalidated. It then adds the new checkpoint to its list of checkpoints.
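
A sketch of that apply step (hypothetical names and helpers; the update would also need the heights transactions confirmed at, which is part of the missing detail):

```rust
// Hypothetical watcher state, just enough to illustrate the flow.
struct DescriptorWatcher {
    checkpoints: Vec<(u32, BlockHash)>,
    // ... transaction store etc.
}

impl DescriptorWatcher {
    fn apply_update(&mut self, update: SyncUpdate) {
        if let Some((height, _hash)) = update.checkpoint_invalidate {
            // Roll back: drop checkpoints and transactions recorded at or
            // above the oldest invalidated height.
            self.checkpoints.retain(|(h, _)| *h < height);
            self.delete_transactions_from(height); // hypothetical helper
        }
        self.checkpoints.push(update.new_checkpoint);
        self.add_transactions(update.transactions); // hypothetical helper
    }
}
```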

thoughts?

@notmandatory (Member)

This incremental scheme reminds me a bit of how CBF blockchain sync works (#81), comparing block height/hash checkpoints with peers before downloading filters.

Is a Database (with a Wallet) an example of a descriptor watcher? If using an async Database and Blockchain, would it look something like below?

```rust
let sync_session = my_database.sync_session(&my_wallet).await?; // get checkpoints from db, scripts from wallet
let sync_update = my_blockchain.sync(sync_session).await?; // plus some sort of progress callback
my_database.update(sync_update).await?; // if database updates are large may need a progress callback here too
my_database.list_transactions().await?; // a bit awkward but I'm not sure how else to do this async
```

@LLFourn (Contributor, Author) commented Mar 15, 2022

> This incremental scheme reminds me a bit of how CBF blockchain sync works (#81), comparing block height/hash checkpoints with peers before downloading filters.

Good. I hope this makes it easier.

> Is a Database (with a Wallet) an example of a descriptor watcher? If using an async Database and Blockchain, would it look something like below?

Yeah, a DescriptorWatcher (or maybe DescriptorManager) is essentially what a Wallet is now, except it only has one keychain and maybe doesn't have signers. It just keeps a database of related stuff and processes updates. So yes, a bit of a wrapper around a database (maybe!).

I dunno how we're going to integrate async databases into this atm. What @thomaseizinger has pointed out to me is that you can pretty much have all txs and indexes associated with your wallet in memory at the same time without it costing much. Sure, an exchange with hundreds of thousands of txs will need a gigabyte of memory, but that's fine.

If we want async storage then what we might wanna do is provide a way of syncing the in-memory database back (and maybe forth?) with a more permanent database via an event sourcing kind of approach, which could be async. I don't have a strong vision here though, but I do want to try and avoid more async traits.
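
A minimal sketch of that idea (all names hypothetical):

```rust
use bitcoin::{BlockHash, Transaction};

// Hypothetical: the in-memory store stages its changes as events so a
// (possibly async) persistence layer can drain and apply them later,
// keeping async out of the core traits.
enum DbEvent {
    TxAdded(Transaction),
    CheckpointAdded(u32, BlockHash),
    CheckpointInvalidated(u32),
}

#[derive(Default)]
struct MemoryDb {
    staged: Vec<DbEvent>,
    // ... the actual indexes live here ...
}

impl MemoryDb {
    /// Hand over the staged events to whatever persists them.
    fn take_staged(&mut self) -> Vec<DbEvent> {
        std::mem::take(&mut self.staged)
    }
}
```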

@thomaseizinger (Contributor)

> Point 4 above doesn't really work since we need to know additional information, like the block height/hash the information is valid for. We can't return any results until we know this was consistent from start to finish.
>
> I think most of the work done so far here should be replaced with an incremental update approach, where a full sync is just what happens when you use a stop gap to find updates. Here's some more detail:
>
> ```rust
> // Generated by the descriptor watcher
> struct SyncSession<I> {
>     checkpoints: Vec<(u32, BlockHash)>,
>     scripts: I, // an iterator of scripts
> }
>
> // Generated by the blockchain
> struct SyncUpdate {
>     transactions: Vec<Transaction>,
>     checkpoint_invalidate: Option<(u32, BlockHash)>,
>     new_checkpoint: (u32, BlockHash),
> }
> ```
>
> This is missing some detail that's in #165, but the point is that when you start a sync you give the blockchain a list of checkpoints (height and hash) that you know about. The blockchain will then try to invalidate one or more of the checkpoints by finding the oldest checkpoint that is no longer valid (the block doesn't exist). It also returns a new checkpoint.
>
> The descriptor watcher then deletes transactions if a checkpoint was invalidated. It then adds the new checkpoint to its list of checkpoints.
>
> thoughts?

I think this makes sense. What I'd like to mention is that from a dependency PoV, having a (shared) struct is maybe not a good idea. If you want to move each blockchain backend into its own crate, they would all have to depend on some shared crate (other than bdk itself) that contains these structs, so that the bdk facade crate can then bundle all backends together and re-export them. But maybe we need such a shared "core" crate anyway, so those structs could go in there.

@thomaseizinger (Contributor)

One more question: Why would we need to pass in multiple checkpoints per script iterator? Wouldn't one be enough assuming that script iterator == descriptor and thus we call this backend API multiple times?

@thomaseizinger (Contributor) commented Mar 15, 2022

> If we want async storage then what we might wanna do is provide a way of syncing the in-memory database back (and maybe forth?) with a more permanent database via an event sourcing kind of approach, which could be async.

Just a minor note: the syncing can be achieved in various ways, not necessarily just with event sourcing, although it might lend itself naturally to it: record all changes to the memory database as events and, upon sync, apply them to the persistent one (but don't store the events there).

@LLFourn (Contributor, Author) commented Mar 15, 2022

> Just a minor note: the syncing can be achieved in various ways, not necessarily just with event sourcing, although it might lend itself naturally to it: record all changes to the memory database as events and, upon sync, apply them to the persistent one (but don't store the events there).

I think this can be modelled by the sync checkpoints themselves. Each sync checkpoint represents a change in the database, and if we can make it easy to find all changes associated with each checkpoint I think we've won (that also makes it easy to roll back a checkpoint if it becomes invalid).
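
A minimal sketch of changes indexed by checkpoint (hypothetical names):

```rust
use std::collections::BTreeMap;
use bitcoin::Txid;

// Hypothetical: key each changeset by the checkpoint height that introduced
// it, so rolling back an invalidated checkpoint just drops entries.
#[derive(Default)]
struct CheckpointedChanges {
    txids_added_at: BTreeMap<u32, Vec<Txid>>,
}

impl CheckpointedChanges {
    fn invalidate_from(&mut self, height: u32) {
        // split_off keeps entries below `height`, returning the rest.
        let invalidated = self.txids_added_at.split_off(&height);
        for (_, txids) in invalidated {
            // remove these txids from the transaction store (not shown)
            let _ = txids;
        }
    }
}
```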

> One more question: Why would we need to pass in multiple checkpoints per script iterator? Wouldn't one be enough assuming that script iterator == descriptor and thus we call this backend API multiple times?

I don't understand what you meant in the second sentence. The reason why you pass in multiple is so that we know which checkpoints are invalid and which ones are still valid, so we can apply the change even if one of them was invalid.

@thomaseizinger (Contributor) commented Mar 16, 2022

> One more question: Why would we need to pass in multiple checkpoints per script iterator? Wouldn't one be enough assuming that script iterator == descriptor and thus we call this backend API multiple times?
>
> I don't understand what you meant in the second sentence. The reason why you pass in multiple is so that we know which checkpoints are invalid and which ones are still valid, so we can apply the change even if one of them was invalid.

I don't understand that. The checkpoint is the current height as known to the wallet / database, right? Why is it not good enough for the sync backend to be told the latest height that we have in the database? The sync backend function (the one we prototyped for esplora) should IMO outright fail if the hash at the given height is no longer the same.

As a result, the overall sync function has to grab an earlier checkpoint from the DB and try syncing again. You wouldn't want to reimplement this logic for every backend, so I think it shouldn't live in there but in an overall sync function that looks like this:

```rust
mod bdk {
    // Sub-module for blocking API, modeled after `reqwest::blocking`
    mod blocking {
        pub fn sync(backend: &impl FetchRelatedTransactionsBlocking, wallet: &Wallet) {
            // for each descriptor in wallet
            // fetch related transactions
            // apply result to wallet

            // if any error during syncing, apply fallback logic;
            // for example, if the sync backend fails because the hash at a
            // height differs, restart this loop and grab an earlier
            // checkpoint from the wallet
        }
    }

    pub async fn sync(backend: &impl FetchRelatedTransactions, wallet: &Wallet) {
        // same as above but async version of `FetchRelatedTransactions`
    }
}
```
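
Usage would then be uniform across backends, something like (hypothetical backend type):

```rust
// Any backend implementing the blocking fetch trait would work here.
let backend = EsploraBlockingClient::new("https://blockstream.info/api"); // hypothetical
bdk::blocking::sync(&backend, &wallet);
```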

Edit: IMO we should strive for the logic to be implemented by every backend to be as minimal as possible and try to reuse as much as we can across all of them.

@LLFourn (Contributor, Author) commented Mar 17, 2022

@thomaseizinger Ok I think I can get on board with that approach. I don't mind leaving it up to another layer of logic to collect all the invalid checkpoints!

@thomaseizinger (Contributor)

> @thomaseizinger Ok I think I can get on board with that approach. I don't mind leaving it up to another layer of logic to collect all the invalid checkpoints!

What I'd like to point out is that such a design will push us towards making FetchRelatedTransactions, or whatever we call it, a good building block. If for whatever reason our sync logic is not suitable for a bdk user, they can always copy-paste the above and modify it to their liking, and things will still work!

notmandatory added a commit that referenced this pull request Apr 6, 2022
adef166 Create vector of thread handles to spawn threads (nickfarrow)

Pull request description:

  ### Description
  Speeds up esplora ureq syncing. Taken from #560

  The current sync creates a map from scripts to join handles which, due to lazy evaluation, doesn't yet spawn the sync threads. In the following `handles.map()`, the thread handles are *sequentially* evaluated and joined. With the fix, the handles are collected first so that the threads spawn in parallel, and only then joined.
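
  A sketch of the change (hypothetical helper `fetch_history`; the real diff is in commit adef166 above):

  ```rust
  // Collecting forces every thread to spawn before we start joining.
  #[allow(clippy::needless_collect)]
  let handles: Vec<_> = scripts
      .into_iter()
      .map(|script| std::thread::spawn(move || fetch_history(script)))
      .collect(); // all threads are now running in parallel

  let results: Vec<_> = handles
      .into_iter()
      .map(|handle| handle.join().expect("thread panicked"))
      .collect();
  ```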

  ### Notes to the reviewers

  I had to add a `#[allow(clippy::needless_collect)]` so that it wouldn't complain about collecting and then iterating. (Perhaps clippy is partially responsible for this issue!)

  Tested sync performance by doing a fresh sync on an existing [gun](https://gun.fun) wallet.
  ```
  ---- Before fix ---
  real    0m13.121s
  real    0m13.367s
  real    0m13.211s

  ---- After fix ----
  real    0m5.516s
  real    0m5.251s
  real    0m5.594s
  ```

  ### Checklists

  #### All Submissions:

  * [x] I've signed all my commits
  * [x] I followed the [contribution guidelines](https://github.com/bitcoindevkit/bdk/blob/master/CONTRIBUTING.md)
  * [x] I ran `cargo fmt` and `cargo clippy` before committing

  #### Bugfixes:

  * [ ] This pull request breaks the existing API
  * [ ] I've added tests to reproduce the issue which are now passing
  * [ ] I'm linking the issue being fixed by this PR

ACKs for top commit:
  notmandatory:
    ACK adef166

Tree-SHA512: 47c617117afde9b4706bfa63759bf06d1ec60ff95d8a80931e5b7e40e3293c855d2f7dac0c681173d43aecf77201a842e739b82291da09ac81909cf526a51c8d

- Removed `Blockchain` from Wallet.
- Removed `Wallet::broadcast` (just use `Blockchain::broadcast`)
- Depreciated `Wallet::new_offline` (all wallets are offline now)


*Deprecated

```diff
 ) -> Result<(), Error> {
     use crate::blockchain::script_sync::Request;
-    let mut request = script_sync::start(database, self.stop_gap)?;
+    let mut request = script_sync::start(database, 20)?;
```


I'd suggest bringing this magic number out into a constant
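
Something like (hypothetical name):

```rust
const DEFAULT_STOP_GAP: usize = 20;
// ...
let mut request = script_sync::start(database, DEFAULT_STOP_GAP)?;
```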

@cryptoquick left a comment

This is a clever approach. I just thought of another case where I could really make use of a unified interface between wasm and desktop builds; I'm looking to make a benchmark without it needing to be browser-side, so I took the time to look things over. You have my concept ACK.

@LLFourn
Copy link
Contributor Author

LLFourn commented Jun 10, 2022

hey @cryptoquick, this idea ballooned into https://bitcoindevkit.org/blog/bdk-core-pt1/
This is the approach I'm taking to solve this now. Feedback really welcome.

I'll close this.

@LLFourn closed this Jun 10, 2022
Successfully merging this pull request may close these issues:

maybe_async complicates isomorphic use-cases