Make futures send #560


Closed · wants to merge 12 commits

Conversation

@LLFourn (Contributor) commented Mar 8, 2022

Draft PR that will eventually fix #559 #165 #521

Don't review yet; this is just here so coauthors @thomaseizinger @nickfarrow @notmandatory can keep track.

LLFourn and others added 11 commits February 24, 2022 20:39
Although somewhat convenient to have, coupling the Wallet with
the blockchain trait causes development friction and complexity.
What if sometimes the wallet is "offline" (no access to the blockchain)
but sometimes it's online?
The only thing the Wallet needs the blockchain for is to sync.
But not all applications will even use the sync method and the sync
method doesn't require the full blockchain functionality.
So we instead pass the blockchain in when we want to sync.
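
A rough sketch of the shape this gives (all names and signatures here are illustrative, not the final API):

```rust
// Illustrative sketch: the wallet no longer owns a blockchain; a backend is
// only borrowed for the duration of a sync.
fn sync_when_online<B: Blockchain, D: Database>(
    wallet: &Wallet<D>, // no blockchain type parameter anymore
    blockchain: &B,     // passed in only when we actually want to sync
) -> Result<(), Error> {
    wallet.sync(blockchain, SyncOptions::default())
}
```

Everything else (address derivation, transaction building, signing) keeps working with no blockchain in scope.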

- To further reduce the coupling with blockchain I removed the get_height call from `new` and just use the height of the
last sync in the database.
- I split up the blockchain trait a bit into subtraits, to make supporting verify_tx easier.
- The current options are awkward and it would be good if we could introduce more in the future without breaking changes.
For some reason while doing the "remove blockchain from wallet" PR I
changed the tests around in what I thought was a benign way. But it
meant (apparently) that both parties in "test_sync_double_receive" were
using the same "blockchain". This meant that when the blockchain was RPC
they both imported their addresses to it and got each other's results
when syncing. This bugged out the sync and this commit fixes that.
You can do this with ensure_addresses_cached if you really want to.
@johncantrell97 (Contributor)

Probably not directly relevant for this PR, but @notmandatory thought it might be helpful to share some recent thoughts I had about syncing a wallet when working with LDK and BDK.

LDK provides the lightning-block-sync utility (https://docs.rs/lightning-block-sync/0.0.105/lightning_block_sync/) which helps bring and keep your channel state in sync with the chain by polling a BlockSource. I was thinking since I'm already using this for ldk it would be awesome to use the same interface for keeping the bdk wallet/database in sync.

I guess one way to look at it would be to implement BlockSource for a Blockchain and then use the Blockchain to keep channel state in sync.

The other way was to implement ldk's chain::Listen interface for either bdk's Wallet or Database objects so they could be fed blocks directly using lightning-block-sync utility and then use those blocks to update the database without needing a Blockchain at all.

I guess you could actually do both of these ideas. The latter is an alternative 'sync' method where the user is in charge of pushing blocks to the wallet instead of just calling "sync".

I'm not sure if these ideas are relevant to this work but thought I'd share just in case.

@thomaseizinger (Contributor) commented Mar 8, 2022

They are somewhat relevant, thanks for sharing!

Our sync interface will center around scripts. You should be able to track a lightning channel by taking the multi-sig or punish transaction script and "syncing" that :)

@LLFourn (Contributor, Author) commented Mar 9, 2022

> The other way was to implement ldk's chain::Listen interface for either bdk's Wallet or Database objects so they could be fed blocks directly using lightning-block-sync utility and then use those blocks to update the database without needing a Blockchain at all.

I don't think it makes sense to feed blocks to the database directly. What if the block you are feeding is not the child of the last block you fed? In LDK they have a BlockSource which must be able to provide any block given its header hash, which makes this problem easy to solve.

I think what you are after is being able to sync quickly every time there's a new block. Indeed our work here should make this much easier. So the API we are aiming for using esplora or electrum etc would be:

```rust
let sync_session = wallet.start_sync(options)?;
let update = esplora.sync(&sync_session)?;
wallet.apply_update(update)?;
```

So with a more real-time block approach we could make this something like:

```rust
loop {
    let update = block_source.look_for_updates(addresses).await;
    wallet.apply_update(update);
}
```

So every time block_source finds new transactions related to the addresses it creates a wallet update which is applied immediately, and then it goes back to looking.

The tricky thing will be updating addresses as the wallet produces more of them. But there are many ways you could go about this and it will depend on your application.

@johncantrell97 (Contributor)

Yeah, that API would work for an esplora/electrum-style implementation.

As for making sure you feed blocks in order, all you need to do is track the last block synced. LDK's "lightning-block-sync" module takes an object that implements Listen and its last known good BlockHash, and can bring it to tip and keep it there.

So you just need to implement logic for updating the wallet database for the block connected and block disconnected events.
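
A minimal sketch of what that might look like (the Listen signatures follow the lightning 0.0.105 docs linked above; the wallet-side pieces are hypothetical):

```rust
use bitcoin::{Block, BlockHeader};
use lightning::chain::Listen;

// Hypothetical wrapper that feeds blocks straight into a wallet database.
struct WalletListener { /* handle to the wallet's database */ }

impl Listen for WalletListener {
    fn block_connected(&self, block: &Block, height: u32) {
        // block connected: scan block.txdata for transactions paying our
        // scripts and record them as confirmed at `height`
    }

    fn block_disconnected(&self, header: &BlockHeader, height: u32) {
        // block disconnected (reorg): un-confirm anything recorded at `height`
    }
}
```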

I ended up going this route for now, but it's a bit of a hack given the limitations of the current bdk api.

I'll watch this PR; hopefully once it's finalized it will allow for a cleaner integration.

@LLFourn marked this pull request as draft March 10, 2022 21:17
@LLFourn (Contributor, Author) commented Mar 10, 2022

We've sketched some further changes to the syncing on the ureq esplora implementation. Here's an interesting question that came up, along with some proposed answers:

  1. How do we replace the Progress trait?
  2. @afilini mentioned that we could just have a Box<dyn FnMut(usize)> passed through and then there's no need for a trait (see the sketch below).
  3. @thomaseizinger mentioned that we could track progress by items being consumed from the iterator.
  4. Alternatively we could return an iterator of transactions associated with each address from the ureq implementation, which would make it easy to track progress. It would also mean we could get rid of last_active_index tracking from the ureq implementation.
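
A minimal sketch of option 2 (all names hypothetical):

```rust
// Hypothetical sketch: a plain boxed callback in place of the Progress trait.
pub type Progress = Box<dyn FnMut(usize) + Send>;

fn sync_scripts<S>(scripts: &[S], mut progress: Option<Progress>) {
    for (done, _script) in scripts.iter().enumerate() {
        // ... query the backend for this script's history ...
        if let Some(cb) = progress.as_mut() {
            cb(done + 1); // report how many scripts have been processed
        }
    }
}
```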

@LLFourn
Copy link
Contributor Author

LLFourn commented Mar 15, 2022

Point 4 above doesn't really work since we need to know additional information, like the block height/hash the information is valid for. We can't return any results until we know this was consistent from start to finish.

I think most of the work done so far here should be replaced with an incremental update approach, where a full sync is just what happens when you use a stop gap to find updates. Here's some more detail:

```rust
// Generated by the descriptor watcher
struct SyncSession<I> {
    checkpoints: Vec<(u32, BlockHash)>,
    scripts: I, // an iterator of scripts
}

// Generated by the blockchain
struct SyncUpdate {
    transactions: Vec<Transaction>,
    checkpoint_invalidate: Option<(u32, BlockHash)>,
    new_checkpoint: (u32, BlockHash),
}
```

This is missing some detail that's in #165, but the point is that when you start a sync you give the blockchain a list of checkpoints (height and hash) that you know about. The blockchain will then try to invalidate one or more of the checkpoints by finding the oldest checkpoint that is no longer valid (the block doesn't exist). It also returns a new checkpoint.

The descriptor watcher then deletes transactions if a checkpoint was invalidated. It then adds the new checkpoint to its list of checkpoints.
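
A sketch of that apply step (hypothetical names and helpers; the update would also need the heights transactions confirmed at, which is part of the missing detail):

```rust
// Hypothetical watcher state, just enough to illustrate the flow.
struct DescriptorWatcher {
    checkpoints: Vec<(u32, BlockHash)>,
    // ... transaction store etc.
}

impl DescriptorWatcher {
    fn apply_update(&mut self, update: SyncUpdate) {
        if let Some((height, _hash)) = update.checkpoint_invalidate {
            // Roll back: drop checkpoints and transactions recorded at or
            // above the oldest invalidated height.
            self.checkpoints.retain(|(h, _)| *h < height);
            self.delete_transactions_from(height); // hypothetical helper
        }
        self.checkpoints.push(update.new_checkpoint);
        self.add_transactions(update.transactions); // hypothetical helper
    }
}
```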

thoughts?

@notmandatory (Member)

This incremental scheme reminds me a bit of how CBF blockchain sync works (#81), comparing block height/hash checkpoints with peers before downloading filters.

Is a Database (with a Wallet) an example of a descriptor watcher? If using an async Database and Blockchain, would it look something like below?

```rust
let sync_session = my_database.sync_session(&my_wallet).await?; // get checkpoints from db, scripts from wallet
let sync_update = my_blockchain.sync(sync_session).await?; // plus some sort of progress callback
my_database.update(sync_update).await?; // if database updates are large may need a progress callback here too
my_database.list_transactions().await?; // a bit awkward but I'm not sure how else to do this async
```

@LLFourn (Contributor, Author) commented Mar 15, 2022

> This incremental scheme reminds me a bit of how CBF blockchain sync works (#81), comparing block height/hash checkpoints with peers before downloading filters.

Good. I hope this makes it easier.

> Is a Database (with a Wallet) an example of a descriptor watcher? If using an async Database and Blockchain, would it look something like below?

Yeah, a DescriptorWatcher (or maybe DescriptorManager) is essentially what a Wallet is now, except it only has one keychain and maybe doesn't have signers. It just keeps a database of related stuff and processes updates. So yes, a bit of a wrapper around a database (maybe!).

I dunno how we're going to integrate async databases into this atm. What @thomaseizinger has pointed out to me is that you can pretty much have all txs and indexes associated with your wallet in memory at the same time without it costing much. Sure, an exchange with hundreds of thousands of txs will need a gigabyte of memory, but that's fine.

If we want async storage then what we might wanna do is provide a way of syncing the in-memory database back (and maybe forth?) with a more permanent database via an event sourcing kind of approach, which could be async. I don't have a strong vision here though, but I do want to try and avoid more async traits.
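
A minimal sketch of that idea (all names hypothetical):

```rust
use bitcoin::{BlockHash, Transaction};

// Hypothetical: the in-memory store stages its changes as events so a
// (possibly async) persistence layer can drain and apply them later,
// keeping async out of the core traits.
enum DbEvent {
    TxAdded(Transaction),
    CheckpointAdded(u32, BlockHash),
    CheckpointInvalidated(u32),
}

#[derive(Default)]
struct MemoryDb {
    staged: Vec<DbEvent>,
    // ... the actual indexes live here ...
}

impl MemoryDb {
    /// Hand over the staged events to whatever persists them.
    fn take_staged(&mut self) -> Vec<DbEvent> {
        std::mem::take(&mut self.staged)
    }
}
```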

@thomaseizinger (Contributor)

> Point 4 above doesn't really work since we need to know additional information, like the block height/hash the information is valid for. We can't return any results until we know this was consistent from start to finish.
>
> I think most of the work done so far here should be replaced with an incremental update approach, where a full sync is just what happens when you use a stop gap to find updates. Here's some more detail:
>
> ```rust
> // Generated by the descriptor watcher
> struct SyncSession<I> {
>     checkpoints: Vec<(u32, BlockHash)>,
>     scripts: I, // an iterator of scripts
> }
>
> // Generated by the blockchain
> struct SyncUpdate {
>     transactions: Vec<Transaction>,
>     checkpoint_invalidate: Option<(u32, BlockHash)>,
>     new_checkpoint: (u32, BlockHash),
> }
> ```
>
> This is missing some detail that's in #165, but the point is that when you start a sync you give the blockchain a list of checkpoints (height and hash) that you know about. The blockchain will then try to invalidate one or more of the checkpoints by finding the oldest checkpoint that is no longer valid (the block doesn't exist). It also returns a new checkpoint.
>
> The descriptor watcher then deletes transactions if a checkpoint was invalidated. It then adds the new checkpoint to its list of checkpoints.
>
> thoughts?

I think this makes sense. What I'd like to mention is that from a dependency PoV, having a (shared) struct is maybe not a good idea. If you want to move each blockchain backend into its own crate, they would all have to depend on some shared crate (other than bdk itself) that contains these structs, so that the bdk facade crate can then bundle all backends together and re-export them. But maybe we need such a shared "core" crate anyway, so those structs could go in there.

@thomaseizinger (Contributor)

One more question: Why would we need to pass in multiple checkpoints per script iterator? Wouldn't one be enough assuming that script iterator == descriptor and thus we call this backend API multiple times?

@thomaseizinger (Contributor) commented Mar 15, 2022

> If we want async storage then what we might wanna do is provide a way of syncing the in-memory database back (and maybe forth?) with a more permanent database via an event sourcing kind of approach, which could be async.

Just a minor note: the syncing can be achieved in various ways, not necessarily just with event sourcing, although it might lend itself naturally to it: record all changes to the memory database as events and, upon sync, apply them to the persistent one (but don't store the events there).

@LLFourn (Contributor, Author) commented Mar 15, 2022

> Just a minor note: the syncing can be achieved in various ways, not necessarily just with event sourcing, although it might lend itself naturally to it: record all changes to the memory database as events and, upon sync, apply them to the persistent one (but don't store the events there).

I think this can be modelled by the sync checkpoints themselves. Each sync checkpoint represents a change in the database, and if we can make it easy to find all changes associated with each checkpoint I think we've won (that also makes it easy to roll back a checkpoint if it becomes invalid).
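
A minimal sketch of changes indexed by checkpoint (hypothetical names):

```rust
use std::collections::BTreeMap;
use bitcoin::Txid;

// Hypothetical: key each changeset by the checkpoint height that introduced
// it, so rolling back an invalidated checkpoint just drops entries.
#[derive(Default)]
struct CheckpointedChanges {
    txids_added_at: BTreeMap<u32, Vec<Txid>>,
}

impl CheckpointedChanges {
    fn invalidate_from(&mut self, height: u32) {
        // split_off keeps entries below `height`, returning the rest.
        let invalidated = self.txids_added_at.split_off(&height);
        for (_, txids) in invalidated {
            // remove these txids from the transaction store (not shown)
            let _ = txids;
        }
    }
}
```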

> One more question: Why would we need to pass in multiple checkpoints per script iterator? Wouldn't one be enough assuming that script iterator == descriptor and thus we call this backend API multiple times?

I don't understand what you meant in the second sentence. The reason why you pass in multiple is so that we know which checkpoints are invalid and which ones are still valid, so we can apply the change even if one of them was invalid.

@thomaseizinger (Contributor) commented Mar 16, 2022

> One more question: Why would we need to pass in multiple checkpoints per script iterator? Wouldn't one be enough assuming that script iterator == descriptor and thus we call this backend API multiple times?
>
> I don't understand what you meant in the second sentence. The reason why you pass in multiple is so that we know which checkpoints are invalid and which ones are still valid, so we can apply the change even if one of them was invalid.

I don't understand that. The checkpoint is the current height as known to the wallet / database, right? Why is it not good enough for the sync backend to be told the latest height that we have in the database? The sync backend function (the one we prototyped for esplora) should IMO outright fail if the hash at the given height is no longer the same.

As a result, the overall sync function has to grab an earlier checkpoint from the DB and try syncing again. You wouldn't want to reimplement this logic for every backend, so I think it shouldn't live in there but in an overall sync function that looks like this:

```rust
mod bdk {
    // Sub-module for blocking API, modeled after `reqwest::blocking`
    mod blocking {
        pub fn sync(backend: &impl FetchRelatedTransactionsBlocking, wallet: &Wallet) {
            // for each descriptor in wallet
            // fetch related transactions
            // apply result to wallet

            // if any error during syncing, apply fallback logic;
            // for example, if the sync backend fails because the hash at a
            // height differs, restart this loop and grab an earlier
            // checkpoint from the wallet
        }
    }

    pub async fn sync(backend: &impl FetchRelatedTransactions, wallet: &Wallet) {
        // same as above but async version of `FetchRelatedTransactions`
    }
}
```
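
Usage would then be uniform across backends, something like (hypothetical backend type):

```rust
// Any backend implementing the blocking fetch trait would work here.
let backend = EsploraBlockingClient::new("https://blockstream.info/api"); // hypothetical
bdk::blocking::sync(&backend, &wallet);
```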

Edit: IMO we should strive for the logic to be implemented by every backend to be as minimal as possible and try to reuse as much as we can across all of them.

@LLFourn (Contributor, Author) commented Mar 17, 2022

@thomaseizinger Ok I think I can get on board with that approach. I don't mind leaving it up to another layer of logic to collect all the invalid checkpoints!

@thomaseizinger (Contributor)

> @thomaseizinger Ok I think I can get on board with that approach. I don't mind leaving it up to another layer of logic to collect all the invalid checkpoints!

What I'd like to point out is that such a design will push us towards making FetchRelatedTransactions, or whatever we call it, a good building block. If for whatever reason our sync logic is not suitable for a bdk user, they can always copy-paste the above and modify it to their liking, and things will still work!

notmandatory added a commit that referenced this pull request Apr 6, 2022
adef166 Create vector of thread handles to spawn threads (nickfarrow)

Pull request description:

  ### Description
  Speeds up esplora ureq syncing. Taken from #560

  The current sync creates a map from scripts to join handles which, due to lazy evaluation, doesn't yet spawn the sync threads. In the following `handles.map()`, the thread handles are *sequentially* evaluated and joined. With the fix, the handles are collected first so that the threads spawn in parallel, and only then joined.
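
  A sketch of the change (hypothetical helper `fetch_history`; the real diff is in commit adef166 above):

  ```rust
  // Collecting forces every thread to spawn before we start joining.
  #[allow(clippy::needless_collect)]
  let handles: Vec<_> = scripts
      .into_iter()
      .map(|script| std::thread::spawn(move || fetch_history(script)))
      .collect(); // all threads are now running in parallel

  let results: Vec<_> = handles
      .into_iter()
      .map(|handle| handle.join().expect("thread panicked"))
      .collect();
  ```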

  ### Notes to the reviewers

  I had to add a `#[allow(clippy::needless_collect)]` so that it wouldn't complain about collecting and then iterating. (Perhaps clippy is partially responsible for this issue!)

  Tested sync performance by doing a fresh sync on an existing [gun](https://gun.fun) wallet.
  ```
  ---- Before fix ---
  real    0m13.121s
  real    0m13.367s
  real    0m13.211s

  ---- After fix ----
  real    0m5.516s
  real    0m5.251s
  real    0m5.594s
  ```

  ### Checklists

  #### All Submissions:

  * [x] I've signed all my commits
  * [x] I followed the [contribution guidelines](https://github.com/bitcoindevkit/bdk/blob/master/CONTRIBUTING.md)
  * [x] I ran `cargo fmt` and `cargo clippy` before committing

  #### Bugfixes:

  * [ ] This pull request breaks the existing API
  * [ ] I've added tests to reproduce the issue which are now passing
  * [ ] I'm linking the issue being fixed by this PR

ACKs for top commit:
  notmandatory:
    ACK adef166

Tree-SHA512: 47c617117afde9b4706bfa63759bf06d1ec60ff95d8a80931e5b7e40e3293c855d2f7dac0c681173d43aecf77201a842e739b82291da09ac81909cf526a51c8d

- Removed `Blockchain` from Wallet.
- Removed `Wallet::broadcast` (just use `Blockchain::broadcast`)
- Depreciated `Wallet::new_offline` (all wallets are offline now)


*Deprecated

```diff
 ) -> Result<(), Error> {
     use crate::blockchain::script_sync::Request;
-    let mut request = script_sync::start(database, self.stop_gap)?;
+    let mut request = script_sync::start(database, 20)?;
```


I'd suggest bringing this magic number out into a constant
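
Something like (hypothetical name):

```rust
const DEFAULT_STOP_GAP: usize = 20;
// ...
let mut request = script_sync::start(database, DEFAULT_STOP_GAP)?;
```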

@cryptoquick left a comment

This is a clever approach. I just thought of another case where I could really make use of a unified interface between wasm and desktop builds; I'm looking to make a benchmark without it needing to be browser-side, so I took the time to look things over. You have my concept ACK.

@LLFourn
Copy link
Contributor Author

LLFourn commented Jun 10, 2022

hey @cryptoquick, this idea ballooned into https://bitcoindevkit.org/blog/bdk-core-pt1/
This is the approach I'm taking to solve this now. Feedback really welcome.

I'll close this.

@LLFourn closed this Jun 10, 2022
Successfully merging this pull request may close these issues:

maybe_async complicates isomorphic use-cases