Async recipient-side of static invoice server #3618
Conversation
Will go through the commits in a bit more detail before taking this out of draft, but conceptual feedback is welcome, whether on the protocol itself or on the way the code is organized overall. It does add a significant amount of code to …
Hmmm, I wonder if we shouldn't allow the client to cache N offers rather than only 1. I worry a bit about the privacy implications of having One Offer that gets reused across different contexts.
I think that makes sense, so they would interactively build and cache a few and then randomly(?) return one of them on … It seems reasonable to save for follow-up, although I could adapt the …
Yea, I dunno what to do for the fetch'er, maybe we just expose the whole list?
Makes sense, tho I imagine it would be a rather trivial diff, no?
lightning/src/offers/signer.rs
Outdated
const IV_BYTES: &[u8; IV_LEN] = b"LDK Offer Paths~";
let mut hmac = expanded_key.hmac_for_offer();
hmac.input(IV_BYTES);
hmac.input(&nonce.0);
hmac.input(ASYNC_PAYMENTS_OFFER_PATHS_INPUT);
Do we need to include path_absolute_expiry?
I thought the nonce/IV was sufficient but I'm not certain. @TheBlueMatt would it be an improvement to commit to the expiry in the hmac? IIUC the path still can't be re-purposed...
Going to base this on #3640. Will finish updating the ser macros based on those changes and push updates here after finishing some review.
Added a couple of comments that I found while working on the CI failure in #3593
Force-pushed from 5cf3585 to 5455d55.
Pushed some updates after moving the async receive offer cache into the new …
New to the codebase but interested in following async payments. From reading the explanation in the commit messages, the protocol/flow between the async recipient and the always-online node to build the static invoice and offer made sense. Overall the code changes look good to me.
Force-pushed from 5455d55 to f8023ca.
Rebased on the latest version of #3639
Codecov Report — Attention: Patch coverage is …

@@            Coverage Diff             @@
##             main    #3618      +/-   ##
==========================================
- Coverage   89.65%   89.56%   -0.10%
==========================================
  Files         164      165       +1
  Lines      134658   134832     +174
  Branches   134658   134832     +174
==========================================
+ Hits       120723   120756      +33
- Misses      11256    11390     +134
- Partials     2679     2686       +7

☔ View full report in Codecov by Sentry.
Force-pushed from d1cc154 to 68fd751.
Pushed some minor fixes for CI.
Force-pushed from 22d1021 to b7b8390.
Pushed an overhaul of the PR based on discussions with @TheBlueMatt last week. Diff. Planning to discuss in more detail over a sync with @joostjager next week. Planning to push a few more updates today improving some docs and things. Edit: pushed a few more docs updates and fixes. Diff
Force-pushed from b7b8390 to 069c6b8.
These are the items I'm thinking of saving for follow-up PRs: …
///
// We need to re-persist the cache if a fresh offer was just marked as used to ensure we continue
// to keep this offer's invoice updated and don't replace it with the server.
pub fn get_async_receive_offer(
Question for reviewers: should we refuse to return offers that don't have at least a week until they expire, or some behavior like that? I'm also wondering if we should always return the offer with the expiry that is the furthest out, regardless of whether it's a used offer or not.
Related to what we discussed offline: how is the user going to know that the QR code that they got isn't valid forever? An expiring QR code seems undesirable for a user.
Random association: Is BIP 353 able to do something here somehow? (just writing this down, not familiar with it).
The offer expiry is available in the offer, but we may want some behavior in LDK Node where we return the offer expiry as a separate return value to make it more obvious when it expires. To me it seems likely that most offer paths returned by an LSP will be never-expiring, since that's the best UX, but it's hard to be sure right now.
BIP 353 may indeed help here because IIUC we're putting an offer in a DNS record, so it may be easier to rotate the offer that's in a DNS record than update an offer that's hardcoded into a website.
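To make the selection policy floated in this thread concrete, here is a minimal, self-contained sketch. The `CachedOffer`/`select_offer` names and the one-week threshold are illustrative assumptions rather than the PR's actual API: prefer an offer that has never been handed out, otherwise fall back to the used offer whose expiry is furthest out, skipping anything too close to expiry to be worth returning.

```rust
use std::time::Duration;

// Assumed threshold for "too close to expiry to hand out"; the PR leaves this open.
const MIN_REMAINING_VALIDITY: Duration = Duration::from_secs(7 * 24 * 60 * 60);

#[derive(Clone, Copy, PartialEq)]
enum OfferStatus {
    Ready,
    Used,
}

struct CachedOffer {
    status: OfferStatus,
    /// Absolute expiry as a duration since the Unix epoch; `None` means the offer never expires.
    absolute_expiry: Option<Duration>,
}

/// Returns the index of the offer to hand out, or `None` if nothing suitable is cached.
fn select_offer(offers: &[CachedOffer], now: Duration) -> Option<usize> {
    let usable = |o: &CachedOffer| match o.absolute_expiry {
        Some(expiry) => expiry.saturating_sub(now) >= MIN_REMAINING_VALIDITY,
        None => true,
    };

    // Prefer an offer that has never been handed out, for privacy across contexts.
    if let Some(idx) = offers.iter().position(|o| usable(o) && o.status == OfferStatus::Ready) {
        return Some(idx);
    }

    // Otherwise fall back to the used offer whose expiry is furthest out.
    offers
        .iter()
        .enumerate()
        .filter(|&(_, o)| usable(o) && o.status == OfferStatus::Used)
        .max_by_key(|&(_, o)| o.absolute_expiry.unwrap_or(Duration::MAX))
        .map(|(idx, _)| idx)
}
```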
/// Returns `None` if the cache is full and no offers can currently be replaced.
///
/// [`ServeStaticInvoice::invoice_slot`]: crate::onion_message::async_payments::ServeStaticInvoice::invoice_slot
fn needs_new_offer_idx(&self, duration_since_epoch: Duration) -> Option<usize> {
I still need to add tests for this method's behavior in the follow-up; for now I'm looking for conceptual ACKs on the overall approach of replaceable/cached offers and retrying invoice persistence until it succeeds.
Would be nice to move some logic from flow.rs to async_receive_offer_cache.rs. flow.rs is already pretty big and covers several largely-unrelated state machines. Keeping the async caching state machine more in its own file would be nice.
last_offer_paths_request_timestamp: Duration,
/// Blinded paths used to request offer paths from the static invoice server.
#[allow(unused)] // TODO: remove when we get rid of async payments cfg flag
paths_to_static_invoice_server: Vec<BlindedMessagePath>,
This doesn't seem used here, and it's duplicated in the flow. Can we drop it here?
My thinking is that these paths will only be set very rarely, likely only once if the server and recipient are expected to be direct peers. So I wanted them to be persistent across restarts without being a separate field that gets persisted in ChannelManager. (Also updated the docs to be clearer about this.)
/// The time as duration since the Unix epoch at which this path expires and messages sent over
/// it should be ignored.
///
/// As an async recipient we use this field to time out a static invoice server from sending us
Is a timeout really the best way to accomplish this? vs., e.g., storing the hash of the LSP's BPs here and checking that the ones we want to use haven't changed?
Good point, I think the real purpose of this is to avoid using provided OfferPaths if the message was very delayed; will update.
If our paths to the LSP have changed, I'd think that's probably fine as long as we're still with the same LSP. If we aren't with the same LSP, then the process of interactively building the offer/invoice probably won't complete.
let (serve_static_invoice, reply_context) = match self.flow.handle_offer_paths(
    _message,
    _context,
    responder.clone(),
Don't need to clone, do we?
We persist this path in the cache so we can update the invoice corresponding to this offer in the future, and also use it below when replying directly to the offer_paths message.
Force-pushed from 069c6b8 to 0606582.
Rebased on main to resolve minor conflicts in …
Force-pushed from 0606582 to 17b7100.
Addressed most feedback. Diff. Looking into moving more code into …
@TheBlueMatt I looked into this, it seems reasonable, but I think it needs a prefactor to move the async offer/static invoice creation utils into the … I guess if we move in your suggested direction we'd want to start with that prefactor + 5b567a1 in a separate PR, rather than landing this one, since so much of the current code would be imminently moving otherwise?

My reasoning for liking the current separation of responsibilities was basically that I like how simple/focused the cache is right now, and the invoice/onion message creation bits seemed similar to the flow's other responsibilities (plus the flow already has all the blinded path utils/message router/queues/expanded key available to it). The flow ends up at around ~1500 LoC in this PR. I can definitely see the argument for putting the async offer/message creation into the cache though, since it is a pretty different state machine from BOLT 12.
/// number to the index of the offer in the cache.
///
/// [`ServeStaticInvoice::invoice_slot`]: crate::onion_message::async_payments::ServeStaticInvoice
offers: Vec<Option<AsyncReceiveOffer>>,
What was the reason again that, for the MVP, we need to cache multiple offers?
/// have a maximally fresh offer. We always want to have at least 1 offer in this state,
/// preferably a few so we can respond to user requests for new offers without returning the same
/// one multiple times. Returning a new offer each time is better for privacy.
Ready {
I still wonder whether creation and registration of a new offer cannot happen on-demand. It seems rare for an offer not to be coming from cache, and showing a brief spinner seems acceptable for an MVP?
/// the server, we indicate which invoice is being replaced by the invoice's "slot number",
/// see [`ServeStaticInvoice::invoice_slot`]. So rather than internally tracking which cached
/// offer corresponds to what invoice slot number on the server's end, we always set the slot
/// number to the index of the offer in the cache.
Isn't there potential for things getting out of sync here, maybe if the client loses its state?
@@ -5248,6 +5248,20 @@ where
    )
}

#[cfg(async_payments)]
fn check_refresh_async_receive_offers(&self, timer_tick_occurred: bool) {
    let peers = self.get_peers_for_blinded_path();
Only call this when new offers are needed?
// possible when a user goes to retrieve a cached offer.
//
// We avoid replacing unused offers too quickly -- this prevents the case where we send multiple
// invoices from different offers competing for the same slot to the server, messages are received
Isn't this a downside of the slot approach? Would having some kind of offer id/hash be safer?
@@ -1082,4 +1104,9 @@ where
    ) -> Vec<(DNSResolverMessage, MessageSendInstructions)> {
        core::mem::take(&mut self.pending_dns_onion_messages.lock().unwrap())
    }

    /// Get the `AsyncReceiveOfferCache` for persistence.
    pub(crate) fn writeable_async_receive_offer_cache(&self) -> impl Writeable + '_ {
If channel manager persistence had already been removed, how would you have handled cache persistence?
Err(()) => continue,
};

let reply_path_context = {
Perhaps do a compare here to see if anything changed? To prevent an LSP from being flooded with once per minute messages? Also not sure how suspend mechanisms work on all platforms/phones/browsers, and whether there is a risk of this happening continuously.
Or we could land this one, since it's where it is now, and then move the code later? Code moves are pretty cheap to review so it's not like it's adding a lot of review burden by doing this one first, no?
Fair, and it's nice to have a pretty encapsulated cache, but even 1500 LoC is pretty large, and importantly it's several fairly unrelated "flows" :/. The cache currently also kinda leaks a bit, e.g. the …
Looks mostly good to me. As mentioned both online and in private discussions, this doesn't quite feel like a minimal implementation or MVP to me, though I recognize that definitions of those can vary. What seems optional to one person might be essential to someone else. I believe that if the goal had been to support only the current state of Lightning (a client connected to an LSP, without blinded paths or expirations), the design could have been more straightforward. The handling of various expirations, slots, and race conditions seems to have added a fair amount of complexity.
From an architectural standpoint, I think the tight coupling with the channel manager, especially around persistence, could have been avoided.
That said, I'm overall okay with moving forward. As far as I can tell, this code doesn't impact critical paths, is opt-in for users, and only involves connections to a trusted invoice server.
{
    let cache = self.async_receive_offer_cache.lock().unwrap();
    for offer_and_metadata in cache.offers_needing_invoice_refresh() {
        let (offer, offer_nonce, slot_number, update_static_invoice_path) =
Skip the intermediate offer_and_metadata var? Maybe a formatting consideration though.
pub(super) fn cache_pending_offer(
    &mut self, offer: Offer, offer_paths_absolute_expiry_secs: Option<u64>, offer_nonce: Nonce,
    update_static_invoice_path: Responder, duration_since_epoch: Duration,
) -> Result<u8, ()> {
u8 should be enough for anyone 😅
let serve_invoice_message =
    ServeStaticInvoice { invoice, forward_invoice_request_path, invoice_slot };
Some((serve_invoice_message, reply_path_context))
This is probably how the framework is set up, but it seems somewhat inconsistent that we queue onion messages in flow.rs and also send responses in messenger.rs
// If no unused offers are available, return the used offer with the latest absolute expiry
self.offers_with_idx()
    .filter(|(_, offer)| matches!(offer.status, OfferStatus::Used))
Shouldn't last invoice update be tracked in this case too?
}

// Require the offer that would be built using these paths to last at least
// MIN_OFFER_PATHS_RELATIVE_EXPIRY_SECS.
I think constants in comments can be made into links. There may be other instances.
    }
}

// Allow more offer paths requests to be sent out in a burst roughly once per minute, or if an
Does this mean that every minute, we try three times?
Because async recipients are not online to respond to invoice requests, the plan is for another node on the network that is always-online to serve static invoices on their behalf. The protocol is as follows:

- Recipient is configured with blinded message paths to reach the static invoice server
- On startup, recipient requests blinded message paths for inclusion in their offer from the static invoice server over the configured paths
- Server replies with offer paths for the recipient
- Recipient builds their offer using these paths and the corresponding static invoice and replies with the invoice
- Server persists the invoice and confirms that they've persisted it, causing the recipient to cache the interactively built offer for use

At pay-time, the payer sends an invoice request to the static invoice server, who replies with the static invoice after forwarding the invreq to the recipient (to give them a chance to provide a fresh invoice in case they're online).

Here we add the requisite trait methods and onion messages to support this protocol.

An alternate design could be for the async recipient to publish static invoices directly without a preceding offer, e.g. on their website. Some drawbacks of this design include:

1) No fallback to regular BOLT 12 in the case that the recipient happens to be online at pay-time. Falling back to regular BOLT 12 allows the recipient to provide a fresh invoice and regain the proof-of-payment property
2) Static invoices don't fit in a QR code
3) No automatic rotation of the static invoice, which is useful in the case that payment paths become outdated due to changing fees, etc
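For orientation, the round trips above could be modeled roughly as follows. This is only an illustrative sketch: the variant names are loosely based on messages discussed in this PR (OfferPaths, ServeStaticInvoice), but the payload fields and the OfferPathsRequest/StaticInvoicePersisted shapes here are assumptions, not the PR's exact definitions.

```rust
/// Stand-ins for the real blinded-path and invoice types.
struct BlindedMessagePath;
struct StaticInvoice;

/// Sketch of the onion messages exchanged between the async recipient and the
/// always-online static invoice server, following the flow described above.
enum AsyncPaymentsServerMessage {
    /// Recipient -> server: request blinded paths to embed in the recipient's offer.
    OfferPathsRequest,
    /// Server -> recipient: paths to build the offer from, with an optional expiry
    /// (seconds since the Unix epoch) after which they should not be used.
    OfferPaths { paths: Vec<BlindedMessagePath>, paths_absolute_expiry: Option<u64> },
    /// Recipient -> server: the freshly built static invoice to persist and serve to
    /// payers, plus the slot it should occupy on the server.
    ServeStaticInvoice { invoice: StaticInvoice, invoice_slot: u16 },
    /// Server -> recipient: the invoice was persisted, so the recipient may now cache
    /// and hand out the corresponding offer.
    StaticInvoicePersisted,
}
```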
In future commits, as part of being an async recipient, we will interactively build offers and static invoices with an always-online node that will serve static invoices on our behalf. Once an offer is built and we've requested persistence of the corresponding invoice from the server, we will use the new offer cache added here to save the invoice metadata and the offer in ChannelManager, though the OffersMessageFlow is responsible for keeping the cache updated.

We want to cache and persist these offers so we always have them at the ready; we don't want to begin the process of interactively building an offer only at the moment it is needed. The offers are likely to be long-lived, so caching them avoids having to keep interactively rebuilding them after every restart.
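Roughly, the cache described here can be pictured like the sketch below. Field and type names are simplified stand-ins; only offers, paths_to_static_invoice_server, and last_offer_paths_request_timestamp appear in the diff, the rest is illustrative.

```rust
use std::time::Duration;

struct Offer;               // stand-in for the BOLT 12 offer type
struct Nonce([u8; 16]);     // stand-in for the offer metadata nonce
struct Responder;           // stand-in for the reply path used to update the invoice later
struct BlindedMessagePath;  // stand-in for a blinded message path

enum OfferStatus {
    /// Invoice sent to the server, waiting for confirmation that it was persisted.
    Pending,
    /// Server confirmed persistence; the offer can be returned to the user.
    Ready,
    /// The offer was handed out, so its invoice is kept fresh indefinitely.
    Used,
}

struct AsyncReceiveOffer {
    offer: Offer,
    offer_nonce: Nonce,
    status: OfferStatus,
    /// Reply path to the server, used to send updated invoices for this offer.
    update_static_invoice_path: Responder,
}

/// Persisted as part of ChannelManager. An entry's index doubles as the invoice "slot"
/// on the static invoice server, so no separate slot mapping needs to be tracked.
struct AsyncReceiveOfferCache {
    offers: Vec<Option<AsyncReceiveOffer>>,
    /// Throttles how often we ask the server for fresh offer paths.
    last_offer_paths_request_timestamp: Duration,
    /// How we reach the static invoice server in the first place.
    paths_to_static_invoice_server: Vec<BlindedMessagePath>,
}
```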
In future commits, as part of being an async recipient, we will interactively build offers and static invoices with an always-online node that will serve static invoices on our behalf. Here we add a setter for the blinded paths that we as an async recipient will use to contact the static invoice server to get paths to put in our offers.
As an async recipient, we need to interactively build static invoices that an always-online node will serve to payers on our behalf. To kick this process off, on startup we send requests for paths to include in our offers to the always-online node, and we refresh the cached offers when they start to get stale.
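A rough sketch of that startup/timer check, under the assumption that the flow tracks when it last asked for paths; all names here are illustrative except paths_to_static_invoice_server, and the once-per-minute floor is an assumption based on the rate-limiting comment reviewed above.

```rust
use std::time::Duration;

/// Assumed floor between bursts of offer-paths requests, so a flaky timer can't flood the server.
const OFFER_PATHS_REQUEST_INTERVAL: Duration = Duration::from_secs(60);

#[derive(Clone)]
struct BlindedMessagePath;  // stand-in type
struct OfferPathsRequest;   // stand-in onion message

struct OfferPathsRequester {
    cached_usable_offers: usize,
    target_usable_offers: usize,
    last_offer_paths_request: Duration,
    paths_to_static_invoice_server: Vec<BlindedMessagePath>,
}

impl OfferPathsRequester {
    /// Called on startup and on each timer tick: if we're short on usable offers and haven't
    /// asked too recently, queue an offer-paths request over each configured server path.
    fn check_refresh_offers(&mut self, now: Duration) -> Vec<(OfferPathsRequest, BlindedMessagePath)> {
        if self.cached_usable_offers >= self.target_usable_offers {
            return Vec::new();
        }
        if now.saturating_sub(self.last_offer_paths_request) < OFFER_PATHS_REQUEST_INTERVAL {
            return Vec::new();
        }
        self.last_offer_paths_request = now;
        self.paths_to_static_invoice_server
            .iter()
            .map(|path| (OfferPathsRequest, path.clone()))
            .collect()
    }
}
```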
As an async recipient, we need to interactively build a static invoice that an always-online node will serve to payers on our behalf. As part of this process, the static invoice server sends us blinded message paths to include in our offer so they'll receive invoice requests from senders trying to pay us while we're offline. On receipt of these paths, create an offer and static invoice and send the invoice back to the server so they can provide the invoice to payers, as well as caching the offer as pending.
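A sketch of the recipient-side handling of those paths, with the offer/invoice construction abstracted behind closures since those builders are out of scope here. The handle_offer_paths and cache_pending_offer names mirror identifiers quoted earlier in this review; everything else is assumed.

```rust
struct BlindedMessagePath;
struct Offer;
struct StaticInvoice;

/// Server -> recipient message carrying the paths to build our offer from.
struct OfferPaths {
    paths: Vec<BlindedMessagePath>,
    paths_absolute_expiry: Option<u64>,
}

/// Recipient -> server reply asking the server to persist and serve this invoice.
struct ServeStaticInvoice {
    invoice: StaticInvoice,
    invoice_slot: u8,
}

enum HandleError {
    InvoiceCreationFailed,
    CacheFull,
}

/// Build an offer from the server-provided paths, build its static invoice, reserve a
/// pending cache slot, and return the message to send back over the reply path.
fn handle_offer_paths(
    message: OfferPaths,
    build_offer: impl FnOnce(&OfferPaths) -> Offer,
    build_static_invoice: impl FnOnce(&Offer) -> Result<StaticInvoice, ()>,
    cache_pending_offer: impl FnOnce(Offer, Option<u64>) -> Result<u8, ()>,
) -> Result<ServeStaticInvoice, HandleError> {
    let offer = build_offer(&message);
    let invoice =
        build_static_invoice(&offer).map_err(|_| HandleError::InvoiceCreationFailed)?;
    // The slot returned by the cache doubles as the offer's index locally.
    let invoice_slot = cache_pending_offer(offer, message.paths_absolute_expiry)
        .map_err(|_| HandleError::CacheFull)?;
    Ok(ServeStaticInvoice { invoice, invoice_slot })
}
```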
As an async recipient, we need to interactively build a static invoice that an always-online node will serve on our behalf. Once this invoice is built and persisted by the static invoice server, they will send us a confirmation onion message. At this time, mark the corresponding pending offer as ready to receive async payments.
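A minimal sketch of that confirmation handling; the slot and field names are assumptions consistent with the cache sketches above, not the PR's exact types.

```rust
use std::time::Duration;

enum OfferStatus {
    Pending,
    Ready { invoice_confirmed_at: Duration },
    Used,
}

struct AsyncReceiveOffer {
    status: OfferStatus,
}

/// Called when the server acknowledges that it persisted our static invoice for `slot`:
/// promote the matching pending offer so it can be handed out to payers.
fn handle_static_invoice_persisted(
    offers: &mut [Option<AsyncReceiveOffer>], slot: usize, now: Duration,
) {
    if let Some(Some(offer)) = offers.get_mut(slot) {
        if matches!(offer.status, OfferStatus::Pending) {
            offer.status = OfferStatus::Ready { invoice_confirmed_at: now };
        }
    }
}
```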
As an async recipient, we need to interactively build offers and corresponding static invoices, the latter of which an always-online node will serve to payers on our behalf. We want the static invoice server to always have the freshest possible invoice available, so on each timer tick for every offer that is either currently in use or pending confirmation, send them a new invoice.
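A sketch of that per-tick refresh, assuming the cache can report which offers the server is (or soon will be) serving; the types echo the sketches above and invoice construction is abstracted away.

```rust
struct StaticInvoice;

enum OfferStatus {
    Pending,
    Ready,
    Used,
}

struct AsyncReceiveOffer {
    status: OfferStatus,
}

struct ServeStaticInvoice {
    invoice: StaticInvoice,
    invoice_slot: u8,
}

/// On each timer tick, rebuild and resend a static invoice for every offer the server is
/// currently serving (in use) or about to serve (pending confirmation), so the server
/// always holds a maximally fresh invoice.
fn refresh_static_invoices(
    offers: &[Option<AsyncReceiveOffer>],
    build_invoice: impl Fn(&AsyncReceiveOffer) -> Result<StaticInvoice, ()>,
) -> Vec<ServeStaticInvoice> {
    offers
        .iter()
        .enumerate()
        .filter_map(|(slot, entry)| entry.as_ref().map(|offer| (slot, offer)))
        .filter(|(_, offer)| matches!(offer.status, OfferStatus::Pending | OfferStatus::Used))
        .filter_map(|(slot, offer)| {
            build_invoice(offer)
                .ok()
                .map(|invoice| ServeStaticInvoice { invoice, invoice_slot: slot as u8 })
        })
        .collect()
}
```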
Over the past several commits we've implemented interactively building async receive offers with a static invoice server that will service invoice requests on our behalf as an async recipient. Here we add an API to retrieve a resulting offer so we can receive payments while we're offline.
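A hypothetical application-side usage of such an API: the get_async_receive_offer name comes from the diff quoted earlier, while the trait, return type, and surrounding code are purely illustrative.

```rust
/// Stand-in for whatever exposes the cached offer (e.g. the node/channel manager).
trait AsyncReceiveOfferSource {
    /// Returns a cached, interactively built offer encoded as a string, or an error if
    /// no offer has been confirmed by the static invoice server yet.
    fn get_async_receive_offer(&self) -> Result<String, ()>;
}

fn show_receive_offer(node: &impl AsyncReceiveOfferSource) {
    match node.get_async_receive_offer() {
        // In practice this string would be rendered as a QR code or a BIP 353 DNS record.
        Ok(offer) => println!("Share this offer to receive async payments: {offer}"),
        Err(()) => println!("No offer ready yet; still negotiating with the static invoice server."),
    }
}
```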
Force-pushed from 17b7100 to 848d50e.
As part of being an async recipient, we need to interactively build an offer and static invoice with an always-online node that will serve static invoices on our behalf in response to invoice requests from payers.
While users could build this invoice manually, the plan is for LDK to automatically build it for them using onion messages. See this doc for more details on the protocol. Here we implement the client side of the linked protocol.
See lightning/bolts#1149 for more information on async payments.
Partially addresses #2298