Async recipient-side of static invoice server #3618


Open — wants to merge 8 commits into base: main from 2025-02-static-inv-server-client

Conversation

@valentinewallace (Contributor) commented Feb 24, 2025

As part of being an async recipient, we need to interactively build an offer and static invoice with an always-online node that will serve static invoices on our behalf in response to invoice requests from payers.

While users could build this invoice manually, the plan is for LDK to automatically build it for them using onion messages. See this doc for more details on the protocol. Here we implement the client side of the linked protocol.

See lightning/bolts#1149 for more information on async payments.

Partially addresses #2298
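For intuition, the client-side flow described above can be sketched as a small state machine. This is a hypothetical, std-only illustration (the type and message names here are invented for the example and are not LDK's actual API):

```rust
/// Hypothetical client-side states for interactively building an offer and
/// static invoice with the always-online server (illustrative only).
#[derive(Debug, PartialEq, Clone)]
enum ClientState {
    /// We asked the server for blinded paths to include in our offer.
    OfferPathsRequested,
    /// We built the offer + static invoice and asked the server to persist it.
    PendingServerPersistence,
    /// The server confirmed persistence; the offer is cached and usable.
    OfferReady,
}

/// Hypothetical onion messages received from the static invoice server.
#[derive(Debug)]
enum ServerMessage {
    /// Blinded paths to include in our offer.
    OfferPaths,
    /// Confirmation that the static invoice was stored server-side.
    InvoicePersisted,
}

/// Drive the client-side state machine forward on a server message.
fn on_server_message(state: ClientState, msg: &ServerMessage) -> ClientState {
    match (state, msg) {
        (ClientState::OfferPathsRequested, ServerMessage::OfferPaths) => {
            // Build the offer from the provided paths, build the corresponding
            // static invoice, and reply to the server asking for persistence.
            ClientState::PendingServerPersistence
        },
        (ClientState::PendingServerPersistence, ServerMessage::InvoicePersisted) => {
            ClientState::OfferReady
        },
        // Ignore unexpected or replayed messages.
        (state, _) => state,
    }
}

fn main() {
    let s = ClientState::OfferPathsRequested;
    let s = on_server_message(s, &ServerMessage::OfferPaths);
    let s = on_server_message(s, &ServerMessage::InvoicePersisted);
    assert_eq!(s, ClientState::OfferReady);
    println!("offer ready");
}
```

At pay-time (not modeled above), the payer's invoice request reaches the server over the paths in the offer, and the server replies with the persisted static invoice after forwarding the invreq to the possibly-offline recipient.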

@valentinewallace (Contributor, Author)

I'll go through the commits in more detail before taking this out of draft, but feedback on the concept, the protocol itself, or the overall code organization is welcome. It currently adds a significant amount of code to ChannelManager.

@TheBlueMatt (Collaborator) left a comment

Hmmm, I wonder if we shouldn't allow the client to cache N offers rather than only 1. I worry a bit about the privacy implications of having One Offer that gets reused across different contexts.

@valentinewallace (Contributor, Author)

> Hmmm, I wonder if we shouldn't allow the client to cache N offers rather than only 1. I worry a bit about the privacy implications of having One Offer that gets reused across different contexts.

I think that makes sense, so they would interactively build and cache a few and then randomly(?) return one of them on get_cached_async_receive_offer?

It seems reasonable to save for follow-up although I could adapt the AsyncReceiveOffer cache struct serialization for this now.

@TheBlueMatt (Collaborator) commented Feb 25, 2025

> I think that makes sense, so they would interactively build and cache a few and then randomly(?) return one of them on get_cached_async_receive_offer?

Yea, I dunno what to do for the fetcher; maybe we just expose the whole list?

> It seems reasonable to save for follow-up although I could adapt the AsyncReceiveOffer cache struct serialization for this now.

Makes sense, though I imagine it would be a rather trivial diff, no?

@jkczyz jkczyz self-requested a review February 27, 2025 18:10
@valentinewallace valentinewallace added the weekly goal Someone wants to land this this week label Feb 27, 2025
Comment on lines 568 to 592
const IV_BYTES: &[u8; IV_LEN] = b"LDK Offer Paths~";
let mut hmac = expanded_key.hmac_for_offer();
hmac.input(IV_BYTES);
hmac.input(&nonce.0);
hmac.input(ASYNC_PAYMENTS_OFFER_PATHS_INPUT);
Contributor:

Do we need to include path_absolute_expiry?

@valentinewallace (Contributor, Author):

I thought the nonce/IV was sufficient but I'm not certain. @TheBlueMatt would it be an improvement to commit to the expiry in the HMAC? IIUC the path still can't be re-purposed...
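For illustration of what "committing to the expiry" means here: the tag would additionally absorb the expiry bytes, so a tag minted for one expiry cannot be replayed with another. The sketch below is a toy and NOT cryptographic (std's `DefaultHasher` stands in for the real keyed HMAC purely so the example runs without dependencies; all names are invented):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for the keyed HMAC in the quoted snippet. NOT cryptographic:
/// DefaultHasher is used only so this example is dependency-free.
fn toy_commitment(
    key: &[u8], iv: &[u8], nonce: &[u8], input: &[u8], expiry_secs: Option<u64>,
) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    iv.hash(&mut h);
    nonce.hash(&mut h);
    input.hash(&mut h);
    // The question under discussion: also absorbing the path's absolute
    // expiry binds the tag to that specific expiry.
    if let Some(exp) = expiry_secs {
        exp.to_be_bytes().hash(&mut h);
    }
    h.finish()
}

fn main() {
    let (key, iv) = (b"expanded-key".as_slice(), b"LDK Offer Paths~".as_slice());
    let (nonce, input) = (b"nonce-bytes".as_slice(), b"offer paths input".as_slice());
    let tag_a = toy_commitment(key, iv, nonce, input, Some(1_700_000_000));
    let tag_b = toy_commitment(key, iv, nonce, input, Some(1_800_000_000));
    // Once the expiry is committed, changing it changes the tag.
    assert_ne!(tag_a, tag_b);
    println!("tags differ when expiry differs");
}
```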

@valentinewallace valentinewallace marked this pull request as ready for review March 4, 2025 21:14
@valentinewallace (Contributor, Author)

Going to base this on #3640. Will finish updating the ser macros based on those changes and push updates here after finishing some review.

@valentinewallace valentinewallace removed the weekly goal Someone wants to land this this week label Mar 4, 2025
@vincenzopalazzo (Contributor) left a comment

Added a couple of comments that I found while working on the CI failure in #3593.

@valentinewallace (Contributor, Author)

Pushed some updates after moving the async receive offer cache into the new OffersMessageFlow struct added in #3639.

@elnosh (Contributor) left a comment

New to the codebase but interested in following async payments. From reading the explanation in the commit messages, the protocol/flow between the async recipient and the always-online node to build the static invoice and offer made sense. Overall the code changes look good to me.

@valentinewallace valentinewallace force-pushed the 2025-02-static-inv-server-client branch from 5455d55 to f8023ca Compare May 21, 2025 00:11
@valentinewallace (Contributor, Author)

Rebased on the latest version of #3639

codecov bot commented May 21, 2025

Codecov Report

Attention: Patch coverage is 27.74566% with 125 lines in your changes missing coverage. Please review.

Project coverage is 89.56%. Comparing base (24063e5) to head (848d50e).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
lightning/src/ln/channelmanager.rs 19.11% 55 Missing ⚠️
lightning/src/onion_message/async_payments.rs 0.00% 29 Missing ⚠️
lightning/src/ln/peer_handler.rs 0.00% 20 Missing ⚠️
lightning/src/onion_message/functional_tests.rs 0.00% 20 Missing ⚠️
lightning/src/offers/async_receive_offer_cache.rs 95.45% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3618      +/-   ##
==========================================
- Coverage   89.65%   89.56%   -0.10%     
==========================================
  Files         164      165       +1     
  Lines      134658   134832     +174     
  Branches   134658   134832     +174     
==========================================
+ Hits       120723   120756      +33     
- Misses      11256    11390     +134     
- Partials     2679     2686       +7     

@valentinewallace valentinewallace force-pushed the 2025-02-static-inv-server-client branch 3 times, most recently from d1cc154 to 68fd751 Compare May 22, 2025 22:52
@valentinewallace
Copy link
Contributor Author

Pushed some minor fixes for CI.

@valentinewallace valentinewallace force-pushed the 2025-02-static-inv-server-client branch from 22d1021 to b7b8390 Compare June 20, 2025 17:03
@valentinewallace valentinewallace added the weekly goal Someone wants to land this this week label Jun 20, 2025
@valentinewallace (Contributor, Author) commented Jun 20, 2025

Pushed an overhaul of the PR based on discussions with @TheBlueMatt last week. Diff. Planning to discuss in more detail over a sync with @joostjager next week.

Planning to push a few more updates today improving some docs and things.

Edit: pushed a few more docs updates and fixes. Diff

@valentinewallace valentinewallace force-pushed the 2025-02-static-inv-server-client branch from b7b8390 to 069c6b8 Compare June 20, 2025 22:09
@valentinewallace (Contributor, Author)

These are the items I'm thinking of saving for follow-up PRs:

  • support the server configuring how many invoices they'll persist for the user (currently it's hardcoded to 10)
  • proactively invalidate offers/invoices when a channel closes (or when fees change?)
  • exponential backoff on the messages the client is sending the server?

///
// We need to re-persist the cache if a fresh offer was just marked as used to ensure we continue
// to keep this offer's invoice updated and don't replace it with the server.
pub fn get_async_receive_offer(
@valentinewallace (Contributor, Author):

Question for reviewers: should we refuse to return offers that don't have at least a week until they expire, or some behavior like that? I'm also wondering if we should always return the offer with the expiry that is the furthest out, regardless of whether it's a used offer or not.

Contributor:

Related to what we discussed offline: how is the user going to know that the QR code that they got isn't valid forever? An expiring QR code seems undesirable for a user.

Random association: Is BIP 353 able to do something here somehow? (just writing this down, not familiar with it).

@valentinewallace (Contributor, Author):

The offer expiry is available in the offer, but we may want some behavior in LDK Node where we return the offer expiry as a separate return value to make it more obvious when it expires. To me it seems likely that most offer paths returned by an LSP will be never-expiring, since that's the best UX, but it's hard to be sure right now.

BIP 353 may indeed help here because IIUC we're putting an offer in a DNS record, so it may be easier to rotate the offer that's in a DNS record than update an offer that's hardcoded into a website.

/// Returns `None` if the cache is full and no offers can currently be replaced.
///
/// [`ServeStaticInvoice::invoice_slot`]: crate::onion_message::async_payments::ServeStaticInvoice::invoice_slot
fn needs_new_offer_idx(&self, duration_since_epoch: Duration) -> Option<usize> {
@valentinewallace (Contributor, Author) commented Jun 23, 2025

I still need to add tests for this method's behavior in a follow-up; for now I'm looking for conceptual ACKs on the overall approach of caching replaceable offers and retrying invoice persistence until it succeeds.
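As a rough illustration of the approach under review, slot selection for a newly built offer might look like the sketch below. The types and field names are simplified stand-ins invented for this example (std-only), not LDK's actual implementation:

```rust
use std::time::Duration;

/// Simplified stand-in for a cached async receive offer (hypothetical).
#[derive(Clone)]
struct CachedOffer {
    absolute_expiry_secs: Option<u64>,
    is_used: bool,
}

/// Fixed-size slots; the slot number sent to the server is the index here.
struct OfferCache {
    offers: Vec<Option<CachedOffer>>,
}

impl OfferCache {
    /// Returns the index of a slot that a newly built offer may occupy:
    /// an empty slot first, else the slot of an expired, unused offer.
    /// Returns `None` if the cache is full and nothing can be replaced.
    fn needs_new_offer_idx(&self, now: Duration) -> Option<usize> {
        if let Some(idx) = self.offers.iter().position(|o| o.is_none()) {
            return Some(idx);
        }
        self.offers.iter().position(|o| match o {
            Some(offer) => {
                !offer.is_used
                    && offer
                        .absolute_expiry_secs
                        .map_or(false, |exp| Duration::from_secs(exp) < now)
            },
            None => false,
        })
    }
}

fn main() {
    let cache = OfferCache {
        offers: vec![
            Some(CachedOffer { absolute_expiry_secs: Some(100), is_used: true }),
            Some(CachedOffer { absolute_expiry_secs: Some(100), is_used: false }),
        ],
    };
    // Slot 1 holds an expired, unused offer, so it can be replaced.
    assert_eq!(cache.needs_new_offer_idx(Duration::from_secs(200)), Some(1));
    // Nothing is replaceable before expiry.
    assert_eq!(cache.needs_new_offer_idx(Duration::from_secs(50)), None);
}
```

The key property is that used offers are never silently replaced, so their invoices keep getting refreshed with the server.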

@TheBlueMatt (Collaborator) left a comment

Would be nice to move some logic from flow.rs to async_receive_offer_cache.rs. flow.rs is already pretty big and covers several largely-unrelated state machines; keeping the async caching state machine in its own file would be nice.

last_offer_paths_request_timestamp: Duration,
/// Blinded paths used to request offer paths from the static invoice server.
#[allow(unused)] // TODO: remove when we get rid of async payments cfg flag
paths_to_static_invoice_server: Vec<BlindedMessagePath>,
Collaborator:

This doesn't seem used here, and it's duplicated in the flow. Can we drop it here?

@valentinewallace (Contributor, Author) commented Jun 24, 2025

My thinking is that these paths will only be set very rarely, likely only once if the server and recipient are expected to be direct peers. So I wanted them to be persistent across restarts without being a separate field that gets persisted in ChannelManager.

(Also updated the docs to be more clear about this).

/// The time as duration since the Unix epoch at which this path expires and messages sent over
/// it should be ignored.
///
/// As an async recipient we use this field to time out a static invoice server from sending us
Collaborator:

Is a timeout really the best way to accomplish this? vs, eg, storing the hash of the LSP's BPs here and check that the ones we want to use haven't changed?

@valentinewallace (Contributor, Author):

Good point, I think the real purpose of this is to avoid using provided OfferPaths if the message was very delayed, will update.

If our paths to the LSP have changed, I'd think that's probably fine as long as we're still with the same LSP. If we aren't with the same LSP, then the process of interactively building the offer/invoice probably won't complete.

let (serve_static_invoice, reply_context) = match self.flow.handle_offer_paths(
_message,
_context,
responder.clone(),
Collaborator:

Don't need to clone, do we?

@valentinewallace (Contributor, Author):

We persist this path in the cache so we can update the invoice corresponding to this offer in the future, and also use it below when replying directly to the offer_paths message.

@valentinewallace valentinewallace force-pushed the 2025-02-static-inv-server-client branch from 069c6b8 to 0606582 Compare June 24, 2025 15:36
@valentinewallace (Contributor, Author)

Rebased on main to resolve minor conflicts in peer_handler.rs

@valentinewallace valentinewallace force-pushed the 2025-02-static-inv-server-client branch from 0606582 to 17b7100 Compare June 24, 2025 19:46
@valentinewallace (Contributor, Author)

Addressed most feedback. Diff

Looking into moving more code into async_receive_offer_cache.rs.

@valentinewallace (Contributor, Author) commented Jun 24, 2025

> Would be nice to move some logic from flow.rs to async_receive_offer_cache.rs. flow.rs is already pretty big and covers several largely-unrelated state machines. Keep the async caching state machine more in its own file would be nice.

@TheBlueMatt I looked into this, it seems reasonable but I think it needs a prefactor to move the async offer/static invoice creation utils into the async_receive_offer_cache first.

I guess if we move in your suggested direction we'd want to start with that prefactor + 5b567a1 in a separate PR, rather than landing this one, since so much of the current code would be imminently moving otherwise?

My reasoning for liking the current separation of responsibilities was basically that I like how simple/focused the cache is right now, and the invoice/onion message creation bits seemed similar to the flow's other responsibilities (plus the flow already has all the blinded path utils/message router/queues/expanded key available to it). The flow ends up at around ~1500 LoC in this PR.

I can definitely see the argument for putting the async offer/message creation into the cache though since it is a pretty different state machine from BOLT 12.

/// number to the index of the offer in the cache.
///
/// [`ServeStaticInvoice::invoice_slot`]: crate::onion_message::async_payments::ServeStaticInvoice
offers: Vec<Option<AsyncReceiveOffer>>,
Contributor:

What was the reason again that we need to cache multiple offers for the MVP?

/// have a maximally fresh offer. We always want to have at least 1 offer in this state,
/// preferably a few so we can respond to user requests for new offers without returning the same
/// one multiple times. Returning a new offer each time is better for privacy.
Ready {
Contributor:

I still wonder whether creation and registration of a new offer cannot happen on-demand. It seems rare for an offer not to be coming from cache, and showing a brief spinner seems acceptable for an MVP?

/// the server, we indicate which invoice is being replaced by the invoice's "slot number",
/// see [`ServeStaticInvoice::invoice_slot`]. So rather than internally tracking which cached
/// offer corresponds to what invoice slot number on the server's end, we always set the slot
/// number to the index of the offer in the cache.
Contributor:

Isn't there potential for things getting out of sync here, maybe if the client loses its state?

@@ -5248,6 +5248,20 @@ where
)
}

#[cfg(async_payments)]
fn check_refresh_async_receive_offers(&self, timer_tick_occurred: bool) {
let peers = self.get_peers_for_blinded_path();
Contributor:

Only call this when new offers are needed?

// possible when a user goes to retrieve a cached offer.
//
// We avoid replacing unused offers too quickly -- this prevents the case where we send multiple
// invoices from different offers competing for the same slot to the server, messages are received
Contributor:

Isn't this a downside of the slot approach and is having some kind of offer id / hash safer?

@@ -1082,4 +1104,9 @@ where
) -> Vec<(DNSResolverMessage, MessageSendInstructions)> {
core::mem::take(&mut self.pending_dns_onion_messages.lock().unwrap())
}

/// Get the `AsyncReceiveOfferCache` for persistence.
pub(crate) fn writeable_async_receive_offer_cache(&self) -> impl Writeable + '_ {
Contributor:

If channel manager persistence had already been removed, how would you have handled cache persistence?

Err(()) => continue,
};

let reply_path_context = {
Contributor:

Perhaps do a compare here to see if anything changed? To prevent an LSP from being flooded with once per minute messages? Also not sure how suspend mechanisms work on all platforms/phones/browsers, and whether there is a risk of this happening continuously.

@TheBlueMatt (Collaborator)

> I guess if we move in your suggested direction we'd want to start with that prefactor + 5b567a1 in a separate PR, rather than landing this one, since so much of the current code would be imminently moving otherwise?

Or we could land this one, since it's where it is now, and then move the code later? Code moves are pretty cheap to review, so it's not like doing this one first adds a lot of review burden, no?

> My reasoning for liking the current separation of responsibilities was basically that I like how simple/focused the cache is right now, and the invoice/onion message creation bits seemed similar to the flow's other responsibilities (plus the flow already has all the blinded path utils/message router/queues/expanded key available to it). The flow ends up at around ~1500 LoC in this PR.

Fair, and it's nice to have a pretty encapsulated cache, but even 1500 LoC is pretty large, and importantly it covers several fairly unrelated "flows" :/. The cache currently also kinda leaks a bit, e.g. the new_offers_requested() method allows for external direct state manipulation in the cache, which always feels kinda awkward.

@joostjager (Contributor) left a comment

Looks mostly good to me. As mentioned both online and in private discussions, this doesn't quite feel like a minimal implementation or MVP to me, though I recognize that definitions of those can vary. What seems optional to one person might be essential to someone else. I believe that if the goal had been to support only the current state of Lightning (a client connected to an LSP, without blinded paths or expirations), the design could have been more straightforward. The handling of various expirations, slots, and race conditions seems to have added a fair amount of complexity.

From an architectural standpoint, I think the tight coupling with the channel manager, especially around persistence, could have been avoided.

That said, I'm overall okay with moving forward. As far as I can tell, this code doesn't impact critical paths, is opt-in for users, and only involves connections to a trusted invoice server.

{
let cache = self.async_receive_offer_cache.lock().unwrap();
for offer_and_metadata in cache.offers_needing_invoice_refresh() {
let (offer, offer_nonce, slot_number, update_static_invoice_path) =
Contributor:

Skip the intermediate offer_and_metadata var? Maybe a formatting consideration though.

pub(super) fn cache_pending_offer(
&mut self, offer: Offer, offer_paths_absolute_expiry_secs: Option<u64>, offer_nonce: Nonce,
update_static_invoice_path: Responder, duration_since_epoch: Duration,
) -> Result<u8, ()> {
Contributor:

u8 should be enough for anyone 😅


let serve_invoice_message =
ServeStaticInvoice { invoice, forward_invoice_request_path, invoice_slot };
Some((serve_invoice_message, reply_path_context))
Contributor:

This is probably how the framework is set up, but it seems somewhat inconsistent that we queue onion messages in flow.rs and also send responses in messenger.rs


// If no unused offers are available, return the used offer with the latest absolute expiry
self.offers_with_idx()
.filter(|(_, offer)| matches!(offer.status, OfferStatus::Used))
Contributor:

Shouldn't last invoice update be tracked in this case too?
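A rough sketch of the fallback being reviewed — prefer an unused offer, else the used offer whose expiry is furthest out. The types and field names below are simplified stand-ins invented for this example (std-only), not the actual LDK code:

```rust
/// Simplified offer status, mirroring the quoted snippet's OfferStatus.
#[derive(Debug, PartialEq, Clone)]
enum OfferStatus { Ready, Used }

#[derive(Debug, Clone)]
struct CachedOffer {
    id: u32,
    status: OfferStatus,
    absolute_expiry_secs: Option<u64>,
}

/// Prefer an unused (Ready) offer; otherwise fall back to the used offer
/// whose absolute expiry is furthest in the future. Offers with no expiry
/// are treated as never expiring.
fn select_offer(offers: &[CachedOffer]) -> Option<&CachedOffer> {
    offers
        .iter()
        .find(|o| o.status == OfferStatus::Ready)
        .or_else(|| {
            offers
                .iter()
                .filter(|o| o.status == OfferStatus::Used)
                .max_by_key(|o| o.absolute_expiry_secs.unwrap_or(u64::MAX))
        })
}

fn main() {
    let offers = vec![
        CachedOffer { id: 1, status: OfferStatus::Used, absolute_expiry_secs: Some(100) },
        CachedOffer { id: 2, status: OfferStatus::Used, absolute_expiry_secs: Some(300) },
    ];
    // No Ready offers, so the used offer with the latest expiry wins.
    assert_eq!(select_offer(&offers).map(|o| o.id), Some(2));
}
```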

}

// Require the offer that would be built using these paths to last at least
// MIN_OFFER_PATHS_RELATIVE_EXPIRY_SECS.
Contributor:

I think constants in comments can be made into links. There may be other instances.

}
}

// Allow more offer paths requests to be sent out in a burst roughly once per minute, or if an
Contributor:

Does this mean that every minute, we try three times?
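For illustration, the kind of gating being asked about — allowing a small burst of requests roughly once per minute — could be sketched as below. This is a hypothetical std-only model (the burst size of 3 and all names are assumptions for the example, not LDK's actual constants):

```rust
use std::time::Duration;

// Assumed for illustration; the real burst size/interval may differ.
const MAX_REQUESTS_PER_BURST: u8 = 3;
const BURST_INTERVAL: Duration = Duration::from_secs(60);

struct RequestLimiter {
    /// Time since the Unix epoch at which the current burst started.
    last_burst_start: Duration,
    sent_in_burst: u8,
}

impl RequestLimiter {
    /// Returns true if another request may be sent now. A fresh burst of up
    /// to MAX_REQUESTS_PER_BURST opens once BURST_INTERVAL has elapsed.
    fn allow_request(&mut self, now: Duration) -> bool {
        if now.saturating_sub(self.last_burst_start) >= BURST_INTERVAL {
            self.last_burst_start = now;
            self.sent_in_burst = 0;
        }
        if self.sent_in_burst < MAX_REQUESTS_PER_BURST {
            self.sent_in_burst += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut limiter = RequestLimiter { last_burst_start: Duration::ZERO, sent_in_burst: 0 };
    let t0 = Duration::from_secs(1000);
    // Three requests are allowed in the burst, the fourth is not.
    assert!(limiter.allow_request(t0));
    assert!(limiter.allow_request(t0));
    assert!(limiter.allow_request(t0));
    assert!(!limiter.allow_request(t0));
    // After a minute, a new burst opens.
    assert!(limiter.allow_request(t0 + Duration::from_secs(60)));
}
```

So under this model the answer to the question above would be yes: up to the burst size per interval, which is why an exponential backoff (listed as a follow-up item) may be preferable.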

Because async recipients are not online to respond to invoice requests,
the plan is for another node on the network that is always-online to serve
static invoices on their behalf.

The protocol is as follows:
- Recipient is configured with blinded message paths to reach the static invoice
  server
- On startup, recipient requests blinded message paths for inclusion in their
  offer from the static invoice server over the configured paths
- Server replies with offer paths for the recipient
- Recipient builds their offer using these paths and the corresponding static
  invoice and replies with the invoice
- Server persists the invoice and confirms that they've persisted it, causing
  the recipient to cache the interactively built offer for use

At pay-time, the payer sends an invoice request to the static invoice server,
who replies with the static invoice after forwarding the invreq to the
recipient (to give them a chance to provide a fresh invoice in case they're
online).

Here we add the requisite trait methods and onion messages to support this
protocol.

An alternate design could be for the async recipient to publish static invoices
directly without a preceding offer, e.g. on their website. Some drawbacks of
this design include:
1) No fallback to regular BOLT 12 in the case that the recipient happens to be online
at pay-time. Falling back to regular BOLT 12 allows the recipient to provide a fresh
invoice and regain the proof-of-payment property
2) Static invoices don't fit in a QR code
3) No automatic rotation of the static invoice, which is useful in the case that payment
paths become outdated due to changing fees, etc
In future commits, as part of being an async recipient, we will interactively
build offers and static invoices with an always-online node that will serve
static invoices on our behalf.

Once an offer is built and we've requested persistence of the corresponding
invoice from the server, we will use the new offer cache added here to save the
invoice metadata and the offer in ChannelManager, though the OffersMessageFlow
is responsible for keeping the cache updated.

We want to cache and persist these offers so we always have them at the ready;
we don't want to begin the process of interactively building an offer the
moment it is needed. The offers are likely to be long-lived, so caching them
avoids having to keep interactively rebuilding them after every restart.
In future commits, as part of being an async recipient, we will interactively
build offers and static invoices with an always-online node that will serve
static invoices on our behalf.

Here we add a setter for the blinded paths that we as an async recipient will
use to contact the static invoice server to get paths to put in our offers.
As an async recipient, we need to interactively build static invoices that an
always-online node will serve to payers on our behalf.

At the start of this process, we send requests on startup for paths to include
in our offers to the always-online node, and refresh the cached offers when
they start to get stale.
As an async recipient, we need to interactively build a static invoice that an
always-online node will serve to payers on our behalf.

As part of this process, the static invoice server sends us blinded message
paths to include in our offer so they'll receive invoice requests from senders
trying to pay us while we're offline. On receipt of these paths, create an
offer and static invoice and send the invoice back to the server so they can
provide the invoice to payers, as well as caching the offer as pending.
As an async recipient, we need to interactively build a static invoice that an
always-online node will serve on our behalf.

Once this invoice is built and persisted by the static invoice server, they
will send us a confirmation onion message. At this time, mark the corresponding
pending offer as ready to receive async payments.
As an async recipient, we need to interactively build offers and corresponding
static invoices, the latter of which an always-online node will serve to payers
on our behalf.

We want the static invoice server to always have the freshest possible invoice
available, so on each timer tick for every offer that is either currently in
use or pending confirmation, send them a new invoice.
Over the past several commits we've implemented interactively building async
receive offers with a static invoice server that will service invoice requests
on our behalf as an async recipient.

Here we add an API to retrieve a resulting offer so we can receive payments
when we're offline.
@valentinewallace valentinewallace force-pushed the 2025-02-static-inv-server-client branch from 17b7100 to 848d50e Compare June 25, 2025 18:14
Labels: weekly goal (Someone wants to land this this week)
6 participants