Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Peer-And-Content-Routing example: Remove or better document 100ms delay #950

Closed
mcclure opened this issue Jun 6, 2021 · 7 comments
Closed
Labels
effort/hours Estimated to take one or several hours good first issue Good issue for new contributors help wanted Seeking public contribution on this issue P2 Medium: Good to have, but can wait until someone steps up topic/docs Documentation

Comments

@mcclure
Copy link
Contributor

mcclure commented Jun 6, 2021

This example
https://github.com/libp2p/js-libp2p/blob/master/examples/peer-and-content-routing
Involves running two scripts simultaneously; they then connect to each other. Each one does this before calling findPeers

// The DHT routing tables need a moment to populate
await delay(100)

This is fine for an example/test but is not good enough for production code. There's no indication how to modify this 100ms delay when moving the example code into production code.

Expected behavior: There should be some other type of event— for example, js-libp2p-kad-dht announcing it has finished negotiating dht with the remote server— that can be awaited on, and the example should use it.

@mcclure mcclure added the need/triage Needs initial labeling and prioritization label Jun 6, 2021
@lidel lidel added help wanted Seeking public contribution on this issue topic/docs Documentation effort/hours Estimated to take one or several hours P2 Medium: Good to have, but can wait until someone steps up and removed need/triage Needs initial labeling and prioritization labels Jul 2, 2021
@vasco-santos vasco-santos added the good first issue Good issue for new contributors label Nov 12, 2021
@alxroyer
Copy link

alxroyer commented Aug 11, 2022

Also interested in this info.

I've tracked my experimentations on that topic through a dedicated repo: https://github.com/Alexis-ROYER/js-libp2p-content-routing-delay

According to my observations:

  • No need to await for "propagation" in the given examples, in as much as there is a single intermediate node.
  • Still, time must be awaited after the connection is established with the remote:
    1. before a content provider provides its content,
    2. and before a querier asks for this content.
  • The required time to await got worse with [email protected] compared with [email protected],
    coming close to the 100 ms timings proposed in the examples.

Has anyone pointed out a formal event that should be awaited from the DHT before providing to / querying from it?

@inverted-capital
Copy link

In a system designed to operate with intermittent network connectivity, giving an error because you have no peers currently seems odd ?

Would a PR be accepted on this topic, if it added an abortable smart retry whenever there were no peers ?

For example, if no peers are found, we cache the request, and await the internal event (which is not exposed, but can be seen from enabling DEBUG='libp2p:*') for a new peer appearing, and then keep on trying whenever a new peer appears until we get acceptable success ?

This would close this ticket, because there would no longer be any delay required before using the DHT.

If acceptable, we would seek to do the same for pubsub - transient periods of no peers is expected behavior of this library - it says so on the tin

@inverted-capital
Copy link

Or, is the design intent of libp2p that retry functionality should be left up to the user, in which case we would aim to provide a module that could wrap the dht implementation by offering local persistence with retry ? @vasco-santos what are your thoughts ?

Our opinion is that if a repo was supplied to libp2p, the user intends for long term persistence, and that this implicitly means long term retry and refresh of dht operations, which would be started and stopped via libp2p.start() and libp2p.stop()

@inverted-capital
Copy link

We've had a chat about this, and would like to present a summary here.

Our belief is that libp2p presents a virtualized model of the conventional network as an ideal network node. The turmoil of the real world is gracefully abstracted away by libp2p and the developer can concentrate on the application logic, trusting that libp2p will eventually do what the developer asked. The tradeoff for this idealization of the network is time. In fact, 3 of the front page claims for libp2p imply eventuality of completion: Native Roaming, Work Offline, and Good For High Latency Scenarios.

To achieve this, libp2p has some memory, namely the peerstore, that can help accelerate the performance of its assurances to the developer. But two areas where it fails, due to no memory, is with the dht and pubsub modules.

Our expectation of an ideal dht is that it should:

  1. Return an async iterable on put() that resolved with lifecycle events, like dialing, querying, put1, put2, put3, removed
  2. Continue its lifecycle effort on node reload, if the storage indicates an incomplete task
  3. Maintain its put() values across the network periodically
  4. Retry forever until its key is overwritten, or until enough nodes have been reached

Our expectation of an ideal pubsub is that it should:

  1. Return an async iterable on publish() that resolved with lifecycle events
  2. Continue its lifecycle effort on node reload, if the storage indicates an incomplete task
  3. On new peer subscriptions detected, we should republish whatever we published, since a new node is served infinitely better with an old publication rather than nothing. They might have missed the broadcast by a mere 1ms, so they should be 'caught up' by default.

@achingbrain may we please have some guidance on:

  1. The errors in our presumptions here
  2. How would you like this implemented in libp2p, then we can have a shot at it

Our suggestion would be to implement a Startable interface class for both dht and pubsub, and intercept all calls to the backing dht or pubsub modules, performing long retry and other tasks, along with reload tasks. The storage we would use would be datastore keys with prefix /dht/put and /pubsub/publish/topic

@inverted-capital
Copy link

This PR from @achingbrain helps show what is being waited for: https://github.com/libp2p/js-libp2p/pull/1549/files

@maschad maschad closed this as completed Oct 1, 2023
@inverted-capital
Copy link

Sir this delay is still in the code for the example:

May we pretty please have some event that fires when the tables have propagated instead of polling for population ?

@maschad
Copy link
Member

maschad commented Oct 2, 2023

I think there is a fundamental misunderstanding with how Kademlia works. When looking for a peer, the local node contacts the k closest nodes to the remote peer’s ID asking them for closer nodes. The local node repeats the process until it finds the peer or determines that it is not in the network. This means that if you are unable to find the content you are looking for, the node needs to await more peers being discovered as opposed to continuously querying the DHT. In this particular example we could await the connection to be established for the remote, so that PR could be opened as a less hacky fix.

Now once a substantial amount of nodes are connected to the node, there is a PR that could introduce progress events to assess each query on the DHT but that still would not solve this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/hours Estimated to take one or several hours good first issue Good issue for new contributors help wanted Seeking public contribution on this issue P2 Medium: Good to have, but can wait until someone steps up topic/docs Documentation
Projects
None yet
Development

No branches or pull requests

6 participants