@wjmelements
Contributor

Reviewer @rvagg @hugomrdias
Closes #388
This makes progress toward a parallel createContexts.
Pinging providers in parallel rather than sequentially should make createContext and createContexts considerably faster, especially when many providers are down.
Ties among providers are resolved in favour of whichever responds to its ping first.
The prior priority tiers, such as existing pieces and existing data sets, are preserved.

Changes

  • destroy async generator
  • Promise.race the pings
  • fix tests

@wjmelements wjmelements linked an issue Nov 4, 2025 that may be closed by this pull request
@github-project-automation github-project-automation bot moved this to 📌 Triage in FS Nov 4, 2025
@wjmelements wjmelements added the enhancement New feature or request label Nov 4, 2025
@wjmelements wjmelements mentioned this pull request Nov 4, 2025
)
)
let remaining = pings.length
while (remaining-- > 0) {
Contributor Author

I like to write this like while (remaining --> 0) but lint disagrees

}
}

export function fallbackRandIndex(length: number): number {
Contributor Author

I don't delete fallbackRandIndex because it is still used by fallbackRandU256.

},
[new Set(), new Set()]
)
.map((deduped) => [...deduped])
Collaborator

does createContexts land us with duplicates at this point now? why are we worried about deduping with Sets?

Contributor Author

These are the client data sets. We don't expect multiple data sets to share a provider, but there could be many, and we want to dedupe because the subsequent code assumes the providerId values are unique. Deduping will also matter when this changes to a method that returns multiple providers; otherwise it might pick the same provider multiple times.

We currently dedupe these in the iterative code with the skipProviderIds.
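
The Set-based dedupe discussed here can be sketched roughly as follows. This is an illustration only, not the PR's code: `TieredDataSet` and `dedupeProviderIdsByTier` are hypothetical names, and the only fields assumed are `providerId` and `currentPieceCount`, both of which appear elsewhere in this thread.

```typescript
// Hypothetical, trimmed-down shape of a client data set (assumption for illustration).
interface TieredDataSet {
  providerId: number
  currentPieceCount: number
}

// Split data sets into two priority tiers (with pieces, without pieces) and
// collect each tier's provider IDs in a Set, so an ID appears at most once per tier.
function dedupeProviderIdsByTier(dataSets: TieredDataSet[]): [number[], number[]] {
  const [hasPieces, hasNoPieces] = dataSets.reduce<[Set<number>, Set<number>]>(
    (tiers, dataSet) => {
      tiers[dataSet.currentPieceCount > 0 ? 0 : 1].add(dataSet.providerId)
      return tiers
    },
    [new Set<number>(), new Set<number>()]
  )
  // Sets keep insertion order, so each tier preserves first-seen order.
  return [[...hasPieces], [...hasNoPieces]]
}
```

In this sketch a provider can still appear in both tiers if it has one data set with pieces and one without; uniqueness is only guaranteed within a tier.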

@rjan90 rjan90 moved this from 📌 Triage to 🔎 Awaiting review in FS Nov 4, 2025
(provider: ProviderInfo | null): provider is ProviderInfo =>
provider !== null &&
(!withIpni || provider.products.PDP?.data.ipniIpfs !== false) &&
(dev || provider.products.PDP?.capabilities?.dev == null)
Collaborator

this will conflict with #376, let's pull that one in first (@rjan90) and make sure we account for it here

Comment on lines +629 to +632
for (const managedDataSets of [hasPieces, hasNoPieces]) {
const providers: ProviderInfo[] = (
await Promise.all(
managedDataSets.map((dataSet: EnhancedDataSetInfo) => spRegistry.getProvider(dataSet.providerId))
Collaborator

Suggested change
- for (const managedDataSets of [hasPieces, hasNoPieces]) {
-   const providers: ProviderInfo[] = (
-     await Promise.all(
-       managedDataSets.map((dataSet: EnhancedDataSetInfo) => spRegistry.getProvider(dataSet.providerId))
+ for (const dataSets of [hasPieces, hasNoPieces]) {
+   const providers: ProviderInfo[] = (
+     await Promise.all(
+       dataSets.map((dataSet: EnhancedDataSetInfo) => spRegistry.getProvider(dataSet.providerId))

shadowing managedDataSets here makes it confusing

Contributor Author

dataSets is also already used in this function

Comment on lines +722 to +727
const pings = providers.filter(hasPDP).map((provider, index) =>
new PDPServer(null, provider.products.PDP.data.serviceURL).ping().then(
() => Promise.resolve(provider),
(error) => Promise.reject({ error, index, provider })
)
)
Collaborator

This construct feels unnecessarily complex. Can't we just do const pdpProviders = providers.filter(hasPDP) and then map them into plain ping() promises? You should then be able to use the index of the winning promise to tell which provider it is, and avoid the complexity of this nested then promise.

Contributor Author

.then() does simplify this code. The only difference from making pdpProviders a local would be the ability to recalculate the provider from the index. Both resolve and reject need the provider though, so the code is simpler if you nest it like this.

await providerPdpServer.ping()
return provider
} catch (error) {
return await Promise.race(pings)
Collaborator

This whole loop could just be replaced with a Promise.any, I think. The problem you're battling is that race settles with the first settled promise regardless of whether it fulfils or rejects, whereas Promise.any returns the first fulfilled promise and rejects only if they all reject. There's an example of this in packages/synapse-sdk/src/retriever/utils.ts.

const { response, index: winnerIndex } = await Promise.any(providerAttempts)

then use index to pick out of your original list, and you don't need that custom then block.

Also, see how AbortController is used in the retriever code. We should be doing the same thing down into ping() so we can abort everything else once one ping succeeds. In the retrieval case we care about not aborting the winning promise, because the controller is passed to the fetch Response, which we don't want to abort. Here the response completes before the promise resolves, so the winning promise has properly finished and we can abort everything. So we could just use one AbortController for this whole thing.
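
A rough sketch of the Promise.any plus single AbortController idea follows. It is not the SDK's code: `ProviderLike`, `Ping`, and `selectFirstResponsive` are hypothetical names, and it assumes ping() could be given an AbortSignal, which the thread proposes but does not show.

```typescript
// Hypothetical provider shape and ping signature (assumptions for illustration).
interface ProviderLike {
  id: number
  serviceURL: string
}
type Ping = (serviceURL: string, signal: AbortSignal) => Promise<void>

// Ping every provider in parallel and return the first one whose ping succeeds.
// Promise.any skips rejections and only rejects (with an AggregateError)
// when every ping has rejected.
async function selectFirstResponsive(providers: ProviderLike[], ping: Ping): Promise<ProviderLike> {
  const controller = new AbortController()
  const attempts = providers.map((provider, index) =>
    // Resolve to the index so the winner can be looked up in the original list.
    ping(provider.serviceURL, controller.signal).then(() => index)
  )
  try {
    const winnerIndex = await Promise.any(attempts)
    return providers[winnerIndex]
  } finally {
    // The winning ping has fully completed, so aborting all pings is safe here.
    controller.abort()
  }
}
```

The `finally` block also runs when every ping fails; aborting already-settled pings is harmless in that case.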

Contributor Author

Promise.any would be good. Would have to move the failure logging into the .then() reject block, but could eliminate index and remaining.

Contributor Author

Moving the logging into the reject block would actually be noisy if we abort though.
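
One possible way to keep failure logging in the reject path without that noise is to check the shared signal before logging. The sketch below is only an assumption about how that could look; `pingWithQuietAbort` and its parameters are hypothetical names, not part of the SDK.

```typescript
// Wrap a ping so that a failure is reported through `onFailure` only when it
// happened before the shared signal was aborted. Failures that follow our own
// abort are expected and stay silent.
function pingWithQuietAbort(
  ping: (signal: AbortSignal) => Promise<void>,
  signal: AbortSignal,
  onFailure: (error: unknown) => void
): Promise<void> {
  return ping(signal).catch((error: unknown) => {
    if (!signal.aborted) {
      onFailure(error)
    }
    // Re-throw so callers such as Promise.any still see this ping as failed.
    throw error
  })
}
```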

Collaborator

The rule is something like: Promise.race is almost never what you want.
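
To make the difference concrete, here is a small standalone comparison, with stand-in promises instead of real pings. The helper names `makeAttempts`, `viaRace`, and `viaAny` are made up for this sketch.

```typescript
// Two stand-in "pings": one fails immediately, one succeeds a little later.
function makeAttempts(): Promise<string>[] {
  const failsFast = Promise.reject<string>(new Error('provider A is down'))
  const succeedsLater = new Promise<string>((resolve) => {
    setTimeout(() => resolve('provider B'), 10)
  })
  return [failsFast, succeedsLater]
}

// Promise.race settles with whichever promise settles first, even a rejection.
function viaRace(): Promise<string> {
  return Promise.race(makeAttempts()).catch((error: Error) => `rejected: ${error.message}`)
}

// Promise.any waits for the first fulfilment and skips earlier rejections.
function viaAny(): Promise<string> {
  return Promise.any(makeAttempts())
}
```

With these inputs `viaRace()` resolves to `rejected: provider A is down`, while `viaAny()` resolves to `provider B`.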

@rvagg
Collaborator

rvagg commented Nov 4, 2025

The prior priority tiers, such as existing pieces and existing data sets, are preserved

But they're not quite preserved. Currently you'll always be pulled back to the data set with the most pieces in it, so a choice between one with 1 piece and one with 20 will always land you on the 20-piece data set as long as you can ping the provider. That behaviour has now changed. Maybe this is OK, but it's a change we'd need to deal with and think through the implications of.

The other major change with this is that we now end up talking to the closest & fastest SPs and completely ignore others. TTFB is now our main selection metric. This may be an OK design decision, but it's going to have some implications for the network and what it means to be an SP and how to compete for business. The existing randomness was helping us distribute the network a bit more.

Also, for the multi-context case, aren't we going to be hitting this same code multiple times, so pinging the same providers multiple times (with exclusion)?

How about an alternative form of this? The main purpose of the ping was to weed out providers that we can't talk to; it's an easy and quick test. When we get our list of top-level providers inside createContexts, we could do a bulk parallel ping of all of them with a short timeout, maybe 500ms max. That trims our initial list, and our smart select no longer needs to perform the ping because it has already been done at the top level. Perhaps "smartSelectWithPing" is now different to "smartSelect", and createContexts only uses the latter while createContext uses the former.
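
The bulk-ping proposal could look roughly like the sketch below. It is not an existing SDK function: `filterReachable` and `PingFn` are hypothetical names, the 500 ms default comes from the suggestion above, and the sketch assumes ping() rejects promptly once its AbortSignal fires.

```typescript
// Hypothetical ping signature: resolves when the provider answers, rejects otherwise.
// Assumption: the ping rejects promptly when `signal` is aborted.
type PingFn<P> = (provider: P, signal: AbortSignal) => Promise<void>

// Ping all providers in parallel and keep only those that answer within
// `timeoutMs`. The original order is preserved, so later selection logic is
// not biased toward whichever provider answered fastest.
async function filterReachable<P>(providers: P[], ping: PingFn<P>, timeoutMs = 500): Promise<P[]> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  try {
    const results = await Promise.allSettled(
      providers.map((provider) => ping(provider, controller.signal))
    )
    return providers.filter((_, index) => results[index].status === 'fulfilled')
  } finally {
    clearTimeout(timer)
  }
}
```

Because the result keeps the input order rather than the response order, any later selection step can still apply priority tiers or randomness to the trimmed list.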

const sorted = managedDataSets.sort((a, b) => {
if (a.currentPieceCount > 0 && b.currentPieceCount === 0) return -1
if (b.currentPieceCount > 0 && a.currentPieceCount === 0) return 1
return a.pdpVerifierDataSetId - b.pdpVerifierDataSetId
Contributor Author

But they're not quite preserved, currently you'll always be pulled back to data sets with the most pieces in it

No, the current tie breaker is dataset ID
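
A small worked example of the comparator quoted above shows the tie-break. `SortableDataSet` and `sortByPriority` are names made up for this sketch; the comparison logic is the one from the diff.

```typescript
// Trimmed-down data set shape with only the two fields the comparator reads.
interface SortableDataSet {
  pdpVerifierDataSetId: number
  currentPieceCount: number
}

// Data sets that already hold pieces sort first; within each group the lower
// data set ID wins. Piece counts are only compared as zero versus non-zero.
function sortByPriority(dataSets: SortableDataSet[]): SortableDataSet[] {
  // Copy before sorting so the caller's array is not mutated.
  return [...dataSets].sort((a, b) => {
    if (a.currentPieceCount > 0 && b.currentPieceCount === 0) return -1
    if (b.currentPieceCount > 0 && a.currentPieceCount === 0) return 1
    return a.pdpVerifierDataSetId - b.pdpVerifierDataSetId
  })
}

const order = sortByPriority([
  { pdpVerifierDataSetId: 9, currentPieceCount: 20 },
  { pdpVerifierDataSetId: 3, currentPieceCount: 0 },
  { pdpVerifierDataSetId: 7, currentPieceCount: 1 },
]).map((dataSet) => dataSet.pdpVerifierDataSetId)
// order is [7, 9, 3]: the 1-piece set (ID 7) precedes the 20-piece set (ID 9)
```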

@wjmelements
Contributor Author

Also, for the multi-context case, aren't we going to be hitting this same code multiple times, so pinging the same providers multiple times (with exclusion)?

Yes. I have a local diff that changes these functions to take a count and return an array but that will be more work and I will be prioritizing upload first.

@wjmelements wjmelements self-assigned this Nov 4, 2025
@wjmelements wjmelements marked this pull request as draft November 4, 2025 16:09
Labels

enhancement New feature or request

Projects

Status: 🔎 Awaiting review

Development

Successfully merging this pull request may close these issues.

perf: selectProviderWithPing should use Promise.race

3 participants