perf: Select Storage Provider by a ping race #390

wjmelements · 2025-11-04T06:22:23Z

Reviewer @rvagg @hugomrdias
Closes #388
This makes progress toward a parallel createContexts.
By pinging in parallel rather than sequentially, createContext and createContexts should get substantially faster, especially when many providers are down.
Ties among providers are broken by whichever provider answers its ping first.
The prior priority tiers, such as existing pieces and existing data sets, are preserved.

Changes

destroy async generator
Promise.race the pings
fix tests

cloudflare-workers-and-pages · 2025-11-04T06:22:32Z

Deploying with Cloudflare Workers

Status	Name	Latest Commit	Preview URL	Updated (UTC)
⛔ Deployment terminated View logs	synapse-dev	`040b16f`	Commit Preview URL Branch Preview URL	Nov 04 2025, 07:02 AM

wjmelements · 2025-11-04T06:24:17Z

packages/synapse-sdk/src/storage/context.ts

+      )
+    )
+    let remaining = pings.length
+    while (remaining-- > 0) {


I like to write this like while (remaining --> 0) but lint disagrees

packages/synapse-sdk/src/test/storage.test.ts

wjmelements · 2025-11-04T06:41:43Z

packages/synapse-core/src/utils/rand.ts

  }
 }

 export function fallbackRandIndex(length: number): number {


I don't delete fallbackRandIndex because it is still used by fallbackRandU256.

rvagg · 2025-11-04T07:54:12Z

packages/synapse-sdk/src/storage/context.ts

+          },
+          [new Set(), new Set()]
+        )
+        .map((deduped) => [...deduped])


does createContexts land us with duplicates at this point now? why are we worried about deduping with Sets?

These are the client's data sets. We don't expect multiple data sets with the same provider, but there could be many, and we want to dedupe because the subsequent code assumes that each providerId is unique. Deduping will also matter when this becomes a method that returns multiple providers; otherwise it might pick the same provider multiple times.

We currently dedupe these in the iterative code with the skipProviderIds.
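The dedupe discussed above can be sketched roughly as follows. This is an illustration, not the PR's code: `DataSetLike` and `partitionProviderIds` are hypothetical names, and only the `providerId` and `currentPieceCount` fields come from the diff. Dropping a provider from the second tier when it already appears in the first is an assumption made here so that no provider can be picked twice.

```typescript
// Hypothetical, trimmed-down stand-in for the SDK's EnhancedDataSetInfo.
interface DataSetLike {
  providerId: number
  currentPieceCount: number
}

// Split provider IDs into two tiers (data sets with pieces, data sets without).
// Sets ensure a provider that owns several data sets appears once per tier.
function partitionProviderIds(dataSets: DataSetLike[]): [number[], number[]] {
  const [hasPieces, hasNoPieces] = dataSets.reduce<[Set<number>, Set<number>]>(
    (tiers, dataSet) => {
      tiers[dataSet.currentPieceCount > 0 ? 0 : 1].add(dataSet.providerId)
      return tiers
    },
    [new Set<number>(), new Set<number>()]
  )
  // Assumption: a provider already in the first tier is removed from the second.
  return [[...hasPieces], [...hasNoPieces].filter((id) => !hasPieces.has(id))]
}
```

Because Sets keep insertion order, each tier still lists providers in the order their data sets were first seen.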

rvagg · 2025-11-04T07:57:56Z

packages/synapse-sdk/src/storage/context.ts

+          (provider: ProviderInfo | null): provider is ProviderInfo =>
+            provider !== null &&
+            (!withIpni || provider.products.PDP?.data.ipniIpfs !== false) &&
+            (dev || provider.products.PDP?.capabilities?.dev == null)


this will conflict with #376, let's pull that one in first (@rjan90) and make sure we account for it here

rvagg · 2025-11-04T07:59:39Z

packages/synapse-sdk/src/storage/context.ts

+      for (const managedDataSets of [hasPieces, hasNoPieces]) {
+        const providers: ProviderInfo[] = (
+          await Promise.all(
+            managedDataSets.map((dataSet: EnhancedDataSetInfo) => spRegistry.getProvider(dataSet.providerId))


Suggested change

-      for (const managedDataSets of [hasPieces, hasNoPieces]) {
-        const providers: ProviderInfo[] = (
-          await Promise.all(
-            managedDataSets.map((dataSet: EnhancedDataSetInfo) => spRegistry.getProvider(dataSet.providerId))
+      for (const dataSets of [hasPieces, hasNoPieces]) {
+        const providers: ProviderInfo[] = (
+          await Promise.all(
+            dataSets.map((dataSet: EnhancedDataSetInfo) => spRegistry.getProvider(dataSet.providerId))

shadowing managedDataSets here makes it confusing

dataSets is also already used in this function

rvagg · 2025-11-04T08:04:57Z

packages/synapse-sdk/src/storage/context.ts

+    const pings = providers.filter(hasPDP).map((provider, index) =>
+      new PDPServer(null, provider.products.PDP.data.serviceURL).ping().then(
+        () => Promise.resolve(provider),
+        (error) => Promise.reject({ error, index, provider })
+      )
+    )


This construct feels unnecessarily complex. Can't we just do const pdpProviders = providers.filter(hasPDP) and then map them into plain ping() promises? The index of each promise then tells you which provider it belongs to, and you avoid the complexity of the nested then.

then does simplify this code. The only difference from making pdpProviders a local would be the ability to recalculate the provider from the index. Both resolve and reject need the provider though, so the code is simpler if you nest it like this.

rvagg · 2025-11-04T08:08:50Z

packages/synapse-sdk/src/storage/context.ts

-        await providerPdpServer.ping()
-        return provider
-      } catch (error) {
+        return await Promise.race(pings)


This whole loop could just be replaced with a Promise.any I think; the problem you're battling is that race will return the first settled promise regardless of whether it's a resolve or reject, Promise.any returns the first resolved promise or it rejects if they all reject. There's an example of this in packages/synapse-sdk/src/retriever/utils.ts.

const { response, index: winnerIndex } = await Promise.any(providerAttempts)

then use index to pick out of your original list, and you don't need that custom then block.
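A minimal sketch of the difference, assuming a stand-in `fakePing` rather than the real `PDPServer.ping()`; `firstReachable` is a hypothetical name. `Promise.any` skips rejections and only rejects (with an `AggregateError`) when every attempt fails, whereas `Promise.race` settles with whichever promise finishes first, even if that is a failure.

```typescript
// Resolve or reject after `ms` milliseconds; stands in for a provider ping.
function fakePing(ms: number, ok: boolean): Promise<void> {
  return new Promise((resolve, reject) => {
    setTimeout(() => (ok ? resolve() : reject(new Error(`ping failed after ${ms}ms`))), ms)
  })
}

// Return the first provider whose ping succeeds, ignoring earlier failures.
async function firstReachable<T>(providers: T[], ping: (provider: T) => Promise<void>): Promise<T> {
  // Each attempt resolves to its own index so the winner can be looked up afterwards.
  const attempts = providers.map((provider, index) => ping(provider).then(() => index))
  // Rejects with AggregateError only if every attempt rejects.
  const winnerIndex = await Promise.any(attempts)
  return providers[winnerIndex] as T
}
```

With a provider that fails after 5ms and another that succeeds after 20ms, `Promise.race` over the raw pings would reject at 5ms, while `firstReachable` resolves with the 20ms provider.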

Also, see how AbortController is used in the retriever code; we should do the same thing down into ping() so we can abort everything else once one succeeds. In the retrieval case we care about not aborting the winning promise, but here we can abort everything because the winning promise has properly completed (i.e. in that case the controller is passed to the fetch Response, which we don't want to abort, but here we complete the response before the promise resolves). So we could just use one AbortController for this whole thing.

Promise.any would be good. Would have to move the failure logging into the .then() reject block, but could eliminate index and remaining.

Moving the logging into the reject block would actually be noisy if we abort though.

The rule is something like: Promise.race is almost never what you want.
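One way the pieces of this thread could fit together is sketched below. It assumes a hypothetical `ping` that accepts an `AbortSignal` (nothing in this thread shows the real `ping()` taking one), and `selectByPing` is an illustrative name. Failure logging sits in the reject handler but is skipped once the shared controller has aborted, which is one possible answer to the noise concern.

```typescript
// Hypothetical ping signature: the real ping() would need to accept a signal.
type Ping<T> = (provider: T, signal: AbortSignal) => Promise<void>

async function selectByPing<T>(
  providers: T[],
  ping: Ping<T>,
  log: (message: string) => void = console.warn
): Promise<T> {
  // One controller for all pings, as suggested in the review.
  const controller = new AbortController()
  const attempts = providers.map((provider, index) =>
    ping(provider, controller.signal).then(
      () => index,
      (error: unknown) => {
        // Log genuine failures only; pings aborted after a win stay silent.
        if (!controller.signal.aborted) log(`ping ${index} failed: ${String(error)}`)
        throw error
      }
    )
  )
  try {
    const winnerIndex = await Promise.any(attempts)
    return providers[winnerIndex] as T
  } finally {
    // The winning ping has already completed, so aborting everything is safe.
    controller.abort()
  }
}
```

Every attempt is handed to `Promise.any`, so the rejections produced by aborting the losers are already handled and do not surface as unhandled rejections.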

rvagg · 2025-11-04T09:10:12Z

The prior priority tiers, such as existing pieces and existing data sets, are preserved

But they're not quite preserved, currently you'll always be pulled back to data sets with the most pieces in it. Given a choice between a data set with 1 piece and one with 20, you will always land on the 20-piece data set as long as you can ping the provider, so the behaviour has changed. Maybe this is OK, but it's a change whose implications we'd need to think through.

The other major change with this is that we now end up talking to the closest & fastest SPs and completely ignore others. TTFB is now our main selection metric. This may be an OK design decision, but it's going to have some implications for the network and what it means to be an SP and how to compete for business. The existing randomness was helping us distribute the network a bit more.

Also, for the multi-context case, aren't we going to be hitting this same code multiple times, so pinging the same providers multiple times (with exclusion)?

How about an alternative form of this? The main purpose of the ping was to weed out providers that we can't talk to; it's an easy and quick test. When we get our list of top-level providers inside createContexts, we could do a bulk parallel ping of all of them with a short timeout, maybe 500ms max. That trims our initial list, and our smart select no longer needs to ping because the top level has already done it. Perhaps "smartSelectWithPing" then becomes distinct from "smartSelect", with createContexts using only the latter and createContext using the former.
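The bulk pre-filter proposed above might look something like this sketch. `filterReachable` and the signal-aware `ping` parameter are hypothetical; the 500ms default comes from the comment. It keeps every provider that answers in time and preserves input order, so tiering or random selection can still run on the trimmed list afterwards.

```typescript
// Ping every candidate in parallel and keep those that answer within `timeoutMs`.
async function filterReachable<T>(
  providers: T[],
  ping: (provider: T, signal: AbortSignal) => Promise<void>,
  timeoutMs = 500
): Promise<T[]> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  // Rejects at the deadline so slow pings count as failures even if they ignore the signal.
  const deadline = new Promise<never>((_, reject) => {
    controller.signal.addEventListener('abort', () => reject(new Error('ping timed out')))
  })
  // allSettled waits for every ping to succeed, fail, or hit the deadline.
  const results = await Promise.allSettled(
    providers.map((provider) => Promise.race([ping(provider, controller.signal), deadline]))
  )
  clearTimeout(timer)
  return providers.filter((_, index) => results[index]?.status === 'fulfilled')
}
```

Here `Promise.race` is used only to impose the deadline on each individual ping; the selection itself considers every provider that responded, not just the fastest.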

wjmelements · 2025-11-04T10:03:15Z

packages/synapse-sdk/src/storage/context.ts

-      const sorted = managedDataSets.sort((a, b) => {
-        if (a.currentPieceCount > 0 && b.currentPieceCount === 0) return -1
-        if (b.currentPieceCount > 0 && a.currentPieceCount === 0) return 1
-        return a.pdpVerifierDataSetId - b.pdpVerifierDataSetId


But they're not quite preserved, currently you'll always be pulled back to data sets with the most pieces in it

No, the current tie breaker is dataset ID
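To make the disagreement concrete, here is the removed comparator applied to three made-up data sets. The comparator body is copied from the diff; `DataSetLike`, `compareDataSets`, and the sample values are illustrative only.

```typescript
interface DataSetLike {
  pdpVerifierDataSetId: number
  currentPieceCount: number
}

// Same ordering as the removed sort: any data set with pieces beats an empty one,
// and everything else (including 1 piece vs 20 pieces) falls back to the lower ID.
function compareDataSets(a: DataSetLike, b: DataSetLike): number {
  if (a.currentPieceCount > 0 && b.currentPieceCount === 0) return -1
  if (b.currentPieceCount > 0 && a.currentPieceCount === 0) return 1
  return a.pdpVerifierDataSetId - b.pdpVerifierDataSetId
}

const ordered = [
  { pdpVerifierDataSetId: 7, currentPieceCount: 20 },
  { pdpVerifierDataSetId: 3, currentPieceCount: 0 },
  { pdpVerifierDataSetId: 5, currentPieceCount: 1 },
].sort(compareDataSets)
// ordered IDs: 5, 7, 3 (the 1-piece set sorts ahead of the 20-piece set)
```

The piece count only matters as zero versus non-zero; among non-empty data sets the lower ID wins.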

wjmelements · 2025-11-04T10:15:57Z

Also, for the multi-context case, aren't we going to be hitting this same code multiple times, so pinging the same providers multiple times (with exclusion)?

Yes. I have a local diff that changes these functions to take a count and return an array but that will be more work and I will be prioritizing upload first.
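For illustration only, a count-based variant could be shaped like the sketch below. This is not the local diff mentioned above, which is not shown in this thread; `firstNReachable` and its `ping` parameter are hypothetical. It pings each provider once and resolves with up to `count` providers in the order their pings succeed.

```typescript
// Hypothetical helper: resolve with up to `count` reachable providers.
function firstNReachable<T>(
  providers: T[],
  ping: (provider: T) => Promise<void>,
  count: number
): Promise<T[]> {
  return new Promise((resolve) => {
    const winners: T[] = []
    let pending = providers.length
    if (pending === 0 || count <= 0) {
      resolve(winners)
      return
    }
    for (const provider of providers) {
      ping(provider)
        .then(
          () => {
            // Record successes in arrival order, never more than `count`.
            if (winners.length < count) winners.push(provider)
          },
          () => {
            // Unreachable providers are simply skipped in this sketch.
          }
        )
        .then(() => {
          pending -= 1
          // Settle once enough providers answered or nobody is left to wait for.
          if (winners.length >= count || pending === 0) resolve(winners)
        })
    }
  })
}
```

Since each provider is pinged exactly once, the result cannot contain the same provider twice as long as the input list is already deduplicated.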

wjmelements added 4 commits November 4, 2025 00:17

wip select by ping

2c76bff

fix tsc

cc94ac1

fix test

3558b3c

Promise.race the pings

fe919a7

wjmelements linked an issue Nov 4, 2025 that may be closed by this pull request

perf: selectProviderWithPing should use Promise.race #388

Open

wjmelements requested review from hugomrdias and rvagg November 4, 2025 06:22

github-project-automation bot added this to FS Nov 4, 2025

github-project-automation bot moved this to 📌 Triage in FS Nov 4, 2025

wjmelements added the enhancement New feature or request label Nov 4, 2025

wjmelements mentioned this pull request Nov 4, 2025

feat: createContexts #368

Merged

rm unused randIndex, but keep fallbackRandIndex, still used by fallbackRandU256

040b16f

rvagg reviewed Nov 4, 2025

rjan90 moved this from 📌 Triage to 🔎 Awaiting review in FS Nov 4, 2025

rvagg reviewed Nov 4, 2025

wjmelements self-assigned this Nov 4, 2025

wjmelements marked this pull request as draft November 4, 2025 16:09
