-
Notifications
You must be signed in to change notification settings - Fork 16
perf: Select Storage Provider by a ping race #390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Deploying with
|
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ⛔ Deployment terminated View logs |
synapse-dev | 040b16f | Commit Preview URL Branch Preview URL |
Nov 04 2025, 07:02 AM |
| ) | ||
| ) | ||
| let remaining = pings.length | ||
| while (remaining-- > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like to write this like while (remaining --> 0) but lint disagrees
| } | ||
| } | ||
|
|
||
| export function fallbackRandIndex(length: number): number { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't delete fallbackRandIndex because it is still used by fallbackRandU256.
| }, | ||
| [new Set(), new Set()] | ||
| ) | ||
| .map((deduped) => [...deduped]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does createContexts land us with duplicates at this point now? why are we worried about deduping with Sets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the client data sets, and while we don't expect there to be multiple data sets with the same provider, there could be many, and we want to dedupe because the subsequent code does assume the providerId are unique. It will also be important to dedupe when changing this to a method that returns multiple providers; otherwise it might pick the same provider multiple times.
We currently dedupe these in the iterative code with the skipProviderIds.
| (provider: ProviderInfo | null): provider is ProviderInfo => | ||
| provider !== null && | ||
| (!withIpni || provider.products.PDP?.data.ipniIpfs !== false) && | ||
| (dev || provider.products.PDP?.capabilities?.dev == null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for (const managedDataSets of [hasPieces, hasNoPieces]) { | ||
| const providers: ProviderInfo[] = ( | ||
| await Promise.all( | ||
| managedDataSets.map((dataSet: EnhancedDataSetInfo) => spRegistry.getProvider(dataSet.providerId)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for (const managedDataSets of [hasPieces, hasNoPieces]) { | |
| const providers: ProviderInfo[] = ( | |
| await Promise.all( | |
| managedDataSets.map((dataSet: EnhancedDataSetInfo) => spRegistry.getProvider(dataSet.providerId)) | |
| for (const dataSets of [hasPieces, hasNoPieces]) { | |
| const providers: ProviderInfo[] = ( | |
| await Promise.all( | |
| dataSets.map((dataSet: EnhancedDataSetInfo) => spRegistry.getProvider(dataSet.providerId)) |
shadowing managedDataSets here makes it confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dataSets is also already used in this function
| const pings = providers.filter(hasPDP).map((provider, index) => | ||
| new PDPServer(null, provider.products.PDP.data.serviceURL).ping().then( | ||
| () => Promise.resolve(provider), | ||
| (error) => Promise.reject({ error, index, provider }) | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this construct feels unnecessarily complex, can't we just const pdpProviders = providers.filter(hasPDP) then map them into a plain ping() promise, then you should be able to use the index of the promise that you use to tell you which provider it is and avoid the complexity of this then nested promise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then does simplify this code. The only difference from making pdpProviders a local would be the ability to recalculate the provider from the index. Both resolve and reject need the provider though, so the code is simpler if you nest it like this.
| await providerPdpServer.ping() | ||
| return provider | ||
| } catch (error) { | ||
| return await Promise.race(pings) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole loop could just be replaced with a Promise.any I think; the problem you're battling is that race will return the first settled promise regardless of whether it's a resolve or reject, Promise.any returns the first resolved promise or it rejects if they all reject. There's an example of this in packages/synapse-sdk/src/retriever/utils.ts.
const { response, index: winnerIndex } = await Promise.any(providerAttempts)
then use index to pick out of your original list, and you don't need that custom then block.
Also, see in the retriever code how AbortController is used, we should be doing the same thing down in to ping() so we can abort everything else once we get one succeeding. Although, in the retrieval case we care about not aborting the winning promise, in this case we can abort everything because the winning promise has properly completed (i.e. in that case the controller is passed to the fetch Response which we don't want to abort, but here we complete the response before the promise resolves). So we could just one AbortController for this whole thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Promise.any would be good. Would have to move the failure logging into the .then() reject block, but could eliminate index and remaining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the logging into the reject block would actually be noisy if we abort though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rule is something like: Promise.race is almost never what you want.
But they're not quite preserved, currently you'll always be pulled back to data sets with the most pieces in it, so the choice between the one with 1 piece and 20 will always land you with the 20 data set as long as you can ping the provider, so the behaviour is now changed. Maybe this is OK, but it's a change we'd need to deal with and think through the implications of. The other major change with this is that we now end up talking to the closest & fastest SPs and completely ignore others. TTFB is now our main selection metric. This may be an OK design decision, but it's going to have some implications for the network and what it means to be an SP and how to compete for business. The existing randomness was helping us distribute the network a bit more. Also, for the multi-context case, aren't we going to be hitting this same code multiple times, so pinging the same providers multiple times (with exclusion)? How about an alternative form of this: The main purpose of the ping was to weed out providers that we can't talk to, it's an easy and quick test. When we get our list of top-level providers inside createContexts, we could do a bulk parallel ping of all of them, with a short timeout, maybe 500ms max, then that trims our initial list, and our smart select no longer needs to perform the ping because we've done it at the top level and that trimmed our list. Perhaps "smartSelectWithPing" is now different to "smartSelect" and createContexts only uses the latter while createContext uses the former. |
| const sorted = managedDataSets.sort((a, b) => { | ||
| if (a.currentPieceCount > 0 && b.currentPieceCount === 0) return -1 | ||
| if (b.currentPieceCount > 0 && a.currentPieceCount === 0) return 1 | ||
| return a.pdpVerifierDataSetId - b.pdpVerifierDataSetId |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But they're not quite preserved, currently you'll always be pulled back to data sets with the most pieces in it
No, the current tie breaker is dataset ID
Yes. I have a local diff that changes these functions to take a count and return an array but that will be more work and I will be prioritizing upload first. |
Reviewer @rvagg @hugomrdias
Closes #388
This makes progress toward a parallel
createContexts.By pinging in parallel rather than sequentially, the performance of
createContextandcreateContextsshould be dramatically increased, especially in the case where many providers are down.Ties among providers are resolved according to which can ping first.
The prior priority tiers, such as existing pieces and existing data sets, are preserved.
Changes