Add alternative provider retrieval check #132

Closed · wants to merge 36 commits

Commits (36)
aedec80
Add network wide retrieval check
pyropy Apr 9, 2025
233cc1f
Use status code instead of boolean retrieval flag
pyropy Apr 9, 2025
83e7f31
Simplify name for network wide measurements
pyropy Apr 9, 2025
afe30dd
Refactor code for picking random provider
pyropy Apr 9, 2025
23ee203
Add network retrieval protocol field
pyropy Apr 9, 2025
4bc1076
Add basic test for testing network retrieval
pyropy Apr 9, 2025
63424ff
Refactor function for picking random providers
pyropy Apr 9, 2025
8a94f4e
Only return providers in case of no valid advert
pyropy Apr 9, 2025
c4350b6
Convert network stats to object inside stats obj
pyropy Apr 10, 2025
edfdef1
Format testNetworkRetrieval func
pyropy Apr 10, 2025
dbf0fd7
Refactor queryTheIndex function
pyropy Apr 10, 2025
d33f276
Handle case when no random provider is picked
pyropy Apr 10, 2025
97bee91
Test function for picking random providers
pyropy Apr 10, 2025
4b6d0bc
Rename network retrieval to alternative provider check
pyropy Apr 11, 2025
97fcc28
Update logging to reflect metric name change
pyropy Apr 11, 2025
5121a49
Update logging to reflect metric name change
pyropy Apr 11, 2025
4065784
Rename providers field to alternativeProviders
pyropy Apr 11, 2025
74f06e9
Rename testNetworkRetrieval to checkRetrievalFromAlternativeProvider
pyropy Apr 11, 2025
ea8cce4
Return retrieval stats from checkRetrievalFromAlternativeProvider
pyropy Apr 11, 2025
f9afe34
Update lib/spark.js
pyropy Apr 11, 2025
9959b50
Update lib/spark.js
pyropy Apr 11, 2025
a2da050
Rename functions to match new metric name
pyropy Apr 11, 2025
9759d80
Merge branch 'add/network-wide-retrieval-check' of github.com:filecoi…
pyropy Apr 11, 2025
820e8a3
Pick alternative provider using supplied randomness
pyropy Apr 15, 2025
5b13287
Replace custom rng implementation with Prando
pyropy Apr 15, 2025
3c14f84
Fix typos
pyropy Apr 15, 2025
fe0f1f5
Merge remote-tracking branch 'origin/main' into add/network-wide-retr…
pyropy Apr 28, 2025
ad8a8e8
Lint fix
pyropy Apr 28, 2025
31019d0
Add ID to Provider
pyropy Apr 28, 2025
3710910
Filter out bitswap providers before picking random provider
pyropy Apr 28, 2025
c61a196
Update lib/ipni-client.js
pyropy Apr 29, 2025
d1f62fa
Update lib/spark.js
pyropy Apr 29, 2025
05bd1c2
Update lib/spark.js
pyropy Apr 29, 2025
3451eff
Rename random to alternative provider
pyropy Apr 29, 2025
8b8db36
Merge branch 'add/network-wide-retrieval-check' of github.com:Checker…
pyropy Apr 29, 2025
59d3d22
Simplify pseudo-rng
pyropy Apr 29, 2025
49 changes: 33 additions & 16 deletions lib/ipni-client.js
@@ -1,12 +1,15 @@
import { decodeBase64, decodeVarint, pRetry, assertOkResponse } from '../vendor/deno-deps.js'

/** @typedef {{ address: string; protocol: string; contextId: string; }} Provider */

/**
*
* @param {string} cid
* @param {string} providerId
* @returns {Promise<{
* indexerResult: string;
* provider?: { address: string; protocol: string };
* provider?: Provider;
* providers?: Provider[];

Member:
I find it confusing to have two properties called provider and providers, I cannot tell what's the difference.

How about calling the new property alternativeProviders or altProviders?

* }>}
*/
export async function queryTheIndex(cid, providerId) {
@@ -31,9 +34,8 @@ export async function queryTheIndex(cid, providerId) {
}

let graphsyncProvider
const providers = []
for (const p of providerResults) {
if (p.Provider.ID !== providerId) continue

const [protocolCode] = decodeVarint(decodeBase64(p.Metadata))
const protocol = {
0x900: 'bitswap',
@@ -45,22 +47,33 @@ export async function queryTheIndex(cid, providerId) {
const address = p.Provider.Addrs[0]
if (!address) continue

switch (protocol) {
case 'http':
return {
indexerResult: 'OK',
provider: { address, protocol },
}
const provider = {
address: formatProviderAddress(p.Provider.ID, address, protocol),
contextId: p.ContextID,
protocol,
}

if (p.Provider.ID === providerId) {
switch (protocol) {
case 'http':
return {
indexerResult: 'OK',
provider,
}

case 'graphsync':
if (!graphsyncProvider) {
graphsyncProvider = {
address: `${address}/p2p/${p.Provider.ID}`,
protocol,
case 'graphsync':
if (!graphsyncProvider) {
graphsyncProvider = provider
}
}
}

Member:
It's confusing to me that in case of HTTP we return immediately, but in case of Graphsync we don't. I don't see a reason for this behavior. Could we also directly return in the case of Graphsync?

Author:
This was the way we used to do it even before the introduction of "alternative providers" to the response. As I understood it: we iterate through the list of providers, and if we first find a record serving the content via Graphsync, we return it only if there is no HTTP alternative. If there is a record serving the content via HTTP, we return it immediately.


// Skip adding the provider to the list if it's the one we are looking for
continue
}

providers.push(provider)
}

if (graphsyncProvider) {
console.log('HTTP protocol is not advertised, falling back to Graphsync.')
return {
@@ -70,7 +83,7 @@ export async function queryTheIndex(cid, providerId) {
}

console.log('All advertisements are from other miners or for unsupported protocols.')
return { indexerResult: 'NO_VALID_ADVERTISEMENT' }
return { indexerResult: 'NO_VALID_ADVERTISEMENT', providers }
}
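
For readers following the review thread above, here is a condensed, hypothetical restatement of the HTTP-first selection behaviour in queryTheIndex. It is a simplification, not the PR's code: decodeProtocolAndAddress stands in for the metadata decoding and address formatting done above, and the alternative-providers bookkeeping added in this PR is omitted.

function selectProviderAdvert(providerResults, providerId, decodeProtocolAndAddress) {
  let graphsyncProvider
  for (const p of providerResults) {
    if (p.Provider.ID !== providerId) continue
    const provider = decodeProtocolAndAddress(p) // -> { address, protocol, contextId }
    if (provider.protocol === 'http') {
      // An HTTP advert from the tested provider wins immediately
      return { indexerResult: 'OK', provider }
    }
    if (provider.protocol === 'graphsync' && !graphsyncProvider) {
      // Graphsync is only remembered as a fallback
      graphsyncProvider = provider
    }
  }
  // The fallback is used only when no HTTP advert was found
  if (graphsyncProvider) return { indexerResult: 'HTTP_NOT_ADVERTISED', provider: graphsyncProvider }
  return { indexerResult: 'NO_VALID_ADVERTISEMENT' }
}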

async function getRetrievalProviders(cid) {
@@ -81,3 +94,7 @@ async function getRetrievalProviders(cid) {
const result = await res.json()
return result.MultihashResults.flatMap((r) => r.ProviderResults)
}

function formatProviderAddress(id, address, protocol) {
return protocol === 'http' ? address : `${address}/p2p/${id}`
}
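
For reference, a usage sketch of the helper above (the peer ID and the second multiaddr are illustrative placeholders, not values from this PR):

formatProviderAddress('12D3KooWExample', '/dns/frisbii.fly.dev/tcp/443/https', 'http')
// => '/dns/frisbii.fly.dev/tcp/443/https'
formatProviderAddress('12D3KooWExample', '/ip4/10.0.0.1/tcp/1234', 'graphsync')
// => '/ip4/10.0.0.1/tcp/1234/p2p/12D3KooWExample'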
114 changes: 112 additions & 2 deletions lib/spark.js
@@ -1,5 +1,6 @@
/* global Zinnia */

/** @import { Provider } from './ipni-client.js' */
import { ActivityState } from './activity-state.js'
import {
SPARK_VERSION,
@@ -71,11 +72,25 @@ export default class Spark {
}

console.log(`Querying IPNI to find retrieval providers for ${retrieval.cid}`)
const { indexerResult, provider } = await queryTheIndex(retrieval.cid, stats.providerId)
const { indexerResult, provider, providers } = await queryTheIndex(
retrieval.cid,
stats.providerId,
)
stats.indexerResult = indexerResult

const providerFound = indexerResult === 'OK' || indexerResult === 'HTTP_NOT_ADVERTISED'
if (!providerFound) return
const noValidAdvertisement = indexerResult === 'NO_VALID_ADVERTISEMENT'

// In case index lookup failed due to network error or CID not found,
// we will not perform any retrieval
if (!providerFound && !noValidAdvertisement) return

// In case we fail to find a valid advertisement for the provider
// we will try to perform network wide retrieval from other providers
if (noValidAdvertisement) {
console.log('No valid advertisement found. Trying to retrieve from an alternative provider...')
return await this.checkRetrievalFromAlternativeProvider(providers, retrieval.cid, stats)
}

stats.protocol = provider.protocol
stats.providerAddress = provider.address
@@ -202,6 +217,29 @@ export default class Spark {
}
}

async testNetworkRetrieval(providers, cid, stats) {
if (!providers.length) {
console.info('No alternative providers found for the CID.')
return
}

stats.networkRetrieval = newNetworkRetrievalStats()
const randomProvider = pickRandomProvider(providers)

Member:
I'm surprised by this strategy. I might be out of sync with the current product plan, but wasn't it the goal to see if any provider can fetch the data? Here we're marking a failure if the 2nd provider we try fails, but we don't try the other ones.

Author:
I've posted my opinions on that in this comment and I'm happy to discuss more!

Another motivation behind this was not to put too much pressure on the network serving the content and on the checkers. I'm afraid of the worst-case situation where we'd have to check every provider serving the content.

if (!randomProvider) {
console.warn(
'No providers serving the content via HTTP or Graphsync found. Skipping network-wide retrieval check.',
)
return
}

Member:
Can we prevent this case earlier, when we decide whether to run this function in the first place?

Author:
Yes, it's possible to prevent it by filtering alternative providers by their protocol. If the only alternative providers we have are serving content via Bitswap, filtering them out lets us exit early.
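
A minimal sketch of the early exit described above, assuming the guard runs before the alternative-provider check is invoked (the variable names and log message are illustrative):

// Hypothetical guard: skip the check when nothing is served over HTTP or Graphsync
const retrievableProviders = (providers ?? []).filter((p) => p.protocol !== 'bitswap')
if (retrievableProviders.length === 0) {
  console.log('Only Bitswap providers found. Skipping the alternative provider check.')
  return
}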


await this.fetchCAR(
randomProvider.protocol,
randomProvider.address,
cid,
stats.networkRetrieval,
)
}

async submitMeasurement(task, stats) {
console.log('Submitting measurement...')
const payload = {
@@ -315,6 +353,16 @@ export function newStats() {
carChecksum: null,
statusCode: null,
headStatusCode: null,
networkRetrieval: null,
}
}

function newNetworkRetrievalStats() {
return {
statusCode: null,
timeout: false,
endAt: null,
carTooLarge: false,
}

Member:
Please include the selected providerId in the new stats object, see filcdn/roadmap#254 (comment)

Author:
This is a bit problematic, as for providerId we need to make sure that the provider is a Filecoin storage provider and that we have their miner ID.

Are you aware of some way to reverse a Peer ID to a Miner ID?

Member:
Let's use the retrieval provider peer ID found in the IPNI response instead of the Filecoin miner ID.

We are already doing that for the "regular" retrieval checks, see here:

  console.log(`Found peer id: ${peerId}`)
  stats.providerId = peerId

}
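
A sketch of how the suggestion above might be recorded, assuming the picked Provider object also carries the peer ID found in the IPNI response (this revision of the diff does not include such a field yet, so providerPeerId is hypothetical):

// Hypothetical: remember which alternative provider was checked
stats.networkRetrieval = {
  ...newNetworkRetrievalStats(),
  providerId: randomProvider.providerPeerId ?? null, // peer ID from the IPNI ProviderResults
}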

@@ -395,3 +443,65 @@ function mapErrorToStatusCode(err) {
// Fallback code for unknown errors
return 600
}

/**
* Assigns weights to providers based on their protocol and context ID and picks one at random.
* Providers with higher weights have a higher chance of being selected.
*
* Providers serving content using Bitswap protocol are filtered out.
*
* @param {Provider[]} providers
* @returns {Provider | undefined}
*/
export function pickRandomProvider(providers) {
const filteredProviders = providers.filter((provider) => provider.protocol !== 'bitswap')
if (!filteredProviders.length) return
const weightedProviders = weighProviders(filteredProviders)
return pickRandomWeightedItem(weightedProviders)
}

/**
* Assigns weights to providers based on their protocol and context ID.
*
* HTTP providers and those whose context ID starts with 'ghsA' are given higher weights,
* hence having a higher chance of being selected.
*
* @param {Provider[]} providers
* @returns {Provider & { weight: number }[]}
*/
function weighProviders(providers) {
const protocolWeights = { http: 2, graphsync: 1 }

// assign weight to each provider
return providers.map((provider) => {
let weight = protocolWeights[provider.protocol]
if (provider.contextId.startsWith('ghsA')) weight += 1

return {
...provider,
weight,
}
})
}

/**
* Picks a random item from an array based on their weight. The higher the weight, the higher the chance of being selected.
*
* @template T The type of the item in the list.
* @param {Array<{weight: number}>} items The list of items, where each item has a `weight` property.
* @returns {T} The randomly selected item based on its weight.
*
*/
function pickRandomWeightedItem(items) {
const totalWeight = items.reduce((acc, item) => acc + item.weight, 0)
let random = Math.random() * totalWeight

// Iterate over items, subtracting the item's weight from the random number
// until we find the item where the random number is less than the item's weight
for (let i = 0; i < items.length; i++) {
random -= items[i].weight
if (random <= 0) {
return items[i]
}
}
}

Member:
I find this very problematic:

(1)
When using Math.random(), each checker node will pick a different provider. As a result, we cannot create committees to find an honest majority in the data reported by the network.

Please use a DRAND beacon to get deterministic randomness so that all nodes pick the same alternative provider to check.

See how we are using the DRAND beacon in other parts of this codebase. The important part is to use the DRAND beacon tied to the time when the Spark round started.

(2)
I am not sure if it's a good idea to do a random selection with weights. Shouldn't we always prefer Filecoin SPs serving HTTP retrievals over everybody else?

I would replace weights with the following algorithm:

  • Is there a provider with ContextID starting with ghsA?
    • Yes:
      Does any of those providers support HTTP?
      • Yes -> pick one of those HTTP providers at random
      • No -> pick any of the Graphsync providers at random
    • No:
      Does any of the providers support HTTP?
      • Yes -> pick one of those HTTP providers at random
      • No -> pick any of the Graphsync providers at random

Alternatively, keep using weights, but pick randomly only from the providers with the same highest weight.

Let's discuss what would be the best approach here!

Author:
Thanks for your input!

(1)
I wasn't aware that we also need to form committees for the second (alternative) measurement, as committees were formed to evaluate measurements for a single storage provider. I therefore assumed that we could pick any provider at random, as this measurement should not affect the existing RSR.

In case we're going to use committees to evaluate this measurement, I agree with using DRAND.

(2)
An alternative to your proposal would be to adjust the weights, but I think the algorithm you proposed may be easier to understand. I don't mind changing the algorithm.

Member:
  1. Would be nice if we could use committees
  2. I agree with the more explicit algorithm, it's easier to reason about

Author:
I added both a pseudo-RNG and a more explicit algorithm.
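
To make the selection discussion concrete, here is a minimal sketch of a deterministic, preference-ordered pick seeded from the round's randomness. It is an illustration only: pickAlternativeProvider and mulberry32 are stand-ins (the PR itself moved to Prando and later a simplified pseudo-RNG), and the randomness argument is assumed to be a hex string tied to the current Spark round.

// Sketch only, not the implementation merged in this PR.
function mulberry32(seed) {
  return function () {
    let t = (seed += 0x6d2b79f5)
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

export function pickAlternativeProvider(providers, randomness) {
  // Bitswap adverts are never checked, mirroring the filter in this PR
  const candidates = providers.filter((p) => p.protocol !== 'bitswap')
  if (!candidates.length) return undefined

  // Preference order from the review discussion: ContextID starting with 'ghsA'
  // first, then HTTP over Graphsync within the chosen group
  const dealAdverts = candidates.filter((p) => p.contextId.startsWith('ghsA'))
  const pool = dealAdverts.length ? dealAdverts : candidates
  const httpPool = pool.filter((p) => p.protocol === 'http')
  const finalPool = httpPool.length ? httpPool : pool

  // Deterministic pick: every checker seeded with the same round randomness
  // selects the same provider, so committees can compare measurements
  const seed = parseInt(randomness.slice(0, 8), 16)
  const rng = mulberry32(seed)
  return finalPool[Math.floor(rng() * finalPool.length)]
}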

1 change: 1 addition & 0 deletions test/ipni-client.test.js
@@ -12,6 +12,7 @@ test('query advertised CID', async () => {
provider: {
address: '/dns/frisbii.fly.dev/tcp/443/https',
protocol: 'http',
contextId: 'ZnJpc2JpaQ==',
},
})
})
90 changes: 89 additions & 1 deletion test/spark.js
@@ -1,6 +1,6 @@
/* global Zinnia */

import Spark, { calculateDelayBeforeNextTask, newStats } from '../lib/spark.js'
import Spark, { calculateDelayBeforeNextTask, newStats, pickRandomProvider } from '../lib/spark.js'
import { test } from 'zinnia:test'
import {
assertInstanceOf,
Expand All @@ -9,6 +9,7 @@ import {
assertNotEquals,
assertLessOrEqual,
assertGreaterOrEqual,
assertGreater,
} from 'zinnia:assert'
import { SPARK_VERSION } from '../lib/constants.js'

@@ -155,6 +156,60 @@ test('fetchCAR - http', async () => {
'stats.carChecksum',
)
assertEquals(requests, [`https://frisbii.fly.dev/ipfs/${KNOWN_CID}?dag-scope=block`])
assertEquals(stats.networkRetrieval, null, 'stats.networkRetrieval')
})

test('testNetworkRetrieval - http', async () => {
const requests = []
const spark = new Spark({
fetch: async (url) => {
requests.push(url.toString())
return fetch(url)
},
})
const stats = newStats()
const providers = [
{
address: '/dns/frisbii.fly.dev/tcp/443/https',
protocol: 'http',
contextId: 'ZnJpc2JpaQ==',
},
{
address: '/dns/mock.fly.dev/tcp/443/https',
protocol: 'bitswap',
contextId: 'ghA==',
},
]

await spark.testNetworkRetrieval(providers, KNOWN_CID, stats)
assertEquals(stats.networkRetrieval.statusCode, 200, 'stats.networkRetrieval.statusCode')
assertEquals(stats.networkRetrieval.timeout, false, 'stats.networkRetrieval.timeout')
assertInstanceOf(stats.networkRetrieval.endAt, Date, 'stats.networkRetrieval.endAt')
assertEquals(stats.networkRetrieval.carTooLarge, false, 'stats.networkRetrieval.carTooLarge')
assertEquals(stats.byteLength, 0, 'stats.byteLength')
assertEquals(stats.carChecksum, null, 'stats.carChecksum')
assertEquals(requests, [`https://frisbii.fly.dev/ipfs/${KNOWN_CID}?dag-scope=block`])
assertEquals(stats.statusCode, null, 'stats.statusCode')
assertEquals(stats.timeout, false, 'stats.timeout')
assertEquals(stats.startAt, null, 'stats.startAt')
assertEquals(stats.firstByteAt, null, 'stats.firstByteAt')
assertEquals(stats.endAt, null, 'stats.endAt')
assertEquals(stats.carTooLarge, false, 'stats.carTooLarge')
})

test('testNetworkRetrieval - no providers', async () => {
const requests = []
const spark = new Spark({
fetch: async (url) => {
requests.push(url.toString())
return fetch(url)
},
})
const stats = newStats()
const providers = []

await spark.testNetworkRetrieval(providers, KNOWN_CID, stats)
assertEquals(stats.networkRetrieval, null, 'stats.networkRetrieval')
})

/* Fixme: Find an active deal on a reliable graphsync provider
@@ -483,3 +538,36 @@ test('fetchCAR triggers timeout after long retrieval', async () => {

assertEquals(stats.timeout, true)
})

const mockProviders = [
{ protocol: 'http', contextId: 'ghsA123', address: 'provider1' },
{ protocol: 'graphsync', contextId: 'ghsB456', address: 'provider2' },
{ protocol: 'bitswap', contextId: 'ghsC789', address: 'provider3' },
// Serves using HTTP but contextId does not start with 'ghsA'
{ protocol: 'http', contextId: 'ghsB987', address: 'provider4' },
]

test('pickRandomProvider - should filter out providers using the Bitswap protocol', () => {
const result = pickRandomProvider(mockProviders)
assertNotEquals(result.protocol, 'bitswap')
})

test('pickRandomProvider - should return undefined if no providers are left after filtering', () => {
const providers = [{ protocol: 'bitswap', contextId: 'ghsC789', address: 'provider3' }]
const result = pickRandomProvider(providers)
assertEquals(result, undefined)
})

test('pickRandomProvider - should return a provider with higher weight more frequently', () => {
const results = {}
for (let i = 0; i < 1000; i++) {
const result = pickRandomProvider(mockProviders)
if (result) {
results[result.address] = (results[result.address] || 0) + 1
}
}

// Providers with protocol 'http' and contextId starting with 'ghsA' should have higher counts
assertGreater(results['provider1'], results['provider2'])
assertGreater(results['provider4'], results['provider2'])
})