Extract PieceCID from ContextID #118
Example ContextID value in the new format:
Example Node.js code showing how to parse it:

```js
import { decode as decodeDagCbor } from '@ipld/dag-cbor'

// ContextID is base64-encoded DAG-CBOR
const ContextID = 'ghsAAAAIAAAAANgqWCgAAYHiA5IgIFeksuf2VqNvrrRUxrvA+itvJhrDRju06ThagW6ULKw2'
const bytes = Buffer.from(ContextID, 'base64')
const [pieceSize, pieceCID] = decodeDagCbor(bytes)

console.log('PieceCID:', pieceCID.toString())
// baga6ea4seaqfpjfs473fni3pv22fjrv3yd5cw3zgdlbumo5u5e4fvalosqwkynq
console.log('PieceSize:', pieceSize)
// 34359738368
```

Note: we cannot assume that the decoded value will always be a pair of
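## Implementation Plan for ContextID-based PieceCID Extraction

### Overview

This implementation plan covers the enhancement of piece-indexer to support ContextID-based PieceCID extraction.

### Background

### Implementation Steps

#### 1. Create Utility Function for ContextID Parsing

Create the `extractPieceCidFromContextID` function:

```js
import { decode as decodeDagCbor } from '@ipld/dag-cbor'
import { CID } from 'multiformats/cid'
import createDebug from 'debug'

// Debug namespace is illustrative
const debug = createDebug('spark-piece-indexer:advertisement-walker')

/**
 * Extract PieceCID and PieceSize from a ContextID encoded as
 * base64 DAG-CBOR `[pieceSize, pieceCid]`. Returns null otherwise.
 */
export function extractPieceCidFromContextID (contextID, logDebugMessage = debug) {
  // ContextID is expected as `{ '/': { bytes: '<base64>' } }`
  const contextIDBytes = contextID?.['/']?.bytes
  if (typeof contextIDBytes !== 'string') return null

  // Optimization: the prefix check runs on the base64 string itself, so
  // non-matching values are skipped without base64 and CBOR decoding
  if (!contextIDBytes.startsWith('ghsA')) return null

  try {
    const decoded = decodeDagCbor(Buffer.from(contextIDBytes, 'base64'))
    if (!Array.isArray(decoded) || decoded.length !== 2) {
      logDebugMessage('ContextID %s is not a [pieceSize, pieceCid] pair', contextIDBytes)
      return null
    }
    const [pieceSize, maybePieceCid] = decoded
    // Integers above Number.MAX_SAFE_INTEGER may be decoded as BigInt
    if (typeof pieceSize !== 'number' && typeof pieceSize !== 'bigint') {
      logDebugMessage('ContextID %s has invalid pieceSize: %o', contextIDBytes, pieceSize)
      return null
    }
    const pieceCid = CID.asCID(maybePieceCid)
    if (!pieceCid) {
      logDebugMessage('ContextID %s has invalid pieceCid: %o', contextIDBytes, maybePieceCid)
      return null
    }
    return { pieceCid, pieceSize }
  } catch (err) {
    logDebugMessage('Cannot decode ContextID %s: %s', contextIDBytes, err)
    return null
  }
}
```

#### 2. Modify PieceCID Extraction in advertisement-walker.js

Update the existing code to use the new function:

```js
// First, try to get PieceCID from Graphsync metadata (existing approach)
const meta = parseMetadata(advertisement.Metadata['/'].bytes)
let pieceCid = meta.deal?.PieceCID?.toString()

// If not found in metadata, try to extract it from ContextID
if (!pieceCid) {
  const extractedData = extractPieceCidFromContextID(advertisement.ContextID, debug)
  pieceCid = extractedData?.pieceCid?.toString()

  // If still not found, return an error
  if (!pieceCid) {
    debug('advertisement %s has no PieceCID in metadata or ContextID', advertisementCid)
    return {
      error: /** @type {const} */('MISSING_PIECE_CID'),
      previousAdvertisementCid
    }
  }
}
```

#### 3. Testing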
#### 4. Performance Considerations
Great plan, @NikolasHaimerl 👏

Do you think it's useful to get visibility into Context IDs that we can't parse? Or rather: Is it expected that there are Context IDs that we can't parse, or should we be able to parse all of them? If it's the former, there's no need to do anything; if it's the latter, let's add logging/telemetry.

Regarding the performance optimization, it's a smart and simple idea 👍 But, as always: Have you measured how much time we add without the optimization? And do we parse Context IDs often enough for the cycle savings to be relevant?

For now, the implementation suggests looking at the metadata first, then at the Context ID. Can it happen that both the metadata and the Context ID contain a Piece CID, but they are different?
I understand that part, but what is the impact of the optimization? I.e., how many ms / cycles are we saving? And how often does this saving occur?
Great description of the plan, @NikolasHaimerl! 👏🏻
Yes, it is expected that there will be many Context IDs that we can't parse.
Let's make a decision and move on. I am fine either way, with a slight preference for implementing this optimisation.
Yes, a miner using the new ContextID format and advertising Graphsync retrievals will set PieceCID in both places (ContextID, Graphsync metadata). I would expect these values to be always the same, but that depends on the implementation in Miner SW. Let's search for PieceCID in ContextID first and treat Graphsync metadata as a fall-back option in case we cannot find PieceCID in ContextID.
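A minimal sketch of this ContextID-first order, assuming both extraction paths have already produced a PieceCID string (or `undefined`). `pickPieceCid` is a hypothetical helper; logging a mismatch is one way to surface miner software that sets different values in the two places.

```js
/**
 * Hypothetical helper: pick the PieceCID for an advertisement, preferring
 * the value extracted from ContextID and falling back to Graphsync metadata.
 * Both inputs are PieceCID strings or undefined.
 */
export function pickPieceCid (fromContextID, fromMetadata, log = console.debug) {
  if (fromContextID && fromMetadata && fromContextID !== fromMetadata) {
    // Not expected, but it depends on the miner software: keep ContextID, log the mismatch
    log('PieceCID mismatch: ContextID=%s, metadata=%s', fromContextID, fromMetadata)
  }
  return fromContextID || fromMetadata || null
}
```

Returning `null` when neither source has a value leaves the caller free to report `MISSING_PIECE_CID` as in the plan.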
I would go with more detailed error logging rather than the optimization. Should the performance of the piece-indexer become a concern, we can always add the optimization later. Since the Curio implementation is new to the entire stack, there is a chance that our assumptions about how Curio works/interacts with Spark are incorrect. Detailed error logging will be quite important here IMHO.
To allow Spark to link deals to content advertised to IPNI, SP software like Curio creates ContextID from Piece info. This provides an alternative way of extracting PieceCID from IPNI advertisements.
We need to enhance piece-indexer to support both options: PieceCID extracted from ContextID, PieceCID extracted from Graphsync metadata.
Spec:
https://github.com/CheckerNetwork/FIPs/blob/frc-retrieval-checking-requirements/FRCs/frc-retrieval-checking-requirements.md#construct-ipni-contextid-from-piececid-piecesize
Optimisation:
ContextID values following the spec above start with the prefix `ghsA`. If the ContextID value does not start with this prefix, we can skip it (there is no need to try base64 and CBOR decoding).

See #117 for an example index provider that uses this new format.
Here is the place where we extract PieceCID from Graphsync metadata; we can add the new ContextID-based PieceCID extraction there:
piece-indexer/indexer/lib/advertisement-walker.js
Lines 273 to 281 in 684bdc0