feat: put to r2 and check linkdex-api for dag structure (#2204)
Porting web3-storage/web3.storage#1785

### Adds
- put CARs to R2 in parallel with S3. Both must succeed for a successful
upload, so we add p-retry around both. (A sketch of the flow follows this
list.)
- linkdex-api to fetch a DAG completeness report over the CARs in S3,
replacing what ipfs-cluster was doing for us.
- `updatePinStatus` to the db client, so we can update the status
immediately if linkdex-api tells us the DAG is now complete...
previously this only happened via a direct db update in a cron job.
- ElasticIpfs as an entry in the ClusterService enum, and changes so
that we add a "pin" record for the one we have on e-ipfs... the e-ipfs
pin will now be the only one for an upload, and its status will
determine whether we tell the user it is "Pinned" or "Pinning"... and we
can now find the DAG status immediately after upload, so the time to
mark a thing as Pinned will be massively reduced.
- mocking fetch requests in miniflare, to simulate linkdex-api
responses (sketched at the end of this section). See:
https://miniflare.dev/core/standards#mocking-outbound-fetch-requests
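
Roughly, the new upload path looks like this. A minimal sketch only, with hypothetical names (`s3Uploader`, `r2Uploader`, `linkdexApi.getDagStructure`, `db.updatePinStatus`) standing in for the real helpers in `packages/api/src`:

```js
import pRetry from 'p-retry'

/**
 * Sketch of the new flow: write the CAR to S3 and R2 in parallel,
 * retrying each side, then ask linkdex-api whether the DAG is complete.
 */
async function handleUpload({ s3Uploader, r2Uploader, linkdexApi, db }, userId, car, metadata) {
  // Both backups must succeed for the upload to succeed.
  const [s3Backup, r2Backup] = await Promise.all([
    pRetry(() => s3Uploader.uploadCar(userId, car, metadata), { retries: 5 }),
    pRetry(() => r2Uploader.uploadCar(userId, car, metadata), { retries: 5 }),
  ])

  // Check DAG completeness over the CARs in S3, now that ours is written.
  const structure = await linkdexApi.getDagStructure(s3Backup.key)

  // The single e-ipfs pin record drives the "Pinned" / "Pinning" status.
  if (structure === 'Complete') {
    await db.updatePinStatus(metadata.rootCid, 'Pinned')
  }

  return [s3Backup.url, r2Backup.url]
}
```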

Adds env vars:
- `CARPARK` - an R2 Bucket binding
- `CARPARK_URL` - the public URL prefix to use when recording the R2
backup_url
- `LINKDEX_URL` - URL for linkdex-api
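
In tests, the linkdex-api responses are scripted with Miniflare's outbound fetch mocking (see the doc link above). A sketch, assuming the undici `MockAgent` that Miniflare 2 exposes via `getFetchMock()`; the URL here is illustrative:

```js
import { Miniflare } from 'miniflare'

const mf = new Miniflare({ scriptPath: 'dist/worker.js' })

// Undici MockAgent: refuse real network calls, then script a response.
const fetchMock = mf.getFetchMock()
fetchMock.disableNetConnect()

fetchMock
  .get('https://linkdex.example.org')
  .intercept({ method: 'GET', path: /.*/ })
  .reply(200, JSON.stringify({ structure: 'Complete' }))
```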

### Removes
- pinning to ipfs-cluster after upload completes. It's all e-ipfs now.

## Notes

- It looks like we've been using the wrong `dagSize` for things uploaded
via `/store`. Uploading a single CBOR-block CAR to cluster and asking it
for the dag size just gives you the total bytes of the blocks in that
CAR... not the size of the DAG reachable from the root. This PR changes
that to store the full dagSize as the sum of the size of all the
blocks.
see:
https://github.com/ipfs-cluster/ipfs-cluster/blob/f376cf5106deeeb903b58e7e2431fa63bdea6900/adder/adder.go#L303
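
For reference, "the sum of the size of all the blocks" is just an iteration over the CAR. A sketch using `@ipld/car` (illustrative; not necessarily the dependency this codebase uses):

```js
import { CarBlockIterator } from '@ipld/car'

/** dagSize as we now store it: total byte length of every block in the CAR. */
async function dagSize(carBytes) {
  let size = 0
  const blocks = await CarBlockIterator.fromBytes(carBytes)
  for await (const { bytes } of blocks) {
    size += bytes.byteLength
  }
  return size
}
```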


License: MIT
Signed-off-by: Oli Evans <[email protected]>
olizilla authored Oct 17, 2022
1 parent 95d82cd commit 197e11b
Showing 28 changed files with 1,198 additions and 189 deletions.
3 changes: 3 additions & 0 deletions .env.tpl
@@ -46,3 +46,6 @@ S3_REGION = us-east-1
S3_ACCESS_KEY_ID = minioadmin
S3_SECRET_ACCESS_KEY = minioadmin
S3_BUCKET_NAME = dotstorage-dev-0

# R2
CARPARK_URL = https://carpark-dev.web3.storage
2 changes: 1 addition & 1 deletion DEVELOPMENT.md
@@ -6,7 +6,7 @@ This doc should contain everything you need to know to get a working development

You'll need at least the following:

- Node.js v16+
- Node.js v18+
- [yarn](https://yarnpkg.com/)
- Docker
- A personal account at [Magic Link](https://magic.link).
17 changes: 17 additions & 0 deletions packages/api/README.md
@@ -77,6 +77,7 @@ wrangler secret put S3_SECRET_ACCESS_KEY --env production # Get from Amazon S3 (
wrangler secret put S3_BUCKET_NAME --env production # e.g nft.storage-staging-us-east-2 (not required for dev)
wrangler secret put PRIVATE_KEY --env production # Get from 1password
wrangler secret put MAINTENANCE_MODE --env production # default value is "rw"
wrangler secret put LINKDEX_URL --env production # URL for linkdex-api that can read the prod s3 bucket

wrangler publish --env production
```
@@ -111,3 +112,19 @@ Common errors would be "cannot read version of schema", this typically indicates
## S3 Setup

We use [S3](https://aws.amazon.com/s3/) for backup and disaster recovery. For production deployment an account on AWS is required.

## Linkdex

Our linkdex service determines whether a user has uploaded a "Complete" DAG when it was split over multiple partial CARs. During CAR upload we query it with the S3 key _after_ writing the CAR to the bucket.

`env.LINKDEX_URL` points to the service to use. It should be a linkdex-api deployment that has read access to the same S3 bucket as is used for uploads.

It iterates all the blocks in all the CARs for that user's upload only, and where every link is a CID for a block contained in those CARs, we say the DAG is "Complete". If not, it's "Partial". If we haven't checked, or any of the blocks are undecodable with the set of codecs we currently have, then it's "Unknown".

see: https://github.com/web3-storage/linkdex-api
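
A sketch of how the worker might call it. The real client lives in `src/utils/linkdex.js`; the `?key=` query contract below is an assumption, so check the linkdex-api repo for the actual routes:

```js
export class LinkdexApi {
  /** @param {string} url base URL of a linkdex-api deployment */
  constructor(url) {
    this.url = url
  }

  /**
   * Fetch the DAG structure report for the CAR at this S3 key.
   * @param {string} key S3 key of the CAR we just wrote
   * @returns {Promise<'Unknown' | 'Partial' | 'Complete'>}
   */
  async getDagStructure(key) {
    const res = await fetch(`${this.url}/?key=${encodeURIComponent(key)}`)
    if (!res.ok) {
      throw new Error(`linkdex-api not ok: ${res.status} ${res.statusText}`)
    }
    const { structure } = await res.json()
    return structure
  }
}
```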

## CARPARK

We write uploaded CARs to both S3 and R2 in parallel. The R2 Bucket is bound to the worker as `env.CARPARK`. The API docs for an R2Bucket instance are here: https://developers.cloudflare.com/r2/runtime-apis/#bucket-method-definitions

We key our R2 uploads by CAR CID, and record them in the DB under `upload.backup_urls`. The URL prefix for CARs in R2 is set by `env.CARPARK_URL`. It currently points to a subdomain on web3.storage that we could configure when we need direct HTTP access to the bucket, but it does not exist at the time of writing.
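
Writing to the bound bucket is a single `put`. A sketch of the shape of it; the key layout is illustrative, and `BackupMetadata` is deliberately a plain string record so it can be passed as R2 `customMetadata`:

```js
/**
 * Write a CAR to the CARPARK R2 bucket and derive the public URL
 * that we record in `upload.backup_urls`.
 */
export async function putCarToCarpark(env, carCid, carBytes, metadata) {
  const key = `${carCid}/${carCid}.car`
  await env.CARPARK.put(key, carBytes, { customMetadata: metadata })
  return new URL(key, env.CARPARK_URL)
}
```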
1 change: 1 addition & 0 deletions packages/api/db/migrations/007-add-service.sql
@@ -0,0 +1 @@
ALTER TYPE service_type ADD VALUE 'ElasticIpfs';
4 changes: 3 additions & 1 deletion packages/api/db/tables.sql
@@ -46,7 +46,9 @@ CREATE TYPE service_type AS ENUM (
-- An IPFS Cluster originally commissioned for niftysave.
'IpfsCluster2',
-- New cluster with flatfs and better DHT.
'IpfsCluster3'
'IpfsCluster3',
-- The big one.
'ElasticIpfs'
);

-- Upload type is the type of received upload data.
9 changes: 6 additions & 3 deletions packages/api/package.json
@@ -11,6 +11,7 @@
"dev": "miniflare dist/worker.js --watch --debug --env ../../.env",
"dev:persist": "PERSIST_VOLUMES=true npm run dev",
"build": "scripts/cli.js build",
"pretest": "tsc",
"test": "./docker/run-with-dependencies.sh ./scripts/run-test.sh",
"db-types": "./scripts/cli.js db-types"
},
@@ -27,17 +28,20 @@
"@noble/ed25519": "^1.6.1",
"@supabase/postgrest-js": "^0.34.1",
"ipfs-car": "^0.6.1",
"it-last": "^2.0.0",
"merge-options": "^3.0.4",
"multiformats": "^9.6.4",
"nanoid": "^3.1.30",
"one-webcrypto": "^1.0.3",
"p-retry": "^5.1.1",
"regexparam": "^2.0.0",
"toucan-js": "^2.4.1",
"ucan-storage": "^1.3.0",
"uint8arrays": "^3.0.0"
},
"devDependencies": {
"@cloudflare/workers-types": "^3.3.1",
"@cloudflare/workers-types": "^3.17.0",
"@miniflare/core": "^2.10.0",
"@sentry/cli": "^1.71.0",
"@sentry/webpack-plugin": "^1.16.0",
"@types/assert": "^1.5.6",
@@ -56,11 +60,10 @@
"execa": "^5.1.1",
"git-rev-sync": "^3.0.1",
"ipfs-unixfs-importer": "^9.0.3",
"miniflare": "^2.7.1",
"miniflare": "^2.10.0",
"minio": "^7.0.28",
"npm-run-all": "^4.1.5",
"openapi-typescript": "^4.0.2",
"p-retry": "^4.6.1",
"pg": "^8.7.1",
"playwright-test": "^7.2.1",
"process": "^0.11.10",
7 changes: 1 addition & 6 deletions packages/api/scripts/run-test.sh
@@ -9,14 +9,9 @@ $CLI db-sql --cargo --testing --reset
echo "creating minio bucket..."
$CLI minio bucket create dotstorage-dev-0

cd $THIS_DIR/..
echo "typechecking..."
npx tsc

echo "building worker..."
$CLI build --env=test

# run test suite, passing along any arguments we received
echo "running tests"
npx ava $@

npx ava "$@"
36 changes: 31 additions & 5 deletions packages/api/src/bindings.d.ts
@@ -5,10 +5,13 @@ import { Service } from 'ucan-storage/service'
import { Mode } from './middleware/maintenance.js'
import { UserOutput, UserOutputKey } from './utils/db-client-types.js'
import { DBClient } from './utils/db-client.js'
import { LinkdexApi } from './utils/linkdex.js'
import { Logging } from './utils/logs.js'

export type RuntimeEnvironmentName = 'test' | 'dev' | 'staging' | 'production'

export type RawEnvConfiguration = Record<string, any>

export interface ServiceConfiguration {
/** Is this a debug build? */
DEBUG: boolean
@@ -31,6 +34,15 @@ export interface ServiceConfiguration {
/** Salt for API key generation */
SALT: string

/** R2Bucket binding */
CARPARK: R2Bucket

/** Public URL prefix for CARPARK R2 Bucket */
CARPARK_URL: string

/** URL for linkdex-api */
LINKDEX_URL?: string

/** API key for special metaplex upload account */
METAPLEX_AUTH_TOKEN: string

@@ -105,7 +117,9 @@ export interface RouteContext {
params: Record<string, string>
db: DBClient
log: Logging
uploader: Uploader
linkdexApi?: LinkdexApi
s3Uploader: Uploader
r2Uploader: Uploader
ucanService: Service
auth?: Auth
}
@@ -243,6 +257,18 @@ export type RequestForm = Array<RequestFormItem>
*/
export type DagStructure = 'Unknown' | 'Partial' | 'Complete'

export type Backup = {
key: string
url: URL
}

// needs to be a type so it can be assigned to Record<string, string>
export type BackupMetadata = {
structure: DagStructure
rootCid: string
carCid: string
}

/**
* A client to a service that accepts CAR file uploads.
*/
@@ -251,9 +277,9 @@ export interface Uploader {
* Uploads the CAR file to the service and returns the URL.
*/
uploadCar(
carBytes: Uint8Array,
carCid: CID,
userId: number,
sourceCid: string,
car: Blob,
structure?: DagStructure
): Promise<URL>
metadata: BackupMetadata
): Promise<Backup>
}
32 changes: 25 additions & 7 deletions packages/api/src/config.js
@@ -4,6 +4,7 @@ import {
} from './middleware/maintenance.js'

/**
* @typedef {import('./bindings').RawEnvConfiguration} RawEnvConfiguration
* @typedef {import('./bindings').ServiceConfiguration} ServiceConfiguration
* @typedef {import('./bindings').RuntimeEnvironmentName} RuntimeEnvironmentName
*/
@@ -29,29 +30,37 @@ export const getServiceConfig = () => {

/**
* Parse a {@link ServiceConfiguration} out of the given `configVars` map.
* @param {Record<string, string>} vars map of variable names to values.
* @param {RawEnvConfiguration} vars map of variable names to values.
*
* Exported for testing. See {@link getServiceConfig} for main public accessor.
*
* @returns {ServiceConfiguration}
*/
export function serviceConfigFromVariables(vars) {
let clusterUrl
if (!vars.CLUSTER_SERVICE) {
clusterUrl = vars.CLUSTER_API_URL
} else {
if (vars.CLUSTER_SERVICE) {
clusterUrl = CLUSTER_SERVICE_URLS[vars.CLUSTER_SERVICE]
if (!clusterUrl) {
throw new Error(`unknown cluster service: ${vars.CLUSTER_SERVICE}`)
}
}
if (vars.CLUSTER_API_URL) {
clusterUrl = vars.CLUSTER_API_URL
}
if (!clusterUrl || (vars.CLUSTER_SERVICE && vars.CLUSTER_API_URL)) {
throw new Error(
`One of CLUSTER_SERVICE or CLUSTER_API_URL must be set in ENV`
)
}

return {
ENV: parseRuntimeEnv(vars.ENV),
DEBUG: boolValue(vars.DEBUG),
MAINTENANCE_MODE: maintenanceModeFromString(vars.MAINTENANCE_MODE),

SALT: vars.SALT,
CARPARK: vars.CARPARK,
CARPARK_URL: vars.CARPARK_URL,
DATABASE_URL: vars.DATABASE_URL,
DATABASE_TOKEN: vars.DATABASE_TOKEN,
CLUSTER_API_URL: clusterUrl,
@@ -60,6 +69,7 @@ export function serviceConfigFromVariables(vars) {
SENTRY_DSN: vars.SENTRY_DSN,
METAPLEX_AUTH_TOKEN: vars.METAPLEX_AUTH_TOKEN,
MAILCHIMP_API_KEY: vars.MAILCHIMP_API_KEY,
LINKDEX_URL: vars.LINKDEX_URL,
LOGTAIL_TOKEN: vars.LOGTAIL_TOKEN,
S3_ENDPOINT: vars.S3_ENDPOINT,
S3_REGION: vars.S3_REGION,
@@ -84,11 +94,11 @@
*
* Exported for testing. See {@link getServiceConfig} for main config accessor.
*
* @returns { Record<string, string>} an object with `vars` containing all config variables and their values. guaranteed to have a value for each key defined in DEFAULT_CONFIG_VALUES
* @returns { RawEnvConfiguration } an object with `vars` containing all config variables and their values. guaranteed to have a value for each key defined in DEFAULT_CONFIG_VALUES
* @throws if a config variable is missing, unless ENV is 'test' or 'dev', in which case the default value will be used for missing vars.
*/
export function loadConfigVariables() {
/** @type Record<string, string> */
/** @type RawEnvConfiguration */
const vars = {}

/** @type Record<string, unknown> */
@@ -98,6 +108,8 @@
'ENV',
'DEBUG',
'SALT',
'CARPARK',
'CARPARK_URL',
'DATABASE_URL',
'DATABASE_TOKEN',
'MAGIC_SECRET_KEY',
@@ -118,6 +130,9 @@
const val = globals[name]
if (typeof val === 'string') {
vars[name] = val
} else if (val !== null && typeof val === 'object') {
// some globals are objects like an R2Bucket, bound for us by Cloudflare
vars[name] = val
} else {
throw new Error(
`Missing required config variables: ${name}. Check your .env, testing globals or cloudflare vars.`
@@ -128,6 +143,7 @@
const optional = [
'CLUSTER_SERVICE',
'CLUSTER_API_URL',
'LINKDEX_URL',
'S3_ENDPOINT',
'SLACK_USER_REQUEST_WEBHOOK_URL',
]
@@ -137,7 +153,9 @@
if (typeof val === 'string') {
vars[name] = val
} else {
console.warn(`Missing optional config variables: ${name}`)
if (globals.DEBUG === 'true') {
console.warn(`Missing optional config variables: ${name}`)
}
}
}

14 changes: 14 additions & 0 deletions packages/api/src/errors.js
@@ -200,6 +200,20 @@ export class ErrorInvalidMetaplexToken extends Error {

ErrorInvalidMetaplexToken.CODE = 'ERROR_INVALID_METAPLEX_TOKEN'

export class LinkdexError extends Error {
/**
* @param {number} status
* @param {string} statusText
*/
constructor(status, statusText) {
super(`linkdex-api not ok: ${status} ${statusText}`)
this.name = 'LinkdexError'
this.status = status
this.code = LinkdexError.CODE
}
}
LinkdexError.CODE = 'LINKDEX_NOT_OK'

export class ErrorPinningUnauthorized extends HTTPError {
constructor(
msg = 'Pinning not authorized for this user, visit https://nft.storage/docs/how-to/pinning-service/ for instructions on how to request authorization.'
1 change: 0 additions & 1 deletion packages/api/src/routes/metaplex-upload.js
@@ -50,7 +50,6 @@ export async function metaplexUpload(event, ctx) {
ctx,
user,
key,
car: blob,
mimeType: 'application/car',
files: [],
structure: 'Unknown',