Add catalog-server endpoint to update packages #1392

Draft
wants to merge 2 commits into
base: main
14 changes: 12 additions & 2 deletions DESIGN.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,19 @@ The Stack is updated to be more familiar to modern JavaScript developers, includ
* GraphQL SDL
* graphql-codegen to generate TypeScript interfaces

## Catalog
## Catalog Server

The catalog is a standalone backend service with a GraphQL API that is accessed by the site's frontend server.
The catalog server is a standalone backend service with a GraphQL API that is accessed by the site's frontend server.

## Access Control

The server is intended to eventually have a public-facing GraphQL service. It also has admin functionality such as HTTP endpoints for updating the packages in the catalog that need to be reachable by scheduling services (such as [Google Cloud Scheduler](https://cloud.google.com/scheduler) or [Google Cloud Tasks](https://cloud.google.com/tasks)). These admin endpoints are attractive for denial-of-service attacks because they initiate expensive bulk I/O and database operations.

The admin endpoints must have restricted access control and not be public. For now, the easiest way to do this without a third service (a separate admin service) is to restrict access to the whole catalog server service which also implements the admin endpoints. Later, the catalog server can be refactored to contain only a public API and the admin endpoints moved to a separate service.

Access control to the catalog service can be implemented in Google Cloud Run by setting up a service account to make requests and limiting the service to non-public visibility, [as documented here](https://cloud.google.com/run/docs/authenticating/service-to-service).

As the article says, this should be environment agnostic, so it works outside of Google Cloud, though each environment will have its own way of setting the access control of the catalog service.
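
A possible complement to platform-level access control is an in-process check on the admin routes. The sketch below is only an illustration under stated assumptions and is not part of this change: `makeRequireServiceAccount`, `MinimalContext`, and the `verifyToken` callback are hypothetical names, and the actual token verification (for example validating a signed ID token) is left entirely to the injected callback.

```typescript
// Minimal request-context shape needed by this sketch (hypothetical type).
// A framework context with `headers`, `status`, and `body` would fit it.
interface MinimalContext {
  headers: Record<string, string | string[] | undefined>;
  status: number;
  body: unknown;
}

// Hypothetical verifier: resolves to the caller's service account email when
// the token is valid, or undefined when it is not.
type VerifyToken = (token: string) => Promise<string | undefined>;

// Builds a middleware that only calls `next` for allowed service accounts.
const makeRequireServiceAccount =
  (verifyToken: VerifyToken, allowedEmails: ReadonlySet<string>) =>
  async (ctx: MinimalContext, next: () => Promise<void>): Promise<void> => {
    const header = ctx.headers['authorization'];
    const token =
      typeof header === 'string'
        ? /^Bearer (.+)$/.exec(header)?.[1]
        : undefined;
    if (token === undefined) {
      // No usable credentials were sent.
      ctx.status = 401;
      ctx.body = 'Missing bearer token';
      return;
    }
    const email = await verifyToken(token);
    if (email === undefined || !allowedEmails.has(email)) {
      // Credentials were sent but do not identify an allowed caller.
      ctx.status = 403;
      ctx.body = 'Caller is not allowed';
      return;
    }
    await next();
  };
```

Injecting the verifier keeps the check environment agnostic: each deployment can supply its own way of validating tokens while the route wiring stays the same.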

### Custom Elements Manifest

Expand Down
20 changes: 19 additions & 1 deletion packages/catalog-server/src/lib/catalog.ts
Expand Up @@ -42,6 +42,11 @@ const toTemporalInstant = (date: Date) => {
*/
const defaultPackageRefreshInterval = Temporal.Duration.from({minutes: 5});

/**
* The default amount of time between automated bulk updates of packages.
*/
const defaultPackageUpdateInterval = Temporal.Duration.from({hours: 6});

export interface CatalogInit {
repository: Repository;
files: PackageFiles;
Expand Down Expand Up @@ -77,7 +82,7 @@ export class Catalog {
packageVersion?: PackageVersion;
problems?: ValidationProblem[];
}> {
console.log('Catalog.importPackage');
console.log('Catalog.importPackage', packageName);

const currentPackageInfo = await this.#repository.getPackageInfo(
packageName
Expand Down Expand Up @@ -347,4 +352,17 @@ export class Catalog {
// to the repository
return this.#repository.queryElements({query, limit});
}

async getPackagesToUpdate(notUpdatedSince?: Temporal.Instant) {
if (notUpdatedSince === undefined) {
const now = Temporal.Now.instant();
notUpdatedSince = now.subtract(defaultPackageUpdateInterval);
}

const packages = await this.#repository.getPackagesToUpdate(
notUpdatedSince,
100
);
return packages;
}
}
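
The default-cutoff logic in `getPackagesToUpdate()` can be illustrated without Temporal. This is a minimal sketch that assumes plain epoch milliseconds instead of `Temporal.Instant`; `resolveCutoffMs` and `SIX_HOURS_MS` are hypothetical names, not part of the change.

```typescript
// Mirrors defaultPackageUpdateInterval (6 hours), expressed in milliseconds.
const SIX_HOURS_MS = 6 * 60 * 60 * 1000;

// Returns the explicit cutoff when one is given, otherwise "now minus the
// update interval". Packages last updated before the cutoff are stale.
function resolveCutoffMs(
  nowMs: number,
  notUpdatedSinceMs?: number,
  intervalMs: number = SIX_HOURS_MS
): number {
  return notUpdatedSinceMs ?? (nowMs - intervalMs);
}

// With no explicit cutoff, a call made at noon treats 06:00 as the cutoff.
const noon = Date.UTC(2022, 0, 1, 12);
console.log(new Date(resolveCutoffMs(noon)).toISOString());
```

Passing the current time as the explicit cutoff, as the `force` path in the route does, makes every package updated before now eligible.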
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ import {
CollectionReference,
CollectionGroup,
UpdateData,
Timestamp,
} from '@google-cloud/firestore';
import {Firestore} from '@google-cloud/firestore';
import firebase from 'firebase-admin';
Expand Down Expand Up @@ -55,6 +56,7 @@ import {
} from './package-version-converter.js';
import {customElementConverter} from './custom-element-converter.js';
import {validationProblemConverter} from './validation-problem-converter.js';
import type {Temporal} from '@js-temporal/polyfill';

const projectId = process.env['GCP_PROJECT_ID'] || 'wc-catalog';
firebase.initializeApp({projectId});
Expand Down Expand Up @@ -577,13 +579,37 @@ export class FirestoreRepository implements Repository {
return result;
}

getPackageRef(packageName: string) {
async getPackagesToUpdate(
notUpdatedSince: Temporal.Instant,
limit = 100
): Promise<Array<PackageInfo>> {
const date = new Date(notUpdatedSince.epochMilliseconds);
const notUpdatedSinceTimestamp = Timestamp.fromDate(date);

// Only query 'READY', 'ERROR', and 'NOT_FOUND' packages.
// INITIALIZING and UPDATING packages are being updated, possibly by the
// batch update task calling this method.
// ERROR and NOT_FOUND are "recoverable" errors, so we should try to import
// them again.
const result = await this.getPackageCollectionRef()
.where('status', 'in', ['READY', 'ERROR', 'NOT_FOUND'])
.where('lastUpdate', '<', notUpdatedSinceTimestamp)
.limit(limit)
.get();
const packages = result.docs.map((d) => d.data());
return packages;
}

getPackageCollectionRef() {
return db
.collection('packages' + (this.namespace ? `-${this.namespace}` : ''))
.doc(packageNameToId(packageName))
.withConverter(packageInfoConverter);
}

getPackageRef(packageName: string) {
return this.getPackageCollectionRef().doc(packageNameToId(packageName));
}

getPackageVersionCollectionRef(packageName: string) {
return this.getPackageRef(packageName)
.collection('versions')
Expand Down
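
To make the selection rule in `getPackagesToUpdate()` concrete, here is an in-memory sketch of the same filter. It is an illustration only: `PackageRecord`, `lastUpdateMs`, and `selectPackagesToUpdate` are hypothetical stand-ins for the Firestore documents and query, and the status names are taken from the code comment in the diff. Result order is simply input order here, which may differ from what the database returns.

```typescript
type PackageStatus =
  | 'INITIALIZING'
  | 'READY'
  | 'UPDATING'
  | 'ERROR'
  | 'NOT_FOUND';

// Hypothetical simplified record; the real PackageInfo has more fields.
interface PackageRecord {
  name: string;
  status: PackageStatus;
  lastUpdateMs: number;
}

// Statuses that are safe to pick up: not currently being imported or updated.
const updatableStatuses: ReadonlySet<PackageStatus> = new Set<PackageStatus>([
  'READY',
  'ERROR',
  'NOT_FOUND',
]);

// Keeps packages in an updatable status whose last update is strictly before
// the cutoff, and returns at most `limit` of them.
function selectPackagesToUpdate(
  packages: readonly PackageRecord[],
  notUpdatedSinceMs: number,
  limit: number = 100
): PackageRecord[] {
  return packages
    .filter(
      (p) =>
        updatableStatuses.has(p.status) && p.lastUpdateMs < notUpdatedSinceMs
    )
    .slice(0, limit);
}
```

Because the comparison is strict, a package whose last update equals the cutoff is not selected, which matches the `'<'` operator in the query above.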
9 changes: 9 additions & 0 deletions packages/catalog-server/src/lib/repository.ts
Expand Up @@ -4,6 +4,7 @@
* SPDX-License-Identifier: Apache-2.0
*/

import type {Temporal} from '@js-temporal/polyfill';
import type {
CustomElement,
PackageInfo,
Expand Down Expand Up @@ -150,4 +151,12 @@ export interface Repository {
packageName: string,
version: string
): Promise<PackageVersion | undefined>;

/**
* Returns packages that have not been updated since the date given.
*/
getPackagesToUpdate(
notUpdatedSince: Temporal.Instant,
limit: number
): Promise<Array<PackageInfo>>;
}
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,10 @@ export const makeBootstrapPackagesRoute =
const bootstrapListFile = await readFile(bootstrapListFilePath, 'utf-8');
const bootstrapList = JSON.parse(bootstrapListFile);
const packageNames = bootstrapList['packages'] as Array<string>;

// TODO (justinfagnani): rather than import the packages directly, add them
// to the DB in a non-imported state, then kick off the standard update
// workflow, which will import them all.
const results = await Promise.all(
packageNames.map(
async (
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
import {Temporal} from '@js-temporal/polyfill';
import {PackageInfo} from '@webcomponents/catalog-api/lib/schema.js';
import type Koa from 'koa';
import type {Catalog} from '../../catalog.js';

// Google Cloud Run default request timeout is 5 minutes, so to do longer
// imports we need to configure the timeout.
const maxImportDuration = Temporal.Duration.from({minutes: 5});

export const makeUpdatePackagesRoute =
(catalog: Catalog) => async (context: Koa.Context) => {
// TODO (justinfagnani): DO_NOT_LAUNCH check that the request is from one
// of our service accounts

const startInstant = Temporal.Now.instant();
// If the `force` query parameter is present we force updating of all
// packages by setting the `notUpdatedSince` parameter to `startInstant` so
// that we get all packages last updated before now. We calculate the
// `notUpdatedSince` time once before updates so that we don't retrieve
// packages that we update in this operation.
// `force` is useful for development and testing as we may be trying to
// update packages that were just imported.
// TODO (justinfagnani): check a DEV mode also so this isn't available
// in production?
const force = 'force' in context.query;
const notUpdatedSince = force ? startInstant : undefined;

// If `force` is true, override the default packageUpdateInterval
// TODO: how do we make an actually 0 duration?
const packageUpdateInterval = force
? Temporal.Duration.from({microseconds: 1})
: undefined;

console.log('Starting package update at', startInstant, `force: ${force}`);

let packagesToUpdate!: Array<PackageInfo>;
let packagesUpdated = 0;
let iteration = 0;

// Loop through batches of packages to update.
// We batch here so that we can pause and check that we're still within the
// maxImportDuration, and use small enough batches so that we can ensure at
// least one batch in that time.
do {
// getPackagesToUpdate() queries the first N (default 100) packages that
// have not been updated since the update interval (default 6 hours).
// When a package is imported its lastUpdate date will be updated and the
// next call to getPackagesToUpdate() will return the next 100 packages.
// This way we don't need a DB cursor to make progress through the
// package list.
packagesToUpdate = await catalog.getPackagesToUpdate(notUpdatedSince);

if (packagesToUpdate.length === 0) {
// No more packages to update
if (iteration === 0) {
console.log('No packages to update');
}
break;
}

await Promise.allSettled(
packagesToUpdate.map(async (pkg) => {
try {
return await catalog.importPackage(pkg.name, packageUpdateInterval);
} catch (e) {
console.error(e);
throw e;
}
})
);
packagesUpdated += packagesToUpdate.length;

const now = Temporal.Now.instant();
const timeSinceStart = now.since(startInstant);
// If the time since the update started is not less than the max import
// duration, stop.
// TODO (justinfagnani): we need a way to test this
if (Temporal.Duration.compare(timeSinceStart, maxImportDuration) !== -1) {
break;
}
} while (true);
console.log(`Updated ${packagesUpdated} packages`);

if (packagesToUpdate.length > 0) {
// TODO (justinfagnani): kick off new update request
console.log(`Not all packages were updated (${packagesToUpdate.length})`);
}

context.status = 200;
context.type = 'html';
context.body = `
<h1>Update Results</h1>
<p>Updated ${packagesUpdated} packages</p>
`;
};
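
The batching loop in this route can be isolated into a small helper that is easier to test, which speaks to the "we need a way to test this" TODO. The following is a sketch under assumptions, not the route's implementation: `runBatchesWithinBudget`, `BatchRunResult`, and the injectable `nowMs` clock are hypothetical, and time is tracked in plain milliseconds rather than with Temporal.

```typescript
interface BatchRunResult {
  processed: number;
  timedOut: boolean;
}

// Repeatedly fetches a batch and processes it until no items remain or the
// time budget is used up. The clock is injectable so tests can control time.
async function runBatchesWithinBudget<T>(
  fetchBatch: () => Promise<T[]>,
  processItem: (item: T) => Promise<unknown>,
  budgetMs: number,
  nowMs: () => number = Date.now
): Promise<BatchRunResult> {
  const start = nowMs();
  let processed = 0;
  while (true) {
    const batch = await fetchBatch();
    if (batch.length === 0) {
      return {processed, timedOut: false};
    }
    // allSettled waits for every item, so one failure does not abort a batch.
    await Promise.allSettled(batch.map((item) => processItem(item)));
    processed += batch.length;
    // The budget is only checked between batches, never mid-batch.
    if (nowMs() - start >= budgetMs) {
      return {processed, timedOut: true};
    }
  }
}
```

As in the route, progress relies on processing an item removing it from the next fetch. If a failed import left a package's `lastUpdate` and status unchanged, the same batch would presumably be fetched again until the budget ran out.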
5 changes: 4 additions & 1 deletion packages/catalog-server/src/lib/server/server.ts
@@ -1,6 +1,6 @@
/**
* @license
* Copyright 2021 Google LLC
* Copyright 2022 Google LLC
* SPDX-License-Identifier: BSD-3-Clause
*/

Expand All @@ -17,6 +17,7 @@ import {NpmAndUnpkgFiles} from '@webcomponents/custom-elements-manifest-tools/li

import {makeGraphQLRoute} from './routes/graphql.js';
import {makeBootstrapPackagesRoute} from './routes/bootstrap-packages.js';
import {makeUpdatePackagesRoute} from './routes/update-packages.js';

export const makeServer = async () => {
const files = new NpmAndUnpkgFiles();
Expand All @@ -32,6 +33,8 @@ export const makeServer = async () => {

router.get('/bootstrap-packages', makeBootstrapPackagesRoute(catalog));

router.get('/update-packages', makeUpdatePackagesRoute(catalog));

router.get('/', async (ctx) => {
ctx.status = 200;
ctx.type = 'html';
Expand Down