Skip to content

oritwoen/omnichron

Repository files navigation

omnichron

npm version npm downloads license Ask DeepWiki

Unified TypeScript interface for querying web archive providers. One API, multiple sources, consistent output.

Features

  • πŸ” Multiple providers - Wayback Machine, Archive.today, Common Crawl, Perma.cc, WebCite
  • 🌳 Tree-shakable - providers are lazy-loaded via dynamic imports, bundle only what you use
  • πŸ“¦ Caching built in - pluggable storage layer via unstorage with configurable TTL
  • ⚑ Parallel queries - concurrency control, batching, automatic retries, configurable timeouts
  • πŸ”§ Config files - supports omnichron.config.ts, .omnichron, and package.json via c12
  • 🏷️ Fully typed - TypeScript definitions for all responses, options, and provider-specific metadata

Install

pnpm add omnichron

Usage

import { createArchive, providers } from "omnichron";

const archive = createArchive(providers.wayback());
const response = await archive.snapshots("example.com", { limit: 100 });

if (response.success) {
  for (const page of response.pages) {
    console.log(page.url, page.timestamp, page.snapshot);
  }
}

Query all providers at once with providers.all() (excludes Perma.cc since it needs an API key):

const archive = createArchive(providers.all());
const response = await archive.snapshots("example.com");

To pick specific providers, wrap them in Promise.all:

const archive = createArchive(
  Promise.all([providers.wayback(), providers.archiveToday(), providers.commoncrawl()]),
);

Perma.cc

Perma.cc requires an API key:

const archive = createArchive(providers.permacc({ apiKey: "YOUR_API_KEY" }));

Error handling

snapshots() returns a response object with a success flag. If you prefer throwing on failure, use getPages():

// safe - check success flag yourself
const response = await archive.snapshots("example.com");

// throws on failure, returns pages array directly
const pages = await archive.getPages("example.com");

Providers

Provider Factory Notes
Wayback Machine providers.wayback() web.archive.org CDX API
Archive.today providers.archiveToday() archive.ph via Memento timemap
Common Crawl providers.commoncrawl() Defaults to latest collection
Perma.cc providers.permacc() Requires apiKey
WebCite providers.webcite() Read-only, no longer accepts new archives
All providers.all() All of the above except Perma.cc

You can add providers dynamically after creation:

const archive = createArchive(providers.wayback());
await archive.use(providers.archiveToday());
await archive.useAll([providers.commoncrawl(), providers.webcite()]);

Response format

Every provider normalizes its output to the same shape:

interface ArchiveResponse {
  success: boolean;
  pages: ArchivedPage[];
  error?: string;
  _meta?: ResponseMetadata;
  fromCache?: boolean;
}

interface ArchivedPage {
  url: string; // original URL
  timestamp: string; // ISO 8601
  snapshot: string; // direct link to the archived version
  _meta: Record<string, unknown>;
}

The _meta object on each page carries provider-specific fields. Wayback includes status and timestamp in its raw format. Common Crawl adds digest, mime, collection. Perma.cc has guid, title, created_by. Archive.today provides hash and raw_date.

Configuration

omnichron loads configuration through c12, which means you can configure it via config files, environment overrides, or package.json:

// omnichron.config.ts
export default {
  storage: {
    cache: true,
    ttl: 7 * 24 * 60 * 60 * 1000, // 7 days
    prefix: "omnichron",
  },
  performance: {
    concurrency: 3,
    batchSize: 20,
    timeout: 10_000,
    retries: 1,
  },
};

Environment-specific overrides work with $development, $production, and $test keys.

Custom storage driver

The caching layer is backed by unstorage, so any unstorage driver works:

import { configureStorage } from "omnichron";
import fsDriver from "unstorage/drivers/fs";

await configureStorage({
  driver: fsDriver({ base: "./cache" }),
  ttl: 24 * 60 * 60 * 1000, // 1 day
});

Per-request cache control is also supported:

// skip cache for this request
await archive.snapshots("example.com", { cache: false });

API

createArchive(providers, options?)

Creates an archive client. Accepts a single provider, a Promise<ArchiveProvider>, or a Promise<ArchiveProvider[]>.

Returns:

  • snapshots(domain, options?) - returns full ArchiveResponse with success flag
  • getPages(domain, options?) - returns ArchivedPage[], throws on failure
  • use(provider) - add a provider to the instance
  • useAll(providers) - add multiple providers at once

Options

All methods accept ArchiveOptions:

Option Type Default Description
limit number 1000 Maximum results to return
cache boolean true Enable/disable caching
ttl number 604800000 Cache TTL in milliseconds (7 days)
concurrency number 3 Max parallel requests
batchSize number 20 Items per processing batch
timeout number 10000 Request timeout in ms
retries number 1 Retry attempts on failure
apiKey string - API key for providers that need auth

Options can be set at three levels: config file (global defaults), createArchive call (instance defaults), and individual method calls (per-request). Each level overrides the previous one.

Storage utilities

  • configureStorage(options?) - configure the cache driver and settings
  • clearProviderStorage(provider) - clear cached responses for a specific provider
  • storage - direct access to the underlying unstorage instance

Roadmap

Providers: Archive-It, Conifer (formerly Webrecorder)

Features: Page archiving API for creating archives, not just reading them

License

MIT

About

Unified TypeScript interface for multiple web archive platforms.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages