Will6855/Annas-API

Anna's Archive API

A REST API for searching and retrieving book information from Anna's Archive. It scrapes the site with stealth Playwright to get past bot detection, rotates across the known mirror domains, and caches results with separate TTLs for search results and book details.

Features

  • 🤖 Stealth Scraping — Playwright + stealth plugin bypasses Cloudflare & bot detection
  • 🔄 Domain Rotation — Automatically rotates across all known Anna's Archive mirrors, falling back gracefully when one is down
  • Smart Caching — Multi-TTL in-memory cache: search results (5 min), book details (1 hour)
  • 📄 Pagination — Full pagination support on search results
  • 🏊 Browser Pool — Reusable browser instances for high performance
  • 🛡️ Rate Limiting — Built-in per-IP rate limiting
  • 📊 Health Endpoint — Domain health & cache statistics

Setup

# Install dependencies
npm install

# Install Playwright's Chromium browser
npm run install:browsers

# Start in development mode
npm run dev

# Start in production
npm start

API Endpoints

GET /api/search

Search for books by any query (title, author, ISBN, DOI, MD5, etc.)

Query Parameters:

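| Param   | Type   | Default  | Description                                    |
| ------- | ------ | -------- | ---------------------------------------------- |
| q       | string | required | Search query                                   |
| page    | number | 1        | Page number                                    |
| lang    | string |          | Language filter (e.g. en)                      |
| ext     | string |          | File extension filter (e.g. pdf, epub)         |
| sort    | string |          | Sort order (newest, oldest, largest, smallest) |
| content | string |          | Content type filter                            |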
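
As a rough illustration of how the query parameters above combine into a request, here is a minimal client-side sketch. The helper name buildSearchUrl and the base URL http://localhost:3000 are assumptions (the base URL follows from the default PORT); neither is part of the API itself.

```javascript
// Hypothetical helper (not part of the API): builds a /api/search URL
// from the documented query parameters.
// BASE_URL assumes a local instance listening on the default port 3000.
const BASE_URL = "http://localhost:3000";

function buildSearchUrl(q, { page, lang, ext, sort, content } = {}, baseUrl = BASE_URL) {
  // q is the only required parameter.
  if (typeof q !== "string" || q.trim() === "") {
    throw new Error("q is required");
  }
  const url = new URL("/api/search", baseUrl);
  url.searchParams.set("q", q);

  // Only send optional filters that were actually provided.
  const optional = { page, lang, ext, sort, content };
  for (const [key, value] of Object.entries(optional)) {
    if (value !== undefined && value !== null && value !== "") {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

console.log(buildSearchUrl("python programming", { page: 2, ext: "pdf", lang: "en" }));
// prints "http://localhost:3000/api/search?q=python+programming&page=2&lang=en&ext=pdf"
```

The resulting string can be passed directly to fetch or any other HTTP client.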

Response:

{
  "success": true,
  "query": "python programming",
  "page": 1,
  "results": [...],
  "cached": false,
  "domain": "annas-archive.gl",
  "responseTime": 1234
}

GET /api/book/:md5

Get full details for a specific book by its MD5 hash.

Response:

{
  "success": true,
  "md5": "d64efd386ed7227592499460aca2044b",
  "book": {
    "title": "Data Science Essentials in Python",
    "author": "Dmitry Zinoviev",
    "publisher": "Pragmatic Bookshelf",
    "year": "2016",
    "language": "en",
    "filesize": 6432380,
    "extension": "pdf",
    "isbn": ["9781680501841", "1680501844"],
    "description": "...",
    "cover": "https://...",
    "md5": "d64efd386ed7227592499460aca2044b",
    "downloadLinks": {
      "fast": [...],
      "slow": [...],
      "external": [...]
    },
    "metadata": {...}
  },
  "cached": true,
  "responseTime": 45
}
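
A client may want to validate the hash before calling this endpoint and to present filesize in a readable form. The sketch below is illustrative only: the helper names are hypothetical, the base URL assumes a local instance on the default port, and it assumes filesize is reported in bytes, which the sample response suggests but does not state.

```javascript
// An MD5 hash is 128 bits, i.e. exactly 32 hexadecimal characters.
const MD5_PATTERN = /^[0-9a-f]{32}$/i;

// Hypothetical helper: builds the /api/book/:md5 URL after validating the hash.
function bookUrl(md5, baseUrl = "http://localhost:3000") {
  if (typeof md5 !== "string" || !MD5_PATTERN.test(md5)) {
    throw new Error(`not a valid MD5 hash: ${md5}`);
  }
  return new URL(`/api/book/${md5}`, baseUrl).toString();
}

// Hypothetical helper: formats a byte count using 1024-based units.
// Assumption: the API's "filesize" field is in bytes.
function formatFilesize(bytes) {
  const units = ["B", "KB", "MB", "GB"];
  let value = bytes;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i += 1;
  }
  return `${value.toFixed(1)} ${units[i]}`;
}

console.log(bookUrl("d64efd386ed7227592499460aca2044b"));
// prints "http://localhost:3000/api/book/d64efd386ed7227592499460aca2044b"
console.log(formatFilesize(6432380));
// prints "6.1 MB"
```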

GET /health

Returns API health, domain status, cache stats, and browser pool status.


DELETE /api/cache

Clears the entire cache. Useful for forcing a refresh.

Mirrors / Domain Rotation

The API will automatically try these domains in order:

  1. annas-archive.gl
  2. annas-archive.org
  3. annas-archive.se
  4. annas-archive.gs
  5. annas-archive.gd
  6. annas-archive.pk

If a domain is unreachable, it is temporarily blacklisted and the next one is tried.
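
The fallback behaviour described above could be sketched roughly as follows. This is not the project's actual implementation: the class and function names are hypothetical, and the 5-minute blacklist duration is an assumed value, since the README does not say how long a domain stays blacklisted.

```javascript
// Mirror list in the documented priority order.
const MIRRORS = [
  "annas-archive.gl",
  "annas-archive.org",
  "annas-archive.se",
  "annas-archive.gs",
  "annas-archive.gd",
  "annas-archive.pk",
];

// Assumed cool-down; the real duration is not documented.
const BLACKLIST_MS = 5 * 60 * 1000;

class DomainRotator {
  constructor(domains = MIRRORS, blacklistMs = BLACKLIST_MS) {
    this.domains = domains;
    this.blacklistMs = blacklistMs;
    this.blockedUntil = new Map(); // domain -> timestamp (ms) when it may be retried
  }

  // Temporarily blacklist a domain, starting at `now`.
  markDown(domain, now = Date.now()) {
    this.blockedUntil.set(domain, now + this.blacklistMs);
  }

  // Domains that are currently eligible, still in priority order.
  available(now = Date.now()) {
    return this.domains.filter((d) => (this.blockedUntil.get(d) ?? 0) <= now);
  }
}

// Try each eligible domain in order. `attempt` is a caller-supplied async
// function that rejects when the domain cannot be reached.
async function withFallback(rotator, attempt) {
  let lastError = new Error("no mirrors available");
  for (const domain of rotator.available()) {
    try {
      return await attempt(domain);
    } catch (err) {
      rotator.markDown(domain); // skip this domain until the cool-down expires
      lastError = err;
    }
  }
  throw lastError;
}
```

Passing the clock in as a parameter keeps the blacklist logic easy to test without waiting for real time to elapse.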

Environment Variables

See .env for all configuration options. Key settings include:

  • PORT: Server port (default: 3000)
  • CACHE_TYPE: memory (default) or redis
  • REDIS_URL: Redis connection string (e.g. redis://localhost:6379)
  • REDIS_PREFIX: Prefix for keys in Redis (default: annas-api:)
  • CACHE_TTL_SEARCH: TTL for search results in seconds (default: 300)
  • CACHE_TTL_BOOK: TTL for book details in seconds (default: 3600)
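
For illustration, the settings above might be read with their documented defaults along these lines. The function name loadConfig and the shape of the returned object are assumptions, not the project's actual configuration code.

```javascript
// Hypothetical config loader: reads the documented environment variables
// and falls back to the defaults listed above.
function loadConfig(env = process.env) {
  // Parse an integer setting, using the fallback when it is unset or invalid.
  const int = (value, fallback) => {
    const n = Number.parseInt(value ?? "", 10);
    return Number.isNaN(n) ? fallback : n;
  };

  return {
    port: int(env.PORT, 3000),
    // Anything other than "redis" is treated as the default, "memory".
    cacheType: env.CACHE_TYPE === "redis" ? "redis" : "memory",
    redisUrl: env.REDIS_URL, // no documented default, so it stays undefined
    redisPrefix: env.REDIS_PREFIX ?? "annas-api:",
    ttlSearchSeconds: int(env.CACHE_TTL_SEARCH, 300),
    ttlBookSeconds: int(env.CACHE_TTL_BOOK, 3600),
  };
}

console.log(loadConfig({ PORT: "8080", CACHE_TYPE: "redis", REDIS_URL: "redis://localhost:6379" }));
```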

Caching Options

The API supports two caching engines:

1. In-Memory (NodeCache)

Default option. Best for single-instance deployments.

  • Set CACHE_TYPE=memory
  • Automatic cleanup of expired keys
  • Lightning-fast retrieval

2. Redis

Best for multi-instance deployments or persistent caching.

  • Set CACHE_TYPE=redis
  • Configure via REDIS_URL
  • Shared cache across multiple API nodes
  • Survives application restarts
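
To make the per-resource TTL idea concrete, here is a minimal in-memory sketch built on a plain Map. It is a simplified stand-in, not NodeCache or the project's cache layer; the class name and the search:/book: key prefixes are hypothetical.

```javascript
// Minimal TTL cache sketch: each entry carries its own expiry time.
class TtlCache {
  constructor() {
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  // Store a value that expires `ttlSeconds` after `now` (ms since epoch).
  set(key, value, ttlSeconds, now = Date.now()) {
    this.entries.set(key, { value, expiresAt: now + ttlSeconds * 1000 });
  }

  // Return the cached value, or undefined if it is missing or expired.
  get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // lazy cleanup of expired keys
      return undefined;
    }
    return entry.value;
  }

  // Drop everything, as DELETE /api/cache is described as doing.
  clear() {
    this.entries.clear();
  }
}

// Documented default TTLs: 300 s for searches, 3600 s for book details.
const cache = new TtlCache();
cache.set("search:python programming:1", { results: [] }, 300);
cache.set("book:d64efd386ed7227592499460aca2044b", { title: "Data Science Essentials in Python" }, 3600);
```

Unlike this sketch, a Redis-backed cache would delegate expiry to the server, which is what lets multiple API nodes share entries.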

