reader-py

Python SDK for the Reader API — content extraction for LLMs. Wraps POST /v1/read, parses responses into Pydantic models, raises typed exceptions, and auto-polls async jobs to completion.

Version: 0.2.0 · Python: 3.9+

Install

pip install reader-py

Quick start (sync)

import os
from reader_py import ReaderClient

reader = ReaderClient(api_key=os.environ["READER_KEY"])

result = reader.read(url="https://example.com")
if result.kind == "scrape":
    print(result.data.markdown)

Quick start (async)

import asyncio
import os
from reader_py import AsyncReaderClient

async def main():
    async with AsyncReaderClient(api_key=os.environ["READER_KEY"]) as reader:
        result = await reader.read(url="https://example.com")
        if result.kind == "scrape":
            print(result.data.markdown)

asyncio.run(main())

reader.read(...) returns a discriminated union (Pydantic):

  • ScrapeReadResult(kind="scrape", data=ScrapeResult) — single-URL requests, returned immediately
  • JobReadResult(kind="job", data=Job) — batch and crawl requests, auto-polled to completion
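Because both variants carry a `kind` field, calling code can branch on it before touching `data`. The sketch below uses small dataclass stand-ins rather than the SDK's real Pydantic models, so it runs without the package installed. The `markdown` field comes from the quick start above; the `pages` attribute on the job and the `collect_markdown` helper are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Union


# Stand-ins for the SDK models, defined here only so the sketch runs on its own.
@dataclass
class ScrapeResult:
    markdown: str


@dataclass
class Job:
    # Hypothetical field name: the real Job model may expose page results differently.
    pages: List[ScrapeResult] = field(default_factory=list)


@dataclass
class ScrapeReadResult:
    data: ScrapeResult
    kind: Literal["scrape"] = "scrape"


@dataclass
class JobReadResult:
    data: Job
    kind: Literal["job"] = "job"


ReadResult = Union[ScrapeReadResult, JobReadResult]


def collect_markdown(result: ReadResult) -> List[str]:
    """Return the markdown of every page, whichever variant came back."""
    if result.kind == "scrape":
        return [result.data.markdown]
    if result.kind == "job":
        return [page.markdown for page in result.data.pages]
    # Fail loudly if a new variant is added and this function is not updated.
    raise ValueError(f"unexpected result kind: {result.kind!r}")
```

Raising on an unknown `kind` keeps the function from silently returning nothing if the union ever gains a third variant.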

Features

  • Sync and async clients. ReaderClient (blocking, backed by httpx.Client) and AsyncReaderClient (backed by httpx.AsyncClient). Same method surface.
  • Typed errors for all 11 Reader error codes. InsufficientCreditsError, RateLimitedError, UrlBlockedError, ScrapeTimeoutError, and more. Each subclass exposes the relevant fields (e.g. err.required, err.retry_after_seconds).
  • Automatic retries with exponential backoff for transient codes. Honors the Retry-After header on 429.
  • Pagination-aware job collection. wait_for_job() returns the full job with every page result.
  • SSE streaming. for event in reader.stream(job_id) (sync) or async for (async) yields ProgressEvent / PageEvent / ErrorEvent / DoneEvent.
  • Pydantic models everywhere — all responses are parsed into typed models with IDE autocomplete.
  • Request ID tracing. Every error carries the x-request-id header value on err.request_id for support tickets.
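The retry bullet above says only that the client backs off exponentially and honors `Retry-After`. As a rough illustration of how such a delay is commonly computed, here is a minimal sketch. The function name `backoff_delay`, the base and cap values, and the use of full jitter are all assumptions; the SDK's actual settings are not documented here.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 30.0,
    jitter: bool = True,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # A server-provided Retry-After value takes precedence over the computed delay.
    if retry_after is not None:
        return max(0.0, retry_after)
    # Double the delay on each attempt, but never exceed the cap.
    delay = min(cap, base * (2 ** attempt))
    # Full jitter spreads retries from many clients over the interval [0, delay].
    return random.uniform(0.0, delay) if jitter else delay
```

With jitter disabled, attempts 0, 1, 2, 3 wait 0.5, 1, 2, and 4 seconds under these assumed defaults, and a 429 response carrying `Retry-After: 7` waits 7 seconds regardless of the attempt number.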

Browser Sessions

Launch a stealthed Chrome and connect Playwright:

session = reader.sessions.create()

from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(session.ws_endpoint)
    page = browser.contexts[0].new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()

reader.sessions.stop(session.session_id)

Async:

session = await reader.sessions.create()
# ... use async playwright ...
await reader.sessions.stop(session.session_id)

Methods: reader.sessions.create(), .get(id), .stop(id), .list()
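In the sync example above, an exception raised inside the Playwright block would skip the `stop` call and leave the remote browser running. One way to guard against that is a small context manager. `browser_session` is a hypothetical helper, not part of reader-py; it relies only on `sessions.create()` and `sessions.stop(id)` as listed above.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(reader):
    """Create a Reader browser session and always stop it on exit.

    Hypothetical helper: `reader` is expected to be a sync ReaderClient.
    """
    session = reader.sessions.create()
    try:
        yield session
    finally:
        # Runs on normal exit and on exceptions, so the remote browser is not leaked.
        reader.sessions.stop(session.session_id)
```

Usage would look like `with browser_session(reader) as session:` followed by `p.chromium.connect_over_cdp(session.ws_endpoint)` inside the block. An async variant could follow the same shape with `contextlib.asynccontextmanager` and awaited calls.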

Errors

from reader_py import (
    ReaderApiError,
    InsufficientCreditsError,
    RateLimitedError,
    UrlBlockedError,
)

try:
    reader.read(url=url)
except InsufficientCreditsError as err:
    print(f"Need {err.required}, have {err.available}")
except RateLimitedError as err:
    print(f"Retry after {err.retry_after_seconds}s")
except UrlBlockedError as err:
    print(f"Blocked: {err.reason}")
except ReaderApiError as err:
    print(f"[{err.code}] {err} — see {err.docs_url}")

ReaderError is re-exported as an alias for ReaderApiError so code written against the 0.1 SDK continues to work. New code should use ReaderApiError.
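To see why an alias keeps old handlers working, the sketch below uses stand-in classes rather than the real ones from `reader_py`. It assumes the alias is a plain second name bound to the same class, which is the usual way to re-export under an old name; `caught_by_legacy_handler` is a hypothetical helper for the demonstration.

```python
# Stand-in classes: the real ones live in reader_py.
class ReaderApiError(Exception):
    """Base class for API errors (stand-in)."""


class RateLimitedError(ReaderApiError):
    """One of the typed subclasses (stand-in)."""


# An alias is just a second name bound to the same class object.
ReaderError = ReaderApiError


def caught_by_legacy_handler(exc: Exception) -> bool:
    """Return True if a 0.1-style `except ReaderError` clause catches `exc`."""
    try:
        raise exc
    except ReaderError:
        return True
    except Exception:
        return False
```

Under that assumption, `except ReaderError` catches every typed subclass as well, so 0.1-era handlers do not need to change when the SDK raises the newer, more specific errors.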

Full catalog of error codes: https://reader.dev/docs/home/concepts/errors

Development

python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest