Python SDK for the Reader API — content extraction for LLMs. Wraps POST /v1/read, parses responses into Pydantic models, raises typed exceptions, and auto-polls async jobs to completion.
Version: 0.2.0 · Python: 3.9+
Install:

    pip install reader-py

Sync:

    import os
    from reader_py import ReaderClient

    reader = ReaderClient(api_key=os.environ["READER_KEY"])
    result = reader.read(url="https://example.com")
    if result.kind == "scrape":
        print(result.data.markdown)

Async:

    import asyncio
    import os
    from reader_py import AsyncReaderClient

    async def main():
        async with AsyncReaderClient(api_key=os.environ["READER_KEY"]) as reader:
            result = await reader.read(url="https://example.com")
            if result.kind == "scrape":
                print(result.data.markdown)

    asyncio.run(main())

`reader.read(...)` returns a discriminated union (Pydantic):

- `ScrapeReadResult(kind="scrape", data=ScrapeResult)`: single-URL requests, returned immediately
- `JobReadResult(kind="job", data=Job)`: batch and crawl requests, auto-polled to completion
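
Because the result is a discriminated union, callers typically branch on `kind` before touching `data`. The sketch below illustrates that pattern with stdlib dataclasses standing in for the SDK's Pydantic models. Only `kind`, `data`, and `markdown` come from this README; the `job_id` and `status` fields and the `summarize` helper are assumptions for illustration, not the SDK's actual schema.

```python
from dataclasses import dataclass
from typing import Literal, Union


# Stand-ins for the SDK's Pydantic models (hypothetical shapes).
@dataclass
class ScrapeResult:
    markdown: str


@dataclass
class Job:
    job_id: str  # assumed field name
    status: str  # assumed field name


@dataclass
class ScrapeReadResult:
    data: ScrapeResult
    kind: Literal["scrape"] = "scrape"


@dataclass
class JobReadResult:
    data: Job
    kind: Literal["job"] = "job"


ReadResult = Union[ScrapeReadResult, JobReadResult]


def summarize(result: ReadResult) -> str:
    """Branch on the `kind` tag, then use the matching payload type."""
    if result.kind == "scrape":
        return f"scrape: {len(result.data.markdown)} chars of markdown"
    return f"job {result.data.job_id}: {result.data.status}"
```

Checking `kind` first lets a type checker narrow `data` to the matching payload, which is what makes the IDE autocomplete on `result.data` reliable.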
- Sync and async clients. `ReaderClient` (blocking, backed by `httpx.Client`) and `AsyncReaderClient` (backed by `httpx.AsyncClient`). Same method surface.
- Typed errors for all 11 Reader error codes. `InsufficientCreditsError`, `RateLimitedError`, `UrlBlockedError`, `ScrapeTimeoutError`, and more. Each subclass exposes the relevant fields (e.g. `err.required`, `err.retry_after_seconds`).
- Automatic retries with exponential backoff for transient codes. Honors the `Retry-After` header on 429.
- Pagination-aware job collection. `wait_for_job()` returns the full job with every page result.
- SSE streaming. `for event in reader.stream(job_id)` (sync) or `async for` (async) yields `ProgressEvent` / `PageEvent` / `ErrorEvent` / `DoneEvent`.
- Pydantic models everywhere. All responses are parsed into typed models with IDE autocomplete.
- Request ID tracing. Every error carries the `x-request-id` header value on `err.request_id` for support tickets.
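
To make the retry behavior concrete, here is a minimal sketch of how a delay could be computed with exponential backoff that defers to a server-supplied `Retry-After` value. The `backoff_delay` function, its `base` and `cap` defaults, and the lack of jitter are all assumptions for illustration; the SDK's real schedule is not documented in this README.

```python
from typing import Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,   # assumed starting delay in seconds
    cap: float = 30.0,   # assumed upper bound in seconds
) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    A server-supplied Retry-After value takes precedence; otherwise the
    delay doubles on each attempt and is clamped to `cap`.
    """
    if retry_after is not None:
        return max(0.0, retry_after)
    return min(cap, base * (2 ** attempt))
```

With these defaults the waits would be 0.5s, 1s, 2s, 4s, and so on up to 30s, unless a 429 response supplies its own `Retry-After`.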
Launch a stealthed Chrome and connect Playwright:

    from playwright.sync_api import sync_playwright

    session = reader.sessions.create()
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(session.ws_endpoint)
        page = browser.contexts[0].new_page()
        page.goto("https://example.com")
        print(page.title())
        browser.close()
    reader.sessions.stop(session.session_id)

Async:

    session = await reader.sessions.create()
    # ... use async playwright ...
    await reader.sessions.stop(session.session_id)

Methods: `reader.sessions.create()`, `.get(id)`, `.stop(id)`, `.list()`
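
In the sync example above, an exception raised while driving the page would skip the `stop` call and leave the session running. One way to guard against that is a small context manager, sketched below. `managed_session` is a hypothetical helper, not part of the SDK; it relies only on the `sessions.create()` and `sessions.stop(id)` methods listed above.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(reader):
    """Create a browser session and always stop it on exit (hypothetical helper)."""
    session = reader.sessions.create()
    try:
        yield session
    finally:
        # Runs on normal exit and when the body raises.
        reader.sessions.stop(session.session_id)
```

Usage would look like `with managed_session(reader) as session:` followed by the Playwright code, with no explicit `stop` call needed afterwards.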
    from reader_py import (
        ReaderApiError,
        InsufficientCreditsError,
        RateLimitedError,
        UrlBlockedError,
    )

    try:
        reader.read(url=url)
    except InsufficientCreditsError as err:
        print(f"Need {err.required}, have {err.available}")
    except RateLimitedError as err:
        print(f"Retry after {err.retry_after_seconds}s")
    except UrlBlockedError as err:
        print(f"Blocked: {err.reason}")
    except ReaderApiError as err:
        print(f"[{err.code}] {err} — see {err.docs_url}")

`ReaderError` is re-exported as an alias for `ReaderApiError` so code written against the 0.1 SDK continues to work. New code should use `ReaderApiError`.
Full catalog of error codes: https://reader.dev/docs/home/concepts/errors
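
The alias and the ordering of `except` clauses can be illustrated with stand-in classes. This is a sketch only: the constructors, the `"rate_limited"` code string, and the `describe` helper are assumptions, and the real SDK classes carry more fields than shown here.

```python
class ReaderApiError(Exception):
    """Stand-in for the SDK's base error (hypothetical shape)."""

    def __init__(self, message: str, code: str) -> None:
        super().__init__(message)
        self.code = code


class RateLimitedError(ReaderApiError):
    """Stand-in subclass carrying the retry hint."""

    def __init__(self, message: str, retry_after_seconds: float) -> None:
        super().__init__(message, code="rate_limited")  # code string is assumed
        self.retry_after_seconds = retry_after_seconds


# The alias is the same class object, so a legacy `except ReaderError` still matches.
ReaderError = ReaderApiError


def describe(err: Exception) -> str:
    try:
        raise err
    except RateLimitedError as e:  # most specific handler first
        return f"retry in {e.retry_after_seconds}s"
    except ReaderError as e:  # legacy name, catches every other SDK error
        return f"[{e.code}] {e}"
```

Listing subclasses before the base class matters: if the base handler came first, it would also catch the rate-limit error and the more specific branch would never run.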
- Docs: https://reader.dev/docs
- SDK reference: https://reader.dev/docs/sdk/python
- API reference: https://reader.dev/docs/api-reference/read
- Discord: https://discord.gg/6tjkq7J5WV
Development:

    python -m venv .venv && source .venv/bin/activate
    pip install -e ".[dev]"
    pytest