ads-bib takes a NASA ADS search query and produces a normalized, curated dataset with disambiguated author names (AND, via ads-and), topic models (via BERTopic or Toponymy), and citation networks ready for tools such as Gephi, CiteSpace, or VOSviewer. It runs locally or via API.
Use uv and Python 3.12.
uv pip install ads-bib
# or: pip install ads-bib

Create a .env file in your project root with the relevant API keys.
ADS_TOKEN=your-ads-token # required
OPENROUTER_API_KEY=your-key # only for the openrouter road
HF_TOKEN=your-key # for hf_api and local_gpu model access
MODAL_TOKEN_ID=your-modal-id # only for AND with backend=modal
MODAL_TOKEN_SECRET=your-modal-secret

ADS user token settings | OpenRouter Keys | Hugging Face Access Tokens | Modal.
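Before the first run it can help to check that the keys a given road needs are actually set. The sketch below is a standalone helper, not part of ads-bib: the names `parse_env_text` and `missing_keys` are hypothetical, and the road-to-key mapping is only inferred from the comments in the .env example above (in particular, treating HF_TOKEN as mandatory for hf_api and local_gpu is an assumption).

```python
import os
from pathlib import Path

# Requirements inferred from the comments in the .env example (an assumption,
# not ads-bib's own validation logic).
ALWAYS_REQUIRED = ["ADS_TOKEN"]
ROAD_KEYS = {
    "openrouter": ["OPENROUTER_API_KEY"],
    "hf_api": ["HF_TOKEN"],
    "local_gpu": ["HF_TOKEN"],
    "local_cpu": [],
}
MODAL_KEYS = ["MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"]


def parse_env_text(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "token # required".
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def missing_keys(env, road, and_backend=None):
    """Return the keys needed for this road that are absent or empty in env."""
    needed = ALWAYS_REQUIRED + ROAD_KEYS.get(road, [])
    if and_backend == "modal":
        needed = needed + MODAL_KEYS
    return [key for key in needed if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    env = {**env, **os.environ}  # real environment variables take precedence
    print("missing:", missing_keys(env, "openrouter"))
```

An empty list means every key the sketch knows about is present; it says nothing about whether the tokens are valid.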
Then run in your terminal:
ads-bib run --preset openrouter --set search.query='author:"Hawking, S*"'

Author name disambiguation is off by default. Enable the local CPU/GPU path
with --set author_disambiguation.enabled=true; use
--set author_disambiguation.backend=modal only when your Modal credentials are
configured.
Full setup details: Get Started | Runtime Roads
Every run writes config_used.yaml and reusable stage artifacts. To try one
change without repeating the whole pipeline, start a variant from that run:
ads-bib run --from-run run_20260407_120000_ads_bib_openrouter \
  --set topic_model.embedding_model=google/gemini-embedding-001

ads-bib loads the previous config, applies the override, chooses the earliest
stage that needs recomputation, and writes a new run folder with a variant
block in run_summary.yaml. Preview the reuse plan first with --dry-run.
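When scripting several variants, the command line can be assembled in Python. This is a minimal sketch that uses only the flags mentioned above (--from-run, --set, --dry-run); `build_variant_command` is a hypothetical helper, and appending --dry-run at the end of the command is an assumption about flag placement.

```python
import shlex


def build_variant_command(run_id, overrides, dry_run=True):
    """Build the argv list for an `ads-bib run --from-run ...` variant."""
    argv = ["ads-bib", "run", "--from-run", run_id]
    for key, value in overrides.items():
        # Each override becomes its own `--set key=value` pair.
        argv += ["--set", f"{key}={value}"]
    if dry_run:
        argv.append("--dry-run")
    return argv


argv = build_variant_command(
    "run_20260407_120000_ads_bib_openrouter",
    {"topic_model.embedding_model": "google/gemini-embedding-001"},
)
# Print a shell-quoted form for inspection before running anything.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would execute it; starting with `dry_run=True` keeps the first call a preview.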
import ads_bib
ads_bib.run(
    preset="openrouter",
    query='author:"Hawking, S*"',
)

More examples and the NotebookSession interface: Python API docs
| Road | Hardware | Network | Cost |
|---|---|---|---|
| openrouter | any | API calls | pay-per-token |
| hf_api | any | API calls | HF-plan-dependent |
| local_cpu | CPU only | model downloads only | free after setup |
| local_gpu | NVIDIA + CUDA | model downloads only | free after setup |
Full provider matrix and first-run behavior: Runtime Roads
Each project folder keeps shared caches under data/cache/ and writes every
run under runs/<run_id>/:
runs/<run_id>/
├── config_used.yaml
├── run_summary.yaml
├── data/
│ ├── search/ # run-local ADS search result used for export variants
│ ├── export/ # pre-translation publications and references
│ ├── translated/ # translated publications and references
│ ├── tokenized/ # tokenized publications and references
│ ├── and/ # disambiguated frames plus optional ads-and diagnostics
│ ├── dataset/ # final publications, references, topic_info, manifest
│ └── citations/ # GEXF/CSV/JSON networks and WOS export
├── plots/topic_map.html
└── logs/runtime.log
- data/search|export|translated|tokenized|and/: run-local stage boundaries used by --from-run variants
- data/dataset/publications.parquet: cleaned, translated, topic-labeled publications, with disambiguated authors when AND is enabled
- data/dataset/references.parquet: normalized cited-reference metadata, with disambiguated authors when AND is enabled
- data/dataset/topic_info.parquet: one row per topic with labels, counts, and representation fields
- plots/topic_map.html: interactive topic visualization (open in any browser), using datamapplot
- data/citations/*.gexf: direct citation, co-citation, bibliographic coupling, author co-citation
- data/citations/download_wos_export.txt: Web of Science format for e.g. CiteSpace / VOSviewer
- run_summary.yaml: full run metadata, stage status, and optional variant provenance
- data/dataset/dataset_manifest.json: artifact hashes plus bundle-cleaning provenance
Topic map output from author:"Hawking, S*" in datamapplot.
Author co-citation output from author:"Hawking, S*" in Gephi Lite.
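To confirm that a finished run produced the artifacts described in the run layout, a small check over the run folder is often enough. The sketch below uses only the paths named in this README; `check_run_outputs` is a hypothetical helper, not an ads-bib function, and it tests only for file presence, not content.

```python
from pathlib import Path

# Relative paths taken from the run layout in this README.
EXPECTED_FILES = [
    "config_used.yaml",
    "run_summary.yaml",
    "data/dataset/publications.parquet",
    "data/dataset/references.parquet",
    "data/dataset/topic_info.parquet",
    "data/dataset/dataset_manifest.json",
    "plots/topic_map.html",
]


def check_run_outputs(run_dir):
    """Report expected files that are missing and the GEXF networks found."""
    run_dir = Path(run_dir)
    missing = [rel for rel in EXPECTED_FILES if not (run_dir / rel).is_file()]
    citations = run_dir / "data" / "citations"
    networks = []
    if citations.is_dir():
        networks = sorted(path.name for path in citations.glob("*.gexf"))
    return {"missing": missing, "networks": networks}


if __name__ == "__main__":
    report = check_run_outputs("runs/run_20260407_120000_ads_bib_openrouter")
    print(report)
```

From there, the Parquet tables can be loaded with `pandas.read_parquet` and the networks with `networkx.read_gexf`, assuming those packages are installed.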