LLM Agent for Gene Program Annotation
This repository uses a UV workspace containing two packages:
1. langpa - Core DeepSearch Engine
The main LangPA package for gene program annotation and literature analysis.
- DeepSearch service integration
- Citation resolution and normalization
- Markdown report generation
- Output management and validation
- Full Documentation
2. langpa-validation-tools - Validation & Analysis Tools
Companion package for comparing and analyzing DeepSearch outputs across multiple runs.
- Program comparison with Jaccard similarity (see the sketch below)
- Semantic similarity using OpenAI embeddings
- Visualization (bubble plots, confusion matrices)
- Master validation reports
- CLI: `langpa-validate` command
- Full Documentation
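The program comparison above uses the Jaccard index over gene sets. A minimal sketch (the `jaccard_similarity` helper is illustrative, not the actual `langpa-validation-tools` API):

```python
def jaccard_similarity(genes_a: set[str], genes_b: set[str]) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B|, ranging from 0 (disjoint) to 1 (identical)."""
    if not genes_a and not genes_b:
        return 1.0
    return len(genes_a & genes_b) / len(genes_a | genes_b)

# Gene programs recovered by two different DeepSearch runs
run_a = {"TP53", "BRCA1", "MYC", "TMEM14E"}
run_b = {"TP53", "BRCA1", "MYC", "EGFR"}
print(f"Jaccard similarity: {jaccard_similarity(run_a, run_b):.2f}")  # 3 shared / 5 total = 0.60
```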
```bash
# Clone the repository
git clone https://github.com/Cellular-Semantics/langpa.git
cd langpa

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install both packages (langpa + langpa-validation-tools)
uv sync --dev

# Set up pre-commit hooks (optional but recommended)
uv run pre-commit install

# Use repo-provided git hooks for consistent checks
git config core.hooksPath .githooks
```

The UV workspace automatically installs both packages and links them together. The langpa-validation-tools package imports from langpa as a workspace dependency.
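To confirm the workspace wired both packages together, a quick import check (a minimal sketch, run via `uv run python`):

```python
# Both packages should resolve from the workspace after `uv sync --dev`.
import langpa
import langpa_validation_tools

print(langpa.__name__, langpa_validation_tools.__name__)
```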
Create a .env file in the project root (never commit secrets). cellsem_llm_client automatically loads this file via python-dotenv, so once the keys are present you can rely on the client (and the rest of the stack) to access them without extra wiring:
```
OPENAI_API_KEY=your_openai_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
```

As long as that .env file lives at the repo root, cellsem_llm_client (and the bootstrapping in src/langpa) will call load_dotenv() and expose those keys to agents, services, and tests automatically, with no manual export required.
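To double-check that the keys are being picked up, here is a small standalone sketch using python-dotenv directly (the langpa bootstrap shown below does the equivalent for you):

```python
import os

from dotenv import load_dotenv

# Searches for .env starting from the current directory (the repo root here)
load_dotenv()

for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")
```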
```python
from langpa import bootstrap

# Load environment + perform any required startup tasks
bootstrap()
```

Documentation lives in docs/ and is built with Sphinx + MyST. Run `python scripts/check-docs.py` to build with warnings-as-errors before each commit. Publish the rendered HTML via GitHub Pages or your preferred static host.
- ✅ Agentic workflow scaffold with strict TDD guardrails (`CLAUDE.md`)
- ✅ Unit & integration test suites pre-configured with pytest markers
- ✅ Docs + automation scripts for Sphinx builds
- ✅ Environment bootstrap handled via `python-dotenv`
- ✅ uv-first packaging (`pyproject.toml` with Ruff, MyPy, pytest config)
- ✅ Integrated clients: `cellsem_llm_client` for LLMs and `deep-research-client` for DeepSearch workflows
- ✅ Pydantic AI graph orchestration: `pydantic-ai` agent surfaces graph nodes safely with typed deps
This is a UV workspace with two packages:
```
langpa/                              # Repository root
├── pyproject.toml                   # Workspace configuration
├── langpa/                          # Core package
│   ├── pyproject.toml               # Core package config
│   ├── src/langpa/
│   │   ├── agents/                  # Agent classes coordinating workflows
│   │   ├── graphs/                  # Optional workflow graphs powered by Pydantic
│   │   ├── schemas/                 # Shared IO models and contracts
│   │   ├── services/                # LLM + Deepsearch integration layers
│   │   └── utils/                   # Repo-specific tooling/helpers
│   └── tests/
│       ├── unit/                    # Fast, isolated tests
│       └── integration/             # Real API + IO validation (no mocks)
├── langpa_validation_tools/         # Validation tools package
│   ├── pyproject.toml               # Validation tools config
│   ├── src/langpa_validation_tools/
│   │   ├── analysis/                # Run comparison & embedding workflows
│   │   ├── comparison/              # Similarity metrics & matching
│   │   ├── embeddings/              # OpenAI embedding generation
│   │   ├── visualization/           # Heatmaps & bubble plots
│   │   ├── reporting/               # Master report generation
│   │   └── cli.py                   # langpa-validate CLI
│   └── tests/
│       ├── unit/                    # Validation tools unit tests
│       └── integration/             # Validation tools integration tests
├── docs/                            # Sphinx configuration and content
└── scripts/                         # Tooling helpers (docs, CLI, etc.)
```
langpa - Core functionality:
- Agent entrypoints coordinating services and schemas
- Optional workflow graphs powered by Pydantic + pydantic-ai
- JSON Schema contracts describing outputs + business rules
- Concrete integrations (CellSem LLM client, Deepsearch)
- Citation resolution and markdown report generation
langpa-validation-tools - Analysis & validation:
- Comparison metrics (Jaccard, name similarity, embeddings; see the sketch after this list)
- Batch embedding generation with caching
- Visualization generation (bubble plots, confusion matrices)
- Master validation reports aggregating multiple runs
- CLI interface for validation workflows
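To illustrate the embedding-based comparison, here is a minimal sketch using the OpenAI Python SDK with a small in-memory cache; the model choice and function names are illustrative, not the package's actual implementation:

```python
from functools import lru_cache

import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

@lru_cache(maxsize=1024)
def embed(text: str) -> tuple[float, ...]:
    """Embed a program label, caching repeated inputs to avoid duplicate API calls."""
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return tuple(response.data[0].embedding)

def cosine_similarity(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    va, vb = np.asarray(a), np.asarray(b)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

score = cosine_similarity(
    embed("Interferon response program"),
    embed("Type I interferon signalling"),
)
print(f"Semantic similarity: {score:.2f}")
```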
The CLI scripts/run_deepsearch.py supports live API runs, batch processing from CSV, and offline processing of saved DeepSearch markdown/raw JSON.
- Single Query Mode (`--single <file>`):
  - Process a single markdown or raw JSON file
  - Example: `python scripts/run_deepsearch.py --single outputs/project/query/timestamp/deepsearch.json`
- CSV Batch Mode (`--batch-csv <file>`):
  - Run fresh DeepSearch API calls for each row in the CSV
  - CSV columns: `ID`, `name`, `gene_list`, `context` (optional), `GSE` (optional)
  - Query naming: `{ID}_{name}` if both are present, else the single value
  - Per-row context overrides global `--context`/`--context-file`
  - Supports multiple runs per query via `--num-runs N` (default: 1)
  - Example: `python scripts/run_deepsearch.py --batch-csv queries.csv --project my_project --num-runs 3`
  - See `examples/batch_queries_example.csv` for the format
- Batch Reprocess Mode (`--batch-reprocess`):
  - Reprocess existing `deepsearch.json` files under `outputs/<project>/`
  - Does not make new API calls; processes saved responses
  - Example: `python scripts/run_deepsearch.py --batch-reprocess --project my_project`
- `--project`/`--query`: organize outputs under `outputs/<project>/<query>/<timestamp>/`.
- `--from-markdown`/`--raw-input`: process saved responses without calling the API.
- `--resolve-citations`: normalize/resolve citations via url2ref and write a container with CSL-JSON.
- `--citation-style`: citation style for the compact bibliography (e.g., vancouver, apa, ieee, chicago). Default: vancouver.
- `--citation-locale`: locale for citation formatting (e.g., en-US, en-GB, de-DE). Default: en-US.
Output files per run (when validation succeeds) live under outputs/<project>/<query>/<timestamp>/ and use fixed filenames (identity comes from the folder path):
- `deepsearch.json`: raw markdown + original citations + metadata.
- `deepsearch_structured.json`: parsed/validated DeepSearch report (source_id-only citations).
- `deepsearch_container.json`: structured report + citation map (CSL-JSON keyed by source_id) + stats + compact bibliography strings.
- `deepsearch_extracted_debug.json`: optional debug dump when `--debug-extraction` is used.
When --resolve-citations is enabled, the container JSON includes human-readable compact reference strings alongside CSL-JSON:
```bash
python scripts/run_deepsearch.py \
  --genes TMEM14E \
  --context "cellular function" \
  --resolve-citations \
  --citation-style apa \
  --citation-locale en-GB
```

The container will include a compact_bibliography field:
```json
{
  "compact_bibliography": {
    "entries": [
      "[1] Author, A., & Author, B. (2024). Paper title. Journal Name, 123(4), 567-890.",
      "[2] Smith, J. et al. (2023). Another paper. Nature, 456, 123-456."
    ],
    "style": "apa",
    "locale": "en-GB",
    "renderer": "citeproc-py"
  }
}
```

Supported styles: vancouver, apa, ieee, chicago, and others supported by citeproc-py. See docs/compact-references.md for details.
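Downstream tooling can read the container with the standard library alone; a minimal sketch (the output path below is illustrative):

```python
import json
from pathlib import Path

# Substitute your own outputs/<project>/<query>/<timestamp>/ folder
container_path = Path("outputs/my_project/TMEM14E/2024-01-01_120000/deepsearch_container.json")
container = json.loads(container_path.read_text())

for entry in container.get("compact_bibliography", {}).get("entries", []):
    print(entry)
```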
```python
from langpa.graphs import WorkflowGraph, GraphNode, build_graph_agent, GraphDependencies

graph = WorkflowGraph(
    name="triage",
    entrypoint="collect",
    nodes=[
        GraphNode(id="collect", description="collect context", service="collect_service", next=["summarize"]),
        GraphNode(id="summarize", description="summarize findings", service="summary_service"),
    ],
)

agent = build_graph_agent()
result = agent.run_sync(
    "pick next node",
    deps=GraphDependencies(graph=graph),
    # optional additional instructions/payload
)
```

The pydantic-ai agent validates all outputs against GraphNode, while dependency injection hands it the validated WorkflowGraph for safe routing.
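Because GraphNode is a Pydantic model, the same validation can be exercised directly; a minimal sketch, assuming standard Pydantic v2 behaviour:

```python
from pydantic import ValidationError

from langpa.graphs import GraphNode

# A well-formed output validates cleanly...
node = GraphNode.model_validate(
    {"id": "summarize", "description": "summarize findings", "service": "summary_service"}
)

# ...while a payload missing required fields is rejected before it can drive routing.
try:
    GraphNode.model_validate({"description": "summarize findings"})
except ValidationError as exc:
    print(f"Rejected: {exc.error_count()} validation error(s)")
```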
```python
from jsonschema import validate

from langpa.schemas import load_schema

schema = load_schema("workflow_output.schema.json")
payload = {
    "status": "completed",
    "summary": "Gathered literature and synthesized insights.",
    "actions": [{"name": "deepsearch.query", "details": "Retrieved 25 documents"}],
}
validate(instance=payload, schema=schema)
```

Schemas stay in JSON so downstream services (Python, JS, workflows) can share the same contract without importing Pydantic models.
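The same call is what rejects malformed agent outputs; a short sketch reusing the schema loaded above (assuming the schema marks fields such as summary and actions as required):

```python
from jsonschema import ValidationError, validate

bad_payload = {"status": "completed"}  # missing summary/actions
try:
    validate(instance=bad_payload, schema=schema)  # `schema` from the example above
except ValidationError as exc:
    print(f"Schema violation: {exc.message}")
```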
- Python: 3.11+
- Dependencies: managed via `uv sync --dev`
- API Keys: OpenAI + Anthropic keys for integration tests (hard fail if missing)

- Follow the rules in `CLAUDE.md` (TDD-first, tests before code, dotenv usage)
- Write failing tests, then implement the smallest fix
- Keep coverage ≥80% and never skip failing tests
- Run the full quality suite (Ruff, MyPy, pytest, docs) before pushing

- Unit Tests (`@pytest.mark.unit`): no network, deterministic, fast (see the marker sketch after this list)
- Integration Tests (`@pytest.mark.integration`): real APIs, fail hard if env vars missing
- Coverage: target ≥80%, monitored via the coverage badge (currently 94%)
- CI Policy: GitHub Actions runs only `uv run pytest -m unit`; run `uv run pytest -m integration` locally with real API keys before pushing
- Hooks: `.githooks/pre-commit` runs lint, unit tests, and integration tests (skips integration if API keys missing)
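The markers are declared per test; a minimal sketch of how the unit/integration split looks under these conventions (test names are illustrative):

```python
import os

import pytest


@pytest.mark.unit
def test_jaccard_of_identical_sets_is_one():
    # Deterministic, no network: safe for the CI unit run.
    genes = {"TP53", "MYC"}
    assert len(genes & genes) / len(genes | genes) == 1.0


@pytest.mark.integration
def test_openai_key_is_configured():
    # Integration tests fail hard when required env vars are missing.
    assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY must be set for integration tests"
```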
```bash
# Run tests for both packages
uv run pytest langpa/tests -m unit                          # langpa unit tests
uv run pytest langpa_validation_tools/tests -m unit         # validation tools unit tests
uv run pytest langpa/tests -m integration                   # langpa integration tests
uv run pytest langpa_validation_tools/tests -m integration  # validation tools integration tests

# Run all unit tests
uv run pytest langpa/tests/unit langpa_validation_tools/tests/unit -m unit

# Code quality
uv run ruff check --fix langpa/src/ langpa/tests/
uv run ruff format langpa/src/ langpa/tests/
uv run mypy langpa/src/

# Docs
python scripts/check-docs.py

# Validation tools CLI
uv run langpa-validate --help
uv run langpa-validate compare --project my_project
uv run langpa-validate pipeline --project my_project --use-embeddings
```

MIT License - see LICENSE for details.