MCP server for the INDRA CoGEx biomedical knowledge graph.
Gives AI agents access to 20M+ nodes spanning genes, diseases, drugs, pathways, and their causal relationships — assembled from scientific literature and curated databases by INDRA (Integrated Network and Dynamical Reasoning Assembler).
Replaces 100+ individual function tools with 9 composable tools that expose the full power of the knowledge graph while maintaining safety and usability.
Public server — no installation needed:
```json
{
  "mcpServers": {
    "indra-cogex": {
      "url": "https://discovery.indra.bio/mcp"
    }
  }
}
```

Add this to your MCP client config (Claude Desktop, Claude Code, Cursor, etc.) and start asking questions about genes, diseases, and drugs.
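For clients that keep their MCP settings in a JSON file, the entry can also be merged programmatically. The sketch below assumes the config is a plain JSON object with a top-level `mcpServers` key, as in the snippet above. `add_indra_server` is a hypothetical helper name and is not part of indra_agent.

```python
import json

PUBLIC_URL = "https://discovery.indra.bio/mcp"


def add_indra_server(config: dict, name: str = "indra-cogex", url: str = PUBLIC_URL) -> dict:
    """Return a copy of an MCP client config with the INDRA CoGEx server added."""
    # Deep-copy via a JSON round trip so the caller's dict is left untouched.
    updated = json.loads(json.dumps(config))
    updated.setdefault("mcpServers", {})[name] = {"url": url}
    return updated


if __name__ == "__main__":
    # "other-server" is a placeholder for whatever the client already has configured.
    existing = {"mcpServers": {"other-server": {"command": "other"}}}
    print(json.dumps(add_indra_server(existing), indent=2))
```

Existing server entries are preserved; only the `indra-cogex` key is added or overwritten.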
Local installation — for development or private Neo4j instances:
```bash
pip install git+https://github.com/gyorilab/indra_agent.git
```

| | Stdio (local) | HTTP (remote) |
|---|---|---|
| Use case | Claude Desktop, Cursor, Claude Code | Deployed server, shared access |
| Config key | `"command": "indra-agent"` | `"url": "https://..."` |
| Neo4j credentials | Required locally (env vars) | Configured server-side |
| Scaling | Single process | Multi-worker (gunicorn) |
For local MCP clients:
```bash
# Via console script
indra-agent
# Or via module
python -m indra_agent.mcp_server
```

Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "indra-cogex": {
      "command": "indra-agent",
      "env": {
        "INDRA_NEO4J_URL": "bolt://your-server:7687",
        "INDRA_NEO4J_USER": "neo4j",
        "INDRA_NEO4J_PASSWORD": "your-password",
        "MCP_ALLOWED_HOSTS": "localhost",
        "MCP_ALLOWED_ORIGINS": "http://localhost"
      }
    }
  }
}
```

For network deployments:
```bash
indra-agent --http
indra-agent --http --host 0.0.0.0 --port 8000
indra-agent --http --stateful --streaming
```

Production (gunicorn):
```bash
gunicorn indra_agent.mcp_server.server:app \
  --bind 0.0.0.0:8778 \
  --worker-class uvicorn.workers.UvicornWorker \
  -w 4
```

9 tools in two groups:
High-level graph navigation — most agents start here:
| Tool | Purpose |
|---|---|
| `ground_entity` | Natural language to CURIE with semantic filtering. Supports single term or batch mode. |
| `suggest_endpoints` | Given CURIEs, suggest reachable entity types and traversal functions |
| `call_endpoint` | Execute any of 100+ autoclient functions with auto-grounding, caching, and xref fallback |
| `get_navigation_schema` | Full edge map showing how entity types connect. Optional `entity_type` filter. |
| `batch_call` | Execute an endpoint for multiple entities in one call. Routes to native batch queries where available. |
Low-level Cypher access for complex queries:
| Tool | Purpose |
|---|---|
| `get_graph_schema` | Discover entity types, relationships, patterns |
| `execute_cypher` | Run arbitrary Cypher with parameterization |
| `validate_cypher` | Pre-flight safety validation |
| `enrich_results` | Add metadata at configurable disclosure levels |
`call_endpoint`, the primary query tool, supports several optimization parameters:
| Parameter | Type | Description |
|---|---|---|
| `endpoint` | str | Function name (e.g., `"get_diseases_for_gene"`) |
| `kwargs` | str | JSON arguments. Entities as CURIE tuples `{"gene": ["HGNC", "6407"]}` or natural language `{"gene": "LRRK2"}` |
| `auto_ground` | bool | Auto-ground string params to CURIEs (default: True) |
| `disclosure_level` | str | Enrich results: `"standard"` (~250 tokens/item), `"detailed"` (~400), `"exploratory"` (~750) |
| `offset` | int | Pagination offset. Use `next_offset` from previous response to continue. |
| `limit` | int | Max items per page. Auto-truncates to ~20k tokens if exceeded. |
| `fields` | list | Project results to specific keys (e.g., `["db_ns", "db_id", "name"]`). Reduces token usage. |
| `estimate` | bool | Probe query cost before fetching. Returns count, token estimate, available fields, sample. |
| `sort_by` | str | `"evidence"` (most-validated first) or `"name"` (alphabetical). Applied before pagination. |
| `include_navigation` | bool | Append suggested next traversal steps to results. |
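Two details of these parameters are easy to get wrong from a client: `kwargs` is a JSON string rather than a nested object, and paging continues from `next_offset` rather than from a locally computed offset. The sketch below illustrates both under stated assumptions. `call_tool` stands for whatever function your MCP client uses to invoke a tool, `build_call_args`, `iter_pages`, and `fake_call_tool` are hypothetical names, and the response shape is assumed to match the `results`/`pagination` example shown elsewhere in this README.

```python
import json
from typing import Any, Callable, Dict, Iterator, List

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def build_call_args(endpoint: str, entities: Dict[str, Any], **options: Any) -> Dict[str, Any]:
    """Assemble call_endpoint arguments; kwargs is serialized to a JSON string."""
    return {"endpoint": endpoint, "kwargs": json.dumps(entities), **options}


def iter_pages(call_tool: ToolCaller, args: Dict[str, Any], limit: int = 50) -> Iterator[List[Any]]:
    """Yield result pages, following next_offset until has_more is false."""
    offset = 0
    while True:
        response = call_tool("call_endpoint", {**args, "offset": offset, "limit": limit})
        yield response["results"]
        page = response.get("pagination", {})
        if not page.get("has_more"):
            break
        offset = page["next_offset"]


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in server that paginates over 5 dummy items (illustration only)."""
    items = [{"name": f"item{i}"} for i in range(5)]
    start, size = arguments["offset"], arguments["limit"]
    end = min(start + size, len(items))
    return {
        "results": items[start:end],
        "pagination": {"total": len(items), "returned": end - start,
                       "has_more": end < len(items), "next_offset": end},
    }


if __name__ == "__main__":
    args = build_call_args("get_diseases_for_gene", {"gene": "LRRK2"}, sort_by="evidence")
    for page in iter_pages(fake_call_tool, args, limit=2):
        print([item["name"] for item in page])
```

Because the server may truncate a page below `limit` to stay within its token budget, the loop trusts `next_offset` instead of adding `limit` to the previous offset.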
`batch_call` parameters:

| Parameter | Type | Description |
|---|---|---|
| `endpoint` | str | Function name (e.g., `"get_diseases_for_gene"`) |
| `entity_param` | str | Which parameter to batch over (e.g., `"gene"`) |
| `entity_values` | list | Entity values to query (e.g., `["SIRT3", "PRKN", "MAPT"]`). Auto-grounded. |
| `fields` | list | Project results to specific keys per item |
| `max_concurrent` | int | Max parallel queries (default: 10) |
| `merge_strategy` | str | `"keyed"` (default): `{entity: results}` dict. `"flat"`: concatenated list. |
For get_drugs_for_target and get_targets_for_drug, batch_call automatically routes to native batch functions that use a single Neo4j WHERE IN query instead of N separate queries.
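To make `max_concurrent` and `merge_strategy` concrete, here is a minimal sketch of the generic (non-native) path: one query per entity, bounded by a semaphore, then merged. It is an illustration of the semantics described above, not the server's implementation. `run_batch` and `fake_query` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Union

Query = Callable[[str], Awaitable[List[Any]]]


async def run_batch(
    query: Query,
    entity_values: List[str],
    max_concurrent: int = 10,
    merge_strategy: str = "keyed",
) -> Union[Dict[str, List[Any]], List[Any]]:
    """Run one query per entity with bounded parallelism, then merge."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(value: str) -> List[Any]:
        async with semaphore:  # at most max_concurrent queries in flight
            return await query(value)

    # gather preserves input order, so results line up with entity_values.
    results = await asyncio.gather(*(bounded(v) for v in entity_values))
    if merge_strategy == "keyed":
        return dict(zip(entity_values, results))
    if merge_strategy == "flat":
        return [item for per_entity in results for item in per_entity]
    raise ValueError(f"unknown merge_strategy: {merge_strategy!r}")


async def fake_query(gene: str) -> List[str]:
    """Stand-in for a per-entity endpoint call (illustration only)."""
    await asyncio.sleep(0)
    return [f"{gene}-disease-1", f"{gene}-disease-2"]


if __name__ == "__main__":
    genes = ["SIRT3", "PRKN", "MAPT"]
    print(asyncio.run(run_batch(fake_query, genes, merge_strategy="keyed")))
    print(asyncio.run(run_batch(fake_query, genes, merge_strategy="flat")))
```

`"keyed"` keeps track of which entity produced which results, while `"flat"` is convenient when the caller only needs the combined list.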
```mermaid
flowchart LR
    Agent["Agent"]
    subgraph Gateway["Gateway Tools"]
        direction TB
        Ground["ground_entity<br/><i>NL to CURIE</i>"]
        Suggest["suggest_endpoints<br/><i>navigation hints</i>"]
        Call["call_endpoint<br/><i>100+ functions</i>"]
        NavSchema["get_navigation_schema<br/><i>edge map</i>"]
        Batch["batch_call<br/><i>parallel execution</i>"]
    end
    subgraph Core["Query Infrastructure"]
        direction TB
        Schema["get_graph_schema<br/><i>progressive discovery</i>"]
        Execute["execute_cypher<br/><i>arbitrary queries</i>"]
        Validate["validate_cypher<br/><i>safety checks</i>"]
        Enrich["enrich_results<br/><i>metadata layers</i>"]
    end
    subgraph Cache["Cache Layer"]
        direction TB
        DiskCache["diskcache<br/><i>LRU + TTL</i>"]
        Coalesce["request coalescing<br/><i>dedup in-flight</i>"]
    end
    subgraph Data["Data Layer"]
        direction TB
        Neo[("Neo4j<br/>20M nodes")]
        GILDA["GILDA API"]
    end
    Agent -->|"ground terms"| Ground
    Agent -->|"explore edges"| Suggest
    Agent -->|"call functions"| Call
    Agent -->|"batch queries"| Batch
    Agent -->|"discover schema"| Schema
    Agent -->|"run Cypher"| Execute
    Ground --> GILDA
    Suggest --> Call
    Call --> DiskCache
    Batch --> Call
    NavSchema --> Call
    DiskCache -->|"miss"| Neo
    DiskCache -->|"hit"| Call
    Coalesce --> DiskCache
    Schema --> Neo
    Execute --> Validate
    Validate --> Neo
    classDef agent fill:#2C3E50,stroke:#1A252F,stroke-width:3px,color:#ECF0F1,font-weight:bold
    classDef gateway fill:#9B59B6,stroke:#8E44AD,stroke-width:2px,color:#FFFFFF
    classDef core fill:#3498DB,stroke:#2980B9,stroke-width:2px,color:#FFFFFF
    classDef cache fill:#27AE60,stroke:#1E8449,stroke-width:2px,color:#FFFFFF
    classDef data fill:#7F8C8D,stroke:#5D6D7E,stroke-width:2px,color:#ECF0F1
    class Agent agent
    class Ground,Suggest,Call,NavSchema,Batch gateway
    class Schema,Execute,Validate,Enrich core
    class DiskCache,Coalesce cache
    class Neo,GILDA data
```
Most agents use Gateway Tools — ground natural language to CURIEs, then call pre-built functions. When predefined functions cannot express the query (graph algorithms, multi-hop traversals, conditional aggregations), agents drop down to Query Infrastructure.
Parameter semantics encode entity type, eliminating cross-type ambiguity:
```python
ground_entity(term="ALS", param_name="disease")
# → MESH:D000690 (Amyotrophic Lateral Sclerosis)
ground_entity(term="ALS", param_name="gene")
# → HGNC:11179 (SOD1, formerly ALS1)
ground_entity(term="aspirin", param_name="drug")
# → CHEBI:15365
```

Supported `param_name` filters: disease, gene, drug, pathway, tissue, cell_line, cell_type, side_effect.
Batch mode grounds multiple terms in one call:
```python
ground_entity(terms=["SIRT3", "PRKN", "MAPT"], param_name="gene")
# → {mappings: {"SIRT3": {curie, name, score}, ...}, failed: [...]}
```

Organism context accepts common names or taxonomy IDs:
```python
ground_entity(term="LRRK2", organism="human")  # resolved to 9606
ground_entity(term="LRRK2", organism="9606")   # passthrough
ground_entity(term="Trp53", organism="mouse")  # resolved to 10090
```

When a grounded query returns zero results, `call_endpoint` automatically looks up equivalent identifiers via xref relationships in the graph and retries. This handles cases where GILDA grounds to one namespace (e.g., MESH) but the relevant data is indexed under another (e.g., DOID).
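The retry logic can be pictured roughly as follows. This is a minimal sketch, assuming the xref lookup yields a list of equivalent CURIEs that are tried in order. `query_with_xref_fallback` is a hypothetical name, and the identifiers in the toy data are placeholders rather than real database entries.

```python
from typing import Callable, Dict, List, Tuple

Curie = Tuple[str, str]  # (namespace, identifier)


def query_with_xref_fallback(
    query: Callable[[Curie], List[dict]],
    curie: Curie,
    xrefs: Dict[Curie, List[Curie]],
) -> Tuple[Curie, List[dict]]:
    """Try the grounded CURIE first, then each equivalent identifier in turn."""
    results = query(curie)
    if results:
        return curie, results
    for equivalent in xrefs.get(curie, []):
        results = query(equivalent)
        if results:
            # Report which identifier actually produced the data.
            return equivalent, results
    return curie, []


if __name__ == "__main__":
    # Toy data: both identifiers and the xref between them are placeholders.
    toy_xrefs = {("MESH", "D000001"): [("DOID", "0000001")]}
    toy_index = {("DOID", "0000001"): [{"name": "example result"}]}

    used, hits = query_with_xref_fallback(
        lambda c: toy_index.get(c, []), ("MESH", "D000001"), toy_xrefs
    )
    print(used, hits)
```

Returning the identifier that matched, alongside the results, lets the caller see that a fallback namespace was used.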
All call_endpoint results are cached via a diskcache.FanoutCache:
- Cross-process safe — SQLite backend, works with gunicorn multi-worker
- LRU eviction — configurable max size (default 2GB)
- Per-key TTL — default 1 hour, schema cached for 24 hours
- Request coalescing — concurrent identical requests share a single Neo4j query
- Cache stores raw results; sort, field projection, and pagination are applied post-cache

Cypher queries are guarded at several layers:

- Validation layer prevents all write/mutate operations (DELETE, CREATE, MERGE, SET, REMOVE, DROP, DETACH)
- Parameterized queries prevent injection attacks
- Neo4j `execute_read()` enforces read-only semantics at the driver level
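As a rough illustration of the validation idea, the sketch below flags the write keywords listed above when they appear outside string literals. It is a simplified stand-in, not the logic `validate_cypher` actually uses. `find_write_keywords` and `is_read_only` are hypothetical names, and the node label in the usage example is a placeholder.

```python
import re

WRITE_KEYWORDS = ("DELETE", "CREATE", "MERGE", "SET", "REMOVE", "DROP", "DETACH")

# Match keywords as whole words, case-insensitively.
_WRITE_PATTERN = re.compile(r"\b(" + "|".join(WRITE_KEYWORDS) + r")\b", re.IGNORECASE)
# Quoted string literals, so keywords inside them are not flagged.
_STRING_LITERAL = re.compile(r"'(?:\\.|[^'\\])*'|\"(?:\\.|[^\"\\])*\"")


def find_write_keywords(cypher: str) -> list:
    """Return the write/mutate keywords found outside string literals."""
    stripped = _STRING_LITERAL.sub("''", cypher)
    return sorted({match.upper() for match in _WRITE_PATTERN.findall(stripped)})


def is_read_only(cypher: str) -> bool:
    """True when no write keyword appears in the query text."""
    return not find_write_keywords(cypher)


if __name__ == "__main__":
    # Values are passed as parameters ($id), never interpolated into the text.
    print(is_read_only("MATCH (n:Example) WHERE n.id = $id RETURN n LIMIT 10"))
    print(find_write_keywords("MATCH (n) DETACH DELETE n"))
```

A keyword scan like this can produce false positives (for example, a property literally named `set`), which is one reason it makes sense to pair it with driver-level read-only execution rather than rely on it alone.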
Large result sets are automatically truncated with continuation hints:
```json
{
  "results": [...],
  "pagination": {
    "total": 1500,
    "returned": 127,
    "has_more": true,
    "next_offset": 127
  }
}
```

Use `estimate=True` to probe query cost before fetching:
```python
call_endpoint("get_drugs_for_target", '{"target": "EGFR"}', estimate=True)
# → {result_count: 78, token_estimate_full: 15200, fields_available: [...], sample: [...]}
```

Then fetch with projection:
```python
call_endpoint("get_drugs_for_target", '{"target": "EGFR"}', fields=["db_ns", "db_id", "name"], limit=50)
```

Neo4j connection settings are read from environment variables:

```bash
export INDRA_NEO4J_URL="bolt://localhost:7687"
export INDRA_NEO4J_USER="neo4j"
export INDRA_NEO4J_PASSWORD="your-password"
```

Or configure in `~/.config/indra/config.ini` (the standard INDRA config file).
| Variable | Required | Description |
|---|---|---|
| `MCP_ALLOWED_HOSTS` | Yes | Comma-separated allowed hosts (e.g., `localhost,discovery.indra.bio`) |
| `MCP_ALLOWED_ORIGINS` | Yes | Comma-separated allowed origins (e.g., `http://localhost:3000,https://discovery.indra.bio`) |
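Since both variables are required and comma-separated, a deployment script may want to check them before starting the server. The sketch below shows one way to do that. `require_csv_env` is a hypothetical helper, and how the server itself parses these values is not shown here.

```python
import os
from typing import List, Mapping


def require_csv_env(name: str, environ: Mapping[str, str] = os.environ) -> List[str]:
    """Split a required comma-separated variable into trimmed, non-empty entries."""
    raw = environ.get(name, "")
    values = [part.strip() for part in raw.split(",") if part.strip()]
    if not values:
        raise RuntimeError(f"{name} must be set to a comma-separated list")
    return values


if __name__ == "__main__":
    example_env = {
        "MCP_ALLOWED_HOSTS": "localhost, discovery.indra.bio",
        "MCP_ALLOWED_ORIGINS": "http://localhost:3000,https://discovery.indra.bio",
    }
    print(require_csv_env("MCP_ALLOWED_HOSTS", example_env))
    print(require_csv_env("MCP_ALLOWED_ORIGINS", example_env))
```

Failing fast on a missing or empty value surfaces misconfiguration at startup rather than at the first rejected request.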
| Variable | Default | Description |
|---|---|---|
| `INDRA_CACHE_DIR` | `~/.cache/indra_cogex_mcp` | Cache directory path |
| `INDRA_CACHE_SIZE_MB` | `2048` | Max cache size in MB (LRU eviction beyond this) |
| `INDRA_CACHE_TTL` | `3600` | Default TTL in seconds |
| `INDRA_CACHE_SHARDS` | `4` | FanoutCache shards (higher = better concurrency, more file handles) |
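The defaults in this table can be mirrored in a small settings loader, which is handy for tests or tooling that need to locate or size the cache. This is a sketch only: `CacheSettings` and `load_cache_settings` are hypothetical names, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class CacheSettings:
    directory: Path
    size_limit_bytes: int
    ttl_seconds: int
    shards: int


def load_cache_settings(environ: Mapping[str, str] = os.environ) -> CacheSettings:
    """Read the cache variables, falling back to the documented defaults."""
    directory = Path(environ.get("INDRA_CACHE_DIR", "~/.cache/indra_cogex_mcp")).expanduser()
    size_mb = int(environ.get("INDRA_CACHE_SIZE_MB", "2048"))
    return CacheSettings(
        directory=directory,
        size_limit_bytes=size_mb * 1024 * 1024,  # assumption: binary megabytes
        ttl_seconds=int(environ.get("INDRA_CACHE_TTL", "3600")),
        shards=int(environ.get("INDRA_CACHE_SHARDS", "4")),
    )


if __name__ == "__main__":
    print(load_cache_settings({}))  # documented defaults
    print(load_cache_settings({"INDRA_CACHE_SIZE_MB": "512", "INDRA_CACHE_SHARDS": "8"}))
```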
| Variable | Default | Description |
|---|---|---|
| `MCP_HOST` | `0.0.0.0` | Host to bind in HTTP mode |
| `MCP_PORT` | `8000` | Port to bind in HTTP mode |
```bash
git clone https://github.com/gyorilab/indra_agent.git
cd indra_agent
pip install -e ".[dev]"
# Run tests
pytest tests/
# Run specific test file
pytest tests/mcp_server/test_gateway_tools.py -v
```

- `indra_cogex`: INDRA CoGEx knowledge graph client
- `mcp>=1.2.0`: Model Context Protocol SDK (requires 1.2.0+ for transport security)
- `gilda`: Biomedical entity grounding
- `diskcache>=5.6`: Persistent caching with LRU eviction
- `pydantic>=2.0`: Data validation
- `click>=8.0`: CLI framework
- `starlette>=0.27.0`: ASGI framework
- `uvicorn>=0.20.0`: ASGI server (HTTP mode)
- `jinja2>=3.0.0`: Template engine
BSD-2-Clause