Adding data sources to the MCP server goes far beyond simply bundling data or providing basic search functionality. The combination of MCP resources, tools, and Claude's context window enables sophisticated AI-driven interactions that would be impossible with traditional APIs or databases.
- User must know exact field names and values
- Rigid query syntax required
- No semantic understanding
- No context awareness
- No intelligent recommendations
- Limited to exact matches
- Natural language queries
- Semantic understanding and synonym handling
- Context-aware reasoning
- Intelligent recommendations
- Conversational refinement
- Cross-source comparison and reasoning
What it enables:
- Claude can read the data source configurations (
juicebox://datasource/4dn,juicebox://datasource/encode) - Claude understands the schema: what columns exist, what they mean
- Claude can explain available metadata to users
- Claude can suggest search strategies based on available fields
Example interaction:
User: "What can I search for?"
Claude: [Reads resources, understands schema]
"You can search by Assembly (e.g., hg38, mm10), Biosource/Biosample
(e.g., cell types), Lab, Experiment type, and more. Would you like
me to show you specific examples?"
What it enables:
- Claude sees all search results in its context window
- Can compare maps across sources simultaneously
- Can reason about relationships between datasets
- Can make recommendations based on full context
Example capabilities:
User: "Show me maps for human cells"
Claude: [Understands "human" = "Homo sapiens" = matches "human" in Biosource]
[Searches across all sources]
[Returns results with explanation]
User: "Find mouse genome data"
Claude: [Understands "mouse" = "Mus musculus" = "mm10" or "mm39" assemblies]
[Searches for both terms]
User: "Show me recent human Hi-C maps"
Claude: [Understands "recent" might refer to Publication dates or newer datasets]
[Can reason about what "recent" means in this context]
[Searches and explains results]
What it enables:
- Users can iteratively refine searches through conversation
- Claude remembers previous search results
- Claude can suggest refinements
- No need to construct complex queries upfront
Example flow:
User: "Show me human Hi-C maps"
Claude: [Searches, finds 50+ results]
"I found 52 human Hi-C maps. Would you like to narrow this down?"
User: "Just from ENCODE"
Claude: [Refines search to ENCODE source only]
"Found 23 ENCODE human maps. Here are the top results..."
User: "Only hg38"
Claude: [Further refines by Assembly]
"Found 15 ENCODE human hg38 maps..."
User: "What about K562 cells specifically?"
Claude: [Refines by Biosample]
"Found 8 K562 cell line maps matching your criteria..."
What it enables:
- Claude can help users choose between similar maps
- Claude can explain differences
- Claude can recommend based on use case
- Claude can warn about potential issues
Example interactions:
User: "What's the difference between map #3 and map #5?"
Claude: [Has both maps in context from previous search]
"Map #3 is from Lab A using Protocol X, while Map #5 is from Lab B
using Protocol Y. Map #3 has higher resolution but Map #5 has better
normalization. For your analysis of chromatin loops, I'd recommend
Map #5 because..."
User: "I need a map for studying enhancer-promoter interactions"
Claude: [Understands the use case]
[Reviews available maps in context]
"For enhancer-promoter interactions, I recommend Map #7 because it:
1. Uses high-resolution Hi-C protocol
2. Has good coverage of regulatory regions
3. Includes both contact and control maps
Would you like me to load it?"
User: "Load map #12"
Claude: [Checks map details]
"I notice Map #12 uses an older assembly (hg19) and the data is
from 2015. For comparison with recent datasets, you might want to
consider Map #15 which uses hg38. Should I load #12 anyway?"
What it enables:
- Claude can compare maps across 4DN and ENCODE
- Claude can suggest complementary datasets
- Claude can identify gaps or overlaps
- Claude can recommend best source for specific needs
Example:
User: "Compare what's available in 4DN vs ENCODE for human cells"
Claude: [Searches both sources]
[Compares results in context]
"4DN has 28 human cell maps focusing on 4D nucleome projects,
while ENCODE has 15 maps with broader cell type coverage.
4DN tends to have more recent data and specialized protocols,
while ENCODE has better standardization. For your specific
research on [topic], I'd recommend starting with..."
What it enables:
- Users can be imprecise and Claude will interpret intent
- Claude can ask clarifying questions
- Claude can suggest alternatives
- No need for exact terminology
Examples:
User: "Show me the best human maps"
Claude: [Interprets "best" based on context]
"I found several high-quality human Hi-C maps. 'Best' could mean:
- Highest resolution (Map #3)
- Most recent (Map #7)
- Most commonly used (Map #12)
Which criteria matter most for your work?"
User: "I need something similar to what I saw before"
Claude: [Remembers previous conversation context]
[Searches for similar maps based on previous selection]
"Based on the GM12878 map you loaded earlier, here are similar
lymphoblastoid cell line maps..."
What it enables:
- Claude can apply biological knowledge to searches
- Claude understands relationships (e.g., cell type hierarchies)
- Claude can suggest related experiments
- Claude can explain biological significance
Example:
User: "What maps are good for studying TADs?"
Claude: [Applies domain knowledge about TADs]
"For Topologically Associating Domains (TADs), you'll want:
- High-resolution Hi-C data (5kb or better)
- Good signal-to-noise ratio
- Multiple replicates if available
Here are the best options from the available maps..."
- Understands meaning, not just keywords
- Handles synonyms and related terms
- Interprets context and intent
- Suggests relevant filters based on results
- Explains why certain filters might help
- Combines multiple criteria intelligently
- Explains why results match the query
- Highlights relevant metadata
- Points out interesting patterns
- Suggests related searches
- Recommends complementary maps
- Warns about potential issues
- Natural back-and-forth refinement
- Remembers context across turns
- Adapts to user's level of expertise
- Can plan complex searches
- Can combine multiple criteria
- Can reason about trade-offs
User: "I'm studying chromatin organization in cancer cells. What should I look at?"
Claude: [Applies domain knowledge]
"For cancer chromatin organization, you'll want Hi-C maps from
cancer cell lines. Let me search for relevant datasets..."
[Searches: "cancer cells"]
"Found 12 maps from cancer cell lines. The most commonly studied
are K562 (chronic myeloid leukemia) and HeLa (cervical cancer)."
User: "What about comparing normal vs cancer?"
Claude: [Understands comparison need]
[Searches for both normal and cancer cell maps]
"Great idea! I found 8 cancer cell maps and 15 normal cell maps
with matching experimental conditions. For a proper comparison,
I'd recommend:
- Cancer: K562 map #5 (high quality, recent)
- Normal: GM12878 map #12 (same lab, similar protocol)
These are well-matched for comparison. Should I load both?"
User: "Actually, I need something more recent"
Claude: [Refines search, considers publication dates]
"The most recent cancer cell map is #8 from 2023, and the most
recent normal cell map is #15 from 2024. However, they use
different protocols, so comparison might be challenging.
Would you prefer recent data or matched protocols?"
- MCP Resources: Provide discoverable schema that Claude can read and understand
- MCP Tools: Enable Claude to execute searches and retrieve data
- Context Window: Allows Claude to see all results and reason about them
- Natural Language Processing: Claude's ability to understand intent and semantics
- Domain Knowledge: Claude's training includes biological knowledge
- Search Quality: The current filter is keyword-based; semantic search could be enhanced
- Data Freshness: Results depend on when data sources were last fetched
- Context Limits: Very large result sets might exceed context window
- Network Dependency: Requires network access to fetch data sources
Potential improvements that would leverage AI capabilities even more:
- Semantic Search: Use embeddings to find semantically similar maps
- Query Expansion: Automatically expand queries with synonyms
- Result Ranking: Use ML to rank results by relevance
- Personalization: Learn user preferences over time
- Proactive Discovery: Suggest maps based on user's research interests
Adding data sources via MCP isn't just about bundling data—it's about creating an intelligent, conversational interface to your data. The combination of:
- MCP Resources (discoverable schema)
- MCP Tools (executable actions)
- AI Context Window (full visibility)
- Natural Language Understanding (semantic interpretation)
...enables interactions that are:
- More natural (conversational vs. query syntax)
- More intelligent (context-aware vs. exact matches)
- More helpful (recommendations vs. just results)
- More flexible (vague queries vs. rigid requirements)
This is the power of AI-assisted data access through MCP.