Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 64 additions & 0 deletions config/defaults/presets.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
# DeepSearch Configuration Presets
# Default configurations for various research scenarios
#
# Each preset combines provider, model, parameters, timeout, and prompt template
# for optimized performance in specific use cases.

presets:
perplexity-sonar-pro:
description: "Academic research optimized for Perplexity with high reasoning effort"
provider: perplexity
model: sonar-reasoning-pro
timeout: 180
prompt_template: gene_analysis_academic
provider_params:
return_citations: true
reasoning_effort: high # low, medium, high
search_recency_filter: month # hour, day, week, month, year
search_domain_filter:
- "pubmed.ncbi.nlm.nih.gov"
- "ncbi.nlm.nih.gov/pmc/"
- "www.ncbi.nlm.nih.gov"
- "europepmc.org"
- "biorxiv.org"
- "nature.com"
- "cell.com"
- "science.org"
system_prompt: null # Will be set dynamically with JSON schema

perplexity-deep-research_JSON:
description: "Academic research optimized for Perplexity deep search, returning structured results."
provider: perplexity
model: sonar-deep-research
timeout: 500
prompt_template: gene_analysis_deep_research
provider_params:
return_citations: true
# reasoning_effort: high # low, medium, high. Not applicable to deep-research
search_recency_filter: null # hour, day, week, month, year
search_domain_filter:
- "pubmed.ncbi.nlm.nih.gov"
- "ncbi.nlm.nih.gov/pmc/"
- "www.ncbi.nlm.nih.gov"
- "europepmc.org"
- "biorxiv.org"
- "nature.com"
- "cell.com"
- "science.org"
system_prompt: Respond with JSON conforming to the provided schema - no prose, no markdown. If you are unable to respond with JSON alone, make sure the text includes code-fenced schema compliant JSON covering all report content. If that is not possible, ensure the report includes tables that unambiguously capture the elements and their associations that would be captured if returning schema compliant JSON. Provided schema ```JSON {schema}```
# Example additional preset for fast analysis
perplexity-fast:
description: "Fast analysis with lower reasoning effort for quick experiments"
provider: perplexity
model: sonar-small
timeout: 60
prompt_template: gene_analysis_structured
provider_params:
return_citations: true
reasoning_effort: low
search_recency_filter: week
search_domain_filter:
- "pubmed.ncbi.nlm.nih.gov"
- "ncbi.nlm.nih.gov/pmc/"
- "nature.com"
system_prompt: null
120 changes: 120 additions & 0 deletions config/defaults/templates.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
# DeepSearch Prompt Templates
# Reusable prompt templates for different analysis approaches and provider optimizations
#
# Templates use {genes} and {context} placeholders that will be substituted at runtime

templates:
gene_analysis_academic:
description: "Academic research-focused analysis with comprehensive literature review strategy"
optimized_for: [perplexity, consensus]
supports_json_schema: true
schema_instruction: |
You are an expert biologist. Analyze the provided genes in the given biological context.

CRITICAL: Respond ONLY with valid JSON that exactly follows this schema structure:
{schema}

Do not include any prose, markdown, explanatory text, or <think> tags. Only the JSON structure.
template: |
Perform comprehensive literature analysis for the following gene list in the
specified biological context.

**Gene List**: {genes}

**Biological Context**: {context}

**Analysis Strategy**:
1. Search current scientific literature for functional roles of each gene in the input list
2. Identify clusters of genes that act together in pathways, processes, or cellular states
3. Treat each cluster as a potential gene program within the list
4. Interpret findings in light of both normal physiological roles and disease-specific alterations
5. Prioritize well-established functions with strong literature support, but highlight emerging
evidence if contextually relevant

**Guidelines**:
* Anchor all predictions in either the normal physiology and development of the cell type and
tissue specified in the context OR the alterations and dysregulations characteristic of the
specified disease
* Connect gene-level roles to program-level implications
* Consider gene interactions, regulatory networks, and pathway dynamics
* Highlight cases where multiple genes collectively strengthen evidence
* Ensure all claims are backed by experimental evidence with proper attribution

Provide a structured analysis identifying biological programs and their predicted cellular
impacts within the given context.

gene_analysis_structured:
description: "Structured analysis optimized for clear, systematic gene program identification"
optimized_for: [openai, edison]
supports_json_schema: true
schema_instruction: |
You are an expert biologist conducting systematic gene analysis.

Please structure your response as JSON following this exact format:
{schema}

Ensure all required fields are populated with accurate biological data and evidence.
template: |
Analyze the following genes in the specified biological context and provide
structured findings.

**Genes to Analyze**: {genes}

**Context**: {context}

**Required Analysis**:
1. For each gene, identify its primary biological functions
2. Group genes by common pathways or biological processes
3. Identify potential gene programs (clusters of functionally related genes)
4. Assess the relevance to the specified biological context

**Output Requirements**:
- Focus on well-established, peer-reviewed findings
- Prioritize recent research when available
- Include pathway and process associations
- Highlight gene-gene interactions where relevant

Provide a systematic analysis organized by biological programs and functional clusters.

gene_analysis_deep_research:
description: "Deep research analysis optimized for Perplexity deep research models with comprehensive findings"
optimized_for: [perplexity]
supports_json_schema: true
schema_instruction: |
You are an expert biologist conducting comprehensive deep research on genes in biological context.

Please format your detailed analysis as JSON following this structure:
{schema}

Include thorough findings with proper citations and evidence. Ensure all analysis is backed by current scientific literature.
template: |
Conduct comprehensive deep research and literature analysis for the following gene list within the specified biological context.

**Gene List**: {genes}

**Biological Context**: {context}

**Deep Research Strategy**:
1. Perform extensive literature search across current scientific databases
2. Analyze functional roles and molecular mechanisms of each gene
3. Identify functional clusters and biological programs within the gene list
4. Examine gene interactions, regulatory networks, and pathway dynamics
5. Assess both normal physiological roles and disease-associated alterations
6. Prioritize well-established functions while highlighting emerging evidence

**Research Guidelines**:
* Conduct thorough investigation of recent publications and reviews
* Focus on experimental evidence and peer-reviewed findings
* Consider tissue-specific and context-dependent gene functions
* Analyze gene co-expression patterns and functional annotations
* Include both direct and indirect gene interactions
* Provide detailed citations for all claims and findings

**Output Requirements**:
* Comprehensive analysis with detailed biological insights
* Strong evidence base with proper attribution
* Clear identification of biological programs and functional clusters
* Assessment of predicted cellular impacts within the given context
* Integration of findings into coherent biological narrative

Provide a thorough, evidence-based analysis identifying biological programs and their predicted cellular impacts within the specified context.
71 changes: 71 additions & 0 deletions config/schemas/preset-schema.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
# JSON Schema for validating DeepSearch preset configurations
$schema: http://json-schema.org/draft-07/schema#
title: DeepSearch Preset Configuration Schema
description: Schema for validating DeepSearch configuration presets

type: object
properties:
presets:
type: object
patternProperties:
"^[a-zA-Z0-9_-]+$": # Preset names: alphanumeric, underscore, hyphen
type: object
required:
- description
- provider
- model
- timeout
- prompt_template
- provider_params
properties:
description:
type: string
minLength: 10
description: "Human-readable description of the preset"

provider:
type: string
enum: [perplexity, openai, anthropic, edison, consensus]
description: "Research provider name"

model:
type: string
minLength: 1
description: "Model name for the provider"

timeout:
type: integer
minimum: 30
maximum: 600
description: "Request timeout in seconds"

prompt_template:
type: string
minLength: 1
description: "Name of prompt template to use"

provider_params:
type: object
description: "Provider-specific parameters"
properties:
return_citations:
type: boolean
reasoning_effort:
type: string
enum: [low, medium, high]
search_recency_filter:
type: [string, "null"]
enum: [hour, day, week, month, year, null]
search_domain_filter:
type: array
items:
type: string
format: hostname
system_prompt:
type: [string, "null"]
additionalProperties: true # Allow provider-specific params
additionalProperties: false
additionalProperties: false
required:
- presets
additionalProperties: false
49 changes: 49 additions & 0 deletions config/schemas/template-schema.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
# JSON Schema for validating DeepSearch template configurations
$schema: http://json-schema.org/draft-07/schema#
title: DeepSearch Template Configuration Schema
description: Schema for validating DeepSearch prompt templates

type: object
properties:
templates:
type: object
patternProperties:
"^[a-zA-Z0-9_-]+$": # Template names: alphanumeric, underscore, hyphen
type: object
required:
- description
- optimized_for
- supports_json_schema
- template
properties:
description:
type: string
minLength: 10
description: "Human-readable description of the template"

optimized_for:
type: array
items:
type: string
enum: [perplexity, openai, anthropic, edison, consensus]
minItems: 1
description: "List of providers this template is optimized for"

supports_json_schema:
type: boolean
description: "Whether this template supports structured JSON output"

schema_instruction:
type: string
minLength: 20
description: "Custom instruction for schema compliance, use {schema} placeholder"

template:
type: string
minLength: 50
description: "The prompt template text with {genes} and {context} placeholders"
additionalProperties: false
additionalProperties: false
required:
- templates
additionalProperties: false
3 changes: 3 additions & 0 deletions config/user/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# User configuration overrides - not tracked in git
*.yaml
*.yml
1 change: 1 addition & 0 deletions inputs/glioblastoma/test_context.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
malignant glioblastoma cells
1 change: 1 addition & 0 deletions inputs/glioblastoma/test_genes.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
"CFAP43", "NEGR1", "DNAH12", "LRRC2", "VAT1L", "ZNF804B", "RBMS3", "SLC14A1", "GABRA5", "ZBBX", "ADAMTS18", "CFAP52", "GRM1", "MAP3K19", "FHAD1", "TCTEX1D1", "DNAAF1", "DCDC2", "AC005165.1", "COL21A1", "PKHD1", "ZNF521", "EPB41L4B", "ERICH3", "PLAGL1", "EXPH5", "SHISAL2B", "SATB1-AS1", "RERGL", "FRMPD2", "TOGARAM2", "AP003062.2", "BMP6", "NRG3", "CFAP61", "FAM81B", "SLC47A2", "TMEM232", "NWD2", "AC109466.1", "GABRG3", "DTHD1", "COL13A1", "COL23A1", "CFAP73", "RFTN1", "FYB2", "POSTN", "AL513323.1", "BANK1", "CHD5", "THBS1", "ADCY8", "ADGB", "AFF2", "DRC1", "CFAP206", "CFAP47", "PPM1H", "KIAA2012", "MAP7", "KSR2", "DNAH5", "LYPD6B", "WSCD2", "CACNA2D1", "LRRIQ1", "CPNE4", "LINC01088", "SCIN", "PRMT8", "LINGO2", "CASC1", "CCDC170", "AC092110.1", "VWA3A", "CA10", "AC013470.2", "SLC22A3", "GRM4", "COL26A1", "CFAP221", "CFAP157", "TTC29", "C7orf57", "HMCN1", "CFAP100", "U91319.1", "RSPH1", "NAALAD2", "IL6R", "CDH7", "KCNJ3", "AL356108.1"
16 changes: 16 additions & 0 deletions planning/backwards_compatibility_bug_ds.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@

scripts/run_deepsearch.py
Comment on lines +457 to +459
```python

# Backward compatibility
service = DeepSearchService(preferred_provider=args.preferred_provider, **config_overrides)
```

Copilot AI
2 minutes ago
When using backward compatibility mode (no preset specified) and providing config_overrides, the overrides are applied but there's a logical issue: if args.preferred_provider is specified along with args.model or provider parameter overrides, the initialization on line 458 passes both preferred_provider and **config_overrides.

Looking at the DeepSearchService code, when preset is None, it loads the default "perplexity-sonar-pro" preset and only applies the preferred_provider override if it's specified (line 55-57 in deepsearch_service.py). However, the **config_overrides passed here are ignored in backward compatibility mode because the service doesn't merge them.

This means that in backward compatibility mode, --model, --reasoning-effort, and --search-recency arguments will be silently ignored, which could be confusing for users. Either document this limitation or ensure config_overrides are also applied in backward compatibility mode.
Loading
Loading