SciDER: Scientific Data-centric End-to-end Researcher

Installation

You can install the project using pip:

# from git
pip install git+https://github.com/leonardodalinky/SciDER
# locally
pip install -e .

Example Usage:

from scider.default.models import register_gemini_medium_high_models
from scider.workflows import run_full_workflow

# 1. Register the models you want to use
register_gemini_medium_high_models()
# 2. Run the full workflow
wf = run_full_workflow(
    data_path="/path/to/data/",
    workspace_path="/path/to/workspace/",
    user_query="Discover insights about RAG",
)
# 3. The final state after the workflow
print(wf.final_summary)

Workflows

SciDER provides six workflows in scider.workflows:

| Workflow | Description |
| --- | --- |
| IdeationWorkflow | Generate research ideas from literature search. |
| DataWorkflow | Analyze a dataset and produce a structured summary. |
| HypoDataWorkflow | Generate synthetic data from a feature description, then analyze it. |
| ExperimentWorkflow | Implement and run an experiment given a data summary. |
| FullWorkflow | Data analysis -> experiment execution. |
| FullWorkflowWithIdeation | Ideation -> (optional) data analysis -> (optional) experiment. Each phase can be skipped via flags. |

Each workflow has a class form (FooWorkflow) and a convenience function (run_foo_workflow).

Configuration

The project is configured using environment variables. You can set these variables in a .env file at the root of the project. A template .env.template is provided for reference.

You can also set these variables directly in your shell session.
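
How a .env file and shell variables combine is not spelled out here, so the following is only a sketch of one common convention: values already set in the shell win, and the .env file fills in the rest. The helper name `load_env_file` is hypothetical and is not part of SciDER; the project may load its configuration differently.

```python
import os


def load_env_file(path, environ=os.environ):
    """Read KEY=VALUE lines from a .env-style file (hypothetical helper).

    Keys that already exist in `environ` are left untouched, so shell
    settings take precedence. This precedence is an assumption.
    Returns only the keys that were newly set.
    """
    loaded = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines, comments, and anything without an "=".
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key = key.strip()
            value = value.strip().strip("'\"")  # drop optional quotes
            if key not in environ:
                environ[key] = value
                loaded[key] = value
    return loaded
```

Calling `load_env_file(".env")` before importing the workflows would populate `os.environ` with any keys the shell has not already defined.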

Web UI

The web UI is a Streamlit application. Deploy it using the Dockerfile at the project root.

  1. Create a .env file at the project root (copy from .env.template) and fill in your API keys.

  2. Build the image:

docker build -t scider:latest .

  3. Run the container:

docker run -d \
  --name scider \
  -p 7860:7860 \
  --env-file .env \
  scider:latest

  4. Access the UI at http://localhost:7860.

UI Example:

[Screenshot: Launch Workflow. Select workflow type and get started.]
[Screenshot: Case Study. Case study selection and full workflow.]

Coding Backend

The experiment agent delegates code implementation to a coding subagent. Three backends are available, selectable via the CODING_AGENT_VERSION environment variable:

| Backend | Value | Description |
| --- | --- | --- |
| Claude Agent SDK (default) | claude_sdk | Delegates to Claude Agent SDK. Requires pip install claude-agent-sdk and ANTHROPIC_API_KEY. |
| Native | native | SciDER's built-in coding agent. Uses the experiment_coding model role with any LiteLLM-supported provider. No external dependencies. Pick this if you want a non-Claude provider (Gemini, GPT, etc.). |
| OpenHands | openhands | Delegates to OpenHands sandbox. Requires SCIDER_ENABLE_OPENHANDS=1. |

Set CODING_AGENT_VERSION in .env to switch backends.
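
The requirements listed for each backend can be checked before a long run starts. The sketch below is a hypothetical pre-flight check, not SciDER's own validation logic: `check_backend_env` is an invented name, and the only facts it relies on are the three values, the default, and the environment variables named above.

```python
import os

# The three values documented for CODING_AGENT_VERSION.
BACKENDS = ("claude_sdk", "native", "openhands")


def check_backend_env(environ=None):
    """Return the selected backend, or raise if its stated requirement is missing.

    Hypothetical helper for illustration only.
    """
    env = os.environ if environ is None else environ
    # claude_sdk is documented as the default backend.
    backend = env.get("CODING_AGENT_VERSION", "claude_sdk")
    if backend not in BACKENDS:
        raise ValueError(f"Unknown CODING_AGENT_VERSION: {backend!r}")
    if backend == "claude_sdk" and not env.get("ANTHROPIC_API_KEY"):
        raise RuntimeError("claude_sdk requires ANTHROPIC_API_KEY")
    if backend == "openhands" and env.get("SCIDER_ENABLE_OPENHANDS") != "1":
        raise RuntimeError("openhands requires SCIDER_ENABLE_OPENHANDS=1")
    return backend
```

The check covers environment variables only; whether the claude-agent-sdk package is installed would need a separate import test.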

Skills

Skills are markdown files with YAML frontmatter that inject domain-specific guidance into an agent's system prompt. Modeled after Claude Code, they can be either preloaded (full content injected) or on-demand (listed by name, loaded via the Skill tool when needed).

Discovery

On startup, SciDER walks up from the workspace directory to the filesystem root (plus ~), scanning .scider/skills/ at each level. Closer directories override identically-named skills from parents. Supported layouts:

.scider/skills/
├── my-skill/
│   ├── SKILL.md              # directory format — can bundle reference files
│   └── references/
│       └── usage.md
└── another.md                # single-file format
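
The walk-up and override behavior described above can be sketched as follows. This is an illustration under assumptions, not SciDER's implementation: `discover_skills` is a hypothetical name, the skill name is taken from the directory or file name rather than from the frontmatter, and the home directory is treated as the lowest-precedence location.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def discover_skills(workspace: Path, home: Path | None = None) -> dict[str, Path]:
    """Map skill name -> skill file, with closer directories overriding parents."""
    workspace = workspace.resolve()
    levels = [workspace, *workspace.parents]  # closest first, ends at the root
    if home is not None and home.resolve() not in levels:
        levels.append(home.resolve())  # assumption: home has lowest precedence
    found: dict[str, Path] = {}
    # Visit the farthest level first so that closer levels overwrite it.
    for level in reversed(levels):
        skills_dir = level / ".scider" / "skills"
        if not skills_dir.is_dir():
            continue
        for entry in sorted(skills_dir.iterdir()):
            if entry.is_dir() and (entry / "SKILL.md").is_file():
                found[entry.name] = entry / "SKILL.md"  # directory format
            elif entry.is_file() and entry.suffix == ".md":
                found[entry.stem] = entry  # single-file format
    return found


# Demo on a throwaway tree: the workspace copy of "my-skill" shadows the parent's.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    workspace = root / "project" / "workspace"
    for base, text in ((root, "parent"), (workspace, "child")):
        skill_dir = base / ".scider" / "skills" / "my-skill"
        skill_dir.mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text(text)
    (root / ".scider" / "skills" / "another.md").write_text("single-file")
    skills = discover_skills(workspace)
    contents = {name: path.read_text() for name, path in skills.items()}
```

In the demo, `contents["my-skill"]` holds the workspace version, while `another` is still picked up from the parent directory because nothing closer redefines it.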

Frontmatter fields:

---
name: my-skill
description: One-line summary shown in the on-demand listing.
allowed_agents: [data, experiment]   # omit → available to all agents
preload_for: [data]                  # omit → on-demand only (must be called via Skill tool)
---

For directory-format skills, SciDER automatically injects Base directory for this skill: <absolute path> at the top of the content so the model can resolve relative file references (e.g. references/usage.md) via the Read tool.

Dynamic Registration

You can also register skills programmatically, overriding frontmatter fields:

from scider.core.skills import SkillRegistry

# Single directory
SkillRegistry.instance().register_skill_dirs(
    "path/to/my-skill",
    allow=["experiment", "native_coding"],
    preload_for=["experiment"],
)

# Multiple directories at once
SkillRegistry.instance().register_skill_dirs(
    ["path/to/skill-a", "path/to/skill-b"],
    allow=["data"],
)

allow restricts which agents see the skill; preload_for controls which agents get the full content in their system prompt. Both take a list of agent names, typed as a Literal of the valid names (ideation, data, experiment, experiment_coding, native_coding, critic, paper_search) for static type checking. Passing None for either keeps the value from the SKILL.md frontmatter.
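
The override rule can be pictured with a short sketch. `AgentName` and `resolve_agents` are hypothetical names chosen for illustration; only the list of agent names and the "None keeps the frontmatter value" behavior come from the description above.

```python
from __future__ import annotations

from typing import Literal, Optional, Sequence, get_args

# The valid agent names listed above, as a Literal for static type checkers.
AgentName = Literal[
    "ideation", "data", "experiment", "experiment_coding",
    "native_coding", "critic", "paper_search",
]


def resolve_agents(
    override: Optional[Sequence[AgentName]],
    from_frontmatter: Optional[list],
) -> Optional[list]:
    """Pick the effective agent list for `allow` or `preload_for`."""
    if override is None:
        return from_frontmatter  # None keeps the SKILL.md value
    # Runtime check to mirror what a type checker would flag statically.
    unknown = [name for name in override if name not in get_args(AgentName)]
    if unknown:
        raise ValueError(f"Unknown agent names: {unknown}")
    return list(override)
```

A type checker would reject a misspelled name such as "experimnt" at the call site; the runtime check is there only for callers that skip static analysis.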

Development Guide

First, install pre-commit:

pip install pre-commit

Then install the git hooks so that code is formatted on each commit:

pre-commit install

Then, copy .env.template to .env and fill in the necessary values.

Finally, sync dependencies using the extra that matches your environment:

# for cpu
uv sync --extra cpu

# for mac
uv sync --extra mac

# for gpu
uv sync --extra cu128

# streamlit client
uv sync --extra streamlit

Run tests with:

uv run pytest tests/

Benchmarks

See BENCHMARKS for details on the benchmarks we have conducted to evaluate SciDER's performance.

Feedback and Contributions

We welcome contributions to improve SciDER. Please open an issue or submit a pull request on our GitHub repository.

Feedback on the project is also greatly appreciated. You can fill out the feedback form to rate this app and help improve the project.
