Teaching AI Agents to Think Like Domain Experts

A backtesting framework + agent scaffolding for quantitative research

What is this?

This is both a CLI-first backtesting library for cTrader AND an experiment in agent-driven research workflows.

The Core Library registers C# trading strategies, builds them with the .NET SDK, runs backtests via the cTrader CLI, parses the results, and persists everything to DuckDB for analysis. It also covers the practical needs: parameter sweeps, a plugin architecture, and resume/retry support.

The .cursor/ Framework teaches AI agents (like Cursor's Claude) to operate as domain researchers, not just code generators. Rules enforce research discipline (data validity gates, leakage checks), hooks provide runtime enforcement (workflow gates), and skills enable reusable capabilities (track management, checkpoints). Experiments get complete audit trails from hypothesis → decision.

In practice: validation gates catch a lot of “looks fine but is wrong” failures early (e.g. empty/shifted data windows that would otherwise produce phantom signals).

This isn't just about running backtests—it's about building institutional memory, enforcing research discipline, and scaling workflows across conversations and time.

Key Features

Core Backtesting Library

CLI-first Python harness for cTrader strategies (C# cBots)
Register → Build → Run workflow with deterministic artifact paths
Materialize .cbotset files from JSON parameter sets
Direct cTrader CLI invocation for backtests
Parse report.html → DuckDB schema (runs, trades, daily_metrics)
Plugin architecture for strategy-specific extensions
Parameter sweeps with parallel execution, resume, and retry

Agent Framework (`.cursor/` Setup)

Rules: Domain discipline (research gates, validation requirements, value-budget enforcement)
Hooks: Runtime enforcement (workflow gates, documentation requirements, command audit)
Skills: Reusable capabilities (track-router, checkpoint-writer, results-summarizer)
Experiments registry: Complete audit trail (plan.md → sanity.md → results.md → decision.md)
Methods registry: Self-updating playbook (METHODS.yml + USAGE_LOG.ndjson + auto-generated scorecard)
Track system: Context isolation across different research streams

What Makes This Different?

Traditional backtesting frameworks help you run code.

This framework helps agents think, using hard gates (data validity, leakage checks), institutional memory (methods registry), and artifact-driven workflows.

It's not just backtesting; it's a complete research loop that scales across conversations and time.

The Data Window Validity Gate alone has caught more silent failures than anything else—wrong timezones leading to empty data windows, features all zeros, models learning nonsense. Without this gate, you waste days chasing phantom signals.

See documentation/ARTICLE_v2.md for the full story of how this enables agents to run research loops autonomously.

Prerequisites

System Requirements

macOS (cTrader requirement - may work on Windows but untested)
Python 3.11+ (repo pins 3.13 in .python-version)
.NET SDK 6.0+ (for building cBots) - Download
cTrader installed (/Applications/cTrader.app on macOS) - Download

cTrader Account

cTrader account with backtesting access
Account credentials: CTID, Account ID, Auth Token
Get credentials from: cTrader → Tools → Settings → Account → Copy IDs

Optional

Cursor IDE (to use the .cursor/ agent framework) - Download
DuckDB CLI (for advanced queries) - Download

Installation

Step 1: Clone the repo

git clone <your-repo-url>
cd ctrader-orchestrator

Step 2: Install Python dependencies (recommended: `uv`)

This repo is a uv workspace (root pyproject.toml + uv.lock, with the installable package in core/).

From repo root:

uv venv env/.venv --clear
# Make uv happy from any subdirectory: uv expects a project env at `.venv/`.
ln -sfn env/.venv .venv
uv sync

Run the CLI (no activation required):

uv run python run_cli.py --help

Tip (Cursor agent usage): When running Python commands in this repo, prefer uv run ... so the agent uses the already-synced workspace environment (and avoids “missing module” errors).

Optional extras:

# tests
uv sync --extra dev

# richer terminal UI (Rich)
uv sync --extra tui

Details: docs/UV_WORKSPACE.md.

Step 2 (alternative): Install with venv + pip

python3 -m venv .venv
source .venv/bin/activate
pip install -e core

Step 3: Set up credentials

Create .env file in repo root:

CTRADER_DEFAULT_CTID=your_ctid_here
CTRADER_DEFAULT_ACCOUNT=your_account_id_here
CTRADER_DEFAULT_AUTH_TOKEN=your_auth_token_here

Security note: Never commit .env to git. It's already in .gitignore.

Notes:

The CLI loads .env automatically (see core/src/ctrader_orchestrator/core/env.py).
Credentials are grouped into profiles: scenarios default to profile=DEFAULT (see scenario-add and Scenario.profile).
Template: env/examples/env_credentials_only.example (non-secret).
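
As a rough illustration of how profile-scoped credentials could be resolved, the sketch below reads variables shaped like the ones in the .env example. The CTRADER_<PROFILE>_<FIELD> naming pattern is inferred from that example, and load_profile is a hypothetical helper, not the loader in env.py.

```python
import os

# Field suffixes taken from the .env example above.
FIELDS = ("CTID", "ACCOUNT", "AUTH_TOKEN")


def load_profile(profile="DEFAULT", environ=None):
    """Return credentials for one profile, or raise if any are missing.

    Assumes variables are named CTRADER_<PROFILE>_<FIELD>; this pattern is
    inferred from the example, not confirmed by the library.
    """
    environ = os.environ if environ is None else environ
    creds, missing = {}, []
    for field in FIELDS:
        key = f"CTRADER_{profile.upper()}_{field}"
        value = environ.get(key, "").strip()
        if value:
            creds[field.lower()] = value
        else:
            missing.append(key)
    if missing:
        # Report variable names only, never the secret values.
        raise KeyError(f"missing credentials: {', '.join(missing)}")
    return creds
```

Failing loudly on a missing variable keeps a half-configured profile from reaching a backtest run.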

Step 4: Verify installation

python run_cli.py --help

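You should see the CLI help output. If Python cannot import the package, run the command through uv run (as in Step 2) or activate the virtual environment first. If you still get an error, check that Python 3.11+ and the .NET SDK are installed.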
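
If the help command fails, a small script can narrow down which prerequisite is missing. This is a minimal sketch, not part of the repo: check_prerequisites is a hypothetical helper, and it only tests the Python version and whether a dotnet executable is on PATH.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 11)):
    """Return a list of human-readable problems; an empty list means all passed."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )
    # The .NET SDK installs a `dotnet` executable; it must be on PATH.
    if shutil.which("dotnet") is None:
        problems.append("dotnet not found on PATH (install the .NET SDK)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("PROBLEM:", problem)
```

It does not check the installed SDK version or the cTrader app itself; those still need a manual look.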

Quick Start: Your First Backtest

Let's run a complete backtest using the included SMA crossover example. Time: ~5 minutes

Where outputs go (artifacts root)

By default, this CLI writes all generated outputs under:

research/workspaces/research_1/artifacts/

You can override the location per command with --artifacts-root <path>.

For repeatable runs, set an explicit artifacts root once:

ARTIFACTS_ROOT="$PWD/research/workspaces/research_1/artifacts"

Step 1: Register a strategy

python run_cli.py strategy-add \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --strategy-id sma_crossover_v1 \
  --source-dir research/shared/strategies/sma_crossover_v1/sma_crossover_v1 \
  --plugin-id sma_crossover

This registers the strategy and links it to the sma_crossover plugin for custom analysis.

Step 2: Build the strategy

python run_cli.py build-dotnet \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --strategy-id sma_crossover_v1 \
  --csproj research/shared/strategies/sma_crossover_v1/sma_crossover_v1/sma_crossover_v1.csproj \
  --configuration Release \
  --tfm net6.0 \
  --output-name sma_crossover_v1.algo

Output: You'll see a build_id like build-20260119-abc123. Save this for the next step.

The .algo file is now under the artifacts root at:

research/workspaces/research_1/artifacts/builds/sma_crossover_v1/<build_id>/

Step 3: Add parameters and scenario

Parameters define your strategy settings (fast period, slow period, etc.):

python run_cli.py params-add \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --strategy-id sma_crossover_v1 \
  --file research/shared/examples/sma_crossover_params_v1.json

Output: params_id like params-abc123. Save this.

Scenario defines the backtest window and market data:

python run_cli.py scenario-add \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --file research/shared/examples/sma_crossover_scenario_v1.json

Output: scenario_id like scenario-def456. Save this.

What's in these files?

sma_crossover_params_v1.json: FastPeriod=5, SlowPeriod=20, VolumeInUnits=1
sma_crossover_scenario_v1.json: NAS100_SB, M1 bars, 3-hour window (Jan 2, 2024)

Step 4: Run the backtest

python run_cli.py run-job \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --strategy-id sma_crossover_v1 \
  --build-id <build_id_from_step_2> \
  --params-id <params_id_from_step_3> \
  --scenario-id <scenario_id_from_step_3> \
  --ctrader-bin /Applications/cTrader.app/Contents/MacOS/cTrader.Mac

Output: job_id like job-ghi789. Save this - you'll use it to query results.

The backtest is now running. Depending on data size, this takes 30 seconds to a few minutes.
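
When scripting the quick start instead of typing it, the same command can be assembled in Python. The sketch below uses only the flags shown in Step 4; run_job_argv and run_job are hypothetical helpers, and because the exact format of the CLI output is not specified here, the sketch returns raw stdout rather than trying to extract the job_id.

```python
import subprocess
import sys

# Path copied from the command above; adjust if cTrader lives elsewhere.
CTRADER_BIN = "/Applications/cTrader.app/Contents/MacOS/cTrader.Mac"


def run_job_argv(artifacts_root, strategy_id, build_id, params_id, scenario_id,
                 ctrader_bin=CTRADER_BIN):
    """Assemble the run-job command using only the flags shown in Step 4."""
    return [
        sys.executable, "run_cli.py", "run-job",
        "--artifacts-root", str(artifacts_root),
        "--strategy-id", strategy_id,
        "--build-id", build_id,
        "--params-id", params_id,
        "--scenario-id", scenario_id,
        "--ctrader-bin", ctrader_bin,
    ]


def run_job(**kwargs):
    """Run the CLI from the repo root and return its captured stdout."""
    # check=True raises CalledProcessError if the CLI exits non-zero.
    done = subprocess.run(run_job_argv(**kwargs), check=True,
                          capture_output=True, text=True)
    return done.stdout
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.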

Step 5: Query the results

View run summary:

python run_cli.py sql \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --query "SELECT * FROM runs WHERE job_id='<job_id>'"

View all trades:

python run_cli.py sql \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --query "
  SELECT 
    open_time, 
    close_time, 
    direction, 
    entry_price,
    exit_price,
    pnl 
  FROM trades 
  WHERE job_id='<job_id>' 
  ORDER BY open_time
"

Aggregate metrics:

python run_cli.py sql \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --query "
  SELECT 
    COUNT(*) as n_trades, 
    SUM(pnl) as total_pnl, 
    AVG(pnl) as avg_pnl,
    MIN(pnl) as worst_trade,
    MAX(pnl) as best_trade
  FROM trades 
  WHERE job_id='<job_id>'
"

Step 6: Inspect detailed results

python run_cli.py inspect-job --job-id <job_id>

This shows a formatted summary including:

Strategy configuration
Time window
Trade count and PnL breakdown
Plugin-specific analysis (if available)

Congratulations! You've run your first backtest and queried the results.
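
To experiment with the Step 5 aggregate query before you have a finished backtest, you can run it against a tiny in-memory table. The sketch below uses Python's built-in sqlite3 as a stand-in for DuckDB, creates only the two trades columns that query touches, and fills them with made-up illustration values, not real backtest output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the columns the aggregate query needs; the real schema has more.
conn.execute("CREATE TABLE trades (job_id TEXT, pnl REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [
        ("job-demo", 12.5),
        ("job-demo", -4.0),
        ("job-demo", 3.5),
        ("job-other", 100.0),  # belongs to another job and must be ignored
    ],
)

QUERY = """
SELECT COUNT(*) AS n_trades,
       SUM(pnl) AS total_pnl,
       AVG(pnl) AS avg_pnl,
       MIN(pnl) AS worst_trade,
       MAX(pnl) AS best_trade
FROM trades
WHERE job_id = ?
"""

# Bind the job id as a parameter instead of pasting it into the SQL string.
summary = conn.execute(QUERY, ("job-demo",)).fetchone()
print(summary)  # (3, 12.0, 4.0, -4.0, 12.5)
```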

Next Steps: Parameter Sweeps

Want to test multiple parameter combinations? Parameter sweeps let you run batches of backtests with different settings.

Create a params file

Create my_sweep.jsonl (one JSON object per line):

{"FastPeriod": 5, "SlowPeriod": 20, "VolumeInUnits": 1}
{"FastPeriod": 10, "SlowPeriod": 30, "VolumeInUnits": 1}
{"FastPeriod": 15, "SlowPeriod": 45, "VolumeInUnits": 1}
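
For larger sweeps, writing the file by hand gets tedious. A minimal sketch for generating it from a grid is below; sweep_lines and write_sweep are hypothetical helpers, and the fast < slow filter is an assumption about which combinations are worth running for a crossover strategy.

```python
import itertools
import json


def sweep_lines(fast_periods, slow_periods, volume=1):
    """Yield one JSON line per (fast, slow) pair where fast < slow."""
    for fast, slow in itertools.product(fast_periods, slow_periods):
        if fast < slow:  # skip pairs where the "fast" SMA is not the shorter one
            yield json.dumps(
                {"FastPeriod": fast, "SlowPeriod": slow, "VolumeInUnits": volume}
            )


def write_sweep(path, fast_periods, slow_periods, volume=1):
    """Write the grid to a .jsonl file and return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for line in sweep_lines(fast_periods, slow_periods, volume):
            fh.write(line + "\n")
            count += 1
    return count
```

Calling write_sweep("my_sweep.jsonl", [5, 10, 15], [20, 30, 45]) writes the full 3 x 3 grid, which is a superset of the three hand-picked lines above.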

Run the sweep

python run_cli.py run-sweep \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --strategy-id sma_crossover_v1 \
  --build-id <build_id> \
  --scenario-id <scenario_id> \
  --params-jsonl my_sweep.jsonl \
  --ctrader-bin /Applications/cTrader.app/Contents/MacOS/cTrader.Mac \
  --parallel 2 \
  --resume \
  --continue-on-error

Options explained:

--parallel 2: Run 2 backtests simultaneously
--resume: Skip already-completed jobs (useful if sweep fails mid-run)
--continue-on-error: Don't stop if one job fails
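
To make the interaction of these three options concrete, here is a small sketch of a sweep loop with the same semantics. It is not the library's implementation: run_sweep is a hypothetical function, run_one stands in for a single backtest, and tracking completed jobs by their index in the params file is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_sweep(param_sets, run_one, completed=(), parallel=2, continue_on_error=True):
    """Run run_one(params) for each parameter set.

    completed: indices finished in an earlier run (resume skips them).
    Returns (results, errors) as dicts keyed by index in param_sets.
    """
    done = set(completed)
    pending = [(i, p) for i, p in enumerate(param_sets) if i not in done]
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {pool.submit(run_one, p): i for i, p in pending}
        for fut in as_completed(futures):
            i = futures[fut]
            try:
                results[i] = fut.result()
            except Exception as exc:
                if not continue_on_error:
                    # Cancel jobs that have not started, then surface the failure.
                    for other in futures:
                        other.cancel()
                    raise
                errors[i] = exc
    return results, errors
```

With continue_on_error enabled, failures are collected per job so the remaining combinations still run, and a later call can pass the finished indices as completed to pick up where the sweep stopped.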

Compare results

python run_cli.py sql \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --query "
  SELECT 
    r.job_id,
    r.params_json->>'FastPeriod' as fast_period,
    r.params_json->>'SlowPeriod' as slow_period,
    COUNT(t.trade_id) as n_trades,
    SUM(t.pnl) as total_pnl,
    AVG(t.pnl) as avg_pnl
  FROM runs r
  LEFT JOIN trades t ON r.job_id = t.job_id
  WHERE r.strategy_id = 'sma_crossover_v1'
  GROUP BY r.job_id, fast_period, slow_period
  ORDER BY total_pnl DESC
"

This shows which parameter combinations performed best.

The `.cursor` Agent Framework

What is this?

The .cursor/ directory contains rules, hooks, and skills that teach AI agents (like Cursor's Claude) to operate as domain researchers—not just code generators.

Key Components

.cursor/
├── rules/           # Domain discipline (research gates, validation)
│   ├── quant_research_loop.mdc
│   ├── 00-core.mdc
│   ├── 01-value-budget.mdc
│   └── ...
├── hooks/           # Runtime enforcement (workflow gates)
│   ├── gate_shell.py
│   ├── gate_readfile.py
│   └── ...
├── skills/          # Reusable capabilities
│   ├── track-router/
│   ├── track-checkpoint-writer/
│   └── ...
└── commands/        # Agent-invokable workflows
    └── research/

Example: Data Window Validity Gate

Rules in quant_research_loop.mdc enforce that before any experiment proceeds, the agent must verify:

Observation counts per window (mean/p1/p50/p99) - no empty data
Timestamp coverage (min/max within each window) - correct timezone, no gaps
Non-degeneracy (% zeros, #unique values, variance/IQR) - not all zeros or constants

Why this matters: Wrong timezone → empty data windows → features all zeros → model learns nonsense. The gate catches this immediately before you waste days chasing phantom signals.
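
The three checks can be sketched in a few lines of Python. This is an illustration of the idea, not the rule file's logic: check_window is a hypothetical function, the thresholds are placeholder assumptions, and non-degeneracy is reduced to a zero fraction and a unique-value count.

```python
from datetime import datetime, timedelta, timezone


def check_window(timestamps, values, start, end, min_obs=1,
                 max_zero_frac=0.95, max_edge_gap=timedelta(minutes=5)):
    """Return a list of gate failures for one data window (empty list = pass)."""
    failures = []
    inside = [(t, v) for t, v in zip(timestamps, values) if start <= t < end]

    # 1. Observation count: an empty window is the classic timezone symptom.
    if len(inside) < max(min_obs, 1):
        failures.append(f"too few observations: {len(inside)}")
        return failures  # nothing else can be checked on an empty window

    # 2. Timestamp coverage: data should reach both edges of the window.
    first = min(t for t, _ in inside)
    last = max(t for t, _ in inside)
    if first - start > max_edge_gap or end - last > max_edge_gap:
        failures.append(f"coverage gap: data spans {first} to {last}")

    # 3. Non-degeneracy: mostly-zero or constant values carry no signal.
    vals = [v for _, v in inside]
    zero_frac = sum(1 for v in vals if v == 0) / len(vals)
    if zero_frac > max_zero_frac:
        failures.append(f"{zero_frac:.0%} of values are zero")
    if len(set(vals)) == 1:
        failures.append("values are constant")
    return failures
```

Feeding it timezone-aware timestamps and a window shifted by a few hours reproduces the failure chain described above: the window comes back empty and the gate stops the run at the first check.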

Try it yourself

If you use Cursor IDE:

Open this repo in Cursor
Ask the agent: "Run an SMA crossover backtest and analyze the results"
Watch it use the CLI, check validation gates, and produce experiment artifacts

The agent will:

Use the track-router skill to isolate this work
Run the backtest via CLI
Parse results and check data validity
Create an experiment folder with plan.md, results.md, decision.md
Update the methods registry with usage logs

Workspaces vs experiments (important)

research/workspaces/<workspace>/...: execution/work areas for large outputs and tool/phase-specific research artifacts (often messy and big).
research/experiments/EXP-.../: repo-wide system-of-record experiment folders (plan/sanity/results/decision + RUN_METADATA.json) that link to workspace outputs and tmp/agent/<TRK>/... logs.

Rule of thumb: do heavy work in a workspace, and create an EXP-* folder whenever a run matters (decisions, comparisons, reproducibility).
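
A minimal sketch of what creating such a record might look like is below. scaffold_experiment is a hypothetical helper, and the keys written to RUN_METADATA.json are illustrative assumptions, since the actual metadata schema is not described here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Stage files named in the experiments registry description.
STAGE_FILES = ("plan.md", "sanity.md", "results.md", "decision.md")


def scaffold_experiment(experiments_dir, exp_id, workspace_outputs):
    """Create an EXP folder with stub stage files and a metadata stub."""
    exp_dir = Path(experiments_dir) / exp_id
    exp_dir.mkdir(parents=True, exist_ok=False)  # never overwrite a record
    for name in STAGE_FILES:
        stage = name[:-3]  # strip ".md"
        (exp_dir / name).write_text(f"# {exp_id}: {stage}\n", encoding="utf-8")
    metadata = {
        # Hypothetical keys; the real RUN_METADATA.json may differ.
        "experiment_id": exp_id,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "workspace_outputs": [str(p) for p in workspace_outputs],
    }
    (exp_dir / "RUN_METADATA.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    return exp_dir
```

The folder stores links to workspace outputs rather than copies, which keeps the system of record small while the heavy files stay in the workspace.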

Docs:

documentation/quant_research_loop_workflow.md
documentation/cursor_agent_setup.md

See documentation/ARTICLE_v2.md for the complete story of how this framework enables agents to run research loops autonomously.

Project Structure

ctrader-orchestrator/
├── .cursor/                        # Cursor agent framework (rules/hooks/skills/commands)
├── core/                           # Core Python library
│   ├── src/ctrader_orchestrator/  # CLI + adapters + plugins
│   │   ├── cli.py                 # Main CLI commands
│   │   ├── adapters/              # cTrader integration
│   │   ├── plugins/               # Strategy-specific extensions
│   │   └── store/                 # DuckDB persistence
│   └── documentation/              # Library docs, architecture
│       ├── LIBRARY.md             # Main library guide
│       └── ARCHITECTURE_PLUGINS.md
├── docs/                           # Track system (context/results/decisions/runbook)
│   ├── context/                    # Track registry + per-track state + checkpoints
│   └── results/                    # Per-track result summaries + command rollups
├── research/                       # Research artifacts
│   ├── experiments/                # Experiment folders
│   │   └── EXP-20260119-001/      # Example: plan, sanity, results, decision
│   ├── duckdb_ui/                  # Local web UI for browsing DuckDB files
│   ├── catalog/                    # Workspace discovery index (rebuildable)
│   ├── methods/                    # Methods registry
│   │   ├── METHODS.yml            # Method catalog
│   │   ├── USAGE_LOG.ndjson       # Append-only usage log
│   │   └── METHOD_SCORECARD.csv   # Auto-generated scorecard
│   └── shared/                     # Shared strategies and examples
│       ├── strategies/            # C# cBot source code
│       └── examples/              # Example params and scenarios
├── documentation/                  # Project-level docs
│   ├── ARTICLE_v2.md               # The main article (v2)
│   ├── ARTICLE.md                  # Legacy article (v1)
│   └── quant_research_agent_loop_spec.md
├── research/workspaces/            # Execution workspaces (legacy + current)
│   └── research_1/artifacts/       # Default artifacts root (generated, gitignored)
│       ├── builds/                # Built .algo files
│       ├── jobs/                  # Job execution artifacts
│       └── registry/              # Strategy/params/scenario/job registry
├── env/                            # Non-secret templates/conventions (no real creds)
└── run_cli.py                      # Convenience CLI entry point

What each top-level directory is for (practical)

.cursor/: Cursor IDE agent scaffolding.
- rules (.cursor/rules/*.mdc): discipline (value-budget, quant gates, task envelope).
- hooks (.cursor/hooks/*.py + .cursor/hooks.json): runtime gating + audit trails.
- skills (.cursor/skills/*/SKILL.md): repeatable “procedures” (track-router, checkpoints, debug-last, etc.).
- commands (.cursor/commands/**/*.md): slash-command playbooks the agent can invoke.
- Note: .cursor/project/ contains historical/legacy copies of the track system; the canonical one is under docs/.
core/: installable library + CLI (ctrader-orchestrator).
- Entry points: python run_cli.py ... (shim) or ctrader-orchestrator ... (after install).
- Docs: core/documentation/.
docs/: durable track system (context + checkpoints + results + decision log).
- Start here for how this repo runs long agent work: docs/RUNBOOK.md.
- Track registry: docs/context/INDEX.md.
research/: research “world”, including both legacy workspaces and the new system-of-record layers:
- research/workspaces/: execution work areas (often large/messy outputs and legacy artifacts).
  - Default artifacts root used by the CLI: research/workspaces/research_1/artifacts/
- research/experiments/: repo-wide system-of-record experiment folders (plan/sanity/results/decision + RUN_METADATA.json).
- research/methods/: repo-wide methods registry (method catalog + usage log + scorecard generator).
- research/shared/: shared C# strategies and JSON scenario/params examples used by the CLI.
- research/catalog/: discovery index for research/workspaces/* (see research/catalog/README.md).
- research/duckdb_ui/: optional local web UI for browsing/querying DuckDB files (see its README).
documentation/: project-level narrative/spec docs (including public-facing writeups and the quant loop spec).
env/: non-secret templates and conventions for environment variables (see env/README.md).
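
The research/methods/ registry pairs an append-only usage log with a generated scorecard. The sketch below shows one way that pairing can work; append_usage and build_scorecard are hypothetical helpers, and the record fields (method_id, experiment_id, outcome) are assumptions rather than the repo's actual schema.

```python
import csv
import io
import json
from collections import defaultdict


def append_usage(log_path, method_id, experiment_id, outcome):
    """Append one NDJSON record; existing lines are never rewritten."""
    record = {"method_id": method_id, "experiment_id": experiment_id,
              "outcome": outcome}
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def build_scorecard(log_path):
    """Aggregate the log into CSV text with per-method use and success counts."""
    counts = defaultdict(lambda: {"uses": 0, "successes": 0})
    with open(log_path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            rec = json.loads(line)
            row = counts[rec["method_id"]]
            row["uses"] += 1
            row["successes"] += int(rec["outcome"] == "success")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["method_id", "uses", "successes"])
    for method_id in sorted(counts):
        row = counts[method_id]
        writer.writerow([method_id, row["uses"], row["successes"]])
    return out.getvalue()
```

Because the scorecard is derived entirely from the log, it can be regenerated at any time and never needs to be edited by hand.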

Documentation & Resources

Core Library

Library Guide - Complete CLI reference, cookbook recipes, API docs
Architecture Guide - Design decisions, terminology, boundaries
Plugin System - Extending with strategy-specific logic
Artifacts & Dataflow - Where things are stored and why

Agent Framework

The Full Story - Why and how this was built
Research Loop Spec - The discipline behind experiments
Methods Registry - How the playbook works
Quant Research Loop Rule - Hard gates and workflow enforcement

Examples

Experiment: EXP-20260119-001 - Example experiment structure
Shared Strategies - Strategy registry + how to run/create cBots
Example Parameters - Ready-to-use params and scenarios

Advanced Usage

Once you're comfortable with the basics, explore:

Custom Strategies - How to add your own cBots
Plugin Development - Extending with strategy-specific ingestion and ranking
DuckDB Queries - Advanced analysis patterns
Portfolio Analysis - Multi-strategy equity curves
Research Workflows - Using experiments registry and methods registry
Track System - Isolating different research streams
Workflow Monitors - Real-time backtest monitoring

Contributing

We welcome contributions! This project is experimental and evolving.

Especially interested in:

Method cards: Add research techniques to research/methods/METHODS.yml
Hooks for other domains: Adapt workflow gates to biology, legal research, marketing
Domain rules: Share your .cursor/rules for fields beyond quant research
Example strategies: Contribute interesting cBots or trading ideas
Bug fixes: Improve reliability and cross-platform support
Documentation: Clarify confusing sections or add examples

How to contribute:

Fork the repo
Create a feature branch (git checkout -b feature/amazing-contribution)
Make your changes (keep diffs minimal per the repo rules)
Test thoroughly (build passes, backtests work)
Submit a pull request with clear description

See CLAUDE.md for the project's AI agent contract and development philosophy.

Known Limitations & Warnings

⚠️ This is experimental - not production-grade.

Key limitations:

Agent supervision still required - Agents can and do make mistakes (hallucinated APIs, over-engineering, context exhaustion)
Context window exhaustion - Long experiments can exhaust agent context; checkpoint system helps but isn't perfect
macOS + cTrader specific - Cross-platform support untested (may work on Windows but no guarantees)
Methods registry underuse - Agents sometimes forget to consult/update methods log without reminders
Track isolation partial - Agent can still mix contexts from different tracks occasionally
Reproducibility gaps - RUN_METADATA.json not always complete (missing dataset hashes, exact commands)

The Guardrail Paradox: By adding extensive rules and procedures, we sometimes strip agents of creative problem-solving. They become too procedural—following checklists instead of thinking laterally. The balance between discipline and creativity is still being tuned.

When NOT to use this:

Simple one-off coding tasks (total overkill)
High-stakes production trading (human review mandatory)
Work in domains where you lack the expertise to validate outputs
When speed matters more than reproducibility

See the “Limits + when not to use this” section in documentation/ARTICLE_v2.md for the most important caveats.

Credits

Built with:

ChatGPT (planning, research synthesis, architecture)
Cursor Agent (implementation, iteration, bug fixes)
Human oversight (domain expertise, validation, integration)

Powered by:

cTrader by Spotware Systems
DuckDB - In-process analytical database
Cursor - AI-first code editor

Inspired by: The idea that AI agents should do more than just code—they should think, validate, learn, and build institutional memory.

Contact & Community

GitHub Issues: Bug reports and feature requests
Discussions: Questions and community support
Full Article: documentation/ARTICLE_v2.md - The complete story

Author: Alex Mihalache

If you build something with this, we'd love to hear about it.

Star this repo if you find it useful • Share it with others who might benefit
