scaraguel/cursor_domain_agent

 
 

Teaching AI Agents to Think Like Domain Experts

A backtesting framework + agent scaffolding for quantitative research


What is this?

This is both a CLI-first backtesting library for cTrader AND an experiment in agent-driven research workflows.

The Core Library registers C# trading strategies, builds them with the .NET SDK, runs backtests via the cTrader CLI, parses the results, and persists everything to DuckDB for analysis. It also covers the practical necessities: parameter sweeps, a plugin architecture, and resume/retry support.

The .cursor/ Framework teaches AI agents (like Cursor's Claude) to operate as domain researchers, not just code generators. Rules enforce research discipline (data validity gates, leakage checks), hooks provide runtime enforcement (workflow gates), and skills enable reusable capabilities (track management, checkpoints). Experiments get complete audit trails from hypothesis → decision.

In practice: validation gates catch a lot of “looks fine but is wrong” failures early (e.g. empty/shifted data windows that would otherwise produce phantom signals).

This isn't just about running backtests—it's about building institutional memory, enforcing research discipline, and scaling workflows across conversations and time.


Key Features

Core Backtesting Library

  • CLI-first Python harness for cTrader strategies (C# cBots)
  • Register → Build → Run workflow with deterministic artifact paths
  • Materialize .cbotset files from JSON parameter sets
  • Direct cTrader CLI invocation for backtests
  • Parse report.html → DuckDB schema (runs, trades, daily_metrics)
  • Plugin architecture for strategy-specific extensions
  • Parameter sweeps with parallel execution, resume, and retry

Agent Framework (.cursor/ Setup)

  • Rules: Domain discipline (research gates, validation requirements, value-budget enforcement)
  • Hooks: Runtime enforcement (workflow gates, documentation requirements, command audit)
  • Skills: Reusable capabilities (track-router, checkpoint-writer, results-summarizer)
  • Experiments registry: Complete audit trail (plan.md → sanity.md → results.md → decision.md)
  • Methods registry: Self-updating playbook (METHODS.yml + USAGE_LOG.ndjson + auto-generated scorecard)
  • Track system: Context isolation across different research streams

What Makes This Different?

Traditional backtesting frameworks help you run code.

This framework helps agents think, using hard gates (data validity, leakage checks), institutional memory (the methods registry), and artifact-driven workflows.

It's not just backtesting; it's a complete research loop that scales across conversations and time.

The Data Window Validity Gate alone has caught more silent failures than anything else—wrong timezones leading to empty data windows, features all zeros, models learning nonsense. Without this gate, you waste days chasing phantom signals.

See documentation/ARTICLE_v2.md for the full story of how this enables agents to run research loops autonomously.


Prerequisites

System Requirements

  • macOS (cTrader requirement - may work on Windows but untested)
  • Python 3.11+ (repo pins 3.13 in .python-version)
  • .NET SDK 6.0+ (for building cBots)
  • cTrader installed (/Applications/cTrader.app on macOS)

cTrader Account

  • cTrader account with backtesting access
  • Account credentials: CTID, Account ID, Auth Token
  • Get credentials from: cTrader → Tools → Settings → Account → Copy IDs

Optional

  • Cursor IDE (to use the .cursor/ agent framework)
  • DuckDB CLI (for advanced queries)

Installation

Step 1: Clone the repo

git clone <your-repo-url>
cd ctrader-orchestrator  # or whichever directory name git clone created

Step 2: Install Python dependencies (recommended: uv)

This repo is a uv workspace (root pyproject.toml + uv.lock, with the installable package in core/).

From repo root:

uv venv env/.venv --clear
# Make uv happy from any subdirectory: uv expects a project env at `.venv/`.
ln -sfn env/.venv .venv
uv sync

Run the CLI (no activation required):

uv run python run_cli.py --help

Tip (Cursor agent usage): When running Python commands in this repo, prefer uv run ... so the agent uses the already-synced workspace environment (and avoids “missing module” errors).

Optional extras:

# tests
uv sync --extra dev

# richer terminal UI (Rich)
uv sync --extra tui

Details: docs/UV_WORKSPACE.md.

Step 2 (alternative): Install with venv + pip

python3 -m venv .venv
source .venv/bin/activate
pip install -e core

Step 3: Set up credentials

Create a .env file in the repo root:

CTRADER_DEFAULT_CTID=your_ctid_here
CTRADER_DEFAULT_ACCOUNT=your_account_id_here
CTRADER_DEFAULT_AUTH_TOKEN=your_auth_token_here

Security note: Never commit .env to git. It's already in .gitignore.

Notes:

  • The CLI loads .env automatically (see core/src/ctrader_orchestrator/core/env.py).
  • Credentials are organized by profile: scenarios default to profile=DEFAULT (see scenario-add and Scenario.profile), which matches the CTRADER_DEFAULT_* variable names above.
  • Template: env/examples/env_credentials_only.example (non-secret).
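As a rough illustration of how per-profile credentials could be resolved, the sketch below parses .env-style text and picks the three values for one profile. It assumes the variable pattern CTRADER_<PROFILE>_CTID / _ACCOUNT / _AUTH_TOKEN, inferred from the names above. parse_env_file and credentials_for are hypothetical helpers, not the functions in env.py.

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("\"'")
    return env


def credentials_for(env, profile="DEFAULT"):
    """Return the credentials for one profile, or raise if any value is missing."""
    prefix = f"CTRADER_{profile}_"  # assumed naming pattern, not confirmed by the repo
    fields = ("CTID", "ACCOUNT", "AUTH_TOKEN")
    missing = [prefix + name for name in fields if not env.get(prefix + name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return {name.lower(): env[prefix + name] for name in fields}


# Placeholder values only, never real credentials.
sample = """
# credentials for the DEFAULT profile
CTRADER_DEFAULT_CTID=your_ctid_here
CTRADER_DEFAULT_ACCOUNT=your_account_id_here
CTRADER_DEFAULT_AUTH_TOKEN=your_auth_token_here
"""
creds = credentials_for(parse_env_file(sample))
```

Failing loudly on a missing variable is deliberate: a backtest that starts with half-configured credentials tends to fail later with a less obvious error.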

Step 4: Verify installation

python run_cli.py --help

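You should see the CLI help output. If you installed with uv and have not activated the environment, run uv run python run_cli.py --help instead. If you still get an error, check that Python 3.11+ and the .NET SDK are installed.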


Quick Start: Your First Backtest

Let's run a complete backtest using the included SMA crossover example. Time: ~5 minutes

Where outputs go (artifacts root)

By default, this CLI writes all generated outputs under:

  • research/workspaces/research_1/artifacts/

You can override the location per command with --artifacts-root <path>.

For repeatable runs, set an explicit artifacts root once:

ARTIFACTS_ROOT="$PWD/research/workspaces/research_1/artifacts"

Step 1: Register a strategy

python run_cli.py strategy-add \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --strategy-id sma_crossover_v1 \
  --source-dir research/shared/strategies/sma_crossover_v1/sma_crossover_v1 \
  --plugin-id sma_crossover

This registers the strategy and links it to the sma_crossover plugin for custom analysis.

Step 2: Build the strategy

python run_cli.py build-dotnet \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --strategy-id sma_crossover_v1 \
  --csproj research/shared/strategies/sma_crossover_v1/sma_crossover_v1/sma_crossover_v1.csproj \
  --configuration Release \
  --tfm net6.0 \
  --output-name sma_crossover_v1.algo

Output: You'll see a build_id like build-20260119-abc123. Save this for the next step.

The .algo file is now under the artifacts root at:

  • research/workspaces/research_1/artifacts/builds/sma_crossover_v1/<build_id>/
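Because the build location follows from the artifacts root, strategy id, and build id, a script can compute it rather than search for it. A minimal sketch, assuming only the directory layout shown above (build_dir and find_algo are hypothetical helpers, not part of the CLI):

```python
from pathlib import Path


def build_dir(artifacts_root, strategy_id, build_id):
    """Directory that holds the output of one build."""
    return Path(artifacts_root) / "builds" / strategy_id / build_id


def find_algo(artifacts_root, strategy_id, build_id):
    """Return the .algo files in a build directory (empty list if none exist)."""
    directory = build_dir(artifacts_root, strategy_id, build_id)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.algo"))


example = build_dir(
    "research/workspaces/research_1/artifacts",
    "sma_crossover_v1",
    "build-20260119-abc123",  # example id format from the text above
)
```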

Step 3: Add parameters and scenario

Parameters define your strategy settings (fast period, slow period, etc.):

python run_cli.py params-add \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --strategy-id sma_crossover_v1 \
  --file research/shared/examples/sma_crossover_params_v1.json

Output: params_id like params-abc123. Save this.

Scenario defines the backtest window and market data:

python run_cli.py scenario-add \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --file research/shared/examples/sma_crossover_scenario_v1.json

Output: scenario_id like scenario-def456. Save this.

What's in these files?

  • sma_crossover_params_v1.json: FastPeriod=5, SlowPeriod=20, VolumeInUnits=1
  • sma_crossover_scenario_v1.json: NAS100_SB, M1 bars, 3-hour window (Jan 2, 2024)

Step 4: Run the backtest

python run_cli.py run-job \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --strategy-id sma_crossover_v1 \
  --build-id <build_id_from_step_2> \
  --params-id <params_id_from_step_3> \
  --scenario-id <scenario_id_from_step_3> \
  --ctrader-bin /Applications/cTrader.app/Contents/MacOS/cTrader.Mac

Output: job_id like job-ghi789. Save this - you'll use it to query results.

The backtest is now running. Depending on data size, this takes 30 seconds to a few minutes.
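If you drive the CLI from a script instead of a shell, the same command can be assembled as an argument list. The sketch below uses only the flags shown in Step 4. run_job_argv and run_job are hypothetical wrappers, and since the exact format of the CLI's output is not documented here, the sketch returns stdout unparsed.

```python
import subprocess
import sys

CTRADER_BIN = "/Applications/cTrader.app/Contents/MacOS/cTrader.Mac"


def run_job_argv(artifacts_root, strategy_id, build_id, params_id,
                 scenario_id, ctrader_bin=CTRADER_BIN):
    """Build the run-job command as a list, so no shell quoting is needed."""
    return [
        sys.executable, "run_cli.py", "run-job",
        "--artifacts-root", str(artifacts_root),
        "--strategy-id", strategy_id,
        "--build-id", build_id,
        "--params-id", params_id,
        "--scenario-id", scenario_id,
        "--ctrader-bin", ctrader_bin,
    ]


def run_job(**kwargs):
    """Run the backtest from the repo root; raises CalledProcessError on failure."""
    result = subprocess.run(
        run_job_argv(**kwargs), capture_output=True, text=True, check=True
    )
    return result.stdout


argv = run_job_argv(
    artifacts_root="research/workspaces/research_1/artifacts",
    strategy_id="sma_crossover_v1",
    build_id="build-20260119-abc123",
    params_id="params-abc123",
    scenario_id="scenario-def456",
)
```

Nothing is executed until run_job is called, so the argument list can be logged or reviewed first.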

Step 5: Query the results

View run summary:

python run_cli.py sql \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --query "SELECT * FROM runs WHERE job_id='<job_id>'"

View all trades:

python run_cli.py sql \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --query "
  SELECT 
    open_time, 
    close_time, 
    direction, 
    entry_price,
    exit_price,
    pnl 
  FROM trades 
  WHERE job_id='<job_id>' 
  ORDER BY open_time
"

Aggregate metrics:

python run_cli.py sql \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --query "
  SELECT 
    COUNT(*) as n_trades, 
    SUM(pnl) as total_pnl, 
    AVG(pnl) as avg_pnl,
    MIN(pnl) as worst_trade,
    MAX(pnl) as best_trade
  FROM trades 
  WHERE job_id='<job_id>'
"
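To experiment with the aggregate query without a finished backtest, you can run it against a toy table. The sketch below uses Python's built-in sqlite3 as a stand-in for DuckDB (an assumption made so the example has no dependencies) and invented PnL values. It also binds job_id as a parameter rather than pasting it into the SQL string.

```python
import sqlite3

AGGREGATE_SQL = """
SELECT COUNT(*) AS n_trades,
       SUM(pnl) AS total_pnl,
       AVG(pnl) AS avg_pnl,
       MIN(pnl) AS worst_trade,
       MAX(pnl) AS best_trade
FROM trades
WHERE job_id = ?
"""


def summarize(conn, job_id):
    """Run the aggregate query for one job and return a dict keyed by column."""
    row = conn.execute(AGGREGATE_SQL, (job_id,)).fetchone()
    names = ("n_trades", "total_pnl", "avg_pnl", "worst_trade", "best_trade")
    return dict(zip(names, row))


# Toy data: these trades are made up purely to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (job_id TEXT, pnl REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [("job-demo", 12.5), ("job-demo", -5.0), ("job-demo", 7.5), ("job-other", 100.0)],
)
summary = summarize(conn, "job-demo")
```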

Step 6: Inspect detailed results

python run_cli.py inspect-job --job-id <job_id>

This shows a formatted summary including:

  • Strategy configuration
  • Time window
  • Trade count and PnL breakdown
  • Plugin-specific analysis (if available)

Congratulations! You've run your first backtest and queried the results.


Next Steps: Parameter Sweeps

Want to test multiple parameter combinations? Parameter sweeps let you run batches of backtests with different settings.

Create a params file

Create my_sweep.jsonl (one JSON object per line):

{"FastPeriod": 5, "SlowPeriod": 20, "VolumeInUnits": 1}
{"FastPeriod": 10, "SlowPeriod": 30, "VolumeInUnits": 1}
{"FastPeriod": 15, "SlowPeriod": 45, "VolumeInUnits": 1}
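For larger grids, writing the file by hand gets tedious. A small sketch that generates the same one-object-per-line format from two lists of periods follows. sweep_lines and write_sweep are hypothetical helpers, and skipping combinations where the fast period is not shorter than the slow one is an assumption about what makes sense for an SMA crossover.

```python
import itertools
import json


def sweep_lines(fast_periods, slow_periods, volume=1):
    """Return one JSON line per (fast, slow) combination with fast < slow."""
    lines = []
    for fast, slow in itertools.product(fast_periods, slow_periods):
        if fast >= slow:
            continue  # assumed filter: the fast SMA should be the shorter one
        lines.append(json.dumps(
            {"FastPeriod": fast, "SlowPeriod": slow, "VolumeInUnits": volume}
        ))
    return lines


def write_sweep(path, lines):
    """Write the lines as a JSONL file with a trailing newline."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")


lines = sweep_lines([5, 10, 15], [20, 30, 45])
```

Calling write_sweep("my_sweep.jsonl", lines) would produce a file usable with --params-jsonl.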

Run the sweep

python run_cli.py run-sweep \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --strategy-id sma_crossover_v1 \
  --build-id <build_id> \
  --scenario-id <scenario_id> \
  --params-jsonl my_sweep.jsonl \
  --ctrader-bin /Applications/cTrader.app/Contents/MacOS/cTrader.Mac \
  --parallel 2 \
  --resume \
  --continue-on-error

Options explained:

  • --parallel 2: Run 2 backtests simultaneously
  • --resume: Skip already-completed jobs (useful if a sweep fails mid-run)
  • --continue-on-error: Don't stop if one job fails
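To make the interaction of these three options concrete, here is a minimal sketch of the behaviour they describe, not the CLI's actual implementation. run_sweep and fake_run are hypothetical, and identifying completed jobs by their index in the params list is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sweep(param_sets, run_one, completed=frozenset(), parallel=2,
              continue_on_error=True):
    """Run run_one(params) for every pending parameter set.

    completed holds the indexes finished by an earlier run (the resume case).
    Returns (results, errors), both keyed by index into param_sets.
    """
    pending = [i for i in range(len(param_sets)) if i not in completed]
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {i: pool.submit(run_one, param_sets[i]) for i in pending}
        for i, future in futures.items():
            try:
                results[i] = future.result()
            except Exception as exc:
                if not continue_on_error:
                    raise  # stop at the first failure
                errors[i] = exc  # record the failure and keep going
    return results, errors


PARAMS = [
    {"FastPeriod": 5, "SlowPeriod": 20},
    {"FastPeriod": 10, "SlowPeriod": 30},
    {"FastPeriod": 15, "SlowPeriod": 45},
]


def fake_run(params):
    """Stand-in for a backtest: fails for one parameter set on purpose."""
    if params["FastPeriod"] == 10:
        raise RuntimeError("backtest failed")
    return params["FastPeriod"] * 2


results, errors = run_sweep(PARAMS, fake_run)
resumed, _ = run_sweep(PARAMS, fake_run, completed={0})
```

In the first call the failing job lands in errors while the other two complete. In the second, index 0 is skipped because it is marked as already done.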

Compare results

python run_cli.py sql \
  --artifacts-root "$ARTIFACTS_ROOT" \
  --query "
  SELECT 
    r.job_id,
    r.params_json->>'FastPeriod' as fast_period,
    r.params_json->>'SlowPeriod' as slow_period,
    COUNT(t.trade_id) as n_trades,
    SUM(t.pnl) as total_pnl,
    AVG(t.pnl) as avg_pnl
  FROM runs r
  LEFT JOIN trades t ON r.job_id = t.job_id
  WHERE r.strategy_id = 'sma_crossover_v1'
  GROUP BY r.job_id, fast_period, slow_period
  ORDER BY total_pnl DESC
"

This shows which parameter combinations performed best.


The .cursor Agent Framework

What is this?

The .cursor/ directory contains rules, hooks, and skills that teach AI agents (like Cursor's Claude) to operate as domain researchers—not just code generators.

Key Components

.cursor/
├── rules/           # Domain discipline (research gates, validation)
│   ├── quant_research_loop.mdc
│   ├── 00-core.mdc
│   ├── 01-value-budget.mdc
│   └── ...
├── hooks/           # Runtime enforcement (workflow gates)
│   ├── gate_shell.py
│   ├── gate_readfile.py
│   └── ...
├── skills/          # Reusable capabilities
│   ├── track-router/
│   ├── track-checkpoint-writer/
│   └── ...
└── commands/        # Agent-invokable workflows
    └── research/
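To give a feel for what a shell gate might do, here is a hedged sketch of a decision function in the spirit of gate_shell.py. It is not the repo's actual hook. The payload field command and the response fields permission and agentMessage are assumptions about the hook wire format, and the policy (steer bare python calls toward uv run) is only an illustration based on the uv tip in the installation section.

```python
import json
import shlex


def decide(payload):
    """Return an allow/deny/ask decision for one proposed shell command."""
    command = payload.get("command", "")
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: let a human look at it.
        return {"permission": "ask", "agentMessage": "Could not parse command."}
    if tokens and tokens[0] in ("python", "python3"):
        return {
            "permission": "deny",
            "agentMessage": "Use `uv run python ...` so the synced "
                            "workspace environment is used.",
        }
    return {"permission": "allow"}


def main(stdin, stdout):
    """Read one JSON payload from stdin and write the JSON decision to stdout."""
    stdout.write(json.dumps(decide(json.load(stdin))))


blocked = decide({"command": "python run_cli.py --help"})
allowed = decide({"command": "uv run python run_cli.py --help"})
```

Keeping the policy in a pure function like decide makes the gate easy to unit test without involving the editor at all.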

Example: Data Window Validity Gate

Rules in quant_research_loop.mdc enforce that before any experiment proceeds, the agent must verify:

  • Observation counts per window (mean/p1/p50/p99) - no empty data
  • Timestamp coverage (min/max within each window) - correct timezone, no gaps
  • Non-degeneracy (% zeros, #unique values, variance/IQR) - not all zeros or constants

Why this matters: Wrong timezone → empty data windows → features all zeros → model learns nonsense. The gate catches this immediately before you waste days chasing phantom signals.
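The three checks can be sketched in a few lines of plain Python. This is an illustration of the idea, not the rule text from quant_research_loop.mdc. The function names, the thresholds (min_obs=30, max_pct_zero=0.95), the 14:00 UTC start time, and the sample values are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from statistics import pvariance


def window_report(rows):
    """Summarize one window; rows is a list of (timestamp, value) pairs."""
    if not rows:
        return {"n_obs": 0}
    stamps = [ts for ts, _ in rows]
    values = [v for _, v in rows]
    return {
        "n_obs": len(values),
        "ts_min": min(stamps),
        "ts_max": max(stamps),
        "pct_zero": sum(1 for v in values if v == 0) / len(values),
        "n_unique": len(set(values)),
        "variance": pvariance(values),
    }


def gate_failures(report, start, end, min_obs=30, max_pct_zero=0.95):
    """Return the reasons a window fails the gate; an empty list means pass."""
    if report["n_obs"] < min_obs:
        # The other checks are not meaningful on an empty or near-empty window.
        return [f"too few observations ({report['n_obs']} < {min_obs})"]
    failures = []
    if report["ts_min"] < start or report["ts_max"] > end:
        failures.append("timestamps fall outside the window (timezone shift?)")
    if report["pct_zero"] > max_pct_zero:
        failures.append(f"{report['pct_zero']:.0%} of values are zero")
    if report["n_unique"] <= 1 or report["variance"] == 0:
        failures.append("feature is constant")
    return failures


# A 3-hour window of M1 bars; the start time is illustrative.
start = datetime(2024, 1, 2, 14, 0, tzinfo=timezone.utc)
end = start + timedelta(hours=3)
healthy = [(start + timedelta(minutes=i), float(i % 7)) for i in range(180)]
shifted = [(ts + timedelta(hours=5), v) for ts, v in healthy]
all_zero = [(ts, 0.0) for ts, _ in healthy]
```

Here healthy passes with no failures, an empty window is rejected for having too few observations, shifted trips the timestamp check, and all_zero trips both degeneracy checks.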

Try it yourself

If you use Cursor IDE:

  1. Open this repo in Cursor
  2. Ask the agent: "Run an SMA crossover backtest and analyze the results"
  3. Watch it use the CLI, check validation gates, and produce experiment artifacts

The agent will:

  • Use the track-router skill to isolate this work
  • Run the backtest via CLI
  • Parse results and check data validity
  • Create an experiment folder with plan.md, results.md, decision.md
  • Update the methods registry with usage logs
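The usage log is NDJSON (one JSON object per line), which makes appending cheap and safe to do repeatedly. A minimal sketch of such an append follows. The field names ts, method_id, experiment_id, and outcome are hypothetical, since the real schema of USAGE_LOG.ndjson is not described here.

```python
import json
from datetime import datetime, timezone


def usage_record(method_id, experiment_id, outcome, now=None):
    """Serialize one usage event as a single JSON line (no trailing newline)."""
    now = now or datetime.now(timezone.utc)
    return json.dumps(
        {
            "ts": now.isoformat(),
            "method_id": method_id,          # hypothetical field names
            "experiment_id": experiment_id,
            "outcome": outcome,
        },
        sort_keys=True,
    )


def append_record(log_path, line):
    """Append one record; existing lines are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")


example_line = usage_record(
    "data_window_validity_gate",
    "EXP-20260119-001",
    "caught_empty_window",
    now=datetime(2026, 1, 19, tzinfo=timezone.utc),
)
```

Opening the file in append mode is the whole trick: a scorecard can then be regenerated at any time by reading the log from the top.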

Workspaces vs experiments (important)

  • research/workspaces/<workspace>/...: execution/work areas for large outputs and tool/phase-specific research artifacts (often messy and big).
  • research/experiments/EXP-.../: repo-wide system-of-record experiment folders (plan/sanity/results/decision + RUN_METADATA.json) that link to workspace outputs and tmp/agent/<TRK>/... logs.

Rule of thumb: do heavy work in a workspace, and create an EXP-* folder whenever a run matters (decisions, comparisons, reproducibility).
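One cheap way to keep EXP-* folders honest is to check that the expected files are present before treating an experiment as finished. The sketch below uses the file names mentioned above. missing_artifacts is a hypothetical helper, and treating all five files as mandatory is an assumption.

```python
from pathlib import Path

REQUIRED = ("plan.md", "sanity.md", "results.md", "decision.md", "RUN_METADATA.json")


def missing_artifacts(exp_dir):
    """Return the required files that are absent from an experiment folder."""
    exp_dir = Path(exp_dir)
    return [name for name in REQUIRED if not (exp_dir / name).is_file()]


def is_complete(exp_dir):
    """True when every required artifact exists."""
    return not missing_artifacts(exp_dir)
```

For example, missing_artifacts("research/experiments/EXP-20260119-001") returns an empty list only when the whole plan → sanity → results → decision trail is in place.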

Docs:

  • documentation/quant_research_loop_workflow.md
  • documentation/cursor_agent_setup.md

See documentation/ARTICLE_v2.md for the complete story of how this framework enables agents to run research loops autonomously.


Project Structure

ctrader-orchestrator/
├── .cursor/                        # Cursor agent framework (rules/hooks/skills/commands)
├── core/                           # Core Python library
│   ├── src/ctrader_orchestrator/  # CLI + adapters + plugins
│   │   ├── cli.py                 # Main CLI commands
│   │   ├── adapters/              # cTrader integration
│   │   ├── plugins/               # Strategy-specific extensions
│   │   └── store/                 # DuckDB persistence
│   └── documentation/              # Library docs, architecture
│       ├── LIBRARY.md             # Main library guide
│       └── ARCHITECTURE_PLUGINS.md
├── docs/                           # Track system (context/results/decisions/runbook)
│   ├── context/                    # Track registry + per-track state + checkpoints
│   └── results/                    # Per-track result summaries + command rollups
├── research/                       # Research artifacts
│   ├── experiments/                # Experiment folders
│   │   └── EXP-20260119-001/      # Example: plan, sanity, results, decision
│   ├── duckdb_ui/                  # Local web UI for browsing DuckDB files
│   ├── catalog/                    # Workspace discovery index (rebuildable)
│   ├── methods/                    # Methods registry
│   │   ├── METHODS.yml            # Method catalog
│   │   ├── USAGE_LOG.ndjson       # Append-only usage log
│   │   └── METHOD_SCORECARD.csv   # Auto-generated scorecard
│   ├── shared/                     # Shared strategies and examples
│   │   ├── strategies/            # C# cBot source code
│   │   └── examples/              # Example params and scenarios
│   └── workspaces/                 # Execution workspaces (legacy + current)
│       └── research_1/artifacts/   # Default artifacts root (generated, gitignored)
│           ├── builds/            # Built .algo files
│           ├── jobs/              # Job execution artifacts
│           └── registry/          # Strategy/params/scenario/job registry
├── documentation/                  # Project-level docs
│   ├── ARTICLE_v2.md               # The main article (v2)
│   ├── ARTICLE.md                  # Legacy article (v1)
│   └── quant_research_agent_loop_spec.md
├── env/                            # Non-secret templates/conventions (no real creds)
└── run_cli.py                      # Convenience CLI entry point

What each top-level directory is for (practical)

  • .cursor/: Cursor IDE agent scaffolding.

    • rules (.cursor/rules/*.mdc): discipline (value-budget, quant gates, task envelope).
    • hooks (.cursor/hooks/*.py + .cursor/hooks.json): runtime gating + audit trails.
    • skills (.cursor/skills/*/SKILL.md): repeatable “procedures” (track-router, checkpoints, debug-last, etc.).
    • commands (.cursor/commands/**/*.md): slash-command playbooks the agent can invoke.
    • Note: .cursor/project/ contains historical/legacy copies of the track system; the canonical one is under docs/.
  • core/: installable library + CLI (ctrader-orchestrator).

    • Entry points: python run_cli.py ... (shim) or ctrader-orchestrator ... (after install).
    • Docs: core/documentation/.
  • docs/: durable track system (context + checkpoints + results + decision log).

    • Start here for how this repo runs long agent work: docs/RUNBOOK.md.
    • Track registry: docs/context/INDEX.md.
  • research/: research “world”, including both legacy workspaces and the new system-of-record layers:

    • research/workspaces/: execution work areas (often large/messy outputs and legacy artifacts).
      • Default artifacts root used by the CLI: research/workspaces/research_1/artifacts/
    • research/experiments/: repo-wide system-of-record experiment folders (plan/sanity/results/decision + RUN_METADATA.json).
    • research/methods/: repo-wide methods registry (method catalog + usage log + scorecard generator).
    • research/shared/: shared C# strategies and JSON scenario/params examples used by the CLI.
    • research/catalog/: discovery index for research/workspaces/* (see research/catalog/README.md).
    • research/duckdb_ui/: optional local web UI for browsing/querying DuckDB files (see its README).
  • documentation/: project-level narrative/spec docs (including public-facing writeups and the quant loop spec).

  • env/: non-secret templates and conventions for environment variables (see env/README.md).


Documentation & Resources

Core Library

Agent Framework

Examples


Advanced Usage

Once you're comfortable with the basics, explore:


Contributing

We welcome contributions! This project is experimental and evolving.

Especially interested in:

  • Method cards: Add research techniques to research/methods/METHODS.yml
  • Hooks for other domains: Adapt workflow gates to biology, legal research, marketing
  • Domain rules: Share your .cursor/rules for fields beyond quant research
  • Example strategies: Contribute interesting cBots or trading ideas
  • Bug fixes: Improve reliability and cross-platform support
  • Documentation: Clarify confusing sections or add examples

How to contribute:

  1. Fork the repo
  2. Create a feature branch (git checkout -b feature/amazing-contribution)
  3. Make your changes (keep diffs minimal per the repo rules)
  4. Test thoroughly (build passes, backtests work)
  5. Submit a pull request with a clear description

See CLAUDE.md for the project's AI agent contract and development philosophy.


Known Limitations & Warnings

⚠️ This is experimental - not production-grade.

Key limitations:

  • Agent supervision still required - Agents can and do make mistakes (hallucinated APIs, over-engineering, context exhaustion)
  • Context window exhaustion - Long experiments can exhaust agent context; checkpoint system helps but isn't perfect
  • macOS + cTrader specific - Cross-platform support untested (may work on Windows but no guarantees)
  • Methods registry underuse - Agents sometimes forget to consult/update methods log without reminders
  • Track isolation partial - Agent can still mix contexts from different tracks occasionally
  • Reproducibility gaps - RUN_METADATA.json not always complete (missing dataset hashes, exact commands)

The Guardrail Paradox: By adding extensive rules and procedures, we sometimes strip agents of creative problem-solving. They become too procedural—following checklists instead of thinking laterally. The balance between discipline and creativity is still being tuned.

When NOT to use this:

  • Simple one-off coding tasks (total overkill)
  • High-stakes production trading (human review mandatory)
  • If you lack domain expertise to validate outputs
  • When speed matters more than reproducibility

See the “Limits + when not to use this” section in documentation/ARTICLE_v2.md for the most important caveats.


Credits

Built with:

  • ChatGPT (planning, research synthesis, architecture)
  • Cursor Agent (implementation, iteration, bug fixes)
  • Human oversight (domain expertise, validation, integration)

Powered by:

  • cTrader by Spotware Systems
  • DuckDB - In-process analytical database
  • Cursor - AI-first code editor

Inspired by: The idea that AI agents should do more than just code—they should think, validate, learn, and build institutional memory.


Contact & Community

  • GitHub Issues: Bug reports and feature requests
  • Discussions: Questions and community support
  • Full Article: documentation/ARTICLE_v2.md - The complete story

Author: Alex Mihalache


If you build something with this, we'd love to hear about it.

Star this repo if you find it useful • Share it with others who might benefit
