This is both a CLI-first backtesting library for cTrader AND an experiment in agent-driven research workflows.
The Core Library registers C# trading strategies, builds them with the .NET SDK, runs backtests via the cTrader CLI, parses the results, and persists everything to DuckDB for analysis. It also covers the practical needs: parameter sweeps, a plugin architecture, and resume/retry support.
The .cursor/ Framework teaches AI agents (like Cursor's Claude) to operate as domain researchers, not just code generators. Rules enforce research discipline (data validity gates, leakage checks), hooks provide runtime enforcement (workflow gates), and skills enable reusable capabilities (track management, checkpoints). Experiments get complete audit trails from hypothesis → decision.
In practice: validation gates catch a lot of “looks fine but is wrong” failures early (e.g. empty/shifted data windows that would otherwise produce phantom signals).
This isn't just about running backtests—it's about building institutional memory, enforcing research discipline, and scaling workflows across conversations and time.
- CLI-first Python harness for cTrader strategies (C# cBots)
- Register → Build → Run workflow with deterministic artifact paths
- Materialize .cbotset files from JSON parameter sets
- Direct cTrader CLI invocation for backtests
- Parse report.html → DuckDB schema (runs, trades, daily_metrics)
- Plugin architecture for strategy-specific extensions
- Parameter sweeps with parallel execution, resume, and retry
- Rules: Domain discipline (research gates, validation requirements, value-budget enforcement)
- Hooks: Runtime enforcement (workflow gates, documentation requirements, command audit)
- Skills: Reusable capabilities (track-router, checkpoint-writer, results-summarizer)
- Experiments registry: Complete audit trail (plan.md → sanity.md → results.md → decision.md)
- Methods registry: Self-updating playbook (METHODS.yml + USAGE_LOG.ndjson + auto-generated scorecard)
- Track system: Context isolation across different research streams
Traditional backtesting frameworks help you run code.
This framework helps agents think - with hard gates (data validity, leakage checks), institutional memory (methods registry), and artifact-driven workflows.
It's not just backtesting; it's a complete research loop that scales across conversations and time.
The Data Window Validity Gate alone has caught more silent failures than anything else—wrong timezones leading to empty data windows, features all zeros, models learning nonsense. Without this gate, you waste days chasing phantom signals.
See documentation/ARTICLE_v2.md for the full story of how this enables agents to run research loops autonomously.
- macOS (cTrader requirement - may work on Windows but untested)
- Python 3.11+ (repo pins 3.13 in .python-version)
- .NET SDK 6.0+ (for building cBots) - Download
- cTrader installed (/Applications/cTrader.app on macOS) - Download
- cTrader account with backtesting access
- Account credentials: CTID, Account ID, Auth Token
- Get credentials from: cTrader → Tools → Settings → Account → Copy IDs
- Cursor IDE (to use the .cursor/ agent framework) - Download
- DuckDB CLI (for advanced queries) - Download
git clone <your-repo-url>
cd ctrader-orchestrator
This repo is a uv workspace (root pyproject.toml + uv.lock, with the installable package in core/).
From repo root:
uv venv env/.venv --clear
# Make uv happy from any subdirectory: uv expects a project env at `.venv/`.
ln -sfn env/.venv .venv
uv sync
Run the CLI (no activation required):
uv run python run_cli.py --help
Tip (Cursor agent usage): When running Python commands in this repo, prefer uv run ... so the agent uses the already-synced workspace environment (and avoids “missing module” errors).
Optional extras:
# tests
uv sync --extra dev
# richer terminal UI (Rich)
uv sync --extra tui
Details: docs/UV_WORKSPACE.md.
python3 -m venv .venv
source .venv/bin/activate
pip install -e core
Create a .env file in the repo root:
CTRADER_DEFAULT_CTID=your_ctid_here
CTRADER_DEFAULT_ACCOUNT=your_account_id_here
CTRADER_DEFAULT_AUTH_TOKEN=your_auth_token_here
Security note: Never commit .env to git. It's already in .gitignore.
Notes:
- The CLI loads .env automatically (see core/src/ctrader_orchestrator/core/env.py).
- Credentials are profiled: scenarios default to profile=DEFAULT (see scenario-add and Scenario.profile).
- Template: env/examples/env_credentials_only.example (non-secret).
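The profile convention in the notes above can be sketched roughly as follows. This is an illustration, not the code in core/src/ctrader_orchestrator/core/env.py: the helper names parse_env_text and credentials_for are hypothetical, and it assumes that non-default profiles follow the same CTRADER_<PROFILE>_* naming as the DEFAULT variables shown earlier.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def credentials_for(env: Dict[str, str], profile: str = "DEFAULT") -> Dict[str, str]:
    """Collect the three credentials for one profile (naming is an assumption)."""
    prefix = f"CTRADER_{profile.upper()}_"
    fields = {"ctid": "CTID", "account": "ACCOUNT", "auth_token": "AUTH_TOKEN"}
    missing = [prefix + suffix for suffix in fields.values() if not env.get(prefix + suffix)]
    if missing:
        raise KeyError(f"missing credentials: {', '.join(missing)}")
    return {name: env[prefix + suffix] for name, suffix in fields.items()}


# Placeholder values only; never paste real credentials into code.
sample = """
# credentials for the DEFAULT profile
CTRADER_DEFAULT_CTID=12345
CTRADER_DEFAULT_ACCOUNT=67890
CTRADER_DEFAULT_AUTH_TOKEN=example-token
"""
creds = credentials_for(parse_env_text(sample))
print(sorted(creds))
```

Failing loudly on a missing variable keeps a misconfigured profile from surfacing later as a confusing cTrader authentication error.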
python run_cli.py --help
You should see the CLI help output. If you get an error, check that Python 3.11+ and .NET SDK are installed.
Let's run a complete backtest using the included SMA crossover example. Time: ~5 minutes
By default, this CLI writes all generated outputs under:
research/workspaces/research_1/artifacts/
You can override the location per command with --artifacts-root <path>.
For repeatable runs, set an explicit artifacts root once:
ARTIFACTS_ROOT="$PWD/research/workspaces/research_1/artifacts"
python run_cli.py strategy-add \
--artifacts-root "$ARTIFACTS_ROOT" \
--strategy-id sma_crossover_v1 \
--source-dir research/shared/strategies/sma_crossover_v1/sma_crossover_v1 \
--plugin-id sma_crossover
This registers the strategy and links it to the sma_crossover plugin for custom analysis.
python run_cli.py build-dotnet \
--artifacts-root "$ARTIFACTS_ROOT" \
--strategy-id sma_crossover_v1 \
--csproj research/shared/strategies/sma_crossover_v1/sma_crossover_v1/sma_crossover_v1.csproj \
--configuration Release \
--tfm net6.0 \
--output-name sma_crossover_v1.algo
Output: You'll see a build_id like build-20260119-abc123. Save this for the next step.
The .algo file is now under the artifacts root at:
research/workspaces/research_1/artifacts/builds/sma_crossover_v1/<build_id>/
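The deterministic layout above can be mirrored in a small path helper when scripting around the CLI. This is a sketch: build_dir is a hypothetical helper, and it relies only on the builds/<strategy_id>/<build_id>/ layout shown above, using the example build_id from the previous step.

```python
from pathlib import Path


def build_dir(artifacts_root: Path, strategy_id: str, build_id: str) -> Path:
    """Directory expected to hold the .algo for one build."""
    return artifacts_root / "builds" / strategy_id / build_id


root = Path("research/workspaces/research_1/artifacts")
path = build_dir(root, "sma_crossover_v1", "build-20260119-abc123")
print(path.as_posix())
```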
Parameters define your strategy settings (fast period, slow period, etc.):
python run_cli.py params-add \
--artifacts-root "$ARTIFACTS_ROOT" \
--strategy-id sma_crossover_v1 \
--file research/shared/examples/sma_crossover_params_v1.json
Output: params_id like params-abc123. Save this.
Scenario defines the backtest window and market data:
python run_cli.py scenario-add \
--artifacts-root "$ARTIFACTS_ROOT" \
--file research/shared/examples/sma_crossover_scenario_v1.json
Output: scenario_id like scenario-def456. Save this.
What's in these files?
- sma_crossover_params_v1.json: FastPeriod=5, SlowPeriod=20, VolumeInUnits=1
- sma_crossover_scenario_v1.json: NAS100_SB, M1 bars, 3-hour window (Jan 2, 2024)
python run_cli.py run-job \
--artifacts-root "$ARTIFACTS_ROOT" \
--strategy-id sma_crossover_v1 \
--build-id <build_id_from_step_2> \
--params-id <params_id_from_step_3> \
--scenario-id <scenario_id_from_step_3> \
--ctrader-bin /Applications/cTrader.app/Contents/MacOS/cTrader.Mac
Output: job_id like job-ghi789. Save this - you'll use it to query results.
The backtest is now running. Depending on data size, this takes 30 seconds to a few minutes.
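When the saved IDs are passed along from a script rather than copied by hand, the run-job invocation above can be assembled as an argument list. This is a minimal sketch: run_job_argv is a hypothetical helper, the flags are the ones shown in the command above, and the IDs are the placeholder examples from the earlier steps.

```python
import shlex
from typing import List


def run_job_argv(
    artifacts_root: str,
    strategy_id: str,
    build_id: str,
    params_id: str,
    scenario_id: str,
    ctrader_bin: str,
) -> List[str]:
    """Build the run-job command as a list, so no shell quoting is needed."""
    return [
        "python", "run_cli.py", "run-job",
        "--artifacts-root", artifacts_root,
        "--strategy-id", strategy_id,
        "--build-id", build_id,
        "--params-id", params_id,
        "--scenario-id", scenario_id,
        "--ctrader-bin", ctrader_bin,
    ]


argv = run_job_argv(
    artifacts_root="research/workspaces/research_1/artifacts",
    strategy_id="sma_crossover_v1",
    build_id="build-20260119-abc123",
    params_id="params-abc123",
    scenario_id="scenario-def456",
    ctrader_bin="/Applications/cTrader.app/Contents/MacOS/cTrader.Mac",
)
print(shlex.join(argv))  # copy-pasteable form of the same command
# To execute from the repo root: subprocess.run(argv, check=True)
```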
View run summary:
python run_cli.py sql \
--artifacts-root "$ARTIFACTS_ROOT" \
--query "SELECT * FROM runs WHERE job_id='<job_id>'"
View all trades:
python run_cli.py sql \
--artifacts-root "$ARTIFACTS_ROOT" \
--query "
SELECT
open_time,
close_time,
direction,
entry_price,
exit_price,
pnl
FROM trades
WHERE job_id='<job_id>'
ORDER BY open_time
"
Aggregate metrics:
python run_cli.py sql \
--artifacts-root "$ARTIFACTS_ROOT" \
--query "
SELECT
COUNT(*) as n_trades,
SUM(pnl) as total_pnl,
AVG(pnl) as avg_pnl,
MIN(pnl) as worst_trade,
MAX(pnl) as best_trade
FROM trades
WHERE job_id='<job_id>'
"
python run_cli.py inspect-job --job-id <job_id>
This shows a formatted summary including:
- Strategy configuration
- Time window
- Trade count and PnL breakdown
- Plugin-specific analysis (if available)
Congratulations! You've run your first backtest and queried the results.
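To see what the aggregate-metrics query returns without a finished backtest, the same SQL can be tried on a toy table. The sketch below uses Python's built-in sqlite3 as a stand-in for DuckDB; the trades table is reduced to two columns and the rows are made-up illustration data, not real results.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trades (job_id TEXT, pnl REAL)")
con.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [
        ("job-ghi789", 12.5),
        ("job-ghi789", -4.0),
        ("job-ghi789", 3.5),
        ("job-other", 100.0),  # belongs to another job, must be filtered out
    ],
)

row = con.execute(
    """
    SELECT
        COUNT(*) AS n_trades,
        SUM(pnl) AS total_pnl,
        AVG(pnl) AS avg_pnl,
        MIN(pnl) AS worst_trade,
        MAX(pnl) AS best_trade
    FROM trades
    WHERE job_id = ?
    """,
    ("job-ghi789",),
).fetchone()
n_trades, total_pnl, avg_pnl, worst_trade, best_trade = row
print(row)
```

The job_id is bound as a parameter here rather than pasted into the SQL string, which is the safer habit when the value comes from a script.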
Want to test multiple parameter combinations? Parameter sweeps let you run batches of backtests with different settings.
Create my_sweep.jsonl (one JSON object per line):
{"FastPeriod": 5, "SlowPeriod": 20, "VolumeInUnits": 1}
{"FastPeriod": 10, "SlowPeriod": 30, "VolumeInUnits": 1}
{"FastPeriod": 15, "SlowPeriod": 45, "VolumeInUnits": 1}
python run_cli.py run-sweep \
--artifacts-root "$ARTIFACTS_ROOT" \
--strategy-id sma_crossover_v1 \
--build-id <build_id> \
--scenario-id <scenario_id> \
--params-jsonl my_sweep.jsonl \
--ctrader-bin /Applications/cTrader.app/Contents/MacOS/cTrader.Mac \
--parallel 2 \
--resume \
--continue-on-error
Options explained:
- --parallel 2: Run 2 backtests simultaneously
- --resume: Skip already-completed jobs (useful if sweep fails mid-run)
- --continue-on-error: Don't stop if one job fails
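For larger sweeps, the JSONL file described above can be generated from a parameter grid instead of being written by hand. A minimal sketch: sweep_lines is a hypothetical helper, and dropping combinations where FastPeriod is not smaller than SlowPeriod is my assumption about what makes sense for an SMA crossover, not something the CLI enforces.

```python
import itertools
import json
from typing import Dict, Iterator, List


def sweep_lines(grid: Dict[str, List[int]]) -> Iterator[str]:
    """Yield one JSON object per parameter combination, in grid order."""
    names = list(grid)
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        # Assumed sanity filter: the fast average must be shorter than the slow one.
        if params["FastPeriod"] >= params["SlowPeriod"]:
            continue
        yield json.dumps(params)


grid = {"FastPeriod": [5, 10, 30], "SlowPeriod": [20, 30], "VolumeInUnits": [1]}
lines = list(sweep_lines(grid))
print("\n".join(lines))
# To write the sweep file:
# Path("my_sweep.jsonl").write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Of the six raw combinations in this grid, the two with FastPeriod=30 are filtered out, leaving four lines in the same format as the hand-written example.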
python run_cli.py sql \
--artifacts-root "$ARTIFACTS_ROOT" \
--query "
SELECT
r.job_id,
r.params_json->>'FastPeriod' as fast_period,
r.params_json->>'SlowPeriod' as slow_period,
COUNT(t.trade_id) as n_trades,
SUM(t.pnl) as total_pnl,
AVG(t.pnl) as avg_pnl
FROM runs r
LEFT JOIN trades t ON r.job_id = t.job_id
WHERE r.strategy_id = 'sma_crossover_v1'
GROUP BY r.job_id, fast_period, slow_period
ORDER BY total_pnl DESC
"
This shows which parameter combinations performed best.
The .cursor/ directory contains rules, hooks, and skills that teach AI agents (like Cursor's Claude) to operate as domain researchers—not just code generators.
.cursor/
├── rules/ # Domain discipline (research gates, validation)
│ ├── quant_research_loop.mdc
│ ├── 00-core.mdc
│ ├── 01-value-budget.mdc
│ └── ...
├── hooks/ # Runtime enforcement (workflow gates)
│ ├── gate_shell.py
│ ├── gate_readfile.py
│ └── ...
├── skills/ # Reusable capabilities
│ ├── track-router/
│ ├── track-checkpoint-writer/
│ └── ...
└── commands/ # Agent-invokable workflows
└── research/
Rules in quant_research_loop.mdc enforce that before any experiment proceeds, the agent must verify:
- Observation counts per window (mean/p1/p50/p99) - no empty data
- Timestamp coverage (min/max within each window) - correct timezone, no gaps
- Non-degeneracy (% zeros, #unique values, variance/IQR) - not all zeros or constants
Why this matters: Wrong timezone → empty data windows → features all zeros → model learns nonsense. The gate catches this immediately before you waste days chasing phantom signals.
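The three checks above can be sketched for a single window as follows. This is an illustration of the idea, not the rule's actual implementation: check_window, GateReport, and all thresholds are hypothetical, and the per-window quantiles (mean/p1/p50/p99) across many windows are left out. The demo window times are also assumed; it reproduces the timezone failure by shifting otherwise valid M1 timestamps by five hours.

```python
import statistics
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Sequence


@dataclass
class GateReport:
    n_obs: int
    pct_zeros: float
    n_unique: int
    variance: float
    failures: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failures


def check_window(
    timestamps: Sequence[datetime],
    values: Sequence[float],
    start: datetime,
    end: datetime,
    min_obs: int = 30,
    edge_tolerance: timedelta = timedelta(minutes=5),
    max_pct_zeros: float = 0.95,
) -> GateReport:
    """Check observation count, timestamp coverage, and non-degeneracy."""
    inside = [(t, v) for t, v in zip(timestamps, values) if start <= t < end]
    vals = [v for _, v in inside]
    n = len(vals)
    pct_zeros = sum(1 for v in vals if v == 0) / n if n else 1.0
    n_unique = len(set(vals))
    variance = statistics.pvariance(vals) if n else 0.0
    report = GateReport(n, pct_zeros, n_unique, variance)

    if n < min_obs:
        report.failures.append(f"observation count {n} < {min_obs}")
    if n:
        first = min(t for t, _ in inside)
        last = max(t for t, _ in inside)
        if first - start > edge_tolerance or end - last > edge_tolerance:
            report.failures.append("timestamps do not cover the window edges")
        if n_unique <= 1 or pct_zeros > max_pct_zeros:
            report.failures.append("values are degenerate (constant or mostly zero)")
    return report


start = datetime(2024, 1, 2, 14, 0, tzinfo=timezone.utc)  # assumed window start
end = start + timedelta(hours=3)
good_ts = [start + timedelta(minutes=i) for i in range(180)]
values = [float(i % 7) for i in range(180)]
shifted_ts = [t + timedelta(hours=5) for t in good_ts]  # simulated timezone bug

ok = check_window(good_ts, values, start, end)
bad = check_window(shifted_ts, values, start, end)
print(ok.passed, bad.passed, bad.failures)
```

The shifted series is perfectly healthy data; it simply falls outside the requested window, which is why the gate checks counts inside the window rather than the dataset as a whole.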
If you use Cursor IDE:
- Open this repo in Cursor
- Ask the agent: "Run an SMA crossover backtest and analyze the results"
- Watch it use the CLI, check validation gates, and produce experiment artifacts
The agent will:
- Use the track-router skill to isolate this work
- Run the backtest via CLI
- Parse results and check data validity
- Create an experiment folder with plan.md, results.md, decision.md
- Update the methods registry with usage logs
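The last step above, appending to the usage log and regenerating the scorecard, can be sketched like this. Only the file names USAGE_LOG.ndjson and METHOD_SCORECARD.csv come from this repo; the record fields, the scorecard columns, the method IDs, and the second experiment ID are hypothetical, and the demo writes to a temporary directory rather than research/methods/.

```python
import csv
import json
import tempfile
from collections import Counter
from pathlib import Path
from typing import List


def log_usage(log_path: Path, method_id: str, experiment_id: str, outcome: str) -> None:
    """Append one usage record as a single JSON line (append-only)."""
    record = {"method_id": method_id, "experiment_id": experiment_id, "outcome": outcome}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def build_scorecard(log_path: Path, csv_path: Path) -> List[list]:
    """Aggregate the log into per-method counts and rewrite the CSV."""
    uses: Counter = Counter()
    helped: Counter = Counter()
    for line in log_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        uses[record["method_id"]] += 1
        if record["outcome"] == "helped":
            helped[record["method_id"]] += 1
    rows = [[method, uses[method], helped[method]] for method in sorted(uses)]
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["method_id", "uses", "helped"])
        writer.writerows(rows)
    return rows


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "USAGE_LOG.ndjson"
    card = Path(tmp) / "METHOD_SCORECARD.csv"
    log_usage(log, "data_window_validity_gate", "EXP-20260119-001", "helped")
    log_usage(log, "data_window_validity_gate", "EXP-20260119-002", "no_effect")
    log_usage(log, "leakage_check", "EXP-20260119-001", "helped")
    rows = build_scorecard(log, card)
    scorecard_csv = card.read_text(encoding="utf-8")
print(rows)
```

Keeping the log append-only and treating the scorecard as a derived file means the scorecard can always be rebuilt from the log.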
- research/workspaces/<workspace>/...: execution/work areas for large outputs and tool/phase-specific research artifacts (often messy and big).
- research/experiments/EXP-.../: repo-wide system-of-record experiment folders (plan/sanity/results/decision + RUN_METADATA.json) that link to workspace outputs and tmp/agent/<TRK>/... logs.
Rule of thumb: do heavy work in a workspace, and create an EXP-* folder whenever a run matters (decisions, comparisons, reproducibility).
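Creating an EXP-* folder can be sketched as a small scaffolding step. The four stage files and RUN_METADATA.json are the ones named above; create_experiment, the stub contents, the example hypothesis, and the metadata keys are hypothetical. The demo writes into a temporary directory rather than the real research/experiments/.

```python
import json
import tempfile
from pathlib import Path

STAGES = ("plan.md", "sanity.md", "results.md", "decision.md")


def create_experiment(
    experiments_root: Path, exp_id: str, hypothesis: str, workspace_outputs: str
) -> Path:
    """Create the experiment folder with stub stage files and metadata."""
    exp_dir = experiments_root / exp_id
    exp_dir.mkdir(parents=True)  # raises if the experiment already exists
    for name in STAGES:
        title = Path(name).stem.capitalize()
        (exp_dir / name).write_text(f"{title} for {exp_id}\n", encoding="utf-8")
    with (exp_dir / "plan.md").open("a", encoding="utf-8") as fh:
        fh.write(f"\nHypothesis: {hypothesis}\n")
    metadata = {
        "experiment_id": exp_id,
        "workspace_outputs": workspace_outputs,  # link back to the heavy outputs
        "commands": [],        # exact commands, filled in as they are run
        "dataset_hashes": {},  # filled in once the data window is fixed
    }
    (exp_dir / "RUN_METADATA.json").write_text(
        json.dumps(metadata, indent=2) + "\n", encoding="utf-8"
    )
    return exp_dir


with tempfile.TemporaryDirectory() as tmp:
    exp_dir = create_experiment(
        Path(tmp) / "research" / "experiments",
        "EXP-20260119-001",
        "A 5/20 SMA crossover is profitable on NAS100_SB M1 bars",
        "research/workspaces/research_1/artifacts",
    )
    created = sorted(p.name for p in exp_dir.iterdir())
    metadata = json.loads((exp_dir / "RUN_METADATA.json").read_text(encoding="utf-8"))
print(created)
```

Refusing to overwrite an existing folder keeps each experiment ID tied to a single run.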
Docs:
- documentation/quant_research_loop_workflow.md
- documentation/cursor_agent_setup.md
See documentation/ARTICLE_v2.md for the complete story of how this framework enables agents to run research loops autonomously.
ctrader-orchestrator/
├── .cursor/ # Cursor agent framework (rules/hooks/skills/commands)
├── core/ # Core Python library
│ ├── src/ctrader_orchestrator/ # CLI + adapters + plugins
│ │ ├── cli.py # Main CLI commands
│ │ ├── adapters/ # cTrader integration
│ │ ├── plugins/ # Strategy-specific extensions
│ │ └── store/ # DuckDB persistence
│ └── documentation/ # Library docs, architecture
│ ├── LIBRARY.md # Main library guide
│ └── ARCHITECTURE_PLUGINS.md
├── docs/ # Track system (context/results/decisions/runbook)
│ ├── context/ # Track registry + per-track state + checkpoints
│ └── results/ # Per-track result summaries + command rollups
├── research/ # Research artifacts
│ ├── experiments/ # Experiment folders
│ │ └── EXP-20260119-001/ # Example: plan, sanity, results, decision
│ ├── duckdb_ui/ # Local web UI for browsing DuckDB files
│ ├── catalog/ # Workspace discovery index (rebuildable)
│ ├── methods/ # Methods registry
│ │ ├── METHODS.yml # Method catalog
│ │ ├── USAGE_LOG.ndjson # Append-only usage log
│ │ └── METHOD_SCORECARD.csv # Auto-generated scorecard
│ └── shared/ # Shared strategies and examples
│ ├── strategies/ # C# cBot source code
│ └── examples/ # Example params and scenarios
├── documentation/ # Project-level docs
│ ├── ARTICLE_v2.md # The main article (v2)
│ ├── ARTICLE.md # Legacy article (v1)
│ └── quant_research_agent_loop_spec.md
├── research/workspaces/ # Execution workspaces (legacy + current)
│ └── research_1/artifacts/ # Default artifacts root (generated, gitignored)
│ ├── builds/ # Built .algo files
│ ├── jobs/ # Job execution artifacts
│ └── registry/ # Strategy/params/scenario/job registry
├── env/ # Non-secret templates/conventions (no real creds)
└── run_cli.py # Convenience CLI entry point
- .cursor/: Cursor IDE agent scaffolding.
  - rules (.cursor/rules/*.mdc): discipline (value-budget, quant gates, task envelope).
  - hooks (.cursor/hooks/*.py + .cursor/hooks.json): runtime gating + audit trails.
  - skills (.cursor/skills/*/SKILL.md): repeatable “procedures” (track-router, checkpoints, debug-last, etc.).
  - commands (.cursor/commands/**/*.md): slash-command playbooks the agent can invoke.
  - Note: .cursor/project/ contains historical/legacy copies of the track system; the canonical one is under docs/.
- core/: installable library + CLI (ctrader-orchestrator).
  - Entry points: python run_cli.py ... (shim) or ctrader-orchestrator ... (after install).
  - Docs: core/documentation/.
- docs/: durable track system (context + checkpoints + results + decision log).
  - Start here for how this repo runs long agent work: docs/RUNBOOK.md.
  - Track registry: docs/context/INDEX.md.
- research/: research “world”, including both legacy workspaces and the new system-of-record layers:
  - research/workspaces/: execution work areas (often large/messy outputs and legacy artifacts).
    - Default artifacts root used by the CLI: research/workspaces/research_1/artifacts/
  - research/experiments/: repo-wide system-of-record experiment folders (plan/sanity/results/decision + RUN_METADATA.json).
  - research/methods/: repo-wide methods registry (method catalog + usage log + scorecard generator).
  - research/shared/: shared C# strategies and JSON scenario/params examples used by the CLI.
  - research/catalog/: discovery index for research/workspaces/* (see research/catalog/README.md).
  - research/duckdb_ui/: optional local web UI for browsing/querying DuckDB files (see its README).
- documentation/: project-level narrative/spec docs (including public-facing writeups and the quant loop spec).
- env/: non-secret templates and conventions for environment variables (see env/README.md).
- Library Guide - Complete CLI reference, cookbook recipes, API docs
- Architecture Guide - Design decisions, terminology, boundaries
- Plugin System - Extending with strategy-specific logic
- Artifacts & Dataflow - Where things are stored and why
- The Full Story - Why and how this was built
- Research Loop Spec - The discipline behind experiments
- Methods Registry - How the playbook works
- Quant Research Loop Rule - Hard gates and workflow enforcement
- Experiment: EXP-20260119-001 - Example experiment structure
- Shared Strategies - Strategy registry + how to run/create cBots
- Example Parameters - Ready-to-use params and scenarios
Once you're comfortable with the basics, explore:
- Custom Strategies - How to add your own cBots
- Plugin Development - Extending with strategy-specific ingestion and ranking
- DuckDB Queries - Advanced analysis patterns
- Portfolio Analysis - Multi-strategy equity curves
- Research Workflows - Using experiments registry and methods registry
- Track System - Isolating different research streams
- Workflow Monitors - Real-time backtest monitoring
We welcome contributions! This project is experimental and evolving.
Especially interested in:
- Method cards: Add research techniques to research/methods/METHODS.yml
- Hooks for other domains: Adapt workflow gates to biology, legal research, marketing
- Domain rules: Share your .cursor/rules for fields beyond quant research
- Example strategies: Contribute interesting cBots or trading ideas
- Bug fixes: Improve reliability and cross-platform support
- Documentation: Clarify confusing sections or add examples
How to contribute:
- Fork the repo
- Create a feature branch (git checkout -b feature/amazing-contribution)
- Make your changes (keep diffs minimal per the repo rules)
- Test thoroughly (build passes, backtests work)
- Submit a pull request with clear description
See CLAUDE.md for the project's AI agent contract and development philosophy.
Key limitations:
- Agent supervision still required - Agents can and do make mistakes (hallucinated APIs, over-engineering, context exhaustion)
- Context window exhaustion - Long experiments can exhaust agent context; checkpoint system helps but isn't perfect
- macOS + cTrader specific - Cross-platform support untested (may work on Windows but no guarantees)
- Methods registry underuse - Agents sometimes forget to consult/update methods log without reminders
- Track isolation partial - Agent can still mix contexts from different tracks occasionally
- Reproducibility gaps - RUN_METADATA.json not always complete (missing dataset hashes, exact commands)
The Guardrail Paradox: By adding extensive rules and procedures, we sometimes strip agents of creative problem-solving. They become too procedural—following checklists instead of thinking laterally. The balance between discipline and creativity is still being tuned.
When NOT to use this:
- Simple one-off coding tasks (total overkill)
- High-stakes production trading (human review mandatory)
- If you lack domain expertise to validate outputs
- When speed matters more than reproducibility
See the “Limits + when not to use this” section in documentation/ARTICLE_v2.md for the most important caveats.
Built with:
- ChatGPT (planning, research synthesis, architecture)
- Cursor Agent (implementation, iteration, bug fixes)
- Human oversight (domain expertise, validation, integration)
Powered by:
Inspired by: The idea that AI agents should do more than just code—they should think, validate, learn, and build institutional memory.
- GitHub Issues: Bug reports and feature requests
- Discussions: Questions and community support
- Full Article: documentation/ARTICLE_v2.md - The complete story
Author: Alex Mihalache