Towards autonomous quantum physics research using LLM agents with access to intelligent tools
Sören Arlt, Xuemei Gu, Mario Krenn
https://arxiv.org/abs/2511.11752
This repo implements AI-Mandel, a multi-agent workflow that generates quantum optics research ideas and implements them in PyTheus. It is centered around three scripts:
- researchers.py: drives a Researcher → Novelty Supervisor → Judge loop to produce concise abstracts and logs.
- prep_expert.py: prepares decision files ("prepped_*.txt") by transforming the last Researcher message into a Pytheus-ready configuration prompt.
- expert.py: runs the Expert agent to produce/validate configurations and executes Pytheus with timeouts, saving final results and journal entries.
- Research ideas (researchers loop)
  - The Researcher proposes an idea and iterates with the Novelty Supervisor (and an optional Mediator) to refine novelty. If accepted, a short two‑sentence abstract is written to the ensemble directory and a detailed JSON log is saved under an ensemble_runs/ subfolder.
- Prepare an expert decision (preparation)
  - For each abstract produced in step 1, the preparation script extracts the last non‑summarizing Researcher message from the matching run log. It composes a prompt with Pytheus capabilities/limitations and curated examples, optionally adding a "domain expert feedback" section when a feedback file is provided. The model then outputs a Thought / Action / Action Input, where the Action Input is an improved configuration JSON. This is saved as prepped_{timestamp}.txt next to the abstract.
- Execute with Pytheus (expert run)
  - The Expert agent builds its prompt from the abstract, the last Researcher message, the preparation decision ("prepped_*.txt"), Pytheus infos, and example snippets. It asks the model to produce a valid configuration; the script writes the config to disk and then runs pytheus via run_main in a sandboxed subprocess with a timeout. Final results and a journal entry are written to the configured journal directory.
The three stages are independent scripts so you can run them end‑to‑end or selectively.
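The Thought / Action / Action Input output of the preparation step, where the Action Input carries the configuration JSON, could be parsed roughly as follows. This is a minimal sketch and not the repository's parser: the helper name `extract_action_input`, the exact label layout, and the example action name and config keys are assumptions made for illustration.

```python
import json
import re


def extract_action_input(decision_text: str) -> dict:
    """Return the JSON object that follows an 'Action Input:' label.

    Hypothetical helper: assumes the JSON object runs from the first '{'
    after the label to the last '}' in the text.
    """
    match = re.search(r"Action Input:\s*", decision_text)
    if match is None:
        raise ValueError("no 'Action Input:' label found")
    tail = decision_text[match.end():]
    start, end = tail.find("{"), tail.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object after 'Action Input:'")
    return json.loads(tail[start:end + 1])


# Illustrative decision text; the action name and config keys are placeholders.
example = """Thought: the target state needs four photons.
Action: improve_config
Action Input: {"description": "illustrative only", "num_anc": 0}
"""
config = extract_action_input(example)
print(config["description"])
```

Taking the span between the first `{` and the last `}` tolerates trailing commentary after the JSON, at the cost of failing if the model emits two separate JSON objects.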
Centralized in assets/:
- Prompts (text files)
  - Researchers loop: assets/PROMPT_RESEARCHER.txt, assets/PROMPT_NOVELTY.txt, assets/PROMPT_JUDGE.txt
  - Expert: assets/PROMPT_EXPERT.txt
- Domain knowledge:
  - assets/PYTHEUS_EXPLICIT_INFOS.txt: capabilities/limitations and usage notes
  - Pytheus examples under assets/pytheus_examples/{States,Gates,Measurements,Communication} with explanation.txt and config*.json
- Data files (used by scripts as configured):
  - assets/100states.txt or assets/100states_medium.txt, assets/filtered_future_suggested_pairs_IR10.txt, assets/pytheus_fullpaper.txt, assets/recent_quantum_papers.json
  - Optional feedback file (only for preparation step)
- Outputs
  - Researchers: ensemble/abstracts_{timestamp}.txt, logs under ensemble/ensemble_runs/o4-mini_*
  - Preparation: prepped_{timestamp}.txt (in the same ensemble dir by default)
  - Expert: run directory with configs and final_result_*, plus journal_*.txt under the journal directory
All scripts use OpenAI chat completions with the o4-mini model and do not pass a temperature parameter. API key is read from the environment variable OPENAI_API_KEY or (as a fallback) from API_key.txt when present.
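The key lookup order described above (environment variable first, then the fallback file) can be sketched as below. The function name `load_api_key` is hypothetical and the scripts' actual implementation may differ; only the variable name OPENAI_API_KEY and the file name API_key.txt come from this README.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(key_file: str = "API_key.txt") -> Optional[str]:
    """Return the API key from the environment, else from key_file, else None."""
    key = os.environ.get("OPENAI_API_KEY")
    if key and key.strip():
        return key.strip()
    path = Path(key_file)
    if path.is_file():
        # An empty file counts as "no key" rather than an empty string.
        return path.read_text(encoding="utf-8").strip() or None
    return None
```

Returning `None` instead of raising leaves it to the caller to decide whether a missing key is fatal.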
- Python 3.9+ recommended
- Install dependencies:
  - Using pip: pip install -r requirements.txt
- Provide your API key by either:
  - Setting the environment variable OPENAI_API_KEY, or
  - Creating a file API_key.txt with your key.
- Purpose: Generate abstracts by running the Researcher → Novelty Supervisor → Judge loop; log the full conversation; write short abstracts to the ensemble directory.
- Key flags (subset; defaults shown):
  - --ensemble-dir ENSEMBLE_DIR (default varies per repo setup)
  - --journal-dir JOURNAL_DIR
  - --all-journals (aggregate previous results across any journal*/ folders)
  - --max-researcher-calls N
- Notes:
  - Uses optional arXiv search; if the arxiv package isn't installed, the step is skipped gracefully.
  - Saves a snapshot of prompts in the run directory for reproducibility.
Example:
python3 researchers.py \
--ensemble-dir ensemble \
--journal-dir journal_o4-mini \
--all-journals \
--max-researcher-calls 20
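The graceful skip of the arXiv search when the arxiv package is missing follows a common optional-dependency pattern, sketched here. The helper names `optional_import` and `search_titles` are hypothetical, and how researchers.py actually queries arXiv is not documented in this README; the `arxiv.Search` and `arxiv.Client().results(...)` calls assume a recent release of the arxiv package.

```python
import importlib
from types import ModuleType
from typing import List, Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import a module by name, returning None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def search_titles(query: str, arxiv_module: Optional[ModuleType],
                  max_results: int = 5) -> List[str]:
    """Return matching paper titles, or an empty list when arxiv is unavailable."""
    if arxiv_module is None:
        print("arxiv package not installed; skipping arXiv search")
        return []
    search = arxiv_module.Search(query=query, max_results=max_results)
    return [result.title for result in arxiv_module.Client().results(search)]


# Resolve the optional dependency once, then pass it where needed.
arxiv_module = optional_import("arxiv")
```

Passing the module in as an argument keeps the skip path easy to exercise without network access.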
- Purpose: For each abstracts_{timestamp}.txt, extract the last Researcher message from the corresponding run logs and prepare prepped_{timestamp}.txt with an improved Pytheus config prompt.
- Key flags:
  - --ensemble-dir ENSEMBLE_DIR (default ensemble)
  - --examples-root assets/pytheus_examples
  - --pytheus-infos assets/PYTHEUS_EXPLICIT_INFOS.txt
  - --feedback-file PATH (optional; if omitted or missing, the feedback section is skipped)
  - --id abstracts_*.txt (optional; process a single abstract)
  - --output-dir DIR (default: same as --ensemble-dir)
  - --seed N (optional; determinism for example ordering)
Examples:
# Process all abstracts in the directory
python3 prep_expert.py \
--ensemble-dir ensemble \
--examples-root assets/pytheus_examples \
--pytheus-infos assets/PYTHEUS_EXPLICIT_INFOS.txt
# Process a single abstract and include a feedback file
python3 prep_expert.py \
--ensemble-dir ensemble \
--id abstracts_2025_11_12_1507_09.txt \
--feedback-file combined_abstracts_20250808_1905_feedback.txt
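Matching an abstract file to its run log relies on the shared timestamp. A minimal sketch of that lookup follows, assuming the timestamp is everything between the `abstracts_` prefix and the `.txt` suffix, and that the run directory name contains it verbatim; `find_run_dir` is a hypothetical name, not a function from the repository.

```python
from pathlib import Path
from typing import Optional


def find_run_dir(abstract_path: str, ensemble_dir: str) -> Optional[Path]:
    """Locate the run directory whose name contains the abstract's timestamp."""
    stem = Path(abstract_path).stem  # e.g. "abstracts_2025_11_12_1507_09"
    if not stem.startswith("abstracts_"):
        raise ValueError(f"unexpected abstract file name: {abstract_path}")
    timestamp = stem[len("abstracts_"):]
    runs_root = Path(ensemble_dir) / "ensemble_runs"
    if not runs_root.is_dir():
        return None
    matches = sorted(
        p for p in runs_root.iterdir() if p.is_dir() and timestamp in p.name
    )
    # Assumption: at most one run shares a timestamp; take the first if several.
    return matches[0] if matches else None
```

A `None` result corresponds to the case where an abstract has no matching run directory.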
- Purpose: Run the Expert agent to create a valid Pytheus config and execute it with a timeout; write journal entries and final results.
- Key flags:
  - --ensemble-dir ENSEMBLE_DIR (default ensemble_mediator_mario)
  - --journal-dir JOURNAL_DIR (default journal_o4-mini)
  - --all-journals (aggregates previous results in prompts)
  - --id abstracts_*.txt (optional; otherwise a random one is picked)
  - --max-internal-expert-calls N, --max-external-expert-calls N, --max-researcher-calls N
  - --openai-api-key-file PATH (optional)
Example:
python3 expert.py \
--ensemble-dir ensemble \
--journal-dir journal_o4-mini \
--all-journals
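Running Pytheus in a subprocess with a timeout, as the Expert script does, can be illustrated with the standard library. This sketch shows only the timeout mechanism, not any sandboxing, and uses stand-in commands rather than the real Pytheus invocation; `run_with_timeout` is a hypothetical helper name.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_timeout(cmd: List[str],
                     timeout_s: float) -> Optional[subprocess.CompletedProcess]:
    """Run cmd in a child process; return None if it exceeds timeout_s."""
    try:
        return subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return None


# Stand-in commands; the real script would launch the Pytheus run instead.
quick = run_with_timeout([sys.executable, "-c", "print('done')"], timeout_s=30)
slow = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5
)
print(quick.stdout.strip(), slow)
```

A `None` return lets the caller treat a timeout as one failed attempt and move on.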
- Missing API key
  - Set OPENAI_API_KEY or place a key in API_key.txt.
- Missing feedback file (preparation)
  - The script continues without the feedback section and prints a warning.
- No matching run directory for an abstract
  - Ensure the abstracts_{timestamp}.txt belongs to a run with a directory containing that timestamp under ensemble_dir/ensemble_runs/.
- Pytheus timeout or JSON decode errors
  - The Expert script captures the exception, logs a debugging message, and continues based on the call budget limits.
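The "capture, log, and continue within the call budget" behaviour for JSON decode errors can be sketched as a bounded retry loop. This is an illustration under assumptions, not the Expert script's code: `call_with_budget` and the `produce` callback (standing in for a model call) are hypothetical, and a timeout exception could be handled in the same `except` clause.

```python
import json
from typing import Callable, Optional


def call_with_budget(produce: Callable[[int], str],
                     max_calls: int) -> Optional[dict]:
    """Try up to max_calls times to get parseable JSON from produce(attempt)."""
    for attempt in range(max_calls):
        try:
            return json.loads(produce(attempt))
        except json.JSONDecodeError as exc:
            # Log and fall through to the next attempt, if any budget remains.
            print(f"attempt {attempt + 1}/{max_calls} failed: {exc}")
    return None


# Stand-in for a model call: the first reply is malformed, the second is valid.
replies = ["not json", '{"ok": true}']
result = call_with_budget(lambda i: replies[i], max_calls=2)
print(result)
```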
