eval generator #2

Merged (2 commits, Mar 24, 2025)
README.md (48 additions, 1 deletion)
@@ -178,7 +178,17 @@ The new CLI adds these capabilities not present in the main branch:
# Interactive dialog for viewing, editing, or generating system prompts
```

4. **Automatic Evaluation Generation**:

```bash
# Disable automatic evaluation generation
uv run ai-migrate migrate --dont-create-evals <file_paths>

# Manage evaluations
uv run ai-migrate migrate --manage evals
```

5. **Rich Help System**:
```bash
uv run ai-migrate --help
uv run ai-migrate <command> --help
@@ -205,6 +215,7 @@ eval "$(./bin/hermit env)"
Additional documentation is available in the `docs/` directory:

- [Evaluation Runner](docs/eval_runner.md) - Documentation for the evaluation runner system
- [Automatic Evaluation Generation](docs/eval_improvement.md) - Documentation for the automatic evaluation generation feature

## AI-Powered Project Setup

@@ -269,10 +280,12 @@ The new interactive CLI provides a more user-friendly experience with:
- `migrate` - Migrate one or more files or manage project resources
- Use `--manage examples` to manage example files
- Use `--manage system-prompt` to view or edit the system prompt
- Use `--manage evals` to manage evaluation test cases
- Use `--manifest-file` to specify a manifest file for batch processing
- Use `--rerun-passed` to re-run migrations that have already passed
- Use `--max-workers` to set the maximum number of parallel workers
- Use `--local-worktrees` to create worktrees alongside the git repo
- Use `--dont-create-evals` to disable automatic evaluation generation
- `status` - Show the status of migration projects
- See which files are passing, failing, or have not been processed
- `checkout` - Check out the branch for a failed migration attempt
@@ -307,3 +320,37 @@ uv run ai-migrate merge-branches
# Get help for a specific command
uv run ai-migrate migrate --help
```

## Automatic Evaluation Generation

The tool now automatically creates evaluation test cases from successful migrations. This helps build a comprehensive test suite and ensures that future versions of the migration system continue to work correctly.

### How It Works

1. When a migration succeeds (passes verification), the system:
- Captures the original source files before migration
- Creates a new directory in the `evals` directory with a timestamp-based name
- Saves the original files in the `source` subdirectory
- Creates a manifest file with the verification command and file information

2. These evaluations can then be used to:
- Test future versions of the migration system
- Ensure that regressions don't occur
- Benchmark performance and accuracy
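
For example, the evaluations generated this way can be inspected directly. The snippet below is an illustrative sketch only: the per-case directory names and layout are assumptions, not a documented interface.

```python
# Illustrative sketch: enumerate generated evaluation cases and count the
# source files captured for each one. Directory names are assumptions.
from pathlib import Path

evals_dir = Path("evals")
if evals_dir.is_dir():
    for case in sorted(p for p in evals_dir.iterdir() if p.is_dir()):
        sources = [f for f in (case / "source").rglob("*") if f.is_file()]
        print(f"{case.name}: {len(sources)} captured source file(s)")
```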

### Usage

By default, an evaluation is created automatically for every successful migration. You can disable this behavior or manage existing evaluations:

```bash
# Disable automatic evaluation creation
uv run ai-migrate migrate --dont-create-evals <file_paths>

# Manage evaluations
uv run ai-migrate migrate --manage evals
```

The evaluation management interface allows you to:
- List existing evaluations with details like file count and creation date
- Generate evaluations from a GitHub Pull Request
- Generate evaluations from recent successful migrations
docs/eval_improvement.md (52 additions, 0 deletions)
@@ -0,0 +1,52 @@
# Automatic Evaluation Generation

The system now automatically creates evaluation test cases from successful migrations. This behavior is enabled by default and can be disabled using the `--dont-create-evals` flag:

```bash
ai-migrate migrate --dont-create-evals <file_paths>
```

Each successful migration will generate a new evaluation in the project's `evals` directory with:
- The original source files
- The manifest used for the migration
- The verification command that was used

This ensures that all successful migrations contribute to the test suite, improving coverage and helping to catch regressions.

## Managing Evaluations

You can also manage evaluations using the CLI:

```bash
ai-migrate migrate --manage evals
```

This provides options to:
- **List existing evaluations**: View all evaluations in the project with details like file count and creation date
- **Generate evaluations from a PR**: Create evaluations based on a GitHub Pull Request
- **Generate evaluations from recent migrations**: Create evaluations from recent successful migrations (coming soon)

## How Automatic Evaluation Works

When a migration succeeds (passes verification), the system:

1. Captures the original source files before migration
2. Captures the transformed files after migration
3. Creates a new directory in the `evals` directory with a timestamp-based name
4. Saves the original files in the `source` subdirectory
5. Creates a manifest file with the verification command and file information
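
A minimal sketch of this capture step is shown below, assuming a timestamp-named case directory, a `source` subdirectory, and a JSON manifest; the helper name and manifest fields are illustrative, not the tool's actual implementation.

```python
# Minimal sketch of the capture step described above. The function name,
# manifest format, and directory layout are assumptions for illustration.
import json
import shutil
from datetime import datetime
from pathlib import Path


def create_eval_case(evals_dir: Path, original_files: dict[str, Path], verify_cmd: str) -> Path:
    """Save pre-migration sources and a manifest as a new evaluation case."""
    case_dir = evals_dir / datetime.now().strftime("%Y%m%d-%H%M%S")
    source_dir = case_dir / "source"
    source_dir.mkdir(parents=True)

    # Capture the original files as they were before migration.
    for rel_name, path in original_files.items():
        dest = source_dir / rel_name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, dest)

    # Record the verification command and basic file information.
    manifest = {"verify_command": verify_cmd, "files": sorted(original_files)}
    (case_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return case_dir
```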

These evaluations can then be used to:
- Test future versions of the migration system
- Ensure that regressions don't occur
- Benchmark performance and accuracy

## Disabling Automatic Evaluation

If you don't want to automatically create evaluations (for example, during development or testing), use the `--dont-create-evals` flag:

```bash
ai-migrate migrate --dont-create-evals <file_paths>
```

This will skip the evaluation creation step while still performing the migration as usual.
pyproject.toml (5 additions, 0 deletions)
@@ -9,6 +9,7 @@ dependencies = [
"httpx>=0.24.0",
"openai>=1.0.0",
"pytest",
"pytest-asyncio",
"click>=8.1.0",
"rich>=13.0.0",
"prompt_toolkit>=3.0.0",
@@ -29,3 +30,7 @@ packages = ["src/ai_migrate"]
dev = [
"ruff>=0.9.9",
]

[tool.pytest.ini_options]
asyncio_mode = "strict"
asyncio_default_fixture_loop_scope = "function"
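
The `pytest-asyncio` dependency and the strict asyncio mode configured above mean async tests must be explicitly marked. A minimal, hypothetical test file illustrating the required marker (the test name and path are assumptions):

```python
# Hypothetical test file, e.g. tests/test_async_example.py. With
# asyncio_mode = "strict", async tests run only when explicitly marked.
import asyncio

import pytest


@pytest.mark.asyncio
async def test_event_loop_runs():
    await asyncio.sleep(0)
    assert True
```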