eval generator #2

Merged (2 commits, Mar 24, 2025)
README.md (48 additions, 1 deletion)
@@ -178,7 +178,17 @@ The new CLI adds these capabilities not present in the main branch:
# Interactive dialog for viewing, editing, or generating system prompts
```

4. **Automatic Evaluation Generation**:

```bash
# Disable automatic evaluation generation
uv run ai-migrate migrate --dont-create-evals <file_paths>

# Manage evaluations
uv run ai-migrate migrate --manage evals
```

5. **Rich Help System**:
```bash
uv run ai-migrate --help
uv run ai-migrate <command> --help
@@ -205,6 +215,7 @@ eval "$(./bin/hermit env)"
Additional documentation is available in the `docs/` directory:

- [Evaluation Runner](docs/eval_runner.md) - Documentation for the evaluation runner system
- [Automatic Evaluation Generation](docs/eval_improvement.md) - Documentation for the automatic evaluation generation feature

## AI-Powered Project Setup

@@ -269,10 +280,12 @@ The new interactive CLI provides a more user-friendly experience with:
- `migrate` - Migrate one or more files or manage project resources
- Use `--manage examples` to manage example files
- Use `--manage system-prompt` to view or edit the system prompt
- Use `--manage evals` to manage evaluation test cases
- Use `--manifest-file` to specify a manifest file for batch processing
- Use `--rerun-passed` to re-run migrations that have already passed
- Use `--max-workers` to set the maximum number of parallel workers
- Use `--local-worktrees` to create worktrees alongside the git repo
- Use `--dont-create-evals` to disable automatic evaluation generation
- `status` - Show the status of migration projects
- See which files are passing, failing, or have not been processed
- `checkout` - Check out the branch for a failed migration attempt
@@ -307,3 +320,37 @@ uv run ai-migrate merge-branches
# Get help for a specific command
uv run ai-migrate migrate --help
```

## Automatic Evaluation Generation

The tool now automatically creates evaluation test cases from successful migrations. This helps build a comprehensive test suite and ensures that future versions of the migration system continue to work correctly.

### How It Works

1. When a migration succeeds (passes verification), the system:
- Captures the original source files before migration
- Creates a new directory in the `evals` directory with a timestamp-based name
- Saves the original files in the `source` subdirectory
- Creates a manifest file with the verification command and file information

2. These evaluations can then be used to:
- Test future versions of the migration system
- Ensure that regressions don't occur
- Benchmark performance and accuracy
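
For example, the evaluations generated this way can be inspected directly. The snippet below is an illustrative sketch only: the per-case directory names and layout are assumptions, not a documented interface.

```python
# Illustrative sketch: enumerate generated evaluation cases and count the
# source files captured for each one. Directory names are assumptions.
from pathlib import Path

evals_dir = Path("evals")
if evals_dir.is_dir():
    for case in sorted(p for p in evals_dir.iterdir() if p.is_dir()):
        sources = [f for f in (case / "source").rglob("*") if f.is_file()]
        print(f"{case.name}: {len(sources)} captured source file(s)")
```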

### Usage

By default, an evaluation is created automatically for every successful migration. You can disable this behavior or manage existing evaluations:

```bash
# Disable automatic evaluation creation
uv run ai-migrate migrate --dont-create-evals <file_paths>

# Manage evaluations
uv run ai-migrate migrate --manage evals
```

The evaluation management interface allows you to:
- List existing evaluations with details like file count and creation date
- Generate evaluations from a GitHub Pull Request
- Generate evaluations from recent successful migrations
docs/eval_improvement.md (52 additions, 0 deletions)
@@ -0,0 +1,52 @@
# Automatic Evaluation Generation

The system now automatically creates evaluation test cases from successful migrations. This behavior is enabled by default and can be disabled using the `--dont-create-evals` flag:

```bash
ai-migrate migrate --dont-create-evals <file_paths>
```

Each successful migration will generate a new evaluation in the project's `evals` directory with:
- The original source files
- The manifest used for the migration
- The verification command that was used

This ensures that all successful migrations contribute to the test suite, improving coverage and helping to catch regressions.

## Managing Evaluations

You can also manage evaluations using the CLI:

```bash
ai-migrate migrate --manage evals
```

This provides options to:
- **List existing evaluations**: View all evaluations in the project with details like file count and creation date
- **Generate evaluations from a PR**: Create evaluations based on a GitHub Pull Request
- **Generate evaluations from recent migrations**: Create evaluations from recent successful migrations (coming soon)

## How Automatic Evaluation Works

When a migration succeeds (passes verification), the system:

1. Captures the original source files before migration
2. Captures the transformed files after migration
3. Creates a new directory in the `evals` directory with a timestamp-based name
4. Saves the original files in the `source` subdirectory
5. Creates a manifest file with the verification command and file information
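
A minimal sketch of this capture step is shown below, assuming a timestamp-named case directory, a `source` subdirectory, and a JSON manifest; the helper name and manifest fields are illustrative, not the tool's actual implementation.

```python
# Minimal sketch of the capture step described above. The function name,
# manifest format, and directory layout are assumptions for illustration.
import json
import shutil
from datetime import datetime
from pathlib import Path


def create_eval_case(evals_dir: Path, original_files: dict[str, Path], verify_cmd: str) -> Path:
    """Save pre-migration sources and a manifest as a new evaluation case."""
    case_dir = evals_dir / datetime.now().strftime("%Y%m%d-%H%M%S")
    source_dir = case_dir / "source"
    source_dir.mkdir(parents=True)

    # Capture the original files as they were before migration.
    for rel_name, path in original_files.items():
        dest = source_dir / rel_name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, dest)

    # Record the verification command and basic file information.
    manifest = {"verify_command": verify_cmd, "files": sorted(original_files)}
    (case_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return case_dir
```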

These evaluations can then be used to:
- Test future versions of the migration system
- Ensure that regressions don't occur
- Benchmark performance and accuracy

## Disabling Automatic Evaluation

If you don't want to automatically create evaluations (for example, during development or testing), use the `--dont-create-evals` flag:

```bash
ai-migrate migrate --dont-create-evals <file_paths>
```

This will skip the evaluation creation step while still performing the migration as usual.
pyproject.toml (5 additions, 0 deletions)
@@ -9,6 +9,7 @@ dependencies = [
"httpx>=0.24.0",
"openai>=1.0.0",
"pytest",
"pytest-asyncio",
"click>=8.1.0",
"rich>=13.0.0",
"prompt_toolkit>=3.0.0",
@@ -29,3 +30,7 @@ packages = ["src/ai_migrate"]
dev = [
"ruff>=0.9.9",
]

[tool.pytest.ini_options]
asyncio_mode = "strict"
asyncio_default_fixture_loop_scope = "function"
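
The `pytest-asyncio` dependency and the strict asyncio mode configured above mean async tests must be explicitly marked. A minimal, hypothetical test file illustrating the required marker (the test name and path are assumptions):

```python
# Hypothetical test file, e.g. tests/test_async_example.py. With
# asyncio_mode = "strict", async tests run only when explicitly marked.
import asyncio

import pytest


@pytest.mark.asyncio
async def test_event_loop_runs():
    await asyncio.sleep(0)
    assert True
```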