README.md: 20 additions, 0 deletions
@@ -161,6 +161,26 @@ bash evaluate.sh

Results will be generated in `run_logs/` with detailed metrics and analysis.

## πŸ–₯️ Configuration UI Tool

AU-Harness includes a web-based configuration UI tool to help users easily create and customize evaluation configurations without manually editing YAML files.

### Features
- Interactive task selection from all supported categories (Speech Recognition, Paralinguistics, Audio Understanding, etc.)
- Model configuration with preset templates for common models like GPT-4o-mini and Gemini
- Advanced options for filtering, judge settings, and generation parameters
- Copy to clipboard or download functionality

### Usage
1. Navigate to the `ui/` directory
2. Open `index.html` in your web browser
3. Select the tasks you want to evaluate from the categorized task list
4. Configure your models by adding model endpoints, API keys, and parameters
5. Adjust advanced options like sample limits, language filters, and judge settings
6. Generate the YAML configuration, then copy or download it for use with `evaluate.sh`

This tool simplifies the setup of complex evaluation runs by providing a user-friendly interface for building `config.yaml` files, making it easier to get started with AU-Harness evaluations.
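For orientation, a config produced by the tool might look roughly like the sketch below. All field names and values here are illustrative, not the authoritative AU-Harness schema; trust the UI's preview and the configuration documentation for the exact keys:

```yaml
# Illustrative sketch only -- field names are hypothetical
models:
  - name: gpt-4o-mini
    endpoint: https://api.openai.com/v1
    api_key: ${OPENAI_API_KEY}
tasks:
  - name: librispeech
    metrics: [wer]
filters:
  sample_limit: 100
  language: en
```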

## πŸ’» Usage

AU-Harness requires setting up a running configuration file (`config.yaml`) to define your evaluation parameters. This file controls which models, datasets, and metrics are used in your evaluation.
ui/README.md: 135 additions, 0 deletions
@@ -0,0 +1,135 @@
# AU-Harness UI Tool

A user-friendly web interface for configuring and running audio model evaluations with the AU-Harness framework.

## πŸš€ Quick Start

1. **Open the UI**: Simply open `index.html` in your web browser
2. **Select Tasks**: Browse task categories and select specific tasks with their metrics
3. **Configure Models**: Choose from preset models or configure custom endpoints
4. **Generate Config**: Preview and download the generated YAML configuration

## πŸ“‹ Features

### Task Selection
- **Visual Category Navigation**: 6 task categories with clear descriptions
- **Smart Metric Filtering**: Automatically shows supported metrics for each task
- **Multi-Selection Support**: Select multiple tasks across different categories
- **Real-time Feedback**: Visual indicators show selected tasks and metrics

### Model Configuration
- **Preset Models**: Quick setup for common models (GPT-4o, Gemini, Qwen)
- **Custom Model Support**: Add any OpenAI-compatible endpoint
- **Sharding Configuration**: Automatic model instance management
- **Connection Validation**: Built-in endpoint testing

### Advanced Options
- **Dataset Filtering**: Control sample limits, duration ranges, and language
- **Judge Settings**: Configure LLM judges for evaluation
- **Generation Parameters**: Override model parameters per task
- **Prompt Customization**: Modify system and user prompts

### Configuration Management
- **YAML Preview**: See generated configuration
- **Export Options**: Download as YAML file or copy to clipboard

## πŸ› οΈ Technical Details

### Architecture
- **Frontend**: Vanilla HTML5, CSS3, JavaScript (ES6+)
- **No Dependencies**: Completely self-contained, no npm packages required
- **Responsive Design**: Works on desktop, tablet, and mobile
- **Modern CSS**: CSS Grid, Flexbox, Custom Properties
- **Accessibility**: WCAG 2.1 compliant with semantic HTML

### File Structure
```
ui/
β”œβ”€β”€ index.html         # Main application page
β”œβ”€β”€ styles.css         # Styling (CSS custom properties)
β”œβ”€β”€ app.js             # Application logic
β”œβ”€β”€ generate_tasks.py  # Script that generates tasks.js and tasks.json
β”œβ”€β”€ tasks.js           # Task categories and metrics data (as JavaScript)
β”œβ”€β”€ tasks.json         # Task categories and metrics data (as JSON)
└── README.md          # This documentation
```
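A minimal sketch of how a generator like `generate_tasks.py` could emit both data files from one source. The task data below is hypothetical (the real script derives it from the AU-Harness task registry); the point is the dual output: `tasks.json` for tooling, and `tasks.js` wrapping the same data in a global, since a page opened via `file://` generally cannot `fetch()` a local JSON file.

```python
import json

# Hypothetical task data; the real generator reads this from the framework.
TASKS = {
    "Speech Recognition": {"librispeech": ["wer"]},
    "Paralinguistics": {"emotion_recognition": ["accuracy"]},
}

def emit(tasks, json_path="tasks.json", js_path="tasks.js"):
    payload = json.dumps(tasks, indent=2)
    # tasks.json: plain data, consumable by other tooling.
    with open(json_path, "w") as f:
        f.write(payload)
    # tasks.js: the same data as a global constant, so the UI works
    # even when index.html is opened directly from disk.
    with open(js_path, "w") as f:
        f.write(f"const TASKS = {payload};\n")

emit(TASKS)
```

Loading the data through a script tag (`<script src="tasks.js">`) is the design choice that keeps the UI dependency-free and usable without a local web server.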

### Browser Support
- Chrome 90+
- Firefox 88+
- Safari 14+
- Edge 90+

## πŸ“– Usage Guide

### 1. Selecting Tasks
1. Click on any category card to expand it
2. Check the boxes next to desired tasks
3. View selected metrics in the "Selected Tasks" section
4. Remove tasks by clicking the "Remove" button

### 2. Configuring Models
1. Choose the "Preset Models" tab for quick setup
2. Check the boxes next to the models you want
3. Alternatively, switch to the "Custom Model" tab for custom endpoints
4. Fill in the model name, endpoint, and API key

### 3. Advanced Configuration
1. Set sample limits to control evaluation size
2. Adjust duration filters for audio length constraints
3. Select target language for evaluation
4. Configure additional options as needed
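In the exported YAML these options typically surface as simple filter fields. The names below are illustrative, not the authoritative schema; the UI's preview shows the actual keys for your AU-Harness version:

```yaml
# Illustrative field names -- verify against the UI's YAML preview
filters:
  sample_limit: 200    # cap on evaluated samples per task
  min_duration: 1.0    # audio length bounds, in seconds
  max_duration: 30.0
  language: en
```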

### 4. Generating Configuration
1. Click "Generate Config" to create YAML
2. Review the generated configuration in the preview
3. Click "Download YAML" to save the file
4. Use the config with the AU-Harness evaluation engine

### 5. Running Evaluations
1. Click "Run Evaluation" to start the process
2. Monitor progress in the Results Dashboard
3. View scores and metrics as they complete
4. Export results for further analysis


## πŸš€ Integration with AU-Harness

The generated YAML configuration is fully compatible with the AU-Harness evaluation engine. Use it as follows:

```bash
# Run an evaluation with any generated config
python evaluate.py --config your-config.yaml

# For example, with the UI's default download name
python evaluate.py --config au-harness-config.yaml
```


## πŸ†˜ Troubleshooting

### Common Issues

**Q: Configuration preview is empty**
A: Make sure you've selected at least one task and one model before generating the config.

**Q: Download doesn't work**
A: Check your browser's download settings and ensure pop-ups are allowed for this site.

**Q: Styling looks broken**
A: Some browsers restrict pages loaded from `file://` URLs. If styles or scripts fail to load, serve the directory over HTTP instead, e.g. run `python3 -m http.server` from the `ui/` directory and open `http://localhost:8000`.

### Performance Tips

- For large evaluations, consider reducing sample limits initially
- Use preset models for faster setup
- Clear browser cache if experiencing issues with updates

## πŸ“ž Support

For issues with the UI tool, please check:
1. Browser console for JavaScript errors
2. Network tab for any failed resource loads
3. This documentation for usage guidance

For issues with the AU-Harness framework itself, please refer to the main project documentation.