An AI software engineering agent benchmarking platform that evaluates real-world performance on open-source GitHub issues through automated PR generation and assessment.
AISWE Bench is a benchmarking platform that evaluates AI software engineering agents by testing their ability to solve real-world GitHub issues through automated pull request generation. The platform provides comprehensive metrics and comparisons between different AI agents.
- Real-world Evaluation: Tests AI agents on actual open source GitHub issues
- Automated PR Generation: Agents create pull requests to solve issues
- Comprehensive Metrics: Success rates, response times, code quality scores
- Agent Comparison: Side-by-side comparison of different AI agents
- Academic Design: Clean, minimal interface inspired by academic research sites
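As a rough illustration of the metrics and agent-comparison features above, benchmark results for each agent might be modeled and ranked like this. This is a hypothetical sketch: the field names (`successRate`, `avgResponseTimeSec`, `codeQualityScore`) and the `rankAgents` helper are illustrative, not the platform's actual schema or API.

```javascript
// Hypothetical shape of one agent's benchmark results.
// Field names are illustrative; the real platform may differ.
const results = [
  { agent: "Dashwave", successRate: 0.62, avgResponseTimeSec: 95, codeQualityScore: 8.1 },
  { agent: "Google Jules", successRate: 0.58, avgResponseTimeSec: 110, codeQualityScore: 7.9 },
];

// Rank agents by success rate, breaking ties on code quality.
function rankAgents(rows) {
  return [...rows].sort(
    (a, b) =>
      b.successRate - a.successRate ||
      b.codeQualityScore - a.codeQualityScore
  );
}

console.log(rankAgents(results).map((r) => r.agent));
// → [ 'Dashwave', 'Google Jules' ]
```

A side-by-side comparison view would simply render each row of the ranked array.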
- Dashwave - AI-powered software engineering agent (Sponsored)
- Google Jules - Google's AI coding assistant
- Frontend: React.js 18
- Styling: CSS3 with academic design principles
- Fonts: Inter (UI) and JetBrains Mono (code)
- Deployment: Ready for hosting on bench.aiswe.dev
- Node.js 16+
- npm or yarn
- Clone the repository:

```bash
git clone <repository-url>
cd aiswe-bench
```

- Install dependencies:

```bash
npm install
```

- Start the development server:

```bash
npm start
```

- Open http://localhost:3000 to view it in the browser.

To create a production build:

```bash
npm run build
```

This creates a `build` folder with optimized production files ready for deployment.
src/
├── components/ # React components
│ ├── Header.js # Navigation header
│ ├── Hero.js # Hero section
│ ├── BenchmarkSection.js # Main benchmark comparison
│ └── Footer.js # Footer with links
├── App.js # Main app component
├── App.css # App-specific styles
├── index.js # React entry point
└── index.css # Global styles
The site follows academic design principles inspired by research platforms like Epoch AI and personal academic sites:
- Minimal Typography: Clean, readable fonts (Inter for UI, JetBrains Mono for code)
- Neutral Color Palette: Subtle grays and blues with high contrast
- Academic Layout: Structured, research-focused presentation
- Responsive Design: Works across all device sizes
- Accessibility: High contrast and readable text
- Issue Selection: Real GitHub issues from open source projects
- Agent Processing: AI agents analyze issues and generate solutions
- PR Creation: Automated pull request generation
- Evaluation: Assessment of code quality, correctness, and completeness
- Scoring: Comprehensive metrics and comparison
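The steps above can be sketched as a simple pipeline. This is a hedged illustration only: `runBenchmark`, `evaluatePR`, and the scoring formula are placeholder names and logic, not the platform's actual implementation.

```javascript
// Illustrative sketch of the benchmark flow described above.
// All names and scoring rules here are placeholders.
function evaluatePR(pr) {
  // Evaluation: in reality this would run tests, linters, and reviewers.
  const passes = pr.patch.includes("fix"); // toy heuristic
  return {
    quality: 7,
    correctness: passes ? 9 : 2,
    completeness: passes ? 8 : 3,
  };
}

function runBenchmark(issue, agent) {
  // Agent Processing: the agent analyzes the issue and proposes a patch.
  const patch = agent.solve(issue);

  // PR Creation: wrap the patch as a pull-request-like object.
  const pr = { issueId: issue.id, patch, author: agent.name };

  // Evaluation: score quality, correctness, and completeness (0-10 each).
  const evaluation = evaluatePR(pr);

  // Scoring: aggregate into a single comparable metric.
  const score =
    (evaluation.quality + evaluation.correctness + evaluation.completeness) / 3;

  return { pr, evaluation, score };
}

// Example run with a stub agent standing in for a real AI agent.
const stubAgent = { name: "StubAgent", solve: (issue) => `fix for #${issue.id}` };
const result = runBenchmark({ id: 42 }, stubAgent);
console.log(result.score); // → 8
```

In the real platform, each stage would involve external systems (the GitHub API for issue selection and PR creation, a test harness for evaluation); the sketch only shows how the stages compose.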
This project is sponsored by Dashwave, which is reflected in the design and presentation of benchmark results.
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Email: [email protected]
- GitHub: aiswe-bench
The site is designed to be hosted at bench.aiswe.dev. The build output is optimized for production deployment on any static hosting service.