JSCore is a dataset designed to support research on automated unit test generation for JavaScript and TypeScript. This repository provides a curated benchmark of real-world open-source projects together with a static analysis pipeline for extracting language features, code complexity metrics, and project metadata.
Motivated by the lack of a broadly adopted benchmark for JavaScript test generation, JSCore captures the diversity of real-world projects while providing an extensible pipeline that can be expanded with additional analyses and projects. The dataset is intended to support the evaluation of modern test generation approaches, including search-based tools, LLM-based systems, and emerging agentic systems, and to contribute toward a standardized benchmark for testing research on dynamically typed languages.
The dataset consists of a curated list of JS and TS repositories. All projects are already included under the `Benchmark-Projects` directory. Alternatively, a complete list is provided in `benchmark.txt` for those who wish to clone them manually. You can manage these using the provided scripts:
- Cloning: Use `scripts/clone_benchmark.sh` to clone all repositories listed in `benchmark.txt` into the `Benchmark-Projects` directory.
- Installation & Validation: Run `scripts/install_benchmark.sh` to perform `npm install` on the benchmarked projects, making them ready for test generation evaluation.
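For manual cloning, the steps performed by the script can be sketched in Python. This is a hypothetical sketch only: it assumes each non-comment line of `benchmark.txt` holds a repository URL followed by a commit hash, which may differ from the file's actual layout.

```python
import subprocess
from pathlib import Path


def parse_benchmark(text: str) -> list[tuple[str, str]]:
    """Return (url, commit) pairs from benchmark.txt-style text.

    Assumes a "<url> <commit>" line format (an assumption, not the
    documented schema); blank lines and '#' comments are skipped.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        url, commit = line.split()[:2]
        pairs.append((url, commit))
    return pairs


def clone_pinned(benchmark_file: str, target_dir: str = "Benchmark-Projects") -> None:
    """Clone each repository and check out its pinned commit."""
    for url, commit in parse_benchmark(Path(benchmark_file).read_text()):
        # Derive a directory name from the URL's last path segment.
        name = url.rstrip("/").removesuffix(".git").rsplit("/", 1)[-1]
        dest = Path(target_dir) / name
        subprocess.run(["git", "clone", url, str(dest)], check=True)
        subprocess.run(["git", "-C", str(dest), "checkout", commit], check=True)
```

Pinning to a commit rather than a branch keeps evaluation results reproducible across runs.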
If you wish to execute the pipeline on your own set of repositories or modify the analysis criteria:
- Prerequisites: Ensure you have Python (3.x), Node.js and npm installed.
- Environment Setup: Install the necessary Python and Node.js dependencies:

  ```shell
  pip install -r requirements.txt
  npm install
  ```
- Repository Discovery: Run `scripts/get_gh_repos.py` to populate `repos.txt` with potential GitHub repositories.
- Cloning: Execute `scripts/clone_repos.sh` to clone the discovered repositories.
- Feature Analysis: Run `evaluation_pipeline/evaluate.py` to analyze the complexity and language features of the collected projects.
- Filtering: Use `evaluation_pipeline/analyse_and_filter.py` to apply filtering criteria. The final project list with analysis metrics is stored in `filtered_projects_with_analysis_result.csv`.
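The filtering step can be pictured as a threshold pass over the analysis CSV. The sketch below is illustrative only: the column names (`loc`, `avg_cyclomatic_complexity`) and thresholds are assumptions, not the pipeline's actual schema or criteria.

```python
import csv
import io


def filter_projects(
    csv_text: str,
    min_loc: int = 1000,
    max_complexity: float = 15.0,
) -> list[dict]:
    """Keep rows whose metrics fall within the given thresholds.

    Column names and thresholds are illustrative; consult the real
    analysis CSV for the pipeline's actual fields and criteria.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = []
    for row in reader:
        large_enough = int(row["loc"]) >= min_loc
        tractable = float(row["avg_cyclomatic_complexity"]) <= max_complexity
        if large_enough and tractable:
            kept.append(row)
    return kept
```

Filtering on size and complexity in this way would exclude trivial toy projects on one end and intractably complex ones on the other.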
The project includes a web-based explorer to visualize the dataset features. You can access it via `index.html` or the GitHub Pages deployment.
The following projects constitute the core benchmark dataset:
- AdminLTE
- Autoprefixer
- Bower
- Dropzone
- Express-JWT
- Extract-Text-Webpack-Plugin
- Hoodie
- JSON-Server
- Knex
- Lodash
- Markdown-Here
- MDX
- NeDB
- Release-It
- ScrollMagic
- ScrollReveal
- ShellJS
- Sitespeed.io
- Spectacle-Code-Slide
- Supertest
- Tabulator
- PostCSS
- Puppeteer
- Tabby
- Vue-Devtools
- Weex-UI
- ZY-Player
- Commander.js
- Express
- JavaScript-Algorithms
- Crawler-URL-Parser
- Delta
- Node-Dir
- Node-Glob
- Node-Graceful-FS
- Spacl-Core
- Zip-a-folder
- Beatbump
- Felte
- Pandora
- Xyflow
The complete list with specific commit hashes is available in `benchmark.txt`.
This project and its accompanying code are licensed under the MIT License. Individual benchmark projects within the `Benchmark-Projects` directory remain subject to their respective original licenses.