This repository contains the source code for the paper "TPCx-AI under the Microscope: A Benchmarking Debt Analysis", submitted to VLDB 2026.
In the folder notebooks/metadata_notebooks, you can find a set of notebooks to interactively analyze the TPCx-AI datasets and workloads at SF1.
To access the code for running an end-to-end evaluation on a remote server at SF1-30, follow the instructions in the data_center folder.
To set up the environment for interactive SF1 exploration of the TPCx-AI data, execute the steps below.
git clone https://github.com/ilin-t/tpcx-ai-2-analysis
Download the raw data archive (raw_data.zip) from: https://drive.google.com/file/d/1IZPBFwakTzEQwO9cWeD-HVcQAm1O6L73/view?usp=sharing
mkdir data && cd data
unzip raw_data.zip
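If you prefer to script this step, the snippet below is a minimal sketch that downloads and unpacks the archive from Python. It assumes the third-party gdown package is installed (pip install gdown) and that the share link above stays publicly readable.

```python
# Minimal sketch: download raw_data.zip from the share link above and unpack it
# into the data/ directory. Assumes `pip install gdown` has been run.
import zipfile

import gdown

URL = "https://drive.google.com/file/d/1IZPBFwakTzEQwO9cWeD-HVcQAm1O6L73/view?usp=sharing"

# fuzzy=True lets gdown extract the file id directly from the share link.
gdown.download(url=URL, output="raw_data.zip", fuzzy=True)

# Unpack the archive into data/ (created if it does not exist yet).
with zipfile.ZipFile("raw_data.zip") as archive:
    archive.extractall("data")
```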
Install all TPCx-AI dependencies:
cd setup
bash setup-python.sh
The setup-python.sh script installs two environments:
- python-venv for Use Cases 1, 4, 8, 10
- python-venv-ks for Use Cases 2, 3, 5, 6, 7, 9
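As a small, purely illustrative helper (the environment names and use-case numbers are taken from the list above), the mapping can be expressed as:

```python
# Illustrative only: which virtual environment runs which TPCx-AI use case,
# mirroring the environments installed by setup-python.sh.
VENV_FOR_USE_CASE = {
    **{uc: "python-venv" for uc in (1, 4, 8, 10)},
    **{uc: "python-venv-ks" for uc in (2, 3, 5, 6, 7, 9)},
}

def venv_for(use_case: int) -> str:
    """Return the environment expected to run the given use case."""
    return VENV_FOR_USE_CASE[use_case]

print(venv_for(5))  # -> python-venv-ks
```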
cd ../notebooks/metadata_notebooks/
This folder contains all 10 use cases with a step-by-step breakdown and metadata generation.
The pipelines generate long metadata files and visualizations of their data distributions and/or decision boundaries. To skip the metadata generation and analyze the JSON files independently, skip to Metadata Analysis.
To compare some of the pipelines and run the analysis at larger scale factors on already prepared data, go to the notebooks/analysis folder and run the notebooks inside python-venv.
The raw metadata can be found in the notebooks/json_outputs directory.
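The snippet below is a minimal sketch for getting an overview of those files; it only assumes that they are regular JSON documents and does not rely on any specific file names.

```python
# Sketch: list the pre-generated metadata files and print their top-level structure.
import json
from pathlib import Path

json_dir = Path("notebooks/json_outputs")
for path in sorted(json_dir.glob("*.json")):
    with path.open() as f:
        metadata = json.load(f)
    # Show the file name and the first few top-level keys (or the value type otherwise).
    summary = list(metadata)[:10] if isinstance(metadata, dict) else type(metadata).__name__
    print(path.name, summary)
```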
The default TPCx-AI pipelines can be found in the pipelines directory in .py format and in the notebooks/default_notebooks directory in .ipynb format.
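To get a quick overview of what is available, a short sketch like the one below lists both variants; only the directory paths mentioned above are assumed.

```python
# Sketch: list the default TPCx-AI pipelines in both formats.
from pathlib import Path

print("Python pipelines (pipelines/):")
for script in sorted(Path("pipelines").glob("*.py")):
    print(" -", script.name)

print("Notebook versions (notebooks/default_notebooks/):")
for notebook in sorted(Path("notebooks/default_notebooks").glob("*.ipynb")):
    print(" -", notebook.name)
```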