
Source code for the VLDB 2026 submission TPCx-AI under the Microscope: A Benchmarking Debt Analysis

hpides/tpcx-ai-2-analysis

TPCx-AI under the Microscope: A Benchmarking Debt Analysis

This repository contains the source code for the paper "TPCx-AI under the Microscope: A Benchmarking Debt Analysis", submitted to VLDB 2026.

The folder notebooks/metadata_notebooks contains a set of notebooks for interactively analyzing the TPCx-AI datasets and workloads at SF1. To run an end-to-end evaluation on a remote server at SF1-30, follow the instructions in the data_center folder.

Interactive TPCx-AI Analysis (SF1)

To set up the environment for interactive exploration of the TPCx-AI data at SF1, execute the steps below.

Clone Repository:

git clone https://github.com/ilin-t/tpcx-ai-2-analysis

Download Raw Data:

https://drive.google.com/file/d/1IZPBFwakTzEQwO9cWeD-HVcQAm1O6L73/view?usp=sharing
mkdir data && cd data
unzip raw_data.zip

Environment setup

Install all TPCx-AI dependencies:

cd setup
bash setup-python.sh

The setup-python.sh script installs two environments:

  • python-venv for Use Cases: 1, 4, 8, 10
  • python-venv-ks for Use Cases: 2, 3, 5, 6, 7, 9
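
The mapping above can be captured in a small lookup, for example to check which environment to activate before opening a given notebook. This is a minimal sketch: the names `ENV_USE_CASES` and `env_for_use_case` are hypothetical and not part of the repository; only the environment names and use case numbers are taken from the list above.

```python
# Hypothetical helper (not part of the repository): maps a TPCx-AI use case
# number to the environment that the list above assigns to it.
ENV_USE_CASES = {
    "python-venv": (1, 4, 8, 10),
    "python-venv-ks": (2, 3, 5, 6, 7, 9),
}


def env_for_use_case(use_case: int) -> str:
    """Return the name of the environment listed for the given use case."""
    for env_name, use_cases in ENV_USE_CASES.items():
        if use_case in use_cases:
            return env_name
    raise ValueError(f"Unknown use case: {use_case} (expected 1 to 10)")
```

For instance, `env_for_use_case(4)` returns `"python-venv"`, while `env_for_use_case(7)` returns `"python-venv-ks"`.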

Run pipelines

cd ../notebooks/metadata_notebooks/

This folder contains all 10 use cases, each with a step-by-step breakdown and metadata generation. The pipelines generate long metadata files and visualizations of their data distribution and/or decision boundaries. To skip the metadata generation and analyze the JSON files independently, skip to Metadata Analysis.

Pipeline Comparison and Analysis

To compare pipelines and run the analysis at larger scale factors on already prepared data, go to the notebooks/analysis folder and run the notebooks inside python-venv.

Metadata Analysis

The raw metadata can be found in the notebooks/json_outputs directory.
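
As a starting point for inspecting the metadata outside the notebooks, the sketch below loads every `.json` file in a directory and reports its top-level structure. The function name `summarize_json_outputs` is hypothetical and not part of the repository. The sketch assumes only that each file is valid JSON and that the files sit directly in the given directory (it does not recurse); it makes no assumption about the metadata schema.

```python
import json
from pathlib import Path


def summarize_json_outputs(directory: str) -> dict:
    """Map each .json file name in `directory` to a short structural summary."""
    summary = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            content = json.load(f)
        if isinstance(content, dict):
            # JSON object: report its sorted top-level keys.
            summary[path.name] = sorted(content.keys())
        elif isinstance(content, list):
            # JSON array: report only its length.
            summary[path.name] = f"list with {len(content)} entries"
        else:
            # Scalar value: report its Python type name.
            summary[path.name] = type(content).__name__
    return summary
```

When run from the repository root, `summarize_json_outputs("notebooks/json_outputs")` gives a quick overview of which files exist and which top-level fields each one exposes, before loading any of them into a notebook.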

Default Pipelines

The default TPCx-AI pipelines can be found in the pipelines directory in .py format and in the notebooks/default_notebooks directory in .ipynb format.
