ShareBench-Gen is a workload generator for OLAP workloads based on the TPC-DS data set and queries. It is intended to be used with ShareBench-Base to perform real-world performance analysis studies, especially of distributed resource-sharing mechanisms and policies.
We recommend following the installation instructions of ShareBench-Base. To use ShareBench-Gen in isolation, only the Python environment setup step is needed.
This repository includes statistics for a limited number of queries and date ranges that can be used out of the box for generating new workloads. If the statistics need to be gathered anew (for example, because the target system performs differently) or new queries are to be added, follow the process below.
- Follow the guide to generate data.
- If new queries or date ranges have been added, re-build the image.
- Use the collect_query_stats.py script to collect runtime statistics of the queries.
- Collect and save the query data using the query-stats notebook.
- Import the new data into the generator.
The queries can be found in the docker/queries directory. Any file with a .sql extension in this folder is considered a query.
The docker/queries/dates.json file defines the date ranges to be used.
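To check which queries and date ranges would be picked up, the directory conventions above can be sketched in a few lines of Python. This is only an illustration, not the project's own discovery code: the helper names `list_queries` and `load_date_ranges` are hypothetical, the lookup is assumed to be non-recursive and case-sensitive, and the structure of dates.json is not assumed, so its contents are returned as parsed.

```python
import json
from pathlib import Path


def list_queries(queries_dir):
    """Return the .sql files directly inside queries_dir, sorted by name.

    Assumption: only the top level of the folder is scanned and the
    extension match is case-sensitive.
    """
    return sorted(
        p for p in Path(queries_dir).iterdir()
        if p.is_file() and p.suffix == ".sql"
    )


def load_date_ranges(queries_dir):
    """Parse dates.json and return its contents unchanged.

    The schema of the file is not assumed here.
    """
    with open(Path(queries_dir) / "dates.json", encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    queries_dir = Path("docker/queries")
    if queries_dir.is_dir():
        for query in list_queries(queries_dir):
            print(query.name)
        if (queries_dir / "dates.json").is_file():
            print(load_date_ranges(queries_dir))
```

Run from the repository root, this prints the name of each query file followed by the parsed date ranges, which is a quick way to confirm that a newly added query or date range is in place before re-building the image.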
The workload generator exists in the form of a Jupyter notebook. The notebook includes use case examples which should serve as a manual.
Note that after generating and saving a workload, the Docker image has to be re-built and pushed, which can be done using the image script (as described in Image Modifications).