ShareBench-Gen

ShareBench-Gen generates OLAP workloads based on the TPC-DS dataset and queries. It is intended to be used together with ShareBench-Base for real-world performance analysis studies, particularly of distributed resource-sharing mechanisms and policies.

Installation

We recommend following the ShareBench-Base installation guide. To use ShareBench-Gen on its own, only the Python environment setup step is required.

Collecting Query Data

This repository includes statistics for a limited set of queries and date ranges, which can be used out of the box to generate new workloads. To collect the statistics anew (for example, because the target system performs differently) or to add new queries, follow the process below:

  1. Follow the guide to generate data.
  2. If new queries or date ranges have been added, rebuild the image.
  3. Use the collect_query_stats.py script to collect runtime statistics for the queries.
  4. Collect and save the query data using the query-stats notebook.
  5. Import the new data into the generator.
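
Runtime statistics of the kind gathered in steps 3 and 4 summarize repeated measurements of each query. The sketch below shows one way such a summary could be computed per query and date range. It is a hypothetical illustration: the input format (tuples of query name, date-range label, and runtime in seconds) and the function `summarize_runtimes` are assumptions, not the actual output of collect_query_stats.py or the query-stats notebook.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical record format: (query name, date-range label, runtime in seconds).
Measurement = tuple[str, str, float]


def summarize_runtimes(measurements: list[Measurement]) -> dict:
    """Group runtimes by (query, date range) and compute summary statistics."""
    groups = defaultdict(list)
    for query, date_range, runtime in measurements:
        groups[(query, date_range)].append(runtime)

    summary = {}
    for key, runtimes in groups.items():
        summary[key] = {
            "runs": len(runtimes),
            "mean": mean(runtimes),
            "median": median(runtimes),
            # The sample standard deviation needs at least two runs; use 0.0 otherwise.
            "stdev": stdev(runtimes) if len(runtimes) > 1 else 0.0,
        }
    return summary


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    runs = [("q3", "range-1", 2.0), ("q3", "range-1", 4.0), ("q7", "range-2", 1.5)]
    for key, stats in sorted(summarize_runtimes(runs).items()):
        print(key, stats)
```

Keying by (query, date range) mirrors the fact that the repository provides statistics for specific combinations of queries and date ranges.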

The queries are located in the docker/queries directory; any file with a .sql extension in this directory is treated as a query. The docker/queries/dates.json file defines the date ranges to use.
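
The discovery rule above (every .sql file directly inside docker/queries is a query) can be sketched as follows. The helper names `discover_queries` and `load_date_ranges` are hypothetical, using the file stem as the query name is an assumption, and since the schema of dates.json is not described here, the sketch loads it as generic JSON without interpreting it.

```python
import json
from pathlib import Path


def discover_queries(queries_dir) -> dict[str, str]:
    """Map query name -> SQL text for every *.sql file directly in queries_dir."""
    queries = {}
    # glob("*.sql") is non-recursive: only files in this directory are considered.
    for path in sorted(Path(queries_dir).glob("*.sql")):
        # Assumption: the file stem (e.g. "q3" for "q3.sql") serves as the query name.
        queries[path.stem] = path.read_text()
    return queries


def load_date_ranges(queries_dir):
    """Load dates.json from queries_dir as plain JSON, without assuming its schema."""
    with open(Path(queries_dir) / "dates.json") as f:
        return json.load(f)


if __name__ == "__main__":
    base = Path("docker/queries")  # relative to the repository root
    if base.is_dir():
        print("Queries:", ", ".join(discover_queries(base)))
        if (base / "dates.json").is_file():
            print("Date ranges:", load_date_ranges(base))
```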

Generating Workloads

The workload generator is implemented as a Jupyter notebook, which includes use-case examples that serve as a manual.
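
As a rough illustration of what a generated workload can look like, the sketch below samples job arrivals from a Poisson process (exponentially distributed inter-arrival times) and assigns each job a uniformly random query and date range. This is one common approach to synthetic workload generation and is not necessarily what the notebook does; `generate_workload`, `WorkloadEntry`, `rate_per_s`, and the date-range labels are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadEntry:
    arrival_s: float  # submission time in seconds since workload start
    query: str        # query name, e.g. the stem of a .sql file
    date_range: str   # hypothetical date-range label


def generate_workload(queries, date_ranges, rate_per_s, duration_s, seed=None):
    """Sample jobs with Poisson arrivals over [0, duration_s).

    Inter-arrival times are exponential with mean 1 / rate_per_s; each job
    picks a query and a date range uniformly at random.
    """
    if rate_per_s <= 0:
        raise ValueError("rate_per_s must be positive")
    rng = random.Random(seed)  # seeded RNG makes the workload reproducible
    workload = []
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        workload.append(WorkloadEntry(t, rng.choice(queries), rng.choice(date_ranges)))
        t += rng.expovariate(rate_per_s)
    return workload


if __name__ == "__main__":
    jobs = generate_workload(["q3", "q7", "q19"], ["range-1", "range-2"],
                             rate_per_s=0.5, duration_s=60, seed=42)
    for job in jobs[:5]:
        print(f"{job.arrival_s:7.2f}s  {job.query}  {job.date_range}")
```

Passing a fixed seed makes a workload reproducible, which matters when the same workload is replayed against different resource-sharing policies.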

Note that after generating and saving a workload, the Docker image must be rebuilt and pushed. This can be done with the image script, as described in Image Modifications.