Choosing the right database for analytics is hard: there are many options, each optimized for different use cases. Benchmarks can help, but only if they reflect your actual workload.
Common analytics benchmarks tend to represent analytics workloads in the same way:
- All data is stored in a single, wide, denormalized table.
- Queries run full-table scans or large aggregations across long time periods.
- The workload is optimized for ad-hoc, exploratory queries rather than pre-defined application queries.
This approach works well for batch processing and historical analysis, but real-time analytics inside applications requires a different perspective. Instead of analyzing large datasets retrospectively, applications generate fast, targeted insights on fresh data for specific users, devices, or transactions. This leads to three key differences, illustrated with a short SQL sketch after the list:
- Queries require joining multiple tables instead of using a single denormalized table.
- Queries are often highly selective, filtering on specific objects and time windows.
- Pre-aggregated views are frequently used to deliver instant responses.
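To make the contrast concrete, here is a minimal SQL sketch of the two query shapes; the table and column names are illustrative and not taken from RTABench itself.

```sql
-- Batch-analytics style: aggregate over the full history of one wide table
-- ("wide_events" is a hypothetical denormalized table).
SELECT date_trunc('month', event_time) AS month,
       count(*)                        AS events
FROM   wide_events
GROUP  BY month;

-- Real-time application style: a selective filter on one object over a short
-- time window, joining normalized tables (names are illustrative).
SELECT o.order_id, c.name, e.event_type, e.event_time
FROM   orders o
JOIN   customers c    ON c.customer_id = o.customer_id
JOIN   order_events e ON e.order_id    = o.order_id
WHERE  o.order_id = 12345
  AND  e.event_time >= now() - interval '1 hour'
ORDER  BY e.event_time DESC;
```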
That is why we designed RTABench: a benchmark that accurately reflects real-time analytics inside applications, with a normalized schema, realistic dataset sizes, and queries that match real-world usage patterns.
RTABench uses the ClickBench framework for benchmarking, but it introduces a new dataset and query set that better represent real-time analytics inside applications. All tools, datasets, and benchmark results are available on GitHub, where we welcome contributions that expand RTABench to additional databases and optimizations.
Like any benchmark, RTABench results should not be viewed as a ranking of databases, but rather as a guide to understanding which system aligns best with your real-time analytics needs.
A Normalized Data Model That Reflects Real Applications
RTABench is based on an application that tracks products, orders, and shipments for an online store. Instead of a single table, it follows a normalized schema that reflects how most applications store and manage data (a simplified sketch of the tables follows the list):
- customers – stores information about people making orders.
- products – contains product catalog information, including prices and stock levels.
- orders – tracks orders placed by customers.
- order_items – records which products were included in each order.
- order_events – tracks order status changes (e.g., created, shipped, delivered).
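The exact schema definitions (including indexes and partitioning) live in the RTABench repository; the PostgreSQL-flavored sketch below is a simplified illustration of the normalized layout, with column names chosen for readability rather than copied from the benchmark.

```sql
-- Simplified, illustrative DDL; the real RTABench schema is defined in the repo.
CREATE TABLE customers (
    customer_id bigint PRIMARY KEY,
    name        text,
    country     text
);

CREATE TABLE products (
    product_id bigint PRIMARY KEY,
    name       text,
    price      numeric(10, 2),
    stock      integer
);

CREATE TABLE orders (
    order_id    bigint PRIMARY KEY,
    customer_id bigint REFERENCES customers,
    created_at  timestamptz NOT NULL
);

CREATE TABLE order_items (
    order_id   bigint REFERENCES orders,
    product_id bigint REFERENCES products,
    quantity   integer,
    PRIMARY KEY (order_id, product_id)
);

-- The event table is the large, append-heavy part of the dataset (~171 million rows).
CREATE TABLE order_events (
    order_id   bigint REFERENCES orders,
    event_time timestamptz NOT NULL,
    event_type text        NOT NULL  -- e.g., 'Created', 'Shipped', 'Delivered'
);
```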
This multi-table schema ensures RTABench measures how well databases handle real-time analytics queries that require joins and filtering—a scenario missing from other analytics benchmarks.
RTABench includes a dataset of ~171 million events: large enough for meaningful performance benchmarks, yet small enough that the benchmark remains practical to load and fast to run.
The dataset also includes 1,102 customers, 9,255 products, and 10,010,342 orders, so RTABench can test query performance under realistic application workloads across different database configurations.
Measuring Real-Time Performance
RTABench evaluates databases using 33 queries designed to reflect the analytics patterns commonly found in real-time applications. These queries assess performance on normalized schemas, selective filtering, and incremental materialized views (two of them are sketched in SQL after the list):
- Raw event queries – Counting, filtering, and aggregating events over time. (e.g., “Count the number of ‘Departed’ shipments per day at a specific terminal.”)
- Selective filtering – Querying specific objects and time windows to test indexing and partitioning efficiency. (e.g., “Find the last recorded status of a given order.”)
- Multi-table joins – Fetching related data across multiple tables to simulate real-world application queries. (e.g., “Show the total revenue generated by each customer in the last 30 days.”)
- Pre-aggregated queries – Measuring how incremental materialized views improve response times by precomputing results. (e.g., “Retrieve pre-aggregated counts of delayed shipments over the last month.”)
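As an illustration, here is roughly what two of the queries above look like in SQL. The column names follow the simplified schema sketched earlier (plus an assumed terminal column on order_events), so treat this as a sketch of the query shape rather than the exact benchmark queries.

```sql
-- Raw event query: 'Departed' shipments per day at a specific terminal
-- (assumes order_events carries a terminal column; the real schema may differ).
SELECT date_trunc('day', event_time) AS day,
       count(*)                      AS departed
FROM   order_events
WHERE  event_type = 'Departed'
  AND  terminal   = 'Berlin'
  AND  event_time >= now() - interval '30 days'
GROUP  BY day
ORDER  BY day;

-- Selective filter: the last recorded status of a given order.
SELECT event_type, event_time
FROM   order_events
WHERE  order_id = 12345
ORDER  BY event_time DESC
LIMIT  1;
```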
By including both raw and pre-aggregated queries, RTABench ensures that databases are tested for both ad-hoc analytics and optimized real-time reporting, capturing the trade-offs between flexibility and performance.
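How the pre-aggregated queries are served differs per database. As one concrete example (with hypothetical view and column names, and assuming order_events is a TimescaleDB hypertable), a continuous aggregate that incrementally maintains daily per-terminal counts could look like the sketch below; other systems would use their own incremental materialized-view mechanisms.

```sql
-- Hypothetical TimescaleDB continuous aggregate that keeps daily counts up to date
-- incrementally; other databases offer similar features with different syntax.
CREATE MATERIALIZED VIEW daily_terminal_departures
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', event_time) AS day,
       terminal,
       count(*)                         AS departed
FROM   order_events
WHERE  event_type = 'Departed'
GROUP  BY day, terminal;

-- The pre-aggregated query then reads the small view instead of scanning raw events.
SELECT day, departed
FROM   daily_terminal_departures
WHERE  terminal = 'Berlin'
  AND  day >= now() - interval '1 month'
ORDER  BY day;
```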
RTABench evaluates databases that are commonly used for real-time analytics inside applications, where high ingest rates, low-latency queries, and efficient joins are critical. Databases in the benchmark fall into three broad categories:
- General-Purpose Databases: Transactional databases that can handle many use cases and typically serve as an application's primary database. Most general-purpose databases, like PostgreSQL and MySQL, can handle real-time analytics, depending on scale and performance requirements.
- Real-Time Analytics Databases: Databases optimized for real-time analytics, with high ingest throughput, instantly available data, fast analytical queries, and high concurrency. Specialized real-time analytics databases are often used as a secondary database for an application.
- Batch Analytics Databases: Databases optimized for large-scale historical analysis and batch processing; they excel at ad-hoc queries on static datasets rather than real-time, continuously updated data. Although batch analytics databases are not designed for real-time analytics, we have included them in RTABench for developers interested in comparing their performance; because they are not the focus of this benchmark, their results are not shown by default.
A database can fall into multiple categories depending on its capabilities.
The first version of the benchmark includes the databases listed below:
Database | General-Purpose | Real-Time Analytics | Batch Analytics |
---|---|---|---|
ClickHouse | | ✅ | ✅ |
ClickHouse Cloud | | ✅ | ✅ |
DuckDB | | | ✅ |
MongoDB | ✅ | | |
MySQL | ✅ | | |
PostgreSQL | ✅ | | |
PostgreSQL with TimescaleDB | ✅ | ✅ | |
Timescale Cloud | ✅ | ✅ | |
RTABench is an open-source benchmark, and we encourage the community to contribute by:
- Adding new databases to expand the comparison.
- Improving query optimizations for different systems.
- Providing feedback on configurations to ensure fairness.
Contributions can be made through GitHub, where all benchmark tooling, datasets, and results are publicly available.
By using ClickBench as the underlying framework, we inherit its rules for adding new systems and results.
How RTABench Compares to Other Benchmarks
- ClickBench – Compares analytical databases using clickstream data. This type of workload, common in web analytics, BI reporting, and log aggregation, favors single-table queries that scan large datasets to generate insights over long time ranges.
- TSBS (Time Series Benchmark Suite) – A benchmarking tool designed to evaluate the performance of time-series databases under realistic ingestion and query workloads.
- TPC-H – A benchmark that measures the performance of analytical databases using a set of ad hoc business queries on a simplified schema, covering the use case of traditional data warehouses.
- TPC-DS – An evolution of TPC-H that provides a more realistic, complex, and comprehensive benchmark for modern decision support systems. It uses complex, business-oriented queries on large, multi-dimensional datasets and targets data-warehouse-style workloads (star/snowflake schema, fact/dimension tables).