
Conversation

@poodlewars poodlewars (Collaborator) commented Dec 24, 2025

Start to replace the current real storage testing framework.

There are many layers of abstraction in the current code, such as the TestLibraryManager and AsvBase. There is also a notion of "persistent" storage - long running data stored on S3 and only manually updated, for read benchmarks to work against. This is complicated to work with and fragile - for example it is easy to forget to update "persistent" benchmarking libraries when we update benchmark parameters, so they silently stay stuck on the wrong parameters.

The datasets we write for benchmarking are actually fairly small but they are not being written efficiently at the moment (for example, they are not using our batch APIs).

This PR shows a new design, using finalize_staged_data as a starting point.

  • Do not have separate finalize_staged_data and real_finalize_staged_data modules. Parameterize a single one by storage instead. This should be safer, since the exact benchmark code will at least run against LMDB on each PR.
  • Control which storages to test against via environment variables (e.g. ARCTICDB_STORAGE_AWS_S3=0, ARCTICDB_STORAGE_LMDB=1)
  • Make sure that we only disable storages when the benchmark executes, rather than at discovery time. This ensures that benchmarks.json includes all the storages, so that ad-hoc real-storage runs can execute.

This PR doesn't clean up data written to S3 yet, that will be easier to implement once all the real storage benchmarks are ported over. My plan for that is for the CI to create a temporary bucket, and drop it at the end of the benchmarking run.
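
The planned cleanup could be sketched roughly as below (the bucket-name prefix and the helper names are assumptions; pagination of object listings is omitted for brevity). The calls match the boto3 S3 client API, but any client with the same methods would do:

```python
import uuid

def make_run_bucket(s3_client):
    """Create a uniquely named throwaway bucket for this benchmarking run."""
    name = f"arcticdb-asv-{uuid.uuid4().hex[:12]}"  # prefix is an assumption
    s3_client.create_bucket(Bucket=name)
    return name

def drop_run_bucket(s3_client, name):
    """Delete every object, then the bucket (S3 requires it to be empty)."""
    resp = s3_client.list_objects_v2(Bucket=name)  # pagination omitted
    for obj in resp.get("Contents", []):
        s3_client.delete_object(Bucket=name, Key=obj["Key"])
    s3_client.delete_bucket(Bucket=name)
```

The CI would call make_run_bucket before the benchmarking run and drop_run_bucket at the end, regardless of which benchmarks ran.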

Manual run against real storage: https://github.com/man-group/ArcticDB/actions/runs/20850427753/job/59904037259

Manual run against LMDB: https://github.com/man-group/ArcticDB/actions/runs/20850444779

@poodlewars poodlewars (Collaborator, Author) commented Dec 24, 2025

We remove real_finalize_staged_data.py and instead parameterize this over two storages. We also rip out the AsvBase base class here.

    copytree(FinalizeStagedData.ARCTIC_DIR_ORIGINAL, FinalizeStagedData.ARCTIC_DIR, dirs_exist_ok=True)
    del self.ac
    self.logger.info(f"SETUP_CACHE TIME: {time.time() - start}")
    return lib_for_storage
Collaborator commented:

How are the lib_for_storage library objects transitioned between the setup_cache and setup_and_benchmark processes? I assume they are pickled. Does pickling and unpickling a Library object work for all storages?

Collaborator (Author) replied:

Yeah they're pickled. We have quite careful support for pickling in the NativeVersionStore, as people rely on it for multi-processing. It certainly works for LMDB and Amazon, which are the only ones we use at the moment, and if we find storages where this does not work that would be a bug worth fixing in its own right.
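
For context, ASV persists setup_cache return values via pickle and unpickles them in each benchmark process. The usual pattern for making a client object survive that round trip (sketched below with illustrative names, not ArcticDB's actual internals) is to drop the live connection on pickle and reconnect on unpickle:

```python
import pickle

class Library:
    """Illustrative stand-in, not ArcticDB's real Library class."""

    def __init__(self, uri: str):
        self.uri = uri
        self._conn = object()  # stand-in for an unpicklable native handle

    def __getstate__(self):
        return {"uri": self.uri}  # only the config crosses the process boundary

    def __setstate__(self, state):
        self.__init__(state["uri"])  # reconnect in the unpickling process

lib = pickle.loads(pickle.dumps(Library("lmdb:///tmp/arcticdb")))
```

Any storage whose client carries an unpicklable handle needs this kind of reconnect logic for the setup_cache hand-off to work.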

@poodlewars poodlewars force-pushed the aseaton/asv/rm-asvbase branch from 696096b to 429590c Compare December 31, 2025 11:07
@poodlewars poodlewars added patch Small change, should increase patch version no-release-notes This PR shouldn't be added to release notes. labels Dec 31, 2025
@poodlewars poodlewars marked this pull request as ready for review December 31, 2025 11:11
    raise RuntimeError(f"storage {storage} not implemented for benchmark {__file__}")

    if lib_name in ac:
        ac.delete_library(lib_name)
Collaborator commented:

Nit: delete_library is a no-op if library doesn't exist, so we can just drop the if.

    number = 1
    rounds = 1
    repeat = 5
    repeat = 1
Collaborator commented:

I'm a bit wary of designing benchmarks in a way that relies on a single benchmark execution, i.e. doing all of the setup in setup_cache, which will never be repeated.

It is possible that in the future we will want to repeat the benchmark multiple times to decrease the variance.

I'm OK with leaving repeat=1 for now if the benchmark is not flaky, but I think we should call _prepopulate_library inside setup to allow increasing the repeats in the future.

Also would you mind adding comments explaining these like:

    number = 1  # Not safe to increase, each benchmark run relies on setup
    repeat = 1  # Safe to increase, `setup` does required staging to correctly repeat benchmark multiple times
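
The suggestion above could be sketched like this (FakeLibrary and the method names are illustrative, not the PR's actual code). Because ASV re-runs setup before every repeat, staging there means each repeat finds freshly staged data rather than an already-finalized library:

```python
class FakeLibrary:
    """Toy library: finalize consumes whatever has been staged."""

    def __init__(self):
        self.staged = 0
        self.finalized = 0

    def stage(self, num_chunks):
        self.staged += num_chunks

    def finalize_staged_data(self):
        self.finalized += self.staged
        self.staged = 0  # finalizing consumes the staged chunks

class FinalizeStagedData:
    number = 1  # Not safe to increase: each run consumes the staged data
    repeat = 1  # Safe to increase: `setup` re-stages before every repeat

    def setup(self, num_chunks):
        self.lib = FakeLibrary()
        self.lib.stage(num_chunks)  # per-repeat prepopulation

    def time_finalize_staged_data(self, num_chunks):
        self.lib.finalize_staged_data()
```

With staging in setup, raising repeat later is a one-line change instead of a restructuring.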

    list_of_chunks = [10_000] * param

    for suffix in ("-time", "-mem"):
        symbol = _symbol_name(param) + suffix
Collaborator commented:

Why not add a required suffix argument to _symbol_name since we just add it every time?

@poodlewars poodlewars (Collaborator, Author) commented Jan 7, 2026

We don't add it every time, e.g.

        self.symbol = _symbol_name(num_chunks)

Collaborator commented:

In that case we still add the suffix in the benchmarks themselves, e.g. self.symbol + "-time" in several places.

I don't have a strong preference, but it feels confusing to have self.symbol be a symbol prefix rather than an actual symbol in the library. What do you think about either:

  • renaming self.symbol to self.symbol_prefix
  • adding suffix argument to _symbol_name and constructing the symbol in the benchmarks itself

@poodlewars poodlewars force-pushed the aseaton/asv/rm-asvbase branch from 429590c to 8d7ce6c Compare January 7, 2026 12:12
@poodlewars poodlewars force-pushed the aseaton/asv/rm-asvbase branch from 8d7ce6c to d3736fd Compare January 9, 2026 11:14
    assert len(self.lib.list_symbols()) == 0  # check we are in a clean state
    initial_timestamp = TimestampNumber(0, self.df_generator.TIME_UNIT)

    class FinalizeStagedDataWiderDataframeX3(FinalizeStagedData):
@poodlewars poodlewars (Collaborator, Author) commented Jan 9, 2026

This hasn't run for ages. I'm just going to remove it.
