
Conversation

@poodlewars poodlewars (Collaborator) commented Dec 24, 2025

Start to replace the current real storage testing framework.

There are many layers of abstraction in the current code, such as the TestLibraryManager and AsvBase. There is also a notion of "persistent" storage - long running data stored on S3 and only manually updated, for read benchmarks to work against. This is complicated to work with and fragile - for example it is easy to forget to update "persistent" benchmarking libraries when we update benchmark parameters, so they silently stay stuck on the wrong parameters.

The datasets we write for benchmarking are actually fairly small but they are not being written efficiently at the moment (for example, they are not using our batch APIs).

This PR shows a new design, using finalize_staged_data as a starting point.

  • Do not have separate finalize_staged_data and real_finalize_staged_data modules. Parameterize a single one by storage instead. This should be safer, since the exact benchmark code will at least run against LMDB on each PR.
  • Control which storages to test against via environment variables (e.g. ARCTICDB_STORAGE_AWS_S3=0, ARCTICDB_STORAGE_LMDB=1)
  • Make sure that we only disable storages when the benchmark executes, rather than at discovery time. This ensures that benchmarks.json includes all the storages, so that ad-hoc real-storage runs can execute.

This PR doesn't clean up data written to S3 yet, that will be easier to implement once all the real storage benchmarks are ported over. My plan for that is for the CI to create a temporary bucket, and drop it at the end of the benchmarking run.
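
The planned cleanup could be sketched roughly as below (the bucket-name prefix and the helper names are assumptions; pagination of object listings is omitted for brevity). The calls match the boto3 S3 client API, but any client with the same methods would do:

```python
import uuid

def make_run_bucket(s3_client):
    """Create a uniquely named throwaway bucket for this benchmarking run."""
    name = f"arcticdb-asv-{uuid.uuid4().hex[:12]}"  # prefix is an assumption
    s3_client.create_bucket(Bucket=name)
    return name

def drop_run_bucket(s3_client, name):
    """Delete every object, then the bucket (S3 requires it to be empty)."""
    resp = s3_client.list_objects_v2(Bucket=name)  # pagination omitted
    for obj in resp.get("Contents", []):
        s3_client.delete_object(Bucket=name, Key=obj["Key"])
    s3_client.delete_bucket(Bucket=name)
```

The CI would call make_run_bucket before the benchmarking run and drop_run_bucket at the end, regardless of which benchmarks ran.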

Manual run against real storage: https://github.com/man-group/ArcticDB/actions/runs/20850427753/job/59904037259

Manual run against LMDB: https://github.com/man-group/ArcticDB/actions/runs/20850444779

@poodlewars poodlewars (Collaborator, Author) commented Dec 24, 2025

We remove real_finalize_staged_data.py and instead parameterize this over two storages. We also rip out the AsvBase base class here.

    copytree(FinalizeStagedData.ARCTIC_DIR_ORIGINAL, FinalizeStagedData.ARCTIC_DIR, dirs_exist_ok=True)
    del self.ac
    self.logger.info(f"SETUP_CACHE TIME: {time.time() - start}")
    return lib_for_storage
Collaborator commented:

How are the lib_for_storage library objects transitioned between the setup_cache and setup_and_benchmark processes? I assume they are pickled. Does pickling and unpickling a Library object work for all storages?

Collaborator (Author) replied:

Yeah they're pickled. We have quite careful support for pickling in the NativeVersionStore, as people rely on it for multi-processing. It certainly works for LMDB and Amazon, which are the only ones we use at the moment, and if we find storages where this does not work that would be a bug worth fixing in its own right.
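
For context, ASV persists setup_cache return values via pickle and unpickles them in each benchmark process. The usual pattern for making a client object survive that round trip (sketched below with illustrative names, not ArcticDB's actual internals) is to drop the live connection on pickle and reconnect on unpickle:

```python
import pickle

class Library:
    """Illustrative stand-in, not ArcticDB's real Library class."""

    def __init__(self, uri: str):
        self.uri = uri
        self._conn = object()  # stand-in for an unpicklable native handle

    def __getstate__(self):
        return {"uri": self.uri}  # only the config crosses the process boundary

    def __setstate__(self, state):
        self.__init__(state["uri"])  # reconnect in the unpickling process

lib = pickle.loads(pickle.dumps(Library("lmdb:///tmp/arcticdb")))
```

Any storage whose client carries an unpicklable handle needs this kind of reconnect logic for the setup_cache hand-off to work.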

@poodlewars poodlewars force-pushed the aseaton/asv/rm-asvbase branch from 696096b to 429590c Compare December 31, 2025 11:07
@poodlewars poodlewars added patch Small change, should increase patch version no-release-notes This PR shouldn't be added to release notes. labels Dec 31, 2025
@poodlewars poodlewars marked this pull request as ready for review December 31, 2025 11:11
    raise RuntimeError(f"storage {storage} not implemented for benchmark {__file__}")

    if lib_name in ac:
        ac.delete_library(lib_name)
Collaborator commented:

Nit: delete_library is a no-op if library doesn't exist, so we can just drop the if.

    number = 1
    rounds = 1
    repeat = 5
    repeat = 1
Collaborator commented:

I'm a bit wary of designing benchmarks in a way that relies on a single benchmark execution, i.e. doing all of the setup in setup_cache, which will never be repeated.

It is possible that in the future we will want to repeat the benchmark multiple times to decrease the variance.

I'm OK with leaving repeat=1 for now if the benchmark is not flaky, but I think we should call _prepopulate_library inside setup to allow increasing the repeats in the future.

Also would you mind adding comments explaining these like:

    number = 1  # Not safe to increase, each benchmark run relies on setup
    repeat = 1  # Safe to increase, `setup` does required staging to correctly repeat benchmark multiple times
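
The suggestion above could be sketched like this (FakeLibrary and the method names are illustrative, not the PR's actual code). Because ASV re-runs setup before every repeat, staging there means each repeat finds freshly staged data rather than an already-finalized library:

```python
class FakeLibrary:
    """Toy library: finalize consumes whatever has been staged."""

    def __init__(self):
        self.staged = 0
        self.finalized = 0

    def stage(self, num_chunks):
        self.staged += num_chunks

    def finalize_staged_data(self):
        self.finalized += self.staged
        self.staged = 0  # finalizing consumes the staged chunks

class FinalizeStagedData:
    number = 1  # Not safe to increase: each run consumes the staged data
    repeat = 1  # Safe to increase: `setup` re-stages before every repeat

    def setup(self, num_chunks):
        self.lib = FakeLibrary()
        self.lib.stage(num_chunks)  # per-repeat prepopulation

    def time_finalize_staged_data(self, num_chunks):
        self.lib.finalize_staged_data()
```

With staging in setup, raising repeat later is a one-line change instead of a restructuring.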

    list_of_chunks = [10_000] * param

    for suffix in ("-time", "-mem"):
        symbol = _symbol_name(param) + suffix
Collaborator commented:

Why not add a required suffix argument to _symbol_name since we just add it every time?

@poodlewars poodlewars (Collaborator, Author) commented Jan 7, 2026

We don't add it every time, e.g.

        self.symbol = _symbol_name(num_chunks)

Collaborator commented:

In that case we still add the suffix in the benchmarks themselves, e.g. self.symbol + "-time" in several places.

I don't have a strong preference, but it feels confusing to have self.symbol be a symbol prefix rather than an actual symbol in the library. What do you think about either:

  • renaming self.symbol to self.symbol_prefix
  • adding suffix argument to _symbol_name and constructing the symbol in the benchmarks itself

@poodlewars poodlewars force-pushed the aseaton/asv/rm-asvbase branch from 429590c to 8d7ce6c Compare January 7, 2026 12:12
@poodlewars poodlewars force-pushed the aseaton/asv/rm-asvbase branch from 8d7ce6c to d3736fd Compare January 9, 2026 11:14
    assert len(self.lib.list_symbols()) == 0  # check we are in a clean state
    initial_timestamp = TimestampNumber(0, self.df_generator.TIME_UNIT)

    class FinalizeStagedDataWiderDataframeX3(FinalizeStagedData):
@poodlewars poodlewars (Collaborator, Author) commented Jan 9, 2026

This hasn't run for ages. I'm just going to remove it.
