@poodlewars poodlewars commented Dec 31, 2025

Note that this includes one non-test change that I came across while working on this: a fix for the logging when we write empty dataframes, which was malformed. This is e7ecd73.

The other changes are to speed up or otherwise improve our ASV execution.

  • Adjust batch parameters to get a sensible execution time
  • Remove time_read_batch_pure
  • Test batch functions with 1M and 10M rows rather than 1M and 1.5M rows, which is not an interesting comparison
  • Several changes to decompose benchmarks into smaller classes. We have a pattern in several places where a parameterized benchmark class includes some functions that do not actually vary with the parameters, so they run once per parameterization for no reason.
  • Improve the setup_cache time in comparison_benchmarks.py from about 60s to about 3s
  • Fix some places where we write empty dataframes, which causes a lot of log warnings
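
To illustrate the decomposition described in the list above, here is a minimal sketch. The class names, the `make_data` helper and the `count_runs` function are hypothetical and are not the benchmarks changed in this PR. It assumes a single flat `params` list, and relies only on the ASV convention that each `time_*` method runs once per parameter value.

```python
def make_data(rows):
    """Hypothetical helper: build some data of the requested size."""
    return list(range(rows))


class BeforeSplit:
    # ASV runs every time_* method once per entry in `params`.
    params = [1_000, 10_000]
    param_names = ["rows"]

    def setup(self, rows):
        self.data = make_data(rows)
        self.fixed = make_data(100)

    def time_sum(self, rows):
        sum(self.data)

    def time_fixed(self, rows):
        # Ignores `rows`, but still runs once for each parameter value.
        sum(self.fixed)


class ParamDependent:
    # Only the work that really varies with `rows` stays parameterized.
    params = [1_000, 10_000]
    param_names = ["rows"]

    def setup(self, rows):
        self.data = make_data(rows)

    def time_sum(self, rows):
        sum(self.data)


class ParamIndependent:
    # No `params`, so each time_* method runs exactly once.
    def setup(self):
        self.fixed = make_data(100)

    def time_fixed(self):
        sum(self.fixed)


def count_runs(cls):
    """Count the (method, parameter) combinations that would be timed for cls."""
    n_methods = sum(1 for name in dir(cls) if name.startswith("time_"))
    n_params = len(getattr(cls, "params", [None]))
    return n_methods * n_params
```

In this sketch the combined class times four combinations, while the two split classes time three between them. The saving grows with the number of parameter values and parameter-independent methods.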

This has reduced the benchmarking time from about 2h20m to roughly 1h40m.

Please review commit-by-commit.

@poodlewars added the patch (Small change, should increase patch version) and no-release-notes (This PR shouldn't be added to release notes.) labels Dec 31, 2025
@poodlewars force-pushed the aseaton/asv/basic-functions-slowness branch from f9b32c3 to e19ce1a January 7, 2026 11:20
@poodlewars changed the title from "Aseaton/asv/basic functions slowness" to "ASV performance improvements" Jan 7, 2026
@poodlewars marked this pull request as ready for review January 7, 2026 12:13
@poodlewars force-pushed the aseaton/asv/basic-functions-slowness branch from e19ce1a to 8c66918 January 9, 2026 15:28

param_names = PARAM_NAMES

CONNECTION_STRING = "lmdb://basic_functions"
DATE_RANGE = DATE_RANGE

Why do we need this?

self.fresh_lib._nvs.compact_incomplete(f"sym", False, False)


class ShortWideWrite:

Is there a reason to keep ShortWideWrite and UltraShortWideWrite separate but have a parametrized ShortWideRead?

warmup_time = 0
timeout = 6000
CONNECTION_STRING = "lmdb://batch_basic_functions?map_size=20GB"
sample_time = 0.1

What was the process of coming up with this?

os.remove(self.path)

def create_dict(self, size):
    ten_char_strings = [random_string(10) for _ in range(1000)]

I doubt it will matter much, but it might be a good idea to add seeds for all the random stuff below.
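
For example, a minimal sketch of what seeding could look like. This `random_string` is a hypothetical stand-in that assumes the real helper is built on the standard `random` module, which may not be the case. `make_strings` and its `seed` argument are also made up for illustration.

```python
import random
import string


def random_string(length, rng):
    """Hypothetical stand-in for the benchmark helper; draws from the given RNG."""
    return "".join(rng.choices(string.ascii_letters, k=length))


def make_strings(seed, count=1000, length=10):
    # A local Random instance keeps the sequence reproducible even if
    # other code uses the module-level `random` functions.
    rng = random.Random(seed)
    return [random_string(length, rng) for _ in range(count)]
```

With a fixed seed, every benchmark run generates the same strings, so differences between runs come from the code under test and not from the data.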

self.lib.append("sym", self.df_append_single)

def time_append_large(self, lad: LargeAppendDataModify, rows):
    large: pd.DataFrame = lad.df_append_large[rows].pop(0)

lad seems to be used only here. Does it make sense to make a separate benchmark class for it?

rounds = 1
number = 1 # We do a single run between setup and teardown because we e.g. can't delete a symbol twice
repeat = 2
warmup_time = 0

I wonder why it's included in some tests but not in others.
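
As context for the timing attributes in the snippet above, here is a minimal sketch of why `number = 1` matters for destructive operations. `FakeLib`, `DeleteBenchmark` and `run_sample` are hypothetical stand-ins, not ArcticDB or ASV code. The sketch assumes the ASV behaviour that `setup` runs once per sample and the timed method is then called `number` times.

```python
class FakeLib:
    """Hypothetical in-memory stand-in for a library; not the ArcticDB API."""

    def __init__(self):
        self._symbols = {}

    def write(self, sym, data):
        self._symbols[sym] = data

    def delete(self, sym):
        # Raises KeyError if the symbol has already been deleted.
        del self._symbols[sym]


class DeleteBenchmark:
    rounds = 1
    number = 1  # one call per setup, because a symbol cannot be deleted twice
    repeat = 2
    warmup_time = 0  # no warm-up calls before the timed sample

    def setup(self):
        self.lib = FakeLib()
        self.lib.write("sym", [1, 2, 3])

    def time_delete(self):
        self.lib.delete("sym")


def run_sample(bench, number):
    """Mimic one sample: setup once, then call the timed method `number` times."""
    bench.setup()
    for _ in range(number):
        bench.time_delete()
```

Calling `run_sample(DeleteBenchmark(), number=1)` succeeds, while `number=2` raises `KeyError` on the second delete. A warm-up call before the timed one would hit the same problem, which is one plausible reason to set `warmup_time = 0` on benchmarks like this.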


class IterateVersionChain:
timeout = 6000
timeout = 1000

I see this has been tweaked a few times. Given that the backend is LMDB, so the storage is less likely to fail, should this be relaxed to prioritize outputting a result at the end?
No strong opinion on this though.
