ASV performance improvements #2834

poodlewars · 2025-12-31T15:55:04Z

Note that this includes one non-test change I came across while working on this: a fix for the malformed log message emitted when we write empty dataframes. This is e7ecd73.

The other changes are to speed up or otherwise improve our ASV execution.

Adjust batch parameters to get a sensible execution time
Remove time_read_batch_pure
Test batch functions with 1M and 10M rows, rather than 1M and 1.5M rows, which is not an interesting comparison
Several changes to decompose benchmarks into smaller classes. We have a pattern in several places where a parameterized benchmark class includes some functions that do not actually vary based on the parameters, so they run once for each parameterization for no reason.
Improve the setup_cache time in comparison_benchmarks.py from about 60s to about 3s
Fix some places where we write empty dataframes, which causes a lot of log warnings
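
As a rough illustration of the decomposition item above, here is a minimal sketch of the pattern. All class and method names are hypothetical and are not taken from this PR, and `planned_runs` is only a helper for counting; it assumes a single flat `params` list and is not part of ASV.

```python
class Combined:
    """Before: every time_* method runs once per entry in params."""

    params = [1_000_000, 10_000_000]
    param_names = ["rows"]

    def setup(self, rows):
        self.values = list(range(rows))

    def time_sum(self, rows):
        sum(self.values)

    def time_fixed_shape(self, rows):
        # Ignores `rows`, yet still runs once per parameter value.
        sorted(range(100))


class RowDependent:
    """After: only the benchmarks that use `rows` stay parameterized."""

    params = [1_000_000, 10_000_000]
    param_names = ["rows"]

    def setup(self, rows):
        self.values = list(range(rows))

    def time_sum(self, rows):
        sum(self.values)


class FixedShape:
    """After: no params, so each benchmark here runs exactly once."""

    def time_fixed_shape(self):
        sorted(range(100))


def planned_runs(cls) -> int:
    """Count (benchmark, parameter) combinations for one class."""
    methods = [name for name in vars(cls) if name.startswith("time_")]
    return len(methods) * len(getattr(cls, "params", [None]))


print(planned_runs(Combined))                                # 4
print(planned_runs(RowDependent) + planned_runs(FixedShape)) # 3
```

The saving grows with the number of parameter values and with the cost of the `setup` that the parameter-independent benchmarks no longer trigger.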

This has reduced the benchmarking time from about 2h20m to roughly 1h40m.

Please review commit-by-commit.


vasil-pashov · 2026-01-09T16:09:20Z

python/benchmarks/basic_functions.py

    param_names = PARAM_NAMES

+    CONNECTION_STRING = "lmdb://basic_functions"
+    DATE_RANGE = DATE_RANGE


Why do we need this?

vasil-pashov · 2026-01-09T16:15:54Z

python/benchmarks/basic_functions.py

        self.fresh_lib._nvs.compact_incomplete(f"sym", False, False)


+class ShortWideWrite:


Is there a reason to keep ShortWideWrite and UltraShortWideWrite separate but have a parameterized ShortWideRead?

vasil-pashov · 2026-01-09T16:19:52Z

python/benchmarks/basic_functions.py

-    warmup_time = 0
-    timeout = 6000
-    CONNECTION_STRING = "lmdb://batch_basic_functions?map_size=20GB"
+    sample_time = 0.1


What was the process of coming up with this?

vasil-pashov · 2026-01-09T16:33:22Z

python/benchmarks/comparison_benchmarks.py

            os.remove(self.path)

    def create_dict(self, size):
+        ten_char_strings = [random_string(10) for _ in range(1000)]


I doubt it will matter much, but it may be a good idea to seed all the random generation below.
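
A minimal sketch of what seeding could look like, assuming the strings are sampled from the small pool built in the hunk above. The `rng` argument on `random_string`, the `seed` parameter, and the column names are hypothetical; the helper in the PR takes only a length.

```python
import random
import string


def random_string(length: int, rng: random.Random) -> str:
    """Return a random ASCII-letter string drawn from the given generator."""
    return "".join(rng.choices(string.ascii_letters, k=length))


def create_dict(size: int, seed: int = 0) -> dict:
    # A dedicated, seeded generator keeps runs reproducible without
    # touching the global random state.
    rng = random.Random(seed)
    # Build a small pool once, then sample from it, rather than
    # generating a fresh string for every row.
    ten_char_strings = [random_string(10, rng) for _ in range(1000)]
    return {
        "strings": rng.choices(ten_char_strings, k=size),
        "ints": [rng.randrange(1_000_000) for _ in range(size)],
    }
```

With a fixed seed, two calls with the same `size` produce identical data, so benchmark inputs stay the same from run to run.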

vasil-pashov · 2026-01-09T16:49:29Z

python/benchmarks/modification_functions.py

+        self.lib.append("sym", self.df_append_single)
+
+    def time_append_large(self, lad: LargeAppendDataModify, rows):
+        large: pd.DataFrame = lad.df_append_large[rows].pop(0)


lad seems to be used only here. Does it make sense to make a separate benchmark class for it?

phoebusm · 2026-01-09T17:12:31Z

python/benchmarks/modification_functions.py

+    rounds = 1
+    number = 1  # We do a single run between setup and teardown because we e.g. can't delete a symbol twice
+    repeat = 2
+    warmup_time = 0


I wonder why it's included in some tests but not in others.
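
For context on the attributes in this hunk, here is a simplified sketch of why `number = 1` matters for destructive benchmarks. `DeleteSymbol` and `run_like_asv` are hypothetical; the loop is only a rough model of the setup, timed calls, teardown cycle and is not ASV's actual runner.

```python
class DeleteSymbol:
    # Hypothetical benchmark using the same timing attributes as the hunk.
    rounds = 1
    number = 1       # one timed call per setup
    repeat = 2       # two samples, each with its own setup
    warmup_time = 0  # no warm-up calls before timing

    def setup(self):
        self.store = {"sym": [1, 2, 3]}

    def time_delete(self):
        # Raises KeyError if called twice without an intervening setup.
        del self.store["sym"]


def run_like_asv(bench) -> int:
    """Simplified model: setup runs once per sample, then `number` timed calls."""
    calls = 0
    for _ in range(bench.rounds * bench.repeat):
        bench.setup()
        for _ in range(bench.number):
            bench.time_delete()
            calls += 1
    return calls


print(run_like_asv(DeleteSymbol()))  # 2
```

Under this model, raising `number` to 2 makes the second delete in a sample fail, which is the situation the code comment in the hunk describes.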

phoebusm · 2026-01-09T17:17:53Z

python/benchmarks/version_chain.py


 class IterateVersionChain:
-    timeout = 6000
+    timeout = 1000


I see this has been tweaked a few times. Given that the backend is LMDB, so the storage is less likely to fail, should this be relaxed to prioritize outputting a result at the end?
No strong opinion on this, though.

poodlewars added the patch (Small change, should increase patch version) and no-release-notes (This PR shouldn't be added to release notes.) labels Dec 31, 2025

poodlewars force-pushed the aseaton/asv/basic-functions-slowness branch from f9b32c3 to e19ce1a Compare January 7, 2026 11:20

poodlewars changed the title ~~Aseaton/asv/basic functions slowness~~ ASV performance improvements Jan 7, 2026

poodlewars marked this pull request as ready for review January 7, 2026 12:13

poodlewars requested review from IvoDD and alexowens90 as code owners January 7, 2026 12:13

poodlewars added 13 commits January 9, 2026 15:22

Basic tweaks to batch parameters for speedup

5fd4ae7

Remove time_read_batch_pure, its results match time_read_batch

8c0a5a4

Use more rows for the batch basic_functions. Testing over 1M and 1.5M rows is pointless.

015deea

Move the ModificationFunctions out of basic_functions.py so that durations information about the setup_cache time is clearer

6a065fb

Incidental: update warning message when writing empty dataframes.

ab375c2

Do not write empty dataframes in benchmarks as they create a ton of log warnings

db96164

Fix nit from version chain PR

458c9a2

Remove unused start_time = time.time()

Improve performance of the comparison_benchmarks.py setup_cache

3d30e44

To start with this took 63s on my machine. After the change for string generation, this took 8s. After the change for date generation, this took 3s.

Time out the IterateVersionChain benchmark after 5 minutes.

1b4ae35

Split out the short wide benchmarks from BasicFunctions otherwise they run for each "rows" parameter, even though they do not use it.

6aac272

Split out the short wide benchmarks and some deletion benchmarks from ModificationFunctions otherwise they run for each "rows" parameter, even though they do not use it.

8b0ea1e

Tweak IterateVersionChain timeout

8ce8e13

Update benchmarks.json

8c66918

poodlewars force-pushed the aseaton/asv/basic-functions-slowness branch from e19ce1a to 8c66918 Compare January 9, 2026 15:28

Linting

cf8f375

vasil-pashov reviewed Jan 9, 2026

phoebusm reviewed Jan 9, 2026
