Skip to content

Add caching of sharding function #85

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 31, 2022

Conversation

0x501D
Copy link
Member

@0x501D 0x501D commented Dec 13, 2021

The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

Cache contains:
raw_tuple - raw sharding metadata, used for get_schema()
parsed_func_name - parsed dot notation (like {'foo', 'bar'})
callable - function ready to call, this offloads using of loadstring()
error - string with an error: not nil only if setting callable fails

Cache will be rebuilded if:

  • _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
    trigger
  • schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua):
Baseline (no DDL): 3.38s user 0.01s system 99% cpu 3.389 total
After this patch:
DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total
DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total
Before patch:
DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total
DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total

Closes #82

@0x501D 0x501D marked this pull request as draft December 13, 2021 21:51
Copy link
Member

@ligurio ligurio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch! See my comments inline.

Also please add a 'Closes #82' to commit message and a number of issue to PR description.

@0x501D 0x501D force-pushed the 0x501D/gh-82-caching-of-sharding-func branch from a5089d1 to 9e0f3f8 Compare December 14, 2021 16:23
@0x501D 0x501D force-pushed the 0x501D/gh-82-caching-of-sharding-func branch 2 times, most recently from d8c016d to a1799da Compare December 15, 2021 14:40
@0x501D 0x501D marked this pull request as ready for review December 15, 2021 14:41
@0x501D 0x501D requested a review from ligurio December 15, 2021 14:41
@0x501D 0x501D force-pushed the 0x501D/gh-82-caching-of-sharding-func branch 2 times, most recently from 26c1da3 to 62da6fa Compare December 16, 2021 12:57
@0x501D 0x501D force-pushed the 0x501D/gh-82-caching-of-sharding-func branch from 62da6fa to 213337a Compare December 16, 2021 13:32
@0x501D 0x501D force-pushed the 0x501D/gh-82-caching-of-sharding-func branch from 213337a to 3862d21 Compare December 23, 2021 11:05
@0x501D 0x501D force-pushed the 0x501D/gh-82-caching-of-sharding-func branch 2 times, most recently from 81fcff6 to 42b0295 Compare December 28, 2021 07:53
@0x501D 0x501D force-pushed the 0x501D/gh-82-caching-of-sharding-func branch 2 times, most recently from 878ee46 to b3cd5ab Compare December 30, 2021 11:40
@Totktonada
Copy link
Member

Since we aim to resolve a performance problem here I would like to see some performance measurements. For example, the following:

  • Baseline: just call our own bucket_id function directly (I suggest to extract mpcrc32 from vshard).
  • Bad case: call ddl.bucket_id() before this patch (name / body).
  • Good case: call it after the patch (name / body).

It would be good to commit the microbenchmark right into the repository with a short instruction.

If the good case is not so good, we can look a bit deeper and, say, check, whether implementing a fast path for a function that is placed directly in _G will change the situation (here we don't need a loop to extract a nested field with the function). Or, maybe, we'll spot other problematic places. Or, in turn, we can see that the caching does not give anything (I would be surprised, but anyway).

@0x501D 0x501D force-pushed the 0x501D/gh-82-caching-of-sharding-func branch from b3cd5ab to 79ea5db Compare January 14, 2022 17:18
@0x501D 0x501D requested a review from Totktonada January 14, 2022 17:24
@0x501D 0x501D force-pushed the 0x501D/gh-82-caching-of-sharding-func branch from 79ea5db to f0539c2 Compare January 14, 2022 20:59
@0x501D 0x501D requested a review from ligurio January 14, 2022 21:01
@0x501D 0x501D force-pushed the 0x501D/gh-82-caching-of-sharding-func branch 2 times, most recently from adefb00 to 9b0fc46 Compare January 21, 2022 08:34
@0x501D 0x501D force-pushed the 0x501D/gh-82-caching-of-sharding-func branch from 9b0fc46 to 989e61f Compare January 24, 2022 11:12
@0x501D 0x501D requested a review from Totktonada January 24, 2022 11:16
Copy link
Member

@ligurio ligurio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your fixes!

In general everything is fine, however I left several non-critical comments. Please consider them. I will give an LGTM to don't block merge.

Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Please, address stylistic suggestions from Sergey, squash (or rework-and-squash) the proposed commit into your one and let me know that everything is done.

The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

Cache contains:
raw_tuple - raw sharding metadata, used for get_schema()
parsed_func_name - parsed dot notation (like {'foo', 'bar'})
callable - function ready to call, this offloads using of loadstring()
error - string with an error: not nil only if setting callable fails

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua):
Baseline (no DDL):     3.38s user 0.01s system 99% cpu 3.389 total
After this patch:
DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total
DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total
Before patch:
DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total
DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total

Closes #82
@0x501D 0x501D force-pushed the 0x501D/gh-82-caching-of-sharding-func branch from 42c01d5 to 725b241 Compare January 26, 2022 12:14
@Totktonada Totktonada merged commit 147f8ac into master Jan 31, 2022
@Totktonada Totktonada deleted the 0x501D/gh-82-caching-of-sharding-func branch January 31, 2022 04:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Caching of sharding function
3 participants