Skip to content

refactor!: move storage module into logstore #3382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 12, 2025
Merged

Conversation

roeap
Copy link
Collaborator

@roeap roeap commented Apr 12, 2025

Description

TLDR;

  • storage -> logstore::storage
  • separate runtime mod for io runtime
  • code movement

Another pre-factor PR, this time we move the storage module into logstore.

While working on adding a caching layer for log reads I realised that we are exposing quite a bit of functionality via the storage module and interactions with the logstore module require quite a bit of code parsing. With kernel coming in we will have to deal with yet another storage abstraction.

The good news, in practice we actually have a quite narrow waist when creating the tables, so we can just hook into the approach we take for wrapping the io runtime. Still I believe, that logically the ObjectStore sits behind teh log store, or at least the log store is responsible for providing it.

In an immediate follow up I would like to add the cached store (@rtyler - in case you have some time to run bench marks? 😄) and with that make do a pass over our config for storage to make sure it stays manageable as we add yet another set of options.

@github-actions github-actions bot added binding/python Issues for the Python package binding/rust Issues for the Rust crate labels Apr 12, 2025
Copy link

codecov bot commented Apr 12, 2025

Codecov Report

Attention: Patch coverage is 42.20624% with 241 lines in your changes missing coverage. Please review.

Project coverage is 71.99%. Comparing base (50e6c0a) to head (03c0e37).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
crates/core/src/logstore/storage/runtime.rs 10.14% 186 Missing ⚠️
crates/core/src/logstore/storage/mod.rs 77.60% 23 Missing and 5 partials ⚠️
crates/core/src/logstore/storage/retry_ext.rs 72.22% 7 Missing and 8 partials ⚠️
crates/core/src/logstore/storage/utils.rs 30.00% 7 Missing ⚠️
crates/core/src/logstore/mod.rs 78.57% 3 Missing ⚠️
crates/core/src/operations/merge/filter.rs 50.00% 0 Missing and 1 partial ⚠️
python/src/utils.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3382      +/-   ##
==========================================
+ Coverage   71.82%   71.99%   +0.17%     
==========================================
  Files         145      145              
  Lines       45972    45774     -198     
  Branches    45972    45774     -198     
==========================================
- Hits        33018    32957      -61     
+ Misses      10859    10743     -116     
+ Partials     2095     2074      -21     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@roeap roeap changed the title refactor: move storage module into logstore refactor!: move storage module into logstore Apr 12, 2025
@ion-elgreco
Copy link
Collaborator

Just an fyi, the IO runtime wrapping will hopefully replaced by a kind of Spawservice, the initial PR of Tustvold became stale but I'll try to push it forward from there with Andrew, apache/arrow-rs#7253 (comment)

Copy link
Collaborator

@ion-elgreco ion-elgreco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! :) Im curious to see your caching layer and on what abstraction level it will be because there are some quite through implementations out there at least on an ObjectStore level

@roeap
Copy link
Collaborator Author

roeap commented Apr 12, 2025

Just an fyi, the IO runtime wrapping will hopefully replaced by a kind of Spawservice,

Awesome, looking forward to that :)

there are some quite through implementations out there at least on an ObjectStore level

@ion-elgreco - do you have some links maybe? 😄. The main idea is though to keep it quite simple on that layer - our most effective caching is how we managed the parsed actions :).

Another optimisation, that is not part of this (but I'm hoping to get to one day) is to cache parquet footers - this is something we can implement on the DF level ... That said, the PR should be coming up shortly, and we can continue there :).

@roeap roeap enabled auto-merge April 12, 2025 17:40
@ion-elgreco
Copy link
Collaborator

@roeap the overview is here apache/arrow-rs-object-store#14

Caches results of requests locally using memory / disk (see ObjectStoreMemCache in influxdb3_core), and this one in slatedb @criccomini (thanks @ion-elgreco for the pointer)

@roeap roeap added this pull request to the merge queue Apr 12, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 12, 2025
@roeap roeap added this pull request to the merge queue Apr 12, 2025
Merged via the queue into delta-io:main with commit 7538c58 Apr 12, 2025
29 checks passed
@roeap roeap deleted the cached-store branch April 12, 2025 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants