More efficient storage storage format #27

Merged
merged 21 commits into from
Sep 19, 2024

rkistner commented Sep 10, 2024

This changes ps_oplog to only store PUT operations, roughly in the format described here.

Specific changes on ps_oplog:

  1. Remove superseded column.
  2. Remove op column.
  3. Change bucket column from TEXT (bucket name) to INTEGER (bucket id).

Pending REMOVE operations are no longer stored in ps_oplog; they are now recorded in a dedicated ps_updated_rows table. Only the row type and id are relevant there: each entry marks a row that may need to be re-synced from the local bucket data.
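As a rough illustration of the layout described above, here is a minimal sketch using Python's built-in sqlite3 module. Only the table names ps_oplog and ps_updated_rows and the listed column changes come from this PR; the ps_buckets lookup table, every column name other than bucket, and the helper functions are assumptions made for the example, not the extension's actual schema or code.

```python
import sqlite3

# Hypothetical schema. Table names ps_oplog / ps_updated_rows are from the PR;
# ps_buckets and all column names except "bucket" are assumed for illustration.
SCHEMA = """
CREATE TABLE ps_buckets (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE ps_oplog (
    bucket   INTEGER NOT NULL,  -- bucket id instead of the bucket name
    op_id    INTEGER NOT NULL,
    row_type TEXT,
    row_id   TEXT,
    data     TEXT
    -- no "op" column: every stored row is a PUT
    -- no "superseded" column
);
CREATE TABLE ps_updated_rows (
    row_type TEXT NOT NULL,
    row_id   TEXT NOT NULL,
    PRIMARY KEY (row_type, row_id)
);
"""


def bucket_id_for(db, name):
    """Return the integer id for a bucket name, creating it if needed."""
    db.execute("INSERT OR IGNORE INTO ps_buckets (name) VALUES (?)", (name,))
    row = db.execute("SELECT id FROM ps_buckets WHERE name = ?", (name,)).fetchone()
    return row[0]


def apply_put(db, bucket_id, op_id, row_type, row_id, data):
    """Store a PUT, replacing any earlier PUT for the same row in this bucket."""
    db.execute(
        "DELETE FROM ps_oplog WHERE bucket = ? AND row_type = ? AND row_id = ?",
        (bucket_id, row_type, row_id),
    )
    db.execute(
        "INSERT INTO ps_oplog (bucket, op_id, row_type, row_id, data) "
        "VALUES (?, ?, ?, ?, ?)",
        (bucket_id, op_id, row_type, row_id, data),
    )


def apply_remove(db, bucket_id, row_type, row_id):
    """Drop the stored PUT and mark the row as possibly needing a re-sync."""
    db.execute(
        "DELETE FROM ps_oplog WHERE bucket = ? AND row_type = ? AND row_id = ?",
        (bucket_id, row_type, row_id),
    )
    db.execute(
        "INSERT OR IGNORE INTO ps_updated_rows (row_type, row_id) VALUES (?, ?)",
        (row_type, row_id),
    )


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    bucket = bucket_id_for(db, "global[]")
    apply_put(db, bucket, 1, "todos", "t1", '{"done": false}')
    apply_remove(db, bucket, "todos", "t1")
    print(db.execute("SELECT * FROM ps_updated_rows").fetchall())
```

In this sketch a REMOVE leaves nothing behind in ps_oplog, which is why no separate pass is needed to clean up old REMOVE operations.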

This has some advantages:

  1. Reduced storage requirements.
  2. No need for a slow "compact" operation to delete old REMOVE operations.
  3. Simplifies bucket deletion.

The operations delete_pending_buckets and clear_remove_ops are now no-ops, and may be removed from the respective SDKs.

Benchmarks

Flutter desktop on Linux, 500k PUT operations in a single global[] bucket, measuring initial sync time and database size.

Before: 56s total, of which 18s is spent in sync_local. 401MB database.
After: 57s total, of which 19s is spent in sync_local. 390MB database.

Conclusion: Does not significantly affect initial sync time. Slightly reduces database size. May have a bigger effect when syncing many REMOVE operations, when deleting buckets, or when there are long bucket names.
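To put the figures above in relative terms, the following short calculation uses only the numbers quoted in the benchmark; the helper name relative_change is introduced here for illustration.

```python
def relative_change(before, after):
    """Fractional change from 'before' to 'after' (negative means smaller)."""
    return (after - before) / before


# Figures quoted in the benchmark above.
size_change = relative_change(401, 390)       # database size, MB
total_change = relative_change(56, 57)        # total initial sync time, s
sync_local_change = relative_change(18, 19)   # time spent in sync_local, s

if __name__ == "__main__":
    print(f"database size:   {size_change:+.1%}")
    print(f"total sync time: {total_change:+.1%}")
    print(f"sync_local time: {sync_local_change:+.1%}")
```

This works out to roughly a 2.7% smaller database and about 1.8% more total sync time (one second in each timing).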

Tests

This PR adds Dart-based tests for the extension, specifically for the schema migrations. Any high-level language would work; Dart was used because the tests were already written in it.

Other tests may be moved from the powersync.dart repo to here in the future.
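For readers unfamiliar with what a schema migration test checks, here is a hedged sketch in Python rather than Dart. It is not the test suite added in this PR: the old and new column names, the text encoding of the op column, the ps_buckets table, and the migration SQL are all assumptions chosen to mirror the changes listed in the description.

```python
import sqlite3

# Hypothetical "old" layout: bucket stored by name, with op and superseded columns.
OLD_SCHEMA = """
CREATE TABLE ps_oplog (
    bucket     TEXT NOT NULL,
    op_id      INTEGER NOT NULL,
    op         TEXT NOT NULL,       -- 'PUT' or 'REMOVE' (assumed encoding)
    row_type   TEXT,
    row_id     TEXT,
    data       TEXT,
    superseded INTEGER NOT NULL DEFAULT 0
);
"""

# Hypothetical migration to the layout described in the PR.
MIGRATION = """
CREATE TABLE ps_buckets (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
INSERT INTO ps_buckets (name) SELECT DISTINCT bucket FROM ps_oplog;

CREATE TABLE ps_updated_rows (
    row_type TEXT NOT NULL,
    row_id   TEXT NOT NULL,
    PRIMARY KEY (row_type, row_id)
);
INSERT OR IGNORE INTO ps_updated_rows (row_type, row_id)
    SELECT row_type, row_id FROM ps_oplog WHERE op = 'REMOVE';

CREATE TABLE ps_oplog_new (
    bucket   INTEGER NOT NULL,
    op_id    INTEGER NOT NULL,
    row_type TEXT,
    row_id   TEXT,
    data     TEXT
);
INSERT INTO ps_oplog_new (bucket, op_id, row_type, row_id, data)
    SELECT b.id, o.op_id, o.row_type, o.row_id, o.data
    FROM ps_oplog o JOIN ps_buckets b ON b.name = o.bucket
    WHERE o.op = 'PUT' AND o.superseded = 0;

DROP TABLE ps_oplog;
ALTER TABLE ps_oplog_new RENAME TO ps_oplog;
"""


def column_names(db, table):
    """List a table's column names in declaration order."""
    return [row[1] for row in db.execute(f"PRAGMA table_info({table})")]


def migrate(db):
    """Apply the hypothetical migration to an open connection."""
    db.executescript(MIGRATION)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(OLD_SCHEMA)
    db.execute(
        "INSERT INTO ps_oplog (bucket, op_id, op, row_type, row_id, data) "
        "VALUES ('global[]', 1, 'PUT', 'todos', 't1', '{}')"
    )
    migrate(db)
    print(column_names(db, "ps_oplog"))
```

A test built on this would populate the old layout, run the migration, and then assert on both the resulting columns and the migrated rows.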

rkistner mentioned this pull request Sep 10, 2024
rkistner changed the title from "WIP: Restructure storage format" to "More efficient storage storage format" Sep 19, 2024
rkistner marked this pull request as ready for review September 19, 2024 12:33
Base automatically changed from caching-checksums to main September 19, 2024 14:24
rkistner merged commit a7a8f8e into main Sep 19, 2024
17 checks passed
rkistner deleted the restructure branch September 19, 2024 15:30