Support writes during ingestion #400
base: 8.10.tikv
Conversation
Force-pushed from 7340dac to 82ad5dc
/cc @glorv @LykxSassinator @Connor1996 @v01dstar PTAL, thx~
@@ -2095,6 +2095,9 @@ struct IngestExternalFileOptions {
   //
   // ingest_behind takes precedence over fail_if_not_bottommost_level.
   bool fail_if_not_bottommost_level = false;
+  // Set to TRUE if user wants to allow writes to the DB during ingestion.
+  // User must ensure no writes overlap with the ingested data.
+  bool allow_write = false;
allow_write is a little confusing, how about allow_foreground_write?
This naming style aligns well with RocksDB’s conventions. In RocksDB, similar names, such as unordered_write and WaitForPendingWrites, typically use “write” to refer to foreground writes.
db/db_impl/db_impl.cc (outdated)

if (two_write_queues_) {
  nonmem_write_thread_.ExitUnbatched(&nonmem_w);
}
write_thread_.ExitUnbatched(&w);
Why not just skip L5856-5869 when writes are allowed?
+1
Even with allow_write = true, writes to the DB must be temporarily stopped to wait for pending writes. This is because allow_write = true only requires users to ensure no concurrent writes overlap with the ingestion data and does not require ensuring no overlapping unordered_write before ingestion.
@v01dstar @Connor1996 PTAL again, thx~
Or we can skip lines L5856-5869 and treat unfinished unordered_write as concurrent writes. Users need to ensure unordered_writes are finished before ingestion. This approach might better align with the original intent of allow_write, as it offers better performance when users can guarantee no concurrent writes. TiKV seems to be able to guarantee this. @v01dstar @Connor1996 What do you think of this?
I think, in theory, we can ignore those preceding-queued-unfinished foreground writes.
However, I don't have absolute confidence, since this change makes many assumptions. Just a side note: we probably need to test it against the Jepsen test suite.
I am now confident that in all current TiKV scenarios where allow_write is enabled, ongoing writes are always waited on to finish before ingestion. Therefore, I updated the code to no longer wait for pending writes.
This conclusion is supported by the many tests added directly to test_ingest_sst.rs in this TiKV PR (tikv/tikv#18096).
Force-pushed from 6a859b7 to b695598
bool allow_write = args[0].options.allow_write;
for (const auto& arg : args) {
  if (arg.options.allow_write != allow_write) {
    return Status::InvalidArgument(
Better to fall back to the default behavior if any of these options is false. Returning an error here will cause TiKV to panic in some code paths.
I think it is not necessary. Firstly, from RocksDB's perspective, the checks should be as strict as possible. Secondly, TiKV does not currently use the IngestExternalFiles interface directly; it only uses IngestExternalFile, which calls IngestExternalFiles with a single arg, i.e., only one allow_write. Therefore, this would not affect TiKV.
lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: v01dstar
It occurs to me that Titan GC and compaction filter GC may write back some kvs.
Consider the case where a peer exists on this store, is then moved to other stores, and is later moved back to this store again.
Titan GC would be okay, as it won't write back any overlapping key after the peer is destroyed. However, compaction filter GC may perform some deletions at the same time. Please consider the compaction filter GC case as well.
@Connor1996 How does Titan GC avoid writing back any overlapping keys after the peer is destroyed?
All keys in that range are deleted when the peer is destroyed, and Titan GC finds those keys are deleted and just skips them when generating the new blob file.
Titan GC wouldn't write back overlapping keys. For a key to be written back, it needs to (1) exist in the current view, and (2) have a value in the LSM tree that is a reference to the blob store, with the reference pointing to the exact location of the to-be-examined blob key. So, when the ingest happens, if Titan GC reads the moved-back kv, it will see that the value is not a reference, which fails requirement 2. If Titan GC does not read the moved-back KV, it will get a
Force-pushed from c11f98d to ad98a17
Signed-off-by: hhwyt <[email protected]>
Force-pushed from ad98a17 to ff3e293
Signed-off-by: hhwyt <[email protected]>
Force-pushed from e2c41b1 to 40e0131
This PR adds an option allow_write to IngestExternalFileOptions, enabling writes to the database during the ingestion process. More details at tikv/tikv#18096.