FOEPD-2119: Use SaveContext to write to DB in background thread #6389
base: develop
Conversation
Walkthrough: Adds an executor-driven asynchronous batch-write pathway to SaveContext via a new `AsyncSaveContext` class.
Sequence Diagram(s):

```mermaid
sequenceDiagram
    autonumber
    actor Loader
    participant SaveCtx as SaveContext
    participant Exec as Executor (ThreadPoolExecutor or DummyExecutor)
    participant Worker as _do_save_batch
    Loader->>SaveCtx: with SaveContext(samples, async_writes=True)
    activate SaveCtx
    loop per sample
        Loader->>SaveCtx: ctx.save(sample)
        SaveCtx->>SaveCtx: lock, enqueue ops & batch ids
        alt batch threshold reached
            SaveCtx->>Exec: submit(_do_save_batch) -> Future
            Exec->>Worker: run _do_save_batch()
            Worker-->>Exec: result / exception
            Exec-->>SaveCtx: future completes
        end
    end
    SaveCtx->>SaveCtx: __exit__ waits on collected futures, propagates errors
    deactivate SaveCtx
```
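To make the flow above concrete, here is a minimal, generic sketch of the pattern (illustrative names only; this is not the actual fiftyone implementation): ops are enqueued under a lock, flushes are submitted to an executor, and exit drains the futures so background errors propagate.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class AsyncBatchWriter:
    """Toy illustration of executor-driven batch writes."""

    def __init__(self, write_fn, batch_size=100):
        self._write_fn = write_fn
        self._batch_size = batch_size
        self._ops = []
        self._lock = threading.Lock()
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._futures = []

    def save(self, op):
        with self._lock:
            self._ops.append(op)
            flush = len(self._ops) >= self._batch_size
        if flush:
            self._futures.append(self._executor.submit(self._flush))

    def _flush(self):
        # swap the buffer under the lock, then write the snapshot
        with self._lock:
            ops, self._ops = self._ops, []
        if ops:
            self._write_fn(ops)

    def __enter__(self):
        return self

    def __exit__(self, *args):
        # flush any remainder, then wait on all futures so errors propagate
        self._futures.append(self._executor.submit(self._flush))
        for f in self._futures:
            f.result()
        self._executor.shutdown()
```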
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- fiftyone/core/collections.py (1 hunks)
- fiftyone/core/models.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
fiftyone/core/models.py (1)
- fiftyone/core/collections.py (1): AsyncSaveContext (237-254)
fiftyone/core/collections.py (1)
- fiftyone/core/utils.py (1): submit (3224-3227)
🪛 Ruff (0.13.3)
fiftyone/core/collections.py
240-240: Avoid specifying long messages outside the exception class (TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: test-windows / test-python (windows-latest, 3.12)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
- GitHub Check: test / test-app
- GitHub Check: test-windows / test-python (windows-latest, 3.11)
- GitHub Check: test-windows / test-python (windows-latest, 3.10)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: test-windows / test-python (windows-latest, 3.9)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
- GitHub Check: lint / eslint
- GitHub Check: build / build
- GitHub Check: e2e / test-e2e
- GitHub Check: build
Makes intuitive sense to me that offloading DB writes to a worker thread would be faster! Yay!
I'd like to get this behavior into the public interface:
```python
for sample in dataset.iter_samples(..., autosave=True, **kwargs):
    ...

for sample in dataset.iter_groups(..., autosave=True, **kwargs):
    ...

for sample in dataset.save_context(**kwargs):
    ...

# note that this is imported into the public `fiftyone` namespace
with fo.SaveContext(dataset, **kwargs) as ctx:
    for sample in dataset:
        ...
```
not just used internally in the `fiftyone.core.models` module.
Moreover (assuming this is fully tested and stable), given that it is more performant, I think we should consider making it the default behavior across the library.
Today, the `**kwargs` in the public interface let users configure save batching if desired via `batch_size` and `batching_strategy` on a per-call basis. And there are environment variables that allow users to configure the default autosave batching behavior across an entire session:
- FIFTYONE_DEFAULT_BATCHER
- FIFTYONE_BATCHER_STATIC_SIZE
- FIFTYONE_BATCHER_TARGET_SIZE_BYTES
- FIFTYONE_BATCHER_TARGET_LATENCY
So, some options for this new behavior:
1. Always use async writes, because it is strictly better
2. Make async writes configurable via the above patterns by adding:
   - kwarg: `async_writes=True/False`
   - config: `FIFTYONE_SAVE_CONTEXT_ASYNC_WRITES=True/False`

In 2, assuming async is strictly better, I'd advocate for the default being `async_writes=True` (see the sketch below).
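A minimal sketch of option 2's plumbing, assuming the proposed (not yet existing) `async_writes` kwarg and `FIFTYONE_SAVE_CONTEXT_ASYNC_WRITES` environment variable:

```python
import os

def _resolve_async_writes(async_writes=None):
    # hypothetical helper: an explicit kwarg wins; otherwise fall back to the
    # proposed FIFTYONE_SAVE_CONTEXT_ASYNC_WRITES env var, defaulting to True
    if async_writes is not None:
        return async_writes
    env = os.environ.get("FIFTYONE_SAVE_CONTEXT_ASYNC_WRITES", "true")
    return env.strip().lower() in ("1", "true", "yes")
```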
I don't think we need a deprecation procedure for removing ….

I'd be in favor of reverting #6288 so that the model inference code in …
Will do in a separate PR.
See #6391
Actionable comments posted: 1
🧹 Nitpick comments (1)
fiftyone/core/collections.py (1)
- 258-269: Consider reducing lock granularity to avoid blocking producers. Holding the lock during the entire DB write serializes producers with the bulk write, which can reduce throughput. If needed later, snapshot and swap the op lists under the lock and perform bulk_write on the snapshot outside the lock to maximize concurrency (while keeping correctness). See the sketch below.
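For illustration, a self-contained sketch of the snapshot-and-swap idea (generic names, not the PR's code): the lock is held only long enough to swap buffers, so producers are not blocked during the bulk write.

```python
import threading

class OpBuffer:
    """Toy buffer demonstrating swap-under-lock, write-outside-lock."""

    def __init__(self):
        self._ops = []
        self._lock = threading.Lock()

    def add(self, op):
        with self._lock:
            self._ops.append(op)

    def flush(self, bulk_write):
        # swap the buffer under the lock, then write the snapshot outside it
        with self._lock:
            ops, self._ops = self._ops, []
        if ops:
            bulk_write(ops)  # producers are not blocked during this call
```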
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- fiftyone/core/collections.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.13.3)
fiftyone/core/collections.py
241-241: Avoid specifying long messages outside the exception class (TRY003)
⏰ Context from checks skipped due to timeout of 90000ms (12 GitHub checks; same CI matrix as listed above)
🔇 Additional comments (2)
fiftyone/core/collections.py (2)
- 20-20: Import looks correct. Needed for the new lock usage.
- 238-246: Good fix for prior race; lock placement is appropriate. Using a single lock around save() and the flush path (calling the base _save_batch) prevents dropped ops discussed in the earlier review thread. Please confirm any callers pass an executor that supports context management, since this class enters/exits it. A minimal conforming executor is sketched below.
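For reference, a minimal executor that satisfies both requirements (context management and `submit`), along the lines of the `DummyExecutor` named in the walkthrough diagram; this sketch is an assumption, not the actual class:

```python
from concurrent.futures import Future

class DummyExecutor:
    """Synchronous stand-in that runs submitted work inline."""

    def __enter__(self):
        return self

    def __exit__(self, *args):
        return False  # don't suppress exceptions

    def submit(self, fn, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as e:
            future.set_exception(e)
        return future
```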
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- fiftyone/core/collections.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
fiftyone/core/collections.py (2)
- fiftyone/core/odm/database.py (1): bulk_write (945-982)
- fiftyone/core/utils.py (1): submit (3224-3227)
🪛 Ruff (0.13.3)
fiftyone/core/collections.py
241-241: Avoid specifying long messages outside the exception class (TRY003)
⏰ Context from checks skipped due to timeout of 90000ms (10 GitHub checks; same CI matrix as listed above)
@brimoor captured your full comment in FOEPD-2242 so we don't lose it while this experimental approach is battle-tested via internal usage, but it's out of scope for this PR.
Using async writes in …
Actionable comments posted: 1
🧹 Nitpick comments (2)
fiftyone/core/collections.py (2)
- 322-366: Backpressure vs memory: current copy-and-clear releases locks during I/O. `_do_save_batch()` copies and clears under locks, then performs `bulk_write()` and parent sync without locks. This removes intentional backpressure and can increase peak memory while a flush is in progress. Profiling notes indicated holding locks across `bulk_write()`/`_sync_source()` kept memory flat. If the goal is bounded memory, consider holding the locks across I/O:

```diff
-        if self._sample_ops:
-            with self.samples_lock:
-                sample_ops = self._sample_ops.copy()
-                self._sample_ops.clear()
-            res = foo.bulk_write(
-                sample_ops,
-                self._sample_coll,
-                ordered=False,
-                batcher=False,
-            )[0]
+        if self._sample_ops:
+            with self.samples_lock:
+                res = foo.bulk_write(
+                    self._sample_ops,
+                    self._sample_coll,
+                    ordered=False,
+                    batcher=False,
+                )[0]
+                self._sample_ops.clear()
         encoded_size += res.bulk_api_result.get("nBytes", 0)

-        if self._frame_ops:
-            with self.frames_lock:
-                frame_ops = self._frame_ops.copy()
-                self._frame_ops.clear()
-            res = foo.bulk_write(
-                frame_ops,
-                self._frame_coll,
-                ordered=False,
-                batcher=False,
-            )[0]
+        if self._frame_ops:
+            with self.frames_lock:
+                res = foo.bulk_write(
+                    self._frame_ops,
+                    self._frame_coll,
+                    ordered=False,
+                    batcher=False,
+                )[0]
+                self._frame_ops.clear()
         encoded_size += res.bulk_api_result.get("nBytes", 0)

-        if self._batch_ids and self._is_generated:
-            with self.batch_ids_lock:
-                batch_ids = self._batch_ids.copy()
-                self._batch_ids.clear()
-            self.sample_collection._sync_source(ids=batch_ids)
+        if self._batch_ids and self._is_generated:
+            with self.batch_ids_lock:
+                # hold lock to bound growth while syncing
+                self.sample_collection._sync_source(ids=self._batch_ids)
+                self._batch_ids.clear()

-        if self._reload_parents:
-            with self.reloading_lock:
-                reload_parents = self._reload_parents.copy()
-                self._reload_parents.clear()
-            for sample in reload_parents:
-                sample._reload_parents()
+        if self._reload_parents:
+            with self.reloading_lock:
+                for sample in self._reload_parents:
+                    sample._reload_parents()
+                self._reload_parents.clear()
```

If you prefer the unlocked I/O for throughput, add a brief comment explaining the trade-off and verify profiling still shows flat memory. Based on learnings.
- 368-370: Bound `self.futures` growth. During long runs, the list can grow with completed futures until exit. Opportunistically prune done futures before appending new ones.

```diff
     def _save_batch(self):
-        future = self.executor.submit(self._do_save_batch)
-        self.futures.append(future)
+        # drop completed futures to bound list size
+        self.futures = [f for f in self.futures if not f.done()]
+        future = self.executor.submit(self._do_save_batch)
+        self.futures.append(future)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- fiftyone/core/collections.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-08T17:43:55.970Z
Learnt from: exupero
PR: voxel51/fiftyone#6389
File: fiftyone/core/collections.py:323-361
Timestamp: 2025-10-08T17:43:55.970Z
Learning: In fiftyone/core/collections.py, AsyncSaveContext intentionally holds per-category locks (samples_lock, frames_lock, batch_ids_lock, reloading_lock) across bulk_write() and _sync_source() to provide backpressure and keep memory usage flat. Swapping buffers to release locks during I/O was considered but increased peak memory in profiling; current design improved overall runtime while bounding memory.
Applied to files:
fiftyone/core/collections.py
🧬 Code graph analysis (1)
fiftyone/core/collections.py (3)
- fiftyone/core/frame.py (6): save (490-492), save (1047-1049), save (1097-1104), _in_db (125-126), _save (494-509), _reload_parents (1051-1056)
- fiftyone/core/odm/database.py (2): save (89-92), bulk_write (945-982)
- fiftyone/core/sample.py (5): save (550-552), save (741-749), _save (554-570), _save (751-762), SampleView (637-762)
🪛 Ruff (0.13.3)
fiftyone/core/collections.py
241-241: Avoid specifying long messages outside the exception class (TRY003)
⏰ Context from checks skipped due to timeout of 90000ms (10 GitHub checks; same CI matrix as listed above)
Actionable comments posted: 2
🧹 Nitpick comments (4)
fiftyone/core/collections.py (4)
- 271-329: Clarify concurrency expectations for batching counters. `_curr_batch_size`, `_curr_batch_size_bytes`, and `_last_time` are updated without locks. If save() can be called from multiple producer threads, increments may race and skew flush thresholds. Either document single-producer usage or guard these fields with a lightweight lock, e.g. as sketched below.
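A sketch of the suggested guard (hypothetical names; only needed if multiple producer threads call save()):

```python
import threading
import timeit

class BatchCounters:
    """Toy illustration of updating batching counters under a small lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._curr_batch_size = 0
        self._curr_batch_size_bytes = 0
        self._last_time = timeit.default_timer()

    def record(self, nbytes):
        with self._lock:
            self._curr_batch_size += 1
            self._curr_batch_size_bytes += nbytes
            self._last_time = timeit.default_timer()
```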
- 356-360: Reduce timing sensitivity computing `_encoding_ratio`. Snapshot `_curr_batch_size_bytes` before writes to avoid races with concurrent producers. Apply:

```diff
-        self._encoding_ratio = (
-            self._curr_batch_size_bytes / encoded_size
-            if encoded_size > 0 and self._curr_batch_size_bytes
+        bytes_snapshot = self._curr_batch_size_bytes
+        self._encoding_ratio = (
+            bytes_snapshot / encoded_size
+            if encoded_size > 0 and bytes_snapshot
             else 1.0
         )
```
- 362-374: Guard batch_ids/reload_parents checks under locks. Move truthiness checks inside the locks to avoid racy empty snapshots and unnecessary calls. Apply:

```diff
-        if self._batch_ids and self._is_generated:
-            with self.batch_ids_lock:
-                batch_ids = self._batch_ids.copy()
-                self._batch_ids.clear()
-            self.sample_collection._sync_source(ids=batch_ids)
+        if self._is_generated:
+            with self.batch_ids_lock:
+                batch_ids, self._batch_ids = self._batch_ids, []
+            if batch_ids:
+                self.sample_collection._sync_source(ids=batch_ids)
@@
-        if self._reload_parents:
-            with self.reloading_lock:
-                reload_parents = self._reload_parents.copy()
-                self._reload_parents.clear()
-            for sample in reload_parents:
-                sample._reload_parents()
+        with self.reloading_lock:
+            reload_parents, self._reload_parents = self._reload_parents, []
+        if reload_parents:
+            for sample in reload_parents:
+                sample._reload_parents()
```
- 375-378: Make futures list thread-safe (if multiple producers can call _save_batch). If save() can run from multiple threads, appending to `self.futures` can race with exit drains. Use a small lock. Apply:

```diff
-        future = self.executor.submit(self._do_save_batch)
-        self.futures.append(future)
+        future = self.executor.submit(self._do_save_batch)
+        # optional: protect if multiple producers exist
+        # (add `self.futures_lock = threading.Lock()` in __init__)
+        try:
+            lock = self.futures_lock
+        except AttributeError:
+            self.futures.append(future)
+        else:
+            with lock:
+                self.futures.append(future)
```

And in `__init__`:

```diff
         self.futures = []
+        self.futures_lock = threading.Lock()
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- fiftyone/core/collections.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
fiftyone/core/collections.py (3)
- fiftyone/core/frame.py (5): save (490-492), save (1047-1049), save (1097-1104), _in_db (125-126), _save (494-509)
- fiftyone/core/odm/database.py (2): save (89-92), bulk_write (945-982)
- fiftyone/core/utils.py (1): submit (3224-3227)
🪛 Ruff (0.13.3)
fiftyone/core/collections.py
241-241: Avoid specifying long messages outside the exception class (TRY003)
⏰ Context from checks skipped due to timeout of 90000ms (12 GitHub checks; same CI matrix as listed above)
🔇 Additional comments (1)
fiftyone/core/collections.py (1)
- 20-20: LGTM: required for locks. Importing threading is appropriate for the new lock usage.
Findings that more workers don't improve time align with previous performance investigations and our understanding of pymongo. Echoing Brian's comments from earlier, it feels like the save context should just be an async save context when save(_deferred=True). I'm wondering about reload_parents(): both how much it could be contributing to the overall time, and whether reloading once upon exit after all ops are done could help, if reloading is even necessary. Also, have you tested with the different batching strategies? There's nothing I'm concerned about off the top of my head, but it would be good to test now rather than be surprised later.
```python
        sample: a :class:`fiftyone.core.sample.Sample` or
            :class:`fiftyone.core.sample.SampleView`
        """
        if sample._in_db and sample._dataset is not self._dataset:
```
is all of this just copied from SaveContext but with the locks? Feels like a lot of shared logic/similar code
Yes. What are your thoughts on resolving that? Put the locks in `SaveContext`?
> Put the locks in SaveContext?

Yes, that seems like the best path, both to reduce code duplication and because, per my comments here, I see this either as (1) the way that `SaveContext` always works, or (2) a behavior I can get via `fo.SaveContext(..., async_writes=True)` and similar. A rough sketch of that layering follows.
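One possible shape for that refactor, under the assumption that only the flush strategy differs between the two contexts (all names here are illustrative, not the actual fiftyone code):

```python
import threading

class SaveContext:
    """Sketch: the base class owns the locks and the actual flush logic."""

    def __init__(self):
        self._sample_ops = []
        self.samples_lock = threading.Lock()  # cheap when uncontended

    def save(self, op):
        with self.samples_lock:
            self._sample_ops.append(op)
        self._save_batch()  # real code would flush only at thresholds

    def _do_save_batch(self):
        with self.samples_lock:
            ops, self._sample_ops = self._sample_ops, []
        print(f"writing {len(ops)} ops")  # stand-in for the bulk write

    def _save_batch(self):
        self._do_save_batch()  # synchronous flush

class AsyncSaveContext(SaveContext):
    """Only the flush strategy differs: submit to an executor."""

    def __init__(self, executor):
        super().__init__()
        self.executor = executor
        self.futures = []

    def _save_batch(self):
        self.futures.append(self.executor.submit(self._do_save_batch))
```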
Note that I am making some small tweaks to SaveContext in #4773, so it would be great to get that merged first 😄
^reminder about this as well. More reason not to duplicate the implementation of `SaveContext` and to get that PR merged first.
Actionable comments posted: 0
♻️ Duplicate comments (1)
fiftyone/core/collections.py (1)
- 338-363: Fix race + prevent data loss in _do_save_batch (drain under locks, write only when non-empty, requeue on failure). Currently, checks outside locks and an unconditional bulk_write can cause `bulk_write([])[0]` and drop ops if I/O fails. Drain by swap under locks, then write from the snapshot; on exception, requeue to preserve data. Also compute encoding_ratio from a snapshot and drain ids/reloads after locks. Apply:

```diff
     def _do_save_batch(self):
-        encoded_size = -1
-        if self._sample_ops:
-            with self.samples_lock:
-                sample_ops = self._sample_ops.copy()
-                self._sample_ops.clear()
-                res = foo.bulk_write(
-                    sample_ops,
-                    self._sample_coll,
-                    ordered=False,
-                    batcher=False,
-                )[0]
-                encoded_size += res.bulk_api_result.get("nBytes", 0)
+        encoded_size = -1
+        # drain samples under lock
+        with self.samples_lock:
+            sample_ops, self._sample_ops = self._sample_ops, []
+        if sample_ops:
+            try:
+                res = foo.bulk_write(
+                    sample_ops,
+                    self._sample_coll,
+                    ordered=False,
+                    batcher=False,
+                )[0]
+                encoded_size += res.bulk_api_result.get("nBytes", 0)
+            except Exception:
+                # requeue to avoid data loss
+                with self.samples_lock:
+                    self._sample_ops[:0] = sample_ops
+                raise

-        if self._frame_ops:
-            with self.frames_lock:
-                frame_ops = self._frame_ops.copy()
-                self._frame_ops.clear()
-                res = foo.bulk_write(
-                    frame_ops,
-                    self._frame_coll,
-                    ordered=False,
-                    batcher=False,
-                )[0]
-                encoded_size += res.bulk_api_result.get("nBytes", 0)
+        # drain frames under lock
+        with self.frames_lock:
+            frame_ops, self._frame_ops = self._frame_ops, []
+        if frame_ops:
+            try:
+                res = foo.bulk_write(
+                    frame_ops,
+                    self._frame_coll,
+                    ordered=False,
+                    batcher=False,
+                )[0]
+                encoded_size += res.bulk_api_result.get("nBytes", 0)
+            except Exception:
+                with self.frames_lock:
+                    self._frame_ops[:0] = frame_ops
+                raise

-        self._encoding_ratio = (
-            self._curr_batch_size_bytes / encoded_size
+        # snapshot to reduce timing sensitivity
+        bytes_snapshot = self._curr_batch_size_bytes
+        self._encoding_ratio = (
+            bytes_snapshot / encoded_size
             if encoded_size > 0 and self._curr_batch_size_bytes
             else 1.0
         )

-        if self._batch_ids and self._is_generated:
-            with self.batch_ids_lock:
-                batch_ids = self._batch_ids.copy()
-                self._batch_ids.clear()
-            self.sample_collection._sync_source(ids=batch_ids)
+        if self._is_generated:
+            with self.batch_ids_lock:
+                batch_ids, self._batch_ids = self._batch_ids, []
+            if batch_ids:
+                self.sample_collection._sync_source(ids=batch_ids)

-        if self._reload_parents:
-            with self.reloading_lock:
-                reload_parents = self._reload_parents.copy()
-                self._reload_parents.clear()
-            for sample in reload_parents:
-                sample._reload_parents()
+        with self.reloading_lock:
+            reload_parents, self._reload_parents = self._reload_parents, []
+        for sample in reload_parents:
+            sample._reload_parents()
```

Based on learnings.
Also applies to: 370-382
🧹 Nitpick comments (3)
fiftyone/core/collections.py (3)
- 256-278: Avoid masking exceptions from the with-body in `__exit__`. If the with-body raised, re-raising a Future error here will mask it. Gate the re-raise on no prior exception.

```diff
     def __exit__(self, *args):
         super().__exit__(*args)
         error = None
         try:
@@
         finally:
             self.executor.__exit__(*args)
-            if error:
-                raise error
+            # Only raise background error if the with-body didn't raise
+            if error and (not args or args[0] is None):
+                raise error
```
- 279-337: Minor: reduce duplication with SaveContext.save(). Large overlap with SaveContext.save; consider extracting shared batching/threshold logic into a helper to prevent drift.
- 383-386: Record submitted futures safely (optional). If save() may be called from multiple threads, consider guarding futures append/drain with a small lock to avoid races with __exit__'s drain, e.g. as sketched below.
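One way to do that, using a dedicated lock around append and drain (hypothetical names; the PR does not currently include this):

```python
import threading

class FutureTracker:
    """Toy illustration of thread-safe future bookkeeping."""

    def __init__(self):
        self._futures = []
        self._lock = threading.Lock()

    def add(self, future):
        with self._lock:
            self._futures.append(future)

    def drain(self):
        # swap under the lock so __exit__ sees a consistent snapshot
        with self._lock:
            futures, self._futures = self._futures, []
        return futures
```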
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- fiftyone/core/collections.py (2 hunks)
🔇 Additional comments (2)
fiftyone/core/collections.py (2)
- 20-20: Import looks good. threading is required for the added locks.
- 238-249: Constructor and lock setup look good; confirm executor concurrency assumptions. Requiring an executor and adding per-list locks is sound. If this context may be used with executors having >1 worker, ensure _do_save_batch is race-safe (see my fix below) or constrain max_workers=1 at call sites.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- fiftyone/core/collections.py (9 hunks)
- fiftyone/core/models.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- fiftyone/core/models.py
What changes are proposed in this pull request?
#6288 performed DB writes on a background thread but showed increasing memory usage as batches were accumulated, so DB writes were moved back to the main thread in #6384. Testing on #6361 showed that limiting the number of tasks in the async queue did not reduce memory usage when computing embeddings with a large model.
This PR takes a different approach to putting DB writes on an async thread by using a subclass of `SaveContext` that submits saves to an async executor. Profiling shows memory usage remains flat while execution also finishes sooner, even when using only 1 async worker thread.
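For orientation, a hedged sketch of how such a context might be used (the constructor signature is not shown in this thread, so the `executor` argument and `compute_embedding` helper are assumptions):

```python
from concurrent.futures import ThreadPoolExecutor

# per the review comments, AsyncSaveContext enters/exits the executor itself
executor = ThreadPoolExecutor(max_workers=1)
with AsyncSaveContext(dataset, executor=executor) as ctx:
    for sample in dataset:
        sample["embedding"] = compute_embedding(sample)  # hypothetical helper
        ctx.save(sample)  # enqueued; batches flush on the worker thread
```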
How is this patch tested? If it is not, please explain why.
Ran a memory profiling script that computes embeddings with a large model, using 500, 1000, and 2000 samples. Using `async_executor` to write to the DB resulted in increasing memory usage, while using `AsyncSaveContext` did not and is faster; using more async workers neither changed memory usage nor total execution time:



Release Notes
Is this a user-facing change that should be mentioned in the release notes?
What areas of FiftyOne does this PR affect?
- `fiftyone` Python library changes