FOEPD-2119: Limit number of async tasks writing to DB #6361
@jacobsela are you able to run your analysis on this branch to see if it makes any difference?
fiftyone/core/utils.py (outdated)

```diff
         yield submit

-        for future in _futures:
+        for future in futures:
```
These lines are an opportunistic cleanup in the original code.
With this branch this function is no longer used in fiftyone, so it could be removed, but the code has been released so there could be third-party uses of it. Suggestions on whether to keep it or get rid of it?
fiftyone/core/models.py (outdated)

```python
    limit=10,
    max_workers=1,
```
Some defaults. Ideally these would be configurable by the user.
What is the reason for a limit of 10?
Arbitrary. Didn't have any reason to choose something different. @jacobsela might be able to weigh in with an informed opinion.
```python
            return future

        for item in iterator:
            yield submit, item
```
Is this supposed to be `submit(item)`?
No. This is yielding the `submit` function and the current item to the consumer loop, so the consumer can call `submit` where it wants and with the function it wants to run async.
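For illustration, here is a minimal, self-contained sketch of the pattern (simplified; the names and structure here are illustrative, not the actual fiftyone internals):

```python
# Sketch of the (submit, item) pattern: the producer owns the executor and
# bookkeeping, while the consumer decides what runs asynchronously and when.
from concurrent.futures import ThreadPoolExecutor


def iter_with_submit(iterator, max_workers=1):
    """Yields (submit, item) pairs; the consumer calls submit() with the
    function it wants to run in the background."""
    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:

        def submit(fn, *args, **kwargs):
            future = executor.submit(fn, *args, **kwargs)
            futures.append(future)
            return future

        for item in iterator:
            yield submit, item

        # Drain any remaining background tasks before exiting
        for future in futures:
            future.result()


def save_batch(batch):
    # Stand-in for writing a batch's predictions to the DB
    print(f"saved {batch}")


for submit, batch in iter_with_submit(range(5)):
    predictions = batch  # stand-in for running the model on the batch
    submit(save_batch, predictions)
```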
@jacobsela any word on whether this branch makes any difference in your benchmarks?
Profiled memory on this branch with @jacobsela's testing script and found that limiting the number of background tasks doesn't seem to help. Even when using a limit of 0, which should be behaviorally equivalent to develop before #6288 (i.e., each batch's prediction is blocked on the previous batch being saved), memory usage climbs almost as high as without a limit. Increasing the number of workers speeds up execution but doesn't use that much less memory.
Considering the results found for #6389, the increasing memory usage appears to be the result of an interaction between …
What changes are proposed in this pull request?
#6288 allows asynchronous background tasks to accumulate without limit, on the assumption that predictions on a batch would take longer than saving that batch to the DB. In cases where saving to the DB takes longer, the number of async tasks can grow without limit and could cause the process to run out of memory.
This PR wraps the batch iterator in a new utility function that limits the number of background tasks. After each iteration of the source iterator, if there are more background tasks than the specified limit, tasks are waited for until the backlog is reduced to the limit, at which point the next item is taken from the iterator.
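As a rough illustration of the mechanism (a minimal sketch, not the actual implementation in this PR; the name `iter_with_limited_submit` is invented):

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def iter_with_limited_submit(iterator, limit=10, max_workers=1):
    """Yields (submit, item) pairs while capping the number of pending
    background tasks at ``limit``."""
    pending = deque()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:

        def submit(fn, *args, **kwargs):
            future = executor.submit(fn, *args, **kwargs)
            pending.append(future)
            return future

        for item in iterator:
            yield submit, item

            # Backpressure: block on the oldest tasks until at most
            # `limit` remain outstanding
            while len(pending) > limit:
                pending.popleft().result()

        # Drain whatever is still outstanding before shutting down
        while pending:
            pending.popleft().result()
```

With `limit=0`, this sketch degenerates to fully synchronous behavior: each new item waits for the previous save to finish, matching the pre-#6288 equivalence described above.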
This approach is open to discussion. If writing to the DB is the bottleneck and creates backpressure on the prediction pipeline, then as soon as enough async tasks accumulate, performance degenerates to the same throughput as never using async tasks. In that case, we might prefer simplicity and revert #6288 to go back to a single thread.
How is this patch tested? If it is not, please explain why.
Ran the following script on a machine with a GPU, with some added logging that recorded how many tasks had accumulated. Logs showed the expected limits were observed.
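The script itself isn't reproduced here; a purely hypothetical sketch of what that kind of instrumentation could look like (all names invented):

```python
# Hypothetical sketch only: log the task backlog on every submission so the
# logs can confirm the configured limit is respected.
import logging
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("task-backlog")

LIMIT = 10
pending = deque()
with ThreadPoolExecutor(max_workers=1) as executor:
    for _ in range(20):
        # Stand-in for submitting a DB save as a background task
        pending.append(executor.submit(time.sleep, 0.1))
        logger.info("background tasks accumulated: %d", len(pending))
        while len(pending) > LIMIT:
            pending.popleft().result()
```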
I also tried using memory-profiler, but the profiles showed the same memory characteristics as the original code. I suspect memory usage is dominated by the downloaded samples, not by the data needed for the async tasks, so the effect of limiting the queue doesn't show:

@jacobsela has some more sophisticated testing he might be able to apply.
Release Notes
Is this a user-facing change that should be mentioned in the release notes?
Yes. Give a description of this change to be included in the release notes for FiftyOne users.
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What areas of FiftyOne does this PR affect?
fiftyone - Python library changes