@ddobie ddobie commented Jun 10, 2025

There are currently huge memory leaks caused by generating delayed objects from values in a large dataframe: the entire dataframe gets dragged into each delayed (rather than just the selected rows), which causes memory usage to explode. This happens in both the forced extraction and new sources steps.

After fixing it, I am able to run the entirety of the galactic data through the pipeline.
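
To illustrate the general mechanism described above (this is not the pipeline's actual code), here is a minimal, dependency-free sketch. It assumes that whatever is passed as a task argument travels with the task, and uses the pickled size of the arguments as a rough proxy for that cost. `big_table`, `flux_from_table`, and `flux_from_rows` are hypothetical names, and a list of dicts stands in for the dataframe.

```python
import pickle

# Hypothetical stand-in for a large dataframe: one dict per row.
big_table = [{"id": i, "flux": float(i)} for i in range(20_000)]
start, stop = 100, 110  # the only rows this task actually needs


def flux_from_table(table, start, stop):
    """Task that is handed the whole table and selects rows itself."""
    return sum(row["flux"] for row in table[start:stop])


def flux_from_rows(rows):
    """Task that is handed only the pre-selected rows."""
    return sum(row["flux"] for row in rows)


# Heavy: the task's arguments include the entire table.
heavy_args = (big_table, start, stop)
# Light: rows are selected up front, so only they travel with the task.
light_args = (big_table[start:stop],)

# Pickled size is used here as a rough proxy for what each task carries.
heavy_bytes = len(pickle.dumps(heavy_args))
light_bytes = len(pickle.dumps(light_args))

heavy_result = flux_from_table(*heavy_args)
light_result = flux_from_rows(*light_args)

print(f"heavy payload: {heavy_bytes} bytes, light payload: {light_bytes} bytes")
print(f"same answer: {heavy_result == light_result}")
```

Both tasks compute the same value, but the first carries the full table with it. With many such tasks the overhead multiplies, which is consistent with the memory growth reported here.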

@ddobie ddobie requested a review from mauch September 23, 2025 06:03
@ddobie ddobie marked this pull request as ready for review September 23, 2025 06:03

@mauch mauch left a comment

Looks good.


# Do the forced extraction work by combining the two delayed lists above then
# running extract_from_image on the tuple of delayed futures.
# Persist at this point using the number of io workers.

Can probably remove this comment - since we're now using the semaphores further down the line.
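
For readers unfamiliar with the pattern, here is a rough sketch of limiting concurrent I/O with a semaphore instead of sizing a worker pool. It uses the standard library's `threading.BoundedSemaphore` purely as a stand-in; the semaphore type the pipeline actually uses is not shown in this conversation, and `IO_LIMIT`, `extract`, and `state` are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

IO_LIMIT = 2  # hypothetical cap on simultaneous I/O-heavy tasks
io_slots = threading.BoundedSemaphore(IO_LIMIT)

# Bookkeeping so we can observe how many tasks overlap.
state = {"active": 0, "peak": 0}
state_lock = threading.Lock()


def extract(image_id):
    """Hypothetical I/O-bound task; only IO_LIMIT may run at once."""
    with io_slots:  # blocks until a slot is free
        with state_lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # stands in for reading an image from disk
        with state_lock:
            state["active"] -= 1
    return image_id


# More workers than slots: the semaphore, not the pool size, sets the limit.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(extract, range(16)))

print(f"peak concurrent extractions: {state['peak']} (limit {IO_LIMIT})")
```

Because the limit lives inside the task rather than at the point where work is submitted, a comment about persisting with a given number of io workers would no longer describe what controls concurrency.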

3 participants