Spark: Remove closing of IO in SerializableTable* #12129

Open: mgmarino wants to merge 1 commit into main from fix-removal-of-broadcast-variable


mgmarino

This fixes #12046.

To summarize: Spark can evict broadcast variables from memory and persist them to disk when it needs to free memory. When that happens, the IO object is closed even though tasks may still be using it.

This PR fixes the issue by no longer closing the IO object when the serializable table is closed. The IO objects should instead be closed by thread finalizers.
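A minimal, self-contained Java sketch of the failure mode described above. It does not use Iceberg's or Spark's real classes: `PooledIO`, `BroadcastTableCopy`, `runTask`, and the `closeIoOnClose` flag are hypothetical stand-ins. Eviction is simulated by calling `close()` directly on the broadcast value, assuming (as the description states) that an evicted value gets closed while tasks may still hold its IO. The flag contrasts the behavior this PR removes with the behavior it proposes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class BroadcastEvictionSketch {

  /** Hypothetical stand-in for a FileIO backed by an HTTP connection pool. */
  static final class PooledIO implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    String read(String path) {
      if (closed.get()) {
        // Simulates the error reported in the linked issue once the pool is shut down.
        throw new IllegalStateException("Connection pool shut down");
      }
      return "contents of " + path;
    }

    @Override
    public void close() {
      closed.set(true);
    }
  }

  /** Hypothetical stand-in for the table copy held in a Spark broadcast variable. */
  static final class BroadcastTableCopy implements AutoCloseable {
    private final PooledIO io;
    // true models the behavior this PR removes; false models the proposed fix.
    private final boolean closeIoOnClose;

    BroadcastTableCopy(PooledIO io, boolean closeIoOnClose) {
      this.io = io;
      this.closeIoOnClose = closeIoOnClose;
    }

    PooledIO io() {
      return io;
    }

    @Override
    public void close() {
      if (closeIoOnClose) {
        io.close();
      }
      // Otherwise the IO stays open; nothing in this sketch closes it
      // (the PR relies on finalizers for that).
    }
  }

  /** A task obtains the shared IO, the broadcast value is evicted and closed, then the task reads. */
  static String runTask(boolean closeIoOnClose) {
    BroadcastTableCopy table = new BroadcastTableCopy(new PooledIO(), closeIoOnClose);
    PooledIO taskIo = table.io();
    table.close(); // simulated eviction under memory pressure
    try {
      return taskIo.read("data/file-0.parquet");
    } catch (IllegalStateException e) {
      return "failed: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(runTask(true)); // old behavior
    System.out.println(runTask(false)); // behavior proposed in this PR
  }
}
```

Running `main` prints `failed: Connection pool shut down` followed by `contents of data/file-0.parquet`. The trade-off, as the description notes, is that the table's `close()` no longer bounds the IO's lifetime, so cleanup is deferred to finalizers.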

github-actions bot added the spark label on Jan 29, 2025
mgmarino (Author)

I would welcome input on whether this is the correct way to solve this issue, and I am happy to adapt as necessary. Thanks!

This effectively reverts #8924.

mgmarino force-pushed the fix-removal-of-broadcast-variable branch from cb81b3b to cac8ceb on January 29, 2025 14:11
nastra requested a review from aokolnychyi on January 29, 2025 14:34
mgmarino force-pushed the fix-removal-of-broadcast-variable branch 2 times, most recently from 1855162 to 59c743b on January 29, 2025 15:11
mgmarino force-pushed the fix-removal-of-broadcast-variable branch from 59c743b to 04137d3 on January 29, 2025 15:18
mgmarino (Author) commented Feb 3, 2025

@aokolnychyi Any chance you'll be able to give some feedback on this? Thanks!

Successfully merging this pull request may close these issues.

Spark rewrite_data_files failing with java.lang.IllegalStateException: Connection pool shut down