[#37209] Enhance serialization error messages #37298

PDGGK · 2026-01-14T03:24:36Z (description edited by claudevdm)

What changes are being proposed in this pull request?

This PR addresses issue #37209 with two improvements:

1. Enhanced serialization error messages (Python)

Problem: When users pass non-serializable lambdas or closures, they get cryptic errors like:

RuntimeError: Unable to pickle fn <function>: <PicklingError>
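For illustration only, the kind of underlying failure can be reproduced with the standard library `pickle` module. This sketch rests on assumptions. Beam uses its own pickler, which, unlike plain `pickle`, can serialize lambdas, so here a hypothetical callable class holding a `threading.Lock` stands in for a closure that captures a non-serializable object. It assumes the snippet runs as a script, so the class is importable from `__main__`.

```python
import pickle
import threading


class CountingFn:
    """Hypothetical user callable that captures a thread lock."""

    def __init__(self):
        self.lock = threading.Lock()  # thread locks cannot be pickled

    def __call__(self, x):
        with self.lock:
            return x + 1


def try_pickle(obj):
    """Return None if pickle.dumps succeeds, else the exception it raised."""
    try:
        pickle.dumps(obj)
        return None
    except Exception as e:  # a TypeError for the lock in this example
        return e


err = try_pickle(CountingFn())
print(type(err).__name__, err)
```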

Solution: The enhanced error message explains:

- Why serialization is required (distributed execution)
- What commonly causes the error (capturing file handles, DB connections, etc.)
- How to fix it (use module-level functions, initialize in setup(), etc.)
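The following is a minimal sketch of the "initialize resources in setup()" fix. It uses a plain class that mimics the `DoFn` lifecycle so it runs without Beam installed. `GuardedFn` and its methods are hypothetical. In a real pipeline you would subclass `apache_beam.DoFn`, and the runner, not your code, calls `setup()` on the worker after the instance has been deserialized.

```python
import pickle
import threading


class GuardedFn:
    """Hypothetical DoFn-style class: the lock is created in setup(), not __init__."""

    def __init__(self, offset):
        self.offset = offset  # plain data: safe to pickle
        self._lock = None     # resource is created later, on the worker

    def setup(self):
        # In Beam the runner calls setup() once per DoFn instance on the
        # worker; in this sketch it is called by hand after unpickling.
        self._lock = threading.Lock()

    def process(self, x):
        with self._lock:
            return x + self.offset


fn = GuardedFn(offset=10)
payload = pickle.dumps(fn)           # succeeds: no lock exists yet
worker_copy = pickle.loads(payload)  # what a worker would receive
worker_copy.setup()
print(worker_copy.process(5))        # prints 15
```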

Example:

RuntimeError: Unable to pickle fn <function>: cannot serialize <_io.TextIOWrapper>. 
User code must be serializable (picklable) for distributed execution. 
This usually happens when lambdas or closures capture non-serializable objects 
like file handles, database connections, or thread locks. Try: (1) using 
module-level functions instead of lambdas, (2) initializing resources in 
setup() methods, (3) checking what your closure captures.
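For fix (3), one hypothetical way to check what a closure captures is to pickle each captured value separately and report the ones that fail. `find_unpicklable_captures` is an illustrative helper, not part of Beam, and it uses the standard library `pickle` as an assumption. It inspects only the function's direct closure cells, not globals or objects reachable through them.

```python
import pickle
import threading


def find_unpicklable_captures(fn):
    """Map each captured variable name to the error raised when pickling it."""
    problems = {}
    names = fn.__code__.co_freevars  # names captured from enclosing scopes
    cells = fn.__closure__ or ()     # matching cell objects, same order
    for name, cell in zip(names, cells):
        try:
            pickle.dumps(cell.cell_contents)
        except Exception as e:
            problems[name] = f"{type(e).__name__}: {e}"
    return problems


def make_fn():
    lock = threading.Lock()  # not picklable
    factor = 3               # picklable

    def scaled(x):
        with lock:
            return x * factor

    return scaled


print(find_unpicklable_captures(make_fn()))  # reports only 'lock'
```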

Testing: Added test_callable_non_serializable_error_message() to verify error message content.

Testing

Python (ptransform)

✅ 202 tests passed in ptransform_test.py
✅ New test verifies enhanced error message

Impact

Developer Experience:
- Python: Reduces debugging time for serialization issues
Stability: No change to execution logic; only the error path changes (message text and which pickling exceptions are wrapped)
Compatibility: No breaking changes to existing pipelines

Fixes #37209

Improved error messages when user code fails to serialize (pickle) for distributed execution. The original error was too technical and didn't explain the cause or suggest fixes.

Changes:
- Enhanced RuntimeError message with a clear explanation of why serialization is required
- Added common causes (lambdas capturing file handles, DB connections, thread locks)
- Provided three concrete fixes: module-level functions, setup() methods, checking closure captures
- Broadened exception catching to include TypeError and other pickling failures (not just RuntimeError)
- Added exception chaining (from e) to preserve the original stack trace
- Added a test case to verify the new error message content

This significantly improves developer experience when debugging serialization issues, especially for new Apache Beam users.

Fixes apache#37209
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
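The error-path changes described in the commit message (broader exception catching plus `from e` chaining) can be sketched as follows. This is not the actual `ptransform.py` code. `check_picklable` and `_HINT` are hypothetical names, standard `pickle.dumps` stands in for Beam's pickler, and the hint text is copied from the example message in this PR description.

```python
import pickle
import threading

# Hint text copied from the example error message in this PR.
_HINT = (
    "User code must be serializable (picklable) for distributed execution. "
    "This usually happens when lambdas or closures capture non-serializable "
    "objects like file handles, database connections, or thread locks. Try: "
    "(1) using module-level functions instead of lambdas, (2) initializing "
    "resources in setup() methods, (3) checking what your closure captures.")


def check_picklable(fn, dumps=pickle.dumps):
    """Serialize fn; on any failure raise a RuntimeError chained to the cause."""
    try:
        return dumps(fn)
    except Exception as e:  # RuntimeError, TypeError, PicklingError, ...
        raise RuntimeError(f"Unable to pickle fn {fn}: {e}. {_HINT}") from e


try:
    check_picklable(threading.Lock())
except RuntimeError as err:
    # The original TypeError stays available as the chained cause.
    print(type(err.__cause__).__name__)
```

Catching `Exception` rather than only `RuntimeError` matters because pickling failures surface as different types (with plain `pickle`, a lock or open file handle raises `TypeError`), and `from e` keeps the original exception and its traceback attached as `__cause__`.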

gemini-code-assist · 2026-01-14T03:24:51Z

Summary of Changes

Hello @PDGGK, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the developer experience in Apache Beam by transforming cryptic serialization error messages into clear, actionable guidance. By providing detailed explanations of why serialization is required, common pitfalls like capturing non-serializable objects, and concrete solutions, it aims to drastically reduce the debugging time for users encountering these issues, particularly those new to distributed execution paradigms.

Highlights

Improved Serialization Error Messages: Enhanced the error messages for non-serializable user code, providing clear explanations, common causes, and actionable solutions.
Broader Exception Handling: Expanded the exception handling for pickling failures to catch RuntimeError, TypeError, and general Exception to ensure comprehensive coverage.
Preserved Exception Context: Implemented exception chaining (from e) to retain the original stack trace and context for better debugging.
New Test Coverage: Added a dedicated test case to validate the new, informative serialization error message content.


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.


Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

github-actions · 2026-01-14T04:48:58Z

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

Fix Python formatter precommit check by applying yapf v0.43.0 formatting rules to modified files.

github-actions · 2026-01-14T06:12:18Z

Assigning reviewers:

R: @claudevdm for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

claudevdm

Thanks!

codecov · 2026-01-19T13:17:12Z

Codecov Report

❌ Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.30%. Comparing base (4dcc27d) to head (978e0af).
⚠️ Report is 20 commits behind head on master.

Files with missing lines	Patch %	Lines
sdks/python/apache_beam/transforms/ptransform.py	33.33%	2 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff              @@
##             master   #37298       +/-   ##
=============================================
- Coverage     55.16%   36.30%   -18.86%     
  Complexity     1676     1676               
=============================================
  Files          1068     1068               
  Lines        167257   167323       +66     
  Branches       1208     1208               
=============================================
- Hits          92261    60753    -31508     
- Misses        72816   104390    +31574     
  Partials       2180     2180

Flag	Coverage Δ
python	`40.58% <33.33%> (-40.21%)`	⬇️

Flags with carried forward coverage won't be shown.

PDGGK · 2026-01-20T04:06:50Z

The two failing checks appear to be flaky tests unrelated to this PR's changes:

PreCommit Python Dataframes (3.10) - test_unbatch_no_index_Series[string] failed with grpc._channel._MultiThreadedRendezvous: Socket closed - a gRPC infrastructure issue
PreCommit Python Examples (3.11) - test_gaps_1 failed with a BeamAssertException due to result ordering mismatch

This PR only modifies error messages in ptransform.py and does not touch any dataframe or example code. Could a maintainer please re-run the checks?

…y configuration

Users need to configure bounded backoff to prevent infinite retry loops. Making withBackOffSupplier public allows users to set FluentBackoff.DEFAULT.withMaxRetries(n) and control retry behavior. Added integration test demonstrating bounded retry with maxRetries=3.

Related to apache#37198, apache#37176
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

claudevdm · 2026-01-21T02:58:34Z

@PDGGK can you please revert the unrelated Java changes? Then I will merge this change.

github-actions bot added the python label Jan 14, 2026

Apply yapf formatting

19e4a85


github-actions bot added the Next Action: Reviewers label Jan 14, 2026

claudevdm approved these changes Jan 16, 2026


github-actions bot added java io and removed java io labels Jan 18, 2026

PDGGK mentioned this pull request Jan 19, 2026

Fix RequestResponseIO parseAndThrow to preserve retryable exception types #37341

Closed

9 tasks

github-actions bot added java io and removed java io labels Jan 19, 2026

PDGGK force-pushed the fix-issue-37209 branch from 978e0af to 19e4a85 January 19, 2026 13:54

github-actions bot added java io and removed java io labels Jan 20, 2026

PDGGK changed the title ~~[#37209] Enhance serialization error messages for better developer experience~~ [#37209] Make withBackOffSupplier public & enhance serialization error messages Jan 20, 2026

Revert unrelated Java changes

6ca1564

claudevdm changed the title ~~[#37209] Make withBackOffSupplier public & enhance serialization error messages~~ [#37209] Enhance serialization error messages Jan 21, 2026
