Skip to content

Conversation

@damccorm
Copy link
Contributor

@damccorm damccorm commented Jan 8, 2026

No description provided.

@gemini-code-assist
Copy link

Summary of Changes

Hello @damccorm, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request is dedicated to validating the upcoming Apache Beam 2.71.0 release candidate. It involves a comprehensive update of both Java and Python dependencies to align with the new Beam version, ensuring compatibility and leveraging the latest features and fixes. The changes span across various project modules, primarily focusing on dependency version bumps and the introduction of new transitive dependencies in Python environments.

Highlights

  • Apache Beam Version Update: The core Apache Beam and Beam Python versions have been updated from 2.70.0 to 2.71.0 in the pom.xml configuration.
  • Maven Repository Configuration: A new Maven repository URL (https://repository.apache.org/content/repositories/orgapachebeam-1420) has been added to pom.xml to fetch the updated Beam artifacts.
  • Python Dependency Pinning: All relevant Python requirements.txt files across various modules (e.g., default_base_python_requirements.txt, default_base_yaml_requirements.txt, job-builder-server, streaming-llm, word-count-python, yaml-template, and several v2 templates) now explicitly pin apache-beam to version 2.71.0rc1.
  • Python Dependency Updates: Several Python dependencies have been updated to newer versions, including beartype (to 0.22.9), google-cloud-bigquery (to 3.40.0), google-genai (to 1.57.0), jsonschema (to 4.26.0), numpy (to 2.4.0), pymongo (to 4.16.0), testcontainers (to 4.14.0), and urllib3 (to 2.6.3).
  • New Python Dependencies Introduced: New Python packages such as betterproto, envoy-data-plane, grpclib, h2, hpack, and hyperframe have been added to the dependency lists in various requirements.txt files, often as transitive dependencies of apache-beam or other updated packages.
  • Build Command Adjustments: Comments within some requirements.txt files (job-builder-server, word-count-python, yaml-template, and v2 templates) have been updated to reflect changes in pip-compile command syntax, specifically removing the ./ prefix from output file paths.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@codecov
Copy link

codecov bot commented Jan 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 50.87%. Comparing base (7be9f75) to head (dae581c).

Additional details and impacted files
@@            Coverage Diff             @@
##               main    #3204    +/-   ##
==========================================
  Coverage     50.86%   50.87%            
- Complexity     5133     5513   +380     
==========================================
  Files           976      976            
  Lines         60090    60090            
  Branches       6572     6572            
==========================================
+ Hits          30564    30569     +5     
+ Misses        27374    27372     -2     
+ Partials       2152     2149     -3     
Components Coverage Δ
spanner-templates 71.02% <ø> (+<0.01%) ⬆️
spanner-import-export 68.99% <ø> (ø)
spanner-live-forward-migration 80.12% <ø> (+0.01%) ⬆️
spanner-live-reverse-replication 77.77% <ø> (+0.01%) ⬆️
spanner-bulk-migration 88.44% <ø> (+0.01%) ⬆️
see 2 files with indirect coverage changes
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@damccorm
Copy link
Contributor Author

damccorm commented Jan 9, 2026

Currently what I am seeing is that all tests are passing except some spanner related tests. These are failing with:

Timeout in polling result file: gs://dataflow-staging-us-west2-269744978479/staging/template_launches/2026-01-08_12_48_39-9625519560852535969/operation_result.
Service account: [email protected]
Image URL: gcr.io/cloud-teleport-testing/spanner-changestreams-to-bigquery:2026-01-08-20-42-15_IT
Troubleshooting guide at https://cloud.google.com/dataflow/docs/guides/troubleshoot-templates#timeout-polling

Example: https://pantheon.corp.google.com/dataflow/jobs/us-west2/2026-01-08_12_48_39-9625519560852535969;logsSeverity=ERROR;graphView=1?project=cloud-teleport-testing&pageState=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))&e=13802955&mods=dm_deploy_from_gcs

I'm not currently seeing anything in the logs that gives much of a hint about what is happening. I have confirmed that if I remove SpannerIO.ReadChangeStream from the pipeline, that allows it to succeed, so there is something odd going on there. There are no obvious changes in https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner

@Abacn have you seen anything like this while doing this upgrade before? If not, I will keep looking

@Abacn
Copy link
Contributor

Abacn commented Jan 9, 2026

In the past there were situations that spanner tests due to breaking changes in Spanner client that get upgraded as GCP BOM upgrade happened in Beam.

"Timeout in polling result file: gs://dataflow-staging-us-west2-269744978479/staging/template_launches/2026-01-08_12_48_39-9625519560852535969/operation_result."

This sounds template launcher failed.

if I remove SpannerIO.ReadChangeStream from the pipeline

Sounds like pipeline expansion gets stuck. if we can reproduce it locally then could get a hint. Also reach out to Spanner CDC team to investigate.

@damccorm
Copy link
Contributor Author

damccorm commented Jan 9, 2026

In the past there were situations that spanner tests due to breaking changes in Spanner client that get upgraded as GCP BOM upgrade happened in Beam.

Thanks, this is good context - was already starting on the other steps here, thanks!

@damccorm
Copy link
Contributor Author

damccorm commented Jan 12, 2026

Interestingly, I'm not able to repro the problem. For example, here is the same pipeline that fails as a template running with the 2.71.0 rc version - https://pantheon.corp.google.com/dataflow/jobs/us-central1/2026-01-12_09_09_16-11282152439616596575;step=;graphView=0?project=apache-beam-testing&pageState=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))&e=13802955&mods=dm_deploy_from_gcs

vs the same pipeline running in a template: https://pantheon.corp.google.com/dataflow/jobs/us-central1/2026-01-12_08_29_11-15499653623879337782;logsSeverity=INFO;graphView=1?project=apache-beam-testing&pageState=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))&e=13802955&mods=dm_deploy_from_gcs

I have pretty definitively proven that something with the spanner change streams connector is the problem though. The minimal repro I have right now is changing the run method in SpannerChangeStreamsToBigQuery.java to:

  public static PipelineResult run(SpannerChangeStreamsToBigQueryOptions options) {
    setOptions(options);
    validateOptions(options);
    Pipeline pipeline = Pipeline.create(options);
    DeadLetterQueueManager dlqManager = buildDlqManager(options);
    String spannerProjectId = OptionsUtils.getSpannerProjectId(options);

    String dlqDirectory = dlqManager.getRetryDlqDirectoryWithDateTime();
    String tempDlqDirectory = dlqManager.getRetryDlqDirectory() + "tmp/";

    Timestamp startTimestamp =
        options.getStartTimestamp().isEmpty()
            ? Timestamp.now()
            : Timestamp.parseTimestamp(options.getStartTimestamp());
    Timestamp endTimestamp =
        options.getEndTimestamp().isEmpty()
            ? Timestamp.MAX_VALUE
            : Timestamp.parseTimestamp(options.getEndTimestamp());

    SpannerConfig spannerConfig =
        SpannerConfig.create()
            .withProjectId(spannerProjectId)
            .withInstanceId(options.getSpannerInstanceId())
            .withDatabaseId(options.getSpannerDatabase())
            .withRpcPriority(options.getRpcPriority());
    if (!Strings.isNullOrEmpty(options.getSpannerHost())) {
      spannerConfig =
          spannerConfig.withHost(ValueProvider.StaticValueProvider.of(options.getSpannerHost()));
    }
    // Propagate database role for fine-grained access control on change stream.
    if (options.getSpannerDatabaseRole() != null) {
      spannerConfig =
          spannerConfig.withDatabaseRole(
              ValueProvider.StaticValueProvider.of(options.getSpannerDatabaseRole()));
    }

    SpannerIO.ReadChangeStream readChangeStream =
        SpannerIO.readChangeStream()
            .withSpannerConfig(spannerConfig)
            .withMetadataInstance(options.getSpannerMetadataInstanceId())
            .withMetadataDatabase(options.getSpannerMetadataDatabase())
            .withChangeStreamName(options.getSpannerChangeStreamName())
            .withInclusiveStartAt(startTimestamp)
            .withInclusiveEndAt(endTimestamp)
            .withRpcPriority(options.getRpcPriority());

    PCollection<DataChangeRecord> dataChangeRecord =
        pipeline
            .apply("Read from Spanner Change Streams", readChangeStream)
            .apply("Reshuffle DataChangeRecord", Reshuffle.viaRandomKey());

    return pipeline.run();
  }

I can then launch a template with:

mvn clean verify   -PtemplatesIntegrationTests   -Dproject="apache-beam-testing"   -DartifactBucket="apache-beam-damccorm"   -Dregion=us-central1   -pl v2/googlecloud-to-googlecloud -am   -Dtest=SpannerChangeStreamsToBigQueryIT#testSpannerChangeStreamsToBigQueryAddTable

This successfully launches the template when I don't change pom.xml at all (Template version is set to 2.70.0) though the test fails because of my changes. When I apply this PRs changes to pom.xml, it fails to launch the template at all. There is something wrong with the connector in 2.71.0. It is possibly just a permissions issue, but we need to investigate further.

@damccorm
Copy link
Contributor Author

I've narrowed the problem to this line: https://github.com/apache/beam/blob/d0223389f47f8085477e113391d7ae5961bc0bbc/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java#L2105

Specifically, having:

SpannerAccessor.getOrCreate(spannerConfigWithCredential)

in the template run method causes hangs. Interestingly, we still log after this, so it seems like some resource is not getting cleaned up correctly

@Abacn
Copy link
Contributor

Abacn commented Jan 13, 2026

There is something wrong with the connector in 2.71.0.

Try to revert com.google.cloud:google-cloud-spanner to 6.95.0 for templates?

apache/beam@8ff2c94

if it worked we then need to revert #36995 (and the line in #37142 s well) to unblock the release, and ask Spanner team to take a further look

we still log after this

Log agent runs in a parallel container inside launcher VM and it is not affected by stuck pipeline expansion

EDIT:

I cannot reproduce it locally either. What I did is to launch a pipeline locally with this modified code:

Actually it can be reproduced: Abacn@26c4218

Running with IDEA, there are threads preventing main method exiting:

image

@Abacn
Copy link
Contributor

Abacn commented Jan 13, 2026

Update: confirmed that downgrading com.google.cloud:google-cloud-spanner to 6.104 resolves the stuck main function:

My minimum reproduce:

build.gradle

plugins {
    id 'application'
}

configurations.all {
    resolutionStrategy {
        force "org.slf4j:slf4j-api:1.7.36"
    }
}

apply plugin : "java"

def beam_version = "2.70.0"
group = 'com.github.abacn'

ext {
    javaMainClass = "com.google.spanner.v1.SpannerTest"
}

application {
    mainClassName = javaMainClass
}

repositories {
    mavenCentral()
}

dependencies {
    implementation(group: 'org.apache.beam', name: 'beam-sdks-java-io-google-cloud-platform', version: "$beam_version")
    implementation("com.google.cloud:google-cloud-spanner:6.104.0")
}

main function:

  public static void main(String[] argv) {
    SpannerConfig config = SpannerConfig.create()
        .withProjectId("cloud-teleport-testing")
        .withInstanceId("beam-test")
        .withDatabaseId("test-271");
    SpannerAccessor accessor = SpannerAccessor.getOrCreate(config);
    // accessor.close();
  }
  • Spanner 6.104.0 x without close --- run successful
  • Spanner 6.105.0 x without close ---- stuck
  • Spanner 6.105.0 x with close --- run successful

I think we need to revert this line

https://github.com/apache/beam/blob/133034612813f20d76e0c081d5058f04daf10544/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L617

and build a new RC

before that we can override implementation("com.google.cloud:google-cloud-spanner:6.104.0") dependency in DataflowTemplates to use 6.104.0 (declare it in pom, and exclude it as a transient dep for io-google-cloud-platform) to confirm

@Abacn Abacn mentioned this pull request Jan 13, 2026
3 tasks
@damccorm
Copy link
Contributor Author

damccorm commented Jan 13, 2026

Yep, this showed up for me as well. Thanks!

Spanner 6.105.0 x with close --- run successful

This is a good find, thanks for digging in. I was guessing we weren't closing connections but hadn't had time to dig in yet. We could probably consider rolling forward instead. I did a quick audit and the only places this is misused are:

@damccorm
Copy link
Contributor Author

I'll put together a PR for tomorrow; if we can avoid diverging from the BOM version, I do prefer that.

@sakthivelmanii
Copy link

sakthivelmanii commented Jan 14, 2026

@rahul2393 can you please check this issue? my primary suspect is related to enabling gRPC gcp by default.

@damccorm
Copy link
Contributor Author

Yep, this showed up for me as well. Thanks!

Spanner 6.105.0 x with close --- run successful

This is a good find, thanks for digging in. I was guessing we weren't closing connections but hadn't had time to dig in yet. We could probably consider rolling forward instead. I did a quick audit and the only places this is misused are:

Actually, a roll forward isn't as easy as I thought. It is straightforward in SpannerIO, but in DaoFactory there are several places where we initialize a SpannerAccessor, then use it to create a DB client which we continue to pass around beyond the scope of the method. So we can't just call close there, we need to close the connection only when we're tearing down the calling context.

For now, I'll just revert the spanner version, but we'll need more investigation

@damccorm damccorm mentioned this pull request Jan 14, 2026
3 tasks
@damccorm
Copy link
Contributor Author

apache/beam#37305 for the revert in Beam

@damccorm
Copy link
Contributor Author

Validating RC2 here - #3222 - I'll leave this one open for now though for investigation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants