Skip to content

logservice: avoid dispatcher registration failure during schema store initialization#4366

Merged
ti-chi-bot[bot] merged 2 commits intomasterfrom
ldz/fix-event-broker0305
Mar 6, 2026
Merged

logservice: avoid dispatcher registration failure during schema store initialization#4366
ti-chi-bot[bot] merged 2 commits intomasterfrom
ldz/fix-event-broker0305

Conversation

@lidezhu
Copy link
Collaborator

@lidezhu lidezhu commented Mar 6, 2026

What problem does this PR solve?

Issue Number: ref #4272 close #4285

What is changed and how it works?

This pull request aims to avoid schema store registration failure like the following:

[2026/02/22 13:16:56.220 +08:00] [ERROR] [event_broker.go:983] ["register table to schemaStore failed"] [keyspaceID=1] [dispatcherID=31207855160892268629454174482575748278] [tableID=181] [startTs=464450330781810783] [span="tableID: 181, startKey: 7800000174800000ff00000000b55f7200fe, endKey: 7800000174800000ff00000000b55f7300fe, keyspaceID: 1"] [error="[CDC:ErrNewStore]new store failed: context canceled"]

Highlights

  • Event Dispatcher Robustness: Enhanced the addDispatcher function to ensure proper unregistration and marking of dispatchers as removed immediately upon schema store registration failure, preventing inconsistent states.
  • Improved CreateTiStore Retry Logic: Refactored the retry mechanism for CreateTiStore to differentiate between caller-initiated context cancellations (which should stop retries) and internal client cancellations (which should continue retries), improving resilience.
  • Comprehensive Testing: Added new unit tests to validate the corrected addDispatcher error handling and the refined CreateTiStore retry behavior.
  • Mock Event Store Enhancements: Updated the mock event store to accurately track unregistration calls and ensure complete cleanup of dispatcher mappings during tests.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

  • Bug Fixes

    • Ensures event dispatchers are properly marked and unregistered on schema registration failures to avoid stray dispatching.
    • Refines upstream connection retry handling to respect caller cancellation.
    • Ensures event store write batches are always released to prevent resource leaks.
  • Tests

    • Added tests covering dispatcher unregistration on schema errors and retry cancellation behavior.

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 6, 2026
@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses potential issues during service initialization and resource management. It improves the robustness of event dispatcher registration by ensuring immediate cleanup upon failure and refines the retry strategy for CreateTiStore to handle context cancellations more intelligently. These changes aim to prevent service registration failures and enhance the overall stability of the system.

Highlights

  • Event Dispatcher Robustness: Enhanced the addDispatcher function to ensure proper unregistration and marking of dispatchers as removed immediately upon schema store registration failure, preventing inconsistent states.
  • Improved CreateTiStore Retry Logic: Refactored the retry mechanism for CreateTiStore to differentiate between caller-initiated context cancellations (which should stop retries) and internal client cancellations (which should continue retries), improving resilience.
  • Comprehensive Testing: Added new unit tests to validate the corrected addDispatcher error handling and the refined CreateTiStore retry behavior.
  • Mock Event Store Enhancements: Updated the mock event store to accurately track unregistration calls and ensure complete cleanup of dispatcher mappings during tests.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • pkg/eventservice/event_broker.go
    • Added logic to mark a dispatcher as removed and unregister it from the event store immediately if addDispatcher fails due to a schema store error.
  • pkg/eventservice/event_broker_test.go
    • Added TestAddDispatcherUnregisterOnSchemaStoreError to verify that dispatchers are correctly unregistered when schema store registration fails.
  • pkg/eventservice/event_service_test.go
    • Imported sync/atomic package.
    • Added an atomic.Uint64 field unregisterCount to mockEventStore to track unregistration calls.
    • Modified UnregisterDispatcher in mockEventStore to delete the dispatcher from dispatcherMap and increment unregisterCount.
  • pkg/upstream/upstream.go
    • Extracted the retry condition logic for CreateTiStore into a new helper function isCreateTiStoreRetryable.
    • Updated isCreateTiStoreRetryable to distinguish between caller's context cancellation and internal client cancellation for retry decisions.
  • pkg/upstream/upstream_test.go
    • Added TestIsCreateTiStoreRetryable to test the new retry logic, verifying correct behavior for both background and canceled contexts.
Activity
  • The pull request description is a template, and no specific human activity (comments, reviews, or progress updates) has been recorded.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a4fe6f8a-c366-4717-a14b-f64b7abba2f9

📥 Commits

Reviewing files that changed from the base of the PR and between 0f8b6c5 and 38b66a3.

📒 Files selected for processing (1)
  • logservice/eventstore/event_store.go

📝 Walkthrough

Walkthrough

Adds cleanup to event broker error paths to mark and unregister dispatchers on schema/register failures, extracts CreateTiStore retry decision into a helper, and ensures Pebble batch closure in the event store; corresponding tests added/updated.

Changes

Cohort / File(s) Summary
Event Service: broker & tests
pkg/eventservice/event_broker.go, pkg/eventservice/event_broker_test.go, pkg/eventservice/event_service_test.go
On addDispatcher error paths, mark dispatcher removed and call eventStore.UnregisterDispatcher(...) to prevent notifications; add test TestAddDispatcherUnregisterOnSchemaStoreError. Mock event store now tracks unregisterCount.
Upstream: retry logic & tests
pkg/upstream/upstream.go, pkg/upstream/upstream_test.go
Extracts retryability check into isCreateTiStoreRetryable(ctx, err) and adds TestIsCreateTiStoreRetryable covering context-canceled behavior.
Event store: resource cleanup
logservice/eventstore/event_store.go, go.mod
Add defer batch.Close() in writeEvents to ensure Pebble batch is closed; minor manifest update.

Sequence Diagram(s)

sequenceDiagram
    participant Broker as EventBroker
    participant Schema as SchemaStore
    participant Store as EventStore
    Broker->>Schema: register schema / table
    Schema--xBroker: return error
    Note right of Broker: mark dispatcher.isRemoved = true
    Broker->>Store: UnregisterDispatcher(changefeedID, dispatcherID)
    Store-->>Broker: acknowledge unregister
    Broker-->>Broker: cleanup local dispatcher state
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • wk989898
  • flowbehappy

Poem

🐰 I hopped in code, a careful thrice—
When schema breaks, I cut the splice.
I mark the dispatcher, send it away,
Close the batch, and hop to play.
Tests cheer loud for a tidy day. 🎋

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: avoiding dispatcher registration failure during schema store initialization, which is the core problem addressed by the PR.
Description check ✅ Passed The PR description includes a clear problem statement with error context, details about what is changed and how it works, highlights of the improvements, and references the related issue (#4272). However, the release note section is left incomplete with only the template placeholder.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ldz/fix-event-broker0305

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces two main improvements. First, it enhances the error handling in addDispatcher by ensuring that if schemaStore.RegisterTable fails, the dispatcher is correctly unregistered from the event store and marked as removed to prevent race conditions. This is a solid fix for resource cleanup and stability. Second, it refactors the retry logic in CreateTiStore to be more robust against transient context.Canceled errors, which improves the reliability of upstream initialization. The changes are well-implemented and include corresponding tests to verify the new behavior.

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Mar 6, 2026
@lidezhu lidezhu changed the title logservice: avoid register failure during initialization logservice: avoid register failure during schemastore initialization Mar 6, 2026
@lidezhu lidezhu changed the title logservice: avoid register failure during schemastore initialization logservice: avoid dispatcher registration failure during schema store initialization Mar 6, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Mar 6, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 3AceShowHand, tenfyzhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [3AceShowHand,tenfyzhong]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 6, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Mar 6, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-03-06 06:31:12.948585564 +0000 UTC m=+512517.526664758: ☑️ agreed by 3AceShowHand.
  • 2026-03-06 06:51:28.848505851 +0000 UTC m=+513733.426585065: ☑️ agreed by tenfyzhong.

@lidezhu
Copy link
Collaborator Author

lidezhu commented Mar 6, 2026

/retest

@ti-chi-bot ti-chi-bot bot merged commit 3dae8aa into master Mar 6, 2026
24 of 26 checks passed
@ti-chi-bot ti-chi-bot bot deleted the ldz/fix-event-broker0305 branch March 6, 2026 08:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EventService sent resolved ts stuck after SchemaStore.RegisterTable failure

3 participants