metrics: add more metrics and improve grafana #3430
Summary of Changes
Hello @lidezhu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the system's monitoring capabilities by introducing a new metric for tracking dispatcher resets and refining the collection logic for existing event-related metrics. These improvements aim to provide more detailed and accurate insights into the event service's operation, which will be crucial for better performance analysis and debugging, particularly when integrated with Grafana dashboards.
Code Review
This pull request introduces improvements to metrics. It adds a new counter EventServiceResetDispatcherCount to track dispatcher resets, which is a valuable addition for monitoring. It also refactors the naming of some metric variables for better style. However, the change to the EventStoreReceivedEventCount metric for resolved events alters its meaning from counting received events to counting outgoing notifications to dispatchers. This makes the metric's name and help text misleading and inconsistent with how other event types for the same metric are handled. I've provided a suggestion to address this for better clarity and maintainability of the metrics.
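To illustrate the distinction raised in this review, here is a minimal sketch that keeps "resolved events received" and "dispatcher notifications sent" as two separate counters. It uses plain atomic counters from the standard library as stand-ins for the real Prometheus metrics; the type eventStoreCounters, its fields, and the method onResolvedEvent are hypothetical names chosen for this example and are not taken from the TiCDC code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// eventStoreCounters is a hypothetical stand-in for two separate metrics.
// It is NOT the TiCDC implementation, only an illustration of the semantics.
type eventStoreCounters struct {
	receivedResolved atomic.Int64 // resolved events received by the event store
	notifyDispatcher atomic.Int64 // notifications fanned out to dispatchers
}

// onResolvedEvent records one received resolved event and one notification
// per subscribed dispatcher, so the two numbers keep distinct meanings.
func (c *eventStoreCounters) onResolvedEvent(subscribers int) {
	c.receivedResolved.Add(1)
	c.notifyDispatcher.Add(int64(subscribers))
}

func main() {
	var c eventStoreCounters
	c.onResolvedEvent(3) // one event, three dispatchers notified
	c.onResolvedEvent(0) // one event, no subscribers
	fmt.Printf("received=%d notified=%d\n",
		c.receivedResolved.Load(), c.notifyDispatcher.Load())
}
```

With one event delivered to three subscribers and one delivered to none, the received count is 2 while the notification count is 3. Folding the second number into a metric named "received" would hide that difference.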
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
/gemini summary
Summary of Changes
This pull request aims to significantly enhance the monitoring and observability of the TiCDC system. It introduces new metrics to track critical event processing and dispatcher management operations, alongside comprehensive updates to the Grafana dashboard. These changes provide more granular insights into system performance, particularly regarding event store activity, dispatcher health, and the state of slowest table replication, facilitating better debugging and performance analysis.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: hongyunyan, tenfyzhong
/retest
/test pull-cdc-mysql-integration-heavy
Signed-off-by: ti-chi-bot <[email protected]>
In response to a cherrypick label: new pull request created to branch |
* This is an automated cherry-pick of #3430
Signed-off-by: ti-chi-bot <[email protected]>
* fix check
---------
Signed-off-by: ti-chi-bot <[email protected]>
Co-authored-by: lidezhu <[email protected]>
Co-authored-by: lidezhu <[email protected]>
What problem does this PR solve?
Issue Number: ref #2751
What is changed and how it works?
This pull request aims to significantly enhance the monitoring and observability of the TiCDC system. It introduces new metrics to track critical event processing and dispatcher management operations, alongside comprehensive updates to the Grafana dashboard. These changes provide more granular insights into system performance, particularly regarding event store activity, dispatcher health, and the state of slowest table replication, facilitating better debugging and performance analysis.
Highlights
* CounterKv and CounterResolved in the event store were renamed to kvEventCount and resolvedEventCount respectively for improved clarity.
* A new metric, EventStoreNotifyDispatcherCount, was introduced to accurately track the total number of dispatcher notifications sent by the event store, reflecting the number of subscribers notified.
* A new metric, EventServiceResetDispatcherCount, was added to monitor the frequency of event dispatcher reset operations.
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note