add metric for rocksdb getLastEntryInLedger to help find out bottleneck #4529

Open
wants to merge 1 commit into
base: master

@TakaHiR07 (Contributor) commented Nov 19, 2024

Motivation

One of our clusters hit a case where reading the ledger LAC timed out and the ledger could not recover, which made the topic unavailable. After adding extra logging to bookie-server, we found that the bottleneck was in EntryLocationIndex#getLastEntryInLedgerInternal: it spent 2.5 minutes scanning RocksDB.

[Image: WeCom screenshot_ccc8453e-19b6-4c6a-8d7c-d63f2b7bf09e]

Currently it can be hard to tell that the bottleneck is in getLastEntryInLedgerInternal. If getLastEntryInLedgerInternal throws a NoEntry exception, read-locations-index-time does not record the long latency, so nothing in the existing metrics points to getLastEntryInLedgerInternal as the bottleneck.

[Image: WeCom screenshot_599c4094-1e4e-4edb-8c64-35bed13e0917]
[Image: WeCom screenshot_8d312886-7c02-4ef7-9e3a-a94525a1c8e1]
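
To illustrate the gap described above, here is a minimal sketch of timing a lookup so that the latency is recorded even when the lookup ends in an exception. This is not the code from this PR or the BookKeeper API: `LatencyRecorder`, `Lookup`, `NoEntryException` (a local stand-in class) and `timedLastEntry` are all hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LastEntryLatencySketch {

    /** Hypothetical stand-in for a stats logger; not the BookKeeper stats API. */
    static final class LatencyRecorder {
        final List<Long> successNanos = new ArrayList<>();
        final List<Long> failureNanos = new ArrayList<>();

        void recordSuccess(long nanos) { successNanos.add(nanos); }
        void recordFailure(long nanos) { failureNanos.add(nanos); }
    }

    /** Local stand-in for the "no entry" failure; an assumption for this sketch. */
    static final class NoEntryException extends Exception {
        NoEntryException(long ledgerId) {
            super("no entry found for ledger " + ledgerId);
        }
    }

    /** Hypothetical abstraction over the index scan that finds the last entry. */
    interface Lookup {
        long lastEntry(long ledgerId) throws NoEntryException;
    }

    /**
     * Runs the lookup and records its elapsed time on both the success path
     * and the exception path, so a slow scan that ends in NoEntryException
     * still shows up in the metrics.
     */
    static long timedLastEntry(Lookup lookup, long ledgerId, LatencyRecorder recorder)
            throws NoEntryException {
        long start = System.nanoTime();
        boolean succeeded = false;
        try {
            long entryId = lookup.lastEntry(ledgerId);
            succeeded = true;
            return entryId;
        } finally {
            long elapsed = System.nanoTime() - start;
            if (succeeded) {
                recorder.recordSuccess(elapsed);
            } else {
                recorder.recordFailure(elapsed);
            }
        }
    }
}
```

The `finally` block is the point of the sketch: if the timing call sits only after a successful return, a scan that takes minutes and then throws leaves no trace in the latency metric.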

Once this bottleneck in getLastEntry occurs, in the worst case the ledger becomes unavailable and so does the Pulsar topic, so I think it is important to add this metric.

Changes

Add a metric for the RocksDB getLastEntryInLedger lookup.

@dlg99 (Contributor) left a comment

LGTM


3 participants