-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: fix gateway metrics #5483
base: release/1.42.x
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## release/1.42.x #5483 +/- ##
==================================================
- Coverage 75.03% 74.90% -0.14%
==================================================
Files 458 458
Lines 63266 63279 +13
==================================================
- Hits 47470 47396 -74
- Misses 13160 13227 +67
- Partials 2636 2656 +20 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approved given that you didn't introduce the issue but do keep it in mind please since you also review other people's code, it's important that we design code that is decoupled and efficient. So please do keep an eye out for these issues when you review as well.
jws.stat.RequestEventsFailed(1, "storeFailed") | ||
jws.stat.EventsFailed(1, "storeFailed") | ||
jws.stat.Report(gw.stats) | ||
} | ||
stat.RequestFailed("storeFailed") | ||
stat.Report(gw.stats) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole approach of jobs with stats embedded smells and it's starting to become a problem.
Calling Report
each time in a loop doesn't seem like a good idea. I know it was already there but I had missed the first time. As already mentioned here we have to be especially careful about what we're doing in loops.
Report
is calling NewTaggedStats
which we know uses mutexes under the hood. So imagine doing this for a batch of thousands. We're paying a substantial performance penalty because of bad designed code.
I think this should work the other way around. Instead of having jobs with stats we should have a stats-aware component where you just feed it jobs and their status and then report once at the end.
stats.AddFailedEvent(job, "storeFailed") // gets workspaceID, sourceID from job
stats.AddFailedEvent(job, "storeFailed") // gets workspaceID, sourceID from job
stats.AddFailedEvent(job, "storeFailed") // gets workspaceID, sourceID from job
stats.Report() // stats could have kept maps of tags according to labels like workspaceID etc... and then it reports once per Tags
Description
We iterate on the jobs and for each job we increase the events by 1 like this:
The problem is that, whenever you call RequestEventsSucceeded it is also increasing the requests:
So we should increment events and requests separately. there are some tags like sourceID which are required for events but do not makes sense for internalBatch request. So introducing new methods
EventsSuccess
EventsFailed
to increment events and requests separately.Linear Ticket
Security