chore: fix gateway metrics #5483

mihir20 · 2025-02-06T09:59:07Z

Description

We iterate over the jobs and, for each job, increment the event count by 1 like this:

for _, jws := range jobsWithStats {
	jws.stat.RequestEventsSucceeded(1)
}

The problem is that every call to RequestEventsSucceeded also increments the request counters:

func (ss *SourceStat) RequestEventsSucceeded(num int) {
	ss.events.succeeded += num
	ss.events.total += num
	ss.requests.total++
	ss.requests.succeeded++
}

So we should increment events and requests separately. Some tags, such as sourceID, are required for events but do not make sense for an internalBatch request. This PR therefore introduces new methods, EventsSuccess and EventsFailed, so that events and requests can be incremented separately.
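A minimal sketch of the split described above, not the actual gateway code. EventsSuccess, EventsFailed and RequestFailed are names that appear in this PR. The counter layout, the `failed` and `reason` fields, and the RequestSucceeded method are assumptions made for illustration.

```go
package main

// counters is a simplified stand-in for the succeeded/total pairs shown above.
// The failed field is an assumption added for this sketch.
type counters struct {
	total     int
	succeeded int
	failed    int
}

// SourceStat is a reduced, hypothetical version of the gateway type.
type SourceStat struct {
	events   counters
	requests counters
	reason   string // last failure reason; assumed field
}

// EventsSuccess counts successful events without touching the request counters.
func (ss *SourceStat) EventsSuccess(num int) {
	ss.events.succeeded += num
	ss.events.total += num
}

// EventsFailed counts failed events without touching the request counters.
func (ss *SourceStat) EventsFailed(num int, reason string) {
	ss.events.failed += num
	ss.events.total += num
	ss.reason = reason
}

// RequestSucceeded counts exactly one successful request (hypothetical name).
func (ss *SourceStat) RequestSucceeded() {
	ss.requests.succeeded++
	ss.requests.total++
}

// RequestFailed counts exactly one failed request.
func (ss *SourceStat) RequestFailed(reason string) {
	ss.requests.failed++
	ss.requests.total++
	ss.reason = reason
}
```

With this shape, a batch of N jobs calls EventsSuccess(1) N times and RequestSucceeded() once, so the request counters no longer grow with the number of events.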

Linear Ticket

Security

The code changed/added as part of this pull request won't create any security issues with how the software is being used.

codecov · 2025-02-06T11:50:43Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.90%. Comparing base (9bf77b4) to head (1769a3e).
Report is 1 commit behind head on release/1.42.x.

Additional details and impacted files

@@                Coverage Diff                 @@
##           release/1.42.x    #5483      +/-   ##
==================================================
- Coverage           75.03%   74.90%   -0.14%     
==================================================
  Files                 458      458              
  Lines               63266    63279      +13     
==================================================
- Hits                47470    47396      -74     
- Misses              13160    13227      +67     
- Partials             2636     2656      +20


fracasula

I approved, given that you didn't introduce the issue, but please keep it in mind. Since you also review other people's code, it's important that we design code that is decoupled and efficient, so please keep an eye out for these issues when you review as well.

fracasula · 2025-02-07T09:28:04Z

gateway/handle.go

-					jws.stat.RequestEventsFailed(1, "storeFailed")
+					jws.stat.EventsFailed(1, "storeFailed")
 					jws.stat.Report(gw.stats)
 				}
+				stat.RequestFailed("storeFailed")
+				stat.Report(gw.stats)


This whole approach of jobs with stats embedded smells and it's starting to become a problem.

Calling Report each time in a loop doesn't seem like a good idea. I know it was already there, but I missed it the first time. As already mentioned here, we have to be especially careful about what we're doing in loops.

Report calls NewTaggedStats, which we know uses mutexes under the hood. So imagine doing this for a batch of thousands. We're paying a substantial performance penalty because of badly designed code.

I think this should work the other way around. Instead of having jobs with stats, we should have a stats-aware component that you just feed jobs and their status, and that reports once at the end.

stats.AddFailedEvent(job, "storeFailed") // gets workspaceID, sourceID from job
stats.AddFailedEvent(job, "storeFailed") // gets workspaceID, sourceID from job
stats.AddFailedEvent(job, "storeFailed") // gets workspaceID, sourceID from job
stats.Report() // stats could have kept maps of tags according to labels like workspaceID etc... and then it reports once per Tags
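One way the proposal above could look, as a rough sketch rather than a design for the real gateway. The Job fields, the EventStats type, the NewEventStats constructor and the emit callback are all hypothetical. Only the AddFailedEvent and Report names come from the comment, and Report takes a callback here so the sketch does not have to guess at the stats library's API.

```go
package main

// Job carries only the tag values this sketch needs (hypothetical fields).
type Job struct {
	WorkspaceID string
	SourceID    string
}

// tagKey identifies one distinct set of tags.
type tagKey struct {
	job    Job
	reason string
}

// EventStats accumulates failed-event counts per tag set.
// It is not safe for concurrent use; a real version would need locking
// or one instance per goroutine.
type EventStats struct {
	failed map[tagKey]int
}

// NewEventStats returns an empty accumulator.
func NewEventStats() *EventStats {
	return &EventStats{failed: make(map[tagKey]int)}
}

// AddFailedEvent only bumps an in-memory counter, so it is cheap inside a loop.
func (s *EventStats) AddFailedEvent(job Job, reason string) {
	s.failed[tagKey{job: job, reason: reason}]++
}

// Report calls emit once per distinct tag set, in unspecified order,
// then resets the accumulator. It returns the number of emit calls.
func (s *EventStats) Report(emit func(job Job, reason string, count int)) int {
	reported := 0
	for key, count := range s.failed {
		emit(key.job, key.reason, count)
		reported++
	}
	s.failed = make(map[tagKey]int)
	return reported
}
```

The point of the shape is that the expensive step (creating the tagged stat inside emit) runs once per distinct tag set instead of once per job, so a batch of thousands of jobs from a handful of sources costs a handful of lookups.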

mihir20 added 2 commits February 6, 2025 15:27

chore: fix gateway metrics

2c0bb15

fix tests

1769a3e

mihir20 marked this pull request as ready for review February 6, 2025 11:50

mihir20 requested review from fracasula and Sidddddarth February 6, 2025 11:50


fracasula approved these changes Feb 7, 2025
