Skip to content

Reapply "Fix: Handle duplicate validation correctly when sanitizing (#4238)"#4265

Merged
marcsanmi merged 7 commits intomainfrom
20250618_reapply-label-validation
Jul 11, 2025
Merged

Reapply "Fix: Handle duplicate validation correctly when sanitizing (#4238)"#4265
marcsanmi merged 7 commits intomainfrom
20250618_reapply-label-validation

Conversation

@simonswine
Copy link
Contributor

@simonswine simonswine commented Jun 18, 2025

This introduces #4238 again, it does break the OTLP validation integration tests.

@marcsanmi if you got the chance, please take a look at it.

Fixes https://github.com/grafana/pyroscope-squad/issues/480

@marcsanmi
Copy link
Contributor

@simonswine Please take a look when you're back. It seems the we weren't keeping the labels properly ordered after updating them.

Copy link
Contributor Author

@simonswine simonswine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking that up, I do think it fairly hard to read and not entirely sure if its correct. Let's simplfy this as I mentioned in the comments.

@marcsanmi marcsanmi marked this pull request as ready for review July 10, 2025 09:52
@marcsanmi
Copy link
Contributor

@simonswine I addressed all the concerns, lmkwyt

Copy link
Contributor Author

@simonswine simonswine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't approve as it's my PR, but I would approve if I could.

Thanks for revisiting this ❤️

@marcsanmi marcsanmi requested a review from a team July 11, 2025 07:51
@alsoba13
Copy link
Contributor

alsoba13 commented Jul 11, 2025

LGTM,

I'm wondering how would be the performance impact of using map[string]string to handle duplicates, instead of doing slice tricks such as creating new slices and binary search them. I guess it's just fine in both ways as probably we are dealing with a relatively small amount of labels. Edit: also most common scenario would be 0 slice tricks (no renaming labels), so introducing maps here would be punishing

@marcsanmi
Copy link
Contributor

I'm wondering how would be the performance impact of using map[string]string to handle duplicates, instead of doing slice tricks such as creating new slices and binary search them. I guess it's just fine in both ways as probably we are dealing with a relatively small amount of labels. Edit: also most common scenario would be 0 slice tricks (no renaming labels), so introducing maps here would be punishing

I think labels must stay sorted for consistent fingerprints (Labels.Hash() processes sequentially), so even with a map for duplicates, we'd still need binary search for correct insertion positioning. The current approach uses the sorted slice for both duplicate detection and positioning, eliminating an additional data structure, lmkwyt.

@marcsanmi marcsanmi merged commit 6f8e0d3 into main Jul 11, 2025
24 checks passed
@marcsanmi marcsanmi deleted the 20250618_reapply-label-validation branch July 11, 2025 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants