Cross chunk matching #1648
Open: SimonCropp wants to merge 45 commits into main from CrossChunkMatcher2
+1,241 −397
Changes from 15 commits
45 commits:
2e6eab6 (SimonCropp): .
61be3e1 (SimonCropp): .
9547497 (SimonCropp): .
4fcfe7e (SimonCropp): .
9cbf604 (SimonCropp): .
8ea715c (SimonCropp): .
4500695 (SimonCropp): Update CrossChunkMatcher.cs
d5bd5b2 (SimonCropp): .
448915d (SimonCropp): Update CrossChunkMatcher.cs
621a50e (SimonCropp): .
d089ed1 (SimonCropp): Update DirectoryReplacements_StringBuilder.cs
d55ab13 (SimonCropp): .
6eec98f (SimonCropp): Update CrossChunkMatcher.cs
bb269f9 (SimonCropp): .
d28a0fc (SimonCropp): Update DateScrubber.cs
d89326b (SimonCropp): Update src/Verify/Serialization/Scrubbers/CrossChunkMatcher.cs
e5b7655 (SimonCropp): Update src/Verify/Serialization/Scrubbers/DirectoryReplacements_Strin…
45cff61 (SimonCropp): Update src/Verify/Serialization/Scrubbers/DirectoryReplacements_Strin…
0eddccb (SimonCropp): .
cc75e9c (SimonCropp): .
19e4e72 (SimonCropp): Merge branch 'main' into CrossChunkMatcher2
6b6bbb0 (SimonCropp): Update CrossChunkMatcher.cs
2e84ea2 (SimonCropp): Update CrossChunkMatcher.cs
b1c5d9a (SimonCropp): .
87f4760 (SimonCropp): .
e8e1263 (SimonCropp): Update CrossChunkMatcher.cs
33020a6 (SimonCropp): Update CrossChunkMatcherBenchmarks.cs
7003276 (SimonCropp): Update CrossChunkMatcherBenchmarks.cs
29b7f60 (SimonCropp): Merge branch 'main' into CrossChunkMatcher2
54105a1 (SimonCropp): Merge branch 'main' into CrossChunkMatcher2
6bc2108 (SimonCropp): Update CrossChunkMatcher.cs
f584fac (SimonCropp): Merge branch 'main' into CrossChunkMatcher2
9c5d0f6 (SimonCropp): Update CrossChunkMatcher.cs
d2efaa3 (SimonCropp): Update CrossChunkMatcher.cs
7ce4241 (SimonCropp): Update CrossChunkMatcher.cs
b2711cc (SimonCropp): .
e7b2874 (SimonCropp): Update CrossChunkMatcherBenchmarks.cs
90dc4a5 (SimonCropp): Update CrossChunkMatcher.cs
81dbc48 (SimonCropp): Update CrossChunkMatcher.cs
d3c9213 (SimonCropp): Update CrossChunkMatcher.cs
b5d8496 (SimonCropp): Update CrossChunkMatcher.cs
86973c9 (SimonCropp): Update CrossChunkMatcher.cs
4295148 (SimonCropp): Update DateScrubber.cs
b50e2ef (SimonCropp): Update Extensions.cs
26cfb5d (SimonCropp): Update Extensions.cs
src/Verify/Serialization/Scrubbers/CrossChunkMatcher.cs (109 changes: 109 additions & 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,109 @@ | ||
| /// <summary> | ||
| /// Helper for matching and replacing patterns in StringBuilder that may span across chunk boundaries. | ||
| /// </summary> | ||
| static class CrossChunkMatcher | ||
| { | ||
| /// <summary> | ||
| /// Finds all matches in a StringBuilder (handling patterns spanning chunk boundaries) and applies replacements. | ||
| /// </summary> | ||
| /// <param name="builder">The StringBuilder to search and modify</param> | ||
| /// <param name="context">User context passed to callbacks</param> | ||
| /// <param name="onCrossChunk">Called for each potential cross-chunk match position</param> | ||
| /// <param name="onWithinChunk">Called for each position within a chunk</param> | ||
| public static void ReplaceAll<TContext>( | ||
| StringBuilder builder, | ||
| int maxLength, | ||
| TContext context, | ||
| CrossChunkHandler<TContext> onCrossChunk, | ||
| WithinChunkHandler<TContext> onWithinChunk) | ||
| { | ||
| Span<char> buffer = stackalloc char[maxLength]; | ||
| Span<char> carryoverBuffer = stackalloc char[maxLength - 1]; | ||
| var carryoverLength = 0; | ||
| var previousChunkAbsoluteEnd = 0; | ||
| var absolutePosition = 0; | ||
| List<Match> matches = []; | ||
| var addMatch = matches.Add; | ||
| foreach (var chunk in builder.GetChunks()) | ||
| { | ||
| var chunkSpan = chunk.Span; | ||
| | ||
| // Check for matches spanning from previous chunk to current chunk | ||
| if (carryoverLength > 0) | ||
| { | ||
| for (var carryoverIndex = 0; carryoverIndex < carryoverLength; carryoverIndex++) | ||
| { | ||
| var remainingInCarryover = carryoverLength - carryoverIndex; | ||
| var startPosition = previousChunkAbsoluteEnd - carryoverLength + carryoverIndex; | ||
| | ||
| onCrossChunk( | ||
| builder, | ||
| carryoverBuffer, | ||
| buffer, | ||
| carryoverIndex, | ||
| remainingInCarryover, | ||
| chunkSpan, | ||
| startPosition, | ||
| context, | ||
| addMatch); | ||
| } | ||
| } | ||
| | ||
| // Process matches entirely within this chunk | ||
| var chunkIndex = 0; | ||
| while (chunkIndex < chunk.Length) | ||
| { | ||
| var absoluteIndex = absolutePosition + chunkIndex; | ||
| var skipAhead = onWithinChunk(chunk, chunkSpan, chunkIndex, absoluteIndex, context, addMatch); | ||
| chunkIndex += skipAhead > 0 ? skipAhead : 1; | ||
| } | ||
| | ||
| // Save last N chars for next iteration | ||
| carryoverLength = Math.Min(maxLength - 1, chunk.Length); | ||
| chunkSpan.Slice(chunk.Length - carryoverLength, carryoverLength).CopyTo(carryoverBuffer); | ||
| | ||
| previousChunkAbsoluteEnd = absolutePosition + chunk.Length; | ||
| absolutePosition += chunk.Length; | ||
| } | ||
| | ||
| // Apply matches in descending position order | ||
| foreach (var match in matches.OrderByDescending(_ => _.Index)) | ||
| { | ||
| builder.Overwrite(match.Value, match.Index, match.Length); | ||
| } | ||
| } | ||
| | ||
| /// <summary> | ||
| /// Callback for processing potential cross-chunk matches. | ||
| /// </summary> | ||
| public delegate void CrossChunkHandler<TContext>( | ||
| StringBuilder builder, | ||
| Span<char> carryoverBuffer, | ||
| Span<char> buffer, | ||
| int carryoverIndex, | ||
| int remainingInCarryover, | ||
| CharSpan currentChunkSpan, | ||
| int absoluteStartPosition, | ||
| TContext context, | ||
| Action<Match> addMatch); | ||
| | ||
| /// <summary> | ||
| /// Callback for processing positions within a chunk. | ||
| /// </summary> | ||
| /// <returns>Number of positions to skip ahead (0 or 1 for normal iteration, more to skip past a match)</returns> | ||
| public delegate int WithinChunkHandler<TContext>( | ||
| ReadOnlyMemory<char> chunk, | ||
| CharSpan chunkSpan, | ||
| int chunkIndex, | ||
| int absoluteIndex, | ||
| TContext context, | ||
| Action<Match> addMatch); | ||
| } | ||
| | ||
| | ||
| readonly struct Match(int index, int length, string value) | ||
| { | ||
| public readonly int Index = index; | ||
| public readonly int Length = length; | ||
| public readonly string Value = value; | ||
| } | ||
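
The idea behind the diff above (carry the tail of the previous chunk forward so that a pattern split across two chunks is still found, then apply the recorded replacements from the end backwards) can be sketched in a much simpler form. This is not the PR's `CrossChunkMatcher`: `ReplaceLiteral` is a hypothetical helper that handles a single literal pattern, allocates a string per chunk instead of using `stackalloc`, and uses `Remove`/`Insert` in place of the `Overwrite` extension, whose implementation is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; it is not part of Verify.
static void ReplaceLiteral(StringBuilder builder, string pattern, string replacement)
{
    if (pattern.Length == 0)
    {
        return;
    }

    var starts = new List<int>();
    var carry = "";           // tail of the text seen so far, at most pattern.Length - 1 chars
    var absolutePosition = 0; // absolute index of the first char of the current chunk
    var nextAllowed = 0;      // a match must start at or after this index (no overlaps)

    foreach (var chunk in builder.GetChunks())
    {
        // Prefix the chunk with the carryover so boundary-spanning matches become visible.
        var text = carry + chunk.Span.ToString();
        var textStart = absolutePosition - carry.Length;

        var searchFrom = 0;
        while (true)
        {
            var hit = text.IndexOf(pattern, searchFrom, StringComparison.Ordinal);
            if (hit < 0)
            {
                break;
            }

            var start = textStart + hit;
            if (start >= nextAllowed)
            {
                starts.Add(start);
                nextAllowed = start + pattern.Length;
                searchFrom = hit + pattern.Length;
            }
            else
            {
                // Overlaps a match that was already recorded; try the next position.
                searchFrom = hit + 1;
            }
        }

        // Keep only the tail that could still begin a match completed by a later chunk.
        var keep = Math.Min(pattern.Length - 1, text.Length);
        carry = text.Substring(text.Length - keep);
        absolutePosition += chunk.Length;
    }

    // Apply from the end so that the indices of earlier matches stay valid.
    for (var i = starts.Count - 1; i >= 0; i--)
    {
        builder.Remove(starts[i], pattern.Length);
        builder.Insert(starts[i], replacement);
    }
}

// A small initial capacity makes it likely that the content is split over several chunks.
var builder = new StringBuilder(capacity: 4);
builder.Append("path=");
builder.Append("C:/work/file.txt; again C:/work");
ReplaceLiteral(builder, "C:/work", "{ProjectDirectory}");
Console.WriteLine(builder.ToString());
// prints: path={ProjectDirectory}/file.txt; again {ProjectDirectory}
```

The exact chunk layout of a `StringBuilder` is an implementation detail, so the sketch does not rely on it: the result is the same whether or not the pattern happens to straddle a boundary. Because the carryover is built from the previous carryover plus the current chunk, it also covers chunks shorter than `pattern.Length - 1`.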
The XML documentation is missing the `maxLength` parameter. Add a `<param name="maxLength">` tag to document this parameter, e.g., `<param name="maxLength">Maximum length of patterns to search for</param>`.
@copilot open a new pull request to apply changes based on this feedback
@copilot open a new pull request to apply changes based on this feedback