Contributor

@desertwitch desertwitch commented Mar 30, 2025

fixes #592

The regex was parsed by the lexer's string method, which did not account for escaped delimiters.
A regex containing an escaped delimiter was therefore cut short (see the linked issue).

The lexing logic for regexes was moved into a separate method.
A regex containing an escaped delimiter is now handled properly.
Test cases were added to make this behavior visible and testable in the future.
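
For readers without the diff at hand, here is a minimal sketch of the difference described above. It is not the code from this PR: `naiveRead` and `readEscaped` are hypothetical helper names, and both assume the input starts with the opening delimiter.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveRead is a hypothetical stand-in for string-style parsing:
// it cuts the pattern at the first delimiter, escaped or not.
// Assumes input begins with the opening delimiter.
func naiveRead(input string, endChar byte) string {
	body := input[1:] // skip the opening delimiter
	if i := strings.IndexByte(body, endChar); i >= 0 {
		return body[:i]
	}
	return body
}

// readEscaped is a hypothetical sketch of the fixed behavior:
// it stops only at a delimiter that is not directly preceded by a
// backslash, and keeps the backslash in the returned pattern.
func readEscaped(input string, endChar byte) string {
	body := input[1:] // skip the opening delimiter
	for i := 0; i < len(body); i++ {
		if body[i] == endChar && (i == 0 || body[i-1] != '\\') {
			return body[:i]
		}
	}
	return body // unterminated: return what was read
}

func main() {
	in := `/foo\/bar/`
	fmt.Println(naiveRead(in, '/'))   // foo\
	fmt.Println(readEscaped(in, '/')) // foo\/bar
}
```

The second result corresponds to the `/foo\/bar/ => Token(foo\/bar)` behavior noted in the doc comment on `readRegex`.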

@desertwitch desertwitch requested a review from a team as a code owner March 30, 2025 10:06
@desertwitch desertwitch requested review from bashbunni and removed request for a team March 30, 2025 10:06

@desertwitch
Contributor Author

Gentle bump: Are there any blockers for this bugfix (something I could/should address)?


// readRegex reads a regex pattern from the input, handling escaped delimiters.
// /foo\/bar/ => Token(foo\/bar).
func (l *Lexer) readRegex(endChar byte) string {
Member

hey @desertwitch thanks for the PR!!

I think we may need to improve this logic a bit. This function handles \/ escapes, but regexes can have other escapes such as \\, \n, and \d. The current implementation would incorrectly consume \\, turning \\/ into \/ in the literal.

The pattern /foo\\/ (ending with a literal backslash) may not parse correctly, since \\ followed by / would be consumed as an escaped delimiter.
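
One possible way to address this, sketched under assumptions rather than taken from the PR: consume a backslash together with whatever byte follows it, so that `\\` is read as a pair and the `/` after it still terminates the pattern. A check that only looks at the single preceding byte would instead read past that delimiter. `pairScan` is a hypothetical name, and `body` is assumed to be the input after the opening delimiter.

```go
package main

import "fmt"

// pairScan is a hypothetical sketch: it returns the raw pattern up to the
// first unescaped endChar. A backslash and the byte after it are skipped
// as a unit, so `\\` never hides the delimiter that follows it.
// All escapes are left untouched in the returned string.
func pairScan(body string, endChar byte) string {
	for i := 0; i < len(body); i++ {
		switch body[i] {
		case '\\':
			i++ // skip the escaped byte, whatever it is
		case endChar:
			return body[:i]
		}
	}
	return body // unterminated: return what was read
}

func main() {
	fmt.Println(pairScan(`foo\/bar/`, '/')) // foo\/bar
	fmt.Println(pairScan(`foo\\/`, '/'))    // foo\\
}
```

This sketch keeps every escape in the returned literal; whether the lexer should also strip the backslash from an escaped delimiter is a separate decision.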

Contributor Author

@desertwitch desertwitch Nov 17, 2025

Seems logical; I'm not sure why I didn't consider this at the time. Unfortunately, I'm relatively limited in bandwidth these days, but feel free to shuffle things around or close/rework this elsewhere as needed.

Edit: Now I remember. The lexer's job seemed to me to be just extracting the literal regex pattern between the delimiters, and therefore handling only the escapes relevant to the lexer syntax (escaped delimiters), not regex-specific escapes (which are handled downstream by the actual regex parsing function, which relies on those inner escapes being present).
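
To illustrate the point about downstream parsing, here is a small sketch assuming the pattern is later compiled with Go's standard `regexp` package (an assumption; the thread does not say which regex engine is used). `matches` is a hypothetical helper. In Go's regex syntax a backslash before punctuation makes it a literal, so a pattern that still contains `\/` compiles and matches a plain `/`.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches is a hypothetical helper: it compiles the raw pattern exactly as
// a lexer might hand it over (escapes intact) and tests it against s.
func matches(pattern, s string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(s), nil
}

func main() {
	// The escaped delimiter is still in the pattern; regexp reads `\/` as "/".
	ok, err := matches(`foo\/bar`, "foo/bar")
	fmt.Println(ok, err) // true <nil>

	// Regex-specific escapes such as \d must reach the regex engine intact.
	ok, err = matches(`^\d+\/\d+$`, "3/4")
	fmt.Println(ok, err) // true <nil>
}
```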

{token.AT, "@"},
{token.NUMBER, "1"},
{token.MINUTES, "m"},
{token.REGEX, "foo\\/bar"},
Member

This test expects "foo\/bar" but the actual string stored should be "foo/bar" after consuming the escape.

Contributor Author

@desertwitch desertwitch Nov 17, 2025

Are you sure about this? The lexer's job seemed to me to be just extracting the literal regex pattern between the delimiters, handling only the escapes relevant to the lexer syntax (escaped delimiters), not regex-specific escapes. That, I think, is also why I only handled escaped delimiters: any other escapes end up in the pattern string, to be handled as required during actual regex parsing.

@raphamorim
Member

Merged changes in #678. Thank you @desertwitch 🙏

@raphamorim raphamorim closed this Nov 18, 2025
@desertwitch desertwitch deleted the patch-1 branch November 18, 2025 19:06
Development

Successfully merging this pull request may close these issues.

Impossible to escape / in Wait regex

3 participants