fix(lexer): handle regex and escaped delimiters #605
Conversation
Gentle bump: Are there any blockers for this bugfix (something I could/should address)?
// readRegex reads a regex pattern from the input, handling escaped delimiters.
// /foo\/bar/ => Token(foo\/bar).
func (l *Lexer) readRegex(endChar byte) string {
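The function body is elided in this excerpt. Below is a minimal sketch of the approach the doc comment describes (an escaped delimiter is kept verbatim in the returned pattern), assuming a hand-rolled lexer with `ch`/`peekChar`/`readChar` helpers; the scaffold and these names are stand-ins, not the project's verified API, and the merged code may differ:

```go
package lexer

import "strings"

// Stand-in Lexer scaffold for illustration only.
type Lexer struct {
	input string
	pos   int
	ch    byte
}

func newLexer(input string) *Lexer {
	l := &Lexer{input: input}
	l.readChar()
	return l
}

func (l *Lexer) readChar() {
	if l.pos >= len(l.input) {
		l.ch = 0
	} else {
		l.ch = l.input[l.pos]
	}
	l.pos++
}

func (l *Lexer) peekChar() byte {
	if l.pos >= len(l.input) {
		return 0
	}
	return l.input[l.pos]
}

// readRegex reads until an unescaped endChar, keeping "\<endChar>" pairs
// verbatim: /foo\/bar/ => "foo\/bar".
func (l *Lexer) readRegex(endChar byte) string {
	var sb strings.Builder
	for l.ch != endChar && l.ch != 0 {
		if l.ch == '\\' && l.peekChar() == endChar {
			sb.WriteByte(l.ch) // keep the backslash
			l.readChar()       // step onto the escaped delimiter
		}
		sb.WriteByte(l.ch)
		l.readChar()
	}
	return sb.String()
}
```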
hey @desertwitch thanks for the PR!!
I think we may need to improve this logic a bit. This function handles \/ escapes, but regexes can have other escapes like \\, \n, \d, etc. The current implementation would incorrectly consume \\, so \\/ becomes \/ in the literal.
The pattern /foo\\/ (ending with a literal backslash) may not parse correctly, since \\ followed by / would be consumed as an escaped delimiter.
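One way to address this (a hypothetical variant, not necessarily what was merged in #678) is to copy any backslash escape through as a two-character unit, so a trailing \\ is never mistaken for an escaped delimiter. Reusing the stand-in Lexer from the sketch above:

```go
// Hypothetical variant: every "\x" pair passes through as a unit, so
// /foo\\/ yields "foo\\" and the closing '/' still ends the pattern,
// while /foo\/bar/ still yields "foo\/bar".
func (l *Lexer) readRegex(endChar byte) string {
	var sb strings.Builder
	for l.ch != endChar && l.ch != 0 {
		if l.ch == '\\' && l.peekChar() != 0 {
			sb.WriteByte(l.ch) // keep the backslash...
			l.readChar()       // ...and whatever it escapes (\\, \d, \/, ...)
		}
		sb.WriteByte(l.ch)
		l.readChar()
	}
	return sb.String()
}
```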
Seems logical; not sure why I didn't consider this at the time. I'm unfortunately rather limited in bandwidth these days, but feel free to shuffle things around or close/rework elsewhere as needed.
Edit: Now I remember. The lexer's job, as I saw it, is just to extract the literal regex pattern between the delimiters, and therefore to handle only the escapes relevant to the lexer's own syntax (escaped delimiters), not regex-specific escapes (those are handled downstream by the actual regex parsing function, which relies on the inner escapes still being there).
{token.AT, "@"},
{token.NUMBER, "1"},
{token.MINUTES, "m"},
{token.REGEX, "foo\\/bar"},
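For context, a table-driven test around these expectations might look like the sketch below. The input string, New, NextToken, and the field names are reconstructions from this excerpt rather than the project's actual test harness, and the REGEX literal reflects the PR's keep-the-backslash behavior that is questioned just below:

```go
func TestLexRegexEscapedDelimiter(t *testing.T) {
	input := `@1m /foo\/bar/` // input reconstructed from the expected tokens
	tests := []struct {
		wantType    token.Type
		wantLiteral string
	}{
		{token.AT, "@"},
		{token.NUMBER, "1"},
		{token.MINUTES, "m"},
		{token.REGEX, `foo\/bar`}, // backslash preserved, per the PR
	}
	l := New(input)
	for i, tt := range tests {
		tok := l.NextToken()
		if tok.Type != tt.wantType || tok.Literal != tt.wantLiteral {
			t.Fatalf("test %d: got (%v, %q), want (%v, %q)",
				i, tok.Type, tok.Literal, tt.wantType, tt.wantLiteral)
		}
	}
}
```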
This test expects "foo\/bar" but the actual string stored should be "foo/bar" after consuming the escape.
Are you sure about this? The lexer's job, as I saw it, is just to extract the literal regex pattern between the delimiters, handling only the escapes relevant to the lexer's own syntax (escaped delimiters), not regex-specific escapes. That, I think, is also why I only handled escaped delimiters: other escapes end up in the pattern string unchanged, to be handled as required during actual regex parsing.
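To illustrate the two-stage view described here: the lexer hands over the raw pattern, backslash included, and the downstream regex machinery interprets the inner escapes. Using the standard library regexp package purely as a stand-in for the project's own regex handling, a leftover `\/` is still a valid escape for a literal `/`:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Raw pattern as the lexer would emit it for /foo\/bar/ under the
	// keep-the-backslash reading.
	raw := `foo\/bar`
	re := regexp.MustCompile(raw)          // stand-in for downstream regex parsing
	fmt.Println(re.MatchString("foo/bar")) // true: `\/` matches a literal '/'
}
```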
Merged changes in #678. Thank you @desertwitch 🙏
fixes #592
- the regex was parsed by a string method that did not consider escaped delimiters
- a regex sequence with an escaped delimiter would be cut short (see the issue and the sketch below)
- the lexing logic for parsing a regex was moved into a separate method
- a regex sequence with an escaped delimiter is now handled properly
- test cases were added to make this behavior visible and testable in the future
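As a concrete illustration of the pre-fix behavior described above (the exact original code path is an assumption reconstructed from "parsed by a string method", not copied from the issue):

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// A naive scan to the next '/' truncates the pattern at the escaped
	// delimiter; the dedicated readRegex method keeps reading past it.
	body := `foo\/bar/ remaining input`
	naive := body[:strings.Index(body, "/")]
	fmt.Println(naive) // foo\  -- cut short, instead of foo\/bar
}
```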