-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Add @stm "SyntaxTree match" macro
#60475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6570552 to
5e12406
Compare
5e12406 to
1b79fc2
Compare
topolarity
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a nice utility to have, esp. to start developing "AST niceties", and the docs (and careful semantics) are appreciated
No major objections, but I would like to test matching performance. Ideally we should be able to match against / extract things from existing SyntaxTrees without any allocations?
669d97e to
ea1dc9d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice. I like that you've started with something minimal, there's lots of ways to extend it but we'll find out what we need as we use it.
I'd like to write the pattern syntax for leaf nodes in the same way that @ast does. I see you used [K"Identifier"] for leaf nodes at the moment but that feels fairly foreign after using the @ast syntax for a while.
How about we have the pattern [K"call" name::K"Identifer"] match the tree [K"call" "f"::K"Identifier"] with the binding name = "f"? (`_::K"Identifer" for the version without creating a binding)
Lots of ideas for possible extensions come to mind but we can get to that if we need them. A couple of rough ideas:
- Greedy left-to-right matching of one-or-zero children. Example:
[K"call" name maybe[K"parameters" param_list...] args...]could allow optional parameters withparam_listbound to aSyntaxListornothing. Obviously we could choose an operator instead ofmaybe... needs thought to find something appealing. - I think we'll want to be able to bind subtrees of a particular kind without resorting to
when. It could make sense to take one of the assignment operators[K"call" name params_block~K"parameters"] args...]could work. Or things like[K"call" name params_block=[K"parameters" p1 _...] args...]if we want to match substructure though unsure how useful that is.
Will implement for consistency.
Very true, there are so many features I want to have (from using https://docs.racket-lang.org/reference/match.html: and/or/not patterns, My feeling is that |
Co-authored-by: Claire Foster <[email protected]>
Co-authored-by: Claire Foster <[email protected]>
e8dc374 to
23e6283
Compare
Done, but not sure if I like it. I opted to keep the amount of checking the same as before, i.e. assuming the kind determines whether something is a leaf. That's already assumed everywhere, and otherwise we'd need to add an |
Yeah I don't super love the syntax
I think that should be fine. The
Ayy oh well we'll just have to leave that up to the compiler engineers I guess 🤷♀️ Seriously though I've been tempted... I saw Keno's match work too :) |
JuliaSyntax/test/syntax_graph.jl
Outdated
| [K"call" _ _ [K"kw" _::K"Identifier" k1] [K"call" _::K"Identifier" [K"kw" _::K"Identifier" k2]]] -> 1 | ||
| end | ||
| @test @stm st begin | ||
| [K"call" x::K"Identifier" _ _ _] -> x === st[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this wasn't what I had in mind. My thought was to copy the same meaning that @ast uses so that we have
| [K"call" x::K"Identifier" _ _ _] -> x === st[1] | |
| [K"call" x::K"Identifier" _ _ _] -> x == "foo" |
I don't think the x::K"blah" syntax is fitting if we want to get the actual node.
(I do think it would be useful to have syntax for "match a node of this kind regardless of its children". But I didn't want to make this PR more complicated.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I forgot what @ast did there. I think I'll just revert my previous commit in that case, since only binding to e.g. a K"Identifier"'s name_val would require a kind->attribute mapping that makes @stm specific to lowering (e.g. K"BindingId" -> var_id) or a bunch of bikeshedding to make that mapping a parameter. All I want in this PR is simple kind and numchildren matching, and I'm fine with deviating a little bit from the @ast syntax.
This reverts commit 23e6283.
Good to know! |
c42f
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway so I'm uneasy enough about the syntax pun here that I've spent a few hours trying to figure out whether I'm stuck thinking the way is_leaf happens to work is useful, or whether there's a way to interpret SyntaxTree where @stm [K"Identifier"] syntax can be a good representation of the data structure.
It would be possible in future to remove leaves as separate state and implement is_leaf based on the Kind. The Kind bits or kind ordering can be tweaked to make is_leaf(::Kind) basically free. We'd need to change the kinds in the green tree to avoid the pun on things like K"for" being an internal node of the AST vs K"for" being a tag for the literal text "for" coming from Tokenize. All that's possible and it could be good.
(We can't have is_leaf(st) = (numchildren(st) == 0) because, eg, K"block" is an empty list not a leaf with a value.)
That's all reasonable to do but the tree still has a useful distinction between leaves and internal nodes with zero children and syntax like [K"Identifier"] still seems misleading even though it's simple to type.
I'd prefer a compromise to only support _::K"Identifier" for now and leave the general name::K"Identifier" for later if we need it. (It's true that value unpacking is awkward but the main problem is that Kind is extensible. That's messy and I think Jeff would prefer it be deleted.)
I'm not going to die on this hill - any more than I already have, at least. Having the brackets just looks really weird to me for things which aren't lists.
Sounds reasonable. We've talked about the whole "lowering kinds ⊃ parsing kinds ⊃ Tokenize kinds" thing being an issue before, and I wouldn't mind separating those sets entirely. I think JuliaLang/JuliaLowering.jl#136 is an example of parsing kinds leaking into lowering.
I'll merge as-is in that case. I agree square brackets are weird, but I don't think they're weirder than in the non-leaf case. I wouldn't make either public to users. We could perhaps use #60181, but I was thinking we just work on good |
topolarity
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share @c42f 's reservations about [ ... ] syntax for leaves and also about @stm syntax not matching @ast, etc., but I think those concerns are minor enough to be solvable later, esp. given that this isn't user-facing.
Very curious whether / how first-class match will affect that story
Separated from JuliaLang/JuliaLowering.jl#93 with some tweaks and tests. This will make life much easier in writing
EST->CST, and in writing lowering code in general. Matching on trees is at least half of lowering's job!Includes a checker for correct usage and the option to print a trace while matching. See the docstring for usage and see the original PR for some examples.