Skip to content

Conversation

@mlechu
Copy link
Member

@mlechu mlechu commented Dec 24, 2025

Separated from JuliaLang/JuliaLowering.jl#93 with some tweaks and tests. This will make life much easier in writing EST->CST, and in writing lowering code in general. Matching on trees is at least half of lowering's job!

Includes a checker for correct usage and the option to print a trace while matching. See the docstring for usage and see the original PR for some examples.

Copy link
Member

@topolarity topolarity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a nice utility to have, esp. to start developing "AST niceties", and the docs (and careful semantics) are appreciated

No major objections, but I would like to test matching performance. Ideally we should be able to match against / extract things from existing SyntaxTrees without any allocations?

Copy link
Member

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. I like that you've started with something minimal, there's lots of ways to extend it but we'll find out what we need as we use it.

I'd like to write the pattern syntax for leaf nodes in the same way that @ast does. I see you used [K"Identifier"] for leaf nodes at the moment but that feels fairly foreign after using the @ast syntax for a while.

How about we have the pattern [K"call" name::K"Identifer"] match the tree [K"call" "f"::K"Identifier"] with the binding name = "f"? (`_::K"Identifer" for the version without creating a binding)


Lots of ideas for possible extensions come to mind but we can get to that if we need them. A couple of rough ideas:

  • Greedy left-to-right matching of one-or-zero children. Example: [K"call" name maybe[K"parameters" param_list...] args...] could allow optional parameters with param_list bound to a SyntaxList or nothing. Obviously we could choose an operator instead of maybe ... needs thought to find something appealing.
  • I think we'll want to be able to bind subtrees of a particular kind without resorting to when. It could make sense to take one of the assignment operators [K"call" name params_block~K"parameters"] args...] could work. Or things like [K"call" name params_block=[K"parameters" p1 _...] args...] if we want to match substructure though unsure how useful that is.

@mlechu
Copy link
Member Author

mlechu commented Jan 6, 2026

I'd like to write the pattern syntax for leaf nodes in the same way that @ast does. I see you used [K"Identifier"] for leaf nodes at the moment but that feels fairly foreign after using the @ast syntax for a while.

Will implement for consistency.

Lots of ideas for possible extensions come to mind but we can get to that if we need them. A couple of rough ideas

Very true, there are so many features I want to have (from using https://docs.racket-lang.org/reference/match.html: and/or/not patterns, ... after any pattern, applying functions before matching...) but I'm leaning towards not adding anything. I think we both agree @ast and @stm are temporary solutions that aren't meant to be user-facing, but that users should be able to use something similar soon. In particular, I'm thinking about @Keno's current work on pattern-matching as a part of the julia language: a rough prototype is available here.

My feeling is that @ast and @stm use "the best syntax we can do if we aren't allowed to touch compiler internals", which, uhhh...

@mlechu
Copy link
Member Author

mlechu commented Jan 6, 2026

Will implement for consistency

Done, but not sure if I like it. I opted to keep the amount of checking the same as before, i.e. assuming the kind determines whether something is a leaf. That's already assumed everywhere, and otherwise we'd need to add an is_leaf check for every pattern and subpattern.

@c42f
Copy link
Member

c42f commented Jan 7, 2026

Done, but not sure if I like it.

Yeah I don't super love the syntax _::K"blah" but I think it's better to have consistency and to reflect the data model (not exactly lists) in the syntax somehow. @ast could also be different but it's close to some kind of local maximum.

otherwise we'd need to add an is_leaf check for every pattern and subpattern

I think that should be fine. The is_leaf() vs numchildren() == 0 distinction was important in the JuliaSyntax internals. One place it came up is when trying to consume an error tree containing tokens which went through the error recovery path in bump_closing_token(). But the JuliaSyntax recovery system is fairly dubious.

My feeling is that @ast and @stm use "the best syntax we can do if we aren't allowed to touch compiler internals", which, uhhh...

Ayy oh well we'll just have to leave that up to the compiler engineers I guess 🤷‍♀️

Seriously though I've been tempted... I saw Keno's match work too :)

[K"call" _ _ [K"kw" _::K"Identifier" k1] [K"call" _::K"Identifier" [K"kw" _::K"Identifier" k2]]] -> 1
end
@test @stm st begin
[K"call" x::K"Identifier" _ _ _] -> x === st[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this wasn't what I had in mind. My thought was to copy the same meaning that @ast uses so that we have

Suggested change
[K"call" x::K"Identifier" _ _ _] -> x === st[1]
[K"call" x::K"Identifier" _ _ _] -> x == "foo"

I don't think the x::K"blah" syntax is fitting if we want to get the actual node.

(I do think it would be useful to have syntax for "match a node of this kind regardless of its children". But I didn't want to make this PR more complicated.)

Copy link
Member Author

@mlechu mlechu Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I forgot what @ast did there. I think I'll just revert my previous commit in that case, since only binding to e.g. a K"Identifier"'s name_val would require a kind->attribute mapping that makes @stm specific to lowering (e.g. K"BindingId" -> var_id) or a bunch of bikeshedding to make that mapping a parameter. All I want in this PR is simple kind and numchildren matching, and I'm fine with deviating a little bit from the @ast syntax.

@mlechu
Copy link
Member Author

mlechu commented Jan 7, 2026

I think that should be fine. The is_leaf() vs numchildren() == 0 distinction was important in the JuliaSyntax internals.

Good to know!

Copy link
Member

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway so I'm uneasy enough about the syntax pun here that I've spent a few hours trying to figure out whether I'm stuck thinking the way is_leaf happens to work is useful, or whether there's a way to interpret SyntaxTree where @stm [K"Identifier"] syntax can be a good representation of the data structure.

It would be possible in future to remove leaves as separate state and implement is_leaf based on the Kind. The Kind bits or kind ordering can be tweaked to make is_leaf(::Kind) basically free. We'd need to change the kinds in the green tree to avoid the pun on things like K"for" being an internal node of the AST vs K"for" being a tag for the literal text "for" coming from Tokenize. All that's possible and it could be good.

(We can't have is_leaf(st) = (numchildren(st) == 0) because, eg, K"block" is an empty list not a leaf with a value.)

That's all reasonable to do but the tree still has a useful distinction between leaves and internal nodes with zero children and syntax like [K"Identifier"] still seems misleading even though it's simple to type.

I'd prefer a compromise to only support _::K"Identifier" for now and leave the general name::K"Identifier" for later if we need it. (It's true that value unpacking is awkward but the main problem is that Kind is extensible. That's messy and I think Jeff would prefer it be deleted.)

I'm not going to die on this hill - any more than I already have, at least. Having the brackets just looks really weird to me for things which aren't lists.

@mlechu
Copy link
Member Author

mlechu commented Jan 9, 2026

It would be possible in future to remove leaves as separate state and implement is_leaf based on the Kind. The Kind bits or kind ordering can be tweaked to make is_leaf(::Kind) basically free. We'd need to change the kinds in the green tree to avoid the pun on things like K"for" being an internal node of the AST vs K"for" being a tag for the literal text "for" coming from Tokenize. All that's possible and it could be good.

Sounds reasonable. We've talked about the whole "lowering kinds ⊃ parsing kinds ⊃ Tokenize kinds" thing being an issue before, and I wouldn't mind separating those sets entirely. I think JuliaLang/JuliaLowering.jl#136 is an example of parsing kinds leaking into lowering.

I'm not going to die on this hill - any more than I already have, at least. Having the brackets just looks really weird to me for things which aren't lists.

I'll merge as-is in that case. I agree square brackets are weird, but I don't think they're weirder than in the non-leaf case. I wouldn't make either public to users. We could perhaps use #60181, but I was thinking we just work on good match syntax, since that should end up supporting the language's AST type anyway.

Copy link
Member

@topolarity topolarity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I share @c42f 's reservations about [ ... ] syntax for leaves and also about @stm syntax not matching @ast, etc., but I think those concerns are minor enough to be solvable later, esp. given that this isn't user-facing.

Very curious whether / how first-class match will affect that story

@topolarity topolarity added the merge me PR is reviewed. Merge when all tests are passing label Jan 9, 2026
@IanButterworth IanButterworth merged commit b4da4fc into JuliaLang:master Jan 11, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

JuliaLowering merge me PR is reviewed. Merge when all tests are passing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants