Macro fragment fields #3714

Open pull request: 29 commits to be merged into base branch master.


joshtriplett
Member

@joshtriplett joshtriplett commented Oct 20, 2024

Add a syntax and mechanism for macros to access "fields" of high-level fragment
specifiers that they've matched, to let macros use the Rust parser for
robustness and future compatibility, while still extracting pieces of the
matched syntax.

This RFC introduces the syntax ${fragname.field}, and a couple of fragment specifiers and their fields. The goal is to add more such fragment specifiers and fields, to allow more macros to leverage the Rust parser, but the purpose of this RFC is to introduce the concept and syntax.
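For orientation, here is a rough sketch of the difference this could make. The proposed form appears only inside a comment, because it does not compile on current Rust; the `:fn` specifier and its `name` field come from the RFC text quoted in the review comments further down, and the exact spelling shown in the comment is an assumption. The runnable part is a hypothetical `fn_name!` macro that spells out the function syntax by hand, as macros have to do today, and it deliberately covers only plain, non-generic functions.

```rust
// Proposed form (per this RFC; NOT valid on current compilers, spelling assumed):
//
//     macro_rules! fn_name {
//         ($f:fn) => { stringify!(${f.name}) };
//     }
//
// Today the macro has to describe the function syntax itself.
// `fn_name!` is a hypothetical name; generics, attributes, `pub`, `async`,
// `where` clauses and so on are all unsupported by this matcher.
macro_rules! fn_name {
    (fn $name:ident ( $($params:tt)* ) $(-> $ret:ty)? $body:block) => {
        stringify!($name)
    };
}

fn main() {
    // The function is only parsed, never emitted, so `a` and `b` need not exist.
    let name = fn_name!(fn add(a: u32, b: u32) -> u32 { a + b });
    println!("{}", name);
}
```

Every piece of syntax the hand-written matcher omits is a function definition the macro rejects, which is the robustness gap the proposal is aimed at.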

Rendered

@joshtriplett added the T-lang (Relevant to the language team, which will review and decide on the RFC.) and A-macros (Macro related proposals and issues) labels on Oct 20, 2024
@nikomatsakis
Contributor

Oh, I like this. Cute idea.

@joshtriplett added the I-lang-nominated label (Indicates that an issue has been nominated for prioritizing at the next lang team meeting.) on Oct 22, 2024
@joshtriplett
Member Author

Nominating this (and related RFCs) for discussion, to decide whether we can process it asynchronously or whether we need a design meeting.

@matthieu-m

I think an important discussion to be had will be whether it's okay for fields to "generate" tokens.

The RFC itself already proposes that fn.return_type materializes a () AST type node ex nihilo, and the discussion on param notes that &self could simply be materialized as self: &Self, thus fitting the larger pat: ty pattern.

Unless the plan is to drop fn.return_type from this RFC, hamstringing fn, I believe the lang/compiler teams should come to a consensus on the policy here:

  • Is sticking to the source more important? In which case fn.return_type should be a ty?.
  • Is simplification, if semantically equivalent, preferable?
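As a small illustration of the "semantically equivalent" spellings in question, the sketch below (with a hypothetical `Counter` type) writes the same methods once with the shorthand receiver and omitted return type, and once fully spelled out. It only shows that current Rust treats the two forms as the same signature; it says nothing about which form a macro field should present.

```rust
struct Counter(u32);

impl Counter {
    // Shorthand receiver, no written return type.
    fn bump_a(&mut self) {
        self.0 += 1;
    }

    // The same signature with the receiver and the unit return type written out.
    fn bump_b(self: &mut Self) -> () {
        self.0 += 1;
    }

    fn get_a(&self) -> u32 {
        self.0
    }

    fn get_b(self: &Self) -> u32 {
        self.0
    }
}

fn main() {
    // Both spellings coerce to the same function pointer types.
    let bumps: [fn(&mut Counter); 2] = [Counter::bump_a, Counter::bump_b];
    let gets: [fn(&Counter) -> u32; 2] = [Counter::get_a, Counter::get_b];

    let mut c = Counter(0);
    for bump in bumps {
        bump(&mut c);
    }
    for get in gets {
        println!("{}", get(&c));
    }
}
```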

Another discussion which may be necessary is pinning down exactly which types the fields should have.

Unless macro-rules are significantly complicated by allowing subtyping in the future, for now, types are final.

For example, using the pat: ty syntax for a function parameter may seem more favorable than adding an ad-hoc fragment type. Okay, &self is a bit weird, but it can be matched as self: &Self so all good?

The problem, though, is that suddenly:

  • How do you attach attributes?
  • How do you extend the parameter syntax to allow splat for variadic generics (e.g. name...: T...)? Is splat shoehorned into the pat/ty?

Keen readers may notice that C-variadics ... would already be a problem, but please bear with me here. Finding examples is hard.

The conservative choice, it seems to me, would be to err on the side of introducing fragment types more often than not, even if in the meantime they end up being functionally equivalent to another fragment type (or a set thereof).

Note: editions may help here, but any change risks introducing breakage so... it may be best to think of editions as a last resort rather than as the default way.

@joshtriplett
Member Author

joshtriplett commented Oct 24, 2024

@matthieu-m wrote:

> Unless macro-rules are significantly complicated by allowing subtyping in the future, for now, types are final.

I don't think this is the case.

Today, you can write a macro that matches the same tokens several different ways. And I think we could, for instance, present param as pat: ty today, and later present it as a param type containing the same tokens. I don't think that would add any complexity or compatibility issues.
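A minimal sketch of that first point on current Rust, using two hypothetical macros: the same input tokens are accepted once as a structured `ident: ty` pair and once as a flat list of token trees. The left-hand side uses `ident` rather than `pat_param` only because a pattern fragment may not be followed by `:` in a `macro_rules` matcher today.

```rust
// Matches the tokens as a structured `name: Type` pair.
macro_rules! as_ident_ty {
    ($name:ident : $t:ty) => {
        (stringify!($name), stringify!($t))
    };
}

// Matches the very same tokens as an unstructured sequence of token trees.
macro_rules! as_tokens {
    ($($tok:tt)*) => {
        [$(stringify!($tok)),*]
    };
}

fn main() {
    let (name, ty) = as_ident_ty!(count: u32);
    println!("{} has type {}", name, ty);

    let tokens = as_tokens!(count: u32);
    println!("{} token trees", tokens.len());
}
```

Neither macro changes what the other accepts, which is the sense in which presenting the same tokens under a different fragment type need not break existing matchers.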

We can also add fields to existing fragment specifiers, without breaking compatibility.

There are compatibility considerations we have to take care with, and we may need to introduce new fragment specifiers in the future to handle those; for instance, if we make a field required and it later becomes optional, we might have to introduce a new fragment specifier with it optional.

But I don't think switching the type of a field (e.g. to a newly created fragment specifier) would break compatibility as long as it contains the same tokens.

Member

@vincenzopalazzo vincenzopalazzo left a comment

LGTM otherwise

Member

@jhpratt jhpratt left a comment

Glad to see someone materializing the idea I've had floating around for a while.

For the purpose of avoiding RFCs for future fields, I believe it would be best to explicitly grant T-lang the ability to decide this of their own volition.


- `:fn`: A function definition (including body).
- `name`: The name of the function, as an `ident`.
- `param`: The parameters of the function, presented as though captured by a
Member

What is the "type" of this field?

Member Author

If we add the macro fragment param, then each repetition will have type param; until then, each repetition looks like pat_param: ty. (Handwaving the ... case here.)

@joshtriplett
Member Author

> For the purpose of avoiding RFCs for future fields, I believe it would be best to explicitly grant T-lang the ability to decide this of their own volition.

I added an unresolved question about whether we should develop a lighter-weight process/policy for approving these, and whether we should delegate them to another team (e.g. wg-macros).

@veluca93

IMO this is a great feature - giving macros access to parts of high-level fragments massively simplifies the job of people writing macros, and makes robust, future-proof declarative macros significantly easier to write.

@traviscross added the I-lang-radar label (Items that are on lang's radar and will need eventual work or consideration.) and removed the I-lang-nominated label (Indicates that an issue has been nominated for prioritizing at the next lang team meeting.) on Jan 26, 2025
@safinaskar

I don't like this RFC. Yes, I totally agree that the compiler's parser should be exposed to decl. macros, i.e. macros should not reinvent their own parsing. But I believe that this parser should be exposed in a more generic way (as opposed to your opinionated ad-hoc way). We should give decl. macros the full grammar of Rust, or at least some big part of it. I.e. we should assign fixed names to all (or to many) Rust AST nodes and productions and give decl. macros the ability to extract nodes.

This is how a get_name macro would be written:

macro_rules! get_name {
  // struct struct
  (struct $i:ident $($g:GenericParams)? $($w:WhereClause)? { $($s:StructFields)? }) => { stringify!($i) };
  (struct $i:ident $($g:GenericParams)? $($w:WhereClause)? ;) => { stringify!($i) };

  // tuple struct
  (struct $i:ident $($g:GenericParams)? ( $($t:TupleFields)? ) $($w:WhereClause)? ;) => { stringify!($i) };

  // enums and unions are left as exercise to a reader :)
}

Here I took the names for AST nodes from the Rust Reference.

Yes, you may say this is verbose. But this allows us to get access to the AST in a generic way. Of course, some simplifications can be developed in 3rd party crates based on this feature. Say, some crate can expose a macro for getting the name of any ADT in a single macro call.

Also, of course, it would be beneficial to give the same AST access to proc macros, i.e. proc macros would not need to rely on 3rd party crates, such as syn, for the actual parsing of Rust.

But to do this we should first ensure that proc macros and decl. macros see the same lexical syntax. As far as I understand, this is currently not the case.

@joshtriplett
Member Author

joshtriplett commented Feb 1, 2025

@safinaskar The problem with exposing the full Rust AST to macros is that the Rust AST evolves over time. One of the major goals of this RFC is to allow macros to take advantage of the compiler's knowledge of the Rust language while still allowing the macros to ignore things they don't understand.

The way you wrote the get_name macro requires the macro author to handle all the productions of struct/tuple/union/enum/etc. And if Rust adds more, the macro will break. That's true whether that parsing lives in a separate crate or is written directly. For that matter, we don't have any good structured way for macros in a separate crate to provide things like the name of an ADT without parsing the whole ADT, and then a macro to return some other part of the ADT will also have to parse the whole ADT; we don't have a way to parse it once and return all the components in a convenient way.
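To make the concern concrete, here is a sketch that runs on current Rust. It is a hand-written approximation of the macro above, with opaque `tt` repetitions standing in for the fragment specifiers that do not exist today. The catch-all arm returning `None` is an addition for demonstration purposes; without it, unrecognized input would simply be a compile error.

```rust
// Hypothetical hand-written `get_name!`: it only understands the struct
// forms that are spelled out explicitly below.
macro_rules! get_name {
    // Braced struct without generics.
    (struct $name:ident { $($body:tt)* }) => { Some(stringify!($name)) };
    // Unit struct.
    (struct $name:ident ;) => { Some(stringify!($name)) };
    // Tuple struct without generics.
    (struct $name:ident ( $($body:tt)* ) ;) => { Some(stringify!($name)) };
    // Everything else (generics, attributes, visibility, enums, unions,
    // or syntax added to the language later) falls through to here.
    ($($other:tt)*) => { None::<&'static str> };
}

fn main() {
    println!("{:?}", get_name!(struct Point { x: i32, y: i32 }));
    println!("{:?}", get_name!(struct Wrapper<T> { inner: T }));
}
```

The name is present in both inputs, but the macro can only reach it in the shapes it was taught; each additional production needs another arm.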

Macro fragment fields solve that problem.

Also, note from the RFC that the intention is to expand the set of fields; the RFC just specifies a very minimal set of fields to prove the concept.

@safinaskar

@joshtriplett

> And if Rust adds more, the macro will break

No. The macro will still handle old syntax. It just will not understand new productions. And this is okay. The author of that macro will need to release a new version, which will handle the missing productions. This is similar to how syn-based proc macros work.

Also: Rust sometimes adds new productions, but it usually does not remove them (except across an edition boundary). So we totally can expose the full AST.

Also, the syn crate needed only one breaking release in 3.5 years! (See https://github.com/dtolnay/syn/releases/tag/2.0.0 ). That is less frequent than new Rust editions. This proves that we totally can maintain an AST exposed to decl. macros.

@safinaskar

Note: the task of syn's author was even harder than ours, because he supports not only stable syntax but also unstable syntax. For example, he supported the box x syntax in syn 1.x (and removed it in syn 2.x). But we can support stable syntax only. And yet syn needed only one breaking release in 3.5 years.

@programmerjake
Member

one other problem with just having macro matchers for each ast thing and then writing out rust's syntax explicitly in the macro pattern is that iirc macro_rules isn't powerful enough to fully parse Rust's syntax, since Rust needs some lookahead (iirc 3 tokens), but afaik macro_rules simply don't support that much lookahead.

Plus, just trying to match rust's syntax isn't enough for ergonomic macros, since if you need to access some interior part of the input (e.g. field names from an enum), you have to write out the full syntax until it gets down to the level of the field names whereas with macro fragment fields it's quite trivial to write (maybe like ($a:enum) => ($($(${$a.variants.fields})*)*))
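For comparison, a sketch of what "writing out the full syntax" down to the field names looks like on current Rust. The hypothetical `enum_field_names!` below accepts only enums whose variants are all struct-like, with no attributes, generics, visibility, or discriminants; covering those would need further arms or matchers.

```rust
// Hypothetical macro: collects the field names of every struct-like variant.
// The matcher has to describe the enum, each variant, and each field just to
// reach the field identifiers.
macro_rules! enum_field_names {
    (enum $name:ident {
        $( $variant:ident { $( $field:ident : $fty:ty ),* $(,)? } ),* $(,)?
    }) => {
        [ $( $( stringify!($field), )* )* ]
    };
}

fn main() {
    let names = enum_field_names!(
        enum Shape {
            Circle { radius: f64 },
            Rect { width: f64, height: f64 },
        }
    );
    println!("{:?}", names);
}
```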

@safinaskar

@programmerjake

> one other problem with just having macro matchers for each ast thing and then writing out rust's syntax explicitly in the macro pattern is that iirc macro_rules isn't powerful enough to fully parse Rust's syntax, since Rust needs some lookahead (iirc 3 tokens), but afaik macro_rules simply don't support that much lookahead.

When you say "macro_rules isn't powerful enough to fully parse Rust's syntax", do you mean parsing Rust with hand-written macro_rules code? That is not what we are talking about. In my approach, the code will be parsed by the compiler, of course.

> Plus, just trying to match rust's syntax isn't enough for ergonomic macros, since if you need to access some interior part of the input (e.g. field names from an enum), you have to write out the full syntax until it gets down to the level of the field names whereas with macro fragment fields it's quite trivial to write (maybe like ($a:enum) => ($($(${$a.variants.fields})*)*))

All ergonomics improvements can be put into 3rd party crates, i.e. the compiler should provide the core AST, and improvements like "extract enum fields in a simple way" should go to crates.io. Yes, this will probably require https://docs.rs/tt-call/latest/tt_call/ weirdness or something similar for passing the resulting list of fields to user code. But I still believe that supporting the full (or almost full) AST is an aesthetically good approach.

The full AST approach is in line with the long-standing goal of establishing the Rust grammar ( https://github.com/rust-lang/wg-grammar ). This RFC simply adds some alternative feature instead, which will never replace a proper AST.

@joshtriplett
Member Author

joshtriplett commented Feb 3, 2025

> No. The macro will still handle old syntax. It just will not understand new productions. And this is okay. The author of that macro will need to release a new version, which will handle the missing productions. This is similar to how syn-based proc macros work.

That is one of several problems this RFC sets out to solve: macros should not have to constantly update so that they can parse new syntax, if what they want to extract is something that existed in the old syntax.

> All ergonomics improvements can be put into 3rd party crates.

They might be able to be, but that doesn't mean they should be.

> Yes, this will probably require https://docs.rs/tt-call/latest/tt_call/ weirdness or something similar for passing the resulting list of fields to user code.

Being able to avoid those kinds of hacks is another goal of this RFC.
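For readers unfamiliar with the kind of hack being referred to, here is a simplified callback-passing sketch on current Rust. It is not tt-call's actual API; the macro names and the `=> callback` calling convention are made up for illustration. Because one macro cannot return a list of fields to another, the parsing macro instead takes the name of a second macro and invokes it with the extracted pieces.

```rust
// Hypothetical "parser" macro: extracts the struct name and field names,
// then hands them to whichever macro the caller named after `=>`.
macro_rules! struct_fields {
    (struct $name:ident { $($field:ident : $ty:ty),* $(,)? } => $callback:ident) => {
        $callback!($name; $($field),*)
    };
}

// Hypothetical consumer macro: receives the pieces in the agreed-upon shape.
macro_rules! describe {
    ($name:ident; $($field:ident),*) => {
        concat!(stringify!($name), ":" $(, " ", stringify!($field))*)
    };
}

fn main() {
    let text = struct_fields!(struct Point { x: i32, y: i32 } => describe);
    println!("{}", text);
}
```

Every consumer has to agree on the intermediate `name; fields` format, and the struct syntax is still parsed by hand inside `struct_fields!`.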

> The full AST approach is in line with the long-standing goal of establishing the Rust grammar

That project is dead and archived, and nobody has stepped up to change that. The Rust spec will likely end up specifying a grammar of Rust, but that doesn't mean it'll be in a programmatic form usable for macros.


In any case, right now it's not clear what you are proposing doing in the compiler. You say this should happen in a "more generic way" and "assign fixed names to all (or to many) Rust AST nodes and productions". That's exactly what this RFC is doing. I intend to use this mechanism to expose more-or-less the entire Rust AST. If you would like to see it exposed in a different way, please make a proposal sketch, beyond "leave it entirely to third-party crates". If your proposal is "leave it entirely to third-party crates", I've added that to the "rationale and alternatives" section as a possibility to be considered when this RFC is evaluated.

Labels
A-macros Macro related proposals and issues I-lang-radar Items that are on lang's radar and will need eventual work or consideration. T-lang Relevant to the language team, which will review and decide on the RFC.