Skip to content

Conversation

@ssmendon
Copy link
Contributor

@ssmendon ssmendon commented Jul 27, 2025

Closes #131.

Sorry to add yet another PR, but I wanted to address the comments added by @epage to #798. I want this to be upstreamed, even if it's unstable at first, so that I can use the API in some other projects of mine.

I also think it would help resolve a lot of the other related PRs if there's a base PR inside of winnow.

Summary of changes

This PR rebases #614 to the newest version of winnow. It adds:

Most of the work was done by @39555 in the following drafted PRs: #614, #622, and #620. There's also PR #798 (which I did not consult), but it is a more recent attempt to try and have this merged.

Changes from #614

  • Rebased to the newest version of winnow
  • Renamed Expression::current_precedence_level to precedence_level
  • Implemented Parser for Postfix instead of for (i64, fn) (in src/combinator/expression.rs#L372-377)
  • Made the fields public in the tuple structs of Prefix and Postfix
  • Made crate::expression a private module, and pub use expression::* instead
  • Added more documentation comments
  • Slight style changes

The reason I renamed current_precedence_level was to remove some boilerplate in the builder-style API. But I am able to change that back. See #614 (comment)

The reason I pub use expression::* in src/combinator/mod.rs is for consistency with the other modules.

Changes from #622

  • Rebased to the newest version of winnow
  • Renamed the example to c_expression from pratt.
  • Added a new example under crate::_topic::language referencing c_expression
  • Added a cross-link under crate::_topic::arithmetic referencing c_expression
  • More documentation in the example

The reason I named it c_expression instead of @epage's suggestion to name it c_expr is mostly for consistency with the existing s_expression example.

Related PRs

#798 contains some recent review comments.

Sorry I lost my pace a little bit. #614 contains the final api based on a builder pattern. #622 is an example that uses that api (maybe a slightly outdated one) and the benchmark #620 is rebased on both of that

#131 (comment) by @39555

#131 (comment) by @epage

Discussion summary for #614

Details

  • TODO: Add an unstable-pratt feature flag

  • TODO: Overflow checks around operator precedence, which uses i64

  • Discussed the API of prefix, infix, postfix. They are both grammar-level constructs and value processing functions. Normally, grammar-level concepts are standalone functions and value processing is put into a trait function.

  • dispatch! discussed instead of alt() for performance, which led to a slightly different implementation where we dispatch by the input type.

  • The "builder" approach violates the API guidelines discussed earlier, but it's also more convenient.

  • Discussed trying to implement this without recursion, due to stack overflows.

  • Operator callbacks return impl Parser to receive the input, for something like a ternary operator.

  • Provided a sample using bumpalo and Stateful

  • Some discussion about start_precedence_level in feat: implement Pratt parsing #614 (comment)

This is needed for if the binding power of the operator < start_precedence the parser stops parsing. This is exposed to allow parsing a subset of an expression such as a ? b : c, 1 + 2 (parsing stops at ,)

  • Some discussion about why the builder doesn't mutate itself, which is because the Expression type itself changes because the parser's type may be different

Discussion summary for #618

Details

  • A modified shunting_yard implementation without recursion
  • Requires std
  • Want to have a user-provided stack, possibly?

Discussion summary for #620

Details

  • Discussion about using recursion (the approach used currently), vs. heap-based allocation in the shunting_yard parser
  • More discussion about arena-based allocation using bumpalo

The recursion grows only with prefixes like -+--++-++--+1 and with right associative infix operators such as pow: 1 ** 2 ** 3 ** 4 ** 5 ** 7 ** 2 ** 1. I assume the user can track the recursion inside the parser and the fold function using the Stateful<> input, as the fold occurs only after all the operands have been parsed.

#620 (comment)

Discussion summary for #622

Details

  • Lexing as a performance improvement for multispace0 matches
  • The rationale for knowing the starting precedence in recursive parsing of something like a ternary Pratt example #622 (comment)
  • Another discussion regarding a Precedence struct instead of specifying the starting precedence in the parser
  • Some nits and API discussion
  • Adding trace for the user automatically
  • Recursion still required in parenthesis, requiring users to make a depth check
  • Discussion about left and right power, using a clever trick to bump values
  • Errors from the algorithm leaving an operand and value on the stack
  • Algorithm description: PoC: Pratt parsing with shunting yard algorithm #618 (comment)

Discussion summary for #798

Details

Open Questions

@ssmendon ssmendon mentioned this pull request Jul 27, 2025
@coveralls
Copy link

coveralls commented Jul 29, 2025

Pull Request Test Coverage Report for Build 19690123416

Details

  • 61 of 97 (62.89%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.7%) to 47.511%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/combinator/expression.rs 61 97 62.89%
Files with Coverage Reduction New Missed Lines %
src/binary/bits/mod.rs 1 80.49%
Totals Coverage Status
Change from base Build 19552868709: 0.7%
Covered Lines: 1680
Relevant Lines: 3536

💛 - Coveralls

@epage
Copy link
Collaborator

epage commented Jul 29, 2025

Thank you for all of your work on this, especially the great summary that made it easier to evaluate this!

@ssmendon ssmendon force-pushed the ssmendon-pratt-pr branch from 961694e to 8e95be1 Compare July 30, 2025 01:29
@ssmendon
Copy link
Contributor Author

ssmendon commented Aug 9, 2025

A lot of these documentation comments are difficult to get right. I feel like:

  1. I'm repeating myself for each of them.
  2. It's confusing to navigate, because you need to see all of the API surface + Pratt parsing to understand what's happening.

I was trying to avoid a comprehensive treatment of Pratt parsing, but I think the documentation questions would be better served by having it more centralized.

Maybe the core of this should be a special "topic" page that documents Pratt parsing at a high level, linking out to appropriate resources where necessary. Then the documentation on the individual structs and methods can be a little sparser, linking back to that topic page (or to expression(), which links back to the topic page).

@epage, what would your opinion be on that? Some specific questions I'm hoping you could answer:

  1. If you think making a dedicated topic page for expression() is a good idea, where it goes into more depth about Pratt parsing and how the API is structured in winnow, let me know. I will work on drafting that anyway.
  2. If the comments on individual structs / methods should be a little less about the theory of Pratt parsing, instead linking back to expression(), which also links the topic page.
  3. Regarding commit history for this change: if I could organize these documentation changes at the end, rather than melding it into the docs() commit for the Pratt parser, that would be good. Especially since @lu-zero is giving me some ideas and PRs on this. 😄

@epage
Copy link
Collaborator

epage commented Aug 11, 2025

I'm repeating myself for each of them.

This will happen to some degree. People will have multiple jump-in points. They may browse straight to expression, from the special topics, looking up a specific part they have a question about (searching for Infix), LSPs reporting information, etc. This doesn't mean we have to put all content everywhere.

We need to balance

  • Maintenance of the documentation
  • How many hops a user must go through to read what they need. I can't speak fully to the LSP experience but I suspect that is the one I worry about most because people can't do hops only get to look at what is in front of them. The more they write, the more that will be in front of them.
  • How rustdoc renders documentation. Mod-level functions and structs get their own page, impl-level functions are on the page for what they belong to. Similarly, mod-level functions and structs will get a one-line summary in the page the user is seeing them all while impl-level functions are not summarized.

It's confusing to navigate, because you need to see all of the API surface + Pratt parsing to understand what's happening.

Considering the above and this, I suspect focusing the documentation on Expression, Expression::* is likely the best approach.

  • All top-level topics are right there, easily addressable for validated linking, and focused on the topic the user cares about
  • expression can point to Expression in its summary for more details, Prefix can point to Expression::prefix, etc so people are at most one hop away from the main page they want and can go straight to the specific section. In an IDE, users will be entering .prefix( first before Prefix, so the documentation is on the more immediately available part of the API.

Regarding commit history for this change: if I could organize these documentation changes at the end, rather than melding it into the docs() commit for the Pratt parser, that would be good. Especially since @lu-zero is giving me some ideas and PRs on this. 😄

Feel free to squash things in. I don't even mind if this PR is one commit as its adding one self-contained piece of functionality and I see docs, tests, and examples as all part of that whole.

@lu-zero
Copy link
Contributor

lu-zero commented Aug 20, 2025

As the codebase is now documented (+ my changes), probably we could do a final pass to make sure binding power and precedence aren't mentioned in conflicting or confusing ways and it should be good enough.

@epage
Copy link
Collaborator

epage commented Aug 20, 2025

@lu-zero to double check, are you saying that this PR is now considered in a "ready" state for reviewing or is this more about changes you all have pending elsewhere?

@lu-zero
Copy link
Contributor

lu-zero commented Aug 20, 2025

I sent ssmendon#1 to add some bits of documentation.

I'm undecided on the binding power vs precedence though.

@lu-zero
Copy link
Contributor

lu-zero commented Sep 6, 2025

@ssmendon if you are busy I can deal with the fmt lint.

Regarding binding power vs precedence, what about adding a paragraph like

Binding power is a value that quantifies the precedence between different operators with a number, the higher the value the tighter the operands are bound to operators, e.g. multiplication could have a binding power of 2, addition a binding power of 1

@ssmendon
Copy link
Contributor Author

ssmendon commented Sep 6, 2025

@ssmendon if you are busy I can deal with the fmt lint.

I will hopefully have the time to work on this more in about a week. I've had something unexpected come up.

(Apologies for the bump).

ssmendon added a commit to ssmendon/winnow that referenced this pull request Nov 6, 2025
ssmendon added a commit to ssmendon/winnow that referenced this pull request Nov 6, 2025
ssmendon added a commit to ssmendon/winnow that referenced this pull request Nov 6, 2025
There are three repositories (and three branches):
* winnow-rs/winnow:main
* ssmendon/winnow:ssmendon-pratt-pr (PR winnow-rs#804)
* lu-zero/winnow:ssmendon-pratt-pr (newest changes)

I'm starting this branch by rebasing lu-zero's work
on top of winnow-rs's main. This is just a rote copy
of the files explicitly changed (to keep the diff tidy).

Co-authored-by: Igor <[email protected]>
Co-authored-by: Luca Barbato <[email protected]>
@ssmendon ssmendon force-pushed the ssmendon-pratt-pr branch 2 times, most recently from c36a5eb to cc2769e Compare November 6, 2025 02:13
@epage
Copy link
Collaborator

epage commented Nov 6, 2025

Outstanding items

ssmendon added a commit to ssmendon/winnow that referenced this pull request Nov 7, 2025
ssmendon added a commit to ssmendon/winnow that referenced this pull request Nov 7, 2025
ssmendon added a commit to ssmendon/winnow that referenced this pull request Nov 22, 2025
There are three repositories (and three branches):
* winnow-rs/winnow:main
* ssmendon/winnow:ssmendon-pratt-pr (PR winnow-rs#804)
* lu-zero/winnow:ssmendon-pratt-pr (newest changes)

I'm starting this branch by rebasing lu-zero's work
on top of winnow-rs's main. This is just a rote copy
of the files explicitly changed (to keep the diff tidy).

Co-authored-by: Igor <[email protected]>
Co-authored-by: Luca Barbato <[email protected]>
ssmendon added a commit to ssmendon/winnow that referenced this pull request Nov 22, 2025
ssmendon added a commit to ssmendon/winnow that referenced this pull request Nov 22, 2025
ssmendon added a commit to ssmendon/winnow that referenced this pull request Nov 22, 2025
ssmendon added a commit to ssmendon/winnow that referenced this pull request Nov 22, 2025
ssmendon added a commit to ssmendon/winnow that referenced this pull request Nov 22, 2025
ssmendon added a commit to ssmendon/winnow that referenced this pull request Nov 22, 2025
ssmendon added a commit to ssmendon/winnow that referenced this pull request Nov 22, 2025
ssmendon added a commit to ssmendon/winnow that referenced this pull request Nov 22, 2025
Tries to resolve:
* winnow-rs#804 (comment)

Alternative resolution for:
* #2
@ssmendon
Copy link
Contributor Author

Changes:

@ssmendon ssmendon force-pushed the ssmendon-pratt-pr branch 2 times, most recently from 755ae89 to 341a078 Compare November 26, 2025 02:00
Adds a new operator precedence parser.

Co-authored-by: Igor <[email protected]>
Co-authored-by: Luca Barbato <[email protected]>
@epage epage merged commit fca75c5 into winnow-rs:main Nov 26, 2025
20 checks passed
@epage
Copy link
Collaborator

epage commented Nov 26, 2025

Thanks for all of the hard work from everyone involved!

(sorry if I missed someone!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pratt parsing support

4 participants