Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed parsing refactor #3420

Open
gwhitney opened this issue Mar 16, 2025 · 1 comment
Open

Proposed parsing refactor #3420

gwhitney opened this issue Mar 16, 2025 · 1 comment
Labels
category:expressions Issues about the expression parser, variable scoping etc. design decision

Comments

@gwhitney
Copy link
Collaborator

Describe the concerns
There are a number of outstanding parsing/evaluation issues, such as #2631, which it would be convenient to work on in conjunction with #3374 (in particular the implementation of the quotient operator to be represented also by infix // ala Python -- that needs some specialized TeX representation, and right now it is tricky to to make that TeX handling uniform between quotient(a,b) and a // b). These items generally point to unifying FunctionNode and OperatorNode -- so I am planning to go ahead with that, presuming you are still willing on that, and I was thinking of calling the unified node a CallNode. (It's easiest to use a third name to make sure all instances of each have been handled in the unification.)

Anyhow, working on the unification of FunctionNode and OperatorNode led me to realize that there is currently redundancy between the big table in src/expression/operators.js and a number of small tables in src/expression/parse.js, such as the one in the (somewhat awkwardly named) function parseMultiplyDivideModulusPercentage. In addition, although there is a mechanism for adding custom nodes to the parser, it is not particularly well documented, and it is limited in scope: basically the custom nodes can only be created by syntax that looks like a function call. Thus, for example in our color computations, we would like to make expressions like #FF80C0 be color constants; this doesn't really interfere with using the # for comments, because you just need to avoid having 6 or 8 hex digits immediately after the # for a comment, and it reads very naturally. But right now there is no way to extend the parser to handle such things.

Proposal

  • Obtain all of the parser-driving symbol/precedence information from a single unified source of truth, like the one in operators.js.
  • Expose that big table in the mathjs configuration. Design question: should it be a big complicated property, perhaps named operators, directly in the config object? Or should we have a separate exposed parseConfig object, parallel to config that is just concerned with parsing? If so, should number and numberFallback properties move to parseConfig since they really only concern parsing expressions, so that config can deal with computation and parseConfig with parsing? Note that having the table of operators in the configuration would resolve Facilitate defining operator synonyms such as && for 'and', || for 'or' #2722, for example -- someone wanting to parse && as and would just need to insert the appropriate entry in the table for logical operators before calling parse.
  • Recommended addition to this proposal: move the tokenizer or at least tables for the tokenizer into config or parseConfiguration as well, so that syntax extensions like #FF80C0 for a color constant can be supported.
  • Recommended addition to this proposal: move each precedence level's "local parsing" function, e.g, parseAddSubtract, into the operators.js-style table, with an automatic facility to call the function at one higher level of precedence to get the subexpressions of the current precedence level; this additional refactor would facilitate run-time tweaking of the parser, easing development of parsing improvements and enabling other unforeseen syntax extensions on the part of clients of mathjs, beyond just the current custom nodes facility.
  • Mildly recommended addition to this proposal: eliminate the CustomNodes facility because once the precedence table is exposed, you can implement custom nodes by just inserting an entry in the table at the appropriate precedence that recognizes your custom syntax and returns a node of your choice (custom or not). For example, if you prefer EXPR if COND else ALTERNATE to COND ? EXPR : ALTERNATE, you could add an entry to the table just before the one for the ternary that recognizes this alternate syntax, but just returns a ConditionalNode.
  • fully document all customization opportunities that result from the refactor, including the custom nodes facility if it is kept.

I am going to embark on these refactors, and let you know how it goes; I welcome feedback in the meantime. Design question: do you consider the unification of OperatorNode and FunctionNode to CallNode a breaking change in and of itself, because mathjs does document and expose its list of valid node types? If so, would you like this refactor, presuming it is OK with you, to go in two steps, one that doesn't unify the nodes but does make the parser more DRY and configurable, followed by one that does unify the nodes that would have to go into mathjs 15?

Thanks for your thoughts on this.

@gwhitney
Copy link
Collaborator Author

OK, after diving into an actual refactor, I realized/remembered that there are already quite several parsing configuration settings (many of them to do with tokenization, actually) exposed as properties of the math.parse function. Clients of the package are specifically advised to change these properties to reconfigure parsing, e.g. to allow unicode characters in identifiers.

With a number of new proposed configuration parameters in the works here and #3374, these considerations lead to the following possible organization alternatives for all of the configuration parameters, which do seem to fall into two families, those for parsing (e.g. existing parse.isAlpha, config.number, and proposed operators table) and those for computation (e.g., existing config.relTol, config.predictable, proposed config.numberApproximate).

  1. Move them all to top-level properties of the config object: parse.isAlpha => config.isAlpha, create config.operators, config. numberApproximate, etc. For backwards compatibility, parse.isAlpha could have getter/setter functions that make it an alias of config.isAlpha.
  2. Have a separate parseConfig object analogous to config to house the parsing configuration: parse.isAlpha => parseConfig.isAlpha, config.number => parseConfig.number, create config.numberApproximate, parseConfig.operators, etc. We will likely be able to provide backwards-compatibility aliases, although there is some worry about creating circular dependencies.
  3. Leave parsing configuration properties as properties of the parse function and consolidate all of them there: config.number => parse.number, create parse.operators, create config.numberApproximate, etc. We could try to provide backwards-compatibility aliases, but likely would run into circular-dependency problems between the parse function and the config object. So this would likely necessitate a breaking change.
  4. Move to a two-level hierarchy inside of existing config: put parsing properties on config.parse, and compute properties on config.compute. So parse.isAlpha => config.parse.isAlpha, config.number => config.parse.number, create config.compute.numberApproximate, config.parse.operators, etc. Existing top-level properties would remain (for now) as backwards-compatibility aliases for the two-level hierarchy, exactly as config.epsilon is now.

As the most similar to the current state of the world, I will pursue (3) for now, even though it's a bit inconsistent as far as the mechanisms for configuring parsing and computation go and I don't at the moment see how to do it without a breaking change. But if you prefer any of the other options or some other configuration scheme, please just let me know and I will be happy to switch gears.

@gwhitney gwhitney added design decision category:expressions Issues about the expression parser, variable scoping etc. labels Mar 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:expressions Issues about the expression parser, variable scoping etc. design decision
Projects
None yet
Development

No branches or pull requests

1 participant