Ordered choices failed to switch to the next choice #74

Open
ghusse opened this issue Dec 9, 2023 · 1 comment

Comments

ghusse commented Dec 9, 2023

I'm trying to use canopy to parse search queries, and I'm struggling with ordered choices that raise parsing errors.

What I want to do (this is one use case that I extracted from the syntax I need to parse):

  1. parse text tokens without spaces as text
  2. when tokens are prefixed by -, consider the token as negated
  3. BUT if the token is a valid number, then it should not be negated.

I tried to achieve this with the following PEG file:

grammar Query
  query     <- number / negation / token

  negation  <- @'-' token                        %negation
  number    <- '-'? [1-9] [0-9]* ('.' [0-9]+)?   %token
  token     <- [^\s-] [^\s]*                     %token

When parsing a string such as 1.2.3 with the parser that canopy generates from this PEG file, it raises the following error (the output below comes from the input 42.43.44):

SyntaxError: Line 1: expected one of:

        - [0-9] from Query::number

         1 | 42.43.44
                  ^

      634 |     }
      635 |     this.constructor.lastError = { offset: this._offset, expected: this._expected };
    > 636 |     throw new SyntaxError(formatError(this._input, this._failure, this._expected));
          |           ^
      637 |   };
      638 |
      639 |   Object.assign(Parser.prototype, Grammar);

I understand that the documentation warns about this, but how can I define a working PEG file that correctly falls back to token when the value is not a valid number?

The documentation:

This means that the options are tried in order and the first choice that leads to a successful parse is kept. If this choice results in a parsing error later on in the input, the parser does not backtrack and try any other choices, the parse simply fails.

I don't understand the difference between the successful parse that is used to select a choice and the later parsing of the input that can still fail.

jcoglan (Owner) commented Dec 13, 2023

Here is what happens when parsing the string 1.2.3:

  • The first rule, query, is tried.
  • query is a choice whose first member is number, so number is tried.
  • number matches the string 1.2, leaving .3 unparsed
  • Since number succeeded in matching the input, query succeeds with its result
  • After applying the first rule, query, to the input, we have not consumed the entire string, so the parse fails
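
The steps above can be reproduced with a small JavaScript sketch. This is not canopy's generated code: the `rules` table, `query` and `parse` are hypothetical stand-ins that mimic the three grammar rules with sticky regexes, only to show that the choice is settled as soon as number matches a prefix.

```javascript
// Hypothetical stand-ins for the grammar rules, written as sticky regexes.
// They mimic the matching order only; this is not what canopy generates.
const rules = {
  number: /-?[1-9][0-9]*(?:\.[0-9]+)?/y,
  negation: /-[^\s-]\S*/y,
  token: /[^\s-]\S*/y,
};

// Ordered choice: keep the first alternative that matches at `offset`.
// Once an alternative succeeds, the later ones are never tried.
function query(input, offset = 0) {
  for (const name of ["number", "negation", "token"]) {
    const re = rules[name];
    re.lastIndex = offset;
    const m = re.exec(input);
    if (m) return { rule: name, text: m[0], end: offset + m[0].length };
  }
  return null;
}

// Top-level parse: the root rule must consume the entire input.
function parse(input) {
  const result = query(input);
  if (result === null || result.end !== input.length) {
    const at = result ? result.end : 0;
    throw new SyntaxError(`unparsed input at offset ${at}`);
  }
  return result;
}

console.log(query("1.2.3")); // number wins with the prefix "1.2"
try {
  parse("1.2.3");
} catch (e) {
  console.log(e.message); // the leftover ".3" makes the whole parse fail
}
```

The choice itself succeeded, which is why token is never attempted. The failure only appears afterwards, when the root rule has not consumed the whole string.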

The problem here is that you have a choice where one member (number) matches prefixes of another member (token). In general this is a bad idea when using PEGs, and you need to redesign the grammar so that this does not happen.
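
One possible redesign, sketched here in JavaScript rather than in the grammar itself, is to let number succeed only when it is not followed by another non-space character. The `rules` table and `query` function are hypothetical stand-ins for the grammar rules, and `(?!\S)` plays the role of a PEG negative lookahead. In the grammar, the analogous change would presumably be appending something like `![^\s]` to the number rule; that is an assumption and has not been tested against canopy here.

```javascript
// Hypothetical stand-ins for the grammar rules, written as sticky regexes.
const rules = {
  // (?!\S) acts like a negative lookahead: number fails if any
  // non-space character immediately follows the digits.
  number: /-?[1-9][0-9]*(?:\.[0-9]+)?(?!\S)/y,
  negation: /-[^\s-]\S*/y,
  token: /[^\s-]\S*/y,
};

// Ordered choice: keep the first alternative that matches at `offset`.
function query(input, offset = 0) {
  for (const name of ["number", "negation", "token"]) {
    const re = rules[name];
    re.lastIndex = offset;
    const m = re.exec(input);
    if (m) return { rule: name, text: m[0], end: offset + m[0].length };
  }
  return null;
}

for (const s of ["1.2", "-1.2", "1.2.3", "-foo", "foo"]) {
  const r = query(s);
  console.log(s, "->", r.rule, JSON.stringify(r.text));
}
```

With the lookahead in place, number no longer matches a strict prefix of something token would accept, so 1.2.3 falls through to token while 1.2 and -1.2 are still recognised as numbers.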
