Control Parser Behavior with Flags

The pe.compile(), pe.match(), and Parser.match() functions can take a flags parameter that can change how the parser is constructed or applied.

Flags for Grammar Construction

When calling pe.compile(), the following flags can affect how the grammar is constructed:

Flag	Effect
`pe.NONE`	No flags set
`pe.DEBUG`	Enable DEBUG mode
`pe.INLINE`	Inline nonterminals when possible
`pe.COMMON`	Replace common grammar patterns with faster alternatives
`pe.REGEX`	Replace expressions with equivalent regular expressions*
`pe.OPTIMIZE`	Set all grammar optimization options (`pe.INLINE`, `pe.COMMON`, `pe.REGEX`)

* Available for the packrat and machine-python parsers only.

For instance, the DEBUG mode can show you the effect of optimizations:

>>> import pe
>>> grm = r'''
... Start <- Token (" "+ Token)*
... Token <- CHAR+
... CHAR  <- !" " .
... '''
>>> p = pe.compile(grm, flags=pe.DEBUG|pe.INLINE)
## Grammar ##
Start <- Token (" "+ Token)*
Token <- CHAR+
CHAR  <- !" " .
## Modified Grammar ##
Start <- (!" " .)+ (" "+ (!" " .)+)*
Token <- (!" " .)+
CHAR  <- !" " .

With the packrat parser, the pe.DEBUG flag also inserts actions that print each step of parsing, with the input string context on the left and the current expression on the right. Expressions are indented to show nesting levels.

>>> p.match('ab c')  # doctest: +NORMALIZE_WHITESPACE
ab c         |    (!" " .)+ (" "+ (!" " .)+)*
ab c         |      (!" " .)+
ab c         |        !" " .
ab c         |          !" "
ab c         |            " "
ab c         |          .
b c          |        !" " .
b c          |          !" "
b c          |            " "
b c          |          .
 c           |        !" " .
 c           |          !" "
 c           |            " "
 c           |      (" "+ (!" " .)+)*
 c           |        " "+ (!" " .)+
 c           |          " "+
 c           |            " "
c            |            " "
c            |          (!" " .)+
c            |            !" " .
c            |              !" "
c            |                " "
c            |              .
             |            !" " .
             |              !" "
             |                " "
             |              .
             |        " "+ (!" " .)+
             |          " "+
             |            " "
<Match object; span=(0, 4), match='ab c'>

Flags for Matching

When matching with pe.match() or Parser.match(), there are flags that affect what happens while parsing. Currently these are only available for the packrat parser.

Flag	Effect
`pe.NONE`	No flags set
`pe.STRICT`	Raise an error on parse failures
`pe.MEMOIZE`	Use memoization

Without pe.STRICT, a parse failure means that the matching function returns None, but with the flag enabled it raises an exception. If pe.MEMOIZE is also used, it will print the furthest position in the input that the parser successfully reached.

>>> grm = r'''
...   Start  <- (" "* ~(STRING / INT))+ EOF  # match quoted strings or integers
...   STRING <- ["] (!["] .)* ["]
...   INT    <- "-"? ("0" / [1-9] [0-9]*)
...   EOF    <- !.
... '''
>>> pe.match(grm, '"one" 1')  # matches
<Match object; span=(0, 7), match='"one" 1'>
>>> pe.match(grm, '"one" 1 "two')  # missing quotation mark; returns None
>>> pe.match(grm, '"one" 1 "two', flags=pe.STRICT)  # raises exception
Traceback (most recent call last):
  [...]
pe._errors.ParseError: ParseError: failed to parse; use memoization for more details
>>> pe.match(grm, '"one" 1 "two', flags=pe.STRICT|pe.MEMOIZE)
Traceback (most recent call last):
  [...]
pe._errors.ParseError: 
  line 0, character 8
    "one" 1 "two
            ^
ParseError: `(?=(?P<_5>"(?=(?P<_2>[^"]*))(?P=_2)"|\-?(?=(?P<_4>0|[1-9](?=(?P<_3>[0-9]*))(?P=_3)))(?P=_4)))(?P=_5)`

Some things to note here:

The error message shows the optimized expression that failed, so it doesn't look like the input grammar. This more accurately shows what failed, but it's also harder to decipher.
Showing an accurate position relies on memoization being used close to the failure point, but some grammar optimizations can reduce the points where memoization can occur. In this case, I leveraged the fact that semantic operations (such as captures and actions) and recursion can prevent these optimizations and used a capture (~) in the Start rule. Another option is to first compile a grammar without optimizations and match using that.

Flags for Compiling and Compiling

The pe.match() function does not allow you to specify the flags used for compilation, and it defaults to pe.OPTIMIZE. If you want to specify the grammar-compilation flags and the flags used in matching, you will need to use pe.compile() followed by Parser.match() on the resulting parser object.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

using-flags.md

using-flags.md

Control Parser Behavior with Flags

Flags for Grammar Construction

Flags for Matching

Flags for Compiling and Compiling

Files

using-flags.md

Latest commit

History

using-flags.md

File metadata and controls

Control Parser Behavior with Flags

Flags for Grammar Construction

Flags for Matching

Flags for Compiling and Compiling