API Reference: pe

Functions

  • pe.compile (source, actions=None, parser='packrat', ignore=pe.patterns.DEFAULT_IGNORE, flags=pe.OPTIMIZE)

    Compile the parsing expression or grammar defined in source and return a Parser object. If source contains a grammar with named rules, the first rule is the starting expression. Otherwise, if source contains only an anonymous expression, that expression is associated with the rule name 'Start'.

    The actions parameter is used to associate semantic actions to grammar rules. Its argument should be a dictionary mapping rule names to callables.

    The parser argument selects the underlying parser implementation. By default this is 'packrat' for the packrat parser, but it can be set to "machine" to use the state machine parser.

    The ignore pattern is the definition used by autoignore. Set to None to disable autoignore.

    The flags argument affects how the parser is initialized. By default it is pe.OPTIMIZE, but pe.DEBUG is a useful flag when experimenting. Note that passing any other value replaces the default, so optimization is disabled unless pe.OPTIMIZE is included again (for example, pe.DEBUG | pe.OPTIMIZE).

    • Example

      >>> import pe
      >>> float_parser = pe.compile(
      ...     r'''
      ...     Float    <- ~( INTEGER FRACTION? EXPONENT )
      ...     INTEGER  <- '-'? ('0' / [1-9] [0-9]*)
      ...     FRACTION <- '.' [0-9]+
      ...     EXPONENT <- [eE] [-+]? [0-9]+
      ...     ''',
      ...     actions={'Float': float}
      ... )
      >>> m = float_parser.match('0.183e+3')
      >>> m.value()
      183.0
  • pe.match (pattern, string, actions=None, parser='packrat', ignore=pe.patterns.DEFAULT_IGNORE, flags=pe.MEMOIZE | pe.STRICT)

    Match the parsing expression defined in pattern against the input string.

    By default the grammar is optimized and the packrat parser is used. The parser parameter can be set to "machine" to use the state machine parser, but for more control over grammar compilation use the compile() function.

    The ignore pattern is the definition used by autoignore. Set to None to disable autoignore.

    The flags parameter affects parsing behavior; per the signature above, the default is pe.MEMOIZE | pe.STRICT. Note that passing a different value disables memoization unless the pe.MEMOIZE flag is included again.

    • Example

      >>> pe.match(r'"-"? ("0" / [1-9] [0-9]*)', '-183')
      <Match object; span=(0, 4), match='-183'>
  • pe.escape (string)

    Escape any characters in string with a special meaning in literals or character classes.

    • Example

      >>> pe.escape('"\n"')
      '\\"\\n\\"'
  • pe.unescape (string)

    Unescape escaped characters in string. Characters that are unescaped include those that would be escaped by pe.escape and also unicode escapes.

    • Example

      >>> pe.unescape('\\"\\u3042\\"')
      '"あ"'

Grammar Objects

  • class pe.Grammar (definitions=None, actions=None, start='Start')

    A grammar maintains a mapping between names and parsing expression definitions. This class can be used to programmatically construct a parser, rather than through pe.compile().

    • Example

      >>> from pe.packrat import PackratParser
      >>> from pe import Grammar
      >>> from pe.operators import (
      ...    Class, Optional, Star, Capture, Sequence, Choice)
      >>> g = Grammar(
      ...     {'Integer': Capture(
      ...         Sequence(
      ...             Optional('-'),
      ...             Choice('0', Sequence(Class('1-9'),
      ...                                  Star(Class('0-9'))))))},
      ...     actions={'Integer': int},
      ...     start='Integer',
      ... )
      >>> int_parser = PackratParser(g)
      >>> int_parser.match('183').value()
      183

Parser Objects

  • class pe.Parser (grammar, flags=pe.NONE)

    A generic parser class. This is not meant to be instantiated directly, but is used as the superclass for parsers such as PackratParser and MachineParser.

    • match (s, pos=0, flags=pe.NONE)

      Match the string s using the parser.

      The flags argument affects parsing behavior, such as with pe.STRICT or pe.MEMOIZE.

Match Objects

  • class pe.Match

    A match object contains information about a successful match.

    • string

      The string the expression was matched against.

    • start ()

      Return the position in the string where the match began.

    • end ()

      Return the position in the string where the match ended.

    • group (key_or_index=0)

      Return an emitted or bound value given by key_or_index, or return the entire matching substring when key_or_index is 0 (default). When key_or_index is an integer greater than 0, it is a 1-based index for the emitted value to return. When key_or_index is a string, it is the name of the bound value to return. In either of these last two cases, if the index or name does not exist, an IndexError is raised.

    • groups ()

      Return the tuple of emitted values.

    • groupdict ()

      Return the dictionary of bound values.

    • value ()

      Return the result of evaluating the input against the expression.

Exceptions

  • class pe.Error ()

    General error class raised by erroneous pe operations.

  • class pe.GrammarError ()

    Inherits from pe.Error.

    Raised for invalid grammar definitions.

  • class pe.ParseError (message=None, filename=None, lineno=None, offset=None, text=None)

    Inherits from pe.Error.

    Raised for parsing errors when the pe.STRICT flag is set.

Flags

The following constant values affect grammar compilation or matching behavior.

  • pe.NONE

    The flag used when no flags are set.

  • pe.DEBUG

    Display debug information when compiling a grammar.

  • pe.STRICT

    Raise an error on parse failures rather than just returning None.

  • pe.MEMOIZE

    Use memoization if the parser allows it.

  • pe.OPTIMIZE

    Optimize the grammar by inlining some expressions and merging adjacent expressions into a single regular expression.