Add a new parser API built on attrs for defining classes instantiated from scanned stanzas #52

Open
jwodder opened this issue Dec 19, 2023 · 0 comments · May be fixed by #55
Labels: attrs-parser; enhancement (New feature or request therefor); under consideration (Dev has not yet decided whether or how to implement)

jwodder commented Dec 19, 2023

A parser will be defined via a class decorated with @parsable. Header fields will be mapped to attributes of the class, with non-trivial mappings defined via field declarations of the form fieldname: Annotation = Field(...).
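
A rough sketch of how this declaration style might behave. It is not the proposed implementation: it uses the standard library's dataclasses as a stand-in for attrs so that it runs without dependencies, and `parsable`, `Field`, `parse_pairs`, the `__parser_spec__` attribute, and the `Message` class are hypothetical simplifications. The name decoder is the default quoted in the list below.

```python
import re
from dataclasses import dataclass, field, fields
from typing import Any, Callable, Dict, Iterable, Optional, Tuple


def default_name_decoder(s: str) -> str:
    # Default from the proposal: lowercase the header name, then replace
    # every character that is not a word character with "_".
    return re.sub(r"[^\w_]", "_", s.lower())


def Field(
    *,
    alias: Optional[str] = None,
    decoder: Optional[Callable[[str, str], Any]] = None,
    **kwargs: Any,
) -> Any:
    """Hypothetical stand-in: store parser options in the field metadata
    under a "headerparser" key and pass everything else through."""
    metadata = dict(kwargs.pop("metadata", {}))
    metadata["headerparser"] = {"alias": alias, "decoder": decoder}
    return field(metadata=metadata, **kwargs)


def parsable(cls: type) -> type:
    """Hypothetical stand-in: build the class, then compile a mapping from
    decoded header names to (attribute name, decoder) pairs."""
    cls = dataclass(cls)
    spec: Dict[str, Tuple[str, Optional[Callable[[str, str], Any]]]] = {}
    for f in fields(cls):
        opts = f.metadata.get("headerparser", {})
        header = opts.get("alias") or f.name
        spec[default_name_decoder(header)] = (f.name, opts.get("decoder"))
    cls.__parser_spec__ = spec  # saved as a class variable
    return cls


def parse_pairs(cls: type, pairs: Iterable[Tuple[str, str]]) -> Any:
    """Build an instance from (header name, value) pairs.

    Unknown headers raise KeyError here, since this sketch has no
    equivalent of ExtraFields.  Assumption: the decoder always receives
    the attribute name rather than the header name as written.
    """
    kwargs: Dict[str, Any] = {}
    for name, value in pairs:
        attr_name, decoder = cls.__parser_spec__[default_name_decoder(name)]
        kwargs[attr_name] = value if decoder is None else decoder(attr_name, value)
    return cls(**kwargs)


@parsable
class Message:
    subject: str  # trivial mapping: header "Subject" -> attribute "subject"
    content_length: int = Field(
        alias="Content-Length",
        decoder=lambda name, value: int(value),
    )


msg = parse_pairs(Message, [("Subject", "Hello"), ("Content-Length", "42")])
print(msg)
```

A real version would construct attrs fields rather than dataclass fields and would compile a proper ParserSpec object instead of a plain dict.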

  • Alternative idea: Replace Field with typing.Annotated à la Pydantic 2.0.

  • Field constructs an attr.Attribute with headerparser-specific parameters stored in the attribute metadata under a "headerparser" key

  • @parsable compiles the class's parsing metadata into a ParserSpec instance and saves it as a class variable, which the actual parse*() functions then use.

  • @parsable can be passed the following arguments:

    • name_decoder — what the v1 parser calls the "normalizer"; defaults to lambda s: re.sub(r'[^\w_]', "_", s.lower())
    • scanner_options: dict[str, Any]
    • **kwargs — passed to attr.define
  • Field — For defining nontrivial multiple=False fields

    • Takes the following arguments:
      • alias
      • decoder — A callable that takes a header name (str) and a value
        • For fields with aliases, this is passed the actual field name, not the alias, as that's what pydantic does with validators.
      • **kwargs — passed to attr.field
  • MultiField: For defining multiple=True fields

    • Takes the same arguments as Field, except that decoder is passed a header name and a list of values
  • ExtraFields: For defining an attribute on which to store additional fields with multiple=False

    • Takes the following arguments:
      • decoder — a callable that is passed a list of (name, value) pairs with unique names
      • **kwargs — passed to attr.field
    • Extra fields are allowed in the parsed input iff this or MultiExtraFields is present
    • A class cannot have more than one ExtraFields or MultiExtraFields
  • MultiExtraFields: For defining an attribute on which to store additional fields with multiple=True

    • Takes the following arguments:
      • decoder — a callable that is passed a list of (name, value) pairs in which the names need not be unique
      • **kwargs — passed to attr.field
  • BodyField: For defining the attribute on which the body will be stored

    • Takes the following arguments:
      • decoder — a callable that takes just a value
      • **kwargs — passed to attr.field
    • A body is allowed iff such a BodyField is present in the class
    • A class cannot have more than one BodyField
  • Functions:

    • parse(klass: Type[L], data: Union[Iterable[str], str, Scanner]) -> L
    • parse_stanzas(klass: Type[L], data: Union[Iterable[str], str, Scanner]) -> Iterator[L]
    • parse_stream(klass: Type[L], fields: Iterable[Tuple[Optional[str], str]]) -> L
      • There's no point in trying to merge this and parse_stanzas_stream() into the non-stream versions, as either way this function or an equivalent will be needed for the others to call
    • parse_stanzas_stream(klass: Type[L], fields: Iterable[Iterable[Tuple[str, str]]]) -> Iterator[L]
    • There is no parse_next_stanza(); to get this effect, the user should scan the stanza themselves using Scanner and pass the results to parse_stream()
      • Or should parse_next_stanza() exist but only take a Scanner?
    • make_parsable(…) — wraps attr.make_class()
    • is_parsable(Any) -> bool
    • Something (get_scanner()?) for taking a parsable and returning a Scanner initialized with its scanner options?
      • The function would also need to take the data to initialize the Scanner with — unless I give Scanner a feed() method
  • There is a ParserMixin(?) mixin class that implements equivalents of all of the parse*() functions as classmethods that get the klass from cls

  • Supply a premade set of decoders for parsing bools, timestamps, etc.?

  • Supply higher-order functions for converting single-argument functions to (name, value) decoders, converting (name, value) decoders to (name, [value]) decoders, and converting single-argument functions to (name, [value]) decoders

  • Supply one or more equivalents of attrs' pipe() et alii?

  • Add an option for just discarding all extra/unknown fields?
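
The higher-order decoder helpers suggested in the list above could look roughly like the following. This is only a sketch under assumed names: `ignore_name`, `each_value`, `each_value_ignoring_name`, and `decode_bool` are hypothetical and not part of any existing headerparser API, and the set of strings accepted as booleans is an arbitrary choice.

```python
from typing import Any, Callable, List

ValueFunc = Callable[[str], Any]                    # value -> result
FieldDecoder = Callable[[str, str], Any]            # (name, value) -> result
MultiFieldDecoder = Callable[[str, List[str]], Any]  # (name, [value]) -> result


def ignore_name(func: ValueFunc) -> FieldDecoder:
    """Convert a single-argument function to a (name, value) decoder."""
    def decoder(name: str, value: str) -> Any:
        return func(value)
    return decoder


def each_value(decoder: FieldDecoder) -> MultiFieldDecoder:
    """Convert a (name, value) decoder to a (name, [value]) decoder that
    decodes every value separately and returns the results as a list."""
    def multi_decoder(name: str, values: List[str]) -> List[Any]:
        return [decoder(name, v) for v in values]
    return multi_decoder


def each_value_ignoring_name(func: ValueFunc) -> MultiFieldDecoder:
    """Convert a single-argument function to a (name, [value]) decoder."""
    return each_value(ignore_name(func))


def decode_bool(value: str) -> bool:
    """Hypothetical premade decoder for boolean-like header values."""
    lowered = value.strip().lower()
    if lowered in {"yes", "true", "on", "1"}:
        return True
    if lowered in {"no", "false", "off", "0"}:
        return False
    raise ValueError(f"invalid boolean: {value!r}")


# Usage: the same plain functions serve single- and multi-valued fields.
single = ignore_name(decode_bool)
multi = each_value_ignoring_name(int)
print(single("Enabled", "yes"))
print(multi("Port", ["80", "443"]))
```

Composing the third helper from the first two keeps the name-handling logic in one place, which would matter if the decoder signature changes later.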

jwodder added the attrs-parser, enhancement, and under consideration labels on Dec 19, 2023
jwodder added this to the Attrs-Based Parser milestone on Dec 19, 2023
jwodder self-assigned this on Dec 19, 2023