[protobuf2, protobuf3] Text Format Language Specification is not implemented correctly for protobuf2 or protobuf3. #4637

@kaby76

The specs for proto2 and proto3 both say that MessageValue is defined in the [Text Format Language Specification](https://protobuf.dev/reference/protobuf/textformat-spec#fields). Both specs use MessageValue in the same rule:

constant = fullIdent | ( [ "-" | "+" ] intLit ) | ( [ "-" | "+" ] floatLit ) |
                strLit | boolLit | MessageValue

The Text Format Language Specification (https://protobuf.dev/reference/protobuf/textformat-spec/) describes an entirely different language, although one that is only used with Protocol Buffers. The EBNF for that language is given in the Text Format Language spec (https://protobuf.dev/reference/protobuf/textformat-spec/#fields). NB: As noted in the spec, the EBNF was reverse-engineered from the C++ parser implementation. We know where this can lead us...

Field        = ScalarField | MessageField ;
MessageField = FieldName, [ ":" ], ( MessageValue | MessageList ) [ ";" | "," ];
ScalarField  = FieldName, ":",     ( ScalarValue  | ScalarList  ) [ ";" | "," ];
MessageList  = "[", [ MessageValue, { ",", MessageValue } ], "]" ;
ScalarList   = "[", [ ScalarValue,  { ",", ScalarValue  } ], "]" ;
MessageValue = "{", Message, "}" | "<", Message, ">" ;
ScalarValue  = String
             | Float
             | Identifier
             | SignedIdentifier
             | DecSignedInteger
             | OctSignedInteger
             | HexSignedInteger
             | DecUnsignedInteger
             | OctUnsignedInteger
             | HexUnsignedInteger ;

Unfortunately, none of this corresponds to the rules in the Antlr grammars. The protobuf2 and protobuf3 grammars each define the same rule:

// not specified in specification but used in tests
blockLit
    : LC (ident COLON constant)* RC
    ;

MessageValue can be delimited by { ... } or by < ... >, but the Antlr grammar does not accept that.
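
To make the gap concrete, here is a minimal Python sketch of a recognizer for a simplified subset of the Field rules quoted above. It is not taken from the Antlr grammars or from protoc: the token set (plain identifiers, signed decimal integers, double-quoted strings without escapes), the omission of lists, and the names `tokenize`, `Parser`, and `accepts` are all assumptions made for illustration. It shows the two points the quoted `blockLit` rule does not cover: both delimiter pairs are accepted, and the colon before a message value is optional.

```python
import re

# Simplified token set (an assumption for illustration, not the spec's
# full lexical grammar): strings, signed decimal integers, identifiers,
# and the punctuation used by the Field rules.
TOKEN = re.compile(
    r'\s*(?:(?P<str>"[^"\n]*")|(?P<num>[-+]?\d+)'
    r'|(?P<id>[A-Za-z_]\w*)|(?P<sym>[{}<>:;,]))'
)

# MessageValue = "{", Message, "}" | "<", Message, ">" ;
CLOSER = {"{": "}", "<": ">"}


def tokenize(text):
    """Split text into (kind, value) pairs; raise ValueError on bad input."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.toks = tokenize(text)
        self.i = 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else (None, None)

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def message(self, closer=None):
        # Message = { Field } ; stop at the expected closer or end of input.
        while self.peek()[1] not in (None, closer):
            self.field()

    def field(self):
        kind, _ = self.take()
        if kind != "id":
            raise ValueError("expected a field name")
        has_colon = self.peek()[1] == ":"
        if has_colon:
            self.take()
        kind, value = self.take()
        if kind == "sym" and value in CLOSER:
            # MessageField: the colon is optional, and the closing
            # delimiter must match the opening one.
            self.message(CLOSER[value])
            if self.take()[1] != CLOSER[value]:
                raise ValueError("mismatched message delimiter")
        elif kind in ("str", "num", "id") and has_colon:
            pass  # ScalarField: the colon is mandatory.
        else:
            raise ValueError("expected a value")
        if self.peek()[1] in (";", ","):
            self.take()  # optional trailing separator


def accepts(text):
    """Return True if text is a Message in the simplified subset."""
    try:
        parser = Parser(text)
        parser.message()
        return parser.i == len(parser.toks)
    except ValueError:
        return False
```

Under these assumptions, `accepts('a { b: 1 }')` and `accepts('a < b: 1 >')` both return True, while `accepts('a { b: 1 >')` returns False. As quoted, `blockLit` only allows the LC ... RC form and requires a COLON after every ident, so inputs such as `a < b: 1 >` or a nested `b { c: 1 }` without a colon are the cases to compare against protoc.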

NB: I will have to check what protoc does before changing the grammar.
