
Python runtime: NoViableAltException at <EOF> reported by listener despite successful exit from start rule in trace (#4830)

@Bormotoon

Versions:

  • ANTLR Tool version: 4.13.2
  • antlr4-python3-runtime version: 4.13.2
  • Python version: 3.13.2
  • OS: macOS 13.7 Ventura (darwin 22.6.0)

Problem Description:
The ANTLR Python3 runtime reports a NoViableAltException at the <EOF> token for certain input files, even though the parser trace (parser.setTrace(True)) shows the start rule (program) being exited with <EOF> as the next token.

Grammar (Minimal Relevant Parts):

KumirLexer.g4:

lexer grammar KumirLexer;

// ... (keywords, operators, literals) ...

ID : LETTER (LETTER | DIGIT | '_' | '@')* ;

LINE_COMMENT : '|' ~[\r\n]* -> channel(HIDDEN);
DOC_COMMENT  : '#' ~[\r\n]* -> channel(HIDDEN);
WS           : [ \t\r\n]+ -> skip;

fragment LETTER : [a-zA-Zа-яА-ЯёЁ]; // Note: Includes Cyrillic letters
// ... (other fragments) ...

KumirParser.g4:

parser grammar KumirParser;

options { tokenVocab=KumirLexer; }

// ... (expression rules, type rules, etc.) ...

statement
    : variableDeclaration SEMICOLON?
    | assignmentStatement SEMICOLON?
    | ioStatement SEMICOLON?
    | ifStatement SEMICOLON?
    | switchStatement SEMICOLON?
    | loopStatement SEMICOLON?
    | exitStatement SEMICOLON?
    | pauseStatement SEMICOLON?
    | stopStatement SEMICOLON?
    | assertionStatement SEMICOLON?
    | procedureCallStatement SEMICOLON?
    | SEMICOLON
    ;

statementSequence
    : statement*
    ;

algorithmBody
    : statementSequence
    ;

// Captures tokens for algorithm name using a predicate
algorithmNameTokens
    : ( {self._input.LA(1) != self.LPAREN and \
         self._input.LA(1) != self.ALG_BEGIN and \
         self._input.LA(1) != self.PRE_CONDITION and \
         self._input.LA(1) != self.POST_CONDITION and \
         self._input.LA(1) != self.SEMICOLON and \
         self._input.LA(1) != self.EOF}? .
      )+
    ;

algorithmHeader
    : ALG_HEADER (typeSpecifier)? algorithmNameTokens (LPAREN parameterList? RPAREN)? SEMICOLON?
    ;

algorithmDefinition
    : algorithmHeader (preCondition | postCondition | variableDeclaration)*
      ALG_BEGIN
      algorithmBody
      ALG_END (algorithmName)? SEMICOLON?
    ;

// ... (module definition rules) ...

program // Start Rule
    : programItem* (moduleDefinition | algorithmDefinition)* SEMICOLON? EOF
    ;
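
For readers unfamiliar with the predicate in algorithmNameTokens, here is a toy model of what the loop is intended to do: consume tokens until one of the listed stop tokens appears. It is only a sketch. Token types are plain strings rather than the integer constants the generated parser uses, `take_name_tokens` is a hypothetical helper, and it does not model how ANTLR's adaptive prediction evaluates the predicate during lookahead.

```python
# Toy model of: ( {LA(1) not in STOP_TOKENS}? . )+
# Token types are strings here; the real parser compares integer token types.
STOP_TOKENS = {"LPAREN", "ALG_BEGIN", "PRE_CONDITION",
               "POST_CONDITION", "SEMICOLON", "EOF"}


def take_name_tokens(token_types, start=0):
    """Return (name_tokens, next_index), consuming until a stop token."""
    i = start
    while i < len(token_types) and token_types[i] not in STOP_TOKENS:
        i += 1
    if i == start:
        # Mirrors the ( ... )+ requirement of at least one token.
        raise ValueError("algorithm name needs at least one token")
    return token_types[start:i], i


tokens = ["ALG_HEADER", "ID", "ID", "ALG_BEGIN", "EOF"]
name, nxt = take_name_tokens(tokens, start=1)
print(name, tokens[nxt])
```

In this model a two-word name such as "Количество цифр" comes out as two ID tokens, and the loop stops at ALG_BEGIN (нач).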

Example Input File (15-while.kum - triggers the error):
(Note: line endings are normalized from \r\n to \n before parsing.)

| Программа к учебнику информатики для 10 класса
| К.Ю. Полякова и Е.А. Еремина.
| Глава 8.
| Программа № 15. Цикл "пока": количество цифр числа
| Вход:
|   12345
| Результат:
|   Цифр в числе: 5
алг Количество цифр
нач
цел n, count
вывод 'Введите целое число: '
ввод n
count:= 0
нц пока n > 0
n:= div(n,10)
count:= count + 1
кц
вывод 'Цифр в числе: ', count
кон



Code to Reproduce (Python):

from antlr4 import InputStream, CommonTokenStream
from antlr4.error.ErrorListener import ErrorListener
# Assuming generated KumirLexer, KumirParser are importable
from KumirLexer import KumirLexer
from KumirParser import KumirParser
import sys

class SyntaxErrorListener(ErrorListener):
    def __init__(self):
        super().__init__()
        self.errors = []

    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        offending_symbol_text = repr(offendingSymbol.text) if offendingSymbol else 'None'
        exception_type = type(e).__name__ if e else 'None'
        error_msg = (f"line {line}:{column} MSG: {msg} | "
                     f"OFFENDING_SYMBOL: {offending_symbol_text} | "
                     f"EXCEPTION: {exception_type}")
        self.errors.append(error_msg)
        # Also print to stderr immediately for visibility
        print(f"[ERROR] {error_msg}", file=sys.stderr)


def parse_kumir_code(code: str):
    # Normalize line endings
    code = code.replace('\r\n', '\n')
    input_stream = InputStream(code)
    lexer = KumirLexer(input_stream)
    stream = CommonTokenStream(lexer)
    parser = KumirParser(stream)

    parser.removeErrorListeners()
    error_listener = SyntaxErrorListener()
    parser.addErrorListener(error_listener)

    # Enable rule entry/exit tracing
    print("\n--- Parser Trace Start ---", file=sys.stderr)
    parser.setTrace(True)

    try:
        tree = parser.program() # Start rule
        print("--- Parser Trace End ---", file=sys.stderr) # Should be reached if trace exit happens
        return tree, error_listener.errors
    except Exception as e:
        print(f"[EXCEPTION DURING PARSE] {e}", file=sys.stderr)
        return None, [str(e)]

# Example usage:
file_content = """
| ... comments ...
алг Количество цифр
нач
цел n, count
вывод 'Введите целое число: '
ввод n
count:= 0
нц пока n > 0
n:= div(n,10)
count:= count + 1
кц
вывод 'Цифр в числе: ', count
кон



""" # Content of 15-while.kum

tree, errors = parse_kumir_code(file_content)

if errors:
    print("\n--- Parsing Errors Reported by Listener: ---")
    for err in errors:
        print(err)
else:
    print("\n--- No Parsing Errors Reported by Listener ---")

if tree:
    print("\n--- Parse Tree Built Successfully ---")
else:
    print("\n--- Parse Tree NOT Built ---")

Observed Behavior:
When running the code with the 15-while.kum input (and 11 other similar files from our test suite):

  1. The parser trace (parser.setTrace(True)) shows a successful entry into and successful exit from the program rule. LT(1) upon exiting is <EOF>. Example trace output:
    --- Parser Trace Start ---
    enter   program, LT(1)=алг
    ... (trace of parsing the whole file) ...
    exit    program, LT(1)=<EOF>
    --- Parser Trace End ---
    
  2. However, the SyntaxErrorListener is still invoked and reports an error:
    [ERROR] line 24:0 MSG: no viable alternative at input 'алгКоличествоцифрнач...' | OFFENDING_SYMBOL: '<EOF>' | EXCEPTION: NoViableAltException
    
  3. A test checking assert not errors fails.
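
To double-check that the reported position (line 24:0) is the end of the input rather than some earlier token, the EOF position can be computed directly from the text. This is a minimal sketch, assuming the lexer's usual convention of 1-based lines and 0-based columns, with only \n counted as a line break; `eof_position` is a hypothetical helper and not part of the runtime.

```python
def eof_position(text: str) -> tuple[int, int]:
    """Return (line, column) where EOF falls: 1-based line, 0-based column."""
    line = text.count("\n") + 1
    # Column is the number of characters after the last newline.
    column = len(text) - (text.rfind("\n") + 1)
    return line, column


# Shortened stand-in for the real file: three lines plus three blank lines.
sample = "алг Количество цифр\nнач\nкон\n\n\n\n"
print(eof_position(sample))
```

For the full 15-while.kum shown above (20 lines followed by three blank lines), the same calculation gives line 24, column 0, which matches the listener output.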

Expected Behavior:
If the parser trace indicates a successful exit from the start rule program upon reaching EOF, the SyntaxErrorListener should not be invoked with a NoViableAltException error for that EOF. The parsing should be considered successful.
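
One caveat about relying on the trace: in generated Python parsers, the rule body is typically wrapped in try / except RecognitionException / finally, with exitRule() called from the finally clause, and the trace line is printed from there. If that holds for KumirParser.py, an "exit program" line would appear even when an error was reported and recovered from inside the rule. The toy class below only illustrates that control flow. `ToyParser` and its members are hypothetical and are not ANTLR code.

```python
class ToyParser:
    """Hypothetical stand-in for a generated rule method; not ANTLR code."""

    def __init__(self):
        self.trace = []
        self.errors = []

    def program(self, fail: bool):
        self.trace.append("enter program")
        try:
            if fail:
                raise SyntaxError("no viable alternative")
        except SyntaxError as exc:
            # Report and recover instead of propagating the error.
            self.errors.append(str(exc))
        finally:
            # Runs on both paths, so the trace shows an exit either way.
            self.trace.append("exit program")


p = ToyParser()
p.program(fail=True)
print(p.trace, p.errors)
```

Under this assumption, the trace and the listener output would not contradict each other, and the question becomes why prediction fails at <EOF> in the first place.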

Additional Notes:

Could this be a bug in the Python3 runtime's error reporting mechanism or state handling related to EOF, especially since the internal trace reports success?
