Description
Versions:
- ANTLR Tool version: 4.13.2
- antlr4-python3-runtime version: 4.13.2
- Python version: 3.13.2
- OS: macOS 13.7 Ventura (darwin 22.6.0)
Problem Description:
The ANTLR Python3 runtime reports a NoViableAltException at the <EOF> token for certain input files, even though the parser trace (parser.setTrace(True)) shows an exit from the start rule (program) with <EOF> as the next token.
Grammar (Minimal Relevant Parts):
KumirLexer.g4:
lexer grammar KumirLexer;
// ... (keywords, operators, literals) ...
ID : LETTER (LETTER | DIGIT | '_' | '@')* ;
LINE_COMMENT : '|' ~[\r\n]* -> channel(HIDDEN);
DOC_COMMENT : '#' ~[\r\n]* -> channel(HIDDEN);
WS : [ \t\r\n]+ -> skip;
fragment LETTER : [a-zA-Zа-яА-ЯёЁ]; // Note: Includes Cyrillic letters
// ... (other fragments) ...
KumirParser.g4:
parser grammar KumirParser;
options { tokenVocab=KumirLexer; }
// ... (expression rules, type rules, etc.) ...
statement
    : variableDeclaration SEMICOLON?
    | assignmentStatement SEMICOLON?
    | ioStatement SEMICOLON?
    | ifStatement SEMICOLON?
    | switchStatement SEMICOLON?
    | loopStatement SEMICOLON?
    | exitStatement SEMICOLON?
    | pauseStatement SEMICOLON?
    | stopStatement SEMICOLON?
    | assertionStatement SEMICOLON?
    | procedureCallStatement SEMICOLON?
    | SEMICOLON
    ;
statementSequence
    : statement*
    ;
algorithmBody
    : statementSequence
    ;
// Captures tokens for algorithm name using a predicate
algorithmNameTokens
    : ( {self._input.LA(1) != self.LPAREN and \
         self._input.LA(1) != self.ALG_BEGIN and \
         self._input.LA(1) != self.PRE_CONDITION and \
         self._input.LA(1) != self.POST_CONDITION and \
         self._input.LA(1) != self.SEMICOLON and \
         self._input.LA(1) != self.EOF}? .
      )+
    ;
algorithmHeader
    : ALG_HEADER (typeSpecifier)? algorithmNameTokens (LPAREN parameterList? RPAREN)? SEMICOLON?
    ;
algorithmDefinition
    : algorithmHeader (preCondition | postCondition | variableDeclaration)*
      ALG_BEGIN
      algorithmBody
      ALG_END (algorithmName)? SEMICOLON?
    ;
// ... (module definition rules) ...
program // Start Rule
    : programItem* (moduleDefinition | algorithmDefinition)* SEMICOLON? EOF
    ;
Example Input File (15-while.kum, triggers the error):
(Note: the file uses \n line endings after \r\n -> \n normalization.)
| Программа к учебнику информатики для 10 класса
| К.Ю. Полякова и Е.А. Еремина.
| Глава 8.
| Программа № 15. Цикл "пока": количество цифр числа
| Вход:
| 12345
| Результат:
| Цифр в числе: 5
алг Количество цифр
нач
цел n, count
вывод 'Введите целое число: '
ввод n
count:= 0
нц пока n > 0
n:= div(n,10)
count:= count + 1
кц
вывод 'Цифр в числе: ', count
кон
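Since the WS rule above only covers space, tab, \r and \n, one thing that may be worth ruling out before looking at the parser is whether the failing files contain characters the lexer has no rule for outside comments (a BOM, a no-break space, a stray \r). The following is only a diagnostic sketch using the standard library; the helper name `find_suspect_chars` and the set of characters it flags are assumptions for illustration, not part of the project.

```python
import unicodedata

# Characters that commonly sneak into source files and are not matched by
# a whitespace rule of the form [ \t\r\n]+ (assumed list, extend as needed).
SUSPECT = {"\u00a0", "\r"}


def find_suspect_chars(text):
    """Return (line, column, codepoint) for each suspicious character.

    Lines are 1-based and columns 0-based, matching ANTLR's error positions.
    Flags the characters in SUSPECT plus any Unicode format character
    (category "Cf"), which includes the BOM and zero-width spaces.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line):
            if ch in SUSPECT or unicodedata.category(ch) == "Cf":
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits


sample = "алг\u00a0x\nнач\r\n\ufeffкон"
sample_hits = find_suspect_chars(sample)
clean_hits = find_suspect_chars("алг Количество цифр\nнач\nкон")
```

Running this over the failing and the passing files and comparing the results would show whether the difference is in the raw text rather than in the grammar.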
Code to Reproduce (Python):
from antlr4 import InputStream, CommonTokenStream
from antlr4.error.ErrorListener import ErrorListener
# Assuming generated KumirLexer, KumirParser are importable
from KumirLexer import KumirLexer
from KumirParser import KumirParser
import sys
class SyntaxErrorListener(ErrorListener):
    def __init__(self):
        super().__init__()
        self.errors = []

    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        offending_symbol_text = repr(offendingSymbol.text) if offendingSymbol else 'None'
        exception_type = type(e).__name__ if e else 'None'
        error_msg = (f"line {line}:{column} MSG: {msg} | "
                     f"OFFENDING_SYMBOL: {offending_symbol_text} | "
                     f"EXCEPTION: {exception_type}")
        self.errors.append(error_msg)
        # Also print to stderr immediately for visibility
        print(f"[ERROR] {error_msg}", file=sys.stderr)


def parse_kumir_code(code: str):
    # Normalize line endings
    code = code.replace('\r\n', '\n')
    input_stream = InputStream(code)
    lexer = KumirLexer(input_stream)
    stream = CommonTokenStream(lexer)
    parser = KumirParser(stream)
    # Replace the default console listener with the collecting listener
    parser.removeErrorListeners()
    error_listener = SyntaxErrorListener()
    parser.addErrorListener(error_listener)
    # Enable trace
    print("\n--- Parser Trace Start ---", file=sys.stderr)
    parser.setTrace(True)
    try:
        tree = parser.program()  # Start rule
        print("--- Parser Trace End ---", file=sys.stderr)  # Should be reached if trace exit happens
        return tree, error_listener.errors
    except Exception as e:
        print(f"[EXCEPTION DURING PARSE] {e}", file=sys.stderr)
        return None, [str(e)]

# Example usage:
file_content = """
| ... comments ...
алг Количество цифр
нач
цел n, count
вывод 'Введите целое число: '
ввод n
count:= 0
нц пока n > 0
n:= div(n,10)
count:= count + 1
кц
вывод 'Цифр в числе: ', count
кон
""" # Content of 15-while.kum
tree, errors = parse_kumir_code(file_content)
if errors:
    print("\n--- Parsing Errors Reported by Listener: ---")
    for err in errors:
        print(err)
else:
    print("\n--- No Parsing Errors Reported by Listener ---")
if tree:
    print("\n--- Parse Tree Built Successfully ---")
else:
    print("\n--- Parse Tree NOT Built ---")
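Because the problem shows up only for some inputs, it can help to run the reproduction over the whole test suite and split the files by outcome. A minimal sketch follows; `split_by_outcome` is a hypothetical helper, and it assumes a parse function with the same return shape as `parse_kumir_code` above, that is `(tree, errors)`. The parse function is passed in as an argument so the helper itself has no ANTLR dependency.

```python
from typing import Callable, Dict, List, Tuple

# Assumed signature: source text -> (parse tree or None, list of error strings)
ParseFn = Callable[[str], Tuple[object, List[str]]]


def split_by_outcome(sources: Dict[str, str], parse_fn: ParseFn):
    """Parse every source and separate clean files from files with errors.

    Returns (passing, failing): a sorted list of clean file names and a
    dict mapping each failing file name to its list of error messages.
    """
    passing: List[str] = []
    failing: Dict[str, List[str]] = {}
    for name in sorted(sources):
        _tree, errors = parse_fn(sources[name])
        if errors:
            failing[name] = list(errors)
        else:
            passing.append(name)
    return passing, failing


def _stub_parse(code: str):
    """Stand-in parser for demonstration: flags any source containing a loop."""
    return object(), (["error at <EOF>"] if "нц" in code else [])


demo_passing, demo_failing = split_by_outcome(
    {"a.kum": "алг А\nнач\nкон", "b.kum": "алг Б\nнач\nнц пока n > 0\nкц\nкон"},
    _stub_parse,
)
```

In real use, `parse_kumir_code` would be passed instead of the stub, with `sources` built by reading the test files from disk.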
Observed Behavior:
When running the code with the 15-while.kum input (and 11 other similar files from our test suite):
- The parser trace (parser.setTrace(True)) shows an entry into and an exit from the program rule. LT(1) upon exiting is <EOF>. Example trace output:
--- Parser Trace Start ---
enter program, LT(1)=алг
... (trace of parsing the whole file) ...
exit program, LT(1)=<EOF>
--- Parser Trace End ---
- However, the SyntaxErrorListener is still invoked and reports an error:
[ERROR] line 24:0 MSG: no viable alternative at input 'алгКоличествоцифрнач...' | OFFENDING_SYMBOL: '<EOF>' | EXCEPTION: NoViableAltException
- A test checking assert not errors fails.
Expected Behavior:
If the parser trace indicates a successful exit from the start rule program upon reaching EOF, the SyntaxErrorListener should not be invoked with a NoViableAltException for that EOF. The parse should be considered successful.
Additional Notes:
- This issue occurs for only 12 out of 60 test files. The other 48 files (including some with very similar structure but lacking certain elements like loops) parse successfully without errors.
- Normalizing line endings (\r\n -> \n) did not resolve the issue.
- Changing the program rule's quantifier from ...+ EOF to ...* EOF allowed parsing files with a single top-level definition but did not fix the error-on-EOF issue for these 12 files.
- This problem appears similar to issues "EOF fails to match when starting rule used elsewhere" (#4242) and "Addition of unrelated, unused parser rule to grammar causes different parsing results" (#3851).
Could this be a bug in the Python3 runtime's error reporting mechanism or state handling related to EOF, especially since the internal trace reports success?
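One caveat about reading the trace, offered as my understanding of the generated code rather than a confirmed diagnosis: each generated rule method wraps its body in try/except/finally, reports and recovers from a RecognitionException in the except branch, and calls exitRule() in the finally branch. If that holds for the generated program() method in KumirParser.py, the "exit program" trace line is printed whether or not an error was reported inside the rule, so it would not by itself indicate a successful parse. For the same reason, program() returns a context object after a recovered error, so `if tree:` in the reproduction is not a success signal either. The toy model below only mimics that control flow; `ToyParser` and `RecognitionError` are hypothetical stand-ins, not ANTLR classes.

```python
class RecognitionError(Exception):
    """Stand-in for antlr4's RecognitionException (hypothetical name)."""


class ToyParser:
    """Mimics the try/except/finally shape of a generated rule method."""

    def __init__(self, fail):
        self.fail = fail
        self.trace = []   # lines a trace listener would print
        self.errors = []  # messages an error listener would receive

    def program(self):
        self.trace.append("enter program")
        try:
            if self.fail:
                raise RecognitionError("no viable alternative at input '<EOF>'")
        except RecognitionError as exc:
            # Generated code reports the error and recovers at this point
            self.errors.append(str(exc))
        finally:
            # The exit notification runs on both the success and error paths
            self.trace.append("exit program")
        return "tree"


ok = ToyParser(fail=False)
bad = ToyParser(fail=True)
ok_tree = ok.program()
bad_tree = bad.program()
```

Both toy parsers produce the same enter/exit trace and both return a tree; only the error list differs. Checking the body of program() in the generated KumirParser.py would confirm or refute whether the real parser behaves the same way.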