Version: 3.0 (TypedAST Actions)
Last Updated: February 2026
ZynPEG is a PEG parser generator with semantic actions that construct TypedAST nodes directly. Version 3.0 replaces the previous JSON command-block system with a typed action language that mirrors Rust struct/enum syntax, enabling grammars to build type-safe ASTs at parse time without an intermediate representation.
The runtime is provided by zyn_peg::runtime2 — a Packrat-memoized interpreter that achieves O(n × grammar_size) parsing time.
┌──────────────────────────────────────────────────┐
│ .zyn grammar → parse_grammar() → GrammarIR │
└──────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────┐
│ source code → GrammarInterpreter → TypedAST │
│ │
│ • Packrat memoization (O(n) per rule) │
│ • No JSON serialization │
│ • No pest VM or code generation │
└──────────────────────────────────────────────────┘
The Grammar2 struct in zyntax_embed wraps this pipeline:
let grammar = Grammar2::from_source(include_str!("my_lang.zyn"))?;
let program: TypedProgram = grammar.parse(source_code)?;

A .zyn grammar file consists of directives followed by rule definitions:
// Directives
@language { ... }
@types { ... } // optional
@builtin { ... } // optional
// Rule definitions
rule_name = modifier? { pattern }
-> action
Defines metadata about the language.
@language {
name: "ZynML",
version: "1.0",
file_extensions: [".ml", ".zynml"],
entry_point: "main",
}
| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Language name |
| `version` | string | Yes | Language version |
| `file_extensions` | string[] | No | File extensions to associate |
| `entry_point` | string | No | Function to call with --run flag |
Declares opaque extern types and their function return mappings.
@types {
opaque: [$Tensor, $Audio, $Model],
returns: {
tensor_zeros: $Tensor,
audio_load: $Audio,
}
}
- `opaque`: Types that are opaque at the language level (backed by ZRTL plugins)
- `returns`: Maps builtin alias names to their return type, overriding ZRTL signature inference
Maps grammar-level function/method/operator names to ZRTL symbol names. Grammar2::parse_with_signatures() uses this to inject extern declarations.
@builtin {
// Function aliases: grammar_name -> symbol_name
tensor_zeros: "$Tensor$zeros",
tensor_arange: "$Tensor$arange",
// Method aliases (prefixed with @)
@sum: "tensor_sum_f32",
@mean: "tensor_mean_f32",
// Operator aliases (prefixed with $@)
$@add: "tensor_add_f32",
}
rule_name = modifier? { pattern }
-> action
| Modifier | Name | Description |
|---|---|---|
| `@` | Atomic | No implicit whitespace skipping inside the rule |
| `_` | Silent | Rule is consumed but produces no value |
| `$` | Compound | Atomic but preserves inner token structure |
| `!` | Non-atomic | Forces whitespace skipping even inside atomic context |
ZynPEG uses standard PEG operators extended with named bindings:
// Sequence (whitespace is skipped between elements by default)
a ~ b ~ c
// Named binding: binds result of rule to a local variable
name:rule_ref
// Choice (ordered, first match wins)
a | b | c
// Repetition
a* // Zero or more → Vec<T>
a+ // One or more → Vec<T>
a? // Optional → Option<T>
a{n} // Exactly n
a{n,} // At least n
a{n,m} // Between n and m
// Predicates (consume no input)
&a // Positive lookahead
!a // Negative lookahead
// Grouping
(a ~ b) | c
// Literals
"keyword"
'a'..'z'
// Built-in terminals
SOI // Start of input
EOI // End of input
ANY // Any single character
ASCII_DIGIT // 0-9
ASCII_ALPHA // a-z, A-Z
ASCII_ALPHANUMERIC
ASCII_HEX_DIGIT
NEWLINE
WHITESPACE // Auto-skipped between sequence elements
COMMENT // Auto-skipped between sequence elements
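To make the operator semantics concrete, here is a minimal Rust sketch of ordered choice, a whitespace-skipping sequence, and negative lookahead over string literals. It is not the zyn_peg implementation: `skip_ws`, `lit`, `choice`, `seq`, and `not_ahead` are hypothetical names chosen for illustration, and only ASCII whitespace is skipped.

```rust
// Hypothetical miniature matcher; none of these names are zyn_peg APIs.

/// Skip ASCII whitespace starting at `pos`, returning the new position.
fn skip_ws(input: &str, pos: usize) -> usize {
    let bytes = input.as_bytes();
    let mut p = pos;
    while p < bytes.len() && bytes[p].is_ascii_whitespace() {
        p += 1;
    }
    p
}

/// Match the literal `text` at `pos`; on success return the position past it.
fn lit(input: &str, pos: usize, text: &str) -> Option<usize> {
    if input.get(pos..)?.starts_with(text) {
        Some(pos + text.len())
    } else {
        None
    }
}

/// Ordered choice `a | b | c`: alternatives are tried in order, first match wins.
fn choice(input: &str, pos: usize, alts: &[&str]) -> Option<usize> {
    alts.iter().find_map(|alt| lit(input, pos, alt))
}

/// Sequence `a ~ b ~ c`: whitespace is skipped between elements, not before the first.
fn seq(input: &str, pos: usize, parts: &[&str]) -> Option<usize> {
    let mut p = pos;
    for (i, part) in parts.iter().enumerate() {
        if i > 0 {
            p = skip_ws(input, p);
        }
        p = lit(input, p, part)?;
    }
    Some(p)
}

/// Negative lookahead `!a`: succeeds only when `a` fails, and consumes no input.
fn not_ahead(input: &str, pos: usize, text: &str) -> Option<usize> {
    match lit(input, pos, text) {
        Some(_) => None,
        None => Some(pos),
    }
}
```

Because choice is ordered, `choice("letter", 0, &["let", "letter"])` stops at the shorter `"let"` alternative; listing longer alternatives first avoids this.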
Bindings capture rule results into local variables for use in actions:
fn_def = { "def" ~ name:identifier ~ "(" ~ params:fn_params? ~ ")" ~ body:block }
// ^^^^^ ^^^^^^ ^^^^
// bound to 'name' bound to 'params' bound to 'body'
Binding types follow from the pattern:
- `name:rule` → `T` (the rule's return type)
- `items:rule*` → `Vec<T>`
- `opt:rule?` → `Option<T>`
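The three binding shapes correspond to ordinary Rust types. The sketch below is illustrative only: `Param`, `single`, `repeated`, and `unwrap_params` are hypothetical stand-ins, and `unwrap_params` shows what an action expression such as `params.unwrap_or([])` would amount to in plain Rust.

```rust
// Hypothetical node type standing in for a rule's return type T.
#[derive(Debug, Clone, PartialEq)]
struct Param {
    name: String,
}

/// `name:rule` binds a single value of the rule's return type.
fn single(name: &str) -> Param {
    Param { name: name.to_string() }
}

/// `items:rule*` binds every repetition; zero matches give an empty Vec.
fn repeated(names: &[&str]) -> Vec<Param> {
    names.iter().map(|n| single(n)).collect()
}

/// `opt:rule?` binds Some(value) on a match and None otherwise.
/// Falling back to an empty list mirrors `params.unwrap_or([])`.
fn unwrap_params(params: Option<Vec<Param>>) -> Vec<Param> {
    params.unwrap_or_default()
}
```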
Actions follow the -> arrow and describe how to construct TypedAST nodes from the parsed bindings. There are five action kinds.
The primary action type. Syntax mirrors Rust struct/enum construction:
rule_name = { "def" ~ name:identifier ~ "(" ~ params:fn_params? ~ ")" ~ ":" ~ ret:type_expr ~ body:block }
-> TypedDeclaration::Function {
name: intern(name),
params: params.unwrap_or([]),
return_type: ret,
body: Some(body),
is_async: false,
}
The type path (TypedDeclaration::Function) identifies the enum variant or struct to construct. Field values are ExprIR expressions (see Action Expressions below).
For wrapper rules that just select between alternatives, -> binding returns the binding directly:
// Choice rule: returns whichever alternative matched
statement = { let_stmt | assign_stmt | expr_stmt | return_stmt }
-> stmt // if the binding is named 'stmt'
// Or with implicit binding from a single-rule pattern:
factor = { inner:paren_expr | inner:number }
-> inner
When a rule has no action, the last successfully bound value is returned.
Calls a built-in helper that operates on bindings:
// prepend_list: combines first element with rest Vec into Vec
fn_params = { first:fn_param ~ rest:fn_param_comma* }
-> prepend_list(first, rest)
// fold_left_ops: builds left-associative binary expression tree
additive_expr = { first:multiplicative_expr ~ rest:additive_rest* }
-> fold_left_ops(first, rest)
// intern: interns a string into the arena
type_param = { name:identifier ~ (":" ~ type_bounds)? }
-> intern(name)
Available helpers:
| Helper | Signature | Description |
|---|---|---|
| `intern(s)` | `(text) → InternedString` | Intern a string into the arena |
| `prepend_list(first, rest)` | `(T, Vec<T>) → Vec<T>` | Prepend first element to rest |
| `fold_left_ops(first, rest)` | `(Expr, Vec<(op, Expr)>) → Expr` | Build left-associative binary tree |
| `make_pair(op, operand)` | `(op, Expr) → (op, Expr)` | Package an operator and operand for fold_left_ops |
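As a rough illustration of what the list and fold helpers compute, the signatures in the table can be sketched in plain Rust. The `Expr` enum here is a hypothetical stand-in for the real TypedAST expression type, and operators are modeled as strings purely for the example.

```rust
// Hypothetical expression type; the real TypedAST nodes carry more fields.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int(i64),
    Binary { op: String, left: Box<Expr>, right: Box<Expr> },
}

/// (T, Vec<T>) -> Vec<T>: put `first` in front of `rest`.
fn prepend_list<T>(first: T, rest: Vec<T>) -> Vec<T> {
    let mut out = Vec::with_capacity(rest.len() + 1);
    out.push(first);
    out.extend(rest);
    out
}

/// (op, Expr) -> (op, Expr): package one operator with its right operand.
fn make_pair(op: &str, operand: Expr) -> (String, Expr) {
    (op.to_string(), operand)
}

/// (Expr, Vec<(op, Expr)>) -> Expr: fold pairs into a left-associative tree,
/// so `1 - 2 - 3` becomes `(1 - 2) - 3`.
fn fold_left_ops(first: Expr, rest: Vec<(String, Expr)>) -> Expr {
    rest.into_iter().fold(first, |left, (op, right)| Expr::Binary {
        op,
        left: Box::new(left),
        right: Box::new(right),
    })
}
```

With an empty `rest`, `fold_left_ops` returns `first` unchanged, which is why a rule such as `additive_expr` can also match a lone operand.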
Dispatches to different construct actions based on a string binding value:
stmt = { kind:("let" | "const") ~ name:identifier ~ "=" ~ value:expr }
-> match kind {
"let" => TypedStatement::Let { name: intern(name), value: value },
"const" => TypedStatement::Const { name: intern(name), value: value },
}
Branches on a boolean expression:
fn_decl = { "def" ~ name:identifier ~ "(" ~ params:fn_params? ~ ")" ~ ret:(":" ~ type_expr)? ~ body:block }
-> if ret.is_some() {
TypedDeclaration::Function { name: intern(name), params: params.unwrap_or([]), return_type: ret, body: Some(body) }
} else {
TypedDeclaration::Procedure { name: intern(name), params: params.unwrap_or([]), body: body }
}
Action field values are ExprIR expressions. The following forms are available:
name // value of the binding 'name'
intern(name) // intern string → InternedString
prepend_list(first, rest) // Vec construction helper
Some(value) // wrap in Option::Some
Box::new(expr) // heap-box a value
params.unwrap_or([]) // Option<T> → T, using [] as default
opt.is_some() // bool
binding.text // get matched text as String
binding.span // get Span for the match
Inline struct or enum variant expressions within a field value:
-> TypedExpression::Binary {
left: Box::new(TypedExpression::Variable { name: intern(obj) }),
op: op,
right: right,
}
path: [intern(name)] // single-element Vec
declarations: [] // empty Vec
"string"
42
true / false
a == b
a && b
a || b
import_simple = { "import" ~ name:identifier }
-> TypedDeclaration::Import {
path: [intern(name)],
}
fn_def = {
"def" ~ name:identifier
~ "(" ~ params:fn_params? ~ ")"
~ ":" ~ ret:type_expr
~ body:block
}
-> TypedDeclaration::Function {
name: intern(name),
params: params.unwrap_or([]),
return_type: ret,
body: Some(body),
is_async: false,
}
// Parameters accumulate via prepend_list
fn_params = { first:fn_param ~ rest:fn_param_comma* }
-> prepend_list(first, rest)
fn_param_comma = { "," ~ param:fn_param }
-> param
fn_param = { name:identifier ~ ":" ~ ty:type_expr }
-> TypedParameter {
name: intern(name),
ty: ty,
}
// additive_rest packages (op, operand) pairs for fold_left_ops
additive_expr = { first:multiplicative_expr ~ rest:additive_rest* }
-> fold_left_ops(first, rest)
additive_rest = { op:additive_op ~ operand:multiplicative_expr }
-> make_pair(op, operand)
additive_op = @{ "+" | "-" }
// Choice rule: delegates to whichever alternative matched
type_expr = { optional_type | fn_type | generic_type | primitive_type | simple_type }
// Each alternative handles its own action; type_expr has no explicit action
// (implicitly passes through the result of the matched alternative)
struct_field = { name:identifier ~ ":" ~ ty:type_expr }
-> TypedField {
name: intern(name),
ty: ty,
}
struct_fields = { first:struct_field ~ rest:struct_field_comma* ~ ","? }
-> prepend_list(first, rest)
comparison_with_op = { left:range_expr ~ op:comparison_op ~ right:range_expr }
-> TypedExpression::Binary {
op: op,
left: Box::new(left),
right: Box::new(right),
}
program = { SOI ~ items:top_level_items ~ EOI }
-> TypedProgram {
declarations: items,
}
top_level_items = { decl:top_level_item* }
-> decl // Vec<TypedDeclaration> collected by the repeat
top_level_item = { fn_def | struct_def | import_stmt | ... }
// passthrough — no action needed
Grammar2 (in zyntax_embed) is the primary interface for using a .zyn grammar at runtime:
use zyntax_embed::Grammar2;
// Load from embedded grammar source
let grammar = Grammar2::from_source(include_str!("my_lang.zyn"))?;
// Parse source → TypedProgram (direct, no signatures)
let program = grammar.parse(source_code)?;
// Parse with ZRTL plugin signatures for proper extern type resolution
let program = grammar.parse_with_signatures(source, filename, &plugin_sigs)?;
// Metadata access
grammar.name() // → &str
grammar.version() // → &str
grammar.file_extensions() // → &[String]
grammar.entry_point() // → Option<&str>
grammar.grammar_ir() // → &GrammarIR (for inspection)

parse_with_signatures additionally injects extern function declarations for all entries in the @builtin directive, with types resolved from ZRTL plugin signatures or @types.returns overrides.
The old JSON command-block syntax is still parsed and executed, but is deprecated. New grammars should use TypedAST actions exclusively.
Old (JSON):
number = @{ ASCII_DIGIT+ }
-> TypedExpression {
"get_text": true,
"parse_int": true,
"define": "int_literal",
"args": { "value": "$result" }
}
New (TypedAST):
number = @{ ASCII_DIGIT+ }
-> TypedExpression::IntLiteral {
value: number,
}
Key differences:
| JSON (v2) | TypedAST (v3) |
|---|---|
| `"$1"`, `"$2"` positional references | Named bindings: `name:rule` |
| `"define": "node_type"` with args dict | `TypedAST::Variant { field: value }` |
| `"commands": [...]` sequential blocks | Inline expressions in field values |
| `"get_text": true` / `"parse_int": true` | Atomic rule (`@`) captures text automatically |
| `"fold_binary": { ... }` | `fold_left_ops(first, rest)` helper |
| `"store"` / `"load"` temporaries | Direct binding references |
The runtime2 interpreter uses Packrat memoization keyed on (position, rule_id). Each execute_rule call:
- Checks `state.check_memo(rule_id)` and returns the cached `Success`/`Failure` immediately on a hit
- Marks the entry as `InProgress` to detect left-recursive cycles
- Executes the rule pattern and action
- Stores the result via `state.store_memo_at(start_pos, rule_id, entry)`
This ensures each (position, rule) pair is evaluated at most once, converting exponential PEG backtracking to O(n × grammar_size).
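The four steps above can be sketched as a small memo table in Rust. This is a simplified model and not the runtime2 code: `ParserState`, `MemoEntry`, `RuleFn`, and the `evaluations` counter are hypothetical, rules are plain function pointers, and a successful entry stores only the end position instead of a parsed value.

```rust
use std::collections::HashMap;

// Hypothetical memo entry; the real interpreter also caches the parsed value.
#[derive(Debug, Clone, PartialEq)]
enum MemoEntry {
    InProgress,
    Success { end: usize },
    Failure,
}

// A rule body: Ok(Some(end)) on match, Ok(None) on failure, Err on a cycle.
type RuleFn = fn(&mut ParserState<'_>, usize) -> Result<Option<usize>, String>;

struct ParserState<'a> {
    input: &'a str,
    memo: HashMap<(usize, usize), MemoEntry>, // keyed on (position, rule_id)
    evaluations: usize,                        // counts real (uncached) runs
}

impl<'a> ParserState<'a> {
    fn new(input: &'a str) -> Self {
        ParserState { input, memo: HashMap::new(), evaluations: 0 }
    }

    fn execute_rule(
        &mut self,
        pos: usize,
        rule_id: usize,
        body: RuleFn,
    ) -> Result<Option<usize>, String> {
        // 1. Return the cached outcome on a hit.
        match self.memo.get(&(pos, rule_id)) {
            Some(MemoEntry::Success { end }) => return Ok(Some(*end)),
            Some(MemoEntry::Failure) => return Ok(None),
            Some(MemoEntry::InProgress) => {
                return Err("left recursion detected".to_string())
            }
            None => {}
        }
        // 2. Mark the entry so a re-entrant call at the same key is caught.
        self.memo.insert((pos, rule_id), MemoEntry::InProgress);
        // 3. Run the rule body.
        self.evaluations += 1;
        let result = body(self, pos)?;
        // 4. Store the final outcome for later lookups.
        let entry = match result {
            Some(end) => MemoEntry::Success { end },
            None => MemoEntry::Failure,
        };
        self.memo.insert((pos, rule_id), entry);
        Ok(result)
    }
}

/// Example rule: one or more ASCII digits.
fn digits(state: &mut ParserState<'_>, pos: usize) -> Result<Option<usize>, String> {
    let start = pos.min(state.input.len());
    let len = state.input.as_bytes()[start..]
        .iter()
        .take_while(|b| b.is_ascii_digit())
        .count();
    Ok(if len > 0 { Some(pos + len) } else { None })
}

/// Example rule that calls itself before consuming any input.
fn left_recursive(state: &mut ParserState<'_>, pos: usize) -> Result<Option<usize>, String> {
    state.execute_rule(pos, 1, left_recursive)
}
```

In this model a second call to `execute_rule(0, 0, digits)` is answered from the table without running `digits` again, and `left_recursive` fails with an error because it re-enters its own `InProgress` entry.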
- Use atomic rules for tokens: mark lexical rules with `@` so the matched text is captured automatically and whitespace is not skipped
- Use `_` (silent) for delimiters: punctuation rules like commas, semicolons, and brackets rarely need to appear in the AST
- Use `prepend_list` for lists: pair a `first:rule` binding with `rest:rule_comma*` (where `rule_comma = { "," ~ item:rule } -> item`) and combine with `prepend_list(first, rest)`
- Use `fold_left_ops` for binary operators: pair with `make_pair(op, operand)` in the rest rule for correct left-associativity
- Use `intern()` for all identifier strings: interns into the global arena for cheap equality and deduplication
- Prefer passthrough for choice rules: if a rule just selects between alternatives, each alternative can have its own action; the choice rule needs no action
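To show why interning gives cheap equality and deduplication, here is a minimal interner sketch in Rust. It is an assumption-laden model: the `Interner` type, its `resolve` method, and the `u32` representation of `InternedString` are hypothetical, and it is a local table, not the global arena the real `intern()` helper writes to.

```rust
use std::collections::HashMap;

// Hypothetical handle type: a small integer instead of an owned string.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct InternedString(u32);

#[derive(Default)]
struct Interner {
    ids: HashMap<String, InternedString>, // text -> handle (deduplication)
    strings: Vec<String>,                 // handle -> text (lookup)
}

impl Interner {
    /// Return the existing handle for `text`, or allocate a new one.
    fn intern(&mut self, text: &str) -> InternedString {
        if let Some(&id) = self.ids.get(text) {
            return id;
        }
        let id = InternedString(self.strings.len() as u32);
        self.strings.push(text.to_string());
        self.ids.insert(text.to_string(), id);
        id
    }

    /// Recover the original text for a handle produced by this interner.
    fn resolve(&self, id: InternedString) -> &str {
        &self.strings[id.0 as usize]
    }
}
```

Two identifiers with the same spelling receive the same handle, so comparing them is an integer comparison and the text is stored once.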
Common errors and solutions:
| Error | Cause | Solution |
|---|---|---|
| `Parse error at L:C: expected [...]` | PEG match failure | Check pattern syntax and token spelling |
| `unknown rule: foo` | Rule referenced but not defined | Define the missing rule |
| `binding 'name' not found` | Action references a binding not in the pattern | Add name:rule to the pattern |
| `left recursion detected` | Rule calls itself without consuming input | Refactor to use rest* style (no direct left recursion) |
| `UnexpectedResult` | Entry rule did not return `TypedProgram` | Ensure the program rule action returns `TypedProgram { ... }` |
- ZYN_PARSER_IMPLEMENTATION.md: Implementation details of the grammar parser and interpreter
- BYTECODE_FORMAT_SPEC.md: HIR/SSA bytecode format produced after parsing
- `crates/zyn_peg/src/grammar/ir.rs`: `GrammarIR`, `RuleIR`, `ActionIR`, `ExprIR`, `PatternIR` definitions
- `crates/zyn_peg/src/runtime2/interpreter.rs`: `GrammarInterpreter` and Packrat memoization
- `crates/zynml/ml.zyn`: Full reference grammar for ZynML showing all action patterns in use