05 Semantic Actions
Semantic actions define how grammar rules build TypedAST nodes from parsed input. Each rule can have an action that follows the -> arrow. Actions use a typed syntax that mirrors Rust struct/enum construction rather than JSON commands.
Before actions can reference parsed values, those values must be bound to names in the pattern:
// Bind results to local variables for use in the action
fn_def = { "def" ~ name:identifier ~ "(" ~ params:fn_params? ~ ")" ~ ":" ~ ret:type_expr ~ body:block }
// bound names: 'name', 'params', 'ret', 'body'
The binding syntax is variable_name:rule_name. The type of the binding follows the pattern:
| Pattern | Binding type |
|---|---|
| name:rule | T (rule's return value) |
| items:rule* | Vec<T> |
| opt:rule? | Option<T> |
This replaces the old $1, $2 positional references. Named bindings make actions self-documenting.
There are five ways to write an action. Each follows the -> arrow.
The primary form constructs a node. Its syntax mirrors Rust struct/enum construction:
fn_def = { "def" ~ name:identifier ~ "(" ~ params:fn_params? ~ ")" ~ ":" ~ ret:type_expr ~ body:block }
-> TypedDeclaration::Function {
name: intern(name),
params: params.unwrap_or([]),
return_type: ret,
body: Some(body),
is_async: false,
}
The type path identifies the TypedAST variant to construct. Field values are action expressions (see below).
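As a mental model, the fn_def action above behaves roughly like the hand-written Rust function below. This is a sketch under assumptions: Type, Block, TypedParameter and TypedDeclaration are simplified hypothetical stand-ins (Chapter 6 covers the real TypedAST types), build_fn_def is an illustrative name, and intern(name) is modeled as a plain String.

```rust
// Hypothetical, simplified stand-ins for TypedAST types (not the real definitions).
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Named(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Block {
    pub stmts: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TypedParameter {
    pub name: String,
    pub ty: Type,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TypedDeclaration {
    Function {
        name: String,
        params: Vec<TypedParameter>,
        return_type: Type,
        body: Option<Block>,
        is_async: bool,
    },
}

// Each argument has the type given by the binding table:
// name:identifier -> String, params:fn_params? -> Option<Vec<_>>,
// ret:type_expr -> Type, body:block -> Block.
pub fn build_fn_def(
    name: String,
    params: Option<Vec<TypedParameter>>,
    ret: Type,
    body: Block,
) -> TypedDeclaration {
    TypedDeclaration::Function {
        name, // intern(name) would yield an interned handle; a String stands in here
        params: params.unwrap_or_default(), // same role as params.unwrap_or([])
        return_type: ret,
        body: Some(body),
        is_async: false,
    }
}
```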
Passthrough is for wrapper rules that just select between alternatives:
// No explicit action: implicitly passes through the last matched value
statement = { let_stmt | assign_stmt | return_stmt | expr_stmt }
// Or explicitly:
factor = { inner:paren_expr | inner:number }
-> inner
A helper call passes bindings as arguments to a helper function:
// Combine first + rest into a Vec
fn_params = { first:fn_param ~ rest:fn_param_comma* }
-> prepend_list(first, rest)
// Build a left-associative binary expression tree
additive_expr = { first:multiplicative_expr ~ rest:additive_rest* }
-> fold_left_ops(first, rest)
// Intern a string identifier
type_param = { name:identifier }
-> intern(name)
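Interning maps each distinct identifier string to a small handle, so equal names compare cheaply. This chapter does not specify how intern is implemented; the sketch below assumes a simple table-based interner, and Interner and Symbol are hypothetical names.

```rust
use std::collections::HashMap;

// Hypothetical handle type: a cheap, copyable index into the interner's table.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

// Hypothetical interner: one table entry per distinct string.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    // Returns the existing symbol for `s`, or allocates a new one.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_string());
        self.ids.insert(s.to_string(), sym);
        sym
    }

    // Recovers the original text for a symbol.
    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}
```

Interning the same identifier twice returns the same Symbol, which is why identifiers should always go through intern rather than being stored as raw text.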
A match dispatches to different constructs based on a string binding:
visibility_kw = { "pub" | "priv" }
decl = { vis:visibility_kw ~ "def" ~ name:identifier ~ ... }
-> match vis {
"pub" => TypedDeclaration::Function { visibility: Visibility::Public, name: intern(name), ... },
"priv" => TypedDeclaration::Function { visibility: Visibility::Private, name: intern(name), ... },
}
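The match form behaves much like a Rust match over a &str. A sketch of the equivalent logic with a hypothetical Visibility enum and function name; Rust requires a catch-all arm, which the grammar itself does not need because visibility_kw only ever matches "pub" or "priv".

```rust
// Hypothetical stand-in for the Visibility enum used in the action.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Public,
    Private,
}

// Mirrors `match vis { "pub" => ..., "priv" => ... }`.
// The catch-all arm reports unexpected text instead of panicking.
pub fn visibility_from_keyword(vis: &str) -> Result<Visibility, String> {
    match vis {
        "pub" => Ok(Visibility::Public),
        "priv" => Ok(Visibility::Private),
        other => Err(format!("unexpected visibility keyword: {other}")),
    }
}
```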
An if expression branches on a boolean condition:
fn_or_proc = { "def" ~ name:identifier ~ "(" ~ params:fn_params? ~ ")" ~ ret:(":" ~ type_expr)? ~ body:block }
-> if ret.is_some() {
TypedDeclaration::Function {
name: intern(name),
params: params.unwrap_or([]),
return_type: ret,
body: Some(body),
}
} else {
TypedDeclaration::Procedure {
name: intern(name),
params: params.unwrap_or([]),
body: body,
}
}
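The if form chooses a variant based on whether an optional binding matched. A sketch with a hypothetical Decl type and function name (bodies omitted for brevity); in Rust, matching on the Option is the idiomatic equivalent of if ret.is_some(), and it also hands over the inner value:

```rust
// Hypothetical, simplified declaration type (bodies omitted).
#[derive(Debug, Clone, PartialEq)]
pub enum Decl {
    Function { name: String, params: Vec<String>, return_type: String },
    Procedure { name: String, params: Vec<String> },
}

// `ret` is Some(..) exactly when the optional ": type_expr" group matched.
pub fn fn_or_proc(name: &str, params: Option<Vec<String>>, ret: Option<String>) -> Decl {
    let params = params.unwrap_or_default();
    match ret {
        Some(return_type) => Decl::Function { name: name.to_string(), params, return_type },
        None => Decl::Procedure { name: name.to_string(), params },
    }
}
```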
Field values in actions are expressions that can reference bindings, call helpers, and construct nested nodes.
name // value of the binding 'name'
params // Vec<TypedParameter> collected from 'params:fn_param*'
ret // Option<Type> from 'ret:type_expr?'
intern(name) // String → InternedString (always use for identifiers)
Some(value) // wrap in Option::Some
Box::new(expr) // heap-box a value
prepend_list(first, rest) // T + Vec<T> → Vec<T>
params.unwrap_or([]) // Option<Vec<T>> → Vec<T>, with [] as default
opt.is_some() // Option<T> → bool
binding.text // get matched text as String
binding.span // get the source Span
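Most of these forms are ordinary Rust Option and Box operations. binding.text and binding.span belong to the framework; the sketch below models a binding as a hypothetical Binding struct holding the matched slice and its byte offsets, which may differ from the real Span type:

```rust
// Hypothetical byte-offset span; the framework's Span type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

// Hypothetical view of a binding: the matched text plus where it came from.
#[derive(Debug, Clone, PartialEq)]
pub struct Binding {
    pub text: String,
    pub span: Span,
}

// Builds a binding for source[start..end], exposing what
// `binding.text` and `binding.span` would return.
pub fn bind(source: &str, start: usize, end: usize) -> Binding {
    Binding {
        text: source[start..end].to_string(),
        span: Span { start, end },
    }
}
```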
Fields can contain inline TypedAST node construction:
comparison_with_op = { left:range_expr ~ op:comparison_op ~ right:range_expr }
-> TypedExpression::Binary {
op: op,
left: Box::new(left),
right: Box::new(right),
}
pipe_call = { callee:identifier ~ "(" ~ args:call_args? ~ ")" }
-> TypedExpression::Call {
callee: Box::new(TypedExpression::Variable { name: intern(callee) }),
args: args.unwrap_or([]),
}
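Box::new is required for child expressions because a Rust enum cannot contain itself by value; each recursive field must sit behind a pointer so the type has a known size. A minimal sketch with a hypothetical Expr standing in for TypedExpression (Vec<Expr> needs no Box, since a Vec already stores its elements on the heap):

```rust
// Hypothetical stand-in for TypedExpression.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Variable { name: String },
    Binary { op: String, left: Box<Expr>, right: Box<Expr> },
    Call { callee: Box<Expr>, args: Vec<Expr> },
}

// Mirrors the comparison_with_op action: box both operands.
pub fn binary(op: &str, left: Expr, right: Expr) -> Expr {
    Expr::Binary { op: op.to_string(), left: Box::new(left), right: Box::new(right) }
}

// Mirrors pipe_call: wrap the callee name in a Variable node, then box it.
pub fn call(callee: &str, args: Option<Vec<Expr>>) -> Expr {
    Expr::Call {
        callee: Box::new(Expr::Variable { name: callee.to_string() }),
        args: args.unwrap_or_default(),
    }
}
```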
Literal values can also be written directly as field values:
path: [intern(name)] // single-element Vec
declarations: [] // empty Vec
is_async: false // boolean literal
visibility: Visibility::Public // enum variant
The standard pattern for comma-separated lists uses prepend_list and a helper rule for the comma:
// The params list rule
fn_params = { first:fn_param ~ rest:fn_param_comma* }
-> prepend_list(first, rest)
// Strip the comma, return the param
fn_param_comma = { "," ~ param:fn_param }
-> param
// Build a single param
fn_param = { name:identifier ~ ":" ~ ty:type_expr }
-> TypedParameter {
name: intern(name),
ty: ty,
}
Usage in the parent rule:
fn_def = { "def" ~ name:identifier ~ "(" ~ params:fn_params? ~ ")" ~ ... }
-> TypedDeclaration::Function {
name: intern(name),
params: params.unwrap_or([]), // Option<Vec<TypedParameter>> → Vec<TypedParameter>
...
}
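The list pattern comes down to two operations: join the first element with the collected rest, and turn a missing list into an empty one. A sketch of both in Rust; the bodies are assumptions inferred from how prepend_list and unwrap_or([]) are described in this chapter, and params_or_empty is a hypothetical name:

```rust
// T + Vec<T> -> Vec<T>: the first item followed by the collected rest.
pub fn prepend_list<T>(first: T, rest: Vec<T>) -> Vec<T> {
    let mut items = Vec::with_capacity(rest.len() + 1);
    items.push(first);
    items.extend(rest);
    items
}

// Models `params:fn_params?` followed by `params.unwrap_or([])`.
pub fn params_or_empty<T>(params: Option<Vec<T>>) -> Vec<T> {
    params.unwrap_or_default()
}
```

For def f(a: Int, b: Bool, c: Str), first is a and fn_param_comma has already stripped the commas, so rest is [b, c].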
For left-associative binary operators, use fold_left_ops together with a rest rule whose action calls make_pair:
additive_expr = { first:multiplicative_expr ~ rest:additive_rest* }
-> fold_left_ops(first, rest)
// Package each (op, operand) pair
additive_rest = { op:additive_op ~ operand:multiplicative_expr }
-> make_pair(op, operand)
// Atomic rule captures op text automatically
additive_op = @{ "+" | "-" }
For input a + b - c this builds:
Binary(-, Binary(+, a, b), c)
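Assuming fold_left_ops folds the pairs from left to right, as the a + b - c example shows, its effect can be sketched as an ordinary iterator fold. Expr is a hypothetical stand-in for TypedExpression and make_pair is modeled as a plain tuple:

```rust
// Hypothetical stand-in for TypedExpression.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(String),
    Binary(String, Box<Expr>, Box<Expr>),
}

// make_pair(op, operand) modeled as a tuple.
pub fn make_pair(op: &str, operand: Expr) -> (String, Expr) {
    (op.to_string(), operand)
}

// Left fold: each pair wraps everything built so far as its left operand.
pub fn fold_left_ops(first: Expr, rest: Vec<(String, Expr)>) -> Expr {
    rest.into_iter().fold(first, |acc, (op, rhs)| {
        Expr::Binary(op, Box::new(acc), Box::new(rhs))
    })
}
```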
Full precedence chain:
expr = { e:pipe_expr }
-> e
pipe_expr = { first:or_expr ~ ... } // lowest precedence
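or_expr = { first:and_expr ~ rest:or_rest* }
-> fold_left_ops(first, rest)
or_rest = { op:or_op ~ operand:and_expr }
-> make_pair(op, operand)
or_op = @{ "||" }
and_expr = { first:comparison_expr ~ rest:and_rest* }
-> fold_left_ops(first, rest)
and_rest = { op:and_op ~ operand:comparison_expr }
-> make_pair(op, operand)
and_op = @{ "&&" }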
comparison_expr = { comparison_with_op | comparison_no_op }
comparison_with_op = { left:additive_expr ~ op:comparison_op ~ right:additive_expr }
-> TypedExpression::Binary { op: op, left: Box::new(left), right: Box::new(right) }
comparison_no_op = { inner:additive_expr }
-> inner
additive_expr = { first:multiplicative_expr ~ rest:additive_rest* }
-> fold_left_ops(first, rest)
multiplicative_expr = { first:unary_expr ~ rest:multiplicative_rest* }
-> fold_left_ops(first, rest)
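Precedence falls out of the layering: each level parses its operands with the next, tighter-binding level and then left-folds its own operators. The hand-written recursive-descent sketch below covers only the additive and multiplicative levels, with alphanumeric tokens as operands. It is an analogue for intuition, not code generated from the grammar, and Parser and Expr are hypothetical:

```rust
// Hypothetical expression type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Atom(String),
    Binary(char, Box<Expr>, Box<Expr>),
}

pub struct Parser<'a> {
    chars: std::iter::Peekable<std::str::Chars<'a>>,
}

impl<'a> Parser<'a> {
    pub fn new(src: &'a str) -> Self {
        Parser { chars: src.chars().peekable() }
    }

    fn skip_ws(&mut self) {
        while matches!(self.chars.peek(), Some(c) if c.is_whitespace()) {
            self.chars.next();
        }
    }

    // Operand: a run of ASCII letters/digits (stands in for unary_expr).
    fn atom(&mut self) -> Option<Expr> {
        self.skip_ws();
        let mut text = String::new();
        while let Some(&c) = self.chars.peek() {
            if c.is_ascii_alphanumeric() {
                text.push(c);
                self.chars.next();
            } else {
                break;
            }
        }
        if text.is_empty() { None } else { Some(Expr::Atom(text)) }
    }

    // Shared shape of additive_expr / multiplicative_expr:
    // first:operand ~ rest:(op ~ operand)*  -> fold_left_ops(first, rest)
    fn level(&mut self, ops: &[char], operand: fn(&mut Self) -> Option<Expr>) -> Option<Expr> {
        let mut acc = operand(self)?;
        loop {
            self.skip_ws();
            match self.chars.peek() {
                Some(&c) if ops.contains(&c) => {
                    self.chars.next();
                    let rhs = operand(self)?;
                    acc = Expr::Binary(c, Box::new(acc), Box::new(rhs));
                }
                _ => return Some(acc),
            }
        }
    }

    pub fn multiplicative(&mut self) -> Option<Expr> {
        self.level(&['*', '/', '%'], Self::atom)
    }

    pub fn additive(&mut self) -> Option<Expr> {
        self.level(&['+', '-'], Self::multiplicative)
    }
}
```

Parsing a * b + c yields Binary('+', Binary('*', a, b), c), while a + b * c yields Binary('+', a, Binary('*', b, c)): the multiplicative level always finishes before the additive level sees its operator.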
The postfix expression pattern builds a chain of operations:
postfix_expr = { base:primary_expr ~ suffix:postfix_suffix* }
-> fold_left_ops(base, suffix)
postfix_suffix = { suffix_call | suffix_method | suffix_field | suffix_index }
suffix_call = { "(" ~ args:call_args? ~ ")" }
-> TypedExpression::Suffix::Call { args: args.unwrap_or([]) }
suffix_field = { "." ~ name:identifier ~ !"(" }
-> TypedExpression::Suffix::Field { name: intern(name) }
suffix_method = { "." ~ name:identifier ~ "(" ~ args:call_args? ~ ")" }
-> TypedExpression::Suffix::Method { name: intern(name), args: args.unwrap_or([]) }
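Folding suffixes works like folding operators: each suffix wraps everything built so far. This chapter does not describe how fold_left_ops represents suffix nodes internally, so the sketch below assumes hypothetical Suffix and Expr types and shows only the left-to-right effect, using obj.items.len() as input:

```rust
// Hypothetical suffix type, one variant per suffix rule.
#[derive(Debug, Clone, PartialEq)]
pub enum Suffix {
    Call { args: Vec<Expr> },
    Field { name: String },
    Method { name: String, args: Vec<Expr> },
}

// Hypothetical expression type produced after folding.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Variable { name: String },
    Call { callee: Box<Expr>, args: Vec<Expr> },
    Field { object: Box<Expr>, name: String },
    MethodCall { receiver: Box<Expr>, name: String, args: Vec<Expr> },
}

// Applies each suffix to everything built so far, left to right.
pub fn apply_suffixes(base: Expr, suffixes: Vec<Suffix>) -> Expr {
    suffixes.into_iter().fold(base, |acc, suffix| match suffix {
        Suffix::Call { args } => Expr::Call { callee: Box::new(acc), args },
        Suffix::Field { name } => Expr::Field { object: Box::new(acc), name },
        Suffix::Method { name, args } => Expr::MethodCall { receiver: Box::new(acc), name, args },
    })
}
```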
Atomic rules (@) capture their matched text automatically. In the action, the rule's own name refers to that text:
// The @ modifier makes 'integer_literal' atomic; its name is bound to the matched text
integer_literal = @{ "-"? ~ ASCII_DIGIT+ }
-> TypedExpression::IntLiteral { value: integer_literal } // the matched text, e.g. "-42"
string_literal = @{ "\"" ~ (!"\"" ~ ANY)* ~ "\"" }
-> TypedExpression::StringLiteral { value: string_literal }
bool_literal = @{ "true" | "false" }
-> TypedExpression::BoolLiteral { value: bool_literal }
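An atomic binding holds raw text, so something still has to turn that text into a value; where the framework does this is outside the scope of this chapter. The sketch below only illustrates what the conversion involves for the three literal rules above, using hypothetical function names. Note that the text captured by string_literal includes the surrounding quotes:

```rust
// integer_literal matches `"-"? ~ ASCII_DIGIT+`, e.g. "-42".
pub fn int_value(text: &str) -> Result<i64, std::num::ParseIntError> {
    text.parse::<i64>()
}

// string_literal's text still includes the quotes, e.g. "\"hi\"".
pub fn string_value(text: &str) -> Option<&str> {
    text.strip_prefix('"')?.strip_suffix('"')
}

// bool_literal matches exactly "true" or "false".
pub fn bool_value(text: &str) -> Option<bool> {
    match text {
        "true" => Some(true),
        "false" => Some(false),
        _ => None,
    }
}
```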
import_simple = { "import" ~ name:identifier }
-> TypedDeclaration::Import {
path: [intern(name)],
}
import_aliased = { "import" ~ path:module_path ~ "as" ~ alias:identifier }
-> TypedDeclaration::Import {
path: [intern(path)],
alias: Some(intern(alias)),
}
// @-rule captures the full dotted path as a single string
module_path = @{ identifier ~ ("." ~ identifier)* }
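Because module_path is atomic, the action receives the whole dotted path (for example std.io.file) as one string, which is why import_aliased interns it as a single path element. The hand-written sketch below reproduces that matching logic, assuming identifier is defined as in the complete example later in this chapter (ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")*); the function names are hypothetical:

```rust
// Byte length of an identifier at the start of `s`, or None if there is none.
fn identifier_len(s: &str) -> Option<usize> {
    let mut chars = s.char_indices();
    match chars.next() {
        Some((_, c)) if c.is_ascii_alphabetic() => {}
        _ => return None,
    }
    let end = chars
        .find(|&(_, c)| !(c.is_ascii_alphanumeric() || c == '_'))
        .map(|(i, _)| i)
        .unwrap_or(s.len());
    Some(end)
}

// Longest prefix matching identifier ~ ("." ~ identifier)*.
pub fn match_module_path(input: &str) -> Option<&str> {
    let mut end = identifier_len(input)?;
    loop {
        match input[end..].strip_prefix('.').and_then(identifier_len) {
            // Consume "." plus the identifier after it.
            Some(len) => end += 1 + len,
            // A "." not followed by an identifier is left unconsumed.
            None => return Some(&input[..end]),
        }
    }
}
```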
struct_field = { name:identifier ~ ":" ~ ty:type_expr }
-> TypedField {
name: intern(name),
ty: ty,
}
struct_fields = { first:struct_field ~ rest:struct_field_comma* ~ ","? }
-> prepend_list(first, rest)
struct_field_comma = { "," ~ field:struct_field }
-> field
struct_def = { "struct" ~ name:identifier ~ "{" ~ fields:struct_fields? ~ "}" }
-> TypedDeclaration::Struct {
name: intern(name),
fields: fields.unwrap_or([]),
}
// Optional return type: ": type_expr"
fn_def = {
"def" ~ name:identifier
~ "(" ~ params:fn_params? ~ ")"
~ ret:(":" ~ ret_ty:type_expr)?
~ body:block
}
-> if ret.is_some() {
TypedDeclaration::Function {
name: intern(name),
params: params.unwrap_or([]),
return_type: ret,
body: Some(body),
}
} else {
TypedDeclaration::Function {
name: intern(name),
params: params.unwrap_or([]),
return_type: Type::Unit,
body: Some(body),
}
}
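The two branches above differ only in return_type. In plain Rust the same decision is a single Option::unwrap_or; the sketch below assumes ret has been reduced to an Option<Type> holding the matched type_expr, and Type and return_type_or_unit are hypothetical names.

```rust
// Hypothetical, simplified type representation.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Unit,
    Named(String),
}

// Same decision as the if/else above: the declared type, or Type::Unit when omitted.
pub fn return_type_or_unit(ret: Option<Type>) -> Type {
    ret.unwrap_or(Type::Unit)
}
```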
@language { name: "MyLang", version: "1.0" }
// Entry
program = { SOI ~ items:top_level_item* ~ EOI }
-> TypedProgram { declarations: items }
top_level_item = { fn_def }
// Function definition
fn_def = { "fn" ~ name:identifier ~ "(" ~ params:fn_params? ~ ")" ~ ":" ~ ret:type_expr ~ body:block }
-> TypedDeclaration::Function {
name: intern(name),
params: params.unwrap_or([]),
return_type: ret,
body: Some(body),
}
fn_params = { first:fn_param ~ rest:fn_param_comma* }
-> prepend_list(first, rest)
fn_param_comma = { "," ~ param:fn_param }
-> param
fn_param = { name:identifier ~ ":" ~ ty:type_expr }
-> TypedParameter { name: intern(name), ty: ty }
// Block
block = { "{" ~ stmts:statement* ~ "}" }
-> TypedBlock { stmts: stmts }
// Statements
statement = { return_stmt | let_stmt | expr_stmt }
return_stmt = { "return" ~ value:expr? ~ ";" }
-> TypedStatement::Return { value: value }
let_stmt = { "let" ~ name:identifier ~ "=" ~ init:expr ~ ";" }
-> TypedStatement::Let { name: intern(name), init: init }
expr_stmt = { e:expr ~ ";" }
-> TypedStatement::Expr { expr: e }
// Expressions (left-associative operators)
expr = { e:additive_expr }
-> e
additive_expr = { first:multiplicative_expr ~ rest:additive_rest* }
-> fold_left_ops(first, rest)
additive_rest = { op:additive_op ~ operand:multiplicative_expr }
-> make_pair(op, operand)
additive_op = @{ "+" | "-" }
multiplicative_expr = { first:unary_expr ~ rest:multiplicative_rest* }
-> fold_left_ops(first, rest)
multiplicative_rest = { op:multiplicative_op ~ operand:unary_expr }
-> make_pair(op, operand)
multiplicative_op = @{ "*" | "/" | "%" }
unary_expr = { unary_with_op | postfix_expr }
unary_with_op = { op:unary_op ~ operand:postfix_expr }
-> TypedExpression::Unary {
op: op,
operand: Box::new(operand),
}
unary_op = @{ "-" | "!" }
postfix_expr = { base:primary_expr ~ suffix:postfix_suffix* }
-> fold_left_ops(base, suffix)
postfix_suffix = { suffix_call | suffix_field }
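suffix_call = { "(" ~ args:call_arg_list? ~ ")" }
-> TypedExpression::Suffix::Call { args: args.unwrap_or([]) }
call_arg_list = { first:expr ~ rest:call_arg_comma* }
-> prepend_list(first, rest)
call_arg_comma = { "," ~ arg:expr }
-> arg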
suffix_field = { "." ~ name:identifier ~ !"(" }
-> TypedExpression::Suffix::Field { name: intern(name) }
// Primary expressions
primary_expr = { int_literal | bool_literal | string_literal | paren_expr | var_expr }
int_literal = @{ ASCII_DIGIT+ }
-> TypedExpression::IntLiteral { value: int_literal }
bool_literal = @{ "true" | "false" }
-> TypedExpression::BoolLiteral { value: bool_literal }
string_literal = @{ "\"" ~ (!"\"" ~ ANY)* ~ "\"" }
-> TypedExpression::StringLiteral { value: string_literal }
paren_expr = _{ "(" ~ expr ~ ")" }
var_expr = { name:identifier }
-> TypedExpression::Variable { name: intern(name) }
// Types
type_expr = { ty:identifier }
-> Type::Named { name: intern(ty) }
// Terminals
identifier = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
WHITESPACE = _{ " " | "\t" | "\n" | "\r" }
COMMENT = _{ "//" ~ (!"\n" ~ ANY)* ~ "\n"? }
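For orientation, here are the node shapes the actions in this grammar construct, written as plain Rust types. They are hypothetical stand-ins reconstructed only from the field names used above, with intern(..) modeled as String, operators and literal values kept as their matched text, and the postfix Suffix nodes left out; Chapter 6 documents the real TypedAST types.

```rust
// Hypothetical stand-ins mirroring the field names used in the grammar's actions.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Named { name: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct TypedParameter {
    pub name: String,
    pub ty: Type,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TypedExpression {
    IntLiteral { value: String },
    BoolLiteral { value: String },
    StringLiteral { value: String },
    Variable { name: String },
    Binary { op: String, left: Box<TypedExpression>, right: Box<TypedExpression> },
    Unary { op: String, operand: Box<TypedExpression> },
}

#[derive(Debug, Clone, PartialEq)]
pub enum TypedStatement {
    Return { value: Option<TypedExpression> }, // from value:expr?
    Let { name: String, init: TypedExpression },
    Expr { expr: TypedExpression },
}

#[derive(Debug, Clone, PartialEq)]
pub struct TypedBlock {
    pub stmts: Vec<TypedStatement>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TypedDeclaration {
    Function {
        name: String,
        params: Vec<TypedParameter>,
        return_type: Type,
        body: Option<TypedBlock>,
    },
}

#[derive(Debug, Clone, PartialEq)]
pub struct TypedProgram {
    pub declarations: Vec<TypedDeclaration>,
}
```

For example, fn main(): Int { return 1 + 2; } would correspond to one Function declaration whose body holds a single Return of a Binary node with op "+".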
If you have grammars using the old JSON command syntax, here is the migration mapping:
| Old (JSON) | New (TypedAST) |
|---|---|
| "$1", "$2" | first_binding, second_binding (named) |
| -> String { "get_text": true } | @ rule modifier (text is captured automatically) |
| "define": "int_literal", "args": {"value": "$result"} | TypedExpression::IntLiteral { value: int_literal } |
| "define": "function", "args": {"name":"$1","params":"$2",...} | TypedDeclaration::Function { name: intern(name), params: params, ... } |
| "fold_binary": {"operand":"term","operator":"add_op\|sub_op"} | fold_left_ops(first, rest) with additive_rest = { op:op ~ operand:term } -> make_pair(op, operand) |
| "get_child": {"index": 0} | -> inner (passthrough) or no action |
| "get_all_children": true | Repetition binding: items:rule* → items is Vec<T> |
| "commands": [...] | Direct field expressions (no sequencing needed) |
Before (JSON):
fn_decl = { "fn" ~ identifier ~ "(" ~ fn_params ~ ")" ~ type_expr ~ block }
-> TypedDeclaration {
"commands": [
{ "define": "function", "args": {
"name": "$1",
"params": "$2",
"return_type": "$3",
"body": "$4"
}}
]
}
After (TypedAST):
fn_decl = { "fn" ~ name:identifier ~ "(" ~ params:fn_params ~ ")" ~ ret:type_expr ~ body:block }
-> TypedDeclaration::Function {
name: intern(name),
params: params,
return_type: ret,
body: Some(body),
}
- Chapter 6: Understand the TypedAST node types your actions produce
- Chapter 7: Use the builder API directly in Rust code
- Chapter 15: See these patterns applied in a complete DSL