MEP 3. Abstract Syntax Tree

Field    Value
MEP      3
Title    Abstract Syntax Tree
Author   Mochi core
Status   Informational
Type     Informational
Created  2026-05-08

Abstract

Mochi's AST is the same set of Go structs that the parser uses. Each node carries both JSON tags (used by the parser golden tests) and participle parser tags (used to build the parser). This MEP documents the node catalogue, the invariants that hold across the AST, and the rules contributors must respect when adding a new node.

Motivation

The parser package owns the AST. There is no separate ast package. That choice keeps the source small but means a careless rename can break both the parser and the goldens at the same time. A written record of the node catalogue is the cheapest way to make changes safe.

Specification

Source of truth

The source of truth is parser/parser.go. Every AST node is a Go struct with both JSON tags and participle parser tags. The two views must stay in sync because the JSON shape is what the golden files capture.

Root and statements

Program {
    Pos lexer.Position
    Package string
    PackageDoc string
    Statements []*Statement
}

Statement {
    Pos lexer.Position
    Test *TestBlock
    Bench *BenchBlock
    Expect *ExpectStmt
    Agent *AgentDecl
    Stream *StreamDecl
    Model *ModelDecl
    Import *ImportStmt
    Type *TypeDecl
    ExternType *ExternTypeDecl
    ExternVar *ExternVarDecl
    ExternFun *ExternFunDecl
    ExternObject *ExternObjectDecl
    Fact *FactStmt
    Rule *RuleStmt
    On *OnHandler
    Emit *EmitStmt
    Let *LetStmt
    Var *VarStmt
    Assign *AssignStmt
    Fun *FunStmt
    Return *ReturnStmt
    If *IfStmt
    While *WhileStmt
    For *ForStmt
    Break *BreakStmt
    Continue *ContinueStmt
    Fetch *FetchStmt
    Update *UpdateStmt
    Expr *ExprStmt
}

Statement is a tagged union encoded as a struct with mutually exclusive nullable fields. Exactly one field is non-nil after a successful parse. The check pass switches on the non-nil field.
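As a rough illustration of how a consumer might dispatch on such a node, the sketch below reproduces only three of the Statement fields with simplified stand-in types. The helper name kind and the stand-in structs are hypothetical and are not taken from parser/parser.go.

```go
package main

import "fmt"

// Simplified stand-ins for the real parser structs (assumption: only the
// field names Let, Return, and Expr come from the catalogue above).
type LetStmt struct{ Name string }
type ReturnStmt struct{}
type ExprStmt struct{}

type Statement struct {
	Let    *LetStmt
	Return *ReturnStmt
	Expr   *ExprStmt
}

// kind reports which variant is populated. It returns an error when zero
// or more than one field is set, which would violate the invariant.
func kind(s *Statement) (string, error) {
	var set []string
	if s.Let != nil {
		set = append(set, "let")
	}
	if s.Return != nil {
		set = append(set, "return")
	}
	if s.Expr != nil {
		set = append(set, "expr")
	}
	if len(set) != 1 {
		return "", fmt.Errorf("expected exactly one variant, got %d", len(set))
	}
	return set[0], nil
}

func main() {
	k, err := kind(&Statement{Let: &LetStmt{Name: "x"}})
	fmt.Println(k, err)
}
```

A check of this kind is only useful in tests or debug builds; after a successful parse the invariant is expected to hold already.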

The pattern repeats at lower levels: TypeRef, PostfixOp, Primary, and Literal are all the same shape.

Declarations

ImportStmt       { Pos, Lang *string, Path string, As string, Auto bool }
LetStmt          { Pos, Name, Doc, Type *TypeRef, Value *Expr }
VarStmt          { Pos, Name, Doc, Type *TypeRef, Value *Expr }
AssignStmt       { Pos, Name, Index []*IndexOp, Field []*FieldOp, Value *Expr }
FunStmt          { Pos, Export bool, Name, Doc, TypeParams []string,
                   Params []*Param, Return *TypeRef, Body []*Statement }
TypeDecl         { Pos, Name, Doc, Members []*TypeMember,
                   Variants []*TypeVariant, Alias *TypeRef }
ExternTypeDecl   { Pos, Name string }
ExternVarDecl    { Pos, Root string, Tail []string, Type *TypeRef }
ExternFunDecl    { Pos, Root string, Tail []string,
                   Params []*Param, Return *TypeRef }
ExternObjectDecl { Pos, Name string }

ImportStmt.Lang is nil for a plain import "path". Auto is set by import "path" auto to enable auto-import resolution.

Invariants:

  • TypeDecl describes its shape through three fields. Members is set for the struct and struct-alias forms (type P { ... } and type P = { ... }). Variants is set for any '=' Ident ... form, including bare aliases like type Id = int (parsed as a single variant named int). Alias is set only for function-type or generic-type aliases such as type F = fun(int):int or type L = list<int>. The checker distinguishes a true alias from a single-variant declaration by whether the sole variant has fields.
  • LetStmt and VarStmt require at least one of Type or Value. The parser allows both to be missing and the checker raises T000 for the empty case.
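The TypeDecl rule above can be sketched as a small classifier. This is a minimal illustration, not the checker's code: the function classify, the result strings, and the Name and Fields members of TypeVariant are assumptions made for the example.

```go
package main

import "fmt"

// Stand-in types. The Name/Fields layout of TypeVariant is an assumption.
type TypeRef struct{ Simple *string }
type TypeField struct{ Name string }
type TypeMember struct{ Name string }
type TypeVariant struct {
	Name   string
	Fields []*TypeField
}

type TypeDecl struct {
	Name     string
	Members  []*TypeMember
	Variants []*TypeVariant
	Alias    *TypeRef
}

// classify mirrors the invariant in prose: Alias wins, then Members, then
// a sole field-less variant counts as an alias, anything else is a union.
func classify(d *TypeDecl) string {
	switch {
	case d.Alias != nil:
		return "alias"
	case len(d.Members) > 0:
		return "struct"
	case len(d.Variants) == 1 && len(d.Variants[0].Fields) == 0:
		return "alias"
	case len(d.Variants) > 0:
		return "union"
	}
	return "unknown"
}

func main() {
	id := &TypeDecl{Name: "Id", Variants: []*TypeVariant{{Name: "int"}}}
	fmt.Println(classify(id))
}
```

The sketch does not handle an empty struct declaration, which would have no members; how the real checker treats that case is not covered here.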

Type references

TypeRef          { Fun *FunType, Generic *GenericType,
                   Struct *InlineStructType, Simple *string }

FunType          { Params []*TypeRef, Return *TypeRef }
GenericType      { Name string, Args []*TypeRef }
InlineStructType { Fields []*TypeField }

There is no Union field. A union type by name resolves through Simple to a UnionType in the type environment. There is no inline union literal.

Expressions

Expr { Pos, Binary *BinaryExpr }
BinaryExpr { Left *Unary, Right []*BinaryOp }
BinaryOp { Pos, Op string, All bool, Right *PostfixExpr }
Unary { Pos, Ops []string, Value *PostfixExpr }
PostfixExpr { Target *Primary, Ops []*PostfixOp }
PostfixOp { Call *CallOp, Index *IndexOp, Field *FieldOp, Cast *CastOp }

BinaryExpr carries a flat list of operands and operators because the parser does not enforce precedence. The type checker applies precedence in inferBinaryType (types/infer.go:89-205), using the levels slice defined at types/infer.go:89-97. A consumer that walks the tree without applying precedence will produce incorrect typing for mixed operator chains.
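To make the consequence concrete, here is a minimal sketch of folding a flat operand/operator chain level by level. It evaluates integers rather than inferring types, and its levels table is illustrative only; it is not copied from types/infer.go. The function names fold, apply, and contains are hypothetical.

```go
package main

import "fmt"

// levels lists operators from tightest to loosest binding (assumed table).
var levels = [][]string{
	{"*", "/"},
	{"+", "-"},
}

func contains(ops []string, op string) bool {
	for _, o := range ops {
		if o == op {
			return true
		}
	}
	return false
}

func apply(op string, a, b int) int {
	switch op {
	case "*":
		return a * b
	case "/":
		return a / b
	case "+":
		return a + b
	case "-":
		return a - b
	}
	panic("unknown operator " + op)
}

// fold reduces operands[0] ops[0] operands[1] ops[1] ... one precedence
// level at a time, left to right within a level.
// It assumes len(operands) == len(ops)+1.
func fold(operands []int, ops []string) int {
	vals := append([]int(nil), operands...)
	rest := append([]string(nil), ops...)
	for _, level := range levels {
		for i := 0; i < len(rest); {
			if !contains(level, rest[i]) {
				i++
				continue
			}
			// Combine the pair around operator i, then drop both the
			// consumed operand and the operator from the chain.
			vals[i] = apply(rest[i], vals[i], vals[i+1])
			vals = append(vals[:i+1], vals[i+2:]...)
			rest = append(rest[:i], rest[i+1:]...)
		}
	}
	return vals[0]
}

func main() {
	fmt.Println(fold([]int{1, 2, 3}, []string{"+", "*"})) // 1 + 2 * 3
}
```

A naive left-to-right walk over the same chain would compute (1 + 2) * 3 instead, which is the kind of error the paragraph above warns about.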

Unary.Ops is a slice of strings ("-" or "!"). The checker applies them right to left.
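A small sketch of the right-to-left order, rendering the nesting as text instead of type checking. The helper nest is hypothetical.

```go
package main

import "fmt"

// nest shows the application order of prefix operators: the last entry
// in ops binds to the operand first, the first entry is applied last.
func nest(ops []string, operand string) string {
	out := operand
	for i := len(ops) - 1; i >= 0; i-- {
		out = ops[i] + "(" + out + ")"
	}
	return out
}

func main() {
	fmt.Println(nest([]string{"-", "!"}, "x")) // prints -(!(x))
}
```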

Primary {
    Pos, Struct *StructLiteral, Call *CallExpr,
    Query *QueryExpr, LogicQuery *LogicQueryExpr,
    If *IfExpr, Selector *SelectorExpr,
    List *ListLiteral, Map *MapLiteral, FunExpr *FunExpr,
    Match *MatchExpr, Generate *GenerateExpr,
    Fetch *FetchExpr, Load *LoadExpr, Save *SaveExpr,
    Lit *Literal, Group *Expr,
}

The order in this struct is meaningful. StructLiteral comes before Selector because Foo{...} could otherwise be parsed as the identifier Foo followed by a block.

FunExpr carries two mutually exclusive body fields — BlockBody []*Statement for fun() { ... } and ExprBody *Expr for fun() => expr. Exactly one is non-nil after a successful parse.

Patterns

There is no Pattern AST. A MatchCase.Pattern is an *Expr. The checker decides what shapes are pattern legal:

  • A literal (any Literal) matches by equality.
  • A bare identifier acts as a wildcard binding.
  • A call expression Tag(a, b, c) matches a tagged union variant by name and binds the field arguments.
  • The underscore identifier is treated as a discard wildcard.
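The four shapes above could be told apart roughly as follows. This sketch uses a flattened, hypothetical Pattern struct in place of the real *Expr chain (the real tree reaches identifiers through Selector), so the type, its Ident field, and the classify function are assumptions for illustration only.

```go
package main

import "fmt"

type Literal struct{ Int *int }

type CallExpr struct {
	Func string
	Args []string
}

// Pattern is a flattened stand-in for the primary at the bottom of an
// *Expr used as a match pattern (hypothetical type).
type Pattern struct {
	Lit   *Literal
	Call  *CallExpr
	Ident *string
}

// classify names the pattern shape. The underscore test must come before
// the general identifier case, otherwise "_" would be bound as a name.
func classify(p *Pattern) string {
	switch {
	case p.Lit != nil:
		return "literal"
	case p.Call != nil:
		return "variant"
	case p.Ident != nil && *p.Ident == "_":
		return "discard"
	case p.Ident != nil:
		return "binding"
	}
	return "invalid"
}

func main() {
	name, under := "rest", "_"
	fmt.Println(classify(&Pattern{Ident: &name}))
	fmt.Println(classify(&Pattern{Ident: &under}))
	fmt.Println(classify(&Pattern{Call: &CallExpr{Func: "Node", Args: []string{"l", "r"}}}))
}
```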

Literals

Literal { Pos, Int *IntLit, Float *float64,
          Bool *boolLit, Str *string, Null bool }

Exactly one field is set per literal. Null is a bool flag rather than a pointer because null carries no payload.

Loops, conditionals, blocks

IfStmt    { Pos, Cond *Expr, Then []*Statement,
            ElseIf *IfStmt, Else []*Statement }
WhileStmt { Pos, Cond *Expr, Body []*Statement }
ForStmt   { Pos, Name string, Source *Expr, RangeEnd *Expr, Body []*Statement }

ForStmt.RangeEnd is non-nil for for i in 0..10 and nil for for x in xs. The checker uses the difference to decide whether the loop variable is bound to an int or to the element type of the source.
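In code, the decision might look like the sketch below. The function loopVarType and its elemType parameter are hypothetical; the real checker works on its own type representation rather than strings.

```go
package main

import "fmt"

// Expr is an empty placeholder for the parser's expression node.
type Expr struct{}

type ForStmt struct {
	Name     string
	Source   *Expr
	RangeEnd *Expr
}

// loopVarType picks the loop variable type. elemType stands for the
// element type the caller has already inferred for Source.
func loopVarType(f *ForStmt, elemType string) string {
	if f.RangeEnd != nil {
		return "int" // range form: for i in 0..10
	}
	return elemType // collection form: for x in xs
}

func main() {
	rng := &ForStmt{Name: "i", Source: &Expr{}, RangeEnd: &Expr{}}
	each := &ForStmt{Name: "x", Source: &Expr{}}
	fmt.Println(loopVarType(rng, "string"), loopVarType(each, "string"))
}
```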

IfStmt has both an ElseIf chain and an Else body. They are mutually exclusive at the same level: an if either continues with another if (linked through ElseIf) or terminates with an Else block.
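A consumer can walk the chain by following ElseIf until it is nil. The sketch below counts the if nodes and rejects a node that sets both ElseIf and Else; the stand-in types and the function chainLength are assumptions for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Placeholders for the parser's node types.
type Statement struct{}
type Expr struct{}

type IfStmt struct {
	Cond   *Expr
	Then   []*Statement
	ElseIf *IfStmt
	Else   []*Statement
}

// chainLength follows the ElseIf links and returns how many if nodes the
// chain holds. It reports an error if any node carries both an ElseIf
// link and a non-empty Else body.
func chainLength(s *IfStmt) (int, error) {
	n := 0
	for cur := s; cur != nil; cur = cur.ElseIf {
		n++
		if cur.ElseIf != nil && len(cur.Else) > 0 {
			return 0, errors.New("if node has both ElseIf and Else")
		}
	}
	return n, nil
}

func main() {
	// if ... else if ... else { stmt }
	chain := &IfStmt{ElseIf: &IfStmt{Else: []*Statement{{}}}}
	fmt.Println(chainLength(chain))
}
```

Because the check uses len(cur.Else), this sketch cannot tell an empty else block from a missing one.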

I/O nodes

FetchStmt  { Pos, URL *Expr, Target string, With *Expr }
FetchExpr  { Pos, URL *Expr, With *Expr }
LoadExpr   { Pos, Path *string, Type *TypeRef, With *Expr }
SaveExpr   { Pos, Src *Expr, Path *string, With *Expr }
UpdateStmt { Pos, Target string, Set *MapLiteral, Where *Expr }
EmitStmt   { Pos, Stream string, Fields []*StructLitField }

Path is a *string rather than a string so that "no path supplied" can be distinguished from "empty string". load as T with no path reads from stdin; save x with no to clause writes to stdout.
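A minimal sketch of why the pointer matters, with a trimmed LoadExpr and a hypothetical helper loadSource that only describes where the data would come from:

```go
package main

import "fmt"

// LoadExpr is trimmed to the one field relevant here.
type LoadExpr struct {
	Path *string
}

// loadSource distinguishes a missing path (nil) from an explicit one,
// including an explicit empty string.
func loadSource(l *LoadExpr) string {
	if l.Path == nil {
		return "stdin"
	}
	return "file:" + *l.Path
}

func main() {
	empty := ""
	fmt.Println(loadSource(&LoadExpr{}))             // no path supplied
	fmt.Println(loadSource(&LoadExpr{Path: &empty})) // empty path supplied
}
```

With a plain string field both calls would look identical to the consumer.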

Logic and stream nodes

StreamDecl { Pos, Name, Doc, Fields []*StreamField }
ModelDecl  { Pos, Name string, Fields []*ModelField }
OnHandler  { Pos, Stream, Alias string, Body []*Statement }
EmitStmt   { Pos, Stream string, Fields []*StructLitField }
AgentDecl  { Pos, Name, Doc, Body []*AgentBlock }
AgentBlock { Let *LetStmt, Var *VarStmt, Assign *AssignStmt,
             On *OnHandler, Intent *IntentDecl }
IntentDecl { Pos, Name string, Params []*Param, Return *TypeRef, Body []*Statement }
FactStmt   { Pos, Pred *LogicPredicate }
RuleStmt   { Pos, Head *LogicPredicate, Body []*LogicCond }

AgentBlock is a tagged union like Statement but scoped to the agent body. ModelDecl has no Doc field (unlike StreamDecl).

These are full AST nodes. Most of them are type checked into the environment, but their runtime semantics are implemented by the interpreter, not the bytecode VM.

JSON view and golden tests

The golden suite at tests/parser/valid/ writes the parsed program to JSON and diffs it. That means any field rename, reorder, or change in optionality forces a golden update. The omitempty markers on every field keep the goldens minimal, and they mean that adding a new optional field does not break older fixtures unless that field appears in the program.
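The effect of omitempty can be seen with the standard encoding/json package. The sketch below uses a trimmed LetStmt; the JSON key names in the tags are assumptions and may differ from the ones in parser/parser.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed stand-ins; the tag names are assumed for illustration.
type TypeRef struct {
	Simple *string `json:"simple,omitempty"`
}

type LetStmt struct {
	Name string   `json:"name,omitempty"`
	Doc  string   `json:"doc,omitempty"`
	Type *TypeRef `json:"type,omitempty"`
}

// render marshals a node the way a golden writer might.
func render(s LetStmt) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	intName := "int"
	// Doc and Type are unset, so they do not appear in the output.
	fmt.Println(render(LetStmt{Name: "x"}))
	// Setting Type adds a key; programs that never set it are unaffected.
	fmt.Println(render(LetStmt{Name: "x", Type: &TypeRef{Simple: &intName}}))
}
```

Keys are emitted in struct field order, which is why reordering fields also changes the golden output.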

To regenerate goldens after a deliberate AST change:

make update-golden STAGE=parser

Rationale

Reusing the parser structs as the AST keeps the source small and avoids a separate translation step. The cost is that we cannot evolve the AST shape independently of the parser. We accept that trade-off because the language is small and the AST and parser change together more often than not.

Tagged unions encoded as nullable struct fields produce verbose Go code but keep the JSON shape stable, which is what the golden tests want.

Backwards Compatibility

Informational. No backward compatibility implications.

Reference Implementation

  • parser/parser.go — entire AST.
  • tests/parser/valid/ — golden suite that pins the JSON shape.

Open Questions

  • Separate ast package. Splitting the AST out of parser would let the IR and tooling depend on AST without pulling in participle. The migration is large.
  • Pattern AST. Today patterns are reused *Expr nodes. A dedicated Pattern type would catch more errors at parse time, at the cost of duplicating the postfix and call grammar.

References

  • See MEP 2 for the grammar that produces these nodes.

This document is placed in the public domain.