M1PP
Scope
M1PP is a tiny single-pass macro expander that runs ahead of M0. It takes
M1 source with macro directives and emits plain M1 suitable for M0.
The implementation lives in m1pp/m1pp.c. It is one pass, allocation-free
(fixed static buffers), and stops at the first error.
Invocation
m1pp input.M1 output.M1
Input is read whole into a fixed buffer (MAX_INPUT = 262144 bytes); output
is written whole from another (MAX_OUTPUT = 524288 bytes).
Lexical structure
The lexer produces a flat token array. Token kinds:
- WORD: any run of non-special characters
- STRING: "..." or '...' (quotes included in the token text)
- NEWLINE: a single \n
- LPAREN, RPAREN, COMMA, LBRACE, RBRACE
- PASTE: the ## marker
Whitespace other than newlines is discarded. Line comments start with # or
; and run to end-of-line. Output formatting is normalized to tokens
separated by spaces and newlines; original spacing is not preserved.
Directives
Directives are recognized only at the start of a line (after a newline or at the top of file).
%macro / %endm
%macro NAME(p1, p2, ...)
... body tokens ...
%endm
Defines a function-like macro. Zero-parameter macros are written %macro NAME(). Macros are define-before-use; there is no prescan. Recursive
macros are not detected and will loop until a buffer limit fires.
%struct
%struct NAME { f1 f2 f3 ... }
Synthesizes zero-parameter macros for fixed 8-byte-per-field layout:
- %NAME.f1 → 0
- %NAME.f2 → 8
- %NAME.f3 → 16
- %NAME.SIZE → N * 8
Fields are separated by whitespace, commas, or newlines.
%enum
%enum NAME { l1 l2 l3 ... }
Like %struct with stride 1 and a trailing COUNT:
- %NAME.l1 → 0, %NAME.l2 → 1, ...
- %NAME.COUNT → N
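The layout rule shared by %struct and %enum can be modeled in a few lines. This is a Python sketch of the arithmetic only, not the C code in m1pp/m1pp.c; the function name layout_macros and the example names point and color are hypothetical.

```python
def layout_macros(name, fields, stride, total_name):
    """Map each synthesized macro call to the integer text it expands to."""
    table = {f"%{name}.{field}": str(i * stride) for i, field in enumerate(fields)}
    # The trailing entry is the field count times the stride.
    table[f"%{name}.{total_name}"] = str(len(fields) * stride)
    return table

# %struct: 8 bytes per field, trailing SIZE.
struct_table = layout_macros("point", ["x", "y", "z"], stride=8, total_name="SIZE")
# %enum: stride 1, trailing COUNT.
enum_table = layout_macros("color", ["red", "green"], stride=1, total_name="COUNT")
```

With these inputs, %point.z maps to 16 and %point.SIZE to 24, while %color.green maps to 1 and %color.COUNT to 2.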
%scope / %endscope
%scope NAME
... body ...
%endscope
Pushes NAME onto a lexical scope stack active until the matching
%endscope. Scopes nest. While the stack is non-empty, any ::name or
&::name token emitted from within is rewritten with the current scope
path (see Scoped labels). Every %scope must be closed
before end-of-input. NAME is a single WORD token and may come from
macro-argument substitution.
Macro calls
%NAME(arg, arg, ...)
Arguments are comma-separated token spans, with parentheses and braces
balanced inside an argument. A zero-parameter macro may be invoked either
as %NAME() or as a bare %NAME.
An argument wrapped in a single outer pair of { ... } has the braces
stripped on substitution. This lets a comma-containing or paren-containing
token sequence be passed as a single argument: %foo({a, b, c}) passes one
argument whose tokens are a , b , c.
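The splitting and brace-stripping rules above can be sketched as follows. This is a simplified Python model over a list of token strings, not the implementation; split_args and strip_outer_braces are hypothetical names, and the sketch assumes the input is already balanced (the real tool reports unbalanced braces as an error).

```python
def split_args(tokens):
    """Split the tokens between a call's outer parentheses at top-level commas."""
    args, current, depth = [], [], 0
    for tok in tokens:
        if tok in ("(", "{"):
            depth += 1
        elif tok in (")", "}"):
            depth -= 1
        if tok == "," and depth == 0:
            args.append(current)
            current = []
        else:
            current.append(tok)
    args.append(current)
    return args

def strip_outer_braces(arg):
    """Drop one enclosing { ... } pair, but only if it wraps the whole argument."""
    if len(arg) < 2 or arg[0] != "{" or arg[-1] != "}":
        return arg
    depth = 0
    for i, tok in enumerate(arg):
        if tok == "{":
            depth += 1
        elif tok == "}":
            depth -= 1
            if depth == 0 and i != len(arg) - 1:
                return arg  # the first brace closes early: not a single outer pair
    return arg[1:-1]

# %foo({a, b, c}) : one argument whose tokens are a , b , c
call_tokens = ["{", "a", ",", "b", ",", "c", "}"]
args = [strip_outer_braces(a) for a in split_args(call_tokens)]
```

Here args holds a single argument with five tokens; the commas survive because they sit at brace depth 1 when the splitter sees them.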
Argument substitution happens inside the body. After substitution, ##
token-paste is applied: left ## right becomes a single WORD token whose
text is the concatenation of the two operand tokens' text. Operands of
## must be exactly one non-braced token; newlines and other ## tokens
are not valid neighbors.
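A minimal Python sketch of the paste step, run over a body whose arguments have already been substituted. It is a model, not the C code: apply_paste is a hypothetical name, chained pastes are resolved left to right as an assumption, and the "exactly one token" check on substituted arguments is not modeled because the input here is already flat.

```python
INVALID_OPERANDS = {"\n", "##"}

def apply_paste(tokens):
    """Join each `left ## right` into one token, scanning left to right."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok != "##":
            out.append(tok)
            i += 1
            continue
        # A paste needs a real token on both sides.
        if not out or i + 1 >= len(tokens):
            raise ValueError("bad paste")
        left, right = out[-1], tokens[i + 1]
        if left in INVALID_OPERANDS or right in INVALID_OPERANDS:
            raise ValueError("bad paste")
        out[-1] = left + right  # the result is a single WORD-like token
        i += 2
    return out

pasted = apply_paste(["LD_", "##", "a0", "\n"])
```

The example joins LD_ and a0 into the single token LD_a0 and leaves the newline untouched.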
The expanded body is then rescanned by pushing it onto the stream stack, so macros can call other macros.
Local labels
Inside a macro body, a token starting with :@ or &@ is a local label
definition or reference. On expansion, the @ is dropped and __N is appended
to the name, where N is a monotonically increasing expansion id:
- :@loop → :loop__7
- &@loop → &loop__7
Each macro expansion gets a fresh N, so :@loop in two different call
sites (or in two different macros) never collides. Argument-substituted tokens
keep their original text and are not rewritten, so a :@name literal
passed as a macro argument passes through verbatim.
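The local-label rewrite can be illustrated with a short Python model. The names rewrite_local and expand_body are hypothetical, and the sketch only covers body tokens; tokens that arrive through argument substitution would bypass it, as described above.

```python
def rewrite_local(token, expansion_id):
    """Turn :@name / &@name into :name__N / &name__N; leave other tokens alone."""
    for prefix in (":@", "&@"):
        if token.startswith(prefix):
            return f"{prefix[0]}{token[2:]}__{expansion_id}"
    return token

def expand_body(body_tokens, expansion_id):
    """Apply the rewrite to every token of one macro expansion."""
    return [rewrite_local(t, expansion_id) for t in body_tokens]

body = [":@loop", "ADD", "&@loop", "B"]
first_call = expand_body(body, 7)
second_call = expand_body(body, 8)
```

The two expansions yield :loop__7 and :loop__8, so the same body can be expanded at two call sites without a label clash.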
Scoped labels
A WORD token whose text starts with :: is a scoped label definition; a
token starting with &:: is a scoped reference. The :: prefix is rewritten
at emit time against the current %scope stack:
- stack = [parse_number]: ::start → :parse_number__start
- stack = [outer, inner]: &::end → &outer__inner__end
- stack empty: ::foo → :foo; &::bar → &bar (pass-through)
Because resolution is at emit time rather than macro-expansion time, a
::foo token written inside a macro body resolves against whatever scope
is active at the point the token flows to the output — i.e. the caller's
surroundings, not the macro's own expansion id. This makes generic
control-flow macros possible:
%macro loop_scoped(name, body)
%scope name
::top
body
LA_BR &::top
B
::end
%endscope
%endm
%macro break()
LA_BR &::end
B
%endm
%loop_scoped(scan, {
...
%if_eqz(a0, { %break() })
...
})
Inside the expansion, %loop_scoped has pushed the scope [scan], so
when %break()'s &::end token is finally emitted the stack is [scan]
and the output is &scan__end — exactly the label %loop_scoped
defined at the bottom of its body. A %loop_scoped(inner, { ... }) nested
inside a loop named outer makes [outer, inner] the active stack, so a
%break() inside the inner block resolves to &outer__inner__end and leaves
only the innermost scope. To jump past an intervening scope,
write the concatenated name explicitly (&outer__end).
Scoped labels and local (:@ / &@) labels are independent and compose.
A common pattern: use :@ for the macro's private internal labels (the
caller can never name them) and :: for labels that are the macro's
public contract with its caller (::end, ::top, etc.).
Built-in calls
These are recognized wherever a token matches, not only at line start.
Integer emission: ! @ % $
!(expr) → 1-byte little-endian hex, e.g. 'AB'
@(expr) → 2-byte little-endian hex
%(expr) → 4-byte little-endian hex
$(expr) → 8-byte little-endian hex
The expression is evaluated to a signed 64-bit integer and emitted as an
M0-safe single-quoted hex literal ('AABBCCDD') rather than a bare number,
so M0 does not reinterpret it as decimal.
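The byte order is easiest to see on a concrete value. The Python sketch below models the four emitters; emit_hex and WIDTHS are hypothetical names, and truncating an out-of-range value to its low bytes is an assumption, since the text above does not say how overflow is handled.

```python
# Width in bytes for each emission sigil.
WIDTHS = {"!": 1, "@": 2, "%": 4, "$": 8}

def emit_hex(sigil, value):
    """Format value as a single-quoted, little-endian, uppercase hex literal."""
    nbytes = WIDTHS[sigil]
    # Masking maps negative values to their two's-complement bit pattern
    # (and, as an assumption, truncates anything wider than nbytes).
    masked = value & ((1 << (8 * nbytes)) - 1)
    return "'" + masked.to_bytes(nbytes, "little").hex().upper() + "'"

word = emit_hex("%", 0x12345678)
minus_one = emit_hex("@", -1)
```

Here word is '78563412' (least significant byte first) and minus_one is 'FFFF'.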
%select(cond, then, else)
Evaluates cond as an expression. If nonzero, the then argument's tokens
are pushed back for rescan; otherwise the else argument's tokens are. The
branches are raw token spans, not expressions.
%str(IDENT)
Stringifies a single WORD token into a double-quoted string literal:
%str(foo) → "foo". The argument must be exactly one word token.
Expression language
Expressions are Lisp-shaped S-expressions. Atoms are integer literals
(decimal, or any base accepted by strtoull/strtoll, including 0x...)
or zero-arg macro calls that evaluate to integer tokens.
Calls:
(+ a b ...) (- a b ...) (* a b ...) (~ a)
(/ a b) (% a b) (<< a b) (>> a b)
(& a ...) (| a ...) (^ a ...)
(= a b) (!= a b)
(< a b) (<= a b) (> a b) (>= a b)
(strlen "literal")
- + - * & | ^ are n-ary with at least one argument. Unary - negates.
- / % << >> = != < <= > >= are strictly binary.
- ~ is unary.
- strlen takes one STRING token and returns the raw byte count of the contents between the quotes.
Inside an expression, a %NAME that names a zero-parameter (or invokable)
macro is expanded and its tokens are re-parsed as a sub-expression. This
is how %struct and %enum-generated names compose into arithmetic.
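A compact Python model of the evaluator may help when reading expressions. It is a sketch, not the C implementation: evaluate, parse, and apply_op are hypothetical names, zero-parameter macros are represented by a dict from call text to token list, and several details are assumptions (division and remainder truncate toward zero as in C, >> is an arithmetic shift, results wrap to signed 64 bits, and int(text, 0) stands in for strtoll).

```python
import operator
from functools import reduce

def c_div(a, b):
    """Integer division truncating toward zero (assumed C-like behavior)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

NARY = {"+": operator.add, "*": operator.mul, "&": operator.and_,
        "|": operator.or_, "^": operator.xor}
BINARY = {
    "/": c_div, "%": lambda a, b: a - b * c_div(a, b),
    "<<": operator.lshift, ">>": operator.rshift,
    "=": lambda a, b: int(a == b), "!=": lambda a, b: int(a != b),
    "<": lambda a, b: int(a < b), "<=": lambda a, b: int(a <= b),
    ">": lambda a, b: int(a > b), ">=": lambda a, b: int(a >= b),
}

def wrap64(v):
    """Wrap an unbounded Python int to a signed 64-bit value."""
    v &= (1 << 64) - 1
    return v - (1 << 64) if v >> 63 else v

def apply_op(op, args):
    if op == "-" and len(args) == 1:
        return -args[0]                      # unary minus negates
    if op == "-" and len(args) > 1:
        return reduce(operator.sub, args)
    if op in NARY and args:
        return reduce(NARY[op], args)
    if op == "~" and len(args) == 1:
        return ~args[0]
    if op in BINARY and len(args) == 2:
        return BINARY[op](*args)
    raise ValueError("bad expression")

def parse(tokens, macros):
    """Parse one expression; return (value, remaining tokens)."""
    if not tokens:
        raise ValueError("bad expression")
    head, rest = tokens[0], tokens[1:]
    if head in macros:                       # zero-arg macro: re-parse its tokens
        return evaluate(macros[head], macros), rest
    if head != "(":
        return int(head, 0), rest            # integer literal, e.g. 42 or 0x2A
    if not rest:
        raise ValueError("bad expression")
    op, rest = rest[0], rest[1:]
    if op == "strlen":
        ok = (len(rest) >= 2 and rest[1] == ")"
              and len(rest[0]) >= 2 and rest[0][0] in "\"'")
        if not ok:
            raise ValueError("bad expression")
        return len(rest[0][1:-1].encode("utf-8")), rest[2:]
    args = []
    while rest and rest[0] != ")":
        value, rest = parse(rest, macros)
        args.append(value)
    if not rest:
        raise ValueError("bad expression")
    return wrap64(apply_op(op, args)), rest[1:]

def evaluate(tokens, macros=None):
    value, rest = parse(list(tokens), macros or {})
    if rest:
        raise ValueError("bad expression")
    return value

# A %struct-generated name composing into arithmetic.
size_twice = evaluate("( * %point.SIZE 2 )".split(), {"%point.SIZE": ["24"]})
```

In the last line, %point.SIZE expands to the token 24, which is parsed as a sub-expression before the multiplication, so size_twice is 48.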
Limits
Fixed at compile time:
| Resource | Limit |
|---|---|
| input bytes | 262144 |
| output bytes | 524288 |
| total token text | 524288 |
| source tokens | 65536 |
| macro body tokens | 65536 |
| expansion pool tokens | 65536 |
| macros | 512 |
| parameters per macro | 16 |
| stream stack depth | 64 |
| expression frames | 256 |
| scope stack depth | 32 |
Exceeding any limit aborts with an error message on stderr.
Errors
On failure, m1pp prints m1macro: <reason> to stderr and exits 1.
Reasons are terse: bad macro header, unterminated macro,
wrong arg count, bad paste, bad expression, bad builtin,
text overflow, token overflow, expansion overflow, output overflow,
stream overflow, unbalanced braces, too many args, too many macros,
bad integer, bad directive, unterminated directive,
unterminated macro call, bad scope header, scope underflow,
scope not closed, scope depth overflow, bad scope label.