commit e2b9951654ab6ba3250b936f286753bb0ca5e274
parent 87e3956d3b23b7ef1e537e75fc304782bd3df6bb
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Fri, 24 Apr 2026 11:56:04 -0700
Add %scope / %endscope and :: label rewrite to M1PP
A lexical scope stack driven by `%scope NAME` / `%endscope`. A WORD
token starting with `::` emits as `:scope1__..__scopeN__name` at output
time; `&::name` emits the reference form `&scope1__..__scopeN__name`.
With an empty stack only the sigil and name are emitted (`::foo` becomes
`:foo`). Resolution is emit-time, so `::foo` inside a
macro body resolves against the caller's scope — making generic
`%break()` / `%continue()` possible.
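
The rewrite rule itself is small enough to model directly. The sketch
below is an illustrative Python model, not the code in `m1pp/m1pp.c`;
the helper name `rewrite_scoped` and the use of `ValueError` to stand in
for the `bad scope label` failure are assumptions made for the example.

```python
def rewrite_scoped(token, scope_stack):
    """Model of the emit-time `::` rewrite (hypothetical helper, not m1pp's API)."""
    if token.startswith("&::"):
        sigil, name = "&", token[3:]   # scoped reference
    elif token.startswith("::"):
        sigil, name = ":", token[2:]   # scoped definition
    else:
        return token                   # not a scoped label: emitted unchanged
    if not name:
        # Stands in for the "bad scope label" error.
        raise ValueError("bad scope label")
    # Each active scope contributes "<scope>__"; an empty stack contributes nothing.
    prefix = "".join(scope + "__" for scope in scope_stack)
    return sigil + prefix + name
```

Because the stack is consulted only when the token is written out, the
same `&::end` token yields different labels depending on which scopes
are open at that moment, which is what lets one `%break()` body serve
every loop.
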
Implemented in both `m1pp/m1pp.c` (reference) and `m1pp/m1pp.P1`
(self-hosted). New tests/m1pp/17-scopes covers empty-stack pass-through,
nested scopes, scope name from a macro argument, anaphoric break /
continue, innermost-wins for nested scope-introducing macros, and
`:@` + `::` coexisting in one macro.
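
As a rough cross-check of the nested-scope case, the following Python
sketch models `%scope` / `%endscope` on a line-by-line basis. It is a
simplification written for this message: it has no macros, comments, or
tokenizer, and the names `expand_scopes` and `_rewrite` are hypothetical.
Only the error strings and the depth limit of 32 come from this change.

```python
MAX_SCOPE_DEPTH = 32  # documented scope stack depth limit


def _rewrite(word, stack):
    # Emit-time rewrite of `::name` / `&::name` against the current stack.
    for lead, sigil in (("&::", "&"), ("::", ":")):
        if word.startswith(lead):
            name = word[len(lead):]
            if not name:
                raise ValueError("bad scope label")
            return sigil + "".join(s + "__" for s in stack) + name
    return word


def expand_scopes(lines):
    """Line-oriented model of the scope directives (illustrative only)."""
    stack, out = [], []
    for line in lines:
        words = line.split()
        if words and words[0] == "%scope":
            if len(words) != 2:           # exactly one NAME word is required
                raise ValueError("bad scope header")
            if len(stack) >= MAX_SCOPE_DEPTH:
                raise ValueError("scope depth overflow")
            stack.append(words[1])
        elif words and words[0] == "%endscope":
            if not stack:
                raise ValueError("scope underflow")
            stack.pop()                   # trailing words on the line are ignored
        else:
            out.append(" ".join(_rewrite(w, stack) for w in words))
    if stack:
        raise ValueError("scope not closed")
    return out
```

Feeding it the "Nested scopes" lines from the fixture (`%scope outer`,
`::before`, `%scope inner`, `::a`, `LA_BR &::a`, `B`, `%endscope`,
`::after`, `%endscope`) gives the same five label and instruction lines
that appear in `17-scopes.expected`.
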
Also extend p1_gen.py MEM_OFFS with small positive offsets (2–6) so the
scope check can use `lb_*,*,2` directly; regenerated
`build/p1v2/aarch64/p1_aarch64.M1` (gitignored).
Doc: new Scoped labels section in docs/M1PP.md, updated Directives,
Limits, and Errors. docs/LIBP1PP.md's %fn / tagged-loop entries
rewritten to reflect scopes being available.
Also scrubs pre-existing phasing / section-number comments (Phase N,
§N, M1PP-EXT) from m1pp.P1 and every tests/m1pp fixture, for a cleaner
description of current behavior.
Diffstat:
23 files changed, 845 insertions(+), 51 deletions(-)
diff --git a/docs/LIBP1PP.md b/docs/LIBP1PP.md
@@ -208,13 +208,10 @@ explicit labels:
### Tagged loops: `%loop_tag`, `%while_tag_<cc>`, `%for_lt_tag`
-> **Planned migration.** When the M1PP scope feature
-> (`docs/M1PP-SCOPE.md`) lands, the tagged-loop family is retired in
-> favor of scoped equivalents (`%loop_scoped`, `%while_scoped_<cc>`,
-> `%for_lt_scoped`) paired with a generic `%break()` / `%continue()`.
-> The tagged forms remain in libp1pp v1 until scopes ship, so existing
-> callers keep working; new code should prefer the scoped forms once
-> they are available.
+> Tagged loops predate M1PP's `%scope` feature. They still work, but new
+> code should prefer the scoped equivalents (`%loop_scoped`,
+> `%while_scoped_<cc>`, `%for_lt_scoped`) paired with the generic
+> `%break()` / `%continue()` — no tag argument required.
M1PP's `@` local-label mechanism is scoped to the defining macro's body: an
`&@name` token passed to a macro through an argument is not stamped and
@@ -309,11 +306,11 @@ frame-local storage. Expands to:
- an `%eret()` epilogue,
- a matching `%endscope`.
-`%fn` is a scope-introducing-with-block macro in the sense defined by
-M1PP-SCOPE.md. It pushes the scope `name`. Any `%break()` /
-`%continue()` directly in `body` would target `name__end` / `name__top`
-— which `%fn` itself does not define, so those should only appear inside
-a nested scope-introducing loop.
+`%fn` is a scope-introducing-with-block macro: it pushes the scope
+`name` around `body`. Any `%break()` / `%continue()` directly in `body`
+would target `name__end` / `name__top` — which `%fn` itself does not
+define, so those should only appear inside a nested scope-introducing
+loop.
Example:
@@ -554,11 +551,6 @@ functions that have established a frame with `ENTER`.
The following were considered and deferred:
-- Untagged `%break()` / `%continue()` — these become possible once the
- M1PP scope feature (see `docs/M1PP-SCOPE.md`) lands; until then,
- callers use the `%break(tag)` / `%continue(tag)` tagged forms. `%fn`
- already specs against the scope feature and is a standing TODO until
- that feature is implemented.
- Field-access helpers such as `%ld_field` — `LD rd, [base + %S.f]` is
short enough.
- `printf`-style formatted output — replaced by dedicated `print_*` and
diff --git a/docs/M1PP.md b/docs/M1PP.md
@@ -66,6 +66,19 @@ Like `%struct` with stride 1 and a trailing `COUNT`:
- `%NAME.l1` → `0`, `%NAME.l2` → `1`, ...
- `%NAME.COUNT` → `N`
+### `%scope` / `%endscope`
+
+ %scope NAME
+ ... body ...
+ %endscope
+
+Pushes `NAME` onto a lexical scope stack active until the matching
+`%endscope`. Scopes nest. While the stack is non-empty, any `::name` or
+`&::name` token emitted from within is rewritten with the current scope
+path (see [Scoped labels](#scoped-labels)). Every `%scope` must be closed
+before end-of-input. `NAME` is a single `WORD` token and may come from
+macro-argument substitution.
+
## Macro calls
%NAME(arg, arg, ...)
@@ -98,6 +111,61 @@ fresh label namespace:
- `:@loop` → `:loop__7`
- `&@loop` → `&loop__7`
+Each macro expansion gets a fresh `N`, so `:@loop` labels at two different
+call sites (or in two different macros) never collide. Argument-substituted
+tokens keep their original text and are not rewritten, so a `:@name` literal
+passed as a macro argument passes through verbatim.
+
+## Scoped labels
+
+A `WORD` token whose text starts with `::` is a scoped label definition; a
+token starting with `&::` is a scoped reference. The `::` prefix is rewritten
+at **emit time** against the current `%scope` stack:
+
+- stack = `[parse_number]`: `::start` → `:parse_number__start`
+- stack = `[outer, inner]`: `&::end` → `&outer__inner__end`
+- stack empty: `::foo` → `:foo`; `&::bar` → `&bar` (pass-through)
+
+Because resolution is at emit time rather than macro-expansion time, a
+`::foo` token written inside a macro body resolves against whatever scope
+is active at the point the token flows to the output — i.e. the caller's
+surroundings, not the macro's own expansion id. This makes generic
+control-flow macros possible:
+
+ %macro loop_scoped(name, body)
+ %scope name
+ ::top
+ body
+ LA_BR &::top
+ B
+ ::end
+ %endscope
+ %endm
+
+ %macro break()
+ LA_BR &::end
+ B
+ %endm
+
+ %loop_scoped(scan, {
+ ...
+ %if_eqz(a0, { %break() })
+ ...
+ })
+
+Inside the expansion, `%loop_scoped` has pushed the scope `[scan]`, so
+when `%break()`'s `&::end` token is finally emitted the stack is `[scan]`
+and the output is `&scan__end` — exactly the label `%loop_scoped`
+defined at the bottom of its body. A nested `%loop_scoped(inner, { ... })`
+makes `[outer, inner]` the active stack, so a `%break()` inside the inner
+block targets the innermost scope. To jump past an intervening scope,
+write the concatenated name explicitly (`&outer__end`).
+
+Scoped labels and local (`:@` / `&@`) labels are independent and compose.
+A common pattern: use `:@` for the macro's private internal labels (the
+caller can never name them) and `::` for labels that are the macro's
+public contract with its caller (`::end`, `::top`, etc.).
+
## Built-in calls
These are recognized wherever a token matches, not only at line start.
@@ -165,6 +233,7 @@ Fixed at compile time:
| parameters per macro | 16 |
| stream stack depth | 64 |
| expression frames | 256 |
+| scope stack depth | 32 |
Exceeding any limit aborts with an error message on `stderr`.
@@ -176,4 +245,5 @@ Reasons are terse: `bad macro header`, `unterminated macro`,
`text overflow`, `token overflow`, `expansion overflow`, `output overflow`,
`stream overflow`, `unbalanced braces`, `too many args`, `too many macros`,
`bad integer`, `bad directive`, `unterminated directive`,
-`unterminated macro call`.
+`unterminated macro call`, `bad scope header`, `scope underflow`,
+`scope not closed`, `scope depth overflow`, `bad scope label`.
diff --git a/m1pp/m1pp.P1 b/m1pp/m1pp.P1
@@ -91,6 +91,11 @@ DEFINE M1PP_EXPR_FRAMES_CAP 0009000000000000
## Common cap used by macro params, call args, and expression args.
DEFINE M1PP_MAX_PARAMS 1000000000000000
+## Scope-stack cap. 32 nested scopes max; each slot is a 16-byte TextSpan
+## (ptr + len) pointing into stable text (input_buf or text_buf), so
+## scope_stack is 32 × 16 = 512 bytes.
+DEFINE M1PP_MAX_SCOPE_DEPTH 2000000000000000
+
## ExprOp codes (indexed by apply_expr_op).
DEFINE EXPR_ADD 0000000000000000
DEFINE EXPR_SUB 0100000000000000
@@ -768,6 +773,51 @@ DEFINE EXPR_INVALID 1200000000000000
la_br &emit_token_skip
beq_t0,t1
+ # Scope rewrite: TOK_WORD whose text begins with "::" (len>=2) becomes
+ # a scoped definition, "&::" (len>=3) a scoped reference. Dispatch to
+ # emit_scope_rewrite (a1=skip, a2=sigil), which rejects an empty name.
+ ld_a1,a0,0
+ li_a2 TOK_WORD
+ la_br &emit_token_after_scope
+ bne_a1,a2
+ ld_a2,a0,16
+ li_a3 %2 %0
+ la_br &emit_token_after_scope
+ blt_a2,a3
+ ld_a3,a0,8
+ lb_t0,a3,0
+ li_t1 %58 %0
+ la_br &emit_token_check_amp
+ bne_t0,t1
+ lb_t0,a3,1
+ li_t1 %58 %0
+ la_br &emit_token_after_scope
+ bne_t0,t1
+ li_a1 %2 %0
+ li_a2 %58 %0
+ la_br &emit_scope_rewrite
+ b
+:emit_token_check_amp
+ li_t1 %38 %0
+ la_br &emit_token_after_scope
+ bne_t0,t1
+ ld_a2,a0,16
+ li_t2 %3 %0
+ la_br &emit_token_after_scope
+ blt_a2,t2
+ lb_t0,a3,1
+ li_t1 %58 %0
+ la_br &emit_token_after_scope
+ bne_t0,t1
+ lb_t0,a3,2
+ la_br &emit_token_after_scope
+ bne_t0,t1
+ li_a1 %3 %0
+ li_a2 %38 %0
+ la_br &emit_scope_rewrite
+ b
+
+:emit_token_after_scope
# if (output_need_space) emit ' ' (skip the space for the first token on a line)
la_a1 &output_need_space
ld_t0,a1,0
@@ -825,6 +875,164 @@ DEFINE EXPR_INVALID 1200000000000000
:emit_token_skip
ret
+## emit_scope_rewrite: branch target from emit_token for tokens whose text
+## starts with "::" (scoped definition) or "&::" (scoped reference).
+## Writes sigil + scope1 + "__" + ... + scopeN + "__" + name directly to
+## output_buf; with an empty scope stack the middle collapses so output is
+## just sigil + name (pass-through). Not a callable function: reached by `b`,
+## shares emit_token's leaf return address, exits via `ret`.
+##
+## Register inputs:
+## a0 = tok_ptr
+## a1 = skip (2 for "::", 3 for "&::")
+## a2 = sigil (':' = 58 for definitions, '&' = 38 for references)
+:emit_scope_rewrite
+ # name_len = tok->text_len - skip; fail if zero.
+ ld_a3,a0,16
+ sub_a3,a3,a1
+ la_br &err_bad_scope_label
+ beqz_a3
+
+ # Spill inputs — the byte-copy loops below reuse a0..a3/t0..t2 freely.
+ la_t0 &sr_tok_ptr
+ st_a0,t0,0
+ la_t0 &sr_skip
+ st_a1,t0,0
+ la_t0 &sr_sigil
+ st_a2,t0,0
+ la_t0 &sr_name_len
+ st_a3,t0,0
+
+ # Emit leading ' ' if output_need_space.
+ la_a0 &output_need_space
+ ld_t0,a0,0
+ la_br &sr_post_space
+ beqz_t0
+ la_a1 &output_used
+ ld_t0,a1,0
+ li_t1 M1PP_OUTPUT_CAP
+ la_br &err_output_overflow
+ beq_t0,t1
+ la_a2 &output_buf
+ add_a2,a2,t0
+ li_t1 %32 %0
+ sb_t1,a2,0
+ addi_t0,t0,1
+ st_t0,a1,0
+:sr_post_space
+
+ # Emit the sigil byte.
+ la_a0 &output_used
+ ld_t0,a0,0
+ li_t1 M1PP_OUTPUT_CAP
+ la_br &err_output_overflow
+ beq_t0,t1
+ la_a1 &output_buf
+ add_a1,a1,t0
+ la_a2 &sr_sigil
+ ld_a3,a2,0
+ sb_a3,a1,0
+ addi_t0,t0,1
+ st_t0,a0,0
+
+ # Emit each scope frame's bytes followed by "__".
+ li_t0 %0 %0
+:sr_scope_outer
+ la_a0 &scope_depth
+ ld_a1,a0,0
+ la_br &sr_tail_start
+ beq_t0,a1
+
+ la_a0 &scope_stack
+ li_a2 %16 %0
+ mul_a2,a2,t0
+ add_a0,a0,a2
+ ld_a1,a0,0
+ ld_a2,a0,8
+ li_a3 %0 %0
+:sr_scope_inner
+ la_br &sr_scope_sep
+ beq_a3,a2
+ la_t1 &output_used
+ ld_t2,t1,0
+ li_a0 M1PP_OUTPUT_CAP
+ la_br &err_output_overflow
+ beq_t2,a0
+ la_a0 &output_buf
+ add_a0,a0,t2
+ add_t2,a1,a3
+ lb_t2,t2,0
+ sb_t2,a0,0
+ la_t1 &output_used
+ ld_t2,t1,0
+ addi_t2,t2,1
+ st_t2,t1,0
+ addi_a3,a3,1
+ la_br &sr_scope_inner
+ b
+:sr_scope_sep
+ la_a0 &output_used
+ ld_t1,a0,0
+ li_t2 M1PP_OUTPUT_CAP
+ la_br &err_output_overflow
+ beq_t1,t2
+ la_a1 &output_buf
+ add_a1,a1,t1
+ li_a2 %95 %0
+ sb_a2,a1,0
+ addi_t1,t1,1
+ st_t1,a0,0
+ la_a0 &output_used
+ ld_t1,a0,0
+ li_t2 M1PP_OUTPUT_CAP
+ la_br &err_output_overflow
+ beq_t1,t2
+ la_a1 &output_buf
+ add_a1,a1,t1
+ li_a2 %95 %0
+ sb_a2,a1,0
+ addi_t1,t1,1
+ st_t1,a0,0
+ addi_t0,t0,1
+ la_br &sr_scope_outer
+ b
+
+:sr_tail_start
+ la_a0 &sr_tok_ptr
+ ld_a1,a0,0
+ ld_a2,a1,8
+ la_a0 &sr_skip
+ ld_a3,a0,0
+ add_a1,a2,a3
+ la_a0 &sr_name_len
+ ld_a2,a0,0
+ li_a3 %0 %0
+:sr_tail_loop
+ la_br &sr_tail_done
+ beq_a3,a2
+ la_t1 &output_used
+ ld_t2,t1,0
+ li_a0 M1PP_OUTPUT_CAP
+ la_br &err_output_overflow
+ beq_t2,a0
+ la_a0 &output_buf
+ add_a0,a0,t2
+ add_t2,a1,a3
+ lb_t2,t2,0
+ sb_t2,a0,0
+ la_t1 &output_used
+ ld_t2,t1,0
+ addi_t2,t2,1
+ st_t2,t1,0
+ addi_a3,a3,1
+ la_br &sr_tail_loop
+ b
+:sr_tail_done
+ la_a0 &output_need_space
+ li_a1 %1 %0
+ st_a1,a0,0
+ ret
+
## --- Main processor ----------------------------------------------------------
## Stream-driven loop. Pushes source_tokens as the initial stream, then drives
## the streams[] stack until it empties. Per iteration: pop the stream if
@@ -952,7 +1160,7 @@ DEFINE EXPR_INVALID 1200000000000000
li_a2 %5 %0
la_br &tok_eq_const
call
- la_br &proc_check_newline
+ la_br &proc_check_scope
beqz_a0
# %enum matched: shim into define_fielded(stride=1, total="COUNT", len=5)
@@ -976,6 +1184,68 @@ DEFINE EXPR_INVALID 1200000000000000
la_br &proc_loop
b
+## ---- line_start && tok eq "%scope" ----
+:proc_check_scope
+ ld_t0,sp,8
+ mov_a0,t0
+ la_a1 &const_scope
+ li_a2 %6 %0
+ la_br &tok_eq_const
+ call
+ la_br &proc_check_endscope
+ beqz_a0
+
+ # %scope matched: shim into push_scope(stream_end).
+ ld_t0,sp,8
+ la_a0 &proc_pos
+ st_t0,a0,0
+ la_a0 &proc_line_start
+ li_a1 %1 %0
+ st_a1,a0,0
+ ld_a0,sp,0
+ ld_a0,a0,8
+ la_br &push_scope
+ call
+ ld_a0,sp,0
+ la_a1 &proc_pos
+ ld_t0,a1,0
+ st_t0,a0,16
+ li_t1 %1 %0
+ st_t1,a0,24
+ la_br &proc_loop
+ b
+
+## ---- line_start && tok eq "%endscope" ----
+:proc_check_endscope
+ ld_t0,sp,8
+ mov_a0,t0
+ la_a1 &const_endscope
+ li_a2 %9 %0
+ la_br &tok_eq_const
+ call
+ la_br &proc_check_newline
+ beqz_a0
+
+ # %endscope matched: shim into pop_scope(stream_end).
+ ld_t0,sp,8
+ la_a0 &proc_pos
+ st_t0,a0,0
+ la_a0 &proc_line_start
+ li_a1 %1 %0
+ st_a1,a0,0
+ ld_a0,sp,0
+ ld_a0,a0,8
+ la_br &pop_scope
+ call
+ ld_a0,sp,0
+ la_a1 &proc_pos
+ ld_t0,a1,0
+ st_t0,a0,16
+ li_t1 %1 %0
+ st_t1,a0,24
+ la_br &proc_loop
+ b
+
:proc_check_newline
# reload s, tok
ld_a0,sp,0
@@ -1073,7 +1343,7 @@ DEFINE EXPR_INVALID 1200000000000000
:proc_check_macro
# macro = find_macro(tok); if non-zero AND
# ((tok+1 < s->end AND (tok+1)->kind == TOK_LPAREN) OR macro->param_count == 0)
- # then expand_call. (§4 paren-less 0-arg calls.)
+ # then expand_call. Paren-less form is reserved for 0-arg macros.
ld_a0,sp,8
la_br &find_macro
call
@@ -1132,6 +1402,113 @@ DEFINE EXPR_INVALID 1200000000000000
b
:proc_done
+ # Every %scope must be matched by an %endscope before EOF.
+ la_a0 &scope_depth
+ ld_t0,a0,0
+ la_br &err_scope_not_closed
+ bnez_t0
+ eret
+
+## --- %scope / %endscope handlers --------------------------------------------
+## Called with proc_pos pointing at the `%scope` / `%endscope` word at line start.
+## Input: a0 = stream end (pointer one past last token in the current stream).
+## Output: proc_pos advanced past the trailing newline (or stream end).
+
+## push_scope(a0 = stream_end): consume `%scope NAME\n`.
+## Name must be a single WORD token; anything else on the line is an error.
+:push_scope
+ enter_0
+
+ # proc_pos += 24 (skip past the `%scope` token).
+ la_t0 &proc_pos
+ ld_t1,t0,0
+ addi_t1,t1,24
+ st_t1,t0,0
+
+ # Require a WORD name token within the stream.
+ la_br &err_bad_scope_header
+ beq_t1,a0
+ ld_t2,t1,0
+ la_br &err_bad_scope_header
+ bnez_t2
+
+ # scope_depth < MAX_SCOPE_DEPTH?
+ la_a1 &scope_depth
+ ld_a2,a1,0
+ li_a3 M1PP_MAX_SCOPE_DEPTH
+ la_br &err_scope_depth_overflow
+ beq_a2,a3
+
+ # scope_stack[scope_depth] = (name.text_ptr, name.text_len)
+ la_a3 &scope_stack
+ li_t0 %16 %0
+ mul_t0,t0,a2
+ add_a3,a3,t0
+ ld_t0,t1,8
+ st_t0,a3,0
+ ld_t0,t1,16
+ st_t0,a3,8
+
+ # scope_depth++
+ addi_a2,a2,1
+ st_a2,a1,0
+
+ # proc_pos += 24 (past the name).
+ la_t0 &proc_pos
+ ld_t1,t0,0
+ addi_t1,t1,24
+ st_t1,t0,0
+
+ # EOF here is tolerated (caller handles stream end). Otherwise the next
+ # token must be TOK_NEWLINE — anything else is a header error.
+ la_br &psc_done
+ beq_t1,a0
+ ld_t2,t1,0
+ li_t0 TOK_NEWLINE
+ la_br &err_bad_scope_header
+ bne_t2,t0
+ addi_t1,t1,24
+ la_t0 &proc_pos
+ st_t1,t0,0
+:psc_done
+ eret
+
+## pop_scope(a0 = stream_end): consume `%endscope\n`. Extra tokens on the line
+## are tolerated (matches %endm's behavior) — skip to the next newline.
+:pop_scope
+ enter_0
+
+ # scope_depth > 0?
+ la_a1 &scope_depth
+ ld_a2,a1,0
+ la_br &err_scope_underflow
+ beqz_a2
+ addi_a2,a2,neg1
+ st_a2,a1,0
+
+ # proc_pos += 24 (past the `%endscope` token).
+ la_t0 &proc_pos
+ ld_t1,t0,0
+ addi_t1,t1,24
+ st_t1,t0,0
+
+:pop_skip_loop
+ la_br &pop_done
+ beq_t1,a0
+ ld_t2,t1,0
+ li_t0 TOK_NEWLINE
+ la_br &pop_consume_newline
+ beq_t2,t0
+ addi_t1,t1,24
+ la_t0 &proc_pos
+ st_t1,t0,0
+ la_br &pop_skip_loop
+ b
+:pop_consume_newline
+ addi_t1,t1,24
+ la_t0 &proc_pos
+ st_t1,t0,0
+:pop_done
eret
## --- %macro storage: parse header + body into macros[] / macro_body_tokens --
@@ -2554,7 +2931,7 @@ DEFINE EXPR_INVALID 1200000000000000
# lparen = call_tok + 24
addi_a0,a0,24
- # Branch split (§4 paren-less 0-arg calls):
+ # Branch split for paren-less 0-arg calls:
# if lparen < limit AND lparen->kind == TOK_LPAREN: parse_args as usual.
# else if macro->param_count == 0: synthesize empty arg list, no parse_args.
# else: fatal "bad macro call".
@@ -4272,7 +4649,7 @@ DEFINE EXPR_INVALID 1200000000000000
la_br &eea_int_atom
beqz_a0
- # §4 paren-less 0-arg atom:
+ # Paren-less 0-arg atom:
# Take the macro-call branch if (tok+1 < limit AND (tok+1)->kind == TOK_LPAREN)
# OR macro->param_count == 0. Otherwise fall through to int atom (unchanged).
ld_t0,sp,0
@@ -5321,6 +5698,31 @@ DEFINE EXPR_INVALID 1200000000000000
li_a1 %36 %0
la_br &fatal
b
+:err_bad_scope_header
+ la_a0 &msg_bad_scope_header
+ li_a1 %16 %0
+ la_br &fatal
+ b
+:err_scope_depth_overflow
+ la_a0 &msg_scope_depth_overflow
+ li_a1 %20 %0
+ la_br &fatal
+ b
+:err_scope_underflow
+ la_a0 &msg_scope_underflow
+ li_a1 %15 %0
+ la_br &fatal
+ b
+:err_scope_not_closed
+ la_a0 &msg_scope_not_closed
+ li_a1 %16 %0
+ la_br &fatal
+ b
+:err_bad_scope_label
+ la_a0 &msg_bad_scope_label
+ li_a1 %15 %0
+ la_br &fatal
+ b
## fatal(a0=msg_ptr, a1=msg_len): writes "m1pp: <msg>\n" to stderr, exits 1.
## Saves args across the three syscalls since a0..a3 are caller-saved.
@@ -5379,6 +5781,8 @@ DEFINE EXPR_INVALID 1200000000000000
:const_enum "%enum"
:const_size "SIZE"
:const_count "COUNT"
+:const_scope "%scope"
+:const_endscope "%endscope"
## Operator strings for expr_op_code. Each is a raw byte literal; lengths
## are passed separately to tok_eq_const. "<=" must be tested before "<"
@@ -5425,6 +5829,11 @@ DEFINE EXPR_INVALID 1200000000000000
:msg_unbalanced_braces "unbalanced braces"
:msg_bad_directive "bad %struct/%enum directive"
:msg_unterminated_directive "unterminated %struct/%enum directive"
+:msg_bad_scope_header "bad scope header"
+:msg_scope_depth_overflow "scope depth overflow"
+:msg_scope_underflow "scope underflow"
+:msg_scope_not_closed "scope not closed"
+:msg_bad_scope_label "bad scope label"
## --- BSS ---------------------------------------------------------------------
## Placed before :ELF_end so filesz/memsz (which this ELF header sets equal)
@@ -5580,7 +5989,7 @@ ZERO8
:emt_body_start
ZERO8
-## Local-label rewrite (§1). next_expansion_id is the monotonic counter
+## Local-label rewrite. next_expansion_id is the monotonic counter
## (never reset); emt_expansion_id snapshots it at the start of each
## expand_macro_tokens call so nested-call BSS reuse is safe.
## ll_* slots hold body-token span + derived sizes while building the
@@ -5614,7 +6023,27 @@ ZERO8 ZERO8 ZERO8
:local_label_scratch
ZERO32 ZERO32 ZERO32 ZERO32
-## %struct / %enum scratch (§5, §6). define_fielded calls append_text twice
+## --- Scope-stack rewrite -----------------------------------------------------
+## scope_depth: current depth (0..32).
+## scope_stack: 32 × TextSpan (16 bytes each) = 512 bytes. Each slot is
+## (text_ptr, text_len) pointing into stable text memory (input_buf or
+## text_buf — both append-only), so names are borrowed without copying.
+## sr_* slots hold emit_scope_rewrite's inputs across the byte-copy loops.
+:scope_depth
+ZERO8
+:scope_stack
+ZERO32 ZERO32 ZERO32 ZERO32 ZERO32 ZERO32 ZERO32 ZERO32
+ZERO32 ZERO32 ZERO32 ZERO32 ZERO32 ZERO32 ZERO32 ZERO32
+:sr_tok_ptr
+ZERO8
+:sr_skip
+ZERO8
+:sr_sigil
+ZERO8
+:sr_name_len
+ZERO8
+
+## %struct / %enum scratch. define_fielded calls append_text twice
## per synthesized macro, so every piece of state that must survive a call
## lives here rather than in a register.
## df_stride — 8 for %struct, 1 for %enum
diff --git a/m1pp/m1pp.c b/m1pp/m1pp.c
@@ -77,6 +77,7 @@
#define MAX_EXPAND 65536
#define MAX_STACK 64
#define MAX_EXPR_FRAMES 256
+#define MAX_SCOPE_DEPTH 32
enum {
TOK_WORD,
@@ -158,6 +159,7 @@ static struct Token macro_body_tokens[MAX_MACRO_BODY_TOKENS];
static struct Token expand_pool[MAX_EXPAND];
static struct Macro macros[MAX_MACROS];
static struct Stream streams[MAX_STACK];
+static struct TextSpan scope_stack[MAX_SCOPE_DEPTH];
static int text_used;
static int source_count;
@@ -168,6 +170,7 @@ static int output_used;
static int output_need_space;
static int stream_top;
static int next_expansion_id;
+static int scope_depth;
static struct Token *arg_starts[MAX_PARAMS];
static struct Token *arg_ends[MAX_PARAMS];
@@ -400,11 +403,68 @@ static int emit_newline(void)
return 1;
}
+static int emit_scoped_label(const struct Token *tok, int skip, char sigil)
+{
+ /* Rewrite `::name` or `&::name` against the current scope stack.
+ * skip is the number of leading chars to drop (`::` -> 2, `&::` -> 3);
+ * sigil is the single-char prefix to emit (`:` for definitions, `&`
+ * for references). With a non-empty scope stack the output is
+ * sigil + scope1 + "__" + ... + scopeN + "__" + name; with an empty
+ * stack it degrades to sigil + name (pass-through). */
+ int name_len = tok->text.len - skip;
+ int i;
+
+ if (name_len <= 0) {
+ return fail("bad scope label");
+ }
+
+ if (output_need_space) {
+ if (output_used + 1 >= MAX_OUTPUT) {
+ return fail("output overflow");
+ }
+ output_buf[output_used++] = ' ';
+ }
+
+ if (output_used + 1 >= MAX_OUTPUT) {
+ return fail("output overflow");
+ }
+ output_buf[output_used++] = sigil;
+
+ for (i = 0; i < scope_depth; i++) {
+ int span_len = scope_stack[i].len;
+ if (output_used + span_len + 2 >= MAX_OUTPUT) {
+ return fail("output overflow");
+ }
+ memcpy(output_buf + output_used, scope_stack[i].ptr,
+ (size_t)span_len);
+ output_used += span_len;
+ output_buf[output_used++] = '_';
+ output_buf[output_used++] = '_';
+ }
+
+ if (output_used + name_len >= MAX_OUTPUT) {
+ return fail("output overflow");
+ }
+ memcpy(output_buf + output_used, tok->text.ptr + skip, (size_t)name_len);
+ output_used += name_len;
+ output_need_space = 1;
+ return 1;
+}
+
static int emit_token(const struct Token *tok)
{
if (tok->kind == TOK_LBRACE || tok->kind == TOK_RBRACE) {
return 1;
}
+ if (tok->kind == TOK_WORD && tok->text.len >= 2 &&
+ tok->text.ptr[0] == ':' && tok->text.ptr[1] == ':') {
+ return emit_scoped_label(tok, 2, ':');
+ }
+ if (tok->kind == TOK_WORD && tok->text.len >= 3 &&
+ tok->text.ptr[0] == '&' &&
+ tok->text.ptr[1] == ':' && tok->text.ptr[2] == ':') {
+ return emit_scoped_label(tok, 3, '&');
+ }
if (output_need_space) {
if (output_used + 1 >= MAX_OUTPUT) {
return fail("output overflow");
@@ -1542,6 +1602,44 @@ static int expand_call(struct Stream *s, const struct Macro *macro)
return push_pool_stream_from_mark(mark);
}
+static int push_scope(struct Stream *s)
+{
+ s->pos++;
+ if (s->pos >= s->end || s->pos->kind != TOK_WORD) {
+ return fail("bad scope header");
+ }
+ if (scope_depth >= MAX_SCOPE_DEPTH) {
+ return fail("scope depth overflow");
+ }
+ scope_stack[scope_depth++] = s->pos->text;
+ s->pos++;
+ if (s->pos < s->end && s->pos->kind != TOK_NEWLINE) {
+ return fail("bad scope header");
+ }
+ if (s->pos < s->end) {
+ s->pos++;
+ }
+ s->line_start = 1;
+ return 1;
+}
+
+static int pop_scope(struct Stream *s)
+{
+ s->pos++;
+ if (scope_depth <= 0) {
+ return fail("scope underflow");
+ }
+ scope_depth--;
+ while (s->pos < s->end && s->pos->kind != TOK_NEWLINE) {
+ s->pos++;
+ }
+ if (s->pos < s->end) {
+ s->pos++;
+ }
+ s->line_start = 1;
+ return 1;
+}
+
static int process_tokens(void)
{
if (!push_stream_span((struct TokenSpan){source_tokens, source_tokens + source_count}, -1)) {
@@ -1591,6 +1689,24 @@ static int process_tokens(void)
continue;
}
+ if (s->line_start &&
+ tok->kind == TOK_WORD &&
+ token_text_eq(tok, "%scope")) {
+ if (!push_scope(s)) {
+ return 0;
+ }
+ continue;
+ }
+
+ if (s->line_start &&
+ tok->kind == TOK_WORD &&
+ token_text_eq(tok, "%endscope")) {
+ if (!pop_scope(s)) {
+ return 0;
+ }
+ continue;
+ }
+
if (tok->kind == TOK_NEWLINE) {
s->pos++;
s->line_start = 1;
@@ -1632,6 +1748,10 @@ static int process_tokens(void)
}
}
+ if (scope_depth != 0) {
+ return fail("scope not closed");
+ }
+
if (output_used >= MAX_OUTPUT) {
return fail("output overflow");
}
diff --git a/p1/p1_gen.py b/p1/p1_gen.py
@@ -67,8 +67,8 @@ LOGI_IMMS = (
SHIFT_IMMS = tuple(range(64))
MEM_OFFS = (
- -256, -128, -64, -48, -32, -24, -16, -8, -1, 0, 1, 7, 8, 15, 16, 24, 32,
- 40, 48, 56, 64, 128, 255,
+ -256, -128, -64, -48, -32, -24, -16, -8, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8,
+ 15, 16, 24, 32, 40, 48, 56, 64, 128, 255,
)
LDARG_SLOTS = tuple(range(32))
diff --git a/tests/m1pp/00-hello.M1 b/tests/m1pp/00-hello.M1
@@ -1,9 +1,8 @@
-## Phase 0 smoke fixture: P1v2 hello-world.
+## P1v2 hello-world smoke fixture.
##
-## Proves that the m1pp build pipeline (lint -> prune -> catm -> M0 -> ELF
-## link -> hex2-0) works against build/p1v2/aarch64/p1_aarch64.M1.
-## Independent of m1pp/m1pp.M1's current state so Phase 0 can land before
-## Phase 1.
+## Exercises the build pipeline (lint -> prune -> catm -> M0 -> ELF link ->
+## hex2-0) against build/p1v2/aarch64/p1_aarch64.M1. Standalone program —
+## does not drive the m1pp expander.
##
## P1v2 syscall ABI:
## a0 = syscall number on entry, return value on exit
diff --git a/tests/m1pp/01-passthrough.M1pp b/tests/m1pp/01-passthrough.M1pp
@@ -1,4 +1,4 @@
-## Phase 1 parity fixture: tokenizer + pass-through + structural %macro skip.
+## Pass-through fixture: tokenizer + structural %macro skip.
## No macro calls, no ## paste, no !@%$ or %select. The m1pp expander must
## match the C oracle byte-for-byte on this input.
diff --git a/tests/m1pp/01-passthrough.expected b/tests/m1pp/01-passthrough.expected
@@ -1,4 +1,4 @@
-## Phase 1 parity fixture: tokenizer + pass-through + structural %macro skip.
+## Pass-through fixture: tokenizer + structural %macro skip.
## No macro calls , no ## paste , no !@%$ or %select. The m1pp expander must
## match the C oracle byte-for-byte on this input.
diff --git a/tests/m1pp/02-defs.M1pp b/tests/m1pp/02-defs.M1pp
@@ -1,5 +1,5 @@
-## Phase 2 parity fixture: %macro definitions are stored, not invoked.
-## Defs produce no output; non-def tokens pass through as in Phase 1.
+## %macro definitions are stored, not invoked.
+## Defs produce no output; non-def tokens pass through unchanged.
## Exercises: 0/1/many params, multi-token bodies, string body tokens,
## body-internal ## paste, %macro-looking words mid-line, empty body.
before
diff --git a/tests/m1pp/02-defs.expected b/tests/m1pp/02-defs.expected
@@ -1,4 +1,4 @@
-## Phase 2 parity fixture: %macro definitions are stored , not invoked.
+## %macro definitions are stored , not invoked.
## Defs produce no output
## Exercises: 0/1/many params , multi-token bodies , string body tokens ,
## body-internal ## paste , %macro-looking words mid-line , empty body.
diff --git a/tests/m1pp/03-builtins.M1pp b/tests/m1pp/03-builtins.M1pp
@@ -1,4 +1,4 @@
-# Phase 8 parity: each of !(1B), @(2B), %(4B), $(8B) emits little-endian
+# Integer-emission builtins: !(1B), @(2B), %(4B), $(8B) emit little-endian
# uppercase hex of (2 * size) chars. Exercises each size at:
# - small literal so byte-order is observable
# - hex literal that fills the slot exactly
diff --git a/tests/m1pp/06-paste.M1pp b/tests/m1pp/06-paste.M1pp
@@ -1,4 +1,4 @@
-# Phase 6 paste compaction.
+# `##` paste compaction inside macro bodies.
# - param ## param (basic)
# - literal ## param / param ## literal
# - chain: a ## b ## c (paste compactor processes left-to-right)
diff --git a/tests/m1pp/11-local-labels.M1pp b/tests/m1pp/11-local-labels.M1pp
@@ -1,8 +1,8 @@
-# Local labels (§1): `:@name` / `&@name` inside macro bodies rewrite to
+# Local labels: `:@name` / `&@name` inside macro bodies rewrite to
# `:name__NN` / `&name__NN` where NN is a fresh monotonic id per expansion.
# Scoping: body-native only; param-substituted tokens pass through untouched.
#
-# Scenarios:
+# Covers:
# 1) a single macro using `:@end` called twice -> end__1, end__2 distinct
# 2) nested macros each using `:@done` -> outer/inner get separate ids
# 3) `&@label` address form rewrites the same way
diff --git a/tests/m1pp/12-braced-args.M1pp b/tests/m1pp/12-braced-args.M1pp
@@ -1,4 +1,4 @@
-# Braced block arguments (§2 of M1PP-EXT):
+# Braced block arguments:
# - { ... } groups tokens into one arg, protecting commas inside
# - outer { ... } is stripped when the arg span begins with LBRACE and
# ends with its matching RBRACE
diff --git a/tests/m1pp/13-parenless.M1pp b/tests/m1pp/13-parenless.M1pp
@@ -1,4 +1,4 @@
-# Paren-less 0-arg macro calls (M1PP-EXT §4):
+# Paren-less 0-arg macro calls:
# - a zero-param macro invoked without trailing () expands the same as with ()
# - applies at top level and as an atom inside %(...) expressions
# - non-zero-param macros still require their (arg, ...) syntax — %add1
diff --git a/tests/m1pp/14-str-builtin.M1pp b/tests/m1pp/14-str-builtin.M1pp
@@ -1,4 +1,4 @@
-# Phase 14 %str stringification builtin.
+# %str stringification builtin.
# - %str(IDENT) wraps the identifier text in double quotes
# - result is a TOK_STRING, byte-identical to a hand-written literal
diff --git a/tests/m1pp/14-str-paste.M1pp b/tests/m1pp/14-str-paste.M1pp
@@ -1,4 +1,4 @@
-# Phase 14 paste + stringify.
+# `##` paste + `%str` stringify composed on the same identifier.
# - `##` joins word fragments: str_##n -> str_quote (TOK_WORD).
# - `%str(n)` wraps the same identifier in quotes (TOK_STRING).
# - Complementary operators: paste builds the label, %str builds the literal.
diff --git a/tests/m1pp/15-struct.M1pp b/tests/m1pp/15-struct.M1pp
@@ -1,8 +1,8 @@
-# %struct directive (M1PP-EXT §5):
+# %struct directive:
# - %struct NAME { f1 f2 ... } synthesizes N+1 zero-parameter macros:
# NAME.field_k -> k*8 (decimal word)
# NAME.SIZE -> N*8
-# - paren-less access (§4) is the natural read form: %closure.body
+# - paren-less access is the natural read form: %closure.body
# - composes via a plain wrapper macro using %frame_hdr.SIZE for stack-
# frame layouts
@@ -15,7 +15,7 @@
%closure.env
%closure.SIZE
-# With parens still works (§4 parity).
+# With parens still works.
%closure.body()
# Inside an expression atom: loads 16+100 = 116 -> 0x74.
diff --git a/tests/m1pp/16-enum.M1pp b/tests/m1pp/16-enum.M1pp
@@ -1,4 +1,4 @@
-# %enum directive (M1PP-EXT §6):
+# %enum directive:
# - %enum NAME { l1 l2 ... } synthesizes N+1 zero-parameter macros:
# NAME.label_k -> k
# NAME.COUNT -> N
diff --git a/tests/m1pp/17-scopes.M1pp b/tests/m1pp/17-scopes.M1pp
@@ -0,0 +1,93 @@
+# Lexical scopes: `::name` / `&::name` rewrite at emit time against the
+# current scope stack, joined by `__`. Resolution is anaphoric: a `::foo`
+# token inside a macro body resolves against the caller's scope, not the
+# macro's expansion id.
+#
+# Covers:
+# - empty-stack pass-through: `::foo` -> `:foo`, `&::bar` -> `&bar`
+# - basic scope: `::start` -> `:parse_number__start`
+# - nested scopes: `::a` under [outer, inner] -> `:outer__inner__a`
+# - scope name from a macro argument (loop_scoped pattern)
+# - anaphoric macros: %break / %continue reach the caller's innermost scope
+# - nested scope-introducing macros: innermost scope wins for %break
+# - hygienic `:@` and anaphoric `::` coexist in one macro
+
+%macro loop_scoped(name, body)
+%scope name
+::top
+body
+LA_BR &::top
+B
+::end
+%endscope
+%endm
+
+%macro break()
+LA_BR &::end
+B
+%endm
+
+%macro continue()
+LA_BR &::top
+B
+%endm
+
+%macro while_scoped_nez(name, ra, body)
+%scope name
+LA_BR &@top
+B
+:@body
+body
+:@top
+LA_BR &@body
+BNEZ ra
+::end
+%endscope
+%endm
+
+# Empty-stack pass-through.
+::foo
+&::bar
+
+# Basic scope.
+%scope parse_number
+::start
+LA_BR &::done
+BEQZ a0
+::done
+ERET
+%endscope
+
+# Nested scopes.
+%scope outer
+::before
+%scope inner
+::a
+LA_BR &::a
+B
+%endscope
+::after
+%endscope
+
+# Scope name from macro arg; anaphoric %break / %continue.
+%loop_scoped(scan, {
+LI t0, 0
+%break()
+%continue()
+})
+
+# Nested scope-introducing macro intercepts inner %break.
+%loop_scoped(outer, {
+%loop_scoped(inner, {
+%break()
+})
+%break()
+})
+
+# Hygienic `@` and anaphoric `::` together.
+%while_scoped_nez(retry, a0, {
+LI t1, 1
+%break()
+})
+
+END
diff --git a/tests/m1pp/17-scopes.expected b/tests/m1pp/17-scopes.expected
@@ -0,0 +1,91 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+:foo
+&bar
+
+
+:parse_number__start
+LA_BR &parse_number__done
+BEQZ a0
+:parse_number__done
+ERET
+
+
+:outer__before
+:outer__inner__a
+LA_BR &outer__inner__a
+B
+:outer__after
+
+
+:scan__top
+
+LI t0 , 0
+LA_BR &scan__end
+B
+
+LA_BR &scan__top
+B
+
+
+LA_BR &scan__top
+B
+:scan__end
+
+
+
+:outer__top
+
+:outer__inner__top
+
+LA_BR &outer__inner__end
+B
+
+
+LA_BR &outer__inner__top
+B
+:outer__inner__end
+
+LA_BR &outer__end
+B
+
+
+LA_BR &outer__top
+B
+:outer__end
+
+
+
+LA_BR &top__8
+B
+:body__8
+
+LI t1 , 1
+LA_BR &retry__end
+B
+
+
+:top__8
+LA_BR &body__8
+BNEZ a0
+:retry__end
+
+
+END
diff --git a/tests/m1pp/_12-braced-malformed.M1pp b/tests/m1pp/_12-braced-malformed.M1pp
@@ -5,8 +5,8 @@
# reports "unbalanced braces".
#
# No `.expected` file is needed — the leading underscore in the filename
-# causes m1pp/test.sh to skip this fixture. It is verified manually via the
-# verification block in the §2 implementation notes.
+# causes m1pp/test.sh to skip this fixture. Run by hand to observe the
+# non-zero exit with "unbalanced braces".
%macro F(a, b)
a b
diff --git a/tests/m1pp/_14-str-malformed.M1pp b/tests/m1pp/_14-str-malformed.M1pp
@@ -1,4 +1,4 @@
-# Phase 14 %str malformed input.
+# %str malformed input.
# - Underscore-prefix => skipped by test.sh.
# - Expected outcome: m1pp exits non-zero.
# - %str takes exactly one single-token WORD argument. A multi-token