boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

commit e2b9951654ab6ba3250b936f286753bb0ca5e274
parent 87e3956d3b23b7ef1e537e75fc304782bd3df6bb
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Fri, 24 Apr 2026 11:56:04 -0700

Add %scope / %endscope and :: label rewrite to M1PP

Adds a lexical scope stack driven by `%scope NAME` / `%endscope`. A WORD
token `::name` is emitted as `:scope1__..__scopeN__name`, and `&::name`
as the reference form `&scope1__..__scopeN__name`. With an empty stack
the token passes through as `:name` / `&name`. Resolution happens at emit
time, so `::foo` inside a macro body resolves against the scope active at
the call site rather than the macro's own expansion id, which makes
generic `%break()` / `%continue()` possible.
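
For readers who want the rewrite rule in executable form, here is a
minimal Python sketch of it. It is not the m1pp source: the names
`rewrite_scoped` and `expand` are hypothetical, tokens are split on
whitespace, and macro expansion is left out entirely. The error strings
and the depth limit of 32 are taken from this commit's doc changes.

```python
# Hypothetical model of the emit-time `::` rewrite; not the m1pp source.
MAX_SCOPE_DEPTH = 32  # scope stack depth limit documented in docs/M1PP.md


def rewrite_scoped(token, scope_stack):
    """Return the text emitted for one WORD token under scope_stack."""
    if token.startswith("&::"):
        sigil, name = "&", token[3:]   # scoped reference
    elif token.startswith("::"):
        sigil, name = ":", token[2:]   # scoped definition
    else:
        return token                   # not a scoped label: unchanged
    if not name:
        # Mirrors the C reference, which rejects an empty name.
        raise ValueError("bad scope label")
    # Every frame is followed by "__"; an empty stack contributes nothing,
    # which gives the pass-through behaviour.
    prefix = "".join(frame + "__" for frame in scope_stack)
    return sigil + prefix + name


def expand(lines):
    """Apply %scope / %endscope and the :: rewrite to whitespace-split lines."""
    stack, out = [], []
    for line in lines:
        words = line.split()
        if words and words[0] == "%scope":
            if len(words) != 2:
                raise ValueError("bad scope header")
            if len(stack) >= MAX_SCOPE_DEPTH:
                raise ValueError("scope depth overflow")
            stack.append(words[1])
        elif words and words[0] == "%endscope":
            if not stack:
                raise ValueError("scope underflow")
            stack.pop()
        else:
            out.append(" ".join(rewrite_scoped(w, stack) for w in words))
    if stack:
        raise ValueError("scope not closed")
    return out


if __name__ == "__main__":
    src = ["::foo", "%scope outer", "::before", "%scope inner",
           "LA_BR &::a", "%endscope", "::after", "%endscope"]
    for emitted in expand(src):
        print(emitted)
```

Because the stack is consulted only when a token is emitted, a token that
originates in a macro body sees whatever `%scope` is open around the call;
the sketch models that by rewriting each word against the stack as it
stands on that line.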

Implemented in both `m1pp/m1pp.c` (reference) and `m1pp/m1pp.P1`
(self-hosted). New tests/m1pp/17-scopes covers empty-stack pass-through,
nested scopes, scope name from a macro argument, anaphoric break /
continue, innermost-wins for nested scope-introducing macros, and
`:@` + `::` coexisting in one macro.

Also extend p1_gen.py MEM_OFFS with small positive offsets (2–6) so the
scope check can use `lb_*,*,2` directly; regenerated
`build/p1v2/aarch64/p1_aarch64.M1` (gitignored).

Doc: new Scoped labels section in docs/M1PP.md, updated Directives,
Limits, and Errors. docs/LIBP1PP.md's %fn / tagged-loop entries
rewritten to reflect scopes being available.

Also scrubs pre-existing phasing / section-number comments (Phase N,
§N, M1PP-EXT) from m1pp.P1 and every tests/m1pp fixture, for a cleaner
description of current behavior.

Diffstat:
M docs/LIBP1PP.md | 26 +++++++++-----------------
M docs/M1PP.md | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
M m1pp/m1pp.P1 | 441 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
M m1pp/m1pp.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M p1/p1_gen.py | 4 ++--
M tests/m1pp/00-hello.M1 | 9 ++++-----
M tests/m1pp/01-passthrough.M1pp | 2 +-
M tests/m1pp/01-passthrough.expected | 2 +-
M tests/m1pp/02-defs.M1pp | 4 ++--
M tests/m1pp/02-defs.expected | 2 +-
M tests/m1pp/03-builtins.M1pp | 2 +-
M tests/m1pp/06-paste.M1pp | 2 +-
M tests/m1pp/11-local-labels.M1pp | 4 ++--
M tests/m1pp/12-braced-args.M1pp | 2 +-
M tests/m1pp/13-parenless.M1pp | 2 +-
M tests/m1pp/14-str-builtin.M1pp | 2 +-
M tests/m1pp/14-str-paste.M1pp | 2 +-
M tests/m1pp/15-struct.M1pp | 6 +++---
M tests/m1pp/16-enum.M1pp | 2 +-
A tests/m1pp/17-scopes.M1pp | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A tests/m1pp/17-scopes.expected | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M tests/m1pp/_12-braced-malformed.M1pp | 4 ++--
M tests/m1pp/_14-str-malformed.M1pp | 2 +-
23 files changed, 845 insertions(+), 51 deletions(-)

diff --git a/docs/LIBP1PP.md b/docs/LIBP1PP.md @@ -208,13 +208,10 @@ explicit labels: ### Tagged loops: `%loop_tag`, `%while_tag_<cc>`, `%for_lt_tag` -> **Planned migration.** When the M1PP scope feature -> (`docs/M1PP-SCOPE.md`) lands, the tagged-loop family is retired in -> favor of scoped equivalents (`%loop_scoped`, `%while_scoped_<cc>`, -> `%for_lt_scoped`) paired with a generic `%break()` / `%continue()`. -> The tagged forms remain in libp1pp v1 until scopes ship, so existing -> callers keep working; new code should prefer the scoped forms once -> they are available. +> Tagged loops predate M1PP's `%scope` feature. They still work, but new +> code should prefer the scoped equivalents (`%loop_scoped`, +> `%while_scoped_<cc>`, `%for_lt_scoped`) paired with the generic +> `%break()` / `%continue()` — no tag argument required. M1PP's `@` local-label mechanism is scoped to the defining macro's body: an `&@name` token passed to a macro through an argument is not stamped and @@ -309,11 +306,11 @@ frame-local storage. Expands to: - an `%eret()` epilogue, - a matching `%endscope`. -`%fn` is a scope-introducing-with-block macro in the sense defined by -M1PP-SCOPE.md. It pushes the scope `name`. Any `%break()` / -`%continue()` directly in `body` would target `name__end` / `name__top` -— which `%fn` itself does not define, so those should only appear inside -a nested scope-introducing loop. +`%fn` is a scope-introducing-with-block macro: it pushes the scope +`name` around `body`. Any `%break()` / `%continue()` directly in `body` +would target `name__end` / `name__top` — which `%fn` itself does not +define, so those should only appear inside a nested scope-introducing +loop. Example: @@ -554,11 +551,6 @@ functions that have established a frame with `ENTER`. 
The following were considered and deferred: -- Untagged `%break()` / `%continue()` — these become possible once the - M1PP scope feature (see `docs/M1PP-SCOPE.md`) lands; until then, - callers use the `%break(tag)` / `%continue(tag)` tagged forms. `%fn` - already specs against the scope feature and is a standing TODO until - that feature is implemented. - Field-access helpers such as `%ld_field` — `LD rd, [base + %S.f]` is short enough. - `printf`-style formatted output — replaced by dedicated `print_*` and diff --git a/docs/M1PP.md b/docs/M1PP.md @@ -66,6 +66,19 @@ Like `%struct` with stride 1 and a trailing `COUNT`: - `%NAME.l1` → `0`, `%NAME.l2` → `1`, ... - `%NAME.COUNT` → `N` +### `%scope` / `%endscope` + + %scope NAME + ... body ... + %endscope + +Pushes `NAME` onto a lexical scope stack active until the matching +`%endscope`. Scopes nest. While the stack is non-empty, any `::name` or +`&::name` token emitted from within is rewritten with the current scope +path (see [Scoped labels](#scoped-labels)). Every `%scope` must be closed +before end-of-input. `NAME` is a single `WORD` token and may come from +macro-argument substitution. + ## Macro calls %NAME(arg, arg, ...) @@ -98,6 +111,61 @@ fresh label namespace: - `:@loop` → `:loop__7` - `&@loop` → `&loop__7` +Each macro expansion gets a fresh `N`, so `:@loop` in two different call +sites (or two different macros) never collide. Argument-substituted tokens +keep their original text and are not rewritten, so a `:@name` literal +passed as a macro argument passes through verbatim. + +## Scoped labels + +A `WORD` token whose text starts with `::` is a scoped label definition; a +token starting with `&::` is a scoped reference. 
The `::` prefix is rewritten +at **emit time** against the current `%scope` stack: + +- stack = `[parse_number]`: `::start` → `:parse_number__start` +- stack = `[outer, inner]`: `&::end` → `&outer__inner__end` +- stack empty: `::foo` → `:foo`; `&::bar` → `&bar` (pass-through) + +Because resolution is at emit time rather than macro-expansion time, a +`::foo` token written inside a macro body resolves against whatever scope +is active at the point the token flows to the output — i.e. the caller's +surroundings, not the macro's own expansion id. This makes generic +control-flow macros possible: + + %macro loop_scoped(name, body) + %scope name + ::top + body + LA_BR &::top + B + ::end + %endscope + %endm + + %macro break() + LA_BR &::end + B + %endm + + %loop_scoped(scan, { + ... + %if_eqz(a0, { %break() }) + ... + }) + +Inside the expansion, `%loop_scoped` has pushed the scope `[scan]`, so +when `%break()`'s `&::end` token is finally emitted the stack is `[scan]` +and the output is `&scan__end` — exactly the label `%loop_scoped` +defined at the bottom of its body. A nested `%loop_scoped(inner, { ... })` +makes `[outer, inner]` the active stack, so a `%break()` inside the inner +block targets the innermost scope. To jump past an intervening scope, +write the concatenated name explicitly (`&outer__end`). + +Scoped labels and local (`:@` / `&@`) labels are independent and compose. +A common pattern: use `:@` for the macro's private internal labels (the +caller can never name them) and `::` for labels that are the macro's +public contract with its caller (`::end`, `::top`, etc.). + ## Built-in calls These are recognized wherever a token matches, not only at line start. @@ -165,6 +233,7 @@ Fixed at compile time: | parameters per macro | 16 | | stream stack depth | 64 | | expression frames | 256 | +| scope stack depth | 32 | Exceeding any limit aborts with an error message on `stderr`. 
@@ -176,4 +245,5 @@ Reasons are terse: `bad macro header`, `unterminated macro`, `text overflow`, `token overflow`, `expansion overflow`, `output overflow`, `stream overflow`, `unbalanced braces`, `too many args`, `too many macros`, `bad integer`, `bad directive`, `unterminated directive`, -`unterminated macro call`. +`unterminated macro call`, `bad scope header`, `scope underflow`, +`scope not closed`, `scope depth overflow`, `bad scope label`. diff --git a/m1pp/m1pp.P1 b/m1pp/m1pp.P1 @@ -91,6 +91,11 @@ DEFINE M1PP_EXPR_FRAMES_CAP 0009000000000000 ## Common cap used by macro params, call args, and expression args. DEFINE M1PP_MAX_PARAMS 1000000000000000 +## Scope-stack cap. 32 nested scopes max; each slot is a 16-byte TextSpan +## (ptr + len) pointing into stable text (input_buf or text_buf), so +## scope_stack is 32 × 16 = 512 bytes. +DEFINE M1PP_MAX_SCOPE_DEPTH 2000000000000000 + ## ExprOp codes (indexed by apply_expr_op). DEFINE EXPR_ADD 0000000000000000 DEFINE EXPR_SUB 0100000000000000 @@ -768,6 +773,51 @@ DEFINE EXPR_INVALID 1200000000000000 la_br &emit_token_skip beq_t0,t1 + # Scope rewrite: TOK_WORD whose text begins with "::" (len>=3) becomes + # a scoped definition, "&::" (len>=4) a scoped reference. Dispatch to + # emit_scope_rewrite with a1=skip, a2=sigil. 
+ ld_a1,a0,0 + li_a2 TOK_WORD + la_br &emit_token_after_scope + bne_a1,a2 + ld_a2,a0,16 + li_a3 %3 %0 + la_br &emit_token_after_scope + blt_a2,a3 + ld_a3,a0,8 + lb_t0,a3,0 + li_t1 %58 %0 + la_br &emit_token_check_amp + bne_t0,t1 + lb_t0,a3,1 + li_t1 %58 %0 + la_br &emit_token_after_scope + bne_t0,t1 + li_a1 %2 %0 + li_a2 %58 %0 + la_br &emit_scope_rewrite + b +:emit_token_check_amp + li_t1 %38 %0 + la_br &emit_token_after_scope + bne_t0,t1 + ld_a2,a0,16 + li_t2 %4 %0 + la_br &emit_token_after_scope + blt_a2,t2 + lb_t0,a3,1 + li_t1 %58 %0 + la_br &emit_token_after_scope + bne_t0,t1 + lb_t0,a3,2 + la_br &emit_token_after_scope + bne_t0,t1 + li_a1 %3 %0 + li_a2 %38 %0 + la_br &emit_scope_rewrite + b + +:emit_token_after_scope # if (output_need_space) emit ' ' (skip the space for the first token on a line) la_a1 &output_need_space ld_t0,a1,0 @@ -825,6 +875,164 @@ DEFINE EXPR_INVALID 1200000000000000 :emit_token_skip ret +## emit_scope_rewrite: branch target from emit_token for tokens whose text +## starts with "::" (scoped definition) or "&::" (scoped reference). +## Writes sigil + scope1 + "__" + ... + scopeN + "__" + name directly to +## output_buf; with an empty scope stack the middle collapses so output is +## just sigil + name (pass-through). Not a callable function: reached by `b`, +## shares emit_token's leaf return address, exits via `ret`. +## +## Register inputs: +## a0 = tok_ptr +## a1 = skip (2 for "::", 3 for "&::") +## a2 = sigil (':' = 58 for definitions, '&' = 38 for references) +:emit_scope_rewrite + # name_len = tok->text_len - skip; fail if zero. + ld_a3,a0,16 + sub_a3,a3,a1 + la_br &err_bad_scope_label + beqz_a3 + + # Spill inputs — the byte-copy loops below reuse a0..a3/t0..t2 freely. + la_t0 &sr_tok_ptr + st_a0,t0,0 + la_t0 &sr_skip + st_a1,t0,0 + la_t0 &sr_sigil + st_a2,t0,0 + la_t0 &sr_name_len + st_a3,t0,0 + + # Emit leading ' ' if output_need_space. 
+ la_a0 &output_need_space + ld_t0,a0,0 + la_br &sr_post_space + beqz_t0 + la_a1 &output_used + ld_t0,a1,0 + li_t1 M1PP_OUTPUT_CAP + la_br &err_output_overflow + beq_t0,t1 + la_a2 &output_buf + add_a2,a2,t0 + li_t1 %32 %0 + sb_t1,a2,0 + addi_t0,t0,1 + st_t0,a1,0 +:sr_post_space + + # Emit the sigil byte. + la_a0 &output_used + ld_t0,a0,0 + li_t1 M1PP_OUTPUT_CAP + la_br &err_output_overflow + beq_t0,t1 + la_a1 &output_buf + add_a1,a1,t0 + la_a2 &sr_sigil + ld_a3,a2,0 + sb_a3,a1,0 + addi_t0,t0,1 + st_t0,a0,0 + + # Emit each scope frame's bytes followed by "__". + li_t0 %0 %0 +:sr_scope_outer + la_a0 &scope_depth + ld_a1,a0,0 + la_br &sr_tail_start + beq_t0,a1 + + la_a0 &scope_stack + li_a2 %16 %0 + mul_a2,a2,t0 + add_a0,a0,a2 + ld_a1,a0,0 + ld_a2,a0,8 + li_a3 %0 %0 +:sr_scope_inner + la_br &sr_scope_sep + beq_a3,a2 + la_t1 &output_used + ld_t2,t1,0 + li_a0 M1PP_OUTPUT_CAP + la_br &err_output_overflow + beq_t2,a0 + la_a0 &output_buf + add_a0,a0,t2 + add_t2,a1,a3 + lb_t2,t2,0 + sb_t2,a0,0 + la_t1 &output_used + ld_t2,t1,0 + addi_t2,t2,1 + st_t2,t1,0 + addi_a3,a3,1 + la_br &sr_scope_inner + b +:sr_scope_sep + la_a0 &output_used + ld_t1,a0,0 + li_t2 M1PP_OUTPUT_CAP + la_br &err_output_overflow + beq_t1,t2 + la_a1 &output_buf + add_a1,a1,t1 + li_a2 %95 %0 + sb_a2,a1,0 + addi_t1,t1,1 + st_t1,a0,0 + la_a0 &output_used + ld_t1,a0,0 + li_t2 M1PP_OUTPUT_CAP + la_br &err_output_overflow + beq_t1,t2 + la_a1 &output_buf + add_a1,a1,t1 + li_a2 %95 %0 + sb_a2,a1,0 + addi_t1,t1,1 + st_t1,a0,0 + addi_t0,t0,1 + la_br &sr_scope_outer + b + +:sr_tail_start + la_a0 &sr_tok_ptr + ld_a1,a0,0 + ld_a2,a1,8 + la_a0 &sr_skip + ld_a3,a0,0 + add_a1,a2,a3 + la_a0 &sr_name_len + ld_a2,a0,0 + li_a3 %0 %0 +:sr_tail_loop + la_br &sr_tail_done + beq_a3,a2 + la_t1 &output_used + ld_t2,t1,0 + li_a0 M1PP_OUTPUT_CAP + la_br &err_output_overflow + beq_t2,a0 + la_a0 &output_buf + add_a0,a0,t2 + add_t2,a1,a3 + lb_t2,t2,0 + sb_t2,a0,0 + la_t1 &output_used + ld_t2,t1,0 + addi_t2,t2,1 + st_t2,t1,0 + 
addi_a3,a3,1 + la_br &sr_tail_loop + b +:sr_tail_done + la_a0 &output_need_space + li_a1 %1 %0 + st_a1,a0,0 + ret + ## --- Main processor ---------------------------------------------------------- ## Stream-driven loop. Pushes source_tokens as the initial stream, then drives ## the streams[] stack until it empties. Per iteration: pop the stream if @@ -952,7 +1160,7 @@ DEFINE EXPR_INVALID 1200000000000000 li_a2 %5 %0 la_br &tok_eq_const call - la_br &proc_check_newline + la_br &proc_check_scope beqz_a0 # %enum matched: shim into define_fielded(stride=1, total="COUNT", len=5) @@ -976,6 +1184,68 @@ DEFINE EXPR_INVALID 1200000000000000 la_br &proc_loop b +## ---- line_start && tok eq "%scope" ---- +:proc_check_scope + ld_t0,sp,8 + mov_a0,t0 + la_a1 &const_scope + li_a2 %6 %0 + la_br &tok_eq_const + call + la_br &proc_check_endscope + beqz_a0 + + # %scope matched: shim into push_scope(stream_end). + ld_t0,sp,8 + la_a0 &proc_pos + st_t0,a0,0 + la_a0 &proc_line_start + li_a1 %1 %0 + st_a1,a0,0 + ld_a0,sp,0 + ld_a0,a0,8 + la_br &push_scope + call + ld_a0,sp,0 + la_a1 &proc_pos + ld_t0,a1,0 + st_t0,a0,16 + li_t1 %1 %0 + st_t1,a0,24 + la_br &proc_loop + b + +## ---- line_start && tok eq "%endscope" ---- +:proc_check_endscope + ld_t0,sp,8 + mov_a0,t0 + la_a1 &const_endscope + li_a2 %9 %0 + la_br &tok_eq_const + call + la_br &proc_check_newline + beqz_a0 + + # %endscope matched: shim into pop_scope(stream_end). + ld_t0,sp,8 + la_a0 &proc_pos + st_t0,a0,0 + la_a0 &proc_line_start + li_a1 %1 %0 + st_a1,a0,0 + ld_a0,sp,0 + ld_a0,a0,8 + la_br &pop_scope + call + ld_a0,sp,0 + la_a1 &proc_pos + ld_t0,a1,0 + st_t0,a0,16 + li_t1 %1 %0 + st_t1,a0,24 + la_br &proc_loop + b + :proc_check_newline # reload s, tok ld_a0,sp,0 @@ -1073,7 +1343,7 @@ DEFINE EXPR_INVALID 1200000000000000 :proc_check_macro # macro = find_macro(tok); if non-zero AND # ((tok+1 < s->end AND (tok+1)->kind == TOK_LPAREN) OR macro->param_count == 0) - # then expand_call. (§4 paren-less 0-arg calls.) 
+ # then expand_call. Paren-less form is reserved for 0-arg macros. ld_a0,sp,8 la_br &find_macro call @@ -1132,6 +1402,113 @@ DEFINE EXPR_INVALID 1200000000000000 b :proc_done + # Every %scope must be matched by an %endscope before EOF. + la_a0 &scope_depth + ld_t0,a0,0 + la_br &err_scope_not_closed + bnez_t0 + eret + +## --- %scope / %endscope handlers -------------------------------------------- +## Called at proc_pos == the `%scope` / `%endscope` word on a line-start. +## Input: a0 = stream end (pointer one past last token in the current stream). +## Output: proc_pos advanced past the trailing newline (or stream end). + +## push_scope(a0 = stream_end): consume `%scope NAME\n`. +## Name must be a single WORD token; anything else on the line is an error. +:push_scope + enter_0 + + # proc_pos += 24 (skip past the `%scope` token). + la_t0 &proc_pos + ld_t1,t0,0 + addi_t1,t1,24 + st_t1,t0,0 + + # Require a WORD name token within the stream. + la_br &err_bad_scope_header + beq_t1,a0 + ld_t2,t1,0 + la_br &err_bad_scope_header + bnez_t2 + + # scope_depth < MAX_SCOPE_DEPTH? + la_a1 &scope_depth + ld_a2,a1,0 + li_a3 M1PP_MAX_SCOPE_DEPTH + la_br &err_scope_depth_overflow + beq_a2,a3 + + # scope_stack[scope_depth] = (name.text_ptr, name.text_len) + la_a3 &scope_stack + li_t0 %16 %0 + mul_t0,t0,a2 + add_a3,a3,t0 + ld_t0,t1,8 + st_t0,a3,0 + ld_t0,t1,16 + st_t0,a3,8 + + # scope_depth++ + addi_a2,a2,1 + st_a2,a1,0 + + # proc_pos += 24 (past the name). + la_t0 &proc_pos + ld_t1,t0,0 + addi_t1,t1,24 + st_t1,t0,0 + + # EOF here is tolerated (caller handles stream end). Otherwise the next + # token must be TOK_NEWLINE — anything else is a header error. + la_br &psc_done + beq_t1,a0 + ld_t2,t1,0 + li_t0 TOK_NEWLINE + la_br &err_bad_scope_header + bne_t2,t0 + addi_t1,t1,24 + la_t0 &proc_pos + st_t1,t0,0 +:psc_done + eret + +## pop_scope(a0 = stream_end): consume `%endscope\n`. Extra tokens on the line +## are tolerated (matches %endm's behavior) — skip to the next newline. 
+:pop_scope + enter_0 + + # scope_depth > 0? + la_a1 &scope_depth + ld_a2,a1,0 + la_br &err_scope_underflow + beqz_a2 + addi_a2,a2,neg1 + st_a2,a1,0 + + # proc_pos += 24 (past the `%endscope` token). + la_t0 &proc_pos + ld_t1,t0,0 + addi_t1,t1,24 + st_t1,t0,0 + +:pop_skip_loop + la_br &pop_done + beq_t1,a0 + ld_t2,t1,0 + li_t0 TOK_NEWLINE + la_br &pop_consume_newline + beq_t2,t0 + addi_t1,t1,24 + la_t0 &proc_pos + st_t1,t0,0 + la_br &pop_skip_loop + b +:pop_consume_newline + addi_t1,t1,24 + la_t0 &proc_pos + st_t1,t0,0 +:pop_done eret ## --- %macro storage: parse header + body into macros[] / macro_body_tokens -- @@ -2554,7 +2931,7 @@ DEFINE EXPR_INVALID 1200000000000000 # lparen = call_tok + 24 addi_a0,a0,24 - # Branch split (§4 paren-less 0-arg calls): + # Branch split for paren-less 0-arg calls: # if lparen < limit AND lparen->kind == TOK_LPAREN: parse_args as usual. # else if macro->param_count == 0: synthesize empty arg list, no parse_args. # else: fatal "bad macro call". @@ -4272,7 +4649,7 @@ DEFINE EXPR_INVALID 1200000000000000 la_br &eea_int_atom beqz_a0 - # §4 paren-less 0-arg atom: + # Paren-less 0-arg atom: # Take the macro-call branch if (tok+1 < limit AND (tok+1)->kind == TOK_LPAREN) # OR macro->param_count == 0. Otherwise fall through to int atom (unchanged). ld_t0,sp,0 @@ -5321,6 +5698,31 @@ DEFINE EXPR_INVALID 1200000000000000 li_a1 %36 %0 la_br &fatal b +:err_bad_scope_header + la_a0 &msg_bad_scope_header + li_a1 %16 %0 + la_br &fatal + b +:err_scope_depth_overflow + la_a0 &msg_scope_depth_overflow + li_a1 %20 %0 + la_br &fatal + b +:err_scope_underflow + la_a0 &msg_scope_underflow + li_a1 %15 %0 + la_br &fatal + b +:err_scope_not_closed + la_a0 &msg_scope_not_closed + li_a1 %16 %0 + la_br &fatal + b +:err_bad_scope_label + la_a0 &msg_bad_scope_label + li_a1 %15 %0 + la_br &fatal + b ## fatal(a0=msg_ptr, a1=msg_len): writes "m1pp: <msg>\n" to stderr, exits 1. ## Saves args across the three syscalls since a0..a3 are caller-saved. 
@@ -5379,6 +5781,8 @@ DEFINE EXPR_INVALID 1200000000000000 :const_enum "%enum" :const_size "SIZE" :const_count "COUNT" +:const_scope "%scope" +:const_endscope "%endscope" ## Operator strings for expr_op_code. Each is a raw byte literal; lengths ## are passed separately to tok_eq_const. "<=" must be tested before "<" @@ -5425,6 +5829,11 @@ DEFINE EXPR_INVALID 1200000000000000 :msg_unbalanced_braces "unbalanced braces" :msg_bad_directive "bad %struct/%enum directive" :msg_unterminated_directive "unterminated %struct/%enum directive" +:msg_bad_scope_header "bad scope header" +:msg_scope_depth_overflow "scope depth overflow" +:msg_scope_underflow "scope underflow" +:msg_scope_not_closed "scope not closed" +:msg_bad_scope_label "bad scope label" ## --- BSS --------------------------------------------------------------------- ## Placed before :ELF_end so filesz/memsz (which this ELF header sets equal) @@ -5580,7 +5989,7 @@ ZERO8 :emt_body_start ZERO8 -## Local-label rewrite (§1). next_expansion_id is the monotonic counter +## Local-label rewrite. next_expansion_id is the monotonic counter ## (never reset); emt_expansion_id snapshots it at the start of each ## expand_macro_tokens call so nested-call BSS reuse is safe. ## ll_* slots hold body-token span + derived sizes while building the @@ -5614,7 +6023,27 @@ ZERO8 ZERO8 ZERO8 :local_label_scratch ZERO32 ZERO32 ZERO32 ZERO32 -## %struct / %enum scratch (§5, §6). define_fielded calls append_text twice +## --- Scope-stack rewrite ----------------------------------------------------- +## scope_depth: current depth (0..32). +## scope_stack: 32 × TextSpan (16 bytes each) = 512 bytes. Each slot is +## (text_ptr, text_len) pointing into stable text memory (input_buf or +## text_buf — both append-only), so names are borrowed without copying. +## sr_* slots hold emit_scope_rewrite's inputs across the byte-copy loops. 
+:scope_depth +ZERO8 +:scope_stack +ZERO32 ZERO32 ZERO32 ZERO32 ZERO32 ZERO32 ZERO32 ZERO32 +ZERO32 ZERO32 ZERO32 ZERO32 ZERO32 ZERO32 ZERO32 ZERO32 +:sr_tok_ptr +ZERO8 +:sr_skip +ZERO8 +:sr_sigil +ZERO8 +:sr_name_len +ZERO8 + +## %struct / %enum scratch. define_fielded calls append_text twice ## per synthesized macro, so every piece of state that must survive a call ## lives here rather than in a register. ## df_stride — 8 for %struct, 1 for %enum diff --git a/m1pp/m1pp.c b/m1pp/m1pp.c @@ -77,6 +77,7 @@ #define MAX_EXPAND 65536 #define MAX_STACK 64 #define MAX_EXPR_FRAMES 256 +#define MAX_SCOPE_DEPTH 32 enum { TOK_WORD, @@ -158,6 +159,7 @@ static struct Token macro_body_tokens[MAX_MACRO_BODY_TOKENS]; static struct Token expand_pool[MAX_EXPAND]; static struct Macro macros[MAX_MACROS]; static struct Stream streams[MAX_STACK]; +static struct TextSpan scope_stack[MAX_SCOPE_DEPTH]; static int text_used; static int source_count; @@ -168,6 +170,7 @@ static int output_used; static int output_need_space; static int stream_top; static int next_expansion_id; +static int scope_depth; static struct Token *arg_starts[MAX_PARAMS]; static struct Token *arg_ends[MAX_PARAMS]; @@ -400,11 +403,68 @@ static int emit_newline(void) return 1; } +static int emit_scoped_label(const struct Token *tok, int skip, char sigil) +{ + /* Rewrite `::name` or `&::name` against the current scope stack. + * skip is the number of leading chars to drop (`::` -> 2, `&::` -> 3); + * sigil is the single-char prefix to emit (`:` for definitions, `&` + * for references). With a non-empty scope stack the output is + * sigil + scope1 + "__" + ... + scopeN + "__" + name; with an empty + * stack it degrades to sigil + name (pass-through). 
*/ + int name_len = tok->text.len - skip; + int i; + + if (name_len <= 0) { + return fail("bad scope label"); + } + + if (output_need_space) { + if (output_used + 1 >= MAX_OUTPUT) { + return fail("output overflow"); + } + output_buf[output_used++] = ' '; + } + + if (output_used + 1 >= MAX_OUTPUT) { + return fail("output overflow"); + } + output_buf[output_used++] = sigil; + + for (i = 0; i < scope_depth; i++) { + int span_len = scope_stack[i].len; + if (output_used + span_len + 2 >= MAX_OUTPUT) { + return fail("output overflow"); + } + memcpy(output_buf + output_used, scope_stack[i].ptr, + (size_t)span_len); + output_used += span_len; + output_buf[output_used++] = '_'; + output_buf[output_used++] = '_'; + } + + if (output_used + name_len >= MAX_OUTPUT) { + return fail("output overflow"); + } + memcpy(output_buf + output_used, tok->text.ptr + skip, (size_t)name_len); + output_used += name_len; + output_need_space = 1; + return 1; +} + static int emit_token(const struct Token *tok) { if (tok->kind == TOK_LBRACE || tok->kind == TOK_RBRACE) { return 1; } + if (tok->kind == TOK_WORD && tok->text.len >= 2 && + tok->text.ptr[0] == ':' && tok->text.ptr[1] == ':') { + return emit_scoped_label(tok, 2, ':'); + } + if (tok->kind == TOK_WORD && tok->text.len >= 3 && + tok->text.ptr[0] == '&' && + tok->text.ptr[1] == ':' && tok->text.ptr[2] == ':') { + return emit_scoped_label(tok, 3, '&'); + } if (output_need_space) { if (output_used + 1 >= MAX_OUTPUT) { return fail("output overflow"); @@ -1542,6 +1602,44 @@ static int expand_call(struct Stream *s, const struct Macro *macro) return push_pool_stream_from_mark(mark); } +static int push_scope(struct Stream *s) +{ + s->pos++; + if (s->pos >= s->end || s->pos->kind != TOK_WORD) { + return fail("bad scope header"); + } + if (scope_depth >= MAX_SCOPE_DEPTH) { + return fail("scope depth overflow"); + } + scope_stack[scope_depth++] = s->pos->text; + s->pos++; + if (s->pos < s->end && s->pos->kind != TOK_NEWLINE) { + return fail("bad 
scope header"); + } + if (s->pos < s->end) { + s->pos++; + } + s->line_start = 1; + return 1; +} + +static int pop_scope(struct Stream *s) +{ + s->pos++; + if (scope_depth <= 0) { + return fail("scope underflow"); + } + scope_depth--; + while (s->pos < s->end && s->pos->kind != TOK_NEWLINE) { + s->pos++; + } + if (s->pos < s->end) { + s->pos++; + } + s->line_start = 1; + return 1; +} + static int process_tokens(void) { if (!push_stream_span((struct TokenSpan){source_tokens, source_tokens + source_count}, -1)) { @@ -1591,6 +1689,24 @@ static int process_tokens(void) continue; } + if (s->line_start && + tok->kind == TOK_WORD && + token_text_eq(tok, "%scope")) { + if (!push_scope(s)) { + return 0; + } + continue; + } + + if (s->line_start && + tok->kind == TOK_WORD && + token_text_eq(tok, "%endscope")) { + if (!pop_scope(s)) { + return 0; + } + continue; + } + if (tok->kind == TOK_NEWLINE) { s->pos++; s->line_start = 1; @@ -1632,6 +1748,10 @@ static int process_tokens(void) } } + if (scope_depth != 0) { + return fail("scope not closed"); + } + if (output_used >= MAX_OUTPUT) { return fail("output overflow"); } diff --git a/p1/p1_gen.py b/p1/p1_gen.py @@ -67,8 +67,8 @@ LOGI_IMMS = ( SHIFT_IMMS = tuple(range(64)) MEM_OFFS = ( - -256, -128, -64, -48, -32, -24, -16, -8, -1, 0, 1, 7, 8, 15, 16, 24, 32, - 40, 48, 56, 64, 128, 255, + -256, -128, -64, -48, -32, -24, -16, -8, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, + 15, 16, 24, 32, 40, 48, 56, 64, 128, 255, ) LDARG_SLOTS = tuple(range(32)) diff --git a/tests/m1pp/00-hello.M1 b/tests/m1pp/00-hello.M1 @@ -1,9 +1,8 @@ -## Phase 0 smoke fixture: P1v2 hello-world. +## P1v2 hello-world smoke fixture. ## -## Proves that the m1pp build pipeline (lint -> prune -> catm -> M0 -> ELF -## link -> hex2-0) works against build/p1v2/aarch64/p1_aarch64.M1. -## Independent of m1pp/m1pp.M1's current state so Phase 0 can land before -## Phase 1. 
+## Exercises the build pipeline (lint -> prune -> catm -> M0 -> ELF link -> +## hex2-0) against build/p1v2/aarch64/p1_aarch64.M1. Standalone program — +## does not drive the m1pp expander. ## ## P1v2 syscall ABI: ## a0 = syscall number on entry, return value on exit diff --git a/tests/m1pp/01-passthrough.M1pp b/tests/m1pp/01-passthrough.M1pp @@ -1,4 +1,4 @@ -## Phase 1 parity fixture: tokenizer + pass-through + structural %macro skip. +## Pass-through fixture: tokenizer + structural %macro skip. ## No macro calls, no ## paste, no !@%$ or %select. The m1pp expander must ## match the C oracle byte-for-byte on this input. diff --git a/tests/m1pp/01-passthrough.expected b/tests/m1pp/01-passthrough.expected @@ -1,4 +1,4 @@ -## Phase 1 parity fixture: tokenizer + pass-through + structural %macro skip. +## Pass-through fixture: tokenizer + structural %macro skip. ## No macro calls , no ## paste , no !@%$ or %select. The m1pp expander must ## match the C oracle byte-for-byte on this input. diff --git a/tests/m1pp/02-defs.M1pp b/tests/m1pp/02-defs.M1pp @@ -1,5 +1,5 @@ -## Phase 2 parity fixture: %macro definitions are stored, not invoked. -## Defs produce no output; non-def tokens pass through as in Phase 1. +## %macro definitions are stored, not invoked. +## Defs produce no output; non-def tokens pass through unchanged. ## Exercises: 0/1/many params, multi-token bodies, string body tokens, ## body-internal ## paste, %macro-looking words mid-line, empty body. before diff --git a/tests/m1pp/02-defs.expected b/tests/m1pp/02-defs.expected @@ -1,4 +1,4 @@ -## Phase 2 parity fixture: %macro definitions are stored , not invoked. +## %macro definitions are stored , not invoked. ## Defs produce no output ## Exercises: 0/1/many params , multi-token bodies , string body tokens , ## body-internal ## paste , %macro-looking words mid-line , empty body. 
diff --git a/tests/m1pp/03-builtins.M1pp b/tests/m1pp/03-builtins.M1pp @@ -1,4 +1,4 @@ -# Phase 8 parity: each of !(1B), @(2B), %(4B), $(8B) emits little-endian +# Integer-emission builtins: !(1B), @(2B), %(4B), $(8B) emit little-endian # uppercase hex of (2 * size) chars. Exercises each size at: # - small literal so byte-order is observable # - hex literal that fills the slot exactly diff --git a/tests/m1pp/06-paste.M1pp b/tests/m1pp/06-paste.M1pp @@ -1,4 +1,4 @@ -# Phase 6 paste compaction. +# `##` paste compaction inside macro bodies. # - param ## param (basic) # - literal ## param / param ## literal # - chain: a ## b ## c (paste compactor processes left-to-right) diff --git a/tests/m1pp/11-local-labels.M1pp b/tests/m1pp/11-local-labels.M1pp @@ -1,8 +1,8 @@ -# Local labels (§1): `:@name` / `&@name` inside macro bodies rewrite to +# Local labels: `:@name` / `&@name` inside macro bodies rewrite to # `:name__NN` / `&name__NN` where NN is a fresh monotonic id per expansion. # Scoping: body-native only; param-substituted tokens pass through untouched. # -# Scenarios: +# Covers: # 1) a single macro using `:@end` called twice -> end__1, end__2 distinct # 2) nested macros each using `:@done` -> outer/inner get separate ids # 3) `&@label` address form rewrites the same way diff --git a/tests/m1pp/12-braced-args.M1pp b/tests/m1pp/12-braced-args.M1pp @@ -1,4 +1,4 @@ -# Braced block arguments (§2 of M1PP-EXT): +# Braced block arguments: # - { ... } groups tokens into one arg, protecting commas inside # - outer { ... } is stripped when the arg span begins with LBRACE and # ends with its matching RBRACE diff --git a/tests/m1pp/13-parenless.M1pp b/tests/m1pp/13-parenless.M1pp @@ -1,4 +1,4 @@ -# Paren-less 0-arg macro calls (M1PP-EXT §4): +# Paren-less 0-arg macro calls: # - a zero-param macro invoked without trailing () expands the same as with () # - applies at top level and as an atom inside %(...) expressions # - non-zero-param macros still require their (arg, ...) 
syntax — %add1 diff --git a/tests/m1pp/14-str-builtin.M1pp b/tests/m1pp/14-str-builtin.M1pp @@ -1,4 +1,4 @@ -# Phase 14 %str stringification builtin. +# %str stringification builtin. # - %str(IDENT) wraps the identifier text in double quotes # - result is a TOK_STRING, byte-identical to a hand-written literal diff --git a/tests/m1pp/14-str-paste.M1pp b/tests/m1pp/14-str-paste.M1pp @@ -1,4 +1,4 @@ -# Phase 14 paste + stringify. +# `##` paste + `%str` stringify composed on the same identifier. # - `##` joins word fragments: str_##n -> str_quote (TOK_WORD). # - `%str(n)` wraps the same identifier in quotes (TOK_STRING). # - Complementary operators: paste builds the label, %str builds the literal. diff --git a/tests/m1pp/15-struct.M1pp b/tests/m1pp/15-struct.M1pp @@ -1,8 +1,8 @@ -# %struct directive (M1PP-EXT §5): +# %struct directive: # - %struct NAME { f1 f2 ... } synthesizes N+1 zero-parameter macros: # NAME.field_k -> k*8 (decimal word) # NAME.SIZE -> N*8 -# - paren-less access (§4) is the natural read form: %closure.body +# - paren-less access is the natural read form: %closure.body # - composes via a plain wrapper macro using %frame_hdr.SIZE for stack- # frame layouts @@ -15,7 +15,7 @@ %closure.env %closure.SIZE -# With parens still works (§4 parity). +# With parens still works. %closure.body() # Inside an expression atom: loads 16+100 = 116 -> 0x74. diff --git a/tests/m1pp/16-enum.M1pp b/tests/m1pp/16-enum.M1pp @@ -1,4 +1,4 @@ -# %enum directive (M1PP-EXT §6): +# %enum directive: # - %enum NAME { l1 l2 ... } synthesizes N+1 zero-parameter macros: # NAME.label_k -> k # NAME.COUNT -> N diff --git a/tests/m1pp/17-scopes.M1pp b/tests/m1pp/17-scopes.M1pp @@ -0,0 +1,93 @@ +# Lexical scopes: `::name` / `&::name` rewrite at emit time against the +# current scope stack, joined by `__`. Resolution is anaphoric: a `::foo` +# token inside a macro body resolves against the caller's scope, not the +# macro's expansion id. 
+# +# Covers: +# - empty-stack pass-through: `::foo` -> `:foo`, `&::bar` -> `&bar` +# - basic scope: `::start` -> `:parse_number__start` +# - nested scopes: `::a` under [outer, inner] -> `:outer__inner__a` +# - scope name from a macro argument (loop_scoped pattern) +# - anaphoric macros: %break / %continue reach the caller's innermost scope +# - nested scope-introducing macros: innermost scope wins for %break +# - hygienic `:@` and anaphoric `::` coexist in one macro + +%macro loop_scoped(name, body) +%scope name +::top +body +LA_BR &::top +B +::end +%endscope +%endm + +%macro break() +LA_BR &::end +B +%endm + +%macro continue() +LA_BR &::top +B +%endm + +%macro while_scoped_nez(name, ra, body) +%scope name +LA_BR &@top +B +:@body +body +:@top +LA_BR &@body +BNEZ ra +::end +%endscope +%endm + +# Empty-stack pass-through. +::foo +&::bar + +# Basic scope. +%scope parse_number +::start +LA_BR &::done +BEQZ a0 +::done +ERET +%endscope + +# Nested scopes. +%scope outer +::before +%scope inner +::a +LA_BR &::a +B +%endscope +::after +%endscope + +# Scope name from macro arg; anaphoric %break / %continue. +%loop_scoped(scan, { +LI t0, 0 +%break() +%continue() +}) + +# Nested scope-introducing macro intercepts inner %break. +%loop_scoped(outer, { +%loop_scoped(inner, { +%break() +}) +%break() +}) + +# Hygienic `@` and anaphoric `::` together. 
+%while_scoped_nez(retry, a0, { +LI t1, 1 +%break() +}) + +END diff --git a/tests/m1pp/17-scopes.expected b/tests/m1pp/17-scopes.expected @@ -0,0 +1,91 @@ + + + + + + + + + + + + + + + + + + + +:foo +&bar + + +:parse_number__start +LA_BR &parse_number__done +BEQZ a0 +:parse_number__done +ERET + + +:outer__before +:outer__inner__a +LA_BR &outer__inner__a +B +:outer__after + + +:scan__top + +LI t0 , 0 +LA_BR &scan__end +B + +LA_BR &scan__top +B + + +LA_BR &scan__top +B +:scan__end + + + +:outer__top + +:outer__inner__top + +LA_BR &outer__inner__end +B + + +LA_BR &outer__inner__top +B +:outer__inner__end + +LA_BR &outer__end +B + + +LA_BR &outer__top +B +:outer__end + + + +LA_BR &top__8 +B +:body__8 + +LI t1 , 1 +LA_BR &retry__end +B + + +:top__8 +LA_BR &body__8 +BNEZ a0 +:retry__end + + +END diff --git a/tests/m1pp/_12-braced-malformed.M1pp b/tests/m1pp/_12-braced-malformed.M1pp @@ -5,8 +5,8 @@ # reports "unbalanced braces". # # No `.expected` file is needed — the leading underscore in the filename -# causes m1pp/test.sh to skip this fixture. It is verified manually via the -# verification block in the §2 implementation notes. +# causes m1pp/test.sh to skip this fixture. Run by hand to observe the +# non-zero exit with "unbalanced braces". %macro F(a, b) a b diff --git a/tests/m1pp/_14-str-malformed.M1pp b/tests/m1pp/_14-str-malformed.M1pp @@ -1,4 +1,4 @@ -# Phase 14 %str malformed input. +# %str malformed input. # - Underscore-prefix => skipped by test.sh. # - Expected outcome: m1pp exits non-zero. # - %str takes exactly one single-token WORD argument. A multi-token