m1pp extensions doc - boot2 - Playing with the boostrap

commit 6761b0608d69551ae4bfe6e9b5a932ddcafb7b26
parent ce7dbc4c1ff6509f28aa721ebffdffda6b6bb051
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Thu, 23 Apr 2026 16:53:48 -0700

m1pp extensions doc

Diffstat:
A docs/M1PP-EXT.md  | 656 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

1 file changed, 656 insertions(+), 0 deletions(-)
diff --git a/docs/M1PP-EXT.md b/docs/M1PP-EXT.md
@@ -0,0 +1,656 @@
+# M1PP extensions for the seed Scheme interpreter
+
+Three independent additions to `m1pp/m1pp.c`, ordered by sequencing.
+
+Motivation: when writing the seed Lisp interpreter portably across three
+arches, most pain in `lisp/lisp.M1` traces to two things — hand-named
+scratch labels that collide when a pattern is reused, and argument
+substitution that can't carry instruction bodies (commas break the
+parser). `strlen` is the smaller third item: it removes a class of
+hand-counted-length bugs in error messages and string literals.
+
+## 1. Local labels
+
+### Syntax
+
+Two prefixed word forms, recognized only when they appear in a macro
+body as body-native tokens:
+
+- `:@name` — label definition, scoped to the current expansion
+- `&@name` — address-of reference, scoped to the current expansion
+
+### Semantics
+
+Each `%NAME(...)` invocation allocates a fresh expansion id `NN` from a
+global monotonic counter. While copying body-native tokens into the pool,
+any TOK_WORD whose text starts with `:@` or `&@` (and has ≥1 char after
+the `@`) is rewritten to the corresponding non-`@` form with `__NN`
+suffixed: `:@end` → `:end__42`, `&@end` → `&end__42`.
+
+**Scoping.** Rename body-native tokens only. Argument-substituted tokens
+pass through unchanged — they were already renamed under the caller's
+`NN` if the caller was itself a macro body. This gives lexical label
+scoping: nested and stacked macros each see their own labels, collisions
+are impossible.
+
+**Interaction with `##`.** None. The rename happens before the paste
+pass; a body `:@end##_lbl` renames `:@end` first, then pastes. Edge
+cases here should error out (pasting onto a renamed label is almost
+certainly a bug); leave it unconstrained for v1 and revisit if it
+bites.
+
+### Tokenizer
+
+No changes. `:@foo` / `&@foo` already tokenize as single TOK_WORD under
+the current word-terminator set (`m1pp.c:310`). The existing `@(...)`
+builtin dispatch keys on token text being exactly `@` followed by
+LPAREN, so `@foo` words do not collide.
+
+### m1pp.c touchpoints
+
+- One new static `int next_expansion_id` (monotonic, never reset).
+- `expand_macro_tokens` (`m1pp.c:670`): allocate `NN = ++next_expansion_id`
+  before the body-walk. Inside the body-copy loop, when about to push a
+  body-native TOK_WORD whose text starts with `:@` or `&@`:
+  - build the renamed text directly by appending bytes into `text_buf`:
+    copy the original token bytes (sigil + tail), append `__`, then
+    append the decimal digits of `NN`
+  - push a TOK_WORD pointing at the new text span
+
+Avoid `snprintf`. The m1m port (`docs/M1M-P1-PORT.md`) will reimplement
+every new m1pp feature in P1 assembly; varargs format parsing is a
+non-trivial thing to port. Plain byte appends plus a hand-rolled
+integer → decimal emit (the `display_uint` reverse-fill pattern already
+in `lisp/lisp.M1:2983`) port cleanly.
+
+Concretely in C: reserve a small stack scratch (say 16 bytes), fill
+digits right-to-left via repeated `%10` / `/=10`, then `append_text_len`
+the sigil bytes, the tail bytes, `"__"`, and the digit run. A 32-bit
+counter fits in 10 decimal digits; collision across a file is a
+non-concern because the counter is file-global and monotonic.
+
+No struct changes. No lexer changes. No new global syntax.
+
+## 2. Braced block arguments
+
+### Syntax
+
+Curly braces `{` and `}` group tokens into a single macro argument,
+protecting commas inside the group from the comma-splits-args rule.
+
+```
+%if_eq(r1, r2, {
+    li(r0)
+    %5
+    st(r0, r3, 0)
+})
+```
+
+Without braces, `st(r0, r3, 0)` exposes two commas at paren depth 1 and
+the call parses as 5 args instead of 3.
+
+### Semantics
+
+- `{` and `}` are new TOK kinds, tokenized as single-char delimiters.
+- In `parse_args`, a `brace_depth` counter runs parallel to the paren
+  `depth`. Commas at `depth == 1` split args **only when
+  `brace_depth == 0`**. LBRACE increments, RBRACE decrements.
+- When copying an arg span into a macro body, if the span begins with
+  TOK_LBRACE and ends with matching TOK_RBRACE at the outermost level,
+  strip the outer pair. Otherwise copy verbatim — `%foo(plain)` stays
+  working.
+- Braces never reach output. Either filter them during substitution or
+  make `emit_token` treat both kinds as no-ops (belt-and-braces; I'd
+  do both).
+
+### Nesting
+
+`{ { ... } }` nests via `brace_depth`. Braces inside a `"..."` string
+stay inside the string token — the lexer already handles that.
+
+Braces and parens are independent. `{ ( }` is syntactically fine in the
+arg-splitter; paren balancing only cares about LPAREN/RPAREN.
+
+### Tokenizer
+
+`lex_source` (`m1pp.c:232`): add LBRACE/RBRACE cases alongside the
+existing LPAREN/RPAREN cases (~10 lines). Add `{` and `}` to the
+word-terminator set at `m1pp.c:310`.
+
+### m1pp.c touchpoints
+
+- New TOK_LBRACE, TOK_RBRACE enum entries (`m1pp.c:77`).
+- `lex_source`: two new single-char token cases.
+- `parse_args` (`m1pp.c:543`): add `brace_depth` counter; gate the
+  comma-split on `brace_depth == 0`; LBRACE/RBRACE bump/drop it.
+- Arg copy (in `expand_macro_tokens`, via `copy_arg_tokens_to_pool` and
+  `copy_paste_arg_to_pool`): detect outer `{ ... }` wrapping and strip.
+  The `copy_paste_arg_to_pool` path (single-token arg for `##`) should
+  reject braced args — pasting onto a block is nonsense.
+- `emit_token`: no-op for both brace kinds (defensive; they shouldn't
+  reach here if substitution is clean).
+
+### What this does not give you
+
+A C-like block-statement form (`%if_eq(a,b) { … } %else { … } %endif`)
+needs `process_tokens` to recognize line-start block openers/closers —
+a separate, heavier change. Braced args get us
+`%if_eq_else(a, b, { then }, { else })` and `%while_nez(r, { body })`,
+which covers the patterns in lisp.M1 we care about. Defer the block-
+statement form until braced-arg shows real ergonomic pain.
+
+## 3. `strlen` expression op
+
+### Syntax
+
+A new unary op in the Lisp-shaped expression grammar:
+
+```
+(strlen "literal")
+```
+
+Composes with arithmetic like any other op:
+
+```
+%((+ (strlen "hello") 1))
+```
+
+### Semantics
+
+- Argument must be a single `TOK_STRING` atom (double-quoted form).
+- Value is the raw byte count between the quotes: `span.len - 2`.
+  Matches what M1's `"…"` emission writes before appending NUL.
+- Single-quoted `'…'` hex literals error out — strlen is meaningless
+  on raw hex.
+
+### No decimal emitter needed
+
+The 4-byte LE hex emitter `%(expr)` is sufficient. Two paths cover
+everything:
+
+1. Companion DEFINE:
+
+   ```
+   %macro defstr(label, text)
+   :label text
+   DEFINE label##_LEN %((strlen text))
+   %endm
+   ```
+
+   M1 substitutes `label_LEN` with its 4 hex bytes at each use site.
+
+2. Inline at an LI-immediate slot:
+
+   ```
+   li_r2 %((strlen "usage: …"))
+   ```
+
+   LI's inline literal slot takes 4 raw LE bytes; `05000000` and `%5`
+   are byte-equivalent there. lisp.M1 already relies on this (see
+   `DEFINE NIL 07000000` at `lisp/lisp.M1:30`, consumed as
+   `li_r0 NIL`).
+
+The 1/2/8-byte emitters (`!(e)`, `@(e)`, `$(e)`) cover non-4-byte widths
+if needed.
+
+### m1pp.c touchpoints
+
+- `EXPR_STRLEN` entry in the `ExprOp` enum (`m1pp.c:87`).
+- `expr_op_code` (`m1pp.c:751`): match the word `strlen`.
+- Eval path: `strlen` is a degenerate case — its "argument" is a
+  TOK_STRING, not a recursive expression. Easiest is a special-case
+  branch in `eval_expr_range` (`m1pp.c:976`) that handles `(strlen
+  "...")` directly rather than routing through `eval_expr_atom`.
+  Emit `span.len - 2` as the value.
+- Alternative: extend `eval_expr_atom` to accept TOK_STRING atoms with
+  value `len - 2`, and treat `strlen` as identity. Cleaner
+  composition but more surface area; defer unless needed.
+
+## 4. Paren-less 0-arg macro calls
+
+### Syntax
+
+A macro defined with zero parameters may be called without trailing
+`()`:
+
+```
+%macro FRAME_BASE()
+16
+%endm
+
+%((+ %FRAME_BASE 8))           ## paren-less
+%((+ %FRAME_BASE() 8))         ## still works
+```
+
+### Semantics
+
+- When `find_macro` matches a `%NAME` token and the macro's
+  `param_count == 0`, the expansion triggers whether or not an LPAREN
+  follows.
+- Applies in both contexts where a macro call is currently recognized:
+  top-level processing in `process_tokens`, and atom position in
+  `eval_expr_atom` so a 0-arg macro is a valid expression atom inside
+  `%(...)`.
+- Non-zero-param macros still require their existing `(arg, ...)`
+  syntax.
+- `%foo` where `foo` is not defined as a macro still passes through
+  unchanged — the match only fires when a matching 0-param macro
+  exists. Backward compatible.
+
+### Why it matters
+
+The one feature that needs it is `%struct` field access (§5). Once
+`NAME.field` expands to an integer, writing `%NAME.field` reads as a
+named constant; `%NAME.field()` looks like a function call. The
+relaxation is also load-bearing for expression-level composition:
+`%((+ %frame_hdr.SIZE %frame_apply.callee))` needs both atoms to
+resolve as 0-arg calls inside the evaluator.
+
+### m1pp.c touchpoints
+
+- `process_tokens` (`m1pp.c:1225`): the LPAREN-next guard becomes
+  "LPAREN-next OR `param_count == 0`."
+- `eval_expr_atom` (`m1pp.c:944`): same relaxation on the same guard.
+- The zero-param paren-less path constructs an empty arg list and
+  calls `expand_macro_tokens` with `arg_count == 0` — no
+  `parse_args` change.
+- No lexer changes, no new token kinds, no new Macro fields.
+
+## 5. `%struct` directive
+
+### Syntax
+
+A top-level directive declaring a fixed-layout aggregate of 8-byte
+fields:
+
+```
+%struct closure { hdr params body env }
+```
+
+Fields are bare identifiers separated by whitespace and/or commas.
+The closing brace terminates the declaration.
+
+### Semantics
+
+Expands at declaration time to N+1 zero-parameter macros:
+
+- `NAME.field_k` → `k * 8` for each field at index k
+- `NAME.SIZE` → `N * 8`
+
+All fields are 8-byte words. Mixed widths are deferred until a real
+use case appears.
+
+Callers consume these as paren-less 0-arg calls (per §4):
+
+```
+ld(r0, r1, %closure.body)
+enter(%frame_apply.SIZE)
+```
+
+### No `base=` parameter
+
+The struct primitive declares offsets from zero. Base offsets (e.g.
+for stack-frame locals sitting above the retaddr/caller-sp header)
+compose at the call site via an ordinary wrapper macro:
+
+```
+%struct frame_hdr { retaddr caller_sp }        ## SIZE = 16
+
+%macro frame(field)
+%((+ field %frame_hdr.SIZE))
+%endm
+
+%struct frame_apply { callee args body env }
+
+:apply
+    enter(%frame_apply.SIZE)
+    st(r1, sp, %frame(%frame_apply.callee))    ## 0 + 16 = 16
+    st(r2, sp, %frame(%frame_apply.args))      ## 8 + 16 = 24
+    …
+    leave()
+    ret
+```
+
+Heap structs access fields directly (`%closure.body`); stack frames
+route through the `%frame` wrapper. Same primitive, two conventions,
+no special casing inside `%struct`. If a function needs a different
+base (e.g. a permanent spill prefix), define `%frame_big(field)`
+alongside `%frame` — the struct declarations don't change.
+
+### Tokenizer
+
+- `.` is already a word char, so `NAME.field` tokenizes as one
+  TOK_WORD under the current word-terminator set (`m1pp.c:310`).
+- `{` / `}` reuse the TOK_LBRACE / TOK_RBRACE kinds introduced for §2.
+  `%struct` cannot land before §2 does.
+
+### m1pp.c touchpoints
+
+- New top-level directive branch in `process_tokens` (`m1pp.c:1192`)
+  alongside the existing `%macro` detection. At line-start, if the
+  first word is `%struct`:
+  - consume name, `{`, field-identifier list (WORD tokens,
+    comma-or-whitespace separated), `}`, trailing newline
+  - for each field k, generate an entry in `macros[]`:
+    - name = synthesized `"NAME.field_k"` in `text_buf`
+    - `param_count = 0`
+    - body = a single TOK_WORD whose text is the decimal rendering of
+      `k * 8` in `text_buf`
+  - emit a final `"NAME.SIZE"` entry pointing at `N * 8`
+- Integer → decimal rendering reuses the hand-rolled reverse-fill
+  pattern from §1 local labels — no `snprintf`.
+- No new expression-evaluator surface; consumption goes through the
+  existing `find_macro` + `eval_expr_atom` path once §4 lands.
+- No new Macro struct fields. A struct-generated macro is
+  indistinguishable from any other 0-param macro once declared.
+
+### What this does not give you
+
+- **Mixed-width fields.** All offsets are `k * 8`. The packed 8-bit
+  type + 8-bit gc-flags + 48-bit length header in lisp.M1 is easier
+  to handle with dedicated bit-op macros than struct syntax; defer.
+- **Bundled enter/leave per frame.** A `%frame NAME { … }` directive
+  that also emits ENTER/LEAVE around a body would bring back the
+  block-body problem and tightly couple locals to one function shape.
+  The call-site verbosity savings don't pay; use plain `%struct` plus
+  a wrapper macro.
+
+## 6. `%enum` directive
+
+### Syntax
+
+A top-level directive declaring an incrementing sequence of named
+integer constants:
+
+```
+%enum tag { fixnum pair vector string symbol proc singleton }
+%enum prim_id { add sub mul div mod eq lt gt ... }
+```
+
+### Semantics
+
+Expands at declaration time to N+1 zero-parameter macros:
+
+- `NAME.label_k` → `k` for each label at index k
+- `NAME.COUNT` → `N`
+
+Callers consume these as paren-less 0-arg calls (per §4):
+
+```
+li_r2 %tag.pair                       ## loads 1
+%((= %prim_id.COUNT 45))              ## compile-time sanity check
+```
+
+### Relationship to `%struct`
+
+Implementation-wise, `%enum` is `%struct` with stride 1 instead of 8
+and a totalizer named `COUNT` instead of `SIZE`. The directive
+parser, brace consumption, field-list parsing, and macro-generation
+loop are all shared. Factor the §5 implementation around one helper
+parameterized by `(stride, totalizer_name)`:
+
+- `%struct` → `define_fielded(8, "SIZE")`
+- `%enum`   → `define_fielded(1, "COUNT")`
+
+No separate code path; adding `%enum` is a second line-start
+directive check in `process_tokens` plus one call to the shared
+helper.
+
+### Why it matters
+
+lisp.M1 maintains two hand-numbered integer enumerations whose
+numbering must stay in sync across disjoint sites:
+
+- Tag codes (`lisp/lisp.M1:35–47`) referenced throughout the
+  reader / eval / printer dispatchers.
+- Primitive code IDs — used by the registration table and the
+  dispatch cascade (`lisp/lisp.M1:3843–3983`). Inserting a new
+  primitive in the middle shifts every downstream id; silent drift,
+  no error until runtime.
+
+`%enum` eliminates both drift classes: names declared once,
+referenced by name everywhere, renumbering on insertion is automatic.
+
+### m1pp.c touchpoints
+
+Same as §5 with the two parameter differences above. No new Macro
+struct fields, no new token kinds, no new expression-evaluator
+surface.
+
+### What this does not give you
+
+- **Explicit values.** `%enum foo { a=5 b c }` is not supported in
+  v1. All values are consecutive from 0. C's explicit-value form
+  is useful when matching external ABIs; our enums are internal, so
+  defer until a real use case appears.
+- **Flag/bitmask enums.** Not specially supported. If you want bit
+  positions, declare the bit index via `%enum` and take
+  `(1 << %NAME.flag_k)` at use sites.
+
+## 7. `%str` stringification builtin
+
+### Syntax
+
+A new builtin alongside `!(e)`, `@(e)`, `%(e)`, `$(e)`, `strlen`,
+and `%select`:
+
+```
+%str(IDENT)
+```
+
+Takes a single WORD-token argument; produces a TOK_STRING literal
+whose contents are the argument's text wrapped in double quotes:
+
+```
+%macro quoteit(name)
+%str(name)
+%endm
+
+%quoteit(hello)          →  "hello"
+%quoteit(foo_bar)        →  "foo_bar"
+```
+
+### Semantics
+
+- Exactly one argument, kind TOK_WORD. Multi-token, pasted, or
+  already-string args error out.
+- Output is a freshly-allocated TOK_STRING span in `text_buf` built
+  as `"` + original_text + `"`. The span's `len` is
+  `original_len + 2`, so `strlen` on the result (per §3) returns
+  `original_len` — the char count between the quotes, matching
+  what M1's `"…"` emission writes before the NUL.
+- Produces a string literal, not a word. Complementary to `##`, not
+  a replacement — see below.
+
+### Relationship to `##` paste
+
+Both turn a parameter into something else, but they produce
+**different token kinds** and serve **different goals**:
+
+| operator | inputs          | output           | kind       |
+|----------|-----------------|------------------|------------|
+| `##`     | two WORD tokens | one WORD token   | TOK_WORD   |
+| `%str`   | one WORD token  | one STRING token | TOK_STRING |
+
+`##` joins word fragments to build identifiers / label names.
+`%str` wraps a word in quotes to produce a string literal. They
+can't substitute for each other:
+
+- `:str_quote` (a label definition) must be a word — `##` can
+  build it, `%str` can't.
+- `"quote"` (a string literal) must introduce quote characters —
+  `%str` is the only way to manufacture it from a bare identifier,
+  paste can't.
+
+M1 sees the difference too: `:str_quote "quote"` is a label-def
+word followed by a quoted-bytes directive (5 bytes + NUL). Paste
+manufactures the first, stringify the second, both from the same
+source identifier.
+
+### Why it matters
+
+Every special-form symbol in lisp.M1 (`lisp/lisp.M1:164–260`)
+follows the same triad, written longhand 15 times today:
+
+```
+:str_quote "quote"
+DEFINE str_quote_LEN 05000000
+:sym_quote %0 %0
+```
+
+With `##` and `%str` together, one declarative site per symbol:
+
+```
+%macro defsym(name)
+:str_##name %str(name)
+DEFINE str_##name##_LEN %((strlen %str(name)))
+:sym_##name %0 %0
+%endm
+
+%defsym(quote)
+%defsym(if)
+%defsym(begin)
+…
+```
+
+- `##name` builds the label identifiers (`str_quote`, `sym_quote`).
+- `%str(name)` builds the string literal (`"quote"`).
+- `(strlen %str(name))` computes the length for the DEFINE.
+- One source of truth per symbol — the identifier itself.
+
+Without `%str`, callers would have to pass the string explicitly
+(`%defsym(quote, "quote")`). That works today with zero m1pp
+changes but invites drift between the identifier and its
+spelled-out string form — nothing at compile time flags a typo
+where the two disagree.
+
+### Why a builtin, not a `#x` sigil
+
+cpp uses `#x` inside macro bodies to stringify a parameter. That
+shape doesn't port cleanly to m1pp because `#` is already the
+line-comment starter (`m1pp.c:278`). Giving `#` dual duty would
+create parse ambiguity in `lex_source`.
+
+`%str(x)` reuses the existing builtin-dispatch plumbing — the same
+path that handles `! / @ / % / $ / %select` — and reads uniformly
+with the other text and numeric builtins.
+
+### Tokenizer
+
+No changes. Existing TOK_STRING machinery handles the output;
+`%str` is a word token recognized as a builtin in `process_tokens`.
+
+### m1pp.c touchpoints
+
+- `process_tokens` (`m1pp.c:1211`): extend the builtin-dispatch
+  guard to accept `%str` alongside `! @ % $ %select`.
+- `expand_builtin_call` (`m1pp.c:1092`): add a branch for `%str`.
+  Arg-count check: exactly 1. Arg-shape check: exactly one token,
+  kind TOK_WORD. Anything else errors.
+- Stringification body: compute `out_len = arg.text.len + 2`,
+  reserve that many bytes via `append_text_len`, write `"`,
+  the original bytes, `"`. Push a TOK_STRING pointing at the new
+  span.
+- No `snprintf` — plain byte copies, straightforward port.
+- No new token kinds, no new Macro fields.
+
+### What this does not give you
+
+- **Stringification of non-parameter tokens.** Only single-token
+  WORD args. `%str(foo bar)` or `%str("already a string")` both
+  error. Wider forms are cpp-ish; defer until a real use case
+  appears.
+- **Escape processing inside the stringified text.** The input is
+  a bare identifier — no quotes, backslashes, or whitespace to
+  escape. If `%str` is ever extended to take broader token spans,
+  escape handling becomes relevant then.
+
+## Per-feature implementation sequence
+
+Each of the three features lands in the same three ordered steps. Do
+not skip or reorder — the tests exist to pin behavior before the
+port, and the port exists because the C expander is disposable.
+
+1. **Implement in `m1pp/m1pp.c`.** The C expander is the oracle. Land
+   the feature here first so there is something to diff against.
+2. **Add a test in `tests/m1pp/`.** New `NN-name.M1pp` +
+   `NN-name.expected` pair following the existing numbering (see
+   `tests/m1pp/` — current fixtures run 00 through 10), **or** extend
+   an existing fixture when the feature is a natural addition to one
+   (e.g. `strlen` goes into `04-expr-ops.M1pp` alongside the other
+   expression ops rather than getting its own file). For malformed-
+   input features, the expected artifact is a non-zero exit; document
+   that in the fixture.
+3. **Add to `m1pp/m1pp.M1`.** Port the feature to the pure-P1
+   implementation of m1pp so the seed bootstrap doesn't depend on the
+   host C expander. The test from step 2 runs against both `m1pp` (C)
+   and `m1m` (P1) and must produce byte-identical output; that parity
+   is what `docs/M1M-P1-PORT.md` calls "C-oracle comparison."
+
+Shipping a feature means all three steps are done. A half-landed
+feature (C only, or C + test but no port) blocks the next feature in
+the sequencing list below.
+
+## Cross-feature sequencing
+
+1. **Local labels.** Smallest patch, immediately useful — enables
+   straight-line macros like `%case_tag` and `%tag_dispatch` that
+   want one or two internal labels without hand-naming.
+2. **Braced args.** Unlocks structured `%if_eq_else` / `%while_nez`
+   that carry instruction bodies. Depends on (1) in practice — the
+   bodies reference labels defined in the surrounding macro.
+3. **`strlen`.** Independent of the other two. Land when the first
+   `%defstr` call site shows up.
+4. **Paren-less 0-arg macro calls.** Independent small relaxation of
+   two guards (one in `process_tokens`, one in `eval_expr_atom`).
+   Useful on its own for constants-as-macros; load-bearing for (5).
+5. **`%struct`.** Depends on (2) for the brace token kinds and (4)
+   for paren-less access syntax. Land only after both.
+6. **`%enum`.** Same dependencies as (5). Share the
+   directive-handler implementation with `%struct` — land together
+   or back-to-back.
+7. **`%str`.** Independent of everything else. Pairs naturally with
+   (3) `strlen` in the `%defsym` pattern but has no build-order
+   dependency on it. Land when the first `%defsym`-style
+   declarative macro shows up.
+
+Each is a self-contained patch. No cross-dependencies beyond the
+sequencing above and the three-step rule per feature.
+
+## Per-feature acceptance fixtures
+
+- **Local labels:** two fixtures — a single macro using `:@end` and
+  calling itself twice in one function (must produce distinct labels),
+  and nested macros each using `:@done` (must not collide). Assemble
+  through M1 + hex2 clean on at least one arch.
+- **Braced args:** fixture exercising a body with commas
+  (`st(r0, r3, 0)`), a body with nested braces, and a malformed
+  fixture (unmatched `{`) that exits non-zero.
+- **`strlen`:** fixture with `DEFINE X_LEN %((strlen "hello"))`
+  followed by `li_r2 X_LEN` — binary must load the value 5 and
+  syscall-exit 5 on all three arches via the existing P1 differential
+  harness.
+- **Paren-less 0-arg calls:** fixture with a 0-param macro invoked
+  both with and without trailing `()`, in top-level position and as
+  an atom inside `%(...)` expressions; all forms must produce
+  byte-identical output against a control fixture that always uses
+  `()`.
+- **`%struct`:** fixture declaring a 4-field struct, accessing each
+  field via paren-less calls, and layering a `%frame` wrapper using
+  `%frame_hdr.SIZE` composition (per the doc example); build on all
+  three arches and exit with a sentinel computed from both the
+  struct-level `.SIZE` and the wrapped base offset, proving the
+  compose-and-add path resolves correctly.
+- **`%enum`:** fixture declaring an enum with 3+ labels, referencing
+  each via paren-less call, and asserting `%NAME.COUNT` equals the
+  label count via a `%(=)` expression that feeds an exit code;
+  build on all three arches. Share fixture scaffolding with the
+  `%struct` test where practical.
+- **`%str`:** two fixtures — (a) a macro using `%str(name)` in its
+  body, compared against a control that writes the literal
+  `"name"` string directly (byte-identical output); (b) combined
+  paste + stringify, `%macro defsym(n) :str_##n %str(n) %endm`
+  invoked with distinct identifiers, assembled through M1 + hex2,
+  each generated label must point at the correctly-spelled string
+  bytes. A third malformed fixture (`%str(a b)` or
+  `%str("already_string")`) must exit non-zero.

	boot2 Playing with the boostrap
	git clone https://git.ryansepassi.com/git/boot2.git
	Log \| Files \| Refs