scheme1 explorations - boot2 - Playing with the boostrap

commit 76fd67990e47b226037ab1da73887d10684e5aad
parent 632d7039ab4ccf96a93577a762fc39c5e188bb13
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Wed, 20 May 2026 09:33:49 -0700

scheme1 explorations

Diffstat:
A docs/MACROS.md  | 372 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A docs/SCHEME1-GC.md  | 427 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

2 files changed, 799 insertions(+), 0 deletions(-)
diff --git a/docs/MACROS.md b/docs/MACROS.md
@@ -0,0 +1,372 @@
+# Macros (R7RS `syntax-rules`) for scheme1
+
+Plan for adding hygienic `syntax-rules` macros to `scheme1.P1pp`.
+Conforms to R7RS-small with the coverage limits noted in
+[Scope](#scope).
+
+## Architecture
+
+Two-layer split:
+
+- **P1 runtime** (in `scheme1.P1pp`) provides the *minimum* needed to
+  recognize and dispatch macro bindings: a new heap-object type, three
+  special forms (`define-syntax`, `let-syntax`, `letrec-syntax`), an
+  expansion hook in `eval`, and a `gensym` primitive.
+- **Scheme engine** (in `prelude.scm`) implements `syntax-rules` itself
+  as a regular procedure that returns a transformer closure. Pattern
+  matching, template rendering, ellipsis expansion, and gensym-based
+  hygiene all live here.
+
+Rationale: a hygienic matcher/renderer is ~300 lines of dense list
+manipulation. Writing it in Scheme is straightforward and debuggable;
+writing it in P1pp is neither. The runtime cost is one extra closure
+call per macro use — negligible at the scale this interpreter
+operates.
+
+## Scope
+
+Covered in v1:
+
+- `define-syntax`, `let-syntax`, `letrec-syntax`.
+- `syntax-rules` with literals list, single-depth ellipsis (`...`),
+  underscore wildcard (`_`), and improper-tail patterns
+  (`(p1 p2 ... . ptail)`).
+- Hygiene: every identifier that a template introduces in a binding
+  position (not a pattern variable, not a literal) is gensym-renamed
+  per expansion so it cannot capture a user identifier at the use site.
+
+Deferred (not v1):
+
+- Multi-depth ellipsis (a pattern variable appearing under two or more
+  `...`s, producing nested-list captures).
+- Custom ellipsis identifier (`(syntax-rules ::: ...)`).
+- `syntax-case` / `er-macro-transformer` / explicit-renaming variants.
+- Macro-introduced definitions outside the top level (`define` inside
+  an expansion used in a body). Internal `define` is already rejected
+  by scheme1; see [Open caveats](#open-caveats).
+
+## Type additions
+
+### `HDR.MACRO` heap object
+
+Added to the existing `HDR` enum (currently
+`{BV CLOSURE PRIM TD REC MV}`).
+
+Layout (tagged HEAP, 16 bytes):
+
+```
+struct MACRO {
+    hdr           ; HDR.MACRO
+    transformer   ; tagged closure: (lambda (form) -> form)
+}
+```
+
+The transformer is a Scheme closure built by `syntax-rules` at the
+time `define-syntax` (or `let-syntax`/`letrec-syntax`) is evaluated.
+It takes the entire macro form as a quoted datum (e.g. `'(my-when t
+b1 b2)`) and returns a quoted datum that `eval` then re-evaluates.
+
+Touchpoints for the GC tracer (when GC lands): trace the
+`transformer` slot. `write` / `display` render a macro as
+`#<macro>`; `equal?` compares macros by identity only (they are not
+structurally comparable).
+
+## Runtime changes
+
+### Special-form dispatch
+
+Three new entries in the `dispatch_form` table inside `eval`'s pair
+branch:
+
+```
+%dispatch_form(&sym_define_syntax,  &.do_define_syntax)
+%dispatch_form(&sym_let_syntax,     &.do_let_syntax)
+%dispatch_form(&sym_letrec_syntax,  &.do_letrec_syntax)
+```
+
+Cached symbol slots and `intern_special_forms` updated alongside.
+
+### `eval_define_syntax`
+
+```
+(define-syntax name transformer-expr)
+```
+
+1. Evaluate `transformer-expr` in the current env. Must yield a
+   closure (`HDR.CLOSURE`) — error otherwise (`die(msg_bad_syntax)`).
+2. `alloc_hdr_main(MACRO.SIZE, HDR.MACRO)`; store the closure in the
+   transformer slot.
+3. Bind the symbol's *global* slot to the tagged MACRO. (Same global
+   binding mechanism as `define`.)
+4. Return UNSPEC.
+
+`alloc_hdr_main` is the right allocator: the macro outlives any
+scratch-heap reset, just like `define-record-type`'s TD.
+
+### `eval_let_syntax` / `eval_letrec_syntax`
+
+```
+(let-syntax    ((name transformer-expr) ...) body ...)
+(letrec-syntax ((name transformer-expr) ...) body ...)
+```
+
+Both extend the **lexical** env with `(name . macro-obj)` pairs.
+`let-syntax` evaluates each `transformer-expr` in the *outer* env;
+`letrec-syntax` first installs all bindings as placeholders, then
+evaluates each transformer in the new env (allowing mutual
+recursion) and fills in the corresponding placeholder. The body
+evaluates in the extended env via `eval_body`.
+
+The lexical env is the existing alist `((sym . val) ...)`. Macro
+values are tagged HEAP objects, so they coexist with regular value
+bindings without changing the alist shape.
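To make the scoping difference concrete, here is a small Python sketch in which the lexical env is a list of `(name, value)` pairs and each `transformer-expr` is modeled as a function `make(env)` that returns the closure it would evaluate to. The `Macro` class and both helpers are hypothetical stand-ins, not scheme1 code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Macro:
    """Stand-in for HDR.MACRO; transformer is None while a letrec placeholder."""
    transformer: Optional[Callable] = None


def let_syntax(env, specs):
    """specs: [(name, make)]; each make(env) returns a transformer closure."""
    # Every transformer is built against the OUTER env.
    return [(name, Macro(make(env))) for name, make in specs] + env


def letrec_syntax(env, specs):
    # 1. Install all names first, as placeholders.
    placeholders = [(name, Macro()) for name, _ in specs]
    new_env = placeholders + env
    # 2. Build each transformer against the NEW env, then fill its placeholder.
    for (_, macro), (_, make) in zip(placeholders, specs):
        macro.transformer = make(new_env)
    return new_env
```

A transformer built by `letrec_syntax` can see the macro names being defined (so two transformers can refer to each other), while one built by `let_syntax` only sees the outer env.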
+
+### Macro dispatch in `eval`
+
+After special-form dispatch in the pair branch, before the apply
+path: when the head is a symbol, resolve it (lexical env first, then
+global slot). If the resolved value is a `HDR.MACRO`:
+
+1. The form to pass is just the original `expr`: the head plus its
+   *unevaluated* args, already in hand, so no re-consing is needed.
+2. Call the transformer closure with this form as its single
+   argument. (Reuse `apply` with a one-element args list.)
+3. Tail-call `eval` on the returned form in the *current* env.
+
+Pseudocode (in P1 terms inside `eval`):
+
+```
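+:.maybe_macro
+%ldl(t2, resolved)                            ; value the head symbol resolved to
+%hdr_type(t0, t2)                             ; (caller has checked TAG.HEAP)
+%bine(t0, %HDR.MACRO, &.apply_normal, t1)     ; not a macro: normal apply path
+%ldl(a0, expr)                                ; car: the whole form, unevaluated
+%li(a1, %imm_val(%IMM.NIL))                   ; cdr: '()
+%call(&cons)                                  ; a0 = args = (form)
+%mov(a1, a0)                                  ; a1 = args
+%ldl(t2, resolved)                            ; reload: the call may clobber t2
+%heap_ld(a0, t2, %MACRO.transformer)          ; a0 = transformer closure
+%call(&apply)                                 ; a0 = expanded form
+%ldl(a1, env)
+%tail(&eval)                                  ; re-evaluate in the current env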
+```
+
+Ordering: macro check happens *after* the static special-form
+dispatch table, so user code cannot redefine `if`, `lambda`, etc.
+via `define-syntax`. (That's a deliberate restriction; lifting it
+would require turning every special form into a default macro
+binding that the user can override.)
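A rough Python model of where the macro hook sits in `eval`, for illustration only: forms are Python lists, symbols are strings, `False` stands for `#f`, and only `quote` and `if` are special forms. `Macro`, `GLOBALS`, `lookup`, and `eval_` are hypothetical stand-ins. Special forms are checked first; otherwise the head is resolved through the lexical alist and then the globals, and a `Macro` result runs the transformer on the whole unevaluated form, after which the loop re-evaluates the expansion in the same env (the `%tail(&eval)` step).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Macro:
    """Stand-in for a HDR.MACRO object: wraps a transformer form -> form."""
    transformer: Callable


GLOBALS = {"+": lambda *args: sum(args)}


def lookup(sym, env):
    """Lexical alist first, then the global slot (KeyError if unbound)."""
    for name, val in env:
        if name == sym:
            return val
    return GLOBALS[sym]


def eval_(expr, env):
    while True:                                   # loop = tail call back into eval
        if isinstance(expr, str):
            return lookup(expr, env)
        if not isinstance(expr, list):
            return expr                           # self-evaluating datum
        head = expr[0]
        # Static special-form dispatch comes before any macro lookup.
        if head == "quote":
            return expr[1]
        if head == "if":
            test = eval_(expr[1], env)
            expr = expr[2] if test is not False else expr[3]
            continue
        if isinstance(head, str):
            try:
                val = lookup(head, env)
            except KeyError:
                val = None
            if isinstance(val, Macro):
                expr = val.transformer(expr)      # pass the whole, unevaluated form
                continue                          # re-evaluate expansion in same env
        proc = eval_(head, env)
        return proc(*[eval_(arg, env) for arg in expr[1:]])
```

Because `if` is handled before the lookup, binding a `Macro` to `if` has no effect, which mirrors the ordering restriction described above.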
+
+### `gensym` primitive
+
+```
+(gensym)        ; -> fresh symbol, e.g. g.0, g.1, ...
+(gensym "tag")  ; -> g.tag.N
+```
+
+Implementation: a process-global counter `gensym_counter` in BSS.
+Builds a name by `format`-ing `g.<n>` (or `g.<tag>.<n>`) into a
+scratch buffer, then calls `intern` so the symbol gets a stable
+slot in the symtab. Because `intern` is keyed on the byte string,
+distinct counter values always produce distinct symbols.
+
+A bare `g.<n>` name could still collide with a user identifier:
+scheme1's reader accepts `.` in identifiers, so user code can write
+`g.0` directly. Mitigation: prefix the generated name with a byte
+outside the reader's identifier charset (e.g. `\x01`). `intern`
+accepts any byte string when called programmatically, but the reader
+never produces that byte, so user code cannot construct a colliding
+name. The byte is invisible in `display` output unless the user
+explicitly writes the symbol; that is acceptable.
+
+Added to `prim_table`: `gensym`.
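A Python sketch of the naming scheme, with symbols modeled as Python strings (so interning is implicit in string equality). The `\x01` prefix follows the example above; the `reader_can_produce` charset is a hypothetical approximation, not scheme1's actual reader rules.

```python
import itertools

GENSYM_PREFIX = "\x01"               # assumed: a byte the reader never accepts in identifiers
_gensym_counter = itertools.count()  # process-global, like gensym_counter in BSS


def gensym(tag=None):
    """Return g.<n> or g.<tag>.<n>, prefixed so user code can't spell it."""
    n = next(_gensym_counter)
    name = f"g.{n}" if tag is None else f"g.{tag}.{n}"
    return GENSYM_PREFIX + name


def reader_can_produce(name):
    """Hypothetical reader charset: printable ASCII minus a few delimiters."""
    return all(33 <= ord(c) < 127 and c not in "()\";'`," for c in name)
```

A user can type `g.0`, but no reader input can yield a name starting with `\x01`, so generated and user-written symbols never coincide.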
+
+## Scheme engine (`syntax-rules` in prelude.scm)
+
+`syntax-rules` is a regular procedure. Its result is a closure of
+the form `(lambda (form) ...)`.
+
+```
+(syntax-rules literals rule ...)
+  ;; literals = list of identifiers
+  ;; rule     = (pattern template) where pattern is (head . sub-pattern)
+```
+
+A typical top-level definition:
+
+```
+(define-syntax foo (syntax-rules (lit) ((foo p) t) ...))
+```
+
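+`define-syntax`'s argument is just the `(syntax-rules ...)`
+expression. Eval evaluates that expression, getting a closure, then
+wraps it in HDR.MACRO. So `syntax-rules` must be available in the
+prelude before any macro is defined. Note that an ordinary procedure
+call would also evaluate the literals list and the rules (`(lit)`
+would be a call to `lit`), so they have to reach `syntax-rules`
+unevaluated, e.g. by quoting them or by having the runtime pass them
+through unevaluated when the transformer expression has a
+`syntax-rules` head.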
+
+### Algorithm — pattern matching
+
+`(match-pattern pat form literals)` returns either `#f` (no match)
+or an alist of `(pattern-var . captured-form-or-list)`.
+
+Pattern dispatch:
+
+| Pattern shape           | Match rule                                                |
+|-------------------------|-----------------------------------------------------------|
+| `_`                     | Match anything; no binding.                               |
+| `<id>` ∈ literals       | Match iff `form` is the same symbol (`eq?`).              |
+| `<id>` (otherwise)      | Match anything; bind `id` → `form`.                       |
+| `()`                    | Match iff `form` is `'()`.                                |
+| `(p_head . p_rest)`     | See ellipsis / improper / proper rules below.             |
+| `<literal-datum>`       | Match by `equal?` (numbers, strings, booleans, chars).    |
+
+Pair-pattern cases:
+
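+1. **Ellipsis**: pattern is `(p ... . p_tail)` where `p ...` is the
+   ellipsis element. Let `k` be the number of elements in `p_tail`'s
+   list prefix (0 when `p_tail` is `()` or an identifier). The
+   ellipsis consumes all but the last `k` elements of `form`'s list
+   prefix; every consumed element must match `p` (otherwise the
+   whole pattern fails), and per-pattern-var captures are collected
+   into parallel lists. The remaining tail of `form` is then matched
+   against `p_tail`. This is the R7RS rule; a greedy "consume while
+   `p` matches" loop gives wrong results for patterns like
+   `(_ a ... z)`.
+2. **Improper tail without ellipsis**: pattern is `(p1 p2 . p_tail)`
+   (any number of leading patterns, dot before a non-ellipsis tail).
+   Match each `p_i` against the corresponding element of `form`,
+   then match `p_tail` against the dotted rest.
+3. **Proper list**: lengths must match exactly; element-wise
+   recursion.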
+
+### Ellipsis capture shape
+
+When `p` (under `...`) contains pattern vars `v1, v2, ...`, after
+consuming `n` elements, each `v_i` is bound to a *list of length n*
+of the values captured at that position. (Empty list when `n = 0`.)
+
+Each pattern-var binding is annotated with its **depth**: 0 for a
+plain capture, 1 for ellipsis-captured. Depth-1 only in v1.
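A minimal Python sketch of `match-pattern`, for illustration only (the real engine is Scheme in `prelude.scm`). It assumes pairs are modeled by a small `Pair` class, symbols by Python `str`, and `'()` by `()`. It returns a dict of `var -> (depth, value)` instead of an alist, and, as in R7RS, it ignores the keyword at the head of the pattern. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Pair:
    car: Any
    cdr: Any


NIL = ()          # stands in for '(); symbols are modeled as Python str
ELLIPSIS = "..."


def lst(*xs, tail=NIL):
    """Build a (possibly improper) list from Python values."""
    out = tail
    for x in reversed(xs):
        out = Pair(x, out)
    return out


def prefix_len(x):
    """Number of pairs before the final cdr."""
    n = 0
    while isinstance(x, Pair):
        n, x = n + 1, x.cdr
    return n


def pattern_vars(pat, literals):
    if isinstance(pat, Pair):
        return pattern_vars(pat.car, literals) + pattern_vars(pat.cdr, literals)
    if isinstance(pat, str) and pat not in ("_", ELLIPSIS) and pat not in literals:
        return [pat]
    return []


def match(pat, form, literals, b):
    """Match form against pat, adding captures to b as var -> (depth, value)."""
    if isinstance(pat, Pair):
        if isinstance(pat.cdr, Pair) and pat.cdr.car == ELLIPSIS:
            sub, ptail = pat.car, pat.cdr.cdr
            n = prefix_len(form) - prefix_len(ptail)   # leave k elements for ptail
            if n < 0:
                return False
            per = []
            for _ in range(n):
                bi = {}
                if not match(sub, form.car, literals, bi):
                    return False                       # every consumed element must match
                per.append(bi)
                form = form.cdr
            for v in pattern_vars(sub, literals):      # v1: sub has no nested ellipsis
                b[v] = (1, [bi[v][1] for bi in per])
            return match(ptail, form, literals, b)
        return (isinstance(form, Pair)
                and match(pat.car, form.car, literals, b)
                and match(pat.cdr, form.cdr, literals, b))
    if pat == "_":
        return True
    if isinstance(pat, str):
        if pat in literals:
            return form == pat
        b[pat] = (0, form)
        return True
    return pat == form   # '() and literal data (numbers, booleans)


def match_pattern(pat, form, literals):
    """Return a bindings dict, or None if no match. The pattern keyword is ignored."""
    b = {}
    return b if match(pat.cdr, form.cdr, literals, b) else None
```

Improper tails and proper lists both fall out of the plain car/cdr recursion; only the ellipsis case needs special handling.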
+
+### Algorithm — template rendering
+
+`(render template bindings rename-map)` walks the template and
+produces an output form.
+
+- Symbol that is a depth-0 pattern var → substitute its captured
+  form.
+- Symbol that is a depth-1 pattern var **outside** of an ellipsis
+  context → error (arity mismatch).
+- `(t ... . t_tail)`: for each ellipsis-relevant pattern var
+  appearing in `t` (the sub-template before `...`), look up its
+  list of captures. All such lists must have equal length `n`
+  (else error). Render `t` `n` times, each time with depth-1 vars
+  shadowed by their `i`-th element. Splice the results into the
+  output, then continue with `t_tail`.
+- Symbol that is a literal or free identifier → look up in
+  `rename-map`; emit the renamed symbol if present, else emit
+  unchanged.
+- Pair (non-ellipsis) → recurse into car and cdr.
+- Other (literal datum, `'()`) → emit unchanged.
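A Python sketch of `render` under the same modeling assumptions as a matcher sketch would use (`Pair` class, `str` symbols, `()` for `'()`, bindings as `var -> (depth, value)`). Helper names and error messages are hypothetical; the structure follows the cases listed above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Pair:
    car: Any
    cdr: Any


NIL = ()            # stands in for '(); symbols are Python str
ELLIPSIS = "..."


def lst(*xs, tail=NIL):
    out = tail
    for x in reversed(xs):
        out = Pair(x, out)
    return out


def symbols_in(t):
    if isinstance(t, str):
        return {t}
    if isinstance(t, Pair):
        return symbols_in(t.car) | symbols_in(t.cdr)
    return set()


def render(t, b, rmap):
    """b maps pattern var -> (depth, value); rmap maps template ids -> renamed ids."""
    if isinstance(t, str):
        if t in b:
            depth, val = b[t]
            if depth != 0:
                raise ValueError(f"{t}: ellipsis variable used outside ...")
            return val
        return rmap.get(t, t)                   # renamed if present, else unchanged
    if isinstance(t, Pair):
        if isinstance(t.cdr, Pair) and t.cdr.car == ELLIPSIS:
            sub, ttail = t.car, t.cdr.cdr
            vs = [v for v in symbols_in(sub) if v in b and b[v][0] == 1]
            if not vs:
                raise ValueError("no ellipsis variable under ...")
            lengths = {len(b[v][1]) for v in vs}
            if len(lengths) != 1:
                raise ValueError("ellipsis captures have different lengths")
            copies = []
            for i in range(lengths.pop()):
                bi = dict(b)
                for v in vs:
                    bi[v] = (0, b[v][1][i])     # shadow with the i-th capture
                copies.append(render(sub, bi, rmap))
            return lst(*copies, tail=render(ttail, b, rmap))   # splice, then the tail
        return Pair(render(t.car, b, rmap), render(t.cdr, b, rmap))
    return t                                    # literal datum or '()
```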
+
+### Hygiene: identifier renaming
+
+Before each expansion, walk the template and collect every symbol
+that is **not**:
+- a pattern variable in `bindings`, or
+- a literal in the rule's literals list.
+
+Symbols that only refer to variables visible at the macro's use
+site through the standard scope chain can be left alone. In a
+single-global-env Scheme like scheme1, the simpler rule "rename
+every template-only identifier that binds a name" suffices: rename
+identifiers that appear in *binding positions* of constructs the
+template introduces (`lambda` formals, `let` bound names, `define`
+names). Identifiers in operator/operand positions don't need
+renaming because they reference globals or pattern-bound values.
+
+Implementation: walk the template once, collect a set of
+"binding-position" identifiers (this requires a small table of
+which forms bind names: `lambda`, `let`, `let*`, `letrec`,
+`let-values`, `let*-values`, `do`, `define`, `define-record-type`).
+For each, allocate a fresh `gensym` name and substitute every
+occurrence in the template (binding *and* references) consistently.
+
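+This is conservative (it sometimes renames identifiers that
+wouldn't have collided), but it reliably keeps template-introduced
+bindings from capturing user identifiers. It does not stop a
+use-site local binding from capturing a free template identifier
+(e.g. a template's reference to `list` inside a user's
+`(let ((list ...)) ...)`), which full R7RS hygiene also requires.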
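A Python sketch of `build-rename-map`. To keep it short it covers only `lambda`, `let`, `let*`, `letrec` (including named `let`), and `define`; the other binders in the table above (`let-values`, `let*-values`, `do`, `define-record-type`) are left out. `Pair`, `gensym`, and the `\x01g.<name>.<n>` naming are hypothetical stand-ins.

```python
from dataclasses import dataclass
from itertools import count
from typing import Any


@dataclass(frozen=True)
class Pair:
    car: Any
    cdr: Any


NIL = ()
_ids = count()
LET_FORMS = {"let", "let*", "letrec"}


def lst(*xs, tail=NIL):
    out = tail
    for x in reversed(xs):
        out = Pair(x, out)
    return out


def gensym(tag):
    return f"\x01g.{tag}.{next(_ids)}"        # hypothetical naming


def collect_syms(x, out):
    if isinstance(x, str):
        out.add(x)
    elif isinstance(x, Pair):
        collect_syms(x.car, out)
        collect_syms(x.cdr, out)


def binding_ids(t, out):
    """Add identifiers in binding positions of lambda / let-family / define."""
    if not isinstance(t, Pair):
        return
    head, rest = t.car, t.cdr
    if isinstance(rest, Pair):
        if head == "lambda":
            collect_syms(rest.car, out)           # formals: id, list, or dotted list
        elif head in LET_FORMS:
            bs = rest.car
            if isinstance(bs, str):               # named let: (let name ((n v) ...) ...)
                out.add(bs)
                bs = rest.cdr.car if isinstance(rest.cdr, Pair) else NIL
            while isinstance(bs, Pair):
                if isinstance(bs.car, Pair):
                    collect_syms(bs.car.car, out) # the bound name
                bs = bs.cdr
        elif head == "define":
            collect_syms(rest.car, out)           # name, or (name . formals)
    while isinstance(t, Pair):                    # recurse into every subform
        binding_ids(t.car, out)
        t = t.cdr


def build_rename_map(tpl, literals, bindings):
    ids = set()
    binding_ids(tpl, ids)
    ids -= set(bindings) | set(literals) | {"_", "..."}
    return {name: gensym(name) for name in sorted(ids)}
```

For `swap!`, only the template's `tmp` ends up in the map; the pattern variables `a` and `b` are substituted by `render`, not renamed.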
+
+### Putting it together — the transformer closure
+
+```
+(define (syntax-rules literals . rules)
+  (lambda (form)
+    (let loop ((rs rules))
+      (cond
+        ((null? rs)
+         (error "no syntax-rules pattern matched" form))
+        (else
+         (let* ((rule (car rs))
+                (pat (car rule))
+                (tpl (cadr rule))
+                (b   (match-pattern pat form literals)))
+           (if b
+               (let ((rmap (build-rename-map tpl literals b)))
+                 (render tpl b rmap))
+               (loop (cdr rs)))))))))
+```
+
+`match-pattern`, `build-rename-map`, and `render` are the three
+core helpers; together ~300 lines.
+
+## Files / line budget
+
+| Location                              | Add        |
+|---------------------------------------|------------|
+| `scheme1.P1pp` HDR enum               | +1 line    |
+| `scheme1.P1pp` MACRO struct           | +3 lines   |
+| `scheme1.P1pp` `intern_special_forms` | ~8 lines   |
+| `scheme1.P1pp` eval dispatch + macro hook | ~30 lines  |
+| `scheme1.P1pp` `eval_define_syntax`   | ~25 lines  |
+| `scheme1.P1pp` `eval_let_syntax`      | ~50 lines  |
+| `scheme1.P1pp` `eval_letrec_syntax`   | ~50 lines  |
+| `scheme1.P1pp` `prim_gensym_entry`    | ~25 lines  |
+| `scheme1.P1pp` writer `#<macro>` case | ~5 lines   |
+| `scheme1.P1pp` prim_table + names     | +3 lines   |
+| `prelude.scm` `syntax-rules` engine   | ~350 lines |
+
+Total: ~200 lines P1pp, ~350 lines Scheme.
+
+## Testing
+
+Add a `tests/scheme1/13x-macro-*.scm` series. Minimum coverage:
+
+- `130-macro-basic.scm` — `(define-syntax my-when ...)`, simple
+  fixed-arity expansion.
+- `131-macro-ellipsis.scm` — `my-when` with `body ...`, `my-list`
+  (`(x ...)` → `(list x ...)`), zero-element case.
+- `132-macro-let.scm` — re-implement `let` via `syntax-rules` with
+  inner-shape pattern `((name val) ...)`. Verify against builtin.
+- `133-macro-tail.scm` — improper-tail pattern, e.g. `(_ x . rest)`.
+- `134-macro-literals.scm` — literal `else`-style identifier in a
+  cond-shaped macro.
+- `135-macro-hygiene.scm` — macro that introduces a binding that
+  would shadow a user var; verify the user var still resolves.
+  Classic test: `(define-syntax swap! ...)` using a temp.
+- `136-let-syntax.scm` — local `let-syntax`, including a body that
+  references a same-named global value — must shadow correctly and
+  un-shadow outside the body.
+- `137-letrec-syntax.scm` — two mutually-recursive transformers
+  (rare in practice but spec-required).
+- `138-no-match.scm` — fall-through error path.
+
+Each fixture is a normal scheme1 test (`.scm` + `.expected-exit`),
+runnable via `tests/boot-run-scheme1.sh`.
+
+## Open caveats
+
+- **`define` inside expansions works only at top level.** scheme1
+  rejects internal `define`, and that check fires *after* macro
+  expansion. A macro that expands to `(begin (define x 1) ...)` at
+  the top level works; the same expansion inside a `lambda` body
+  does not. Document, don't fix.
+- **Macros cannot override built-in special forms.** Dispatch
+  checks special-form symbols before macro lookup. Lifting this
+  would require representing the special forms as default
+  bindings.
+- **No source-location tracking.** Errors from inside expansions
+  point at the macro implementation, not the use site. Consistent
+  with scheme1's existing error story.
+- **`equal?` on macros is reference equality only.** Two
+  `syntax-rules` expressions compiled separately are distinct
+  closures. Not specified by R7RS as comparable.
diff --git a/docs/SCHEME1-GC.md b/docs/SCHEME1-GC.md
@@ -0,0 +1,427 @@
+# scheme1 GC
+
+Spec for adding a Cheney-style semispace copying garbage collector to
+`scheme1/scheme1.P1pp`. Replaces the main heap's bump-only allocator
+with a moving collector; leaves the scratch heap and the
+`heap-mark` / `heap-rewind!` family alone.
+
+## Goals & non-goals
+
+Goals:
+- Reclaim unreachable main-heap objects automatically. Programs no
+  longer depend on `heap-mark` / `heap-rewind!` discipline to stay
+  within the 256 MiB main-heap reservation.
+- Preserve every observable semantic of the existing interpreter
+  (`eq?`, mutation through `set-car!` / `set-cdr!` / `bytevector-u8-set!`,
+  closure capture, record identity).
+- Keep the scratch-heap, `heap-mark`, `heap-rewind!`, `use-scratch-heap!`
+  surface working unchanged. `cc.scm` must run with no source change.
+- Roots are unambiguous: every Scheme value live across an allocation
+  point lives in a place the collector explicitly knows how to find.
+
+Non-goals:
+- Generational, incremental, or concurrent collection.
+- Collection of the scratch heap.
+- Reducing the worst-case pause time. (Stop-the-world is fine.)
+- Collection of the symbol table (it stays append-only and pinned;
+  see [Roots](#roots)).
+- Finalizers, weak references, ephemerons.
+
+## Algorithm
+
+Stop-the-world Cheney semispace copy. The 256 MiB main heap is split
+into two equal **semispaces** of 128 MiB each. At any moment one is
+the active *from-space* (allocations bump into it) and the other is
+the idle *to-space*.
+
+A collection:
+1. Swap the roles of the two spaces. The old from-space becomes the
+   evacuation source (the new to-space); the old to-space becomes the
+   new from-space (initially empty, `next == start`).
+2. **Forward roots.** For each root word holding a tagged pointer
+   into the old from-space, copy the pointee into the new
+   from-space and overwrite the root with the new tagged pointer.
+3. **Scan.** Walk the new from-space from the start to the
+   advancing `next` pointer with a cursor `scan`. For each object
+   between `scan` and `next`, forward each of its tagged-pointer
+   fields. `next` advances as new objects are appended;
+   `scan == next` is the termination condition.
+4. The old from-space is now garbage in bulk. No per-object sweep.
+
+Forwarding is in-place: when an object is copied, its old header
+word is overwritten with a forwarding tag carrying the new address
+(see [Forwarding](#forwarding)). A second visit to the same object
+sees the forwarding tag and just reuses the recorded new address.
+
+Why Cheney over mark-sweep-with-free-list (the original ask): the
+existing allocator is a bump pointer. Cheney keeps it that way
+post-collection (no free list, no fragmentation), and the collector
+itself is short — copy + scan, no separate mark and sweep passes.
+
+## Heap layout
+
+Current main-heap region (`HEAP_CAP_BYTES = 0x10000000`, 256 MiB)
+is replaced by two regions of 128 MiB each in BSS, sized via two new
+constants:
+
+```
+%macro SEMISPACE_CAP_BYTES() 0x08000000 %endm   # 128 MiB
+%macro SCHEME_STACK_CAP_BYTES() 0x00100000 %endm   # 1 MiB; see Roots
+```
+
+New global pointer slots (replacing `heap_buf_ptr` /
+`heap_next` / `heap_end`):
+
+| Slot                | Meaning                                            |
+|---------------------|----------------------------------------------------|
+| `space_a_ptr`       | base of semispace A (set by `init_arenas`)         |
+| `space_b_ptr`       | base of semispace B (set by `init_arenas`)         |
+| `from_space_start`  | base of the active from-space                      |
+| `from_space_end`    | end of the active from-space                       |
+| `from_space_next`   | bump pointer into active from-space                |
+| `to_space_start`    | base of the idle to-space                          |
+| `to_space_end`      | end of the idle to-space                           |
+
+`current_heap_next_ptr` / `current_heap_end_ptr` continue to exist
+unchanged. When the current heap is "main," they point at
+`from_space_next` / `from_space_end`. When the current heap is
+"scratch," they point at the scratch slots (unchanged from today).
+After a collection, the slots `from_space_next` /
+`from_space_end` are updated to refer to the *new* from-space,
+and `current_heap_next_ptr` is repointed if main is current.
+
+The scratch heap, the symtab arena, and the readbuf are unaffected.
+
+## Roots
+
+Roots are split into three closed sets (the perm region's pointer
+fields are scanned too; see [Pinned allocations](#pinned-allocations)):
+
+### 1. Symbol table
+
+Every `SYMENT.global_val` slot in the symbol-table BSS array is a
+potential root. Iteration is bounded by `symtab_count`. The table
+itself never moves; entries are pinned and only their `global_val`
+field is forwarded.
+
+### 2. Scheme value stack
+
+A new dedicated stack, allocated in BSS (`SCHEME_STACK_CAP_BYTES`),
+that holds **every Scheme value live across a call site**. The P1
+call stack (`sp`) continues to hold raw machine state — return
+addresses, raw integer locals, scratch — and is *not* scanned.
+
+Two new globals:
+- `scheme_stack_base`  — set at init to the start of the stack region.
+- `scheme_sp`          — current top; grows upward.
+
+A frame on the Scheme stack is a contiguous array of N tagged
+values. A function reserves N slots on entry, accesses them by
+displacement off `scheme_sp`-at-entry (saved as the function's
+"sfp"), and releases them on exit. New macros:
+
+| Macro                  | Effect                                                    |
+|------------------------|-----------------------------------------------------------|
+| `%senter(n)`           | bump `scheme_sp` by `n*8`; save old in current P1 frame   |
+| `%sleave(n)`           | drop `n*8` from `scheme_sp`                                |
+| `%sst(reg, slot)`      | store `reg` into Scheme-frame `slot`                      |
+| `%sld(reg, slot)`      | load Scheme-frame `slot` into `reg`                       |
+| `%spush(reg)`          | push `reg` onto Scheme stack (1-slot bump)                |
+| `%spop(reg)`           | pop one slot into `reg`                                    |
+
+A new `%fn3(name, scheme_locals, raw_locals, body)` form parallels
+`%fn2`: it builds two frames simultaneously, one on the P1 stack
+(raw locals — cursors, byte counts, pointers into BSS) and one on
+the Scheme value stack (every tagged Scheme value). The collector
+walks only the Scheme stack.
+
+Functions that hold zero Scheme values across calls keep using
+`%fn2` / `%fn` and need no Scheme-stack frame at all (e.g. `memcpy`,
+`strlen`, low-level syscall trampolines, the writer's byte
+emitters).
+
+`scheme_sp` itself is the only stack-side root pointer the GC needs;
+the live region is `[scheme_stack_base, scheme_sp)`.
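A toy Python model of the Scheme value stack and its frame macros, assuming slots are counted in words rather than bytes and that `0` is a non-pointer word. Method names mirror `%senter` / `%sleave` / `%sst` / `%sld` / `%spush` / `%spop`, but the class is hypothetical. Zeroing freshly reserved slots is a choice of this sketch, so the collector never sees stale words from an earlier frame. It shows why `scheme_sp` is the only stack-side root the GC needs: every live Scheme value sits below it and can be rewritten in place.

```python
class SchemeStack:
    """Toy model of the Scheme value stack: a flat array of tagged words."""

    def __init__(self, cap_slots):
        self.slots = [0] * cap_slots
        self.sp = 0                       # scheme_sp, counted in slots; grows upward

    def senter(self, n):
        """Reserve n slots; return the frame base (sfp) for the raw P1 frame to keep."""
        if self.sp + n > len(self.slots):
            raise OverflowError("Scheme stack overflow")
        sfp = self.sp
        for i in range(sfp, sfp + n):
            self.slots[i] = 0             # sketch choice: 0 is assumed to be a non-pointer
        self.sp += n
        return sfp

    def sleave(self, n):
        self.sp -= n

    def sst(self, sfp, slot, val):
        self.slots[sfp + slot] = val

    def sld(self, sfp, slot):
        return self.slots[sfp + slot]

    def spush(self, val):
        self.sst(self.senter(1), 0, val)

    def spop(self):
        self.sp -= 1
        return self.slots[self.sp]

    def forward_roots(self, forward):
        """What gc_collect does with [scheme_stack_base, scheme_sp)."""
        for i in range(self.sp):
            self.slots[i] = forward(self.slots[i])
```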
+
+### 3. In-flight argument-passing registers
+
+At every potential GC point (that is, at every allocation),
+arguments and temporaries already live in `a0`–`a3` and `t0`–`t3`
+may hold tagged values. The protocol is: **never call into the
+allocator with a live tagged value held only in a register**. Either
+spill it to the Scheme frame first, or pass it through `a0` / `a1` to
+the allocator, which keeps it alive across a collection by treating
+its inputs as roots (see [Allocation hooks](#allocation-hooks)).
+
+This is a hard discipline; violating it produces a use-after-free
+that stays silent until a collection moves the object. We mitigate by:
+- Treating `a0` and `a1` of the allocator entry points (`cons`,
+  `alloc_hdr`, `alloc_bytes`) as additional roots during a
+  collection that fires from inside the allocator. They get
+  forwarded along with everything else and the allocator returns
+  with the updated values.
+- Forbidding any other register from holding a tagged value across
+  a `%call` to a function that may allocate. (Audit pass; this is
+  already mostly true because of the existing spill discipline.)
+
+## Object headers and tracing
+
+Every `TAG.HEAP` object, without exception, begins with an 8-byte
+header word whose low byte is one of the `HDR` enum values. After
+this change there are no headerless `TAG.HEAP` allocations: the
+existing `alloc_bytes` is replaced by a header-emitting variant.
+(PAIRs, tagged `TAG.PAIR`, stay headerless; see below.)
+
+### New tags
+
+```
+%enum HDR { BV CLOSURE PRIM TD REC MV RAW FWD }
+```
+
+- `HDR.RAW` — opaque byte buffer with no internal tagged refs.
+  Used for BV data, symtab name copies (when those move into the
+  GC heap; they currently live in main but are pinned — see
+  [Pinned allocations](#pinned-allocations)), and any other raw
+  payload that needs to participate in the parsable-heap walk.
+  Header word: `(raw_size_bytes << 8) | HDR.RAW`.
+- `HDR.FWD` — forwarding sentinel (only valid in from-space during
+  a collection). Header word: `(new_addr << 8) | HDR.FWD`. Shifting
+  left by 8 discards only the address's top 8 bits, so no
+  information is lost as long as heap addresses fit in 56 bits, which
+  holds for any address representable in our memory layout.
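A small Python sketch of the two header encodings. It assumes `%enum` numbers the HDR tags from 0 in declaration order and that headers are 64-bit words; both are assumptions, not stated in this spec. `make_fwd` refuses addresses wider than 56 bits, which is exactly when `new_addr << 8` would lose information.

```python
# Assumed numbering: %enum assigns 0, 1, 2, ... in declaration order.
BV, CLOSURE, PRIM, TD, REC, MV, RAW, FWD = range(8)
PAYLOAD_BITS = 56            # 64-bit header word minus the 8-bit type byte


def hdr_type(h):
    return h & 0xFF


def make_fwd(new_addr):
    if new_addr >> PAYLOAD_BITS:
        raise ValueError("address needs more than 56 bits; << 8 would drop bits")
    return (new_addr << 8) | FWD


def fwd_target(h):
    return h >> 8


def make_raw(size_bytes):
    return (size_bytes << 8) | RAW


def raw_size(h):
    return h >> 8
```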
+
+### Per-type trace and size
+
+The collector dispatches on `hdr.low_byte` and produces (a) the
+total size in bytes of the allocation including the 8-byte header,
+and (b) a list of slot offsets containing tagged pointer fields.
+
+| HDR         | Total size                              | Pointer-bearing slots                     |
+|-------------|-----------------------------------------|-------------------------------------------|
+| `BV`        | 24 (hdr, len, cap), data via `BV.data`  | `BV.data` (points to a `RAW` block)       |
+| `CLOSURE`   | 32                                      | `CLOSURE.params`, `CLOSURE.body`, `CLOSURE.env` |
+| `PRIM`      | 24                                      | `PRIM.data` (only if entry is a parameterized prim — see below) |
+| `TD`        | 32                                      | `TD.name`, `TD.fields`                    |
+| `REC`       | `16 + nfields*8` (read `nfields` from `td`) | `td` slot + each field slot           |
+| `MV`        | `8 + count*8` (read `count` from header high bytes) | each value slot                |
+| `RAW`       | `8 + align8(size)` (read `size` from header high bytes) | none                       |
+| `FWD`       | (must not be visited; precondition violation if seen during scan) | n/a |
+
+PAIRs are **not** in this table because they are tagged with
+`TAG.PAIR`, not `TAG.HEAP`. Their layout is fixed:
+`[car | cdr]`, no header word. PAIR copy is special-cased: 16
+bytes, two tagged-pointer slots (car at offset 0, cdr at +8).
+
+Because PAIRs have no header word, the collector also can't store
+a forwarding sentinel at offset 0 the same way it does for HEAP
+objects. Instead we use the convention: for a forwarded PAIR,
+overwrite the **car** slot with the new tagged pointer
+`new_addr | TAG.PAIR` (a self-tagged forward) and set the **cdr**
+slot to a sentinel
+`IMM.UNBOUND`. To detect: a from-space pair is forwarded iff its
+cdr slot equals the `UNBOUND` immediate. (We pick `UNBOUND`
+because it is an immediate that user code never legitimately stores
+into a cdr — it is reserved for "symbol unbound" lookups.)
+
+For PRIM, the `data` slot is *only* a tagged pointer when the prim
+is a parameterized prim (record accessor / mutator / ctor /
+predicate). For plain prims it's zero. Trace logic: read the slot;
+if `tagof != TAG.FIXNUM && tagof != 0`, treat as a pointer.
+Equivalent and simpler: always trace `PRIM.data` — fixnum-tagged
+words and zero will fail the tag check inside the forwarding
+routine and pass through unchanged.
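A Python sketch of `size_of` following the table, using the same assumed HDR numbering (declaration order from 0). For `REC` the caller must supply `nfields`, read from the record's `td`. `pointer_offsets` covers only the types whose pointer slots follow directly from the table (every word after the header); the field offsets for `BV`, `PRIM`, and `TD` are not spelled out here, so the sketch does not guess them.

```python
BV, CLOSURE, PRIM, TD, REC, MV, RAW, FWD = range(8)   # assumed numbering
FIXED_SIZE = {BV: 24, CLOSURE: 32, PRIM: 24, TD: 32}


def align8(n):
    return (n + 7) & ~7


def size_of(hdr, rec_nfields=None):
    """Total bytes including the 8-byte header."""
    t = hdr & 0xFF
    if t in FIXED_SIZE:
        return FIXED_SIZE[t]
    if t == REC:
        return 16 + 8 * rec_nfields            # header + td slot + fields
    if t == MV:
        return 8 + 8 * (hdr >> 8)              # count lives in the high bytes
    if t == RAW:
        return 8 + align8(hdr >> 8)            # byte size lives in the high bytes
    raise ValueError("FWD (or unknown) header seen during scan")


def pointer_offsets(hdr, size):
    t = hdr & 0xFF
    if t in (CLOSURE, REC, MV):
        return list(range(8, size, 8))         # every word after the header
    if t == RAW:
        return []
    raise NotImplementedError("field offsets for BV/PRIM/TD are not given here")
```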
+
+### Allocation hooks
+
+`cons` and `alloc_hdr` and `alloc_bytes` (which now emits a
+`RAW` header) gain an OOM check that triggers a collection
+instead of aborting:
+
+```
+:cons
+  load from_space_next, from_space_end
+  if next + 16 <= end: bump, write, return
+  else:
+    save a0, a1 to a known location (Scheme stack push)
+    call gc_collect
+    pop a0, a1 (now forwarded if they were heap pointers)
+    retry: this time it must succeed, else abort with msg_heap_full
+```
+
+The save-restore around `gc_collect` is exactly the in-flight
+register protocol: the allocator's inputs are spilled onto the
+Scheme stack so they participate as roots during the collection.
+
+## Forwarding
+
+`forward(tagged_ptr)` is the core operation, called by both root
+forwarding and the scan loop:
+
+```
+forward(p):
+  tag = tagof(p)
+  if tag not in {PAIR, HEAP}: return p      # FIXNUM, IMM, SYM: not a heap ref
+  raw = p - tag                             # untag (TAG.PAIR = 1, TAG.HEAP = 3)
+  if raw not in old from-space: return p    # perm / scratch objects never move
+  if tag == PAIR:
+    if cdr(raw) == imm_val(UNBOUND):        # already forwarded
+        return car(raw)                     # holds new tagged ptr
+    new = bump new from-space by 16
+    new[0] = car(raw); new[8] = cdr(raw)
+    car(raw) = new | TAG.PAIR               # write forward
+    cdr(raw) = imm_val(UNBOUND)             # forward marker
+    return car(raw)
+  # HEAP
+  hdr = ld(raw, 0)
+  if (hdr & 0xff) == HDR.FWD:
+      return (hdr >> 8) | TAG.HEAP          # recorded new address, retagged
+  size = size_of(hdr)
+  new = bump new from-space by size
+  memcpy(new, raw, size)
+  st(raw, 0, (new << 8) | HDR.FWD)
+  return new | TAG.HEAP
+
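+The scan loop walks the new from-space object by object, using
+`size_of` to skip over each copied object and calling `forward` on
+each tagged-pointer slot listed by the per-type tracer. Because
+PAIRs carry no header, the scanner also needs a way to recognize a
+copied pair at the scan cursor; this spec does not yet define one.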
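The following Python sketch puts the allocation hook, `forward`, and the scan loop together on a simulated byte-addressed memory. Several details are assumptions made only for the sketch: the tag values (`TAG_PAIR = 1` and `TAG_HEAP = 3`, chosen to match the `p - 1` / `p - 3` untagging, plus a hypothetical immediate tag), the immediate encodings of `NIL` and `UNBOUND`, and a side set of pair addresses that lets the scan loop recognize headerless PAIRs. Only `MV` headed objects are modeled, and the Scheme stack is a plain Python list serving as the only root set.

```python
TAG_MASK = 7
TAG_FIX, TAG_PAIR, TAG_IMM, TAG_HEAP = 0, 1, 2, 3      # hypothetical tag values
NIL, UNBOUND = (0 << 3) | TAG_IMM, (1 << 3) | TAG_IMM  # hypothetical immediates
HDR_MV, HDR_FWD = 5, 7                                 # assumed %enum numbering


def fix(n):
    return (n << 3) | TAG_FIX


class Heap:
    def __init__(self, cap, base=0x10000):
        self.cap, self.mem = cap, {}           # mem: byte address -> word
        self.from_start, self.to_start = base, base + cap
        self.next = self.from_start            # from_space_next
        self.stack = []                        # Scheme value stack: the roots here
        self.pairs = set()                     # sketch-only: addresses of copied pairs

    def ld(self, a):
        return self.mem.get(a, 0)

    def st(self, a, w):
        self.mem[a] = w

    def bump(self, size):
        a = self.next
        if a + size > self.from_start + self.cap:
            return None
        self.next = a + size
        return a

    def alloc(self, size, live):
        """Bump-allocate; on failure spill `live` as roots, collect, reload, retry."""
        a = self.bump(size)
        if a is None:
            n = len(self.stack)
            self.stack.extend(live)
            self.collect()
            live = self.stack[n:]              # forwarded copies of the spilled values
            del self.stack[n:]
            a = self.bump(size)
            if a is None:
                raise MemoryError("msg_heap_full")
        return a, list(live)

    def cons(self, car, cdr):
        a, (car, cdr) = self.alloc(16, [car, cdr])
        self.st(a, car)
        self.st(a + 8, cdr)
        return a | TAG_PAIR

    def make_mv(self, *vals):
        a, vals = self.alloc(8 + 8 * len(vals), vals)
        self.st(a, (len(vals) << 8) | HDR_MV)
        for i, v in enumerate(vals):
            self.st(a + 8 + 8 * i, v)
        return a | TAG_HEAP

    def forward(self, p):
        tag = p & TAG_MASK
        if tag not in (TAG_PAIR, TAG_HEAP):
            return p                           # fixnum / immediate: not a heap ref
        raw = p - tag
        if not self.to_start <= raw < self.to_start + self.cap:
            return p                           # outside the evacuated space: never moves
        if tag == TAG_PAIR:
            if self.ld(raw + 8) == UNBOUND:    # already forwarded: car holds new ptr
                return self.ld(raw)
            new = self.next
            self.next += 16
            self.st(new, self.ld(raw))
            self.st(new + 8, self.ld(raw + 8))
            self.pairs.add(new)
            self.st(raw, new | TAG_PAIR)
            self.st(raw + 8, UNBOUND)
            return new | TAG_PAIR
        hdr = self.ld(raw)
        if hdr & 0xFF == HDR_FWD:
            return (hdr >> 8) | TAG_HEAP
        size = 8 + 8 * (hdr >> 8)              # only MV is modeled here
        new = self.next
        self.next += size
        for off in range(0, size, 8):
            self.st(new + off, self.ld(raw + off))
        self.st(raw, (new << 8) | HDR_FWD)
        return new | TAG_HEAP

    def collect(self):
        # 1. Swap roles: the old from-space becomes the evacuation source.
        self.from_start, self.to_start = self.to_start, self.from_start
        self.next, self.pairs = self.from_start, set()
        # 2. Forward every root.
        self.stack = [self.forward(w) for w in self.stack]
        # 3. Cheney scan: forward pointer slots until scan catches next.
        scan = self.from_start
        while scan < self.next:
            size = 16 if scan in self.pairs else 8 + 8 * (self.ld(scan) >> 8)
            first = 0 if scan in self.pairs else 8   # skip the MV header word
            for off in range(first, size, 8):
                self.st(scan + off, self.forward(self.ld(scan + off)))
            scan += size
```

A rooted pair survives any number of allocation-triggered collections, and two references to the same object still point at one copy afterward, which is what preserves `eq?`.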
+
+## Pinned allocations
+
+Some interpreter-owned allocations must not move:
+- Symtab name buffers (`alloc_bytes_main` from `intern`).
+- TD objects that hold field-name lists (`alloc_hdr_main` /
+  `cons_main` from `eval_define_record_type`).
+- Pre-allocated MACRO objects, special-form name strings, etc.
+
+Today these live in the main heap, distinguished from user
+allocations only by their use of the `*_main` allocator suffix.
+After GC introduction, they need to live in a region the collector
+**doesn't** move. Two options:
+
+1. **Move pinned allocations to a separate "perm" region.** Rename
+   `alloc_*_main` to `alloc_*_perm`, point them at a third BSS arena
+   (perm) sized for ~1 MiB. Collector treats perm as a root region
+   (scans every word, forwards heap pointers) but never moves perm
+   objects. Cleaner.
+2. **Keep pinned allocations in from-space and special-case them
+   during copy.** Tag a "pinned" bit in the header; collector copies
+   the *contents* into to-space conceptually but actually leaves
+   them in place and just forwards their internal references.
+   Complicates the moving invariant.
+
+Recommend option 1. Perm region is small, write-once, scan-only.
+The handful of `*_main` call sites in the interpreter all become
+`*_perm`.
+
+## Collection lifecycle
+
+Init (called from `heap_init`):
+1. Reserve both semispaces and the perm region in BSS.
+2. `from_space = space_a`, `to_space = space_b`,
+   `from_space_next = from_space_start`.
+3. `scheme_sp = scheme_stack_base`.
+
+Trigger:
+- Only on alloc-fail in the from-space. No proactive trigger, no
+  threshold-based trigger.
+- A collection that fails to free enough space for the pending
+  allocation aborts with `msg_heap_full`. (No heap growth.)
+
+Collection body (`gc_collect`):
+1. Swap from/to roles. Set `from_space_next = from_space_start`
+   (the new, empty from-space); copies are bumped from there.
+2. Forward all roots:
+   - Walk `[scheme_stack_base, scheme_sp)`: for each word, replace
+     in place with `forward(word)`.
+   - Walk `[symtab[0], symtab[symtab_count])`: for each entry, replace
+     `global_val` with `forward(global_val)`.
+   - Walk perm region's tagged-pointer fields (via the same
+     header-driven trace dispatch).
+   - Forward the allocator's spilled `a0`/`a1` if a collection
+     fired from inside an allocator.
+3. Scan: cursor walks new from-space until it catches `next`.
+   For each object, forward each pointer-bearing slot per the
+   per-type trace.
+4. Optionally zero the old from-space (debug only — helps catch
+   stale pointers; off by default for speed).
+
+Post-conditions:
+- All live objects copied to the new from-space; no root or
+  new-space slot still points into the old from-space.
+- `from_space_next` reflects the new occupancy.
+- The old from-space is logically free.
+
+## Test posture
+
+A debug build flag (compile-time `%macro GC_DEBUG_FILL_FROM() %endm`)
+that, when defined, fills the old from-space with a poison byte
+after collection. Any surviving raw pointer to a copied object will
+read poisoned bytes on its next dereference and crash visibly.
+
+Test scaffolding (under `tests/scheme1/`):
+- `gc-stress.scm`: allocate ~10 MiB of garbage in a loop with a
+  small live set; assert the heap doesn't grow.
+- `gc-identity.scm`: pre-collection vs post-collection `eq?` on
+  pairs / records / closures should keep its answer.
+- `gc-mutation.scm`: a `set-car!` made before a collection is
+  visible through the forwarded pair afterward.
+- `gc-cons-main.scm`: pinned objects survive collections without
+  changing identity.
+- `gc-during-bv.scm`: trigger collection during `bv-grow`; verify
+  the BV's `RAW` data block is correctly forwarded.
+- `gc-during-parser.scm`: large input program forces collection
+  during parsing; verify reader-built lists end up correct.
+- `gc-during-cc.scm`: run `cc.scm` on a small input end-to-end
+  with an artificially shrunken from-space (e.g. 4 MiB) so
+  collection is exercised at every phase. Output must be
+  byte-identical to the un-shrunken run.
+
+## Migration plan
+
+The change is large enough to land in stages. Each stage is
+independently testable; stages 1–4 are pure refactors that change
+no observable behavior.
+
+1. **Add the perm region.** Introduce `*_perm` allocators and
+   migrate every `*_main` call site to `*_perm`. The "main" name
+   is freed up to mean "GC-managed" later. No semantic change.
+2. **Add HDR.RAW.** Replace `alloc_bytes` with a variant that
+   emits a `RAW` header. Update `bv_alloc`, `bv_grow`, and the
+   single other `alloc_bytes` call site (string allocator) to
+   accept the new offset of the data buffer (now `+8` past the
+   raw allocation start). Every `TAG.HEAP` allocation now carries a header.
+3. **Add the Scheme value stack.** Build the BSS region, the
+   `scheme_sp` global, and the `%senter` / `%sleave` / `%sst` /
+   `%sld` / `%spush` / `%spop` / `%fn3` macros. No callers yet.
+4. **Migrate frames.** Convert every `%fn2` whose locals hold
+   tagged Scheme values to `%fn3`, splitting locals into Scheme
+   vs raw. Order: leaves first (`bind_params`, `eval_args`,
+   `apply_build_args`), then `eval` and the `eval_*` family,
+   then primitives, then the parser's value-producing leaves
+   (`parse_atom`, `parse_string`, `parse_char`, `parse_list`,
+   `parse_u8_body`, `parse_one`), then the writer
+   (`write_to_bv`, `write_pair_to_bv`, `value_to_bv`).
+5. **Wire the collector.** Add `gc_collect`, `forward`,
+   `size_of`, and the per-type trace dispatch. Hook into
+   `cons` / `alloc_hdr` / `alloc_bytes` OOM paths.
+6. **Shrink the heap and validate.** Cut the from-space from
+   128 MiB to 4 MiB temporarily and run the full
+   `tests/scheme1/` and `tests/cc-pp/` suites. Restore.
+
+After stage 4 the interpreter still works exactly as today (Scheme
+values just live in a different stack), so each stage can be
+landed and tested independently. Stage 5 is where new behavior
+appears.
+
+## Open items
+
+- **Pinned cons cells.** `cons_main` produces PAIRs in main; with
+  GC, those PAIRs must instead live in perm. Need to confirm
+  every existing `cons_main` caller is fine with that.
+- **Multi-value packs as roots in mid-flight.** `prim_call_with_values`
+  and friends pass MV-packs through registers. Audit to ensure no
+  MV-pack is held only in a register across an allocation.
+- **`apply` with deep arg lists.** `apply_build_args` already builds
+  the arg list incrementally; verify that the head/tail pointers
+  are spilled to the Scheme stack across each `cons`.
+- **Statistics.** A `(gc-stats)` primitive returning collection
+  count, bytes copied, last pause length. Useful for the test
+  scaffolding and for `cc.scm` self-instrumentation. Defer until
+  after stage 5 lands.

	git clone https://git.ryansepassi.com/git/boot2.git