commit 76fd67990e47b226037ab1da73887d10684e5aad
parent 632d7039ab4ccf96a93577a762fc39c5e188bb13
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Wed, 20 May 2026 09:33:49 -0700
scheme1 explorations
Diffstat:
| A | docs/MACROS.md | | | 372 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | docs/SCHEME1-GC.md | | | 427 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
2 files changed, 799 insertions(+), 0 deletions(-)
diff --git a/docs/MACROS.md b/docs/MACROS.md
@@ -0,0 +1,372 @@
+# Macros (R7RS `syntax-rules`) for scheme1
+
+Plan for adding hygienic `syntax-rules` macros to `scheme1.P1pp`.
+Conforms to R7RS-small with the coverage limits noted in
+[Scope](#scope).
+
+## Architecture
+
+Two-layer split:
+
+- **P1 runtime** (in `scheme1.P1pp`) provides the *minimum* needed to
+ recognize and dispatch macro bindings: a new heap-object type, three
+ special forms (`define-syntax`, `let-syntax`, `letrec-syntax`), an
+ expansion hook in `eval`, and a `gensym` primitive.
+- **Scheme engine** (in `prelude.scm`) implements `syntax-rules` itself
+ as a regular procedure that returns a transformer closure. Pattern
+ matching, template rendering, ellipsis expansion, and gensym-based
+ hygiene all live here.
+
+Rationale: a hygienic matcher/renderer is ~300 lines of dense list
+manipulation. Writing it in Scheme is straightforward and debuggable;
+writing it in P1pp is neither. The runtime cost is one extra closure
+call per macro use — negligible at the scale this interpreter
+operates.
+
+## Scope
+
+Covered in v1:
+
+- `define-syntax`, `let-syntax`, `letrec-syntax`.
+- `syntax-rules` with literals list, single-depth ellipsis (`...`),
+ underscore wildcard (`_`), and improper-tail patterns
+ (`(p1 p2 ... . ptail)`).
+- Hygiene: every template-introduced identifier (not a pattern
+ variable, not a literal) is gensym-renamed per expansion so it
+ cannot shadow a user binding at the use site.
+
+Deferred (not v1):
+
+- Multi-depth ellipsis (a pattern variable appearing under two or more
+ `...`s, producing nested-list captures).
+- Custom ellipsis identifier (`(syntax-rules ::: ...)`).
+- `syntax-case` / `er-macro-transformer` / explicit-renaming variants.
+- Macro-introduced top-level definitions (`define` inside an
+ expansion). Internal `define` is already rejected by scheme1.
+
+## Type additions
+
+### `HDR.MACRO` heap object
+
+Added to the existing `HDR` enum (currently
+`{BV CLOSURE PRIM TD REC MV}`).
+
+Layout (tagged HEAP, 16 bytes):
+
+```
+struct MACRO {
+ hdr ; HDR.MACRO
+ transformer ; tagged closure: (lambda (form) -> form)
+}
+```
+
+The transformer is a Scheme closure built by `syntax-rules` at the
+time `define-syntax` (or `let-syntax`/`letrec-syntax`) is evaluated.
+It takes the entire macro form as a quoted datum (e.g. `'(my-when t
+b1 b2)`) and returns a quoted datum that `eval` then re-evaluates.
+
+Touchpoints for the GC tracer (when GC lands): trace the
+`transformer` slot. For `equal?` / `write` / `display`: render as
+`#<macro>`; not structurally comparable.
+
+## Runtime changes
+
+### Special-form dispatch
+
+Three new entries in the `dispatch_form` table inside `eval`'s pair
+branch:
+
+```
+%dispatch_form(&sym_define_syntax, &.do_define_syntax)
+%dispatch_form(&sym_let_syntax, &.do_let_syntax)
+%dispatch_form(&sym_letrec_syntax, &.do_letrec_syntax)
+```
+
+Cached symbol slots and `intern_special_forms` updated alongside.
+
+### `eval_define_syntax`
+
+```
+(define-syntax name transformer-expr)
+```
+
+1. Evaluate `transformer-expr` in the current env. Must yield a
+ closure (`HDR.CLOSURE`) — error otherwise (`die(msg_bad_syntax)`).
+2. `alloc_hdr(MACRO.SIZE, HDR.MACRO)`; store the closure in the
+ transformer slot.
+3. Bind the symbol's *global* slot to the tagged MACRO. (Same global
+ binding mechanism as `define`.)
+4. Return UNSPEC.
+
+`alloc_hdr_main` is the right allocator: the macro outlives any
+scratch-heap reset, just like `define-record-type`'s TD.
+
+### `eval_let_syntax` / `eval_letrec_syntax`
+
+```
+(let-syntax ((name transformer-expr) ...) body ...)
+(letrec-syntax ((name transformer-expr) ...) body ...)
+```
+
+Both extend the **lexical** env with `(name . macro-obj)` pairs.
+`let-syntax` evaluates each `transformer-expr` in the *outer* env;
+`letrec-syntax` first installs all bindings as placeholders, then
+evaluates each transformer in the new env (allowing mutual
+recursion). Body evaluates in the extended env via `eval_body`.
+
+The lexical env is the existing alist `((sym . val) ...)`. Macro
+values are tagged HEAP objects, so they coexist with regular value
+bindings without changing the alist shape.
+
+### Macro dispatch in `eval`
+
+After special-form dispatch in the pair branch, before the apply
+path: when the head is a symbol, resolve it (lexical env first, then
+global slot). If the resolved value is a `HDR.MACRO`:
+
+1. Build the form: re-cons the head onto the args, *unevaluated* —
+ essentially the original `expr`. Already in hand as `expr`.
+2. Call the transformer closure with this form as its single
+ argument. (Reuse `apply` with a one-element args list.)
+3. Tail-call `eval` on the returned form in the *current* env.
+
+Pseudocode (in P1 terms inside `eval`):
+
+```
+:.maybe_macro
+%hdr_type(t0, resolved)
+%bine(t0, %HDR.MACRO, &.apply_normal, t1)
+%heap_ld(a0, resolved, %MACRO.transformer) ; closure
+%ldl(a1, expr) ; the form (quoted)
+%li(a2, %imm_val(%IMM.NIL))
+%call(&cons) ; args = (form)
+%mov(a1, a0)
+%heap_ld(a0, ..., transformer)
+%call(&apply)
+%mov(a0, ...) ; expanded form
+%ldl(a1, env)
+%tail(&eval)
+```
+
+Ordering: macro check happens *after* the static special-form
+dispatch table, so user code cannot redefine `if`, `lambda`, etc.
+via `define-syntax`. (That's a deliberate restriction; lifting it
+would require turning every special form into a default macro
+binding that the user can override.)
+
+### `gensym` primitive
+
+```
+(gensym) ; -> fresh symbol, e.g. g.0, g.1, ...
+(gensym "tag") ; -> g.tag.N
+```
+
+Implementation: a process-global counter `gensym_counter` in BSS.
+Builds a name by `format`-ing `g.<n>` (or `g.<tag>.<n>`) into a
+scratch buffer, then calls `intern` so the symbol gets a stable
+slot in the symtab. Because `intern` is keyed on the byte string,
+distinct counters always produce distinct symbols.
+
+The interned name can never collide with user identifiers because
+user identifiers cannot contain `.` followed by a digit at the
+exact pattern produced — *except* that scheme1 *does* allow `.` in
+identifiers. Mitigation: prefix with a byte that the reader rejects
+in user identifiers but allows when interning programmatically.
+Cleanest: pick a leading byte outside the reader's identifier
+charset (e.g. `\x01`) so user code cannot construct a colliding
+name through the reader. The byte is invisible in `display` output
+unless the user explicitly writes the symbol — acceptable.
+
+Added to `prim_table`: `gensym`.
+
+## Scheme engine (`syntax-rules` in prelude.scm)
+
+`syntax-rules` is a regular procedure. Its result is a closure of
+the form `(lambda (form) ...)`.
+
+```
+(syntax-rules literals rule ...)
+ ;; literals = list of identifiers
+ ;; rule = (pattern template) where pattern is (head . sub-pattern)
+```
+
+Top-level call shape after macro expansion of the special form:
+
+```
+(define-syntax foo (syntax-rules (lit) ((foo p) t) ...))
+```
+
+`define-syntax`'s argument is just the `(syntax-rules ...)`
+expression. Eval evaluates that expression, getting a closure, then
+wraps it in HDR.MACRO. So `syntax-rules` must be available in the
+prelude before any macro is defined.
+
+### Algorithm — pattern matching
+
+`(match-pattern pat form literals)` returns either `#f` (no match)
+or an alist of `(pattern-var . captured-form-or-list)`.
+
+Pattern dispatch:
+
+| Pattern shape | Match rule |
+|-------------------------|-----------------------------------------------------------|
+| `_` | Match anything; no binding. |
+| `<id>` ∈ literals | Match iff `form` is the same symbol (`eq?`). |
+| `<id>` (otherwise) | Match anything; bind `id` → `form`. |
+| `()` | Match iff `form` is `'()`. |
+| `(p_head . p_rest)` | See ellipsis / improper / proper rules below. |
+| `<literal-datum>` | Match by `equal?` (numbers, strings, booleans, chars). |
+
+Pair-pattern cases:
+
+1. **Ellipsis**: pattern is `(p ... . p_tail)` where `p ...` is the
+ ellipsis element. Greedily consume as many leading elements of
+ `form` as match `p`, collecting per-pattern-var captures into
+ parallel lists. Then match the remaining tail against `p_tail`.
+2. **Improper tail without ellipsis**: pattern is `(p1 ... . p_tail)`
+ (dot before a non-ellipsis tail). Match each `p_i` against
+ `(list-ref form i)`, then match `p_tail` against the dotted rest.
+3. **Proper list**: lengths must match exactly; element-wise
+ recursion.
+
+### Ellipsis capture shape
+
+When `p` (under `...`) contains pattern vars `v1, v2, ...`, after
+consuming `n` elements, each `v_i` is bound to a *list of length n*
+of the values captured at that position. (Empty list when `n = 0`.)
+
+Each pattern-var binding is annotated with its **depth**: 0 for a
+plain capture, 1 for ellipsis-captured. Depth-1 only in v1.
+
+### Algorithm — template rendering
+
+`(render template bindings rename-map)` walks the template and
+produces an output form.
+
+- Symbol that is a depth-0 pattern var → substitute its captured
+ form.
+- Symbol that is a depth-1 pattern var **outside** of an ellipsis
+ context → error (arity mismatch).
+- `(t ... . t_tail)`: for each ellipsis-relevant pattern var
+ appearing in `t` (the sub-template before `...`), look up its
+ list of captures. All such lists must have equal length `n`
+ (else error). Render `t` `n` times, each time with depth-1 vars
+ shadowed by their `i`-th element. Splice the results into the
+ output, then continue with `t_tail`.
+- Symbol that is a literal or free identifier → look up in
+ `rename-map`; emit the renamed symbol if present, else emit
+ unchanged.
+- Pair (non-ellipsis) → recurse into car and cdr.
+- Other (literal datum, `'()`) → emit unchanged.
+
+### Hygiene: identifier renaming
+
+Before each expansion, walk the template and collect every symbol
+that is **not**:
+- a pattern variable in `bindings`, or
+- a literal in the rule's literals list, or
+- a reference to a variable visible at the macro's *use site* via
+ the standard scope chain (we approximate this as: any symbol not
+ introduced by the template is fine to leave alone).
+
+In a single-global-env Scheme, the simpler rule "rename every
+template-only identifier that binds a name" suffices: rename
+identifiers that appear in *binding positions* of constructs the
+template introduces (`lambda` formals, `let` bound names, `define`
+names). Identifiers in operator/operand positions don't need
+renaming because they reference globals or pattern-bound values.
+
+Implementation: walk the template once, collect a set of
+"binding-position" identifiers (this requires a small table of
+which forms bind names: `lambda`, `let`, `let*`, `letrec`,
+`let-values`, `let*-values`, `do`, `define`, `define-record-type`).
+For each, allocate a fresh `gensym` name and substitute every
+occurrence in the template (binding *and* references) consistently.
+
+This is conservative — it sometimes renames identifiers that
+wouldn't have collided — but that's fine, it's still hygienic.
+
+### Putting it together — the transformer closure
+
+```
+(define (syntax-rules literals . rules)
+ (lambda (form)
+ (let loop ((rs rules))
+ (cond
+ ((null? rs)
+ (error "no syntax-rules pattern matched" form))
+ (else
+ (let* ((rule (car rs))
+ (pat (car rule))
+ (tpl (cadr rule))
+ (b (match-pattern pat form literals)))
+ (if b
+ (let ((rmap (build-rename-map tpl literals b)))
+ (render tpl b rmap))
+ (loop (cdr rs)))))))))
+```
+
+`match-pattern`, `build-rename-map`, and `render` are the three
+core helpers; together ~300 lines.
+
+## Files / line budget
+
+| Location | Add |
+|---------------------------------------|------------|
+| `scheme1.P1pp` HDR enum | +1 line |
+| `scheme1.P1pp` MACRO struct | +3 lines |
+| `scheme1.P1pp` `intern_special_forms` | ~8 lines |
+| `scheme1.P1pp` eval dispatch + macro hook | ~30 lines |
+| `scheme1.P1pp` `eval_define_syntax` | ~25 lines |
+| `scheme1.P1pp` `eval_let_syntax` | ~50 lines |
+| `scheme1.P1pp` `eval_letrec_syntax` | ~50 lines |
+| `scheme1.P1pp` `prim_gensym_entry` | ~25 lines |
+| `scheme1.P1pp` writer `#<macro>` case | ~5 lines |
+| `scheme1.P1pp` prim_table + names | +3 lines |
+| `prelude.scm` `syntax-rules` engine | ~350 lines |
+
+Total: ~200 lines P1pp, ~350 lines Scheme.
+
+## Testing
+
+Add a `tests/scheme1/13x-macro-*.scm` series. Minimum coverage:
+
+- `130-macro-basic.scm` — `(define-syntax my-when ...)`, simple
+ fixed-arity expansion.
+- `131-macro-ellipsis.scm` — `my-when` with `body ...`, `my-list`
+ (`(x ...)` → `(list x ...)`), zero-element case.
+- `132-macro-let.scm` — re-implement `let` via `syntax-rules` with
+ inner-shape pattern `((name val) ...)`. Verify against builtin.
+- `133-macro-tail.scm` — improper-tail pattern, e.g. `(_ x . rest)`.
+- `134-macro-literals.scm` — literal `else`-style identifier in a
+ cond-shaped macro.
+- `135-macro-hygiene.scm` — macro that introduces a binding that
+ would shadow a user var; verify the user var still resolves.
+ Classic test: `(define-syntax swap! ...)` using a temp.
+- `136-let-syntax.scm` — local `let-syntax`, including a body that
+ references a same-named global value — must shadow correctly and
+ un-shadow outside the body.
+- `137-letrec-syntax.scm` — two mutually-recursive transformers
+ (rare in practice but spec-required).
+- `138-no-match.scm` — fall-through error path.
+
+Each fixture is a normal scheme1 test (`.scm` + `.expected-exit`),
+runnable via `tests/boot-run-scheme1.sh`.
+
+## Open caveats
+
+- **Top-level `define` inside expansions is rejected.** scheme1
+ rejects internal `define`, and that check fires *after* macro
+ expansion. A macro that expands to `(begin (define x 1) ...)` at
+ the top level works; the same expansion inside a `lambda` body
+ does not. Document, don't fix.
+- **Macros cannot override built-in special forms.** Dispatch
+ checks special-form symbols before macro lookup. Lifting this
+ would require representing the special forms as default
+ bindings.
+- **No source-location tracking.** Errors from inside expansions
+ point at the macro implementation, not the use site. Consistent
+ with scheme1's existing error story.
+- **`equal?` on macros is reference equality only.** Two
+ `syntax-rules` expressions compiled separately are distinct
+ closures. Not specified by R7RS as comparable.
diff --git a/docs/SCHEME1-GC.md b/docs/SCHEME1-GC.md
@@ -0,0 +1,427 @@
+# scheme1 GC
+
+Spec for adding a Cheney-style semispace copying garbage collector to
+`scheme1/scheme1.P1pp`. Replaces the main heap's bump-only allocator
+with a moving collector; leaves the scratch heap and the
+`heap-mark` / `heap-rewind!` family alone.
+
+## Goals & non-goals
+
+Goals:
+- Reclaim unreachable main-heap objects automatically. Programs no
+ longer depend on `heap-mark` / `heap-rewind!` discipline to stay
+ within the 256 MiB main-heap reservation.
+- Preserve every observable semantic of the existing interpreter
+ (`eq?`, mutation through `set-car!` / `set-cdr!` / `bytevector-u8-set!`,
+ closure capture, record identity).
+- Keep the scratch-heap, `heap-mark`, `heap-rewind!`, `use-scratch-heap!`
+ surface working unchanged. `cc.scm` must run with no source change.
+- Roots are unambiguous: every Scheme value live across an allocation
+ point lives in a place the collector explicitly knows how to find.
+
+Non-goals:
+- Generational, incremental, or concurrent collection.
+- Collection of the scratch heap.
+- Reducing the worst-case pause time. (Stop-the-world is fine.)
+- Collection of the symbol table (it stays append-only and pinned;
+ see [Roots](#roots)).
+- Finalizers, weak references, ephemerons.
+
+## Algorithm
+
+Stop-the-world Cheney semispace copy. The 256 MiB main heap is split
+into two equal **semispaces** of 128 MiB each. At any moment one is
+the active *from-space* (allocations bump into it) and the other is
+the idle *to-space*.
+
+A collection:
+1. Swap the role of the two spaces. The old from-space becomes the
+ new to-space's source; the old to-space becomes the new
+ from-space (initially empty, `next == start`).
+2. **Forward roots.** For each root word holding a tagged pointer
+ into the old from-space, copy the pointee into the new
+ from-space and overwrite the root with the new tagged pointer.
+3. **Scan.** Walk the new from-space from the start to the
+ advancing `next` pointer with a cursor `scan`. For each object
+ between `scan` and `next`, forward each of its tagged-pointer
+ fields. `next` advances as new objects are appended;
+ `scan == next` is the termination condition.
+4. The old from-space is now garbage in bulk. No per-object sweep.
+
+Forwarding is in-place: when an object is copied, its old header
+word is overwritten with a forwarding tag carrying the new address
+(see [Forwarding](#forwarding)). A second visit to the same object
+sees the forwarding tag and just reuses the recorded new address.
+
+Why Cheney over mark-sweep-with-free-list (the original ask): the
+existing allocator is bump-and-pointer. Cheney keeps it that way
+post-collection (no free list, no fragmentation), and the collector
+itself is short — copy + scan, no separate mark and sweep passes.
+
+## Heap layout
+
+Current main-heap region (`HEAP_CAP_BYTES = 0x10000000`, 256 MiB)
+is replaced by two regions of 128 MiB each in BSS, sized via two new
+constants:
+
+```
+%macro SEMISPACE_CAP_BYTES() 0x08000000 %endm # 128 MiB
+%macro SCHEME_STACK_CAP_BYTES() 0x00100000 %endm # 1 MiB; see Roots
+```
+
+New global pointer slots (replacing `heap_buf_ptr` /
+`heap_next` / `heap_end`):
+
+| Slot | Meaning |
+|---------------------|----------------------------------------------------|
+| `space_a_ptr` | base of semispace A (set by `init_arenas`) |
+| `space_b_ptr` | base of semispace B (set by `init_arenas`) |
+| `from_space_start` | base of the active from-space |
+| `from_space_end` | end of the active from-space |
+| `from_space_next` | bump pointer into active from-space |
+| `to_space_start` | base of the idle to-space |
+| `to_space_end` | end of the idle to-space |
+
+`current_heap_next_ptr` / `current_heap_end_ptr` continue to exist
+unchanged. When the current heap is "main," they point at
+`from_space_next` / `from_space_end`. When the current heap is
+"scratch," they point at the scratch slots (unchanged from today).
+After a collection, the slots `from_space_next` /
+`from_space_end` are updated to refer to the *new* from-space,
+and `current_heap_next_ptr` is repointed if main is current.
+
+The scratch heap, the symtab arena, and the readbuf are unaffected.
+
+## Roots
+
+Roots are split into three closed sets:
+
+### 1. Symbol table
+
+Every `SYMENT.global_val` slot in the symbol-table BSS array is a
+potential root. Iteration is bounded by `symtab_count`. The table
+itself never moves; entries are pinned and only their `global_val`
+field is forwarded.
+
+### 2. Scheme value stack
+
+A new dedicated stack, allocated in BSS (`SCHEME_STACK_CAP_BYTES`),
+that holds **every Scheme value live across a call site**. The P1
+call stack (`sp`) continues to hold raw machine state — return
+addresses, raw integer locals, scratch — and is *not* scanned.
+
+Two new globals:
+- `scheme_stack_base` — set at init to the start of the stack region.
+- `scheme_sp` — current top; grows upward.
+
+A frame on the Scheme stack is a contiguous array of N tagged
+values. A function reserves N slots on entry, accesses them by
+displacement off `scheme_sp`-at-entry (saved as the function's
+"sfp"), and releases them on exit. New macros:
+
+| Macro | Effect |
+|------------------------|-----------------------------------------------------------|
+| `%senter(n)` | bump `scheme_sp` by `n*8`; save old in current P1 frame |
+| `%sleave(n)` | drop `n*8` from `scheme_sp` |
+| `%sst(reg, slot)` | store `reg` into Scheme-frame `slot` |
+| `%sld(reg, slot)` | load Scheme-frame `slot` into `reg` |
+| `%spush(reg)` | push `reg` onto Scheme stack (1-slot bump) |
+| `%spop(reg)` | pop one slot into `reg` |
+
+A new `%fn3(name, scheme_locals, raw_locals, body)` form parallels
+`%fn2`: it builds two frames simultaneously, one on the P1 stack
+(raw locals — cursors, byte counts, pointers into BSS) and one on
+the Scheme value stack (every tagged Scheme value). The collector
+walks only the Scheme stack.
+
+Functions that hold zero Scheme values across calls keep using
+`%fn2` / `%fn` and need no Scheme-stack frame at all (e.g. `memcpy`,
+`strlen`, low-level syscall trampolines, the writer's byte
+emitters).
+
+`scheme_sp` itself is the only stack-side root pointer the GC needs;
+the live region is `[scheme_stack_base, scheme_sp)`.
+
+### 3. In-flight argument-passing registers
+
+At every potential GC point — that is, at every allocation —
+arguments / temporaries already-live in `a0`–`a3` and `t0`–`t3` may
+hold tagged values. The protocol is: **never call into the
+allocator with a live tagged value held only in a register**. Either
+spill to the Scheme frame first, or pass it through `a0` / `a1` to
+the allocator (which the allocator itself preserves across
+collection by treating its inputs as roots — see
+[Allocation hooks](#allocation-hooks)).
+
+This is a hard discipline; violating it produces use-after-free that
+silently survives until the next collection. We mitigate by:
+- Treating `a0` and `a1` of the allocator entry points (`cons`,
+ `alloc_hdr`, `alloc_bytes`) as additional roots during a
+ collection that fires from inside the allocator. They get
+ forwarded along with everything else and the allocator returns
+ with the updated values.
+- Forbidding any other register from holding a tagged value across
+ a `%call` to a function that may allocate. (Audit pass; this is
+ already mostly true because of the existing spill discipline.)
+
+## Object headers and tracing
+
+Every heap object — without exception — begins with an 8-byte
+header word whose low byte is one of the `HDR` enum values. After
+this change there are no headerless allocations: the existing
+`alloc_bytes` is replaced by a header-emitting variant.
+
+### New tags
+
+```
+%enum HDR { BV CLOSURE PRIM TD REC MV RAW FWD }
+```
+
+- `HDR.RAW` — opaque byte buffer with no internal tagged refs.
+ Used for BV data, symtab name copies (when those move into the
+ GC heap; they currently live in main but are pinned — see
+ [Pinned allocations](#pinned-allocations)), and any other raw
+ payload that needs to participate in the parsable-heap walk.
+ Header word: `(raw_size_bytes << 8) | HDR.RAW`.
+- `HDR.FWD` — forwarding sentinel (only valid in from-space during
+ a collection). Header word: `(new_addr << 8) | HDR.FWD`. Because
+ every heap object is 8-byte aligned, `new_addr` shifted left 8
+ loses no information for any address representable in our memory
+ layout.
+
+### Per-type trace and size
+
+The collector dispatches on `hdr.low_byte` and produces (a) the
+total size in bytes of the allocation including the 8-byte header,
+and (b) a list of slot offsets containing tagged pointer fields.
+
+| HDR | Total size | Pointer-bearing slots |
+|-------------|-----------------------------------------|-------------------------------------------|
+| `BV` | 24 (hdr, len, cap), data via `BV.data` | `BV.data` (points to a `RAW` block) |
+| `CLOSURE` | 32 | `CLOSURE.params`, `CLOSURE.body`, `CLOSURE.env` |
+| `PRIM` | 24 | `PRIM.data` (only if entry is a parameterized prim — see below) |
+| `TD` | 32 | `TD.name`, `TD.fields` |
+| `REC` | `16 + nfields*8` (read `nfields` from `td`) | `td` slot + each field slot |
+| `MV` | `8 + count*8` (read `count` from header high bytes) | each value slot |
+| `RAW` | `8 + align8(size)` (read `size` from header high bytes) | none |
+| `FWD` | (must not be visited; precondition violation if seen during scan) | n/a |
+
+PAIRs are **not** in this table because they are tagged with
+`TAG.PAIR`, not `TAG.HEAP`. Their layout is fixed:
+`[car | cdr]`, no header byte. PAIR copy is special-cased: 16
+bytes, two tagged-pointer slots (car at offset 0, cdr at +8).
+
+Because PAIRs have no header byte, the collector also can't store
+a forwarding sentinel at offset 0 the same way it does for HEAP
+objects. Instead we use the convention: for a forwarded PAIR,
+overwrite the **car** slot with `(new_addr << 3) | TAG.PAIR` (a
+self-tagged forward) and set the **cdr** slot to a sentinel
+`IMM.UNBOUND`. To detect: a from-space pair is forwarded iff its
+cdr slot equals the `UNBOUND` immediate. (We pick `UNBOUND`
+because it is an immediate that user code never legitimately stores
+into a cdr — it is reserved for "symbol unbound" lookups.)
+
+For PRIM, the `data` slot is *only* a tagged pointer when the prim
+is a parameterized prim (record accessor / mutator / ctor /
+predicate). For plain prims it's zero. Trace logic: read the slot;
+if `tagof != TAG.FIXNUM && tagof != 0`, treat as a pointer.
+Equivalent and simpler: always trace `PRIM.data` — fixnum-tagged
+words and zero will fail the tag check inside the forwarding
+routine and pass through unchanged.
+
+### Allocation hooks
+
+`cons` and `alloc_hdr` and `alloc_bytes` (which now emits a
+`RAW` header) gain an OOM check that triggers a collection
+instead of aborting:
+
+```
+:cons
+ load from_space_next, from_space_end
+ if next + 16 <= end: bump, write, return
+ else:
+ save a0, a1 to a known location (Scheme stack push)
+ call gc_collect
+ pop a0, a1 (now forwarded if they were heap pointers)
+ retry: this time it must succeed, else abort with msg_heap_full
+```
+
+The save-restore around `gc_collect` is exactly the in-flight
+register protocol: the allocator's inputs are spilled onto the
+Scheme stack so they participate as roots during the collection.
+
+## Forwarding
+
+`forward(tagged_ptr)` is the core operation, called by both root
+forwarding and the scan loop:
+
+```
+case tagof(p):
+ FIXNUM, IMM, SYM: return p unchanged # not a heap ref
+ PAIR:
+ raw = p - 1
+ if cdr(raw) == imm_val(UNBOUND): # already forwarded
+ return car(raw) # holds new tagged ptr
+ new = bump to-space by 16
+ new[0] = car(raw); new[8] = cdr(raw)
+ car(raw) = (new << 3) | TAG.PAIR # write forward
+ cdr(raw) = imm_val(UNBOUND) # forward marker
+ return car(raw)
+ HEAP:
+ raw = p - 3
+ hdr = ld(raw, 0)
+ if (hdr & 0xff) == HDR.FWD:
+ return ((hdr >> 8) | TAG.HEAP-untagged-arith) # extract new
+ size = size_of(hdr)
+ new = bump to-space by size
+ memcpy(new, raw, size)
+ st(raw, 0, (new << 8) | HDR.FWD)
+ return new + 3
+```
+
+The scan loop walks to-space byte-by-byte using `size_of` to skip
+over each copied object, calling `forward` on each tagged-pointer
+slot listed by the per-type tracer.
+
+## Pinned allocations
+
+Some interpreter-owned allocations must not move:
+- Symtab name buffers (`alloc_bytes_main` from `intern`).
+- TD objects that hold field-name lists (`alloc_hdr_main` /
+ `cons_main` from `eval_define_record_type`).
+- Pre-allocated MACRO objects, special-form name strings, etc.
+
+Today these live in the main heap, distinguished from user
+allocations only by their use of the `*_main` allocator suffix.
+After GC introduction, they need to live in a region the collector
+**doesn't** sweep. Two options:
+
+1. **Move pinned allocations to a separate "perm" region.** Rename
+ `alloc_*_main` to `alloc_*_perm`, point them at a third BSS arena
+ (perm) sized for ~1 MiB. Collector treats perm as a root region
+ (scans every word, forwards heap pointers) but never moves perm
+ objects. Cleaner.
+2. **Keep pinned allocations in from-space and special-case them
+ during copy.** Tag a "pinned" bit in the header; collector copies
+ the *contents* into to-space conceptually but actually leaves
+ them in place and just forwards their internal references.
+ Complicates the moving invariant.
+
+Recommend option 1. Perm region is small, write-once, scan-only.
+The handful of `*_main` call sites in the interpreter all become
+`*_perm`.
+
+## Collection lifecycle
+
+Init (called from `heap_init`):
+1. Reserve both semispaces and the perm region in BSS.
+2. `from_space = space_a`, `to_space = space_b`,
+ `from_space_next = from_space_start`.
+3. `scheme_sp = scheme_stack_base`.
+
+Trigger:
+- Only on alloc-fail in the from-space. No proactive trigger, no
+ threshold-based trigger.
+- A collection that fails to free enough space for the pending
+ allocation aborts with `msg_heap_full`. (No heap growth.)
+
+Collection body (`gc_collect`):
+1. Swap from/to roles. Set `to_space_next = to_space_start`.
+2. Forward all roots:
+ - Walk `[scheme_stack_base, scheme_sp)`: for each word, replace
+ in place with `forward(word)`.
+ - Walk `[symtab[0], symtab[symtab_count])`: for each entry, replace
+ `global_val` with `forward(global_val)`.
+ - Walk perm region's tagged-pointer fields (via the same
+ header-driven trace dispatch).
+ - Forward the allocator's spilled `a0`/`a1` if a collection
+ fired from inside an allocator.
+3. Scan: cursor walks new from-space until it catches `next`.
+ For each object, forward each pointer-bearing slot per the
+ per-type trace.
+4. Optionally zero the old from-space (debug only — helps catch
+ stale pointers; off by default for speed).
+
+Post-conditions:
+- All live objects copied to the new from-space; all forwarding
+ pointers consumed.
+- `from_space_next` reflects the new occupancy.
+- The old from-space is logically free.
+
+## Test posture
+
+A debug build flag (compile-time `%macro GC_DEBUG_FILL_FROM() %endm`)
+that, when defined, fills the old from-space with a poison byte
+after collection. Any surviving raw pointer to a copied object will
+read poisoned bytes on its next dereference and crash visibly.
+
+Test scaffolding (under `tests/scheme1/`):
+- `gc-stress.scm`: allocate ~10 MiB of garbage in a loop with a
+ small live set; assert the heap doesn't grow.
+- `gc-identity.scm`: pre-collection vs post-collection `eq?` on
+ pairs / records / closures should keep its answer.
+- `gc-mutation.scm`: `set-car!` on a pair forwarded mid-collection
+ is observed correctly.
+- `gc-cons-main.scm`: pinned objects survive collections without
+ changing identity.
+- `gc-during-bv.scm`: trigger collection during `bv-grow`; verify
+ the BV's `RAW` data block is correctly forwarded.
+- `gc-during-parser.scm`: large input program forces collection
+ during parsing; verify reader-built lists end up correct.
+- `gc-during-cc.scm`: run `cc.scm` on a small input end-to-end
+ with an artificially shrunken from-space (e.g. 4 MiB) so
+ collection is exercised at every phase. Output must be
+ byte-identical to the un-shrunken run.
+
+## Migration plan
+
+The change is large enough to land in stages. Each stage is
+independently testable; stages 1–4 are pure refactors that change
+no observable behavior.
+
+1. **Add the perm region.** Introduce `*_perm` allocators and
+ migrate every `*_main` call site to `*_perm`. The "main" name
+ is freed up to mean "GC-managed" later. No semantic change.
+2. **Add HDR.RAW.** Replace `alloc_bytes` with a variant that
+ emits a `RAW` header. Update `bv_alloc`, `bv_grow`, and the
+ single other `alloc_bytes` call site (string allocator) to
+ accept the new offset of the data buffer (now `+8` past the
+ raw allocation start). Heap is now parsable.
+3. **Add the Scheme value stack.** Build the BSS region, the
+ `scheme_sp` global, and the `%senter` / `%sleave` / `%sst` /
+ `%sld` / `%spush` / `%spop` / `%fn3` macros. No callers yet.
+4. **Migrate frames.** Convert every `%fn2` whose locals hold
+ tagged Scheme values to `%fn3`, splitting locals into Scheme
+ vs raw. Order: leaves first (`bind_params`, `eval_args`,
+ `apply_build_args`), then `eval` and the `eval_*` family,
+ then primitives, then the parser's value-producing leaves
+ (`parse_atom`, `parse_string`, `parse_char`, `parse_list`,
+ `parse_u8_body`, `parse_one`), then the writer
+ (`write_to_bv`, `write_pair_to_bv`, `value_to_bv`).
+5. **Wire the collector.** Add `gc_collect`, `forward`,
+ `size_of`, and the per-type trace dispatch. Hook into
+ `cons` / `alloc_hdr` / `alloc_bytes` OOM paths.
+6. **Shrink the heap and validate.** Cut the from-space from
+ 128 MiB to 4 MiB temporarily and run the full
+ `tests/scheme1/` and `tests/cc-pp/` suites. Restore.
+
+After stage 4 the interpreter still works exactly as today (Scheme
+values just live in a different stack), so each stage can be
+landed and tested independently. Stage 5 is where new behavior
+appears.
+
+## Open items
+
+- **Pinned cons cells.** `cons_main` produces PAIRs in main; with
+ GC, those PAIRs must instead live in perm. Need to confirm
+ every existing `cons_main` caller is fine with that.
+- **Multi-value packs as roots in mid-flight.** `prim_call_with_values`
+ and friends pass MV-packs through registers. Audit to ensure no
+ MV-pack is held only in a register across an allocation.
+- **`apply` with deep arg lists.** `apply_build_args` already builds
+ the arg list incrementally; verify that the head/tail pointers
+ are spilled to the Scheme stack across each `cons`.
+- **Statistics.** A `(gc-stats)` primitive returning collection
+ count, bytes copied, last pause length. Useful for the test
+ scaffolding and for `cc.scm` self-instrumentation. Defer until
+ after stage 5 lands.