commit 83b6da62fabbdda3a189d49eba611679a4781546
parent 9d8abb8d6a2cd646c041021c86a3337af964f64c
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 27 Apr 2026 14:03:58 -0700
docs rm old
Diffstat:
| D | docs/CC-CONTRACTS.md | | | 578 | ------------------------------------------------------------------------------ |
| D | docs/CC-INTERNALS.md | | | 764 | ------------------------------------------------------------------------------- |
| D | docs/CC-PUNCHLIST.md | | | 464 | ------------------------------------------------------------------------------- |
| D | docs/LISP-C.md | | | 1105 | ------------------------------------------------------------------------------- |
| D | docs/LISP-PMATCH.md | | | 259 | ------------------------------------------------------------------------------- |
| D | docs/SCHEME1-R7RS-TODO.md | | | 118 | ------------------------------------------------------------------------------- |
6 files changed, 0 insertions(+), 3288 deletions(-)
diff --git a/docs/CC-CONTRACTS.md b/docs/CC-CONTRACTS.md
@@ -1,578 +0,0 @@
-# lispcc contracts
-
-Frozen interfaces between modules. Engineers must not diverge from
-these without proposing a change. This document is the source of truth
-for the symbol alphabets, test formats, ABI, and phase-1 milestone
-referenced from [CC-INTERNALS.md](CC-INTERNALS.md).
-
-## 1. Symbol alphabets
-
-Every record's `kind`-style fields use these exact symbols. Adding a
-symbol = updating this section first.
-
-### 1.1 `tok-kind`
-
-```
-IDENT KW INT STR CHAR PUNCT HASH NL EOF
-```
-
-Uppercase to distinguish from value-level symbols. `IDENT` carries an
-unrecognized identifier; `KW` is one of the symbols in §1.3.
-
-### 1.2 `PUNCT` value symbols
-
-The lexer produces `tok-value` symbols for punctuators per the
-following table. **Names are mandatory** — no engineer may use the raw
-`'+`, `'*`, etc. as symbols, because several C punctuator characters
-(`%`, `|`, `,`, `;`, `(`, `)`, `{`, `}`, `[`, `]`, `.`, `#`) cannot
-form valid scheme1 symbols. We use named symbols for *all* punctuators
-to keep the scheme uniform.
-
-| C | Symbol | C | Symbol |
-|---------|-------------|----------|------------|
-| `[` | `lbrack` | `==` | `eq2` |
-| `]` | `rbrack` | `!=` | `ne` |
-| `(` | `lparen` | `<` | `lt` |
-| `)` | `rparen` | `>` | `gt` |
-| `{` | `lbrace` | `<=` | `le` |
-| `}` | `rbrace` | `>=` | `ge` |
-| `.` | `dot` | `<<` | `shl` |
-| `->` | `arrow` | `>>` | `shr` |
-| `,` | `comma` | `&&` | `land` |
-| `;` | `semi` | `\|\|` | `lor` |
-| `:` | `colon` | `&` | `amp` |
-| `?` | `qmark` | `\|` | `bar` |
-| `...` | `ellipsis` | `^` | `caret` |
-| `++` | `inc` | `~` | `tilde` |
-| `--` | `dec` | `!` | `bang` |
-| `+` | `plus` | `=` | `assign` |
-| `-` | `minus` | `+=` | `plus-eq` |
-| `*` | `star` | `-=` | `minus-eq` |
-| `/` | `slash` | `*=` | `star-eq` |
-| `%` | `pct` | `/=` | `slash-eq` |
-| `#` | `hash` | `%=` | `pct-eq` |
-| `##` | `paste` | `<<=` | `shl-eq` |
-| | | `>>=` | `shr-eq` |
-| | | `&=` | `amp-eq` |
-| | | `^=` | `caret-eq` |
-| | | `\|=` | `bar-eq` |
-
-Digraphs (`<:` `:>` `<%` `%>` `%:` `%:%:`) lex as their standard
-equivalents: same symbol on the right-hand side of the table.
-
-### 1.3 `KW` value symbols
-
-```
-;; storage
-auto register static extern typedef
-;; qualifiers (parsed and discarded)
-const volatile restrict inline
-;; type specifiers
-void char short int long signed unsigned _Bool
-;; rejected type specifiers (lexed as KW so we get clean diagnostics)
-float double
-;; aggregates
-struct union enum
-;; statements
-if else while do for switch case default break continue return goto
-;; operators
-sizeof
-;; reserved-and-rejected (lexed as KW so we error crisply)
-_Generic _Atomic _Thread_local _Alignof _Alignas _Static_assert
-_Complex _Imaginary
-```
-
-Anything matching the C identifier grammar that is **not** in this
-list lexes as `IDENT`.
-
-### 1.4 `ctype-kind`
-
-```
-void i8 u8 i16 u16 i32 u32 i64 u64 bool
-ptr arr fn struct union enum
-```
-
-Char is `i8` (`signed char`) or `u8` (`unsigned char`); `char` itself
-is `i8` (we treat plain `char` as signed, matching MesCC and most
-compilers). Long and long long collapse to `i64`/`u64` on P1-64.
-
-### 1.5 `opnd-kind`
-
-```
-imm frame global reg
-```
-
-`reg` is transient — used for the result of a call before it spills
-to a frame slot. The vstack itself never holds `reg` opnds; cg
-materializes through `reg` only inside a single emission step.
-
-### 1.6 `macro-kind`
-
-```
-obj fn fn-vararg
-```
-
-### 1.7 `sym-kind`
-
-```
-var fn typedef enum-const param label
-```
-
-### 1.8 `sym-storage`
-
-```
-auto static extern register
-```
-
-`#f` for `typedef`, `enum-const`, and `label` symbols (storage
-class doesn't apply).
-
-### 1.9 `loop-ctx-kind`
-
-```
-while do for switch
-```
-
-### 1.10 `reg` opnd register names
-
-```
-a0 a1 a2 a3
-```
-
-Only argument registers. Saved registers (`s0`..`s3`) and temporaries
-(`t0`..`t2`) are cg-private; never exposed as opnd payload.
-
-### 1.11 `cg-binop` and `cg-unop` operator symbols
-
-```
-;; cg-binop:
-add sub mul div rem
-and or xor shl shr
-eq ne lt le gt ge
-
-;; cg-unop:
-neg ;; arithmetic negate
-bnot ;; bitwise complement (~)
-lnot ;; logical not (!)
-```
-
-These are abstract operations independent of source-level PUNCT
-symbols; the parser maps PUNCT → cg-op (e.g., `'plus` → `'add`,
-`'eq2` → `'eq`).
-
-## 2. Test serialization formats
-
-Two test shapes coexist:
-
-- **Pure-transformation suites** (`cc-lex`, `cc-pp`) byte-diff a
- Scheme-readable serialization (§2.1).
-- **Codegen / language suites** (`cc-cg`, `cc`)
- compile-and-run the emitted program and assert runtime behavior
- (§2.2). No P1pp-text goldens.
-
-### 2.1 Token line format
-
-One token per line, as a Scheme list:
-
-```
-(KIND VALUE FILE LINE COL)
-```
-
-- **KIND**: bare symbol from §1.1.
-- **VALUE** rendering depends on KIND:
- - `IDENT`, `STR`: bytevector literal `"..."` with `\n \t \r \\ \"`
- escapes; non-ASCII bytes as `\xNN`.
- - `INT`, `CHAR`: decimal integer.
- - `KW`, `PUNCT`: bare symbol from §1.2 / §1.3.
- - `HASH`, `NL`, `EOF`: `#f`.
-- **FILE**: bytevector literal.
-- **LINE**, **COL**: decimal integers (1-based).
-
-The `tok-hide` field is **not** serialized — it is implementation
-detail of the preprocessor.
-
-Example for `int main() { return 0; }` in `t.c`:
-
-```
-(KW int "t.c" 1 1)
-(IDENT "main" "t.c" 1 5)
-(PUNCT lparen "t.c" 1 9)
-(PUNCT rparen "t.c" 1 10)
-(PUNCT lbrace "t.c" 1 12)
-(KW return "t.c" 1 14)
-(INT 0 "t.c" 1 21)
-(PUNCT semi "t.c" 1 22)
-(PUNCT rbrace "t.c" 1 24)
-(EOF #f "t.c" 1 25)
-```
-
-Trailing whitespace and `;`-comments in the golden file are ignored.
-
-### 2.2 Runtime fixture format
-
-Every cc-cg / cc fixture compiles to a runnable program
-whose runtime behavior is the assertion. Two sibling files describe
-the expectation:
-
-```
-<name>.expected-exit # decimal integer; default 0 if absent
-<name>.expected # exact stdout match; default empty if absent
-```
-
-Stdout and stderr are merged in the runner. The harness exits non-zero
-if either expectation fails.
-
-Fixture-source conventions:
-
-- **cc-cg** (`<name>.scm`): drives the cg API directly, ending in
- `(write-bv-fd 1 (cg-finish cg))`. The fixture must define every
- symbol it references — `cg-call`s into externs are out of scope for
- this suite (no libc linkage).
-- **cc** (`<name>.c`): a complete C translation unit including a
- `main` that returns the asserted value. The full compiler driver
- (`cc/cc.scm`: lex+pp+parse+cg) runs against it. Holds both
- feature-targeted drills and full-envelope scenarios; once multi-TU
- / libc linkage land, those fixtures live here too.
-
-Negative tests (compiler is supposed to die) set `expected-exit` to a
-non-zero value and may rely on the diagnostic-prefix check in §2.3
-rather than asserting exact stdout.
-
-### 2.3 Diagnostic format
-
-Already canonical from CC-INTERNALS:
-
-```
-<file>:<line>:<col>: error: <msg>: <irritants...>
-```
-
-Tests for failure paths verify:
-- exit status is 1
-- stderr contains the expected `<file>:<line>:<col>: error:` prefix
-
-The `<msg>` body is **not** matched character-for-character (so we
-can refine wording without breaking tests); only the prefix and a
-keyword-substring of the engineer's choice.
-
-## 3. Frame layout / parameter ABI
-
-### 3.1 cg-fn-begin contract
-
-```scheme
-(cg-fn-begin cg name params return-type) -> param-syms
-;; name: bv (un-mangled C identifier)
-;; params: list of (name-bv . ctype)
-;; return-type: ctype
-;; param-syms: alist (name-bv . sym), each sym already bound to a frame slot
-```
-
-Inside `cg-fn-begin`, cg:
-
-1. Allocates one frame slot per parameter via `cg-alloc-slot`. Slot
- width = `ctype-size` rounded up to 8 (`align-up`); align = 8.
- (Yes, every param costs at least 8 bytes. P1-64 frame is
- word-stride; we don't pack.)
-2. Begins emitting into `cg-fn-buf`. Does **not** yet emit the
- prologue — that's deferred to `cg-fn-end` once `frame-hi` is final.
-3. Emits the param-spill code into a "prologue prefix" buffer
- (private to cg): for params 0..3, `ST aN, [sp + slotN]`; for
- params 4+, `LDARG t0, K` then `ST t0, [sp + slotK]`.
-4. Returns the param-sym alist. Parser binds these into the function-
- body scope.
-
-### 3.2 cg-fn-end contract
-
-```scheme
-(cg-fn-end cg)
-```
-
-cg:
-
-1. Reads final `frame-hi` (highest byte allocated).
-2. Emits the per-function preamble (an M1pp `%struct` is **not**
- used — slots are numeric byte offsets, baked into the body text
- already buffered in `fn-buf`).
-3. Wraps the prologue-prefix + fn-buf inside a libp1pp `%fn` macro:
-
- ```
- %fn(<mangled-name>, <frame-hi-aligned-up-to-16>, {
- <prologue-prefix bytes>
- <fn-buf bytes>
- ::ret
- LD a0, [sp + <return-slot>]
- ; LD a1, [sp + <return-slot> + 8] when ret-type size > 8
- })
- ```
-4. Flushes the result into `cg-text`, clears `fn-buf` and the
- prologue-prefix buffer, resets `vstack`, `frame-hi`, and the
- function-local label counter.
-
-The frame size is rounded up to 16 to satisfy the P1 stack-align
-contract.
-
-The return slot itself is sized to `max(8, ctype-size(ret-type))`
-rounded up to 8, 8-byte aligned. Slot width is what dispatches the
-load epilogue across the three result conventions defined in
-[P1.md §Arguments and return values](P1.md#arguments-and-return-values):
-
-| Width | Convention | Epilogue |
-|---|---|---|
-| ≤ 8B | one-word direct | `LD a0, [sp + slot]` |
-| 9–16B | two-word direct | `LD a0, [sp + slot]; LD a1, [sp + slot + 8]` |
-| > 16B | indirect-result | `LD a0, [sp + sret-slot]` (reload caller's buffer ptr) |
-
-For the indirect-result convention (struct/union return > 16B), the
-caller materializes a fresh receive slot's address in `a0` before the
-call; explicit args 0..2 shift to `a1..a3` and args 3+ stage to
-incoming stack-arg slots 0+. The callee's prologue spills the
-incoming `a0` to a private sret-slot; `cg-return` copies the source
-struct's bytes through that saved pointer; the epilogue reloads
-`a0` from the sret-slot so the convention's "a0 unchanged on
-return" holds even if the body clobbered `a0`.
-
-The cg lowers all three conventions; the parser surface (return
-statement, struct-typed call result) is identical regardless of
-which convention applies.
-
-### 3.3 Loop tag protocol
-
-```scheme
-(cg-loop cg head-thunk body-thunk) -> tag
-```
-
-`cg-loop` allocates a fresh per-function tag (`L0`, `L1`, …), emits
-the libp1pp `%loop_tag(<tag>, { … })` wrapper, runs `head-thunk`
-inside the loop head (it is expected to leave the condition opnd on
-the vstack), pops the condition and emits an `%if_eqz(t0, %break(tag))`,
-then invokes `body-thunk` **with the tag as its single argument**:
-
-```scheme
-(cg-loop cg
- (lambda () (cg-push-imm cg %t-i32 1))
- (lambda (tag)
- (cg-continue cg tag)
- (cg-break cg tag)))
-```
-
-The parser uses the same tag for any `cg-break` / `cg-continue` calls
-made during body emission. `cg-loop` returns the tag to its caller as
-well, so post-loop teardown code may reference it; `cg-loop-end` is a
-no-op kept for symmetry.
-
-Switch dispatch follows the same pattern: `cg-switch-begin` returns a
-`swctx` whose `swctx-end-tag` accessor exposes cg's break-target tag
-to the parser.
-
-### 3.4 Outgoing-arg staging
-
-When `cg-call` is asked to emit a call with arity > 4, it stages
-args 4..(N-1) into the *low-addressed* prefix of the current frame
-at `[sp + 0*8]`, `[sp + 1*8]`, etc., per LIBP1PP.md §Frame locals.
-cg tracks the maximum staging count seen across the function and
-reserves that prefix at fn-end before any other slots — i.e.,
-`cg-alloc-slot`'s first allocation comes *after* the staging area.
-
-The accounting is internal to cg. Parse never sees staging slots.
-
-### 3.5 cg-alloc-slot contract
-
-```scheme
-(cg-alloc-slot cg bytes align) -> offset
-```
-
-- `bytes` = total size needed (e.g., 4 for `int`, 40 for `int[10]`,
- `sizeof(struct foo)` for a struct).
-- `align` = required alignment (1, 2, 4, or 8).
-- Returns numeric byte offset relative to `sp` post-`%enter`.
-
-cg first aligns `frame-hi` up to `align`, returns that as the
-offset, then bumps `frame-hi` by `bytes`. Slots are not reused
-across scopes (we're optimizing for compiler simplicity, not frame
-size). Local arrays and structs request their full size in one call.
-
-## 4. Conversion responsibility
-
-The parser drives type semantics; cg is type-aware only enough to
-choose signed-vs-unsigned variants and to scale pointer arithmetic.
-
-### 4.1 Parser's responsibilities
-
-The parser **must** call cg in this order around each operation:
-
-| Source | Required parser actions before the operation |
-|--------|----------------------------------------------|
-| `e1 + e2` (and other arith binops) | (a) parse e1 → if lval, `cg-load`; (b) `cg-promote` if rank < int; (c) parse e2 same way; (d) `cg-arith-conv` to bring both to common type; (e) `cg-binop add` |
-| `*p` | parse p → if lval, `cg-load`; then `cg-push-deref` |
-| `&x` | parse x → must be lval; then `cg-take-addr` |
-| `(T)e` | parse e → if lval, `cg-load` (unless casting to a pointer); then `cg-cast T` |
-| `f(a, b, ...)` | parse f → if lval and `f` not a function-typed identifier, `cg-load`; parse each arg → `cg-load` if lval, then `cg-cast` to param type (or default-promote for variadic args); then `cg-call` |
-| `lhs = rhs` | parse lhs → must be lval (no load); parse rhs → `cg-load` if lval; `cg-assign` (cg internally casts rhs to lhs type — parse cannot peek beneath vstack top) |
-| `lhs += rhs` (and other compound assigns) | parse lhs (lval) → `cg-dup` to preserve the lval across the read; `cg-load` (consumes one copy); parse rhs → `cg-load` if lval; `cg-arith-conv`; `cg-binop <op>`; `cg-assign` (cg casts internally) |
-| `++lhs` / `--lhs` | parse lhs (lval) → `cg-dup`; `cg-load`; `cg-push-imm 1`; `cg-binop add`/`sub`; `cg-assign` |
-| `lhs++` / `lhs--` | parse lhs (lval) → `cg-postinc` / `cg-postdec` (atomic primitive: dups+loads to capture old rval, then dup+load+`+1`+assign for the store, finally pushes the saved old rval) |
-| `return e` | parse e → `cg-load` if lval; `cg-cast` to fn return type; `cg-return` |
-| `if (e) ...` | parse e → `cg-load` if lval; `cg-cast bool` if not already int-shaped; `cg-if` |
-| `c ? a : b` | parse c → `cg-load` if lval; `cg-ifelse-merge` with each thunk parsing one arm and ending with `rval!`; result type is the first arm's type |
-| `a && b` | parse a → `cg-load` if lval; `cg-ifelse-merge` with then-arm = `parse b; rval!; cg-cast bool; cg-cast i32` and else-arm = `cg-push-imm i32 0` |
-| `a \|\| b` | mirror of `&&`: then-arm = `cg-push-imm i32 1`; else-arm = parse b + bool/i32 cast |
-| `a, b` | parse a (its rval is on top) → `cg-pop`; parse b → `cg-load` if lval; the comma's value is b |
-| `sizeof e` | parse e (don't suppress emission); peek `(opnd-type (cg-top …))`'s `ctype-size`; `cg-pop`; `cg-push-imm u64 size` |
-
-The parser is responsible for the standard:
-
-- **Integer promotion**: any operand of type rank below `int` is
- promoted to `int` (or `unsigned int` if it can't fit) before use
- in arithmetic, before assignment to a wider lhs in mixed contexts,
- and before being passed as a variadic argument.
-- **Usual arithmetic conversions**: applied to both operands of a
- binary arithmetic operator after promotion. The result type is
- the common type.
-- **Pointer-int interaction**: detected by parser; `cg-binop add`
- on (ptr, int) handles scaling internally (see §4.2).
-
-### 4.2 cg's responsibilities
-
-cg trusts the operand types it is handed.
-
-- **`cg-load`**: pop lval, emit one load (of the right width based
- on `ctype-size`), push rval of the same type.
-- **`cg-cast to-type`**: pop, emit sign-extend / zero-extend /
- truncate as needed based on source vs. target sizes and signedness.
- For `to-type = bool`: emit `(BNEZ -> 1, fallthrough -> 0)` shape.
- Pointer ↔ integer casts are bit-for-bit on P1-64 (no emission).
-- **`cg-binop add` (and `sub`)**: if exactly one operand is a `ptr`,
- scale the int operand by `ctype-size` of the pointee before adding.
- If both ptr → only `sub` is valid (yields `i64` byte difference,
- divided by element size); other binops on (ptr, ptr) abort via
- `die`.
-- **`cg-binop` for divisions and comparisons**: dispatch to signed
- (`DIV`/`BLT`) or unsigned (`DIV`+sign-flip / `BLTU`) variant based
- on the operand kinds (`i*` → signed, `u*` → unsigned). After
- `cg-arith-conv`, both operands have the same kind, so dispatch is
- unambiguous.
-
-cg never:
-- auto-loads an lvalue
-- auto-promotes
-- auto-converts arguments
-- looks at fn-ctx return type (parser passes the cast)
-
-This split keeps cg under ~600 LOC by pushing all "C language"
-knowledge into parse.
-
-## 5. Symbol-to-label mangling
-
-Three label namespaces in the emitted P1pp:
-
-### 5.1 User globals (functions, variables)
-
-```
-C identifier "foo" → P1pp label :cc__foo
-```
-
-Verbatim concatenation. C identifiers can't contain `:` or other
-P1pp-special characters, so no escaping is needed. The `cc__` prefix
-guarantees no collision with libp1pp internals (`libp1pp__*`),
-backend stubs (`_start`, `p1_main`, `sys_*`), or our own runtime
-support.
-
-`static` storage at file scope changes nothing about the label —
-since we have one TU, internal-linkage and external-linkage symbols
-share the same namespace. `static` only suppresses any future
-"export to other TUs" emission, which we don't do anyway.
-
-### 5.2 String pool
-
-```
-n-th distinct string literal → :cc__str_<n>
-```
-
-`<n>` is a fresh decimal counter starting at 0, advanced only on
-non-deduplicating insert. Identical string literals share a label
-(idempotent intern).
-
-### 5.3 Function-internal labels
-
-Inside `%fn(...)`, libp1pp's `%scope` mechanism prefixes short
-labels (`::ret`, `::lbl_42`) to `<fnname>__ret`, `<fnname>__lbl_42`
-at M1pp time. cg uses short labels exclusively inside fn-buf:
-
-- `::ret` — single function exit
-- `::lbl_<n>` — anonymous control-flow targets (for switch cases,
- short-circuit eval, etc.)
-- `::user_<name>` — user-written `goto` labels (C `myloop:` →
- `::user_myloop`). The `user_` prefix prevents collisions with our
- `lbl_` and `ret` labels.
-
-Loop tags (for libp1pp's `%loop_tag`, `%break(tag)`, `%continue(tag)`)
-are not labels — they're macro-name fragments. cg generates them as
-`L<n>` (no `cc__` prefix; tag namespace is per-function and `%fn`
-already scopes them).
-
-### 5.4 Entry stub
-
-cg emits a small entry stub at `cg-finish` time:
-
-```
-%fn(p1_main, %p1_main_f.SIZE, {
- ; argc = a0, argv = a1 already
- %la_br(&cc__main)
- %call
- %eret
-})
-```
-
-So `int main(int argc, char **argv)` is reached from the P1
-program-entry contract.
-
-If the user's `main` has no parameters, the stub still passes
-argc/argv — main just ignores them, which is harmless. If the user
-defines `main` with a different return type, that's a CC.md
-violation; cg can either die or emit and let the cast happen at the
-return site. Recommend: parser checks at `cg-fn-end` time when
-fn-name == `main`.
-
-## 6. Phase-1 milestone
-
-```c
-int main(int argc, char **argv) {
- return argc;
-}
-```
-
-This is the integration target every engineer aims at. It goes
-through:
-
-- **lex**: `int`, `main`, `(`, `int`, `argc`, `,`, `char`, `*`, `*`,
- `argv`, `)`, `{`, `return`, `argc`, `;`, `}` — all PUNCT and KW
- symbols touched are core; covers two of each kind.
-- **pp**: zero directives, but full token-list traversal.
-- **parse**: function definition, two-parameter list including
- `char **`, compound stmt, return stmt, identifier expression,
- lval→rval load.
-- **cg**: `cg-fn-begin` with two params, parameter spilling
- (one register-passed, one register-passed), `cg-push-sym`,
- `cg-load`, `cg-return`, `cg-fn-end`, `%fn` wrapping, and the
- `p1_main` entry stub.
-- **e2e**: link with arch backend + libp1pp; run as native ELF;
- exit code matches argc.
-
-Acceptance test: `tests/cc/00-return-argc.c` exists, the make
-target builds it, and:
-
-```
-$ ./tests/cc/build/00-return-argc ; echo $? → 1
-$ ./tests/cc/build/00-return-argc a b ; echo $? → 3
-```
-
-When this test passes on aarch64, amd64, and riscv64, phase 1 is
-complete.
-
-## Change protocol
-
-Anyone proposing a contract change:
-
-1. PR amends this doc first, with rationale.
-2. Affected modules + tests are listed.
-3. Changes land in one PR (doc + all affected code) so no engineer
- pulls a half-migrated tree.
diff --git a/docs/CC-INTERNALS.md b/docs/CC-INTERNALS.md
@@ -1,764 +0,0 @@
-# lispcc internals
-
-Companion to [CC.md](CC.md). CC.md says what we accept; this doc says
-how the implementation is organized so engineers can split work and
-test independently.
-
-The compiler lives in one file, `cc/cc.scm`, that bundles every
-module section in order; `cc/main.scm` is the one-liner entry point
-that fires `cc-main`:
-
-```
-build: catm build/$ARCH/cc/cc.scm scheme1/prelude.scm cc/cc.scm cc/main.scm
-run: scheme1 build/$ARCH/cc/cc.scm input.flat.c output.P1pp
-```
-
-Sections inside `cc.scm` are delimited by their original
-`;; cc/<name>.scm —` headers (util, data, lex, pp, cg, parse, main),
-so the layering below is still navigable by search.
-
-## Module DAG
-
-```
-util ────────┐
- │
-data ────────┼─► lex ──► pp ──► parse ─► main
- │ │
- └──────────────────────► cg ──► main
-```
-
-- **util** — leaf helpers; depends only on the scheme1 prelude.
-- **data** — record-type definitions used across modules.
-- **lex** — bytestream → token list. Pure function.
-- **pp** — token list → expanded token list. Pure function.
-- **cg** — codegen state and emission API. Mutates a `cg` record.
-- **parse** — token list + cg → P1pp output. Mutates `pstate` and
- drives `cg`.
-- **main** — argv handling, file I/O, ties phases together.
-
-Cycles are forbidden. The parse section calls cg but never the reverse.
-
-## Feature workflow
-
-When adding any new codegen-touching feature, **always** in this order:
-
-1. **cc-cg fixture first.** Write a `tests/cc-cg/<n>-name.scm` that drives
- the cg API directly to emit a self-contained program whose runtime
- exit code (and/or stdout) reflects the feature working. Add the
- `.expected-exit` (default `0`) and optional `.expected` stdout file.
- At this point the test fails — that's the spec.
-2. **Implement the cg primitive(s).** Add or fix the API surface in
- the cg section of `cc/cc.scm`. Iterate until `make test SUITE=cc-cg`
- passes.
-3. **cc fixture next.** Write a `tests/cc/<n>-name.c` that
- exercises the same feature from the C source side, with a `main`
- that exits with a known value. Add `.expected-exit` / `.expected`.
- Test fails — that's the spec for the parser.
-4. **Implement the parse changes.** Wire C syntax through the parse
- section of `cc/cc.scm`
- using the cg primitives from step 2. Iterate until
- `make test SUITE=cc` passes.
-
-This sequencing keeps the parser↔cg seam honest: cg is validated by
-runtime behavior on direct API calls before parse is even touched, so
-parse never papers over a cg bug. Steps 1+3 are the contract; steps 2+4
-are the implementation.
-
-## Conventions
-
-- **Naming**: every public function and accessor is prefixed by its
- module or record. `lex-tokenize`, `pp-expand`, `cg-push-imm`,
- `parse-translation-unit`, `tok-kind`, `ctype-size`. Internal helpers
- use a leading `%` (e.g. `%pp-expand-line`).
-- **Record constructors**: `%record-name` is the all-fields raw
- constructor; named factory functions like `make-tok` wrap it when
- defaults are useful.
-- **Mutators**: `field-set!` form, e.g. `cg-out-set!`, matching
- scheme1/prelude.scm.
-- **Bytevectors as strings**: every "string" in this codebase is a
- bytevector. We never use `symbol->string` for runtime data — symbols
- are reserved for the small fixed alphabets (token kinds, ctype
- kinds, opnd kinds).
-- **Errors**: `(die loc msg . irritants)` writes a diagnostic to fd 2
- and `sys-exit`s with status 1. No exceptions, no recovery. Every
- module uses the same `die`.
-- **No global state**: every long-lived datum lives in a record passed
- explicitly. The only top-level definitions are functions, constants,
- and the keyword/punctuator alists.
-
-## Errors
-
-```scheme
-(die loc msg . irritants) ; abort with formatted diagnostic
-(loc-of-tok tok) -> loc ; pull file/line/col out of a tok
-(loc file line col) -> loc ; construct manually (used by lexer)
-```
-
-Diagnostic format on stderr:
-
-```
-<file>:<line>:<col>: error: <msg>: <irritants display'd one-by-one>
-```
-
-`irritants` are written via `display` (no quoting). Pairs and integers
-print naturally; bytevectors print as their byte content.
-
-`die` is the only failure path. There is no warning level — anything
-worth saying aborts.
-
-## util.scm
-
-Bytevector and list helpers that scheme1/prelude.scm doesn't already
-provide.
-
-```scheme
-;; bytevector
-(bv= a b) -> bool ; alias for bytevector=?, terser
-(bv-prefix? p s) -> bool ; does s start with p?
-(bv-find bv byte from) -> idx-or-#f ; first occurrence at >= from
-(bv-slice bv start end) -> bv ; fresh copy
-(bv-of-string str) -> bv ; literal helper for inline ASCII
-(bv-of-byte b) -> bv ; 1-byte bv
-(bv-cat lst-of-bv) -> bv ; concat list, single allocation
-(bv->fixnum bv radix) -> (ok . val) ; (ok . #f) on parse fail
-(fixnum->bv n radix) -> bv
-
-;; lists / alists
-(alist-ref key al) -> val-or-#f
-(alist-ref/eq key al) -> val-or-#f ; eq? compare (for symbol keys)
-(alist-set key val al) -> al' ; cons new pair on the front
-(alist-update key f al) -> al' ; functional update
-(any p xs) -> bool
-(every p xs) -> bool
-(count p xs) -> fixnum
-
-;; ints
-(min3 a b c) -> fixnum
-(align-up n k) -> fixnum ; round n up to multiple of k
-
-;; output buffers (reversed list of bv chunks; flush builds in one pass)
-(make-buf) -> buf ; record: { chunks }
-(buf-push! buf bv) ; cons bv onto chunks
-(buf-flush buf) -> bv ; reverse + bv-cat
-
-;; diagnostics + I/O
-(die loc msg . irritants) -> never returns
-(slurp-fd fd) -> bv ; read until EOF
-(write-bv-fd fd bv) ; full write or die
-
-;; fresh names
-(make-namer prefix) -> proc ; (proc) -> bv "prefix0", "prefix1", ...
-```
-
-`make-buf` and `buf-push!` are the universal output primitive — both
-cg's three output streams and pp's expansion staging buffer use them.
-
-## data.scm
-
-All record types used by more than one module.
-
-### loc
-
-```scheme
-(define-record-type loc
- (%loc file line col)
- loc?
- (file loc-file) ; bv
- (line loc-line) ; fixnum
- (col loc-col)) ; fixnum
-```
-
-### tok
-
-```scheme
-(define-record-type tok
- (%tok kind value loc hide)
- tok?
- (kind tok-kind) ; symbol; see kinds table
- (value tok-value) ; varies by kind
- (loc tok-loc) ; loc
- (hide tok-hide)) ; list of bv (macro names already expanded)
-```
-
-| `kind` | `value` shape |
-|---------|--------------------------------------------------|
-| `IDENT` | bv (identifier name) |
-| `KW` | symbol (one of `if while ... typedef`) |
-| `INT` | fixnum (post-suffix integer value) |
-| `STR` | bv (raw bytes, no NUL terminator) |
-| `CHAR` | fixnum (value 0..255) |
-| `PUNCT` | symbol (`'+ '== '-> '... '## '#` …) |
-| `HASH` | `#f` (preprocessor only; line-leading `#`) |
-| `NL` | `#f` (significant only inside the preprocessor) |
-| `EOF` | `#f` |
-
-`make-tok` is the canonical constructor; pass `hide = '()` for
-freshly-lexed tokens.
-
-### macro
-
-```scheme
-(define-record-type macro
- (%macro kind params body)
- macro?
- (kind macro-kind) ; 'obj | 'fn | 'fn-vararg
- (params macro-params) ; list of bv
- (body macro-body)) ; list of tok
-```
-
-### ctype
-
-```scheme
-(define-record-type ctype
- (%ctype kind size align ext)
- ctype?
- (kind ctype-kind) ; 'void 'i8 'i16 'i32 'i64 'u8 'u16 'u32 'u64
- ; 'bool 'ptr 'arr 'fn 'struct 'union 'enum
- (size ctype-size) ; fixnum bytes; -1 = incomplete
- (align ctype-align) ; fixnum bytes
- (ext ctype-ext)) ; payload, kind-specific
-```
-
-ext payload by kind:
-
-| `kind` | `ext` shape |
-|-------------------|----------------------------------------------------------------------|
-| `void` / int / `bool` | `#f` |
-| `ptr` | pointee ctype |
-| `arr` | `(elem-ctype . length-or--1)` |
-| `fn` | `(ret-ctype params variadic?)` — `params` is list of ctype |
-| `struct`/`union` | `(tag-bv complete? fields)` — `fields` = `((name-bv ctype offset) ...)` |
-| `enum` | `(tag-bv ((const-name-bv . value) ...))` |
-
-Primitive ctypes are interned at startup as top-level bindings:
-`%t-void`, `%t-i8`, `%t-i32`, `%t-i64`, `%t-u8`, `%t-u32`, `%t-u64`,
-`%t-bool`, `%t-char-ptr`. Equality of primitive ctypes is `eq?`.
-Derived types are *not* deduped.
-
-### sym
-
-```scheme
-(define-record-type sym
- (%sym name kind storage type slot)
- sym?
- (name sym-name) ; bv
- (kind sym-kind) ; 'var 'fn 'typedef 'enum-const 'param 'label
- (storage sym-storage) ; 'auto 'static 'extern 'register | #f for typedef/enum
- (type sym-type) ; ctype
- (slot sym-slot)) ; locals/params: fixnum byte offset
- ; globals/fn: bv emitted-label
- ; enum-const: fixnum value
- ; typedef: #f
- ; label: bv P1pp local label
-```
-
-### opnd
-
-```scheme
-(define-record-type opnd
- (%opnd kind type ext lval?)
- opnd?
- (kind opnd-kind) ; 'imm 'frame 'global 'reg | sub-cases below
- (type opnd-type) ; ctype
- (ext opnd-ext) ; per kind
- (lval? opnd-lval?)) ; #t = the slot/place holds an *address*
- ; #f = the slot/place holds a *value*
-```
-
-ext by kind:
-
-| `kind` | `ext` |
-|----------|------------------------------------------------|
-| `imm` | fixnum literal value (lval? always `#f`) |
-| `frame` | fixnum byte offset in current function frame |
-| `global` | bv label name (the symbol's emitted P1pp label)|
-| `reg` | symbol `'a0`..`'a3` |
-
-`reg` is transient — used for the result of a call before it's
-spilled. The vstack itself holds only `imm`, `frame`, and `global`
-opnds.
-
-### pstate
-
-```scheme
-(define-record-type pstate
- (%pstate toks scope tags loops fn-ctx typedefs cg)
- pstate?
- (toks ps-toks ps-toks-set!) ; remaining tok list (head = lookahead)
- (scope ps-scope ps-scope-set!) ; list of alists: (bv . sym)
- (tags ps-tags ps-tags-set!) ; list of alists: (bv . ctype) for struct/union/enum
- (loops ps-loops ps-loops-set!) ; list of loop-ctx (break/continue stack)
- (fn-ctx ps-fn-ctx ps-fn-ctx-set!) ; current function record, or #f at file scope
- (typedefs ps-typedefs ps-typedefs-set!) ; flat alist (bv . #t) — fast typedef-name lookup
- (cg ps-cg)) ; codegen state (not mutated through pstate)
-```
-
-`typedefs` is a separate flat alist (not derived from `scope` at
-lookup time) because the lexer-vs-parser distinction in C requires
-fast "is this identifier a typedef-name *now*" answers during
-declaration parsing. Updated in lockstep with `scope`.
-
-### loop-ctx
-
-```scheme
-(define-record-type loop-ctx
- (%loop-ctx kind tag has-continue?)
- loop-ctx?
- (kind loop-ctx-kind) ; 'while 'do 'for 'switch
- (tag loop-ctx-tag) ; bv tag for libp1pp %_tag macros
- (has-continue? loop-ctx-has-continue?)) ; #f for switch
-```
-
-### fn-ctx
-
-```scheme
-(define-record-type fn-ctx
- (%fn-ctx name return-type params variadic? labels)
- fn-ctx?
- (name fn-ctx-name) ; bv
- (return-type fn-ctx-return-type) ; ctype
- (params fn-ctx-params) ; list of sym
- (variadic? fn-ctx-variadic?)
- (labels fn-ctx-labels ; alist (user-bv . emitted-bv)
- fn-ctx-labels-set!)) ; mutated as `goto`/labels resolve
-```
-
-### cg
-
-```scheme
-(define-record-type cg
- (%cg text data bss vstack frame-hi label-ctr str-pool globals fn-buf)
- cg?
- (text cg-text) ; buf — final text section (all functions)
- (data cg-data) ; buf — initialized globals + string pool
- (bss cg-bss) ; buf — zero-init globals
- (vstack cg-vstack cg-vstack-set!) ; list of opnd
- (frame-hi cg-frame-hi cg-frame-hi-set!) ; fixnum: frame bytes used so far
- (label-ctr cg-label-ctr cg-label-ctr-set!)
- (str-pool cg-str-pool cg-str-pool-set!) ; alist (bv-content . bv-label)
- (globals cg-globals cg-globals-set!) ; alist (bv-name . sym) — emitted globals
- (fn-buf cg-fn-buf cg-fn-buf-set!)) ; buf — body of current function
-```
-
-Functions are emitted into `fn-buf`, then on `cg-fn-end` flushed
-through libp1pp's `%fn(...)` wrapper into `text`. This lets us know
-the final `frame-hi` before writing the prologue.
-
-## lex.scm
-
-Pure: bytestream + filename → token list. No file I/O, no macro
-awareness.
-
-```scheme
-(lex-tokenize src file) -> (list-of tok)
-;; src: bv (the C bytestream)
-;; file: bv (filename, for tok-loc)
-;; result: list ending in a single 'EOF tok; never #f
-;; aborts via die on unrecognized byte sequences
-```
-
-Internal contract:
-
-- Comments (`/* ... */`, `// ...`) are removed but produce no tokens.
-- Trigraphs and `\<newline>` line splicing are applied before
- tokenization.
-- `NL` tokens are emitted at the end of every source line, even
- blank ones — the preprocessor needs them for directive termination.
-- Adjacent string literals are **not** concatenated here; that's
- pp's job (after macro expansion).
-- Keyword recognition: the lexer carries a fixed alist of
- (keyword-bv . keyword-symbol) and emits `KW` directly for matches.
- IDENT vs. KW is decided at lex time and never revised.
-- Punctuator longest-match table: `'<<=` before `'<<` before `'<`,
- etc.
-
-Helpers exposed for unit tests:
-
-```scheme
-(lex-read-number src pos) -> (tok . pos')
-(lex-read-string src pos file) -> (tok . pos')
-(lex-read-ident src pos) -> (tok . pos') ; produces IDENT or KW
-```
-
-### Test plan
-
-`tests/cc-lex/` mirrors `tests/scheme1/`. Each test is one `.c`
-fragment plus an `.expected-toks` file containing the expected
-serialized form (one tok per line, e.g. `KW int 1 1`). Driver script:
-
-```
-scheme1 cc/lex-test.scm < input.c | diff - expected-toks
-```
-
-## pp.scm
-
-Pure: token list + initial macro alist → expanded token list.
-
-```scheme
-(pp-expand toks initial-defines) -> (list-of tok)
-;; toks: lex-tokenize output
-;; initial-defines: alist (bv . macro) — from -D flags
-;; result: token list with HASH and NL stripped, KW/IDENT/INT/STR/CHAR/PUNCT/EOF only
-;; aborts via die on bad directive, undefined #if identifier-as-macro misuse, etc.
-```
-
-Internal structure:
-
-- A driver loop classifies each line (looking only at the leading
- HASH or non-HASH state) and dispatches to a directive handler or
- the macro-expansion engine.
-- The conditional stack: list of `(active? . has-taken?)` pairs.
- `#if`/`#ifdef` push; `#elif`/`#else` flip; `#endif` pops.
-- Macro expansion uses C11 6.10.3.4 hide-set discipline. Each
- emitted token's `hide` field is the union of hide-sets of its
- constituent body tokens plus the macro's own name.
-- `defined NAME` is a special form — recognized and resolved
- *before* macro expansion of an `#if` line.
-
-Directive handlers (each takes the tokens up to the next NL):
-
-```scheme
-(%pp-do-define line state)
-(%pp-do-undef line state)
-(%pp-do-if line state)
-(%pp-do-ifdef line state)
-(%pp-do-ifndef line state)
-(%pp-do-elif line state)
-(%pp-do-else line state)
-(%pp-do-endif line state)
-(%pp-do-error line state)
-(%pp-do-line line state)
-(%pp-do-pragma line state)
-(%pp-do-include line state) ;; always dies (per CC.md §Toolchain envelope)
-```
-
-`state` is a private record `pp-state` with fields
-`{macros cond-stack out file-base-line}`. Internal — not in data.scm.
-
-Constant-expression evaluator for `#if`:
-
-```scheme
-(pp-eval-cexpr toks macros) -> fixnum
-;; toks: tokens of the expression after macro expansion
-;; aborts on unrecognized identifier (treated-as-zero is wrong for our errors policy)
-```
-
-### Test plan
-
-`tests/cc-pp/` per-file tests, same shape as cc-lex. Inputs are
-already-tokenized fixtures (so pp can be exercised without the lexer)
-or `.c` files (full pipeline). Both flavors live side-by-side.
-
-## cg.scm
-
-Mutable codegen state and a value-stack-style emission API. The parser
-calls these and never touches output buffers directly.
-
-### Lifecycle
-
-```scheme
-(cg-init) -> cg ; fresh state
-(cg-finish cg) -> bv ; flush all buffers; result is final P1pp text
-```
-
-Every function lives between `cg-fn-begin` and `cg-fn-end`. Globals
-live outside.
-
-```scheme
-(cg-fn-begin cg name params return-type)
-;; name: bv
-;; params: list of sym (their slots are pre-assigned by parser)
-;; return-type: ctype
-(cg-fn-end cg) ; emits epilogue, flushes fn-buf into cg-text under %fn(...)
-```
-
-### Vstack: push / pop / inspect
-
-```scheme
-(cg-push cg opnd) ; push opnd onto vstack
-(cg-pop cg) -> opnd ; remove and return top
-(cg-top cg) -> opnd ; non-destructive
-(cg-depth cg) -> fixnum
-```
-
-### Materialize
-
-```scheme
-(cg-push-imm cg ctype value) -> opnd ; rval
-(cg-push-string cg bv-content) -> opnd ; rval, char* — interns into str-pool
-(cg-push-sym cg sym) -> opnd ; lval (var) or rval (fn name)
-(cg-push-deref cg) -> opnd ; pop ptr-rval, push lval (no emission yet)
-```
-
-### Address & deref operators
-
-```scheme
-(cg-take-addr cg) -> opnd ; pop lval, push its address as rval
-(cg-load cg) -> opnd ; pop lval, push rval (loaded through address)
-```
-
-### Type conversions
-
-```scheme
-(cg-cast cg to-type) -> opnd ; pop, push opnd cast to to-type; emits sign-extension etc. as needed
-(cg-promote cg) -> opnd ; integer promotion (rank ≤ int → int)
-(cg-arith-conv cg) ; usual arithmetic conversions on top two opnds
-```
-
-### Operators
-
-```scheme
-(cg-binop cg op) -> opnd ; pop b, pop a, push (a op b)
-(cg-unop cg op) -> opnd ; pop a, push (op a)
-(cg-assign cg) -> opnd ; pop rval, pop lval, cast rval to lval-type, store, push assigned value (cg-assign owns the cast since parse cannot peek beneath vstack top)
-```
-
-`op` for binop is a symbol from:
-`'add 'sub 'mul 'div 'rem 'and 'or 'xor 'shl 'shr 'eq 'ne 'lt 'le 'gt 'ge`.
-For unop: `'neg 'bnot 'lnot`.
-
-Signed vs. unsigned dispatch is handled inside cg by inspecting the
-operand types after `cg-arith-conv`.
-
-### Calls
-
-```scheme
-(cg-call cg arity has-result?) -> opnd-or-#f
-;; pops (arity + 1) opnds: function, then args left-to-right at top
-;; (i.e. callable was pushed first, then args, so arg-N is on top)
-;; emits arg-passing per P1 ABI, CALL or CALLR, captures return into a fresh frame slot
-;; pushes result opnd; returns it (or #f if has-result? is #f)
-```
-
-### Structured control flow
-
-These take thunks so the parser can recursively emit the body:
-
-```scheme
-(cg-if cg then-thunk) ; pop cond; emit %if_nez { (then-thunk) }
-(cg-ifelse cg then-thunk else-thunk) ; pop cond; emit %ifelse_nez { ... }{ ... }
-(cg-loop cg head-thunk body-thunk) -> tag ; head-thunk emits the cond; body-thunk is invoked as (body-thunk tag) so parse can use the same tag for cg-break / cg-continue inside the body; tag is also returned to the caller
-(cg-loop-end cg tag) ; no-op; reserved for future per-loop teardown
-(cg-break cg tag)
-(cg-continue cg tag)
-```
-
-For `for` and `do-while` the parser composes the building blocks
-above; cg doesn't expose a dedicated for/do helper. (Three helpers
-beat seven.)
-
-### switch helpers
-
-```scheme
-(cg-switch-begin cg) -> swctx ; spill controlling expression to a slot
-(cg-switch-case cg swctx const-int) ; emit %if_eq jump-to-case-label
-(cg-switch-default cg swctx) ; emit B to default-label
-(cg-switch-end cg swctx) ; close out
-```
-
-### Globals and data
-
-```scheme
-(cg-emit-global cg sym init) ; init = #f (zero-init in .bss)
- ; | (piece ...) in .data
-;; piece := <bytevector> — raw bytes
-;; | (label-ref . <label-bv>) — 8-byte slot holding &label
-(cg-emit-extern cg sym) ; declare without defining
-(cg-intern-string cg bv-content) -> bv-label ; idempotent; used internally by cg-push-string
-```
-
-### Frame allocation (used internally and by parse for locals)
-
-```scheme
-(cg-alloc-slot cg bytes align) -> offset ; bumps frame-hi; returns aligned offset
-```
-
-### Test plan
-
-`tests/cc-cg/` is a runtime-validating suite. Each fixture is a `.scm`
-program that drives the cg API directly, calls `cg-finish`, and writes
-the resulting P1pp text to stdout. The harness assembles that P1pp
-through the existing P1pp toolchain, runs the resulting ELF, and
-diffs the program's exit code against `<name>.expected-exit` (default
-`0`) and stdout against `<name>.expected` (default empty).
-
-Fixtures must therefore emit a complete, runnable program — including
-any callees they invoke via `cg-call`. Direct-cg tests that depend on
-external symbols (`extern abs`, `extern foo`) belong in cc once
-real `libc` linkage exists; they don't fit cc-cg.
-
-The contract this suite locks in is **semantic**, not syntactic: cg
-emission can be refactored freely as long as the program still
-computes the asserted result.
-
-## parse.scm
-
-Mutates a `pstate`. Drives `cg`. Single entry point:
-
-```scheme
-(parse-translation-unit ps) ; consumes ps-toks until EOF, emits via ps-cg
-;; ps must have ps-toks set; everything else starts empty.
-```
-
-### Top-down structure
-
-Internal helpers, hierarchically:
-
-```
-parse-translation-unit
- ├─ parse-decl-or-fn ; returns 'fn or 'decl
- │ ├─ parse-decl-spec ; storage + qualifiers + base type
- │ ├─ parse-declarator ; spiral grammar; returns (name ctype)
- │ ├─ parse-init-list ; for variable initializers (incl. designated)
- │ └─ parse-fn-body ; only if declarator is fn-typed AND next tok is `{`
- ├─ parse-stmt
- │ ├─ parse-compound-stmt
- │ ├─ parse-if-stmt
- │ ├─ parse-while-stmt
- │ ├─ parse-do-stmt
- │ ├─ parse-for-stmt
- │ ├─ parse-switch-stmt
- │ ├─ parse-return-stmt
- │ ├─ parse-goto-stmt
- │ └─ parse-expr-stmt
- └─ parse-expr ; Pratt; takes min-bp
- ├─ parse-primary
- ├─ parse-postfix
- ├─ parse-unary
- ├─ parse-cast-or-unary ; `(T)e` vs. `(e)`
- └─ parse-binary-rhs
-```
-
-### Token-stream API (private to parse.scm but conventional)
-
-```scheme
-(peek ps) -> tok
-(peek2 ps) -> tok ; one-token lookahead helper
-(advance ps) -> tok ; consume and return
-(expect-kw ps sym) -> tok ; KW match or die
-(expect-punct ps sym) -> tok ; PUNCT match or die
-(at-kw? ps sym) -> bool
-(at-punct? ps sym) -> bool
-```
-
-`peek2` is needed exactly twice: distinguishing
-`(typename) cast-expr` from `(expr)` parenthesized expression, and
-distinguishing labelled-statement `IDENT :` from expression-statement
-`IDENT ...`.
-
-### Scope helpers
-
-```scheme
-(scope-enter! ps)
-(scope-leave! ps)
-(scope-bind! ps name sym) ; aborts on duplicate at innermost frame
-(scope-lookup ps name) -> sym-or-#f ; walks frames
-
-(tag-bind! ps name ctype)
-(tag-lookup ps name) -> ctype-or-#f
-
-(typedef-add! ps name) ; updates ps-typedefs
-(typedef? ps name) -> bool
-```
-
-### Pratt expression parser
-
-The binding-power table is a top-level alist:
-
-```scheme
-(define %binop-bp
- ;; (punct-symbol . (lhs-bp . rhs-bp))
- ;; left-assoc: rhs-bp = lhs-bp + 1
- ;; right-assoc: rhs-bp = lhs-bp - 1
- '((|*| 110 . 111) (|/| 110 . 111) (|%| 110 . 111)
- ...))
-```
-
-Driver:
-
-```scheme
-(parse-expr-bp ps min-bp) ; Pratt climber
-(parse-expr ps) ; equivalent to (parse-expr-bp ps 0)
-```
-
-`parse-expr` leaves the result on `ps-cg`'s vstack and returns the
-opnd. Statements that don't consume the value follow with `cg-pop`.
-
-### Test plan
-
-`tests/cc/` is a runtime-validating suite. Each fixture is a
-`.c` source file containing a complete, runnable program — `main`
-must exit with a known value. The harness runs the full
-lex+pp+parse+real-cg pipeline, assembles the emitted P1pp through the
-P1pp toolchain, runs the resulting ELF, and diffs exit code against
-`<name>.expected-exit` (default `0`) and stdout against
-`<name>.expected` (default empty).
-
-The contract this suite locks in is that the *C source* compiles to a
-program with the asserted runtime behavior. Per the §Feature workflow,
-parse fixtures land *after* the corresponding cg fixture is green.
-
-## main.scm
-
-Driver. Roughly 80 lines.
-
-```scheme
-(define (cc-main argv)
- (let* ((args (parse-cli argv)) ; record: { input-path output-path defines }
- (src (slurp-fd (open-or-die (cli-input args))))
- (toks (lex-tokenize src (cli-input args)))
- (defines (cli-defines->alist (cli-defines args)))
- (expanded (pp-expand toks defines))
- (cg (cg-init))
- (ps (make-pstate expanded cg)))
- (parse-translation-unit ps)
- (write-bv-fd (open-or-die-out (cli-output args))
- (cg-finish cg))
- 0))
-
-(cc-main (argv))
-```
-
-CLI: `cc input.flat.c -o output.P1pp [-D NAME[=val]] ...`. Strict —
-unrecognized flags die.
-
-## Test infrastructure
-
-Five test trees. The upstream layers (lex, pp) byte-diff their pure
-transformations; the downstream layers (cg, cc) all
-**compile-and-run** the emitted P1pp and assert runtime behavior — exit
-code by default, optional stdout match. This split keeps the byte-diff
-brittleness out of every layer where there's a real "is this code
-correct" question to ask.
-
-- `tests/cc-util/` — direct unit tests of util helpers (Scheme-level).
-- `tests/cc-lex/` — feeds `.c` through `lex-tokenize`, diffs token
- serialization (per CC-CONTRACTS §2.1).
-- `tests/cc-pp/` — feeds tokens (or `.c`) through `pp-expand`, diffs
- token serialization.
-- `tests/cc-cg/` — `.scm` driver calls cg APIs directly to emit a
- complete program; harness builds + runs the P1pp; asserts exit code
- / stdout. (See cg.scm §Test plan.)
-- `tests/cc/` — `.c` fixture goes through the full compiler driver
- (`cc/cc.scm`: lex+pp+parse+cg); harness builds + runs; asserts exit
- code / stdout. (See parse.scm §Test plan.) Holds both feature drills
- (parser / cg coverage) and full-envelope scenarios.
-
-`tests/cc-cg/` is the codegen contract. `tests/cc/` is the C-
-language contract. Per §Feature workflow, every feature lands a cg
-fixture first (which must fail before cg is touched), and a parse
-fixture second (which must fail before parse is touched).
-
-## Out-of-scope here
-
-Deferred to follow-up docs once we start coding:
-
-- Exact P1pp text emitted for each cg primitive (precise opcodes,
- spill discipline, libp1pp macro choices) — lives next to cg.scm
- as a side document.
-- The exact `<stdarg.h>` / `<stddef.h>` we ship — the headers we
- bundle so the flattener has something to inline.
-- Driver script for the pre-flatten pass (host shell tool, not part
- of the Scheme compiler).
-- Performance tuning: alist → tree, vstack list → array of frame
- slots, etc. None of this affects the interfaces above.
diff --git a/docs/CC-PUNCHLIST.md b/docs/CC-PUNCHLIST.md
@@ -1,464 +0,0 @@
-# CC codegen punch list
-
-C99-subset codegen capabilities, ordered for red→green TDD per
-[CC-INTERNALS.md §Feature workflow](CC-INTERNALS.md#feature-workflow).
-The accepted language surface is defined in [CC.md](CC.md); this doc
-is the implementation checklist against that surface.
-
-## Conventions
-
-- Every item has up to two runtime-validated fixtures:
- - **cg**: `tests/cc-cg/<n>-name.scm` — drives the cg API directly.
- - **cc**: `tests/cc/<n>-name.c` — exercises the same shape via real
- C through the full compiler driver. Use the same bucket for
- full-envelope scenarios (driver, multi-fn, libc) once those land.
-- Acceptance: `make test SUITE=cc-cg` (then cc) green on all three
- arches. The runner asserts `.expected-exit` (default `0`) and
- `.expected` stdout (default empty).
-- Land cg work + cg fixture in one PR; cc fixture + parse work in
- the next. Don't block on parse to start cg.
-- Pick the next free `<n>` per suite.
-- Status legend: `[ ]` red · `[~]` partial · `[x]` green.
-
-## Already green
-
-cc-cg 00–14 + cc 00–14 cover: empty fn, `return`
-with const/param, two-param fn, i64 binops, locals + assign, `if` /
-`if-else`, `while` with `break` / `continue`, direct calls (0..5
-args, with stack staging), string literal interning, file-scope
-zero-init globals, `&x` on a param, typedef plumbing through to a
-return.
-
-## Punch list
-
-### A. Width-correct integer codegen
-
-The 64-bit-everything load/store path is the largest correctness gap
-upstream of nearly everything else. Land this first.
-
-- [x] **`char` (8-bit) load/store via lval**
- - cg: `cc-cg/15-char-roundtrip.scm` — two adjacent 1-byte slots;
- stores must not bleed across slots → exit 1 on equality check.
- - cc: `cc/15-char-arith.c` — same shape from C.
- - Done: added `%cg-emit-{ld,st}{,-slot}-typed` helpers that dispatch
- on `ctype-size = 1` to `%lb`/`%sb`; `%cg-load-opnd-into`,
- `cg-load`, `cg-assign` thread the lval ctype. `i8` loads also
- sign-extend via `shli`/`sari` 56. 16/32-bit fall through to the
- 8-byte path until §A.2/§A.3 land.
-
-- [x] **`short` (16-bit) load/store via lval**
- - cg: `cc-cg/16-short-roundtrip.scm`
- - cc: `cc/16-short-arith.c`
- - Done: byte-decomposed dispatch in the typed helpers — store low
- byte then `%shri` 8 + store high byte; load two bytes + `%shli` 8
- + `%or`. `i16` sign-extends via `shli`/`sari` 48. Helpers may
- clobber `t1`; callers never pass `reg=t1`.
-
-- [x] **`int` (32-bit) load/store via lval**
- - cg: `cc-cg/17-int-roundtrip.scm`
- - cc: `cc/17-int-arith.c`
- - Done: byte-decomposed dispatch generalised to N bytes via
- `%cg-emit-{ld,st}N-bytes`; size-4 routes through 4× `%lb`/`%sb`
- with shift+OR / shri-shift. `i32` sign-extends via `shli`/`sari`
- 32. The address-staging in `%cg-load-opnd-into` and `cg-load`'s
- indirect path now uses `t2` so multi-byte gathers don't alias
- dest with base.
-
-- [x] **Signed narrowing keeps sign on re-widen**
- - cg: `cc-cg/18-sext-narrow.scm` — `(int)(char)-3 == -3` → exit 1.
- - cc: `cc/18-sext-narrow.c`
- - Done: `cg-cast`'s narrowing branch now `shli`/`sari`'s for signed
- narrow targets (i8/i16/i32) instead of masking, so the slot holds
- the canonical sign-extended 64-bit form. The widening cast back
- (relabel-only) preserves it.
-
-- [x] **Unsigned narrowing zero-extends**
- - cg: `cc-cg/19-zext-narrow.scm` — `(unsigned)(unsigned char)-3 == 253`.
- - cc: `cc/19-zext-narrow.c`
- - Done: pre-existing mask-on-unsigned-narrow path was already
- correct; fixture locks the contrast with §A.4 in (same source,
- same chain shape, divergent result via target signedness).
-
-- [x] **Integer promotion preserves sign across operations**
- - cg: `cc-cg/20-promote-sign.scm` — `signed char x=-1; ((int)x)+2 == 1`
- via 64-bit comparison so a non-canonical 0x101 result fails.
- - cc: `cc/20-promote-sign.c`
- - Done: load-side sign-extension from §A.1 already canonicalises the
- slot, so `cg-promote`'s relabel-only path is correct. Fixture
- locks the invariant in.
-
-### B. Lvalue mechanics
-
-Picked **(b) `cg-dup`** — duplicate the top vstack entry, used for
-compound assign and pre-inc/dec to keep the lhs lval available across
-its own load. Post-inc/dec use a dedicated `cg-postinc` / `cg-postdec`
-primitive to capture the old rval before the store. See
-[CC-CONTRACTS §4.1](CC-CONTRACTS.md#41-parsers-responsibilities).
-
-- [x] **Pre-`++` / pre-`--`**
- - cg: `cc-cg/21-preinc.scm` — `int x = 5; ++x; return x;` → exit 6.
- - cc: `cc/21-preinc.c`
- - Done: parser dups lhs lval, loads, +1, assigns; pops result.
-
-- [x] **Post-`++` / post-`--` returns old value**
- - cg: `cc-cg/22-postinc.scm` — `int x=5; int y=x++; return x*10+y;`
- → exit 65.
- - cc: `cc/22-postinc.c`
- - Done: `cg-postinc` / `cg-postdec` primitives composed of two
- dup+load passes — one to capture the old rval (which lives in a
- never-reused spill slot), one to compute the +1/-1 store.
-
-- [x] **Compound assignment on simple lval (`+= -= *= /= %= <<= >>= &= ^= |=`)**
- - cg: `cc-cg/23-cmpd-simple.scm` — `int x=7; x+=3; return x;` → exit 10.
- - cc: `cc/23-cmpd-simple.c` — one of every op family.
- - Done: parser uses `cg-dup` + `cg-load` + rhs + arith-conv + binop +
- assign; the `cg-take-addr`/`cg-push-deref` indirection is gone.
-
-- [x] **Compound assignment through pointer**
- - cg: `cc-cg/24-cmpd-ptr.scm` — `int x=7; int *p=&x; *p+=3; return x;`
- - cc: `cc/24-cmpd-ptr.c`
- - Done: same parser sequence; `cg-push-deref`'s indirect-slot lval
- composes correctly with `cg-dup` + `cg-assign`.
-
-- [x] **`*p++` walking an array**
- - cg: `cc-cg/25-deref-postinc.scm` — walks a 3-element span via *p++.
- - cc: `cc/25-deref-postinc.c`
- - Done: composes B.2 (post-inc on a ptr lval; pointer scaling falls
- out of `cg-binop add`'s ptr branch) with `*p` deref. Also fixed
- `cg-arith-conv` to skip the relabel when one operand is ptr/arr
- so `cg-binop` still sees a ptr/int pair (previously it saw both
- sides relabelled to ptr and skipped the scaling).
-
-### C. `sizeof`
-
-- [x] **`sizeof e` returns the type's actual size**
- - cc: `cc/26-sizeof-expr.c` — `int x; return sizeof x;` → 4.
- - Done: parser peeks `(opnd-type (cg-top …))`, takes its
- `ctype-size`, pops, pushes `imm u64 size`. Both forms
- (`sizeof e` and `sizeof(e)`) updated.
-
-- [x] **`sizeof` over struct, array, pointer, char**
- - cc: `cc/27-sizeof-types.c` — sum of `char`, `short`,
- `int`, `long`, `int*`, `int[5]`, `struct S{int a; int b;}` → 51.
- - Done: the type form already returned `ctype-size ty`; this
- fixture just locks the answer in.
-
-### D. Aggregates
-
-- [x] **Struct member load**
- - cg: `cc-cg/36-struct-load.scm` — two-int struct, fields at 0 and 4.
- - cc: `cc/36-struct-load.c`
- - Done: added `cg-push-field cg fname` (cg.scm). Pops struct/union
- lval, looks up `fname` in `ctype-ext`'s `(tag complete? fields)`.
- Three input cases: direct frame lval shifts the slot offset;
- indirect frame lval loads addr+fo into a new indirect slot;
- global lval `la`'s the label, adds `fo`, stashes via indirect
- slot. Parser `dot` arm replaced.
-
-- [x] **Struct member store**
- - cg: `cc-cg/37-struct-store.scm` — three u8 fields, distinct
- multipliers in the readback to isolate per-field width.
- - cc: `cc/37-struct-store.c`
- - Done: cg-push-field from §D.1 plus the width-aware store path
- from §A.1. No new primitive.
-
-- [x] **Pointer-to-struct (`p->x`)**
- - cg: `cc-cg/38-arrow.scm`
- - cc: `cc/38-arrow.c`
- - Done: arrow arm in `parse-postfix-rest` calls rval! (loads ptr),
- cg-push-deref (struct lval through ptr), then cg-push-field.
- Indirect-frame branch of cg-push-field (added in §D.1) handles
- the deref-result struct lval correctly.
-
-- [x] **Nested struct access (`s.inner.x`, `s->inner.x`)**
- - cc: `cc/39-struct-nested.c`
- - Done: cg-push-field pushes a new lval whose ctype is the field's
- type; if that's a struct, a subsequent `.x` chains naturally.
- The fixture exercises both s.inner.x (direct frame) and
- p->inner.x (indirect frame, via the §D.1 indirect path).
-
-- [x] **Array element access at non-zero index**
- - cg: `cc-cg/40-array-index.scm`
- - cc: `cc/40-array-index.c`
- - Done: cg-load on an arr-typed lval delegates to cg-decay-array,
- pushing a ptr-rval to the first element. Existing `cg-binop add`
- pointer-arithmetic path scales by the pointee size, so `a + i`
- yields `&a[i]`, and cg-push-deref turns that into the element
- lval. cg-take-addr on an arr lval was also adjusted to yield
- T* (not (T[N])*) so `&a[0]` stays consistent.
-
-- [x] **Multi-dim arrays**
- - cc: `cc/41-array-2d.c`
- - Done: fixed `parse-decl-suf-cont` to apply suffixes
- right-to-left (innermost first) so `int a[2][3]` produces
- `arr (arr int 3) 2`, not `arr (arr int 2) 3`. Same fix in the
- fn-suffix arm so `T (...)(...)` chains compose correctly. Decay
- + ptr arithmetic from §D.5 then handles the rest.
-
-- [x] **Struct passed by pointer to a function**
- - cc: `cc/42-struct-fn-arg.c` — passes `&s` to `sum2`,
- callee returns `p->x + p->y`.
- - Done: composes §D.1/§D.3 (cg-push-field, arrow access) and the
- pre-existing param/call/return wiring. No new primitive.
-
- *Pass-by-value of structs is outside CC.md's accepted set; tcc.c
- doesn't use it.*
-
-### E. Initializers
-
-`parse-init-list` (`parse.scm` lines 398–413) currently balances
-braces and returns `#f`, dropping all initializer data. `cg-emit-global`
-accepts an init bv but is never given one.
-
-- [x] **Scalar global with constant initializer**
- - cg: `cc-cg/49-init-scalar-global.scm`
- - cc: `cc/49-init-scalar-global.c`
- - Done: cg-emit-global now consumes a list of pieces (bytevectors
- or `(label-ref . label-bv)` pairs); parser's parse-init-global
- builds N-byte LE bv via %int->le-bv from a const expression.
-
-- [x] **Scalar global with address initializer (`int *p = &x;`)**
- - cg: `cc-cg/50-init-addr.scm`
- - cc: `cc/50-init-addr.c`
- - Done: %const-init-piece recognises `&IDENT` and bare-IDENT for
- fn / static / extern symbols, emitting `(label-ref . cc__name)`.
-
-- [x] **Array global from element list**
- - cg: `cc-cg/51-init-array-list.scm`
- - cc: `cc/51-init-array-list.c`
- - Done: %parse-init-array-list walks brace lists; element types
- drive bv width; array-name → label-ref decay so `int *p = a;`
- works as init too.
-
-- [x] **Array global from string literal**
- - cg: `cc-cg/52-init-array-str.scm`
- - cc: `cc/52-init-array-str.c`
- - Done: parse-init-global recognises STR for char[] target;
- inferred-length arrays (`T a[]`) get patched with the literal
- length.
-
-- [x] **Struct global, positional init**
- - cg: `cc-cg/53-init-struct-pos.scm`
- - cc: `cc/53-init-struct-pos.c`
- - Done: %parse-init-struct-list walks fields positionally; trailing
- fields zero-padded.
-
-- [x] **Struct global, designated init (`.field = …`)**
- - cg: `cc-cg/54-init-struct-desig.scm`
- - cc: `cc/54-init-struct-desig.c`
- - Done: same %parse-init-struct-list handles `.name = …` form.
- Also: cg-arith-conv now leaves pointer-typed operands alone so
- `*(p + N)` scales correctly when one side is ptr.
-
-- [x] **Local array initializer**
- - cc: `cc/55-init-local-array.c`
- - Done: parse-init-local-aggregate emits per-element store ops at
- slot+(i*esize); zero-pads trailing slots when declared length
- exceeds initializer count.
-
-- [x] **Local struct initializer**
- - cc: `cc/56-init-local-struct.c`
- - Done: same per-field store sequence; designated form supported.
-
-### F. Control flow extensions
-
-- [x] **`do { } while (e);`**
- - cg: `cc-cg/63-do-while.scm`
- - cc: `cc/63-do-while.c`
- - Done: composes existing `cg-loop` + `cg-if` + `cg-break`;
- fixture-only.
-
-- [x] **`for (init; cond; step)` with declaration in `init`**
- - cc: `cc/64-for-decl.c`
- - Done: existing `parse-for-stmt` exercised end-to-end.
-
-- [x] **`switch / case / default` with fall-through**
- - cg: `cc-cg/64-switch.scm` — three cases falling through to default.
- - cc: `cc/65-switch.c`
- - Done: validated the existing `swctx` machinery in cg.
-
-- [x] **`goto` / labelled statement (forward and backward)**
- - cg: `cc-cg/65-goto.scm`
- - cc: `cc/66-goto.c`
- - Done: replaced the `cg-break` hack in `parse-goto-stmt`. Added
- `cg-emit-label cg name-bv` (drops `::user_<name>`) and
- `cg-goto cg name-bv` (emits `%b(&::user_<name>)`).
- `parse-labelled-stmt` now calls `cg-emit-label` before the inner stmt.
- - Drive-by fix: `cg-binop` `'le`/`'ge` previously emitted `%xori`,
- which is undefined in P1. Replaced with `%li(t1,1) %xor(t0,t0,t1)`.
-
-### G. Variadics
-
-- [x] **Variadic call: per-arg default-promote**
- - cg: `cc-cg/66-vararg-call.scm`
- - cc: `cc/67-vararg-call.c`
- - Done: parser inspects fn type at `parse-call-args`; for arg index
- ≥ named-arg count, emits `cg-promote` per CC.md §Implicit
- conversions. Fixed-arg index emits `cg-cast` to declared param
- type (also covers §K.5).
-
-- [x] **Variadic receive: `__builtin_va_start/arg/end`**
- - cg: `cc-cg/69-vararg-recv.scm` — sums N int-typed variadic args.
- - cc: `cc/76-vararg-recv.c`
- - Done: added `cg-va-start cg`, `cg-va-arg cg ctype`,
- `cg-va-end cg` (each pops ap-lval from vstack);
- `cg-fn-begin/v` reserves a 16-slot incoming-arg window: indices
- 0..3 are saved from a-registers, indices 4..15 from `LDARG` slots
- 0..11. va_arg walks the window linearly from the slot at index =
- named-arg count. Parser recognizes `__builtin_va_start/arg/end` at
- parse-primary; `parse-fn-body` threads the fn ctype's variadic?
- flag. Bundled `cc/headers/stdarg.h` aliases `va_list`/`va_start`/
- `va_arg`/`va_end` to the builtins. Lock-in fixture
- `cc/79-vararg-deep.c` exercises 1 named + 6 variadic args.
- Limit: 15 variadic args after the named ones; bump
- `VARARG_WINDOW` (currently 16) in `cg-fn-begin/v` to extend.
-
-### H. Conditionals as values
-
-Added `cg-ifelse-merge`: caller pre-allocates the result slot, each
-thunk pushes one rval that is then loaded and stored into the slot,
-and the slot's frame rval is left on the vstack. The merged result
-type is taken from the first thunk's pushed type — parser is
-responsible for arranging compatible types in the two branches.
-
-- [x] **Ternary `?:` leaves exactly one rval**
- - cg: `cc-cg/28-ternary.scm` — `c ? 7 : 9` with c=1 → exit 7.
- - cc: `cc/28-ternary.c`
- - Done: parser swaps `cg-ifelse` for `cg-ifelse-merge` in the
- `qmark` arm of `parse-binary-rhs`; both branches push their
- parsed rval directly.
-
-- [x] **`&&` short-circuit leaves exactly one i32 rval**
- - cg: `cc-cg/29-land.scm`
- - cc: `cc/29-land.c`
- - Done: parser injects `cg-cast %t-bool` then `cg-cast %t-i32` on
- the rhs side so the merged result is i32 ∈ {0,1}; the else-arm
- pushes `%t-i32 0`.
-
-- [x] **`||` short-circuit leaves exactly one i32 rval**
- - cg: `cc-cg/30-lor.scm`
- - cc: `cc/30-lor.c`
- - Done: mirrors §H.2 with the bool-cast on the else-arm and a
- constant `%t-i32 1` in the then-arm.
-
-### I. Storage classes
-
-- [x] **Block-scope `static` lives in bss/data, not on the stack**
- - cg: `cc-cg/57-block-static.scm`
- - cc: `cc/57-block-static.c`
- - Done: handle-decl gates on `sto = 'static'` *before* the
- file-vs-block branch and routes block-scope statics to
- cg-emit-global with a `cc__<fn>__<var>` mangled label.
-
-### J. Driver / envelope
-
-- [x] **Entry stub forwards `argc` / `argv` to `main`**
- - cc: `cc/00-return-argc` (already green; locked in).
- - Done: P1's program-entry contract delivers `a0=argc`, `a1=argv`
- at `p1_main` (P1.md §Program Entry). `%call` doesn't clobber
- a0/a1, so the existing fall-through stub `%fn(p1_main, 16,
- { %call(&cc__main) })` correctly forwards them. Documented in
- `cg-finish`.
-
-- [x] **`int main()` falling off the end returns 0**
- - cc: `cc/68-main-noret.c` — `int main(){}` → exit 0.
- - Done: `cg-fn-begin` now zero-inits the ret slot in the prologue
- when the return type isn't void, so falling through to `::ret`
- reads back a defined 0 instead of relying on kernel zero-fill.
-
-- [x] **Multi-function translation unit with forward references**
- - cc: `cc/69-multi-fn.c`
- - Done: forward declaration `int helper(int x);` binds an extern
- fn sym up-front so `parse-primary` finds it before the
- definition appears.
-
-### K. Expressions and conversions
-
-- [x] **Comma operator (`a, b` as expression)**
- - cc: `cc/31-comma.c` — `int a; int b; (a=1, b=2); return a + b*10;`
- → exit 21.
- - Done: added `(comma . (1 . 2))` to `%binop-bp` (left-assoc, below
- `assign`'s 4/3 so `parse-call-args ps 4` still won't slurp it as a
- call separator); handler `cg-pop`s the lhs and evaluates the rhs.
-
-- [x] **Function-pointer call**
- - cg: `cc-cg/67-fnptr-call.scm` — push a fn-typed sym, spill to a
- frame slot, reload, call.
- - cc: `cc/71-fnptr-call.c` — `int (*fp)(int) = f; return fp(7);`
- → exit 21.
- - Done: exercises `cg-call`'s `%callr(t0)` branch; verified
- return-type extraction walks `ptr → fn → ret` correctly.
-
-- [x] **Enum constant in expressions**
- - cc: `cc/72-enum-const.c` — `enum E { A=1, B=10, C };
- return A+B+C;` → exit 22.
- - Done: locked in the existing `parse-primary` `'enum-const` branch.
-
-- [x] **`void *` ↔ `T *` implicit conversion (no cast required)**
- - cc: `cc/73-voidptr-impl.c` — `void *p; int x=42; p=&x;
- int *q=p; return *q;` → exit 42.
- - Done: cg-cast's `to-kind = 'ptr` clause is relabel-only between
- any pointer types; `cg-assign` drives the cast each direction.
-
-- [x] **Implicit narrowing of fixed-arg call arguments to declared
- param type**
- - cc: `cc/74-call-narrow.c` — `int f(unsigned char x){return x;}
- int main(){ return f(258); }` → exit 2.
- - Done: `parse-call-args` now emits `cg-cast` per fixed arg to the
- declared param type (variadic args promoted via §G.1).
-
-- [x] **Pointer comparison is unsigned**
- - cg: `cc-cg/68-ptr-cmp.scm` — verifies `%ifelse_ltu` dispatch.
- - cc: `cc/75-ptr-cmp.c` — `int a[2]; return &a[1] > &a[0];`
- → exit 1.
- - Done: `cg-binop`'s `lt/le/gt/ge` dispatch already picks the
- unsigned variant for ptr/arr operands. Fixtures lock it in.
-
-### L. Aggregates round 2
-
-- [x] **Flexible array member as last struct field**
- - cc: `cc/77-flex-array.c` — backs `struct s { int n; int data[]; };`
- with an `int[]` buffer, casts to `struct s *`, indexes through
- `p->data[i]`. tcc.c's `Sym` / `TokenSym` rely on this.
- - Done: parser already accepted `T name[]` (no size) via the existing
- `parse-decl-suf-cont` `[]` arm; `complete-agg!` already excluded
- flex extent because `(max sz 0)` collapses `-1` to 0; `cg-push-field`
- + the indirect-frame path + `cg-decay-array` already composed
- correctly. The actual fix was a latent cast-conversion bug in
- `parse-cast-or-unary` — it skipped `rval!` when the target type
- was a pointer, so `(struct s *)buf` relabeled the array's lval
- rather than decaying it to a ptr-rval. Now `rval!` runs for every
- cast (matches C lvalue-conversion semantics: arrays decay, lvals
- become rvals before the bit-cast).
-
-- [x] **`T[]` in parameter position decays to `T *`**
- - cc: `cc/43-array-param-decay.c` — `int sum(int a[], int n)
- { ... s+=a[i]; ... } sum(xs,4)` over `{1,2,3,4}` → exit 10.
- - Done: `parse-fn-params` rewrites arr→ptr (and fn→ptr) before
- slot allocation, so cg sees an 8-byte ptr slot and the callee's
- `a[i]` decays the param's loaded ptr-rval and scales the index
- correctly.
-
-- [x] **Array of function pointers initialized with named functions**
- - cg: `cc-cg/58-fnptr-tab.scm`
- - cc: `cc/58-fnptr-tab.c`
- - Done: composes §E.3's array list init with §E.2's label-ref;
- %const-init-piece's bare-IDENT branch already covers `fn` syms.
- Parse fixture uses an explicit typedef for the array element
- because `int (*tab[])()` declarators are owned elsewhere.
-
-## Phase milestones (CC.md §Validation)
-
-The CC.md milestones gate on contiguous blocks above. Each lights up
-once its dependencies are green:
-
-- [ ] **Self-test sweep** (cc fixtures mirroring tests/scheme1) — depends on §A,
- §B, §C, §F, §H.
-- [ ] **Hand-written hello-world ELF** — depends on §G, §I, §J + a
- string-formatting libc surface.
-- [ ] **Compile mes libc `unified-libc.c`** — depends on §D, §E.
-- [ ] **Compile tcc.c (tcc-mes defines)** — depends on everything above.
-- [ ] **tcc-lispcc builds tcc-boot0**; checksum matches live-bootstrap.
-
-The last is the bootstrap milestone — at that point lispcc has fully
-replaced MesCC in the chain.
diff --git a/docs/LISP-C.md b/docs/LISP-C.md
@@ -1,1105 +0,0 @@
-# Scheme in P1
-
-## Goal
-
-A small tree-walking interpreter for the Scheme subset specified in
-[LISP.md](LISP.md), written in P1 via the M1PP + M1 + M0 + hex2
-toolchain. Intent: simplicity and correctness first; performance is a
-later concern.
-
-The self-hosted C compiler (see [PLAN.md](PLAN.md)) will run under
-this interpreter, as will shell-style scripts built on
-[shell.scm](../../shell.scm) and
-[prelude.scm](../lisp/prelude.scm).
-
-## Toolchain
-
-```
-scheme.P1 (hand-written M1PP source using P1.M1pp)
- | prepend arch backend + P1.M1pp
- | m1pp
- v
-scheme.M1 (target-specific M1 ops)
- | M0
- v
-scheme.hex2
- | hex2
- v
-scheme (native ELF)
-```
-
-C-shaped snippets in this document are descriptive of the target shape,
-not the source. The source is M1PP-preprocessed P1 assembly.
-
-## Word size
-
-P1-64 is the initial target. Every heap word is 8 bytes; every object
-is 8-byte aligned. P1-32 is out of scope for the first pass; porting
-would require a parallel layout story because M1PP's `%struct` stride
-is fixed at 8.
-
-## Settled decisions
-
-Load-bearing; the rest of the document assumes them.
-
-1. **Tree-walking eval/apply over raw s-expressions.** No AST
- preprocessing, no bytecode, no compilation step. The reader's
- output is the interpreter's input.
-2. **Static buffers, no GC.** Fixed-size labeled byte reservations for
- heap, symbol table, reader buffer, writer buffer. Heap exhaustion
- aborts with an error.
-3. **Bump allocation, 8-byte aligned.** Single `heap_next` pointer
- (pinned in `s0`); every object is 8-byte aligned by construction.
-4. **Tagged values, low-3-bit tag.** See §Value representation.
-5. **Symbols are interned indices.** No heap object per symbol; the
- value is an index into the symbol table, which stores the name
- bytevector.
-6. **Environment is a flat association list.** `((sym . val) ...)`,
- extended by consing onto the front. Lookup walks the list; falls
- through to the global slot in the symbol table.
-7. **Closures capture the env pointer directly.** No free-variable
- analysis, no flat closures, no boxes. A closure is four words:
- `[header, params, body, env]`.
-8. **No `set!` on captured variables.** Language-level rule. The
- implementation does not enforce it; behavior is unspecified if
- violated. Use a pair for shared mutable state.
-9. **Inner `define` is sequential-only.** Visible to later forms in
- the same body, not to earlier forms. Use `letrec` for mutual
- recursion between local helpers.
-10. **`pmatch` is a built-in special form.** Saves writing a macro
- expander for one feature.
-11. **Tail calls via P1 `TAIL` / `TAILR`.** Every tail position in
- `eval` and `apply` compiles to `TAIL` (label) or `TAILR` (register)
- rather than `CALL` + `RET`. Scheme-level tail-call correctness
- falls out.
-12. **Primitive failure is undefined behavior.** No runtime checks on
- `(car '())`, `(quotient n 0)`, out-of-bounds bytevector access,
- integer overflow, or mutation of literals. Follows the spec's
- "Primitive failure" policy.
-
-## Memory layout
-
-Labeled byte reservations in the final image. Starting sizes:
-
-| Region | Size | Purpose |
-|---------------|----------|--------------------------------------|
-| `heap` | 64 MB | bump-allocated Scheme objects |
-| `symtab` | 16384 × | open-addressing intern table |
-| `readbuf` | 16 MB | slurped source file |
-| `writebuf` | 64 KB | staged output for display/write |
-| `argbuf` | 64 × 8 B | primitive dispatch scratch |
-
-All sized via M1PP constants; easy to retune. Overflow of any buffer →
-`error(...)` → `sys_exit(1)`.
-
-### Symbol table entry
-
-```c
-struct sym_entry {
- val_t name_bv; // interned name, heap bytevector
- val_t global_val; // top-level binding (or UNBOUND)
- uint32_t name_hash;
- uint32_t chain_next;
-};
-```
-
-`global_val` gives O(1) top-level lookup: every symbol's global
-binding is reachable without hash probing once the symbol is resolved.
-
-M1PP layout (widened to word-per-field for 8-byte stride):
-
-```
-%struct SYMENT { name_bv global_val name_hash chain_next }
-; .name_bv = 0, .global_val = 8, .name_hash = 16, .chain_next = 24,
-; .SIZE = 32
-```
-
-The `hash` and `next` fields are logically 32-bit but occupy a full
-word each to stay on the `%struct` stride; simpler than packing.
-
-The symbol table and heap are the only global roots.
-
-## Value representation
-
-All values are 64-bit tagged words. Low 3 bits are the tag.
-
-| Tag | Kind | Notes |
-|-------|----------------------------|--------------------------------------------|
-| `000` | fixnum | 61-bit signed; `>> 3` to recover |
-| `001` | pair pointer | 16 B aligned; `[p-1]` car, `[p+7]` cdr |
-| `010` | symbol | `(idx << 3) | 2`; idx into `symtab` |
-| `011` | headered heap object | `[p-3]` is the header word |
-| `100` | immediate singleton | `#f`, `#t`, `()`, unspec, unbound |
-| `101`–`111` | reserved | |
-
-### Tag and immediate constants
-
-Encoded as M1PP `%enum`s. Concrete values derive via expression:
-
-```
-%enum TAG { FIXNUM PAIR SYM HEAP IMM } ; raw indices 0..4
-%enum IMM { FALSE TRUE NIL UNSPEC UNBOUND } ; raw indices 0..4
-
-; a tag 100 value in low bits is literally %TAG.IMM == 4
-%macro MKIMM(idx)
-$((| (<< idx 3) %TAG.IMM))
-%endm
-
-; usage: %MKIMM(%IMM.FALSE), %MKIMM(%IMM.NIL), etc.
-```
-
-Truthiness: every value except `%MKIMM(%IMM.FALSE)` is truthy. Test is
-a single compare.
-
-## Object types
-
-This is the complete runtime type set.
-
-| Type | Representation |
-|------------------|----------------------------------------------------|
-| fixnum | immediate (tag 000) |
-| boolean | immediate (`FALSE`, `TRUE`) |
-| empty list | immediate (`NIL`) |
-| unspec / unbound | immediates |
-| pair | `[car][cdr]`, 16 B, tag 001, no header |
-| bytevector | `[hdr = BV | len<<8][bytes...]`, tag 011 |
-| closure | `[hdr = CLOSURE][params][body][env]`, tag 011 |
-| primitive | `[hdr = PRIM | entry_label]`, tag 011 |
-| record-td | `[hdr = TD][name_sym][nfields]`, tag 011 |
-| record | `[hdr = REC][td][field_0]...[field_{n-1}]`, tag 011|
-
-Headered object header word (little-endian bytes):
-
-```
-[ type:8 | aux:56 ]
-```
-
-where `aux` carries type-specific data (length for bytevectors,
-entry-label for primitives, nfields for records, etc.). One load
-tells you the type.
-
-### Layouts
-
-```
-%struct PAIR { car cdr } ; .SIZE = 16
-%struct CLOSURE { hdr params body env } ; .SIZE = 32
-%struct BV { hdr } ; bytes follow inline
-%struct PRIM { hdr } ; header encodes entry
-%struct TD { hdr name_sym nfields } ; .SIZE = 24
-; record is variable-width; td+nfields known from TD
-%struct REC_HDR { hdr td } ; field_i at offset 16 + 8*i
-%enum HDR { BV CLOSURE PRIM TD REC } ; type-byte values
-```
-
-Strings and bytevectors are the same type. `string?` ≡ `bytevector?`.
-No character type: `#\a` in source is sugar for fixnum 97.
-
-### Primitive encoding
-
-An M1 label is 4 bytes. A primitive fits in a single 8-byte header:
-
-```
-bytes [0..4): &entry_label ; 4-byte hex2 label reference
-bytes [4..6): arity ; @(arity) little-endian 2 bytes
-byte [6] : type = %HDR.PRIM ; !(%HDR.PRIM)
-byte [7] : 0 ; !(0)
-```
-
-This makes `apply_prim` a header-load plus `callr`: no dispatch table,
-no switch. §Apply sketches the sequence.
-
-## Tag idioms
-
-A handful of short sequences do all of the runtime's bit-twiddling.
-They are M1PP macros so the rest of the code reads at the Scheme
-level. These are illustrative; exact op selection is finalized during
-implementation.
-
-```
-%macro tagof(rd, rs) ; rd = rs & 7
-%andi(rd, rs, 7)
-%endm
-
-%macro make_fixnum(rd, rs) ; rd = rs << 3
-%shli(rd, rs, 3)
-%endm
-
-%macro untag_fixnum(rd, rs) ; rd = rs >> 3, arithmetic
-%sari(rd, rs, 3)
-%endm
-
-%macro car(rd, rs) ; pair tag is 001; raw = rs - 1
-%ld(rd, rs, -1) ; [raw + 0] = car
-%endm
-
-%macro cdr(rd, rs)
-%ld(rd, rs, 7) ; [raw + 8] = cdr
-%endm
-
-%macro set_car(rs, rv)
-%st(rv, rs, -1)
-%endm
-
-%macro set_cdr(rs, rv)
-%st(rv, rs, 7)
-%endm
-
-%macro hdr_load(rd, rs) ; heap tag is 011; raw = rs - 3
-%ld(rd, rs, -3)
-%endm
-
-%macro hdr_type(rd, rs) ; type byte is low byte of header
-%lb(rd, rs, -3)
-%endm
-
-%macro hdr_field(rd, rs, idx) ; word field idx, 0 counts the header word
-%ld(rd, rs, (- (* idx 8) 3))
-%endm
-```
-
-`cons` is a bump + two stores + tag:
-
-```
-%macro cons(rd, rcar, rcdr, rtmp)
-; assumes s0 = heap_next, s1 = heap_end
-%addi(rtmp, s0, 16)
-%la_br(heap_oom)
-%bltu(s1, rtmp) ; if heap_end < next+16, OOM
-%st(rcar, s0, 0)
-%st(rcdr, s0, 8)
-%addi(rd, s0, 1) ; tag 001
-%mov(s0, rtmp)
-%endm
-```
-
-`alloc_hdr(rd, type_byte, bytes, rtmp)` is the corresponding idiom for
-headered objects; it bumps, writes the header word, and returns the
-tagged pointer. The header word is composed on the fly from the type
-byte and any aux field.
-
-## Dispatch tables
-
-P1 has no indexed branch op, but it has `BR rs` / `CALLR rs` / `TAILR
-rs` and one-word loads. That's enough to build a jump table for any
-switch wider than ~3 cases, which we use for the special-form
-cascade in `eval`.
-
-Each entry is 8 bytes: a 4-byte M1 label reference plus 4 bytes of
-pad. P1-64 has no 4-byte load, only full-word `LD`, so padding to
-a word is the cheap path.
-
-```
-%macro table_entry(lref)
-lref %(0) ; label ref + 4 zero bytes = 8 bytes
-%endm
-```
-
-Usage (per special form, in index order):
-
-```
-special_dispatch_table:
-%table_entry(&eval_quote)
-%table_entry(&eval_if)
-%table_entry(&eval_lambda)
-%table_entry(&eval_define)
-%table_entry(&eval_set)
-%table_entry(&eval_begin)
-...
-```
-
-Dispatch is load + indirect branch:
-
-```
-; t0 = sym_idx, already range-checked (< N_SPECIAL)
-%la(t1, special_dispatch_table)
-%shli(t2, t0, 3)
-%add(t1, t1, t2)
-%ld(t1, t1, 0)
-%tailr(t1)
-```
-
-Five ops and one memory load, regardless of how many special forms.
-
-**Reservation convention.** Special-form symbols are interned first at
-startup, so their sym indices are `0..N_SPECIAL-1`. The dispatcher
-checks `idx < N_SPECIAL` before using the table; out-of-range means
-"ordinary application".
-
-## Allocation
-
-C sketch (target shape):
-
-```c
-static val_t alloc(size_t bytes) {
- char *p = heap_next;
- heap_next += (bytes + 7) & ~7;
- if (heap_next > heap_end) error("heap exhausted");
- return (val_t) p;
-}
-```
-
-P1 translation: `s0` and `s1` are pinned to `heap_next` and `heap_end`
-for the lifetime of the program. Allocation is `addi` + `bltu` +
-`mov`. In practice almost every allocation site calls one of `%cons`,
-`%alloc_hdr`, or an inline bytevector emitter, not a generic `alloc`.
-
-## Register assignment
-
-Four callee-saved registers carry the hot state:
-
-| Reg | Role |
-|------|----------------------------------------------------|
-| `s0` | `heap_next` |
-| `s1` | `heap_end` |
-| `s2` | current env pointer |
-| `s3` | scratch held across one inner loop at a time |
-
-`eval(expr, env)` is `a0 = expr`, `a1 = env`, returns in `a0`.
-Handlers copy `a1` into `s2` at entry. `a0..a3` and `t0..t2` are free
-across calls. Leaf code inside a handler may use `s3` without saving,
-provided it restores before the handler returns.
-
-Reader, writer, and primitive bodies follow the same discipline. Any
-primitive that needs deep recursion (e.g. `equal?`) saves `s2` in its
-frame even if it does not touch env, since its callees might.
-
-## Closures
-
-```
-[ hdr = CLOSURE ][ params ][ body ][ env ]
-```
-
-- `params`: parameter list s-expression: `(a b c)`, `(a b . rest)`, or
- a bare symbol `rest` for a fully variadic lambda.
-- `body`: the list of body forms, evaluated sequentially; last in
- tail position.
-- `env`: the environment chain captured at `lambda` evaluation.
-
-Four words total. No free-var analysis at closure creation — the whole
-env chain is retained and walked at lookup time.
-
-## Environment
-
-```
-env ::= NIL
- | ( (sym . val) . env )
-```
-
-- **Lookup.** Walk the list, pointer-compare the car of each pair with
- the target symbol, return the cdr on match. If the walk ends at
- `NIL`, fall through to `symtab[idx].global_val`; if that is
- `UNBOUND`, error out with `"unbound variable"`.
-- **Extension** (scope entry). Cons new `(sym . val)` pairs onto the
- front.
-- **`set!`.** Walk the env chain; on match, `set-cdr!` the pair. If
- the chain is exhausted, write to `symtab[idx].global_val`.
-- **Top-level `define`.** Write to `symtab[idx].global_val`.
-- **Inner `define`.** Handled inline by `eval_body` (see below).
- Each inner define prepends `(name . val)` to the local env as it
- is encountered.
-
-The global env is not a chain — it's the symbol table. Closures
-captured before a top-level `define` nonetheless see the new binding,
-because the fallback is to `symtab[idx].global_val` at lookup time,
-not at capture time. This is how mutual recursion between top-level
-functions works without any pre-pass.
-
-## Eval
-
-C sketch (target shape):
-
-```c
-val_t eval(val_t expr, val_t env) {
- switch (tag(expr)) {
- case TAG_FIXNUM: return expr;
- case TAG_IMM: return expr;
- case TAG_SYM: return env_lookup(expr, env);
- case TAG_HEAP: return expr; // bv, closure, record — self-eval
- case TAG_PAIR: break; // form; dispatch below
- }
-
- val_t head = car(expr);
- val_t rest = cdr(expr);
-
- if (head == SYM_QUOTE) return car(rest);
- if (head == SYM_IF) {
- val_t t = eval(car(rest), env);
- val_t next = (t != FALSE) ? cadr(rest) : caddr(rest);
- MUSTTAIL return eval(next, env);
- }
- if (head == SYM_LAMBDA) return make_closure(car(rest), cdr(rest), env);
- if (head == SYM_DEFINE) return eval_define(rest, env);
- if (head == SYM_SET) return eval_set(rest, env);
- if (head == SYM_BEGIN) MUSTTAIL return eval_begin(rest, env);
- if (head == SYM_LET) MUSTTAIL return eval_let(rest, env);
- if (head == SYM_LETSTAR) MUSTTAIL return eval_letstar(rest, env);
- if (head == SYM_LETREC) MUSTTAIL return eval_letrec(rest, env);
- if (head == SYM_COND) MUSTTAIL return eval_cond(rest, env);
- if (head == SYM_AND) MUSTTAIL return eval_and(rest, env);
- if (head == SYM_OR) MUSTTAIL return eval_or(rest, env);
- if (head == SYM_PMATCH) MUSTTAIL return eval_pmatch(rest, env);
- if (head == SYM_DEFINE_RT) return eval_define_record_type(rest, env);
-
- // application
- val_t fn = eval(head, env);
- val_t args = eval_list(rest, env);
- MUSTTAIL return apply(fn, args);
-}
-```
-
-The special-form symbols (`SYM_QUOTE`, `SYM_IF`, etc.) are interned
-once at startup and stored in labeled word slots. Dispatch is a
-cascade of integer compares.
-
-P1 translation. The outer tag switch is a short cascade of `beq`s on
-the tag value (fixnum / imm / sym / heap return immediately; pair
-falls through to form dispatch). Form dispatch uses the jump table
-from §Dispatch tables:
-
-```
-; eval_pair path, a0 = expr, a1 = env already normalized
-%car(t1, a0) ; t1 = head
-%cdr(a1, a0) ; a1 = rest, set up for handlers
-
-; fall through to ordinary application if head is not a symbol
-%andi(t2, t1, 7)
-%la_br(eval_apply)
-%addi(t3, t2, (- %TAG.SYM))
-%bnez(t3) ; tag != SYM -> apply
-
-; extract idx and bounds-check
-%sari(t0, t1, 3) ; t0 = sym_idx
-%la(t2, N_SPECIAL_const)
-%ld(t2, t2, 0)
-%la_br(eval_apply)
-%bltu(t0, t2) ; invert: take table path iff idx < N
-...
-
-; table dispatch
-%la(t1, special_dispatch_table)
-%shli(t2, t0, 3)
-%add(t1, t1, t2)
-%ld(t1, t1, 0)
-%tailr(t1)
-```
-
-(The bounds-check polarity and the branch for "take table path" are
-sketched loosely; the final shape is whatever combination of
-`BLT`/`BLTU`/`BEQ` produces the fewest ops.)
-
-Each handler is its own P1 function. Tail positions inside handlers
-end with `%la_br(target); %tail()`. An ordinary application is
-`eval(head)`, `eval_list(rest)`, then `%la_br(apply); %tail()`.
-
-## Apply
-
-C sketch:
-
-```c
-val_t apply(val_t fn, val_t args) {
- switch (header_type(fn)) {
- case HDR_PRIM:
- return call_prim(fn, args);
- case HDR_CLOSURE: {
- val_t env = bind_params(clo_params(fn), args, clo_env(fn));
- MUSTTAIL return eval_body(clo_body(fn), env);
- }
- default:
- error("not a procedure", fn);
- }
-}
-```
-
-`bind_params` walks params and args in lockstep. Variadic `.`-tail:
-when params becomes a non-pair symbol, that symbol is bound to the
-remaining args list. When params becomes `NIL` with args remaining
-(or vice versa), error out with an arity message.
-
-P1 translation. `hdr_type` reads one byte; dispatch is two compares:
-
-```
-; apply(fn=a0, args=a1)
-%hdr_type(t0, a0)
-%la(t1, %HDR.PRIM)
-%la_br(apply_prim)
-%beq(t0, t1)
-%la(t1, %HDR.CLOSURE)
-%la_br(apply_closure)
-%beq(t0, t1)
-%la_br(err_not_procedure)
-%b()
-```
-
-`apply_prim` extracts the entry label from the primitive's header
-(low 4 bytes) and `callr`s it with the arg list in `a0`:
-
-```
-apply_prim:
-%hdr_load(t0, a0) ; t0 = header word
-%andi(t0, t0, 0xFFFFFFFF) ; low 32 bits = &entry
-%mov(a0, a1) ; arg list in a0 for the primitive
-%callr(t0)
-%ret()
-```
-
-`apply_closure` runs `bind_params`, extends env, then `%la_br`s
-`eval_body` and `%tail()`s.
-
-## Eval_body
-
-```c
-val_t eval_body(val_t body, val_t env) {
- while (is_pair(cdr(body))) {
- val_t form = car(body);
- if (is_pair(form) && car(form) == SYM_DEFINE) {
- val_t name = define_name(form);
- val_t val = eval(define_rhs(form), env);
- env = cons(cons(name, val), env);
- } else {
- eval(form, env);
- }
- body = cdr(body);
- }
- val_t last = car(body);
- if (is_pair(last) && car(last) == SYM_DEFINE) {
- val_t name = define_name(last);
- val_t val = eval(define_rhs(last), env);
- env = cons(cons(name, val), env);
- return UNSPEC;
- }
- MUSTTAIL return eval(last, env);
-}
-```
-
-Inner `define` is sequential-only: visible to forms that textually
-follow it in the same body, not to earlier forms. For mutual
-recursion between local helpers, use `letrec` or lift to top level.
-The `SYM_DEFINE` handler in `eval` itself handles only top-level
-`define`; inner defines are absorbed here and never reach that path.
-
-## Special forms
-
-Implemented directly, each ~30-60 P1 ops:
-
-- `quote` — return the datum.
-- `if` — eval test; tail-eval the chosen branch.
-- `lambda` — build closure (one `%alloc_hdr`).
-- `define` — top-level: write `symtab[idx].global_val`. Inner:
- handled by `eval_body`.
-- `set!` — walk env; first match mutates. Miss: write global slot.
-- `begin` — eval each form in sequence; tail-eval the last.
-- `let` — eval all inits in the current env; extend env with
- bindings; tail-eval body via `eval_body`.
-- `let*` — extend env one binding at a time, each init evaluated in
- the progressively extended env.
-- `letrec` — pre-bind each name to `UNSPEC` in a new env; eval each
- init in that env; `set!` each binding; tail-eval body.
-- `let` (named) — desugar inline to `letrec` + call, or implement
- directly.
-- `cond` — loop clauses; first truthy test wins; tail-eval its body;
- `else` always true.
-- `and` / `or` — short-circuit fold; return the deciding value, not a
- coerced boolean.
-- `pmatch` — `eval_pmatch` clause loop + `pmatch_match` recursive
- walker. See §pmatch below.
-- `define-record-type` — see below.
-
-## pmatch
-
-Surface spec lives in [LISP-PMATCH.md](LISP-PMATCH.md). This section
-covers integration into the scheme1 spine that's already in
-`scheme1/scheme1.P1pp`.
-
-### Why built-in (not a library)
-
-Pattern bindings extend the interpreter's env via `cons`. The runtime
-exposes no `(eval expr env)` primitive and no macro system, so a
-user-level pmatch could not introduce bindings visible to its
-clause body. Building it in is the smallest path; the alternative is
-adding either macros or first-class envs, both bigger commitments than
-one special form.
-
-### Reader integration — no refactor
-
-pmatch operates on **already-parsed s-expressions**. By the time eval
-reaches `(pmatch e (pat body...) ...)`, every pattern is a heap-consed
-datum produced by the existing `parse_one` / `parse_list` recursion.
-The matcher walks those datums; it never re-invokes the reader. So
-nothing in the reader needs restructuring.
-
-The one additive change is a new branch in `parse_one`'s leading-byte
-dispatch: `,` (ASCII 44) → consume the byte, recurse `parse_one` for
-the inner datum, build `(unquote <datum>)`. This is a near-copy of the
-existing `'` branch (`::quote` in parse_one) — same shape, different
-head symbol:
-
-```
-::comma ;; mirrors ::quote
-%la(t2, &readbuf_pos)
-%ld(t0, t2, 0)
-%addi(t0, t0, 1)
-%st(t0, t2, 0)
-%call(&parse_one)
-%li(a1, %imm_val(%IMM.NIL))
-%call(&cons)
-%la(t0, &sym_unquote)
-%ld(t0, t0, 0)
-%mov(a1, a0)
-%mov(a0, t0)
-%tail(&cons)
-```
-
-`,@` is **not** a separate token: `@` is already in `is_ident_byte`'s
-set, so `,@x` parses as `(unquote @x)` and the matcher then treats
-`@x` as a (legal but odd) binder identifier. An explicit
-`msg_bad_unquote_splice` is optional polish, not load-bearing.
-
-There is no `quasiquote` evaluator, so `,x` written outside a pmatch
-pattern evaluates as `(unquote x)` — an ordinary application of an
-unbound `unquote` — and dies through the existing `unbound variable`
-path. No new failure mode.
-
-### New cached symbol slots
-
-Four labeled slots, interned in `intern_special_forms` alongside the
-existing ones (`sym_quote`, `sym_if`, …):
-
-| Slot | Purpose |
-|--------------------|----------------------------------------------------------|
-| `sym_pmatch` | eval's pair-branch dispatch |
-| `sym_unquote` | pattern-side: detect `(unquote x)` — i.e., a `,x` binder |
-| `sym_guard` | clause shape: `((guard g…) …)` vs. plain body |
-| `sym_underscore` | wildcard test (the literal symbol `_` after `unquote`) |
-
-`sym_else` is already cached for `cond`; pmatch reuses it for the
-`else` clause. Caching `sym_underscore` (vs. byte-comparing the name)
-costs one slot and turns the wildcard test into a single `beq`.
-
-### Eval-side dispatch
-
-Two new lines in eval's pair branch — same shape as every other
-special form already wired up there:
-
-```
-%dispatch_form(&sym_pmatch, &::do_pmatch) ;; in the cascade
-...
-::do_pmatch ;; among the tail handlers
-%tail_to_handler(&eval_pmatch)
-```
-
-The cascade is order-independent (each row is a pointer-compare on a
-distinct cached symbol), so position in the list doesn't matter.
-
-### eval_pmatch(rest=a0, env=a1) → value (a0)
-
-`rest` is `(subject-expr . clauses)`. Subject is evaluated **once**,
-then clauses walk top-to-bottom restarting from the *outer* env each
-time.
-
-```
-Frame: 32 bytes
- +0 subject ; eval(car(rest), env) once, then read-only
- +8 env_outer ; restart point per clause; passed to pmatch_match
- +16 clauses ; cdr(rest); advances per non-match
- +24 spill ; clause body / extended env across guard eval
-```
-
-Per-clause shape (same control flow as `eval_cond`'s clause loop, just
-with a richer match step):
-
-```
-loop:
- if clauses == NIL: die msg_pmatch_no_match
- clause = car(clauses)
- pat = car(clause)
- tail = cdr(clause) ; (body...) or ((guard ...) body...)
-
- if pat eq? sym_else:
- a0 = tail; a1 = env_outer; tail-call eval_body
- ;; we don't validate that an `else` clause has no (guard ...) head;
- ;; per LISP-PMATCH.md grammar that's malformed source, but any
- ;; (guard ...) here will just be evaluated as a body form, which
- ;; will die through the unbound `guard` path.
-
- call pmatch_match(pat, subject, env_outer) ; -> (env_ext=a0, ok=a1)
- if ok == 0: advance clauses; loop
-
- ; guard clause? — tail begins with (guard g...)
- if pair? tail and pair? car(tail) and car(car(tail)) eq? sym_guard:
- guards = cdr(car(tail))
- body = cdr(tail)
- for each g in guards: ; AND-fold, short-circuit
- if eval(g, env_ext) == FALSE: advance clauses; loop
- else:
- body = tail
-
- a0 = body; a1 = env_ext; tail-call eval_body
-```
-
-`eval_body` already preserves tail position for the body's last form,
-so the matched clause's last expression is in tail position of the
-surrounding `pmatch` — same as `cond`, `let`, `begin`.
-
-### pmatch_match(pat=a0, subj=a1, env=a2) → (env'=a0, ok=a1)
-
-Pure leaf-ish recursive helper. Dispatches on the **pattern**'s tag.
-Output convention mirrors `parse_dec`: `a0` carries the extended env
-on success, `a1` is the 1/0 ok flag; on failure `a0` is undefined and
-the caller must not use it.
-
-| Pattern shape | Match condition |
-|-----------------------|----------------------------------------------------------------|
-| `IMM.NIL` | `subj == NIL` |
-| `TAG.FIXNUM` | raw word compare `subj == pat` (covers integers and chars) |
-| `TAG.IMM` (`#t`/`#f`) | raw word compare |
-| `TAG.SYM` | raw word compare (symbols are interned, so this is `eq?`) |
-| `TAG.HEAP` (bv) | byte-for-byte equality via `bv_equal_raw` — see below |
-| `TAG.PAIR` | binder / wildcard if `car(pat) eq? sym_unquote`; else structural recurse |
-
-#### Frame
-
-```
-Frame: 24 bytes
- +0 pat ; preserved across cons (binder) and across the car recursion
- +8 subj ; same — needed in cons in binder case, and as cdr(subj) after recurse
- +16 env ; saved across the cons calls in the binder case
-```
-
-All three are stored on entry; nothing gets unsaved on the way out
-because the function returns through `%eret` after computing a0/a1.
-
-#### Binder / wildcard
-
-```
-phead = car(pat)
-if phead eq? sym_unquote:
- ;; validate shape: pat must be (unquote <sym>), exactly two elements
- if tag(cdr(pat)) != PAIR: die msg_bad_unquote_pattern
- if tag(cdr(cdr(pat))) != IMM.NIL: die msg_bad_unquote_pattern
- pident = car(cdr(pat))
- if tag(pident) != SYM: die msg_bad_unquote_pattern
-
- if pident eq? sym_underscore:
- return (env, ok=1) ;; ,_ — no binding
- return (cons(cons(pident, subj), env), ok=1)
-```
-
-Validation is per-call and cheap — three tag compares before any cons.
-A defined runtime error here (rather than UB) is the one carve-out
-from the "primitive failure is UB" policy: pattern shape is a syntax
-error in the user's source, not a hot-path concern.
-
-#### Structural pair
-
-```
-;; phead is anything else: a literal symbol, a nested pattern, NIL, ...
-if tag(subj) != PAIR: return (_, ok=0)
-(env1, ok) = pmatch_match(car(pat), car(subj), env)
-if !ok: return (_, ok=0)
-return pmatch_match(cdr(pat), cdr(subj), env1) ;; tail position
-```
-
-Two LISP-PMATCH.md guarantees fall out of this shape:
-
-- **Improper-tail patterns** — the cdr-recursion accepts any pattern as
- ptail. `,rest` there binds the list tail (proper or improper); `()`
- there demands a proper list.
-- **Nested patterns** — descending in lockstep handles arbitrary
- depth; no special case in the matcher.
-
-#### Partial-match rollback — none needed
-
-`env` is passed by register, never mutated. A failed sub-match returns
-`ok=0` without having shadowed the caller's env pointer. The next
-clause restarts from `env_outer` stashed in the eval_pmatch frame, so
-any sub-bindings built during the failed match are simply unreachable
-and will be reclaimed when the heap is reset.
-
-#### Allocation cost
-
-- Successful binder: one cons for `(ident . subj)` plus one cons to
- push it onto env — same as a `let` binding.
-- Wildcard: zero allocation.
-- Failed match: zero allocation past the failure point (the
- recursion bails on the first mismatch).
-- No pattern compilation, no decision-tree caching. Patterns are
- re-walked per call. Revisit if profiling demands it; until then this
- matches the rest of scheme1 (tree-walk eval, no preprocessing).
-
-#### Bytevector equality — factor out of `equal?`
-
-The existing `equal?` primitive contains the byte-for-byte loop
-already; pmatch_match needs the same logic but on raw `val_t`s, not on
-an arg list. Factor a leaf helper:
-
-```
-;; bv_equal_raw(a=a0, b=a1) -> a0 = 1 if equal else 0
-;; assumes both inputs are HEAP-tagged bytevectors
-```
-
-Reshape `equal?`'s bv branch to `mov`/`mov` its two unpacked operands
-into a0/a1 and `tail` into `bv_equal_raw`. pmatch_match calls the same
-helper directly. Both keep their existing public behavior; the
-duplicated inline loop disappears.
-
-### Guards
-
-Evaluated in the **extended env**, short-circuit AND on `FALSE`. The
-inner loop reuses the same compare against `%imm_val(%IMM.FALSE)` that
-`eval_and` already uses; no new helper. A rejected guard advances the
-clause cursor — it does **not** backtrack against the subject.
-
-### No-match die path
-
-`msg_pmatch_no_match` routed through `runtime_error`, same path as
-every other die in the interpreter. LISP-PMATCH.md classes this as
-UB; in practice we always crash with a clear message rather than
-wander.
-
-### File placement
-
-Both routines live in the §Special forms section of
-`scheme1/scheme1.P1pp`, slotting in alongside `eval_cond` /
-`eval_let` / etc.:
-
-1. `eval_pmatch` — the special-form entry point, called via the
- `::do_pmatch` tail handler in eval's pair branch.
-2. `pmatch_match` — leaf-ish recursive helper, defined immediately
- below `eval_pmatch` (its only caller).
-3. `bv_equal_raw` — leaf helper. Lives near the existing `equal?`
- primitive in §Primitives, since both call into it.
-
-`intern_form` rows for the four new symbol slots go into
-`intern_special_forms`, in any position. `parse_one`'s `,` branch
-goes alongside the existing `'` branch in the reader section.
-
-## define-record-type
-
-Desugars at eval time into a sequence of primitive operations.
-
-```scheme
-(define-record-type point
- (make-point x y)
- point?
- (x point-x)
- (y point-y set-point-y!))
-```
-
-becomes
-
-```scheme
-(define <point-td> (%make-record-td 'point 2))
-(define make-point (lambda (x y) (%make-record <point-td> x y)))
-(define point? (lambda (v) (%record-is-a? v <point-td>)))
-(define point-x (lambda (p) (%record-ref p 0)))
-(define point-y (lambda (p) (%record-ref p 1)))
-(define set-point-y! (lambda (p v) (%record-set! p 1 v)))
-```
-
-The `%` primitives are internal, not part of the user-facing
-primitive list.
-
-## Primitives
-
-55 user-facing primitives plus 6 internal record ops. Each is its own
-P1 function, reached via `callr` on the label packed into the
-primitive object's header.
-
-**Predicates / equality** (13): `eq?`, `equal?`, `not`, `boolean?`,
-`pair?`, `null?`, `symbol?`, `integer?`, `string?` (≡ `bytevector?`),
-`procedure?`, `record?`, `record-is-a?`, `record-type?`.
-
-**Pairs** (5): `cons`, `car`, `cdr`, `set-car!`, `set-cdr!`.
-
-**Integers** (24): `+`, `-`, `*`, `quotient`, `remainder`, `modulo`,
-`=`, `<`, `<=`, `>`, `>=`, `zero?`, `positive?`, `negative?`, `abs`,
-`min`, `max`, `bit-and`, `bit-or`, `bit-xor`, `bit-not`,
-`arithmetic-shift`, `number->string`, `string->number`.
-
-**Bytevectors / strings / symbols** (10): `make-bytevector`,
-`bytevector-length`, `bytevector-u8-ref`, `bytevector-u8-set!`,
-`bytevector-copy`, `bytevector-copy!`, `bytevector-append`,
-`bytevector=?`, `string->symbol`, `symbol->string`.
-
-**Control / program / error** (3): `apply`, `argv`, `error`.
-
-**Internal record ops** (6): `%make-record-td`, `%make-record`,
-`%record-ref`, `%record-set!`, `%record-is-a?`, `%record-td`.
-
-### Calling convention
-
-`apply_prim` receives the arg list in `a0` and jumps via `callr` to
-the primitive's entry label (extracted from the primitive header).
-Each primitive's body:
-
-1. Walks the arg list, copying values into `argbuf[]` and counting
- `argc`.
-2. Checks arity (fixed-arity primitives; variadic ones do their own).
-3. Computes result into `a0`.
-4. `ret`s.
-
-There is no code-id switch and no dispatch table — the table of
-primitives is the set of labels referenced from `register_primitives`
-at startup, and the interpreter keeps zero pointers to it afterwards.
-
-## Prelude
-
-Written in Scheme, embedded as a bytevector literal in `scheme.P1`,
-parsed and evaluated at startup like any other source. Contents:
-
-```scheme
-(define (list . xs) xs)
-(define (length xs) ...)
-(define (reverse xs) ...)
-(define (append . lists) ...)
-(define (list-ref xs i) ...)
-(define (map f . lists) ...)
-(define (for-each f . lists) ...)
-```
-
-Runs before the user file, sharing the same symbol table and heap.
-~80 Scheme LOC.
-
-## Reader
-
-Single-pass recursive descent over `readbuf` with an integer cursor.
-No token array. Conses directly onto the heap.
-
-Handles the lexical syntax specified in LISP.md:
-
-- lists with `.` dotted tail
-- `'x` → `(quote x)`
-- `,x` → `(unquote x)` — datum sugar only, consumed by `pmatch`; no
- `unquote` evaluator. No `quasiquote`, no `,@` / unquote-splicing.
-- fixnums: decimal and `#x`-prefixed hex, both with optional sign
-- `#t`, `#f`
-- `"..."` strings with `\n \t \r \\ \"` escapes
-- `#\a`..`#\~`, `#\newline`, `#\space`, `#\tab`, `#\return`,
- `#\null`, `#\xNN` → read as fixnums
-- symbols: identifier grammar, case-sensitive
-- `;` line comments (no block comments, no `#;`)
-
-### Source locations
-
-As the reader conses each pair, it records `pair_ptr → (line . col)`
-in a side table (fixed-size open-addressing hash). `error` consults
-this when its first argument is a pair and prefixes the message with
-`"at file:line:col: "`. Without a GC, entries for collected pairs
-never arise; a full table simply overwrites old entries via linear
-probing.
-
-## Writer
-
-Three entry points:
-
-- `display` — no quoting on strings; bare character output.
-- `write` — readable form; strings quoted.
-- `format fmt args...` — supports `~a` (display), `~s` (write), `~d`
- (decimal fixnum), `~%` (newline).
-
-Output buffered in `writebuf`; flushed via `sys_write(1, ...)` on
-buffer-full or on newline for `display`/`write`. `error` flushes
-before writing to fd 2.
-
-## Error path
-
-`(error msg arg ...)`:
-
-1. Flush writer buffer.
-2. Write `"error: "` + message + formatted args to fd 2 via
- `sys_write`.
-3. `sys_exit(1)`.
-
-No unwinding, no handlers. Primitive failure follows the same policy
-as the spec: undefined behavior, likely crash, no runtime check.
-
-## Startup
-
-P1 defines a portable program-entry model (see P1.md §Program Entry):
-the arch backend emits `_start`, captures argc/argv off the native
-entry stack, and calls the portable label `p1_main` with `a0 = argc`,
-`a1 = argv`. On return, the backend does `sys_exit(a0)`.
-
-The Scheme interpreter therefore contains no per-target startup code.
-It just defines `p1_main`:
-
-```
-p1_main:
- ; a0 = argc, a1 = argv
- init_heap ; s0 = &heap, s1 = &heap_end
- init_symtab
- intern_special_forms ; fill labeled slots sym_quote, sym_if, ...
- ; (order reserves sym_idx 0..N_SPECIAL-1)
- register_primitives ; emit %HDR.PRIM words with &entry labels
- parse_and_eval_prelude ; prelude is an embedded bytevector literal
- ; slurp argv[1] into readbuf
- ; loop: read_one_top_level; eval(form, NIL); advance cursor; until EOF
- li a0, 0
- ret ; backend _start does sys_exit(0)
-```
-
-All state lives in labeled reservations; nothing is heap-allocated
-before `p1_main`.
-
-## File layout
-
-```
-boot2/
- src/
- scheme.P1 # M1PP source, consumes P1.M1pp macro library
- lisp/
- prelude.scm # embedded into scheme.P1 as a bytevector literal
- tests/lisp/
- 00-arith.scm
- 01-list.scm
- 02-reader.scm
- 03-printer.scm
- 04-eval.scm
- 05-tailcall.scm
- 06-pmatch.scm
- 07-records.scm
- ...
-```
-
-Build: prepend the arch-specific backend M1M to `p1/P1.M1pp`, prepend
-that bundle to `src/scheme.P1`, run `m1pp` → `M0` → `hex2` → ELF.
-Tests are `.scm` files the interpreter runs; pass = exit 0 and
-expected stdout.
-
-## Milestones
-
-Status legend: `[x]` done · `[~]` in progress · `[ ]` not started.
-
-1. [ ] **Bootstrapping shim.** Per-target `_start`, argv capture,
- labeled reservations for heap / symtab / readbuf / writebuf,
- `sys_write` + `sys_exit` error path. Exits with a placeholder
- status.
-2. [ ] **Tag macros.** `%tagof`, `%make_fixnum`, `%untag_fixnum`,
- `%car`, `%cdr`, `%hdr_*`. Verify constants at M1PP time with
- `%select`.
-3. [ ] **Bump allocator and cons.** `%cons` macro; OOM branch.
- Hand-build `(cons 42 nil)` and exit with the decoded car.
-4. [ ] **Symbols + interning.** 16k-slot open-addressing table;
- `intern(name, len)` returns a tagged symbol val_t. Two interns of
- the same name compare equal as integers.
-5. [ ] **Bytevectors.** Headered heap object; `make-bytevector`,
- `bytevector-u8-ref`, etc.
-6. [ ] **Reader.** End-to-end source → s-expression for the full
- lexical syntax, with source-location side table.
-7. [ ] **Writer.** `display`, `write`, `format`.
-8. [ ] **`eval` + `apply` core.** Self-eval, symbol lookup, `if`,
- `lambda`, top-level `define`, application. Every tail position
- uses `%tail`.
-9. [ ] **Remaining special forms.** `set!`, `begin`,
- `let`/`let*`/`letrec`, named let, `cond`, `and`/`or`, inner
- `define` in `eval_body`.
-10. [ ] **Primitives.** All 55 user-facing + 6 internal. Batches:
- arith/bitwise, list-core, bv/string, I/O+apply+error, record
- ops.
-11. [ ] **Prelude.** Embed as bytevector literal; parse and eval at
- startup.
-12. [ ] **`define-record-type`.** Desugaring + record primitives.
-13. [ ] **`pmatch`.** Pattern matcher + clause driver.
-14. [ ] **End-to-end.** Run the self-hosted compiler under it.
-
-Rough total: **~4–5k P1 LOC + ~80 Scheme LOC**.
diff --git a/docs/LISP-PMATCH.md b/docs/LISP-PMATCH.md
@@ -1,259 +0,0 @@
-# pmatch
-
-Spec for the `pmatch` special form. Companion to [LISP.md](LISP.md) §Special
-forms; implementation sketch in [LISP-C.md](LISP-C.md) §pmatch.
-
-## Why it's in the language
-
-The Scheme subset has no macros, no `case-lambda`, no `define-syntax`. The
-self-hosted compiler destructures s-expressions on every form it walks
-(`(if t a b)`, `(lambda ps . body)`, `(quote x)`, …); doing this by hand
-through `car`/`cdr` chains is tedious and error-prone. `pmatch` is the
-single concession: one built-in pattern matcher so the compiler reads at
-the form level. It is **not** a general macro system — admitting it
-specifically rules out admitting `syntax-rules`.
-
-## Surface syntax
-
-```
-(pmatch <expr> <clause> ...)
-
-<clause> ::= ( <pattern> <body> ... )
- | ( <pattern> (guard <g-expr> ...) <body> ... )
- | ( else <body> ... )
-```
-
-- `<expr>` is evaluated **once**; its value is the *subject*.
-- `<pattern>` is a literal datum read by the standard reader; it is **not**
- evaluated.
-- `<body>` is one or more expressions; the last is in tail position of the
- enclosing `pmatch`.
-- `<g-expr>` are evaluated with the pattern's bindings in scope; see
- *Guards* below.
-
-## Patterns
-
-| Pattern | Matches |
-|----------------------|-------------------------------------------------------|
-| `()` | the empty list (`null?` of subject) |
-| `<integer>` | that integer (`equal?` — same as `=` for fixnums) |
-| `<string>` | byte-for-byte equal bytevector (`equal?`) |
-| `#\c` | the integer that `#\c` denotes (`equal?`) |
-| `#t` / `#f` | that boolean (`eq?`) |
-| `<symbol>` | **that exact symbol** (`eq?`) — *not* a binder |
-| `,<ident>` | anything; binds `<ident>` to the subject |
-| `,_` | anything; introduces no binding (wildcard) |
-| `(p1 p2 ... pN)` | proper list of length exactly `N`; each `pi` matches |
-| `(p1 ... pN . ptail)`| improper list; prefix matches `p1..pN`, tail matches `ptail` |
-
-Patterns nest: `(if ,t (op ,a ,b) ,c)` is well-formed. Sub-patterns are
-matched recursively against the corresponding sub-objects.
-
-### Malformed `,`-patterns
-
-`(unquote)`, `(unquote x y ...)`, and `(unquote <non-symbol>)` (e.g.
-`,42`, `,(foo)`) are all **runtime errors**, not undefined behavior.
-This is the only carve-out from the *Primitive failure* policy in
-LISP.md — pattern shape is a syntax error in the user's source, worth
-catching with a clear message. Implementation cost is three tag
-compares per binder match; see LISP-C.md §pmatch for the check.
-
-### Reader interaction
-
-`pmatch` reuses the standard reader. To make `,<ident>` writeable, the
-reader recognizes `,X` as datum sugar for `(unquote X)` — the same shape
-R7RS gives it, but **only** as a datum form. There is no `unquote`
-evaluator and no `quasiquote`; the form exists purely so that pmatch
-patterns can be parsed. Writing `,x` outside a pmatch pattern produces a
-list `(unquote x)`, which `eval` will reject as an application of an
-unbound variable.
-
-This is the only reader concession. `,@` (unquote-splicing) is **not**
-recognized.
-
-### Symbols vs. binders
-
-This is the most common stumble. In a pattern:
-
-- `foo` matches the literal symbol `foo` (by `eq?`). It does **not** bind.
-- `,foo` matches anything and binds `foo` to it.
-
-So `(if ,t ,a ,b)` matches the head symbol `if` literally and binds `t`,
-`a`, `b` to the three subforms. `(,head ,t ,a ,b)` matches *any* 4-element
-list and binds `head` as well.
-
-### Wildcards
-
-`,_` matches anything and introduces no binding. Multiple `,_` in one
-pattern are fine. The identifier `_` is otherwise unreserved — `(define _
-0)` is legal Scheme; `_` outside a `,` is just a literal symbol pattern.
-
-### Improper-tail patterns
-
-`(p1 ... pN . ptail)` first matches the prefix exactly like a proper-list
-pattern, then matches `ptail` against whatever cdr remains (which may be a
-proper list, an improper tail, or `()`).
-
-`ptail` is a full pattern, not just a binder. Common cases:
-
-- `ptail = ,rest` — `rest` binds the remaining list (the most common
- form; works for both proper and improper tails).
-- `ptail = ()` — equivalent to writing the proper-list pattern
- `(p1 ... pN)`.
-- `ptail = (q1 ... qM . qtail)` — keep destructuring further into the
- tail.
-
-Example: `(lambda ,ps . ,body)` matches any list whose head is the symbol
-`lambda` and at least one more element; `ps` gets the second element,
-`body` gets the cdr from there.
-
-## Semantics
-
-### Clause selection
-
-Clauses are tried top to bottom. The first clause whose pattern matches
-the subject **and** whose guards (if any) all evaluate to a true value is
-selected. Its body is evaluated in the extended environment; the value of
-the last body expression is the value of the `pmatch`.
-
-`else` is a clause that always matches with no bindings; it must be the
-last clause if present.
-
-### Guards
-
-```
-( <pattern> (guard <g-expr> ...) <body> ... )
-```
-
-- Guards are evaluated *only if the pattern structurally matched*.
-- Each `<g-expr>` is evaluated in left-to-right order with the pattern's
- bindings visible. Evaluation short-circuits: the first `<g-expr>` that
- yields `#f` aborts the guard.
-- All `<g-expr>` truthy ⇒ the clause is selected. Any `<g-expr>` `#f` ⇒
- the clause is **rejected**, bindings are discarded, and matching
- continues with the next clause.
-- A guard rejection does **not** re-match the pattern against later
- positions in a list — the subject is fixed. (Same shape as guards in
- Erlang/ML: structural match, then filter.)
-
-Zero `<g-expr>` is permitted but pointless: `(p (guard) body ...)` is the
-same as `(p body ...)`.
-
-### Bindings
-
-- Each `,<ident>` in the pattern introduces a fresh binding. Bindings are
- visible inside *that clause's* guards and body, and nowhere else.
-- Two `,<ident>` with the **same** identifier in one pattern is undefined
- — the language does not promise to compare for equality (no
- Prolog-style unification) nor to reject the pattern. Don't write it.
-- Bindings shadow same-named names in the surrounding lexical scope, just
- like `let`.
-
-### No-match
-
-If no clause's pattern matches and no `else` is present, behavior is
-**undefined** (per the *Primitive failure* policy in LISP.md; expect a
-crash, not a defined error). Write an explicit
-`(else (error "pmatch: no match" subject))` whenever the matcher is not
-provably exhaustive.
-
-### Evaluation order
-
-- `<expr>` evaluated once before any pattern is tried.
-- Patterns themselves are not evaluated; they are walked structurally
- against the subject.
-- For matching nested patterns, walk order is unspecified — patterns are
- pure datum tests against an already-evaluated subject, so order is not
- observable.
-- The selected body is in tail position of the `pmatch`. Tail calls in the
- body are proper tail calls of the surrounding context.
-
-### Equality used for literals
-
-Integer / string / character / boolean patterns match by `equal?`.
-Concretely: integers and characters compare via `=` on the underlying
-fixnum (characters are u8 fixnums per LISP.md); strings compare
-byte-for-byte; booleans compare by `eq?`. Symbol patterns match by `eq?`
-(symbols are interned).
-
-## Examples
-
-Compiler form dispatch:
-
-```scheme
-(pmatch e
- ((quote ,x) (emit-literal x))
- ((if ,t ,a ,b) (emit-if t a b))
- ((lambda ,ps . ,body) (emit-lambda ps body))
- ((set! ,name ,v) (emit-set! name v))
- ((,f . ,args) (emit-call f args))
- (,x (guard (symbol? x)) (emit-var x))
- (,x (guard (integer? x))(emit-int x))
- (else (error "unknown form" e)))
-```
-
-Three things to notice:
-
-1. `quote`, `if`, `lambda`, `set!` are literal symbol patterns — they
- match those head symbols only.
-2. The `(,f . ,args)` clause must come **after** the special-form clauses
- because it would otherwise swallow them.
-3. The trailing `,x (guard ...)` clauses fire on atoms (subject didn't
- match any list pattern); guards then split symbol vs. integer.
-
-Destructuring a `let` clause:
-
-```scheme
-(pmatch clause
- ((,name ,init) (cons name init))
- (else (error "bad let clause" clause)))
-```
-
-Walking a proper list with a tail binder:
-
-```scheme
-(define (sum xs)
- (pmatch xs
- (() 0)
- ((,h . ,t) (+ h (sum t)))))
-```
-
-## What pmatch is *not*
-
-Kept explicit so additions are deliberate.
-
-| Feature | Status |
-|--------------------------------------|------------------------------|
-| `,@<ident>` segment / splice patterns | not supported |
-| `...` / Kleene-star repetition | not supported |
-| `or` / disjunction patterns | not supported |
-| `and` / conjunction patterns | not supported |
-| Predicate patterns (`(? pred)`) | use `(guard (pred ,x))` |
-| As-patterns (`(<pat> :as ,name)`) | not supported |
-| Non-linear patterns (repeated names) | undefined behavior |
-| User-extensible patterns | not supported |
-| Compilation / decision-tree caching | not done; rewalk every call |
-
-If a use case demands one of these, the workaround is a `guard` plus
-auxiliary `,<ident>` bindings, or splitting the match into nested
-`pmatch`es. Anything beyond that is out of scope for v1.
-
-## Implementation contract (for LISP-C.md)
-
-This section pins what the interpreter must guarantee so callers can rely
-on it; the actual P1 implementation lives in LISP-C.md.
-
-- `pmatch` is a built-in special form, not a primitive procedure. It does
- not exist as a value.
-- It receives unevaluated clauses and walks them itself.
-- The matcher walks pattern and subject recursively. On match, it returns
- an environment extended with the new bindings (or fails). The driver
- tries clauses top-to-bottom, evaluates guards in the extended env, and
- tail-evaluates the body of the first selected clause.
-- No pattern compilation; no decision-tree memoization. Patterns are
- re-walked per call. (See *Primitive failure* policy: re-add caching
- later if profiling shows it's worth it.)
-- Bindings are stored in the same flat alist environment used by `let` —
- no separate match-env type.
-- The body's last expression is reached via the same tail-call path as
- `begin` / `let` body, so tail recursion through `pmatch` works.
diff --git a/docs/SCHEME1-R7RS-TODO.md b/docs/SCHEME1-R7RS-TODO.md
@@ -1,118 +0,0 @@
-# scheme1 R7RS TODO
-
-Lists the R7RS items we intend to add, the items we explicitly aren't adding,
-and rough implementation notes.
-
-## Interpreter changes
-
-### Reader
-
-- [ ] **`#(...)` vector literals.** Required by full vector support.
-
-### Runtime types
-
-- [ ] **Vector type.** New heap header (`HDR.VEC`?), inline length +
- slot array. Tag stays `HEAP`.
-- [ ] **Multiple-values protocol.** Internal MV vector (or stack
- convention) returned from `values`. Single-value contexts that
- receive ≠1 values are an error.
-
-### Special forms / form fixes
-
-- [ ] **`let-values`, `let*-values`, `define-values`.** Depend on the
- MV protocol.
-- [x] **Flat (one-level) quasiquote.** `(quasiquote ...)`,
- `(unquote x)`, `(unquote-splicing xs)` evaluated at the single
- template depth. Nested quasiquote is unsupported. Works on
- lists today; extend to vectors when those land.
-
-### New primitives
-
-These can't be expressed in Scheme; they need new entries in
-`prim_table` (or new syscall wrappers).
-
-- [ ] **Vector core:** `vector?`, `make-vector`, `vector-length`,
- `vector-ref`, `vector-set!`.
-- [ ] **Multi-values:** `values`, `call-with-values`.
-
----
-
-## Prelude additions (`prelude.scm`)
-
-These are blocked on the new primitives above; once those land the
-prelude wrappers are straightforward.
-
-### Vectors (§6.8) — over the new vector primitives
-
-- [ ] **Constructor:** `vector` (variadic).
-- [ ] **List/string conversion:** `vector->list`, `list->vector`,
- `vector->string`, `string->vector`.
-- [ ] **Bulk:** `vector-copy`, `vector-copy!`, `vector-append`,
- `vector-fill!`.
-- [ ] **Higher-order:** `vector-map`, `vector-for-each`.
-
----
-
-## Permanent deviations (won't add)
-
-These are deliberate omissions — document them in user-facing notes
-rather than tracking them as TODOs.
-
-Out of scope:
-- §6.2 numerics — keep fixnum.
-- §4.3 macros / `define-syntax` / `syntax-rules` / `let-syntax` /
- `letrec-syntax`.
-- §5.6 libraries / `define-library` / `import` / `export` / `cond-expand`.
-
----
-
-### Types
-
-- Distinct `char` type (chars are fixnum bytes).
-- Distinct R7RS port type (we have a prelude `port` record over fds).
-- `promise`, `parameter`, `continuation`, `error-object`,
- `environment`.
-
-### Forms
-
-- Internal `define` — interpreter raises an explicit error; users
- must use `letrec`.
-- `delay`, `delay-force`, `force`, `make-promise`, `promise?`.
-- `make-parameter`, `parameterize`.
-- `guard` as an exception form (the `guard` keyword is reserved for
- `pmatch` sub-clauses only).
-- `dynamic-wind`.
-- `include`, `include-ci`. (Source concatenation is shell-side.)
-- Nested quasiquote.
-- Block comments `#| ... |#` and datum comments `#;`.
-
-### Procedures
-
-- `utf8->string`, `string->utf8`.
-- All exception procs: `raise`, `raise-continuable`,
- `with-exception-handler`, `error-object?`, `error-object-message`,
- `error-object-irritants`, `read-error?`, `file-error?`. (`error`
- itself stays as scheme1's flush-and-exit.)
-- `call-with-current-continuation`, `call/cc`.
-- `eval`, `environment`, `scheme-report-environment`,
- `null-environment`, `interaction-environment`.
-- The entire R7RS port surface (§6.13): `call-with-port`,
- `current-input-port`, `current-output-port`, `current-error-port`,
- `input-port?` / `output-port?` / `textual-port?` / `binary-port?` /
- `port?` / `input-port-open?` / `output-port-open?`,
- `with-input-from-file`, `with-output-to-file`, `open-input-file` /
- `-binary-` / `open-output-file` / `-binary-`, `close-port` /
- `close-input-port` / `close-output-port`, string ports
- (`open-input-string`, `open-output-string`, `get-output-string`,
- `open-input-bytevector`, `open-output-bytevector`,
- `get-output-bytevector`), reader/writer procs (`read`, `read-char`,
- `peek-char`, `read-line`, `read-string`, `read-u8`, `peek-u8`,
- `u8-ready?`, `char-ready?`, `read-bytevector`, `read-bytevector!`,
- `write`, `write-shared`, `write-simple`, `write-char`,
- `write-string`, `write-u8`, `write-bytevector`,
- `flush-output-port`, `newline`).
-- `load`.
-- `current-second`, `current-jiffy`, `jiffies-per-second`.
-- `features`, `emergency-exit`.
-- `char-ci=?` / `char-ci<?` / `char-ci>?` / `char-ci<=?` /
- `char-ci>=?` (derive from `char-foldcase` if needed).