commit 2dfba4fcce1a94138f06d44d816896ca76f89065
parent 955cc50921de523481bdfc77a37f49725a4cb554
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Thu, 23 Apr 2026 20:08:40 -0700
lisp docs
Diffstat:
| A | docs/LISP-C.md | | | 548 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| D | docs/LISP-GC.md | | | 538 | ------------------------------------------------------------------------------- |
| M | docs/LISP.md | | | 798 | ++++++++++++++++++++++--------------------------------------------------------- |
3 files changed, 765 insertions(+), 1119 deletions(-)
diff --git a/docs/LISP-C.md b/docs/LISP-C.md
@@ -0,0 +1,548 @@
+# LISP: Scheme interpreter in C
+
+## Goal
+
+A small tree-walking interpreter for the Scheme subset specified in
+[LISP-SUBSET.md](LISP-SUBSET.md). Intent: simplicity and correctness
+first; performance is a later concern.
+
+The self-hosted C compiler (see [PLAN.md](PLAN.md)) will run under
+this interpreter, as will shell-style scripts built on
+[shell.scm](../../shell.scm) and
+[prelude.scm](../lisp/prelude.scm).
+
+## Settled decisions
+
+Load-bearing; the rest of the document assumes them.
+
+1. **Tree-walking eval/apply over raw s-expressions.** No AST
+ preprocessing, no bytecode, no compilation step. The reader's
+ output is the interpreter's input.
+2. **Static buffers, no GC.** Fixed-size BSS regions for heap, symbol
+ table, reader buffer, writer buffer. Heap exhaustion aborts with
+ an error.
+3. **Bump allocation, 8-byte aligned.** Single `heap_next` pointer;
+ every object is 8-byte aligned by construction.
+4. **Tagged values, low-3-bit tag.** See §Value representation.
+5. **Symbols are interned indices.** No heap object per symbol; the
+ value is an index into the symbol table, which stores the name
+ bytevector.
+6. **Environment is a flat association list.** `((sym . val) ...)`,
+ extended by consing onto the front. Lookup walks the list; falls
+ through to the global slot in the symbol table.
+7. **Closures capture the env pointer directly.** No free-variable
+ analysis, no flat closures, no boxes. A closure is four words:
+ `[header, params, body, env]`.
+8. **No `set!` on captured variables.** Language-level rule. The
+ implementation does not enforce it; behavior is unspecified if
+ violated. Use a pair for shared mutable state.
+9. **Inner `define` is sequential-only.** Visible to later forms in
+ the same body, not to earlier forms. Use `letrec` for mutual
+ recursion between local helpers.
+10. **`pmatch` is a built-in special form.** Saves writing a macro
+ expander for one feature.
+11. **Tail calls via host `musttail`.** All tail positions in `eval`
+ and `apply` are marked. Scheme-level tail-call correctness falls
+ out.
+12. **Primitive failure is undefined behavior.** No runtime checks on
+ `(car '())`, `(quotient n 0)`, out-of-bounds bytevector access,
+ integer overflow, or mutation of literals. Follows the spec's
+ "Primitive failure" policy.
+
+## Memory layout
+
+Single BSS region, fixed at link time. Starting sizes:
+
+| Region | Size | Purpose |
+|---------------|----------|--------------------------------------|
+| `heap[]` | 64 MB | bump-allocated Scheme objects |
+| `symtab[]` | 16384 × | open-addressing intern table |
+| `readbuf[]` | 16 MB | slurped source file |
+| `writebuf[]` | 64 KB | staged output for display/write |
+| `argbuf[]` | 64 × 8 B | primitive dispatch scratch |
+
+All sized via `#define`s; easy to retune. Overflow of any buffer →
+`error(...)` → `exit(1)`.
+
+### Symbol table entry
+
+```c
+struct sym_entry {
+ val_t name_bv; // interned name, heap bytevector
+ val_t global_val; // top-level binding (or UNBOUND)
+ uint32_t name_hash;
+ uint32_t chain_next;
+};
+```
+
+`global_val` gives O(1) top-level lookup: every symbol's global
+binding is reachable without hash probing once the symbol is resolved.
+
+The symbol table and heap are the only global roots.
+
+## Allocation
+
+`alloc(bytes)` rounds up to 8, checks `heap_next + bytes <= heap_end`,
+advances the bump pointer, returns the pre-bump address. Overflow →
+`error("heap exhausted")` → `exit(1)`.
+
+8-byte alignment is enforced by the bump allocator; every object
+constructor assumes it.
+
+## Value representation
+
+All values are 64-bit tagged words. Low 3 bits are the tag.
+
+| Tag | Kind | Notes |
+|-------|----------------------------|--------------------------------------------|
+| `000` | fixnum | 61-bit signed; `>> 3` to recover |
+| `001` | pair pointer | 16 B aligned; `[p-1]` car, `[p+7]` cdr |
+| `010` | symbol | `(idx << 3) | 2`; idx into `symtab` |
+| `011` | headered heap object | `[p-3]` is the header word |
+| `100` | immediate singleton | `#f`, `#t`, `()`, unspec, unbound |
+| `101`–`111` | reserved | |
+
+Immediates (bit patterns chosen at implementation time, all tag `100`):
+
+| Constant | Meaning |
+|-----------|---------------------------------------------|
+| `FALSE` | the Scheme `#f` |
+| `TRUE` | the Scheme `#t` |
+| `NIL` | the empty list `()` |
+| `UNSPEC` | result of `set!`, `define`, etc. |
+| `UNBOUND` | sentinel for undefined globals |
+
+Truthiness: every value except `FALSE` is truthy. Test is a single
+compare.
+
+## Object types
+
+This is the complete runtime type set.
+
+| Type | Representation |
+|------------------|----------------------------------------------------|
+| fixnum | immediate (tag 000) |
+| boolean | immediate (`FALSE`, `TRUE`) |
+| empty list | immediate (`NIL`) |
+| unspec / unbound | immediates |
+| pair | `[car][cdr]`, 16 B, tag 001, no header |
+| bytevector | `[hdr = BV | len<<8][bytes...]`, tag 011 |
+| closure | `[hdr = CLOSURE][params][body][env]`, tag 011 |
+| primitive | `[hdr = PRIM | code_id<<8]`, tag 011 |
+| record-td | `[hdr = TD][name_sym][nfields]`, tag 011 |
+| record | `[hdr = REC][td][field_0]...[field_{n-1}]`, tag 011|
+
+Headered object header word:
+
+```
+[ type:8 | aux:56 ]
+```
+
+where `aux` carries type-specific data (length for bytevectors,
+code id for primitives, nfields count for records, etc.). One load
+tells you the type.
+
+Strings and bytevectors are the same type. `string?` ≡ `bytevector?`.
+No character type: `#\a` in source is sugar for fixnum 97.
+
+## Closures
+
+```
+[ hdr = CLOSURE ][ params ][ body ][ env ]
+```
+
+- `params`: parameter list s-expression: `(a b c)`, `(a b . rest)`, or
+ a bare symbol `rest` for a fully variadic lambda.
+- `body`: the list of body forms, evaluated sequentially; last in
+ tail position.
+- `env`: the environment chain captured at `lambda` evaluation.
+
+Four words total. No free-var analysis at closure creation — the whole
+env chain is retained and walked at lookup time.
+
+## Environment
+
+```
+env ::= NIL
+ | ( (sym . val) . env )
+```
+
+- **Lookup.** Walk the list, pointer-compare the car of each pair with
+ the target symbol, return the cdr on match. If the walk ends at
+ `NIL`, fall through to `symtab[idx].global_val`; if that is
+ `UNBOUND`, error out with `"unbound variable"`.
+- **Extension** (scope entry). Cons new `(sym . val)` pairs onto the
+ front.
+- **`set!`.** Walk the env chain; on match, `set-cdr!` the pair. If
+ the chain is exhausted, write to `symtab[idx].global_val`.
+- **Top-level `define`.** Write to `symtab[idx].global_val`.
+- **Inner `define`.** Handled inline by `eval_body` (see below).
+ Each inner define prepends `(name . val)` to the local env as it
+ is encountered.
+
+The global env is not a chain — it's the symbol table. Closures
+captured before a top-level `define` nonetheless see the new binding,
+because the fallback is to `symtab[idx].global_val` at lookup time,
+not at capture time. This is how mutual recursion between top-level
+functions works without any pre-pass.
+
+## Eval
+
+```c
+val_t eval(val_t expr, val_t env) {
+ switch (tag(expr)) {
+ case TAG_FIXNUM: return expr;
+ case TAG_IMM: return expr;
+ case TAG_SYM: return env_lookup(expr, env);
+ case TAG_HEAP: return expr; // bv, closure, record — self-eval
+ case TAG_PAIR: break; // form; dispatch below
+ }
+
+ val_t head = car(expr);
+ val_t rest = cdr(expr);
+
+ if (head == SYM_QUOTE) return car(rest);
+ if (head == SYM_IF) {
+ val_t t = eval(car(rest), env);
+ val_t next = (t != FALSE) ? cadr(rest) : caddr(rest);
+ MUSTTAIL return eval(next, env);
+ }
+ if (head == SYM_LAMBDA) return make_closure(car(rest), cdr(rest), env);
+ if (head == SYM_DEFINE) return eval_define(rest, env);
+ if (head == SYM_SET) return eval_set(rest, env);
+ if (head == SYM_BEGIN) MUSTTAIL return eval_begin(rest, env);
+ if (head == SYM_LET) MUSTTAIL return eval_let(rest, env);
+ if (head == SYM_LETSTAR) MUSTTAIL return eval_letstar(rest, env);
+ if (head == SYM_LETREC) MUSTTAIL return eval_letrec(rest, env);
+ if (head == SYM_COND) MUSTTAIL return eval_cond(rest, env);
+ if (head == SYM_AND) MUSTTAIL return eval_and(rest, env);
+ if (head == SYM_OR) MUSTTAIL return eval_or(rest, env);
+ if (head == SYM_PMATCH) MUSTTAIL return eval_pmatch(rest, env);
+ if (head == SYM_DEFINE_RT) return eval_define_record_type(rest, env);
+
+ // application
+ val_t fn = eval(head, env);
+ val_t args = eval_list(rest, env);
+ MUSTTAIL return apply(fn, args);
+}
+```
+
+The special-form symbols (`SYM_QUOTE`, `SYM_IF`, etc.) are
+indices interned once at startup and stored in globals. Dispatch is a
+cascade of integer compares.
+
+## Apply
+
+```c
+val_t apply(val_t fn, val_t args) {
+ switch (header_type(fn)) {
+ case HDR_PRIM:
+ return call_prim(fn, args);
+ case HDR_CLOSURE: {
+ val_t env = bind_params(clo_params(fn), args, clo_env(fn));
+ MUSTTAIL return eval_body(clo_body(fn), env);
+ }
+ default:
+ error("not a procedure", fn);
+ }
+}
+```
+
+`bind_params` walks params and args in lockstep. Variadic `.`-tail:
+when params becomes a non-pair symbol, that symbol is bound to the
+remaining args list. When params becomes `NIL` with args remaining
+(or vice versa), error out with an arity message.
+
+## Eval_body
+
+```c
+val_t eval_body(val_t body, val_t env) {
+ while (is_pair(cdr(body))) {
+ val_t form = car(body);
+ if (is_pair(form) && car(form) == SYM_DEFINE) {
+ val_t name = define_name(form);
+ val_t val = eval(define_rhs(form), env);
+ env = cons(cons(name, val), env);
+ } else {
+ eval(form, env);
+ }
+ body = cdr(body);
+ }
+ val_t last = car(body);
+ if (is_pair(last) && car(last) == SYM_DEFINE) {
+ val_t name = define_name(last);
+ val_t val = eval(define_rhs(last), env);
+ env = cons(cons(name, val), env);
+ return UNSPEC;
+ }
+ MUSTTAIL return eval(last, env);
+}
+```
+
+Inner `define` is sequential-only: visible to forms that textually
+follow it in the same body, not to earlier forms. For mutual
+recursion between local helpers, use `letrec` or lift to top level.
+The `SYM_DEFINE` handler in `eval` itself handles only top-level
+`define`; inner defines are absorbed here and never reach that path.
+
+## Special forms
+
+Implemented directly, ~15-30 lines of C each:
+
+- `quote` — return the datum.
+- `if` — eval test; tail-eval the chosen branch.
+- `lambda` — build closure.
+- `define` — top-level: write `symtab[idx].global_val`. Inner:
+ handled by `eval_body`.
+- `set!` — walk env; first match mutates. Miss: write global slot.
+- `begin` — eval each form in sequence; tail-eval the last.
+- `let` — eval all inits in the current env; extend env with
+ bindings; tail-eval body via `eval_body`.
+- `let*` — extend env one binding at a time, each init evaluated in
+ the progressively extended env.
+- `letrec` — pre-bind each name to `UNSPEC` in a new env; eval each
+ init in that env; `set!` each binding; tail-eval body.
+- `let` (named) — desugar inline to `letrec` + call, or implement
+ directly.
+- `cond` — loop clauses; first truthy test wins; tail-eval its body;
+ `else` always true.
+- `and` / `or` — short-circuit fold; return the deciding value, not a
+ coerced boolean.
+- `pmatch` — see below.
+- `define-record-type` — see below.
+
+## pmatch
+
+The matcher walks pattern and subject recursively; on success it
+returns an env extended with the bindings; on failure it returns a
+sentinel. The driver tries clauses top-to-bottom, skipping clauses
+whose guard evaluates to `#f`. No match and no `else` is undefined
+behavior per spec.
+
+Patterns:
+
+| Pattern | Matches |
+|----------------------|------------------------------------------------|
+| `()` | the empty list |
+| `<literal>` | integer / string / char / bool by `equal?` |
+| `<symbol>` | that exact symbol (not a binder) |
+| `,<ident>` | anything; binds to `<ident>` |
+| `,_` | anything; no binding |
+| `(p1 p2 ...)` | proper list of exactly that length |
+| `(p1 ... . ptail)` | improper list; `ptail` binds the rest |
+
+No compilation or caching — rewalk the patterns each call. ~200 LOC
+total for matcher plus driver.
+
+## define-record-type
+
+Desugars at eval time into a sequence of primitive operations.
+
+```scheme
+(define-record-type point
+ (make-point x y)
+ point?
+ (x point-x)
+ (y point-y set-point-y!))
+```
+
+becomes
+
+```scheme
+(define <point-td> (%make-record-td 'point 2))
+(define make-point (lambda (x y) (%make-record <point-td> x y)))
+(define point? (lambda (v) (%record-is-a? v <point-td>)))
+(define point-x (lambda (p) (%record-ref p 0)))
+(define point-y (lambda (p) (%record-ref p 1)))
+(define set-point-y! (lambda (p v) (%record-set! p 1 v)))
+```
+
+The `%` primitives are internal, not part of the user-facing
+primitive list. ~100 LOC for the desugaring plus ~100 LOC for the
+record primitives.
+
+## Primitives
+
+55 user-facing primitives plus 6 internal record ops. Dispatched by
+code id through a switch in `call_prim`.
+
+**Predicates / equality** (13): `eq?`, `equal?`, `not`, `boolean?`,
+`pair?`, `null?`, `symbol?`, `integer?`, `string?` (≡ `bytevector?`),
+`procedure?`, `record?`, `record-is-a?`, `record-type?`.
+
+**Pairs** (5): `cons`, `car`, `cdr`, `set-car!`, `set-cdr!`.
+
+**Integers** (24): `+`, `-`, `*`, `quotient`, `remainder`, `modulo`,
+`=`, `<`, `<=`, `>`, `>=`, `zero?`, `positive?`, `negative?`, `abs`,
+`min`, `max`, `bit-and`, `bit-or`, `bit-xor`, `bit-not`,
+`arithmetic-shift`, `number->string`, `string->number`.
+
+**Bytevectors / strings / symbols** (10): `make-bytevector`,
+`bytevector-length`, `bytevector-u8-ref`, `bytevector-u8-set!`,
+`bytevector-copy`, `bytevector-copy!`, `bytevector-append`,
+`bytevector=?`, `string->symbol`, `symbol->string`.
+
+**Control / program / error** (3): `apply`, `argv`, `error`.
+
+**Internal record ops** (6): `%make-record-td`, `%make-record`,
+`%record-ref`, `%record-set!`, `%record-is-a?`, `%record-td`.
+
+### Calling convention
+
+`call_prim` reads the args list from `apply`, copies the values into
+`argbuf[]`, and switches on the primitive code id. Each primitive
+reads its inputs from `argbuf`, returns a `val_t`.
+
+Fixed-arity primitives check argc at the dispatch boundary; variadic
+primitives (`+`, `-`, `append`, etc.) do their own argc handling.
+
+## Prelude
+
+Written in Scheme, embedded as a string constant, parsed and
+evaluated at startup like any other source. Contents:
+
+```scheme
+(define (list . xs) xs)
+(define (length xs) ...)
+(define (reverse xs) ...)
+(define (append . lists) ...)
+(define (list-ref xs i) ...)
+(define (map f . lists) ...)
+(define (for-each f . lists) ...)
+```
+
+Runs before the user file, sharing the same symbol table and heap.
+~80 Scheme LOC.
+
+## Reader
+
+Single-pass recursive descent over `readbuf` with an integer cursor.
+No token array. Conses directly onto the heap.
+
+Handles the lexical syntax specified in scheme-subset.md:
+
+- lists with `.` dotted tail
+- `'x` → `(quote x)` (no quasiquote, unquote, unquote-splicing)
+- fixnums: decimal and `#x`-prefixed hex, both with optional sign
+- `#t`, `#f`
+- `"..."` strings with `\n \t \r \\ \"` escapes
+- `#\a`..`#\~`, `#\newline`, `#\space`, `#\tab`, `#\return`,
+ `#\null`, `#\xNN` → read as fixnums
+- symbols: identifier grammar, case-sensitive
+- `;` line comments (no block comments, no `#;`)
+
+### Source locations
+
+As the reader conses each pair, it records `pair_ptr → (line . col)`
+in a side table (fixed-size open-addressing hash). `error` consults
+this when its first argument is a pair and prefixes the message with
+`"at file:line:col: "`. Without a GC, entries for collected pairs
+never arise; a full table simply overwrites old entries via linear
+probing.
+
+## Writer
+
+Three entry points:
+
+- `display` — no quoting on strings; bare character output.
+- `write` — readable form; strings quoted.
+- `format fmt args...` — supports `~a` (display), `~s` (write), `~d`
+ (decimal fixnum), `~%` (newline).
+
+Output buffered in `writebuf[]`; flushed on buffer-full or on newline
+for `display`/`write`. `error` flushes before writing to fd 2.
+
+## Error path
+
+`(error msg arg ...)`:
+
+1. Flush writer buffer.
+2. Write `"error: "` + message + formatted args to fd 2.
+3. `exit(1)`.
+
+No unwinding, no handlers. Primitive failure follows the same policy
+as the spec: undefined behavior, likely crash, no runtime check.
+
+## Startup
+
+```
+_start:
+ parse argv
+ if argc < 2: error "usage: lisp <file.scm>"
+ init heap_next, heap_end
+ init symtab
+ intern the special-form symbols (SYM_QUOTE, SYM_IF, ...)
+ register primitives: for each (name, code_id, arity), intern
+ the name and bind a primitive object in symtab[idx].global_val
+ parse and eval prelude (embedded)
+ open argv[1], slurp into readbuf
+ loop:
+ read one top-level form
+ eval in NIL env
+ advance cursor
+ until EOF
+ exit 0
+```
+
+All state lives in BSS; no heap allocation before `_start`.
+
+## File layout
+
+```
+lispcc/
+ src/
+ lisp.c # single C source, target ~2–2.5k LOC
+ lisp/
+ prelude.scm # embedded at build time
+ tests/lisp/
+ 00-arith.scm
+ 01-list.scm
+ 02-reader.scm
+ 03-printer.scm
+ 04-eval.scm
+ 05-tailcall.scm
+ 06-pmatch.scm
+ 07-records.scm
+ ...
+```
+
+Tests are `.scm` files the interpreter runs; pass = exit 0 and
+expected stdout. Build integrates into the existing Makefile.
+
+## Milestones
+
+Status legend: `[x]` done · `[~]` in progress · `[ ]` not started.
+
+1. [ ] **Runtime skeleton.** `_start`, argv parsing, BSS layout, bump
+ allocator, `error` path. Exits with a placeholder status.
+ ~200 LOC.
+2. [ ] **Tagged values.** Fixnum / pair / immediate encoding;
+ `cons`/`car`/`cdr`/tag predicates as C functions. Hand-build
+ `(cons 42 nil)` and exit with the decoded car. ~200 LOC.
+3. [ ] **Symbols + interning.** 16k-slot open-addressing table;
+ `intern(name, len)` returns a tagged symbol index. Two
+ `intern("foo")` calls compare equal as integers. ~250 LOC.
+4. [ ] **Bytevectors.** Headered heap object; `make-bytevector`,
+ `bytevector-u8-ref`, etc. as C helpers. ~200 LOC.
+5. [ ] **Reader.** End-to-end source → s-expression for the full
+ lexical syntax, with source-location side table. ~500 LOC.
+6. [ ] **Writer.** `display`, `write`, `format`. ~250 LOC.
+7. [ ] **`eval` + `apply` core.** Self-eval, symbol lookup, `if`,
+ `lambda`, top-level `define`, application. No musttail yet.
+ ~300 LOC.
+8. [ ] **Tail calls.** Mark all tail positions with musttail. Stress:
+ 10⁶-iteration named-let loop runs in constant host stack.
+9. [ ] **Remaining special forms.** `set!`, `begin`,
+ `let`/`let*`/`letrec`, named let, `cond`, `and`/`or`, inner
+ `define` in `eval_body`. ~300 LOC.
+10. [ ] **Primitives.** All 55 user-facing + 6 internal. Batches:
+ arith/bitwise, list-core, bv/string, I/O+apply+error, record
+ ops. ~1000 LOC.
+11. [ ] **Prelude.** Embed as string; parse and eval at startup.
+ ~80 Scheme LOC.
+12. [ ] **`define-record-type`.** Desugaring + record primitives.
+ ~250 LOC.
+13. [ ] **`pmatch`.** Pattern matcher + clause driver. ~200 LOC.
+14. [ ] **End-to-end.** Run the self-hosted compiler under it.
+
+Rough total: **~2–2.5k LOC C + ~80 Scheme LOC**.
diff --git a/docs/LISP-GC.md b/docs/LISP-GC.md
@@ -1,538 +0,0 @@
-# LISP-GC: Mark-sweep GC for `lisp.M1`
-
-Implementation plan for [LISP.md](LISP.md) step 13. Scope is the GC itself
-plus the prerequisites it imposes on P1 frame layout and the heap shape.
-
-## Position in the staged plan
-
-Step 13 in LISP.md. Prerequisite: steps 1–12 complete (runtime, values,
-reader core, printer, eval, TAIL, argv, test harness, all ~40 primitives,
-prelude). Follow-on: steps 14 (`pmatch`), 15 (REPL), 16 (source
-locations), 17 (end-to-end).
-
-## Locked decisions
-
-Load-bearing; the rest of the document assumes them.
-
-1. **Segregated arenas, headerless pairs.** Two bump pointers inside
- a single 1 MB heap. Pair arena: 16-byte objects, no header, tag
- `010` points directly to the pair's first word. Object arena:
- headered objects only (string / symbol / vector / closure /
- primitive); marks live in the `gc-flags` byte. One mark bitmap
- over the pair arena only; the object arena needs none. Sweep
- walks each arena with its own stride rule.
-2. **P1 frame-pointer chain, metadata before slots.** Every
- `PROLOGUE_Nk` frame stores the caller's `sp` as `saved_fp`, plus
- the slot count `k`, ahead of its slots. GC walks the chain
- precisely; no conservative scan.
-3. **Explicit mark stack in BSS.** 32 KB (~4k entries). Overflow
- falls back to recursive `CALL` into the marker so no structure
- is too deep to mark.
-4. **Heap size 1 MB for this step.** Triggers GC easily under stress
- tests. Grow to 20 MB in a follow-up once C-compiler workloads
- demand it.
-5. **Mostly-precise stack scan with a bounded-conservative fallback.**
- The roots in §Roots #1–#4 are *precise* — we know each entry is a
- tagged Lisp value. The stack-frame walk (#5) is *bounded
- conservative*: each frame slot is checked against
- `[pair_heap_base, pair_heap_next)` / `[obj_heap_base,
- obj_heap_next)` before being dereferenced, in the BDW-collector
- style. See §"Why mostly-precise" for the reasoning and the
- upgrade path.
-
-These supersede the following LISP.md text, which must be updated when
-this work lands: decision 11 (single bitmap over whole heap), §Heap
-layout (single arena), §GC §Roots (conservative stack filter), and
-P1.md §Semantics prologue/epilogue/TAIL.
-
-## Heap layout
-
-Single BSS region, split in two by fixed labels:
-
-```
-:pair_heap_start
- <600 KB>
-:pair_heap_end / :obj_heap_start
- <400 KB>
-:obj_heap_end
-```
-
-Bump pointers:
-
-- `pair_heap_next` — bumps by 16. Initially `pair_heap_start`, but
- `_start` rounds it up to a 16-byte boundary before any allocation
- and snapshots that aligned value into `:pair_heap_base`. The GC
- paths use `:pair_heap_base` as their canonical origin.
-- `obj_heap_next` — bumps by an 8-rounded size. Same alignment
- treatment: `_start` rounds up to 8 and snapshots `:obj_heap_base`.
-
-The two `_base` slots exist because M1 has no `.align` directive
-and the labels `:pair_heap_start` / `:obj_heap_start` land at
-whatever byte offset the assembler reached. A fixed-stride pair
-sweep walking from the raw label would visit gap addresses; the
-runtime alignment + base snapshot keeps mark and sweep on the same
-grid.
-
-Free lists:
-
-- `free_list_pair` — head of the 16-byte pair free list.
-- `free_lists[10]` — heads for 16, 24, 32, 40, 48, 56, 64, 80, 96, 128
- byte object-arena size classes.
-
-Split is tunable: the only code that cares is the two label
-definitions. Pairs dominate compiler workloads so the pair arena gets
-the larger share.
-
-### Allocators
-
-- `pair_alloc` → 16-byte raw pointer. Tries `free_list_pair`; else
- bumps `pair_heap_next`. If the bump would cross `pair_heap_end`,
- invokes `gc`, retries, else `error "heap exhausted"`.
-- `obj_alloc(size)` → raw pointer to 8-byte-aligned region. Size is
- rounded up to 8. If size ≤ 128, tries the matching free list
- before bumping `obj_heap_next`. Same GC-retry-else-error overflow
- path.
-
-`cons` routes through `pair_alloc`. Every other allocator (strings,
-symbols, vectors, closures, primitives) routes through `obj_alloc`.
-
-### Pair mark bitmap
-
-One bit per 16-byte slot in the pair arena, indexed by
-`(pair_ptr - pair_heap_base) / 16` (note: `_base`, the runtime-
-aligned origin — see §Heap layout). At 600 KB pair arena:
-600 KB / 16 = 37 500 bits ≈ 4.7 KB BSS. Declared as
-`pair_mark_bitmap` with fixed size.
-
-Live pairs also set the bit; clearing happens during sweep.
-
-### Object gc-flags
-
-Already present in every headered object: `header byte 6` (second
-byte of the header word) is the `gc-flags` byte. Bit 0 is the mark
-bit. All other bits reserved.
-
-## Frame layout
-
-New `PROLOGUE_Nk` shape (supersedes P1.md §Semantics lines 230–247):
-
-```
-[sp + 0] = caller's retaddr
-[sp + 8] = saved_fp (caller's sp value at the moment of CALL)
-[sp + 16] = k (slot count, stored as integer; low 3 bits zero)
-[sp + 24] = slot 1
-[sp + 32] = slot 2 (k >= 2)
-[sp + 40] = slot 3 (k >= 3)
-[sp + 48] = slot 4 (k >= 4)
-```
-
-Frame size: `round_up_16(24 + 8*k)`. Concrete sizes:
-
-| k | Old size | New size |
-|---|----------|----------|
-| 1 | 16 | 32 |
-| 2 | 32 | 48 |
-| 3 | 32 | 48 |
-| 4 | 48 | 64 |
-
-Cost: +16 bytes per call frame. For a 10⁶-deep tail loop the stack
-cost is unchanged (TAIL reuses the frame); for a 10⁴-deep recursive
-call it adds ~160 KB, well under any reasonable stack limit.
-
-`EPILOGUE_Nk` / `TAIL_Nk` restore `saved_fp` from `[sp+8]` before
-returning or branching.
-
-### Chain termination
-
-`_start` snapshots its own initial `sp` into `:stack_bottom_fp`
-before its first `CALL`. The first prologued callee then stores
-that same value as its `saved_fp`, so the walker compares each
-`saved_fp` it pops to `:stack_bottom_fp`'s value and stops on
-match. (The `_start` frame itself has no `PROLOGUE`, so there is
-nothing to walk past it.)
-
-### Per-arch implementation in `p1_gen.py`
-
-Three arch-specific emitters change:
-
-- **amd64**: `call` already pushes retaddr natively. Prologue does
- `pop rcx; mov rax, rsp; sub rsp, frame_size; mov [rsp], rcx;
- mov [rsp+8], rax; mov qword [rsp+16], k`, then zeros each slot
- in turn. Epilogue reverses.
-- **aarch64**: `sub sp, sp, frame_size; str lr, [sp, 0]; add x8, sp,
- #frame_size; str x8, [sp, 8]; movz x8, #k; str x8, [sp, 16]`,
- then `str xzr, [sp, 24+8*i]` per slot. Epilogue reverses.
- *Important:* the k-load uses `MOVZ` rather than `add x8, xzr, #k`
- because aarch64 ADD encodes register 31 as SP, not XZR — the
- obvious `mov xN, #imm` synthesis silently computes `sp + imm`.
- This was a latent bug pre-GC since nothing read `[sp+16]`.
-- **riscv64**: analogous to aarch64, using `ra` and a scratch `t0`.
- Slot zeroing uses `sd zero, off(sp)`.
-
-Slot zeroing is the prologue-side half of the mostly-precise scheme
-(see §"Why mostly-precise"); the bounds check in `mark_value_*` is
-the walker-side half.
-
-The "caller's sp" is `sp + frame_size` on all three arches after the
-sub has run, since prologue ran immediately after call.
-
-### Slot-offset shift in `lisp.M1`
-
-Every `sp`-relative access throughout `lisp.M1` shifts:
-
-| Old offset | New offset |
-|------------|------------|
-| 8 | 24 |
-| 16 | 32 |
-| 24 | 40 |
-| 32 | 48 |
-
-The change is mechanical but wide. Step 1 of the implementation
-sequence is entirely about landing this shift cleanly on all three
-arches before any GC work begins.
-
-## Roots
-
-Five sources, walked in this order on each GC invocation:
-
-1. **`global_env_cell`.** Alist head in BSS. Traced transitively via
- `mark_object`.
-2. **`symbol_table[0..4095]`.** Intern table's backing array. Walk
- each slot; mark occupied entries as tagged heap pointers.
-3. **`prim_argv[0..prim_argc)`.** Scratch region live during a
- primitive call. `marshal_argv` publishes the count into the
- `:prim_argc` BSS slot, which the walker uses as the upper bound;
- stale entries past `prim_argc` (from earlier primitive calls)
- are skipped so they can't pin freed objects across GCs.
-4. **Reader scratch** (`saved_src_*`). Holds the integer cursor /
- line / col plus a raw byte pointer into `:src_buf`. None of these
- are tagged Lisp values, so no marking is needed; listed here only
- to make the omission explicit.
-5. **Stack (fp chain).** Start at the caller of `gc()`, walk
- `saved_fp` until the value snapshotted in `:stack_bottom_fp` at
- `_start` (the kernel-provided initial sp). At each frame, read
- `k` from `[fp+16]` and pass every word in `[fp+24..+24+8*k)`
- through `mark_object`. Each word is bounds-checked against the
- pair / obj arenas before dereference (see §"Why mostly-precise").
-
-The `sym_*` special-form symbols are reachable via `symbol_table`
-so they need no separate root entry.
-
-## Why mostly-precise
-
-A fully precise stack scan needs the walker to know, at every
-GC-fire point, exactly which slots in each frame hold tagged Lisp
-values. Two things break that invariant in this codebase:
-
-1. **Uninitialized slots.** A function reserves `k` slots in its
- prologue, but its body may take an allocating call before having
- written all of them. Slots that haven't been written yet contain
- stale stack bytes from prior frames.
-2. **Slots holding non-tagged values.** `obj_alloc` legitimately
- spills a rounded byte size and a free-list head pointer into its
- slots. `intern` / `make_string` / `make_symbol` spill raw byte
- pointers (into `:src_buf` or `:str_*` literal regions) that may
- not even be 8-aligned. These have arbitrary low 3 bits and don't
- look like tagged Lisp values.
-
-The standard answers in the literature:
-
-- **Precise stack maps.** Compiler emits per-callsite metadata
- (HotSpot, V8, Go, OCaml, SBCL). Best correctness, requires real
- codegen infrastructure.
-- **Strict slot discipline.** Every walked slot is always a tagged
- value or sentinel; raw values live in registers or non-walked
- storage. Used by classic SECD-style implementations.
-- **Conservative scanning.** Treat every machine word as potentially
- a pointer; check if it falls in heap range. Boehm–Demers–Weiser,
- Mono, MicroPython, JavaScriptCore (for the C++ stack).
-- **Mostly-precise / semi-conservative.** Precise for known-typed
- roots, conservative for the C stack. CPython's cstack bridge,
- SpiderMonkey for foreign frames.
-
-This implementation is mostly-precise (#4). The defenses live in
-two places:
-
-- **`PROLOGUE_Nk` zeros every slot it reserves** (`p1_gen.py`
- prologue emitters). An uninitialized slot deterministically reads
- as `0`, which has no heap tag and is filtered by the bounds check
- for free.
-- **`mark_value_pair` / `mark_value_object` bounds-check the
- candidate raw pointer** against `[heap_base, heap_next)` of its
- arena before any dereference. A stale stack value whose low 3 bits
- happen to match a heap tag is rejected unless its address actually
- lies in the live arena prefix.
-
-The bounds check is the BDW idea in miniature, scoped to the stack
-walk only. False positives (a random stack int that does fall in
-the heap range) cause leaks, not crashes — they pin a dead object
-until something perturbs the stack enough to overwrite the slot.
-
-### Upgrade path
-
-If/when leak measurement shows the conservative fallback pinning
-real memory, the clean answer is **two slot counts per frame**:
-`k_tagged` for GC-walked slots, `k_raw` for scratch the GC
-ignores. `obj_alloc` declares 0+2, `cons` declares 2+0,
-`intern` declares 0+3. The frame layout grows a second count
-word, every `PROLOGUE_Nk` site is reclassified, frame-relative
-offsets shift for the raw region, and the walker can drop both
-the slot zeroing and the bounds check. That's the right shape
-for the long-lived compiler; for the seed that bootstraps one
-C compiler, the cost outweighs the benefit.
-
-## Mark phase
-
-```
-mark_sp ← mark_stack_base
-for each root r: mark_object(r)
-while mark_sp > mark_stack_base:
- v ← pop(mark_stack)
- mark_object(v)
-```
-
-`mark_object(v)` dispatches on the low-3-bit tag:
-
-- **Fixnum, singleton**: no-op.
-- **Pair (`010`)**: compute bit index from pointer; if bit set,
- return (already marked). Else set bit, push `car` and `cdr` onto
- `mark_stack`.
-- **Vector (`011`)**: read header; if gc-flags bit 0 set, return.
- Else set it, read length, push every element.
-- **String (`100`)**: set mark bit; no children.
-- **Symbol (`101`)**: set mark bit; push name string and intern-chain
- link.
-- **Closure / primitive (`110`)**: set mark bit; closures push
- params list, body list, env pointer; primitives have no children.
-
-### Mark-stack overflow
-
-If `push(mark_stack)` would overflow, recurse by `CALL mark_object`
-on the overflowing value instead. P1 stack depth grows temporarily
-but every reachable object still gets marked. Compiler workloads
-should never hit this path; `06-gc-stress.scm` and
-`18-gc-deep-list.scm` will exercise it intentionally.
-
-## Sweep phase
-
-Two per-arena loops.
-
-### Pair arena sweep
-
-```
-p ← pair_heap_start
-while p < pair_heap_next:
- i ← (p - pair_heap_start) / 16
- if bit i of pair_mark_bitmap is set:
- clear bit i
- else:
- push p onto free_list_pair
- p ← p + 16
-```
-
-Tail rewind: track the last live pair address; if the tail of the
-arena is all-dead, rewind `pair_heap_next` to just past the last
-live. Optional for correctness; small win for fragmentation. Land
-it in step 4 only if the pair-arena free list grows pathologically
-in the stress tests.
-
-### Object arena sweep
-
-Every live headered object has a valid header. Free chunks carry a
-pseudo-header `type=0, gc-flags=0, length=chunk_size` so the walker
-strides uniformly.
-
-```
-p ← obj_heap_start
-while p < obj_heap_next:
- h ← LD [p]
- sz ← header_length(h)
- rounded ← round_up_8(sz)
- if header_type(h) != 0 and header_mark(h):
- clear mark bit
- else:
- if rounded ≤ 128:
- push p onto free_lists[class_of(rounded)]
- overwrite header as free pseudo-header
- else:
- coalesce with following chunk if also dead; write
- combined pseudo-header
- p ← p + rounded
-```
-
-Tail rewind: same optional logic; rewind `obj_heap_next` past the
-trailing all-dead region.
-
-Coalescing pass runs before the free-list push so adjacent dead
->128-byte runs merge into one. Free-list push happens after, for
-the sub-128 chunks.
-
-## Allocator integration
-
-`pair_alloc` and `obj_alloc` both follow the same template:
-
-```
-head ← free_list_head (for size class)
-if head != 0:
- free_list_head ← [head + 0]
- return head
-new ← bump_pointer
-if new + size > arena_end:
- call gc
- retry from top once
- if still no room: error "heap exhausted"
-bump_pointer ← new + size
-return new
-```
-
-`gc` is a normal P1 function that does mark+sweep and returns. It
-assumes GC-safety invariants hold at its entry (every live tagged
-value is either in a named BSS root or in a `PROLOGUE_Nk` slot
-somewhere on the fp chain). These invariants already hold in the
-existing codebase; the step-13 work adds none.
-
-## Implementation sequence
-
-Each step assembles through P1 → M1 → hex2 and passes
-`make test-lisp-all` before the next begins.
-
-1. **`p1_gen.py` frame change + `lisp.M1` offset shift.**
- ~150 Py LOC + mechanical `lisp.M1` edit. Rewrite
- prologue/epilogue/TAIL for amd64, aarch64, riscv64; shift every
- `sp`-relative offset in `lisp.M1`. Write `stack_bottom_fp = 0`
- in `_start`. **Gate**: every existing test passes on all three
- arches. No GC code yet.
-2. **Segregated heap + free-list scaffolding.** ~100 P1 LOC.
- Split heap labels; route `cons` through `pair_alloc`, others
- through `obj_alloc`; declare `free_list_pair` and `free_lists[10]`
- (heads only, no populating logic yet); overflow still routes to
- `alloc_oom`. **Gate**: existing tests pass.
-3. **Mark phase.** ~200 P1 LOC. `mark_stack[4096]`,
- `pair_mark_bitmap`, `mark_object`, root walk including fp-chain
- traversal. Add hidden `(gc-mark-only)` primitive that runs mark
- and returns live-object count. **Gate**: a test validates the
- count matches a known reachable-set size; hitting the mark-stack
- overflow path is exercised by `18-gc-deep-list.scm`.
-4. **Sweep phase + wire into allocator.** ~200 P1 LOC. Both
- per-arena sweeps, free-list population, free-chunk pseudo-headers,
- coalescing for >128 chunks. Replace the `alloc_oom` first-check
- branch with `gc → retry → error`. **Gate**: existing tests pass;
- free lists populated after first GC.
-5. **Stress tests.** Five new `.scm` files in `tests/lisp/`:
- - `17-gc-cons-churn.scm` — 100k pair allocations in a loop, verify
- deterministic checksum.
- - `18-gc-deep-list.scm` — 10k-element list, post-GC sum.
- - `19-gc-vector-churn.scm` — vector alloc/drop, exercises >128-byte
- coalesce.
- - `20-gc-closure-churn.scm` — nested lambdas creating closures
- faster than arena; validates env-chain rooting.
- - `21-gc-mixed.scm` — interleaved pair/vector/string/closure.
- **Gate**: all pass on `make test-lisp-all`.
-
-Rough total: ~700 P1 LOC + ~150 Py LOC + 5 test files.
-
-## Doc updates bundled with the code
-
-- **LISP.md decision 11**: rewrite to "Pair mark bits live in a
- per-pair-arena bitmap; object arena marks live in each object's
- `gc-flags` byte. Pairs stay 16 bytes."
-- **LISP.md §Heap layout**: describe the two-arena split.
-- **LISP.md §GC §Algorithm**: replace the single-walk description
- with per-arena sweep.
-- **LISP.md §GC §Roots**: replace conservative-scan language with
- the fp-chain walker description.
-- **P1.md §Semantics** (prologue / epilogue / TAIL sections, lines
- 230–269): new frame shape with `saved_fp` at `[sp+8]` and `k` at
- `[sp+16]`; updated frame size formula; per-arch mechanics updated.
-
-## Risks and mitigations
-
-- **Missed slot-offset shift in step 1.** Silent: a function with
- a stale `sp,8` access reads `saved_fp` as if it were a slot, and
- corrupts it on store. Mitigation: grep every `,sp,` pattern in
- `lisp.M1` before declaring step 1 done; run every test on every
- arch.
-- **GC fires during primitive with live values only in registers.**
- Primitives must honor the per-frame-slot root discipline. The
- existing code already does; step-13 work must not introduce new
- violations. Audit every new GC code path for this at review time.
-- **fp-chain consistency during mark-stack overflow recursion.**
- The recursive `CALL mark_object` pushes a new frame with its own
- `saved_fp`. The root walker must not re-walk frames it's already
- processed. Simple fix: the walker captures the chain length at
- entry and only walks down to that depth; frames above are the
- walker's own and are skipped.
-- **Free-list corruption from a bug in sweep.** Free chunks double
- as linked-list nodes via their first word. An off-by-one in sweep
- can overwrite a live object with a free-list link. Mitigation:
- stress tests `17` and `21` allocate + read back; any corruption
- surfaces as a crash or wrong checksum.
-
-## Remaining work
-
-Tracked here so the next pass at this code can pick them up without
-re-deriving the list.
-
-### Doc sync (blocks "step 13 complete")
-
-- [ ] `docs/LISP.md` decision 11: rewrite from "single bitmap over
- whole heap" to "pair-arena bitmap + object `gc-flags` byte;
- two segregated arenas."
-- [ ] `docs/LISP.md` §Heap layout: replace the single-arena
- description with the pair/obj split and the runtime-aligned
- `:pair_heap_base` / `:obj_heap_base` snapshots.
-- [ ] `docs/LISP.md` §GC §Roots: drop the conservative-stack-filter
- language; replace with the fp-chain walker + bounded-
- conservative-on-stack description (cross-ref §"Why
- mostly-precise" here).
-- [ ] `docs/LISP.md` §GC §Algorithm: replace the single-walk
- description with the per-arena sweep (bitmap-driven pair
- sweep, header-stride object sweep, free-list population,
- coalescing).
-- [ ] `docs/P1.md` §Semantics prologue/epilogue/TAIL: new frame
- shape with `saved_fp` at `[sp+8]` and `k` at `[sp+16]`;
- updated frame-size formula; note that PROLOGUE_Nk zeros
- each slot (cross-ref LISP-GC.md §"Per-arch implementation").
- Flag the aarch64 MOVZ-over-ADD detail as a load-bearing
- encoding gotcha.
-
-### Stress tests to reach the doc's step-5 set
-
-- [ ] Rename current `19-gc-closure-churn.scm` → `20-gc-closure-
- churn.scm` so numbering matches the plan in §Implementation
- sequence.
-- [ ] Add `19-gc-vector-churn.scm`. Currently the `>128`-byte
- coalescing path in `gc_sweep_obj` has *zero* test coverage;
- a loop that allocates and drops vectors of varying sizes
- (including some `>128` bytes) would hit it.
-- [ ] Add `21-gc-mixed.scm`. Interleaves pair/vector/string/closure
- allocations so pair and obj GC cycles overlap.
-- [ ] Rework `18-gc-deep-list.scm` (or add a companion test) to
- force the mark-stack-overflow recursive-`CALL mark_value`
- fallback. The current test is tail-recursive so the mark
- stack stays shallow; a non-tail-recursive build of a deep
- reachable list would actually exercise the overflow path.
-
-### Deferred by explicit decision (revisit under load)
-
-- [ ] Grow heap from the current bring-up 32 KB + 32 KB to the
- decision-4 size of 1 MB (pair) + 400 KB (obj), and
- eventually 20 MB total once the C-compiler workload lands.
- Keep the tiny bring-up sizes until then so GC stays cheap
- to trigger in tests.
-- [ ] Optional tail-rewind of `pair_heap_next` / `obj_heap_next`
- during sweep. Marked optional; only land if fragmentation
- shows up under load.
-- [ ] Upgrade mostly-precise → fully precise via two slot counts
- per frame (`k_tagged` + `k_raw`). Documented in §"Why
- mostly-precise" as the upgrade path; only land if leak
- measurement (next item) shows the conservative fallback
- pinning real memory.
-
-### Unmeasured invariants
-
-- [ ] Instrument leak rate: count live pairs / objects retained
- after each sweep over a churn workload. Today's claim that
- "false marks from the bounded-conservative stack scan are
- rare in practice" is unverified; a single counter would
- either close the concern or trigger the precise upgrade.
diff --git a/docs/LISP.md b/docs/LISP.md
@@ -1,581 +1,217 @@
-# LISP: Seed Lisp interpreter in P1
-
-## Goal
-
-A small Scheme-flavored Lisp interpreter written once in the P1 pseudo-ISA
-(see [P1.md](P1.md)) and assembled to three host arches (amd64, aarch64,
-riscv64) via the existing `M1` + `hex2` chain. Hosts the C compiler (see
-[PLAN.md](PLAN.md)) as a Lisp program; that compiler is used for both the
-seed userland ([SEED.md](SEED.md)) and tcc-boot. Target size: 4,000–6,000
-lines of P1.
-
-This document covers only the interpreter. The Lisp-language surface it
-exposes is specified by PLAN.md's "Feature floor" — this doc does not
-restate it.
-
-## Position in the chain
-
-```
-M1 asm → P1 pseudo-ISA → Lisp interpreter (this doc) → C compiler → seed + tcc-boot
-```
-
-The interpreter binary is the last artifact built from P1 source in the
-bootstrap. Everything above it is either Lisp source text or a C binary
-produced by the Lisp-hosted C compiler.
-
-## Settled decisions
-
-Load-bearing; the rest of the document assumes them.
-
-1. **Tagged cells, low-3-bit tag, 61-bit fixnum payload.** See
- §Value representation.
-2. **Mark-sweep GC over a fixed BSS arena.** 20 MB, linked in at build
- time, no `brk`/`mmap`.
-3. **Per-frame root slots on the P1 stack.** Scheme-facing functions
- declare `PROLOGUE_Nk` frames with one slot per live tagged value
- they need across an allocating call; the GC mark phase walks those
- frames. `global_env` and `symbol_table` remain named BSS roots
- (long-lived singletons with no stack frame to hang off of).
-4. **Source file via `argv[1]`.** The interpreter opens the path given
- on the command line, reads it whole into a string, parses it,
- evaluates top-to-bottom. No linked-in blob.
-5. **Optional REPL under `--repl`.** Development aid; not used by the
- bootstrap pipeline. ~100 P1 LOC.
-6. **No bignums, ever.** 61-bit fixnums only. Confirmed: the C compiler
- never needs larger.
-7. **Single `lisp.M1` source file.** M1 has no `#include` and
- concatenating shards in the Makefile adds build-time coupling we
- don't need yet. Revisit only if the file passes ~6k LOC.
-8. **`pmatch` is a built-in special form, not a user-level macro.**
- Saves writing a general macro expander for one feature.
-9. **Tail calls via P1 `TAIL`.** `eval` dispatches tail-position calls
- through `TAIL`; non-tail through `CALL`. Scheme-level tail-call
- correctness falls out for free.
-10. **Eight syscalls: `read`, `write`, `openat`, `close`, `exit`,
- `clone`, `execve`, `waitid`.** Matches PLAN.md. The last three
- let the Lisp program spawn M1/hex2 and act as the tcc-boot
- build driver; `openat(AT_FDCWD, …)` replaces bare `open`
- because aarch64/riscv64 lack it in the asm-generic table. No
- signals, no `lseek`, no `stat`.
-11. **Pair GC marks live in a separate bitmap**, not in the pair words.
- ~1.25 MB BSS for a 20 MB heap; keeps pairs at 16 bytes and keeps
- fixnums at 61 bits.
-12. **Fixnum overflow wraps mod 2^61.** `+`/`-`/`*` do not trap. C
- compiler workloads never approach 2^60.
-13. **`equal?` does not detect cycles.** Cyclic input loops forever;
- the C compiler never constructs cycles.
-14. **Reader attaches source locations via a side table.** Pair-pointer
- → `(line . col)` lookup, populated by the reader, consulted by the
- error path. ~200 LOC.
-15. **Closures are three-slot objects**: header + params + body + env.
- GC traces three fields; revisit only if closure churn measures hot.
-
-## Value representation
-
-All Lisp values are 64-bit tagged words. Low 3 bits are the tag:
-
-| Tag (bin) | Meaning | Payload |
-|-----------|----------------------|---------------------------------------|
-| `001` | fixnum | signed 61-bit, `>> 3` to recover |
-| `010` | pair pointer | 8-byte aligned, clear low 3 to deref |
-| `011` | vector pointer | idem |
-| `100` | string pointer | idem |
-| `101` | symbol pointer | idem, interned (`eq?`-comparable) |
-| `110` | closure / primitive | idem |
-| `111` | singleton constant | `()`, `#t`, `#f`, `eof`, `unspec` |
-
-Pointer tags require 8-byte-aligned allocation, enforced by the bump
-allocator.
-
-Singletons live in the `111` space as small immediates (e.g. `0x07` nil,
-`0x0F` #t, `0x17` #f, `0x1F` eof, `0x27` unspec). Distinguishable from
-fixnums by tag, from pointers by tag, and from each other by payload.
-
-### Pairs
-
-Two consecutive 8-byte words: `[car][cdr]`. No header. The tag on the
-pointer tells GC it is a pair and has no embedded length.
-
-### Headered objects
-
-Vectors, strings, symbols, closures, and primitives carry a 1-word header
-before their payload:
-
-```
-[ 8-bit type | 8-bit gc-flags | 48-bit length-or-arity ]
-```
-
-- **Vector**: header + `length` tagged words.
-- **String**: header + `length` bytes (ASCII, per PLAN.md). Not
- null-terminated; length is authoritative. Padded up to 8-byte
- alignment.
-- **Symbol**: header + name bytes + a 1-word intern-chain link. Two
- symbols are `eq?` iff their pointers match; the interner guarantees
- one object per name.
-- **Closure**: header (type 4) + `params` list pointer + `body` list
- pointer + `env` pointer. Arity in header.
-- **Primitive (fixed arity)**: header (type 5) + primitive code id.
- Arity in header low 48 bits.
-- **Primitive (variadic)**: header (type 6) + primitive code id.
- Arity field unused; body does its own min-arg check.
-
-The "code id" is an integer 1..N assigned at generation time. `apply`
-resolves it by cascade — a chain of `BEQ code, N, &prim_N_body` — so
-the seed doesn't need an indirect-call op in P1. A future refactor can
-swap the cascade for an indirect call once a `CALLR` op lands; the
-primitive object layout doesn't change.
-
-## Heap layout
-
-Single BSS region, fixed at link time:
-
-```
-:heap_start
- <20 MB of zero bytes>
-:heap_end
-```
-
-- `heap_next` — bump pointer, starts at `heap_start`.
-- `free_lists[10]` — free-list heads per size class (see §GC).
-
-### Allocation
-
-`alloc(size)` rounds `size` up to 8 bytes, consults the matching size-class
-free list first, otherwise bumps `heap_next`. If the bump would cross
-`heap_end`, run GC; if still no room, `error "heap exhausted"`.
-
-Everything is 8-byte aligned by construction.
-
-## GC
-
-Mark-sweep, stop-the-world, invoked only when the allocator overflows.
-Not triggered by time, step count, or explicit request.
-
-### Roots
-
-Three sources:
-
-1. **Global environment.** Fixed pointer in BSS (`global_env`), traced
- transitively.
-2. **Symbol table.** Intern table's backing array plus all interned
- symbols and their name strings. Fixed pointer (`symbol_table`).
-3. **Per-frame stack slots.** The mark phase walks the P1 stack from
- the current `sp` up to the saved frame-chain bottom. Each frame
- declares its slot count via `PROLOGUE_Nk` (see P1.md §"Semantics"),
- and the GC treats every slot word as a tagged-value root —
- non-pointer slots self-filter because their tags don't match any
- heap object.
-
-Return addresses and raw P1 integers on the stack sit outside the
-PROLOGUE_Nk slot range, so the walker doesn't see them.
-
-### Algorithm
-
-1. **Mark.** Traverse from roots, setting the `gc-flags` mark bit on
- each reachable headered object. Pairs have no header; their marks
- live in a separate bitmap indexed by
- `(pair_addr - heap_start) / 16`. Bitmap cost: ~1.25 MB for a 20 MB
- heap; removes the need to steal bits from pair data.
-2. **Sweep.** Walk the arena object-by-object. Live objects clear
- their mark and continue. Dead objects of size ≤ 64 bytes join the
- matching free list; larger dead objects are coalesced, and if the
- tail of the arena is all free, rewind `heap_next`. No object
- motion.
-
-### Size classes
-
-Free lists for 16, 24, 32, 40, 48, 56, 64, 80, 96, 128 bytes (ten
-classes). The low seven cover the common object sizes:
-
-| Class | Typical occupants |
-|-------|-----------------------------------------------------------------|
-| 16 | pairs (dominant), primitives, 0–8 char strings |
-| 24 | short symbols (1–8 char names), 9–16 char strings, 2-slot |
-| 32 | closures, medium symbols (9–16 char names), 3-slot vectors |
-| 40 | 4-slot vectors / env frames of 4 vars |
-| 48 | 5-slot vectors |
-| 56 | 6-slot vectors |
-| 64 | 7-slot vectors |
-| 80 | 9-slot vectors, env frames of 8 vars |
-| 96 | 11-slot vectors, env frames of 10 vars |
-| 128 | 15-slot vectors, env frames up to 15 vars |
-
-Pairs (16) dominate allocation volume in compiler workloads; closures
-(32) are a fixed size so get their own class. The 80/96/128 classes
-catch local-env vectors for functions with 8–15 bound names, which
-would otherwise only be reclaimed by bump-pointer rewind.
-
-Allocations above 128 bytes go through the bump pointer on every
-cycle — typically long string literals and large vectors, both
-uncommon in compiler workloads.
-
-### Invariants
-
-- Between any two GC-observable points, every live Scheme value must be
- reachable from `global_env`, `symbol_table`, or a PROLOGUE_Nk slot
- in an active frame. "GC-observable" = any call to `cons`,
- `make-vector`, `make-string`, `make-closure`, `intern`, or another
- Scheme function.
-- Scheme-facing functions spill each live tagged value into a frame
- slot before any allocating call. A value kept only in a P1 register
- will be lost if GC fires mid-call.
-
-Tradeoffs taken:
-
-- Mark bitmap costs memory but keeps pair encoding simple.
-- Stop-the-world is fine: single-threaded, and the C compiler is not
- latency-sensitive.
-- No compaction: fragmentation is acceptable for a ≤20 MB heap with a
- pair-dominated workload.
-
-## Environment
-
-Two layers:
-
-- **Global env.** Open-addressing hash table keyed by symbol pointer
- identity. 4096 slots, linear probe. Holds bindings established by
- top-level `define`.
-- **Local env.** A chain of frames. Each frame is a pair
- `((names-list . values-vector) . parent-frame)`. Lookup walks up
- until a name matches, then falls through to the global hash.
-
-Hash over a symbol pointer: `(ptr >> 3) & (TABLE_SIZE - 1)`. Collision
-rate is low — the intern table guarantees one pointer per name, and
-Lisp source is sparse in identifiers.
-
-`define` at top level writes to the global hash; `define` inside a body
-is rewritten to `let`-shape at eval time (standard single-pass
-practice).
-
-## Eval
-
-Textbook meta-circular recursive-descent evaluator. Each form has a case:
-
-```
-eval(expr, env):
- if self-evaluating: return expr
- if symbol: return lookup(expr, env)
- match car(expr):
- 'quote → return cadr(expr)
- 'if → eval test; TAIL eval selected branch
- 'begin → eval each non-last; TAIL eval last
- 'lambda → return make-closure(params, body, env)
- 'define → bind in env; return unspec
- 'set! → mutate binding; return unspec
- 'let/let*/letrec → desugar to lambda application; TAIL eval
- 'cond → desugar to nested if; TAIL eval
- 'quasiquote → expand; TAIL eval
- 'pmatch → compile arm chain; TAIL eval selected arm
- otherwise → eval callee + args; TAIL apply
-```
-
-`TAIL` above compiles to a literal P1 `TAIL` — the current `eval`
-frame is torn down before the recursive call. Non-tail positions
-(test of `if`, non-last of `begin`, callee/arg evaluation of an
-application) use P1 `CALL`.
-
-### Application
-
-`apply(callee, args-list)`:
-
-- **Primitive:** load function pointer from header, pack args into a
- fixed `argv` region in BSS, `CALL` the primitive, return its result.
-- **Closure:** construct a new frame binding params to args; `TAIL eval`
- the body in the new env.
-
-Arity mismatches raise `error`.
-
-## Reader
-
-Recursive-descent over the in-memory source string with an integer
-cursor. No port abstraction.
-
-```
-read(src-string, cursor) → (value, new-cursor)
-```
-
-Tokens are produced on demand; no token array.
-
-Forms recognized:
-
-- `( ... )` — list, with `.` for improper tail
-- `' expr` → `(quote expr)`
-- `` ` expr `` → `(quasiquote expr)`
-- `, expr` → `(unquote expr)`
-- `,@ expr` → `(unquote-splicing expr)`
-- `#( ... )` — vector literal
-- `#t`, `#f`
-- integer literals: decimal (`-42`, `123`) and hex (`0x7F`)
-- character literal: `#\a`, `#\space`, `#\newline`, `#\tab`
-- string literal: `"..."` with `\n \t \\ \"` escapes
-- symbol: every other `[!-~]` run not starting with a digit
-- `;` — line comment, skipped
-
-Reader output conses directly into the heap; the spill-slot discipline
-applies.
-
-### Source locations
-
-The reader maintains a side table mapping pair pointer → `(line . col)`
-of the opening token for that form. The table is an open-addressing
-hash keyed by pair pointer identity; it is a GC root so locations
-survive across collections, and entries for dead pairs are dropped
-during sweep (a location whose pair pointer is unmarked is itself
-dead). `error` consults this table when its argument is a pair and
-prefixes the diagnostic with `"at <file>:<line>:<col>: "`.
-
-Cost: ~150 LOC in the reader, ~50 LOC in the error path and GC sweep.
-Pays for itself when the C compiler mis-parses a tcc-boot source.
-
-## Printer
-
-Three entry points:
-
-- `display` — no quoting on strings; bare character output.
-- `write` — readable form: strings quoted, characters prefixed.
-- `format fmt args...` — supports `~a` (display), `~s` (write), `~d`
- (decimal fixnum), `~%` (newline). No other directives.
-
-Output buffered in a 4 KB BSS buffer; flushed on full or on newline for
-`display`/`write`. `error` flushes before writing to fd 2.
-
-## Symbol interning
-
-Single open-addressing hash table with linear probe. Hash:
-
-```
-h = 0
-for each byte b in name:
- h = (h * 31 + b) & 0xFFFFFFFF
-return h & (TABLE_SIZE - 1)
-```
-
-`TABLE_SIZE = 4096`. Invoked on every symbol the reader sees; the
-table is a GC root.
-
-## Primitives FFI
-
-Each primitive is an ordinary P1 function:
-
-- **In:** `r1` = argc, `r2` = pointer to argv (tagged words,
- back-to-back).
-- **Out:** `r0` = result tagged word.
-
-At startup, the interpreter walks a static `(name, len, code, arity,
-type)` table and, for each row, interns the name, allocates a
-primitive heap object (type 5 fixed or type 6 variadic, carrying the
-code id), and binds it in `global_env`. Primitives are first-class
-values.
-
-`apply` dispatches by cascade on the code id: a straight-line chain of
-`BEQ r_code, N, &prim_N_body` — one entry per primitive. The cost is
-O(n) per call; acceptable for seed workloads. Can be swapped for an
-indirect call later (add `CALLR` to p1_gen.py, store `&prim_N_body`
-directly in the object at +8) without touching any primitive body.
-
-Primitives allocate through the same `alloc`/GC path as user code and
-must honor the per-frame-slot root discipline when they make more than
-one allocation.
-
-Target set: ~40 primitives per PLAN.md. Roughly:
-
-- **List/pair**: `cons car cdr pair? null? list? list length append
- reverse assoc member`
-- **Arithmetic**: `+ - * / % = < > <= >= zero? negative? positive?
- abs min max`
-- **Bitwise**: `bit-and bit-or bit-xor bit-not arithmetic-shift`
-- **String**: `string? string-length string-ref substring
- string-append string->symbol symbol->string`
-- **Vector**: `make-vector vector-ref vector-set! vector-length
- vector->list list->vector`
-- **Type predicates**: `number? symbol? string? vector? procedure?
- eq? eqv? equal?`
-- **I/O**: `read-file write-file display write newline format error`
-- **Control**: `apply`
-
-`map`, `filter`, and `fold` are higher-order and would force
-primitives to re-enter `eval`. They live in a short Scheme prelude
-instead (evaluated by `_start` after the primitive registration loop);
-not P1 code.
-
-## `pmatch`
-
-Built-in special form. Pattern language:
-
-- literal fixnum / string / symbol — matches by `equal?`
-- `?var` — bind variable
-- `_` — wildcard
-- `(p1 p2 ...)` — structural list match
-- `(p1 . p2)` — improper-list split
-- `(quote x)` / `'x` — symbol literal match
-- `(guard expr)` — predicate after a match
-
-```
-(pmatch expr
- [pattern₁ body₁]
- [pattern₂ body₂]
- ...
- [else body-else])
-```
-
-Compiled at eval time into a chain of test / bind / `TAIL eval body`
-operations. Compilation result is cached in the AST after first
-expansion so a `pmatch` site does not recompile per execution.
-
-## Error path
-
-`(error msg arg ...)`:
-
-1. Flush any pending printer buffer.
-2. Write `"error: "` + the message + formatted args to fd 2.
-3. `exit(1)`.
-
-No unwinding, no continuations, no handlers. The C compiler either
-succeeds or exits with diagnostics.
-
-## Startup
-
-```
-_start:
- argc at [sp + 0], argv at [sp + 8] (per P1 entry conv)
- if argc < 2: error "usage: lisp <file.scm> | --repl"
- if argv[1] == "--repl": repl_loop()
- else: run_file(argv[1])
- exit(0)
-```
-
-`run_file` reads the whole file into a string, then loops:
-parse one top-level form, eval it, discard, advance cursor — so a
-`define` of a helper is visible before the next form that uses it.
-
-### REPL
-
-Under `--repl`:
-
-```
-loop:
- write "> " to fd 1
- read one line from fd 0 into scratch buffer
- on EOF: exit 0
- parse one expression from the line
- eval; write result; newline
- goto loop
-```
-
-~100 LOC. Development aid; not invoked by the bootstrap pipeline.
-
-## File layout
-
-```
-lispcc/
- lisp.M1 # single P1 source, target 4–6k LOC
- tests/lisp/
- 00-arith.scm
- 01-list.scm
- 02-reader.scm
- 03-printer.scm
- 04-eval.scm
- 05-tailcall.scm
- 06-gc-stress.scm
- 07-pmatch.scm
- 08-records.scm
- ...
-```
-
-Tests are `.scm` files the interpreter runs; pass = exit 0 and expected
-stdout. `make test-lisp` loops over them; `make test-lisp-all` runs
-each test on all three arches via the P1 differential harness.
-
-## Staged implementation plan
-
-Each step assembles through existing P1 → M1 → hex2 and carries test
-coverage before the next begins.
-
-Status legend: `[x]` done · `[~]` in progress · `[ ]` not started.
-
-1. [x] **Runtime skeleton.** `_start`, argv parsing, `read-file`/`write-file`
- via syscalls, `error` path, BSS layout, bump allocator. Exits with
- the file's first byte as proof-of-life. ~200 P1 LOC.
-2. [x] **Tagged values.** Fixnum/pair/singleton encoding; `cons`/`car`/`cdr`/
- tag predicates. Hand-build `cons(42, nil)` and exit with the decoded
- car as status. ~300 LOC.
-3. [x] **Strings + symbol interning.** Heap strings, 4096-slot intern table,
- symbol allocator. Two `(intern "foo")` calls return `eq?` pointers.
- ~400 LOC.
-4. [x] **Reader (core).** End-to-end `source → sexpr` for lists, decimal
- fixnums, interned symbols, `;` comments, with line/col tracking for
- diagnostics. Extended syntax (quotes, `#\char`, `#( … )`, improper
- `.` tail, hex/negative fixnums) and the source-location side table
- land in later steps. (`#t`/`#f` landed in 10c; string literals in
- 10e.) ~500 LOC.
-5. [x] **Printer.** `display`, `write`, minimal `format`. Closes a
- read-print cycle. ~300 LOC.
-6. [x] **Eval (non-tail).** Self-evaluators, lookup, `if`, `begin`,
- `lambda`, `define`, application via P1 `CALL`. ~600 LOC.
-7. [x] **`TAIL` integration.** Rewrite tail-position `eval` calls to P1
- `TAIL`. Stress: loop of 10^6 self-calls with flat P1 stack.
-8. [x] **`argv[1]` file read.** Replace the embedded `src_text` blob with a
- syscall path: open and read the file named on the command line into
- a heap string; `error` on `argc < 2`.
-9. [x] **Test harness (`tests/lisp/`).** Add `.scm` fixtures plus `make
- test-lisp` (single arch) and `make test-lisp-all` (tri-arch diff);
- pass = exit 0 and expected stdout. Locks in regression coverage
- before the feature surface grows.
-10. [x] **Primitives (~40).** Broken into sub-steps; ~1200 P1 LOC total
- (~200 harness + ~1000 primitives). Earlier ~500 estimate was
- optimistic.
- - [x] **10a. FFI harness.** `make_primitive` constructor (type 5
- fixed, type 6 variadic). Fork in `apply` on header type:
- closure → existing path; prim-fixed → count, arity-check,
- marshal argv, cascade-dispatch on code id; prim-variadic →
- marshal + dispatch, no arity check. `prim_argv[]` BSS buffer
- (64 slots × 8 B).
- - [x] **10b. Registration.** Static `(name, len, code, arity, type)`
- table; `_start` walks it, interning each name and binding a
- fresh primitive object in `global_env`.
- - [x] **10c. Arithmetic + bitwise + tag predicates.** 27 primitives:
- `+ - * / % = < > <= >= zero? negative? positive? abs min max
- bit-and bit-or bit-xor bit-not arithmetic-shift number?
- symbol? string? vector? procedure? eq?`. Predicates return
- `#t`/`#f` — so a minimal slice of step 11 (reader + printer
- support for the `#t`/`#f` singletons) lands here too.
- - [x] **10d. List-core primitives.** `cons car cdr pair? null? list
- length list? append reverse assoc member`.
- - [x] **10e. String primitives.** `string? string-length string-ref
- substring string-append string->symbol symbol->string`.
- - [x] **10f. Vector primitives.** `make-vector vector-ref
- vector-set! vector-length vector->list list->vector`. Adds
- `make_vector` runtime helper (absent before 10f). `make_vector`
- needs to mask the header type byte when reloading length into
- the fill-loop counter; otherwise the loop walks off the arena.
- - [x] **10g. I/O + `equal?` + `apply`.** `display write newline
- format error read-file write-file` wrap existing labels;
- `format` dispatches `~a ~s ~d ~%`. `equal?` recurses on
- pairs/strings/vectors; non-allocating. `apply` primitive
- re-enters the internal `apply` label.
- - [x] **10h. Lisp prelude.** Scheme definitions of `map`, `filter`,
- `fold` embedded as adjacent `"…"` chunks between `:prelude_src`
- and `:prelude_src_end`; `_start` calls `eval_source` on the span
- before the script eval. M1's `"…"` form appends a NUL to each
- chunk, so `skip_ws` treats `\0` as whitespace — the chunks stitch
- together transparently. Gate test `16-prelude.scm` passes on all
- three arches.
-11. [x] **Reader extensions.** `'`/`` ` ``/`,`/`,@`, `#\char`, `#( … )`,
- improper `.` tail, hex and negative fixnums. (`#t`/`#f` landed in
- 10c; string literals in 10e.) Source-location side table
- explicitly deferred to step 16.
-12. [x] **Eval extensions.** `set!`, `let`/`let*`/`letrec`, `cond`,
- `quasiquote`; inner `define` → `letrec`-shape rewrite.
-13. [x] **Mark-sweep GC.** Mark bitmap, root discipline, sweep, size-classed
- free lists. Stress: cons-churn allocating 1000× heap in flight.
- ~500 LOC.
-14. [ ] **`pmatch` + records-via-vectors.** ~300 LOC.
-15. [ ] **REPL.** `--repl` flag, line reader, eval-print loop. ~100 LOC.
-16. [ ] **Source-location side table.** Pair-pointer → `(line . col)` hash
- populated by the reader, consulted by `error`, swept alongside
- pairs so dead entries drop. ~200 LOC.
-17. [ ] **End-to-end.** Run a 200-LOC Scheme test program exercising all
- features; diff tri-arch output.
-
-Rough rollup: ~5,300 P1 LOC (step 10 bumped from ~500 to ~1200 based
-on 10a–10c drafting), still mid-range in PLAN.md's 4–6k budget.
+# Minimal Scheme subset (lispcc)
+
+Working doc. Baseline is R7RS-small (2013); everything here is a delta
+against it. Intent: smallest surface that supports writing a self-hosted
+compiler and shell-style scripts comfortably. Things cut can always come
+back; things admitted are load-bearing.
+
+## Lexical syntax
+
+- Identifiers: standard Scheme grammar, case-sensitive. Includes
+ `+ - * / ! ? < > = _ & : ~ ^` and `->` in names. **No** `|…|`
+ vertical-bar quoted identifiers. A lone `.` is **not** a symbol —
+ it's reserved for dotted-pair syntax.
+- Booleans: `#t`, `#f`.
+- Integers: decimal (`42`, `-7`) and hex (`#xff`, `#x-1a`). Word-size
+ only — 32-bit on 32-bit targets, 64-bit on 64-bit targets. **No** `#o`
+ (octal), **no** `#b` (binary), **no** floats, rationals, complex, or
+ bignums. Overflow is undefined (see "Primitive failure" below).
+- Strings: `"…"` with escapes `\n \t \r \\ \"`. A string *is* a
+ bytevector; indexing is by u8. **String literals are immutable** —
+ `bytevector-u8-set!` on a literal is undefined.
+- Characters: `#\a` through `#\~` for printable ASCII, plus named forms
+ `#\newline` (10), `#\space` (32), `#\tab` (9), `#\return` (13),
+ `#\null` (0), and `#\xNN` for any byte. A character literal *is* a
+ u8 integer — no distinct character type. `(= #\a 97)` is `#t`.
+- Symbols: bare (`foo`, `list->bytes`). **Globally interned** — two
+ symbols that print the same are `eq?`.
+- Pairs / lists: `(a b c)`, `(a . b)`, `'()` for the empty list. Literal
+ pairs are immutable.
+- Quote: `'x` ≡ `(quote x)`. **No** quasiquote / unquote / unquote-splicing.
+- Comments: `;` to end of line. **No** `#| … |#` block, **no** `#;datum`.
+
+## Types
+
+The runtime knows about exactly these:
+
+| Type | Notes |
+|--------------|----------------------------------------------------|
+| boolean | `#t`, `#f` |
+| integer | word-size; 32-bit or 64-bit per target |
+| symbol | globally interned; `eq?`-comparable |
+| string / bv | same type; contiguous `u8[]`; literals immutable |
+| pair | cons cell; literals immutable |
+| empty list | `'()`, disjoint from pair |
+| procedure | |
+| record | via `define-record-type` |
+
+Characters are **not** a separate type — `#\a` is lexical sugar for the
+u8 value 97.
+
+**Not present:** vectors, ports (as a language type), promises,
+parameters (dynamic-scoped values, distinct from function args),
+continuations, multiple values, exceptions.
+
+## Special forms
+
+- `(define name value)` and `(define (name arg ...) body ...)`, including
+ variadic tails: `(define (f . rest) …)` and `(define (f a b . rest) …)`.
+ Top-level `define`s may forward-reference each other — mutual
+ recursion at module scope is supported. Inside a `lambda`/`let`
+ body, `define`s behave like `letrec*` (sequential, visible to
+ later forms).
+- `(lambda (arg ...) body ...)` with the same `.`-tail syntax.
+ A `lambda` evaluates to a closure: a first-class procedure value that
+ captures its enclosing lexical environment by reference. `set!` on a
+ captured variable is visible to every closure that captured it, and
+ closures that escape their creating frame outlive it (general case:
+ heap-allocated; non-escape optimization is an implementation detail).
+- `(if test then else)` — `else` required.
+- `(cond (test body ...) ... (else body ...))`.
+- `(and e ...)` / `(or e ...)` — short-circuiting; return the deciding
+ value, not a coerced boolean.
+- `(let ((x v) ...) body ...)`, `(let* …)`, `(letrec …)`.
+- `(let name ((x v) ...) body ...)` — named let.
+- `(begin e ...)`.
+- `(set! name value)`.
+- `(quote datum)` / `'datum`.
+- `(define-record-type name (constructor field ...) predicate
+ (field accessor [mutator]) ...)` — R7RS form. Creates a disjoint
+ type with the given constructor, predicate, and field
+ accessors/mutators. Implementable as a single heap-allocated object
+ with a type-descriptor pointer; does not require exposing vectors.
+- `(pmatch expr clause ...)` — pattern matcher. *Not* R7RS; added
+ because the self-hosted compiler destructures s-expressions
+ constantly and a macroless language makes hand-rolled `car`/`cdr`
+ chains painful. Clause forms:
+
+ ```
+ (pattern body ...)
+ (pattern (guard g-expr ...) body ...)
+ (else body ...)
+ ```
+
+ Patterns:
+
+ | Pattern | Matches |
+ |----------------------|------------------------------------------|
+ | `()` | the empty list |
+ | `<literal>` | integers, strings, chars, booleans by `equal?` |
+ | `<symbol>` | that exact symbol (*not* a binder) |
+ | `,<ident>` | anything; binds to `<ident>` |
+ | `,_` | anything; no binding (wildcard) |
+ | `(p1 p2 ...)` | proper list of exactly that length |
+ | `(p1 ... . ptail)` | improper list; `ptail` binds the rest |
+
+ Clauses are tried top-to-bottom; first matching pattern with
+ truthy guards wins. A guard that evaluates to `#f` is treated as
+ no-match and falls through to the next clause. **No match with no
+ `else`** is undefined behavior (follows the primitive-failure
+ policy); put an explicit `(else (error ...))` when you care.
+
+ Example:
+
+ ```scheme
+ (pmatch e
+ ((quote ,x) (emit-literal x))
+ ((if ,t ,a ,b) (emit-if t a b))
+ ((lambda ,ps . ,body) (emit-lambda ps body))
+ (,x (guard (symbol? x)) (emit-var x))
+ (else (error "unknown form" e)))
+ ```
+
+## Core procedures
+
+Flat namespace — no library carving for v1.
+
+**Predicates / equality**
+`eq?`, `equal?`, `not`, `boolean?`, `pair?`, `null?`, `symbol?`,
+`integer?`, `string?` (≡ `bytevector?`), `procedure?`.
+
+**Pairs & lists**
+`cons`, `car`, `cdr`, `set-car!`, `set-cdr!`, `list`, `length`,
+`reverse`, `append`, `list-ref`, `map`, `for-each`.
+
+**Integers** (word-size; overflow undefined)
+`+ - *`, `quotient`, `remainder`, `modulo`, `= < <= > >=`, `zero?`,
+`positive?`, `negative?`, `abs`, `min`, `max`,
+`bit-and`, `bit-or`, `bit-xor`, `bit-not`, `arithmetic-shift`,
+`number->string` (optional radix; 10 and 16 required),
+`string->number` (optional radix; returns `#f` on parse failure — this
+is a pure op, not a syscall, so the normal `(ok . val)` convention
+doesn't apply).
+
+Division rounding (R7RS semantics) — worth stating since target ISAs
+differ at the instruction level:
+
+ - `quotient` truncates toward zero: `(quotient -7 2) ⇒ -3`
+ - `remainder` has the sign of the *dividend*: `(remainder -7 2) ⇒ -1`
+ - `modulo` has the sign of the *divisor*: `(modulo -7 2) ⇒ 1`
+ - Identity: `(= n (+ (* (quotient n d) d) (remainder n d)))`
+
+**Strings / bytevectors**
+`make-bytevector`, `bytevector-length`, `bytevector-u8-ref`,
+`bytevector-u8-set!`, `bytevector-copy` (optional start/end → fresh bv),
+`bytevector-copy!` (R7RS arg order: `dst dst-start src [src-start src-end]`),
+`bytevector-append` (variadic; fresh bv),
+`bytevector=?` (binary; byte-for-byte equality),
+`string->symbol`, `symbol->string`.
+
+**Control**
+`apply`. Tail calls are guaranteed proper on all targets — named let and
+every looping idiom depend on it.
+
+**Program**
+`(argv)` returns the process's command-line arguments as a list of
+bytevectors. The first element is the program name as passed by the
+kernel (argv[0]); user arguments start at the second element.
+
+**Errors**
+`error` aborts with a message and optional irritants. No `raise` /
+`guard` / handlers.
+
+**Primitive failure** is undefined behavior / crash. `(car '())`,
+`(quotient 1 0)`, out-of-bounds `bytevector-u8-ref`, integer overflow,
+mutating a literal — all implementation-defined, expect a crash. Callers
+should not rely on any particular outcome.
+
+## Cut from R7RS-small
+
+Kept explicit so additions are deliberate.
+
+| Feature | Why cut |
+|-------------------------------------------|----------------------------------------------|
+| `syntax-rules`, macros, `define-syntax` | library is written without them for v1 |
+| `call/cc`, `dynamic-wind` | big runtime cost, not needed here |
+| `values`, `call-with-values` | replaced by `(ok . val)` pair convention |
+| Numeric tower (rational/real/complex/float)| integers only |
+| Bignums | word-size integers only (32- or 64-bit) |
+| Character *type* (char? char->integer etc)| `#\a` syntax kept as u8 sugar; no char type |
+| Vectors (`#(…)`, `make-vector`, etc.) | lists for sequences, records for structs |
+| First-class ports | shell.scm defines its own |
+| `quasiquote` / `` ` `` `,` `,@` | sugar; re-add with macros |
+| `case`, `when`, `unless`, `do` | sugar over `cond`/`if` |
+| `delay`, `force`, promises | — |
+| `make-parameter`, `parameterize` | — |
+| `guard`, `raise`, `with-exception-handler`| v1 uses `(ok . val)` |
+| `define-library`, `import`, `export` | files are files for v1 |
+| `include`, `cond-expand` | — |
+| `case-lambda` | no arity overloading |
+| `#o…` `#b…` `#;` `#\| \|#` | keeps the lexer small (`#\…` is kept) |
+| `eqv?` | `eq?` + `equal?` are enough |
+
+## Dependencies of shell.scm on this subset
+
+Sanity check that the subset actually supports the library we've written:
+
+- variadic `.` in `lambda` / `define` → `spawn`, `run`
+- named let → `refill!`-driven read loops, `bv-concat-reverse`
+- `define-record-type` → `port` record
+- `apply` → `run` calling `spawn`
+- `bit-or`, `bit-and`, `arithmetic-shift` → openat flags, wait-status decode
+- `make-bytevector` with fill byte → `NL-BV`
+- `bytevector-copy` (3-arg) and `bytevector-copy!` → read/write paths
+- `cons`, `car`, `cdr`, `null?`, `zero?`, `=`, `<`, `-`, `+` → everywhere
+
+Anything flagged here that we ultimately cut forces a rewrite of
+shell.scm.