boot2

Playing with the boostrap
git clone https://git.ryansepassi.com/git/boot2.git
Log | Files | Refs

commit 2dfba4fcce1a94138f06d44d816896ca76f89065
parent 955cc50921de523481bdfc77a37f49725a4cb554
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Thu, 23 Apr 2026 20:08:40 -0700

lisp docs

Diffstat:
Adocs/LISP-C.md | 548+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Ddocs/LISP-GC.md | 538-------------------------------------------------------------------------------
Mdocs/LISP.md | 798++++++++++++++++++++++---------------------------------------------------------
3 files changed, 765 insertions(+), 1119 deletions(-)

diff --git a/docs/LISP-C.md b/docs/LISP-C.md @@ -0,0 +1,548 @@ +# LISP: Scheme interpreter in C + +## Goal + +A small tree-walking interpreter for the Scheme subset specified in +[LISP-SUBSET.md](LISP-SUBSET.md). Intent: simplicity and correctness +first; performance is a later concern. + +The self-hosted C compiler (see [PLAN.md](PLAN.md)) will run under +this interpreter, as will shell-style scripts built on +[shell.scm](../../shell.scm) and +[prelude.scm](../lisp/prelude.scm). + +## Settled decisions + +Load-bearing; the rest of the document assumes them. + +1. **Tree-walking eval/apply over raw s-expressions.** No AST + preprocessing, no bytecode, no compilation step. The reader's + output is the interpreter's input. +2. **Static buffers, no GC.** Fixed-size BSS regions for heap, symbol + table, reader buffer, writer buffer. Heap exhaustion aborts with + an error. +3. **Bump allocation, 8-byte aligned.** Single `heap_next` pointer; + every object is 8-byte aligned by construction. +4. **Tagged values, low-3-bit tag.** See §Value representation. +5. **Symbols are interned indices.** No heap object per symbol; the + value is an index into the symbol table, which stores the name + bytevector. +6. **Environment is a flat association list.** `((sym . val) ...)`, + extended by consing onto the front. Lookup walks the list; falls + through to the global slot in the symbol table. +7. **Closures capture the env pointer directly.** No free-variable + analysis, no flat closures, no boxes. A closure is four words: + `[header, params, body, env]`. +8. **No `set!` on captured variables.** Language-level rule. The + implementation does not enforce it; behavior is unspecified if + violated. Use a pair for shared mutable state. +9. **Inner `define` is sequential-only.** Visible to later forms in + the same body, not to earlier forms. Use `letrec` for mutual + recursion between local helpers. +10. **`pmatch` is a built-in special form.** Saves writing a macro + expander for one feature. +11. **Tail calls via host `musttail`.** All tail positions in `eval` + and `apply` are marked. Scheme-level tail-call correctness falls + out. +12. **Primitive failure is undefined behavior.** No runtime checks on + `(car '())`, `(quotient n 0)`, out-of-bounds bytevector access, + integer overflow, or mutation of literals. Follows the spec's + "Primitive failure" policy. + +## Memory layout + +Single BSS region, fixed at link time. Starting sizes: + +| Region | Size | Purpose | +|---------------|----------|--------------------------------------| +| `heap[]` | 64 MB | bump-allocated Scheme objects | +| `symtab[]` | 16384 × | open-addressing intern table | +| `readbuf[]` | 16 MB | slurped source file | +| `writebuf[]` | 64 KB | staged output for display/write | +| `argbuf[]` | 64 × 8 B | primitive dispatch scratch | + +All sized via `#define`s; easy to retune. Overflow of any buffer → +`error(...)` → `exit(1)`. + +### Symbol table entry + +```c +struct sym_entry { + val_t name_bv; // interned name, heap bytevector + val_t global_val; // top-level binding (or UNBOUND) + uint32_t name_hash; + uint32_t chain_next; +}; +``` + +`global_val` gives O(1) top-level lookup: every symbol's global +binding is reachable without hash probing once the symbol is resolved. + +The symbol table and heap are the only global roots. + +## Allocation + +`alloc(bytes)` rounds up to 8, checks `heap_next + bytes <= heap_end`, +advances the bump pointer, returns the pre-bump address. Overflow → +`error("heap exhausted")` → `exit(1)`. + +8-byte alignment is enforced by the bump allocator; every object +constructor assumes it. + +## Value representation + +All values are 64-bit tagged words. Low 3 bits are the tag. + +| Tag | Kind | Notes | +|-------|----------------------------|--------------------------------------------| +| `000` | fixnum | 61-bit signed; `>> 3` to recover | +| `001` | pair pointer | 16 B aligned; `[p-1]` car, `[p+7]` cdr | +| `010` | symbol | `(idx << 3) | 2`; idx into `symtab` | +| `011` | headered heap object | `[p-3]` is the header word | +| `100` | immediate singleton | `#f`, `#t`, `()`, unspec, unbound | +| `101`–`111` | reserved | | + +Immediates (bit patterns chosen at implementation time, all tag `100`): + +| Constant | Meaning | +|-----------|---------------------------------------------| +| `FALSE` | the Scheme `#f` | +| `TRUE` | the Scheme `#t` | +| `NIL` | the empty list `()` | +| `UNSPEC` | result of `set!`, `define`, etc. | +| `UNBOUND` | sentinel for undefined globals | + +Truthiness: every value except `FALSE` is truthy. Test is a single +compare. + +## Object types + +This is the complete runtime type set. + +| Type | Representation | +|------------------|----------------------------------------------------| +| fixnum | immediate (tag 000) | +| boolean | immediate (`FALSE`, `TRUE`) | +| empty list | immediate (`NIL`) | +| unspec / unbound | immediates | +| pair | `[car][cdr]`, 16 B, tag 001, no header | +| bytevector | `[hdr = BV | len<<8][bytes...]`, tag 011 | +| closure | `[hdr = CLOSURE][params][body][env]`, tag 011 | +| primitive | `[hdr = PRIM | code_id<<8]`, tag 011 | +| record-td | `[hdr = TD][name_sym][nfields]`, tag 011 | +| record | `[hdr = REC][td][field_0]...[field_{n-1}]`, tag 011| + +Headered object header word: + +``` +[ type:8 | aux:56 ] +``` + +where `aux` carries type-specific data (length for bytevectors, +code id for primitives, nfields count for records, etc.). One load +tells you the type. + +Strings and bytevectors are the same type. `string?` ≡ `bytevector?`. +No character type: `#\a` in source is sugar for fixnum 97. + +## Closures + +``` +[ hdr = CLOSURE ][ params ][ body ][ env ] +``` + +- `params`: parameter list s-expression: `(a b c)`, `(a b . rest)`, or + a bare symbol `rest` for a fully variadic lambda. +- `body`: the list of body forms, evaluated sequentially; last in + tail position. +- `env`: the environment chain captured at `lambda` evaluation. + +Four words total. No free-var analysis at closure creation — the whole +env chain is retained and walked at lookup time. + +## Environment + +``` +env ::= NIL + | ( (sym . val) . env ) +``` + +- **Lookup.** Walk the list, pointer-compare the car of each pair with + the target symbol, return the cdr on match. If the walk ends at + `NIL`, fall through to `symtab[idx].global_val`; if that is + `UNBOUND`, error out with `"unbound variable"`. +- **Extension** (scope entry). Cons new `(sym . val)` pairs onto the + front. +- **`set!`.** Walk the env chain; on match, `set-cdr!` the pair. If + the chain is exhausted, write to `symtab[idx].global_val`. +- **Top-level `define`.** Write to `symtab[idx].global_val`. +- **Inner `define`.** Handled inline by `eval_body` (see below). + Each inner define prepends `(name . val)` to the local env as it + is encountered. + +The global env is not a chain — it's the symbol table. Closures +captured before a top-level `define` nonetheless see the new binding, +because the fallback is to `symtab[idx].global_val` at lookup time, +not at capture time. This is how mutual recursion between top-level +functions works without any pre-pass. + +## Eval + +```c +val_t eval(val_t expr, val_t env) { + switch (tag(expr)) { + case TAG_FIXNUM: return expr; + case TAG_IMM: return expr; + case TAG_SYM: return env_lookup(expr, env); + case TAG_HEAP: return expr; // bv, closure, record — self-eval + case TAG_PAIR: break; // form; dispatch below + } + + val_t head = car(expr); + val_t rest = cdr(expr); + + if (head == SYM_QUOTE) return car(rest); + if (head == SYM_IF) { + val_t t = eval(car(rest), env); + val_t next = (t != FALSE) ? cadr(rest) : caddr(rest); + MUSTTAIL return eval(next, env); + } + if (head == SYM_LAMBDA) return make_closure(car(rest), cdr(rest), env); + if (head == SYM_DEFINE) return eval_define(rest, env); + if (head == SYM_SET) return eval_set(rest, env); + if (head == SYM_BEGIN) MUSTTAIL return eval_begin(rest, env); + if (head == SYM_LET) MUSTTAIL return eval_let(rest, env); + if (head == SYM_LETSTAR) MUSTTAIL return eval_letstar(rest, env); + if (head == SYM_LETREC) MUSTTAIL return eval_letrec(rest, env); + if (head == SYM_COND) MUSTTAIL return eval_cond(rest, env); + if (head == SYM_AND) MUSTTAIL return eval_and(rest, env); + if (head == SYM_OR) MUSTTAIL return eval_or(rest, env); + if (head == SYM_PMATCH) MUSTTAIL return eval_pmatch(rest, env); + if (head == SYM_DEFINE_RT) return eval_define_record_type(rest, env); + + // application + val_t fn = eval(head, env); + val_t args = eval_list(rest, env); + MUSTTAIL return apply(fn, args); +} +``` + +The special-form symbols (`SYM_QUOTE`, `SYM_IF`, etc.) are +indices interned once at startup and stored in globals. Dispatch is a +cascade of integer compares. + +## Apply + +```c +val_t apply(val_t fn, val_t args) { + switch (header_type(fn)) { + case HDR_PRIM: + return call_prim(fn, args); + case HDR_CLOSURE: { + val_t env = bind_params(clo_params(fn), args, clo_env(fn)); + MUSTTAIL return eval_body(clo_body(fn), env); + } + default: + error("not a procedure", fn); + } +} +``` + +`bind_params` walks params and args in lockstep. Variadic `.`-tail: +when params becomes a non-pair symbol, that symbol is bound to the +remaining args list. When params becomes `NIL` with args remaining +(or vice versa), error out with an arity message. + +## Eval_body + +```c +val_t eval_body(val_t body, val_t env) { + while (is_pair(cdr(body))) { + val_t form = car(body); + if (is_pair(form) && car(form) == SYM_DEFINE) { + val_t name = define_name(form); + val_t val = eval(define_rhs(form), env); + env = cons(cons(name, val), env); + } else { + eval(form, env); + } + body = cdr(body); + } + val_t last = car(body); + if (is_pair(last) && car(last) == SYM_DEFINE) { + val_t name = define_name(last); + val_t val = eval(define_rhs(last), env); + env = cons(cons(name, val), env); + return UNSPEC; + } + MUSTTAIL return eval(last, env); +} +``` + +Inner `define` is sequential-only: visible to forms that textually +follow it in the same body, not to earlier forms. For mutual +recursion between local helpers, use `letrec` or lift to top level. +The `SYM_DEFINE` handler in `eval` itself handles only top-level +`define`; inner defines are absorbed here and never reach that path. + +## Special forms + +Implemented directly, ~15-30 lines of C each: + +- `quote` — return the datum. +- `if` — eval test; tail-eval the chosen branch. +- `lambda` — build closure. +- `define` — top-level: write `symtab[idx].global_val`. Inner: + handled by `eval_body`. +- `set!` — walk env; first match mutates. Miss: write global slot. +- `begin` — eval each form in sequence; tail-eval the last. +- `let` — eval all inits in the current env; extend env with + bindings; tail-eval body via `eval_body`. +- `let*` — extend env one binding at a time, each init evaluated in + the progressively extended env. +- `letrec` — pre-bind each name to `UNSPEC` in a new env; eval each + init in that env; `set!` each binding; tail-eval body. +- `let` (named) — desugar inline to `letrec` + call, or implement + directly. +- `cond` — loop clauses; first truthy test wins; tail-eval its body; + `else` always true. +- `and` / `or` — short-circuit fold; return the deciding value, not a + coerced boolean. +- `pmatch` — see below. +- `define-record-type` — see below. + +## pmatch + +The matcher walks pattern and subject recursively; on success it +returns an env extended with the bindings; on failure it returns a +sentinel. The driver tries clauses top-to-bottom, skipping clauses +whose guard evaluates to `#f`. No match and no `else` is undefined +behavior per spec. + +Patterns: + +| Pattern | Matches | +|----------------------|------------------------------------------------| +| `()` | the empty list | +| `<literal>` | integer / string / char / bool by `equal?` | +| `<symbol>` | that exact symbol (not a binder) | +| `,<ident>` | anything; binds to `<ident>` | +| `,_` | anything; no binding | +| `(p1 p2 ...)` | proper list of exactly that length | +| `(p1 ... . ptail)` | improper list; `ptail` binds the rest | + +No compilation or caching — rewalk the patterns each call. ~200 LOC +total for matcher plus driver. + +## define-record-type + +Desugars at eval time into a sequence of primitive operations. + +```scheme +(define-record-type point + (make-point x y) + point? + (x point-x) + (y point-y set-point-y!)) +``` + +becomes + +```scheme +(define <point-td> (%make-record-td 'point 2)) +(define make-point (lambda (x y) (%make-record <point-td> x y))) +(define point? (lambda (v) (%record-is-a? v <point-td>))) +(define point-x (lambda (p) (%record-ref p 0))) +(define point-y (lambda (p) (%record-ref p 1))) +(define set-point-y! (lambda (p v) (%record-set! p 1 v))) +``` + +The `%` primitives are internal, not part of the user-facing +primitive list. ~100 LOC for the desugaring plus ~100 LOC for the +record primitives. + +## Primitives + +55 user-facing primitives plus 6 internal record ops. Dispatched by +code id through a switch in `call_prim`. + +**Predicates / equality** (13): `eq?`, `equal?`, `not`, `boolean?`, +`pair?`, `null?`, `symbol?`, `integer?`, `string?` (≡ `bytevector?`), +`procedure?`, `record?`, `record-is-a?`, `record-type?`. + +**Pairs** (5): `cons`, `car`, `cdr`, `set-car!`, `set-cdr!`. + +**Integers** (24): `+`, `-`, `*`, `quotient`, `remainder`, `modulo`, +`=`, `<`, `<=`, `>`, `>=`, `zero?`, `positive?`, `negative?`, `abs`, +`min`, `max`, `bit-and`, `bit-or`, `bit-xor`, `bit-not`, +`arithmetic-shift`, `number->string`, `string->number`. + +**Bytevectors / strings / symbols** (10): `make-bytevector`, +`bytevector-length`, `bytevector-u8-ref`, `bytevector-u8-set!`, +`bytevector-copy`, `bytevector-copy!`, `bytevector-append`, +`bytevector=?`, `string->symbol`, `symbol->string`. + +**Control / program / error** (3): `apply`, `argv`, `error`. + +**Internal record ops** (6): `%make-record-td`, `%make-record`, +`%record-ref`, `%record-set!`, `%record-is-a?`, `%record-td`. + +### Calling convention + +`call_prim` reads the args list from `apply`, copies the values into +`argbuf[]`, and switches on the primitive code id. Each primitive +reads its inputs from `argbuf`, returns a `val_t`. + +Fixed-arity primitives check argc at the dispatch boundary; variadic +primitives (`+`, `-`, `append`, etc.) do their own argc handling. + +## Prelude + +Written in Scheme, embedded as a string constant, parsed and +evaluated at startup like any other source. Contents: + +```scheme +(define (list . xs) xs) +(define (length xs) ...) +(define (reverse xs) ...) +(define (append . lists) ...) +(define (list-ref xs i) ...) +(define (map f . lists) ...) +(define (for-each f . lists) ...) +``` + +Runs before the user file, sharing the same symbol table and heap. +~80 Scheme LOC. + +## Reader + +Single-pass recursive descent over `readbuf` with an integer cursor. +No token array. Conses directly onto the heap. + +Handles the lexical syntax specified in scheme-subset.md: + +- lists with `.` dotted tail +- `'x` → `(quote x)` (no quasiquote, unquote, unquote-splicing) +- fixnums: decimal and `#x`-prefixed hex, both with optional sign +- `#t`, `#f` +- `"..."` strings with `\n \t \r \\ \"` escapes +- `#\a`..`#\~`, `#\newline`, `#\space`, `#\tab`, `#\return`, + `#\null`, `#\xNN` → read as fixnums +- symbols: identifier grammar, case-sensitive +- `;` line comments (no block comments, no `#;`) + +### Source locations + +As the reader conses each pair, it records `pair_ptr → (line . col)` +in a side table (fixed-size open-addressing hash). `error` consults +this when its first argument is a pair and prefixes the message with +`"at file:line:col: "`. Without a GC, entries for collected pairs +never arise; a full table simply overwrites old entries via linear +probing. + +## Writer + +Three entry points: + +- `display` — no quoting on strings; bare character output. +- `write` — readable form; strings quoted. +- `format fmt args...` — supports `~a` (display), `~s` (write), `~d` + (decimal fixnum), `~%` (newline). + +Output buffered in `writebuf[]`; flushed on buffer-full or on newline +for `display`/`write`. `error` flushes before writing to fd 2. + +## Error path + +`(error msg arg ...)`: + +1. Flush writer buffer. +2. Write `"error: "` + message + formatted args to fd 2. +3. `exit(1)`. + +No unwinding, no handlers. Primitive failure follows the same policy +as the spec: undefined behavior, likely crash, no runtime check. + +## Startup + +``` +_start: + parse argv + if argc < 2: error "usage: lisp <file.scm>" + init heap_next, heap_end + init symtab + intern the special-form symbols (SYM_QUOTE, SYM_IF, ...) + register primitives: for each (name, code_id, arity), intern + the name and bind a primitive object in symtab[idx].global_val + parse and eval prelude (embedded) + open argv[1], slurp into readbuf + loop: + read one top-level form + eval in NIL env + advance cursor + until EOF + exit 0 +``` + +All state lives in BSS; no heap allocation before `_start`. + +## File layout + +``` +lispcc/ + src/ + lisp.c # single C source, target ~2–2.5k LOC + lisp/ + prelude.scm # embedded at build time + tests/lisp/ + 00-arith.scm + 01-list.scm + 02-reader.scm + 03-printer.scm + 04-eval.scm + 05-tailcall.scm + 06-pmatch.scm + 07-records.scm + ... +``` + +Tests are `.scm` files the interpreter runs; pass = exit 0 and +expected stdout. Build integrates into the existing Makefile. + +## Milestones + +Status legend: `[x]` done · `[~]` in progress · `[ ]` not started. + +1. [ ] **Runtime skeleton.** `_start`, argv parsing, BSS layout, bump + allocator, `error` path. Exits with a placeholder status. + ~200 LOC. +2. [ ] **Tagged values.** Fixnum / pair / immediate encoding; + `cons`/`car`/`cdr`/tag predicates as C functions. Hand-build + `(cons 42 nil)` and exit with the decoded car. ~200 LOC. +3. [ ] **Symbols + interning.** 16k-slot open-addressing table; + `intern(name, len)` returns a tagged symbol index. Two + `intern("foo")` calls compare equal as integers. ~250 LOC. +4. [ ] **Bytevectors.** Headered heap object; `make-bytevector`, + `bytevector-u8-ref`, etc. as C helpers. ~200 LOC. +5. [ ] **Reader.** End-to-end source → s-expression for the full + lexical syntax, with source-location side table. ~500 LOC. +6. [ ] **Writer.** `display`, `write`, `format`. ~250 LOC. +7. [ ] **`eval` + `apply` core.** Self-eval, symbol lookup, `if`, + `lambda`, top-level `define`, application. No musttail yet. + ~300 LOC. +8. [ ] **Tail calls.** Mark all tail positions with musttail. Stress: + 10⁶-iteration named-let loop runs in constant host stack. +9. [ ] **Remaining special forms.** `set!`, `begin`, + `let`/`let*`/`letrec`, named let, `cond`, `and`/`or`, inner + `define` in `eval_body`. ~300 LOC. +10. [ ] **Primitives.** All 55 user-facing + 6 internal. Batches: + arith/bitwise, list-core, bv/string, I/O+apply+error, record + ops. ~1000 LOC. +11. [ ] **Prelude.** Embed as string; parse and eval at startup. + ~80 Scheme LOC. +12. [ ] **`define-record-type`.** Desugaring + record primitives. + ~250 LOC. +13. [ ] **`pmatch`.** Pattern matcher + clause driver. ~200 LOC. +14. [ ] **End-to-end.** Run the self-hosted compiler under it. + +Rough total: **~2–2.5k LOC C + ~80 Scheme LOC**. diff --git a/docs/LISP-GC.md b/docs/LISP-GC.md @@ -1,538 +0,0 @@ -# LISP-GC: Mark-sweep GC for `lisp.M1` - -Implementation plan for [LISP.md](LISP.md) step 13. Scope is the GC itself -plus the prerequisites it imposes on P1 frame layout and the heap shape. - -## Position in the staged plan - -Step 13 in LISP.md. Prerequisite: steps 1–12 complete (runtime, values, -reader core, printer, eval, TAIL, argv, test harness, all ~40 primitives, -prelude). Follow-on: steps 14 (`pmatch`), 15 (REPL), 16 (source -locations), 17 (end-to-end). - -## Locked decisions - -Load-bearing; the rest of the document assumes them. - -1. **Segregated arenas, headerless pairs.** Two bump pointers inside - a single 1 MB heap. Pair arena: 16-byte objects, no header, tag - `010` points directly to the pair's first word. Object arena: - headered objects only (string / symbol / vector / closure / - primitive); marks live in the `gc-flags` byte. One mark bitmap - over the pair arena only; the object arena needs none. Sweep - walks each arena with its own stride rule. -2. **P1 frame-pointer chain, metadata before slots.** Every - `PROLOGUE_Nk` frame stores the caller's `sp` as `saved_fp`, plus - the slot count `k`, ahead of its slots. GC walks the chain - precisely; no conservative scan. -3. **Explicit mark stack in BSS.** 32 KB (~4k entries). Overflow - falls back to recursive `CALL` into the marker so no structure - is too deep to mark. -4. **Heap size 1 MB for this step.** Triggers GC easily under stress - tests. Grow to 20 MB in a follow-up once C-compiler workloads - demand it. -5. **Mostly-precise stack scan with a bounded-conservative fallback.** - The roots in §Roots #1–#4 are *precise* — we know each entry is a - tagged Lisp value. The stack-frame walk (#5) is *bounded - conservative*: each frame slot is checked against - `[pair_heap_base, pair_heap_next)` / `[obj_heap_base, - obj_heap_next)` before being dereferenced, in the BDW-collector - style. See §"Why mostly-precise" for the reasoning and the - upgrade path. - -These supersede the following LISP.md text, which must be updated when -this work lands: decision 11 (single bitmap over whole heap), §Heap -layout (single arena), §GC §Roots (conservative stack filter), and -P1.md §Semantics prologue/epilogue/TAIL. - -## Heap layout - -Single BSS region, split in two by fixed labels: - -``` -:pair_heap_start - <600 KB> -:pair_heap_end / :obj_heap_start - <400 KB> -:obj_heap_end -``` - -Bump pointers: - -- `pair_heap_next` — bumps by 16. Initially `pair_heap_start`, but - `_start` rounds it up to a 16-byte boundary before any allocation - and snapshots that aligned value into `:pair_heap_base`. The GC - paths use `:pair_heap_base` as their canonical origin. -- `obj_heap_next` — bumps by an 8-rounded size. Same alignment - treatment: `_start` rounds up to 8 and snapshots `:obj_heap_base`. - -The two `_base` slots exist because M1 has no `.align` directive -and the labels `:pair_heap_start` / `:obj_heap_start` land at -whatever byte offset the assembler reached. A fixed-stride pair -sweep walking from the raw label would visit gap addresses; the -runtime alignment + base snapshot keeps mark and sweep on the same -grid. - -Free lists: - -- `free_list_pair` — head of the 16-byte pair free list. -- `free_lists[10]` — heads for 16, 24, 32, 40, 48, 56, 64, 80, 96, 128 - byte object-arena size classes. - -Split is tunable: the only code that cares is the two label -definitions. Pairs dominate compiler workloads so the pair arena gets -the larger share. - -### Allocators - -- `pair_alloc` → 16-byte raw pointer. Tries `free_list_pair`; else - bumps `pair_heap_next`. If the bump would cross `pair_heap_end`, - invokes `gc`, retries, else `error "heap exhausted"`. -- `obj_alloc(size)` → raw pointer to 8-byte-aligned region. Size is - rounded up to 8. If size ≤ 128, tries the matching free list - before bumping `obj_heap_next`. Same GC-retry-else-error overflow - path. - -`cons` routes through `pair_alloc`. Every other allocator (strings, -symbols, vectors, closures, primitives) routes through `obj_alloc`. - -### Pair mark bitmap - -One bit per 16-byte slot in the pair arena, indexed by -`(pair_ptr - pair_heap_base) / 16` (note: `_base`, the runtime- -aligned origin — see §Heap layout). At 600 KB pair arena: -600 KB / 16 = 37 500 bits ≈ 4.7 KB BSS. Declared as -`pair_mark_bitmap` with fixed size. - -Live pairs also set the bit; clearing happens during sweep. - -### Object gc-flags - -Already present in every headered object: `header byte 6` (second -byte of the header word) is the `gc-flags` byte. Bit 0 is the mark -bit. All other bits reserved. - -## Frame layout - -New `PROLOGUE_Nk` shape (supersedes P1.md §Semantics lines 230–247): - -``` -[sp + 0] = caller's retaddr -[sp + 8] = saved_fp (caller's sp value at the moment of CALL) -[sp + 16] = k (slot count, stored as integer; low 3 bits zero) -[sp + 24] = slot 1 -[sp + 32] = slot 2 (k >= 2) -[sp + 40] = slot 3 (k >= 3) -[sp + 48] = slot 4 (k >= 4) -``` - -Frame size: `round_up_16(24 + 8*k)`. Concrete sizes: - -| k | Old size | New size | -|---|----------|----------| -| 1 | 16 | 32 | -| 2 | 32 | 48 | -| 3 | 32 | 48 | -| 4 | 48 | 64 | - -Cost: +16 bytes per call frame. For a 10⁶-deep tail loop the stack -cost is unchanged (TAIL reuses the frame); for a 10⁴-deep recursive -call it adds ~160 KB, well under any reasonable stack limit. - -`EPILOGUE_Nk` / `TAIL_Nk` restore `saved_fp` from `[sp+8]` before -returning or branching. - -### Chain termination - -`_start` snapshots its own initial `sp` into `:stack_bottom_fp` -before its first `CALL`. The first prologued callee then stores -that same value as its `saved_fp`, so the walker compares each -`saved_fp` it pops to `:stack_bottom_fp`'s value and stops on -match. (The `_start` frame itself has no `PROLOGUE`, so there is -nothing to walk past it.) - -### Per-arch implementation in `p1_gen.py` - -Three arch-specific emitters change: - -- **amd64**: `call` already pushes retaddr natively. Prologue does - `pop rcx; mov rax, rsp; sub rsp, frame_size; mov [rsp], rcx; - mov [rsp+8], rax; mov qword [rsp+16], k`, then zeros each slot - in turn. Epilogue reverses. -- **aarch64**: `sub sp, sp, frame_size; str lr, [sp, 0]; add x8, sp, - #frame_size; str x8, [sp, 8]; movz x8, #k; str x8, [sp, 16]`, - then `str xzr, [sp, 24+8*i]` per slot. Epilogue reverses. - *Important:* the k-load uses `MOVZ` rather than `add x8, xzr, #k` - because aarch64 ADD encodes register 31 as SP, not XZR — the - obvious `mov xN, #imm` synthesis silently computes `sp + imm`. - This was a latent bug pre-GC since nothing read `[sp+16]`. -- **riscv64**: analogous to aarch64, using `ra` and a scratch `t0`. - Slot zeroing uses `sd zero, off(sp)`. - -Slot zeroing is the prologue-side half of the mostly-precise scheme -(see §"Why mostly-precise"); the bounds check in `mark_value_*` is -the walker-side half. - -The "caller's sp" is `sp + frame_size` on all three arches after the -sub has run, since prologue ran immediately after call. - -### Slot-offset shift in `lisp.M1` - -Every `sp`-relative access throughout `lisp.M1` shifts: - -| Old offset | New offset | -|------------|------------| -| 8 | 24 | -| 16 | 32 | -| 24 | 40 | -| 32 | 48 | - -The change is mechanical but wide. Step 1 of the implementation -sequence is entirely about landing this shift cleanly on all three -arches before any GC work begins. - -## Roots - -Five sources, walked in this order on each GC invocation: - -1. **`global_env_cell`.** Alist head in BSS. Traced transitively via - `mark_object`. -2. **`symbol_table[0..4095]`.** Intern table's backing array. Walk - each slot; mark occupied entries as tagged heap pointers. -3. **`prim_argv[0..prim_argc)`.** Scratch region live during a - primitive call. `marshal_argv` publishes the count into the - `:prim_argc` BSS slot, which the walker uses as the upper bound; - stale entries past `prim_argc` (from earlier primitive calls) - are skipped so they can't pin freed objects across GCs. -4. **Reader scratch** (`saved_src_*`). Holds the integer cursor / - line / col plus a raw byte pointer into `:src_buf`. None of these - are tagged Lisp values, so no marking is needed; listed here only - to make the omission explicit. -5. **Stack (fp chain).** Start at the caller of `gc()`, walk - `saved_fp` until the value snapshotted in `:stack_bottom_fp` at - `_start` (the kernel-provided initial sp). At each frame, read - `k` from `[fp+16]` and pass every word in `[fp+24..+24+8*k)` - through `mark_object`. Each word is bounds-checked against the - pair / obj arenas before dereference (see §"Why mostly-precise"). - -The `sym_*` special-form symbols are reachable via `symbol_table` -so they need no separate root entry. - -## Why mostly-precise - -A fully precise stack scan needs the walker to know, at every -GC-fire point, exactly which slots in each frame hold tagged Lisp -values. Two things break that invariant in this codebase: - -1. **Uninitialized slots.** A function reserves `k` slots in its - prologue, but its body may take an allocating call before having - written all of them. Slots that haven't been written yet contain - stale stack bytes from prior frames. -2. **Slots holding non-tagged values.** `obj_alloc` legitimately - spills a rounded byte size and a free-list head pointer into its - slots. `intern` / `make_string` / `make_symbol` spill raw byte - pointers (into `:src_buf` or `:str_*` literal regions) that may - not even be 8-aligned. These have arbitrary low 3 bits and don't - look like tagged Lisp values. - -The standard answers in the literature: - -- **Precise stack maps.** Compiler emits per-callsite metadata - (HotSpot, V8, Go, OCaml, SBCL). Best correctness, requires real - codegen infrastructure. -- **Strict slot discipline.** Every walked slot is always a tagged - value or sentinel; raw values live in registers or non-walked - storage. Used by classic SECD-style implementations. -- **Conservative scanning.** Treat every machine word as potentially - a pointer; check if it falls in heap range. Boehm–Demers–Weiser, - Mono, MicroPython, JavaScriptCore (for the C++ stack). -- **Mostly-precise / semi-conservative.** Precise for known-typed - roots, conservative for the C stack. CPython's cstack bridge, - SpiderMonkey for foreign frames. - -This implementation is mostly-precise (#4). The defenses live in -two places: - -- **`PROLOGUE_Nk` zeros every slot it reserves** (`p1_gen.py` - prologue emitters). An uninitialized slot deterministically reads - as `0`, which has no heap tag and is filtered by the bounds check - for free. -- **`mark_value_pair` / `mark_value_object` bounds-check the - candidate raw pointer** against `[heap_base, heap_next)` of its - arena before any dereference. A stale stack value whose low 3 bits - happen to match a heap tag is rejected unless its address actually - lies in the live arena prefix. - -The bounds check is the BDW idea in miniature, scoped to the stack -walk only. False positives (a random stack int that does fall in -the heap range) cause leaks, not crashes — they pin a dead object -until something perturbs the stack enough to overwrite the slot. - -### Upgrade path - -If/when leak measurement shows the conservative fallback pinning -real memory, the clean answer is **two slot counts per frame**: -`k_tagged` for GC-walked slots, `k_raw` for scratch the GC -ignores. `obj_alloc` declares 0+2, `cons` declares 2+0, -`intern` declares 0+3. The frame layout grows a second count -word, every `PROLOGUE_Nk` site is reclassified, frame-relative -offsets shift for the raw region, and the walker can drop both -the slot zeroing and the bounds check. That's the right shape -for the long-lived compiler; for the seed that bootstraps one -C compiler, the cost outweighs the benefit. - -## Mark phase - -``` -mark_sp ← mark_stack_base -for each root r: mark_object(r) -while mark_sp > mark_stack_base: - v ← pop(mark_stack) - mark_object(v) -``` - -`mark_object(v)` dispatches on the low-3-bit tag: - -- **Fixnum, singleton**: no-op. -- **Pair (`010`)**: compute bit index from pointer; if bit set, - return (already marked). Else set bit, push `car` and `cdr` onto - `mark_stack`. -- **Vector (`011`)**: read header; if gc-flags bit 0 set, return. - Else set it, read length, push every element. -- **String (`100`)**: set mark bit; no children. -- **Symbol (`101`)**: set mark bit; push name string and intern-chain - link. -- **Closure / primitive (`110`)**: set mark bit; closures push - params list, body list, env pointer; primitives have no children. - -### Mark-stack overflow - -If `push(mark_stack)` would overflow, recurse by `CALL mark_object` -on the overflowing value instead. P1 stack depth grows temporarily -but every reachable object still gets marked. Compiler workloads -should never hit this path; `06-gc-stress.scm` and -`18-gc-deep-list.scm` will exercise it intentionally. - -## Sweep phase - -Two per-arena loops. - -### Pair arena sweep - -``` -p ← pair_heap_start -while p < pair_heap_next: - i ← (p - pair_heap_start) / 16 - if bit i of pair_mark_bitmap is set: - clear bit i - else: - push p onto free_list_pair - p ← p + 16 -``` - -Tail rewind: track the last live pair address; if the tail of the -arena is all-dead, rewind `pair_heap_next` to just past the last -live. Optional for correctness; small win for fragmentation. Land -it in step 4 only if the pair-arena free list grows pathologically -in the stress tests. - -### Object arena sweep - -Every live headered object has a valid header. Free chunks carry a -pseudo-header `type=0, gc-flags=0, length=chunk_size` so the walker -strides uniformly. - -``` -p ← obj_heap_start -while p < obj_heap_next: - h ← LD [p] - sz ← header_length(h) - rounded ← round_up_8(sz) - if header_type(h) != 0 and header_mark(h): - clear mark bit - else: - if rounded ≤ 128: - push p onto free_lists[class_of(rounded)] - overwrite header as free pseudo-header - else: - coalesce with following chunk if also dead; write - combined pseudo-header - p ← p + rounded -``` - -Tail rewind: same optional logic; rewind `obj_heap_next` past the -trailing all-dead region. - -Coalescing pass runs before the free-list push so adjacent dead ->128-byte runs merge into one. Free-list push happens after, for -the sub-128 chunks. - -## Allocator integration - -`pair_alloc` and `obj_alloc` both follow the same template: - -``` -head ← free_list_head (for size class) -if head != 0: - free_list_head ← [head + 0] - return head -new ← bump_pointer -if new + size > arena_end: - call gc - retry from top once - if still no room: error "heap exhausted" -bump_pointer ← new + size -return new -``` - -`gc` is a normal P1 function that does mark+sweep and returns. It -assumes GC-safety invariants hold at its entry (every live tagged -value is either in a named BSS root or in a `PROLOGUE_Nk` slot -somewhere on the fp chain). These invariants already hold in the -existing codebase; the step-13 work adds none. - -## Implementation sequence - -Each step assembles through P1 → M1 → hex2 and passes -`make test-lisp-all` before the next begins. - -1. **`p1_gen.py` frame change + `lisp.M1` offset shift.** - ~150 Py LOC + mechanical `lisp.M1` edit. Rewrite - prologue/epilogue/TAIL for amd64, aarch64, riscv64; shift every - `sp`-relative offset in `lisp.M1`. Write `stack_bottom_fp = 0` - in `_start`. **Gate**: every existing test passes on all three - arches. No GC code yet. -2. **Segregated heap + free-list scaffolding.** ~100 P1 LOC. - Split heap labels; route `cons` through `pair_alloc`, others - through `obj_alloc`; declare `free_list_pair` and `free_lists[10]` - (heads only, no populating logic yet); overflow still routes to - `alloc_oom`. **Gate**: existing tests pass. -3. **Mark phase.** ~200 P1 LOC. `mark_stack[4096]`, - `pair_mark_bitmap`, `mark_object`, root walk including fp-chain - traversal. Add hidden `(gc-mark-only)` primitive that runs mark - and returns live-object count. **Gate**: a test validates the - count matches a known reachable-set size; hitting the mark-stack - overflow path is exercised by `18-gc-deep-list.scm`. -4. **Sweep phase + wire into allocator.** ~200 P1 LOC. Both - per-arena sweeps, free-list population, free-chunk pseudo-headers, - coalescing for >128 chunks. Replace the `alloc_oom` first-check - branch with `gc → retry → error`. **Gate**: existing tests pass; - free lists populated after first GC. -5. **Stress tests.** Five new `.scm` files in `tests/lisp/`: - - `17-gc-cons-churn.scm` — 100k pair allocations in a loop, verify - deterministic checksum. - - `18-gc-deep-list.scm` — 10k-element list, post-GC sum. - - `19-gc-vector-churn.scm` — vector alloc/drop, exercises >128-byte - coalesce. - - `20-gc-closure-churn.scm` — nested lambdas creating closures - faster than arena; validates env-chain rooting. - - `21-gc-mixed.scm` — interleaved pair/vector/string/closure. - **Gate**: all pass on `make test-lisp-all`. - -Rough total: ~700 P1 LOC + ~150 Py LOC + 5 test files. - -## Doc updates bundled with the code - -- **LISP.md decision 11**: rewrite to "Pair mark bits live in a - per-pair-arena bitmap; object arena marks live in each object's - `gc-flags` byte. Pairs stay 16 bytes." -- **LISP.md §Heap layout**: describe the two-arena split. -- **LISP.md §GC §Algorithm**: replace the single-walk description - with per-arena sweep. -- **LISP.md §GC §Roots**: replace conservative-scan language with - the fp-chain walker description. -- **P1.md §Semantics** (prologue / epilogue / TAIL sections, lines - 230–269): new frame shape with `saved_fp` at `[sp+8]` and `k` at - `[sp+16]`; updated frame size formula; per-arch mechanics updated. - -## Risks and mitigations - -- **Missed slot-offset shift in step 1.** Silent: a function with - a stale `sp,8` access reads `saved_fp` as if it were a slot, and - corrupts it on store. Mitigation: grep every `,sp,` pattern in - `lisp.M1` before declaring step 1 done; run every test on every - arch. -- **GC fires during primitive with live values only in registers.** - Primitives must honor the per-frame-slot root discipline. The - existing code already does; step-13 work must not introduce new - violations. Audit every new GC code path for this at review time. -- **fp-chain consistency during mark-stack overflow recursion.** - The recursive `CALL mark_object` pushes a new frame with its own - `saved_fp`. The root walker must not re-walk frames it's already - processed. Simple fix: the walker captures the chain length at - entry and only walks down to that depth; frames above are the - walker's own and are skipped. -- **Free-list corruption from a bug in sweep.** Free chunks double - as linked-list nodes via their first word. An off-by-one in sweep - can overwrite a live object with a free-list link. Mitigation: - stress tests `17` and `21` allocate + read back; any corruption - surfaces as a crash or wrong checksum. - -## Remaining work - -Tracked here so the next pass at this code can pick them up without -re-deriving the list. - -### Doc sync (blocks "step 13 complete") - -- [ ] `docs/LISP.md` decision 11: rewrite from "single bitmap over - whole heap" to "pair-arena bitmap + object `gc-flags` byte; - two segregated arenas." -- [ ] `docs/LISP.md` §Heap layout: replace the single-arena - description with the pair/obj split and the runtime-aligned - `:pair_heap_base` / `:obj_heap_base` snapshots. -- [ ] `docs/LISP.md` §GC §Roots: drop the conservative-stack-filter - language; replace with the fp-chain walker + bounded- - conservative-on-stack description (cross-ref §"Why - mostly-precise" here). -- [ ] `docs/LISP.md` §GC §Algorithm: replace the single-walk - description with the per-arena sweep (bitmap-driven pair - sweep, header-stride object sweep, free-list population, - coalescing). -- [ ] `docs/P1.md` §Semantics prologue/epilogue/TAIL: new frame - shape with `saved_fp` at `[sp+8]` and `k` at `[sp+16]`; - updated frame-size formula; note that PROLOGUE_Nk zeros - each slot (cross-ref LISP-GC.md §"Per-arch implementation"). - Flag the aarch64 MOVZ-over-ADD detail as a load-bearing - encoding gotcha. - -### Stress tests to reach the doc's step-5 set - -- [ ] Rename current `19-gc-closure-churn.scm` → `20-gc-closure- - churn.scm` so numbering matches the plan in §Implementation - sequence. -- [ ] Add `19-gc-vector-churn.scm`. Currently the `>128`-byte - coalescing path in `gc_sweep_obj` has *zero* test coverage; - a loop that allocates and drops vectors of varying sizes - (including some `>128` bytes) would hit it. -- [ ] Add `21-gc-mixed.scm`. Interleaves pair/vector/string/closure - allocations so pair and obj GC cycles overlap. -- [ ] Rework `18-gc-deep-list.scm` (or add a companion test) to - force the mark-stack-overflow recursive-`CALL mark_value` - fallback. The current test is tail-recursive so the mark - stack stays shallow; a non-tail-recursive build of a deep - reachable list would actually exercise the overflow path. - -### Deferred by explicit decision (revisit under load) - -- [ ] Grow heap from the current bring-up 32 KB + 32 KB to the - decision-4 size of 1 MB (pair) + 400 KB (obj), and - eventually 20 MB total once the C-compiler workload lands. - Keep the tiny bring-up sizes until then so GC stays cheap - to trigger in tests. -- [ ] Optional tail-rewind of `pair_heap_next` / `obj_heap_next` - during sweep. Marked optional; only land if fragmentation - shows up under load. -- [ ] Upgrade mostly-precise → fully precise via two slot counts - per frame (`k_tagged` + `k_raw`). Documented in §"Why - mostly-precise" as the upgrade path; only land if leak - measurement (next item) shows the conservative fallback - pinning real memory. - -### Unmeasured invariants - -- [ ] Instrument leak rate: count live pairs / objects retained - after each sweep over a churn workload. Today's claim that - "false marks from the bounded-conservative stack scan are - rare in practice" is unverified; a single counter would - either close the concern or trigger the precise upgrade. diff --git a/docs/LISP.md b/docs/LISP.md @@ -1,581 +1,217 @@ -# LISP: Seed Lisp interpreter in P1 - -## Goal - -A small Scheme-flavored Lisp interpreter written once in the P1 pseudo-ISA -(see [P1.md](P1.md)) and assembled to three host arches (amd64, aarch64, -riscv64) via the existing `M1` + `hex2` chain. Hosts the C compiler (see -[PLAN.md](PLAN.md)) as a Lisp program; that compiler is used for both the -seed userland ([SEED.md](SEED.md)) and tcc-boot. Target size: 4,000–6,000 -lines of P1. - -This document covers only the interpreter. The Lisp-language surface it -exposes is specified by PLAN.md's "Feature floor" — this doc does not -restate it. - -## Position in the chain - -``` -M1 asm → P1 pseudo-ISA → Lisp interpreter (this doc) → C compiler → seed + tcc-boot -``` - -The interpreter binary is the last artifact built from P1 source in the -bootstrap. Everything above it is either Lisp source text or a C binary -produced by the Lisp-hosted C compiler. - -## Settled decisions - -Load-bearing; the rest of the document assumes them. - -1. **Tagged cells, low-3-bit tag, 61-bit fixnum payload.** See - §Value representation. -2. **Mark-sweep GC over a fixed BSS arena.** 20 MB, linked in at build - time, no `brk`/`mmap`. -3. **Per-frame root slots on the P1 stack.** Scheme-facing functions - declare `PROLOGUE_Nk` frames with one slot per live tagged value - they need across an allocating call; the GC mark phase walks those - frames. `global_env` and `symbol_table` remain named BSS roots - (long-lived singletons with no stack frame to hang off of). -4. **Source file via `argv[1]`.** The interpreter opens the path given - on the command line, reads it whole into a string, parses it, - evaluates top-to-bottom. No linked-in blob. -5. **Optional REPL under `--repl`.** Development aid; not used by the - bootstrap pipeline. ~100 P1 LOC. -6. **No bignums, ever.** 61-bit fixnums only. Confirmed: the C compiler - never needs larger. -7. **Single `lisp.M1` source file.** M1 has no `#include` and - concatenating shards in the Makefile adds build-time coupling we - don't need yet. Revisit only if the file passes ~6k LOC. -8. **`pmatch` is a built-in special form, not a user-level macro.** - Saves writing a general macro expander for one feature. -9. **Tail calls via P1 `TAIL`.** `eval` dispatches tail-position calls - through `TAIL`; non-tail through `CALL`. Scheme-level tail-call - correctness falls out for free. -10. **Eight syscalls: `read`, `write`, `openat`, `close`, `exit`, - `clone`, `execve`, `waitid`.** Matches PLAN.md. The last three - let the Lisp program spawn M1/hex2 and act as the tcc-boot - build driver; `openat(AT_FDCWD, …)` replaces bare `open` - because aarch64/riscv64 lack it in the asm-generic table. No - signals, no `lseek`, no `stat`. -11. **Pair GC marks live in a separate bitmap**, not in the pair words. - ~1.25 MB BSS for a 20 MB heap; keeps pairs at 16 bytes and keeps - fixnums at 61 bits. -12. **Fixnum overflow wraps mod 2^61.** `+`/`-`/`*` do not trap. C - compiler workloads never approach 2^60. -13. **`equal?` does not detect cycles.** Cyclic input loops forever; - the C compiler never constructs cycles. -14. **Reader attaches source locations via a side table.** Pair-pointer - → `(line . col)` lookup, populated by the reader, consulted by the - error path. ~200 LOC. -15. **Closures are three-slot objects**: header + params + body + env. - GC traces three fields; revisit only if closure churn measures hot. - -## Value representation - -All Lisp values are 64-bit tagged words. Low 3 bits are the tag: - -| Tag (bin) | Meaning | Payload | -|-----------|----------------------|---------------------------------------| -| `001` | fixnum | signed 61-bit, `>> 3` to recover | -| `010` | pair pointer | 8-byte aligned, clear low 3 to deref | -| `011` | vector pointer | idem | -| `100` | string pointer | idem | -| `101` | symbol pointer | idem, interned (`eq?`-comparable) | -| `110` | closure / primitive | idem | -| `111` | singleton constant | `()`, `#t`, `#f`, `eof`, `unspec` | - -Pointer tags require 8-byte-aligned allocation, enforced by the bump -allocator. - -Singletons live in the `111` space as small immediates (e.g. `0x07` nil, -`0x0F` #t, `0x17` #f, `0x1F` eof, `0x27` unspec). Distinguishable from -fixnums by tag, from pointers by tag, and from each other by payload. - -### Pairs - -Two consecutive 8-byte words: `[car][cdr]`. No header. The tag on the -pointer tells GC it is a pair and has no embedded length. - -### Headered objects - -Vectors, strings, symbols, closures, and primitives carry a 1-word header -before their payload: - -``` -[ 8-bit type | 8-bit gc-flags | 48-bit length-or-arity ] -``` - -- **Vector**: header + `length` tagged words. -- **String**: header + `length` bytes (ASCII, per PLAN.md). Not - null-terminated; length is authoritative. Padded up to 8-byte - alignment. -- **Symbol**: header + name bytes + a 1-word intern-chain link. Two - symbols are `eq?` iff their pointers match; the interner guarantees - one object per name. -- **Closure**: header (type 4) + `params` list pointer + `body` list - pointer + `env` pointer. Arity in header. -- **Primitive (fixed arity)**: header (type 5) + primitive code id. - Arity in header low 48 bits. -- **Primitive (variadic)**: header (type 6) + primitive code id. - Arity field unused; body does its own min-arg check. - -The "code id" is an integer 1..N assigned at generation time. `apply` -resolves it by cascade — a chain of `BEQ code, N, &prim_N_body` — so -the seed doesn't need an indirect-call op in P1. A future refactor can -swap the cascade for an indirect call once a `CALLR` op lands; the -primitive object layout doesn't change. - -## Heap layout - -Single BSS region, fixed at link time: - -``` -:heap_start - <20 MB of zero bytes> -:heap_end -``` - -- `heap_next` — bump pointer, starts at `heap_start`. -- `free_lists[10]` — free-list heads per size class (see §GC). - -### Allocation - -`alloc(size)` rounds `size` up to 8 bytes, consults the matching size-class -free list first, otherwise bumps `heap_next`. If the bump would cross -`heap_end`, run GC; if still no room, `error "heap exhausted"`. - -Everything is 8-byte aligned by construction. - -## GC - -Mark-sweep, stop-the-world, invoked only when the allocator overflows. -Not triggered by time, step count, or explicit request. - -### Roots - -Three sources: - -1. **Global environment.** Fixed pointer in BSS (`global_env`), traced - transitively. -2. **Symbol table.** Intern table's backing array plus all interned - symbols and their name strings. Fixed pointer (`symbol_table`). -3. **Per-frame stack slots.** The mark phase walks the P1 stack from - the current `sp` up to the saved frame-chain bottom. Each frame - declares its slot count via `PROLOGUE_Nk` (see P1.md §"Semantics"), - and the GC treats every slot word as a tagged-value root — - non-pointer slots self-filter because their tags don't match any - heap object. - -Return addresses and raw P1 integers on the stack sit outside the -PROLOGUE_Nk slot range, so the walker doesn't see them. - -### Algorithm - -1. **Mark.** Traverse from roots, setting the `gc-flags` mark bit on - each reachable headered object. Pairs have no header; their marks - live in a separate bitmap indexed by - `(pair_addr - heap_start) / 16`. Bitmap cost: ~1.25 MB for a 20 MB - heap; removes the need to steal bits from pair data. -2. **Sweep.** Walk the arena object-by-object. Live objects clear - their mark and continue. Dead objects of size ≤ 64 bytes join the - matching free list; larger dead objects are coalesced, and if the - tail of the arena is all free, rewind `heap_next`. No object - motion. - -### Size classes - -Free lists for 16, 24, 32, 40, 48, 56, 64, 80, 96, 128 bytes (ten -classes). The low seven cover the common object sizes: - -| Class | Typical occupants | -|-------|-----------------------------------------------------------------| -| 16 | pairs (dominant), primitives, 0–8 char strings | -| 24 | short symbols (1–8 char names), 9–16 char strings, 2-slot | -| 32 | closures, medium symbols (9–16 char names), 3-slot vectors | -| 40 | 4-slot vectors / env frames of 4 vars | -| 48 | 5-slot vectors | -| 56 | 6-slot vectors | -| 64 | 7-slot vectors | -| 80 | 9-slot vectors, env frames of 8 vars | -| 96 | 11-slot vectors, env frames of 10 vars | -| 128 | 15-slot vectors, env frames up to 15 vars | - -Pairs (16) dominate allocation volume in compiler workloads; closures -(32) are a fixed size so get their own class. The 80/96/128 classes -catch local-env vectors for functions with 8–15 bound names, which -would otherwise only be reclaimed by bump-pointer rewind. - -Allocations above 128 bytes go through the bump pointer on every -cycle — typically long string literals and large vectors, both -uncommon in compiler workloads. - -### Invariants - -- Between any two GC-observable points, every live Scheme value must be - reachable from `global_env`, `symbol_table`, or a PROLOGUE_Nk slot - in an active frame. "GC-observable" = any call to `cons`, - `make-vector`, `make-string`, `make-closure`, `intern`, or another - Scheme function. -- Scheme-facing functions spill each live tagged value into a frame - slot before any allocating call. A value kept only in a P1 register - will be lost if GC fires mid-call. - -Tradeoffs taken: - -- Mark bitmap costs memory but keeps pair encoding simple. -- Stop-the-world is fine: single-threaded, and the C compiler is not - latency-sensitive. -- No compaction: fragmentation is acceptable for a ≤20 MB heap with a - pair-dominated workload. - -## Environment - -Two layers: - -- **Global env.** Open-addressing hash table keyed by symbol pointer - identity. 4096 slots, linear probe. Holds bindings established by - top-level `define`. -- **Local env.** A chain of frames. Each frame is a pair - `((names-list . values-vector) . parent-frame)`. Lookup walks up - until a name matches, then falls through to the global hash. - -Hash over a symbol pointer: `(ptr >> 3) & (TABLE_SIZE - 1)`. Collision -rate is low — the intern table guarantees one pointer per name, and -Lisp source is sparse in identifiers. - -`define` at top level writes to the global hash; `define` inside a body -is rewritten to `let`-shape at eval time (standard single-pass -practice). - -## Eval - -Textbook meta-circular recursive-descent evaluator. Each form has a case: - -``` -eval(expr, env): - if self-evaluating: return expr - if symbol: return lookup(expr, env) - match car(expr): - 'quote → return cadr(expr) - 'if → eval test; TAIL eval selected branch - 'begin → eval each non-last; TAIL eval last - 'lambda → return make-closure(params, body, env) - 'define → bind in env; return unspec - 'set! → mutate binding; return unspec - 'let/let*/letrec → desugar to lambda application; TAIL eval - 'cond → desugar to nested if; TAIL eval - 'quasiquote → expand; TAIL eval - 'pmatch → compile arm chain; TAIL eval selected arm - otherwise → eval callee + args; TAIL apply -``` - -`TAIL` above compiles to a literal P1 `TAIL` — the current `eval` -frame is torn down before the recursive call. Non-tail positions -(test of `if`, non-last of `begin`, callee/arg evaluation of an -application) use P1 `CALL`. - -### Application - -`apply(callee, args-list)`: - -- **Primitive:** load function pointer from header, pack args into a - fixed `argv` region in BSS, `CALL` the primitive, return its result. -- **Closure:** construct a new frame binding params to args; `TAIL eval` - the body in the new env. - -Arity mismatches raise `error`. - -## Reader - -Recursive-descent over the in-memory source string with an integer -cursor. No port abstraction. - -``` -read(src-string, cursor) → (value, new-cursor) -``` - -Tokens are produced on demand; no token array. - -Forms recognized: - -- `( ... )` — list, with `.` for improper tail -- `' expr` → `(quote expr)` -- `` ` expr `` → `(quasiquote expr)` -- `, expr` → `(unquote expr)` -- `,@ expr` → `(unquote-splicing expr)` -- `#( ... )` — vector literal -- `#t`, `#f` -- integer literals: decimal (`-42`, `123`) and hex (`0x7F`) -- character literal: `#\a`, `#\space`, `#\newline`, `#\tab` -- string literal: `"..."` with `\n \t \\ \"` escapes -- symbol: every other `[!-~]` run not starting with a digit -- `;` — line comment, skipped - -Reader output conses directly into the heap; the spill-slot discipline -applies. - -### Source locations - -The reader maintains a side table mapping pair pointer → `(line . col)` -of the opening token for that form. The table is an open-addressing -hash keyed by pair pointer identity; it is a GC root so locations -survive across collections, and entries for dead pairs are dropped -during sweep (a location whose pair pointer is unmarked is itself -dead). `error` consults this table when its argument is a pair and -prefixes the diagnostic with `"at <file>:<line>:<col>: "`. - -Cost: ~150 LOC in the reader, ~50 LOC in the error path and GC sweep. -Pays for itself when the C compiler mis-parses a tcc-boot source. - -## Printer - -Three entry points: - -- `display` — no quoting on strings; bare character output. -- `write` — readable form: strings quoted, characters prefixed. -- `format fmt args...` — supports `~a` (display), `~s` (write), `~d` - (decimal fixnum), `~%` (newline). No other directives. - -Output buffered in a 4 KB BSS buffer; flushed on full or on newline for -`display`/`write`. `error` flushes before writing to fd 2. - -## Symbol interning - -Single open-addressing hash table with linear probe. Hash: - -``` -h = 0 -for each byte b in name: - h = (h * 31 + b) & 0xFFFFFFFF -return h & (TABLE_SIZE - 1) -``` - -`TABLE_SIZE = 4096`. Invoked on every symbol the reader sees; the -table is a GC root. - -## Primitives FFI - -Each primitive is an ordinary P1 function: - -- **In:** `r1` = argc, `r2` = pointer to argv (tagged words, - back-to-back). -- **Out:** `r0` = result tagged word. - -At startup, the interpreter walks a static `(name, len, code, arity, -type)` table and, for each row, interns the name, allocates a -primitive heap object (type 5 fixed or type 6 variadic, carrying the -code id), and binds it in `global_env`. Primitives are first-class -values. - -`apply` dispatches by cascade on the code id: a straight-line chain of -`BEQ r_code, N, &prim_N_body` — one entry per primitive. The cost is -O(n) per call; acceptable for seed workloads. Can be swapped for an -indirect call later (add `CALLR` to p1_gen.py, store `&prim_N_body` -directly in the object at +8) without touching any primitive body. - -Primitives allocate through the same `alloc`/GC path as user code and -must honor the per-frame-slot root discipline when they make more than -one allocation. - -Target set: ~40 primitives per PLAN.md. Roughly: - -- **List/pair**: `cons car cdr pair? null? list? list length append - reverse assoc member` -- **Arithmetic**: `+ - * / % = < > <= >= zero? negative? positive? - abs min max` -- **Bitwise**: `bit-and bit-or bit-xor bit-not arithmetic-shift` -- **String**: `string? string-length string-ref substring - string-append string->symbol symbol->string` -- **Vector**: `make-vector vector-ref vector-set! vector-length - vector->list list->vector` -- **Type predicates**: `number? symbol? string? vector? procedure? - eq? eqv? equal?` -- **I/O**: `read-file write-file display write newline format error` -- **Control**: `apply` - -`map`, `filter`, and `fold` are higher-order and would force -primitives to re-enter `eval`. They live in a short Scheme prelude -instead (evaluated by `_start` after the primitive registration loop); -not P1 code. - -## `pmatch` - -Built-in special form. Pattern language: - -- literal fixnum / string / symbol — matches by `equal?` -- `?var` — bind variable -- `_` — wildcard -- `(p1 p2 ...)` — structural list match -- `(p1 . p2)` — improper-list split -- `(quote x)` / `'x` — symbol literal match -- `(guard expr)` — predicate after a match - -``` -(pmatch expr - [pattern₁ body₁] - [pattern₂ body₂] - ... - [else body-else]) -``` - -Compiled at eval time into a chain of test / bind / `TAIL eval body` -operations. Compilation result is cached in the AST after first -expansion so a `pmatch` site does not recompile per execution. - -## Error path - -`(error msg arg ...)`: - -1. Flush any pending printer buffer. -2. Write `"error: "` + the message + formatted args to fd 2. -3. `exit(1)`. - -No unwinding, no continuations, no handlers. The C compiler either -succeeds or exits with diagnostics. - -## Startup - -``` -_start: - argc at [sp + 0], argv at [sp + 8] (per P1 entry conv) - if argc < 2: error "usage: lisp <file.scm> | --repl" - if argv[1] == "--repl": repl_loop() - else: run_file(argv[1]) - exit(0) -``` - -`run_file` reads the whole file into a string, then loops: -parse one top-level form, eval it, discard, advance cursor — so a -`define` of a helper is visible before the next form that uses it. - -### REPL - -Under `--repl`: - -``` -loop: - write "> " to fd 1 - read one line from fd 0 into scratch buffer - on EOF: exit 0 - parse one expression from the line - eval; write result; newline - goto loop -``` - -~100 LOC. Development aid; not invoked by the bootstrap pipeline. - -## File layout - -``` -lispcc/ - lisp.M1 # single P1 source, target 4–6k LOC - tests/lisp/ - 00-arith.scm - 01-list.scm - 02-reader.scm - 03-printer.scm - 04-eval.scm - 05-tailcall.scm - 06-gc-stress.scm - 07-pmatch.scm - 08-records.scm - ... -``` - -Tests are `.scm` files the interpreter runs; pass = exit 0 and expected -stdout. `make test-lisp` loops over them; `make test-lisp-all` runs -each test on all three arches via the P1 differential harness. - -## Staged implementation plan - -Each step assembles through existing P1 → M1 → hex2 and carries test -coverage before the next begins. - -Status legend: `[x]` done · `[~]` in progress · `[ ]` not started. - -1. [x] **Runtime skeleton.** `_start`, argv parsing, `read-file`/`write-file` - via syscalls, `error` path, BSS layout, bump allocator. Exits with - the file's first byte as proof-of-life. ~200 P1 LOC. -2. [x] **Tagged values.** Fixnum/pair/singleton encoding; `cons`/`car`/`cdr`/ - tag predicates. Hand-build `cons(42, nil)` and exit with the decoded - car as status. ~300 LOC. -3. [x] **Strings + symbol interning.** Heap strings, 4096-slot intern table, - symbol allocator. Two `(intern "foo")` calls return `eq?` pointers. - ~400 LOC. -4. [x] **Reader (core).** End-to-end `source → sexpr` for lists, decimal - fixnums, interned symbols, `;` comments, with line/col tracking for - diagnostics. Extended syntax (quotes, `#\char`, `#( … )`, improper - `.` tail, hex/negative fixnums) and the source-location side table - land in later steps. (`#t`/`#f` landed in 10c; string literals in - 10e.) ~500 LOC. -5. [x] **Printer.** `display`, `write`, minimal `format`. Closes a - read-print cycle. ~300 LOC. -6. [x] **Eval (non-tail).** Self-evaluators, lookup, `if`, `begin`, - `lambda`, `define`, application via P1 `CALL`. ~600 LOC. -7. [x] **`TAIL` integration.** Rewrite tail-position `eval` calls to P1 - `TAIL`. Stress: loop of 10^6 self-calls with flat P1 stack. -8. [x] **`argv[1]` file read.** Replace the embedded `src_text` blob with a - syscall path: open and read the file named on the command line into - a heap string; `error` on `argc < 2`. -9. [x] **Test harness (`tests/lisp/`).** Add `.scm` fixtures plus `make - test-lisp` (single arch) and `make test-lisp-all` (tri-arch diff); - pass = exit 0 and expected stdout. Locks in regression coverage - before the feature surface grows. -10. [x] **Primitives (~40).** Broken into sub-steps; ~1200 P1 LOC total - (~200 harness + ~1000 primitives). Earlier ~500 estimate was - optimistic. - - [x] **10a. FFI harness.** `make_primitive` constructor (type 5 - fixed, type 6 variadic). Fork in `apply` on header type: - closure → existing path; prim-fixed → count, arity-check, - marshal argv, cascade-dispatch on code id; prim-variadic → - marshal + dispatch, no arity check. `prim_argv[]` BSS buffer - (64 slots × 8 B). - - [x] **10b. Registration.** Static `(name, len, code, arity, type)` - table; `_start` walks it, interning each name and binding a - fresh primitive object in `global_env`. - - [x] **10c. Arithmetic + bitwise + tag predicates.** 27 primitives: - `+ - * / % = < > <= >= zero? negative? positive? abs min max - bit-and bit-or bit-xor bit-not arithmetic-shift number? - symbol? string? vector? procedure? eq?`. Predicates return - `#t`/`#f` — so a minimal slice of step 11 (reader + printer - support for the `#t`/`#f` singletons) lands here too. - - [x] **10d. List-core primitives.** `cons car cdr pair? null? list - length list? append reverse assoc member`. - - [x] **10e. String primitives.** `string? string-length string-ref - substring string-append string->symbol symbol->string`. - - [x] **10f. Vector primitives.** `make-vector vector-ref - vector-set! vector-length vector->list list->vector`. Adds - `make_vector` runtime helper (absent before 10f). `make_vector` - needs to mask the header type byte when reloading length into - the fill-loop counter; otherwise the loop walks off the arena. - - [x] **10g. I/O + `equal?` + `apply`.** `display write newline - format error read-file write-file` wrap existing labels; - `format` dispatches `~a ~s ~d ~%`. `equal?` recurses on - pairs/strings/vectors; non-allocating. `apply` primitive - re-enters the internal `apply` label. - - [x] **10h. Lisp prelude.** Scheme definitions of `map`, `filter`, - `fold` embedded as adjacent `"…"` chunks between `:prelude_src` - and `:prelude_src_end`; `_start` calls `eval_source` on the span - before the script eval. M1's `"…"` form appends a NUL to each - chunk, so `skip_ws` treats `\0` as whitespace — the chunks stitch - together transparently. Gate test `16-prelude.scm` passes on all - three arches. -11. [x] **Reader extensions.** `'`/`` ` ``/`,`/`,@`, `#\char`, `#( … )`, - improper `.` tail, hex and negative fixnums. (`#t`/`#f` landed in - 10c; string literals in 10e.) Source-location side table - explicitly deferred to step 16. -12. [x] **Eval extensions.** `set!`, `let`/`let*`/`letrec`, `cond`, - `quasiquote`; inner `define` → `letrec`-shape rewrite. -13. [x] **Mark-sweep GC.** Mark bitmap, root discipline, sweep, size-classed - free lists. Stress: cons-churn allocating 1000× heap in flight. - ~500 LOC. -14. [ ] **`pmatch` + records-via-vectors.** ~300 LOC. -15. [ ] **REPL.** `--repl` flag, line reader, eval-print loop. ~100 LOC. -16. [ ] **Source-location side table.** Pair-pointer → `(line . col)` hash - populated by the reader, consulted by `error`, swept alongside - pairs so dead entries drop. ~200 LOC. -17. [ ] **End-to-end.** Run a 200-LOC Scheme test program exercising all - features; diff tri-arch output. - -Rough rollup: ~5,300 P1 LOC (step 10 bumped from ~500 to ~1200 based -on 10a–10c drafting), still mid-range in PLAN.md's 4–6k budget. +# Minimal Scheme subset (lispcc) + +Working doc. Baseline is R7RS-small (2013); everything here is a delta +against it. Intent: smallest surface that supports writing a self-hosted +compiler and shell-style scripts comfortably. Things cut can always come +back; things admitted are load-bearing. + +## Lexical syntax + +- Identifiers: standard Scheme grammar, case-sensitive. Includes + `+ - * / ! ? < > = _ & : ~ ^` and `->` in names. **No** `|…|` + vertical-bar quoted identifiers. A lone `.` is **not** a symbol — + it's reserved for dotted-pair syntax. +- Booleans: `#t`, `#f`. +- Integers: decimal (`42`, `-7`) and hex (`#xff`, `#x-1a`). Word-size + only — 32-bit on 32-bit targets, 64-bit on 64-bit targets. **No** `#o` + (octal), **no** `#b` (binary), **no** floats, rationals, complex, or + bignums. Overflow is undefined (see "Primitive failure" below). +- Strings: `"…"` with escapes `\n \t \r \\ \"`. A string *is* a + bytevector; indexing is by u8. **String literals are immutable** — + `bytevector-u8-set!` on a literal is undefined. +- Characters: `#\a` through `#\~` for printable ASCII, plus named forms + `#\newline` (10), `#\space` (32), `#\tab` (9), `#\return` (13), + `#\null` (0), and `#\xNN` for any byte. A character literal *is* a + u8 integer — no distinct character type. `(= #\a 97)` is `#t`. +- Symbols: bare (`foo`, `list->bytes`). **Globally interned** — two + symbols that print the same are `eq?`. +- Pairs / lists: `(a b c)`, `(a . b)`, `'()` for the empty list. Literal + pairs are immutable. +- Quote: `'x` ≡ `(quote x)`. **No** quasiquote / unquote / unquote-splicing. +- Comments: `;` to end of line. **No** `#| … |#` block, **no** `#;datum`. + +## Types + +The runtime knows about exactly these: + +| Type | Notes | +|--------------|----------------------------------------------------| +| boolean | `#t`, `#f` | +| integer | word-size; 32-bit or 64-bit per target | +| symbol | globally interned; `eq?`-comparable | +| string / bv | same type; contiguous `u8[]`; literals immutable | +| pair | cons cell; literals immutable | +| empty list | `'()`, disjoint from pair | +| procedure | | +| record | via `define-record-type` | + +Characters are **not** a separate type — `#\a` is lexical sugar for the +u8 value 97. + +**Not present:** vectors, ports (as a language type), promises, +parameters (dynamic-scoped values, distinct from function args), +continuations, multiple values, exceptions. + +## Special forms + +- `(define name value)` and `(define (name arg ...) body ...)`, including + variadic tails: `(define (f . rest) …)` and `(define (f a b . rest) …)`. + Top-level `define`s may forward-reference each other — mutual + recursion at module scope is supported. Inside a `lambda`/`let` + body, `define`s behave like `letrec*` (sequential, visible to + later forms). +- `(lambda (arg ...) body ...)` with the same `.`-tail syntax. + A `lambda` evaluates to a closure: a first-class procedure value that + captures its enclosing lexical environment by reference. `set!` on a + captured variable is visible to every closure that captured it, and + closures that escape their creating frame outlive it (general case: + heap-allocated; non-escape optimization is an implementation detail). +- `(if test then else)` — `else` required. +- `(cond (test body ...) ... (else body ...))`. +- `(and e ...)` / `(or e ...)` — short-circuiting; return the deciding + value, not a coerced boolean. +- `(let ((x v) ...) body ...)`, `(let* …)`, `(letrec …)`. +- `(let name ((x v) ...) body ...)` — named let. +- `(begin e ...)`. +- `(set! name value)`. +- `(quote datum)` / `'datum`. +- `(define-record-type name (constructor field ...) predicate + (field accessor [mutator]) ...)` — R7RS form. Creates a disjoint + type with the given constructor, predicate, and field + accessors/mutators. Implementable as a single heap-allocated object + with a type-descriptor pointer; does not require exposing vectors. +- `(pmatch expr clause ...)` — pattern matcher. *Not* R7RS; added + because the self-hosted compiler destructures s-expressions + constantly and a macroless language makes hand-rolled `car`/`cdr` + chains painful. Clause forms: + + ``` + (pattern body ...) + (pattern (guard g-expr ...) body ...) + (else body ...) + ``` + + Patterns: + + | Pattern | Matches | + |----------------------|------------------------------------------| + | `()` | the empty list | + | `<literal>` | integers, strings, chars, booleans by `equal?` | + | `<symbol>` | that exact symbol (*not* a binder) | + | `,<ident>` | anything; binds to `<ident>` | + | `,_` | anything; no binding (wildcard) | + | `(p1 p2 ...)` | proper list of exactly that length | + | `(p1 ... . ptail)` | improper list; `ptail` binds the rest | + + Clauses are tried top-to-bottom; first matching pattern with + truthy guards wins. A guard that evaluates to `#f` is treated as + no-match and falls through to the next clause. **No match with no + `else`** is undefined behavior (follows the primitive-failure + policy); put an explicit `(else (error ...))` when you care. + + Example: + + ```scheme + (pmatch e + ((quote ,x) (emit-literal x)) + ((if ,t ,a ,b) (emit-if t a b)) + ((lambda ,ps . ,body) (emit-lambda ps body)) + (,x (guard (symbol? x)) (emit-var x)) + (else (error "unknown form" e))) + ``` + +## Core procedures + +Flat namespace — no library carving for v1. + +**Predicates / equality** +`eq?`, `equal?`, `not`, `boolean?`, `pair?`, `null?`, `symbol?`, +`integer?`, `string?` (≡ `bytevector?`), `procedure?`. + +**Pairs & lists** +`cons`, `car`, `cdr`, `set-car!`, `set-cdr!`, `list`, `length`, +`reverse`, `append`, `list-ref`, `map`, `for-each`. + +**Integers** (word-size; overflow undefined) +`+ - *`, `quotient`, `remainder`, `modulo`, `= < <= > >=`, `zero?`, +`positive?`, `negative?`, `abs`, `min`, `max`, +`bit-and`, `bit-or`, `bit-xor`, `bit-not`, `arithmetic-shift`, +`number->string` (optional radix; 10 and 16 required), +`string->number` (optional radix; returns `#f` on parse failure — this +is a pure op, not a syscall, so the normal `(ok . val)` convention +doesn't apply). + +Division rounding (R7RS semantics) — worth stating since target ISAs +differ at the instruction level: + + - `quotient` truncates toward zero: `(quotient -7 2) ⇒ -3` + - `remainder` has the sign of the *dividend*: `(remainder -7 2) ⇒ -1` + - `modulo` has the sign of the *divisor*: `(modulo -7 2) ⇒ 1` + - Identity: `(= n (+ (* (quotient n d) d) (remainder n d)))` + +**Strings / bytevectors** +`make-bytevector`, `bytevector-length`, `bytevector-u8-ref`, +`bytevector-u8-set!`, `bytevector-copy` (optional start/end → fresh bv), +`bytevector-copy!` (R7RS arg order: `dst dst-start src [src-start src-end]`), +`bytevector-append` (variadic; fresh bv), +`bytevector=?` (binary; byte-for-byte equality), +`string->symbol`, `symbol->string`. + +**Control** +`apply`. Tail calls are guaranteed proper on all targets — named let and +every looping idiom depend on it. + +**Program** +`(argv)` returns the process's command-line arguments as a list of +bytevectors. The first element is the program name as passed by the +kernel (argv[0]); user arguments start at the second element. + +**Errors** +`error` aborts with a message and optional irritants. No `raise` / +`guard` / handlers. + +**Primitive failure** is undefined behavior / crash. `(car '())`, +`(quotient 1 0)`, out-of-bounds `bytevector-u8-ref`, integer overflow, +mutating a literal — all implementation-defined, expect a crash. Callers +should not rely on any particular outcome. + +## Cut from R7RS-small + +Kept explicit so additions are deliberate. + +| Feature | Why cut | +|-------------------------------------------|----------------------------------------------| +| `syntax-rules`, macros, `define-syntax` | library is written without them for v1 | +| `call/cc`, `dynamic-wind` | big runtime cost, not needed here | +| `values`, `call-with-values` | replaced by `(ok . val)` pair convention | +| Numeric tower (rational/real/complex/float)| integers only | +| Bignums | word-size integers only (32- or 64-bit) | +| Character *type* (char? char->integer etc)| `#\a` syntax kept as u8 sugar; no char type | +| Vectors (`#(…)`, `make-vector`, etc.) | lists for sequences, records for structs | +| First-class ports | shell.scm defines its own | +| `quasiquote` / `` ` `` `,` `,@` | sugar; re-add with macros | +| `case`, `when`, `unless`, `do` | sugar over `cond`/`if` | +| `delay`, `force`, promises | — | +| `make-parameter`, `parameterize` | — | +| `guard`, `raise`, `with-exception-handler`| v1 uses `(ok . val)` | +| `define-library`, `import`, `export` | files are files for v1 | +| `include`, `cond-expand` | — | +| `case-lambda` | no arity overloading | +| `#o…` `#b…` `#;` `#\| \|#` | keeps the lexer small (`#\…` is kept) | +| `eqv?` | `eq?` + `equal?` are enough | + +## Dependencies of shell.scm on this subset + +Sanity check that the subset actually supports the library we've written: + +- variadic `.` in `lambda` / `define` → `spawn`, `run` +- named let → `refill!`-driven read loops, `bv-concat-reverse` +- `define-record-type` → `port` record +- `apply` → `run` calling `spawn` +- `bit-or`, `bit-and`, `arithmetic-shift` → openat flags, wait-status decode +- `make-bytevector` with fill byte → `NL-BV` +- `bytevector-copy` (3-arg) and `bytevector-copy!` → read/write paths +- `cons`, `car`, `cdr`, `null?`, `zero?`, `=`, `<`, `-`, `+` → everywhere + +Anything flagged here that we ultimately cut forces a rewrite of +shell.scm.