boot2

Playing with the boostrap
git clone https://git.ryansepassi.com/git/boot2.git
Log | Files | Refs

commit d189dc2447799f1159087d8a3383382b2de6d49b
parent de399adbe43b14b14b58c0f55c521fc29bd08328
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 20 Apr 2026 17:35:47 -0700

Add LISP.md for the P1-hosted Lisp interpreter

Specifies value representation, heap layout, mark-sweep GC with pair
bitmap and 10 size classes, environment model, eval with P1 TAIL,
reader with source-location side table, printer, symbol interning,
primitive FFI, pmatch, startup/REPL, and a 12-step implementation plan
budgeted at ~4,600 P1 LOC.

Diffstat:
ALISP.md | 492+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 492 insertions(+), 0 deletions(-)

diff --git a/LISP.md b/LISP.md @@ -0,0 +1,492 @@ +# LISP: Seed Lisp interpreter in P1 + +## Goal + +A small Scheme-flavored Lisp interpreter written once in the P1 pseudo-ISA +(see [P1.md](P1.md)) and assembled to three host arches (amd64, aarch64, +riscv64) via the existing `M1` + `hex2` chain. Hosts the C1 compiler (see +[SEED.md](SEED.md)) and the C compiler (see [PLAN.md](PLAN.md)) as Lisp +programs. Target size: 4,000–6,000 lines of P1. + +This document covers only the interpreter. The Lisp-language surface it +exposes is specified by PLAN.md's "Feature floor" — this doc does not +restate it. + +## Position in the chain + +``` +M1 asm → P1 pseudo-ISA → Lisp interpreter (this doc) → C1 compiler + → C compiler → tcc-boot +``` + +The interpreter binary is the last artifact built from P1 source in the +bootstrap. Everything above it is either Lisp source text or a C1/C binary +produced by a Lisp program. + +## Settled decisions + +Load-bearing; the rest of the document assumes them. + +1. **Tagged cells, low-3-bit tag, 61-bit fixnum payload.** See + §Value representation. +2. **Mark-sweep GC over a fixed BSS arena.** 20 MB, linked in at build + time, no `brk`/`mmap`. +3. **Fixed root-spill slots.** 32 named cells in BSS; Scheme-facing + functions spill live values there before any allocating call. The + P1 stack is not scanned. +4. **Source file via `argv[1]`.** The interpreter opens the path given + on the command line, reads it whole into a string, parses it, + evaluates top-to-bottom. No linked-in blob. +5. **Optional REPL under `--repl`.** Development aid; not used by the + bootstrap pipeline. ~100 P1 LOC. +6. **No bignums, ever.** 61-bit fixnums only. Confirmed: the C compiler + never needs larger. +7. **Single `lisp.M1` source file.** M1 has no `#include` and + concatenating shards in the Makefile adds build-time coupling we + don't need yet. Revisit only if the file passes ~6k LOC. +8. **`pmatch` is a built-in special form, not a user-level macro.** + Saves writing a general macro expander for one feature. +9. **Tail calls via P1 `TAIL`.** `eval` dispatches tail-position calls + through `TAIL`; non-tail through `CALL`. Scheme-level tail-call + correctness falls out for free. +10. **Five syscalls: `read`, `write`, `open`, `close`, `exit`.** Matches + PLAN.md. No signals, no `lseek`, no `stat`. +11. **Pair GC marks live in a separate bitmap**, not in the pair words. + ~1.25 MB BSS for a 20 MB heap; keeps pairs at 16 bytes and keeps + fixnums at 61 bits. +12. **Fixnum overflow wraps mod 2^61.** `+`/`-`/`*` do not trap. C + compiler workloads never approach 2^60. +13. **`equal?` does not detect cycles.** Cyclic input loops forever; + the C compiler never constructs cycles. +14. **Reader attaches source locations via a side table.** Pair-pointer + → `(line . col)` lookup, populated by the reader, consulted by the + error path. ~200 LOC. +15. **Closures are three-slot objects**: header + params + body + env. + GC traces three fields; revisit only if closure churn measures hot. + +## Value representation + +All Lisp values are 64-bit tagged words. Low 3 bits are the tag: + +| Tag (bin) | Meaning | Payload | +|-----------|----------------------|---------------------------------------| +| `001` | fixnum | signed 61-bit, `>> 3` to recover | +| `010` | pair pointer | 8-byte aligned, clear low 3 to deref | +| `011` | vector pointer | idem | +| `100` | string pointer | idem | +| `101` | symbol pointer | idem, interned (`eq?`-comparable) | +| `110` | closure / primitive | idem | +| `111` | singleton constant | `()`, `#t`, `#f`, `eof`, `unspec` | + +Pointer tags require 8-byte-aligned allocation, enforced by the bump +allocator. + +Singletons live in the `111` space as small immediates (e.g. `0x07` nil, +`0x0F` #t, `0x17` #f, `0x1F` eof, `0x27` unspec). Distinguishable from +fixnums by tag, from pointers by tag, and from each other by payload. + +### Pairs + +Two consecutive 8-byte words: `[car][cdr]`. No header. The tag on the +pointer tells GC it is a pair and has no embedded length. + +### Headered objects + +Vectors, strings, symbols, closures, and primitives carry a 1-word header +before their payload: + +``` +[ 8-bit type | 8-bit gc-flags | 48-bit length-or-arity ] +``` + +- **Vector**: header + `length` tagged words. +- **String**: header + `length` bytes (ASCII, per PLAN.md). Not + null-terminated; length is authoritative. Padded up to 8-byte + alignment. +- **Symbol**: header + name bytes + a 1-word intern-chain link. Two + symbols are `eq?` iff their pointers match; the interner guarantees + one object per name. +- **Closure**: header + `params` list pointer + `body` list pointer + + `env` pointer. Arity in header. +- **Primitive**: header + P1 function pointer. Arity in header. + +## Heap layout + +Single BSS region, fixed at link time: + +``` +:heap_start + <20 MB of zero bytes> +:heap_end +``` + +- `heap_next` — bump pointer, starts at `heap_start`. +- `free_lists[10]` — free-list heads per size class (see §GC). + +### Allocation + +`alloc(size)` rounds `size` up to 8 bytes, consults the matching size-class +free list first, otherwise bumps `heap_next`. If the bump would cross +`heap_end`, run GC; if still no room, `error "heap exhausted"`. + +Everything is 8-byte aligned by construction. + +## GC + +Mark-sweep, stop-the-world, invoked only when the allocator overflows. +Not triggered by time, step count, or explicit request. + +### Roots + +Three sources: + +1. **Global environment.** Fixed pointer in BSS (`global_env`), traced + transitively. +2. **Symbol table.** Intern table's backing array plus all interned + symbols and their name strings. Fixed pointer (`symbol_table`). +3. **Root-spill slots.** 32 tagged words at `root_slots[0..32]` in BSS. + A 32-bit `live_mask` word marks which slots currently hold a live + value. Scheme-facing functions set and clear bits in the mask + across their dynamic extent. + +The P1 stack is **not scanned.** Stack frames hold P1 integers, return +addresses, and scratch — not Scheme-value-shaped words unless those +words are also rooted in a spill slot. + +### Algorithm + +1. **Mark.** Traverse from roots, setting the `gc-flags` mark bit on + each reachable headered object. Pairs have no header; their marks + live in a separate bitmap indexed by + `(pair_addr - heap_start) / 16`. Bitmap cost: ~1.25 MB for a 20 MB + heap; removes the need to steal bits from pair data. +2. **Sweep.** Walk the arena object-by-object. Live objects clear + their mark and continue. Dead objects of size ≤ 64 bytes join the + matching free list; larger dead objects are coalesced, and if the + tail of the arena is all free, rewind `heap_next`. No object + motion. + +### Size classes + +Free lists for 16, 24, 32, 40, 48, 56, 64, 80, 96, 128 bytes (ten +classes). The low seven cover the common object sizes: + +| Class | Typical occupants | +|-------|-----------------------------------------------------------------| +| 16 | pairs (dominant), primitives, 0–8 char strings | +| 24 | short symbols (1–8 char names), 9–16 char strings, 2-slot | +| 32 | closures, medium symbols (9–16 char names), 3-slot vectors | +| 40 | 4-slot vectors / env frames of 4 vars | +| 48 | 5-slot vectors | +| 56 | 6-slot vectors | +| 64 | 7-slot vectors | +| 80 | 9-slot vectors, env frames of 8 vars | +| 96 | 11-slot vectors, env frames of 10 vars | +| 128 | 15-slot vectors, env frames up to 15 vars | + +Pairs (16) dominate allocation volume in compiler workloads; closures +(32) are a fixed size so get their own class. The 80/96/128 classes +catch local-env vectors for functions with 8–15 bound names, which +would otherwise only be reclaimed by bump-pointer rewind. + +Allocations above 128 bytes go through the bump pointer on every +cycle — typically long string literals and large vectors, both +uncommon in compiler workloads. + +### Invariants + +- Between any two GC-observable points, every live Scheme value must be + reachable from `global_env`, `symbol_table`, or a live root slot. + "GC-observable" = any call to `cons`, `make-vector`, `make-string`, + `make-closure`, `intern`, or another Scheme function. +- `live_mask` bits are owned by the function that set them; callers + don't set a callee's bits. + +Tradeoffs taken: + +- Mark bitmap costs memory but keeps pair encoding simple. +- Stop-the-world is fine: single-threaded, and the C compiler is not + latency-sensitive. +- No compaction: fragmentation is acceptable for a ≤20 MB heap with a + pair-dominated workload. + +## Environment + +Two layers: + +- **Global env.** Open-addressing hash table keyed by symbol pointer + identity. 4096 slots, linear probe. Holds bindings established by + top-level `define`. +- **Local env.** A chain of frames. Each frame is a pair + `((names-list . values-vector) . parent-frame)`. Lookup walks up + until a name matches, then falls through to the global hash. + +Hash over a symbol pointer: `(ptr >> 3) & (TABLE_SIZE - 1)`. Collision +rate is low — the intern table guarantees one pointer per name, and +Lisp source is sparse in identifiers. + +`define` at top level writes to the global hash; `define` inside a body +is rewritten to `let`-shape at eval time (standard single-pass +practice). + +## Eval + +Textbook meta-circular recursive-descent evaluator. Each form has a case: + +``` +eval(expr, env): + if self-evaluating: return expr + if symbol: return lookup(expr, env) + match car(expr): + 'quote → return cadr(expr) + 'if → eval test; TAIL eval selected branch + 'begin → eval each non-last; TAIL eval last + 'lambda → return make-closure(params, body, env) + 'define → bind in env; return unspec + 'set! → mutate binding; return unspec + 'let/let*/letrec → desugar to lambda application; TAIL eval + 'cond → desugar to nested if; TAIL eval + 'quasiquote → expand; TAIL eval + 'pmatch → compile arm chain; TAIL eval selected arm + otherwise → eval callee + args; TAIL apply +``` + +`TAIL` above compiles to a literal P1 `TAIL` — the current `eval` +frame is torn down before the recursive call. Non-tail positions +(test of `if`, non-last of `begin`, callee/arg evaluation of an +application) use P1 `CALL`. + +### Application + +`apply(callee, args-list)`: + +- **Primitive:** load function pointer from header, pack args into a + fixed `argv` region in BSS, `CALL` the primitive, return its result. +- **Closure:** construct a new frame binding params to args; `TAIL eval` + the body in the new env. + +Arity mismatches raise `error`. + +## Reader + +Recursive-descent over the in-memory source string with an integer +cursor. No port abstraction. + +``` +read(src-string, cursor) → (value, new-cursor) +``` + +Tokens are produced on demand; no token array. + +Forms recognized: + +- `( ... )` — list, with `.` for improper tail +- `' expr` → `(quote expr)` +- `` ` expr `` → `(quasiquote expr)` +- `, expr` → `(unquote expr)` +- `,@ expr` → `(unquote-splicing expr)` +- `#( ... )` — vector literal +- `#t`, `#f` +- integer literals: decimal (`-42`, `123`) and hex (`0x7F`) +- character literal: `#\a`, `#\space`, `#\newline`, `#\tab` +- string literal: `"..."` with `\n \t \\ \"` escapes +- symbol: every other `[!-~]` run not starting with a digit +- `;` — line comment, skipped + +Reader output conses directly into the heap; the spill-slot discipline +applies. + +### Source locations + +The reader maintains a side table mapping pair pointer → `(line . col)` +of the opening token for that form. The table is an open-addressing +hash keyed by pair pointer identity; it is a GC root so locations +survive across collections, and entries for dead pairs are dropped +during sweep (a location whose pair pointer is unmarked is itself +dead). `error` consults this table when its argument is a pair and +prefixes the diagnostic with `"at <file>:<line>:<col>: "`. + +Cost: ~150 LOC in the reader, ~50 LOC in the error path and GC sweep. +Pays for itself when the C compiler mis-parses a tcc-boot source. + +## Printer + +Three entry points: + +- `display` — no quoting on strings; bare character output. +- `write` — readable form: strings quoted, characters prefixed. +- `format fmt args...` — supports `~a` (display), `~s` (write), `~d` + (decimal fixnum), `~%` (newline). No other directives. + +Output buffered in a 4 KB BSS buffer; flushed on full or on newline for +`display`/`write`. `error` flushes before writing to fd 2. + +## Symbol interning + +Single open-addressing hash table with linear probe. Hash: + +``` +h = 0 +for each byte b in name: + h = (h * 31 + b) & 0xFFFFFFFF +return h & (TABLE_SIZE - 1) +``` + +`TABLE_SIZE = 4096`. Invoked on every symbol the reader sees; the +table is a GC root. + +## Primitives FFI + +Each primitive is an ordinary P1 function: + +- **In:** `r1` = argc, `r2` = pointer to argv (tagged words, + back-to-back). +- **Out:** `r0` = result tagged word. + +A static dispatch table maps `(symbol-pointer → primitive-pointer)`. At +startup the interpreter populates `global_env` with one binding per +primitive, each wrapped as a primitive-type heap object. + +Primitives allocate through the same `alloc`/GC path as user code and +must honor the root-spill discipline when they make more than one +allocation. + +Target set: ~40 primitives per PLAN.md. Roughly: + +- **List/pair**: `cons car cdr pair? null? list? list length append + reverse map filter fold assoc member` +- **Arithmetic**: `+ - * / % = < > <= >= zero? negative? positive? + abs min max` +- **Bitwise**: `bit-and bit-or bit-xor bit-not arithmetic-shift` +- **String**: `string? string-length string-ref substring + string-append string->symbol symbol->string` +- **Vector**: `make-vector vector-ref vector-set! vector-length + vector->list list->vector` +- **Type predicates**: `number? symbol? string? vector? procedure? + eq? eqv? equal?` +- **I/O**: `read-file write-file display write newline format error` +- **Control**: `apply` + +## `pmatch` + +Built-in special form. Pattern language: + +- literal fixnum / string / symbol — matches by `equal?` +- `?var` — bind variable +- `_` — wildcard +- `(p1 p2 ...)` — structural list match +- `(p1 . p2)` — improper-list split +- `(quote x)` / `'x` — symbol literal match +- `(guard expr)` — predicate after a match + +``` +(pmatch expr + [pattern₁ body₁] + [pattern₂ body₂] + ... + [else body-else]) +``` + +Compiled at eval time into a chain of test / bind / `TAIL eval body` +operations. Compilation result is cached in the AST after first +expansion so a `pmatch` site does not recompile per execution. + +## Error path + +`(error msg arg ...)`: + +1. Flush any pending printer buffer. +2. Write `"error: "` + the message + formatted args to fd 2. +3. `exit(1)`. + +No unwinding, no continuations, no handlers. The C1/C compilers either +succeed or exit with diagnostics. + +## Startup + +``` +_start: + argc at [sp + 0], argv at [sp + 8] (per P1 entry conv) + if argc < 2: error "usage: lisp <file.scm> | --repl" + if argv[1] == "--repl": repl_loop() + else: run_file(argv[1]) + exit(0) +``` + +`run_file` reads the whole file into a string, then loops: +parse one top-level form, eval it, discard, advance cursor — so a +`define` of a helper is visible before the next form that uses it. + +### REPL + +Under `--repl`: + +``` +loop: + write "> " to fd 1 + read one line from fd 0 into scratch buffer + on EOF: exit 0 + parse one expression from the line + eval; write result; newline + goto loop +``` + +~100 LOC. Development aid; not invoked by the bootstrap pipeline. + +## File layout + +``` +lispcc/ + lisp.M1 # single P1 source, target 4–6k LOC + tests/lisp/ + 00-arith.scm + 01-list.scm + 02-reader.scm + 03-printer.scm + 04-eval.scm + 05-tailcall.scm + 06-gc-stress.scm + 07-pmatch.scm + 08-records.scm + ... +``` + +Tests are `.scm` files the interpreter runs; pass = exit 0 and expected +stdout. `make test-lisp` loops over them; `make test-lisp-all` runs +each test on all three arches via the P1 differential harness. + +## Staged implementation plan + +Each step assembles through existing P1 → M1 → hex2 and carries test +coverage before the next begins. + +1. **Runtime skeleton.** `_start`, argv parsing, `read-file`/`write-file` + via syscalls, `error` path, BSS layout, bump allocator. Exits with + the file's first byte as proof-of-life. ~200 P1 LOC. +2. **Tagged values.** Fixnum/pair/singleton encoding; `cons`/`car`/`cdr`/ + tag predicates. Hand-build a pair; print its car. ~300 LOC. +3. **Strings + symbol interning.** Heap strings, 4096-slot intern table, + symbol allocator. Two `(intern "foo")` calls return `eq?` pointers. + ~400 LOC. +4. **Reader.** End-to-end `source → sexpr`, plus the source-location + side table. Test: parse a 200-LOC Scheme file; print the tree back + and diff; inject a syntax error and confirm `line:col` in the + diagnostic. ~850 LOC. +5. **Printer.** `display`, `write`, minimal `format`. Closes a + read-print cycle. ~300 LOC. +6. **Eval (non-tail).** Self-evaluators, lookup, `if`, `begin`, + `lambda`, `define`, application via P1 `CALL`. ~600 LOC. +7. **`TAIL` integration.** Rewrite tail-position `eval` calls to P1 + `TAIL`. Stress: loop of 10^6 self-calls with flat P1 stack. +8. **Primitives (40).** Mechanical once the FFI harness exists. ~500 + LOC. +9. **Mark-sweep GC.** Mark bitmap, root discipline, sweep, size-classed + free lists. Stress: cons-churn allocating 1000× heap in flight. + ~500 LOC. +10. **`pmatch` + records-via-vectors.** ~300 LOC. +11. **REPL.** `--repl` flag, line reader, eval-print loop. ~100 LOC. +12. **End-to-end.** Run a 200-LOC Scheme test program exercising all + features; diff tri-arch output. + +Rough rollup: ~4,600 P1 LOC (includes source-location side table), +mid-range in PLAN.md's 4–6k budget.