commit d189dc2447799f1159087d8a3383382b2de6d49b
parent de399adbe43b14b14b58c0f55c521fc29bd08328
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 20 Apr 2026 17:35:47 -0700
Add LISP.md for the P1-hosted Lisp interpreter
Specifies value representation, heap layout, mark-sweep GC with pair
bitmap and 10 size classes, environment model, eval with P1 TAIL,
reader with source-location side table, printer, symbol interning,
primitive FFI, pmatch, startup/REPL, and a 12-step implementation plan
budgeted at ~4,600 P1 LOC.
Diffstat:
| A | LISP.md | | | 492 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
1 file changed, 492 insertions(+), 0 deletions(-)
diff --git a/LISP.md b/LISP.md
@@ -0,0 +1,492 @@
+# LISP: Seed Lisp interpreter in P1
+
+## Goal
+
+A small Scheme-flavored Lisp interpreter written once in the P1 pseudo-ISA
+(see [P1.md](P1.md)) and assembled to three host arches (amd64, aarch64,
+riscv64) via the existing `M1` + `hex2` chain. Hosts the C1 compiler (see
+[SEED.md](SEED.md)) and the C compiler (see [PLAN.md](PLAN.md)) as Lisp
+programs. Target size: 4,000–6,000 lines of P1.
+
+This document covers only the interpreter. The Lisp-language surface it
+exposes is specified by PLAN.md's "Feature floor" — this doc does not
+restate it.
+
+## Position in the chain
+
+```
+M1 asm → P1 pseudo-ISA → Lisp interpreter (this doc) → C1 compiler
+ → C compiler → tcc-boot
+```
+
+The interpreter binary is the last artifact built from P1 source in the
+bootstrap. Everything above it is either Lisp source text or a C1/C binary
+produced by a Lisp program.
+
+## Settled decisions
+
+Load-bearing; the rest of the document assumes them.
+
+1. **Tagged cells, low-3-bit tag, 61-bit fixnum payload.** See
+ §Value representation.
+2. **Mark-sweep GC over a fixed BSS arena.** 20 MB, linked in at build
+ time, no `brk`/`mmap`.
+3. **Fixed root-spill slots.** 32 named cells in BSS; Scheme-facing
+ functions spill live values there before any allocating call. The
+ P1 stack is not scanned.
+4. **Source file via `argv[1]`.** The interpreter opens the path given
+ on the command line, reads it whole into a string, parses it,
+ evaluates top-to-bottom. No linked-in blob.
+5. **Optional REPL under `--repl`.** Development aid; not used by the
+ bootstrap pipeline. ~100 P1 LOC.
+6. **No bignums, ever.** 61-bit fixnums only. Confirmed: the C compiler
+ never needs larger.
+7. **Single `lisp.M1` source file.** M1 has no `#include` and
+ concatenating shards in the Makefile adds build-time coupling we
+ don't need yet. Revisit only if the file passes ~6k LOC.
+8. **`pmatch` is a built-in special form, not a user-level macro.**
+ Saves writing a general macro expander for one feature.
+9. **Tail calls via P1 `TAIL`.** `eval` dispatches tail-position calls
+ through `TAIL`; non-tail through `CALL`. Scheme-level tail-call
+ correctness falls out for free.
+10. **Five syscalls: `read`, `write`, `open`, `close`, `exit`.** Matches
+ PLAN.md. No signals, no `lseek`, no `stat`.
+11. **Pair GC marks live in a separate bitmap**, not in the pair words.
+ ~1.25 MB BSS for a 20 MB heap; keeps pairs at 16 bytes and keeps
+ fixnums at 61 bits.
+12. **Fixnum overflow wraps mod 2^61.** `+`/`-`/`*` do not trap. C
+ compiler workloads never approach 2^60.
+13. **`equal?` does not detect cycles.** Cyclic input loops forever;
+ the C compiler never constructs cycles.
+14. **Reader attaches source locations via a side table.** Pair-pointer
+ → `(line . col)` lookup, populated by the reader, consulted by the
+ error path. ~200 LOC.
+15. **Closures are three-slot objects**: header + params + body + env.
+ GC traces three fields; revisit only if closure churn measures hot.
+
+## Value representation
+
+All Lisp values are 64-bit tagged words. Low 3 bits are the tag:
+
+| Tag (bin) | Meaning | Payload |
+|-----------|----------------------|---------------------------------------|
+| `001` | fixnum | signed 61-bit, `>> 3` to recover |
+| `010` | pair pointer | 8-byte aligned, clear low 3 to deref |
+| `011` | vector pointer | idem |
+| `100` | string pointer | idem |
+| `101` | symbol pointer | idem, interned (`eq?`-comparable) |
+| `110` | closure / primitive | idem |
+| `111` | singleton constant | `()`, `#t`, `#f`, `eof`, `unspec` |
+
+Pointer tags require 8-byte-aligned allocation, enforced by the bump
+allocator.
+
+Singletons live in the `111` space as small immediates (e.g. `0x07` nil,
+`0x0F` #t, `0x17` #f, `0x1F` eof, `0x27` unspec). Distinguishable from
+fixnums by tag, from pointers by tag, and from each other by payload.
+
+### Pairs
+
+Two consecutive 8-byte words: `[car][cdr]`. No header. The tag on the
+pointer tells GC it is a pair and has no embedded length.
+
+### Headered objects
+
+Vectors, strings, symbols, closures, and primitives carry a 1-word header
+before their payload:
+
+```
+[ 8-bit type | 8-bit gc-flags | 48-bit length-or-arity ]
+```
+
+- **Vector**: header + `length` tagged words.
+- **String**: header + `length` bytes (ASCII, per PLAN.md). Not
+ null-terminated; length is authoritative. Padded up to 8-byte
+ alignment.
+- **Symbol**: header + name bytes + a 1-word intern-chain link. Two
+ symbols are `eq?` iff their pointers match; the interner guarantees
+ one object per name.
+- **Closure**: header + `params` list pointer + `body` list pointer +
+ `env` pointer. Arity in header.
+- **Primitive**: header + P1 function pointer. Arity in header.
+
+## Heap layout
+
+Single BSS region, fixed at link time:
+
+```
+:heap_start
+ <20 MB of zero bytes>
+:heap_end
+```
+
+- `heap_next` — bump pointer, starts at `heap_start`.
+- `free_lists[10]` — free-list heads per size class (see §GC).
+
+### Allocation
+
+`alloc(size)` rounds `size` up to 8 bytes, consults the matching size-class
+free list first, otherwise bumps `heap_next`. If the bump would cross
+`heap_end`, run GC; if still no room, `error "heap exhausted"`.
+
+Everything is 8-byte aligned by construction.
+
+## GC
+
+Mark-sweep, stop-the-world, invoked only when the allocator overflows.
+Not triggered by time, step count, or explicit request.
+
+### Roots
+
+Three sources:
+
+1. **Global environment.** Fixed pointer in BSS (`global_env`), traced
+ transitively.
+2. **Symbol table.** Intern table's backing array plus all interned
+ symbols and their name strings. Fixed pointer (`symbol_table`).
+3. **Root-spill slots.** 32 tagged words at `root_slots[0..32]` in BSS.
+ A 32-bit `live_mask` word marks which slots currently hold a live
+ value. Scheme-facing functions set and clear bits in the mask
+ across their dynamic extent.
+
+The P1 stack is **not scanned.** Stack frames hold P1 integers, return
+addresses, and scratch — not Scheme-value-shaped words unless those
+words are also rooted in a spill slot.
+
+### Algorithm
+
+1. **Mark.** Traverse from roots, setting the `gc-flags` mark bit on
+ each reachable headered object. Pairs have no header; their marks
+ live in a separate bitmap indexed by
+ `(pair_addr - heap_start) / 16`. Bitmap cost: ~1.25 MB for a 20 MB
+ heap; removes the need to steal bits from pair data.
+2. **Sweep.** Walk the arena object-by-object. Live objects clear
+ their mark and continue. Dead objects of size ≤ 64 bytes join the
+ matching free list; larger dead objects are coalesced, and if the
+ tail of the arena is all free, rewind `heap_next`. No object
+ motion.
+
+### Size classes
+
+Free lists for 16, 24, 32, 40, 48, 56, 64, 80, 96, 128 bytes (ten
+classes). The low seven cover the common object sizes:
+
+| Class | Typical occupants |
+|-------|-----------------------------------------------------------------|
+| 16 | pairs (dominant), primitives, 0–8 char strings |
+| 24 | short symbols (1–8 char names), 9–16 char strings, 2-slot |
+| 32 | closures, medium symbols (9–16 char names), 3-slot vectors |
+| 40 | 4-slot vectors / env frames of 4 vars |
+| 48 | 5-slot vectors |
+| 56 | 6-slot vectors |
+| 64 | 7-slot vectors |
+| 80 | 9-slot vectors, env frames of 8 vars |
+| 96 | 11-slot vectors, env frames of 10 vars |
+| 128 | 15-slot vectors, env frames up to 15 vars |
+
+Pairs (16) dominate allocation volume in compiler workloads; closures
+(32) are a fixed size so get their own class. The 80/96/128 classes
+catch local-env vectors for functions with 8–15 bound names, which
+would otherwise only be reclaimed by bump-pointer rewind.
+
+Allocations above 128 bytes go through the bump pointer on every
+cycle — typically long string literals and large vectors, both
+uncommon in compiler workloads.
+
+### Invariants
+
+- Between any two GC-observable points, every live Scheme value must be
+ reachable from `global_env`, `symbol_table`, or a live root slot.
+ "GC-observable" = any call to `cons`, `make-vector`, `make-string`,
+ `make-closure`, `intern`, or another Scheme function.
+- `live_mask` bits are owned by the function that set them; callers
+ don't set a callee's bits.
+
+Tradeoffs taken:
+
+- Mark bitmap costs memory but keeps pair encoding simple.
+- Stop-the-world is fine: single-threaded, and the C compiler is not
+ latency-sensitive.
+- No compaction: fragmentation is acceptable for a ≤20 MB heap with a
+ pair-dominated workload.
+
+## Environment
+
+Two layers:
+
+- **Global env.** Open-addressing hash table keyed by symbol pointer
+ identity. 4096 slots, linear probe. Holds bindings established by
+ top-level `define`.
+- **Local env.** A chain of frames. Each frame is a pair
+ `((names-list . values-vector) . parent-frame)`. Lookup walks up
+ until a name matches, then falls through to the global hash.
+
+Hash over a symbol pointer: `(ptr >> 3) & (TABLE_SIZE - 1)`. Collision
+rate is low — the intern table guarantees one pointer per name, and
+Lisp source is sparse in identifiers.
+
+`define` at top level writes to the global hash; `define` inside a body
+is rewritten to `let`-shape at eval time (standard single-pass
+practice).
+
+## Eval
+
+Textbook meta-circular recursive-descent evaluator. Each form has a case:
+
+```
+eval(expr, env):
+ if self-evaluating: return expr
+ if symbol: return lookup(expr, env)
+ match car(expr):
+ 'quote → return cadr(expr)
+ 'if → eval test; TAIL eval selected branch
+ 'begin → eval each non-last; TAIL eval last
+ 'lambda → return make-closure(params, body, env)
+ 'define → bind in env; return unspec
+ 'set! → mutate binding; return unspec
+ 'let/let*/letrec → desugar to lambda application; TAIL eval
+ 'cond → desugar to nested if; TAIL eval
+ 'quasiquote → expand; TAIL eval
+ 'pmatch → compile arm chain; TAIL eval selected arm
+ otherwise → eval callee + args; TAIL apply
+```
+
+`TAIL` above compiles to a literal P1 `TAIL` — the current `eval`
+frame is torn down before the recursive call. Non-tail positions
+(test of `if`, non-last of `begin`, callee/arg evaluation of an
+application) use P1 `CALL`.
+
+### Application
+
+`apply(callee, args-list)`:
+
+- **Primitive:** load function pointer from header, pack args into a
+ fixed `argv` region in BSS, `CALL` the primitive, return its result.
+- **Closure:** construct a new frame binding params to args; `TAIL eval`
+ the body in the new env.
+
+Arity mismatches raise `error`.
+
+## Reader
+
+Recursive-descent over the in-memory source string with an integer
+cursor. No port abstraction.
+
+```
+read(src-string, cursor) → (value, new-cursor)
+```
+
+Tokens are produced on demand; no token array.
+
+Forms recognized:
+
+- `( ... )` — list, with `.` for improper tail
+- `' expr` → `(quote expr)`
+- `` ` expr `` → `(quasiquote expr)`
+- `, expr` → `(unquote expr)`
+- `,@ expr` → `(unquote-splicing expr)`
+- `#( ... )` — vector literal
+- `#t`, `#f`
+- integer literals: decimal (`-42`, `123`) and hex (`0x7F`)
+- character literal: `#\a`, `#\space`, `#\newline`, `#\tab`
+- string literal: `"..."` with `\n \t \\ \"` escapes
+- symbol: every other `[!-~]` run not starting with a digit
+- `;` — line comment, skipped
+
+Reader output conses directly into the heap; the spill-slot discipline
+applies.
+
+### Source locations
+
+The reader maintains a side table mapping pair pointer → `(line . col)`
+of the opening token for that form. The table is an open-addressing
+hash keyed by pair pointer identity; it is a GC root so locations
+survive across collections, and entries for dead pairs are dropped
+during sweep (a location whose pair pointer is unmarked is itself
+dead). `error` consults this table when its argument is a pair and
+prefixes the diagnostic with `"at <file>:<line>:<col>: "`.
+
+Cost: ~150 LOC in the reader, ~50 LOC in the error path and GC sweep.
+Pays for itself when the C compiler mis-parses a tcc-boot source.
+
+## Printer
+
+Three entry points:
+
+- `display` — no quoting on strings; bare character output.
+- `write` — readable form: strings quoted, characters prefixed.
+- `format fmt args...` — supports `~a` (display), `~s` (write), `~d`
+ (decimal fixnum), `~%` (newline). No other directives.
+
+Output buffered in a 4 KB BSS buffer; flushed on full or on newline for
+`display`/`write`. `error` flushes before writing to fd 2.
+
+## Symbol interning
+
+Single open-addressing hash table with linear probe. Hash:
+
+```
+h = 0
+for each byte b in name:
+ h = (h * 31 + b) & 0xFFFFFFFF
+return h & (TABLE_SIZE - 1)
+```
+
+`TABLE_SIZE = 4096`. Invoked on every symbol the reader sees; the
+table is a GC root.
+
+## Primitives FFI
+
+Each primitive is an ordinary P1 function:
+
+- **In:** `r1` = argc, `r2` = pointer to argv (tagged words,
+ back-to-back).
+- **Out:** `r0` = result tagged word.
+
+A static dispatch table maps `(symbol-pointer → primitive-pointer)`. At
+startup the interpreter populates `global_env` with one binding per
+primitive, each wrapped as a primitive-type heap object.
+
+Primitives allocate through the same `alloc`/GC path as user code and
+must honor the root-spill discipline when they make more than one
+allocation.
+
+Target set: ~40 primitives per PLAN.md. Roughly:
+
+- **List/pair**: `cons car cdr pair? null? list? list length append
+ reverse map filter fold assoc member`
+- **Arithmetic**: `+ - * / % = < > <= >= zero? negative? positive?
+ abs min max`
+- **Bitwise**: `bit-and bit-or bit-xor bit-not arithmetic-shift`
+- **String**: `string? string-length string-ref substring
+ string-append string->symbol symbol->string`
+- **Vector**: `make-vector vector-ref vector-set! vector-length
+ vector->list list->vector`
+- **Type predicates**: `number? symbol? string? vector? procedure?
+ eq? eqv? equal?`
+- **I/O**: `read-file write-file display write newline format error`
+- **Control**: `apply`
+
+## `pmatch`
+
+Built-in special form. Pattern language:
+
+- literal fixnum / string / symbol — matches by `equal?`
+- `?var` — bind variable
+- `_` — wildcard
+- `(p1 p2 ...)` — structural list match
+- `(p1 . p2)` — improper-list split
+- `(quote x)` / `'x` — symbol literal match
+- `(guard expr)` — predicate after a match
+
+```
+(pmatch expr
+ [pattern₁ body₁]
+ [pattern₂ body₂]
+ ...
+ [else body-else])
+```
+
+Compiled at eval time into a chain of test / bind / `TAIL eval body`
+operations. Compilation result is cached in the AST after first
+expansion so a `pmatch` site does not recompile per execution.
+
+## Error path
+
+`(error msg arg ...)`:
+
+1. Flush any pending printer buffer.
+2. Write `"error: "` + the message + formatted args to fd 2.
+3. `exit(1)`.
+
+No unwinding, no continuations, no handlers. The C1/C compilers either
+succeed or exit with diagnostics.
+
+## Startup
+
+```
+_start:
+ argc at [sp + 0], argv at [sp + 8] (per P1 entry conv)
+ if argc < 2: error "usage: lisp <file.scm> | --repl"
+ if argv[1] == "--repl": repl_loop()
+ else: run_file(argv[1])
+ exit(0)
+```
+
+`run_file` reads the whole file into a string, then loops:
+parse one top-level form, eval it, discard, advance cursor — so a
+`define` of a helper is visible before the next form that uses it.
+
+### REPL
+
+Under `--repl`:
+
+```
+loop:
+ write "> " to fd 1
+ read one line from fd 0 into scratch buffer
+ on EOF: exit 0
+ parse one expression from the line
+ eval; write result; newline
+ goto loop
+```
+
+~100 LOC. Development aid; not invoked by the bootstrap pipeline.
+
+## File layout
+
+```
+lispcc/
+ lisp.M1 # single P1 source, target 4–6k LOC
+ tests/lisp/
+ 00-arith.scm
+ 01-list.scm
+ 02-reader.scm
+ 03-printer.scm
+ 04-eval.scm
+ 05-tailcall.scm
+ 06-gc-stress.scm
+ 07-pmatch.scm
+ 08-records.scm
+ ...
+```
+
+Tests are `.scm` files the interpreter runs; pass = exit 0 and expected
+stdout. `make test-lisp` loops over them; `make test-lisp-all` runs
+each test on all three arches via the P1 differential harness.
+
+## Staged implementation plan
+
+Each step assembles through existing P1 → M1 → hex2 and carries test
+coverage before the next begins.
+
+1. **Runtime skeleton.** `_start`, argv parsing, `read-file`/`write-file`
+ via syscalls, `error` path, BSS layout, bump allocator. Exits with
+ the file's first byte as proof-of-life. ~200 P1 LOC.
+2. **Tagged values.** Fixnum/pair/singleton encoding; `cons`/`car`/`cdr`/
+ tag predicates. Hand-build a pair; print its car. ~300 LOC.
+3. **Strings + symbol interning.** Heap strings, 4096-slot intern table,
+ symbol allocator. Two `(intern "foo")` calls return `eq?` pointers.
+ ~400 LOC.
+4. **Reader.** End-to-end `source → sexpr`, plus the source-location
+ side table. Test: parse a 200-LOC Scheme file; print the tree back
+ and diff; inject a syntax error and confirm `line:col` in the
+ diagnostic. ~850 LOC.
+5. **Printer.** `display`, `write`, minimal `format`. Closes a
+ read-print cycle. ~300 LOC.
+6. **Eval (non-tail).** Self-evaluators, lookup, `if`, `begin`,
+ `lambda`, `define`, application via P1 `CALL`. ~600 LOC.
+7. **`TAIL` integration.** Rewrite tail-position `eval` calls to P1
+ `TAIL`. Stress: loop of 10^6 self-calls with flat P1 stack.
+8. **Primitives (40).** Mechanical once the FFI harness exists. ~500
+ LOC.
+9. **Mark-sweep GC.** Mark bitmap, root discipline, sweep, size-classed
+ free lists. Stress: cons-churn allocating 1000× heap in flight.
+ ~500 LOC.
+10. **`pmatch` + records-via-vectors.** ~300 LOC.
+11. **REPL.** `--repl` flag, line reader, eval-print loop. ~100 LOC.
+12. **End-to-end.** Run a 200-LOC Scheme test program exercising all
+ features; diff tri-arch output.
+
+Rough rollup: ~4,600 P1 LOC (includes source-location side table),
+mid-range in PLAN.md's 4–6k budget.