commit 3e6ce5ada262294ca065705bdf65003ebf823fa1
parent b96f45aff0a222fd0290bd77309c8a4c812307e4
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 20 Apr 2026 13:18:55 -0700
Rewrite PLAN.md to layer the Lisp bootstrap on P1
Previously the plan wrote the Lisp interpreter directly in amd64 M1 asm,
giving a single-arch result and paying the per-arch encoding tax inside
the interpreter. With P1 now implemented (spike across three arches),
the interpreter is authored once in P1 and assembled three ways;
porting to a fourth arch means a new defs file, not a rewrite.
Changes:
- Reframe the chain as M1 -> P1 -> Lisp -> C compiler -> tcc-boot.
- Add "Why P1 as the host" section noting dependency on P1.md stages
1-4 and that stage 5 is this plan's kickoff.
- Backend now two options (emit M1-amd64 vs emit P1), decision deferred
to after measuring P1 codegen quality.
- Budget table updated; total LOC unchanged (~9-13k) but result is
tri-arch instead of amd64-only.
- Add Resolutions section capturing the design decisions surfaced
while scoping P1: narrow-load zero-extend only, accept 2x code-size
tax, codify TAIL in P1, per-function constant pools, static BSS GC
arena, five-syscall surface (read/write/open/close/exit).
Diffstat:
 PLAN.md | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 79 insertions(+), 13 deletions(-)
diff --git a/PLAN.md b/PLAN.md
@@ -1,10 +1,13 @@
-# Alternative bootstrap path: Lisp-in-M1 → C compiler in Lisp → tcc-boot
+# Alternative bootstrap path: Lisp-in-P1 → C compiler in Lisp → tcc-boot
## Goal
Shrink the auditable LOC between M1 assembly and tcc-boot by replacing the
current `M2-Planet → mes → MesCC → nyacc` stack with a small Lisp written
-directly in M1 asm and a C compiler written in that Lisp.
+once in the P1 portable pseudo-ISA (see `../P1.md`) and a C compiler written
+in that Lisp. P1 is the same layer described in `P1.md`: ~30 RISC-shaped ops
+whose per-arch `DEFINE` tables expand to amd64 / aarch64 / riscv64 encodings,
+so one Lisp source serves all three hosts.
## Current chain (validated counts)
@@ -22,13 +25,29 @@ directly in M1 asm and a C compiler written in that Lisp.
## Proposed chain
```
-M1 asm → Lisp interpreter (in M1 asm) → C compiler (in Lisp) → tcc-boot
+M1 asm → P1 pseudo-ISA → Lisp interpreter (in P1) → C compiler (in Lisp) → tcc-boot
```
-Two languages, one new interpreter, one new compiler. No M2-Planet, no Mes
-core, no MesCC, no nyacc.
+Two languages plus one portable asm layer, one new interpreter, one new
+compiler. No M2-Planet, no Mes core, no MesCC, no nyacc. The interpreter is
+authored once in P1 and assembled three ways; porting to a fourth arch means
+a new P1 defs file, not a rewrite.
-## Asm Lisp — feature floor
+## Why P1 as the host
+
+- **Single source of truth.** A Lisp in raw M1 asm would need three
+ hand-written variants (one per target arch). In P1, there is one source;
+ the per-arch cost is already paid inside the P1 defs files.
+- **Cost lives in P1, not here.** P1's one-time tax (~1500 generator-driven
+ defines × 3 arches, plus ~240 LOC of `hex2_word` + `M1-macro` aarch64 work)
+ is accounted for in `P1.md`. This plan inherits that layer rather than
+ duplicating it.
+- **Dependency ordering.** PLAN cannot start the Lisp interpreter until P1
+ stages 1–4 in `P1.md` are complete (spike on all three arches plus the
+ full ~30-op matrix). P1 stage 5 ("seed Lisp interpreter in ~500 lines of
+ P1") is effectively this plan's kickoff.
+
+## Lisp — feature floor
Justification: empirical audit of MesCC's actual Scheme usage. MesCC barely
exercises Scheme.
@@ -99,17 +118,64 @@ uses these heavily.
## Backend
-Emit **text M1 assembly** for x86_64. Reuse the existing M1 macro assembler
-+ hex2 linker downstream (no change there). Single architecture only.
+Two options, to be decided once P1 codegen quality has been measured:
+
+1. **Emit text M1 assembly** for x86_64, single-arch. Simplest codegen;
+ tcc-boot only runs on amd64. Matches the original plan.
+2. **Emit P1** from the C compiler. The C compiler is written once in
+ portable Lisp and also *emits* portable asm, so tcc-boot lands on all
+ three arches for free (modulo tcc-boot's own arch support). Codegen gets
+ slightly harder — P1 is deliberately dumb, so C idioms like `x += y`
+ expand to multi-op P1 sequences — but we pay the ~2× code-size tax
+ already budgeted in `P1.md` rather than writing three backends.
+
+Option 2 is the natural endpoint of the P1 investment. Defer the decision
+until we have measured P1 codegen quality on a non-trivial program (P1.md
+stage 5).
## Estimated budget
| Component | Lines |
|---|---|
-| Lisp interpreter in M1 (reader, eval, GC, primitives, I/O, pmatch) | 4,000–6,000 M1 |
+| Lisp interpreter in P1 (reader, eval, GC, primitives, I/O, pmatch) | 4,000–6,000 P1 |
| C lexer + recursive-descent parser + CPP (in Lisp) | 2,000–3,000 |
| Type checker + IR (slimmed compile.scm + info.scm) | 2,000–3,000 |
-| x86_64 codegen + M1 emit | 800–1,200 |
-| **Total** | **~9,000–13,000 LOC** |
-
-vs. **~54,000 LOC** current = **~4–6× shrink**.
+| Codegen + asm emit (M1-amd64 or P1, see Backend) | 800–1,500 |
+| **Total auditable (this plan)** | **~9,000–13,000 LOC** |
+
+vs. **~54,000 LOC** current = **~4–6× shrink**, and the result is
+tri-arch instead of amd64-only. P1's own infrastructure (defs files,
+`hex2_word` extensions, generator) is audited once in `P1.md` and shared
+with any future seed-stage program.
+
+## Resolutions
+
+- **Narrow loads: zero-extend only.** P1 keeps `LB`/`LW` zero-extending;
+ no `LBS`/`LWS` added. Fixnums live in full 64-bit tagged cells, so
+ the interpreter never needs a sign-extended narrow load — byte/ASCII
+ access is unsigned, and arithmetic happens on 64-bit values already.
+- **Static code size: accept the 2× tax.** P1's destructive-expansion
+ rule on amd64 roughly doubles instruction count vs. hand-tuned amd64.
+ Matches P1's "deliberately dumb" contract (see `P1.md`). Interpreter
+ binary expected in low single-digit MB — irrelevant for a seed.
+- **Tail calls: codify `TAIL` in P1.** A new `TAIL %label` macro (see
+ `P1.md`, Control flow) expands to `LD lr, sp, 0; ADDI sp, sp, +16;
+ B %label` or the per-arch equivalent. The interpreter's `eval` is
+ written in the natural recursive style with tail-position calls
+ compiled through `TAIL`, so the P1 stack does not grow per Scheme
+ frame. As a side effect, Scheme-level tail calls are proper in the R5RS
+ sense for the interpreter's subset, without extra mechanism.
+- **Pool placement: per-function on all arches.** Each function emits its
+ constant pool at its epilogue, inside the aarch64 `LDR`-literal ±1 MiB
+ range. Labels are file-local; duplicated constants across functions
+ are accepted. Simple rule, no range-check logic in codegen.
+- **GC arena: static BSS.** The ~20 MB heap is reserved as a single BSS
+ region at link time. No `brk`/`mmap` at runtime, no arena-sizing flag.
+ Keeps the P1 program to a minimal syscall surface and makes the
+ interpreter image self-describing.
+- **Syscalls: five.** `read`, `write`, `open`, `close`, `exit`. Each
+ becomes one P1 `SYSCALL` op backed by a per-arch number table in the
+ P1 defs file. `read-file` loops `read` into a growable string until
+ EOF (no `stat`/`lseek`); `display`/`write`/`error` go through `write`
+ on fd 1/2; `error` finishes with `exit`. No signals, time, fork/exec,
+ or networking.
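
The `read-file` strategy in the Syscalls resolution (loop `read` into a growable buffer until EOF, never asking for the file size) can be sketched in Python for readers who want to see the control flow. This is only an illustration under stated assumptions: the function name `read_file`, the `chunk_size` parameter, and the use of a `bytearray` as the growable string are hypothetical and do not come from `PLAN.md` or `P1.md`. The real interpreter would express the same loop in P1.

```python
import os


def read_file(path: str, chunk_size: int = 4096) -> bytes:
    """Read a whole file using only open/read/close (no stat, no lseek).

    Hypothetical sketch: names and chunk size are assumptions, not part
    of the plan itself.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray()  # stands in for the interpreter's growable string
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # read returned zero bytes: end of file
                break
            buf += chunk  # grow the buffer by whatever arrived
        return bytes(buf)
    finally:
        os.close(fd)
```

Because the loop stops only when `read` returns zero bytes, short reads are handled without any extra logic, which is what lets the design drop `stat` and `lseek` from the syscall surface.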