Rewrite PLAN.md to layer the Lisp bootstrap on P1 - boot2

commit 3e6ce5ada262294ca065705bdf65003ebf823fa1
parent b96f45aff0a222fd0290bd77309c8a4c812307e4
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 20 Apr 2026 13:18:55 -0700

Rewrite PLAN.md to layer the Lisp bootstrap on P1

Previously the plan wrote the Lisp interpreter directly in amd64 M1 asm,
giving a single-arch result and paying the per-arch encoding tax inside
the interpreter. With P1 now implemented (spike across three arches),
the interpreter is authored once in P1 and assembled three ways;
porting to a fourth arch means a new defs file, not a rewrite.

Changes:
- Reframe the chain as M1 -> P1 -> Lisp -> C compiler -> tcc-boot.
- Add "Why P1 as the host" section noting dependency on P1.md stages
  1-4 and that stage 5 is this plan's kickoff.
- Backend now two options (emit M1-amd64 vs emit P1), decision deferred
  to after measuring P1 codegen quality.
- Budget table updated; total LOC unchanged (~9-13k) but result is
  tri-arch instead of amd64-only.
- Add Resolutions section capturing the design decisions surfaced
  while scoping P1: narrow-load zero-extend only, accept 2x code-size
  tax, codify TAIL in P1, per-function constant pools, static BSS GC
  arena, five-syscall surface (read/write/open/close/exit).

Diffstat:
M PLAN.md  | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------

1 file changed, 79 insertions(+), 13 deletions(-)
diff --git a/PLAN.md b/PLAN.md
@@ -1,10 +1,13 @@
-# Alternative bootstrap path: Lisp-in-M1 → C compiler in Lisp → tcc-boot
+# Alternative bootstrap path: Lisp-in-P1 → C compiler in Lisp → tcc-boot
 
 ## Goal
 
 Shrink the auditable LOC between M1 assembly and tcc-boot by replacing the
 current `M2-Planet → mes → MesCC → nyacc` stack with a small Lisp written
-directly in M1 asm and a C compiler written in that Lisp.
+once in the P1 portable pseudo-ISA (see `../P1.md`) and a C compiler written
+in that Lisp. P1 is the same layer described in `P1.md`: ~30 RISC-shaped ops
+whose per-arch `DEFINE` tables expand to amd64 / aarch64 / riscv64 encodings,
+so one Lisp source serves all three hosts.
 
 ## Current chain (validated counts)
 
@@ -22,13 +25,29 @@ directly in M1 asm and a C compiler written in that Lisp.
 ## Proposed chain
 
 ```
-M1 asm  →  Lisp interpreter (in M1 asm)  →  C compiler (in Lisp)  →  tcc-boot
+M1 asm  →  P1 pseudo-ISA  →  Lisp interpreter (in P1)  →  C compiler (in Lisp)  →  tcc-boot
 ```
 
-Two languages, one new interpreter, one new compiler. No M2-Planet, no Mes
-core, no MesCC, no nyacc.
+Two languages plus one portable asm layer, one new interpreter, one new
+compiler. No M2-Planet, no Mes core, no MesCC, no nyacc. The interpreter is
+authored once in P1 and assembled three ways; porting to a fourth arch means
+a new P1 defs file, not a rewrite.
 
-## Asm Lisp — feature floor
+## Why P1 as the host
+
+- **Single source of truth.** A Lisp in raw M1 asm would need three
+  hand-written variants (one per target arch). In P1, there is one source;
+  the per-arch cost is already paid inside the P1 defs files.
+- **Cost lives in P1, not here.** P1's one-time tax (~1500 defines × 3 arches
+  generator-driven, plus ~240 LOC of `hex2_word` + `M1-macro` aarch64 work)
+  is accounted in `P1.md`. This plan inherits that layer rather than
+  duplicating it.
+- **Dependency ordering.** PLAN cannot start the Lisp interpreter until P1
+  stages 1–4 in `P1.md` are complete (spike on all three arches plus the
+  full ~30-op matrix). P1 stage 5 ("seed Lisp interpreter in ~500 lines of
+  P1") is effectively this plan's kickoff.
+
+## Lisp — feature floor
 
 Justification: empirical audit of MesCC's actual Scheme usage. MesCC barely
 exercises Scheme.
@@ -99,17 +118,64 @@ uses these heavily.
 
 ## Backend
 
-Emit **text M1 assembly** for x86_64. Reuse the existing M1 macro assembler
-+ hex2 linker downstream (no change there). Single architecture only.
+Two options, to be decided after the P1 spike:
+
+1. **Emit text M1 assembly** for x86_64, single-arch. Simplest codegen;
+   tcc-boot only runs on amd64. Matches the original plan.
+2. **Emit P1** from the C compiler. The C compiler is written once in
+   portable Lisp and also *emits* portable asm, so tcc-boot lands on all
+   three arches for free (modulo tcc-boot's own arch support). Codegen gets
+   slightly harder — P1 is deliberately dumb, so C idioms like `x += y`
+   expand to multi-op P1 sequences — but we pay the ~2× code-size tax
+   already budgeted in `P1.md` rather than writing three backends.
+
+Option 2 is the natural endpoint of the P1 investment. Defer the decision
+until we have measured P1 codegen quality on a non-trivial program (P1.md
+stage 5).
 
 ## Estimated budget
 
 | Component | Lines |
 |---|---|
-| Lisp interpreter in M1 (reader, eval, GC, primitives, I/O, pmatch) | 4,000–6,000 M1 |
+| Lisp interpreter in P1 (reader, eval, GC, primitives, I/O, pmatch) | 4,000–6,000 P1 |
 | C lexer + recursive-descent parser + CPP (in Lisp) | 2,000–3,000 |
 | Type checker + IR (slimmed compile.scm + info.scm) | 2,000–3,000 |
-| x86_64 codegen + M1 emit | 800–1,200 |
-| **Total** | **~9,000–13,000 LOC** |
-
-vs. **~54,000 LOC** current = **~4–6× shrink**.
+| Codegen + asm emit (M1-amd64 or P1, see Backend) | 800–1,500 |
+| **Total auditable (this plan)** | **~9,000–13,000 LOC** |
+
+vs. **~54,000 LOC** current = **~4–6× shrink**, and the result is
+tri-arch instead of amd64-only. P1's own infrastructure (defs files,
+`hex2_word` extensions, generator) is audited once in `P1.md` and shared
+with any future seed-stage program.
+
+## Resolutions
+
+- **Narrow loads: zero-extend only.** P1 keeps `LB`/`LW` zero-extending;
+  no `LBS`/`LWS` added. Fixnums live in full 64-bit tagged cells, so
+  the interpreter never needs a sign-extended narrow load — byte/ASCII
+  access is unsigned, and arithmetic happens on 64-bit values already.
+- **Static code size: accept the 2× tax.** P1's destructive-expansion
+  rule on amd64 roughly doubles instruction count vs. hand-tuned amd64.
+  Matches P1's "deliberately dumb" contract (see `P1.md`). Interpreter
+  binary expected in low single-digit MB — irrelevant for a seed.
+- **Tail calls: codify `TAIL` in P1.** A new `TAIL %label` macro (see
+  `P1.md`, Control flow) expands to `LD lr, sp, 0; ADDI sp, sp, +16;
+  B %label` or the per-arch equivalent. The interpreter's `eval` is
+  written in the natural recursive style with tail-position calls
+  compiled through `TAIL`, so the P1 stack does not grow per Scheme
+  frame. As a side effect, Scheme-level tail calls fall out R5RS-proper
+  for the interpreter's subset without extra mechanism.
+- **Pool placement: per-function on all arches.** Each function emits its
+  constant pool at its epilogue, inside the aarch64 `LDR`-literal ±1 MiB
+  range. Labels are file-local; duplicated constants across functions
+  are accepted. Simple rule, no range-check logic in codegen.
+- **GC arena: static BSS.** The ~20 MB heap is reserved as a single BSS
+  region at link time. No `brk`/`mmap` at runtime, no arena-sizing flag.
+  Keeps the P1 program to a minimal syscall surface and makes the
+  interpreter image self-describing.
+- **Syscalls: five.** `read`, `write`, `open`, `close`, `exit`. Each
+  becomes one P1 `SYSCALL` op backed by a per-arch number table in the
+  P1 defs file. `read-file` loops `read` into a growable string until
+  EOF (no `stat`/`lseek`); `display`/`write`/`error` go through `write`
+  on fd 1/2; `error` finishes with `exit`. No signals, time, fork/exec,
+  or networking.
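
Reviewer sketch, not part of the commit: the last two resolutions (static BSS arena, five-syscall surface) can be illustrated together in C. This is only a guess at the shape of `read-file`, assuming the "growable string" is whatever sits at the top of a bump-allocated arena. The names `read_file`, `arena`, `arena_top`, and `ARENA_SIZE` are hypothetical and do not come from `PLAN.md` or `P1.md`; the real interpreter would be written in P1, not C.

```c
/* Sketch only: every identifier here is hypothetical, chosen for illustration. */
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define ARENA_SIZE (20u * 1024u * 1024u)  /* the ~20 MB figure from the plan */

static unsigned char arena[ARENA_SIZE];   /* zero-initialized, so it lands in BSS */
static size_t arena_top;                  /* bump pointer: index of next free byte */

/* Read a whole file into the arena using only open/read/close
 * (no stat, no lseek, no brk/mmap). Returns a pointer to the bytes and
 * stores the length in *len. Returns NULL on an open/read error, or when
 * the arena fills before EOF is observed; the bump pointer is rolled back
 * in both failure cases. */
static unsigned char *read_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    size_t start = arena_top;
    for (;;) {
        size_t room = ARENA_SIZE - arena_top;
        if (room == 0) {            /* arena exhausted before EOF */
            close(fd);
            arena_top = start;
            return NULL;
        }
        ssize_t n = read(fd, arena + arena_top, room);
        if (n < 0) {                /* read error */
            close(fd);
            arena_top = start;
            return NULL;
        }
        if (n == 0)                 /* EOF: the string is complete */
            break;
        arena_top += (size_t)n;     /* "grow" the string in place */
    }
    close(fd);
    *len = arena_top - start;
    return arena + start;
}
```

Because the string being built is always the newest allocation, growing it is just advancing the bump pointer, which is one way a fixed arena can stand in for a growable buffer without `realloc`.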
