boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

commit 3e6ce5ada262294ca065705bdf65003ebf823fa1
parent b96f45aff0a222fd0290bd77309c8a4c812307e4
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 20 Apr 2026 13:18:55 -0700

Rewrite PLAN.md to layer the Lisp bootstrap on P1

Previously the plan wrote the Lisp interpreter directly in amd64 M1 asm,
giving a single-arch result and paying the per-arch encoding tax inside
the interpreter. With P1 now implemented (spike across three arches),
the interpreter is authored once in P1 and assembled three ways;
porting to a fourth arch means a new defs file, not a rewrite.

Changes:
- Reframe the chain as M1 -> P1 -> Lisp -> C compiler -> tcc-boot.
- Add "Why P1 as the host" section noting dependency on P1.md stages
  1-4 and that stage 5 is this plan's kickoff.
- Backend now two options (emit M1-amd64 vs emit P1), decision deferred
  to after measuring P1 codegen quality.
- Budget table updated; total LOC unchanged (~9-13k) but result is
  tri-arch instead of amd64-only.
- Add Resolutions section capturing the design decisions surfaced
  while scoping P1: narrow-load zero-extend only, accept 2x code-size
  tax, codify TAIL in P1, per-function constant pools, static BSS GC
  arena, five-syscall surface (read/write/open/close/exit).

Diffstat:
M PLAN.md | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 79 insertions(+), 13 deletions(-)

diff --git a/PLAN.md b/PLAN.md
@@ -1,10 +1,13 @@
-# Alternative bootstrap path: Lisp-in-M1 → C compiler in Lisp → tcc-boot
+# Alternative bootstrap path: Lisp-in-P1 → C compiler in Lisp → tcc-boot
 
 ## Goal
 
 Shrink the auditable LOC between M1 assembly and tcc-boot by replacing the
 current `M2-Planet → mes → MesCC → nyacc` stack with a small Lisp written
-directly in M1 asm and a C compiler written in that Lisp.
+once in the P1 portable pseudo-ISA (see `../P1.md`) and a C compiler written
+in that Lisp. P1 is the same layer described in `P1.md`: ~30 RISC-shaped ops
+whose per-arch `DEFINE` tables expand to amd64 / aarch64 / riscv64 encodings,
+so one Lisp source serves all three hosts.
 
 ## Current chain (validated counts)
 
@@ -22,13 +25,29 @@ directly in M1 asm and a C compiler written in that Lisp.
 ## Proposed chain
 
 ```
-M1 asm → Lisp interpreter (in M1 asm) → C compiler (in Lisp) → tcc-boot
+M1 asm → P1 pseudo-ISA → Lisp interpreter (in P1) → C compiler (in Lisp) → tcc-boot
 ```
 
-Two languages, one new interpreter, one new compiler. No M2-Planet, no Mes
-core, no MesCC, no nyacc.
+Two languages plus one portable asm layer, one new interpreter, one new
+compiler. No M2-Planet, no Mes core, no MesCC, no nyacc. The interpreter is
+authored once in P1 and assembled three ways; porting to a fourth arch means
+a new P1 defs file, not a rewrite.
 
-## Asm Lisp — feature floor
+## Why P1 as the host
+
+- **Single source of truth.** A Lisp in raw M1 asm would need three
+  hand-written variants (one per target arch). In P1, there is one source;
+  the per-arch cost is already paid inside the P1 defs files.
+- **Cost lives in P1, not here.** P1's one-time tax (~1500 defines × 3 arches
+  generator-driven, plus ~240 LOC of `hex2_word` + `M1-macro` aarch64 work)
+  is accounted in `P1.md`. This plan inherits that layer rather than
+  duplicating it.
+- **Dependency ordering.** PLAN cannot start the Lisp interpreter until P1
+  stages 1–4 in `P1.md` are complete (spike on all three arches plus the
+  full ~30-op matrix). P1 stage 5 ("seed Lisp interpreter in ~500 lines of
+  P1") is effectively this plan's kickoff.
+
+## Lisp — feature floor
 
 Justification: empirical audit of MesCC's actual Scheme usage. MesCC barely
 exercises Scheme.
@@ -99,17 +118,64 @@ uses these heavily.
 
 ## Backend
 
-Emit **text M1 assembly** for x86_64. Reuse the existing M1 macro assembler
-+ hex2 linker downstream (no change there). Single architecture only.
+Two options, to be decided after the P1 spike:
+
+1. **Emit text M1 assembly** for x86_64, single-arch. Simplest codegen;
+   tcc-boot only runs on amd64. Matches the original plan.
+2. **Emit P1** from the C compiler. The C compiler is written once in
+   portable Lisp and also *emits* portable asm, so tcc-boot lands on all
+   three arches for free (modulo tcc-boot's own arch support). Codegen gets
+   slightly harder — P1 is deliberately dumb, so C idioms like `x += y`
+   expand to multi-op P1 sequences — but we pay the ~2× code-size tax
+   already budgeted in `P1.md` rather than writing three backends.
+
+Option 2 is the natural endpoint of the P1 investment. Defer the decision
+until we have measured P1 codegen quality on a non-trivial program (P1.md
+stage 5).
 
 ## Estimated budget
 
 | Component | Lines |
 |---|---|
-| Lisp interpreter in M1 (reader, eval, GC, primitives, I/O, pmatch) | 4,000–6,000 M1 |
+| Lisp interpreter in P1 (reader, eval, GC, primitives, I/O, pmatch) | 4,000–6,000 P1 |
 | C lexer + recursive-descent parser + CPP (in Lisp) | 2,000–3,000 |
 | Type checker + IR (slimmed compile.scm + info.scm) | 2,000–3,000 |
-| x86_64 codegen + M1 emit | 800–1,200 |
-| **Total** | **~9,000–13,000 LOC** |
-
-vs. **~54,000 LOC** current = **~4–6× shrink**.
+| Codegen + asm emit (M1-amd64 or P1, see Backend) | 800–1,500 |
+| **Total auditable (this plan)** | **~9,000–13,000 LOC** |
+
+vs. **~54,000 LOC** current = **~4–6× shrink**, and the result is
+tri-arch instead of amd64-only. P1's own infrastructure (defs files,
+`hex2_word` extensions, generator) is audited once in `P1.md` and shared
+with any future seed-stage program.
+
+## Resolutions
+
+- **Narrow loads: zero-extend only.** P1 keeps `LB`/`LW` zero-extending;
+  no `LBS`/`LWS` added. Fixnums live in full 64-bit tagged cells, so
+  the interpreter never needs a sign-extended narrow load — byte/ASCII
+  access is unsigned, and arithmetic happens on 64-bit values already.
+- **Static code size: accept the 2× tax.** P1's destructive-expansion
+  rule on amd64 roughly doubles instruction count vs. hand-tuned amd64.
+  Matches P1's "deliberately dumb" contract (see `P1.md`). Interpreter
+  binary expected in low single-digit MB — irrelevant for a seed.
+- **Tail calls: codify `TAIL` in P1.** A new `TAIL %label` macro (see
+  `P1.md`, Control flow) expands to `LD lr, sp, 0; ADDI sp, sp, +16;
+  B %label` or the per-arch equivalent. The interpreter's `eval` is
+  written in the natural recursive style with tail-position calls
+  compiled through `TAIL`, so the P1 stack does not grow per Scheme
+  frame. As a side effect, Scheme-level tail calls fall out R5RS-proper
+  for the interpreter's subset without extra mechanism.
+- **Pool placement: per-function on all arches.** Each function emits its
+  constant pool at its epilogue, inside the aarch64 `LDR`-literal ±1 MiB
+  range. Labels are file-local; duplicated constants across functions
+  are accepted. Simple rule, no range-check logic in codegen.
+- **GC arena: static BSS.** The ~20 MB heap is reserved as a single BSS
+  region at link time. No `brk`/`mmap` at runtime, no arena-sizing flag.
+  Keeps the P1 program to a minimal syscall surface and makes the
+  interpreter image self-describing.
+- **Syscalls: five.** `read`, `write`, `open`, `close`, `exit`. Each
+  becomes one P1 `SYSCALL` op backed by a per-arch number table in the
+  P1 defs file. `read-file` loops `read` into a growable string until
+  EOF (no `stat`/`lseek`); `display`/`write`/`error` go through `write`
+  on fd 1/2; `error` finishes with `exit`. No signals, time, fork/exec,
+  or networking.
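
The "GC arena: static BSS" and "Syscalls: five" resolutions in the diff above combine naturally, and a small C model may make the shape easier to review before it is written in P1. The following is only a sketch under assumptions: `ARENA_SIZE`, `arena`, `arena_top`, and `read_file` are hypothetical names, the exact arena size is not fixed by the plan beyond "~20 MB", and the real interpreter would issue P1 `SYSCALL` ops rather than libc wrappers. It shows a file being read to EOF by looping `read` into the tail of a statically reserved buffer, with no `stat`, `lseek`, `brk`, or `mmap`.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical size: the plan says "~20 MB" without fixing a number. */
#define ARENA_SIZE (20u * 1024u * 1024u)

/* Zero-initialized static storage is placed in BSS, so the region is
 * reserved at link time and no brk/mmap call is needed at runtime. */
static unsigned char arena[ARENA_SIZE];
static size_t arena_top; /* bump pointer: index of the next free byte */

/* Read a whole file into the tail of the arena using only open, read and
 * close. The "growable string" is modeled by advancing the bump pointer
 * after each read. Returns the byte count and stores the start address in
 * *out; returns -1 on any error or if the arena fills before EOF is seen. */
static long read_file(const char *path, unsigned char **out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t start = arena_top;
    for (;;) {
        size_t room = ARENA_SIZE - arena_top;
        if (room == 0) {          /* arena exhausted: undo and fail */
            close(fd);
            arena_top = start;
            return -1;
        }
        ssize_t n = read(fd, arena + arena_top, room);
        if (n < 0) {              /* read error: undo and fail */
            close(fd);
            arena_top = start;
            return -1;
        }
        if (n == 0)               /* EOF: no stat/lseek needed to find it */
            break;
        arena_top += (size_t)n;
    }
    close(fd);

    assert(arena_top <= ARENA_SIZE);
    *out = arena + start;
    return (long)(arena_top - start);
}
```

A caller would use it as `unsigned char *src; long n = read_file("input.scm", &src);` (the file name is illustrative) and treat a negative `n` as the point where `error` writes to fd 2 and calls `exit`. One design consequence worth noting: because the string lives at the top of the bump region, growing it costs nothing beyond moving `arena_top`, which is presumably part of why a static arena keeps the syscall surface at five.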