boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

commit 9e1803cb83d77b9d2501e034d8be1631a25020b0
parent 799eba0226f49f5dc353cef2359a0f5311a4fc87
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Wed,  6 May 2026 10:53:19 -0700

PLAN.md for audit and polish

Diffstat:
R cc/cc.scm.md -> docs/CCSCM.md | 0
A docs/PLAN.md | 410 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 410 insertions(+), 0 deletions(-)

diff --git a/cc/cc.scm.md b/docs/CCSCM.md
diff --git a/docs/PLAN.md b/docs/PLAN.md
@@ -0,0 +1,410 @@
+# Cleanup & Audit Plan
+
+The mechanical goal is met: hex0-seed → … → tcc3 → musl → seed-kernel runs on
+{aarch64, amd64, riscv64} via two drivers (podman, seed). This plan covers
+the next pass: making the result auditable and uniform.
+
+## Decisions
+
+1. **Output layout**: `build/<arch>/<driver>/...`
+2. **Makefile vs bootN scripts**: replace the parallel Makefile recipes
+   (`make m1pp`, `make hex2pp`, `make scheme1`, etc.) with rules that drive
+   the `bootN.sh` scripts. One pipeline, one layout.
+3. **Reproducibility criterion**: every `boot{N}/` output artifact must be
+   byte-identical between `DRIVER=podman` and `DRIVER=seed`, per arch.
+4. **Source preparation is a separate up-front host stage.** All flattening,
+   patching, unpacking, calibration happens once into a canonical generated
+   source tree. Boot stages copy/reference from it; they do no source prep
+   themselves.
+5. **Path-based Makefile deps.** `make build/aarch64/seed/boot6/Image`
+   walks the dependency chain — no separate `boot0`/`boot1`/… phony
+   targets driving stages. Outputs are the targets.
+6. **Tests live in their own Makefile** (`tests/Makefile`), invoked from
+   the top level but not commingled with the build pipeline.
+7. **Consistency over backcompat.** Patch, refactor, rename, rewrite as
+   needed. tcc patches are in scope when they buy uniformity.
+8. **Phase order**: A6 → A3 → A0 → A4 → AT → AX → A2 → A5. Each phase
+   lands on a clean base from the previous one. AT (tcc patches) and AX
+   (tests Makefile) are largely independent of the others and can slot in
+   wherever convenient.
+
+## Phases
+
+### A6. Hoist driver/arch boilerplate
+
+**Goal.** One source of truth for arch→platform mapping, driver dispatch,
+prereq checks, and log prefixes. Stage-N scripts shrink to the parts that
+are actually stage-specific.
+
+**Touch list.**
+- New: `scripts/lib-arch.sh`. Exports `PLATFORM`, `KERNEL_NAME`,
+  `KERNEL_IMAGE`, `MUSL_ARCH` from `$ARCH`. Single source.
+- `scripts/lib-pipeline.sh`, `scripts/lib-runscm.sh`: source `lib-arch.sh`
+  in `_init_*`; pull `case "$DRIVER"` validation + image build / kernel
+  check up here.
+- `scripts/boot{0..6}.sh`: delete duplicated `case $ARCH` and
+  `case $DRIVER` blocks; delete prereq `[ -x "$BOOT(N-1)/x" ] || die`
+  blocks; replace with `require_prev hex2 catm M0` style helper.
+- All scripts: standardize log prefix to `[bootN/$driver/$arch]`.
+- `scripts/boot.sh`: same.
+
+**Validation.** `make test` (all suites, default arch) +
+`scripts/boot.sh aarch64` end-to-end on both drivers. Output bytes
+unchanged.
+
+---
+
+### A3. Per-driver output trees
+
+**Goal.** `build/<arch>/<driver>/...` everywhere. Kill the
+`build/.seed-bootstrap/` shuffle in `boot.sh`. Two drivers can coexist on
+disk; nothing gets clobbered when you switch.
+
+**Touch list.**
+- `scripts/lib-pipeline.sh`, `scripts/lib-runscm.sh`: change `OUT`/`STAGE`
+  derivation to insert `$DRIVER`.
+- `scripts/boot{0..6}.sh`: every `BOOT(N-1)=build/$ARCH/boot(N-1)` →
+  `build/$ARCH/$DRIVER/boot(N-1)`.
+- `scripts/boot.sh`: drop the `.seed-bootstrap` stash/restore. The boot6
+  kernel that the seed driver uses now lives at
+  `build/<arch>/podman/boot6/{Image|kernel.elf}` and is referenced
+  directly when running seed.
+- `seed-kernel/run.sh`: kernel-path arg / env still works; just point
+  callers at the new location.
+- `Makefile`: every `OUT_DIR := build/$(ARCH)` → include `$(DRIVER)`.
+- `.gitignore` if it pins `build/<arch>/`.
+- Top-level `Makefile` comment block (output-layout doc).
+
+**Validation.** Both `DRIVER=podman scripts/boot.sh aarch64` and
+`DRIVER=seed scripts/boot.sh aarch64` produce coherent trees side by side.
+
+---
+
+### A0. Canonical generated source tree (host source-prep)
+
+**Goal.** All host-side source preparation happens once, up front, into a
+single canonical tree at `build/<arch>/src/`. This tree is the audit
+basis and the only thing boot stages read for source. Boot stages do no
+flattening, no unpacking, no patching, no calibration.
+
+**Layout.**
+```
+build/<arch>/src/
+  bin/                   # binary inputs that aren't built by a stage
+    hex0-seed            # (vendored seed only)
+  src/                   # everything textual
+    vendor-seed/         # ELF.hex2, *.hex0, *.hex1, *.hex2 (vendored)
+    M1pp/                # M1pp.P1
+    hex2pp/              # hex2pp.P1
+    P1/                  # P1.M1pp, P1-<arch>.M1, P1-<arch>.M1pp,
+                         # P1pp.P1pp, entry-*.P1pp, elf-end.P1pp
+    catm/                # catm.P1pp
+    scheme1/             # scheme1.P1pp, prelude.scm
+    cc/                  # cc.scm, main.scm
+    tcc/                 # tcc.flat.c, stdarg-bridge.h,
+                         # tcc-0.9.26-…/{include,lib}/ tree
+    libc/                # libc.flat.c (mes-libc flattened)
+    musl/                # filtered musl-1.2.5 tree (overrides
+                         # merged, deletes applied), per-arch
+                         # skip-list, generated alltypes.h /
+                         # syscall.h
+    kernel/              # seed-kernel kernel.c, arch/<arch>/*,
+                         # user/*
+    test-fixtures/       # only the slice tests need at runtime
+                         # (host runner reads tests/ directly)
+```
+
+The `bin/` vs `src/` split is established here; downstream stages
+inherit it (their `$STAGE/in/{bin,src}/` is a copy-or-symlink view of
+the right slice of this tree).
+
+**What moves into A0.**
+- `scripts/stage1-flatten.sh` (tcc flatten) — runs once, output to
+  `build/<arch>/src/src/tcc/`.
+- `scripts/libc-flatten.sh` (mes-libc flatten) — runs once, output to
+  `build/<arch>/src/src/libc/`.
+- `scripts/musl-vendor.sh` (musl unpack + overrides + deletes) — runs
+  once, output to `build/<arch>/src/src/musl/`.
+- `scripts/boot5-calibrate.sh` (musl skip-list per arch) — runs once,
+  output as part of `src/musl/skip.txt`. If skip-list is committed
+  upstream (`vendor/upstream/musl-1.2.5-skip-<arch>.txt`), copy it; else
+  run calibration (which itself depends on boot4 — see below).
+- All the per-stage `.stage/in/` materialization moves out of
+  `lib-pipeline.sh` / `lib-runscm.sh` and becomes a thin "copy from
+  `build/<arch>/src/`" step.
+
+**Calibration ordering.** `boot5-calibrate.sh` needs a working tcc3, so
+the canonical src tree is built in two passes: **A0a** (everything that
+doesn't need a compiler) before boot0, **A0b** (musl filter using the
+calibration result) after boot4. Both write into `build/<arch>/src/`.
+Document the split clearly; the Makefile encodes it via deps.
+
+**Touch list.**
+- New: `scripts/prep-src.sh`. Orchestrates A0a (vendored copy, tcc flatten,
+  libc flatten, musl unpack+merge).
+- New: `scripts/prep-musl.sh`. The A0b half (run/copy calibration, apply
+  filter, snapshot result into `src/musl/`).
+- `scripts/boot{3,4,5,6}.sh`: delete the auto-invoke flatten/calibrate
+  blocks; rely on `build/<arch>/src/` being present.
+- `scripts/lib-pipeline.sh`, `scripts/lib-runscm.sh`: add a
+  `stage_input_from_src <subpath> <bin|src>` helper that pulls from the
+  canonical tree.
+- `scripts/boot{0..6}.sh`: every `pipeline_input` / `runscm_input` call
+  switches to the new helper. Inputs become declarative.
+- `scripts/boot{4,5,6}-gen-runscm.sh`: paths emitted into `run.scm` use
+  the canonical layout.
+- Retire `WORK_SUBPATH` env in `scripts/boot-build-p1pp.sh` and similar
+  ad-hoc input plumbing.
+
+**Validation.** `tree build/<arch>/src/` is reviewable in one sitting.
+Re-running any boot stage twice produces byte-identical outputs. The
+flatten/musl scripts are no longer triggered from within a boot stage.
+
+---
+
+### A4. Makefile drives `bootN.sh` with path-based deps
+
+**Goal.** Outputs are targets. `make build/aarch64/seed/boot6/Image`
+walks the chain: prep-src → boot0 → boot1 → … → boot6.
+
+**Rule shape.**
+```
+build/$(ARCH)/$(DRIVER)/boot1/M1pp \
+build/$(ARCH)/$(DRIVER)/boot1/hex2pp: \
+    build/$(ARCH)/$(DRIVER)/boot0/hex2 \
+    build/$(ARCH)/$(DRIVER)/boot0/M0 \
+    build/$(ARCH)/$(DRIVER)/boot0/catm \
+    build/$(ARCH)/src/.stamp \
+    scripts/boot1.sh scripts/lib-pipeline.sh scripts/lib-arch.sh
+	scripts/boot1.sh $(ARCH)
+```
+
+The `.stamp` files in `build/<arch>/src/` and each `build/<arch>/<driver>/boot{N}/`
+are the make-rule pegs. Real outputs are listed as additional targets so
+single-binary builds work.
+
+**Touch list.**
+- `Makefile`: rewrite the body. Per-stage rule blocks: src-prep, boot0,
+  boot1, …, boot6. Each lists its real outputs and depends on the
+  previous stage's outputs plus the relevant scripts and library files.
+- `Makefile`: drop `m1pp`/`hex2pp`/`scheme1`/`cc`/`tcc-boot2`/`tcc-tcc`/
+  `tcc-tcc-tcc` aliases entirely. Path is the name.
+- `Makefile`: drop dead vars (`TOOLS_CATM`); single tcc-version var.
+- `Makefile`: hoist `cloc` (~14 lines) into `scripts/count-lines.sh`.
+- `Makefile`: include `tests/Makefile` for test targets (see AX).
+- `Makefile`: top-level convenience targets accept `DRIVER` and `ARCH`,
+  expand to the right path. `make all` builds everything.
+- `BOOT6_TIMEOUT` env in `boot6.sh` (parity with boot3/4/5).
+- Retire `boot-build-p1.sh` and `boot-build-p1pp.sh` if the boot-script
+  path subsumes them (it should — they're container-side helpers for
+  the old Makefile recipes).
+
+**Validation.** `make build/aarch64/podman/boot6/Image` from a clean
+tree builds the full chain. `make build/aarch64/podman/boot1/M1pp` only
+builds prep-src + boot0 + boot1. `touch scripts/boot4.sh && make
+build/aarch64/podman/boot6/Image` rebuilds boot4–6 only. Output bytes
+unchanged from pre-A4.
+
+---
+
+### AT. tcc patches for uniformity
+
+**Goal.** Eliminate per-arch workarounds that exist only because tcc
+0.9.26 is incomplete. After AT, every arch is on the same footing in
+seed-kernel and the build pipeline.
+
+**In scope.**
+- **amd64 `.quad` truncation in `gen_le64`.** Currently `arch/amd64/mmu.c`
+  encodes GDT entries as `.long lo, hi` pairs. Patch tcc's amd64
+  assembler to accept full 64-bit immediates; revert the workaround.
+  (Existing memory note: `project_tcc_arm64_svcul_truncation.md`
+  describes a related arm64 issue — track separately or in the same
+  pass.)
+- **`.note.*` SHT_NOTE.** tcc emits `.note.*` sections as PROGBITS,
+  forcing the post-link `seed-kernel/scripts/elf-pvh-note.c` tool to
+  rewrite the ELF for amd64 PVH boot. Patch tcc to recognize `.note.*`
+  → SHT_NOTE and emit a PT_NOTE phdr. Delete `elf-pvh-note.c` and the
+  amd64 boot6 fixup.
+- **riscv64 inline-asm constraints / register-asm silent drop.** Existing
+  memory `project_tcc_inline_asm_silent_drop.md` documents tcc 0.9.26
+  silently dropping register-asm constraints. If this affects any
+  riscv64 codepath in the kernel or musl build, fix it here.
+- **Audit `seed-kernel/simple-patches/` (if any) and per-arch `arch.h`
+  externs vs inlines.** amd64 and riscv64 declare arch helpers as
+  externs (workaround for tcc inline-asm gaps) while aarch64 inlines
+  them. Once the asm-support patches above land, normalize to the
+  aarch64 style.
+
+**Out of scope (still tracked separately).**
+- riscv64 boot0 stage-4 user-trap bug (`docs/SEED-RISCV64-TODO.md`) —
+  may or may not be tcc; investigate after AT.
+- amd64 boot3+ validation under DRIVER=seed
+  (`docs/SEED-AMD64-TODO.md`) — compute-bound, not a tcc issue.
+
+**Touch list.**
+- `vendor/tcc/…` (or wherever the tcc tree lives): the patches.
+- `scripts/stage1-flatten.sh`: re-flatten with the new patches; commit
+  the resulting `tcc.flat.c` if it's vendored, otherwise just regenerate.
+- `seed-kernel/arch/amd64/mmu.c`: revert `.quad` workarounds.
+- `seed-kernel/arch/amd64/kernel.S`: reinstate native `.note.*` if the
+  workaround required emitting them differently.
+- `seed-kernel/arch/{amd64,riscv64}/arch.h`: inline what was extern.
+- `seed-kernel/arch/{amd64,riscv64}/{kernel.S,mmu.c}`: drop quirks.
+- `seed-kernel/scripts/elf-pvh-note.c`: delete.
+- `scripts/boot6.sh`, `scripts/boot6-gen-runscm.sh`: delete the
+  amd64-only post-link fixup block.
+- `docs/TCC.md`: document the new tcc patch set; retire any
+  workaround-explanation sections that are now moot.
+- Update memory entries `project_tcc_*` as patches eliminate the
+  recorded bugs.
+
+**Validation.** seed-kernel builds on all three arches with no
+arch-conditional shell logic in boot6. `make build/<arch>/podman/boot6/Image`
+produces a clean ELF without post-link fixup. All test suites still pass.
+
+---
+
+### AX. tests/ Makefile
+
+**Goal.** Tests are a separate concern with their own Makefile. Top-level
+build is decoupled from test dispatch.
+
+**Touch list.**
+- New: `tests/Makefile`. Targets per suite (`test-cc`, `test-cc-libc`,
+  `test-cc-cg`, `test-cc-lex`, `test-cc-pp`, `test-cc-util`,
+  `test-M1pp`, `test-p1`, `test-scheme1`, `test-cc-ext`, `test-tcc-cc`,
+  `test-tcc-libc`) plus aggregate `test`.
+- New: `tests/lib-runner.sh` (or fold into `scripts/lib-test.sh`).
+  Helpers: `discover_fixtures`, `run_diff_text`, `run_diff_bytes`,
+  `report_pass_fail`. Refactor `scripts/boot-run-tests.sh` (818 lines)
+  to call these.
+- Top-level `Makefile`: `include tests/Makefile`. Drop the test-dispatch
+  branch (lines 680–732 currently). `make test` proxies to the tests
+  Makefile.
+- Collapse `scripts/seed-accept{,-boot34,-boot5}.sh` into one
+  parameterized script under `tests/` (or `scripts/`, but called from
+  `tests/Makefile`).
+- Test-naming consistency: pick one of `NNN-`, `NN-`, or unprefixed and
+  apply across `tests/cc`, `tests/cc-libc`, `tests/M1pp`, `tests/p1`.
+  Rename freely.
+- Golden-file convention: pick one of `.expected` (stdout) /
+  `.expected-exit` (exit code) / `.expected-bytes` (binary). Rename
+  freely.
+- New: `tests/README.md` documenting the per-suite contract.
+
+**Validation.** `make test` from repo root green on default arch.
+`make -C tests test-cc` works in isolation. `tests/Makefile` is
+self-contained.
+
+---
+
+### A2. Audit manifest + hashes
+
+**Goal.** A single command produces `manifest.txt` + `sha256.txt`
+covering the canonical source tree and every boot stage's outputs, with
+provenance.
+
+**Notes.**
+- The canonical source tree from A0 is the audit basis. A2 adds the
+  manifest and the hashes; the bytes are already there.
+- For host-flattened tcc/libc: snapshot the flattened source (per
+  decision), record the flatten script's SHA in the manifest line so
+  the recipe is identified.
+- One audit per `<arch>/<driver>` pair. Compare-drivers (A5) diffs two
+  manifests.
+
+**Touch list.**
+- New: `scripts/build-audit.sh`. Walks `build/<arch>/src/` and
+  `build/<arch>/<driver>/boot{0..6}/`; emits
+  `build/<arch>/<driver>/audit/manifest.txt` (one line per file:
+  `<path> <stage> <role> <origin> <sha256>`) and
+  `build/<arch>/<driver>/audit/sha256.txt` (flat hash list).
+- `Makefile`: `audit` target depending on prep-src + boot0..boot6 for
+  the active driver.
+
+**Validation.** `sha256sum -c build/<arch>/<driver>/audit/sha256.txt`
+clean. Line counts of `manifest.txt` and `find` agree.
+
+---
+
+### A5. `make compare-drivers` reproducibility target
+
+**Goal.** Mechanical proof: every `boot{N}/` output artifact is
+byte-identical between `DRIVER=podman` and `DRIVER=seed`.
+
+**Recipe.**
+1. `DRIVER=podman make build/$arch/podman/boot6/Image`
+2. `DRIVER=seed make build/$arch/seed/boot6/Image`
+   (the seed driver consumes the podman-built kernel for its QEMU run)
+3. `diff -r build/$arch/podman/boot{0..6}/ build/$arch/seed/boot{0..6}/`
+4. `diff build/$arch/{podman,seed}/audit/manifest.txt` (provenance lines
+   may differ; hashes must match for shared artifacts).
+
+**Touch list.**
+- New: `scripts/compare-drivers.sh`.
+- `Makefile`: `compare-drivers` target wrapping it.
+
+**Pre-step: determinism audit.** Find and eliminate non-determinism
+sources before declaring success. Likely suspects:
+- ELF timestamps (any `__DATE__`/`__TIME__` macros in tcc/musl/kernel)
+- cpio mtimes when packing initramfs (lib-pipeline.sh, lib-runscm.sh,
+  seed-kernel build) — pin to epoch 0 or fixed value
+- File-iteration order in shell glob enumeration (boot5 musl source
+  walk is the main risk; `LC_ALL=C sort` everything)
+- Any `find -printf` ordering, tar mtimes, ar mtimes
+
+**Validation.** `make compare-drivers ARCH=aarch64` exits 0. Per arch,
+report the highest stage that converges (amd64/riscv64 may be partial
+until the live blockers in `docs/SEED-{AMD64,RISCV64}-TODO.md` are
+cleared).
+
+---
+
+## B. Polish (independent, do anytime)
+
+### Docs
+- `README.md`: arch matrix; DRIVER explanation; pointer to
+  `scripts/boot.sh` and `make build/<arch>/<driver>/boot6/Image`;
+  test-suite section.
+- `docs/SEED-AMD64-TODO.md`: prune items completed by 799eba0; keep
+  amd64 boot3+ validation, fixed-point check, run-tests under seed.
+- `docs/SEED-RISCV64-TODO.md`: prune items completed by e4bfcde; the
+  boot0 stage-4 user-trap blocker stays.
+- `docs/SEED-VIRTIO-BLK.md`: banner marking it as design notes, not
+  pipeline state.
+- Audit `docs/OS.md`, `docs/LIBC.md`, `docs/LIBC.txt`,
+  `docs/TCC-ARM64-ASM.md` for staleness vs current code.
+- `cc/cc.scm.md` → `docs/CC-IMPL.md` (or fold into `docs/CC.md`).
+- `docs/TCC.md`: rewrite around the AT patch set.
+
+### kernel.c
+- DTB / cpio / hvm_start_info magics → named constants with one-line
+  comments.
+- amd64 PVH `hvm_start_info` path: inline the one-sentence ABI note
+  from SEED-AMD64-TODO.
+- `sys_unlinkat` flags param: two-line guard or a comment.
+
+### bootN.sh polish (mostly folded into A6, A0, A4)
+- `boot4.sh` `TCC_BOOTSTRAP_RELAX_FIXEDPOINT`: surface in
+  `boot.sh --help`.
+- Standardize argument convention: positional `$ARCH` only;
+  env-vars only for tunables (`DRIVER`, `BOOT*_TIMEOUT`, etc.).
+
+---
+
+## Validation gates
+
+After each phase:
+1. `make test` (all suites) green on the default arch.
+2. `scripts/boot.sh aarch64` end-to-end clean on both drivers (or the
+   equivalent `make build/aarch64/<driver>/boot6/Image` after A4).
+3. Diff `build/aarch64/<driver>/boot{0..6}/` trees against the previous
+   phase's outputs — phases A6, A3, A0, A4, AX must be no-ops for
+   output bytes. AT changes outputs (intentionally — workarounds gone).
+   A2 adds the audit tree. A5 is the proof.
+
+After all phases:
+- `make compare-drivers ARCH=aarch64` exits 0.
+- `make compare-drivers ARCH=amd64` and `ARCH=riscv64` report progress
+  up to the current seed-driver completion line.
+- `tree build/<arch>/src/` is the audit basis: one tree, one read.
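
The byte-identity check that phases A2 and A5 describe can be tried by hand before `scripts/build-audit.sh` and `scripts/compare-drivers.sh` exist. The following is a minimal shell sketch under stated assumptions, not the repository's implementation: the function names `hash_tree` and `compare_trees` are hypothetical, a GNU-style `sha256sum` is assumed on the host, and file names are assumed to contain no newlines. Paths go through `LC_ALL=C sort`, in line with the plan's determinism note, so the result does not depend on directory iteration order or locale.

```shell
# Hypothetical helpers for illustration; these names do not come from the repo.

# hash_tree DIR
# Print one "<sha256>  ./relative/path" line per regular file under DIR,
# in bytewise-sorted path order. Fails if DIR cannot be entered.
hash_tree() {
    (
        cd "$1" || exit 1
        find . -type f | LC_ALL=C sort | while IFS= read -r f; do
            sha256sum "$f"
        done
    )
}

# compare_trees A B
# Return 0 if both trees hold the same relative paths with the same
# contents, 1 if they differ, 2 if either tree cannot be read.
compare_trees() {
    a=$(hash_tree "$1") || return 2
    b=$(hash_tree "$2") || return 2
    [ "$a" = "$b" ]
}
```

A possible use from the repo root, once both drivers have built a stage, is `compare_trees build/aarch64/podman/boot0 build/aarch64/seed/boot0 && echo identical`. Because the listed paths are relative to the tree root, saved `hash_tree` output can be verified with `sha256sum -c` only when run from inside that tree.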