Cleanup & Audit Plan
The mechanical goal is met: hex0-seed → … → tcc3 → musl → seed-kernel runs on {aarch64, amd64, riscv64} via two drivers (podman, seed). This plan covers the next pass: making the result auditable and uniform.
Decisions
- Output layout: build/<arch>/<driver>/...
- Makefile vs bootN scripts: replace the parallel Makefile recipes (make m1pp, make hex2pp, make scheme1, etc.) with rules that drive the bootN.sh scripts. One pipeline, one layout.
- Reproducibility criterion: every boot{N}/ output artifact must be byte-identical between DRIVER=podman and DRIVER=seed, per arch.
- Source preparation is a separate up-front host stage. All flattening, patching, unpacking, and calibration happen once, into a canonical generated source tree. Boot stages copy or reference from it; they do no source prep themselves.
- Path-based Makefile deps. make build/aarch64/seed/boot6/Image walks the dependency chain; there are no separate boot0/boot1/… phony targets driving stages. Outputs are the targets.
- Tests live in their own Makefile (tests/Makefile), invoked from the top level but not commingled with the build pipeline.
- Consistency over backcompat. Patch, refactor, rename, and rewrite as needed. tcc patches are in scope when they buy uniformity.
- Phase order: A6 → A3 → A0 → A4 → AT → AX → A2 → A5. Each phase lands on a clean base from the previous one. AT (tcc patches) and AX (tests Makefile) are largely independent of the others and can slot in wherever convenient.
Phases
[DONE] A6. Hoist driver/arch boilerplate
Goal. One source of truth for arch→platform mapping, driver dispatch, prereq checks, and log prefixes. Stage-N scripts shrink to the parts that are actually stage-specific.
Touch list.
- New: scripts/lib-arch.sh. Exports PLATFORM, KERNEL_NAME, KERNEL_IMAGE, MUSL_ARCH from $ARCH. Single source.
- scripts/lib-pipeline.sh, scripts/lib-runscm.sh: source lib-arch.sh in _init_*; hoist the case "$DRIVER" validation and the image build / kernel check into these libraries.
- scripts/boot{0..6}.sh: delete the duplicated case $ARCH and case $DRIVER blocks; delete the prereq [ -x "$BOOT(N-1)/x" ] || die blocks and replace them with a require_prev hex2 catm M0-style helper.
- All scripts: standardize the log prefix to [bootN/$driver/$arch].
- scripts/boot.sh: same.
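Here is a minimal sketch of what scripts/lib-arch.sh and the shared helpers could look like, assuming POSIX sh. The PLATFORM values are standard OCI platform strings, and MUSL_ARCH uses musl's own arch names (x86_64 for amd64). Three parts are assumptions:

- The Image vs kernel.elf split per arch is inferred from the {Image|kernel.elf} mention in A3.
- The riscv64 image name is a guess.
- The function names arch_init, driver_check, and log, and the BOOT_N and BOOT_PREV variables, are hypothetical.

KERNEL_NAME is left out because its values are not given here.

```shell
# scripts/lib-arch.sh (sketch): one place that maps ARCH to everything else.
arch_init() {  # arch_init <arch>
    ARCH=$1
    case "$ARCH" in
        aarch64) PLATFORM=linux/arm64;   MUSL_ARCH=aarch64; KERNEL_IMAGE=Image ;;
        amd64)   PLATFORM=linux/amd64;   MUSL_ARCH=x86_64;  KERNEL_IMAGE=kernel.elf ;;
        # riscv64 KERNEL_IMAGE is an assumption; use whatever boot6 emits.
        riscv64) PLATFORM=linux/riscv64; MUSL_ARCH=riscv64; KERNEL_IMAGE=kernel.elf ;;
        *) echo "unsupported ARCH: $ARCH" >&2; return 1 ;;
    esac
    export ARCH PLATFORM MUSL_ARCH KERNEL_IMAGE
}

# Shared DRIVER validation, hoisted out of every bootN.sh.
driver_check() {
    case "${DRIVER-}" in
        podman|seed) ;;
        *) echo "DRIVER must be podman or seed, got '${DRIVER-}'" >&2; return 1 ;;
    esac
}

# Standard log prefix [bootN/$driver/$arch]; BOOT_N is hypothetical
# (e.g. boot3.sh would set BOOT_N=3).
log() {
    printf '[boot%s/%s/%s] %s\n' "$BOOT_N" "$DRIVER" "$ARCH" "$*" >&2
}

# require_prev hex2 catm M0: replaces the per-script
# [ -x "$BOOT(N-1)/x" ] || die blocks. BOOT_PREV is the previous stage dir.
require_prev() {
    for tool in "$@"; do
        if [ ! -x "$BOOT_PREV/$tool" ]; then
            log "missing prerequisite $BOOT_PREV/$tool"
            return 1
        fi
    done
}
```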
Validation. scripts/boot.sh aarch64 end-to-end on both drivers.
A3. Per-driver output trees
Goal. build/<arch>/<driver>/... everywhere. Kill the
build/.seed-bootstrap/ shuffle in boot.sh. Two drivers can coexist on
disk; nothing gets clobbered when you switch.
Touch list.
- scripts/lib-pipeline.sh, scripts/lib-runscm.sh: change the OUT/STAGE derivation to insert $DRIVER.
- scripts/boot{0..6}.sh: every BOOT(N-1)=build/$ARCH/boot(N-1) becomes build/$ARCH/$DRIVER/boot(N-1).
- scripts/boot.sh: drop the .seed-bootstrap stash/restore. The boot6 kernel that the seed driver uses now lives at build/<arch>/podman/boot6/{Image|kernel.elf} and is referenced directly when running seed.
- seed-kernel/run.sh: the kernel-path arg / env still works; just point callers at the new location.
- Makefile: every OUT_DIR := build/$(ARCH) is changed to include $(DRIVER).
- .gitignore, if it pins build/<arch>/.
- Top-level Makefile comment block (output-layout doc).
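This is a sketch of the A3 path derivation. The helper names stage_dir, stage_paths, and seed_kernel_path are hypothetical, as are the OUT and BOOT_PREV variables they set. The point is that $DRIVER enters a stage path in exactly one place. The seed driver reads the podman-built kernel where it sits rather than stashing it.

```shell
# stage_dir <arch> <driver> <n>: the only place $DRIVER enters a stage path.
stage_dir() {
    printf 'build/%s/%s/boot%s\n' "$1" "$2" "$3"
}

# stage_paths <n>: set OUT for stage N and BOOT_PREV for stage N-1,
# reading ARCH and DRIVER from the environment.
stage_paths() {
    OUT=$(stage_dir "$ARCH" "$DRIVER" "$1")
    if [ "$1" -gt 0 ]; then
        BOOT_PREV=$(stage_dir "$ARCH" "$DRIVER" "$(($1 - 1))")
    else
        BOOT_PREV=
    fi
}

# seed_kernel_path <arch> <kernel-image-name>: the seed driver boots the
# podman-built kernel in place, so there is no stash/restore step.
seed_kernel_path() {
    printf '%s/%s\n' "$(stage_dir "$1" podman 6)" "$2"
}
```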
Validation. Both DRIVER=podman scripts/boot.sh aarch64 and
DRIVER=seed scripts/boot.sh aarch64 produce coherent trees side by side.
A0. Canonical generated source tree (host source-prep)
Goal. All host-side source preparation happens once, up front, into a
single canonical tree at build/<arch>/src/. This tree is the audit
basis and the only thing boot stages read for source. Boot stages do no
flattening, no unpacking, no patching, no calibration.
Layout.
```
build/<arch>/src/
  bin/                # binary inputs that aren't built by a stage
    hex0-seed         # (vendored seed only)
  src/                # everything textual
    vendor-seed/      # ELF.hex2, *.hex0, *.hex1, *.hex2 (vendored)
    M1pp/             # M1pp.P1
    hex2pp/           # hex2pp.P1
    P1/               # P1.M1pp, P1-<arch>.M1, P1-<arch>.M1pp,
                      # P1pp.P1pp, entry-*.P1pp, elf-end.P1pp
    catm/             # catm.P1pp
    scheme1/          # scheme1.P1pp, prelude.scm
    cc/               # cc.scm, main.scm
    tcc/              # tcc.flat.c, stdarg-bridge.h,
                      # tcc-0.9.26-…/{include,lib}/ tree
    libc/             # libc.flat.c (mes-libc flattened)
    musl/             # filtered musl-1.2.5 tree (overrides
                      # merged, deletes applied), per-arch
                      # skip-list, generated alltypes.h /
                      # syscall.h
    kernel/           # seed-kernel kernel.c, arch/<arch>/*,
                      # user/*
    test-fixtures/    # only the slice tests need at runtime
                      # (host runner reads tests/ directly)
```
The bin/ vs src/ split is established here; downstream stages
inherit it (their $STAGE/in/{bin,src}/ is a copy-or-symlink view of
the right slice of this tree).
What moves into A0.
- scripts/stage1-flatten.sh (tcc flatten): runs once, output to build/<arch>/src/src/tcc/.
- scripts/libc-flatten.sh (mes-libc flatten): runs once, output to build/<arch>/src/src/libc/.
- scripts/musl-vendor.sh (musl unpack + overrides + deletes): runs once, output to build/<arch>/src/src/musl/.
- scripts/boot5-calibrate.sh (per-arch musl skip-list): runs once, output as src/musl/skip.txt. If the skip-list is committed upstream (vendor/upstream/musl-1.2.5-skip-<arch>.txt), copy it; otherwise run calibration (which itself depends on boot4; see below).
- All the per-stage .stage/in/ materialization moves out of lib-pipeline.sh / lib-runscm.sh and becomes a thin "copy from build/<arch>/src/" step.
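Below is one possible shape for the skip-list step, assuming POSIX sh. The committed-list file name is the one given above. Three things are assumptions: the VENDOR_DIR override, the function name, and passing the calibration command as arguments (the real boot5-calibrate.sh invocation is not spelled out here). Writing to a temporary file first means a failed calibration cannot leave a truncated skip.txt behind.

```shell
# prep_musl_skiplist <arch> <src-root> <calibrate-cmd...>  (hypothetical)
# Prefer the committed per-arch skip-list; otherwise run calibration,
# which needs tcc3 from boot4 and so only works after boot4 has built.
prep_musl_skiplist() {
    arch=$1 src_root=$2
    shift 2
    committed=${VENDOR_DIR:-vendor/upstream}/musl-1.2.5-skip-$arch.txt
    dest=$src_root/src/musl/skip.txt
    mkdir -p "$(dirname "$dest")" || return 1
    if [ -f "$committed" ]; then
        cp "$committed" "$dest"
    elif "$@" > "$dest.tmp"; then
        mv "$dest.tmp" "$dest"
    else
        rm -f "$dest.tmp"   # never leave a partial skip-list behind
        return 1
    fi
}
```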
Calibration ordering. boot5-calibrate.sh needs a working tcc3, so
the canonical src tree is built in two passes: A0a (everything that
doesn't need a compiler) before boot0, A0b (musl filter using the
calibration result) after boot4. Both write into build/<arch>/src/.
Document the split clearly; the Makefile encodes it via deps.
Touch list.
- New: scripts/prep-src.sh. Orchestrates A0a (vendored copy, tcc flatten, libc flatten, musl unpack+merge).
- New: scripts/prep-musl.sh. The A0b half (run/copy calibration, apply filter, snapshot result into src/musl/).
- scripts/boot{3,4,5,6}.sh: delete the auto-invoke flatten/calibrate blocks; rely on build/<arch>/src/ being present.
- scripts/lib-pipeline.sh, scripts/lib-runscm.sh: add a stage_input_from_src <subpath> <bin|src> helper that pulls from the canonical tree.
- scripts/boot{0..6}.sh: every pipeline_input / runscm_input call switches to the new helper. Inputs become declarative.
- scripts/boot{4,5,6}-gen-runscm.sh: paths emitted into run.scm use the canonical layout.
- Retire the WORK_SUBPATH env in scripts/boot-build-p1pp.sh and similar ad-hoc input plumbing.
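The following is a hypothetical sketch of stage_input_from_src <subpath> <bin|src>. SRC_ROOT (standing for build/<arch>/src/) and STAGE are assumed variable names. This version copies rather than symlinks, which stops a stage from writing back into the canonical tree. A symlink variant would be cheaper for large inputs such as the musl tree.

```shell
# stage_input_from_src <subpath> <bin|src>: expose one slice of the
# canonical tree under $STAGE/in/<kind>/<subpath>.
stage_input_from_src() {
    subpath=$1 kind=$2
    case "$kind" in
        bin|src) ;;
        *) echo "stage_input_from_src: kind must be bin or src, got '$kind'" >&2
           return 1 ;;
    esac
    from=$SRC_ROOT/$kind/$subpath
    to=$STAGE/in/$kind/$subpath
    if [ ! -e "$from" ]; then
        echo "stage_input_from_src: missing $from (run prep-src first)" >&2
        return 1
    fi
    # Remove any previous copy so re-runs don't nest dir/dir.
    rm -rf "$to"
    mkdir -p "$(dirname "$to")" && cp -R "$from" "$to"
}
```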
Validation. tree build/<arch>/src/ is reviewable in one sitting.
Re-running any boot stage twice produces byte-identical outputs. The
flatten/musl scripts are no longer triggered from within a boot stage.
A4. Makefile drives bootN.sh with path-based deps
Goal. Outputs are targets. make build/aarch64/seed/boot6/Image
walks the chain: prep-src → boot0 → boot1 → … → boot6.
Rule shape.
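```make
# "&:" marks a grouped target (GNU make 4.3+): one recipe run produces both
# outputs, so make -j never runs boot1.sh twice. DRIVER is passed explicitly
# because a default set inside the Makefile is not exported to recipes.
build/$(ARCH)/$(DRIVER)/boot1/M1pp \
build/$(ARCH)/$(DRIVER)/boot1/hex2pp &: \
    build/$(ARCH)/$(DRIVER)/boot0/hex2 \
    build/$(ARCH)/$(DRIVER)/boot0/M0 \
    build/$(ARCH)/$(DRIVER)/boot0/catm \
    build/$(ARCH)/src/.stamp \
    scripts/boot1.sh scripts/lib-pipeline.sh scripts/lib-arch.sh
	DRIVER=$(DRIVER) scripts/boot1.sh $(ARCH)
```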
The .stamp files in build/<arch>/src/ and each build/<arch>/<driver>/boot{N}/
are the make-rule pegs. Real outputs are listed as additional targets so
single-binary builds work.
Touch list.
- Makefile: rewrite the body. Per-stage rule blocks: prep-src, boot0, boot1, …, boot6. Each lists its real outputs and depends on the previous stage's outputs plus the relevant scripts and library files.
- Makefile: drop the m1pp/hex2pp/scheme1/cc/tcc-boot2/tcc-tcc/tcc-tcc-tcc aliases entirely. Path is the name.
- Makefile: drop dead vars (TOOLS_CATM); single tcc-version var.
- Makefile: hoist cloc (~14 lines) into scripts/count-lines.sh.
- Makefile: include tests/Makefile for test targets (see AX).
- Makefile: top-level convenience targets accept DRIVER and ARCH and expand to the right path. make all builds everything.
- BOOT6_TIMEOUT env in boot6.sh (parity with boot3/4/5).
- Retire boot-build-p1.sh and boot-build-p1pp.sh if the boot-script path subsumes them (it should: they're container-side helpers for the old Makefile recipes).
Validation. make build/aarch64/podman/boot6/Image from a clean
tree builds the full chain. make build/aarch64/podman/boot1/M1pp builds
only the A0a half of prep-src plus boot0 and boot1 (A0b needs boot4, so it
cannot be on boot1's path). touch scripts/boot4.sh && make build/aarch64/podman/boot6/Image
rebuilds boot4–6 only. Output bytes are unchanged from pre-A4.
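One way to mechanize the incremental-rebuild check is to feed make's dry-run output into a small filter. make -n prints the recipes it would run without running them, and the filter lists the bootN.sh scripts among them. This assumes each stage recipe invokes scripts/bootN.sh literally, as in the rule shape above. rebuilt_stages is a hypothetical name.

```shell
# rebuilt_stages: read `make -n` output on stdin and print each bootN whose
# script would run, one per line, in byte order. Helper scripts such as
# bootN-gen-runscm.sh are not matched because ".sh" must follow the digits.
rebuilt_stages() {
    grep -oE 'scripts/boot[0-9]+\.sh' |
        sed -E 's#^scripts/(boot[0-9]+)\.sh$#\1#' |
        LC_ALL=C sort -u
}

# Usage for the A4 validation:
#   touch scripts/boot4.sh
#   make -n build/aarch64/podman/boot6/Image | rebuilt_stages
#   # expected: boot4, boot5, boot6 and nothing earlier
```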
AT. tcc patches for uniformity
Goal. Eliminate per-arch workarounds that exist only because tcc 0.9.26 is incomplete. After AT, every arch is on the same footing in seed-kernel and the build pipeline.
In scope.
- amd64 .quad truncation in gen_le64. Currently arch/amd64/mmu.c encodes GDT entries as .long lo, hi pairs. Patch tcc's amd64 assembler to accept full 64-bit immediates; revert the workaround. (Existing memory note: project_tcc_arm64_svcul_truncation.md describes a related arm64 issue; track it separately or in the same pass.)
- .note.* SHT_NOTE. tcc emits .note.* sections as PROGBITS, forcing the post-link seed-kernel/scripts/elf-pvh-note.c tool to rewrite the ELF for amd64 PVH boot. Patch tcc to map .note.* → SHT_NOTE and emit a PT_NOTE phdr. Delete elf-pvh-note.c and the amd64 boot6 fixup.
- riscv64 inline-asm constraints / register-asm silent drop. The existing memory note project_tcc_inline_asm_silent_drop.md documents tcc 0.9.26 silently dropping register-asm constraints. If this affects any riscv64 codepath in the kernel or musl build, fix it here.
- Audit seed-kernel/simple-patches/ (if any) and per-arch arch.h externs vs inlines. amd64 and riscv64 declare arch helpers as externs (a workaround for tcc inline-asm gaps) while aarch64 inlines them. Once the asm-support patches above land, normalize to the aarch64 style.
Out of scope (still tracked separately).
- riscv64 boot0 stage-4 user-trap bug (docs/SEED-RISCV64-TODO.md): may or may not be tcc; investigate after AT.
- amd64 boot3+ validation under DRIVER=seed (docs/SEED-AMD64-TODO.md): compute-bound, not a tcc issue.
Touch list.
- vendor/tcc/… (or wherever the tcc tree lives): the patches.
- scripts/stage1-flatten.sh: re-flatten with the new patches; commit the resulting tcc.flat.c if it's vendored, otherwise just regenerate.
- seed-kernel/arch/amd64/mmu.c: revert the .quad workarounds.
- seed-kernel/arch/amd64/kernel.S: reinstate native .note.* if the workaround required emitting them differently.
- seed-kernel/arch/{amd64,riscv64}/arch.h: inline what was extern.
- seed-kernel/arch/{amd64,riscv64}/{kernel.S,mmu.c}: drop quirks.
- seed-kernel/scripts/elf-pvh-note.c: delete.
- scripts/boot6.sh, scripts/boot6-gen-runscm.sh: delete the amd64-only post-link fixup block.
- docs/TCC.md: document the new tcc patch set; retire any workaround-explanation sections that are now moot.
- Update the memory entries project_tcc_* as patches eliminate the recorded bugs.
Validation. seed-kernel builds on all three arches with no
arch-conditional shell logic in boot6. make build/<arch>/podman/boot6/Image
produces a clean ELF without post-link fixup. All test suites still pass.
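For the AT validation, this is a check that the linked kernel already carries a PT_NOTE program header with no post-link fixup. It reads readelf -lW output from GNU binutils, where each program-header row starts with its type name, so it also works on cross-arch ELFs. has_pt_note is a hypothetical name. A stricter check could also look for the PVH note itself in readelf -n output.

```shell
# has_pt_note: succeed if `readelf -lW` output on stdin lists a NOTE segment.
has_pt_note() {
    awk '$1 == "NOTE" { found = 1 } END { exit !found }'
}

# Usage, after the AT patches (no elf-pvh-note step):
#   readelf -lW build/amd64/podman/boot6/kernel.elf | has_pt_note ||
#       echo "kernel.elf has no PT_NOTE segment" >&2
```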
AX. tests/ Makefile
Goal. Tests are a separate concern with their own Makefile. Top-level build is decoupled from test dispatch.
Touch list.
- New: tests/Makefile. Targets per suite (test-cc, test-cc-libc, test-cc-cg, test-cc-lex, test-cc-pp, test-cc-util, test-M1pp, test-p1, test-scheme1, test-cc-ext, test-tcc-cc, test-tcc-libc) plus an aggregate test target.
- New: tests/lib-runner.sh (or fold into scripts/lib-test.sh). Helpers: discover_fixtures, run_diff_text, run_diff_bytes, report_pass_fail. Refactor scripts/boot-run-tests.sh (818 lines) to call these.
- Top-level Makefile: include tests/Makefile. Drop the test-dispatch branch (currently lines 680–732). make test proxies to the tests Makefile.
- Collapse scripts/seed-accept{,-boot34,-boot5}.sh into one parameterized script under tests/ (or scripts/, but called from tests/Makefile).
- Test-naming consistency: pick one of NNN-, NN-, or unprefixed and apply it across tests/cc, tests/cc-libc, tests/M1pp, tests/p1. Rename freely.
- Golden-file convention: settle on one suffix per kind of golden file, e.g. .expected (stdout) / .expected-exit (exit code) / .expected-bytes (binary), and use it in every suite. Rename freely.
- New: tests/README.md documenting the per-suite contract.
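This is a sketch of the tests/lib-runner.sh helpers. The four function names come from the list above. Their signatures, the PASS/FAIL counters, and the fixture-suffix argument are assumptions. run_diff_text shows a unified diff on failure, while run_diff_bytes only reports pass/fail, since a textual diff of binary output is rarely useful.

```shell
# tests/lib-runner.sh (sketch). Signatures are assumptions.
PASS=0
FAIL=0

# report_pass_fail <name> <status>: tally one result and print one line.
report_pass_fail() {
    if [ "$2" -eq 0 ]; then
        PASS=$((PASS + 1)); echo "PASS $1"
    else
        FAIL=$((FAIL + 1)); echo "FAIL $1"
    fi
}

# discover_fixtures <dir> <suffix>: golden files in locale-independent order.
discover_fixtures() {
    find "$1" -type f -name "*.$2" | LC_ALL=C sort
}

# run_diff_text <name> <expected-file> <cmd...>: compare cmd's stdout with
# the golden text, printing a unified diff on mismatch.
run_diff_text() {
    name=$1 expected=$2
    shift 2
    actual=$(mktemp) || return 1
    "$@" > "$actual" 2>/dev/null
    if diff -u "$expected" "$actual"; then rc=0; else rc=1; fi
    rm -f "$actual"
    report_pass_fail "$name" "$rc"
}

# run_diff_bytes <name> <expected-file> <actual-file>: exact byte comparison.
run_diff_bytes() {
    if cmp -s "$2" "$3"; then
        report_pass_fail "$1" 0
    else
        report_pass_fail "$1" 1
    fi
}
```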
Validation. make test from repo root green on default arch.
make -C tests test-cc works in isolation. tests/Makefile is
self-contained.
A2. Audit manifest + hashes
Goal. A single command produces manifest.txt + sha256.txt
covering the canonical source tree and every boot stage's outputs, with
provenance.
Notes.
- The canonical source tree from A0 is the audit basis. A2 adds the manifest and the hashes; the bytes are already there.
- For host-flattened tcc/libc: snapshot the flattened source (per the source-preparation decision above) and record the flatten script's SHA in the manifest line, so the recipe that produced it is identified.
- One audit per <arch>/<driver> pair. compare-drivers (A5) diffs two manifests.
Touch list.
- New: scripts/build-audit.sh. Walks build/<arch>/src/ and build/<arch>/<driver>/boot{0..6}/; emits build/<arch>/<driver>/audit/manifest.txt (one line per file: <path> <stage> <role> <origin> <sha256>) and build/<arch>/<driver>/audit/sha256.txt (flat hash list).
- Makefile: audit target depending on prep-src + boot0..boot6 for the active driver.
Validation. sha256sum -c build/<arch>/<driver>/audit/sha256.txt
reports no failures. The line count of manifest.txt equals the number of
files find lists under the audited trees.
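Here is a sketch of the walk in scripts/build-audit.sh, assuming GNU coreutils (sha256sum). Files are listed in LC_ALL=C order so both drivers emit comparable manifests. The stage column comes from the first /bootN/ path component and falls back to src. The <role> and <origin> columns are left as - placeholders because their values are not defined here. audit_emit is a hypothetical name, and paths containing spaces would need quoting that this sketch omits.

```shell
# audit_emit <out-dir> <tree>...: write sha256.txt and manifest.txt.
audit_emit() {
    out=$1
    shift
    mkdir -p "$out" || return 1
    : > "$out/sha256.txt"
    : > "$out/manifest.txt"
    find "$@" -type f ! -path "$out/*" | LC_ALL=C sort |
    while IFS= read -r f; do
        sum=$(sha256sum "$f" | cut -d ' ' -f 1)
        # <stage>: first /bootN/ path component, else the canonical src tree.
        stage=$(printf '%s\n' "$f" | grep -oE '/boot[0-9]+/' | head -n 1 | tr -d /)
        [ -n "$stage" ] || stage=src
        printf '%s  %s\n' "$sum" "$f" >> "$out/sha256.txt"
        # <path> <stage> <role> <origin> <sha256>; role/origin are placeholders.
        printf '%s %s - - %s\n' "$f" "$stage" "$sum" >> "$out/manifest.txt"
    done
}

# Usage, then the A2 validation:
#   audit_emit build/aarch64/podman/audit build/aarch64/src build/aarch64/podman/boot[0-6]
#   sha256sum -c --quiet build/aarch64/podman/audit/sha256.txt
```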
A5. make compare-drivers reproducibility target
Goal. Mechanical proof: every boot{N}/ output artifact is
byte-identical between DRIVER=podman and DRIVER=seed.
Recipe.
- DRIVER=podman make build/$arch/podman/boot6/Image
- DRIVER=seed make build/$arch/seed/boot6/Image (the seed driver consumes the podman-built kernel for its QEMU run)
- For each N in 0..6: diff -r build/$arch/podman/boot$N/ build/$arch/seed/boot$N/ (diff -r compares exactly two trees, so the stages are diffed one pair at a time)
- diff build/$arch/{podman,seed}/audit/manifest.txt (provenance lines may differ; hashes must match for shared artifacts)
Touch list.
- New: scripts/compare-drivers.sh.
- Makefile: compare-drivers target wrapping it.
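Below is a sketch of the core of scripts/compare-drivers.sh. It diffs one stage pair at a time, because diff -r takes exactly two operands. It then reports the highest N for which boot0 through bootN all match, which is the per-arch convergence number the validation asks for. The function names and the optional build-root argument are assumptions. Excluding logs or stamp files, if they legitimately differ between drivers, is left out.

```shell
# highest_converging_stage <arch> [build-root]: print the largest N such that
# boot0..bootN are identical between the podman and seed trees, or -1.
highest_converging_stage() {
    root=${2:-build}
    last=-1
    for n in 0 1 2 3 4 5 6; do
        a=$root/$1/podman/boot$n
        b=$root/$1/seed/boot$n
        [ -d "$a" ] && [ -d "$b" ] || break
        diff -r -q "$a" "$b" > /dev/null || break
        last=$n
    done
    echo "$last"
}

# compare_drivers <arch> [build-root]: succeed only if all seven stages match.
compare_drivers() {
    n=$(highest_converging_stage "$@")
    echo "$1: highest converging stage: $n (-1 = none)" >&2
    [ "$n" -eq 6 ]
}
```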
Pre-step: determinism audit. Find and eliminate non-determinism sources before declaring success. Likely suspects:
- Embedded build timestamps (any __DATE__/__TIME__ macros in tcc/musl/kernel)
- cpio mtimes when packing the initramfs (lib-pipeline.sh, lib-runscm.sh, seed-kernel build): pin to epoch 0 or another fixed value
- File-iteration order in shell glob enumeration (the boot5 musl source walk is the main risk; LC_ALL=C sort everything)
- Any find -printf ordering, tar mtimes, ar mtimes
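This is a sketch of two of these fixes, assuming GNU touch, which accepts @SECONDS dates. It pins every mtime to the epoch and produces file lists in byte order before anything is archived. The function names are hypothetical. cpio newc headers also record inode numbers, uid/gid, and nlink, so whatever packs the initramfs must pin those fields as well.

```shell
# normalize_tree <dir>: set the mtime of every entry to the Unix epoch
# (-h touches symlinks themselves rather than their targets).
normalize_tree() {
    find "$1" -exec touch -h -d @0 {} +
}

# sorted_file_list <dir>: relative paths in byte order, independent of
# locale and of the filesystem's directory-entry order.
sorted_file_list() {
    (cd "$1" && find . -print | LC_ALL=C sort)
}

# The tar and ar suspects have standard GNU switches:
#   tar --sort=name --mtime=@0 --owner=0 --group=0 --numeric-owner -cf out.tar dir
#   ar rcsD libfoo.a *.o   # D: deterministic mode (zero uid/gid/mtime)
```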
Validation. make compare-drivers ARCH=aarch64 exits 0. Per arch,
report the highest stage that converges (amd64/riscv64 may be partial
until the live blockers in docs/SEED-{AMD64,RISCV64}-TODO.md are
cleared).
B. Polish (independent, do anytime)
Docs
- README.md: arch matrix; DRIVER explanation; pointer to scripts/boot.sh and make build/<arch>/<driver>/boot6/Image; test-suite section.
- docs/SEED-AMD64-TODO.md: prune items completed by 799eba0; keep amd64 boot3+ validation, fixed-point check, run-tests under seed.
- docs/SEED-RISCV64-TODO.md: prune items completed by e4bfcde; the boot0 stage-4 user-trap blocker stays.
- docs/SEED-VIRTIO-BLK.md: add a banner marking it as design notes, not pipeline state.
- Audit docs/OS.md, docs/LIBC.md, docs/LIBC.txt, docs/TCC-ARM64-ASM.md for staleness vs current code.
- cc/cc.scm.md → docs/CC-IMPL.md (or fold into docs/CC.md).
- docs/TCC.md: rewrite around the AT patch set.
kernel.c
- DTB / cpio / hvm_start_info magics → named constants with one-line comments.
- amd64 PVH hvm_start_info path: inline the one-sentence ABI note from SEED-AMD64-TODO.
- sys_unlinkat flags param: two-line guard or a comment.
bootN.sh polish (mostly folded into A6, A0, A4)
- boot4.sh TCC_BOOTSTRAP_RELAX_FIXEDPOINT: surface it in boot.sh --help.
- Standardize the argument convention: positional $ARCH only; env vars only for tunables (DRIVER, BOOT*_TIMEOUT, etc.).
Validation gates
After each phase:
- make test (all suites) green on the default arch.
- scripts/boot.sh aarch64 end-to-end clean on both drivers (or the equivalent make build/aarch64/<driver>/boot6/Image after A4).
- Diff the build/aarch64/<driver>/boot{0..6}/ trees against the previous phase's outputs: phases A6, A3, A0, A4, and AX must be no-ops for output bytes. AT changes outputs (intentionally: the workarounds are gone). A2 adds the audit tree. A5 is the proof.
After all phases:
- make compare-drivers ARCH=aarch64 exits 0.
- make compare-drivers ARCH=amd64 and ARCH=riscv64 report progress up to the current seed-driver completion line.
- tree build/<arch>/src/ is the audit basis: one tree, one read.