boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

Cleanup & Audit Plan

The mechanical goal is met: hex0-seed → … → tcc3 → musl → seed-kernel runs on {aarch64, amd64, riscv64} via two drivers (podman, seed). This plan covers the next pass: making the result auditable and uniform.

Decisions

  1. Output layout: build/<arch>/<driver>/...
  2. Makefile vs bootN scripts: replace the parallel Makefile recipes (make m1pp, make hex2pp, make scheme1, etc.) with rules that drive the bootN.sh scripts. One pipeline, one layout.
  3. Reproducibility criterion: every boot{N}/ output artifact must be byte-identical between DRIVER=podman and DRIVER=seed, per arch.
  4. Source preparation is a separate up-front host stage. All flattening, patching, unpacking, and calibration happen once, into a canonical generated source tree. Boot stages copy or reference from it; they do no source prep themselves.
  5. Path-based Makefile deps. make build/aarch64/seed/boot6/Image walks the dependency chain — no separate boot0/boot1/… phony targets driving stages. Outputs are the targets.
  6. Tests live in their own Makefile (tests/Makefile), invoked from the top level but not commingled with the build pipeline.
  7. Consistency over backcompat. Patch, refactor, rename, rewrite as needed. tcc patches are in scope when they buy uniformity.
  8. Phase order: A6 → A3 → A0 → A4 → AT → AX → A2 → A5. Each phase lands on a clean base from the previous one. AT (tcc patches) and AX (tests Makefile) are largely independent of the others and can slot in wherever convenient.

Phases

[DONE] A6. Hoist driver/arch boilerplate

Goal. One source of truth for arch→platform mapping, driver dispatch, prereq checks, and log prefixes. Stage-N scripts shrink to the parts that are actually stage-specific.

Touch list.

Validation. scripts/boot.sh aarch64 end-to-end on both drivers.
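
As a rough illustration of what the hoisted boilerplate could look like, here is a minimal sketch of shared helpers. The function names (arch_to_platform, check_driver, log) are hypothetical and not taken from the repository. The platform strings assume that "platform" means the OS/ARCH value passed to podman's --platform option, and ARCH, DRIVER, and STAGE are assumed to be set by the calling stage script.

```sh
# Hypothetical shared helpers, in the spirit of a common lib sourced by
# every bootN.sh. None of these names come from the repository.

# Map a boot2 arch name to an OS/ARCH platform string (assumed to be the
# value handed to `podman run --platform`).
arch_to_platform() {
    case "$1" in
        aarch64) echo "linux/arm64" ;;
        amd64)   echo "linux/amd64" ;;
        riscv64) echo "linux/riscv64" ;;
        *)       echo "unsupported arch: $1" >&2; return 1 ;;
    esac
}

# Validate the driver name once, so stage scripts do not repeat the check.
check_driver() {
    case "$1" in
        podman|seed) return 0 ;;
        *) echo "unsupported driver: $1" >&2; return 1 ;;
    esac
}

# Uniform log prefix: [arch/driver/stage] message.
# ARCH, DRIVER and STAGE are assumed to be set by the caller.
log() {
    printf '[%s/%s/%s] %s\n' "$ARCH" "$DRIVER" "$STAGE" "$*"
}
```

With helpers along these lines, a stage script would source the library, call check_driver "$DRIVER" once, and keep only its stage-specific steps.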


A3. Per-driver output trees

Goal. build/<arch>/<driver>/... everywhere. Kill the build/.seed-bootstrap/ shuffle in boot.sh. Two drivers can coexist on disk; nothing gets clobbered when you switch.

Touch list.

Validation. Both DRIVER=podman scripts/boot.sh aarch64 and DRIVER=seed scripts/boot.sh aarch64 produce coherent trees side by side.


A0. Canonical generated source tree (host source-prep)

Goal. All host-side source preparation happens once, up front, into a single canonical tree at build/<arch>/src/. This tree is the audit basis and the only thing boot stages read for source. Boot stages do no flattening, no unpacking, no patching, no calibration.

Layout.

build/<arch>/src/
  bin/                       # binary inputs that aren't built by a stage
    hex0-seed                #   (vendored seed only)
  src/                       # everything textual
    vendor-seed/             #   ELF.hex2, *.hex0, *.hex1, *.hex2 (vendored)
    M1pp/                    #   M1pp.P1
    hex2pp/                  #   hex2pp.P1
    P1/                      #   P1.M1pp, P1-<arch>.M1, P1-<arch>.M1pp,
                             #   P1pp.P1pp, entry-*.P1pp, elf-end.P1pp
    catm/                    #   catm.P1pp
    scheme1/                 #   scheme1.P1pp, prelude.scm
    cc/                      #   cc.scm, main.scm
    tcc/                     #   tcc.flat.c, stdarg-bridge.h,
                             #   tcc-0.9.26-…/{include,lib}/ tree
    libc/                    #   libc.flat.c (mes-libc flattened)
    musl/                    #   filtered musl-1.2.5 tree (overrides
                             #   merged, deletes applied), per-arch
                             #   skip-list, generated alltypes.h /
                             #   syscall.h
    kernel/                  #   seed-kernel kernel.c, arch/<arch>/*,
                             #   user/*
    test-fixtures/           #   only the slice tests need at runtime
                             #   (host runner reads tests/ directly)

The bin/ vs src/ split is established here; downstream stages inherit it (their $STAGE/in/{bin,src}/ is a copy-or-symlink view of the right slice of this tree).
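
The symlink variant of that view could be built roughly as follows. This is a sketch under assumptions: link_stage_inputs is a hypothetical helper, and the caller is assumed to pass paths relative to the canonical tree root (for example bin/hex0-seed or src/M1pp).

```sh
# Hypothetical helper: populate <stage>/in/{bin,src}/ with symlinks into
# the canonical tree.
#   $1      canonical tree root (e.g. build/<arch>/src)
#   $2      stage directory     (e.g. build/<arch>/<driver>/boot1)
#   $3...   paths relative to the canonical root
link_stage_inputs() {
    canon=$1; stage=$2; shift 2
    for rel in "$@"; do
        if [ ! -e "$canon/$rel" ]; then
            echo "missing canonical input: $rel" >&2
            return 1
        fi
        mkdir -p "$stage/in/$(dirname "$rel")"
        # Use an absolute target so the link resolves from any cwd.
        ln -sfn "$(cd "$canon" && pwd)/$rel" "$stage/in/$rel"
    done
}
```

Failing on a missing input, rather than skipping it, keeps the canonical tree the only source a stage can read from.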

What moves into A0.

Calibration ordering. boot5-calibrate.sh needs a working tcc3, so the canonical src tree is built in two passes: A0a (everything that doesn't need a compiler) before boot0, A0b (musl filter using the calibration result) after boot4. Both write into build/<arch>/src/. Document the split clearly; the Makefile encodes it via deps.

Touch list.

Validation. The output of tree build/<arch>/src/ is small enough to review in one sitting. Re-running any boot stage twice produces byte-identical outputs. The flatten/musl scripts are no longer triggered from within a boot stage.


A4. Makefile drives bootN.sh with path-based deps

Goal. Outputs are targets. make build/aarch64/seed/boot6/Image walks the chain: prep-src → boot0 → boot1 → … → boot6.

Rule shape.

build/$(ARCH)/$(DRIVER)/boot1/M1pp \
build/$(ARCH)/$(DRIVER)/boot1/hex2pp: \
    build/$(ARCH)/$(DRIVER)/boot0/hex2 \
    build/$(ARCH)/$(DRIVER)/boot0/M0 \
    build/$(ARCH)/$(DRIVER)/boot0/catm \
    build/$(ARCH)/src/.stamp \
    scripts/boot1.sh scripts/lib-pipeline.sh scripts/lib-arch.sh
    scripts/boot1.sh $(ARCH)

The .stamp files in build/<arch>/src/ and each build/<arch>/<driver>/boot{N}/ are the make-rule pegs. Real outputs are listed as additional targets so single-binary builds work.

Touch list.

Validation. make build/aarch64/podman/boot6/Image from a clean tree builds the full chain. make build/aarch64/podman/boot1/M1pp only builds prep-src + boot0 + boot1. touch scripts/boot4.sh && make build/aarch64/podman/boot6/Image rebuilds boot4–6 only. Output bytes unchanged from pre-A4.


AT. tcc patches for uniformity

Goal. Eliminate per-arch workarounds that exist only because tcc 0.9.26 is incomplete. After AT, every arch is on the same footing in seed-kernel and the build pipeline.

In scope.

Out of scope (still tracked separately).

Touch list.

Validation. seed-kernel builds on all three arches with no arch-conditional shell logic in boot6. make build/<arch>/podman/boot6/Image produces a clean ELF without post-link fixup. All test suites still pass.


AX. tests/ Makefile

Goal. Tests are a separate concern with their own Makefile. Top-level build is decoupled from test dispatch.

Touch list.

Validation. make test from repo root green on default arch. make -C tests test-cc works in isolation. tests/Makefile is self-contained.


A2. Audit manifest + hashes

Goal. A single command produces manifest.txt + sha256.txt covering the canonical source tree and every boot stage's outputs, with provenance.

Notes.

Touch list.

Validation. sha256sum -c build/<arch>/<driver>/audit/sha256.txt clean. The line count of manifest.txt matches the file count reported by find over the same trees.
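
A minimal sketch of the hashing half of such a command is shown below. It is an illustration only: write_audit is a hypothetical name, the manifest here is a bare file listing with no provenance fields (the plan does not specify their format), and file names are assumed not to contain newlines. Paths are recorded relative to the tree root, so the later sha256sum -c is assumed to run from that root.

```sh
# Hypothetical sketch: write manifest.txt and sha256.txt for one tree.
#   $1  tree root to cover
#   $2  audit output directory (may be <root>/audit, which is skipped)
write_audit() {
    root=$1; out=$2
    mkdir -p "$out"
    out=$(cd "$out" && pwd)
    (
        cd "$root" || exit 1
        # Skip the audit directory itself; sort in a fixed locale so the
        # listing order does not depend on the host.
        find . -path ./audit -prune -o -type f -print \
            | LC_ALL=C sort > "$out/manifest.txt"
        # Hash files in manifest order, one line per file.
        while IFS= read -r f; do
            sha256sum "$f"
        done < "$out/manifest.txt" > "$out/sha256.txt"
    )
}
```

Because both files are derived from the same sorted listing, their line counts agree by construction, which is the property the validation step checks.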


A5. make compare-drivers reproducibility target

Goal. Mechanical proof: every boot{N}/ output artifact is byte-identical between DRIVER=podman and DRIVER=seed.

Recipe.

  1. DRIVER=podman make build/$arch/podman/boot6/Image
  2. DRIVER=seed make build/$arch/seed/boot6/Image (the seed driver consumes the podman-built kernel for its QEMU run)
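  3. For each N in 0..6: diff -r build/$arch/podman/boot$N/ build/$arch/seed/boot$N/ (diff takes exactly two operands, so each stage is compared separately rather than in one brace-expanded call)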
  4. diff build/$arch/{podman,seed}/audit/manifest.txt (provenance lines may differ; hashes must match for shared artifacts).
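
Step 3 of the recipe could be scripted along these lines. This is a sketch, not the project's target: compare_drivers is a hypothetical function name, and it assumes both driver trees have already been built under a common build root. It stops at the first stage that diverges and reports the highest stage that converged.

```sh
# Hypothetical sketch of the per-stage comparison for one arch.
#   $1  build root (e.g. build)
#   $2  arch       (e.g. aarch64)
# Returns non-zero at the first missing or diverging stage.
compare_drivers() {
    build=$1; arch=$2; last_ok=none
    for n in 0 1 2 3 4 5 6; do
        a="$build/$arch/podman/boot$n"
        b="$build/$arch/seed/boot$n"
        if [ ! -d "$a" ] || [ ! -d "$b" ]; then
            echo "boot$n: missing tree (converged through: $last_ok)" >&2
            return 1
        fi
        # diff -r compares exactly two trees, so run it once per stage.
        if ! diff -r "$a" "$b" >/dev/null; then
            echo "boot$n: outputs differ (converged through: $last_ok)" >&2
            return 1
        fi
        last_ok="boot$n"
    done
    echo "all stages identical through $last_ok"
}
```

Called as compare_drivers build aarch64, it exits 0 only when every stage matches, which fits a make target that must fail on any divergence.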

Touch list.

Pre-step: determinism audit. Find and eliminate non-determinism sources before declaring success. Likely suspects:

Validation. make compare-drivers ARCH=aarch64 exits 0. Per arch, report the highest stage that converges (amd64/riscv64 may be partial until the live blockers in docs/SEED-{AMD64,RISCV64}-TODO.md are cleared).


B. Polish (independent, do anytime)

Docs

kernel.c

bootN.sh polish (mostly folded into A6, A0, A4)


Validation gates

After each phase:

  1. make test (all suites) green on the default arch.
  2. scripts/boot.sh aarch64 end-to-end clean on both drivers (or the equivalent make build/aarch64/<driver>/boot6/Image after A4).
  3. Diff build/aarch64/<driver>/boot{0..6}/ trees against the previous phase's outputs — phases A6, A3, A0, A4, AX must be no-ops for output bytes. AT changes outputs (intentionally — workarounds gone). A2 adds the audit tree. A5 is the proof.

After all phases: