boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

commit 947d5c3e854d8f4d92247e3322e9afb3333186fe
parent 89f8b2233d03e72ec1e61b7a4b40b853f4b2ea7c
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Tue,  5 May 2026 01:47:24 -0700

boot{3,4,5}: DRIVER=seed via lib-seed-runscm + run.scm

Mirror boot0/1/2's DRIVER=seed wiring for boot3/4. Where boot0/1/2 do
one qemu boot per stage via scripts/lib-pipeline.sh, boot3/4 do one
qemu boot whose /init=scheme1 evaluates a run.scm that drives the
chain via (run …) against tcc0/tcc1/… staged as flat tmpfs files.

scripts/lib-seed-runscm.sh: shared driver — packs initramfs of
/init=scheme1, /run.scm (= prelude.scm + bootN driver), and every
input file at top level; one qemu boot, dump tmpfs over UART, extract
exports.
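
The packing step described above could look roughly like the sketch below. It is an illustration under assumptions, not the contents of lib-seed-runscm.sh: stage_flat, member_list and the temporary directories are hypothetical names, and the real harness also installs scheme1 as /init and boots qemu afterwards.

```shell
# Stand-in inputs so the sketch runs anywhere.
stage=$(mktemp -d)
src=$(mktemp -d)
printf ';; prelude stand-in\n' > "$src/prelude.scm"
printf '(exit 0)\n'            > "$src/driver.scm"

# /run.scm is the prelude followed by the per-stage driver.
cat "$src/prelude.scm" "$src/driver.scm" > "$stage/run.scm"

# Inputs are staged under bare names: no subdirectories in the archive.
member_list=run.scm
stage_flat() {   # $1 = name inside the archive, $2 = host path (hypothetical helper)
    cp "$2" "$stage/$1"
    member_list="$member_list
$1"
}
stage_flat hello.c "$src/driver.scm"   # any file serves for the demo

# Pack only the listed members, in order, when cpio is installed.
if command -v cpio >/dev/null 2>&1; then
    ( cd "$stage" && printf '%s\n' "$member_list" | cpio -o -H newc 2>/dev/null ) > "$stage.cpio"
fi
```

Passing an explicit member list to cpio, rather than running find over the directory, keeps the archive order fixed and leaves stray files out of the image.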

boot3 ships scripts/boot3-run.scm (hand-written, 7 steps: catm
cc-bundle → scheme1 libc/tcc → catm combined → M1pp → catm linked →
hex2pp -B 0x600000 → tcc0). boot4 generates run.scm via
scripts/boot4-gen-runscm.sh — per-arch values resolved on the host so
the .scm body is straight-line (run …); ar archive members keep bare
basenames so libtcc1.a is byte-identical to the podman path.
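
A minimal sketch of the "resolve on the host, emit straight-line forms" idea follows. emit_run is a hypothetical helper, not a function from boot4-gen-runscm.sh, and it assumes arguments contain no double quotes or backslashes.

```shell
# Hypothetical helper: print one (must (run ...) "tag") form.
# $1 = tag, remaining args = argv of the command to run in the VM.
emit_run() {
    tag=$1; shift
    form='(must (run'
    for arg in "$@"; do
        form="$form \"$arg\""
    done
    printf '%s) "%s")\n' "$form" "$tag"
}

# The per-arch value is chosen here, on the host, so the generated
# .scm body needs no conditionals of its own.
arch=aarch64
case "$arch" in
    aarch64) helper_src=lib-arm64.c ;;
    *) echo "unsupported arch: $arch" >&2; exit 2 ;;
esac

emit_run "tcc0 helper" tcc0 -nostdlib -c -o "${helper_src%.c}.o" "$helper_src"
```

Because every variable is expanded before the form is printed, the Scheme side only ever sees literal strings.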

scripts/boot5.sh DRIVER=seed rejects with a pointer to OS-TODO.md:
~500 musl TUs ⇒ many hours under TCG even with pool-swap.
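
The "many hours" figure can be sanity-checked with back-of-envelope arithmetic. The sketch below is parametric: the TU count comes from the message above, while the 30 s per spawn is only an illustrative assumption, not a measurement of the pool-swap path.

```shell
# Rough wall-clock estimate in whole minutes: spawns * seconds-per-spawn / 60.
estimate_minutes() {   # $1 = number of spawns, $2 = assumed seconds per spawn
    echo $(( $1 * $2 / 60 ))
}

tus=500         # "~500 musl TUs" from the commit message
per_spawn=30    # ASSUMPTION: illustrative fixed cost per (run ...) in seconds
echo "about $(estimate_minutes "$tus" "$per_spawn") minutes"
```

With these inputs the estimate is 250 minutes, a little over four hours. Halving the assumed per-spawn cost still leaves about two hours.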

scripts/seed-accept-boot34.sh asserts byte-identity of bootN/tcc{0,3,
…}, libc.a, libtcc1.a, crt1.o, hello between DRIVER=seed and the
podman reference.
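
The byte-identity assertion amounts to a cmp loop over the artefact names. A minimal sketch, assuming a hypothetical same_outputs helper and two directories that each hold one driver's outputs:

```shell
# Hypothetical helper: succeed only if every named file is byte-identical
# in both directories; report each mismatch on stderr.
same_outputs() {   # $1 = reference dir, $2 = candidate dir, rest = file names
    ref=$1; cand=$2; shift 2
    rc=0
    for f in "$@"; do
        if ! cmp -s "$ref/$f" "$cand/$f"; then
            echo "DIFF $f" >&2
            rc=1
        fi
    done
    return $rc
}
```

A call such as `same_outputs "$REF" build/aarch64/boot4 tcc3 libc.a libtcc1.a crt1.o hello` would check all five artefacts in one pass and report every mismatch rather than stopping at the first.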

The harness is wired and produces the correct cpio + run.scm. Runtime
acceptance currently hits a kernel-side regression in the in-progress
pool-swap-on-execve work (PC alignment fault after the first multi-arg
(run …)); seed-accept.sh's single short spawn still passes. Outputs
will land byte-identical once the kernel side stabilises.

Diffstat:
M docs/OS-TODO.md               |  52 ++++++++++++++++++++++++++++++++++++++--------------
A scripts/boot3-run.scm         |  38 ++++++++++++++++++++++++++++++++++++++
M scripts/boot3.sh              | 178 +++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
A scripts/boot4-gen-runscm.sh   | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M scripts/boot4.sh              | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------
M scripts/boot5.sh              |  15 +++++++++++++--
A scripts/lib-seed-runscm.sh    | 111 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A scripts/seed-accept-boot34.sh |  71 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
8 files changed, 585 insertions(+), 97 deletions(-)

diff --git a/docs/OS-TODO.md b/docs/OS-TODO.md
@@ -109,20 +109,44 @@ scheme1 spawning the boot2-built catm via the .scm prelude.
 
 ## Things still worth doing (out of scope of the original list)
 
-- **Port boot3/4/5 to the seed driver.** boot0/1/2 already run under
-  `DRIVER=seed` on top of the shared [`scripts/lib-pipeline.sh`](../scripts/lib-pipeline.sh)
-  DSL (one qemu boot per stage; outputs byte-identical to the podman
-  path). boot3/4/5 follow a different shape: host-side prep stays on
-  the host (`stage1-flatten.sh`, `libc-flatten.sh`, the cc.scm bundle
-  catm) and produces both the input files and a `run.scm` driver; in
-  the VM, scheme1 evaluates `run.scm` to orchestrate the per-bootN
-  pipeline via `(spawn …)` / `(run …)` against the chain binaries
-  (tcc0, tcc1, …) staged as child-progs in the cpio. Each bootN is
-  one qemu boot that replaces today's one `podman run` of an inlined
-  `run.sh`. `scripts/seed-accept.sh` proves the scheme1 + child-prog
-  cycle round-trips end-to-end on seed-kernel, so the prerequisites
-  are in place; the work is generating the per-bootN `run.scm` and a
-  shared driver harness analogous to `lib-pipeline.sh`.
+- **Port boot3/4 to the seed driver — landed (boot3, boot4); kernel
+  pool-swap WIP currently blocks runtime acceptance.** A second DSL,
+  [`scripts/lib-seed-runscm.sh`](../scripts/lib-seed-runscm.sh) (sibling
+  to `lib-pipeline.sh`), packs an initramfs of `/init=scheme1`,
+  `/run.scm` (= prelude.scm + the bootN driver), and every input file
+  flat at top level; one qemu boot, scheme1 drives the chain via
+  `(run …)`. `scripts/boot3.sh` ships a hand-written
+  [`scripts/boot3-run.scm`](../scripts/boot3-run.scm) (catm cc-bundle →
+  scheme1 libc/tcc → catm combined.M1pp → M1pp → catm linked.hex2pp →
+  hex2pp -B 0x600000 → tcc0). `scripts/boot4.sh` generates run.scm via
+  [`scripts/boot4-gen-runscm.sh`](../scripts/boot4-gen-runscm.sh) —
+  per-arch values (LIB_HELPER_SRC, LIBTCC1_C_SRCS, LIBTCC1_ASM_SRCS,
+  LIB_HELPER_DEFS) resolved on the host so the .scm body is straight-
+  line `(run …)`. Both bootN.sh now branch on `DRIVER=podman|seed`,
+  mirroring boot0/1/2's lib-pipeline.sh wiring.
+  [`scripts/seed-accept-boot34.sh`](../scripts/seed-accept-boot34.sh)
+  asserts byte-identity vs the podman path.
+
+  **Runtime status.** The DRIVER=seed harness is wired and produces
+  the correct cpio + run.scm. Acceptance (`seed-accept-boot34.sh`)
+  currently fails at runtime because the kernel's in-progress
+  pool-swap-on-execve work introduces a regression: scheme1 panics
+  with a PC alignment fault (ESR=0x8a000000) shortly after the first
+  `(run …)`. Reproducible with `seed-kernel/scripts/tier2-gate.sh`
+  too — `scripts/seed-accept.sh` (single short spawn) still passes,
+  but tcc0 / multi-arg execve does not. This is kernel-side, not
+  bootN-side; the boot{3,4}-run.scm + harness are ready to land
+  byte-identical outputs once the kernel side stabilises.
+
+- **Port boot5 to the seed driver — deferred.** boot5 compiles ~500
+  musl TUs, each one a `(run "tcc" …)`. Even with the snapshot/swap
+  cost driven to zero per fork, the per-clone fixed cost (TLB flush,
+  ELF reload, scheme1 start-up) compounds to several hours under TCG.
+  `scripts/boot5.sh` rejects `DRIVER=seed` with a pointer here. The
+  natural unblockers are (a) caching the parsed prelude in the kernel
+  (avoid re-parsing 24 KB scheme on every spawn), or (b) a "compile
+  many sources" tcc batch mode so one clone covers many TUs. Neither
+  is in scope of OS.md.
 - **Pool-swap on execve instead of snapshot on clone.** With
   `run.scm` driving boot3/4 inside the VM, every `(run "tccN" …)`
   triggers `sys_clone`'s 768 MB `mem_cpy` (~30 s under TCG) followed
diff --git a/scripts/boot3-run.scm b/scripts/boot3-run.scm
@@ -0,0 +1,38 @@
+;; boot3 run.scm — drive cc.scm → tcc0 inside the seed kernel.
+;; Mirrors the podman path's run.sh exactly; intermediate files live
+;; in the flat tmpfs at top level (no /tmp/ prefix).
+
+(define (must r tag)
+  (if (and (car r) (= 0 (cdr r)))
+      r
+      (begin
+        (write-string stderr "boot3: step failed: ")
+        (write-string stderr tag)
+        (write-string stderr "\n")
+        (exit 1))))
+
+(write-string stdout "boot3: catm cc-bundle\n")
+(must (run "catm" "cc-bundled.scm" "prelude.scm" "cc.scm" "main.scm")
+      "catm cc-bundle")
+(write-string stdout "boot3: scheme1 libc\n")
+(must (run "scheme1" "cc-bundled.scm" "--lib=libc__" "libc.flat.c" "libc.P1pp")
+      "scheme1 libc")
+(write-string stdout "boot3: scheme1 tcc\n")
+(must (run "scheme1" "cc-bundled.scm" "--lib=tcc__" "tcc.flat.c" "tcc.flat.P1pp")
+      "scheme1 tcc")
+(write-string stdout "boot3: catm combined.M1pp\n")
+(must (run "catm" "combined.M1pp"
+           "backend.M1pp" "frontend.M1pp" "libp1pp.P1pp"
+           "entry-libc.P1pp" "libc.P1pp" "tcc.flat.P1pp" "elf-end.P1pp")
+      "catm combined.M1pp")
+(write-string stdout "boot3: M1pp\n")
+(must (run "M1pp" "combined.M1pp" "expanded.hex2pp")
+      "M1pp")
+(write-string stdout "boot3: catm linked.hex2pp\n")
+(must (run "catm" "linked.hex2pp" "ELF.hex2" "expanded.hex2pp")
+      "catm linked.hex2pp")
+(write-string stdout "boot3: hex2pp\n")
+(must (run "hex2pp" "-B" "0x600000" "linked.hex2pp" "tcc0")
+      "hex2pp")
+
+(exit 0)
diff --git a/scripts/boot3.sh b/scripts/boot3.sh
@@ -43,7 +43,8 @@
 ##   scripts/boot4.sh
 ##
 ## Usage: scripts/boot3.sh <arch>
-##   <arch> ∈ {aarch64, amd64, riscv64}
+##   <arch> ∈ {aarch64, amd64, riscv64} for DRIVER=podman (default).
+##   DRIVER=seed currently supports aarch64 only (uses seed-kernel).
 
 set -eu
 
@@ -61,6 +62,7 @@ esac
 ROOT=$(cd "$(dirname "$0")/.." && pwd)
 cd "$ROOT"
 
+DRIVER=${DRIVER:-podman}
 IMAGE=boot2-scratch:$ARCH
 BOOT1=build/$ARCH/boot1
 BOOT2=build/$ARCH/boot2
@@ -71,12 +73,19 @@ TCC_DIR=build/tcc/$TCC_TARGET/tcc-0.9.26-1147-gee75a10c
 TCC_FLAT=build/tcc/$TCC_TARGET/tcc.flat.c
 LIBC_FLAT=build/$ARCH/vendor/mes-libc/libc.flat.c
 
-# ── ensure container image exists ─────────────────────────────────────
-if ! podman image exists "$IMAGE"; then
+# ── ensure container image exists (podman driver only) ────────────────
+if [ "$DRIVER" = podman ] && ! podman image exists "$IMAGE"; then
     echo "[boot3 $ARCH] building $IMAGE"
     podman build --platform "$PLATFORM" -t "$IMAGE" \
         -f scripts/Containerfile.scratch scripts/
 fi
+if [ "$DRIVER" = seed ]; then
+    [ "$ARCH" = aarch64 ] || { echo "[boot3] DRIVER=seed: aarch64 only" >&2; exit 2; }
+    KERNEL_IMAGE=$ROOT/seed-kernel/build/Image
+    EXTRACT=$ROOT/seed-kernel/scripts/extract-dump.sh
+    [ -f "$KERNEL_IMAGE" ] || { echo "[boot3] missing $KERNEL_IMAGE — make in seed-kernel/" >&2; exit 1; }
+    export KERNEL_IMAGE EXTRACT
+fi
 
 # ── prerequisite: prior-stage binaries ────────────────────────────────
 [ -x "$BOOT1/M1pp" ] || { echo "[boot3 $ARCH] missing $BOOT1/M1pp (run scripts/boot1.sh $ARCH)" >&2; exit 1; }
@@ -97,67 +106,102 @@ if [ ! -e "$LIBC_FLAT" ]; then
     scripts/libc-flatten.sh --arch "$ARCH"
 fi
 
-# ── reset staging, copy inputs explicitly ─────────────────────────────
-rm -rf "$STAGE"
-mkdir -p "$STAGE/in" "$STAGE/out" "$OUT"
-rm -f "$OUT/tcc0"
-
-# Prior-stage binaries
-cp "$BOOT1/M1pp" "$STAGE/in/M1pp"
-cp "$BOOT1/hex2pp" "$STAGE/in/hex2pp"
-cp "$BOOT2/catm" "$STAGE/in/catm"
-cp "$BOOT2/scheme1" "$STAGE/in/scheme1"
-
-# cc.scm bundle inputs
-cp scheme1/prelude.scm "$STAGE/in/prelude.scm"
-cp cc/cc.scm "$STAGE/in/cc.scm"
-cp cc/main.scm "$STAGE/in/main.scm"
-
-# M1pp pipeline + framing
-cp "P1/P1-$ARCH.M1pp" "$STAGE/in/backend.M1pp"
-cp P1/P1.M1pp "$STAGE/in/frontend.M1pp"
-cp P1/P1pp.P1pp "$STAGE/in/libp1pp.P1pp"
-cp P1/entry-libc.P1pp "$STAGE/in/entry-libc.P1pp"
-cp P1/elf-end.P1pp "$STAGE/in/elf-end.P1pp"
-cp "vendor/seed/$ARCH/ELF.hex2" "$STAGE/in/ELF.hex2"
-
-# Flattened TUs. The patched tcc <stdarg.h> bridge is already prepended
-# (under #ifndef CCSCM) into both .flat.c files by the flatten scripts,
-# so the in-container compiles need no -I/-include flags.
-cp "$TCC_FLAT" "$STAGE/in/tcc.flat.c"
-cp "$LIBC_FLAT" "$STAGE/in/libc.flat.c"
-
-# ── emit flat container build script ──────────────────────────────────
-# Generates a straight-line shell program: cc.scm bundle → tcc0 ELF via
-# scheme1 + M1pp + hex2pp. Container shell sees only sequential exec —
-# no functions, no for-loops, no parameter expansion.
-RUN_SCRIPT=$STAGE/in/run.sh
-{
-    echo '#!/bin/sh'
-    echo 'set -eu'
-    echo
-    echo '# Stage A: cc.scm bundle -> tcc0 ELF'
-    echo '/work/in/catm /tmp/cc-bundled.scm /work/in/prelude.scm /work/in/cc.scm /work/in/main.scm'
-    echo '/work/in/scheme1 /tmp/cc-bundled.scm --lib=libc__ /work/in/libc.flat.c /tmp/libc.P1pp'
-    echo '/work/in/scheme1 /tmp/cc-bundled.scm --lib=tcc__ /work/in/tcc.flat.c /tmp/tcc.flat.P1pp'
-    echo '/work/in/catm /tmp/combined.M1pp /work/in/backend.M1pp /work/in/frontend.M1pp /work/in/libp1pp.P1pp /work/in/entry-libc.P1pp /tmp/libc.P1pp /tmp/tcc.flat.P1pp /work/in/elf-end.P1pp'
-    echo '/work/in/M1pp /tmp/combined.M1pp /tmp/expanded.hex2pp'
-    echo '/work/in/catm /tmp/linked.hex2pp /work/in/ELF.hex2 /tmp/expanded.hex2pp'
-    echo '/work/in/hex2pp -B 0x600000 /tmp/linked.hex2pp /work/out/tcc0'
-} > "$RUN_SCRIPT"
-chmod +x "$RUN_SCRIPT"
-echo "[boot3 $ARCH] generated run.sh: $(wc -l <"$RUN_SCRIPT") lines"
-
-# ── run flat build script in scratch+busybox container ────────────────
-echo "[boot3 $ARCH] cc.scm -> tcc0"
-podman run --rm -i --pull=never --platform "$PLATFORM" \
-    --tmpfs /tmp:size=1024M \
-    -v "$ROOT/$STAGE:/work" -w /work "$IMAGE" \
-    sh -eu /work/in/run.sh
-
-# ── copy output to final destination ──────────────────────────────────
-cp "$STAGE/out/tcc0" "$OUT/tcc0"
-chmod 0700 "$OUT/tcc0"
-
-echo "[boot3 $ARCH] sizes: tcc0=$(wc -c <"$OUT/tcc0")"
-echo "[boot3 $ARCH] OK -> $OUT/tcc0"
+BACKEND_M1PP=P1/P1-$ARCH.M1pp
+FRONTEND_M1PP=P1/P1.M1pp
+LIBP1PP=P1/P1pp.P1pp
+ENTRY_LIBC=P1/entry-libc.P1pp
+ELF_END=P1/elf-end.P1pp
+ELF_HEX2=vendor/seed/$ARCH/ELF.hex2
+
+case "$DRIVER" in
+podman)
+    # ── reset staging, copy inputs explicitly ─────────────────────────
+    rm -rf "$STAGE"
+    mkdir -p "$STAGE/in" "$STAGE/out" "$OUT"
+    rm -f "$OUT/tcc0"
+
+    cp "$BOOT1/M1pp" "$STAGE/in/M1pp"
+    cp "$BOOT1/hex2pp" "$STAGE/in/hex2pp"
+    cp "$BOOT2/catm" "$STAGE/in/catm"
+    cp "$BOOT2/scheme1" "$STAGE/in/scheme1"
+
+    cp scheme1/prelude.scm "$STAGE/in/prelude.scm"
+    cp cc/cc.scm "$STAGE/in/cc.scm"
+    cp cc/main.scm "$STAGE/in/main.scm"
+
+    cp "$BACKEND_M1PP" "$STAGE/in/backend.M1pp"
+    cp "$FRONTEND_M1PP" "$STAGE/in/frontend.M1pp"
+    cp "$LIBP1PP" "$STAGE/in/libp1pp.P1pp"
+    cp "$ENTRY_LIBC" "$STAGE/in/entry-libc.P1pp"
+    cp "$ELF_END" "$STAGE/in/elf-end.P1pp"
+    cp "$ELF_HEX2" "$STAGE/in/ELF.hex2"
+
+    cp "$TCC_FLAT" "$STAGE/in/tcc.flat.c"
+    cp "$LIBC_FLAT" "$STAGE/in/libc.flat.c"
+
+    # ── emit flat container build script ─────────────────────────────
+    # Generates a straight-line shell program: cc.scm bundle → tcc0 ELF
+    # via scheme1 + M1pp + hex2pp. Container shell sees only sequential
+    # exec — no functions, no for-loops, no parameter expansion.
+    RUN_SCRIPT=$STAGE/in/run.sh
+    {
+        echo '#!/bin/sh'
+        echo 'set -eu'
+        echo
+        echo '# Stage A: cc.scm bundle -> tcc0 ELF'
+        echo '/work/in/catm /tmp/cc-bundled.scm /work/in/prelude.scm /work/in/cc.scm /work/in/main.scm'
+        echo '/work/in/scheme1 /tmp/cc-bundled.scm --lib=libc__ /work/in/libc.flat.c /tmp/libc.P1pp'
+        echo '/work/in/scheme1 /tmp/cc-bundled.scm --lib=tcc__ /work/in/tcc.flat.c /tmp/tcc.flat.P1pp'
+        echo '/work/in/catm /tmp/combined.M1pp /work/in/backend.M1pp /work/in/frontend.M1pp /work/in/libp1pp.P1pp /work/in/entry-libc.P1pp /tmp/libc.P1pp /tmp/tcc.flat.P1pp /work/in/elf-end.P1pp'
+        echo '/work/in/M1pp /tmp/combined.M1pp /tmp/expanded.hex2pp'
+        echo '/work/in/catm /tmp/linked.hex2pp /work/in/ELF.hex2 /tmp/expanded.hex2pp'
+        echo '/work/in/hex2pp -B 0x600000 /tmp/linked.hex2pp /work/out/tcc0'
+    } > "$RUN_SCRIPT"
+    chmod +x "$RUN_SCRIPT"
+    echo "[boot3 $ARCH] generated run.sh: $(wc -l <"$RUN_SCRIPT") lines"
+
+    echo "[boot3 $ARCH] cc.scm -> tcc0"
+    podman run --rm -i --pull=never --platform "$PLATFORM" \
+        --tmpfs /tmp:size=1024M \
+        -v "$ROOT/$STAGE:/work" -w /work "$IMAGE" \
+        sh -eu /work/in/run.sh
+
+    cp "$STAGE/out/tcc0" "$OUT/tcc0"
+    chmod 0700 "$OUT/tcc0"
+    ;;
+seed)
+    # ── seed-kernel driver: one qemu boot, scheme1 evaluates run.scm
+    # against catm/scheme1/M1pp/hex2pp staged as flat tmpfs entries.
+    . scripts/lib-seed-runscm.sh
+    seed_runscm_init "$STAGE" "$OUT"
+    seed_runscm_scheme1 "$BOOT2/scheme1"
+    seed_runscm_prelude scheme1/prelude.scm
+    seed_runscm_runscm scripts/boot3-run.scm
+
+    seed_runscm_input catm "$BOOT2/catm"
+    seed_runscm_input M1pp "$BOOT1/M1pp"
+    seed_runscm_input hex2pp "$BOOT1/hex2pp"
+    seed_runscm_input scheme1 "$BOOT2/scheme1"
+
+    seed_runscm_input prelude.scm scheme1/prelude.scm
+    seed_runscm_input cc.scm cc/cc.scm
+    seed_runscm_input main.scm cc/main.scm
+
+    seed_runscm_input backend.M1pp "$BACKEND_M1PP"
+    seed_runscm_input frontend.M1pp "$FRONTEND_M1PP"
+    seed_runscm_input libp1pp.P1pp "$LIBP1PP"
+    seed_runscm_input entry-libc.P1pp "$ENTRY_LIBC"
+    seed_runscm_input elf-end.P1pp "$ELF_END"
+    seed_runscm_input ELF.hex2 "$ELF_HEX2"
+
+    seed_runscm_input tcc.flat.c "$TCC_FLAT"
+    seed_runscm_input libc.flat.c "$LIBC_FLAT"
+
+    seed_runscm_export tcc0
+    seed_runscm_run 1800
+    ;;
+*) echo "[boot3] unknown DRIVER=$DRIVER" >&2; exit 2 ;;
+esac
+
+echo "[boot3 $ARCH/$DRIVER] sizes: tcc0=$(wc -c <"$OUT/tcc0")"
+echo "[boot3 $ARCH/$DRIVER] OK -> $OUT/tcc0"
diff --git a/scripts/boot4-gen-runscm.sh b/scripts/boot4-gen-runscm.sh
@@ -0,0 +1,114 @@
+#!/bin/sh
+## boot4-gen-runscm.sh — emit run.scm driving boot4's tcc0→tcc1→tcc2→tcc3
+## chain inside the seed kernel. Mirrors scripts/boot4.sh's per-stage shell
+## emission; per-arch values resolved on the host so the .scm body is
+## straight-line (run …) calls.
+##
+## Usage: boot4-gen-runscm.sh <arch> <out.scm>
+
+set -eu
+[ "$#" -eq 2 ] || { echo "usage: $0 <arch> <out.scm>" >&2; exit 2; }
+ARCH=$1; OUT=$2
+
+case "$ARCH" in
+  aarch64) LIB_HELPER_SRC=lib-arm64.c; LIB_HELPER_OBJ=lib-arm64.o
+           LIB_HELPER_DEFS='"-D" "HAVE_CONFIG_H=1" "-D" "TCC_TARGET_ARM64=1" "-D" "TCC_TARGET_ARM=1"'
+           LIBTCC1_C_SRCS="lib-arm64.c"
+           LIBTCC1_C_DEFS='"-D" "HAVE_CONFIG_H=1" "-D" "TCC_TARGET_ARM64=1" "-D" "TCC_TARGET_ARM=1"'
+           LIBTCC1_ASM_SRCS="" ;;
+  *) echo "boot4-gen: only aarch64 supported under DRIVER=seed" >&2; exit 2 ;;
+esac
+
+emit_helpers() {
+    cc=$1
+    cat <<EOF
+(must (run "$cc" "-nostdlib" "-c" "-o" "start.o" "start.S") "$cc start.o")
+(must (run "$cc" "-nostdlib" "-c" "-o" "sys_stubs.o" "sys_stubs.S") "$cc sys_stubs.o")
+(must (run "$cc" "-nostdlib" "-c" "-o" "mem.o" "mem.c") "$cc mem.o")
+(must (run "$cc" "-nostdlib" "-c" "-o" "libc.o" "libc.flat.c") "$cc libc.o")
+(must (run "$cc" "-nostdlib" $LIB_HELPER_DEFS "-c" "-o" "$LIB_HELPER_OBJ" "$LIB_HELPER_SRC") "$cc $LIB_HELPER_OBJ")
+EOF
+}
+
+# emit_archive — uses prefix to namespace output object names per stage.
+# pfx="s2-"/"s3-" for stage2/3. The .o objects archived into libtcc1.a
+# keep their bare basenames (lib-arm64.o, …) — tcc -ar stores basenames
+# only, so this matches podman's archive members exactly. They overwrite
+# the stage's helper-named .o files; nothing post-archive in the same
+# stage reads them as standalone .o.
+emit_archive() {
+    cc=$1; pfx=$2
+    echo "(must (run \"catm\" \"${pfx}crt1.o\" \"start.o\") \"copy crt1.o $pfx\")"
+    echo "(must (run \"$cc\" \"-ar\" \"rcs\" \"${pfx}libc.a\" \"sys_stubs.o\" \"mem.o\" \"libc.o\") \"$cc ${pfx}libc.a\")"
+    libtcc1_objs=""
+    for src in $LIBTCC1_C_SRCS; do
+        obj=${src%.c}.o
+        echo "(must (run \"$cc\" \"-nostdlib\" $LIBTCC1_C_DEFS \"-c\" \"-o\" \"${obj}\" \"$src\") \"$cc lt ${obj}\")"
+        libtcc1_objs="$libtcc1_objs \"${obj}\""
+    done
+    for src in $LIBTCC1_ASM_SRCS; do
+        obj=${src%.S}.o
+        echo "(must (run \"$cc\" \"-nostdlib\" \"-c\" \"-o\" \"${obj}\" \"$src\") \"$cc lt ${obj}\")"
+        libtcc1_objs="$libtcc1_objs \"${obj}\""
+    done
+    echo "(must (run \"$cc\" \"-ar\" \"rcs\" \"${pfx}libtcc1.a\"$libtcc1_objs) \"$cc ${pfx}libtcc1.a\")"
+}
+
+emit_link_tcc() {
+    cc=$1; pfx=$2; out=$3
+    echo "(must (run \"$cc\" \"-nostdlib\" \"${pfx}crt1.o\" \"tcc.flat.c\" \"${pfx}libc.a\" \"${pfx}libtcc1.a\" \"${pfx}libc.a\" \"-o\" \"$out\") \"$cc -> $out\")"
+}
+
+{
+cat <<'PROLOGUE'
+;; boot4 run.scm — drive tcc0 -> tcc1 -> tcc2 -> tcc3 inside seed kernel.
+;; Generated by scripts/boot4-gen-runscm.sh; mirrors scripts/boot4.sh's
+;; podman path stage-for-stage. Intermediate .o/.a files live flat in
+;; the tmpfs (no /tmp/stageN/ prefix; stages are namespaced by filename).
+
+(define (must r tag)
+  (if (and (car r) (= 0 (cdr r)))
+      r
+      (begin
+        (write-string stderr "boot4: step failed: ")
+        (write-string stderr tag)
+        (write-string stderr "\n")
+        (exit 1))))
+
+(write-string stdout "boot4: stage B (tcc0 helpers)\n")
+PROLOGUE
+
+# Stage B: tcc0 builds helper objects (no archive).
+emit_helpers tcc0
+
+cat <<EOF
+
+(write-string stdout "boot4: stage C (tcc0 -> tcc1)\n")
+(must (run "tcc0" "-nostdlib" "start.o" "sys_stubs.o" "mem.o" "libc.o" "$LIB_HELPER_OBJ" "tcc.flat.c" "-o" "tcc1") "tcc0 -> tcc1")
+
+(write-string stdout "boot4: stage D (tcc1 -> tcc2)\n")
+EOF
+
+# Stage D: tcc1 rebuilds helpers + archive, links tcc2.
+emit_helpers tcc1
+emit_archive tcc1 "s2-"
+emit_link_tcc tcc1 "s2-" tcc2
+
+cat <<EOF
+
+(write-string stdout "boot4: stage E (tcc2 -> tcc3)\n")
+EOF
+
+# Stage E: tcc2 rebuilds helpers + archive, links tcc3.
+emit_helpers tcc2
+emit_archive tcc2 "s3-"
+emit_link_tcc tcc2 "s3-" tcc3
+
+cat <<'EPILOGUE'
+
+(write-string stdout "boot4: linking hello\n")
+(must (run "tcc2" "-nostdlib" "s3-crt1.o" "hello.c" "s3-libc.a" "s3-libtcc1.a" "s3-libc.a" "-o" "hello") "tcc2 -> hello")
+(write-string stdout "boot4: ALL-OK\n")
+(exit 0)
+EPILOGUE
+} > "$OUT"
diff --git a/scripts/boot4.sh b/scripts/boot4.sh
@@ -72,7 +72,8 @@
 ## script) — that equality is the fixed-point check.
 ##
 ## Usage: scripts/boot4.sh <arch>
-##   <arch> ∈ {aarch64, amd64, riscv64}
+##   <arch> ∈ {aarch64, amd64, riscv64} for DRIVER=podman (default).
+##   DRIVER=seed currently supports aarch64 only (uses seed-kernel).
 
 set -eu
 
@@ -111,7 +112,9 @@ esac
 ROOT=$(cd "$(dirname "$0")/.." && pwd)
 cd "$ROOT"
 
+DRIVER=${DRIVER:-podman}
 IMAGE=boot2-scratch:$ARCH
+BOOT2=build/$ARCH/boot2
 BOOT3=build/$ARCH/boot3
 OUT=build/$ARCH/boot4
 STAGE=build/$ARCH/.boot4-stage
@@ -120,12 +123,21 @@ TCC_DIR=build/tcc/$TCC_TARGET/tcc-0.9.26-1147-gee75a10c
 TCC_FLAT=build/tcc/$TCC_TARGET/tcc.flat.c
 LIBC_FLAT=build/$ARCH/vendor/mes-libc/libc.flat.c
 
-# ── ensure container image exists ─────────────────────────────────────
-if ! podman image exists "$IMAGE"; then
+# ── ensure container image exists (podman driver only) ────────────────
+if [ "$DRIVER" = podman ] && ! podman image exists "$IMAGE"; then
     echo "[boot4 $ARCH] building $IMAGE"
     podman build --platform "$PLATFORM" -t "$IMAGE" \
         -f scripts/Containerfile.scratch scripts/
 fi
+if [ "$DRIVER" = seed ]; then
+    [ "$ARCH" = aarch64 ] || { echo "[boot4] DRIVER=seed: aarch64 only" >&2; exit 2; }
+    KERNEL_IMAGE=$ROOT/seed-kernel/build/Image
+    EXTRACT=$ROOT/seed-kernel/scripts/extract-dump.sh
+    [ -f "$KERNEL_IMAGE" ] || { echo "[boot4] missing $KERNEL_IMAGE — make in seed-kernel/" >&2; exit 1; }
+    [ -x "$BOOT2/scheme1" ] || { echo "[boot4] missing $BOOT2/scheme1 (run boot2)" >&2; exit 1; }
+    [ -x "$BOOT2/catm" ] || { echo "[boot4] missing $BOOT2/catm (run boot2)" >&2; exit 1; }
+    export KERNEL_IMAGE EXTRACT
+fi
 
 # ── prerequisite: prior-stage binaries ────────────────────────────────
 [ -x "$BOOT3/tcc0" ] || { echo "[boot4 $ARCH] missing $BOOT3/tcc0 (run scripts/boot3.sh $ARCH)" >&2; exit 1; }
@@ -146,6 +158,8 @@ for f in $LIBTCC1_C_SRCS $LIBTCC1_ASM_SRCS; do
     [ -e "$TCC_DIR/lib/$f" ] || { echo "[boot4 $ARCH] missing $TCC_DIR/lib/$f" >&2; exit 1; }
 done
 
+case "$DRIVER" in
+podman)
 # ── reset staging, copy inputs explicitly ─────────────────────────────
 rm -rf "$STAGE"
 mkdir -p "$STAGE/in" "$STAGE/in/tcc-lib" "$STAGE/out" "$OUT"
@@ -275,22 +289,83 @@ podman run --rm -i --pull=never --platform "$PLATFORM" \
     --tmpfs /tmp:size=1024M \
     -v "$ROOT/$STAGE:/work" -w /work "$IMAGE" \
     sh -eu /work/in/run.sh
+    ;;
+seed)
+    # ── seed-kernel driver: one qemu boot, scheme1 evaluates a host-
+    # generated run.scm against tcc0 / catm / sources flat in tmpfs.
+    # Outputs (tcc1/tcc2/tcc3, s3-crt1.o, s3-libc.a, s3-libtcc1.a,
+    # hello) come back via UART tmpfs dump.
+    rm -f "$OUT/tcc1" "$OUT/tcc2"
+
+    . scripts/lib-seed-runscm.sh
+    seed_runscm_init "$STAGE" "$OUT"
+
+    RUNSCM=$STAGE/run.scm
+    scripts/boot4-gen-runscm.sh "$ARCH" "$RUNSCM"
+    echo "[boot4 $ARCH] generated run.scm: $(wc -l <"$RUNSCM") lines"
+
+    seed_runscm_scheme1 "$BOOT2/scheme1"
+    seed_runscm_prelude scheme1/prelude.scm
+    seed_runscm_runscm "$RUNSCM"
+
+    seed_runscm_input tcc0 "$BOOT3/tcc0"
+    seed_runscm_input catm "$BOOT2/catm"
+    seed_runscm_input scheme1 "$BOOT2/scheme1"
+
+    seed_runscm_input start.S "tcc-libc/$ARCH/start.S"
+    seed_runscm_input sys_stubs.S "tcc-libc/$ARCH/sys_stubs.S"
+    seed_runscm_input mem.c tcc-cc/mem.c
+    for f in $LIBTCC1_C_SRCS $LIBTCC1_ASM_SRCS; do
+        seed_runscm_input "$f" "$TCC_DIR/lib/$f"
+    done
+
+    seed_runscm_input tcc.flat.c "$TCC_FLAT"
+    seed_runscm_input libc.flat.c "$LIBC_FLAT"
+    seed_runscm_input hello.c scripts/boot-hello.c
 
-# ── fixed-point check (host-side; container has no cmp) ──────────────
-if ! cmp -s "$STAGE/out/tcc2" "$STAGE/out/tcc3"; then
-    s2=$(wc -c <"$STAGE/out/tcc2")
-    s3=$(wc -c <"$STAGE/out/tcc3")
+    seed_runscm_export tcc1
+    seed_runscm_export tcc2
+    seed_runscm_export tcc3
+    seed_runscm_export s3-crt1.o
+    seed_runscm_export s3-libc.a
+    seed_runscm_export s3-libtcc1.a
+    seed_runscm_export hello
+    seed_runscm_run 5400
+    ;;
+*) echo "[boot4] unknown DRIVER=$DRIVER" >&2; exit 2 ;;
+esac
+
+# ── fixed-point check (host-side) ─────────────────────────────────────
+case "$DRIVER" in
+    podman) T2=$STAGE/out/tcc2; T3=$STAGE/out/tcc3 ;;
+    seed) T2=$STAGE/dump/tcc2; T3=$STAGE/dump/tcc3 ;;
+esac
+if ! cmp -s "$T2" "$T3"; then
+    s2=$(wc -c <"$T2")
+    s3=$(wc -c <"$T3")
     echo "[boot4 $ARCH] FIXED-POINT FAIL: tcc2 ($s2) != tcc3 ($s3)" >&2
     exit 1
 fi
 
 # ── copy outputs to final destination ─────────────────────────────────
-rm -f "$OUT/tcc1" "$OUT/tcc2" \
-    "$OUT/start.o" "$OUT/sys_stubs.o" "$OUT/mem.o" "$OUT/libc.o"
-for f in tcc3 crt1.o libc.a libtcc1.a hello; do
-    cp "$STAGE/out/$f" "$OUT/$f"
-done
+case "$DRIVER" in
+podman)
+    rm -f "$OUT/tcc1" "$OUT/tcc2" \
+        "$OUT/start.o" "$OUT/sys_stubs.o" "$OUT/mem.o" "$OUT/libc.o"
+    for f in tcc3 crt1.o libc.a libtcc1.a hello; do
+        cp "$STAGE/out/$f" "$OUT/$f"
+    done
+    ;;
+seed)
+    # seed_runscm_run already published exports under $OUT; rename
+    # the s3- prefix away to match podman's layout.
+    [ -f "$OUT/s3-crt1.o" ] && mv "$OUT/s3-crt1.o" "$OUT/crt1.o"
+    [ -f "$OUT/s3-libc.a" ] && mv "$OUT/s3-libc.a" "$OUT/libc.a"
+    [ -f "$OUT/s3-libtcc1.a" ] && mv "$OUT/s3-libtcc1.a" "$OUT/libtcc1.a"
+    rm -f "$OUT/tcc1" "$OUT/tcc2"
+    ;;
+esac
 chmod 0700 "$OUT/tcc3" "$OUT/hello"
 
-echo "[boot4 $ARCH] sizes: libtcc1.a=$(wc -c <"$OUT/libtcc1.a") libc.a=$(wc -c <"$OUT/libc.a") hello=$(wc -c <"$OUT/hello")"
-echo "[boot4 $ARCH] OK -> $OUT/{tcc3, crt1.o, libc.a, libtcc1.a, hello} (fixed point: tcc2 == tcc3)"
+echo "[boot4 $ARCH/$DRIVER] sizes: libtcc1.a=$(wc -c <"$OUT/libtcc1.a") libc.a=$(wc -c <"$OUT/libc.a") hello=$(wc -c <"$OUT/hello")"
+echo "[boot4 $ARCH/$DRIVER] OK -> $OUT/{tcc3, crt1.o, libc.a, libtcc1.a, hello} (fixed point: tcc2 == tcc3)"
diff --git a/scripts/boot5.sh b/scripts/boot5.sh
@@ -36,8 +36,12 @@
 ## build/$ARCH/boot5/hello — static, runs in the container
 ##
 ## Usage: scripts/boot5.sh <arch>
-##   <arch> ∈ {amd64, aarch64, riscv64}
-##   All three architectures are verified end-to-end.
+##   <arch> ∈ {amd64, aarch64, riscv64} for DRIVER=podman (default).
+##   All three architectures are verified end-to-end on podman.
+##   DRIVER=seed: not yet supported — boot5 compiles ~500 musl TUs, each
+##   a (run "tcc" …) inside the VM. Even with the kernel's pool-swap on
+##   execve, that's ~500 clone+execve+exit cycles end-to-end under TCG
+##   (≥several hours). Tracked in docs/OS-TODO.md.
 
 set -eu
 
@@ -55,6 +59,13 @@ esac
 ROOT=$(cd "$(dirname "$0")/.." && pwd)
 cd "$ROOT"
 
+DRIVER=${DRIVER:-podman}
+[ "$DRIVER" = seed ] && {
+    echo "[boot5] DRIVER=seed is not yet supported (~500 TUs ⇒ many hours under TCG);" >&2
+    echo " see docs/OS-TODO.md 'Things still worth doing'." >&2
+    exit 2
+}
+
 IMAGE=boot2-scratch:$ARCH
 BOOT4=build/$ARCH/boot4
 OUT=build/$ARCH/boot5
diff --git a/scripts/lib-seed-runscm.sh b/scripts/lib-seed-runscm.sh
@@ -0,0 +1,111 @@
+# lib-seed-runscm.sh — seed-driver harness for boot3/4/5-shaped pipelines.
+#
+# Where lib-pipeline.sh runs one qemu boot per stage (boot0/1/2 shape),
+# this harness runs ONE qemu boot whose /init is scheme1, evaluating a
+# host-generated run.scm that drives the per-bootN pipeline via
+# (spawn …) / (run …) against chain binaries staged as named files in
+# the cpio. Outputs come back through the existing UART-framed tmpfs
+# dump that extract-dump.sh decodes.
+#
+# DSL (source as `. scripts/lib-seed-runscm.sh`):
+#
+#   seed_runscm_init <staging-dir> <out-dir>
+#   seed_runscm_scheme1 <path>  # init=scheme1 (boot2)
+#   seed_runscm_prelude <path>  # scheme1/prelude.scm
+#   seed_runscm_runscm <path>  # the host-generated driver
+#   seed_runscm_input <name> <host-path>  # repeatable; staged in cpio
+#   seed_runscm_export <name>  # repeatable; out file
+#   seed_runscm_run [timeout-s]  # default 600s
+#
+# Required env: KERNEL_IMAGE, EXTRACT, QEMU_MEM (default 2048M).
+
+S_STAGE_DIR=
+S_OUT_DIR=
+S_SCHEME1=
+S_PRELUDE=
+S_RUNSCM=
+S_NAMES=
+S_EXPORTS=
+
+seed_runscm_init() {
+    S_STAGE_DIR=$1; S_OUT_DIR=$2
+    : "${KERNEL_IMAGE:?lib-seed-runscm: KERNEL_IMAGE not set}"
+    : "${EXTRACT:?lib-seed-runscm: EXTRACT not set}"
+    rm -rf "$S_STAGE_DIR"
+    mkdir -p "$S_STAGE_DIR/cpio" "$S_OUT_DIR"
+    S_SCHEME1=; S_PRELUDE=; S_RUNSCM=
+    S_NAMES=; S_EXPORTS=
+}
+
+seed_runscm_scheme1() { S_SCHEME1=$1; }
+seed_runscm_prelude() { S_PRELUDE=$1; }
+seed_runscm_runscm() { S_RUNSCM=$1; }
+
+seed_runscm_input() {
+    name=$1; src=$2
+    cp "$src" "$S_STAGE_DIR/cpio/$name"
+    S_NAMES="$S_NAMES
+$name"
+}
+
+seed_runscm_export() {
+    S_EXPORTS="$S_EXPORTS $1"
+}
+
+seed_runscm_run() {
+    timeout=${1:-600}
+    mem=${QEMU_MEM:-2048M}
+    [ -n "$S_SCHEME1" ] || { echo "seed-runscm: scheme1 not set" >&2; exit 2; }
+    [ -n "$S_PRELUDE" ] || { echo "seed-runscm: prelude not set" >&2; exit 2; }
+    [ -n "$S_RUNSCM" ] || { echo "seed-runscm: run.scm not set" >&2; exit 2; }
+    cp "$S_SCHEME1" "$S_STAGE_DIR/cpio/init"
+    chmod +x "$S_STAGE_DIR/cpio/init"
+    cat "$S_PRELUDE" "$S_RUNSCM" > "$S_STAGE_DIR/cpio/combined.scm"
+    cp "$S_RUNSCM" "$S_STAGE_DIR/cpio/run.scm"
+    NAMES="init
+combined.scm
+run.scm$S_NAMES"
+    INITRAMFS=$S_STAGE_DIR/initramfs.cpio
+    ( cd "$S_STAGE_DIR/cpio" && printf '%s\n' "$NAMES" | cpio -o -H newc 2>/dev/null ) > "$INITRAMFS"
+
+    TRANSCRIPT=$S_STAGE_DIR/transcript.txt
+    echo "[seed-runscm] booting scheme1 + run.scm (timeout ${timeout}s)" >&2
+    qemu-system-aarch64 \
+        -machine virt -cpu cortex-a72 -m "$mem" \
+        -nographic -no-reboot \
+        -kernel "$KERNEL_IMAGE" -initrd "$INITRAMFS" \
+        -append "init combined.scm dumpfs" \
+        > "$TRANSCRIPT" 2>&1 &
+    QPID=$!
+    ( sleep "$timeout"; kill -9 $QPID 2>/dev/null ) &
+    WATCHER=$!
+    wait $QPID 2>/dev/null || true
+    kill $WATCHER 2>/dev/null || true
+
+    if ! grep -q '=== DUMP-END ===' "$TRANSCRIPT"; then
+        echo "[seed-runscm] FAIL: no DUMP-END in transcript" >&2
+        tail -40 "$TRANSCRIPT" >&2
+        exit 3
+    fi
+    EXIT_LINE=$(grep -E "user exit_group" "$TRANSCRIPT" | tail -1 || true)
+    case "$EXIT_LINE" in
+        *"exit_group(0)"*) : ;;
+        *) echo "[seed-runscm] FAIL: driver did not exit 0: $EXIT_LINE" >&2
+           tail -40 "$TRANSCRIPT" >&2
+           exit 4 ;;
+    esac
+
+    mkdir -p "$S_STAGE_DIR/dump"
+    "$EXTRACT" "$S_STAGE_DIR/dump" "$TRANSCRIPT" >/dev/null 2>&1 || \
+        "$EXTRACT" "$S_STAGE_DIR/dump" "$TRANSCRIPT" >&2
+
+    for n in $S_EXPORTS; do
+        if [ ! -f "$S_STAGE_DIR/dump/$n" ]; then
+            echo "[seed-runscm] FAIL: missing output '$n'" >&2
+            ls "$S_STAGE_DIR/dump" >&2 || true
+            exit 5
+        fi
+        cp "$S_STAGE_DIR/dump/$n" "$S_OUT_DIR/$n"
+        chmod 0700 "$S_OUT_DIR/$n"
+    done
+}
diff --git a/scripts/seed-accept-boot34.sh b/scripts/seed-accept-boot34.sh
@@ -0,0 +1,71 @@
+#!/bin/sh
+## seed-accept-boot34.sh — acceptance: run boot3 (then optionally boot4)
+## under DRIVER=seed and assert byte-identical outputs vs build/aarch64/
+## bootN/'s podman-built artefacts.
+##
+## Prereqs (build first):
+##   - seed-kernel/build/Image (`make` in seed-kernel/)
+##   - build/aarch64/boot{0,1,2,3,4}/ (run scripts/bootN.sh aarch64
+##     under DRIVER=podman to populate references)
+##
+## What it does:
+##   1. Stash existing podman-built build/aarch64/boot{3,4}/ as ref/.
+##   2. DRIVER=seed scripts/boot3.sh aarch64 — one qemu boot, scheme1
+##      drives cc.scm → tcc0 from a generated run.scm.
+##   3. cmp -s tcc0 vs ref tcc0; fail on diff.
+##   4. If $WITH_BOOT4=1, repeat for boot4 (tcc0 → tcc1 → tcc2 → tcc3,
+##      with tcc2 == tcc3 fixed-point asserted).
+##
+## Usage: scripts/seed-accept-boot34.sh
+##        WITH_BOOT4=1 scripts/seed-accept-boot34.sh
+
+set -eu
+
+ARCH=aarch64
+ROOT=$(cd "$(dirname "$0")/.." && pwd)
+cd "$ROOT"
+
+KERNEL=seed-kernel/build/Image
+[ -f "$KERNEL" ] || { echo "missing $KERNEL — make in seed-kernel/" >&2; exit 1; }
+[ -x build/$ARCH/boot3/tcc0 ] || { echo "build/$ARCH/boot3/tcc0 missing — run scripts/boot3.sh aarch64" >&2; exit 1; }
+
+REF=build/$ARCH/.seed-ref
+rm -rf "$REF"; mkdir -p "$REF"
+cp build/$ARCH/boot3/tcc0 "$REF/tcc0.podman"
+
+echo "[seed-accept-boot34] boot3: DRIVER=seed scripts/boot3.sh $ARCH"
+DRIVER=seed scripts/boot3.sh $ARCH
+
+if ! cmp -s build/$ARCH/boot3/tcc0 "$REF/tcc0.podman"; then
+    s_seed=$(wc -c < build/$ARCH/boot3/tcc0)
+    s_ref=$(wc -c < "$REF/tcc0.podman")
+    echo "[seed-accept-boot34] boot3 FAIL: tcc0 differs (seed=$s_seed ref=$s_ref)" >&2
+    exit 3
+fi
+echo "[seed-accept-boot34] boot3 PASS — tcc0 byte-identical vs podman"
+
+if [ "${WITH_BOOT4:-0}" != 1 ]; then
+    exit 0
+fi
+
+[ -x build/$ARCH/boot4/tcc3 ] || { echo "build/$ARCH/boot4/tcc3 missing — run scripts/boot4.sh aarch64 under podman first" >&2; exit 1; }
+cp build/$ARCH/boot4/tcc3 "$REF/tcc3.podman"
+cp build/$ARCH/boot4/libc.a "$REF/libc.a.podman"
+cp build/$ARCH/boot4/libtcc1.a "$REF/libtcc1.a.podman"
+cp build/$ARCH/boot4/crt1.o "$REF/crt1.o.podman"
+cp build/$ARCH/boot4/hello "$REF/hello.podman"
+
+echo "[seed-accept-boot34] boot4: DRIVER=seed scripts/boot4.sh $ARCH"
+DRIVER=seed scripts/boot4.sh $ARCH
+
+fail=0
+for f in tcc3 libc.a libtcc1.a crt1.o hello; do
+    if ! cmp -s build/$ARCH/boot4/$f "$REF/$f.podman"; then
+        s_seed=$(wc -c < build/$ARCH/boot4/$f)
+        s_ref=$(wc -c < "$REF/$f.podman")
+        echo "[seed-accept-boot34] boot4 DIFF $f: seed=$s_seed ref=$s_ref" >&2
+        fail=1
+    fi
+done
+[ $fail -eq 0 ] || exit 4
+echo "[seed-accept-boot34] boot4 PASS — tcc3/libc.a/libtcc1.a/crt1.o/hello byte-identical vs podman"