boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

commit 148a86184a0c6be28162f05f6a39ef921f3335f9
parent 3d0c80c1d6e1c5b0757cd6597ad2ca4f97a3af60
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon,  4 May 2026 10:29:58 -0700

boot3: tcc{0,1,2,3} naming, stage E fixed-point check, auto-flatten

Rename boot3.sh outputs to tcc0..tcc3 (cc.scm-built; tcc0-built;
tcc1-built; tcc2-built). Add stage E that builds tcc3 and asserts
tcc2 == tcc3 byte-for-byte — the actual self-host fixed point.
tcc0 -> tcc1 isn't a fixed point because cc.scm's emitted machine
code introduces subtle codegen-decision differences in tcc0's
behavior; once tcc is the compiler (tcc1 onward) the chain
converges.

Auto-invoke stage1-flatten.sh / libc-flatten.sh on the host when
their outputs are missing, logged with "(host)" matching the
existing cross-asm step.

TCC-TODO: replace the previous "tcc-tcc != tcc-tcc-tcc on riscv64"
(framed as a linker layout sensitivity) with the actual story —
cc.scm has a behavioral bug on riscv64 that surfaces as missed
immediate-folding peepholes in tcc0(tcc.flat.c). Open question for
later: which cc.scm predicate is mis-evaluating to make tcc0 fall
into the materialize-imm branch where tcc1 takes the fold-imm
branch.
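
A minimal sketch of how the convergence described above could be inspected by hand, assuming all four stage binaries sit in one directory under the names tcc0..tcc3 (boot3.sh writes them to build/$ARCH/boot3). The helper names `size_of` and `compare_stages` are hypothetical and are not part of boot3.sh, which itself only asserts tcc2 == tcc3.

```sh
#!/bin/sh
# Sketch: report which consecutive bootstrap stages are byte-identical.
# Assumption: tcc0..tcc3 all live in the directory passed as $1.
# size_of and compare_stages are hypothetical helpers, not part of boot3.sh.

# Print a file's size in bytes, with any padding emitted by wc stripped.
size_of() {
    echo $(( $(wc -c <"$1") ))
}

# Compare tcc0/tcc1, tcc1/tcc2 and tcc2/tcc3 and print one line per pair.
compare_stages() {
    dir=$1
    prev=
    for f in tcc0 tcc1 tcc2 tcc3; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing $dir/$f" >&2
            return 1
        fi
        if [ -n "$prev" ]; then
            if cmp -s "$dir/$prev" "$dir/$f"; then
                echo "$prev == $f"
            else
                echo "$prev != $f ($(size_of "$dir/$prev") vs $(size_of "$dir/$f") bytes)"
            fi
        fi
        prev=$f
    done
}

# Example (path taken from the boot3.sh output location):
#   compare_stages build/riscv64/boot3
```

Printing every adjacent pair, rather than only the final one, shows at which stage the chain starts to converge on a given architecture.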

Diffstat:
M docs/TCC-TODO.md | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------
M scripts/boot3.sh | 144 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------
2 files changed, 170 insertions(+), 71 deletions(-)

diff --git a/docs/TCC-TODO.md b/docs/TCC-TODO.md
@@ -210,32 +210,77 @@ For now: known limitation, document, move on. The scalar codegen
 elsewhere on riscv64 is fine — only u32 narrowing of a wider source
 trips it.

-### riscv64: tcc-tcc != tcc-tcc-tcc when boot3.sh assembles its own .S
-
-`scripts/boot3.sh riscv64` builds tcc-boot2 → tcc-tcc → tcc-tcc-tcc
-through tcc-boot2's own riscv64 assembler (no host cross-asm). All
-three binaries run, but the bootstrap fixed-point property breaks:
-`tcc-tcc` is 305656 bytes, `tcc-tcc-tcc` is 305464 bytes.
-
-The Makefile path (which uses alpine-gcc to pre-build start.o /
-sys_stubs.o) hits fixed-point on the same tcc-boot2: 305624 bytes for
-both stages. amd64 also hits fixed-point through boot3.sh's
-in-container assembler. So this is riscv64 + tcc-asm-built .o
-specific.
-
-Triage: tcc-boot2 and tcc-tcc emit byte-identical start.o /
-sys_stubs.o when given the .S inputs — the asm path is consistent.
-The divergence is in the *link* step: tcc-boot2's linker, when fed
-tcc-asm-built start.o (vs alpine-gcc-built start.o of the same
-semantic content but different ELF section/symbol layout), produces a
-tcc-tcc that is 32 bytes larger and miscompiles tcc.flat.c by 192
-bytes on the next stage. Likely a layout-sensitivity in tcc's riscv64
-linker around `.rela.text`/symtab ordering or alignment, but not yet
-isolated.
-
-Workarounds: keep using the Makefile / alpine-gcc path on riscv64 if
-fixed-point matters; the boot3.sh path is fine for "produce a working
-tcc" but not for the self-host idempotence check.
+### tcc0 → tcc1 is not a fixed point on riscv64 (cc.scm behavioral bug)
+
+`boot3.sh` produces four staged compilers:
+
+- `tcc0` = tcc-source compiled by cc.scm
+- `tcc1` = tcc-source compiled by tcc0
+- `tcc2` = tcc-source compiled by tcc1
+- `tcc3` = tcc-source compiled by tcc2
+
+The fixed-point check is **`tcc2 == tcc3`** (asserted at the end of
+`boot3.sh`, verified on aarch64, amd64, riscv64). On riscv64 the
+weaker `tcc1 == tcc2` does *not* hold: `tcc0(tcc.flat.c)` produces
+a 616100-byte `.o` while `tcc1(tcc.flat.c)` and `tcc2(tcc.flat.c)`
+produce a byte-identical 615892-byte `.o` — 208 bytes larger from
+tcc0 (200 in `.text` + 8 ripple in symtab/reloc offsets). amd64 and
+aarch64 satisfy `tcc1 == tcc2`; only riscv64 diverges.
+
+This is a **bug to investigate**, not just a "fatter code"
+observation. cc.scm should be a *faithful* (semantics-preserving)
+compiler — slower or larger output is acceptable, but tcc0 and tcc1
+must produce byte-identical output when run on the same source.
+That they don't on riscv64 means cc.scm's translation of tcc.flat.c
+into tcc0 changed what tcc0 *does at runtime*, not just how it's
+encoded. We don't care about peephole optimizations being missed; we
+do care that tcc0 makes different codegen decisions than tcc1
+makes.
+
+#### What's known
+
+The visible symptom: tcc0 emits 4 RISCV codegen patterns differently
+than tcc1 does:
+
+| Source pattern | tcc0 emits | tcc1 emits | Δ |
+|---|---|---|---|
+| `x = x - imm` (i32) | `addiw t,zero,imm; addw rd,rs,t` | `addiw rd,rs,imm` | +4 B |
+| `x = x & imm` | `addiw t,zero,imm; and rd,rs,t` | `andi rd,rs,imm` | +4 B |
+| zero-ext after `sext.w` | `sext.w r,r; slli r,r,0x20; srli r,r,0x20` | `sext.w r,r` | +8 B |
+| `x == 0xFFFFFFFF` (i32) | `addiw t,zero,-1; slli/srli; beq x,t,L` | `addi x,x,1; beqz x,L` | +8 B |
+
+These are decision points in `riscv64-gen.c` (immediate-folding,
+zero-ext elision). Same source code, same input C, but the running
+tcc0 takes the slow branch where the running tcc1 takes the fast
+one — even though both are compiled from the same `tcc.flat.c`.
+
+#### Hypothesis to test
+
+cc.scm likely miscompiles an integer comparison or bit-test inside
+the immediate-fits-in-instruction guard in `riscv64-gen.c`. Most of
+the missed patterns share the shape `if (small_int_fits) { fold } else
+{ materialize }`. If cc.scm gets the predicate wrong (e.g. signed vs.
+unsigned compare, or wrong branch on a particular bit pattern), tcc0
+falls into the materialize path on inputs where tcc1 takes the fold
+path.
+
+#### Repro / starting point
+
+```sh
+# In the riscv64 container with boot3 outputs present:
+$TCC0 -nostdlib -I $TCC_INC -include $SHIM -c -o /tmp/flat-tcc0.o tcc.flat.c
+$TCC1 -nostdlib -I $TCC_INC -include $SHIM -c -o /tmp/flat-tcc1.o tcc.flat.c
+# wc -c /tmp/flat-tcc0.o /tmp/flat-tcc1.o → 616100 vs 615892
+# objdump -d both, normalize addresses, diff to find divergent functions
+```
+
+The first divergent function in disassembly is `tal_free_impl` — a
+small refcount-decrement that hits the "x = x - 1" pattern. Good
+starting point because the function is short and the source path is
+narrow.
+
+Until this is fixed, tcc1 is the "shake-out" stage and tcc2 is the
+canonical compiler.

 ## Standalone `bootN.sh`: remaining host deps

diff --git a/scripts/boot3.sh b/scripts/boot3.sh
@@ -1,21 +1,37 @@
 #!/bin/sh
-## boot3.sh — standalone three-stage tcc bootstrap.
+## boot3.sh — standalone four-stage tcc bootstrap.
 ##
-## README's `(define tcc (tcc1 tcc.c))`: produces tcc-boot2 (cc.scm
-## compiles tcc.flat.c), tcc-tcc (tcc-boot2 compiles tcc.flat.c), and
-## tcc-tcc-tcc (tcc-tcc compiles tcc.flat.c). Stages 2 and 3 are the
-## bootstrap fixed-point check.
+## README's `(define tcc (tcc1 tcc.c))`: cc.scm compiles tcc.flat.c
+## into tcc0; tcc0 compiles tcc.flat.c into tcc1; tcc1 does the same
+## to produce tcc2; tcc2 does the same to produce tcc3.
 ##
-## ─── Inputs (host-side preconditions, NOT produced by this script) ───
+## tcc0 = tcc-source compiled by cc.scm
+## tcc1 = tcc-source compiled by tcc0
+## tcc2 = tcc-source compiled by tcc1
+## tcc3 = tcc-source compiled by tcc2
+##
+## The bootstrap fixed-point check is `tcc2 == tcc3`: once tcc is
+## compiling itself with no help from cc.scm, the chain reaches a
+## byte-identical fixed point. tcc0 ≠ tcc1 in *behavior* (not just
+## in code size) because cc.scm's emitted machine code introduces
+## subtle codegen-decision differences — e.g. on riscv64 cc.scm
+## misses several immediate-folding peepholes that tcc applies, so
+## tcc0(tcc.flat.c) emits ~200 more bytes of `.text` than
+## tcc1(tcc.flat.c) does. tcc1 is faithful tcc behavior (its source
+## is tcc.flat.c, run through the cc.scm-built tcc0 translator
+## semantically intact); tcc2 is the first binary whose machine code
+## was emitted by faithful tcc.
+##
+## ─── Inputs (host-side, auto-built if missing) ────────────────────────
 ## build/tcc/$TCC_TARGET/tcc.flat.c
-## — flattened tcc TU (run scripts/stage1-flatten.sh
-## --arch $TCC_TARGET to produce)
 ## build/tcc/$TCC_TARGET/tcc-0.9.26-1147-gee75a10c/{include,lib}
-## — tcc-0.9.26 unpacked tree (side-product of
-## stage1-flatten.sh)
+## — flattened tcc TU + unpacked tree; built
+## via scripts/stage1-flatten.sh --arch
+## $TCC_TARGET (host cc -E, no container)
 ## build/$ARCH/vendor/mes-libc/libc.flat.c
-## — flattened mes-libc TU (run
-## scripts/libc-flatten.sh --arch $ARCH)
+## — flattened mes-libc TU; built via
+## scripts/libc-flatten.sh --arch $ARCH
+## (host cc -E, no container)
 ##
 ## ─── Inputs (sources, copied into staging) ────────────────────────────
 ## scheme1/prelude.scm cc/cc.scm cc/main.scm — catm'd to cc.scm bundle
@@ -24,6 +40,9 @@
 ## vendor/seed/$ARCH/ELF.hex2 — ELF header fragment
 ## tcc-libc/$ARCH/start.S — _start, calls __libc_init+main
 ## tcc-libc/$ARCH/sys_stubs.S — sys_* syscall wrappers
+## (Throughout this script: tcc0/tcc1/tcc2/tcc3 are the four stages
+## above; tcc0 is the cc.scm-built bootstrap, tcc2/tcc3 form the
+## self-host fixed-point check.)
 ## tcc-libc/va_list_shim.h — gcc/tcc va_list bridge
 ## tcc-cc/mem.c — memcpy/memmove/memset/memcmp
 ## build/tcc/$TCC_TARGET/tcc-0.9.26-1147-gee75a10c/include/** (whole tree)
@@ -45,12 +64,15 @@
 ## tcc 0.9.26 has no aarch64 assembler (no arm64-asm.c),
 ## so .S inputs are pre-compiled host-side. amd64 and
 ## riscv64 have CONFIG_TCC_ASM in their backends and feed
-## .S straight to tcc-boot2 in stages C+D — no host tool.
+## .S straight to tcc0 in stages C+D — no host tool.
 ##
 ## ─── Outputs ──────────────────────────────────────────────────────────
-## build/$ARCH/boot3/tcc-boot2 — cc.scm-built tcc (compile 1)
-## build/$ARCH/boot3/tcc-tcc — tcc-boot2-built tcc (compile 2)
-## build/$ARCH/boot3/tcc-tcc-tcc — tcc-tcc-built tcc (compile 3)
+## build/$ARCH/boot3/tcc0 — cc.scm-built tcc (compile 1)
+## build/$ARCH/boot3/tcc1 — tcc0-built tcc (compile 2)
+## build/$ARCH/boot3/tcc2 — tcc1-built tcc (compile 3)
+## build/$ARCH/boot3/tcc3 — tcc2-built tcc (compile 4)
+## tcc2 and tcc3 are byte-identical (asserted at the end of this
+## script) — that equality is the fixed-point check.
 ##
 ## Usage: scripts/boot3.sh <arch>
 ## <arch> ∈ {aarch64, amd64, riscv64}
@@ -108,10 +130,17 @@ fi
 [ -x "$BOOT2/scheme1" ] || { echo "[boot3 $ARCH] missing $BOOT2/scheme1 (run scripts/boot2.sh $ARCH)" >&2; exit 1; }

 # ── prerequisite: host-flattened sources + unpacked tcc tree ──────────
-[ -e "$TCC_FLAT" ] || { echo "[boot3 $ARCH] missing $TCC_FLAT (run scripts/stage1-flatten.sh --arch $TCC_TARGET)" >&2; exit 1; }
-[ -e "$LIBC_FLAT" ] || { echo "[boot3 $ARCH] missing $LIBC_FLAT (run scripts/libc-flatten.sh --arch $ARCH)" >&2; exit 1; }
-[ -d "$TCC_DIR/include" ] || { echo "[boot3 $ARCH] missing $TCC_DIR/include (run scripts/stage1-flatten.sh --arch $TCC_TARGET)" >&2; exit 1; }
-[ -e "$TCC_DIR/lib/$LIB_HELPER_SRC" ] || { echo "[boot3 $ARCH] missing $TCC_DIR/lib/$LIB_HELPER_SRC" >&2; exit 1; }
+# tcc.flat.c + the unpacked $TCC_DIR/{include,lib} tree are produced
+# together by stage1-flatten.sh; libc.flat.c by libc-flatten.sh. Both
+# run on the host (cc -E), no container — auto-invoke if missing.
+if [ ! -e "$TCC_FLAT" ] || [ ! -d "$TCC_DIR/include" ] || [ ! -e "$TCC_DIR/lib/$LIB_HELPER_SRC" ]; then
+  echo "[boot3 $ARCH] flatten tcc.flat.c (host)"
+  scripts/stage1-flatten.sh --arch "$TCC_TARGET"
+fi
+if [ ! -e "$LIBC_FLAT" ]; then
+  echo "[boot3 $ARCH] flatten libc.flat.c (host)"
+  scripts/libc-flatten.sh --arch "$ARCH"
+fi

 # ── reset staging, copy inputs explicitly ─────────────────────────────
 rm -rf "$STAGE"
@@ -149,7 +178,7 @@ cp "$TCC_DIR/lib/$LIB_HELPER_SRC" "$STAGE/in/$LIB_HELPER_SRC"
 cp "$TCC_FLAT" "$STAGE/in/tcc.flat.c"
 cp "$LIBC_FLAT" "$STAGE/in/libc.flat.c"

-# tcc include tree (small, < 200KB) — copied wholesale so tcc-boot2's
+# tcc include tree (small, < 200KB) — copied wholesale so tcc0's
 # -I resolves stdarg.h etc. Recursive cp keeps directory layout.
 cp -R "$TCC_DIR/include/." "$STAGE/in/tcc-include/"

@@ -169,13 +198,15 @@ else
   ASM_BUILD_NEEDED=1
 fi

-# ── run the full Stage A + B + C + D pipeline in one container ────────
+# ── run the full Stage A + B + C + D + E pipeline in one container ───
 # Stage A: cc.scm bundle, libc.P1pp + tcc.flat.P1pp via scheme1 + cc.scm,
-# link tcc-boot2 ELF via M1pp + hex2pp.
-# Stage B: tcc-boot2 builds mem.o, libc.o, helper.o (va_list or lib-arm64).
-# Stage C: tcc-boot2 links tcc-tcc.
-# Stage D: tcc-tcc rebuilds helpers, links tcc-tcc-tcc.
-echo "[boot3 $ARCH] cc.scm bundle -> tcc-boot2 -> tcc-tcc -> tcc-tcc-tcc"
+# link tcc0 ELF via M1pp + hex2pp.
+# Stage B: tcc0 builds mem.o, libc.o, helper.o (va_list or lib-arm64).
+# Stage C: tcc0 compiles+links tcc1.
+# Stage D: tcc1 rebuilds helpers, compiles+links tcc2.
+# Stage E: tcc2 rebuilds helpers, compiles+links tcc3; the script
+# then asserts tcc2 == tcc3 (the fixed-point check).
+echo "[boot3 $ARCH] cc.scm -> tcc0 -> tcc1 -> tcc2 -> tcc3"
 podman run --rm -i --pull=never --platform "$PLATFORM" \
   --tmpfs /tmp:size=1024M \
   -e LIB_HELPER_SRC="$LIB_HELPER_SRC" \
@@ -197,15 +228,15 @@ $IN/scheme1 /tmp/cc-bundled.scm --lib=libc__ $IN/libc.flat.c /tmp/libc.P1pp
 # ── Stage A.3: scheme1 + cc.scm -> tcc.flat.P1pp ──────────────────────
 $IN/scheme1 /tmp/cc-bundled.scm --lib=tcc__ $IN/tcc.flat.c /tmp/tcc.flat.P1pp

-# ── Stage A.4: M1pp + hex2pp pipeline -> tcc-boot2 ELF ────────────────
+# ── Stage A.4: M1pp + hex2pp pipeline -> tcc0 ELF ─────────────────────
 $IN/catm /tmp/combined.M1pp \
   $IN/backend.M1pp $IN/frontend.M1pp $IN/libp1pp.P1pp \
   $IN/entry-libc.P1pp /tmp/libc.P1pp /tmp/tcc.flat.P1pp $IN/elf-end.P1pp
 $IN/M1pp /tmp/combined.M1pp /tmp/expanded.hex2pp
 $IN/catm /tmp/linked.hex2pp $IN/ELF.hex2 /tmp/expanded.hex2pp
-$IN/hex2pp -B 0x600000 /tmp/linked.hex2pp $OUT/tcc-boot2
+$IN/hex2pp -B 0x600000 /tmp/linked.hex2pp $OUT/tcc0

-# ── Stage B: tcc-boot2 builds helper objects ──────────────────────────
+# ── Stage B: tcc0 builds helper objects ───────────────────────────────
 # build_asm produces start.o + sys_stubs.o into $workdir. amd64/riscv64
 # assemble .S in-container via tcc's CONFIG_TCC_ASM (no -include flag —
 # the asm parser doesn't accept C typedefs from va_list_shim.h's
@@ -230,33 +261,56 @@ build_helpers() {
   "$cc" -nostdlib -I "$TCC_INC" $LIB_HELPER_DEFINES \
     -c -o "$workdir/$LIB_HELPER_OBJ" "$IN/$LIB_HELPER_SRC"
 }
-mkdir -p /tmp/stage2 /tmp/stage3
-build_asm $OUT/tcc-boot2 /tmp/stage2
-build_helpers $OUT/tcc-boot2 /tmp/stage2
+mkdir -p /tmp/stage1 /tmp/stage2 /tmp/stage3
+build_asm $OUT/tcc0 /tmp/stage1
+build_helpers $OUT/tcc0 /tmp/stage1

-# ── Stage C: tcc-boot2 -> tcc-tcc ─────────────────────────────────────
-$OUT/tcc-boot2 -nostdlib -I "$TCC_INC" -include $IN/va_list_shim.h \
+# ── Stage C: tcc0 -> tcc1 ─────────────────────────────────────────────
+$OUT/tcc0 -nostdlib -I "$TCC_INC" -include $IN/va_list_shim.h \
+  /tmp/stage1/start.o /tmp/stage1/sys_stubs.o \
+  /tmp/stage1/mem.o /tmp/stage1/libc.o \
+  /tmp/stage1/$LIB_HELPER_OBJ \
+  $IN/tcc.flat.c -o $OUT/tcc1
+chmod +x $OUT/tcc1
+
+# ── Stage D: tcc1 rebuilds helpers, links tcc2 ────────────────────────
+build_asm $OUT/tcc1 /tmp/stage2
+build_helpers $OUT/tcc1 /tmp/stage2
+$OUT/tcc1 -nostdlib -I "$TCC_INC" -include $IN/va_list_shim.h \
   /tmp/stage2/start.o /tmp/stage2/sys_stubs.o \
   /tmp/stage2/mem.o /tmp/stage2/libc.o \
   /tmp/stage2/$LIB_HELPER_OBJ \
-  $IN/tcc.flat.c -o $OUT/tcc-tcc
-chmod +x $OUT/tcc-tcc
+  $IN/tcc.flat.c -o $OUT/tcc2
+chmod +x $OUT/tcc2

-# ── Stage D: tcc-tcc rebuilds helpers, links tcc-tcc-tcc ──────────────
-build_asm $OUT/tcc-tcc /tmp/stage3
-build_helpers $OUT/tcc-tcc /tmp/stage3
-$OUT/tcc-tcc -nostdlib -I "$TCC_INC" -include $IN/va_list_shim.h \
+# ── Stage E: tcc2 rebuilds helpers, links tcc3 ────────────────────────
+# Self-host idempotence check: tcc2 compiling itself with its own
+# helpers must produce a byte-identical binary. This is the real
+# bootstrap fixed point — tcc0 → tcc1 isn't expected to converge
+# because cc.scm's emitted machine code introduces subtle codegen
+# differences in tcc0's behavior, but from tcc1 onward the chain
+# is tcc compiling tcc.
+build_asm $OUT/tcc2 /tmp/stage3
+build_helpers $OUT/tcc2 /tmp/stage3
+$OUT/tcc2 -nostdlib -I "$TCC_INC" -include $IN/va_list_shim.h \
   /tmp/stage3/start.o /tmp/stage3/sys_stubs.o \
   /tmp/stage3/mem.o /tmp/stage3/libc.o \
   /tmp/stage3/$LIB_HELPER_OBJ \
-  $IN/tcc.flat.c -o $OUT/tcc-tcc-tcc
-chmod +x $OUT/tcc-tcc-tcc
+  $IN/tcc.flat.c -o $OUT/tcc3
+chmod +x $OUT/tcc3
+
+if ! cmp -s $OUT/tcc2 $OUT/tcc3; then
+  s2=$(wc -c <$OUT/tcc2)
+  s3=$(wc -c <$OUT/tcc3)
+  echo "[boot3] FIXED-POINT FAIL: tcc2 ($s2) != tcc3 ($s3)" >&2
+  exit 1
+fi
 CONTAINER

 # ── copy outputs to final destination ─────────────────────────────────
-for f in tcc-boot2 tcc-tcc tcc-tcc-tcc; do
+for f in tcc0 tcc1 tcc2 tcc3; do
   cp "$STAGE/out/$f" "$OUT/$f"
   chmod 0700 "$OUT/$f"
 done

-echo "[boot3 $ARCH] OK -> $OUT/{tcc-boot2, tcc-tcc, tcc-tcc-tcc}"
+echo "[boot3 $ARCH] OK -> $OUT/{tcc0, tcc1, tcc2, tcc3} (fixed point: tcc2 == tcc3)"