Self-Build Bootstrap (current state and roadmap)
This roadmap covers the staged self-build of kit: building the compiler with itself until it reproduces its own output byte-for-byte. The mechanics and products of the build are described in ../BUILD.md; this document tracks the reproducibility goal, the current baseline, the open problems that remain, and the next steps for widening coverage. The bootstrap is the strongest end-to-end correctness oracle in the project, because it exercises the C frontend, every optimizer pass, the native backends, the object writers, the linker, and the archive tools, all on the compiler's own source.
Goal: a self-reproducing fixed point
The bootstrap builds kit three times and requires the last two stages to be identical:
- stage1 = the host-built kit, copied aside, exposing cc/ld/ar/ranlib/as.
- stage2 = the whole tree rebuilt with stage1 as the toolchain (CC/AR/LD).
- stage3 = the whole tree rebuilt again with stage2 as the toolchain.
- The invariant is cmp stage2/kit stage3/kit: they must be byte-identical.
Stage2 vs stage3 is the fixed point: once the compiler reproduces itself, a third
pass cannot change anything. The bootstrap drives the normal Makefile with
CC/AR/LD repointed at each stage's symlinks, so there is no separate build
system to maintain — it is the same rules run with kit as the toolchain. This
depends on the reproducible-build guarantees in ../BUILD.md
(deterministic ordering, no embedded timestamps/paths); any nondeterminism in
codegen or object layout surfaces here as a stage2/stage3 mismatch.
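The invariant itself is a single byte comparison. A minimal sketch of it as a POSIX sh function follows; the helper name check_fixed_point is hypothetical and not part of the kit tree, and the real check is driven by the Makefile targets.

```shell
#!/bin/sh
# Sketch only: check_fixed_point is a hypothetical helper, not a kit script.
# It compares two stage binaries byte-for-byte, as the bootstrap invariant requires.
check_fixed_point() {
    stage2_bin=$1
    stage3_bin=$2
    # cmp -s prints nothing and exits 0 only when the files are byte-identical.
    if cmp -s "$stage2_bin" "$stage3_bin"; then
        echo "fixed point: $stage2_bin == $stage3_bin"
        return 0
    fi
    echo "MISMATCH: $stage2_bin differs from $stage3_bin" >&2
    return 1
}
```

A typical call would be check_fixed_point stage2/kit stage3/kit; a non-zero exit status means the compiler did not reproduce itself.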
Driving targets (see ../BUILD.md):
- make bootstrap runs both the debug (-O0) and release (-O1) chains.
- make bootstrap-debug / make bootstrap-release run one chain.
- make test-bootstrap-toy additionally runs the Toy corpus through the bootstrapped compiler as a behavioral check on top of the byte-identity check.
Current baseline
Done (baseline) on aarch64-macos:
- Both the -O0 (debug) and -O1 (release) chains reach the fixed point: cmp stage2/kit stage3/kit is clean, and the per-object check across all *.o in stage2 vs stage3 reports zero differences in both modes.
- Both bootstrapped compilers run the full Toy corpus clean (1034 pass, 0 fail, 8 skip) across the run, link/native, C-backend, and Wasm paths at Toy opt levels 0 and 1.
Done on aarch64-linux (ELF), run natively inside an arm64 Linux container from the macOS host (see "Bootstrapping a Linux target from a non-Linux host" below):
- musl (alpine): both the -O0 and -O1 chains reach the fixed point; cmp stage2/kit stage3/kit is clean and the per-object check across all 321 *.o/*.a in stage2 vs stage3 reports zero differences in both modes. The bootstrapped stage3 runs the Toy corpus at 1365 pass / 15 fail / 39 skip. The 15 failures are not bootstrap/codegen issues (the per-object check is byte-identical): they are Mach-O-tuned .objdump golden substrings that differ on ELF, emitted C that the container's host clang rejects under -Werror, and one Linux JIT-TLS .tdata-init discrepancy (see below).
- glibc (debian): reaching the fixed point required a series of kit C-frontend / preprocessor compatibility fixes for the glibc + Linux-UAPI header set (musl's ISO-C headers never exercised them): erase __extension__ on all targets, map __signed__/__volatile__/__const__ to the canonical keywords, and support GNU named variadic macro parameters (args...) in the preprocessor.
Done on aarch64-freebsd (ELF), run natively inside the FreeBSD aarch64 VM from the macOS host (scripts/freebsd_bootstrap.sh aarch64; see "Bootstrapping a Linux target from a non-Linux host", since the FreeBSD VM path is the same shape):
Both the -O0 (debug) and -O1 (release) chains reach the fixed point: cmp stage2/kit stage3/kit is byte-identical in both modes. The bootstrapped stage3 runs the Toy corpus at 1378 pass / 2 fail / 39 skip in both chains; the 2 failures are the JIT-TLS .tdata-init R-lane discrepancy (141_threadlocal_mutate), a non-bootstrap gap shared with aarch64-linux (the in-process JIT does not set up a per-thread TLS block / copy the .tdata initializer; the native-link path is already .link.skip-gated), not a codegen issue.
Four fixes were needed, in the order they surfaced:
- kit cc accepting -rdynamic (FreeBSD's HOST_ENV_LDFLAGS passes it; the other ELF hosts do not).
- ELF symbol-version (Verneed/Versym) emission. FreeBSD's INO64 transition left stat/fstat/... as two incompatible struct stat ABIs behind a hidden FBSD_1.0 (compat) and the default FBSD_1.5; kit emitted unversioned undefined references, so the runtime bound the compat version and read st_size at the wrong offset, and stage2 then failed to read its own source files. The linker now reads each DSO's .gnu.version_d/.gnu.version and emits a matching .gnu.version_r + .gnu.version, gated on the DSO carrying versions (musl/static links unchanged; glibc links gain correct GLIBC_* requirements).
- (-O1) Deferred-symbol globalization in the assembler. -O1 deferred anonymous const-data / jump-table symbols (.Lkit_ro.N/.Lkit_jt.N) are LOCAL tombstones (obj_symbol_defer, removed=1) until opt_whole_module_finalize materializes them. promote_undef_externs (src/asm/asm.c), which globalizes undefined LOCAL externs, walked every slot including tombstones (it was the only obj_symiter consumer not honoring the removed contract) and flipped them to defined GLOBALs. It bit FreeBSD because <stdlib.h> injects a file-scope __asm__(".symver …") whose replay runs that pass before the deferred data is materialized, so the four hosted driver/env/*.o each defined a global .Lkit_ro.0 → duplicate definition of global symbol '.Lkit_ro.0'. Fix: skip removed tombstones. Not FreeBSD-specific: any TU with a file-scope asm + a deferred const-data symbol at -O1 reproduces it on any target.
- (-O1) --gc-sections rooting of DSO back-references. With the above fixed the stage2 link succeeded, but the -O1 kit failed to load (ld-elf.so.1: /lib/libc.so.7: Undefined symbol "__progname"): the release chain links -Wl,--gc-sections, and kit's section-GC liveness (src/link/link_resolve.c) did not root executable definitions that a linked DSO references, so __progname/environ (crt-defined, needed by libc.so.7) were collected out of the dynsym. read_elf_dso now records each DSO's undefined-symbol names and the GC pass roots the executable's definitions of them, matching GNU ld's default "keep what shared libraries need" behaviour (-O0 has no --gc-sections and is unaffected).
This gives four fully self-hosting configurations — aarch64-macos,
aarch64-linux (musl + glibc), and aarch64-freebsd — each at both -O0 and
-O1. The remaining work is breadth: the other native targets (x86-64, rv64),
and guarding the property over time.
Open problems and next steps
Widen target and platform coverage
The fixed point holds for aarch64-macos, aarch64-linux (musl + glibc), and aarch64-freebsd. The bootstrap should hold for every supported native target and object format. Until each is green it is an open question whether its backend + object writer are fully deterministic and self-consistent.
- Reach the fixed point on x86-64 (ELF and Mach-O) for both -O0 and -O1.
- Reach the fixed point on rv64 (ELF) for both -O0 and -O1.
- Reach the fixed point on aarch64-linux (ELF), distinct from the macOS Mach-O path already covered. Done for musl and glibc; the aarch64-linux backend + ELF writer are confirmed deterministic and self-consistent.
- For each new configuration, run the per-object diff and the Toy corpus through the bootstrapped compiler, not just the final cmp. Done for aarch64-linux (321/321 objects identical; Toy 1365/15/39).
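The per-object diff can be approximated with find and cmp. The sketch below is an illustration only: diff_stage_objects is a hypothetical name, and it assumes the two stage trees share the same relative layout and that object paths contain no whitespace or glob characters. It does not report files that exist only in the second tree.

```shell
#!/bin/sh
# Sketch only: diff_stage_objects is a hypothetical helper, not the kit check.
# Lists every *.o / *.a under stage_a whose counterpart under stage_b differs.
diff_stage_objects() {
    stage_a=$1
    stage_b=$2
    differing=0
    # Relative paths from stage_a map one-to-one onto stage_b (assumed layout).
    list=$(cd "$stage_a" && find . -type f \( -name '*.o' -o -name '*.a' \) | sort)
    # Unquoted expansion: assumes paths have no whitespace or glob characters.
    for rel in $list; do
        # A missing counterpart makes cmp fail, so it is counted as a difference.
        if ! cmp -s "$stage_a/$rel" "$stage_b/$rel"; then
            echo "DIFF $rel"
            differing=$((differing + 1))
        fi
    done
    echo "$differing differing"
    [ "$differing" -eq 0 ]
}
```

Each DIFF line names a diverging object, which is the translation unit to start triage from.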
Bootstrapping a Linux target from a non-Linux host
make bootstrap keys off the build host's own uname (HOST_OS + machine), so
it selects the native toolchain and object format with no cross-compilation. To
bootstrap aarch64-linux from the macOS dev host, run the normal three-stage
build inside an arm64 Linux container, where it is an ordinary native build:
scripts/linux_bootstrap.sh [musl|glibc] [both|debug|release] drives a podman container (alpine for musl, debian for glibc; the same image families the hosted test suite uses), provisions a seed clang + make + libc headers, and runs make bootstrap with the stage tree under build/linux-boot/<libc>/. KIT_LINUX_BOOT_TOY=1 additionally runs the Toy corpus through stage3. make bootstrap-linux (→ -musl) / make bootstrap-linux-glibc wrap it.
Three host-environment differences from the macOS reference, all handled by the
script / mk/bootstrap.mk:
- LeakSanitizer. The -O0 chain builds stage1 with ASan+UBSan as on macOS, but LSan (unsupported on Darwin, so never run there) flags kit's arena allocator, which deliberately never frees, and aborts every stage1 cc. The container sets ASAN_OPTIONS=…:detect_leaks=0, the honest equivalent of Darwin's behavior.
- -lc for the kit-compiled stages. On macOS kit gets its system headers via the -isysroot in HOST_SYSROOT_CFLAGS; on Linux that is empty and kit's hosted profile only wires up the libc include + library dirs once libc is requested, so mk/bootstrap.mk passes HOST_SYSROOT_{C,LD}FLAGS=-lc to the stage2/3 sub-makes (Linux/FreeBSD only).
- glibc header compatibility. glibc + Linux-UAPI headers use GCC-isms musl's cleaner headers don't; reaching the glibc fixed point needed the C-frontend / pp fixes listed in the baseline above.
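For reproducing the LeakSanitizer workaround outside the container, one possible shape is a wrapper that appends detect_leaks=0 to whatever ASAN_OPTIONS is already set. This is a sketch under assumptions: with_leaks_disabled is a hypothetical helper, and the other options the script sets are not shown here because they are not listed in this document.

```shell
#!/bin/sh
# Sketch only: with_leaks_disabled is a hypothetical wrapper, not part of
# scripts/linux_bootstrap.sh. It runs an external command with LSan leak
# detection turned off, preserving any ASAN_OPTIONS already in effect.
with_leaks_disabled() {
    # ${VAR:+...} expands to "<old value>:" only when VAR is set and non-empty,
    # so the result is either "detect_leaks=0" or "<old>:detect_leaks=0".
    ASAN_OPTIONS="${ASAN_OPTIONS:+$ASAN_OPTIONS:}detect_leaks=0" "$@"
}
```

For example, with_leaks_disabled make bootstrap-debug would run that chain with leak detection off while leaving the caller's environment untouched.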
Non-bootstrap gaps surfaced by the aarch64-linux Toy run (tracked, not fixed-point blockers — the per-object diff is byte-identical):
- JIT-TLS .tdata initializer. 141_threadlocal_mutate returns 3 instead of 43 on the R (in-process JIT) lane: the JIT zero-initializes the thread-local block instead of copying its .tdata initializer (40). Native link of this case is already .link.skip-gated; the JIT path is the gap.
- .objdump golden substrings for a few cases (122_data_entsize, 127_switch_forced_jump_table, 62_decl_data_attrs) encode Mach-O section / entsize spellings and need ELF-flavored sidecars.
- C-backend emitted source for ~7 cases is rejected by the container's host clang under -Wall -Wextra -Werror (host-toolchain strictness, varies by clang version).
The native-ELF Toy L lane links hosted (kit cc -lc, so the crt provides
_start) rather than freestanding kit ld, because an ELF executable needs a
crt entry where Mach-O drives LC_MAIN straight to main; test/toy/run.sh
selects this automatically on non-Darwin hosts (KIT_TOY_L_HOSTED).
These connect to the per-arch backend state tracked in ../CODEGEN.md and ../ARCH.md, and to the object/format paths in ../OBJ.md and ../LINK.md / LINKER.md. A new arch's first bootstrap is also the most thorough regression test those components get.
Guard the property over time
The fixed point is easy to break with a single nondeterministic or miscompiling change, and a regression is expensive to bisect after the fact.
- Run make bootstrap (or at least one chain) in CI on the reference host so breakage is caught at the offending change, not several commits later.
- Keep the per-object diff available as a first-line triage signal: it points directly at the diverging translation unit, which is far cheaper than diffing whole linked binaries.
Cross-bootstrap (stretch)
The current chains are native (host arch building host arch). A cross-bootstrap — host kit building a stage2 for a different target, then validating that stage2 reproduces a stage3 when run under ../EMU.md or on hardware — would prove the backends independent of the host. This is a stretch goal that depends on the emulator being able to host the full compiler.
Triage playbook for fixed-point regressions
When a stage2/stage3 mismatch (or a stage3 link failure) appears, the following approach has proven effective and should be the default starting point.
Use object reproduction, not just "does it link", as the oracle. A stage3
link failure is usually a symptom of a malformed object emitted earlier, not a
linker bug. The decisive question is whether stage2, used as a compiler,
reproduces the same .o that the host-built compiler produces. Compile one
suspect TU with both the host kit and the stage2 kit using identical
flags, then cmp the two objects. This separates malformed-object bugs from
link-driver symptoms and points straight at the diverging codegen.
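The object-reproduction check can be scripted as below. This is a sketch, not a kit script: repro_tu is a hypothetical helper, the two compiler paths are supplied by the caller, and it assumes both accept the conventional cc-style -c and -o options.

```shell
#!/bin/sh
# Sketch only: repro_tu is a hypothetical helper, not part of the kit tree.
# Usage: repro_tu HOST_CC STAGE2_CC SOURCE [FLAGS...]
# Exit status: 0 = objects identical, 1 = objects differ, 2 = a compile failed.
repro_tu() {
    host_cc=$1
    stage2_cc=$2
    src=$3
    shift 3
    out_dir=$(mktemp -d) || return 2
    # Identical flags for both compiles: only the compiler binary varies.
    "$host_cc" "$@" -c "$src" -o "$out_dir/host.o" || return 2
    "$stage2_cc" "$@" -c "$src" -o "$out_dir/stage2.o" || return 2
    if cmp -s "$out_dir/host.o" "$out_dir/stage2.o"; then
        echo "reproduced: $src"
        rm -rf "$out_dir"
        return 0
    fi
    # Keep both objects so the divergence can be inspected afterwards.
    echo "DIVERGED: $src (objects kept in $out_dir)"
    return 1
}
```

A DIVERGED result points at codegen for that source file; a reproduced result on every suspect TU shifts suspicion toward the link step.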
Narrow with hybrid relinks. Relink stage2 after replacing one suspect TU (or one piece of a split TU) with a clang-built object, then use that stage2 to compile the known-differing target object. This isolates whether a failure is in the linker itself or in codegen for a specific source file.
Inspect MIR around the suspect symbol. A temporary filtered MIR dump around the target function, taken after lowering and the combine pass, is usually enough to see the divergence (e.g. a call argument that should reference an allocable register but instead references a backend scratch register).
Avoid -g while triaging -O1 codegen. Debug info changes object layout and
can create or hide layout-sensitive bugs; one historical "regalloc" diagnosis was
actually a -g artifact. Triage on the non--g object first.
Root-cause classes seen at the fixed point
These bug classes were responsible for past -O1 fixed-point and stage3-link
failures. They are fixed in the baseline, but they map the parts of the pipeline
most likely to break the property again, so they are worth keeping in mind when a
new arch or platform is brought up. See ../OPT.md for the passes.
Operand clobber in native emit. Materializing the left operand of a binop, compare, or compare-branch into a scratch register that already holds the right operand. The general rule: compute the RHS location first and exclude its register when materializing the LHS. A real instance produced 1 << 1 for 1u << (n & 31), which corrupted Mach-O section alignments and only manifested as a downstream ld -r / stage3 link failure.
Copy propagation across backend scratch registers. Treating backend scratch registers as ordinary hard registers during the combine pass: scratch registers may appear in lowered MIR, but they must not be extended across later instructions, because native lowering reuses them as transient temporaries. A real instance rewrote a stack-argument call operand back to scratch x9, which was then clobbered before the store, sending an unrelated value into a stack slot and flipping an inline-always flag.
Coalesce overlap checks must use raw range points, not compressed points.
Lower-pass hint fallback must not place values that are live across a call into caller-saved hint registers.
Native scratch budget. A backend needs enough integer scratch registers for all-spilled three-operand operations (aa64 needs three).
Aggregate copy/set with pointer operands. Pointer-valued operands of aggregate copy/set must not force-home the pointer local; genuinely frame-backed pointer locals need prematerialized indirect bases.
The throughline: the fragile interactions are between the optimizer's register-level reasoning (../OPT.md) and the backend's scratch-register discipline (../ARCH.md), with the object/link layer (../OBJ.md, ../LINK.md) as where the symptom finally surfaces. New backends should expect to re-litigate these before reaching their own fixed point.