kit

git clone https://git.ryansepassi.com/git/kit.git

commit 88f892c4c2d7df14d39dabecfbeb98b878e36b9b
parent 38f83b0b93daa7403a21625ead1ffcaf1da37f20
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Fri, 29 May 2026 14:13:39 -0700

doc: record 3-stage bootstrap state and the open -O1 cc.c param-bind miscompile

Diffstat:
A doc/BOOTSTRAP_O1.md | 157 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 157 insertions(+), 0 deletions(-)

diff --git a/doc/BOOTSTRAP_O1.md b/doc/BOOTSTRAP_O1.md
@@ -0,0 +1,157 @@
+# 3-Stage Bootstrap: state & open bugs
+
+The bootstrap compiles cfree with itself three times and checks the result is a
+fixed point:
+
+- **stage1** = the host (clang/asan) build of cfree (`build/cfree`, copied to
+  `build/<mode>/bootstrap/stage1`).
+- **stage1 compiles stage2**, **stage2 compiles stage3**, then `cmp stage2 stage3`.
+
+```
+make bootstrap-debug      # -O0 path (HOST_OPTFLAGS=-O0)
+make bootstrap-release    # -O1 path (Makefile sets HOST_OPTFLAGS=-O1 inside RELEASE=1)
+make bootstrap-test-toy   # bootstrap-debug, then run test/toy against stage3
+```
+
+Host = aarch64-macos (aa64 backend, the `-O1` native-emit path:
+`src/opt/pass_native_emit.c` + `src/arch/aa64/native.c`).
+
+---
+
+## -O0 — DONE (reproduces)
+
+`make bootstrap-debug` reproduces: `stage2 == stage3` (identical sha256), toy
+**1034 pass / 0 fail / 8 skip**.
+
+Earlier `-O0` bugs (weak syms, return coercion, sret reg-clobber — commit
+`826fa2a`; and a `type_qualified` struct-init-padding self-miscompile — commit
+`85446c9`) are fixed. The `type_qualified` one is worth remembering as a class of
+hazard: cfree lowers aggregate **initialization** field-by-field and does **not**
+copy inter-field padding, whereas aggregate **assignment** uses a full-width
+`copy_bytes`. Any `memcmp` over a struct value built by aggregate *init* can see
+uninitialized padding. Sibling `memcmp`-on-struct sites still exist
+(`src/cg/type.c` ~329/347 over `attrs`); the safe idiom is to build the template
+by plain assignment (`Type tmpl; tmpl = *base;`).
+
+---
+
+## -O1 — three aa64 codegen bugs fixed (commit `b520142`)
+
+Before this commit `make bootstrap-release` died on the **first** stage2 TU
+(`src/api/asm_emit.c`). After it, the `-O1` self-build compiles **and links a
+complete, runnable stage2**. `-O0` still reproduces and toy is still 1034/0.
+
+Per-file reproduction used throughout:
+
+```
+build/cfree cc -O1 -DNDEBUG -ffreestanding -nostdinc -Irt/include \
+  -fvisibility=hidden -Iinclude -Isrc -c <file> -o /tmp/x.o
+```
+
+### 1. `opt_ranges_overlap_kind` must use raw, not compressed, points
+`src/opt/pass_coalesce.c`. `range_compress_points` only keeps points that are a
+range boundary, so an interior instruction point shared by two live values gets
+dropped — collapsing a genuine 2-point overlap into a single compressed point
+that masquerades as the benign unit-overlap of a coalescable move. This let the
+O1 hint fallback in `opt_assign_ranges` place a live call result and a later
+x0-bound copy into the **same** hard reg (`src/cg/control.c` block 18:
+`call def=v44` then `copy v46=v1`, both x0). The COPY/swap pattern the
+unit-overlap is meant to permit is genuinely one *raw* point wide, so raw points
+distinguish the two cases with no false positives. **Fix:** iterate
+`raw_start`/`raw_end` instead of `start`/`end`.
+
+### 2. Never park a live-across-call value in the caller-saved hint reg
+`src/opt/pass_lower.c`, `opt_assign_ranges` hint fallback. The +1000 caller-save
+penalty in `hard_reg_alloc_score` only deflects the out-of-allocable-set hint reg
+(e.g. x0 on aa64) when a cheaper reg is *found*; under high register pressure
+(`found == 0`) the fallback took the hint reg regardless, parking a cross-call
+value (x0-hinted via a copy chain from an earlier call result) in x0 where it
+collided with the next call's result (`src/api/asm_emit.c`: v38, live across two
+calls, used in a successor block). **Fix:** guard the hint-reg branch with
+`!(vi->live_across_call_freq && is_caller_saved(f, cls, hint))`.
+
+### 3. aa64 needs three int scratch registers, not two
+`src/arch/aa64/native.c`. A 3-operand op whose dst and both sources all spill
+(e.g. `binop dst, a, b` with a non-encodable immediate operand, or
+`store [base+index], value`) needs three distinct scratch regs at emit time — the
+IR spill-rewrite round-robins operands across this pool and the native emitter
+materializes each into one. Two left an all-spilled binop's immediate operand
+with nowhere to land (`src/arch/mc.c`, `src/link/link_reloc_layout.c`). **Fix:**
+`aa_int_scratch = {x9, x10, x11}`; x11 moved from allocable to reserved in
+`aa_int_phys`. (Note `aa_int_allocable[]` still lists x11; that array feeds only
+the **-O0** `NativeDirectTarget`, which uses its own x16/x17 temps, so there is no
+conflict — the two emitters keep independent register models.)
+
+---
+
+## -O1 — OPEN: runtime entry-param-bind miscompile in `driver/cc.c`
+
+The `-O1`-self-compiled **stage2** compiler **segfaults compiling stage3's
+`abi.c`**. This is a *runtime* miscompile (wrong machine code in stage2), not a
+compile-time panic.
+
+### Reproduce
+```
+build/release/bootstrap/stage2/cc -O1 -DNDEBUG -ffunction-sections -fdata-sections \
+  -std=c11 -ffreestanding -nostdinc -Irt/include -fvisibility=hidden \
+  -Iinclude -Isrc -c src/abi/abi.c -o /tmp/x.o   # exit 139
+```
+`build/cfree` is the clang build, so `build/cfree` compiling `driver/cc.c` at
+`-O1` reproduces the same miscompiled object directly.
+
+### Localization
+Bisected (hybrid relink harness, below) across all subsystems —
+opt / cg / lang(c,cpp) / core / arch / obj / abi / api were all clean — down to
+`driver/*` and finally to **`driver/cc.c`**, function **`cc_alloc_arrays`**
+(line ~278).
+
+### Root cause
+`cc_alloc_arrays(CcOptions* o, int argc)` makes ~14 sequential
+`driver_alloc_zeroed(o->env, ...)` calls, so `o` is live across all of them and is
+(correctly) allocated to a **callee-saved** register, x19. The prologue saves the
+old x19 and stores the incoming arg (x0 = `o`) to a spill slot — but the body uses
+**x19** as `o` and the move/reload that should populate x19 (from x0, or from the
+spill slot) is **missing**. x19 therefore holds caller garbage (a leftover path
+string), so `ldr x0, [x19]` reads junk for `o->env` and `driver_alloc_zeroed`
+faults.
+
+Disassembly of `cc_alloc_arrays` (stage2, `-O1`):
+```
++292: str x19, [x29, #0x28]   ; save old callee-saved x19
++296: str x0,  [x29, #0x20]   ; store incoming `o` to a spill slot
+...
++324: str x9,  [x19, #0x10]   ; <-- uses x19 as `o`, but x19 was NEVER loaded
++328: ldr x0,  [x19]          ; o->env (x19 = garbage -> crash downstream)
++33c: bl  driver_alloc_zeroed
+```
+`aa_bind_native_param` (`src/arch/aa64/native.c` ~3498) is correct for a register
+destination (it emits `mov d, src`). The defect is **upstream**: the allocator
+gave `o`'s *param storage* a frame slot while a body value (a copy of `o`) lives
+in x19, and the connecting reload (`ldr x19, [slot]`) was elided. So the hunt is
+in the param-storage / copy-reload path at `-O1`, not in the aa64 bind code.
+
+### This bug was *exposed*, not caused, by commit `b520142`
+With the pre-`b520142` compiler, `driver/cc.c` couldn't be compiled at `-O1` at
+all — it hit the interference verifier (block 103, op 15 / BINOP). Fix #2 above
+correctly routes the live-across-call param away from caller-saved x0 to
+callee-saved x19, which surfaced the latent param-bind defect. It is **not** a
+verifier issue and the verifier must not be relaxed.
+
+### Bisection harness (how the file was found)
+The release stage2 links **only** via cfree's own `ld` (Apple `ld` asserts on
+cfree objects; the stage2 binary's own `ld` also crashes once it is itself
+miscompiled). To build a hybrid stage2 with a chosen subset of TUs at `-O0`:
+
+1. Replay each TU's exact stage2 compile command from the bootstrap build log,
+   swapping `-O1`→`-O0` (flags **must** match the original or `.build-config`
+   changes and forces a full recompile, clobbering the swap).
+2. Relink by replaying the captured `ld -r … -o …/libcfree.o`,
+   `ar rcs …/libcfree.a …`, and the final `stage1/cc … -o …/stage2/cfree`
+   commands (grep them out of the build log).
+3. Smoke-test: `stage2/cc … -c src/abi/abi.c -o /tmp/x.o` (139 = still crashes,
+   0 = the buggy TU is now in the `-O0` set).
+
+For a named backtrace: drop `-Wl,-S` on the final link and compile the suspect TU
+with `-g`. cfree emits symbols only for **non-static** functions, so static
+functions bucket under the previous global symbol — `-g` line info is needed to
+identify them.