commit 88f892c4c2d7df14d39dabecfbeb98b878e36b9b
parent 38f83b0b93daa7403a21625ead1ffcaf1da37f20
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Fri, 29 May 2026 14:13:39 -0700
doc: record 3-stage bootstrap state and the open -O1 cc.c param-bind miscompile
Diffstat:
 doc/BOOTSTRAP_O1.md | 157 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 157 insertions(+), 0 deletions(-)
diff --git a/doc/BOOTSTRAP_O1.md b/doc/BOOTSTRAP_O1.md
@@ -0,0 +1,157 @@
+# 3-Stage Bootstrap: state & open bugs
+
+The bootstrap builds cfree in three stages, with cfree compiling itself twice,
+and checks that the result is a fixed point:
+
+- **stage1** = the host (clang/asan) build of cfree (`build/cfree`, copied to
+ `build/<mode>/bootstrap/stage1`).
+- **stage1 compiles stage2**, **stage2 compiles stage3**, then `cmp stage2 stage3`.
+
+```
+make bootstrap-debug # -O0 path (HOST_OPTFLAGS=-O0)
+make bootstrap-release # -O1 path (the Makefile sets HOST_OPTFLAGS=-O1 when RELEASE=1)
+make bootstrap-test-toy # bootstrap-debug, then run test/toy against stage3
+```
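+
+The final `cmp stage2 stage3` is a plain byte-for-byte comparison. Below is a
+minimal C sketch of that check. The helper names are hypothetical and are not
+part of cfree:
+
+```c
+#include <stdio.h>
+
+/* Returns 1 if both streams hold identical bytes up to EOF, 0 otherwise.
+   Hypothetical helper mirroring what `cmp stage2 stage3` checks. */
+int streams_identical(FILE *a, FILE *b) {
+    for (;;) {
+        int ca = fgetc(a);
+        int cb = fgetc(b);
+        if (ca != cb) return 0;   /* differing byte, or one stream ended early */
+        if (ca == EOF) return 1;  /* both reached EOF together */
+    }
+}
+
+/* Opens both paths in binary mode; returns -1 if either cannot be opened. */
+int files_identical(const char *path_a, const char *path_b) {
+    FILE *a = fopen(path_a, "rb");
+    FILE *b = fopen(path_b, "rb");
+    int result = -1;
+    if (a && b) result = streams_identical(a, b);
+    if (a) fclose(a);
+    if (b) fclose(b);
+    return result;
+}
+```
+
+The fixed-point condition is `files_identical(stage2_path, stage3_path) == 1`.
+The real binary paths come from the Makefile.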
+
+Host = aarch64-macos (aa64 backend, the `-O1` native-emit path:
+`src/opt/pass_native_emit.c` + `src/arch/aa64/native.c`).
+
+---
+
+## -O0 — DONE (reproduces)
+
+`make bootstrap-debug` reproduces: `stage2 == stage3` (identical sha256), toy
+**1034 pass / 0 fail / 8 skip**.
+
+Earlier `-O0` bugs (weak syms, return coercion, sret reg-clobber — commit
+`826fa2a`; and a `type_qualified` struct-init-padding self-miscompile — commit
+`85446c9`) are fixed. The `type_qualified` one is worth remembering as a class of
+hazard: cfree lowers aggregate **initialization** field-by-field and does **not**
+copy inter-field padding, whereas aggregate **assignment** uses a full-width
+`copy_bytes`. Any `memcmp` over a struct value built by aggregate *init* can see
+uninitialized padding. Other `memcmp`-on-struct sites still exist (`src/cg/type.c`,
+around lines 329 and 347, comparing `attrs`). The safe idiom is to build the
+template by plain assignment (`Type tmpl; tmpl = *base;`).
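+
+The hazard can be shown with a small standalone C example. It uses a
+hypothetical `Pair` struct rather than cfree's `Type`. On common ABIs, a `char`
+followed by an `int` leaves three padding bytes, and field-by-field
+initialization never writes them.
+
+The two remedies shown here are swapped-in alternatives: compare field by
+field, or fill every byte before building the value. The document's own idiom
+instead relies on cfree's full-width `copy_bytes` for assignment. ISO C leaves
+padding values unspecified in general, so `memcmp` on a struct is only safe
+when every byte is known to have been written.
+
+```c
+#include <string.h>
+
+/* Hypothetical struct; on common ABIs there is padding between `tag` and `value`. */
+typedef struct {
+    char tag;
+    int value;
+} Pair;
+
+/* Field-wise equality: never reads padding bytes. */
+int pair_equal(const Pair *x, const Pair *y) {
+    return x->tag == y->tag && x->value == y->value;
+}
+
+/* Fills every byte (padding included) with `fill`, then stores the fields one
+   at a time, the way field-by-field aggregate init would. */
+Pair pair_build(char tag, int value, unsigned char fill) {
+    Pair p;
+    memset(&p, fill, sizeof p);
+    p.tag = tag;
+    p.value = value;
+    return p;
+}
+```
+
+For `a = pair_build('x', 42, 0xAA)` and `b = pair_build('x', 42, 0x55)`,
+`pair_equal(&a, &b)` returns 1. `memcmp(&a, &b, sizeof a)` may still be
+nonzero, which is the kind of spurious mismatch uninitialized padding causes.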
+
+---
+
+## -O1 — three aa64 codegen bugs fixed (commit `b520142`)
+
+Before this commit `make bootstrap-release` died on the **first** stage2 TU
+(`src/api/asm_emit.c`). After it, the `-O1` self-build compiles **and links a
+complete, runnable stage2**. `-O0` still reproduces and toy is still 1034/0.
+
+Per-file reproduction used throughout:
+
+```
+build/cfree cc -O1 -DNDEBUG -ffreestanding -nostdinc -Irt/include \
+ -fvisibility=hidden -Iinclude -Isrc -c <file> -o /tmp/x.o
+```
+
+### 1. `opt_ranges_overlap_kind` must use raw, not compressed, points
+`src/opt/pass_coalesce.c`. `range_compress_points` only keeps points that are a
+range boundary, so an interior instruction point shared by two live values gets
+dropped — collapsing a genuine 2-point overlap into a single compressed point
+that masquerades as the benign unit-overlap of a coalescable move. This let the
+`-O1` hint fallback in `opt_assign_ranges` place a live call result and a later
+x0-bound copy into the **same** hard reg (`src/cg/control.c` block 18:
+`call def=v44` then `copy v46=v1`, both x0). The COPY/swap pattern the
+unit-overlap is meant to permit is genuinely one *raw* point wide, so raw points
+distinguish the two cases with no false positives. **Fix:** iterate
+`raw_start`/`raw_end` instead of `start`/`end`.
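+
+Below is a sketch of the raw-point classification. It assumes closed integer
+intervals and uses hypothetical names (`LiveRange`, `OverlapKind`,
+`overlap_kind_raw`). The real `opt_ranges_overlap_kind` signature and range
+representation may differ.
+
+The property that matters is that the shared-point count is taken over raw
+points. Two ranges that share two instruction points can therefore never be
+reported as a unit overlap.
+
+```c
+/* Hypothetical live range over raw instruction points, inclusive on both ends. */
+typedef struct {
+    int raw_start;
+    int raw_end;
+} LiveRange;
+
+typedef enum {
+    OVERLAP_NONE,  /* disjoint */
+    OVERLAP_UNIT,  /* exactly one shared raw point: coalescable COPY/swap boundary */
+    OVERLAP_REAL   /* two or more shared raw points: must not share a hard reg */
+} OverlapKind;
+
+OverlapKind overlap_kind_raw(const LiveRange *a, const LiveRange *b) {
+    int lo = a->raw_start > b->raw_start ? a->raw_start : b->raw_start;
+    int hi = a->raw_end < b->raw_end ? a->raw_end : b->raw_end;
+    int shared = hi - lo + 1;  /* raw points covered by both ranges */
+    if (shared <= 0) return OVERLAP_NONE;
+    if (shared == 1) return OVERLAP_UNIT;
+    return OVERLAP_REAL;
+}
+```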
+
+### 2. Never park a live-across-call value in the caller-saved hint reg
+`src/opt/pass_lower.c`, `opt_assign_ranges` hint fallback. The +1000 caller-save
+penalty in `hard_reg_alloc_score` only deflects the out-of-allocable-set hint reg
+(e.g. x0 on aa64) when a cheaper reg is *found*; under high register pressure
+(`found == 0`) the fallback took the hint reg regardless, parking a cross-call
+value (x0-hinted via a copy chain from an earlier call result) in x0 where it
+collided with the next call's result (`src/api/asm_emit.c`: v38, live across two
+calls, used in a successor block). **Fix:** guard the hint-reg branch with
+`!(vi->live_across_call_freq && is_caller_saved(f, cls, hint))`.
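+
+The guarded fallback can be sketched as follows. Everything here is
+hypothetical scaffolding: `ValueInfo`, `pick_reg`, `NO_REG`, and
+`reg_is_caller_saved`, which is a bitmask stand-in for `is_caller_saved`. Only
+the guard condition mirrors the fix.
+
+When neither a cheaper register nor a safe hint exists, the sketch returns
+"no register". The value is then spilled rather than parked where the next
+call's result lands.
+
+```c
+#include <stdbool.h>
+#include <stdint.h>
+
+#define NO_REG (-1)
+
+/* Hypothetical per-value info; only the field the guard needs. */
+typedef struct {
+    unsigned live_across_call_freq;  /* nonzero if the value is live across a call */
+} ValueInfo;
+
+/* Hypothetical stand-in for is_caller_saved(): bit `reg` set means caller-saved. */
+static bool reg_is_caller_saved(uint32_t caller_saved_mask, int reg) {
+    return (caller_saved_mask >> reg) & 1u;
+}
+
+/* `best` is the cheapest allocable reg found (NO_REG when found == 0);
+   `hint` is the copy-chain hint reg (NO_REG if none). */
+int pick_reg(const ValueInfo *vi, int best, int hint, uint32_t caller_saved_mask) {
+    if (best != NO_REG) return best;
+    if (hint != NO_REG &&
+        !(vi->live_across_call_freq && reg_is_caller_saved(caller_saved_mask, hint)))
+        return hint;  /* value never has to survive a call inside `hint` */
+    return NO_REG;    /* caller spills the value instead */
+}
+```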
+
+### 3. aa64 needs three int scratch registers, not two
+`src/arch/aa64/native.c`. A 3-operand op whose dst and both sources all spill
+(e.g. `binop dst, a, b` with a non-encodable immediate operand, or
+`store [base+index], value`) needs three distinct scratch regs at emit time — the
+IR spill-rewrite round-robins operands across this pool and the native emitter
+materializes each into one. With only two, an all-spilled binop's immediate
+operand had nowhere to land (seen in `src/arch/mc.c` and
+`src/link/link_reloc_layout.c`). **Fix:**
+`aa_int_scratch = {x9, x10, x11}`; x11 moved from allocable to reserved in
+`aa_int_phys`. (Note `aa_int_allocable[]` still lists x11; that array feeds only
+the **-O0** `NativeDirectTarget`, which uses its own x16/x17 temps, so there is no
+conflict — the two emitters keep independent register models.)
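+
+The toy model below shows why the pool size matters. It assumes one
+instruction's spilled operands receive scratch registers round-robin. The
+function name and signature are hypothetical, and the integers stand for
+x9..x11.
+
+With a two-register pool, the third operand wraps around onto x9 and collides
+with the first.
+
+```c
+#include <stdbool.h>
+
+/* Hypothetical: hand out scratch regs to one instruction's spilled operands
+   round-robin over `pool`, writing them to `out`; returns false if any two
+   operands ended up in the same register. */
+bool scratch_assign(const int *pool, int pool_size, int n_spilled, int *out) {
+    for (int i = 0; i < n_spilled; i++)
+        out[i] = pool[i % pool_size];
+    for (int i = 0; i < n_spilled; i++)
+        for (int j = i + 1; j < n_spilled; j++)
+            if (out[i] == out[j]) return false;  /* two operands share one reg */
+    return true;
+}
+```
+
+`scratch_assign((int[]){9, 10}, 2, 3, regs)` fails.
+`scratch_assign((int[]){9, 10, 11}, 3, 3, regs)` succeeds, giving x9, x10, x11.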
+
+---
+
+## -O1 — OPEN: runtime entry-param-bind miscompile in `driver/cc.c`
+
+The `-O1`-self-compiled **stage2** compiler **segfaults compiling stage3's
+`abi.c`**. This is a *runtime* miscompile (wrong machine code in stage2), not a
+compile-time panic.
+
+### Reproduce
+```
+build/release/bootstrap/stage2/cc -O1 -DNDEBUG -ffunction-sections -fdata-sections \
+ -std=c11 -ffreestanding -nostdinc -Irt/include -fvisibility=hidden \
+ -Iinclude -Isrc -c src/abi/abi.c -o /tmp/x.o # exit 139
+```
+Because `build/cfree` is the clang build (stage1), compiling `driver/cc.c` with it
+at `-O1` reproduces the same miscompiled object directly, without rebuilding stage2.
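+
+Exit status 139 is how a POSIX shell reports a process killed by signal 11
+(SIGSEGV): 128 + 11.
+
+Below is a small C sketch for scripting the smoke test. The helper names are
+hypothetical. It accepts both forms because `system()` may report the raw
+signal status when the shell execs the command directly.
+
+```c
+#include <signal.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <sys/wait.h>
+
+/* Interprets a status from system() or waitpid(). A direct SIGSEGV shows up as
+   WIFSIGNALED; a shell that reports the crash as its own exit uses 128 + SIGSEGV. */
+bool crashed_with_segv(int status) {
+    if (status == -1) return false;  /* system() itself failed */
+    if (WIFSIGNALED(status)) return WTERMSIG(status) == SIGSEGV;
+    if (WIFEXITED(status)) return WEXITSTATUS(status) == 128 + SIGSEGV;
+    return false;
+}
+
+/* Runs `cmd` through /bin/sh and reports whether it segfaulted. */
+bool command_segfaults(const char *cmd) {
+    return crashed_with_segv(system(cmd));
+}
+```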
+
+### Localization
+Bisected (hybrid relink harness, below) across all subsystems —
+opt / cg / lang(c,cpp) / core / arch / obj / abi / api were all clean — down to
+`driver/*` and finally to **`driver/cc.c`**, function **`cc_alloc_arrays`**
+(line ~278).
+
+### Root cause
+`cc_alloc_arrays(CcOptions* o, int argc)` makes ~14 sequential
+`driver_alloc_zeroed(o->env, ...)` calls, so `o` is live across all of them and is
+(correctly) allocated to a **callee-saved** register, x19. The prologue saves the
+old x19 and stores the incoming arg (x0 = `o`) to a spill slot — but the body uses
+**x19** as `o` and the move/reload that should populate x19 (from x0, or from the
+spill slot) is **missing**. x19 therefore holds caller garbage (a leftover path
+string), so `ldr x0, [x19]` reads junk for `o->env` and `driver_alloc_zeroed`
+faults.
+
+Disassembly of `cc_alloc_arrays` (stage2, `-O1`):
+```
++292: str x19, [x29, #0x28] ; save old callee-saved x19
++296: str x0, [x29, #0x20] ; store incoming `o` to a spill slot
+...
++324: str x9, [x19, #0x10] ; <-- uses x19 as `o`, but x19 was NEVER loaded
++328: ldr x0, [x19] ; o->env (x19 = garbage -> crash downstream)
++33c: bl driver_alloc_zeroed
+```
+`aa_bind_native_param` (`src/arch/aa64/native.c` ~3498) is correct for a register
+destination (it emits `mov d, src`). The defect is **upstream**: the allocator
+gave `o`'s *param storage* a frame slot while a body value (a copy of `o`) lives
+in x19, and the connecting reload (`ldr x19, [slot]`) was elided. So the hunt is
+in the param-storage / copy-reload path at `-O1`, not in the aa64 bind code.
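+
+When reducing this, the pattern to preserve is a pointer parameter that stays
+live across several calls. The sketch below is a hypothetical reduction
+candidate: `Env`, `Opts`, `alloc_zeroed`, and `alloc_arrays` are made up. It is
+untested whether it actually reproduces the elided reload under
+`build/cfree cc -O1`. Moving `alloc_zeroed` into a separate TU may help keep it
+from being inlined.
+
+The sketch needs only `<stddef.h>`, so it should fit the freestanding flags used
+above. Under a correct compiler it returns 1 with three distinct zeroed arrays.
+
+```c
+#include <stddef.h>
+
+/* Hypothetical stand-ins for the driver env and CcOptions; not cfree's types. */
+typedef struct { unsigned char arena[256]; size_t used; } Env;
+typedef struct { Env *env; int *a; char **b; long *c; } Opts;
+
+/* Zero-filled bump allocator over a fixed arena; stands in for
+   driver_alloc_zeroed. Sizes are rounded up to a multiple of 16. */
+void *alloc_zeroed(Env *env, size_t n) {
+    n = (n + 15) & ~(size_t)15;
+    if (n > sizeof env->arena - env->used) return NULL;
+    unsigned char *p = env->arena + env->used;
+    env->used += n;
+    for (size_t i = 0; i < n; i++) p[i] = 0;
+    return p;
+}
+
+/* `o` must survive every call, so at -O1 it wants a callee-saved register,
+   and the entry code has to copy x0 (or its spill slot) into that register. */
+int alloc_arrays(Opts *o, int argc) {
+    o->a = alloc_zeroed(o->env, (size_t)argc * sizeof *o->a);
+    o->b = alloc_zeroed(o->env, (size_t)argc * sizeof *o->b);
+    o->c = alloc_zeroed(o->env, (size_t)argc * sizeof *o->c);
+    return o->a && o->b && o->c;
+}
+```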
+
+### This bug was *exposed*, not caused, by commit `b520142`
+With the pre-`b520142` compiler, `driver/cc.c` couldn't be compiled at `-O1` at
+all — it hit the interference verifier (block 103, op 15 / BINOP). Fix #2 above
+correctly routes the live-across-call param away from caller-saved x0 to
+callee-saved x19, which surfaced the latent param-bind defect. It is **not** a
+verifier issue and the verifier must not be relaxed.
+
+### Bisection harness (how the file was found)
+The release stage2 links **only** via cfree's own `ld` (Apple `ld` asserts on
+cfree objects; the stage2 binary's own `ld` also crashes once it is itself
+miscompiled). To build a hybrid stage2 with a chosen subset of TUs at `-O0`:
+
+1. Replay each TU's exact stage2 compile command from the bootstrap build log,
+ swapping `-O1`→`-O0`. All other flags **must** match the original; otherwise
+ `.build-config` changes and forces a full recompile, clobbering the swap.
+2. Relink by replaying the captured `ld -r … -o …/libcfree.o`,
+ `ar rcs …/libcfree.a …`, and the final `stage1/cc … -o …/stage2/cfree`
+ commands (grep them out of the build log).
+3. Smoke-test: `stage2/cc … -c src/abi/abi.c -o /tmp/x.o` (139 = still crashes,
+ 0 = the buggy TU is now in the `-O0` set).
+
+For a named backtrace, drop `-Wl,-S` from the final link and compile the suspect
+TU with `-g`. cfree emits symbols only for **non-static** functions, so a static
+function's code is attributed to the preceding global symbol in a backtrace; the
+`-g` line info is needed to identify it.