boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

commit db08235c3ef5e143f15c843c191dd2d9e08f446b
parent e586fa17898c1d3b99304f4cf22af60b00041837
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sat,  2 May 2026 15:50:45 -0700

docs: tcc-cc bug investigation report + gdb harness

docs/TCC-CC-INVESTIGATION.md documents the chase for the assert-fail-0
failures in the tcc-cc suite (14 of 15 failures hit the same site). Bug is
localized but not fixed: cc.scm-built tcc-boot2 corrupts ret.type in
cc__unary's function-call-return path, which makes is_float() return true on
a junk pointer-like value, allocates a float register for an int return, and
later trips a legitimate assert in arm64-gen.c:load() for the unsupported
int<->float register-pair case.

The report captures confirmed facts (verified by gdb on the running binary),
ruled-out hypotheses, ranked candidate root causes, and concrete next-step
suggestions so a fresh agent can pick this up without re-deriving the chain.

scripts/dbg-load-cbz.sh is the working gdb harness — runs in the
boot2-alpine-gcc:aarch64 container, breaks at key addresses in cc__unary /
cc__load / cc__vsetc / cc__is_float, and dumps register/stack state.
Addresses listed in the report match the binary at the report's commit.
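
The failure chain in the message above can be checked with a little arithmetic. The sketch below is an illustration, not code from the repository: `is_float_sketch`, `pick_return_reg`, and `check_junk_type` are hypothetical names, the value `0x007B4C19`, `VT_DOUBLE = 9`, and `TREG_F(0) = 20` are taken from the report, and the remaining type constants are assumed to match upstream tinycc.

```c
#include <assert.h>

/* Assumed constants: VT_DOUBLE = 9 and the float return register 20 come
   from the report; the others mirror upstream tinycc's tcc.h and may
   differ in tcc.flat.c. */
enum { VT_BTYPE = 0x000f, VT_INT = 3, VT_FLOAT = 8, VT_DOUBLE = 9, VT_LDOUBLE = 10 };
enum { REG_INT_RET = 0, REG_FLOAT_RET = 20 };

/* Simplified stand-in for is_float(): only the basic-type bits are examined,
   so any value whose low 4 bits are 8, 9, or 10 counts as floating point. */
static int is_float_sketch(int t)
{
    int bt = t & VT_BTYPE;
    return bt == VT_FLOAT || bt == VT_DOUBLE || bt == VT_LDOUBLE;
}

/* Mirrors the register choice the report quotes from cc__unary. */
static int pick_return_reg(int t)
{
    return is_float_sketch(t) ? REG_FLOAT_RET : REG_INT_RET;
}

/* Hypothetical self-check: a real int type code picks the int register,
   while the junk pointer-like value picks the float register. */
static void check_junk_type(void)
{
    assert(pick_return_reg(VT_INT) == REG_INT_RET);
    assert((0x007B4C19 & VT_BTYPE) == VT_DOUBLE);
    assert(pick_return_reg(0x007B4C19) == REG_FLOAT_RET);
}
```

Because only the low four bits are tested, a pointer that happens to end in `9` is indistinguishable from a `double` type code, which is how an int return ends up with a float register.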

Diffstat:
A docs/TCC-CC-INVESTIGATION.md | 303 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A scripts/dbg-load-cbz.sh | 36 ++++++++++++++++++++++++++++++++++++
2 files changed, 339 insertions(+), 0 deletions(-)

diff --git a/docs/TCC-CC-INVESTIGATION.md b/docs/TCC-CC-INVESTIGATION.md
@@ -0,0 +1,303 @@
+# tcc-cc bug investigation: 14 fixtures fail with `assert fail: 0@12051`
+
+## Status
+
+Bug **localized but not fixed**. Root cause is in `cc.scm`-built `tcc-boot2`'s
+runtime behavior: a corrupted `ret.type.t` in `cc__unary`'s function-call-return
+path leads to allocating a float register for an int return value, which then
+trips a legitimate `assert(0)` in `arm64-gen.c:load()` for the unsupported
+mixed int/float register class case.
+
+Of the 15 failing fixtures, 14 hit the same `assert fail: 0` at the same
+`tcc.flat.c:12051` site. The 15th (`220-const-promote`) is a separate
+"compile succeeds, exits wrong" issue not covered here.
+
+## TL;DR for the next agent
+
+You're looking for a **`cc.scm` miscompile of `cc__unary` (in `tcc.flat.c`)
+that corrupts the local `SValue ret` between `ret.type = s->type;`
+(line 7837) and `is_float(ret.type.t)` (line 7840)**.
+
+The corruption is layout-sensitive (any 4-byte instruction added in
+`cc__load` or anywhere in `cc__unary` makes the bug not fire on this
+fixture). The corrupted value reads back as `0x007B4C19` — a vstack
+region pointer-looking value, which has low 4 bits = 9 = `VT_DOUBLE`, so
+`is_float` returns true, `ret.r = TREG_F(0) = 20`, and `vsetc` pushes an
+SValue with `r=20` (a float register) for an int return. Later
+`gv(RC_INT) → load(0, vtop)` correctly asserts because there's no int↔float
+move for the `svr<0x30` register-pair case.
+
+## Reproduction
+
+```sh
+make test SUITE=tcc-cc NAMES=013-call # fails: assert fail: 0@12051
+```
+
+(The Makefile already drives `TCC_TARGET=ARM64` for the `tcc-cc` suite.
+Don't run `make tcc-boot2 ARCH=aarch64` standalone without setting
+`TCC_TARGET` — until commit `3317ca3` the default `TCC_TARGET=X86_64` was
+silently producing an x86_64-targeted `tcc.flat.c`. That's fixed.)
+
+To run native gdb against the binary:
+```sh
+podman run --rm --pull=never --platform linux/arm64 \
+  -v "$PWD":/work -w /work boot2-alpine-gcc:aarch64 \
+  sh scripts/dbg-load-cbz.sh
+```
+
+`scripts/dbg-load-cbz.sh` is a working diagnostic harness with breakpoints
+already wired up. Edit the gdb commands in `/tmp/g.gdb` (heredoc inside the
+script) to add new breakpoints. Note: the binary has no symbols — work
+in raw addresses.
+
+## Confirmed facts (verified by gdb on the running binary)
+
+1. **The `assert(0)` itself is correct given its inputs.** At the failing
+   `load()` call: `r=0` (target = `TREG_R(0)` = X0, an int reg) and
+   `sv->r=0x14` (= `TREG_F(0)` = 20, a float reg). `arm64-gen.c:12042-51`
+   asserts because there's no defined cross-class register move for the
+   `svr<0x30` branch. So the bug is upstream.
+
+2. **`vtop->r=20` was set by a `vsetc` call** at `LR=0x6c777c`, which is
+   `cc__unary` line 7864:
+   ```c
+   for (r = ret.r + ret_nregs + !ret_nregs; r-- > ret.r;) {
+       vsetc(&ret.type, r, &ret.c);
+       vtop->r2 = ret.r2;
+   }
+   ```
+   For `ret.r=20` (= TREG_F(0)) and `ret_nregs=1`, the loop runs once with
+   `r=20`, so `vsetc` pushes an SValue with `r=20`.
+
+3. **`ret.r` was set to 20 by line 7841** because `is_float(ret.type.t)`
+   returned true:
+   ```c
+   if (is_float(ret.type.t)) {
+       ret.r = reg_fret(ret.type.t); // TREG_F(0) = 20
+   } else {
+       ret.r = ((0));
+   }
+   ```
+
+4. **`is_float` was called with `t=0x007B4C19`**, not a valid type code.
+   Low 4 bits = 9 (= `VT_DOUBLE`), so `is_float` returns true. gdb output
+   confirmed: `is_float(t=0x7b4c19) -> TRUE lr=0x6c6e30`.
+
+5. **`ret.type.t` (read at SP+704 in `cc__unary`'s frame) holds
+   `0x007B4C19`** — a vstack-region pointer-like value, NOT a type code.
+   It must have been corrupted between line 7837 (the assignment `ret.type
+   = s->type`) and line 7840 (the `is_float` read).
+
+6. **Adjacent stack bytes look like memcpy loop variables.** At the
+   moment of the failing `is_float`:
+   - `[SP+704..711] = 0x007B4C19`
+   - `[SP+712..719] = 0x007B4C1A`
+
+   These differ by exactly 1, which is the signature of `_memcpy`'s
+   byte-by-byte `dest`/`src` walking (each iteration: `dest++; src++`).
+   `0x7B4C0A` is `vtop` for this fixture; `0x7B4C1A = vtop+16` is exactly
+   `&vtop->r`. So one of these slots is holding a pointer that walked
+   into the middle of an SValue during a struct-copy memcpy.
+
+7. **`mes-libc/string/memcpy.c:_memcpy` is byte-by-byte** (compiled to a
+   loop with 4 LDRBs + ORs + shifts per byte for ld_w, etc). cc.scm
+   compiles its locals into 240 bytes of frame; max slot offset used is
+   216, so it doesn't overflow its own frame. The `_memcpy → memcpy`
+   wrapper is the only `_memcpy` caller.
+
+8. **The bug is `cc.scm`-specific.** The gcc-built control
+   (`scripts/run-gcc-libc-flat-tcc.sh`) compiles the same `tcc.flat.c` and
+   passes 177/178 of the same fixtures. So the C source is fine; it's
+   cc.scm's lowering that breaks.
+
+9. **Layout-sensitive fix.** Inserting any 4-byte instruction (even
+   `%addi(t0, t0, 0)`) anywhere before the failing site in `cc__load`
+   makes the test pass. The CBZ at `0x73B6C0` (mod 64 = 0) is *not*
+   the cause — replacing all CBZ/CBNZ with CMP+B.cond+BR (which shifts
+   load() by ~144 bytes) didn't fix it. Layout sensitivity comes from
+   `tcc-boot2`'s runtime state changing as code positions shift, not from
+   any specific instruction's alignment.
+
+10. **`%li(rd, imm)` lowering was changed** from LDR-literal-pool to
+    MOVZ/MOVK chain (4 instructions, 16 bytes — same size as before).
+    This was investigated as a possible alignment fix; it isn't, but the
+    new lowering is kept as a defensible cleanup that eliminates literal
+    pool entries from the executable instruction stream.
+
+## Hypotheses, ranked by likelihood
+
+### H1 (most likely): cc.scm slot-allocator bug — `ret`'s slot overlaps memcpy state
+
+The fact that two slots in `cc__unary`'s frame (`SP+704` and `SP+712`)
+hold sequential pointer values one byte apart is the signature of
+`_memcpy`'s `dest`/`src` loop variables. cc.scm allocates locals as
+fixed slot offsets per function, but if its bookkeeping for `ret`'s
+slot collides with another local *or* if there's interference from a
+helper called via `gfunc_call`, then `ret.type.t` and `ret.type.ref` get
+clobbered.
+
+The shape of the corrupted value (`0x7B4C19` = vstack address midway
+through an SValue) strongly suggests the leak is from inside an
+SValue-copying memcpy. Likely sources of such struct copies between
+line 7837 (set ret.type) and line 7840 (read ret.type.t):
+- there are NO C statements between those lines, so the leak must
+  come from how cc.scm compiles **line 7837 itself** — `ret.type = s->type`
+  which is a 16-byte struct copy via memcpy.
+- Or a related copy emitted by cc.scm for a temporary.
+
+**Investigation steps:**
+1. Generate `cc__unary`'s P1pp around the function-call-return path
+   (lines ~7800-7870). Look for the slot offsets used for `ret.type` and
+   `s->type` and any `%call(&memcpy)` between them.
+2. Compare cc.scm's computed slot offset for `ret.type.t` against
+   what's actually loaded at the failing address `0x6c6dd0` (loads from
+   `SP+704`). Do they match?
+3. Hypothesis-test by adding a deliberate stack-padding local in
+   `unary()` (e.g. `volatile int __pad[64];` near `ret`) and re-running.
+   If that fixes it, slot allocation is the issue.
+
+### H2: cc.scm miscompile of `s->type` member access
+
+Maybe `s->type` is being computed wrong — reading from the wrong
+offset within Sym, returning a pointer-like value. cc.scm's C grammar
+includes `member-of-pointer-deref`; if `s->type` (where type is a
+nested struct) is mis-translated to `&s->type` (the address) or to
+`s + offsetof(Sym, type) + N` for the wrong N, the source of the
+struct copy is wrong.
+
+**Investigation steps:**
+1. Look at `cc.scm`'s parsing/codegen for `->` accessing a struct
+   member that is itself a struct.
+2. Inspect the emitted P1pp for `ret.type = s->type` — does the
+   memcpy source address look correct?
+
+### H3: ABI mismatch in cc.scm's struct-copy lowering for `CType`
+
+`CType` is 16 bytes (int t + Sym *ref) with 4 bytes of padding for
+8-byte alignment of `ref`. If cc.scm's struct-copy lowering walks bytes
+0..16 of source but skips/duplicates the padding region differently
+between source and destination, `ret.type.ref`'s bytes can land in
+`ret.type.t`'s position.
+
+The previous SValue struct-copy fix (`cc/cg-assign-struct`) was
+specifically called out in `docs/TCC-TODO.md` — a similar fix may be
+needed for nested CType copy.
+
+**Investigation steps:**
+1. Read `cc.scm`'s `cg-assign-struct` and verify it handles CType-sized
+   (16 byte) copies. Check whether `cc-assign` for the case
+   "lhs is a struct member that is itself a struct" routes through
+   `cg-assign-struct` correctly.
+2. Try adding a regression test: a tiny C program that does `dst.type
+   = src.type;` where type is a `CType`-shaped struct, run through
+   cc.scm and verify field values are preserved.
+
+### H4 (less likely): vstack pointer corruption
+
+Maybe `vtop` itself is moving incorrectly, and the SValue at the
+"failing vtop position" is actually unused garbage left behind from a
+prior operation. This would mean the "wrong sv->r=20" was set in some
+earlier vstack slot and we're just reading stale memory.
+
+We already confirmed via watchpoint that the `r=0x14` at
+`vtop[0]+16` (= `0x7B4C1A`) was *written* by the memcpy of
+`vtop[-1]` (a vswap), so the source of the bad value is `vtop[-1].r =
+20` immediately before the swap. Trace continues upstream: who set
+`vtop[-1].r = 20`? The trace showed the vsetc(r=0x14) at lr=0x6c777c
+WROTE r=20 to its target slot — and that slot is the one that becomes
+vtop[-1] after the next vpush. So this is downstream of H1-H3.
+
+### H5 (ruled out, listed for completeness)
+
+- LDR-literal alignment (8-byte literals at 4-byte aligned addresses):
+  ruled out — replaced `%li` with MOVZ/MOVK chain, bug persists.
+- CBZ at cache-line boundary (Apple Silicon erratum): ruled out — replaced
+  CBZ with CMP+B.cond+BR, bug persists.
+- Wrong opcode encoding for cond-branch: ruled out — count of CBZ vs CBNZ
+  in expanded.M1 matches the count of `%ifelse_nez` vs `%cmpset_eqz` in
+  source.
+- Hex2 mis-resolving labels: verified literals in binary point to correct
+  branch targets.
+
+## Suggested next steps in priority order
+
+1. **Read `cc.scm`'s `cg-assign-struct`** (search for `cg-assign-struct`
+   and the nested struct copy path). Verify handling for CType-sized
+   nested struct copies. This is the most likely culprit (H3).
+
+2. **Build a minimal C reproducer** that triggers the corruption without
+   needing the whole tcc compile. Something like:
+   ```c
+   typedef struct { int t; void *ref; } CType;
+   typedef struct { CType type; int r; } SValue;
+   int test(SValue *sv) {
+       SValue ret;
+       ret.type = sv->type;
+       return ret.type.t;
+   }
+   ```
+   Compile with cc.scm and verify the field copy works correctly. If it
+   doesn't reproduce in isolation, the trigger requires more state
+   (e.g. specific stack frame size or memcpy interaction).
+
+3. **Compare the emitted P1pp for the failing call site against a
+   working call site.** Find another place in `cc__unary` that does a
+   similar struct copy followed by an int read, and diff the two P1pp
+   sequences. The buggy one will have a structural anomaly.
+
+4. **If P1pp looks correct, drop down to gdb tracing.** Use
+   `scripts/dbg-load-cbz.sh` as a starting point. Set watchpoints on
+   stack slots in `cc__unary`'s frame to find what writes the
+   pointer-like value into `ret.type.t`'s slot. Don't add new code
+   inside `cc__unary` for diagnostic — it shifts the layout and the bug
+   disappears.
+
+## Useful addresses (will shift if anything in the load chain changes)
+
+In the **current broken binary** (no `%addi` workaround):
+- `cc__unary` entry: `0x006BFC70`
+- `cc__unary` is_float call (line 7840): `0x006C6E2C`
+- `cc__unary` vsetc call (line 7864): `0x006C7778`
+- `cc__load` entry: `0x007395EC`
+- `cc__load` outer-2 if test CBZ: `0x0073B6C0` (the actual assert site)
+- `cc__vsetc` entry: `0x006795BC`
+- `cc__vpop` entry: `0x0067A6B8`
+- `cc__save_reg` entry: `0x0067DB94`
+- `cc__get_reg` entry: `0x0067F130`
+- `cc__gv` entry: `0x00680918`
+- `cc__is_float` entry: `0x00672C54`
+- `cc__vtop` (global ptr): `0x007B4A32`
+- `cc__pvtop` (global ptr): `0x007B4A2A`
+- `cc____vstack` (array): `0x007B4A3A`
+- `_memcpy` entry: `0x006068BC`
+
+To regenerate addresses after a build change, use the recipe in
+`scripts/dbg-load-cbz.sh` (anchor on `cc__load`'s entry signature
+`FF0324D1` = SUB SP, SP, #2304, then walk byte counts in expanded.M1).
+
+## Files of interest
+
+- `tcc.flat.c:7745-7870` — `unary()`'s symbol-lookup and function-call
+  paths. The C code allegedly sets up `ret` correctly.
+- `tcc.flat.c:5006-5022` — `vsetc` source (where the wrong r=20 ends up
+  written into vtop).
+- `tcc.flat.c:11999-12053` — `arm64-gen.c:load()` (where the assert
+  fires; not actually wrong).
+- `cc/cc.scm` — the cc.scm compiler. Search for `cg-assign-struct`,
+  member access through pointer, slot allocation logic.
+- `P1/P1-aarch64.M1pp:385-403` — `p1_li` (already changed to MOVZ/MOVK,
+  not relevant to the bug).
+- `scripts/dbg-load-cbz.sh` — gdb diagnostic harness with working
+  breakpoints.
+
+## What's already in the tree from this investigation
+
+Committed:
+- `3317ca3` — Makefile: ARCH controls TCC_TARGET (no longer silently
+  building x86_64-targeted tcc-boot2 when running with ARCH=aarch64).
+
+Uncommitted (working tree):
+- `P1/P1-aarch64.M1pp` — `p1_li` rewritten as MOVZ/MOVK chain. Same
+  16-byte size, no functional change beyond eliminating literal-pool
+  reads. **Defensible cleanup; does not fix the bug.**
+- `scripts/dbg-load-cbz.sh` — gdb diagnostic helper (untracked).
diff --git a/scripts/dbg-load-cbz.sh b/scripts/dbg-load-cbz.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+set -e
+apk add --quiet gdb >/dev/null 2>&1
+
+cd /work
+cat > /tmp/g.gdb << 'GDB'
+set pagination off
+set width 0
+set print address off
+
+# vstack[7] is at 0x7b4c0a (sv->r at +16 = 0x7b4c1a)
+# Watch the r field for changes; trace what writes 20 to it.
+# Break at cc__vsetc entry — dump ret.type.t
+break *0x6795BC
+commands
+  printf " vsetc(type.t=%d, r=0x%llx) lr=0x%llx\n", *(int *)$x0, $x1, $lr
+  continue
+end
+
+break *0x6c6e2c
+commands
+  printf " is_float at 7840: X0=0x%llx vtop=0x%llx\n", $x0, *(unsigned long long *)0x7b4a32
+  printf " [vtop+0..7]=0x%llx [vtop+8..15]=0x%llx [vtop+16..23]=0x%llx\n", *(unsigned long long *)(*(unsigned long long *)0x7b4a32), *(unsigned long long *)(*(unsigned long long *)0x7b4a32 + 8), *(unsigned long long *)(*(unsigned long long *)0x7b4a32 + 16)
+  continue
+end
+
+# Also: load entry to know when we're in the failing call
+break *0x7395ec
+commands
+  printf ">> load(r=%d sv=0x%llx sv->r=0x%x)\n", $x0, $x1, *(unsigned short *)($x1 + 16)
+  continue
+end
+
+run -nostdlib build/aarch64/tcc-cc/start.o build/aarch64/tcc-cc/mem.o tests/cc/013-call.c -o /tmp/out_013
+GDB
+gdb -batch -x /tmp/g.gdb build/aarch64/tcc-boot2/tcc-boot2 2>&1
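
The regression test suggested under H3 in the report (copy a nested `CType`-shaped member and verify the fields) might look roughly like the sketch below. It is not part of the commit: the struct shapes come from the report's minimal reproducer, while `nested_copy_check`, `struct_copy_check`, and the return codes are hypothetical, and whether cc.scm accepts every construct used here is an assumption.

```c
/* Struct shapes taken from the report's minimal reproducer. */
typedef struct { int t; void *ref; } CType;
typedef struct { CType type; int r; } SValue;

/* Copy the nested CType out of *src, then compare every field.
   Returns 0 on success, or a small code naming the first mismatch. */
static int nested_copy_check(SValue *src)
{
    SValue dst;
    dst.type.t = -1;      /* poison the destination so a skipped copy is visible */
    dst.type.ref = 0;
    dst.r = 0x55;
    dst.type = src->type; /* the statement under test */
    if (dst.type.t != src->type.t) return 1;     /* int field lost or overwritten */
    if (dst.type.ref != src->type.ref) return 2; /* pointer field lost */
    if (dst.r != 0x55) return 3;                 /* copy spilled past the member */
    return 0;
}

/* Hypothetical entry point: a fixture's main() could return this value. */
int struct_copy_check(void)
{
    int marker = 0;
    SValue src;
    src.type.t = 3;          /* an int-like type code */
    src.type.ref = &marker;  /* a pointer value that cannot be mistaken for t */
    src.r = 20;
    return nested_copy_check(&src);
}
```

A nonzero result identifies which field was damaged. As the report notes for its own reproducer, a pass in isolation would not rule the hypothesis out, since the trigger may depend on frame size or on the surrounding memcpy traffic.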