kit

git clone https://git.ryansepassi.com/git/kit.git

commit dc1fabbf4cea6f8570680c2365efc94d2cbf4a09
parent 0a2faa92c8f3b6a302e7f621f78a92ba4b4180c7
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Fri,  5 Jun 2026 06:40:34 -0700

doc/plan: add stack-trace builtins & runtime backtrace roadmap

Proposes a three-layer design: GCC-compatible __builtin_return_address /
__builtin_frame_address primitives (FP-chain lowering, enabled by kit always
keeping a frame pointer), a freestanding __kit_backtrace capture helper, and a
symbolizing __kit_print_backtrace print path that keeps the DWARF reader on the
hosted side of the freestanding boundary.

Diffstat:
A doc/plan/BACKTRACE.md | 228 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M doc/plan/README.md    |   1 +
2 files changed, 229 insertions(+), 0 deletions(-)

diff --git a/doc/plan/BACKTRACE.md b/doc/plan/BACKTRACE.md
@@ -0,0 +1,228 @@
+# Plan: stack-trace builtins & runtime backtrace
+
+## Status — 2026-06-05 — proposed; nothing built yet
+
+kit has no way for compiled code to inspect its own call stack. This roadmap
+adds that capability in three layers: GCC-compatible primitive **builtins**
+(`__builtin_return_address`, `__builtin_frame_address`), a freestanding runtime
+**capture** function (`__kit_backtrace`), and a **symbolizing print** path
+(`__kit_print_backtrace`) that turns return addresses into `func at file:line`.
+
+Matching design docs once shipped: [../FRONTENDS.md](../FRONTENDS.md) (the
+builtins), [../RUNTIME.md](../RUNTIME.md) (the rt helpers), [../DWARF.md](../DWARF.md)
+(symbolization).
+
+## Why
+
+- **Portability.** `__builtin_return_address` / `__builtin_frame_address` are a
+  de-facto part of the GCC/Clang surface. Real C code (libc `backtrace`,
+  sanitizer shims, allocators, profilers, `unwind`-free panic handlers) uses
+  them; kit currently can't compile any of it.
+- **Diagnostics.** `__kit_assert_fail` (`rt/lib/assert/assert.c`) and the
+  emulator fault path (`src/emu/emu.c`, `compiler_panic`) currently die silently
+  with `__builtin_trap()`. A backtrace at the trap point is the single biggest
+  debuggability win for kit-compiled programs.
+- **It is cheap here, specifically.** kit maintains a frame pointer on **every**
+  backend and has **no `-fomit-frame-pointer`** (x29 on aarch64, rbp on x64,
+  s0/x8 on rv64; `AA_FP = 29` at `src/arch/aa64/native.c:61`). Every prologue
+  stores a `{saved_fp, saved_ra}` frame record. Frame-pointer-chain walking is
+  therefore *reliable*, with no unwind tables and no `.eh_frame` dependency.
+
+## What already exists (and what it can't do)
+
+- **`.eh_frame` CFI** is emitted by default for hosted targets
+  (`src/arch/mc.c:736`, `mc_emit_eh_frame`), and **off for freestanding**.
+- **A CFI unwinder**, `kit_dwarf_unwind_step` (`src/debug/dwarf_cfi.c:213`),
+  interprets FDE/CIE programs — but deliberately takes **no memory provider**, so
+  it *cannot self-unwind a live stack*. It is built for the dbg/JIT path where a
+  session reads target memory out-of-band (`driver/cmd/dbg.c:1010`, the `bt`
+  command). It is not a candidate for in-process capture.
+- **Symbolization** (`kit_dwarf_addr_to_line`, `kit_dwarf_func_at`,
+  `include/kit/dwarf.h:21`) is mature — it backs `addr2line`
+  (`driver/cmd/addr2line.c`) and `dbg bt`. But it lives in **`libkit.a`, not the
+  freestanding runtime `rt/`**. Pulling it into a freestanding image is a
+  non-goal (see L3).
+- **The runtime has zero unwind/backtrace code today.** `rt/lib/stack/` exists
+  but holds only the Windows `chkstk` helper — a natural home for the new
+  capture code.
+
+Design consequence: **capture via the FP chain; symbolize via the existing
+DWARF reader, kept on the hosted side of the boundary.** Do *not* reuse the CFI
+unwinder for self-capture.
+
+---
+
+## L1 — Primitive builtins (`__builtin_return_address`, `__builtin_frame_address`)
+
+GCC semantics: `__builtin_frame_address(n)` returns the frame address of the
+current function (n=0) or its n-th caller; `__builtin_return_address(n)` returns
+the return address into that frame. The level argument **must be an integer
+constant** (kit validates via the existing `eval_const_int()` path, as
+`__builtin_offsetof` already does at `parse_expr.c:1331`). Out-of-range / runaway
+walks are allowed to return a garbage-but-safe value or 0, matching GCC's "use 0
+only with care" contract.
+
+### Lowering: two new CG intrinsics, FP-chain only
+
+Add to `KitCgIntrinsic` (`include/kit/cg.h:916`):
+
+```
+KIT_CG_INTRIN_FRAME_ADDRESS,  /* pop level(u32 const); push void* */
+KIT_CG_INTRIN_RETURN_ADDRESS, /* pop level(u32 const); push void* */
+```
+
+Both lower through one shared FP-walk so level 0 and level N use the same path,
+and so level 0's return address comes from the **spilled** frame-record slot (not
+the live LR/RA, which may be clobbered mid-function):
+
+| arch | FP reg | `frame(0)` | walk one frame | return addr from frame F |
+|------|--------|-----------|----------------|--------------------------|
+| aarch64 | x29 | x29 | `fp = *(fp)` | `*(fp + 8)` (saved x30) |
+| x86-64 | rbp | rbp | `fp = *(fp)` | `*(fp + 8)` (pushed retaddr) |
+| rv64 | s0/x8 | s0 | `fp = *(fp - 16)` | `*(fp - 8)` (saved ra) |
+
+For a constant level the walk unrolls to `level` dependent loads (typically 0–2),
+so no loop is emitted. wasm has no FP chain → **diagnose unsupported**, exactly
+as the IRQ/cache intrinsics already do per-arch.
+
+### Files to touch (the standard "new value-producing intrinsic" path)
+
+- `include/kit/cg.h` — two enum entries + doc comments.
+- `src/cg/arith.c:1726` — two rows in the `KitCgIntrinsic → INTRIN_*` table.
+- `src/cg/cgtarget.h:148` — two `INTRIN_*` enum entries
+  (`INTRIN_FRAME_ADDRESS`, `INTRIN_RETURN_ADDRESS`).
+- `lang/c/parse/parse_priv.h:231` + `parse.c:1526` — intern
+  `__builtin_return_address` / `__builtin_frame_address` symbols.
+- `lang/c/parse/parse_expr.c` (in `try_parse_builtin_call`, ~1696–2018) — two
+  handlers: parse the constant level via `eval_const_int`, then emit the
+  intrinsic with result type `void*`. New `cg_adapter.c` helper
+  `pcg_frame_or_return_address(p, kind, level)`.
+- Per-arch O0 lowering: `src/arch/aa64/native.c` (~3572), `src/arch/x64/native.c`
+  (~3378), `src/arch/riscv/native.c` (~2992) — emit the FP-walk loads;
+  `src/arch/wasm/emit.c` (~1590) + `src/arch/c_target/c_emit.c` (~2603) — handle
+  or diagnose (C target can emit `__builtin_*` straight through to the host
+  compiler).
+- Capability hooks: `src/arch/{aa64,x64,riscv,wasm}/arch.c` (alongside the
+  existing `KIT_CG_INTRIN_TRAP` cases at e.g. `aa64/arch.c:197`).
+- **Optimizer (O1/O2):** unlike `INTRIN_TRAP`/`INTRIN_LONGJMP` these are *not*
+  control-flow terminators, so the special-casing scattered through `src/opt`
+  (`pass_cfg.c:43`, `cg_ir_lower.c:286`, `pass_ssa.c:818`, …) does **not** apply.
+  But they **read memory and depend on the live frame**, so they must be modeled
+  as memory-reading / non-pure: not hoistable out of the function, not CSE'd
+  across a call, level-0 not sunk past a frame-pointer change. Audit the O1 emit
+  path (`src/opt/pass_native_emit.c`, `cg_ir_lower.c`) to lower them like an
+  ordinary load with a frame dependency. **This is the main risk area** and wants
+  dedicated O1 tests.
+
+### Tests (L1)
+
+- `test/toy/` — a CG-API toy case exercising both intrinsics at levels 0/1/2.
+- `test/parse/cases/` — `builtin_NN_return_address.c` etc.; an error case for a
+  non-constant level.
+- Smoke (`test/smoke`, exit-code-42 convention): a known 2-deep call chain where
+  `f` returns `__builtin_return_address(0)` and `main` compares it against a
+  label/`&&`-style anchor, asserting it lands inside the caller's range. Run at
+  **O0 and O1** on x64 + aa64 + rv64.
+
+---
+
+## L2 — Capture: `__kit_backtrace` (freestanding runtime fn)
+
+Surface decision (confirmed): **primitives are builtins; capture/print are
+runtime functions**, mirroring the GCC-builtin / glibc-`backtrace` split. New
+freestanding TU `rt/lib/stack/backtrace.c`, declared in a new public runtime
+header `rt/include/kit/backtrace.h`:
+
+```c
+/* Fill buf[0..max) with return addresses, innermost first, skipping the
+ * innermost `skip` frames (skip >= 1 hides __kit_backtrace itself).
+ * Returns the number of frames written. Freestanding: pure FP walk, no libc,
+ * no DWARF, works on every target that keeps a frame pointer (all of kit's). */
+int __kit_backtrace(void** buf, int max, int skip);
+```
+
+Implementation is the L1 walk expressed in portable C: seed from
+`__builtin_frame_address(0)`, then loop `fp = *(void**)fp` reading the saved-RA
+slot per the table above, stopping on NULL fp, an fp that doesn't increase
+(stack grows down — detect cycles/garbage), or `max`. The per-arch slot offsets
+are the *only* target knob; keep them in one small `static` per the
+RUNTIME.md "no target-dispatch ifdef" rule (parameterize, don't `#ifdef`-cascade
+— select by a build-time constant the way the int/fp helpers do).
+
+- `mk/rt.mk` — add `rt/lib/stack/backtrace.c` to every variant (it already
+  compiles `rt/lib/stack/` for the Windows chkstk helper).
+- **Hook the trap paths:** make `rt/lib/assert/assert.c::__kit_assert_fail` call
+  `__kit_print_backtrace()` (L3) before `__builtin_trap()`. Because the symbol is
+  `weak`, a freestanding user with no output sink can still override it.
+
+### Tests (L2)
+
+- `test/rt/cases/backtrace_capture.c` — build a known N-deep recursion, capture,
+  assert depth and monotonic frame addresses; `return 42` on success. Runs under
+  the existing `test/rt/run.sh` harness across variants.
+
+---
+
+## L3 — Symbolize & print: `__kit_print_backtrace`
+
+This is where the freestanding boundary bites: turning an address into
+`func at file:line` needs the DWARF reader, which is **libkit, not rt**. Three
+sub-options, ordered by how cleanly they respect that boundary. Recommend
+shipping **L3a now**, leaving L3b/L3c as documented extensions.
+
+- **L3a — raw print + out-of-process symbolization (recommended default).**
+  `__kit_print_backtrace()` lives in rt, walks via `__kit_backtrace`, and writes
+  raw lines (`#0 0x401136`, …) to a host-provided sink (a weak
+  `__kit_backtrace_write(const char*, size_t)` the host or `_start` wires to
+  `write(2)`; freestanding default is a no-op). Symbolization is then a separate
+  step through the **existing** `kit addr2line` tool (or a thin new `kit
+  symbolize` that batches). Zero new symbolization code, fully freestanding,
+  matches how minimal panic handlers work in the wild.
+
+- **L3b — in-process self-symbolization (hosted-only).** A trimmed line/func
+  reader (reusing `kit_dwarf_addr_to_line` + `kit_dwarf_func_at`) linked into a
+  **hosted-only** archive — e.g. `libkit_bt.a` or a `*-hosted` rt variant — that
+  opens the running image's own DWARF. Heavy (drags in the DWARF reader and an
+  image-self-map); strictly opt-in, never in the freestanding default. Only build
+  if a concrete consumer needs in-binary symbolized panics.
+
+- **L3c — tool-side auto-backtrace.** `kit run` / `kit emu` / `dbg` already own a
+  DWARF reader and the `dbg bt` rendering path (`driver/cmd/dbg.c:1010`). Hook
+  their fault/trap handlers (e.g. the `EMU_TRAP_FAULT` → `compiler_panic` site in
+  `src/emu/emu.c`) to print a symbolized backtrace automatically. This is the
+  highest-value, lowest-risk symbolized experience because it reuses everything
+  and never crosses into rt. Largely independent of L1/L2 (the tools can unwind
+  via their own session memory + `kit_dwarf_unwind_step`).
+
+### Tests (L3)
+
+- L3a: smoke test piping captured addresses through `kit addr2line`, asserting
+  the expected function names appear.
+- L3c: an `kit emu` fault test asserting a symbolized frame line on stderr.
+
+---
+
+## Suggested sequencing
+
+1. **WS1 — L1 primitives, O0**, all three native arches + parse/toy tests. Ship
+   the GCC-compatible surface first; it's the foundation and independently useful.
+2. **WS2 — L1 at O1/O2**: the optimizer memory-effect modeling + O1 smoke tests.
+   (Highest-risk slice; isolate it.)
+3. **WS3 — L2 `__kit_backtrace`** in rt + capture test + assert-hook.
+4. **WS4 — L3a** raw print + `kit addr2line` round-trip; wire into assert/emu.
+5. **WS5 — L3c** tool-side auto-backtrace (optional, parallelizable with WS3/4).
+6. **L3b** deferred until a consumer needs in-binary symbolized panics.
+
+## Open questions
+
+- **wasm:** confirm "diagnose unsupported" is acceptable for L1 (no FP chain), or
+  whether the C/wasm targets should forward `__builtin_*` to the host toolchain.
+- **rv64 frame-record layout:** verify the saved-ra/prev-fp offsets against the
+  actual prologue emitted by `src/arch/riscv/native.c` (the table above assumes
+  `ra@fp-8`, `fp@fp-16`; confirm before coding the walk).
+- **Output sink for L3a:** weak `__kit_backtrace_write` vs. requiring the host to
+  pass a sink explicitly. Weak-symbol default keeps freestanding builds linking.
+- **Level-0 return address semantics under tail-call / leaf-frame omission:** kit
+  keeps a frame pointer everywhere, but confirm leaf functions still store the
+  frame record (if a leaf skips the `{fp,ra}` store, level-0 return address must
+  fall back to live LR/RA on aa64/rv64).
diff --git a/doc/plan/README.md b/doc/plan/README.md
@@ -19,4 +19,5 @@ shrinks to whatever remains open.
 | [IMAGE_INSPECT.md](IMAGE_INSPECT.md) | Extending object inspection to executables and shared libraries. | [../OBJ.md](../OBJ.md) |
 | [BUILD.md](BUILD.md) | A new content-addressed build coordinator (Bazel/Nix-style incremental builds layered on the CAS) — storage state machine, caching algorithm, recipe protocol. Distinct from `../BUILD.md` (kit's own Makefile build). | — (new subsystem) |
 | [BUILD_COMMANDS.md](BUILD_COMMANDS.md) | The kit-native `build-exe`/`build-lib`/`build-obj` verbs that replace `compile`: polyglot, in-memory compile+link with `--group` flag scoping and full link-flag control. Distinct from `BUILD.md` (the CAS coordinator). | [../DRIVER.md](../DRIVER.md) |
+| [BACKTRACE.md](BACKTRACE.md) | Stack-trace support: GCC-compatible `__builtin_return_address`/`__builtin_frame_address` primitives, a freestanding `__kit_backtrace` capture helper, and symbolized backtrace printing. | [../FRONTENDS.md](../FRONTENDS.md), [../RUNTIME.md](../RUNTIME.md), [../DWARF.md](../DWARF.md) |
 | [TODO.md](TODO.md) | Open deferred fixes and code smells only. Completed items are removed instead of checked off. Not a roadmap; a current backlog. | — |
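
The FP-chain capture loop described in the L2 section of BACKTRACE.md (stop on a NULL fp, on an fp that does not increase, or at `max`) can be sketched in hosted C against a synthetic frame chain. This is an illustration under stated assumptions, not the planned `rt/lib/stack/backtrace.c`: `struct frame_record`, `walk_fp_chain`, and `demo_capture` are hypothetical names, the record layout is the aarch64 / x86-64 one from the plan's table (previous fp at `fp + 0`, return address at `fp + 8`), and the `skip` handling is simplified. A real implementation would presumably seed from `__builtin_frame_address(0)` instead of an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record, aarch64 / x86-64 layout from the plan's table:
 * previous frame pointer at offset 0, return address at offset 8 (LP64). */
struct frame_record {
    struct frame_record *prev_fp;
    void *ret_addr;
};

/* Walk the chain starting at fp, innermost frame first. The first `skip`
 * frames are traversed but not recorded. The walk stops on a NULL fp, on a
 * next-fp that is not strictly higher than the current one (the stack grows
 * down, so callers must live at higher addresses), or once `max` entries
 * have been written. Returns the number of entries written to buf. */
static int walk_fp_chain(const struct frame_record *fp,
                         void **buf, int max, int skip)
{
    assert(max >= 0 && skip >= 0);
    int n = 0;
    while (fp != NULL && n < max) {
        if (skip > 0)
            skip--;
        else
            buf[n++] = fp->ret_addr;

        const struct frame_record *next = fp->prev_fp;
        if (next != NULL && (uintptr_t)next <= (uintptr_t)fp)
            break; /* cycle or garbage: refuse to follow it */
        fp = next;
    }
    return n;
}

/* Build a synthetic 3-deep chain in an array (index 0 is the innermost
 * frame and has the lowest address) and capture it. When `corrupt` is
 * nonzero, frame 1 points back down at frame 0 to simulate a cycle. */
static int demo_capture(void **buf, int max, int skip, int corrupt)
{
    static struct frame_record stack[3];
    stack[0] = (struct frame_record){ &stack[1], (void *)(uintptr_t)0x1000 };
    stack[1] = (struct frame_record){ &stack[2], (void *)(uintptr_t)0x2000 };
    stack[2] = (struct frame_record){ NULL,      (void *)(uintptr_t)0x3000 };
    if (corrupt)
        stack[1].prev_fp = &stack[0];
    return walk_fp_chain(&stack[0], buf, max, skip);
}
```

Calling `demo_capture(buf, 8, 0, 0)` with an 8-entry `buf` captures all three synthetic return addresses, while the corrupted variant stops after the second frame instead of looping. The rv64 offsets in the plan's table differ (`fp - 16` / `fp - 8`), which is why the plan treats slot offsets as the one per-target knob.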