kit

git clone https://git.ryansepassi.com/git/kit.git

Plan: stack-trace builtins & runtime backtrace

Status — 2026-06-05 — L1 + L2 + L3a + kit symbolize + L3c (tool-side auto-backtrace) shipped (WS1–WS5); L3b remaining

L3c (WS5) — tool-side auto-backtrace for kit run and kit dbg — is now shipped. Both tools print a symbolized frame-pointer-chain backtrace at a fault/trap, reusing the DWARF reader they already own and never crossing into rt/. The decisive finding: the CFI stepper kit_dwarf_unwind_step (src/debug/dwarf_cfi.c:213) takes no memory provider, so when the return address is spilled to the stack (the normal case) it returns pc=0 and the walk dies after the leaf — the existing dbg bt was effectively single-frame. The fix is to walk the frame-pointer chain (kit's uniform fp[0]=caller fp / fp[1]=saved ra record, no .eh_frame needed), the same walk __kit_backtrace does, lifted tool-side with a memory-read callback.
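A minimal sketch of a tool-side frame-pointer walk driven by a memory-read callback, to make the fix above concrete. It assumes a 64-bit target and kit's fp[0] = caller fp / fp[1] = saved ra record. The names read_u64_fn, fp_chain_walk, ToyMem and toy_read_u64 are hypothetical, invented for illustration; they are not kit's actual API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: read 8 bytes of target memory at addr into *out.
 * Returns 0 on success, nonzero if the address is unreadable. */
typedef int (*read_u64_fn)(void *ctx, uint64_t addr, uint64_t *out);

/* Walk frame records laid out as fp[0] = caller fp, fp[1] = saved ra.
 * pc/fp are the register values at the fault. Writes the faulting pc,
 * then one return address per frame; returns the number of entries. */
size_t fp_chain_walk(read_u64_fn rd, void *ctx, uint64_t pc, uint64_t fp,
                     uint64_t *out, size_t max)
{
    size_t n = 0;
    if (max == 0) return 0;
    out[n++] = pc;                               /* frame 0: the leaf */
    while (n < max && fp != 0 && (fp & 7) == 0) {
        uint64_t next_fp, ra;
        if (rd(ctx, fp, &next_fp) != 0) break;   /* unreadable record */
        if (rd(ctx, fp + 8, &ra) != 0) break;
        if (ra == 0) break;                      /* stack origin */
        out[n++] = ra;
        if (next_fp <= fp) break;                /* stack grows down */
        fp = next_fp;
    }
    return n;
}

/* Toy target memory for demonstration: a word array mapped at base. */
typedef struct { uint64_t base; const uint64_t *words; size_t count; } ToyMem;

int toy_read_u64(void *ctx, uint64_t addr, uint64_t *out)
{
    const ToyMem *m = (const ToyMem *)ctx;
    if (addr < m->base || ((addr - m->base) & 7) != 0) return -1;
    uint64_t idx = (addr - m->base) / 8;
    if (idx >= m->count) return -1;
    *out = m->words[idx];
    return 0;
}
```

Because every load goes through the callback, the same walk can serve a live process, a debugger target, or a core image; a CFI stepper without such a provider has no way to fetch a spilled return address.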

Scope note: kit emu auto-backtrace is out of scope (the emulator doesn't retain the guest's DWARF after load); left as a follow-up alongside L3b.

L3a (WS4) is now shipped on top of L1/L2:

Tests (L3a): test/rt/cases/print_backtrace.c (in-process parse of the emitted #N 0xADDR lines, aa64/x64/rv64 under exec, exit 42) and test/rt/addr2line.sh
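As an illustration of what an in-process parse of the #N 0xADDR lines can look like, here is a small sketch using sscanf. The function name parse_bt_line is hypothetical, and the actual test may parse the lines differently (for example, without libc).

```c
#include <stdio.h>
#include <stdint.h>

/* Parse one "#N 0xADDR" line into its frame index and address.
 * Returns 1 on success, 0 if the line does not match the shape. */
int parse_bt_line(const char *line, unsigned *index, uint64_t *addr)
{
    unsigned idx;
    unsigned long long a;
    /* '#', decimal index, whitespace, literal "0x", then hex digits */
    if (sscanf(line, "#%u 0x%llx", &idx, &a) != 2) return 0;
    *index = idx;
    *addr = (uint64_t)a;
    return 1;
}
```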

Opt coverage: the backtrace path passes at O0 and O1 on all three arches. The rt-runtime corpus (test/rt/run.sh) and the addr2line round-trip (test/rt/addr2line.sh) now sweep both opt levels (KIT_RT_OPT_LEVELS), so backtrace_capture (L2) and print_backtrace (L3a) are exercised against optimized callers, all green at O0/O1.

Sweeping O1 also surfaced two unrelated, pre-existing kit bugs, left red (not skipped) and logged in doc/plan/TODO.md:

  1. x86-64 -g -O1 plus the 4-operand register-pinned syscall idiom aborts the compiler (too many memory asm operands, src/arch/x64/native.c:4014). This is why the x64/O1 lane of test-rt-backtrace is red, though x64/O1 backtrace correctness is still proven by print_backtrace/backtrace_capture (no asm).
  2. setjmp/longjmp is miscompiled at -O1 on every arch (setjmp_runtime/O1 returns 1, not 42; the second-return value isn't observed), failing test-rt-runtime.

Remaining: L3b in-process self-symbolization (and the deferred kit emu auto-backtrace). WS5/L3c (tool-side auto-backtrace) is done — see Status.

Implemented and tested through L2:

Open questions resolved while building:

Tests: test/rt/cases/backtrace_capture.c (aa64/x64/rv64 under exec), test/parse/cases/builtin_29..31_* (+ cases_err/..._nonconst) across the D/R/E/J/C lanes at O0/O1, test/toy/cases/154_frame_return_address.toy.

Remaining tasks (L3)

Nothing in L1/L2/L3a is outstanding. What's left is the rest of L3:

All Open-questions items are now resolved (the L3a output sink chose the weak default — see Open questions).

Overview

kit has no way for compiled code to inspect its own call stack. This roadmap adds that capability in three layers: GCC-compatible primitive builtins (__builtin_return_address, __builtin_frame_address), a freestanding runtime capture function (__kit_backtrace), and a symbolizing print path (__kit_print_backtrace) that turns return addresses into func at file:line.

Matching design docs once shipped: ../FRONTENDS.md (the builtins), ../RUNTIME.md (the rt helpers), ../DWARF.md (symbolization).

Why

What already exists (and what it can't do)

Design consequence: capture via the FP chain; symbolize via the existing DWARF reader, kept on the hosted side of the boundary. Do not reuse the CFI unwinder for self-capture.


L1 — Primitive builtins (__builtin_return_address, __builtin_frame_address)

GCC semantics: __builtin_frame_address(n) returns the frame address of the current function (n=0) or its n-th caller; __builtin_return_address(n) returns the return address into that frame. The level argument must be an integer constant (kit validates via the existing eval_const_int() path, as __builtin_offsetof already does at parse_expr.c:1331). Out-of-range / runaway walks are allowed to return a garbage-but-safe value or 0, matching GCC's "use a nonzero level only with care" contract.
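A short usage sketch of the two builtins at level 0, the always-safe case. The wrapper names are hypothetical, and the noinline attribute is GCC/Clang syntax used here so each wrapper keeps its own frame; whether kit accepts that attribute spelling is an assumption.

```c
/* Returns the address the current call will return to, i.e. a pc
 * inside whichever function called my_return_address(). */
__attribute__((noinline)) void *my_return_address(void)
{
    return __builtin_return_address(0);
}

/* Returns this function's own frame address (its frame-pointer value). */
__attribute__((noinline)) void *my_frame_address(void)
{
    return __builtin_frame_address(0);
}

/* The level must be an integer constant expression: something like
 *     int n = 1; __builtin_return_address(n);
 * is rejected at compile time rather than evaluated at run time. */
```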

Lowering: two new CG intrinsics, FP-chain only

Add to KitCgIntrinsic (include/kit/cg.h:916):

KIT_CG_INTRIN_FRAME_ADDRESS,   /* pop level(u32 const); push void*  */
KIT_CG_INTRIN_RETURN_ADDRESS,  /* pop level(u32 const); push void*  */

Both lower through one shared FP-walk so level 0 and level N use the same path, and so level 0's return address comes from the spilled frame-record slot (not the live LR/RA, which may be clobbered mid-function):

arch    | FP reg | frame(0) | walk one frame | return addr from frame F
aarch64 | x29    | x29      | fp = *(fp)     | *(fp + 8) (saved x30)
x86-64  | rbp    | rbp      | fp = *(fp)     | *(fp + 8) (pushed retaddr)
rv64    | s0/x8  | s0       | fp = *(fp)     | *(fp + ptr) (saved ra)

The table is uniform across kit's targets: the prologue stores [fp+0] = caller fp, [fp+ptr] = saved ra everywhere (verified against rv_build_prologue — note this differs from the RISC-V psABI's ra@s0-8 / fp@s0-16, which an early draft of this table wrongly assumed).

For a constant level, the walk unrolls into that many dependent loads (typically 0–2), so no loop is emitted. wasm has no FP chain, so the builtins are diagnosed as unsupported there, exactly as the IRQ/cache intrinsics already are per-arch.
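To make the unrolling concrete, here is a sketch of what the constant-level walk amounts to, written as plain C over a frame record. The FrameRecord type and function names are hypothetical; the real lowering happens in the code generator, not in C source.

```c
/* One frame record in kit's uniform layout: [fp+0] = caller fp,
 * [fp+ptr] = saved ra. */
typedef struct FrameRecord {
    struct FrameRecord *caller_fp;
    void *saved_ra;
} FrameRecord;

/* frame_address(2): two dependent loads, no loop. */
void *frame_address_2(FrameRecord *fp)
{
    return fp->caller_fp->caller_fp;
}

/* return_address(0): read the spilled slot of the current record,
 * not the live link register. */
void *return_address_0(FrameRecord *fp)
{
    return fp->saved_ra;
}

/* return_address(2): the same two-step walk, then the saved-ra slot. */
void *return_address_2(FrameRecord *fp)
{
    return fp->caller_fp->caller_fp->saved_ra;
}
```

Each load depends on the previous one, so the cost grows linearly with the level, but for the small constants used in practice it stays a handful of instructions.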

Files to touch (the standard "new value-producing intrinsic" path)

Tests (L1) [done]


L2 — Capture: __kit_backtrace (freestanding runtime fn)

Surface decision (confirmed): primitives are builtins; capture/print are runtime functions, mirroring the GCC-builtin / glibc-backtrace split. New freestanding TU rt/lib/stack/backtrace.c, declared in a new public runtime header rt/include/kit/backtrace.h:

/* Fill buf[0..max) with return addresses, innermost first, skipping the
 * innermost `skip` frames (skip >= 1 hides __kit_backtrace itself).
 * Returns the number of frames written. Freestanding: pure FP walk, no libc,
 * no DWARF, works on every target that keeps a frame pointer (all of kit's). */
int __kit_backtrace(void** buf, int max, int skip);

Implementation is the L1 walk expressed in portable C: seed from __builtin_frame_address(0), then loop fp = *(void**)fp reading the saved-RA slot, stopping on a NULL saved-RA (the synthetic stack origin), a NULL or non-increasing fp (stack grows down — detect cycles/garbage), a misaligned link, or max. No per-arch knob is needed: kit's frame layout is uniform, so the walk indexes fp[0] (caller fp) and fp[1] (saved ra) as void**, which scales to the target pointer width automatically — no offset table, no #ifdef cascade. skip discards the innermost N frames (a print wrapper passes skip >= 1).
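The walk described above can be sketched as follows. This is an illustration under stated assumptions, not the shipped rt/lib/stack/backtrace.c: the function name sketch_backtrace_from is hypothetical, it takes the seed frame pointer as a parameter (the real function seeds from __builtin_frame_address(0)), and skip is interpreted simply as "discard the first skip return addresses".

```c
#include <stdint.h>

/* Walk frame records starting at fp, where fp[0] = caller fp and
 * fp[1] = saved ra. Writes up to max return addresses into buf,
 * innermost first, after discarding the first `skip` of them.
 * Returns the number of entries written. */
int sketch_backtrace_from(void **fp, void **buf, int max, int skip)
{
    int n = 0;
    while (fp != 0 && n < max) {
        if (((uintptr_t)fp % sizeof(void *)) != 0)
            break;                        /* misaligned link */
        void *ra = fp[1];
        if (ra == 0)
            break;                        /* synthetic stack origin */
        if (skip > 0)
            skip--;                       /* hide innermost frames */
        else
            buf[n++] = ra;
        void **next = (void **)fp[0];
        if ((uintptr_t)next <= (uintptr_t)fp)
            break;                        /* NULL, cycle, or garbage */
        fp = next;
    }
    return n;
}
```

Indexing through void** is what makes the walk width-independent: fp[1] is 8 bytes past fp on a 64-bit target and 4 bytes past it on a 32-bit one, with no per-arch offsets.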

Tests (L2) [done]


L3 — Symbolize & print: __kit_print_backtrace

This is where the freestanding boundary bites: turning an address into func at file:line needs the DWARF reader, which is libkit, not rt. Three sub-options, ordered by how cleanly they respect that boundary. Recommend shipping L3a now, leaving L3b/L3c as documented extensions.

Tests (L3)


Suggested sequencing

  1. WS1 — L1 primitives, O0 — all three native arches + parse/toy tests. ✅ done.
  2. WS2 — L1 at O1/O2 — opt effect-modeling audit (turned out to need only the riscv frame-record fix) + O1 tests. ✅ done.
  3. WS3 — L2 __kit_backtrace in rt + capture test. ✅ done (assert-hook moved to WS4 — it needs the L3 print fn).
  4. WS4 — L3a raw print (__kit_print_backtrace + weak __kit_backtrace_write sink) + kit addr2line round-trip; wire the assert hook. ✅ done.
  5. WS5 — kit symbolize hosted batching symbolizer over the #N 0x<hex> stream, sharing driver/lib/dwarfsym.c with addr2line; second lane of test/rt/addr2line.sh. ✅ done.
  6. WS5 — L3c tool-side auto-backtrace for kit run + kit dbg. ✅ done (kit emu deferred — no retained guest DWARF).
  7. L3b deferred until a consumer needs in-binary symbolized panics.
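For the WS4 raw print step, a freestanding formatter for one #N 0x<hex> line might look like the sketch below. Everything here is an assumption made for illustration: the name format_bt_line, the buffer-based interface, the trailing newline, and the unpadded lowercase hex. The plan does not spell out the signature of __kit_print_backtrace or of the weak __kit_backtrace_write sink, so this does not attempt to reproduce them.

```c
#include <stddef.h>
#include <stdint.h>

/* Format "#<idx> 0x<hex>\n" into dst without calling libc.
 * Returns the number of bytes written (excluding the NUL),
 * or 0 if cap is too small to hold the line. */
size_t format_bt_line(char *dst, size_t cap, unsigned idx, uint64_t addr)
{
    char tmp[48];     /* worst case: 1 + 10 + 3 + 16 + 1 = 31 bytes */
    char dec[10];     /* decimal digits of idx, least significant first */
    size_t n = 0, d = 0;
    int shift = 60;

    tmp[n++] = '#';
    do {
        dec[d++] = (char)('0' + idx % 10);
        idx /= 10;
    } while (idx != 0);
    while (d > 0)
        tmp[n++] = dec[--d];

    tmp[n++] = ' ';
    tmp[n++] = '0';
    tmp[n++] = 'x';
    while (shift > 0 && ((addr >> shift) & 0xF) == 0)
        shift -= 4;                       /* skip leading zero nibbles */
    for (; shift >= 0; shift -= 4)
        tmp[n++] = "0123456789abcdef"[(addr >> shift) & 0xF];
    tmp[n++] = '\n';

    if (n + 1 > cap)
        return 0;
    for (size_t i = 0; i < n; i++)
        dst[i] = tmp[i];
    dst[n] = '\0';
    return n;
}
```

Keeping the formatter free of printf-family calls is what lets a print path like this live on the freestanding side, with only the final write delegated to an overridable sink.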

Open questions

None outstanding.

Resolved in WS4:

Resolved while building L1/L2: