Plan: stack-trace builtins & runtime backtrace
Status — 2026-06-05 — L1 + L2 + L3a + kit symbolize + L3c (tool-side auto-backtrace) shipped (WS1–WS5); L3b remaining
L3c (WS5) — tool-side auto-backtrace for kit run and kit dbg — is now
shipped. Both tools print a symbolized frame-pointer-chain backtrace at a
fault/trap, reusing the DWARF reader they already own and never crossing into
rt/. The decisive finding: the CFI stepper kit_dwarf_unwind_step
(src/debug/dwarf_cfi.c:213) takes no memory provider, so when the return
address is spilled to the stack (the normal case) it returns pc=0 and the walk
dies after the leaf — the existing dbg bt was effectively single-frame. The
fix is to walk the frame-pointer chain (kit's uniform fp[0]=caller fp /
fp[1]=saved ra record, no .eh_frame needed), the same walk __kit_backtrace
does, lifted tool-side with a memory-read callback.
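The FP-chain walk with a memory-read callback can be sketched as follows. This is a minimal illustration, not the code in driver/lib/backtrace.c: the names bt_read_fn, bt_fp_step, bt_walk, fake_mem and demo_walk are hypothetical, pointers are assumed to be 8 bytes, and the "target memory" is a small in-process array standing in for a debug session.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader: copy `len` bytes at target address `addr` into `out`. */
typedef bool (*bt_read_fn)(void *ctx, uint64_t addr, void *out, size_t len);

/* One step: load the {caller fp, saved ra} record at *fp and advance. */
bool bt_fp_step(bt_read_fn rd, void *ctx, uint64_t *fp, uint64_t *ra) {
    uint64_t rec[2];
    if (*fp == 0 || (*fp & 7) != 0) return false;     /* NULL or misaligned link */
    if (!rd(ctx, *fp, rec, sizeof rec)) return false; /* unreadable memory */
    if (rec[1] == 0) return false;                    /* NULL ra: stack origin */
    if (rec[0] != 0 && rec[0] <= *fp) return false;   /* caller frames sit higher */
    *ra = rec[1];
    *fp = rec[0];
    return true;
}

/* Collect up to `max` return addresses, innermost first. */
size_t bt_walk(bt_read_fn rd, void *ctx, uint64_t fp, uint64_t *pcs, size_t max) {
    size_t n = 0;
    uint64_t ra;
    while (n < max && bt_fp_step(rd, ctx, &fp, &ra)) pcs[n++] = ra;
    return n;
}

/* Fake target memory: `count` 64-bit words starting at address `base`. */
typedef struct { uint64_t base; const uint64_t *words; size_t count; } fake_mem;

static bool fake_read(void *ctx, uint64_t addr, void *out, size_t len) {
    const fake_mem *m = ctx;
    size_t size = m->count * sizeof(uint64_t);
    if (addr < m->base || addr - m->base > size) return false;
    if (len > size - (size_t)(addr - m->base)) return false;
    memcpy(out, (const char *)m->words + (addr - m->base), len);
    return true;
}

/* Three records at 0x1000, 0x1010, 0x1020; the last one marks the origin. */
size_t demo_walk(uint64_t *pcs, size_t max) {
    static const uint64_t words[] = {
        0x1010, 0x401136, /* leaf: caller fp, saved ra */
        0x1020, 0x401170, /* mid */
        0x0,    0x0,      /* origin: NULL saved ra */
    };
    fake_mem m = { 0x1000, words, 6 };
    return bt_walk(fake_read, &m, 0x1000, pcs, max);
}
```

Because every read goes through the callback, the same step works whether the bytes come from a live session, a core image, or (as here) a test fixture.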
- Shared module driver/lib/backtrace.c + .h: the FP-step kernel (driver_bt_fp_step, with __kit_backtrace's guards) + arch FP-reg/ptr-size helpers + a PC-list symbolizer (driver_backtrace_print_pcs). Gated into the DBG/RUN tool builds (mk/driver_srcs.mk). Walks stop at the kit-image boundary (kit_jit_runtime_to_image == 0) so output ends at main and is host-independent (no libc/dyld trampoline noise).
- kit dbg (driver/cmd/dbg.c): dbg_cmd_bt now advances via the FP-step kernel over kit_jit_session_read_mem (so it walks the whole stack, not just the leaf), and dbg_render_stop auto-invokes it on KIT_STOP_SIGNAL (faults + __builtin_trap/assert; not breakpoints/steps).
- kit run (driver/cmd/run.c + driver/env/posix_dbg.c): a lightweight in-process crash guard (driver_run_with_crash_guard) installs SIGSEGV/SIGBUS/SIGILL/SIGFPE/SIGABRT/SIGTRAP handlers around the direct entry_fn call, reusing the existing dbg_ucontext_to_frame marshalling. Because kit run shares its stack with the program, the chain is captured inside the handler (before the post-siglongjmp stack is reused) and symbolized afterward in normal context; the process exits 128 + signo. Windows has a no-op stub (vectored-handler port is a follow-up).
- Tests: test/dbg/cases/toy-trap-backtrace (multi-frame trap → auto-bt + bt), updated toy-trap-stop golden, and a kit run crash lane in test/driver/run.sh (run-backtrace-*: non-zero exit + symbolized bt_leaf/bt_mid/bt_root + source file).
Scope note: kit emu auto-backtrace is out of scope (the emulator doesn't
retain the guest's DWARF after load); left as a follow-up alongside L3b.
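The crash-guard shape described for kit run (handlers installed around the entry call, a jump back out of the handler, exit status 128 + signo) can be sketched with plain POSIX calls. This is an assumption-laden illustration rather than driver_run_with_crash_guard itself: the names guard_handler, run_with_crash_guard, demo_entry_ok and demo_entry_abort are hypothetical, the handler uses the simple sa_handler form instead of ucontext marshalling, and the chain capture is only marked by a comment.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

enum { GUARD_NSIGS = 6 };
static const int guard_sigs[GUARD_NSIGS] = {
    SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT, SIGTRAP
};
static sigjmp_buf guard_env;
static volatile sig_atomic_t guard_signo;

static void guard_handler(int signo) {
    guard_signo = signo;
    /* A real guard would capture the FP chain here, while the faulting
     * stack is still intact, and symbolize it after the jump. */
    siglongjmp(guard_env, 1);
}

/* Run entry() under the guard: returns its result, or 128 + signo on a fault. */
int run_with_crash_guard(int (*entry)(void)) {
    struct sigaction sa, old[GUARD_NSIGS];
    int rc, i;

    sa.sa_handler = guard_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    for (i = 0; i < GUARD_NSIGS; i++) sigaction(guard_sigs[i], &sa, &old[i]);

    if (sigsetjmp(guard_env, 1) == 0)
        rc = entry();            /* normal path */
    else
        rc = 128 + guard_signo;  /* reached via siglongjmp from the handler */

    for (i = 0; i < GUARD_NSIGS; i++) sigaction(guard_sigs[i], &old[i], NULL);
    return rc;
}

/* Hypothetical entry points used to exercise the guard. */
int demo_entry_ok(void) { return 7; }
int demo_entry_abort(void) { raise(SIGABRT); return 0; }
```

Saving the signal mask in sigsetjmp (second argument 1) matters here: the jump out of the handler restores it, so the same signal can be caught again on a later run.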
L3a (WS4) is now shipped on top of L1/L2:
- L3a print __kit_print_backtrace: rt/lib/stack/print_backtrace.c walks via __kit_backtrace(buf, 64, skip=1) (skip hides the print frame, so #0 is the caller) and writes one raw #N 0x<hex> line per frame to the weak __kit_backtrace_write(const char*, size_t) sink. Integer/hex formatting is hand-rolled (no printf/libc pulled into the panic path); the address uses uintptr_t so it is not truncated on LLP64. Declared in rt/include/kit/backtrace.h; added to RT_BASE_SRCS.
- Output sink (open question resolved): weak no-op __kit_backtrace_write default, so freestanding images that never wire a sink still link; the host / _start overrides it to route bytes to write(2) (or a UART). Chosen over a mandatory explicit-sink param to keep freestanding builds link-clean.
- Assert hook (deferred from L2): rt/lib/assert/assert.c::__kit_assert_fail now emits a "kit: assertion failed: <expr>, file <file>, line <line>, function <func>" banner, then __kit_print_backtrace() before __builtin_trap(), all through the same weak sink (printf-free). Pulling __kit_assert_fail therefore also pulls print_backtrace.o → backtrace.o from the archive, which is the intended wiring.
- Symbolization is out-of-process via two hosted tools that share one DWARF-open + func/line core (driver/lib/dwarfsym.c):
  - kit addr2line: the faithful GNU/LLVM clone (bare addresses in, file:line out), unchanged in contract.
  - kit symbolize (driver/cmd/symbolize.c, shipped): reads the raw #N 0x<hex> stream __kit_print_backtrace emits, finds the address on each line, resolves it through the same DWARF reader, and rewrites the line in place as #0 0x401136 bt_leaf at addr2line_prog.c:51:3, keeping the #N framing addr2line structurally can't. Lines with no address pass through verbatim. A single -e <image> today; multiple -e images or a module map (for libc.so frames that need their own load slide) is the natural extension.
- Verified round-trip: a static non-PIE ELF prints its own trace at runtime, and the captured addresses resolve bt_leaf/bt_mid/bt_root/test_main through both kit addr2line -f -e <image> and kit symbolize -e <image> (outer frames built without -g show ??).
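A printf-free formatter for the raw #N 0x<hex> line might look like the sketch below. It is only an illustration of the hand-rolled approach, not the contents of print_backtrace.c; the function name format_frame_line and its buffer-based interface are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Write "#<n> 0x<hex>\n" into out (capacity cap, no NUL terminator).
 * Returns the number of bytes written, or 0 if the buffer is too small.
 * No libc calls: digits are produced by repeated division and shifting. */
size_t format_frame_line(char *out, size_t cap, unsigned n, uintptr_t addr) {
    char dec[3 * sizeof(unsigned)];   /* enough decimal digits for any unsigned */
    char hex[2 * sizeof(uintptr_t)];  /* two hex digits per address byte */
    size_t nd = 0, nh = 0, len = 0;

    do { dec[nd++] = (char)('0' + n % 10); n /= 10; } while (n != 0);
    do { hex[nh++] = "0123456789abcdef"[addr & 0xf]; addr >>= 4; } while (addr != 0);

    if (cap < 1 + nd + 3 + nh + 1) return 0;

    out[len++] = '#';
    while (nd > 0) out[len++] = dec[--nd];   /* digits were produced in reverse */
    out[len++] = ' ';
    out[len++] = '0';
    out[len++] = 'x';
    while (nh > 0) out[len++] = hex[--nh];
    out[len++] = '\n';
    return len;
}
```

Taking the address as uintptr_t rather than unsigned long is what keeps the value intact on LLP64 targets, where long is 32 bits but pointers are 64.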
Tests (L3a): test/rt/cases/print_backtrace.c (in-process parse of the emitted
#N 0xADDR lines, aa64/x64/rv64 under exec, exit 42) and test/rt/addr2line.sh +
test/rt/addr2line_prog.c (the symbolization round-trip, make target
test-rt-backtrace). The round-trip script runs the captured stream through both
lanes per arch/opt: kit addr2line -f over the bare addresses, and kit symbolize
over the raw #N 0xADDR stream (asserting the #N framing is preserved and
<func> at file:line is appended). test/rt/smoke.c also includes
<kit/backtrace.h> so the header compiles on every rt-header target.
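Both the in-process test and kit symbolize have to pull the address back out of a #N 0x<hex> line. One way to do that scan is sketched here; the helper name find_frame_address is hypothetical, and the real parsers may differ (for example in how they treat addresses wider than 64 bits, which this sketch does not check).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Find the first "0x<hex>" token in `line` and decode it into *addr.
 * Returns false when the line carries no address, in which case a
 * symbolizer would pass the line through unchanged. */
bool find_frame_address(const char *line, uint64_t *addr) {
    for (const char *p = line; *p != '\0'; p++) {
        if (p[0] != '0' || p[1] != 'x') continue;
        uint64_t value = 0;
        size_t digits = 0;
        for (const char *q = p + 2; ; q++) {
            int d;
            if (*q >= '0' && *q <= '9') d = *q - '0';
            else if (*q >= 'a' && *q <= 'f') d = *q - 'a' + 10;
            else if (*q >= 'A' && *q <= 'F') d = *q - 'A' + 10;
            else break;                       /* end of the hex run */
            value = (value << 4) | (uint64_t)d;
            digits++;
        }
        if (digits > 0) { *addr = value; return true; }
    }
    return false;
}
```

Note that the frame index itself can contain a 0 (as in #0 or #10); requiring the 0x prefix followed by at least one hex digit keeps it from being mistaken for the address.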
Opt coverage — the backtrace path passes at O0 and O1 on all three arches.
The rt-runtime corpus (test/rt/run.sh) and the addr2line round-trip
(test/rt/addr2line.sh) now sweep both opt levels (KIT_RT_OPT_LEVELS), so
backtrace_capture (L2) and print_backtrace (L3a) are exercised against
optimized callers — all green at O0/O1. Sweeping O1 also surfaced two
unrelated, pre-existing kit bugs, left red (not skipped) and logged in
doc/plan/TODO.md: (1) x86-64 -g -O1 + the 4-operand register-pinned syscall
idiom aborts the compiler (too many memory asm operands,
src/arch/x64/native.c:4014) — this is why the x64/O1 lane of
test-rt-backtrace is red, though x64/O1 backtrace correctness is still proven
by print_backtrace/backtrace_capture (no asm); (2) setjmp/longjmp is
miscompiled at -O1 on every arch (setjmp_runtime/O1 returns 1, not 42 —
the second-return value isn't observed), failing test-rt-runtime.
Remaining: L3b in-process self-symbolization (and the deferred kit emu
auto-backtrace). WS5/L3c (tool-side auto-backtrace) is done — see Status.
Implemented and tested through L2:
- L1 builtins __builtin_frame_address / __builtin_return_address: two CG intrinsics (KIT_CG_INTRIN_FRAME_ADDRESS / _RETURN_ADDRESS), constant level carried as a single IMM operand, lowered as an unrolled FP walk on aarch64 / x86-64 / riscv (O0 and O1, same backend handler). The C target forwards __builtin_* to the host compiler; wasm reports unsupported; the C frontend validates the level via eval_const_int.
- L1 O1 modeling: IR_INTRINSIC is already conservatively side-effecting in opt (never DCE'd / CSE'd / hoisted), so no new effect modeling was needed. The one real O1 hazard (riscv's frameless-leaf tier, slim_prologue, emits no prologue and never anchors s0) is handled by a new NativeKnownFrameDesc.reads_frame flag set during frame analysis when these intrinsics appear; aarch64/x64 keep the frame record in every prologue shape, so they need no change.
- L2 capture __kit_backtrace: rt/lib/stack/backtrace.c + rt/include/kit/backtrace.h, in RT_BASE_SRCS for every variant.
Open questions resolved while building:
- rv64 frame-record layout: the psABI ra@s0-8 / fp@s0-16 guess in the L2 sketch is wrong for kit. kit's prologue stores the pair at and above s0: [s0+0] = caller fp, [s0+ptr] = saved ra (verified against rv_build_prologue). So the layout is uniform across all kit targets (fp[0] / fp[1] in units of void*), and __kit_backtrace needs no per-arch offset table at all, just index 0 and 1.
- wasm: diagnose unsupported (confirmed acceptable); the capability hook returns false and the C frontend emits a clean error.
- leaf-frame omission: handled via reads_frame (above).
Tests: test/rt/cases/backtrace_capture.c (aa64/x64/rv64 under exec),
test/parse/cases/builtin_29..31_* (+ cases_err/..._nonconst) across the
D/R/E/J/C lanes at O0/O1, test/toy/cases/154_frame_return_address.toy.
Remaining tasks (L3)
Nothing in L1/L2/L3a is outstanding. What's left is the rest of L3:
- WS4, L3a: done (see Status). __kit_print_backtrace() + weak __kit_backtrace_write sink + assert-path hook + kit addr2line round-trip.
- WS5, kit symbolize: done (see Status). The hosted batching symbolizer that reads the #N 0x<hex> stream and annotates it in place, sharing driver/lib/dwarfsym.c with addr2line. Tested by the second lane of test/rt/addr2line.sh.
- WS5, L3c (tool-side auto-backtrace): done (see Status). kit run + kit dbg auto-print a symbolized FP-chain backtrace at a fault/trap via the shared driver/lib/backtrace.c; truncated at the kit-image boundary; never crosses into rt. kit emu auto-backtrace remains deferred (it doesn't retain the guest DWARF).
- L3b: in-process self-symbolization (hosted-only libkit_bt.a); deferred until a concrete consumer needs in-binary symbolized panics.
All Open-questions items are now resolved (the L3a output sink chose the weak default — see Open questions).
Overview
kit has no way for compiled code to inspect its own call stack. This roadmap
adds that capability in three layers: GCC-compatible primitive builtins
(__builtin_return_address, __builtin_frame_address), a freestanding runtime
capture function (__kit_backtrace), and a symbolizing print path
(__kit_print_backtrace) that turns return addresses into func at file:line.
Matching design docs once shipped: ../FRONTENDS.md (the builtins), ../RUNTIME.md (the rt helpers), ../DWARF.md (symbolization).
Why
- Portability. __builtin_return_address / __builtin_frame_address are a de-facto part of the GCC/Clang surface. Real C code (libc backtrace, sanitizer shims, allocators, profilers, unwind-free panic handlers) uses them; kit currently can't compile any of it.
- Diagnostics. __kit_assert_fail (rt/lib/assert/assert.c) and the emulator fault path (src/emu/emu.c, compiler_panic) currently die silently with __builtin_trap(). A backtrace at the trap point is the single biggest debuggability win for kit-compiled programs.
- It is cheap here, specifically. kit maintains a frame pointer on every backend and has no -fomit-frame-pointer (x29 on aarch64, rbp on x64, s0/x8 on rv64; AA_FP = 29 at src/arch/aa64/native.c:61). Every prologue stores a {saved_fp, saved_ra} frame record. Frame-pointer-chain walking is therefore reliable, with no unwind tables and no .eh_frame dependency.
What already exists (and what it can't do)
- .eh_frame CFI is emitted by default for hosted targets (src/arch/mc.c:736, mc_emit_eh_frame), and off for freestanding.
- A CFI unwinder, kit_dwarf_unwind_step (src/debug/dwarf_cfi.c:213), interprets FDE/CIE programs, but deliberately takes no memory provider, so it cannot self-unwind a live stack. It is built for the dbg/JIT path where a session reads target memory out-of-band (driver/cmd/dbg.c:1010, the bt command). It is not a candidate for in-process capture.
- Symbolization (kit_dwarf_addr_to_line, kit_dwarf_func_at, include/kit/dwarf.h:21) is mature: it backs addr2line (driver/cmd/addr2line.c) and dbg bt. But it lives in libkit.a, not the freestanding runtime rt/. Pulling it into a freestanding image is a non-goal (see L3).
- The runtime has zero unwind/backtrace code today. rt/lib/stack/ exists but holds only the Windows chkstk helper, which makes it a natural home for the new capture code.
Design consequence: capture via the FP chain; symbolize via the existing DWARF reader, kept on the hosted side of the boundary. Do not reuse the CFI unwinder for self-capture.
L1 — Primitive builtins (__builtin_return_address, __builtin_frame_address)
GCC semantics: __builtin_frame_address(n) returns the frame address of the
current function (n=0) or its n-th caller; __builtin_return_address(n) returns
the return address into that frame. The level argument must be an integer
constant (kit validates via the existing eval_const_int() path, as
__builtin_offsetof already does at parse_expr.c:1331). Out-of-range / runaway
walks are allowed to return a garbage-but-safe value or 0, matching GCC's "use 0
only with care" contract.
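For reference, this is how the two builtins are typically used from C at level 0, the only level that is safe on every compiler. The sketch assumes a GCC- or Clang-compatible compiler; the wrapper names current_return_address and current_frame_address are made up for the example.

```c
#include <stddef.h>

/* noinline keeps each wrapper a real call, so level 0 refers to the
 * wrapper's own frame rather than to whatever it was inlined into. */
__attribute__((noinline)) void *current_return_address(void) {
    /* Address of the instruction the wrapper will return to. */
    return __builtin_return_address(0);
}

__attribute__((noinline)) void *current_frame_address(void) {
    /* Frame address of the wrapper itself. */
    return __builtin_frame_address(0);
}
```

The argument has to be an integer constant; passing a variable is rejected at compile time, which is the case the nonconst error test covers.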
Lowering: two new CG intrinsics, FP-chain only
Add to KitCgIntrinsic (include/kit/cg.h:916):
KIT_CG_INTRIN_FRAME_ADDRESS, /* pop level(u32 const); push void* */
KIT_CG_INTRIN_RETURN_ADDRESS, /* pop level(u32 const); push void* */
Both lower through one shared FP-walk so level 0 and level N use the same path, and so level 0's return address comes from the spilled frame-record slot (not the live LR/RA, which may be clobbered mid-function):
| arch | FP reg | frame(0) | walk one frame | return addr from frame F |
|---|---|---|---|---|
| aarch64 | x29 | x29 | fp = *(fp) | *(fp + 8) (saved x30) |
| x86-64 | rbp | rbp | fp = *(fp) | *(fp + 8) (pushed retaddr) |
| rv64 | s0/x8 | s0 | fp = *(fp) | *(fp + ptr) (saved ra) |
The table is uniform across kit's targets: the prologue stores
[fp+0] = caller fp, [fp+ptr] = saved ra everywhere (verified against
rv_build_prologue — note this differs from the RISC-V psABI's ra@s0-8 /
fp@s0-16, which an early draft of this table wrongly assumed).
For a constant level the walk unrolls into a fixed chain of dependent loads, one
per level (typically 0–2), so no loop is emitted. wasm has no FP chain →
diagnose unsupported, exactly as the IRQ/cache intrinsics already do per-arch.
Files to touch (the standard "new value-producing intrinsic" path)
- include/kit/cg.h: two enum entries + doc comments.
- src/cg/arith.c:1726: two rows in the KitCgIntrinsic → INTRIN_* table.
- src/cg/cgtarget.h:148: two INTRIN_* enum entries (INTRIN_FRAME_ADDRESS, INTRIN_RETURN_ADDRESS).
- lang/c/parse/parse_priv.h:231 + parse.c:1526: intern the __builtin_return_address / __builtin_frame_address symbols.
- lang/c/parse/parse_expr.c (in try_parse_builtin_call, ~1696–2018): two handlers that parse the constant level via eval_const_int, then emit the intrinsic with result type void*. New cg_adapter.c helper pcg_frame_or_return_address(p, kind, level).
- Per-arch O0 lowering: src/arch/aa64/native.c (3572), src/arch/x64/native.c (3378), src/arch/riscv/native.c (2992) emit the FP-walk loads; src/arch/wasm/emit.c (1590) + src/arch/c_target/c_emit.c (~2603) handle or diagnose (the C target can emit __builtin_* straight through to the host compiler).
- Capability hooks: src/arch/{aa64,x64,riscv,wasm}/arch.c (alongside the existing KIT_CG_INTRIN_TRAP cases at e.g. aa64/arch.c:197).
- Optimizer (O1/O2) [done]: in practice no new effect modeling was needed. IR_INTRINSIC is already conservatively side-effecting in opt (never DCE'd, CSE'd, or hoisted; see pass_dce.c), and the FP it reads is stable across the whole function, so scheduling is harmless. The one real O1 hazard turned out to be a backend frame issue, not an opt-modeling one: riscv's frameless-leaf tier (slim_prologue) emits no prologue and never anchors s0, so a leaf that reads its own frame would walk a stale s0. Fixed with a NativeKnownFrameDesc.reads_frame flag set in pass_native_emit.c frame analysis and ANDed into riscv's slim_prologue decision; aarch64/x64 keep the frame record in every prologue shape, so they need nothing. O1 smoke tests run on all three arches.
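The reads_frame gate amounts to one extra term in the frameless-leaf decision. The sketch below shows the shape of that check only; the struct frame_desc, its is_leaf and has_stack_locals fields, and the function can_use_slim_prologue are hypothetical stand-ins, and the other conditions kit's riscv backend actually tests are not known from this plan.

```c
#include <stdbool.h>

/* Hypothetical, simplified frame description for one function. */
typedef struct {
    bool is_leaf;          /* makes no calls */
    bool has_stack_locals; /* needs stack slots of its own */
    bool reads_frame;      /* a frame/return-address intrinsic appears */
} frame_desc;

/* A function may skip its prologue only if nothing will read the frame
 * record; otherwise the FP walk would start from a stale frame pointer. */
bool can_use_slim_prologue(const frame_desc *d) {
    return d->is_leaf && !d->has_stack_locals && !d->reads_frame;
}
```

Keeping the flag in the frame description, rather than teaching the optimizer about it, leaves the decision with the backend that owns the prologue shapes.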
Tests (L1) [done]
- test/toy/cases/154_frame_return_address.toy: CG-API case exercising both intrinsics at levels 0/1/2 (an @[.noinline] chain pins the depth).
- test/parse/cases/builtin_29_return_address.c, builtin_30_frame_address.c, and builtin_31_return_address_anchor.c; error case cases_err/builtin_return_address_nonconst.c for a non-constant level.
- The plan's "anchor in caller's range" smoke check is builtin_31 (run via the parse harness's qemu/podman exec lane on x64 + aa64 + rv64 at O0 and O1), not a test/smoke script. It anchors on the caller's function address, not a && label: GNU labels-as-values whose address is taken but never goto'd break at O1 (undefined reference to '.Lcfblk.N'; see doc/plan/TODO.md).
L2 — Capture: __kit_backtrace (freestanding runtime fn)
Surface decision (confirmed): primitives are builtins; capture/print are
runtime functions, mirroring the GCC-builtin / glibc-backtrace split. New
freestanding TU rt/lib/stack/backtrace.c, declared in a new public runtime
header rt/include/kit/backtrace.h:
/* Fill buf[0..max) with return addresses, innermost first, skipping the
 * innermost `skip` frames (skip >= 1 hides __kit_backtrace itself).
 * Returns the number of frames written. Freestanding: pure FP walk, no libc,
 * no DWARF, works on every target that keeps a frame pointer (all of kit's). */
int __kit_backtrace(void** buf, int max, int skip);
Implementation is the L1 walk expressed in portable C: seed from
__builtin_frame_address(0), then loop fp = *(void**)fp reading the saved-RA
slot, stopping on a NULL saved-RA (the synthetic stack origin), a NULL or
non-increasing fp (stack grows down — detect cycles/garbage), a misaligned link,
or max. No per-arch knob is needed: kit's frame layout is uniform, so the
walk indexes fp[0] (caller fp) and fp[1] (saved ra) as void**, which scales
to the target pointer width automatically — no offset table, no #ifdef cascade.
skip discards the innermost N frames (a print wrapper passes skip >= 1).
- mk/rt.mk: added rt/lib/stack/backtrace.c to RT_BASE_SRCS (built for every variant; rt/lib/stack/ already compiled the Windows chkstk helper).
- Assert-path hook, landed in WS4 (was deferred): rt/lib/assert/assert.c::__kit_assert_fail now emits a banner + __kit_print_backtrace() before __builtin_trap(). It needed the L3 __kit_print_backtrace(), so it shipped with WS4 rather than L2.
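The guarded walk described above can be sketched in portable C. To keep the example safe to run under any host compiler, it walks a fabricated chain of frame records instead of seeding from __builtin_frame_address(0); walk_frame_chain and demo_capture are hypothetical names, and the order of the guards is an assumption rather than a copy of rt/lib/stack/backtrace.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk frame records starting at fp, where fp[0] is the caller's fp and
 * fp[1] is the saved return address. Skips the innermost `skip` frames and
 * returns the number of addresses written to buf (at most max). */
int walk_frame_chain(void **fp, void **buf, int max, int skip) {
    int n = 0;
    while (fp != NULL && n < max) {
        void **next = (void **)fp[0];
        void *ra = fp[1];
        if (ra == NULL) break;                               /* stack origin */
        if (skip > 0) skip--; else buf[n++] = ra;
        if (next == NULL) break;                             /* end of chain */
        if ((uintptr_t)next <= (uintptr_t)fp) break;         /* cycle or garbage */
        if ((uintptr_t)next % sizeof(void *) != 0) break;    /* misaligned link */
        fp = next;
    }
    return n;
}

/* Stand-ins for three call-site addresses. */
static int leaf_site, mid_site, root_site;

/* Build leaf -> mid -> root -> origin and walk it. Array rows sit at
 * increasing addresses, mimicking caller frames on a downward-growing stack. */
int demo_capture(void **buf, int max, int skip) {
    void *frames[4][2];
    frames[0][0] = frames[1]; frames[0][1] = &leaf_site;
    frames[1][0] = frames[2]; frames[1][1] = &mid_site;
    frames[2][0] = frames[3]; frames[2][1] = &root_site;
    frames[3][0] = NULL;      frames[3][1] = NULL;
    return walk_frame_chain(frames[0], buf, max, skip);
}
```

Indexing the record as void** is what removes the per-arch offset table: index 1 lands on the saved return address whether pointers are 4 or 8 bytes wide.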
Tests (L2) [done]
- test/rt/cases/backtrace_capture.c: a known-depth @[.noinline] recursion; asserts depth, all return addresses non-null, that the recursive frames share a call site (proving the walk follows the chain), and the skip/max bounds; return 42 on success. Runs under test/rt/run.sh on aa64/x64/rv64.
L3 — Symbolize & print: __kit_print_backtrace
This is where the freestanding boundary bites: turning an address into
func at file:line needs the DWARF reader, which is libkit, not rt. Three
sub-options, ordered by how cleanly they respect that boundary. Recommend
shipping L3a now, leaving L3b/L3c as documented extensions.
- L3a: raw print + out-of-process symbolization (shipped, WS4). __kit_print_backtrace() lives in rt (rt/lib/stack/print_backtrace.c), walks via __kit_backtrace, and writes raw lines (#0 0x401136, …) to a host-provided sink (the weak __kit_backtrace_write(const char*, size_t) the host or _start wires to write(2); freestanding default is a no-op). Symbolization is a separate hosted step through either kit addr2line (bare addresses) or kit symbolize (the raw #N 0x<hex> stream, annotated in place; shipped, see Status). Both share the DWARF-open + func/line core in driver/lib/dwarfsym.c and reuse the existing reader, so the freestanding image carries zero new symbolization code, matching how minimal panic handlers work in the wild.
- L3b: in-process self-symbolization (hosted-only). A trimmed line/func reader (reusing kit_dwarf_addr_to_line + kit_dwarf_func_at) linked into a hosted-only archive (e.g. libkit_bt.a or a *-hosted rt variant) that opens the running image's own DWARF. Heavy (drags in the DWARF reader and an image-self-map); strictly opt-in, never in the freestanding default. Only build if a concrete consumer needs in-binary symbolized panics.
- L3c: tool-side auto-backtrace. kit run / kit emu / dbg already own a DWARF reader and the dbg bt rendering path (driver/cmd/dbg.c:1010). Hook their fault/trap handlers (e.g. the EMU_TRAP_FAULT → compiler_panic site in src/emu/emu.c) to print a symbolized backtrace automatically. This is the highest-value, lowest-risk symbolized experience because it reuses everything and never crosses into rt. Largely independent of L1/L2 (the original plan was for the tools to unwind via their own session memory + kit_dwarf_unwind_step). As shipped for kit run and kit dbg (see Status), the walk follows the FP chain instead, because kit_dwarf_unwind_step takes no memory provider.
Tests (L3)
- L3a [done]: test/rt/addr2line.sh (+ addr2line_prog.c) runs a kit-compiled program that prints its own trace, then symbolizes the captured stream two ways: kit addr2line -f over the bare addresses, and kit symbolize over the raw #N 0xADDR stream (asserting the #N framing survives and <func> at file:line is appended), checking that bt_leaf/bt_mid/bt_root/test_main appear (make target test-rt-backtrace, aa64/x64/rv64). In-process companion: test/rt/cases/print_backtrace.c parses the emitted #N 0xADDR lines.
- L3c: a kit emu fault test asserting a symbolized frame line on stderr.
Suggested sequencing
- WS1 (L1 primitives, O0): all three native arches + parse/toy tests. ✅ done.
- WS2 (L1 at O1/O2): opt effect-modeling audit (turned out to need only the riscv frame-record fix) + O1 tests. ✅ done.
- WS3 (L2 __kit_backtrace in rt + capture test). ✅ done (assert hook moved to WS4 because it needs the L3 print fn).
- WS4 (L3a raw print: __kit_print_backtrace + weak __kit_backtrace_write sink, plus the kit addr2line round-trip; wire the assert hook). ✅ done.
- WS5 (kit symbolize hosted batching symbolizer over the #N 0x<hex> stream, sharing driver/lib/dwarfsym.c with addr2line; second lane of test/rt/addr2line.sh). ✅ done.
- WS5 (L3c tool-side auto-backtrace for kit run + kit dbg). ✅ done (kit emu deferred: no retained guest DWARF).
- L3b deferred until a consumer needs in-binary symbolized panics.
Open questions
None outstanding.
Resolved in WS4:
- Output sink for L3a: weak __kit_backtrace_write (no-op default) vs. requiring the host to pass a sink explicitly. Chose the weak default: it keeps freestanding builds linking with no sink, and a host / _start overrides it to route bytes to write(2) or a UART. (Resolved while building WS4.)
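The weak-default pattern behind that choice looks roughly like this. The sketch assumes a GCC- or Clang-compatible toolchain on a target with weak-symbol support, and it uses the hypothetical names demo_backtrace_write and demo_emit so it does not collide with the real __kit_backtrace_write.

```c
#include <stddef.h>

/* Weak no-op default. The linker picks this definition only when no other
 * object file provides a strong demo_backtrace_write, so an image that
 * never wires a sink still links and simply drops the bytes. */
__attribute__((weak)) void demo_backtrace_write(const char *bytes, size_t len) {
    (void)bytes;
    (void)len;
}

/* Callers emit through the sink without knowing which definition is linked.
 * Returns the number of bytes handed to the sink. */
size_t demo_emit(const char *bytes, size_t len) {
    demo_backtrace_write(bytes, len);
    return len;
}
```

A hosted program would override the default by defining a non-weak demo_backtrace_write in another translation unit, for example one that forwards the bytes to write(2) on file descriptor 2.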
Resolved while building L1/L2:
- wasm: diagnose unsupported (confirmed acceptable); the capability hook returns false and the C frontend emits a clean error. (The C target separately forwards __builtin_* to the host compiler.)
- rv64 frame-record layout: verified against rv_build_prologue. kit stores [s0+0] = caller fp, [s0+ptr] = saved ra (NOT the psABI ra@s0-8 / fp@s0-16), so the layout is uniform across targets.
- leaf-frame omission: handled by NativeKnownFrameDesc.reads_frame, which forces riscv off its frameless-leaf tier when these intrinsics appear; aa64/x64 always keep the frame record. (Level-0 reads the spilled slot via the FP, so no live-LR/RA fallback is needed.)