kit

git clone https://git.ryansepassi.com/git/kit.git

commit 47cede9aca1511ee8a697c37bbf98aab26bb4ee3
parent 0ae44208d15bba91c4d3a9188848cb3871bf02bd
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon,  8 Jun 2026 11:39:35 -0700

plan: clean out completed plans

Diffstat:
D doc/plan/BACKTRACE.md | 419 -------------------------------------------------------------------------------
D doc/plan/DIST_LIBRARY.md | 290 -------------------------------------------------------------------------------
D doc/plan/LTO.md | 534 -------------------------------------------------------------------------------
M doc/plan/README.md | 4 ----
D doc/plan/RELOC.md | 371 -------------------------------------------------------------------------------
5 files changed, 0 insertions(+), 1618 deletions(-)

diff --git a/doc/plan/BACKTRACE.md b/doc/plan/BACKTRACE.md
@@ -1,419 +0,0 @@
-# Plan: stack-trace builtins & runtime backtrace
-
-## Status — 2026-06-05 — L1 + L2 + L3a + `kit symbolize` + L3c (tool-side auto-backtrace) shipped (WS1–WS5); L3b remaining
-
-L3c (WS5) — **tool-side auto-backtrace for `kit run` and `kit dbg`** — is now
-shipped. Both tools print a symbolized frame-pointer-chain backtrace at a
-fault/trap, reusing the DWARF reader they already own and never crossing into
-`rt/`. The decisive finding: the CFI stepper `kit_dwarf_unwind_step`
-(`src/debug/dwarf_cfi.c:213`) takes **no memory provider**, so when the return
-address is spilled to the stack (the normal case) it returns pc=0 and the walk
-dies after the leaf — the existing `dbg bt` was effectively single-frame. The
-fix is to walk the **frame-pointer chain** (kit's uniform `fp[0]`=caller fp /
-`fp[1]`=saved ra record, no `.eh_frame` needed), the same walk `__kit_backtrace`
-does, lifted tool-side with a memory-read callback.
-
-- **Shared module** `driver/lib/backtrace.c` + `.h`: the FP-step kernel
-  (`driver_bt_fp_step`, with `__kit_backtrace`'s guards) + arch FP-reg/ptr-size
-  helpers + a PC-list symbolizer (`driver_backtrace_print_pcs`). Gated into the
-  DBG/RUN tool builds (`mk/driver_srcs.mk`). Walks stop at the **kit-image
-  boundary** (`kit_jit_runtime_to_image` == 0) so output ends at `main` and is
-  host-independent (no libc/dyld trampoline noise).
-- **`kit dbg`** (`driver/cmd/dbg.c`): `dbg_cmd_bt` now advances via the FP-step
-  kernel over `kit_jit_session_read_mem` (so it walks the whole stack, not just
-  the leaf), and `dbg_render_stop` auto-invokes it on `KIT_STOP_SIGNAL`
-  (faults + `__builtin_trap`/assert; not breakpoints/steps).
-- **`kit run`** (`driver/cmd/run.c` + `driver/env/posix_dbg.c`): a lightweight
-  in-process crash guard (`driver_run_with_crash_guard`) installs
-  SIGSEGV/SIGBUS/SIGILL/SIGFPE/SIGABRT/SIGTRAP handlers around the direct
-  `entry_fn` call, reusing the existing `dbg_ucontext_to_frame` marshalling.
-  Because `kit run` shares its stack with the program, the chain is captured
-  **inside the handler** (before the post-`siglongjmp` stack is reused) and
-  symbolized afterward in normal context; the process exits `128 + signo`.
-  Windows has a no-op stub (vectored-handler port is a follow-up).
-- **Tests:** `test/dbg/cases/toy-trap-backtrace` (multi-frame trap → auto-bt +
-  `bt`), updated `toy-trap-stop` golden, and a `kit run` crash lane in
-  `test/driver/run.sh` (`run-backtrace-*`: non-zero exit + symbolized
-  `bt_leaf/bt_mid/bt_root` + source file).
-
-Scope note: `kit emu` auto-backtrace is **out of scope** (the emulator doesn't
-retain the guest's DWARF after load); left as a follow-up alongside L3b.
-
-L3a (WS4) is now shipped on top of L1/L2:
-
-- **L3a print** `__kit_print_backtrace` — `rt/lib/stack/print_backtrace.c` walks
-  via `__kit_backtrace(buf, 64, skip=1)` (skip hides the print frame, so `#0` is
-  the caller) and writes one raw `#N 0x<hex>` line per frame to the **weak**
-  `__kit_backtrace_write(const char*, size_t)` sink. Integer/hex formatting is
-  hand-rolled (no printf/libc pulled into the panic path); the address uses
-  `uintptr_t` so it is not truncated on LLP64. Declared in
-  `rt/include/kit/backtrace.h`; added to `RT_BASE_SRCS`.
-- **Output sink (open question resolved):** weak no-op `__kit_backtrace_write`
-  default, so freestanding images that never wire a sink still link; the host /
-  `_start` overrides it to route bytes to `write(2)` (or a UART). Chosen over a
-  mandatory explicit-sink param to keep freestanding builds link-clean.
-- **Assert hook (deferred from L2)** — `rt/lib/assert/assert.c::__kit_assert_fail`
-  now emits a `kit: assertion failed: <expr>, file <file>, line <line>, function
-  <func>` banner then `__kit_print_backtrace()` before `__builtin_trap()`, all
-  through the same weak sink (printf-free). Pulling `__kit_assert_fail` therefore
-  also pulls `print_backtrace.o` → `backtrace.o` from the archive — the intended
-  wiring.
-- **Symbolization** is out-of-process via two hosted tools that share one
-  DWARF-open + func/line core (`driver/lib/dwarfsym.c`):
-  - `kit addr2line` — the faithful GNU/LLVM clone (bare addresses in,
-    `file:line` out), unchanged in contract.
-  - `kit symbolize` (`driver/cmd/symbolize.c`, **shipped**) — reads the raw
-    `#N 0x<hex>` stream `__kit_print_backtrace` emits, finds the address on each
-    line, resolves it through the same DWARF reader, and rewrites the line in
-    place as `#0 0x401136 bt_leaf at addr2line_prog.c:51:3`, keeping the `#N`
-    framing addr2line structurally can't. Lines with no address pass through
-    verbatim. A single `-e <image>` today; multi-`-e`/module-map (for `libc.so`
-    frames that need their own load slide) is the natural extension.
-  Verified round-trip: a static non-PIE ELF prints its own trace at runtime, and
-  the captured addresses resolve `bt_leaf`/`bt_mid`/`bt_root`/`test_main` through
-  both `kit addr2line -f -e <image>` and `kit symbolize -e <image>` (outer
-  no-`-g` frames show `??`).
-
-Tests (L3a): `test/rt/cases/print_backtrace.c` (in-process parse of the emitted
-`#N 0xADDR` lines, aa64/x64/rv64 under exec, exit 42) and `test/rt/addr2line.sh`
-+ `test/rt/addr2line_prog.c` (the symbolization round-trip, make target
-`test-rt-backtrace`). The round-trip script runs the captured stream through
-**both** lanes per arch/opt: `kit addr2line -f` over the bare addresses, and
-`kit symbolize` over the raw `#N 0xADDR` stream (asserting the `#N` framing is
-preserved and `<func> at file:line` is appended).
-`test/rt/smoke.c` also
-includes `<kit/backtrace.h>` so the header compiles on every rt-header target.
-
-**Opt coverage — the backtrace path passes at O0 *and* O1 on all three arches.**
-The rt-runtime corpus (`test/rt/run.sh`) and the addr2line round-trip
-(`test/rt/addr2line.sh`) now sweep both opt levels (`KIT_RT_OPT_LEVELS`), so
-`backtrace_capture` (L2) and `print_backtrace` (L3a) are exercised against
-optimized callers — all green at O0/O1. Sweeping O1 also surfaced **two
-unrelated, pre-existing kit bugs**, left red (not skipped) and logged in
-doc/plan/TODO.md: (1) **x86-64 `-g -O1` + the 4-operand register-pinned syscall
-idiom** aborts the compiler (`too many memory asm operands`,
-`src/arch/x64/native.c:4014`) — this is why the `x64/O1` lane of
-`test-rt-backtrace` is red, though x64/O1 backtrace correctness is still proven
-by `print_backtrace`/`backtrace_capture` (no asm); (2) **setjmp/longjmp is
-miscompiled at `-O1`** on every arch (`setjmp_runtime/O1` returns 1, not 42 —
-the second-return value isn't observed), failing `test-rt-runtime`.
-
-Remaining: **L3b** in-process self-symbolization (and the deferred `kit emu`
-auto-backtrace). WS5/L3c (tool-side auto-backtrace) is **done** — see Status.
-
-Implemented and tested through L2:
-
-- **L1 builtins** `__builtin_frame_address` / `__builtin_return_address` — two
-  CG intrinsics (`KIT_CG_INTRIN_FRAME_ADDRESS` / `_RETURN_ADDRESS`), constant
-  level carried as a single IMM operand, lowered as an unrolled FP walk on
-  aarch64 / x86-64 / riscv (O0 and O1, same backend handler). The C target
-  forwards `__builtin_*` to the host compiler; wasm reports unsupported; the C
-  frontend validates the level via `eval_const_int`.
-- **L1 O1 modeling** — `IR_INTRINSIC` is already conservatively side-effecting
-  in opt (never DCE'd / CSE'd / hoisted), so no new effect modeling was needed.
-  The one real O1 hazard — riscv's frameless-leaf tier (`slim_prologue`) emits
-  no prologue and never anchors `s0` — is handled by a new
-  `NativeKnownFrameDesc.reads_frame` flag set during frame analysis when these
-  intrinsics appear; aarch64/x64 keep the frame record in every prologue shape,
-  so they need no change.
-- **L2 capture** `__kit_backtrace` — `rt/lib/stack/backtrace.c` +
-  `rt/include/kit/backtrace.h`, in `RT_BASE_SRCS` for every variant.
-
-Open questions resolved while building:
-
-- **rv64 frame-record layout** — the psABI `ra@s0-8` / `fp@s0-16` guess in the
-  L2 sketch is *wrong for kit*. kit's prologue stores the pair at and above s0:
-  `[s0+0] = caller fp`, `[s0+ptr] = saved ra` (verified against
-  `rv_build_prologue`). So the layout is uniform across all kit targets
-  (`fp[0]`/`fp[1]` in units of `void*`) — `__kit_backtrace` needs no per-arch
-  offset table at all, just index 0 and 1.
-- **wasm** — diagnose unsupported (confirmed acceptable); the capability hook
-  returns false and the C frontend emits a clean error.
-- **leaf-frame omission** — handled via `reads_frame` (above).
-
-Tests: `test/rt/cases/backtrace_capture.c` (aa64/x64/rv64 under exec),
-`test/parse/cases/builtin_29..31_*` (+ `cases_err/..._nonconst`) across the
-D/R/E/J/C lanes at O0/O1, `test/toy/cases/154_frame_return_address.toy`.
-
-### Remaining tasks (L3)
-
-Nothing in L1/L2/L3a is outstanding. What's left is the rest of L3:
-
-- ~~**WS4 — L3a:**~~ **done** (see Status) — `__kit_print_backtrace()` + weak
-  `__kit_backtrace_write` sink + assert-path hook + `kit addr2line` round-trip.
-- ~~**WS5 — `kit symbolize`:**~~ **done** (see Status) — the hosted batching
-  symbolizer that reads the `#N 0x<hex>` stream and annotates it in place,
-  sharing `driver/lib/dwarfsym.c` with `addr2line`. Tested by the second lane of
-  `test/rt/addr2line.sh`.
-- ~~**WS5 — L3c (tool-side auto-backtrace):**~~ **done** (see Status) —
-  `kit run` + `kit dbg` auto-print a symbolized FP-chain backtrace at a
-  fault/trap via the shared `driver/lib/backtrace.c`; truncated at the kit-image
-  boundary; never crosses into rt. `kit emu` auto-backtrace remains deferred (it
-  doesn't retain the guest DWARF).
-- **L3b:** in-process self-symbolization (hosted-only `libkit_bt.a`); deferred
-  until a concrete consumer needs in-binary symbolized panics.
-
-All Open-questions items are now resolved (the L3a output sink chose the weak
-default — see Open questions).
-
-## Overview
-
-kit has no way for compiled code to inspect its own call stack. This roadmap
-adds that capability in three layers: GCC-compatible primitive **builtins**
-(`__builtin_return_address`, `__builtin_frame_address`), a freestanding runtime
-**capture** function (`__kit_backtrace`), and a **symbolizing print** path
-(`__kit_print_backtrace`) that turns return addresses into `func at file:line`.
-
-Matching design docs once shipped: [../FRONTENDS.md](../FRONTENDS.md) (the
-builtins), [../RUNTIME.md](../RUNTIME.md) (the rt helpers), [../DWARF.md](../DWARF.md)
-(symbolization).
-
-## Why
-
-- **Portability.** `__builtin_return_address` / `__builtin_frame_address` are a
-  de-facto part of the GCC/Clang surface. Real C code (libc `backtrace`,
-  sanitizer shims, allocators, profilers, `unwind`-free panic handlers) uses
-  them; kit currently can't compile any of it.
-- **Diagnostics.** `__kit_assert_fail` (`rt/lib/assert/assert.c`) and the
-  emulator fault path (`src/emu/emu.c`, `compiler_panic`) currently die silently
-  with `__builtin_trap()`. A backtrace at the trap point is the single biggest
-  debuggability win for kit-compiled programs.
-- **It is cheap here, specifically.** kit maintains a frame pointer on **every**
-  backend and has **no `-fomit-frame-pointer`** (x29 on aarch64, rbp on x64,
-  s0/x8 on rv64; `AA_FP = 29` at `src/arch/aa64/native.c:61`).
-  Every prologue
-  stores a `{saved_fp, saved_ra}` frame record. Frame-pointer-chain walking is
-  therefore *reliable*, with no unwind tables and no `.eh_frame` dependency.
-
-## What already exists (and what it can't do)
-
-- **`.eh_frame` CFI** is emitted by default for hosted targets
-  (`src/arch/mc.c:736`, `mc_emit_eh_frame`), and **off for freestanding**.
-- **A CFI unwinder**, `kit_dwarf_unwind_step` (`src/debug/dwarf_cfi.c:213`),
-  interprets FDE/CIE programs — but deliberately takes **no memory provider**, so
-  it *cannot self-unwind a live stack*. It is built for the dbg/JIT path where a
-  session reads target memory out-of-band (`driver/cmd/dbg.c:1010`, the `bt`
-  command). It is not a candidate for in-process capture.
-- **Symbolization** (`kit_dwarf_addr_to_line`, `kit_dwarf_func_at`,
-  `include/kit/dwarf.h:21`) is mature — it backs `addr2line`
-  (`driver/cmd/addr2line.c`) and `dbg bt`. But it lives in **`libkit.a`, not the
-  freestanding runtime `rt/`**. Pulling it into a freestanding image is a
-  non-goal (see L3).
-- **The runtime has zero unwind/backtrace code today.** `rt/lib/stack/` exists
-  but holds only the Windows `chkstk` helper — a natural home for the new
-  capture code.
-
-Design consequence: **capture via the FP chain; symbolize via the existing
-DWARF reader, kept on the hosted side of the boundary.** Do *not* reuse the CFI
-unwinder for self-capture.
-
----
-
-## L1 — Primitive builtins (`__builtin_return_address`, `__builtin_frame_address`)
-
-GCC semantics: `__builtin_frame_address(n)` returns the frame address of the
-current function (n=0) or its n-th caller; `__builtin_return_address(n)` returns
-the return address into that frame. The level argument **must be an integer
-constant** (kit validates via the existing `eval_const_int()` path, as
-`__builtin_offsetof` already does at `parse_expr.c:1331`). Out-of-range / runaway
-walks are allowed to return a garbage-but-safe value or 0, matching GCC's "use 0
-only with care" contract.
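As an illustration of the GCC semantics described in the removed plan text, the following is a minimal sketch that calls the two builtins at level 0 under a host GCC or Clang. The wrapper names `current_frame`, `caller_pc`, and `probe_demo` are hypothetical and do not come from kit; the `noinline` attribute is an assumption made here so that each wrapper keeps a real call frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: frame address of this function (level 0).
 * The level must be an integer constant; a variable is rejected. */
__attribute__((noinline))
static void* current_frame(void) {
    return __builtin_frame_address(0);
}

/* Hypothetical wrapper: address this function will return to (level 0). */
__attribute__((noinline))
static void* caller_pc(void) {
    return __builtin_return_address(0);
}

/* Small demo: both values should be non-null on a hosted target.
 * Returns 1 so a caller can check that it ran. */
static int probe_demo(void) {
    void* fp = current_frame();
    void* ra = caller_pc();
    assert(fp != NULL);
    assert(ra != NULL);
    return 1;
}
```

Only level 0 is exercised here because, as the text notes, higher levels may return garbage or 0 when the walk runs off the stack.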
-
-### Lowering: two new CG intrinsics, FP-chain only
-
-Add to `KitCgIntrinsic` (`include/kit/cg.h:916`):
-
-```
-KIT_CG_INTRIN_FRAME_ADDRESS, /* pop level(u32 const); push void* */
-KIT_CG_INTRIN_RETURN_ADDRESS, /* pop level(u32 const); push void* */
-```
-
-Both lower through one shared FP-walk so level 0 and level N use the same path,
-and so level 0's return address comes from the **spilled** frame-record slot (not
-the live LR/RA, which may be clobbered mid-function):
-
-| arch | FP reg | `frame(0)` | walk one frame | return addr from frame F |
-|------|--------|-----------|----------------|--------------------------|
-| aarch64 | x29 | x29 | `fp = *(fp)` | `*(fp + 8)` (saved x30) |
-| x86-64 | rbp | rbp | `fp = *(fp)` | `*(fp + 8)` (pushed retaddr) |
-| rv64 | s0/x8 | s0 | `fp = *(fp)` | `*(fp + ptr)` (saved ra) |
-
-The table is **uniform** across kit's targets: the prologue stores
-`[fp+0] = caller fp`, `[fp+ptr] = saved ra` everywhere (verified against
-`rv_build_prologue` — note this differs from the RISC-V psABI's `ra@s0-8` /
-`fp@s0-16`, which an early draft of this table wrongly assumed).
-
-For a constant level the walk unrolls to `level` dependent loads (typically 0–2),
-so no loop is emitted. wasm has no FP chain → **diagnose unsupported**, exactly
-as the IRQ/cache intrinsics already do per-arch.
-
-### Files to touch (the standard "new value-producing intrinsic" path)
-
-- `include/kit/cg.h` — two enum entries + doc comments.
-- `src/cg/arith.c:1726` — two rows in the `KitCgIntrinsic → INTRIN_*` table.
-- `src/cg/cgtarget.h:148` — two `INTRIN_*` enum entries
-  (`INTRIN_FRAME_ADDRESS`, `INTRIN_RETURN_ADDRESS`).
-- `lang/c/parse/parse_priv.h:231` + `parse.c:1526` — intern
-  `__builtin_return_address` / `__builtin_frame_address` symbols.
-- `lang/c/parse/parse_expr.c` (in `try_parse_builtin_call`, ~1696–2018) — two
-  handlers: parse the constant level via `eval_const_int`, then emit the
-  intrinsic with result type `void*`.
-  New `cg_adapter.c` helper
-  `pcg_frame_or_return_address(p, kind, level)`.
-- Per-arch O0 lowering: `src/arch/aa64/native.c` (~3572), `src/arch/x64/native.c`
-  (~3378), `src/arch/riscv/native.c` (~2992) — emit the FP-walk loads;
-  `src/arch/wasm/emit.c` (~1590) + `src/arch/c_target/c_emit.c` (~2603) — handle
-  or diagnose (C target can emit `__builtin_*` straight through to the host
-  compiler).
-- Capability hooks: `src/arch/{aa64,x64,riscv,wasm}/arch.c` (alongside the
-  existing `KIT_CG_INTRIN_TRAP` cases at e.g. `aa64/arch.c:197`).
-- **Optimizer (O1/O2) [done]:** in practice no new effect modeling was needed —
-  `IR_INTRINSIC` is already conservatively side-effecting in opt (never DCE'd,
-  CSE'd, or hoisted; see `pass_dce.c`), and the FP it reads is stable across the
-  whole function, so scheduling is harmless. The one real O1 hazard turned out to
-  be a backend frame issue, not an opt-modeling one: riscv's frameless-leaf tier
-  (`slim_prologue`) emits no prologue and never anchors `s0`, so a leaf that reads
-  its own frame would walk a stale `s0`. Fixed with a `NativeKnownFrameDesc.reads_frame`
-  flag set in `pass_native_emit.c` frame analysis and ANDed into riscv's
-  `slim_prologue` decision; aarch64/x64 keep the frame record in every prologue
-  shape, so they need nothing. O1 smoke tests run on all three arches.
-
-### Tests (L1) [done]
-
-- `test/toy/cases/154_frame_return_address.toy` — CG-API case exercising both
-  intrinsics at levels 0/1/2 (`@[.noinline]` chain pins the depth).
-- `test/parse/cases/builtin_29_return_address.c`, `builtin_30_frame_address.c`,
-  and `builtin_31_return_address_anchor.c`; error case
-  `cases_err/builtin_return_address_nonconst.c` for a non-constant level.
-- The plan's "anchor in caller's range" smoke check is `builtin_31` (run via the
-  parse harness's qemu/podman exec lane on x64 + aa64 + rv64 at **O0 and O1**),
-  not a `test/smoke` script.
-  It anchors on the **caller's function address**, not
-  a `&&label`: GNU labels-as-values whose address is taken but never `goto`'d
-  break at O1 (`undefined reference to '.Lcfblk.N'`; see doc/plan/TODO.md).
-
----
-
-## L2 — Capture: `__kit_backtrace` (freestanding runtime fn)
-
-Surface decision (confirmed): **primitives are builtins; capture/print are
-runtime functions**, mirroring the GCC-builtin / glibc-`backtrace` split. New
-freestanding TU `rt/lib/stack/backtrace.c`, declared in a new public runtime
-header `rt/include/kit/backtrace.h`:
-
-```c
-/* Fill buf[0..max) with return addresses, innermost first, skipping the
- * innermost `skip` frames (skip >= 1 hides __kit_backtrace itself).
- * Returns the number of frames written. Freestanding: pure FP walk, no libc,
- * no DWARF, works on every target that keeps a frame pointer (all of kit's). */
-int __kit_backtrace(void** buf, int max, int skip);
-```
-
-Implementation is the L1 walk expressed in portable C: seed from
-`__builtin_frame_address(0)`, then loop `fp = *(void**)fp` reading the saved-RA
-slot, stopping on a NULL saved-RA (the synthetic stack origin), a NULL or
-non-increasing fp (stack grows down — detect cycles/garbage), a misaligned link,
-or `max`. **No per-arch knob is needed:** kit's frame layout is uniform, so the
-walk indexes `fp[0]` (caller fp) and `fp[1]` (saved ra) as `void**`, which scales
-to the target pointer width automatically — no offset table, no `#ifdef` cascade.
-`skip` discards the innermost N frames (a print wrapper passes `skip >= 1`).
-
-- `mk/rt.mk` — added `rt/lib/stack/backtrace.c` to `RT_BASE_SRCS` (built for
-  every variant; `rt/lib/stack/` already compiled the Windows chkstk helper).
-- **Assert-path hook — landed in WS4 (was deferred):**
-  `rt/lib/assert/assert.c::__kit_assert_fail` now emits a banner +
-  `__kit_print_backtrace()` before `__builtin_trap()`. It needed the L3
-  `__kit_print_backtrace()`, so it shipped with WS4 rather than L2.
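The walk described in the L2 text (read `fp[1]` as the saved return address, follow `fp[0]` to the caller, stop on a NULL return address, a NULL or non-increasing link, a misaligned link, or `max`) can be sketched over a synthetic chain. This is an assumption-laden illustration, not kit's `rt/lib/stack/backtrace.c`: `fp_walk` and `demo_walk` are hypothetical names, the chain is a plain array rather than a live stack, and `skip` here simply discards the first records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: walk frame records shaped as
 * fp[0] = caller fp, fp[1] = saved return address. */
static int fp_walk(void** fp, void** buf, int max, int skip) {
    int n = 0;
    assert(buf != NULL || max <= 0);
    while (fp != NULL && n < max) {
        if (((uintptr_t)fp % sizeof(void*)) != 0)
            break;                        /* misaligned link */
        void* ra = fp[1];
        if (ra == NULL)
            break;                        /* synthetic stack origin */
        if (skip > 0)
            skip--;                       /* hide the innermost frames */
        else
            buf[n++] = ra;
        void** next = (void**)fp[0];
        /* Stack grows down, so a caller's record must sit at a higher
         * address; anything else is a cycle or garbage. */
        if (next != NULL && (uintptr_t)next <= (uintptr_t)fp)
            break;
        fp = next;
    }
    return n;
}

/* Build a three-record chain in an array and walk it.
 * If make_cycle is nonzero, the second record points back at the first. */
static int demo_walk(int skip, int max, int make_cycle, void** out) {
    static void* stack[6];
    stack[0] = &stack[2]; stack[1] = (void*)(uintptr_t)0x1001;
    stack[2] = &stack[4]; stack[3] = (void*)(uintptr_t)0x1002;
    stack[4] = NULL;      stack[5] = (void*)(uintptr_t)0x1003;
    if (make_cycle)
        stack[2] = &stack[0];
    return fp_walk(stack, out, max, skip);
}
```

Indexing the record as `void**` is what lets the same loop serve every pointer width, which matches the "no per-arch knob" point in the text; the monotonicity check is what ends the cyclic case after the second record.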
-
-### Tests (L2) [done]
-
-- `test/rt/cases/backtrace_capture.c` — a known-depth `@[.noinline]` recursion;
-  asserts depth, all return addresses non-null, that the recursive frames share a
-  call site (proving the walk follows the chain), and the `skip`/`max` bounds;
-  `return 42` on success. Runs under `test/rt/run.sh` on aa64/x64/rv64.
-
----
-
-## L3 — Symbolize & print: `__kit_print_backtrace`
-
-This is where the freestanding boundary bites: turning an address into
-`func at file:line` needs the DWARF reader, which is **libkit, not rt**. Three
-sub-options, ordered by how cleanly they respect that boundary. Recommend
-shipping **L3a now**, leaving L3b/L3c as documented extensions.
-
-- **L3a — raw print + out-of-process symbolization (shipped — WS4).**
-  `__kit_print_backtrace()` lives in rt (`rt/lib/stack/print_backtrace.c`), walks
-  via `__kit_backtrace`, and writes raw lines (`#0 0x401136`, …) to a
-  host-provided sink (the weak `__kit_backtrace_write(const char*, size_t)` the
-  host or `_start` wires to `write(2)`; freestanding default is a no-op).
-  Symbolization is a separate hosted step through either `kit addr2line` (bare
-  addresses) or `kit symbolize` (the raw `#N 0x<hex>` stream, annotated in place
-  — **shipped**; see Status). Both share the DWARF-open + func/line core in
-  `driver/lib/dwarfsym.c` and reuse the existing reader, so the freestanding
-  image carries zero new symbolization code, matching how minimal panic handlers
-  work in the wild.
-
-- **L3b — in-process self-symbolization (hosted-only).** A trimmed line/func
-  reader (reusing `kit_dwarf_addr_to_line` + `kit_dwarf_func_at`) linked into a
-  **hosted-only** archive — e.g. `libkit_bt.a` or a `*-hosted` rt variant — that
-  opens the running image's own DWARF. Heavy (drags in the DWARF reader and an
-  image-self-map); strictly opt-in, never in the freestanding default. Only build
-  if a concrete consumer needs in-binary symbolized panics.
-
-- **L3c — tool-side auto-backtrace.** `kit run` / `kit emu` / `dbg` already own a
-  DWARF reader and the `dbg bt` rendering path (`driver/cmd/dbg.c:1010`). Hook
-  their fault/trap handlers (e.g. the `EMU_TRAP_FAULT` → `compiler_panic` site in
-  `src/emu/emu.c`) to print a symbolized backtrace automatically. This is the
-  highest-value, lowest-risk symbolized experience because it reuses everything
-  and never crosses into rt. Largely independent of L1/L2 (the tools can unwind
-  via their own session memory + `kit_dwarf_unwind_step`).
-
-### Tests (L3)
-
-- L3a [done]: `test/rt/addr2line.sh` (+ `addr2line_prog.c`) runs a kit-compiled
-  program that prints its own trace, then symbolizes the captured stream two
-  ways — `kit addr2line -f` over the bare addresses, and `kit symbolize` over the
-  raw `#N 0xADDR` stream (asserting the `#N` framing survives and `<func> at
-  file:line` is appended) — checking `bt_leaf`/`bt_mid`/`bt_root`/`test_main`
-  appear (make target `test-rt-backtrace`, aa64/x64/rv64). In-process companion:
-  `test/rt/cases/print_backtrace.c` parses the emitted `#N 0xADDR` lines.
-- L3c: an `kit emu` fault test asserting a symbolized frame line on stderr.
-
----
-
-## Suggested sequencing
-
-1. **WS1 — L1 primitives, O0** — all three native arches + parse/toy tests. ✅ done.
-2. **WS2 — L1 at O1/O2** — opt effect-modeling audit (turned out to need only the
-   riscv frame-record fix) + O1 tests. ✅ done.
-3. **WS3 — L2 `__kit_backtrace`** in rt + capture test. ✅ done (assert-hook moved
-   to WS4 — it needs the L3 print fn).
-4. **WS4 — L3a** raw print (`__kit_print_backtrace` + weak `__kit_backtrace_write`
-   sink) + `kit addr2line` round-trip; wire the assert hook. ✅ done.
-5. **WS5 — `kit symbolize`** hosted batching symbolizer over the `#N 0x<hex>`
-   stream, sharing `driver/lib/dwarfsym.c` with `addr2line`; second lane of
-   `test/rt/addr2line.sh`. ✅ done.
-6. **WS5 — L3c** tool-side auto-backtrace for `kit run` + `kit dbg`.
-   ✅ done
-   (`kit emu` deferred — no retained guest DWARF).
-7. **L3b** deferred until a consumer needs in-binary symbolized panics.
-
-## Open questions
-
-None outstanding.
-
-Resolved in WS4:
-
-- ~~**Output sink for L3a:**~~ weak `__kit_backtrace_write` (no-op default) vs.
-  requiring the host to pass a sink explicitly. **Chose the weak default** — it
-  keeps freestanding builds linking with no sink, and a host / `_start`
-  overrides it to route bytes to `write(2)` or a UART. (Resolved while building
-  WS4.)
-
-Resolved while building L1/L2:
-
-- ~~**wasm:**~~ diagnose unsupported — confirmed acceptable; the capability hook
-  returns false and the C frontend emits a clean error. (C target separately
-  forwards `__builtin_*` to the host compiler.)
-- ~~**rv64 frame-record layout:**~~ verified against `rv_build_prologue` — kit
-  stores `[s0+0]=caller fp`, `[s0+ptr]=saved ra` (NOT the psABI `ra@s0-8` /
-  `fp@s0-16`), so the layout is uniform across targets.
-- ~~**leaf-frame omission:**~~ handled by `NativeKnownFrameDesc.reads_frame`, which
-  forces riscv off its frameless-leaf tier when these intrinsics appear; aa64/x64
-  always keep the frame record. (Level-0 reads the spilled slot via the FP, so no
-  live-LR/RA fallback is needed.)
diff --git a/doc/plan/DIST_LIBRARY.md b/doc/plan/DIST_LIBRARY.md
@@ -1,290 +0,0 @@
-# Distribution as a library subsystem
-
-> **Status: implemented.** The migration below has landed (one commit): the
-> dist subsystem moved to `src/dist/` (+ top-level `vendor/`), exposed through
-> `<kit/cas.h>` / `<kit/package.h>` (`src/api/{cas,package}.c`), gated by
-> `KIT_CAS_ENABLED` / `KIT_PKG_ENABLED`; `kit cas` / `kit pkg` are thin
-> CLIs over the public API via a `KitCasHost` vtable (`driver/lib/dist_host.c`),
-> with operational errors flowing through `ctx->diag`. Verified green:
-> `test-driver-cas` (41) + `test-driver-pkg` (182) under ASan/UBSan.
->
-> **Deferred:** the v2 deletion + `3`-suffix rename (Stage 3, below) were *not*
-> done — the dead v2 code was carried over unchanged. On inspection the deletion
-> is more surgical than the line-ranges below imply: the v2 *extern* surface
-> (`DistManifest`/`DistArtifact`/`DistDependency`, `dist_manifest_*`,
-> `dist_kpkg2_*`, the v2 `DistKpkg*` structs + `dist_kpkg_*` v2 codecs) is
-> safely unreferenced, but the v2 and v3 manifest parsers **share** the static
-> helpers `set_err` / `trim_lead` / `trim_trail` / `copy_field` / `kind_valid`
-> in `src/dist/manifest.c` (only `parse_u64`, the v2 `finalize`, and
-> `dist_manifest_path_valid` are v2-only). The cleanup must keep the shared
-> helpers — verify the same in `src/dist/kpkg.c` — and recompile + rerun the
-> cas/pkg suites after.
-
-Signed, content-addressed distribution (`kit cas` / `kit pkg`) is today the
-only major capability that lives **entirely inside `driver/`** — its model,
-its vendored crypto/compression, and its create/verify/unpack pipelines all sit
-under `driver/dist/` and `driver/cmd/{cas,pkg}.c`. Every other capability is a
-libkit subsystem behind `include/kit/`, with the CLI tool a thin
-arg-parser on top. This doc captures the plan to bring distribution into the
-same shape: move the implementation into the library, expose it through two
-public headers, and reduce `cas.c`/`pkg.c` to flag-parsing + host wiring. The
-design it realizes is in [../DISTRIBUTE.md](../DISTRIBUTE.md); the precedent it
-follows is the `ar` subsystem (`src/api/archive.c` + `include/kit/archive.h`,
-gated by `KIT_AR_ENABLED` distinct from `KIT_TOOL_AR_ENABLED`).
-
-## Goal
-
-`libkit.a` gains a content-store API and a signed-package API, gated by their
-own subsystem flags so a minimal embedding pays nothing for them. The `kit
-cas` and `kit pkg` tools become thin CLIs that translate flags into public
-calls and supply host vtables — exactly like `ar`, `ld`, `objdump`.
-An embedder
-can create, sign, verify, inspect, and unpack packages, and drive a CAS, without
-the driver and without linking host crypto/compression.
-
-## Why this is the right shape (not CLI-only logic)
-
-Two layers are stacked under `driver/dist/`, and they have very different
-readiness:
-
-- **The `dist_*` byte model** (`driver/dist/*.c`, ~6.4k lines plus ~6.7k
-  vendored) is **already written to the public boundary's contract.** It
-  includes only `<kit/core.h>` plus its own headers — no `driver.h`/`env.h`.
-  It sources no entropy and does no I/O except through `KitWriter` callbacks
-  and a small host vtable (`DistCasHost` = `KitFileIO` + `mkdir_p` +
-  `mark_executable`). This obeys the "host supplies all side effects" principle
-  verbatim. Moving it is near-mechanical.
-
-- **The `pkg_*` / `cas_*` orchestration** (`driver/cmd/pkg.c` 2123 lines,
-  `cas.c` 491 lines) holds the valuable pipelines — `pkg_create_targz`,
-  `pkg_create_kpkg`, `pkg_verify_portable`, `pkg_verify_native`, blob
-  reconstruction, trust/key resolution — but is entangled with the CLI. The
-  glue to unwind, by call count in `pkg.c`:
-  - `driver_errf` ×88 — stderr error reporting → structured error returns / the
-    `KitContext` diag sink (the `dist_*` parsers already take
-    `char* err, size_t errcap`).
-  - `driver_mkdir_p`, `driver_mark_executable_output`,
-    `driver_walk_regular_files` — host filesystem ops beyond `KitFileIO`.
-  - `driver_random_bytes` ×2 — host CSPRNG, only for keygen.
-  - `driver_getenv` ×2 — trust-file path defaulting
-    (`$KIT_TRUSTED_KEYS` / `$HOME`); env-var *policy* that **stays in the
-    driver**.
-  - `driver_streq` / `driver_printf` / `driver_has_suffix` — arg parsing and
-    stdout formatting; **stay in the driver**.
-
-The layering invariant forces the move: `driver/` may include only
-`<kit/*.h>`, and `src/api` may not include `driver/` headers — so a public
-boundary is impossible while the code sits in `driver/dist/`.
-Relocating to
-`src/` is a precondition, not a cleanup.
-
-## Target tree layout
-
-```
-vendor/ # top-level: pristine third-party trees
-  monocypher/ # (moved from driver/dist/vendor/monocypher)
-  lz4/ # (moved from driver/dist/vendor/lz4)
-include/kit/cas.h # content model: blob/tree hashing + CAS store
-include/kit/package.h # package model: manifest, sign/verify, create/unpack
-src/api/cas.c # public handles <-> internal (archive.c precedent)
-src/api/package.c
-src/dist/ # moved dist_* subsystem (private headers)
-  dist.{c,h} blob.{c,h} tree.{c,h} cas.{c,h}
-  manifest.{c,h} kpkg.{c,h} trust.{c,h}
-  blake2b.{c,h} ed25519.{c,h} minisig.{c,h} b64.{c,h}
-  deflate.{c,h} lz4.{c,h} tar.{c,h} # kit-maintained shims/extracts
-```
-
-Vendor split, confirmed by inspection: only **monocypher** and **lz4** are
-pristine third-party trees pulled in by `#include` — they move to a repo-root
-`vendor/`. `deflate.c` is a kit-maintained *extract* of miniz (already
-modified, not pristine), and `b64.c` / `tar.c` are self-contained — these stay
-in `src/dist/`. The shim includes that currently read
-`"vendor/monocypher/..."` (e.g. `blake2b.h`, `ed25519.c`) get rewritten to the
-new top-level path.
-
-## Config gating
-
-Add subsystem flags to `include/kit/config.h`, separate from the tool flags
-(mirroring `KIT_AR_ENABLED` vs `KIT_TOOL_AR_ENABLED`):
-
-```c
-#define KIT_CAS_ENABLED 1 /* content store: src/dist/{blob,tree,cas} + kit/cas.h */
-#define KIT_PKG_ENABLED 1 /* signed packages: adds manifest/kpkg/minisig/crypto + kit/package.h */
-```
-
-`KIT_PKG_ENABLED` implies `KIT_CAS_ENABLED` (packages are built over the
-content model). `KIT_TOOL_CAS_ENABLED` / `KIT_TOOL_PKG_ENABLED` stay and
-assert their subsystem flag. Off → the units (and the vendored crypto) drop
-entirely, so a minimal embedding carries no Ed25519/BLAKE2b/DEFLATE/LZ4. The
-Makefile's `LIB_SRCS_*` gains a dist regime that pulls `src/dist/*.c` plus the
-enabled `vendor/` trees.
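The flag relationships stated above (`KIT_PKG_ENABLED` implies `KIT_CAS_ENABLED`, and each tool flag asserts its subsystem flag) could be enforced with preprocessor checks along the following lines. This is a sketch under assumptions: the plan text does not say how `include/kit/config.h` expresses the assertions, the flag values are illustrative, and `dist_features` and `check_config` are hypothetical helpers added only to show the gating effect.

```c
#include <assert.h>

/* Illustrative values; the real defaults live in include/kit/config.h. */
#define KIT_CAS_ENABLED 1
#define KIT_PKG_ENABLED 1
#define KIT_TOOL_CAS_ENABLED 1
#define KIT_TOOL_PKG_ENABLED 1

/* Packages are built over the content model. */
#if KIT_PKG_ENABLED && !KIT_CAS_ENABLED
#error "KIT_PKG_ENABLED requires KIT_CAS_ENABLED"
#endif

/* Each tool flag asserts its subsystem flag. */
#if KIT_TOOL_CAS_ENABLED && !KIT_CAS_ENABLED
#error "KIT_TOOL_CAS_ENABLED requires KIT_CAS_ENABLED"
#endif
#if KIT_TOOL_PKG_ENABLED && !KIT_PKG_ENABLED
#error "KIT_TOOL_PKG_ENABLED requires KIT_PKG_ENABLED"
#endif

/* Hypothetical helper: bit 0 = content store, bit 1 = signed packages.
 * With a flag set to 0 the corresponding code is not compiled at all. */
static int dist_features(void) {
    int mask = 0;
#if KIT_CAS_ENABLED
    mask |= 1;
#endif
#if KIT_PKG_ENABLED
    mask |= 2;
#endif
    return mask;
}

/* Hypothetical runtime restatement of the implication checked above. */
static void check_config(void) {
    int f = dist_features();
    assert(!(f & 2) || (f & 1));
}
```

Failing at preprocessing time keeps an inconsistent configuration from producing a library that links but lacks the content model the package code needs.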
-
-## Public API surface
-
-Two headers, mirroring DISTRIBUTE.md's content-model vs signed-package split.
-Model structs are exposed as POD (renamed to the `Kit*` convention); the
-vendored primitives and the kpkg wire codecs stay internal.
-
-### `include/kit/cas.h` — content model (self-verifying, no trust)
-
-- POD types: `KitTree`, `KitTreeEntry`, `KitBlobInfo`.
-- Pure hashing (no I/O): `kit_blob_id`, `kit_blob_root`, `kit_blob_info`,
-  `kit_tree_id`, `kit_tree_emit`, `kit_tree_parse`, `kit_tree_find`.
-- A `KitCas` handle over `KitContext` + a host vtable:
-  `kit_cas_open`, `kit_cas_put_blob` / `get_blob`,
-  `kit_cas_put_tree` / `get_tree`, `kit_cas_add_tree_from_dir`,
-  `kit_cas_verify_tree`, `kit_cas_materialize`.
-
-### `include/kit/package.h` — package model (signed)
-
-- POD model: `KitPackageManifest` with its outputs / artifacts / deps; a
-  public `KitPackageEncoding` descriptor (region layout, chunk-index summary,
-  external-fetch templates) so `inspect --encoding` and external-fetch planning
-  are real library features.
-- Keys / trust: `KitMinisigKeypair`; `kit_pkg_keygen` (entropy injected via
-  the host vtable, never read by the library); pubkey/seckey emit + parse;
-  `kit_pkg_sign` / `kit_pkg_verify_signature`. Trust resolution takes
-  **explicit** trusted-keys bytes — the library reads no env vars and no
-  `$HOME`; the driver supplies the resolved path/bytes.
-- Pipelines as opts-struct calls: `kit_pkg_create` (format `kpkg`|`tar.gz`,
-  native-shape `fat`|`metadata`|`thin`, compression, source = `--root` dir or
-  `cas + tree`, external dir), `kit_pkg_verify`, `kit_pkg_unpack`,
-  `kit_pkg_inspect`.
-
-### Kept internal (`src/dist/` private headers)
-
-All vendored code; the `dist_blake2b` / `dist_ed25519` / `dist_minisig` /
-`dist_b64` / `dist_gz` / `dist_lz4` / `dist_tar` shims; and the kpkg wire
-codecs (header / descriptor / index encode-decode).
Rationale: raw crypto and -on-wire binary layout are implementation detail — exposing them invites misuse -and an API-stability burden. The logical model and pipelines are the contract. - -## New host capabilities - -The library reaches the host through `KitContext.file_io` (read/write, -already present) plus one new vtable for the operations `KitFileIO` doesn't -cover — every one of which the driver already implements: - -```c -typedef struct KitDistHost { - int (*mkdir_p)(void* user, const char* path); - int (*mark_executable)(void* user, const char* path); - int (*walk_regular_files)(void* user, const char* root, /* callback */ ...); - int (*fill_random)(void* user, uint8_t* out, size_t n); /* keygen only */ - void* user; -} KitDistHost; -``` - -`DistCasHost` already models `mkdir_p` + `mark_executable`; this generalizes it -and adds the directory walk (`driver_walk_regular_files`) and CSPRNG -(`driver_random_bytes`). Naming/placement TBD during Stage 2 (could fold the -CAS-only subset into `KitCas` and keep `fill_random` package-side). - -## Error reporting (decided) - -Public dist calls **return `KitStatus`** and **emit human-readable detail -through `ctx->diag`** — not through an err-buffer at the boundary. This is the -established convention, not a new pattern: - -- `KitContext` carries `KitDiagSink* diag` directly (`core.h`), so the sink - is reachable without a `KitCompiler` — exactly as the pure-byte subsystems - (object, archive, dwarf) get it. -- It mirrors the linker: `src/link/link_layout.c` emits operational errors such - as "linker script: undefined symbol …" through the sink and returns a status. - Package/CAS errors are the same shape — operational, no source position. -- `KitStatus` already carries the right categories: `KIT_MALFORMED` (bad - manifest/tree/signature), `KIT_NOT_FOUND` (missing blob/tree/key), - `KIT_IO`, `KIT_INVALID` (unsafe path), `KIT_UNSUPPORTED` (encrypted - seckey / scrypt). 
The status is the machine-readable category; the diag - message is the actionable detail (`"blob root mismatch for: <path>"`). - -Mechanics: - -- **No source location.** Emit with a zero `KitSrcLoc` (file_id 0), as the - linker does for non-source errors; the host stderr sink already tolerates it. -- **The internal `dist_*` parsers keep their `(char* err, size_t errcap)` - buffer** unchanged. The `src/api` wrapper catches that string and forwards it - to `ctx->diag`, so the byte model barely changes and its detailed parse - messages survive intact. -- **A small internal `api_diagf(ctx, kind, fmt, …)` helper** over - `ctx->diag->emit` (no-op when `diag` is NULL) packs varargs for the api layer. -- **The 88 `driver_errf` sites split by ownership.** Operational/pipeline errors - (create / verify / unpack / resolve) move into `src/api/package.c` as diag - emits; pure arg-parse errors (`"unknown option"`, `"-o BASE is required"`) - stay in `pkg.c` as `driver_errf`, because argument parsing is driver policy. -- **Embedder control.** The sink bumps its `errors` counter and prints. For the - CLI that is exactly today's `driver_errf` behavior. An embedder doing - speculative verification supplies its own (or no) sink and reads only the - `KitStatus`, so a failed verify stays quiet. - -## Versioning: latest-only (decided) - -We support only the current on-disk format and drop all back-compat code. This -is verified to be **pure deletion with zero behavioral change**: every v2 symbol -(`dist_manifest_*`, the non-`3` `dist_kpkg_*`, `dist_kpkg2_*`, and the -`DistManifest` / `DistArtifact` / `DistDependency` / `DistKpkgHeader` / -`DistKpkgDescriptor` / `DistKpkgIndexRecord` structs) is referenced *only* in -its own definition files — never by `pkg.c`, `cas.c`, or any test. The driver -already emits and reads v3 exclusively. - -Dropping it pays off twice: - -1. Deletes the dead v2 structs / functions / constants from `manifest.c` and - `kpkg.c`. -2. 
Lets the survivors **shed the `3` suffix** as they go public: - `DistPackageManifest` → `KitPackageManifest`, `DistKpkg3Header` → - `KitPackageHeader`, internal `dist_kpkg3_*` → `dist_kpkg_*`. The versioned - naming only existed to coexist with v2. - -**Precision:** drop the v2 *parse paths and C identifiers*, but keep the on-disk -wire magic at `kpkg3\0` / `kit-package 3` / `kit-encoding 3`. "Latest -version" means v3 on disk; renumbering the wire format would itself break -anything already produced. We stop *accepting* v2 input; we do not renumber. - -## Staged plan (each stage builds green) - -1. **Vendor move.** `driver/dist/vendor/{monocypher,lz4}` → top-level - `vendor/{monocypher,lz4}`; rewrite the shim `#include` paths; update the - Makefile. Pure relocation, no API change — lands first to isolate path - churn. -2. **Lift-and-shift the content layer.** Move `dist.{c,h}` `blob` `tree` `cas` - to `src/dist/`; add `src/api/cas.c` + `include/kit/cas.h` wrapping - blob/tree/CAS; add `KIT_CAS_ENABLED`; repoint `driver/cmd/cas.c` at the - public header + a host vtable. Smallest behavioral slice; proves the - boundary end to end. -3. **Drop v2 first, then extract the package pipelines.** Delete the dead v2 - code (see *Versioning* above) and shed the `3` suffix — a self-contained, - zero-behavior-change cleanup that shrinks the surface before it moves. Then - move `manifest` `kpkg` `minisig` `trust` `b64` `deflate` `lz4` `tar` + - crypto shims to `src/dist/`; lift the `pkg_create_*` / `pkg_verify_*` / - unpack / key-resolution logic out of `driver/cmd/pkg.c` into - `src/api/package.c` behind `kit_pkg_*`, converting operational `driver_errf` - → `api_diagf` (see *Error reporting* above) and `driver_*` fs/random → - `KitDistHost`. `pkg.c` shrinks to arg parsing + host wiring + trust-path/env - policy. This is the bulk of the work and the main risk. -4. 
**Tests.** Keep `test/cas/run.sh` + `test/pkg/run.sh` as end-to-end CLI - tests; optionally add unit tests that call the new public API directly (now - possible — coverage was CLI-only before). -5. **Docs.** Update `../DISTRIBUTE.md` paths (the layering diagram's - `driver/dist/*` rows become `src/dist/*` + the two public headers), the - `../DESIGN.md` layering box (the `driver/dist/` callout moves), and the - `CLAUDE.md` code map (add `vendor/`, `src/dist/`, `kit/cas.h`, - `kit/package.h`). - -## Risks / watch items - -- **Error reporting** is decided (see above): `KitStatus` + `ctx->diag`, no - boundary err-buffers. Remaining care is mechanical — route the ~70 operational - `driver_errf` sites to `api_diagf` while leaving arg-parse errors in `pkg.c`, - and confirm the CLI's stderr output is unchanged by the existing - `test/pkg/run.sh` corpus. -- **Trust policy must not leak into the library.** `$KIT_TRUSTED_KEYS` / - `$HOME` defaulting and `--tofu` write-back are *driver* policy; the library - takes resolved bytes/paths and returns "would-pin this key id" decisions for - the driver to act on. Keep `getenv` driver-side. -- **Binary-format stability.** Once the manifest/tree/kpkg model is public, the - determinism invariants in DISTRIBUTE.md become a public contract. With v2 gone - there is only one format to preserve — keep the wire magic at v3 (do not - renumber) and lock the bytes with the existing corpus before refactoring. -- **Subsystem flag matrix.** Verify `KIT_PKG_ENABLED && !KIT_CAS_ENABLED` - is a build-time error, and that both-off drops the vendored crypto so a - no-dist embedding stays clean (assert as the other subsystems do). 
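The error-reporting decision in this plan (status as the category, diag sink for the detail, internal err-buffer forwarded by the api wrapper) can be sketched as follows. Every type and function here (`SketchStatus`, `SketchDiagSink`, `sketch_diagf`, `sketch_parse_tree`, `sketch_tree_verify`) is a hypothetical stand-in; the real `KitStatus`, `KitDiagSink`, and `api_diagf` signatures are not shown in this document and may differ.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { SK_OK = 0, SK_MALFORMED } SketchStatus;

/* Hypothetical sink: counts errors and remembers the last message. */
typedef struct SketchDiagSink {
    void (*emit)(struct SketchDiagSink* self, const char* msg);
    int  errors;
    char last[128];
} SketchDiagSink;

typedef struct { SketchDiagSink* diag; } SketchContext;

static void sketch_sink_emit(SketchDiagSink* self, const char* msg) {
    self->errors++;
    snprintf(self->last, sizeof self->last, "%s", msg);
}

/* Packs varargs and forwards to the sink; a no-op when there is no sink. */
static void sketch_diagf(SketchContext* ctx, const char* fmt, ...) {
    char buf[128];
    va_list ap;
    if (!ctx || !ctx->diag || !ctx->diag->emit) return;
    va_start(ap, fmt);
    vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    ctx->diag->emit(ctx->diag, buf);
}

/* Stand-in for an internal parser that keeps its (err, errcap) buffer. */
static int sketch_parse_tree(const char* bytes, char* err, size_t errcap) {
    if (strncmp(bytes, "tree ", 5) != 0) {
        snprintf(err, errcap, "tree: bad header");
        return -1;
    }
    return 0;
}

/* Stand-in for a public wrapper: returns a status, forwards the detail. */
static SketchStatus sketch_tree_verify(SketchContext* ctx, const char* bytes) {
    char err[64];
    assert(bytes != NULL);
    if (sketch_parse_tree(bytes, err, sizeof err) != 0) {
        sketch_diagf(ctx, "%s", err);
        return SK_MALFORMED;
    }
    return SK_OK;
}
```

A caller with a sink sees the message and a bumped error count; a caller doing speculative verification passes a context with a NULL sink and reads only the returned status.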
diff --git a/doc/plan/LTO.md b/doc/plan/LTO.md @@ -1,534 +0,0 @@ -# LTO / Whole-Program Optimization (planned work) - -This is the forward-looking plan for link-time optimization in kit: making a -library or executable look like a single translation unit to the optimizer, so -inlining, dead-code elimination, internalization, and the rest of the -interprocedural family can cross TU boundaries. It deliberately does **not** -target GCC/Clang LTO bitcode compatibility. The initial scope is kit invocations -that provide all sources up front (`kit cc *.c -O2 -flto -o prog`); separately -compiled IR objects are a later phase that reuses the same core. - -The optimizer baseline this builds on — the recording IR, the -recording/optimizing boundary, the finalize path, and the pass catalog — is in -[../OPT.md](../OPT.md) and [OPTIMIZER.md](OPTIMIZER.md). The link-time symbol -model is in [LINKER.md](LINKER.md). The CG/object lifetime boundary used by the -remaining Phase 1 staging work is in -[CG_OBJ_LIFECYCLE.md](CG_OBJ_LIFECYCLE.md). This document treats those as given -and describes only the LTO-specific additions. - -The headline finding from investigating the tree: **most of the machinery for -whole-program optimization already exists; it is just per-TU, single-arch, and -partly unreached.** LTO here is three concrete refactors plus wiring, not a new -subsystem. The largest of the three is factoring the linker's symbol-resolution -policy out so it can run at merge time as well as at link time. - -## Status (2026-06-04) - -**Phase 0 is complete and shipping; Phase 1's all-sources-up-front LTO path is -implemented in this branch.** The end state is not a C-only shortcut: -every source-building verb routes through one staging engine, and every -in-tree frontend declares either semantic CG staging or opaque-object -participation. 
The link-picture-driven preserved/export prepass now feeds -`kit_cg_finish`, and executable LTO internalizes non-preserved globals before -the whole-module reachability walk. Where reality diverged from the original -wording below: - -- **The gate is `-O1`, not `-O2`.** Whole-program optimization (deferred emit + - module sweep + inliner) runs whenever the optimizer runs: - `o->whole_program = (level >= 1)` in `opt_cgtarget_new`. `-O2` is treated as - `-O1` for now. References to `-O2`/`-fwhole-program` gating below are superseded. -- **One arch path, no identity checks.** The ARM64-only sweep is now - `opt_whole_module_finalize` for every arch; `src/opt` has zero - `arch == KIT_ARCH_*` checks. The sret arg-slot rule moved off arch identity to - `ABIFuncInfo.sret_consumes_int_arg` (set per ABI impl). Remaining generic-layer - arch identity (`src/cg/type.c`, `src/cg/atomic.c`, `src/link/link_resolve.c`) is - tracked as separate cleanup, not part of LTO. -- **Cross-TU LTO will be opt-in behind `-flto`** (revisit making it the `-O1` - default once proven) — resolves the flag-surface open question. -- **Frontend participation is explicit.** C, Toy, and Wasm lower into a - caller-owned open `KitCg`; asm is an opaque LTO participant and continues to - compile as an ordinary object. -- **The lifecycle target is borrowed `KitCg` + caller-owned `ObjBuilder`, not a - separate LTO unit abstraction.** `ObjBuilder` owns object lifetime; `KitCg` - records source units into a borrowed object and finishes semantic codegen with - link-picture policy. See [CG_OBJ_LIFECYCLE.md](CG_OBJ_LIFECYCLE.md). -- **`symresolve_merge` signature** as built is `(SymAttrs existing, SymAttrs - incoming)` with `in_comdat` carried inside `SymAttrs`; no separate `coff_target` - parameter (the COMDAT flags carry everything the decision needs). 
-- **Preserved/export internalization is part of Phase 1.** The LTO CG finish - path receives linker-computed preserved symbols for executable links, and - `cc -shared -flto` remains disabled until shared-library output is exercised. - -### Done - -- [x] **§6.1 Generalize the finalize sweep to all arches** — `opt_whole_module_finalize` - (`src/opt/opt.c`); x64/rv64 defer-to-finalize; `-O0` and the JIT/interp/run paths - unchanged; `opt_maybe_capture_interp` still invoked per reachable func. -- [x] **§6.4 Wire `opt_inline`** over the reachable `FuncSet` — `opt_run_o1_native` - split into `opt_o1_native_prepare` / `opt_o1_native_finish`; the sweep lowers the - live set into one FuncSet, runs the inliner, then finishes each func. -- [x] **Interposition soundness fix** (strengthens §9): weak/interposable callees are - never inlined — `opt_cg_func_interposable` marks them `KIT_CG_INLINE_NEVER`, honored by - both the streaming tiny-inliner and the whole-program inliner. Caught by a - strong-over-weak override case the prior (tiny-inliner) behavior miscompiled. -- [x] **§3 `symresolve` extraction** — `src/obj/symresolve.{h,c}`; - `link_resolve_symbols` refactored onto `symresolve_merge`; `link_bind_strength` / - `link_sym_is_def` / `link_sym_is_spurious_undef` are now wrappers. Behavior-preserving - (test-link 122/0, test-macho 80/0, ODR/weak/common/COMDAT all covered). -- [x] **§3 `ObjBuilder` name→id index** — `SymNameIndex` in `src/obj/obj.c`; - `obj_symbol_find` is an authoritative O(1) hash lookup with no linear scan, kept - exact through `obj_symbol_ex` and `obj_symbol_rename`. -- [x] **Tests** — `test/opt/whole_program_inline.sh` (wired `test-opt-whole-program-inline`): - static callee fuses on aa64/x64/rv64, weak callee kept out-of-line (interposition - guard), `opt.inline.inlined` fires at `-O1`, and the kit-native build verbs - (`build-obj`/`build-exe`) fuse too. 
-- [x] **Build verbs participate.** `build-exe`/`build-lib`/`build-obj` (which replaced - `compile` on `main`) compile each source to an in-memory builder under one - `KitCompiler` via `build_compile_all` (`driver/cmd/build.c`) and route through the - shared `kit_cg` path, so per-TU whole-program optimization applies at `-O1` with no - verb-specific wiring. `build_compile_all` is also the single seam the Phase 1 - cross-TU staging loop will hook (all three verbs at once); `cc` keeps its own - `cc_run_link_exe` → `link_engine` path. - -### Phase 1 source-staging checklist - -- [x] **Architecture lock-in.** Phase 1 is implemented as a frontend staging - and CG/ObjBuilder lifecycle refactor, not a C-driver shortcut. All - source-building verbs (`cc`, `build-exe`, `build-lib`, `build-obj`) route - through the same staging engine. Frontends explicitly declare how they - participate: semantic `kit_cg` staging for frontends that lower through CG, or - opaque-object participation for inputs that cannot expose semantic IR - (notably asm). The change is not complete until every in-tree frontend is - opted into one of those modes. -- [x] **§2 Skip-intern locals.** In `kit_cg_decl` (`src/cg/session.c:198`), for - `SB_LOCAL` bindings skip `obj_symbol_find` and always mint a fresh id. Confirm the - per-`Decl` id cache keeps intra-TU static reuse pointing at the cached id, and that - single-TU behavior is unchanged (locals are already unique per name within a TU). -- [x] **§4 Recording-arena lifetime — settle first.** Choose dedicated LTO arena vs - `c->global` for the recorder/`CgIrModule` so accumulated IR outlives each per-TU - frontend run. This is the one structural hazard (§9). -- [x] **§4 Source staging under the current CG API.** Add a deferred-finalize - mode to `kit_cg`: record N TUs into one shared session / `ObjBuilder` / - `CgIrModule` without per-TU finalization, then finish CG and finalize the - object once. 
Keep per-TU frontend state (Pool/DeclTable/type interning) - independent. -- [x] **§4 CG/ObjBuilder borrowed lifecycle.** Replace the former - object-shaped CG bracket with the lifecycle in - [CG_OBJ_LIFECYCLE.md](CG_OBJ_LIFECYCLE.md): caller-owned `ObjBuilder`, - borrowed `KitCg`, explicit unit boundaries, `kit_cg_finish` for semantic - codegen policy, and caller-owned object finalization. One-TU and multi-TU - builds now use the same - ownership model. -- [x] **§3/§4 Recording-time merge.** At the per-TU staging boundary, when a TU - contributes a body for a symbol already defined, call `symresolve_merge` to pick the - winner; drop the loser's `CgIrFunc`/data and keep its decl as a reference; report ODR - at the second definition's `SrcLoc`. -- [x] **§4 Driver loop + `-flto` flag.** Parse `-flto` in `cc` and the build verbs, - thread an LTO flag through `KitCodeOptions`/the driver, and add the staging path: - one shared session, frontend per source, one CG finish/object finalize, single - builder to the link session. Hook it at `build_compile_all` - (`driver/cmd/build.c`) so build-exe/lib/obj get it together, plus - `cc_run_link_exe`. (The build verbs already share one `KitCompiler`, so the - seam is in place.) -- [x] **§5 Preserved/export set.** Compute from the assembled link (entry symbol, - dynamic exports, undefs referenced by opaque inputs, `used`/init-fini/asm-named/IFUNC/ - address-significant) and hand it to `kit_cg_finish`. Current Phase 1 behavior - is conservative for relocatable/archive outputs, while executable outputs - internalize non-preserved LTO definitions. Shared-library LTO remains disabled - until shared output is exercised. -- [x] **§6.2 Internalize** non-preserved globals using the preserved set (unlocks - cross-TU DCE and unconstrained inlining), then re-run GC. 
-- [x] **Tests.** A two-TU `test/smoke` (or `test/link`) case where a cross-TU callee - inlines under `-flto`; a guard that a weak/exported cross-TU symbol is *not* - inlined/internalized; cross-TU ODR reported at the right `SrcLoc`. - -## Baseline (what already exists) - -A handful of facts about the current code path frame everything below. - -- **Globals already intern by name within an `ObjBuilder`.** `kit_cg_decl` does - `obj_symbol_find` then reuse-or-create (`src/cg/session.c:198`). Two frontends - that `decl` `foo` into the *same* builder receive the *same* `ObjSymId`. The - CG and optimizer IRs reference call targets and globals by `ObjSymId` - (`IRCallAux.desc.callee.v.global.sym`), so a caller's `call foo` already points - at the id the definer will define — no remap, no clone. This is the load-bearing - fact for the whole design. -- **The recorder already accumulates a whole module.** One `CgIrRecorder` owns - one `CgIrModule` and appends every `func_begin`/`func_end` into it - (`src/cg/ir_recorder.c`), flushing only at `finalize`. `CgIrModule` - (`src/cg/ir.h:270`) holds all functions, aliases, and file-scope asm. Per - function it carries `call_refs` and `global_refs` symbol sets - (`src/cg/ir.h:247`) — the call/use graph is materialized during recording. -- **The optimizer already finalizes over the whole module — for one arch.** - `opt_on_finalize` (`src/opt/opt.c:566`) hands the entire `CgIrModule` to - `opt_emit_reachable_aarch64` (`src/opt/opt.c:495`), which seeds a root set - (non-`LOCAL` symbols, `KIT_CG_SYM_USED` locals, alias targets, exported data - relocs), walks each function's `call_refs`/`global_refs` plus the data-reloc - graph, **removes unreachable local symbols**, then lowers + optimizes + emits - only what is live. This is whole-program GC for one TU. x86-64 and riscv64 - instead emit eagerly per function in `opt_on_func` (`src/opt/opt.c:322`); they - have no module pass. 
-- **The whole-program inliner exists and is unreached.** `opt_inline(FuncSet*, - max_iters)` (`src/opt/pass_inline.c:667`) does topologically ordered, - growth-gated, call-graph inlining over a `FuncSet` of lowered `Func`s. Only the - streaming tiny variant `opt_try_tiny_inline` (cost cap 8, straightline only) - runs today. The real inliner has never had a caller. See OPTIMIZER.md §6. -- **The driver already shares one `KitCompiler` across sources** and keeps - objects in memory through link. `cc_run_link_exe` compiles each source to its - own in-memory `KitObjBuilder` under one compiler (`driver/cmd/cc.c:2655`, - `objs[]` at `:2585`) and hands the builders straight to the link session via - `kit_link_session_add_obj` (`:2735`/`:2771`) — no temp `.o` files. The - orchestration seam for LTO is already where it needs to be. -- **The obj layer has no resolution policy and no name index.** - `obj_symbol_define` is last-writer-wins with no precedence check - (`src/obj/obj.c:544`), and `obj_symbol_find` is a linear scan - (`src/obj/obj.c:534`). The only resolution rule anywhere below the linker is - the weak-demotion special case hand-coded into `kit_cg_decl` - (`src/cg/session.c:203`). All real precedence lives in `link_resolve_symbols` - (`src/link/link_resolve.c:258`). - -The merged module, then, is not something LTO must *build*. It is something the -recorder already builds and the finalize path already consumes — for one TU, on -one arch. LTO is mostly about *not tearing it down between TUs*, *generalizing -the finalize pass*, and *applying real resolution policy as the merge happens*. - -## 1. Design decision: shared context, not clone-and-merge - -Two architectures can make the optimizer see one module: - -1. **Clone-and-merge.** Each TU records into its own `CgIrModule`/`ObjBuilder`; - an IR-linker deep-copies every function into a merged module, rebuilds the - symbol table, and remaps every operand/reloc/alias to merged ids. -2. 
**Shared context.** All TUs record into *one* live session — one - `ObjBuilder`, one recorder, one `CgIrModule` — so globals unify in place via - the existing decl interning and the finalize path sees the union directly. - -We choose **shared context**. The comparison: - -| | Shared context | Clone + remap | -|---|---|---| -| Global identity | Free (decl already interns by name) | Rebuild symbol table + remap every operand/reloc/alias | -| Memory / time | Record once, in place | Duplicate all IR into a merge arena | -| Resolution policy | Apply at the per-TU merge boundary | Apply in the merge pass | -| Local distinctness | Skip-intern locals (small CG change) | Falls out of remap | -| Lifecycle cost | Staging mode + cross-TU arena lifetime | None — TUs stay independent | -| Net new code | Mostly wiring + policy extraction | A full cloner/remapper on the hot path | -| Serialized objects (Phase 2) | Deserialize = replay records through the same recording/merge API | A separate clone-from-bytes engine | - -Clone's only advantage is that TUs stay fully independent, so there is no -staging lifecycle to manage. That is not worth re-implementing the symbol merge -the linker already knows how to do, nor the per-TU IR duplication. The decisive -row is the last one: shared context makes the recording/merge API the single -funnel. A frontend feeds it; a `.kit.ir` deserializer feeds it the same way. -There is no second merge engine to build for Phase 2 — fat-object LTO becomes -"replay serialized decl/func records into the live shared module," reusing the -same local-handling and resolution code paths verbatim. - -The rest of this document describes the shared-context design. - -## 2. Symbol identity: what unifies, what must stay distinct - -Shared context gets global unification for free (§Baseline). The one correctness -trap is **local symbols**, and there is exactly one rule to add. - -`kit_cg_decl` interns *every* name through `obj_symbol_find`. 
For globals that is -correct and desirable. For `SB_LOCAL` symbols it is wrong: two TUs each with -`static int x;` (the frontend passes the bare name with LOCAL binding, -`lang/c/decl/decl.c:72`) would collapse to one symbol. The same hazard exists for -static functions, and for the per-TU counters behind block-scope statics -(`mint_static_local_sym`, `lang/c/parse/parse.c:660`) and compound literals -(`mint_compound_literal_sym`, `lang/c/parse/parse_init.c:1012`), which reset per -parser and would produce colliding names like `y.0` and -`__kit_compound_literal.2` across TUs. - -**Fix:** for `SB_LOCAL` bindings, skip `obj_symbol_find` and always mint a fresh -id. Consequences, all benign: - -- Two locals named `x` get distinct `ObjSymId`s. Duplicate `STB_LOCAL` names in - one object are legal in every format kit emits; locals never enter the global - name table; the optimizer indexes functions by id, not name. -- The frontend caches the id per `Decl`, so intra-TU reuse of a static is - unaffected — the second reference goes through the cached id, not a fresh decl. -- The static-vs-extern-same-name case resolves correctly: the static gets a fresh - id; an unrelated `extern foo` keeps the shared global id. - -No frontend mangling is required. Anonymous read-only data (`.Lkit_ro.N`, -`src/cg/memory.c:102`) needs no change at all once the session is shared, because -the `rodata_counter` is no longer reset between TUs (see §4) and keeps climbing. - -## 3. Resolution policy: factoring `symresolve` out of the linker - -Today, if we naively share one `ObjBuilder`, we lose all symbol-resolution -semantics: `obj_symbol_define` overwrites last-writer-wins, so two strong -definitions silently clobber instead of raising an ODR error, strong-vs-weak -becomes declaration-order dependent, and commons never merge. 
The precedence -rules we need already exist in `link_resolve_symbols` -(`src/link/link_resolve.c:258`): strong-vs-strong → ODR error (modulo COFF -COMDAT/SELECTANY), strong beats weak, weak-weak keeps the first, common merging -takes max size/align, and a definition beats a common. - -Per the investigation, that logic is cleanly separable. The *decision* is pure -over `(name, bind, kind, size, align, common_align, defined?, in_comdat)` -tuples; only the *bookkeeping* — the `globals` `SymHash`, the per-input -`InputMap`, COMDAT section discard, DSO iteration — is entangled with linker -state. - -**Extract a small shared module**, `src/obj/symresolve.{h,c}` (the obj layer is -the natural home; both consumers sit above it): - -```c -// pure: no linker state, no allocation -SymMergeResult symresolve_merge(SymAttrs existing, SymAttrs incoming, - int coff_target); -// -> KEEP_EXISTING | REPLACE | MERGE_COMMON(size, align) -// | COMDAT_DISCARD | ODR_ERROR -``` - -Move `link_bind_strength`, `link_sym_is_def`, and `link_sym_is_spurious_undef` -(`src/link/link_internal.h`) alongside it. Then: - -- **Refactor `link_resolve_symbols` onto it.** A pure cleanup with no behavior - change, fully covered by the `test/link` corpus, that gives the policy one - source of truth and leaves the linker better than we found it. -- **The LTO staging coordinator calls the same function** at the per-TU merge - boundary. Crucially this is a *binding-precedence* decision — which body wins — - not id remapping, because ids are already unified. When TU B contributes a body - for a global TU A already provided, `symresolve_merge` decides whether to keep - A's `CgIrFunc`/data, replace it with B's, merge commons, or raise ODR. The - loser's body is dropped from the module and its decl remains as a reference. - ODR conflicts are reported at the second definition's `SrcLoc` — better - diagnostics than the linker's post-hoc panic, because source locations still - exist at this point. 
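The precedence rules listed above can be sketched as a pure decision function. This is an illustration under assumptions, not the code in `src/obj/symresolve.c`: the `Sk*` types and `sk_merge` are hypothetical, the COMDAT rule (two COMDAT definitions discard the incoming one) is a simplified reading of "modulo COFF COMDAT/SELECTANY", and "a definition beats a common" is applied regardless of binding.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SK_BIND_WEAK, SK_BIND_STRONG } SkBind;
typedef enum { SK_KIND_UNDEF, SK_KIND_COMMON, SK_KIND_DEF } SkKind;

/* Hypothetical attribute tuple; the real SymAttrs layout is not shown here. */
typedef struct {
    SkBind bind;
    SkKind kind;
    size_t size;
    size_t align;
    int    in_comdat;
} SkAttrs;

typedef enum {
    SK_KEEP_EXISTING, SK_REPLACE, SK_MERGE_COMMON,
    SK_COMDAT_DISCARD, SK_ODR_ERROR
} SkAction;

typedef struct { SkAction action; size_t size; size_t align; } SkResult;

static SkResult sk_result(SkAction a, size_t size, size_t align) {
    SkResult r;
    r.action = a; r.size = size; r.align = align;
    return r;
}

static size_t sk_max(size_t a, size_t b) { return a > b ? a : b; }

/* Pure: no linker state, no allocation. */
static SkResult sk_merge(SkAttrs ex, SkAttrs in) {
    assert(ex.kind <= SK_KIND_DEF && in.kind <= SK_KIND_DEF);
    /* A bare reference never displaces what is already there. */
    if (in.kind == SK_KIND_UNDEF)
        return sk_result(SK_KEEP_EXISTING, ex.size, ex.align);
    if (ex.kind == SK_KIND_UNDEF)
        return sk_result(SK_REPLACE, in.size, in.align);
    /* Common merging takes max size and max alignment. */
    if (ex.kind == SK_KIND_COMMON && in.kind == SK_KIND_COMMON)
        return sk_result(SK_MERGE_COMMON, sk_max(ex.size, in.size),
                         sk_max(ex.align, in.align));
    /* A definition beats a common, in either order. */
    if (ex.kind == SK_KIND_COMMON)
        return sk_result(SK_REPLACE, in.size, in.align);
    if (in.kind == SK_KIND_COMMON)
        return sk_result(SK_KEEP_EXISTING, ex.size, ex.align);
    /* Both are definitions from here on. */
    if (ex.bind == SK_BIND_STRONG && in.bind == SK_BIND_STRONG) {
        if (ex.in_comdat && in.in_comdat)
            return sk_result(SK_COMDAT_DISCARD, ex.size, ex.align);
        return sk_result(SK_ODR_ERROR, ex.size, ex.align);
    }
    if (in.bind == SK_BIND_STRONG) /* strong beats weak */
        return sk_result(SK_REPLACE, in.size, in.align);
    /* weak-weak keeps the first; weak never displaces strong */
    return sk_result(SK_KEEP_EXISTING, ex.size, ex.align);
}
```

Because the function takes and returns plain values, the same call could plausibly serve both the recording-time merge and the link-time resolution, which is the point of extracting it.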
- -Two mechanical needs fall out of sharing one builder: - -- **Give `ObjBuilder` a `name -> id` hash map.** With the whole program's symbols - in one builder, the linear `obj_symbol_find` (`src/obj/obj.c:534`) is O(n²) at - decl time. The assembler already carries its own `SymSymMap` precisely because - obj lacks one (`src/asm/asm.c`). Adding the index to the builder removes the - quadratic and hosts the resolution hook — a win even for ordinary single-TU - compiles, and it lets the assembler shed its private map later. -- **One open question on `define` timing.** During pure recording a global is - *declared* (with binding) but not *defined* in the obj sense until finalize - emits its section/offset. So the "which body wins" decision must run at the - staging boundary against the set of bodies a TU contributes (it has a - `CgIrFunc` or data record for the symbol), not at `obj_symbol_define` time. - This is the linker's per-input symbol merge applied to `CgIrModule` - contributions. - -The opaque inputs in a link (libc, crt, kit archives, DSOs) are still resolved by -the linker at link time against the single emitted LTO object. So the policy -module has two call sites — recording-time merge among the LTO set, and -link-time resolution against everything — which is the justification for -extracting it rather than duplicating it. - -## 4. The staging lifecycle - -The lifecycle target for Phase 1 is documented in -[CG_OBJ_LIFECYCLE.md](CG_OBJ_LIFECYCLE.md). The short version: `ObjBuilder` -owns object lifetime, while `KitCg` borrows an object, records one or more -semantic units, and finishes codegen into that object. `kit_cg_finish` is a CG -flush/lowering/debug operation; it is not object finalization. - -The old object-shaped bracket used to finalize (lowers + emits everything), -null `g->obj`/`g->target`, and reset per-object state including -`rodata_counter` (`src/cg/session.c`). 
The structural state is now a borrowed -lifecycle: - -- **Record each TU as a unit in one live CG session without object - finalization.** Run a single `kit_cg_finish` after the last semantic source, - then let the caller finalize the `ObjBuilder`. The shared path records N - semantic frontends into one shared `KitCg` / `ObjBuilder` and finalizes once - through the explicit lifecycle: `kit_cg_begin`, `kit_cg_begin_unit`, - `kit_cg_end_unit`, `kit_cg_finish`, and `kit_cg_detach`/`kit_cg_abort`. - Drivers collect sources and opaque inputs; they do not implement definition - selection, IR lifetime, semantic finalization, or object finalization policy. -- **Frontend participation is explicit.** `KitFrontendVTable` has a split - contract: semantic frontends implement `compile_cg`, while opaque frontends - implement `compile_obj`. C, Toy, and Wasm participate by emitting into a - caller-owned open `KitCg` session; one-TU object builds are wrapped at the - compile-session layer by creating an `ObjBuilder`, attaching `KitCg` for one - unit, finishing CG, and then finalizing the object. - Asm has no semantic CG representation, so its LTO participation mode is opaque: - it compiles to an ordinary object and contributes references/definitions to the - link picture but not to the merged optimization module. This keeps all verbs - and all frontends on one declared path while allowing semantic frontend opt-in - one at a time. -- **The recording arena must outlive any single TU.** The recorder and module are - arena-allocated from `c->tu` today (`opt_cgtarget_new`, `cg_ir_recorder_new`). - In the current implementation `c->tu` is already compiler-session lifetime - (not reset between source inputs), so Phase 1 uses it as the cross-TU recorder - arena and documents that lifetime. If `c->tu` later becomes per-source again, - the shared CG path must switch to an explicit cross-source arena; the frontend - staging API must not depend on that allocator choice. 
-- **Each TU keeps its own frontend state.** The per-TU `Pool`, `DeclTable`, and - type interning stay independent; only the CG session and `ObjBuilder` are - shared. The shared `KitCompiler` already spans sources today, so `c->global` - name interning is already consistent across TUs. - -The driver change is a shared staging engine: group every LTO-capable source -input in command-line order, stage semantic frontends into the borrowed CG -session and shared object, compile opaque frontends/objects as ordinary inputs, -then finish CG once and substitute the resulting builder at the right place in -the link order. The hook is `build_compile_all` in `driver/cmd/build.c` (shared -by build-exe/build-lib/build-obj) and `cc_run_link_exe` — both already compile -every source under one `KitCompiler`, which is the seam this loop replaces. -(`compile`/`compile_engine` from the original plan were retired in favor of the -build verbs on `main`.) - -## 5. The export / preserved set - -Internalizing a global — demoting it to hidden/local, which unlocks DCE and -unconstrained inlining — is sound only when nothing outside the LTO set can -reference it by name and it is not interposable. This is the one input that -genuinely needs the full link picture, so it is computed at link time and handed -to the LTO core. A symbol must be **preserved** if it is: - -- the entry symbol (`main`/`_start`), or in the dynamic export set; for `-shared`, - default-visibility symbols are interposable and must **not** be internalized or - inlined across unless `-fvisibility=hidden` / a version script / `-Bsymbolic` - says otherwise; -- referenced (undefined) by any **opaque** input — libc/crt calling `main`, a - kit archive member that is not IR, a DSO; -- `__attribute__((used))`, in an init/fini array, named in inline or file-scope - asm, an IFUNC resolver, or address-significant in an opaque input. 
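The preserved-set criteria above reduce to a predicate over per-symbol facts plus the output kind. The sketch below is hypothetical throughout: the `SK_SYM_*` flag names, `SkOutKind`, and `sk_is_preserved` are invented for illustration, and it assumes the linker has already computed the per-symbol facts. The shared-library branch only illustrates the stated interposition rule, since this plan keeps shared-library LTO disabled.

```c
#include <assert.h>

typedef enum { SK_OUT_EXE, SK_OUT_SHARED, SK_OUT_RELOC } SkOutKind;

/* Hypothetical per-symbol facts gathered from the assembled link. */
enum {
    SK_SYM_ENTRY       = 1u << 0, /* main / _start */
    SK_SYM_DYN_EXPORT  = 1u << 1, /* in the dynamic export set */
    SK_SYM_OPAQUE_REF  = 1u << 2, /* undefined in an opaque input */
    SK_SYM_USED        = 1u << 3, /* __attribute__((used)) */
    SK_SYM_INIT_FINI   = 1u << 4, /* in an init/fini array */
    SK_SYM_ASM_NAMED   = 1u << 5, /* named in inline or file-scope asm */
    SK_SYM_IFUNC       = 1u << 6, /* IFUNC resolver */
    SK_SYM_ADDR_SIG    = 1u << 7, /* address-significant in an opaque input */
    SK_SYM_DEFAULT_VIS = 1u << 8, /* default visibility */
    SK_SYM_ALL         = (1u << 9) - 1
};

/* Returns 1 when the symbol must keep its global identity. */
static int sk_is_preserved(unsigned flags, SkOutKind out) {
    /* Every fact except visibility forces preservation on its own. */
    const unsigned always = (unsigned)SK_SYM_ALL & ~(unsigned)SK_SYM_DEFAULT_VIS;
    assert((flags & ~(unsigned)SK_SYM_ALL) == 0);
    /* Relocatable/archive outputs stay conservative: a later link may
     * still reference any global by name. */
    if (out == SK_OUT_RELOC) return 1;
    /* In a shared library, default-visibility symbols are interposable. */
    if (out == SK_OUT_SHARED && (flags & SK_SYM_DEFAULT_VIS)) return 1;
    return (flags & always) != 0;
}
```

Under these assumptions, a default-visibility helper with no other facts set is internalizable in an executable but preserved in a shared library.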
- -The linker already answers "is this symbol referenced from outside" for archive -pull (`scan_presence_before` / `member_satisfies`, `src/link/link_resolve.c:859`, -`:923`); the preserved set is the same question asked of the LTO set against the -opaque inputs and the output-kind/visibility policy. Conservative default: -internalize only for executable outputs or provably non-exported symbols. - -Phase 1 implements this for all-sources-up-front executable LTO: the driver -stages semantic sources, assembles the ordered link session, asks the linker for -preserved LTO symbols, then passes those IDs to `kit_cg_finish` before object -finalization. Relocatable and archive-member outputs remain conservative because -later links may still reference globals by name. Shared-library LTO continues -to reject until shared output policy is exercised. - -## 6. The whole-program optimization core - -With a merged module and a preserved set, the core is `opt_emit_reachable_aarch64` -generalized: - -1. **Generalize the finalize sweep to all arches.** Lift the ARM64-only path in - `opt_on_finalize` into an arch-independent `opt_whole_module_finalize`, and - switch x86-64/riscv64 from eager per-function emit to defer-to-finalize when - the whole-program path is active. Keep `-O0`/`-O1` streaming and the - JIT/interp/`run`/`dbg`/`emu` paths on the existing eager path — LTO is an AOT - concern. The one verification item is that nothing downstream depends on x64/rv64 - eager emission (`opt_maybe_capture_interp`). -2. **Internalize** non-preserved globals using the §5 set. -3. **GC** unreachable functions and data (the existing reachability walk, now over - the whole program). -4. **Lower the reachable set into a `FuncSet`** and run `opt_inline` — the - already-written, never-called whole-program inliner — with a real cost model. -5. **Emit one object** and substitute it for the IR inputs before the final link. 
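Steps 2 and 3 above reduce to a reachability walk seeded from the preserved symbols: whatever the walk does not reach can be dropped. The toy version below only shows the shape of that walk. `ToyModule` and `toy_mark_reachable` are hypothetical and stand in for kit's merged module and its sweep; real reachability also follows data references, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 16

/* Hypothetical merged module: a call matrix plus the preserved roots. */
typedef struct {
  size_t n;                          /* number of functions, <= MAX_FUNCS */
  bool preserved[MAX_FUNCS];         /* roots from the preserved set */
  bool calls[MAX_FUNCS][MAX_FUNCS];  /* calls[f][g]: f references g */
} ToyModule;

/* Marks every function reachable from a preserved root and returns how
 * many were marked. Unmarked functions are candidates for GC. */
size_t toy_mark_reachable(const ToyModule *m, bool reachable[MAX_FUNCS]) {
  size_t stack[MAX_FUNCS]; /* each function is pushed at most once */
  size_t top = 0, count = 0;
  for (size_t i = 0; i < MAX_FUNCS; i++) reachable[i] = false;
  for (size_t i = 0; i < m->n; i++) {
    if (m->preserved[i]) {
      reachable[i] = true;
      stack[top++] = i;
      count++;
    }
  }
  while (top > 0) {
    size_t f = stack[--top];
    for (size_t g = 0; g < m->n; g++) {
      if (m->calls[f][g] && !reachable[g]) {
        reachable[g] = true;
        stack[top++] = g;
        count++;
      }
    }
  }
  return count;
}
```

Iterating functions by index keeps the walk order stable, which matches the determinism requirement the plan states elsewhere.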
- -Steps 1 and 4 are independently valuable and land first (Phase 0): they turn the -unreached inliner and the generalized sweep into a tested, shipping path on a -single TU before any cross-TU complexity exists. - -## 7. Phased delivery - -**Phase 0 — Whole-translation-unit optimization.** No merge, no serialization, -no driver changes. Generalize the finalize sweep to all arches, switch -x64/rv64 to defer-to-finalize under the whole-program path, and wire `opt_inline` -over the reachable `FuncSet`. Delivers real cross-function inlining within a TU at -`-O2` on every arch, generalized dead-static elimination, and the inliner finally -exercised on real code. Lowest risk — purely inside the optimizer — and it -validates the deferred-emit path that Phase 1's staging lifecycle also relies on. - -**Phase 1 — Shared-context, all-sources-up-front LTO.** The target case, -`kit cc *.c -flto -o prog` and `kit build-exe -flto` (and `build-lib`/`build-obj`; -`build-obj` replaced the retired `compile`). Build on Phase 0 by adding: -(a) the `symresolve` extraction (§3), (b) the `ObjBuilder` name index (§3), -(c) skip-intern for locals (§2), (d) the `KitCg`/`ObjBuilder` borrowed staging -lifecycle and the driver loop that records N frontends into one session and -finishes CG once (§4), (e) the preserved set fed from the assembled link into -`kit_cg_finish` (§5). No cloner, no serialization, no archive support yet. - -**Phase 2 — Serialized IR objects (`.kit.ir`).** Optional follow-on for separate -compilation, archives, and build caches. `kit cc -c -flto a.c` emits a normal -object whose symbol table is the real decl set — so the linker's symbol-driven -archive pull works unchanged — plus a `.kit.ir` custom section (the object model -already supports arbitrary `SEC_OTHER` sections) carrying a serialized -`CgIrModule`. 
The linker detects the section and **replays the records into the -same shared context** through the same recording/merge API, reusing -skip-intern-locals and `symresolve_merge` verbatim. Archives of IR objects work -because pull is symbol-table driven. Note: whole-program LTO is incompatible with -the file-based incremental linker (LINKER.md); `-flto` forces a full link. - -## 8. Optimizations unlocked - -Inlining is the headline; the merged-module + `FuncSet` framework makes a whole -interprocedural family *expressible* (listed as enabled, not committed): - -- **Cross-TU / whole-program inlining** — `opt_inline`, already written. -- **Internalization** to hidden/local for non-exported globals — enables DCE, - removes PLT/GOT indirection, frees intra-function optimization. -- **Whole-program dead code / data elimination** — the generalized sweep. -- Future: devirtualization / direct-call promotion, IPSCCP and cross-function - constant/range propagation, argument promotion, identical-code folding, - `const`/`pure` inference, global-to-local-constant propagation. - -## 9. Risks, semantics, and limitations - -- **Resolution fidelity.** ODR, weak/strong, common merging, COMDAT, IFUNC, - aliases, and visibility must match the linker exactly or LTO miscompiles — - hence the shared `symresolve` module rather than a re-implementation. -- **Interposition / shared libraries.** Never internalize or inline across an - interposable default-visibility boundary unless `-Bsymbolic`/hidden visibility - makes it safe. Default conservative for `-shared`. -- **Inline and file-scope asm** naming symbols are opaque references: treat as - roots, never rename, internalize, or DCE them. -- **Debug info.** Cross-TU inlines need `inlined_subroutine` DWARF with correct - file/line; `SrcLoc` must carry file identity through the merged module. - Acceptable initial limitation: degraded inlined debug info under `-g -flto`, - stated explicitly. 
-- **Compile time / memory.** The whole program lives in memory; `opt_inline`'s - growth gates bound blow-up; only the LTO set is optimized, opaque inputs stay - opaque. -- **Determinism.** Record and merge in input order; iterate stably. -- **Recording arena lifetime** (§4) is the one structural hazard — settle it - before building the staging loop. -- **TLS, varargs, atomics, computed goto, label-address tables** must survive the - shared module unchanged. Function-local label addresses are already - function-scoped; cross-function `data_addr`/`pcrel`/`symdiff` reference symbols, - which are already unified by id. - -## 10. First slices - -Two independently landable, low-risk steps that de-risk the whole direction -before any LTO surface exists: - -1. **Extract `symresolve` and refactor `link_resolve_symbols` onto it.** Pure - refactor, covered by `test/link`. Lands the load-bearing piece and improves - the linker regardless of LTO. -2. **Add the `ObjBuilder` name -> id index** behind the existing - `obj_symbol_find`/`obj_symbol_ex` API. Drop-in; measurable on its own. - -Then Phase 0 (generalize the sweep + wire `opt_inline`, gated behind `-O2` / -`-fwhole-program`), validated first on x86-64 with red-green tests in `test/opt` -(a caller+callee that should fuse) and `test/smoke/x64` (behavioral parity). Then -the staging lifecycle and skip-intern-locals behind `-flto`, exercised first on a -two-TU `test/smoke` case where a cross-TU callee inlines. - -## Open questions - -- **Define-timing for resolution** (§3): confirm the staging-boundary merge is the - right hook versus an `obj_symbol_define`-time check, given symbols are only - obj-defined at emit. -- **Recording arena follow-through** (§4): Phase 1 relies on `c->tu` having - compiler-session lifetime for the cross-TU recorder/module. If frontend reset - semantics later make `c->tu` per-source again, move the recorder/module to an - explicit cross-source arena without changing the frontend staging API. 
-- **`-flto` flag surface** (largely resolved — see Status): `-flto` opt-in on `cc` - and the build verbs, decided per the Status section. Still open: whether - `-fwhole-program` is a distinct, more aggressive internalization mode, and whether - to make cross-TU LTO the `-O1` default later. -- **CG API exposure**: how much of the borrowed lifecycle - (`kit_cg_begin`/`kit_cg_begin_unit`/`kit_cg_finish`/`kit_cg_detach`) remains - internal to the driver (`build.c`'s `build_compile_all`, `cc_run_link_exe`) - versus becoming a public `kit_cg`/`kit_compile` surface for embedders driving - multi-TU LTO. diff --git a/doc/plan/README.md b/doc/plan/README.md @@ -11,7 +11,6 @@ shrinks to whatever remains open. | [RELEASE.md](RELEASE.md) | Cross-cutting initial-release punchlist: release scope, deferred features, and per-subsystem completion/validation items. | — | | [OPTIMIZER.md](OPTIMIZER.md) | Completing the O2 SSA mid-end, expanded inlining, -O0/-O1 performance work, machine register-constraint improvements. | [../OPT.md](../OPT.md) | | [LINKER.md](LINKER.md) | Incremental linking: the file-based object-link redesign and remaining non-ELF format coverage. | [../LINK.md](../LINK.md) | -| [RELOC.md](RELOC.md) | Genericizing the canonical-`RelocKind` half of the relocation layer. WS-B/C/E all landed (per-arch `RelocDesc` table, byte-patcher partitioned per-arch, FreeBSD IFUNC/IRELATIVE); only optional WS-A enum collapse remains. | [../OBJ.md](../OBJ.md), [../LINK.md](../LINK.md) | | [JIT.md](JIT.md) | Function-level hot reload, Go-runtime-style codegen support, and remaining JIT host-portability work. | [../JIT.md](../JIT.md) | | [DEBUG.md](DEBUG.md) | The Windows debugger host adapter, x64/rv64 displaced single-step, profiling, and DWARF gaps. | [../DBG.md](../DBG.md), [../DWARF.md](../DWARF.md) | | [WASM.md](WASM.md) | Completing the Wasm object backend and remaining parser/validator coverage. | [../WASM.md](../WASM.md) | @@ -21,9 +20,6 @@ shrinks to whatever remains open. 
| [BUILD.md](BUILD.md) | A new content-addressed build coordinator (Bazel/Nix-style incremental builds layered on the CAS) — storage state machine, caching algorithm, recipe protocol. Distinct from `../BUILD.md` (kit's own Makefile build). | — (new subsystem) | | [BUILD_COMMANDS.md](BUILD_COMMANDS.md) | The kit-native `build-exe`/`build-lib`/`build-obj` verbs that replace `compile`: polyglot, in-memory compile+link with `--group` flag scoping and full link-flag control. Distinct from `BUILD.md` (the CAS coordinator). | [../DRIVER.md](../DRIVER.md) | | [LLGEN_IMPORT.md](LLGEN_IMPORT.md) | Importing the standalone LL(1)/Pratt parser and lexer generator into libkit, including public API renames, file moves, build gates, and a `kit llgen` command. | — | -| [BACKTRACE.md](BACKTRACE.md) | Stack-trace support: GCC-compatible `__builtin_return_address`/`__builtin_frame_address` primitives, a freestanding `__kit_backtrace` capture helper, and symbolized backtrace printing. L1–L3a/L3c shipped; L3b (in-process self-symbolization) deferred. | [../FRONTENDS.md](../FRONTENDS.md), [../RUNTIME.md](../RUNTIME.md), [../DWARF.md](../DWARF.md) | -| [LTO.md](LTO.md) | Whole-program optimization: `symresolve` extraction, cross-TU inlining, internalization. Phase 0 (whole-TU opt) and Phase 1 (all-sources-up-front LTO) shipped; Phase 2 (serialized `.kit.ir` objects) open. | [../OPT.md](../OPT.md) | | [CODEGEN.md](CODEGEN.md) | CG API interface cleanup: PLACE/VALUE centerpiece, op/intrinsic taxonomy, atomic/order/AsmDir unification, multi-result API, i128/f128-as-VALUE. Tracks 1/3/4/5/6/7 landed; Track 2 (binop/cmp split) and Track 1c open. | [../CODEGEN.md](../CODEGEN.md) | -| [DIST_LIBRARY.md](DIST_LIBRARY.md) | Migrating the CAS/package distribution subsystem into libkit as a gated public API (`kit/cas.h`, `kit/package.h`). Main migration shipped; Stage 3 v2 dead-code deletion deferred. 
| [../DISTRIBUTE.md](../DISTRIBUTE.md) | | [FREEBSD.md](FREEBSD.md) | FreeBSD target support: VM harness, triple parsing, runtime variants, COMDAT/`STB_GNU_UNIQUE` fixes. Static link blocked on archive weak-alias cycle (needs `--start-group` semantics); dynamic link and full VM validation remaining. | — | | [TODO.md](TODO.md) | Open deferred fixes and code smells only. Completed items are removed instead of checked off. Not a roadmap; a current backlog. | — | diff --git a/doc/plan/RELOC.md b/doc/plan/RELOC.md @@ -1,371 +0,0 @@ -# Relocation-layer genericization (planned work) - -## Status — 2026-06-05 — WS-B (descriptor table) + WS-C (byte-patcher partition) + WS-E.2/E.3 (residual gates) landed; only the optional WS-A enum collapse remains - -This roadmap makes the **canonical-`RelocKind` half** of the relocation subsystem -as modular as the wire half already is. The goal is the project's standing -contract (see [../INTERFACES.md](../INTERFACES.md)): code that depends on a -pluggable item — here, the target **arch** — must never switch on its identity, -and adding or changing an arch's relocations must touch exactly **one place**. - -The "modularity wave" commits (`9d905b3c..769d6ae1`) already closed the two -identity *switches* in the reloc path and moved the reloc-name table onto a -per-arch hook, all via the incremental capability-hook style (narrow fields/hooks -on the existing `LinkArchDesc` / `ObjElfArchOps` vtables). **What remains is the -structural denormalization**: the per-kind static facts (width, GOT/TLS class) are -still re-enumerated in generic switches, and the byte-patcher's ISA encoders still -live in the format-neutral obj layer. This revision marks the landed items as -baseline and rescopes the open work accordingly. 
- -Design docs this work feeds back into once shipped: -[../OBJ.md](../OBJ.md) ("Relocation model and the shared byte-patcher"), -[../LINK.md](../LINK.md) (the reloc passes), [../INTERFACES.md](../INTERFACES.md) -(the backend contract). - -## Landed since this plan was first written (`9d905b3c..769d6ae1`) - -- **The one arch-identity switch is gone (was finding #25).** The - `(target.arch == KIT_ARCH_X86_64) ? R_X64_TPOFF64 : R_AARCH64_TPOFF64` ternary in - `link_emit_internal_tpoff64` is now `link_arch_desc_for(l->c)->tpoff64_reloc`, a - new per-arch `LinkArchDesc` field (`src/link/link_arch.h`, populated in - `src/arch/{aa64,x64,riscv}/link.c`). This is WS-A's *functional* fix via the - field route rather than the value-class collapse — the collapse remains an - optional cleanup (now WS-A below, downgraded). -- **The FreeBSD static-IFUNC OS gate is gone (was finding #18).** `use_rela_iplt` - now calls `obj_format_static_ifunc_via_rela_iplt(c)` (`src/obj/obj.h:819`, impl - `src/obj/obj_secnames.c:371`) instead of `os == KIT_OS_FREEBSD && obj == - KIT_OBJ_ELF`. WS-E item 1 is **done**. -- **The reloc-name table moved to a per-arch hook (was finding #24, partially).** - `kit_obj_reloc_kind_name` no longer inlines an x86_64 table; it lowers the - canonical kind via `reloc_to` and calls the new `ObjElfArchOps.reloc_name` - (`src/obj/format.h:65`; impls `elf_{x86_64,aarch64,riscv}_reloc_name`). **But** - the dispatch is still gated `if (fmt != KIT_OBJ_ELF || arch != KIT_ARCH_X86_64) - return NULL;` (`src/api/object_file.c:384`): the aarch64/riscv `reloc_name` - functions exist but are deliberately *not* consulted, because the rv64/aa64 - objdump golden corpus expects the arch-neutral spelling ("RV_CALL", not - "R_RISCV_CALL"). So the name *table* is now per-arch data, but a residual - two-axis identity gate remains, coupled to the test corpus. See WS-E item 3. 
-
-Net: the reloc path now contains **no arch-identity branch**, but still
-denormalizes per-kind facts across generic switches (the structural work below).
-
-## The thesis (what still stands)
-
-A relocation kind is a single logical entity. Its static attributes still live in
-parallel tables the compiler cannot keep in sync:
-
-| Attribute | Lives in | Status |
-|-----------|----------|--------|
-| how to patch the bytes | per-arch `src/arch/<arch>/reloc.c` (`*_reloc_apply_insn`) + neutral `reloc_apply_neutral()` `src/obj/reloc_apply.c`; dispatched by `link_reloc_apply()` `src/link/link_reloc_apply.c` | **landed** — WS-C |
-| byte width | `RelocDesc.width` (per-arch `src/arch/<arch>/reloc.c` + neutral `src/obj/reloc.c`) | **landed** — WS-B |
-| uses GOT / is TLS-GOT | `RelocDesc.flags` `RELOC_USES_GOT`/`RELOC_IS_TLS_GOT` | **landed** — WS-B |
-| branch / got-load / tlvp / direct-page | `RelocDesc.flags` `RELOC_IS_BRANCH`/`USES_GOT`/`IS_TLVP`/`DIRECT_PAGE` | **landed** — WS-B |
-| display name | `ObjElfArchOps.reloc_name` `src/obj/format.h:65` (per-arch hook) | **landed** (with a residual gate — WS-E.3) |
-
-Two generic switches (`reloc_width`, `reloc_uses_got`/`is_tls_got`) still enumerate
-every arch's kinds, so adding an arch's relocation edits generic `link` code; and
-the GOT/branch classification is *answered twice* — once by those generic switches
-(consumed by the ELF/static GOT pass) and once by the per-arch `LinkArchDesc.is_*`
-hooks (consumed by the Mach-O linker). The byte-patcher's per-kind encoders — pure
-ISA knowledge — still sit in the format-neutral `src/obj/reloc_apply.c`.
-
-## Baseline — already clean (context, not work)
-
-- **Per-(arch,format) wire translators** (`reloc_to`/`reloc_from`/`reloc_pcrel`/
-  `reloc_length`, and now `reloc_name`) in `src/obj/{elf,macho,coff}/reloc_<arch>.c`,
-  reached only through the format sub-ops (`src/obj/format.h:55-81`). Adding a format
-  or an arch's wire encoding is a one-table change.
These do **not** move; the - per-arch reloc *name* legitimately belongs here, not in the descriptor below. -- **The single-entry byte-patcher boundary.** `link_reloc_apply(c, kind, P, S, A, P)` - is reused verbatim by the static linker, JIT linker, assembler, and emulator guest - loader ([../OBJ.md](../OBJ.md): "one encoder, three loaders"). That **one-entry, - one-encoder invariant is load-bearing** and WS-C preserves it: only the - implementation behind the entry is partitioned, never the entry. -- **`LinkArchDesc`** already carries per-arch PLT/IPLT geometry, stub emitters, the - `is_*` classifiers, and now `tpoff64_reloc`. It is the proven home for per-arch - link facts; WS-B extends it (or a descriptor it points to), it does not replace it. -- **The canonical `RelocKind` enum** (`src/obj/obj.h:108`) — one global enum, - backends emit canonical kinds — is correct and stays. - -## The end state (ownership) - -``` -src/obj/reloc_apply.c neutral core: reloc_apply_neutral() — byte encoders - for the arch-independent data-word kinds (R_ABS*, - R_REL*, R_PC*, R_TPOFF*, the x64 GOT/dynamic data - slots, the RISC-V data ADD/SUB/SET arithmetic) + the - ULEB128 codec. Pure obj-core, no link/arch dep. -src/link/link_reloc_apply.c (NEW) the single public link_reloc_apply() dispatcher: - neutral-then-arch. Housed in link (not obj-core) - because resolving the per-arch slice needs - link_arch_desc_for() — same boundary call as WS-B's - reloc_desc() dispatcher. -src/arch/<arch>/reloc.c that arch's RelocDesc rows (width + class flags, WS-B) - AND its instruction-immediate byte encoders - (*_reloc_apply_insn, WS-C), reached via - LinkArchDesc.reloc_apply_insn. (R_PLT32's apply is the - RISC-V AUIPC+JALR pair, so it lives in the rv hook with - R_RV_CALL — not neutral, despite its neutral name.) -src/obj/<fmt>/reloc_<arch>.c UNCHANGED — the per-(arch,fmt) wire translators, - incl. the reloc_name spellings (already landed). 
-src/obj/coff/reloc.c COFF-specific kinds' RelocDesc rows (format, not arch). -``` - -After this, adding an arch's relocation is **one row** (width + flags) in that -arch's `reloc.c`, one byte encoder beside it, and one wire-translator entry — all -arch-local. No generic file in `src/link` or `src/api` enumerates relocation kinds. - ---- - -## WS-A — Value-class kind collapse (addresses **A**) — *#25 done; collapse optional* - -**Status.** The identity switch (#25) is **fixed** via `LinkArchDesc.tpoff64_reloc`. -What remains is the underlying naming smell, now *optional* and lower-value: the -canonical enum still carries two byte-identical 64-bit-tpoff kinds, and RISC-V -reuses the AArch64-named one cross-arch (`src/arch/riscv/link.c:131,149: -.tpoff64_reloc = R_AARCH64_TPOFF64`). - -**Optional cleanup.** Collapse `R_X64_TPOFF64` + `R_AARCH64_TPOFF64` → a neutral -`R_TPOFF64` (apply arm is shared already, `reloc_apply.c:98-99`). This additionally -**retires the `tpoff64_reloc` field** — once all three arches name the same kind, -`link_emit_internal_tpoff64` just writes `R_TPOFF64` and the per-arch field has no -remaining variation. Touch-sites: `obj.h:198,284` (enum), `reloc_apply.c:98-99` + -`reloc_width` (fold arms), `obj/elf/reloc_x86_64.c` (`R_TPOFF64 ↔ -ELF_R_X86_64_TPOFF64`; aa64 stays wire-less), `obj/elf/link.c:352,388` (the two -arch-specific tpoff-classification helpers — verify the variant-I/II *coordinate* -selection there keys on the ABI/arch context, not on the kind name, before -merging), and `arch/{aa64,x64,riscv}/link.c` (drop `.tpoff64_reloc`). - -**Defer unless** doing WS-B/C anyway — it is pure tidiness now and best folded into -that pass (the descriptor work touches the same enum + apply arms). No urgency: -there is no remaining identity switch here. - -**Oracle.** `make test-link test-elf test-smoke-x64 test-smoke-rv64 -test-aa64-inline` + a TLS `test-toy` slice + `make bootstrap` (IE-model TLS). 
- ---- - -## WS-B — One per-arch `RelocDesc {width, flags}` table (addresses **B + C**) — *LANDED* - -**Status (landed).** `RelocDesc {u8 width; u8 flags}` resolved arch-aware by -`reloc_desc(c, k)`: -- neutral data-word kinds → `src/obj/reloc.c` (`reloc_desc_neutral`, pure obj-core); -- per-arch slices → `src/arch/{aa64,x64,riscv}/reloc.c`, reached through a new - `LinkArchDesc.reloc_desc` hook that replaces the five `is_*` hooks; -- dispatcher + `reloc_kind_*` predicates → `src/link/link_reloc_desc.{h,c}`. - -Placement note: the **dispatcher** lives in `src/link`, not the plan's -`src/obj/reloc.c`, because resolving the per-arch slice needs `link_arch_desc_for()` -— housing it in obj-core would invert the obj→link boundary (CLAUDE.md). The neutral -descriptor *data* is still pure obj-core (`src/obj/reloc.c`). The arch slice wins over -neutral so `R_PLT32` can be a branch on x86-64/RISC-V but flag-free on AArch64 while -sharing the neutral width. - -Deleted: `reloc_width` / `reloc_uses_got` / `reloc_is_tls_got` (link_reloc_layout) and -`jit_reloc_width_local` (link_jit). Migrated consumers (GOT/stub/width passes, -`link_jit`, and the Mach-O `is_*` call sites) read `reloc_kind_*`. Migration guard: -`test/link/reloc_desc_test.c` — frozen-oracle parity over every kind × every backend -arch (3016 checks). `rg "case R_(AARCH64|X64|RV)_" src/link` is now empty; full -link/elf/macho/ar/isa/aa64-inline suites + `make bootstrap` (debug+release, -byte-identical) pass. WS-A's enum collapse stays deferred — `tpoff64_reloc` remains a -per-arch field. - -**Problem (original).** `reloc_width()` and `reloc_uses_got()`/`reloc_is_tls_got()` are generic -switches re-enumerating every arch's kinds, and the GOT/branch classification is -answered *twice* (those switches vs the per-arch `LinkArchDesc.is_*` hooks). Adding -an arch's reloc edits generic `link_reloc_layout.c`; the two classification -mechanisms can silently disagree. 
-
-**Change.** One descriptor, owned per-arch, as the single source of a kind's static
-*structural* facts. **Name is excluded** — it already landed on the per-arch wire
-ops (`ObjElfArchOps.reloc_name`), which is its correct home; the descriptor carries
-only width + classification.
-
-```c
-/* src/obj/reloc.h (new) */
-typedef enum RelocDescFlag {
-  RELOC_PCREL       = 1u << 0,
-  RELOC_USES_GOT    = 1u << 1,
-  RELOC_IS_TLS_GOT  = 1u << 2,
-  RELOC_IS_BRANCH   = 1u << 3, /* needs a JIT/range veneer (== needs_jit_call_stub) */
-  RELOC_IS_TLVP     = 1u << 4, /* Mach-O TLV page/pageoff */
-  RELOC_DIRECT_PAGE = 1u << 5, /* Mach-O ADRP-direct */
-  RELOC_MARKER      = 1u << 6, /* RELAX/ALIGN/TPREL_ADD — no bytes */
-  RELOC_WIDTH_DYN   = 1u << 7, /* ULEB128 — width read from bytes at apply */
-} RelocDescFlag;
-
-typedef struct RelocDesc { u8 width; u8 flags; } RelocDesc;
-
-const RelocDesc* reloc_desc(const Compiler* c, RelocKind k); /* caller holds target arch */
-```
-
-**Ownership / assembly.** `reloc_desc()` resolves neutral-core kinds from a table in
-`src/obj/reloc.c`; arch-family kinds dispatch to `link_arch_desc_for(c)->reloc_desc(k)`
-(a new `LinkArchDesc` hook returning that arch's slice, the same shape as the
-existing `is_*`/`tpoff64_reloc` entries); COFF-family kinds resolve from a COFF slice.
-Adding an arch is one slice in `src/arch/<arch>/reloc.c` — no generic edit.
-
-**Migrate consumers, then delete the generic switches:**
-- `reloc_width()` (`link_reloc_layout.c:256`) → delete; callers read
-  `reloc_desc(c,k)->width`. Keep the `RELOC_WIDTH_DYN` sentinel + the ULEB128
-  offset-bounds guard (`link_reloc_layout.c:1117-1126`).
-- `reloc_uses_got()`/`reloc_is_tls_got()` (`link_reloc_layout.c:392,380`) → delete;
-  the GOT pass reads `reloc_desc(c,k)->flags & RELOC_USES_GOT / RELOC_IS_TLS_GOT`.
-- The four `LinkArchDesc.is_*` hooks (`link_arch.h:79-82`) + their impls in - `src/arch/{aa64,x64,riscv}/link.c` → delete; the Mach-O linker callers - (`src/obj/macho/link.c:420,492,566,1483,1496,1505,1514,1563`) read descriptor - flags. `needs_jit_call_stub` (`link_reloc_layout.c:594,1095`) → `RELOC_IS_BRANCH` - (it aliases `is_branch_reloc` on every arch today). - -End state: **no generic file classifies or sizes relocations by enumerating arch -kinds, and each fact has exactly one source** — width/flags in the descriptor, -name on the wire ops. - -**Exhaustiveness test (the red-green anchor).** Add `test/obj/reloc_desc` iterating -**every** `RelocKind` for each enabled arch, asserting `reloc_desc()` returns a row -(`width != 0` unless `MARKER`/`WIDTH_DYN`). Cross-check that, for every kind the old -`reloc_width()` covered, the descriptor returns the *same* width (a migration guard). -This makes "forgot a row" a failing test instead of a silent default. Write it red -first. - -**Oracle.** The exhaustiveness/migration test, then `make test-link test-elf -test-macho test-ar test-smoke-x64 test-smoke-rv64`, then `make bootstrap` -(macOS/aa64 bootstrap drives the Mach-O GOT/TLVP/branch classifiers that the `is_*` -deletion touches; byte-identity catches any width drift). - ---- - -## WS-C — Partition the byte-patcher per-arch behind the single entry (addresses **D**) — *LANDED* - -**Status (landed).** The instruction-immediate byte encoders moved into each -backend as `*_reloc_apply_insn` (`src/arch/{aa64,x64,riscv}/reloc.c`), reached -through a new `LinkArchDesc.reloc_apply_insn` hook (`src/link/link_arch.h`, -wired in each arch's `link.c`). The format-neutral data-word arms (R_ABS/REL/PC/ -TPOFF writes, x64 GOT/dynamic slots, the RISC-V data ADD/SUB/SET arithmetic, and -the ULEB128 codec) stay in obj-core as `reloc_apply_neutral()` -(`src/obj/reloc_apply.c`), which has no link/arch dependency. 
The single public -entry `link_reloc_apply()` moved to `src/link/link_reloc_apply.c` (neutral-then- -arch dispatch) — *not* obj-core, because resolving the per-arch slice needs -`link_arch_desc_for()`, the same boundary reason WS-B placed `reloc_desc()` in -`src/link`. The dispatcher enumerates no kinds (`rg "case R_(AARCH64|X64|RV)_" -src/link` is empty). x64 owns only `R_X64_PC8`; the wider x64 GOT/PLT/TPOFF data -slots remained neutral. `R_PLT32` is applied as the RISC-V AUIPC+JALR pair so it -lives in the rv hook beside `R_RV_CALL` (x64 never emits canonical `R_PLT32` — it -emits `R_X64_PLT32` via `reloc_from`). Migration guard: -`test/link/reloc_apply_test.c` (`test-link-reloc-apply`) — frozen pre-WS-C -patched bytes for every instruction-immediate kind across aa64/x64/rv (50 -checks). The reloc_uleb128 c=NULL path still works (neutral never touches the -compiler). Full link/elf/macho/ar/asm/isa/opt/coff/smoke matrix + bootstrap pass. - -**Problem (original).** `src/obj/reloc_apply.c` lives in the format-neutral obj layer but -encodes pure ISA knowledge — AArch64 imm19/imm26/ADRP page math, RISC-V U/I/S/B/J -immediate scatter and the 0x800 HI20 bias, x64 field writes. Adding an arch edits -this shared file; the encoders belong in the backends, beside that arch's MC emitter -and (post-WS-B) its `reloc.c` descriptor slice. - -**Constraint (must not break).** `link_reloc_apply(c, kind, ...)` stays the **one -public entry**, called unchanged by all four loaders (`src/asm/asm.c:1296`, -`src/emu/dl.c:15`, `src/link/link_jit.c`, `src/obj/{elf,macho,coff}/link.c`). The -"one encoder, three loaders" invariant ([../OBJ.md](../OBJ.md)) is preserved — there -is still exactly one encoder per kind; it moves to the owning backend. - -**Change.** -1. 
Keep `link_reloc_apply` in `src/obj/reloc.c` as the dispatcher; it handles the - **arch-neutral data-word arms inline** (`R_ABS32/64`, `R_REL*/PC*`, `R_TPOFF*`, - `R_GOT32`, `R_PLT32`, the ULEB128 codec) — plain `wr_uN_le`, no ISA knowledge. -2. Instruction-embedded kinds dispatch to a new `LinkArchDesc.reloc_apply_insn(c, k, - P, S, A, P)` hook. Move the AArch64 arms to `src/arch/aa64/reloc.c`, RISC-V to - `src/arch/riscv/reloc.c`, x64 instruction arms (`R_X64_PC8`) to - `src/arch/x64/reloc.c`. `c` (hence `target.arch`) is available at every call site - (verified: all `link_reloc_apply` callers pass a `Compiler*`). -3. COFF-specific kinds route to a COFF encoder slice. - -Each backend's `reloc.c` then owns {desc rows (WS-B), class flags (WS-B), byte -encoders (WS-C)} for its kinds — one file per arch. - -**Oracle.** Highest blast radius; lean on the WS-B exhaustiveness test + the full -matrix: `make test-link test-elf test-macho test-isa test-asm test-smoke-x64 -test-smoke-rv64 test-aa64-inline`, the JIT/emu reloc paths (`test-cg-api`, a -`run`/`emu` smoke), then **both** bootstrap chains (`make bootstrap-debug -bootstrap-release`) — byte-identity over the compiler's own object output is the -definitive proof no encoding shifted. Do this last, one arch at a time (neutral-core -extraction first, then aa64, x64, rv), keeping old switch arms live until each arch's -hook is proven, so every step bisects to one arch. - ---- - -## WS-E — Residual format gates (addresses **E**) — *all items LANDED* - -1. **FreeBSD static-IFUNC mechanism (#18).** **Done** — now - `obj_format_static_ifunc_via_rela_iplt(c)` (`src/obj/obj_secnames.c:371`). -2. 
**IRELATIVE wire type via hardcoded `KIT_OBJ_ELF`.** **Done.** The generic - `link_elf_irelative_type` is deleted; the iplt pass now calls - `obj_format_static_ifunc_irelative_type(l->c)` (sibling of the WS-E.1 predicate in - `src/obj/obj_secnames.c`), which resolves the resolver reloc through the *target* - object format (`c->target.obj`) rather than the literal `KIT_OBJ_ELF`. The generic - link pass names no format constant. -3. **`reloc_name` dispatch gate (#24 residual).** **Done.** `kit_obj_reloc_kind_name` - (`src/api/object_file.c`) now guards only `if (fmt != KIT_OBJ_ELF) return NULL;` — - the `arch != KIT_ARCH_X86_64` axis is gone, so aarch64/riscv ELF relocs print via - their `ObjElfArchOps.reloc_name` tables (matching binutils objdump: - `R_AARCH64_CALL26`, `R_RISCV_CALL`). One golden refreshed - (`test/objdump/rv64/cases/03-reloc-annotations`: `RV_CALL` → `R_RISCV_CALL` in the - `-r` records; the `-d` disasm annotation keeps the arch-neutral `[RV_CALL]`, which - comes from the disassembler's `reloc_kind_name`, a separate path). Mach-O/COFF have - no `reloc_name` table yet, so they still fall back to the neutral spelling. - -**Oracle.** `make test-link test-elf test-driver-objdump` — all pass (item 3's golden -churn was the single rv64 reloc-annotations case, purely the reloc spelling). Item 2's -FreeBSD static-IFUNC path is unexercised on the macOS host but the change is a -behaviour-preserving refactor (same per-arch `r_irelative`, resolved format == ELF -wherever `use_rela_iplt` is true); deeper coverage is the FreeBSD VM lane -(`scripts/freebsd_vm.sh` / `test-toy-freebsd-vm`, see [FREEBSD.md](FREEBSD.md)). - ---- - -## Sequencing & risk - -1. **WS-B** — the central remaining change: the `RelocDesc {width, flags}` table + - exhaustiveness test, deleting both generic switches and the duplicating `is_*` - hooks. This is now the highest-value open item (the identity switches are already - gone). 
Fold WS-A's value-class collapse in here since it touches the same enum/arms.
-2. **WS-C** — **DONE.** Encoder partition behind the single entry, gated by the new
-   `test/link/reloc_apply_test.c` frozen-bytes guard + bootstrap byte-identity.
-3. **WS-E.2 / WS-E.3** — **DONE.** (WS-E.3's binutils-spelling switch also required
-   refreshing `test/smoke/rv64_tls_link.sh`'s reloc grep — `RV_TPREL_HI20` →
-   `R_RISCV_TPREL_HI20` — a stale expectation it had missed.)
-
-**Risk controls.** Every WS is red-green: WS-B's exhaustiveness + width-migration test
-is written first and fails until each arch's slice is complete. The **bootstrap** is
-the load-bearing oracle — it patches every relocation kind the compiler emits for its
-own source, so a byte-identical stage2/stage3 proves the encoding path is unchanged.
-Per CLAUDE.md, prefer targeted suites during iteration (redirect output to a file);
-reserve `make bootstrap` for end-of-WS gates. Keep old paths live beside new within a
-WS (especially WS-C, per-arch) so any regression bisects to one arch's hook.
-
-## Done criteria
-
-All met by WS-B + WS-C below except the optional WS-A enum collapse (still deferred).
-
-- ✓ No file under `src/link/` enumerates `RelocKind` arms: `reloc_width`,
-  `reloc_uses_got`, `reloc_is_tls_got`, the `LinkArchDesc.is_*` hooks, **and the
-  byte-patcher's instruction arms** are gone; consumers read the per-arch
-  `RelocDesc` / call the per-arch `reloc_apply_insn`. (`rg "case R_(AARCH64|X64|RV)_"
-  src/link` returns nothing — the WS-C dispatcher is case-free.)
-- ✓ Every relocation static fact has exactly one source: width + class flags in the
-  per-arch `RelocDesc` slice, wire encoding + name in `src/obj/<fmt>/reloc_<arch>.c`,
-  **and the instruction byte encoder in that arch's `reloc.c` `*_reloc_apply_insn`**.
-- ✓ `link_reloc_apply` remains the single public byte-patcher entry (now in
-  `src/link/link_reloc_apply.c`); its instruction-encoding arms live in
-  `src/arch/<arch>/reloc.c`, the obj layer keeps only the arch-neutral data-word arms
-  (`reloc_apply_neutral`).
-- ✓ Adding a hypothetical new arch's relocation touches only that arch's
-  `src/arch/<arch>/reloc.c` (one `RelocDesc` row + one `reloc_apply_insn` arm) and its
-  `src/obj/<fmt>/reloc_<arch>.c` — guarded by `test/link/reloc_desc_test.c` (rows) and
-  `test/link/reloc_apply_test.c` (bytes); no generic file needs edits.
-- (Optional/low-pri, **still open** — WS-A) the `tpoff64_reloc` field is retired by the
-  `R_TPOFF64` collapse. (The `object_file.c` `reloc_name` gate removal + objdump golden
-  refresh and `link_elf_irelative_type` already landed under WS-E.)
-- ✓ `make bootstrap-debug` reaches the byte-identical fixed point; the full
-  link/elf/macho/coff/isa/asm/opt/smoke matrix passes. (Release bootstrap carries a
-  PRE-EXISTING `.Lkit_jt.0` break unrelated to this work — gate on `bootstrap-debug`.)
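
The dispatch shape described by the done criteria above (a case-free generic entry that reads a descriptor row, patches arch-neutral data words itself, and hands instruction-embedded kinds to a per-arch hook) might look roughly like the sketch below. It is only an illustration: every identifier (`DemoKind`, `DemoRelocDesc`, `demo_reloc_apply`, `demo_aa64_apply_insn`, and so on) is hypothetical and is not kit's actual type, signature, or file layout. The one external fact assumed is the standard AArch64 `B`/`BL` encoding, where the low 26 bits hold the signed byte offset divided by 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation kinds for this sketch (not kit's RelocKind). */
typedef enum { DEMO_ABS32, DEMO_ABS64, DEMO_CALL26, DEMO_KIND_COUNT } DemoKind;

/* Class flag: the value is embedded in an instruction word. */
enum { DEMO_F_INSN = 1u << 0 };

/* One descriptor row per kind: the single source of width + class flags. */
typedef struct {
    uint8_t width; /* number of bytes patched */
    uint8_t flags;
} DemoRelocDesc;

static const DemoRelocDesc demo_desc[DEMO_KIND_COUNT] = {
    [DEMO_ABS32]  = { 4, 0 },
    [DEMO_ABS64]  = { 8, 0 },
    [DEMO_CALL26] = { 4, DEMO_F_INSN },
};

/* Write the low n bytes of v in little-endian order. */
static void demo_wr_le(uint8_t *p, uint64_t v, unsigned n) {
    assert(n <= 8);
    for (unsigned i = 0; i < n; i++) p[i] = (uint8_t)(v >> (8 * i));
}

static uint32_t demo_rd32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Per-arch hook: owns the instruction encoders for its own kinds. */
typedef int (*DemoApplyInsn)(DemoKind k, uint8_t *loc,
                             uint64_t S, int64_t A, uint64_t P);

/* AArch64-style arm: patch imm26 of a B/BL with (S + A - P) >> 2. */
static int demo_aa64_apply_insn(DemoKind k, uint8_t *loc,
                                uint64_t S, int64_t A, uint64_t P) {
    if (k != DEMO_CALL26) return -1;
    int64_t off = (int64_t)(S + (uint64_t)A - P);
    if (off & 3) return -1; /* branch targets are 4-byte aligned */
    if (off < -((int64_t)1 << 27) || off >= ((int64_t)1 << 27)) return -1;
    uint32_t insn = demo_rd32_le(loc);
    uint32_t imm26 = (uint32_t)(((uint64_t)off >> 2) & 0x03FFFFFFu);
    demo_wr_le(loc, (insn & 0xFC000000u) | imm26, 4);
    return 0;
}

/* Generic entry: no per-kind switch, only a descriptor lookup. */
static int demo_reloc_apply(DemoApplyInsn hook, DemoKind k, uint8_t *loc,
                            uint64_t S, int64_t A, uint64_t P) {
    if ((unsigned)k >= DEMO_KIND_COUNT) return -1;
    const DemoRelocDesc *d = &demo_desc[k];
    if (d->flags & DEMO_F_INSN) return hook ? hook(k, loc, S, A, P) : -1;
    demo_wr_le(loc, S + (uint64_t)A, d->width); /* neutral data-word arm */
    return 0;
}
```

In this arrangement, adding a kind means adding one descriptor row and, for instruction-embedded kinds, one arm in the arch hook; the generic entry stays untouched, which is the property the plan's exhaustiveness and frozen-bytes tests are meant to guard.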