kit

git clone https://git.ryansepassi.com/git/kit.git

commit 2686dfe936ae06310cd00dd61166417df8b3b8a7
parent 1806e4076ddd06706caa95e512c54844869694b9
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sat, 30 May 2026 18:25:20 -0700

doc: file-based incremental linking design (obj+link internals)

Diffstat:
M doc/INCREMENTAL_OBJLINK.md | 380 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 281 insertions(+), 99 deletions(-)

diff --git a/doc/INCREMENTAL_OBJLINK.md b/doc/INCREMENTAL_OBJLINK.md @@ -66,12 +66,13 @@ Plus: `link_resolve_at`/`link_resolve_extend` are panic stubs `LinkRelocApply` records that *produce* those writes are preserved as data first (invariant, internal `src/link/link.h:234-246`). -**Benchmark (true shape).** `tmp/projects/lua`: 35 `.c` files; the Makefile -compiles **32 objects** (CORE_O=20 + LIB_O=12) into `liblua.a`, then links **two** -executables — `lua` and `luac` — that share the archive. So the substrate must -model (a) archive members as link inputs and (b) one edited TU fanning out to -multiple final images. `sqlite-amalg` (1 huge TU) and `yyjson` (1 TU) exercise -the single-TU degenerate case. +**Benchmark shape.** The substrate must model (a) archive members as link inputs +and (b) one edited TU fanning out to multiple final images. The acceptance suite +uses a small **synthetic** multi-TU fixture for exactly this (§19.2) — a handful +of core TUs archived into a static library, linked into two executables that +share it — rather than vendoring a third-party project as a test dependency. +Real codebases (amalgamations like sqlite, multi-TU libraries like lua) remain +useful as *later* wall-clock perf targets, but are deliberately not test deps. --- @@ -215,7 +216,8 @@ typedef struct LinkSession { u64 cursor[SEG_NBUCKETS]; /* append cursor per class (from JIT) */ u64 limit[SEG_NBUCKETS]; /* reserved ceiling per class */ LinkFreeList free[SEG_NBUCKETS]; /* vacated slots, first-fit reuse */ - u32 slack_pct; /* per-atom reserve, default 10% */ + u32 code_slack_pct; /* per-code-atom reserve; modest (code can relocate) */ + u32 data_slack_pct; /* per-data-atom reserve; generous (data-grow forces fallback) */ /* atom placement table, keyed by content_id; the persisted core (§10) */ LinkAtomPlace* atoms; u32 natoms; } LinkSession; @@ -291,10 +293,14 @@ control flow. ## 8. 
Placement, slack, and the move-on-grow primitive **Slack.** Today sections are contiguous with only alignment padding -(`link_layout.c:340-348`). Under `--incremental`, reserve per-atom slack -(`slack_pct`, gold's `--incremental-patch=n` analog) so overwrite-in-place is the -common case. A two-level free-list (one of free file blocks, one per segment -bucket) recycles vacated slots, first-fit. +(`link_layout.c:340-348`). Under `--incremental`, reserve per-atom slack so +overwrite-in-place is the common case. **Code and data get separate, tunable +budgets** (decision §20.2): `code_slack_pct` is modest because code atoms can +relocate cheaply (§8.1), while `data_slack_pct` is more generous because a data +atom that outgrows its slot forces a full-link fallback (data can't be thunked). +Both default sensibly and are overridable via a link option (gold's +`--incremental-patch=n` style). A two-level free-list (one of free file blocks, +one per segment bucket) recycles vacated slots, first-fit. **The move primitive — swappable.** When an atom moves, callers must still reach it without their bytes changing. Abstract this as one hook with two @@ -412,11 +418,14 @@ off) are canonical and reproducible. - **Stability (falsifiable):** after a patch, `nm`/`addr2line` on an *unchanged* symbol must return the identical vaddr as before. Enforced by overwrite-in-slack / append-to-free-slot, never compact. -- **Determinism audit (prerequisite for dedup, not correctness):** confirm that - identical `(source, flags, target)` yields byte-identical objects — audit - symbol ordering and `pool_intern` first-access order in obj emit. With - content/name keying (§10) a nondeterministic order only costs cache dedup, not - a wrong patch; but byte-stability is still wanted so two machines agree. 
+- **Determinism (decision §20.4 — lock with a test, keep content-keying):** obj + emission is *already* byte-deterministic — sections/symbols/relocs emit in + insertion order, `.strtab` dedups by linear search, and there are no + timestamps, embedded addresses, hash-map iteration, or threading in the emit + path (`src/obj/elf/emit.c:298,386,505`). Lock this with a regression test (two + compiles ⇒ identical bytes), which enables cross-machine / shared-cache dedup. + Content/name keying (§10) remains the *correctness* backbone: if a future + change ever reintroduces nondeterminism, it degrades dedup, never correctness. - **Reloc re-derivation:** never store an absolute `write_vaddr`; always `atom.vaddr + offset_within_atom` (principle 4). @@ -424,27 +433,61 @@ off) are canonical and reproducible. ## 13. Debug info (DWARF) consistency -- A moved atom's `.debug_info`/`.debug_line`/`.debug_aranges` address ranges - change → reapply that atom's debug relocs (re-derived like §9). Unchanged - atoms' debug stays byte-stable because their addresses do. -- v1 stance: rebuild only the *changed TU's* debug sections, `O(change)`. An - in-slack overwrite that does not move the atom leaves addresses (and therefore - `.debug_line` byte content) unchanged — free, but see the open question on - line-number-only shifts (§20). -- `addr2line` and `cfree dbg` re-read debug from the patched image. The JIT path - invalidates a cached view by generation counter; a file consumer re-reads the - file, so the build-id change (§11) is the staleness signal. +**Decision §20.1: regenerate the changed TU's debug on any body change.** +Investigation (`src/debug/debug_emit.c:823-905`, `:650-703`) established that +debug update is **`O(changed TU)`, not `O(atom)`**, and that skipping it is +*incoherent*: + +- cfree emits **one monolithic `.debug_line` program per CU** and a single + `.debug_info` CU whose DIEs reference each other by intra-CU `DW_FORM_ref4` + offsets. 
You cannot splice one function's line rows or patch one subprogram DIE + in isolation — a change shifts subsequent offsets across the CU. +- A body change rewrites the instruction→line mapping **regardless of whether + the atom moved** (an in-slack overwrite has the same vaddr but different + instruction offsets). So "keep `.debug_line` because the address didn't change" + is wrong — there is no correct skip. +- Therefore: on any changed atom, **re-emit that TU's full `.debug_*`**. This is + `O(changed TU)`, which is fine — unchanged TUs' debug is byte-stable because + their atoms keep their addresses, and one TU's debug regen is cheap relative to + the rest of the patch. Address-bearing fields (`DW_AT_low_pc`, `.debug_aranges`, + `.debug_rnglists`) re-derive from current placements (§9); `DW_AT_high_pc` is a + size, not an address. +- `addr2line` and `cfree dbg` re-read debug from the patched image; the build-id + change (§11) is the staleness signal for file consumers. +- *Future option (not pursued now):* per-function CUs / split line programs would + make debug `O(atom)`, but the gain is marginal versus the per-TU regen cost. --- ## 14. Multi-format & multi-arch -- **ELF first.** The atom + slack + move-primitive model is format-agnostic, but - Mach-O carries whole-image structures (chained fixups, `LC_DYLD_INFO`, - code-signature, `LC_UUID`) that resist in-place patching; enumerate which load - commands must be regenerated before attempting Mach-O incremental. COFF later. - Persisted state is side-band CAS for all three (§10), so no per-format on-disk - incremental metadata. +- **No format is fundamentally unsuitable (decision §20.6); difference is + machinery, so: ELF first, then COFF, then Mach-O.** The atom + slack + + move-primitive core is format-agnostic; the persisted state is side-band CAS + for all three (§10). 
Per-format cost: + - **ELF — least machinery (first).** Relocations are side data (`.rela.*`) + applied at emit, the symtab is flat fixed-offset records (patch the changed + entry in place), section headers are file-only, signing is optional. An atom + move touches only its own symbol's `st_value` and the relocs targeting it. + - **COFF/PE — incremental-friendly (second).** PE is the canonical incremental + target (MSVC `/INCREMENTAL` + `.ilk`): imported calls indirect through the + IAT, base relocs are simple per-page RVA lists, debug lives in a separate + PDB, and Authenticode is optional. The practical gate is cfree's COFF + maturity, not the format. (Reasoned from the PE/MSVC precedent, not a code + dive of cfree's COFF link.) + - **Mach-O — heaviest but feasible (last).** `__LINKEDIT` folds loader fixup + metadata into compact whole-image structures co-designed with a *mandatory* + dyld: chained fixups (`LC_DYLD_CHAINED_FIXUPS`), the export trie, the + indirect symtab. Each needs its own incremental updater — but they are + **bounded, not `O(image)`**: a code-atom move updates only the chained-fixup + *slots pointing at it* (chain `next`-links don't move unless pointer slots + move), and the symtab patch is `O(changed symbols)` like ELF (an earlier + survey overstated these as whole-image). The one real floor is **mandatory + code signing** on Apple Silicon: every patch must re-sign, and the + CodeDirectory is a per-4KiB-page hash array — naively `O(image)`, but + cacheable to `O(changed pages)` by retaining unchanged pages' hashes. + - Until a format's updater lands, that format falls back to the fast in-process + full link. - **Per-arch surface is small:** only (a) the move primitive's island/cell shape and (b) the branch-into-island/cell reloc kind. aa64 has the jit-stub shape to reuse; x64 (`src/obj/x64/link.c:40`) and rv64 each have a trampoline shape to @@ -496,28 +539,66 @@ out of caching; Toy's **batch/file** compile conforms like any other frontend. 
## 16. The interface boundary the build system consumes -The separate build-system plan (build graph, cache, watch) calls only this -public surface; it never touches `src/link` internals. +**Incrementality is not a parallel API — it is the existing `CfreeLinkSession` +made fully mutable.** A full link is the degenerate cold case (no prior state, +nothing replaced); an incremental relink seeds prior state and replaces the +changed inputs. The build system always drives *the same* session, and +`resolve` internally decides patch-vs-full and reports which — there is no +separate "incremental" entry point to keep in sync with the full-link path. This +matches the internal direction (`link_resolve` is "inputs → image"; +`link_resolve_at`/`extend`, `link.c:629,638`, make that `resolve` extend-capable). + +Changes from today's surface (`new`/`add_obj…`/`resolve`/`emit`/`jit`/`free`, +`link.h:189-207`) — all **additive**: ```c -/* include/cfree/object.h */ +/* include/cfree/object.h — object identity */ CfreeStatus cfree_obj_content_id(CfreeObjBuilder*, uint8_t out[CFREE_BLAKE2B_LEN]); -/* include/cfree/link.h — new incremental session surface */ -typedef enum { CFREE_LINK_PATCHED, CFREE_LINK_FELL_BACK_FULL } CfreeLinkOutcome; +/* include/cfree/link.h */ +typedef enum { CFREE_LINK_FULL, /* cold: no prior state */ + CFREE_LINK_PATCHED, /* incremental fast path applied */ + CFREE_LINK_FELL_BACK_FULL /* was incremental, gate forced full */ +} CfreeLinkOutcome; -CfreeStatus cfree_link_session_open_incremental(CfreeLinkSession*, - const void* persisted, size_t persisted_len); /* NULL = cold */ +/* add_* gain a stable input handle so a live session can mutate one slot; + the handle is optional (NULL) for the cold/file path. 
*/ +CfreeStatus cfree_link_session_add_obj(CfreeLinkSession*, CfreeObjBuilder*, + CfreeLinkInputId* out /*nullable*/); CfreeStatus cfree_link_session_replace_input(CfreeLinkSession*, CfreeLinkInputId, - CfreeObjBuilder* changed); /* by content */ -CfreeStatus cfree_link_session_patch_emit(CfreeLinkSession*, CfreeWriter* image, - CfreeWriter* persisted_out, CfreeLinkOutcome* outcome); + CfreeObjBuilder* changed); +CfreeStatus cfree_link_session_remove_input(CfreeLinkSession*, CfreeLinkInputId); + +/* seed prior incremental state (opaque bytes); unset/empty => cold full link. */ +CfreeStatus cfree_link_session_set_prior_state(CfreeLinkSession*, const CfreeSlice*); + +/* resolve() and emit() are the SAME calls, now incremental-aware: + resolve reconciles the CURRENT input set against prior state BY CONTENT (§10) + — unchanged atoms reuse placement, changed atoms patch, non-local edits fall + back to a full re-resolve. Idempotent: safe to re-call after mutations. */ +/* CfreeStatus cfree_link_session_resolve(CfreeLinkSession*); (existing) */ +/* CfreeStatus cfree_link_session_emit(CfreeLinkSession*, CfreeWriter*); (existing) */ + +/* emit the new persisted incremental state (opaque); query the last outcome. */ +CfreeStatus cfree_link_session_serialize_state(CfreeLinkSession*, CfreeWriter*); +CfreeStatus cfree_link_session_outcome(CfreeLinkSession*, CfreeLinkOutcome* out); ``` -The build system supplies changed objects (it decides *which* via its cache), -gets back the patched image, the new persisted blob, and — crucially — the -**outcome** so it knows whether the fast path applied or the link fell back. The -object content id lets it detect "this TU's object is byte-identical, skip it." +- **Cold full link (today, unchanged):** `new → add_obj… → resolve → emit`. +- **Incremental relink:** `new → set_prior_state(blob) → add_obj…/replace_input → + resolve → emit + serialize_state`, then read `outcome`. 
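One way to picture what the incremental-aware `resolve` reports is a toy model of reconcile-by-content. This is a minimal sketch, not cfree code: every name in it (`ToyContentId`, `ToyOutcome`, `toy_resolve`, the 8-byte id length, the `gate_forces_full` flag) is hypothetical. The real ids are `CFREE_BLAKE2B_LEN` bytes, and the real soundness gate inspects symbols and placements rather than taking a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; not the cfree API. */
#define TOY_ID_LEN 8 /* assumption: real ids are CFREE_BLAKE2B_LEN bytes */

typedef enum {
  TOY_LINK_FULL,          /* cold: no prior state */
  TOY_LINK_PATCHED,       /* incremental fast path applied */
  TOY_LINK_FELL_BACK_FULL /* was incremental, gate forced full */
} ToyOutcome;

typedef struct { uint8_t bytes[TOY_ID_LEN]; } ToyContentId;

static int toy_id_eq(const ToyContentId* a, const ToyContentId* b) {
  return memcmp(a->bytes, b->bytes, TOY_ID_LEN) == 0;
}

/* Count current inputs whose content id also appears in the prior state.
   Matching is by content only, so input order and slot ids do not matter. */
static size_t toy_count_reused(const ToyContentId* prior, size_t nprior,
                               const ToyContentId* cur, size_t ncur) {
  size_t reused = 0;
  for (size_t i = 0; i < ncur; i++) {
    for (size_t j = 0; j < nprior; j++) {
      if (toy_id_eq(&cur[i], &prior[j])) { reused++; break; }
    }
  }
  return reused;
}

/* Decide the outcome for one resolve. `gate_forces_full` is a stand-in for
   the soundness gate (non-local edit, slack exhaustion, and so on). */
static ToyOutcome toy_resolve(const ToyContentId* prior, size_t nprior,
                              const ToyContentId* cur, size_t ncur,
                              int gate_forces_full, size_t* reused_out) {
  assert(reused_out != NULL);
  assert(cur != NULL || ncur == 0);
  *reused_out = 0;
  if (prior == NULL || nprior == 0) return TOY_LINK_FULL; /* cold link */
  if (gate_forces_full) return TOY_LINK_FELL_BACK_FULL;   /* nothing reused */
  *reused_out = toy_count_reused(prior, nprior, cur, ncur);
  return TOY_LINK_PATCHED;
}
```

Because the match is order-independent, a fresh process can re-add its inputs in any order and still line them up with prior placements, which is the property the cross-process path relies on.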
+ +Because reconciliation is **by content hash** (§10, decision §20.4), the +cross-process path needs no stable-id continuity: a fresh session seeded with +prior state matches re-added inputs to prior placements by content. +`replace_input`/`remove_input` are a live-session (daemon) convenience for +slot-precise mutation; they still match by content underneath. + +**Ownership (decision §20.5):** the build system owns the persisted blob's key, +CAS storage, and lifetime; the session only reads it via `set_prior_state` and +writes it via `serialize_state` as opaque bytes through `CfreeWriter`. libcfree +does no file IO/CAS (both are driver-only — `driver/dist`; libcfree reads bytes +via `Compiler.env->file_io`, `src/link/link.h:150`). --- @@ -532,73 +613,174 @@ A patch is all-or-nothing from the consumer's view: --- -## 18. Implementation sequence +## 18. Implementation sequence (acceptance-test-first, red → green) + +The **first phase builds the acceptance suite** (§19), which encodes "done for +ELF" as an executable spec. It starts fully red; each milestone below drives a +named set of its scenarios green. Each milestone also has its own narrow +red-green unit cycles (listed inline); the acceptance scenarios are the +integration capstones. + +**Phase 0 (first) — author the ELF acceptance suite, RED.** +Land the additive public surface (§16) as not-implemented stubs — +`_set_prior_state` / `_replace_input` / `_remove_input` / `_serialize_state` / +`_outcome`, the input-id out-param on `add_obj`, the `CfreeLinkOutcome` enum, and +`cfree_obj_content_id` — each returning a "not implemented" status, with +`resolve`/`emit` initially doing only the cold full link (mirroring the existing +`link_resolve_at`/`extend` panic stubs, `link.c:629,638`), so the suite compiles +and links. Then write `test/link-incremental/` with the §19 scenarios A–F, the +synthetic fixture build, and the `link_resolve` whole-program instrumentation. Every +scenario is red. 
This nails the spec before any implementation and is the red +baseline. (No parallel "incremental" surface — §16: the one mutable session.) **M0 — atom identity & obj indices (no behavior change).** `obj_content_id` / `obj_atom_content_id`, per-atom reloc index, symbol-by-name -hash, deterministic round-trip + a determinism test. Wire `CfreeFrontendCaps` -and the contract (C deps via `CfreeDepIter`; trivial for others). +hash, deterministic round-trip, `CfreeFrontendCaps` (C deps via `CfreeDepIter`; +trivial for others). → **turns Scenario E (determinism) green** and provides the +`obj_content_id` the harness keys on. Narrow: one-byte body edit ⇒ exactly that +atom's content id changes, others stable. **M1 — `LinkSession` + append-only extend (Stage A).** -Introduce `LinkSession`, implement `link_resolve_extend` for append-only against -a file image, reusing JIT cursor/slack *placement* but **falling back, not -panicking**. Persisted blob round-trips (§10). +Introduce `LinkSession`; implement the append-only subset of +`link_resolve_extend` against a file image, reusing JIT cursor/slack *placement* +but **falling back, not panicking**; persisted-blob round-trip (§10). → **turns +Scenario F (no-op relink) green**. Narrow: appended object whose code calls an +initial function links; appended duplicate-strong-def falls back (not panic); +unresolved ref is transactional (image unchanged). **M2 — patch changed atoms in slack (Stage B, no move yet).** -Per-atom diff, overwrite-in-slack, reapply that atom's relocs (§9), per-segment -build-id (§11), the soundness gate + transactional rollback (§7.3, §17). Atoms -that would grow past capacity ⇒ fall back (no move primitive yet). +Per-atom diff, overwrite-in-slack, reapply the changed atom's relocs (§9), +per-segment build-id (§11), regenerate the changed TU's debug (§13), the +soundness gate + transactional rollback (§7.3, §17). Atoms that would grow past +capacity fall back here (no move primitive yet). 
→ **turns Scenarios A (in-slack +edit), C (fallback), and D (multi-output) green.** **M3 — move-on-grow via thunk (`LinkMoveOps` = thunk).** -Free-list, grow-relocate code atoms, jump islands (reuse jit-stub shape), data -slack + fall-back-on-data-grow. ELF/aa64 then ELF/x64. +Free-list, relocate grown code atoms, jump islands (reuse the +`link_layout_jit_stubs` shape, `link_reloc_layout.c:429`), separate code/data +slack with data-grow → fallback. → **turns Scenario B (grow past slack) green.** +At the end of M3 the full §19 suite is green on **ELF/aa64 + ELF/x64 — this is +"done for ELF."** -**M4 — converge on GOT-cell (`LinkMoveOps` = got), if/when hot reload needs it.** -`--incremental` codegen mode for cross-unit calls + movable data, reserved GOT -slack + free-list. Shares the primitive with `doc/HOT_RELOAD.md`. +**M4 (deferred) — converge on GOT-cell (`LinkMoveOps` = got).** +Built when hot reload is scheduled (decision §20.3), designed to serve both +paths: a `--incremental` codegen mode for cross-unit calls + movable data, +reserved GOT slack + free-list. Not required for the §19 suite. -Mach-O/COFF and rv64 patching follow M3/M4 per §14. +Then COFF, then Mach-O updaters (§14); the rv64 patch path follows aa64/x64. --- -## 19. Test plan (narrow, per-arch, red-green) - -Prefer targeted runs; redirect output to a file and read it (project rules). - -- **M0:** compile `tmp/projects/lua/src/ltable.c` twice ⇒ identical - `obj_content_id` (determinism). Edit one function body ⇒ exactly that atom's - content id changes, others stable. aa64 + x64 only. -- **M1:** initial object + appended object where appended code calls an initial - function; appended duplicate-strong-def ⇒ fall back (not panic); unresolved - ⇒ transactional, image unchanged. 
-- **M2:** build `liblua.a` + `lua`; patch one in-slack function body ⇒ unchanged - symbols keep vaddrs (`nm` diff), binary runs (`test/lib` `exec_target`), and - `link_resolve` whole-program path was *not* taken (instrument a counter, dump - to file). Negative: add a new global ⇒ fall back; weak↔strong flip ⇒ fall back; - new archive pull-in ⇒ fall back. -- **M3:** grow a function past its slack ⇒ it relocates, an island appears at the - old slot, callers' bytes are byte-identical, result runs. Grow a *global* past - data slack ⇒ fall back. `addr2line` an unchanged function after a patch ⇒ - correct file:line. -- **Multi-output:** edit a core TU shared by `lua` and `luac` ⇒ both images - patch (or both fall back) consistently. +## 19. ELF definition of done: outcome & acceptance suite + +This is the executable specification authored **first** (Phase 0, §18) and the +north-star M0–M3 drive to green. ELF/aa64 + ELF/x64 only (per the "narrow runs" +rule). It is `test/link-incremental/`. + +### 19.1 Outcome — what a fully-built ELF implementation produces + +Under `--incremental` (`-O0`/`-O1`), the build system seeds the session with prior +state, replaces the changed input(s) (`cfree_link_session_replace_input`, or +re-adds inputs on the cold path), then calls the same `resolve` → `emit` + +`serialize_state` (§16): + +- **In-slack body edit:** the changed atom's bytes overwrite in place, only its + relocs re-derive/reapply, the changed TU's `.debug_*` regenerates, only the + changed segment's build-id subhash recomputes. Outcome `CFREE_LINK_PATCHED`; + every other atom and every unchanged vaddr is byte-identical. +- **Grow past slack:** the atom relocates to a free-list slot, a jump island is + left at its old address, **callers' bytes do not change**; only the moved + atom's relocs re-derive. Still `PATCHED`, still `O(change)`. 
+- **Non-local edit:** added/removed global, weak↔strong flip, new archive + pull-in, COMDAT-ownership flip, TLS/import-size change, or slack exhaustion + (incl. any data-atom grow) ⇒ `CFREE_LINK_FELL_BACK_FULL`: a correct full + in-process link. **Never a wrong binary.** + +Guarantees: address stability (unchanged symbols keep their vaddr), debug +correctness after a patch (`addr2line`/`cfree dbg`), byte-deterministic objects +(release builds with `--incremental` off are the reproducible artifact), and +consistent multi-output (a core-TU edit patches/falls-back for both apps). Cost: +a one-function in-slack edit takes the link from `O(all objects)` to `O(one atom ++ its relocs + one TU's debug + one segment rehash)` — for the synthetic fixture +(§19.2), from "relink both executables over the whole archive" to "patch one +atom." + +### 19.2 Harness & instrumentation + +- Run the patched binary via `test/lib` `exec_target`/`exec_kernel`. +- Instrument the whole-program `link_resolve` entry with a counter dumped to a + file (read it back; don't re-run). A `PATCHED` outcome must **not** increment + it; a fallback must. +- **Fixture — a synthetic multi-TU codebase** under + `test/link-incremental/fixture/` (hand-written, deterministic, no third-party + deps; freestanding-style apps that return a *computed* status, like the smoke + harness, so the only archive in play is the fixture's own). Five core TUs + archived into `libcore.a`, linked into two executables that share it: + - `arith.c` — leaf callees `arith_add` / `arith_mul`. + - `table.c` — `tbl_get` (the **in-slack** edit target) and `tbl_sum` (the + **grow** target); both call `arith_*` cross-TU (so callers live in other TUs). + - `state.c` — data globals `g_config[8]`, `g_name[]` (the **data** atoms for + data-slack / data-grow). + - `weakdef.c` — `feature_level()` defined **weak** (the **weak→strong** target). 
+ - `optional.c` — `opt_helper()`, an archive member **referenced by no one + initially** (so it isn't pulled — the **archive-pull-in** target). + - `app_a.c`, `app_b.c` — two `main`s with overlapping core use, each linking + `app_*.o` + `libcore.a` → executables `app_a`, `app_b` (the multi-output + shape; `table.c` is shared by both). +- Build the fixture once with `--incremental`; capture each app's baseline `nm` + vaddr map, the persisted-state blob, and each app's expected computed status + (run via `test/lib` `exec_target`). +- Prefer targeted runs; redirect output to a file (project rules). + +### 19.3 Acceptance scenarios (each assertion falsifies one claim) + +| Scenario | Action | Assertions | Green at | +|---|---|---|---| +| **A — in-slack edit** | edit `tbl_get`'s body within its slack | `outcome==PATCHED`; whole-program `link_resolve` counter **did not** increment; `nm` diff: **every** symbol keeps its vaddr; `app_b` (a cross-TU caller) runs and its status reflects the edit; `addr2line` correct for an unchanged *and* the edited function; only the changed segment's bytes differ; build-id changed | M2 | +| **B — grow past slack** | grow `tbl_sum` ~10 KB past its slot | `PATCHED`; `tbl_sum` moved to a new vaddr; jump island at its **old** vaddr; **`app_a`'s caller bytes byte-identical** to baseline; both apps run | M3 | +| **C — soundness gate** | c1 add `int g_extra;` to `state.c` · c2 flip `feature_level` weak→strong · c3 edit `table.c` to call `opt_helper` (pulls `optional.c`) · c4 grow `g_config` past `data_slack` | each ⇒ `FELL_BACK_FULL`; binary matches a from-scratch full link | M2 | +| **D — multi-output** | edit `table.c` (shared by both `app_a` and `app_b`) | both images patch or fall back **consistently**; both run | M2 | +| **E — determinism** | compile `table.c` twice | identical `obj_content_id` **and** identical bytes; one-byte edit ⇒ only that atom's id changes | M0 | +| **F — no-op relink** | `replace_input` with a byte-identical object | 
no atoms diff ⇒ image unchanged, near-zero link work | M1 | + +### 19.4 The two gates that define correctness + +**Scenario A's vaddr-stability assertion** (a patch must move nothing it +shouldn't) and **Scenario C** (non-local edits must fall back, never silently +mislink) are the two ways this feature can be wrong. Both must be green before +"done for ELF" is claimed. --- -## 20. Open questions / decisions - -1. **DWARF on in-slack overwrite:** accept that an overwrite that does not move - the atom leaves `.debug_line` byte-identical (free) even if source *line - numbers* shifted within the body — or always re-emit the atom's `.debug_line` - on any body change (correct, slightly slower)? (§13) -2. **Data movement under thunk mode:** v1 forbids moving data (slack + fall - back). Is the slack budget for data atoms tunable per project, or fixed? -3. **GOT convergence trigger:** build M4 only when hot reload needs the shared - cell, or proactively to unify the two paths sooner? (§8.2) -4. **Determinism guarantee strength:** require byte-stable objects (enables - cross-machine dedup) or only content-keyed correctness (§12)? -5. **Persisted-blob lifetime/keying:** the link action id is a build-system - concern (§16) — confirm the boundary: does the build system own the CAS key, - or does the link session? -6. **Mach-O/COFF scope:** confirm ELF-only for v1 (§14); enumerate Mach-O - whole-image structures before committing to patch them. +## 20. Decisions (resolved) + +All six original open questions were investigated against the code and resolved +with the user. Recorded here as the binding design choices; the relevant sections +above were updated to match. + +1. **DWARF (§13) — regenerate the changed TU's debug on any body change.** The + "keep stale `.debug_line`" option is *incoherent* (a body change rewrites the + line mapping regardless of move) and debug is per-CU monolithic, so update is + `O(changed TU)`, not `O(atom)`. 
Per-function CUs for `O(atom)` debug are a + noted future option, not pursued now. +2. **Slack (§6, §8) — separate, tunable code/data budgets.** `code_slack_pct` + modest (code relocates cheaply via the move primitive); `data_slack_pct` + generous (a data-grow forces a full-link fallback since data can't be + thunked). Both default sensibly, overridable via a link option. +3. **GOT timing (§3, §8.2, §18) — defer; thunk-first.** Ship thunk-on-grow for + file-incremental now (zero codegen change). Build the shared indirection-cell + primitive when hot reload is scheduled, designing it then to serve both. + Rationale: hot reload needs its *own* per-function slots regardless + (`doc/HOT_RELOAD.md:34-48,144-157`), thunk-on-grow doesn't advance it, and + neither is implemented — so unifying now is speculative. +4. **Determinism (§12) — lock with a test, keep content-keying.** Objects are + already byte-deterministic (`src/obj/elf/emit.c:298,386,505`); a regression + test locks it (enabling cross-machine dedup), while content/name keying stays + the correctness backbone so any future drift degrades dedup, not correctness. +5. **Persisted-state keying (§16) — build system owns key/storage/lifetime; the + session emits opaque bytes** (+ exposes the blob's content id). Keeps libcfree + IO/CAS-free, matching the driver-only CAS boundary. +6. **Format scope (§14) — ELF first; COFF then Mach-O as follow-on milestones.** + None of the three is fundamentally unsuitable; the difference is how much + format-specific machinery each needs. Formats without an updater yet fall back + to the fast in-process full link.
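Decision 2's code/data split can be sketched as a small placement rule. This is an illustrative sketch under stated assumptions, not the linker's implementation: the names (`ToyAtomKind`, `ToyPlaceAction`, `toy_slot_capacity`, `toy_place`) are hypothetical, slack is assumed to be a percentage of the atom's size with integer rounding, and alignment and the case where no free slot exists for a moved code atom (which would also fall back) are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not the cfree API. */
typedef enum { TOY_ATOM_CODE, TOY_ATOM_DATA } ToyAtomKind;

typedef enum {
  TOY_PLACE_OVERWRITE_IN_PLACE, /* new bytes fit in the atom's reserved slot */
  TOY_PLACE_MOVE_WITH_THUNK,    /* code atom relocates; island at old address */
  TOY_PLACE_FALL_BACK_FULL      /* data atom outgrew its slot: full link */
} ToyPlaceAction;

/* Reserved slot size for an atom: its size plus slack_pct percent of it.
   Assumption: slack is computed per atom and rounded down. */
static uint64_t toy_slot_capacity(uint64_t atom_size, uint32_t slack_pct) {
  /* Guard the multiplication below against 64-bit overflow. */
  assert(slack_pct == 0 || atom_size <= UINT64_MAX / slack_pct);
  return atom_size + (atom_size * slack_pct) / 100;
}

/* Choose what to do with an edited atom given its slot and new size. */
static ToyPlaceAction toy_place(ToyAtomKind kind, uint64_t slot_capacity,
                                uint64_t new_size) {
  if (new_size <= slot_capacity) return TOY_PLACE_OVERWRITE_IN_PLACE;
  /* Outgrew the slot: code can move behind a thunk, data cannot. */
  return kind == TOY_ATOM_CODE ? TOY_PLACE_MOVE_WITH_THUNK
                               : TOY_PLACE_FALL_BACK_FULL;
}
```

The asymmetry in the last branch is why the data budget is the more generous one: a larger `data_slack_pct` makes the fallback branch rarer, while a code atom that misses its slot still stays on the patch path.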