kit

git clone https://git.ryansepassi.com/git/kit.git

commit 1806e4076ddd06706caa95e512c54844869694b9
parent 9e7fedaaf62629d397c972e56bcd90b25da3f732
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sat, 30 May 2026 17:46:56 -0700

test+doc: correct rv64 hostas-cross root cause (cc -S infidelity, not emulation)

Debugging the rv64 cross-exec hang showed it is NOT an emulation problem: a
minimal clang rv64 static exe and the DIRECT cfree `cc -c` object both run
correctly under the same podman qemu-riscv64; only the `cc -S | as` round-trip
hangs. Root cause: rv64 has no symbolizer (no ArchAsmOps), so `cc -S` emits the
call as `auipc ra,0x0; jalr ra,0(ra)` with R_RISCV_CALL unsymbolized — it calls
itself — and branches keep numeric targets (`j 0x90`). Correct the harness
header and doc/ASM_ROUNDTRIP_TESTING.md accordingly, and record that x64 now
assembles 312/312 (272/312 exec; data round-trip backlog). Both are the
"x64/rv64 cc -S round-trip" backlog; no code change here.

Diffstat:
M doc/ASM_ROUNDTRIP_TESTING.md | 30 ++++++++++++++++++------------
A doc/INCREMENTAL_OBJLINK.md | 604 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M test/asm/hostas_cross.sh | 14 ++++++++++----
3 files changed, 632 insertions(+), 16 deletions(-)

diff --git a/doc/ASM_ROUNDTRIP_TESTING.md b/doc/ASM_ROUNDTRIP_TESTING.md
@@ -250,18 +250,24 @@ downgrades to SKIP instead of hanging). Status:
 
 - **aarch64-linux**: green end-to-end (cfree-as 312/0, clang-as 312/0) — podman
   runs arm64 natively in its VM, so it's fast and the primary verified target.
-- **x86_64-linux**: SKIPS on the x64 `cc -S` symbolizer gap — x64 emits numeric
-  branch targets (`jmp 0x77`) the x64 `as` can't reassemble. The aarch64
-  symbolizer (intra-section branch-target label synthesis in
-  `src/api/asm_emit.c`, and the relocation-operand syntax via
-  `ArchAsmOps.reloc_operand`) needs x64 implementations — `is_local_branch`-style
-  recognition for `jmp`/`jcc`, plus an x64 `reloc_operand` table
-  (`sym(%rip)`/`@PLT`/`@GOTPCREL`). Tracked.
-- **riscv64-linux**: `cc -S | cfree as | cfree ld -static` works; SKIPS where
-  riscv64 user-mode emulation is unavailable or too slow/wedged to pass the exec
-  smoke (e.g. the macOS/arm64 dev host's podman riscv64 path hangs on the
-  cfree-built static ELF even though it runs a clang-built one — likely a
-  cfree-rv64-ELF-under-qemu-user issue to chase separately).
+- **x86_64-linux**: the x64 `cc -S` symbolizer landed (the aarch64 symbolizer is
+  now arch-generalized — `ArchAsmOps.is_local_branch` for `jmp`/`jcc`, a x64
+  `reloc_operand` table for `sym(%rip)`/bare-`@PLT`/`@GOTPCREL` with a +4 rel32
+  addend bias, and operand-driven RIP surgery), so the whole corpus
+  **re-assembles 312/312** via both cfree-as and clang. Cross-EXEC is **272/312**:
+  ~23 cases (switch/jump tables, global/array/fp data, varargs) lose fidelity in
+  the x64 cc -S **data** round-trip — confirmed cc -S infidelity, since the
+  DIRECT `cc -c` object executes correctly. That data backlog is the remaining
+  x64 work. Opt-in until it closes.
+- **riscv64-linux**: assembles, but cross-EXEC **hangs** — NOT emulation (a
+  minimal clang rv64 static exe AND the DIRECT cfree `cc -c` object both run
+  correctly under the same qemu-riscv64; only the `cc -S | as` round-trip hangs).
+  Root cause: rv64 has **no symbolizer** (no `ArchAsmOps`), so `cc -S` emits the
+  call as `auipc ra,0x0; jalr ra,0(ra)` with the `R_RISCV_CALL` reloc
+  unsymbolized — it calls itself — and branches keep numeric targets (`j 0x90`).
+  Needs an rv64 `ArchAsmOps`: `is_local_branch` (j/beq/bne/...) and a
+  `reloc_operand` for the RISC-V `%pcrel_hi`/`%pcrel_lo`/`%hi`/`%lo`/`call`
+  syntax (the `%pcrel_lo` label-pairing makes this the hardest of the three).
 
 Override the matrix with `CFREE_HOSTAS_CROSS_TARGETS="tag:triple ..."`, the
 exec-smoke cap with `CFREE_HOSTAS_EXEC_TIMEOUT=<secs>`, and per-arch images with
diff --git a/doc/INCREMENTAL_OBJLINK.md b/doc/INCREMENTAL_OBJLINK.md
@@ -0,0 +1,604 @@
+# File-based incremental linking — obj + link internals
+
+Status: design draft. Scope: the `src/obj` and `src/link` substrate that lets a
+file-based rebuild patch a prior image instead of relinking from scratch, plus
+the public interface a build system consumes. This is **distinct from** the two
+existing incremental docs and is sequenced under neither:
+
+- `doc/INCREMENTAL_LINK.md` — append-only growth of a *live JIT image* for
+  `cfree dbg`. In-process, never patches existing code. We reuse its machinery
+  (append cursors, durable reloc records) but target on-disk rebuilds.
+- `doc/HOT_RELOAD.md` — replacing a *running* function body in a live process.
+  Shares the "indirection cell" idea (see §8) but is a different consumer.
+
+The build graph, compile cache, dependency scanning, and daemon/watch modes are
+**out of scope** here — they live in the separate build-system plan and consume
+the interface in §16. This document is deliberately about the linker and object
+internals those layers stand on.
+
+---
+
+## 1. Scope & non-goals
+
+**In scope.** Make the obj/link layers able to:
+- give every object and every function/data *atom* a stable **content identity**;
+- persist a prior link's placement + relocation state as side-band data;
+- on a changed input, **re-resolve and patch only the changed atoms**, keeping
+  every unchanged address byte-stable;
+- detect when a change is *not* provably local and **fall back to a full link**;
+- expose all of the above through a small public API.
+
+**Non-goals (here).**
+- Build graph / DAG, compile cache, header-dependency scanning, `cfree serve`
+  daemon, watch mode — separate build-system plan (consumes §16).
+- Cross-TU / whole-program optimization (ThinLTO-style). Incremental link is a
+  `-O0`/`-O1` *dev* feature; release builds always full-link, clean.
+- Reclaiming dead patch space within a session (we recycle via a free-list but
+  do not compact; a clean link reclaims).
+- Mach-O / COFF in-place patching in v1 (ELF first — see §14).
+
+---
+
+## 2. Target & cost model
+
+**"Instant" defined.** After editing one translation unit in a project of *N*
+TUs, the *link* cost should be `O(changed atoms + their relocations)`, not
+`O(whole program)`. Compile cost is the build system's problem (cache); this doc
+makes the *link* incremental.
+
+**Where link time goes today.** `link_resolve` (`src/link/link_layout.c:1212`)
+is six whole-program phases:
+
+| Phase | Function | Cost |
+|---|---|---|
+| 1 Symbol resolve | `link_resolve_symbols` (`link_resolve.c:228`) | `O(Σ symbols)` — one global `SymHash` |
+| 1b GC | `link_gc_compute` | `O(sections + relocs)` BFS, no delta-marking |
+| 2 Layout | `link_layout_sections` (`link_layout.c:209`) | `O(total_kept)`; **any size change shifts all downstream vaddrs** (`link_layout.c:340-348`, no slack) |
+| 2c Bytes | `link_emit_segment_bytes` (`link_layout.c:1050`) | `O(Σ bytes)` into one monolithic per-segment buffer |
+| 3 Vaddr | `link_assign_symbol_vaddrs` (`link_reloc_layout.c:40`) | `O(Σ symbols)` |
+| 4 Relocs | `link_emit_relocations` (`link_reloc_layout.c:1227`) | `O(Σ relocs)` |
+| 6 Emit + id | format emitter + `link_image_id_compute` (`link_image_id.c:31`) | `O(output)`; FNV-1a over **all** segment bytes *and* vaddrs |
+
+Plus: `link_resolve_at`/`link_resolve_extend` are panic stubs
+(`link.c:629,638`); the GOT is one exactly-sized segment placed after everything
+(`link_reloc_layout.c:710-748`); relocations are applied **destructively** into
+`segment_bytes` at emit (`src/obj/elf/link.c:318-470`) — but the
+`LinkRelocApply` records that *produce* those writes are preserved as data first
+(invariant, internal `src/link/link.h:234-246`).
+
+**Benchmark (true shape).** `tmp/projects/lua`: 35 `.c` files; the Makefile
+compiles **32 objects** (CORE_O=20 + LIB_O=12) into `liblua.a`, then links **two**
+executables — `lua` and `luac` — that share the archive. So the substrate must
+model (a) archive members as link inputs and (b) one edited TU fanning out to
+multiple final images. `sqlite-amalg` (1 huge TU) and `yyjson` (1 TU) exercise
+the single-TU degenerate case.
+
+---
+
+## 3. Design principles
+
+1. **Provable locality, else fall back.** Reuse is correct only when the change
+   cannot alter symbol resolution (mold's cascading-effects argument). The full
+   link is always available and always correct; the incremental path is an
+   accelerator gated on a soundness check (§7.3). A correct-but-slow result
+   always beats a fast-but-wrong one.
+2. **Address stability is the bedrock.** Once a vaddr is published it never
+   moves. Unchanged atoms keep their bytes *and* addresses, so their relocations
+   never reapply — this is what makes a patch `O(change)`. Enforced by
+   overwrite-in-slack / append-to-free-slot, **never compact**.
+3. **Content-hash keying, not transient IDs.** `LinkInputId`/`LinkSymId` are
+   stable only *in-process* (`link.h:240-241`); a file rebuild allocates a fresh
+   `Linker`. So persisted state is keyed by **content hashes** and **symbol
+   names**, never by re-derived IDs (§10). This makes determinism a dedup
+   *nicety*, not a correctness *requirement*.
+4. **Relocation location is relative, target is symbolic.** Persist a reloc as
+   `(atom, offset-within-atom, kind, target-name, addend)`. Derive the absolute
+   write address and target address from *current* placements at apply time.
+   Moving an atom then needs **zero reloc rewriting** — placements change, the
+   reloc re-derives. (Closes the "rewrite `write_vaddr` on move" hazard.)
+5. **The move-on-grow primitive is swappable.** Everything else (atoms, slack,
+   free-list, persisted session, soundness gate) is independent of *how a caller
+   reaches a moved callee*. Ship **thunk-on-grow** first (no codegen change),
+   converge on **GOT-cell** later to share one mechanism with hot reload (§8).
+6. **Frontend-agnostic.** All work attaches at the shared `ObjBuilder` boundary
+   (`obj_finalize`, `src/api/compile.c:356`). C, Toy, asm, and WASM all benefit
+   with no per-frontend code beyond a tiny capability (§15).
+7. **Project rules.** No VLAs; no global state (everything hangs off
+   `Compiler`/`LinkSession`); multi-arch/multi-format behind the existing
+   `ArchImpl`/`ObjFormatImpl` vtables; determinism preserved on the full-link
+   path.
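
Editorial sketch (not part of the commit): principle 4 above can be illustrated with a few lines of C. The `Demo*` types and functions are hypothetical stand-ins, not cfree APIs, and the value computed is the generic ELF PC-relative form `S + A - P`, chosen only as an example of a reloc kind. The point is that the reloc record stores no absolute address, so moving either the containing atom or the target changes the result without touching the record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT the real cfree types. */
typedef struct {
  const char *name; /* symbol name of the placed atom */
  uint64_t vaddr;   /* current placement */
} DemoPlace;

typedef struct {
  uint32_t atom;      /* index of the atom that contains the reloc site */
  uint32_t offset;    /* offset of the site within that atom */
  const char *target; /* symbolic target: a name, never an address */
  int64_t addend;
} DemoReloc;

/* Find the current vaddr of `name`; returns 1 on success, 0 if unknown. */
static int demo_lookup(const DemoPlace *p, size_t n, const char *name,
                       uint64_t *out) {
  for (size_t i = 0; i < n; i++) {
    if (strcmp(p[i].name, name) == 0) {
      *out = p[i].vaddr;
      return 1;
    }
  }
  return 0;
}

/* Absolute write address: always derived, never stored. */
static uint64_t demo_write_addr(const DemoPlace *places, const DemoReloc *r) {
  return places[r->atom].vaddr + r->offset;
}

/* PC-relative value S + A - P, computed from *current* placements. */
static int demo_pcrel_value(const DemoPlace *places, size_t n,
                            const DemoReloc *r, int64_t *out) {
  uint64_t s;
  if (!demo_lookup(places, n, r->target, &s)) return 0;
  *out = (int64_t)(s + (uint64_t)r->addend - demo_write_addr(places, r));
  return 1;
}
```

With `main` at `0x1000`, `helper` at `0x2000`, and a reloc at offset 8 in `main`, the value is `0xFF8`; after `helper` moves to `0x3000` the same unmodified record yields `0x1FF8`.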
+
+---
+
+## 4. What exists vs what is new
+
+| Need | Status | Where |
+|---|---|---|
+| Durable, non-destructive reloc records | **exists** | `LinkRelocApply`, internal `link.h:234-246`, `link_internal.h:129` |
+| Stable IDs *within a link* | **exists** (in-process only) | `link.h:240-241` |
+| Per-input id translation | **exists** | `InputMap`, `link_internal.h:21` |
+| Atom-level GC granularity | **exists** | `InputMap.section_atom_*`; atoms placed individually `link_layout.c:282-284` |
+| Append cursors + reserved slack | **exists (JIT only)** | `link_jit.c:111-114`; **panics** on exhaustion `link_jit.c:1080` |
+| Apply one reloc to a live mapping | **exists (JIT)** | `cfree_jit_append_obj` path, `reloc_apply.c` |
+| AArch64 call-stub template | **exists (JIT only, off for static exe)** | `link_layout_jit_stubs` `link_reloc_layout.c:429` |
+| GOT slot machinery | **exists, but only for GOT-reloc kinds** | `link_layout_got` `link_reloc_layout.c:654`; `reloc_uses_got` `:376` |
+| BLAKE2b CAS blob/tree store | **exists** | `driver/dist`: `cas.c`/`blob.c`/`tree.c`, `DIST_BLAKE2B_LEN=32` |
+| Dependency iteration (C includes) | **exists** | `cfree_dep_iter_new/next` `src/api/compile.c:417-462`; `cc_dep_finish` `cc.c:2121` |
+| `LinkSession` type | **new** (only sketched in docs; today bare fields on `CfreeJit`) | — |
+| `link_resolve_extend` | **new** (panic stub) | `link.c:638` |
+| Per-atom slack / free-list / overwrite / grow-relocate | **new** | — |
+| Fall-back-instead-of-panic discipline | **new** (JIT preflight panics) | — |
+| Object/atom content identity | **new** | — |
+| Per-atom reloc & symbol indices | **new** (flat scans today: `obj_reloc_count` `obj.c:831`, `obj_symbol_find` `obj.c:528`) | — |
+| Incremental (per-segment) build-id | **new** (FNV-1a is whole-image) | `link_image_id.c:31` |
+| Move-on-grow primitive (thunk / GOT) | **new** (direct `R_AARCH64_CALL26` today) | — |
+
+The honest summary: durable relocs, stable in-process IDs, atom GC, and the JIT
+append *placement* are reusable; **everything that makes a relink incremental —
+content identity, slack/free-list, overwrite/grow, persistence, the soundness
+gate, graceful fallback, and the move primitive — is net-new code.**
+
+---
+
+## 5. The atom model (obj side)
+
+The minimal relocatable unit is an **atom**: one function or one data object.
+cfree already carries atoms for GC; incremental link promotes them to the
+patch unit.
+
+- Under `--incremental` (dev mode), frontends emit one section per
+  function/global (a `-ffunction-sections`/`-fdata-sections` equivalent) so each
+  atom is independently placeable. cfree already lays out kept atoms as
+  individual `LinkSection`s (`link_layout.c:282-284`).
+- Each atom gets a **content id**: BLAKE2b over its canonical form —
+  `bytes || align || flags || canonical(relocs)`, where `canonical(relocs)`
+  encodes each reloc as `(offset-within-atom, kind, target-name, addend)`.
+  Target is the *name*, never a transient id (principle 3).
+- The **object content id** is BLAKE2b over the atom-id list plus object-level
+  metadata (format, arch, ext flags). Two byte-identical compiles → identical
+  object id (modulo the determinism audit; see §12).
+
+### 5.1 obj internals additions
+
+```c
+/* src/obj/obj.h — new */
+typedef struct ObjAtomId { u32 v; } ObjAtomId; /* 0 = none */
+
+/* Deterministic content identity over the canonical form above. */
+void obj_atom_content_id(ObjBuilder*, ObjAtomId, u8 out[DIST_BLAKE2B_LEN]);
+void obj_content_id(ObjBuilder*, u8 out[DIST_BLAKE2B_LEN]);
+
+/* O(1) per-atom lookups (today both are linear scans). */
+const Reloc* obj_atom_reloc_first(ObjBuilder*, ObjAtomId, ObjRelocCursor*);
+ObjSymId obj_symbol_by_name(ObjBuilder*, Sym name); /* hash, not O(nsyms) */
+```
+
+Required obj changes:
+1. **Per-atom reloc index.** Today `obj_reloc_count` scans the flat reloc table
+   (`obj.c:831`). Add a per-atom reloc list so "relocs touching atom A" is `O(1)`.
+2. **Symbol-by-name hash.** `obj_symbol_find` is `O(nsyms)` (`obj.c:528`). Add a
+   name→`ObjSymId` hash on the builder.
+3. **Deterministic, lossless serialize/deserialize.** A cached/persisted object
+   must rehydrate identically: atoms, relocs, COMDAT groups, debug sections, and
+   format `ext_type/ext_flags` (round-trip-safe today per survey) all preserved.
+   This is the cache-value contract the build system relies on (§16).
+4. `obj_finalize` (`obj.c`, currently reserved/empty) is the natural place to
+   compute and memoize the content ids once a TU is built.
+
+---
+
+## 6. The `LinkSession` (link side)
+
+A new type that owns the state that must outlive one `link_resolve` and can be
+persisted. It generalizes the per-segment cursor/slack fields currently inlined
+in `struct CfreeJit` (`link_jit.c:92-114`) and adds overwrite, free-list, grow,
+and graceful fallback.
+
+```c
+/* src/link/link_session.h — new. Hangs off Compiler; no global state. */
+typedef struct LinkFreeList LinkFreeList; /* gold-style two-level free list */
+
+typedef struct LinkAtomPlace { /* one per placed atom */
+  u8 content_id[DIST_BLAKE2B_LEN]; /* §5 atom content id (the key) */
+  u64 vaddr; /* published address — STABLE */
+  u64 file_offset;
+  u32 size;
+  u32 capacity; /* size + reserved slack */
+  u32 seg_bucket; /* SEG_RX/R/RW/TLS */
+  /* relocs stored relative: (offset-within-atom, kind, target_name, addend) */
+} LinkAtomPlace;
+
+typedef struct LinkSession {
+  Compiler* c;
+  Linker* l; /* stable LinkInputId -> ObjBuilder* */
+  LinkImage* img; /* now MUTABLE-by-patch */
+  u64 cursor[SEG_NBUCKETS]; /* append cursor per class (from JIT) */
+  u64 limit[SEG_NBUCKETS]; /* reserved ceiling per class */
+  LinkFreeList free[SEG_NBUCKETS]; /* vacated slots, first-fit reuse */
+  u32 slack_pct; /* per-atom reserve, default 10% */
+  /* atom placement table, keyed by content_id; the persisted core (§10) */
+  LinkAtomPlace* atoms; u32 natoms;
+} LinkSession;
+
+/* Fixed-size transaction watermark (no VLA). */
+typedef struct LinkPatchTxn {
+  u32 old_natoms, old_nsyms, old_nsections, old_nrelocs;
+  u64 old_cursor[SEG_NBUCKETS];
+  /* free-list undo log handle */
+} LinkPatchTxn;
+```
+
+`LinkImage` stays the read-side view for inspection/DWARF/emit, but its symbol,
+section, and reloc vectors become append/overwrite-capable (they already grow on
+the JIT path).
+
+---
+
+## 7. Incremental resolve & the soundness gate
+
+Implement `link_resolve_extend` (`link.c:638`, panic stub today) in two stages.
+
+### 7.1 Stage A — append-only (sound subset, first milestone)
+New inputs that only *add* definitions, resolving against the existing image +
+external resolver. This is exactly the JIT append model and reuses its cursor +
+slack *placement* (`link_jit.c` append path) — but writing to a file image and,
+critically, **falling back instead of panicking** on exhaustion.
+
+### 7.2 Stage B — patch changed atoms
+For a changed input, diff its atoms (by content id) against the persisted
+placement table:
+- **Unchanged atom** (id matches): keep placement, keep bytes, **do nothing** —
+  its relocations are never revisited.
+- **Changed atom, fits capacity**: overwrite bytes in place; reapply only *its*
+  relocations (re-derived from current placements, §9).
+- **Changed atom, grows past capacity**: allocate a new slot (free-list, else
+  bump `cursor[seg]`), return the old slot to the free-list, write bytes, and
+  make callers reach it via the move primitive (§8). Reapply its own relocations.
+- **New atom**: place at cursor/free-list; resolve & apply its relocations.
+- **Removed atom**: return its slot to the free-list; drop its symbols.
+
+### 7.3 The soundness gate — fall back to full link when
+Compute the changed object's **interface** = { defined global names + bindings,
+COMMON sizes/aligns, set of undefs }. The edit is *local* only if the interface
+is identical to the persisted one **and** no archive pull-in changes. Otherwise
+fall back. Triggers (grounded in `link_resolve.c`):
+1. **Symbol-set / binding change** — added/removed global, weak↔strong flip:
+   changes global resolution (`bind_strength`) and which archive members pull.
+2. **Archive pull-in change** — a new undef now selects a `.a` member that was
+   not in the prior link (`link_ingest_archives` is greedy single-pass).
+3. **COMDAT ownership** — COFF SELECTANY keeps the *earlier* winner
+   (`link_resolve.c:308-323`). If the edited TU is the winner and its group body
+   changed, patch the shared body; if it is a loser, no-op; if ownership would
+   flip, fall back. COMMON size/align merge (`:288-303`) changing → fall back.
+4. **TLS size change** — boundary syms `__tdata_start/end`, `__tbss_size`
+   (`link_layout.c`) shift if any TLS section resizes → fall back.
+5. **Import-set change** (PLT/`.got.plt`/dynamic) → re-synthesize via
+   `fmt->layout_dyn` → fall back.
+6. **Slack/free-list exhaustion** in any segment → fall back.
+7. **Layout-affecting flags / linker script / `--gc-sections` / `-r` / LTO** →
+   full link (GC liveness is whole-graph; incompatible, as in gold).
+
+On fallback, **discard the half-mutated session** (the `LinkPatchTxn` watermark
+rolls back `cursor[]`/free-list/append counts) and run a normal full link, which
+because objects are resident is already far cheaper than a cold `cfree ld`.
+
+The JIT's duplicate-global *preflight* is the precedent for the gate — but it
+**panics**; converting "detect non-local" into "roll back + full link" is new
+control flow.
+
+---
+
+## 8. Placement, slack, and the move-on-grow primitive
+
+**Slack.** Today sections are contiguous with only alignment padding
+(`link_layout.c:340-348`). Under `--incremental`, reserve per-atom slack
+(`slack_pct`, gold's `--incremental-patch=n` analog) so overwrite-in-place is the
+common case. A two-level free-list (one of free file blocks, one per segment
+bucket) recycles vacated slots, first-fit.
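
Editorial sketch (not part of the commit): the slack reservation and the Stage B placement decision from §7.2 can be written down in a few lines. Everything named `demo_*` is hypothetical. The free list here is a single flat array rather than the two-level structure the design describes, and rounding capacity up to the atom's alignment is an assumption of this sketch rather than something the document specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Capacity = size plus slack_pct percent (rounded up), then rounded up to
 * `align`. Assumes align is a nonzero power of two and the result fits u32. */
static uint32_t demo_capacity(uint32_t size, uint32_t slack_pct,
                              uint32_t align) {
  uint64_t want = (uint64_t)size + ((uint64_t)size * slack_pct + 99) / 100;
  uint64_t mask = (uint64_t)align - 1;
  return (uint32_t)((want + mask) & ~mask);
}

/* One vacated slot. A flat array stands in for the two-level free list. */
typedef struct {
  uint64_t vaddr;
  uint32_t size;
} DemoFreeSlot;

/* First-fit: index of the first slot that can hold `need` bytes, else -1. */
static int demo_first_fit(const DemoFreeSlot *slots, size_t n, uint32_t need) {
  for (size_t i = 0; i < n; i++) {
    if (slots[i].size >= need) return (int)i;
  }
  return -1;
}

typedef enum { DEMO_KEEP, DEMO_OVERWRITE, DEMO_MOVE } DemoAction;

/* Stage B decision for an atom that already has a placement. */
static DemoAction demo_classify(int same_content_id, uint32_t new_size,
                                uint32_t capacity) {
  if (same_content_id) return DEMO_KEEP;  /* untouched, relocs not revisited */
  if (new_size <= capacity) return DEMO_OVERWRITE; /* patch inside the slack */
  return DEMO_MOVE; /* needs a new slot and the move primitive (or fallback) */
}
```

For a 100-byte atom with the default 10% slack and 16-byte alignment, `demo_capacity` yields 112, so the body can grow by up to 12 bytes before `demo_classify` reports `DEMO_MOVE`.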
+
+**The move primitive — swappable.** When an atom moves, callers must still reach
+it without their bytes changing. Abstract this as one hook with two
+implementations; the rest of the design is identical either way.
+
+```c
+/* src/link/link_move.h — the only A/B difference */
+typedef struct LinkMoveOps {
+  /* make all references to `atom` reach its NEW vaddr, without touching callers */
+  void (*atom_moved)(LinkSession*, LinkAtomPlace* atom, u64 new_vaddr);
+} LinkMoveOps;
+```
+
+### 8.1 Thunk-on-grow (ship first) — `LinkMoveOps` = thunk
+Calls stay **direct** (`R_AARCH64_CALL26`, x64 `PLT32`, RV `CALL` — what codegen
+emits today; cross-TU is direct per `src/obj/macho/link.c:537`,
+`src/obj/elf/link.c:251`). On a move, leave a **jump island** at the atom's *old*
+slot pointing to the new location. Callers branch to the old address as before →
+hit the island → jump on. Properties:
+- **No codegen change.** Pure linker. Reuses the `link_layout_jit_stubs`
+  (`link_reloc_layout.c:429`) island shape as a template (per arch: aa64 jit/iplt
+  stub, x64 iplt stub `src/obj/x64/link.c:40`, rv64 trampoline).
+- **Reachability is free**: callers already branched directly to the old slot,
+  so the island there is in range by construction.
+- **Tax**: an extra jump *only for functions that moved* (one island per
+  function that ever grew, re-pointed on subsequent grows). Unmoved functions
+  pay nothing.
+- **Data caveat**: a thunk redirects code only. A grown *global* that must move
+  cannot be thunked. v1 rule: give data atoms generous slack and **fall back to
+  full link if a data atom outgrows its capacity** (never move data). This keeps
+  the thunk path entirely codegen-free.
+
+### 8.2 GOT-cell (convergence target) — `LinkMoveOps` = got
+Under `--incremental`, codegen emits cross-unit calls (and movable-data loads)
+through a GOT cell (aa64 `ADRP+LDR+BLR`, x64 `call *cell(%rip)`, rv64
+`auipc+ld+jalr`). A move updates **one** cell. Properties:
+- **Per-arch codegen change** (instruction selection + reloc kinds) for calls
+  *and* data — `reloc_uses_got` (`link_reloc_layout.c:376`) currently lists only
+  GOT-relative kinds, and `link_layout_got` only allocates slots for those.
+- **Tax**: one extra indirect load on *every* cross-unit reference, uniformly.
+- **GOT growth**: a new cross-unit target adds a slot, but the GOT is a single
+  exactly-sized segment at the image end (`link_reloc_layout.c:710-748`). Needs
+  **reserved GOT slack + a GOT free-list**, with fall-back on exhaustion —
+  otherwise adding a slot moves the GOT and breaks stability for existing slots.
+- **Strategic upside**: it is the *same* primitive `doc/HOT_RELOAD.md §7` assumes
+  ("one slot per function changes; call sites not patched"). One GOT-cell-update
+  mechanism would then serve JIT hot reload *and* file incremental link.
+
+**Why thunk-first.** Thunk taxes only what moves and needs zero codegen, so it
+proves the slack/free-list/persistence/soundness machinery end-to-end fastest.
+The free-list, slack, session, and gate are reused verbatim when we later swap in
+GOT cells; only `LinkMoveOps` changes. Converge on GOT when hot reload needs it.
+
+---
+
+## 9. Relocation reuse & application
+
+`LinkRelocApply` records are durable data, never burned into bytes before emit
+(invariant `link.h:234-246`). Incremental link leans on this hard.
+
+- **Relative + symbolic form.** Persist each reloc as `(atom_content_id,
+  offset_within_atom, kind, target_name, addend)`. At apply time the absolute
+  write address is `atom.vaddr + offset_within_atom` and the target address is
+  the *current* placement of `target_name`. **An atom that moves needs no reloc
+  rewriting** — both addresses re-derive from current placements (principle 4).
+- **Reapply only the changed atom's relocs**, found via the new per-atom index
+  (§5.1). Unchanged atoms' relocs are never touched.
+- **Apply path.** File emit currently writes relocations destructively into
+  `segment_bytes` at emit (`src/obj/elf/link.c:318-470`). For patching we apply a
+  single atom's relocs into its (possibly newly placed) bytes using the same
+  `reloc_apply.c` kind dispatch the JIT uses, then re-emit only the changed
+  segment ranges. Per-arch reloc kinds already flow through `reloc_apply.c`.
+
+---
+
+## 10. Persisted incremental state
+
+Side-band, content-addressed — **not** gold's ELF-embedded `.gnu_incremental_*`
+sections (those are ELF-only; we are multi-format). Store one blob in the
+existing `driver/dist` CAS (`dist_cas_put_blob`, BLAKE2b) keyed by the link
+action id (a build-system concern; §16). The blob records, per input and per
+atom:
+- object content id + atom content ids (the diff keys);
+- `LinkAtomPlace` table: vaddr / file_offset / size / capacity / bucket;
+- symbol → vaddr bindings, keyed by **name**;
+- relocations in the relative+symbolic form of §9;
+- free-list state and per-segment cursors/limits.
+
+Because everything is content/name-keyed, reloading does **not** depend on a
+fresh `Linker` re-deriving identical `LinkSymId`s. The determinism audit (§12)
+becomes a dedup optimization, not a correctness gate. We still add a cheap guard:
+on reload, verify each referenced object blob's BLAKE2b matches its recorded id
+before trusting it (defends against a torn/garbage cache entry).
+
+---
+
+## 11. Image identity / build-id
+
+`link_image_id_compute` (`link_image_id.c:31`) is FNV-1a streamed over **every**
+segment's vaddr + file_size + bytes — `O(image)` and not incrementally
+updatable. For patching, compute a **per-segment subhash** and combine them
+(Merkle-style) into the image id, so a patch re-hashes only changed segments.
+Note this hash is FNV-1a and independent of the BLAKE2b used for content/CAS;
+keep them distinct.
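
Editorial sketch (not part of the commit): one way the per-segment subhash in §11 could be structured, using 64-bit FNV-1a with its standard offset basis and prime. The `Demo*` names, the little-endian encoding of `vaddr` and length, and the choice to combine subhashes by hashing them in segment order are all assumptions of this sketch; the real `link_image_id_compute` may differ in every one of those details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FNV_OFFSET 14695981039346656037ULL /* 64-bit FNV offset basis */
#define DEMO_FNV_PRIME 1099511628211ULL         /* 64-bit FNV prime */

/* Streamed FNV-1a: xor in each byte, then multiply by the prime. */
static uint64_t demo_fnv1a(uint64_t h, const void *data, size_t len) {
  const unsigned char *p = data;
  for (size_t i = 0; i < len; i++) {
    h ^= p[i];
    h *= DEMO_FNV_PRIME;
  }
  return h;
}

/* Feed a u64 as 8 little-endian bytes so the result is host-independent. */
static uint64_t demo_fnv1a_u64(uint64_t h, uint64_t v) {
  unsigned char b[8];
  for (int i = 0; i < 8; i++) b[i] = (unsigned char)(v >> (8 * i));
  return demo_fnv1a(h, b, sizeof b);
}

/* Hypothetical segment record with a cached subhash. */
typedef struct {
  uint64_t vaddr;
  const unsigned char *bytes;
  size_t len;
  uint64_t subhash;
} DemoSeg;

/* Recompute one segment's subhash over vaddr, length, and bytes. */
static void demo_seg_rehash(DemoSeg *s) {
  uint64_t h = DEMO_FNV_OFFSET;
  h = demo_fnv1a_u64(h, s->vaddr);
  h = demo_fnv1a_u64(h, (uint64_t)s->len);
  s->subhash = demo_fnv1a(h, s->bytes, s->len);
}

/* Image id = hash of the cached subhashes; cost is O(segments), not O(bytes). */
static uint64_t demo_image_id(const DemoSeg *segs, size_t n) {
  uint64_t h = DEMO_FNV_OFFSET;
  for (size_t i = 0; i < n; i++) h = demo_fnv1a_u64(h, segs[i].subhash);
  return h;
}
```

After a patch, only the touched segments call `demo_seg_rehash`; `demo_image_id` then folds the cached subhashes without rereading unchanged segment bytes.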
+
+Consequence (acceptable, document loudly): an incremental output is **not
+byte-reproducible against a from-scratch full link** of the same sources — slack
+padding and (under GOT mode) indirection differ. Release builds (`--incremental`
+off) are canonical and reproducible.
+
+---
+
+## 12. Address-stability & determinism invariants
+
+- **Stability (falsifiable):** after a patch, `nm`/`addr2line` on an *unchanged*
+  symbol must return the identical vaddr as before. Enforced by
+  overwrite-in-slack / append-to-free-slot, never compact.
+- **Determinism audit (prerequisite for dedup, not correctness):** confirm that
+  identical `(source, flags, target)` yields byte-identical objects — audit
+  symbol ordering and `pool_intern` first-access order in obj emit. With
+  content/name keying (§10) a nondeterministic order only costs cache dedup, not
+  a wrong patch; but byte-stability is still wanted so two machines agree.
+- **Reloc re-derivation:** never store an absolute `write_vaddr`; always
+  `atom.vaddr + offset_within_atom` (principle 4).
+
+---
+
+## 13. Debug info (DWARF) consistency
+
+- A moved atom's `.debug_info`/`.debug_line`/`.debug_aranges` address ranges
+  change → reapply that atom's debug relocs (re-derived like §9). Unchanged
+  atoms' debug stays byte-stable because their addresses do.
+- v1 stance: rebuild only the *changed TU's* debug sections, `O(change)`. An
+  in-slack overwrite that does not move the atom leaves addresses (and therefore
+  `.debug_line` byte content) unchanged — free, but see the open question on
+  line-number-only shifts (§20).
+- `addr2line` and `cfree dbg` re-read debug from the patched image. The JIT path
+  invalidates a cached view by generation counter; a file consumer re-reads the
+  file, so the build-id change (§11) is the staleness signal.
+
+---
+
+## 14. Multi-format & multi-arch
+
+- **ELF first.** The atom + slack + move-primitive model is format-agnostic, but
+  Mach-O carries whole-image structures (chained fixups, `LC_DYLD_INFO`,
+  code-signature, `LC_UUID`) that resist in-place patching; enumerate which load
+  commands must be regenerated before attempting Mach-O incremental. COFF later.
+  Persisted state is side-band CAS for all three (§10), so no per-format on-disk
+  incremental metadata.
+- **Per-arch surface is small:** only (a) the move primitive's island/cell shape
+  and (b) the branch-into-island/cell reloc kind. aa64 has the jit-stub shape to
+  reuse; x64 (`src/obj/x64/link.c:40`) and rv64 each have a trampoline shape to
+  adapt. All reloc kinds already dispatch through `reloc_apply.c`.
+- CI exercises the patch path on **ELF/aa64 + ELF/x64** first (per project
+  "narrow test runs"); rv64 and Mach-O/COFF follow.
+
+---
+
+## 15. The frontend contract (shared across all frontends)
+
+All frontends converge to `ObjBuilder` and join the shared path at
+`obj_finalize` (`src/api/compile.c:356`), so the incremental machinery attaches
+once, frontend-agnostically. **Toy, asm, and WASM get incremental link with no
+frontend-specific code.** The "clear expectations" are a small optional
+capability plus four guarantees.
+
+```c
+/* include/cfree/compile.h — optional addition to CfreeFrontendVTable */
+typedef struct CfreeFrontendCaps {
+  const char* frontend_id; /* "c" / "toy" / "asm" / "wasm" */
+  u32 schema_version; /* bump on any codegen/output change */
+  /* report external inputs read this compile (for the build system's key). */
+  CfreeStatus (*report_deps)(CfreeFrontendState*, const CfreeFrontendCompileOptions*,
+                             const CfreeSourceInput*, CfreeDepSink*);
+} CfreeFrontendCaps;
+```
+
+The contract each frontend must honor to be incrementally safe:
+1. **Deterministic output** — identical `(source, flags, target, deps)` ⇒
+   byte-identical `ObjBuilder` (§12).
+2. **Declared dependency set** — report every external input read. C reuses the
+   existing `CfreeDepIter` (`src/api/compile.c:417-462`); asm/Toy/WASM report
+   "none" (single-source TUs).
+3. **Stable, source-derived symbol naming** — no run-varying temp names; atom
+   content ids depend on it (§5).
+4. **Identity + version** — `frontend_id` + `schema_version` salt the
+   build-system key so any frontend change invalidates correctly.
+
+**Per-frontend cost:** C — low (wire `CfreeDepIter` + a version constant). asm,
+Toy, WASM — trivial (no deps; version constant).
+
+**Toy's REPL wrinkle.** Toy's durable module (the existing `commit`/`abort`
+hooks, `lang/toy/compile.c:215-223`) means the REPL path is *not* a pure function
+of source. That path either folds the module snapshot into the input key or opts
+out of caching; Toy's **batch/file** compile conforms like any other frontend.
+
+---
+
+## 16. The interface boundary the build system consumes
+
+The separate build-system plan (build graph, cache, watch) calls only this
+public surface; it never touches `src/link` internals.
+
+```c
+/* include/cfree/object.h */
+CfreeStatus cfree_obj_content_id(CfreeObjBuilder*, uint8_t out[CFREE_BLAKE2B_LEN]);
+
+/* include/cfree/link.h — new incremental session surface */
+typedef enum { CFREE_LINK_PATCHED, CFREE_LINK_FELL_BACK_FULL } CfreeLinkOutcome;
+
+CfreeStatus cfree_link_session_open_incremental(CfreeLinkSession*,
+    const void* persisted, size_t persisted_len); /* NULL = cold */
+CfreeStatus cfree_link_session_replace_input(CfreeLinkSession*, CfreeLinkInputId,
+    CfreeObjBuilder* changed); /* by content */
+CfreeStatus cfree_link_session_patch_emit(CfreeLinkSession*, CfreeWriter* image,
+    CfreeWriter* persisted_out, CfreeLinkOutcome* outcome);
+```
+
+The build system supplies changed objects (it decides *which* via its cache),
+gets back the patched image, the new persisted blob, and — crucially — the
+**outcome** so it knows whether the fast path applied or the link fell back. The
+object content id lets it detect "this TU's object is byte-identical, skip it."
+
+---
+
+## 17. Failure behavior (transactional)
+
+A patch is all-or-nothing from the consumer's view:
+- compile/resolve failure, gate fallback, slack exhaustion, or reloc-apply
+  failure ⇒ the image is unchanged (or a clean full link is produced).
+- Pages/bytes may have been written before a late failure; the `LinkPatchTxn`
+  watermark rolls back `cursor[]`, the free-list undo log, and the
+  atom/symbol/section/reloc counts so no partial result is published.
+
+---
+
+## 18. Implementation sequence
+
+**M0 — atom identity & obj indices (no behavior change).**
+`obj_content_id` / `obj_atom_content_id`, per-atom reloc index, symbol-by-name
+hash, deterministic round-trip + a determinism test. Wire `CfreeFrontendCaps`
+and the contract (C deps via `CfreeDepIter`; trivial for others).
+
+**M1 — `LinkSession` + append-only extend (Stage A).**
+Introduce `LinkSession`, implement `link_resolve_extend` for append-only against
+a file image, reusing JIT cursor/slack *placement* but **falling back, not
+panicking**. Persisted blob round-trips (§10).
+
+**M2 — patch changed atoms in slack (Stage B, no move yet).**
+Per-atom diff, overwrite-in-slack, reapply that atom's relocs (§9), per-segment
+build-id (§11), the soundness gate + transactional rollback (§7.3, §17). Atoms
+that would grow past capacity ⇒ fall back (no move primitive yet).
+
+**M3 — move-on-grow via thunk (`LinkMoveOps` = thunk).**
+Free-list, grow-relocate code atoms, jump islands (reuse jit-stub shape), data
+slack + fall-back-on-data-grow. ELF/aa64 then ELF/x64.
+
+**M4 — converge on GOT-cell (`LinkMoveOps` = got), if/when hot reload needs it.**
+`--incremental` codegen mode for cross-unit calls + movable data, reserved GOT
+slack + free-list. Shares the primitive with `doc/HOT_RELOAD.md`.
+
+Mach-O/COFF and rv64 patching follow M3/M4 per §14.
+
+---
+
+## 19. Test plan (narrow, per-arch, red-green)
+
+Prefer targeted runs; redirect output to a file and read it (project rules).
+
+- **M0:** compile `tmp/projects/lua/src/ltable.c` twice ⇒ identical
+  `obj_content_id` (determinism). Edit one function body ⇒ exactly that atom's
+  content id changes, others stable. aa64 + x64 only.
+- **M1:** initial object + appended object where appended code calls an initial
+  function; appended duplicate-strong-def ⇒ fall back (not panic); unresolved
+  ⇒ transactional, image unchanged.
+- **M2:** build `liblua.a` + `lua`; patch one in-slack function body ⇒ unchanged
+  symbols keep vaddrs (`nm` diff), binary runs (`test/lib` `exec_target`), and
+  `link_resolve` whole-program path was *not* taken (instrument a counter, dump
+  to file). Negative: add a new global ⇒ fall back; weak↔strong flip ⇒ fall back;
+  new archive pull-in ⇒ fall back.
+- **M3:** grow a function past its slack ⇒ it relocates, an island appears at the
+  old slot, callers' bytes are byte-identical, result runs. Grow a *global* past
+  data slack ⇒ fall back. `addr2line` an unchanged function after a patch ⇒
+  correct file:line.
+- **Multi-output:** edit a core TU shared by `lua` and `luac` ⇒ both images
+  patch (or both fall back) consistently.
+
+---
+
+## 20. Open questions / decisions
+
+1. **DWARF on in-slack overwrite:** accept that an overwrite that does not move
+   the atom leaves `.debug_line` byte-identical (free) even if source *line
+   numbers* shifted within the body — or always re-emit the atom's `.debug_line`
+   on any body change (correct, slightly slower)? (§13)
+2. **Data movement under thunk mode:** v1 forbids moving data (slack + fall
+   back). Is the slack budget for data atoms tunable per project, or fixed?
+3. **GOT convergence trigger:** build M4 only when hot reload needs the shared
+   cell, or proactively to unify the two paths sooner? (§8.2)
+4. **Determinism guarantee strength:** require byte-stable objects (enables
+   cross-machine dedup) or only content-keyed correctness (§12)?
+5. **Persisted-blob lifetime/keying:** the link action id is a build-system
+   concern (§16) — confirm the boundary: does the build system own the CAS key,
+   or does the link session?
+6. **Mach-O/COFF scope:** confirm ELF-only for v1 (§14); enumerate Mach-O
+   whole-image structures before committing to patch them.
diff --git a/test/asm/hostas_cross.sh b/test/asm/hostas_cross.sh
@@ -36,10 +36,16 @@
 #   fidelity in the x64 cc -S data round-trip — confirmed cc -S
 #   infidelity (direct `cc -c` executes correctly). Opt-in until
 #   that backlog closes. See doc/ASM_ROUNDTRIP_TESTING.md.
-# - riscv64-linux: `cc -S | cfree as | cfree ld` works; cross-EXEC pends a
-#   healthy riscv64 user-mode emulator — the cfree-built static
-#   rv64 ELF hangs under this host's podman qemu-riscv64 (a
-#   clang-built one runs), so the bounded exec smoke SKIPS it.
+# - riscv64-linux: assembles, but cross-EXEC hangs — the rv64 cc -S round-trip
+#   is unfaithful because rv64 has no symbolizer (ArchAsmOps):
+#   the call emits `auipc ra,0x0; jalr ra,0(ra)` with the
+#   R_RISCV_CALL reloc unsymbolized, so it calls itself (and
+#   branches like `j 0x90` keep numeric targets). NOT an
+#   emulation issue — a minimal clang rv64 static exe and the
+#   DIRECT cfree `cc -c` object both run correctly under the
+#   same qemu-riscv64. Needs an rv64 ArchAsmOps (is_local_branch
+#   for j/beq/...; reloc_operand for %pcrel_hi/%pcrel_lo/call).
+#   The bounded exec smoke SKIPS it until then.
 #
 # Override the matrix with CFREE_HOSTAS_CROSS_TARGETS="tag:triple ..." and the
 # clang-as gate with CFREE_HOSTAS_ENFORCE_CLANG=0 (demote lane B to XFAIL).