Linker (planned work)

This roadmap covers where the kit linker is headed beyond the static and JIT linking it does today. It is dominated by incremental linking: two related but distinct workstreams — append-only growth of a live JIT image (the kit dbg / kit emu consumer) and file-based incremental object linking (the build-system consumer, the "m2" redesign). Both rest on the same linker invariants — address stability, durable non-destructive relocation records, content-keyed reuse — and both fall back to a correct full link whenever a change cannot be proven local. For the linker's current architecture, passes, and invariants see ../LINK.md; for how a resolved image runs in process see ../JIT.md; for the object substrate see ../OBJ.md; for the build-system layer that consumes the file-incremental interface see ../BUILD.md and the distribution CAS in ../DISTRIBUTE.md.

Why incremental, and the shared invariants

The full link is always available and always correct. Incremental linking is an accelerator gated on a soundness check: a correct-but-slow result always beats a fast-but-wrong one. Three invariants hold across both workstreams and must never be violated by any incremental path:

Address stability. Once a runtime/file vaddr is observable it never moves. Unchanged atoms keep their bytes and their addresses, so their relocations are never reapplied; this is what makes a patch cost O(change). Enforced by overwrite-in-slack or append-to-free-slot placement, never by compaction.
Relocations are durable, relative, and symbolic. LinkRelocApply records survive as data and are not burned into bytes before emit. Persist each as (atom, offset-within-atom, kind, target-name, addend); derive the absolute write address and target address from current placements at apply time. An atom that moves then needs zero reloc rewriting.
Content-hash keying, not transient IDs. LinkInputId/LinkSymId are stable only in-process. Persisted state is keyed by content hashes and symbol names, never by re-derived IDs, so determinism is a dedup nicety, not a correctness requirement.
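
The "durable, relative, and symbolic" relocation invariant can be illustrated with a minimal sketch. Every name here (Reloc, Placement, find_placement, apply_reloc, and the two reloc kinds) is hypothetical and chosen for illustration; this is not kit's LinkRelocApply layout. The point it shows is that the record stores only (atom, offset, kind, target-name, addend), and both the write address and the target address are derived from current placements at apply time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; not kit's actual record layout. */
typedef enum { RELOC_ABS64, RELOC_PCREL32 } RelocKind;

typedef struct {
    const char *atom;    /* atom that contains the patch site */
    uint32_t    offset;  /* offset within that atom */
    RelocKind   kind;
    const char *target;  /* symbolic target name, never an address */
    int64_t     addend;
} Reloc;

typedef struct {
    const char *name;
    uint64_t    vaddr;   /* current placement */
    uint8_t    *bytes;   /* backing bytes; NULL if not patchable here */
} Placement;

const Placement *find_placement(const Placement *p, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(p[i].name, name) == 0) return &p[i];
    return NULL;
}

/* Applies one record against the current placements.
 * Returns 0 on success, -1 if unresolved or out of range. */
int apply_reloc(const Reloc *r, const Placement *p, size_t n) {
    assert(r && p);
    const Placement *site = find_placement(p, n, r->atom);
    const Placement *tgt  = find_placement(p, n, r->target);
    if (!site || !tgt || !site->bytes) return -1;

    /* Both addresses are derived now, not stored in the record. */
    uint64_t write_addr = site->vaddr + r->offset;
    uint64_t value      = tgt->vaddr + (uint64_t)r->addend;
    uint8_t *dst        = site->bytes + r->offset;

    switch (r->kind) {
    case RELOC_ABS64:
        memcpy(dst, &value, sizeof value);
        return 0;
    case RELOC_PCREL32: {
        int64_t delta = (int64_t)(value - write_addr);
        if (delta < INT32_MIN || delta > INT32_MAX) return -1;
        int32_t d32 = (int32_t)delta;
        memcpy(dst, &d32, sizeof d32);
        return 0;
    }
    }
    return -1;
}
```

In this sketch, changing a Placement's vaddr and calling apply_reloc again with the same Reloc produces the new displacement; the record itself is never edited.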

Workstream 1 — append-only incremental JIT link

Grow one live KitJit image with additional compiled objects while keeping every previously published runtime address stable. New code may reference old symbols; old debugger surfaces (kit_jit_lookup, kit_jit_addr_to_sym, symbol iteration, breakpoints, PC translation, the JIT debug view) must see new symbols. This is explicitly not hot reload: existing code is never replaced or repatched (see ../DBG.md for the debugger and the separate hot-reload design).

Done (baseline)

The in-process append path is implemented and is the foundation the rest builds on: the kit_jit_publish surface (an append/replace batch driven by a KitLinkSession, reporting a bumped generation); append cursors with reserved RX/R/RW/TLS slack over one contiguous master VA reservation committed page-by-page; transactional rollback of cursors and symbol/section/reloc counts on failure; generation-bumped invalidation of the cached kit_jit_view; symbol resolution against the existing image, the append batch, and the external resolver with duplicate-strong detection; and a dbg REPL that drives compile → append → DWARF refresh with the worker stopped so line-table replacement never races a running thread.

Remaining

Pending source-level breakpoints across appends. Today a b file:line for a file not yet covered stays unresolved until retried. Maintain pending source breakpoint specs and arm them automatically after each append.
Archive reselection on append. v1 resolves a snippet against the already-linked image plus the external resolver only. A later cut can let appended inputs pull fresh archive members, reusing the soundness-gate logic from the file-incremental workstream (below).
kit emu's append consumer. Per-basic-block JIT translation wants to grow a single LinkImage as cold blocks land (see ../EMU.md §6). This is a separate consumer of the same append machinery and lands alongside the emu lifter cut. It is the motivation for the link_resolve_at(Linker*, base_va) / link_resolve_extend(Linker*, LinkImage*) entries in src/link/link.c, which are panic stubs today — see the shared surface below.
Promote the API. kit_jit_publish stays experimental until a second consumer (emu) exercises its append/replace batch, then settles as the stable extend surface.

Workstream 2 — file-based incremental object link (the "m2" redesign)

The goal is "instant" relinks for dev builds: after editing one translation unit in a project of N TUs, the link cost should be O(changed atoms + their relocations), not O(whole program). Compile cost (caching, dependency scanning, the build graph and watch/daemon modes) is the build system's problem and is out of scope here; this workstream is the obj/link substrate that layer stands on. Incremental link is a -O0/-O1 dev feature; release builds (--incremental off) always do a clean full link and remain the canonical reproducible artifact.

Done (baseline) — "Done for ELF"

The first cut landed on ELF as the reference format, with the acceptance suite (test/link-incremental/) green on ELF/aa64 + ELF/x64: atom content identity, per-atom reloc/symbol indices, the LinkSession with per-segment cursors/slack/free-list, append-only extend, patch-in-slack, the soundness gate with transactional rollback, per-segment build-id, per-changed-TU debug regen, and move-on-grow via thunk. This is the starting point; the rest of this section is what is not yet built.

The m2 redesign — design intent

The redesign's central decision: incrementality is not a parallel API — it is the existing link session made fully mutable. A full link is the degenerate cold case (no prior state, nothing replaced); an incremental relink seeds prior state and replaces the changed inputs. The build system always drives the same session, and resolve internally decides patch-vs-full and reports which via an outcome enum (FULL / PATCHED / FELL_BACK_FULL). There is no separate "incremental" entry point that could drift from the full-link path. This directly matches the internal direction where link_resolve is "inputs → image" and the link_resolve_at / link_resolve_extend surface makes that resolve extend-capable.

The atom is the patch unit: one function or one data object. Under --incremental, frontends emit one section per function/global (a -ffunction-sections/-fdata-sections equivalent) so each atom is independently placeable; kit already lays out kept atoms as individual LinkSections. Each atom gets a BLAKE2b content id over its canonical form (bytes || align || flags || canonical(relocs)); that id is the diff key.

The soundness gate

Reuse is correct only when the change cannot alter symbol resolution. The edit is local only when the changed object's interface (defined global names + bindings, COMMON sizes/aligns, set of undefs) is unchanged and no archive pull-in changes; anything that can shift layout or resolution — symbol-set/binding flips, new archive members, COMDAT/COMMON merge changes, TLS-size shifts, import-set changes, slack/free-list exhaustion (data is never thunked), or layout-affecting flags — forces a fall-back. On fall-back the half-mutated session is discarded via the LinkPatchTxn watermark and a correct full link runs; the JIT append path's duplicate-global preflight is the precedent, but it panics, so converting "detect non-local" into "roll back + full link" is the new control flow at the heart of the redesign. See ../LINK.md for the full trigger set and rollback mechanics.
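
As a rough illustration of the gate's shape, the sketch below compares an object's old and new interface and maps the result onto the three outcomes named earlier. All type and function names (IfaceSym, ObjInterface, GateInputs, interface_equal, gate_decide) are hypothetical, the symbol lists are assumed to be pre-sorted by name, and only a subset of the triggers is modeled (interface change, archive pull-in change, slack exhaustion). Rollback through the LinkPatchTxn watermark is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Outcome names follow the text; everything else is hypothetical. */
typedef enum { OUTCOME_FULL, OUTCOME_PATCHED, OUTCOME_FELL_BACK_FULL } Outcome;
typedef enum { BIND_UNDEF, BIND_GLOBAL, BIND_WEAK, BIND_COMMON } Binding;

typedef struct {
    const char *name;
    Binding     binding;
    uint64_t    common_size;   /* compared only for BIND_COMMON */
    uint32_t    common_align;
} IfaceSym;

typedef struct {
    const IfaceSym *syms;      /* assumed sorted by name */
    size_t          count;
} ObjInterface;

typedef struct {
    bool has_prior_state;        /* false on a cold link */
    bool archive_pullin_changed;
    bool slack_exhausted;
} GateInputs;

/* Positional comparison; valid because both lists are sorted by name. */
bool interface_equal(const ObjInterface *a, const ObjInterface *b) {
    if (a->count != b->count) return false;
    for (size_t i = 0; i < a->count; i++) {
        const IfaceSym *x = &a->syms[i], *y = &b->syms[i];
        if (strcmp(x->name, y->name) != 0) return false;
        if (x->binding != y->binding) return false;
        if (x->binding == BIND_COMMON &&
            (x->common_size != y->common_size ||
             x->common_align != y->common_align))
            return false;
    }
    return true;
}

/* A patch is allowed only when nothing that affects resolution changed. */
Outcome gate_decide(const GateInputs *g,
                    const ObjInterface *old_if,
                    const ObjInterface *new_if) {
    assert(g && old_if && new_if);
    if (!g->has_prior_state) return OUTCOME_FULL;
    if (!interface_equal(old_if, new_if) ||
        g->archive_pullin_changed || g->slack_exhausted)
        return OUTCOME_FELL_BACK_FULL;
    return OUTCOME_PATCHED;
}
```

The design choice this mirrors is that the gate is conservative: any doubt yields FELL_BACK_FULL, so a false negative costs time rather than correctness.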

The move-on-grow primitive (swappable)

When an atom outgrows its slot it must move, and callers must still reach it without their bytes changing. This is abstracted behind a single LinkMoveOps.atom_moved hook with two implementations; the rest of the design (atoms, slack, free-list, persisted session, the gate) is identical either way.

Thunk-on-grow — ship first. Calls stay direct (what codegen emits today). On a move, leave a jump island at the atom's old slot pointing to the new location; callers branch to the old address and hit the island. No codegen change, reachability is free by construction, and the tax is one extra jump only for functions that actually moved. Reuses the existing JIT call-stub island shape per arch. Data cannot be thunked, so a grown data atom that outgrows its slack falls back to a full link.
GOT-cell — convergence target. Under --incremental, codegen emits cross-unit calls and movable-data loads through a GOT cell; a move updates one cell. Costs a per-arch codegen change and a uniform extra indirect load, and needs reserved GOT slack + a GOT free-list (the GOT is one exactly-sized end segment today). Its strategic value is that it is the same primitive hot reload assumes, so one mechanism would serve both JIT hot reload and file incremental link. Build it when hot reload is scheduled, designed then to serve both — unifying earlier is speculative.
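
The swappable hook can be pictured with a small sketch. The text names the hook LinkMoveOps.atom_moved; the signature used here, and every other name (MoveOps, AtomSlot, Segment, patch_atom, ThunkCtx, thunk_atom_moved), is an assumption made for illustration. The sketch shows the three paths described above: patch in slack with the address unchanged, move a code atom and notify the hook, and fall back for data that outgrows its slack. The thunk implementation only records where an island would go; emitting real per-arch island bytes, and checking that the island fits in the old slot, are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { PATCH_IN_SLACK, PATCH_MOVED, PATCH_FALL_BACK } PatchResult;

typedef struct {
    uint64_t vaddr;
    uint32_t size;
    uint32_t capacity;  /* size plus reserved slack */
    bool     is_code;
} AtomSlot;

/* Assumed signature for the hook the text calls LinkMoveOps.atom_moved. */
typedef struct {
    void (*atom_moved)(void *ctx, uint64_t old_vaddr, uint64_t new_vaddr);
    void  *ctx;
} MoveOps;

typedef struct {
    uint64_t cursor;    /* next free vaddr in the segment */
    uint64_t limit;     /* end of the reserved region */
} Segment;

PatchResult patch_atom(AtomSlot *slot, uint32_t new_size, uint32_t new_capacity,
                       Segment *seg, const MoveOps *ops) {
    if (new_size <= slot->capacity) {       /* fits: address stays put */
        slot->size = new_size;
        return PATCH_IN_SLACK;
    }
    if (!slot->is_code) return PATCH_FALL_BACK;  /* data is never thunked */
    if (new_capacity < new_size || seg->limit - seg->cursor < new_capacity)
        return PATCH_FALL_BACK;                  /* no room to append */

    uint64_t old_vaddr = slot->vaddr;
    slot->vaddr    = seg->cursor;                /* append, never compact */
    slot->size     = new_size;
    slot->capacity = new_capacity;
    seg->cursor   += new_capacity;
    ops->atom_moved(ops->ctx, old_vaddr, slot->vaddr);
    return PATCH_MOVED;
}

/* Thunk-on-grow variant: remember an island at the old address. */
typedef struct { uint64_t at, jump_to; } Island;
typedef struct { Island islands[16]; int count; } ThunkCtx;

void thunk_atom_moved(void *ctx, uint64_t old_vaddr, uint64_t new_vaddr) {
    ThunkCtx *t = (ThunkCtx *)ctx;
    assert(t->count < 16);
    t->islands[t->count].at      = old_vaddr;
    t->islands[t->count].jump_to = new_vaddr;
    t->count++;
}
```

A GOT-cell variant would supply a different atom_moved that updates one cell instead of recording an island; patch_atom would not change, which is the sense in which the primitive is swappable.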

Persisted incremental state

Side-band and content-addressed — not ELF-embedded incremental sections, because kit is multi-format. Store one blob in the existing driver/dist BLAKE2b CAS, recording per input and per atom: object + atom content ids, the LinkAtomPlace table (vaddr / file_offset / size / capacity / bucket), symbol→vaddr bindings keyed by name, relocations in relative+symbolic form, and free-list + per-segment cursor state. The session reads/writes it as opaque bytes through KitWriter; the build system owns the key, CAS storage, and lifetime — libkit stays IO/CAS-free.

Remaining work

Resolve the panic stubs. link_resolve_at and link_resolve_extend in src/link/link.c are still compiler_panic stubs on the main path. They are the public extend-capable surface for both the file-incremental consumer and the emu append consumer; wiring them to the LinkSession patch/extend logic (and to graceful fallback rather than panic on exhaustion) is the remaining integration step to land the redesign on the main resolve path.
Non-ELF formats. The atom + slack + move-primitive core is format-agnostic; the difference is per-format machinery, so the order is ELF (done) → COFF/PE → Mach-O. COFF/PE is the incremental-friendly case (IAT-indirected imports, per-page base relocs, side-band PDB debug) and is gated mainly on kit's COFF maturity — see ../OBJ.md. Mach-O is heaviest but feasible last: each of __LINKEDIT fixups, the export trie, the indirect symtab, and the per-page code-signing CodeDirectory needs a bounded (not O(image)) incremental updater. Until a format's updater lands, that format falls back to the fast in-process full link.
GOT-cell move primitive. Deferred until hot reload is scheduled (above); the free-list, slack, session, and gate are reused verbatim when it lands — only LinkMoveOps changes.
rv64 patch path. The per-arch surface is small — the island/cell shape and the branch-into-island reloc kind. CI exercises ELF/aa64 + ELF/x64 first; rv64 follows by adapting its trampoline shape.
Incremental build-id. Per-segment FNV-1a subhashes combined Merkle-style so a patch re-hashes only changed segments, replacing the current whole-image O(image) build-id. Keep this FNV-1a distinct from the BLAKE2b used for content/CAS keying.
Determinism regression lock. Object emission is already byte-deterministic; lock it with a two-compiles-equal regression test to enable cross-machine / shared-cache dedup. Content/name keying stays the correctness backbone so any future drift degrades dedup, never correctness.
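
The incremental build-id item can be sketched as a two-level hash: one FNV-1a subhash per segment, and a root computed over the subhashes. The FNV-1a 64-bit constants are the standard ones; the 64-bit width, the flat two-level tree, and all names (BuildId, build_id_set_segment, build_id_root, MAX_SEGMENTS) are assumptions for illustration, not kit's build-id format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME  0x100000001b3ULL
#define MAX_SEGMENTS 8

/* Standard FNV-1a: xor each byte in, then multiply by the prime. */
uint64_t fnv1a64(uint64_t h, const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV64_PRIME;
    }
    return h;
}

/* Hypothetical container: one cached subhash per segment. */
typedef struct {
    uint64_t sub[MAX_SEGMENTS];
    size_t   count;
} BuildId;

/* Re-hash a single changed segment; cost is O(that segment). */
void build_id_set_segment(BuildId *b, size_t idx, const void *bytes, size_t len) {
    assert(idx < b->count && b->count <= MAX_SEGMENTS);
    b->sub[idx] = fnv1a64(FNV64_OFFSET, bytes, len);
}

/* Combine the subhashes into a root; cost is O(number of segments).
 * Each subhash is fed little-endian so the root is host-independent. */
uint64_t build_id_root(const BuildId *b) {
    uint64_t h = FNV64_OFFSET;
    for (size_t i = 0; i < b->count; i++) {
        uint8_t le[8];
        for (int k = 0; k < 8; k++)
            le[k] = (uint8_t)(b->sub[i] >> (8 * k));
        h = fnv1a64(h, le, sizeof le);
    }
    return h;
}
```

After a patch, only the touched segments call build_id_set_segment; the untouched subhashes are reused as cached, and build_id_root is recomputed over a handful of 8-byte values.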

Frontend contract and debug-info consistency

All frontends converge to ObjBuilder and join the shared path at obj_finalize, so the machinery attaches once, frontend-agnostically — Toy, asm, and WASM get incremental link with no frontend-specific code. To be incrementally safe a frontend must produce deterministic output for identical (source, flags, target, deps), declare its external dependency set (C reuses KitDepIter; single-source frontends report none), use stable source-derived symbol names, and expose a frontend_id + schema_version that salts the build-system key. Toy's durable-module REPL path is not a pure function of source, so it folds the module snapshot into the input key or opts out of caching; Toy's batch/file compile conforms like any other frontend.

On debug info: on any changed atom, re-emit that TU's full .debug_*. kit emits one monolithic .debug_line program and one .debug_info CU with intra-CU DW_FORM_ref4 offsets, so a function's rows cannot be spliced in isolation; and a body change rewrites the instruction→line mapping even when the atom did not move, so "keep stale .debug_line" is incoherent. Per-TU regen is O(changed TU), cheap relative to the rest of the patch, and unchanged TUs' debug stays byte-stable because their atoms keep their addresses. Per-function CUs for O(atom) debug are a future option, not pursued now. See ../DWARF.md.

Acceptance: definition of done per format

The executable spec lives in test/link-incremental/, authored test-first (red → green). Its synthetic multi-TU fixture (core TUs archived into a static library linked into two executables that share it; no third-party deps) covers an in-slack body edit (PATCHED, every vaddr stable, whole-program link_resolve counter does not increment), a grow-past-slack edit (PATCHED, atom moves, jump island at the old address, caller bytes byte-identical), the soundness gate (each non-local edit ⇒ FELL_BACK_FULL matching a from-scratch link), multi-output consistency, determinism, and a no-op relink. The two gates that define correctness are vaddr-stability on a patch and fall-back on a non-local edit; both must be green before a format is "done." ELF/aa64 + ELF/x64 are done; COFF, Mach-O, and the rv64 patch path each repeat this bar. See ../TESTING.md.
