Linker (planned work)
This roadmap covers where the kit linker is headed beyond the static
and JIT linking it does today. It is dominated by incremental linking:
two related but distinct workstreams — append-only growth of a live JIT
image (the kit dbg / kit emu consumer) and file-based incremental
object linking (the build-system consumer, the "m2" redesign). Both rest
on the same linker invariants — address stability, durable non-destructive
relocation records, content-keyed reuse — and both fall back to a correct
full link whenever a change cannot be proven local. For the linker's
current architecture, passes, and invariants see ../LINK.md;
for how a resolved image runs in process see ../JIT.md; for
the object substrate see ../OBJ.md; for the build-system layer
that consumes the file-incremental interface see ../BUILD.md
and the distribution CAS in ../DISTRIBUTE.md.
Why incremental, and the shared invariants
The full link is always available and always correct. Incremental linking is an accelerator gated on a soundness check: a correct-but-slow result always beats a fast-but-wrong one. Three invariants hold across both workstreams and must never be violated by any incremental path:
- Address stability. Once a runtime/file vaddr is observable it never
moves. Unchanged atoms keep their bytes and their addresses, so their
relocations are never reapplied — this is what makes a patch cost
O(change). Enforced by overwrite-in-slack / append-to-free-slot, never compact.
- Relocations are durable, relative, and symbolic. LinkRelocApply records survive as data and are not burned into bytes before emit. Persist each as (atom, offset-within-atom, kind, target-name, addend); derive the absolute write address and target address from current placements at apply time. An atom that moves then needs zero reloc rewriting.
- Content-hash keying, not transient IDs. LinkInputId / LinkSymId are stable only in-process. Persisted state is keyed by content hashes and symbol names, never by re-derived IDs, so determinism is a dedup nicety, not a correctness requirement.
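The second invariant can be illustrated with a minimal sketch. Everything here is hypothetical and only for illustration: the RelocRecord layout, the two reloc kinds, and the name-keyed SymBinding table are assumptions, not kit's actual LinkRelocApply definition. The point it shows is that the write address and the value are both derived from current placements, so moving an atom leaves the record itself untouched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reloc kinds for this sketch; not kit's real enum. */
typedef enum { RELOC_ABS64, RELOC_PCREL32 } RelocKind;

/* Durable record: relative to its atom, symbolic in its target. */
typedef struct {
    uint32_t    atom;    /* index of the atom containing the patched field */
    uint32_t    offset;  /* offset of the field within that atom */
    RelocKind   kind;
    const char *target;  /* target symbol name, never a transient ID */
    int64_t     addend;
} RelocRecord;

typedef struct { const char *name; uint64_t vaddr; } SymBinding;

/* Look up a symbol's current address by name; returns 0 if unbound. */
static uint64_t sym_vaddr(const SymBinding *syms, size_t nsyms, const char *name) {
    for (size_t i = 0; i < nsyms; i++)
        if (strcmp(syms[i].name, name) == 0) return syms[i].vaddr;
    return 0;
}

/* Derive the write address (P) and the value to store from the placements
 * that are current at apply time. Nothing absolute is stored in the record. */
static uint64_t reloc_value(const RelocRecord *r, const uint64_t *atom_vaddr,
                            const SymBinding *syms, size_t nsyms,
                            uint64_t *write_addr) {
    uint64_t P = atom_vaddr[r->atom] + r->offset;
    uint64_t S = sym_vaddr(syms, nsyms, r->target);
    assert(S != 0 && "unresolved target in this sketch");
    *write_addr = P;
    return r->kind == RELOC_PCREL32 ? S + (uint64_t)r->addend - P
                                    : S + (uint64_t)r->addend;
}
```

If the atom's entry in atom_vaddr changes, calling reloc_value again with the same record yields the new write address and a recomputed PC-relative value; only the placement table changed.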
Workstream 1 — append-only incremental JIT link
Grow one live KitJit image with additional compiled objects while
keeping every previously published runtime address stable. New code may
reference old symbols; old debugger surfaces (kit_jit_lookup,
kit_jit_addr_to_sym, symbol iteration, breakpoints, PC translation,
the JIT debug view) must see new symbols. This is explicitly not hot
reload: existing code is never replaced or repatched (see
../DBG.md for the debugger and the separate hot-reload
design).
Done (baseline)
The in-process append path is implemented and is the foundation the rest
builds on: the kit_jit_publish surface (an append/replace batch driven
by a KitLinkSession, reporting a bumped generation); append cursors with
reserved RX/R/RW/TLS slack over one contiguous master
VA reservation committed page-by-page; transactional rollback of cursors
and symbol/section/reloc counts on failure; generation-bumped invalidation
of the cached kit_jit_view; symbol resolution against the existing
image, the append batch, and the external resolver with duplicate-strong
detection; and a dbg REPL that drives compile → append → DWARF refresh
with the worker stopped so line-table replacement never races a running
thread.
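A minimal sketch of the cursor-plus-rollback idea described above, assuming a simplified image with one append cursor per segment. The type and function names (SegCursor, AppendTxn, append_batch) are hypothetical and do not reflect the real kit_jit_publish internals; page commit, section/reloc counts, and symbol resolution are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { SEG_RX, SEG_R, SEG_RW, SEG_TLS, SEG_COUNT };

typedef struct {
    uint64_t base;   /* start of this segment's reserved VA range */
    uint64_t used;   /* bytes handed out so far (the append cursor) */
    uint64_t limit;  /* reserved size, including slack */
} SegCursor;

typedef struct {
    SegCursor seg[SEG_COUNT];
    uint32_t  nsyms;
    uint32_t  generation;
} Image;

/* Snapshot of everything an append may mutate. */
typedef struct { uint64_t used[SEG_COUNT]; uint32_t nsyms; } AppendTxn;

static AppendTxn txn_begin(const Image *img) {
    AppendTxn t;
    for (int s = 0; s < SEG_COUNT; s++) t.used[s] = img->seg[s].used;
    t.nsyms = img->nsyms;
    return t;
}

static void txn_rollback(Image *img, const AppendTxn *t) {
    for (int s = 0; s < SEG_COUNT; s++) img->seg[s].used = t->used[s];
    img->nsyms = t->nsyms;
}

/* Reserve `size` bytes at the cursor; false when the slack is exhausted. */
static bool seg_append(Image *img, int s, uint64_t size, uint64_t align,
                       uint64_t *vaddr) {
    assert(align != 0 && (align & (align - 1)) == 0);
    SegCursor *c = &img->seg[s];
    uint64_t start = (c->base + c->used + align - 1) & ~(align - 1);
    uint64_t end = start + size;
    if (end > c->base + c->limit) return false;
    c->used = end - c->base;
    *vaddr = start;
    return true;
}

/* Append a batch of code blobs: all-or-nothing. The generation is bumped
 * only on success, so cached views are invalidated only by real changes. */
static bool append_batch(Image *img, const uint64_t *sizes, int n) {
    AppendTxn t = txn_begin(img);
    for (int i = 0; i < n; i++) {
        uint64_t va;
        if (!seg_append(img, SEG_RX, sizes[i], 16, &va)) {
            txn_rollback(img, &t);
            return false;
        }
        img->nsyms++;
    }
    img->generation++;
    return true;
}
```

Because addresses are only ever handed out past the cursor, a successful append never disturbs an address that was published earlier, and a failed one leaves the cursors and counts exactly where they were.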
Remaining
- Pending source-level breakpoints across appends. Today a
b file:line for a file not yet covered stays unresolved until retried. Maintain pending source breakpoint specs and arm them automatically after each append.
- Archive reselection on append. v1 resolves a snippet against the already-linked image plus the external resolver only. A later cut can let appended inputs pull fresh archive members, sharing the gate logic from the file-incremental gate (below).
- kit emu's append consumer. Per-basic-block JIT translation wants to grow a single LinkImage as cold blocks land (see ../EMU.md §6). This is a separate consumer of the same append machinery and lands alongside the emu lifter cut. It is the motivation for the link_resolve_at(Linker*, base_va) / link_resolve_extend(Linker*, LinkImage*) entries in src/link/link.c, which are panic stubs today; see the shared surface below.
- Promote the API. kit_jit_publish stays experimental until a second consumer (emu) exercises its append/replace batch, then settles as the stable extend surface.
Workstream 2 — file-based incremental object link (the "m2" redesign)
The goal is "instant" relinks for dev builds: after editing one
translation unit in a project of N TUs, the link cost should be
O(changed atoms + their relocations), not O(whole program). Compile
cost (caching, dependency scanning, the build graph and watch/daemon
modes) is the build system's problem and is out of scope here — this
workstream is the obj/link substrate that layer stands on. Incremental
link is a -O0/-O1 dev feature; release builds (--incremental off)
always run a clean full link and remain the canonical reproducible artifact.
Done (baseline) — "Done for ELF"
The first cut landed on ELF as the reference format, with the acceptance
suite (test/link-incremental/) green on ELF/aa64 + ELF/x64: atom
content identity, per-atom reloc/symbol indices, the LinkSession with
per-segment cursors/slack/free-list, append-only extend, patch-in-slack,
the soundness gate with transactional rollback, per-segment build-id,
per-changed-TU debug regen, and move-on-grow via thunk. This is the
starting point; the rest of this section is what is not yet built.
The m2 redesign — design intent
The redesign's central decision: incrementality is not a parallel API —
it is the existing link session made fully mutable. A full link is the
degenerate cold case (no prior state, nothing replaced); an incremental
relink seeds prior state and replaces the changed inputs. The build system
always drives the same session, and resolve internally decides
patch-vs-full and reports which via an outcome enum
(FULL / PATCHED / FELL_BACK_FULL). There is no separate "incremental"
entry point that could drift from the full-link path. This directly
matches the internal direction where link_resolve is "inputs → image"
and the link_resolve_at / link_resolve_extend surface makes that
resolve extend-capable.
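The "one session, three outcomes" control flow might look roughly like the sketch below. Only the outcome names FULL / PATCHED / FELL_BACK_FULL come from the design; the Session fields, the function names, and the boolean standing in for the soundness gate's verdict are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { LINK_FULL, LINK_PATCHED, LINK_FELL_BACK_FULL } LinkOutcome;

/* Hypothetical session state, reduced to what the decision needs. */
typedef struct {
    bool has_prior_state;  /* seeded from a previous link */
    bool change_is_local;  /* stand-in for the soundness gate's verdict */
    int  full_links;       /* counters so the sketch is observable */
    int  patches;
} Session;

static void full_link(Session *s) {
    s->full_links++;
    s->has_prior_state = true;
}

/* Patch in place when the gate allows it; report failure otherwise. */
static bool try_patch(Session *s) {
    assert(s->has_prior_state);
    if (!s->change_is_local) return false;
    s->patches++;
    return true;
}

/* Single entry point: the caller never picks "incremental" vs "full". */
static LinkOutcome session_resolve(Session *s) {
    if (!s->has_prior_state) {   /* cold case: nothing to reuse */
        full_link(s);
        return LINK_FULL;
    }
    if (try_patch(s)) return LINK_PATCHED;
    full_link(s);                /* non-local change: correct but slower */
    return LINK_FELL_BACK_FULL;
}
```

The build system would call the same function on every relink and use the returned outcome only for reporting, which is what keeps the incremental path from drifting away from the full-link path.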
The atom is the patch unit — one function or one data object. Under
--incremental, frontends emit one section per function/global (a
-ffunction-sections/-fdata-sections equivalent) so each atom is
independently placeable; kit already lays out kept atoms as individual
LinkSections. Each atom gets a BLAKE2b content id over its canonical
form (bytes || align || flags || canonical(relocs)), the diff key.
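A sketch of how a content id over the canonical form could be computed. Two loud assumptions: the hash is a 64-bit FNV-1a used purely as a stand-in so the example stays self-contained (the design calls for BLAKE2b here), and the Atom / AtomReloc layouts and the canonical reloc ordering are hypothetical. What it illustrates is the canonicalisation step: relocs are sorted and every field is fed in a fixed, length-prefixed order, so the same atom always yields the same key.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint32_t offset, kind; const char *target; int64_t addend; } AtomReloc;

typedef struct {
    const uint8_t *bytes; size_t size;
    uint32_t align, flags;
    AtomReloc *relocs; size_t nrelocs;
} Atom;

/* Stand-in hash step (64-bit FNV-1a). NOT BLAKE2b; sketch only. */
static uint64_t mix(uint64_t h, const void *p, size_t n) {
    const uint8_t *b = p;
    for (size_t i = 0; i < n; i++) { h ^= b[i]; h *= 0x100000001b3ULL; }
    return h;
}

/* Feed integers as fixed-width little-endian so the id is host-independent. */
static uint64_t mix_u64(uint64_t h, uint64_t v) {
    uint8_t le[8];
    for (int i = 0; i < 8; i++) le[i] = (uint8_t)(v >> (8 * i));
    return mix(h, le, sizeof le);
}

/* Canonical reloc order: by offset, then kind, then target name, then addend. */
static int reloc_cmp(const void *a, const void *b) {
    const AtomReloc *x = a, *y = b;
    if (x->offset != y->offset) return x->offset < y->offset ? -1 : 1;
    if (x->kind != y->kind) return x->kind < y->kind ? -1 : 1;
    int c = strcmp(x->target, y->target);
    if (c != 0) return c;
    return (x->addend > y->addend) - (x->addend < y->addend);
}

/* id = H(bytes || align || flags || canonical(relocs)).
 * Note: sorts the atom's reloc array in place. */
static uint64_t atom_content_id(Atom *a) {
    uint64_t h = 0xcbf29ce484222325ULL;
    if (a->nrelocs > 1) qsort(a->relocs, a->nrelocs, sizeof *a->relocs, reloc_cmp);
    h = mix_u64(h, a->size);
    h = mix(h, a->bytes, a->size);
    h = mix_u64(h, a->align);
    h = mix_u64(h, a->flags);
    h = mix_u64(h, a->nrelocs);
    for (size_t i = 0; i < a->nrelocs; i++) {
        const AtomReloc *r = &a->relocs[i];
        h = mix_u64(h, r->offset);
        h = mix_u64(h, r->kind);
        h = mix_u64(h, strlen(r->target));
        h = mix(h, r->target, strlen(r->target));
        h = mix_u64(h, (uint64_t)r->addend);
    }
    return h;
}
```

Targets are hashed by name rather than by symbol index, in line with the content-hash keying invariant, so the id does not depend on in-process IDs.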
The soundness gate
Reuse is correct only when the change cannot alter symbol resolution. The
edit is local only when the changed object's interface (defined global
names + bindings, COMMON sizes/aligns, set of undefs) is unchanged and no
archive pull-in changes; anything that can shift layout or resolution —
symbol-set/binding flips, new archive members, COMDAT/COMMON merge changes,
TLS-size shifts, import-set changes, slack/free-list exhaustion (data is
never thunked), or layout-affecting flags — forces a fall-back. On
fall-back the half-mutated session is discarded via the LinkPatchTxn
watermark and a correct full link runs; the JIT append path's
duplicate-global preflight is the precedent, but it panics, so
converting "detect non-local" into "roll back + full link" is the new
control flow at the heart of the redesign. See ../LINK.md for
the full trigger set and rollback mechanics.
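The interface half of the gate can be sketched as a comparison of two per-object summaries. The ObjInterface layout, the Binding enum, and the function names are hypothetical, and the sketch covers only the interface check plus an archive pull-in flag; the other triggers listed above (TLS size, slack exhaustion, layout-affecting flags, and so on) would be further conditions of the same shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef enum { BIND_GLOBAL, BIND_WEAK, BIND_COMMON } Binding;  /* hypothetical */

typedef struct {
    const char *name;
    Binding     bind;
    uint64_t    common_size, common_align;  /* meaningful for COMMON only */
} DefSym;

/* Interface summary of one object; both arrays are sorted by name. */
typedef struct {
    const DefSym *defs;          size_t ndefs;
    const char *const *undefs;   size_t nundefs;
} ObjInterface;

static bool interface_unchanged(const ObjInterface *a, const ObjInterface *b) {
    if (a->ndefs != b->ndefs || a->nundefs != b->nundefs) return false;
    for (size_t i = 0; i < a->ndefs; i++) {
        const DefSym *x = &a->defs[i], *y = &b->defs[i];
        if (strcmp(x->name, y->name) != 0 || x->bind != y->bind ||
            x->common_size != y->common_size ||
            x->common_align != y->common_align)
            return false;
    }
    for (size_t i = 0; i < a->nundefs; i++)
        if (strcmp(a->undefs[i], b->undefs[i]) != 0) return false;
    return true;
}

typedef enum { EDIT_LOCAL, EDIT_NON_LOCAL } GateVerdict;

/* Conservative by construction: anything not proven local is non-local. */
static GateVerdict soundness_gate(const ObjInterface *old_if,
                                  const ObjInterface *new_if,
                                  bool archive_pull_in_changed) {
    assert(old_if && new_if);
    if (archive_pull_in_changed) return EDIT_NON_LOCAL;
    return interface_unchanged(old_if, new_if) ? EDIT_LOCAL : EDIT_NON_LOCAL;
}
```

A body-only edit leaves both summaries equal and passes; adding a call to a new external function changes the undef set and forces the fall-back.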
The move-on-grow primitive (swappable)
When an atom outgrows its slot it must move, and callers must still reach
it without their bytes changing. This is abstracted behind a single
LinkMoveOps.atom_moved hook with two implementations; the rest of the
design (atoms, slack, free-list, persisted session, the gate) is identical
either way.
- Thunk-on-grow — ship first. Calls stay direct (what codegen emits today). On a move, leave a jump island at the atom's old slot pointing to the new location; callers branch to the old address and hit the island. No codegen change, reachability is free by construction, and the tax is one extra jump only for functions that actually moved. Reuses the existing JIT call-stub island shape per arch. Data cannot be thunked, so a grown data atom that outgrows its slack falls back to a full link.
- GOT-cell — convergence target. Under
--incremental, codegen emits cross-unit calls and movable-data loads through a GOT cell; a move updates one cell. Costs a per-arch codegen change and a uniform extra indirect load, and needs reserved GOT slack + a GOT free-list (the GOT is one exactly-sized end segment today). Its strategic value is that it is the same primitive hot reload assumes, so one mechanism would serve both JIT hot reload and file incremental link. Build it when hot reload is scheduled, designed then to serve both — unifying earlier is speculative.
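As an illustration of thunk-on-grow, the sketch below writes an x86-64 jump island (opcode E9 followed by a little-endian rel32, measured from the end of the 5-byte instruction) into an atom's old slot. The hook signature, the Slot and MoveOps types, and the flat image buffer are assumptions; the design only names LinkMoveOps.atom_moved, and kit's real island shape per arch may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint64_t vaddr; uint32_t size; uint32_t capacity; } Slot;

/* Hypothetical shape of the move hook; a GOT-cell variant would supply a
 * different function behind the same pointer. */
typedef struct {
    void (*atom_moved)(uint8_t *image, uint64_t image_base,
                       const Slot *old_slot, uint64_t new_vaddr);
} MoveOps;

enum { X64_JMP_REL32_LEN = 5 };

/* Leave `jmp rel32` at the old address so existing callers, whose bytes are
 * not touched, still reach the atom at its new location. */
static void thunk_atom_moved(uint8_t *image, uint64_t image_base,
                             const Slot *old_slot, uint64_t new_vaddr) {
    assert(old_slot->capacity >= X64_JMP_REL32_LEN);
    int64_t disp = (int64_t)(new_vaddr - (old_slot->vaddr + X64_JMP_REL32_LEN));
    assert(disp >= INT32_MIN && disp <= INT32_MAX);  /* must be in branch range */
    uint8_t *p = image + (old_slot->vaddr - image_base);
    uint32_t d = (uint32_t)(int32_t)disp;
    p[0] = 0xE9;
    for (int i = 0; i < 4; i++) p[1 + i] = (uint8_t)(d >> (8 * i));
}

static const MoveOps THUNK_MOVE_OPS = { thunk_atom_moved };
```

For an atom that moves from 0x1010 to 0x1080, the displacement is 0x1080 - 0x1015 = 0x6B, so the island bytes are E9 6B 00 00 00. The range assertion marks where a real implementation would need a longer island or a fall-back.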
Persisted incremental state
Side-band and content-addressed — not ELF-embedded incremental
sections, because kit is multi-format. Store one blob in the existing
driver/dist BLAKE2b CAS, recording per input and per atom: object + atom
content ids, the LinkAtomPlace table (vaddr / file_offset / size /
capacity / bucket), symbol→vaddr bindings keyed by name, relocations in
relative+symbolic form, and free-list + per-segment cursor state. The
session reads/writes it as opaque bytes through KitWriter; the build
system owns the key, CAS storage, and lifetime — libkit stays IO/CAS-free.
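One way to picture the "opaque bytes" contract is a fixed little-endian encoding of a placement row. This is only a sketch: the field names follow the LinkAtomPlace description above, but the field widths, the wire layout, and the plain byte buffer (standing in for KitWriter, whose API is not shown here) are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one placement row. */
typedef struct {
    uint64_t vaddr;
    uint64_t file_offset;
    uint32_t size;
    uint32_t capacity;   /* size plus reserved slack */
    uint32_t bucket;     /* free-list size class */
} AtomPlace;

enum { ATOM_PLACE_WIRE_SIZE = 8 + 8 + 4 + 4 + 4 };

/* Write the low `nbytes` bytes of v in little-endian order. */
static size_t put_le(uint8_t *out, uint64_t v, size_t nbytes) {
    for (size_t i = 0; i < nbytes; i++) out[i] = (uint8_t)(v >> (8 * i));
    return nbytes;
}

static uint64_t get_le(const uint8_t *in, size_t nbytes) {
    uint64_t v = 0;
    for (size_t i = 0; i < nbytes; i++) v |= (uint64_t)in[i] << (8 * i);
    return v;
}

/* Encode field by field, never by dumping the struct, so padding and host
 * endianness cannot leak into the persisted blob. Returns bytes written. */
static size_t atom_place_encode(const AtomPlace *p, uint8_t *out, size_t cap) {
    assert(cap >= ATOM_PLACE_WIRE_SIZE);
    size_t n = 0;
    n += put_le(out + n, p->vaddr, 8);
    n += put_le(out + n, p->file_offset, 8);
    n += put_le(out + n, p->size, 4);
    n += put_le(out + n, p->capacity, 4);
    n += put_le(out + n, p->bucket, 4);
    return n;
}

static AtomPlace atom_place_decode(const uint8_t *in) {
    AtomPlace p;
    p.vaddr       = get_le(in, 8);
    p.file_offset = get_le(in + 8, 8);
    p.size        = (uint32_t)get_le(in + 16, 4);
    p.capacity    = (uint32_t)get_le(in + 20, 4);
    p.bucket      = (uint32_t)get_le(in + 24, 4);
    return p;
}
```

The caller decides where the bytes go and under which key, which matches the split where the build system owns storage and the session only produces and consumes the blob.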
Remaining work
- Resolve the panic stubs. link_resolve_at and link_resolve_extend in src/link/link.c are still compiler_panic stubs on the main path. They are the public extend-capable surface for both the file-incremental consumer and the emu append consumer; wiring them to the LinkSession patch/extend logic (and to graceful fallback rather than panic on exhaustion) is the remaining integration step to land the redesign on the main resolve path.
- Non-ELF formats. The atom + slack + move-primitive core is format-agnostic; the difference is per-format machinery, so the order is ELF (done) → COFF/PE → Mach-O. COFF/PE is the incremental-friendly case (IAT-indirected imports, per-page base relocs, side-band PDB debug) and is gated mainly on kit's COFF maturity (see ../OBJ.md). Mach-O is heaviest but feasible last: each of __LINKEDIT fixups, the export trie, the indirect symtab, and the per-page code-signing CodeDirectory needs a bounded (not O(image)) incremental updater. Until a format's updater lands, that format falls back to the fast in-process full link.
- GOT-cell move primitive. Deferred until hot reload is scheduled (above). The free-list, slack, session, and gate are reused verbatim when it lands; only LinkMoveOps changes.
- rv64 patch path. The per-arch surface is small: the island/cell shape and the branch-into-island reloc kind. CI exercises ELF/aa64 + ELF/x64 first; rv64 follows by adapting its trampoline shape.
- Incremental build-id. Per-segment FNV-1a subhashes combined Merkle-style so a patch re-hashes only changed segments, replacing the current whole-image O(image) build-id. Keep this FNV-1a distinct from the BLAKE2b used for content/CAS keying.
- Determinism regression lock. Object emission is already byte-deterministic; lock it with a two-compiles-equal regression test to enable cross-machine / shared-cache dedup. Content/name keying stays the correctness backbone so any future drift degrades dedup, never correctness.
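The incremental build-id item can be sketched as a two-level hash: one cached FNV-1a subhash per segment and a root computed over the subhashes. The 64-bit FNV-1a constants are the standard ones; the segment count, the BuildIdState type, the function names, and the choice to hash the subhashes as little-endian bytes are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NSEG = 4 };  /* e.g. RX, R, RW, TLS; the count is an assumption */

/* Standard 64-bit FNV-1a. */
static uint64_t fnv1a64(const uint8_t *p, size_t n) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 0x100000001b3ULL; }
    return h;
}

/* Cached leaf hashes, one per segment. */
typedef struct { uint64_t sub[NSEG]; } BuildIdState;

/* Re-hash a single segment: O(segment), not O(image). */
static void build_id_rehash_segment(BuildIdState *st, int seg,
                                    const uint8_t *bytes, size_t n) {
    assert(seg >= 0 && seg < NSEG);
    st->sub[seg] = fnv1a64(bytes, n);
}

/* Root = FNV-1a over the little-endian concatenation of the subhashes. */
static uint64_t build_id_root(const BuildIdState *st) {
    uint8_t buf[NSEG * 8];
    for (int s = 0; s < NSEG; s++)
        for (int i = 0; i < 8; i++)
            buf[s * 8 + i] = (uint8_t)(st->sub[s] >> (8 * i));
    return fnv1a64(buf, sizeof buf);
}
```

After a patch that touches one segment, only that segment's subhash is recomputed and the root is rebuilt from NSEG cached values; the result matches what hashing every segment from scratch would give.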
Frontend contract and debug-info consistency
All frontends converge to ObjBuilder and join the shared path at
obj_finalize, so the machinery attaches once, frontend-agnostically —
Toy, asm, and WASM get incremental link with no frontend-specific code. To
be incrementally safe a frontend must produce deterministic output for
identical (source, flags, target, deps), declare its external dependency
set (C reuses KitDepIter; single-source frontends report none), use
stable source-derived symbol names, and expose a frontend_id +
schema_version that salts the build-system key. Toy's durable-module REPL
path is not a pure function of source, so it folds the module snapshot into
the input key or opts out of caching; Toy's batch/file compile conforms
like any other frontend.
On debug info: on any changed atom, re-emit that TU's full .debug_*.
kit emits one monolithic .debug_line program and one .debug_info CU
with intra-CU DW_FORM_ref4 offsets, so a function's rows cannot be spliced
in isolation; and a body change rewrites the instruction→line mapping even
when the atom did not move, so "keep stale .debug_line" is incoherent.
Per-TU regen is O(changed TU), cheap relative to the rest of the patch,
and unchanged TUs' debug stays byte-stable because their atoms keep their
addresses. Per-function CUs for O(atom) debug are a future option, not
pursued now. See ../DWARF.md.
Acceptance: definition of done per format
The executable spec lives in test/link-incremental/, authored test-first
(red → green). Its synthetic multi-TU fixture (core TUs archived into a
static library linked into two executables that share it; no third-party
deps) covers an in-slack body edit (PATCHED, every vaddr stable,
whole-program link_resolve counter does not increment), a grow-past-slack
edit (PATCHED, atom moves, jump island at the old address, caller bytes
byte-identical), the soundness gate (each non-local edit ⇒
FELL_BACK_FULL matching a from-scratch link), multi-output consistency,
determinism, and a no-op relink. The two gates that define correctness are
vaddr-stability on a patch and fall-back on a non-local edit; both must be
green before a format is "done." ELF/aa64 + ELF/x64 are done; COFF, Mach-O,
and the rv64 patch path each repeat this bar. See
../TESTING.md.
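The vaddr-stability gate amounts to comparing two placement tables. A minimal sketch of such a check follows, assuming a hypothetical name-keyed Placement table and a helper that tolerates exactly one named atom moving (the grow-past-slack case); the real assertions live in test/link-incremental/ and may be shaped differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct { const char *name; uint64_t vaddr; } Placement;

/* Find an atom by name; returns NULL when it is absent. */
static const Placement *find(const Placement *t, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(t[i].name, name) == 0) return &t[i];
    return NULL;
}

/* True when every atom in `before` still exists in `after` at the same
 * vaddr. `moved` names the single atom allowed to change address, or is
 * NULL when no move is expected (the in-slack edit case). */
static bool vaddrs_stable(const Placement *before, size_t nb,
                          const Placement *after, size_t na,
                          const char *moved) {
    assert(before && after);
    for (size_t i = 0; i < nb; i++) {
        const Placement *p = find(after, na, before[i].name);
        if (p == NULL) return false;
        if (moved != NULL && strcmp(before[i].name, moved) == 0) continue;
        if (p->vaddr != before[i].vaddr) return false;
    }
    return true;
}
```

An in-slack edit would be checked with moved set to NULL, and a grow-past-slack edit with moved naming the grown function.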