LINK.md — the kit linker
The linker turns a set of relocatable inputs (objects, archives, shared objects, raw byte buffers) into a single resolved image: a static ET_EXEC, a position-independent ET_DYN, a partial ET_REL, or an in-process JIT mapping. It is a multi-format, multi-arch component built as a strict pipeline of passes over an immutable input set. This document describes that architecture — the layering, the data flow, and the invariants that hold the whole thing together. For how a resolved image becomes runnable in process see JIT.md; for the object-file read/write substrate underneath it see OBJ.md; for per-target relocation kinds and register/ABI detail see ARCH.md; for debug-section retention see DWARF.md.
Where the linker sits
The public surface is KitLinkSession (include/kit/link.h); the
driver tools ld, cc, run, and dbg drive it. Inside libkit the
real work is the internal Linker / LinkImage pair declared in
src/link/link.h and src/link/link_internal.h. The session is a thin
wrapper: it owns a Linker, accumulates inputs, and on resolve produces
a LinkImage. Path handling (reading bytes off disk, -l search paths,
sysroots) lives entirely in the driver — the library boundary is
byte-buffer-shaped. Every bytes input is read through
Compiler.env->file_io by the driver before it reaches the linker.
The two central abstractions:
- Linker: the mutable accumulator. Holds the registered inputs (objects, archives, DSOs), the entry name, link-mode flags (PIE, static-exe, JIT, gc-sections, strip-debug), an optional linker script, and an optional external resolver. It is built up by the link_add_* / link_set_* calls and then read by resolution. It is never rewritten by resolution.
- LinkImage: the resolved output. A fresh image is produced by every link_resolve. It owns the symbol table, the section and segment tables, the per-segment byte buffers, the durable relocation records, and any synthesized dynamic-link / IFUNC / debug state. It is the read-side view consumed by the format emitters, the JIT mapper, and the debug-info view.
The load-bearing invariants
Three rules are stated in the link.h header comment and enforced throughout. They exist so that incremental re-resolution can be added without reworking the core; they also make the single-shot path easier to reason about.
- Inputs are never mutated; resolve is a function from inputs to a fresh image. link_resolve reads the Linker and allocates a new LinkImage; it does not edit input ObjBuilders or rewrite the Linker in place. Re-resolving the same Linker would yield another independent image.
- LinkInputId / ObjBuilder* mappings are stable for the Linker's lifetime. Adding an input never invalidates an existing handle. ObjSymId / ObjSecId are per-ObjBuilder id spaces, so each input carries an InputMap (link_internal.h) translating its local ids into the global LinkSymId / LinkSectionId space.
- Relocation records stay as data; they are never burned destructively into segment bytes during resolve. link_emit_relocations produces LinkRelocApply records, each a (write site, kind, target symbol, addend) tuple, and stores them on the image. The actual patching of bytes happens later, at emit time (format emitter) or map time (JIT mapper). The segment byte buffers produced during layout hold raw, unrelocated input bytes.
A fourth invariant governs addresses:
- Image-relative vaddr discipline. Every vaddr and file_offset on a resolved LinkImage is computed as if the image were based at 0. Layout, symbol vaddrs, GOT/PLT placement, and reloc write-sites are all in this coordinate system. Consumers add their own runtime base exactly once: the ELF emitter bumps everything by img_base (shift_image_addresses in src/obj/elf/link.c: 0x400000 for static ET_EXEC, 0 for PIE/DSO so the loader picks the base), and the JIT mapper bumps by the chosen reservation address. Because relocations are re-derived from (post-shift) placements, an image can be shifted wholesale by adding one delta to every coordinate.
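The "add the base exactly once" rule can be illustrated with a minimal sketch. The struct and function names below (SketchImage, sketch_shift_image) are hypothetical stand-ins, not the real LinkImage layout or shift_image_addresses; the sketch only assumes that segments and symbols each carry an image-relative vaddr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the image tables. */
typedef struct { uint64_t vaddr; uint64_t mem_size; } SketchSeg;
typedef struct { uint64_t vaddr; } SketchSym;

typedef struct {
    SketchSeg *segs; size_t nsegs;
    SketchSym *syms; size_t nsyms;
    int shifted;                 /* guards the "exactly once" rule */
} SketchImage;

/* Add one runtime base to every image-relative coordinate. */
void sketch_shift_image(SketchImage *img, uint64_t base)
{
    assert(!img->shifted);       /* a second shift would double-count the base */
    for (size_t i = 0; i < img->nsegs; i++) img->segs[i].vaddr += base;
    for (size_t i = 0; i < img->nsyms; i++) img->syms[i].vaddr += base;
    img->shifted = 1;
}
```

Because every coordinate starts at base 0, the same loop serves a static executable (base 0x400000), a PIE (base 0), or a JIT reservation address; only the delta differs.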
The pass pipeline
link_resolve (src/link/link_layout.c) orchestrates the whole pipeline.
The phases, with the file that owns each:
inputs (Linker)
|
| link_synth_coff_ctor_dtor_list (PE/COFF CRT boundary synth)
| link_ingest_archives ── archive member selection
v
link_resolve_symbols ── build global symbol table [resolve.c]
link_gc_compute ── --gc-sections liveness (BFS)
|
v
link_layout_sections ── bucket + place sections [layout.c]
link_layout_commons ── COMMON -> .bss.common
link_emit_segment_bytes ── copy raw input bytes
link_layout_debug ── carry .debug_* (file-only)
|
v
link_assign_symbol_vaddrs ── symbol -> vaddr [reloc_layout.c]
link_emit_*_boundaries ── __init_array_start, __tdata_*, __start_X ...
link_resolve_undefs ── globals / DSO imports / resolver
link_gc_drop_dead_globals
link_layout_iplt ── STT_GNU_IFUNC trampolines
link_layout_jit_stubs ── AArch64 JIT call islands
link_layout_got ── static-PIC .got
link_emit_relocations ── LinkRelocApply records
fmt->layout_dyn ── PIE/DSO synthetic dyn sections [obj/elf/link_dyn.c]
link_resolve_entry ── entry symbol lookup
link_capture_debug_inputs ── retain ObjBuilders for JIT view
|
v
LinkImage ──> link_emit_image_writer (format emit) | kit_jit_from_image
Phase 0 — input registration and archive selection (link.c, link_resolve.c)
link_add_obj borrows a caller-owned ObjBuilder. link_add_obj_bytes
detects the binary format, reads bytes into a linker-owned ObjBuilder,
and — via the format's classify_obj_input hook — reclassifies the input
as a DSO if the bytes are a shared object. link_add_dso_bytes parses an
ET_DYN explicitly, materializing only its exported (dynsym) symbols.
link_add_archive_bytes eagerly parses every member into an ObjBuilder
at registration time but defers the include/exclude decision to resolve.
A DSO input contributes nothing to layout. Its presence only influences resolution (an undef matched by name against its exports becomes an imported symbol) and DT_NEEDED bookkeeping (its SONAME, or filename fallback, is recorded as a runtime dependency).
Archive member selection (link_ingest_archives) is the demand-driven
pull familiar from GNU ld. --whole-archive members are pulled
unconditionally first. The rest are scanned in input order: for each
archive a presence scan (scan_presence_before) computes the set of
defined and still-wanted undefined globals from all inputs that come
before that archive in link order; any member that defines a wanted,
not-yet-defined global is pulled, and the scan repeats until a fixed
point so a freshly pulled member can drag in its own dependencies.
Spurious header-artifact undefs (unreferenced extern prototypes) are
excluded from the want set so an unused declaration never pulls a member.
Archives in the same nonzero group_id form a --start-group cycle.
Pulled members move into Linker.inputs and get stable ids like any
other input. PE/COFF has two special cases handled here: short-import
shim members route through the DSO path (their symbols are DLL exports),
and a synthetic ObjBuilder supplies the mingw CRT ctor/dtor boundary
symbols and an AArch64 __chkstk.
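The fixed-point pull described above can be sketched as follows. This is an illustration under simplifying assumptions, not link_ingest_archives itself: symbol sets are modeled as 32-bit masks (one bit per global), and --whole-archive, group cycles, and the header-artifact filter are left out. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical archive member: which globals it defines and which it needs. */
typedef struct {
    uint32_t defines;
    uint32_t needs;
    int pulled;
} SketchMember;

/* Pull members until no member defines a wanted, not-yet-defined global.
 * `defined` and `wanted` describe the inputs that precede the archive.
 * Returns the number of members pulled. */
int sketch_pull_members(SketchMember *m, size_t n,
                        uint32_t defined, uint32_t wanted)
{
    int pulled = 0, changed;
    assert(m != NULL || n == 0);
    do {
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            if (m[i].pulled) continue;
            if (m[i].defines & wanted & ~defined) {
                m[i].pulled = 1;
                defined |= m[i].defines;  /* its definitions now count */
                wanted  |= m[i].needs;    /* and so do its own undefs  */
                pulled++;
                changed = 1;
            }
        }
    } while (changed);                    /* repeat until a fixed point */
    return pulled;
}
```

The outer loop is what lets a member that appears earlier in the archive be pulled by a dependency discovered later in the same scan.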
Phase 1 — symbol resolution (link_resolve.c)
link_resolve_symbols walks every (non-DSO) input's symbols, allocating
its InputMap and appending a LinkSymbol per local symbol while
building img->globals — an open-addressed name→LinkSymId hash for
global/weak definitions. Locals never enter that hash. When two inputs
define the same global, a binding-strength policy decides the winner:
GLOBAL beats WEAK beats LOCAL; two COMMON symbols merge to the larger
size with the stricter alignment; a real definition overrides a COMMON;
two strong definitions are an error — except COFF/PE SELECTANY, where two
COMDAT (SF_GROUP) definitions keep the earlier and mark the later
section discarded (recorded in InputMap.comdat_discarded, honored by GC
and layout).
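A minimal sketch of the binding-strength policy for global/weak definitions follows. The types and verdict names are hypothetical, COFF SELECTANY is omitted, and two cases the text does not spell out are assumptions labeled in the comments: weak-versus-weak keeps the earlier definition, and any real definition (weak or strong) overrides a COMMON.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SK_BIND_WEAK, SK_BIND_GLOBAL } SketchBind;

typedef struct {
    SketchBind bind;
    int is_common;     /* nonzero for a COMMON (tentative) symbol */
    uint64_t size;
    uint64_t align;
} SketchDef;

typedef enum { SK_KEEP_OLD, SK_TAKE_NEW, SK_MERGED_COMMON, SK_ERR_DUPLICATE } SketchVerdict;

/* Decide what happens when `incoming` collides with the recorded `old`. */
SketchVerdict sketch_merge(SketchDef *old, const SketchDef *incoming)
{
    assert(old != 0 && incoming != 0);
    if (old->is_common && incoming->is_common) {
        /* COMMON + COMMON: larger size, stricter (larger) alignment. */
        if (incoming->size > old->size) old->size = incoming->size;
        if (incoming->align > old->align) old->align = incoming->align;
        return SK_MERGED_COMMON;
    }
    if (old->is_common) return SK_TAKE_NEW;       /* real definition overrides COMMON */
    if (incoming->is_common) return SK_KEEP_OLD;
    if (old->bind == SK_BIND_GLOBAL && incoming->bind == SK_BIND_GLOBAL)
        return SK_ERR_DUPLICATE;                  /* two strong definitions */
    if (incoming->bind == SK_BIND_GLOBAL) return SK_TAKE_NEW;  /* GLOBAL beats WEAK */
    return SK_KEEP_OLD;  /* assumption: weak vs. weak keeps the earlier one */
}
```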
link_resolve_undefs runs after layout has assigned vaddrs (it needs
them) and settles every still-undefined symbol by trying, in order: a
defined global of the same name; a DSO export (the symbol becomes
imported); the external resolver (the symbol becomes an absolute
address; this is the JIT/host-symbol path); a COFF mingw
alias-by-naming-convention fallback; and, for a weak undef, absolute
zero. Anything still unresolved is a hard "undefined reference" error.
A JIT-mode escape hatch tolerates Mach-O __tlv_bootstrap.
The atom model underlies GC and layout: an ObjBuilder section may be
subdivided into atoms (one function / one data object), and the
InputMap records, per section, which atoms are live and which
LinkSection each atom/section maps to. This lets --gc-sections
operate at function granularity.
Phase 1b — garbage collection (link_resolve.c)
With --gc-sections off, link_gc_compute simply marks every kept
allocatable section (or its atoms) live. With it on, GC is a BFS: roots
are the entry symbol, retained (SF_RETAIN) and init/fini-array
sections, and script-KEEP sections; the worklist follows relocations
from live sections/atoms to the symbols they reference, marking each
target's defining section/atom. __start_X / __stop_X references
promote every section named X. After layout, link_gc_drop_dead_globals
clears defined on symbols whose section was collected.
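The mark phase is an ordinary breadth-first search over "section references section" edges. The sketch below assumes a simplified model, not the real link_gc_compute: sections are plain indices, and edges[i] is a -1-terminated list of the sections that relocations in section i point at. Atom granularity and the __start_X / __stop_X promotion are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mark every section reachable from the roots. live[] receives 0/1 flags. */
void sketch_gc_mark(const int *const *edges, int nsec,
                    const int *roots, int nroots, unsigned char *live)
{
    if (nsec <= 0) return;
    int *queue = malloc((size_t)nsec * sizeof *queue);
    int head = 0, tail = 0;
    assert(queue != NULL);
    memset(live, 0, (size_t)nsec);

    /* Seed the worklist with the roots (entry, retained, KEEP sections). */
    for (int i = 0; i < nroots; i++) {
        int r = roots[i];
        if (!live[r]) { live[r] = 1; queue[tail++] = r; }
    }
    /* Each section is enqueued at most once, so nsec slots suffice. */
    while (head < tail) {
        int s = queue[head++];
        for (const int *e = edges[s]; *e >= 0; e++) {
            if (!live[*e]) { live[*e] = 1; queue[tail++] = *e; }
        }
    }
    free(queue);
}
```

Anything left unmarked is unreachable from the roots and can be dropped before layout.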
Phase 2 — section and segment layout (link_layout.c)
link_layout_sections (the default, non-scripted path) partitions kept
sections into four permission buckets — SEG_RX, SEG_R, SEG_RW,
SEG_TLS — and lays them out grouped by name within each bucket, in
first-occurrence order. Same-name contributions are placed adjacently so
the format emitter can merge them into one output section. NOBITS
(.bss, .tbss) sections are tracked as trailing zero-fill so a
segment's mem_size exceeds its file_size. One LinkSegment is
materialized per non-empty bucket; segments are assigned image-relative,
page-aligned vaddrs back-to-back from 0, and every section's vaddr /
file_offset is fixed up into its segment. A PIE quirk lives here:
read-only data carrying an absolute reloc is promoted from SEG_R to
SEG_RW, because the dynamic loader must write the relocated pointer
into the slot and a never-writable segment would fault.
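The back-to-back, page-aligned segment placement can be sketched like this. The struct and function names are hypothetical and the sketch ignores file offsets and per-section fix-ups; it only shows how image-relative vaddrs fall out of a single cursor starting at 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round v up to the next multiple of align (align must be a power of two). */
uint64_t sketch_align_up(uint64_t v, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (v + align - 1) & ~(align - 1);
}

typedef struct {
    uint64_t vaddr;      /* output: image-relative start */
    uint64_t file_size;  /* bytes backed by the file */
    uint64_t mem_size;   /* file_size plus trailing NOBITS zero-fill */
} SketchSegment;

/* Place segments back-to-back from 0; returns the end of the image. */
uint64_t sketch_place_segments(SketchSegment *segs, size_t n, uint64_t page)
{
    uint64_t cursor = 0;
    for (size_t i = 0; i < n; i++) {
        assert(segs[i].mem_size >= segs[i].file_size);
        cursor = sketch_align_up(cursor, page);
        segs[i].vaddr = cursor;
        cursor += segs[i].mem_size;   /* NOBITS tail occupies address space too */
    }
    return cursor;
}
```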
link_layout_commons allocates all surviving COMMON symbols into a
synthetic .bss.common section appended to the writable segment,
assigning each its offset and rewriting it to a normal SK_OBJ
definition. link_emit_segment_bytes then copies each section's raw
input bytes into its segment buffer (skipping NOBITS) — no relocations
are applied, per invariant 3. On the JIT lane this byte copy is skipped:
the mapper copies input bytes straight into execmem.
link_layout_debug carries .debug_* sections through to AOT ELF/Mach-O
output as file-only LinkSections: they live in img->sections (so
their SK_SECTION symbols resolve and the reloc engine applies to them)
but carry segment_id == LINK_SEG_NONE and their own byte buffers in the
image's debug registry, getting no PT_LOAD. Same-name contributions are
assigned a per-name cumulative base (a DWARF-section-relative offset) so
the emitter merges them into one output section with correct
cross-section offsets. The JIT lane serves debug info differently (via
kit_jit_view), so it skips this pass, and strip-debug mode drops the
sections entirely. See DWARF.md for the producer/reader side.
Scripted layout (link_layout.c, link_script.c)
When a linker script is set, link_layout_sections_scripted replaces the
bucket path: it walks the script's output sections in declaration order,
placing matched input sections at a "dot" location counter, materializing
one segment per non-DISCARD output section and turning script symbol
assignments into defined global symbols. /DISCARD/ matches leave the
input section's InputMap slot as LINK_SEC_NONE, which downstream
passes already treat as "dropped". A scripted image is flagged so the
emitter keeps script-assigned absolute vaddrs and omits the
self-describing header PT_LOAD / build-id note.
The script itself is parsed by kit_link_script_parse (link_script.c),
a hand-written recursive-descent parser for a deliberately small GNU-ld
subset: ENTRY(sym), top-level and in-section symbol assignments with a
small arithmetic-expression grammar, . = expr dot moves and alignment,
SECTIONS { output : { input-matchers } }, and /DISCARD/. Unsupported
directives (MEMORY, PHDRS, PROVIDE, OVERLAY, OUTPUT_FORMAT,
GROUP, ...) are rejected with a diagnostic rather than silently
ignored. The linker accepts only the structured KitLinkScript form —
there is no text setter on the Linker; hosts that have GNU-ld text run
the parser first. Input matchers use a *-only glob.
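A *-only glob is small enough to show in full. The function below is a sketch of one common way to write such a matcher (greedy match with backtracking to the most recent star); it is not taken from link_script.c, and the name sketch_glob_match is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Match s against pat, where '*' matches any run of characters
 * (including none) and every other character matches itself. */
bool sketch_glob_match(const char *pat, const char *s)
{
    const char *star = NULL;    /* position of the most recent '*' in pat */
    const char *resume = NULL;  /* where in s that '*' currently stops    */
    assert(pat != NULL && s != NULL);

    while (*s) {
        if (*pat == '*') {
            star = pat++;       /* try matching zero characters first */
            resume = s;
        } else if (*pat == *s) {
            pat++;
            s++;
        } else if (star) {
            pat = star + 1;     /* backtrack: let the star eat one more char */
            s = ++resume;
        } else {
            return false;
        }
    }
    while (*pat == '*') pat++;  /* trailing stars match the empty string */
    return *pat == '\0';
}
```

With this matcher an input pattern such as .text* accepts .text and .text.foo but not .data, and a bare .rodata does not accept .rodata.str1.1.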
Phase 3 — post-placement vaddr / boundary / GOT / PLT / IPLT (link_reloc_layout.c)
With sections placed, link_assign_symbol_vaddrs binds every defined
symbol to section.vaddr + (symbol.value - section.obj_offset). Then a
family of boundary passes synthesize the linker-defined globals that C
runtimes expect: __init_array_start/end, __fini_array_start/end,
preinit equivalents; the TLS boundaries __tdata_start/end and
__tbss_size; the encoding-section __start_X / __stop_X pairs; and
target/format globals such as _GLOBAL_OFFSET_TABLE_, _DYNAMIC,
__dso_handle, the RISC-V global pointer, and PE __ImageBase.
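As a worked instance of the binding formula, here is a minimal sketch. SketchSection is a hypothetical two-field stand-in, and the reading of obj_offset as "where this placed piece starts inside the input section" is an assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t vaddr;       /* image-relative address assigned by layout */
    uint64_t obj_offset;  /* assumed: start of this piece in the input section */
} SketchSection;

/* symbol vaddr = section.vaddr + (symbol.value - section.obj_offset) */
uint64_t sketch_symbol_vaddr(const SketchSection *sec, uint64_t sym_value)
{
    assert(sym_value >= sec->obj_offset);  /* symbol must lie in this piece */
    return sec->vaddr + (sym_value - sec->obj_offset);
}
```

For a piece placed at 0x2000 that began at input offset 0x40, a symbol whose input value is 0x58 binds to 0x2000 + (0x58 - 0x40) = 0x2018, still in image-relative coordinates.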
Three synthesis passes append new segments/sections to the image after
the user payload (each using the link_iplt_alloc_* growth helpers,
which keep image-owned tables resizable):
- link_layout_iplt: for every defined STT_GNU_IFUNC symbol, builds a per-arch resolver trampoline plus its .igot.plt slot and records the (resolver_vaddr, slot_vaddr) pairs on the image. On the static-exe path it also wires a .init_array entry calling __kit_ifunc_init so slots are filled at startup; the JIT path resolves them in process instead.
- link_layout_jit_stubs: AArch64-only, JIT-lane. Synthesizes call/jump islands for CALL26 / JUMP26 relocs whose targets may sit outside ±128 MB of the call site once mapped, and returns a per-symbol stub map.
- link_layout_got: a static-PIC .got. Scans relocations for GOT-relative kinds (reloc_uses_got), allocates one 8-byte slot per referenced symbol in a single exactly-sized .got segment placed after everything, defines a local symbol per slot, and emits an R_ABS64 record to fill each slot. Returns a per-symbol GOT map.
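The ±128 MB limit behind the AArch64 call islands comes from the B/BL encoding: a signed 26-bit immediate scaled by 4. A sketch of the reachability test follows; the function name is hypothetical, and how the real pass decides that a target "may" be out of range before mapping is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a B/BL at `site` can reach `target` directly.
 * The encodable displacement is [-2^27, 2^27 - 4] in steps of 4. */
bool sketch_call26_in_range(uint64_t site, uint64_t target)
{
    int64_t delta = (int64_t)(target - site);
    assert((delta & 3) == 0);   /* AArch64 instructions are 4-byte aligned */
    return delta >= -(INT64_C(1) << 27) && delta < (INT64_C(1) << 27);
}
```

When the check fails, the call is redirected to a nearby island that loads the full 64-bit target and branches through a register.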
Phase 4 — relocation emission (link_reloc_layout.c) and dynamic synthesis
link_emit_relocations walks every input relocation, skips ones whose
source section was dropped, redirects GOT-using relocs to their GOT slot
(via the GOT map) and AArch64 JIT-call relocs to their stub (via the stub
map), and emits a LinkRelocApply record with the write site in
image-relative coordinates, the resolved target LinkSymId, the kind,
and the addend. These records are the durable, non-destructive output of
resolve (invariant 3); nothing is patched into bytes yet.
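To make "records now, patching later" concrete, here is a sketch of what a consumer might do with one record at emit or map time. It is an assumption-laden simplification: the record carries an already-resolved target vaddr instead of a LinkSymId, only two illustrative kinds exist (SK_R_ABS64 and SK_R_PC32 are hypothetical names), and values are written in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { SK_R_ABS64, SK_R_PC32 } SketchRelocKind;

typedef struct {
    uint64_t write_vaddr;    /* where to patch */
    SketchRelocKind kind;
    uint64_t target_vaddr;   /* simplification: real records name a symbol */
    int64_t addend;
} SketchReloc;

/* Patch one record into a segment buffer that starts at seg_vaddr. */
void sketch_apply(uint8_t *bytes, size_t len, uint64_t seg_vaddr,
                  const SketchReloc *r)
{
    uint64_t off = r->write_vaddr - seg_vaddr;
    uint64_t s_plus_a = r->target_vaddr + (uint64_t)r->addend;

    if (r->kind == SK_R_ABS64) {
        assert(off + 8 <= len);
        memcpy(bytes + off, &s_plus_a, 8);              /* S + A */
    } else {
        uint32_t v = (uint32_t)(s_plus_a - r->write_vaddr);
        assert(off + 4 <= len);
        memcpy(bytes + off, &v, 4);                     /* S + A - P */
    }
}
```

Since the record is never consumed, the same list can be replayed against shifted placements, which is what lets the ELF emitter and the JIT mapper each apply their own base.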
For PIE/DSO output the format's layout_dyn hook (src/obj/elf/link_dyn.c)
then synthesizes the dynamic sections — .interp, .dynsym, .dynstr,
.gnu.hash, .plt, .got.plt, .rela.plt, .rela.dyn, .dynamic —
recording one JUMP_SLOT per imported function and a PLT entry per import.
Its layout invariants (dynsym slot 0 reserved, imports ordered
PLT-functions-then-GOT-data, the three reserved .got.plt slots) live in
LinkDynState (link_internal.h). The .rela.dyn RELATIVE tail is filled
during emit, when internal absolute relocs are seen.
link_resolve_entry looks up the entry symbol (the per-format default
from obj_format_default_entry_name: _start for ELF, _main for
Mach-O) and stamps it on the image.
Emit / consume
link_emit_image_writer dispatches by target object format to the ELF /
Mach-O / COFF link_emit function. That emitter is where invariant 4's
shift happens (shift_image_addresses) and where the LinkRelocApply
records are finally applied into the output bytes (apply_all_relocs),
with imported targets routed through PLT/GLOB_DAT and internal absolutes
turned into RELATIVE records under PIE. Image identity
(link_image_id_compute) is a format-agnostic 16-byte hash over
post-shift segment bytes and vaddrs, wrapped per-format (ELF build-id
note, Mach-O LC_UUID, PE debug directory). See OBJ.md for the
format writers. Alternatively kit_jit_from_image maps the image into
executable memory — that is JIT.md's territory.
Partial / relocatable linking (link_relocatable.c)
ld -r is a deliberately separate path: link_emit_relocatable_writer
builds a fresh ObjBuilder rather than a LinkImage. A relocatable
output must preserve object-file structure — keep non-alloc sections,
leave unresolved externals as relocatable references, assign no final
vaddrs, synthesize no GOT/PLT/IFUNC/entry state. So it merges input
sections into compatible output sections, merges globals (with the same
binding-strength policy, including COMMON merging), copies symbols,
COMDAT groups, and relocations with their symbol/section references
rewritten into the output id space, then emits through the object-format
writer. Archive ingest still runs (so -r over an archive pulls members),
but linker scripts are rejected on this path.
Incremental linking
Incremental relink avoids paying O(whole program) for a one-line edit.
The four invariants exist precisely to keep this addable without
reworking the core. There are two tiers, and they are at different
levels of realization.
Append-only, in-process (JIT) — the realized mechanism
This is the one incremental path that exists today. A live JIT image
grows by appending new objects without ever moving a previously
published address. It lives on the JIT side (kit_jit_append_obj,
append cursors and reserved per-bucket slack in src/link/link_jit.c) and
serves kit dbg. Its hard invariant is that any observable runtime
address — a lookup result, a breakpoint, a return address, a DWARF PC
range — never changes: new code may reference old code, old code is
never repatched, and an append that would exhaust a bucket's reserved
slack fails rather than relocating. See JIT.md.
Forward-compat surface for file-based patch (AOT) — designed only
Two internal entry points — link_resolve_at (base-pinned resolve) and
link_resolve_extend (append new inputs to an existing image) — are
declared and reserved but not yet implemented; they are panic stubs. They
exist so the invariants above have a concrete shape to satisfy, not as a
working feature. The intended design they anchor: patch a prior on-disk
image instead of relinking from scratch — diff a changed input's atoms by
content hash against a persisted placement table, overwrite unchanged-size
atoms in per-atom slack, relocate grown atoms via a move primitive (a jump
island, later a GOT cell), and re-derive only the touched relocations from
current placements. The design is gated by a soundness check: apply
incrementally only when an edit provably cannot change symbol resolution
(no added/removed/rebound global, no new archive pull-in, no
COMDAT-ownership flip, no TLS/import size change, no slack exhaustion);
otherwise fall back to a full — but in-memory, so cheap — relink, because
a correct-but-slow result always beats a fast-but-wrong one. The substrate
that design leans on — the durable LinkRelocApply records, the stable
input-id mapping, and atom granularity — is the same substrate the
realized JIT path already uses.