LINK.md — the kit linker

The linker turns a set of relocatable inputs (objects, archives, shared objects, raw byte buffers) into a single resolved image: a static ET_EXEC, a position-independent ET_DYN, a partial ET_REL, or an in-process JIT mapping. It is a multi-format, multi-arch component built as a strict pipeline of passes over an immutable input set. This document describes that architecture — the layering, the data flow, and the invariants that hold the whole thing together. For how a resolved image becomes runnable in process see JIT.md; for the object-file read/write substrate underneath it see OBJ.md; for per-target relocation kinds and register/ABI detail see ARCH.md; for debug-section retention see DWARF.md.

Where the linker sits

The public surface is KitLinkSession (include/kit/link.h); the driver tools ld, cc, run, and dbg drive it. Inside libkit the real work is the internal Linker / LinkImage pair declared in src/link/link.h and src/link/link_internal.h. The session is a thin wrapper: it owns a Linker, accumulates inputs, and on resolve produces a LinkImage. Path handling (reading bytes off disk, -l search paths, sysroots) lives entirely in the driver; the library boundary is byte-buffer-shaped. The driver reads every byte-buffer input through Compiler.env->file_io before it reaches the linker.

The two central abstractions:

Linker — the mutable accumulator. Holds the registered inputs (objects, archives, DSOs), the entry name, link-mode flags (PIE, static-exe, JIT, gc-sections, strip-debug), an optional linker script, and an optional external resolver. It is built up by the link_add_* / link_set_* calls and then read by resolution. It is never rewritten by resolution.
LinkImage — the resolved output. A fresh image is produced by every link_resolve. It owns the symbol table, the section and segment tables, the per-segment byte buffers, the durable relocation records, and any synthesized dynamic-link / IFUNC / debug state. It is the read-side view consumed by the format emitters, the JIT mapper, and the debug-info view.

The load-bearing invariants

Three rules are stated in the link.h header comment and enforced throughout. They exist so that incremental re-resolution can be added without reworking the core; they also make the single-shot path easier to reason about.

1. Inputs are never mutated; resolve is a function from inputs to a fresh image. link_resolve reads the Linker and allocates a new LinkImage; it does not edit input ObjBuilders or rewrite the Linker in place. Re-resolving the same Linker would yield another independent image.
2. LinkInputId / ObjBuilder* mappings are stable for the Linker's lifetime. Adding an input never invalidates an existing handle. ObjSymId / ObjSecId are per-ObjBuilder id spaces, so each input carries an InputMap (link_internal.h) translating its local ids into the global LinkSymId / LinkSectionId space.
3. Relocation records stay as data; they are never burned destructively into segment bytes during resolve. link_emit_relocations produces LinkRelocApply records, each a (write site, kind, target symbol, addend) tuple, and stores them on the image. The actual patching of bytes happens later, at emit time (format emitter) or map time (JIT mapper). The segment byte buffers produced during layout hold raw, unrelocated input bytes.
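
The third rule can be pictured with a small sketch. It is not kit's code: SketchReloc and sketch_apply_abs64 are hypothetical names, the record is a simplified stand-in for LinkRelocApply, and only a little-endian absolute 64-bit kind is modeled. The point is that the record is plain data and the consumer, not resolve, decides when bytes get patched and against which base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a durable relocation record. */
typedef struct {
    uint64_t write_site;   /* image-relative offset of the field to patch */
    uint64_t target_vaddr; /* image-relative vaddr of the target symbol   */
    int64_t  addend;
} SketchReloc;

/* Patch one absolute 64-bit little-endian field into a consumer-owned
 * copy of the bytes. `base` is the consumer's runtime base. */
void sketch_apply_abs64(uint8_t *bytes, size_t len,
                        const SketchReloc *r, uint64_t base)
{
    assert(r->write_site <= len && len - r->write_site >= 8);
    uint64_t v = base + r->target_vaddr + (uint64_t)r->addend;
    for (int i = 0; i < 8; i++)
        bytes[r->write_site + i] = (uint8_t)(v >> (8 * i));
}
```

Because the record survives the patch, the same record can be applied again to a fresh copy of the raw bytes with a different base.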

A fourth invariant governs addresses:

Image-relative vaddr discipline. Every vaddr and file_offset on a resolved LinkImage is computed as if the image were based at 0. Layout, symbol vaddrs, GOT/PLT placement, and reloc write-sites are all in this coordinate system. Consumers add their own runtime base exactly once: the ELF emitter bumps everything by img_base (shift_image_addresses in src/obj/elf/link.c — 0x400000 for static ET_EXEC, 0 for PIE/DSO so the loader picks the base), and the JIT mapper bumps by the chosen reservation address. Because relocations are re-derived from (post-shift) placements, an image can be shifted wholesale by adding one delta to every coordinate.
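
A minimal sketch of the "add the base exactly once" rule follows. The type and function names are hypothetical (this is not shift_image_addresses itself); the two base values are the ones quoted above for ELF.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placement record: anything that carries an image-relative vaddr. */
typedef struct {
    uint64_t vaddr;
    uint64_t size;
} SketchPlacement;

/* Base choice described in the text: 0x400000 for static ET_EXEC,
 * 0 for PIE/DSO so the loader picks the base. */
uint64_t sketch_elf_base(bool position_independent)
{
    return position_independent ? 0 : UINT64_C(0x400000);
}

/* Shift every coordinate by one delta. Call this once per consumer. */
void sketch_shift_image(SketchPlacement *p, size_t n, uint64_t base)
{
    for (size_t i = 0; i < n; i++)
        p[i].vaddr += base;
}
```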

The pass pipeline

link_resolve (src/link/link_layout.c) orchestrates the whole pipeline. The phases, with the file that owns each:

  inputs (Linker)
    |
    | link_synth_coff_ctor_dtor_list   (PE/COFF CRT boundary synth)
    | link_ingest_archives             ── archive member selection
    v
  link_resolve_symbols                 ── build global symbol table   [resolve.c]
  link_gc_compute                      ── --gc-sections liveness (BFS)
    |
    v
  link_layout_sections                 ── bucket + place sections     [layout.c]
  link_layout_commons                  ── COMMON -> .bss.common
  link_emit_segment_bytes              ── copy raw input bytes
  link_layout_debug                    ── carry .debug_* (file-only)
    |
    v
  link_assign_symbol_vaddrs            ── symbol -> vaddr              [reloc_layout.c]
  link_emit_*_boundaries               ── __init_array_start, __tdata_*, __start_X ...
  link_resolve_undefs                  ── globals / DSO imports / resolver
  link_gc_drop_dead_globals
  link_layout_iplt                     ── STT_GNU_IFUNC trampolines
  link_layout_jit_stubs                ── AArch64 JIT call islands
  link_layout_got                      ── static-PIC .got
  link_emit_relocations                ── LinkRelocApply records
  fmt->layout_dyn                       ── PIE/DSO synthetic dyn sections [obj/elf/link_dyn.c]
  link_resolve_entry                   ── entry symbol lookup
  link_capture_debug_inputs            ── retain ObjBuilders for JIT view
    |
    v
  LinkImage  ──> link_emit_image_writer (format emit) | kit_jit_from_image

Phase 0 — input registration and archive selection (link.c, link_resolve.c)

link_add_obj borrows a caller-owned ObjBuilder. link_add_obj_bytes detects the binary format, reads bytes into a linker-owned ObjBuilder, and — via the format's classify_obj_input hook — reclassifies the input as a DSO if the bytes are a shared object. link_add_dso_bytes parses an ET_DYN explicitly, materializing only its exported (dynsym) symbols. link_add_archive_bytes eagerly parses every member into an ObjBuilder at registration time but defers the include/exclude decision to resolve.

A DSO input contributes nothing to layout. Its presence only influences resolution (an undef matched by name against its exports becomes an imported symbol) and DT_NEEDED bookkeeping (its SONAME, or filename fallback, is recorded as a runtime dependency).

Archive member selection (link_ingest_archives) is the demand-driven pull familiar from GNU ld. --whole-archive members are pulled unconditionally first. The rest are scanned in input order: for each archive a presence scan (scan_presence_before) computes the set of defined and still-wanted undefined globals from all inputs that come before that archive in link order; any member that defines a wanted, not-yet-defined global is pulled, and the scan repeats until a fixed point so a freshly pulled member can drag in its own dependencies. Spurious header-artifact undefs (unreferenced extern prototypes) are excluded from the want set so an unused declaration never pulls a member. Archives in the same nonzero group_id form a --start-group cycle. Pulled members move into Linker.inputs and get stable ids like any other input. PE/COFF has two special cases handled here: short-import shim members route through the DSO path (their symbols are DLL exports), and a synthetic ObjBuilder supplies the mingw CRT ctor/dtor boundary symbols and an AArch64 __chkstk.
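
The fixed-point pull can be sketched as below. This is an illustration only: symbols are modeled as bits in a 32-bit mask, SketchMember and sketch_pull_members are hypothetical names, and --whole-archive, groups, and the header-artifact filter are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical archive member; bit i stands for global symbol i. */
typedef struct {
    uint32_t defines; /* globals this member defines    */
    uint32_t needs;   /* globals this member references */
    bool     pulled;
} SketchMember;

/* `defined` and `needed` start as the state of all inputs that come
 * before the archive. Returns how many members were pulled. */
size_t sketch_pull_members(SketchMember *m, size_t n,
                           uint32_t *defined, uint32_t *needed)
{
    size_t pulled = 0;
    bool changed = true;
    while (changed) {                 /* repeat until a fixed point */
        changed = false;
        for (size_t i = 0; i < n; i++) {
            uint32_t wanted = *needed & ~*defined;
            if (m[i].pulled || (m[i].defines & wanted) == 0)
                continue;
            m[i].pulled = true;       /* member satisfies a wanted global */
            *defined |= m[i].defines;
            *needed  |= m[i].needs;   /* it may drag in its own deps */
            pulled++;
            changed = true;
        }
    }
    return pulled;
}
```

The outer loop matters when a member that appears earlier in the archive only becomes wanted after a later member is pulled.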

Phase 1 — symbol resolution (link_resolve.c)

link_resolve_symbols walks every (non-DSO) input's symbols, allocating its InputMap and appending a LinkSymbol per local symbol while building img->globals — an open-addressed name→LinkSymId hash for global/weak definitions. Locals never enter that hash. When two inputs define the same global, a binding-strength policy decides the winner: GLOBAL beats WEAK beats LOCAL; two COMMON symbols merge to the larger size with the stricter alignment; a real definition overrides a COMMON; two strong definitions are an error — except COFF/PE SELECTANY, where two COMDAT (SF_GROUP) definitions keep the earlier and mark the later section discarded (recorded in InputMap.comdat_discarded, honored by GC and layout).
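
The binding-strength policy reads naturally as a small merge function. The sketch below uses hypothetical types and covers only the cases the paragraph states; how WEAK interacts with COMMON, and which of two WEAK definitions wins, is not stated above, so the sketch simply keeps the earlier one in those cases (an assumption). SELECTANY is not modeled.

```c
#include <stdint.h>

typedef enum { SK_BIND_WEAK, SK_BIND_COMMON, SK_BIND_STRONG } SketchBind;

typedef struct {
    SketchBind bind;
    uint64_t   size;
    uint64_t   align;
} SketchDef;

typedef enum { SK_KEEP_OLD, SK_TAKE_NEW, SK_MERGED, SK_DUPLICATE } SketchOutcome;

/* Merge `incoming` into the current winner `old`, which is updated in place. */
SketchOutcome sketch_merge(SketchDef *old, const SketchDef *incoming)
{
    if (old->bind == SK_BIND_STRONG && incoming->bind == SK_BIND_STRONG)
        return SK_DUPLICATE;               /* two strong definitions: error */
    if (old->bind == SK_BIND_COMMON && incoming->bind == SK_BIND_COMMON) {
        if (incoming->size > old->size)   old->size = incoming->size;
        if (incoming->align > old->align) old->align = incoming->align;
        return SK_MERGED;                  /* larger size, stricter alignment */
    }
    if (incoming->bind == SK_BIND_STRONG) {
        *old = *incoming;                  /* strong beats weak and COMMON */
        return SK_TAKE_NEW;
    }
    return SK_KEEP_OLD;                    /* assumption: earlier one stays */
}
```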

link_resolve_undefs runs after layout has assigned vaddrs (it needs them) and settles every still-undefined symbol: against a defined global of the same name; else against a DSO export (becomes imported); else against the external resolver (becomes an absolute address — this is the JIT/host-symbol path); else a COFF mingw alias-by-naming-convention fallback; else, for a weak undef, resolves to absolute zero; else it is a hard "undefined reference" error. A JIT-mode escape hatch tolerates Mach-O __tlv_bootstrap.

The atom model underlies GC and layout: an ObjBuilder section may be subdivided into atoms (one function / one data object), and the InputMap records, per section, which atoms are live and which LinkSection each atom/section maps to. This lets --gc-sections operate at function granularity.

Phase 1b — garbage collection (link_resolve.c)

With --gc-sections off, link_gc_compute simply marks every kept allocatable section (or its atoms) live. With it on, GC is a BFS: roots are the entry symbol, retained (SF_RETAIN) and init/fini-array sections, and script-KEEP sections; the worklist follows relocations from live sections/atoms to the symbols they reference, marking each target's defining section/atom. __start_X / __stop_X references promote every section named X. After layout, link_gc_drop_dead_globals clears defined on symbols whose section was collected.
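
The liveness walk is an ordinary breadth-first mark over a reference graph. The sketch below works at section granularity only and uses hypothetical names; each SketchEdge stands for "a relocation in section `from` targets a symbol defined in section `to`". The edge scan is quadratic, which is fine for illustration but not how a real linker would index relocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_MAX_SECTIONS 64

typedef struct {
    size_t from; /* section holding the relocation      */
    size_t to;   /* section defining the target symbol  */
} SketchEdge;

/* Mark every section reachable from the roots; `live` has nsec entries. */
void sketch_gc_mark(size_t nsec, const SketchEdge *edges, size_t nedges,
                    const size_t *roots, size_t nroots, bool *live)
{
    size_t queue[SK_MAX_SECTIONS];
    size_t head = 0, tail = 0;

    assert(nsec <= SK_MAX_SECTIONS);
    for (size_t i = 0; i < nsec; i++)
        live[i] = false;
    for (size_t i = 0; i < nroots; i++) {
        assert(roots[i] < nsec);
        if (!live[roots[i]]) {
            live[roots[i]] = true;
            queue[tail++] = roots[i];
        }
    }
    while (head < tail) {              /* each section is enqueued at most once */
        size_t s = queue[head++];
        for (size_t e = 0; e < nedges; e++) {
            if (edges[e].from != s)
                continue;
            assert(edges[e].to < nsec);
            if (live[edges[e].to])
                continue;
            live[edges[e].to] = true;
            queue[tail++] = edges[e].to;
        }
    }
}
```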

Phase 2 — section and segment layout (link_layout.c)

link_layout_sections (the default, non-scripted path) partitions kept sections into four permission buckets — SEG_RX, SEG_R, SEG_RW, SEG_TLS — and lays them out grouped by name within each bucket, in first-occurrence order. Same-name contributions are placed adjacently so the format emitter can merge them into one output section. NOBITS (.bss, .tbss) sections are tracked as trailing zero-fill so a segment's mem_size exceeds its file_size. One LinkSegment is materialized per non-empty bucket; segments are assigned image-relative, page-aligned vaddrs back-to-back from 0, and every section's vaddr / file_offset is fixed up into its segment. A PIE quirk lives here: read-only data carrying an absolute reloc is promoted from SEG_R to SEG_RW, because the dynamic loader must write the relocated pointer into the slot and a never-writable segment would fault.
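
The back-to-back, page-aligned placement can be sketched as a single cursor walk. The names are hypothetical and the 4 KiB page size is an assumption for the example; the real page size depends on the target.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE UINT64_C(0x1000) /* assumed page size for this sketch */

typedef struct {
    uint64_t file_size; /* bytes present in the file          */
    uint64_t mem_size;  /* file_size plus trailing zero-fill  */
    uint64_t vaddr;     /* image-relative, filled in below    */
} SketchSegment;

/* Round v up to a multiple of a; a must be a power of two. */
uint64_t sketch_align_up(uint64_t v, uint64_t a)
{
    return (v + a - 1) & ~(a - 1);
}

/* Place segments in order starting at 0; returns the end of the image. */
uint64_t sketch_place_segments(SketchSegment *s, size_t n)
{
    uint64_t cursor = 0;
    for (size_t i = 0; i < n; i++) {
        s[i].vaddr = sketch_align_up(cursor, SK_PAGE);
        cursor = s[i].vaddr + s[i].mem_size; /* NOBITS tail occupies memory */
    }
    return cursor;
}
```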

link_layout_commons allocates all surviving COMMON symbols into a synthetic .bss.common section appended to the writable segment, assigning each its offset and rewriting it to a normal SK_OBJ definition. link_emit_segment_bytes then copies each section's raw input bytes into its segment buffer (skipping NOBITS) — no relocations are applied, per invariant 3. On the JIT lane this byte copy is skipped: the mapper copies input bytes straight into execmem.

link_layout_debug carries .debug_* sections through to AOT ELF/Mach-O output as file-only LinkSections: they live in img->sections (so their SK_SECTION symbols resolve and the reloc engine applies to them) but carry segment_id == LINK_SEG_NONE and their own byte buffers in the image's debug registry, getting no PT_LOAD. Same-name contributions are assigned a per-name cumulative base (a DWARF-section-relative offset) so the emitter merges them into one output section with correct cross-section offsets. The JIT lane serves debug differently (via kit_jit_view, see DWARF.md), so it skips this pass; strip mode drops it. See DWARF.md for the producer/reader side.

Scripted layout (link_layout.c, link_script.c)

When a linker script is set, link_layout_sections_scripted replaces the bucket path: it walks the script's output sections in declaration order, placing matched input sections at a "dot" location counter, materializing one segment per non-DISCARD output section and turning script symbol assignments into defined global symbols. /DISCARD/ matches leave the input section's InputMap slot as LINK_SEC_NONE, which downstream passes already treat as "dropped". A scripted image is flagged so the emitter keeps script-assigned absolute vaddrs and omits the self-describing header PT_LOAD / build-id note.

The script itself is parsed by kit_link_script_parse (link_script.c), a hand-written recursive-descent parser for a deliberately small GNU-ld subset: ENTRY(sym), top-level and in-section symbol assignments with a small arithmetic-expression grammar, . = expr dot moves and alignment, SECTIONS { output : { input-matchers } }, and /DISCARD/. Unsupported directives (MEMORY, PHDRS, PROVIDE, OVERLAY, OUTPUT_FORMAT, GROUP, ...) are rejected with a diagnostic rather than silently ignored. The linker accepts only the structured KitLinkScript form — there is no text setter on the Linker; hosts that have GNU-ld text run the parser first. Input matchers use a *-only glob.
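
A *-only glob is small enough to show in full. The function below is a sketch of one common way to write such a matcher (greedy match with backtracking to the last star); it is not taken from link_script.c, and the name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Match `str` against `pat`, where '*' matches any run of characters
 * (including none) and every other character matches itself. */
bool sketch_glob_star(const char *pat, const char *str)
{
    const char *star = NULL;  /* position of the most recent '*' in pat */
    const char *retry = NULL; /* where in str that '*' started matching */

    while (*str) {
        if (*pat == '*') {
            star = pat++;
            retry = str;
        } else if (*pat == *str) {
            pat++;
            str++;
        } else if (star) {
            pat = star + 1;   /* let the last '*' absorb one more char */
            str = ++retry;
        } else {
            return false;
        }
    }
    while (*pat == '*')       /* trailing stars match the empty string */
        pat++;
    return *pat == '\0';
}
```

With this matcher a pattern such as .text* covers both .text and .text.foo, while .text.* requires at least the trailing dot.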

Phase 3 — post-placement vaddr / boundary / GOT / PLT / IPLT (link_reloc_layout.c)

With sections placed, link_assign_symbol_vaddrs binds every defined symbol to section.vaddr + (symbol.value - section.obj_offset). Then a family of boundary passes synthesize the linker-defined globals that C runtimes expect: __init_array_start/end, __fini_array_start/end, preinit equivalents; the TLS boundaries __tdata_start/end and __tbss_size; the encoding-section __start_X / __stop_X pairs; and target/format globals such as _GLOBAL_OFFSET_TABLE_, _DYNAMIC, __dso_handle, the RISC-V global pointer, and PE __ImageBase.
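
As a worked instance of the binding formula, with hypothetical names and made-up numbers: a symbol whose value is 0x58 inside an input section that starts at object offset 0x40 and was placed at image-relative vaddr 0x2000 lands at 0x2018.

```c
#include <assert.h>
#include <stdint.h>

/* section.vaddr + (symbol.value - section.obj_offset), as stated above. */
uint64_t sketch_symbol_vaddr(uint64_t section_vaddr,
                             uint64_t section_obj_offset,
                             uint64_t symbol_value)
{
    assert(symbol_value >= section_obj_offset); /* symbol lies in the section */
    return section_vaddr + (symbol_value - section_obj_offset);
}
```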

Three synthesis passes append new segments/sections to the image after the user payload (each using the link_iplt_alloc_* growth helpers, which keep image-owned tables resizable):

link_layout_iplt — for every defined STT_GNU_IFUNC symbol, builds a per-arch resolver trampoline plus its .igot.plt slot and records the (resolver_vaddr, slot_vaddr) pairs on the image. On the static-exe path it also wires a .init_array entry calling __kit_ifunc_init so slots are filled at startup; the JIT path resolves them in process instead.
link_layout_jit_stubs — AArch64-only, JIT-lane: synthesizes call/jump islands for CALL26 / JUMP26 relocs whose targets may sit outside ±128 MB of the call site once mapped, and returns a per-symbol stub map.
link_layout_got — a static-PIC .got: scans relocations for GOT-relative kinds (reloc_uses_got), allocates one 8-byte slot per referenced symbol in a single exactly-sized .got segment placed after everything, defines a local symbol per slot, and emits an R_ABS64 record to fill each slot. Returns a per-symbol GOT map.
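
The ±128 MB figure in the JIT-stub pass comes from the AArch64 B/BL encoding: a signed 26-bit word offset, so the byte displacement must be a multiple of 4 in the range [-2^27, 2^27). A minimal reachability check, with a hypothetical name, might look like this; a call whose target fails the check is the kind that would need an island.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a B/BL at `site` can reach `target` directly.
 * The unsigned-to-signed cast assumes two's complement, which holds
 * on every AArch64 toolchain. */
bool sketch_call26_in_range(uint64_t site, uint64_t target)
{
    int64_t delta = (int64_t)(target - site);
    return delta >= -(INT64_C(1) << 27) &&
           delta <   (INT64_C(1) << 27) &&
           (delta & 3) == 0;
}
```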

Phase 4 — relocation emission (link_reloc_layout.c) and dynamic synthesis

link_emit_relocations walks every input relocation, skips ones whose source section was dropped, redirects GOT-using relocs to their GOT slot (via the GOT map) and AArch64 JIT-call relocs to their stub (via the stub map), and emits a LinkRelocApply record with the write site in image-relative coordinates, the resolved target LinkSymId, the kind, and the addend. These records are the durable, non-destructive output of resolve (invariant 3); nothing is patched into bytes yet.

For PIE/DSO output the format's layout_dyn hook (src/obj/elf/link_dyn.c) then synthesizes the dynamic sections — .interp, .dynsym, .dynstr, .gnu.hash, .plt, .got.plt, .rela.plt, .rela.dyn, .dynamic — recording one JUMP_SLOT per imported function and a PLT entry per import. Its layout invariants (dynsym slot 0 reserved, imports ordered PLT-functions-then-GOT-data, the three reserved .got.plt slots) live in LinkDynState (link_internal.h). The .rela.dyn RELATIVE tail is filled during emit, when internal absolute relocs are seen.

link_resolve_entry looks up the entry symbol (the per-format default from obj_format_default_entry_name: _start for ELF, _main for Mach-O) and stamps it on the image.

Emit / consume

link_emit_image_writer dispatches by target object format to the ELF / Mach-O / COFF link_emit function. That emitter is where invariant 4's shift happens (shift_image_addresses) and where the LinkRelocApply records are finally applied into the output bytes (apply_all_relocs), with imported targets routed through PLT/GLOB_DAT and internal absolutes turned into RELATIVE records under PIE. Image identity (link_image_id_compute) is a format-agnostic 16-byte hash over post-shift segment bytes and vaddrs, wrapped per-format (ELF build-id note, Mach-O LC_UUID, PE debug directory). See OBJ.md for the format writers. Alternatively kit_jit_from_image maps the image into executable memory — that is JIT.md's territory.

Partial / relocatable linking (link_relocatable.c)

ld -r is a deliberately separate path: link_emit_relocatable_writer builds a fresh ObjBuilder rather than a LinkImage. A relocatable output must preserve object-file structure — keep non-alloc sections, leave unresolved externals as relocatable references, assign no final vaddrs, synthesize no GOT/PLT/IFUNC/entry state. So it merges input sections into compatible output sections, merges globals (with the same binding-strength policy, including COMMON merging), copies symbols, COMDAT groups, and relocations with their symbol/section references rewritten into the output id space, then emits through the object-format writer. Archive ingest still runs (so -r over an archive pulls members), but linker scripts are rejected on this path.

Incremental linking

Incremental relink avoids paying O(whole program) for a one-line edit. The four invariants exist precisely to keep this addable without reworking the core. There are two tiers, and they are at different levels of realization.

Append-only, in-process (JIT) — the realized mechanism

This is the one incremental path that exists today. A live JIT image grows by appending new objects without ever moving a previously published address. It lives on the JIT side (kit_jit_append_obj, append cursors and reserved per-bucket slack in src/link/link_jit.c) and serves kit dbg. Its hard invariant is that any observable runtime address — a lookup result, a breakpoint, a return address, a DWARF PC range — never changes: new code may reference old code, old code is never repatched, and an append that would exhaust a bucket's reserved slack fails rather than relocating. See JIT.md.
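
The "fail rather than relocate" rule amounts to a bump allocator over a fixed reservation. The sketch below uses hypothetical names and is not the code in link_jit.c; it only shows that an append either lands past everything already published or is refused, leaving the cursor untouched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-bucket append state. */
typedef struct {
    uint64_t cursor; /* first free address in the bucket       */
    uint64_t limit;  /* end of the bucket's reserved slack     */
} SketchBucket;

/* Reserve `size` bytes at `align` (a power of two). On success writes the
 * address to *out. On exhaustion returns false and changes nothing, so no
 * previously published address ever moves. */
bool sketch_append(SketchBucket *b, uint64_t size, uint64_t align, uint64_t *out)
{
    uint64_t start = (b->cursor + align - 1) & ~(align - 1);
    if (start > b->limit || size > b->limit - start)
        return false;
    *out = start;
    b->cursor = start + size;
    return true;
}
```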

Forward-compat surface for file-based patch (AOT) — designed only

Two internal entry points — link_resolve_at (base-pinned resolve) and link_resolve_extend (append new inputs to an existing image) — are declared and reserved but not yet implemented; they are panic stubs. They exist so the invariants above have a concrete shape to satisfy, not as a working feature. The intended design they anchor: patch a prior on-disk image instead of relinking from scratch — diff a changed input's atoms by content hash against a persisted placement table, overwrite unchanged-size atoms in per-atom slack, relocate grown atoms via a move primitive (a jump island, later a GOT cell), and re-derive only the touched relocations from current placements. The design is gated by a soundness check: apply incrementally only when an edit provably cannot change symbol resolution (no added/removed/rebound global, no new archive pull-in, no COMDAT-ownership flip, no TLS/import size change, no slack exhaustion); otherwise fall back to a full — but in-memory, so cheap — relink, because a correct-but-slow result always beats a fast-but-wrong one. The substrate that design leans on — the durable LinkRelocApply records, the stable input-id mapping, and atom granularity — is the same substrate the realized JIT path already uses.
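
Since this tier is design only, the following is purely an illustration of the soundness gate's shape, not an existing API. The struct and function are hypothetical; the five conditions are the ones listed above.

```c
#include <stdbool.h>

/* Hypothetical summary of what an edit did, one flag per listed condition. */
typedef struct {
    bool global_added_removed_or_rebound;
    bool new_archive_pull_in;
    bool comdat_ownership_flip;
    bool tls_or_import_size_change;
    bool slack_exhausted;
} SketchEditEffects;

/* Patch in place only when no condition fired; otherwise the caller
 * falls back to a full relink. */
bool sketch_can_patch_incrementally(const SketchEditEffects *e)
{
    return !(e->global_added_removed_or_rebound ||
             e->new_archive_pull_in ||
             e->comdat_ownership_flip ||
             e->tls_or_import_size_change ||
             e->slack_exhausted);
}
```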
