JIT
kit's in-process JIT maps a fully linked program into the running
process's address space and hands back callable function pointers. There
is no separate "JIT compiler": the same linker that writes ELF/Mach-O/PE
files produces a resolved LinkImage, and the JIT mapper copies that
image into executable memory, applies relocations against the live
runtime addresses, and exposes a symbol/inspector surface. The mapper is
kit_jit_from_image in src/link/link_jit.c. The kit run driver
(driver/cmd/run.c) is the headline consumer; the JIT debugger (see
DBG.md) and the emulator's block translator (see EMU.md)
ride on the same mapping primitives.
Where the JIT sits
inputs (.c/.o/.a/.wat)
| frontend + codegen (see FRONTENDS.md, CODEGEN.md)
v
ObjBuilders --link_add_obj--> Linker (jit_mode=1, jit_host set)
| resolve + layout (see LINK.md)
v
LinkImage (segments, relocs, resolved symbols; bytes NOT serialized)
| kit_jit_from_image (src/link/link_jit.c)
v
KitJit (mapped exec memory + symbol inspector)
| kit_jit_lookup / addr_to_sym / sym_iter / view
v
host code calls the JITed entry in-process
The frontend-to-image half is shared verbatim with the file linker. A
KitLinkSession opened with output kind KIT_LINK_OUTPUT_JIT (see
src/api/link.c) sets two pieces of state on the Linker: jit_mode,
which tells layout to skip file serialization and synthesize JIT-only
stubs/GOT, and the JIT host (KitJitHost), the vtable through which
libkit reaches the executable-memory allocator without itself depending
on any OS. kit_link_session_jit then calls
kit_jit_from_image, transferring ownership of the image (and the
linker that backs it) into the returned KitJit.
Because the code runs in this process, the JIT only produces runnable
code when the target arch and object format match the host. The driver
defaults the target to the host, but -target overrides it without a
guard: libkit lowers and lays out for whatever target the compiler was
created with (kit_jit_image_arch simply reports target.arch), so a
cross-target kit run emits native code the host CPU cannot execute
and fails at runtime rather than with a diagnostic. Enforcing
target==host is the caller's responsibility. The JIT always lowers PIC;
-fPIC/-fPIE/-mcmodel are accepted by the driver but have no
observable effect.
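Because the target==host guard is left to the caller, a driver-side check might look like the sketch below. This is a minimal illustration under assumptions. The Arch enum, host_arch, and jit_target_runnable are hypothetical names, not libkit API. Only the architecture is compared; an object-format check would follow the same pattern.

```c
#include <stdbool.h>

/* Hypothetical arch tags; kit's real target descriptor is not shown here. */
typedef enum { ARCH_UNKNOWN, ARCH_X86_64, ARCH_AARCH64, ARCH_RISCV64 } Arch;

/* The arch this process was compiled for, from standard predefined macros. */
static Arch host_arch(void) {
#if defined(__x86_64__) || defined(_M_X64)
    return ARCH_X86_64;
#elif defined(__aarch64__) || defined(_M_ARM64)
    return ARCH_AARCH64;
#elif defined(__riscv) && __riscv_xlen == 64
    return ARCH_RISCV64;
#else
    return ARCH_UNKNOWN;
#endif
}

/* Caller-side guard: refuse to JIT for an arch the host CPU cannot run. */
static bool jit_target_runnable(Arch target) {
    Arch host = host_arch();
    return host != ARCH_UNKNOWN && target == host;
}
```

A driver would call this before opening a KIT_LINK_OUTPUT_JIT session and emit a diagnostic on failure, instead of letting the mismatch surface as a crash.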
The single contiguous reservation
The defining invariant of kit_jit_from_image is that the entire
image lives in one execmem->reserve mapping. Layout assigns every
segment a page-aligned image vaddr inside a single span [image_base, image_end); the mapper reserves that whole span (plus append slack, see
below) in one call and then treats each segment as a sub-range at offset
vaddr - image_base. No segment gets its own independent mmap.
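A minimal sketch of that layout arithmetic follows. It assumes a hypothetical Seg record and a power-of-two page size. The span is the page-rounded hull of all segments, and each segment's runtime address is the reservation base plus its offset from image_base. The helper names are illustrative, not kit's.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment record: an image vaddr and a size, as layout assigns. */
typedef struct { uint64_t vaddr; uint64_t size; } Seg;

/* One span covering every segment: [base, end), rounded out to pages.
   Requires n > 0 and page a power of two. */
static void image_span(const Seg *segs, size_t n, uint64_t page,
                       uint64_t *base, uint64_t *end) {
    uint64_t lo = UINT64_MAX, hi = 0;
    for (size_t i = 0; i < n; i++) {
        if (segs[i].vaddr < lo) lo = segs[i].vaddr;
        if (segs[i].vaddr + segs[i].size > hi) hi = segs[i].vaddr + segs[i].size;
    }
    *base = lo & ~(page - 1);
    *end = (hi + page - 1) & ~(page - 1);
}

/* A segment is a sub-range of the one reservation, never its own mapping. */
static uintptr_t seg_runtime(uintptr_t reservation, uint64_t base, const Seg *s) {
    return reservation + (uintptr_t)(s->vaddr - base);
}
```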
This exists to keep inter-segment displacements in branch/addressing
range. AArch64 ADRP reaches ±4 GiB and CALL26 reaches ±128 MiB;
RISC-V AUIPC+branch and x86-64 RIP-relative loads have their own
windows. If code, rodata, and data were three independent mappings, the
OS could scatter them gigabytes apart and a perfectly legal cross-segment
reference would overflow the relocation's range check. One reservation
makes every intra-image displacement a function of the layout, not of
where the kernel happened to place three separate regions. The same
property is what lets a weak-undef symbol resolve through a zero-valued
slot, and what lets the far-call stubs (below) sit close enough to their
call sites.
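To make the range argument concrete, the sketch below shows a CALL26-style check and BL encoding. It is written from the AArch64 encoding: imm26 counts 4-byte words, so the reach is [-128 MiB, +128 MiB). The function names are hypothetical and this is not kit's encoder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range check in the shape of AArch64 CALL26/JUMP26: the displacement
   S + A - P must be 4-byte aligned and fit a signed 26-bit word count,
   i.e. lie in [-128 MiB, +128 MiB). */
static bool call26_in_range(uint64_t S, int64_t A, uint64_t P) {
    int64_t disp = (int64_t)(S + (uint64_t)A - P);
    if (disp & 3) return false;
    return disp >= -(INT64_C(1) << 27) && disp < (INT64_C(1) << 27);
}

/* Encode BL with an already-checked displacement (imm26 = disp / 4). */
static uint32_t encode_bl(int64_t disp) {
    return 0x94000000u | ((uint32_t)(disp >> 2) & 0x03FFFFFFu);
}
```

Inside one reservation, S - P is fixed by layout. Across independent mappings it could be something like 4 GiB, which this check rejects.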
vaddr_to_runtime / vaddr_to_write translate an image vaddr to the
two aliases of the master mapping (see W^X below). They scan the handful
of segments linearly, with a second pass that resolves a vaddr landing
exactly on a segment's one-past-end boundary (e.g. __fini_array_end).
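The two-pass lookup can be sketched as follows, using a hypothetical SegMap record and function name. The first pass only accepts addresses strictly inside a segment, so when two segments abut, the shared boundary resolves to the start of the later one. The second pass then catches one-past-end markers that no segment contains.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment record: image vaddr, size, and the runtime
   address of the segment's sub-range inside the reservation. */
typedef struct { uint64_t vaddr, size; uintptr_t runtime; } SegMap;

static uintptr_t vaddr_to_runtime(const SegMap *s, size_t n, uint64_t va) {
    /* Pass 1: va strictly inside some segment. */
    for (size_t i = 0; i < n; i++)
        if (va >= s[i].vaddr && va < s[i].vaddr + s[i].size)
            return s[i].runtime + (uintptr_t)(va - s[i].vaddr);
    /* Pass 2: va exactly on a segment's one-past-end boundary. */
    for (size_t i = 0; i < n; i++)
        if (va == s[i].vaddr + s[i].size)
            return s[i].runtime + (uintptr_t)s[i].size;
    return 0;  /* not in the image */
}
```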
Reloc-apply runs in-process on the shared path
The JIT does not have its own relocation engine. It iterates the image's
LinkRelocApply records and calls the same link_reloc_apply used by
the file writers (see LINK.md). The only JIT-specific twist is
the address arithmetic: the patch bytes are written through the write
alias, while the symbol value S and the patch-site address P are the
runtime alias addresses, because that is where the CPU will execute and
fetch from. A handful of relocation kinds get special in-process
handling before reaching link_reloc_apply:
- TLS-LE (ELF, AArch64/RISC-V/x86-64): a reloc whose RelocDesc carries
  RELOC_IS_TLS_LE. The loop stays arch-neutral: it computes the variable's
  in-image storage address and delegates the idiom rewrite to the arch's
  LinkArchDesc.jit_tls_le_relax, which drops the thread-pointer read. See
  "Thread-local storage" below.
- RISC-V PCREL_LO12_I/S: the low-12 half of an AUIPC pair targets a local
  anchor at the paired HI20 site. The mapper finds that paired reloc,
  recomputes the displacement against runtime addresses, and feeds it to
  the LO12 encoder so the two halves agree.
- Weak-undef (SK_ABS, bind weak, vaddr 0): address-of must yield NULL. An
  AArch64 ADRP/ADD pair against such a target would compute a displacement
  far outside ±4 GiB once the image is placed away from address 0, tripping
  the range check. The ADRP is therefore rewritten to MOVZ Xd,#0 and the
  paired ADD #0 is left as-is.
- Mach-O TLVP_LOAD_PAGEOFF12: the Mach-O TLV access is collapsed to an
  ordinary in-image load (see "Thread-local storage" below). The mapper
  rewrites the __thread_ptrs load to an add (descriptor address into the
  register), then rewrites the following thunk-load to ldr xN,[xN,#16] and
  nops the blr. The register ends up holding the variable's in-image
  storage address with no thunk call.
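Two of the in-process twists above can be sketched in isolation with hypothetical helper names. The first is a PC-relative patch that takes S and P from the runtime alias but stores through the write alias; it uses an x86-64 PC32-style S + A - P as the example kind. The second is the weak-undef ADRP to MOVZ Xd,#0 rewrite, which keeps the destination register field (bits 4:0).

```c
#include <stdint.h>
#include <string.h>

/* PC-relative 32-bit patch (PC32-style: S + A - P). S and P are runtime-alias
   addresses because that is where the CPU fetches; the bytes go through the
   write alias. Range check omitted; assumes a little-endian host. */
static void apply_pc32(unsigned char *write_base, uintptr_t runtime_base,
                       uint64_t site_off, uint64_t S, int64_t A) {
    uint64_t P = runtime_base + site_off;
    int32_t v = (int32_t)(S + (uint64_t)A - P);
    memcpy(write_base + site_off, &v, sizeof v);
}

/* Weak-undef rewrite: ADRP Xd,<page> -> MOVZ Xd,#0, keeping Rd.
   The paired ADD Xd,Xd,#0 then leaves Xd == 0, i.e. NULL. */
static int relax_weak_adrp(uint32_t *insn) {
    if ((*insn & 0x9F000000u) != 0x90000000u) return 0;  /* not ADRP */
    *insn = 0xD2800000u | (*insn & 0x1Fu);
    return 1;
}
```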
After relocations are applied, IFUNC resolvers (ELF only) are run
in-process and their results stored into the iplt slots, .init_array
constructors are run in forward order, and kit_jit_run_dtors runs
.fini_array in reverse on teardown.
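The constructor/destructor ordering amounts to the two loops below. This is a sketch with hypothetical names. The trace array and ctor_a/ctor_b exist only to make the order observable. IFUNC resolution is left out.

```c
#include <stddef.h>

typedef void (*InitFn)(void);

/* Demo entries that record the order they ran in. */
static char trace[8];
static size_t ntrace;
static void ctor_a(void) { trace[ntrace++] = 'a'; }
static void ctor_b(void) { trace[ntrace++] = 'b'; }

/* .init_array entries run first-to-last once relocations are applied. */
static void run_init_array(InitFn *begin, InitFn *end) {
    for (InitFn *f = begin; f != end; f++)
        if (*f) (*f)();
}

/* .fini_array entries run last-to-first at teardown. */
static void run_fini_array(InitFn *begin, InitFn *end) {
    while (end != begin) {
        end--;
        if (*end) (*end)();
    }
}
```

Running init over {ctor_a, ctor_b} and then fini over the same array records "abba".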
Call stubs and GOT slots for host calls
JITed code routinely needs to call into the host process — libc, or any
symbol resolvable via the link session's extern resolver. Those targets
resolve to real host addresses (SK_ABS), which can be arbitrarily far
from the JIT mapping. A direct CALL26/JUMP26 to libc would overflow
the branch range, and a far data reference needs an indirection slot.
Layout solves both with JIT-only passes (gated on jit_mode, skipped for
static-exe output) in src/link/link_layout.c and
src/link/link_reloc_layout.c:
- link_layout_jit_stubs synthesizes, for each SK_ABS target hit by a call
  relocation, a PLT-style stub plus an 8-byte pointer slot. The stub loads
  the slot and branches indirectly (e.g. AArch64 ADRP/LDR/BR); the slot is
  filled by a per-slot R_ABS64 against a synthetic resolver-pointer symbol
  that preserves the original host vaddr. link_emit_relocations then
  redirects the original CALL26/JUMP26 to the stub. The arch backend
  supplies the stub shape via
  needs_jit_call_stub/emit_iplt_stub/iplt_stub_size, the same machinery
  the ELF iplt uses (see ARCH.md).
- link_layout_got materializes one GOT slot per GOT-referenced symbol with
  an R_ABS64, and rewrites ADR_GOT_PAGE/LD64_GOT_LO12 to point at the slot.
  Weak-undef GOT slots simply hold 0.
Both the stub section and the slot section are ordinary subsegments of
the single contiguous reservation, so the redirected call's branch stays
in range. The net effect: kit run hello.c can call printf even
though printf lives in a libc loaded megabytes away.
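The source names the AArch64 stub shape as ADRP/LDR/BR. A sketch of encoding such a stub follows. Several details are assumptions of this sketch, not statements about emit_iplt_stub: the choice of x16 as the scratch register, the helper names, and the requirement that the slot be 8-byte aligned (for the scaled LDR offset). The ADRP ±4 GiB range check is omitted.

```c
#include <stdint.h>

/* ADRP Xd, target@page: page delta split into immlo (bits 30:29) and
   immhi (bits 23:5). */
static uint32_t a64_adrp(unsigned rd, uint64_t pc, uint64_t target) {
    int64_t pages = (int64_t)((target & ~0xFFFull) - (pc & ~0xFFFull)) >> 12;
    uint32_t immlo = (uint32_t)pages & 3u;
    uint32_t immhi = (uint32_t)(pages >> 2) & 0x7FFFFu;
    return 0x90000000u | (immlo << 29) | (immhi << 5) | rd;
}

/* Far-call stub: adrp x16, slot ; ldr x16, [x16, #lo12] ; br x16.
   slot must be 8-byte aligned for the scaled LDR immediate. */
static void a64_far_stub(uint32_t out[3], uint64_t stub_pc, uint64_t slot) {
    out[0] = a64_adrp(16, stub_pc, slot);
    out[1] = 0xF9400000u | ((uint32_t)((slot & 0xFFF) / 8) << 10)
           | (16u << 5) | 16u;
    out[2] = 0xD61F0000u | (16u << 5);
}
```

For example, a stub at 0x1000 whose slot sits at 0x2008 encodes as 0xB0000010, 0xF9400610, 0xD61F0200. That is adrp x16 to page 0x2000, ldr x16,[x16,#8], and br x16.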
Executable memory: the host vtable and W^X
libkit never calls mmap, VirtualAlloc, or mach_vm_remap itself.
All executable-memory operations go through KitExecMem in the JIT
host (include/kit/jit.h): reserve, protect, release,
flush_icache, and a page_size. The driver supplies the concrete
adapter via driver_env_to_jit_host (driver/env/). The contract has
two distinct shapes:
- Dual-mapping (Apple silicon via mach_vm_remap in driver/env/macos.c;
  Linux via memfd_create in linux.c; FreeBSD via memfd/shm_open in
  freebsd.c; Windows via CreateFileMappingW in windows.c). reserve with
  KIT_PROT_EXEC returns two virtual addresses backing the same physical
  pages: a write alias (RW, never X) and a runtime alias (X after protect,
  never W). The mapper populates through the write alias and the CPU
  fetches from the runtime alias, so no page is ever simultaneously
  writable and executable. A process-wide registry (exec_dual_* in
  driver/env/posix.c) lets the debugger recover the write alias for a
  given runtime address.
- Single-mapping (execmem_reserve_single, used for non-exec reservations
  or hosts without a dual-map primitive). write and runtime are the same
  address, and protect flips RW↔RX via mprotect.
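A Linux-only sketch of the dual-mapping shape follows, assuming glibc's memfd_create. One memfd is mapped twice with MAP_SHARED, so a store through the RW write alias is visible through the runtime alias. DualMap and the dual_* functions are hypothetical and do not reproduce KitExecMem's signatures. The runtime alias starts read-only here and gains PROT_EXEC only in the protect step, so neither alias is ever writable and executable at once.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct { unsigned char *write, *runtime; size_t size; int fd; } DualMap;

static int dual_reserve(DualMap *m, size_t size) {
    m->fd = memfd_create("jit", MFD_CLOEXEC);
    if (m->fd < 0) return -1;
    if (ftruncate(m->fd, (off_t)size) != 0) { close(m->fd); return -1; }
    /* Write alias: RW, never X. */
    m->write = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, m->fd, 0);
    /* Runtime alias: read-only until protect adds X; never W. */
    m->runtime = mmap(NULL, size, PROT_READ, MAP_SHARED, m->fd, 0);
    if (m->write == MAP_FAILED || m->runtime == MAP_FAILED) {
        if (m->write != MAP_FAILED) munmap(m->write, size);
        if (m->runtime != MAP_FAILED) munmap(m->runtime, size);
        close(m->fd);
        return -1;
    }
    m->size = size;
    return 0;
}

/* Final perms for an EXEC segment: R+X on the runtime alias only. */
static int dual_protect_exec(DualMap *m) {
    return mprotect(m->runtime, m->size, PROT_READ | PROT_EXEC);
}

static void dual_release(DualMap *m) {
    munmap(m->write, m->size);
    munmap(m->runtime, m->size);
    close(m->fd);
}
```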
The mapper requests KIT_PROT_EXEC on the master reservation if any
segment is executable (triggering the dual-mapping path), populates and
relocates through the write alias, then protects each segment's runtime
sub-range to its final perms. EXEC segments get flush_icache against
the runtime alias. On x86 this is a no-op: instruction fetches are
coherent with stores on the same core, and dispatch into freshly written
code always crosses a serializing return/call, so no explicit flush is
needed (the rationale is spelled out in driver/env/icache_x86.c). ARM
and RISC-V have separate I/D caches and do need an explicit flush
(__builtin___clear_cache / sys_icache_invalidate; see
driver/env/icache_*.c). The append-slack tail is protected PROT_NONE
until needed.
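A sketch of the per-arch flush follows. It assumes GCC or Clang for __builtin___clear_cache. The function name is hypothetical, and the macOS adapter's sys_icache_invalidate path is not shown.

```c
#include <stddef.h>

/* Flush [p, p + len) before executing freshly written code. Pass the
   runtime (executing) alias, not the write alias. */
static void flush_icache_range(void *p, size_t len) {
#if defined(__x86_64__) || defined(__i386__)
    /* x86: no-op, per the coherence rationale above. */
    (void)p;
    (void)len;
#else
    /* AArch64/RISC-V: emit the required cache maintenance. */
    __builtin___clear_cache((char *)p, (char *)p + len);
#endif
}
```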
An absent or incomplete execmem vtable is a hard error
(compiler_panic): the JIT cannot run without one. page_size is taken
from the adapter, falling back to 0x4000 if the adapter reports 0; the
POSIX adapter fills it from sysconf(_SC_PAGESIZE).
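The sketch below shows the page-size rule and the rounding it feeds. The helper names are hypothetical; sysconf(_SC_PAGESIZE) is the POSIX call the adapter uses.

```c
#include <stdint.h>
#include <unistd.h>

/* The adapter's value, or 0x4000 (16 KiB) when it reports 0. */
static uint64_t effective_page_size(uint64_t reported) {
    return reported ? reported : 0x4000;
}

/* What a POSIX adapter would report; 0 if sysconf fails. */
static uint64_t posix_page_size(void) {
    long p = sysconf(_SC_PAGESIZE);
    return p > 0 ? (uint64_t)p : 0;
}

/* Round x up to a page boundary; page must be a power of two. */
static uint64_t page_align_up(uint64_t x, uint64_t page) {
    return (x + page - 1) & ~(page - 1);
}
```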
Thread-local storage
The JIT is single-threaded, which collapses the whole problem: with one
thread there is exactly one instance of each thread-local, which is
semantically an ordinary global living in the image's .tdata/.tbss.
The mapper already materializes those sections (init bytes copied for
.tdata, zero-filled for .tbss), and perms_for maps the TLS segment
read-write in the JIT (the AOT image keeps it as a read-only init
template each thread copies). So the only work is to make every TLS
access resolve to the in-image storage without touching the host thread
pointer — reading the host's tpidr_el0/fs/tp would alias into the
host process's own TLS, which is both wrong (no initializer) and unsafe
(it scribbles on host libc state). All access lowering is therefore
relaxed at map time to in-image addressing; no thunk, no per-thread
block, no host TLS vtable.
The per-arch idiom rewrite lives behind an arch hook, not in the mapper:
the reloc loop classifies a TLS access via the RelocDesc flags
(RELOC_IS_TLVP for Mach-O, RELOC_IS_TLS_LE for ELF Local-Exec) and
delegates to the arch's LinkArchDesc.jit_tls_le_relax (ELF) or applies
the Mach-O TLVP relaxation inline. The mapper itself carries no arch
switch.
Mach-O (AArch64): codegen emits Apple's TLV sequence (load the 24-byte
descriptor, load descriptor[+0] as a thunk, blr it). dyld would rewrite
that slot and allocate a pthread key, but a JIT image is never seen by
dyld. Instead, the mapper leaves the descriptor's +16 slot holding the
variable's in-image storage address (the normal R_ABS64 against the
storage symbol) and relaxes the access to read it directly: the
__thread_ptrs load becomes add (descriptor address), the thunk-load
becomes ldr xN,[xN,#16], and the blr becomes a nop.
ELF (AArch64/RISC-V/x86-64): jit_tls_le_relax rewrites the per-arch
Local-Exec idiom in place to address the in-image storage:
- AArch64: mrs tpidr_el0; add #hi12; add #lo12 → adrp; add :lo12:; nop
- RISC-V: lui %tprel_hi; add tp; addi %tprel_lo → auipc %pcrel_hi; nop; addi %pcrel_lo
- x86-64: mov rd,fs:[0]; lea rd,[rd+tpoff] → nop…; lea rd,[rip+&var]
Codegen emits the idiom per-access and contiguous, so the primary
(HI/offset) reloc drives the whole rewrite and the LO12 half is a no-op.
The hook is in each arch's reloc.c next to its reloc_apply_insn byte
encoders.
COFF/Windows TLS (the TEB → _tls_index → TLS-array idiom) is not yet
relaxed for the JIT. That is a follow-up, which also has to neutralize the
_tls_index extern the sequence references.
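A sketch of the AArch64 arm of that rewrite follows. It assumes, as described above, that the three instructions are contiguous and share one destination register. It recognizes MRS Xd, TPIDR_EL0 (0xD53BD040 | Rd) and emits ADRP, ADD (12-bit immediate) and NOP encodings. The function name and signature are hypothetical, not those of jit_tls_le_relax.

```c
#include <stdint.h>

/* mrs xd, tpidr_el0 ; add xd, xd, #hi12, lsl #12 ; add xd, xd, #lo12
   -> adrp xd, var ; add xd, xd, :lo12:var ; nop
   pc: runtime address of insn[0]; var: runtime address of the storage. */
static int a64_tls_le_relax(uint32_t insn[3], uint64_t pc, uint64_t var) {
    if ((insn[0] & ~0x1Fu) != 0xD53BD040u) return 0;  /* not MRS TPIDR_EL0 */
    uint32_t rd = insn[0] & 0x1Fu;
    int64_t pages = (int64_t)((var & ~0xFFFull) - (pc & ~0xFFFull)) >> 12;
    insn[0] = 0x90000000u | (((uint32_t)pages & 3u) << 29)
            | (((uint32_t)(pages >> 2) & 0x7FFFFu) << 5) | rd;       /* adrp */
    insn[1] = 0x91000000u | ((uint32_t)(var & 0xFFF) << 10)
            | (rd << 5) | rd;                                        /* add  */
    insn[2] = 0xD503201Fu;                                           /* nop  */
    return 1;
}
```

Take a sequence at 0x10000 that addresses a variable at 0x12345 through x0. It becomes adrp x0 (0xD0000000), add x0,x0,#0x345 (0x910D1400) and nop (0xD503201F).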
kit_jit_tls_addr gives host/interpreter code the same resolution from
the address a thread-local's symbol resolves to (Mach-O: read
descriptor[+16]; ELF/COFF: the symbol is the storage), range-checked
against the image so a foreign/extern thread-local resolved through the
host is rejected rather than dereferenced.
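The resolution rule can be sketched as below. The signature is hypothetical, and lo/hi stand in for the image bounds check. The descriptor slot is read with memcpy to avoid alignment assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Mach-O: sym is a 24-byte TLV descriptor whose +16 slot holds the storage
   address. ELF/COFF: sym is the storage. Anything outside [lo, hi) came
   from the host and is rejected, never dereferenced. */
static uintptr_t tls_addr(uintptr_t sym, int is_macho, uintptr_t lo, uintptr_t hi) {
    if (sym < lo || sym >= hi) return 0;
    if (!is_macho) return sym;
    if (hi - sym < 24) return 0;                  /* descriptor must fit */
    uint64_t storage;
    memcpy(&storage, (const void *)(sym + 16), sizeof storage);
    return (storage >= lo && storage < hi) ? (uintptr_t)storage : 0;
}
```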
A subtlety the access lowering imposes on codegen: the Mach-O TLV sequence
materializes the descriptor in x0 and (in the AOT form) calls the
resolver thunk via x16, clobbering x0/x16/x17/lr. Codegen is
shared between AOT and JIT, so the optimizer must model that clobber set
or a value left live in x0 across a TLS access is corrupted at -O1+.
The backend reports that clobber set from machine_op_clobbers for
NATIVE_MOP_TLS_ADDR. ELF Local-Exec, which writes only the destination
register, reports none.
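One way to model that clobber set is shown below. It is a sketch with hypothetical names (tls_addr_clobbers, survives_tls_access) that uses one bit per AArch64 general register, with lr as x30.

```c
#include <stdint.h>

typedef enum { TLS_MACHO_TLV, TLS_ELF_LE } TlsModel;

#define GPR(n) (UINT64_C(1) << (n))

/* Registers written by the TLS address op, beyond its result. */
static uint64_t tls_addr_clobbers(TlsModel m) {
    /* Mach-O TLV: x0, x16, x17 and lr (x30), as listed above. */
    if (m == TLS_MACHO_TLV) return GPR(0) | GPR(16) | GPR(17) | GPR(30);
    return 0;  /* ELF Local-Exec writes only its destination register */
}

/* A value live in `reg` across the op must be moved or spilled if clobbered. */
static int survives_tls_access(TlsModel m, unsigned reg) {
    return (tls_addr_clobbers(m) & GPR(reg)) == 0;
}
```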
Symbol and inspector surface
KitJit exposes a read-only view of the mapped program (declared in
include/kit/jit.h):
- kit_jit_lookup: name (C-mangled per target) to runtime address,
  GLOBAL-bind defined symbols only.
- kit_jit_sym_iter_*: walk every defined, user-visible symbol (functions,
  objects, common, TLS, ifunc, abs; mapping/section/file symbols filtered
  out), yielding demangled names and runtime addresses.
- kit_jit_addr_to_sym: runtime PC to nearest containing symbol + offset,
  for backtraces and disassembly annotation.
- kit_jit_runtime_to_image / kit_jit_image_to_runtime /
  kit_jit_image_contains: translate and bounds-check between the runtime
  alias and the image vaddr space DWARF was emitted in; the debugger
  crosses this boundary at every stop.
- kit_jit_view: a lazily built, in-memory KitObjFile that concatenates the
  debug sections of every input and resolves their cross-section
  relocations, so a DWARF consumer sees one coherent object even for a
  multi-input JIT. SK_SECTION relocs are resolved against a per-input
  prefix-size snapshot so merged CUs land their offsets correctly;
  code/data relocs resolve to final image vaddrs. See DWARF.md and DBG.md.
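A nearest-containing-symbol lookup of the kind kit_jit_addr_to_sym performs can be sketched as a binary search over address-sorted, non-overlapping symbols. The Sym record and the function are hypothetical; the inspector's actual data structures are not described here.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { const char *name; uintptr_t addr; uint64_t size; } Sym;

/* Last symbol with addr <= pc, provided pc < addr + size; else NULL. */
static const Sym *addr_to_sym(const Sym *syms, size_t n, uintptr_t pc,
                              uint64_t *off) {
    size_t lo = 0, hi = n;            /* find first index with addr > pc */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= pc) lo = mid + 1; else hi = mid;
    }
    if (lo == 0) return NULL;         /* pc below every symbol */
    const Sym *s = &syms[lo - 1];
    if (pc - s->addr >= s->size) return NULL;   /* in a gap after s */
    if (off) *off = pc - s->addr;
    return s;
}
```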
Incremental append
A KitJit reserves append slack (RX/R/RW/TLS buckets, protected
PROT_NONE initially) past the image end so additional objects can be
linked into the live mapping without a full relink. kit_jit_publish
with KIT_JIT_PUBLISH_APPEND_OBJECTS runs jit_append_obj_inner: it
preflights for duplicate strong definitions and unresolved references,
carves new segments out of the slack, appends symbols/relocations to the
image, applies the new relocations in place, and flips the new pages to
their final perms — bumping a generation counter. This is what lets the
dbg REPL compile and run snippets against an already-mapped program
(see DBG.md).
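Carving a new segment out of one slack bucket reduces to an aligned bump allocation inside the reserved range. The sketch below uses a hypothetical SlackBucket type. The preflight checks, relocation, permission flip and generation bump are outside its scope.

```c
#include <stdint.h>

/* A reserved, still-PROT_NONE sub-range [next, limit) of the mapping. */
typedef struct { uint64_t next, limit; } SlackBucket;

/* Carve `size` bytes at `align` (a power of two). Returns the new
   segment's image vaddr, or UINT64_MAX if the bucket cannot hold it. */
static uint64_t slack_carve(SlackBucket *b, uint64_t size, uint64_t align) {
    uint64_t at = (b->next + align - 1) & ~(align - 1);
    if (at < b->next || at > b->limit || size > b->limit - at) return UINT64_MAX;
    b->next = at + size;
    return at;
}
```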
The kit run driver
driver/cmd/run.c is the user-facing front end. It classifies inputs by
suffix (.c/- source, .wat/.wasm modules, .o, .a), compiles
sources through a caller-owned compiler, JIT-links everything, looks up
the entry (default main, overridable with -e), and calls it as
int(*)(int, char**). Notable design points:
- Lifetime. The compiler backs jit->c, which kit_jit_lookup dereferences,
  so driver_run keeps the compiler alive across lookup and the entry call
  and frees it only after kit_jit_free.
- Host-symbol fallback. Unresolved externs route through
  driver_dlsym_resolver (driver/env/), which retries with the leading
  underscore stripped so Mach-O-mangled C names resolve through
  dlsym(RTLD_DEFAULT). This is how JITed code reaches host libc.
- Synthetic argv[0]. A JITed main expects a program name in argv[0], but
  there is no executable path. The driver fills argv[0] with the first
  input's display name; user args after -- start at argv[1]. Without --,
  the program sees argc==1.
- --no-jit. Routes entry execution through the IR interpreter (see
  INTERPRETER.md) instead of native code. The native JIT image is still
  built (it lays out data globals and resolves externs/function pointers),
  but only the entry's code runs interpreted. Symbol resolution for the
  interpreter walks the JIT image's full symbol table (locals included)
  before falling back to host dlsym.
- Wasm. .wat/.wasm modules get a linear-memory instance wired up and run
  via a two-call __kit_wasm_init + entry sequence (see WASM.md), on either
  the JIT or interpreter path.
- Metrics. The optional --metrics/--time/--bench-time flags surface the
  scoped compile/link/JIT timings libkit emits through KitMetrics.
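The argv[0] and -- handling can be sketched as below. The function name is hypothetical, and the driver's actual option parsing is not shown. The result is what the entry receives when it is called as int(*)(int, char**).

```c
#include <stddef.h>
#include <string.h>

/* argv[0] is the first input's display name; only arguments after a
   literal "--" follow it. `out` needs room for argc + 1 pointers, and
   out[*out_argc] is set to NULL. */
static void jit_build_argv(const char *display, int argc, char **argv,
                           char **out, int *out_argc) {
    int n = 0;
    out[n++] = (char *)display;
    for (int i = 0; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0) {
            for (int j = i + 1; j < argc; j++) out[n++] = argv[j];
            break;
        }
    }
    out[n] = NULL;
    *out_argc = n;
}
```

For kit run hello.c -- a b, the entry sees argc 3 and argv {hello.c, a, b}. For kit run hello.c, argc is 1.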
Cross-references
- LINK.md: shared resolve/layout/reloc machinery and the
  LinkImage/LinkSession model the JIT consumes.
- DBG.md: the JIT debugger session, breakpoints, and PC↔source mapping
  built on the inspector surface.
- EMU.md: per-basic-block JIT translation on a growing image, a separate
  scheme that reuses the mapping primitives.
- DWARF.md: the debug-info producer/consumer behind kit_jit_view.
- DRIVER.md: the multitool and host-env adapter layering.
- INTERPRETER.md: the --no-jit execution path that shares the JIT image
  for data layout and symbol resolution.
- WASM.md: the linear-memory instancing and two-call entry sequence the
  driver wires up for .wat/.wasm inputs.