kit

git clone https://git.ryansepassi.com/git/kit.git

JIT

kit's in-process JIT maps a fully linked program into the running process's address space and hands back callable function pointers. There is no separate "JIT compiler": the same linker that writes ELF/Mach-O/PE files produces a resolved LinkImage, and the JIT mapper copies that image into executable memory, applies relocations against the live runtime addresses, and exposes a symbol/inspector surface. The mapper is kit_jit_from_image in src/link/link_jit.c. The kit run driver (driver/cmd/run.c) is the headline consumer; the JIT debugger (see DBG.md) and the emulator's block translator (see EMU.md) ride on the same mapping primitives.

Where the JIT sits

  inputs (.c/.o/.a/.wat)
        |  frontend + codegen  (see FRONTENDS.md, CODEGEN.md)
        v
  ObjBuilders  --link_add_obj-->  Linker (jit_mode=1, jit_host set)
        |  resolve + layout      (see LINK.md)
        v
  LinkImage  (segments, relocs, resolved symbols; bytes NOT serialized)
        |  kit_jit_from_image  (src/link/link_jit.c)
        v
  KitJit   (mapped exec memory + symbol inspector)
        |  kit_jit_lookup / addr_to_sym / sym_iter / view
        v
  host code calls the JITed entry in-process

The frontend-to-image half is shared verbatim with the file linker. A KitLinkSession opened with output kind KIT_LINK_OUTPUT_JIT (see src/api/link.c) sets two pieces of state on the Linker: jit_mode, which tells layout to skip file serialization and synthesize JIT-only stubs/GOT, and the JIT host (KitJitHost), the vtable through which libkit reaches the executable-memory allocator without itself depending on any OS. kit_link_session_jit then calls kit_jit_from_image, transferring ownership of the image (and the linker that backs it) into the returned KitJit.

Because the code runs in this process, the JIT produces runnable code only when the target arch and object format match the host. The driver defaults the target to the host, but -target overrides it without a guard. libkit lowers and lays out for whatever target the compiler was created with (kit_jit_image_arch simply reports target.arch), so a cross-target kit run emits native code the host CPU cannot execute and fails at runtime instead of with a diagnostic. Enforcing target==host is the caller's responsibility. The JIT always lowers PIC; -fPIC/-fPIE/-mcmodel are accepted by the driver but have no observable effect.
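Since libkit does not enforce target==host, a caller could add a guard along these lines before opening a JIT session. This is a minimal sketch: the Arch enum and both functions are hypothetical, and the host is detected from common compiler-predefined macros (__x86_64__, __aarch64__, __riscv, and their MSVC counterparts). An object-format check would follow the same pattern.

```c
#include <stdbool.h>

/* Hypothetical arch tags for illustration; kit's real target descriptor differs. */
typedef enum { ARCH_UNKNOWN, ARCH_X86_64, ARCH_AARCH64, ARCH_RISCV64 } Arch;

/* Host architecture, fixed when the driver itself is compiled. */
static Arch host_arch(void) {
#if defined(__x86_64__) || defined(_M_X64)
    return ARCH_X86_64;
#elif defined(__aarch64__) || defined(_M_ARM64)
    return ARCH_AARCH64;
#elif defined(__riscv) && (__riscv_xlen == 64)
    return ARCH_RISCV64;
#else
    return ARCH_UNKNOWN;
#endif
}

/* Caller-side guard: only JIT for a target the host CPU can execute. */
static bool target_matches_host(Arch target) {
    Arch host = host_arch();
    return host != ARCH_UNKNOWN && target == host;
}
```

A driver would run this check before requesting KIT_LINK_OUTPUT_JIT and print a diagnostic on mismatch, turning the runtime crash into an up-front error.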

The single contiguous reservation

The defining invariant of kit_jit_from_image is that the entire image lives in one execmem->reserve mapping. Layout assigns every segment a page-aligned image vaddr inside a single span [image_base, image_end); the mapper reserves that whole span (plus append slack, see below) in one call and then treats each segment as a sub-range at offset vaddr - image_base. No segment gets its own independent mmap.
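As a rough illustration of the single-reservation arithmetic, the sketch below computes the size of one reservation covering every segment plus append slack, and each segment's offset inside it. Seg, reservation_size, and seg_offset are hypothetical names, not kit's; the page size is assumed to be a power of two and the slack a page multiple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal segment record: image vaddr and size in bytes. */
typedef struct { uint64_t vaddr; uint64_t size; } Seg;

/* Round x up to a power-of-two alignment. */
static uint64_t align_up(uint64_t x, uint64_t align) {
    return (x + align - 1) & ~(align - 1);
}

/* Size of the one reservation: [image_base, image_end) rounded to pages, plus slack. */
static uint64_t reservation_size(const Seg *segs, size_t n, uint64_t page,
                                 uint64_t slack, uint64_t *image_base_out) {
    uint64_t base = UINT64_MAX, end = 0;
    for (size_t i = 0; i < n; i++) {
        if (segs[i].vaddr < base) base = segs[i].vaddr;
        uint64_t e = segs[i].vaddr + segs[i].size;
        if (e > end) end = e;
    }
    assert(n > 0 && base % page == 0);  /* layout page-aligns every segment */
    *image_base_out = base;
    return align_up(end - base, page) + slack;
}

/* Each segment is a sub-range of the reservation, never its own mapping. */
static uint64_t seg_offset(const Seg *s, uint64_t image_base) {
    return s->vaddr - image_base;
}
```

For segments at 0x10000, 0x12000, and 0x13000 (the last one 0x10 bytes long) with 4 KiB pages and 0x4000 bytes of slack, the whole image needs a single 0x8000-byte reservation, and the second segment lives at offset 0x2000.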

This exists to keep inter-segment displacements in branch/addressing range. AArch64 ADRP reaches ±4 GiB and CALL26 reaches ±128 MiB; RISC-V AUIPC-based pairs and x86-64 RIP-relative operands each reach about ±2 GiB. If code, rodata, and data were three independent mappings, the OS could scatter them gigabytes apart, and a perfectly legal cross-segment reference would overflow the relocation's range check. One reservation makes every intra-image displacement a function of the layout, not of where the kernel happened to place three separate regions. The same property is what lets a weak-undef symbol resolve through a zero-valued slot, and what lets the far-call stubs (below) sit close enough to their call sites.
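The range checks that motivate a single reservation follow directly from the AArch64 encodings: B/BL carry a signed 26-bit word offset and ADRP a signed 21-bit page delta. A minimal sketch with hypothetical function names (not link_reloc_apply's actual checks); S, A, and P follow the usual ELF relocation notation for symbol, addend, and patch site.

```c
#include <stdbool.h>
#include <stdint.h>

/* AArch64 B/BL (CALL26/JUMP26): signed 26-bit word offset, i.e. +/-128 MiB. */
static bool fits_call26(uint64_t S, int64_t A, uint64_t P) {
    int64_t disp = (int64_t)(S + (uint64_t)A - P);
    return (disp & 3) == 0 &&
           disp >= -(INT64_C(1) << 27) && disp < (INT64_C(1) << 27);
}

/* AArch64 ADRP: signed 21-bit delta between 4 KiB pages, i.e. +/-4 GiB. */
static bool fits_adrp(uint64_t S, int64_t A, uint64_t P) {
    uint64_t page_mask = ~UINT64_C(0xfff);
    int64_t delta = (int64_t)(((S + (uint64_t)A) & page_mask) - (P & page_mask));
    return delta >= -(INT64_C(1) << 32) && delta < (INT64_C(1) << 32);
}
```

With code at P = 4 GiB, a target 2 GiB away passes fits_adrp but fails fits_call26, and a segment the OS placed 4 GiB or more above P fails both; one contiguous reservation keeps every intra-image pair inside these windows by construction.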

vaddr_to_runtime / vaddr_to_write translate an image vaddr to the two aliases of the master mapping (see W^X below). They scan the handful of segments linearly, with a second pass that resolves a vaddr landing exactly on a segment's one-past-end boundary (e.g. __fini_array_end).
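The two-pass translation might look like the following sketch. MappedSeg and vaddr_to_runtime_sketch are hypothetical; a write-alias variant would be identical with a write_base field. Because the first pass wins whenever a one-past-end address is also the first byte of the next segment, the boundary pass only fires for addresses no segment contains.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment record; runtime_base is the segment's address in
   the runtime (executable) alias of the master mapping. */
typedef struct { uint64_t vaddr, size; uintptr_t runtime_base; } MappedSeg;

/* Translate an image vaddr to its runtime address, or 0 if unmapped. */
static uintptr_t vaddr_to_runtime_sketch(const MappedSeg *segs, size_t n,
                                         uint64_t vaddr) {
    /* Pass 1: vaddr strictly inside some segment. */
    for (size_t i = 0; i < n; i++)
        if (vaddr >= segs[i].vaddr && vaddr < segs[i].vaddr + segs[i].size)
            return segs[i].runtime_base + (uintptr_t)(vaddr - segs[i].vaddr);
    /* Pass 2: vaddr exactly one past a segment's end (e.g. __fini_array_end). */
    for (size_t i = 0; i < n; i++)
        if (vaddr == segs[i].vaddr + segs[i].size)
            return segs[i].runtime_base + (uintptr_t)segs[i].size;
    return 0;
}
```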

Reloc-apply runs in-process on the shared path

The JIT does not have its own relocation engine. It iterates the image's LinkRelocApply records and calls the same link_reloc_apply used by the file writers (see LINK.md). The only JIT-specific twist is the address arithmetic: the patch bytes are written through the write alias, while the symbol value S and the patch-site address P are the runtime alias addresses, because that is where the CPU will execute and fetch from. A handful of relocation kinds get special in-process handling before reaching link_reloc_apply:

After relocations are applied, IFUNC resolvers (ELF only) are run in-process and their results stored into the iplt slots, .init_array constructors are run in forward order, and kit_jit_run_dtors runs .fini_array in reverse on teardown.
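The JIT-specific address arithmetic of reloc-apply (S and P taken from the runtime alias, bytes stored through the write alias) can be illustrated with a single relocation kind. The sketch below uses x86-64 R_X86_64_PC32 semantics (S + A - P); apply_pc32 is a hypothetical helper, not link_reloc_apply, and it assumes a little-endian host, which holds wherever an x86-64 JIT can run.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Apply a 32-bit PC-relative fixup (S + A - P).
   write_alias / runtime_alias are two views of the same bytes; off is the
   patch site's offset within them. Returns false if the value overflows. */
static bool apply_pc32(uint8_t *write_alias, uintptr_t runtime_alias,
                       uint64_t off, uint64_t S, int64_t A) {
    uint64_t P = (uint64_t)runtime_alias + off;       /* where the CPU fetches */
    int64_t value = (int64_t)(S + (uint64_t)A - P);   /* computed in runtime space */
    if (value < INT32_MIN || value > INT32_MAX)
        return false;                                 /* outside rel32 range */
    int32_t v32 = (int32_t)value;
    memcpy(write_alias + off, &v32, sizeof v32);      /* stored via the write alias */
    return true;
}
```

Computing P from the write alias instead would bake the distance between the two aliases into every PC-relative fixup, which is exactly the bug the runtime-alias rule avoids.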

Call stubs and GOT slots for host calls

JITed code routinely needs to call into the host process — libc, or any symbol resolvable via the link session's extern resolver. Those targets resolve to real host addresses (SK_ABS), which can be arbitrarily far from the JIT mapping. A direct CALL26/JUMP26 to libc would overflow the branch range, and a far data reference needs an indirection slot.

Layout solves both with JIT-only passes (gated on jit_mode, skipped for static-exe output) in src/link/link_layout.c and src/link/link_reloc_layout.c:

Both the stub section and the slot section are ordinary subsegments of the single contiguous reservation, so the redirected call's branch stays in range. The net effect: kit run hello.c can call printf even though printf lives in a libc loaded megabytes away.
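One way to picture the stub/slot pass: each distinct host-absolute target gets one stub and one slot inside the reservation, and calls to that target are retargeted at the stub, whose branch distance is bounded by the reservation size. This is a sketch under assumptions: StubTable, stub_for, and call_target are hypothetical, the stub's machine code (load the slot, branch indirectly) is omitted, and it redirects every host-absolute target without testing the branch range first, which kit may or may not do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STUBS 64

typedef struct {
    uint64_t stub_base;          /* address of stub 0 inside the reservation */
    uint64_t stub_size;          /* bytes per stub */
    uint64_t slots[MAX_STUBS];   /* GOT-like slots: absolute host addresses */
    size_t count;                /* stubs allocated so far */
} StubTable;

/* One stub/slot pair per distinct host target; returns the stub address,
   or 0 if the table is full. */
static uint64_t stub_for(StubTable *t, uint64_t host_addr) {
    for (size_t i = 0; i < t->count; i++)
        if (t->slots[i] == host_addr)
            return t->stub_base + i * t->stub_size;
    if (t->count == MAX_STUBS)
        return 0;
    t->slots[t->count] = host_addr;   /* slot holds e.g. &printf */
    return t->stub_base + t->count++ * t->stub_size;
}

/* Branch target for a call: host-absolute targets are redirected to a stub. */
static uint64_t call_target(StubTable *t, uint64_t S, bool is_host_abs) {
    return is_host_abs ? stub_for(t, S) : S;
}
```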

Executable memory: the host vtable and W^X

libkit never calls mmap, VirtualAlloc, or mach_vm_remap itself. All executable-memory operations go through KitExecMem in the JIT host (include/kit/jit.h): reserve, protect, release, flush_icache, and a page_size. The driver supplies the concrete adapter via driver_env_to_jit_host (driver/env/). The contract has two distinct shapes:

The mapper requests KIT_PROT_EXEC on the master reservation if any segment is executable (triggering the dual-mapping path), populates and relocates through the write alias, then protects each segment's runtime sub-range to its final perms. The append-slack tail is protected PROT_NONE until needed.

EXEC segments get flush_icache against the runtime alias. On x86 this is a no-op: instruction fetches are coherent with stores on the same core, and dispatch into freshly written code always crosses a call/return control transfer, which satisfies x86's documented self-modifying-code rules, so no explicit flush is needed (the rationale is spelled out in driver/env/icache_x86.c). ARM and RISC-V have separate I/D caches and do need an explicit flush (__builtin___clear_cache / sys_icache_invalidate; see driver/env/icache_*.c).
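The reserve, populate, then protect sequence can be sketched with POSIX mmap/mprotect directly. This is a single-alias simplification: kit routes these calls through KitExecMem and uses the dual mapping for executable segments, whereas this sketch maps a read-only image and never requests PROT_EXEC. map_readonly_image is a hypothetical helper, and slack is assumed to be a page multiple.

```c
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve image+slack as PROT_NONE, populate the image pages read-write,
   then drop them to read-only. The slack tail stays PROT_NONE. */
static unsigned char *map_readonly_image(const void *bytes, size_t len,
                                         size_t slack, size_t *total_out) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t img = (len + page - 1) / page * page;   /* page-rounded image span */
    size_t total = img + slack;
    void *m = mmap(NULL, total, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;
    unsigned char *base = m;
    if (mprotect(base, img, PROT_READ | PROT_WRITE) != 0)  /* open for population */
        goto fail;
    memcpy(base, bytes, len);
    if (mprotect(base, img, PROT_READ) != 0)               /* final perms */
        goto fail;
    *total_out = total;
    return base;
fail:
    munmap(base, total);
    return NULL;
}
```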

An absent or incomplete execmem vtable is a hard error (compiler_panic): the JIT cannot run without one. page_size is taken from the adapter, falling back to 0x4000 if the adapter reports 0; the POSIX adapter fills it from sysconf(_SC_PAGESIZE).
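The completeness check and page-size fallback are small enough to sketch. ExecMemSketch is hypothetical: its member names follow the text (reserve, protect, release, flush_icache, page_size), but the signatures are invented for illustration and are not the ones declared in include/kit/jit.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical exec-memory vtable; signatures are illustrative only. */
typedef struct {
    void *(*reserve)(void *ctx, size_t len, int prot);
    int (*protect)(void *ctx, void *addr, size_t len, int prot);
    void (*release)(void *ctx, void *addr, size_t len);
    void (*flush_icache)(void *ctx, void *addr, size_t len);
    size_t page_size;            /* 0 means "adapter did not say" */
    void *ctx;
} ExecMemSketch;

/* Every operation is required; kit treats a missing one as fatal. */
static bool execmem_complete(const ExecMemSketch *m) {
    return m && m->reserve && m->protect && m->release && m->flush_icache;
}

/* Adapter-reported page size, falling back to 0x4000 when it reports 0. */
static size_t effective_page_size(const ExecMemSketch *m) {
    return m->page_size ? m->page_size : 0x4000;
}
```

A 16 KiB fallback is the safe choice because a 16 KiB-aligned layout is also 4 KiB-aligned, while the reverse does not hold on 16 KiB-page hosts.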

Thread-local storage

The JIT is single-threaded, which collapses the whole problem: with one thread there is exactly one instance of each thread-local, which is semantically an ordinary global living in the image's .tdata/.tbss. The mapper already materializes those sections (init bytes copied for .tdata, zero-filled for .tbss), and perms_for maps the TLS segment read-write in the JIT (the AOT image keeps it as a read-only init template each thread copies). So the only work is to make every TLS access resolve to the in-image storage without touching the host thread pointer — reading the host's tpidr_el0/fs/tp would alias into the host process's own TLS, which is both wrong (no initializer) and unsafe (it scribbles on host libc state). All access lowering is therefore relaxed at map time to in-image addressing; no thunk, no per-thread block, no host TLS vtable.

The per-arch idiom rewrite lives behind an arch hook, not in the mapper: the reloc loop classifies a TLS access via the RelocDesc flags (RELOC_IS_TLVP for Mach-O, RELOC_IS_TLS_LE for ELF Local-Exec) and delegates to the arch's LinkArchDesc.jit_tls_le_relax (ELF) or applies the Mach-O TLVP relaxation inline. The mapper itself carries no arch switch.
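The classification step only tests flag bits, which is why the mapper can stay arch-neutral. A sketch with hypothetical flag values and types: only the names RELOC_IS_TLVP and RELOC_IS_TLS_LE come from the text, and the actual rewrite (the arch hook or the inline Mach-O relaxation) is not shown.

```c
#include <stdint.h>

/* Hypothetical flag bits; only the names come from the text. */
enum { RELOC_IS_TLVP = 1 << 0, RELOC_IS_TLS_LE = 1 << 1 };

/* Hypothetical, trimmed-down relocation descriptor. */
typedef struct { uint32_t flags; } RelocDescSketch;

typedef enum { TLS_NONE, TLS_MACHO_TLVP, TLS_ELF_LE } TlsClass;

/* Arch-neutral classification: decides which path handles the access. */
static TlsClass classify_tls(const RelocDescSketch *d) {
    if (d->flags & RELOC_IS_TLVP)
        return TLS_MACHO_TLVP;   /* Mach-O TLVP: relaxed inline */
    if (d->flags & RELOC_IS_TLS_LE)
        return TLS_ELF_LE;       /* ELF Local-Exec: handed to jit_tls_le_relax */
    return TLS_NONE;             /* not a TLS access */
}
```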

kit_jit_tls_addr gives host and interpreter code the same resolution: given the address a thread-local's symbol resolves to, it returns the storage address (Mach-O: read the word at descriptor+16; ELF/COFF: the symbol address is the storage itself). The input is range-checked against the image, so a foreign or extern thread-local resolved through the host is rejected rather than dereferenced.
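A sketch of that resolution, assuming (as the text states) that in the JIT mapping the word at descriptor+16 holds the storage address. tls_addr_sketch and ObjFmt are hypothetical stand-ins for kit_jit_tls_addr, whose real signature is not given here; the image bounds are passed in as [img_lo, img_hi).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { FMT_ELF, FMT_COFF, FMT_MACHO } ObjFmt;

/* Map a thread-local symbol's resolved address to its storage, or NULL if the
   symbol lies outside the image (e.g. an extern thread-local from the host). */
static void *tls_addr_sketch(ObjFmt fmt, uintptr_t sym_addr,
                             uintptr_t img_lo, uintptr_t img_hi) {
    if (sym_addr < img_lo || sym_addr >= img_hi)
        return NULL;                       /* foreign thread-local: reject */
    if (fmt != FMT_MACHO)
        return (void *)sym_addr;           /* ELF/COFF: the symbol is the storage */
    /* Mach-O: sym_addr is a descriptor; the storage address is the word at +16. */
    uintptr_t storage;
    if (img_hi - sym_addr < 16 + sizeof storage)
        return NULL;                       /* descriptor would run past the image */
    memcpy(&storage, (const void *)(sym_addr + 16), sizeof storage);
    return (void *)storage;
}
```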

A subtlety the access lowering imposes on codegen: the Mach-O TLV sequence materializes the descriptor in x0 and (in the AOT form) calls the resolver thunk via x16, clobbering x0/x16/x17/lr. Codegen is shared between AOT and JIT, so the optimizer must model that clobber set or a value left live in x0 across a TLS access is corrupted at -O1+. The backend reports it via NATIVE_MOP_TLS_ADDR from machine_op_clobbers (ELF Local-Exec, which uses only the destination register, reports none).
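Modeling the clobber set amounts to reporting a register mask per machine op and refusing to keep a value live in a clobbered register across it. A minimal sketch; the register numbering, MopKind, mop_clobbers, and survives are hypothetical stand-ins for NATIVE_MOP_TLS_ADDR and machine_op_clobbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical AArch64 register numbering: x0..x30, with lr == x30. */
enum { REG_X0 = 0, REG_X16 = 16, REG_X17 = 17, REG_LR = 30 };
#define REG_BIT(r) (UINT32_C(1) << (r))

typedef enum { MOP_OTHER, MOP_TLS_ADDR_MACHO, MOP_TLS_ADDR_ELF_LE } MopKind;

/* Registers an optimizer must assume are destroyed by a machine op. */
static uint32_t mop_clobbers(MopKind k) {
    switch (k) {
    case MOP_TLS_ADDR_MACHO:     /* descriptor in x0, resolver via x16, blr sets lr */
        return REG_BIT(REG_X0) | REG_BIT(REG_X16) | REG_BIT(REG_X17) | REG_BIT(REG_LR);
    case MOP_TLS_ADDR_ELF_LE:    /* writes only its destination register */
    default:
        return 0;
    }
}

/* A value held in reg across op k survives only if k does not clobber reg. */
static bool survives(MopKind k, int reg) {
    return (mop_clobbers(k) & REG_BIT(reg)) == 0;
}
```

A register allocator consulting survives() would spill or move a value out of x0 before a Mach-O TLS access, which is the -O1+ corruption the text describes.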

Symbol and inspector surface

KitJit exposes a read-only view of the mapped program (declared in include/kit/jit.h):

Incremental append

A KitJit reserves append slack (RX/R/RW/TLS buckets, protected PROT_NONE initially) past the image end so additional objects can be linked into the live mapping without a full relink. kit_jit_publish with KIT_JIT_PUBLISH_APPEND_OBJECTS runs jit_append_obj_inner: it preflights for duplicate strong definitions and unresolved references, carves new segments out of the slack, appends symbols/relocations to the image, applies the new relocations in place, and flips the new pages to their final perms — bumping a generation counter. This is what lets the dbg REPL compile and run snippets against an already-mapped program (see DBG.md).
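The slack carving can be pictured as a bump allocator per permission bucket with an all-or-nothing commit. A sketch under assumptions: SlackBucket, AppendState, carve, and append_sketch are hypothetical, bucket bases are assumed nonzero (0 signals failure), only the RX and RW buckets are exercised, and the preflight, relocation, and protection steps are omitted. Working on copies means a failed append leaves the live state and the generation counter untouched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical append-slack bucket: [base, base+cap), `used` bytes handed out. */
typedef struct { uint64_t base, cap, used; } SlackBucket;

typedef struct {
    SlackBucket rx, r, rw, tls;   /* one bucket per permission class */
    uint64_t generation;          /* bumped on every successful append */
} AppendState;

/* Carve `size` bytes at `align` (power of two); 0 if the bucket is exhausted. */
static uint64_t carve(SlackBucket *b, uint64_t size, uint64_t align) {
    uint64_t start = (b->base + b->used + align - 1) & ~(align - 1);
    if (start + size > b->base + b->cap)
        return 0;                 /* out of slack: would need a full relink */
    b->used = start + size - b->base;
    return start;
}

/* Carve code and data together; commit both or neither. */
static bool append_sketch(AppendState *s, uint64_t code_size, uint64_t data_size,
                          uint64_t *code_out, uint64_t *data_out) {
    SlackBucket rx = s->rx, rw = s->rw;          /* work on copies */
    uint64_t c = carve(&rx, code_size, 16);
    uint64_t d = carve(&rw, data_size, 8);
    if (!c || !d)
        return false;
    s->rx = rx;
    s->rw = rw;
    s->generation++;
    *code_out = c;
    *data_out = d;
    return true;
}
```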

The kit run driver

driver/cmd/run.c is the user-facing front end. It classifies inputs by suffix (.c/- source, .wat/.wasm modules, .o, .a), compiles sources through a caller-owned compiler, JIT-links everything, looks up the entry (default main, overridable with -e), and calls it as int(*)(int, char**). Notable design points:

Cross-references