JIT (planned work)
This roadmap covers the future of kit's in-process JIT: the
append-only incremental linker, function-level hot reload built on top of
it, the managed-runtime (Go-style) codegen/JIT contracts, and the
remaining cross-host and parity gaps in kit run / kit dbg. The
implemented design lives in ../JIT.md; the linker and
incremental-link mechanics are in ../LINK.md and
LINKER.md. This document records only what is still planned,
the open problems, and the order we intend to build them in.
Baseline (the starting point): the JIT mapper (src/link/link_jit.c)
reserves one contiguous region, copies segments, applies relocations
against the runtime base, and flips to W^X final perms; ELF and Mach-O
reloc-apply (cross-TU GOT, weak-undef proximity, far-call stubs, ELF
IFUNC pre-resolution) are green, and the inspector surface
(kit_jit_view, kit_jit_addr_to_sym, kit_jit_sym_iter_*) is
wired. Append-only extension already increments a JIT generation
(kit_jit_generation) and stages relocations via
link_append_reloc_slot. Reload, managed-runtime hooks, and the
regression harness are not yet built.
1. Function-level hot reload
Hot reload adds replacement on top of append-only incremental linking.
In v1 only functions can be replaced; data symbols, TLS, type layouts,
initializers, destructors, and object lifetime are out of scope. The
first usable milestone: in dbg, reload a global function while the
worker is stopped, such that existing function pointers keep working, new
calls hit the new body, and old frames return safely.
1.1 Core idea — stable entry, indirected body
Append-only linking can already add a new function body. Reload adds a stable function entry that indirects through a slot to the current body:
foo entry/trampoline -> foo.slot -> current foo body
Reloading compiles and appends a new body, relocates it, then atomically
updates foo.slot. Existing pointers to foo stay valid because they
point at the stable entry, not a body generation. The baseline patches
one pointer-sized cell, not every call site.
- Per-arch stable entry sequence (aarch64 adrp/ldr/br, x64 jmp *foo.slot(%rip), rv64 auipc/ld/jr). Entry lives in RX, slot in writable data under the same W^X discipline as the rest of the image.
- Slot update is a pointer-width atomic aligned store so the representation stays compatible with future multi-threaded sessions, even though dbg v1 reloads only while the worker is stopped.
- Public symbol resolves to the stable entry; body symbols are internal and generationed (foo$body$0, foo$body$1, ...). Inspector and debugger present the public name; generationed bodies show only under an internal/debug flag.
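The entry/slot relationship can be modeled in portable C. This is a minimal sketch, not the planned emission: the real entry is per-arch machine code, and the names reload_slot, reload_slot_publish, foo_body_0, and foo_body_1 are hypothetical stand-ins for foo.slot and the generationed bodies.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical body signature; a real entry is arch-specific machine code. */
typedef int (*body_fn)(int);

/* One pointer-sized slot per reloadable function (stand-in for foo.slot). */
typedef struct {
    _Atomic(body_fn) body;   /* current body generation */
    unsigned generation;     /* N in foo$body$N */
} reload_slot;

static int foo_body_0(int x) { return x + 1; }   /* stands in for foo$body$0 */
static int foo_body_1(int x) { return x * 10; }  /* stands in for foo$body$1 */

static reload_slot foo_slot = { foo_body_0, 0 };

/* Stable entry: the only address callers and saved pointers ever hold. */
int foo(int x) {
    body_fn current = atomic_load_explicit(&foo_slot.body, memory_order_acquire);
    return current(x);
}

/* Publish a new body with a single pointer-width atomic store. */
void reload_slot_publish(reload_slot *slot, body_fn new_body) {
    assert(slot != NULL && new_body != NULL);
    atomic_store_explicit(&slot->body, new_body, memory_order_release);
    slot->generation++;
}
```

A function pointer taken from foo before reload_slot_publish(&foo_slot, foo_body_1) reaches the new body afterwards, because it points at the entry rather than at a body generation.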
1.2 Indirection is opt-in
The trampoline cost is real and pointless for normal kit run. Gate it
behind a JIT mode (KIT_JIT_INDIRECT_EXPORTED_FUNCS vs
KIT_JIT_INDIRECT_NONE) so AOT executable links are unaffected. v1
restricts reload to global C-linkage functions visible to
kit_jit_lookup; static functions need a stable synthetic identity
keyed on the containing TU before they qualify.
1.3 ABI compatibility gate
Reload must not change the ABI a caller already compiled against. The
C frontend should emit a compact per-definition ABI signature (target
arch/os, call conv, variadic flag, return class+size, arg classes+sizes;
fixed KIT_ABI_MAX_ARGS bound, no VLA) so the linker can verify
runtime-callability without re-deriving C types. Functions whose
signature exceeds the bound are marked non-reloadable until a heap-backed
encoding exists. Missing/non-C signatures reject reload unless the user
explicitly opts into unchecked replacement.
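One possible shape for the signature and its check is sketched below. The field layout, the class codes, and the value 8 for KIT_ABI_MAX_ARGS are assumptions made for illustration; the roadmap only fixes which properties the signature has to cover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KIT_ABI_MAX_ARGS 8   /* assumed value; the roadmap only names the bound */

typedef struct {
    uint8_t cls;    /* hypothetical class code: 0 none, 1 integer, 2 float, 3 memory */
    uint8_t size;   /* size in bytes */
} abi_slot;

/* Fixed-size record, no VLA, so it can be compared without C type info. */
typedef struct {
    uint8_t  arch, os, call_conv;
    uint8_t  variadic;
    uint8_t  nargs;
    abi_slot ret;
    abi_slot args[KIT_ABI_MAX_ARGS];
} abi_sig;

static bool abi_slot_equal(abi_slot a, abi_slot b) {
    return a.cls == b.cls && a.size == b.size;
}

/* True when a body compiled against new_sig is callable through old_sig. */
bool abi_sig_compatible(const abi_sig *old_sig, const abi_sig *new_sig) {
    if (old_sig == NULL || new_sig == NULL) return false;   /* missing signature rejects */
    if (old_sig->nargs > KIT_ABI_MAX_ARGS || new_sig->nargs > KIT_ABI_MAX_ARGS) return false;
    if (old_sig->arch != new_sig->arch || old_sig->os != new_sig->os) return false;
    if (old_sig->call_conv != new_sig->call_conv) return false;
    if (old_sig->variadic != new_sig->variadic) return false;
    if (old_sig->nargs != new_sig->nargs) return false;
    if (!abi_slot_equal(old_sig->ret, new_sig->ret)) return false;
    for (size_t i = 0; i < old_sig->nargs; i++) {
        if (!abi_slot_equal(old_sig->args[i], new_sig->args[i])) return false;
    }
    return true;
}
```

The sketch uses strict equality; whether any looser notion of compatibility is acceptable is left open here.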
1.4 Replacement-object restrictions (v1)
A replacement object may contain the new body, private helpers used only by it, read-only literals it needs, debug sections, and undefined refs to already-linked or resolver symbols. It may not contain new writable globals, TLS, ctors/dtors, public definitions other than the target, or colliding strong definitions. This keeps reload function-only in practice, not just in name.
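As a rough illustration, the restrictions reduce to a scan over the object's sections and defined symbols. The types and names below (sec_kind, repl_section, repl_symbol, repl_validate) are hypothetical and do not reflect the linker's real data structures; collision checks against already-linked strong definitions are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    SEC_CODE, SEC_RODATA, SEC_DEBUG,
    SEC_DATA, SEC_BSS, SEC_TLS, SEC_INIT_ARRAY, SEC_FINI_ARRAY
} sec_kind;

typedef struct { sec_kind kind; size_t size; } repl_section;
typedef struct { const char *name; bool defined; bool global; } repl_symbol;

typedef enum { REPL_OK, REPL_BAD_SECTION, REPL_EXTRA_PUBLIC_DEF, REPL_NO_TARGET } repl_status;

/* Only code, read-only literals, and debug info may arrive with a replacement. */
static bool sec_kind_allowed(sec_kind k) {
    return k == SEC_CODE || k == SEC_RODATA || k == SEC_DEBUG;
}

repl_status repl_validate(const char *target,
                          const repl_section *secs, size_t nsecs,
                          const repl_symbol *syms, size_t nsyms) {
    assert(target != NULL);
    for (size_t i = 0; i < nsecs; i++) {
        if (secs[i].size == 0) continue;              /* empty sections carry nothing */
        if (!sec_kind_allowed(secs[i].kind)) return REPL_BAD_SECTION;
    }
    bool found_target = false;
    for (size_t i = 0; i < nsyms; i++) {
        if (!syms[i].defined) continue;               /* undefined refs resolve elsewhere */
        if (!syms[i].global) continue;                /* private helpers are allowed */
        if (strcmp(syms[i].name, target) != 0) return REPL_EXTRA_PUBLIC_DEF;
        found_target = true;
    }
    return found_target ? REPL_OK : REPL_NO_TARGET;
}
```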
1.5 Old-generation lifetime
After a slot update, old bodies stay mapped. Even with the worker
stopped, a live frame may be inside the old function, so continuing must
be valid: existing frames finish in the old body, new calls enter the new
one. v1 keeps every old generation mapped until kit_jit_free; later,
retire a generation only when the debugger/runtime can prove no stopped or
running frame has a PC or return address inside it. Never unmap old code
immediately after publishing.
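The later retirement test amounts to a range check over every PC and return address the stack scan collects. A minimal sketch, assuming the scanner hands over a flat array of addresses; code_range and generation_can_retire are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Address range of one body generation, half-open: [start, end). */
typedef struct { uintptr_t start, end; } code_range;

static bool range_contains(code_range r, uintptr_t addr) {
    return addr >= r.start && addr < r.end;
}

/* pcs holds every frame PC and saved return address found by the stack scan. */
bool generation_can_retire(code_range gen, const uintptr_t *pcs, size_t npcs) {
    assert(gen.start <= gen.end);
    for (size_t i = 0; i < npcs; i++) {
        if (range_contains(gen, pcs[i])) return false;   /* a frame still needs this code */
    }
    return true;
}
```

A return address can sit one byte past a trailing call instruction, so a real scanner may prefer to test return_address - 1; the sketch does not model that.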
1.6 Debugger integration
dbg must distinguish address breakpoints from symbol/source ones:
- b *0x1234 stays at that exact address, even in an old generation.
- b foo is rebound to the active generation after reload.
- b file.c:42 is re-resolved after DWARF refresh; if re-resolution fails, keep the old breakpoint but mark it stale in info breakpoints.
Every reload bumps the JIT generation; kit_jit_view rebuilds on
mismatch. The rebuilt view keeps old debug info so backtraces from old
frames still resolve, while name-to-address lookup prefers the latest
generation.
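The three breakpoint policies fit in one small rebinding routine. This is a sketch with hypothetical types (bp_kind, breakpoint, bp_rebind); treating a failed symbol lookup like a failed source lookup is an assumption, since the text above only specifies the stale marking for source breakpoints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { BP_ADDRESS, BP_SYMBOL, BP_SOURCE } bp_kind;

typedef struct {
    bp_kind   kind;
    uintptr_t addr;    /* where the breakpoint is currently planted */
    bool      stale;   /* shown in info breakpoints when re-resolution failed */
} breakpoint;

/* resolved_addr is the location in the newest generation, or 0 if lookup failed. */
void bp_rebind(breakpoint *bp, uintptr_t resolved_addr) {
    assert(bp != NULL);
    switch (bp->kind) {
    case BP_ADDRESS:
        break;                       /* pinned to the exact address, even in old code */
    case BP_SYMBOL:
    case BP_SOURCE:
        if (resolved_addr != 0) {
            bp->addr = resolved_addr;
            bp->stale = false;
        } else {
            bp->stale = true;        /* keep the old location, flag it */
        }
        break;
    }
}
```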
1.7 Transactional publish and failure behavior
Reload is all-or-nothing. ABI mismatch, disallowed data/TLS/init arrays, unresolved symbols, out-of-capacity, or relocation failure all reject and leave the old body active. Pages committed before a failure may remain as dead space, but no public symbol or slot may ever point at them.
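The ordering matters more than the individual checks: every fallible stage runs before the single publishing store. The sketch below models the stages as flags to stay short; jit_image, staged_reload, and sketch_reload_function are hypothetical and are not the planned kit_jit_reload_function signature.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    RELOAD_OK, RELOAD_ABI_MISMATCH, RELOAD_BAD_OBJECT,
    RELOAD_UNRESOLVED, RELOAD_NO_SPACE, RELOAD_RELOC_FAILED
} reload_status;

typedef struct {
    uintptr_t base;              /* start of the reserved code region */
    _Atomic(uintptr_t) slot;     /* address of the published body */
    unsigned generation;
    size_t used, capacity;       /* bytes committed / reserved */
} jit_image;

/* Outcome of each fallible stage, modeled as flags for brevity. */
typedef struct {
    bool abi_ok, object_ok, symbols_resolved, relocs_ok;
    size_t body_size;
} staged_reload;

reload_status sketch_reload_function(jit_image *img, const staged_reload *st) {
    assert(img != NULL && st != NULL && img->used <= img->capacity);
    if (!st->abi_ok)           return RELOAD_ABI_MISMATCH;
    if (!st->object_ok)        return RELOAD_BAD_OBJECT;
    if (!st->symbols_resolved) return RELOAD_UNRESOLVED;
    if (st->body_size > img->capacity - img->used) return RELOAD_NO_SPACE;

    uintptr_t body = img->base + img->used;   /* append-only placement */
    img->used += st->body_size;               /* stays as dead space on failure */
    if (!st->relocs_ok) return RELOAD_RELOC_FAILED;

    /* The only externally visible step: nothing above touched the slot. */
    atomic_store_explicit(&img->slot, body, memory_order_release);
    img->generation++;
    return RELOAD_OK;
}
```

A relocation failure leaves used advanced but the slot and generation unchanged, which matches the dead-space allowance described above.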
1.8 Patch-site index (later, performance only)
The correctness baseline needs no caller patching. A patch-site index built from durable relocation records (target sym -> apply ids, owner input -> apply ids, write section -> apply ids) enables a later fast mode that patches only call sites targeting the reloaded symbol, plus future non-function slot fixups. For v1, build the structures but use them only in assertions/tests.
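For the target sym -> apply ids direction, one simple representation is a flat array sorted by target symbol, so each symbol's sites form a contiguous run. This is only a sketch of that one mapping; patch_site, patch_index_build, and patch_index_lookup are hypothetical names, and the durable relocation record format is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t target_sym;   /* symbol the relocation points at */
    uint32_t apply_id;     /* id of the durable relocation-apply record */
} patch_site;

static int patch_site_cmp(const void *a, const void *b) {
    const patch_site *x = a, *y = b;
    if (x->target_sym != y->target_sym) return x->target_sym < y->target_sym ? -1 : 1;
    if (x->apply_id != y->apply_id)     return x->apply_id < y->apply_id ? -1 : 1;
    return 0;
}

/* Sort once so every symbol's sites are contiguous and ordered by apply id. */
void patch_index_build(patch_site *sites, size_t n) {
    qsort(sites, n, sizeof *sites, patch_site_cmp);
}

/* Returns the first site targeting sym (NULL if none) and the run length in *count. */
const patch_site *patch_index_lookup(const patch_site *sites, size_t n,
                                     uint32_t sym, size_t *count) {
    assert(count != NULL);
    size_t lo = 0, hi = n;
    while (lo < hi) {                        /* lower bound on target_sym */
        size_t mid = lo + (hi - lo) / 2;
        if (sites[mid].target_sym < sym) lo = mid + 1;
        else hi = mid;
    }
    size_t end = lo;
    while (end < n && sites[end].target_sym == sym) end++;
    *count = end - lo;
    return *count ? &sites[lo] : NULL;
}
```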
1.9 Reload work items
- Reloadable-function records + per-arch stable entry/slot emission, gated by an indirection-mode JIT option.
- kit_jit_lookup returns stable entries for reloadable functions.
- C-frontend ABI-signature emission per definition.
- Replacement-object validation (reject data/TLS/init/collisions).
- Append + relocate body, publish via atomic slot store, increment generation.
- JIT-view/DWARF refresh + symbol/line breakpoint rebinding.
- link_session_mark_reloadable / link_session_reload_function internal surface and kit_jit_reload_function experimental API.
- Direct-call patching as a separate phase, after the baseline is correct.
2. Go-runtime-style codegen / JIT support
Go is statically typed, so the gap is not dynamic typing in
../CODEGEN.md's KitCg — it is the managed runtime
model: precise GC, goroutines, managed stacks, panic/defer/recover, and a
long-lived JIT image whose code and metadata evolve safely. Principles:
keep CG typed; lower source concepts to storage types plus runtime
metadata; make managed behavior explicit rather than inferred late;
keep AOT and ordinary C/Toy/Wasm JIT unchanged by default (managed
features are opt-in via code options or function attributes); keep all
runtime services on context/session structs, never global state. The
first useful milestone is far smaller than "compile Go": a tiny managed
frontend that allocates traced objects, hits a safepoint, lets the host
enumerate roots from a JIT stack map, and calls in through a C-callable
trampoline.
2.1 CG interface extensions
- Precise GC metadata. A kit_cg_safepoint that records, per safepoint PC, which frame slots / params / globals hold managed pointers. The encoding should become compact backend stackmap data (fast PC-to-stackmap lookup), not DWARF-only.
- Managed pointer identity. Distinguish raw from managed heap pointers. kit_cg_type_ptr already takes an address space; the missing piece is policy (which spaces are scanned / movable / non-moving / interior / raw).
- Write barriers and allocation. Don't make every frontend open-code barriers. Add managed store ops or GC_ALLOC / GC_WRITE_BARRIER / GC_READ_BARRIER intrinsics carrying object base, field offset, and pointer kind; ordinary C stores stay ordinary.
- Runtime calling convention. A KIT_CG_CC_MANAGED plus function flags (MANAGED_STACK, GC_SAFEPOINTS) for a hidden goroutine/thread context param, reserved registers, prologue stack checks, and a helper-call ABI. Pairs with first-class multi-result functions to match Go better than sret today.
- Managed stacks / goroutines. Prologue stack checks, grow/switch runtime calls, live-pointer maps before a growth call, and frame relocation metadata if stacks move, all as a managed-stack attribute, not a default.
- Panic and implicit checks. Explicit check ops or trap-site metadata (check kind, source location, recovery target, runtime helper) so the JIT/debug layer can map a trap PC to a language panic path instead of just a process signal.
- Defer/recover. Minimal path: lower defer management to runtime calls and make panic edges explicit enough for stackmaps and stepping. Full cleanup/landing-pad metadata is deferred until semantics settle.
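The fast PC-to-stackmap lookup asked for under precise GC metadata could look like a binary search over per-function safepoint records. The record layout below (a PC offset plus a 32-slot liveness bitmask) is an assumption chosen for brevity, not a proposed encoding; stackmap_entry, stackmap_find, and stackmap_collect_roots are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t pc_offset;    /* safepoint PC relative to the function start */
    uint32_t live_slots;   /* bit i set: frame slot i holds a managed pointer */
} stackmap_entry;

/* Binary search over entries sorted by pc_offset; NULL if pc is not a safepoint. */
const stackmap_entry *stackmap_find(const stackmap_entry *map, size_t n,
                                    uint32_t pc_offset) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (map[mid].pc_offset < pc_offset) lo = mid + 1;
        else hi = mid;
    }
    return (lo < n && map[lo].pc_offset == pc_offset) ? &map[lo] : NULL;
}

/* Copy the managed pointers named by the entry into out; returns how many were written. */
size_t stackmap_collect_roots(const stackmap_entry *e, void *const *frame_slots,
                              void **out, size_t out_cap) {
    assert(e != NULL && frame_slots != NULL && out != NULL);
    size_t n = 0;
    for (unsigned i = 0; i < 32 && n < out_cap; i++) {
        if (e->live_slots & (UINT32_C(1) << i)) out[n++] = frame_slots[i];
    }
    return n;
}
```

A host collector would call stackmap_find with the stopped frame's PC offset and then feed the collected roots to its marker; params and globals would need their own records.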
2.2 JIT interface extensions
- Transactional publish. Strengthen the publish contract: link failure leaves the image unchanged, metadata publishes atomically with code, old code stays executable while frames may return into it, and readers detect generation changes. Aligns with §1.7.
- Runtime metadata registry. A JIT metadata channel separate from object/DWARF inspection (stack map, func table, type desc, trap table, inline table) supporting fast PC-to-function / PC-to-stackmap / PC-to-trap / PC-to-inline-frame / symbol-to-generation queries. kit_jit_view stays DWARF-oriented; runtime metadata must be compact and queryable without parsing DWARF.
- Code lifetime / reclamation. Explicit states (active, replaced but callable, retired, reclaimable) with a runtime/debugger veto on reclamation until stack scanning proves no frame references the old generation. Extends §1.5.
- Managed entry invocation. Today's entry-call helpers are argv/u64-narrow. Keep the low-level JIT call ABI simple and require the frontend/runtime to emit C-callable trampolines for managed entry points.
- Thread / stop-the-world coordination. Eventually: safepoint polling, cooperative stop requests, goroutine enumeration, stack scanning while stopped, metadata refresh while paused. Not needed for the first CG change, but the publish/metadata APIs must not assume a single worker forever.
2.3 Managed-runtime sequence
- Managed pointer / address-space policy + explicit safepoint records.
- Emit and query compact stack maps from the JIT image.
- Managed allocation and write-barrier intrinsics.
- Managed-stack function attributes + stack-check lowering.
- Publish runtime metadata transactionally with JIT appends.
- Trap/check tables for panic lowering.
- Function replacement/lifetime on top of hot reload (§1).
- Broaden debugger/session APIs for multi-threaded coordination.
3. Remaining JIT TODOs
3.1 Driver — kit run
- -O2 crashes on the multi-file inline-asm demo with Bus error. Likely an optimizer bug surfaced through IR_ASM_BLOCK replay; reduce and file against src/opt/opt.c (the recorder/replay seam), not the JIT. See ../OPT.md.
- Regression harness: a scripted test/run/ suite diffing exit codes and stdout across .c, stdin, .o, .a, multi-file, and -e entry cases, plus a --no-jit interpreter-vs-JIT cross-check. No coverage today; wire a test-run target into mk/test.mk.
3.2 Inspector / debugger surface
- Windows host adapter for the JIT debugger: vectored exception handlers + SetThreadContext instead of POSIX signals. See ../DBG.md.
- x64 / rv64 displaced single-step (x64 INT3 + RIP-relative fixups, rv64 EBREAK + AUIPC/JAL/branch fixups). Displaced single-step exists for aarch64 only today, a JIT-specific parity gap that blocks dbg step on those targets.
3.3 Memory mapping / executable allocator
- Cross-host KitExecMem audit. Apple silicon uses dual-mapping; other POSIX hosts fall back to mprotect RW<->RX. Document the contract and the failure mode when host->execmem is unset (currently compiler_panic), and define the Windows VirtualAlloc / VirtualProtect story alongside §3.2.
- Page size: the JIT defaults to 0x4000 when the host adapter reports page_size = 0. Either require the adapter to fill it or query sysconf(_SC_PAGESIZE) in driver/env.c.
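The sysconf option for the page-size item might look like the following on POSIX hosts. This is a sketch: jit_effective_page_size and jit_round_up_to_page are hypothetical helpers, and keeping 0x4000 as a last-resort fallback when the query fails is an assumption rather than a decided policy.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Prefer the adapter's value, then the host's, then the current 0x4000 default. */
size_t jit_effective_page_size(size_t adapter_page_size) {
    if (adapter_page_size != 0) return adapter_page_size;
    long queried = sysconf(_SC_PAGESIZE);
    return queried > 0 ? (size_t)queried : (size_t)0x4000;
}

/* Round a mapping length up to a whole number of pages (page_size is a power of two). */
size_t jit_round_up_to_page(size_t nbytes, size_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (nbytes + page_size - 1) & ~(page_size - 1);
}
```

A Windows adapter would need a different query; that belongs with the VirtualAlloc / VirtualProtect work noted above.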
3.4 Tests
- Mach-O J-path markers in the link-test reporter so the reloc-apply groups are distinguishable from a generic SIGSEGV (today make test-link KIT_TEST_OBJ=macho prints raw Segmentation fault with no J-specific markers).
- test/smoke/dbg_hello: a scripted REPL diff against a JIT'd source (see ../DBG.md).
- Hot-reload unit + smoke tests once §1 lands: lookup-address stability across reload, old vs new return value, saved function pointer hits the new body, old PC still describable via addr_to_sym, and the negative cases (ABI mismatch, writable-data replacement, duplicate public definition reject). Run on one JIT target first; cross-arch trampoline encoding gets its own tests.