kit Interfaces
Modularity and clean interfaces are a top project priority. This document is the operational companion to DESIGN.md: where DESIGN.md tells the layering narrative, this doc is the interface inventory and the interface-review checklist. It catalogues every boundary worth reviewing — the public API, the backend/codegen contracts, the internal subsystem seams, the core utilities, and the frontend-to-library edge — names the responsibility each one carries, and gives a checklist to apply when adding to or changing one.
The aim is to make the boundaries legible so they stay clean: an interface that nobody can locate is an interface nobody will defend.
Boundary map
From the outside in (see DESIGN.md for the full narrative):
    driver/                 CLI policy + host I/O. Built with -Iinclude only:
     │                      the public surface is all it can reach. No -Isrc.
     └─ lang/               Frontends (c, cpp, toy, wasm). Compiled INTO libkit.
         │                  Public-API consumers: c/cpp/toy build with NO -Isrc and
         │                  reach nothing internal. Only wasm builds with -Isrc, to
         │                  reach the shared wasm module model (src/wasm/wasm.h).
         └─ include/kit/    PUBLIC BOUNDARY. The library's entire external contract.
             └─ src/api/    Composition layer: public handles <-> internal subsystems.
                 └─ src/    Internal subsystems. Share private headers; expose
                            nothing outward except through include/kit/.
The boundary that the build enforces. The hard layering line is the
driver, not lang/. The driver is compiled with -Iinclude (plus -Ilang
solely to reach a frontend's public c/c.h for the JIT REPL) and deliberately
without -Isrc, so internal headers (core/..., link/..., cg/...) are
physically unreachable from it. That makes the driver the first true consumer of
libkit's public API and the thing that proves the public surface is
sufficient.
Frontends in lang/ are a softer boundary, but in practice they hold the public
line. The C, cpp, and toy frontends build with no -Isrc at all, so internal
headers are physically unreachable from them — they are pure include/kit/
consumers, exactly like the driver. (The C frontend's ABI lowering lives in its
own lang/c/abi/ module built on the public CG type API and kit_compiler_target;
despite the TargetABI-flavored history it does not touch src/abi/.)
The lone exception is the wasm frontend, which builds with -Isrc to reach
src/wasm/wasm.h. That is not a frontend punching above the public API: the wasm
module model is a Tier-3 shared internal subsystem with three peer consumers —
the frontend (lang/wasm), the wasm object format (src/obj/wasm), and the wasm
codegen backend (src/arch/wasm) — and the frontend reaches it through its single
boundary header, the same way the other two do. It stays internal rather than
public because the module IR is large and unstable; the public wasm.h is a
deliberately different, narrow surface (host-import binding). Everything else the
wasm frontend touches is public — e.g. lang/wasm/cg.c allocates through the
public <kit/support/arena.h> mirror, not src/core/arena.h. Any new frontend
reach into src/ beyond this one edge is a signal the public API is missing
something — add it to include/kit/, don't grow the edge.
Invariants (keep them true):
- There is no whole-library umbrella include/kit.h. Consumers include the specific headers they use: include/kit/*.h and include/kit/support/*.h. The one sanctioned exception is a tier-scoped facade: frontend.h is the deliberate single front door for the frontend tier and re-exports cg.h, compile.h, source.h, and support/arena.h. Scoping a facade to one tier is fine; a grab-bag spanning the whole library is not.
- The driver includes only <kit/...> (the -Ilang exception is a single frontend public header), never src/.
- *_internal.h headers are private to one subsystem and must not be included across subsystem boundaries.
- Format / arch / OS specifics stay behind their dispatch vtable (ObjFormatImpl, ArchImpl, the codegen *Target structs), never leaked above the dispatch line.
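The last invariant can be pictured with a small sketch. Every name below (DemoFormatKind, DemoFormatImpl, demo_format_lookup, demo_pick_rodata) is a hypothetical stand-in, not kit's actual ObjFormatImpl layout. The sketch only shows the shape: one lookup maps a kind to an implementation, and code above the dispatch line asks the vtable instead of switching on the format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format kinds; kit's real enum is not shown in this document. */
typedef enum { DEMO_FMT_ELF = 0, DEMO_FMT_MACHO = 1, DEMO_FMT_COUNT } DemoFormatKind;

/* Hypothetical dispatch vtable: one constant instance per format. */
typedef struct DemoFormatImpl {
    const char *name;
    /* Format-specific spelling of the read-only data section. */
    const char *(*rodata_section)(void);
} DemoFormatImpl;

static const char *elf_rodata(void)   { return ".rodata"; }
static const char *macho_rodata(void) { return "__TEXT,__const"; }

/* Immutable table of implementations (constant data, not mutable state). */
static const DemoFormatImpl demo_formats[DEMO_FMT_COUNT] = {
    [DEMO_FMT_ELF]   = { "elf",   elf_rodata },
    [DEMO_FMT_MACHO] = { "macho", macho_rodata },
};

/* The only place that maps a kind to an implementation. */
static const DemoFormatImpl *demo_format_lookup(DemoFormatKind kind) {
    assert((unsigned)kind < DEMO_FMT_COUNT);
    return &demo_formats[kind];
}

/* Code above the dispatch line: no format names, no switch on kind. */
static const char *demo_pick_rodata(const DemoFormatImpl *fmt) {
    return fmt->rodata_section();
}
```

A caller would hold the pointer returned by demo_format_lookup and pass it down; adding a format then means adding one table entry, with no edits to callers like demo_pick_rodata.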
Tier 1 — Public API (include/kit/)
The library's entire stable contract: nineteen headers in include/kit/ plus
two in include/kit/support/. No umbrella header — each consumer includes what
it uses.
| Header | Purpose | Key opaque type(s) | Primary consumer |
|---|---|---|---|
| core.h | Foundational substrate: compiler lifecycle, target triple, slices, status codes, host vtables (KitHeap/KitWriter/KitDiagSink/KitContext), symbol interning. | KitCompiler | everyone |
| config.h | Build-time component enable flags (arch / obj-format / language / subsystem / tool). Preprocessor-only. | - | build |
| compile.h | High-level source->object compilation; frontend registration vtable; dep iteration. | KitCompileSession, KitDepIter | driver, frontends |
| cg.h | Code-generation API (the largest contract): a stack-machine typed IR over KitCg. Types/ABI, functions, control flow, memory, arithmetic, calls, intrinsics, inline+file asm, static data. | KitCg | frontends |
| frontend.h | Frontend front door: re-exports cg.h / compile.h / source.h / support/arena.h, plus the panic boundary (kit_frontend_run), metrics scopes, and fatal helpers. Including this alone is enough to write and register a frontend. | - | frontends |
| source.h | Source registry: stable file IDs + include-edge recording. | - | frontends |
| preprocess.h | Standalone C preprocessor entry. | - | driver |
| object.h | Format-neutral object model: builder + read-only inspection; section/symbol/reloc enums. | KitObjBuilder, KitObjFile | cg, link, jit, disasm, dwarf |
| link.h | Linker: byte/object/archive/DSO inputs, linker-script model, emit or JIT. | KitLinkSession, KitLinkScript | driver, jit |
| jit.h | JIT image: mapped pages, symbol resolution, publish/append/replace, object view. | KitJit | runtime, dbg |
| interp.h | Threaded-bytecode interpreter over the optimizer IR; host-identity and emu/guest configurations. | KitInterpProgram | run --no-jit, emu |
| dbg.h | In-process JIT execution control: breakpoints, stepping, regs/mem, signal host. | KitJitSession | debuggers |
| dwarf.h | DWARF5 consumer: PC<->line, type/var/subprogram queries, structural iterators. | KitDebugInfo, KitDwarfType | debuggers, dumpers |
| disasm.h | Disassembly of byte ranges and objects, with symbol/reloc annotation. | KitDisasmIter | objdump, dbg |
| emu.h | User-mode guest-ELF emulator (per-block JIT). | KitEmu | emu tool |
| arch.h | Arch-agnostic register / unwind-frame metadata helpers. | - | dbg, dwarf, disasm |
| archive.h | POSIX ar reader/writer + symbol index. | KitArIter | ar/ranlib |
| asm_emit.h | Emit assembled object bytes as GAS text. | - | objdump |
| wasm.h | WebAssembly host-import resolver/binder. | KitWasmInstance | wasm runners |
| support/arena.h | Public bump allocator (narrowed mirror of src/core/arena.h). | KitArena | frontends |
| support/hashmap.h | Header-only KIT_HASHMAP_DEFINE template + hash fns. | - (macro) | frontends |
Public-tier notes:
- cg.h is by far the largest contract and the one frontends couple to hardest. Changes here ripple to every frontend, so treat it as the highest-risk public interface.
- frontend.h is the only sanctioned aggregator header: a front door scoped to the frontend tier. It re-exports the codegen / registration / source / arena surface so a frontend TU includes just <kit/frontend.h>. This is the deliberate exception to the "no umbrella header" invariant: the scope is one tier, not the whole library. A TU still reaches for a single narrow header directly when that is all it needs, e.g. a lexer wanting only core.h, or the preprocessor, which must not pull in cg.h.
- core.h defines the host vtables (KitHeap, KitWriter, KitDiagSink, KitContext). These are the project's "no global state" enforcement point; every subsystem threads context through them rather than reaching for a static.
- The inspection family (dwarf.h, disasm.h, arch.h, plus object.h's read side) is consumed by the dumper/debugger tools and shares the format-neutral object model. Keep arch/format detail behind the dispatch vtables it already uses.
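To make the host-vtable point concrete, here is a minimal sketch of a writer-style vtable. DemoWriter, DemoBufSink, buf_write and demo_emit_magic are hypothetical names chosen for illustration; the actual fields of KitWriter are not given in this document and may look quite different. What the sketch shows is that the library routine reaches its output only through the vtable and the host-owned user pointer, so no static is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host writer vtable (not the real KitWriter layout). */
typedef struct DemoWriter {
    int  (*write)(void *user, const void *bytes, size_t n);
    void  *user;   /* host-owned state, handed back on every call */
} DemoWriter;

/* One possible host implementation: append into a fixed buffer. */
typedef struct DemoBufSink { char data[32]; size_t len; } DemoBufSink;

static int buf_write(void *user, const void *bytes, size_t n) {
    DemoBufSink *sink = user;
    if (n > sizeof sink->data - sink->len) return -1;  /* would overflow */
    memcpy(sink->data + sink->len, bytes, n);
    sink->len += n;
    return 0;
}

/* A library-side routine: everything it touches arrives as an argument. */
static int demo_emit_magic(const DemoWriter *w) {
    /* Constant data only; there is no mutable static here. */
    static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };
    assert(w && w->write);
    return w->write(w->user, elf_magic, sizeof elf_magic);
}
```

Two compilers in one process could each carry their own DemoWriter pointing at different sinks without interfering, which is the property the "no global state" rule is after.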
Tier 2 — Backend / codegen contract (internal)
The codegen path is tiered: each tier is a distinct contract struct (a
vtable) that a backend or layer fills in. A frontend records into the highest
tier; the bytes come out of the lowest. The native backends — aarch64, x64, and
rv64 — are all live and all satisfy the same contracts. aarch64 is the reference
implementation; x64 and rv64 are full peers built on the shared
NativeTarget / NativeOps / NativeFrame substrate rather than bespoke
per-arch frame and lowering code.
| Tier | Header | Contract type | What implements it | Role |
|---|---|---|---|---|
| ABI | src/abi/abi.h (+ abi_internal.h) | TargetABI | per-ABI TUs (aapcs64, aapcs64_windows, sysv_x64, rv64, wasm32, apple_arm64, apple_x64, win64_x64) | calling-convention + storage-layout queries; abi_new(Compiler) selects the implementation by (arch, obj-format) |
| Arch registry | src/arch/arch.h | ArchImpl | one singleton per arch, via arch_lookup(kind) | discovery + dispatch to backend/decode/emu/link/dbg/dwarf surfaces; CFI defaults |
| Semantic CG | src/cg/cgtarget.h | CgTarget | native_direct_target (-O0) or opt_cgtarget (-O>=1), wrapping a per-arch CgTarget | frontend-facing typed lowering, pre-regalloc |
| -O0 adapter | src/cg/native_direct_target.h | NativeDirectTarget + NativeOps | shared adapter, parameterized by each arch's NativeOps | adapts a NativeTarget to CgTarget for the direct -O0 path |
| Physical emit | src/arch/native_target.h | NativeTarget | aa64/x64/rv64 *_native_target_new() | hard-register, machine-code emission + frame/CFI |
| Frame model (shared) | src/cg/native_frame.h | NativeFrame | shared native_frame.c, embedded by all three native backends | arch-neutral frame-slot bookkeeping the NativeTarget impls delegate to |
| Machine code | src/arch/mc.h | MCEmitter | one generic impl, mc_new(Compiler, ObjBuilder) | section/label/reloc/CFI byte emission for all machine-code archs |
Per-arch entry points — the surface each backend exposes to the rest of the compiler. The native archs each expose the same pair; rv64 additionally exports its raw word/halfword emit helpers for the assembler path:
| Arch | Header | Entry points |
|---|---|---|
| aa64 (reference) | src/arch/aa64/aa64.h | aa64_native_target_new, aa64_native_direct_ops |
| x64 | src/arch/x64/x64.h | x64_native_target_new, x64_native_direct_ops |
| rv64 | src/arch/rv64/rv64.h | rv64_native_target_new, rv64_native_direct_ops, rv64_emit32/16 |
| c_target | src/arch/c_target/{c_emit,ir_emit}.h | C-source emission backend (standalone CGBackend, no ArchImpl) |
| wasm | src/arch/wasm/* | wasm emission backend |
Backend-tier notes:
- NativeTarget is the physical-emission contract: frame setup and prologue policy, control flow, data movement, arithmetic/compare/convert, calls (plan_call/emit_call/plan_ret/ret), atomics, variadics, intrinsics, and inline/file-scope asm; roughly three dozen hooks in all. The caller (NDT or the optimizer) has already selected legal physical operands and run register allocation; NativeTarget validates and emits but must never allocate registers itself. All three native backends implement the full contract.
- The same arch fills both NativeTarget (physical) and NativeOps (the semantic shims the -O0 adapter calls). Keep the split clean: semantic decisions (operand legality, call planning policy, va_*, asm binding, barriers) live in NativeOps; pure emission lives in NativeTarget.
- A handful of NativeTarget hooks are explicitly optional and exist for archs whose ISA needs them: machine_op_clobbers (x86 idiv/shift fixed-register clobbers; NULL on aa64/rv64), emit_prologue/emit_minimal_prologue, bind_params_end (for backends that resolve param binds as a parallel copy), and the zero-register store fast path (has_store_zero_reg). NULL is the documented "this arch doesn't need it" answer, not an unimplemented gap.
- CgTarget likewise carries capability queries so the semantic layer can stay arch-neutral: supports_label_table (false on Wasm, which has no code addresses in linear memory), switch_ (overridden only by backends with a native multiway branch), and tail_call_unrealizable_reason (CG asks before setting CG_CALL_TAIL). Native archs take the shared label-table / cmp-chain defaults.
- mc.h is arch-neutral. Per-arch relocation encoding lives behind ArchImpl.apply_label_fixup plus the per-arch CFI constants on ArchImpl; don't leak arch knowledge into the generic emitter.
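The "NULL means this arch doesn't need it" convention for optional hooks might look like the sketch below. Only the hook name machine_op_clobbers comes from the notes above; its signature, the DemoNativeTarget struct, the opcode values and the register-mask encoding are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slice of a physical-emission vtable. */
typedef struct DemoNativeTarget {
    /* Required: every backend must provide it. */
    int (*emit_op)(int opcode);
    /* Optional: mask of registers an opcode clobbers implicitly.
     * NULL means the ISA has no such fixed-register operations. */
    uint32_t (*machine_op_clobbers)(int opcode);
} DemoNativeTarget;

enum { DEMO_OP_ADD = 1, DEMO_OP_DIV = 2 };

static int demo_emit_op(int opcode) { return opcode > 0; }

/* x86-flavoured example: division pins two fixed registers. Bits 0 and 2
 * stand in for rax and rdx; the mask encoding is an assumption. */
static uint32_t demo_x64_clobbers(int opcode) {
    return opcode == DEMO_OP_DIV ? 0x5u : 0u;
}

/* Caller side: a NULL optional hook contributes nothing, while a missing
 * required hook is a contract violation and is checked hard. */
static uint32_t demo_clobbers_for(const DemoNativeTarget *t, int opcode) {
    assert(t && t->emit_op);
    return t->machine_op_clobbers ? t->machine_op_clobbers(opcode) : 0u;
}
```

Centralising the NULL check in one helper such as demo_clobbers_for keeps call sites from each inventing their own default.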
Shared native frame model (src/cg/native_frame.h)
Every native backend lays out a stack frame the same way at the bookkeeping
level, so that bookkeeping was lifted into one shared module. NativeFrame owns
the arch-neutral parts; each NativeTarget embeds one and keeps the ISA/ABI
specifics. All three native backends embed a NativeFrame.
The split — what NativeFrame owns vs. what stays in the backend:
| Owned by NativeFrame (arch-neutral) | Stays in the backend (ISA/ABI-specific) |
|---|---|
| Slot table + cumulative-offset arithmetic (native_frame_slot_alloc/_at) | Coordinate transform from off to anchor-relative disp (fp/s0/rbp; aa64 top- vs bottom-record) |
| frame_final gate (no new slots after the prologue) | Prologue/epilogue + slim-variant instruction encoding |
| Used-callee-save set derived from optimizer per-class masks (native_frame_set_callee_saves/_collect_saves) | Callee-save placement (aa64 reserves slots here; rv64/x64 compute offsets below the locals) |
| max_outgoing tracking (native_frame_note_outgoing) | Deferred-patch application, variadic register-save stores |
| Vararg save-area size from the ABI's va_list layout (native_frame_va_save_bytes) | - |
Why it is shared: the slot arithmetic, the no-grow-after-prologue gate, and
the derivation of the used-callee-save set from the optimizer's per-class masks
are identical across the three archs. Consolidating them also folds the three
per-arch vararg-save magic numbers (rv64 64, x64 176, aa64 64+128) into a single
ABI-driven native_frame_va_save_bytes query, in line with the no-magic-numbers
rule. Frame-slot handles are 1-indexed with NATIVE_FRAME_SLOT_NONE as the
sentinel; callers and the shared allocator must agree on that.
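The shared pieces described above can be sketched in a few lines. The Demo* names, field layout and fixed slot capacity are hypothetical and do not mirror the real NativeFrame. The register counts in the usage note are the standard ones from the SysV x86-64, AAPCS64 and RISC-V LP64 calling conventions, which reproduce the three figures quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; handle 0 plays the role of NATIVE_FRAME_SLOT_NONE. */
enum { DEMO_SLOT_NONE = 0, DEMO_MAX_SLOTS = 16 };

typedef struct DemoFrame {
    uint32_t offset[DEMO_MAX_SLOTS]; /* offset[h - 1] is where slot h starts */
    uint32_t count;                  /* slots handed out so far */
    uint32_t size;                   /* cumulative bytes allocated */
    bool     final;                  /* set once the prologue is emitted */
} DemoFrame;

/* Returns a 1-indexed handle, or DEMO_SLOT_NONE when the frame is final or
 * full. align must be a power of two. */
static uint32_t demo_slot_alloc(DemoFrame *f, uint32_t bytes, uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    if (f->final || f->count == DEMO_MAX_SLOTS) return DEMO_SLOT_NONE;
    uint32_t start = (f->size + align - 1) & ~(align - 1);
    f->offset[f->count] = start;
    f->size = start + bytes;
    return ++f->count;
}

/* Translate a handle back to its frame offset. */
static uint32_t demo_slot_at(const DemoFrame *f, uint32_t handle) {
    assert(handle != DEMO_SLOT_NONE && handle <= f->count);
    return f->offset[handle - 1];
}

/* Hypothetical ABI description: how many argument registers of each class a
 * variadic callee spills, and how wide each spill is. */
typedef struct DemoVaLayout {
    uint32_t gpr_count, gpr_bytes;
    uint32_t fpr_count, fpr_bytes;
} DemoVaLayout;

/* One ABI-driven query instead of a per-arch literal. */
static uint32_t demo_va_save_bytes(const DemoVaLayout *l) {
    return l->gpr_count * l->gpr_bytes + l->fpr_count * l->fpr_bytes;
}
```

With layouts of {8, 8, 0, 0} for rv64, {6, 8, 8, 16} for SysV x64 and {8, 8, 8, 16} for aa64, demo_va_save_bytes yields 64, 176 and 192 (that is, 64+128) bytes respectively. Because handle 0 is the sentinel, a zero-initialised handle field reads as "no slot" without extra setup.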
Tier 3 — Internal subsystem boundaries
Each subsystem exposes a single shared header (its boundary) and may keep an
*_internal.h private to its own TUs. The boundary header is what other
subsystems are allowed to include; the internal header is not.
| Subsystem | Boundary header | Internal header | Role |
|---|---|---|---|
| obj | src/obj/obj.h | format headers private | Format-neutral object model (ObjBuilder, sections, symbols, relocs) plus read side; the hub cg/link/jit/disasm/dwarf depend on. |
| ↳ formats | src/obj/{elf,macho,coff}/*.h, format.h, reloc_apply.h | - | Per-format emit/read behind ObjFormatImpl; link_reloc_apply for relocation. |
| link | src/link/link.h (+ link_arch.h) | link_internal.h | Byte/object/archive/DSO inputs, symbol resolution (single-shot and incremental), ELF/JIT output; kit_jit_from_image. |
| opt | src/opt/opt.h (+ ir.h) | opt_internal.h | SSA construction, CFG passes, register allocation, MIR lowering; opt_cgtarget_new(Compiler, CgTarget, level) wraps a backend target. |
| cg | src/cg/{ir,ir_recorder,type}.h | internal.h | IR recording and the codegen type system (cg_type_*). |
| debug | src/debug/debug.h (+ dwarf_defs.h) | debug_internal.h, dwarf_internal.h | DWARF producer: types, subprograms, line program, emit. |
| emu | <kit/emu.h> (public face) | src/emu/emu.h | Guest-ELF emulator; format hooks via ObjFormatEmuOps. |
| dbg | <kit/dbg.h> (public face) | src/dbg/dbg.h | JIT execution control; the real contract is the public header. |
| asm | src/asm/asm.h (+ asm_lex.h) | asm_helpers.h shared | asm_parse(Compiler, AsmLexer, MCEmitter); driver helpers. |
| jit | src/jit/tlv_thunk.h | - | Mach-O TLV thunk; the rest of JIT runs through LinkImage. |
| wasm | src/wasm/wasm.h | - | Module model / codec / WAT / validate, in public Kit types. Shared by three peers: the wasm frontend (lang/wasm), object format (src/obj/wasm), and codegen backend (src/arch/wasm). |
| api | src/api/lang_registry.h | - | lang_registry_init(Compiler) wires the enabled frontends. |
Subsystem-tier notes:
- obj.h is the hub: cg, link, jit, disasm, and dwarf all depend on it. Format knowledge (ELF/Mach-O/COFF) stays behind ObjFormatImpl; verify new code doesn't hard-code one format above that line.
- link exposes both single-shot and incremental resolve surfaces; keep them consistent. The incremental path patches a prior on-disk image rather than relinking from scratch.
- emu and dbg present their real contract through the public headers; the src/ headers are implementation detail. Don't grow a second public-ish surface in src/.
Tier 4 — Core utilities (src/core/)
Foundational data structures. They enforce two project rules at the type level:
no global state (everything takes an explicit Heap* / Arena* /
Compiler*) and no VLAs.
| Header | Purpose | Takes explicit allocator? | Public mirror |
|---|---|---|---|
| core.h | Type aliases, Compiler struct, panic/defer machinery | Compiler holds context | partial, via include/kit/core.h |
| arena.h | Bump allocator; reset frees all | Heap* | include/kit/support/arena.h (narrowed) |
| pool.h | Symbol interning (Sym canonical IDs) | Heap* | - |
| buf.h | Chunked byte buffer with patch/seek | Heap* | - |
| vec.h | Doubling-growth vector (VEC_GROW) | Heap* | - |
| segvec.h | Segmented append-only array, stable pointers (SEGVEC_DEFINE) | Heap* | - |
| hashmap.h | Alias to the public template | n/a | include/kit/support/hashmap.h |
| heap.h | Heap abstraction + JIT exec-mmap helper | wraps KitHeap | KitHeap in core.h |
| strbuf.h | Bounded text builder, caller-owned buffer | none (caller buffer) | - |
| slice.h | Fat-pointer byte view (alias of KitSlice) | Arena* for dup | KitSlice in core.h |
| bytes.h | LE/BE integer serialize helpers | none | - |
| diag.h | Diagnostic-sink convenience wrappers | none (delegates) | - |
| metrics.h | Telemetry dispatch to optional callbacks | none (reads Compiler) | - |
| sha256.h | Streaming SHA-256 | none | - |
| util.h | MIN/MAX/ALIGN_*/CONTAINER_OF macros | none | - |
Core-tier notes:
- The public mirrors (arena, hashmap, parts of core) deliberately expose a narrowed surface, not the full internal one, and they are allowed to diverge. When changing an internal utility that has a mirror, decide explicitly whether the public mirror moves too.
- These are the foundation for the no-global-state rule: any new core utility that reaches for a static or a global is a red flag.
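As an illustration of a core utility that takes its allocator explicitly, here is a minimal single-chunk bump allocator. DemoHeap and DemoArena are hypothetical and far simpler than whatever src/core/arena.h actually does (a real arena would presumably chain chunks); the sketch only shows the explicit Heap* dependency and the "reset frees all" behaviour listed in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for Heap and Arena; kit's real layouts differ. */
typedef struct DemoHeap {
    void *(*alloc)(void *user, size_t size);
    void  (*release)(void *user, void *ptr);
    void  *user;
} DemoHeap;

typedef struct DemoArena {
    const DemoHeap *heap;   /* explicit allocator, never a global */
    unsigned char  *base;
    size_t          cap, used;
} DemoArena;

/* A trivial host heap backed by malloc/free. */
static void *demo_malloc(void *user, size_t n) { (void)user; return malloc(n); }
static void  demo_free(void *user, void *p)    { (void)user; free(p); }

/* Returns nonzero on success. */
static int demo_arena_init(DemoArena *a, const DemoHeap *heap, size_t cap) {
    assert(a && heap && cap > 0);
    a->heap = heap;
    a->base = heap->alloc(heap->user, cap);
    a->cap  = cap;
    a->used = 0;
    return a->base != NULL;
}

/* Bump the cursor. align must be a power of two no larger than the heap's
 * own alignment guarantee. Returns NULL when the chunk is exhausted. */
static void *demo_arena_alloc(DemoArena *a, size_t size, size_t align) {
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->cap || size > a->cap - start) return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Reset releases every allocation at once by rewinding the cursor. */
static void demo_arena_reset(DemoArena *a) { a->used = 0; }

static void demo_arena_destroy(DemoArena *a) {
    a->heap->release(a->heap->user, a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}
```

A narrowed public mirror of such a type could expose only init, alloc and reset while keeping the struct layout private, which is one way the internal and public surfaces can diverge on purpose.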
Tier 5 — Frontend <-> library boundary
Frontends live in lang/, are API consumers, and register per-KitCompiler.
Each implements KitFrontendVTable (include/kit/compile.h): a constructor,
a compile function that turns a source slice into a KitObjBuilder, a
destructor, and the list of file extensions it claims. A frontend is registered
with kit_register_frontend(compiler, language, vtable);
src/api/lang_registry.h::lang_registry_init auto-wires the enabled
KIT_LANG_* frontends at compiler construction.
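The registration flow described above could be sketched roughly as follows. None of the Demo* types are kit's: the real KitFrontendVTable lives in include/kit/compile.h and its exact fields and signatures are not reproduced here, and this sketch drops the language argument that kit_register_frontend takes. It only illustrates the four-part vtable (constructor, compile, destructor, extension list) and a registry stored on the compiler rather than in a global.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the public slice and object-builder types. */
typedef struct DemoSlice { const char *ptr; size_t len; } DemoSlice;
typedef struct DemoObjBuilder { size_t bytes_seen; } DemoObjBuilder;

typedef struct DemoFrontendVTable {
    void *(*create)(void);                                  /* constructor */
    int   (*compile)(void *self, DemoSlice src, DemoObjBuilder *out);
    void  (*destroy)(void *self);                           /* destructor */
    const char *const *extensions;   /* NULL-terminated list, e.g. ".toy" */
} DemoFrontendVTable;

enum { DEMO_MAX_FRONTENDS = 4 };

/* Registration state lives on the compiler instance. */
typedef struct DemoCompiler {
    const DemoFrontendVTable *frontends[DEMO_MAX_FRONTENDS];
    size_t count;
} DemoCompiler;

static int demo_register_frontend(DemoCompiler *c, const DemoFrontendVTable *vt) {
    assert(c && vt && vt->compile && vt->extensions);
    if (c->count == DEMO_MAX_FRONTENDS) return -1;
    c->frontends[c->count++] = vt;
    return 0;
}

/* Find the frontend that claims a file extension, or NULL. */
static const DemoFrontendVTable *demo_frontend_for(const DemoCompiler *c,
                                                   const char *ext) {
    for (size_t i = 0; i < c->count; i++)
        for (const char *const *e = c->frontends[i]->extensions; *e; e++)
            if (strcmp(*e, ext) == 0) return c->frontends[i];
    return NULL;
}

/* A do-nothing "toy" frontend used to exercise the registry. */
static void *toy_create(void) { return NULL; }
static int toy_compile(void *self, DemoSlice src, DemoObjBuilder *out) {
    (void)self;
    out->bytes_seen = src.len;   /* pretend the source was lowered */
    return 0;
}
static void toy_destroy(void *self) { (void)self; }
static const char *const toy_exts[] = { ".toy", NULL };
static const DemoFrontendVTable toy_vtable = {
    toy_create, toy_compile, toy_destroy, toy_exts
};
```

Registering toy_vtable on one DemoCompiler and looking up ".toy" returns the vtable, while a second compiler instance stays empty; that per-instance behaviour is what "register per-KitCompiler" implies.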
What frontends consume is overwhelmingly the public API, reached through a
single front door: <kit/frontend.h> re-exports cg.h, compile.h,
source.h, and support/arena.h (and transitively core.h + object.h), so a
frontend TU includes just that one header. The remaining direct public includes
are the ones outside the facade — support/hashmap.h for hashed tables, and
preprocess.h / jit.h / wasm.h where a specific frontend or runner needs
them — plus the single documented frontend→subsystem edge noted in the boundary
map (the wasm frontend reaching src/wasm/wasm.h).
| Frontend | Public entry | Notable internal headers |
|---|---|---|
| C (lang/c/) | kit_c_frontend_vtable (c.h) | none in src/; type/, decl/, sem/, parse/, abi/ are frontend-local modules over the public CG API |
| cpp (lang/cpp/) | shared by C; pp/pp.h, lex/lex.h | cpp_support.h, pp/pp_priv.h |
| toy (lang/toy/) | kit_toy_frontend_vtable (toy.h) | internal.h, lexer.h |
| wasm (lang/wasm/) | kit_wasm_frontend_vtable (wasm.h) | src/wasm/wasm.h (the shared module-model subsystem); the one frontend→src/ edge |
Frontend-tier notes:
- lang/c/parse/cg_adapter.h is the C parser's codegen adapter: it wraps cg.h with C-semantic sugar (lvalue auxiliaries, the type stack, pcg_* helpers). It is the real coupling point between the C parser and codegen, and it is worth understanding when working in either: it carries the C frontend's own policy on top of the generic cg.h, so a change to one frequently belongs in the other.
- lang/wasm/wasm.h exposes kit_wasm_wat_to_wasm(), a WAT-to-wasm helper living in a frontend-public header, used by wasm tooling/tests.
Interface review checklist
Apply this to any interface (header / vtable) you add or change. Tier-1 (public) and Tier-2 (backend contract) changes warrant the full list; lower tiers can be lighter.
Boundary & layering
- Header lives at the right tier; consumers at the correct layer can reach it.
- No layering violation: the driver uses only <kit/*.h>; subsystems don't include each other's *_internal.h; a frontend reaches into src/ only via the one documented edge (the wasm frontend → src/wasm/wasm.h).
- Format/arch/OS specifics stay behind their dispatch vtable (ObjFormatImpl, ArchImpl, the codegen *Target structs), not leaked above it.
- If a public mirror exists (arena/hashmap/core), the divergence from the internal version is intentional and documented.
Surface shape
- Opaque handles where the consumer shouldn't see layout; concrete structs only where the layout is the contract.
- Minimal surface — no entry points added "just in case"; each has a caller.
- Naming consistent with the tier (kit_* public; subsystem-prefixed internal).
- Enums/flags are explicitly valued where they cross a format/wire boundary.
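Two of the surface-shape items, opaque handles and explicitly valued enums, are easy to show side by side. The sketch below folds a would-be header and its implementation into one listing; DemoSession and DemoSectionKind are hypothetical names, and calloc stands in for the explicit heap a real kit API would take.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* ---- what a public header would show (hypothetical names) ---- */

typedef struct DemoSession DemoSession;   /* opaque: layout is not visible */

/* Values are pinned because they would be written into a file format;
 * inserting a new enumerator can never renumber existing ones. */
typedef enum DemoSectionKind {
    DEMO_SEC_TEXT = 1,
    DEMO_SEC_DATA = 2,
    DEMO_SEC_BSS  = 3
} DemoSectionKind;

DemoSession *demo_session_new(void);
void         demo_session_free(DemoSession *s);
int          demo_session_add(DemoSession *s, DemoSectionKind kind);
uint32_t     demo_session_count(const DemoSession *s, DemoSectionKind kind);

/* ---- what stays in the implementation file ---- */

struct DemoSession { uint32_t counts[4]; };   /* indexed by DemoSectionKind */

/* calloc is a placeholder; a real API would take an explicit heap. */
DemoSession *demo_session_new(void) { return calloc(1, sizeof(DemoSession)); }
void demo_session_free(DemoSession *s) { free(s); }

/* Returns 0 on success, -1 for a kind outside the pinned range. */
int demo_session_add(DemoSession *s, DemoSectionKind kind) {
    assert(s);
    if (kind < DEMO_SEC_TEXT || kind > DEMO_SEC_BSS) return -1;
    s->counts[kind]++;
    return 0;
}

uint32_t demo_session_count(const DemoSession *s, DemoSectionKind kind) {
    assert(s && kind >= DEMO_SEC_TEXT && kind <= DEMO_SEC_BSS);
    return s->counts[kind];
}
```

Because consumers only ever see the forward declaration, the counts array could later become a hash map without any caller recompiling against a new layout.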
State & ownership
- No global/static state: context threaded via Compiler/Heap/Arena/KitContext (project rule).
- Allocation ownership is explicit: who allocates, who frees, lifetime rules.
- No VLAs (project rule).
- Borrowed vs. owned bytes (KitSlice etc.) documented at the boundary.
Errors & contracts
- Errors reported via KitStatus / the diag sink, not ad-hoc returns; failure modes documented.
- Pre/postconditions and ordering constraints stated (e.g. obj_finalize before read-side queries; func_begin/func_end pairing).
- No magic numbers: shared constants promoted to a header (project rule).
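One way to make an ordering constraint checkable rather than merely documented is to track the state in the handle and report violations through a status code. The sketch below does that for a begin/end pairing. DemoStatus, DemoCg and both functions are hypothetical; only the pairing rule itself is taken from the checklist, and the real KitStatus values are not shown in this document.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes, explicitly valued. */
typedef enum DemoStatus {
    DEMO_OK        = 0,
    DEMO_ERR_ORDER = 1   /* call made in the wrong state */
} DemoStatus;

/* Hypothetical codegen handle carrying just enough state for the check. */
typedef struct DemoCg {
    bool     in_func;
    unsigned funcs_done;
} DemoCg;

/* Precondition: no function is open.
 * Postcondition on DEMO_OK: exactly one function is open. */
static DemoStatus demo_func_begin(DemoCg *cg) {
    assert(cg);
    if (cg->in_func) return DEMO_ERR_ORDER;
    cg->in_func = true;
    return DEMO_OK;
}

/* Precondition: a function is open.
 * Postcondition on DEMO_OK: none is open and the count has grown by one. */
static DemoStatus demo_func_end(DemoCg *cg) {
    assert(cg);
    if (!cg->in_func) return DEMO_ERR_ORDER;
    cg->in_func = false;
    cg->funcs_done++;
    return DEMO_OK;
}
```

Here a null handle is treated as a programming error and asserted, while a wrong call order is a reportable failure; which side of that line each failure mode falls on is exactly what the checklist asks a header to document.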
Vtable / backend contracts (Tier 2)
- Every required hook is implemented and matches its semantics, not just its signature; optional hooks left NULL only where the arch genuinely doesn't need them.
- Semantic vs. physical responsibilities kept on the right side (NativeOps vs. NativeTarget).
- A new hook added to the contract is implemented by all live native backends, or has a documented capability-query fallback (e.g. supports_label_table, machine_op_clobbers returning NULL).
Stability & docs
- Public (Tier-1) change: is it source-compatible? If not, callers are updated in the same change.
- This document's inventory is updated.
- DESIGN.md is updated if the layering narrative changed.