kit Design
kit is a freestanding C11 compiler multi-tool, written in C11. This document
is the front door to the design docs: it states what kit is, the principles
that shape it, the layered architecture, the primary data flows, and an index of
every sibling design doc. It is a map, not a manual — API signatures and struct
layouts live in the headers under include/kit/; per-subsystem detail lives in
the docs indexed at the end.
What kit is
A single multi-call binary (kit) that bundles a complete C toolchain plus the
machinery to JIT, debug, and emulate what it produces. Capabilities:
- C11 preprocessor, single-pass parser/type checker, and code generator.
- A JIT compiler, an in-process runner, and an interactive debugger.
- A linker (objects/archives/DSO inputs -> executable or shared image), with basic linker-script support and file-based incremental linking.
- A standalone assembler (GAS subset) and inline assembler sharing one emitter.
- A lightweight optimizer (a recording IR with SSA, register allocation, and local cleanup behind -O1).
- Cross-compiling backends for aarch64, x86-64, riscv64, and WebAssembly, plus a portable C-source backend.
- Object read/write for ELF, Mach-O, and PE/COFF; a Wasm object form.
- DWARF debug-info production and consumption; a disassembler.
- A user-mode guest-ELF emulator (per-basic-block JIT translation).
- A bytecode interpreter over the optimizer IR (run --no-jit).
- Signed, content-addressed code distribution (.kpkg).
- Object/archive utilities: ar, ranlib, nm, size, strip, objcopy, objdump, addr2line, strings.
Design principles
- Freestanding C11. The compiler builds and runs without a hosted libc; it ships its own headers and runtime so it can compile itself and target bare metal. The implementation obeys the same constraints it imposes on its output.
- No global state. There are no mutable globals or hidden singletons. All state hangs off an explicit context, either a KitCompiler or a subsystem handle (KitObjBuilder, KitLinkSession, KitJit, KitJitSession, KitEmu, frontend state), so the library is reentrant and embeddable.
- The host supplies all side effects. libkit itself touches no OS. The host injects every side effect through vtables: heap, diagnostics, file I/O, metrics, and a clock via KitContext (include/kit/core.h); executable memory and JIT thread-local storage via KitJitHost (jit.h); and debugger OS hooks (threads, events, signal handling, code patching) via KitDbgHost (dbg.h). This is what makes "no global state" enforceable and keeps the library portable across hosts.
- No VLAs. Stack growth is bounded and predictable; dynamic sizing goes through arenas/heaps so freestanding and constrained targets stay safe.
- Strict modular layering. Each layer depends only inward, and subsystem internals are private. The public header set is the entire contract; crossing a boundary the wrong way is a design bug, not a shortcut.
- Multi-arch & multi-platform via vtables, not #ifdef. Architecture, ABI, and object-format variation is expressed as runtime dispatch tables (ArchImpl in src/arch/arch.h, TargetABI in src/abi/abi.h, and ObjFormatImpl in src/obj/format.h) selected from the target triple. New targets are new table instances, not new conditional branches scattered through the tree.
- Build-time component gating. include/kit/config.h defines KIT_*_ENABLED flags for archs, object formats, languages, subsystems, and tools. The build drops disabled units entirely, so a minimal embedding pays only for what it uses, and the tool registry in driver/main.c is gated by the same flags.
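The "vtables, not #ifdef" principle can be illustrated with a minimal sketch. Everything named Demo* below is hypothetical and is not the real ArchImpl layout; the triple spellings are also assumptions. Only the NOP encodings are standard facts about the three instruction sets. The point is that a new target is a new table row rather than a new conditional.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-arch dispatch table entry (not the real ArchImpl). */
typedef struct DemoArch {
    const char *name;   /* arch component of the target triple (assumed spelling) */
    size_t (*emit_nop)(unsigned char *out, size_t cap);
} DemoArch;

/* Copy a fixed encoding into out; returns bytes written, 0 if it does not fit. */
static size_t demo_copy(unsigned char *out, size_t cap,
                        const unsigned char *enc, size_t len) {
    if (cap < len) return 0;
    memcpy(out, enc, len);
    return len;
}

static size_t x86_64_nop(unsigned char *out, size_t cap) {
    static const unsigned char enc[] = { 0x90 };                   /* nop */
    return demo_copy(out, cap, enc, sizeof enc);
}

static size_t aarch64_nop(unsigned char *out, size_t cap) {
    static const unsigned char enc[] = { 0x1f, 0x20, 0x03, 0xd5 }; /* 0xd503201f, LE */
    return demo_copy(out, cap, enc, sizeof enc);
}

static size_t riscv64_nop(unsigned char *out, size_t cap) {
    static const unsigned char enc[] = { 0x13, 0x00, 0x00, 0x00 }; /* addi x0,x0,0 */
    return demo_copy(out, cap, enc, sizeof enc);
}

/* Adding a target means adding a row here, not a new #ifdef elsewhere. */
static const DemoArch demo_arches[] = {
    { "x86_64",  x86_64_nop  },
    { "aarch64", aarch64_nop },
    { "riscv64", riscv64_nop },
};

/* Select the table entry from the first component of a target triple. */
static const DemoArch *demo_arch_for_triple(const char *triple) {
    assert(triple != NULL);
    size_t len = strcspn(triple, "-");
    for (size_t i = 0; i < sizeof demo_arches / sizeof demo_arches[0]; i++) {
        const char *name = demo_arches[i].name;
        if (strlen(name) == len && strncmp(name, triple, len) == 0)
            return &demo_arches[i];
    }
    return NULL;   /* unknown or disabled architecture */
}
```

Callers hold only a const DemoArch pointer and never branch on the architecture themselves, which is the property the principle is after.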
Layered architecture
From outside in, each layer depends only on the layer beneath it:
    driver/        CLI policy + host I/O. Includes ONLY <kit/*.h>.
    lang/          Frontends (c, cpp, toy, wasm). API consumers; ONLY
                   <kit/*.h> + their own private headers.
    include/kit/   PUBLIC BOUNDARY. The library's entire stable contract.
    src/api/       Composition: public handles <-> internal subsystems.
    src/...        Internal subsystems. Share private headers among their own
                   TUs; expose nothing except through include/kit/.
- driver/ implements the multi-call binary. driver/main.c holds the central tool table; each tool (cc, as, ld, ar, run, dbg, emu, cas, pkg, …) translates command-line flags into public API calls and supplies the host vtables. The content-addressed store and .kpkg packaging (tar/deflate/lz4, BLAKE2b, ed25519/minisign) are a libkit subsystem (src/dist/, behind <kit/cas.h> / <kit/package.h>); cas/pkg are thin CLIs over it.
- lang/ holds the frontends. lang/c preprocesses (lang/cpp), parses, type-checks, manages C declarations, and drives the public CG API; lang/toy and lang/wasm are smaller frontends exercising the same boundary. Each registers a KitFrontendVTable per compiler and emits through KitCg; no frontend owns object formats or linker policy.
- include/kit/ is the public boundary: the only headers driver/ and lang/ may include.
- src/api/ is the composition layer: it implements the public handles (src/api/compile.c, link.c, object_builder.c, archive.c, disasm.c, …) and wires enabled frontends (src/api/lang_registry.c). It is the single place where public types meet internal subsystems.
- src/ subsystems do the work: core (arenas, vectors, buffers, symbol interning, diagnostics, hashing), abi, arch, asm, cg, opt, obj, link, jit, dbg, emu, interp, debug (DWARF), wasm, os, and dist (content-addressed store + signed .kpkg packaging).
The layering invariant: driver/ and lang/ include only <kit/*.h> —
never a src/ header. Anything a frontend or tool needs is promoted into the
public headers; reaching into src/ is a layering violation. Subsystem
*_internal.h headers stay private to their own translation units.
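As an illustration of the central tool table and its build-time gating, here is a minimal sketch. The DEMO_* flags, the DemoTool type, and the stub entry points are hypothetical stand-ins for the KIT_*_ENABLED flags and whatever driver/main.c actually contains. How the tool name reaches the lookup (argv[0] or a subcommand) is not assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical gating flags; the real ones are the KIT_*_ENABLED macros. */
#define DEMO_TOOL_CC_ENABLED  1
#define DEMO_TOOL_LD_ENABLED  1
#define DEMO_TOOL_EMU_ENABLED 0   /* disabled: compiled out of the table */

typedef int (*DemoToolMain)(int argc, char **argv);

typedef struct DemoTool {
    const char  *name;
    DemoToolMain run;
} DemoTool;

/* Stub entry points; a real tool would turn flags into public API calls. */
#if DEMO_TOOL_CC_ENABLED
static int demo_cc_main(int argc, char **argv) { (void)argc; (void)argv; return 0; }
#endif
#if DEMO_TOOL_LD_ENABLED
static int demo_ld_main(int argc, char **argv) { (void)argc; (void)argv; return 0; }
#endif
#if DEMO_TOOL_EMU_ENABLED
static int demo_emu_main(int argc, char **argv) { (void)argc; (void)argv; return 0; }
#endif

/* The registry is gated by the same flags that gate the tool code itself. */
static const DemoTool demo_tools[] = {
#if DEMO_TOOL_CC_ENABLED
    { "cc", demo_cc_main },
#endif
#if DEMO_TOOL_LD_ENABLED
    { "ld", demo_ld_main },
#endif
#if DEMO_TOOL_EMU_ENABLED
    { "emu", demo_emu_main },
#endif
};

/* Look a tool up by name; returns NULL for unknown or disabled tools. */
static const DemoTool *demo_find_tool(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof demo_tools / sizeof demo_tools[0]; i++)
        if (strcmp(demo_tools[i].name, name) == 0)
            return &demo_tools[i];
    return NULL;
}
```

A disabled tool is indistinguishable from an unknown one at run time, which matches the idea that disabled units are dropped entirely.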
Key abstractions
- KitCg (include/kit/cg.h) is the frontend-facing code-generation API: a typed stack-machine IR over which all frontends emit functions, control flow, data, calls, and inline asm. It is the largest public contract and the point frontends couple to hardest.
- Tiered backend. A CgTarget (src/cg/cgtarget.h) receives the lowered CG stream. At -O0 a shared NativeDirectTarget adapts the physical NativeTarget (src/arch/native_target.h) directly; at -O1 the optimizer wrapper (src/opt/) records IR, runs its passes, then replays into the same NativeTarget. Physical machine bytes flow through one arch-neutral MCEmitter (src/arch/mc.h).
- ObjBuilder (src/obj/obj.h) is the canonical in-memory object model during compilation, assembly, linking, JIT, inspection, and DWARF emission: one section/symbol/relocation store, with format knowledge behind ObjFormatImpl.
- Symbols. KitSym is an interned spelling (an identity, not a definition). Object builders use object-scoped symbol ids so locals from different objects never collide; the linker builds a separate resolved-symbol table across all inputs.
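To make "an interned spelling is an identity, not a definition" concrete, here is a minimal interning sketch. DemoSym and DemoInterner are hypothetical and say nothing about how KitSym is actually represented; the linear search and fixed capacities are simplifications chosen to keep the example short (a real interner would typically hash).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { DEMO_MAX_SYMS = 64, DEMO_MAX_CHARS = 1024 };

typedef uint32_t DemoSym;   /* 0 is reserved for "no symbol" */

/* All state lives in the interner passed by the caller; nothing is global. */
typedef struct DemoInterner {
    char     chars[DEMO_MAX_CHARS];   /* NUL-terminated spellings, back to back */
    uint32_t offset[DEMO_MAX_SYMS];   /* offset[id - 1] is where spelling id starts */
    uint32_t count;                   /* number of interned spellings */
    uint32_t used;                    /* bytes of chars[] in use */
} DemoInterner;

/* Return the id for a spelling, interning it on first sight; 0 if full. */
static DemoSym demo_intern(DemoInterner *in, const char *s) {
    assert(in != NULL && s != NULL);
    for (uint32_t i = 0; i < in->count; i++)
        if (strcmp(in->chars + in->offset[i], s) == 0)
            return i + 1;                      /* same spelling, same identity */
    size_t len = strlen(s) + 1;
    if (in->count == DEMO_MAX_SYMS || len > DEMO_MAX_CHARS - in->used)
        return 0;
    memcpy(in->chars + in->used, s, len);
    in->offset[in->count] = in->used;
    in->used += (uint32_t)len;
    return ++in->count;
}

/* Map an id back to its spelling; NULL for an invalid id. */
static const char *demo_spelling(const DemoInterner *in, DemoSym sym) {
    assert(in != NULL);
    if (sym == 0 || sym > in->count) return NULL;
    return in->chars + in->offset[sym - 1];
}
```

An id obtained this way says only "this name"; whether the name is defined, and where, is a separate question answered by an object's symbol table or by the linker.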
Primary data flows
1. C source -> object
    driver cc -> KitContext + KitCompiler -> kit_compile_* (src/api/compile.c)
      -> registered C frontend (lang/c): lex -> preprocess -> parse/type/decl
      -> KitCg (public CG API)
      -> CgTarget ( -O0 NativeDirect | -O1 opt wrapper )
      -> NativeTarget -> MCEmitter
      -> ObjBuilder -> object writer
The driver loads source bytes and picks options; src/api/compile.c creates an
ObjBuilder and dispatches to the frontend registered for the input language.
Assembly (.s) takes a shortcut: the asm frontend feeds the MCEmitter/
ObjBuilder path directly, bypassing KitCg because it is already
target-level.
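The -O0 versus -O1 split in this flow can be sketched as two implementations of one sink interface: a direct sink, and a recording wrapper that buffers operations, runs a pass, and replays into the same sink. All Demo* names are hypothetical; dropping no-ops stands in for the optimizer's real passes, and the op set is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_PUSH, OP_ADD, OP_NOP } DemoOpKind;
typedef struct { DemoOpKind kind; int imm; } DemoOp;

enum { DEMO_MAX_OPS = 32 };

/* Hypothetical sink interface, playing the role the text gives CgTarget. */
typedef struct DemoTarget {
    void (*emit)(struct DemoTarget *self, DemoOp op);
    void (*finish)(struct DemoTarget *self);
} DemoTarget;

/* Stand-in for the physical target: it just collects what it is given. */
typedef struct { DemoTarget base; DemoOp out[DEMO_MAX_OPS]; size_t n; } DemoNative;

static void native_emit(DemoTarget *t, DemoOp op) {
    DemoNative *nat = (DemoNative *)t;   /* valid: base is the first member */
    if (nat->n < DEMO_MAX_OPS) nat->out[nat->n++] = op;
}
static void native_finish(DemoTarget *t) { (void)t; }

static void demo_native_init(DemoNative *nat) {
    nat->base.emit = native_emit;
    nat->base.finish = native_finish;
    nat->n = 0;
}

/* Recording wrapper: buffer first, then "optimize" and replay on finish. */
typedef struct {
    DemoTarget base;
    DemoTarget *inner;
    DemoOp buf[DEMO_MAX_OPS];
    size_t n;
} DemoRecorder;

static void rec_emit(DemoTarget *t, DemoOp op) {
    DemoRecorder *r = (DemoRecorder *)t;
    if (r->n < DEMO_MAX_OPS) r->buf[r->n++] = op;
}
static void rec_finish(DemoTarget *t) {
    DemoRecorder *r = (DemoRecorder *)t;
    for (size_t i = 0; i < r->n; i++)
        if (r->buf[i].kind != OP_NOP)            /* the only "pass": drop no-ops */
            r->inner->emit(r->inner, r->buf[i]);
    r->inner->finish(r->inner);
    r->n = 0;
}

static void demo_recorder_init(DemoRecorder *r, DemoTarget *inner) {
    assert(inner != NULL);
    r->base.emit = rec_emit;
    r->base.finish = rec_finish;
    r->inner = inner;
    r->n = 0;
}

/* A frontend emits the same stream no matter which tier is plugged in. */
static void demo_frontend(DemoTarget *t) {
    t->emit(t, (DemoOp){ OP_PUSH, 2 });
    t->emit(t, (DemoOp){ OP_NOP, 0 });
    t->emit(t, (DemoOp){ OP_PUSH, 3 });
    t->emit(t, (DemoOp){ OP_ADD, 0 });
    t->finish(t);
}
```

In this sketch the frontend cannot tell which tier it is talking to, and both tiers end in the same physical sink, which mirrors the shape of the diagram above.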
2. File link -> executable
    objects / object bytes / archives / DSO inputs
      -> kit_link_* (src/api/link.c -> src/link/)
      -> object/archive readers -> symbol resolution -> layout
      -> relocation -> executable (or incremental patch) writer
The linker owns archive member selection, symbol resolution, section/segment
layout, relocation (per-arch fixups behind ArchImpl), build/image-id handling,
and final emission, for any enabled object format.
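The core of that pipeline (layout, symbol resolution, relocation) can be shown in miniature. This is a deliberately simplified sketch with hypothetical Demo* types: one section per object, global symbols only, and a single made-up relocation kind that writes a 32-bit little-endian absolute address. Archive selection, duplicate-definition handling, and real per-arch fixups are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const char *name; int defined; uint32_t offset; } DemoSymbol;
typedef struct { uint32_t offset; uint32_t sym; } DemoReloc;   /* sym is object-scoped */

typedef struct {
    uint8_t          *bytes;  uint32_t size;
    const DemoSymbol *syms;   size_t   nsyms;
    const DemoReloc  *relocs; size_t   nrelocs;
    uint32_t          base;   /* assigned by layout */
} DemoObject;

/* Layout: place objects back to back; returns the end address of the image. */
static uint32_t demo_layout(DemoObject *objs, size_t n, uint32_t image_base) {
    assert(objs != NULL);
    uint32_t addr = image_base;
    for (size_t i = 0; i < n; i++) {
        objs[i].base = addr;
        addr += objs[i].size;
    }
    return addr;
}

/* Resolution: find the first definition of name across all inputs. */
static int demo_resolve(const DemoObject *objs, size_t n,
                        const char *name, uint32_t *addr) {
    for (size_t i = 0; i < n; i++)
        for (size_t s = 0; s < objs[i].nsyms; s++) {
            const DemoSymbol *sym = &objs[i].syms[s];
            if (sym->defined && strcmp(sym->name, name) == 0) {
                *addr = objs[i].base + sym->offset;
                return 0;
            }
        }
    return -1;   /* undefined symbol */
}

/* Relocation: patch every site with the resolved address. Returns 0 on success. */
static int demo_relocate(DemoObject *objs, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t r = 0; r < objs[i].nrelocs; r++) {
            const DemoReloc *rel = &objs[i].relocs[r];
            uint32_t addr;
            if (rel->sym >= objs[i].nsyms) return -1;
            if (rel->offset > objs[i].size || objs[i].size - rel->offset < 4)
                return -1;                       /* patch site out of bounds */
            if (demo_resolve(objs, n, objs[i].syms[rel->sym].name, &addr) != 0)
                return -1;
            uint8_t *p = objs[i].bytes + rel->offset;
            p[0] = (uint8_t)(addr & 0xff);
            p[1] = (uint8_t)((addr >> 8) & 0xff);
            p[2] = (uint8_t)((addr >> 16) & 0xff);
            p[3] = (uint8_t)((addr >> 24) & 0xff);
        }
    return 0;
}
```

Note that a relocation names its symbol by an index into its own object's table, and only resolution crosses object boundaries; that is the same separation the Symbols entry under Key abstractions describes.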
3. Run / JIT in-process
    source/object inputs -> compile/link to a JIT LinkImage
      -> kit_link_jit (KitExecMem from KitJitHost maps + protects pages)
      -> KitJit / KitJitSession
      -> run (invoke entry) | dbg (breakpoints, stepping, regs/mem via KitDbgHost)
The JIT shares the same compile/object/relocation machinery as file output; only
the final sink differs. Mapping executable memory and installing TLS are
delegated to the host through KitJitHost. run --no-jit instead attaches a
bytecode InterpProgram (src/interp/) and executes the entry through the
interpreter while still using the JIT image for real data/extern addresses.
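For readers unfamiliar with the interpreter path, the following sketch shows the general shape of a bytecode dispatch loop. It is not kit's InterpProgram format: the instruction set, register count, and step limit are all invented for illustration, and the coupling to a JIT image for data and extern addresses is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { BC_CONST, BC_ADD, BC_MUL, BC_JNZ, BC_RET } DemoBcOp;

typedef struct {
    DemoBcOp op;
    uint8_t  dst, a, b;   /* register indices */
    uint64_t imm;         /* constant for BC_CONST, target pc for BC_JNZ */
} DemoInsn;

enum { DEMO_NREGS = 8 };

/* Run code with arg in r0. Returns 0 and stores the result on BC_RET;
 * returns -1 on a malformed program or when max_steps is exhausted.
 * Registers are unsigned so arithmetic wraps instead of overflowing. */
static int demo_interp(const DemoInsn *code, size_t n, uint64_t arg,
                       uint64_t *result, size_t max_steps) {
    assert(code != NULL && result != NULL);
    uint64_t r[DEMO_NREGS] = {0};
    size_t pc = 0;
    r[0] = arg;
    for (size_t step = 0; step < max_steps; step++) {
        if (pc >= n) return -1;                      /* ran off the end */
        const DemoInsn *i = &code[pc++];
        if (i->dst >= DEMO_NREGS || i->a >= DEMO_NREGS || i->b >= DEMO_NREGS)
            return -1;
        switch (i->op) {
        case BC_CONST: r[i->dst] = i->imm; break;
        case BC_ADD:   r[i->dst] = r[i->a] + r[i->b]; break;
        case BC_MUL:   r[i->dst] = r[i->a] * r[i->b]; break;
        case BC_JNZ:
            if (r[i->a] != 0) {
                if (i->imm >= n) return -1;          /* bad branch target */
                pc = (size_t)i->imm;
            }
            break;
        case BC_RET:   *result = r[i->a]; return 0;
        default:       return -1;
        }
    }
    return -1;                                       /* step limit reached */
}
```

A small program that multiplies an accumulator by a counter while decrementing the counter computes a factorial with this loop; adding UINT64_MAX serves as subtracting one under wrapping arithmetic.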
4. Emulate a guest ELF
    guest ELF bytes -> emu ELF loader (src/emu/)
      -> decode/lift guest basic blocks
      -> CgTarget -> JIT image
      -> emu runtime (syscall + memory model)
The emulator is a user-mode ELF runner that translates guest basic blocks into the same backend/JIT infrastructure used for native JIT, executing them under a guest memory and syscall model.
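The per-basic-block structure can be sketched with a toy guest. Everything here is hypothetical: the three-instruction guest ISA, the DemoEmu type, and the direct-mapped block cache. "Translation" in this sketch only discovers block boundaries and caches them, whereas the emulator described above lifts blocks into the backend and runs JIT-compiled code; guest memory and syscalls are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { G_ADDI, G_BNEZ, G_HALT } GuestOp;
typedef struct { GuestOp op; int reg; int imm; } GuestInsn;  /* imm: addend or target */

enum { DEMO_NREGS = 4, DEMO_CACHE_SLOTS = 16 };

/* A "translated" block: [start, end) ends with its terminator instruction. */
typedef struct { size_t start, end; int valid; } DemoBlock;

typedef struct {
    const GuestInsn *code;
    size_t           ncode;
    int64_t          regs[DEMO_NREGS];
    DemoBlock        cache[DEMO_CACHE_SLOTS];   /* direct-mapped by guest pc */
    size_t           translations;              /* how many blocks were built */
} DemoEmu;

/* Return the cached block starting at pc, translating it on a miss. */
static const DemoBlock *demo_translate(DemoEmu *e, size_t pc) {
    DemoBlock *b = &e->cache[pc % DEMO_CACHE_SLOTS];
    if (b->valid && b->start == pc) return b;          /* cache hit */
    size_t end = pc;
    while (end < e->ncode && e->code[end].op == G_ADDI) end++;
    if (end == e->ncode) return NULL;                  /* no terminator found */
    b->start = pc;
    b->end = end + 1;                                  /* include the terminator */
    b->valid = 1;
    e->translations++;
    return b;
}

/* Run from pc until G_HALT. Returns 0 on halt, -1 on error or block limit. */
static int demo_run(DemoEmu *e, size_t pc, size_t max_blocks) {
    assert(e != NULL && e->code != NULL);
    for (size_t n = 0; n < max_blocks; n++) {
        if (pc >= e->ncode) return -1;
        const DemoBlock *b = demo_translate(e, pc);
        if (b == NULL) return -1;
        for (size_t i = b->start; i < b->end; i++) {
            const GuestInsn *g = &e->code[i];
            if (g->reg < 0 || g->reg >= DEMO_NREGS) return -1;
            switch (g->op) {
            case G_ADDI: e->regs[g->reg] += g->imm; break;
            case G_BNEZ: pc = e->regs[g->reg] != 0 ? (size_t)g->imm : i + 1; break;
            case G_HALT: return 0;
            }
        }
    }
    return -1;
}
```

A loop body that executes several times is translated once and then served from the cache, which is the main reason to work at basic-block granularity.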
State and ownership
The host owns storage and side effects (heap, file I/O, executable memory, TLS,
debugger OS hooks); libkit owns compilation, object construction, linking, JIT
layout, and relocation policy. Public APIs take explicit options and handles;
internal state hangs off KitCompiler, the subsystem handles, or frontend
context structs. Compile inputs are caller-owned byte buffers that must outlive
the call; builders returned by compile are owned by the compiler until consumers
finish; object/archive/DSO bytes handed to link calls are borrowed for the call
unless an API states otherwise.
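The ownership split can be sketched as follows: the host provides storage and diagnostics through function pointers, and the library keeps its own bookkeeping on an explicit context. DemoHost and DemoCtx are hypothetical and do not reflect the actual KitContext layout; the malloc-backed host is just one possible embedder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical host vtable: every side effect is supplied by the embedder. */
typedef struct DemoHost {
    void *(*alloc)(void *user, size_t size);
    void  (*release)(void *user, void *ptr);
    void  (*diag)(void *user, const char *msg);
    void   *user;   /* opaque host state, passed back verbatim */
} DemoHost;

/* Library-side state hangs off an explicit context; there are no globals. */
typedef struct DemoCtx {
    const DemoHost *host;
    size_t          live_allocs;
} DemoCtx;

/* The library never calls malloc itself; it asks the host. */
static void *demo_alloc(DemoCtx *ctx, size_t size) {
    assert(ctx != NULL && ctx->host != NULL);
    void *p = ctx->host->alloc(ctx->host->user, size);
    if (p != NULL) ctx->live_allocs++;
    else ctx->host->diag(ctx->host->user, "out of memory");
    return p;
}

static void demo_free(DemoCtx *ctx, void *ptr) {
    assert(ctx != NULL && ctx->host != NULL);
    if (ptr == NULL) return;
    ctx->host->release(ctx->host->user, ptr);
    ctx->live_allocs--;
}

/* One possible host: forwards to malloc/free and counts diagnostics. */
typedef struct HostedState { int diag_count; } HostedState;

static void *hosted_alloc(void *user, size_t size) { (void)user; return malloc(size); }
static void  hosted_release(void *user, void *ptr) { (void)user; free(ptr); }
static void  hosted_diag(void *user, const char *msg) {
    (void)msg;
    ((HostedState *)user)->diag_count++;
}
```

Two contexts built over the same host stay independent of each other, which is what makes this style reentrant; a freestanding embedder would supply a different DemoHost without touching the library side.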
Documentation index
| Doc | Covers |
|---|---|
| DESIGN.md | This map: what kit is, principles, layering, data flows, index. |
| INTERFACES.md | Interface inventory and review checklist across all tiers (public, backend, subsystem, core, frontend). |
| FRONTENDS.md | The lang/ frontends — C (preprocess/parse/type/decl), cpp, toy, wasm — and the frontend vtable contract. |
| CODEGEN.md | The KitCg public CG API and the tiered CgTarget -> NativeDirect/opt -> NativeTarget lowering path. |
| IR.md | The recording/optimizer IR: instructions, types, and how CG operations become analyzable functions. |
| ARCH.md | Per-arch backends (aarch64/x86-64/riscv64), ArchImpl dispatch, MCEmitter, register files, and fixups. |
| ASM.md | The standalone + inline assembler, GAS-subset syntax, and the shared emitter. |
| OPT.md | The -O1 optimizer: SSA construction, register allocation, combine/DCE, and replay into the backend. |
| INTERPRETER.md | The bytecode interpreter over the optimizer IR used by run --no-jit. |
| OBJ.md | The format-neutral object model and ELF/Mach-O/COFF/Wasm read/write behind ObjFormatImpl. |
| LINK.md | Linking: symbol resolution, layout, relocation, linker scripts, and incremental linking. |
| JIT.md | The JIT image model, executable-memory and TLS host hooks, and publish/append/replace. |
| EMU.md | The user-mode guest-ELF emulator and its per-block JIT translation. |
| DWARF.md | DWARF debug-info production and the consumer used by the debugger and dumpers. |
| DBG.md | The debugger: breakpoints, single-step, displaced execution, register/memory access. |
| CBACKEND.md | The portable C-source backend (src/arch/c_target/). |
| WASM.md | The WebAssembly backend, object form, and host-import binding. |
| DISTRIBUTE.md | Signed .kpkg packaging and the content-addressed store (src/dist/, <kit/cas.h> / <kit/package.h>, cas/pkg tools). |
| DRIVER.md | The multi-call binary, tool registry, and command-line policy. |
| RUNTIME.md | The freestanding headers and compiler-rt/libc-style support in rt/. |
| BUILD.md | The build system and KIT_*_ENABLED component gating. |
| TESTING.md | The test suites and harnesses under test/. |
| CODE_SIZE.md | Line counts per component (per-format/per-target split from core). |
Planned work and roadmaps live under doc/plan/.