kit

git clone https://git.ryansepassi.com/git/kit.git

Build & Configuration

This document describes how kit is built and configured: the make targets that produce the library, binary, and runtime; the compile-time component-gating model that lets a build include only the architectures, formats, frontends, subsystems, and tools it needs; the small set of choke points where those gates are honored; how output is made reproducible; and the staged self-build (bootstrap). It is a map of the build architecture, not a recipe list — see DESIGN.md for the system overview and RUNTIME.md for the runtime library it links against.

Three products, one tree

The source tree compiles into three outputs:

libkit.a                       the engine (public + internal C)
kit                            the multi-call driver binary
rt/<variant>/libkit_rt.a       compiler-rt/libc support, per target variant

The first two are direct make targets (make lib, make bin); the runtime is not a standalone user target. Its variants are produced as a dependency of the self-host path — the freshly built kit compiles each libkit_rt.a (RT_CC = $(BIN) cc) — so building the runtime always goes through a working driver binary.

Layering is enforced by include paths, not convention.

make all builds lib + bin. The driver binary links the static libkit.a; bin also drops a support/rt symlink so the freshly built compiler can find runtime sources at run time.

libkit.a is a single relocated object

The library is not a naive ar of every .o. The object set is first combined with ld -r into one relocatable object (build/.../libkit.o), and that is archived. This guarantees the archive is rebuilt wholesale when sources are added or removed (plain ar rcs only adds/updates members and would silently retain a deleted file's object), and it gives the symbol-discipline check (below) a single object to inspect.

Build modes

RELEASE=0 (default) is the development build: -O0 -g3, frame pointers, and ASan+UBSan with halt_on_error. RELEASE=1 is -O2, -DNDEBUG, function/data sections plus dead-strip at link. Mode flags (HOST_OPTFLAGS / HOST_MODE_*FLAGS) live in the root Makefile; they are recorded into build/.../.build-config so that flipping a mode flag forces a rebuild of objects that were produced under the old flags. Build-mode concerns are kept strictly separate from host-environment concerns (next section).
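The .build-config mechanism is a common make idiom. The sketch below shows one way it can work, under assumptions the text does not state: the recorded content is just the flag string, and the file is rewritten only when that string changes. Its mtime then moves (and objects listing it as a prerequisite rebuild) only on a real mode flip. update_config_stamp is a hypothetical helper; kit does this inside the Makefile, not in C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of kit. Records `flags` in the stamp file at
   `path`, rewriting it only when the recorded content differs, so objects
   that depend on the stamp rebuild only after a real mode flip.
   Returns 1 if the stamp was (re)written, 0 if unchanged, -1 on I/O error. */
int update_config_stamp(const char *path, const char *flags) {
    char old[4096];
    FILE *f = fopen(path, "rb");
    if (f != NULL) {
        size_t n = fread(old, 1, sizeof old - 1, f);
        fclose(f);
        old[n] = '\0';
        /* Oversized stamps compare unequal and get rewritten: conservative. */
        if (strcmp(old, flags) == 0)
            return 0;                 /* same flags: leave the mtime alone */
    }
    f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    int ok = fputs(flags, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 1 : -1;
}
```

Writing the same flags twice returns 1 and then 0; switching from "-O0 -g3" to "-O2 -DNDEBUG" returns 1 again.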

RELEASE=1's -DNDEBUG also compiles out the KIT_TRACE / KIT_LOG* developer tracing (include/kit/trace.h): each trace point expands to nothing and its arguments are not evaluated. Define KIT_TRACE_FORCE to keep tracing in an NDEBUG build, which is useful when debugging a release or bootstrap image.

Otherwise tracing is a pure runtime affordance, dormant until the KIT_TRACE env var selects modules and levels: KIT_TRACE=cg=trace,coff=debug, or KIT_TRACE=1 for everything. Each key is matched as a substring against a trace point's module name, which defaults to __FILE__. Tracing rides the same weak-override seam as the rest of the hosted boundary: libkit.a ships no-op kit_trace_enabled / kit_trace_emit and holds no trace state, while the hosted driver/env/common.c supplies the strong overrides that parse KIT_TRACE once and write to stderr. A libkit-only link therefore traces nothing and pays nothing.
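A minimal sketch of the KIT_TRACE selector matching described above. It assumes a level set of off < error < warn < info < debug < trace (only debug and trace appear in the text; the rest are guesses) and that the highest matching level wins when several keys match a module. trace_spec_level is a hypothetical name; in kit this parsing lives in the strong overrides in driver/env/common.c.

```c
#include <assert.h>
#include <string.h>

/* Assumed level ordering; only "debug" and "trace" are named in the text. */
enum trace_level { TL_OFF, TL_ERROR, TL_WARN, TL_INFO, TL_DEBUG, TL_TRACE };

/* Map a level name of length n (not NUL-terminated) to its enum value. */
static enum trace_level level_from_name(const char *s, size_t n) {
    static const char *const names[] = { "off", "error", "warn", "info", "debug", "trace" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strlen(names[i]) == n && strncmp(names[i], s, n) == 0)
            return (enum trace_level)i;
    return TL_OFF;  /* unknown names disable tracing for that entry */
}

/* Level a KIT_TRACE-style spec enables for `module` (e.g. a __FILE__ path).
   "1" enables everything; otherwise each comma-separated "key=level" entry
   applies when key is a substring of the module name, and the highest
   matching level wins (an assumption of this sketch). */
enum trace_level trace_spec_level(const char *spec, const char *module) {
    if (spec == NULL || *spec == '\0')
        return TL_OFF;
    if (strcmp(spec, "1") == 0)
        return TL_TRACE;
    enum trace_level best = TL_OFF;
    const char *p = spec;
    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        const char *eq = memchr(p, '=', len);
        if (eq != NULL) {
            char key[64];
            size_t klen = (size_t)(eq - p);
            if (klen > 0 && klen < sizeof key) {
                memcpy(key, p, klen);
                key[klen] = '\0';
                if (strstr(module, key) != NULL) {
                    enum trace_level lv = level_from_name(eq + 1, len - klen - 1);
                    if (lv > best)
                        best = lv;
                }
            }
        }
        p += len;
        if (*p == ',')
            p++;          /* skip the separator before the next entry */
    }
    return best;
}
```

With KIT_TRACE=cg=trace,coff=debug, a trace point in a file whose path contains "coff" is enabled at debug, one whose path contains "cg" at trace, and everything else stays off.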

Host detection: mk/env.mk

mk/env.mk is the only place in the build that branches on the host OS or arch. It normalizes uname into HOST_OS / HOST_ARCH, resolves the Darwin -isysroot (empty elsewhere, so splicing $(HOST_SYSROOT_*FLAGS) is always safe), and — crucially — selects the exact set of driver/env/*.c files for the host: one per OS, one per arch for icache flush, and on POSIX one per (arch, OS) for ucontext register marshalling. Choosing source files in the build, rather than #ifdef-ing one mega-file, is the explicit design: every hosted adapter TU compiles for exactly one platform. The rest of the Makefile reads env.mk's outputs and never re-derives anything from uname.

Component gating

kit is a large toolchain, but most builds want only a slice of it. Gating lets a build drop whole axes — an arch, an object format, a language frontend, the optimizer, a subsystem, a CLI tool — down to a minimal freestanding library that still links and presents the full public API (gated-out calls return KIT_UNSUPPORTED). The axes are:

arch          AA64  X64  RV64  WASM  C_TARGET
obj-format    ELF   MACHO  COFF  WASM
language      ASM  CPP  C  TOY  WASM
optimizer     OPT          (O1+; O0 direct codegen is always present)
subsystems    AR DISASM DWARF LINK JIT DBG EMU INTERP
tools         CC CHECK CPP AS LD AR RANLIB STRIP OBJCOPY OBJDUMP
              DBG RUN EMU NM SIZE ADDR2LINE STRINGS CAS PKG

One source of truth: config.h

All flags are KIT_<COMPONENT>_ENABLED macros defined in include/kit/config.h. That file is preprocessor-only: every flag expands to a literal 0 or 1 usable from both #if and _Static_assert. The _ENABLED suffix exists so the macros never collide with the public enum constants of the same root (KIT_ARCH_RV64, KIT_OBJ_ELF, KIT_LANG_C are enum values, not gates).
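To make the "literal 0 or 1 usable from both #if and _Static_assert" point concrete, here is a hypothetical excerpt written in the style of include/kit/config.h, next to a public-style enum that shares the same roots. The specific flag values, the enum name KitArchSketch, and its numbering are made up for illustration.

```c
#include <assert.h>

/* Hypothetical config.h-style gates: each is a bare 0 or 1 literal. */
#define KIT_ARCH_AA64_ENABLED 1
#define KIT_ARCH_X64_ENABLED  1
#define KIT_ARCH_RV64_ENABLED 0

/* Public-style enum with the same roots. Because the gates carry the
   _ENABLED suffix, these constants do not collide with the macros above. */
typedef enum { KIT_ARCH_AA64, KIT_ARCH_X64, KIT_ARCH_RV64 } KitArchSketch;

/* Use 1: in #if, to drop code from the compile entirely. */
#if KIT_ARCH_RV64_ENABLED
static const int rv64_compiled_in = 1;
#else
static const int rv64_compiled_in = 0;
#endif

/* Use 2: in _Static_assert and ordinary integer constant expressions. */
_Static_assert(KIT_ARCH_AA64_ENABLED + KIT_ARCH_X64_ENABLED + KIT_ARCH_RV64_ENABLED == 2,
               "this sketch enables exactly two arches");

int enabled_arch_count(void) {
    return KIT_ARCH_AA64_ENABLED + KIT_ARCH_X64_ENABLED + KIT_ARCH_RV64_ENABLED;
}
```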

The build mirrors these flags rather than duplicating them. mk/config.mk parses config.h in a single awk pass and evals each KIT_*_ENABLED line into an identically named make variable. The header is the single source of truth; the Makefile only reads it. This keeps the #if that drops a feature from the compile and the make rule that drops its source files perfectly in sync.
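mk/config.mk performs this mapping in awk. The C function below restates the per-line contract only to make it concrete. It assumes each gate is a single-line "#define KIT_<NAME>_ENABLED 0|1" and that the make side uses a simply expanded := assignment; neither detail is spelled out above, and config_line_to_make is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* If `line` looks like "#define KIT_<NAME>_ENABLED <0|1>", write the
   equivalent make assignment "KIT_<NAME>_ENABLED := <0|1>" into `out`
   and return 1; otherwise return 0 and leave the line ignored. */
int config_line_to_make(const char *line, char *out, size_t cap) {
    static const char suffix[] = "_ENABLED";
    const size_t slen = sizeof suffix - 1;
    char name[128];
    int val;

    if (sscanf(line, " #define %127s %d", name, &val) != 2)
        return 0;                         /* not a #define with a number */
    size_t n = strlen(name);
    if (strncmp(name, "KIT_", 4) != 0 || n <= slen ||
        strcmp(name + n - slen, suffix) != 0)
        return 0;                         /* not a KIT_*_ENABLED gate */
    if (val != 0 && val != 1)
        return 0;                         /* gates are strictly 0 or 1 */

    int w = snprintf(out, cap, "%s := %d", name, val);
    return w > 0 && (size_t)w < cap;      /* fail rather than truncate */
}
```

Because the make variable carries exactly the macro's name and value, a "#if KIT_ARCH_X64_ENABLED" in C and an "ifeq ($(KIT_ARCH_X64_ENABLED),1)" in make cannot disagree.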

Some axes carry dependencies. These are documented constraints, not enforced invariants — config.h records them in comments, but no $(error) / _Static_assert rejects a contradictory configuration, so a hand-edited combination that violates them may simply fail to compile or link rather than being diagnosed. The constraints are: the C frontend needs the preprocessor (KIT_LANG_C consumes KIT_LANG_CPP); the interpreter consumes the optimizer's PReg-path Func, so it needs the optimizer; and the assembler substrate is always present (KIT_LANG_ASM_ENABLED gates only automatic registration of the asm frontend, because inline and file-scope asm in the C frontend depend on the same parser/emitter machinery).
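Since kit itself does not diagnose these combinations, a downstream build that wants them checked could add its own assertions. The sketch below is not part of kit; KIT_OPT_ENABLED and KIT_INTERP_ENABLED are assumed spellings derived from the KIT_<COMPONENT>_ENABLED pattern, and config_consistent is a hypothetical helper expressing the same rules as a predicate.

```c
#include <assert.h>

/* A hypothetical configuration; the flag names for the optimizer and
   interpreter are assumed, not confirmed from config.h. */
#define KIT_LANG_CPP_ENABLED 1
#define KIT_LANG_C_ENABLED   1
#define KIT_OPT_ENABLED      1
#define KIT_INTERP_ENABLED   1

/* Compile-time checks kit does not ship: "A needs B" is !A || B. */
_Static_assert(!KIT_LANG_C_ENABLED || KIT_LANG_CPP_ENABLED,
               "the C frontend needs the preprocessor");
_Static_assert(!KIT_INTERP_ENABLED || KIT_OPT_ENABLED,
               "the interpreter needs the optimizer");

/* The same two rules as a runtime predicate, e.g. for vetting a
   generated configuration before writing it out. */
int config_consistent(int lang_c, int lang_cpp, int interp, int opt) {
    return (!lang_c || lang_cpp) && (!interp || opt);
}
```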

Gate choke points

A central design rule: #if KIT_<axis>_* appears in exactly one file per axis. Everything downstream operates on registry outputs (vtables / impl pointers) and never re-checks a flag. This keeps the gating coherent and makes it trivial to audit what a given configuration includes.

                 the ONLY sites that test these flags
  KIT_LANG_*    ->  src/api/lang_registry.c
  KIT_ARCH_*    ->  src/arch/registry.c
  (ABI, derived)  ->  src/abi/registry.c
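A minimal sketch of the choke-point pattern, assuming a registry that is a NULL-terminated array of vtable pointers. ArchVtable, arch_lookup, and pointer_bytes_for are hypothetical names, not kit's. The per-arch vtables are defined unconditionally here to stand in for per-arch files that, in kit, the Makefile simply does not compile when the arch is gated out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical config: X64 compiled in, RV64 gated out. */
#define KIT_ARCH_X64_ENABLED  1
#define KIT_ARCH_RV64_ENABLED 0

typedef struct ArchVtable {
    const char *name;
    int pointer_bytes;
} ArchVtable;

/* Stand-ins for per-arch TUs; they need no #if of their own. */
const ArchVtable x64_vtable  = { "x64", 8 };
const ArchVtable rv64_vtable = { "rv64", 8 };

/* The choke point: the only code in this sketch that tests KIT_ARCH_*. */
static const ArchVtable *const arch_registry[] = {
#if KIT_ARCH_X64_ENABLED
    &x64_vtable,
#endif
#if KIT_ARCH_RV64_ENABLED
    &rv64_vtable,
#endif
    NULL, /* terminator keeps the array non-empty even if every arch is off */
};

const ArchVtable *arch_lookup(const char *name) {
    for (size_t i = 0; arch_registry[i] != NULL; i++)
        if (strcmp(arch_registry[i]->name, name) == 0)
            return arch_registry[i];
    return NULL;
}

/* Downstream code works on the registry output and never re-checks a flag:
   a gated-out arch just looks like an unknown one. */
int pointer_bytes_for(const char *arch) {
    const ArchVtable *v = arch_lookup(arch);
    return v != NULL ? v->pointer_bytes : -1;
}
```

Auditing a configuration then means reading one initializer list per axis rather than grepping the tree for #if.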

Linkability when gated out: config_stubs.c

The public API surface must always link, even for components that were compiled out. src/api/config_stubs.c provides weak no-op definitions for every public entry point of a gateable subsystem (ar, disasm, dwarf, link, jit, dbg, emu, and the internal debug_* producer hooks the DWARF axis would normally supply). Each stub body is itself wrapped in #if !KIT_<X>_ENABLED, so it compiles only when the real implementation is absent, and returns KIT_UNSUPPORTED / NULL. The result: an embedder linking a stripped libkit.a still resolves every symbol; calls to disabled features fail cleanly at run time instead of failing to link.
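The stub shape described above, sketched for a single entry point. kit_ar_open, KIT_AR_ENABLED, KitStatusSketch, and the numeric status values are all hypothetical stand-ins; only the pattern (a weak definition, compiled only when the gate is 0, returning KIT_UNSUPPORTED / NULL) comes from the text. __attribute__((weak)) is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the AR subsystem is gated out of this build. */
#define KIT_AR_ENABLED 0

/* Status codes; the real values are not shown in the text (assumed). */
typedef enum { KIT_OK = 0, KIT_UNSUPPORTED = 1 } KitStatusSketch;
typedef struct KitArchive KitArchive;   /* opaque public handle */

/* Public prototype, as it would appear in a public header (name hypothetical). */
KitStatusSketch kit_ar_open(const char *path, KitArchive **out);

/* config_stubs.c pattern: compiled only when the real implementation is
   absent, and weak so a strong definition wins if one is linked anyway. */
#if !KIT_AR_ENABLED
__attribute__((weak))
KitStatusSketch kit_ar_open(const char *path, KitArchive **out) {
    (void)path;
    if (out != NULL)
        *out = NULL;                    /* never hand back a dangling handle */
    return KIT_UNSUPPORTED;
}
#endif
```

An embedder calling kit_ar_open against this build links cleanly and gets KIT_UNSUPPORTED at run time.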

The Makefile does the symmetric work on the source side. For each disabled axis it filter-outs the matching implementation files (e.g. dropping each arch's disasm.c, link.c, dbg.c, emu.c, the src/opt/ tree, the per-format link.c, etc.) and, where an internal substrate symbol would go missing, adds a parallel *_stubs.c (src/arch/disasm_stubs.c, src/obj/link_stubs.c, src/interp/interp_stubs.c, ...). So there are two stub layers: config_stubs.c keeps the public API whole; the *_stubs.c files keep the internal link whole.

Tools and per-tool source sets

Each KIT_TOOL_*_ENABLED flag gates both the dispatch/help entry in driver/main.c and the driver/cmd/<tool>.c object compiled in. The driver also pulls in shared helper TUs (driver/lib/cflags.c, lib_resolve.c, hosted.c, runtime.c, inputs.c) only when at least one tool that needs them is enabled, and the distribution tools (cas, pkg) drag in their own driver/dist/ vendor set. The tool roster is centralized; see DRIVER.md.
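A sketch of a gated dispatch table for a multi-call binary. It assumes the tool is selected by the basename of argv[0], as the busybox-style symlinks in the bootstrap suggest; the real driver may also accept the tool as a subcommand. cmd_cc, cmd_ar, Tool, and find_tool are hypothetical names, and the stand-in tool bodies just return a recognizable value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tool gates in the style of config.h. */
#define KIT_TOOL_CC_ENABLED  1
#define KIT_TOOL_AR_ENABLED  1
#define KIT_TOOL_DBG_ENABLED 0

typedef int (*ToolMain)(int argc, char **argv);

/* Stand-ins for driver/cmd/<tool>.c entry points. In kit each lives in its
   own TU, compiled only when its gate is on. */
#if KIT_TOOL_CC_ENABLED
static int cmd_cc(int argc, char **argv) { (void)argv; return argc; }
#endif
#if KIT_TOOL_AR_ENABLED
static int cmd_ar(int argc, char **argv) { (void)argv; return argc + 100; }
#endif

typedef struct { const char *name; ToolMain run; } Tool;

/* The same gate drops the table entry here and the object in the Makefile. */
static const Tool tools[] = {
#if KIT_TOOL_CC_ENABLED
    { "cc", cmd_cc },
#endif
#if KIT_TOOL_AR_ENABLED
    { "ar", cmd_ar },
#endif
#if KIT_TOOL_DBG_ENABLED
    { "dbg", cmd_dbg },
#endif
    { NULL, NULL } /* terminator */
};

/* "/path/to/cc", a symlink to kit, selects the "cc" entry. */
ToolMain find_tool(const char *argv0) {
    const char *slash = strrchr(argv0, '/');
    const char *name = slash ? slash + 1 : argv0;
    for (size_t i = 0; tools[i].name != NULL; i++)
        if (strcmp(tools[i].name, name) == 0)
            return tools[i].run;
    return NULL;                        /* unknown or gated-out tool */
}
```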

Reproducible builds

Output is deterministic by construction: artifacts contain no timestamps, no randomness, and no host paths.

Symbol discipline

make test-lib-deps is a build-architecture guard, not a behavioral test. It asserts two invariants over the release libkit.a:

  1. The set of external (undefined) symbols the archive imports matches a checked-in allowlist (test/lib_deps.allowlist) — the library must not grow a hidden dependency on host libc or anything else.
  2. After relinking the archive into one relocatable object, every remaining externally visible definition uses a public prefix (Kit, kit_, KIT). Internal symbols must stay internal.

Together these keep libkit genuinely freestanding and its public surface honest. The check is part of the default make test set.
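The two invariants reduce to simple per-symbol predicates once a tool such as nm has listed the defined and undefined symbols. A sketch follows; has_public_prefix and undefined_allowed are hypothetical names, and the allowlist entries in the usage are made up, not the contents of test/lib_deps.allowlist. On Mach-O, C symbol names carry a leading underscore, which a real check would strip first; this sketch ignores that.

```c
#include <assert.h>
#include <string.h>

/* Invariant 2: an externally visible definition must use a public prefix. */
int has_public_prefix(const char *sym) {
    static const char *const prefixes[] = { "Kit", "kit_", "KIT" };
    for (size_t i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++)
        if (strncmp(sym, prefixes[i], strlen(prefixes[i])) == 0)
            return 1;
    return 0;
}

/* Invariant 1: every undefined (imported) symbol must be allowlisted. */
int undefined_allowed(const char *sym, const char *const *allow, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(sym, allow[i]) == 0)
            return 1;
    return 0;
}
```

A check driver would report every defined global failing has_public_prefix and every undefined symbol failing undefined_allowed, and fail if either list is non-empty.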

Tests

Tests are a large family of make test-* targets defined in mk/test.mk, grouped roughly as:

frontend        test-pp  test-parse  test-asm  test-toy
codegen/opt     test-cg-api  test-opt  test-isa  test-*-inline  test-abi-classify
object/link     test-elf  test-macho  test-coff  test-ar  test-link  test-driver-ar
debug           test-debug  test-dwarf  test-dbg
exec/interp/emu test-smoke-*  test-interp*  test-emu*  test-rt-*  test-libc*
roundtrip/diff  test-asm-roundtrip*  test-asm-symmetry  test-diff-llvm  test-hostas-*
driver/tools    test-driver  test-driver-{cc,ar,strip,objcopy,objdump,pkg,strings}

make test runs a curated DEFAULT_TEST_TARGETS subset (which includes test-lib-deps). Many harness binaries are built by the Makefile so they inherit host flags (sanitizers in debug). See TESTING.md for the harness design and conventions.

Bootstrap (staged self-build)

The goal is to compile kit with kit and prove the result is stable. The chain, per build mode:

seed cc (host clang)
   |  build libkit.a + kit
   v
stage1  = the host-built kit, copied aside, with cc/ld/ar/ranlib/as
   |     symlinks (busybox-style multi-call)
   |  rebuild the whole tree using stage1 as CC/AR/LD
   v
stage2  = kit built by kit
   |  rebuild the whole tree again using stage2 as CC/AR/LD
   v
stage3  = kit built by (kit built by kit)

invariant:  cmp stage2/kit stage3/kit   (must be byte-identical)

Stage2 vs stage3 is the fixed-point: once the compiler reproduces itself, a further self-application changes nothing. Reaching it exercises essentially the whole compiler on a real, substantial program — kit's own source — and the byte-identity check (leaning on the deterministic output above) catches any non-reproducible codegen. The bootstrap drives the normal Makefile with CC/AR/LD repointed at the stage's symlinks, so there is no separate "bootstrap build" — it is the same build rules run with kit as the toolchain. make bootstrap runs both the debug and release chains; make test-bootstrap-toy additionally runs the Toy corpus through the bootstrapped compiler.
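The invariant itself is just cmp. For illustration only, the C function below performs the same byte-for-byte comparison and, like cmp, reports where the images first diverge; first_difference is a hypothetical name and not part of the bootstrap.

```c
#include <assert.h>
#include <stdio.h>

/* Compare two files byte for byte. Returns -1 if identical, the 0-based
   offset of the first differing byte (or of the end of the shorter file),
   or -2 if either file cannot be opened. */
long first_difference(const char *path_a, const char *path_b) {
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    long result = -2;
    if (fa != NULL && fb != NULL) {
        long off = 0;
        for (;;) {
            int ca = fgetc(fa);
            int cb = fgetc(fb);
            if (ca != cb) { result = off; break; }  /* includes one side at EOF */
            if (ca == EOF) { result = -1; break; }  /* both ended together */
            off++;
        }
    }
    if (fa != NULL) fclose(fa);
    if (fb != NULL) fclose(fb);
    return result;
}
```

In the bootstrap's terms, first_difference("stage2/kit", "stage3/kit") must return -1; any other value means codegen did not reach the fixed point.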

The host-clang seed is the current root of trust. A full diverse-double-compilation / hex0-style seed chain (a tiny seed binary that needs no pre-existing C compiler) is a separate concern and is deliberately outside the boundary of this build.