Build & Configuration
This document describes how kit is built and configured: the make targets that produce the library, binary, and runtime; the compile-time component-gating model that lets a build include only the architectures, formats, frontends, subsystems, and tools it needs; the small set of choke points where those gates are honored; how output is made reproducible; and the staged self-build (bootstrap). It is a map of the build architecture, not a recipe list — see DESIGN.md for the system overview and RUNTIME.md for the runtime library it links against.
Three products, one tree
The source tree compiles into three outputs:
    libkit.a                   the engine (public + internal C)
    kit                        the multi-call driver binary
    rt/<variant>/libkit_rt.a   compiler-rt/libc support, per target variant
The first two are direct make targets (make lib, make bin); the runtime is
not a standalone user target. Its variants are produced as a dependency of the
self-host path — the freshly built kit compiles each libkit_rt.a (RT_CC = $(BIN) cc) — so building the runtime always goes through a working driver binary.
Layering is enforced by include paths, not convention:
- libkit (src/ + lang/) is freestanding C11. It sees both its public surface (-Iinclude) and its internals (-Isrc), and is compiled -ffreestanding -nostdinc against the runtime headers (-Irt/include) with -fvisibility=hidden. It links to no host libc.
- the driver (driver/) is the first consumer of the public API. It gets -Iinclude and -Ilang but deliberately not -Isrc, so internal headers are unreachable: if the driver needs something, it must be public. The one hosted seam, driver/env/, is the only code compiled against the host SDK/libc (see "Host detection" below).
- the runtime (rt/) is built by kit itself (RT_CC = $(BIN) cc), once per target variant, and is what hosted programs link against.
make all builds lib + bin. The driver binary links the static
libkit.a; bin also drops a support/rt symlink so the freshly built
compiler can find runtime sources at run time.
libkit.a is a single relocated object
The library is not a naive ar of every .o. The object set is first combined
with ld -r into one relocatable object (build/.../libkit.o), and that is
archived. This guarantees the archive is rebuilt wholesale when sources are
added or removed (plain ar rcs only adds/updates members and would silently
retain a deleted file's object), and it gives the symbol-discipline check
(below) a single object to inspect.
Build modes
RELEASE=0 (default) is the development build: -O0 -g3, frame pointers, and
ASan+UBSan with halt_on_error. RELEASE=1 is -O2, -DNDEBUG,
function/data sections plus dead-strip at link. Mode flags
(HOST_OPTFLAGS / HOST_MODE_*FLAGS) live in the root Makefile; they are
recorded into build/.../.build-config so that flipping a mode flag forces a
rebuild of objects that were produced under the old flags. Build-mode concerns
are kept strictly separate from host-environment concerns (next section).
RELEASE=1's -DNDEBUG also compiles out the KIT_TRACE / KIT_LOG*
developer tracing (include/kit/trace.h): each trace point expands to nothing,
its arguments unevaluated. Define KIT_TRACE_FORCE to keep tracing in an NDEBUG
build — useful for debugging a release or bootstrap image. Otherwise tracing is
a pure runtime affordance, dormant until the KIT_TRACE env var selects modules
and levels (KIT_TRACE=cg=trace,coff=debug, or KIT_TRACE=1 for everything,
matched as a substring against each trace point's module name, default
__FILE__). It rides the same weak-override seam as the rest of the hosted
boundary: libkit.a ships no-op kit_trace_enabled / kit_trace_emit and
holds no trace state, while the hosted driver/env/common.c supplies the strong
overrides that parse KIT_TRACE once and write to stderr — so a libkit-only
link traces nothing and pays nothing.
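The selection rule can be sketched in C. This is a minimal illustration under stated assumptions, not the code in driver/env/common.c: the level enum, the helper names, and the choice to treat KIT_TRACE=1 as the most verbose level are all hypothetical. The document only fixes the spec syntax and the substring match against the module name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical levels; the text only names "trace" and "debug". */
enum trace_level { TRACE_OFF = 0, TRACE_DEBUG, TRACE_TRACE };

/* Map a level word of length n to a level; unknown words select nothing. */
static enum trace_level parse_level(const char *s, size_t n)
{
    if (n == 5 && memcmp(s, "trace", 5) == 0) return TRACE_TRACE;
    if (n == 5 && memcmp(s, "debug", 5) == 0) return TRACE_DEBUG;
    return TRACE_OFF;
}

/* Return 1 if needle[0..n) occurs anywhere in hay. */
static int contains(const char *hay, const char *needle, size_t n)
{
    size_t h = strlen(hay);
    if (n == 0 || n > h) return 0;
    for (size_t i = 0; i + n <= h; i++)
        if (memcmp(hay + i, needle, n) == 0) return 1;
    return 0;
}

/* Level selected for `module` by a spec such as "cg=trace,coff=debug".
 * spec may be NULL (variable unset). "1" enables everything (assumed
 * here to mean the most verbose level). The first matching entry wins. */
enum trace_level trace_level_for(const char *spec, const char *module)
{
    assert(module != NULL);
    if (spec == NULL || *spec == '\0') return TRACE_OFF;
    if (strcmp(spec, "1") == 0) return TRACE_TRACE;

    const char *p = spec;
    while (*p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        const char *eq = memchr(p, '=', len);
        if (eq) {
            size_t name_len = (size_t)(eq - p);
            if (contains(module, p, name_len))
                return parse_level(eq + 1, len - name_len - 1);
        }
        p += len;
        if (*p == ',') p++;
    }
    return TRACE_OFF;
}
```

With the spec cg=trace,coff=debug, a trace point whose module name defaults to a __FILE__ such as src/cg/emit.c selects the trace level, while one under src/opt/ stays dormant.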
Host detection: mk/env.mk
mk/env.mk is the only place in the build that branches on the host OS or
arch. It normalizes uname into HOST_OS / HOST_ARCH, resolves the Darwin
-isysroot (empty elsewhere, so splicing $(HOST_SYSROOT_*FLAGS) is always
safe), and — crucially — selects the exact set of driver/env/*.c files for the
host: one per OS, one per arch for icache flush, and on POSIX one per
(arch, OS) for ucontext register marshalling. Choosing source files in the
build, rather than #ifdef-ing one mega-file, is the explicit design: every
hosted adapter TU compiles for exactly one platform. The rest of the Makefile
reads env.mk's outputs and never re-derives anything from uname.
Component gating
kit is a large toolchain, but most builds want only a slice of it. Gating lets
a build drop whole axes — an arch, an object format, a language frontend, the
optimizer, a subsystem, a CLI tool — down to a minimal freestanding library that
still links and presents the full public API (gated-out calls return
KIT_UNSUPPORTED). The axes are:
    arch         AA64 X64 RV64 WASM C_TARGET
    obj-format   ELF MACHO COFF WASM
    language     ASM CPP C TOY WASM
    optimizer    OPT (O1+; O0 direct codegen is always present)
    subsystems   AR DISASM DWARF LINK JIT DBG EMU INTERP
    tools        CC CHECK CPP AS LD AR RANLIB STRIP OBJCOPY OBJDUMP
                 DBG RUN EMU NM SIZE ADDR2LINE STRINGS CAS PKG
One source of truth: config.h
All flags are KIT_<COMPONENT>_ENABLED macros defined in
include/kit/config.h. That file is preprocessor-only: every flag expands
to a literal 0 or 1 usable from both #if and _Static_assert. The
_ENABLED suffix exists so the macros never collide with the public enum
constants of the same root (KIT_ARCH_RV64, KIT_OBJ_ELF, KIT_LANG_C
are enum values, not gates).
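A short sketch of why the literal 0/1 form matters. The gate values below are illustrative, and the enum values and the KIT_ARCH_AA64 constant are assumptions (the text only confirms KIT_ARCH_RV64 as an enum constant). The helper arch_is_compiled_in is hypothetical and exists only to show a gate used from #if; in kit itself such tests live in the registry files.

```c
#include <assert.h>

/* Hypothetical excerpt in the style of include/kit/config.h:
 * every gate is a literal 0 or 1. Values here are illustrative. */
#define KIT_ARCH_AA64_ENABLED 1
#define KIT_ARCH_RV64_ENABLED 0

/* Public enum constants share the root but not the _ENABLED suffix,
 * so the two namespaces never collide. Values are assumptions. */
enum { KIT_ARCH_AA64 = 1, KIT_ARCH_RV64 = 2 };

/* A literal 0/1 is a valid integer constant expression... */
_Static_assert(KIT_ARCH_AA64_ENABLED == 0 || KIT_ARCH_AA64_ENABLED == 1,
               "gates are literal 0 or 1");

/* ...and is equally usable from the preprocessor. */
int arch_is_compiled_in(int arch)
{
    assert(arch == KIT_ARCH_AA64 || arch == KIT_ARCH_RV64);
    switch (arch) {
#if KIT_ARCH_AA64_ENABLED
    case KIT_ARCH_AA64: return 1;
#endif
#if KIT_ARCH_RV64_ENABLED
    case KIT_ARCH_RV64: return 1;
#endif
    default: return 0;
    }
}
```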
The build mirrors these flags rather than duplicating them. mk/config.mk parses
config.h in a single awk pass and evals each KIT_*_ENABLED line into an
identically named make variable. The header is the single source of truth; the
Makefile only reads it. This keeps the #if that drops a feature from the
compile and the make rule that drops its source files perfectly in sync.
Some axes carry dependencies. These are documented constraints, not enforced
invariants — config.h records them in comments, but no $(error) /
_Static_assert rejects a contradictory configuration, so a hand-edited combination
that violates them may simply fail to compile or link rather than being diagnosed.
The constraints are: the C frontend needs the preprocessor (KIT_LANG_C consumes
KIT_LANG_CPP); the interpreter consumes the optimizer's PReg-path Func, so it
needs the optimizer; and the assembler substrate is always present
(KIT_LANG_ASM_ENABLED gates only automatic registration of the asm
frontend, because inline and file-scope asm in the C frontend depend on the same
parser/emitter machinery).
Gate choke points
A central design rule: #if KIT_<axis>_* appears in exactly one file per
axis. Everything downstream operates on registry outputs (vtables / impl
pointers) and never re-checks a flag. This keeps the gating coherent and makes it
trivial to audit what a given configuration includes.
    the ONLY sites that test these flags
    KIT_LANG_*       ->  src/api/lang_registry.c
    KIT_ARCH_*       ->  src/arch/registry.c
    (ABI, derived)   ->  src/abi/registry.c
- src/api/lang_registry.c runs at compiler construction and wires each compiled-in frontend's vtable into the compiler's frontend table, dispatched later by KitLanguage. Third parties can still install or override a slot via the public kit_register_frontend().
- src/arch/registry.c owns the roster of ArchImpl (machine-code backends: emit, DWARF, debugger hooks, register file) and resolves the CGBackend for a session. An ArchImpl's first field is a CGBackend, so a machine arch is a superset of a code-emitting backend; c_target (C source output) and the check-only backend are CGBackends with no ArchImpl.
- src/abi/registry.c is derived, not user-configurable. An ABI entry exists only when both its machine arch and its object/OS-format are enabled: for example, AAPCS64 needs AA64 (or the C target) plus ELF, Apple-arm64 needs Mach-O, win64 needs COFF. The arch x format product is computed at preprocessor time from the same KIT_ARCH_* / KIT_OBJ_* flags.
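The choke-point pattern for the arch registry might look like the sketch below. The struct fields, the roster layout, the lookup helpers, and the gate values are simplified assumptions; only the type names ArchImpl and CGBackend and the "first field is a CGBackend" rule come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative gate values; in kit these come from include/kit/config.h. */
#define KIT_ARCH_AA64_ENABLED 1
#define KIT_ARCH_X64_ENABLED  0

/* Simplified stand-ins for the real types (field names are assumptions). */
typedef struct CGBackend {
    const char *name;
    int (*emit)(void);          /* placeholder for the codegen entry points */
} CGBackend;

typedef struct ArchImpl {
    CGBackend cg;               /* first field: an ArchImpl is also a CGBackend */
    int num_regs;               /* placeholder for machine-only data */
} ArchImpl;

static int emit_ok(void) { return 0; }

#if KIT_ARCH_AA64_ENABLED
static const ArchImpl aa64_impl = { { "aa64", emit_ok }, 32 };
#endif
#if KIT_ARCH_X64_ENABLED
static const ArchImpl x64_impl = { { "x64", emit_ok }, 16 };
#endif

/* The only place the gates are tested: the roster itself. */
static const ArchImpl *const roster[] = {
#if KIT_ARCH_AA64_ENABLED
    &aa64_impl,
#endif
#if KIT_ARCH_X64_ENABLED
    &x64_impl,
#endif
    NULL                        /* sentinel; also keeps the array non-empty */
};

/* Downstream code asks the registry and never re-checks a flag. */
const ArchImpl *arch_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; roster[i] != NULL; i++)
        if (strcmp(roster[i]->cg.name, name) == 0)
            return roster[i];
    return NULL;
}

/* Because cg is the first member, a pointer to the ArchImpl and a
 * pointer to its CGBackend refer to the same address. */
const CGBackend *arch_as_backend(const ArchImpl *impl)
{
    return impl ? &impl->cg : NULL;
}
```

A gated-out arch is simply absent from the roster, so callers see a NULL lookup result rather than a preprocessor branch.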
Linkability when gated out: config_stubs.c
The public API surface must always link, even for components that were compiled
out. src/api/config_stubs.c provides weak no-op definitions for every
public entry point of a gateable subsystem (ar, disasm, dwarf, link, jit, dbg,
emu, and the internal debug_* producer hooks the DWARF axis would normally
supply). Each stub body is itself wrapped in #if !KIT_<X>_ENABLED, so it
compiles only when the real implementation is absent, and returns
KIT_UNSUPPORTED / NULL. The result: an embedder linking a stripped
libkit.a still resolves every symbol; calls to disabled features fail
cleanly at run time instead of failing to link.
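A minimal sketch of one such stub, assuming a GCC/Clang-style weak attribute. The entry point kit_jit_create, the opaque KitJit type, and the numeric value of KIT_UNSUPPORTED are hypothetical; the text confirms only the weak no-op shape, the #if !KIT_<X>_ENABLED wrapper, and the KIT_UNSUPPORTED / NULL results.

```c
#include <assert.h>
#include <stddef.h>

#define KIT_JIT_ENABLED 0          /* illustrative: JIT gated out */

/* Hypothetical status codes; only the name KIT_UNSUPPORTED is from the text. */
enum { KIT_OK = 0, KIT_UNSUPPORTED = -1 };

typedef struct KitJit KitJit;      /* opaque, hypothetical */

/* The public prototype exists regardless of the gate. */
int kit_jit_create(KitJit **out);

#if !KIT_JIT_ENABLED
/* Weak no-op: compiled only when the real implementation is absent. */
__attribute__((weak)) int kit_jit_create(KitJit **out)
{
    assert(out != NULL);
    *out = NULL;
    return KIT_UNSUPPORTED;
}
#endif
```

An embedder calling this entry point against a stripped library links successfully and gets KIT_UNSUPPORTED at run time.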
The Makefile does the symmetric work on the source side. For each disabled
axis it filter-outs the matching implementation files (e.g. dropping each
arch's disasm.c, link.c, dbg.c, emu.c, the src/opt/ tree, the
per-format link.c, etc.) and, where an internal substrate symbol would go
missing, adds a parallel *_stubs.c (src/arch/disasm_stubs.c,
src/obj/link_stubs.c, src/interp/interp_stubs.c, ...). So there are two stub
layers: config_stubs.c keeps the public API whole; the *_stubs.c files keep
the internal link whole.
Tools and per-tool source sets
Each KIT_TOOL_*_ENABLED flag gates both the dispatch/help entry in
driver/main.c and the driver/cmd/<tool>.c object compiled in. The driver
also pulls in shared helper TUs (driver/lib/cflags.c, lib_resolve.c,
hosted.c, runtime.c, inputs.c) only when at least one tool that needs them
is enabled, and the distribution tools (cas, pkg) drag in their own
driver/dist/ vendor set. The tool roster is centralized; see
DRIVER.md.
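As a rough illustration of a gated multi-call roster, the sketch below resolves a tool from a command name or a symlink path. The cmd_* functions, the table layout, tool_lookup, and the gate values are hypothetical; the actual dispatch in driver/main.c may be organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KIT_TOOL_CC_ENABLED  1   /* illustrative gate values */
#define KIT_TOOL_AR_ENABLED  1
#define KIT_TOOL_EMU_ENABLED 0

typedef int (*tool_main_fn)(int argc, char **argv);

/* Hypothetical entry points; the real ones would live in driver/cmd/<tool>.c. */
#if KIT_TOOL_CC_ENABLED
static int cmd_cc(int argc, char **argv) { (void)argc; (void)argv; return 0; }
#endif
#if KIT_TOOL_AR_ENABLED
static int cmd_ar(int argc, char **argv) { (void)argc; (void)argv; return 0; }
#endif
#if KIT_TOOL_EMU_ENABLED
static int cmd_emu(int argc, char **argv) { (void)argc; (void)argv; return 0; }
#endif

struct tool { const char *name; tool_main_fn run; };

/* One gate controls both the dispatch entry and the object linked in. */
static const struct tool tools[] = {
#if KIT_TOOL_CC_ENABLED
    { "cc", cmd_cc },
#endif
#if KIT_TOOL_AR_ENABLED
    { "ar", cmd_ar },
#endif
#if KIT_TOOL_EMU_ENABLED
    { "emu", cmd_emu },
#endif
    { NULL, NULL }               /* sentinel */
};

/* Strip directories so "stage1/cc" and "cc" select the same tool. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Resolve a tool by subcommand name or by symlink name (argv[0]). */
tool_main_fn tool_lookup(const char *name_or_path)
{
    assert(name_or_path != NULL);
    const char *name = base_name(name_or_path);
    for (size_t i = 0; tools[i].name != NULL; i++)
        if (strcmp(tools[i].name, name) == 0)
            return tools[i].run;
    return NULL;                 /* unknown or gated-out tool */
}
```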
Reproducible builds
Output is deterministic by construction — no timestamps, no randomness, no host paths in artifacts:
- Image identity is a content+layout hash, not a clock read. src/link/link_image_id.c folds each segment's vaddr, file size, and post-relocation bytes through two FNV-1a streams to produce a stable 128-bit id. The id is wrapped per format: ELF emits it as a .note.gnu.build-id (src/obj/elf/link.c), Mach-O as the LC_UUID payload (src/obj/macho/link.c); the bytes are the same either way.
- Object headers zero their time fields: COFF TimeDateStamp and the COFF archive/import time fields are written as 0 (and PE marks itself deterministic), Mach-O dylib timestamps are zeroed.
- The self-build verifies this end to end: stage2 and stage3 must be byte-identical (below).
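The image-id idea can be sketched as follows. This is not the code in src/link/link_image_id.c: the segment struct, the function names, the fold order, and the way the second stream is made to differ from the first (a different start value) are all assumptions. Only the inputs (vaddr, file size, post-relocation bytes), the use of two FNV-1a streams, and the 128-bit result come from the text. The FNV-1a offset basis and prime are the standard 64-bit parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 64-bit FNV-1a parameters. */
#define FNV64_OFFSET 0xcbf29ce484222325ULL
#define FNV64_PRIME  0x00000100000001b3ULL

typedef struct { uint64_t a, b; } image_id_state;  /* two streams = 128 bits */

/* Hypothetical segment view; the real linker structures differ. */
typedef struct {
    uint64_t vaddr;
    uint64_t file_size;
    const uint8_t *bytes;       /* post-relocation contents, file_size long */
} segment;

void image_id_init(image_id_state *s)
{
    s->a = FNV64_OFFSET;
    /* Assumption: the second stream differs only in its start value. */
    s->b = FNV64_OFFSET ^ 0x9e3779b97f4a7c15ULL;
}

/* One FNV-1a step per stream: xor the byte in, then multiply. */
static void fold_byte(image_id_state *s, uint8_t byte)
{
    s->a = (s->a ^ byte) * FNV64_PRIME;
    s->b = (s->b ^ byte) * FNV64_PRIME;
}

/* Fold integers in a fixed little-endian order so the id does not
 * depend on host endianness. */
static void fold_u64(image_id_state *s, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        fold_byte(s, (uint8_t)(v >> (8 * i)));
}

void image_id_add_segment(image_id_state *s, const segment *seg)
{
    assert(seg->bytes != NULL || seg->file_size == 0);
    fold_u64(s, seg->vaddr);
    fold_u64(s, seg->file_size);
    for (uint64_t i = 0; i < seg->file_size; i++)
        fold_byte(s, seg->bytes[i]);
}

/* Serialize both streams into 16 bytes, suitable as a build-id or
 * UUID payload. */
void image_id_final(const image_id_state *s, uint8_t out[16])
{
    for (int i = 0; i < 8; i++) {
        out[i]     = (uint8_t)(s->a >> (8 * i));
        out[8 + i] = (uint8_t)(s->b >> (8 * i));
    }
}
```

Because the id depends only on layout and contents, rebuilding the same inputs yields the same 16 bytes, while moving a segment to a different vaddr changes it.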
Symbol discipline
make test-lib-deps is a build-architecture guard, not a behavioral test. It
asserts two invariants over the release libkit.a:
- The set of external (undefined) symbols the archive imports matches a checked-in allowlist (test/lib_deps.allowlist); the library must not grow a hidden dependency on host libc or anything else.
- After relinking the archive into one relocatable object, every remaining externally visible definition uses a public prefix (Kit, kit_, KIT). Internal symbols must stay internal.
Together these keep libkit genuinely freestanding and its public surface
honest. The check is part of the default make test set.
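The second invariant reduces to a prefix test per symbol name. The helper below is a hypothetical illustration of that rule only; how make test-lib-deps actually extracts and filters symbol names is not described here, and any platform symbol decoration (such as a leading underscore) is ignored in this sketch.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `sym` starts with one of the public prefixes named in
 * the text (Kit, kit_, KIT), 0 otherwise. */
int has_public_prefix(const char *sym)
{
    static const char *const prefixes[] = { "Kit", "kit_", "KIT" };
    assert(sym != NULL);
    for (size_t i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++)
        if (strncmp(sym, prefixes[i], strlen(prefixes[i])) == 0)
            return 1;
    return 0;
}
```

The symbol names used to exercise it (for example kit_compile or arch_lookup) would be made up for illustration; a name such as kitten fails the test because kit_ requires the underscore.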
Tests
Tests are a large family of make test-* targets defined in mk/test.mk,
grouped roughly as:
    frontend          test-pp test-parse test-asm test-toy
    codegen/opt       test-cg-api test-opt test-isa test-*-inline test-abi-classify
    object/link       test-elf test-macho test-coff test-ar test-link test-driver-ar
    debug             test-debug test-dwarf test-dbg
    exec/interp/emu   test-smoke-* test-interp* test-emu* test-rt-* test-libc*
    roundtrip/diff    test-asm-roundtrip* test-asm-symmetry test-diff-llvm test-hostas-*
    driver/tools      test-driver test-driver-{cc,ar,strip,objcopy,objdump,pkg,strings}
make test runs a curated DEFAULT_TEST_TARGETS subset (which includes
test-lib-deps). Many harness binaries are built by the Makefile so they inherit
host flags (sanitizers in debug). See TESTING.md for the harness
design and conventions.
Bootstrap (staged self-build)
The goal is to compile kit with kit and prove the result is stable. The chain, per build mode:
    seed cc (host clang)
       |  build libkit.a + kit
       v
    stage1 = the host-built kit, copied aside, with cc/ld/ar/ranlib/as
       |     symlinks (busybox-style multi-call)
       |  rebuild the whole tree using stage1 as CC/AR/LD
       v
    stage2 = kit built by kit
       |  rebuild the whole tree again using stage2 as CC/AR/LD
       v
    stage3 = kit built by (kit built by kit)

    invariant: cmp stage2/kit stage3/kit (must be byte-identical)
Stage2 vs stage3 is the fixed-point: once the compiler reproduces itself, a
further self-application changes nothing. Reaching it exercises essentially the
whole compiler on a real, substantial program — kit's own source — and the
byte-identity check (leaning
on the deterministic output above) catches any non-reproducible codegen. The
bootstrap drives the normal Makefile with CC/AR/LD repointed at the
stage's symlinks, so there is no separate "bootstrap build" — it is the same
build rules run with kit as the toolchain. make bootstrap runs both the debug
and release chains; make test-bootstrap-toy additionally runs the Toy corpus
through the bootstrapped compiler.
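The fixed-point check itself is a plain byte comparison. The build uses cmp for this; the C helper below is only a hypothetical equivalent that makes the invariant concrete, and it is intended for regular files, where a short read implies end of file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if both streams hold identical bytes from their current
 * positions to EOF, 0 otherwise (including on a read error). */
int streams_identical(FILE *a, FILE *b)
{
    unsigned char ba[4096], bb[4096];
    assert(a != NULL && b != NULL);
    for (;;) {
        size_t na = fread(ba, 1, sizeof ba, a);
        size_t nb = fread(bb, 1, sizeof bb, b);
        if (na != nb || memcmp(ba, bb, na) != 0)
            return 0;                       /* length or content differs */
        if (na == 0)
            return !ferror(a) && !ferror(b); /* both at EOF, no errors */
    }
}
```

Applied to the stage2 and stage3 binaries opened in binary mode, a result of 1 corresponds to the invariant holding.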
The host-clang seed is the current root of trust. A full diverse-double-compilation / hex0-style seed chain (a tiny seed binary that needs no pre-existing C compiler) is a separate concern and is deliberately outside the boundary of this build.