kit

git clone https://git.ryansepassi.com/git/kit.git

DRIVER

The kit multitool is the toolchain's only executable: a single binary that dispatches to ~27 named tools (compiler, assembler, linker, archive/object utilities, byte utilities, JIT runner, debugger, emulator, packager, and an install command that lays down the per-tool links). It is also the first and canonical consumer of libkit — it depends only on the public API under include/kit/, never on src/. Everything that the OS provides (heap, file I/O, executable memory, threads, signals, time, entropy) enters libkit through host vtables that the driver constructs in exactly one place. See DESIGN.md for the library it drives, INTERFACES.md for the public API, RUNTIME.md for libkit_rt.a, and DISTRIBUTE.md for the pkg/cas subsystem.

Layering

The driver is three layers plus a vendored subsystem, all under driver/:

  main.c          dispatch + top-level help (the only entry: int main)
  cmd/<tool>.c    one CLI shell per tool (cc, ld, run, objdump, ...)
  lib/            cross-tool helpers (cflags, triples, inputs, runtime, hosted)
  env/            THE host boundary: turns "the OS" into a KitContext
  dist/           vendored crypto/compression for pkg + cas
       \________________ all of the above call only <kit/...> public headers

Two compile profiles enforce the boundary (see Makefile):

This split is the concrete form of the project's "no global state" rule: the freestanding tools cannot touch the host except through callbacks the env/ layer hands them, so each tool's behavior is fully a function of its arguments and the vtables it is given.

Dispatch (main.c)

driver/main.c holds the single centralized tool table — an array of {name, main, help, summary} rows. Dispatch is multi-call:

  argv[0] basename matches a tool?  -> run it           (installed as `cc` symlink)
  else argv[0] was bare "kit":
      no argv[1] / -h / --help      -> top-level help
      argv[1] == "help" [<tool>]    -> top-level or per-tool help
      argv[1] matches a tool        -> run it with argv shifted by one
      otherwise                     -> "no such tool" + help, exit 2

So kit cc -c f.c and a cc symlink to the binary behave identically; the shift trick rewrites argv[1] to argv[0] before delegating so the tool sees a conventional argv. dispatch returns -1 for "no such tool" (distinct from any exit code a tool itself returns), which is what lets the bare-kit fallback logic run only when argv[0] itself wasn't a tool name.
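The multi-call flow above can be sketched roughly as follows. This is an illustration only, not the code in driver/main.c: the row layout, the two stub tools, and the names ToolRow, base_name, and sketch_driver_main are assumptions made for the example, and the help paths (-h, --help, help <tool>) are collapsed into a single usage line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical row shape; the real table in driver/main.c may differ. */
typedef int (*ToolMain)(int argc, char **argv);

typedef struct {
    const char *name;
    ToolMain    main;
    const char *summary;
} ToolRow;

/* Stub tools standing in for the real cmd/<tool>.c entry points. */
static int tool_cc(int argc, char **argv) { (void)argv; return argc > 1 ? 0 : 2; }
static int tool_ld(int argc, char **argv) { (void)argc; (void)argv; return 0; }

static const ToolRow TOOLS[] = {
    { "cc", tool_cc, "C compiler driver" },
    { "ld", tool_ld, "linker" },
};

/* Strip any directory prefix so "/usr/bin/cc" matches the "cc" row. */
static const char *base_name(const char *path) {
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Returns the tool's exit code, or -1 when no row matches `name`. */
static int dispatch(const char *name, int argc, char **argv) {
    for (size_t i = 0; i < sizeof TOOLS / sizeof TOOLS[0]; i++)
        if (strcmp(TOOLS[i].name, name) == 0)
            return TOOLS[i].main(argc, argv);
    return -1;
}

/* Multi-call entry: try argv[0] first, then fall back to `kit <tool> ...`. */
static int sketch_driver_main(int argc, char **argv) {
    int rc = dispatch(base_name(argv[0]), argc, argv);
    if (rc != -1) return rc;
    if (argc < 2) { puts("usage: kit <tool> [args]"); return 0; }
    /* Shift by one so the tool sees its own name in argv[0]. */
    rc = dispatch(argv[1], argc - 1, argv + 1);
    if (rc != -1) return rc;
    fprintf(stderr, "kit: no such tool: %s\n", argv[1]);
    return 2;
}
```

Passing argv + 1 is one simple way to get the shift: the tool receives a vector whose first element is its own name, exactly as if it had been started through a symlink.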

Each table row is wrapped in #if KIT_TOOL_<NAME>_ENABLED (defined in include/kit/config.h). The same flags gate which driver/cmd/*.c objects the Makefile compiles in, so a disabled tool drops out of both the table and the build with no #ifdef scattered through the tool implementations. Adding a tool is: a config.h flag, a row here, the driver_<tool> / driver_help_<tool> prototypes in driver/driver.h, the cmd/<tool>.c, and a Makefile stanza.
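As a rough sketch of how per-row gating keeps conditionals out of the tool implementations, consider the fragment below. The two flag values, the trailing sentinel row, and the tool_is_built_in helper are assumptions for illustration; in the real tree the flags come from include/kit/config.h and the tool entry points live in their own cmd/<tool>.c files.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for flags that, per the text, live in include/kit/config.h. */
#define KIT_TOOL_CC_ENABLED  1
#define KIT_TOOL_EMU_ENABLED 0

typedef int (*ToolMain)(int argc, char **argv);
typedef struct { const char *name; ToolMain main; } ToolRow;

/* In the real build these bodies sit in cmd/<tool>.c and are simply not
   compiled when the flag is off; they are inlined here to stay self-contained. */
#if KIT_TOOL_CC_ENABLED
static int driver_cc(int argc, char **argv) { (void)argc; (void)argv; return 0; }
#endif
#if KIT_TOOL_EMU_ENABLED
static int driver_emu(int argc, char **argv) { (void)argc; (void)argv; return 0; }
#endif

static const ToolRow TOOLS[] = {
#if KIT_TOOL_CC_ENABLED
    { "cc", driver_cc },
#endif
#if KIT_TOOL_EMU_ENABLED
    { "emu", driver_emu },
#endif
    { NULL, NULL } /* sentinel keeps the array non-empty if every tool is off */
};

/* Reports whether a tool made it into this build's table. */
static int tool_is_built_in(const char *name) {
    for (const ToolRow *t = TOOLS; t->name; t++)
        if (strcmp(t->name, name) == 0) return 1;
    return 0;
}
```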

Exit-code convention across all tools: 0 success, 1 tool-reported error, 2 bad usage. Help requests are detected by driver_argv_wants_help, which stops scanning at a literal -- so that a --help meant for a JITed program (run, dbg) or an emulated guest (emu) is not hijacked by the driver.
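A minimal sketch of a help scan that stops at a literal -- might look like this. The function name and the exact set of recognized spellings (-h and --help) are assumptions; the real driver_argv_wants_help may accept more or fewer forms.

```c
#include <string.h>

/* Hypothetical re-creation; the real driver_argv_wants_help may differ. */
static int sketch_argv_wants_help(int argc, char **argv) {
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0)
            return 0;  /* everything after "--" belongs to the program or guest */
        if (strcmp(argv[i], "-h") == 0 || strcmp(argv[i], "--help") == 0)
            return 1;
    }
    return 0;
}
```

With this shape, kit run --help asks the driver for help, while kit run prog.c -- --help passes --help through to the JITed program.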

The tools (cmd/)

Each cmd/<tool>.c is a thin CLI shell: parse a flag surface, classify inputs, load bytes via the env file_io, call public libkit APIs, format output. No tool reaches into compiler internals.

  Tool                        Role
  cc                          C compiler driver: compile, optionally link; preprocess (-E), dep-emit (-M*), -shared. GCC flag subset. Resolves -l/-L to concrete archive paths.
  check                       Run the C frontend checks with no code emission.
  build-exe                   Kit-native build verb: compile a polyglot source set (C / asm / toy / wasm, per file) in memory and link it (with any .o/.a/.so inputs) into an executable. No intermediate files.
  build-lib                   Compile a polyglot source set in memory into a static .a (default) or, with -dynamic, a shared library.
  build-obj                   Compile sources to one object (or --emit=asm|c|ir, or -fsyntax-only check); multiple sources combine into one relocatable object (ld -r). The kit-native replacement for the retired compile tool.
  install                     Lay down per-tool links (symlinks; hard links on Windows) in a target dir so the toolchain works under bare names (cc, ld, nm, …). Default set is the toolchain + standard-named byte utils; --all / explicit names override.
  cpp                         Standalone preprocessor (alias for cc -E without link scaffolding).
  as                          Assemble one GAS-subset text source to a relocatable object.
  ld                          Link objects/archives into an executable, shared library, or relocatable object; parses -T scripts into structured form.
  ar / ranlib                 Create/modify/list/extract ar archives; refresh the symbol index.
  strip / objcopy             Drop debug/symbols; rename/remove sections, reformat.
  objdump / nm / size         Inspect sections, symbols, disassembly, relocations, sizes.
  addr2line / strings         Address→file:line via DWARF; printable runs.
  xxd                         Hex dump any file (format-agnostic, unlike objdump -s); reverse a dump to binary (-r), plain (-p), C array (-i).
  cmp                         Compare two files byte by byte; GNU/BSD-compatible messages and 0/1/2 exit codes.
  hash                        SHA-256, BLAKE2b-256, or CRC-32 (-a) of files or stdin; coreutils-style output. Backed by the public <kit/hash.h>.
  sha256sum / b2sum / crc32   hash under standard names, each pinning its algorithm (-a rejected). sha256sum is byte-compatible with coreutils; kit's b2sum is BLAKE2b-256 (GNU defaults to 512).
  compress                    Compress/decompress a stream with gzip (.gz, default) or the LZ4 frame format (.lz4); -d decompresses (format auto-detected from magic). Output interoperates with stock gzip/lz4. Backed by the public <kit/compress.h>.
  gzip / gunzip / lz4 / lz4c  compress under standard names: a pinned container + default direction. -d flips direction, -z is rejected, and the common gzip/lz4 flags (-c/-k/-f, -1..-9, …) are accepted as no-ops; output always streams to stdout/-o (no in-place rewrite).
  disas                       Disassemble a raw, headerless byte buffer (file/stdin/inline -x hex) for a -target arch.
  mc                          Assemble one instruction and show its encoding (llvm-mc style); lists any relocations.
  run                         JIT-compile inputs and call the entry symbol in-process.
  dbg                         Interactive JIT debugger (REPL over a KitJitSession).
  emu                         Run a guest user-mode ELF (aarch64/riscv64) via per-block JIT translation.
  cas / pkg                   Content-addressed store and signed .kpkg distribution.

run, dbg, and emu share the "--"-terminated argv convention: flags before -- configure the tool, tokens after -- become the JITed program's / guest's argv. cc and run overlap heavily on input shape and the preprocessor flag family; that overlap is exactly what driver/lib/ factors out.
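The convention can be illustrated with a small splitter. The ArgvSplit type and the function name are assumptions for this sketch; in particular, the real tools may place the program's own name at the front of the program argv, which the sketch does not do.

```c
#include <string.h>

/* Hypothetical result type: two views into the same argv array. */
typedef struct {
    int    tool_argc;   /* argv[0 .. tool_argc-1]: the tool and its flags */
    char **tool_argv;
    int    prog_argc;   /* tokens after "--": the program's or guest's argv */
    char **prog_argv;
} ArgvSplit;

/* Split at the first literal "--"; without one, everything is tool argv. */
static ArgvSplit split_at_double_dash(int argc, char **argv) {
    ArgvSplit s = { argc, argv, 0, argv + argc };
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0) {
            s.tool_argc = i;
            s.prog_argc = argc - i - 1;
            s.prog_argv = argv + i + 1;
            break;
        }
    }
    return s;
}
```

For kit run -O2 main.c -- foo bar, the tool side is {run, -O2, main.c} and the program side is {foo, bar}. Only the first -- is special, so a later -- reaches the program untouched.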

cc vs build-*: cc is the GCC-compatible C driver, a drop-in cc/clang for build systems, with the full Unix-toolchain flag surface (-Wl,, -M*, sysroots, hosted-libc expansion, -l/-L) and a linker. It still accepts non-C sources by suffix. The build-* trio is the kit-native front door to the same in-memory, no-temp-files compile+link pipeline, without pretending to be gcc: every command is polyglot (language resolved per file), forwards per-language frontend flags via -X<lang> FLAG, and scopes compile flags to individual sources via --group [scopable flags] -- sources…. Flags split into two tiers: global / per-output flags (target, -O/-g, all link and output flags) that apply to the whole build, and a small scopable set (-I/-isystem/-D/-U, -x, -X<lang>) that forms a baseline and may be overridden inside a --group. cc and build-* share one link path (driver/lib/link_engine.c) and the same language-neutral per-source compile step (driver/lib/compile_engine.c).

run doubles as a #! script interpreter so a C file can be made executable and run directly. The kernel's shebang mechanism appends the script path and the user's arguments after the interpreter's flags, with no way to inject a -- between them, so run --script FILE names the sole source and routes every later token to the program's argv (an implicit -- after FILE). --script implies -lc (scripts are usually hosted; under the JIT that only enables libc headers/macros — symbols still resolve at run time via host dlsym). The portable shebang is #!/usr/bin/env -S kit run --script (the env -S split is required because Linux passes everything after the interpreter as one argument). A leading #! line on the primary source file is recognized and skipped by the lexer (lex_skip_shebang, byte 0 only) for both the C frontend and cc -E, so the shebang is never mistaken for a # directive; includes and paste buffers are untouched.
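The byte-0-only rule can be sketched as below. This is not the real lex_skip_shebang: the name, the signature, and the choice to leave the terminating newline in place (so that line numbering is unaffected) are assumptions made for the example.

```c
#include <stddef.h>

/* Returns the offset at which lexing should start. Only a "#!" at byte 0
   counts; the newline is left in place (an assumption of this sketch). */
static size_t sketch_skip_shebang(const char *src, size_t len) {
    if (len < 2 || src[0] != '#' || src[1] != '!')
        return 0;
    size_t i = 2;
    while (i < len && src[i] != '\n')
        i++;
    return i;
}
```

Because the check looks only at the first two bytes of the buffer, a #! preceded by whitespace, or appearing later in the file, is left for the lexer to treat as ordinary input.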

Cross-tool helpers (lib/)

These hold the logic that more than one tool needs, so the CLI shells stay thin and consistent. All are freestanding (no host calls except through env/).

The host boundary (env/)

driver/env/ is the heart of the driver's design. It is the single place that constructs a DriverEnv and projects it into the vtables libkit consumes:

  DriverEnv  ->  driver_env_to_context()    -> KitContext  (heap, file_io, diag, metrics, now)
             ->  driver_env_to_jit_host()   -> KitJitHost  (execmem, jit_tls)
             ->  driver_env_to_dbg_host()   -> KitDbgHost  (dbg_os)

A KitContext is passed by const-pointer into every libkit entry; the JIT and debugger take their extra host vtables per-call rather than on the context, which keeps the common compile/link path from carrying execmem/signal machinery it never uses. libkit itself holds no global state and issues no syscalls — it only calls back through these function pointers, so the driver alone decides how the abstract operations map onto the real OS.
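To make the callback-only arrangement concrete, here is a deliberately small sketch with a single heap vtable. Every type and function name in it (SketchHeap, SketchContext, SketchEnv, sketch_env_to_context, lib_dup_bytes) is hypothetical; the real KitContext and DriverEnv layouts are defined under include/kit/ and driver/env/ and carry more than a heap.

```c
#include <stddef.h>
#include <stdlib.h>

/* All layouts here are hypothetical stand-ins for the real types. */
typedef struct {
    void *(*alloc)(void *user, size_t size);
    void  (*release)(void *user, void *ptr);
    void  *user;
} SketchHeap;

typedef struct { SketchHeap heap; } SketchContext;   /* stands in for KitContext */
typedef struct { size_t live_allocs; } SketchEnv;    /* stands in for DriverEnv  */

/* The only functions in this sketch that touch the host allocator. */
static void *env_alloc(void *user, size_t size) {
    SketchEnv *env = user;
    void *p = malloc(size);
    if (p) env->live_allocs++;
    return p;
}

static void env_release(void *user, void *ptr) {
    SketchEnv *env = user;
    if (ptr) { env->live_allocs--; free(ptr); }
}

/* Projection step, analogous in spirit to driver_env_to_context(). */
static SketchContext sketch_env_to_context(SketchEnv *env) {
    SketchContext ctx = { { env_alloc, env_release, env } };
    return ctx;
}

/* "Library" code: takes the context by const pointer and never calls malloc. */
static char *lib_dup_bytes(const SketchContext *ctx, const char *src, size_t n) {
    char *dst = ctx->heap.alloc(ctx->heap.user, n);
    if (!dst) return NULL;
    for (size_t i = 0; i < n; i++) dst[i] = src[i];
    return dst;
}
```

The library-side function has no way to reach the host except through the pointers it was handed, so swapping the environment (for example, for a counting or failing allocator in tests) changes its behavior without touching its code.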

What the vtables abstract

Beyond the vtables, env/ also exposes the syscall-shaped helpers the freestanding tools need but can't make themselves: driver_printf/errf, path existence/mtime, mkdir -p, directory walks, stdin slurp, an $EDITOR temp-file round-trip, a raw-mode line editor with history/completion for the dbg REPL, SIGINT install/restore, monotonic time, CSPRNG bytes (for pkg key generation), a dlsym resolver so JITed code can call host libc, and (for install) self-executable-path resolution plus symlink / hard-link / unlink / no-follow-existence primitives. The self-path resolver is the one helper with a genuine per-OS divergence — /proc/self/exe (Linux), _NSGetExecutablePath (macOS), KERN_PROC_PATHNAME (FreeBSD), GetModuleFileNameW (Windows) — so it lives one-impl-per-OS alongside driver_default_hosted_dirs; the link/unlink helpers are POSIX-shared with a Windows twin.

One TU per concern, zero #ifdef

The env layer's structuring invariant: each TU implements one slice of the host with no preprocessor OS/arch conditionals. mk/env.mk is the only place in the build that branches on uname, and it selects exactly one file per axis:

  common.c                         every host (libc-pure floor)
  posix.c / windows.c              shared POSIX scaffold  |  whole Win32 surface
  posix_dbg.c, jit_tls_posix.c     POSIX dbg + TLS        |  (folded into windows.c)
  macos.c | linux.c | freebsd.c    per-OS hooks (one)
  icache_{arm,x86,riscv}.c         per-arch icache flush (one)
  uctx_<arch>_<os>.c               per-(arch,OS) ucontext<->frame marshalling (one)
  linux_exec_hint_{x86_64,default}.c   per-arch Linux mmap hint (one)

env_internal.h holds the OS-neutral surface (heap/diag singletons, the arch-only icache hook). env_posix.h adds the POSIX-only surface (the exec_dual alias registry, the os_* per-OS hooks, ucontext marshalling, the dbg interrupt signo). Windows folds everything into one TU because it shares no POSIX overlap.

W^X executable memory: the genuine divergence

The interesting per-OS work is producing executable memory under a strict write-xor-execute regime, where the JIT/debugger needs both a writable view and an executable view of the same physical pages:

When write and runtime aliases differ, the reservation registers itself in the exec_dual registry (posix.c) so the debugger's code_write_begin can translate a runtime address into the writable alias. Single-mapping reservations (write == runtime, the non-exec path) skip the registry and the debugger falls back to a transient mprotect. The arch-correct icache flush after a code write lives in the icache_* TUs (__builtin___clear_cache on arm/riscv; a no-op on coherent x86).
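The address translation described above amounts to a range lookup. The sketch below assumes a fixed-size array and invented names (DualMapping, dual_register, dual_to_writable); the actual exec_dual registry in posix.c may be organized quite differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one reservation with distinct write/runtime aliases. */
typedef struct {
    uintptr_t runtime_base;  /* executable view */
    uintptr_t write_base;    /* writable view of the same pages */
    size_t    size;
} DualMapping;

#define MAX_DUAL 16
static DualMapping g_dual[MAX_DUAL];
static size_t      g_dual_count;

/* Record a dual-mapped reservation; returns -1 when the table is full. */
static int dual_register(uintptr_t runtime_base, uintptr_t write_base, size_t size) {
    if (g_dual_count == MAX_DUAL) return -1;
    g_dual[g_dual_count++] = (DualMapping){ runtime_base, write_base, size };
    return 0;
}

/* Map a runtime address to its writable alias. A result of 0 means "not
   registered", the case where the text says the debugger uses mprotect. */
static uintptr_t dual_to_writable(uintptr_t runtime_addr) {
    for (size_t i = 0; i < g_dual_count; i++) {
        const DualMapping *m = &g_dual[i];
        if (runtime_addr >= m->runtime_base &&
            runtime_addr - m->runtime_base < m->size)
            return m->write_base + (runtime_addr - m->runtime_base);
    }
    return 0;
}
```

A caller patching code at a runtime address would write through the translated alias and then flush the instruction cache for the runtime range on architectures that need it.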

Debugger host (posix_dbg.c)

The POSIX KitDbgOs runs the debuggee on a worker thread and installs sigaction handlers for SIGTRAP/SIGSEGV/SIGBUS/SIGILL/SIGFPE plus a SIGUSR2 interrupt. On a fault it marshals the ucontext_t into a KitUnwindFrame (delegating the register layout to the per-(arch,OS) uctx_* TU), hands it to the session's on_fault, and writes back any session-edited register state. A sigsetjmp-guarded memcpy lets the session read possibly-bad target memory without crashing the process. Only the registered worker thread participates; faults on other threads fall through to the previous handler. Windows mirrors this with vectored exception handling and SuspendThread/GetThreadContext.
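The guarded-read idea can be sketched on its own, separately from the rest of the debugger host. The version below is a simplified, single-threaded illustration with invented names (guarded_read, probe_handler): it installs its handlers only for the duration of the copy, uses a byte loop through a volatile pointer in place of memcpy, and keeps the jump buffer in a global. None of that is claimed to match posix_dbg.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Single global jump buffer: this sketch is not thread-safe. */
static sigjmp_buf g_probe_jmp;

/* On a fault during the probe, unwind back to the sigsetjmp point. */
static void probe_handler(int signo) {
    (void)signo;
    siglongjmp(g_probe_jmp, 1);
}

/* Copy n bytes from src to dst. Returns 0 on success, or -1 if reading
   src faulted, instead of letting the fault kill the process. */
static int guarded_read(void *dst, const void *src, size_t n) {
    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    int rc = 0;
    /* savemask = 1 so the signal mask is restored after the jump. */
    if (sigsetjmp(g_probe_jmp, 1) == 0) {
        const volatile unsigned char *s = src;
        unsigned char *d = dst;
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        rc = -1;  /* reached via siglongjmp from probe_handler */
    }

    /* Put the previous handlers back before returning. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return rc;
}
```

A debugger command that dumps target memory could call guarded_read and report "unreadable" on -1 rather than taking the whole process down with the debuggee's bad pointer.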

Data flow: a representative cc invocation

  main(argv) -> driver_main -> dispatch("cc") -> driver_cc(argc, argv)
       driver_env_init(&env)                    # build host vtables once
       parse flags (lib/cflags, lib/target)
       lib/runtime: discover support root, ensure libkit_rt.a for target
       lib/hosted: plan crt/libs/includes if linking a hosted exe
       lib/lib_resolve: -l/-L -> archive paths
       driver_env_to_context(&env) -> KitContext
       kit_compiler_new(target, ctx) ; kit_compile_* ; kit_link_*
            (libkit calls back through ctx.heap / ctx.file_io / ctx.diag)
       return 0 / 1 / 2

run/dbg differ only in also building a KitJitHost (and, for dbg, a KitDbgHost) and pumping inputs through lib/inputs instead of emitting a file. The shape — parse, classify, build a context, call public APIs — is the same for every tool.

The dist subsystem (cas, pkg)

driver/dist/ is a self-contained, vendored implementation of content-addressed storage and signed-package distribution: BLAKE2b hashing, ed25519 (monocypher) signing, minisign-format signatures, deflate/lz4 compression, tar bundling, and the .kpkg manifest. It is vendored so the package pipeline has no runtime dependency on host crypto/compression libraries; the only host input it takes is CSPRNG bytes via driver_random_bytes. The cas and pkg CLI shells in cmd/ are thin layers over it. See DISTRIBUTE.md.