kit

git clone https://git.ryansepassi.com/git/kit.git

Runtime (libkit_rt.a)

libkit_rt.a is the target runtime: the small body of code and headers that kit-compiled programs link against, entirely separate from the compiler library libkit.a. The compiler emits calls to ABI-mandated helper symbols (__divti3, __addtf3, __atomic_load_8, …), references freestanding standard headers (<stdint.h>, <stdatomic.h>, …), and may emit startup hooks (.init_array IFUNC resolution). The runtime supplies the implementations. It is freestanding — no OS, no hosted libc — so the same archive backs a Linux binary, a Darwin binary, and a bare-metal image alike. It is built per target by kit itself (see BUILD.md) and ships with the toolchain; the driver links it automatically (see DRIVER.md).
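As a rough illustration of where such helper calls come from, the sketch below divides two 128-bit integers. 64-bit ISAs generally lack a native 128-by-128-bit divide, so compilers conventionally lower the operation to a call to __divti3. The names div128 and make_i128 are hypothetical, and that kit lowers this exact expression the same way is an assumption based on the helper names listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes a 64-bit target whose compiler provides __int128
 * (HAS_INT128 = 1 in the runtime's terms). */
typedef __int128 i128;

/* With no native 128-bit divide available, a compiler conventionally
 * emits a call to the runtime helper __divti3 for this expression. */
i128 div128(i128 a, i128 b) {
    assert(b != 0); /* division by zero is undefined behavior */
    return a / b;
}

/* Builds a 128-bit value from a signed high half and an unsigned low half. */
i128 make_i128(int64_t hi, uint64_t lo) {
    return (i128)(((unsigned __int128)(uint64_t)hi << 64) | lo);
}
```

Linking a program that calls div128 is what pulls the helper's object out of the runtime archive.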

   user .c  ──kit──►  emits __divti3, va_arg, _Atomic, coro_resume, ...
                              │
            <stdint.h> etc. ◄─┘  (shipped headers, rt/include/)
                              │
                       libkit_rt.a  (per-target archive)
       ┌──────────────┬───────────┬──────────┬──────────┬───────────┐
   int/fp soft     mem/string   atomic     coro      startup     freestanding
   helpers         /stdio       shim       switch    (IFUNC)     libc subset

Design principles

No target-dispatch ifdefs in source. The integer/float helpers derive from compiler-rt (lib/builtins/, Apache-2.0 WITH LLVM-exception, see rt/lib/LICENSE-compiler-rt.txt), but upstream's #ifdef __ARM_EABI__ / __MINGW32__ / __SOFTFP__ target cascades were stripped out. Per-target variation is expressed by the build — which directories and flags are selected — not by preprocessor branches inside the C. What remains of the preprocessor is parameterization (precision, src/dst pair) and genuinely-orthogonal concerns (assembler syntax in assembly.h, HAS_INT128), never target dispatch.

One master .c/.S per feature, one object in the archive. Rather than globbing compiler-rt's many per-op files, each feature group is a single master translation unit (rt/lib/int/int.c, rt/lib/fp/fp.c, …) with the per-op snippets inlined as commented blocks (// ---- udivmoddi4.c ----). Templates in rt/lib/impl/ (fp_add_impl.inc, int_div_impl.inc, …) and the re-includable rt/lib/include/common/fp_lib.h are pulled into the master multiple times per TU, once per precision or per (src,dst) pair, with suffix-renamed statics so the single object carries every needed instance. This keeps the archive small and the member list explicit.
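The multiple-instantiation idea can be sketched in a single file. The example below uses a macro in place of a re-included .inc template, which is a simplification and not how rt/lib/impl/ is organized; the names DEFINE_FP_EXPONENT, fp_exponent_sf/df, and the two wrappers are hypothetical. It assumes IEEE 754 binary32 and binary64 for float and double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float assumed");
static_assert(sizeof(double) == sizeof(uint64_t), "binary64 double assumed");

/* Stand-in for a re-includable template: the body is written once and
 * stamped out per precision, each copy getting a suffix-renamed static.
 * memcpy performs the bit-cast without violating aliasing rules. */
#define DEFINE_FP_EXPONENT(SUFFIX, FP_T, REP_T, SIG_BITS, EXP_BITS)          \
    static int fp_exponent_##SUFFIX(FP_T x) {                                \
        REP_T bits;                                                          \
        memcpy(&bits, &x, sizeof bits);                                      \
        return (int)((bits >> (SIG_BITS)) & (((REP_T)1 << (EXP_BITS)) - 1)); \
    }

/* Two instances in one translation unit, so one object carries both. */
DEFINE_FP_EXPONENT(sf, float, uint32_t, 23, 8)
DEFINE_FP_EXPONENT(df, double, uint64_t, 52, 11)

/* Public entry points returning the biased exponent field. */
int biased_exponent_f32(float x) { return fp_exponent_sf(x); }
int biased_exponent_f64(double x) { return fp_exponent_df(x); }
```

Because the per-precision functions are static and distinctly named, the instances cannot collide inside the single object.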

Weak portable fallbacks. Everything a hosted libc would normally own — memcpy/memmove/memset/memcmp, the string and stdlib functions, __clear_cache, __kit_assert_fail — is defined __attribute__((weak)) in portable C, so a real libc or a tuned arch-specific routine wins at link time without a conflict. The freestanding definitions only matter when nothing else provides them.
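A minimal sketch of the weak-fallback pattern follows. The function is named demo_memcpy (hypothetical) so that the example cannot clash with a hosted libc, and the byte loop is an assumption about what "portable C" looks like here, not a copy of rt/lib/mem/mem.c.

```c
#include <assert.h>
#include <stddef.h>

/* Weak definition: if another object supplies a strong demo_memcpy,
 * the linker picks that one and silently drops this fallback. */
__attribute__((weak))
void *demo_memcpy(void *dst, const void *src, size_t n) {
    assert((dst != NULL && src != NULL) || n == 0);
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) {
        *d++ = *s++; /* plain byte copy, no alignment assumptions */
    }
    return dst;
}
```

The attribute is the whole mechanism: no build flag or ifdef is needed for a tuned routine to take over.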

Build-time target selection (multilib)

rt/Makefile (included by the root Makefile) enumerates RT_VARIANTS and, for each, a small feature vector that drives source and flag selection. The dimensions are: clang target triple, data model (lp64 / ilp32 / llp64), HAS_INT128 (0/1), the coro arch token, binary128 long-double support (LDBL128), RISC-V save/restore, ARM AEABI mode, and a HOSTED flag. A single GNU-make define template expands each variant into its object list, compile flags, and an ar-built libkit_rt.a under $(BUILD_DIR)/rt/<variant>/. kit compiles and archives its own runtime: RT_CC/RT_AS/RT_AR default to kit cc/kit as/kit ar, so a codegen change in the compiler rebuilds the runtime.

The data-model dimension is the multilib axis. It selects:

Data model        long/ptr   128-bit int   master           include dir                    targets
lp64_le           64 / 64    yes           int64/int64.c    rt/lib/include/lp64_le         x86_64, aarch64, rv64 (LE)
llp64_le          32 / 64    yes           int64/int64.c    rt/lib/include/llp64_le        Win64 x86_64 / aarch64
ilp32_le          32 / 32    no            int32/int32.c    rt/lib/include/ilp32_le        i386, arm32, rv32, wasm32
lp64_le_ldbl128   (lp64 +)   yes           (lp64 + fp_tf)   -include .../tf_supplement.h   aarch64/rv64 binary128 long double

Each per-model directory holds one file, int_lib.h: the compiler-rt support header folded together with upstream's int_endianness.h/int_types.h. The per-model copies differ only where the data model forces it: LP64/LLP64 declare the ti_int/tu_int __int128 machinery and the twords/utwords unions; ILP32 omits all of it (no 128-bit type) and instead defines AEABI_RTABI (the AAPCS __pcs__ attribute the ARM sources need). lp64_le_ldbl128 is not a separate header set but an extra -include tf_supplement.h layered onto an LP64 build, defining tf_float / CRT_HAS_TF_MODE before fp_lib.h processes them, which keeps the base header free of feature gates. All headers assume little-endian; a big-endian port would need a parallel *_be/ set.
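For orientation, the 128-bit declarations an LP64 header carries look roughly like the sketch below. The union shape follows upstream compiler-rt's little-endian twords; the helper functions high_half and low_half are hypothetical, and the exact contents of kit's int_lib.h are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

typedef __int128 ti_int;          /* signed 128-bit integer */
typedef unsigned __int128 tu_int; /* unsigned 128-bit integer */

/* Little-endian layout: the low 64 bits sit at the lower address. */
typedef union {
    ti_int all;
    struct {
        uint64_t low;
        int64_t high;
    } s;
} twords;

static_assert(sizeof(twords) == 16, "twords must overlay one 128-bit value");

/* Hypothetical helpers that split a value through the union. */
int64_t high_half(ti_int x) {
    twords t;
    t.all = x;
    return t.s.high;
}

uint64_t low_half(ti_int x) {
    twords t;
    t.all = x;
    return t.s.low;
}
```

An ILP32 header has nothing equivalent, since the type itself does not exist there.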

The HOSTED flag (Windows variants) ships only the compiler-support subset (RT_COMPILER_SRCS: int/fp/atomic/cache/ifunc) and lets the platform libc supply mem/string/stdio/stdlib. Everything else ships the full RT_BASE_SRCS.

Compiler-support helpers

These are the ABI-mandated symbols the backends emit when an operation has no native instruction.

What this runtime deliberately does not provide: 80-bit x86 xf soft float (x86 always has the FPU for long double), half-precision conversions, big-endian targets, and the __riscv_32e/64e embedded ABIs — none are in kit's runtime contract.

Per-arch assembly helpers

A Win64 stack-probe helper (rt/lib/stack/chkstk_x86_64_win.c, __chkstk / ___chkstk_ms page-touch probes for large frames) lives in the tree but is not wired into any variant's source list and so ships in no archive today. It is noted here only so the orphan is not mistaken for a present provider; when Win64 large-frame probing lands it would join the hosted Windows variant's source set.

kit-specific startup: IFUNC resolution

rt/lib/kit/ifunc_init.c provides __kit_ifunc_init, the startup hook for statically-linked ELF images that use STT_GNU_IFUNC symbols. The linker (src/link/link_layout.c) materializes one IPLT stub and .igot.plt slot per IFUNC, emits a parallel .iplt.pairs section of (resolver, slot) pointer pairs, and synthesizes a .init_array entry pointing at this function. Before main, the CRT walks .init_array; __kit_ifunc_init iterates the pairs, calls each resolver, and stores the chosen implementation pointer into its slot, so the IPLT load-and-branch tail-calls the right target. The .iplt.pairs span symbols (__start_iplt_pairs/__stop_iplt_pairs) are weak, so the object is a harmless no-op when linked into images with no IFUNCs or by a non-kit linker. The JIT path resolves slots in-process at load time and skips the .init_array synthesis, so this symbol is never an unresolved reference there (see JIT.md, LINK.md).
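The resolution loop described above can be sketched as follows. The struct layout, the function names, and the int(void) implementation signature are all hypothetical; a local array stands in for the .iplt.pairs section, and a static variable stands in for a .igot.plt slot.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*impl_fn)(void);         /* demo implementation signature */
typedef impl_fn (*resolver_fn)(void); /* a resolver returns the chosen impl */

/* Hypothetical shape of one (resolver, slot) pair. */
struct iplt_pair {
    resolver_fn resolver;
    impl_fn *slot;
};

/* Walks [start, stop): call each resolver and store its choice in the slot.
 * An empty range (start == stop) makes this a no-op. */
void resolve_ifuncs(const struct iplt_pair *start,
                    const struct iplt_pair *stop) {
    for (const struct iplt_pair *p = start; p != stop; ++p) {
        assert(p->slot != NULL);
        *p->slot = p->resolver();
    }
}

static int impl_generic(void) { return 1; }
static int impl_fast(void) { return 2; }
static int have_fast_path = 1; /* a real resolver would probe CPU features */

static impl_fn pick_impl(void) {
    return have_fast_path ? impl_fast : impl_generic;
}

static impl_fn demo_slot; /* stands in for one .igot.plt slot */

/* Resolves the single demo pair, then calls through the slot. */
int demo_call_through_slot(void) {
    const struct iplt_pair pairs[] = { { pick_impl, &demo_slot } };
    resolve_ifuncs(pairs, pairs + 1);
    return demo_slot();
}
```

After resolution, every call through the slot reaches the chosen implementation with no further dispatch cost.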

rt/lib/assert/assert.c supplies the weak __kit_assert_fail (the target of a failed assert()), which executes __builtin_trap() and then spins.

Coroutines: stackful asymmetric context switch (rt/lib/coro/)

kit ships <kit/coro.h> as a native extension — C11 has no stackful coroutine facility — built as a deliberate counterpart to <setjmp.h>. The two share one per-target register-context payload (256 bytes, 16-aligned): the same save/restore instruction sequences back setjmp/longjmp and the coroutine switch. The module is two layers:

   <kit/coro.h>  coro_init / coro_resume / coro_yield / coro_self
        │
   coro/coro.c     arch-agnostic asymmetric layer  ── resume chain, thunk
        │             (one TU, built for every coro variant)
        ▼
   coro/<arch>.c   per-arch primitives:  setjmp / longjmp,
   (+ aarch64*.s)  __kit_coro_switch, __kit_coro_ctx_init, trampoline

Per-arch primitive (rt/lib/coro/<arch>.c). One per ABI: aarch64, x86_64, x86_64_win, i386, arm32, arm32_thumb1, riscv32, riscv64. Each defines the per-target context struct (callee-saved GPRs + callee-saved FPRs + sp + return address; e.g. x86_64 SysV is 8 words/64 bytes, aarch64 is x19–x28/fp/lr/sp + d8–d15/176 bytes) and verifies via _Static_assert that it fits both jmp_buf and coro_ctx. The three primitives that save/restore registers (setjmp, longjmp, and __kit_coro_switch) share one pair of SAVE_INTO/RESTORE_FROM macros, so identical instruction bytes are emitted in all three. Symbol decoration uses __USER_LABEL_PREFIX__, so one source compiles for ELF / Mach-O / COFF. Most arches keep the assembly as file-scope asm inside the .c; aarch64 splits it into aarch64_elf.s / aarch64_macho.s (selected per variant via RT_EXTRA_SRCS) so the C TU needs no file-scope-asm support. The thumb1 variant is a separate file because its ARMv6-M sequences (no IT blocks, no VFP, no str sp) can't be shared with arm32.c. wasm32 ships no coro (it would need an Asyncify fiber port).

__kit_coro_switch is the symmetric register shuffle: save the callee-saved state into *from, restore it from *to, and deliver a value. It is exposed in the public header for advanced (M:N, work-stealing) schedulers, and is the building block under the asymmetric layer. __kit_coro_ctx_init lays down a fresh context: zero the saved registers, point the entry-fn register and return address at the trampoline, and set sp to the (16-aligned, downward-growing) stack top.

Asymmetric layer (rt/lib/coro/coro.c). Implements coro_init / coro_resume / coro_yield / coro_self. A coro_t's private blob holds the coro_ctx, a resumer back-pointer, and the user entry fn (a _Static_assert pins the fit inside the header's 288-byte reservation). coro_resume records the caller as the resumer (NULL meaning the main flow), switches in, and on return restores the previous current-coroutine pointer; coro_yield reads its own resumer slot and switches back — so resumes nest like calls. The trampoline enters a static thunk (__kit_coro_thunk) that runs the user fn, marks the coroutine CORO_DEAD, and switches back to the resumer with the return value, so the symmetric primitive never needs to know about coro_t lifecycle. The "current coroutine" pointer and the "main" save slot are _Thread_local, so each thread gets an independent resume chain; kit's contract defines __STDC_NO_THREADS__, but _Thread_local is an independent C11 language feature, and bare-metal images with no TLS runtime collapse to single-thread semantics.
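The resume/yield bookkeeping can be sketched on top of any symmetric switch. The example below uses POSIX ucontext (swapcontext) as a stand-in for __kit_coro_switch, so it needs a hosted libc that implements ucontext; every demo_ name is hypothetical and none of this is kit's actual API or signatures. It only illustrates the resumer back-pointer, the thunk, and the thread-local "current" pointer described above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

typedef struct demo_coro demo_coro;
struct demo_coro {
    ucontext_t ctx;        /* saved context (stand-in for coro_ctx) */
    demo_coro *resumer;    /* who resumed us; NULL means the main flow */
    int (*fn)(int);        /* user entry function */
    int dead;              /* set once fn has returned */
    char stack[64 * 1024]; /* private stack */
};

static _Thread_local demo_coro *current;  /* running coroutine, NULL in main */
static _Thread_local ucontext_t main_ctx; /* save slot for the main flow */
static _Thread_local int transfer;        /* value delivered by a switch */

static ucontext_t *ctx_of(demo_coro *c) { return c ? &c->ctx : &main_ctx; }

/* Entry thunk: run the user fn, mark the coroutine dead, hand the return
 * value to the resumer. It is never switched back into afterwards. */
static void thunk(void) {
    demo_coro *self = current;
    int result = self->fn(transfer);
    self->dead = 1;
    transfer = result;
    swapcontext(&self->ctx, ctx_of(self->resumer));
}

void demo_init(demo_coro *c, int (*fn)(int)) {
    getcontext(&c->ctx);
    c->ctx.uc_stack.ss_sp = c->stack;
    c->ctx.uc_stack.ss_size = sizeof c->stack;
    c->ctx.uc_link = NULL;
    c->resumer = NULL;
    c->fn = fn;
    c->dead = 0;
    makecontext(&c->ctx, thunk, 0);
}

/* Record the caller as resumer, switch in, restore "current" on return. */
int demo_resume(demo_coro *c, int value) {
    assert(!c->dead);
    demo_coro *prev = current;
    c->resumer = prev;
    current = c;
    transfer = value;
    swapcontext(ctx_of(prev), &c->ctx);
    current = prev;
    return transfer;
}

/* Switch back to whoever resumed the running coroutine. */
int demo_yield(int value) {
    demo_coro *self = current;
    assert(self != NULL);
    transfer = value;
    swapcontext(&self->ctx, ctx_of(self->resumer));
    return transfer;
}

/* Sample body: yields twice, then returns. */
int demo_body(int first) {
    int second = demo_yield(first + 1);
    int third = demo_yield(second + 10);
    return third + 100;
}
```

Resuming a coroutine initialized with demo_body three times delivers one value in and one value out per switch; the third resume returns the body's return value and leaves the coroutine marked dead.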

Thread-local storage (freestanding contract)

kit emits the Local-Exec TLS model only — there is no dynamic TLS (__tls_get_addr, GD/LD) and no TLS allocator. _Thread_local objects live in the executable's PT_TLS image and are reached tp-relative, with the per-arch offset baked in by the linker (ObjElfArchOps.tls_tp_bias, applied in src/obj/elf/link.c's tls_tcb_bias).

The runtime ships no crt0, so a freestanding image's own startup establishes the thread block and tp. The layout kit's codegen + linker assume for RISC-V and AArch64 (TLS variant I) is a 16-byte TCB ahead of .tdata:

[ TCB (16 bytes) | .tdata (init image) | .tbss (zeroed) ]
  ^tp

so a TLS variable at image offset off is accessed at tp + 16 + off. Startup must therefore reserve 16 + tdata_size + tbss_size, copy the .tdata init image to block + 16, zero the .tbss span, and set tp = block. The reference implementation is test/link/harness/start.c (the rv32 bare-metal stub lives in test/lib/exec_rv32_bare.sh); under ilp32f that startup must also set mstatus.FS before any FP op. x86_64 uses TLS variant II instead (tp/%fs points past the image, TPOFF offsets are negative), so it carries no TCB bias.
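The variant I startup steps can be sketched in portable C as below. The function names are hypothetical, the caller is assumed to supply a suitably aligned block, and writing the tp register itself is arch-specific and omitted; test/link/harness/start.c remains the reference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TCB_SIZE = 16 }; /* variant I bias: TCB sits ahead of .tdata */

/* Bytes startup must reserve: 16 + tdata_size + tbss_size. */
size_t tls_block_size(size_t tdata_size, size_t tbss_size) {
    return TCB_SIZE + tdata_size + tbss_size;
}

/* Lays out [ TCB | .tdata | .tbss ] in block and returns the value that
 * startup should load into tp (the block's own address). */
void *tls_block_init(void *block, const void *tdata_image,
                     size_t tdata_size, size_t tbss_size) {
    assert(block != NULL);
    unsigned char *b = block;
    memcpy(b + TCB_SIZE, tdata_image, tdata_size);   /* copy the init image */
    memset(b + TCB_SIZE + tdata_size, 0, tbss_size); /* zero the .tbss span */
    return block;                                    /* tp = block */
}

/* Address of the TLS variable at image offset off: tp + 16 + off. */
void *tls_var_addr(void *tp, size_t off) {
    return (unsigned char *)tp + TCB_SIZE + off;
}
```

A bare-metal startup would call tls_block_init once per thread block and then move the returned pointer into tp with the target's own instruction.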

On a hosted RISC-V target the psABI points tp at the image start (bias 0, matching Linux/FreeBSD _init_tls); kit's linker selects that 0 bias for non-freestanding RISC-V automatically. With no thread block set up at all, _Thread_local still resolves against a single static image, so a bare-metal program that never sets tp collapses to single-thread semantics.

Shipped headers (rt/include/)

kit ships its own header set so freestanding compilation needs no system headers. Two groups:

Freestanding C standard headers. The C11 freestanding-mandated set (<stddef.h>, <stdint.h>, <stdarg.h>, <stdalign.h>, <stdbool.h>, <stdnoreturn.h>, <float.h>, <limits.h>, <iso646.h>) and <stdatomic.h>, which C11 does not require of a freestanding implementation, plus headers kit provides as extensions beyond the freestanding subset (<setjmp.h>, <assert.h>, and small <string.h>/<stdlib.h>/<stdio.h>/<math.h> surfaces matching the runtime's weak functions). They lean on compiler builtins where the data model varies: <stdarg.h> is __builtin_va_*; <stdint.h> hardcodes the exact-/min-width limits (kit fixes CHAR_BIT == 8, a 32-bit int, and a 64-bit long long) but delegates the type aliases (__INT32_TYPE__, intptr_t, the FAST family) to the compiler since those vary by model. A handful of x86 intrinsic shims (<emmintrin.h>, <x86intrin.h>, <mm_malloc.h>) round out source compatibility.
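The split between hardcoded limits and compiler-delegated aliases might look like the sketch below. The demo_ and DEMO_ names are hypothetical so the example cannot clash with a real <stdint.h>. The predefined macros __INT32_TYPE__, __INTPTR_TYPE__, and __INT_FAST16_TYPE__ exist in GCC and Clang; that kit predefines the latter two under the same names is an assumption.

```c
#include <assert.h>
#include <limits.h>

/* Type aliases come from the compiler, since they vary by data model. */
typedef __INT32_TYPE__ demo_int32_t;
typedef __INTPTR_TYPE__ demo_intptr_t;
typedef __INT_FAST16_TYPE__ demo_int_fast16_t;

/* Exact-width limits are the same on every model, so they are literals. */
#define DEMO_INT32_MAX 2147483647
#define DEMO_INT32_MIN (-DEMO_INT32_MAX - 1)

static_assert(CHAR_BIT == 8, "8-bit bytes assumed");
static_assert(sizeof(demo_int32_t) * CHAR_BIT == 32, "exact 32-bit width");
static_assert(sizeof(demo_intptr_t) == sizeof(void *), "holds a pointer");
static_assert(sizeof(demo_int_fast16_t) * CHAR_BIT >= 16, "at least 16 bits");

/* Width of the compiler-chosen 32-bit type, in bits. */
int demo_int32_bits(void) {
    return (int)(sizeof(demo_int32_t) * CHAR_BIT);
}
```

The static assertions make a mismatch between the literals and the compiler's choices a build error instead of a silent bug.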

<setjmp.h> is ABI-coupled to the coro payload: jmp_buf is a 256-byte, 16-aligned struct-wrapped array (struct wrapper guarantees alignment for x86_64 xmm saves; [1] keeps it an array type so it decays to a pointer), sized to the largest per-target context (x86_64 Windows). No signal-mask slot — C11 7.13 excludes FP status and open-file state.
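The struct-wrapped array shape can be sketched as follows. Only the 256-byte size, the 16-byte alignment, and the [1] array form come from the text; the type names, the member name, and the sample context struct are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The struct wrapper carries the alignment; [1] keeps the typedef an
 * array type, so it decays to a pointer when passed to a function. */
typedef struct {
    _Alignas(16) unsigned char bytes[256];
} demo_jmp_buf[1];

/* Hypothetical per-target context: 8 words, 64 bytes. */
struct demo_ctx {
    unsigned long long regs[8];
};

static_assert(sizeof(demo_jmp_buf) == 256, "payload is 256 bytes");
static_assert(_Alignof(demo_jmp_buf) == 16, "payload is 16-aligned");
static_assert(sizeof(struct demo_ctx) <= sizeof(demo_jmp_buf),
              "context must fit inside the payload");

/* env has decayed to a pointer here; *env is the 256-byte struct. */
size_t demo_env_size(demo_jmp_buf env) {
    return sizeof *env;
}

/* Reports whether a buffer address honors the 16-byte alignment. */
int demo_is_aligned(const void *p) {
    return (uintptr_t)p % 16 == 0;
}
```

The fit check mirrors the _Static_assert each per-arch coro source performs against its own context struct.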

kit extension headers (rt/include/kit/). Non-standard primitives kit exposes so low-level code stays pure C:

Freestanding libc subset

For non-hosted targets the runtime also carries a minimal libc so freestanding programs can do basic work without a platform libc. All weak: rt/lib/mem/mem.c (the four mem* primitives, hand-written portable C, 0BSD — not from compiler-rt), rt/lib/string/string.c (strlen, strcpy, memchr, …), rt/lib/stdlib/stdlib.c + qsort.c (string-to-number conversions, qsort, the div_t family), and rt/lib/stdio/printf.c (a callback-driven formatter derived from mpaland/printf, integer formats only — floating-point conversions omitted). Hosted (Windows) variants drop all of these in favor of the platform libc.


Planned/roadmap work, if any, lives under doc/plan/ — not here.