Runtime (libkit_rt.a)
libkit_rt.a is the target runtime: the small body of code and headers that
kit-compiled programs link against, entirely separate from the compiler
library libkit.a. The compiler emits calls to ABI-mandated helper symbols
(__divti3, __addtf3, __atomic_load_8, …), references freestanding standard
headers (<stdint.h>, <stdatomic.h>, …), and may emit startup hooks
(.init_array IFUNC resolution). The runtime supplies the implementations. It
is freestanding — no OS, no hosted libc — so the same archive backs a Linux
binary, a Darwin binary, and a bare-metal image alike. It is built per target by
kit itself (see BUILD.md) and ships with the toolchain; the driver
links it automatically (see DRIVER.md).

    user .c ──kit──► emits __divti3, va_arg, _Atomic, coro_resume, ...
                      │
    <stdint.h> etc. ◄─┘  (shipped headers, rt/include/)
                      │
                 libkit_rt.a  (per-target archive)
    ┌──────────────┬───────────┬──────────┬──────────┬───────────┐
    int/fp soft    mem/string  atomic     coro       startup     freestanding
    helpers        /stdio shim            switch     (IFUNC)     libc subset

Design principles
No target-dispatch ifdefs in source. The integer/float helpers derive from
compiler-rt (lib/builtins/, Apache-2.0 WITH LLVM-exception, see
rt/lib/LICENSE-compiler-rt.txt), but upstream's #ifdef __ARM_EABI__ / __MINGW32__ / __SOFTFP__
target cascades were stripped out. Per-target variation is expressed by the
build — which directories and flags are selected — not by preprocessor
branches inside the C. What remains of the preprocessor is parameterization
(precision, src/dst pair) and genuinely-orthogonal concerns (assembler syntax in
assembly.h, HAS_INT128), never target dispatch.
One master .c/.S per feature, one object in the archive. Rather than
globbing compiler-rt's many per-op files, each feature group is a single master
translation unit (rt/lib/int/int.c, rt/lib/fp/fp.c, …) with the per-op
snippets inlined as commented blocks (// ---- udivmoddi4.c ----). Templates in
rt/lib/impl/ (fp_add_impl.inc, int_div_impl.inc, …) and the re-includable
rt/lib/include/common/fp_lib.h are pulled into the master multiple times per
TU, once per precision or per (src,dst) pair, with suffix-renamed statics so the
single object carries every needed instance. This keeps the archive small and
the member list explicit.
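The multiple-instantiation pattern can be approximated in a single file, with a macro standing in for a re-included .inc template. This is a minimal sketch and not the runtime's code: DEMO_DEFINE_NEG, the demo_ prefix, and the choice of a sign-flip operation are hypothetical, picked only to show one template yielding suffix-renamed instances per precision in one object.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "binary32 float expected");
static_assert(sizeof(double) == sizeof(uint64_t), "binary64 double expected");

/* Hypothetical stand-in for an .inc template. Each expansion defines one
   instance: a static helper and an entry point, both carrying the
   precision suffix so several instances can coexist in one object. */
#define DEMO_DEFINE_NEG(suffix, fp_t, rep_t, sign_bit)      \
    static rep_t demo_to_rep_##suffix(fp_t x) {             \
        rep_t r;                                            \
        memcpy(&r, &x, sizeof r);                           \
        return r;                                           \
    }                                                       \
    fp_t demo_neg##suffix##2(fp_t x) {                      \
        rep_t r = demo_to_rep_##suffix(x) ^ (sign_bit);     \
        fp_t out;                                           \
        memcpy(&out, &r, sizeof out);                       \
        return out;                                         \
    }

/* Two instantiations, one per precision, in the same translation unit. */
DEMO_DEFINE_NEG(sf, float, uint32_t, UINT32_C(1) << 31)
DEMO_DEFINE_NEG(df, double, uint64_t, UINT64_C(1) << 63)
```

With a real .inc file the parameters would be set with #define before each #include instead of being passed as macro arguments; the effect on the object is the same.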
Weak portable fallbacks. Everything a hosted libc would normally own —
memcpy/memmove/memset/memcmp, the string and stdlib functions,
__clear_cache, __kit_assert_fail — is defined __attribute__((weak)) in
portable C, so a real libc or a tuned arch-specific routine wins at link time
without a conflict. The freestanding definitions only matter when nothing else
provides them.
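A weak fallback of this kind might look like the sketch below. The symbol name demo_memcpy is hypothetical (a real definition would use the libc name), and __attribute__((weak)) is the GCC/Clang spelling the paragraph above refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Weak, portable byte copy. If another object or library supplies a strong
   definition of the same symbol, the linker picks that one and no
   duplicate-symbol error is raised. */
__attribute__((weak)) void *demo_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

/* Usage: with no strong definition linked in, the weak one does the work. */
void demo_memcpy_selftest(void) {
    char src[4] = "abc";
    char dst[4] = {0};
    void *ret = demo_memcpy(dst, src, sizeof src);
    assert(ret == dst);
    assert(dst[0] == 'a' && dst[1] == 'b' && dst[2] == 'c' && dst[3] == '\0');
}
```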
Build-time target selection (multilib)
rt/Makefile (included by the root Makefile) enumerates RT_VARIANTS and, for
each, a small feature vector that drives source and flag selection. The
dimensions are: clang target triple, data model (lp64 / ilp32 / llp64),
HAS_INT128 (0/1), the coro arch token, binary128 long-double support
(LDBL128), RISC-V save/restore, ARM AEABI mode, and a HOSTED flag. A single
GNU-make define template expands each variant into its object list, compile
flags, and an ar-built libkit_rt.a under $(BUILD_DIR)/rt/<variant>/.
kit compiles and archives its own runtime: RT_CC/RT_AS/RT_AR default to
kit cc/kit as/kit ar, so a codegen change in the compiler rebuilds
the runtime.
The data-model dimension is the multilib axis. It selects:
| Data model | long/ptr | 128-bit | int master | include dir | targets |
|---|---|---|---|---|---|
| lp64_le | 64 / 64 | yes | int64/int64.c | rt/lib/include/lp64_le | x86_64, aarch64, rv64 (LE) |
| llp64_le | 32 / 64 | yes | int64/int64.c | rt/lib/include/llp64_le | Win64 x86_64 / aarch64 |
| ilp32_le | 32 / 32 | no | int32/int32.c | rt/lib/include/ilp32_le | i386, arm32, rv32, wasm32 |
| lp64_le_ldbl128 | (lp64 +) | yes | (lp64 + fp_tf) | -include .../tf_supplement.h | aarch64/rv64 binary128 long double |
The per-model dir holds one file, int_lib.h, the compiler-rt support header
folded together with upstream's int_endianness.h/int_types.h. They differ
only where the data model forces it: LP64/LLP64 declare the ti_int/tu_int
__int128 machinery and twords/utwords unions; ILP32 omits all of it (no
128-bit type) and instead defines AEABI_RTABI (the AAPCS __pcs__ attribute
the ARM sources need). lp64_le_ldbl128 is not a separate header set but an
extra -include tf_supplement.h layered onto an LP64 build, defining
tf_float / CRT_HAS_TF_MODE before fp_lib.h processes them — keeping the
base header free of feature gates. All headers assume little-endian; a
big-endian port would need a parallel *_be/ set.
Variants built with the HOSTED flag (the Windows variants) ship only the
compiler-support subset (RT_COMPILER_SRCS: int/fp/atomic/cache/ifunc) and let
the platform libc supply mem/string/stdio/stdlib. Every other variant ships the
full RT_BASE_SRCS.
Compiler-support helpers
These are the ABI-mandated symbols the backends emit when an operation has no native instruction.
- Integer (rt/lib/int/int.c, always built). 64-bit divide/modulo (__udivdi3/__divdi3/__umoddi3/…) built on udivmoddi4, plus the bit-twiddling family every target may need: bswap, clz/ctz/ffs, parity, popcount, cmp/ucmp, abs, neg, at 32- and 64-bit widths. rt/lib/int/si_div.c adds bit-serial __udivsi3/__divsi3 etc. (referenced by other helpers even where C int division is one instruction).
- 64-bit-on-32-bit (rt/lib/int32/int32.c, ILP32 only): 64-bit shifts and __muldi3 synthesized from 32-bit lanes.
- 128-bit-on-64-bit (rt/lib/int64/int64.c, LP64/LLP64 only): __int128 shifts, clz/ctz, __multi3, __negti2, and div/mod via udivmodti4.
- Soft float, binary32/binary64 (rt/lib/fp/fp.c, always built): sf/df add/sub/mul/div/neg, the full compare set (__eqsf2 … __gtdf2, each a real function rather than an object-format-conditional alias), sf↔df extend/truncate, every int↔float conversion (floatsisf, fixdfdi, …), and fp_mode (rounding-mode query). Native-FPU targets still use the soft conversions kit's contract requires; FPU-less targets (rv32/64 without F/D, ARM softfp, wasm) use the whole set.
- Soft float, binary128 (rt/lib/fp_tf/fp_tf.c) and __int128↔float (rt/lib/fp_ti/fp_ti.c): built where long double is IEEE binary128 (aarch64/rv64). Adds tf arithmetic, sf/df↔tf conversions, and the i128↔tf / i128↔sf/df fixes.
- Atomics (rt/lib/atomic/atomic_freestanding.c, always built): the __atomic_*_N fallbacks for objects the backend cannot lower to a native atomic instruction. A pointer-sized _Atomic(uintptr_t) spinlock pool (atomic_common.inc) provides the lock, hashed by address, with no OS dependency. Implemented over the GCC-style __atomic_* builtin family that kit itself documents (doc/builtins.md), with upstream's Clang-only __c11_atomic_* calls translated. 16-byte cases are keyed off HAS_INT128. On 32-bit targets (rv32 ilp32/ilp32f) the ISA has no 64-bit atomic (lr.d/sc.d/amo*.d are rv64-only), so 8-byte _Atomic/__atomic_* lower to the __atomic_*_8 entries here: spinlock-backed, correct but not lock-free; the front end's __atomic_always_lock_free(8, …) reports false to match. This is the same contract libatomic provides; kit ships no native 64-bit atomic on rv32.
- Misc (rt/lib/cache/clear_cache.c): a weak __clear_cache (target for __builtin___clear_cache) plus weak bare-metal cache stubs. ARM and RISC-V variants add the AEABI / save-restore assembly described below.
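The address-hashed spinlock pool behind the 8-byte atomic fallbacks can be sketched as follows. This is an illustration under assumptions, not the contents of atomic_common.inc: the demo_ names, the pool size of 64, and the hash function are hypothetical, and the sketch uses C11 <stdatomic.h> calls for the lock word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_LOCK_COUNT 64u /* hypothetical pool size */
static_assert((DEMO_LOCK_COUNT & (DEMO_LOCK_COUNT - 1u)) == 0,
              "pool size must be a power of two for the mask below");

/* One pointer-sized lock word per bucket; 0 = free, 1 = held. */
static _Atomic(uintptr_t) demo_locks[DEMO_LOCK_COUNT];

/* Hash the object's address to a bucket. Distinct objects may share a
   lock, which costs contention but never correctness. */
static _Atomic(uintptr_t) *demo_lock_for(const volatile void *addr) {
    uintptr_t h = (uintptr_t)addr >> 3;
    h ^= h >> 7;
    return &demo_locks[h & (DEMO_LOCK_COUNT - 1u)];
}

static void demo_acquire(_Atomic(uintptr_t) *l) {
    while (atomic_exchange_explicit(l, (uintptr_t)1, memory_order_acquire) != 0) {
        /* spin until the holder stores 0 */
    }
}

static void demo_release(_Atomic(uintptr_t) *l) {
    atomic_store_explicit(l, (uintptr_t)0, memory_order_release);
}

uint64_t demo_atomic_load_8(const volatile uint64_t *p) {
    _Atomic(uintptr_t) *l = demo_lock_for(p);
    demo_acquire(l);
    uint64_t v = *p;
    demo_release(l);
    return v;
}

void demo_atomic_store_8(volatile uint64_t *p, uint64_t v) {
    _Atomic(uintptr_t) *l = demo_lock_for(p);
    demo_acquire(l);
    *p = v;
    demo_release(l);
}

/* Read-modify-write under the same lock; returns the previous value. */
uint64_t demo_atomic_fetch_add_8(volatile uint64_t *p, uint64_t v) {
    _Atomic(uintptr_t) *l = demo_lock_for(p);
    demo_acquire(l);
    uint64_t old = *p;
    *p = old + v;
    demo_release(l);
    return old;
}
```

Because every access to a given address goes through the same bucket, the operations are atomic with respect to each other, but a stalled lock holder blocks other threads, which is why such entries are reported as not lock-free.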
What this runtime deliberately does not provide: 80-bit x86 xf soft float
(x86 always has the FPU for long double), half-precision conversions, big-endian
targets, and the __riscv_32e/64e embedded ABIs — none are in kit's
runtime contract.
Per-arch assembly helpers
- ARM AEABI (rt/lib/arm/aeabi_thumb2.S, aeabi_thumb1.S, aeabi.c): the AEABI div/mod dual-result helpers, soft-float compares, and size-specialized __aeabi_mem* wrappers (which forward to the weak memcpy/memset). Two ISA variants: Thumb-2 (ARMv7+, tail-calls and subs/muls folding) and Thumb-1 (ARMv6-M Cortex-M0, no tail-calls, restricted forms). aeabi.c carries the ISA-agnostic __aeabi_drsub/__aeabi_frsub.
- RISC-V save/restore (rt/lib/riscv/rv32.S, rv64.S): the __riscv_save_*/__riscv_restore_* millicode for -msave-restore, split per XLEN (upstream gated one file on __riscv_xlen).
A Win64 stack-probe helper (rt/lib/stack/chkstk_x86_64_win.c, __chkstk /
___chkstk_ms page-touch probes for large frames) lives in the tree but is
not wired into any variant's source list and so ships in no archive today. It
is noted here only so the orphan is not mistaken for a present provider; when
Win64 large-frame probing lands it would join the hosted Windows variant's
source set.
kit-specific startup: IFUNC resolution
rt/lib/kit/ifunc_init.c provides __kit_ifunc_init, the startup hook for
statically-linked ELF images that use STT_GNU_IFUNC symbols. The linker
(src/link/link_layout.c) materializes one IPLT stub and .igot.plt slot per
IFUNC, emits a parallel .iplt.pairs section of (resolver, slot) pointer
pairs, and synthesizes a .init_array entry pointing at this function. Before
main, the CRT walks .init_array; __kit_ifunc_init iterates the pairs,
calls each resolver, and stores the chosen implementation pointer into its slot,
so the IPLT load-and-branch tail-calls the right target. The .iplt.pairs span
symbols (__start_iplt_pairs/__stop_iplt_pairs) are weak, so the object is a
harmless no-op when linked into images with no IFUNCs or by a non-kit linker.
The JIT path resolves slots in-process at load time and skips the .init_array
synthesis, so this symbol is never an unresolved reference there (see
JIT.md, LINK.md).
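The resolver walk can be sketched in portable C. The real hook reads its range from the linker-provided __start_iplt_pairs/__stop_iplt_pairs symbols; this sketch passes the range explicitly so it runs anywhere, and the demo_ names and the int(int) function type are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*demo_fn)(int);            /* type of the dispatched function */
typedef demo_fn (*demo_resolver)(void); /* returns the chosen implementation */

/* One record per IFUNC: the resolver to call and the slot to fill. */
struct demo_iplt_pair {
    demo_resolver resolver;
    demo_fn *slot;
};

/* Walk [start, stop) and store each resolver's choice into its slot.
   An empty range (start == stop) is a no-op. */
void demo_ifunc_init(const struct demo_iplt_pair *start,
                     const struct demo_iplt_pair *stop) {
    for (const struct demo_iplt_pair *p = start; p != stop; ++p) {
        assert(p->resolver != NULL && p->slot != NULL);
        *p->slot = p->resolver();
    }
}

/* Example IFUNC: the resolver picks an implementation once, at startup. */
int demo_double(int x) { return 2 * x; }
demo_fn demo_resolve_double(void) { return demo_double; }
demo_fn demo_double_slot; /* stands in for a .igot.plt slot */
```

After initialization, a call through demo_double_slot reaches the resolved implementation directly, which is the role the IPLT load-and-branch plays in the linked image.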
rt/lib/assert/assert.c supplies the weak __kit_assert_fail (the target of a
failed assert()), which calls __builtin_trap and then spins.
Coroutines: stackful asymmetric context switch (rt/lib/coro/)
kit ships <kit/coro.h> as a native extension — C11 has no stackful
coroutine facility — built as a deliberate counterpart to <setjmp.h>. The two
share one per-target register-context payload (256 bytes, 16-aligned): the same
save/restore instruction sequences back setjmp/longjmp and the coroutine
switch. The module is two layers:

    <kit/coro.h>     coro_init / coro_resume / coro_yield / coro_self
         │
    coro/coro.c      arch-agnostic asymmetric layer ── resume chain, thunk
         │           (one TU, built for every coro variant)
         ▼
    coro/<arch>.c    per-arch primitives: setjmp / longjmp,
    (+ aarch64*.s)   __kit_coro_switch, __kit_coro_ctx_init, trampoline

Per-arch primitive (rt/lib/coro/<arch>.c). One per ABI:
aarch64, x86_64, x86_64_win, i386, arm32, arm32_thumb1, riscv32,
riscv64. Each defines the per-target context struct (callee-saved GPRs +
callee-saved FPRs + sp + return address — e.g. x86_64 SysV is 8 words/64 bytes;
aarch64 is x19–x28/fp/lr/sp + d8–d15/176 bytes), and verifies via
_Static_assert that it fits both jmp_buf and coro_ctx. The three primitives
that save/restore registers — setjmp, longjmp, and __kit_coro_switch —
share one pair of SAVE_INTO/RESTORE_FROM macros so identical instruction
bytes are emitted in all three. Symbol decoration uses __USER_LABEL_PREFIX__,
so one source compiles for ELF / Mach-O / COFF. Most arches keep the asm
file-scope inside the .c; aarch64 splits it into aarch64_elf.s /
aarch64_macho.s (selected per variant via RT_EXTRA_SRCS) so the C TU needs
no file-scope-asm support. The thumb1 variant is a separate file because its
ARMv6-M sequences (no IT blocks, no VFP, no str sp) can't share with arm32.c.
wasm32 ships no coro (would need an Asyncify fiber port).
__kit_coro_switch is the symmetric register shuffle: save callee state into
*from, restore from *to, deliver a value. It is exposed in the public header
for advanced (M:N, work-stealing) schedulers, and is the building block under
the asymmetric layer. __kit_coro_ctx_init lays down a fresh context: zero
the saved registers, point the entry-fn register and return address at the
trampoline, and set sp to the (16-aligned, downward-growing) stack top.
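The fresh-context setup can be illustrated with a toy layout. Everything here is hypothetical: struct demo_ctx is not a real per-target context, and the sketch assumes the entry-fn register holds the function the trampoline will call while the return address holds the trampoline itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy context for illustration; real layouts are per-target. */
struct demo_ctx {
    uintptr_t callee_saved[6]; /* callee_saved[0] doubles as the entry-fn register */
    uintptr_t sp;
    uintptr_t ra;
};

/* The stack grows downward, so the initial sp is the end of the buffer
   rounded down to a 16-byte boundary. */
uintptr_t demo_stack_top(void *base, size_t size) {
    return ((uintptr_t)base + size) & ~(uintptr_t)15;
}

void demo_ctx_init(struct demo_ctx *ctx, void *stack, size_t size,
                   void (*trampoline)(void), void (*entry)(void)) {
    assert(size >= 16);
    memset(ctx, 0, sizeof *ctx);             /* zero the saved registers */
    ctx->callee_saved[0] = (uintptr_t)entry; /* what the trampoline will call */
    ctx->ra = (uintptr_t)trampoline;         /* first switch "returns" here */
    ctx->sp = demo_stack_top(stack, size);
}

/* Placeholders so the sketch is self-contained. */
void demo_trampoline(void) {}
void demo_entry(void) {}
```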
Asymmetric layer (rt/lib/coro/coro.c). Implements coro_init /
coro_resume / coro_yield / coro_self. A coro_t's private blob holds the
coro_ctx, a resumer back-pointer, and the user entry fn (a _Static_assert
pins the fit inside the header's 288-byte reservation). coro_resume records the
caller as the resumer (NULL meaning the main flow), switches in, and on return
restores the previous current-coroutine pointer; coro_yield reads its own
resumer slot and switches back — so resumes nest like calls. The trampoline
enters a static thunk (__kit_coro_thunk) that runs the user fn, marks the
coroutine CORO_DEAD, and switches back to the resumer with the return value, so
the symmetric primitive never needs to know about coro_t lifecycle. The
"current coroutine" pointer and the "main" save slot are _Thread_local, so each
thread gets an independent resume chain. kit's contract defines
__STDC_NO_THREADS__, but _Thread_local is an independent C11 language feature
and stays available; bare-metal images with no TLS runtime collapse to
single-thread semantics.
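The resume/yield bookkeeping can be demonstrated on a hosted system by swapping in POSIX <ucontext.h> for the per-arch switch primitive. This is a sketch of the asymmetric layer's logic only, not kit's implementation: the demo_ names and signatures are hypothetical, swapcontext stands in for __kit_coro_switch, and the globals are plain statics rather than _Thread_local to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

enum { DEMO_STACK_SIZE = 64 * 1024 };

typedef struct demo_coro {
    ucontext_t ctx;       /* saved registers while suspended */
    ucontext_t *resumer;  /* where yield and exit switch back to */
    int (*fn)(void);      /* user entry function */
    int dead;             /* set by the thunk once fn returns */
    int value;            /* last value handed to the resumer */
    char stack[DEMO_STACK_SIZE];
} demo_coro;

static ucontext_t demo_main_ctx; /* save slot for the main flow */
static demo_coro *demo_current;  /* NULL while the main flow runs */

/* Runs the user fn, marks the coroutine dead, switches back for good. */
static void demo_thunk(void) {
    demo_coro *self = demo_current;
    self->value = self->fn();
    self->dead = 1;
    setcontext(self->resumer);
}

void demo_init(demo_coro *c, int (*fn)(void)) {
    int rc = getcontext(&c->ctx);
    assert(rc == 0);
    (void)rc;
    c->ctx.uc_stack.ss_sp = c->stack;
    c->ctx.uc_stack.ss_size = sizeof c->stack;
    c->ctx.uc_link = NULL;
    c->resumer = NULL;
    c->fn = fn;
    c->dead = 0;
    c->value = 0;
    makecontext(&c->ctx, demo_thunk, 0);
}

/* Record the caller as resumer, switch in, restore "current" on return. */
int demo_resume(demo_coro *c) {
    demo_coro *prev = demo_current;
    assert(!c->dead);
    c->resumer = prev ? &prev->ctx : &demo_main_ctx;
    demo_current = c;
    swapcontext(c->resumer, &c->ctx);
    demo_current = prev;
    return c->value;
}

/* Switch back to whoever resumed the running coroutine. */
void demo_yield(int value) {
    demo_coro *self = demo_current;
    self->value = value;
    swapcontext(&self->ctx, self->resumer);
}

/* Example body: yields twice, then returns a final value. */
int demo_count(void) {
    demo_yield(1);
    demo_yield(2);
    return 3;
}
```

Because each resume stores its caller's context as the resumer, a coroutine that resumes another one is itself the target of that inner coroutine's yield, which is what makes resumes nest like calls.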
Thread-local storage (freestanding contract)
kit emits the Local-Exec TLS model only — there is no dynamic TLS
(__tls_get_addr, GD/LD) and no TLS allocator. _Thread_local objects live in
the executable's PT_TLS image and are reached tp-relative, with the per-arch
offset baked in by the linker (ObjElfArchOps.tls_tp_bias, applied in
src/obj/elf/link.c's tls_tcb_bias).
The runtime ships no crt0, so a freestanding image's own startup establishes
the thread block and tp. For RISC-V and AArch64 (TLS variant I), kit's codegen
and linker assume a 16-byte TCB ahead of .tdata:
[ TCB (16 bytes) | .tdata (init image) | .tbss (zeroed) ]
^tp
so a TLS variable at image offset off is accessed at tp + 16 + off. Startup
must therefore reserve 16 + tdata_size + tbss_size, copy the .tdata init
image to block + 16, zero the .tbss span, and set tp = block. The
reference implementation is test/link/harness/start.c (the rv32 bare-metal
stub lives in test/lib/exec_rv32_bare.sh); under ilp32f that startup must
also set mstatus.FS before any FP op. x86_64 uses TLS variant II instead
(tp/%fs points past the image, TPOFF offsets are negative), so it carries
no TCB bias.
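The variant I startup steps can be sketched as portable C operating on a caller-provided block. This is not the code in test/link/harness/start.c; the demo_ names are hypothetical, and writing the returned pointer into the tp register is left out because it is architecture-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_TCB_SIZE = 16 }; /* TCB bias for the variant I layout */

/* Bytes startup must reserve for one thread block. */
size_t demo_tls_block_size(size_t tdata_size, size_t tbss_size) {
    return DEMO_TCB_SIZE + tdata_size + tbss_size;
}

/* Populate the block and return the value tp should be set to. */
unsigned char *demo_tls_setup(void *block, const void *tdata_image,
                              size_t tdata_size, size_t tbss_size) {
    unsigned char *b = block;
    assert(b != NULL);
    memcpy(b + DEMO_TCB_SIZE, tdata_image, tdata_size);      /* .tdata init image */
    memset(b + DEMO_TCB_SIZE + tdata_size, 0, tbss_size);    /* .tbss is zeroed */
    return b;                                                /* tp = block */
}

/* Address of the TLS variable at image offset off: tp + 16 + off. */
void *demo_tls_addr(unsigned char *tp, size_t off) {
    return tp + DEMO_TCB_SIZE + off;
}
```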
On a hosted RISC-V target the psABI points tp at the image start (bias 0,
matching Linux/FreeBSD _init_tls); kit's linker selects that 0 bias for
non-freestanding RISC-V automatically. With no thread block set up at all,
_Thread_local still resolves against a single static image, so a bare-metal
program that never sets tp collapses to single-thread semantics.
Shipped headers (rt/include/)
kit ships its own header set so freestanding compilation needs no system headers. Two groups:
Freestanding C standard headers. The C11 freestanding-mandated set
(<stddef.h>, <stdint.h>, <stdarg.h>, <stdalign.h>, <stdbool.h>,
<stdnoreturn.h>, <float.h>, <limits.h>, <iso646.h>), plus headers kit
provides beyond that mandated set: <stdatomic.h> (standard, but not required
of a freestanding C11 implementation), <setjmp.h>, <assert.h>, and small
<string.h>/<stdlib.h>/<stdio.h>/<math.h> surfaces matching the runtime's
weak functions. They lean on
compiler builtins where the data model varies: <stdarg.h> is __builtin_va_*;
<stdint.h> hardcodes the exact-/min-width limits (kit fixes
CHAR_BIT == 8, int == 32, long long == 64) but delegates the type aliases
(__INT32_TYPE__, intptr_t, the FAST family) to the compiler since those vary
by model. A handful of x86 intrinsic shims (<emmintrin.h>, <x86intrin.h>,
<mm_malloc.h>) round out source compatibility.
<setjmp.h> is ABI-coupled to the coro payload: jmp_buf is a 256-byte,
16-aligned struct-wrapped array (struct wrapper guarantees alignment for x86_64
xmm saves; [1] keeps it an array type so it decays to a pointer), sized to the
largest per-target context (x86_64 Windows). There is no signal-mask slot
(signal masks are not part of C11), and C11 7.13 excludes floating-point status
flags and open-file state from the saved environment.
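The struct-wrapped array shape can be written down in a few lines. This mirrors the description only; the member name, the demo_ prefix, and the exact declaration are assumptions, since the shipped header's text is not reproduced here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* The struct carries the 16-byte alignment; the [1] makes the typedef an
   array type, so an object of this type decays to a pointer when passed. */
typedef struct {
    alignas(16) unsigned char bytes[256];
} demo_jmp_buf[1];

static_assert(sizeof(demo_jmp_buf) == 256, "payload must stay 256 bytes");
static_assert(alignof(demo_jmp_buf) == 16, "payload must be 16-aligned");

/* A parameter of array type adjusts to a pointer, so the callee writes
   into the caller's buffer rather than a copy. */
void demo_save_marker(demo_jmp_buf env, unsigned char marker) {
    env[0].bytes[0] = marker;
}
```

A bare 256-byte unsigned char array would decay the same way but would only guarantee byte alignment, which is why the wrapper struct is needed for aligned vector-register saves.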
kit extension headers (rt/include/kit/). Non-standard primitives kit
exposes so low-level code stays pure C:
- <kit/coro.h>: the coroutine API above. coro_ctx is the raw 256-byte register buffer; coro_t embeds it plus private scheduler storage.
- <kit/syscall.h>: __kit_syscall0..6, the raw kernel/supervisor trap primitive for Linux and freestanding native targets. These are compiler-lowered (the backend emits syscall/svc/ecall inline as an opaque, full-memory-clobber operation); there is no library implementation in this archive. The result is the raw target result register; Linux uses its "non-negative success / -errno failure" convention, and freestanding environments define their own ABI. Hosted non-Linux targets and WASM raise a compile-time error (use WASI imports for WASM).
- <kit/baremetal.h>: IRQ mask save/restore, CPU memory barriers (__kit_dmb/dsb/isb, distinct from C11 fences and meant for DMA / MMU / self-modifying code), range-based cache maintenance, and CPU hints (__kit_yield/wfi/wfe/sev). Like syscalls, these are compiler-lowered opaque operations, not library calls; targets with no meaningful lowering raise a compile-time error rather than silently no-op.
Freestanding libc subset
For non-hosted targets the runtime also carries a minimal libc so freestanding
programs can do basic work without a platform libc. All weak:
rt/lib/mem/mem.c (the four mem* primitives, hand-written portable C, 0BSD —
not from compiler-rt), rt/lib/string/string.c (strlen, strcpy, memchr,
…), rt/lib/stdlib/stdlib.c + qsort.c (string-to-number conversions, qsort,
the div_t family), and rt/lib/stdio/printf.c (a callback-driven formatter
derived from mpaland/printf, integer formats only — floating-point conversions
omitted). Hosted (Windows) variants drop all of these in favor of the platform
libc.
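A callback-driven integer formatter of the kind described can be sketched as follows. This is an illustration of the design, not code from rt/lib/stdio/printf.c or from mpaland/printf; all demo_ names and signatures are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The formatter never touches a buffer or a device; it only calls out(). */
typedef void (*demo_out_fn)(char c, void *arg);

/* Emit v in the given base (2..16); returns the number of chars emitted. */
size_t demo_fmt_unsigned(demo_out_fn out, void *arg, unsigned long v, unsigned base) {
    static const char digits[] = "0123456789abcdef";
    char tmp[sizeof(unsigned long) * 8]; /* enough digits for base 2 */
    size_t n = 0;
    assert(base >= 2 && base <= 16);
    do {
        tmp[n++] = digits[v % base];
        v /= base;
    } while (v != 0);
    for (size_t i = n; i > 0; --i) {
        out(tmp[i - 1], arg); /* digits were produced least-significant first */
    }
    return n;
}

/* Signed decimal; negation is done in unsigned arithmetic so the most
   negative value does not overflow. */
size_t demo_fmt_signed(demo_out_fn out, void *arg, long v) {
    size_t n = 0;
    unsigned long mag = (unsigned long)v;
    if (v < 0) {
        out('-', arg);
        n = 1;
        mag = 0ul - mag;
    }
    return n + demo_fmt_unsigned(out, arg, mag, 10);
}

/* Example sink: append into a fixed buffer, keeping it NUL-terminated. */
struct demo_buf {
    char data[32];
    size_t len;
};

void demo_buf_put(char c, void *arg) {
    struct demo_buf *b = arg;
    if (b->len + 1 < sizeof b->data) {
        b->data[b->len++] = c;
        b->data[b->len] = '\0';
    }
}
```

Routing every character through a callback lets the same formatter back a string sink, a UART write, or a semihosting call without depending on any of them.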
Planned/roadmap work, if any, lives under doc/plan/ — not here.