C Source Backend
kit's no-deps posture rules out linking against LLVM or GCC for an
industrial-strength optimizer. The C-source backend gives kit users that
optimizer anyway: it emits portable C source (cc -S=c-style, selected via
--emit=c) and hands the result to whatever gcc/clang exists on the build
host. The host C compiler then performs ABI lowering, instruction selection,
register allocation, and aggressive optimization (SROA, vectorization, etc.).
kit's job here is to produce legal and complete C, not human-readable C.
See CODEGEN.md for the CG model this backend consumes and
IR.md for the semantic IR it walks.
A CGBackend, not an ArchImpl
There is no ArchImpl for the C target. An ArchImpl describes a machine —
registers, encodings, an MCEmitter that writes object bytes. The C backend
produces no machine code and no object bytes; the eventual machine code runs on
the host triple after the host cc compiles the emitted source. Instead, the C
backend is a CGBackend (cg_backend_c_target in src/arch/c_target/target.c):
the small "give me a CgTarget for this Compiler + ObjBuilder + emit options"
unit the registry hands out. The registry selects it in
cg_backend_for_session (src/arch/registry.c) whenever
CodeOptions.emit_c_source is set; output is written to
CodeOptions.c_source_writer instead of to an object file.
Two-stage pipeline: record CG into IR, then emit C from IR
The backend does not translate CG calls to C text directly. It splits into two stages with the semantic IR (see IR.md) as the seam:
    frontend (lang/c, lang/toy)
        |  kit_cg_* calls
        v
    CgIrRecorder (src/cg/ir_recorder.c)      <- a CgTarget that records
        |                                       semantic CG into CgIrModule
        |  [opt passes run here if opt_level>0]
        v
    c_emit_ir_module (src/arch/c_target/ir_emit.c)
        |  switch over CgIrOp -> c_emit_* calls
        v
    c_emit_* (src/arch/c_target/c_emit.c)
        |  string buffers (cbuf)
        v
    KitWriter -> .c text
c_target_backend_make constructs the C emitter (CTarget, via
c_emit_target_new) and then wraps it in a CgIrRecorder whose finalize
callback (c_ir_finalize) replays the recorded CgIrModule through
c_emit_ir_module and then flushes the C source. The recorder is what the
session and frontend actually drive; the CTarget is private behind the
recorder's user pointer.
This design has two consequences worth stating. First, the C target never sees
a live CG call stream — it walks a finished CgIrModule, so its emission code
is a straightforward op-dispatch (ir_emit_inst) with no value-stack
bookkeeping of its own. Second, because the recorder is itself a normal
CgTarget, the IR optimizer can sit between record and emit exactly as it does
for the machine backends: at opt_level > 0 the session wraps the recorder in
opt_cgtarget_new, so the C emitted is the optimized IR. Deferring the heavy
optimization to the host cc is still the intent, but the kit-side IR passes
are not bypassed.
ir_emit.c carries one piece of glue beyond pure dispatch: CG scope handles
are recorder-relative, so CIrEmitter keeps a scope_map that translates each
recorded CGScope to the handle the emitter minted at scope_begin.
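The record-then-replay split described above can be illustrated with a deliberately tiny model. This is only a sketch: OpSketch, InstSketch, ModuleSketch, record, emit_module and demo_emit are hypothetical names invented for the illustration, not kit's CgIrOp, CgIrModule or ir_emit_inst, and the three ops stand in for a much larger set.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical miniature of the two stages; not kit's real types. */
typedef enum { OP_CONST, OP_ADD, OP_RET } OpSketch;
typedef struct { OpSketch op; int dst, a, b; } InstSketch;
typedef struct { InstSketch insts[16]; int count; } ModuleSketch;

/* Stage 1: the "recorder" only appends instructions; no C text yet. */
void record(ModuleSketch *m, OpSketch op, int dst, int a, int b) {
    if (m->count < 16)
        m->insts[m->count++] = (InstSketch){ op, dst, a, b };
}

/* Stage 2: walk the finished module and dispatch on the op. */
void emit_module(const ModuleSketch *m, char *out, size_t cap) {
    size_t len = 0;
    out[0] = '\0';
    for (int i = 0; i < m->count && len < cap; i++) {
        const InstSketch *in = &m->insts[i];
        int n = 0;
        switch (in->op) {
        case OP_CONST:
            n = snprintf(out + len, cap - len, "v%d = %d;\n", in->dst, in->a);
            break;
        case OP_ADD:
            n = snprintf(out + len, cap - len, "v%d = v%d + v%d;\n",
                         in->dst, in->a, in->b);
            break;
        case OP_RET:
            n = snprintf(out + len, cap - len, "return v%d;\n", in->a);
            break;
        }
        if (n < 0)
            break;              /* encoding error: stop emitting */
        len += (size_t)n;
    }
}

/* Record a small function body, then replay it into C text. */
const char *demo_emit(void) {
    static char out[128];
    ModuleSketch m = { .count = 0 };
    record(&m, OP_CONST, 0, 40, 0);
    record(&m, OP_CONST, 1, 2, 0);
    record(&m, OP_ADD, 2, 0, 1);
    record(&m, OP_RET, 0, 2, 0);
    emit_module(&m, out, sizeof out);
    return out;
}
```

Because the emitter only ever sees the finished array, an optimizer could rewrite the array between the two stages without the emitter changing at all, which is the property the section relies on.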
Target-locked, not portable
The emitted C is target-locked: it must be compiled for the same triple
that kit --target= selected. Compiled for a different triple it may silently
misbehave. The cause is fundamental to CG: semantic lvalue chains are flattened
to (base, byte_offset) before any backend sees them. kit_cg_field(g, n)
arrives as an indirect access with ofs=12; the field identity is gone, and
that 12 came from the kit-selected target's record layout. If a downstream
compiler assumes a different layout, the access is wrong. This is the same
trade LLVM IR makes (datalayout-locked), and it does not limit the stated goal,
since the user already fixed the triple at kit invocation.
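A small example makes the layout dependence concrete. The record below is a hypothetical one chosen for illustration (it is not from kit's sources): on a target with 8-byte pointers its last field sits at byte 12, while on a 4-byte-pointer target it sits at byte 8, so a baked-in offset of 12 is only correct for the former.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source-level record. With 8-byte pointers `d` is at
 * offset 12; with 4-byte pointers it is at offset 8. */
struct rec { void *p; int32_t x; int32_t d; };

/* What a backend sees after flattening: base plus byte offset, no field
 * name. The constant 12 was computed for the kit-selected target. */
int32_t load_at_12(void *base) {
    return *(int32_t *)((char *)base + 12);
}

/* True only when the compiling host's layout agrees with the baked-in 12. */
int layout_matches(void) {
    return offsetof(struct rec, d) == 12;
}
```

When layout_matches() is false, load_at_12 still compiles cleanly but reads the wrong bytes, which is the silent misbehavior the section warns about.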
Semantic temporaries become C locals
CG mints fresh, unbounded local ids (CGLocal); each one becomes a single
declared C automatic variable named vN (c_local_name). Declaration is lazy:
the first time a local is referenced, c_ensure_local appends one typed
declaration to the per-function decls buffer. Locals are zero-initialized
(= 0, or = {0} for aggregates) and marked __attribute__((unused)) to
silence host-cc diagnostics on control flow the host can't reason through;
the host's dead-store elimination removes the initializer whenever a real
assignment dominates.
Each local has exactly one declared C type, recorded in local_type and
checked for consistency on re-use. Where CG arithmetic crosses
pointer/integer or differing-width boundaries, the emitter bridges through
uintptr_t casts so host-cc warnings (-Wint-conversion and friends) stay
quiet while the bit semantics are exact. Signedness-sensitive operations
(unsigned divide/remainder, logical shift right, unsigned compares) get an
explicit width-sized signed/unsigned cast on their operands.
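The following is a hand-written approximation of what such output might look like, assuming 32-bit integer locals and GCC/clang as the host. The function names and the p0/p1 parameter names are assumptions made for the sketch; only the vN naming, the zero-initialization, the unused attribute, and the cast patterns come from the description above.

```c
#include <stdint.h>

/* Unsigned divide on values held in signed locals: both operands are
 * cast to the unsigned type of the same width, the result cast back. */
int32_t udiv32(int32_t p0, int32_t p1) {
    int32_t v0 __attribute__((unused)) = 0;
    v0 = (int32_t)((uint32_t)p0 / (uint32_t)p1);
    return v0;
}

/* Logical shift right: the unsigned cast keeps the sign bit from being
 * replicated into the high bits. */
int32_t lshr32(int32_t p0, int32_t p1) {
    int32_t v0 __attribute__((unused)) = 0;
    v0 = (int32_t)((uint32_t)p0 >> p1);
    return v0;
}

/* Pointer plus integer, bridged through uintptr_t so the host compiler
 * sees no implicit pointer/integer conversion. */
void *ptr_add(void *p0, int64_t p1) {
    void *v0 __attribute__((unused)) = 0;
    v0 = (void *)((uintptr_t)p0 + (uintptr_t)p1);
    return v0;
}
```

For instance, udiv32(-2, 2) divides 0xFFFFFFFE by 2 and yields 2147483647, where a signed divide would have produced -1.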
Types: scalars map, aggregates are opaque bytes
c_typename lowers a CG type id to a C type:
- scalars -> fixed-width <stdint.h> types (int8_t..int64_t, __int128), float/double/long double, or int32_t for bool;
- pointers -> void* (all pointers collapse; access type comes from the cast at each load/store);
- enums -> their underlying integer base;
- vararg state -> va_list (flags a <stdarg.h> include);
- records, arrays, function types -> an emitted typedef.
The key invariant: composite types are opaque storage. A record or array of
size N and alignment A becomes
typedef struct { _Alignas(A) uint8_t raw[N]; } __ty_<id>;
regardless of its fields. Field and element access is never expressed through C
field syntax; CG already speaks in (base, byte_offset), so every access is an
indirect dereference (*(T*)((char*)addr + ofs)). Emitting types as raw bytes
sidesteps all C aggregate-semantics ambiguity (bitfield layout rules, array
decay, packed/aligned attribute interactions) and keeps types orthogonal to
access patterns. Modern hosts see through the offset arithmetic for SROA
anyway. Function types instead become a function-pointer typedef
R (*__ty_<id>)(...) for indirect calls. Multi-result returns synthesize a
guarded __kit_tuple<N>_... struct.
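As an illustration, here is what the two typedef shapes could look like for a 16-byte, 4-aligned record and a two-argument function type. The ids 7 and 9, the helper functions, and the concrete parameter list are assumptions for the sketch; the typedef shapes follow the forms given above.

```c
#include <stdint.h>
#include <string.h>

/* Opaque storage for a record of size 16 and alignment 4. */
typedef struct { _Alignas(4) uint8_t raw[16]; } __ty_7;

_Static_assert(sizeof(__ty_7) == 16, "size is preserved");
_Static_assert(_Alignof(__ty_7) == 4, "alignment is preserved");

/* Function type lowered to a function-pointer typedef. */
typedef int32_t (*__ty_9)(int32_t, int32_t);

int32_t add2(int32_t a, int32_t b) { return a + b; }

/* Indirect call through the typedef. */
int32_t call_indirect(__ty_9 fn, int32_t a, int32_t b) {
    return fn(a, b);
}

/* By-value copy of the opaque struct moves all 16 bytes, whatever the
 * original fields were. Returns 1 when the copy is byte-identical. */
int copy_preserves_bytes(void) {
    __ty_7 a = {{0}}, b = {{0}};
    memset(a.raw, 0xAB, sizeof a.raw);
    b = a;
    return memcmp(a.raw, b.raw, sizeof a.raw) == 0;
}
```

The static assertions show the point of the design: the host compiler is told only the size and alignment, and those are the only two properties it needs to allocate, copy, and pass the value.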
Typedefs are emitted into a TU-wide typedefs buffer, keyed by unaliased type
id with a per-id state machine (unseen / inflight / emitted) so each type is
declared once, dependencies first, and recursive types degrade to forward-only
rather than looping.
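The three-state walk can be sketched as a depth-first traversal over a dependency table. Everything here is hypothetical (the table, the names, the four sample types); it only shows how an in-flight marker lets a self-referential type terminate while dependencies are still emitted first.

```c
#include <stdio.h>
#include <string.h>

enum { UNSEEN, INFLIGHT, EMITTED };

#define NTYPES 4
/* deps[id] lists the type ids that `id` refers to, terminated by -1.
 * Type 3 refers to itself, standing in for a recursive type. */
static const int deps[NTYPES][3] = {
    { 1, 2, -1 }, { 2, -1, -1 }, { -1, -1, -1 }, { 3, -1, -1 },
};
static int state[NTYPES];   /* zero-initialized, i.e. UNSEEN */
static char out[128];

void emit_type(int id) {
    if (state[id] != UNSEEN)
        return;             /* already emitted, or a cycle back to an
                               in-flight type: do not recurse again */
    state[id] = INFLIGHT;
    for (int i = 0; i < 3 && deps[id][i] >= 0; i++)
        emit_type(deps[id][i]);
    char line[16];
    snprintf(line, sizeof line, "__ty_%d;", id);
    strcat(out, line);      /* dependencies are already in `out` */
    state[id] = EMITTED;
}

/* Emit types 0 and 3 and return the resulting declaration order. */
const char *emit_order(void) {
    emit_type(0);
    emit_type(3);
    return out;
}
```

Requesting type 0 pulls in 2 and then 1 before 0 itself, and the self-reference in type 3 returns immediately instead of recursing forever.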
TU structure and the deferred prologue
The emitter accumulates several string buffers and flushes them in a fixed
order at c_emit_finalize:
    prologue         #include <stdint.h>, <stdalign.h> (+ stdarg/setjmp if used)
    typedefs         __ty_* opaque-storage and function-pointer typedefs
    forwards         one `RetT name(params);` per function seen
    data_defs        data symbol definitions and extern declarations
    function bodies  signatures + spliced-in decls + body statements
Header choice beyond the two unconditional includes is deferred to finalize so
the include lines stay deterministic regardless of when a feature was first
referenced. The data walk (c_emit_data) populates two buffers: the
data_defs buffer it owns, and — as a side effect, since data initializers can
take the address of functions — the function forward-declaration buffer. So the
walk runs first, then forwards is flushed, then data_defs. Forwards precede
data definitions because a data initializer may reference a function by name.
Per function, declarations and body text are buffered separately: CG needs all
locals declared at the top of the function, but surfaces them interleaved with
body emission. func_end records the byte offset just past the opening brace
(fn_body_start) and splices the accumulated decls in there. A
last_was_terminator flag drops dead statements after an unconditional
return/goto so the output is not littered with unreachable C.
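The splice can be pictured with two plain character buffers. This is a simplified sketch, assuming fixed-size buffers and a hypothetical splice helper; only the idea of remembering fn_body_start and inserting the accumulated declarations there comes from the text above.

```c
#include <string.h>
#include <stddef.h>

/* Insert `ins` into the NUL-terminated `buf` at byte offset `at`.
 * The caller guarantees that `buf` has enough capacity. */
static void splice(char *buf, size_t at, const char *ins) {
    size_t n = strlen(ins);
    memmove(buf + at + n, buf + at, strlen(buf + at) + 1);
    memcpy(buf + at, ins, n);
}

const char *emit_function_sketch(void) {
    static char fn[256];
    static char decls[128];
    fn[0] = decls[0] = '\0';

    strcat(fn, "int32_t f(void) {\n");
    size_t fn_body_start = strlen(fn);   /* just past the opening brace */

    /* Body statements and declarations surface interleaved... */
    strcat(decls, "  int32_t v0 = 0;\n");
    strcat(fn, "  v0 = 1;\n");
    strcat(decls, "  int32_t v1 = 0;\n");
    strcat(fn, "  v1 = v0 + 2;\n");
    strcat(fn, "  return v1;\n}\n");

    /* ...and are separated at the end by splicing decls in at the top. */
    splice(fn, fn_body_start, decls);
    return fn;
}
```

The resulting text has both declarations immediately after the opening brace, followed by the body statements in their original order.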
Control flow
CG's structured scopes map to C control flow where possible. SCOPE_LOOP
becomes for (;;) { ... }; within such a structured scope, jumps to the
scope's break/continue labels are emitted as C break;/continue; rather than
goto. Everything else lowers to labels and goto, which the host cc
re-structures. Switches, computed/indirect branches (GCC &&label /
goto *p), and address-of-label all have direct emitters.
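For orientation, the shapes described above might come out roughly as follows. Both functions are hand-written illustrations with assumed names, not captured kit output; the second one uses the GCC/clang labels-as-values extension and will not compile with a strictly ISO C compiler.

```c
#include <stdint.h>

/* A structured loop: jumps to the scope's own break/continue labels are
 * written as C break/continue inside for (;;). */
int32_t sum_even_to(int32_t p0) {
    int32_t v0 __attribute__((unused)) = 0;   /* counter */
    int32_t v1 __attribute__((unused)) = 0;   /* accumulator */
    for (;;) {
        if (v0 >= p0) break;
        v0 = v0 + 1;
        if (v0 & 1) continue;                 /* skip odd values */
        v1 = v1 + v0;
    }
    return v1;
}

/* An indirect branch: address-of-label plus computed goto. */
int32_t dispatch(int32_t p0) {
    void *targets[2] = { &&L_zero, &&L_one };
    goto *targets[p0 & 1];
L_zero:
    return 10;
L_one:
    return 20;
}
```

sum_even_to(10) adds 2, 4, 6, 8 and 10; dispatch selects a label by the low bit of its argument.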
Tail calls
CG owns the tail-call policy (see CODEGEN.md): before flagging a
call as a sibling call it asks the target whether the call is realizable, and
only sets CG_CALL_TAIL when the target agrees. The C backend answers through
c_emit_tail_call_unrealizable_reason_for, wired into the recorder config as
tail_call_unrealizable_reason. A realizable tail call is emitted as
__attribute__((musttail)) return <call>;, which clang lowers to a guaranteed
sibling call; the host compiler does the actual stack-reuse.
The reason hook declines the cases clang's musttail cannot honor, returning a
human-readable string instead of NULL: a variadic caller, a variadic callee,
or a caller/callee parameter-count mismatch. For those CG leaves the call
unflagged and the backend emits an ordinary call. This keeps the C output
within the subset clang's musttail accepts rather than asserting a sibling
call the host would reject.
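A reduced model of the reason hook and of the emitted form is sketched below. SigSketch, tail_reason_sketch and the MUSTTAIL_SKETCH macro are hypothetical; the real hook's signature is not shown in this document. The macro guard exists only so the example also compiles on host compilers without musttail support, in which case the call is an ordinary recursive call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a function signature. */
typedef struct { int nparams; int is_variadic; } SigSketch;

/* Returns NULL when a tail call is realizable, otherwise a
 * human-readable reason, mirroring the three declined cases above. */
const char *tail_reason_sketch(const SigSketch *caller,
                               const SigSketch *callee) {
    if (caller->is_variadic) return "variadic caller";
    if (callee->is_variadic) return "variadic callee";
    if (caller->nparams != callee->nparams)
        return "parameter-count mismatch";
    return NULL;
}

/* Use musttail only where the host compiler reports support for it. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL_SKETCH __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL_SKETCH
#  define MUSTTAIL_SKETCH
#endif

/* Shape of a realizable tail call: caller and callee signatures match. */
int64_t count_down(int64_t n, int64_t acc) {
    if (n == 0)
        return acc;
    MUSTTAIL_SKETCH return count_down(n - 1, acc + n);
}
```

With the attribute in effect, the host compiler must either reuse the caller's frame or reject the program, which is why the hook filters out the unsupported cases before the flag is ever set.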
Mapping kit semantics onto GCC/clang C
GCC/clang-extension C covers everything CG can express, so each feature maps to a builtin or extension rather than a runtime shim:
- inline asm -> verbatim __asm__ __volatile__ (...), constraints and clobbers passed through (with one fix-up: kit's synthesized matching input for a +-tied output is dropped, since gcc rejects the redundant operand);
- overflow/trap/builtins -> __builtin_{add,sub,mul}_overflow, __builtin_trap, __builtin_unreachable, __builtin_{popcount,ctz,clz,bswap}*, __builtin_prefetch, __builtin_expect, __builtin_memcpy/memmove/memset, __builtin_alloca[_with_align];
- atomics -> __atomic_* generic builtins with explicit memory orders;
- varargs -> __builtin_va_* over va_list;
- float-constant loads -> a static const uint8_t[] of the ABI byte pattern copied into the destination via __builtin_memcpy, so the host sees the exact bits;
- bitfields -> bit-extract/insert arithmetic on the underlying storage unit; kit never emits a C bitfield declaration.
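Three of the mappings listed above are sketched here as they might appear in host C. The function names, the choice of the constant 1.0f, and the 3-bit field at bit 5 are assumptions for the example. The endianness guard is also an addition of this sketch: target-locked output would carry a single byte pattern for the selected target.

```c
#include <stdint.h>

/* Checked add: returns nonzero on overflow, stores the wrapped sum. */
int32_t checked_add(int32_t p0, int32_t p1, int32_t *out) {
    return __builtin_add_overflow(p0, p1, out);
}

/* Float constant 1.0f (IEEE-754 bits 0x3f800000) loaded from its byte
 * pattern, so no decimal-to-binary conversion happens on the host. */
float load_one(void) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    static const uint8_t bits[4] = { 0x00, 0x00, 0x80, 0x3f };
#else
    static const uint8_t bits[4] = { 0x3f, 0x80, 0x00, 0x00 };
#endif
    float v0 = 0;
    __builtin_memcpy(&v0, bits, sizeof v0);
    return v0;
}

/* A 3-bit field at bit offset 5 of a 32-bit storage unit, accessed with
 * shifts and masks instead of a C bitfield declaration. */
uint32_t bf_extract(uint32_t unit) {
    return (unit >> 5) & 0x7u;
}

uint32_t bf_insert(uint32_t unit, uint32_t val) {
    return (unit & ~(0x7u << 5)) | ((val & 0x7u) << 5);
}
```

Keeping bitfields as explicit arithmetic means the host compiler's own bitfield layout rules never come into play, consistent with the opaque-storage treatment of aggregates.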
Data symbols and cross-symbol relocations
Data emission walks the ObjBuilder's symbols at finalize. A defined data
symbol is emitted as a typed file-scope object carrying its initializer bytes;
undefined data becomes an extern uint8_t name[]; declaration. Linkage,
visibility, weakness, const (for rodata), static (local binding) and
_Thread_local (TLS) are reproduced via attributes and qualifiers so the host
linker reconstructs the same symbol table.
Cross-symbol references (relocations into a symbol's bytes) are the interesting
case. Rather than a runtime constructor, the symbol's storage struct is split
so each relocated slot is a real typed field: raw uint8_t chunk_K[] runs
interleaved with relocation-width fields (void* for 8-byte relocations,
uint32_t for R_ABS32). The initializer assigns standard C address-of expressions
((void*)((char*)&target + addend)) to those fields, so the host C compiler
and linker resolve the references natively. The relocation slots are sorted by
offset, and when any are present the struct is __attribute__((packed)) so the
field layout matches the original byte image exactly.
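A data symbol with one 8-byte relocation at offset 4 might be laid out roughly like this. The symbol names, byte values, and the reloc_0 field name are invented for the sketch (only chunk_K is named in the text), and a 64-bit target is assumed so that void* is the 8-byte slot.

```c
#include <stddef.h>
#include <stdint.h>

/* The symbol being referenced. */
int32_t target_sym[4] = { 1, 2, 3, 4 };

/* Storage split around the relocated slot; packed so that the pointer
 * field sits at byte 4, exactly as in the original byte image. */
struct __attribute__((packed)) data_sym_storage {
    uint8_t chunk_0[4];
    void   *reloc_0;
    uint8_t chunk_1[4];
};

/* The initializer is an ordinary address constant (symbol + addend 8),
 * so the host compiler and linker emit and resolve the relocation. */
struct data_sym_storage data_sym = {
    { 0xde, 0xad, 0xbe, 0xef },
    (void *)((char *)&target_sym + 8),
    { 0x01, 0x02, 0x03, 0x04 },
};

/* Follow the relocated pointer: addend 8 lands on target_sym[2]. */
int32_t follow_reloc(void) {
    void *p = data_sym.reloc_0;
    return *(int32_t *)p;
}
```

Without the packed attribute the host compiler would pad chunk_0 out to the pointer's alignment and the field would drift away from its original offset.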
TLS delegates entirely to _Thread_local; the host compiler builds its own
descriptor. On Mach-O, where TLS is split into a descriptor symbol plus a
synthesized init symbol (see OBJ.md), the emitter pulls the initial
bytes from the init symbol via the descriptor's R_ABS64 and emits a single
_Thread_local, skipping the object-level descriptor machinery.
Function-local static data uses CG's narrow source-backend hook: those
symbols are emitted inside the owning function and skipped by the TU-wide data
walk.
Source locations and debug info
set_loc emits #line N "path" directives (deduplicated against the last one
emitted) into the function body. When the user passes -g to the downstream
host gcc/clang, the resulting object carries debug info mapped back to the
original kit input. kit's own DWARF producer (see DWARF.md) is
unused in this mode — there is no Debug and no MCEmitter on this path.
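The effect of a #line directive can be checked directly from C, since it also changes what __LINE__ and __FILE__ expand to. The path "input.toy" and the line number 42 are made-up values for the demonstration.

```c
#include <string.h>

/* After the directive, the next source line is treated as line 42 of
 * "input.toy" for diagnostics, debug info, __LINE__ and __FILE__. */
int loc_line(void) {
#line 42 "input.toy"
    return __LINE__;
}

/* Returns 1 when the presumed file name is the one set by #line. */
int loc_file_is_input(void) {
    return strcmp(__FILE__, "input.toy") == 0;
}
```

This is the same mechanism the host compiler's debug-info producer consults, which is why the emitted directives are sufficient to map the object back to the original input.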
Testing
Exercised end-to-end: emit C with kit cc --emit=c, compile the result with
the host cc -Werror, run it, and assert behavior matches the machine-code
path. The test/toy and test/parse corpora drive this via a dedicated emit
mode. See TESTING.md.