IR

This document defines the semantics of kit's semantic CG IR: the function-level, recorded form of the internal CgTarget interface, captured as a CgIrModule of CgIrFunc bodies (see src/cg/ir.h). It is the stable hinge between the frontend's typed CG-API calls and everything downstream that wants a durable program form rather than immediate emission: the optimizer, the threaded interpreter, and source-like replay backends. This is the authoritative semantics-of-the-IR reference; for how the IR is produced and replayed see CODEGEN.md, for the optimizer's own derived form see OPT.md, and for interpreted execution see INTERPRETER.md.

What the IR is, and what it is not

The IR is a faithful tape of CgTarget calls. The CgTarget interface (src/cg/cgtarget.h) is the semantic codegen API: typed locals, labels, structured scopes, memory ops, ABI-shaped calls, atomics, intrinsics, inline asm. A backend can implement that interface to emit code immediately (the O0 native path, the C-source path). The IR recorder (src/cg/ir_recorder.c) is a second implementation of the same interface that, instead of emitting, records each call into a CgIrInst on the current CgIrFunc. Replaying the recorded tape through a direct target reproduces exactly what immediate emission would have done.

Because of that, the IR carries no optimizer state and no machine state. There are no basic blocks, no SSA values, no phis, no dominance, no liveness, no virtual or hard registers, no spill slots, and no call plans in the CG IR. Those are all derived, consumer-private views. In particular the optimizer's Func IR (src/opt/ir.h) is a separate representation with its own op set (IR_PHI, IR_PARAM_DECL, IR_CONST_I, ...); it is built from the CG IR, not a superset of it. Do not conflate the two: the CG IR enum is CgIrOp in src/cg/ir.h; the optimizer enum is IROp in src/opt/ir.h.

The IR is target-data-layout-specific but not target-instruction-specific. Type sizes, alignments, record field offsets, bitfield bit ranges, ABI classifications, and pointer widths are already resolved for the compile target by the time the recorder sees a call. The IR does not know about machine instructions, addressing-mode legality, or register files.

No undefined behavior

The CG IR has no undefined behavior. Every operation, on every input, has a fully determined meaning that falls into exactly one of three categories:

Portably defined — the result is the same on every target. This is the default for the arithmetic edges that C leaves undefined: integer overflow wraps, shift counts wrap modulo the width, float→int conversions saturate, clz(0)/ctz(0) is the bit width.
Target-defined — the result is deterministic given the compile target but may differ across targets. Two things are target-defined: (1) inherently machine-tied effects (a fault on an invalid memory access, the bit pattern an inline-asm block produces), and (2) the arithmetic edges above when the frontend opts into native-instruction semantics for performance (see Semantic modes).
Well-formedness preconditions — structural requirements on the recorded tape (operand kinds and widths agree, each label is placed once, every path ends in a terminator, …). A tape that violates one is malformed IR, a compiler bug in the producer — not a program exhibiting undefined behavior. Consumers may assume well-formed input.

There is deliberately no fourth "anything may happen" category. Where C would say undefined behavior, the CG IR says portably defined, target-defined, or malformed — never unconstrained. The runtime half of this guarantee (what each op computes on every input) is spelled out in Well-definedness: edge-case semantics; the structural half is the Well-formedness list.

Pipeline position

frontend
  -> KitCg          (public CG API: stack/lvalue model)
     -> CgTarget       (semantic codegen interface)
          |-> direct native target      (O0 emit)
          |-> direct C-source target     (--emit=c)
          \-> IR recorder -> CgIrModule  (O1/O2, interpreter)
                                  |-> opt: derive Func (CFG/SSA/MIR) -> native emit
                                  \-> opt: derive Func (reduced) -> interpreter

KitCg lowers the frontend's stack/lvalue source operations into flat CgTarget calls. At O0 those calls hit a direct target and become code right away. At O1/O2 and under the interpreter they hit the recorder and become a CgIrModule. The recorder is created by the optimizer (src/opt/opt.c calls cg_ir_recorder_new); it notifies the optimizer per completed function and at finalize through callbacks so cross-function work (inlining, reachability, alias resolution) can run before the buffered IR is lowered into the wrapped direct target.

Module and function structure

A CgIrModule owns the translation unit's recorded functions, symbol aliases, and file-scope __asm__ blocks. File-scope asm is retained on the module rather than emitted during recording because the optimizer path has no live emit target at recording time; it is replayed at finalize.

A CgIrFunc is one function body and owns everything needed to replay, optimize, or interpret it: the preserved CGFuncDesc (symbol, function type, result/param descriptors, source location, attributes, inline policy); a linear instruction stream (CgIrInst tape); and side tables for locals, params, labels, and scopes. It also caches two ObjSymSets — the set of symbols it calls and the set of globals it references — populated as operands are recorded, so reachability and alias passes need not rescan the tape.

There is one local namespace per function. A CgIrLocal is a mutable typed location identified by a CGLocal id (1-based; CG_LOCAL_NONE is the sentinel). A local records its CGLocalDesc (type, size, align, source name/loc), whether it is a parameter (with parameter index), and whether its address has been taken. Parameters are declarations, not executable ops: the recorder adds the parameter local and a CgIrParam entry; there is no parameter instruction in the tape. Taking the address of a local (CG_IR_ADDR_OF, or the dedicated local_addr recording) sets the local's address_taken flag, which downstream consumers use to decide it needs a concrete memory home; non-address-taken scalar locals may live in registers, SSA values, or interpreter slots as the consumer sees fit.

Labels, scopes, and derived blocks

Labels are first-class because CG control-flow ops name them: branch targets, switch case/default targets, label-address materialization, and the closed target set of a computed goto. A CgIrLabel records its id and the source location of its first placement. Placement appears in the tape as a CG_IR_LABEL instruction.

Structured scopes (CgIrScope) capture CG's structured control model. There are two scope kinds (ScopeKind in src/cg/cgtarget.h): SCOPE_BLOCK, a forward-only region whose break skips to the end, and SCOPE_LOOP, whose break exits forward and whose continue jumps to an explicit loop-header target. if/if-else is not a distinct scope kind: the frontend lowers it to a pair of nested forward blocks (kit_cg_if_begin/_else/_end), so there is no else op in the IR. Backends able to express structure (the C-source target, a future Wasm target; see WASM.md) replay scopes directly; native CFG consumers flatten them to ordinary labels and branches.

Basic blocks are not part of the IR. A consumer that needs CFG form derives it by splitting the linear tape at labels, scope boundaries, and terminators. That derived CFG, with its predecessor/successor edges, layout order, and dominance, belongs to the consumer (the optimizer builds exactly this in opt_func_from_cg_ir).
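A minimal sketch of that splitting follows. It assumes a hypothetical stripped-down op enum (Op, OP_LABEL, OP_BR, OP_CMP_BRANCH, OP_RET are illustrative names, not the real CgIrOp spellings) and ignores scope ops, which a consumer would flatten to labels and branches first. It only records the tape index at which each block starts. It is not opt_func_from_cg_ir.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down op set: only what block splitting needs. */
typedef enum { OP_OTHER, OP_LABEL, OP_BR, OP_CMP_BRANCH, OP_RET } Op;

/* An op ends a block if control may leave it: an unconditional branch,
 * a conditional branch (its fallthrough starts a new block), or a return. */
static int ends_block(Op op) {
    return op == OP_BR || op == OP_CMP_BRANCH || op == OP_RET;
}

/* Writes the tape index where each block starts into starts[] (which must
 * hold n entries) and returns the number of blocks. A block starts at the
 * first instruction, at every label, and after every block-ending op. */
static size_t split_blocks(const Op *tape, size_t n, size_t *starts) {
    size_t count = 0;
    int open = 0;                     /* is a block currently open? */
    assert(n == 0 || (tape != NULL && starts != NULL));
    for (size_t i = 0; i < n; i++) {
        if (tape[i] == OP_LABEL)      /* a label always begins a block */
            open = 0;
        if (!open) {
            starts[count++] = i;
            open = 1;
        }
        if (ends_block(tape[i]))      /* the next instruction begins one */
            open = 0;
    }
    return count;
}
```

Edges and dominance would then be computed from these boundaries. A block that starts right after a terminator without a label is dead, since such instructions are reachable only through a label.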

Instructions and operands

A CgIrInst has an op (CgIrOp), a sticky source location captured from the last set_loc, an operand array, and an extra union holding op-specific auxiliary data: a raw immediate, constant bytes, a MemAccess, or an arena pointer to an op-specific aux struct. There is no separate result-type field; each operand carries its own KitCgTypeId and destinations name typed locals, so the instruction's types are recoverable from its operands and aux.

Most ops map one-to-one to a CgTarget method, and the operand order in the tape follows the method's argument order — destination first where there is one. Multi-result ops (calls, compare-and-swap, checked-arithmetic intrinsics) name several destination locals.

Operands use the shared Operand shape (src/cg/cgtarget.h), every variant typed by a KitCgTypeId:

OPK_IMM: a signed immediate bit pattern.
OPK_LOCAL: a typed function local.
OPK_GLOBAL: an object symbol plus signed addend — an address, not a load.
OPK_INDIRECT: base local, optional index local with a log2 scale (1/2/4/8), and a signed displacement; an addressing expression, not a load.
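An OPK_INDIRECT operand denotes an address computed as base + (index << log2 scale) + displacement. The sketch below evaluates one. It assumes 64-bit pointers and wrapping address arithmetic. IndirectAddr and effective_address are hypothetical names, and a real consumer would use the target's own pointer width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical evaluated form of an OPK_INDIRECT operand: the values of
 * its base and index locals plus the scale and displacement. */
typedef struct {
    uint64_t base;        /* value held in the base local */
    int has_index;        /* 0 when the operand has no index local */
    uint64_t index;       /* value held in the index local */
    unsigned log2_scale;  /* 0..3, i.e. scale 1/2/4/8 */
    int64_t disp;         /* signed displacement */
} IndirectAddr;

/* Computes base + (index << log2_scale) + disp modulo 2^64. The result is
 * an address; nothing is loaded. */
static uint64_t effective_address(const IndirectAddr *a) {
    assert(a->log2_scale <= 3);            /* well-formedness precondition */
    uint64_t ea = a->base;
    if (a->has_index)
        ea += a->index << a->log2_scale;
    ea += (uint64_t)a->disp;               /* negative disp wraps to a subtract */
    return ea;
}
```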

There is deliberately no register operand kind. Register-like temporaries are just locals; physical registers are a backend concern that never appears in the CG IR.

Types

Every local and operand carries a KitCgTypeId — a CG storage type already selected for the target, not a frontend AST type. Enums and typedef aliases have already collapsed to their storage type; record/array field identity is gone, replaced by byte offsets, bit ranges, and aggregate sizes. The CG type system covers void, a boolean/i1 condition type, width-only integers, the float widths, pointers (with address space), function-pointer values, opaque fixed-size aggregates, and the per-arch vararg-state object. Signedness is not a property of an integer type; it is carried by the operation that consumes the value (signed vs unsigned divide, compare, shift, extend, and int/float conversion). ABI decomposition — splitting one source value into several storage parts for argument passing or returns — is recorded in the call and return descriptors, not by re-typing ordinary value ops.

Operation families

The complete op set is CgIrOp in src/cg/ir.h; the categories below describe its semantics. The textual dumper (src/cg/ir_dump.c, reachable as cg_ir_func_dump) is the canonical rendering and a good cross-check for the spelling and operand order of any op.

Administrative

CG_IR_NOP: no effect; also used as a deletion marker.
CG_IR_LABEL: marks the placement of a label (id in extra.imm).

Data movement

CG_IR_LOAD_IMM: set a destination local from an integer-like immediate bit pattern (extra.imm) — null pointers, integer/bool constants, small literals.
CG_IR_LOAD_CONST: set a destination local from exact target ABI bytes (extra.cbytes) — floating constants, i128, f128, and other byte-oriented constants whose representation must be preserved exactly.
CG_IR_COPY: assign one local from another.
CG_IR_LOAD / CG_IR_STORE: scalar load/store through a local/global/indirect address, carrying a MemAccess in extra.mem.
CG_IR_ADDR_OF: materialize the address of a local/global/indirect lvalue; marks an addressed local address_taken.
CG_IR_TLS_ADDR_OF: materialize a thread-local object's address for the current thread. Separate from ADDR_OF because TLS materialization may need a target-selected access model, relocations, helper calls, or thread-pointer arithmetic — it is not merely an address-space attribute of an ordinary lvalue.
CG_IR_AGG_COPY / CG_IR_AGG_SET: fixed-size aggregate byte-range copy and fill, carrying an AggregateAccess.
CG_IR_BITFIELD_LOAD / CG_IR_BITFIELD_STORE: extract/insert a bitfield in a record storage unit, carrying a BitFieldAccess (storage offset, bit offset, bit width, signedness).

All memory and aggregate/bitfield ops rely on target layout facts already encoded in their operands and aux records; consumers must not reinterpret layout for a different target.

Arithmetic, compare, convert

CG_IR_BINOP: integer/float binary op; the BinOp tag is in extra.imm, operands dst, a, b.
CG_IR_UNOP: unary op; UnOp tag in extra.imm, operands dst, a.
CG_IR_CMP: compare producing an i1/bool local; CmpOp tag in extra.imm, operands dst, a, b.
CG_IR_CONVERT: width/representation conversion; ConvKind tag in extra.imm, operands dst, src.

Source operands of binop/unop/cmp may be OPK_IMM as well as OPK_LOCAL; the backend or interpreter decides whether to fold a small immediate into an instruction form or materialize it. The operation tag families (BinOp, UnOp, CmpOp, ConvKind, AtomicOp, MemOrder, IntrinKind) are defined in src/cg/cgtarget.h and are open to vector/SIMD extension — consumers must switch with a default arm rather than assume exhaustiveness.
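The sketch below shows both rules for a hypothetical interpreter. A source operand is read either from an immediate or from a local's slot. The dispatch over the operation tag keeps a default arm, so a newly added tag is rejected rather than silently mis-evaluated. Kind, Opnd, BinTag, and eval_binop are illustrative stand-ins for the real shapes in src/cg/cgtarget.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operand and tag shapes (the real Operand also carries a
 * KitCgTypeId on every variant). */
typedef enum { K_IMM, K_LOCAL } Kind;
typedef struct { Kind kind; int64_t imm; unsigned local; } Opnd;
typedef enum { B_ADD, B_SUB, B_AND, B_SOME_FUTURE_TAG } BinTag;

/* A source operand is an immediate or a read of the local's slot. */
static int64_t read_src(const Opnd *o, const int64_t *slots) {
    return o->kind == K_IMM ? o->imm : slots[o->local];
}

/* Evaluates dst = a <tag> b on 64-bit slots indexed by local id.
 * Returns 0 on success and -1 for a tag this consumer does not know. */
static int eval_binop(BinTag tag, unsigned dst, const Opnd *a, const Opnd *b,
                      int64_t *slots) {
    assert(dst != 0);                  /* local ids are 1-based; 0 is NONE */
    uint64_t x = (uint64_t)read_src(a, slots);
    uint64_t y = (uint64_t)read_src(b, slots);
    uint64_t r;
    switch (tag) {
    case B_ADD: r = x + y; break;      /* unsigned math gives the wrap */
    case B_SUB: r = x - y; break;
    case B_AND: r = x & y; break;
    default:    return -1;             /* open enum: never assume exhaustive */
    }
    slots[dst] = (int64_t)r;           /* wraps on GCC/Clang (impl-defined in ISO C) */
    return 0;
}
```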

Calls and returns

CG_IR_CALL: a direct or indirect call. The full CGCallDesc is preserved in the call aux: function type, callee operand, argument locals, result locals, flags, and inline/tail policy. A direct call has an OPK_GLOBAL callee; any other callee operand is an indirect call through a function-pointer local.
CG_IR_RET: return zero or more result locals (recorded in the return aux).

Tail calls are modeled as a CG_IR_CALL carrying the CG_CALL_TAIL flag, not as a property of CG_IR_RET. CG verifies realizability before setting the flag (through the target's tail_call_unrealizable_reason query, which the recorder forwards to its configured callback); the recorder preserves the tail policy so replay can emit a sibling call, fall back to call-plus-return, or diagnose.

Branching and computed goto

CG_IR_BR: unconditional branch to a label (id in extra.imm).
CG_IR_CMP_BRANCH: fused compare-and-branch; operands a, b, with the CmpOp and taken-target label in the cmp-branch aux. This is CG's preferred conditional-branch form; an arbitrary i1 in a local branches via cmp_branch(CMP_NE, val, 0, label).
CG_IR_SWITCH: structured multi-way branch; the switch aux holds the selector type, case/value pairs, default label, and density hints. Backends that can express it natively (C switch, a future Wasm br_table) override the target hook; otherwise CG's shared lowering reduces it to compare-branch chains or a label-address jump table.
CG_IR_LOAD_LABEL_ADDR: materialize a function-local label's address into a local (label id in extra.imm).
CG_IR_INDIRECT_BRANCH: branch to a label address, constrained to the closed target set in the indirect aux. The closed set drives CFG reconstruction and branch-target hardening (BTI/PAC/IBT).

Label addresses are opaque, function-local tokens. They may be stored, loaded, compared, selected, and consumed by CG_IR_INDIRECT_BRANCH within the same function activation; they are not callable function pointers and not dereferenceable data.

Function-local static data

CG_IR_LOCAL_STATIC_DATA_BEGIN / ..._WRITE / ..._LABEL_ADDR / ..._END: define a function-scoped static-data object that needs function-label scope. The motivating case is C &&label dispatch-table initializers, where a static array is filled with code-label addresses: _WRITE appends bytes (or zeros), and _LABEL_ADDR records a relocation to a function-local label with an addend, width, and address space. A target that cannot resolve code-label addresses in static data (e.g. Wasm) declines _BEGIN, and the recorder reports that it likewise cannot build a label-address jump table so switch_ takes a different lowering.

Structured scopes

These ops preserve CG's C-like structured control model — block and loop scopes — so backends that express structure directly (the C-source target, a future Wasm target) can replay it without rebuilding a CFG. CFG-based consumers ignore the structure and reconstruct control flow from the underlying labels and branches instead. if/if-else has no dedicated op or scope kind; the frontend builds it from nested forward block scopes plus CG_IR_BREAK_TO.

CG_IR_SCOPE_BEGIN: open a scope. The scope id and full CGScopeDesc (its kind — SCOPE_BLOCK or SCOPE_LOOP — and associated descriptor fields) ride in a CgIrScopeAux on extra.aux. Recording also adds a CgIrScope to the function's scope side table.
CG_IR_SCOPE_END: close the most recently opened matching scope; scope id in extra.imm.
CG_IR_BREAK_TO: exit the named enclosing scope (loop/block/switch break); scope id in extra.imm.
CG_IR_CONTINUE_TO: continue the named enclosing loop scope; scope id in extra.imm.

Scope ids are 1-based with CG_SCOPE_NONE as the zero sentinel. The structured form is advisory metadata layered over the same primitive control flow: a consumer that flattens scopes to labels and branches produces the same observable behavior as one that replays the structure natively.
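Flattening can be as simple as giving each scope a break label and reusing the loop-header target for continue. This sketch assumes a hypothetical Flattener. It allocates fresh break labels starting above every label already used in the function and takes the loop-header label from the scope descriptor. It only computes branch targets and does not rewrite a tape.

```c
#include <assert.h>

/* Hypothetical scope-flattening state. Scope ids are 1-based (0 plays the
 * role of CG_SCOPE_NONE); slot 0 of scopes[] is unused. */
typedef enum { SK_BLOCK, SK_LOOP } SKind;
typedef struct { SKind kind; unsigned break_label, continue_label; } FlatScope;

enum { MAX_SCOPES = 16 };
typedef struct {
    FlatScope scopes[MAX_SCOPES + 1];
    unsigned next_label;   /* must start above every label already in use */
} Flattener;

/* On SCOPE_BEGIN: allocate a break label, to be placed as a LABEL where the
 * matching SCOPE_END appears. A loop's continue target is the header label
 * named by its descriptor. */
static void scope_begin(Flattener *f, unsigned id, SKind kind, unsigned header_label) {
    assert(id != 0 && id <= MAX_SCOPES);
    assert(kind == SK_BLOCK || header_label != 0);
    f->scopes[id].kind = kind;
    f->scopes[id].break_label = f->next_label++;
    f->scopes[id].continue_label = kind == SK_LOOP ? header_label : 0;
}

/* BREAK_TO <id> lowers to BR <break label>. */
static unsigned lower_break(const Flattener *f, unsigned id) {
    assert(f->scopes[id].break_label != 0);
    return f->scopes[id].break_label;
}

/* CONTINUE_TO <id> lowers to BR <loop header>; only loops accept it. */
static unsigned lower_continue(const Flattener *f, unsigned id) {
    assert(f->scopes[id].kind == SK_LOOP);
    return f->scopes[id].continue_label;
}
```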

Stack allocation and variadics

CG_IR_ALLOCA: dynamic stack allocation; operands dst, size, required alignment in extra.imm. Models target-ABI dynamic allocation (reached via __builtin_alloca), not language VLAs.
CG_IR_VA_START / CG_IR_VA_ARG / CG_IR_VA_END / CG_IR_VA_COPY: the four C vararg operations over a target-ABI vararg-state object, always addressed by pointer. VA_ARG carries the next argument's type in extra.imm.

Atomics

CG_IR_ATOMIC_LOAD / CG_IR_ATOMIC_STORE: ordered scalar load/store.
CG_IR_ATOMIC_RMW: read-modify-write defining the prior value; AtomicOp in the atomic aux.
CG_IR_ATOMIC_CAS: compare-and-swap defining the prior value and a success bool; carries both success and failure orderings.
CG_IR_FENCE: standalone fence; MemOrder in extra.imm.

Atomic ops carry both a MemAccess and memory-order metadata in their aux. They are observable and must preserve the ordering the memory model requires.

Intrinsics and inline asm

CG_IR_INTRINSIC: a compiler intrinsic identified by IntrinKind, with explicit destination and argument operand arrays in the intrinsic aux. Semantics depend entirely on the kind: bit ops (popcount, ctz, clz, bswap), memory helpers (memcpy/memmove/memset/prefetch/assume-aligned), hints (expect/unreachable/trap), non-local control (setjmp/longjmp), and checked arithmetic (add/sub/mul-with-overflow). Some are pure value producers, some are observable side effects, and some are terminators or return twice — consumers must classify by IntrinKind, not by CG_IR_INTRINSIC alone.
CG_IR_ASM_BLOCK: one GCC-style inline-asm block — template string, input and output constraint/operand pairs, and clobbers — captured verbatim in the asm aux. Constraint strings are target-specific; optimization may inspect operands and clobbers but must treat the block conservatively.

Semantic modes: portable vs target-defined

A handful of integer and conversion operations have edge cases whose cheapest lowering differs across targets: integer division by zero and INT_MIN / -1, shift counts at or beyond the operand width, and out-of-range or NaN float→int conversions. For these the IR offers two semantics, chosen per instruction by the frontend:

Portable (default). The edge is defined identically on every target (details under edge-case semantics). A frontend that wants reproducible results across architectures — or whose source language has no C-style undefined behavior — gets them for free by recording the op with no semantic flags.
Target-defined (opt-in). The edge follows the target's native instruction. A frontend whose source language already declares the edge undefined (C division by zero, oversized shifts, out-of-range (int) casts) can opt in to skip the guards portable mode would require, trading portability for the fastest lowering.

The choice rides in CgIrInst.flags (CgIrInstFlag in src/cg/ir.h):

Flag	Affects	Cleared (portable default)	Set (target-defined)
CG_IR_INST_TARGET_DIV_EDGES	BINOP sdiv/udiv/srem/urem	div-by-zero traps; INT_MIN/-1 wraps	target divide instruction
CG_IR_INST_TARGET_SHIFT_EDGES	BINOP shl/shr_s/shr_u	count reduced modulo width	target shift instruction
CG_IR_INST_TARGET_FPTOINT_EDGES	CONVERT ftoi_s/ftoi_u	saturate; NaN→0	target convert instruction

Both modes are fully defined: target-defined is still deterministic per target, never unconstrained. This flag set is the only place the IR's value semantics depend on a producer choice rather than on the op alone; everything else is fixed by the op. Memory-safety faults are always target-defined and are not governed by these flags — there is no portable bounds-checking mode (see Memory).

Portable is the safe default for a consumer that has not yet been taught a flag: implementing portable semantics where the op asked for target-defined is always legal, because the opt-in is only ever taken when the source language permits any behavior at that edge. Wiring the public CG API and recorder to set these bits, and teaching each consumer (optimizer, interpreter, native and C-source backends) to honor them, is implementation work tracked separately from this spec; the bits are defined here so the IR can carry the choice.

Well-definedness: edge-case semantics

This section pins down every operation's behavior on the inputs that a structural reading of the op set leaves open. It mirrors the operation families above. Unless a rule is marked target-defined, it is portably defined.

Integer arithmetic and bitwise

Widths. For BINOP/CMP the source operands — and, for binop, the destination — share one integer width W ∈ {8,16,32,64,128}. CMP yields the boolean/i1 type. (Width agreement is a well-formedness precondition.)
Wrapping. iadd, isub, imul, and neg compute modulo 2^W (two's complement). Signed and unsigned overflow both wrap; neither is undefined. Overflow detection is not part of these ops; use the *_OVERFLOW intrinsics for a checked result. The public API's NSW/NUW/EXACT assertions and trap/saturate overflow flags are not represented on the base IR op; a frontend that needs them realizes them as explicit checks before recording.
Division and remainder. sdiv/srem are truncated (round-toward-zero) division; the remainder takes the sign of the dividend. udiv/urem are unsigned.
- Portable: a zero divisor traps (a deterministic abort, as INTRIN_TRAP). INT_MIN_W / -1 is defined as INT_MIN_W and INT_MIN_W % -1 as 0; neither traps.
- Target-defined (CG_IR_INST_TARGET_DIV_EDGES): both edges follow the target divide instruction — e.g. x86-64 raises #DE for a zero divisor and for INT_MIN/-1; AArch64 sdiv yields 0 for a zero divisor and INT_MIN for INT_MIN/-1.
Shifts. The shifted value and the result have width W; shr_s replicates the sign bit, shl/shr_u shift in zeros. The count is an integer operand interpreted as an unsigned amount.
- Portable: the count is reduced modulo W (only its low log2(W) bits matter), so every count is defined and a high-bit-set ("negative") count simply reduces mod W.
- Target-defined (CG_IR_INST_TARGET_SHIFT_EDGES): an out-of-range count follows the target shift instruction's own masking or zeroing.
and/or/xor are total bitwise ops with no edge cases.
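The portable-mode rules above can be sketched for W = 32 in plain C. This is an illustration, not kit's implementation. The helper names are hypothetical, and portable_sdiv32 returns false where the IR would trap, so the behavior can be observed without aborting. Unsigned arithmetic carries the wrapping because signed overflow is undefined in C itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* sdiv/srem in portable mode for W = 32. Returns false where the IR would
 * trap (zero divisor); a real consumer would abort like INTRIN_TRAP. */
static bool portable_sdiv32(int32_t a, int32_t b, int32_t *q, int32_t *r) {
    assert(q != NULL && r != NULL);
    if (b == 0)
        return false;
    if (a == INT32_MIN && b == -1) {   /* defined: quotient wraps, rem is 0 */
        *q = INT32_MIN;
        *r = 0;
        return true;
    }
    *q = a / b;   /* C truncates toward zero, matching sdiv */
    *r = a % b;   /* remainder takes the dividend's sign, matching srem */
    return true;
}

/* iadd wraps modulo 2^32: add as unsigned, then convert back (the
 * conversion wraps on GCC/Clang; ISO C leaves it implementation-defined). */
static int32_t wrap_add32(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Shift counts are reduced modulo 32, so only the low 5 bits matter. */
static uint32_t portable_shl32(uint32_t v, uint32_t count) {
    return v << (count & 31u);
}

/* shr_s replicates the sign bit. Written without >> on a negative int,
 * whose result ISO C leaves implementation-defined: complement, shift
 * logically, complement back. */
static int32_t portable_shr_s32(int32_t v, uint32_t count) {
    count &= 31u;
    if (v >= 0)
        return (int32_t)((uint32_t)v >> count);
    return ~(int32_t)(~(uint32_t)v >> count);
}
```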

Floating point

The IR's floating-point operations are strict IEEE-754 in the target's default environment: round-to-nearest-ties-to-even, non-trapping exceptions (status-flag only), no denormal flushing. These are portable; the IR does not represent alternate rounding modes or fast-math relaxations (the public API's rounding argument and FP fast-math flags are dropped at the IR level unless the frontend realizes them as explicit operations).

fadd/fsub/fmul/fdiv produce the correctly-rounded IEEE result. A NaN operand yields a quiet NaN. For nonzero, non-NaN x, x/±0 → ±∞, with the sign given by the XOR of the operand signs; 0/0 → NaN and ∞/∞ → NaN.
There is no FP remainder primitive; the frontend lowers a floating % to a runtime call (fmod).
fneg flips the sign bit — it is not 0 - x: it negates zeros and infinities and toggles a NaN's sign without otherwise altering its payload.
Compares. The relational FP compares lt_f, le_f, gt_f, ge_f are ordered: if either operand is NaN the result is false. On floating operands eq is ordered-equal (NaN → false) and ne is unordered-not-equal (NaN → true), matching C ==/!=. A frontend needing an unordered relational composes it as the negation of the opposite ordered compare (a ULT b ≡ !(a OGE b)), since negating an ordered compare turns the NaN result to true.
- Spec note / known gap: the current public→IR lowering (api_map_fp_cmp in src/cg/value.c) maps both the ordered and the unordered relational forms to the same internal op, so the ordered/unordered distinction for <,<=,>,>= is presently lost at the IR boundary — correct only under a no-NaN assumption. Resolving it (unordered relational variants, or the explicit NaN composition above emitted by the frontend) is an implementation follow-up; the rule above is the intended contract.
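A sketch of the compare and negation rules follows. It assumes double is IEEE-754 binary64, which holds on mainstream targets. C's relational operators on floating operands are already ordered, so ULT falls out of the composition described above. fneg works on the bit pattern, so it negates zero, unlike subtraction from +0.0. All function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Ordered relational compares: any NaN operand makes them false.
 * C's < and >= on doubles already behave this way. */
static bool olt(double a, double b) { return a < b; }
static bool oge(double a, double b) { return a >= b; }

/* Unordered less-than composed as a ULT b == !(a OGE b): a NaN operand
 * makes OGE false, so ULT becomes true. */
static bool ult(double a, double b) { return !oge(a, b); }

/* fneg flips only the sign bit; 0.0 - x would map +0.0 to +0.0. */
static double fneg(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= UINT64_C(1) << 63;
    memcpy(&x, &bits, sizeof bits);
    return x;
}
```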

Conversions

sext/zext require dst width > src width and sign-/zero-extend; trunc requires dst width < src width and keeps the low dst bits. (Width ordering is a precondition.)
itof_s/itof_u convert integer→float with round-to-nearest-even; magnitudes beyond the float's range round to ±∞ per IEEE.
ftoi_s/ftoi_u convert float→int rounding toward zero (truncation); in-range values drop their fraction.
- Portable: out-of-range and non-finite inputs saturate — above the destination max → max, below the min → min (0 for the unsigned floor) — and NaN → 0.
- Target-defined (CG_IR_INST_TARGET_FPTOINT_EDGES): the result follows the target convert instruction (e.g. x86-64 cvttsd2si yields the "integer indefinite" INT_MIN on overflow/NaN; AArch64 fcvtzs saturates).
fext widens exactly (no rounding); ftrunc narrows with round-to-nearest-even, overflow → ±∞.
bitcast requires equal byte size and reinterprets the operand's target ABI bit pattern without changing bits. Pointer↔integer of equal width is a bitcast.
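Here is a sketch of portable ftoi_s and ftoi_u from double to 32-bit integers, following the saturating rules above. The names are hypothetical. The range checks run before the cast because converting an out-of-range floating value to an integer is undefined in C, which is exactly the edge the IR defines.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Portable ftoi_s, double -> i32: truncate toward zero, saturate
 * out-of-range values and infinities, NaN -> 0. */
static int32_t portable_ftoi_s32(double x) {
    if (isnan(x))
        return 0;
    if (x <= (double)INT32_MIN)              /* includes -inf */
        return INT32_MIN;
    if (x >= (double)INT32_MAX + 1.0)        /* 2^31 and above, +inf */
        return INT32_MAX;
    return (int32_t)x;                       /* in range: C truncates */
}

/* Portable ftoi_u, double -> u32: the floor is 0 instead of a negative min. */
static uint32_t portable_ftoi_u32(double x) {
    if (isnan(x) || x <= 0.0)
        return 0;
    if (x >= 4294967296.0)                   /* 2^32 and above, +inf */
        return UINT32_MAX;
    return (uint32_t)x;
}
```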

Memory: load, store, aggregate, bitfield

Address validity. A load/store/aggregate/bitfield/atomic op requires its effective address to reference a live object of at least size bytes in the access's address space. This is not portably checked: an invalid or out-of-bounds access (including a null dereference) produces a target-defined fault — the deterministic behavior of the target's load/store against that address (a trap on an MMU target; a read or write of whatever occupies the address on a flat-memory target). It is target-defined, never unconstrained, and never governed by a semantic-mode flag.
Alignment. MemAccess.align is a promise: the producer asserts the address is at least that aligned (natural alignment for the type when align == 0), and a target may use the promise to choose wider instructions. Recording an access whose address is in fact less aligned than stated, without MF_UNALIGNED, is a precondition violation; on a strict-alignment target it faults (target-defined). MF_UNALIGNED declares the access may be unaligned and obliges the consumer to emit an unaligned-capable sequence; it is then fully defined.
Uninitialized reads. Reading a local or memory location not yet assigned on the current dynamic path yields an unspecified value of the access type — an arbitrary but type-valid bit pattern. It never traps and never corrupts other state; it is not poison and not undefined behavior. Producers should define every location before reading it for determinism, but doing otherwise stays within defined IR.
Volatile. MF_VOLATILE accesses are observable side effects: they must not be added, removed, duplicated, or reordered with respect to other volatile or atomic accesses.
Aggregates. agg_copy copies size bytes and requires source and destination ranges not to overlap (memcpy semantics); overlap is a precondition violation — use the MEMMOVE intrinsic for overlap. agg_set fills size bytes with the byte value. size == 0 is a defined no-op.
Bitfields. bitfield_load/bitfield_store access bits [bit_offset, bit_offset+bit_width) within the storage unit at storage_offset; the range must lie within the unit (precondition). A load sign- or zero-extends per signed_. A store uses the low bit_width bits of the source and leaves bits outside the field unchanged. A zero-width field (bit_width == 0) is a layout barrier only and performs no memory access.
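A sketch of bitfield extract and insert follows. It assumes a 32-bit storage unit that has already been loaded from storage_offset. Real unit widths follow the target layout. field_mask, bf_load, and bf_store are hypothetical helpers. Sign extension subtracts 2^width when the field's top bit is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask covering bits [off, off+width) of a 32-bit unit. Width 0 (a pure
 * layout barrier) is excluded by the precondition. */
static uint32_t field_mask(unsigned off, unsigned width) {
    assert(width >= 1 && off + width <= 32);  /* field lies inside the unit */
    uint32_t low = width == 32 ? UINT32_MAX : (UINT32_C(1) << width) - 1;
    return low << off;
}

/* bitfield_load: extract, then sign- or zero-extend per the signedness. */
static int64_t bf_load(uint32_t unit, unsigned off, unsigned width, bool is_signed) {
    uint32_t v = (unit & field_mask(off, width)) >> off;
    if (is_signed && ((v >> (width - 1)) & 1u))
        return (int64_t)v - ((int64_t)1 << width);  /* sign-extend */
    return (int64_t)v;                              /* zero-extend */
}

/* bitfield_store: use the low `width` bits of src; keep all other bits. */
static uint32_t bf_store(uint32_t unit, unsigned off, unsigned width, uint64_t src) {
    uint32_t m = field_mask(off, width);
    return (unit & ~m) | (((uint32_t)src << off) & m);
}
```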

Control flow

Labels. Every label named by a branch, switch, computed-goto target set, or label-address op belongs to the same function and is placed exactly once (one CG_IR_LABEL); placement may follow use in tape order. (Preconditions.)
Terminators and reachability. Every dynamic path ends in a terminator (ret, a CG_CALL_TAIL call, INTRIN_UNREACHABLE/TRAP/LONGJMP, indirect_branch, or a br that ultimately reaches one). Falling off the end of the instruction stream without a terminator is malformed. Instructions after a terminator are reachable only through a label.
Switch. The selector is compared against each case value using selector_type's width and signedness; a match transfers to that case's label, otherwise to default_label (LABEL_NONE means fall through past the switch). Case values are distinct (a precondition); the IR defines no tie-break.
Computed goto. indirect_branch transfers to the label address in its operand, which must be one of the ntargets labels in its closed set (ntargets > 0, a precondition). The set is exhaustive: a runtime address outside it is target-defined (branch-protection hardening may fault). Label addresses (load_label_addr, local_static_data_label_addr) are opaque tokens valid only within the defining function's activation; they may be stored, loaded, compared for equality, and consumed by indirect_branch, but never called or dereferenced as data.
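To make the switch selector rule concrete, this sketch reduces the selector's raw bits to the selector type's width and signedness before comparing against each case. Case, normalize, and switch_target are hypothetical, and 0 stands in for LABEL_NONE. The linear scan is only one possible lowering; a jump table is another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int64_t value; unsigned label; } Case;

/* Keep the low `width` bits and sign- or zero-extend them to 64 bits. */
static int64_t normalize(uint64_t bits, unsigned width, int is_signed) {
    assert(width >= 1 && width <= 64);
    if (width == 64)
        return (int64_t)bits;
    uint64_t mask = (UINT64_C(1) << width) - 1;
    uint64_t low = bits & mask;
    if (is_signed && ((low >> (width - 1)) & 1u))
        return (int64_t)(low | ~mask);   /* sign-extend (wraps on GCC/Clang) */
    return (int64_t)low;
}

/* Returns the matching case's label, else default_label (0 = fall through
 * past the switch). Case values are assumed distinct at this width. */
static unsigned switch_target(uint64_t sel_bits, unsigned width, int is_signed,
                              const Case *cases, size_t n, unsigned default_label) {
    int64_t sel = normalize(sel_bits, width, is_signed);
    for (size_t i = 0; i < n; i++)
        if (normalize((uint64_t)cases[i].value, width, is_signed) == sel)
            return cases[i].label;
    return default_label;
}
```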

Calls and returns

A call's argument and result locals match fn_type in count and type; for a variadic callee the fixed parameters match and variadic arguments are already promoted by the frontend (preconditions). Calling through an invalid function pointer is a target-defined fault. A direct call uses an OPK_GLOBAL callee; any other callee operand is indirect.
ret returns exactly the function's declared result locals, in order and type (precondition). A tail call carries CG_CALL_TAIL, obeys the realizability contract above, is a terminator, and is never followed by a ret.

Stack allocation and variadics

alloca allocates size bytes (an unsigned byte count) aligned to align (a power of two; precondition), valid for the rest of the function activation. Exhausting the stack is a target-defined trap.
va_start/va_arg/va_end/va_copy operate on a target-ABI vararg-state object addressed by pointer. va_arg's type must match the promoted type of the corresponding actual argument, and the number of va_arg reads must not exceed the variadic arguments actually passed (preconditions); violating either is target-defined (it reads adjacent argument storage). va_start precedes va_arg/va_end on the same state; va_copy duplicates state.

Atomics

Order legality (preconditions, per the C11 memory model; mirrored by kit_cg_atomic_is_legal):
- atomic_load: relaxed, consume, acquire, or seq_cst.
- atomic_store: relaxed, release, or seq_cst.
- atomic_rmw: any order.
- atomic_cas: any success order; failure ∈ {relaxed, consume, acquire, seq_cst} and no stronger than success.
- fence: any order (a relaxed fence has no effect).
The access must be a supported atomic width and naturally aligned for a lock-free operation; otherwise the consumer may lower to a runtime atomic call (target-defined mechanism, same observable semantics). Atomic ops are observable and must preserve the ordering the memory model requires. rmw defines the prior value; cas defines the prior value and a success bool and compares using the full access width.
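The order table above can be mirrored as a small predicate. This sketch is hypothetical and is not kit_cg_atomic_is_legal itself. The enumerators follow C11 memory_order. It assumes that "no stronger than success" compares the failure order with the load half of the success order, so release counts as relaxed and acq_rel as acquire. That is a common reading of the C11 rule.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MO_RELAXED, MO_CONSUME, MO_ACQUIRE, MO_RELEASE, MO_ACQ_REL, MO_SEQ_CST } Order;
typedef enum { A_LOAD, A_STORE, A_RMW, A_FENCE } AtomicKind;

/* Per-op legality for everything except compare-and-swap. */
static bool order_legal(AtomicKind k, Order o) {
    switch (k) {
    case A_LOAD:  return o != MO_RELEASE && o != MO_ACQ_REL;
    case A_STORE: return o == MO_RELAXED || o == MO_RELEASE || o == MO_SEQ_CST;
    case A_RMW:
    case A_FENCE: return true;
    default:      return false;
    }
}

/* Strength of the load half of an order (assumed interpretation). */
static int load_strength(Order o) {
    switch (o) {
    case MO_CONSUME: return 1;
    case MO_ACQUIRE:
    case MO_ACQ_REL: return 2;
    case MO_SEQ_CST: return 3;
    default:         return 0;   /* relaxed, release */
    }
}

/* CAS: any success order; failure is a load order no stronger than success. */
static bool cas_orders_legal(Order success, Order failure) {
    if (failure == MO_RELEASE || failure == MO_ACQ_REL)
        return false;
    return load_strength(failure) <= load_strength(success);
}
```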

Intrinsics and inline asm

Operand shapes are fixed per IntrinKind (src/cg/cgtarget.h). Semantic edges:

CLZ(0) and CTZ(0) are defined to equal the operand's bit width (stronger than C, where they are undefined). POPCOUNT, BSWAP16/32/64 are total.
SADD/UADD/SSUB/USUB/SMUL/UMUL_OVERFLOW define a two's-complement wrapped result and a boolean overflow flag.
MEMCPY requires non-overlapping ranges; MEMMOVE permits overlap; MEMSET fills. All are defined no-ops at size == 0.
SETJMP returns 0 on the direct call and the value passed to the matching LONGJMP when it returns again (a LONGJMP value of 0 surfaces as 1); it "returns twice." LONGJMP does not return. Consumers must preserve both control effects.
ASSUME_ALIGNED returns its pointer and asserts the stated alignment (a precondition; a wrong assertion is target-defined). EXPECT returns its value unchanged (a branch-probability hint). PREFETCH has no value effect.
TRAP is a deterministic abort. UNREACHABLE asserts that the point is never reached and is itself a terminator. If control does reach it, the behavior is a target-defined trap, and consumers may assume the point is unreachable (e.g. to prune successors). Neither corrupts unrelated state.
asm_block and file-scope asm are opaque target assembly. The IR fixes the interface: operand directions, clobbers, and volatility. The assembly's own behavior is target/external and is modeled conservatively. It is treated as reading and writing its declared operands and clobbers, and as an observable side effect unless flagged otherwise. This is external behavior, not undefined behavior.
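Several of the value and control edges above can be pinned down with small portable reference implementations of the kind a consumer such as an interpreter might fall back on. This is a sketch under assumptions, not kit's code:

- the function names are hypothetical;
- only 32-bit widths are shown;
- the SETJMP case uses C's own setjmp/longjmp, which share the "returns twice" and 0-surfaces-as-1 behavior.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CLZ/CTZ: a zero operand yields the bit width (32) instead of being undefined. */
unsigned clz32(uint32_t x) {
    if (x == 0) return 32;
    unsigned n = 0;
    while (!(x & 0x80000000u)) { x <<= 1; n++; }
    return n;
}

unsigned ctz32(uint32_t x) {
    if (x == 0) return 32;
    unsigned n = 0;
    while (!(x & 1u)) { x >>= 1; n++; }
    return n;
}

/* SADD/UADD_OVERFLOW: wrapped two's-complement result plus an overflow flag.
 * The sum is formed in uint32_t, where wrapping is well-defined in C. */
int32_t sadd32_overflow(int32_t a, int32_t b, bool *overflow) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    /* Signed overflow iff both operands differ in sign from the result. */
    *overflow = ((((uint32_t)a ^ sum) & ((uint32_t)b ^ sum)) >> 31) != 0;
    int32_t result;
    memcpy(&result, &sum, sizeof result);  /* reinterpret the wrapped bits */
    return result;
}

uint32_t uadd32_overflow(uint32_t a, uint32_t b, bool *overflow) {
    uint32_t sum = a + b;   /* unsigned arithmetic wraps mod 2^32 */
    *overflow = sum < a;    /* carry out of bit 31 */
    return sum;
}

/* MEMMOVE: overlap allowed, no-op at n == 0. Addresses are compared as
 * uintptr_t because relational comparison of unrelated pointers is
 * undefined in C. */
void move_bytes(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    if (n == 0 || d == s) return;
    if ((uintptr_t)d < (uintptr_t)s) {
        for (size_t i = 0; i < n; i++) d[i] = s[i];          /* front to back */
    } else {
        for (size_t i = n; i > 0; i--) d[i - 1] = s[i - 1];  /* back to front */
    }
}

/* SETJMP/LONGJMP: setjmp returns 0 directly, then again with the longjmp
 * value, where 0 surfaces as 1. C allows setjmp only in a few expression
 * contexts, so its result is inspected with a switch. */
static jmp_buf env;

static _Noreturn void jump_back(int value) { longjmp(env, value); }

/* Reports setjmp's second return value for value in 0..2 (anything else
 * reports 3). */
int second_setjmp_return(int value) {
    switch (setjmp(env)) {
    case 0:  jump_back(value);  /* never returns; setjmp returns again */
    case 1:  return 1;
    case 2:  return 2;
    default: return 3;
    }
}
```

Note that MEMCPY could share move_bytes' forward loop only because its precondition rules out overlap. MEMMOVE is the op that must choose a direction.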

Well-formedness (invariants)

A well-formed tape satisfies all of the following; consumers may assume them, and a violation is a producer bug (malformed IR), not program behavior. These are the structural half of "no undefined behavior" — the runtime half is the edge-case section above.

Sentinels are zero-valued (CG_LOCAL_NONE, LABEL_NONE, CG_SCOPE_NONE, OBJ_SYM_NONE); local, label, and scope ids are 1-based.
Every local has exactly one declared type for the whole function, and every source and destination local is declared before use.
Destinations are OPK_LOCAL. Operand kinds match each op's contract (src/cg/cgtarget.h): FP arithmetic and fneg require OPK_LOCAL sources; binop/unop/cmp also accept OPK_IMM; addresses are OPK_LOCAL/OPK_GLOBAL/OPK_INDIRECT; an OPK_INDIRECT index is an integer local with log2 scale 0..3.
Integer binop/cmp operands (and the binop destination) share one width; conversions obey their width-ordering rules.
A control-transfer op names labels in the same function; only a call targets a symbol or function-pointer value. Each label is placed exactly once; every path ends in a terminator. Switch case values are distinct; a computed goto's target set is non-empty and closed.
Calls and returns match the function/callee type in arity and operand type; atomic orders are legal for their op (above).
Data-layout facts (sizes, alignments, field offsets, bit ranges, ABI shape) are already target-selected; consumers must not reinterpret them for a different target.
Source locations are sticky at recording time: each instruction is stamped with the most recently set location.
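A consumer that wants to catch producer bugs early can check a subset of these invariants mechanically. The sketch below works over a hypothetical toy tape; ToyInst and its opcodes are illustrative stand-ins, not kit's CgIrInst. It checks:

- 1-based ids with a zero sentinel;
- declare-before-use for locals;
- single label placement;
- jump targets placed in the same function.

It does not check that every path ends in a terminator.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy tape, not kit's CgIrInst. Ids are 1-based; 0 is NONE. */
typedef enum { T_DECL_LOCAL, T_MOVE, T_PLACE_LABEL, T_JUMP, T_RET } ToyOp;
typedef struct { ToyOp op; unsigned a, b; } ToyInst;

#define TOY_MAX_ID 64

static bool valid_id(unsigned id) { return id != 0 && id < TOY_MAX_ID; }

/* DECL_LOCAL a: declare local a.  MOVE a, b: local a = local b.
 * PLACE_LABEL a: place label a.   JUMP a: go to label a.  RET: return. */
bool toy_tape_is_well_formed(const ToyInst *tape, size_t n) {
    bool declared[TOY_MAX_ID] = {false};
    unsigned placed[TOY_MAX_ID] = {0};

    for (size_t i = 0; i < n; i++) {
        const ToyInst *in = &tape[i];
        switch (in->op) {
        case T_DECL_LOCAL:   /* one declaration per local */
            if (!valid_id(in->a) || declared[in->a]) return false;
            declared[in->a] = true;
            break;
        case T_MOVE:         /* destination and source declared before use */
            if (!valid_id(in->a) || !declared[in->a]) return false;
            if (!valid_id(in->b) || !declared[in->b]) return false;
            break;
        case T_PLACE_LABEL:  /* each label placed at most once */
            if (!valid_id(in->a) || ++placed[in->a] > 1) return false;
            break;
        case T_JUMP:         /* target checked below: it may be placed later */
        case T_RET:
            break;
        }
    }
    /* Every jump must name a label placed exactly once in this tape. */
    for (size_t i = 0; i < n; i++) {
        if (tape[i].op == T_JUMP &&
            (!valid_id(tape[i].a) || placed[tape[i].a] != 1))
            return false;
    }
    return true;
}
```

The two-pass shape matters: a forward jump is legal even though its label is placed later in the tape.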

Consumer guidance

Anything that reads the IR is reading a layout-resolved, ABI-shaped, but machine-neutral program. The contract a consumer must respect: preserve target-data-layout semantics, memory observability (the MemFlag set and alias roots on each access), the ABI shape of calls and returns, and CFG validity. It must also implement at least the portable edge-case semantics of every op, and honor the CgIrInst.flags semantic-mode bits where it understands them — falling back to portable semantics (a safe refinement) for any bit it does not. A consumer may assume a well-formed tape; it must not introduce undefined behavior of its own where the IR defines a result.

Two consumers exist today, and they take different paths:

The optimizer (see OPT.md) does not run passes on the CG IR in place. It converts each CgIrFunc into its own Func IR (opt_func_from_cg_ir in src/opt/cg_ir_lower.c), which materializes basic blocks, SSA, virtual registers, and frame objects, then runs CFG cleanup, simplification, machinization, liveness, register allocation, and emission, and finally replays into the wrapped direct backend. This conversion is why SSA/phi/const ops live in the optimizer's enum and never in the CG IR.
The interpreter (see INTERPRETER.md) also goes through the optimizer's Func form, but via a reduced pipeline (opt_run_o1_interp in src/opt/opt.c) that stops before machinization: it builds the CFG, runs target-independent cleanups, promotes scalar locals, and hands a Func with virtual registers to the interpreter loader (src/interp/lower.c), which emits fixed-width bytecode. The interpreter never consumes the native MIR/regalloc view, keeping execution aligned with CG semantics rather than backend emission. Address-taken locals, aggregates, globals, TLS instances, and allocas become byte-addressable interpreter memory; label addresses stay opaque tokens; interpreted-to-interpreted calls dispatch through retained function bodies and external calls go through an FFI layer.

Source-like backends (the C-source target, a future Wasm target) can instead replay the tape op-by-op into a direct CgTarget, taking advantage of the retained structured scopes and switch descriptors.
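Replay works because the recorder and every direct target implement the same interface. The toy sketch below shows that arrangement. A hypothetical two-call Target stands in for CgTarget, and a hypothetical TextTarget stands in for a source-like backend; none of these names are kit's. Emitting directly and recording-then-replaying produce identical output.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical two-call interface standing in for CgTarget. */
typedef struct Target Target;
struct Target {
    void (*load_imm)(Target *t, unsigned dst, long value);
    void (*add)(Target *t, unsigned dst, unsigned a, unsigned b);
};

/* Direct target: emits C-like text immediately. `base` is the first member,
 * so a Target * that points at it can be converted back to TextTarget *. */
typedef struct { Target base; char out[256]; } TextTarget;

static void text_load_imm(Target *t, unsigned dst, long value) {
    TextTarget *tt = (TextTarget *)t;
    size_t len = strlen(tt->out);
    snprintf(tt->out + len, sizeof tt->out - len, "v%u = %ld;\n", dst, value);
}
static void text_add(Target *t, unsigned dst, unsigned a, unsigned b) {
    TextTarget *tt = (TextTarget *)t;
    size_t len = strlen(tt->out);
    snprintf(tt->out + len, sizeof tt->out - len, "v%u = v%u + v%u;\n", dst, a, b);
}
void text_target_init(TextTarget *tt) {
    tt->base.load_imm = text_load_imm;
    tt->base.add = text_add;
    tt->out[0] = '\0';
}

/* Recorder: same interface, but each call becomes a tape entry. */
#define TAPE_CAP 32
typedef enum { REC_LOAD_IMM, REC_ADD } RecOp;
typedef struct { RecOp op; unsigned dst, a, b; long imm; } RecInst;
typedef struct { Target base; RecInst tape[TAPE_CAP]; size_t n; } Recorder;

static void rec_load_imm(Target *t, unsigned dst, long value) {
    Recorder *r = (Recorder *)t;
    if (r->n < TAPE_CAP) r->tape[r->n++] = (RecInst){REC_LOAD_IMM, dst, 0, 0, value};
}
static void rec_add(Target *t, unsigned dst, unsigned a, unsigned b) {
    Recorder *r = (Recorder *)t;
    if (r->n < TAPE_CAP) r->tape[r->n++] = (RecInst){REC_ADD, dst, a, b, 0};
}
void recorder_init(Recorder *r) {
    r->base.load_imm = rec_load_imm;
    r->base.add = rec_add;
    r->n = 0;
}

/* Replay: reissue every recorded call, in order, on a direct target. */
void replay(const Recorder *r, Target *t) {
    for (size_t i = 0; i < r->n; i++) {
        const RecInst *in = &r->tape[i];
        switch (in->op) {
        case REC_LOAD_IMM: t->load_imm(t, in->dst, in->imm); break;
        case REC_ADD:      t->add(t, in->dst, in->a, in->b); break;
        }
    }
}

/* Frontend stand-in: makes the same calls whichever target it is given. */
void emit_program(Target *t) {
    t->load_imm(t, 1, 40);
    t->load_imm(t, 2, 2);
    t->add(t, 3, 1, 2);
}
```

The frontend stand-in emit_program neither knows nor cares whether it is talking to a recorder or a direct target. This is the property that lets a source-like backend take the recorded tape instead of live calls.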
