IR
This document defines the semantics of kit's semantic CG IR: the
function-level, recorded form of the internal CgTarget interface, captured as
a CgIrModule of CgIrFunc bodies (see src/cg/ir.h). It is the stable hinge
between the frontend's typed CG-API calls and everything downstream that wants a
durable program form rather than immediate emission: the optimizer, the
threaded interpreter, and source-like replay backends. This is the
authoritative semantics-of-the-IR reference; for how the IR is produced and
replayed see CODEGEN.md, for the optimizer's own derived form see
OPT.md, and for interpreted execution see
INTERPRETER.md.
What the IR is, and what it is not
The IR is a faithful tape of CgTarget calls. The CgTarget interface
(src/cg/cgtarget.h) is the semantic codegen API: typed locals, labels,
structured scopes, memory ops, ABI-shaped calls, atomics, intrinsics, inline
asm. A backend can implement that interface to emit code immediately (the O0
native path, the C-source path). The IR recorder (src/cg/ir_recorder.c) is a
second implementation of the same interface that, instead of emitting, records
each call into a CgIrInst on the current CgIrFunc. Replaying the recorded
tape through a direct target reproduces exactly what immediate emission would
have done.
Because of that, the IR carries no optimizer state and no machine state. There
are no basic blocks, no SSA values, no phis, no dominance, no liveness, no
virtual or hard registers, no spill slots, and no call plans in the CG IR. Those
are all derived, consumer-private views. In particular the optimizer's Func
IR (src/opt/ir.h) is a separate representation with its own op set
(IR_PHI, IR_PARAM_DECL, IR_CONST_I, ...); it is built from the CG IR, not
a superset of it. Do not conflate the two: the CG IR enum is CgIrOp in
src/cg/ir.h; the optimizer enum is IROp in src/opt/ir.h.
The IR is target-data-layout-specific but not target-instruction-specific. Type sizes, alignments, record field offsets, bitfield bit ranges, ABI classifications, and pointer widths are already resolved for the compile target by the time the recorder sees a call. The IR does not know about machine instructions, addressing-mode legality, or register files.
No undefined behavior
The CG IR has no undefined behavior. Every operation, on every input, has a fully determined meaning that falls into exactly one of three categories:
- Portably defined: the result is the same on every target. This is the default for the arithmetic edges that C leaves undefined: integer overflow wraps, shift counts wrap modulo the width, float→int conversions saturate, and clz(0)/ctz(0) is the bit width.
- Target-defined: the result is deterministic given the compile target but may differ across targets. Two things are target-defined: (1) inherently machine-tied effects (a fault on an invalid memory access, the bit pattern an inline-asm block produces), and (2) the arithmetic edges above when the frontend opts into native-instruction semantics for performance (see Semantic modes).
- Well-formedness preconditions: structural requirements on the recorded tape (operand kinds and widths agree, each label is placed once, every path ends in a terminator, …). A tape that violates one is malformed IR, a compiler bug in the producer, not a program exhibiting undefined behavior. Consumers may assume well-formed input.
There is deliberately no fourth "anything may happen" category. Where C would say undefined behavior, the CG IR says portably defined, target-defined, or malformed — never unconstrained. The runtime half of this guarantee (what each op computes on every input) is spelled out in Well-definedness: edge-case semantics; the structural half is the Well-formedness list.
Pipeline position
    frontend
      -> KitCg (public CG API: stack/lvalue model)
        -> CgTarget (semantic codegen interface)
           |-> direct native target (O0 emit)
           |-> direct C-source target (--emit=c)
           \-> IR recorder -> CgIrModule (O1/O2, interpreter)
                              |-> opt: derive Func (CFG/SSA/MIR) -> native emit
                              \-> opt: derive Func (reduced) -> interpreter
KitCg lowers the frontend's stack/lvalue source operations into flat
CgTarget calls. At O0 those calls hit a direct target and become code right
away. At O1/O2 and under the interpreter they hit the recorder and become a
CgIrModule. The recorder is created by the optimizer (src/opt/opt.c calls
cg_ir_recorder_new); it notifies the optimizer per completed function and at
finalize through callbacks so cross-function work (inlining, reachability,
alias resolution) can run before the buffered IR is lowered into the wrapped
direct target.
Module and function structure
A CgIrModule owns the translation unit's recorded functions, symbol aliases,
and file-scope __asm__ blocks. File-scope asm is retained on the module rather
than emitted during recording because the optimizer path has no live emit
target at recording time; it is replayed at finalize.
A CgIrFunc is one function body and owns everything needed to replay,
optimize, or interpret it: the preserved CGFuncDesc (symbol, function type,
result/param descriptors, source location, attributes, inline policy); a linear
instruction stream (CgIrInst tape); and side tables for locals, params,
labels, and scopes. It also caches two ObjSymSets — the set of symbols it
calls and the set of globals it references — populated as operands are
recorded, so reachability and alias passes need not rescan the tape.
There is one local namespace per function. A CgIrLocal is a mutable typed
location identified by a CGLocal id (1-based; CG_LOCAL_NONE is the sentinel).
A local records its CGLocalDesc (type, size, align, source name/loc), whether
it is a parameter (with parameter index), and whether its address has been
taken. Parameters are declarations, not executable ops: the recorder adds the
parameter local and a CgIrParam entry; there is no parameter instruction in
the tape. Taking the address of a local (CG_IR_ADDR_OF, or the dedicated
local_addr recording) sets the local's address_taken flag, which downstream
consumers use to decide it needs a concrete memory home; non-address-taken
scalar locals may live in registers, SSA values, or interpreter slots as the
consumer sees fit.
Labels, scopes, and derived blocks
Labels are first-class because CG control-flow ops name them: branch targets,
switch case/default targets, label-address materialization, and the closed
target set of a computed goto. A CgIrLabel records its id and the source
location of its first placement. Placement appears in the tape as a
CG_IR_LABEL instruction.
Structured scopes (CgIrScope) capture CG's structured control model. There are
two scope kinds (ScopeKind in src/cg/cgtarget.h): SCOPE_BLOCK, a forward-only
region whose break skips to the end, and SCOPE_LOOP, whose break exits forward
and whose continue jumps to an explicit loop-header target. if/if-else is not
a distinct scope kind: the frontend lowers it to a pair of nested forward blocks
(kit_cg_if_begin/_else/_end), so there is no else op in the IR. Backends
able to express structure (the C-source target, a future Wasm target; see
WASM.md) replay scopes directly; native CFG consumers flatten them to
ordinary labels and branches.
Basic blocks are not part of the IR. A consumer that needs CFG form derives it
by splitting the linear tape at labels, scope boundaries, and terminators. That
derived CFG, with its predecessor/successor edges, layout order, and dominance,
belongs to the consumer (the optimizer builds exactly this in
opt_func_from_cg_ir).
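The splitting rule described above can be sketched on a toy tape. This is only an illustration under stated assumptions: MiniOp and the mini_* helpers are hypothetical names invented here, the op set is a tiny stand-in for CgIrOp, and the real derivation in opt_func_from_cg_ir is not shown and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature op set; the real enum is CgIrOp in src/cg/ir.h. */
typedef enum {
    MINI_NOP, MINI_LABEL, MINI_COPY, MINI_BR, MINI_CMP_BRANCH, MINI_RET,
    MINI_SCOPE_BEGIN, MINI_SCOPE_END
} MiniOp;

/* Ops that end a derived block: control leaves or may branch away. */
static bool mini_ends_block(MiniOp op) {
    return op == MINI_BR || op == MINI_CMP_BRANCH || op == MINI_RET;
}

/* Ops that start a derived block: labels and scope boundaries. */
static bool mini_starts_block(MiniOp op) {
    return op == MINI_LABEL || op == MINI_SCOPE_BEGIN || op == MINI_SCOPE_END;
}

/* Sets leader[i] to true where a derived block begins and returns the
 * number of blocks. The tape itself is left untouched. */
static size_t mini_mark_leaders(const MiniOp *tape, size_t n, bool *leader) {
    size_t blocks = 0;
    assert(tape != NULL && leader != NULL);
    for (size_t i = 0; i < n; i++) {
        bool first = (i == 0);
        bool after_end = (i > 0) && mini_ends_block(tape[i - 1]);
        leader[i] = first || after_end || mini_starts_block(tape[i]);
        if (leader[i]) {
            blocks++;
        }
    }
    return blocks;
}
```

The block boundaries live in a side array owned by the consumer, which matches the point that the CFG is a derived view and not part of the recorded tape.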
Instructions and operands
A CgIrInst has an op (CgIrOp), a sticky source location captured from the
last set_loc, an operand array, and an extra union holding op-specific
auxiliary data: a raw immediate, constant bytes, a MemAccess, or an arena
pointer to an op-specific aux struct. There is no separate result-type field;
each operand carries its own KitCgTypeId and destinations name typed locals,
so the instruction's types are recoverable from its operands and aux.
Most ops map one-to-one to a CgTarget method, and the operand order in the
tape follows the method's argument order — destination first where there is one.
Multi-result ops (calls, compare-and-swap, checked-arithmetic intrinsics) name
several destination locals.
Operands use the shared Operand shape (src/cg/cgtarget.h), every variant typed
by a KitCgTypeId:
- OPK_IMM: a signed immediate bit pattern.
- OPK_LOCAL: a typed function local.
- OPK_GLOBAL: an object symbol plus signed addend; an address, not a load.
- OPK_INDIRECT: base local, optional index local with a log2 scale (0..3, that is, a multiplier of 1/2/4/8), and a signed displacement; an addressing expression, not a load.
There is deliberately no register operand kind. Register-like temporaries are just locals; physical registers are a backend concern that never appears in the CG IR.
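As a rough illustration of what an OPK_INDIRECT addressing expression denotes, the sketch below computes the effective address from already-read local values. IndirectSketch and its field names are hypothetical (the real Operand shape is in src/cg/cgtarget.h and is not reproduced here), and a 64-bit pointer width is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flattened view of an OPK_INDIRECT operand after the base
 * and index locals have been read. Assumes 64-bit pointers. */
typedef struct {
    uint64_t base;       /* value of the base local (a pointer) */
    uint64_t index;      /* value of the index local, 0 if absent */
    unsigned log2_scale; /* 0..3: multiply the index by 1, 2, 4, or 8 */
    int64_t  disp;       /* signed displacement in bytes */
} IndirectSketch;

/* base + (index << log2_scale) + disp, computed modulo 2^64.
 * This only forms the address; it performs no load. */
static uint64_t indirect_effective_address(IndirectSketch op) {
    assert(op.log2_scale <= 3);
    return op.base + (op.index << op.log2_scale) + (uint64_t)op.disp;
}
```

For example, a base of 0x1000, an index of 3 scaled by 4, and a displacement of -8 address byte 0x1004.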
Types
Every local and operand carries a KitCgTypeId — a CG storage type already
selected for the target, not a frontend AST type. Enums and typedef aliases have
already collapsed to their storage type; record/array field identity is gone,
replaced by byte offsets, bit ranges, and aggregate sizes. The CG type system
covers void, a boolean/i1 condition type, width-only integers, the float
widths, pointers (with address space), function-pointer values, opaque
fixed-size aggregates, and the per-arch vararg-state object. Signedness is not a
property of an integer type; it is carried by the operation that consumes the
value (signed vs unsigned divide, compare, shift, extend, and int/float
conversion). ABI decomposition — splitting one source value into several
storage parts for argument passing or returns — is recorded in the call and
return descriptors, not by re-typing ordinary value ops.
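To make "signedness is carried by the operation" concrete, here is a small sketch in which the same 32-bit pattern is compared by a signed and an unsigned less-than. The function names are invented for illustration; uint32_t merely stands in for a width-only 32-bit CG integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unsigned less-than: compare the raw bit patterns. */
static bool cmp_lt_u32(uint32_t a, uint32_t b) {
    return a < b;
}

/* Signed less-than on the same patterns: flipping the sign bit maps
 * two's-complement order onto unsigned order. */
static bool cmp_lt_s32(uint32_t a, uint32_t b) {
    return (a ^ 0x80000000u) < (b ^ 0x80000000u);
}

/* 0xFFFFFFFF reads as 4294967295 to the unsigned compare and as -1 to
 * the signed one, so the two ops disagree on identical operands. */
static void signedness_demo(void) {
    assert(!cmp_lt_u32(0xFFFFFFFFu, 1u));
    assert(cmp_lt_s32(0xFFFFFFFFu, 1u));
}
```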
Operation families
The complete op set is CgIrOp in src/cg/ir.h; the categories below describe
its semantics. The textual dumper (src/cg/ir_dump.c, reachable as
cg_ir_func_dump) is the canonical rendering and a good cross-check for the
spelling and operand order of any op.
Administrative
- CG_IR_NOP: no effect; also used as a deletion marker.
- CG_IR_LABEL: marks the placement of a label (id in extra.imm).
Data movement
- CG_IR_LOAD_IMM: set a destination local from an integer-like immediate bit pattern (extra.imm): null pointers, integer/bool constants, small literals.
- CG_IR_LOAD_CONST: set a destination local from exact target ABI bytes (extra.cbytes): floating constants, i128, f128, and other byte-oriented constants whose representation must be preserved exactly.
- CG_IR_COPY: assign one local from another.
- CG_IR_LOAD/CG_IR_STORE: scalar load/store through a local/global/indirect address, carrying a MemAccess in extra.mem.
- CG_IR_ADDR_OF: materialize the address of a local/global/indirect lvalue; marks an addressed local address_taken.
- CG_IR_TLS_ADDR_OF: materialize a thread-local object's address for the current thread. Separate from ADDR_OF because TLS materialization may need a target-selected access model, relocations, helper calls, or thread-pointer arithmetic; it is not merely an address-space attribute of an ordinary lvalue.
- CG_IR_AGG_COPY/CG_IR_AGG_SET: fixed-size aggregate byte-range copy and fill, carrying an AggregateAccess.
- CG_IR_BITFIELD_LOAD/CG_IR_BITFIELD_STORE: extract/insert a bitfield in a record storage unit, carrying a BitFieldAccess (storage offset, bit offset, bit width, signedness).
All memory and aggregate/bitfield ops rely on target layout facts already encoded in their operands and aux records; consumers must not reinterpret layout for a different target.
Arithmetic, compare, convert
- CG_IR_BINOP: integer/float binary op; the BinOp tag is in extra.imm, operands dst, a, b.
- CG_IR_UNOP: unary op; UnOp tag in extra.imm, operands dst, a.
- CG_IR_CMP: compare producing an i1/bool local; CmpOp tag in extra.imm, operands dst, a, b.
- CG_IR_CONVERT: width/representation conversion; ConvKind tag in extra.imm, operands dst, src.
Source operands of binop/unop/cmp may be OPK_IMM as well as OPK_LOCAL; the
backend or interpreter decides whether to fold a small immediate into an
instruction form or materialize it. The operation tag families (BinOp, UnOp,
CmpOp, ConvKind, AtomicOp, MemOrder, IntrinKind) are defined in
src/cg/cgtarget.h and are open to vector/SIMD extension — consumers must switch
with a default arm rather than assume exhaustiveness.
Calls and returns
- CG_IR_CALL: a direct or indirect call. The full CGCallDesc is preserved in the call aux: function type, callee operand, argument locals, result locals, flags, and inline/tail policy. A direct call has an OPK_GLOBAL callee; any other callee operand is an indirect call through a function-pointer local.
- CG_IR_RET: return zero or more result locals (recorded in the return aux).
Tail calls are modeled as a CG_IR_CALL carrying the CG_CALL_TAIL flag, not
as a property of CG_IR_RET. CG verifies realizability before setting the flag
(through the target's tail_call_unrealizable_reason query, which the recorder
forwards to its configured callback); the recorder preserves the tail policy so
replay can emit a sibling call, fall back to call-plus-return, or diagnose.
Branching and computed goto
- CG_IR_BR: unconditional branch to a label (id in extra.imm).
- CG_IR_CMP_BRANCH: fused compare-and-branch; operands a, b, with the CmpOp and taken-target label in the cmp-branch aux. This is CG's preferred conditional-branch form; an arbitrary i1 in a local branches via cmp_branch(CMP_NE, val, 0, label).
- CG_IR_SWITCH: structured multi-way branch; the switch aux holds the selector type, case/value pairs, default label, and density hints. Backends that can express it natively (C switch, a future Wasm br_table) override the target hook; otherwise CG's shared lowering reduces it to compare-branch chains or a label-address jump table.
- CG_IR_LOAD_LABEL_ADDR: materialize a function-local label's address into a local (label id in extra.imm).
- CG_IR_INDIRECT_BRANCH: branch to a label address, constrained to the closed target set in the indirect aux. The closed set drives CFG reconstruction and branch-target hardening (BTI/PAC/IBT).
Label addresses are opaque, function-local tokens. They may be stored, loaded,
compared, selected, and consumed by CG_IR_INDIRECT_BRANCH within the same
function activation; they are not callable function pointers and not
dereferenceable data.
Function-local static data
- CG_IR_LOCAL_STATIC_DATA_BEGIN/..._WRITE/..._LABEL_ADDR/..._END: define a function-scoped static-data object that needs function-label scope. The motivating case is C &&label dispatch-table initializers, where a static array is filled with code-label addresses: _WRITE appends bytes (or zeros), and _LABEL_ADDR records a relocation to a function-local label with an addend, width, and address space. A target that cannot resolve code-label addresses in static data (e.g. Wasm) declines _BEGIN, and the recorder reports that it likewise cannot build a label-address jump table, so switch_ takes a different lowering.
Structured scopes
These ops preserve CG's C-like structured control model — block and loop
scopes — so backends that express structure directly (the C-source target, a
future Wasm target) can replay it without rebuilding a CFG. CFG-based consumers
ignore the structure and reconstruct control flow from the underlying labels and
branches instead. if/if-else has no dedicated op or scope kind; the frontend
builds it from nested forward block scopes plus CG_IR_BREAK_TO.
- CG_IR_SCOPE_BEGIN: open a scope. The scope id and full CGScopeDesc (its kind, SCOPE_BLOCK or SCOPE_LOOP, and associated descriptor fields) ride in a CgIrScopeAux on extra.aux. Recording also adds a CgIrScope to the function's scope side table.
- CG_IR_SCOPE_END: close the most recently opened matching scope; scope id in extra.imm.
- CG_IR_BREAK_TO: exit the named enclosing scope (loop/block/switch break); scope id in extra.imm.
- CG_IR_CONTINUE_TO: continue the named enclosing loop scope; scope id in extra.imm.
Scope ids are 1-based with CG_SCOPE_NONE as the zero sentinel. The structured
form is advisory metadata layered over the same primitive control flow: a
consumer that flattens scopes to labels and branches produces the same observable
behavior as one that replays the structure natively.
Stack allocation and variadics
- CG_IR_ALLOCA: dynamic stack allocation; operands dst, size, required alignment in extra.imm. Models target-ABI dynamic allocation (reached via __builtin_alloca), not language VLAs.
- CG_IR_VA_START/CG_IR_VA_ARG/CG_IR_VA_END/CG_IR_VA_COPY: the four C vararg operations over a target-ABI vararg-state object, always addressed by pointer. VA_ARG carries the next argument's type in extra.imm.
Atomics
- CG_IR_ATOMIC_LOAD/CG_IR_ATOMIC_STORE: ordered scalar load/store.
- CG_IR_ATOMIC_RMW: read-modify-write defining the prior value; AtomicOp in the atomic aux.
- CG_IR_ATOMIC_CAS: compare-and-swap defining the prior value and a success bool; carries both success and failure orderings.
- CG_IR_FENCE: standalone fence; MemOrder in extra.imm.
Atomic ops carry both a MemAccess and memory-order metadata in their aux. They
are observable and must preserve the ordering the memory model requires.
Intrinsics and inline asm
- CG_IR_INTRINSIC: a compiler intrinsic identified by IntrinKind, with explicit destination and argument operand arrays in the intrinsic aux. Semantics depend entirely on the kind: bit ops (popcount, ctz, clz, bswap), memory helpers (memcpy/memmove/memset/prefetch/assume-aligned), hints (expect/unreachable/trap), non-local control (setjmp/longjmp), and checked arithmetic (add/sub/mul-with-overflow). Some are pure value producers, some are observable side effects, and some are terminators or return twice; consumers must classify by IntrinKind, not by CG_IR_INTRINSIC alone.
- CG_IR_ASM_BLOCK: one GCC-style inline-asm block (template string, input and output constraint/operand pairs, and clobbers), captured verbatim in the asm aux. Constraint strings are target-specific; optimization may inspect operands and clobbers but must treat the block conservatively.
Semantic modes: portable vs target-defined
A handful of integer and conversion operations have edge cases whose cheapest
lowering differs across targets: integer division by zero and INT_MIN / -1,
shift counts at or beyond the operand width, and out-of-range or NaN float→int
conversions. For these the IR offers two semantics, chosen per instruction by
the frontend:
- Portable (default). The edge is defined identically on every target (details under edge-case semantics). A frontend that wants reproducible results across architectures, or whose source language has no C-style undefined behavior, gets them for free by recording the op with no semantic flags.
- Target-defined (opt-in). The edge follows the target's native instruction. A frontend whose source language already declares the edge undefined (C division by zero, oversized shifts, out-of-range (int) casts) can opt in to skip the guards portable mode would require, trading portability for the fastest lowering.
The choice rides in CgIrInst.flags (CgIrInstFlag in src/cg/ir.h):
| Flag | Affects | Cleared (portable default) | Set (target-defined) |
|---|---|---|---|
| CG_IR_INST_TARGET_DIV_EDGES | BINOP sdiv/udiv/srem/urem | div-by-zero traps; INT_MIN/-1 wraps | target divide instruction |
| CG_IR_INST_TARGET_SHIFT_EDGES | BINOP shl/shr_s/shr_u | count reduced modulo width | target shift instruction |
| CG_IR_INST_TARGET_FPTOINT_EDGES | CONVERT ftoi_s/ftoi_u | saturate; NaN→0 | target convert instruction |
Both modes are fully defined: target-defined is still deterministic per target, never unconstrained. This flag set is the only place the IR's value semantics depend on a producer choice rather than on the op alone; everything else is fixed by the op. Memory-safety faults are always target-defined and are not governed by these flags — there is no portable bounds-checking mode (see Memory).
Portable is the safe default for a consumer that has not yet been taught a flag: implementing portable semantics where the op asked for target-defined is always legal, because the opt-in is only ever taken when the source language permits any behavior at that edge. Wiring the public CG API and recorder to set these bits, and teaching each consumer (optimizer, interpreter, native and C-source backends) to honor them, is implementation work tracked separately from this spec; the bits are defined here so the IR can carry the choice.
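A consumer's handling of one semantic-mode bit might look like the sketch below, shown for an 8-bit shl. Everything named here is an assumption made for illustration: the bit value of SKETCH_TARGET_SHIFT_EDGES, the honors_flag parameter, and the "native" model, which imitates x86 masking the count of an 8-bit shift to 5 bits. It is not how any kit backend is actually written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value; the real flags are CgIrInstFlag in src/cg/ir.h. */
enum { SKETCH_TARGET_SHIFT_EDGES = 0x2 };

/* Portable shl at W=8: the count is reduced modulo 8. */
static uint8_t portable_shl8(uint8_t a, uint8_t count) {
    return (uint8_t)((uint32_t)a << (count & 7u));
}

/* Model of an x86-style native shift: the count is masked to 5 bits, so
 * counts 8..31 shift every bit out of an 8-bit operand. */
static uint8_t native_shl8_x86_model(uint8_t a, uint8_t count) {
    return (uint8_t)((uint32_t)a << (count & 31u));
}

/* A consumer that does not understand the flag falls back to portable. */
static uint8_t eval_shl8(uint8_t a, uint8_t count, unsigned flags, bool honors_flag) {
    if (honors_flag && (flags & SKETCH_TARGET_SHIFT_EDGES) != 0) {
        return native_shl8_x86_model(a, count);
    }
    return portable_shl8(a, count);
}

/* With count 9: portable shifts by 1, the native model shifts by 9. */
static void shift_mode_demo(void) {
    assert(eval_shl8(3, 9, 0, true) == 6);
    assert(eval_shl8(3, 9, SKETCH_TARGET_SHIFT_EDGES, true) == 0);
}
```

In-range counts give the same answer in both modes; only the edge differs, which is why the portable fallback is always a legal reading of a flagged instruction.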
Well-definedness: edge-case semantics
This section pins down every operation's behavior on the inputs that a structural reading of the op set leaves open. It mirrors the operation families above. Unless a rule is marked target-defined, it is portably defined.
Integer arithmetic and bitwise
- Widths. For BINOP/CMP the source operands (and, for binop, the destination) share one integer width W ∈ {8,16,32,64,128}. CMP yields the boolean/i1 type. (Width agreement is a well-formedness precondition.)
- Wrapping. iadd, isub, imul, and neg compute modulo 2^W (two's complement). Signed and unsigned overflow both wrap; neither is undefined. Overflow detection is not part of these ops; use the *_OVERFLOW intrinsics for a checked result. The public API's NSW/NUW/EXACT assertions and trap/saturate overflow flags are not represented on the base IR op; a frontend that needs them realizes them as explicit checks before recording.
- Division and remainder. sdiv/srem are truncated (round-toward-zero) division; the remainder takes the sign of the dividend. udiv/urem are unsigned.
  - Portable: a zero divisor traps (a deterministic abort, as INTRIN_TRAP). INT_MIN_W / -1 is defined as INT_MIN_W and INT_MIN_W % -1 as 0; neither traps.
  - Target-defined (CG_IR_INST_TARGET_DIV_EDGES): both edges follow the target divide instruction. For example, x86-64 raises #DE for a zero divisor and for INT_MIN/-1; AArch64 sdiv yields 0 for a zero divisor and INT_MIN for INT_MIN/-1.
- Shifts. The shifted value and the result have width W; shr_s replicates the sign bit, shl/shr_u shift in zeros. The count is an integer operand interpreted as an unsigned amount.
  - Portable: the count is reduced modulo W (only its low log2(W) bits matter), so every count is defined and a high-bit-set ("negative") count simply reduces mod W.
  - Target-defined (CG_IR_INST_TARGET_SHIFT_EDGES): an out-of-range count follows the target shift instruction's own masking or zeroing.
- and/or/xor are total bitwise ops with no edge cases.
Floating point
The IR's floating-point operations are strict IEEE-754 in the target's default environment: round-to-nearest-ties-to-even, non-trapping exceptions (status-flag only), no denormal flushing. These are portable; the IR does not represent alternate rounding modes or fast-math relaxations (the public API's rounding argument and FP fast-math flags are dropped at the IR level unless the frontend realizes them as explicit operations).
- fadd/fsub/fmul/fdiv produce the correctly-rounded IEEE result. A NaN operand yields a quiet NaN. x/0 → ±∞ for nonzero, non-NaN x (sign per operands), 0/0 → NaN, ∞/∞ → NaN.
- There is no FP remainder primitive; the frontend lowers a floating % to a runtime call (fmod).
- fneg flips the sign bit; it is not 0 - x: it negates zeros and infinities and toggles a NaN's sign without otherwise altering its payload.
- Compares. The relational FP compares lt_f, le_f, gt_f, ge_f are ordered: if either operand is NaN the result is false. On floating operands eq is ordered-equal (NaN → false) and ne is unordered-not-equal (NaN → true), matching C ==/!=. A frontend needing an unordered relational composes it as the negation of the opposite ordered compare (a ULT b ≡ !(a OGE b)), since negating an ordered compare turns the NaN result to true.
  - Spec note / known gap: the current public→IR lowering (api_map_fp_cmp in src/cg/value.c) maps both the ordered and the unordered relational forms to the same internal op, so the ordered/unordered distinction for <, <=, >, >= is presently lost at the IR boundary and is correct only under a no-NaN assumption. Resolving it (unordered relational variants, or the explicit NaN composition above emitted by the frontend) is an implementation follow-up; the rule above is the intended contract.
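The ordered/unordered composition can be checked with plain C doubles, whose relational operators are already ordered and whose != is already unordered. The fcmp_* names below are illustrative only and do not correspond to kit functions.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Ordered relationals (lt_f, ge_f): false if either operand is NaN. */
static bool fcmp_olt(double a, double b) { return a < b; }
static bool fcmp_oge(double a, double b) { return a >= b; }

/* Unordered less-than composed as the negation of the opposite ordered
 * compare: a ULT b == !(a OGE b). NaN therefore yields true. */
static bool fcmp_ult(double a, double b) { return !fcmp_oge(a, b); }

/* eq is ordered-equal, ne is unordered-not-equal, as in C. */
static bool fcmp_eq(double a, double b) { return a == b; }
static bool fcmp_ne(double a, double b) { return a != b; }

/* A NaN operand separates the ordered and unordered forms. */
static void fcmp_demo(void) {
    assert(!fcmp_olt(NAN, 1.0));
    assert(fcmp_ult(NAN, 1.0));
}
```

On non-NaN inputs the ordered and unordered forms agree, which is why losing the distinction is only visible when a NaN reaches the compare.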
Conversions
- sext/zext require dst width > src width and sign-/zero-extend; trunc requires dst width < src width and keeps the low dst bits. (Width ordering is a precondition.)
- itof_s/itof_u convert integer→float with round-to-nearest-even; magnitudes beyond the float's range round to ±∞ per IEEE.
- ftoi_s/ftoi_u convert float→int rounding toward zero (truncation); in-range values drop their fraction.
  - Portable: out-of-range and non-finite inputs saturate (above the destination max → max, below the min → min, which is 0 for the unsigned floor), and NaN → 0.
  - Target-defined (CG_IR_INST_TARGET_FPTOINT_EDGES): the result follows the target convert instruction (e.g. x86-64 cvttsd2si yields the "integer indefinite" INT_MIN on overflow/NaN; AArch64 fcvtzs saturates).
- fext widens exactly (no rounding); ftrunc narrows with round-to-nearest-even, overflow → ±∞.
- bitcast requires equal byte size and reinterprets the operand's target ABI bit pattern without changing bits. Pointer↔integer of equal width is a bitcast.
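The portable float→int rule is small enough to state as code. The sketch below covers f64 → 32-bit integers only; the portable_ftoi_* names are invented for illustration and the comparisons use bounds that are exactly representable as doubles.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Portable ftoi_s, f64 -> 32-bit signed: truncate toward zero, saturate
 * at the destination range, and map NaN to 0. */
static int32_t portable_ftoi_s32(double x) {
    if (isnan(x)) {
        return 0;
    }
    if (x >= 2147483648.0) {     /* at or above 2^31, including +inf */
        return INT32_MAX;
    }
    if (x <= -2147483649.0) {    /* at or below -2^31 - 1, including -inf */
        return INT32_MIN;
    }
    return (int32_t)x;           /* truncated value is now representable */
}

/* Portable ftoi_u, f64 -> 32-bit unsigned: the floor is 0. */
static uint32_t portable_ftoi_u32(double x) {
    if (isnan(x)) {
        return 0;
    }
    if (x >= 4294967296.0) {     /* at or above 2^32, including +inf */
        return UINT32_MAX;
    }
    if (x <= -1.0) {
        return 0;
    }
    return (uint32_t)x;          /* values in (-1, 0) truncate to 0 */
}

/* NaN and negative inputs land on defined values instead of C's UB. */
static void ftoi_demo(void) {
    assert(portable_ftoi_s32(NAN) == 0);
    assert(portable_ftoi_u32(-5.0) == 0);
}
```

The range checks run before the cast, so the C conversion is only ever applied to values whose truncation fits the destination.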
Memory: load, store, aggregate, bitfield
- Address validity. A load/store/aggregate/bitfield/atomic op requires its effective address to reference a live object of at least size bytes in the access's address space. This is not portably checked: an invalid or out-of-bounds access (including a null dereference) produces a target-defined fault, meaning the deterministic behavior of the target's load/store against that address (a trap on an MMU target; a read or write of whatever occupies the address on a flat-memory target). It is target-defined, never unconstrained, and never governed by a semantic-mode flag.
- Alignment. MemAccess.align is a promise: the producer asserts the address is at least that aligned (natural alignment for the type when align == 0), and a target may use the promise to choose wider instructions. Recording an access whose address is in fact less aligned than stated, without MF_UNALIGNED, is a precondition violation; on a strict-alignment target it faults (target-defined). MF_UNALIGNED declares the access may be unaligned and obliges the consumer to emit an unaligned-capable sequence; it is then fully defined.
- Uninitialized reads. Reading a local or memory location not yet assigned on the current dynamic path yields an unspecified value of the access type, that is, an arbitrary but type-valid bit pattern. It never traps and never corrupts other state; it is not poison and not undefined behavior. Producers should define every location before reading it for determinism, but doing otherwise stays within defined IR.
- Volatile. MF_VOLATILE accesses are observable side effects: they must not be added, removed, duplicated, or reordered with respect to other volatile or atomic accesses.
- Aggregates. agg_copy copies size bytes and requires source and destination ranges not to overlap (memcpy semantics); overlap is a precondition violation, so use the MEMMOVE intrinsic for overlapping ranges. agg_set fills size bytes with the byte value. size == 0 is a defined no-op.
- Bitfields. bitfield_load/bitfield_store access bits [bit_offset, bit_offset+bit_width) within the storage unit at storage_offset; the range must lie within the unit (precondition). A load sign- or zero-extends per signed_. A store uses the low bit_width bits of the source and leaves bits outside the field unchanged. A zero-width field (bit_width == 0) is a layout barrier only and performs no memory access.
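The bitfield rules can be sketched on a single 32-bit storage unit held in a register. Assumptions, all made for illustration: the helper names are invented, the unit is 32 bits wide, bit_offset counts from the least significant bit, and zero-width fields are excluded because they perform no access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask of bit_width low bits, for widths 1..32. */
static uint32_t field_mask(unsigned bit_width) {
    assert(bit_width >= 1 && bit_width <= 32);
    return bit_width == 32 ? UINT32_MAX : ((UINT32_C(1) << bit_width) - 1u);
}

/* bitfield_store: insert the low bit_width bits of src and leave every
 * bit outside the field unchanged. Returns the updated storage unit. */
static uint32_t bitfield_store32(uint32_t unit, unsigned bit_offset,
                                 unsigned bit_width, uint32_t src) {
    assert(bit_offset + bit_width <= 32);   /* range lies within the unit */
    uint32_t mask = field_mask(bit_width) << bit_offset;
    return (unit & ~mask) | ((src << bit_offset) & mask);
}

/* bitfield_load: extract the field, then sign- or zero-extend it. */
static int64_t bitfield_load32(uint32_t unit, unsigned bit_offset,
                               unsigned bit_width, bool is_signed) {
    assert(bit_offset + bit_width <= 32);   /* range lies within the unit */
    uint32_t raw = (unit >> bit_offset) & field_mask(bit_width);
    bool top_bit_set = ((raw >> (bit_width - 1)) & 1u) != 0;
    if (is_signed && top_bit_set) {
        return (int64_t)raw - ((int64_t)1 << bit_width);
    }
    return (int64_t)raw;
}
```

Storing 5 (binary 101) into a signed 3-bit field and loading it back yields -3, while an unsigned load of the same bits yields 5.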
Control flow
- Labels. Every label named by a branch, switch, computed-goto target set, or label-address op belongs to the same function and is placed exactly once (one CG_IR_LABEL); placement may follow use in tape order. (Preconditions.)
- Terminators and reachability. Every dynamic path ends in a terminator (ret, a CG_CALL_TAIL call, INTRIN_UNREACHABLE/TRAP/LONGJMP, indirect_branch, or a br that ultimately reaches one). Falling off the end of the instruction stream without a terminator is malformed. Instructions after a terminator are reachable only through a label.
- Switch. The selector is compared against each case value using selector_type's width and signedness; a match transfers to that case's label, otherwise to default_label (LABEL_NONE means fall through past the switch). Case values are distinct (a precondition); the IR defines no tie-break.
- Computed goto. indirect_branch transfers to the label address in its operand, which must be one of the ntargets labels in its closed set (ntargets > 0, a precondition). The set is exhaustive: a runtime address outside it is target-defined (branch-protection hardening may fault). Label addresses (load_label_addr, local_static_data_label_addr) are opaque tokens valid only within the defining function's activation; they may be stored, loaded, compared for equality, and consumed by indirect_branch, but never called or dereferenced as data.
Calls and returns
- A call's argument and result locals match fn_type in count and type; for a variadic callee the fixed parameters match and variadic arguments are already promoted by the frontend (preconditions). Calling through an invalid function pointer is a target-defined fault. A direct call uses an OPK_GLOBAL callee; any other callee operand is indirect.
- ret returns exactly the function's declared result locals, in order and type (precondition). A tail call carries CG_CALL_TAIL, obeys the realizability contract above, is a terminator, and is never followed by a ret.
Stack allocation and variadics
- alloca allocates size bytes (an unsigned byte count) aligned to align (a power of two; precondition), valid for the rest of the function activation. Exhausting the stack is a target-defined trap.
- va_start/va_arg/va_end/va_copy operate on a target-ABI vararg-state object addressed by pointer. va_arg's type must match the promoted type of the corresponding actual argument, and the number of va_arg reads must not exceed the variadic arguments actually passed (preconditions); violating either is target-defined (it reads adjacent argument storage). va_start precedes va_arg/va_end on the same state; va_copy duplicates state.
Atomics
- Order legality (preconditions, per the C11 memory model; mirrored by kit_cg_atomic_is_legal):
  - atomic_load: relaxed, consume, acquire, or seq_cst.
  - atomic_store: relaxed, release, or seq_cst.
  - atomic_rmw: any order.
  - atomic_cas: any success order; failure ∈ {relaxed, consume, acquire, seq_cst} and no stronger than success.
  - fence: any order (a relaxed fence has no effect).
- The access must be a supported atomic width and naturally aligned for a lock-free operation; otherwise the consumer may lower to a runtime atomic call (target-defined mechanism, same observable semantics). Atomic ops are observable and must preserve the ordering the memory model requires.
- rmw defines the prior value; cas defines the prior value and a success bool and compares using the full access width.
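The order-legality table can be read as three small predicates. This is a sketch only: OrdSketch and the function names are hypothetical, the real check is kit_cg_atomic_is_legal (whose signature is not shown here), and "no stronger than" is assumed to mean position in the C11 memory_order sequence relaxed, consume, acquire, release, acq_rel, seq_cst.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical order enum, listed in C11 memory_order sequence. */
typedef enum {
    ORD_RELAXED, ORD_CONSUME, ORD_ACQUIRE, ORD_RELEASE, ORD_ACQ_REL, ORD_SEQ_CST
} OrdSketch;

/* atomic_load: relaxed, consume, acquire, or seq_cst. */
static bool load_order_legal(OrdSketch o) {
    assert((unsigned)o <= (unsigned)ORD_SEQ_CST);
    return o == ORD_RELAXED || o == ORD_CONSUME ||
           o == ORD_ACQUIRE || o == ORD_SEQ_CST;
}

/* atomic_store: relaxed, release, or seq_cst. */
static bool store_order_legal(OrdSketch o) {
    assert((unsigned)o <= (unsigned)ORD_SEQ_CST);
    return o == ORD_RELAXED || o == ORD_RELEASE || o == ORD_SEQ_CST;
}

/* atomic_cas: any success order; the failure order must be a legal load
 * order and (by the ranking assumed above) no stronger than success. */
static bool cas_orders_legal(OrdSketch success, OrdSketch failure) {
    assert((unsigned)success <= (unsigned)ORD_SEQ_CST);
    return load_order_legal(failure) && failure <= success;
}
```

atomic_rmw and fence accept any order, so they need no predicate.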
Intrinsics and inline asm
Operand shapes are fixed per IntrinKind (src/cg/cgtarget.h). Semantic edges:
- CLZ(0) and CTZ(0) are defined to equal the operand's bit width (stronger than C, where they are undefined). POPCOUNT, BSWAP16/32/64 are total.
- SADD/UADD/SSUB/USUB/SMUL/UMUL_OVERFLOW define a two's-complement wrapped result and a boolean overflow flag.
- MEMCPY requires non-overlapping ranges; MEMMOVE permits overlap; MEMSET fills. All are defined no-ops at size == 0.
- SETJMP returns 0 on the direct call and the value passed to the matching LONGJMP when it returns again (a LONGJMP value of 0 surfaces as 1); it "returns twice." LONGJMP does not return. Consumers must preserve both control effects.
- ASSUME_ALIGNED returns its pointer and asserts the stated alignment (a precondition; a wrong assertion is target-defined). EXPECT returns its value unchanged (a branch-probability hint). PREFETCH has no value effect.
- TRAP is a deterministic abort. UNREACHABLE asserts the point is never reached and is itself a terminator; if control does reach it the behavior is a target-defined trap, and consumers may assume it unreachable (e.g. to prune successors). Neither corrupts unrelated state.
- asm_block and file-scope asm are opaque target assembly. The IR fixes the interface (operand directions, clobbers, volatility), but the assembly's own behavior is target/external, modeled conservatively (treated as reading and writing its declared operands and clobbers and as an observable side effect unless flagged otherwise). This is external behavior, not undefined behavior.
Well-formedness (invariants)
A well-formed tape satisfies all of the following; consumers may assume them, and a violation is a producer bug (malformed IR), not program behavior. These are the structural half of "no undefined behavior" — the runtime half is the edge-case section above.
- Sentinels are zero-valued (CG_LOCAL_NONE, LABEL_NONE, CG_SCOPE_NONE, OBJ_SYM_NONE); local, label, and scope ids are 1-based.
- Every local has exactly one declared type for the whole function, and every source and destination local is declared before use.
- Destinations are OPK_LOCAL. Operand kinds match each op's contract (src/cg/cgtarget.h): FP arithmetic and fneg require OPK_LOCAL sources; binop/unop/cmp also accept OPK_IMM; addresses are OPK_LOCAL/OPK_GLOBAL/OPK_INDIRECT; an OPK_INDIRECT index is an integer local with log2 scale 0..3.
- Integer binop/cmp operands (and the binop destination) share one width; conversions obey their width-ordering rules.
- A control-transfer op names labels in the same function; only a call targets a symbol or function-pointer value. Each label is placed exactly once; every path ends in a terminator. Switch case values are distinct; a computed goto's target set is non-empty and closed.
- Calls and returns match the function/callee type in arity and operand type; atomic orders are legal for their op (above).
- Data-layout facts (sizes, alignments, field offsets, bit ranges, ABI shape) are already target-selected; consumers must not reinterpret them for a different target.
- Source locations are sticky at recording time and stamped on each instruction.
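Two of the invariants above (labels placed exactly once with 1-based ids, and distinct switch case values) are easy to check mechanically. The sketch below works on a deliberately simplified instruction record; VerifyInst, its fields, and both function names are hypothetical and do not mirror the real CgIrInst layout in src/cg/ir.h. It also assumes every declared label must be placed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced instruction shape for this sketch only. */
typedef enum { VK_LABEL_PLACE, VK_JUMP, VK_OTHER } VerifyKind;
typedef struct {
    VerifyKind kind;
    uint32_t label; /* 0 plays the role of the LABEL_NONE sentinel */
} VerifyInst;

#define VERIFY_MAX_LABELS 64

/* True if every label id in 1..nlabels is placed exactly once and every
 * label-using instruction names an in-range, non-sentinel label. */
static bool sketch_labels_well_formed(const VerifyInst *insts, size_t n,
                                      uint32_t nlabels) {
    uint8_t placed[VERIFY_MAX_LABELS + 1] = {0};
    if (nlabels > VERIFY_MAX_LABELS) return false; /* sketch-only limit */
    for (size_t i = 0; i < n; i++) {
        uint32_t l = insts[i].label;
        if (insts[i].kind == VK_OTHER) continue;
        if (l == 0 || l > nlabels) return false;   /* sentinel or out of range */
        if (insts[i].kind == VK_LABEL_PLACE && ++placed[l] > 1)
            return false;                          /* placed twice */
    }
    for (uint32_t l = 1; l <= nlabels; l++)
        if (placed[l] != 1) return false;          /* never placed */
    return true;
}

/* True if no two switch case values are equal (quadratic; fine for a sketch). */
static bool sketch_switch_cases_distinct(const int64_t *cases, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (cases[i] == cases[j]) return false;
    return true;
}

/* Usage: a forward jump to a label placed later is well-formed. */
static void sketch_verify_self_check(void) {
    VerifyInst ok[] = {
        {VK_JUMP, 2}, {VK_LABEL_PLACE, 1}, {VK_OTHER, 0}, {VK_LABEL_PLACE, 2}
    };
    assert(sketch_labels_well_formed(ok, 4, 2));
}
```

A failure from a checker like this indicates a producer bug, which matches the document's split between malformed IR and program behavior.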
Consumer guidance
Anything that reads the IR is reading a layout-resolved, ABI-shaped, but
machine-neutral program. The contract a consumer must respect: preserve
target-data-layout semantics, memory observability (the MemFlag set and alias
roots on each access), the ABI shape of calls and returns, and CFG validity. It
must also implement at least the portable edge-case semantics of every op,
and honor the CgIrInst.flags semantic-mode bits where it understands them —
falling back to portable semantics (a safe refinement) for any bit it does not.
A consumer may assume a well-formed tape; it must not introduce undefined
behavior of its own where the IR defines a result.
Two consumers exist today, and they take different paths:
- The optimizer (see OPT.md) does not run passes on the CG IR in place. It converts each CgIrFunc into its own Func IR (opt_func_from_cg_ir in src/opt/cg_ir_lower.c), which materializes basic blocks, SSA, virtual registers, and frame objects, then runs CFG cleanup, simplification, machinization, liveness, register allocation, and emission, and finally replays into the wrapped direct backend. This conversion is why SSA/phi/const ops live in the optimizer's enum and never in the CG IR.
- The interpreter (see INTERPRETER.md) also goes through the optimizer's Func form, but via a reduced pipeline (opt_run_o1_interp in src/opt/opt.c) that stops before machinization: it builds the CFG, runs target-independent cleanups, promotes scalar locals, and hands a Func with virtual registers to the interpreter loader (src/interp/lower.c), which emits fixed-width bytecode. The interpreter never consumes the native MIR/regalloc view, keeping execution aligned with CG semantics rather than backend emission. Address-taken locals, aggregates, globals, TLS instances, and allocas become byte-addressable interpreter memory; label addresses stay opaque tokens; interpreted-to-interpreted calls dispatch through retained function bodies and external calls go through an FFI layer.
Source-like backends (the C-source target, a future Wasm target) can instead
replay the tape op-by-op into a direct CgTarget, taking advantage of the
retained structured scopes and switch descriptors.
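Op-by-op replay amounts to walking the tape and forwarding each recorded instruction to the matching callback of a direct target. The sketch below shows that shape with a three-op tape and a text-emitting target. ReplayInst, ReplayTarget, and every function here are hypothetical stand-ins; the real interface in src/cg/cgtarget.h is far larger and is not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reduced tape: three ops are enough to show the dispatch. */
typedef enum { RP_LABEL, RP_JUMP, RP_RET } ReplayOp;
typedef struct { ReplayOp op; uint32_t label; } ReplayInst;

/* Hypothetical direct-target vtable, one callback per recorded op. */
typedef struct ReplayTarget {
    void (*label_place)(struct ReplayTarget *t, uint32_t label);
    void (*jump)(struct ReplayTarget *t, uint32_t label);
    void (*ret)(struct ReplayTarget *t);
    void *user;
} ReplayTarget;

/* Replay: forward each recorded instruction, in order, to the target. */
static void sketch_replay(const ReplayInst *tape, size_t n, ReplayTarget *t) {
    for (size_t i = 0; i < n; i++) {
        switch (tape[i].op) {
        case RP_LABEL: t->label_place(t, tape[i].label); break;
        case RP_JUMP:  t->jump(t, tape[i].label);        break;
        case RP_RET:   t->ret(t);                        break;
        }
    }
}

/* A toy source-like target that appends C-like text to a fixed buffer. */
typedef struct { char buf[128]; size_t len; } TextSink;

static void sink_puts(TextSink *s, const char *text) {
    size_t n = strlen(text);
    if (s->len + n >= sizeof s->buf) return; /* drop on overflow (sketch only) */
    memcpy(s->buf + s->len, text, n + 1);
    s->len += n;
}

static void text_label(ReplayTarget *t, uint32_t label) {
    char tmp[32];
    snprintf(tmp, sizeof tmp, "L%u:\n", (unsigned)label);
    sink_puts((TextSink *)t->user, tmp);
}

static void text_jump(ReplayTarget *t, uint32_t label) {
    char tmp[32];
    snprintf(tmp, sizeof tmp, "  goto L%u;\n", (unsigned)label);
    sink_puts((TextSink *)t->user, tmp);
}

static void text_ret(ReplayTarget *t) {
    sink_puts((TextSink *)t->user, "  return;\n");
}

static ReplayTarget sketch_text_target(TextSink *s) {
    ReplayTarget t = { text_label, text_jump, text_ret, s };
    s->buf[0] = '\0';
    s->len = 0;
    return t;
}

/* Usage: replaying a two-op tape produces the text in tape order. */
static void sketch_replay_demo(void) {
    ReplayInst tape[] = { {RP_LABEL, 1}, {RP_RET, 0} };
    TextSink sink;
    ReplayTarget target = sketch_text_target(&sink);
    sketch_replay(tape, 2, &target);
    assert(strcmp(sink.buf, "L1:\n  return;\n") == 0);
}
```

Because the replay loop knows nothing about the target, the same tape could drive an immediate-emission backend or a source-like one, which is the property the recorder design relies on.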