doc: describe lowered-CG IR - kit

commit 6e35c2a0243005dccb9ca712b84de24705e4d0f6
parent 6a3230be6f5368231610c029cd4076aa97a4d98b
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 25 May 2026 13:05:03 -0700

doc: describe lowered-CG IR

Diffstat:
A doc/IR.md  | 490 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

1 file changed, 490 insertions(+), 0 deletions(-)
diff --git a/doc/IR.md b/doc/IR.md
@@ -0,0 +1,490 @@
+# IR
+
+This document defines the target shape for cfree's lowered-CG IR: the
+function-level representation recorded from the internal `CGTarget` interface
+and consumed by optimization, backend replay, and future non-JIT interpreted
+execution.
+
+The intended boundary is a shared recorded-CG IR layer. Frontends keep emitting
+public `CfreeCg` calls, `CfreeCg` lowers stack/lvalue source operations into
+`CGTarget` calls, and this IR records the resulting typed locals, labels,
+control-flow operations, memory operations, and ABI-shaped call operations.
+Optimizers and backends may attach private side tables or lowered views, but
+those are not part of the shared IR contract.
+
+## Pipeline Position
+
+The IR sits below the public CG API and above target machinization:
+
+```text
+frontend -> CfreeCg -> recording CGTarget -> lowered-CG IR
+                                      |-> optimizer-derived CFG/SSA/MIR views
+                                      |-> interpreter
+                                      |-> replay into native/C/wasm target
+```
+
+The IR is target-data-layout-specific: type sizes, alignments, record field
+offsets, bitfield positions, ABI classifications, and pointer widths are
+already selected for the compiler target. It is not target-instruction-specific
+until `opt_machinize` or an equivalent backend-prep pass runs.
+
+An interpreter should execute this pre-machinize IR. Post-machinize and
+post-register-allocation forms contain hard registers, spill slots, scratch
+policies, call plans, and backend emission constraints; those are not semantic
+execution concepts.
+
+## Types
+
+Every IR local has a `CfreeCgTypeId`. IR value types are target-selected CG
+storage types, not frontend AST types:
+
+- `void`: absence of a value.
+- `bool`: an i1 value used for conditions and compare results.
+- integers: width-only i8, i16, i32, i64, and i128. Signedness is carried by
+  operations, comparisons, conversions, and ABI attributes.
+- floats: f32, f64, f80, and f128. `f80` is needed for x87-style extended
+  precision targets; targets that do not support it either reject it or lower
+  it through runtime helpers.
+- pointers: pointer-sized values with an address space.
+- function pointers: pointer values whose pointee type is a function type.
+- aggregates: opaque object types with size and alignment. Source arrays and
+  records lower to aggregate storage at this level; field identity is already
+  gone and record layout has become byte offsets, bit ranges, and aggregate
+  sizes.
+- `vararg_state`: target ABI vararg state object, accessed through addressable
+  storage.
+
+Enums and aliases are not distinct IR value types. They lower to their storage
+type before reaching this layer. Frontends and debug metadata may retain source
+identity separately.
+
+ABI values may decompose one source type into multiple storage parts for
+argument passing and returns. The IR records that ABI shape at call and return
+sites, while ordinary value ops remain typed by their CG value type.
+
+## Representation
+
+### Functions
+
+A `Func` is one function body. It owns the semantic IR needed to execute or
+replay that body:
+
+- The preserved `CGFuncDesc`, including the function symbol, type, ABI
+  classification, source location, and function attributes.
+- A linear instruction stream with labels and explicit control-transfer ops.
+- Typed IR locals for parameters, source locals, compiler temporaries, aggregate
+  homes, call results, and dynamic-allocation handles.
+- Optional source/debug metadata attached to function, local, label, and
+  instruction records.
+
+The IR contract should not expose optimizer state such as SSA construction
+tables, block arrays, dominance, liveness, register allocation, hard-register
+metadata, or pass scratch. Optimizers may derive and cache those views
+privately.
+
+There is one local namespace at this level. A local is a mutable typed location.
+Operations define destination locals and read source locals. Taking the address
+of a local with `IR_ADDR_OF` makes it addressable; backends may then home it in
+a frame slot, static storage, interpreter activation memory, or another
+target-specific location. An aggregate local is the way to model fixed-size
+local storage independent of scalar registers.
+
+### Labels and Derived Blocks
+
+Labels are part of the shared IR contract because they are exposed by CG-level
+control-flow operations: branch targets, switch targets, label addresses, and
+computed-goto valid target sets all name labels.
+
+Basic blocks are a derived view, not the base IR API. An optimizer or
+interpreter may split the instruction stream at labels and terminators to build
+a CFG with predecessor/successor lists. That CFG may cache layout order,
+fallthrough edges, dominance, and block-local analysis, but those belong to the
+consumer's view.
+
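The derived-block view described above can be sketched in C as follows. This is a minimal illustration, not cfree's implementation: `SkOp`, `SkInst`, `sk_is_terminator`, and `sk_split_blocks` are hypothetical names, and labels are modeled as marker entries in the instruction array, which this document does not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced op set; the real IROp numbering is not shown here. */
typedef enum { SK_LABEL, SK_VALUE_OP, SK_BR, SK_CONDBR, SK_RET } SkOp;

typedef struct { SkOp op; } SkInst;

/* True for ops after which control cannot continue at the next instruction. */
int sk_is_terminator(SkOp op) {
    return op == SK_BR || op == SK_CONDBR || op == SK_RET;
}

/*
 * Record the start index of each derived block in starts[] and return the
 * number of blocks. A block starts at instruction 0, at every label, and
 * at the instruction that follows a terminator.
 */
size_t sk_split_blocks(const SkInst *insts, size_t n, size_t *starts, size_t cap) {
    size_t count = 0;
    int start_next = 1; /* the first instruction always opens a block */
    for (size_t i = 0; i < n; i++) {
        if (start_next || insts[i].op == SK_LABEL) {
            assert(count < cap);
            starts[count++] = i;
        }
        start_next = sk_is_terminator(insts[i].op);
    }
    return count;
}
```

A consumer would then add successor edges from each block's last instruction; a block that does not end in a terminator falls through to the block that follows it.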
+### Instructions
+
+An `Inst` has:
+
+- `op`: an `IROp`.
+- `loc`: the sticky source location active when the instruction was recorded.
+- Destination locals, if the op produces values.
+- Source operands.
+- `extra`: immediate, constant bytes, memory access, or op-specific auxiliary
+  data.
+
+Destination arity is op-specific. Each destination names a typed IR local, so
+the instruction does not need an independent result type. Multi-result ops such
+as calls, compare-and-swap, and checked-arithmetic intrinsics list multiple
+destination locals.
+
+Most IR ops correspond one-to-one with a `CGTarget` method. SSA-only helpers
+such as phi nodes are optimizer-internal extensions, not base IR ops.
+
+### Locals and Addresses
+
+The IR has one mutable local namespace.
+
+Scalar locals hold scalar values. Aggregate locals hold opaque bytes with target
+size and alignment. A local may be used as an ordinary value, assigned by value
+operations, or addressed by `IR_ADDR_OF`. Address-taken locals are lowered by
+consumers to concrete storage. Non-address-taken scalar locals may remain in
+registers, SSA values, interpreter slots, or other consumer-owned storage.
+
+Function pointers and ordinary object pointers are produced by address
+materialization. A direct function declaration is an object symbol with function
+type; materializing `&fn + addend` is `IR_ADDR_OF` over a global symbol operand,
+and direct calls may also carry that global symbol directly as the callee.
+Indirect calls use a local containing a function-pointer value.
+
+Function-local goto labels are a different pointer-like value. `IR_LOAD_LABEL_ADDR`
+materializes an opaque label token into a local. The token may be stored,
+loaded, compared, selected, and consumed by `IR_INDIRECT_BRANCH` inside the
+same function activation. It is not a function pointer, not callable, and not
+dereferenceable as data. Static dispatch tables use the data equivalent of the
+same operation: a label-address data relocation tied to the containing function.
+
+### Operands
+
+IR uses the internal `Operand` shape:
+
+- `OPK_IMM`: signed immediate bit pattern.
+- `OPK_LOCAL`: typed IR local.
+- `OPK_GLOBAL`: address of an object symbol plus an addend.
+- `OPK_INDIRECT`: base pointer local plus optional index local, scale, and
+  offset.
+
+There is no distinct `OPK_REG` in the base IR. Register-like temporaries are IR
+locals. Optimizers may derive SSA values or machine virtual registers as private
+views, but those are not the API-level operand model.
+
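One way to picture the operand shape is a tagged union. The sketch below is an assumption for illustration only: the field names, the `SkOperand` type, and the address formula `base + index * scale + offset` are not taken from cfree's headers, and local id 0 is used as the "no index" sentinel in line with the invariants listed later.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { OPK_IMM, OPK_LOCAL, OPK_GLOBAL, OPK_INDIRECT } OperandKind;

/* Hypothetical layout; the real Operand definition is not shown in this document. */
typedef struct {
    OperandKind kind;
    union {
        int64_t imm;                                       /* OPK_IMM */
        uint32_t local;                                    /* OPK_LOCAL */
        struct { uint32_t sym; int64_t addend; } global;   /* OPK_GLOBAL */
        struct {
            uint32_t base;   /* local holding the base pointer */
            uint32_t index;  /* optional index local, 0 when absent */
            uint32_t scale;
            int64_t offset;
        } ind;                                             /* OPK_INDIRECT */
    } u;
} SkOperand;

/* Effective address of an OPK_INDIRECT operand given current local values. */
uint64_t sk_indirect_addr(const SkOperand *op, const uint64_t *locals) {
    assert(op->kind == OPK_INDIRECT);
    uint64_t addr = locals[op->u.ind.base] + (uint64_t)op->u.ind.offset;
    if (op->u.ind.index != 0)
        addr += locals[op->u.ind.index] * op->u.ind.scale;
    return addr;
}
```

The arithmetic is done on unsigned 64-bit values so that negative offsets wrap in a well-defined way in C.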
+### Memory
+
+Memory accesses carry `MemAccess`:
+
+- Codegen type and access size.
+- Known alignment.
+- Volatile, atomic, restrict, readonly, writeonly, and unaligned flags.
+- Address space.
+- Alias root when known.
+
+Non-volatile scalar loads are ordinary pure value producers. Volatile loads,
+stores, aggregate memory operations, bitfield stores, atomics, fences, calls,
+inline asm, and relevant intrinsics are observable. Optimizations may remove or
+reorder memory operations only when these flags and alias facts make that legal.
+
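The observability rule above can be sketched as a small classifier. This is a hedged illustration: the enum numbering, the `SkMemAccess` fields, the `SK_MEM_VOLATILE` flag, and `sk_is_observable` are hypothetical, and only a handful of ops are listed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the real MemAccess encoding is not shown here. */
enum { SK_MEM_VOLATILE = 1 << 0 };

/* Op names follow this document; the numbering is made up for the sketch. */
typedef enum {
    IR_BINOP, IR_LOAD, IR_STORE, IR_AGG_COPY, IR_BITFIELD_STORE,
    IR_ATOMIC_LOAD, IR_FENCE, IR_CALL, IR_ASM_BLOCK
} SkOp;

typedef struct { uint32_t size; uint32_t align; uint32_t flags; } SkMemAccess;

/* True when the op must not be removed or reordered without further proof. */
bool sk_is_observable(SkOp op, const SkMemAccess *mem) {
    switch (op) {
    case IR_BINOP:
        return false; /* pure value op, no memory access */
    case IR_LOAD:
        assert(mem != NULL);
        return (mem->flags & SK_MEM_VOLATILE) != 0;
    default:
        /* stores, aggregate ops, bitfield stores, atomics, fences,
           calls, and inline asm are treated as observable */
        return true;
    }
}
```

A dead-code pass could use such a predicate to decide whether an instruction with an unused destination may be turned into `IR_NOP`.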
+### Control Flow
+
+The base IR keeps CG's control-flow model: labels, explicit unstructured
+branches, structured scopes, returns, tail transfers, switches, computed gotos,
+and terminating intrinsics. Unstructured control-flow ops name labels directly.
+Structured control-flow ops name scope handles whose metadata records the
+associated break, continue, else, and end labels.
+
+Consumers that need CFG form derive blocks and successor edges from labels,
+structured-scope metadata, terminators, and lexical instruction order. If
+control can continue at the next instruction, the op is not a terminator.
+
+## Operation Semantics
+
+### Administrative Ops
+
+- `IR_NOP`: no effect. Used as a deletion marker.
+
+Parameters are function/local declarations, not executable base-IR operations.
+The function descriptor and local table identify each parameter local, its
+source/debug metadata, type, index, and ABI incoming shape.
+
+### Data Movement
+
+- `IR_LOAD_IMM`: assign destination local from an integer-like immediate bit
+  pattern in `extra.imm`. This covers null pointers, integer constants, bools,
+  and small immediates that fit the immediate field.
+- `IR_LOAD_CONST`: assign destination local from target ABI bytes in
+  `extra.cbytes`. This covers constants whose representation is byte-oriented
+  rather than integer-immediate-oriented, such as floating constants, i128,
+  f128/f80, and other fixed-size constants that should be preserved exactly.
+- `IR_COPY`: assign destination local from source local.
+- `IR_LOAD`: load a scalar value from a local/global/indirect address using
+  `extra.mem` into destination local.
+- `IR_STORE`: store a scalar local or immediate to a local/global/indirect
+  address using `extra.mem`.
+- `IR_ADDR_OF`: materialize the address of a local/global/indirect lvalue.
+- `IR_TLS_ADDR_OF`: materialize the address of a thread-local object for the
+  current thread, using target TLS semantics. This remains a separate op rather
+  than a flag on `IR_ADDR_OF` because TLS address materialization can require
+  target-selected model logic, relocations, helper calls, or thread-pointer
+  arithmetic; it is not just an address-space property of an ordinary lvalue.
+- `IR_AGG_COPY`: copy a fixed-size aggregate byte range from `src` to `dst`.
+- `IR_AGG_SET`: set a fixed-size aggregate byte range at `dst` to a byte value.
+- `IR_BITFIELD_LOAD`: load and extract a bitfield from a record storage unit.
+- `IR_BITFIELD_STORE`: insert a bitfield into a record storage unit.
+
+`IR_LOAD`, `IR_STORE`, aggregate ops, and bitfield ops use target layout facts
+already encoded in their operands and auxiliary records.
+
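As an illustration of what `IR_BITFIELD_LOAD` and `IR_BITFIELD_STORE` compute once the storage unit has been read, here is a minimal sketch. It assumes bit positions are counted from the least significant bit of the storage-unit value; the function names and parameters are hypothetical, and the real ops take their bit range from the auxiliary record.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `width` bits set, for 1 <= width <= 64. */
uint64_t sk_low_mask(unsigned width) {
    assert(width >= 1 && width <= 64);
    return width == 64 ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1;
}

/* Extract `width` bits starting at bit `lsb`; sign-extend when is_signed. */
uint64_t sk_bitfield_extract(uint64_t unit, unsigned lsb, unsigned width, int is_signed) {
    assert(lsb + width <= 64);
    uint64_t mask = sk_low_mask(width);
    uint64_t v = (unit >> lsb) & mask;
    if (is_signed && (v >> (width - 1)) != 0)
        v |= ~mask; /* replicate the sign bit into the upper bits */
    return v;
}

/* Return `unit` with the field at [lsb, lsb + width) replaced by `value`. */
uint64_t sk_bitfield_insert(uint64_t unit, unsigned lsb, unsigned width, uint64_t value) {
    assert(lsb + width <= 64);
    uint64_t mask = sk_low_mask(width);
    return (unit & ~(mask << lsb)) | ((value & mask) << lsb);
}
```

Bits outside the field are preserved by the insert, which is why a bitfield store is a read-modify-write of the whole storage unit.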
+### Arithmetic and Conversions
+
+- `IR_BINOP`: integer or floating binary operation. The operation tag is a
+  `BinOp` in `extra.imm`; operands are `dst, a, b`.
+- `IR_UNOP`: unary operation. The operation tag is a `UnOp` in `extra.imm`;
+  operands are `dst, a`.
+- `IR_CMP`: compare operation. The comparison tag is a `CmpOp` in `extra.imm`;
+  operands are `dst, a, b`; result is an i1/bool value.
+- `IR_CONVERT`: conversion operation. The conversion tag is a `ConvKind` in
+  `extra.imm`; operands are `dst, src`.
+
+Integer types are width-only. Signedness is carried by op variants such as
+signed divide, signed compare, sign-extension, and signed integer/float
+conversion.
+
+### Calls and Returns
+
+- `IR_CALL`: call a direct or indirect callee described by `IRCallAux`.
+- `IR_RET`: return the optional ABI value described by `IRRetAux`.
+
+Calls preserve the full `CGCallDesc`: function type, ABI classification,
+callee operand, argument ABI values, result ABI value, tail-call flag, and
+inline policy. Direct calls use an `OPK_GLOBAL` callee. Any other callee
+operand kind denotes an indirect call.
+
+Tail calls are represented by `IR_CALL`, not by `IR_RET`. A normal call has a
+local continuation and may be followed by `IR_RET` if the caller returns the
+call result. A required or selected tail transfer is a terminating `IR_CALL`
+with no local successor and no following `IR_RET`.
+
+`CGABIValue` may describe a scalar value, an indirect/byval/sret address, or
+multiple ABI-decomposed parts. The IR records enough information for replay,
+optimization, or interpretation without re-running frontend type checking.
+
+### Branching
+
+- `IR_BR`: unconditional branch to one target label.
+- `IR_CONDBR`: branch on a bool local to true and false target labels.
+- `IR_CMP_BRANCH`: fused compare-and-branch; operands are `a, b`, comparison
+  tag is `CmpOp`, and targets are taken and fallthrough labels.
+- `IR_SWITCH`: branch on selector to matching case label, else default label.
+- `IR_LOAD_LABEL_ADDR`: assign a local the opaque address of a
+  function-local label.
+- `IR_INDIRECT_BRANCH`: branch to a label address, constrained to the closed
+  target set recorded in `IRIndirectAux`.
+
+Label addresses are function-local opaque values. They may be compared, stored,
+loaded, selected, and consumed by `IR_INDIRECT_BRANCH` in the same function
+activation; they are not callable function pointers and are not dereferenceable
+data pointers.
+
+### Structured Control
+
+- `IR_SCOPE_BEGIN`: begin a structured block, loop, or if scope. `IRScopeAux`
+  records target scope id and associated labels.
+- `IR_SCOPE_ELSE`: transition to the else arm for an if scope.
+- `IR_SCOPE_END`: close a structured scope.
+- `IR_BREAK_TO`: transfer to a scope's break target.
+- `IR_CONTINUE_TO`: transfer to a scope's continue target.
+
+Structured ops exist so backends that can express structure, such as a C-source
+or future wasm target, can replay it. Native CFG consumers may lower them to
+ordinary labels and branches.
+
+### Stack Allocation and Variadics
+
+- `IR_ALLOCA`: dynamic stack allocation. Operands are `dst, size`;
+  `extra.imm` is required alignment.
+- `IR_VA_START`: initialize a target ABI vararg state at `ap`.
+- `IR_VA_ARG`: read the next vararg value of type `extra.aux` into `dst`.
+- `IR_VA_END`: end a vararg state.
+- `IR_VA_COPY`: copy one vararg state to another.
+
+These model target calling-convention variadics, not language-level rest
+parameters.
+
+### Atomics
+
+- `IR_ATOMIC_LOAD`: atomic load from `addr` into `dst`.
+- `IR_ATOMIC_STORE`: atomic store from `src` to `addr`.
+- `IR_ATOMIC_RMW`: atomic read-modify-write; defines the prior value.
+- `IR_ATOMIC_CAS`: compare-and-swap; defines prior value and success bool.
+- `IR_FENCE`: memory fence with `MemOrder` in `extra.imm`.
+
+Atomic accesses carry both `MemAccess` and memory-order metadata. They are
+observable and must preserve the ordering required by the memory model.
+
+### Inline Assembly
+
+- `IR_ASM_BLOCK`: one inline assembly block with template, constraints,
+  clobbers, input operands, and output operands.
+
+Constraint strings remain target-specific. Optimization may inspect clobbers
+and operands but must preserve the asm block unless it can prove the source
+contract permits removal. Current consumers treat inline asm conservatively.
+
+### Intrinsics
+
+- `IR_INTRINSIC`: compiler intrinsic identified by `IntrinKind`, with explicit
+  destination and argument operands.
+
+Intrinsic semantics depend on the kind:
+
+- Bit operations: popcount, ctz, clz, bswap.
+- Memory helpers: memcpy, memmove, memset, prefetch, assume-aligned.
+- Hints/control: expect, unreachable, trap.
+- Non-local control: setjmp, longjmp.
+- Checked arithmetic: signed/unsigned add, subtract, multiply with overflow.
+
+Some intrinsics are pure value producers with destinations, some are observable
+side effects, and some are terminators. Consumers must classify by
+`IntrinKind`, not by `IR_INTRINSIC` alone.
+
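Classification by `IntrinKind` might look like the sketch below. The enum members, the `SkEffect` categories, and the specific assignment of kinds to categories are assumptions made for illustration; this document names the intrinsic families but does not state the class of each kind, so unknown or memory-touching kinds are treated conservatively as observable.

```c
#include <assert.h>

/* Hypothetical subset of intrinsic kinds, one or two per family. */
typedef enum {
    SK_INTRIN_POPCOUNT, SK_INTRIN_BSWAP,     /* bit operations */
    SK_INTRIN_MEMCPY, SK_INTRIN_PREFETCH,    /* memory helpers */
    SK_INTRIN_EXPECT,                        /* hint */
    SK_INTRIN_TRAP, SK_INTRIN_UNREACHABLE,   /* control */
    SK_INTRIN_SETJMP, SK_INTRIN_LONGJMP,     /* non-local control */
    SK_INTRIN_SADD_OVERFLOW,                 /* checked arithmetic */
    SK_INTRIN_COUNT
} SkIntrinKind;

typedef enum { SK_EFFECT_PURE, SK_EFFECT_OBSERVABLE, SK_EFFECT_TERMINATOR } SkEffect;

/* Classify an intrinsic by kind rather than by IR_INTRINSIC alone. */
SkEffect sk_intrin_effect(SkIntrinKind kind) {
    assert((int)kind < (int)SK_INTRIN_COUNT);
    switch (kind) {
    case SK_INTRIN_POPCOUNT:
    case SK_INTRIN_BSWAP:
    case SK_INTRIN_EXPECT:
    case SK_INTRIN_SADD_OVERFLOW:
        return SK_EFFECT_PURE;        /* value producers with destinations */
    case SK_INTRIN_TRAP:
    case SK_INTRIN_UNREACHABLE:
    case SK_INTRIN_LONGJMP:
        return SK_EFFECT_TERMINATOR;  /* control does not reach the next op */
    default:
        return SK_EFFECT_OBSERVABLE;  /* conservative fallback */
    }
}
```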
+### Optimizer Extensions
+
+Optimizer-owned views may add ops that are not part of the base IR API:
+
+- `IR_PARAM_DECL`: implementation artifact used by the current opt recorder to
+  place a definition for register-backed parameter locals in the entry stream.
+  Base IR consumers should get parameter information from the function's
+  parameter/local declarations instead.
+- `IR_CONST_I`: SSA integer constant. Recording uses `IR_LOAD_IMM`.
+- `IR_CONST_BYTES`: SSA byte constant. Recording uses `IR_LOAD_CONST`.
+- `IR_PHI`: SSA merge for a derived CFG block.
+
+These ops should not leak into replay or interpretation unless that consumer
+explicitly opts into the optimizer's SSA view.
+
+## Invariants
+
+- Local id zero, label zero, symbol zero, and related `*_NONE` constants are
+  sentinels.
+- A local has exactly one declared type for the whole function.
+- Every destination and source local must be declared before use.
+- A control-transfer op's target labels must name labels in the same function,
+  except for ordinary call targets represented as symbols or pointer values.
+- A terminating op ends the current linear control path. Any instruction that
+  follows it is reachable only through a label.
+- Source locations are sticky at CG recording time and stored per instruction.
+- Data layout facts are already target-selected; consumers must not reinterpret
+  record or bitfield layout for another target.
+
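Two of these invariants lend themselves to a mechanical check. The sketch below is hypothetical: `SkInst`, its fixed two-source layout, and `sk_first_invalid` are invented for illustration, and "declared" is approximated as "the id falls inside the function's local or label table", with id 0 accepted as the sentinel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced instruction record. Id 0 means "none". */
typedef struct {
    uint32_t dst;     /* destination local, or 0 */
    uint32_t src[2];  /* source locals, 0 when unused */
    uint32_t target;  /* branch target label, or 0 */
} SkInst;

/*
 * Return the index of the first instruction that names a local outside
 * 1..nlocals or a label outside 1..nlabels, or -1 when all are valid.
 */
long sk_first_invalid(const SkInst *insts, size_t n, uint32_t nlocals, uint32_t nlabels) {
    assert(insts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const SkInst *in = &insts[i];
        if (in->dst > nlocals || in->src[0] > nlocals ||
            in->src[1] > nlocals || in->target > nlabels)
            return (long)i;
    }
    return -1;
}
```

A recorder or optimizer could run a check of this kind in debug builds after each transformation.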
+## Consumer Guidance
+
+Optimization may transform the IR as long as it preserves target-data-layout
+semantics, memory observability, ABI-shaped calls/returns, and CFG validity.
+
+Backend replay may either emit each op directly to a `CGTarget` or run a
+target-prep pipeline first. Native targets generally need machinization,
+liveness, register allocation, and final replay. Source-like targets may prefer
+direct replay.
+
+Interpreted execution should use the pre-machinize IR:
+
+- Maintain activation storage for typed IR locals.
+- Represent address-taken locals, aggregate locals, and globals as
+  byte-addressable target-layout memory.
+- Execute control transfers by label, or by a derived CFG block id in an
+  interpreter-owned view.
+- Treat label addresses as opaque function-local label tokens.
+- Implement interpreted-to-interpreted calls from retained `Func` bodies.
+- Route external calls through an explicit host-helper/FFI layer rather than
+  lowering the interpreter to machine ABI internals.
+
+This keeps interpretation aligned with CG semantics while avoiding native-code
+emission details.
+
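The interpreter guidance above can be sketched as a small dispatch loop. This is a minimal illustration under stated assumptions, not cfree's design: `SkInst`, `SkFunc`, the `SK_ADD`/`SK_SUB`/`SK_SLT` tags, and the label table that maps a label id to an instruction index are all hypothetical, every local is a 64-bit scalar, and memory, calls, and aggregates are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Op names follow this document; the numbering is made up for the sketch. */
typedef enum { IR_NOP, IR_LOAD_IMM, IR_COPY, IR_BINOP, IR_CMP,
               IR_BR, IR_CONDBR, IR_RET } SkOp;
typedef enum { SK_ADD, SK_SUB } SkBinOp;  /* hypothetical BinOp tags */
typedef enum { SK_SLT } SkCmpOp;          /* hypothetical signed less-than tag */

typedef struct {
    SkOp op;
    uint32_t dst, a, b;        /* local ids, 0 when unused */
    int64_t imm;               /* immediate value or operation tag */
    uint32_t t_true, t_false;  /* label ids for branches */
} SkInst;

typedef struct {
    const SkInst *insts;
    size_t ninsts;
    const size_t *label_pos;   /* label id -> instruction index */
} SkFunc;

/* Execute f over caller-provided activation storage and return its result. */
int64_t sk_interpret(const SkFunc *f, int64_t *locals) {
    size_t pc = 0;
    while (pc < f->ninsts) {
        const SkInst *in = &f->insts[pc++];
        switch (in->op) {
        case IR_NOP:
            break;
        case IR_LOAD_IMM:
            locals[in->dst] = in->imm;
            break;
        case IR_COPY:
            locals[in->dst] = locals[in->a];
            break;
        case IR_BINOP: {
            /* unsigned arithmetic keeps wraparound well defined in C */
            uint64_t x = (uint64_t)locals[in->a], y = (uint64_t)locals[in->b];
            locals[in->dst] = (int64_t)(in->imm == SK_ADD ? x + y : x - y);
            break;
        }
        case IR_CMP:  /* only SK_SLT is modeled */
            locals[in->dst] = locals[in->a] < locals[in->b];
            break;
        case IR_BR:
            pc = f->label_pos[in->t_true];
            break;
        case IR_CONDBR:
            pc = f->label_pos[locals[in->a] ? in->t_true : in->t_false];
            break;
        case IR_RET:
            return locals[in->a];
        }
    }
    assert(0 && "fell off the end without IR_RET");
    return 0;
}
```

Control transfers go through the label table, so the instruction stream itself needs no block structure. A fuller interpreter would add byte-addressable memory for address-taken and aggregate locals and dispatch `IR_CALL` to retained function bodies.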
+## Implementation Plan
+
+The migration should be a clean cutover of the semantic `CGTarget` interface,
+not an incremental layering of another IR beside the current target API.
+
+The O0 path must remain direct:
+
+```text
+CfreeCg -> semantic CGTarget
+             |-> native direct target   (O0)
+             |-> C source direct target (--emit=c)
+             |-> IR recorder            (O1/O2/interpreter)
+```
+
+Direct targets implement the semantic interface and emit immediately. The IR
+recorder implements the same interface and stores the clean IR. This preserves
+O0 compile-time behavior while giving optimized and interpreted paths a stable
+recorded form.
+
+### Phase 1: Cut Over `CGTarget`
+
+Update the internal target interface to match the clean IR model:
+
+- Use one typed mutable local namespace for scalar temporaries, source locals,
+  parameters, aggregate homes, and call results.
+- Make parameter information part of function/local declarations, not an
+  executable target op.
+- Remove `OPK_REG` from the semantic target surface. Register allocation is a
+  native-lowering concern.
+- Keep labels in the base interface. Blocks remain derived consumer views.
+- Keep `IR_ADDR_OF`, `IR_TLS_ADDR_OF`, and `IR_LOAD_LABEL_ADDR` distinct.
+- Keep ABI-shaped call descriptors, including decomposed arguments and returns.
+- Keep aggregate, bitfield, atomic, inline asm, intrinsic, and structured
+  control operations at the semantic level.
+
+Native O0 targets may still map locals immediately to registers, frame homes,
+or target-private storage. Taking the address of a local must force or require
+a concrete home; frontends/CG should continue marking known address-taken and
+memory-required locals so O0 does not need avoidable late repair.
+
+### Phase 2: Update Direct Backends
+
+Port the native and C-source targets to the new semantic interface.
+
+Native targets should keep their current emission strategy:
+
+- Map non-address-taken scalar locals to backend temporaries/registers.
+- Map aggregate and address-taken locals to frame or equivalent storage.
+- Materialize global function/object addresses with ordinary `addr_of`.
+- Materialize TLS addresses through the dedicated TLS op.
+- Materialize function-local label addresses through the label-address op.
+- Lower calls directly from ABI-shaped descriptors.
+
+The C-source target should become simpler: typed locals become C temporaries or
+opaque aggregate storage, records/arrays remain raw-byte aggregate typedefs, and
+enums/aliases stay lowered to storage types except in source/debug metadata.
+
+### Phase 3: Update O1 Optimizer
+
+Make the O1 path record clean IR first, then derive the current backend-oriented
+view needed by the existing O1 machinery:
+
+```text
+clean IR
+  -> derive CFG from labels and terminators
+  -> classify locals: scalar, address-taken, aggregate, ABI home
+  -> lower locals to virtual registers or frame/storage objects
+  -> lower calls to backend call plans where useful
+  -> run existing O1 passes
+  -> replay into native target
+```
+
+This up-front conversion lets the existing O1 pipeline survive the cutover:
+CFG cleanup, local simplify, machinize, liveness, dead-def elimination,
+register allocation, combine, and emit can continue operating on an internal
+MIR-like view. That view may still use virtual registers, frame slots, phis, and
+block arrays because it is optimizer-private.
+
+O2 can be ignored during the first cut. Once O1 is stable, O2 can either start
+from clean IR or reuse the same O1-derived view and then add SSA/inlining back
+on top.
+
+### Phase 4: Add Interpreter
+
+Add interpreted execution after the clean recorder and O1 conversion are stable.
+
+The interpreter should execute the clean pre-machinize IR:
+
+- Allocate activation storage for typed locals.
+- Use byte-addressable memory for aggregates, address-taken locals, globals,
+  TLS instances, and dynamic allocas.
+- Execute labels directly or through an interpreter-owned CFG view.
+- Treat label addresses as opaque function-local tokens.
+- Dispatch interpreted-to-interpreted calls through retained clean IR function
+  bodies.
+- Route external calls through an explicit host-helper or FFI layer.
+
+The interpreter should not consume the native MIR/regalloc view. That keeps it
+aligned with language/CG semantics rather than backend emission details.
