kit

git clone https://git.ryansepassi.com/git/kit.git

commit 6e35c2a0243005dccb9ca712b84de24705e4d0f6
parent 6a3230be6f5368231610c029cd4076aa97a4d98b
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 25 May 2026 13:05:03 -0700

doc: describe lowered-CG IR

Diffstat:
A doc/IR.md | 490 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 490 insertions(+), 0 deletions(-)

diff --git a/doc/IR.md b/doc/IR.md
@@ -0,0 +1,490 @@
+# IR
+
+This document defines the target shape for cfree's lowered-CG IR: the
+function-level representation recorded from the internal `CGTarget` interface
+and consumed by optimization, backend replay, and future non-JIT interpreted
+execution.
+
+The intended boundary is a shared recorded-CG IR layer. Frontends keep emitting
+public `CfreeCg` calls, `CfreeCg` lowers stack/lvalue source operations into
+`CGTarget` calls, and this IR records the resulting typed locals, labels,
+control-flow operations, memory operations, and ABI-shaped call operations.
+Optimizers and backends may attach private side tables or lowered views, but
+those are not part of the shared IR contract.
+
+## Pipeline Position
+
+The IR sits below the public CG API and above target machinization:
+
+```text
+frontend -> CfreeCg -> recording CGTarget -> lowered-CG IR
+                                             |-> optimizer-derived CFG/SSA/MIR views
+                                             |-> interpreter
+                                             |-> replay into native/C/wasm target
+```
+
+The IR is target-data-layout-specific: type sizes, alignments, record field
+offsets, bitfield positions, ABI classifications, and pointer widths are
+already selected for the compiler target. It is not target-instruction-specific
+until `opt_machinize` or an equivalent backend-prep pass runs.
+
+An interpreter should execute this pre-machinize IR. Post-machinize and
+post-register-allocation forms contain hard registers, spill slots, scratch
+policies, call plans, and backend emission constraints; those are not semantic
+execution concepts.
+
+## Types
+
+Every IR local has a `CfreeCgTypeId`. IR value types are target-selected CG
+storage types, not frontend AST types:
+
+- `void`: absence of a value.
+- `bool`: i1 condition and compare result.
+- integers: width-only i8, i16, i32, i64, and i128. Signedness is carried by
+  operations, comparisons, conversions, and ABI attributes.
+- floats: f32, f64, f80, and f128.
+  `f80` is needed for x87-style extended
+  precision targets; targets that do not support it reject or lower it through
+  runtime helpers.
+- pointers: pointer-sized values with an address space.
+- function pointers: pointer values whose pointee type is a function type.
+- aggregates: opaque object types with size and alignment. Source arrays and
+  records lower to aggregate storage at this level; field identity is already
+  gone and record layout has become byte offsets, bit ranges, and aggregate
+  sizes.
+- `vararg_state`: target ABI vararg state object, accessed through addressable
+  storage.
+
+Enums and aliases are not distinct IR value types. They lower to their storage
+type before reaching this layer. Frontends and debug metadata may retain source
+identity separately.
+
+ABI values may decompose one source type into multiple storage parts for
+argument passing and returns. The IR records that ABI shape at call and return
+sites, while ordinary value ops remain typed by their CG value type.
+
+## Representation
+
+### Functions
+
+A `Func` is one function body. It owns the semantic IR needed to execute or
+replay that body:
+
+- The preserved `CGFuncDesc`, including the function symbol, type, ABI
+  classification, source location, and function attributes.
+- A linear instruction stream with labels and explicit control-transfer ops.
+- Typed IR locals for parameters, source locals, compiler temporaries, aggregate
+  homes, call results, and dynamic-allocation handles.
+- Optional source/debug metadata attached to function, local, label, and
+  instruction records.
+
+The IR contract should not expose optimizer state such as SSA construction
+tables, block arrays, dominance, liveness, register allocation, hard-register
+metadata, or pass scratch. Optimizers may derive and cache those views
+privately.
+
+There is one local namespace at this level. A local is a mutable typed location.
+Operations define destination locals and read source locals.
+Taking the address
+of a local with `IR_ADDR_OF` makes it addressable; backends may then home it in
+a frame slot, static storage, interpreter activation memory, or another
+target-specific location. An aggregate local is the way to model fixed-size
+local storage independent of scalar registers.
+
+### Labels and Derived Blocks
+
+Labels are part of the shared IR contract because they are exposed by CG-level
+control-flow operations: branch targets, switch targets, label addresses, and
+computed-goto valid target sets all name labels.
+
+Basic blocks are a derived view, not the base IR API. An optimizer or
+interpreter may split the instruction stream at labels and terminators to build
+a CFG with predecessor/successor lists. That CFG may cache layout order,
+fallthrough edges, dominance, and block-local analysis, but those belong to the
+consumer's view.
+
+### Instructions
+
+An `Inst` has:
+
+- `op`: an `IROp`.
+- `loc`: the sticky source location active when the instruction was recorded.
+- Destination locals, if the op produces values.
+- Source operands.
+- `extra`: immediate, constant bytes, memory access, or op-specific auxiliary
+  data.
+
+Destination arity is op-specific. Each destination names a typed IR local, so
+the instruction does not need an independent result type. Multi-result ops such
+as calls, compare-and-swap, and checked-arithmetic intrinsics list multiple
+destination locals.
+
+Most IR ops correspond one-to-one with a `CGTarget` method. SSA-only helpers
+such as phi nodes are optimizer-internal extensions, not base IR ops.
+
+### Locals and Addresses
+
+The IR has one mutable local namespace.
+
+Scalar locals hold scalar values. Aggregate locals hold opaque bytes with target
+size and alignment. A local may be used as an ordinary value, assigned by value
+operations, or addressed by `IR_ADDR_OF`. Address-taken locals are lowered by
+consumers to concrete storage.
+Non-address-taken scalar locals may remain in
+registers, SSA values, interpreter slots, or other consumer-owned storage.
+
+Function pointers and ordinary object pointers are produced by address
+materialization. A direct function declaration is an object symbol with function
+type; materializing `&fn + addend` is `IR_ADDR_OF` over a global symbol operand,
+and direct calls may also carry that global symbol directly as the callee.
+Indirect calls use a local containing a function-pointer value.
+
+Function-local goto labels are a different pointer-like value. `IR_LOAD_LABEL_ADDR`
+materializes an opaque label token into a local. The token may be stored,
+loaded, compared, selected, and consumed by `IR_INDIRECT_BRANCH` inside the
+same function activation. It is not a function pointer, not callable, and not
+dereferenceable as data. Static dispatch tables use the data equivalent of the
+same operation: a label-address data relocation tied to the containing function.
+
+### Operands
+
+IR uses the internal `Operand` shape:
+
+- `OPK_IMM`: signed immediate bit pattern.
+- `OPK_LOCAL`: typed IR local.
+- `OPK_GLOBAL`: object symbol plus addend address.
+- `OPK_INDIRECT`: base pointer local plus optional index local, scale, and
+  offset.
+
+There is no distinct `OPK_REG` in the base IR. Register-like temporaries are IR
+locals. Optimizers may derive SSA values or machine virtual registers as private
+views, but those are not the API-level operand model.
+
+### Memory
+
+Memory accesses carry `MemAccess`:
+
+- Codegen type and access size.
+- Known alignment.
+- Volatile, atomic, restrict, readonly, writeonly, and unaligned flags.
+- Address space.
+- Alias root when known.
+
+Non-volatile scalar loads are ordinary pure value producers. Volatile loads,
+stores, aggregate memory operations, bitfield stores, atomics, fences, calls,
+inline asm, and relevant intrinsics are observable.
+Optimizations may remove or
+reorder memory operations only when these flags and alias facts make that legal.
+
+### Control Flow
+
+The base IR keeps CG's control-flow model: labels, explicit unstructured
+branches, structured scopes, returns, tail transfers, switches, computed gotos,
+and terminating intrinsics. Unstructured control-flow ops name labels directly.
+Structured control-flow ops name scope handles whose metadata records the
+associated break, continue, else, and end labels.
+
+Consumers that need CFG form derive blocks and successor edges from labels,
+structured-scope metadata, terminators, and lexical instruction order. If
+control can continue at the next instruction, the op is not a terminator.
+
+## Operation Semantics
+
+### Administrative Ops
+
+- `IR_NOP`: no effect. Used as a deletion marker.
+
+Parameters are function/local declarations, not executable base-IR operations.
+The function descriptor and local table identify each parameter local, its
+source/debug metadata, type, index, and ABI incoming shape.
+
+### Data Movement
+
+- `IR_LOAD_IMM`: assign destination local from an integer-like immediate bit
+  pattern in `extra.imm`. This covers null pointers, integer constants, bools,
+  and small immediates that fit the immediate field.
+- `IR_LOAD_CONST`: assign destination local from target ABI bytes in
+  `extra.cbytes`. This covers constants whose representation is byte-oriented
+  rather than integer-immediate-oriented, such as floating constants, i128,
+  f128/f80, and other fixed-size constants that should be preserved exactly.
+- `IR_COPY`: assign destination local from source local.
+- `IR_LOAD`: load a scalar value from a local/global/indirect address using
+  `extra.mem` into destination local.
+- `IR_STORE`: store a scalar local or immediate to a local/global/indirect
+  address using `extra.mem`.
+- `IR_ADDR_OF`: materialize the address of a local/global/indirect lvalue.
+- `IR_TLS_ADDR_OF`: materialize the address of a thread-local object for the
+  current thread, using target TLS semantics. This remains a separate op rather
+  than a flag on `IR_ADDR_OF` because TLS address materialization can require
+  target-selected model logic, relocations, helper calls, or thread-pointer
+  arithmetic; it is not just an address-space property of an ordinary lvalue.
+- `IR_AGG_COPY`: copy a fixed-size aggregate byte range from `src` to `dst`.
+- `IR_AGG_SET`: set a fixed-size aggregate byte range at `dst` to a byte value.
+- `IR_BITFIELD_LOAD`: load and extract a bitfield from a record storage unit.
+- `IR_BITFIELD_STORE`: insert a bitfield into a record storage unit.
+
+`IR_LOAD`, `IR_STORE`, aggregate ops, and bitfield ops use target layout facts
+already encoded in their operands and auxiliary records.
+
+### Arithmetic and Conversions
+
+- `IR_BINOP`: integer or floating binary operation. The operation tag is a
+  `BinOp` in `extra.imm`; operands are `dst, a, b`.
+- `IR_UNOP`: unary operation. The operation tag is a `UnOp` in `extra.imm`;
+  operands are `dst, a`.
+- `IR_CMP`: compare operation. The comparison tag is a `CmpOp` in `extra.imm`;
+  operands are `dst, a, b`; result is an i1/bool value.
+- `IR_CONVERT`: conversion operation. The conversion tag is a `ConvKind` in
+  `extra.imm`; operands are `dst, src`.
+
+Integer types are width-only. Signedness is carried by op variants such as
+signed divide, signed compare, sign-extension, and signed integer/float
+conversion.
+
+### Calls and Returns
+
+- `IR_CALL`: call a direct or indirect callee described by `IRCallAux`.
+- `IR_RET`: return the optional ABI value described by `IRRetAux`.
+
+Calls preserve the full `CGCallDesc`: function type, ABI classification,
+callee operand, argument ABI values, result ABI value, tail-call flag, and
+inline policy. Direct calls use an `OPK_GLOBAL` callee. Other callee operands
+are indirect calls.
+
+Tail calls are represented by `IR_CALL`, not by `IR_RET`. A normal call has a
+local continuation and may be followed by `IR_RET` if the caller returns the
+call result. A required or selected tail transfer is a terminating `IR_CALL`
+with no local successor and no following `IR_RET`.
+
+`CGABIValue` may describe a scalar value, an indirect/byval/sret address, or
+multiple ABI-decomposed parts. The IR records enough information for replay,
+optimization, or interpretation without re-running frontend type checking.
+
+### Branching
+
+- `IR_BR`: unconditional branch to one target label.
+- `IR_CONDBR`: branch on a bool local to true and false target labels.
+- `IR_CMP_BRANCH`: fused compare-and-branch; operands are `a, b`, comparison
+  tag is `CmpOp`, and targets are taken and fallthrough labels.
+- `IR_SWITCH`: branch on selector to matching case label, else default label.
+- `IR_LOAD_LABEL_ADDR`: assign a local the opaque address of a
+  function-local label.
+- `IR_INDIRECT_BRANCH`: branch to a label address, constrained to the closed
+  target set recorded in `IRIndirectAux`.
+
+Label addresses are function-local opaque values. They may be compared, stored,
+loaded, selected, and consumed by `IR_INDIRECT_BRANCH` in the same function
+activation; they are not callable function pointers and are not dereferenceable
+data pointers.
+
+### Structured Control
+
+- `IR_SCOPE_BEGIN`: begin a structured block, loop, or if scope. `IRScopeAux`
+  records target scope id and associated labels.
+- `IR_SCOPE_ELSE`: transition to the else arm for an if scope.
+- `IR_SCOPE_END`: close a structured scope.
+- `IR_BREAK_TO`: transfer to a scope's break target.
+- `IR_CONTINUE_TO`: transfer to a scope's continue target.
+
+Structured ops exist so backends that can express structure, such as a C-source
+or future wasm target, can replay it. Native CFG consumers may lower them to
+ordinary labels and branches.
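
The lowering that a native CFG consumer might apply to `IR_BREAK_TO` and `IR_CONTINUE_TO` can be illustrated with a minimal sketch. The `Inst` and `ScopeInfo` layouts, their field names, and the `lower_structured` helper are hypothetical and exist only for this example; the real `IRScopeAux` record is not shown in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t LabelId;   /* 0 is treated as the "none" sentinel */

/* Hypothetical op subset; only the names come from the document. */
typedef enum { IR_NOP, IR_BR, IR_BREAK_TO, IR_CONTINUE_TO } IROp;

/* Hypothetical per-scope metadata (assumed fields). */
typedef struct {
    LabelId break_label;
    LabelId continue_label;  /* 0 for scopes that are not loops */
} ScopeInfo;

/* Hypothetical instruction record (assumed fields). */
typedef struct {
    IROp     op;
    uint32_t scope;   /* index into the scope table for structured ops */
    LabelId  target;  /* branch target once the op is an IR_BR */
} Inst;

/* Rewrite structured transfers into plain IR_BR; returns how many changed. */
static size_t lower_structured(Inst *insts, size_t n,
                               const ScopeInfo *scopes, size_t nscopes) {
    size_t rewritten = 0;
    for (size_t i = 0; i < n; i++) {
        Inst *in = &insts[i];
        if (in->op != IR_BREAK_TO && in->op != IR_CONTINUE_TO)
            continue;
        assert(in->scope < nscopes);
        const ScopeInfo *s = &scopes[in->scope];
        LabelId t = (in->op == IR_BREAK_TO) ? s->break_label
                                            : s->continue_label;
        assert(t != 0);  /* a continue needs a scope with a continue label */
        in->op = IR_BR;
        in->target = t;
        rewritten++;
    }
    return rewritten;
}
```

A structure-preserving consumer, such as a C-source target, would skip this pass and replay the scope ops as they were recorded.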
+
+### Stack Allocation and Variadics
+
+- `IR_ALLOCA`: dynamic stack allocation. Operands are `dst, size`;
+  `extra.imm` is required alignment.
+- `IR_VA_START`: initialize a target ABI vararg state at `ap`.
+- `IR_VA_ARG`: read the next vararg value of type `extra.aux` into `dst`.
+- `IR_VA_END`: end a vararg state.
+- `IR_VA_COPY`: copy one vararg state to another.
+
+These model target calling-convention variadics, not language-level rest
+parameters.
+
+### Atomics
+
+- `IR_ATOMIC_LOAD`: atomic load from `addr` into `dst`.
+- `IR_ATOMIC_STORE`: atomic store from `src` to `addr`.
+- `IR_ATOMIC_RMW`: atomic read-modify-write; defines the prior value.
+- `IR_ATOMIC_CAS`: compare-and-swap; defines prior value and success bool.
+- `IR_FENCE`: memory fence with `MemOrder` in `extra.imm`.
+
+Atomic accesses carry both `MemAccess` and memory-order metadata. They are
+observable and must preserve the ordering required by the memory model.
+
+### Inline Assembly
+
+- `IR_ASM_BLOCK`: one inline assembly block with template, constraints,
+  clobbers, input operands, and output operands.
+
+Constraint strings remain target-specific. Optimization may inspect clobbers
+and operands but must preserve the asm block unless it can prove the source
+contract permits removal. Current consumers treat inline asm conservatively.
+
+### Intrinsics
+
+- `IR_INTRINSIC`: compiler intrinsic identified by `IntrinKind`, with explicit
+  destination and argument operands.
+
+Intrinsic semantics depend on the kind:
+
+- Bit operations: popcount, ctz, clz, bswap.
+- Memory helpers: memcpy, memmove, memset, prefetch, assume-aligned.
+- Hints/control: expect, unreachable, trap.
+- Non-local control: setjmp, longjmp.
+- Checked arithmetic: signed/unsigned add, subtract, multiply with overflow.
+
+Some intrinsics are pure value producers with destinations, some are observable
+side effects, and some are terminators. Consumers must classify by
+`IntrinKind`, not by `IR_INTRINSIC` alone.
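
As a rough illustration of classifying by kind, the sketch below maps a few intrinsic kinds to an effect class. The `INTRIN_*` enumerator names, the `IntrinEffect` enum, and the specific assignments are assumptions made for this example, not cfree's actual tables; in particular, treating `longjmp` as a terminator and `setjmp` as merely observable is one plausible reading of the text above.

```c
#include <assert.h>

/* Hypothetical enumerators; the document names the kinds only in prose. */
typedef enum {
    INTRIN_POPCOUNT, INTRIN_CTZ, INTRIN_CLZ, INTRIN_BSWAP,
    INTRIN_MEMCPY, INTRIN_MEMMOVE, INTRIN_MEMSET, INTRIN_PREFETCH,
    INTRIN_EXPECT, INTRIN_UNREACHABLE, INTRIN_TRAP,
    INTRIN_SETJMP, INTRIN_LONGJMP,
    INTRIN_ADD_OVERFLOW
} IntrinKind;

/* Hypothetical effect classes matching the three cases in the text. */
typedef enum { EFFECT_PURE, EFFECT_OBSERVABLE, EFFECT_TERMINATOR } IntrinEffect;

static IntrinEffect classify_intrinsic(IntrinKind k) {
    switch (k) {
    case INTRIN_POPCOUNT:
    case INTRIN_CTZ:
    case INTRIN_CLZ:
    case INTRIN_BSWAP:
    case INTRIN_EXPECT:
    case INTRIN_ADD_OVERFLOW:
        return EFFECT_PURE;         /* value producers with destinations */
    case INTRIN_UNREACHABLE:
    case INTRIN_TRAP:
    case INTRIN_LONGJMP:
        return EFFECT_TERMINATOR;   /* control never reaches the next op */
    case INTRIN_MEMCPY:
    case INTRIN_MEMMOVE:
    case INTRIN_MEMSET:
    case INTRIN_PREFETCH:
    case INTRIN_SETJMP:
        return EFFECT_OBSERVABLE;   /* must not be dropped or reordered freely */
    }
    return EFFECT_OBSERVABLE;       /* conservative default for unknown kinds */
}
```

Defaulting unknown kinds to the observable class keeps a consumer safe when new intrinsics are added before its table is updated.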
+
+### Optimizer Extensions
+
+Optimizer-owned views may add ops that are not part of the base IR API:
+
+- `IR_PARAM_DECL`: implementation artifact used by the current opt recorder to
+  place a definition for register-backed parameter locals in the entry stream.
+  Base IR consumers should get parameter information from the function's
+  parameter/local declarations instead.
+- `IR_CONST_I`: SSA integer constant. Recording uses `IR_LOAD_IMM`.
+- `IR_CONST_BYTES`: SSA byte constant. Recording uses `IR_LOAD_CONST`.
+- `IR_PHI`: SSA merge for a derived CFG block.
+
+These ops should not leak into replay or interpretation unless that consumer
+explicitly opts into the optimizer's SSA view.
+
+## Invariants
+
+- Local id zero, label zero, symbol zero, and related `*_NONE` constants are
+  sentinels.
+- A local has exactly one declared type for the whole function.
+- Every destination and source local must be declared before use.
+- A control-transfer op's target labels must name labels in the same function,
+  except for ordinary call targets represented as symbols or pointer values.
+- A terminating op ends the current linear control path. Any following
+  reachable instruction must be made reachable through a label.
+- Source locations are sticky at CG recording time and stored per instruction.
+- Data layout facts are already target-selected; consumers must not reinterpret
+  record or bitfield layout for another target.
+
+## Consumer Guidance
+
+Optimization may transform the IR as long as it preserves target-data-layout
+semantics, memory observability, ABI-shaped calls/returns, and CFG validity.
+
+Backend replay may either emit each op directly to a `CGTarget` or run a
+target-prep pipeline first. Native targets generally need machinization,
+liveness, register allocation, and final replay. Source-like targets may prefer
+direct replay.
+
+Interpreted execution should use the pre-machinize IR:
+
+- Maintain activation storage for typed IR locals.
+- Represent address-taken locals, aggregate locals, and globals as
+  byte-addressable target-layout memory.
+- Execute control transfers by label, or by a derived CFG block id in an
+  interpreter-owned view.
+- Treat label addresses as opaque function-local label tokens.
+- Implement interpreted-to-interpreted calls from retained `Func` bodies.
+- Route external calls through an explicit host-helper/FFI layer rather than
+  lowering the interpreter to machine ABI internals.
+
+This keeps interpretation aligned with CG semantics while avoiding native-code
+emission details.
+
+## Implementation Plan
+
+The migration should be a clean cutover of the semantic `CgTarget` interface,
+not an incremental layering of another IR beside the current target API.
+
+The O0 path must remain direct:
+
+```text
+CfreeCg -> semantic CgTarget
+           |-> native direct target (O0)
+           |-> C source direct target (--emit=c)
+           |-> IR recorder (O1/O2/interpreter)
+```
+
+Direct targets implement the semantic interface and emit immediately. The IR
+recorder implements the same interface and stores the clean IR. This preserves
+O0 compile-time behavior while giving optimized and interpreted paths a stable
+recorded form.
+
+### Phase 1: Cut Over `CgTarget`
+
+Update the internal target interface to match the clean IR model:
+
+- Use one typed mutable local namespace for scalar temporaries, source locals,
+  parameters, aggregate homes, and call results.
+- Make parameter information part of function/local declarations, not an
+  executable target op.
+- Remove `OPK_REG` from the semantic target surface. Register allocation is a
+  native-lowering concern.
+- Keep labels in the base interface. Blocks remain derived consumer views.
+- Keep `IR_ADDR_OF`, `IR_TLS_ADDR_OF`, and `IR_LOAD_LABEL_ADDR` distinct.
+- Keep ABI-shaped call descriptors, including decomposed arguments and returns.
+- Keep aggregate, bitfield, atomic, inline asm, intrinsic, and structured
+  control operations at the semantic level.
+
+Native O0 targets may still map locals immediately to registers, frame homes,
+or target-private storage. Taking the address of a local must force or require
+a concrete home; frontends/CG should continue marking known address-taken and
+memory-required locals so O0 does not need avoidable late repair.
+
+### Phase 2: Update Direct Backends
+
+Port the native and C-source targets to the new semantic interface.
+
+Native targets should keep their current emission strategy:
+
+- Map non-address-taken scalar locals to backend temporaries/registers.
+- Map aggregate and address-taken locals to frame or equivalent storage.
+- Materialize global function/object addresses with ordinary `addr_of`.
+- Materialize TLS addresses through the dedicated TLS op.
+- Materialize function-local label addresses through the label-address op.
+- Lower calls directly from ABI-shaped descriptors.
+
+The C-source target should become simpler: typed locals become C temporaries or
+opaque aggregate storage, records/arrays remain raw-byte aggregate typedefs, and
+enums/aliases stay lowered to storage types except in source/debug metadata.
+
+### Phase 3: Update O1 Optimizer
+
+Make the O1 path record clean IR first, then derive the current backend-oriented
+view needed by the existing O1 machinery:
+
+```text
+clean IR
+  -> derive CFG from labels and terminators
+  -> classify locals: scalar, address-taken, aggregate, ABI home
+  -> lower locals to virtual registers or frame/storage objects
+  -> lower calls to backend call plans where useful
+  -> run existing O1 passes
+  -> replay into native target
+```
+
+This front conversion lets the existing O1 pipeline survive the cutover:
+CFG cleanup, local simplify, machinize, liveness, dead-def elimination,
+register allocation, combine, and emit can continue operating on an internal
+MIR-like view. That view may still use virtual registers, frame slots, phis, and
+block arrays because it is optimizer-private.
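
The first step of that conversion, deriving block boundaries from labels and terminators, could look roughly like the sketch below. It assumes a hypothetical `IR_LABEL` marker op in the instruction stream (this document does not name the op that places a label) and a reduced op set; `split_blocks` and `is_terminator` are illustrative helpers, not part of cfree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced, partly hypothetical op set: IR_LABEL is an assumed marker op. */
typedef enum { IR_NOP, IR_LABEL, IR_COPY, IR_BR, IR_CONDBR, IR_RET } IROp;

/* An op is a terminator when control cannot continue at the next op. */
static bool is_terminator(IROp op) {
    return op == IR_BR || op == IR_CONDBR || op == IR_RET;
}

/*
 * Record the start index of every derived block in starts[] and return the
 * block count. A block starts at the first instruction, at every label, and
 * at any instruction that follows a terminator.
 */
static size_t split_blocks(const IROp *ops, size_t n,
                           size_t *starts, size_t cap) {
    size_t count = 0;
    bool begin = true;  /* the first instruction always opens a block */
    for (size_t i = 0; i < n; i++) {
        if (begin || ops[i] == IR_LABEL) {
            assert(count < cap);
            starts[count++] = i;
        }
        begin = is_terminator(ops[i]);
    }
    return count;
}
```

An instruction that follows a terminator without a label still opens a block here; per the invariants it is unreachable, so a later CFG cleanup pass could discard it.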
+
+O2 can be ignored during the first cut. Once O1 is stable, O2 can either start
+from clean IR or reuse the same O1-derived view and then add SSA/inlining back
+on top.
+
+### Phase 4: Add Interpreter
+
+Add interpreted execution after the clean recorder and O1 conversion are stable.
+
+The interpreter should execute the clean pre-machinize IR:
+
+- Allocate activation storage for typed locals.
+- Use byte-addressable memory for aggregates, address-taken locals, globals,
+  TLS instances, and dynamic allocas.
+- Execute labels directly or through an interpreter-owned CFG view.
+- Treat label addresses as opaque function-local tokens.
+- Dispatch interpreted-to-interpreted calls through retained clean IR function
+  bodies.
+- Route external calls through an explicit host-helper or FFI layer.
+
+The interpreter should not consume the native MIR/regalloc view. That keeps it
+aligned with language/CG semantics rather than backend emission details.
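
To make the label-driven execution model concrete, here is a minimal sketch of an interpreter loop over a toy op set. The `MI_*` ops, the `Inst` layout, and `run` are hypothetical simplifications (for example, `MI_ADD` stands in for `IR_BINOP` with an add tag, and `MI_BR_LT` for `IR_CMP_BRANCH` with a signed less-than tag); only scalar 64-bit locals are modelled, with no memory, calls, or ABI handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_LOCALS = 8 };

/* Toy op set for illustration only. */
typedef enum { MI_LOAD_IMM, MI_ADD, MI_BR_LT, MI_BR, MI_RET } MiniOp;

typedef struct {
    MiniOp   op;
    uint32_t dst, a, b;  /* local ids; id 0 is left unused as a sentinel */
    int64_t  imm;        /* immediate for MI_LOAD_IMM */
    uint32_t taken;      /* target label id for MI_BR_LT and MI_BR */
} Inst;

/*
 * Execute one activation. label_pc[label] is the instruction index the label
 * refers to. Returns the value of the local named by MI_RET, or 0 if control
 * runs off the end of the stream.
 */
static int64_t run(const Inst *code, size_t n,
                   const size_t *label_pc, size_t nlabels) {
    int64_t locals[MAX_LOCALS] = {0};  /* activation storage for locals */
    size_t pc = 0;
    while (pc < n) {
        const Inst *in = &code[pc];
        assert(in->dst < MAX_LOCALS && in->a < MAX_LOCALS && in->b < MAX_LOCALS);
        switch (in->op) {
        case MI_LOAD_IMM:
            locals[in->dst] = in->imm;
            pc++;
            break;
        case MI_ADD:
            locals[in->dst] = locals[in->a] + locals[in->b];
            pc++;
            break;
        case MI_BR_LT:  /* taken branch goes to the label, else fall through */
            assert(in->taken < nlabels);
            pc = (locals[in->a] < locals[in->b]) ? label_pc[in->taken] : pc + 1;
            break;
        case MI_BR:
            assert(in->taken < nlabels);
            pc = label_pc[in->taken];
            break;
        case MI_RET:
            return locals[in->a];
        }
    }
    return 0;
}
```

Resolving labels through a side table keeps the instruction stream itself unchanged, which is one way an interpreter-owned view can coexist with the shared IR.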