kit

git clone https://git.ryansepassi.com/git/kit.git

commit d1144788d5d7e4b65ab69dff577f284420eefc8c
parent c2f9575135399104652d2be75470b74f696cf9a1
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 25 May 2026 16:52:32 -0700

CGTarget plan

Diffstat:
A doc/CGTARGET.md | 625 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 625 insertions(+), 0 deletions(-)

diff --git a/doc/CGTARGET.md b/doc/CGTARGET.md
@@ -0,0 +1,625 @@
+# CGTarget and NativeTarget
+
+This document describes the intended split between cfree's semantic codegen
+target interface and the native backend emission interface. It complements
+`doc/IR.md`: that document defines the clean recorded-CG IR, while this one
+defines how `CfreeCg`, direct `-O0` emission, and optimized native emission
+fit around that IR.
+
+## Goals
+
+- Keep a fast direct `-O0` path that does not record or optimize IR.
+- Give `CfreeCg` a clean semantic target interface that maps directly to the
+  planned lowered-CG IR.
+- Keep hard registers, spill slots, call plans, prologue sizing, liveness, and
+  register allocation out of the semantic interface.
+- Let native architectures share implementation helpers between direct `-O0`
+  emission and optimized post-regalloc emission without making them the same
+  public vtable.
+
+## Current Problem
+
+The current internal `CGTarget` serves two different levels at once.
+
+First, `CfreeCg` drives it as a semantic sink. The value-stack layer lowers
+public CG operations into target calls such as loads, stores, arithmetic,
+labels, branches, calls, returns, atomics, varargs, and inline assembly.
+
+Second, the optimizer also records through a `CGTarget`, optimizes the recorded
+function, performs backend preparation and register allocation, then replays
+the lowered result into a native `CGTarget`. That final replay uses hard
+registers, frame slots, spill/reload hooks, call plans, and backend register
+metadata.
+
+Those are different contracts. The first is a target-data-layout-specific
+semantic interface. The second is a machine-emission interface after
+machinization and register allocation. Combining them forces the shared target
+API to expose both IR concepts and backend-private lowering state.
+
+## Proposed Layering
+
+The intended layering is:
+
+```text
+frontend -> CfreeCg/value stack -> semantic CGTarget
+                                   |-> direct O0 native/C target/WASM
+                                   |-> IR recorder -> clean IR -> optimizer
+                                                               -> NativeTarget
+```
+
+Native architectures may implement both downstream interfaces:
+
+- a semantic `CGTarget` for direct `-O0` emission;
+- a `NativeTarget` or native emitter for optimized post-regalloc emission.
+
+The two implementations can share instruction encoders, ABI helpers, frame
+layout code, relocation helpers, inline-asm parsers, and debug/unwind helpers.
+They should not share one vtable contract, because their input operands and
+phase assumptions are different.
+
+## Semantic CGTarget
+
+The semantic `CGTarget` is the interface driven by `CfreeCg` and implemented by
+both direct targets and the IR recorder. It speaks in terms that can be
+recorded directly as clean lowered-CG IR:
+
+- typed semantic locals;
+- immediates, globals, locals, and indirect addresses;
+- labels and structured scopes;
+- target-data-layout-specific memory accesses;
+- ABI-shaped calls and returns;
+- aggregate, bitfield, atomic, vararg, intrinsic, and inline-asm operations;
+- sticky source locations.
+
+It should not expose optimizer or native emission state such as CFG blocks,
+SSA, hard registers, physical register files, liveness, frame slots, spill
+slots, call plans, scratch-register policy, or prologue/epilogue patching.
+
+The semantic value namespace is one mutable local namespace. Parameters, source
+locals, compiler temporaries, aggregate homes, call results, and alloca results
+are all `CGLocal` ids allocated by the target:
+
+```c
+typedef u32 CGLocal;
+#define CG_LOCAL_NONE 0u
+```
+
+Semantic operands should not contain hard registers:
+
+```text
+OPK_IMM       signed immediate bit pattern
+OPK_LOCAL     typed semantic local
+OPK_GLOBAL    object symbol plus addend address
+OPK_INDIRECT  base local plus optional index local, scale, and offset
+```
+
+The semantic API still includes operations such as:
+
+- `local` and `param`;
+- `load_imm`, `load_const`, `copy`, `load`, `store`, `addr_of`,
+  `tls_addr_of`, aggregate copies/sets, and bitfield operations;
+- `binop`, `unop`, `cmp`, and `convert`;
+- labels, branches, switches, label-address materialization, indirect
+  branches, and structured scopes;
+- `call` and `ret` using ABI-shaped descriptors;
+- `alloca_`, `va_*`, atomics, fences, intrinsics, inline asm, and source
+  location tracking.
+
+The semantic API should not include:
+
+- `FrameSlot`;
+- `CGKnownFrameDesc`;
+- `CGCallPlan`;
+- spill/reload hooks;
+- hard-register discovery or reservation hooks;
+- call-plan emission hooks;
+- inline-asm register-name resolution as a target-wide semantic operation.
+
+Those belong to native lowering or native emission.
+
+## Direct O0 Native Target
+
+The direct `-O0` path remains:
+
+```text
+frontend -> CfreeCg/value stack -> semantic CGTarget -> native arch emits
+```
+
+No IR recording is required. A native semantic target receives semantic locals
+and operations, maps them to target-private storage, and emits machine code
+immediately.
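
One way to picture "maps them to target-private storage" is a table from `CGLocal` id to a frame offset. The sketch below is an illustration under stated assumptions, not cfree code: `FrameHomes`, `frame_home_alloc`, `MAX_LOCALS`, and the layout that grows downward from a frame pointer are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef u32 CGLocal;            /* as defined in the document */
#define CG_LOCAL_NONE 0u

#define MAX_LOCALS 64u          /* hypothetical fixed capacity for the sketch */

/* Hypothetical per-function table: one frame home per semantic local. */
typedef struct FrameHomes {
  int32_t offset[MAX_LOCALS];   /* frame-pointer-relative offset per local id */
  u32 nlocals;                  /* ids 1..nlocals are in use; 0 is CG_LOCAL_NONE */
  u32 frame_size;               /* bytes reserved below the frame pointer so far */
} FrameHomes;

/* Round n up to a power-of-two alignment. */
static u32 align_up(u32 n, u32 align) {
  return (n + align - 1u) & ~(align - 1u);
}

/* Allocate a new local id and reserve an aligned slot for it in the frame. */
static CGLocal frame_home_alloc(FrameHomes *f, u32 size, u32 align) {
  assert(align != 0 && (align & (align - 1u)) == 0); /* power of two */
  assert(f->nlocals + 1u < MAX_LOCALS);
  f->frame_size = align_up(f->frame_size + size, align);
  CGLocal id = ++f->nlocals;
  f->offset[id] = -(int32_t)f->frame_size;           /* slot starts here */
  return id;
}
```

With a zero-initialized `FrameHomes`, allocating a 4-byte local and then an 8-byte local places them at offsets -4 and -16. Nothing in the semantic API would see these offsets, which is the point of keeping storage target-private.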
+
+The simplest correct direct target can give every local a frame home and use
+scratch registers per instruction:
+
+```text
+load_imm dst, 42
+  -> move 42 to scratch
+  -> store scratch to dst's frame home
+
+binop dst, a, b
+  -> load a into scratch0
+  -> load b into scratch1
+  -> emit the operation
+  -> store the result to dst's frame home
+```
+
+That baseline is intentionally conservative. It needs no liveness, CFG, SSA,
+or global register allocation.
+
+A faster direct target can add a small per-function local register cache. Each
+semantic local has target-private state:
+
+```c
+typedef struct NativeLocal {
+  CfreeCgTypeId type;
+  u32 size;
+  u32 align;
+  u32 flags;
+
+  FrameSlot home;
+  Reg reg;
+  u8 cls;
+  u8 dirty;
+  u8 address_taken;
+  u8 memory_required;
+} NativeLocal;
+```
+
+The direct target then uses local greedy helpers:
+
+```c
+Reg materialize(NativeCgTarget *, Operand op, RegClass cls);
+Reg ensure_writable_reg(NativeCgTarget *, CGLocal dst, RegClass cls);
+void flush_local(NativeCgTarget *, CGLocal local);
+void spill_one(NativeCgTarget *, RegClass cls);
+```
+
+The cache policy can be simple:
+
+- Keep `reg_owner[reg] = CGLocal` per register class.
+- Prefer a free allocable register.
+- Otherwise spill a non-pinned register to its frame home.
+- Pin source and scratch registers for the duration of one instruction.
+- Mark destination locals dirty when their cached register has newer contents
+  than memory.
+
+The direct target must flush conservatively when memory may observe cached
+locals:
+
+- before calls unless the call lowering can prove a local need not be saved;
+- before volatile or atomic memory operations;
+- before inline asm with a memory clobber;
+- before operations that may observe address-taken locals.
+
+This remains a local register cache, not a real allocator. The semantic API
+does not promise where a local lives, so a target can choose the frame-only
+baseline first and add caching later.
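
The cache policy and flush rules above could be sketched roughly as follows. This is a toy model under assumptions, not the planned implementation: `RegCache`, `cache_acquire`, `cache_unpin_all`, `cache_flush_all`, and the four-register class are hypothetical, and a counter stands in for emitting real store instructions. A real target would also emit a load when a local enters the cache for reading.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint8_t u8;
typedef u32 CGLocal;
#define CG_LOCAL_NONE 0u

#define NREGS 4u          /* hypothetical allocable registers in one class */
#define REG_NONE 0xffu

typedef struct RegCache {
  CGLocal reg_owner[NREGS]; /* which local each register currently holds */
  u8 dirty[NREGS];          /* register is newer than the frame home */
  u8 pinned[NREGS];         /* in use by the instruction being emitted */
  u32 stores_emitted;       /* stand-in for emitted spill stores */
} RegCache;

/* Write back a dirty register (counted, not emitted) and free it. */
static void cache_spill(RegCache *c, u32 r) {
  if (c->reg_owner[r] != CG_LOCAL_NONE && c->dirty[r]) c->stores_emitted++;
  c->reg_owner[r] = CG_LOCAL_NONE;
  c->dirty[r] = 0;
}

/* Greedy choice: already cached, else a free register, else spill a
 * non-pinned victim. The chosen register is pinned for this instruction. */
static u32 cache_acquire(RegCache *c, CGLocal local, int will_write) {
  u32 r, pick = REG_NONE;
  for (r = 0; r < NREGS; r++)
    if (c->reg_owner[r] == local) { pick = r; break; }
  if (pick == REG_NONE)
    for (r = 0; r < NREGS; r++)
      if (c->reg_owner[r] == CG_LOCAL_NONE && !c->pinned[r]) { pick = r; break; }
  if (pick == REG_NONE)
    for (r = 0; r < NREGS; r++)
      if (!c->pinned[r]) { cache_spill(c, r); pick = r; break; }
  assert(pick != REG_NONE); /* more pinned operands than registers */
  c->reg_owner[pick] = local;
  c->pinned[pick] = 1;
  if (will_write) c->dirty[pick] = 1;
  return pick;
}

/* Called at the end of each instruction. */
static void cache_unpin_all(RegCache *c) {
  for (u32 r = 0; r < NREGS; r++) c->pinned[r] = 0;
}

/* Conservative flush, e.g. before a call or a volatile access. */
static void cache_flush_all(RegCache *c) {
  for (u32 r = 0; r < NREGS; r++) cache_spill(c, r);
}
```

In this model, writing five distinct locals through a four-register class costs one spill store when the fifth local arrives, and a later `cache_flush_all` writes back the remaining four dirty registers.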
+
+## Optimized O1+ Native Target
+
+The optimized path records clean semantic IR first:
+
+```text
+CfreeCg/value stack -> recording CGTarget -> clean IR
+```
+
+Optimization then derives private views from that clean IR:
+
+```text
+clean IR
+  -> CFG/SSA and semantic optimization
+  -> native MIR or backend-prep form
+  -> liveness and register allocation
+  -> final machine locations
+```
+
+Once the optimizer has assigned hard registers and spill slots, it should not
+replay through the semantic `CGTarget`. At that point the representation is no
+longer semantic lowered-CG IR. It is a native backend form, so final emission
+should drive a `NativeTarget` or native emitter.
+
+The optimizer-private representation may still use hard registers, spill slots,
+frame slots, call plans, block arrays, phis, dominance, liveness, and other
+backend-prep metadata. Those are derived views, not part of the semantic
+target contract.
+
+## Unified IR Container
+
+The semantic IR and machine IR can share one representation substrate without
+sharing one semantic contract.
+
+The useful unification is at the container and infrastructure level:
+
+- one function container;
+- one linear instruction stream and/or derived block representation;
+- one label namespace;
+- one source-location model;
+- one operand storage shape;
+- one aux-payload allocation strategy;
+- shared dump, walk, rewrite, and verification infrastructure.
+
+The phases remain distinct:
+
+```c
+typedef enum IRPhase {
+  IR_PHASE_SEMANTIC,
+  IR_PHASE_MIR,
+  IR_PHASE_ALLOCATED_MIR,
+} IRPhase;
+```
+
+The same `Func`/`Inst` storage can carry different phase-specific op and
+operand subsets. Not every operand kind is legal in every phase:
+
+```text
+semantic IR:
+  OPK_IMM, OPK_LOCAL, OPK_GLOBAL, OPK_INDIRECT
+
+MIR before register allocation:
+  OPK_IMM, OPK_VREG, OPK_GLOBAL, OPK_FRAME_SLOT, OPK_MACH_ADDR
+
+MIR after register allocation:
+  OPK_IMM, OPK_HARD_REG, OPK_FRAME_SLOT, OPK_STACK_SLOT, OPK_MACH_ADDR
+```
+
+Similarly, some operations are semantic-only, some are MIR-only, and some are
+shared control-flow operations:
+
+```text
+semantic-only examples:
+  IR_CALL, IR_RET, IR_SCOPE_BEGIN, IR_VA_ARG, IR_AGG_COPY
+
+MIR-only examples:
+  IR_MACH_CALL, IR_SPILL, IR_RELOAD, selected two-address ops
+
+shared examples:
+  IR_NOP, IR_BR, IR_CMP_BRANCH, IR_SWITCH, IR_LOAD_LABEL_ADDR
+```
+
+The lowering should not preserve an opcode name merely because the shape looks
+similar. A semantic `IR_LOAD` carries source-level memory and type facts. A
+machine load carries selected addressing modes, register constraints, and
+instruction-emission constraints. If those semantics diverge, the MIR phase
+should use a distinct op even though both live in the same `Inst` container.
+
+This model lets optimization lower uniformly:
+
+```text
+semantic Func
+  -> semantic cleanup
+  -> MIR Func using the same storage conventions
+  -> allocated MIR Func
+  -> NativeTarget emission
+```
+
+The guardrail is phase-specific verification:
+
+```c
+void ir_verify_semantic(Func *);
+void ir_verify_mir(Func *);
+void ir_verify_allocated_mir(Func *);
+```
+
+The verifier enforces legal opcodes, operand kinds, aux payloads, block/label
+rules, and phase invariants. This keeps the representation compact and shared
+without letting machine-only concepts leak back into the semantic `CGTarget`
+contract.
+
+## NativeTarget Surface
+
+`NativeTarget` is the post-machinize, post-regalloc interface. It speaks final
+machine locations and selected/native operations:
+
+```text
+MIR_LOC_REG     hard physical register
+MIR_LOC_STACK   frame, spill, or outgoing stack slot
+MIR_LOC_IMM     immediate
+MIR_LOC_GLOBAL  symbol plus addend
+MIR_LOC_ADDR    final addressing mode
+```
+
+A compact native emission surface can be shaped around final MIR records:
+
+```c
+typedef struct NativeTarget NativeTarget;
+
+struct NativeTarget {
+  const NativeRegInfo *regs;
+
+  void (*func_begin_known_frame)(NativeTarget *, const CGFuncDesc *,
+                                 const NativeFrameDesc *);
+  void (*emit)(NativeTarget *, const MIRInst *);
+  void (*func_end)(NativeTarget *);
+
+  void (*plan_call)(NativeTarget *, const CGCallDesc *, NativeCallPlan *);
+};
+```
+
+This interface owns the machine-level concerns removed from semantic
+`CGTarget`:
+
+- concrete frame and spill slots;
+- known-frame layout and max outgoing call area;
+- callee-save reservation and prologue/epilogue patching;
+- hard-register operands and final addressing modes;
+- spill/reload insertion or emission;
+- selected two-address and arch-specific instruction forms;
+- direct, indirect, and tail call emission after ABI routing;
+- CFI and unwind emission;
+- inline-asm constraint binding and clobber handling.
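
A driver for such a surface might look like the sketch below, which walks an allocated-MIR array and calls the three emission hooks in order. The payload structs are placeholders because the document does not define them, `regs` and `plan_call` are omitted for brevity, and `native_emit_func`, `CountingTarget`, and `counting_target` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Placeholder payloads: the real definitions are not given in the document. */
typedef struct CGFuncDesc { const char *name; } CGFuncDesc;
typedef struct NativeFrameDesc { u32 frame_size; } NativeFrameDesc;
typedef struct MIRInst { u32 op; } MIRInst;

typedef struct NativeTarget NativeTarget;
struct NativeTarget { /* subset of the vtable shown in the document */
  void (*func_begin_known_frame)(NativeTarget *, const CGFuncDesc *,
                                 const NativeFrameDesc *);
  void (*emit)(NativeTarget *, const MIRInst *);
  void (*func_end)(NativeTarget *);
};

/* Hypothetical driver: the frame is already known, so emission is one pass. */
static void native_emit_func(NativeTarget *t, const CGFuncDesc *fd,
                             const NativeFrameDesc *frame,
                             const MIRInst *insts, size_t n) {
  assert(t && t->func_begin_known_frame && t->emit && t->func_end);
  t->func_begin_known_frame(t, fd, frame);
  for (size_t i = 0; i < n; i++) t->emit(t, &insts[i]);
  t->func_end(t);
}

/* Mock target that only counts calls; `base` must stay the first member so
 * a NativeTarget pointer can be cast back to the containing struct. */
typedef struct CountingTarget {
  NativeTarget base;
  u32 begins, emits, ends;
} CountingTarget;

static void ct_begin(NativeTarget *t, const CGFuncDesc *fd,
                     const NativeFrameDesc *fr) {
  (void)fd; (void)fr;
  ((CountingTarget *)t)->begins++;
}
static void ct_emit(NativeTarget *t, const MIRInst *in) {
  (void)in;
  ((CountingTarget *)t)->emits++;
}
static void ct_end(NativeTarget *t) { ((CountingTarget *)t)->ends++; }

static CountingTarget counting_target(void) {
  CountingTarget ct = {{ct_begin, ct_emit, ct_end}, 0, 0, 0};
  return ct;
}
```

The driver never inspects operands. By this phase every location is final, so the target only has to encode what it is handed.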
+
+Static register-file metadata belongs in `NativeRegInfo`:
+
+```c
+typedef struct NativeRegClassInfo {
+  RegClass cls;
+
+  const Reg *allocable;
+  u32 nallocable;
+
+  const Reg *scratch;
+  u32 nscratch;
+
+  const CGPhysRegInfo *phys;
+  u32 nphys;
+
+  u32 caller_saved_mask;
+  u32 callee_saved_mask;
+  u32 arg_mask;
+  u32 ret_mask;
+  u32 reserved_mask;
+} NativeRegClassInfo;
+
+typedef struct NativeRegInfo {
+  const NativeRegClassInfo *classes;
+  u32 nclasses;
+
+  int (*resolve_name)(const NativeRegInfo *, Sym name, Reg *out,
+                      RegClass *cls_out);
+  const char *(*debug_name)(const NativeRegInfo *, RegClass cls, Reg reg);
+  u32 (*dwarf_reg)(const NativeRegInfo *, RegClass cls, Reg reg);
+} NativeRegInfo;
+```
+
+`resolve_name` belongs here when it is pure register-file metadata. If inline
+assembly dialects later affect name resolution, the callback can take a small
+dialect context.
+
+Call-specific answers should not be static register metadata when they depend
+on ABI, calling convention, variadic state, sret, vector ABI, or attributes.
+Those belong to native call planning:
+
+```text
+call_clobber_mask(call, class)
+return_locations(function ABI)
+argument register/stack routing
+tail-call stack routing
+```
+
+## CfreeCg Value Stack
+
+The value stack remains useful, but its role should be narrowed. It should be a
+public API adapter and semantic lowering layer, not a physical allocator.
+
+It still provides:
+
+- push/pop API state and validation;
+- expression-stack lowering;
+- lvalue/rvalue conversion;
+- aggregate, bitfield, call, switch, computed-goto, vararg, alloca, and inline
+  asm lowering;
+- construction of ABI-shaped `CGCallDesc` and `CGABIValue` records;
+- delayed semantic patterns such as delayed compares for branches;
+- a single diagnostic point for misuse of the public CG API;
+- a convenient frontend interface for simple non-C producers.
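
The "delayed compares for branches" item in the list above can be illustrated with a toy model. The kind names mirror the `SValueKind` enum this document proposes, but the extra `cmp_op`/`lhs`/`rhs` fields, the `OpCounts` mock, and the three helper functions are assumptions made for the sketch, not the planned `SValue` layout or API.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef u32 CGLocal;
#define CG_LOCAL_NONE 0u

typedef enum SValueKind {
  SV_IMM, SV_CONST, SV_LOCAL, SV_LVALUE, SV_DELAYED_CMP,
} SValueKind;

/* Hypothetical stack entry: only the fields this sketch needs. */
typedef struct SValue {
  SValueKind kind;
  CGLocal local;     /* valid for SV_LOCAL */
  u32 cmp_op;        /* valid for SV_DELAYED_CMP */
  CGLocal lhs, rhs;  /* valid for SV_DELAYED_CMP */
} SValue;

/* Mock semantic target that only counts which operations were requested. */
typedef struct OpCounts {
  u32 cmp, cmp_branch, branch_nonzero;
  CGLocal next_local;
} OpCounts;

/* A compare is pushed without asking the target for anything yet. */
static SValue push_cmp(u32 op, CGLocal lhs, CGLocal rhs) {
  SValue v = {SV_DELAYED_CMP, CG_LOCAL_NONE, op, lhs, rhs};
  return v;
}

/* Used as a value: the compare is materialized into a fresh local. */
static SValue materialize(OpCounts *t, SValue v) {
  assert(t != 0);
  if (v.kind != SV_DELAYED_CMP) return v;
  t->cmp++;
  SValue out = {SV_LOCAL, ++t->next_local, 0, CG_LOCAL_NONE, CG_LOCAL_NONE};
  return out;
}

/* Used as a branch condition: a delayed compare fuses into one cmp+branch. */
static void branch_if(OpCounts *t, SValue cond) {
  assert(t != 0);
  if (cond.kind == SV_DELAYED_CMP) t->cmp_branch++;
  else t->branch_nonzero++;
}
```

Branching directly on a pushed compare requests a single fused compare-and-branch, while using the same compare as a value first costs a `cmp` into a temporary local followed by a test of that local. No register or frame ownership is involved in either path.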
+
+It should stop owning:
+
+- hard-register allocation;
+- frame-slot allocation;
+- spill/reload policy;
+- caller-saved preservation;
+- backend scratch-register selection.
+
+Stack entries should describe semantic values:
+
+```c
+typedef enum SValueKind {
+  SV_IMM,
+  SV_CONST,
+  SV_LOCAL,
+  SV_LVALUE,
+  SV_DELAYED_CMP,
+} SValueKind;
+
+typedef struct SValue {
+  CfreeCgTypeId type;
+  u8 kind;
+  CGLocal local;
+  Operand addr;
+} SValue;
+```
+
+## Local and Lvalue Model
+
+`SV_LOCAL` is a computed rvalue stored in a semantic local. `SV_LVALUE` is an
+addressable storage location that may be loaded from or stored to.
+
+For a read:
+
+```text
+x
+  -> push SV_LVALUE(local x)
+
+lvalue conversion
+  -> tmp = target->local(i32 temporary)
+  -> target->load(tmp, local x, mem)
+  -> push SV_LOCAL(tmp)
+```
+
+For an assignment:
+
+```text
+x = y + 1
+  -> keep x as SV_LVALUE
+  -> compute y + 1 as SV_LOCAL
+  -> target->store(address of x, value local, mem)
+```
+
+For aggregates, the distinction is more important. Aggregate values often stay
+in addressable homes and move through `copy_bytes` rather than becoming scalar
+register values.
+
+## Migration Notes
+
+The migration should treat the semantic interface as the stable boundary and
+move machine concepts downward:
+
+1. Define semantic `CGLocal` and semantic `Operand` without `OPK_REG`.
+2. Move frame slots, call plans, hard-register metadata, and spill/reload hooks
+   out of semantic `CGTarget`.
+3. Convert `CfreeCg` stack entries from physical register/frame ownership to
+   semantic locals, lvalues, immediates, constants, and delayed compares.
+4. Implement a direct native semantic target with the frame-only baseline.
+5. Add a local register cache to direct targets only after correctness is
+   stable.
+6. Unify semantic IR, MIR, and allocated MIR around one `Func`/`Inst`
+   container where practical, guarded by phase-specific verification.
+7. Keep machine-only concepts phase-local while introducing a `NativeTarget`
+   emission boundary for post-regalloc output.
+
+The result is a direct `-O0` path that stays fast and simple, plus an optimized
+path whose final emission interface matches the data it actually has after
+register allocation.
+
+## Impact Surface
+
+The interface split touches every layer that currently sees `CGTarget`,
+`Operand`, `OPK_REG`, `FrameSlot`, `CGLocalStorage`, or `CGCallPlan`.
+
+The main surfaces are:
+
+- `src/arch/arch.h`: current shared definitions for semantic operations,
+  physical operands, frame slots, call plans, register metadata, `CGTarget`,
+  `CGBackend`, and `ArchImpl`.
+- `src/arch/cgtarget.c`: arch-agnostic constructor/finalize helpers and helper
+  lowering such as indexed-address folding that currently emits through
+  `OPK_REG`.
+- `src/arch/registry.c`: feature-gated backend registry. It currently returns
+  a `CGBackend` whose only construction hook is `make -> CGTarget`.
+- `src/arch/check_target.c`: check-only backend. It implements the full current
+  target vtable, including frame slots and register hooks.
+- `src/cg/*`: the public CG value-stack implementation. This is the largest
+  semantic migration because it currently owns value registers, spill slots,
+  frame lvalues, local storage, caller-saved preservation, and delayed
+  materialization into `OPK_REG`.
+- `src/opt/*`: current recorder, IR container, optimization passes, MIR
+  lowering, register allocation, and final `opt_emit` replay into a native
+  `CGTarget`.
+- `src/arch/{aa64,x64,rv64}/*`: native direct emitters. The split affects
+  allocation helpers, operation lowering, call lowering, prologue/epilogue
+  patching, inline asm, and opt coordination hooks.
+- `src/arch/{aa64,x64,rv64}/opt_coord.c`: current hard-register and call-plan
+  coordination hooks for the optimizer. These should move behind
+  `NativeRegInfo`, native call planning, and native MIR emission.
+- `src/arch/c_target/*`: source backend. It should implement semantic
+  `CGTarget`, not `NativeTarget`.
+- `src/arch/wasm/*`: wasm target and structurizer. It is closer to a semantic
+  structured target than native machine emitters, but it still currently sees
+  the combined `CGTarget`/`Operand` model.
+- `lang/c/parse/cg_adapter.c` and related parser integration: public
+  `CfreeCg` users. If the public `CfreeCg` API remains stable, these should
+  not need to change for the target split.
+- tests under `test/opt`, `test/arch`, `test/api`, `test/parse`, and smoke
+  harnesses: they include direct `CGTarget` construction, mock targets, opt IR
+  dumps, inline asm backend tests, and public CG tests.
+
+The public `include/cfree/cg.h` API does not need to expose the split. It can
+keep the push/pop CG interface while the internal value stack changes from
+physical allocation to semantic local lowering.
+
+## Existing Build Gating
+
+The repo already has coarse build gates in `include/cfree/config.h`, mirrored
+by `mk/config.mk` so `Makefile` drops matching source directories:
+
+- `CFREE_ARCH_AA64_ENABLED`
+- `CFREE_ARCH_X64_ENABLED`
+- `CFREE_ARCH_RV64_ENABLED`
+- `CFREE_ARCH_WASM_ENABLED`
+- `CFREE_ARCH_C_TARGET_ENABLED`
+- `CFREE_OBJ_*_ENABLED`
+- `CFREE_LANG_*_ENABLED`
+- `CFREE_OPT_ENABLED`
+
+The existing `CFREE_OPT_ENABLED=0` path is the first useful safety valve. It
+drops `src/opt/*`, filters arch `opt_coord.c`, and makes `CfreeCg` reject
+`opt_level > 0`. During the semantic `CGTarget` cutover, this allows work to
+start with direct `-O0` only.
+
+No new target-migration gates are needed. The migration should rely on the
+existing component gates to remove unported code from the build while one
+backend is brought forward.
+
+The public `CfreeCg` API is the boundary that keeps frontends insulated. If
+`CfreeCg` remains source-compatible, C, toy, wasm-language, and preprocessor
+frontends do not need to be disabled just because `CGTarget` changes. They only
+need disabling if their own source files directly include or depend on changed
+internal target details.
+
+`src/cg/*` is core codegen infrastructure and is not meaningfully optional for
+this migration. It must compile in every codegen-capable slice. The practical
+way to shrink the work is to disable unported consumers and implementers of the
+internal target interface:
+
+- disable `CFREE_OPT_ENABLED` to drop the optimizer recorder, MIR passes,
+  regalloc, final replay, and per-arch `opt_coord.c`;
+- disable all but one native `CFREE_ARCH_*_ENABLED` backend while porting the
+  new direct `-O0` target implementation;
+- optionally disable `CFREE_ARCH_C_TARGET_ENABLED` until the semantic source
+  backend is ported;
+- optionally disable `CFREE_ARCH_WASM_ENABLED` until the wasm backend is ported;
+- keep object-format gates narrow to the selected backend's required format
+  when possible.
+
+The smallest buildable slice should be:
+
+```text
+CFREE_OPT_ENABLED=0
+CFREE_ARCH_C_TARGET_ENABLED=0 or 1
+one native arch enabled, preferably the host arch
+only object formats required by that arch enabled
+```
+
+That slice only requires:
+
+- semantic `CGTarget`;
+- semantic `CfreeCg` value stack;
+- check-only target;
+- one direct native `-O0` target;
+- object writer and ABI support for that arch/object format.
+
+After that, re-enable components in this order:
+
+1. `check_target`: validates the semantic target shape without native emission.
+2. One direct native `-O0` target with frame-only local homes.
+3. The C-source target as a semantic-only backend.
+4. Local register caching in the direct native target.
+5. `CFREE_OPT_ENABLED`: clean IR recorder using the semantic target.
+6. Semantic optimizer passes that do not require native MIR/regalloc.
+7. MIR lowering and `NativeTarget` for one arch.
+8. Register allocation and allocated-MIR emission.
+9. Remaining native arches through their existing `CFREE_ARCH_*_ENABLED`
+   gates.
+10. Wasm and structurized/source-like targets through existing arch gates.
+
+Tests should follow the same gating:
+
+- direct API and parser codegen tests first with `opt_level=0`;
+- arch inline-asm tests only after the selected direct native target is ported;
+- `test-opt` only after the semantic recorder and verifier compile;
+- optimized codegen tests only after `NativeTarget` emission exists for the
+  selected arch;
+- smoke/link/debug tests last, because they exercise the whole backend,
+  object, linker, debug, and runtime pipeline.
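
The `CFREE_OPT_ENABLED=0` safety valve in this section amounts to a compile-time gate on the accepted optimization level. A minimal sketch of that check is below. `CFREE_OPT_ENABLED` is the gate named in the document, while `cg_opt_level_supported` and the fallback default of 0 are assumptions for illustration and not the real `CfreeCg` entry point.

```c
#include <assert.h>

/* Normally provided by include/cfree/config.h; defaulted here (an assumption)
 * so the sketch builds on its own as the -O0-only slice. */
#ifndef CFREE_OPT_ENABLED
#define CFREE_OPT_ENABLED 0
#endif

/* Hypothetical helper: returns 1 if this build can honor opt_level.
 * Level 0 is always available through the direct path; higher levels need
 * the optimizer to be compiled in. */
static int cg_opt_level_supported(int opt_level) {
  assert(opt_level >= 0); /* negative levels are a caller bug in this sketch */
  if (opt_level == 0) return 1;
  return CFREE_OPT_ENABLED != 0;
}
```

Tests gated on `opt_level=0` would pass in either configuration, while optimized codegen tests would be skipped or rejected until the gate is turned back on.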