CGTarget plan - kit

commit d1144788d5d7e4b65ab69dff577f284420eefc8c
parent c2f9575135399104652d2be75470b74f696cf9a1
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 25 May 2026 16:52:32 -0700

CGTarget plan

Diffstat:
A doc/CGTARGET.md  | 625 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

1 file changed, 625 insertions(+), 0 deletions(-)
diff --git a/doc/CGTARGET.md b/doc/CGTARGET.md
@@ -0,0 +1,625 @@
+# CGTarget and NativeTarget
+
+This document describes the intended split between cfree's semantic codegen
+target interface and the native backend emission interface. It complements
+`doc/IR.md`: that document defines the clean recorded-CG IR, while this one
+defines how `CfreeCg`, direct `-O0` emission, and optimized native emission
+fit around that IR.
+
+## Goals
+
+- Keep a fast direct `-O0` path that does not record or optimize IR.
+- Give `CfreeCg` a clean semantic target interface that maps directly to the
+  planned lowered-CG IR.
+- Keep hard registers, spill slots, call plans, prologue sizing, liveness, and
+  register allocation out of the semantic interface.
+- Let native architectures share implementation helpers between direct `-O0`
+  emission and optimized post-regalloc emission without making them the same
+  public vtable.
+
+## Current Problem
+
+The current internal `CGTarget` serves two different levels at once.
+
+First, `CfreeCg` drives it as a semantic sink. The value-stack layer lowers
+public CG operations into target calls such as loads, stores, arithmetic,
+labels, branches, calls, returns, atomics, varargs, and inline assembly.
+
+Second, the optimizer also records through a `CGTarget`, optimizes the recorded
+function, performs backend preparation and register allocation, then replays
+the lowered result into a native `CGTarget`. That final replay uses hard
+registers, frame slots, spill/reload hooks, call plans, and backend register
+metadata.
+
+Those are different contracts. The first is a target-data-layout-specific
+semantic interface. The second is a machine-emission interface after
+machinization and register allocation. Combining them forces the shared target
+API to expose both IR concepts and backend-private lowering state.
+
+## Proposed Layering
+
+The intended layering is:
+
+```text
+frontend -> CfreeCg/value stack -> semantic CGTarget
+                                  |-> direct O0 native/C target/WASM
+                                  |-> IR recorder -> clean IR -> optimizer
+                                                         -> NativeTarget
+```
+
+Native architectures may implement both downstream interfaces:
+
+- a semantic `CGTarget` for direct `-O0` emission;
+- a `NativeTarget` or native emitter for optimized post-regalloc emission.
+
+The two implementations can share instruction encoders, ABI helpers, frame
+layout code, relocation helpers, inline-asm parsers, and debug/unwind helpers.
+They should not share one vtable contract, because their input operands and
+phase assumptions are different.
+
+## Semantic CGTarget
+
+The semantic `CGTarget` is the interface driven by `CfreeCg` and implemented by
+both direct targets and the IR recorder. It speaks in terms that can be
+recorded directly as clean lowered-CG IR:
+
+- typed semantic locals;
+- immediates, globals, locals, and indirect addresses;
+- labels and structured scopes;
+- target-data-layout-specific memory accesses;
+- ABI-shaped calls and returns;
+- aggregate, bitfield, atomic, vararg, intrinsic, and inline-asm operations;
+- sticky source locations.
+
+It should not expose optimizer or native emission state such as CFG blocks,
+SSA, hard registers, physical register files, liveness, frame slots, spill
+slots, call plans, scratch-register policy, or prologue/epilogue patching.
+
+The semantic value namespace is one mutable local namespace. Parameters, source
+locals, compiler temporaries, aggregate homes, call results, and alloca results
+are all `CGLocal` ids allocated by the target:
+
+```c
+typedef u32 CGLocal;
+#define CG_LOCAL_NONE 0u
+```
+
+Semantic operands never name hard registers. The operand kinds are:
+
+```text
+OPK_IMM       signed immediate bit pattern
+OPK_LOCAL     typed semantic local
+OPK_GLOBAL    object symbol plus addend address
+OPK_INDIRECT  base local plus optional index local, scale, and offset
+```
+
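As a rough illustration of the operand kinds listed above, the following sketch models a semantic operand as a tagged union. The field names, the `u32`-as-`uint32_t` mapping, the symbol id type, and the helper functions (`op_imm`, `op_local`, `op_indirect`, `operand_uses_local`) are assumptions made for this example, not the layout the plan commits to. The point is only that no variant can carry a hard register.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t CGLocal;
#define CG_LOCAL_NONE 0u

typedef enum OperandKind {
  OPK_IMM,
  OPK_LOCAL,
  OPK_GLOBAL,
  OPK_INDIRECT,
} OperandKind;

/* Hypothetical layout; every field name here is an assumption. */
typedef struct Operand {
  OperandKind kind;
  union {
    int64_t imm;   /* OPK_IMM: signed immediate bit pattern */
    CGLocal local; /* OPK_LOCAL: typed semantic local */
    struct {
      uint32_t sym; /* assumed symbol id */
      int64_t addend;
    } global; /* OPK_GLOBAL */
    struct {
      CGLocal base;
      CGLocal index; /* CG_LOCAL_NONE when there is no index */
      uint8_t scale;
      int32_t offset;
    } indirect; /* OPK_INDIRECT */
  } u;
} Operand;

static Operand op_imm(int64_t v) {
  Operand op = {.kind = OPK_IMM, .u.imm = v};
  return op;
}

static Operand op_local(CGLocal l) {
  assert(l != CG_LOCAL_NONE);
  Operand op = {.kind = OPK_LOCAL, .u.local = l};
  return op;
}

static Operand op_indirect(CGLocal base, CGLocal index, uint8_t scale,
                           int32_t offset) {
  assert(base != CG_LOCAL_NONE);
  Operand op = {.kind = OPK_INDIRECT};
  op.u.indirect.base = base;
  op.u.indirect.index = index;
  op.u.indirect.scale = scale;
  op.u.indirect.offset = offset;
  return op;
}

/* Returns 1 when the operand reads the given semantic local. */
static int operand_uses_local(const Operand *op, CGLocal l) {
  assert(l != CG_LOCAL_NONE);
  switch (op->kind) {
  case OPK_LOCAL:
    return op->u.local == l;
  case OPK_INDIRECT:
    return op->u.indirect.base == l || op->u.indirect.index == l;
  case OPK_IMM:
  case OPK_GLOBAL:
    return 0;
  }
  return 0;
}
```

A recorder or a direct target could answer questions such as "does this operand read local 3" purely from semantic names, which is the property the split is after.
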
+The semantic API still includes operations such as:
+
+- `local` and `param`;
+- `load_imm`, `load_const`, `copy`, `load`, `store`, `addr_of`,
+  `tls_addr_of`, aggregate copies/sets, and bitfield operations;
+- `binop`, `unop`, `cmp`, and `convert`;
+- labels, branches, switches, label-address materialization, indirect
+  branches, and structured scopes;
+- `call` and `ret` using ABI-shaped descriptors;
+- `alloca_`, `va_*`, atomics, fences, intrinsics, inline asm, and source
+  location tracking.
+
+The semantic API should not include:
+
+- `FrameSlot`;
+- `CGKnownFrameDesc`;
+- `CGCallPlan`;
+- spill/reload hooks;
+- hard-register discovery or reservation hooks;
+- call-plan emission hooks;
+- inline-asm register-name resolution as a target-wide semantic operation.
+
+Those belong to native lowering or native emission.
+
+## Direct O0 Native Target
+
+The direct `-O0` path remains:
+
+```text
+frontend -> CfreeCg/value stack -> semantic CGTarget -> native arch emits
+```
+
+No IR recording is required. A native semantic target receives semantic locals
+and operations, maps them to target-private storage, and emits machine code
+immediately.
+
+The simplest correct direct target can give every local a frame home and use
+scratch registers per instruction:
+
+```text
+load_imm dst, 42
+  -> move 42 to scratch
+  -> store scratch to dst's frame home
+
+binop dst, a, b
+  -> load a into scratch0
+  -> load b into scratch1
+  -> emit the operation
+  -> store the result to dst's frame home
+```
+
+That baseline is intentionally conservative. It needs no liveness, CFG, SSA,
+or global register allocation.
+
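The frame-only baseline above can be sketched in a few lines. This is a toy model under stated assumptions: it prints made-up pseudo-assembly (`mov`, `load`, `store`, `add`, scratch registers `s0`/`s1`, frame pointer `fp`) into a text buffer instead of encoding real instructions, rounds every local to an 8-byte slot, and uses hypothetical names (`FrameOnlyTarget`, `fo_local`, `fo_load_imm`, `fo_add`).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t CGLocal;

enum { MAX_LOCALS = 64, OUT_CAP = 1024 };

/* Hypothetical frame-only target: each local gets a frame home. */
typedef struct FrameOnlyTarget {
  int32_t home[MAX_LOCALS]; /* fp-relative offset; index 0 is unused */
  uint32_t nlocals;
  int32_t frame_size;
  char out[OUT_CAP]; /* pseudo-assembly text */
  size_t len;
} FrameOnlyTarget;

static void emitf(FrameOnlyTarget *t, const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  int n = vsnprintf(t->out + t->len, OUT_CAP - t->len, fmt, ap);
  va_end(ap);
  assert(n >= 0 && (size_t)n < OUT_CAP - t->len);
  t->len += (size_t)n;
}

/* Allocate a semantic local and give it an 8-byte-aligned frame home. */
static CGLocal fo_local(FrameOnlyTarget *t, uint32_t size) {
  assert(t->nlocals + 1 < MAX_LOCALS);
  uint32_t slot = (size + 7u) & ~7u;
  t->frame_size += (int32_t)slot;
  CGLocal id = ++t->nlocals;
  t->home[id] = -t->frame_size;
  return id;
}

/* load_imm dst, imm: move to scratch, then store to dst's home. */
static void fo_load_imm(FrameOnlyTarget *t, CGLocal dst, long long imm) {
  emitf(t, "mov s0, %lld\n", imm);
  emitf(t, "store s0, [fp%+d]\n", (int)t->home[dst]);
}

/* binop dst, a, b (only add here): load both, operate, store. */
static void fo_add(FrameOnlyTarget *t, CGLocal dst, CGLocal a, CGLocal b) {
  emitf(t, "load s0, [fp%+d]\n", (int)t->home[a]);
  emitf(t, "load s1, [fp%+d]\n", (int)t->home[b]);
  emitf(t, "add s0, s0, s1\n");
  emitf(t, "store s0, [fp%+d]\n", (int)t->home[dst]);
}
```

Every operation is self-contained: nothing survives in a register across calls into the target, which is why this shape needs no liveness or CFG information.
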
+A faster direct target can add a small per-function local register cache. Each
+semantic local has target-private state:
+
+```c
+typedef struct NativeLocal {
+  CfreeCgTypeId type;
+  u32 size;
+  u32 align;
+  u32 flags;
+
+  FrameSlot home;
+  Reg reg;
+  u8 cls;
+  u8 dirty;
+  u8 address_taken;
+  u8 memory_required;
+} NativeLocal;
+```
+
+The direct target then uses local greedy helpers:
+
+```c
+Reg materialize(NativeCgTarget *, Operand op, RegClass cls);
+Reg ensure_writable_reg(NativeCgTarget *, CGLocal dst, RegClass cls);
+void flush_local(NativeCgTarget *, CGLocal local);
+void spill_one(NativeCgTarget *, RegClass cls);
+```
+
+The cache policy can be simple:
+
+- Keep `reg_owner[reg] = CGLocal` per register class.
+- Prefer a free allocable register.
+- Otherwise spill a non-pinned register to its frame home.
+- Pin source and scratch registers for the duration of one instruction.
+- Mark destination locals dirty when their cached register has newer contents
+  than memory.
+
+The direct target must flush conservatively when memory may observe cached
+locals:
+
+- before calls unless the call lowering can prove a local need not be saved;
+- before volatile or atomic memory operations;
+- before inline asm with a memory clobber;
+- before operations that may observe address-taken locals.
+
+This remains a local register cache, not a real allocator. The semantic API
+does not promise where a local lives, so a target can choose the frame-only
+baseline first and add caching later.
+
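The cache policy described in this section might look roughly like the sketch below. It is a simplified model, not the planned implementation: it has a single register class with three registers, counts loads and stores instead of emitting them, evicts the first non-pinned register, and uses hypothetical names (`RegCache`, `rc_materialize`, `rc_writable`, `rc_flush_all`) that only loosely mirror the helper prototypes shown earlier.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t CGLocal;
#define CG_LOCAL_NONE 0u
#define REG_NONE 0xffu

enum { NREGS = 3, MAX_LOCALS = 16 };

typedef struct RegCache {
  CGLocal owner[NREGS];       /* reg_owner[reg] = CGLocal */
  uint8_t pinned[NREGS];      /* held for the current instruction */
  uint8_t reg_of[MAX_LOCALS]; /* REG_NONE when only in memory */
  uint8_t dirty[MAX_LOCALS];  /* register is newer than the frame home */
  uint32_t stores, loads;     /* stand-ins for emitted spills/reloads */
} RegCache;

static void rc_init(RegCache *c) {
  for (int r = 0; r < NREGS; r++) { c->owner[r] = CG_LOCAL_NONE; c->pinned[r] = 0; }
  for (int l = 0; l < MAX_LOCALS; l++) { c->reg_of[l] = REG_NONE; c->dirty[l] = 0; }
  c->stores = c->loads = 0;
}

/* Write a dirty local back to its home and drop it from the cache. */
static void rc_flush_local(RegCache *c, CGLocal l) {
  uint8_t r = c->reg_of[l];
  if (r == REG_NONE) return;
  if (c->dirty[l]) { c->stores++; c->dirty[l] = 0; }
  c->owner[r] = CG_LOCAL_NONE;
  c->reg_of[l] = REG_NONE;
}

/* Prefer a free register; otherwise spill the first non-pinned one. */
static uint8_t rc_alloc(RegCache *c) {
  for (uint8_t r = 0; r < NREGS; r++)
    if (c->owner[r] == CG_LOCAL_NONE && !c->pinned[r]) return r;
  for (uint8_t r = 0; r < NREGS; r++)
    if (!c->pinned[r]) { rc_flush_local(c, c->owner[r]); return r; }
  assert(0 && "all registers pinned");
  return REG_NONE;
}

/* Source operand: reload if needed, then pin for this instruction. */
static uint8_t rc_materialize(RegCache *c, CGLocal l) {
  uint8_t r = c->reg_of[l];
  if (r == REG_NONE) {
    r = rc_alloc(c);
    c->owner[r] = l; c->reg_of[l] = r;
    c->loads++;
  }
  c->pinned[r] = 1;
  return r;
}

/* Destination operand: no reload; the register becomes dirty. */
static uint8_t rc_writable(RegCache *c, CGLocal l) {
  uint8_t r = c->reg_of[l];
  if (r == REG_NONE) { r = rc_alloc(c); c->owner[r] = l; c->reg_of[l] = r; }
  c->dirty[l] = 1;
  return r;
}

static void rc_unpin_all(RegCache *c) {
  for (int r = 0; r < NREGS; r++) c->pinned[r] = 0;
}

/* Conservative flush, e.g. before a call or a volatile access. */
static void rc_flush_all(RegCache *c) {
  for (CGLocal l = 1; l < MAX_LOCALS; l++) rc_flush_local(c, l);
}
```

A real target would keep one such table per register class and would skip the flush for locals it can prove are unobservable; this sketch always takes the conservative path.
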
+## Optimized O1+ Native Target
+
+The optimized path records clean semantic IR first:
+
+```text
+CfreeCg/value stack -> recording CGTarget -> clean IR
+```
+
+Optimization then derives private views from that clean IR:
+
+```text
+clean IR
+  -> CFG/SSA and semantic optimization
+  -> native MIR or backend-prep form
+  -> liveness and register allocation
+  -> final machine locations
+```
+
+Once the optimizer has assigned hard registers and spill slots, it should not
+replay through the semantic `CGTarget`. At that point the representation is no
+longer semantic lowered-CG IR. It is a native backend form, so final emission
+should drive a `NativeTarget` or native emitter.
+
+The optimizer-private representation may still use hard registers, spill slots,
+frame slots, call plans, block arrays, phis, dominance, liveness, and other
+backend-prep metadata. Those are derived views, not part of the semantic
+target contract.
+
+## Unified IR Container
+
+The semantic IR and machine IR can share one representation substrate without
+sharing one semantic contract.
+
+The useful unification is at the container and infrastructure level:
+
+- one function container;
+- one linear instruction stream and/or derived block representation;
+- one label namespace;
+- one source-location model;
+- one operand storage shape;
+- one aux-payload allocation strategy;
+- shared dump, walk, rewrite, and verification infrastructure.
+
+The phases remain distinct:
+
+```c
+typedef enum IRPhase {
+  IR_PHASE_SEMANTIC,
+  IR_PHASE_MIR,
+  IR_PHASE_ALLOCATED_MIR,
+} IRPhase;
+```
+
+The same `Func`/`Inst` storage can carry different phase-specific op and
+operand subsets. Not every operand kind is legal in every phase:
+
+```text
+semantic IR:
+  OPK_IMM, OPK_LOCAL, OPK_GLOBAL, OPK_INDIRECT
+
+MIR before register allocation:
+  OPK_IMM, OPK_VREG, OPK_GLOBAL, OPK_FRAME_SLOT, OPK_MACH_ADDR
+
+MIR after register allocation:
+  OPK_IMM, OPK_HARD_REG, OPK_FRAME_SLOT, OPK_STACK_SLOT, OPK_MACH_ADDR
+```
+
+Similarly, some operations are semantic-only, some are MIR-only, and some are
+shared control-flow operations:
+
+```text
+semantic-only examples:
+  IR_CALL, IR_RET, IR_SCOPE_BEGIN, IR_VA_ARG, IR_AGG_COPY
+
+MIR-only examples:
+  IR_MACH_CALL, IR_SPILL, IR_RELOAD, selected two-address ops
+
+shared examples:
+  IR_NOP, IR_BR, IR_CMP_BRANCH, IR_SWITCH, IR_LOAD_LABEL_ADDR
+```
+
+The lowering should not preserve an opcode name merely because the shape looks
+similar. A semantic `IR_LOAD` carries source-level memory and type facts. A
+machine load carries selected addressing modes, register constraints, and
+instruction-emission constraints. If those semantics diverge, the MIR phase
+should use a distinct op even though both live in the same `Inst` container.
+
+This model lets optimization lower uniformly:
+
+```text
+semantic Func
+  -> semantic cleanup
+  -> MIR Func using the same storage conventions
+  -> allocated MIR Func
+  -> NativeTarget emission
+```
+
+The guardrail is phase-specific verification:
+
+```c
+void ir_verify_semantic(Func *);
+void ir_verify_mir(Func *);
+void ir_verify_allocated_mir(Func *);
+```
+
+The verifier enforces legal opcodes, operand kinds, aux payloads, block/label
+rules, and phase invariants. This keeps the representation compact and shared
+without letting machine-only concepts leak back into the semantic `CGTarget`
+contract.
+
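One way to picture the operand-kind part of phase-specific verification is a per-phase bitmask of legal kinds, taken from the table earlier in this section. The sketch below is an assumption-laden miniature: `Inst` and `Func` are reduced to the fields the check needs, and `ir_verify_operand_kinds` is a hypothetical helper that returns the index of the first offending instruction rather than matching the `void ir_verify_*` prototypes above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum IRPhase {
  IR_PHASE_SEMANTIC,
  IR_PHASE_MIR,
  IR_PHASE_ALLOCATED_MIR,
} IRPhase;

typedef enum OperandKind {
  OPK_IMM, OPK_LOCAL, OPK_GLOBAL, OPK_INDIRECT,
  OPK_VREG, OPK_HARD_REG, OPK_FRAME_SLOT, OPK_STACK_SLOT, OPK_MACH_ADDR,
} OperandKind;

#define KBIT(k) (1u << (k))

/* Legal operand kinds per phase, mirroring the table in the text. */
static const uint32_t legal_kinds[] = {
  [IR_PHASE_SEMANTIC] =
      KBIT(OPK_IMM) | KBIT(OPK_LOCAL) | KBIT(OPK_GLOBAL) | KBIT(OPK_INDIRECT),
  [IR_PHASE_MIR] =
      KBIT(OPK_IMM) | KBIT(OPK_VREG) | KBIT(OPK_GLOBAL) |
      KBIT(OPK_FRAME_SLOT) | KBIT(OPK_MACH_ADDR),
  [IR_PHASE_ALLOCATED_MIR] =
      KBIT(OPK_IMM) | KBIT(OPK_HARD_REG) | KBIT(OPK_FRAME_SLOT) |
      KBIT(OPK_STACK_SLOT) | KBIT(OPK_MACH_ADDR),
};

/* Reduced, hypothetical containers: only what the check needs. */
typedef struct Inst {
  OperandKind ops[3];
  uint8_t nops;
} Inst;

typedef struct Func {
  IRPhase phase;
  const Inst *insts;
  size_t ninsts;
} Func;

/* Returns the index of the first instruction with an operand kind that
 * is illegal in the function's phase, or -1 when all are legal. */
static long ir_verify_operand_kinds(const Func *f) {
  uint32_t legal = legal_kinds[f->phase];
  for (size_t i = 0; i < f->ninsts; i++) {
    const Inst *in = &f->insts[i];
    for (uint8_t j = 0; j < in->nops; j++)
      if (!(legal & KBIT(in->ops[j]))) return (long)i;
  }
  return -1;
}
```

A full verifier would also check opcodes, aux payloads, and block/label rules per phase; the mask lookup only covers the operand-kind rule.
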
+## NativeTarget Surface
+
+`NativeTarget` is the post-machinize, post-regalloc interface. It speaks final
+machine locations and selected/native operations:
+
+```text
+MIR_LOC_REG      hard physical register
+MIR_LOC_STACK    frame, spill, or outgoing stack slot
+MIR_LOC_IMM      immediate
+MIR_LOC_GLOBAL   symbol plus addend
+MIR_LOC_ADDR     final addressing mode
+```
+
+A compact native emission surface can be shaped around final MIR records:
+
+```c
+typedef struct NativeTarget NativeTarget;
+
+struct NativeTarget {
+  const NativeRegInfo *regs;
+
+  void (*func_begin_known_frame)(NativeTarget *, const CGFuncDesc *,
+                                 const NativeFrameDesc *);
+  void (*emit)(NativeTarget *, const MIRInst *);
+  void (*func_end)(NativeTarget *);
+
+  void (*plan_call)(NativeTarget *, const CGCallDesc *, NativeCallPlan *);
+};
+```
+
+This interface owns the machine-level concerns removed from semantic
+`CGTarget`:
+
+- concrete frame and spill slots;
+- known-frame layout and max outgoing call area;
+- callee-save reservation and prologue/epilogue patching;
+- hard-register operands and final addressing modes;
+- spill/reload insertion or emission;
+- selected two-address and arch-specific instruction forms;
+- direct, indirect, and tail call emission after ABI routing;
+- CFI and unwind emission;
+- inline-asm constraint binding and clobber handling.
+
+Static register-file metadata belongs in `NativeRegInfo`:
+
+```c
+typedef struct NativeRegClassInfo {
+  RegClass cls;
+
+  const Reg *allocable;
+  u32 nallocable;
+
+  const Reg *scratch;
+  u32 nscratch;
+
+  const CGPhysRegInfo *phys;
+  u32 nphys;
+
+  u32 caller_saved_mask;
+  u32 callee_saved_mask;
+  u32 arg_mask;
+  u32 ret_mask;
+  u32 reserved_mask;
+} NativeRegClassInfo;
+
+typedef struct NativeRegInfo {
+  const NativeRegClassInfo *classes;
+  u32 nclasses;
+
+  int (*resolve_name)(const NativeRegInfo *, Sym name, Reg *out,
+                      RegClass *cls_out);
+  const char *(*debug_name)(const NativeRegInfo *, RegClass cls, Reg reg);
+  u32 (*dwarf_reg)(const NativeRegInfo *, RegClass cls, Reg reg);
+} NativeRegInfo;
+```
+
+`resolve_name` belongs here when it is pure register-file metadata. If inline
+assembly dialects later affect name resolution, the callback can take a small
+dialect context.
+
+Call-specific answers should not be static register metadata when they depend
+on ABI, calling convention, variadic state, sret, vector ABI, or attributes.
+Those belong to native call planning:
+
+```text
+call_clobber_mask(call, class)
+return_locations(function ABI)
+argument register/stack routing
+tail-call stack routing
+```
+
+## CfreeCg Value Stack
+
+The value stack remains useful, but its role should be narrowed. It should be a
+public API adapter and semantic lowering layer, not a physical allocator.
+
+It still provides:
+
+- push/pop API state and validation;
+- expression-stack lowering;
+- lvalue/rvalue conversion;
+- aggregate, bitfield, call, switch, computed-goto, vararg, alloca, and inline
+  asm lowering;
+- construction of ABI-shaped `CGCallDesc` and `CGABIValue` records;
+- delayed semantic patterns such as delayed compares for branches;
+- a single diagnostic point for misuse of the public CG API;
+- a convenient frontend interface for simple non-C producers.
+
+It should stop owning:
+
+- hard-register allocation;
+- frame-slot allocation;
+- spill/reload policy;
+- caller-saved preservation;
+- backend scratch-register selection.
+
+Stack entries should describe semantic values:
+
+```c
+typedef enum SValueKind {
+  SV_IMM,
+  SV_CONST,
+  SV_LOCAL,
+  SV_LVALUE,
+  SV_DELAYED_CMP,
+} SValueKind;
+
+typedef struct SValue {
+  CfreeCgTypeId type;
+  u8 kind;
+  CGLocal local;
+  Operand addr;
+} SValue;
+```
+
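The "delayed compares for branches" item can be illustrated with a small model of `SV_DELAYED_CMP`. The sketch assumes a mock target that only counts calls, a two-entry `CmpOp` enum, and hypothetical helpers (`sv_cmp`, `sv_materialize`, `sv_branch_if`); the fused compare-and-branch stands in for whatever operation the real semantic target exposes.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t CGLocal;
typedef uint32_t Label;

typedef enum SValueKind { SV_LOCAL, SV_DELAYED_CMP } SValueKind;
typedef enum CmpOp { CMP_EQ, CMP_LT } CmpOp;

/* Reduced stack entry: fields beyond kind/local are assumptions. */
typedef struct SValue {
  SValueKind kind;
  CGLocal local;    /* SV_LOCAL */
  CmpOp op;         /* SV_DELAYED_CMP */
  CGLocal lhs, rhs; /* SV_DELAYED_CMP */
} SValue;

/* Mock semantic target that only counts the calls it receives. */
typedef struct MockTarget {
  CGLocal next_local;
  uint32_t ncmp;        /* cmp producing a boolean local */
  uint32_t ncmp_branch; /* fused compare-and-branch */
  uint32_t nbranch_nz;  /* branch on a nonzero local */
} MockTarget;

/* Pushing a comparison records it; nothing reaches the target yet. */
static SValue sv_cmp(CmpOp op, CGLocal a, CGLocal b) {
  SValue v = {.kind = SV_DELAYED_CMP, .op = op, .lhs = a, .rhs = b};
  return v;
}

/* Used as a value: only now does the compare become a local. */
static SValue sv_materialize(MockTarget *t, SValue v) {
  if (v.kind != SV_DELAYED_CMP) return v;
  CGLocal tmp = ++t->next_local; /* stands in for target->local(...) */
  t->ncmp++;                     /* stands in for target->cmp(...) */
  SValue out = {.kind = SV_LOCAL, .local = tmp};
  return out;
}

/* Used as a branch condition: a delayed compare fuses with the branch. */
static void sv_branch_if(MockTarget *t, SValue cond, Label target) {
  (void)target;
  if (cond.kind == SV_DELAYED_CMP)
    t->ncmp_branch++;
  else
    t->nbranch_nz++;
}
```

The delayed entry holds only semantic locals, so it fits the narrowed role described above: it defers a target call without owning any register or frame state.
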
+## Local and Lvalue Model
+
+`SV_LOCAL` is a computed rvalue stored in a semantic local. `SV_LVALUE` is an
+addressable storage location that may be loaded from or stored to.
+
+For a read:
+
+```text
+x
+  -> push SV_LVALUE(local x)
+
+lvalue conversion
+  -> tmp = target->local(i32 temporary)
+  -> target->load(tmp, local x, mem)
+  -> push SV_LOCAL(tmp)
+```
+
+For an assignment:
+
+```text
+x = y + 1
+  -> keep x as SV_LVALUE
+  -> compute y + 1 as SV_LOCAL
+  -> target->store(address of x, value local, mem)
+```
+
+For aggregates, the distinction is more important. Aggregate values often stay
+in addressable homes and move through `copy_bytes` rather than becoming scalar
+register values.
+
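The read and assignment flows above can be modeled with a tiny value stack over a mock target. Everything here is a simplification for illustration: the stack entries carry only a kind and a local id, an lvalue is identified by the local that is its home, and `MockTarget`, `vs_to_rvalue`, and `vs_assign` are hypothetical names that count loads and stores rather than calling a real `CGTarget`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t CGLocal;

typedef enum SValueKind { SV_LOCAL, SV_LVALUE } SValueKind;

typedef struct SValue {
  SValueKind kind;
  CGLocal local; /* value local, or the home of the lvalue */
} SValue;

/* Mock semantic target: allocates local ids, counts memory ops. */
typedef struct MockTarget {
  CGLocal next_local;
  uint32_t loads, stores;
} MockTarget;

static CGLocal mt_local(MockTarget *t) { return ++t->next_local; }

static void mt_load(MockTarget *t, CGLocal dst, CGLocal src_home) {
  (void)dst; (void)src_home;
  t->loads++;
}

static void mt_store(MockTarget *t, CGLocal dst_home, CGLocal value) {
  (void)dst_home; (void)value;
  t->stores++;
}

enum { STACK_CAP = 16 };

typedef struct ValueStack {
  SValue v[STACK_CAP];
  uint32_t n;
  MockTarget *target;
} ValueStack;

static void vs_push(ValueStack *s, SValue v) {
  assert(s->n < STACK_CAP);
  s->v[s->n++] = v;
}

static SValue vs_pop(ValueStack *s) {
  assert(s->n > 0);
  return s->v[--s->n];
}

/* Lvalue conversion: load the top lvalue into a fresh temporary. */
static void vs_to_rvalue(ValueStack *s) {
  SValue v = vs_pop(s);
  if (v.kind == SV_LVALUE) {
    CGLocal tmp = mt_local(s->target);
    mt_load(s->target, tmp, v.local);
    v = (SValue){SV_LOCAL, tmp};
  }
  vs_push(s, v);
}

/* Assignment: stack is [lvalue, value]; the value stays as the result. */
static void vs_assign(ValueStack *s) {
  vs_to_rvalue(s);
  SValue val = vs_pop(s);
  SValue dst = vs_pop(s);
  assert(dst.kind == SV_LVALUE);
  mt_store(s->target, dst.local, val.local);
  vs_push(s, val);
}
```

Converting an entry that is already `SV_LOCAL` is a no-op, so the stack never issues a second load for a value it has already read.
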
+## Migration Notes
+
+The migration should treat the semantic interface as the stable boundary and
+move machine concepts downward:
+
+1. Define semantic `CGLocal` and semantic `Operand` without `OPK_REG`.
+2. Move frame slots, call plans, hard-register metadata, and spill/reload hooks
+   out of semantic `CGTarget`.
+3. Convert `CfreeCg` stack entries from physical register/frame ownership to
+   semantic locals, lvalues, immediates, constants, and delayed compares.
+4. Implement a direct native semantic target with the frame-only baseline.
+5. Add a local register cache to direct targets only after correctness is
+   stable.
+6. Unify semantic IR, MIR, and allocated MIR around one `Func`/`Inst`
+   container where practical, guarded by phase-specific verification.
+7. Keep machine-only concepts phase-local while introducing a `NativeTarget`
+   emission boundary for post-regalloc output.
+
+The result is a direct `-O0` path that stays fast and simple, plus an optimized
+path whose final emission interface matches the data it actually has after
+register allocation.
+
+## Impact Surface
+
+The interface split touches every layer that currently sees `CGTarget`,
+`Operand`, `OPK_REG`, `FrameSlot`, `CGLocalStorage`, or `CGCallPlan`.
+
+The main surfaces are:
+
+- `src/arch/arch.h`: current shared definitions for semantic operations,
+  physical operands, frame slots, call plans, register metadata, `CGTarget`,
+  `CGBackend`, and `ArchImpl`.
+- `src/arch/cgtarget.c`: arch-agnostic constructor/finalize helpers and helper
+  lowering such as indexed-address folding that currently emits through
+  `OPK_REG`.
+- `src/arch/registry.c`: feature-gated backend registry. It currently returns
+  a `CGBackend` whose only construction hook is `make -> CGTarget`.
+- `src/arch/check_target.c`: check-only backend. It implements the full current
+  target vtable, including frame slots and register hooks.
+- `src/cg/*`: the public CG value-stack implementation. This is the largest
+  semantic migration because it currently owns value registers, spill slots,
+  frame lvalues, local storage, caller-saved preservation, and delayed
+  materialization into `OPK_REG`.
+- `src/opt/*`: current recorder, IR container, optimization passes, MIR
+  lowering, register allocation, and final `opt_emit` replay into a native
+  `CGTarget`.
+- `src/arch/{aa64,x64,rv64}/*`: native direct emitters. The split affects
+  allocation helpers, operation lowering, call lowering, prologue/epilogue
+  patching, inline asm, and opt coordination hooks.
+- `src/arch/{aa64,x64,rv64}/opt_coord.c`: current hard-register and call-plan
+  coordination hooks for the optimizer. These should move behind
+  `NativeRegInfo`, native call planning, and native MIR emission.
+- `src/arch/c_target/*`: source backend. It should implement semantic
+  `CGTarget`, not `NativeTarget`.
+- `src/arch/wasm/*`: wasm target and structurizer. It is closer to a semantic
+  structured target than native machine emitters, but it still currently sees
+  the combined `CGTarget`/`Operand` model.
+- `lang/c/parse/cg_adapter.c` and related parser integration: public
+  `CfreeCg` users. If the public `CfreeCg` API remains stable, these should
+  not need to change for the target split.
+- tests under `test/opt`, `test/arch`, `test/api`, `test/parse`, and smoke
+  harnesses: they include direct `CGTarget` construction, mock targets, opt IR
+  dumps, inline asm backend tests, and public CG tests.
+
+The public `include/cfree/cg.h` API does not need to expose the split. It can
+keep the push/pop CG interface while the internal value stack changes from
+physical allocation to semantic local lowering.
+
+## Existing Build Gating
+
+The repo already has coarse build gates in `include/cfree/config.h`, mirrored
+by `mk/config.mk` so `Makefile` drops matching source directories:
+
+- `CFREE_ARCH_AA64_ENABLED`
+- `CFREE_ARCH_X64_ENABLED`
+- `CFREE_ARCH_RV64_ENABLED`
+- `CFREE_ARCH_WASM_ENABLED`
+- `CFREE_ARCH_C_TARGET_ENABLED`
+- `CFREE_OBJ_*_ENABLED`
+- `CFREE_LANG_*_ENABLED`
+- `CFREE_OPT_ENABLED`
+
+The existing `CFREE_OPT_ENABLED=0` path is the first useful safety valve. It
+drops `src/opt/*`, filters arch `opt_coord.c`, and makes `CfreeCg` reject
+`opt_level > 0`. During the semantic `CGTarget` cutover, this allows work to
+start with direct `-O0` only.
+
+No new target-migration gates are needed. The migration should rely on the
+existing component gates to remove unported code from the build while one
+backend is brought forward.
+
+The public `CfreeCg` API is the boundary that keeps frontends insulated. If
+`CfreeCg` remains source-compatible, C, toy, wasm-language, and preprocessor
+frontends do not need to be disabled just because `CGTarget` changes. They only
+need disabling if their own source files directly include or depend on changed
+internal target details.
+
+`src/cg/*` is core codegen infrastructure and is not meaningfully optional for
+this migration. It must compile in every codegen-capable slice. The practical
+way to shrink the work is to disable unported consumers and implementers of the
+internal target interface:
+
+- disable `CFREE_OPT_ENABLED` to drop the optimizer recorder, MIR passes,
+  regalloc, final replay, and per-arch `opt_coord.c`;
+- disable all but one native `CFREE_ARCH_*_ENABLED` backend while porting the
+  new direct `-O0` target implementation;
+- optionally disable `CFREE_ARCH_C_TARGET_ENABLED` until the semantic source
+  backend is ported;
+- optionally disable `CFREE_ARCH_WASM_ENABLED` until the wasm backend is ported;
+- keep object-format gates narrow to the selected backend's required format
+  when possible.
+
+The smallest buildable slice should be:
+
+```text
+CFREE_OPT_ENABLED=0
+CFREE_ARCH_C_TARGET_ENABLED=0 or 1
+one native arch enabled, preferably the host arch
+only object formats required by that arch enabled
+```
+
+That slice only requires:
+
+- semantic `CGTarget`;
+- semantic `CfreeCg` value stack;
+- check-only target;
+- one direct native `-O0` target;
+- object writer and ABI support for that arch/object format.
+
+Bring components up in this order; steps 1 and 2 complete the slice above, and
+later steps re-enable gated components:
+
+1. `check_target`: validates the semantic target shape without native emission.
+2. One direct native `-O0` target with frame-only local homes.
+3. The C-source target as a semantic-only backend.
+4. Local register caching in the direct native target.
+5. `CFREE_OPT_ENABLED`: clean IR recorder using the semantic target.
+6. Semantic optimizer passes that do not require native MIR/regalloc.
+7. MIR lowering and `NativeTarget` for one arch.
+8. Register allocation and allocated-MIR emission.
+9. Remaining native arches through their existing `CFREE_ARCH_*_ENABLED`
+   gates.
+10. Wasm and structurized/source-like targets through existing arch gates.
+
+Tests should follow the same gating:
+
+- direct API and parser codegen tests first with `opt_level=0`;
+- arch inline-asm tests only after the selected direct native target is ported;
+- `test-opt` only after the semantic recorder and verifier compile;
+- optimized codegen tests only after `NativeTarget` emission exists for the
+  selected arch;
+- smoke/link/debug tests last, because they exercise the whole backend,
+  object, linker, debug, and runtime pipeline.
