CGTARGET plan update - kit

commit 624912a1e292fe0c39d22645e67f0358ea742a02
parent f04946c4b6ed13d4d6dc4364e2ae512b6f6e77a8
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 25 May 2026 18:37:05 -0700

CGTARGET plan update

Diffstat:
M doc/CGTARGET.md  | 122 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------

1 file changed, 90 insertions(+), 32 deletions(-)
diff --git a/doc/CGTARGET.md b/doc/CGTARGET.md
@@ -42,20 +42,26 @@ The intended layering is:
 
 ```text
 frontend -> CfreeCg/value stack -> semantic CGTarget
-                                  |-> direct O0 native/C target/WASM
+                                  |-> direct O0 NativeDirectTarget
+                                  |     -> NativeOps -> NativeTarget
+                                  |-> C target / WASM / check target
                                   |-> IR recorder -> clean IR -> optimizer
                                                          -> NativeTarget
 ```
 
-Native architectures may implement both downstream interfaces:
+Native architectures expose a `NativeTarget` for physical emission. Direct
+`-O0` native codegen should not require every arch to implement a separate
+semantic `CGTarget` vtable. Instead, a shared `NativeDirectTarget` implements
+the semantic `CGTarget` interface once and is parameterized by:
 
-- a semantic `CGTarget` for direct `-O0` emission;
-- a `NativeTarget` or native emitter for optimized post-regalloc emission.
+- the arch's `NativeTarget`, which emits physical/native operations;
+- a small arch-specific `NativeOps` adapter for direct-mode ABI/frame/legality
+  questions that the shared direct target cannot answer generically.
 
-The two implementations can share instruction encoders, ABI helpers, frame
-layout code, relocation helpers, inline-asm parsers, and debug/unwind helpers.
-They should not share one vtable contract, because their input operands and
-phase assumptions are different.
+Optimized codegen does not use `NativeOps`; after MIR lowering and register
+allocation, the optimizer drives `NativeTarget` directly. `NativeOps` exists
+only to let `NativeDirectTarget` reuse `NativeTarget` without duplicating the
+semantic `CGTarget` surface per native arch.
 
 ## Semantic CGTarget
 
@@ -122,12 +128,43 @@ Those belong to native lowering or native emission.
 The direct `-O0` path remains:
 
 ```text
-frontend -> CfreeCg/value stack -> semantic CGTarget -> native arch emits
+frontend -> CfreeCg/value stack -> semantic CGTarget
+                                  -> NativeDirectTarget
+                                  -> NativeOps -> NativeTarget
 ```
 
-No IR recording is required. A native semantic target receives semantic locals
-and operations, maps them to target-private storage, and emits machine code
-immediately.
+No IR recording is required. `NativeDirectTarget` is the shared semantic
+`CGTarget` implementation for native architectures. It receives semantic
+locals and operations, maps them to direct-mode physical storage, and emits
+machine code immediately through the injected `NativeTarget`.
+
+`NativeDirectTarget` owns the direct-mode policy and state that should no
+longer live in `CfreeCg`:
+
+- semantic local allocation and local metadata;
+- assigning semantic locals frame homes;
+- direct-mode scratch register allocation;
+- optional local register caching;
+- dirty-local flushing and cache invalidation;
+- call/volatile/atomic/inline-asm memory barriers;
+- caller-saved invalidation using native register metadata;
+- materializing semantic operands into physical values;
+- storing physical results back into semantic locals;
+- max-outgoing-call-area tracking for frame finalization.
+
+`NativeOps` should stay small. It is not a second copy of `CGTarget`. It
+answers arch-specific direct-mode questions and forwards special cases into
+`NativeTarget`:
+
+- static register metadata (`NativeRegInfo`);
+- function/frame begin/end glue for direct mode;
+- frame slot allocation and slot-address formation;
+- incoming parameter binding into a semantic local's home;
+- return and tail-call ABI decisions;
+- call planning/routing for direct calls;
+- operand/addressing legality when the generic direct target has a choice;
+- inline-asm, vararg, and other arch-sensitive helpers that cannot be
+  described as ordinary physical MIR emission.
 
 The simplest correct direct target can give every local a frame home and use
 scratch registers per instruction:
@@ -166,13 +203,13 @@ typedef struct NativeLocal {
 } NativeLocal;
 ```
 
-The direct target then uses local greedy helpers:
+`NativeDirectTarget` then uses local greedy helpers:
 
 ```c
-Reg materialize(NativeCgTarget *, Operand op, RegClass cls);
-Reg ensure_writable_reg(NativeCgTarget *, CGLocal dst, RegClass cls);
-void flush_local(NativeCgTarget *, CGLocal local);
-void spill_one(NativeCgTarget *, RegClass cls);
+Reg materialize(NativeDirectTarget *, Operand op, RegClass cls);
+Reg ensure_writable_reg(NativeDirectTarget *, CGLocal dst, RegClass cls);
+void flush_local(NativeDirectTarget *, CGLocal local);
+void spill_one(NativeDirectTarget *, RegClass cls);
 ```
 
 The cache policy can be simple:
@@ -217,12 +254,14 @@ clean IR
 Once the optimizer has assigned hard registers and spill slots, it should not
 replay through the semantic `CGTarget`. At that point the representation is no
 longer semantic lowered-CG IR. It is a native backend form, so final emission
-should drive a `NativeTarget` or native emitter.
+should drive `NativeTarget` directly.
 
 The optimizer-private representation may still use hard registers, spill slots,
 frame slots, call plans, block arrays, phis, dominance, liveness, and other
 backend-prep metadata. Those are derived views, not part of the semantic
-target contract.
+target contract. `NativeOps` is not part of this path; any operation the
+optimizer needs should be represented in MIR or exposed by `NativeTarget`
+itself.
 
 ## Unified IR Container
 
@@ -308,8 +347,9 @@ contract.
 
 ## NativeTarget Surface
 
-`NativeTarget` is the post-machinize, post-regalloc interface. It speaks final
-machine locations and selected/native operations:
+`NativeTarget` is the physical native emission interface. Optimized code uses
+it post-machinize and post-regalloc, where it speaks final machine locations
+and selected/native operations:
 
 ```text
 MIR_LOC_REG      hard physical register
@@ -336,6 +376,11 @@ struct NativeTarget {
 };
 ```
 
+The shared direct path may also use the same `NativeTarget`, but it does so
+through `NativeDirectTarget` and the small `NativeOps` adapter. This keeps the
+semantic `CGTarget` surface maximally reused while avoiding a large
+arch-specific direct `CGTarget` implementation per native backend.
+
 This interface owns the machine-level concerns removed from semantic
 `CGTarget`:
 
@@ -397,6 +442,11 @@ argument register/stack routing
 tail-call stack routing
 ```
 
+For direct `-O0`, `NativeOps` may expose call planning as an adapter because
+`NativeDirectTarget` starts from semantic `CGCallDesc` values. For optimized
+code, call planning belongs on `NativeTarget` or in MIR lowering; the optimizer
+does not call `NativeOps`.
+
 ## CfreeCg Value Stack
 
 The value stack remains useful, but its role should be narrowed. It should be a
@@ -481,12 +531,15 @@ move machine concepts downward:
    out of semantic `CGTarget`.
 3. Convert `CfreeCg` stack entries from physical register/frame ownership to
    semantic locals, lvalues, immediates, constants, and delayed compares.
-4. Implement a direct native semantic target with the frame-only baseline.
-5. Add a local register cache to direct targets only after correctness is
+4. Implement shared `NativeDirectTarget` as the native semantic `CGTarget`,
+   initially with the frame-only baseline.
+5. Introduce per-arch `NativeOps` only for direct-mode ABI/frame/legality glue,
+   forwarding physical emission through each arch's `NativeTarget`.
+6. Add a local register cache to `NativeDirectTarget` only after correctness is
    stable.
-6. Unify semantic IR, MIR, and allocated MIR around one `Func`/`Inst`
+7. Unify semantic IR, MIR, and allocated MIR around one `Func`/`Inst`
    container where practical, guarded by phase-specific verification.
-7. Keep machine-only concepts phase-local while introducing a `NativeTarget`
+8. Keep machine-only concepts phase-local while introducing a `NativeTarget`
    emission boundary for post-regalloc output.
 
 The result is a direct `-O0` path that stays fast and simple, plus an optimized
@@ -517,12 +570,15 @@ The main surfaces are:
 - `src/opt/*`: current recorder, IR container, optimization passes, MIR
   lowering, register allocation, and final `opt_emit` replay into a native
   `CGTarget`.
-- `src/arch/{aa64,x64,rv64}/*`: native direct emitters. The split affects
-  allocation helpers, operation lowering, call lowering, prologue/epilogue
-  patching, inline asm, and opt coordination hooks.
+- `src/arch/{aa64,x64,rv64}/*`: native physical emitters. The split moves
+  direct-mode policy into shared `NativeDirectTarget`; per-arch code should
+  provide `NativeTarget`, `NativeOps`, ABI helpers, frame/prologue/epilogue
+  code, inline-asm support, and instruction encoders.
 - `src/arch/{aa64,x64,rv64}/opt_coord.c`: current hard-register and call-plan
   coordination hooks for the optimizer. These should move behind
-  `NativeRegInfo`, native call planning, and native MIR emission.
+  `NativeRegInfo`, native call planning, and native MIR emission. Direct-mode
+  consumers should reach equivalent answers through `NativeOps`, not through
+  the semantic `CGTarget` surface.
 - `src/arch/c_target/*`: source backend. It should implement semantic
   `CGTarget`, not `NativeTarget`.
 - `src/arch/wasm/*`: wasm target and structurizer. It is closer to a semantic
@@ -597,15 +653,17 @@ That slice only requires:
 - semantic `CGTarget`;
 - semantic `CfreeCg` value stack;
 - check-only target;
-- one direct native `-O0` target;
+- shared `NativeDirectTarget`;
+- one arch's `NativeTarget` and direct-mode `NativeOps`;
 - object writer and ABI support for that arch/object format.
 
 After that, re-enable components in this order:
 
 1. `check_target`: validates the semantic target shape without native emission.
-2. One direct native `-O0` target with frame-only local homes.
+2. `NativeDirectTarget` plus one arch's `NativeTarget`/`NativeOps`, with
+   frame-only local homes.
 3. The C-source target as a semantic-only backend.
-4. Local register caching in the direct native target.
+4. Local register caching in `NativeDirectTarget`.
 5. `CFREE_OPT_ENABLED`: clean IR recorder using the semantic target.
 6. Semantic optimizer passes that do not require native MIR/regalloc.
 7. MIR lowering and `NativeTarget` for one arch.
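
The layering this change describes (one shared `NativeDirectTarget` parameterized by an arch's `NativeTarget` and a small `NativeOps` adapter) can be sketched with the frame-only baseline as follows. This is a minimal illustration, not the project's code: every struct field, the `ndt_*` functions, and the `ToyArch` recorder are hypothetical names invented here, and the real `NativeTarget`/`NativeOps` surfaces are assumed to be much larger.

```c
#include <assert.h>

typedef int Reg;      /* hypothetical: physical register number */
typedef int CGLocal;  /* hypothetical: semantic local id */

/* Hypothetical slice of the per-arch physical emission interface. */
typedef struct NativeTarget {
    void *ctx;
    void (*load_slot)(void *ctx, Reg dst, int slot);
    void (*store_slot)(void *ctx, int slot, Reg src);
    void (*add)(void *ctx, Reg dst, Reg a, Reg b);
} NativeTarget;

/* Hypothetical slice of the small direct-mode adapter: frame slots and
 * which registers the shared direct target may use as scratch. */
typedef struct NativeOps {
    void *ctx;
    int (*alloc_slot)(void *ctx, int size);
    Reg scratch[2];
} NativeOps;

enum { NDT_MAX_LOCALS = 16 };

/* Shared semantic target: owns local homes, is written once for all archs. */
typedef struct NativeDirectTarget {
    const NativeTarget *nt;
    const NativeOps *ops;
    int home[NDT_MAX_LOCALS]; /* frame home of each semantic local */
    int nlocals;
} NativeDirectTarget;

/* Frame-only baseline: every semantic local gets a frame home. */
static CGLocal ndt_local_new(NativeDirectTarget *t, int size) {
    assert(t->nlocals < NDT_MAX_LOCALS);
    t->home[t->nlocals] = t->ops->alloc_slot(t->ops->ctx, size);
    return t->nlocals++;
}

/* dst = a + b: load both operands into scratch, add, store back home. */
static void ndt_add(NativeDirectTarget *t, CGLocal dst, CGLocal a, CGLocal b) {
    Reg r0 = t->ops->scratch[0], r1 = t->ops->scratch[1];
    t->nt->load_slot(t->nt->ctx, r0, t->home[a]);
    t->nt->load_slot(t->nt->ctx, r1, t->home[b]);
    t->nt->add(t->nt->ctx, r0, r0, r1);
    t->nt->store_slot(t->nt->ctx, t->home[dst], r0);
}

/* Toy "arch" that only counts emitted operations instead of encoding them. */
typedef struct ToyArch { int frame_size; int emitted; char last; } ToyArch;

static void toy_load(void *c, Reg d, int s) {
    ToyArch *a = c; (void)d; (void)s; a->emitted++; a->last = 'L';
}
static void toy_store(void *c, int s, Reg r) {
    ToyArch *a = c; (void)s; (void)r; a->emitted++; a->last = 'S';
}
static void toy_add(void *c, Reg d, Reg x, Reg y) {
    ToyArch *a = c; (void)d; (void)x; (void)y; a->emitted++; a->last = 'A';
}
/* Hands out 8-byte-aligned slots and returns the slot's frame offset. */
static int toy_alloc_slot(void *c, int size) {
    ToyArch *a = c;
    int off = a->frame_size;
    a->frame_size += (size + 7) & ~7;
    return off;
}
```

In this sketch a second architecture would supply only its own `NativeTarget` and `NativeOps` values; `ndt_local_new` and `ndt_add` would be reused unchanged, which is the duplication the commit aims to avoid.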
