commit 624912a1e292fe0c39d22645e67f0358ea742a02
parent f04946c4b6ed13d4d6dc4364e2ae512b6f6e77a8
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 25 May 2026 18:37:05 -0700
CGTARGET plan update
Diffstat:
 doc/CGTARGET.md | 122 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 90 insertions(+), 32 deletions(-)
diff --git a/doc/CGTARGET.md b/doc/CGTARGET.md
@@ -42,20 +42,26 @@ The intended layering is:
```text
frontend -> CfreeCg/value stack -> semantic CGTarget
- |-> direct O0 native/C target/WASM
+ |-> direct O0 NativeDirectTarget
+ | -> NativeOps -> NativeTarget
+ |-> C target / WASM / check target
|-> IR recorder -> clean IR -> optimizer
-> NativeTarget
```
-Native architectures may implement both downstream interfaces:
+Native architectures expose a `NativeTarget` for physical emission. Direct
+`-O0` native codegen should not require every arch to implement a separate
+semantic `CGTarget` vtable. Instead, a shared `NativeDirectTarget` implements
+the semantic `CGTarget` interface once and is parameterized by:
-- a semantic `CGTarget` for direct `-O0` emission;
-- a `NativeTarget` or native emitter for optimized post-regalloc emission.
+- the arch's `NativeTarget`, which emits physical/native operations;
+- a small arch-specific `NativeOps` adapter for direct-mode ABI/frame/legality
+ questions that the shared direct target cannot answer generically.
-The two implementations can share instruction encoders, ABI helpers, frame
-layout code, relocation helpers, inline-asm parsers, and debug/unwind helpers.
-They should not share one vtable contract, because their input operands and
-phase assumptions are different.
+Optimized codegen does not use `NativeOps`; after MIR lowering and register
+allocation, the optimizer drives `NativeTarget` directly. `NativeOps` exists
+only to let `NativeDirectTarget` reuse `NativeTarget` without duplicating the
+semantic `CGTarget` surface per native arch.
## Semantic CGTarget
@@ -122,12 +128,43 @@ Those belong to native lowering or native emission.
The direct `-O0` path remains:
```text
-frontend -> CfreeCg/value stack -> semantic CGTarget -> native arch emits
+frontend -> CfreeCg/value stack -> semantic CGTarget
+ -> NativeDirectTarget
+ -> NativeOps -> NativeTarget
```
-No IR recording is required. A native semantic target receives semantic locals
-and operations, maps them to target-private storage, and emits machine code
-immediately.
+No IR recording is required. `NativeDirectTarget` is the shared semantic
+`CGTarget` implementation for native architectures. It receives semantic
+locals and operations, maps them to direct-mode physical storage, and emits
+machine code immediately through the injected `NativeTarget`.
+
+`NativeDirectTarget` owns the direct-mode policy and state that should no
+longer live in `CfreeCg`:
+
+- semantic local allocation and local metadata;
+- assigning semantic locals frame homes;
+- direct-mode scratch register allocation;
+- optional local register caching;
+- dirty-local flushing and cache invalidation;
+- call/volatile/atomic/inline-asm memory barriers;
+- caller-saved invalidation using native register metadata;
+- materializing semantic operands into physical values;
+- storing physical results back into semantic locals;
+- max-outgoing-call-area tracking for frame finalization.
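
A minimal sketch of the call-barrier and frame-finalization duties in this list. It assumes a hypothetical per-local cache record (`DirectLocal`), a caller-saved bitmask standing in for `NativeRegInfo`, and a 16-byte stack alignment. Stores are counted rather than emitted. The policy is deliberately conservative: every dirty local is flushed at a call, and only cache entries held in caller-saved registers are dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slice of NativeRegInfo: registers clobbered by a call. */
typedef struct NativeRegInfo {
    uint32_t caller_saved; /* bit r set => register r is caller-saved */
} NativeRegInfo;

enum { MAX_LOCALS = 8, NO_REG = -1 };

/* Hypothetical per-local cache record owned by NativeDirectTarget. */
typedef struct DirectLocal {
    int cached_reg; /* NO_REG when the value lives only in its frame home */
    bool dirty;     /* cached_reg is newer than the frame home */
} DirectLocal;

typedef struct DirectState {
    const NativeRegInfo *regs;
    DirectLocal locals[MAX_LOCALS];
    int nlocals;
    int flushes;      /* home stores; stands in for NativeTarget emission */
    int max_outgoing; /* largest outgoing stack-argument area of any call */
} DirectState;

/* Call barrier: conservatively flush every dirty local, drop cache entries
   held in caller-saved registers, and record the call's outgoing area. */
static void call_barrier(DirectState *s, int stack_arg_bytes) {
    assert(s->nlocals <= MAX_LOCALS);
    for (int l = 0; l < s->nlocals; l++) {
        DirectLocal *loc = &s->locals[l];
        if (loc->cached_reg == NO_REG)
            continue;
        if (loc->dirty) {
            s->flushes++; /* real code: store cached_reg to the frame home */
            loc->dirty = false;
        }
        if ((s->regs->caller_saved >> loc->cached_reg) & 1u)
            loc->cached_reg = NO_REG;
    }
    if (stack_arg_bytes > s->max_outgoing)
        s->max_outgoing = stack_arg_bytes;
}

/* Frame finalization: local homes plus outgoing area, 16-byte aligned
   (the alignment is an assumption; real values come from the arch ABI). */
static int final_frame_size(const DirectState *s, int homes_bytes) {
    int n = homes_bytes + s->max_outgoing;
    return (n + 15) / 16 * 16;
}
```

Once flushed, locals cached in callee-saved registers stay cached across the call. `max_outgoing` only grows, so frame finalization can reserve the largest outgoing area seen anywhere in the function.
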
+
+`NativeOps` should stay small. It is not a second copy of `CGTarget`. It
+answers arch-specific direct-mode questions and forwards special cases to
+`NativeTarget`:
+
+- static register metadata (`NativeRegInfo`);
+- function/frame begin/end glue for direct mode;
+- frame slot allocation and slot-address formation;
+- incoming parameter binding into a semantic local's home;
+- return and tail-call ABI decisions;
+- call planning/routing for direct calls;
+- operand/addressing legality when the generic direct target has a choice;
+- inline-asm, vararg, and other arch-sensitive helpers that cannot be
+ described as ordinary physical MIR emission.
The simplest correct direct target can give every local a frame home and use
scratch registers per instruction:
@@ -166,13 +203,13 @@ typedef struct NativeLocal {
} NativeLocal;
```
-The direct target then uses local greedy helpers:
+`NativeDirectTarget` then uses local greedy helpers:
```c
-Reg materialize(NativeCgTarget *, Operand op, RegClass cls);
-Reg ensure_writable_reg(NativeCgTarget *, CGLocal dst, RegClass cls);
-void flush_local(NativeCgTarget *, CGLocal local);
-void spill_one(NativeCgTarget *, RegClass cls);
+Reg materialize(NativeDirectTarget *, Operand op, RegClass cls);
+Reg ensure_writable_reg(NativeDirectTarget *, CGLocal dst, RegClass cls);
+void flush_local(NativeDirectTarget *, CGLocal local);
+void spill_one(NativeDirectTarget *, RegClass cls);
```
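
A minimal sketch of how these helpers could interact. It makes several simplifying assumptions. There is a single register class, so `RegClass` is dropped. It uses `int` registers and plain local indices instead of `Reg`, `CGLocal`, and `Operand`. It has three allocatable registers and uses round-robin eviction. The `pinned` mask and `end_inst` are hypothetical additions that keep one instruction's operands from evicting each other. Loads and stores are counted in place of `NativeTarget` emission.

```c
#include <assert.h>
#include <stdbool.h>

enum { NREGS = 3, NLOCALS = 8, NONE = -1 };

typedef struct NativeDirectTarget {
    int reg_owner[NREGS];   /* local cached in each register, or NONE */
    int local_reg[NLOCALS]; /* register caching each local, or NONE */
    bool dirty[NLOCALS];    /* register is newer than the frame home */
    unsigned pinned;        /* registers used by the current instruction */
    int next_victim;        /* round-robin eviction cursor */
    int loads, stores;      /* stand-ins for NativeTarget load/store emission */
} NativeDirectTarget;

static void ndt_init(NativeDirectTarget *t) {
    for (int r = 0; r < NREGS; r++) t->reg_owner[r] = NONE;
    for (int l = 0; l < NLOCALS; l++) { t->local_reg[l] = NONE; t->dirty[l] = false; }
    t->pinned = 0; t->next_victim = 0; t->loads = t->stores = 0;
}

static void flush_local(NativeDirectTarget *t, int local) {
    if (t->local_reg[local] != NONE && t->dirty[local]) {
        t->stores++; /* store register -> frame home */
        t->dirty[local] = false;
    }
}

/* Evict one unpinned register (round-robin), flushing its owner first. */
static int spill_one(NativeDirectTarget *t) {
    for (int i = 0; i < NREGS; i++) {
        int r = (t->next_victim + i) % NREGS;
        if (t->pinned & (1u << r)) continue;
        t->next_victim = (r + 1) % NREGS;
        int owner = t->reg_owner[r];
        if (owner != NONE) {
            flush_local(t, owner);
            t->local_reg[owner] = NONE;
            t->reg_owner[r] = NONE;
        }
        return r;
    }
    assert(!"every register pinned by one instruction");
    return NONE;
}

/* Bind a free (or freshly spilled) register to `local` and pin it. */
static int take_reg(NativeDirectTarget *t, int local) {
    int r = NONE;
    for (int i = 0; i < NREGS && r == NONE; i++)
        if (t->reg_owner[i] == NONE && !(t->pinned & (1u << i))) r = i;
    if (r == NONE) r = spill_one(t);
    t->reg_owner[r] = local;
    t->local_reg[local] = r;
    t->pinned |= 1u << r;
    return r;
}

/* Return a register holding `local`, loading its frame home on a miss. */
static int materialize(NativeDirectTarget *t, int local) {
    int r = t->local_reg[local];
    if (r != NONE) { t->pinned |= 1u << r; return r; }
    r = take_reg(t, local);
    t->loads++; /* load frame home -> register */
    return r;
}

/* Return a register `dst` may be written through; no load, marks it dirty. */
static int ensure_writable_reg(NativeDirectTarget *t, int dst) {
    int r = t->local_reg[dst];
    if (r == NONE) r = take_reg(t, dst);
    else t->pinned |= 1u << r;
    t->dirty[dst] = true;
    return r;
}

static void end_inst(NativeDirectTarget *t) { t->pinned = 0; }
```

Computing `z = x + y` and then `w = x + z` loads `x` and `y` once each. The second instruction reuses the cached `x` and `z`, and making room for `w` evicts the clean `y` without a store. `z` and `w` reach their frame homes only when flushed.
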
The cache policy can be simple:
@@ -217,12 +254,14 @@ clean IR
Once the optimizer has assigned hard registers and spill slots, it should not
replay through the semantic `CGTarget`. At that point the representation is no
longer semantic lowered-CG IR. It is a native backend form, so final emission
-should drive a `NativeTarget` or native emitter.
+should drive `NativeTarget` directly.
The optimizer-private representation may still use hard registers, spill slots,
frame slots, call plans, block arrays, phis, dominance, liveness, and other
backend-prep metadata. Those are derived views, not part of the semantic
-target contract.
+target contract. `NativeOps` is not part of this path; any operation the
+optimizer needs should be represented in MIR or exposed by `NativeTarget`
+itself.
## Unified IR Container
@@ -308,8 +347,9 @@ contract.
## NativeTarget Surface
-`NativeTarget` is the post-machinize, post-regalloc interface. It speaks final
-machine locations and selected/native operations:
+`NativeTarget` is the physical native emission interface. The optimizer drives
+it post-machinize and post-regalloc, where it speaks final machine locations
+and selected/native operations:
```text
MIR_LOC_REG hard physical register
@@ -336,6 +376,11 @@ struct NativeTarget {
};
```
+The shared direct path uses the same `NativeTarget`, but only through
+`NativeDirectTarget` and the small `NativeOps` adapter. This lets a single
+implementation of the semantic `CGTarget` surface serve every native backend
+instead of requiring a large arch-specific direct `CGTarget` per backend.
+
This interface owns the machine-level concerns removed from semantic
`CGTarget`:
@@ -397,6 +442,11 @@ argument register/stack routing
tail-call stack routing
```
+For direct `-O0`, `NativeOps` may expose call planning as an adapter because
+`NativeDirectTarget` starts from semantic `CGCallDesc` values. For optimized
+code, call planning belongs on `NativeTarget` or in MIR lowering; the optimizer
+does not call `NativeOps`.
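
A minimal sketch of a direct-mode call planner, assuming a toy convention. Only integer-class arguments are handled. The first N go in registers and the rest go in 8-byte stack slots, with the outgoing area rounded to 16 bytes. `CGCallDesc` is reduced to an argument count, and `ArchCallInfo`, `ArgLoc`, `CallPlan`, and `plan_call` are hypothetical names. The register list used with it below follows the SysV x86-64 integer argument order (rdi, rsi, rdx, rcx, r8, r9) by hardware encoding. Aggregates, floating point, and varargs are ignored.

```c
#include <assert.h>

enum { MAX_ARGS = 16 };

/* Hypothetical semantic call description, reduced to an argument count. */
typedef struct CGCallDesc { int nargs; } CGCallDesc;

/* Where one argument goes: a register number, or an outgoing stack offset. */
typedef struct ArgLoc { int in_reg; int index; } ArgLoc;

typedef struct CallPlan {
    ArgLoc args[MAX_ARGS];
    int stack_bytes; /* outgoing stack area this call needs */
} CallPlan;

/* Hypothetical per-arch parameters a NativeOps call planner would consult. */
typedef struct ArchCallInfo {
    const int *arg_regs; /* argument registers, in assignment order */
    int n_arg_regs;
    int slot_bytes;      /* size of one stack argument slot */
    int stack_align;     /* required alignment of the outgoing area */
} ArchCallInfo;

static void plan_call(const ArchCallInfo *a, const CGCallDesc *d, CallPlan *p) {
    assert(d->nargs <= MAX_ARGS);
    int off = 0;
    for (int i = 0; i < d->nargs; i++) {
        if (i < a->n_arg_regs) {
            p->args[i] = (ArgLoc){1, a->arg_regs[i]};
        } else {
            p->args[i] = (ArgLoc){0, off};
            off += a->slot_bytes;
        }
    }
    /* Round the stack-argument area up to the arch's alignment. */
    p->stack_bytes = (off + a->stack_align - 1) / a->stack_align * a->stack_align;
}
```

With six argument registers, an 8-argument call places arguments 0 to 5 in registers and arguments 6 and 7 at outgoing offsets 0 and 8, giving `stack_bytes == 16`. `NativeDirectTarget` would feed that value into its max-outgoing-call-area tracking.
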
+
## CfreeCg Value Stack
The value stack remains useful, but its role should be narrowed. It should be a
@@ -481,12 +531,15 @@ move machine concepts downward:
out of semantic `CGTarget`.
3. Convert `CfreeCg` stack entries from physical register/frame ownership to
semantic locals, lvalues, immediates, constants, and delayed compares.
-4. Implement a direct native semantic target with the frame-only baseline.
-5. Add a local register cache to direct targets only after correctness is
+4. Implement the shared `NativeDirectTarget` as the native semantic
+   `CGTarget`, initially with the frame-only baseline.
+5. Introduce per-arch `NativeOps` only for direct-mode ABI/frame/legality glue,
+ forwarding physical emission through each arch's `NativeTarget`.
+6. Add a local register cache to `NativeDirectTarget` only after correctness is
stable.
-6. Unify semantic IR, MIR, and allocated MIR around one `Func`/`Inst`
+7. Unify semantic IR, MIR, and allocated MIR around one `Func`/`Inst`
container where practical, guarded by phase-specific verification.
-7. Keep machine-only concepts phase-local while introducing a `NativeTarget`
+8. Keep machine-only concepts phase-local while introducing a `NativeTarget`
emission boundary for post-regalloc output.
The result is a direct `-O0` path that stays fast and simple, plus an optimized
@@ -517,12 +570,15 @@ The main surfaces are:
- `src/opt/*`: current recorder, IR container, optimization passes, MIR
lowering, register allocation, and final `opt_emit` replay into a native
`CGTarget`.
-- `src/arch/{aa64,x64,rv64}/*`: native direct emitters. The split affects
- allocation helpers, operation lowering, call lowering, prologue/epilogue
- patching, inline asm, and opt coordination hooks.
+- `src/arch/{aa64,x64,rv64}/*`: native physical emitters. The split moves
+  direct-mode policy into the shared `NativeDirectTarget`; per-arch code should
+ provide `NativeTarget`, `NativeOps`, ABI helpers, frame/prologue/epilogue
+ code, inline-asm support, and instruction encoders.
- `src/arch/{aa64,x64,rv64}/opt_coord.c`: current hard-register and call-plan
coordination hooks for the optimizer. These should move behind
- `NativeRegInfo`, native call planning, and native MIR emission.
+ `NativeRegInfo`, native call planning, and native MIR emission. Direct-mode
+ consumers should reach equivalent answers through `NativeOps`, not through
+ the semantic `CGTarget` surface.
- `src/arch/c_target/*`: source backend. It should implement semantic
`CGTarget`, not `NativeTarget`.
- `src/arch/wasm/*`: wasm target and structurizer. It is closer to a semantic
@@ -597,15 +653,17 @@ That slice only requires:
- semantic `CGTarget`;
- semantic `CfreeCg` value stack;
- check-only target;
-- one direct native `-O0` target;
+- shared `NativeDirectTarget`;
+- one arch's `NativeTarget` and direct-mode `NativeOps`;
- object writer and ABI support for that arch/object format.
After that, re-enable components in this order:
1. `check_target`: validates the semantic target shape without native emission.
-2. One direct native `-O0` target with frame-only local homes.
+2. `NativeDirectTarget` plus one arch's `NativeTarget`/`NativeOps`, with
+ frame-only local homes.
3. The C-source target as a semantic-only backend.
-4. Local register caching in the direct native target.
+4. Local register caching in `NativeDirectTarget`.
5. `CFREE_OPT_ENABLED`: clean IR recorder using the semantic target.
6. Semantic optimizer passes that do not require native MIR/regalloc.
7. MIR lowering and `NativeTarget` for one arch.