kit

git clone https://git.ryansepassi.com/git/kit.git

commit 624912a1e292fe0c39d22645e67f0358ea742a02
parent f04946c4b6ed13d4d6dc4364e2ae512b6f6e77a8
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 25 May 2026 18:37:05 -0700

CGTARGET plan update

Diffstat:
M doc/CGTARGET.md | 122 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 90 insertions(+), 32 deletions(-)

diff --git a/doc/CGTARGET.md b/doc/CGTARGET.md
@@ -42,20 +42,26 @@ The intended layering is:
 
 ```text
 frontend -> CfreeCg/value stack -> semantic CGTarget
-                                      |-> direct O0 native/C target/WASM
+                                      |-> direct O0 NativeDirectTarget
+                                      |     -> NativeOps -> NativeTarget
+                                      |-> C target / WASM / check target
                                       |-> IR recorder -> clean IR -> optimizer -> NativeTarget
 ```
 
-Native architectures may implement both downstream interfaces:
+Native architectures expose a `NativeTarget` for physical emission. Direct
+`-O0` native codegen should not require every arch to implement a separate
+semantic `CGTarget` vtable. Instead, a shared `NativeDirectTarget` implements
+the semantic `CGTarget` interface once and is parameterized by:
 
-- a semantic `CGTarget` for direct `-O0` emission;
-- a `NativeTarget` or native emitter for optimized post-regalloc emission.
+- the arch's `NativeTarget`, which emits physical/native operations;
+- a small arch-specific `NativeOps` adapter for direct-mode ABI/frame/legality
+  questions that the shared direct target cannot answer generically.
 
-The two implementations can share instruction encoders, ABI helpers, frame
-layout code, relocation helpers, inline-asm parsers, and debug/unwind helpers.
-They should not share one vtable contract, because their input operands and
-phase assumptions are different.
+Optimized codegen does not use `NativeOps`; after MIR lowering and register
+allocation, the optimizer drives `NativeTarget` directly. `NativeOps` exists
+only to let `NativeDirectTarget` reuse `NativeTarget` without duplicating the
+semantic `CGTarget` surface per native arch.
 
 ## Semantic CGTarget
 
@@ -122,12 +128,43 @@ Those belong to native lowering or native emission.
 The direct `-O0` path remains:
 
 ```text
-frontend -> CfreeCg/value stack -> semantic CGTarget -> native arch emits
+frontend -> CfreeCg/value stack -> semantic CGTarget
+         -> NativeDirectTarget
+         -> NativeOps -> NativeTarget
 ```
 
-No IR recording is required. A native semantic target receives semantic locals
-and operations, maps them to target-private storage, and emits machine code
-immediately.
+No IR recording is required. `NativeDirectTarget` is the shared semantic
+`CGTarget` implementation for native architectures. It receives semantic
+locals and operations, maps them to direct-mode physical storage, and emits
+machine code immediately through the injected `NativeTarget`.
+
+`NativeDirectTarget` owns the direct-mode policy and state that should no
+longer live in `CfreeCg`:
+
+- semantic local allocation and local metadata;
+- assigning semantic locals frame homes;
+- direct-mode scratch register allocation;
+- optional local register caching;
+- dirty-local flushing and cache invalidation;
+- call/volatile/atomic/inline-asm memory barriers;
+- caller-saved invalidation using native register metadata;
+- materializing semantic operands into physical values;
+- storing physical results back into semantic locals;
+- max-outgoing-call-area tracking for frame finalization.
+
+`NativeOps` should stay small. It is not a second copy of `CGTarget`. It
+answers arch-specific direct-mode questions and forwards special cases into
+`NativeTarget`:
+
+- static register metadata (`NativeRegInfo`);
+- function/frame begin/end glue for direct mode;
+- frame slot allocation and slot-address formation;
+- incoming parameter binding into a semantic local's home;
+- return and tail-call ABI decisions;
+- call planning/routing for direct calls;
+- operand/addressing legality when the generic direct target has a choice;
+- inline-asm, vararg, and other arch-sensitive helpers that cannot be
+  described as ordinary physical MIR emission.
 
 The simplest correct direct target can give every local a frame home and use
 scratch registers per instruction:
@@ -166,13 +203,13 @@ typedef struct NativeLocal {
 } NativeLocal;
 ```
 
-The direct target then uses local greedy helpers:
+`NativeDirectTarget` then uses local greedy helpers:
 
 ```c
-Reg materialize(NativeCgTarget *, Operand op, RegClass cls);
-Reg ensure_writable_reg(NativeCgTarget *, CGLocal dst, RegClass cls);
-void flush_local(NativeCgTarget *, CGLocal local);
-void spill_one(NativeCgTarget *, RegClass cls);
+Reg materialize(NativeDirectTarget *, Operand op, RegClass cls);
+Reg ensure_writable_reg(NativeDirectTarget *, CGLocal dst, RegClass cls);
+void flush_local(NativeDirectTarget *, CGLocal local);
+void spill_one(NativeDirectTarget *, RegClass cls);
 ```
 
 The cache policy can be simple:
@@ -217,12 +254,14 @@ clean IR
 Once the optimizer has assigned hard registers and spill slots, it should not
 replay through the semantic `CGTarget`. At that point the representation is no
 longer semantic lowered-CG IR. It is a native backend form, so final emission
-should drive a `NativeTarget` or native emitter.
+should drive `NativeTarget` directly.
 
 The optimizer-private representation may still use hard registers, spill slots,
 frame slots, call plans, block arrays, phis, dominance, liveness, and other
 backend-prep metadata. Those are derived views, not part of the semantic
-target contract.
+target contract. `NativeOps` is not part of this path; any operation the
+optimizer needs should be represented in MIR or exposed by `NativeTarget`
+itself.
 
 ## Unified IR Container
 
@@ -308,8 +347,9 @@ contract.
 
 ## NativeTarget Surface
 
-`NativeTarget` is the post-machinize, post-regalloc interface. It speaks final
-machine locations and selected/native operations:
+`NativeTarget` is the physical native emission interface. Optimized code uses
+it post-machinize and post-regalloc, where it speaks final machine locations
+and selected/native operations:
 
 ```text
 MIR_LOC_REG hard physical register
@@ -336,6 +376,11 @@ struct NativeTarget {
 };
 ```
 
+The shared direct path may also use the same `NativeTarget`, but it does so
+through `NativeDirectTarget` and the small `NativeOps` adapter. This keeps the
+semantic `CGTarget` surface maximally reused while avoiding a large
+arch-specific direct `CGTarget` implementation per native backend.
+
 This interface owns the machine-level concerns removed from semantic
 `CGTarget`:
 
@@ -397,6 +442,11 @@ argument register/stack routing
 tail-call stack routing
 ```
 
+For direct `-O0`, `NativeOps` may expose call planning as an adapter because
+`NativeDirectTarget` starts from semantic `CGCallDesc` values. For optimized
+code, call planning belongs on `NativeTarget` or in MIR lowering; the optimizer
+does not call `NativeOps`.
+
 ## CfreeCg Value Stack
 
 The value stack remains useful, but its role should be narrowed. It should be a
@@ -481,12 +531,15 @@ move machine concepts downward:
    out of semantic `CGTarget`.
 3. Convert `CfreeCg` stack entries from physical register/frame ownership to
    semantic locals, lvalues, immediates, constants, and delayed compares.
-4. Implement a direct native semantic target with the frame-only baseline.
-5. Add a local register cache to direct targets only after correctness is
+4. Implement shared `NativeDirectTarget` as the native semantic `CGTarget`,
+   initially with the frame-only baseline.
+5. Introduce per-arch `NativeOps` only for direct-mode ABI/frame/legality glue,
+   forwarding physical emission through each arch's `NativeTarget`.
+6. Add a local register cache to `NativeDirectTarget` only after correctness is
    stable.
-6. Unify semantic IR, MIR, and allocated MIR around one `Func`/`Inst`
+7. Unify semantic IR, MIR, and allocated MIR around one `Func`/`Inst`
    container where practical, guarded by phase-specific verification.
-7. Keep machine-only concepts phase-local while introducing a `NativeTarget`
+8. Keep machine-only concepts phase-local while introducing a `NativeTarget`
    emission boundary for post-regalloc output.
 
 The result is a direct `-O0` path that stays fast and simple, plus an optimized
@@ -517,12 +570,15 @@ The main surfaces are:
 - `src/opt/*`: current recorder, IR container, optimization passes, MIR
   lowering, register allocation, and final `opt_emit` replay into a native
   `CGTarget`.
-- `src/arch/{aa64,x64,rv64}/*`: native direct emitters. The split affects
-  allocation helpers, operation lowering, call lowering, prologue/epilogue
-  patching, inline asm, and opt coordination hooks.
+- `src/arch/{aa64,x64,rv64}/*`: native physical emitters. The split moves
+  direct-mode policy into shared `NativeDirectTarget`; per-arch code should
+  provide `NativeTarget`, `NativeOps`, ABI helpers, frame/prologue/epilogue
+  code, inline-asm support, and instruction encoders.
 - `src/arch/{aa64,x64,rv64}/opt_coord.c`: current hard-register and call-plan
   coordination hooks for the optimizer. These should move behind
-  `NativeRegInfo`, native call planning, and native MIR emission.
+  `NativeRegInfo`, native call planning, and native MIR emission. Direct-mode
+  consumers should reach equivalent answers through `NativeOps`, not through
+  the semantic `CGTarget` surface.
 - `src/arch/c_target/*`: source backend. It should implement semantic
   `CGTarget`, not `NativeTarget`.
 - `src/arch/wasm/*`: wasm target and structurizer. It is closer to a semantic
@@ -597,15 +653,17 @@ That slice only requires:
 - semantic `CGTarget`;
 - semantic `CfreeCg` value stack;
 - check-only target;
-- one direct native `-O0` target;
+- shared `NativeDirectTarget`;
+- one arch's `NativeTarget` and direct-mode `NativeOps`;
 - object writer and ABI support for that arch/object format.
 
 After that, re-enable components in this order:
 
 1. `check_target`: validates the semantic target shape without native emission.
-2. One direct native `-O0` target with frame-only local homes.
+2. `NativeDirectTarget` plus one arch's `NativeTarget`/`NativeOps`, with
+   frame-only local homes.
 3. The C-source target as a semantic-only backend.
-4. Local register caching in the direct native target.
+4. Local register caching in `NativeDirectTarget`.
 5. `CFREE_OPT_ENABLED`: clean IR recorder using the semantic target.
 6. Semantic optimizer passes that do not require native MIR/regalloc.
 7. MIR lowering and `NativeTarget` for one arch.
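
Reader's sketch (not part of the commit): the layering this plan describes, a shared `NativeDirectTarget` parameterized by an arch's `NativeTarget` and a small `NativeOps` adapter, might look roughly like the following for the frame-only baseline. Every field, function name, and signature here (`load_frame`, `store_frame`, `alloc_frame_slot`, `ndt_local_new`, `ndt_add`, `MockArch`, and so on) is a hypothetical stand-in chosen for illustration; the real interfaces in kit are not shown in this diff and will differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative assumptions only: none of these signatures come from kit. */
typedef int Reg;      /* physical register number */
typedef int CGLocal;  /* semantic local id */

/* Hypothetical physical emission interface: registers and frame offsets only. */
typedef struct NativeTarget {
    void (*load_frame)(void *ctx, Reg dst, int frame_off);
    void (*store_frame)(void *ctx, Reg src, int frame_off);
    void (*add)(void *ctx, Reg dst, Reg a, Reg b);
    void *ctx;
} NativeTarget;

/* Hypothetical small adapter: answers direct-mode frame/register questions. */
typedef struct NativeOps {
    int (*alloc_frame_slot)(void *ctx, int size);
    Reg scratch[2];   /* stand-in for static register metadata */
    void *ctx;
} NativeOps;

enum { NDT_MAX_LOCALS = 16 };

/* Shared direct target: owns semantic locals and their frame homes. */
typedef struct NativeDirectTarget {
    const NativeTarget *nt;
    const NativeOps *ops;
    int home[NDT_MAX_LOCALS];  /* frame offset of each semantic local */
    int nlocals;
} NativeDirectTarget;

/* Allocate a semantic local and give it a frame home via the adapter. */
CGLocal ndt_local_new(NativeDirectTarget *t, int size) {
    assert(t->nlocals < NDT_MAX_LOCALS);
    CGLocal id = t->nlocals++;
    t->home[id] = t->ops->alloc_frame_slot(t->ops->ctx, size);
    return id;
}

/* Frame-only baseline: load operands into scratch, operate, store back. */
void ndt_add(NativeDirectTarget *t, CGLocal dst, CGLocal a, CGLocal b) {
    Reg ra = t->ops->scratch[0], rb = t->ops->scratch[1];
    t->nt->load_frame(t->nt->ctx, ra, t->home[a]);
    t->nt->load_frame(t->nt->ctx, rb, t->home[b]);
    t->nt->add(t->nt->ctx, ra, ra, rb);
    t->nt->store_frame(t->nt->ctx, ra, t->home[dst]);
}

/* Mock arch that logs emitted operations as text instead of encoding them. */
typedef struct MockArch { char log[256]; int next_off; } MockArch;

void mock_emit(MockArch *m, const char *op, int a, int b, int c) {
    size_t n = strlen(m->log);
    snprintf(m->log + n, sizeof m->log - n, "%s %d %d %d\n", op, a, b, c);
}
void mock_load(void *ctx, Reg dst, int off) { mock_emit(ctx, "ld", dst, off, 0); }
void mock_store(void *ctx, Reg src, int off) { mock_emit(ctx, "st", src, off, 0); }
void mock_add(void *ctx, Reg d, Reg a, Reg b) { mock_emit(ctx, "add", d, a, b); }
int mock_slot(void *ctx, int size) {
    MockArch *m = ctx;
    int off = m->next_off;   /* bump allocator for frame slots */
    m->next_off += size;
    return off;
}

/* Wire the mock arch into the shared direct target and emit c = a + b. */
void ndt_demo(MockArch *m) {
    NativeTarget nt = { mock_load, mock_store, mock_add, m };
    NativeOps ops = { mock_slot, { 0, 1 }, m };
    NativeDirectTarget t = { &nt, &ops, { 0 }, 0 };
    CGLocal a = ndt_local_new(&t, 8);
    CGLocal b = ndt_local_new(&t, 8);
    CGLocal c = ndt_local_new(&t, 8);
    ndt_add(&t, c, a, b);
}
```

Calling `ndt_demo` on a zero-initialized `MockArch` leaves two loads, one add, and one store in `log`. The point of the shape is that `ndt_add` contains no arch-specific code: a second architecture would supply its own `NativeTarget` and `NativeOps` values and reuse the shared direct target unchanged.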