kit

git clone https://git.ryansepassi.com/git/kit.git

commit 1233daa9e483569e46de1aff05ef023cbe92dc14
parent 05c9bbe7ebca86d0863b28fb963a29f2d37ec873
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon,  1 Jun 2026 17:33:32 -0700

plan: codegen update

Diffstat:
A doc/plan/CODEGEN.md | 459 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 459 insertions(+), 0 deletions(-)

diff --git a/doc/plan/CODEGEN.md b/doc/plan/CODEGEN.md @@ -0,0 +1,459 @@ +# Codegen Interface Cleanup — Roadmap + +Status: **decided — ready to execute** (all eight decisions resolved in §Decisions). Forward-looking +companion to the canonical design in [doc/CODEGEN.md](../CODEGEN.md). Goal: make the +`CfreeCg` public API and the internal `CgTarget` contract carry **one clear +representation per concept, with no advertise-but-ignore surface and no façade**. +Breaking and sweeping changes are in scope; reducing churn is *not* a priority. + +The centerpiece is **Track 7** — a strict PLACE/VALUE stack discipline that ends CG's +inference of what a stack slot *means* (lvalue vs rvalue, by-value vs by-reference). Its +core is decided (Model B; see §Track 7); the other tracks orbit it. + +## Scope + +Two stacked interfaces (see [doc/CODEGEN.md §The two boundaries](../CODEGEN.md)): + +- **Public** `cfree_cg_*` / `CfreeCg` (`include/cfree/cg.h`) — a value-stack machine. +- **Internal** `CgTarget` (`src/cg/cgtarget.h`) — a three-address operand vtable the + 5 realizations implement (native `-O0`, IR recorder, C-source, wasm). + +Between them sits the translation layer (`src/cg/value.c`, `arith.c`, `memory.c`, +`control.c`, `call.c`), which also performs `-O0` constant folding and compare +fusion. Almost every defect below lives at the **seam** between the two models — +duplicated, mapped lossily, or advertised on one side and dropped on the other. + +## Principles we are enforcing + +1. **One representation per concept.** No concept should exist in two structs or two + enums that must be hand-kept in sync. +2. **No advertise-but-ignore.** If a field/flag is in a public struct, it is either + honored or it does not exist. +3. **No façade.** A public enumerator that always panics is a bug. Either implement + it, or remove it, or gate it behind a capability query and a clean diagnostic — + consistent with how call-convs and symbol features already work. +4. 
**Width belongs to the type, not the opcode.** `bswap` is one operation, not three. +5. **Ops vs intrinsics has a stated rule** (§Track 4) and both layers obey it. +6. **The semantic layer may peephole, but that responsibility is named and isolated**, + not smeared across the op families. The vstack peephole is a *feature* (free `-O0` + perf), kept and maintained — not removed. +7. **Completeness over minimalism.** Keep an op/enumerator that has a distinct, sensible + meaning and completes an orthogonal set — *even with no current caller*. Judge a + surface by whether it is consistent and complete on its own terms, not by present + usage. Remove only the *redundant*: two spellings of one behavior. + +--- + +## Track 1 — Remove dead/redundant surface + +Genuine deletions are 1a (unreachable) and 1d (redundant). Applying Principle 7, **1b +and 1c are NOT deletions**: 1b (the vstack peephole) is a kept feature to *re-enable* +(now under Track 6); 1c (conditional control ops) is a complete set we keep and finish. +The net deletions here are pure subtraction with no behavior change. + +### 1a. `SCOPE_IF` / `CGScopeDesc.cond` / `scope_else` are unreachable +`cfree_cg_scope_begin` → `SCOPE_LOOP`, `cfree_cg_block_begin` → `SCOPE_BLOCK` +(`control.c:525,529`). **Nothing ever produces `SCOPE_IF`.** Therefore: +- `CGScopeDesc.cond` is never read for a real value. +- `scope_else` (implemented in `ir_recorder.c:625`, `native_direct_target.c:1792`, + recorded in IR `ir_dump.c:52`/`ir_print.c:94`) is never invoked. `nd_scope_else` + guards `if (s->kind != SCOPE_IF) panic` — which would *always* fire. +- wasm's native `SCOPE_IF` handling (`arch/wasm/emit.c`) is dead; `c_emit.c:1570` + already documents "Public API doesn't emit SCOPE_IF." + +**Action:** remove `SCOPE_IF`, `CGScopeDesc.cond`, the `scope_else` vtable member and +all 3+ implementations, and the dead wasm/native branches. `if` keeps lowering as two +nested `SCOPE_BLOCK` + `break_false` (the `cfree_cg_if_*` helpers). 
+ +### 1b. Vstack peephole (`SV_ARITH`) — **keep & re-enable, moved to Track 6** +`api_can_delay_int_arith()` returns `0` (`value.c:1038`), gating off the delayed-arith +peephole (`api_make_arith_binop`/`_unop`, `api_materialize_arith_to`, `api_release_arith`, +`api_try_fold_arith_chain`, `api_try_collapse_binop_identity`, `api_try_fold_unary_chain`, +the `a_owned`/`b_owned` bookkeeping). It is **disabled, not dead-by-design**: git shows it +was live (`g && !flags && api_foldable_int_type(...)`) until commit `a126bec` ("extend +memory ops with effective-address rider") flipped it to `return 0` — the EA rider and the +delayed forms fought (see the "re-fetch in case alloc materialized a delayed expression" +workarounds at `memory.c:339`, `control.c:886`). **Decision (owner): keep the vstack +peephole — it is free `-O0` perf.** Since Track 7 *removes* the EA rider, re-enabling is +clean. Restoring the original gate + isolating the whole peephole is now **Track 6.3**; +[doc/CODEGEN.md:76-79](../CODEGEN.md) (which documents delayed arith as live) becomes +correct again. + +### 1c. Conditional control ops — **keep (complete set), per Principle 7** +`break_true`/`continue_true`/`continue_false` have 0 callers and `break_false` has 1, but +usage is not the test. The set is the orthogonal cross-product +`{break, continue} × {unconditional, _true, _false}` — the structured-scope analog of the +unstructured `branch_true`/`branch_false`, letting a frontend say "exit/continue this +scope if cond" without materializing a separate branch+label. Its result-carrying +semantics are well-defined (`cg.h:474-482`: `break_true` on an expression scope is +`[result, bool] → pop bool; if true pop result and exit`). Deleting only the unused +arms would make the API *incomplete and asymmetric* — exactly what we are fixing. 
+**Action:** keep the full set; **audit it for completeness** (confirm continue is +rejected on non-loop scopes, that block vs loop scope rules are uniform, and that every +arm has a test). `cfree_cg_block_begin` (0 direct callers, used via `cfree_cg_if_*`) is a +distinct, sensible primitive — keep. + +### 1d. `CFREE_CG_TAIL_NEVER` — remove (redundant, not incomplete) +Documented as "Treated as DEFAULT" (`cg.h:813`): a second spelling of `DEFAULT` with +identical semantics. Unlike 1c, removing it *increases* consistency (no two enumerators +mean the same thing). **Action:** remove; "no tail" is `CFREE_CG_TAIL_DEFAULT`. + +**Affected (1a/1d):** `cg.h`, `cgtarget.h`, `control.c`, `ir_recorder.c`, +`native_direct_target.c`, `arch/wasm/emit.c`, `arch/c_target/c_emit.c`, `ir_dump.c`, +`opt/ir_print.c`. +**Tests:** existing control-flow + toy + wasm suites stay green (1a/1d are no-behavior +deletions); **add** the missing-arm coverage for 1c (each `break_*`/`continue_*` variant +exercised end-to-end on a backend). + +--- + +## Track 2 — Unify the op-enum vocabulary + +Every operation enum exists twice and is hand-mapped 1:1 in `value.c`: + +| Public (`cg.h`) | Internal (`cgtarget.h`) | Relationship | Mapper | +|---|---|---|---| +| `CfreeCgAtomicOp` (7) | `AtomicOp` (7) | **identical** | `api_map_atomic_op` | +| `CfreeCgMemOrder` (6) | `MemOrder` (6) | **identical** | `api_map_mem_order` | +| `CfreeCgAsmDir` (3) | `AsmDir` (3) | **identical** | `api_map_asm_dir` | +| `CfreeCgIntBinOp`+`CfreeCgFpBinOp` | `BinOp` | split→merged | `api_map_int_binop`/`api_map_fp_binop` | +| `CfreeCgIntCmpOp`(10)+`CfreeCgFpCmpOp`(12) | `CmpOp` (14) | split→merged, **lossy** | `api_map_int_cmp`/`api_map_fp_cmp` | +| `CfreeCgIntUnOp`+`CfreeCgFpUnOp` | `UnOp` | split→merged | `api_map_int_unop` | + +Two concrete defects: +- The split→merge→split round-trip earns nothing: every native backend re-splits + int/fp immediately (`aa64/native.c:2070`, `x64/native.c:906`). 
+- The merge is **lossy**: `api_map_fp_cmp` collapses `OEQ`/`UEQ`→`CMP_EQ` (`value.c:648-668`) + so the public ordered/unordered distinction cannot survive to a backend; and + `api_map_fp_binop` maps `CFREE_CG_FP_REM`→`BO_FDIV` (`value.c:605`), which is dead + *and* wrong-looking. + +**Decision (recommended):** `CgTarget` consumes the public `CfreeCg*` op enums directly. +Delete the parallel internal `BinOp`/`UnOp`/`CmpOp`/`AtomicOp`/`MemOrder`/`AsmDir` and +every `api_map_*` (~200 lines of `value.c`). `cgtarget.h` already `#include`s `cfree/cg.h`, +so this is mechanical. Keep the public **int/fp split** (it is the clearer API and +matches what backends do anyway); backends switch on `CfreeCgIntBinOp` and +`CfreeCgFpBinOp` separately. This is a single-repo internal contract, not a published +backend ABI, so coupling it to the public enum values is acceptable. See §Open +decisions #2 for the split-vs-merged confirmation. + +**Affected:** `cgtarget.h` (enum deletions + signature changes on `binop`/`unop`/`cmp`/ +`atomic_*`/`fence`/`asm_block`), all 5 backends' switch sites, `ir_recorder.c` + +`opt/` IR (the recorded op field changes type), `value.c`/`arith.c`/`atomic.c`/`asm.c`. +**Tests:** ISA encode/decode (`test-isa`, `test-arch`), opt, smoke; add a case that +exercises an unordered FP compare end-to-end (currently lossy). + +--- + +## Track 3 — Unify duplicated representations + +### 3a. Two `MemAccess` structs + advertise-but-ignore flags +Public `CfreeCgMemAccess` vs internal `MemAccess`, with non-overlapping flag enums +(`CfreeCgMemAccessFlag` vs `MemFlag`). `api_mem_from_access` (`value.c:284-295`) +translates only `VOLATILE`; **`NONTEMPORAL` and `INVARIANT` are silently dropped**, and +`alias_scope`/`noalias_scope` are **never read** by anything. + +**Action:** either (a) carry `NONTEMPORAL`/`INVARIANT` through to an internal carrier +and into at least one backend, or (b) remove them from `CfreeCgMemAccess`. 
Remove +`alias_scope`/`noalias_scope` until there is a consumer. Keep one access struct as the +source of truth; derive the internal one by a single documented projection (not a +parallel hand-maintained type). Recommendation: (b) remove now — no frontend sets them +except toy, and there is no internal model for them. + +### 3b. Bitfields exist in three representations +- Public rider on `CfreeCgMemAccess` (`bit_offset`/`bit_width`/`storage_size`/`bit_signed`). +- Public rider on `CfreeCgField` (`bit_width`/`bit_offset`/`bit_storage_size`/`bit_signed` + — note `storage_size` vs `bit_storage_size` naming drift). +- Internal dedicated `BitFieldAccess` + `bitfield_load`/`bitfield_store`. + +The public load/store carry 4 bitfield fields that most callers zero, bridged by +`bf_from_access` (`memory.c:364`) into the dedicated internal path. + +**Action:** expose **dedicated public bitfield ops** (`cfree_cg_bitfield_load`/`_store` +taking an explicit `CfreeCgBitField` struct), mirroring the internal shape. Drop the +bitfield fields from `CfreeCgMemAccess` entirely. Keep `CfreeCgField`'s layout-query +fields (they answer record-layout queries) but rename for consistency. This removes the +"every memop is secretly maybe-a-bitfield" branch from `cfree_cg_load`/`_store` +(`memory.c:420,577,646`). + +### 3c. `scale` vs `log2_scale` — **superseded by Track 7** +The public `CfreeCgEffAddr` rider is removed entirely in Track 7 (its base+index*scale+ +offset job moves into the place representation built by `field`/`elem`). The scale-form +mismatch disappears with it. No separate action here. + +**Affected:** `cg.h`, `cgtarget.h`, `memory.c`, all backends' `load`/`store`/`bitfield_*`, +the C frontend's bitfield path (`lang/c/parse/cg_adapter.c`). +**Tests:** bitfield corpus in toy + C; `test-cg-api`. 
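To make the dedicated bitfield ops proposed in 3b concrete, here is a minimal sketch of the extract/insert arithmetic such ops would perform on one storage unit. Everything named `Sketch*`/`sketch_*` is hypothetical and not part of `cg.h`; the field names are borrowed from the riders listed in 3b, and the LSB-first bit numbering is an assumption (the real layout is whatever the target ABI dictates).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor; NOT the real CfreeCgBitField. */
typedef struct SketchBitField {
    uint32_t bit_offset;   /* position of the field's lowest bit (assumed LSB-first) */
    uint32_t bit_width;    /* 1..64 */
    uint32_t storage_size; /* storage unit size in bytes: 1, 2, 4 or 8 */
    bool     bit_signed;   /* sign-extend on load when true */
} SketchBitField;

static uint64_t sketch_mask(uint32_t width) {
    /* Avoid the undefined shift by 64. */
    return width >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1;
}

/* Extract the field from an already-loaded storage unit. */
static int64_t sketch_bitfield_load(uint64_t storage, SketchBitField bf) {
    assert(bf.bit_width >= 1);
    assert(bf.bit_offset + bf.bit_width <= bf.storage_size * 8);
    uint64_t raw = (storage >> bf.bit_offset) & sketch_mask(bf.bit_width);
    if (bf.bit_signed && bf.bit_width < 64) {
        /* Branch-free sign extension from bit_width bits. */
        uint64_t sign = UINT64_C(1) << (bf.bit_width - 1);
        raw = (raw ^ sign) - sign;
    }
    /* Assumes two's-complement conversion, true on every mainstream target. */
    return (int64_t)raw;
}

/* Insert value into the field; returns the storage unit to write back. */
static uint64_t sketch_bitfield_store(uint64_t storage, SketchBitField bf,
                                      uint64_t value) {
    assert(bf.bit_width >= 1);
    assert(bf.bit_offset + bf.bit_width <= bf.storage_size * 8);
    uint64_t mask = sketch_mask(bf.bit_width) << bf.bit_offset;
    return (storage & ~mask) | ((value << bf.bit_offset) & mask);
}
```

Keeping the descriptor as its own argument, rather than as four mostly-zero fields on every memory access, is what lets the plain `load`/`store` path drop its "maybe a bitfield" branch.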
+ +--- + +## Track 4 — Fix the op/intrinsic taxonomy + +Today "op vs intrinsic" is drawn inconsistently across and within layers: +- `memcpy`/`memset`: dedicated **public ops**, internal **intrinsics** (`INTRIN_MEMCPY`…). +- `unreachable`: public **op** documented as "a real terminator, not a side-effect + intrinsic" (`cg.h:560`) — yet lowered through the **intrinsic** hook (`control.c:401`, + `INTRIN_UNREACHABLE`). Direct doc/impl contradiction. +- `trap`: public **intrinsic**. +- `bswap`: **1** public intrinsic but **3** internal (`BSWAP16/32/64`), split by a + size test in `api_map_intrinsic` (`arith.c:803-806`). + +**The rule (proposed):** +- **Terminators are first-class `CgTarget` ops** (ret, unreachable, jump, branch, + computed_goto, tail-call). Give `unreachable` its own hook and honor its documented + terminator status; stop routing it through `intrinsic`. +- **Primitives that may lower to either an inline sequence or a libcall are intrinsics** + (clz/ctz/popcount/bswap/overflow/fma/memcpy/memset). Decide each concept's home once + and make public+internal agree. Recommendation: keep `memcpy`/`memset` as dedicated + *public* ops (they carry rich `MemAccess`) but stop double-modeling them as a separate + public *intrinsic* surface. +- **Width comes from the operand type, not the opcode.** Collapse `BSWAP16/32/64` → one + `BSWAP`; backends read width from the operand. Deletes the size-branch in + `api_map_intrinsic`. + +### 4b. Façade intrinsics (ties into Track 1) +`api_map_intrinsic` maps ~16 enumerators (`FMA`, `SYSCALL`, all `IRQ_*`, `DMB`/`DSB`/`ISB`, +`DCACHE_*`/`ICACHE_*`, `CPU_NOP`/`CPU_YIELD`/`WFI`/`WFE`/`SEV`, `CORO_SWITCH`) → `INTRIN_NONE`, +and `cfree_cg_intrinsic` turns `INTRIN_NONE` into `compiler_panic("unsupported intrinsic")` +(`arith.c:884`). The toy frontend calls them in good faith (`builtins.c:507`); the +expected-error test `test/toy/err/unsupported_cpu_nop.toy` confirms the panic is the +*current intended behavior*. 
`CFREE_CG_FP_REM` is the same (`arith.c:573`). And unlike +call-convs/symbol-features, there is **no `supports_` query for intrinsics**, so a +frontend cannot check before it panics. + +**Action:** +1. Add `cfree_cg_target_supports_intrinsic(CfreeCompiler*, CfreeCgIntrinsic)`, consistent + with `cfree_cg_target_supports_call_conv`/`_symbol_feature`. +2. Convert the bare `compiler_panic` into a proper unsupported-feature diagnostic. +3. Implement the trivial single-instruction baremetal/CPU intrinsics on native arches + (`cpu_nop`/`cpu_yield`/`wfi`/`wfe`/`sev`/`isb`/`dmb`/`dsb`/`irq_*`) — these are one + instruction each and the toy frontend already wants them. +4. Leave `FMA`/`SYSCALL`/`CORO_SWITCH` reported `false` by the query until implemented; + remove `CFREE_CG_FP_REM` (no path, and fp rem is a libcall the frontend can emit). + +See §Open decisions #3 (implement-vs-formally-unsupported per intrinsic). + +**Affected:** `cg.h`, `cgtarget.h`, `arith.c`, `control.c`, native backends' +`intrinsic`, `lang/toy/builtins.c`, `test/toy/err/`. +**Tests:** add `supports_intrinsic` coverage; convert the toy err-cases that become +supported into positive smoke cases. + +--- + +## Track 5 — Expose multi-result publicly + +The internal stack is already multi-result: `CGCallDesc`/`CGFuncDesc`/`ret` carry +`nresults`/`nvalues`, and backends realize `>1` via `plan_call`/`plan_ret` (no backend +asserts ≤1). But the public API tops out at one: `CfreeCgFuncSig` has a single `ret` +(`cg.h:102`), `session.c:318-324` fills `fn_result_types[1]` with 0 or 1, and +`cfree_cg_call`/`call_symbol` push exactly one result (`call.c:228,287,161`), `cfree_cg_ret` +pops one (`call.c:316`). **Decision: expose it.** Because backends already handle it, +this is a public-API + type-system + `value.c` change with **no backend work**. + +### API shape +```c +/* Symmetric with CfreeCgFuncParam. 
*/ +typedef struct CfreeCgFuncResult { CfreeCgTypeId type; CfreeCgAbiAttrs attrs; } CfreeCgFuncResult; + +typedef struct CfreeCgFuncSig { + const CfreeCgFuncResult* results; /* was: CfreeCgTypeId ret; CfreeCgAbiAttrs ret_attrs; */ + uint32_t nresults; /* 0 = void */ + const CfreeCgFuncParam* params; + uint32_t nparams; + CfreeCgCallConv call_conv; + bool abi_variadic; +} CfreeCgFuncSig; +``` +- Type queries: replace `cfree_cg_type_func_ret`/`_ret_attrs` with + `cfree_cg_type_func_nresults` + `cfree_cg_type_func_result(idx)`. +- Type system: `CgType.func` stores `results[]`+`nresults`; interning (`type.c:344`) and + `cg_type_func_ret_id` (`type.c:268,827`) updated. +- `CfreeCg`: `fn_ret_type`/`fn_result_types[1]` → a small results array. +- **Stack-order convention (must be specified):** results are pushed by `cfree_cg_call` + in declaration order, so TOS is the last result; `cfree_cg_ret` pops `nresults` values + expecting the same order (last result on top). Document this on both calls. +- `void` is `nresults==0`; **`cfree_cg_ret_void` is removed** (decision #4): a void + function returns via `cfree_cg_ret` with 0 results — one return entry point. + +**Affected:** `cg.h`, `type.c`/`type.h`, `session.c`, `call.c`, every frontend's +func-type construction and `cfree_cg_type_func_ret` caller (C/toy/wasm adapters), wasm +backend can now surface true multi-value returns; every `cfree_cg_ret_void` caller +migrates to a 0-result `cfree_cg_ret`. +**Tests:** new `test-cg-api` + toy cases returning 2 values; wasm multi-value smoke. + +--- + +## Track 6 — Isolate and complete the semantic peephole + +The semantic layer is also a `-O0` peephole optimizer, and that is **a feature we keep** +(free `-O0` perf, Principle 6). This track gives it a named home and restores the half +that was switched off. 
+ +### Current state +- **Live:** constant folding (`api_try_fold_int_binop`/`_unop`/`_cmp`, driven from + `arith.c:44,126,171`) and the `SV_CMP` fused-compare-into-branch path + (`api_make_cmp`/`api_materialize_cmp_to`/`api_branch_if`). +- **Disabled (not dead-by-design):** the `SV_ARITH` delayed-arith subsystem, gated by + `api_can_delay_int_arith()==0`. It was live until `a126bec` flipped it off to ship the + EA rider (Track 1b). Track 7 removes that rider. +- **Live:** scalar store-to-load forwarding (`api_local_const_*`, `value.c:939-1036`). + +### Action +1. **6.2 — Extract the live peephole into `src/cg/fold.c` + `fold.h`** with a documented + contract: integer fold helpers, the `SV_CMP` lifecycle (make/release/materialize/ + branch-fuse), and const-local forwarding with its invalidation boundaries + (`api_local_const_memory_boundary`/`_control_boundary`/`_address_taken`). The op + families (`arith.c`/`memory.c`/`control.c`/`call.c`) call into `fold.h` instead of + reaching into `value.c` internals. This also settles `ApiSValue`'s shape before Track 7. +2. **6.3 — Re-enable delayed arith *after* Track 7** (once the EA rider is gone). Restore + the original gate (`g && !flags && api_foldable_int_type(...)`), bring + `api_make_arith_*`/`api_materialize_arith_to`/`api_release_arith`/the fold-chain + + identity-collapse helpers under `fold.c`, and verify the delayed forms now compose with + the place/value model (the old conflict was specifically the EA rider). Net `-O0` win: + small immediates flow into `binop`, arith chains and identities fold. +3. **Fix [doc/CODEGEN.md](../CODEGEN.md)** to match the restored, isolated peephole. + +**Affected:** `value.c`, `arith.c`, `internal.h`, new `fold.c`/`fold.h`, `doc/CODEGEN.md`. +**Tests:** `-O0` smoke + opt suites; snapshot-diff to confirm the peephole *improves* +`-O0` codegen (const-fold, fused compare, delayed arith) with no `-O1+` regression. 
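As an illustration of the kind of contract `fold.h` could document, the sketch below shows width-aware integer constant folding plus the right-identity test that identity collapse relies on. The names (`SketchIntBinOp`, `sketch_fold_int_binop`, `sketch_is_right_identity`) are hypothetical stand-ins, not the signatures of `api_try_fold_int_binop` or `api_try_collapse_binop_identity`, which this roadmap does not show.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical op set for the sketch; not the real CfreeCgIntBinOp. */
typedef enum {
    SK_ADD, SK_SUB, SK_MUL, SK_AND, SK_OR, SK_XOR
} SketchIntBinOp;

/* Fold `a op b` for an integer type of `bits` width, wrapping modulo 2^bits.
 * Returns false (and leaves *out untouched) when the fold is not handled. */
static bool sketch_fold_int_binop(SketchIntBinOp op, uint32_t bits,
                                  uint64_t a, uint64_t b, uint64_t *out) {
    if (bits == 0 || bits > 64) return false;
    uint64_t mask = bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
    uint64_t r;
    switch (op) {
    case SK_ADD: r = a + b; break;
    case SK_SUB: r = a - b; break;
    case SK_MUL: r = a * b; break;
    case SK_AND: r = a & b; break;
    case SK_OR:  r = a | b; break;
    case SK_XOR: r = a ^ b; break;
    default: return false;
    }
    *out = r & mask; /* truncate to the type's width */
    return true;
}

/* True when `x op b` is just `x`, so no instruction needs to be emitted. */
static bool sketch_is_right_identity(SketchIntBinOp op, uint64_t b) {
    switch (op) {
    case SK_ADD: case SK_SUB: case SK_OR: case SK_XOR: return b == 0;
    case SK_MUL: return b == 1;
    default: return false; /* AND's identity is all-ones and width-dependent */
    }
}
```

Both helpers are pure functions of their arguments, which is the property that makes it practical to isolate them from the op families that call them.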
+ +--- + +## Track 7 — Strict place/value discipline (the centerpiece) + +**Decided:** Model B (explicit place/value kinds); wide-16 scalars are *values*. + +Today the value stack carries an **inferred** lvalue/rvalue distinction and several ops +accept multiple operand shapes and dispatch on type + shape. A stack slot's meaning is +*computed*, not declared. The inference points: + +- **`api_is_lvalue_sv` is a heuristic** (`value.c:176-180`): ORs the `lvalue` flag, + `bitfield_lvalue`, `api_operand_can_address`, and `source_local!=NONE && OPK_LOCAL`. +- **`cfree_cg_load` has ~7 behaviors, several of which don't load** (`memory.c:436-568`): + aggregate-lvalue@0 re-pushed as-is; ptr-rvalue-to-aggregate re-pushed; `OPK_GLOBAL` + aggregate/wide16 flips `lvalue=1`; scalar-local returns the local value directly; + wide16 keeps storage; then two general lvalue/ptr-rvalue paths. +- **`load`/`store` `base` accepts 4 shapes** ({lvalue, ptr-rvalue} × {no-index, indexed}); + there is **no explicit deref** — a pointer base is silently dereferenceable. +- **`cfree_cg_index` infers pointer-vs-array-lvalue** (`control.c:849-860`); + **`cfree_cg_field` infers record-lvalue-vs-pointer** (`control.c:941-952`). +- **Aggregates are implicitly by-reference and CG decides it** (`call.c:18-42,101-106, + 310-315`): the frontend never says "pass by reference"; CG infers it from + `cg_type_is_aggregate`. +- **wide16 (i128/f128) is special-cased** as aggregate-like throughout (`memory.c:504-533`, + `call.c:53-66`). + +### The discipline +Every stack entry is exactly one explicit, type-checked kind — no heuristic: + +- **PLACE** — an addressable location of a typed object. Representation = the existing + addressable operands (`OPK_LOCAL` / `OPK_GLOBAL` / `OPK_INDIRECT(base+index*scale+off)`). +- **VALUE** — a scalar rvalue: integers, floats, **pointers, and now i128/f128**. 
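The two kinds above can be modelled as an explicit tag on each stack slot, checked by every op instead of being inferred. The sketch below is a toy model only: all `Sketch*`/`sketch_*` names are hypothetical, types are reduced to a single "is a pointer" bit, and a kind mismatch returns `false` where the real CG would panic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each slot declares its kind; nothing is guessed. */
typedef enum { SK_PLACE, SK_VALUE } SketchKind;

typedef struct {
    SketchKind kind;
    bool is_ptr; /* only meaningful when kind == SK_VALUE */
} SketchSlot;

typedef struct {
    SketchSlot slots[16];
    size_t depth;
} SketchStack;

static SketchSlot *sketch_top(SketchStack *s) {
    return s->depth ? &s->slots[s->depth - 1] : NULL;
}

/* Push a PLACE (e.g. a local's storage) or a VALUE. */
static bool sketch_push(SketchStack *s, SketchKind kind, bool is_ptr) {
    if (s->depth == sizeof s->slots / sizeof s->slots[0]) return false;
    s->slots[s->depth++] = (SketchSlot){ kind, kind == SK_VALUE && is_ptr };
    return true;
}

/* addr: PLACE -> VALUE(ptr). Rejects a VALUE, which has no address. */
static bool sketch_addr(SketchStack *s) {
    SketchSlot *t = sketch_top(s);
    if (!t || t->kind != SK_PLACE) return false;
    *t = (SketchSlot){ SK_VALUE, true };
    return true;
}

/* deref: VALUE(ptr) -> PLACE. The only way to turn a pointer into a place. */
static bool sketch_deref(SketchStack *s) {
    SketchSlot *t = sketch_top(s);
    if (!t || t->kind != SK_VALUE || !t->is_ptr) return false;
    *t = (SketchSlot){ SK_PLACE, false };
    return true;
}

/* load: PLACE -> VALUE. A pointer VALUE must be deref'd first. */
static bool sketch_load(SketchStack *s, bool loaded_is_ptr) {
    SketchSlot *t = sketch_top(s);
    if (!t || t->kind != SK_PLACE) return false;
    *t = (SketchSlot){ SK_VALUE, loaded_is_ptr };
    return true;
}
```

In this model a failed op leaves the stack untouched, so a frontend bug surfaces at the op that received the wrong kind rather than at some later op that happened to inspect the slot.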
+ +CG keeps owning **layout** (field offsets, element sizes, types — deterministic +computation from the record/array type). What it stops doing is **guessing the kind or +passing-mode of a stack value**. Every op declares the kinds it consumes/produces and +panics on mismatch. + +### Op signatures (strict, single-shape) +| Op | Consumes | Produces | Notes | +|---|---|---|---| +| `push_local l` | — | **PLACE** | the local's storage | +| `push_int/float/null` | — | VALUE | | +| `push_symbol_addr s,a` | — | VALUE (ptr) | | +| `push_local_addr l` | — | VALUE (ptr) | sugar for `push_local; addr` | +| `addr` | PLACE | VALUE (ptr) | address of the place | +| **`deref`** (NEW) | VALUE (ptr) | PLACE | the explicit ptr→place transition | +| `field i` | PLACE(record) | PLACE(field) | offset/type from layout; for `->` do `deref; field` | +| `elem` (was `index`) | VALUE(ptr to T) + index VALUE | PLACE(T) | `*(p+i)`; scale = `sizeof(T)`. Array lvalues decay to ptr first | +| `load access` | PLACE | VALUE | always dereferences; **no EA rider** | +| `store access` | PLACE, VALUE | — | always dereferences | + +The **`CfreeCgEffAddr` rider is removed** from `load`/`store`: addressing is built +explicitly by `field`/`elem`/`deref` and absorbed into the `OPK_INDIRECT` place, so the +backend still receives a single `[base+index*scale+off]` memop. The kept fold layer +(Track 6) recovers `-O0` quality: `load` of `PLACE(local)` folds to the local (no memory +round-trip), and a `deref` of a pointer-arith chain folds back into the place's indirect +form. Per decision #8 this recovery is **desirable but not a gate** — Track 7 may land +ahead of the peephole work; `-O1+` carries quality. + +### Aggregates (values forbidden) +An aggregate is **always a PLACE**; a VALUE of aggregate type is illegal (panic). Reading +an aggregate = keeping its place. Copies are explicit (`memcpy` between two places, or +field-by-field). 
Call args/returns of aggregate type pass an explicit place, with the +mode named via the ABI attrs that already exist (`SRET`/`BYVAL`/`BYREF`). This removes +the aggregate branches from `api_materialize_call_local`, `api_push_call_result`, and the +aggregate `ret` path — the frontend states the passing mode instead of CG inferring it. + +### wide16 (decided: scalar values) +`i128`/`f128` are VALUES like any scalar; the backend lowers 16-byte storage/moves. The +wide16 special paths in `memory.c`/`call.c`/`wide.c` collapse into the normal value path +(plus backend support for 16-byte value moves where not already present). + +### Inference points removed +`api_is_lvalue_sv` (→ a kind tag check); the 7-way `load` cascade (→ one deref+load); +`load`/`store` 4-shape base (→ one PLACE); `index`/`field` dual-mode (→ `elem` on ptr, +`field` on place); aggregate auto-by-ref (→ explicit place + ABI attr); wide16 special +path (→ value path). + +**Affected:** `cg.h` (new `deref`, `elem` rename, EA rider removed from `load`/`store`, +`ApiSValue` kind tag), `value.c` (kind discipline replaces `lvalue`/`api_is_lvalue_sv`), +`memory.c` (load/store rewritten; `fold_ea_into_operand`/`pop_and_normalize_index` +folded into place-building), `control.c` (`index`→`elem`, `field`), `call.c` (aggregate +branches removed), `wide.c` (wide16 path removed), **every frontend** (insert explicit +`deref`/array-decay where they relied on pointer-base load/store; mark aggregate passing +modes). Backends mostly unaffected (they already consume `OPK_INDIRECT`). +**Tests:** this is the highest-blast-radius track — red-green per op, lean on the toy +corpus and C frontend for *correctness*; snapshot `-O0` codegen to *track* addressing-mode +recovery (decision #8: `-O0` quality is not a gate, so a temporary regression does not +block landing). + +## Recommended sequencing + +Each track is independently shippable and testable. Suggested order by risk/leverage and +dependency: + +1. 
**Track 1 (remove dead/redundant surface: 1a, 1d) + 1c completeness audit.** Pure + subtraction plus filling test gaps; no behavior change. Shrinks the surface. +2. **Track 6.2 (isolate the live fold layer into `fold.c`).** Settles `ApiSValue`'s shape + and makes the fold layer a clean dependency for Track 7. +3. **Track 7 (place/value discipline).** The centerpiece; removes the EA rider; depends on + a solid fold layer. Highest blast radius — do it deliberately, red-green. +4. **Track 6.3 (re-enable delayed arith).** Now that Track 7 removed the EA rider that + killed it; free `-O0` perf, under the isolated `fold.c`. +5. **Track 3a/3b (MemAccess unify + bitfield-as-PLACE-subkind).** On the strict + place-based `load`/`store`. (3c was folded into 7; 3b merges via decision #7.) +6. **Track 2 + Track 4 (op/intrinsic vocabulary).** Independent of the above; reshape the + op/intrinsic vocabulary once. +7. **Track 5 (multi-result).** Independent; public + type-system + `value.c` only. + +## Decisions (all resolved — ready to execute) + +1. **`SV_ARITH`: delete or re-enable?** **DECIDED (owner): re-enable** — the vstack + peephole is a kept feature for free `-O0` perf. It was disabled by `a126bec` to ship + the EA rider, which Track 7 removes; restore + isolate under Track 6.2/6.3. +2. **Op enums: one public vocabulary, int/fp split.** `CgTarget` consumes the public + `CfreeCg*` op enums directly; delete internal `BinOp`/`UnOp`/`CmpOp`/`AtomicOp`/ + `MemOrder`/`AsmDir` + all `api_map_*`. Couples the internal contract to public enum + values (accepted for an in-repo contract). +3. **Façade intrinsics: query + implement the trivial ones.** Add + `cfree_cg_target_supports_intrinsic` + a clean diagnostic; implement the + single-instruction baremetal/CPU intrinsics on native arches; report + `FMA`/`SYSCALL`/`CORO_SWITCH` false until built; remove `FP_REM`. +4. 
**`cfree_cg_ret_void`: fold into `ret`.** Remove `ret_void`; a void function returns + via `cfree_cg_ret` with 0 results — a single return entry point. +5. **`NONTEMPORAL`/`INVARIANT`/alias scopes: remove now.** Drop them from + `CfreeCgMemAccess`; re-add with a real internal carrier + backend consumer when needed. + +**Track 7 model (decided earlier):** Model B (explicit PLACE/VALUE kinds + `deref`; +aggregate values forbidden); wide-16 scalars are values; the `CfreeCgEffAddr` rider is +removed. + +6. **`elem` operand shape: pointer VALUE + explicit array-decay.** `elem` consumes a + pointer VALUE (`*(p+i)`); array lvalues decay via an explicit PLACE(array)→VALUE(ptr) + op. One shape, no dual-mode. +7. **Bitfields: PLACE subkind.** A bitfield is a PLACE subkind carrying the descriptor; + the normal `load`/`store` perform the extract/insert. Merges Track 3b into Track 7. +8. **`-O0` quality: not a gate.** Track 7 may land the cleaner semantics even with `-O0` + codegen regressions; `-O1+` carries quality. The vstack peephole (Track 6.3) is still + restored for the free `-O0` win, but it does **not** block Track 7.