commit 1233daa9e483569e46de1aff05ef023cbe92dc14
parent 05c9bbe7ebca86d0863b28fb963a29f2d37ec873
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 1 Jun 2026 17:33:32 -0700
plan: codegen update
Diffstat:
 doc/plan/CODEGEN.md | 459 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 459 insertions(+), 0 deletions(-)
diff --git a/doc/plan/CODEGEN.md b/doc/plan/CODEGEN.md
@@ -0,0 +1,459 @@
+# Codegen Interface Cleanup — Roadmap
+
+Status: **decided — ready to execute** (all eight decisions resolved in §Decisions). Forward-looking
+companion to the canonical design in [doc/CODEGEN.md](../CODEGEN.md). Goal: make the
+`CfreeCg` public API and the internal `CgTarget` contract carry **one clear
+representation per concept, with no advertise-but-ignore surface and no façade**.
+Breaking and sweeping changes are in scope; reducing churn is *not* a priority.
+
+The centerpiece is **Track 7** — a strict PLACE/VALUE stack discipline that ends CG's
+inference of what a stack slot *means* (lvalue vs rvalue, by-value vs by-reference). Its
+core is decided (Model B; see §Track 7); the other tracks orbit it.
+
+## Scope
+
+Two stacked interfaces (see [doc/CODEGEN.md §The two boundaries](../CODEGEN.md)):
+
+- **Public** `cfree_cg_*` / `CfreeCg` (`include/cfree/cg.h`) — a value-stack machine.
+- **Internal** `CgTarget` (`src/cg/cgtarget.h`) — a three-address operand vtable the
+ 5 realizations implement (native `-O0`, IR recorder, C-source, wasm).
+
+Between them sits the translation layer (`src/cg/value.c`, `arith.c`, `memory.c`,
+`control.c`, `call.c`), which also performs `-O0` constant folding and compare
+fusion. Almost every defect below lives at the **seam** between the two models —
+duplicated, mapped lossily, or advertised on one side and dropped on the other.
+
+## Principles we are enforcing
+
+1. **One representation per concept.** No concept should exist in two structs or two
+ enums that must be hand-kept in sync.
+2. **No advertise-but-ignore.** If a field/flag is in a public struct, it is either
+ honored or it does not exist.
+3. **No façade.** A public enumerator that always panics is a bug. Either implement
+ it, or remove it, or gate it behind a capability query and a clean diagnostic —
+ consistent with how call-convs and symbol features already work.
+4. **Width belongs to the type, not the opcode.** `bswap` is one operation, not three.
+5. **Ops vs intrinsics has a stated rule** (§Track 4) and both layers obey it.
+6. **The semantic layer may peephole, but that responsibility is named and isolated**,
+ not smeared across the op families. The vstack peephole is a *feature* (free `-O0`
+ perf), kept and maintained — not removed.
+7. **Completeness over minimalism.** Keep an op/enumerator that has a distinct, sensible
+ meaning and completes an orthogonal set — *even with no current caller*. Judge a
+ surface by whether it is consistent and complete on its own terms, not by present
+ usage. Remove only the *redundant*: two spellings of one behavior.
+
+---
+
+## Track 1 — Remove dead/redundant surface
+
+Genuine deletions are 1a (unreachable) and 1d (redundant). Applying Principle 7, **1b
+and 1c are NOT deletions**: 1b (the vstack peephole) is a kept feature to *re-enable*
+(now under Track 6); 1c (conditional control ops) is a complete set we keep and finish.
+The net deletions here are pure subtraction with no behavior change.
+
+### 1a. `SCOPE_IF` / `CGScopeDesc.cond` / `scope_else` are unreachable
+`cfree_cg_scope_begin` → `SCOPE_LOOP`, `cfree_cg_block_begin` → `SCOPE_BLOCK`
+(`control.c:525,529`). **Nothing ever produces `SCOPE_IF`.** Therefore:
+- `CGScopeDesc.cond` is never read for a real value.
+- `scope_else` (implemented in `ir_recorder.c:625`, `native_direct_target.c:1792`,
+ recorded in IR `ir_dump.c:52`/`ir_print.c:94`) is never invoked. `nd_scope_else`
+ guards `if (s->kind != SCOPE_IF) panic` — which would *always* fire.
+- wasm's native `SCOPE_IF` handling (`arch/wasm/emit.c`) is dead; `c_emit.c:1570`
+ already documents "Public API doesn't emit SCOPE_IF."
+
+**Action:** remove `SCOPE_IF`, `CGScopeDesc.cond`, the `scope_else` vtable member and
+all 3+ implementations, and the dead wasm/native branches. `if` keeps lowering as two
+nested `SCOPE_BLOCK`s plus `break_false` (the `cfree_cg_if_*` helpers).
+
+### 1b. Vstack peephole (`SV_ARITH`) — **keep & re-enable, moved to Track 6**
+`api_can_delay_int_arith()` returns `0` (`value.c:1038`), gating off the delayed-arith
+peephole (`api_make_arith_binop`/`_unop`, `api_materialize_arith_to`, `api_release_arith`,
+`api_try_fold_arith_chain`, `api_try_collapse_binop_identity`, `api_try_fold_unary_chain`,
+the `a_owned`/`b_owned` bookkeeping). It is **disabled, not dead-by-design**: git shows it
+was live (`g && !flags && api_foldable_int_type(...)`) until commit `a126bec` ("extend
+memory ops with effective-address rider") flipped it to `return 0` — the EA rider and the
+delayed forms fought (see the "re-fetch in case alloc materialized a delayed expression"
+workarounds at `memory.c:339`, `control.c:886`). **Decision (owner): keep the vstack
+peephole — it is free `-O0` perf.** Since Track 7 *removes* the EA rider, re-enabling is
+clean. Restoring the original gate + isolating the whole peephole is now **Track 6.3**;
+[doc/CODEGEN.md:76-79](../CODEGEN.md) (which documents delayed arith as live) becomes
+correct again.
+
+### 1c. Conditional control ops — **keep (complete set), per Principle 7**
+`break_true`/`continue_true`/`continue_false` have 0 callers and `break_false` has 1, but
+usage is not the test. The set is the orthogonal cross-product
+`{break, continue} × {unconditional, _true, _false}` — the structured-scope analog of the
+unstructured `branch_true`/`branch_false`, letting a frontend say "exit/continue this
+scope if cond" without materializing a separate branch+label. Its result-carrying
+semantics are well-defined (`cg.h:474-482`: `break_true` on an expression scope is
+`[result, bool] → pop bool; if true pop result and exit`). Deleting only the unused
+arms would make the API *incomplete and asymmetric* — exactly what we are fixing.
+**Action:** keep the full set; **audit it for completeness** (confirm that `continue` is
+rejected on non-loop scopes, that block vs loop scope rules are uniform, and that every
+arm has a test). `cfree_cg_block_begin` (0 direct callers, used via `cfree_cg_if_*`) is a
+distinct, sensible primitive — keep.
+
+### 1d. `CFREE_CG_TAIL_NEVER` — remove (redundant, not incomplete)
+Documented as "Treated as DEFAULT" (`cg.h:813`): a second spelling of `DEFAULT` with
+identical semantics. Unlike 1c, removing it *increases* consistency (no two enumerators
+mean the same thing). **Action:** remove; "no tail" is `CFREE_CG_TAIL_DEFAULT`.
+
+**Affected (1a/1d):** `cg.h`, `cgtarget.h`, `control.c`, `ir_recorder.c`,
+`native_direct_target.c`, `arch/wasm/emit.c`, `arch/c_target/c_emit.c`, `ir_dump.c`,
+`opt/ir_print.c`.
+**Tests:** existing control-flow + toy + wasm suites stay green (1a/1d are no-behavior
+deletions); **add** the missing-arm coverage for 1c (each `break_*`/`continue_*` variant
+exercised end-to-end on a backend).
+
+---
+
+## Track 2 — Unify the op-enum vocabulary
+
+Every operation enum exists twice and is hand-mapped in `value.c` (1:1 only where the table says **identical**):
+
+| Public (`cg.h`) | Internal (`cgtarget.h`) | Relationship | Mapper |
+|---|---|---|---|
+| `CfreeCgAtomicOp` (7) | `AtomicOp` (7) | **identical** | `api_map_atomic_op` |
+| `CfreeCgMemOrder` (6) | `MemOrder` (6) | **identical** | `api_map_mem_order` |
+| `CfreeCgAsmDir` (3) | `AsmDir` (3) | **identical** | `api_map_asm_dir` |
+| `CfreeCgIntBinOp`+`CfreeCgFpBinOp` | `BinOp` | split→merged | `api_map_int_binop`/`api_map_fp_binop` |
+| `CfreeCgIntCmpOp`(10)+`CfreeCgFpCmpOp`(12) | `CmpOp` (14) | split→merged, **lossy** | `api_map_int_cmp`/`api_map_fp_cmp` |
+| `CfreeCgIntUnOp`+`CfreeCgFpUnOp` | `UnOp` | split→merged | `api_map_int_unop` |
+
+Two concrete defects:
+- The split→merge→split round-trip earns nothing: every native backend re-splits
+ int/fp immediately (`aa64/native.c:2070`, `x64/native.c:906`).
+- The merge is **lossy**: `api_map_fp_cmp` collapses `OEQ`/`UEQ`→`CMP_EQ` (`value.c:648-668`)
+ so the public ordered/unordered distinction cannot survive to a backend; and
+ `api_map_fp_binop` maps `CFREE_CG_FP_REM`→`BO_FDIV` (`value.c:605`), which is dead
+ *and* wrong-looking.
+
+**Decision (recommended):** `CgTarget` consumes the public `CfreeCg*` op enums directly.
+Delete the parallel internal `BinOp`/`UnOp`/`CmpOp`/`AtomicOp`/`MemOrder`/`AsmDir` and
+every `api_map_*` (~200 lines of `value.c`). `cgtarget.h` already `#include`s `cfree/cg.h`,
+so this is mechanical. Keep the public **int/fp split** (it is the clearer API and
+matches what backends do anyway); backends switch on `CfreeCgIntBinOp` and
+`CfreeCgFpBinOp` separately. This is a single-repo internal contract, not a published
+backend ABI, so coupling it to the public enum values is acceptable. See §Decisions
+#2 for the split-vs-merged confirmation.
+
+**Affected:** `cgtarget.h` (enum deletions + signature changes on `binop`/`unop`/`cmp`/
+`atomic_*`/`fence`/`asm_block`), all 5 backends' switch sites, `ir_recorder.c` +
+`opt/` IR (the recorded op field changes type), `value.c`/`arith.c`/`atomic.c`/`asm.c`.
+**Tests:** ISA encode/decode (`test-isa`, `test-arch`), opt, smoke; add a case that
+exercises an unordered FP compare end-to-end (currently lossy).
+
+---
+
+## Track 3 — Unify duplicated representations
+
+### 3a. Two `MemAccess` structs + advertise-but-ignore flags
+Public `CfreeCgMemAccess` vs internal `MemAccess`, with non-overlapping flag enums
+(`CfreeCgMemAccessFlag` vs `MemFlag`). `api_mem_from_access` (`value.c:284-295`)
+translates only `VOLATILE`; **`NONTEMPORAL` and `INVARIANT` are silently dropped**, and
+`alias_scope`/`noalias_scope` are **never read** by anything.
+
+**Action:** either (a) carry `NONTEMPORAL`/`INVARIANT` through to an internal carrier
+and into at least one backend, or (b) remove them from `CfreeCgMemAccess`. Remove
+`alias_scope`/`noalias_scope` until there is a consumer. Keep one access struct as the
+source of truth; derive the internal one by a single documented projection (not a
+parallel hand-maintained type). Recommendation: (b) remove now — no frontend sets them
+except toy, and there is no internal model for them.
+
+### 3b. Bitfields exist in three representations
+- Public rider on `CfreeCgMemAccess` (`bit_offset`/`bit_width`/`storage_size`/`bit_signed`).
+- Public rider on `CfreeCgField` (`bit_width`/`bit_offset`/`bit_storage_size`/`bit_signed`
+ — note `storage_size` vs `bit_storage_size` naming drift).
+- Internal dedicated `BitFieldAccess` + `bitfield_load`/`bitfield_store`.
+
+The public load/store carry 4 bitfield fields that most callers zero, bridged by
+`bf_from_access` (`memory.c:364`) into the dedicated internal path.
+
+**Action (as first proposed; decision #7 instead models a bitfield as a PLACE subkind):**
+expose **dedicated public bitfield ops** (`cfree_cg_bitfield_load`/`_store` taking an
+explicit `CfreeCgBitField` struct), mirroring the internal shape. Drop the bitfield fields
+from `CfreeCgMemAccess` entirely. Keep `CfreeCgField`'s layout-query fields (they answer
+record-layout queries) but rename for consistency. This removes the "every memop is secretly
+maybe-a-bitfield" branch from `cfree_cg_load`/`_store` (`memory.c:420,577,646`).
+
+### 3c. `scale` vs `log2_scale` — **superseded by Track 7**
+The public `CfreeCgEffAddr` rider is removed entirely in Track 7 (its base+index*scale+
+offset job moves into the place representation built by `field`/`elem`). The scale-form
+mismatch disappears with it. No separate action here.
+
+**Affected:** `cg.h`, `cgtarget.h`, `memory.c`, all backends' `load`/`store`/`bitfield_*`,
+the C frontend's bitfield path (`lang/c/parse/cg_adapter.c`).
+**Tests:** bitfield corpus in toy + C; `test-cg-api`.
+
+---
+
+## Track 4 — Fix the op/intrinsic taxonomy
+
+Today "op vs intrinsic" is drawn inconsistently across and within layers:
+- `memcpy`/`memset`: dedicated **public ops**, internal **intrinsics** (`INTRIN_MEMCPY`…).
+- `unreachable`: public **op** documented as "a real terminator, not a side-effect
+ intrinsic" (`cg.h:560`) — yet lowered through the **intrinsic** hook (`control.c:401`,
+ `INTRIN_UNREACHABLE`). Direct doc/impl contradiction.
+- `trap`: public **intrinsic**.
+- `bswap`: **1** public intrinsic but **3** internal (`BSWAP16/32/64`), split by a
+ size test in `api_map_intrinsic` (`arith.c:803-806`).
+
+**The rule (proposed):**
+- **Terminators are first-class `CgTarget` ops** (ret, unreachable, jump, branch,
+ computed_goto, tail-call). Give `unreachable` its own hook and honor its documented
+ terminator status; stop routing it through `intrinsic`.
+- **Primitives that may lower to either an inline sequence or a libcall are intrinsics**
+ (clz/ctz/popcount/bswap/overflow/fma/memcpy/memset). Decide each concept's home once
+ and make public+internal agree. Recommendation: keep `memcpy`/`memset` as dedicated
+ *public* ops (they carry rich `MemAccess`) but stop double-modeling them as a separate
+ public *intrinsic* surface.
+- **Width comes from the operand type, not the opcode.** Collapse `BSWAP16/32/64` → one
+ `BSWAP`; backends read width from the operand. Deletes the size-branch in
+ `api_map_intrinsic`.
+
+### 4b. Façade intrinsics (ties into Track 1)
+`api_map_intrinsic` maps ~16 enumerators (`FMA`, `SYSCALL`, all `IRQ_*`, `DMB`/`DSB`/`ISB`,
+`DCACHE_*`/`ICACHE_*`, `CPU_NOP`/`CPU_YIELD`/`WFI`/`WFE`/`SEV`, `CORO_SWITCH`) → `INTRIN_NONE`,
+and `cfree_cg_intrinsic` turns `INTRIN_NONE` into `compiler_panic("unsupported intrinsic")`
+(`arith.c:884`). The toy frontend calls them in good faith (`builtins.c:507`); the
+expected-error test `test/toy/err/unsupported_cpu_nop.toy` confirms the panic is the
+*current intended behavior*. `CFREE_CG_FP_REM` is the same (`arith.c:573`). And unlike
+call-convs/symbol-features, there is **no `supports_` query for intrinsics**, so a
+frontend cannot check before it panics.
+
+**Action:**
+1. Add `cfree_cg_target_supports_intrinsic(CfreeCompiler*, CfreeCgIntrinsic)`, consistent
+ with `cfree_cg_target_supports_call_conv`/`_symbol_feature`.
+2. Convert the bare `compiler_panic` into a proper unsupported-feature diagnostic.
+3. Implement the trivial baremetal/CPU intrinsics on native arches
+ (`cpu_nop`/`cpu_yield`/`wfi`/`wfe`/`sev`/`isb`/`dmb`/`dsb`/`irq_*`): each is a single
+ instruction, and the toy frontend already calls them.
+4. Leave `FMA`/`SYSCALL`/`CORO_SWITCH` reported `false` by the query until implemented;
+ remove `CFREE_CG_FP_REM` (no path, and fp rem is a libcall the frontend can emit).
+
+See §Decisions #3 (implement-vs-formally-unsupported per intrinsic).
+
+**Affected:** `cg.h`, `cgtarget.h`, `arith.c`, `control.c`, native backends'
+`intrinsic`, `lang/toy/builtins.c`, `test/toy/err/`.
+**Tests:** add `supports_intrinsic` coverage; convert the toy err-cases that become
+supported into positive smoke cases.
+
+---
+
+## Track 5 — Expose multi-result publicly
+
+The internal stack is already multi-result: `CGCallDesc`/`CGFuncDesc`/`ret` carry
+`nresults`/`nvalues`, and backends realize `>1` via `plan_call`/`plan_ret` (no backend
+asserts ≤1). But the public API tops out at one: `CfreeCgFuncSig` has a single `ret`
+(`cg.h:102`), `session.c:318-324` fills `fn_result_types[1]` with 0 or 1,
+`cfree_cg_call`/`call_symbol` push exactly one result (`call.c:228,287,161`), and
+`cfree_cg_ret` pops one (`call.c:316`). **Decision: expose it.** Because backends already handle it,
+this is a public-API + type-system + `value.c` change with **no backend work**.
+
+### API shape
+```c
+/* Symmetric with CfreeCgFuncParam. */
+typedef struct CfreeCgFuncResult { CfreeCgTypeId type; CfreeCgAbiAttrs attrs; } CfreeCgFuncResult;
+
+typedef struct CfreeCgFuncSig {
+    const CfreeCgFuncResult* results; /* was: CfreeCgTypeId ret; CfreeCgAbiAttrs ret_attrs; */
+    uint32_t nresults;                /* 0 = void */
+    const CfreeCgFuncParam* params;
+    uint32_t nparams;
+    CfreeCgCallConv call_conv;
+    bool abi_variadic;
+} CfreeCgFuncSig;
+```
+- Type queries: replace `cfree_cg_type_func_ret`/`_ret_attrs` with
+ `cfree_cg_type_func_nresults` + `cfree_cg_type_func_result(idx)`.
+- Type system: `CgType.func` stores `results[]`+`nresults`; interning (`type.c:344`) and
+ `cg_type_func_ret_id` (`type.c:268,827`) updated.
+- `CfreeCg`: `fn_ret_type`/`fn_result_types[1]` → a small results array.
+- **Stack-order convention (must be specified):** results are pushed by `cfree_cg_call`
+ in declaration order, so TOS is the last result; `cfree_cg_ret` pops `nresults` values
+ expecting the same order (last result on top). Document this on both calls.
+- `void` is `nresults==0`; **`cfree_cg_ret_void` is removed** (decision #4): a void
+ function returns via `cfree_cg_ret` with 0 results — one return entry point.
+
+**Affected:** `cg.h`, `type.c`/`type.h`, `session.c`, `call.c`, every frontend's
+func-type construction and `cfree_cg_type_func_ret` caller (C/toy/wasm adapters), and
+every `cfree_cg_ret_void` caller (each migrates to a 0-result `cfree_cg_ret`). The wasm
+backend can now surface true multi-value returns.
+**Tests:** new `test-cg-api` + toy cases returning 2 values; wasm multi-value smoke.
+
+---
+
+## Track 6 — Isolate and complete the semantic peephole
+
+The semantic layer is also a `-O0` peephole optimizer, and that is **a feature we keep**
+(free `-O0` perf, Principle 6). This track gives it a named home and restores the half
+that was switched off.
+
+### Current state
+- **Live:** constant folding (`api_try_fold_int_binop`/`_unop`/`_cmp`, driven from
+ `arith.c:44,126,171`) and the `SV_CMP` fused-compare-into-branch path
+ (`api_make_cmp`/`api_materialize_cmp_to`/`api_branch_if`).
+- **Disabled (not dead-by-design):** the `SV_ARITH` delayed-arith subsystem, gated by
+ `api_can_delay_int_arith()==0`. It was live until `a126bec` flipped it off to ship the
+ EA rider (Track 1b). Track 7 removes that rider.
+- **Live:** scalar store-to-load forwarding (`api_local_const_*`, `value.c:939-1036`).
+
+### Action
+1. **6.2 — Extract the live peephole into `src/cg/fold.c` + `fold.h`** with a documented
+ contract: integer fold helpers, the `SV_CMP` lifecycle (make/release/materialize/
+ branch-fuse), and const-local forwarding with its invalidation boundaries
+ (`api_local_const_memory_boundary`/`_control_boundary`/`_address_taken`). The op
+ families (`arith.c`/`memory.c`/`control.c`/`call.c`) call into `fold.h` instead of
+ reaching into `value.c` internals. This also settles `ApiSValue`'s shape before Track 7.
+2. **6.3 — Re-enable delayed arith *after* Track 7** (once the EA rider is gone). Restore
+ the original gate (`g && !flags && api_foldable_int_type(...)`), bring
+ `api_make_arith_*`/`api_materialize_arith_to`/`api_release_arith`/the fold-chain +
+ identity-collapse helpers under `fold.c`, and verify the delayed forms now compose with
+ the place/value model (the old conflict was specifically the EA rider). Net `-O0` win:
+ small immediates flow into `binop`, arith chains and identities fold.
+3. **Fix [doc/CODEGEN.md](../CODEGEN.md)** to match the restored, isolated peephole.
+
+**Affected:** `value.c`, `arith.c`, `internal.h`, new `fold.c`/`fold.h`, `doc/CODEGEN.md`.
+**Tests:** `-O0` smoke + opt suites; snapshot-diff to confirm the peephole *improves*
+`-O0` codegen (const-fold, fused compare, delayed arith) with no `-O1+` regression.
+
+---
+
+## Track 7 — Strict place/value discipline (the centerpiece)
+
+**Decided:** Model B (explicit place/value kinds); wide16 scalars are *values*.
+
+Today the value stack carries an **inferred** lvalue/rvalue distinction and several ops
+accept multiple operand shapes and dispatch on type + shape. A stack slot's meaning is
+*computed*, not declared. The inference points:
+
+- **`api_is_lvalue_sv` is a heuristic** (`value.c:176-180`): ORs the `lvalue` flag,
+ `bitfield_lvalue`, `api_operand_can_address`, and `source_local!=NONE && OPK_LOCAL`.
+- **`cfree_cg_load` has ~7 behaviors, several of which don't load** (`memory.c:436-568`):
+ aggregate-lvalue@0 re-pushed as-is; ptr-rvalue-to-aggregate re-pushed; `OPK_GLOBAL`
+ aggregate/wide16 flips `lvalue=1`; scalar-local returns the local value directly;
+ wide16 keeps storage; then two general lvalue/ptr-rvalue paths.
+- **`load`/`store` `base` accepts 4 shapes** ({lvalue, ptr-rvalue} × {no-index, indexed});
+ there is **no explicit deref** — a pointer base is silently dereferenceable.
+- **`cfree_cg_index` infers pointer-vs-array-lvalue** (`control.c:849-860`);
+ **`cfree_cg_field` infers record-lvalue-vs-pointer** (`control.c:941-952`).
+- **Aggregates are implicitly by-reference and CG decides it** (`call.c:18-42,101-106,
+ 310-315`): the frontend never says "pass by reference"; CG infers it from
+ `cg_type_is_aggregate`.
+- **wide16 (i128/f128) is special-cased** as aggregate-like throughout (`memory.c:504-533`,
+ `call.c:53-66`).
+
+### The discipline
+Every stack entry is exactly one explicit, type-checked kind — no heuristic:
+
+- **PLACE** — an addressable location of a typed object. Representation = the existing
+ addressable operands (`OPK_LOCAL` / `OPK_GLOBAL` / `OPK_INDIRECT(base+index*scale+off)`).
+- **VALUE** — a scalar rvalue: integers, floats, **pointers, and now i128/f128**.
+
+CG keeps owning **layout** (field offsets, element sizes, types — deterministic
+computation from the record/array type). What it stops doing is **guessing the kind or
+passing-mode of a stack value**. Every op declares the kinds it consumes/produces and
+panics on mismatch.
+
+### Op signatures (strict, single-shape)
+| Op | Consumes | Produces | Notes |
+|---|---|---|---|
+| `push_local l` | — | **PLACE** | the local's storage |
+| `push_int/float/null` | — | VALUE | |
+| `push_symbol_addr s,a` | — | VALUE (ptr) | |
+| `push_local_addr l` | — | VALUE (ptr) | sugar for `push_local; addr` |
+| `addr` | PLACE | VALUE (ptr) | address of the place |
+| **`deref`** (NEW) | VALUE (ptr) | PLACE | the explicit ptr→place transition |
+| `field i` | PLACE(record) | PLACE(field) | offset/type from layout; for `->` do `deref; field` |
+| `elem` (was `index`) | VALUE(ptr to T) + index VALUE | PLACE(T) | `*(p+i)`; scale = `sizeof(T)`. Array lvalues decay to ptr first |
+| `load access` | PLACE | VALUE | always dereferences; **no EA rider** |
+| `store access` | PLACE, VALUE | — | always dereferences |
+
+The **`CfreeCgEffAddr` rider is removed** from `load`/`store`: addressing is built
+explicitly by `field`/`elem`/`deref` and absorbed into the `OPK_INDIRECT` place, so the
+backend still receives a single `[base+index*scale+off]` memop. The kept fold layer
+(Track 6) recovers `-O0` quality: `load` of `PLACE(local)` folds to the local (no memory
+round-trip), and a `deref` of a pointer-arith chain folds back into the place's indirect
+form. Per decision #8 this recovery is **desirable but not a gate** — Track 7 may land
+ahead of the peephole work; `-O1+` carries quality.
+
+### Aggregates (values forbidden)
+An aggregate is **always a PLACE**; a VALUE of aggregate type is illegal (panic). Reading
+an aggregate = keeping its place. Copies are explicit (`memcpy` between two places, or
+field-by-field). Call args/returns of aggregate type pass an explicit place, with the
+mode named via the ABI attrs that already exist (`SRET`/`BYVAL`/`BYREF`). This removes
+the aggregate branches from `api_materialize_call_local`, `api_push_call_result`, and the
+aggregate `ret` path — the frontend states the passing mode instead of CG inferring it.
+
+### wide16 (decided: scalar values)
+`i128`/`f128` are VALUES like any scalar; the backend lowers 16-byte storage/moves. The
+wide16 special paths in `memory.c`/`call.c`/`wide.c` collapse into the normal value path
+(plus backend support for 16-byte value moves where not already present).
+
+### Inference points removed
+`api_is_lvalue_sv` (→ a kind tag check); the 7-way `load` cascade (→ one deref+load);
+`load`/`store` 4-shape base (→ one PLACE); `index`/`field` dual-mode (→ `elem` on ptr,
+`field` on place); aggregate auto-by-ref (→ explicit place + ABI attr); wide16 special
+path (→ value path).
+
+**Affected:** `cg.h` (new `deref`, `elem` rename, EA rider removed from `load`/`store`,
+`ApiSValue` kind tag), `value.c` (kind discipline replaces `lvalue`/`api_is_lvalue_sv`),
+`memory.c` (load/store rewritten; `fold_ea_into_operand`/`pop_and_normalize_index`
+folded into place-building), `control.c` (`index`→`elem`, `field`), `call.c` (aggregate
+branches removed), `wide.c` (wide16 path removed), **every frontend** (insert explicit
+`deref`/array-decay where they relied on pointer-base load/store; mark aggregate passing
+modes). Backends mostly unaffected (they already consume `OPK_INDIRECT`).
+**Tests:** this is the highest-blast-radius track — red-green per op, lean on the toy
+corpus and C frontend for *correctness*; snapshot `-O0` codegen to *track* addressing-mode
+recovery (decision #8: `-O0` quality is not a gate, so a temporary regression does not
+block landing).
+
+## Recommended sequencing
+
+Each track is shippable and testable on its own once its prerequisites have landed.
+Suggested order by risk/leverage and dependency:
+
+1. **Track 1 (remove dead/redundant surface: 1a, 1d) + 1c completeness audit.** Pure
+ subtraction plus filling test gaps; no behavior change. Shrinks the surface.
+2. **Track 6.2 (isolate the live fold layer into `fold.c`).** Settles `ApiSValue`'s shape
+ and makes the fold layer a clean dependency for Track 7.
+3. **Track 7 (place/value discipline).** The centerpiece; removes the EA rider; depends on
+ a solid fold layer. Highest blast radius — do it deliberately, red-green.
+4. **Track 6.3 (re-enable delayed arith).** Now that Track 7 removed the EA rider that
+ killed it; free `-O0` perf, under the isolated `fold.c`.
+5. **Track 3a/3b (MemAccess unify + bitfield-as-PLACE-subkind).** On the strict
+ place-based `load`/`store`. (3c was folded into 7; 3b merges via decision #7.)
+6. **Track 2 + Track 4 (op/intrinsic vocabulary).** Independent of the above; reshape the
+ op/intrinsic vocabulary once.
+7. **Track 5 (multi-result).** Independent; public + type-system + `value.c` only.
+
+## Decisions (all resolved — ready to execute)
+
+1. **`SV_ARITH`: delete or re-enable?** **DECIDED (owner): re-enable** — the vstack
+ peephole is a kept feature for free `-O0` perf. It was disabled by `a126bec` to ship
+ the EA rider, which Track 7 removes; restore + isolate under Track 6.2/6.3.
+2. **Op enums: one public vocabulary, int/fp split.** `CgTarget` consumes the public
+ `CfreeCg*` op enums directly; delete internal `BinOp`/`UnOp`/`CmpOp`/`AtomicOp`/
+ `MemOrder`/`AsmDir` + all `api_map_*`. Couples the internal contract to public enum
+ values (accepted for an in-repo contract).
+3. **Façade intrinsics: query + implement the trivial ones.** Add
+ `cfree_cg_target_supports_intrinsic` + a clean diagnostic; implement the
+ single-instruction baremetal/CPU intrinsics on native arches; report
+ `FMA`/`SYSCALL`/`CORO_SWITCH` false until built; remove `FP_REM`.
+4. **`cfree_cg_ret_void`: fold into `ret`.** Remove `ret_void`; a void function returns
+ via `cfree_cg_ret` with 0 results — a single return entry point.
+5. **`NONTEMPORAL`/`INVARIANT`/alias scopes: remove now.** Drop them from
+ `CfreeCgMemAccess`; re-add with a real internal carrier + backend consumer when needed.
+
+**Track 7 model (decided earlier):** Model B (explicit PLACE/VALUE kinds + `deref`;
+aggregate values forbidden); wide16 scalars are values; the `CfreeCgEffAddr` rider is
+removed.
+
+6. **`elem` operand shape: pointer VALUE + explicit array-decay.** `elem` consumes a
+ pointer VALUE (`*(p+i)`); array lvalues decay via an explicit PLACE(array)→VALUE(ptr)
+ op. One shape, no dual-mode.
+7. **Bitfields: PLACE subkind.** A bitfield is a PLACE subkind carrying the descriptor;
+ the normal `load`/`store` perform the extract/insert. Merges Track 3b into Track 7.
+8. **`-O0` quality: not a gate.** Track 7 may land the cleaner semantics even with `-O0`
+ codegen regressions; `-O1+` carries quality. The vstack peephole (Track 6.3) is still
+ restored for the free `-O0` win, but it does **not** block Track 7.