commit 20c4b046d8cbce3704e010a55002c6bc7f37f613
parent aa77e40659f4596f238bec93a379edf8c3f2476c
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 1 Jun 2026 19:44:42 -0700
plan: rework CODEGEN.md to record landed tracks + focus on what remains
Tracks 1a/1d, 5, 3a, FP_REM, and the AsmDir + Atomic/MemOrder enum slices of
Track 2 are committed and green; condense them into a Done table and replace the
per-track write-ups with the remaining work, sharpened with what executing the
landed slices surfaced: the binop/cmp split design (separate hooks + doubled IR
opcodes + fold-layer restructure, the int/fp decision already made by type), the
multi-result -O1/wasm follow-up, the cg_adapter.h enum-duplication hazard, and
the bswap width-from-type feasibility. Reorders the remaining sequence.
Diffstat:
 doc/plan/CODEGEN.md | 658 ++++++++++++++++++++++++++++++++-----------------------------------------------------
1 file changed, 268 insertions(+), 390 deletions(-)
diff --git a/doc/plan/CODEGEN.md b/doc/plan/CODEGEN.md
@@ -1,359 +1,268 @@
-# Codegen Interface Cleanup — Roadmap
+# Codegen Interface Cleanup — Roadmap (remaining work)
-Status: **decided — ready to execute** (all eight decisions resolved in §Decisions). Forward-looking
-companion to the canonical design in [doc/CODEGEN.md](../CODEGEN.md). Goal: make the
-`CfreeCg` public API and the internal `CgTarget` contract carry **one clear
-representation per concept, with no advertise-but-ignore surface and no façade**.
-Breaking and sweeping changes are in scope; reducing churn is *not* a priority.
+Status: **partially executed.** The independent, lower-risk tracks have landed and are committed
+(see §Done). What remains is the high-blast-radius work: the **binop/cmp op-split** (the
+rest of Track 2), the **op/intrinsic taxonomy** (Track 4), the **fold-layer isolation**
+(Track 6), and the **PLACE/VALUE centerpiece** (Track 7, with Track 3b folded in).
+
+Forward-looking companion to the canonical design in [doc/CODEGEN.md](../CODEGEN.md). Goal:
+make the `CfreeCg` public API and the internal `CgTarget` contract carry **one clear
+representation per concept, with no advertise-but-ignore surface and no façade**. Breaking
+and sweeping changes are in scope; reducing churn is *not* a priority.
The centerpiece is **Track 7** — a strict PLACE/VALUE stack discipline that ends CG's
-inference of what a stack slot *means* (lvalue vs rvalue, by-value vs by-reference). Its
-core is decided (Model B; see §Track 7); the other tracks orbit it.
+inference of what a stack slot *means*. Its core is decided (Model B; see §Track 7).
## Scope
Two stacked interfaces (see [doc/CODEGEN.md §The two boundaries](../CODEGEN.md)):
- **Public** `cfree_cg_*` / `CfreeCg` (`include/cfree/cg.h`) — a value-stack machine.
-- **Internal** `CgTarget` (`src/cg/cgtarget.h`) — a three-address operand vtable the
- 5 realizations implement (native `-O0`, IR recorder, C-source, wasm).
+- **Internal** `CgTarget` (`src/cg/cgtarget.h`) — a three-address operand vtable. NOTE the
+ op enums also flow into the **physical** `NativeTarget` (`src/arch/native_target.h`,
+ which `#include`s `cgtarget.h`), the recorder IR (`src/cg/ir.h`) and opt IR
+ (`src/opt/ir.h`), and the interpreter (`src/interp/`). Any enum change touches all of
+ these layers, not just the semantic vtable.
Between them sits the translation layer (`src/cg/value.c`, `arith.c`, `memory.c`,
-`control.c`, `call.c`), which also performs `-O0` constant folding and compare
-fusion. Almost every defect below lives at the **seam** between the two models —
-duplicated, mapped lossily, or advertised on one side and dropped on the other.
-
-## Principles we are enforcing
-
-1. **One representation per concept.** No concept should exist in two structs or two
- enums that must be hand-kept in sync.
-2. **No advertise-but-ignore.** If a field/flag is in a public struct, it is either
- honored or it does not exist.
-3. **No façade.** A public enumerator that always panics is a bug. Either implement
- it, or remove it, or gate it behind a capability query and a clean diagnostic —
- consistent with how call-convs and symbol features already work.
+`control.c`, `call.c`), which also performs `-O0` constant folding and compare fusion.
+
+## Principles we are enforcing (unchanged)
+
+1. **One representation per concept.** No concept in two structs/enums hand-kept in sync.
+2. **No advertise-but-ignore.** A public field/flag is honored or it does not exist.
+3. **No façade.** A public enumerator that always panics is a bug — implement it, remove
+ it, or gate it behind a capability query + clean diagnostic.
4. **Width belongs to the type, not the opcode.** `bswap` is one operation, not three.
5. **Ops vs intrinsics has a stated rule** (§Track 4) and both layers obey it.
-6. **The semantic layer may peephole, but that responsibility is named and isolated**,
- not smeared across the op families. The vstack peephole is a *feature* (free `-O0`
- perf), kept and maintained — not removed.
-7. **Completeness over minimalism.** Keep an op/enumerator that has a distinct, sensible
- meaning and completes an orthogonal set — *even with no current caller*. Judge a
- surface by whether it is consistent and complete on its own terms, not by present
- usage. Remove only the *redundant*: two spellings of one behavior.
+6. **The semantic layer may peephole, but that responsibility is named and isolated.** The
+ vstack peephole is a kept feature (free `-O0` perf), not removed.
+7. **Completeness over minimalism.** Keep an op/enumerator with a distinct, sensible
+ meaning that completes an orthogonal set — even with no caller. Remove only the
+ *redundant*: two spellings of one behavior.
---
-## Track 1 — Remove dead/redundant surface
-
-Genuine deletions are 1a (unreachable) and 1d (redundant). Applying Principle 7, **1b
-and 1c are NOT deletions**: 1b (the vstack peephole) is a kept feature to *re-enable*
-(now under Track 6); 1c (conditional control ops) is a complete set we keep and finish.
-The net deletions here are pure subtraction with no behavior change.
-
-### 1a. `SCOPE_IF` / `CGScopeDesc.cond` / `scope_else` are unreachable
-`cfree_cg_scope_begin` → `SCOPE_LOOP`, `cfree_cg_block_begin` → `SCOPE_BLOCK`
-(`control.c:525,529`). **Nothing ever produces `SCOPE_IF`.** Therefore:
-- `CGScopeDesc.cond` is never read for a real value.
-- `scope_else` (implemented in `ir_recorder.c:625`, `native_direct_target.c:1792`,
- recorded in IR `ir_dump.c:52`/`ir_print.c:94`) is never invoked. `nd_scope_else`
- guards `if (s->kind != SCOPE_IF) panic` — which would *always* fire.
-- wasm's native `SCOPE_IF` handling (`arch/wasm/emit.c`) is dead; `c_emit.c:1570`
- already documents "Public API doesn't emit SCOPE_IF."
-
-**Action:** remove `SCOPE_IF`, `CGScopeDesc.cond`, the `scope_else` vtable member and
-all 3+ implementations, and the dead wasm/native branches. `if` keeps lowering as two
-nested `SCOPE_BLOCK` + `break_false` (the `cfree_cg_if_*` helpers).
-
-### 1b. Vstack peephole (`SV_ARITH`) — **keep & re-enable, moved to Track 6**
-`api_can_delay_int_arith()` returns `0` (`value.c:1038`), gating off the delayed-arith
-peephole (`api_make_arith_binop`/`_unop`, `api_materialize_arith_to`, `api_release_arith`,
-`api_try_fold_arith_chain`, `api_try_collapse_binop_identity`, `api_try_fold_unary_chain`,
-the `a_owned`/`b_owned` bookkeeping). It is **disabled, not dead-by-design**: git shows it
-was live (`g && !flags && api_foldable_int_type(...)`) until commit `a126bec` ("extend
-memory ops with effective-address rider") flipped it to `return 0` — the EA rider and the
-delayed forms fought (see the "re-fetch in case alloc materialized a delayed expression"
-workarounds at `memory.c:339`, `control.c:886`). **Decision (owner): keep the vstack
-peephole — it is free `-O0` perf.** Since Track 7 *removes* the EA rider, re-enabling is
-clean. Restoring the original gate + isolating the whole peephole is now **Track 6.3**;
-[doc/CODEGEN.md:76-79](../CODEGEN.md) (which documents delayed arith as live) becomes
-correct again.
-
-### 1c. Conditional control ops — **keep (complete set), per Principle 7**
-`break_true`/`continue_true`/`continue_false` have 0 callers and `break_false` has 1, but
-usage is not the test. The set is the orthogonal cross-product
-`{break, continue} × {unconditional, _true, _false}` — the structured-scope analog of the
-unstructured `branch_true`/`branch_false`, letting a frontend say "exit/continue this
-scope if cond" without materializing a separate branch+label. Its result-carrying
-semantics are well-defined (`cg.h:474-482`: `break_true` on an expression scope is
-`[result, bool] → pop bool; if true pop result and exit`). Deleting only the unused
-arms would make the API *incomplete and asymmetric* — exactly what we are fixing.
-**Action:** keep the full set; **audit it for completeness** (confirm continue is
-rejected on non-loop scopes, that block vs loop scope rules are uniform, and that every
-arm has a test). `cfree_cg_block_begin` (0 direct callers, used via `cfree_cg_if_*`) is a
-distinct, sensible primitive — keep.
-
-### 1d. `CFREE_CG_TAIL_NEVER` — remove (redundant, not incomplete)
-Documented as "Treated as DEFAULT" (`cg.h:813`): a second spelling of `DEFAULT` with
-identical semantics. Unlike 1c, removing it *increases* consistency (no two enumerators
-mean the same thing). **Action:** remove; "no tail" is `CFREE_CG_TAIL_DEFAULT`.
-
-**Affected (1a/1d):** `cg.h`, `cgtarget.h`, `control.c`, `ir_recorder.c`,
-`native_direct_target.c`, `arch/wasm/emit.c`, `arch/c_target/c_emit.c`, `ir_dump.c`,
-`opt/ir_print.c`.
-**Tests:** existing control-flow + toy + wasm suites stay green (1a/1d are no-behavior
-deletions); **add** the missing-arm coverage for 1c (each `break_*`/`continue_*` variant
-exercised end-to-end on a backend).
+## Done (committed, all green: lib · toy 1344/0 · cg-api · smoke x64/rv64 · opt · isa · libc; `make bootstrap` reproduces at -O0 AND -O1)
+
+| Commit | Track | Summary |
+|---|---|---|
+| `e27a288` | **1a / 1d** | Removed `SCOPE_IF` / `CGScopeDesc.cond` / the `scope_else` hook (both IR opcodes `CG_IR_SCOPE_ELSE` + `IR_SCOPE_ELSE`, all 5 realizations, the `desc.cond` opt walkers; ~22 files). Removed `CFREE_CG_TAIL_NEVER` (redundant with `DEFAULT`). |
+| `ae8d0f6` | **5** | Multi-result public API: `CfreeCgFuncSig.results[]`/`nresults` (+ `CfreeCgFuncResult`), `cfree_cg_type_func_nresults`/`_result`, `cfree_cg_ret_void` removed (void = 0-result `cfree_cg_ret`). Type system stores `results[]`; `cfree_cg_call` pushes/`cfree_cg_ret` pops in declaration order (last result on TOS). **Includes a self-host regression fix:** a no-value return on a *non-void* function (UB fall-off) now emits `cfree_cg_unreachable` instead of underflowing the value stack (`pcg_ret` in `lang/c/parse/cg_adapter.c`). |
+| `fabf255` | **3a** | Dropped `CFREE_CG_MEM_NONTEMPORAL`/`_INVARIANT` + `CfreeCgMemAccess.alias_scope`/`noalias_scope` (decision #5) and the matching toy attributes. |
+| `5e1335d` | **4 (FP_REM)** | Removed the `CFREE_CG_FP_REM` façade (always-panic; only dead callers). FP remainder is a libcall the frontend emits. |
+| `917ffe9` | **2 (AsmDir)** | Deleted internal `AsmDir` + `api_map_asm_dir`; `AsmConstraint.dir` and backends use public `CfreeCgAsmDir`. |
+| `a2f6367` | **2 (Atomic/Order)** | Deleted internal `AtomicOp`/`MemOrder` + `api_map_atomic_op`/`api_map_mem_order`; **both** the semantic `CgTarget` and physical `NativeTarget` atomic hooks, the recorder+opt IR aux, and the interpreter now carry public `CfreeCgAtomicOp`/`CfreeCgMemOrder`. |
+
+So **Tracks 1a/1d, 5, and 3a are done**; **Track 2** has its three identical enums done (`AsmDir`,
+`AtomicOp`, `MemOrder`), with the binop/unop/cmp split remaining; of **Track 4**, only FP_REM is removed.
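
The Track 5 stack-order rule in the table above can be checked against a toy model. `cfree_cg_call` pushes results in declaration order, so the last result ends up on TOS. `cfree_cg_ret` pops `nresults` values and expects that same order. This is a minimal sketch, not the `CfreeCg` implementation. `ToyStack` and the `toy_*` helpers are hypothetical and model only the ordering, with `nresults == 0` standing for void.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy value stack: models only the result-ordering rule. */
typedef struct {
    int64_t slots[16];
    uint32_t depth;
} ToyStack;

static void toy_push(ToyStack* s, int64_t v) {
    assert(s->depth < 16);
    s->slots[s->depth++] = v;
}

static int64_t toy_pop(ToyStack* s) {
    assert(s->depth > 0);
    return s->slots[--s->depth];
}

/* Call side: push in declaration order, so the last result ends up on TOS. */
static void toy_call_push_results(ToyStack* s, const int64_t* results, uint32_t nresults) {
    for (uint32_t i = 0; i < nresults; i++)
        toy_push(s, results[i]);
}

/* Ret side: the first pop yields the LAST declared result; nresults == 0 is void. */
static void toy_ret_pop_results(ToyStack* s, int64_t* out, uint32_t nresults) {
    for (uint32_t i = nresults; i > 0; i--)
        out[i - 1] = toy_pop(s);
}
```

Both sides walk the same declaration order, so a `ret` that pops into `out[nresults-1]` first reproduces exactly what the matching `call` pushed.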
+
+### Caveats / follow-ups discovered while doing the above
+- **Track 5 multi-result is not yet complete beyond single results.** The `-O0` native path handles
+ `nresults > 1`, but the **opt path** (`src/opt/cg_ir_lower.c`, the `CG_IR_CALL`/`CG_IR_RET`
+ lowering) still threads only `results[0]`, so a function with two or more results is lossy at `-O1`.
+ The **wasm frontend** (`lang/wasm/cg.c`) was also migrated as single-result (it takes
+ `f->results[0]`). True multi-value end-to-end (wasm + `-O1`) remains an unfinished follow-up.
+- **The C frontend keeps its own private copies** of `BinOp`/`AtomicOp`/`MemOrder`/
+ `IntrinKind` in `lang/c/parse/cg_adapter.h`. These are a **separate Principle-1 issue**,
+ deliberately left alone by Track 2 (they're a different namespace; do not blind-rename
+ `AO_*`/`MO_*`/`BO_*` across `lang/`). Worth a follow-up to dedupe against the public enums.
+- **Regression lesson** (in [[doc/plan/BOOTSTRAP.md]] / the self-build): removing a "bare
+ return that ignores result count" primitive means every frontend's *fall-off / default*
+ return must push the right number of values or terminate with `unreachable`. Audit other
+ frontends if you remove more return primitives.
---
-## Track 2 — Unify the op-enum vocabulary
+## Track 1c — Conditional control ops: completeness audit + tests (REMAINING, small)
-Every operation enum exists twice and is hand-mapped 1:1 in `value.c`:
+KEEP the full `{break, continue} × {unconditional, _true, _false}` set (Principle 7;
+`break_true`/`continue_true`/`continue_false` have 0 callers, `break_false` 1, but the set
+is the structured analog of `branch_true`/`branch_false`). Remaining work is **test
+coverage + an audit**, not code change:
+- Confirm `continue*` is rejected on non-loop scopes and block-vs-loop rules are uniform.
+- Add an end-to-end test for each `break_*`/`continue_*` variant on a backend (the
+ result-carrying semantics are specified at `cfree_cg_break_true` et al. in `cg.h`).
-| Public (`cg.h`) | Internal (`cgtarget.h`) | Relationship | Mapper |
-|---|---|---|---|
-| `CfreeCgAtomicOp` (7) | `AtomicOp` (7) | **identical** | `api_map_atomic_op` |
-| `CfreeCgMemOrder` (6) | `MemOrder` (6) | **identical** | `api_map_mem_order` |
-| `CfreeCgAsmDir` (3) | `AsmDir` (3) | **identical** | `api_map_asm_dir` |
-| `CfreeCgIntBinOp`+`CfreeCgFpBinOp` | `BinOp` | split→merged | `api_map_int_binop`/`api_map_fp_binop` |
-| `CfreeCgIntCmpOp`(10)+`CfreeCgFpCmpOp`(12) | `CmpOp` (14) | split→merged, **lossy** | `api_map_int_cmp`/`api_map_fp_cmp` |
-| `CfreeCgIntUnOp`+`CfreeCgFpUnOp` | `UnOp` | split→merged | `api_map_int_unop` |
-
-Two concrete defects:
-- The split→merge→split round-trip earns nothing: every native backend re-splits
- int/fp immediately (`aa64/native.c:2070`, `x64/native.c:906`).
-- The merge is **lossy**: `api_map_fp_cmp` collapses `OEQ`/`UEQ`→`CMP_EQ` (`value.c:648-668`)
- so the public ordered/unordered distinction cannot survive to a backend; and
- `api_map_fp_binop` maps `CFREE_CG_FP_REM`→`BO_FDIV` (`value.c:605`), which is dead
- *and* wrong-looking.
-
-**Decision (recommended):** `CgTarget` consumes the public `CfreeCg*` op enums directly.
-Delete the parallel internal `BinOp`/`UnOp`/`CmpOp`/`AtomicOp`/`MemOrder`/`AsmDir` and
-every `api_map_*` (~200 lines of `value.c`). `cgtarget.h` already `#include`s `cfree/cg.h`,
-so this is mechanical. Keep the public **int/fp split** (it is the clearer API and
-matches what backends do anyway); backends switch on `CfreeCgIntBinOp` and
-`CfreeCgFpBinOp` separately. This is a single-repo internal contract, not a published
-backend ABI, so coupling it to the public enum values is acceptable. See §Open
-decisions #2 for the split-vs-merged confirmation.
-
-**Affected:** `cgtarget.h` (enum deletions + signature changes on `binop`/`unop`/`cmp`/
-`atomic_*`/`fence`/`asm_block`), all 5 backends' switch sites, `ir_recorder.c` +
-`opt/` IR (the recorded op field changes type), `value.c`/`arith.c`/`atomic.c`/`asm.c`.
-**Tests:** ISA encode/decode (`test-isa`, `test-arch`), opt, smoke; add a case that
-exercises an unordered FP compare end-to-end (currently lossy).
+---
+
+## Track 2 (remaining) — Split the merged `BinOp`/`UnOp`/`CmpOp`
+
+The 3 *identical* enums (Atomic/Order/AsmDir) are done. What remains is the **split→merged**
+trio, which is the structural core of Track 2 (the largest remaining mechanical change):
+
+| Public (`cg.h`) | Internal (`cgtarget.h`) | Relationship |
+|---|---|---|
+| `CfreeCgIntBinOp`(13) + `CfreeCgFpBinOp`(4) | `BinOp` | split→merged |
+| `CfreeCgIntCmpOp`(10) + `CfreeCgFpCmpOp`(12) | `CmpOp`(14) | split→merged, **lossy** |
+| `CfreeCgIntUnOp`(3) + `CfreeCgFpUnOp`(1) | `UnOp` | split→merged |
+
+**Why it matters:** the merge is **lossy** — `api_map_fp_cmp` collapses `OEQ`/`UEQ` → one
+`CMP_EQ` (`value.c`), so the public ordered/unordered FP-compare distinction cannot reach a
+backend. Fixing that is the real correctness win; the binop/unop dedup is a consistency cleanup.
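
The collapse destroys a specific distinction. An ordered compare is false whenever either operand is NaN, and an unordered one is true in that case. The minimal C sketch below assumes the public `OEQ`/`UEQ` follow the usual IEEE 754 ordered/unordered reading; the `fcmp_*` names are hypothetical. The two predicates agree on ordinary numbers and differ only on NaN, so the regression test for this track needs a NaN operand.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* OEQ: ordered and equal; false if either operand is NaN. */
static bool fcmp_oeq(double a, double b) { return !isunordered(a, b) && a == b; }

/* UEQ: unordered or equal; true if either operand is NaN. */
static bool fcmp_ueq(double a, double b) { return isunordered(a, b) || a == b; }
```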
+
+**Decision (#2): `CgTarget` consumes the public split enums directly; backends switch on
+`CfreeCgIntBinOp` and `CfreeCgFpBinOp` separately.** Delete `BinOp`/`UnOp`/`CmpOp` and
+`api_map_int_binop`/`api_map_fp_binop`/`api_map_int_unop`/`api_map_int_cmp`/`api_map_fp_cmp`.
+
+### Why this is bigger than the atomic slice — the design to implement
+Unlike Atomic/Order (a 1:1 value-preserving *rename*), this is a genuine **split**:
+
+1. **Hooks split** (`cgtarget.h`; binop/cmp are semantic `CgTarget` hooks, but check whether any also
+ surface in `native_target.h` and mirror the split there if so): `binop`→`int_binop`/`fp_binop`,
+ `unop`→`int_unop`/`fp_unop`, `cmp`→`int_cmp`/`fp_cmp`, `cmp_branch`→
+ `int_cmp_branch`/`fp_cmp_branch`.
+2. **IR opcodes double.** The recorder (`src/cg/ir.h`) and opt IR (`src/opt/ir.h`) store the
+ op in `extra.imm`/aux, so a single `CG_IR_BINOP`/`IR_BINOP` storing a raw public enumerator
+ cannot tell int from fp: the two enums' values overlap (`CFREE_CG_INT_ADD == CFREE_CG_FP_ADD == 0`).
+ Either split the opcodes
+ (`CG_IR_INT_BINOP`/`CG_IR_FP_BINOP`, …) **or** add an `is_fp` discriminator bit. Splitting
+ the opcodes is cleaner; both touch `ir_recorder.c`, `cg_ir_lower.c`, `pass_native_emit.c`,
+ `ir_dump.c`/`ir_print.c`, and every opt pass that switches on `IR_BINOP`/`IR_CMP`/
+ `IR_CMP_BRANCH` (`pass_combine`, `pass_simplify`, `pass_o2`, `pass_jump`, …).
+3. **Fold layer restructures** (`arith.c` + `value.c`). `api_cg_binop(BinOp)` /
+ `api_cg_unop(UnOp)` / `api_cg_cmp(CmpOp)` are the shared dispatch. Note the int/fp split
+ is **already made by TYPE** (`api_type_is_float`), not by the enum — so splitting the
+ dispatch is natural: the int path keeps the fold (`api_try_fold_int_binop`/`_unop`/`_cmp`,
+ int-only) and the delayed forms (`SV_ARITH` arith, `SV_CMP` compare; `ApiDelayedArith.bin_op`/
+ `un_op`, `ApiDelayedCmp.op`, `api_make_cmp`, `api_materialize_cmp_to`, `api_invert_cmp`,
+ `api_branch_if`); the fp path is simpler (the f128 helper path in `cfree_cg_fp_binop`
+ already exists, plus the fp hook). **This is the subtle, high-risk part** — get the
+ delayed-compare fusion + constant-fold right per int/fp. Coordinate with Track 6.2 (which
+ moves these into `fold.c`); doing 6.2 first may make this cleaner.
+4. **Backends split their switches:** the 3 native arches (`aa64`/`x64`/`rv64` `native.c` —
+ they already re-split int/fp internally), `c_target/c_emit.c`, `wasm/emit.c`, and the
+ interpreter (`interp/engine.c`).
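
Item 2's ambiguity shows up clearly in a small model. Once int and fp binops share a numeric space, the opcode must record which enum the stored value belongs to. This is a hypothetical sketch of the split-opcode option. The `Toy*` enums stand in for `CfreeCgIntBinOp`/`CfreeCgFpBinOp`. Only the shared-zero `ADD` comes from the text above; the other enumerators are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the two public enums: both start at 0, so ADD collides. */
typedef enum { TOY_INT_ADD = 0, TOY_INT_SUB, TOY_INT_MUL } ToyIntBinOp;
typedef enum { TOY_FP_ADD = 0, TOY_FP_SUB, TOY_FP_MUL, TOY_FP_DIV } ToyFpBinOp;

/* Split opcodes: the opcode says which enum the aux field holds. */
typedef enum { TOY_IR_INT_BINOP, TOY_IR_FP_BINOP } ToyIrOpcode;

typedef struct {
    ToyIrOpcode opcode;
    int64_t imm; /* raw public enumerator; meaningless without the opcode */
} ToyIrInst;

static ToyIrInst toy_record_int_binop(ToyIntBinOp op) {
    ToyIrInst inst = {TOY_IR_INT_BINOP, (int64_t)op};
    return inst;
}

static ToyIrInst toy_record_fp_binop(ToyFpBinOp op) {
    ToyIrInst inst = {TOY_IR_FP_BINOP, (int64_t)op};
    return inst;
}

/* A pass switches on the opcode first, then decodes imm in the right enum. */
static const char* toy_describe(ToyIrInst inst) {
    switch (inst.opcode) {
    case TOY_IR_INT_BINOP:
        return inst.imm == TOY_INT_ADD ? "int add" : "int other";
    case TOY_IR_FP_BINOP:
        return inst.imm == TOY_FP_ADD ? "fp add" : "fp other";
    }
    return "unknown";
}
```

The `is_fp` discriminator alternative would move the same bit into the aux field. Every pass would then have to test it before reading `imm`, which is why split opcodes read as the cleaner choice.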
+
+**Method that worked for the atomic slice:** delete the internal enum + change the hook
+signatures, then let `-Werror` enumerate every cg-side site (the C frontend's `cg_adapter.h`
+copy won't be flagged — it's a different type). Then fix per file. For the value-label
+renames, sed **only** within `src/cg|arch|opt|interp` (never `lang/`, never `src/wasm/`).
+
+**Tests:** `test-isa`/`test-arch` (encode/decode), `test-opt`, smoke; **add an unordered FP
+compare exercised end-to-end** (the currently-lossy case) — that's the regression guard for
+the real fix.
---
-## Track 3 — Unify duplicated representations
-
-### 3a. Two `MemAccess` structs + advertise-but-ignore flags
-Public `CfreeCgMemAccess` vs internal `MemAccess`, with non-overlapping flag enums
-(`CfreeCgMemAccessFlag` vs `MemFlag`). `api_mem_from_access` (`value.c:284-295`)
-translates only `VOLATILE`; **`NONTEMPORAL` and `INVARIANT` are silently dropped**, and
-`alias_scope`/`noalias_scope` are **never read** by anything.
-
-**Action:** either (a) carry `NONTEMPORAL`/`INVARIANT` through to an internal carrier
-and into at least one backend, or (b) remove them from `CfreeCgMemAccess`. Remove
-`alias_scope`/`noalias_scope` until there is a consumer. Keep one access struct as the
-source of truth; derive the internal one by a single documented projection (not a
-parallel hand-maintained type). Recommendation: (b) remove now — no frontend sets them
-except toy, and there is no internal model for them.
-
-### 3b. Bitfields exist in three representations
-- Public rider on `CfreeCgMemAccess` (`bit_offset`/`bit_width`/`storage_size`/`bit_signed`).
-- Public rider on `CfreeCgField` (`bit_width`/`bit_offset`/`bit_storage_size`/`bit_signed`
- — note `storage_size` vs `bit_storage_size` naming drift).
-- Internal dedicated `BitFieldAccess` + `bitfield_load`/`bitfield_store`.
-
-The public load/store carry 4 bitfield fields that most callers zero, bridged by
-`bf_from_access` (`memory.c:364`) into the dedicated internal path.
-
-**Action:** expose **dedicated public bitfield ops** (`cfree_cg_bitfield_load`/`_store`
-taking an explicit `CfreeCgBitField` struct), mirroring the internal shape. Drop the
-bitfield fields from `CfreeCgMemAccess` entirely. Keep `CfreeCgField`'s layout-query
-fields (they answer record-layout queries) but rename for consistency. This removes the
-"every memop is secretly maybe-a-bitfield" branch from `cfree_cg_load`/`_store`
-(`memory.c:420,577,646`).
-
-### 3c. `scale` vs `log2_scale` — **superseded by Track 7**
-The public `CfreeCgEffAddr` rider is removed entirely in Track 7 (its base+index*scale+
-offset job moves into the place representation built by `field`/`elem`). The scale-form
-mismatch disappears with it. No separate action here.
+## Track 3b — Bitfields as a PLACE subkind (REMAINING — do with/after Track 7)
+
+Three representations today: a rider on `CfreeCgMemAccess`
+(`bit_offset`/`bit_width`/`storage_size`/`bit_signed`), a rider on `CfreeCgField`
+(`bit_width`/`bit_offset`/`bit_storage_size`/`bit_signed` — note `storage_size` vs
+`bit_storage_size` drift), and internal `BitFieldAccess` + `bitfield_load`/`_store`.
+
+**Decision (#7): a bitfield is a PLACE subkind** carrying the descriptor; the normal
+`load`/`store` perform the extract/insert. This merges into Track 7 (it depends on the
+place model). Drop the bitfield fields from `CfreeCgMemAccess`; keep `CfreeCgField`'s
+layout-query fields but fix the naming drift. Removes the "every memop is secretly
+maybe-a-bitfield" branch in `cfree_cg_load`/`_store`.
**Affected:** `cg.h`, `cgtarget.h`, `memory.c`, all backends' `load`/`store`/`bitfield_*`,
-the C frontend's bitfield path (`lang/c/parse/cg_adapter.c`).
-**Tests:** bitfield corpus in toy + C; `test-cg-api`.
+`lang/c/parse/cg_adapter.c`. **Tests:** bitfield corpus in toy + C; `test-cg-api`.
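
Once a bitfield is a PLACE subkind, `load` reads the storage unit and then extracts the field, and `store` does a read-modify-write to insert it. The minimal sketch below works over a 64-bit storage unit. `ToyBitField` is hypothetical, modeled on the rider fields listed above (`bit_offset`, `bit_width`, `bit_signed`). It assumes offsets are counted from the LSB and that `bit_offset + bit_width <= 64`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor, modeled on the rider fields named above. */
typedef struct {
    uint32_t bit_offset; /* counted from the LSB of the storage unit */
    uint32_t bit_width;  /* 1..64, with bit_offset + bit_width <= 64 */
    bool bit_signed;
} ToyBitField;

static uint64_t toy_bf_mask(uint32_t width) {
    return width >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1;
}

/* load: extract the field from an already-loaded storage unit. */
static int64_t toy_bf_extract(uint64_t storage, ToyBitField bf) {
    assert(bf.bit_width >= 1 && bf.bit_offset + bf.bit_width <= 64);
    uint64_t v = (storage >> bf.bit_offset) & toy_bf_mask(bf.bit_width);
    if (bf.bit_signed && bf.bit_width < 64) {
        uint64_t sign = UINT64_C(1) << (bf.bit_width - 1);
        v = (v ^ sign) - sign; /* sign-extend via unsigned arithmetic */
    }
    return (int64_t)v; /* two's-complement reinterpretation */
}

/* store: read-modify-write, leaving neighboring bits untouched. */
static uint64_t toy_bf_insert(uint64_t storage, ToyBitField bf, uint64_t value) {
    assert(bf.bit_width >= 1 && bf.bit_offset + bf.bit_width <= 64);
    uint64_t mask = toy_bf_mask(bf.bit_width) << bf.bit_offset;
    return (storage & ~mask) | ((value << bf.bit_offset) & mask);
}
```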
---
-## Track 4 — Fix the op/intrinsic taxonomy
-
-Today "op vs intrinsic" is drawn inconsistently across and within layers:
-- `memcpy`/`memset`: dedicated **public ops**, internal **intrinsics** (`INTRIN_MEMCPY`…).
-- `unreachable`: public **op** documented as "a real terminator, not a side-effect
- intrinsic" (`cg.h:560`) — yet lowered through the **intrinsic** hook (`control.c:401`,
- `INTRIN_UNREACHABLE`). Direct doc/impl contradiction.
-- `trap`: public **intrinsic**.
-- `bswap`: **1** public intrinsic but **3** internal (`BSWAP16/32/64`), split by a
- size test in `api_map_intrinsic` (`arith.c:803-806`).
-
-**The rule (proposed):**
-- **Terminators are first-class `CgTarget` ops** (ret, unreachable, jump, branch,
- computed_goto, tail-call). Give `unreachable` its own hook and honor its documented
- terminator status; stop routing it through `intrinsic`.
-- **Primitives that may lower to either an inline sequence or a libcall are intrinsics**
- (clz/ctz/popcount/bswap/overflow/fma/memcpy/memset). Decide each concept's home once
- and make public+internal agree. Recommendation: keep `memcpy`/`memset` as dedicated
- *public* ops (they carry rich `MemAccess`) but stop double-modeling them as a separate
- public *intrinsic* surface.
-- **Width comes from the operand type, not the opcode.** Collapse `BSWAP16/32/64` → one
- `BSWAP`; backends read width from the operand. Deletes the size-branch in
- `api_map_intrinsic`.
-
-### 4b. Façade intrinsics (ties into Track 1)
+## Track 4 (remaining) — op/intrinsic taxonomy
+
+FP_REM removal is done. Remaining:
+
+### 4a. Width-by-type: collapse `BSWAP16/32/64` → one `BSWAP`
+Internal `IntrinKind` has 3 bswaps; public has 1 (`CFREE_CG_INTRIN_BSWAP`). `api_map_intrinsic`
+(`arith.c`) picks the internal one by `abi_cg_sizeof(result_type)`. **Feasible to collapse:**
+`NativeLoc` carries `.type` and `NativeTarget` has `t->c->abi`, so backends derive width from
+`dsts[0].type` (the result type, the same source the size-branch uses). To collapse, wrap each
+backend's three existing sequences under a `switch(width)`; preserve the sequences verbatim.
+Touches `cgtarget.h` (enum), `arith.c` (drop the size-branch), and the bswap cases in
+`aa64`/`x64`/`rv64` `native.c`, `interp/engine.c`, `c_target/c_emit.c`, and **wasm
+(`arch/wasm/emit.c`, multi-site ~1577/1708/2894/3113 + capability path)**. NOTE the C
+frontend's `cg_adapter.h` has its own `INTRIN_BSWAP16/32/64`; leave it (it maps to the public
+single `BSWAP` at the call site). Pure internal dedup — public API unchanged.
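
The collapsed shape can be written as a portable reference. There is one `BSWAP` whose width comes from the result type; here a byte count stands in for `abi_cg_sizeof` of `dsts[0].type`. The three width-specific sequences are kept intact under one switch. This is a minimal sketch: `toy_bswap` is hypothetical, and real backends would keep emitting their existing instruction sequences rather than these shifts.

```c
#include <assert.h>
#include <stdint.h>

/* One BSWAP for every width. width_bytes stands in for the size of the
 * result type; the three per-width sequences are kept, just dispatched here. */
static uint64_t toy_bswap(uint64_t v, uint32_t width_bytes) {
    switch (width_bytes) {
    case 2:
        return (uint16_t)(((v >> 8) & 0xFFu) | ((v & 0xFFu) << 8));
    case 4: {
        uint32_t x = (uint32_t)v;
        return (x >> 24) | ((x >> 8) & 0xFF00u) | ((x << 8) & 0xFF0000u) | (x << 24);
    }
    case 8: {
        uint64_t r = 0;
        for (int i = 0; i < 8; i++)
            r = (r << 8) | ((v >> (8 * i)) & 0xFFu);
        return r;
    }
    default:
        assert(0 && "bswap width must be 2, 4, or 8 bytes");
        return v;
    }
}
```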
+
+### 4b. `unreachable` as a first-class terminator hook
+`cfree_cg_unreachable` is documented "a real terminator, not a side-effect intrinsic"
+(`cg.h`) but is routed through the **intrinsic** hook (`control.c`, `INTRIN_UNREACHABLE`).
+Give it its own `CgTarget` hook + its own IR op (recorder + opt), and move the 5 backends'
+`INTRIN_UNREACHABLE` handling onto it. (Terminators are first-class: ret, unreachable, jump,
+branch, computed_goto, tail-call.)
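
A dedicated hook matters because a terminator ends the current block, and a side-effecting `intrinsic` call does not express that. The sketch below is hypothetical and heavily reduced: `ToyTarget`, `ToyBuilder`, and the `toy_*` functions are illustrative, not the real `CgTarget` layout. In it, the front door routes `unreachable` to its own hook and records that the block is closed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced target vtable. */
typedef struct ToyTarget ToyTarget;
struct ToyTarget {
    void (*intrinsic)(ToyTarget* t, int kind); /* side effect; block continues */
    void (*unreachable)(ToyTarget* t);         /* terminator; block ends */
    int intrinsic_calls;
    int unreachable_calls;
};

typedef struct {
    ToyTarget* t;
    bool block_terminated; /* nothing may be appended until a new block starts */
} ToyBuilder;

static void toy_target_intrinsic(ToyTarget* t, int kind) {
    (void)kind;
    t->intrinsic_calls++;
}

static void toy_target_unreachable(ToyTarget* t) { t->unreachable_calls++; }

/* Front door: dedicated hook, and the builder records that the block is closed. */
static void toy_cg_unreachable(ToyBuilder* b) {
    assert(!b->block_terminated);
    b->t->unreachable(b->t);
    b->block_terminated = true;
}
```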
+
+### 4c. Façade intrinsics: query + implement the trivial ones
`api_map_intrinsic` maps ~16 enumerators (`FMA`, `SYSCALL`, all `IRQ_*`, `DMB`/`DSB`/`ISB`,
`DCACHE_*`/`ICACHE_*`, `CPU_NOP`/`CPU_YIELD`/`WFI`/`WFE`/`SEV`, `CORO_SWITCH`) → `INTRIN_NONE`,
-and `cfree_cg_intrinsic` turns `INTRIN_NONE` into `compiler_panic("unsupported intrinsic")`
-(`arith.c:884`). The toy frontend calls them in good faith (`builtins.c:507`); the
-expected-error test `test/toy/err/unsupported_cpu_nop.toy` confirms the panic is the
-*current intended behavior*. `CFREE_CG_FP_REM` is the same (`arith.c:573`). And unlike
-call-convs/symbol-features, there is **no `supports_` query for intrinsics**, so a
-frontend cannot check before it panics.
-
-**Action:**
-1. Add `cfree_cg_target_supports_intrinsic(CfreeCompiler*, CfreeCgIntrinsic)`, consistent
- with `cfree_cg_target_supports_call_conv`/`_symbol_feature`.
-2. Convert the bare `compiler_panic` into a proper unsupported-feature diagnostic.
-3. Implement the trivial single-instruction baremetal/CPU intrinsics on native arches
- (`cpu_nop`/`cpu_yield`/`wfi`/`wfe`/`sev`/`isb`/`dmb`/`dsb`/`irq_*`) — these are one
- instruction each and the toy frontend already wants them.
-4. Leave `FMA`/`SYSCALL`/`CORO_SWITCH` reported `false` by the query until implemented;
- remove `CFREE_CG_FP_REM` (no path, and fp rem is a libcall the frontend can emit).
-
-See §Open decisions #3 (implement-vs-formally-unsupported per intrinsic).
+and `cfree_cg_intrinsic` turns `INTRIN_NONE` into a bare `compiler_panic` (`arith.c`). The toy
+frontend calls them in good faith; `test/toy/err/unsupported_*` encode the panic as current
+behavior. There is **no `supports_` query for intrinsics**.
-**Affected:** `cg.h`, `cgtarget.h`, `arith.c`, `control.c`, native backends'
-`intrinsic`, `lang/toy/builtins.c`, `test/toy/err/`.
-**Tests:** add `supports_intrinsic` coverage; convert the toy err-cases that become
-supported into positive smoke cases.
+1. Add `cfree_cg_target_supports_intrinsic(CfreeCompiler*, CfreeCgIntrinsic)` (mirror
+ `cfree_cg_target_supports_call_conv`/`_symbol_feature`). Needs a per-arch capability source.
+2. Convert the bare `compiler_panic` into a proper unsupported-feature diagnostic.
+3. Implement the trivial single-instruction baremetal/CPU intrinsics on the native arches
+ (`cpu_nop`/`cpu_yield`/`wfi`/`wfe`/`sev`/`isb`/`dmb`/`dsb`/`irq_*`) — one instruction each;
+ convert the corresponding `test/toy/err/` cases to positive smoke cases.
+4. Leave `FMA`/`SYSCALL`/`CORO_SWITCH` reported `false` until implemented.
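
Below is one possible shape for the query and its per-arch capability source, with the bare panic replaced by a diagnostic the caller can report. Everything here is hypothetical. The `Toy*` enums stand in for the arch and `CfreeCgIntrinsic`. The bits each arch sets are illustrative only and do not state what each native backend will implement. `FMA` stays unset everywhere, per item 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { TOY_ARCH_AA64, TOY_ARCH_X64, TOY_ARCH_RV64, TOY_ARCH_COUNT } ToyArch;
typedef enum {
    TOY_INTRIN_CPU_NOP,
    TOY_INTRIN_CPU_YIELD,
    TOY_INTRIN_WFI,
    TOY_INTRIN_FMA,
    TOY_INTRIN_COUNT
} ToyIntrinsic;

#define TOY_BIT(i) (UINT32_C(1) << (i))

/* Illustrative per-arch capability table; FMA is unset until implemented. */
static const uint32_t toy_intrin_caps[TOY_ARCH_COUNT] = {
    [TOY_ARCH_AA64] = TOY_BIT(TOY_INTRIN_CPU_NOP) | TOY_BIT(TOY_INTRIN_CPU_YIELD) | TOY_BIT(TOY_INTRIN_WFI),
    [TOY_ARCH_X64] = TOY_BIT(TOY_INTRIN_CPU_NOP) | TOY_BIT(TOY_INTRIN_CPU_YIELD),
    [TOY_ARCH_RV64] = TOY_BIT(TOY_INTRIN_CPU_NOP) | TOY_BIT(TOY_INTRIN_WFI),
};

/* The query a frontend calls before emitting. */
static bool toy_supports_intrinsic(ToyArch arch, ToyIntrinsic in) {
    if ((unsigned)arch >= (unsigned)TOY_ARCH_COUNT || (unsigned)in >= (unsigned)TOY_INTRIN_COUNT)
        return false;
    return (toy_intrin_caps[arch] & TOY_BIT(in)) != 0;
}

/* Emit path: report a diagnostic and return false instead of panicking. */
static bool toy_emit_intrinsic(ToyArch arch, ToyIntrinsic in, char* diag, size_t ndiag) {
    if (!toy_supports_intrinsic(arch, in)) {
        snprintf(diag, ndiag, "intrinsic %d is unsupported on this target", (int)in);
        return false;
    }
    /* A real backend would lower to its one-instruction sequence here. */
    return true;
}
```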
----
+
+Also settle: keep `memcpy`/`memset` as dedicated *public* ops (they carry rich `MemAccess`)
+but stop double-modeling them as a separate public *intrinsic* surface.
-## Track 5 — Expose multi-result publicly
-
-The internal stack is already multi-result: `CGCallDesc`/`CGFuncDesc`/`ret` carry
-`nresults`/`nvalues`, and backends realize `>1` via `plan_call`/`plan_ret` (no backend
-asserts ≤1). But the public API tops out at one: `CfreeCgFuncSig` has a single `ret`
-(`cg.h:102`), `session.c:318-324` fills `fn_result_types[1]` with 0 or 1, and
-`cfree_cg_call`/`call_symbol` push exactly one result (`call.c:228,287,161`), `cfree_cg_ret`
-pops one (`call.c:316`). **Decision: expose it.** Because backends already handle it,
-this is a public-API + type-system + `value.c` change with **no backend work**.
-
-### API shape
-```c
-/* Symmetric with CfreeCgFuncParam. */
-typedef struct CfreeCgFuncResult { CfreeCgTypeId type; CfreeCgAbiAttrs attrs; } CfreeCgFuncResult;
-
-typedef struct CfreeCgFuncSig {
- const CfreeCgFuncResult* results; /* was: CfreeCgTypeId ret; CfreeCgAbiAttrs ret_attrs; */
- uint32_t nresults; /* 0 = void */
- const CfreeCgFuncParam* params;
- uint32_t nparams;
- CfreeCgCallConv call_conv;
- bool abi_variadic;
-} CfreeCgFuncSig;
-```
-- Type queries: replace `cfree_cg_type_func_ret`/`_ret_attrs` with
- `cfree_cg_type_func_nresults` + `cfree_cg_type_func_result(idx)`.
-- Type system: `CgType.func` stores `results[]`+`nresults`; interning (`type.c:344`) and
- `cg_type_func_ret_id` (`type.c:268,827`) updated.
-- `CfreeCg`: `fn_ret_type`/`fn_result_types[1]` → a small results array.
-- **Stack-order convention (must be specified):** results are pushed by `cfree_cg_call`
- in declaration order, so TOS is the last result; `cfree_cg_ret` pops `nresults` values
- expecting the same order (last result on top). Document this on both calls.
-- `void` is `nresults==0`; **`cfree_cg_ret_void` is removed** (decision #4): a void
- function returns via `cfree_cg_ret` with 0 results — one return entry point.
-
-**Affected:** `cg.h`, `type.c`/`type.h`, `session.c`, `call.c`, every frontend's
-func-type construction and `cfree_cg_type_func_ret` caller (C/toy/wasm adapters), wasm
-backend can now surface true multi-value returns; every `cfree_cg_ret_void` caller
-migrates to a 0-result `cfree_cg_ret`.
-**Tests:** new `test-cg-api` + toy cases returning 2 values; wasm multi-value smoke.
+**Affected:** `cg.h`, `cgtarget.h`, `arith.c`, `control.c`, native backends' `intrinsic`,
+`lang/toy/builtins.c`, `test/toy/err/`.
---
## Track 6 — Isolate and complete the semantic peephole
-The semantic layer is also a `-O0` peephole optimizer, and that is **a feature we keep**
-(free `-O0` perf, Principle 6). This track gives it a named home and restores the half
-that was switched off.
+The semantic layer is also a `-O0` peephole optimizer — a **kept feature** (Principle 6).
### Current state
-- **Live:** constant folding (`api_try_fold_int_binop`/`_unop`/`_cmp`, driven from
- `arith.c:44,126,171`) and the `SV_CMP` fused-compare-into-branch path
- (`api_make_cmp`/`api_materialize_cmp_to`/`api_branch_if`).
-- **Disabled (not dead-by-design):** the `SV_ARITH` delayed-arith subsystem, gated by
- `api_can_delay_int_arith()==0`. It was live until `a126bec` flipped it off to ship the
- EA rider (Track 1b). Track 7 removes that rider.
-- **Live:** scalar store-to-load forwarding (`api_local_const_*`, `value.c:939-1036`).
+- **Live:** constant folding (`api_try_fold_int_binop`/`_unop`/`_cmp`, from `arith.c`) and
+ the `SV_CMP` fused-compare-into-branch path (`api_make_cmp`/`api_materialize_cmp_to`/
+ `api_branch_if`).
+- **Disabled (not dead):** the `SV_ARITH` delayed-arith subsystem, gated off
+  (`api_can_delay_int_arith()` returns 0). It was live until commit `a126bec` flipped it off
+  to ship the EA (`CfreeCgEffAddr`) rider; **Track 7 removes that rider**, so re-enabling is clean.
+- **Live:** scalar store-to-load forwarding (`api_local_const_*`).
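+
+A minimal sketch of the `SV_CMP` lifecycle, assuming a hypothetical `SValue` with a
+pending-compare kind and a toy emitter that prints pseudo-instructions. `make_cmp`,
+`materialize_cmp`, and `branch_if` are illustrative stand-ins, not the signatures of
+`api_make_cmp`/`api_materialize_cmp_to`/`api_branch_if`. A compare is recorded rather than
+emitted; a following branch fuses it into one compare-and-branch, and any other consumer
+materializes it to 0/1 first.
+
+```c
+#include <assert.h>
+#include <stdio.h>
+#include <string.h>
+
+/* Hypothetical stack value: a materialized register or a pending compare. */
+typedef enum { SV_REG, SV_CMP } SvKind;
+typedef enum { CMP_EQ, CMP_LT } CmpOp;
+
+typedef struct {
+  SvKind kind;
+  int reg;      /* SV_REG: register holding the value */
+  CmpOp op;     /* SV_CMP: pending comparison ... */
+  int lhs, rhs; /* ... between these two registers */
+} SValue;
+
+/* Toy emitter: appends pseudo-instructions to a buffer. */
+static char out[512];
+static void emit(const char *line) {
+  strcat(out, line);
+  strcat(out, "\n");
+}
+
+static const char *cmp_name(CmpOp op) { return op == CMP_EQ ? "eq" : "lt"; }
+
+/* Record the compare; nothing is emitted yet. */
+static SValue make_cmp(CmpOp op, int lhs, int rhs) {
+  SValue v = {SV_CMP, -1, op, lhs, rhs};
+  return v;
+}
+
+/* Non-branch consumer: materialize a pending compare into register `dst` as 0/1. */
+static SValue materialize_cmp(SValue v, int dst) {
+  char buf[64];
+  if (v.kind != SV_CMP) return v;
+  snprintf(buf, sizeof buf, "set%s r%d, r%d, r%d", cmp_name(v.op), dst, v.lhs, v.rhs);
+  emit(buf);
+  SValue r = {SV_REG, dst, CMP_EQ, 0, 0};
+  return r;
+}
+
+/* Branch consumer: a pending compare fuses into one compare-and-branch. */
+static void branch_if(SValue v, const char *label) {
+  char buf[64];
+  if (v.kind == SV_CMP)
+    snprintf(buf, sizeof buf, "b%s r%d, r%d, %s", cmp_name(v.op), v.lhs, v.rhs, label);
+  else
+    snprintf(buf, sizeof buf, "bnz r%d, %s", v.reg, label);
+  emit(buf);
+}
+```
+
+For example, `branch_if(make_cmp(CMP_LT, 1, 2), "L1")` emits only `blt r1, r2, L1`; in the
+sketch, a 0/1 value is produced only when a non-branch consumer forces it.
+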
### Action
1. **6.2 — Extract the live peephole into `src/cg/fold.c` + `fold.h`** with a documented
- contract: integer fold helpers, the `SV_CMP` lifecycle (make/release/materialize/
- branch-fuse), and const-local forwarding with its invalidation boundaries
- (`api_local_const_memory_boundary`/`_control_boundary`/`_address_taken`). The op
- families (`arith.c`/`memory.c`/`control.c`/`call.c`) call into `fold.h` instead of
- reaching into `value.c` internals. This also settles `ApiSValue`'s shape before Track 7.
-2. **6.3 — Re-enable delayed arith *after* Track 7** (once the EA rider is gone). Restore
- the original gate (`g && !flags && api_foldable_int_type(...)`), bring
- `api_make_arith_*`/`api_materialize_arith_to`/`api_release_arith`/the fold-chain +
- identity-collapse helpers under `fold.c`, and verify the delayed forms now compose with
- the place/value model (the old conflict was specifically the EA rider). Net `-O0` win:
- small immediates flow into `binop`, arith chains and identities fold.
-3. **Fix [doc/CODEGEN.md](../CODEGEN.md)** to match the restored, isolated peephole.
-
-**Affected:** `value.c`, `arith.c`, `internal.h`, new `fold.c`/`fold.h`, `doc/CODEGEN.md`.
-**Tests:** `-O0` smoke + opt suites; snapshot-diff to confirm the peephole *improves*
-`-O0` codegen (const-fold, fused compare, delayed arith) with no `-O1+` regression.
+ contract: integer fold helpers, the `SV_CMP` lifecycle, and const-local forwarding with
+ its invalidation boundaries (`api_local_const_memory_boundary`/`_control_boundary`/
+ `_address_taken`). Op families call into `fold.h` instead of reaching into `value.c`
+ internals. **This settles `ApiSValue`'s shape — do it before Track 7, and it eases the
+ Track 2 binop/cmp split (the fold layer is the entangled part there).**
+2. **6.3 — Re-enable delayed arith *after* Track 7.** Restore the gate
+ (`g && !flags && api_foldable_int_type(...)`); bring `api_make_arith_*`/
+ `api_materialize_arith_to`/`api_release_arith`/the fold-chain + identity-collapse helpers
+ under `fold.c`; verify they compose with the place/value model.
+3. **Fix [doc/CODEGEN.md](../CODEGEN.md)** to match the restored, isolated peephole (it
+ currently documents delayed arith as live).
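+
+The fold helpers that 6.2 moves and the identity collapse that 6.3 restores are small pure
+functions over operands. A minimal sketch, assuming a hypothetical `Operand` (constant or
+register) and a subset of integer ops; `try_fold_int_binop` and `try_identity` mirror the
+roles of `api_try_fold_int_binop` and the identity-collapse helpers but are not their real
+signatures. Folded results wrap to the operand width, so an 8-bit `200 + 100` folds to `44`.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stdint.h>
+
+/* Hypothetical op subset and operand shape; the real helpers work on ApiSValue. */
+typedef enum { OP_ADD, OP_SUB, OP_MUL, OP_AND, OP_OR } IntOp;
+
+typedef struct {
+  bool is_const; /* true: value is `c`; false: value lives in register `reg` */
+  uint64_t c;
+  int reg;
+} Operand;
+
+/* Wrap a result to `bits` of width (two's complement, stored zero-extended). */
+static uint64_t wrap(uint64_t v, unsigned bits) {
+  return bits >= 64 ? v : (v & ((UINT64_C(1) << bits) - 1));
+}
+
+/* Constant fold: two constants become one constant, wrapped to the type width. */
+static bool try_fold_int_binop(IntOp op, unsigned bits, Operand a, Operand b, Operand *out) {
+  uint64_t r;
+  if (!a.is_const || !b.is_const) return false;
+  switch (op) {
+    case OP_ADD: r = a.c + b.c; break;
+    case OP_SUB: r = a.c - b.c; break;
+    case OP_MUL: r = a.c * b.c; break;
+    case OP_AND: r = a.c & b.c; break;
+    case OP_OR:  r = a.c | b.c; break;
+    default: return false;
+  }
+  out->is_const = true;
+  out->c = wrap(r, bits);
+  out->reg = -1;
+  return true;
+}
+
+/* Identity collapse (constant on the right):
+ * x+0, x-0, x|0, x*1, x&all_ones -> x;  x&0, x*0 -> 0. */
+static bool try_identity(IntOp op, unsigned bits, Operand x, Operand k, Operand *out) {
+  uint64_t ones = wrap(~UINT64_C(0), bits);
+  if (x.is_const || !k.is_const) return false;
+  if ((op == OP_ADD || op == OP_SUB || op == OP_OR) && k.c == 0) { *out = x; return true; }
+  if (op == OP_MUL && k.c == 1) { *out = x; return true; }
+  if (op == OP_AND && k.c == ones) { *out = x; return true; }
+  if ((op == OP_AND || op == OP_MUL) && k.c == 0) {
+    out->is_const = true;
+    out->c = 0;
+    out->reg = -1;
+    return true;
+  }
+  return false;
+}
+```
+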
---
-## Track 7 — Strict place/value discipline (the centerpiece)
+## Track 7 — Strict place/value discipline (the centerpiece, UNTOUCHED)
-**Decided:** Model B (explicit place/value kinds); wide-16 scalars are *values*.
+**Decided:** Model B (explicit place/value kinds); wide-16 scalars are *values*. (Track 3c —
+the `scale` vs `log2_scale` rider mismatch — is subsumed here: the `CfreeCgEffAddr` rider is
+removed entirely.)
Today the value stack carries an **inferred** lvalue/rvalue distinction and several ops
-accept multiple operand shapes and dispatch on type + shape. A stack slot's meaning is
-*computed*, not declared. The inference points:
-
-- **`api_is_lvalue_sv` is a heuristic** (`value.c:176-180`): ORs the `lvalue` flag,
- `bitfield_lvalue`, `api_operand_can_address`, and `source_local!=NONE && OPK_LOCAL`.
-- **`cfree_cg_load` has ~7 behaviors, several of which don't load** (`memory.c:436-568`):
- aggregate-lvalue@0 re-pushed as-is; ptr-rvalue-to-aggregate re-pushed; `OPK_GLOBAL`
- aggregate/wide16 flips `lvalue=1`; scalar-local returns the local value directly;
- wide16 keeps storage; then two general lvalue/ptr-rvalue paths.
+dispatch on type + shape. Inference points to remove:
+- **`api_is_lvalue_sv` is a heuristic** (`value.c`): it ORs `lvalue`, `bitfield_lvalue`,
+  `api_operand_can_address`, and `source_local!=NONE && OPK_LOCAL`.
+- **`cfree_cg_load` has ~7 behaviors, several of which don't load** (`memory.c`).
- **`load`/`store` `base` accepts 4 shapes** ({lvalue, ptr-rvalue} × {no-index, indexed});
- there is **no explicit deref** — a pointer base is silently dereferenceable.
-- **`cfree_cg_index` infers pointer-vs-array-lvalue** (`control.c:849-860`);
- **`cfree_cg_field` infers record-lvalue-vs-pointer** (`control.c:941-952`).
-- **Aggregates are implicitly by-reference and CG decides it** (`call.c:18-42,101-106,
- 310-315`): the frontend never says "pass by reference"; CG infers it from
- `cg_type_is_aggregate`.
-- **wide16 (i128/f128) is special-cased** as aggregate-like throughout (`memory.c:504-533`,
- `call.c:53-66`).
+ there is no explicit deref.
+- **`cfree_cg_index` / `cfree_cg_field` infer** pointer-vs-array / record-vs-pointer.
+- **Aggregates are implicitly by-reference, and CG decides it** (`call.c`): the frontend
+  never states the passing mode; CG infers it from `cg_type_is_aggregate`.
+- **wide16 (i128/f128) is special-cased** as aggregate-like (`memory.c`/`call.c`/`wide.c`).
### The discipline
-Every stack entry is exactly one explicit, type-checked kind — no heuristic:
+Every stack entry is exactly one explicit, type-checked kind:
+- **PLACE** — addressable location of a typed object (`OPK_LOCAL`/`OPK_GLOBAL`/
+ `OPK_INDIRECT(base+index*scale+off)`).
+- **VALUE** — a scalar rvalue: integers, floats, **pointers, and i128/f128**.
-- **PLACE** — an addressable location of a typed object. Representation = the existing
- addressable operands (`OPK_LOCAL` / `OPK_GLOBAL` / `OPK_INDIRECT(base+index*scale+off)`).
-- **VALUE** — a scalar rvalue: integers, floats, **pointers, and now i128/f128**.
-
-CG keeps owning **layout** (field offsets, element sizes, types — deterministic
-computation from the record/array type). What it stops doing is **guessing the kind or
-passing-mode of a stack value**. Every op declares the kinds it consumes/produces and
-panics on mismatch.
+CG keeps owning **layout** (field offsets, element sizes, types). It stops guessing the kind
+or passing-mode of a stack value. Every op declares the kinds it consumes/produces and panics
+on mismatch.
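+
+Under this discipline `api_is_lvalue_sv` reduces to a tag check. A minimal sketch, assuming a
+hypothetical `Entry` with a `Kind` tag and a toy `TypeId`; `op_load`, `op_addr`, and
+`op_deref` follow the consumes/produces columns of the op table, and `op_deref` takes the
+pointee type as a parameter only to keep the sketch short.
+
+```c
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+/* Hypothetical kind tag carried by every value-stack entry. */
+typedef enum { K_PLACE, K_VALUE } Kind;
+typedef enum { T_I32, T_PTR, T_I128, T_RECORD } TypeId;
+
+typedef struct {
+  Kind kind;
+  TypeId type;
+} Entry;
+
+static Entry stack[64];
+static int sp;
+
+static void panic(const char *msg) {
+  fprintf(stderr, "cg panic: %s\n", msg);
+  abort();
+}
+
+static int is_aggregate(TypeId t) { return t == T_RECORD; }
+
+static void push(Kind k, TypeId t) {
+  /* Aggregates are always places; a VALUE of aggregate type is illegal. */
+  if (k == K_VALUE && is_aggregate(t)) panic("aggregate VALUE");
+  assert(sp < 64);
+  stack[sp].kind = k;
+  stack[sp].type = t;
+  sp++;
+}
+
+/* Pop and check the declared kind: no inference, just the tag. */
+static Entry pop_kind(Kind want) {
+  if (sp == 0) panic("stack underflow");
+  Entry e = stack[--sp];
+  if (e.kind != want) panic(want == K_PLACE ? "expected PLACE" : "expected VALUE");
+  return e;
+}
+
+/* load: PLACE -> VALUE (always dereferences). */
+static void op_load(void) {
+  Entry p = pop_kind(K_PLACE);
+  push(K_VALUE, p.type);
+}
+
+/* addr: PLACE -> VALUE(ptr). */
+static void op_addr(void) {
+  (void)pop_kind(K_PLACE);
+  push(K_VALUE, T_PTR);
+}
+
+/* deref: VALUE(ptr) -> PLACE(pointee). */
+static void op_deref(TypeId pointee) {
+  Entry e = pop_kind(K_VALUE);
+  if (e.type != T_PTR) panic("deref of non-pointer");
+  push(K_PLACE, pointee);
+}
+```
+
+In the sketch, `load` of a record place panics inside `push`, matching the rule that reading
+an aggregate means keeping its place, while an `i128` VALUE is pushed like any other scalar.
+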
### Op signatures (strict, single-shape)
| Op | Consumes | Produces | Notes |
@@ -364,96 +273,65 @@ panics on mismatch.
| `push_local_addr l` | — | VALUE (ptr) | sugar for `push_local; addr` |
| `addr` | PLACE | VALUE (ptr) | address of the place |
| **`deref`** (NEW) | VALUE (ptr) | PLACE | the explicit ptr→place transition |
-| `field i` | PLACE(record) | PLACE(field) | offset/type from layout; for `->` do `deref; field` |
-| `elem` (was `index`) | VALUE(ptr to T) + index VALUE | PLACE(T) | `*(p+i)`; scale = `sizeof(T)`. Array lvalues decay to ptr first |
+| `field i` | PLACE(record) | PLACE(field) | offset/type from layout; `->` is `deref; field` |
+| `elem` (was `index`) | VALUE(ptr to T) + index VALUE | PLACE(T) | `*(p+i)`; scale=`sizeof(T)`; arrays decay to ptr first |
| `load access` | PLACE | VALUE | always dereferences; **no EA rider** |
| `store access` | PLACE, VALUE | — | always dereferences |
-The **`CfreeCgEffAddr` rider is removed** from `load`/`store`: addressing is built
-explicitly by `field`/`elem`/`deref` and absorbed into the `OPK_INDIRECT` place, so the
-backend still receives a single `[base+index*scale+off]` memop. The kept fold layer
-(Track 6) recovers `-O0` quality: `load` of `PLACE(local)` folds to the local (no memory
-round-trip), and a `deref` of a pointer-arith chain folds back into the place's indirect
-form. Per decision #8 this recovery is **desirable but not a gate** — Track 7 may land
-ahead of the peephole work; `-O1+` carries quality.
+The **`CfreeCgEffAddr` rider is removed** from `load`/`store`: addressing is built explicitly
+by `field`/`elem`/`deref` and absorbed into the `OPK_INDIRECT` place, so the backend still
+gets a single `[base+index*scale+off]` memop. The kept fold layer (Track 6) recovers `-O0`
+quality (`load` of `PLACE(local)` → the local; `deref` of a ptr-arith chain → the indirect
+place). Per decision #8 this recovery is **desirable but not a gate**.
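+
+With the rider gone, the place-building ops accumulate the addressing themselves. A minimal
+sketch, assuming a hypothetical `Place` struct that mirrors
+`OPK_INDIRECT(base+index*scale+off)` with registers as plain ints: `deref` starts a place
+from a pointer, `field` adds a layout offset to the displacement, and `elem` fills the index
+slot, so `p[i].y` still reaches the backend as one `[base+index*scale+off]` memop.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+
+/* Hypothetical place mirroring OPK_INDIRECT(base + index*scale + off). */
+typedef struct {
+  int base;       /* register holding the base pointer */
+  int index;      /* register holding the index, or -1 if none */
+  uint32_t scale; /* element size in bytes; meaningful only when index >= 0 */
+  int64_t off;    /* constant byte displacement */
+} Place;
+
+/* deref: a pointer VALUE in register `ptr` becomes the place [ptr]. */
+static Place deref(int ptr) {
+  Place p = {ptr, -1, 0, 0};
+  return p;
+}
+
+/* field: the layout offset folds into the constant displacement. */
+static Place field(Place p, int64_t field_off) {
+  p.off += field_off;
+  return p;
+}
+
+/* elem: *(ptr + idx) with scale = sizeof(T); the index register fills the index slot. */
+static Place elem(int ptr, int idx, uint32_t elem_size) {
+  Place p = {ptr, idx, elem_size, 0};
+  return p;
+}
+
+/* elem with a constant index folds entirely into the displacement. */
+static Place elem_const(int ptr, int64_t c, uint32_t elem_size) {
+  Place p = {ptr, -1, 0, c * (int64_t)elem_size};
+  return p;
+}
+```
+
+For `p[i].y` with `sizeof(T) == 8` and `y` at offset 4, `field(elem(p, i, 8), 4)` yields
+`[p + i*8 + 4]`; with a constant index of 3 the place collapses to `[p + 28]`.
+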
### Aggregates (values forbidden)
-An aggregate is **always a PLACE**; a VALUE of aggregate type is illegal (panic). Reading
-an aggregate = keeping its place. Copies are explicit (`memcpy` between two places, or
-field-by-field). Call args/returns of aggregate type pass an explicit place, with the
-mode named via the ABI attrs that already exist (`SRET`/`BYVAL`/`BYREF`). This removes
-the aggregate branches from `api_materialize_call_local`, `api_push_call_result`, and the
-aggregate `ret` path — the frontend states the passing mode instead of CG inferring it.
-
-### wide16 (decided: scalar values)
-`i128`/`f128` are VALUES like any scalar; the backend lowers 16-byte storage/moves. The
-wide16 special paths in `memory.c`/`call.c`/`wide.c` collapse into the normal value path
-(plus backend support for 16-byte value moves where not already present).
-
-### Inference points removed
-`api_is_lvalue_sv` (→ a kind tag check); the 7-way `load` cascade (→ one deref+load);
-`load`/`store` 4-shape base (→ one PLACE); `index`/`field` dual-mode (→ `elem` on ptr,
-`field` on place); aggregate auto-by-ref (→ explicit place + ABI attr); wide16 special
-path (→ value path).
-
-**Affected:** `cg.h` (new `deref`, `elem` rename, EA rider removed from `load`/`store`,
-`ApiSValue` kind tag), `value.c` (kind discipline replaces `lvalue`/`api_is_lvalue_sv`),
-`memory.c` (load/store rewritten; `fold_ea_into_operand`/`pop_and_normalize_index`
-folded into place-building), `control.c` (`index`→`elem`, `field`), `call.c` (aggregate
-branches removed), `wide.c` (wide16 path removed), **every frontend** (insert explicit
-`deref`/array-decay where they relied on pointer-base load/store; mark aggregate passing
-modes). Backends mostly unaffected (they already consume `OPK_INDIRECT`).
-**Tests:** this is the highest-blast-radius track — red-green per op, lean on the toy
-corpus and C frontend for *correctness*; snapshot `-O0` codegen to *track* addressing-mode
-recovery (decision #8: `-O0` quality is not a gate, so a temporary regression does not
-block landing).
-
-## Recommended sequencing
-
-Each track is independently shippable and testable. Suggested order by risk/leverage and
-dependency:
-
-1. **Track 1 (remove dead/redundant surface: 1a, 1d) + 1c completeness audit.** Pure
- subtraction plus filling test gaps; no behavior change. Shrinks the surface.
-2. **Track 6.2 (isolate the live fold layer into `fold.c`).** Settles `ApiSValue`'s shape
- and makes the fold layer a clean dependency for Track 7.
-3. **Track 7 (place/value discipline).** The centerpiece; removes the EA rider; depends on
- a solid fold layer. Highest blast radius — do it deliberately, red-green.
-4. **Track 6.3 (re-enable delayed arith).** Now that Track 7 removed the EA rider that
- killed it; free `-O0` perf, under the isolated `fold.c`.
-5. **Track 3a/3b (MemAccess unify + bitfield-as-PLACE-subkind).** On the strict
- place-based `load`/`store`. (3c was folded into 7; 3b merges via decision #7.)
-6. **Track 2 + Track 4 (op/intrinsic vocabulary).** Independent of the above; reshape the
- op/intrinsic vocabulary once.
-7. **Track 5 (multi-result).** Independent; public + type-system + `value.c` only.
-
-## Decisions (all resolved — ready to execute)
-
-1. **`SV_ARITH`: delete or re-enable?** **DECIDED (owner): re-enable** — the vstack
- peephole is a kept feature for free `-O0` perf. It was disabled by `a126bec` to ship
- the EA rider, which Track 7 removes; restore + isolate under Track 6.2/6.3.
-2. **Op enums: one public vocabulary, int/fp split.** `CgTarget` consumes the public
- `CfreeCg*` op enums directly; delete internal `BinOp`/`UnOp`/`CmpOp`/`AtomicOp`/
- `MemOrder`/`AsmDir` + all `api_map_*`. Couples the internal contract to public enum
- values (accepted for an in-repo contract).
-3. **Façade intrinsics: query + implement the trivial ones.** Add
- `cfree_cg_target_supports_intrinsic` + a clean diagnostic; implement the
- single-instruction baremetal/CPU intrinsics on native arches; report
- `FMA`/`SYSCALL`/`CORO_SWITCH` false until built; remove `FP_REM`.
-4. **`cfree_cg_ret_void`: fold into `ret`.** Remove `ret_void`; a void function returns
- via `cfree_cg_ret` with 0 results — a single return entry point.
-5. **`NONTEMPORAL`/`INVARIANT`/alias scopes: remove now.** Drop them from
- `CfreeCgMemAccess`; re-add with a real internal carrier + backend consumer when needed.
-
-**Track 7 model (decided earlier):** Model B (explicit PLACE/VALUE kinds + `deref`;
-aggregate values forbidden); wide-16 scalars are values; the `CfreeCgEffAddr` rider is
-removed.
-
-6. **`elem` operand shape: pointer VALUE + explicit array-decay.** `elem` consumes a
- pointer VALUE (`*(p+i)`); array lvalues decay via an explicit PLACE(array)→VALUE(ptr)
- op. One shape, no dual-mode.
-7. **Bitfields: PLACE subkind.** A bitfield is a PLACE subkind carrying the descriptor;
- the normal `load`/`store` perform the extract/insert. Merges Track 3b into Track 7.
-8. **`-O0` quality: not a gate.** Track 7 may land the cleaner semantics even with `-O0`
- codegen regressions; `-O1+` carries quality. The vstack peephole (Track 6.3) is still
- restored for the free `-O0` win, but it does **not** block Track 7.
+An aggregate is **always a PLACE**; a VALUE of aggregate type is illegal (panic). Copies are
+explicit (`memcpy` between places, or field-by-field). Call args/returns of aggregate type
+pass an explicit place, with the mode named via the existing ABI attrs
+(`SRET`/`BYVAL`/`BYREF`). This removes the aggregate branches in
+`api_materialize_call_local`, `api_push_call_result`, and the aggregate `ret` path.
+
+### wide16 (scalar values)
+`i128`/`f128` are VALUES; the backend lowers 16-byte storage/moves. The wide16 special paths
+in `memory.c`/`call.c`/`wide.c` collapse into the value path (plus backend 16-byte value-move
+support where missing).
+
+**Affected:** `cg.h` (new `deref`, `elem` rename, EA rider removed, `ApiSValue` kind tag),
+`value.c`, `memory.c`, `control.c` (`index`→`elem`, `field`), `call.c`, `wide.c`, **every
+frontend** (insert explicit `deref`/array-decay; mark aggregate passing modes). Backends
+mostly unaffected (they already consume `OPK_INDIRECT`). **Tests:** highest blast radius —
+red-green per op on the toy corpus + C frontend; `-O0` quality is not a gate (decision #8).
+
+---
+
+## Recommended sequencing (remaining)
+
+1. **Track 1c** completeness audit + tests (small, no behavior change).
+2. **Track 6.2** — isolate the live fold layer into `fold.c`. Settles `ApiSValue` and is a
+ clean dependency for both the Track 2 binop/cmp split and Track 7.
+3. **Track 2 binop/cmp split** — independent of 6.2 but cleaner after it (shares the fold
+ layer). Also fixes the lossy FP compare.
+4. **Track 7** (place/value) — the centerpiece; removes the EA rider; do it red-green.
+5. **Track 6.3** — re-enable delayed arith once Track 7 removed the EA rider.
+6. **Track 3b** — bitfield-as-PLACE-subkind, on the strict place-based `load`/`store`.
+7. **Track 4** (bswap collapse, `unreachable` hook, `supports_intrinsic`, CPU intrinsics) —
+ independent; can be done any time.
+8. **Track 5 follow-up** — true multi-value at `-O1` (opt `cg_ir_lower`) + wasm, if wanted.
+
+Tracks 2, 4, and 7 are independent of each other; 6.2 helps Tracks 2 and 7; 6.3 and 3b
+depend on Track 7.
+
+## Decisions still governing remaining work
+
+2. **Op enums: one public vocabulary, int/fp split.** `CgTarget` consumes the public split
+ enums; delete internal `BinOp`/`UnOp`/`CmpOp` + their `api_map_*`. (Atomic/Order/AsmDir
+ already done.)
+3. **Façade intrinsics: query + implement the trivial ones.** Add `supports_intrinsic` + a
+ clean diagnostic; implement single-instruction baremetal/CPU intrinsics; report
+ `FMA`/`SYSCALL`/`CORO_SWITCH` false until built. (`FP_REM` already removed.)
+6. **`elem` operand shape: pointer VALUE + explicit array-decay.**
+7. **Bitfields: PLACE subkind** (merges Track 3b into Track 7).
+8. **`-O0` quality: not a gate.** Track 7 may land with `-O0` regressions; `-O1+` carries
+ quality. Track 6.3 still restores the peephole for the free `-O0` win but does not block 7.
+
+(Decisions 1, 4, 5 are realized: 1 = peephole kept/re-enable under Track 6; 4 = `ret_void`
+removed; 5 = NONTEMPORAL/INVARIANT/alias scopes removed.)