commit 498a5c4ed0510ab532dc3df0b44c310501c66c78
parent 15e2effc7193b2495ed95320ca05dab0fe2acdb0
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Tue, 2 Jun 2026 06:16:50 -0700
plan: record CODEGEN tracks landed/attempted on codegen-tracks-7634
Diffstat:
 doc/plan/CODEGEN.md | 246 +++++++++++++++++++++++++++++++++++++++----------------------------------------
1 file changed, 121 insertions(+), 125 deletions(-)
diff --git a/doc/plan/CODEGEN.md b/doc/plan/CODEGEN.md
@@ -1,17 +1,21 @@
# Codegen Interface Cleanup — Roadmap (remaining work)
-Status: **partially executed.** The independent, lower-risk tracks are landed and committed
-(see §Done). What remains is the high-blast-radius work: the **binop/cmp op-split** (the
-rest of Track 2), the **op/intrinsic taxonomy** (Track 4), the **fold-layer isolation**
-(Track 6), and the **PLACE/VALUE centerpiece** (Track 7, with Track 3b folded in).
+Status: **mostly executed.** The independent, lower-risk tracks landed first; the
+high-blast-radius work has since followed. The **PLACE/VALUE centerpiece** (Track 7,
+with Track 3b folded in) is **complete** — strict addressing, the explicit place
+predicate, forbidden aggregate VALUEs, i128/f128 flowing as VALUEs, and bitfields as
+a PLACE subkind are all landed. The **op/intrinsic taxonomy** (Track 4) is **complete**.
+The **fold-layer isolation + delayed-arith re-enable** (Track 6) is **complete**. What
+remains is the **binop/cmp op-split** (the rest of Track 2), the Track 1c completeness
+audit, and the Track 5 multi-value follow-up.
Forward-looking companion to the canonical design in [doc/CODEGEN.md](../CODEGEN.md). Goal:
make the `CfreeCg` public API and the internal `CgTarget` contract carry **one clear
representation per concept, with no advertise-but-ignore surface and no façade**. Breaking
and sweeping changes are in scope; reducing churn is *not* a priority.
-The centerpiece is **Track 7** — a strict PLACE/VALUE stack discipline that ends CG's
-inference of what a stack slot *means*. Its core is decided (Model B; see §Track 7).
+The centerpiece was **Track 7** — a strict PLACE/VALUE stack discipline that ends CG's
+inference of what a stack slot *means*. It is now landed (Model B; see §Track 7).
## Scope
@@ -55,9 +59,16 @@ Between them sits the translation layer (`src/cg/value.c`, `arith.c`, `memory.c`
| `a2f6367` | **2 (Atomic/Order)** | Deleted internal `AtomicOp`/`MemOrder` + `api_map_atomic_op`/`api_map_mem_order`; **both** the semantic `CgTarget` and physical `NativeTarget` atomic hooks, the recorder+opt IR aux, and the interpreter now carry public `CfreeCgAtomicOp`/`CfreeCgMemOrder`. |
| `d03eb4c` | **6.2** | Isolated the `-O0` semantic peephole into `src/cg/fold.{c,h}`: integer constant folding, the `SV_CMP` delayed-compare lifecycle, the (gated-off) `SV_ARITH` delayed-arith lifecycle, and const-local store-to-load forwarding with its invalidation boundaries. `fold.h` is the documented contract, re-exported via `internal.h`; `value.c` keeps stack discipline, `api_lvalue_addr`, and the enum-mapping helpers. Pure relocation, no behavior change. `doc/CODEGEN.md` updated. |
| `c338c74`+`8e17cb9` | **7 (core)** | Strict PLACE/VALUE addressing. Removed `CfreeCgEffAddr` from `load`/`store` (they consume a PLACE); added `deref(offset)` (VALUE ptr→PLACE), renamed `index`→`elem` (VALUE ptr + index→PLACE, scale=sizeof(T)), kept `field(i)`/`addr`. Each op **panics on kind mismatch** — no place/value inference. The place ops fold the constant offset (deref/field) and scale (elem→`log2_scale`) into one `OPK_INDIRECT[base + index*scale + offset]`, so the backend still gets a single addressing-mode memop (base/index dynamic, scale/offset folded). All three frontends + emu + cg-api tests conformed (explicit `deref`/decay/`field`). `cg.h` documents the kinds + per-op contracts. Green: toy 1344/0, cg-api, opt (incl tiny-inline), smoke, libc, isa/link/elf, and `make bootstrap` reproduces byte-identical at -O0 AND -O1. |
+| `a0397c6` | **7.1 / 7.2** | Explicit PLACE predicate + forbid aggregate VALUEs. `api_is_lvalue_sv` is now a kind-based predicate — `sv->lvalue && kind == SV_OPERAND && api_operand_can_address(&sv->op)` — replacing the old heuristic OR (the `bitfield_lvalue` and `source_local && OPK_LOCAL` terms are subsumed; `SV_CMP`/`SV_ARITH` never carry `lvalue=1`). `api_push` now panics if an aggregate-typed value enters the stack as a non-place (aggregates are always PLACEs; i128/f128 are scalars and unaffected). |
+| `6f48bfd` | **7.3** | Flow i128/f128 as VALUEs, collapse the wide16 special paths in `memory.c`/`call.c` (~100 lines deleted). The 16-byte scalars now ride the value path; the aggregate-like special-casing is gone. |
+| `d08e794` | **3b** | Bitfield as a PLACE subkind, single representation. Dropped the bit-field rider on `CfreeCgMemAccess`; the strict `load`/`store` carry the bit-field geometry via the `CfreeCgMemAccess` the frontend supplies (rebuilt through `bf_from_access`), and `cfree_cg_field` pushes the record-base address as a place of the field type with no `delayed.bitfield`/`bitfield_lvalue` rider. Removes the "every memop is secretly maybe-a-bitfield" branch. |
+| `b8de5c0` | **6.3** | Re-enabled the `SV_ARITH` delayed-arith `-O0` peephole (gate flip in `fold.c`, now that Track 7 removed the EA rider it conflicted with). `doc/CODEGEN.md` note flipped from gated-off to live. |
+| `52897e0` | **4a** | Collapsed `INTRIN_BSWAP16/32/64` into one width-by-type `BSWAP` (`cgtarget.h`). `arith.c` drops the size-branch; each backend (`aa64`/`x64`/`rv64` native, interp, c_target, wasm) derives width from `dsts[0].type` under a `switch(width)`, preserving the existing sequences. Pure internal dedup; public API unchanged. |
+| `7eaf7bf9` | **4b** | `unreachable` is now a first-class terminator hook with its own `CgTarget` hook + IR op (recorder + opt), not routed through the intrinsic path. The 5 backends + interp + every opt pass that handles terminators (CFG/DCE/SSA/native-emit/…) handle it directly. |
+| `15e2effc` | **4c** | `cfree_cg_target_supports_intrinsic` query + a real unsupported-feature diagnostic (replacing the bare `compiler_panic`); implemented the single-instruction baremetal/CPU intrinsics (`cpu_nop`/`cpu_yield`/`wfi`/`wfe`/`sev`/`isb`/`dmb`/`dsb`/`irq_*`) on the native arches. Converted the `test/toy/err/unsupported_*` panic cases into positive smoke cases + added the capability-query test. `FMA`/`SYSCALL`/`CORO_SWITCH` still report `false`. |
-So **Tracks 1a/1d, 5, 3a, 6.2 are done; Track 2 is 2/3 done** (the 3 identical enums); **Track 4**
-has FP_REM removed.
+So **Tracks 1a/1d, 5, 3a, 3b, 6, 7 are done; Track 4 is done** (FP_REM removal + 4a/4b/4c);
+**Track 2 is 2/3 done** (the 3 identical enums; the binop/cmp split remains).
### Caveats / follow-ups discovered while doing the above
- **Track 5 multi-result is single-result-complete only.** The `-O0` native path handles
@@ -146,121 +157,109 @@ the real fix.
---
-## Track 3b — Bitfields as a PLACE subkind (REMAINING — do with/after Track 7)
+## Track 3b — Bitfields as a PLACE subkind (DONE — `d08e794`)
-Three representations today: a rider on `CfreeCgMemAccess`
-(`bit_offset`/`bit_width`/`storage_size`/`bit_signed`), a rider on `CfreeCgField`
-(`bit_width`/`bit_offset`/`bit_storage_size`/`bit_signed` — note `storage_size` vs
-`bit_storage_size` drift), and internal `BitFieldAccess` + `bitfield_load`/`_store`.
-
-**Decision (#7): a bitfield is a PLACE subkind** carrying the descriptor; the normal
-`load`/`store` perform the extract/insert. This merges into Track 7 (it depends on the
-place model). Drop the bitfield fields from `CfreeCgMemAccess`; keep `CfreeCgField`'s
-layout-query fields but fix the naming drift. Removes the "every memop is secretly
-maybe-a-bitfield" branch in `cfree_cg_load`/`_store`.
-
-**Affected:** `cg.h`, `cgtarget.h`, `memory.c`, all backends' `load`/`store`/`bitfield_*`,
-`lang/c/parse/cg_adapter.c`. **Tests:** bitfield corpus in toy + C; `test-cg-api`.
+**LANDED on `codegen-tracks-7634`** (`d08e794`). A bitfield is now a PLACE subkind: the
+bit-field rider was dropped from `CfreeCgMemAccess`, and the strict `load`/`store` carry
+the bit-field geometry (storage size/offset, bit offset/width, signedness) via the
+`CfreeCgMemAccess` the frontend supplies, rebuilt through `bf_from_access`. `cfree_cg_field`
+now pushes the record-base address as a place of the field type with no `delayed.bitfield`
+/`bitfield_lvalue` rider, and the "every memop is secretly maybe-a-bitfield" branch in
+`cfree_cg_load`/`_store` is gone. Touched `cg.h`, `internal.h`, `memory.c`, `value.c`,
+`control.c`, and `lang/c/parse/cg_adapter.c`; green on the bitfield corpus + `test-cg-api`
++ bootstrap. (Sequenced after the Track 7 core, on top of the strict `load`/`store`.)
---
-## Track 4 (remaining) — op/intrinsic taxonomy
-
-FP_REM removal is done. Remaining:
-
-### 4a. Width-by-type: collapse `BSWAP16/32/64` → one `BSWAP`
-Internal `IntrinKind` has 3 bswaps; public has 1 (`CFREE_CG_INTRIN_BSWAP`). `api_map_intrinsic`
-(`arith.c`) picks the internal one by `abi_cg_sizeof(result_type)`. **Feasible to collapse:**
-`NativeLoc` carries `.type` and `NativeTarget` has `t->c->abi`, so backends derive width from
-`dsts[0].type` (the result type — same source the size-branch uses). Collapse = wrap each
-backend's three existing sequences under a `switch(width)`; preserve the sequences verbatim.
-Touches `cgtarget.h` (enum), `arith.c` (drop the size-branch), and the bswap cases in
-`aa64`/`x64`/`rv64` `native.c`, `interp/engine.c`, `c_target/c_emit.c`, and **wasm
-(`arch/wasm/emit.c`, multi-site ~1577/1708/2894/3113 + capability path)**. NOTE the C
-frontend's `cg_adapter.h` has its own `INTRIN_BSWAP16/32/64`; leave it (it maps to the public
-single `BSWAP` at the call site). Pure internal dedup — public API unchanged.
-
-### 4b. `unreachable` as a first-class terminator hook
-`cfree_cg_unreachable` is documented "a real terminator, not a side-effect intrinsic"
-(`cg.h`) but is routed through the **intrinsic** hook (`control.c`, `INTRIN_UNREACHABLE`).
-Give it its own `CgTarget` hook + its own IR op (recorder + opt), and move the 5 backends'
-`INTRIN_UNREACHABLE` handling onto it. (Terminators are first-class: ret, unreachable, jump,
-branch, computed_goto, tail-call.)
-
-### 4c. Façade intrinsics: query + implement the trivial ones
-`api_map_intrinsic` maps ~16 enumerators (`FMA`, `SYSCALL`, all `IRQ_*`, `DMB`/`DSB`/`ISB`,
-`DCACHE_*`/`ICACHE_*`, `CPU_NOP`/`CPU_YIELD`/`WFI`/`WFE`/`SEV`, `CORO_SWITCH`) → `INTRIN_NONE`,
-and `cfree_cg_intrinsic` turns `INTRIN_NONE` into a bare `compiler_panic` (`arith.c`). The toy
-frontend calls them in good faith; `test/toy/err/unsupported_*` encode the panic as current
-behavior. There is **no `supports_` query for intrinsics**.
-
-1. Add `cfree_cg_target_supports_intrinsic(CfreeCompiler*, CfreeCgIntrinsic)` (mirror
- `cfree_cg_target_supports_call_conv`/`_symbol_feature`). Needs a per-arch capability source.
-2. Convert the bare `compiler_panic` into a proper unsupported-feature diagnostic.
-3. Implement the trivial single-instruction baremetal/CPU intrinsics on the native arches
- (`cpu_nop`/`cpu_yield`/`wfi`/`wfe`/`sev`/`isb`/`dmb`/`dsb`/`irq_*`) — one instruction each;
- convert the corresponding `test/toy/err/` cases to positive smoke cases.
-4. Leave `FMA`/`SYSCALL`/`CORO_SWITCH` reported `false` until implemented.
-
-Also settle: keep `memcpy`/`memset` as dedicated *public* ops (they carry rich `MemAccess`)
-but stop double-modeling them as a separate public *intrinsic* surface.
-
-**Affected:** `cg.h`, `cgtarget.h`, `arith.c`, `control.c`, native backends' `intrinsic`,
-`lang/toy/builtins.c`, `test/toy/err/`.
+## Track 4 — op/intrinsic taxonomy (DONE — `5e1335d` + `52897e0` + `7eaf7bf9` + `15e2effc`)
+
+**LANDED.** FP_REM removal (`5e1335d`) plus 4a/4b/4c on `codegen-tracks-7634`:
+
+### 4a. Width-by-type: collapse `BSWAP16/32/64` → one `BSWAP` — DONE (`52897e0`)
+Collapsed the 3 internal `IntrinKind` bswaps into one width-by-type `BSWAP` in `cgtarget.h`.
+`arith.c` dropped the `abi_cg_sizeof`-driven size-branch; each backend now derives width from
+`dsts[0].type` and wraps its three existing sequences under a `switch(width)`, preserving them
+verbatim. Done across `aa64`/`x64`/`rv64` `native.c`, `interp/engine.c`, `c_target/c_emit.c`,
+and **wasm (`arch/wasm/emit.c` + `internal.h`)**. The C frontend's `cg_adapter.h`
+`INTRIN_BSWAP16/32/64` was left as-is (maps to the public single `BSWAP` at the call site).
+Pure internal dedup — public API unchanged.
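The width-by-type collapse described in 4a can be pictured with a minimal, self-contained sketch. It is not the backend code: `bswap_by_width` is a hypothetical helper, and the width is passed in directly, whereas the backends are described as deriving it from `dsts[0].type`. The point is only the shape, one entry point with the three per-width sequences kept under a `switch(width)`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not a cfree function): byte-swap `v` as a value
 * that is `width` bytes wide. Each case keeps its own sequence, mirroring
 * the "wrap the three existing sequences under a switch" approach. */
static uint64_t bswap_by_width(uint64_t v, unsigned width) {
    switch (width) {
    case 2:
        return (uint64_t)(uint16_t)(((v & 0xffu) << 8) | ((v >> 8) & 0xffu));
    case 4: {
        uint32_t x = (uint32_t)v;
        x = ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
            ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
        return x;
    }
    case 8: {
        uint64_t x = v;
        /* swap 32-bit halves, then 16-bit pairs, then bytes */
        x = ((x & 0x00000000ffffffffull) << 32) | (x >> 32);
        x = ((x & 0x0000ffff0000ffffull) << 16) | ((x >> 16) & 0x0000ffff0000ffffull);
        x = ((x & 0x00ff00ff00ff00ffull) << 8)  | ((x >> 8)  & 0x00ff00ff00ff00ffull);
        return x;
    }
    default:
        abort(); /* assumption: a real backend would raise a diagnostic here */
    }
}
```

For example, `bswap_by_width(0x12345678, 4)` yields `0x78563412`; the caller never names a width-specific intrinsic.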
+
+### 4b. `unreachable` as a first-class terminator hook — DONE (`7eaf7bf9`)
+`cfree_cg_unreachable` now has its own `CgTarget` hook + its own IR op (recorder + opt) and is
+no longer routed through the intrinsic hook. The handling in the 5 backends and interp, plus every
+opt pass that handles terminators (`pass_cfg`/`pass_dce`/`pass_ssa`/`pass_analysis`/`pass_o2`/
+`pass_lower`/`pass_native_emit`, `cg_ir_lower`, `ir_dump`/`ir_print`, `check_target`), was
+moved onto it. (Terminators are first-class: ret, unreachable, jump, branch, computed_goto,
+tail-call.)
+
+### 4c. Façade intrinsics: query + implement the trivial ones — DONE (`15e2effc`)
+Added `cfree_cg_target_supports_intrinsic(CfreeCompiler*, CfreeCgIntrinsic)` (mirroring
+`_supports_call_conv`/`_symbol_feature`) and converted the bare `compiler_panic` into a proper
+unsupported-feature diagnostic. Implemented the single-instruction baremetal/CPU intrinsics on
+the native arches (`cpu_nop`/`cpu_yield`/`wfi`/`wfe`/`sev`/`isb`/`dmb`/`dsb`/`irq_*`). The
+`test/toy/err/unsupported_*` panic cases were converted into positive smoke cases (plus a new
+`144_intrinsic_capability_query` + `145_baremetal_privileged_aa64`). `FMA`/`SYSCALL`/
+`CORO_SWITCH` still report `false` until implemented.
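A rough sketch of the query-then-diagnose pattern from 4c follows. Everything here is hypothetical: the `Sketch*` enum, the capability table, and the return convention are illustrative stand-ins, not the real `CfreeCgIntrinsic` enumerators or the signature of `cfree_cg_target_supports_intrinsic`. It only shows how a per-target capability source lets an unsupported intrinsic produce a diagnostic instead of a panic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical intrinsic ids; the real public enum is not reproduced here. */
typedef enum {
    SK_INTRIN_CPU_NOP,
    SK_INTRIN_WFI,
    SK_INTRIN_FMA,
    SK_INTRIN_SYSCALL,
    SK_INTRIN_COUNT
} SketchIntrinsic;

/* Assumed per-target capability table: single-instruction intrinsics are
 * supported, FMA/SYSCALL are reported false until implemented. */
static const bool sketch_caps[SK_INTRIN_COUNT] = {
    [SK_INTRIN_CPU_NOP] = true,
    [SK_INTRIN_WFI]     = true,
};

static bool sketch_supports_intrinsic(SketchIntrinsic k) {
    return (int)k >= 0 && (int)k < SK_INTRIN_COUNT && sketch_caps[k];
}

/* Returns 0 on success; on an unsupported intrinsic writes a diagnostic
 * into `diag` and returns -1 instead of aborting the compiler. */
static int sketch_emit_intrinsic(SketchIntrinsic k, char *diag, size_t n) {
    if (!sketch_supports_intrinsic(k)) {
        snprintf(diag, n, "intrinsic %d is not supported on this target", (int)k);
        return -1;
    }
    diag[0] = '\0';
    return 0;
}
```

A frontend in this sketch would call `sketch_supports_intrinsic` first and choose a fallback, so the diagnostic path is reached only when it emits the intrinsic anyway.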
+
+**Follow-up not done here:** the "keep `memcpy`/`memset` as dedicated public ops but stop
+double-modeling them as a separate public intrinsic surface" cleanup was *not* part of this
+slice — it remains an open taxonomy tidy if wanted.
---
-## Track 6 — Isolate and complete the semantic peephole
+## Track 6 — Isolate and complete the semantic peephole (DONE — `d03eb4c` + `b8de5c0`)
The semantic layer is also a `-O0` peephole optimizer — a **kept feature** (Principle 6).
+**Status: DONE.** Both 6.2 (`d03eb4c`) and 6.3 (`b8de5c0`) landed.
+
### Current state
-- **Live:** constant folding (`api_try_fold_int_binop`/`_unop`/`_cmp`, from `arith.c`) and
+- **Live:** constant folding (`api_try_fold_int_binop`/`_unop`/`_cmp`, in `fold.c`) and
the `SV_CMP` fused-compare-into-branch path (`api_make_cmp`/`api_materialize_cmp_to`/
`api_branch_if`).
-- **Disabled (not dead):** the `SV_ARITH` delayed-arith subsystem, gated by
- `api_can_delay_int_arith()==0`. It was live until commit `a126bec` flipped it off to ship
- the EA rider; **Track 7 removes that rider**, so re-enabling is clean.
+- **Live again:** the `SV_ARITH` delayed-arith subsystem — re-enabled by `b8de5c0` once
+ Track 7 removed the EA rider it conflicted with.
- **Live:** scalar store-to-load forwarding (`api_local_const_*`).
-### Action
+### Action — completed
1. **6.2 — Extract the live peephole into `src/cg/fold.c` + `fold.h`** — **DONE** (`d03eb4c`).
The documented contract covers the integer fold helpers, the `SV_CMP` lifecycle, and
const-local forwarding with its invalidation boundaries
- (`api_local_const_memory_boundary`/`_control_boundary`/`_address_taken`). The (gated-off)
- `SV_ARITH` machinery was moved alongside it so 6.3 is a gate flip, not a code move. Op
- families call into `fold.h`; `value.c` keeps the stack discipline. `ApiSValue`'s shape is
- now settled for Track 7, and the Track 2 binop/cmp split has the fold layer isolated.
-2. **6.3 — Re-enable delayed arith *after* Track 7.** Restore the gate
- (`g && !flags && api_foldable_int_type(...)`) in `api_can_delay_int_arith` (now in
- `fold.c`); the `api_make_arith_*`/`api_materialize_arith_to`/`api_release_arith`/the
- fold-chain + identity-collapse helpers already live under `fold.c` — verify they compose
- with the place/value model.
-3. **Fix [doc/CODEGEN.md](../CODEGEN.md)** to match the restored, isolated peephole. 6.2
- already corrected it to introduce `fold.c` and mark delayed arith gated-off; 6.3 should
- flip that note to "live" once re-enabled.
+ (`api_local_const_memory_boundary`/`_control_boundary`/`_address_taken`). The (then
+ gated-off) `SV_ARITH` machinery was moved alongside it so 6.3 was a gate flip, not a code
+ move. Op families call into `fold.h`; `value.c` keeps the stack discipline. `ApiSValue`'s
+ shape is settled for Track 7, and the Track 2 binop/cmp split has the fold layer isolated.
+2. **6.3 — Re-enable delayed arith after Track 7** — **DONE** (`b8de5c0`). The gate in
+ `api_can_delay_int_arith` (in `fold.c`) was restored now that Track 7 removed the EA
+   rider; the `api_make_arith_*`/`api_materialize_arith_to`/`api_release_arith` helpers and the
+   fold-chain + identity-collapse helpers compose with the place/value model. Green at -O0; bootstrap
+ reproduces.
+3. **Fix [doc/CODEGEN.md](../CODEGEN.md)** — **DONE.** 6.2 introduced `fold.c` and marked
+ delayed arith gated-off; 6.3 flipped that note to "live".
---
-## Track 7 — Strict place/value discipline (the centerpiece)
-
-**Status: core LANDED** (`c338c74`+`8e17cb9`). The public addressing surface is the strict
-`push_local`/`addr`/`deref`/`field`/`elem`/`load`/`store` set; the `CfreeCgEffAddr` rider is
-gone; every op panics on a place/value kind mismatch (no inference at the boundary); and the
-place ops fold the constant offset/scale into one `OPK_INDIRECT[base + index*scale + offset]`
-for clean memops. All frontends + emu + cg-api tests conform; `make bootstrap` reproduces at
--O0 AND -O1.
-
-**Remaining refinements (follow-ups):**
-- The *internal* `api_is_lvalue_sv` is still the place predicate (a heuristic on `ApiSValue`).
- The boundary is strict, but replacing the internal heuristic with an explicit `ApiSValue`
- kind tag (Model B's "every stack entry is exactly one kind") is not yet done.
-- Aggregate **VALUE**s aren't yet hard-forbidden inside the stack (aggregates are PLACEs in
- practice, but there's no panic on an aggregate-typed value).
-- **wide16** (i128/f128) is still special-cased as aggregate-like in `memory.c`/`call.c`/
- `wide.c` rather than flowing as a VALUE.
-- **Bitfields** still ride on `CfreeCgMemAccess`/`CfreeCgField` (Track 3b: make them a PLACE
- subkind on the strict `load`/`store`).
+## Track 7 — Strict place/value discipline (the centerpiece) — DONE
+
+**Status: LANDED** (`c338c74`+`8e17cb9` core; `a0397c6` 7.1/7.2; `6f48bfd` 7.3). The public
+addressing surface is the strict `push_local`/`addr`/`deref`/`field`/`elem`/`load`/`store`
+set; the `CfreeCgEffAddr` rider is gone; every op panics on a place/value kind mismatch (no
+inference at the boundary); and the place ops fold the constant offset/scale into one
+`OPK_INDIRECT[base + index*scale + offset]` for clean memops. All frontends + emu + cg-api
+tests conform; `make bootstrap` reproduces at -O0 AND -O1.
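The folding of place ops into a single `base + index*scale + offset` operand can be sketched as below. This is an illustration under stated assumptions, not the `src/cg` implementation: `SketchPlace` and the `sketch_*` functions are hypothetical, registers are modeled as array slots, `sketch_field` takes a byte offset rather than a field index, and element sizes are assumed to be powers of two so the scale fits a `log2_scale`.

```c
#include <stdint.h>

/* Hypothetical PLACE descriptor: one addressing mode, nothing inferred. */
typedef struct {
    int      base_reg;    /* register holding the pointer VALUE */
    int      index_reg;   /* -1 when there is no dynamic index */
    unsigned log2_scale;  /* folded from the element size */
    int64_t  offset;      /* folded constant byte offset */
} SketchPlace;

/* deref(offset): pointer VALUE -> PLACE at [base + offset]. */
static SketchPlace sketch_deref(int ptr_reg, int64_t offset) {
    SketchPlace p = { ptr_reg, -1, 0, offset };
    return p;
}

/* elem: pointer VALUE + index VALUE -> PLACE at [base + index*elem_size].
 * Assumes elem_size is a power of two. */
static SketchPlace sketch_elem(int ptr_reg, int index_reg, unsigned elem_size) {
    unsigned l = 0;
    while ((1u << l) < elem_size) l++;
    SketchPlace p = { ptr_reg, index_reg, l, 0 };
    return p;
}

/* field: PLACE -> PLACE, folding the field's byte offset into the constant. */
static SketchPlace sketch_field(SketchPlace p, int64_t field_offset) {
    p.offset += field_offset;
    return p;
}

/* What a single memop would address, given a register file snapshot. */
static uint64_t sketch_effective_addr(SketchPlace p, const uint64_t *regs) {
    uint64_t a = regs[p.base_reg] + (uint64_t)p.offset;
    if (p.index_reg >= 0) a += regs[p.index_reg] << p.log2_scale;
    return a;
}
```

With `regs = {0x1000, 3}`, `sketch_field(sketch_elem(0, 1, 8), 4)` describes `[0x1000 + 3*8 + 4]`, so the backend still sees one operand with base and index dynamic and scale and offset folded.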
+
+**Refinements (now landed):**
+- The *internal* place predicate `api_is_lvalue_sv` is **now kind-based** (`a0397c6`):
+ `sv->lvalue && kind == SV_OPERAND && api_operand_can_address(&sv->op)`, replacing the old
+ heuristic OR (the `bitfield_lvalue` and `source_local && OPK_LOCAL` terms are subsumed).
+- Aggregate **VALUE**s are **now hard-forbidden** (`a0397c6`): `api_push` panics on an
+ aggregate-typed non-place value. i128/f128 are scalars and unaffected.
+- **wide16** (i128/f128) **now flows as a VALUE** (`6f48bfd`): the aggregate-like special
+ paths in `memory.c`/`call.c` collapsed (~100 lines deleted).
+- **Bitfields** are **now a PLACE subkind** (Track 3b, `d08e794`): the bit-field rider was
+ dropped from `CfreeCgMemAccess`; the strict `load`/`store` carry the geometry.
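The kind-based predicate and the aggregate-VALUE rule listed above can be modeled in a few lines. The types below are hypothetical simplifications of `ApiSValue`, and which operand kinds count as addressable is an assumption of this sketch (only `OPK_LOCAL` and `OPK_INDIRECT` are named in this document). The real `api_push` panics, while the sketch returns a boolean so it stays testable.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the stack-value kinds. */
typedef enum { SK_SV_OPERAND, SK_SV_CMP, SK_SV_ARITH } SketchSvKind;
typedef enum { SK_OPK_IMM, SK_OPK_REG, SK_OPK_LOCAL, SK_OPK_INDIRECT } SketchOpKind;

typedef struct {
    SketchSvKind kind;
    SketchOpKind opk;
    bool         lvalue;
    bool         is_aggregate;  /* struct/array type; i128/f128 are scalars */
} SketchSValue;

/* Assumption: only memory-backed operands can be addressed. */
static bool sketch_operand_can_address(SketchOpKind k) {
    return k == SK_OPK_LOCAL || k == SK_OPK_INDIRECT;
}

/* Kind-based PLACE predicate: a conjunction, with no heuristic OR terms. */
static bool sketch_is_place(const SketchSValue *sv) {
    return sv->lvalue && sv->kind == SK_SV_OPERAND &&
           sketch_operand_can_address(sv->opk);
}

/* Push rule: an aggregate may enter the stack only as a PLACE. */
static bool sketch_push_ok(const SketchSValue *sv) {
    return !(sv->is_aggregate && !sketch_is_place(sv));
}
```

Under these rules a delayed compare or delayed arith entry is never a place, and an aggregate in a register-like operand is rejected at push time rather than being inferred later.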
+
+**Remaining refinement (follow-up, non-blocking per decision #8):**
- **-O0 mem-op quality**: the C frontend reaches non-trivial places via
`pcg_materialize_lv_to_ptr` (int arithmetic) + `deref`; it could instead emit
`deref(offset)`/`elem` directly so -O0 also gets the folded addressing mode (-O1's
@@ -335,32 +334,29 @@ red-green per op on the toy corpus + C frontend; `-O0` quality is not a gate (de
## Recommended sequencing (remaining)
-1. **Track 1c** completeness audit + tests (small, no behavior change).
-2. **Track 6.2** — isolate the live fold layer into `fold.c`. **DONE (`d03eb4c`).** Settles
- `ApiSValue` and is a clean dependency for both the Track 2 binop/cmp split and Track 7.
-3. **Track 2 binop/cmp split** — independent of 6.2 but cleaner after it (shares the fold
- layer). Also fixes the lossy FP compare.
-4. **Track 7** (place/value) — the centerpiece; removes the EA rider; do it red-green.
-5. **Track 6.3** — re-enable delayed arith once Track 7 removed the EA rider.
-6. **Track 3b** — bitfield-as-PLACE-subkind, on the strict place-based `load`/`store`.
-7. **Track 4** (bswap collapse, `unreachable` hook, `supports_intrinsic`, CPU intrinsics) —
- independent; can be done any time.
-8. **Track 5 follow-up** — true multi-value at `-O1` (opt `cg_ir_lower`) + wasm, if wanted.
+Most of the original sequence is landed: **6.2** (`d03eb4c`), **Track 7** core + 7.1/7.2/7.3
+(`c338c74`/`8e17cb9`/`a0397c6`/`6f48bfd`), **6.3** (`b8de5c0`), **Track 3b** (`d08e794`), and
+**Track 4** 4a/4b/4c (`52897e0`/`7eaf7bf9`/`15e2effc`) are all done. What's left:
+
+1. **Track 2 binop/cmp split** — the largest remaining mechanical change; cleaner now that
+ the fold layer is isolated (6.2). Also fixes the lossy FP compare.
+2. **Track 1c** completeness audit + tests (small, no behavior change).
+3. **Track 5 follow-up** — true multi-value at `-O1` (opt `cg_ir_lower`) + wasm, if wanted.
+4. **Track 4 taxonomy tidy** (optional) — stop double-modeling `memcpy`/`memset` as a
+ separate public intrinsic surface (kept as dedicated public ops).
-2, 4, 7 are independent of each other; 6.2 helps 2 and 7; 6.3 and 3b depend on 7.
+Track 2 is independent of everything still open; the fold-layer isolation (6.2) already
+helps it.
## Decisions still governing remaining work
2. **Op enums: one public vocabulary, int/fp split.** `CgTarget` consumes the public split
enums; delete internal `BinOp`/`UnOp`/`CmpOp` + their `api_map_*`. (Atomic/Order/AsmDir
- already done.)
-3. **Façade intrinsics: query + implement the trivial ones.** Add `supports_intrinsic` + a
- clean diagnostic; implement single-instruction baremetal/CPU intrinsics; report
- `FMA`/`SYSCALL`/`CORO_SWITCH` false until built. (`FP_REM` already removed.)
-6. **`elem` operand shape: pointer VALUE + explicit array-decay.**
-7. **Bitfields: PLACE subkind** (merges Track 3b into Track 7).
-8. **`-O0` quality: not a gate.** Track 7 may land with `-O0` regressions; `-O1+` carries
- quality. Track 6.3 still restores the peephole for the free `-O0` win but does not block 7.
-
-(Decisions 1, 4, 5 are realized: 1 = peephole kept/re-enable under Track 6; 4 = `ret_void`
-removed; 5 = NONTEMPORAL/INVARIANT/alias scopes removed.)
+ already done.) — **still governs the open Track 2 binop/cmp split.**
+
+(Decisions 1, 3, 4, 5, 6, 7, 8 are realized: 1 = peephole kept + re-enabled under Track 6
+(`b8de5c0`); 3 = `supports_intrinsic` + diagnostic + CPU intrinsics landed (`15e2effc`),
+`FP_REM` removed; 4 = `ret_void` removed; 5 = NONTEMPORAL/INVARIANT/alias scopes removed;
+6 = `elem` is a pointer VALUE + explicit array-decay (Track 7 core); 7 = bitfields are a
+PLACE subkind (`d08e794`); 8 = `-O0` quality was not a gate for Track 7, and Track 6.3
+restored the peephole.)