kit

git clone https://git.ryansepassi.com/git/kit.git

commit 498a5c4ed0510ab532dc3df0b44c310501c66c78
parent 15e2effc7193b2495ed95320ca05dab0fe2acdb0
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Tue,  2 Jun 2026 06:16:50 -0700

plan: record CODEGEN tracks landed/attempted on codegen-tracks-7634

Diffstat:
M doc/plan/CODEGEN.md | 246 +++++++++++++++++++++++++++++++++++++++----------------------------------------
1 file changed, 121 insertions(+), 125 deletions(-)

diff --git a/doc/plan/CODEGEN.md b/doc/plan/CODEGEN.md @@ -1,17 +1,21 @@ # Codegen Interface Cleanup — Roadmap (remaining work) -Status: **partially executed.** The independent, lower-risk tracks are landed and committed -(see §Done). What remains is the high-blast-radius work: the **binop/cmp op-split** (the -rest of Track 2), the **op/intrinsic taxonomy** (Track 4), the **fold-layer isolation** -(Track 6), and the **PLACE/VALUE centerpiece** (Track 7, with Track 3b folded in). +Status: **mostly executed.** The independent, lower-risk tracks landed first; the +high-blast-radius work has since followed. The **PLACE/VALUE centerpiece** (Track 7, +with Track 3b folded in) is **complete** — strict addressing, the explicit place +predicate, forbidden aggregate VALUEs, i128/f128 flowing as VALUEs, and bitfields as +a PLACE subkind are all landed. The **op/intrinsic taxonomy** (Track 4) is **complete**. +The **fold-layer isolation + delayed-arith re-enable** (Track 6) is **complete**. What +remains is the **binop/cmp op-split** (the rest of Track 2), the Track 1c completeness +audit, and the Track 5 multi-value follow-up. Forward-looking companion to the canonical design in [doc/CODEGEN.md](../CODEGEN.md). Goal: make the `CfreeCg` public API and the internal `CgTarget` contract carry **one clear representation per concept, with no advertise-but-ignore surface and no façade**. Breaking and sweeping changes are in scope; reducing churn is *not* a priority. -The centerpiece is **Track 7** — a strict PLACE/VALUE stack discipline that ends CG's -inference of what a stack slot *means*. Its core is decided (Model B; see §Track 7). +The centerpiece was **Track 7** — a strict PLACE/VALUE stack discipline that ends CG's +inference of what a stack slot *means*. It is now landed (Model B; see §Track 7). 
## Scope @@ -55,9 +59,16 @@ Between them sits the translation layer (`src/cg/value.c`, `arith.c`, `memory.c` | `a2f6367` | **2 (Atomic/Order)** | Deleted internal `AtomicOp`/`MemOrder` + `api_map_atomic_op`/`api_map_mem_order`; **both** the semantic `CgTarget` and physical `NativeTarget` atomic hooks, the recorder+opt IR aux, and the interpreter now carry public `CfreeCgAtomicOp`/`CfreeCgMemOrder`. | | `d03eb4c` | **6.2** | Isolated the `-O0` semantic peephole into `src/cg/fold.{c,h}`: integer constant folding, the `SV_CMP` delayed-compare lifecycle, the (gated-off) `SV_ARITH` delayed-arith lifecycle, and const-local store-to-load forwarding with its invalidation boundaries. `fold.h` is the documented contract, re-exported via `internal.h`; `value.c` keeps stack discipline, `api_lvalue_addr`, and the enum-mapping helpers. Pure relocation, no behavior change. `doc/CODEGEN.md` updated. | | `c338c74`+`8e17cb9` | **7 (core)** | Strict PLACE/VALUE addressing. Removed `CfreeCgEffAddr` from `load`/`store` (they consume a PLACE); added `deref(offset)` (VALUE ptr→PLACE), renamed `index`→`elem` (VALUE ptr + index→PLACE, scale=sizeof(T)), kept `field(i)`/`addr`. Each op **panics on kind mismatch** — no place/value inference. The place ops fold the constant offset (deref/field) and scale (elem→`log2_scale`) into one `OPK_INDIRECT[base + index*scale + offset]`, so the backend still gets a single addressing-mode memop (base/index dynamic, scale/offset folded). All three frontends + emu + cg-api tests conformed (explicit `deref`/decay/`field`). `cg.h` documents the kinds + per-op contracts. Green: toy 1344/0, cg-api, opt (incl tiny-inline), smoke, libc, isa/link/elf, and `make bootstrap` reproduces byte-identical at -O0 AND -O1. | +| `a0397c6` | **7.1 / 7.2** | Explicit PLACE predicate + forbid aggregate VALUEs. 
`api_is_lvalue_sv` is now a kind-based predicate — `sv->lvalue && kind == SV_OPERAND && api_operand_can_address(&sv->op)` — replacing the old heuristic OR (the `bitfield_lvalue` and `source_local && OPK_LOCAL` terms are subsumed; `SV_CMP`/`SV_ARITH` never carry `lvalue=1`). `api_push` now panics if an aggregate-typed value enters the stack as a non-place (aggregates are always PLACEs; i128/f128 are scalars and unaffected). | +| `6f48bfd` | **7.3** | Flow i128/f128 as VALUEs, collapse the wide16 special paths in `memory.c`/`call.c` (~100 lines deleted). The 16-byte scalars now ride the value path; the aggregate-like special-casing is gone. | +| `d08e794` | **3b** | Bitfield as a PLACE subkind, single representation. Dropped the bit-field rider on `CfreeCgMemAccess`; the strict `load`/`store` carry the bit-field geometry via the `CfreeCgMemAccess` the frontend supplies (rebuilt through `bf_from_access`), and `cfree_cg_field` pushes the record-base address as a place of the field type with no `delayed.bitfield`/`bitfield_lvalue` rider. Removes the "every memop is secretly maybe-a-bitfield" branch. | +| `b8de5c0` | **6.3** | Re-enabled the `SV_ARITH` delayed-arith `-O0` peephole (gate flip in `fold.c`, now that Track 7 removed the EA rider it conflicted with). `doc/CODEGEN.md` note flipped from gated-off to live. | +| `52897e0` | **4a** | Collapsed `INTRIN_BSWAP16/32/64` into one width-by-type `BSWAP` (`cgtarget.h`). `arith.c` drops the size-branch; each backend (`aa64`/`x64`/`rv64` native, interp, c_target, wasm) derives width from `dsts[0].type` under a `switch(width)`, preserving the existing sequences. Pure internal dedup; public API unchanged. | +| `7eaf7bf9` | **4b** | `unreachable` is now a first-class terminator hook with its own `CgTarget` hook + IR op (recorder + opt), not routed through the intrinsic path. The 5 backends + interp + every opt pass that handles terminators (CFG/DCE/SSA/native-emit/…) handle it directly. 
| +| `15e2effc` | **4c** | `cfree_cg_target_supports_intrinsic` query + a real unsupported-feature diagnostic (replacing the bare `compiler_panic`); implemented the single-instruction baremetal/CPU intrinsics (`cpu_nop`/`yield`/`wfi`/`wfe`/`sev`/`isb`/`dmb`/`dsb`/`irq_*`) on the native arches. Converted the `test/toy/err/unsupported_*` panic cases into positive smoke cases + added the capability-query test. `FMA`/`SYSCALL`/`CORO_SWITCH` still report `false`. | -So **Tracks 1a/1d, 5, 3a, 6.2 are done; Track 2 is 2/3 done** (the 3 identical enums); **Track 4** -has FP_REM removed. +So **Tracks 1a/1d, 5, 3a, 3b, 6, 7 are done; Track 4 is done** (FP_REM + 4a/4b/4c); +**Track 2 is 2/3 done** (the 3 identical enums; the binop/cmp split remains). ### Caveats / follow-ups discovered while doing the above - **Track 5 multi-result is single-result-complete only.** The `-O0` native path handles @@ -146,121 +157,109 @@ the real fix. --- -## Track 3b — Bitfields as a PLACE subkind (REMAINING — do with/after Track 7) +## Track 3b — Bitfields as a PLACE subkind (DONE — `d08e794`) -Three representations today: a rider on `CfreeCgMemAccess` -(`bit_offset`/`bit_width`/`storage_size`/`bit_signed`), a rider on `CfreeCgField` -(`bit_width`/`bit_offset`/`bit_storage_size`/`bit_signed` — note `storage_size` vs -`bit_storage_size` drift), and internal `BitFieldAccess` + `bitfield_load`/`_store`. - -**Decision (#7): a bitfield is a PLACE subkind** carrying the descriptor; the normal -`load`/`store` perform the extract/insert. This merges into Track 7 (it depends on the -place model). Drop the bitfield fields from `CfreeCgMemAccess`; keep `CfreeCgField`'s -layout-query fields but fix the naming drift. Removes the "every memop is secretly -maybe-a-bitfield" branch in `cfree_cg_load`/`_store`. - -**Affected:** `cg.h`, `cgtarget.h`, `memory.c`, all backends' `load`/`store`/`bitfield_*`, -`lang/c/parse/cg_adapter.c`. **Tests:** bitfield corpus in toy + C; `test-cg-api`. 
+**LANDED on `codegen-tracks-7634`** (`d08e794`). A bitfield is now a PLACE subkind: the +bit-field rider was dropped from `CfreeCgMemAccess`, and the strict `load`/`store` carry +the bit-field geometry (storage size/offset, bit offset/width, signedness) via the +`CfreeCgMemAccess` the frontend supplies, rebuilt through `bf_from_access`. `cfree_cg_field` +now pushes the record-base address as a place of the field type with no `delayed.bitfield` +/`bitfield_lvalue` rider, and the "every memop is secretly maybe-a-bitfield" branch in +`cfree_cg_load`/`_store` is gone. Touched `cg.h`, `internal.h`, `memory.c`, `value.c`, +`control.c`, and `lang/c/parse/cg_adapter.c`; green on the bitfield corpus + `test-cg-api` ++ bootstrap. (Done as a PLACE subkind on the strict `load`/`store`, after Track 7 core.) --- -## Track 4 (remaining) — op/intrinsic taxonomy - -FP_REM removal is done. Remaining: - -### 4a. Width-by-type: collapse `BSWAP16/32/64` → one `BSWAP` -Internal `IntrinKind` has 3 bswaps; public has 1 (`CFREE_CG_INTRIN_BSWAP`). `api_map_intrinsic` -(`arith.c`) picks the internal one by `abi_cg_sizeof(result_type)`. **Feasible to collapse:** -`NativeLoc` carries `.type` and `NativeTarget` has `t->c->abi`, so backends derive width from -`dsts[0].type` (the result type — same source the size-branch uses). Collapse = wrap each -backend's three existing sequences under a `switch(width)`; preserve the sequences verbatim. -Touches `cgtarget.h` (enum), `arith.c` (drop the size-branch), and the bswap cases in -`aa64`/`x64`/`rv64` `native.c`, `interp/engine.c`, `c_target/c_emit.c`, and **wasm -(`arch/wasm/emit.c`, multi-site ~1577/1708/2894/3113 + capability path)**. NOTE the C -frontend's `cg_adapter.h` has its own `INTRIN_BSWAP16/32/64`; leave it (it maps to the public -single `BSWAP` at the call site). Pure internal dedup — public API unchanged. - -### 4b. 
`unreachable` as a first-class terminator hook -`cfree_cg_unreachable` is documented "a real terminator, not a side-effect intrinsic" -(`cg.h`) but is routed through the **intrinsic** hook (`control.c`, `INTRIN_UNREACHABLE`). -Give it its own `CgTarget` hook + its own IR op (recorder + opt), and move the 5 backends' -`INTRIN_UNREACHABLE` handling onto it. (Terminators are first-class: ret, unreachable, jump, -branch, computed_goto, tail-call.) - -### 4c. Façade intrinsics: query + implement the trivial ones -`api_map_intrinsic` maps ~16 enumerators (`FMA`, `SYSCALL`, all `IRQ_*`, `DMB`/`DSB`/`ISB`, -`DCACHE_*`/`ICACHE_*`, `CPU_NOP`/`CPU_YIELD`/`WFI`/`WFE`/`SEV`, `CORO_SWITCH`) → `INTRIN_NONE`, -and `cfree_cg_intrinsic` turns `INTRIN_NONE` into a bare `compiler_panic` (`arith.c`). The toy -frontend calls them in good faith; `test/toy/err/unsupported_*` encode the panic as current -behavior. There is **no `supports_` query for intrinsics**. - -1. Add `cfree_cg_target_supports_intrinsic(CfreeCompiler*, CfreeCgIntrinsic)` (mirror - `cfree_cg_target_supports_call_conv`/`_symbol_feature`). Needs a per-arch capability source. -2. Convert the bare `compiler_panic` into a proper unsupported-feature diagnostic. -3. Implement the trivial single-instruction baremetal/CPU intrinsics on the native arches - (`cpu_nop`/`cpu_yield`/`wfi`/`wfe`/`sev`/`isb`/`dmb`/`dsb`/`irq_*`) — one instruction each; - convert the corresponding `test/toy/err/` cases to positive smoke cases. -4. Leave `FMA`/`SYSCALL`/`CORO_SWITCH` reported `false` until implemented. - -Also settle: keep `memcpy`/`memset` as dedicated *public* ops (they carry rich `MemAccess`) -but stop double-modeling them as a separate public *intrinsic* surface. - -**Affected:** `cg.h`, `cgtarget.h`, `arith.c`, `control.c`, native backends' `intrinsic`, -`lang/toy/builtins.c`, `test/toy/err/`. 
+## Track 4 — op/intrinsic taxonomy (DONE — `5e1335d` + `52897e0` + `7eaf7bf9` + `15e2effc`) + +**LANDED.** FP_REM removal (`5e1335d`) plus 4a/4b/4c on `codegen-tracks-7634`: + +### 4a. Width-by-type: collapse `BSWAP16/32/64` → one `BSWAP` — DONE (`52897e0`) +Collapsed the 3 internal `IntrinKind` bswaps into one width-by-type `BSWAP` in `cgtarget.h`. +`arith.c` dropped the `abi_cg_sizeof`-driven size-branch; each backend now derives width from +`dsts[0].type` and wraps its three existing sequences under a `switch(width)`, preserving them +verbatim. Done across `aa64`/`x64`/`rv64` `native.c`, `interp/engine.c`, `c_target/c_emit.c`, +and **wasm (`arch/wasm/emit.c` + `internal.h`)**. The C frontend's `cg_adapter.h` +`INTRIN_BSWAP16/32/64` was left as-is (maps to the public single `BSWAP` at the call site). +Pure internal dedup — public API unchanged. + +### 4b. `unreachable` as a first-class terminator hook — DONE (`7eaf7bf9`) +`cfree_cg_unreachable` now has its own `CgTarget` hook + its own IR op (recorder + opt) and is +no longer routed through the intrinsic hook. The 5 backends' + interp's handling, plus every +opt pass that handles terminators (`pass_cfg`/`pass_dce`/`pass_ssa`/`pass_analysis`/`pass_o2`/ +`pass_lower`/`pass_native_emit`, `cg_ir_lower`, `ir_dump`/`ir_print`, `check_target`), were +moved onto it. (Terminators are first-class: ret, unreachable, jump, branch, computed_goto, +tail-call.) + +### 4c. Façade intrinsics: query + implement the trivial ones — DONE (`15e2effc`) +Added `cfree_cg_target_supports_intrinsic(CfreeCompiler*, CfreeCgIntrinsic)` (mirroring +`_supports_call_conv`/`_symbol_feature`) and converted the bare `compiler_panic` into a proper +unsupported-feature diagnostic. Implemented the single-instruction baremetal/CPU intrinsics on +the native arches (`cpu_nop`/`cpu_yield`/`wfi`/`wfe`/`sev`/`isb`/`dmb`/`dsb`/`irq_*`). 
The +`test/toy/err/unsupported_*` panic cases were converted into positive smoke cases (plus a new +`144_intrinsic_capability_query` + `145_baremetal_privileged_aa64`). `FMA`/`SYSCALL`/ +`CORO_SWITCH` still report `false` until implemented. + +**Follow-up not done here:** the "keep `memcpy`/`memset` as dedicated public ops but stop +double-modeling them as a separate public intrinsic surface" cleanup was *not* part of this +slice — it remains an open taxonomy tidy if wanted. --- -## Track 6 — Isolate and complete the semantic peephole +## Track 6 — Isolate and complete the semantic peephole (DONE — `d03eb4c` + `b8de5c0`) The semantic layer is also a `-O0` peephole optimizer — a **kept feature** (Principle 6). +**Status: DONE.** Both 6.2 (`d03eb4c`) and 6.3 (`b8de5c0`) landed. + ### Current state -- **Live:** constant folding (`api_try_fold_int_binop`/`_unop`/`_cmp`, from `arith.c`) and +- **Live:** constant folding (`api_try_fold_int_binop`/`_unop`/`_cmp`, in `fold.c`) and the `SV_CMP` fused-compare-into-branch path (`api_make_cmp`/`api_materialize_cmp_to`/ `api_branch_if`). -- **Disabled (not dead):** the `SV_ARITH` delayed-arith subsystem, gated by - `api_can_delay_int_arith()==0`. It was live until commit `a126bec` flipped it off to ship - the EA rider; **Track 7 removes that rider**, so re-enabling is clean. +- **Live again:** the `SV_ARITH` delayed-arith subsystem — re-enabled by `b8de5c0` once + Track 7 removed the EA rider it conflicted with. - **Live:** scalar store-to-load forwarding (`api_local_const_*`). -### Action +### Action — completed 1. **6.2 — Extract the live peephole into `src/cg/fold.c` + `fold.h`** — **DONE** (`d03eb4c`). The documented contract covers the integer fold helpers, the `SV_CMP` lifecycle, and const-local forwarding with its invalidation boundaries - (`api_local_const_memory_boundary`/`_control_boundary`/`_address_taken`). The (gated-off) - `SV_ARITH` machinery was moved alongside it so 6.3 is a gate flip, not a code move. 
Op - families call into `fold.h`; `value.c` keeps the stack discipline. `ApiSValue`'s shape is - now settled for Track 7, and the Track 2 binop/cmp split has the fold layer isolated. -2. **6.3 — Re-enable delayed arith *after* Track 7.** Restore the gate - (`g && !flags && api_foldable_int_type(...)`) in `api_can_delay_int_arith` (now in - `fold.c`); the `api_make_arith_*`/`api_materialize_arith_to`/`api_release_arith`/the - fold-chain + identity-collapse helpers already live under `fold.c` — verify they compose - with the place/value model. -3. **Fix [doc/CODEGEN.md](../CODEGEN.md)** to match the restored, isolated peephole. 6.2 - already corrected it to introduce `fold.c` and mark delayed arith gated-off; 6.3 should - flip that note to "live" once re-enabled. + (`api_local_const_memory_boundary`/`_control_boundary`/`_address_taken`). The (then + gated-off) `SV_ARITH` machinery was moved alongside it so 6.3 was a gate flip, not a code + move. Op families call into `fold.h`; `value.c` keeps the stack discipline. `ApiSValue`'s + shape is settled for Track 7, and the Track 2 binop/cmp split has the fold layer isolated. +2. **6.3 — Re-enable delayed arith after Track 7** — **DONE** (`b8de5c0`). The gate in + `api_can_delay_int_arith` (in `fold.c`) was restored now that Track 7 removed the EA + rider; the `api_make_arith_*`/`api_materialize_arith_to`/`api_release_arith` fold-chain + + identity-collapse helpers compose with the place/value model. Green at -O0; bootstrap + reproduces. +3. **Fix [doc/CODEGEN.md](../CODEGEN.md)** — **DONE.** 6.2 introduced `fold.c` and marked + delayed arith gated-off; 6.3 flipped that note to "live". --- -## Track 7 — Strict place/value discipline (the centerpiece) - -**Status: core LANDED** (`c338c74`+`8e17cb9`). 
The public addressing surface is the strict -`push_local`/`addr`/`deref`/`field`/`elem`/`load`/`store` set; the `CfreeCgEffAddr` rider is -gone; every op panics on a place/value kind mismatch (no inference at the boundary); and the -place ops fold the constant offset/scale into one `OPK_INDIRECT[base + index*scale + offset]` -for clean memops. All frontends + emu + cg-api tests conform; `make bootstrap` reproduces at --O0 AND -O1. - -**Remaining refinements (follow-ups):** -- The *internal* `api_is_lvalue_sv` is still the place predicate (a heuristic on `ApiSValue`). - The boundary is strict, but replacing the internal heuristic with an explicit `ApiSValue` - kind tag (Model B's "every stack entry is exactly one kind") is not yet done. -- Aggregate **VALUE**s aren't yet hard-forbidden inside the stack (aggregates are PLACEs in - practice, but there's no panic on an aggregate-typed value). -- **wide16** (i128/f128) is still special-cased as aggregate-like in `memory.c`/`call.c`/ - `wide.c` rather than flowing as a VALUE. -- **Bitfields** still ride on `CfreeCgMemAccess`/`CfreeCgField` (Track 3b: make them a PLACE - subkind on the strict `load`/`store`). +## Track 7 — Strict place/value discipline (the centerpiece) — DONE + +**Status: LANDED** (`c338c74`+`8e17cb9` core; `a0397c6` 7.1/7.2; `6f48bfd` 7.3). The public +addressing surface is the strict `push_local`/`addr`/`deref`/`field`/`elem`/`load`/`store` +set; the `CfreeCgEffAddr` rider is gone; every op panics on a place/value kind mismatch (no +inference at the boundary); and the place ops fold the constant offset/scale into one +`OPK_INDIRECT[base + index*scale + offset]` for clean memops. All frontends + emu + cg-api +tests conform; `make bootstrap` reproduces at -O0 AND -O1. 
+ +**Refinements (now landed):** +- The *internal* place predicate `api_is_lvalue_sv` is **now kind-based** (`a0397c6`): + `sv->lvalue && kind == SV_OPERAND && api_operand_can_address(&sv->op)`, replacing the old + heuristic OR (the `bitfield_lvalue` and `source_local && OPK_LOCAL` terms are subsumed). +- Aggregate **VALUE**s are **now hard-forbidden** (`a0397c6`): `api_push` panics on an + aggregate-typed non-place value. i128/f128 are scalars and unaffected. +- **wide16** (i128/f128) **now flows as a VALUE** (`6f48bfd`): the aggregate-like special + paths in `memory.c`/`call.c` collapsed (~100 lines deleted). +- **Bitfields** are **now a PLACE subkind** (Track 3b, `d08e794`): the bit-field rider was + dropped from `CfreeCgMemAccess`; the strict `load`/`store` carry the geometry. + +**Remaining refinement (follow-up, non-blocking per decision #8):** - **-O0 mem-op quality**: the C frontend reaches non-trivial places via `pcg_materialize_lv_to_ptr` (int arithmetic) + `deref`; it could instead emit `deref(offset)`/`elem` directly so -O0 also gets the folded addressing mode (-O1's @@ -335,32 +334,29 @@ red-green per op on the toy corpus + C frontend; `-O0` quality is not a gate (de ## Recommended sequencing (remaining) -1. **Track 1c** completeness audit + tests (small, no behavior change). -2. **Track 6.2** — isolate the live fold layer into `fold.c`. **DONE (`d03eb4c`).** Settles - `ApiSValue` and is a clean dependency for both the Track 2 binop/cmp split and Track 7. -3. **Track 2 binop/cmp split** — independent of 6.2 but cleaner after it (shares the fold - layer). Also fixes the lossy FP compare. -4. **Track 7** (place/value) — the centerpiece; removes the EA rider; do it red-green. -5. **Track 6.3** — re-enable delayed arith once Track 7 removed the EA rider. -6. **Track 3b** — bitfield-as-PLACE-subkind, on the strict place-based `load`/`store`. -7. 
**Track 4** (bswap collapse, `unreachable` hook, `supports_intrinsic`, CPU intrinsics) — - independent; can be done any time. -8. **Track 5 follow-up** — true multi-value at `-O1` (opt `cg_ir_lower`) + wasm, if wanted. +Most of the original sequence is landed: **6.2** (`d03eb4c`), **Track 7** core + 7.1/7.2/7.3 +(`c338c74`/`8e17cb9`/`a0397c6`/`6f48bfd`), **6.3** (`b8de5c0`), **Track 3b** (`d08e794`), and +**Track 4** 4a/4b/4c (`52897e0`/`7eaf7bf9`/`15e2effc`) are all done. What's left: + +1. **Track 2 binop/cmp split** — the largest remaining mechanical change; cleaner now that + the fold layer is isolated (6.2). Also fixes the lossy FP compare. +2. **Track 1c** completeness audit + tests (small, no behavior change). +3. **Track 5 follow-up** — true multi-value at `-O1` (opt `cg_ir_lower`) + wasm, if wanted. +4. **Track 4 taxonomy tidy** (optional) — stop double-modeling `memcpy`/`memset` as a + separate public intrinsic surface (kept as dedicated public ops). -2, 4, 7 are independent of each other; 6.2 helps 2 and 7; 6.3 and 3b depend on 7. +Track 2 is independent of everything still open; the fold-layer isolation (6.2) already +helps it. ## Decisions still governing remaining work 2. **Op enums: one public vocabulary, int/fp split.** `CgTarget` consumes the public split enums; delete internal `BinOp`/`UnOp`/`CmpOp` + their `api_map_*`. (Atomic/Order/AsmDir - already done.) -3. **Façade intrinsics: query + implement the trivial ones.** Add `supports_intrinsic` + a - clean diagnostic; implement single-instruction baremetal/CPU intrinsics; report - `FMA`/`SYSCALL`/`CORO_SWITCH` false until built. (`FP_REM` already removed.) -6. **`elem` operand shape: pointer VALUE + explicit array-decay.** -7. **Bitfields: PLACE subkind** (merges Track 3b into Track 7). -8. **`-O0` quality: not a gate.** Track 7 may land with `-O0` regressions; `-O1+` carries - quality. Track 6.3 still restores the peephole for the free `-O0` win but does not block 7. 
- -(Decisions 1, 4, 5 are realized: 1 = peephole kept/re-enable under Track 6; 4 = `ret_void` -removed; 5 = NONTEMPORAL/INVARIANT/alias scopes removed.) + already done.) — **still governs the open Track 2 binop/cmp split.** + +(Decisions 1, 3, 4, 5, 6, 7, 8 are realized: 1 = peephole kept + re-enabled under Track 6 +(`b8de5c0`); 3 = `supports_intrinsic` + diagnostic + CPU intrinsics landed (`15e2effc`), +`FP_REM` removed; 4 = `ret_void` removed; 5 = NONTEMPORAL/INVARIANT/alias scopes removed; +6 = `elem` is a pointer VALUE + explicit array-decay (Track 7 core); 7 = bitfields are a +PLACE subkind (`d08e794`); 8 = `-O0` quality was not a gate for Track 7, and Track 6.3 +restored the peephole.)
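
Illustrative note (not part of the commit): the kind-based place predicate and the aggregate-VALUE rule recorded for `a0397c6` can be sketched in C as below. The predicate shape (`lvalue && kind == SV_OPERAND && operand can address`) comes from the roadmap text; everything else is an assumption made for illustration. In particular, the `StackValue`/`Operand` structs, the `OPK_IMM`/`OPK_REG` enumerators, the `is_aggregate` flag, and the choice of which operand kinds count as addressable are hypothetical, and the real `api_push` panics where this sketch returns `false`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stack-entry kinds named in the roadmap. */
typedef enum { SV_OPERAND, SV_CMP, SV_ARITH } SvKind;

/* OPK_LOCAL / OPK_INDIRECT appear in the roadmap; OPK_IMM / OPK_REG are
 * hypothetical fillers for this sketch. */
typedef enum { OPK_IMM, OPK_REG, OPK_LOCAL, OPK_INDIRECT } OperandKind;

typedef struct { OperandKind kind; } Operand;

typedef struct {
    SvKind  kind;
    bool    lvalue;
    bool    is_aggregate;  /* hypothetical: true for struct/array-typed entries */
    Operand op;
} StackValue;

/* Assumption: only memory-backed operands can be addressed. */
static bool operand_can_address(const Operand *op) {
    assert(op != NULL);
    return op->kind == OPK_LOCAL || op->kind == OPK_INDIRECT;
}

/* Kind-based place predicate: one conjunction, no heuristic OR terms. */
static bool is_place(const StackValue *sv) {
    assert(sv != NULL);
    return sv->lvalue && sv->kind == SV_OPERAND && operand_can_address(&sv->op);
}

/* Aggregates may enter the stack only as PLACEs. Returns false where the
 * real code is described as panicking. */
static bool push_allowed(const StackValue *sv) {
    return !sv->is_aggregate || is_place(sv);
}
```

Under these assumptions a delayed compare (`SV_CMP`) is never a place, a scalar register value may be pushed freely, and an aggregate-typed entry is accepted only when `is_place` holds.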