Codegen Interface Cleanup — Roadmap (remaining work)

Status: mostly executed. The independent, lower-risk tracks landed first; the high-blast-radius work has since followed. The PLACE/VALUE centerpiece (Track 7, with Track 3b folded in) is complete — strict addressing, the explicit place predicate, forbidden aggregate VALUEs, i128/f128 flowing as VALUEs, and bitfields as a PLACE subkind are all landed. The op/intrinsic taxonomy (Track 4) is complete. The fold-layer isolation + delayed-arith re-enable (Track 6) is complete. What remains is the binop/cmp op-split (the rest of Track 2), the Track 1c completeness audit, and the Track 5 multi-value follow-up.

✓ RESOLVED — i128 -O1 regression fixed. test-parse i128_06_shifts_bitwise had crashed (SIGSEGV/SIGABRT) at -O1 on the native/link paths. Root cause: Track 7.3 made i128/f128 flow as scalar VALUEs, but api_make_wide16_int_const (src/cg/wide.c) still returned an lvalue/PLACE, so an i128 constant entered the stack as a place; the O1 ABI lowering then passed it by-reference and dereferenced a value slot as a (mistyped 32-bit) pointer → null deref on i128→_Bool compares. Fix: return a VALUE (api_make_sv), matching api_push_call_result's i128 representation. Now full test-parse is 3784/0 and bootstrap reproduces at -O0 AND -O1. Lesson: codegen gates must include test-parse — bootstrap reproducing ≠ correctness.

Forward-looking companion to the canonical design in doc/CODEGEN.md. Goal: make the KitCg public API and the internal CgTarget contract carry one clear representation per concept, with no advertise-but-ignore surface and no façade. Breaking and sweeping changes are in scope; reducing churn is not a priority.

The centerpiece was Track 7 — a strict PLACE/VALUE stack discipline that ends CG's inference of what a stack slot means. It is now landed (Model B; see §Track 7).

Scope

Two stacked interfaces (see doc/CODEGEN.md §The two boundaries):

Public kit_cg_* / KitCg (include/kit/cg.h) — a value-stack machine.
Internal CgTarget (src/cg/cgtarget.h) — a three-address operand vtable. NOTE the op enums also flow into the physical NativeTarget (src/arch/native_target.h, which #includes cgtarget.h), the recorder IR (src/cg/ir.h) and opt IR (src/opt/ir.h), and the interpreter (src/interp/). Any enum change touches all of these layers, not just the semantic vtable.

Between them sits the translation layer (src/cg/value.c, arith.c, memory.c, control.c, call.c), which also performs -O0 constant folding and compare fusion.

Principles we are enforcing (unchanged)

One representation per concept. No concept in two structs/enums hand-kept in sync.
No advertise-but-ignore. A public field/flag is honored or it does not exist.
No façade. A public enumerator that always panics is a bug — implement it, remove it, or gate it behind a capability query + clean diagnostic.
Width belongs to the type, not the opcode. bswap is one operation, not three.
Ops vs intrinsics has a stated rule (§Track 4) and both layers obey it.
The semantic layer may peephole, but that responsibility is named and isolated. The vstack peephole is a kept feature (free -O0 perf), not removed.
Completeness over minimalism. Keep an op/enumerator with a distinct, sensible meaning that completes an orthogonal set — even with no caller. Remove only the redundant: two spellings of one behavior.

Done (committed, all green: lib · toy 1344/0 · cg-api · smoke x64/rv64 · opt · isa · libc; `make bootstrap` reproduces at -O0 AND -O1)

Commit	Track	Summary
`e27a288`	1a / 1d	Removed `SCOPE_IF` / `CGScopeDesc.cond` / the `scope_else` hook (both IR opcodes `CG_IR_SCOPE_ELSE` + `IR_SCOPE_ELSE`, all 5 realizations, the `desc.cond` opt walkers; ~22 files). Removed `KIT_CG_TAIL_NEVER` (redundant with `DEFAULT`).
`ae8d0f6`	5	Multi-result public API: `KitCgFuncSig.results[]`/`nresults` (+ `KitCgFuncResult`), `kit_cg_type_func_nresults`/`_result`, `kit_cg_ret_void` removed (void = 0-result `kit_cg_ret`). Type system stores `results[]`; `kit_cg_call` pushes/`kit_cg_ret` pops in declaration order (last result on TOS). Includes a self-host regression fix: a no-value return on a non-void function (UB fall-off) now emits `kit_cg_unreachable` instead of underflowing the value stack (`pcg_ret` in `lang/c/parse/cg_adapter.c`).
`fabf255`	3a	Dropped `KIT_CG_MEM_NONTEMPORAL`/`_INVARIANT` + `KitCgMemAccess.alias_scope`/`noalias_scope` (decision #5) and the matching toy attributes.
`5e1335d`	4 (FP_REM)	Removed the `KIT_CG_FP_REM` façade (always-panic; only dead callers). FP remainder is a libcall the frontend emits.
`917ffe9`	2 (AsmDir)	Deleted internal `AsmDir` + `api_map_asm_dir`; `AsmConstraint.dir` and backends use public `KitCgAsmDir`.
`a2f6367`	2 (Atomic/Order)	Deleted internal `AtomicOp`/`MemOrder` + `api_map_atomic_op`/`api_map_mem_order`; both the semantic `CgTarget` and physical `NativeTarget` atomic hooks, the recorder+opt IR aux, and the interpreter now carry public `KitCgAtomicOp`/`KitCgMemOrder`.
`d03eb4c`	6.2	Isolated the `-O0` semantic peephole into `src/cg/fold.{c,h}`: integer constant folding, the `SV_CMP` delayed-compare lifecycle, the (gated-off) `SV_ARITH` delayed-arith lifecycle, and const-local store-to-load forwarding with its invalidation boundaries. `fold.h` is the documented contract, re-exported via `internal.h`; `value.c` keeps stack discipline, `api_lvalue_addr`, and the enum-mapping helpers. Pure relocation, no behavior change. `doc/CODEGEN.md` updated.
`c338c74`+`8e17cb9`	7 (core)	Strict PLACE/VALUE addressing. Removed `KitCgEffAddr` from `load`/`store` (they consume a PLACE); added `deref(offset)` (VALUE ptr→PLACE), renamed `index`→`elem` (VALUE ptr + index→PLACE, scale=sizeof(T)), kept `field(i)`/`addr`. Each op panics on kind mismatch — no place/value inference. The place ops fold the constant offset (deref/field) and scale (elem→`log2_scale`) into one `OPK_INDIRECT[base + index*scale + offset]`, so the backend still gets a single addressing-mode memop (base/index dynamic, scale/offset folded). All three frontends + emu + cg-api tests conformed (explicit `deref`/decay/`field`). `cg.h` documents the kinds + per-op contracts. Green: toy 1344/0, cg-api, opt (incl tiny-inline), smoke, libc, isa/link/elf, and `make bootstrap` reproduces byte-identical at -O0 AND -O1.
`a0397c6`	7.1 / 7.2	Explicit PLACE predicate + forbid aggregate VALUEs. `api_is_lvalue_sv` is now a kind-based predicate — `sv->lvalue && kind == SV_OPERAND && api_operand_can_address(&sv->op)` — replacing the old heuristic OR (the `bitfield_lvalue` and `source_local && OPK_LOCAL` terms are subsumed; `SV_CMP`/`SV_ARITH` never carry `lvalue=1`). `api_push` now panics if an aggregate-typed value enters the stack as a non-place (aggregates are always PLACEs; i128/f128 are scalars and unaffected).
`6f48bfd`	7.3	Flow i128/f128 as VALUEs, collapse the wide16 special paths in `memory.c`/`call.c` (~100 lines deleted). The 16-byte scalars now ride the value path; the aggregate-like special-casing is gone.
`d08e794`	3b	Bitfield as a PLACE subkind, single representation. Dropped the bit-field rider on `KitCgMemAccess`; the strict `load`/`store` carry the bit-field geometry via the `KitCgMemAccess` the frontend supplies (rebuilt through `bf_from_access`), and `kit_cg_field` pushes the record-base address as a place of the field type with no `delayed.bitfield`/`bitfield_lvalue` rider. Removes the "every memop is secretly maybe-a-bitfield" branch.
`b8de5c0`	6.3	Re-enabled the `SV_ARITH` delayed-arith `-O0` peephole (gate flip in `fold.c`, now that Track 7 removed the EA rider it conflicted with). `doc/CODEGEN.md` note flipped from gated-off to live.
`52897e0`	4a	Collapsed `INTRIN_BSWAP16/32/64` into one width-by-type `BSWAP` (`cgtarget.h`). `arith.c` drops the size-branch; each backend (`aa64`/`x64`/`rv64` native, interp, c_target, wasm) derives width from `dsts[0].type` under a `switch(width)`, preserving the existing sequences. Pure internal dedup; public API unchanged.
`7eaf7bf9`	4b	`unreachable` is now a first-class terminator hook with its own `CgTarget` hook + IR op (recorder + opt), not routed through the intrinsic path. The 5 backends + interp + every opt pass that handles terminators (CFG/DCE/SSA/native-emit/…) handle it directly.
`15e2effc`	4c	`kit_cg_target_supports_intrinsic` query + a real unsupported-feature diagnostic (replacing the bare `compiler_panic`); implemented the single-instruction baremetal/CPU intrinsics (`cpu_nop`/`yield`/`wfi`/`wfe`/`sev`/`isb`/`dmb`/`dsb`/`irq_`) on the native arches. Converted the `test/toy/err/unsupported_` panic cases into positive smoke cases + added the capability-query test. `FMA`/`SYSCALL`/`CORO_SWITCH` still report `false`.

So Tracks 1a/1d, 5, 3a, 3b, 6, 7 are done; Track 4 is done (FP_REM + 4a/4b/4c); Track 2 is 2/3 done (the 3 identical enums; the binop/cmp split remains).

Caveats / follow-ups discovered while doing the above

Track 5 multi-result is single-result-complete only. The -O0 native path handles nresults > 1, but the opt path (src/opt/cg_ir_lower.c, the CG_IR_CALL/CG_IR_RET lowering) still only threads results[0] — a true 2+-result function is lossy at -O1. The wasm frontend (lang/wasm/cg.c) was also migrated as single-result (takes f->results[0]). True multi-value end-to-end (wasm + -O1) is unfinished follow-up.
The C frontend keeps its own private copies of BinOp/AtomicOp/MemOrder/ IntrinKind in lang/c/parse/cg_adapter.h. These are a separate Principle-1 issue, deliberately left alone by Track 2 (they're a different namespace; do not blind-rename AO_*/MO_*/BO_* across lang/). Worth a follow-up to dedupe against the public enums.
Regression lesson (in [[doc/plan/BOOTSTRAP.md]] / the self-build): removing a "bare return that ignores result count" primitive means every frontend's fall-off / default return must push the right number of values or terminate with unreachable. Audit other frontends if you remove more return primitives.

Track 1c — Conditional control ops: completeness audit + tests (REMAINING, small)

KEEP the full {break, continue} × {unconditional, _true, _false} set (Principle 7; break_true/continue_true/continue_false have 0 callers, break_false 1, but the set is the structured analog of branch_true/branch_false). Remaining work is test coverage + an audit, not code change:

Confirm continue* is rejected on non-loop scopes and block-vs-loop rules are uniform.
Add an end-to-end test for each break_*/continue_* variant on a backend (the result-carrying semantics are spec'd at cg.h kit_cg_break_true &c).

Track 2 (remaining) — Split the merged `BinOp`/`UnOp`/`CmpOp`

The 3 identical enums (Atomic/Order/AsmDir) are done. What remains is the split→merged trio, which is the structural core of Track 2 (the largest remaining mechanical change):

Public (`cg.h`)	Internal (`cgtarget.h`)	Relationship
`KitCgIntBinOp`(13) + `KitCgFpBinOp`(4)	`BinOp`	split→merged
`KitCgIntCmpOp`(10) + `KitCgFpCmpOp`(12)	`CmpOp`(14)	split→merged, lossy
`KitCgIntUnOp`(3) + `KitCgFpUnOp`(1)	`UnOp`	split→merged

Why it matters: the merge is lossy — api_map_fp_cmp collapses OEQ/UEQ → one CMP_EQ (value.c), so the public ordered/unordered FP-compare distinction cannot reach a backend. Fixing that is the real correctness win; the binop/unop dedup is consistency.

Decision (#2): CgTarget consumes the public split enums directly; backends switch on KitCgIntBinOp and KitCgFpBinOp separately. Delete BinOp/UnOp/CmpOp and api_map_int_binop/api_map_fp_binop/api_map_int_unop/api_map_int_cmp/api_map_fp_cmp.

Why this is bigger than the atomic slice — the design to implement

Unlike Atomic/Order (a 1:1 value-preserving rename), this is a genuine split:

Hooks split (cgtarget.h, mirrored in native_target.h if any binop/cmp is physical — check; binop/cmp are semantic CgTarget hooks): binop→int_binop/fp_binop, unop→int_unop/fp_unop, cmp→int_cmp/fp_cmp, cmp_branch→ int_cmp_branch/fp_cmp_branch.
IR opcodes double — the recorder (src/cg/ir.h) and opt IR (src/opt/ir.h) store the op in extra.imm/aux; a single CG_IR_BINOP/IR_BINOP can't hold an ambiguous value (KIT_CG_INT_ADD == KIT_CG_FP_ADD == 0). Either split the opcodes (CG_IR_INT_BINOP/CG_IR_FP_BINOP, …) or add an is_fp discriminator bit. Splitting the opcodes is cleaner; both touch ir_recorder.c, cg_ir_lower.c, pass_native_emit.c, ir_dump.c/ir_print.c, and every opt pass that switches on IR_BINOP/IR_CMP/ IR_CMP_BRANCH (pass_combine, pass_simplify, pass_o2, pass_jump, …).
Fold layer restructures (arith.c + value.c). api_cg_binop(BinOp) / api_cg_unop(UnOp) / api_cg_cmp(CmpOp) are the shared dispatch. Note the int/fp split is already made by TYPE (api_type_is_float), not by the enum — so splitting the dispatch is natural: the int path keeps the fold (api_try_fold_int_binop/_unop/_cmp, int-only) and the delayed forms (SV_ARITH arith, SV_CMP compare; ApiDelayedArith.bin_op/ un_op, ApiDelayedCmp.op, api_make_cmp, api_materialize_cmp_to, api_invert_cmp, api_branch_if); the fp path is simpler (the f128 helper path in kit_cg_fp_binop already exists, plus the fp hook). This is the subtle, high-risk part — get the delayed-compare fusion + constant-fold right per int/fp. Coordinate with Track 6.2 (which moves these into fold.c); doing 6.2 first may make this cleaner.
Backends split their switches: the 3 native arches (aa64/x64/rv64 native.c — they already re-split int/fp internally), c_target/c_emit.c, wasm/emit.c, and the interpreter (interp/engine.c).

Method that worked for the atomic slice: delete the internal enum + change the hook signatures, then let -Werror enumerate every cg-side site (the C frontend's cg_adapter.h copy won't be flagged — it's a different type). Then fix per file. For the value-label renames, sed only within src/cg|arch|opt|interp (never lang/, never src/wasm/).

Tests: test-isa/test-arch (encode/decode), test-opt, smoke; add an unordered FP compare exercised end-to-end (the currently-lossy case) — that's the regression guard for the real fix.

Track 3b — Bitfields as a PLACE subkind (DONE — `d08e794`)

LANDED on codegen-tracks-7634 (d08e794). A bitfield is now a PLACE subkind: the bit-field rider was dropped from KitCgMemAccess, and the strict load/store carry the bit-field geometry (storage size/offset, bit offset/width, signedness) via the KitCgMemAccess the frontend supplies, rebuilt through bf_from_access. kit_cg_field now pushes the record-base address as a place of the field type with no delayed.bitfield /bitfield_lvalue rider, and the "every memop is secretly maybe-a-bitfield" branch in kit_cg_load/_store is gone. Touched cg.h, internal.h, memory.c, value.c, control.c, and lang/c/parse/cg_adapter.c; green on the bitfield corpus + test-cg-api

bootstrap. (Done as a PLACE subkind on the strict load/store, after Track 7 core.)

Track 4 — op/intrinsic taxonomy (DONE — `5e1335d` + `52897e0` + `7eaf7bf9` + `15e2effc`)

LANDED. FP_REM removal (5e1335d) plus 4a/4b/4c on codegen-tracks-7634:

4a. Width-by-type: collapse `BSWAP16/32/64` → one `BSWAP` — DONE (`52897e0`)

Collapsed the 3 internal IntrinKind bswaps into one width-by-type BSWAP in cgtarget.h. arith.c dropped the abi_cg_sizeof-driven size-branch; each backend now derives width from dsts[0].type and wraps its three existing sequences under a switch(width), preserving them verbatim. Done across aa64/x64/rv64 native.c, interp/engine.c, c_target/c_emit.c, and wasm (arch/wasm/emit.c + internal.h). The C frontend's cg_adapter.h INTRIN_BSWAP16/32/64 was left as-is (maps to the public single BSWAP at the call site). Pure internal dedup — public API unchanged.

4b. `unreachable` as a first-class terminator hook — DONE (`7eaf7bf9`)

kit_cg_unreachable now has its own CgTarget hook + its own IR op (recorder + opt) and is no longer routed through the intrinsic hook. The 5 backends' + interp's handling, plus every opt pass that handles terminators (pass_cfg/pass_dce/pass_ssa/pass_analysis/pass_o2/ pass_lower/pass_native_emit, cg_ir_lower, ir_dump/ir_print, check_target), were moved onto it. (Terminators are first-class: ret, unreachable, jump, branch, computed_goto, tail-call.)

4c. Façade intrinsics: query + implement the trivial ones — DONE (`15e2effc`)

Added kit_cg_target_supports_intrinsic(KitCompiler*, KitCgIntrinsic) (mirroring _supports_call_conv/_symbol_feature) and converted the bare compiler_panic into a proper unsupported-feature diagnostic. Implemented the single-instruction baremetal/CPU intrinsics on the native arches (cpu_nop/cpu_yield/wfi/wfe/sev/isb/dmb/dsb/irq_*). The test/toy/err/unsupported_* panic cases were converted into positive smoke cases (plus a new 144_intrinsic_capability_query + 145_baremetal_privileged_aa64). FMA/SYSCALL/ CORO_SWITCH still report false until implemented.

Follow-up not done here: the "keep memcpy/memset as dedicated public ops but stop double-modeling them as a separate public intrinsic surface" cleanup was not part of this slice — it remains an open taxonomy tidy if wanted.

Track 6 — Isolate and complete the semantic peephole (DONE — `d03eb4c` + `b8de5c0`)

The semantic layer is also a -O0 peephole optimizer — a kept feature (Principle 6).

Status: DONE. Both 6.2 (d03eb4c) and 6.3 (b8de5c0) landed.

Current state

Live: constant folding (api_try_fold_int_binop/_unop/_cmp, in fold.c) and the SV_CMP fused-compare-into-branch path (api_make_cmp/api_materialize_cmp_to/ api_branch_if).
Live again: the SV_ARITH delayed-arith subsystem — re-enabled by b8de5c0 once Track 7 removed the EA rider it conflicted with.
Live: scalar store-to-load forwarding (api_local_const_*).

Action — completed

6.2 — Extract the live peephole into src/cg/fold.c + fold.h — DONE (d03eb4c). The documented contract covers the integer fold helpers, the SV_CMP lifecycle, and const-local forwarding with its invalidation boundaries (api_local_const_memory_boundary/_control_boundary/_address_taken). The (then gated-off) SV_ARITH machinery was moved alongside it so 6.3 was a gate flip, not a code move. Op families call into fold.h; value.c keeps the stack discipline. ApiSValue's shape is settled for Track 7, and the Track 2 binop/cmp split has the fold layer isolated.
6.3 — Re-enable delayed arith after Track 7 — DONE (b8de5c0). The gate in api_can_delay_int_arith (in fold.c) was restored now that Track 7 removed the EA rider; the api_make_arith_*/api_materialize_arith_to/api_release_arith fold-chain + identity-collapse helpers compose with the place/value model. Green at -O0; bootstrap reproduces.
Fix doc/CODEGEN.md — DONE. 6.2 introduced fold.c and marked delayed arith gated-off; 6.3 flipped that note to "live".

Track 7 — Strict place/value discipline (the centerpiece) — DONE

Status: LANDED (c338c74+8e17cb9 core; a0397c6 7.1/7.2; 6f48bfd 7.3). The public addressing surface is the strict push_local/addr/deref/field/elem/load/store set; the KitCgEffAddr rider is gone; every op panics on a place/value kind mismatch (no inference at the boundary); and the place ops fold the constant offset/scale into one OPK_INDIRECT[base + index*scale + offset] for clean memops. All frontends + emu + cg-api tests conform; make bootstrap reproduces at -O0 AND -O1.

Refinements (now landed):

The internal place predicate api_is_lvalue_sv is now kind-based (a0397c6): sv->lvalue && kind == SV_OPERAND && api_operand_can_address(&sv->op), replacing the old heuristic OR (the bitfield_lvalue and source_local && OPK_LOCAL terms are subsumed).
Aggregate VALUEs are now hard-forbidden (a0397c6): api_push panics on an aggregate-typed non-place value. i128/f128 are scalars and unaffected.
wide16 (i128/f128) now flows as a VALUE (6f48bfd): the aggregate-like special paths in memory.c/call.c collapsed (~100 lines deleted).
Bitfields are now a PLACE subkind (Track 3b, d08e794): the bit-field rider was dropped from KitCgMemAccess; the strict load/store carry the geometry.

Remaining refinement (follow-up, non-blocking per decision #8):

-O0 mem-op quality: the C frontend reaches non-trivial places via pcg_materialize_lv_to_ptr (int arithmetic) + deref; it could instead emit deref(offset)/elem directly so -O0 also gets the folded addressing mode (-O1's addr-fold already recovers it; decision #8 makes this non-blocking).

Original design notes (for the remaining refinements)

Decided: Model B (explicit place/value kinds); wide-16 scalars are values. (Track 3c — the scale vs log2_scale rider mismatch — is subsumed here: the KitCgEffAddr rider is removed entirely.)

Today the value stack carries an inferred lvalue/rvalue distinction and several ops dispatch on type + shape. Inference points to remove:

api_is_lvalue_sv is a heuristic (value.c): ORs lvalue, bitfield_lvalue, api_operand_can_address, source_local!=NONE && OPK_LOCAL.
kit_cg_load has ~7 behaviors, several of which don't load (memory.c).
load/store base accepts 4 shapes ({lvalue, ptr-rvalue} × {no-index, indexed}); there is no explicit deref.
kit_cg_index / kit_cg_field infer pointer-vs-array / record-vs-pointer.
Aggregates are implicitly by-reference, CG decides it (call.c).
wide16 (i128/f128) is special-cased as aggregate-like (memory.c/call.c/wide.c).

The discipline

Every stack entry is exactly one explicit, type-checked kind:

PLACE — addressable location of a typed object (OPK_LOCAL/OPK_GLOBAL/ OPK_INDIRECT(base+index*scale+off)).
VALUE — a scalar rvalue: integers, floats, pointers, and i128/f128.

CG keeps owning layout (field offsets, element sizes, types). It stops guessing the kind or passing-mode of a stack value. Every op declares the kinds it consumes/produces and panics on mismatch.

Op signatures (strict, single-shape)

Op	Consumes	Produces	Notes
`push_local l`	—	PLACE	the local's storage
`push_int/float/null`	—	VALUE
`push_symbol_addr s,a`	—	VALUE (ptr)
`push_local_addr l`	—	VALUE (ptr)	sugar for `push_local; addr`
`addr`	PLACE	VALUE (ptr)	address of the place
`deref` (NEW)	VALUE (ptr)	PLACE	the explicit ptr→place transition
`field i`	PLACE(record)	PLACE(field)	offset/type from layout; `->` is `deref; field`
`elem` (was `index`)	VALUE(ptr to T) + index VALUE	PLACE(T)	`*(p+i)`; scale=`sizeof(T)`; arrays decay to ptr first
`load access`	PLACE	VALUE	always dereferences; no EA rider
`store access`	PLACE, VALUE	—	always dereferences

The KitCgEffAddr rider is removed from load/store: addressing is built explicitly by field/elem/deref and absorbed into the OPK_INDIRECT place, so the backend still gets a single [base+index*scale+off] memop. The kept fold layer (Track 6) recovers -O0 quality (load of PLACE(local) → the local; deref of a ptr-arith chain → the indirect place). Per decision #8 this recovery is desirable but not a gate.

Aggregates (values forbidden)

An aggregate is always a PLACE; a VALUE of aggregate type is illegal (panic). Copies are explicit (memcpy between places, or field-by-field). Call args/returns of aggregate type pass an explicit place, mode named via existing ABI attrs (SRET/BYVAL/BYREF). Removes the aggregate branches in api_materialize_call_local, api_push_call_result, and the aggregate ret path.

wide16 (scalar values)

i128/f128 are VALUES; the backend lowers 16-byte storage/moves. The wide16 special paths in memory.c/call.c/wide.c collapse into the value path (plus backend 16-byte value-move support where missing).

Affected: cg.h (new deref, elem rename, EA rider removed, ApiSValue kind tag), value.c, memory.c, control.c (index→elem, field), call.c, wide.c, every frontend (insert explicit deref/array-decay; mark aggregate passing modes). Backends mostly unaffected (they already consume OPK_INDIRECT). Tests: highest blast radius — red-green per op on the toy corpus + C frontend; -O0 quality is not a gate (decision #8).

Recommended sequencing (remaining)

Most of the original sequence is landed: 6.2 (d03eb4c), Track 7 core + 7.1/7.2/7.3 (c338c74/8e17cb9/a0397c6/6f48bfd), 6.3 (b8de5c0), Track 3b (d08e794), and Track 4 4a/4b/4c (52897e0/7eaf7bf9/15e2effc) are all done. What's left:

Track 2 binop/cmp split — the largest remaining mechanical change; cleaner now that the fold layer is isolated (6.2). Also fixes the lossy FP compare.
Track 1c completeness audit + tests (small, no behavior change).
Track 5 follow-up — true multi-value at -O1 (opt cg_ir_lower) + wasm, if wanted.
Track 4 taxonomy tidy (optional) — stop double-modeling memcpy/memset as a separate public intrinsic surface (kept as dedicated public ops).

Track 2 is independent of everything still open; the fold-layer isolation (6.2) already helps it.

Decisions still governing remaining work

Op enums: one public vocabulary, int/fp split. CgTarget consumes the public split enums; delete internal BinOp/UnOp/CmpOp + their api_map_*. (Atomic/Order/AsmDir already done.) — still governs the open Track 2 binop/cmp split.

(Decisions 1, 3, 4, 5, 6, 7, 8 are realized: 1 = peephole kept + re-enabled under Track 6 (b8de5c0); 3 = supports_intrinsic + diagnostic + CPU intrinsics landed (15e2effc), FP_REM removed; 4 = ret_void removed; 5 = NONTEMPORAL/INVARIANT/alias scopes removed; 6 = elem is a pointer VALUE + explicit array-decay (Track 7 core); 7 = bitfields are a PLACE subkind (d08e794); 8 = -O0 quality was not a gate for Track 7, and Track 6.3 restored the peephole.)

	kit kit
	git clone https://git.ryansepassi.com/git/kit.git
	Log \| Files \| Refs \| README

kit