Codegen Interface Cleanup — Roadmap (remaining work)
Status: mostly executed. The independent, lower-risk tracks landed first; the high-blast-radius work has since followed. The PLACE/VALUE centerpiece (Track 7, with Track 3b folded in) is complete — strict addressing, the explicit place predicate, forbidden aggregate VALUEs, i128/f128 flowing as VALUEs, and bitfields as a PLACE subkind are all landed. The op/intrinsic taxonomy (Track 4) is complete. The fold-layer isolation + delayed-arith re-enable (Track 6) is complete. What remains is the binop/cmp op-split (the rest of Track 2), the Track 1c completeness audit, and the Track 5 multi-value follow-up.
✓ RESOLVED — i128 -O1 regression fixed.
test-parse i128_06_shifts_bitwise had crashed (SIGSEGV/SIGABRT) at -O1 on the native/link paths. Root cause: Track 7.3 made i128/f128 flow as scalar VALUEs, but api_make_wide16_int_const (src/cg/wide.c) still returned an lvalue/PLACE, so an i128 constant entered the stack as a place; the O1 ABI lowering then passed it by-reference and dereferenced a value slot as a (mistyped 32-bit) pointer → null deref on i128→_Bool compares. Fix: return a VALUE (api_make_sv), matching api_push_call_result's i128 representation. Now full test-parse is 3784/0 and bootstrap reproduces at -O0 AND -O1. Lesson: codegen gates must include test-parse; bootstrap reproducing ≠ correctness.
Forward-looking companion to the canonical design in doc/CODEGEN.md. Goal:
make the KitCg public API and the internal CgTarget contract carry one clear
representation per concept, with no advertise-but-ignore surface and no façade. Breaking
and sweeping changes are in scope; reducing churn is not a priority.
The centerpiece was Track 7 — a strict PLACE/VALUE stack discipline that ends CG's inference of what a stack slot means. It is now landed (Model B; see §Track 7).
Scope
Two stacked interfaces (see doc/CODEGEN.md §The two boundaries):
- Public kit_cg_*/KitCg (include/kit/cg.h): a value-stack machine.
- Internal CgTarget (src/cg/cgtarget.h): a three-address operand vtable. NOTE: the op enums also flow into the physical NativeTarget (src/arch/native_target.h, which #includes cgtarget.h), the recorder IR (src/cg/ir.h) and opt IR (src/opt/ir.h), and the interpreter (src/interp/). Any enum change touches all of these layers, not just the semantic vtable.
Between them sits the translation layer (src/cg/value.c, arith.c, memory.c,
control.c, call.c), which also performs -O0 constant folding and compare fusion.
Principles we are enforcing (unchanged)
- One representation per concept. No concept in two structs/enums hand-kept in sync.
- No advertise-but-ignore. A public field/flag is honored or it does not exist.
- No façade. A public enumerator that always panics is a bug — implement it, remove it, or gate it behind a capability query + clean diagnostic.
- Width belongs to the type, not the opcode. bswap is one operation, not three.
- Ops vs intrinsics has a stated rule (§Track 4) and both layers obey it.
- The semantic layer may peephole, but that responsibility is named and isolated. The vstack peephole is a kept feature (free -O0 perf), not removed.
- Completeness over minimalism. Keep an op/enumerator with a distinct, sensible meaning that completes an orthogonal set, even with no caller. Remove only the redundant: two spellings of one behavior.
Done (committed, all green: lib · toy 1344/0 · cg-api · smoke x64/rv64 · opt · isa · libc; make bootstrap reproduces at -O0 AND -O1)
| Commit | Track | Summary |
|---|---|---|
| e27a288 | 1a / 1d | Removed SCOPE_IF / CGScopeDesc.cond / the scope_else hook (both IR opcodes CG_IR_SCOPE_ELSE + IR_SCOPE_ELSE, all 5 realizations, the desc.cond opt walkers; ~22 files). Removed KIT_CG_TAIL_NEVER (redundant with DEFAULT). |
| ae8d0f6 | 5 | Multi-result public API: KitCgFuncSig.results[]/nresults (+ KitCgFuncResult), kit_cg_type_func_nresults/_result, kit_cg_ret_void removed (void = 0-result kit_cg_ret). Type system stores results[]; kit_cg_call pushes/kit_cg_ret pops in declaration order (last result on TOS). Includes a self-host regression fix: a no-value return on a non-void function (UB fall-off) now emits kit_cg_unreachable instead of underflowing the value stack (pcg_ret in lang/c/parse/cg_adapter.c). |
| fabf255 | 3a | Dropped KIT_CG_MEM_NONTEMPORAL/_INVARIANT + KitCgMemAccess.alias_scope/noalias_scope (decision #5) and the matching toy attributes. |
| 5e1335d | 4 (FP_REM) | Removed the KIT_CG_FP_REM façade (always-panic; only dead callers). FP remainder is a libcall the frontend emits. |
| 917ffe9 | 2 (AsmDir) | Deleted internal AsmDir + api_map_asm_dir; AsmConstraint.dir and backends use public KitCgAsmDir. |
| a2f6367 | 2 (Atomic/Order) | Deleted internal AtomicOp/MemOrder + api_map_atomic_op/api_map_mem_order; both the semantic CgTarget and physical NativeTarget atomic hooks, the recorder+opt IR aux, and the interpreter now carry public KitCgAtomicOp/KitCgMemOrder. |
| d03eb4c | 6.2 | Isolated the -O0 semantic peephole into src/cg/fold.{c,h}: integer constant folding, the SV_CMP delayed-compare lifecycle, the (gated-off) SV_ARITH delayed-arith lifecycle, and const-local store-to-load forwarding with its invalidation boundaries. fold.h is the documented contract, re-exported via internal.h; value.c keeps stack discipline, api_lvalue_addr, and the enum-mapping helpers. Pure relocation, no behavior change. doc/CODEGEN.md updated. |
| c338c74+8e17cb9 | 7 (core) | Strict PLACE/VALUE addressing. Removed KitCgEffAddr from load/store (they consume a PLACE); added deref(offset) (VALUE ptr→PLACE), renamed index→elem (VALUE ptr + index→PLACE, scale=sizeof(T)), kept field(i)/addr. Each op panics on kind mismatch (no place/value inference). The place ops fold the constant offset (deref/field) and scale (elem→log2_scale) into one OPK_INDIRECT[base + index*scale + offset], so the backend still gets a single addressing-mode memop (base/index dynamic, scale/offset folded). All three frontends + emu + cg-api tests conformed (explicit deref/decay/field). cg.h documents the kinds + per-op contracts. Green: toy 1344/0, cg-api, opt (incl tiny-inline), smoke, libc, isa/link/elf, and make bootstrap reproduces byte-identical at -O0 AND -O1. |
| a0397c6 | 7.1 / 7.2 | Explicit PLACE predicate + forbid aggregate VALUEs. api_is_lvalue_sv is now a kind-based predicate, sv->lvalue && kind == SV_OPERAND && api_operand_can_address(&sv->op), replacing the old heuristic OR (the bitfield_lvalue and source_local && OPK_LOCAL terms are subsumed; SV_CMP/SV_ARITH never carry lvalue=1). api_push now panics if an aggregate-typed value enters the stack as a non-place (aggregates are always PLACEs; i128/f128 are scalars and unaffected). |
| 6f48bfd | 7.3 | Flow i128/f128 as VALUEs, collapse the wide16 special paths in memory.c/call.c (~100 lines deleted). The 16-byte scalars now ride the value path; the aggregate-like special-casing is gone. |
| d08e794 | 3b | Bitfield as a PLACE subkind, single representation. Dropped the bit-field rider on KitCgMemAccess; the strict load/store carry the bit-field geometry via the KitCgMemAccess the frontend supplies (rebuilt through bf_from_access), and kit_cg_field pushes the record-base address as a place of the field type with no delayed.bitfield/bitfield_lvalue rider. Removes the "every memop is secretly maybe-a-bitfield" branch. |
| b8de5c0 | 6.3 | Re-enabled the SV_ARITH delayed-arith -O0 peephole (gate flip in fold.c, now that Track 7 removed the EA rider it conflicted with). doc/CODEGEN.md note flipped from gated-off to live. |
| 52897e0 | 4a | Collapsed INTRIN_BSWAP16/32/64 into one width-by-type BSWAP (cgtarget.h). arith.c drops the size-branch; each backend (aa64/x64/rv64 native, interp, c_target, wasm) derives width from dsts[0].type under a switch(width), preserving the existing sequences. Pure internal dedup; public API unchanged. |
| 7eaf7bf9 | 4b | unreachable is now a first-class terminator hook with its own CgTarget hook + IR op (recorder + opt), not routed through the intrinsic path. The 5 backends + interp + every opt pass that handles terminators (CFG/DCE/SSA/native-emit/…) handle it directly. |
| 15e2effc | 4c | kit_cg_target_supports_intrinsic query + a real unsupported-feature diagnostic (replacing the bare compiler_panic); implemented the single-instruction baremetal/CPU intrinsics (cpu_nop/yield/wfi/wfe/sev/isb/dmb/dsb/irq_*) on the native arches. Converted the test/toy/err/unsupported_* panic cases into positive smoke cases + added the capability-query test. FMA/SYSCALL/CORO_SWITCH still report false. |
So Tracks 1a/1d, 5, 3a, 3b, 6, 7 are done; Track 4 is done (FP_REM + 4a/4b/4c); Track 2 is 2/3 done (the 3 identical enums; the binop/cmp split remains).
Caveats / follow-ups discovered while doing the above
- Track 5 multi-result is single-result-complete only. The -O0 native path handles nresults > 1, but the opt path (src/opt/cg_ir_lower.c, the CG_IR_CALL/CG_IR_RET lowering) still only threads results[0]; a true 2+-result function is lossy at -O1. The wasm frontend (lang/wasm/cg.c) was also migrated as single-result (takes f->results[0]). True multi-value end-to-end (wasm + -O1) is unfinished follow-up.
- The C frontend keeps its own private copies of BinOp/AtomicOp/MemOrder/IntrinKind in lang/c/parse/cg_adapter.h. These are a separate Principle-1 issue, deliberately left alone by Track 2 (they're a different namespace; do not blind-rename AO_*/MO_*/BO_* across lang/). Worth a follow-up to dedupe against the public enums.
- Regression lesson (in [[doc/plan/BOOTSTRAP.md]] / the self-build): removing a "bare return that ignores result count" primitive means every frontend's fall-off / default return must push the right number of values or terminate with unreachable. Audit other frontends if you remove more return primitives.
Track 1c — Conditional control ops: completeness audit + tests (REMAINING, small)
KEEP the full {break, continue} × {unconditional, _true, _false} set (Principle 7;
break_true/continue_true/continue_false have 0 callers, break_false 1, but the set
is the structured analog of branch_true/branch_false). Remaining work is test
coverage + an audit, not code change:
- Confirm continue* is rejected on non-loop scopes and block-vs-loop rules are uniform.
- Add an end-to-end test for each break_*/continue_* variant on a backend (the result-carrying semantics are spec'd at cg.h kit_cg_break_true &c).
Track 2 (remaining) — Split the merged BinOp/UnOp/CmpOp
The 3 identical enums (Atomic/Order/AsmDir) are done. What remains is the split→merged trio, which is the structural core of Track 2 (the largest remaining mechanical change):
| Public (cg.h) | Internal (cgtarget.h) | Relationship |
|---|---|---|
| KitCgIntBinOp(13) + KitCgFpBinOp(4) | BinOp | split→merged |
| KitCgIntCmpOp(10) + KitCgFpCmpOp(12) | CmpOp(14) | split→merged, lossy |
| KitCgIntUnOp(3) + KitCgFpUnOp(1) | UnOp | split→merged |
Why it matters: the merge is lossy — api_map_fp_cmp collapses OEQ/UEQ → one
CMP_EQ (value.c), so the public ordered/unordered FP-compare distinction cannot reach a
backend. Fixing that is the real correctness win; the binop/unop dedup is consistency.
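
To make the lossy case concrete, here is a minimal standalone sketch. The enum and function names (FpCmp, MergedCmp, eval_fp_cmp, map_lossy, eval_merged) are hypothetical stand-ins, not the cg.h or cgtarget.h declarations. It shows that ordered-equal and unordered-equal differ only when an operand is NaN, and that once both map to a single merged EQ a backend has to pick one meaning.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a split public FP compare enum and a merged internal one. */
typedef enum { FP_OEQ, FP_UEQ } FpCmp;   /* ordered-equal, unordered-equal */
typedef enum { M_EQ } MergedCmp;         /* the merged enum has only one EQ */

/* Reference semantics: ordered compares are false if either operand is NaN,
 * unordered compares are true in that case. */
static bool eval_fp_cmp(FpCmp op, double a, double b) {
    bool unordered = isnan(a) || isnan(b);
    switch (op) {
    case FP_OEQ: return !unordered && a == b;
    case FP_UEQ: return unordered || a == b;
    }
    return false;
}

/* The lossy mapping: both public predicates collapse to the same merged value. */
static MergedCmp map_lossy(FpCmp op) {
    (void)op;
    return M_EQ;
}

/* With only the merged op, the backend must choose one meaning.
 * This sketch chooses ordered: C's == is false when either operand is NaN. */
static bool eval_merged(MergedCmp op, double a, double b) {
    (void)op;
    return a == b;
}
```

For non-NaN inputs the two predicates agree, which is why the loss is easy to miss; an end-to-end test needs a NaN operand to observe it.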
Decision (#2): CgTarget consumes the public split enums directly; backends switch on
KitCgIntBinOp and KitCgFpBinOp separately. Delete BinOp/UnOp/CmpOp and
api_map_int_binop/api_map_fp_binop/api_map_int_unop/api_map_int_cmp/api_map_fp_cmp.
Why this is bigger than the atomic slice — the design to implement
Unlike Atomic/Order (a 1:1 value-preserving rename), this is a genuine split:
- Hooks split (cgtarget.h, mirrored in native_target.h if any binop/cmp is physical; check this, since binop/cmp are semantic CgTarget hooks): binop→int_binop/fp_binop, unop→int_unop/fp_unop, cmp→int_cmp/fp_cmp, cmp_branch→int_cmp_branch/fp_cmp_branch.
- IR opcodes double: the recorder (src/cg/ir.h) and opt IR (src/opt/ir.h) store the op in extra.imm/aux; a single CG_IR_BINOP/IR_BINOP can't hold an ambiguous value (KIT_CG_INT_ADD == KIT_CG_FP_ADD == 0). Either split the opcodes (CG_IR_INT_BINOP/CG_IR_FP_BINOP, …) or add an is_fp discriminator bit. Splitting the opcodes is cleaner; both touch ir_recorder.c, cg_ir_lower.c, pass_native_emit.c, ir_dump.c/ir_print.c, and every opt pass that switches on IR_BINOP/IR_CMP/IR_CMP_BRANCH (pass_combine, pass_simplify, pass_o2, pass_jump, …).
- Fold layer restructures (arith.c + value.c). api_cg_binop(BinOp)/api_cg_unop(UnOp)/api_cg_cmp(CmpOp) are the shared dispatch. Note the int/fp split is already made by TYPE (api_type_is_float), not by the enum, so splitting the dispatch is natural: the int path keeps the fold (api_try_fold_int_binop/_unop/_cmp, int-only) and the delayed forms (SV_ARITH arith, SV_CMP compare; ApiDelayedArith.bin_op/un_op, ApiDelayedCmp.op, api_make_cmp, api_materialize_cmp_to, api_invert_cmp, api_branch_if); the fp path is simpler (the f128 helper path in kit_cg_fp_binop already exists, plus the fp hook). This is the subtle, high-risk part: get the delayed-compare fusion + constant-fold right per int/fp. Track 6.2 (landed, d03eb4c) already moved these into fold.c, which should make this cleaner.
- Backends split their switches: the 3 native arches (aa64/x64/rv64 native.c; they already re-split int/fp internally), c_target/c_emit.c, wasm/emit.c, and the interpreter (interp/engine.c).
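
The encoding ambiguity behind the "IR opcodes double" point can be sketched in isolation. Everything below is a hypothetical miniature (IntBinOp, FpBinOp, IrOpcode, IrInstA, IR_IS_FP_BIT and the pack/unpack helpers are illustration-only names, and the bit position is an arbitrary assumption); it only demonstrates the two options named above, not the real recorder or opt IR layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature enums; the only property that matters is that
 * both families start at 0, so the raw values collide. */
typedef enum { INT_ADD = 0, INT_SUB = 1 } IntBinOp;
typedef enum { FP_ADD = 0, FP_SUB = 1 } FpBinOp;

/* Option A: split opcodes. The opcode names the family, the immediate
 * holds the per-family op value. */
typedef enum { IR_INT_BINOP, IR_FP_BINOP } IrOpcode;
typedef struct {
    IrOpcode opcode;
    int32_t  imm;
} IrInstA;

/* Option B: one opcode, with a discriminator bit packed next to the op value.
 * Bit 8 is an arbitrary choice for this sketch. */
#define IR_IS_FP_BIT (1 << 8)

static int32_t pack_b(bool is_fp, int op) {
    return (is_fp ? IR_IS_FP_BIT : 0) | op;
}

static bool unpack_is_fp(int32_t imm) {
    return (imm & IR_IS_FP_BIT) != 0;
}

static int unpack_op(int32_t imm) {
    return imm & 0xff;
}
```

With option A every consumer switches on the opcode and never has to remember to mask a bit; with option B the opcode count stays the same but every reader of the immediate must unpack it first.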
Method that worked for the atomic slice: delete the internal enum + change the hook
signatures, then let -Werror enumerate every cg-side site (the C frontend's cg_adapter.h
copy won't be flagged — it's a different type). Then fix per file. For the value-label
renames, sed only within src/cg|arch|opt|interp (never lang/, never src/wasm/).
Tests: test-isa/test-arch (encode/decode), test-opt, smoke; add an unordered FP
compare exercised end-to-end (the currently-lossy case) — that's the regression guard for
the real fix.
Track 3b — Bitfields as a PLACE subkind (DONE — d08e794)
LANDED on codegen-tracks-7634 (d08e794). A bitfield is now a PLACE subkind: the
bit-field rider was dropped from KitCgMemAccess, and the strict load/store carry
the bit-field geometry (storage size/offset, bit offset/width, signedness) via the
KitCgMemAccess the frontend supplies, rebuilt through bf_from_access. kit_cg_field
now pushes the record-base address as a place of the field type with no delayed.bitfield
/bitfield_lvalue rider, and the "every memop is secretly maybe-a-bitfield" branch in
kit_cg_load/_store is gone. Touched cg.h, internal.h, memory.c, value.c,
control.c, and lang/c/parse/cg_adapter.c; green on the bitfield corpus, test-cg-api, and
bootstrap. (Done as a PLACE subkind on the strict load/store, after Track 7 core.)
Track 4 — op/intrinsic taxonomy (DONE — 5e1335d + 52897e0 + 7eaf7bf9 + 15e2effc)
LANDED. FP_REM removal (5e1335d) plus 4a/4b/4c on codegen-tracks-7634:
4a. Width-by-type: collapse BSWAP16/32/64 → one BSWAP — DONE (52897e0)
Collapsed the 3 internal IntrinKind bswaps into one width-by-type BSWAP in cgtarget.h.
arith.c dropped the abi_cg_sizeof-driven size-branch; each backend now derives width from
dsts[0].type and wraps its three existing sequences under a switch(width), preserving them
verbatim. Done across aa64/x64/rv64 native.c, interp/engine.c, c_target/c_emit.c,
and wasm (arch/wasm/emit.c + internal.h). The C frontend's cg_adapter.h
INTRIN_BSWAP16/32/64 was left as-is (maps to the public single BSWAP at the call site).
Pure internal dedup — public API unchanged.
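
As an illustration of the width-by-type shape, the sketch below uses a single entry point that switches on the operand width instead of three opcodes. The function name bswap_by_width and its byte-width parameter are assumptions made for this example; the real backends derive the width from dsts[0].type and keep their own instruction sequences.

```c
#include <stdint.h>

/* Hypothetical: one byte-swap operation; the width in bytes comes from the
 * operand's type rather than from the opcode. */
static uint64_t bswap_by_width(uint64_t v, unsigned width_bytes) {
    switch (width_bytes) {
    case 2:
        /* swap the two low bytes, discard anything above 16 bits */
        return (uint16_t)(((v & 0xffu) << 8) | ((v >> 8) & 0xffu));
    case 4: {
        uint32_t x = (uint32_t)v;
        return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
               ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
    }
    case 8: {
        /* peel bytes off the low end and push them in from the low end of r,
         * so the original lowest byte ends up highest */
        uint64_t r = 0;
        for (int i = 0; i < 8; i++) {
            r = (r << 8) | (v & 0xffu);
            v >>= 8;
        }
        return r;
    }
    default:
        return v; /* unsupported width: left unchanged in this sketch */
    }
}
```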
4b. unreachable as a first-class terminator hook — DONE (7eaf7bf9)
kit_cg_unreachable now has its own CgTarget hook + its own IR op (recorder + opt) and is
no longer routed through the intrinsic hook. The 5 backends' + interp's handling, plus every
opt pass that handles terminators (pass_cfg/pass_dce/pass_ssa/pass_analysis/pass_o2/
pass_lower/pass_native_emit, cg_ir_lower, ir_dump/ir_print, check_target), were
moved onto it. (Terminators are first-class: ret, unreachable, jump, branch, computed_goto,
tail-call.)
4c. Façade intrinsics: query + implement the trivial ones — DONE (15e2effc)
Added kit_cg_target_supports_intrinsic(KitCompiler*, KitCgIntrinsic) (mirroring
_supports_call_conv/_symbol_feature) and converted the bare compiler_panic into a proper
unsupported-feature diagnostic. Implemented the single-instruction baremetal/CPU intrinsics on
the native arches (cpu_nop/cpu_yield/wfi/wfe/sev/isb/dmb/dsb/irq_*). The
test/toy/err/unsupported_* panic cases were converted into positive smoke cases (plus a new
144_intrinsic_capability_query + 145_baremetal_privileged_aa64). FMA/SYSCALL/
CORO_SWITCH still report false until implemented.
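
The query-then-diagnose pattern can be sketched as follows. This is a self-contained miniature under stated assumptions: Intrin, Target, target_supports_intrinsic, and emit_intrinsic are hypothetical names for illustration and do not reproduce the kit_cg_target_supports_intrinsic signature or the real diagnostic machinery.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical miniature of a per-target capability table. */
typedef enum {
    INTRIN_CPU_NOP,
    INTRIN_WFI,
    INTRIN_FMA,
    INTRIN_SYSCALL,
    INTRIN_COUNT
} Intrin;

typedef struct {
    bool supported[INTRIN_COUNT];
} Target;

/* The capability query: callers can ask before emitting. */
static bool target_supports_intrinsic(const Target *t, Intrin k) {
    return (unsigned)k < INTRIN_COUNT && t->supported[k];
}

/* Emission path: returns 0 on success. For an unsupported intrinsic it writes
 * a diagnostic and returns -1 instead of aborting the compiler. */
static int emit_intrinsic(const Target *t, Intrin k, char *diag, size_t diag_len) {
    if (!target_supports_intrinsic(t, k)) {
        snprintf(diag, diag_len, "unsupported intrinsic %d on this target", (int)k);
        return -1;
    }
    diag[0] = '\0';
    return 0;
}
```

A frontend that wants a fallback (for example a libcall) can call the query first; one that does not still gets a clean error rather than a panic.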
Follow-up not done here: the "keep memcpy/memset as dedicated public ops but stop
double-modeling them as a separate public intrinsic surface" cleanup was not part of this
slice — it remains an open taxonomy tidy if wanted.
Track 6 — Isolate and complete the semantic peephole (DONE — d03eb4c + b8de5c0)
The semantic layer is also a -O0 peephole optimizer — a kept feature (Principle 6).
Status: DONE. Both 6.2 (d03eb4c) and 6.3 (b8de5c0) landed.
Current state
- Live: constant folding (api_try_fold_int_binop/_unop/_cmp, in fold.c) and the SV_CMP fused-compare-into-branch path (api_make_cmp/api_materialize_cmp_to/api_branch_if).
- Live again: the SV_ARITH delayed-arith subsystem, re-enabled by b8de5c0 once Track 7 removed the EA rider it conflicted with.
- Live: scalar store-to-load forwarding (api_local_const_*).
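
A toy model may help picture how constant folding and the delayed compare interact. The sketch below is not the fold.c code: SvKind, SValue, Emitted, make_lt, branch_if, and materialize are hypothetical names, and "emission" is reduced to counters. It shows a compare of two constants folding away, a compare of two registers being held back until its consumer is known, and the branch consuming it as one fused compare-and-branch.

```c
#include <stdbool.h>

/* Hypothetical stack-value kinds for the sketch. */
typedef enum { SV_CONST, SV_REG, SV_DELAYED_CMP } SvKind;

typedef struct {
    SvKind kind;
    long imm;      /* SV_CONST: the constant value */
    int reg;       /* SV_REG: register id */
    int lhs, rhs;  /* SV_DELAYED_CMP: operand registers of a pending "lhs < rhs" */
} SValue;

/* Counters standing in for emitted instructions. */
typedef struct { int n_cmp_branch, n_setcc, n_test_branch, n_jump; } Emitted;

static SValue sv_const(long v) { return (SValue){ .kind = SV_CONST, .imm = v }; }
static SValue sv_reg(int r)    { return (SValue){ .kind = SV_REG, .reg = r }; }

/* Two constants fold immediately; otherwise nothing is emitted yet.
 * Sketch limitation: only the register/register case is modeled as delayed. */
static SValue make_lt(SValue a, SValue b) {
    if (a.kind == SV_CONST && b.kind == SV_CONST)
        return sv_const(a.imm < b.imm);
    return (SValue){ .kind = SV_DELAYED_CMP, .lhs = a.reg, .rhs = b.reg };
}

/* A branch consumes the condition in whatever form it arrives. */
static void branch_if(Emitted *e, SValue cond) {
    switch (cond.kind) {
    case SV_CONST:
        if (cond.imm) e->n_jump++;   /* always taken: plain jump; never taken: nothing */
        break;
    case SV_DELAYED_CMP:
        e->n_cmp_branch++;           /* fused compare-and-branch */
        break;
    case SV_REG:
        e->n_test_branch++;          /* test the register, then branch */
        break;
    }
}

/* Using a delayed compare as an ordinary value forces it into a register. */
static SValue materialize(Emitted *e, SValue v, int dst_reg) {
    if (v.kind == SV_DELAYED_CMP) {
        e->n_setcc++;
        return sv_reg(dst_reg);
    }
    return v;
}
```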
Action — completed
- 6.2: Extract the live peephole into src/cg/fold.c + fold.h. DONE (d03eb4c). The documented contract covers the integer fold helpers, the SV_CMP lifecycle, and const-local forwarding with its invalidation boundaries (api_local_const_memory_boundary/_control_boundary/_address_taken). The (then gated-off) SV_ARITH machinery was moved alongside it so 6.3 was a gate flip, not a code move. Op families call into fold.h; value.c keeps the stack discipline. ApiSValue's shape is settled for Track 7, and the Track 2 binop/cmp split has the fold layer isolated.
- 6.3: Re-enable delayed arith after Track 7. DONE (b8de5c0). The gate in api_can_delay_int_arith (in fold.c) was restored now that Track 7 removed the EA rider; the api_make_arith_*/api_materialize_arith_to/api_release_arith fold-chain + identity-collapse helpers compose with the place/value model. Green at -O0; bootstrap reproduces.
- Fix doc/CODEGEN.md: DONE. 6.2 introduced fold.c and marked delayed arith gated-off; 6.3 flipped that note to "live".
Track 7 — Strict place/value discipline (the centerpiece) — DONE
Status: LANDED (c338c74+8e17cb9 core; a0397c6 7.1/7.2; 6f48bfd 7.3). The public
addressing surface is the strict push_local/addr/deref/field/elem/load/store
set; the KitCgEffAddr rider is gone; every op panics on a place/value kind mismatch (no
inference at the boundary); and the place ops fold the constant offset/scale into one
OPK_INDIRECT[base + index*scale + offset] for clean memops. All frontends + emu + cg-api
tests conform; make bootstrap reproduces at -O0 AND -O1.
Refinements (now landed):
- The internal place predicate api_is_lvalue_sv is now kind-based (a0397c6): sv->lvalue && kind == SV_OPERAND && api_operand_can_address(&sv->op), replacing the old heuristic OR (the bitfield_lvalue and source_local && OPK_LOCAL terms are subsumed).
- Aggregate VALUEs are now hard-forbidden (a0397c6): api_push panics on an aggregate-typed non-place value. i128/f128 are scalars and unaffected.
- wide16 (i128/f128) now flows as a VALUE (6f48bfd): the aggregate-like special paths in memory.c/call.c collapsed (~100 lines deleted).
- Bitfields are now a PLACE subkind (Track 3b, d08e794): the bit-field rider was dropped from KitCgMemAccess; the strict load/store carry the geometry.
Remaining refinement (follow-up, non-blocking per decision #8):
- -O0 mem-op quality: the C frontend reaches non-trivial places via pcg_materialize_lv_to_ptr (int arithmetic) + deref; it could instead emit deref(offset)/elem directly so -O0 also gets the folded addressing mode (-O1's addr-fold already recovers it; decision #8 makes this non-blocking).
Original design notes (for the remaining refinements)
Decided: Model B (explicit place/value kinds); wide-16 scalars are values. (Track 3c —
the scale vs log2_scale rider mismatch — is subsumed here: the KitCgEffAddr rider is
removed entirely.)
Today the value stack carries an inferred lvalue/rvalue distinction and several ops dispatch on type + shape. Inference points to remove:
- api_is_lvalue_sv is a heuristic (value.c): ORs lvalue, bitfield_lvalue, api_operand_can_address, source_local!=NONE && OPK_LOCAL.
- kit_cg_load has ~7 behaviors, several of which don't load (memory.c).
- load/store base accepts 4 shapes ({lvalue, ptr-rvalue} × {no-index, indexed}); there is no explicit deref.
- kit_cg_index/kit_cg_field infer pointer-vs-array / record-vs-pointer.
- Aggregates are implicitly by-reference, CG decides it (call.c).
- wide16 (i128/f128) is special-cased as aggregate-like (memory.c/call.c/wide.c).
The discipline
Every stack entry is exactly one explicit, type-checked kind:
- PLACE: addressable location of a typed object (OPK_LOCAL/OPK_GLOBAL/OPK_INDIRECT(base+index*scale+off)).
- VALUE: a scalar rvalue (integers, floats, pointers, and i128/f128).
CG keeps owning layout (field offsets, element sizes, types). It stops guessing the kind or passing-mode of a stack value. Every op declares the kinds it consumes/produces and panics on mismatch.
Op signatures (strict, single-shape)
| Op | Consumes | Produces | Notes |
|---|---|---|---|
| push_local l | - | PLACE | the local's storage |
| push_int/float/null | - | VALUE | |
| push_symbol_addr s,a | - | VALUE (ptr) | |
| push_local_addr l | - | VALUE (ptr) | sugar for push_local; addr |
| addr | PLACE | VALUE (ptr) | address of the place |
| deref (NEW) | VALUE (ptr) | PLACE | the explicit ptr→place transition |
| field i | PLACE(record) | PLACE(field) | offset/type from layout; -> is deref; field |
| elem (was index) | VALUE(ptr to T) + index VALUE | PLACE(T) | *(p+i); scale=sizeof(T); arrays decay to ptr first |
| load access | PLACE | VALUE | always dereferences; no EA rider |
| store access | PLACE, VALUE | - | always dereferences |
The KitCgEffAddr rider is removed from load/store: addressing is built explicitly
by field/elem/deref and absorbed into the OPK_INDIRECT place, so the backend still
gets a single [base+index*scale+off] memop. The kept fold layer (Track 6) recovers -O0
quality (load of PLACE(local) → the local; deref of a ptr-arith chain → the indirect
place). Per decision #8 this recovery is desirable but not a gate.
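
The discipline can be pictured with a small kind-checked stack. The sketch below is a hypothetical miniature, not the KitCg API: Kind, Slot, Stack, and the op_* helpers are invented for illustration, registers are plain integers, and a kind mismatch sets a flag where the real ops panic. It shows each op consuming exactly the kinds it declares, and elem followed by field collapsing into one base + index*scale + offset place that a single load addresses.

```c
#include <stdbool.h>

/* Hypothetical miniature of a kind-checked stack; not the KitCg API. */
typedef enum { K_PLACE, K_VALUE } Kind;

typedef struct {
    Kind kind;
    int  base;    /* register id holding the base pointer (or the value itself) */
    int  index;   /* index register id, or -1 when absent */
    int  scale;   /* element size in bytes applied to index */
    long offset;  /* constant byte offset folded into the place */
} Slot;

typedef struct { Slot s[16]; int n; bool mismatch; } Stack;

static void push(Stack *st, Slot v) {
    if (st->n < 16) st->s[st->n++] = v;
    else st->mismatch = true;
}

/* Pop the top slot only if it has the expected kind; otherwise flag a mismatch.
 * A real implementation would panic here; the flag keeps the sketch testable. */
static bool pop_kind(Stack *st, Kind want, Slot *out) {
    if (st->n == 0 || st->s[st->n - 1].kind != want) {
        st->mismatch = true;
        return false;
    }
    *out = st->s[--st->n];
    return true;
}

static void push_value(Stack *st, int reg) {
    push(st, (Slot){ K_VALUE, reg, -1, 1, 0 });
}

/* deref: VALUE(ptr) -> PLACE, with an optional constant offset */
static void op_deref(Stack *st, long offset) {
    Slot p;
    if (pop_kind(st, K_VALUE, &p))
        push(st, (Slot){ K_PLACE, p.base, -1, 1, offset });
}

/* elem: VALUE(ptr) + VALUE(index) -> PLACE, scale = element size */
static void op_elem(Stack *st, int elem_size) {
    Slot idx, p;
    if (pop_kind(st, K_VALUE, &idx) && pop_kind(st, K_VALUE, &p))
        push(st, (Slot){ K_PLACE, p.base, idx.base, elem_size, 0 });
}

/* field: PLACE(record) -> PLACE(field); the layout offset is folded in */
static void op_field(Stack *st, long field_offset) {
    Slot r;
    if (pop_kind(st, K_PLACE, &r)) {
        r.offset += field_offset;
        push(st, r);
    }
}

/* load: PLACE -> VALUE; returns the place the single memop would address */
static Slot op_load(Stack *st, int dst_reg) {
    Slot pl = { K_PLACE, -1, -1, 1, 0 };
    if (pop_kind(st, K_PLACE, &pl))
        push_value(st, dst_reg);
    return pl;
}
```

In this model, loading p[i].f with 8-byte elements and a field at offset 4 pushes two VALUEs, applies elem(8) and field(4), and loads from one place; calling load on a VALUE is rejected rather than guessed at.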
Aggregates (values forbidden)
An aggregate is always a PLACE; a VALUE of aggregate type is illegal (panic). Copies are
explicit (memcpy between places, or field-by-field). Call args/returns of aggregate type
pass an explicit place, mode named via existing ABI attrs (SRET/BYVAL/BYREF). Removes
the aggregate branches in api_materialize_call_local, api_push_call_result, and the
aggregate ret path.
wide16 (scalar values)
i128/f128 are VALUES; the backend lowers 16-byte storage/moves. The wide16 special paths
in memory.c/call.c/wide.c collapse into the value path (plus backend 16-byte value-move
support where missing).
Affected: cg.h (new deref, elem rename, EA rider removed, ApiSValue kind tag),
value.c, memory.c, control.c (index→elem, field), call.c, wide.c, every
frontend (insert explicit deref/array-decay; mark aggregate passing modes). Backends
mostly unaffected (they already consume OPK_INDIRECT). Tests: highest blast radius —
red-green per op on the toy corpus + C frontend; -O0 quality is not a gate (decision #8).
Recommended sequencing (remaining)
Most of the original sequence is landed: 6.2 (d03eb4c), Track 7 core + 7.1/7.2/7.3
(c338c74/8e17cb9/a0397c6/6f48bfd), 6.3 (b8de5c0), Track 3b (d08e794), and
Track 4 4a/4b/4c (52897e0/7eaf7bf9/15e2effc) are all done. What's left:
- Track 2 binop/cmp split — the largest remaining mechanical change; cleaner now that the fold layer is isolated (6.2). Also fixes the lossy FP compare.
- Track 1c completeness audit + tests (small, no behavior change).
- Track 5 follow-up: true multi-value at -O1 (opt cg_ir_lower) + wasm, if wanted.
- Track 4 taxonomy tidy (optional): stop double-modeling memcpy/memset as a separate public intrinsic surface (kept as dedicated public ops).
Track 2 is independent of everything still open; the fold-layer isolation (6.2) already helps it.
Decisions still governing remaining work
- Decision #2, op enums: one public vocabulary, int/fp split. CgTarget consumes the public split enums; delete internal BinOp/UnOp/CmpOp + their api_map_*. (Atomic/Order/AsmDir already done.) This still governs the open Track 2 binop/cmp split.
(Decisions 1, 3, 4, 5, 6, 7, 8 are realized: 1 = peephole kept + re-enabled under Track 6
(b8de5c0); 3 = supports_intrinsic + diagnostic + CPU intrinsics landed (15e2effc),
FP_REM removed; 4 = ret_void removed; 5 = NONTEMPORAL/INVARIANT/alias scopes removed;
6 = elem is a pointer VALUE + explicit array-decay (Track 7 core); 7 = bitfields are a
PLACE subkind (d08e794); 8 = -O0 quality was not a gate for Track 7, and Track 6.3
restored the peephole.)