kit

kit
git clone https://git.ryansepassi.com/git/kit.git
Log | Files | Refs | README

Codegen Interface Cleanup — Roadmap (remaining work)

Status: mostly executed. The independent, lower-risk tracks landed first; the high-blast-radius work has since followed. The PLACE/VALUE centerpiece (Track 7, with Track 3b folded in) is complete — strict addressing, the explicit place predicate, forbidden aggregate VALUEs, i128/f128 flowing as VALUEs, and bitfields as a PLACE subkind are all landed. The op/intrinsic taxonomy (Track 4) is complete. The fold-layer isolation + delayed-arith re-enable (Track 6) is complete. What remains is the binop/cmp op-split (the rest of Track 2), the Track 1c completeness audit, and the Track 5 multi-value follow-up.

✓ RESOLVED — i128 -O1 regression fixed. test-parse i128_06_shifts_bitwise had crashed (SIGSEGV/SIGABRT) at -O1 on the native/link paths. Root cause: Track 7.3 made i128/f128 flow as scalar VALUEs, but api_make_wide16_int_const (src/cg/wide.c) still returned an lvalue/PLACE, so an i128 constant entered the stack as a place; the O1 ABI lowering then passed it by-reference and dereferenced a value slot as a (mistyped 32-bit) pointer → null deref on i128→_Bool compares. Fix: return a VALUE (api_make_sv), matching api_push_call_result's i128 representation. Now full test-parse is 3784/0 and bootstrap reproduces at -O0 AND -O1. Lesson: codegen gates must include test-parse — bootstrap reproducing ≠ correctness.

Forward-looking companion to the canonical design in doc/CODEGEN.md. Goal: make the KitCg public API and the internal CgTarget contract carry one clear representation per concept, with no advertise-but-ignore surface and no façade. Breaking and sweeping changes are in scope; reducing churn is not a priority.

The centerpiece was Track 7 — a strict PLACE/VALUE stack discipline that ends CG's inference of what a stack slot means. It is now landed (Model B; see §Track 7).

Scope

Two stacked interfaces (see doc/CODEGEN.md §The two boundaries):

Between them sits the translation layer (src/cg/value.c, arith.c, memory.c, control.c, call.c), which also performs -O0 constant folding and compare fusion.

Principles we are enforcing (unchanged)

  1. One representation per concept. No concept in two structs/enums hand-kept in sync.
  2. No advertise-but-ignore. A public field/flag is honored or it does not exist.
  3. No façade. A public enumerator that always panics is a bug — implement it, remove it, or gate it behind a capability query + clean diagnostic.
  4. Width belongs to the type, not the opcode. bswap is one operation, not three.
  5. Ops vs intrinsics has a stated rule (§Track 4) and both layers obey it.
  6. The semantic layer may peephole, but that responsibility is named and isolated. The vstack peephole is a kept feature (free -O0 perf), not removed.
  7. Completeness over minimalism. Keep an op/enumerator with a distinct, sensible meaning that completes an orthogonal set — even with no caller. Remove only the redundant: two spellings of one behavior.

Done (committed, all green: lib · toy 1344/0 · cg-api · smoke x64/rv64 · opt · isa · libc; make bootstrap reproduces at -O0 AND -O1)

Commit Track Summary
e27a288 1a / 1d Removed SCOPE_IF / CGScopeDesc.cond / the scope_else hook (both IR opcodes CG_IR_SCOPE_ELSE + IR_SCOPE_ELSE, all 5 realizations, the desc.cond opt walkers; ~22 files). Removed KIT_CG_TAIL_NEVER (redundant with DEFAULT).
ae8d0f6 5 Multi-result public API: KitCgFuncSig.results[]/nresults (+ KitCgFuncResult), kit_cg_type_func_nresults/_result, kit_cg_ret_void removed (void = 0-result kit_cg_ret). Type system stores results[]; kit_cg_call pushes/kit_cg_ret pops in declaration order (last result on TOS). Includes a self-host regression fix: a no-value return on a non-void function (UB fall-off) now emits kit_cg_unreachable instead of underflowing the value stack (pcg_ret in lang/c/parse/cg_adapter.c).
fabf255 3a Dropped KIT_CG_MEM_NONTEMPORAL/_INVARIANT + KitCgMemAccess.alias_scope/noalias_scope (decision #5) and the matching toy attributes.
5e1335d 4 (FP_REM) Removed the KIT_CG_FP_REM façade (always-panic; only dead callers). FP remainder is a libcall the frontend emits.
917ffe9 2 (AsmDir) Deleted internal AsmDir + api_map_asm_dir; AsmConstraint.dir and backends use public KitCgAsmDir.
a2f6367 2 (Atomic/Order) Deleted internal AtomicOp/MemOrder + api_map_atomic_op/api_map_mem_order; both the semantic CgTarget and physical NativeTarget atomic hooks, the recorder+opt IR aux, and the interpreter now carry public KitCgAtomicOp/KitCgMemOrder.
d03eb4c 6.2 Isolated the -O0 semantic peephole into src/cg/fold.{c,h}: integer constant folding, the SV_CMP delayed-compare lifecycle, the (gated-off) SV_ARITH delayed-arith lifecycle, and const-local store-to-load forwarding with its invalidation boundaries. fold.h is the documented contract, re-exported via internal.h; value.c keeps stack discipline, api_lvalue_addr, and the enum-mapping helpers. Pure relocation, no behavior change. doc/CODEGEN.md updated.
c338c74+8e17cb9 7 (core) Strict PLACE/VALUE addressing. Removed KitCgEffAddr from load/store (they consume a PLACE); added deref(offset) (VALUE ptr→PLACE), renamed indexelem (VALUE ptr + index→PLACE, scale=sizeof(T)), kept field(i)/addr. Each op panics on kind mismatch — no place/value inference. The place ops fold the constant offset (deref/field) and scale (elem→log2_scale) into one OPK_INDIRECT[base + index*scale + offset], so the backend still gets a single addressing-mode memop (base/index dynamic, scale/offset folded). All three frontends + emu + cg-api tests conformed (explicit deref/decay/field). cg.h documents the kinds + per-op contracts. Green: toy 1344/0, cg-api, opt (incl tiny-inline), smoke, libc, isa/link/elf, and make bootstrap reproduces byte-identical at -O0 AND -O1.
a0397c6 7.1 / 7.2 Explicit PLACE predicate + forbid aggregate VALUEs. api_is_lvalue_sv is now a kind-based predicate — sv->lvalue && kind == SV_OPERAND && api_operand_can_address(&sv->op) — replacing the old heuristic OR (the bitfield_lvalue and source_local && OPK_LOCAL terms are subsumed; SV_CMP/SV_ARITH never carry lvalue=1). api_push now panics if an aggregate-typed value enters the stack as a non-place (aggregates are always PLACEs; i128/f128 are scalars and unaffected).
6f48bfd 7.3 Flow i128/f128 as VALUEs, collapse the wide16 special paths in memory.c/call.c (~100 lines deleted). The 16-byte scalars now ride the value path; the aggregate-like special-casing is gone.
d08e794 3b Bitfield as a PLACE subkind, single representation. Dropped the bit-field rider on KitCgMemAccess; the strict load/store carry the bit-field geometry via the KitCgMemAccess the frontend supplies (rebuilt through bf_from_access), and kit_cg_field pushes the record-base address as a place of the field type with no delayed.bitfield/bitfield_lvalue rider. Removes the "every memop is secretly maybe-a-bitfield" branch.
b8de5c0 6.3 Re-enabled the SV_ARITH delayed-arith -O0 peephole (gate flip in fold.c, now that Track 7 removed the EA rider it conflicted with). doc/CODEGEN.md note flipped from gated-off to live.
52897e0 4a Collapsed INTRIN_BSWAP16/32/64 into one width-by-type BSWAP (cgtarget.h). arith.c drops the size-branch; each backend (aa64/x64/rv64 native, interp, c_target, wasm) derives width from dsts[0].type under a switch(width), preserving the existing sequences. Pure internal dedup; public API unchanged.
7eaf7bf9 4b unreachable is now a first-class terminator hook with its own CgTarget hook + IR op (recorder + opt), not routed through the intrinsic path. The 5 backends + interp + every opt pass that handles terminators (CFG/DCE/SSA/native-emit/…) handle it directly.
15e2effc 4c kit_cg_target_supports_intrinsic query + a real unsupported-feature diagnostic (replacing the bare compiler_panic); implemented the single-instruction baremetal/CPU intrinsics (cpu_nop/yield/wfi/wfe/sev/isb/dmb/dsb/irq_*) on the native arches. Converted the test/toy/err/unsupported_* panic cases into positive smoke cases + added the capability-query test. FMA/SYSCALL/CORO_SWITCH still report false.

So Tracks 1a/1d, 5, 3a, 3b, 6, 7 are done; Track 4 is done (FP_REM + 4a/4b/4c); Track 2 is 2/3 done (the 3 identical enums; the binop/cmp split remains).

Caveats / follow-ups discovered while doing the above


Track 1c — Conditional control ops: completeness audit + tests (REMAINING, small)

KEEP the full {break, continue} × {unconditional, _true, _false} set (Principle 7; break_true/continue_true/continue_false have 0 callers, break_false 1, but the set is the structured analog of branch_true/branch_false). Remaining work is test coverage + an audit, not code change:


Track 2 (remaining) — Split the merged BinOp/UnOp/CmpOp

The 3 identical enums (Atomic/Order/AsmDir) are done. What remains is the split→merged trio, which is the structural core of Track 2 (the largest remaining mechanical change):

Public (cg.h) Internal (cgtarget.h) Relationship
KitCgIntBinOp(13) + KitCgFpBinOp(4) BinOp split→merged
KitCgIntCmpOp(10) + KitCgFpCmpOp(12) CmpOp(14) split→merged, lossy
KitCgIntUnOp(3) + KitCgFpUnOp(1) UnOp split→merged

Why it matters: the merge is lossyapi_map_fp_cmp collapses OEQ/UEQ → one CMP_EQ (value.c), so the public ordered/unordered FP-compare distinction cannot reach a backend. Fixing that is the real correctness win; the binop/unop dedup is consistency.

Decision (#2): CgTarget consumes the public split enums directly; backends switch on KitCgIntBinOp and KitCgFpBinOp separately. Delete BinOp/UnOp/CmpOp and api_map_int_binop/api_map_fp_binop/api_map_int_unop/api_map_int_cmp/api_map_fp_cmp.

Why this is bigger than the atomic slice — the design to implement

Unlike Atomic/Order (a 1:1 value-preserving rename), this is a genuine split:

  1. Hooks split (cgtarget.h, mirrored in native_target.h if any binop/cmp is physical — check; binop/cmp are semantic CgTarget hooks): binopint_binop/fp_binop, unopint_unop/fp_unop, cmpint_cmp/fp_cmp, cmp_branchint_cmp_branch/fp_cmp_branch.
  2. IR opcodes double — the recorder (src/cg/ir.h) and opt IR (src/opt/ir.h) store the op in extra.imm/aux; a single CG_IR_BINOP/IR_BINOP can't hold an ambiguous value (KIT_CG_INT_ADD == KIT_CG_FP_ADD == 0). Either split the opcodes (CG_IR_INT_BINOP/CG_IR_FP_BINOP, …) or add an is_fp discriminator bit. Splitting the opcodes is cleaner; both touch ir_recorder.c, cg_ir_lower.c, pass_native_emit.c, ir_dump.c/ir_print.c, and every opt pass that switches on IR_BINOP/IR_CMP/ IR_CMP_BRANCH (pass_combine, pass_simplify, pass_o2, pass_jump, …).
  3. Fold layer restructures (arith.c + value.c). api_cg_binop(BinOp) / api_cg_unop(UnOp) / api_cg_cmp(CmpOp) are the shared dispatch. Note the int/fp split is already made by TYPE (api_type_is_float), not by the enum — so splitting the dispatch is natural: the int path keeps the fold (api_try_fold_int_binop/_unop/_cmp, int-only) and the delayed forms (SV_ARITH arith, SV_CMP compare; ApiDelayedArith.bin_op/ un_op, ApiDelayedCmp.op, api_make_cmp, api_materialize_cmp_to, api_invert_cmp, api_branch_if); the fp path is simpler (the f128 helper path in kit_cg_fp_binop already exists, plus the fp hook). This is the subtle, high-risk part — get the delayed-compare fusion + constant-fold right per int/fp. Coordinate with Track 6.2 (which moves these into fold.c); doing 6.2 first may make this cleaner.
  4. Backends split their switches: the 3 native arches (aa64/x64/rv64 native.c — they already re-split int/fp internally), c_target/c_emit.c, wasm/emit.c, and the interpreter (interp/engine.c).

Method that worked for the atomic slice: delete the internal enum + change the hook signatures, then let -Werror enumerate every cg-side site (the C frontend's cg_adapter.h copy won't be flagged — it's a different type). Then fix per file. For the value-label renames, sed only within src/cg|arch|opt|interp (never lang/, never src/wasm/).

Tests: test-isa/test-arch (encode/decode), test-opt, smoke; add an unordered FP compare exercised end-to-end (the currently-lossy case) — that's the regression guard for the real fix.


Track 3b — Bitfields as a PLACE subkind (DONE — d08e794)

LANDED on codegen-tracks-7634 (d08e794). A bitfield is now a PLACE subkind: the bit-field rider was dropped from KitCgMemAccess, and the strict load/store carry the bit-field geometry (storage size/offset, bit offset/width, signedness) via the KitCgMemAccess the frontend supplies, rebuilt through bf_from_access. kit_cg_field now pushes the record-base address as a place of the field type with no delayed.bitfield /bitfield_lvalue rider, and the "every memop is secretly maybe-a-bitfield" branch in kit_cg_load/_store is gone. Touched cg.h, internal.h, memory.c, value.c, control.c, and lang/c/parse/cg_adapter.c; green on the bitfield corpus + test-cg-api


Track 4 — op/intrinsic taxonomy (DONE — 5e1335d + 52897e0 + 7eaf7bf9 + 15e2effc)

LANDED. FP_REM removal (5e1335d) plus 4a/4b/4c on codegen-tracks-7634:

4a. Width-by-type: collapse BSWAP16/32/64 → one BSWAP — DONE (52897e0)

Collapsed the 3 internal IntrinKind bswaps into one width-by-type BSWAP in cgtarget.h. arith.c dropped the abi_cg_sizeof-driven size-branch; each backend now derives width from dsts[0].type and wraps its three existing sequences under a switch(width), preserving them verbatim. Done across aa64/x64/rv64 native.c, interp/engine.c, c_target/c_emit.c, and wasm (arch/wasm/emit.c + internal.h). The C frontend's cg_adapter.h INTRIN_BSWAP16/32/64 was left as-is (maps to the public single BSWAP at the call site). Pure internal dedup — public API unchanged.

4b. unreachable as a first-class terminator hook — DONE (7eaf7bf9)

kit_cg_unreachable now has its own CgTarget hook + its own IR op (recorder + opt) and is no longer routed through the intrinsic hook. The 5 backends' + interp's handling, plus every opt pass that handles terminators (pass_cfg/pass_dce/pass_ssa/pass_analysis/pass_o2/ pass_lower/pass_native_emit, cg_ir_lower, ir_dump/ir_print, check_target), were moved onto it. (Terminators are first-class: ret, unreachable, jump, branch, computed_goto, tail-call.)

4c. Façade intrinsics: query + implement the trivial ones — DONE (15e2effc)

Added kit_cg_target_supports_intrinsic(KitCompiler*, KitCgIntrinsic) (mirroring _supports_call_conv/_symbol_feature) and converted the bare compiler_panic into a proper unsupported-feature diagnostic. Implemented the single-instruction baremetal/CPU intrinsics on the native arches (cpu_nop/cpu_yield/wfi/wfe/sev/isb/dmb/dsb/irq_*). The test/toy/err/unsupported_* panic cases were converted into positive smoke cases (plus a new 144_intrinsic_capability_query + 145_baremetal_privileged_aa64). FMA/SYSCALL/ CORO_SWITCH still report false until implemented.

Follow-up not done here: the "keep memcpy/memset as dedicated public ops but stop double-modeling them as a separate public intrinsic surface" cleanup was not part of this slice — it remains an open taxonomy tidy if wanted.


Track 6 — Isolate and complete the semantic peephole (DONE — d03eb4c + b8de5c0)

The semantic layer is also a -O0 peephole optimizer — a kept feature (Principle 6).

Status: DONE. Both 6.2 (d03eb4c) and 6.3 (b8de5c0) landed.

Current state

Action — completed

  1. 6.2 — Extract the live peephole into src/cg/fold.c + fold.hDONE (d03eb4c). The documented contract covers the integer fold helpers, the SV_CMP lifecycle, and const-local forwarding with its invalidation boundaries (api_local_const_memory_boundary/_control_boundary/_address_taken). The (then gated-off) SV_ARITH machinery was moved alongside it so 6.3 was a gate flip, not a code move. Op families call into fold.h; value.c keeps the stack discipline. ApiSValue's shape is settled for Track 7, and the Track 2 binop/cmp split has the fold layer isolated.
  2. 6.3 — Re-enable delayed arith after Track 7DONE (b8de5c0). The gate in api_can_delay_int_arith (in fold.c) was restored now that Track 7 removed the EA rider; the api_make_arith_*/api_materialize_arith_to/api_release_arith fold-chain + identity-collapse helpers compose with the place/value model. Green at -O0; bootstrap reproduces.
  3. Fix doc/CODEGEN.mdDONE. 6.2 introduced fold.c and marked delayed arith gated-off; 6.3 flipped that note to "live".

Track 7 — Strict place/value discipline (the centerpiece) — DONE

Status: LANDED (c338c74+8e17cb9 core; a0397c6 7.1/7.2; 6f48bfd 7.3). The public addressing surface is the strict push_local/addr/deref/field/elem/load/store set; the KitCgEffAddr rider is gone; every op panics on a place/value kind mismatch (no inference at the boundary); and the place ops fold the constant offset/scale into one OPK_INDIRECT[base + index*scale + offset] for clean memops. All frontends + emu + cg-api tests conform; make bootstrap reproduces at -O0 AND -O1.

Refinements (now landed):

Remaining refinement (follow-up, non-blocking per decision #8):

Original design notes (for the remaining refinements)

Decided: Model B (explicit place/value kinds); wide-16 scalars are values. (Track 3c — the scale vs log2_scale rider mismatch — is subsumed here: the KitCgEffAddr rider is removed entirely.)

Today the value stack carries an inferred lvalue/rvalue distinction and several ops dispatch on type + shape. Inference points to remove:

The discipline

Every stack entry is exactly one explicit, type-checked kind:

CG keeps owning layout (field offsets, element sizes, types). It stops guessing the kind or passing-mode of a stack value. Every op declares the kinds it consumes/produces and panics on mismatch.

Op signatures (strict, single-shape)

Op Consumes Produces Notes
push_local l PLACE the local's storage
push_int/float/null VALUE
push_symbol_addr s,a VALUE (ptr)
push_local_addr l VALUE (ptr) sugar for push_local; addr
addr PLACE VALUE (ptr) address of the place
deref (NEW) VALUE (ptr) PLACE the explicit ptr→place transition
field i PLACE(record) PLACE(field) offset/type from layout; -> is deref; field
elem (was index) VALUE(ptr to T) + index VALUE PLACE(T) *(p+i); scale=sizeof(T); arrays decay to ptr first
load access PLACE VALUE always dereferences; no EA rider
store access PLACE, VALUE always dereferences

The KitCgEffAddr rider is removed from load/store: addressing is built explicitly by field/elem/deref and absorbed into the OPK_INDIRECT place, so the backend still gets a single [base+index*scale+off] memop. The kept fold layer (Track 6) recovers -O0 quality (load of PLACE(local) → the local; deref of a ptr-arith chain → the indirect place). Per decision #8 this recovery is desirable but not a gate.

Aggregates (values forbidden)

An aggregate is always a PLACE; a VALUE of aggregate type is illegal (panic). Copies are explicit (memcpy between places, or field-by-field). Call args/returns of aggregate type pass an explicit place, mode named via existing ABI attrs (SRET/BYVAL/BYREF). Removes the aggregate branches in api_materialize_call_local, api_push_call_result, and the aggregate ret path.

wide16 (scalar values)

i128/f128 are VALUES; the backend lowers 16-byte storage/moves. The wide16 special paths in memory.c/call.c/wide.c collapse into the value path (plus backend 16-byte value-move support where missing).

Affected: cg.h (new deref, elem rename, EA rider removed, ApiSValue kind tag), value.c, memory.c, control.c (indexelem, field), call.c, wide.c, every frontend (insert explicit deref/array-decay; mark aggregate passing modes). Backends mostly unaffected (they already consume OPK_INDIRECT). Tests: highest blast radius — red-green per op on the toy corpus + C frontend; -O0 quality is not a gate (decision #8).


Recommended sequencing (remaining)

Most of the original sequence is landed: 6.2 (d03eb4c), Track 7 core + 7.1/7.2/7.3 (c338c74/8e17cb9/a0397c6/6f48bfd), 6.3 (b8de5c0), Track 3b (d08e794), and Track 4 4a/4b/4c (52897e0/7eaf7bf9/15e2effc) are all done. What's left:

  1. Track 2 binop/cmp split — the largest remaining mechanical change; cleaner now that the fold layer is isolated (6.2). Also fixes the lossy FP compare.
  2. Track 1c completeness audit + tests (small, no behavior change).
  3. Track 5 follow-up — true multi-value at -O1 (opt cg_ir_lower) + wasm, if wanted.
  4. Track 4 taxonomy tidy (optional) — stop double-modeling memcpy/memset as a separate public intrinsic surface (kept as dedicated public ops).

Track 2 is independent of everything still open; the fold-layer isolation (6.2) already helps it.

Decisions still governing remaining work

  1. Op enums: one public vocabulary, int/fp split. CgTarget consumes the public split enums; delete internal BinOp/UnOp/CmpOp + their api_map_*. (Atomic/Order/AsmDir already done.) — still governs the open Track 2 binop/cmp split.

(Decisions 1, 3, 4, 5, 6, 7, 8 are realized: 1 = peephole kept + re-enabled under Track 6 (b8de5c0); 3 = supports_intrinsic + diagnostic + CPU intrinsics landed (15e2effc), FP_REM removed; 4 = ret_void removed; 5 = NONTEMPORAL/INVARIANT/alias scopes removed; 6 = elem is a pointer VALUE + explicit array-decay (Track 7 core); 7 = bitfields are a PLACE subkind (d08e794); 8 = -O0 quality was not a gate for Track 7, and Track 6.3 restored the peephole.)