kit

git clone https://git.ryansepassi.com/git/kit.git

commit 32b0908794ee78d4cfd76dea8a2856b6c45d6fe4
parent f5a4f04e3bf6b4fd9bd8929bfc657c3bf7be9e28
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon,  8 Jun 2026 13:13:46 -0700

docs: close out and remove the CODEGEN plan doc

The CG API interface-cleanup roadmap is closed: PLACE/VALUE (Track 7),
op/intrinsic taxonomy (Track 4), atomic/order/AsmDir + fold isolation (Tracks
2-partial/6), and bitfields-as-place (3b) had landed; this pass removed
multi-value from the API (single result), added the conditional control-op tests
+ block-continue guard (Track 1c), and dropped the dead INTRIN_MEMCPY/MEMSET
(Track 4 tidy).

Track 2's remaining binop/cmp internal-enum dedup is intentionally not done: its
only behavioral win (lossy ordered/unordered FP compare) already landed, and the
merged internal BinOp/CmpOp/UnOp + the single isolated api_map_* is a defensible
normalized internal vocabulary woven through the i128/wide64/fold/delayed-arith
lowering. Remaining = zero-behavior churn, so the doc is retired rather than kept
open for it.
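
For readers unfamiliar with the ordered/unordered distinction mentioned above, here is a minimal sketch in plain C. The helper names (`fp_cmp_oeq`, `fp_cmp_ueq`, `fp_cmp_merged_eq`) are hypothetical and are not kit APIs. The sketch only illustrates why a single merged "equal" op cannot express both predicates once a NaN is involved.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helpers for illustration; not part of kit. */

/* Ordered equal: true only when neither operand is NaN and a == b. */
static bool fp_cmp_oeq(double a, double b) {
    return !isunordered(a, b) && a == b;
}

/* Unordered equal: true when either operand is NaN, or when a == b. */
static bool fp_cmp_ueq(double a, double b) {
    return isunordered(a, b) || a == b;
}

/* A merged "EQ" that drops the ordered/unordered bit can model only one of
 * the two predicates. This one behaves like the ordered form. */
static bool fp_cmp_merged_eq(double a, double b) {
    return a == b;
}

/* Returns true when the merged op disagrees with the unordered predicate
 * on a NaN input, i.e. when the merge loses information. */
static bool merged_eq_is_lossy(void) {
    double n = NAN;
    /* On ordinary numbers all three predicates agree. */
    assert(fp_cmp_oeq(1.0, 1.0) && fp_cmp_ueq(1.0, 1.0) && fp_cmp_merged_eq(1.0, 1.0));
    return fp_cmp_ueq(n, 1.0) != fp_cmp_merged_eq(n, 1.0);
}
```

The two predicates differ only on NaN inputs, which is why the loss is easy to miss in tests that never feed a NaN through a compare.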

Diffstat:
D doc/plan/CODEGEN.md | 372 -------------------------------------------------------------------------------
M doc/plan/README.md  |   1 -
2 files changed, 0 insertions(+), 373 deletions(-)

diff --git a/doc/plan/CODEGEN.md b/doc/plan/CODEGEN.md @@ -1,372 +0,0 @@ -# Codegen Interface Cleanup — Roadmap (remaining work) - -Status: **mostly executed.** The independent, lower-risk tracks landed first; the -high-blast-radius work has since followed. The **PLACE/VALUE centerpiece** (Track 7, -with Track 3b folded in) is **complete** — strict addressing, the explicit place -predicate, forbidden aggregate VALUEs, i128/f128 flowing as VALUEs, and bitfields as -a PLACE subkind are all landed. The **op/intrinsic taxonomy** (Track 4) is **complete**. -The **fold-layer isolation + delayed-arith re-enable** (Track 6) is **complete**. What -remains is the **binop/cmp op-split** (the rest of Track 2), the Track 1c completeness -audit, and the Track 5 multi-value follow-up. - -> **✓ RESOLVED — i128 -O1 regression fixed.** `test-parse i128_06_shifts_bitwise` had -> crashed (SIGSEGV/SIGABRT) at -O1 on the native/link paths. Root cause: Track 7.3 made -> i128/f128 flow as scalar VALUEs, but `api_make_wide16_int_const` (`src/cg/wide.c`) -> still returned an **lvalue/PLACE**, so an i128 constant entered the stack as a place; -> the O1 ABI lowering then passed it by-reference and dereferenced a value slot as a -> (mistyped 32-bit) pointer → null deref on i128→`_Bool` compares. Fix: return a VALUE -> (`api_make_sv`), matching `api_push_call_result`'s i128 representation. Now full -> `test-parse` is 3784/0 and bootstrap reproduces at -O0 AND -O1. **Lesson: codegen -> gates must include `test-parse` — bootstrap reproducing ≠ correctness.** - -Forward-looking companion to the canonical design in [doc/CODEGEN.md](../CODEGEN.md). Goal: -make the `KitCg` public API and the internal `CgTarget` contract carry **one clear -representation per concept, with no advertise-but-ignore surface and no façade**. Breaking -and sweeping changes are in scope; reducing churn is *not* a priority. 
- -The centerpiece was **Track 7** — a strict PLACE/VALUE stack discipline that ends CG's -inference of what a stack slot *means*. It is now landed (Model B; see §Track 7). - -## Scope - -Two stacked interfaces (see [doc/CODEGEN.md §The two boundaries](../CODEGEN.md)): - -- **Public** `kit_cg_*` / `KitCg` (`include/kit/cg.h`) — a value-stack machine. -- **Internal** `CgTarget` (`src/cg/cgtarget.h`) — a three-address operand vtable. NOTE the - op enums also flow into the **physical** `NativeTarget` (`src/arch/native_target.h`, - which `#include`s `cgtarget.h`), the recorder IR (`src/cg/ir.h`) and opt IR - (`src/opt/ir.h`), and the interpreter (`src/interp/`). Any enum change touches all of - these layers, not just the semantic vtable. - -Between them sits the translation layer (`src/cg/value.c`, `arith.c`, `memory.c`, -`control.c`, `call.c`), which also performs `-O0` constant folding and compare fusion. - -## Principles we are enforcing (unchanged) - -1. **One representation per concept.** No concept in two structs/enums hand-kept in sync. -2. **No advertise-but-ignore.** A public field/flag is honored or it does not exist. -3. **No façade.** A public enumerator that always panics is a bug — implement it, remove - it, or gate it behind a capability query + clean diagnostic. -4. **Width belongs to the type, not the opcode.** `bswap` is one operation, not three. -5. **Ops vs intrinsics has a stated rule** (§Track 4) and both layers obey it. -6. **The semantic layer may peephole, but that responsibility is named and isolated.** The - vstack peephole is a kept feature (free `-O0` perf), not removed. -7. **Completeness over minimalism.** Keep an op/enumerator with a distinct, sensible - meaning that completes an orthogonal set — even with no caller. Remove only the - *redundant*: two spellings of one behavior. 
- ---- - -## Done (committed, all green: lib · toy 1344/0 · cg-api · smoke x64/rv64 · opt · isa · libc; `make bootstrap` reproduces at -O0 AND -O1) - -| Commit | Track | Summary | -|---|---|---| -| `e27a288` | **1a / 1d** | Removed `SCOPE_IF` / `CGScopeDesc.cond` / the `scope_else` hook (both IR opcodes `CG_IR_SCOPE_ELSE` + `IR_SCOPE_ELSE`, all 5 realizations, the `desc.cond` opt walkers; ~22 files). Removed `KIT_CG_TAIL_NEVER` (redundant with `DEFAULT`). | -| `ae8d0f6` | **5** | Multi-result public API: `KitCgFuncSig.results[]`/`nresults` (+ `KitCgFuncResult`), `kit_cg_type_func_nresults`/`_result`, `kit_cg_ret_void` removed (void = 0-result `kit_cg_ret`). Type system stores `results[]`; `kit_cg_call` pushes/`kit_cg_ret` pops in declaration order (last result on TOS). **Includes a self-host regression fix:** a no-value return on a *non-void* function (UB fall-off) now emits `kit_cg_unreachable` instead of underflowing the value stack (`pcg_ret` in `lang/c/parse/cg_adapter.c`). | -| `fabf255` | **3a** | Dropped `KIT_CG_MEM_NONTEMPORAL`/`_INVARIANT` + `KitCgMemAccess.alias_scope`/`noalias_scope` (decision #5) and the matching toy attributes. | -| `5e1335d` | **4 (FP_REM)** | Removed the `KIT_CG_FP_REM` façade (always-panic; only dead callers). FP remainder is a libcall the frontend emits. | -| `917ffe9` | **2 (AsmDir)** | Deleted internal `AsmDir` + `api_map_asm_dir`; `AsmConstraint.dir` and backends use public `KitCgAsmDir`. | -| `a2f6367` | **2 (Atomic/Order)** | Deleted internal `AtomicOp`/`MemOrder` + `api_map_atomic_op`/`api_map_mem_order`; **both** the semantic `CgTarget` and physical `NativeTarget` atomic hooks, the recorder+opt IR aux, and the interpreter now carry public `KitCgAtomicOp`/`KitCgMemOrder`. 
| -| `d03eb4c` | **6.2** | Isolated the `-O0` semantic peephole into `src/cg/fold.{c,h}`: integer constant folding, the `SV_CMP` delayed-compare lifecycle, the (gated-off) `SV_ARITH` delayed-arith lifecycle, and const-local store-to-load forwarding with its invalidation boundaries. `fold.h` is the documented contract, re-exported via `internal.h`; `value.c` keeps stack discipline, `api_lvalue_addr`, and the enum-mapping helpers. Pure relocation, no behavior change. `doc/CODEGEN.md` updated. | -| `c338c74`+`8e17cb9` | **7 (core)** | Strict PLACE/VALUE addressing. Removed `KitCgEffAddr` from `load`/`store` (they consume a PLACE); added `deref(offset)` (VALUE ptr→PLACE), renamed `index`→`elem` (VALUE ptr + index→PLACE, scale=sizeof(T)), kept `field(i)`/`addr`. Each op **panics on kind mismatch** — no place/value inference. The place ops fold the constant offset (deref/field) and scale (elem→`log2_scale`) into one `OPK_INDIRECT[base + index*scale + offset]`, so the backend still gets a single addressing-mode memop (base/index dynamic, scale/offset folded). All three frontends + emu + cg-api tests conformed (explicit `deref`/decay/`field`). `cg.h` documents the kinds + per-op contracts. Green: toy 1344/0, cg-api, opt (incl tiny-inline), smoke, libc, isa/link/elf, and `make bootstrap` reproduces byte-identical at -O0 AND -O1. | -| `a0397c6` | **7.1 / 7.2** | Explicit PLACE predicate + forbid aggregate VALUEs. `api_is_lvalue_sv` is now a kind-based predicate — `sv->lvalue && kind == SV_OPERAND && api_operand_can_address(&sv->op)` — replacing the old heuristic OR (the `bitfield_lvalue` and `source_local && OPK_LOCAL` terms are subsumed; `SV_CMP`/`SV_ARITH` never carry `lvalue=1`). `api_push` now panics if an aggregate-typed value enters the stack as a non-place (aggregates are always PLACEs; i128/f128 are scalars and unaffected). | -| `6f48bfd` | **7.3** | Flow i128/f128 as VALUEs, collapse the wide16 special paths in `memory.c`/`call.c` (~100 lines deleted). 
The 16-byte scalars now ride the value path; the aggregate-like special-casing is gone. | -| `d08e794` | **3b** | Bitfield as a PLACE subkind, single representation. Dropped the bit-field rider on `KitCgMemAccess`; the strict `load`/`store` carry the bit-field geometry via the `KitCgMemAccess` the frontend supplies (rebuilt through `bf_from_access`), and `kit_cg_field` pushes the record-base address as a place of the field type with no `delayed.bitfield`/`bitfield_lvalue` rider. Removes the "every memop is secretly maybe-a-bitfield" branch. | -| `b8de5c0` | **6.3** | Re-enabled the `SV_ARITH` delayed-arith `-O0` peephole (gate flip in `fold.c`, now that Track 7 removed the EA rider it conflicted with). `doc/CODEGEN.md` note flipped from gated-off to live. | -| `52897e0` | **4a** | Collapsed `INTRIN_BSWAP16/32/64` into one width-by-type `BSWAP` (`cgtarget.h`). `arith.c` drops the size-branch; each backend (`aa64`/`x64`/`rv64` native, interp, c_target, wasm) derives width from `dsts[0].type` under a `switch(width)`, preserving the existing sequences. Pure internal dedup; public API unchanged. | -| `7eaf7bf9` | **4b** | `unreachable` is now a first-class terminator hook with its own `CgTarget` hook + IR op (recorder + opt), not routed through the intrinsic path. The 5 backends + interp + every opt pass that handles terminators (CFG/DCE/SSA/native-emit/…) handle it directly. | -| `15e2effc` | **4c** | `kit_cg_target_supports_intrinsic` query + a real unsupported-feature diagnostic (replacing the bare `compiler_panic`); implemented the single-instruction baremetal/CPU intrinsics (`cpu_nop`/`yield`/`wfi`/`wfe`/`sev`/`isb`/`dmb`/`dsb`/`irq_*`) on the native arches. Converted the `test/toy/err/unsupported_*` panic cases into positive smoke cases + added the capability-query test. `FMA`/`SYSCALL`/`CORO_SWITCH` still report `false`. 
| - -So **Tracks 1a/1d, 5, 3a, 3b, 6, 7 are done; Track 4 is done** (FP_REM + 4a/4b/4c); -**Track 2 is 2/3 done** (the 3 identical enums; the binop/cmp split remains). - -### Caveats / follow-ups discovered while doing the above -- **Track 5 multi-result is single-result-complete only.** The `-O0` native path handles - `nresults > 1`, but the **opt path** (`src/opt/cg_ir_lower.c`, the `CG_IR_CALL`/`CG_IR_RET` - lowering) still only threads `results[0]` — a true 2+-result function is lossy at `-O1`. - The **wasm frontend** (`lang/wasm/cg.c`) was also migrated as single-result (takes - `f->results[0]`). True multi-value end-to-end (wasm + `-O1`) is unfinished follow-up. -- **The C frontend keeps its own private copies** of `BinOp`/`AtomicOp`/`MemOrder`/ - `IntrinKind` in `lang/c/parse/cg_adapter.h`. These are a **separate Principle-1 issue**, - deliberately left alone by Track 2 (they're a different namespace; do not blind-rename - `AO_*`/`MO_*`/`BO_*` across `lang/`). Worth a follow-up to dedupe against the public enums. -- **Regression lesson** (in [[doc/plan/BOOTSTRAP.md]] / the self-build): removing a "bare - return that ignores result count" primitive means every frontend's *fall-off / default* - return must push the right number of values or terminate with `unreachable`. Audit other - frontends if you remove more return primitives. - ---- - -## Track 1c — Conditional control ops: completeness audit + tests (REMAINING, small) - -KEEP the full `{break, continue} × {unconditional, _true, _false}` set (Principle 7; -`break_true`/`continue_true`/`continue_false` have 0 callers, `break_false` 1, but the set -is the structured analog of `branch_true`/`branch_false`). Remaining work is **test -coverage + an audit**, not code change: -- Confirm `continue*` is rejected on non-loop scopes and block-vs-loop rules are uniform. 
-- Add an end-to-end test for each `break_*`/`continue_*` variant on a backend (the - result-carrying semantics are spec'd at `cg.h` `kit_cg_break_true` &c). - ---- - -## Track 2 (remaining) — Split the merged `BinOp`/`UnOp`/`CmpOp` - -The 3 *identical* enums (Atomic/Order/AsmDir) are done. What remains is the **split→merged** -trio, which is the structural core of Track 2 (the largest remaining mechanical change): - -| Public (`cg.h`) | Internal (`cgtarget.h`) | Relationship | -|---|---|---| -| `KitCgIntBinOp`(13) + `KitCgFpBinOp`(4) | `BinOp` | split→merged | -| `KitCgIntCmpOp`(10) + `KitCgFpCmpOp`(12) | `CmpOp`(14) | split→merged, **lossy** | -| `KitCgIntUnOp`(3) + `KitCgFpUnOp`(1) | `UnOp` | split→merged | - -**Why it matters:** the merge is **lossy** — `api_map_fp_cmp` collapses `OEQ`/`UEQ` → one -`CMP_EQ` (`value.c`), so the public ordered/unordered FP-compare distinction cannot reach a -backend. Fixing that is the real correctness win; the binop/unop dedup is consistency. - -**Decision (#2): `CgTarget` consumes the public split enums directly; backends switch on -`KitCgIntBinOp` and `KitCgFpBinOp` separately.** Delete `BinOp`/`UnOp`/`CmpOp` and -`api_map_int_binop`/`api_map_fp_binop`/`api_map_int_unop`/`api_map_int_cmp`/`api_map_fp_cmp`. - -### Why this is bigger than the atomic slice — the design to implement -Unlike Atomic/Order (a 1:1 value-preserving *rename*), this is a genuine **split**: - -1. **Hooks split** (`cgtarget.h`, mirrored in `native_target.h` if any binop/cmp is physical - — check; binop/cmp are semantic `CgTarget` hooks): `binop`→`int_binop`/`fp_binop`, - `unop`→`int_unop`/`fp_unop`, `cmp`→`int_cmp`/`fp_cmp`, `cmp_branch`→ - `int_cmp_branch`/`fp_cmp_branch`. -2. **IR opcodes double** — the recorder (`src/cg/ir.h`) and opt IR (`src/opt/ir.h`) store the - op in `extra.imm`/aux; a single `CG_IR_BINOP`/`IR_BINOP` can't hold an ambiguous value - (`KIT_CG_INT_ADD == KIT_CG_FP_ADD == 0`). 
Either split the opcodes - (`CG_IR_INT_BINOP`/`CG_IR_FP_BINOP`, …) **or** add an `is_fp` discriminator bit. Splitting - the opcodes is cleaner; both touch `ir_recorder.c`, `cg_ir_lower.c`, `pass_native_emit.c`, - `ir_dump.c`/`ir_print.c`, and every opt pass that switches on `IR_BINOP`/`IR_CMP`/ - `IR_CMP_BRANCH` (`pass_combine`, `pass_simplify`, `pass_o2`, `pass_jump`, …). -3. **Fold layer restructures** (`arith.c` + `value.c`). `api_cg_binop(BinOp)` / - `api_cg_unop(UnOp)` / `api_cg_cmp(CmpOp)` are the shared dispatch. Note the int/fp split - is **already made by TYPE** (`api_type_is_float`), not by the enum — so splitting the - dispatch is natural: the int path keeps the fold (`api_try_fold_int_binop`/`_unop`/`_cmp`, - int-only) and the delayed forms (`SV_ARITH` arith, `SV_CMP` compare; `ApiDelayedArith.bin_op`/ - `un_op`, `ApiDelayedCmp.op`, `api_make_cmp`, `api_materialize_cmp_to`, `api_invert_cmp`, - `api_branch_if`); the fp path is simpler (the f128 helper path in `kit_cg_fp_binop` - already exists, plus the fp hook). **This is the subtle, high-risk part** — get the - delayed-compare fusion + constant-fold right per int/fp. Coordinate with Track 6.2 (which - moves these into `fold.c`); doing 6.2 first may make this cleaner. -4. **Backends split their switches:** the 3 native arches (`aa64`/`x64`/`rv64` `native.c` — - they already re-split int/fp internally), `c_target/c_emit.c`, `wasm/emit.c`, and the - interpreter (`interp/engine.c`). - -**Method that worked for the atomic slice:** delete the internal enum + change the hook -signatures, then let `-Werror` enumerate every cg-side site (the C frontend's `cg_adapter.h` -copy won't be flagged — it's a different type). Then fix per file. For the value-label -renames, sed **only** within `src/cg|arch|opt|interp` (never `lang/`, never `src/wasm/`). 
- -**Tests:** `test-isa`/`test-arch` (encode/decode), `test-opt`, smoke; **add an unordered FP -compare exercised end-to-end** (the currently-lossy case) — that's the regression guard for -the real fix. - ---- - -## Track 3b — Bitfields as a PLACE subkind (DONE — `d08e794`) - -**LANDED on `codegen-tracks-7634`** (`d08e794`). A bitfield is now a PLACE subkind: the -bit-field rider was dropped from `KitCgMemAccess`, and the strict `load`/`store` carry -the bit-field geometry (storage size/offset, bit offset/width, signedness) via the -`KitCgMemAccess` the frontend supplies, rebuilt through `bf_from_access`. `kit_cg_field` -now pushes the record-base address as a place of the field type with no `delayed.bitfield` -/`bitfield_lvalue` rider, and the "every memop is secretly maybe-a-bitfield" branch in -`kit_cg_load`/`_store` is gone. Touched `cg.h`, `internal.h`, `memory.c`, `value.c`, -`control.c`, and `lang/c/parse/cg_adapter.c`; green on the bitfield corpus + `test-cg-api` -+ bootstrap. (Done as a PLACE subkind on the strict `load`/`store`, after Track 7 core.) - ---- - -## Track 4 — op/intrinsic taxonomy (DONE — `5e1335d` + `52897e0` + `7eaf7bf9` + `15e2effc`) - -**LANDED.** FP_REM removal (`5e1335d`) plus 4a/4b/4c on `codegen-tracks-7634`: - -### 4a. Width-by-type: collapse `BSWAP16/32/64` → one `BSWAP` — DONE (`52897e0`) -Collapsed the 3 internal `IntrinKind` bswaps into one width-by-type `BSWAP` in `cgtarget.h`. -`arith.c` dropped the `abi_cg_sizeof`-driven size-branch; each backend now derives width from -`dsts[0].type` and wraps its three existing sequences under a `switch(width)`, preserving them -verbatim. Done across `aa64`/`x64`/`rv64` `native.c`, `interp/engine.c`, `c_target/c_emit.c`, -and **wasm (`arch/wasm/emit.c` + `internal.h`)**. The C frontend's `cg_adapter.h` -`INTRIN_BSWAP16/32/64` was left as-is (maps to the public single `BSWAP` at the call site). -Pure internal dedup — public API unchanged. - -### 4b. 
`unreachable` as a first-class terminator hook — DONE (`7eaf7bf9`) -`kit_cg_unreachable` now has its own `CgTarget` hook + its own IR op (recorder + opt) and is -no longer routed through the intrinsic hook. The 5 backends' + interp's handling, plus every -opt pass that handles terminators (`pass_cfg`/`pass_dce`/`pass_ssa`/`pass_analysis`/`pass_o2`/ -`pass_lower`/`pass_native_emit`, `cg_ir_lower`, `ir_dump`/`ir_print`, `check_target`), were -moved onto it. (Terminators are first-class: ret, unreachable, jump, branch, computed_goto, -tail-call.) - -### 4c. Façade intrinsics: query + implement the trivial ones — DONE (`15e2effc`) -Added `kit_cg_target_supports_intrinsic(KitCompiler*, KitCgIntrinsic)` (mirroring -`_supports_call_conv`/`_symbol_feature`) and converted the bare `compiler_panic` into a proper -unsupported-feature diagnostic. Implemented the single-instruction baremetal/CPU intrinsics on -the native arches (`cpu_nop`/`cpu_yield`/`wfi`/`wfe`/`sev`/`isb`/`dmb`/`dsb`/`irq_*`). The -`test/toy/err/unsupported_*` panic cases were converted into positive smoke cases (plus a new -`144_intrinsic_capability_query` + `145_baremetal_privileged_aa64`). `FMA`/`SYSCALL`/ -`CORO_SWITCH` still report `false` until implemented. - -**Follow-up not done here:** the "keep `memcpy`/`memset` as dedicated public ops but stop -double-modeling them as a separate public intrinsic surface" cleanup was *not* part of this -slice — it remains an open taxonomy tidy if wanted. - ---- - -## Track 6 — Isolate and complete the semantic peephole (DONE — `d03eb4c` + `b8de5c0`) - -The semantic layer is also a `-O0` peephole optimizer — a **kept feature** (Principle 6). - -**Status: DONE.** Both 6.2 (`d03eb4c`) and 6.3 (`b8de5c0`) landed. - -### Current state -- **Live:** constant folding (`api_try_fold_int_binop`/`_unop`/`_cmp`, in `fold.c`) and - the `SV_CMP` fused-compare-into-branch path (`api_make_cmp`/`api_materialize_cmp_to`/ - `api_branch_if`). 
-- **Live again:** the `SV_ARITH` delayed-arith subsystem — re-enabled by `b8de5c0` once - Track 7 removed the EA rider it conflicted with. -- **Live:** scalar store-to-load forwarding (`api_local_const_*`). - -### Action — completed -1. **6.2 — Extract the live peephole into `src/cg/fold.c` + `fold.h`** — **DONE** (`d03eb4c`). - The documented contract covers the integer fold helpers, the `SV_CMP` lifecycle, and - const-local forwarding with its invalidation boundaries - (`api_local_const_memory_boundary`/`_control_boundary`/`_address_taken`). The (then - gated-off) `SV_ARITH` machinery was moved alongside it so 6.3 was a gate flip, not a code - move. Op families call into `fold.h`; `value.c` keeps the stack discipline. `ApiSValue`'s - shape is settled for Track 7, and the Track 2 binop/cmp split has the fold layer isolated. -2. **6.3 — Re-enable delayed arith after Track 7** — **DONE** (`b8de5c0`). The gate in - `api_can_delay_int_arith` (in `fold.c`) was restored now that Track 7 removed the EA - rider; the `api_make_arith_*`/`api_materialize_arith_to`/`api_release_arith` fold-chain + - identity-collapse helpers compose with the place/value model. Green at -O0; bootstrap - reproduces. -3. **Fix [doc/CODEGEN.md](../CODEGEN.md)** — **DONE.** 6.2 introduced `fold.c` and marked - delayed arith gated-off; 6.3 flipped that note to "live". - ---- - -## Track 7 — Strict place/value discipline (the centerpiece) — DONE - -**Status: LANDED** (`c338c74`+`8e17cb9` core; `a0397c6` 7.1/7.2; `6f48bfd` 7.3). The public -addressing surface is the strict `push_local`/`addr`/`deref`/`field`/`elem`/`load`/`store` -set; the `KitCgEffAddr` rider is gone; every op panics on a place/value kind mismatch (no -inference at the boundary); and the place ops fold the constant offset/scale into one -`OPK_INDIRECT[base + index*scale + offset]` for clean memops. All frontends + emu + cg-api -tests conform; `make bootstrap` reproduces at -O0 AND -O1. 
- -**Refinements (now landed):** -- The *internal* place predicate `api_is_lvalue_sv` is **now kind-based** (`a0397c6`): - `sv->lvalue && kind == SV_OPERAND && api_operand_can_address(&sv->op)`, replacing the old - heuristic OR (the `bitfield_lvalue` and `source_local && OPK_LOCAL` terms are subsumed). -- Aggregate **VALUE**s are **now hard-forbidden** (`a0397c6`): `api_push` panics on an - aggregate-typed non-place value. i128/f128 are scalars and unaffected. -- **wide16** (i128/f128) **now flows as a VALUE** (`6f48bfd`): the aggregate-like special - paths in `memory.c`/`call.c` collapsed (~100 lines deleted). -- **Bitfields** are **now a PLACE subkind** (Track 3b, `d08e794`): the bit-field rider was - dropped from `KitCgMemAccess`; the strict `load`/`store` carry the geometry. - -**Remaining refinement (follow-up, non-blocking per decision #8):** -- **-O0 mem-op quality**: the C frontend reaches non-trivial places via - `pcg_materialize_lv_to_ptr` (int arithmetic) + `deref`; it could instead emit - `deref(offset)`/`elem` directly so -O0 also gets the folded addressing mode (-O1's - addr-fold already recovers it; decision #8 makes this non-blocking). - -### Original design notes (for the remaining refinements) - -**Decided:** Model B (explicit place/value kinds); wide-16 scalars are *values*. (Track 3c — -the `scale` vs `log2_scale` rider mismatch — is subsumed here: the `KitCgEffAddr` rider is -removed entirely.) - -Today the value stack carries an **inferred** lvalue/rvalue distinction and several ops -dispatch on type + shape. Inference points to remove: -- **`api_is_lvalue_sv` is a heuristic** (`value.c`): ORs `lvalue`, `bitfield_lvalue`, - `api_operand_can_address`, `source_local!=NONE && OPK_LOCAL`. -- **`kit_cg_load` has ~7 behaviors, several of which don't load** (`memory.c`). -- **`load`/`store` `base` accepts 4 shapes** ({lvalue, ptr-rvalue} × {no-index, indexed}); - there is no explicit deref. 
-- **`kit_cg_index` / `kit_cg_field` infer** pointer-vs-array / record-vs-pointer. -- **Aggregates are implicitly by-reference, CG decides it** (`call.c`). -- **wide16 (i128/f128) is special-cased** as aggregate-like (`memory.c`/`call.c`/`wide.c`). - -### The discipline -Every stack entry is exactly one explicit, type-checked kind: -- **PLACE** — addressable location of a typed object (`OPK_LOCAL`/`OPK_GLOBAL`/ - `OPK_INDIRECT(base+index*scale+off)`). -- **VALUE** — a scalar rvalue: integers, floats, **pointers, and i128/f128**. - -CG keeps owning **layout** (field offsets, element sizes, types). It stops guessing the kind -or passing-mode of a stack value. Every op declares the kinds it consumes/produces and panics -on mismatch. - -### Op signatures (strict, single-shape) -| Op | Consumes | Produces | Notes | -|---|---|---|---| -| `push_local l` | — | **PLACE** | the local's storage | -| `push_int/float/null` | — | VALUE | | -| `push_symbol_addr s,a` | — | VALUE (ptr) | | -| `push_local_addr l` | — | VALUE (ptr) | sugar for `push_local; addr` | -| `addr` | PLACE | VALUE (ptr) | address of the place | -| **`deref`** (NEW) | VALUE (ptr) | PLACE | the explicit ptr→place transition | -| `field i` | PLACE(record) | PLACE(field) | offset/type from layout; `->` is `deref; field` | -| `elem` (was `index`) | VALUE(ptr to T) + index VALUE | PLACE(T) | `*(p+i)`; scale=`sizeof(T)`; arrays decay to ptr first | -| `load access` | PLACE | VALUE | always dereferences; **no EA rider** | -| `store access` | PLACE, VALUE | — | always dereferences | - -The **`KitCgEffAddr` rider is removed** from `load`/`store`: addressing is built explicitly -by `field`/`elem`/`deref` and absorbed into the `OPK_INDIRECT` place, so the backend still -gets a single `[base+index*scale+off]` memop. The kept fold layer (Track 6) recovers `-O0` -quality (`load` of `PLACE(local)` → the local; `deref` of a ptr-arith chain → the indirect -place). Per decision #8 this recovery is **desirable but not a gate**. 
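
Editorial illustration, not part of the removed file: the strict discipline in the op-signature table can be modeled with a toy stack in C. Every name below (`Slot`, `op_addr`, `op_deref`, and so on) is hypothetical and is not the `kit_cg_*` API. Where the document says an op panics on a kind mismatch, this sketch sets a `failed` flag instead so the behavior can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names are hypothetical, not the kit_cg_* API. */
typedef enum { K_PLACE, K_VALUE } Kind;

typedef struct {
    Kind kind;
    int *loc;    /* PLACE: the addressed object; pointer VALUE: the pointee */
    int  imm;    /* scalar VALUE payload */
    bool is_ptr; /* true for a pointer VALUE */
} Slot;

typedef struct { Slot s[16]; size_t n; bool failed; } Stack;

static void push(Stack *st, Slot v) { assert(st->n < 16); st->s[st->n++] = v; }
static Slot pop(Stack *st) { assert(st->n > 0); return st->s[--st->n]; }

/* push_local: produces a PLACE naming the local's storage. */
static void op_push_local(Stack *st, int *local) {
    push(st, (Slot){K_PLACE, local, 0, false});
}

/* addr: PLACE -> VALUE (ptr). Any other input kind is rejected. */
static void op_addr(Stack *st) {
    Slot p = pop(st);
    if (p.kind != K_PLACE) { st->failed = true; return; }
    push(st, (Slot){K_VALUE, p.loc, 0, true});
}

/* deref: VALUE (ptr) -> PLACE, the explicit pointer-to-place transition. */
static void op_deref(Stack *st) {
    Slot v = pop(st);
    if (v.kind != K_VALUE || !v.is_ptr) { st->failed = true; return; }
    push(st, (Slot){K_PLACE, v.loc, 0, false});
}

/* load: PLACE -> VALUE; always dereferences, never infers. */
static void op_load(Stack *st) {
    Slot p = pop(st);
    if (p.kind != K_PLACE) { st->failed = true; return; }
    push(st, (Slot){K_VALUE, NULL, *p.loc, false});
}

/* push_local; addr; deref; load yields the local's value, or -1 on mismatch. */
static int demo_roundtrip(int x) {
    Stack st = {0};
    op_push_local(&st, &x);
    op_addr(&st);
    op_deref(&st);
    op_load(&st);
    return st.failed ? -1 : pop(&st).imm;
}

/* A second load receives a VALUE where a PLACE is required and is rejected. */
static bool demo_mismatch(void) {
    int x = 7;
    Stack st = {0};
    op_push_local(&st, &x);
    op_load(&st); /* PLACE consumed, VALUE produced */
    op_load(&st); /* VALUE given to load: kind mismatch */
    return st.failed;
}
```

The point of the model is that no op inspects the slot to guess what the caller meant: each one checks a single expected kind and refuses anything else.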
- -### Aggregates (values forbidden) -An aggregate is **always a PLACE**; a VALUE of aggregate type is illegal (panic). Copies are -explicit (`memcpy` between places, or field-by-field). Call args/returns of aggregate type -pass an explicit place, mode named via existing ABI attrs (`SRET`/`BYVAL`/`BYREF`). Removes -the aggregate branches in `api_materialize_call_local`, `api_push_call_result`, and the -aggregate `ret` path. - -### wide16 (scalar values) -`i128`/`f128` are VALUES; the backend lowers 16-byte storage/moves. The wide16 special paths -in `memory.c`/`call.c`/`wide.c` collapse into the value path (plus backend 16-byte value-move -support where missing). - -**Affected:** `cg.h` (new `deref`, `elem` rename, EA rider removed, `ApiSValue` kind tag), -`value.c`, `memory.c`, `control.c` (`index`→`elem`, `field`), `call.c`, `wide.c`, **every -frontend** (insert explicit `deref`/array-decay; mark aggregate passing modes). Backends -mostly unaffected (they already consume `OPK_INDIRECT`). **Tests:** highest blast radius — -red-green per op on the toy corpus + C frontend; `-O0` quality is not a gate (decision #8). - ---- - -## Recommended sequencing (remaining) - -Most of the original sequence is landed: **6.2** (`d03eb4c`), **Track 7** core + 7.1/7.2/7.3 -(`c338c74`/`8e17cb9`/`a0397c6`/`6f48bfd`), **6.3** (`b8de5c0`), **Track 3b** (`d08e794`), and -**Track 4** 4a/4b/4c (`52897e0`/`7eaf7bf9`/`15e2effc`) are all done. What's left: - -1. **Track 2 binop/cmp split** — the largest remaining mechanical change; cleaner now that - the fold layer is isolated (6.2). Also fixes the lossy FP compare. -2. **Track 1c** completeness audit + tests (small, no behavior change). -3. **Track 5 follow-up** — true multi-value at `-O1` (opt `cg_ir_lower`) + wasm, if wanted. -4. **Track 4 taxonomy tidy** (optional) — stop double-modeling `memcpy`/`memset` as a - separate public intrinsic surface (kept as dedicated public ops). 
-
-Track 2 is independent of everything still open; the fold-layer isolation (6.2) already
-helps it.
-
-## Decisions still governing remaining work
-
-2. **Op enums: one public vocabulary, int/fp split.** `CgTarget` consumes the public split
-   enums; delete internal `BinOp`/`UnOp`/`CmpOp` + their `api_map_*`. (Atomic/Order/AsmDir
-   already done.) — **still governs the open Track 2 binop/cmp split.**
-
-(Decisions 1, 3, 4, 5, 6, 7, 8 are realized: 1 = peephole kept + re-enabled under Track 6
-(`b8de5c0`); 3 = `supports_intrinsic` + diagnostic + CPU intrinsics landed (`15e2effc`),
-`FP_REM` removed; 4 = `ret_void` removed; 5 = NONTEMPORAL/INVARIANT/alias scopes removed;
-6 = `elem` is a pointer VALUE + explicit array-decay (Track 7 core); 7 = bitfields are a
-PLACE subkind (`d08e794`); 8 = `-O0` quality was not a gate for Track 7, and Track 6.3
-restored the peephole.)
diff --git a/doc/plan/README.md b/doc/plan/README.md
@@ -20,6 +20,5 @@ shrinks to whatever remains open.
 | [BUILD.md](BUILD.md) | A new content-addressed build coordinator (Bazel/Nix-style incremental builds layered on the CAS) — storage state machine, caching algorithm, recipe protocol. Distinct from `../BUILD.md` (kit's own Makefile build). | — (new subsystem) |
 | [BUILD_COMMANDS.md](BUILD_COMMANDS.md) | The kit-native `build-exe`/`build-lib`/`build-obj` verbs that replace `compile`: polyglot, in-memory compile+link with `--group` flag scoping and full link-flag control. Distinct from `BUILD.md` (the CAS coordinator). | [../DRIVER.md](../DRIVER.md) |
 | [LLGEN_IMPORT.md](LLGEN_IMPORT.md) | Importing the standalone LL(1)/Pratt parser and lexer generator into libkit, including public API renames, file moves, build gates, and a `kit llgen` command. | — |
-| [CODEGEN.md](CODEGEN.md) | CG API interface cleanup: PLACE/VALUE centerpiece, op/intrinsic taxonomy, atomic/order/AsmDir unification, multi-result API, i128/f128-as-VALUE. Tracks 1/3/4/5/6/7 landed; Track 2 (binop/cmp split) and Track 1c open. | [../CODEGEN.md](../CODEGEN.md) |
 | [FREEBSD.md](FREEBSD.md) | FreeBSD target support: VM harness, triple parsing, runtime variants, COMDAT/`STB_GNU_UNIQUE` fixes. Static link blocked on archive weak-alias cycle (needs `--start-group` semantics); dynamic link and full VM validation remaining. | — |
 | [TODO.md](TODO.md) | Open deferred fixes and code smells only. Completed items are removed instead of checked off. Not a roadmap; a current backlog. | — |