kit

kit
git clone https://git.ryansepassi.com/git/kit.git
Log | Files | Refs | README

commit e554263a21a3c16282604bbdaad3bee5283f039c
parent a0397c63a404f4589469e97260336eeaffc04549
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Tue,  2 Jun 2026 03:51:36 -0700

plan: fp cg fix

Diffstat:
Adoc/plan/CGFP.md | 182+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 182 insertions(+), 0 deletions(-)

diff --git a/doc/plan/CGFP.md b/doc/plan/CGFP.md @@ -0,0 +1,182 @@ +# Make the internal `CmpOp` FP-compare-lossless (a "disjoint FP block") + +## Context + +cfree has two stacked codegen interfaces (see `doc/plan/CODEGEN.md`, Track 2). The public +API (`include/cfree/cg.h`) splits compares by class — `CfreeCgIntCmpOp` (10) and the full +IEEE-complete `CfreeCgFpCmpOp` (12: OEQ, ONE, OLT, OLE, OGT, OGE, UEQ, UNE, ULT, ULE, UGT, +UGE). The internal `CgTarget` vocabulary `CmpOp` (`src/cg/cgtarget.h:60-75`) merges them into +14 members and is **lossy for floating point**: it has no unordered relationals, and +`CMP_EQ`/`CMP_NE` are overloaded between int and float (on float, `CMP_EQ`=ordered-equal, +`CMP_NE`=unordered-not-equal). `api_map_fp_cmp` (`src/cg/value.c:503`) collapses the 12 +public FP predicates down to 6, so the ordered/unordered distinction **cannot reach any +backend**. + +**Why fix it (accurate motivation — there is no live C miscompile):** +- **Advertise-but-ignore (Principle 2).** A direct cg-api caller — or any future frontend, or + the C99 `isgreater`/`isless`/`islessgreater`/`isunordered` math builtins if/when the C + frontend implements them — that requests `ULT/ULE/UGT/UGE/UEQ/ONE` is silently lowered to + the wrong (ordered) predicate, producing wrong NaN behavior. The public surface promises a + distinction the backend never receives. +- **A masked frontend mislabel.** Both the C frontend (`pcg_fp_cmp`, `cg_adapter.c:343`) and + toy (`expr.c:1432`) map `!=` to **ONE** (ordered-not-equal). C `!=` on floats is + **unordered** (`NaN != x` is true). This is "correct by accident" today *only* because the + lossy collapse maps ONE → `CMP_NE`, which every backend implements as UNE. Making the map + lossless would expose this — so the frontend fix is **coupled** to this change. +- **A latent inversion bug.** `api_invert_cmp` (`src/cg/fold.c:214`) inverts ordered FP + relationals to ordered duals (`OLT→OGE`) — wrong around NaN (the correct inverse of ordered + `a<b` is *unordered* `a>=b`). It is currently **unreachable** because `api_cg_cmp` + materializes FP compares immediately (`arith.c:181-192`) rather than building a delayed + `SV_CMP`; it becomes a live miscompile the moment anyone fuses/delays an FP cmp+branch. +- **A second, parallel lossy path.** The f128/`long double` soft-float path + (`cfree_cg_fp_cmp`, `arith.c:602-651`) collapses ordered/unordered onto shared libcalls the + same way and must be fixed in lockstep. +- **It is the correctness content of Track 2, decoupled from the structural split.** This + change does *not* split the `cmp` hook, double the IR opcodes, or touch binop/unop. It makes + the merged enum as expressive as the public one already is (a clean disjoint int-block + + fp-block, isomorphic to `CfreeCgIntCmpOp ⊎ CfreeCgFpCmpOp`), so the eventual full split is + reduced to pure type/hook plumbing with no correctness left in it. + +## Design + +### 1. The enum: disjoint FP block (`src/cg/cgtarget.h`) +Replace the FP portion of `CmpOp`. Keep the 10 integer members unchanged (already 1:1 with +`CfreeCgIntCmpOp`); replace `CMP_LT_F/LE_F/GT_F/GE_F` and the FP overloading of `CMP_EQ/NE` +with a clean 12-member FP block laid out **after** the int block, in the **same order** as +`CfreeCgFpCmpOp`: + +``` +CMP_EQ, CMP_NE, CMP_LT_S, CMP_LE_S, CMP_GT_S, CMP_GE_S, CMP_LT_U, CMP_LE_U, CMP_GT_U, CMP_GE_U, // 0..9 (int, unchanged) +CMP_OEQ_F, CMP_ONE_F, CMP_OLT_F, CMP_OLE_F, CMP_OGT_F, CMP_OGE_F, // 10..15 (FP ordered) +CMP_UEQ_F, CMP_UNE_F, CMP_ULT_F, CMP_ULE_F, CMP_UGT_F, CMP_UGE_F // 16..21 (FP unordered) +``` + +`CMP_OEQ_F` (=10) is the new FP boundary. The old names `CMP_LT_F/LE_F/GT_F/GE_F` are +**removed** so `-Werror` hard-errors every stale *name* reference. Update the header comment +(`cgtarget.h:54-59`) to state the block is now IEEE-complete (drop the "does not encode the +ordered/unordered distinction" disclaimer). + +### 2. Mappings (`src/cg/value.c`) +- `api_map_int_cmp` — unchanged (already 1:1). +- `api_map_fp_cmp` — make **1:1** (OEQ→`CMP_OEQ_F`, …, UGE→`CMP_UGE_F`). Since the FP block + mirrors the public enum order, this is `return (CmpOp)(CMP_OEQ_F + op);` (keep as an + explicit switch for clarity; the full Track 2 deletes it later). + +### 3. NaN-aware inversion (`src/cg/fold.c:214`, `api_invert_cmp`) +Replace the 4 buggy FP arms with the full 12-member table. Rule: **flip ordered↔unordered and +negate the relation** (eq↔ne, lt↔ge, le↔gt, gt↔le, ge↔lt): + +``` +OEQ→UNE ONE→UEQ OLT→UGE OLE→UGT OGT→ULE OGE→ULT +UEQ→ONE UNE→OEQ ULT→OGE ULE→OGT UGT→OLE UGE→OLT +``` + +### 4. Dispatch simplification +FP-ness is now self-describing from the opcode: replace the four `op >= CMP_LT_F || +((op==CMP_EQ||op==CMP_NE) && loc_is_fp(...))` sites with `op >= CMP_OEQ_F`, dropping the +operand-sniffing clause (FP eq/ne are now distinct opcodes). Sites: `interp/engine.c:341`, +`rv64/native.c:998`, `rv64/native.c:1148`, `x64/native.c:1112`. +**Caveat:** aa64 does **not** use this boundary — `aa_emit_cmp_to_flags` +(`aa64/native.c:2185-2191`) chooses FCMP-vs-SUBS from the register class via `loc_is_fp(lhs)`; +that operand-sniff must **stay** (it's precision/class selection, not predicate decode). +Single/double precision selection (rv64 `RV_FMT_D/S`, x64 `0x66` prefix) is type-driven +everywhere and is unaffected. + +### 5. Backend emission — the 6 new unordered predicates +Key identity used throughout: **`unordered-R = NOT(ordered-¬R)`** — `ULT=!(OGE)`, `ULE=!(OGT)`, +`UGT=!(OLE)`, `UGE=!(OLT)`, `UEQ=!(ONE)`, `UNE=!(OEQ)`. Each backend renames its 4 ordered +arms (`LT_F→OLT_F`, …) + the FP eq/ne arms (old `CMP_EQ/NE`-on-float → `CMP_OEQ_F`/`CMP_UNE_F`) +and adds the unordered arms. Per-arch reality (cost ~moderate, **not** the 300-500 LOC a naive +read suggests; rv64 is *not* the long pole once negation is used): + +| Backend | File:hook | Strategy + trap | +|---|---|---| +| interp | `engine.c:340 do_cmp` | trivial: direct C doubles w/ explicit `x!=x\|\|y!=y`; bump gate to `CMP_OEQ_F` | +| rv64 | `native.c:991 rv_cmp`, `:1136 rv_cmp_branch` | ordered feq/flt/fle + `xori …,1` for unordered. `flt/fle` are **signaling** (raise NV on NaN) — pre-existing for ordered ops; document, boolean is correct. `default:` is `rv_panic` (traps, not silent) | +| x64 | `native.c:1161 x64_cmp`, `:1334 cmp_branch`, `:1095 cmp_to_cc` | `ucomisd` sets ZF/PF/CF (PF=unordered). Ordered=AND-with-NP, unordered=OR-with-P. **Build each predicate's flag formula explicitly via De Morgan** — do not blindly `!(opposite)`; the existing GT/GE arms skip parity correction. `cmp_to_cc default:` returns E (**silent** — see methodology) | +| aa64 | `native.c:634 cmp_cond`, `:2205 aa_cmp`, `:1668 cmp_branch` | keep quiet `FCMP` (`:482`, not FCMPE). Build full 12-entry FP cond table from the ARM FP-condition reference. **`UEQ` (equal-or-unordered) and `ONE` (ordered-and-not-equal) have no single AArch64 condition** → 2 instructions (e.g. fcmp + cset + csinc). `cmp_cond default:` returns 0x0=EQ (**silent, highest-risk**) | +| c_target | `c_emit.c:1340 cmp_to_c`, `:1404 c_emit_cmp` | operator-negation in generated C: `ULT→!((a)>=(b))`, `ONE→((a)<(b)\|\|(a)>(b))`, `UEQ→(!((a)<(b))&&!((a)>(b)))`, `UNE→(a)!=(b)`; no `isnan`. Must restructure `c_emit_cmp` to emit compound expressions (not just one operator) and wrap `!(...)` around the full cast-bearing comparison. Don't let the host compiler use `-ffast-math` | +| wasm | `emit.c:2493 cmp_kind` | **FP eq/ne arms are currently absent** — today f32/f64 eq/ne falls through to integer `WASM_INSN_*_EQ/NE` (a validator type error); add all 12 from scratch, not "rename". Unordered via `i32.eqz` of the opposite ordered compare; `ONE`/`UEQ` via or-combine | + +### 6. f128 / soft-float path (`src/cg/arith.c:602-651`) +Independent of the enum (it uses integer compares on libcall results), but the same +collapse. Make it lossless using `__unordtf2` (nonzero iff either operand is NaN): +- ordered relationals: unchanged (`OLT→__lttf2 <0`, etc.). +- unordered relationals: `Ult = (__lttf2(a,b) < 0) || (__unordtf2(a,b) != 0)`, and similarly + for ULE/UGT/UGE. +- `UEQ = (__eqtf2==0) || (__unordtf2!=0)`; `OEQ = (__eqtf2==0)`. +- `ONE = (__netf2!=0) && (__unordtf2==0)`; `UNE = (__netf2!=0)` (note: `__netf2!=0` already + means unordered-not-equal, so today's `ONE→__netf2+CMP_NE` actually computes UNE — same + masking as the scalar path). Verify `__unordtf2` is provided by the runtime (`rt/`). + +### 7. Coupled frontend correctness fix (lands with the lossless map) +- `lang/c/parse/cg_adapter.c:343` — `case CMP_NE: return CFREE_CG_FP_UNE;` (was `ONE`). +- `lang/toy/expr.c:1432` — `case TOK_NE: fp_cmp = CFREE_CG_FP_UNE;` (was `ONE`). +These touch only the frontends' own private enums/tokens → public mapping; they do **not** +depend on or conflict with the internal-enum rename (the C frontend has its *own* `CmpOp` at +`cg_adapter.h:88`, so the `-Werror` name removal in `cgtarget.h` must not be globally +sed-applied into `lang/`). + +### 8. Dumper + IR storage +- `src/cg/ir_dump.c:106 cg_ir_cmp_name` — rename/add the 12 FP name strings (no `default:`, so + `-Wswitch` *will* flag omissions here). +- IR storage is already wide enough: `CgIrCmpBranchAux.op` is a full `CmpOp` (`ir.h:112`), + `extra.imm` is `i64`; recorder/lower pass the op through transparently — **no change**. + +### Confirmed non-issues (do not touch) +`pass_native_emit.c` / `cg_ir_lower.c` pass the op through; `pass_simplify.c:simplify_cmp` +bails on float via `simplify_width` (`:6`) before the switch; `pass_o2.c gvn_fold_cmp` is +int-only and `gvn_commutative_cmp` is EQ/NE-only (FP eq/ne CSE is an optional future add); +`pass_jump.c:invert_cmp` (`:177`) has `default: return 0` and stays conservatively safe for all +FP. No array is indexed by `CmpOp`; no `CmpOp` consumers in `src/asm` or `src/debug`. + +## Methodology note — `-Werror` will NOT catch the dangerous sites +`Makefile` uses `-Wall -Wextra -Werror` but not `-Wswitch-enum`; `-Wswitch` only fires on a +switch with **no `default:`**. Removing the old names flags every stale *name* use (good), and +flags missing enumerators in the no-`default` switches (`api_invert_cmp`, `cg_ir_cmp_name`, +wasm `cmp_kind`, c_target `cmp_to_c`). But the `default:`-bearing dispatch tables **silently +mishandle** the new ops: `aa64 cmp_cond`→EQ, `x64 cmp_to_cc`→E, `interp do_cmp`, `x64_cmp` FP +switch. Treat the per-backend table in §5 as the authoritative checklist, and/or compile the +touched files with `-Wswitch-enum` while developing. + +## Files to modify (grouped) +- **Enum/map/fold/dump:** `src/cg/cgtarget.h`, `src/cg/value.c`, `src/cg/fold.c`, + `src/cg/ir_dump.c` +- **Soft-float:** `src/cg/arith.c` (f128 path) +- **Backends:** `src/arch/{aa64,x64,rv64}/native.c`, `src/arch/c_target/c_emit.c`, + `src/arch/wasm/emit.c`, `src/interp/engine.c` +- **Frontends (coupled fix):** `lang/c/parse/cg_adapter.c`, `lang/toy/expr.c` +- **Tests:** `test/interp/interp_smoke_test.c` (rename refs + extend), new + `test/api/cg_fp_cmp_test.c` + +## Recommended sequencing (red-green, keep every commit green) +1. **Red:** add the new public-API test (`test/api/cg_fp_cmp_test.c`) driving all 12 + `cfree_cg_fp_cmp(CFREE_CG_FP_*)` predicates through the in-process JIT/interp with NaN and + ordinary operands; it fails today on `ULT/ULE/UGT/UGE/UEQ/ONE` (and on f128). +2. **Core (atomic):** rewrite `CmpOp` (disjoint block), make `api_map_fp_cmp` 1:1, fix + `api_invert_cmp`, update all 6 backends + interp via the §5 checklist, the 4 boundary + checks (§4), `cg_ir_cmp_name`, and the internal-name refs in `interp_smoke_test.c` + (`CMP_LT_F→CMP_OLT_F`, `CMP_GE_F→CMP_OGE_F`, float `CMP_EQ→CMP_OEQ_F`, `CMP_NE→CMP_UNE_F`). + Single commit so a `make bootstrap` byte-identity check lands on one consistent state. +3. **Frontend `!=` fix** (§7) — small, independent commit. +4. **f128 lossless** (§6) — self-contained commit. +5. Extend `spec_fp_cmp_nan` to all 12 internal predicates × {NaN-lhs, NaN-rhs, both-NaN, + ordered, -0.0/0.0}. + +## Verification +- **Unit (internal level):** `make test-isa test-opt`; the extended `spec_fp_cmp_nan` in + `make test-...` for interp guards each backend's emission of the new internal ops. NaN is + injected via `bitsd(0x7ff8000000000000ull)` (already in the test). +- **End-to-end public API (the advertise-but-ignore guard):** `make test-cg-api` runs the new + `cg_fp_cmp_test.c`. It must drive the **public** enum directly — the discriminating cases + (`ULT/ULE/UGT/UGE/UEQ/ONE`) cannot be produced by any C/toy source, so they are only + reachable through `cfree_cg_fp_cmp`. Assert per-predicate results for `(NaN,1.0)`, + `(1.0,NaN)`, `(NaN,NaN)`, `(1.0,2.0)`, `(-0.0,0.0)`. Add a **`long double` variant** to cover + the f128 path (§6) and a **wasm** case to surface the missing FP eq/ne (§5). +- **Per-arch execution:** run the cg-api/JIT cases on x64/aa64/rv64 via + `test/lib/exec_target.sh` (`exec_target_queue`/`flush`). +- **Frontend `!=`:** a C execution test `int g(float a,float b){return a!=b;}` with `a=NaN` + must return 1 (UNE), guarding §7 against the now-unmasked mislabel. +- **No regression:** `make test-asm test-lex test-parse test-toy test-smoke-x64 + test-smoke-rv64`; and `make bootstrap` must still reproduce **byte-identical at -O0 AND -O1** + (a no-regression guard — it won't exercise unordered predicates itself).