Make the internal `CmpOp` FP-compare-lossless (a "disjoint FP block")

Context

kit has two stacked codegen interfaces (see doc/plan/CODEGEN.md, Track 2). The public API (include/kit/cg.h) splits compares by class — KitCgIntCmpOp (10) and the full IEEE-complete KitCgFpCmpOp (12: OEQ, ONE, OLT, OLE, OGT, OGE, UEQ, UNE, ULT, ULE, UGT, UGE). The internal CgTarget vocabulary CmpOp (src/cg/cgtarget.h:60-75) merges them into 14 members and is lossy for floating point: it has no unordered relationals, and CMP_EQ/CMP_NE are overloaded between int and float (on float, CMP_EQ=ordered-equal, CMP_NE=unordered-not-equal). api_map_fp_cmp (src/cg/value.c:503) collapses the 12 public FP predicates down to 6, so the ordered/unordered distinction cannot reach any backend.

Why fix it (accurate motivation — there is no live C miscompile):

Advertise-but-ignore (Principle 2). A direct cg-api caller — or any future frontend, or the C99 isgreater/isless/islessgreater/isunordered math builtins if/when the C frontend implements them — that requests ULT/ULE/UGT/UGE/UEQ/ONE is silently lowered to the wrong (ordered) predicate, producing wrong NaN behavior. The public surface promises a distinction the backend never receives.
A masked frontend mislabel. Both the C frontend (pcg_fp_cmp, cg_adapter.c:343) and toy (expr.c:1432) map != to ONE (ordered-not-equal). C != on floats is unordered (NaN != x is true). This is "correct by accident" today only because the lossy collapse maps ONE → CMP_NE, which every backend implements as UNE. Making the map lossless would expose this — so the frontend fix is coupled to this change.
A latent inversion bug. api_invert_cmp (src/cg/fold.c:214) inverts ordered FP relationals to ordered duals (OLT→OGE) — wrong around NaN (the correct inverse of ordered a<b is unordered a>=b). It is currently unreachable because api_cg_cmp materializes FP compares immediately (arith.c:181-192) rather than building a delayed SV_CMP; it becomes a live miscompile the moment anyone fuses/delays an FP cmp+branch.
A second, parallel lossy path. The f128/long double soft-float path (kit_cg_fp_cmp, arith.c:602-651) collapses ordered/unordered onto shared libcalls the same way and must be fixed in lockstep.
It is the correctness content of Track 2, decoupled from the structural split. This change does not split the cmp hook, double the IR opcodes, or touch binop/unop. It makes the merged enum as expressive as the public one already is (a clean disjoint int-block + fp-block, isomorphic to KitCgIntCmpOp ⊎ KitCgFpCmpOp), so the eventual full split is reduced to pure type/hook plumbing with no correctness left in it.

Design

1. The enum: disjoint FP block (`src/cg/cgtarget.h`)

Replace the FP portion of CmpOp. Keep the 10 integer members unchanged (already 1:1 with KitCgIntCmpOp); replace CMP_LT_F/LE_F/GT_F/GE_F and the FP overloading of CMP_EQ/NE with a clean 12-member FP block laid out after the int block, in the same order as KitCgFpCmpOp:

CMP_EQ, CMP_NE, CMP_LT_S, CMP_LE_S, CMP_GT_S, CMP_GE_S, CMP_LT_U, CMP_LE_U, CMP_GT_U, CMP_GE_U,   // 0..9  (int, unchanged)
CMP_OEQ_F, CMP_ONE_F, CMP_OLT_F, CMP_OLE_F, CMP_OGT_F, CMP_OGE_F,                                  // 10..15 (FP ordered)
CMP_UEQ_F, CMP_UNE_F, CMP_ULT_F, CMP_ULE_F, CMP_UGT_F, CMP_UGE_F                                   // 16..21 (FP unordered)

CMP_OEQ_F (=10) is the new FP boundary. The old names CMP_LT_F/LE_F/GT_F/GE_F are removed so -Werror hard-errors every stale name reference. Update the header comment (cgtarget.h:54-59) to state the block is now IEEE-complete (drop the "does not encode the ordered/unordered distinction" disclaimer).

2. Mappings (`src/cg/value.c`)

api_map_int_cmp — unchanged (already 1:1).
api_map_fp_cmp — make 1:1 (OEQ→CMP_OEQ_F, …, UGE→CMP_UGE_F). Since the FP block mirrors the public enum order, this is return (CmpOp)(CMP_OEQ_F + op); (keep as an explicit switch for clarity; the full Track 2 deletes it later).

3. NaN-aware inversion (`src/cg/fold.c:214`, `api_invert_cmp`)

Replace the 4 buggy FP arms with the full 12-member table. Rule: flip ordered↔unordered and negate the relation (eq↔ne, lt↔ge, le↔gt, gt↔le, ge↔lt):

OEQ→UNE  ONE→UEQ  OLT→UGE  OLE→UGT  OGT→ULE  OGE→ULT
UEQ→ONE  UNE→OEQ  ULT→OGE  ULE→OGT  UGT→OLE  UGE→OLT

4. Dispatch simplification

FP-ness is now self-describing from the opcode: replace the four op >= CMP_LT_F || ((op==CMP_EQ||op==CMP_NE) && loc_is_fp(...)) sites with op >= CMP_OEQ_F, dropping the operand-sniffing clause (FP eq/ne are now distinct opcodes). Sites: interp/engine.c:341, rv64/native.c:998, rv64/native.c:1148, x64/native.c:1112. Caveat: aa64 does not use this boundary — aa_emit_cmp_to_flags (aa64/native.c:2185-2191) chooses FCMP-vs-SUBS from the register class via loc_is_fp(lhs); that operand-sniff must stay (it's precision/class selection, not predicate decode). Single/double precision selection (rv64 RV_FMT_D/S, x64 0x66 prefix) is type-driven everywhere and is unaffected.

5. Backend emission — the 6 new unordered predicates

Key identity used throughout: unordered-R = NOT(ordered-¬R) — ULT=!(OGE), ULE=!(OGT), UGT=!(OLE), UGE=!(OLT), UEQ=!(ONE), UNE=!(OEQ). Each backend renames its 4 ordered arms (LT_F→OLT_F, …) + the FP eq/ne arms (old CMP_EQ/NE-on-float → CMP_OEQ_F/CMP_UNE_F) and adds the unordered arms. Per-arch reality (cost ~moderate, not the 300-500 LOC a naive read suggests; rv64 is not the long pole once negation is used):

Backend	File:hook	Strategy + trap
interp	`engine.c:340 do_cmp`	trivial: direct C doubles w/ explicit `x!=x\\|\\|y!=y`; bump gate to `CMP_OEQ_F`
rv64	`native.c:991 rv_cmp`, `:1136 rv_cmp_branch`	ordered feq/flt/fle + `xori …,1` for unordered. `flt/fle` are signaling (raise NV on NaN) — pre-existing for ordered ops; document, boolean is correct. `default:` is `rv_panic` (traps, not silent)
x64	`native.c:1161 x64_cmp`, `:1334 cmp_branch`, `:1095 cmp_to_cc`	`ucomisd` sets ZF/PF/CF (PF=unordered). Ordered=AND-with-NP, unordered=OR-with-P. Build each predicate's flag formula explicitly via De Morgan — do not blindly `!(opposite)`; the existing GT/GE arms skip parity correction. `cmp_to_cc default:` returns E (silent — see methodology)
aa64	`native.c:634 cmp_cond`, `:2205 aa_cmp`, `:1668 cmp_branch`	keep quiet `FCMP` (`:482`, not FCMPE). Build full 12-entry FP cond table from the ARM FP-condition reference. `UEQ` (equal-or-unordered) and `ONE` (ordered-and-not-equal) have no single AArch64 condition → 2 instructions (e.g. fcmp + cset + csinc). `cmp_cond default:` returns 0x0=EQ (silent, highest-risk)
c_target	`c_emit.c:1340 cmp_to_c`, `:1404 c_emit_cmp`	operator-negation in generated C: `ULT→!((a)>=(b))`, `ONE→((a)<(b)\\|\\|(a)>(b))`, `UEQ→(!((a)<(b))&&!((a)>(b)))`, `UNE→(a)!=(b)`; no `isnan`. Must restructure `c_emit_cmp` to emit compound expressions (not just one operator) and wrap `!(...)` around the full cast-bearing comparison. Don't let the host compiler use `-ffast-math`
wasm	`emit.c:2493 cmp_kind`	FP eq/ne arms are currently absent — today f32/f64 eq/ne falls through to integer `WASM_INSN_*_EQ/NE` (a validator type error); add all 12 from scratch, not "rename". Unordered via `i32.eqz` of the opposite ordered compare; `ONE`/`UEQ` via or-combine

6. f128 / soft-float path (`src/cg/arith.c:602-651`)

Independent of the enum (it uses integer compares on libcall results), but the same collapse. Make it lossless using __unordtf2 (nonzero iff either operand is NaN):

ordered relationals: unchanged (OLT→__lttf2 <0, etc.).
unordered relationals: Ult = (__lttf2(a,b) < 0) || (__unordtf2(a,b) != 0), and similarly for ULE/UGT/UGE.
UEQ = (__eqtf2==0) || (__unordtf2!=0); OEQ = (__eqtf2==0).
ONE = (__netf2!=0) && (__unordtf2==0); UNE = (__netf2!=0) (note: __netf2!=0 already means unordered-not-equal, so today's ONE→__netf2+CMP_NE actually computes UNE — same masking as the scalar path). Verify __unordtf2 is provided by the runtime (rt/).

7. Coupled frontend correctness fix (lands with the lossless map)

lang/c/parse/cg_adapter.c:343 — case CMP_NE: return KIT_CG_FP_UNE; (was ONE).
lang/toy/expr.c:1432 — case TOK_NE: fp_cmp = KIT_CG_FP_UNE; (was ONE). These touch only the frontends' own private enums/tokens → public mapping; they do not depend on or conflict with the internal-enum rename (the C frontend has its own CmpOp at cg_adapter.h:88, so the -Werror name removal in cgtarget.h must not be globally sed-applied into lang/).

8. Dumper + IR storage

src/cg/ir_dump.c:106 cg_ir_cmp_name — rename/add the 12 FP name strings (no default:, so -Wswitch will flag omissions here).
IR storage is already wide enough: CgIrCmpBranchAux.op is a full CmpOp (ir.h:112), extra.imm is i64; recorder/lower pass the op through transparently — no change.

Confirmed non-issues (do not touch)

pass_native_emit.c / cg_ir_lower.c pass the op through; pass_simplify.c:simplify_cmp bails on float via simplify_width (:6) before the switch; pass_o2.c gvn_fold_cmp is int-only and gvn_commutative_cmp is EQ/NE-only (FP eq/ne CSE is an optional future add); pass_jump.c:invert_cmp (:177) has default: return 0 and stays conservatively safe for all FP. No array is indexed by CmpOp; no CmpOp consumers in src/asm or src/debug.

Methodology note — `-Werror` will NOT catch the dangerous sites

Makefile uses -Wall -Wextra -Werror but not -Wswitch-enum; -Wswitch only fires on a switch with no default:. Removing the old names flags every stale name use (good), and flags missing enumerators in the no-default switches (api_invert_cmp, cg_ir_cmp_name, wasm cmp_kind, c_target cmp_to_c). But the default:-bearing dispatch tables silently mishandle the new ops: aa64 cmp_cond→EQ, x64 cmp_to_cc→E, interp do_cmp, x64_cmp FP switch. Treat the per-backend table in §5 as the authoritative checklist, and/or compile the touched files with -Wswitch-enum while developing.

Files to modify (grouped)

Enum/map/fold/dump: src/cg/cgtarget.h, src/cg/value.c, src/cg/fold.c, src/cg/ir_dump.c
Soft-float: src/cg/arith.c (f128 path)
Backends: src/arch/{aa64,x64,rv64}/native.c, src/arch/c_target/c_emit.c, src/arch/wasm/emit.c, src/interp/engine.c
Frontends (coupled fix): lang/c/parse/cg_adapter.c, lang/toy/expr.c
Tests: test/interp/interp_smoke_test.c (rename refs + extend), new test/api/cg_fp_cmp_test.c

Recommended sequencing (red-green, keep every commit green)

Red: add the new public-API test (test/api/cg_fp_cmp_test.c) driving all 12 kit_cg_fp_cmp(KIT_CG_FP_*) predicates through the in-process JIT/interp with NaN and ordinary operands; it fails today on ULT/ULE/UGT/UGE/UEQ/ONE (and on f128).
Core (atomic): rewrite CmpOp (disjoint block), make api_map_fp_cmp 1:1, fix api_invert_cmp, update all 6 backends + interp via the §5 checklist, the 4 boundary checks (§4), cg_ir_cmp_name, and the internal-name refs in interp_smoke_test.c (CMP_LT_F→CMP_OLT_F, CMP_GE_F→CMP_OGE_F, float CMP_EQ→CMP_OEQ_F, CMP_NE→CMP_UNE_F). Single commit so a make bootstrap byte-identity check lands on one consistent state.
Frontend != fix (§7) — small, independent commit.
f128 lossless (§6) — self-contained commit.
Extend spec_fp_cmp_nan to all 12 internal predicates × {NaN-lhs, NaN-rhs, both-NaN, ordered, -0.0/0.0}.

Verification

Unit (internal level): make test-isa test-opt; the extended spec_fp_cmp_nan in make test-... for interp guards each backend's emission of the new internal ops. NaN is injected via bitsd(0x7ff8000000000000ull) (already in the test).
End-to-end public API (the advertise-but-ignore guard): make test-cg-api runs the new cg_fp_cmp_test.c. It must drive the public enum directly — the discriminating cases (ULT/ULE/UGT/UGE/UEQ/ONE) cannot be produced by any C/toy source, so they are only reachable through kit_cg_fp_cmp. Assert per-predicate results for (NaN,1.0), (1.0,NaN), (NaN,NaN), (1.0,2.0), (-0.0,0.0). Add a long double variant to cover the f128 path (§6) and a wasm case to surface the missing FP eq/ne (§5).
Per-arch execution: run the cg-api/JIT cases on x64/aa64/rv64 via test/lib/exec_target.sh (exec_target_queue/flush).
Frontend !=: a C execution test int g(float a,float b){return a!=b;} with a=NaN must return 1 (UNE), guarding §7 against the now-unmasked mislabel.
No regression: make test-asm test-lex test-parse test-toy test-smoke-x64 test-smoke-rv64; and make bootstrap must still reproduce byte-identical at -O0 AND -O1 (a no-regression guard — it won't exercise unordered predicates itself).

	kit kit
	git clone https://git.ryansepassi.com/git/kit.git
	Log \| Files \| Refs \| README

kit

Make the internal CmpOp FP-compare-lossless (a "disjoint FP block")