Make the internal CmpOp FP-compare-lossless (a "disjoint FP block")
Context
kit has two stacked codegen interfaces (see doc/plan/CODEGEN.md, Track 2). The public
API (include/kit/cg.h) splits compares by class — KitCgIntCmpOp (10) and the full
IEEE-complete KitCgFpCmpOp (12: OEQ, ONE, OLT, OLE, OGT, OGE, UEQ, UNE, ULT, ULE, UGT,
UGE). The internal CgTarget vocabulary CmpOp (src/cg/cgtarget.h:60-75) merges them into
14 members and is lossy for floating point: it has no unordered relationals, and
CMP_EQ/CMP_NE are overloaded between int and float (on float, CMP_EQ=ordered-equal,
CMP_NE=unordered-not-equal). api_map_fp_cmp (src/cg/value.c:503) collapses the 12
public FP predicates down to 6, so the ordered/unordered distinction cannot reach any
backend.
Why fix it (accurate motivation — there is no live C miscompile):
- Advertise-but-ignore (Principle 2). A direct cg-api caller — or any future frontend, or
the C99
isgreater/isless/islessgreater/isunorderedmath builtins if/when the C frontend implements them — that requestsULT/ULE/UGT/UGE/UEQ/ONEis silently lowered to the wrong (ordered) predicate, producing wrong NaN behavior. The public surface promises a distinction the backend never receives. - A masked frontend mislabel. Both the C frontend (
pcg_fp_cmp,cg_adapter.c:343) and toy (expr.c:1432) map!=to ONE (ordered-not-equal). C!=on floats is unordered (NaN != xis true). This is "correct by accident" today only because the lossy collapse maps ONE →CMP_NE, which every backend implements as UNE. Making the map lossless would expose this — so the frontend fix is coupled to this change. - A latent inversion bug.
api_invert_cmp(src/cg/fold.c:214) inverts ordered FP relationals to ordered duals (OLT→OGE) — wrong around NaN (the correct inverse of ordereda<bis unordereda>=b). It is currently unreachable becauseapi_cg_cmpmaterializes FP compares immediately (arith.c:181-192) rather than building a delayedSV_CMP; it becomes a live miscompile the moment anyone fuses/delays an FP cmp+branch. - A second, parallel lossy path. The f128/
long doublesoft-float path (kit_cg_fp_cmp,arith.c:602-651) collapses ordered/unordered onto shared libcalls the same way and must be fixed in lockstep. - It is the correctness content of Track 2, decoupled from the structural split. This
change does not split the
cmphook, double the IR opcodes, or touch binop/unop. It makes the merged enum as expressive as the public one already is (a clean disjoint int-block + fp-block, isomorphic toKitCgIntCmpOp ⊎ KitCgFpCmpOp), so the eventual full split is reduced to pure type/hook plumbing with no correctness left in it.
Design
1. The enum: disjoint FP block (src/cg/cgtarget.h)
Replace the FP portion of CmpOp. Keep the 10 integer members unchanged (already 1:1 with
KitCgIntCmpOp); replace CMP_LT_F/LE_F/GT_F/GE_F and the FP overloading of CMP_EQ/NE
with a clean 12-member FP block laid out after the int block, in the same order as
KitCgFpCmpOp:
CMP_EQ, CMP_NE, CMP_LT_S, CMP_LE_S, CMP_GT_S, CMP_GE_S, CMP_LT_U, CMP_LE_U, CMP_GT_U, CMP_GE_U, // 0..9 (int, unchanged)
CMP_OEQ_F, CMP_ONE_F, CMP_OLT_F, CMP_OLE_F, CMP_OGT_F, CMP_OGE_F, // 10..15 (FP ordered)
CMP_UEQ_F, CMP_UNE_F, CMP_ULT_F, CMP_ULE_F, CMP_UGT_F, CMP_UGE_F // 16..21 (FP unordered)
CMP_OEQ_F (=10) is the new FP boundary. The old names CMP_LT_F/LE_F/GT_F/GE_F are
removed so -Werror hard-errors every stale name reference. Update the header comment
(cgtarget.h:54-59) to state the block is now IEEE-complete (drop the "does not encode the
ordered/unordered distinction" disclaimer).
2. Mappings (src/cg/value.c)
api_map_int_cmp— unchanged (already 1:1).api_map_fp_cmp— make 1:1 (OEQ→CMP_OEQ_F, …, UGE→CMP_UGE_F). Since the FP block mirrors the public enum order, this isreturn (CmpOp)(CMP_OEQ_F + op);(keep as an explicit switch for clarity; the full Track 2 deletes it later).
3. NaN-aware inversion (src/cg/fold.c:214, api_invert_cmp)
Replace the 4 buggy FP arms with the full 12-member table. Rule: flip ordered↔unordered and negate the relation (eq↔ne, lt↔ge, le↔gt, gt↔le, ge↔lt):
OEQ→UNE ONE→UEQ OLT→UGE OLE→UGT OGT→ULE OGE→ULT
UEQ→ONE UNE→OEQ ULT→OGE ULE→OGT UGT→OLE UGE→OLT
4. Dispatch simplification
FP-ness is now self-describing from the opcode: replace the four op >= CMP_LT_F || ((op==CMP_EQ||op==CMP_NE) && loc_is_fp(...)) sites with op >= CMP_OEQ_F, dropping the
operand-sniffing clause (FP eq/ne are now distinct opcodes). Sites: interp/engine.c:341,
rv64/native.c:998, rv64/native.c:1148, x64/native.c:1112.
Caveat: aa64 does not use this boundary — aa_emit_cmp_to_flags
(aa64/native.c:2185-2191) chooses FCMP-vs-SUBS from the register class via loc_is_fp(lhs);
that operand-sniff must stay (it's precision/class selection, not predicate decode).
Single/double precision selection (rv64 RV_FMT_D/S, x64 0x66 prefix) is type-driven
everywhere and is unaffected.
5. Backend emission — the 6 new unordered predicates
Key identity used throughout: unordered-R = NOT(ordered-¬R) — ULT=!(OGE), ULE=!(OGT),
UGT=!(OLE), UGE=!(OLT), UEQ=!(ONE), UNE=!(OEQ). Each backend renames its 4 ordered
arms (LT_F→OLT_F, …) + the FP eq/ne arms (old CMP_EQ/NE-on-float → CMP_OEQ_F/CMP_UNE_F)
and adds the unordered arms. Per-arch reality (cost ~moderate, not the 300-500 LOC a naive
read suggests; rv64 is not the long pole once negation is used):
| Backend | File:hook | Strategy + trap |
|---|---|---|
| interp | engine.c:340 do_cmp |
trivial: direct C doubles w/ explicit x!=x\|\|y!=y; bump gate to CMP_OEQ_F |
| rv64 | native.c:991 rv_cmp, :1136 rv_cmp_branch |
ordered feq/flt/fle + xori …,1 for unordered. flt/fle are signaling (raise NV on NaN) — pre-existing for ordered ops; document, boolean is correct. default: is rv_panic (traps, not silent) |
| x64 | native.c:1161 x64_cmp, :1334 cmp_branch, :1095 cmp_to_cc |
ucomisd sets ZF/PF/CF (PF=unordered). Ordered=AND-with-NP, unordered=OR-with-P. Build each predicate's flag formula explicitly via De Morgan — do not blindly !(opposite); the existing GT/GE arms skip parity correction. cmp_to_cc default: returns E (silent — see methodology) |
| aa64 | native.c:634 cmp_cond, :2205 aa_cmp, :1668 cmp_branch |
keep quiet FCMP (:482, not FCMPE). Build full 12-entry FP cond table from the ARM FP-condition reference. UEQ (equal-or-unordered) and ONE (ordered-and-not-equal) have no single AArch64 condition → 2 instructions (e.g. fcmp + cset + csinc). cmp_cond default: returns 0x0=EQ (silent, highest-risk) |
| c_target | c_emit.c:1340 cmp_to_c, :1404 c_emit_cmp |
operator-negation in generated C: ULT→!((a)>=(b)), ONE→((a)<(b)\|\|(a)>(b)), UEQ→(!((a)<(b))&&!((a)>(b))), UNE→(a)!=(b); no isnan. Must restructure c_emit_cmp to emit compound expressions (not just one operator) and wrap !(...) around the full cast-bearing comparison. Don't let the host compiler use -ffast-math |
| wasm | emit.c:2493 cmp_kind |
FP eq/ne arms are currently absent — today f32/f64 eq/ne falls through to integer WASM_INSN_*_EQ/NE (a validator type error); add all 12 from scratch, not "rename". Unordered via i32.eqz of the opposite ordered compare; ONE/UEQ via or-combine |
6. f128 / soft-float path (src/cg/arith.c:602-651)
Independent of the enum (it uses integer compares on libcall results), but the same
collapse. Make it lossless using __unordtf2 (nonzero iff either operand is NaN):
- ordered relationals: unchanged (
OLT→__lttf2 <0, etc.). - unordered relationals:
Ult = (__lttf2(a,b) < 0) || (__unordtf2(a,b) != 0), and similarly for ULE/UGT/UGE. UEQ = (__eqtf2==0) || (__unordtf2!=0);OEQ = (__eqtf2==0).ONE = (__netf2!=0) && (__unordtf2==0);UNE = (__netf2!=0)(note:__netf2!=0already means unordered-not-equal, so today'sONE→__netf2+CMP_NEactually computes UNE — same masking as the scalar path). Verify__unordtf2is provided by the runtime (rt/).
7. Coupled frontend correctness fix (lands with the lossless map)
lang/c/parse/cg_adapter.c:343—case CMP_NE: return KIT_CG_FP_UNE;(wasONE).lang/toy/expr.c:1432—case TOK_NE: fp_cmp = KIT_CG_FP_UNE;(wasONE). These touch only the frontends' own private enums/tokens → public mapping; they do not depend on or conflict with the internal-enum rename (the C frontend has its ownCmpOpatcg_adapter.h:88, so the-Werrorname removal incgtarget.hmust not be globally sed-applied intolang/).
8. Dumper + IR storage
src/cg/ir_dump.c:106 cg_ir_cmp_name— rename/add the 12 FP name strings (nodefault:, so-Wswitchwill flag omissions here).- IR storage is already wide enough:
CgIrCmpBranchAux.opis a fullCmpOp(ir.h:112),extra.immisi64; recorder/lower pass the op through transparently — no change.
Confirmed non-issues (do not touch)
pass_native_emit.c / cg_ir_lower.c pass the op through; pass_simplify.c:simplify_cmp
bails on float via simplify_width (:6) before the switch; pass_o2.c gvn_fold_cmp is
int-only and gvn_commutative_cmp is EQ/NE-only (FP eq/ne CSE is an optional future add);
pass_jump.c:invert_cmp (:177) has default: return 0 and stays conservatively safe for all
FP. No array is indexed by CmpOp; no CmpOp consumers in src/asm or src/debug.
Methodology note — -Werror will NOT catch the dangerous sites
Makefile uses -Wall -Wextra -Werror but not -Wswitch-enum; -Wswitch only fires on a
switch with no default:. Removing the old names flags every stale name use (good), and
flags missing enumerators in the no-default switches (api_invert_cmp, cg_ir_cmp_name,
wasm cmp_kind, c_target cmp_to_c). But the default:-bearing dispatch tables silently
mishandle the new ops: aa64 cmp_cond→EQ, x64 cmp_to_cc→E, interp do_cmp, x64_cmp FP
switch. Treat the per-backend table in §5 as the authoritative checklist, and/or compile the
touched files with -Wswitch-enum while developing.
Files to modify (grouped)
- Enum/map/fold/dump:
src/cg/cgtarget.h,src/cg/value.c,src/cg/fold.c,src/cg/ir_dump.c - Soft-float:
src/cg/arith.c(f128 path) - Backends:
src/arch/{aa64,x64,rv64}/native.c,src/arch/c_target/c_emit.c,src/arch/wasm/emit.c,src/interp/engine.c - Frontends (coupled fix):
lang/c/parse/cg_adapter.c,lang/toy/expr.c - Tests:
test/interp/interp_smoke_test.c(rename refs + extend), newtest/api/cg_fp_cmp_test.c
Recommended sequencing (red-green, keep every commit green)
- Red: add the new public-API test (
test/api/cg_fp_cmp_test.c) driving all 12kit_cg_fp_cmp(KIT_CG_FP_*)predicates through the in-process JIT/interp with NaN and ordinary operands; it fails today onULT/ULE/UGT/UGE/UEQ/ONE(and on f128). - Core (atomic): rewrite
CmpOp(disjoint block), makeapi_map_fp_cmp1:1, fixapi_invert_cmp, update all 6 backends + interp via the §5 checklist, the 4 boundary checks (§4),cg_ir_cmp_name, and the internal-name refs ininterp_smoke_test.c(CMP_LT_F→CMP_OLT_F,CMP_GE_F→CMP_OGE_F, floatCMP_EQ→CMP_OEQ_F,CMP_NE→CMP_UNE_F). Single commit so amake bootstrapbyte-identity check lands on one consistent state. - Frontend
!=fix (§7) — small, independent commit. - f128 lossless (§6) — self-contained commit.
- Extend
spec_fp_cmp_nanto all 12 internal predicates × {NaN-lhs, NaN-rhs, both-NaN, ordered, -0.0/0.0}.
Verification
- Unit (internal level):
make test-isa test-opt; the extendedspec_fp_cmp_naninmake test-...for interp guards each backend's emission of the new internal ops. NaN is injected viabitsd(0x7ff8000000000000ull)(already in the test). - End-to-end public API (the advertise-but-ignore guard):
make test-cg-apiruns the newcg_fp_cmp_test.c. It must drive the public enum directly — the discriminating cases (ULT/ULE/UGT/UGE/UEQ/ONE) cannot be produced by any C/toy source, so they are only reachable throughkit_cg_fp_cmp. Assert per-predicate results for(NaN,1.0),(1.0,NaN),(NaN,NaN),(1.0,2.0),(-0.0,0.0). Add along doublevariant to cover the f128 path (§6) and a wasm case to surface the missing FP eq/ne (§5). - Per-arch execution: run the cg-api/JIT cases on x64/aa64/rv64 via
test/lib/exec_target.sh(exec_target_queue/flush). - Frontend
!=: a C execution testint g(float a,float b){return a!=b;}witha=NaNmust return 1 (UNE), guarding §7 against the now-unmasked mislabel. - No regression:
make test-asm test-lex test-parse test-toy test-smoke-x64 test-smoke-rv64; andmake bootstrapmust still reproduce byte-identical at -O0 AND -O1 (a no-regression guard — it won't exercise unordered predicates itself).