kit

git clone https://git.ryansepassi.com/git/kit.git

commit 90f2ba1a39b3ad8ae0ba9119ce548d2f5d85cd19
parent 9ebc085dfe3e925ecf2cf329beba8fe4d043c884
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Thu, 28 May 2026 13:18:47 -0700

wasm: fix all 7 test/parse W-path failures

W-path parse suite goes 426→433 pass, 7→0 fail (32 skip unchanged).

- decode: bin_sleb accumulated in int64_t, so a 7-bit group landing in
  bit 63 was a signed-shift UB. Accumulate in uint64_t, cast at the end.
- emit: clz/ctz/popcount run at the operand width and wrap to the i32
  result, fixing the i64-operand type mismatch.
- ir_emit: install function aliases in sym_to_func before linearizing
  bodies (linearization is eager), so a call through an alias resolves to
  the target instead of a stray empty result-mismatched func.
- cg/atomic: lock-free ceiling is the native atomic width
  (CG_MAX_ATOMIC_SIZE=8), not the pointer width — wasm32 has 4-byte
  pointers but 8-byte atomics.
- c frontend: __builtin___clear_cache is a no-op on wasm/x86 (no I-cache
  to flush), matching GCC/Clang, instead of an undefined __clear_cache.
- wasm insn set: add the standard sign-extension operators
  (i32.extend8_s … i64.extend32_s, 0xc0–0xc4) across enum/name/encode/
  decode/wat/validate and the re-lowering. emit_convert uses them so a
  narrow signed promotion (e.g. signed char → int) sign-extends
  in-register instead of being a same-valtype no-op.
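
A minimal sketch of the unsigned-accumulation idea behind the bin_sleb fix,
written as a standalone function. The name sleb128_decode and its
buffer-based signature are assumptions for illustration and are not the
repository's reader API. The final cast assumes the usual two's-complement
conversion to int64_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical standalone decoder (not the repository's bin_sleb).
 * Decodes a signed LEB128 value of at most `bits` bits from p[0..n).
 * Sets *ok to 1 on success and 0 on truncated or over-long input. */
static int64_t sleb128_decode(const uint8_t *p, size_t n, uint32_t bits,
                              int *ok) {
    uint64_t result = 0;                    /* unsigned: shifts are defined */
    uint32_t shift = 0;
    uint32_t max_bytes = (bits + 6u) / 7u;  /* 10 bytes for a 64-bit value */
    size_t i = 0;
    uint8_t b;

    *ok = 0;
    do {
        if (i >= n || i >= max_bytes) return 0;
        b = p[i++];
        /* For the 10th byte shift is 63; in int64_t this shift would be UB. */
        result |= (uint64_t)(b & 0x7fu) << shift;
        shift += 7u;
    } while (b & 0x80u);

    /* Sign-extend from the last group when it did not reach the top bit. */
    if (shift < bits && (b & 0x40u)) result |= ~(uint64_t)0 << shift;

    *ok = 1;
    return (int64_t)result;                 /* single cast at the end */
}
```

The 10-byte encoding of INT64_MIN (nine 0x80 bytes followed by 0x7f) is the
case from the UBSan report: its last group is shifted left by 63.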

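The clz/ctz/popcount item can be illustrated in plain C. This is a sketch
under assumptions: ctz64_to_i32 and ctz32_of_truncated are hypothetical
helpers that model the wasm instructions, not code from the repository. The
failure fixed by this commit was a validation error (an i32 op applied to an
i64 operand); the second helper only shows why narrowing the operand instead
would give a different count.

```c
#include <stdint.h>

/* Hypothetical model of i64.ctz followed by i32.wrap_i64. */
static int32_t ctz64_to_i32(uint64_t x) {
    uint64_t count = 0;          /* i64.ctz produces an i64 count */
    if (x == 0) return 64;       /* wasm defines ctz(0) as the bit width */
    while ((x & 1u) == 0) {
        x >>= 1;
        ++count;
    }
    return (int32_t)count;       /* wrap the i64 count to the i32 result */
}

/* Hypothetical model of narrowing the operand first and using i32.ctz. */
static int32_t ctz32_of_truncated(uint64_t x) {
    uint32_t y = (uint32_t)x;    /* high 32 bits are lost here */
    int32_t count = 0;
    if (y == 0) return 32;
    while ((y & 1u) == 0) {
        y >>= 1;
        ++count;
    }
    return count;
}
```

For an operand with only bit 40 set, the first helper counts 40 trailing
zeros while the second sees a zero low half and reports 32.
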
Diffstat:
M doc/WASM_PARSE_CHECKLIST.md | 93 ++++++++++++++++++++++++++++++++++++-------------------------------------------
M lang/c/parse/parse_expr.c   | 18 ++++++++++++++++++
M lang/wasm/cg.c              | 22 ++++++++++++++++++++++
M src/arch/wasm/emit.c        | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------
M src/arch/wasm/ir_emit.c     | 13 ++++++++++---
M src/cg/atomic.c             |  8 +++++---
M src/cg/internal.h           |  7 +++++++
M src/wasm/decode.c           | 25 +++++++++++++++++++++----
M src/wasm/encode.c           | 10 ++++++++++
M src/wasm/insn.c             | 11 +++++++++--
M src/wasm/wasm.h             |  7 +++++++
M src/wasm/wat.c              | 20 ++++++++++++++++++++
12 files changed, 226 insertions(+), 80 deletions(-)

diff --git a/doc/WASM_PARSE_CHECKLIST.md b/doc/WASM_PARSE_CHECKLIST.md
@@ -4,60 +4,37 @@ Status of the Wasm CGTarget against the `test/parse` C suite, path **W**
 (`cfree cc -O0 -target wasm32-none -c case.c` → `cfree run -e test_main case.wasm`).
 
 - Host: arm64 (native JIT for the re-lowering). Opt level 0.
-- 465 cases: **426 pass · 7 fail · 0 compile-fail · 32 skip · 0 hang**.
+- 465 cases: **433 pass · 0 fail · 32 skip**.
 - The skips below match run.sh's phased-rollout regex (reported SKIP). The
   fails fall outside it and report as **FAIL** in the harness.
-- Reproduce / re-probe: `build/wasm_probe.sh [filter]`; results in
-  `build/wasm_probe/results.tsv`, per-case logs alongside.
+- Reproduce / re-probe: `CFREE_TEST_ALLOW_SKIP=1 ./test/parse/run.sh "" W`.
 
-## ⏳ Hang — fixed (was 1)
-
-- [x] **`6_8_19_switch_nested_dup_case`** — the structurer's
-  `unroll_switch_islands` pass only reordered the outermost switch per
-  invocation: after recording the outer island its `scan` advanced past the
-  inner switch's `JUMP L_disp`, so the inner case labels stayed backward
-  refs and were wrapped as synthetic `SCOPE_LOOP`s. `br_table` targeting
-  those loops turned the inner switch into an infinite loop. Fix
-  (`src/arch/wasm/structure.c`): re-run the unroller to a fixed point, and
-  also bail the dispatch-search loop on `WIR_SCOPE_OPEN`/`SCOPE_CLOSE` so a
-  later pass doesn't mistake a for-loop body label (which sits just before
-  `[switch SCOPE_LOOP open]`) for a switch dispatch label.
-
-## ❌ Fail — wrong exit code (7)
+## ✅ Fixed — wrong exit code (was 7, now 0)
 
 ### Decoder UB (LEB128 sign-extend shifts into sign bit)
-- [ ] `6_5_58_large_integer_immediates` — exp 42, got 134; UBSan at
-  `src/wasm/decode.c:54:36` (`left shift of 127 by 63 places ... int64_t`)
-- [ ] `rv64_large_imm_li` — exp 42, got 134; same `src/wasm/decode.c:54` UB
+- [x] `6_5_58_large_integer_immediates` — `bin_sleb` accumulated in `int64_t`,
+  so a 7-bit group landing in bit 63 was UB. Now accumulates in `uint64_t`
+  and casts at the end (`src/wasm/decode.c`).
+- [x] `rv64_large_imm_li` — same `bin_sleb` fix.
 
 ### Misc lowering mismatches
-- [ ] `attr_p2_10_alias` — exp 42, got 1; `fatal: wasm: function result type mismatch`
-- [ ] `builtin_22_ctz_long_widths` — exp 42, got 1; `unary operand type mismatch (expected i32 got i64)`
-- [ ] `builtin_24_atomic_lock_free` — exp 42, got 34
-- [ ] `builtin_clear_cache_01` — exp 42, got 134; `AddressSanitizer: DEADLYSIGNAL`
-- [ ] `6_8_31_switch_char_extremes` — exp 5, got 20
-
-## ✅ Compile-fail — all fixed (was 11)
-
-- [x] `6_7_1_03_thread_local_basic`, `gnu_thread_storage_01` — wasm has no TLS
-  model (one linear memory per instance), so thread-locals are ordinary data:
-  `obj_secname_tdata/tbss` name them `.tdata`/`.tbss` and `CG_IR_TLS_ADDR_OF`
-  lowers to a plain symbol address (`src/obj/obj_secnames.c`,
-  `src/arch/wasm/ir_emit.c`).
-- [x] `call_indirect_{arg,ret}_struct_*`, `call_large_const_global_struct_byval`
-  — `CG_IR_LOAD` of an aggregate now lowers to a `memory.copy` instead of a
-  scalar load (shared `wasm_ir_emit_agg_move` in `src/arch/wasm/ir_emit.c`).
-- [x] `6_8_26_switch_many_cases` — large dense switches now lower to a real
-  `br_table`: `WasmInsn.targets` is a heap-grown vector (no 64-entry cap),
-  the dense/sparse choice is a density test (`switch_use_br_table`), and the
-  validator's control-frame stack grows on demand (`src/arch/wasm/emit.c`,
-  `src/wasm/{wasm.h,module.c,decode.c,wat.c,validate.c}`).
-- [x] `builtin_25_atomic_fetch_nand` — `AO_NAND` (no native wasm opcode) lowers
-  to an `atomic.rmw.cmpxchg` retry loop (`src/arch/wasm/emit.c`).
-- [x] `rv64_atomic_widths_orders` — `cmp_branch`/`switch`/`if`-cond operands now
-  go through `wasm_ir_source_op`, so an address-taken local (e.g. the
-  `expected` out-param of `__atomic_compare_exchange`) is loaded from memory
-  rather than read as an undefined wasm local (`src/arch/wasm/ir_emit.c`).
+- [x] `attr_p2_10_alias` — function aliases are now installed in `sym_to_func`
+  *before* function bodies are linearized (`wasm_emit_ir_module`), so a call
+  through the alias resolves to the target instead of allocating an empty
+  func with a result-type-mismatched body.
+- [x] `builtin_22_ctz_long_widths` — clz/ctz/popcount now run at the operand
+  width (i64) and wrap to the i32 result (`emit_intrinsic_bit_op`).
+- [x] `builtin_24_atomic_lock_free` — lock-free ceiling is the native atomic
+  width (`CG_MAX_ATOMIC_SIZE` = 8), not the pointer width; wasm32 has 4-byte
+  pointers but 8-byte atomics (`src/cg/atomic.c`).
+- [x] `builtin_clear_cache_01` — `__builtin___clear_cache` is a no-op on wasm
+  and x86 (no I-cache to flush), matching GCC/Clang, instead of calling an
+  undefined `__clear_cache` (`lang/c/parse/parse_expr.c`).
+- [x] `6_8_31_switch_char_extremes` — added the standard sign-extension
+  operators (`i32.extend8_s` … `i64.extend32_s`, opcodes 0xc0–0xc4) across
+  the wasm insn set; `emit_convert` uses them so a narrow signed promotion
+  (e.g. `signed char` → `int`) actually sign-extends in-register instead of
+  being treated as a same-valtype no-op.
 
 ## ⏭️ Skip — phased-rollout (32, reported SKIP)
 
@@ -98,7 +75,21 @@ returning 0. `ldbl128_01_layout_macros` still PASSES (compile-time layout checks
 - [ ] `i128_14_arbitrary_mul`
 
 ### Other
-- [ ] `asm_01_grammar` — `wasm target: asm register clobbers not yet supported`
-- [ ] `asm_02_file_scope` — `wasm target: address of undefined symbol not yet implemented`
-- [ ] `attr_p2_08_weak_undef` — `wasm target: address of undefined symbol not yet implemented`
-- [ ] `builtin_26_sadd_overflow` — `wasm target: 64-bit checked-overflow multiply is not yet supported`
+- [ ] `asm_02_file_scope` — `wasm target: address of undefined symbol not yet implemented`.
+  Implementable via the wasm object linking section + relocations:
+  `R_WASM_MEMORY_ADDR_{LEB,SLEB,I32,I64}` for data symbols,
+  `R_WASM_TABLE_INDEX_{SLEB,I32}` for address-taken functions (which go in
+  the indirect-call table). The undefined symbol gets a `SYMBOL_INFO` with
+  `WASM_SYM_UNDEFINED`; `wasm-ld` resolves at link time. Same machinery
+  needed for any cross-TU function pointer — shared with `attr_p2_08`.
+- [ ] `attr_p2_08_weak_undef` — `wasm target: address of undefined symbol not yet implemented`.
+  Implementable, same mechanism as `asm_02_file_scope` plus
+  `WASM_SYM_BINDING_WEAK` in the symbol flags. `wasm-ld` resolves a weak
+  undef to 0 (data) or to a trapping stub (code) if nothing defines it.
+- [ ] `builtin_26_sadd_overflow` — `wasm target: 64-bit checked-overflow multiply is not yet supported`.
+  Implementable as a software lowering: wasm has `i64.mul` but no widening
+  or flag-producing form, so synthesize a 64×64→128 multiply by splitting
+  each i64 into two i32 halves, doing four i32×i32→i64 partial products,
+  and checking the high 64 bits against sign-extension of the low 64
+  (signed) or non-zero (unsigned). ~dozen wasm ops inlined, or a runtime
+  helper — same shape as the 32-bit software-mul path other backends use.
diff --git a/lang/c/parse/parse_expr.c b/lang/c/parse/parse_expr.c
@@ -1268,6 +1268,24 @@ static int parse_builtin_clear_cache_call(Parser* p, Sym name, SrcLoc loc) {
   coerce_top_to_type(p, void_ptr_ty);
   expect_punct(p, ')', "')' after __builtin___clear_cache");
 
+  /* Instruction-cache coherency is automatic on x86, and wasm has no separate
+   * I-cache to flush (the engine handles code installation), so
+   * __builtin___clear_cache is a no-op on those targets — matching GCC/Clang.
+   * Emitting the __clear_cache libcall there would reference an undefined
+   * symbol. Targets that need explicit coherency (ARM, RISC-V) keep the call.
+   * The argument expressions are still evaluated for their side effects. */
+  switch (cfree_compiler_target(p->c).arch) {
+    case CFREE_ARCH_WASM:
+    case CFREE_ARCH_X86_32:
+    case CFREE_ARCH_X86_64:
+      cg_drop(p->cg);
+      cg_drop(p->cg);
+      cg_push_int(p->cg, 0, ty_int(p));
+      return 1;
+    default:
+      break;
+  }
+
   params[0] = void_ptr_ty;
   params[1] = void_ptr_ty;
   fn_ty = type_func(p->pool, void_ty, params, 2, 0);
diff --git a/lang/wasm/cg.c b/lang/wasm/cg.c
@@ -3545,6 +3545,28 @@ void wasm_emit_cg(CfreeCompiler* c, const CfreeCodeOptions* code_opts,
       case WASM_INSN_I64_EXTEND_I32_U:
         cfree_cg_zext(cg, b.id[CFREE_CG_BUILTIN_I64]);
         break;
+      /* In-register sign-extension operators: narrow the value to the source
+       * width, then sign-extend back to the result width. */
+      case WASM_INSN_I32_EXTEND8_S:
+        cfree_cg_trunc(cg, b.id[CFREE_CG_BUILTIN_I8]);
+        cfree_cg_sext(cg, b.id[CFREE_CG_BUILTIN_I32]);
+        break;
+      case WASM_INSN_I32_EXTEND16_S:
+        cfree_cg_trunc(cg, b.id[CFREE_CG_BUILTIN_I16]);
+        cfree_cg_sext(cg, b.id[CFREE_CG_BUILTIN_I32]);
+        break;
+      case WASM_INSN_I64_EXTEND8_S:
+        cfree_cg_trunc(cg, b.id[CFREE_CG_BUILTIN_I8]);
+        cfree_cg_sext(cg, b.id[CFREE_CG_BUILTIN_I64]);
+        break;
+      case WASM_INSN_I64_EXTEND16_S:
+        cfree_cg_trunc(cg, b.id[CFREE_CG_BUILTIN_I16]);
+        cfree_cg_sext(cg, b.id[CFREE_CG_BUILTIN_I64]);
+        break;
+      case WASM_INSN_I64_EXTEND32_S:
+        cfree_cg_trunc(cg, b.id[CFREE_CG_BUILTIN_I32]);
+        cfree_cg_sext(cg, b.id[CFREE_CG_BUILTIN_I64]);
+        break;
       case WASM_INSN_I32_TRUNC_F32_S:
       case WASM_INSN_I32_TRUNC_F64_S:
         wasm_cg_checked_trunc(c, cg, b, &rt,
diff --git a/src/arch/wasm/emit.c b/src/arch/wasm/emit.c
@@ -1749,6 +1749,10 @@ void wasm_intrinsic(CGTarget* tg, IntrinKind k, Operand* dst, u32 ndst,
     w->dst = dst[0].v.reg;
     w->a = args[0].v.reg;
     w->type = dst[0].type;
+    /* clz/ctz/popcount return int (i32) but operate at the operand's width
+     * (e.g. __builtin_ctzl over an i64). The wasm op width must follow the
+     * operand, with a wrap to the i32 dst afterward. type2 carries it. */
+    w->type2 = args[0].type;
     w->cls = dst[0].cls;
     return;
   }
@@ -2526,9 +2530,45 @@ static WasmInsnKind cmp_kind(WTarget* t, CmpOp op, WasmValType vt) {
 }
 
 static void emit_convert(WTarget* t, ConvKind ck, WasmValType src,
-                         WasmValType dst) {
-  if (src == dst &&
-      (ck == CV_BITCAST || ck == CV_ZEXT || ck == CV_SEXT || ck == CV_TRUNC)) {
+                         WasmValType dst, u32 sw, u32 dw) {
+  (void)dw;
+  /* Integer sign/zero extension. Sub-i32 logical widths (i8/i16) share the i32
+   * valtype, so a "same valtype" SEXT/ZEXT is NOT a no-op — the high bits must
+   * be filled per the source's logical width (sw). The CG IR keeps narrow
+   * immediates as truncated bit patterns, so without this an i8 value like
+   * (signed char)-128 reads back as 128. */
+  if (ck == CV_SEXT && src != WASM_VAL_F32 && src != WASM_VAL_F64) {
+    if (src == WASM_VAL_I32) {
+      if (sw == 8u) emit_insn(t, WASM_INSN_I32_EXTEND8_S, 0);
+      else if (sw == 16u) emit_insn(t, WASM_INSN_I32_EXTEND16_S, 0);
+    } else {
+      if (sw == 8u) emit_insn(t, WASM_INSN_I64_EXTEND8_S, 0);
+      else if (sw == 16u) emit_insn(t, WASM_INSN_I64_EXTEND16_S, 0);
+      else if (sw == 32u) emit_insn(t, WASM_INSN_I64_EXTEND32_S, 0);
+    }
+    if (src == WASM_VAL_I32 && dst == WASM_VAL_I64)
+      emit_insn(t, WASM_INSN_I64_EXTEND_I32_S, 0);
+    else if (src == WASM_VAL_I64 && dst == WASM_VAL_I32)
+      emit_insn(t, WASM_INSN_I32_WRAP_I64, 0);
+    return;
+  }
+  if (ck == CV_ZEXT && src != WASM_VAL_F32 && src != WASM_VAL_F64) {
+    if (src == WASM_VAL_I32) {
+      if (sw > 0u && sw < 32u) {
+        emit_insn(t, WASM_INSN_I32_CONST, (i64)(((u32)1 << sw) - 1u));
+        emit_insn(t, WASM_INSN_I32_AND, 0);
+      }
+      if (dst == WASM_VAL_I64) emit_insn(t, WASM_INSN_I64_EXTEND_I32_U, 0);
+    } else {
+      if (sw > 0u && sw < 64u) {
+        emit_push_imm(t, WASM_VAL_I64, (i64)(((u64)1 << sw) - 1u));
+        emit_insn(t, WASM_INSN_I64_AND, 0);
+      }
+      if (dst == WASM_VAL_I32) emit_insn(t, WASM_INSN_I32_WRAP_I64, 0);
+    }
+    return;
+  }
+  if (src == dst && (ck == CV_BITCAST || ck == CV_TRUNC)) {
     /* No-op conversion. */
     return;
   }
@@ -2563,18 +2603,6 @@ static void emit_convert(WTarget* t, ConvKind ck, WasmValType src,
     }
     wfail(t, "wasm: unsupported bitcast");
   }
-  if (ck == CV_SEXT) {
-    if (src == WASM_VAL_I32 && dst == WASM_VAL_I64) {
-      emit_insn(t, WASM_INSN_I64_EXTEND_I32_S, 0);
-      return;
-    }
-  }
-  if (ck == CV_ZEXT) {
-    if (src == WASM_VAL_I32 && dst == WASM_VAL_I64) {
-      emit_insn(t, WASM_INSN_I64_EXTEND_I32_U, 0);
-      return;
-    }
-  }
   if (ck == CV_TRUNC) {
     if (src == WASM_VAL_I64 && dst == WASM_VAL_I32) {
       emit_insn(t, WASM_INSN_I32_WRAP_I64, 0);
@@ -2839,7 +2867,10 @@ static void emit_switch_br_table(WTarget* t, LoweringState* L, const WIR* w) {
  * emits a plain copy. ----------------------------------------------- */
 
 static void emit_intrinsic_bit_op(WTarget* t, const WIR* w) {
-  WasmValType vt = type_valtype(t, w->type);
+  /* clz/ctz/popcount instruction width follows the operand (type2), not the
+   * i32 result. i64 forms produce an i64 count that we wrap to the i32 dst. */
+  WasmValType vt = type_valtype(t, w->type2 ? w->type2 : w->type);
+  WasmValType dvt = type_valtype(t, w->type);
   WasmInsnKind op;
   switch ((IntrinKind)w->cgop) {
     case INTRIN_CLZ:
@@ -2857,6 +2888,8 @@ static void emit_intrinsic_bit_op(WTarget* t, const WIR* w) {
   }
   emit_push_operand_reg(t, w->a);
   emit_insn(t, op, 0);
+  if (vt == WASM_VAL_I64 && dvt == WASM_VAL_I32)
+    emit_insn(t, WASM_INSN_I32_WRAP_I64, 0);
   emit_local_set(t, w->dst, w->type, (RegClass)w->cls);
 }
 
@@ -3316,8 +3349,10 @@ static void linearize_range(WTarget* t, LoweringState* L, u32 start, u32 end) {
       case WIR_CONVERT: {
         WasmValType src = type_valtype(t, w->type2);
         WasmValType dst = type_valtype(t, w->type);
+        u32 sw = cfree_cg_type_int_width((CfreeCompiler*)t->c, w->type2);
+        u32 dw = cfree_cg_type_int_width((CfreeCompiler*)t->c, w->type);
         emit_push_operand(t, w->imm_kind, w->imm_a, w->a, w->type2);
-        emit_convert(t, (ConvKind)w->cgop, src, dst);
+        emit_convert(t, (ConvKind)w->cgop, src, dst, sw, dw);
         emit_local_set(t, w->dst, w->type, (RegClass)w->cls);
         break;
       }
@@ -4107,6 +4142,9 @@ void wasm_alias(CGTarget* tg, ObjSymId alias_sym, ObjSymId target_sym,
   const ObjSym* asym;
   (void)type;
   if (t->dead) return;
+  /* Aliases are processed before any function body is emitted, so the module
+   * may not exist yet; sym_to_wasm_func / wasm_add_export both need it. */
+  ensure_module(t);
   tsym = obj_symbol_get(t->obj, target_sym);
   if (!tsym) wfail(t, "wasm: alias against unknown target symbol");
   if (tsym->kind == SK_FUNC) {
diff --git a/src/arch/wasm/ir_emit.c b/src/arch/wasm/ir_emit.c
@@ -937,11 +937,18 @@ static void wasm_ir_emit_func(WTarget* t, const CgIrFunc* f) {
 
 void wasm_emit_ir_module(WTarget* t, const CgIrModule* module) {
   if (!t || !module) return;
-  for (u32 i = 0; i < module->nfuncs; ++i) {
-    wasm_ir_emit_func(t, module->funcs[i]);
-  }
+  /* Process aliases before emitting function bodies. Function linearization is
+   * eager (wasm_func_end), so a call through a function alias resolves its
+   * wasm func index at emit time; if the alias->target mapping isn't installed
+   * first, the call allocates a fresh empty func for the alias (bad body ->
+   * "function result type mismatch") instead of dispatching to the target.
+   * Data aliases only need the target's (section_id,value), shared earlier at
+   * the ObjBuilder layer, so they are equally safe here. */
   for (u32 i = 0; i < module->naliases; ++i) {
     const CgIrAlias* a = &module->aliases[i];
     wasm_alias((CGTarget*)&t->base, a->alias_sym, a->target_sym, a->type);
   }
+  for (u32 i = 0; i < module->nfuncs; ++i) {
+    wasm_ir_emit_func(t, module->funcs[i]);
+  }
 }
diff --git a/src/cg/atomic.c b/src/cg/atomic.c
@@ -3,7 +3,7 @@
 MemAccess api_mem_for_atomic(CfreeCg* g, CfreeCgTypeId val_ty) {
   MemAccess ma;
   api_require_scalar_mem_type(g, "atomic memory access", val_ty);
-  if (api_mem_type_size(g, val_ty, "atomic memory access") > 8u) {
+  if (api_mem_type_size(g, val_ty, "atomic memory access") > CG_MAX_ATOMIC_SIZE) {
     compiler_panic(g->c, g->cur_loc,
                    "CfreeCg: atomic memory access size exceeds 8 bytes");
   }
@@ -22,14 +22,16 @@ int cfree_cg_atomic_is_legal(CfreeCompiler* c, CfreeCgMemAccess access,
   (void)order;
   if (!ty) return 0;
   if (cg_type_is_aggregate(c, ty) || cg_type_is_void(c, ty)) return 0;
-  return abi_cg_sizeof(c->abi, access.type) <= 8;
+  return abi_cg_sizeof(c->abi, access.type) <= CG_MAX_ATOMIC_SIZE;
 }
 
 int cfree_cg_atomic_is_lock_free(CfreeCompiler* c, CfreeCgMemAccess access) {
   CfreeCgTypeId ty = resolve_type(c, access.type);
   if (!ty) return 0;
   if (cg_type_is_aggregate(c, ty) || cg_type_is_void(c, ty)) return 0;
-  return abi_cg_sizeof(c->abi, access.type) <= (u32)c->target.ptr_size;
+  /* Lock-free up to the native atomic width, NOT the pointer width: wasm32 has
+   * 4-byte pointers but lowers 8-byte (i64) atomics lock-free. */
+  return abi_cg_sizeof(c->abi, access.type) <= CG_MAX_ATOMIC_SIZE;
 }
 
 void cfree_cg_atomic_load(CfreeCg* g, CfreeCgMemAccess access,
diff --git a/src/cg/internal.h b/src/cg/internal.h
@@ -77,6 +77,13 @@ typedef struct ApiSValue {
 
 #define API_CG_STACK_INITIAL 16u
 
+/* Largest scalar the codegen lowers as a native (lock-free) atomic. All
+ * current targets — aa64, x64, rv64, wasm32 — provide 8-byte (i64-width)
+ * atomics, so this is both the legality ceiling and the lock-free ceiling.
+ * Note it is NOT the pointer width: wasm32 has 4-byte pointers but 8-byte
+ * atomics. */
+#define CG_MAX_ATOMIC_SIZE 8u
+
 typedef struct ApiCgScope {
   Label break_lbl;
   Label continue_lbl;
diff --git a/src/wasm/decode.c b/src/wasm/decode.c
@@ -42,7 +42,9 @@ static uint64_t bin_uleb64(BinReader* r) {
 }
 
 static int64_t bin_sleb(BinReader* r, uint32_t bits) {
-  int64_t result = 0;
+  /* Accumulate in the unsigned domain: a 7-bit group can land in bit 63
+     (shift == 63 for a 10-byte i64), where a signed left shift is UB. */
+  uint64_t result = 0;
   uint32_t shift = 0;
   uint32_t max_bytes = (bits + 6u) / 7u;
   uint32_t nbytes = 0;
@@ -51,11 +53,11 @@ static int64_t bin_sleb(BinReader* r, uint32_t bits) {
     if (nbytes++ >= max_bytes)
       wasm_error(r->c, wasm_loc(0, 0), "wasm: invalid sleb128");
     b = bin_u8(r);
-    result |= (int64_t)(b & 0x7fu) << shift;
+    result |= (uint64_t)(b & 0x7fu) << shift;
     shift += 7u;
   } while (b & 0x80u);
-  if (shift < bits && (b & 0x40u)) result |= -((int64_t)1 << shift);
-  return result;
+  if (shift < bits && (b & 0x40u)) result |= ~(uint64_t)0 << shift;
+  return (int64_t)result;
 }
 
 static double bin_f32(BinReader* r) {
@@ -745,6 +747,21 @@ static void decode_body_insn(BinReader* rp, WasmModule* out, WasmFunc* f,
     case 0xbf:
       wasm_func_add_insn(c, out, f, WASM_INSN_F64_REINTERPRET_I64, 0);
       break;
+    case 0xc0:
+      wasm_func_add_insn(c, out, f, WASM_INSN_I32_EXTEND8_S, 0);
+      break;
+    case 0xc1:
+      wasm_func_add_insn(c, out, f, WASM_INSN_I32_EXTEND16_S, 0);
+      break;
+    case 0xc2:
+      wasm_func_add_insn(c, out, f, WASM_INSN_I64_EXTEND8_S, 0);
+      break;
+    case 0xc3:
+      wasm_func_add_insn(c, out, f, WASM_INSN_I64_EXTEND16_S, 0);
+      break;
+    case 0xc4:
+      wasm_func_add_insn(c, out, f, WASM_INSN_I64_EXTEND32_S, 0);
+      break;
     case 0xfc: {
       uint32_t sub = bin_uleb(&r);
       switch (sub) {
diff --git a/src/wasm/encode.c b/src/wasm/encode.c
@@ -537,6 +537,16 @@ static uint8_t wasm_opcode(uint8_t kind) {
       return 0xbe;
     case WASM_INSN_F64_REINTERPRET_I64:
       return 0xbf;
+    case WASM_INSN_I32_EXTEND8_S:
+      return 0xc0;
+    case WASM_INSN_I32_EXTEND16_S:
+      return 0xc1;
+    case WASM_INSN_I64_EXTEND8_S:
+      return 0xc2;
+    case WASM_INSN_I64_EXTEND16_S:
+      return 0xc3;
+    case WASM_INSN_I64_EXTEND32_S:
+      return 0xc4;
   }
   return 0;
 }
diff --git a/src/wasm/insn.c b/src/wasm/insn.c
@@ -298,12 +298,14 @@ CfreeCgAtomicOp wasm_atomic_rmw_op(uint8_t kind) {
 
 int wasm_int_unop_kind(uint8_t kind, WasmValType* vt) {
   if (kind == WASM_INSN_I32_CLZ || kind == WASM_INSN_I32_CTZ ||
-      kind == WASM_INSN_I32_POPCNT) {
+      kind == WASM_INSN_I32_POPCNT || kind == WASM_INSN_I32_EXTEND8_S ||
+      kind == WASM_INSN_I32_EXTEND16_S) {
     *vt = WASM_VAL_I32;
     return 1;
   }
   if (kind == WASM_INSN_I64_CLZ || kind == WASM_INSN_I64_CTZ ||
-      kind == WASM_INSN_I64_POPCNT) {
+      kind == WASM_INSN_I64_POPCNT || kind == WASM_INSN_I64_EXTEND8_S ||
+      kind == WASM_INSN_I64_EXTEND16_S || kind == WASM_INSN_I64_EXTEND32_S) {
     *vt = WASM_VAL_I64;
     return 1;
   }
@@ -630,6 +632,11 @@ const char* wasm_insn_mnemonic(WasmInsnKind kind) {
       [WASM_INSN_I64_REINTERPRET_F64] = "i64.reinterpret_f64",
       [WASM_INSN_F32_REINTERPRET_I32] = "f32.reinterpret_i32",
       [WASM_INSN_F64_REINTERPRET_I64] = "f64.reinterpret_i64",
+      [WASM_INSN_I32_EXTEND8_S] = "i32.extend8_s",
+      [WASM_INSN_I32_EXTEND16_S] = "i32.extend16_s",
+      [WASM_INSN_I64_EXTEND8_S] = "i64.extend8_s",
+      [WASM_INSN_I64_EXTEND16_S] = "i64.extend16_s",
+      [WASM_INSN_I64_EXTEND32_S] = "i64.extend32_s",
       [WASM_INSN_I32_TRUNC_SAT_F32_S] = "i32.trunc_sat_f32_s",
       [WASM_INSN_I32_TRUNC_SAT_F32_U] = "i32.trunc_sat_f32_u",
       [WASM_INSN_I32_TRUNC_SAT_F64_S] = "i32.trunc_sat_f64_s",
diff --git a/src/wasm/wasm.h b/src/wasm/wasm.h
@@ -236,6 +236,13 @@ typedef enum WasmInsnKind {
   WASM_INSN_I64_REINTERPRET_F64,
   WASM_INSN_F32_REINTERPRET_I32,
   WASM_INSN_F64_REINTERPRET_I64,
+  /* Sign-extension operators (0xc0..0xc4). In-register sign extension from a
+   * narrower width; part of the standard MVP-era instruction set. */
+  WASM_INSN_I32_EXTEND8_S,
+  WASM_INSN_I32_EXTEND16_S,
+  WASM_INSN_I64_EXTEND8_S,
+  WASM_INSN_I64_EXTEND16_S,
+  WASM_INSN_I64_EXTEND32_S,
   /* Non-trapping float-to-int truncation (0xfc 0x00..0x07).
    * Gated by WASM_FEATURE_NONTRAPPING_FTOI. */
   WASM_INSN_I32_TRUNC_SAT_F32_S,
diff --git a/src/wasm/wat.c b/src/wasm/wat.c
@@ -1193,6 +1193,26 @@ static int wat_instr_kind(WasmTok t, WasmInsnKind* out, int* has_imm) {
     *out = WASM_INSN_F64_REINTERPRET_I64;
     return 1;
   }
+  if (tok_is(t, "i32.extend8_s")) {
+    *out = WASM_INSN_I32_EXTEND8_S;
+    return 1;
+  }
+  if (tok_is(t, "i32.extend16_s")) {
+    *out = WASM_INSN_I32_EXTEND16_S;
+    return 1;
+  }
+  if (tok_is(t, "i64.extend8_s")) {
+    *out = WASM_INSN_I64_EXTEND8_S;
+    return 1;
+  }
+  if (tok_is(t, "i64.extend16_s")) {
+    *out = WASM_INSN_I64_EXTEND16_S;
+    return 1;
+  }
+  if (tok_is(t, "i64.extend32_s")) {
+    *out = WASM_INSN_I64_EXTEND32_S;
+    return 1;
+  }
   return 0;
 }
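
The sign-extension operators added across the files above can be modeled in
plain C. This is a hedged sketch, not repository code: sign_extend and
zero_extend are hypothetical helpers, and w is assumed to be 8, 16, or 32.
The subtraction form is used so the model does not rely on
implementation-defined narrowing casts.

```c
#include <stdint.h>

/* Hypothetical model of i32.extend8_s, i32.extend16_s, i64.extend8_s,
 * i64.extend16_s and i64.extend32_s: keep the low w bits of x and
 * replicate bit w-1 into every higher bit. Assumes w is 8, 16, or 32. */
static int64_t sign_extend(uint64_t x, unsigned w) {
    uint64_t sign = (uint64_t)1 << (w - 1u);  /* bit w-1 */
    uint64_t low = x & ((sign << 1) - 1u);    /* low w bits only */
    /* (low ^ sign) - sign maps [sign, 2^w) to [-sign, 0) and leaves
     * [0, sign) unchanged. */
    return (int64_t)(low ^ sign) - (int64_t)sign;
}

/* Zero extension is the mask alone, mirroring the const + and sequence
 * that emit_convert emits for CV_ZEXT. Same assumption on w. */
static uint64_t zero_extend(uint64_t x, unsigned w) {
    uint64_t sign = (uint64_t)1 << (w - 1u);
    return x & ((sign << 1) - 1u);
}
```

With this model, the bit pattern 0x80 (the truncated form of
(signed char)-128) sign-extends from 8 bits to -128 and zero-extends to 128,
which is the difference the emit_convert comment describes.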