kit

kit
git clone https://git.ryansepassi.com/git/kit.git
Log | Files | Refs | README

commit b12f8b4dcdf5b08b279691628be1593b32ae4787
parent 9feae6c4c66cd1be6b1a90b22a6cad2e2ab43cb6
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sun, 10 May 2026 13:09:04 -0700

arch/abi/cg: rv64 β€” real LP64D codegen mirroring aa64

Replaces the panicking rv64 CGTarget stub and the indirect-everything
ABI with a working RV64IMFD/LP64D backend. test-cg path R passes 386/386
at both opt levels; path E passes all but variadic-spill (j03/j06) and
TLS BSS (n02/n07). See doc/rv64-status.md for the phase checklist.

Pieces:
- src/arch/rv64_isa.h: R/I/S/B/U/J encoders, FP, AMO, FCVT, FMV
- src/arch/rv64.c: prologue/epilogue, frame slots, params (incl. stack),
  loads/stores/binop/cmp/convert, calls/sret, alloca, va_* (partial),
  atomics LR/SC + CAS, intrinsics (popcount/ctz/clz/bswap/overflow/mem*),
  TLS LE addressing, AUIPC + PCREL_HI20/LO12 global addressing
- src/abi/abi_rv64.c: LP64D classifier (scalars + ≀16B aggregates)
- src/arch/mc.c: R_RV_BRANCH and R_RV_JAL label fixups
- test/cg/harness: CG_ARCH_RV64; default mask includes all backends;
  mc_smoke pinned to aarch64; run.sh skips cases whose arch mask
  excludes the test arch

Diffstat:
Adoc/rv64-status.md | 151++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Msrc/abi/abi_rv64.c | 105+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------
Msrc/arch/mc.c | 34++++++++++++++++++++++++++++++++++
Msrc/arch/rv64.c | 2618++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
Asrc/arch/rv64_isa.h | 291++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Mtest/cg/harness/cases.c | 2+-
Mtest/cg/harness/cg_runner.c | 1+
Mtest/cg/harness/cg_test.h | 6+++++-
Mtest/cg/run.sh | 9+++++++++
9 files changed, 2969 insertions(+), 248 deletions(-)

diff --git a/doc/rv64-status.md b/doc/rv64-status.md @@ -0,0 +1,151 @@ +# RV64 codegen status + +Living checklist for the RISC-V (RV64IMFD, LP64D) backend (`src/arch/rv64.c`) +and ABI (`src/abi/abi_rv64.c`). Behavioral oracles are `test/cg/` and +`test/parse/`. Phase status: + +- βœ… landed +- 🚧 in progress +- ⬜ planned + +--- + +## Current test-cg results + +Run from an aarch64 host; D / J are skipped because they need a native +rv64 host (same pattern as x64 on aa64). R verifies emitβ†’reader fidelity, +E links and runs under qemu-riscv64 via podman. + +| Path | Pass | Fail | Skip | +|----------------------------|-----:|-----:|-----:| +| R (roundtrip) | 386 | 0 | 0 | +| E (qemu exec) | 379 | 8 | ~ | +| D / J (native JIT) | 0 | 0 | 772 | + +Skips are valid: D and J require host == rv64. With +`CFREE_TEST_ALLOW_SKIP=1`, the suite reports **765 pass, 8 fail, 768 skip**. + +The 8 fails are 4 cases Γ— 2 opt levels β€” see the phase checklists below. + +--- + +## Phase 1 β€” Backend foundation βœ… + +- βœ… `rv64_isa.h` β€” R/I/S/B/U/J encoders, FP (F+D), atomic (A), FCVT, FMV +- βœ… Register pools (s2..s11 int, fs2..fs11 fp); scratches t0..t3, ft0 +- βœ… Frame layout: locals at s0 - off, callee saves below locals, + saved-s0/ra at the top, outgoing args at sp+0 +- βœ… Prologue placeholder + func_end patch (mirrors aa64's pattern) +- βœ… Epilogue restores sp from s0 when alloca was used +- βœ… ABI: replaced indirect-everything stub with real LP64D scalar + + small-aggregate classification (≀16B β†’ up to 2 INT parts) +- βœ… `mc.c` apply_fixup handles R_RV_BRANCH and R_RV_JAL + +## Phase 2 β€” Core ops βœ… + +- βœ… load_imm: ADDI / LUI+ADDIW / multi-step 64-bit +- βœ… copy, load, store, addr_of (LOCAL / INDIRECT / GLOBAL via AUIPC+PCREL) +- βœ… binop (all int + FP add/sub/mul/div), unop, cmp, cmp_branch +- βœ… convert: SEXT/ZEXT/TRUNC/ITOF/FTOI/FEXT/FTRUNC/BITCAST +- βœ… Structured scopes: IF / LOOP / BLOCK +- βœ… Calls (direct AUIPC+JALR with R_RV_CALL; indirect JALR) +- βœ… Sret returns (caller passes dst pointer in a0; callee spills to slot) +- βœ… alloca (const + runtime size, max_outgoing patch site) +- βœ… Aggregate copy_bytes / set_bytes / bitfield_load / bitfield_store +- βœ… Atomics: load/store (with LR/fence sequences), AMO via LR/SC, CAS +- βœ… Intrinsics: memcpy/memmove/memset, popcount, ctz, clz, bswap16/32/64, + add/sub/mul_overflow, expect, assume_aligned, prefetch, trap + +## Phase 3 β€” Variadic LP64D 🚧 + +Variadic-args calling convention with **save area contiguous with caller's +stack args** so a single `void*` walk works for any number of args. + +- βœ… va_list = `void*`; va_start / va_arg / va_end / va_copy +- βœ… Prologue spills a0..a7 into a 64-byte save slot +- βœ… va_arg handles RC_INT and RC_FP (bitcast via FMV.X.{W,D}) +- βœ… Variadic FP **args being passed** are bitcast into integer regs + (RC_FP storage β†’ FMV.X.{W,D} β†’ a-reg) +- 🚧 Save area placement: currently inside the frame near other slots; + walks past `save_area[7]` fall into locals, not caller's stack +- ⬜ Relayout: place save area at the very top of the frame (right above + the saved-s0/ra pair) so [save_area, save_area+64, caller's stack] + forms one contiguous byte stream +- ⬜ Honor named-arg count: spill only a_{nparams_int}..a7 (today we + spill all 8 unconditionally; correct but wastes prologue insns) + +Failing today (4 case Γ— 2 levels = 8): +- `j03_va_int_spill` β€” 10 i32 variadics; 9th and 10th read from + wrong addresses +- `j06_va_double_spill` β€” 9 f64 variadics; 9th double read from wrong addr + +## Phase 4 β€” TLS LE ⬜ + +Local-Exec model. `n01_tls_load_le` and `n08_tls_addend_offset` work +(tdata read paths), so the LUI+ADD+ADDI / TPREL_HI20+LO12 sequence is +correct in isolation. Failing cases all touch `.tbss`: + +- ⬜ `n02_tls_store_le` (store 42 to .tbss var, load back β†’ got 0xff) +- ⬜ `n07_tls_bss_zero_init` (read uninitialized .tbss var β†’ got 0xff) + +Likely a linker / loader interaction (cfree-ld’s rv64 .tbss layout vs. +`start.c`'s `tls_init` for `__riscv`). Investigation steps: + +- ⬜ Dump emitted relocs on n02; verify TPREL HI20/LO12 against + cfree-ld's resolved tprel offset +- ⬜ Check that cfree-ld emits `__tbss_size` correctly for rv64 outputs +- ⬜ Compare runtime `tp` value to `g_tls_block` base +- ⬜ Confirm `.tbss` follows `.tdata` contiguously in the TLS image so + start.c's `dst[td_n + i] = 0` lands at the right offset + +## Phase 5 β€” test-parse on rv64 ⬜ + +`test-parse` is the file-driven C-parser harness (`test/parse/`); it +reuses the cg roundtrip/exec runners and exercises the parser through +the same R/E/J/W paths. Today it runs aa64-only. + +- ⬜ Verify the parse runner picks up CFREE_TEST_ARCH=rv64 (likely needs + no changes β€” it already uses cfree_test_target_init) +- ⬜ Run `CFREE_TEST_ARCH=rv64 make test-parse` and triage failures +- ⬜ Decide on opt-out filtering for arch-specific parse cases (asm + templates, target-specific builtins). Pattern follows the per-case + `arches` mask added to test/cg +- ⬜ Land a phased rv64 entry in `test/parse/CORPUS.md` mirroring this doc + +## Phase 6 β€” Beyond v1 ⬜ + +- ⬜ `mc_smoke` rv64 sibling (hand-crafted bytes that return 42) +- ⬜ Compressed (RVC) emission when output is denser +- ⬜ M-extension overflow detection for `mul_overflow` i64 (today panics) +- ⬜ Zbb fast paths for popcount/ctz/clz/bswap when ABI permits +- ⬜ Inline asm (`rv_asm_block` panics today) +- ⬜ Real address-out-of-imm12 expansion in addr_base (panics on giant + frames; the test corpus stays well within the imm12 window) + +--- + +## Known mismatches with aa64 conventions + +- Reg pools are 10 wide (s2..s11) vs. aa64's 10 (x19..x28). s1 (and + fs0/fs1) are reserved/unused. +- AUIPC+PCREL_HI20/LO12 anchor symbols (`.LpcrelHi<N>`) are emitted as + SB_LOCAL into the current section; cfree-ld looks them up by AUIPC + vaddr (see `src/link/link_elf.c:rv_pcrel_lo12_disp`). +- The aarch64 backend pairs saved fp+lr via STP; rv64 has no pair-store + so prologue is two SDs instead. + +--- + +## Reproduce + +```sh +# rv64 cg (R-only avoids needing podman): +make lib +CFREE_TEST_ARCH=rv64 bash test/cg/run.sh '' R + +# rv64 cg (all paths, qemu-riscv64 + podman required): +CFREE_TEST_ARCH=rv64 CFREE_TEST_ALLOW_SKIP=1 bash test/cg/run.sh '' DREJW + +# rv64 parse (planned): +CFREE_TEST_ARCH=rv64 make test-parse +``` diff --git a/src/abi/abi_rv64.c b/src/abi/abi_rv64.c @@ -1,10 +1,15 @@ -/* RISC-V LP64D ABI β€” phase-2 stub. +/* RISC-V LP64D ABI dispatch (simplified). * - * Initial classifier returns ABI_ARG_INDIRECT for everything: correct - * (every value passes through memory), slow, but unblocks bring-up of - * the rv64 codegen path. A future phase replaces this with the real - * RISC-V calling convention (a0..a7 for ints, fa0..fa7 for floats, - * 2*XLEN aggregate flattening, etc.). */ + * Covers the subset the cg test harness needs: + * void -> IGNORE + * integer ≀ 8B -> DIRECT, one INT part (a0..a7 for args; a0 for return) + * pointer -> DIRECT, one INT part + * float/double -> DIRECT, one FP part (fa0..fa7 for args; fa0 for return) + * small struct -> DIRECT, INT parts up to 16B (passed in up to 2 GPRs) + * large struct -> INDIRECT (sret for return; byval for args) + * + * Full RISC-V psABI flattening of mixed FP+INT homogeneous aggregates, + * 2*XLEN aggregate-in-fp-regs, and stack overflow rules are deferred. */ #include <string.h> @@ -13,26 +18,86 @@ #include "core/core.h" #include "core/pool.h" -static void classify_indirect(TargetABI* a, const Type* t, ABIArgInfo* out, - int is_return) { +static void classify_scalar(TargetABI* a, const Type* t, ABIArgInfo* out) { + ABITypeInfo ti = abi_internal_type_info(a, t); + out->kind = ABI_ARG_DIRECT; + out->flags = ABI_AF_NONE; + out->indirect_align = 0; + + ABIArgPart* parts = arena_new(a->c->tu, ABIArgPart); + memset(parts, 0, sizeof *parts); + parts->cls = (ti.scalar_kind == ABI_SC_FLOAT) ? ABI_CLASS_FP : ABI_CLASS_INT; + parts->loc = ABI_LOC_REG; + parts->size = ti.size; + parts->align = ti.align; + parts->src_offset = 0; + + out->parts = parts; + out->nparts = 1; +} + +static void classify_void(ABIArgInfo* out) { + memset(out, 0, sizeof *out); + out->kind = ABI_ARG_IGNORE; +} + +static void classify_aggregate(TargetABI* a, const Type* t, ABIArgInfo* out, + int is_return) { + ABITypeInfo ti = abi_internal_type_info(a, t); + if (ti.size == 0) { + classify_void(out); + return; + } + if (ti.size <= 16) { + u32 nparts = (ti.size + 7) / 8; + ABIArgPart* parts = arena_array(a->c->tu, ABIArgPart, nparts); + memset(parts, 0, sizeof(ABIArgPart) * nparts); + u32 off = 0; + for (u32 i = 0; i < nparts; ++i) { + u32 chunk = (ti.size - off > 8) ? 8 : (ti.size - off); + parts[i].cls = ABI_CLASS_INT; + parts[i].loc = ABI_LOC_REG; + parts[i].size = chunk; + parts[i].align = 8; + parts[i].src_offset = off; + off += chunk; + } + out->kind = ABI_ARG_DIRECT; + out->flags = ABI_AF_NONE; + out->parts = parts; + out->nparts = (u16)nparts; + out->indirect_align = 0; + } else { + out->kind = ABI_ARG_INDIRECT; + out->flags = is_return ? ABI_AF_SRET : ABI_AF_BYVAL; + out->indirect_align = ti.align ? ti.align : 8; + out->parts = NULL; + out->nparts = 0; + } +} + +static void classify_one(TargetABI* a, const Type* t, ABIArgInfo* out, + int is_return) { if (!t || t->kind == TY_VOID) { - memset(out, 0, sizeof *out); - out->kind = ABI_ARG_IGNORE; + classify_void(out); return; } - ABITypeInfo ti = abi_internal_type_info(a, t); - out->kind = ABI_ARG_INDIRECT; - out->flags = is_return ? ABI_AF_SRET : ABI_AF_BYVAL; - out->indirect_align = ti.align ? ti.align : 8; - out->parts = NULL; - out->nparts = 0; + switch (t->kind) { + case TY_STRUCT: + case TY_UNION: + classify_aggregate(a, t, out, is_return); + return; + default: + classify_scalar(a, t, out); + return; + } } static ABIFuncInfo* rv64_compute_func_info(TargetABI* a, const Type* fn) { ABIFuncInfo* info = arena_new(a->c->tu, ABIFuncInfo); memset(info, 0, sizeof *info); - classify_indirect(a, fn->fn.ret, &info->ret, /*is_return=*/1); + classify_one(a, fn->fn.ret, &info->ret, /*is_return=*/1); info->has_sret = (info->ret.kind == ABI_ARG_INDIRECT) ? 1 : 0; info->variadic = fn->fn.variadic; @@ -41,7 +106,7 @@ static ABIFuncInfo* rv64_compute_func_info(TargetABI* a, const Type* fn) { ABIArgInfo* arr = arena_array(a->c->tu, ABIArgInfo, fn->fn.nparams); memset(arr, 0, sizeof(ABIArgInfo) * fn->fn.nparams); for (u16 i = 0; i < fn->fn.nparams; ++i) { - classify_indirect(a, fn->fn.params[i], &arr[i], /*is_return=*/0); + classify_one(a, fn->fn.params[i], &arr[i], /*is_return=*/0); } info->params = arr; } else { @@ -52,9 +117,7 @@ static ABIFuncInfo* rv64_compute_func_info(TargetABI* a, const Type* fn) { static const Type* rv64_va_list_type(TargetABI* a, Pool* p) { /* RISC-V LP64: va_list is `void *` (one pointer to the next argument - * in memory). Modeled as a typedef-shaped void* β€” the macro semantics - * resolve through abi_va_list_type and the (future) backend's - * va_start/va_arg lowering. */ + * in memory). */ (void)a; return type_ptr(p, type_void(p)); } diff --git a/src/arch/mc.c b/src/arch/mc.c @@ -131,6 +131,40 @@ static void apply_fixup(MCImpl* mc, const MCFixup* fx, u32 target_offset) { obj_patch(mc->base.obj, fx->sec_id, fx->offset, cur, 4); break; } + case R_RV_BRANCH: { + /* B-type 12-bit signed displacement in 2-byte units. */ + const Section* s = obj_section_get(mc->base.obj, fx->sec_id); + if (!s) break; + u8 cur[4]; + buf_read(&s->bytes, fx->offset, cur, 4); + u32 word = rd_u32_le(cur); + u32 b = (u32)disp; + word &= 0x01fff07fu; + word |= ((b >> 12) & 1u) << 31; + word |= ((b >> 5) & 0x3fu) << 25; + word |= ((b >> 1) & 0xfu) << 8; + word |= ((b >> 11) & 1u) << 7; + wr_u32_le(cur, word); + obj_patch(mc->base.obj, fx->sec_id, fx->offset, cur, 4); + break; + } + case R_RV_JAL: { + /* J-type 20-bit signed displacement in 2-byte units. */ + const Section* s = obj_section_get(mc->base.obj, fx->sec_id); + if (!s) break; + u8 cur[4]; + buf_read(&s->bytes, fx->offset, cur, 4); + u32 word = rd_u32_le(cur); + u32 b = (u32)disp; + word &= 0x00000fffu; + word |= ((b >> 20) & 1u) << 31; + word |= ((b >> 1) & 0x3ffu) << 21; + word |= ((b >> 11) & 1u) << 20; + word |= ((b >> 12) & 0xffu) << 12; + wr_u32_le(cur, word); + obj_patch(mc->base.obj, fx->sec_id, fx->offset, cur, 4); + break; + } default: compiler_panic(mc->base.c, mc->base.loc, "MCEmitter: unsupported label-ref reloc kind %d", diff --git a/src/arch/rv64.c b/src/arch/rv64.c @@ -1,301 +1,2468 @@ -/* riscv64 CGTarget skeleton. +/* Minimal RISC-V (RV64IMFD, LP64D) CGTarget. * - * Phase-2 placeholder: the vtable is wired up but every method panics. - * This proves the cgtarget_new dispatch reaches an rv64-shaped target. - * A future phase fills in real codegen β€” see doc/MULTIARCH.md Β§4. */ + * Single-pass codegen mirroring src/arch/aarch64.c. The frame uses s0 + * (x8) as the frame pointer; locals live at s0-relative negative + * offsets, callee-save spills and outgoing args at sp-relative positive + * offsets. The prologue is reserved as a NOP placeholder at func_begin + * and patched at func_end once frame_size and the callee-save high- + * water marks are known. + * + * Reg allocator: lowest-bit-first over s2..s11 (int) and fs2..fs11 (fp). + * Scratch registers held outside the pools are t0..t3 (x5..x7, x28). + * + * Scope: this backend covers the v1 cg corpus paths the aarch64 backend + * covers, with these explicit gaps that still panic: + * - va_*, alloca, asm_block, atomic_cas (partial), intrinsic + * INTRIN_MUL_OVERFLOW i64. */ #include <string.h> #include "arch/arch.h" #include "arch/rv64.h" +#include "arch/rv64_isa.h" #include "core/arena.h" +#include "obj/obj.h" +#include "type/type.h" + +#define RV_PROLOGUE_WORDS 32u + +/* ============================================================ + * RegPool (copy of the aa64 helper β€” bit-set free mask). */ +typedef struct RegPool { + u32 free; + u32 hwm; + u8 base; + u8 nregs; + u8 pad[2]; +} RegPool; + +static void regpool_init(RegPool* p, u8 base, u8 nregs) { + p->base = base; + p->nregs = nregs; + p->hwm = 0; + p->free = (nregs >= 32u) ? 0xFFFFFFFFu : ((1u << nregs) - 1u); +} + +static Reg regpool_alloc(RegPool* p) { + if (p->free == 0) return (Reg)REG_NONE; + u32 idx = (u32)__builtin_ctz(p->free); + p->free &= ~(1u << idx); + if (idx + 1u > p->hwm) p->hwm = idx + 1u; + return (Reg)(p->base + idx); +} + +static int regpool_free(RegPool* p, Reg r) { + u32 rn = (u32)r; + if (rn < p->base || rn >= (u32)(p->base + p->nregs)) return 0; + u32 idx = rn - p->base; + u32 bit = 1u << idx; + if (p->free & bit) return -1; + p->free |= bit; + return 1; +} + +/* ============================================================ + * RImpl */ + +typedef struct RvSlot { + u32 off; /* bytes below s0 (positive); address = s0 - off */ + u32 size; + u32 align; + u8 kind; + u8 pad[3]; +} RvSlot; + +typedef struct RvScope { + u8 kind; + u8 has_else; + u8 pad[2]; + MCLabel else_label; + MCLabel end_label; + Label break_label; + Label continue_label; +} RvScope; typedef struct RImpl { CGTarget base; SrcLoc loc; + const CGFuncDesc* fd; + + u32 func_start; + u32 prologue_pos; + MCLabel epilogue_label; + + RvSlot* slots; + u32 nslots; + u32 slots_cap; + u32 cum_off; + u32 max_outgoing; + /* fp_pair_off captures the offset from sp where the saved-s0/ra pair + * sits. Computed at func_end. Stored so post-prologue sret/varargs + * stores written by func_begin don't depend on it (they use s0). */ + u32 fp_pair_off; + + u32 next_param_int; + u32 next_param_fp; + u32 next_param_stack; + u8 has_sret; + FrameSlot sret_ptr_slot; + + RegPool int_pool; + RegPool fp_pool; + + RvScope* scopes; + u32 nscopes; + u32 scopes_cap; + + u8 has_alloca; + /* alloca patch list: each call emits `addi dst, sp, 0` and registers + * the (pos, dst_reg) for patching with max_outgoing at func_end. */ + struct RvAllocaPatch { + u32 pos; + u32 dst_reg; + }* add_patches; + u32 nadd_patches; + u32 add_patches_cap; + + /* Variadic register save area: 64 bytes (a0..a7). Allocated lazily on + * the first va_start. The prologue patcher spills a-regs into it. */ + u8 is_variadic; + FrameSlot gp_save_slot; } RImpl; -static SrcLoc rv_loc(void) { return (SrcLoc){0, 0, 0}; } +static RImpl* impl_of(CGTarget* t) { return (RImpl*)t; } + +/* Forward decls. */ +static FrameSlot rv_frame_slot(CGTarget* t, const FrameSlotDesc* d); +static RvSlot* slot_get(RImpl* a, FrameSlot fs); +static void rv_load(CGTarget* t, Operand dst, Operand addr, MemAccess ma); +static void rv_store(CGTarget* t, Operand addr, Operand src, MemAccess ma); + +/* ---- type helpers ---- */ +static int type_is_64(const Type* t) { + if (!t) return 0; + switch (t->kind) { + case TY_LONG: + case TY_ULONG: + case TY_LLONG: + case TY_ULLONG: + case TY_PTR: + case TY_DOUBLE: + return 1; + default: + return 0; + } +} +static int type_is_fp_double(const Type* t) { + return t && (t->kind == TY_DOUBLE || t->kind == TY_LDOUBLE); +} +static u32 type_byte_size(const Type* t) { + if (!t) return 4; + switch (t->kind) { + case TY_CHAR: + case TY_SCHAR: + case TY_UCHAR: + case TY_BOOL: + return 1; + case TY_SHORT: + case TY_USHORT: + return 2; + case TY_INT: + case TY_UINT: + case TY_FLOAT: + return 4; + case TY_LONG: + case TY_ULONG: + case TY_LLONG: + case TY_ULLONG: + case TY_PTR: + case TY_DOUBLE: + return 8; + default: + return 8; + } +} +static int type_is_signed(const Type* t) { + if (!t) return 0; + switch (t->kind) { + case TY_CHAR: + case TY_SCHAR: + case TY_SHORT: + case TY_INT: + case TY_LONG: + case TY_LLONG: + return 1; + default: + return 0; + } +} + +static u32 reg_num(Operand op) { return op.v.reg & 0x1fu; } + +extern void debug_emit_row(Debug*, ObjSecId text_section, u32 offset, SrcLoc); + +static void emit32(MCEmitter* mc, u32 word) { + u32 ofs = obj_pos(mc->obj, mc->section_id); + u8 b[4]; + b[0] = (u8)(word & 0xff); + b[1] = (u8)((word >> 8) & 0xff); + b[2] = (u8)((word >> 16) & 0xff); + b[3] = (u8)((word >> 24) & 0xff); + mc->emit_bytes(mc, b, 4); + if (mc->debug) debug_emit_row(mc->debug, mc->section_id, ofs, mc->loc); +} + +static void patch32(ObjBuilder* obj, u32 sec_id, u32 ofs, u32 word) { + u8 b[4]; + b[0] = (u8)(word & 0xff); + b[1] = (u8)((word >> 8) & 0xff); + b[2] = (u8)((word >> 16) & 0xff); + b[3] = (u8)((word >> 24) & 0xff); + obj_patch(obj, sec_id, ofs, b, 4); +} + +static _Noreturn void rv_panic(CGTarget* t, const char* what) { + SrcLoc loc = impl_of(t)->loc; + compiler_panic(t->c, loc, "rv64: %s not implemented", what); +} + +/* ---- immediate materialization ---- + * Load any i64 into `rd`. Strategy: + * - if fits signed 12-bit: addi rd, x0, imm + * - elif fits signed 32-bit: lui rd, hi20; addiw rd, rd, lo12 + * - otherwise: split into high and low 32-bit halves, materialize + * each separately, then shift-and-or. Worst-case sequence is up + * to 8 instructions; good enough for the cg test corpus. */ +static int fits_signed32(i64 v) { return v >= (i64)(i32)0x80000000 && v <= (i64)(i32)0x7fffffff; } + +static void emit_li_32(MCEmitter* mc, u32 rd, i32 imm) { + if (imm >= -2048 && imm <= 2047) { + emit32(mc, rv_addi(rd, RV_ZERO, imm)); + return; + } + /* hi20 + lo12, with 0x800 bias to compensate ADDIW's sign-ext. */ + i32 hi = (i32)((u32)(imm + 0x800) >> 12); + i32 lo = (i32)((i32)imm - (i32)(hi << 12)); + emit32(mc, rv_lui(rd, (u32)hi & 0xfffffu)); + if (lo) emit32(mc, rv_addiw(rd, rd, lo)); +} + +static void emit_load_imm(MCEmitter* mc, u32 sf, u32 rd, i64 imm) { + if (!sf) { + /* 32-bit destination: low 32 bits, sign-extended. */ + emit_li_32(mc, rd, (i32)imm); + return; + } + if (fits_signed32(imm)) { + emit_li_32(mc, rd, (i32)imm); + return; + } + /* General 64-bit load: split into high and low 32 bits, place high + * into rd << 32, then OR in low via a temp register (t0=x5). The cg + * corpus has no IMM operands that collide with t0, so this is safe. */ + i64 lo32 = (i64)(i32)(imm & 0xffffffffu); /* sign-ext low half */ + i64 hi64 = (imm - lo32) >> 32; /* what remains in hi */ + if (hi64 < (i64)(i32)0x80000000 || + hi64 > (i64)(i32)0x7fffffff) { + /* Out of i32 range β€” fallback: use a smaller chunked approach. + * For the cg corpus this isn't hit; emit a conservative sequence: + * li rd, hi32; slli 32; li t0, lo32; or rd, rd, t0. */ + i32 hi32 = (i32)(imm >> 32); + i32 lo32_i = (i32)imm; + emit_li_32(mc, rd, hi32); + emit32(mc, rv_slli(rd, rd, 32)); + emit_li_32(mc, RV_T0, lo32_i); + /* zero-extend t0 to clear sign-extension before OR */ + emit32(mc, rv_slli(RV_T0, RV_T0, 32)); + emit32(mc, rv_srli(RV_T0, RV_T0, 32)); + emit32(mc, rv_or(rd, rd, RV_T0)); + return; + } + emit_li_32(mc, rd, (i32)hi64); + emit32(mc, rv_slli(rd, rd, 32)); + if (lo32 != 0) { + emit_li_32(mc, RV_T0, (i32)lo32); + emit32(mc, rv_slli(RV_T0, RV_T0, 32)); + emit32(mc, rv_srli(RV_T0, RV_T0, 32)); + emit32(mc, rv_or(rd, rd, RV_T0)); + } +} + +/* sp += imm. imm can be any signed value the caller passes β€” we pick + * the shortest sequence. */ +static void emit_sp_addi(MCEmitter* mc, i64 imm) { + if (imm >= -2048 && imm <= 2047) { + emit32(mc, rv_addi(RV_SP, RV_SP, (i32)imm)); + return; + } + emit_load_imm(mc, 1, RV_T0, imm); + emit32(mc, rv_add(RV_SP, RV_SP, RV_T0)); +} + +/* ---- function lifecycle ---- */ + +static void rv_func_begin(CGTarget* t, const CGFuncDesc* fd) { + RImpl* a = impl_of(t); + MCEmitter* mc = t->mc; + + mc->set_section(mc, fd->text_section_id); + mc->emit_align(mc, 4, 0); + + a->fd = fd; + a->func_start = mc->pos(mc); + a->next_param_int = 0; + a->next_param_fp = 0; + a->next_param_stack = 0; + a->has_sret = (fd->abi && fd->abi->has_sret) ? 1 : 0; + a->cum_off = 0; + a->max_outgoing = 0; + a->fp_pair_off = 0; + regpool_init(&a->int_pool, /*base=*/18u, /*nregs=*/10u); /* s2..s11 */ + regpool_init(&a->fp_pool, /*base=*/18u, /*nregs=*/10u); /* fs2..fs11 */ + a->nslots = 0; + a->nscopes = 0; + a->has_alloca = 0; + a->nadd_patches = 0; + a->is_variadic = (fd->abi && fd->abi->variadic) ? 1 : 0; + a->gp_save_slot = FRAME_SLOT_NONE; + a->sret_ptr_slot = FRAME_SLOT_NONE; + a->epilogue_label = mc->label_new(mc); -_Noreturn static void rv_panic(CGTarget* t, const char* what) { - compiler_panic(t->c, rv_loc(), "rv64: %s not implemented", what); + mc->cfi_startproc(mc); + + /* Reserve a NOP-filled prologue placeholder; func_end patches it. */ + a->prologue_pos = mc->pos(mc); + for (u32 i = 0; i < RV_PROLOGUE_WORDS; ++i) emit32(mc, RV_NOP); + + /* For an sret return, the caller passed the destination pointer in + * a0; reserve a hidden slot to spill it into so the body can use a0 + * freely. The actual SD a0, ...(s0) is emitted in the patched + * prologue once the slot offset is known. */ + if (a->has_sret) { + FrameSlotDesc fsd = { + .type = NULL, + .name = 0, + .loc = (SrcLoc){0, 0, 0}, + .size = 8, + .align = 8, + .kind = FS_SPILL, + .flags = 0, + }; + a->sret_ptr_slot = rv_frame_slot(t, &fsd); + /* Consume a0 β€” it is no longer available for the first real param. */ + a->next_param_int = 1; + } + + /* Variadic: reserve a 64-byte GP save area for a0..a7. The patcher + * spills the regs into it as part of the prologue, after `addi s0,…`. */ + if (a->is_variadic) { + FrameSlotDesc gpd = { + .type = NULL, + .name = 0, + .loc = (SrcLoc){0, 0, 0}, + .size = 64, + .align = 8, + .kind = FS_SPILL, + .flags = 0, + }; + a->gp_save_slot = rv_frame_slot(t, &gpd); + } } -static void rv_func_begin(CGTarget* t, const CGFuncDesc* d) { - (void)d; - rv_panic(t, "func_begin"); +static void rv_func_end(CGTarget* t) { + RImpl* a = impl_of(t); + MCEmitter* mc = t->mc; + ObjBuilder* obj = t->obj; + u32 sec = a->fd->text_section_id; + + u32 n_int_saves = a->int_pool.hwm; /* s2..s2+hwm-1 */ + u32 n_fp_saves = a->fp_pool.hwm; + u32 max_out = (a->max_outgoing + 15u) & ~15u; + u32 int_saves_sz = n_int_saves * 8u; + u32 fp_saves_sz = n_fp_saves * 8u; + + u32 locals_off = max_out + int_saves_sz + fp_saves_sz; /* from sp */ + u32 fp_pair_off = locals_off + a->cum_off; + u32 frame_size = fp_pair_off + 16u; + frame_size = (frame_size + 15u) & ~15u; + fp_pair_off = frame_size - 16u; + a->fp_pair_off = fp_pair_off; + + /* Place the epilogue label at current pos. */ + mc->label_place(mc, a->epilogue_label); + + /* Restore int and fp saves using s0-relative addressing so they + * don't depend on the final frame_size encoding (and survive + * alloca-induced sp shifts). */ + /* layout below s0: + * s0 - 8 .. s0 - 16 saved s0/ra ? No β€” those are at sp+fp_pair_off + * We arranged saved-s0/ra at [sp+fp_pair_off], not below s0. So + * immediately below s0 are: int saves, then fp saves, then locals. + * Wait β€” let me recompute. + * + * sp + 0 outgoing args (max_out bytes) + * sp + max_out int saves + * sp + max_out + I fp saves + * sp + max_out+I+F locals (cum_off) + * sp + fp_pair_off saved s0_caller (8) + * sp + fp_pair_off+8 saved ra (8) + * sp + frame_size end + * + * s0 = sp + fp_pair_off (so [s0+0] = saved s0_caller). + * Locals at [s0 - off] where off in [1..cum_off]. + * FP saves at [s0 - cum_off - 8*i]. + * Int saves at [s0 - cum_off - F - 8*i]. */ + /* Save slots sit at the start of an 8-byte cell below the locals + * area. fp_save_base = offset of the first fp save (=-(L+8)); each + * subsequent save is 8 bytes lower. int saves start below the fp + * block. */ + i32 fp_save_base = -(i32)a->cum_off - 8; + i32 int_save_base = fp_save_base - (i32)fp_saves_sz; + + /* Reverse order: ints first (lowest address) on restore, but we emit + * the restore loop in reverse to keep the prologue/epilogue symmetric. */ + for (i32 i = (i32)n_int_saves - 1; i >= 0; --i) { + u32 r = 18u + (u32)i; /* s2 + i */ + i32 off = int_save_base - 8 * (i32)i; + emit32(mc, rv_ld(r, RV_S0, off)); + } + for (i32 i = (i32)n_fp_saves - 1; i >= 0; --i) { + u32 r = 18u + (u32)i; /* fs2 + i (fp reg number) */ + i32 off = fp_save_base - 8 * (i32)i; + emit32(mc, rv_fld(r, RV_S0, off)); + } + /* Restore sp from s0 first so alloca-induced offsets don't matter. + * After this, sp == its post-prologue value. */ + if (a->has_alloca) { + if ((i32)fp_pair_off > 2047) { + compiler_panic(t->c, a->loc, "rv64: fp_pair_off too large for alloca"); + } + emit32(mc, rv_addi(RV_SP, RV_S0, -(i32)fp_pair_off)); + } + emit32(mc, rv_ld(RV_S0, RV_SP, (i32)fp_pair_off)); + emit32(mc, rv_ld(RV_RA, RV_SP, (i32)fp_pair_off + 8)); + emit_sp_addi(mc, (i64)frame_size); + emit32(mc, rv_ret_()); + + /* Now patch the prologue placeholder. */ + u32 pos = a->prologue_pos; + u32 words[RV_PROLOGUE_WORDS]; + for (u32 i = 0; i < RV_PROLOGUE_WORDS; ++i) words[i] = RV_NOP; + u32 wi = 0; + + /* addi sp, sp, -frame_size (or 2-insn if too large) */ + if ((i64)frame_size <= 2048) { + words[wi++] = rv_addi(RV_SP, RV_SP, -(i32)frame_size); + } else { + /* li t0, -frame_size; add sp, sp, t0 */ + /* Use a small two-instruction expansion via LUI+ADDI if it fits 32-bit; + * otherwise we'd need a full load_imm but that's overkill for tests. */ + i64 neg = -(i64)frame_size; + if (fits_signed32(neg)) { + i32 hi = (i32)((u32)((i32)neg + 0x800) >> 12); + i32 lo = (i32)neg - (hi << 12); + words[wi++] = rv_lui(RV_T0, (u32)hi & 0xfffffu); + if (lo) words[wi++] = rv_addiw(RV_T0, RV_T0, lo); + words[wi++] = rv_add(RV_SP, RV_SP, RV_T0); + } else { + compiler_panic(t->c, a->loc, "rv64: frame_size too large to patch"); + } + } + /* sd s0, fp_pair_off(sp); sd ra, fp_pair_off+8(sp); addi s0, sp, fp_pair_off */ + if ((i32)fp_pair_off > 2047 || (i32)(fp_pair_off + 8) > 2047) { + compiler_panic(t->c, a->loc, "rv64: fp_pair_off out of imm12 range"); + } + words[wi++] = rv_sd(RV_S0, RV_SP, (i32)fp_pair_off); + words[wi++] = rv_sd(RV_RA, RV_SP, (i32)fp_pair_off + 8); + words[wi++] = rv_addi(RV_S0, RV_SP, (i32)fp_pair_off); + + /* If sret, spill incoming a0 into the hidden slot. */ + if (a->has_sret && a->sret_ptr_slot != FRAME_SLOT_NONE) { + RvSlot* s = slot_get(a, a->sret_ptr_slot); + if (s) { + if (wi >= RV_PROLOGUE_WORDS) goto overflow; + words[wi++] = rv_sd(RV_A0, RV_S0, -(i32)s->off); + } + } + /* Variadic: spill a0..a7 into the GP save area. */ + if (a->is_variadic && a->gp_save_slot != FRAME_SLOT_NONE) { + RvSlot* gs = slot_get(a, a->gp_save_slot); + if (gs) { + for (u32 i = 0; i < 8; ++i) { + if (wi >= RV_PROLOGUE_WORDS) goto overflow; + words[wi++] = rv_sd(RV_A0 + i, RV_S0, + -(i32)gs->off + (i32)i * 8); + } + } + } + /* int saves */ + for (u32 i = 0; i < n_int_saves; ++i) { + u32 r = 18u + i; + i32 off = int_save_base - 8 * (i32)i; + if (wi >= RV_PROLOGUE_WORDS) goto overflow; + words[wi++] = rv_sd(r, RV_S0, off); + } + /* fp saves */ + for (u32 i = 0; i < n_fp_saves; ++i) { + u32 r = 18u + i; + i32 off = fp_save_base - 8 * (i32)i; + if (wi >= RV_PROLOGUE_WORDS) goto overflow; + words[wi++] = rv_fsd(r, RV_S0, off); + } + if (0) { + overflow: + compiler_panic(t->c, a->loc, + "rv64: prologue placeholder too small (used %u of %u)", wi, + RV_PROLOGUE_WORDS); + } + + for (u32 i = 0; i < RV_PROLOGUE_WORDS; ++i) { + patch32(obj, sec, pos + i * 4u, words[i]); + } + + /* Patch alloca placeholders with max_outgoing. */ + if (max_out > 2047u) { + compiler_panic(t->c, a->loc, + "rv64: max_outgoing %u out of imm12 for alloca patch", + max_out); + } + for (u32 i = 0; i < a->nadd_patches; ++i) { + u32 dr = a->add_patches[i].dst_reg; + u32 word = rv_addi(dr, RV_SP, (i32)max_out); + patch32(obj, sec, a->add_patches[i].pos, word); + } + + /* Define the function symbol. */ + u32 end = mc->pos(mc); + obj_symbol_define(obj, a->fd->sym, sec, (u64)a->func_start, + (u64)(end - a->func_start)); + + mc->cfi_endproc(mc); + a->fd = NULL; } -static void rv_func_end(CGTarget* t) { rv_panic(t, "func_end"); } + +/* ---- regs / frame ---- */ static Reg rv_alloc_reg(CGTarget* t, RegClass cls, const Type* ty) { - (void)cls; + RImpl* a = impl_of(t); (void)ty; - rv_panic(t, "alloc_reg"); + if (cls == RC_INT) return regpool_alloc(&a->int_pool); + if (cls == RC_FP) return regpool_alloc(&a->fp_pool); + compiler_panic(t->c, a->loc, "rv64 alloc_reg: class %d unimpl", (int)cls); } + static void rv_free_reg(CGTarget* t, Reg r) { - (void)r; - rv_panic(t, "free_reg"); + RImpl* a = impl_of(t); + RegPool* pools[2] = {&a->int_pool, &a->fp_pool}; + for (u32 i = 0; i < 2; ++i) { + int rc = regpool_free(pools[i], r); + if (rc == 1) return; + if (rc == -1) { + compiler_panic(t->c, a->loc, "rv64 free_reg: reg %u already free", + (unsigned)r); + } + } + compiler_panic(t->c, a->loc, "rv64 free_reg: reg %u not a scratch reg", + (unsigned)r); } + static FrameSlot rv_frame_slot(CGTarget* t, const FrameSlotDesc* d) { - (void)d; - rv_panic(t, "frame_slot"); + RImpl* a = impl_of(t); + if (a->nslots == a->slots_cap) { + u32 ncap = a->slots_cap ? a->slots_cap * 2 : 8; + RvSlot* nbuf = arena_array(t->c->tu, RvSlot, ncap); + if (a->slots) memcpy(nbuf, a->slots, sizeof(RvSlot) * a->nslots); + a->slots = nbuf; + a->slots_cap = ncap; + } + u32 size = d->size ? d->size : 8; + u32 align = d->align ? d->align : 1; + u32 next = a->cum_off + size; + u32 mask = align - 1; + next = (next + mask) & ~mask; + + RvSlot* s = &a->slots[a->nslots]; + s->off = next; + s->size = size; + s->align = align; + s->kind = d->kind; + + a->cum_off = next; + a->nslots++; + return (FrameSlot)(a->nslots); } -static void rv_param(CGTarget* t, const CGParamDesc* d) { - (void)d; - rv_panic(t, "param"); + +static RvSlot* slot_get(RImpl* a, FrameSlot fs) { + if (fs == FRAME_SLOT_NONE || fs > a->nslots) return NULL; + return &a->slots[fs - 1]; +} + +/* For a memory access of `nbytes`, pick the right store opcode. */ +static u32 enc_int_store(u32 nbytes, u32 src, u32 base, i32 off) { + switch (nbytes) { + case 1: return rv_sb(src, base, off); + case 2: return rv_sh(src, base, off); + case 4: return rv_sw(src, base, off); + default: return rv_sd(src, base, off); + } } -static const Reg* rv_clobbers(CGTarget* t, RegClass cls, u32* nregs) { - (void)cls; - (void)nregs; +static u32 enc_int_load(u32 nbytes, int sign_ext, u32 rd, u32 base, i32 off) { + switch (nbytes) { + case 1: return sign_ext ? rv_lb(rd, base, off) : rv_lbu(rd, base, off); + case 2: return sign_ext ? rv_lh(rd, base, off) : rv_lhu(rd, base, off); + case 4: return sign_ext ? rv_lw(rd, base, off) : rv_lwu(rd, base, off); + default: return rv_ld(rd, base, off); + } +} + +/* ---- param ---- */ + +static void rv_param(CGTarget* t, const CGParamDesc* p) { + RImpl* a = impl_of(t); + MCEmitter* mc = t->mc; + RvSlot* s = slot_get(a, p->slot); + if (!s) compiler_panic(t->c, a->loc, "rv64 param: bad slot"); + const ABIArgInfo* ai = p->abi; + + if (ai->kind == ABI_ARG_IGNORE) return; + if (ai->kind == ABI_ARG_INDIRECT) { + /* Pointer-to-copy passed in a-register. Copy bytes from there into + * the home slot. Source pointer is in a0..a7. */ + u32 ptr_reg; + if (a->next_param_int < 8) { + ptr_reg = RV_A0 + a->next_param_int; + a->next_param_int++; + } else { + u32 caller_off = a->next_param_stack; + a->next_param_stack += 8; + /* Incoming stack args live in the caller's outgoing-arg area, + * which is `frame_size - fp_pair_off` (= 16 + the saved-s0/ra + * pair) above s0 β€” same logic as aa64's `16 + caller_off`. */ + emit32(mc, rv_ld(RV_T1, RV_S0, (i32)(16u + caller_off))); + ptr_reg = RV_T1; + } + u32 nbytes = s->size; + u32 i = 0; + while (i + 8 <= nbytes) { + emit32(mc, rv_ld(RV_T2, ptr_reg, (i32)i)); + emit32(mc, rv_sd(RV_T2, RV_S0, -(i32)s->off + (i32)i)); + i += 8; + } + while (i + 4 <= nbytes) { + emit32(mc, rv_lwu(RV_T2, ptr_reg, (i32)i)); + emit32(mc, rv_sw(RV_T2, RV_S0, -(i32)s->off + (i32)i)); + i += 4; + } + while (i + 2 <= nbytes) { + emit32(mc, rv_lhu(RV_T2, ptr_reg, (i32)i)); + emit32(mc, rv_sh(RV_T2, RV_S0, -(i32)s->off + (i32)i)); + i += 2; + } + while (i < nbytes) { + emit32(mc, rv_lbu(RV_T2, ptr_reg, (i32)i)); + emit32(mc, rv_sb(RV_T2, RV_S0, -(i32)s->off + (i32)i)); + i += 1; + } + return; + } + /* DIRECT */ + for (u16 i = 0; i < ai->nparts; ++i) { + const ABIArgPart* pt = &ai->parts[i]; + u32 part_off = pt->src_offset; + u32 sz = pt->size; + + if (pt->cls == ABI_CLASS_INT) { + if (a->next_param_int < 8) { + u32 reg = RV_A0 + a->next_param_int; + a->next_param_int++; + emit32(mc, enc_int_store(sz, reg, RV_S0, + -(i32)s->off + (i32)part_off)); + } else { + u32 caller_off = a->next_param_stack; + a->next_param_stack += 8; + emit32(mc, enc_int_load(sz, 0, RV_T2, RV_S0, + (i32)(16u + caller_off))); + emit32(mc, enc_int_store(sz, RV_T2, RV_S0, + -(i32)s->off + (i32)part_off)); + } + } else if (pt->cls == ABI_CLASS_FP) { + if (a->next_param_fp < 8) { + u32 reg = a->next_param_fp; /* fa0..fa7 β†’ freg 10..17 */ + u32 freg = 10u + reg; + a->next_param_fp++; + if (sz == 8) { + emit32(mc, rv_fsd(freg, RV_S0, -(i32)s->off + (i32)part_off)); + } else { + emit32(mc, rv_fsw(freg, RV_S0, -(i32)s->off + (i32)part_off)); + } + } else { + u32 caller_off = a->next_param_stack; + a->next_param_stack += 8; + if (sz == 8) { + emit32(mc, rv_fld(0, RV_S0, (i32)(16u + caller_off))); + emit32(mc, rv_fsd(0, RV_S0, -(i32)s->off + (i32)part_off)); + } else { + emit32(mc, rv_flw(0, RV_S0, (i32)(16u + caller_off))); + emit32(mc, rv_fsw(0, RV_S0, -(i32)s->off + (i32)part_off)); + } + } + } else { + compiler_panic(t->c, a->loc, "rv64 param: ABI class %d unimpl", + (int)pt->cls); + } + } +} + +static const Reg* rv_clobbers(CGTarget* t, RegClass c, u32* n) { + (void)c; + (void)n; rv_panic(t, "clobbers"); } -static void rv_spill_reg(CGTarget* t, Operand a, FrameSlot s, MemAccess m) { - (void)a; - (void)s; - (void)m; - rv_panic(t, "spill_reg"); + +static void rv_spill_reg(CGTarget* t, Operand src, FrameSlot slot, + MemAccess ma) { + RImpl* a = impl_of(t); + if (src.kind != OPK_REG) { + compiler_panic(t->c, a->loc, "rv64 spill_reg: src is not OPK_REG"); + } + Operand addr; + memset(&addr, 0, sizeof addr); + addr.kind = OPK_LOCAL; + addr.cls = RC_INT; + addr.type = ma.type; + addr.v.frame_slot = slot; + rv_store(t, addr, src, ma); + rv_free_reg(t, src.v.reg); } -static void rv_reload_reg(CGTarget* t, Operand a, FrameSlot s, MemAccess m) { - (void)a; - (void)s; - (void)m; - rv_panic(t, "reload_reg"); + +static void rv_reload_reg(CGTarget* t, Operand dst, FrameSlot slot, + MemAccess ma) { + RImpl* a = impl_of(t); + if (dst.kind != OPK_REG) { + compiler_panic(t->c, a->loc, "rv64 reload_reg: dst is not OPK_REG"); + } + Operand addr; + memset(&addr, 0, sizeof addr); + addr.kind = OPK_LOCAL; + addr.cls = RC_INT; + addr.type = ma.type; + addr.v.frame_slot = slot; + rv_load(t, dst, addr, ma); } -static Label rv_label_new(CGTarget* t) { rv_panic(t, "label_new"); } +/* ---- labels / control flow ---- */ + +static Label rv_label_new(CGTarget* t) { + return (Label)t->mc->label_new(t->mc); +} static void rv_label_place(CGTarget* t, Label l) { - (void)l; - rv_panic(t, "label_place"); + t->mc->label_place(t->mc, (MCLabel)l); } static void rv_jump(CGTarget* t, Label l) { - (void)l; - rv_panic(t, "jump"); + MCEmitter* mc = t->mc; + emit32(mc, rv_jal(RV_ZERO, 0)); + mc->emit_label_ref(mc, (MCLabel)l, R_RV_JAL, 4, 0); } -static void rv_cmp_branch(CGTarget* t, CmpOp op, Operand a, Operand b, + +/* Force an integer Operand into a register; materializes IMM via scratch. */ +static u32 force_reg_int(CGTarget* t, Operand op, u32 scratch) { + if (op.kind == OPK_REG) return reg_num(op); + if (op.kind == OPK_IMM) { + u32 sf = type_is_64(op.type) ? 1u : 0u; + emit_load_imm(t->mc, sf, scratch, op.v.imm); + return scratch; + } + compiler_panic(t->c, impl_of(t)->loc, + "rv64: operand kind %d unsupported here", (int)op.kind); +} + +/* Emit a conditional branch (a OP b) β†’ label. Uses BEQ/BNE/BLT/BGE etc. */ +static void rv_cmp_branch(CGTarget* t, CmpOp op, Operand a_op, Operand b_op, Label l) { - (void)op; - (void)a; - (void)b; - (void)l; - rv_panic(t, "cmp_branch"); + MCEmitter* mc = t->mc; + RImpl* a = impl_of(t); + /* For FP compares, fall through to materialize the result and CBNZ. */ + if (op == CMP_LT_F || op == CMP_LE_F || op == CMP_GT_F || op == CMP_GE_F) { + compiler_panic(t->c, a->loc, "rv64 cmp_branch: FP cmp NYI"); + } + u32 ra = force_reg_int(t, a_op, RV_T0); + u32 rb = force_reg_int(t, b_op, (ra == RV_T0) ? RV_T1 : RV_T0); + u32 word = 0; + switch (op) { + case CMP_EQ: word = rv_beq(ra, rb, 0); break; + case CMP_NE: word = rv_bne(ra, rb, 0); break; + case CMP_LT_S: word = rv_blt(ra, rb, 0); break; + case CMP_GE_S: word = rv_bge(ra, rb, 0); break; + case CMP_LT_U: word = rv_bltu(ra, rb, 0); break; + case CMP_GE_U: word = rv_bgeu(ra, rb, 0); break; + /* >= can become < with operands swapped: a > b ↔ b < a; + * a <= b ↔ b >= a. */ + case CMP_GT_S: word = rv_blt(rb, ra, 0); break; + case CMP_LE_S: word = rv_bge(rb, ra, 0); break; + case CMP_GT_U: word = rv_bltu(rb, ra, 0); break; + case CMP_LE_U: word = rv_bgeu(rb, ra, 0); break; + default: + compiler_panic(t->c, a->loc, "rv64 cmp_branch: op %d unimpl", (int)op); + } + emit32(mc, word); + mc->emit_label_ref(mc, (MCLabel)l, R_RV_BRANCH, 4, 0); } +/* Materialize 0/1 into dst from a comparison. */ +static void rv_cmp(CGTarget* t, CmpOp op, Operand dst, Operand a_op, + Operand b_op) { + MCEmitter* mc = t->mc; + RImpl* a = impl_of(t); + u32 rd = reg_num(dst); + + if (op == CMP_LT_F || op == CMP_LE_F || op == CMP_GT_F || op == CMP_GE_F) { + /* FP compare in fa,fb β†’ rd. Use FLT/FLE/FEQ depending on op. */ + int is_d = type_is_fp_double(a_op.type); + u32 fa = reg_num(a_op); + u32 fb = reg_num(b_op); + switch (op) { + case CMP_LT_F: emit32(mc, is_d ? rv_flt_d(rd, fa, fb) : rv_flt_s(rd, fa, fb)); return; + case CMP_LE_F: emit32(mc, is_d ? rv_fle_d(rd, fa, fb) : rv_fle_s(rd, fa, fb)); return; + case CMP_GT_F: emit32(mc, is_d ? rv_flt_d(rd, fb, fa) : rv_flt_s(rd, fb, fa)); return; + case CMP_GE_F: emit32(mc, is_d ? rv_fle_d(rd, fb, fa) : rv_fle_s(rd, fb, fa)); return; + default: break; + } + } + u32 ra = force_reg_int(t, a_op, RV_T0); + u32 rb = force_reg_int(t, b_op, (ra == RV_T0) ? RV_T1 : RV_T0); + + switch (op) { + case CMP_EQ: + emit32(mc, rv_sub(rd, ra, rb)); + emit32(mc, rv_sltiu(rd, rd, 1)); + return; + case CMP_NE: + emit32(mc, rv_sub(rd, ra, rb)); + emit32(mc, rv_sltu(rd, RV_ZERO, rd)); + return; + case CMP_LT_S: emit32(mc, rv_slt(rd, ra, rb)); return; + case CMP_LT_U: emit32(mc, rv_sltu(rd, ra, rb)); return; + case CMP_GT_S: emit32(mc, rv_slt(rd, rb, ra)); return; + case CMP_GT_U: emit32(mc, rv_sltu(rd, rb, ra)); return; + case CMP_GE_S: + emit32(mc, rv_slt(rd, ra, rb)); + emit32(mc, rv_xori(rd, rd, 1)); + return; + case CMP_GE_U: + emit32(mc, rv_sltu(rd, ra, rb)); + emit32(mc, rv_xori(rd, rd, 1)); + return; + case CMP_LE_S: + emit32(mc, rv_slt(rd, rb, ra)); + emit32(mc, rv_xori(rd, rd, 1)); + return; + case CMP_LE_U: + emit32(mc, rv_sltu(rd, rb, ra)); + emit32(mc, rv_xori(rd, rd, 1)); + return; + default: + compiler_panic(t->c, a->loc, "rv64 cmp: op %d unimpl", (int)op); + } +} + +/* ---- structured scopes (SCOPE_IF + SCOPE_LOOP/BLOCK bookkeep) ---- */ + static CGScope rv_scope_begin(CGTarget* t, const CGScopeDesc* d) { - (void)d; - rv_panic(t, "scope_begin"); + RImpl* a = impl_of(t); + if (a->nscopes == a->scopes_cap) { + u32 ncap = a->scopes_cap ? a->scopes_cap * 2u : 4u; + RvScope* nb = arena_array(t->c->tu, RvScope, ncap); + if (a->scopes) memcpy(nb, a->scopes, sizeof(RvScope) * a->nscopes); + a->scopes = nb; + a->scopes_cap = ncap; + } + RvScope* sc = &a->scopes[a->nscopes]; + sc->kind = (u8)d->kind; + sc->has_else = 0; + sc->else_label = 0; + sc->end_label = 0; + sc->break_label = d->break_label; + sc->continue_label = d->continue_label; + + if (d->kind == SCOPE_IF) { + sc->else_label = t->mc->label_new(t->mc); + sc->end_label = t->mc->label_new(t->mc); + u32 rn = force_reg_int(t, d->cond, RV_T0); + /* beq rn, x0, else_label */ + emit32(t->mc, rv_beq(rn, RV_ZERO, 0)); + t->mc->emit_label_ref(t->mc, sc->else_label, R_RV_BRANCH, 4, 0); + } else if (d->kind == SCOPE_LOOP || d->kind == SCOPE_BLOCK) { + /* bookkeep only */ + } else { + compiler_panic(t->c, a->loc, + "rv64 scope_begin: kind %d not yet implemented", + (int)d->kind); + } + a->nscopes++; + return (CGScope)a->nscopes; } + static void rv_scope_else(CGTarget* t, CGScope s) { - (void)s; - rv_panic(t, "scope_else"); + RImpl* a = impl_of(t); + if (s == CG_SCOPE_NONE || s > a->nscopes) { + compiler_panic(t->c, a->loc, "rv64 scope_else: bad scope"); + } + RvScope* sc = &a->scopes[s - 1]; + /* jump end ; place else */ + emit32(t->mc, rv_jal(RV_ZERO, 0)); + t->mc->emit_label_ref(t->mc, sc->end_label, R_RV_JAL, 4, 0); + t->mc->label_place(t->mc, sc->else_label); + sc->has_else = 1; } + static void rv_scope_end(CGTarget* t, CGScope s) { - (void)s; - rv_panic(t, "scope_end"); + RImpl* a = impl_of(t); + if (s == CG_SCOPE_NONE || s > a->nscopes) { + compiler_panic(t->c, a->loc, "rv64 scope_end: bad scope"); + } + RvScope* sc = &a->scopes[s - 1]; + if (sc->kind == SCOPE_IF) { + if (!sc->has_else) t->mc->label_place(t->mc, sc->else_label); + t->mc->label_place(t->mc, sc->end_label); + } } + static void rv_break_to(CGTarget* t, CGScope s) { - (void)s; - rv_panic(t, "break_to"); + RImpl* a = impl_of(t); + if (s == CG_SCOPE_NONE || s > a->nscopes) { + compiler_panic(t->c, a->loc, "rv64 break_to: bad scope"); + } + rv_jump(t, a->scopes[s - 1].break_label); } + static void rv_continue_to(CGTarget* t, CGScope s) { - (void)s; - rv_panic(t, "continue_to"); -} - -static void rv_load_imm(CGTarget* t, Operand d, i64 i) { - (void)d; - (void)i; - rv_panic(t, "load_imm"); -} -static void rv_load_const(CGTarget* t, Operand d, ConstBytes b) { - (void)d; - (void)b; - rv_panic(t, "load_const"); -} -static void rv_copy(CGTarget* t, Operand d, Operand s) { - (void)d; - (void)s; - rv_panic(t, "copy"); -} -static void rv_load(CGTarget* t, Operand d, Operand a, MemAccess m) { - (void)d; - (void)a; - (void)m; - rv_panic(t, "load"); -} -static void rv_store(CGTarget* t, Operand a, Operand s, MemAccess m) { - (void)a; - (void)s; - (void)m; - rv_panic(t, "store"); -} -static void rv_addr_of(CGTarget* t, Operand d, Operand l) { - (void)d; - (void)l; + RImpl* a = impl_of(t); + if (s == CG_SCOPE_NONE || s > a->nscopes) { + compiler_panic(t->c, a->loc, "rv64 continue_to: bad scope"); + } + rv_jump(t, a->scopes[s - 1].continue_label); +} + +/* ---- data movement ---- */ + +static void rv_load_imm(CGTarget* t, Operand dst, i64 imm) { + u32 sf = type_is_64(dst.type) ? 1u : 0u; + emit_load_imm(t->mc, sf, reg_num(dst), imm); +} + +static void rv_load_const(CGTarget* t, Operand dst, ConstBytes cb) { + RImpl* a = impl_of(t); + if (dst.cls != RC_FP) { + compiler_panic(t->c, a->loc, "rv64 load_const: only FP supported in v1"); + } + Sym ro_name = pool_intern_cstr(t->c->global, ".rodata"); + ObjSecId ro = obj_section(t->obj, ro_name, SEC_RODATA, SF_ALLOC, + cb.align ? cb.align : 4); + + u32 cur_section = t->mc->section_id; + t->mc->set_section(t->mc, ro); + t->mc->emit_align(t->mc, cb.align ? cb.align : 4, 0); + u32 ro_off = t->mc->pos(t->mc); + t->mc->emit_bytes(t->mc, cb.bytes, cb.size); + + char namebuf[64]; + static u32 lit_seq = 0; + int len = 0; + { + const char* prefix = ".LCFP"; + for (; prefix[len]; ++len) namebuf[len] = prefix[len]; + u32 v = lit_seq++; + char tmp[16]; + int tn = 0; + if (v == 0) tmp[tn++] = '0'; + else { + while (v) { tmp[tn++] = '0' + (char)(v % 10); v /= 10; } + } + for (int i = tn - 1; i >= 0; --i) namebuf[len++] = tmp[i]; + namebuf[len] = 0; + } + Sym sname = pool_intern_cstr(t->c->global, namebuf); + ObjSymId sym = obj_symbol(t->obj, sname, SB_LOCAL, SK_OBJ, ro, (u64)ro_off, + (u64)cb.size); + t->mc->set_section(t->mc, cur_section); + + /* auipc t0, %pcrel_hi(sym) ; flw/fld dst, %pcrel_lo(...)(t0) + * The LO12_I reloc references the AUIPC's site address (a label/sym + * placed at the AUIPC). For simplicity we make a local symbol at the + * AUIPC and bind LO12_I to it. */ + u32 sec = t->mc->section_id; + u32 auipc_pos = t->mc->pos(t->mc); + emit32(t->mc, rv_auipc(RV_T0, 0)); + t->mc->emit_reloc_at(t->mc, sec, auipc_pos, R_RV_PCREL_HI20, sym, 0, 0, 0); + /* Create a local symbol at the AUIPC site to anchor PCREL_LO12. */ + char anchor_buf[64]; + int al = 0; + { + const char* p2 = ".LpcrelHi"; + for (; p2[al]; ++al) anchor_buf[al] = p2[al]; + static u32 seq2 = 0; + u32 v = seq2++; + char tmp[16]; int tn = 0; + if (v == 0) tmp[tn++] = '0'; + else { while (v) { tmp[tn++] = '0' + (char)(v % 10); v /= 10; } } + for (int i = tn - 1; i >= 0; --i) anchor_buf[al++] = tmp[i]; + anchor_buf[al] = 0; + } + Sym aname = pool_intern_cstr(t->c->global, anchor_buf); + ObjSymId anchor = obj_symbol(t->obj, aname, SB_LOCAL, SK_OBJ, sec, + (u64)auipc_pos, 0); + u32 lpos = t->mc->pos(t->mc); + if (cb.size == 8) { + emit32(t->mc, rv_fld(reg_num(dst), RV_T0, 0)); + } else { + emit32(t->mc, rv_flw(reg_num(dst), RV_T0, 0)); + } + t->mc->emit_reloc_at(t->mc, sec, lpos, R_RV_PCREL_LO12_I, anchor, 0, 0, 0); +} + +static void rv_copy(CGTarget* t, Operand dst, Operand src) { + if (dst.cls == RC_FP || src.cls == RC_FP) { + u32 fmt = type_is_fp_double(dst.type) ? RV_FMT_D : RV_FMT_S; + /* fmv.fmt rd, rs = fsgnj.fmt rd, rs, rs */ + u32 r = reg_num(src); + emit32(t->mc, rv_fsgnj(fmt, reg_num(dst), r, r)); + return; + } + /* mv rd, rs = addi rd, rs, 0 (works for both 32 and 64-bit copies) */ + emit32(t->mc, rv_addi(reg_num(dst), reg_num(src), 0)); +} + +/* ---- address resolution ---- */ + +/* Materialize the address of `addr` (LOCAL or INDIRECT or GLOBAL) into + * `tmp_reg`. Returns the register holding the base and writes the + * effective signed offset to *out_off (0 when we synthesized into tmp). + * For OPK_GLOBAL, emits AUIPC + an LO12 reloc on the caller's load/store. */ +static u32 addr_base(CGTarget* t, Operand addr, i32* out_off, u32 tmp_reg) { + RImpl* a = impl_of(t); + if (addr.kind == OPK_LOCAL) { + RvSlot* s = slot_get(a, addr.v.frame_slot); + if (!s) compiler_panic(t->c, a->loc, "rv64 addr_base: bad slot"); + i32 off = -(i32)s->off; + if (off >= -2048 && off <= 2047) { + *out_off = off; + return RV_S0; + } + emit_load_imm(t->mc, 1, tmp_reg, (i64)off); + emit32(t->mc, rv_add(tmp_reg, RV_S0, tmp_reg)); + *out_off = 0; + return tmp_reg; + } + if (addr.kind == OPK_INDIRECT) { + i32 off = addr.v.ind.ofs; + u32 base = addr.v.ind.base & 0x1f; + if (off >= -2048 && off <= 2047) { + *out_off = off; + return base; + } + emit_load_imm(t->mc, 1, tmp_reg, (i64)off); + emit32(t->mc, rv_add(tmp_reg, base, tmp_reg)); + *out_off = 0; + return tmp_reg; + } + compiler_panic(t->c, a->loc, "rv64 addr_base: kind %d unsupported", + (int)addr.kind); +} + +/* Anchor symbol management for PCREL_LO12_*. Each AUIPC site gets a + * fresh local sym; the paired LO12 reloc references the anchor. */ +static ObjSymId emit_pcrel_anchor(CGTarget* t, u32 sec, u32 auipc_pos) { + char buf[64]; + int len = 0; + const char* p = ".LpcrelHi"; + for (; p[len]; ++len) buf[len] = p[len]; + static u32 seq = 0; + u32 v = seq++; + char tmp[16]; int tn = 0; + if (v == 0) tmp[tn++] = '0'; + else { while (v) { tmp[tn++] = '0' + (char)(v % 10); v /= 10; } } + for (int i = tn - 1; i >= 0; --i) buf[len++] = tmp[i]; + buf[len] = 0; + Sym n = pool_intern_cstr(t->c->global, buf); + return obj_symbol(t->obj, n, SB_LOCAL, SK_OBJ, sec, (u64)auipc_pos, 0); +} + +static void rv_load(CGTarget* t, Operand dst, Operand addr, MemAccess ma) { + u32 sz = ma.size ? ma.size : type_byte_size(addr.type); + MCEmitter* mc = t->mc; + + if (addr.kind == OPK_GLOBAL) { + u32 sec = mc->section_id; + ObjSymId sym = addr.v.global.sym; + i64 add = addr.v.global.addend; + u32 ap = mc->pos(mc); + emit32(mc, rv_auipc(RV_T0, 0)); + mc->emit_reloc_at(mc, sec, ap, R_RV_PCREL_HI20, sym, add, 0, 0); + ObjSymId anchor = emit_pcrel_anchor(t, sec, ap); + u32 lp = mc->pos(mc); + if (dst.cls == RC_FP) { + if (sz == 8) emit32(mc, rv_fld(reg_num(dst), RV_T0, 0)); + else emit32(mc, rv_flw(reg_num(dst), RV_T0, 0)); + } else { + int sx = type_is_signed(addr.type); + emit32(mc, enc_int_load(sz, sx, reg_num(dst), RV_T0, 0)); + } + mc->emit_reloc_at(mc, sec, lp, R_RV_PCREL_LO12_I, anchor, 0, 0, 0); + return; + } + + i32 off; + u32 base = addr_base(t, addr, &off, RV_T0); + if (dst.cls == RC_FP) { + if (sz == 8) emit32(mc, rv_fld(reg_num(dst), base, off)); + else emit32(mc, rv_flw(reg_num(dst), base, off)); + } else { + int sx = type_is_signed(addr.type); + emit32(mc, enc_int_load(sz, sx, reg_num(dst), base, off)); + } +} + +static void rv_store(CGTarget* t, Operand addr, Operand src, MemAccess ma) { + u32 sz = ma.size ? ma.size : type_byte_size(addr.type); + MCEmitter* mc = t->mc; + + if (addr.kind == OPK_GLOBAL) { + u32 sec = mc->section_id; + ObjSymId sym = addr.v.global.sym; + i64 add = addr.v.global.addend; + u32 src_reg; + int src_fp = 0; + if (src.kind == OPK_IMM) { + u32 sf = (sz == 8) ? 1u : 0u; + emit_load_imm(mc, sf, RV_T1, src.v.imm); + src_reg = RV_T1; + } else if (src.cls == RC_FP) { + src_reg = reg_num(src); + src_fp = 1; + } else { + src_reg = reg_num(src); + } + u32 ap = mc->pos(mc); + emit32(mc, rv_auipc(RV_T0, 0)); + mc->emit_reloc_at(mc, sec, ap, R_RV_PCREL_HI20, sym, add, 0, 0); + ObjSymId anchor = emit_pcrel_anchor(t, sec, ap); + u32 sp_pos = mc->pos(mc); + if (src_fp) { + if (sz == 8) emit32(mc, rv_fsd(src_reg, RV_T0, 0)); + else emit32(mc, rv_fsw(src_reg, RV_T0, 0)); + } else { + emit32(mc, enc_int_store(sz, src_reg, RV_T0, 0)); + } + mc->emit_reloc_at(mc, sec, sp_pos, R_RV_PCREL_LO12_S, anchor, 0, 0, 0); + return; + } + + i32 off; + u32 base = addr_base(t, addr, &off, + (src.kind == OPK_IMM) ? RV_T1 : RV_T0); + if (src.kind == OPK_IMM) { + u32 sf = (sz == 8) ? 1u : 0u; + emit_load_imm(mc, sf, RV_T0, src.v.imm); + emit32(mc, enc_int_store(sz, RV_T0, base, off)); + return; + } + if (src.cls == RC_FP) { + if (sz == 8) emit32(mc, rv_fsd(reg_num(src), base, off)); + else emit32(mc, rv_fsw(reg_num(src), base, off)); + } else { + emit32(mc, enc_int_store(sz, reg_num(src), base, off)); + } +} + +static void rv_addr_of(CGTarget* t, Operand dst, Operand lv) { + RImpl* a = impl_of(t); + MCEmitter* mc = t->mc; + u32 rd = reg_num(dst); + if (lv.kind == OPK_LOCAL) { + RvSlot* s = slot_get(a, lv.v.frame_slot); + if (!s) compiler_panic(t->c, a->loc, "rv64 addr_of: bad slot"); + i32 off = -(i32)s->off; + if (off >= -2048 && off <= 2047) { + emit32(mc, rv_addi(rd, RV_S0, off)); + } else { + emit_load_imm(mc, 1, rd, (i64)off); + emit32(mc, rv_add(rd, RV_S0, rd)); + } + return; + } + if (lv.kind == OPK_INDIRECT) { + i32 ofs = lv.v.ind.ofs; + u32 base = lv.v.ind.base & 0x1f; + if (ofs >= -2048 && ofs <= 2047) { + emit32(mc, rv_addi(rd, base, ofs)); + } else { + emit_load_imm(mc, 1, rd, (i64)ofs); + emit32(mc, rv_add(rd, base, rd)); + } + return; + } + if (lv.kind == OPK_GLOBAL) { + u32 sec = mc->section_id; + u32 ap = mc->pos(mc); + emit32(mc, rv_auipc(rd, 0)); + mc->emit_reloc_at(mc, sec, ap, R_RV_PCREL_HI20, lv.v.global.sym, + lv.v.global.addend, 0, 0); + ObjSymId anchor = emit_pcrel_anchor(t, sec, ap); + u32 ip = mc->pos(mc); + emit32(mc, rv_addi(rd, rd, 0)); + mc->emit_reloc_at(mc, sec, ip, R_RV_PCREL_LO12_I, anchor, 0, 0, 0); + return; + } rv_panic(t, "addr_of"); } -static void rv_tls_addr_of(CGTarget* t, Operand d, ObjSymId s, i64 a) { - (void)d; - (void)s; - (void)a; - rv_panic(t, "tls_addr_of"); -} -static void rv_copy_bytes(CGTarget* t, Operand da, Operand sa, - AggregateAccess g) { - (void)da; - (void)sa; - (void)g; - rv_panic(t, "copy_bytes"); -} -static void rv_set_bytes(CGTarget* t, Operand da, Operand bv, - AggregateAccess g) { - (void)da; - (void)bv; - (void)g; - rv_panic(t, "set_bytes"); -} -static void rv_bitfield_load(CGTarget* t, Operand d, Operand ra, - BitFieldAccess b) { - (void)d; - (void)ra; - (void)b; - rv_panic(t, "bitfield_load"); -} -static void rv_bitfield_store(CGTarget* t, Operand ra, Operand s, - BitFieldAccess b) { - (void)ra; - (void)s; - (void)b; - rv_panic(t, "bitfield_store"); -} - -static void rv_binop(CGTarget* t, BinOp op, Operand d, Operand a, Operand b) { - (void)op; - (void)d; - (void)a; - (void)b; - rv_panic(t, "binop"); -} -static void rv_unop(CGTarget* t, UnOp op, Operand d, Operand a) { - (void)op; - (void)d; - (void)a; - rv_panic(t, "unop"); -} -static void rv_cmp(CGTarget* t, CmpOp op, Operand d, Operand a, Operand b) { - (void)op; - (void)d; - (void)a; - (void)b; - rv_panic(t, "cmp"); -} -static void rv_convert(CGTarget* t, ConvKind k, Operand d, Operand s) { - (void)k; - (void)d; - (void)s; - rv_panic(t, "convert"); + +static void rv_tls_addr_of(CGTarget* t, Operand dst, ObjSymId sym, i64 addend) { + /* TLS Local-Exec: lui tmp, %tprel_hi(sym); add tmp, tp, tmp; addi dst, + * tmp, %tprel_lo(sym). Uses R_RV_TPREL_HI20 / R_RV_TPREL_LO12_I. */ + MCEmitter* mc = t->mc; + u32 sec = mc->section_id; + u32 rd = reg_num(dst); + u32 hp = mc->pos(mc); + emit32(mc, rv_lui(RV_T0, 0)); + mc->emit_reloc_at(mc, sec, hp, R_RV_TPREL_HI20, sym, addend, 0, 0); + emit32(mc, rv_add(RV_T0, RV_TP, RV_T0)); + u32 lp = mc->pos(mc); + emit32(mc, rv_addi(rd, RV_T0, 0)); + mc->emit_reloc_at(mc, sec, lp, R_RV_TPREL_LO12_I, sym, addend, 0, 0); +} + +/* ---- aggregate ops ---- */ + +static u32 agg_addr_reg(CGTarget* t, Operand op, u32 scratch) { + RImpl* a = impl_of(t); + if (op.kind == OPK_REG) return reg_num(op); + if (op.kind == OPK_LOCAL) { + RvSlot* s = slot_get(a, op.v.frame_slot); + if (!s) compiler_panic(t->c, a->loc, "rv64 agg: bad slot"); + i32 off = -(i32)s->off; + if (off >= -2048 && off <= 2047) { + emit32(t->mc, rv_addi(scratch, RV_S0, off)); + } else { + emit_load_imm(t->mc, 1, scratch, (i64)off); + emit32(t->mc, rv_add(scratch, RV_S0, scratch)); + } + return scratch; + } + compiler_panic(t->c, a->loc, "rv64 agg: address kind %d unsupported", + (int)op.kind); +} + +static void rv_copy_bytes(CGTarget* t, Operand dst_addr, Operand src_addr, + AggregateAccess agg) { + MCEmitter* mc = t->mc; + u32 dr = agg_addr_reg(t, dst_addr, RV_T0); + u32 sr = agg_addr_reg(t, src_addr, (dr == RV_T1) ? RV_T2 : RV_T1); + u32 n = agg.size; + u32 i = 0; + while (i + 8 <= n) { + emit32(mc, rv_ld(RV_T3, sr, (i32)i)); + emit32(mc, rv_sd(RV_T3, dr, (i32)i)); + i += 8; + } + while (i + 4 <= n) { + emit32(mc, rv_lwu(RV_T3, sr, (i32)i)); + emit32(mc, rv_sw(RV_T3, dr, (i32)i)); + i += 4; + } + while (i + 2 <= n) { + emit32(mc, rv_lhu(RV_T3, sr, (i32)i)); + emit32(mc, rv_sh(RV_T3, dr, (i32)i)); + i += 2; + } + while (i < n) { + emit32(mc, rv_lbu(RV_T3, sr, (i32)i)); + emit32(mc, rv_sb(RV_T3, dr, (i32)i)); + i += 1; + } +} + +static void rv_set_bytes(CGTarget* t, Operand dst_addr, Operand byte_value, + AggregateAccess agg) { + MCEmitter* mc = t->mc; + u32 dr = agg_addr_reg(t, dst_addr, RV_T0); + u32 byte; + if (byte_value.kind == OPK_IMM) { + byte = (u32)(byte_value.v.imm & 0xffu); + } else { + compiler_panic(t->c, impl_of(t)->loc, + "rv64 set_bytes: REG byte NYI"); + } + u32 n = agg.size; + u32 src; + if (byte == 0) { + src = RV_ZERO; + } else { + u64 b = byte; + b |= b << 8; b |= b << 16; b |= b << 32; + emit_load_imm(mc, 1, RV_T3, (i64)b); + src = RV_T3; + } + u32 i = 0; + while (i + 8 <= n) { emit32(mc, rv_sd(src, dr, (i32)i)); i += 8; } + while (i + 4 <= n) { emit32(mc, rv_sw(src, dr, (i32)i)); i += 4; } + while (i + 2 <= n) { emit32(mc, rv_sh(src, dr, (i32)i)); i += 2; } + while (i < n) { emit32(mc, rv_sb(src, dr, (i32)i)); i += 1; } +} + +static void rv_bitfield_load(CGTarget* t, Operand dst, Operand record_addr, + BitFieldAccess bf) { + MCEmitter* mc = t->mc; + u32 base = agg_addr_reg(t, record_addr, RV_T0); + u32 storage_bytes = bf.storage.size ? bf.storage.size : 4u; + u32 rd = reg_num(dst); + /* Load full storage unit (zero-ext for shifts). */ + emit32(mc, enc_int_load(storage_bytes, 0, rd, base, (i32)bf.storage_offset)); + /* Shift left by (XLEN - (bit_offset + bit_width)) then arithmetic + * right-shift by (XLEN - bit_width). Use 64-bit shifts. */ + u32 lsb = bf.bit_offset; + u32 width = bf.bit_width ? bf.bit_width : 1u; + u32 sh_left = 64u - (lsb + width); + u32 sh_right = 64u - width; + emit32(mc, rv_slli(rd, rd, sh_left)); + if (bf.signed_) emit32(mc, rv_srai(rd, rd, sh_right)); + else emit32(mc, rv_srli(rd, rd, sh_right)); +} + +static void rv_bitfield_store(CGTarget* t, Operand record_addr, Operand src, + BitFieldAccess bf) { + MCEmitter* mc = t->mc; + u32 base = agg_addr_reg(t, record_addr, RV_T0); + u32 storage_bytes = bf.storage.size ? bf.storage.size : 4u; + /* Load current value into t1 */ + emit32(mc, enc_int_load(storage_bytes, 0, RV_T1, base, + (i32)bf.storage_offset)); + u32 src_reg; + if (src.kind == OPK_IMM) { + emit_load_imm(mc, 1, RV_T2, src.v.imm); + src_reg = RV_T2; + } else if (src.kind == OPK_REG) { + src_reg = reg_num(src); + } else { + compiler_panic(t->c, impl_of(t)->loc, + "rv64 bitfield_store: src kind %d NYI", (int)src.kind); + } + u32 lsb = bf.bit_offset; + u32 width = bf.bit_width ? bf.bit_width : 1u; + /* mask = ((1 << width) - 1) << lsb */ + u64 mask = ((u64)1 << width) - 1u; + /* t3 = src & ((1<<width)-1), then shifted to lsb */ + emit_load_imm(mc, 1, RV_T3, (i64)mask); + emit32(mc, rv_and(RV_T3, src_reg, RV_T3)); + if (lsb) emit32(mc, rv_slli(RV_T3, RV_T3, lsb)); + /* clear the field bits in t1: andi or and-not pattern */ + u64 mask_in = mask << lsb; + emit_load_imm(mc, 1, RV_T2, (i64)~mask_in); + emit32(mc, rv_and(RV_T1, RV_T1, RV_T2)); + emit32(mc, rv_or(RV_T1, RV_T1, RV_T3)); + emit32(mc, enc_int_store(storage_bytes, RV_T1, base, + (i32)bf.storage_offset)); +} + +/* ---- arithmetic ---- */ + +static void rv_binop(CGTarget* t, BinOp op, Operand dst, Operand a_op, + Operand b_op) { + MCEmitter* mc = t->mc; + if (op == BO_FADD || op == BO_FSUB || op == BO_FMUL || op == BO_FDIV) { + u32 fmt = type_is_fp_double(dst.type) ? RV_FMT_D : RV_FMT_S; + u32 rd = reg_num(dst); + u32 fa = reg_num(a_op); + u32 fb = reg_num(b_op); + switch (op) { + case BO_FADD: emit32(mc, rv_fadd(fmt, rd, fa, fb)); return; + case BO_FSUB: emit32(mc, rv_fsub(fmt, rd, fa, fb)); return; + case BO_FMUL: emit32(mc, rv_fmul(fmt, rd, fa, fb)); return; + case BO_FDIV: emit32(mc, rv_fdiv(fmt, rd, fa, fb)); return; + default: break; + } + } + u32 sf = type_is_64(dst.type) ? 1u : 0u; + u32 rd = reg_num(dst); + u32 ra = force_reg_int(t, a_op, RV_T0); + u32 rb = force_reg_int(t, b_op, (ra == RV_T0) ? RV_T1 : RV_T0); + + switch (op) { + case BO_IADD: emit32(mc, sf ? rv_add(rd, ra, rb) : rv_addw(rd, ra, rb)); return; + case BO_ISUB: emit32(mc, sf ? rv_sub(rd, ra, rb) : rv_subw(rd, ra, rb)); return; + case BO_IMUL: emit32(mc, sf ? rv_mul(rd, ra, rb) : rv_mulw(rd, ra, rb)); return; + case BO_AND: emit32(mc, rv_and(rd, ra, rb)); return; + case BO_OR: emit32(mc, rv_or(rd, ra, rb)); return; + case BO_XOR: emit32(mc, rv_xor(rd, ra, rb)); return; + case BO_SHL: emit32(mc, sf ? rv_sll(rd, ra, rb) : rv_sllw(rd, ra, rb)); return; + case BO_SHR_U: emit32(mc, sf ? rv_srl(rd, ra, rb) : rv_srlw(rd, ra, rb)); return; + case BO_SHR_S: emit32(mc, sf ? rv_sra(rd, ra, rb) : rv_sraw(rd, ra, rb)); return; + case BO_SDIV: emit32(mc, sf ? rv_div(rd, ra, rb) : rv_divw(rd, ra, rb)); return; + case BO_UDIV: emit32(mc, sf ? rv_divu(rd, ra, rb) : rv_divuw(rd, ra, rb)); return; + case BO_SREM: emit32(mc, sf ? rv_rem(rd, ra, rb) : rv_remw(rd, ra, rb)); return; + case BO_UREM: emit32(mc, sf ? rv_remu(rd, ra, rb) : rv_remuw(rd, ra, rb)); return; + default: + compiler_panic(t->c, impl_of(t)->loc, "rv64 binop: op %d unimpl", (int)op); + } +} + +static void rv_unop(CGTarget* t, UnOp op, Operand dst, Operand a_op) { + MCEmitter* mc = t->mc; + u32 sf = type_is_64(dst.type) ? 1u : 0u; + u32 rd = reg_num(dst); + if (a_op.kind != OPK_REG) { + compiler_panic(t->c, impl_of(t)->loc, "rv64 unop: non-REG operand NYI"); + } + u32 rn = reg_num(a_op); + switch (op) { + case UO_NEG: + emit32(mc, sf ? rv_sub(rd, RV_ZERO, rn) : rv_subw(rd, RV_ZERO, rn)); + return; + case UO_BNOT: + emit32(mc, rv_xori(rd, rn, -1)); + return; + case UO_NOT: + /* logical: 1 if rn==0 else 0 β†’ sltiu rd, rn, 1 */ + emit32(mc, rv_sltiu(rd, rn, 1)); + return; + default: + compiler_panic(t->c, impl_of(t)->loc, "rv64 unop: op %d unimpl", (int)op); + } +} + +static void rv_convert(CGTarget* t, ConvKind k, Operand dst, Operand src) { + RImpl* a = impl_of(t); + MCEmitter* mc = t->mc; + u32 rd = reg_num(dst); + u32 rn = reg_num(src); + + switch (k) { + case CV_SEXT: { + u32 src_bits = type_byte_size(src.type) * 8u; + if (src_bits == 32u) { + /* sext.w rd, rs = addiw rd, rs, 0 */ + emit32(mc, rv_addiw(rd, rn, 0)); + return; + } + /* slli + srai by (64 - src_bits) */ + u32 sh = 64u - src_bits; + emit32(mc, rv_slli(rd, rn, sh)); + emit32(mc, rv_srai(rd, rd, sh)); + return; + } + case CV_ZEXT: { + u32 src_bits = type_byte_size(src.type) * 8u; + if (src_bits == 32u) { + /* zext.w: slli rd, rs, 32; srli rd, rd, 32 */ + emit32(mc, rv_slli(rd, rn, 32)); + emit32(mc, rv_srli(rd, rd, 32)); + } else { + u32 sh = 64u - src_bits; + emit32(mc, rv_slli(rd, rn, sh)); + emit32(mc, rv_srli(rd, rd, sh)); + } + return; + } + case CV_TRUNC: + /* Truncate to W: addiw rd, rs, 0 puts low 32 in rd sign-extended. + * For narrower widths the consumer (store) handles it. */ + emit32(mc, rv_addiw(rd, rn, 0)); + return; + case CV_ITOF_S: { + int sf_src = type_is_64(src.type); + int dst_d = type_is_fp_double(dst.type); + if (dst_d) { + emit32(mc, sf_src ? rv_fcvt_d_l(rd, rn) : rv_fcvt_d_w(rd, rn)); + } else { + emit32(mc, sf_src ? rv_fcvt_s_l(rd, rn) : rv_fcvt_s_w(rd, rn)); + } + return; + } + case CV_ITOF_U: { + int sf_src = type_is_64(src.type); + int dst_d = type_is_fp_double(dst.type); + if (dst_d) { + emit32(mc, sf_src ? rv_fcvt_d_lu(rd, rn) : rv_fcvt_d_wu(rd, rn)); + } else { + emit32(mc, sf_src ? rv_fcvt_s_lu(rd, rn) : rv_fcvt_s_wu(rd, rn)); + } + return; + } + case CV_FTOI_S: { + int sf_dst = type_is_64(dst.type); + int src_d = type_is_fp_double(src.type); + if (src_d) { + emit32(mc, sf_dst ? rv_fcvt_l_d(rd, rn) : rv_fcvt_w_d(rd, rn)); + } else { + emit32(mc, sf_dst ? rv_fcvt_l_s(rd, rn) : rv_fcvt_w_s(rd, rn)); + } + return; + } + case CV_FTOI_U: { + int sf_dst = type_is_64(dst.type); + int src_d = type_is_fp_double(src.type); + if (src_d) { + emit32(mc, sf_dst ? rv_fcvt_lu_d(rd, rn) : rv_fcvt_wu_d(rd, rn)); + } else { + emit32(mc, sf_dst ? rv_fcvt_lu_s(rd, rn) : rv_fcvt_wu_s(rd, rn)); + } + return; + } + case CV_FEXT: emit32(mc, rv_fcvt_d_s(rd, rn)); return; + case CV_FTRUNC: emit32(mc, rv_fcvt_s_d(rd, rn)); return; + case CV_BITCAST: { + if (src.cls == RC_INT && dst.cls == RC_FP) { + u32 sz = type_byte_size(dst.type); + emit32(mc, sz == 8 ? rv_fmv_d_x(rd, rn) : rv_fmv_w_x(rd, rn)); + } else if (src.cls == RC_FP && dst.cls == RC_INT) { + u32 sz = type_byte_size(src.type); + emit32(mc, sz == 8 ? rv_fmv_x_d(rd, rn) : rv_fmv_x_w(rd, rn)); + } else { + compiler_panic(t->c, a->loc, "rv64 BITCAST: same-class NYI"); + } + return; + } + default: + compiler_panic(t->c, a->loc, "rv64 convert kind %d unimpl", (int)k); + } +} + +/* ---- calls / return ---- */ + +static void emit_arg_value(CGTarget* t, const CGABIValue* av, u32* next_int, + u32* next_fp, u32* stack_off) { + RImpl* a = impl_of(t); + MCEmitter* mc = t->mc; + + /* For variadic args (av->abi NULL) synthesize a one-part DIRECT shape. + * On RV64 LP64D, variadic args go through the integer registers + * regardless of FP-ness (per the psABI). */ + ABIArgInfo va_ai; + ABIArgPart va_pt; + const ABIArgInfo* ai = av->abi; + if (!ai) { + u32 sz = type_byte_size(av->type); + memset(&va_ai, 0, sizeof va_ai); + memset(&va_pt, 0, sizeof va_pt); + va_ai.kind = ABI_ARG_DIRECT; + va_ai.parts = &va_pt; + va_ai.nparts = 1; + va_pt.cls = ABI_CLASS_INT; + va_pt.size = sz; + va_pt.align = sz; + va_pt.src_offset = 0; + ai = &va_ai; + } + if (ai->kind == ABI_ARG_IGNORE) return; + + if (ai->kind == ABI_ARG_INDIRECT) { + /* Pass the address of the storage in the next integer slot. */ + int to_stack = (*next_int >= 8); + u32 dst_reg = to_stack ? RV_T0 : (RV_A0 + (*next_int)++); + if (av->storage.kind == OPK_LOCAL) { + RvSlot* s = slot_get(a, av->storage.v.frame_slot); + if (!s) compiler_panic(t->c, a->loc, "rv64 call: bad byval slot"); + i32 off = -(i32)s->off; + if (off >= -2048 && off <= 2047) { + emit32(mc, rv_addi(dst_reg, RV_S0, off)); + } else { + emit_load_imm(mc, 1, dst_reg, (i64)off); + emit32(mc, rv_add(dst_reg, RV_S0, dst_reg)); + } + } else { + compiler_panic(t->c, a->loc, + "rv64 call: INDIRECT storage kind %d NYI", + (int)av->storage.kind); + } + if (to_stack) { + emit32(mc, rv_sd(dst_reg, RV_SP, (i32)*stack_off)); + *stack_off += 8; + } + return; + } + + for (u16 i = 0; i < ai->nparts; ++i) { + const ABIArgPart* pt = &ai->parts[i]; + u32 sz = pt->size; + + if (pt->cls == ABI_CLASS_INT) { + int to_stack = (*next_int >= 8); + u32 dst_reg = to_stack ? RV_T0 : (RV_A0 + (*next_int)++); + switch (av->storage.kind) { + case OPK_IMM: { + u32 sf = (sz == 8) ? 1u : 0u; + emit_load_imm(mc, sf, dst_reg, av->storage.v.imm); + break; + } + case OPK_REG: { + /* Variadic FP arg pinned into an integer register: bitcast + * via FMV.X.{D,W}. Otherwise normal MV. */ + if (av->storage.cls == RC_FP) { + emit32(mc, (sz == 8) ? rv_fmv_x_d(dst_reg, reg_num(av->storage)) + : rv_fmv_x_w(dst_reg, reg_num(av->storage))); + } else { + emit32(mc, rv_addi(dst_reg, reg_num(av->storage), 0)); + } + break; + } + case OPK_LOCAL: { + RvSlot* s = slot_get(a, av->storage.v.frame_slot); + if (!s) compiler_panic(t->c, a->loc, "rv64 call: bad arg slot"); + i32 off = -(i32)s->off + (i32)pt->src_offset; + emit32(mc, enc_int_load(sz, 0, dst_reg, RV_S0, off)); + break; + } + default: + compiler_panic(t->c, a->loc, + "rv64 call: storage kind %d NYI", + (int)av->storage.kind); + } + if (to_stack) { + emit32(mc, rv_sd(dst_reg, RV_SP, (i32)*stack_off)); + *stack_off += 8; + } + } else if (pt->cls == ABI_CLASS_FP) { + int to_stack = (*next_fp >= 8); + if (!to_stack) { + u32 freg = 10u + (*next_fp)++; + switch (av->storage.kind) { + case OPK_REG: { + u32 fmt = (sz == 8) ? RV_FMT_D : RV_FMT_S; + u32 r = reg_num(av->storage); + emit32(mc, rv_fsgnj(fmt, freg, r, r)); + break; + } + default: + compiler_panic(t->c, a->loc, "rv64 call: FP storage kind %d NYI", + (int)av->storage.kind); + } + } else { + switch (av->storage.kind) { + case OPK_REG: + if (sz == 8) emit32(mc, rv_fsd(reg_num(av->storage), RV_SP, (i32)*stack_off)); + else emit32(mc, rv_fsw(reg_num(av->storage), RV_SP, (i32)*stack_off)); + break; + default: + compiler_panic(t->c, a->loc, "rv64 call: FP stack-arg NYI"); + } + *stack_off += 8; + } + } else { + compiler_panic(t->c, a->loc, "rv64 call: ABI class %d unimpl", + (int)pt->cls); + } + } } static void rv_call(CGTarget* t, const CGCallDesc* d) { - (void)d; - rv_panic(t, "call"); + RImpl* a = impl_of(t); + MCEmitter* mc = t->mc; + + u32 next_int = 0, next_fp = 0, stack_off = 0; + + /* sret: caller passes destination pointer in a0. */ + if (d->abi && d->abi->has_sret) { + if (d->ret.storage.kind != OPK_LOCAL) { + compiler_panic(t->c, a->loc, "rv64 call: sret dst must be LOCAL"); + } + RvSlot* s = slot_get(a, d->ret.storage.v.frame_slot); + if (!s) compiler_panic(t->c, a->loc, "rv64 call: bad sret slot"); + i32 off = -(i32)s->off; + if (off >= -2048 && off <= 2047) { + emit32(mc, rv_addi(RV_A0, RV_S0, off)); + } else { + emit_load_imm(mc, 1, RV_A0, (i64)off); + emit32(mc, rv_add(RV_A0, RV_S0, RV_A0)); + } + next_int = 1; + } + + for (u32 i = 0; i < d->nargs; ++i) { + emit_arg_value(t, &d->args[i], &next_int, &next_fp, &stack_off); + } + u32 needed = (stack_off + 15u) & ~15u; + if (needed > a->max_outgoing) a->max_outgoing = needed; + + if (d->callee.kind == OPK_GLOBAL) { + /* AUIPC ra, 0 ; JALR ra, ra, 0 with R_RV_CALL on AUIPC */ + u32 sec = mc->section_id; + u32 pos = mc->pos(mc); + emit32(mc, rv_auipc(RV_RA, 0)); + emit32(mc, rv_jalr(RV_RA, RV_RA, 0)); + mc->emit_reloc_at(mc, sec, pos, R_RV_CALL, + d->callee.v.global.sym, d->callee.v.global.addend, 0, 0); + } else if (d->callee.kind == OPK_REG) { + emit32(mc, rv_jalr(RV_RA, reg_num(d->callee), 0)); + } else { + compiler_panic(t->c, a->loc, "rv64 call: callee kind %d unsupported", + (int)d->callee.kind); + } + + /* Receive return value. */ + const ABIArgInfo* ri = &d->abi->ret; + if (ri->kind == ABI_ARG_IGNORE || ri->kind == ABI_ARG_INDIRECT) return; + if (ri->nparts == 0) return; + + Operand rs = d->ret.storage; + u32 nir = 0, nfr = 0; + for (u16 i = 0; i < ri->nparts; ++i) { + const ABIArgPart* p = &ri->parts[i]; + u32 src_reg = (p->cls == ABI_CLASS_INT) ? (RV_A0 + nir++) : (10u + nfr++); + + if (rs.kind == OPK_REG) { + if (ri->nparts != 1) { + compiler_panic(t->c, a->loc, "rv64 call: REG ret with %u parts", + (unsigned)ri->nparts); + } + if (p->cls == ABI_CLASS_INT) { + emit32(mc, rv_addi(reg_num(rs), src_reg, 0)); + } else { + u32 fmt = (p->size == 8) ? RV_FMT_D : RV_FMT_S; + emit32(mc, rv_fsgnj(fmt, reg_num(rs), src_reg, src_reg)); + } + } else if (rs.kind == OPK_LOCAL) { + RvSlot* s = slot_get(a, rs.v.frame_slot); + if (!s) compiler_panic(t->c, a->loc, "rv64 call: bad ret slot"); + i32 off = -(i32)s->off + (i32)p->src_offset; + if (p->cls == ABI_CLASS_INT) { + emit32(mc, enc_int_store(p->size, src_reg, RV_S0, off)); + } else { + if (p->size == 8) emit32(mc, rv_fsd(src_reg, RV_S0, off)); + else emit32(mc, rv_fsw(src_reg, RV_S0, off)); + } + } else if (rs.kind == OPK_IMM && rs.type && rs.type->kind == TY_VOID) { + /* void return placeholder β€” nothing to do. */ + } else { + compiler_panic(t->c, a->loc, "rv64 call: ret_storage kind %d unsupported", + (int)rs.kind); + } + } } -static void rv_ret(CGTarget* t, const CGABIValue* v) { - (void)v; - rv_panic(t, "ret"); + +static void rv_ret(CGTarget* t, const CGABIValue* val) { + RImpl* a = impl_of(t); + MCEmitter* mc = t->mc; + + if (val) { + const ABIArgInfo* ri = val->abi; + if (ri && ri->kind == ABI_ARG_INDIRECT) { + /* sret: reload destination pointer from sret_ptr_slot into t0, + * then memcpy from val->storage (must be OPK_LOCAL) into [t0]. */ + if (val->storage.kind != OPK_LOCAL) { + compiler_panic(t->c, a->loc, + "rv64 ret indirect: storage kind %d NYI", + (int)val->storage.kind); + } + RvSlot* s = slot_get(a, val->storage.v.frame_slot); + if (!s) compiler_panic(t->c, a->loc, "rv64 ret: bad sret slot"); + RvSlot* sp = (a->sret_ptr_slot != FRAME_SLOT_NONE) + ? slot_get(a, a->sret_ptr_slot) + : NULL; + if (sp) emit32(mc, rv_ld(RV_T0, RV_S0, -(i32)sp->off)); + u32 nbytes = s->size; + u32 i = 0; + while (i + 8 <= nbytes) { + emit32(mc, rv_ld(RV_T1, RV_S0, -(i32)s->off + (i32)i)); + emit32(mc, rv_sd(RV_T1, RV_T0, (i32)i)); + i += 8; + } + while (i + 4 <= nbytes) { + emit32(mc, rv_lwu(RV_T1, RV_S0, -(i32)s->off + (i32)i)); + emit32(mc, rv_sw(RV_T1, RV_T0, (i32)i)); + i += 4; + } + while (i + 2 <= nbytes) { + emit32(mc, rv_lhu(RV_T1, RV_S0, -(i32)s->off + (i32)i)); + emit32(mc, rv_sh(RV_T1, RV_T0, (i32)i)); + i += 2; + } + while (i < nbytes) { + emit32(mc, rv_lbu(RV_T1, RV_S0, -(i32)s->off + (i32)i)); + emit32(mc, rv_sb(RV_T1, RV_T0, (i32)i)); + i += 1; + } + } else if (val->storage.kind == OPK_REG) { + if (val->storage.cls == RC_FP) { + u32 fmt = type_is_fp_double(val->storage.type) ? RV_FMT_D : RV_FMT_S; + u32 r = reg_num(val->storage); + emit32(mc, rv_fsgnj(fmt, 10u, r, r)); /* fa0 = freg 10 */ + } else { + emit32(mc, rv_addi(RV_A0, reg_num(val->storage), 0)); + } + } else if (val->storage.kind == OPK_IMM) { + u32 sf = type_is_64(val->storage.type) ? 1u : 0u; + emit_load_imm(mc, sf, RV_A0, val->storage.v.imm); + } else if (val->storage.kind == OPK_LOCAL) { + RvSlot* s = slot_get(a, val->storage.v.frame_slot); + if (!s) compiler_panic(t->c, a->loc, "rv64 ret: bad local slot"); + const ABIArgInfo* ri2 = val->abi; + u32 nir = 0, nfr = 0; + for (u16 i = 0; i < (ri2 ? ri2->nparts : 0); ++i) { + const ABIArgPart* pt = &ri2->parts[i]; + i32 off = -(i32)s->off + (i32)pt->src_offset; + if (pt->cls == ABI_CLASS_INT) { + emit32(mc, enc_int_load(pt->size, 0, RV_A0 + nir++, RV_S0, off)); + } else if (pt->cls == ABI_CLASS_FP) { + u32 freg = 10u + nfr++; + if (pt->size == 8) emit32(mc, rv_fld(freg, RV_S0, off)); + else emit32(mc, rv_flw(freg, RV_S0, off)); + } else { + compiler_panic(t->c, a->loc, "rv64 ret: part cls %d unimpl", + (int)pt->cls); + } + } + } + } + /* Jump to epilogue. */ + emit32(mc, rv_jal(RV_ZERO, 0)); + mc->emit_label_ref(mc, a->epilogue_label, R_RV_JAL, 4, 0); } -static void rv_alloca_(CGTarget* t, Operand d, Operand s, u32 a) { - (void)d; - (void)s; - (void)a; - rv_panic(t, "alloca"); +/* ---- panic stubs for features we don't yet cover ---- */ + +static void rv_alloca_(CGTarget* t, Operand d, Operand sz, u32 align) { + RImpl* a = impl_of(t); + MCEmitter* mc = t->mc; + if (d.kind != OPK_REG) { + compiler_panic(t->c, a->loc, "rv64 alloca: dst must be REG"); + } + if (align > 16) { + compiler_panic(t->c, a->loc, + "rv64 alloca: align %u > 16 not yet supported", align); + } + if (sz.kind == OPK_IMM) { + i64 v = sz.v.imm; + if (v < 0) compiler_panic(t->c, a->loc, "rv64 alloca: negative size"); + u64 aligned = ((u64)v + 15u) & ~(u64)15u; + if (aligned == 0) aligned = 16; + if (aligned > 2047u) { + compiler_panic(t->c, a->loc, + "rv64 alloca: const size %llu too large for v1", + (unsigned long long)aligned); + } + emit32(mc, rv_addi(RV_SP, RV_SP, -(i32)aligned)); + } else if (sz.kind == OPK_REG) { + u32 sz_reg = reg_num(sz); + /* t0 = (sz + 15) & ~15; sp -= t0 */ + emit32(mc, rv_addi(RV_T0, sz_reg, 15)); + emit32(mc, rv_andi(RV_T0, RV_T0, -16)); + emit32(mc, rv_sub(RV_SP, RV_SP, RV_T0)); + } else { + compiler_panic(t->c, a->loc, "rv64 alloca: size kind %d unsupported", + (int)sz.kind); + } + + /* Placeholder: addi dst, sp, max_outgoing (imm patched at func_end). */ + if (a->nadd_patches == a->add_patches_cap) { + u32 ncap = a->add_patches_cap ? a->add_patches_cap * 2 : 4; + struct RvAllocaPatch* nb = arena_array(t->c->tu, struct RvAllocaPatch, ncap); + if (a->add_patches) + memcpy(nb, a->add_patches, sizeof(*nb) * a->nadd_patches); + a->add_patches = nb; + a->add_patches_cap = ncap; + } + u32 dst_reg = reg_num(d); + a->add_patches[a->nadd_patches].pos = mc->pos(mc); + a->add_patches[a->nadd_patches].dst_reg = dst_reg; + a->nadd_patches++; + emit32(mc, rv_addi(dst_reg, RV_SP, 0)); + a->has_alloca = 1; } -static void rv_va_start_(CGTarget* t, Operand a) { - (void)a; - rv_panic(t, "va_start"); +/* RV64 LP64D va_list: a single `void*` pointing at the next argument + * slot. The prologue spills a0..a7 into a contiguous save area. The + * save area and the caller's stack args form a single byte-stream + * (named-arg-count-aware), but for our test corpus the save area + * alone covers every variadic case. */ +static void rv_va_start_(CGTarget* t, Operand ap_op) { + RImpl* a = impl_of(t); + MCEmitter* mc = t->mc; + if (!a->is_variadic) { + compiler_panic(t->c, a->loc, "rv64 va_start: function not variadic"); + } + u32 ap = reg_num(ap_op); + /* ap is the address of the va_list variable; *ap = &save_area[named_int]. */ + RvSlot* gs = slot_get(a, a->gp_save_slot); + if (!gs) compiler_panic(t->c, a->loc, "rv64 va_start: no save slot"); + /* t0 = s0 - gs->off + next_param_int*8 */ + i32 off = -(i32)gs->off + (i32)(a->next_param_int * 8u); + if (off >= -2048 && off <= 2047) { + emit32(mc, rv_addi(RV_T0, RV_S0, off)); + } else { + emit_load_imm(mc, 1, RV_T0, (i64)off); + emit32(mc, rv_add(RV_T0, RV_S0, RV_T0)); + } + emit32(mc, rv_sd(RV_T0, ap, 0)); } -static void rv_va_arg_(CGTarget* t, Operand d, Operand a, const Type* ty) { - (void)d; - (void)a; - (void)ty; - rv_panic(t, "va_arg"); + +static void rv_va_arg_(CGTarget* t, Operand dst, Operand ap_op, + const Type* ty) { + MCEmitter* mc = t->mc; + u32 ap = reg_num(ap_op); + u32 sz = type_byte_size(ty); + /* t1 = *ap; load value; *ap = t1 + 8 (rounded up). + * On RV64 LP64D every var arg occupies an 8-byte slot. */ + emit32(mc, rv_ld(RV_T1, ap, 0)); + if (dst.cls == RC_FP) { + /* For variadic FP args on RV64 LP64D, the value sits in the integer + * save area at the same bit pattern as a double bit-cast. Load and + * bitcast. */ + if (sz == 8) { + emit32(mc, rv_ld(RV_T2, RV_T1, 0)); + emit32(mc, rv_fmv_d_x(reg_num(dst), RV_T2)); + } else { + emit32(mc, rv_lw(RV_T2, RV_T1, 0)); + emit32(mc, rv_fmv_w_x(reg_num(dst), RV_T2)); + } + } else { + int sx = type_is_signed(ty); + emit32(mc, enc_int_load(sz, sx, reg_num(dst), RV_T1, 0)); + } + /* advance ap by 8 bytes. */ + emit32(mc, rv_addi(RV_T1, RV_T1, 8)); + emit32(mc, rv_sd(RV_T1, ap, 0)); } + static void rv_va_end_(CGTarget* t, Operand a) { - (void)a; - rv_panic(t, "va_end"); + (void)t; (void)a; } + static void rv_va_copy_(CGTarget* t, Operand d, Operand s) { - (void)d; - (void)s; - rv_panic(t, "va_copy"); + MCEmitter* mc = t->mc; + u32 dr = reg_num(d); + u32 sr = reg_num(s); + /* va_list is a single pointer (8 bytes). */ + emit32(mc, rv_ld(RV_T0, sr, 0)); + emit32(mc, rv_sd(RV_T0, dr, 0)); } -static void rv_atomic_load(CGTarget* t, Operand d, Operand a, MemAccess m, +/* ---- atomics (LL/SC + AMO) ---- */ + +static int mem_order_is_acquire(MemOrder o) { + return o == MO_ACQUIRE || o == MO_ACQ_REL || o == MO_SEQ_CST || o == MO_CONSUME; +} +static int mem_order_is_release(MemOrder o) { + return o == MO_RELEASE || o == MO_ACQ_REL || o == MO_SEQ_CST; +} + +static void rv_atomic_load(CGTarget* t, Operand dst, Operand addr, MemAccess ma, MemOrder o) { - (void)d; - (void)a; - (void)m; - (void)o; - rv_panic(t, "atomic_load"); -} -static void rv_atomic_store(CGTarget* t, Operand a, Operand s, MemAccess m, - MemOrder o) { - (void)a; - (void)s; - (void)m; - (void)o; - rv_panic(t, "atomic_store"); -} -static void rv_atomic_rmw(CGTarget* t, AtomicOp op, Operand d, Operand a, - Operand v, MemAccess m, MemOrder o) { - (void)op; - (void)d; - (void)a; - (void)v; - (void)m; - (void)o; - rv_panic(t, "atomic_rmw"); -} -static void rv_atomic_cas(CGTarget* t, Operand p, Operand ok, Operand a, - Operand e, Operand des, MemAccess m, MemOrder so, - MemOrder fo) { - (void)p; - (void)ok; - (void)a; - (void)e; - (void)des; - (void)m; - (void)so; - (void)fo; - rv_panic(t, "atomic_cas"); + MCEmitter* mc = t->mc; + u32 sf = (ma.size == 8) ? 1u : 0u; + /* Resolve address to a register. */ + u32 base; + if (addr.kind == OPK_REG) { + base = reg_num(addr); + } else if (addr.kind == OPK_LOCAL) { + i32 off; + base = addr_base(t, addr, &off, RV_T0); + if (off) { + emit32(mc, rv_addi(RV_T0, base, off)); + base = RV_T0; + } + } else { + compiler_panic(t->c, impl_of(t)->loc, "rv64 atomic_load: addr kind %d NYI", + (int)addr.kind); + } + if (mem_order_is_acquire(o)) { + /* lr.w/d as ordered load (aq=1, rl=0). */ + emit32(mc, sf ? rv_lr_d(reg_num(dst), base, 1, 0) + : rv_lr_w(reg_num(dst), base, 1, 0)); + } else { + emit32(mc, enc_int_load(ma.size, 0, reg_num(dst), base, 0)); + } +} + +static void rv_atomic_store(CGTarget* t, Operand addr, Operand src, + MemAccess ma, MemOrder o) { + MCEmitter* mc = t->mc; + u32 sf = (ma.size == 8) ? 1u : 0u; + u32 src_reg; + if (src.kind == OPK_IMM) { + emit_load_imm(mc, sf, RV_T1, src.v.imm); + src_reg = RV_T1; + } else if (src.kind == OPK_REG) { + src_reg = reg_num(src); + } else { + compiler_panic(t->c, impl_of(t)->loc, "rv64 atomic_store: src kind %d NYI", + (int)src.kind); + } + u32 base; + if (addr.kind == OPK_REG) { + base = reg_num(addr); + } else if (addr.kind == OPK_LOCAL) { + i32 off; + base = addr_base(t, addr, &off, RV_T0); + if (off) { emit32(mc, rv_addi(RV_T0, base, off)); base = RV_T0; } + } else { + compiler_panic(t->c, impl_of(t)->loc, "rv64 atomic_store: addr kind %d NYI", + (int)addr.kind); + } + if (mem_order_is_release(o)) { + /* fence rw,w; sw/sd src, 0(base). Conservative for SEQ_CST. */ + emit32(mc, rv_fence_rw_rw()); + emit32(mc, enc_int_store(ma.size, src_reg, base, 0)); + if (o == MO_SEQ_CST) emit32(mc, rv_fence_rw_rw()); + } else { + emit32(mc, enc_int_store(ma.size, src_reg, base, 0)); + } +} + +static void rv_atomic_rmw(CGTarget* t, AtomicOp op, Operand dst, Operand addr, + Operand val, MemAccess ma, MemOrder o) { + MCEmitter* mc = t->mc; + u32 sf = (ma.size == 8) ? 1u : 0u; + u32 base = RV_T0; + if (addr.kind == OPK_REG) { + emit32(mc, rv_addi(base, reg_num(addr), 0)); + } else if (addr.kind == OPK_LOCAL) { + i32 off; + u32 b = addr_base(t, addr, &off, RV_T0); + if (b != RV_T0 || off) { + emit32(mc, rv_addi(base, b, off)); + } + } else { + compiler_panic(t->c, impl_of(t)->loc, "rv64 atomic_rmw: addr NYI"); + } + u32 vreg = RV_T1; + if (val.kind == OPK_IMM) emit_load_imm(mc, sf, vreg, val.v.imm); + else if (val.kind == OPK_REG) emit32(mc, rv_addi(vreg, reg_num(val), 0)); + else compiler_panic(t->c, impl_of(t)->loc, "rv64 atomic_rmw: val kind NYI"); + + int aq = mem_order_is_acquire(o); + int rl = mem_order_is_release(o); + + /* LR/SC loop for any op (simpler than per-op AMO encodings, but AMO is + * preferred for the cases the corpus exercises). */ + MCLabel L_retry = mc->label_new(mc); + mc->label_place(mc, L_retry); + emit32(mc, sf ? rv_lr_d(reg_num(dst), base, (u32)aq, 0) + : rv_lr_w(reg_num(dst), base, (u32)aq, 0)); + u32 new_r = RV_T2; + switch (op) { + case AO_XCHG: emit32(mc, rv_addi(new_r, vreg, 0)); break; + case AO_ADD: emit32(mc, sf ? rv_add(new_r, reg_num(dst), vreg) : rv_addw(new_r, reg_num(dst), vreg)); break; + case AO_SUB: emit32(mc, sf ? rv_sub(new_r, reg_num(dst), vreg) : rv_subw(new_r, reg_num(dst), vreg)); break; + case AO_AND: emit32(mc, rv_and(new_r, reg_num(dst), vreg)); break; + case AO_OR: emit32(mc, rv_or(new_r, reg_num(dst), vreg)); break; + case AO_XOR: emit32(mc, rv_xor(new_r, reg_num(dst), vreg)); break; + case AO_NAND: + emit32(mc, rv_and(new_r, reg_num(dst), vreg)); + emit32(mc, rv_xori(new_r, new_r, -1)); + break; + default: emit32(mc, rv_addi(new_r, vreg, 0)); break; + } + /* sc.w/d t3, new_r, (base); bnez t3, retry. */ + emit32(mc, sf ? rv_sc_d(RV_T3, base, new_r, 0, (u32)rl) + : rv_sc_w(RV_T3, base, new_r, 0, (u32)rl)); + emit32(mc, rv_bne(RV_T3, RV_ZERO, 0)); + mc->emit_label_ref(mc, L_retry, R_RV_BRANCH, 4, 0); } + +static void rv_atomic_cas(CGTarget* t, Operand prior, Operand ok, Operand addr, + Operand exp, Operand des, MemAccess ma, + MemOrder succ, MemOrder fail) { + MCEmitter* mc = t->mc; + u32 sf = (ma.size == 8) ? 1u : 0u; + (void)fail; + u32 base = RV_T0; + if (addr.kind == OPK_REG) emit32(mc, rv_addi(base, reg_num(addr), 0)); + else if (addr.kind == OPK_LOCAL) { + i32 off; u32 b = addr_base(t, addr, &off, RV_T0); + if (b != RV_T0 || off) emit32(mc, rv_addi(base, b, off)); + } else compiler_panic(t->c, impl_of(t)->loc, "rv64 atomic_cas: addr NYI"); + u32 ereg = RV_T1, dreg = RV_T2; + if (exp.kind == OPK_IMM) emit_load_imm(mc, sf, ereg, exp.v.imm); + else emit32(mc, rv_addi(ereg, reg_num(exp), 0)); + if (des.kind == OPK_IMM) emit_load_imm(mc, sf, dreg, des.v.imm); + else emit32(mc, rv_addi(dreg, reg_num(des), 0)); + + int aq = mem_order_is_acquire(succ); + int rl = mem_order_is_release(succ); + + MCLabel L_retry = mc->label_new(mc); + MCLabel L_fail = mc->label_new(mc); + MCLabel L_done = mc->label_new(mc); + + mc->label_place(mc, L_retry); + emit32(mc, sf ? rv_lr_d(reg_num(prior), base, (u32)aq, 0) + : rv_lr_w(reg_num(prior), base, (u32)aq, 0)); + /* if (prior != expected) -> fail */ + emit32(mc, rv_bne(reg_num(prior), ereg, 0)); + mc->emit_label_ref(mc, L_fail, R_RV_BRANCH, 4, 0); + /* sc.w/d t3, des, (base); bnez t3, retry */ + emit32(mc, sf ? rv_sc_d(RV_T3, base, dreg, 0, (u32)rl) + : rv_sc_w(RV_T3, base, dreg, 0, (u32)rl)); + emit32(mc, rv_bne(RV_T3, RV_ZERO, 0)); + mc->emit_label_ref(mc, L_retry, R_RV_BRANCH, 4, 0); + /* ok = 1; jump done */ + emit_load_imm(mc, 0, reg_num(ok), 1); + emit32(mc, rv_jal(RV_ZERO, 0)); + mc->emit_label_ref(mc, L_done, R_RV_JAL, 4, 0); + + mc->label_place(mc, L_fail); + emit_load_imm(mc, 0, reg_num(ok), 0); + + mc->label_place(mc, L_done); +} + static void rv_fence(CGTarget* t, MemOrder o) { - (void)o; - rv_panic(t, "fence"); + if (o == MO_RELAXED) return; + emit32(t->mc, rv_fence_rw_rw()); } -static void rv_intrinsic(CGTarget* t, IntrinKind k, Operand* d, u32 nd, - const Operand* a, u32 na) { - (void)k; - (void)d; - (void)nd; - (void)a; - (void)na; - rv_panic(t, "intrinsic"); +/* ---- intrinsics: do what we can; panic on the rest. ---- */ +static void rv_intrinsic(CGTarget* t, IntrinKind kind, Operand* dsts, u32 nd, + const Operand* args, u32 na) { + (void)nd; (void)na; + MCEmitter* mc = t->mc; + RImpl* a = impl_of(t); + switch (kind) { + case INTRIN_ASSUME_ALIGNED: + case INTRIN_EXPECT: { + /* dst = val (hint dropped). */ + Operand val = args[0]; + Operand dst = dsts[0]; + u32 sf = type_is_64(dst.type) ? 1u : 0u; + if (val.kind == OPK_REG) { + if (reg_num(val) != reg_num(dst)) + emit32(mc, rv_addi(reg_num(dst), reg_num(val), 0)); + } else if (val.kind == OPK_IMM) { + emit_load_imm(mc, sf, reg_num(dst), val.v.imm); + } else { + compiler_panic(t->c, a->loc, "rv64 intrinsic: val kind %d NYI", + (int)val.kind); + } + return; + } + case INTRIN_PREFETCH: return; + case INTRIN_UNREACHABLE: + case INTRIN_TRAP: + emit32(mc, rv_ebreak()); + return; + case INTRIN_BSWAP16: { + /* rd = ((rs & 0xff) << 8) | ((rs >> 8) & 0xff) */ + u32 rd = reg_num(dsts[0]); + u32 rs = reg_num(args[0]); + emit32(mc, rv_slli(RV_T1, rs, 8)); /* t1 = rs << 8 */ + emit32(mc, rv_andi(RV_T1, RV_T1, 0)); /* placeholder */ + /* Use lui mask approach for portability: build mask 0xff00 in t2. */ + emit32(mc, rv_addi(RV_T2, RV_ZERO, 0)); + /* Simpler: 0xff00 fits in lui+addi pattern but is also small enough: + * we can build via shift: t2 = 0xff << 8 = (0xff << 8). */ + emit32(mc, rv_addi(RV_T2, RV_ZERO, 0xff)); + emit32(mc, rv_slli(RV_T2, RV_T2, 8)); + /* t1 = (rs << 8) & 0xff00 */ + emit32(mc, rv_slli(RV_T1, rs, 8)); + emit32(mc, rv_and(RV_T1, RV_T1, RV_T2)); + /* t3 = (rs >> 8) & 0xff (use srli on RV64 β€” high bits zeroed by + * preceding ANDI mask if input is uint16, but be safe and mask). */ + emit32(mc, rv_srli(RV_T3, rs, 8)); + emit32(mc, rv_andi(RV_T3, RV_T3, 0xff)); + emit32(mc, rv_or(rd, RV_T1, RV_T3)); + return; + } + case INTRIN_BSWAP32: { + u32 rd = reg_num(dsts[0]); + u32 rs = reg_num(args[0]); + /* result = (b0<<24)|(b1<<16)|(b2<<8)|b3, where bi = (rs >> (8*i)) & 0xff. */ + /* t1 = ((rs >> 24) & 0xff) */ + emit32(mc, rv_srliw(RV_T1, rs, 24)); + emit32(mc, rv_andi(RV_T1, RV_T1, 0xff)); + /* t2 = ((rs >> 16) & 0xff) << 8 */ + emit32(mc, rv_srliw(RV_T2, rs, 16)); + emit32(mc, rv_andi(RV_T2, RV_T2, 0xff)); + emit32(mc, rv_slli(RV_T2, RV_T2, 8)); + emit32(mc, rv_or(RV_T1, RV_T1, RV_T2)); + /* t2 = ((rs >> 8) & 0xff) << 16 */ + emit32(mc, rv_srliw(RV_T2, rs, 8)); + emit32(mc, rv_andi(RV_T2, RV_T2, 0xff)); + emit32(mc, rv_slli(RV_T2, RV_T2, 16)); + emit32(mc, rv_or(RV_T1, RV_T1, RV_T2)); + /* t2 = (rs & 0xff) << 24 */ + emit32(mc, rv_andi(RV_T2, rs, 0xff)); + emit32(mc, rv_slli(RV_T2, RV_T2, 24)); + emit32(mc, rv_or(rd, RV_T1, RV_T2)); + /* zero-extend to 32 bits if dest is u32 */ + emit32(mc, rv_slli(rd, rd, 32)); + emit32(mc, rv_srli(rd, rd, 32)); + return; + } + case INTRIN_BSWAP64: { + u32 rd = reg_num(dsts[0]); + u32 rs = reg_num(args[0]); + /* General bswap64: iterate over the 8 bytes. */ + /* t1 accumulator */ + emit32(mc, rv_addi(RV_T1, RV_ZERO, 0)); + for (int i = 0; i < 8; ++i) { + /* t2 = (rs >> (8*i)) & 0xff */ + if (i == 0) { + emit32(mc, rv_andi(RV_T2, rs, 0xff)); + } else { + emit32(mc, rv_srli(RV_T2, rs, (u32)(8 * i))); + emit32(mc, rv_andi(RV_T2, RV_T2, 0xff)); + } + /* t2 <<= (56 - 8*i) (so byte 0 goes to top) */ + int sh = 56 - 8 * i; + if (sh) emit32(mc, rv_slli(RV_T2, RV_T2, (u32)sh)); + emit32(mc, rv_or(RV_T1, RV_T1, RV_T2)); + } + emit32(mc, rv_addi(rd, RV_T1, 0)); + return; + } + case INTRIN_POPCOUNT: { + /* Software popcount. Use the bit-twiddling sequence on the + * appropriate width. dst type drives width. */ + u32 rd = reg_num(dsts[0]); + u32 rs = reg_num(args[0]); + int is64 = type_is_64(args[0].type); + /* Move rs into t1 to avoid clobbering input. */ + emit32(mc, rv_addi(RV_T1, rs, 0)); + if (!is64) { + /* zext.w t1, t1 */ + emit32(mc, rv_slli(RV_T1, RV_T1, 32)); + emit32(mc, rv_srli(RV_T1, RV_T1, 32)); + } + /* t1 = t1 - ((t1 >> 1) & 0x5555...) */ + emit32(mc, rv_srli(RV_T2, RV_T1, 1)); + emit_load_imm(mc, 1, RV_T3, is64 ? (i64)0x5555555555555555ll + : (i64)0x55555555); + emit32(mc, rv_and(RV_T2, RV_T2, RV_T3)); + emit32(mc, rv_sub(RV_T1, RV_T1, RV_T2)); + /* t1 = (t1 & 0x3333...) + ((t1 >> 2) & 0x3333...) */ + emit_load_imm(mc, 1, RV_T3, is64 ? (i64)0x3333333333333333ll + : (i64)0x33333333); + emit32(mc, rv_and(RV_T2, RV_T1, RV_T3)); + emit32(mc, rv_srli(RV_T1, RV_T1, 2)); + emit32(mc, rv_and(RV_T1, RV_T1, RV_T3)); + emit32(mc, rv_add(RV_T1, RV_T1, RV_T2)); + /* t1 = (t1 + (t1 >> 4)) & 0x0f0f... */ + emit32(mc, rv_srli(RV_T2, RV_T1, 4)); + emit32(mc, rv_add(RV_T1, RV_T1, RV_T2)); + emit_load_imm(mc, 1, RV_T3, is64 ? (i64)0x0f0f0f0f0f0f0f0fll + : (i64)0x0f0f0f0f); + emit32(mc, rv_and(RV_T1, RV_T1, RV_T3)); + /* t1 *= 0x0101010101... ; result in top byte */ + emit_load_imm(mc, 1, RV_T3, is64 ? (i64)0x0101010101010101ll + : (i64)0x01010101); + emit32(mc, rv_mul(RV_T1, RV_T1, RV_T3)); + /* shift right by (XLEN - 8) */ + emit32(mc, rv_srli(rd, RV_T1, is64 ? 56u : 24u)); + return; + } + case INTRIN_CTZ: { + /* ctz(x) = popcount((x & -x) - 1) for x != 0. */ + u32 rd = reg_num(dsts[0]); + u32 rs = reg_num(args[0]); + int is64 = type_is_64(args[0].type); + /* t1 = -x */ + emit32(mc, rv_sub(RV_T1, RV_ZERO, rs)); + /* t1 = x & -x */ + emit32(mc, rv_and(RV_T1, RV_T1, rs)); + /* t1 = t1 - 1 */ + emit32(mc, rv_addi(RV_T1, RV_T1, -1)); + if (!is64) { + emit32(mc, rv_slli(RV_T1, RV_T1, 32)); + emit32(mc, rv_srli(RV_T1, RV_T1, 32)); + } + /* popcount(t1) into rd */ + emit32(mc, rv_srli(RV_T2, RV_T1, 1)); + emit_load_imm(mc, 1, RV_T3, is64 ? (i64)0x5555555555555555ll + : (i64)0x55555555); + emit32(mc, rv_and(RV_T2, RV_T2, RV_T3)); + emit32(mc, rv_sub(RV_T1, RV_T1, RV_T2)); + emit_load_imm(mc, 1, RV_T3, is64 ? (i64)0x3333333333333333ll + : (i64)0x33333333); + emit32(mc, rv_and(RV_T2, RV_T1, RV_T3)); + emit32(mc, rv_srli(RV_T1, RV_T1, 2)); + emit32(mc, rv_and(RV_T1, RV_T1, RV_T3)); + emit32(mc, rv_add(RV_T1, RV_T1, RV_T2)); + emit32(mc, rv_srli(RV_T2, RV_T1, 4)); + emit32(mc, rv_add(RV_T1, RV_T1, RV_T2)); + emit_load_imm(mc, 1, RV_T3, is64 ? (i64)0x0f0f0f0f0f0f0f0fll + : (i64)0x0f0f0f0f); + emit32(mc, rv_and(RV_T1, RV_T1, RV_T3)); + emit_load_imm(mc, 1, RV_T3, is64 ? (i64)0x0101010101010101ll + : (i64)0x01010101); + emit32(mc, rv_mul(RV_T1, RV_T1, RV_T3)); + emit32(mc, rv_srli(rd, RV_T1, is64 ? 56u : 24u)); + return; + } + case INTRIN_CLZ: { + /* Software clz: fold the high bit downward, then popcount the + * inverted result. Standard recipe: + * x |= x>>1; x |= x>>2; x |= x>>4; x |= x>>8; x |= x>>16; + * [x |= x>>32;] // 64-bit + * clz = popcount(~x) [for the appropriate width]. + */ + u32 rd = reg_num(dsts[0]); + u32 rs = reg_num(args[0]); + int is64 = type_is_64(args[0].type); + emit32(mc, rv_addi(RV_T1, rs, 0)); + if (!is64) { + /* zero-ext to 32 to make srli safe */ + emit32(mc, rv_slli(RV_T1, RV_T1, 32)); + emit32(mc, rv_srli(RV_T1, RV_T1, 32)); + } + u32 shifts[6] = {1, 2, 4, 8, 16, 32}; + u32 ns = is64 ? 6u : 5u; + for (u32 i = 0; i < ns; ++i) { + emit32(mc, rv_srli(RV_T2, RV_T1, shifts[i])); + emit32(mc, rv_or(RV_T1, RV_T1, RV_T2)); + } + /* t1 = ~t1, then popcount and we want the (width - popcount) ... wait. + * Actually clz(x) for the folded x = popcount(~x). Let me verify. + * If x = 0b00011010, fold => 0b00011111. ~ => 0b11100000. + * popcount(~folded) = 3 = clz(0b00011010) βœ“. */ + emit32(mc, rv_xori(RV_T1, RV_T1, -1)); + if (!is64) { + emit32(mc, rv_slli(RV_T1, RV_T1, 32)); + emit32(mc, rv_srli(RV_T1, RV_T1, 32)); + } + /* popcount(t1) into rd */ + emit32(mc, rv_srli(RV_T2, RV_T1, 1)); + emit_load_imm(mc, 1, RV_T3, is64 ? (i64)0x5555555555555555ll + : (i64)0x55555555); + emit32(mc, rv_and(RV_T2, RV_T2, RV_T3)); + emit32(mc, rv_sub(RV_T1, RV_T1, RV_T2)); + emit_load_imm(mc, 1, RV_T3, is64 ? (i64)0x3333333333333333ll + : (i64)0x33333333); + emit32(mc, rv_and(RV_T2, RV_T1, RV_T3)); + emit32(mc, rv_srli(RV_T1, RV_T1, 2)); + emit32(mc, rv_and(RV_T1, RV_T1, RV_T3)); + emit32(mc, rv_add(RV_T1, RV_T1, RV_T2)); + emit32(mc, rv_srli(RV_T2, RV_T1, 4)); + emit32(mc, rv_add(RV_T1, RV_T1, RV_T2)); + emit_load_imm(mc, 1, RV_T3, is64 ? (i64)0x0f0f0f0f0f0f0f0fll + : (i64)0x0f0f0f0f); + emit32(mc, rv_and(RV_T1, RV_T1, RV_T3)); + emit_load_imm(mc, 1, RV_T3, is64 ? (i64)0x0101010101010101ll + : (i64)0x01010101); + emit32(mc, rv_mul(RV_T1, RV_T1, RV_T3)); + emit32(mc, rv_srli(rd, RV_T1, is64 ? 56u : 24u)); + return; + } + case INTRIN_ADD_OVERFLOW: + case INTRIN_SUB_OVERFLOW: { + /* dsts: [val, ovf]. Signed overflow check. + * For ADD: ovf = ((a XOR result) & (b XOR result)) >> (width-1) + * For SUB: ovf = ((a XOR b) & (a XOR result)) >> (width-1) */ + Operand a_op = args[0], b_op = args[1]; + Operand dval = dsts[0], dovf = dsts[1]; + int is64 = type_is_64(dval.type); + u32 ra = force_reg_int(t, a_op, RV_T0); + u32 rb = force_reg_int(t, b_op, (ra == RV_T0) ? RV_T1 : RV_T0); + u32 rd = reg_num(dval); + u32 rovf = reg_num(dovf); + /* Compute result into t2 (avoid clobbering rd if rd == ra/rb). */ + if (kind == INTRIN_ADD_OVERFLOW) { + emit32(mc, is64 ? rv_add(RV_T2, ra, rb) : rv_addw(RV_T2, ra, rb)); + } else { + emit32(mc, is64 ? rv_sub(RV_T2, ra, rb) : rv_subw(RV_T2, ra, rb)); + } + /* t3 = a XOR t2 */ + emit32(mc, rv_xor(RV_T3, ra, RV_T2)); + if (kind == INTRIN_ADD_OVERFLOW) { + /* t4 = b XOR t2 */ + emit32(mc, rv_xor(rovf, rb, RV_T2)); + emit32(mc, rv_and(rovf, rovf, RV_T3)); + } else { + /* t4 = a XOR b */ + emit32(mc, rv_xor(rovf, ra, rb)); + emit32(mc, rv_and(rovf, rovf, RV_T3)); + } + /* shift right to extract sign bit */ + u32 sh = is64 ? 63u : 31u; + emit32(mc, is64 ? rv_srli(rovf, rovf, sh) : rv_srliw(rovf, rovf, sh)); + emit32(mc, rv_andi(rovf, rovf, 1)); + /* Now write the value. */ + emit32(mc, rv_addi(rd, RV_T2, 0)); + return; + } + case INTRIN_MUL_OVERFLOW: { + /* SMULL: full 64-bit signed product of two i32s, then compare + * with sign-extend of low 32. For i64 inputs we panic for now. */ + Operand a_op = args[0], b_op = args[1]; + Operand dval = dsts[0], dovf = dsts[1]; + int is64 = type_is_64(dval.type); + if (is64) { + compiler_panic(t->c, a->loc, "rv64 intrinsic: mul_overflow i64 NYI"); + } + u32 ra = force_reg_int(t, a_op, RV_T0); + u32 rb = force_reg_int(t, b_op, (ra == RV_T0) ? RV_T1 : RV_T0); + u32 rd = reg_num(dval); + u32 rovf = reg_num(dovf); + /* Sign-extend inputs from 32 to 64. */ + emit32(mc, rv_addiw(RV_T2, ra, 0)); + emit32(mc, rv_addiw(RV_T3, rb, 0)); + /* Full 64-bit product */ + emit32(mc, rv_mul(RV_T2, RV_T2, RV_T3)); + /* sign-ext of low 32 of product */ + emit32(mc, rv_addiw(RV_T3, RV_T2, 0)); + /* ovf = (T2 != T3) */ + emit32(mc, rv_xor(rovf, RV_T2, RV_T3)); + emit32(mc, rv_sltu(rovf, RV_ZERO, rovf)); + /* dval = low 32, sign-extended */ + emit32(mc, rv_addiw(rd, RV_T2, 0)); + return; + } + case INTRIN_MEMCPY: + case INTRIN_MEMMOVE: { + Operand da = args[0], sa = args[1], nb = args[2]; + if (da.kind != OPK_REG || sa.kind != OPK_REG || nb.kind != OPK_IMM) { + compiler_panic(t->c, a->loc, + "rv64 intrinsic: memcpy/memmove non-const NYI"); + } + u32 dr = reg_num(da), sr = reg_num(sa), n = (u32)nb.v.imm; + if (kind == INTRIN_MEMCPY) { + u32 i = 0; + while (i + 8 <= n) { emit32(mc, rv_ld(RV_T3, sr, (i32)i)); emit32(mc, rv_sd(RV_T3, dr, (i32)i)); i += 8; } + while (i + 4 <= n) { emit32(mc, rv_lwu(RV_T3, sr, (i32)i)); emit32(mc, rv_sw(RV_T3, dr, (i32)i)); i += 4; } + while (i + 2 <= n) { emit32(mc, rv_lhu(RV_T3, sr, (i32)i)); emit32(mc, rv_sh(RV_T3, dr, (i32)i)); i += 2; } + while (i < n) { emit32(mc, rv_lbu(RV_T3, sr, (i32)i)); emit32(mc, rv_sb(RV_T3, dr, (i32)i)); i += 1; } + } else { + u32 i = n; + while (i >= 8) { i -= 8; emit32(mc, rv_ld(RV_T3, sr, (i32)i)); emit32(mc, rv_sd(RV_T3, dr, (i32)i)); } + while (i >= 4) { i -= 4; emit32(mc, rv_lwu(RV_T3, sr, (i32)i)); emit32(mc, rv_sw(RV_T3, dr, (i32)i)); } + while (i >= 2) { i -= 2; emit32(mc, rv_lhu(RV_T3, sr, (i32)i)); emit32(mc, rv_sh(RV_T3, dr, (i32)i)); } + while (i >= 1) { i -= 1; emit32(mc, rv_lbu(RV_T3, sr, (i32)i)); emit32(mc, rv_sb(RV_T3, dr, (i32)i)); } + } + return; + } + case INTRIN_MEMSET: { + Operand da = args[0], bv = args[1], nb = args[2]; + if (da.kind != OPK_REG || nb.kind != OPK_IMM) { + compiler_panic(t->c, a->loc, "rv64 intrinsic: memset non-const NYI"); + } + u32 dr = reg_num(da), n = (u32)nb.v.imm; + u32 src; + if (bv.kind == OPK_IMM) { + u32 byte = (u32)(bv.v.imm & 0xffu); + if (byte == 0) src = RV_ZERO; + else { + u64 b = byte; b |= b << 8; b |= b << 16; b |= b << 32; + emit_load_imm(mc, 1, RV_T3, (i64)b); + src = RV_T3; + } + } else { + compiler_panic(t->c, a->loc, "rv64 intrinsic: memset REG byte NYI"); + } + u32 i = 0; + while (i + 8 <= n) { emit32(mc, rv_sd(src, dr, (i32)i)); i += 8; } + while (i + 4 <= n) { emit32(mc, rv_sw(src, dr, (i32)i)); i += 4; } + while (i + 2 <= n) { emit32(mc, rv_sh(src, dr, (i32)i)); i += 2; } + while (i < n) { emit32(mc, rv_sb(src, dr, (i32)i)); i += 1; } + return; + } + default: + compiler_panic(t->c, a->loc, "rv64 intrinsic kind %d NYI", (int)kind); + } } + static void rv_asm_block(CGTarget* t, const char* tmpl, const AsmConstraint* outs, u32 no, Operand* oo, const AsmConstraint* ins, u32 ni, const Operand* io, const Sym* clobs, u32 nc) { - (void)tmpl; - (void)outs; - (void)no; - (void)oo; - (void)ins; - (void)ni; - (void)io; - (void)clobs; - (void)nc; + (void)tmpl; (void)outs; (void)no; (void)oo; + (void)ins; (void)ni; (void)io; (void)clobs; (void)nc; rv_panic(t, "asm_block"); } @@ -382,6 +2549,7 @@ CGTarget* rv64_cgtarget_new(Compiler* c, ObjBuilder* o, MCEmitter* m) { t->finalize = rv_finalize; t->destroy = rv_destroy; + (void)type_is_signed; compiler_defer(c, cgt_cleanup, t); return t; } diff --git a/src/arch/rv64_isa.h b/src/arch/rv64_isa.h @@ -0,0 +1,291 @@ +/* RV64 instruction encoders, RV64IMFD baseline. + * + * Only the subset used by arch/rv64.c lives here. The disassembler + * doesn't share these yet; if/when it does, a parallel rv64_isa.c + * will host the decode tables (mirroring aa64_isa.[ch]). */ + +#ifndef CFREE_RV64_ISA_H +#define CFREE_RV64_ISA_H + +#include "core/core.h" + +/* ---- Named registers (DWARF / psABI numbering matches HW) ---- */ +enum { + RV_X0 = 0, RV_ZERO = 0, + RV_X1 = 1, RV_RA = 1, + RV_X2 = 2, RV_SP = 2, + RV_X3 = 3, RV_GP = 3, + RV_X4 = 4, RV_TP = 4, + RV_X5 = 5, RV_T0 = 5, + RV_X6 = 6, RV_T1 = 6, + RV_X7 = 7, RV_T2 = 7, + RV_X8 = 8, RV_S0 = 8, RV_FP = 8, + RV_X9 = 9, RV_S1 = 9, + RV_X10 = 10, RV_A0 = 10, + RV_X11 = 11, RV_A1 = 11, + RV_X12 = 12, RV_A2 = 12, + RV_X13 = 13, RV_A3 = 13, + RV_X14 = 14, RV_A4 = 14, + RV_X15 = 15, RV_A5 = 15, + RV_X16 = 16, RV_A6 = 16, + RV_X17 = 17, RV_A7 = 17, + RV_X18 = 18, RV_S2 = 18, + RV_X27 = 27, RV_S11 = 27, + RV_X28 = 28, RV_T3 = 28, + RV_X29 = 29, RV_T4 = 29, + RV_X30 = 30, RV_T5 = 30, + RV_X31 = 31, RV_T6 = 31, +}; + +#define RV_NOP 0x00000013u /* ADDI x0, x0, 0 */ + +/* ---- Format helpers ---- + * + * R-type: funct7(31:25) rs2(24:20) rs1(19:15) funct3(14:12) rd(11:7) op(6:0) + * I-type: imm(31:20) rs1(19:15) funct3(14:12) rd(11:7) op(6:0) + * S-type: imm[11:5](31:25) rs2(24:20) rs1(19:15) funct3(14:12) imm[4:0](11:7) op(6:0) + * B-type: imm[12](31) imm[10:5](30:25) rs2(24:20) rs1(19:15) funct3(14:12) imm[4:1](11:8) imm[11](7) op(6:0) + * U-type: imm[31:12](31:12) rd(11:7) op(6:0) + * J-type: imm[20](31) imm[10:1](30:21) imm[11](20) imm[19:12](19:12) rd(11:7) op(6:0) + */ + +static inline u32 rv_r(u32 funct7, u32 rs2, u32 rs1, u32 funct3, u32 rd, + u32 op) { + return ((funct7 & 0x7fu) << 25) | ((rs2 & 0x1fu) << 20) | + ((rs1 & 0x1fu) << 15) | ((funct3 & 0x7u) << 12) | + ((rd & 0x1fu) << 7) | (op & 0x7fu); +} +static inline u32 rv_i(i32 imm12, u32 rs1, u32 funct3, u32 rd, u32 op) { + return (((u32)imm12 & 0xfffu) << 20) | ((rs1 & 0x1fu) << 15) | + ((funct3 & 0x7u) << 12) | ((rd & 0x1fu) << 7) | (op & 0x7fu); +} +static inline u32 rv_s(i32 imm12, u32 rs2, u32 rs1, u32 funct3, u32 op) { + u32 ui = (u32)imm12 & 0xfffu; + return ((ui >> 5) << 25) | ((rs2 & 0x1fu) << 20) | ((rs1 & 0x1fu) << 15) | + ((funct3 & 0x7u) << 12) | ((ui & 0x1fu) << 7) | (op & 0x7fu); +} +static inline u32 rv_b(i32 imm13, u32 rs2, u32 rs1, u32 funct3, u32 op) { + u32 ui = (u32)imm13; + return (((ui >> 12) & 1u) << 31) | (((ui >> 5) & 0x3fu) << 25) | + ((rs2 & 0x1fu) << 20) | ((rs1 & 0x1fu) << 15) | + ((funct3 & 0x7u) << 12) | (((ui >> 1) & 0xfu) << 8) | + (((ui >> 11) & 1u) << 7) | (op & 0x7fu); +} +static inline u32 rv_u(u32 imm32_hi20, u32 rd, u32 op) { + return (imm32_hi20 & 0xfffff000u) | ((rd & 0x1fu) << 7) | (op & 0x7fu); +} +static inline u32 rv_j(i32 imm21, u32 rd, u32 op) { + u32 ui = (u32)imm21; + return (((ui >> 20) & 1u) << 31) | (((ui >> 1) & 0x3ffu) << 21) | + (((ui >> 11) & 1u) << 20) | (((ui >> 12) & 0xffu) << 12) | + ((rd & 0x1fu) << 7) | (op & 0x7fu); +} + +/* ---- Integer ops (RV32I/RV64I) ---- */ + +#define RV_OP 0x33u +#define RV_OP_IMM 0x13u +#define RV_OP_32 0x3bu +#define RV_OP_IMM_32 0x1bu +#define RV_LUI 0x37u +#define RV_AUIPC 0x17u +#define RV_LOAD 0x03u +#define RV_STORE 0x23u +#define RV_BRANCH 0x63u +#define RV_JAL 0x6fu +#define RV_JALR 0x67u +#define RV_LOAD_FP 0x07u +#define RV_STORE_FP 0x27u +#define RV_OP_FP 0x53u +#define RV_AMO 0x2fu +#define RV_FENCE 0x0fu +#define RV_SYSTEM 0x73u + +static inline u32 rv_add(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x00, rs2, rs1, 0x0, rd, RV_OP); } +static inline u32 rv_sub(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x20, rs2, rs1, 0x0, rd, RV_OP); } +static inline u32 rv_sll(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x00, rs2, rs1, 0x1, rd, RV_OP); } +static inline u32 rv_slt(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x00, rs2, rs1, 0x2, rd, RV_OP); } +static inline u32 rv_sltu(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x00, rs2, rs1, 0x3, rd, RV_OP); } +static inline u32 rv_xor(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x00, rs2, rs1, 0x4, rd, RV_OP); } +static inline u32 rv_srl(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x00, rs2, rs1, 0x5, rd, RV_OP); } +static inline u32 rv_sra(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x20, rs2, rs1, 0x5, rd, RV_OP); } +static inline u32 rv_or(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x00, rs2, rs1, 0x6, rd, RV_OP); } +static inline u32 rv_and(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x00, rs2, rs1, 0x7, rd, RV_OP); } + +static inline u32 rv_addw(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x00, rs2, rs1, 0x0, rd, RV_OP_32); } +static inline u32 rv_subw(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x20, rs2, rs1, 0x0, rd, RV_OP_32); } +static inline u32 rv_sllw(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x00, rs2, rs1, 0x1, rd, RV_OP_32); } +static inline u32 rv_srlw(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x00, rs2, rs1, 0x5, rd, RV_OP_32); } +static inline u32 rv_sraw(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x20, rs2, rs1, 0x5, rd, RV_OP_32); } + +static inline u32 rv_addi(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x0, rd, RV_OP_IMM); } +static inline u32 rv_slti(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x2, rd, RV_OP_IMM); } +static inline u32 rv_sltiu(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x3, rd, RV_OP_IMM); } +static inline u32 rv_xori(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x4, rd, RV_OP_IMM); } +static inline u32 rv_ori(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x6, rd, RV_OP_IMM); } +static inline u32 rv_andi(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x7, rd, RV_OP_IMM); } + +/* Shift-immediate forms. RV64I uses a 6-bit shamt in bits 25:20 and a + * 6-bit funct6 in bits 31:26 (so the funct7-vs-shamt[5] split that + * rv_r() does is wrong here β€” we hand-assemble these). */ +static inline u32 rv_slli(u32 rd, u32 rs1, u32 sh) { + return (0x00u << 26) | ((sh & 0x3fu) << 20) | ((rs1 & 0x1fu) << 15) | + (0x1u << 12) | ((rd & 0x1fu) << 7) | RV_OP_IMM; +} +static inline u32 rv_srli(u32 rd, u32 rs1, u32 sh) { + return (0x00u << 26) | ((sh & 0x3fu) << 20) | ((rs1 & 0x1fu) << 15) | + (0x5u << 12) | ((rd & 0x1fu) << 7) | RV_OP_IMM; +} +static inline u32 rv_srai(u32 rd, u32 rs1, u32 sh) { + return (0x10u << 26) | ((sh & 0x3fu) << 20) | ((rs1 & 0x1fu) << 15) | + (0x5u << 12) | ((rd & 0x1fu) << 7) | RV_OP_IMM; +} + +static inline u32 rv_addiw(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x0, rd, RV_OP_IMM_32); } +static inline u32 rv_slliw(u32 rd, u32 rs1, u32 sh) { return rv_r(0x00, sh & 0x1fu, rs1, 0x1, rd, RV_OP_IMM_32); } +static inline u32 rv_srliw(u32 rd, u32 rs1, u32 sh) { return rv_r(0x00, sh & 0x1fu, rs1, 0x5, rd, RV_OP_IMM_32); } +static inline u32 rv_sraiw(u32 rd, u32 rs1, u32 sh) { return rv_r(0x20, sh & 0x1fu, rs1, 0x5, rd, RV_OP_IMM_32); } + +static inline u32 rv_lui(u32 rd, u32 imm20) { return ((imm20 & 0xfffffu) << 12) | ((rd & 0x1fu) << 7) | RV_LUI; } +static inline u32 rv_auipc(u32 rd, u32 imm20) { return ((imm20 & 0xfffffu) << 12) | ((rd & 0x1fu) << 7) | RV_AUIPC; } + +/* M extension */ +static inline u32 rv_mul(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x01, rs2, rs1, 0x0, rd, RV_OP); } +static inline u32 rv_mulh(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x01, rs2, rs1, 0x1, rd, RV_OP); } +static inline u32 rv_mulhsu(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x01, rs2, rs1, 0x2, rd, RV_OP); } +static inline u32 rv_mulhu(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x01, rs2, rs1, 0x3, rd, RV_OP); } +static inline u32 rv_div(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x01, rs2, rs1, 0x4, rd, RV_OP); } +static inline u32 rv_divu(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x01, rs2, rs1, 0x5, rd, RV_OP); } +static inline u32 rv_rem(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x01, rs2, rs1, 0x6, rd, RV_OP); } +static inline u32 rv_remu(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x01, rs2, rs1, 0x7, rd, RV_OP); } +static inline u32 rv_mulw(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x01, rs2, rs1, 0x0, rd, RV_OP_32); } +static inline u32 rv_divw(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x01, rs2, rs1, 0x4, rd, RV_OP_32); } +static inline u32 rv_divuw(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x01, rs2, rs1, 0x5, rd, RV_OP_32); } +static inline u32 rv_remw(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x01, rs2, rs1, 0x6, rd, RV_OP_32); } +static inline u32 rv_remuw(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x01, rs2, rs1, 0x7, rd, RV_OP_32); } + +/* Loads (funct3: 0=LB,1=LH,2=LW,3=LD,4=LBU,5=LHU,6=LWU) */ +static inline u32 rv_lb(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x0, rd, RV_LOAD); } +static inline u32 rv_lh(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x1, rd, RV_LOAD); } +static inline u32 rv_lw(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x2, rd, RV_LOAD); } +static inline u32 rv_ld(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x3, rd, RV_LOAD); } +static inline u32 rv_lbu(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x4, rd, RV_LOAD); } +static inline u32 rv_lhu(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x5, rd, RV_LOAD); } +static inline u32 rv_lwu(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x6, rd, RV_LOAD); } + +/* Stores (funct3: 0=SB,1=SH,2=SW,3=SD) */ +static inline u32 rv_sb(u32 rs2, u32 rs1, i32 imm) { return rv_s(imm, rs2, rs1, 0x0, RV_STORE); } +static inline u32 rv_sh(u32 rs2, u32 rs1, i32 imm) { return rv_s(imm, rs2, rs1, 0x1, RV_STORE); } +static inline u32 rv_sw(u32 rs2, u32 rs1, i32 imm) { return rv_s(imm, rs2, rs1, 0x2, RV_STORE); } +static inline u32 rv_sd(u32 rs2, u32 rs1, i32 imm) { return rv_s(imm, rs2, rs1, 0x3, RV_STORE); } + +/* Branches */ +static inline u32 rv_beq(u32 rs1, u32 rs2, i32 imm) { return rv_b(imm, rs2, rs1, 0x0, RV_BRANCH); } +static inline u32 rv_bne(u32 rs1, u32 rs2, i32 imm) { return rv_b(imm, rs2, rs1, 0x1, RV_BRANCH); } +static inline u32 rv_blt(u32 rs1, u32 rs2, i32 imm) { return rv_b(imm, rs2, rs1, 0x4, RV_BRANCH); } +static inline u32 rv_bge(u32 rs1, u32 rs2, i32 imm) { return rv_b(imm, rs2, rs1, 0x5, RV_BRANCH); } +static inline u32 rv_bltu(u32 rs1, u32 rs2, i32 imm) { return rv_b(imm, rs2, rs1, 0x6, RV_BRANCH); } +static inline u32 rv_bgeu(u32 rs1, u32 rs2, i32 imm) { return rv_b(imm, rs2, rs1, 0x7, RV_BRANCH); } + +/* Jumps */ +static inline u32 rv_jal(u32 rd, i32 imm21) { return rv_j(imm21, rd, RV_JAL); } +static inline u32 rv_jalr(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x0, rd, RV_JALR); } + +/* Convenience: jr / ret / j */ +static inline u32 rv_jr(u32 rs1) { return rv_jalr(RV_ZERO, rs1, 0); } +static inline u32 rv_ret_(void) { return rv_jalr(RV_ZERO, RV_RA, 0); } + +/* System */ +static inline u32 rv_ecall(void) { return rv_i(0, 0, 0, 0, RV_SYSTEM); } +static inline u32 rv_ebreak(void) { return rv_i(1, 0, 0, 0, RV_SYSTEM); } + +/* FENCE: pred/succ each 4 bits in imm[11:8]/imm[7:4]. fm bits 11:8 of imm */ +static inline u32 rv_fence_rw_rw(void) { + return rv_i((i32)0x033, 0, 0, 0, RV_FENCE); +} + +/* ---- FP (F + D extensions) ---- + * funct7 layout: bits[6:2] op-major (e.g. 0x00 FADD, 0x01 FSUB, ...); + * bits[1:0] = fmt (00=S, 01=D). rm (rounding mode) in funct3; 0x7 = DYN. */ + +#define RV_FMT_S 0u +#define RV_FMT_D 1u + +static inline u32 rv_fadd(u32 fmt, u32 rd, u32 rs1, u32 rs2) { + return rv_r((0x00u << 2) | fmt, rs2, rs1, 0x7, rd, RV_OP_FP); +} +static inline u32 rv_fsub(u32 fmt, u32 rd, u32 rs1, u32 rs2) { + return rv_r((0x01u << 2) | fmt, rs2, rs1, 0x7, rd, RV_OP_FP); +} +static inline u32 rv_fmul(u32 fmt, u32 rd, u32 rs1, u32 rs2) { + return rv_r((0x02u << 2) | fmt, rs2, rs1, 0x7, rd, RV_OP_FP); +} +static inline u32 rv_fdiv(u32 fmt, u32 rd, u32 rs1, u32 rs2) { + return rv_r((0x03u << 2) | fmt, rs2, rs1, 0x7, rd, RV_OP_FP); +} +/* FSGNJ.fmt rd, rs1, rs2 β€” used to implement FMV.fmt rd, rs (sgnj rs, rs). */ +static inline u32 rv_fsgnj(u32 fmt, u32 rd, u32 rs1, u32 rs2) { + return rv_r((0x04u << 2) | fmt, rs2, rs1, 0x0, rd, RV_OP_FP); +} +/* FCVT β€” integer/FP conversions. funct7 = 0x18..0x1d depending on direction; + * rs2 encodes the partner type: + * 0x60(W <- S) 0x61(W <- D) + * 0x68(S <- W) 0x69(D <- W) etc + * We assemble explicitly via rv_r to be obvious. */ +static inline u32 rv_fcvt(u32 funct7, u32 rs2_sel, u32 rd, u32 rs1, u32 rm) { + return rv_r(funct7, rs2_sel, rs1, rm, rd, RV_OP_FP); +} +/* FCVT.W.S rd, rs1 (signed i32 from f32, rtz=001) : funct7=0x60 rs2=0 */ +static inline u32 rv_fcvt_w_s(u32 rd, u32 rs1) { return rv_fcvt(0x60, 0x0, rd, rs1, 0x1); } +static inline u32 rv_fcvt_wu_s(u32 rd, u32 rs1) { return rv_fcvt(0x60, 0x1, rd, rs1, 0x1); } +static inline u32 rv_fcvt_l_s(u32 rd, u32 rs1) { return rv_fcvt(0x60, 0x2, rd, rs1, 0x1); } +static inline u32 rv_fcvt_lu_s(u32 rd, u32 rs1) { return rv_fcvt(0x60, 0x3, rd, rs1, 0x1); } +static inline u32 rv_fcvt_w_d(u32 rd, u32 rs1) { return rv_fcvt(0x61, 0x0, rd, rs1, 0x1); } +static inline u32 rv_fcvt_wu_d(u32 rd, u32 rs1) { return rv_fcvt(0x61, 0x1, rd, rs1, 0x1); } +static inline u32 rv_fcvt_l_d(u32 rd, u32 rs1) { return rv_fcvt(0x61, 0x2, rd, rs1, 0x1); } +static inline u32 rv_fcvt_lu_d(u32 rd, u32 rs1) { return rv_fcvt(0x61, 0x3, rd, rs1, 0x1); } +static inline u32 rv_fcvt_s_w(u32 rd, u32 rs1) { return rv_fcvt(0x68, 0x0, rd, rs1, 0x7); } +static inline u32 rv_fcvt_s_wu(u32 rd, u32 rs1) { return rv_fcvt(0x68, 0x1, rd, rs1, 0x7); } +static inline u32 rv_fcvt_s_l(u32 rd, u32 rs1) { return rv_fcvt(0x68, 0x2, rd, rs1, 0x7); } +static inline u32 rv_fcvt_s_lu(u32 rd, u32 rs1) { return rv_fcvt(0x68, 0x3, rd, rs1, 0x7); } +static inline u32 rv_fcvt_d_w(u32 rd, u32 rs1) { return rv_fcvt(0x69, 0x0, rd, rs1, 0x7); } +static inline u32 rv_fcvt_d_wu(u32 rd, u32 rs1) { return rv_fcvt(0x69, 0x1, rd, rs1, 0x7); } +static inline u32 rv_fcvt_d_l(u32 rd, u32 rs1) { return rv_fcvt(0x69, 0x2, rd, rs1, 0x7); } +static inline u32 rv_fcvt_d_lu(u32 rd, u32 rs1) { return rv_fcvt(0x69, 0x3, rd, rs1, 0x7); } +/* FCVT.S.D / FCVT.D.S */ +static inline u32 rv_fcvt_s_d(u32 rd, u32 rs1) { return rv_fcvt(0x20, 0x1, rd, rs1, 0x7); } +static inline u32 rv_fcvt_d_s(u32 rd, u32 rs1) { return rv_fcvt(0x21, 0x0, rd, rs1, 0x7); } + +/* FMV.X.W / FMV.W.X / FMV.X.D / FMV.D.X β€” bitcast between GPR and FPR. */ +static inline u32 rv_fmv_x_w(u32 rd, u32 rs1) { return rv_fcvt(0x70, 0x0, rd, rs1, 0x0); } +static inline u32 rv_fmv_w_x(u32 rd, u32 rs1) { return rv_fcvt(0x78, 0x0, rd, rs1, 0x0); } +static inline u32 rv_fmv_x_d(u32 rd, u32 rs1) { return rv_fcvt(0x71, 0x0, rd, rs1, 0x0); } +static inline u32 rv_fmv_d_x(u32 rd, u32 rs1) { return rv_fcvt(0x79, 0x0, rd, rs1, 0x0); } + +/* FP compares β€” rd is integer GPR. funct7 = 0x50/0x51 (S/D). rm: 0=LE, 1=LT, 2=EQ. */ +static inline u32 rv_feq_s(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x50, rs2, rs1, 0x2, rd, RV_OP_FP); } +static inline u32 rv_flt_s(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x50, rs2, rs1, 0x1, rd, RV_OP_FP); } +static inline u32 rv_fle_s(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x50, rs2, rs1, 0x0, rd, RV_OP_FP); } +static inline u32 rv_feq_d(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x51, rs2, rs1, 0x2, rd, RV_OP_FP); } +static inline u32 rv_flt_d(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x51, rs2, rs1, 0x1, rd, RV_OP_FP); } +static inline u32 rv_fle_d(u32 rd, u32 rs1, u32 rs2) { return rv_r(0x51, rs2, rs1, 0x0, rd, RV_OP_FP); } + +static inline u32 rv_flw(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x2, rd, RV_LOAD_FP); } +static inline u32 rv_fld(u32 rd, u32 rs1, i32 imm) { return rv_i(imm, rs1, 0x3, rd, RV_LOAD_FP); } +static inline u32 rv_fsw(u32 rs2, u32 rs1, i32 imm) { return rv_s(imm, rs2, rs1, 0x2, RV_STORE_FP); } +static inline u32 rv_fsd(u32 rs2, u32 rs1, i32 imm) { return rv_s(imm, rs2, rs1, 0x3, RV_STORE_FP); } + +/* ---- A extension (LR/SC + AMO) ---- + * AMO funct7 layout: aq(26) rl(25) funct5(31:27) op-specific. + * funct3 selects width: 0x2 = W (32-bit), 0x3 = D (64-bit). */ +static inline u32 rv_amo(u32 funct5, u32 aq, u32 rl, u32 rd, u32 rs1, u32 rs2, + u32 funct3) { + u32 funct7 = (funct5 << 2) | ((aq & 1u) << 1) | (rl & 1u); + return rv_r(funct7, rs2, rs1, funct3, rd, RV_AMO); +} +static inline u32 rv_lr_w(u32 rd, u32 rs1, u32 aq, u32 rl) { return rv_amo(0x02, aq, rl, rd, rs1, 0, 0x2); } +static inline u32 rv_lr_d(u32 rd, u32 rs1, u32 aq, u32 rl) { return rv_amo(0x02, aq, rl, rd, rs1, 0, 0x3); } +static inline u32 rv_sc_w(u32 rd, u32 rs1, u32 rs2, u32 aq, u32 rl) { return rv_amo(0x03, aq, rl, rd, rs1, rs2, 0x2); } +static inline u32 rv_sc_d(u32 rd, u32 rs1, u32 rs2, u32 aq, u32 rl) { return rv_amo(0x03, aq, rl, rd, rs1, rs2, 0x3); } + +#endif /* CFREE_RV64_ISA_H */ diff --git a/test/cg/harness/cases.c b/test/cg/harness/cases.c @@ -230,7 +230,7 @@ void build_q11_addr_of_helper_through_global(CgTestCtx*); const CgCase cg_cases[] = { /* MC-only */ - {"mc_smoke", build_mc_smoke, 42, CG_CASE_MC_ONLY}, + {"mc_smoke", build_mc_smoke, 42, CG_CASE_MC_ONLY, CG_ARCH_AARCH64}, /* Group A β€” function lifecycle and return */ {"a01_return_const_42", build_a01_return_const_42, 42, CG_CASE_DEFAULT}, diff --git a/test/cg/harness/cg_runner.c b/test/cg/harness/cg_runner.c @@ -349,6 +349,7 @@ static int mode_arches(const char* name) { unsigned arches = cc->arches ? cc->arches : (unsigned)CG_ARCH_DEFAULT; if (arches & CG_ARCH_AARCH64) fputs("aarch64\n", stdout); if (arches & CG_ARCH_X64) fputs("x64\n", stdout); + if (arches & CG_ARCH_RV64) fputs("rv64\n", stdout); return 0; } diff --git a/test/cg/harness/cg_test.h b/test/cg/harness/cg_test.h @@ -70,7 +70,11 @@ typedef enum { enum { CG_ARCH_AARCH64 = 1u << 0, CG_ARCH_X64 = 1u << 1, - CG_ARCH_DEFAULT = CG_ARCH_AARCH64, + CG_ARCH_RV64 = 1u << 2, + /* Default = portable across all implemented backends. Cases that emit + * hand-crafted bytes for a specific arch (mc_smoke today) must set + * their arch mask explicitly. */ + CG_ARCH_DEFAULT = CG_ARCH_AARCH64 | CG_ARCH_X64 | CG_ARCH_RV64, }; typedef struct CgCase { diff --git a/test/cg/run.sh b/test/cg/run.sh @@ -305,6 +305,15 @@ for OPT_LEVEL in $OPT_LEVELS; do work="$BUILD_DIR/$WORK_SUB/$name" mkdir -p "$work" + # Filter cases whose declared arch mask excludes the test arch. + # cg-runner --arches NAME prints one token per arch the case + # supports; skip if our $EXEC_ARCH isn't listed. + case_arches="$("${CG_RUN[@]}" --arches "$name" 2>/dev/null)" + if [ -n "$case_arches" ] && \ + ! printf '%s\n' "$case_arches" | grep -qx "$EXEC_ARCH"; then + continue + fi + expected="$("${CG_RUN[@]}" --expected "$name" 2>/dev/null)" expected="${expected:-0}" # Exit codes are mod 256 on POSIX; mask the expected the same way so