kit

git clone https://git.ryansepassi.com/git/kit.git

commit b9d9f14016146421df4aad8dc36cb35279db6258
parent 972d2c69b2cde627f555c57732534d1452a281f0
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Wed,  3 Jun 2026 11:46:44 -0700

plan: RV32

Diffstat:
A doc/plan/RV32.md | 385 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 385 insertions(+), 0 deletions(-)

diff --git a/doc/plan/RV32.md b/doc/plan/RV32.md
@@ -0,0 +1,385 @@
+# Plan: RISC-V 32-bit (`riscv32-none-elf`) support
+
+## Context
+
+`kit` today targets `riscv64` (LP64D) via a single backend in `src/arch/rv64/`. We want a
+new cross target:
+
+```
+--target=riscv32-none-elf
+-march=rv32imafc_zicsr_zifencei
+-mabi=ilp32f          (and also -mabi=ilp32, soft-float)
+-mcmodel=medlow
+```
+
+This is a freestanding 32-bit RISC-V toolchain target: F (single-precision hardware float)
+but **no D**, so `double` and `long long` are not native and must be lowered. The enum
+`KIT_ARCH_RV32`, the `riscv32` triple parse (`driver/lib/target.c:275`, `ptr_size=4`), ELF
+auto-detection (`src/api/object_detect.c`), and the runtime source files
+(`rt/lib/riscv/rv32.S`, `rt/lib/coro/riscv32.c`) already exist but are unwired/incomplete.
+
+The intended outcome: `kit cc/as/ld/objdump/disas` produce and consume correct
+`riscv32-none-elf` ELFCLASS32 objects and static executables for both `ilp32f` and `ilp32`,
+with `libkit_rt.a` builtins available and the JIT `run`/`dbg` plumbing wired (native
+execution host-gated, as for rv64).
+
+## Confirmed scope decisions
+
+- **Shared backend**: refactor `src/arch/rv64/` into **one XLEN-parameterized RISC-V backend**
+  serving both rv32 and rv64 from a single tree. RV64 must not regress and is re-validated.
+- **Subsystems in scope**: compile + assemble + link + disasm; runtime lib; JIT `run`/`dbg`.
+  **Emulator is out of scope** (`src/emu`, `src/os`, `src/obj/elf/emu_load.c` stay rv64-only).
+- **ABIs**: `ilp32f` (single hard-float: `float` in `fa0-fa7`, `double`/`i64` via integer
+  regs + soft-float) **and** `ilp32` (pure soft-float). `double` is always soft-float.
+- **Code model**: accept and validate `-mcmodel=medlow`/`medany`, but keep the existing
+  PC-relative (`auipc` + `R_RV_PCREL_HI20/LO12`, GOT for externs) addressing for v1. No new
+  absolute-addressing path.
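The PC-relative path kept for v1 pairs an `auipc` (upper 20 bits) with a signed 12-bit low part, which is what the `R_RV_PCREL_HI20/LO12` pair encodes. A minimal sketch of that split follows; the helper names are hypothetical and are not taken from kit's sources.

```c
#include <stdint.h>

/* Hypothetical helpers (not kit's API): split a signed 32-bit PC-relative
 * offset into the 20-bit auipc immediate and the signed 12-bit low part. */

/* Upper part. Adding 0x800 first compensates for the sign extension of the
 * low part, so that (hi20 << 12) + lo12 reproduces the original offset. */
static int32_t pcrel_hi20(int32_t off) {
  return (int32_t)((((uint32_t)off + 0x800u) >> 12) & 0xfffffu);
}

/* Low 12 bits, sign-extended into the range [-2048, 2047]. */
static int32_t pcrel_lo12(int32_t off) {
  int32_t lo = (int32_t)((uint32_t)off & 0xfffu);
  return lo >= 0x800 ? lo - 0x1000 : lo;
}

/* Recombine the way the instruction pair does: auipc contributes
 * hi20 << 12, the following addi/load/store adds the signed lo12. */
static int32_t pcrel_join(int32_t hi20, int32_t lo12) {
  return (int32_t)(((uint32_t)hi20 << 12) + (uint32_t)lo12);
}
```

The rounding step matters at the boundary: an offset of `0x800` splits into `hi20 = 1`, `lo12 = -2048`, not `hi20 = 0`, `lo12 = 2048` (which does not fit in 12 signed bits).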
+
+## XLEN-parameterization mechanism
+
+Add a `const RiscvVariant*` descriptor (immutable, two static instances selected by
+`KitArchKind`) carried on the per-function codegen context and threaded into the otherwise
+stateless decode/asm/disasm/link/dbg paths. This honors "no global state: everything hangs
+off a context struct" (the variant is a const table reached through a context, never ambient).
+
+New `src/arch/riscv/variant.h`:
+
+```c
+typedef struct RiscvVariant {
+  KitArchKind kind;        /* KIT_ARCH_RV32 / KIT_ARCH_RV64 */
+  const char* name;        /* "rv32" / "rv64" */
+  const char* isa_prefix;  /* "rv32" / "rv64" (for -march parsing) */
+  u8 xlen;                 /* 32 / 64 */
+  u8 ptr_bytes;            /* 4 / 8 (pointer & native register width) */
+  u8 gp_slot_bytes;        /* 4 / 8 (varargs save & callee-save stride) */
+  u8 has_w_forms;          /* 0 rv32 / 1 rv64 (ADDW/ADDIW/SLLIW/...) */
+  u8 shamt_bits;           /* 5 rv32 / 6 rv64 (SLLI/SRLI/SRAI immediate) */
+  u32 frame_save_size;     /* 2 * ptr_bytes (8 rv32 / 16 rv64) */
+} RiscvVariant;
+const RiscvVariant* riscv_variant_for_kind(KitArchKind);
+```
+
+Reached via: `RvNativeTarget.variant` (codegen), `riscv_variant_for_kind(c->target.arch)` in
+the decoder/assembler/disassembler/dbg (they already hold a `Compiler*`), and two
+`LinkArchDesc` literals for the linker. Distinguish **three different "8"s** carefully:
+`ptr_bytes` (pointer/reg width), `gp_slot_bytes` (ABI save stride), and `frame_save_size`
+(saved ra+s0 pair, two slots). Conflating them passes rv64 (all derived from 8) and breaks rv32.
+
+The **float ABI** (soft vs single-hard) is a separate axis from XLEN, carried on
+`KitTargetSpec.float_abi` (see WS4), consumed by the ABI classifier and predefined macros.
+
+## Workstreams (ordered; each leaves a green targeted check)
+
+### WS0: Config + variant scaffold (no behavior change)
+- `include/kit/config.h`: add `#define KIT_ARCH_RV32_ENABLED 1` (`mk/config.mk` auto-parses
+  it into a make var, so no `config.mk` edit is needed).
+- Add `src/arch/riscv/variant.h` with the struct + two `const` instances + lookup.
+- **Gate**: `make lib` compiles.
+
+### WS1: Directory rename + thread variant through codegen (rv64 still identical)
+- `git mv src/arch/rv64 src/arch/riscv`; fix include guards/paths. Update `mk/lib_srcs.mk:55,189`
+  (`LIB_SRCS_ARCH_RV64` → `LIB_SRCS_ARCH_RISCV`, gated by `RV32 || RV64`). The only external
+  referent is the symbol `arch_impl_rv64` in `src/arch/registry.c` (path-independent).
+- Keep file names and internal `rv64_`/`rv_` symbol prefixes for v1 (cosmetic rename is a
+  separate follow-up; renaming 2000+ sites is pure regression risk).
+- `src/arch/riscv/native.c`: add `const RiscvVariant* variant` to `RvNativeTarget` (set from
+  `c->target.arch` in the one constructor); replace hardcoded 8/16/`RV_FRAME_SAVE_SIZE`/
+  `addiw`/`ld`/`sd`/float-fmt sites with variant reads. **With the rv64 variant the emitted
+  bytes are byte-for-byte identical**, which isolates the "sharing" regression from rv32
+  correctness. Key sites (file `src/arch/riscv/native.c`): `rv_emit_li32` (LUI+ADDIW→ADDI when
+  `!has_w_forms`), `enc_int_load/store` (sw/lw vs sd/ld by `ptr_bytes`), `RV_FRAME_SAVE_SIZE`,
+  varargs save area, callee-save stride, `rv_type_size`/`align` defaults, `rv_convert` sext/zext
+  (`xlen - src_bits` shift; `addiw` fast-path only when `has_w_forms`).
+- **Gate**: `make test-smoke-rv64`, `test/arch/rv64_decode_test.c`, `test/asm/regen-rv64.sh`,
+  `test/link/rv64_jit_test.c` all byte-identical green.
+
+### WS2: ISA / asm / disasm / link / dbg XLEN parameterization (still rv64-only at runtime)
+- `src/arch/riscv/isa.c`/`isa.h`: add a one-byte **availability mask** column to `Rv64InsnDesc`
+  (`RV_AV_RV32 | RV_AV_RV64`) rather than a second table. Mark RV64-only: W-forms
+  (`addw/subw/sllw/srlw/sraw`, `addiw/slliw/srliw/sraiw`, `mulw/divw/divuw/remw/remuw`),
+  64-bit mem (`ld/sd/lwu`), 64-bit FP int conv (`fcvt.*.l/lu`, `fmv.x.d/d.x`), compressed
+  `c.addiw/c.addw`, and the RV64 meaning of `c.ld/c.sd/c.ldsp/c.sdsp/c.fld/c.fsd/...`. Enable
+  RV32-only: `c.jal` (shares the encoding that is `c.addiw` on rv64) and `c.flw/c.fsw` (share
+  the encodings that are `c.ld/c.sd` on rv64). `c.lw/c.sw` are common to both and stay unmasked.
+- `src/arch/riscv/disasm.c` + the compressed decoder `rv64_disasm_find_c`: pass the variant in;
+  branch the ambiguous compressed quadrant encodings and the **5-bit vs 6-bit shamt** decode
+  (`& 0x1f` on rv32, reject bit 25 set). `rv64_disasm_find`/`rv64_asm_find` skip rows by mask.
+- `src/arch/riscv/link.c`: split into `link_arch_rv64` and a new `link_arch_rv32`; the rv32
+  PLT/IPLT stubs use `rv_lw` instead of `rv_ld` (re-check stub sizes/offsets for 4-byte slots).
+- `src/arch/riscv/dbg.c`: parameterize the displaced-step shim by `ptr_bytes`; set
+  `min_insn_len=2, max_insn_len=4` for rv32 (C ext on); RVC control-flow falls back to
+  step-over (`KIT_UNSUPPORTED`), 4-byte fixups reuse the rv64 builder.
+- **Gate**: `make test-isa`, `regen-rv64.sh`, `rv64_jit_test` still green.
+
+### WS3: rv32 ArchImpl + registry + `-march` + predefined macros
+- `src/arch/riscv/arch.c`: define **both** `arch_impl_rv32` and `arch_impl_rv64` (share
+  `cgtarget_new/asm_new/disasm_new/decode/dwarf/dbg/asm_ops`/register file; differ in `.kind`,
+  `.name`, `.link`, `.predefined_macros`, `.target_feature_*`, and `cfi_data_align_factor`
+  -4 vs -8).
+- Generalize `rv64_target_feature_apply_isa` (currently hard-requires the `"rv64"` prefix,
+  `arch.c:204`) to compare against `variant->isa_prefix`. rv32 default profile =
+  `rv32imafc_zicsr_zifencei` (I/M/A/F/C/Zicsr/Zifencei, **D cleared**).
+- Predefined macros for rv32 (float-abi-dependent, see WS4): `__riscv_xlen=32`,
+  `__ILP32__`/`_ILP32` (drop `__LP64__`/`_LP64`), `__riscv_float_abi_single` (ilp32f) **or**
+  `__riscv_float_abi_soft` (ilp32) instead of `_double`, `__riscv_flen=32` when F present.
+- `src/arch/registry.c`: register `arch_impl_rv32` under `#if KIT_ARCH_RV32_ENABLED`
+  (`:24,50,57`); `arch_kind_name` already returns "riscv32".
+- **Gate**: `kit cc -target riscv32-none-elf -march=rv32imafc_zicsr_zifencei -E -dM` shows the
+  right macros; `kit mc`/`disas -target riscv32-none-elf` round-trips a hand-written rv32 insn.
+
+### WS4: ABI vtable refactor + `-mabi` plumbing
+- **New spec field**: in `include/kit/core.h` add `enum KitFloatAbi {DEFAULT, SOFT, SINGLE,
+  DOUBLE}` and `uint8_t float_abi` on `KitTargetSpec`; add `KitSlice abi` to `KitTargetOptions`.
+- **Driver `-mabi`**: in `driver/lib/target.c`, intercept `-mabi=`/`-mabi` in
+  `driver_target_features_try_consume` **before** the catch-all `-m<x>` fallback (which would
+  otherwise mis-eat it), mirroring `-march` at `:154-165`; carry through `driver_target_options`.
+  Add `medlow`→`KIT_CM_SMALL`, `medany`→`KIT_CM_MEDIUM` aliases in `cc_record_mcmodel`
+  (`driver/cmd/cc.c:751`) and `run_record_mcmodel` (`driver/cmd/run.c:379`).
+- **Resolve + validate** in `kit_target_new` (`src/api/core.c`), after `-march` features are
+  known: parse `ilp32|ilp32f|ilp32d|lp64|lp64f|lp64d`; if omitted, derive from `-march`
+  (D→DOUBLE, F-no-D→SINGLE, else SOFT); **reject** `*f` without F and `*d` without D. So
+  `rv32imafc` defaults to `ilp32f`, and `ilp32d` is rejected.
+- **Shared ABI classifier**: generalize `src/abi/abi_rv64.c` into a RISC-V classifier
+  parameterized by a descriptor `{xlen_bytes (=ptr_size), gpr_bytes, aggregate_gpr_bytes=2*gpr,
+  flen (0/4/8), float_abi}` read from `a->c->target`. Replace the `RV64_ABI_*_BYTES=8/16` enum.
+  - FP-eligibility predicate `fp_eligible(desc, size)`: SOFT never; SINGLE iff `size==4`
+    (float; `double` 8>flen4 → INT pair); DOUBLE iff `size<=8` (preserves rv64 LP64D).
+  - `classify_scalar`: i8/16/32/ptr → 1 INT part; `i64`/soft-`double` → **2 INT parts of 4 in
+    a GPR pair** (even-aligned only when passed as a variadic argument, per the RISC-V psABI);
+    `float` (ilp32f) → 1 FP part (fa0-fa7). Replace the hardcoded `size==16 → 2×8` with
+    `nparts = size/gpr_bytes`.
+  - `classify_aggregate`: register threshold `2*gpr_bytes` (8 on rv32), chunk by `gpr_bytes`;
+    HFA refinement gated by `fp_eligible`.
+  - va_list: `ABI_VA_LIST_POINTER`, `gp_reg_count=8`, `gp_slot_size=4`, `fp_reg_count=0`
+    (**FP varargs always go via INT regs even under ilp32f**). Two thin static vtables
+    (`rv32_vtable`, `rv64_vtable`) sharing the classifier, differing only in the va_list literal.
+- `src/abi/registry.c`: add `KIT_ABI_RV32_ENABLED` and an `{KIT_ARCH_RV32, KIT_OBJ_ELF,
+  &rv32_vtable}` entry (one entry serves both ilp32/ilp32f; the float axis is read from the spec).
+- **Gate**: ABI classification golden tests (`test/api/abi_classify_test.c` style) for rv32
+  ilp32f and ilp32.
+
+### WS5: ELFCLASS32 object emission + reading (largest item)
+Introduce one `is32`/`ElfEnc` flag (from `c->target.ptr_size`) threaded through, **not**
+copy-paste duplication.
+- `src/obj/elf/elf.h`: add `ELFCLASS32`, `ELF32_{EHDR,PHDR,SHDR}_SIZE` (52/32/40),
+  `ELF32_SYM_SIZE`(16)/`ELF32_RELA_SIZE`(12), `ELF32_R_INFO(s,t)=((s)<<8)|((t)&0xff)`,
+  `ELF32_R_SYM/TYPE`.
+- `src/obj/elf/emit.c`: replace the `ptr_size != 8` panic (`:271`); branch sym record (16B,
+  different field widths) and rela record (12B, `ELF32_R_INFO`) writers; `EI_CLASS`
+  (`:664`); Ehdr/Shdr address fields via `elf_wr_u32` and ELF32 sizes; e_flags from
+  `float_abi` (`EF_RISCV_FLOAT_ABI_SINGLE`/`_SOFT` | `EF_RISCV_RVC`).
+- `src/obj/elf/read.c`: accept `ELFCLASS32` (`:446,814`); add `parse_shdr32`/`parse_sym32`/
+  rela32 with the correct offsets/strides and `ELF32_R_SYM/TYPE`. Scope v1 to ET_REL +
+  ET_EXEC reads; give ELF32 ET_DYN a clear "unsupported" rather than mis-parse.
+- `src/obj/elf/link.c`: ELF32 ET_EXEC writer (parallel parameterization to emit.c), needed
+  for `ld`/`run`/`dbg`. `link_dyn.c` and `emu_load.c` stay rv64/ELF64-only: gate rv32 to
+  static linking (freestanding `-none-elf` defaults to `KIT_PIC_NONE`), panic-with-diagnostic
+  for rv32 dynamic.
+- New `src/obj/elf/reloc_riscv32.c`: clone `reloc_riscv64.c`; map `R_ABS32`→`ELF_R_RISCV_32`,
+  and `R_ABS64`/`R_RV_ADD64`/`R_RV_SUB64`→unsupported; reuse all XLEN-neutral kinds.
+- `src/obj/registry.c`: add the rv32 `obj_elf_arch_ops` entry. **EM_RISCV is shared by rv32
+  and rv64**: disambiguate reloc-table selection by `EI_CLASS`, not e_machine alone.
+- **Gate**: new `test/elf/unit/rv32_class32.c` write-then-read round-trip; `kit objdump`/`nm`
+  on a hand-built rv32 `.o`.
+
+### WS6: 64-bit-int + soft-float-double legalization (hardest part)
+The cg layer (`src/cg/arith.c`) only routes wide ops to libcalls for the `__int128` builtin
+(`api_i128_stack_top`), **never by width**, so `long long` on rv32 currently reaches the
+backend as a raw 8-byte value, and `double` arithmetic would emit illegal `.d` ops.
+- **64-bit integers on rv32**: generalize the i128 libcall mechanism in `src/cg/arith.c` to a
+  "wider than target word" predicate (`type_size > c->target.ptr_size`). Recommended v1:
+  route `mul/div/udiv/mod/shifts` to runtime libcalls (`__muldi3`, `__divdi3`, `__udivdi3`,
+  `__moddi3`, `__ashldi3`, `__lshrdi3`, `__ashrdi3`); do `add/sub/and/or/xor/load/store/move`
+  inline as register pairs in the backend (these are unavoidable for memory/arg traffic).
+  Add a **loud panic** in `rv_binop`/`rv_convert` if a wide value reaches the native-width
+  path, so any missed case fails fast.
+- **Soft-float `double` on ilp32f/ilp32**: route `double` arithmetic and `double`↔int/float
+  conversions to libcalls (`__adddf3`, `__subdf3`, `__muldf3`, `__divdf3`, `__extendsfdf2`,
+  `__truncdfsf2`, `__fixdfsi`, `__floatsidf`, df compares), mirroring the existing f128 path so
+  the backend only ever sees `float` (S) FP ops. Backend panics on any `RV_FMT_D` selection
+  when `xlen==32`.
+- Confirm `long double == double` (8B) and `__int128` absent on rv32 (runtime sets
+  `INT128=0`, no `LDBL128`), so the 16-byte scalar classify path is effectively dead there.
+- **Gate**: red-green targeted tests: `long long` add/mul/div and `double` add/mul/convert
+  compile to plausible sequences (verified via decode/disasm; behavior via qemu if available).
+
+### WS7: Runtime build wiring (`mk/rt.mk`)
+- The `riscv32-elf` / `riscv32-elf-save-restore` variants exist but are **wrong**:
+  `-mabi=ilp32 -march=rv32imafd` (D present). Fix to the confirmed profile and add the
+  hard-float variant:
+  - `riscv32-elf` (ilp32, soft): `-mabi=ilp32 -march=rv32imac`.
+  - `riscv32-elf-hardfloat` (ilp32f): `-mabi=ilp32f -march=rv32imafc`.
+  - Both keep `ABI=ilp32` (the *integer* layout → `rt/lib/include/ilp32_le`; `f` only affects
+    FP arg passing), `INT128=0`, `CORO=riscv32`.
+- Mandatory builtins are already selected: `RT_ABI_SRCS_ilp32 = rt/lib/int32/int32.c` (64-bit
+  int helpers) and `rt/lib/fp/fp.c` (soft `double`). Verify the df soft-float ops compile for
+  the rv32 target.
+- `mk/lib_srcs.mk`: widen the ABI/reloc source guards to include `KIT_ARCH_RV32_ENABLED`; add
+  `reloc_riscv32.c` to the ELF source group.
+- **Gate**: `kit cc -target riscv32-none-elf -c rt/lib/.../smoke` builds; `make rt` produces
+  the rv32 runtime variants.
+
+### WS8: JIT `run` / `dbg`
+`kit run`/`dbg` execute JIT bytes **natively in-process** (`run.c` `entry_fn(...)`); there is
+no cross-arch execution path (emulator is out of scope). So on a non-rv32 host, rv32 code
+cannot be executed, the same situation as rv64's existing JIT test, which builds the image and
+**skips the call** (exit 77).
+- `src/link/link_jit.c`: audit only; it is XLEN-neutral and patches via shared `R_RV_*`
+  reloc kinds; the only u64/TLV slots are Mach-O-guarded (ELF never reaches them). No change
+  expected, provided WS2/WS6 emit the same reloc kinds.
+- `rv32_dbg_ops` from WS2 (RVC-aware lengths, step-over fallback).
+- **v1 deliverable**: JIT image build + relocation + symbol lookup wired and unit-tested
+  without execution; native execution host-gated to rv32 hosts.
+
+### WS9: Tests & verification (see Verification below)
+
+## Parallel workstream map
+
+Much of this is separable. Lock a small set of **shared interfaces first** (Phase A), then five
+tracks proceed in parallel (Phase B), converging at integration (Phase C). The critical path is
+Phase A → Track 1 (the backend chain WS1→WS2→WS3); ELF32 (Track 2) is the largest *effort* but is
+parallel, so starting it immediately keeps it off the wall-clock critical path.
+
+**Phase A - shared contracts (serial, small, land first; unblocks everyone):**
+- **WS0** `RiscvVariant` + `riscv_variant_for_kind` + `KIT_ARCH_RV32_ENABLED`.
+- **WS4a** the float-ABI interface only: `KitFloatAbi` enum, `KitTargetSpec.float_abi`,
+  `KitTargetOptions.abi`, and the `-mabi`/`-mcmodel` parse → resolve → validate plumbing
+  (`driver/lib/target.c`, `driver/cmd/cc.c`, `src/api/core.c`). No classifier change yet.
+
+The five contracts everyone codes against (freeze these in Phase A):
+1. **`RiscvVariant`** fields (XLEN/ptr_bytes/gp_slot_bytes/has_w_forms/shamt_bits/frame_save_size),
+   consumed by Track 1.
+2. **`float_abi`** on the spec, consumed by Track 2 (e_flags), Track 3 (FP-eligibility),
+   Track 5 (soft-double), and WS3 (predefined macros).
+3. **Reloc-kind list**: the exact `R_RV_*` kinds rv32 codegen emits = the set rv32 ELF maps and
+   `link_jit` expects (= existing rv64 set minus `R_*64`/`ADD64`/`SUB64`). Track 1 ↔ Track 2.
+4. **Runtime libcall names** (`__adddf3`, `__muldf3`, `__fixdfsi`, `__floatsidf`, `__extendsfdf2`,
+   `__truncdfsf2`, `__muldi3`, `__divdi3`, `__udivdi3`, `__moddi3`, `__ashldi3`, `__lshrdi3`,
+   `__ashrdi3`) emitted by WS6 = provided by WS7. Track 5 ↔ Track 4.
+5. **ABI part-layout**: i64/soft-`double` → GPR pair (even-aligned only for variadic arguments);
+   `gp_slot_size=4`; callee-save stride. Track 3 publishes it via the vtable; Track 1's
+   native-frame code consumes it.
+
+**Phase B - parallel tracks (each independently testable):**
+- **Track 1 - Backend (critical path, serial within):** WS1 (rename + thread variant, rv64
+  byte-identical) → WS2 (ISA/asm/disasm/link/dbg XLEN param) → WS3 (rv32 ArchImpl + `-march` +
+  macros). Gate per step against rv64 regression, then rv32 mc/disas round-trip.
+- **Track 2 - ELF32 (WS5):** fully independent of codegen: develop and test the ELFCLASS32
+  writer/reader via a hand-built `ObjBuilder` for `KIT_ARCH_RV32` (`test/elf/unit/rv32_class32.c`
+  write→read roundtrip). Only consumes `float_abi` (e_flags) + the reloc list. Largest effort;
+  start day one.
+- **Track 3 - ABI classifier (WS4b):** the shared RISC-V classifier + `rv32_vtable`, parameterized
+  by the descriptor. Independent of codegen; test via `test/api/abi_classify_test.c` for ilp32f
+  and ilp32. Consumes `RiscvVariant`/`float_abi`.
+- **Track 4 - Runtime (WS7):** `mk/rt.mk` fixes (correct `-march`/`-mabi`, add hardfloat variant)
+  + `mk/lib_srcs.mk` guards. The edits are independent and land early; the `make rt` *validation*
+  gates on Track 1 codegen.
+- **Track 5 - cg legalization (WS6):** wide-int + soft-`double` → libcall routing in
+  `src/cg/arith.c`, keyed on `ptr_size`/`float_abi`. Logic is independent; end-to-end validation
+  needs Track 1 + Track 4. Highest correctness risk; design early against the libcall contract.
+
+**Phase C - integration (after tracks converge):**
+- Register `arch_impl_rv32` (Track 1 + Track 3). Wire object registry (Track 2).
+- **WS8** JIT `run`/`dbg` audit + `rv32_dbg_ops` (Track 1 + Track 2).
+- **WS9** end-to-end: decode/asm goldens, `kit cc → ld → qemu` smoke (all tracks + WS6 + WS7).
+
+## Verification
+
+### Verified execution oracle (clang + qemu-system, confirmed working on this host)
+
+clang 22 has the `riscv32` target and `llvm-objdump`/`llvm-mc`/`ld.lld` are installed.
+**qemu user-mode is not built on macOS** (only `qemu-system-riscv32` is), which suits a
+freestanding `-none-elf` target. A confirmed working recipe (PASS→exit 0, wrong answer→exit 7,
+hang→exit 124), to be mirrored by `test/smoke/rv32.sh`:
+- Build: `clang --target=riscv32-unknown-elf -march=rv32imafc -mabi=ilp32f -nostdlib -ffreestanding`
+  (and an `ilp32`/`rv32imac` soft variant); link `ld.lld -Ttext=0x80000000 -e _start`.
+- Startup stub (`_start`): set `sp` (RAM at `0x80000000`); **for ilp32f set `mstatus.FS`**
+  (`li t0,0x2000; csrs mstatus,t0`) to enable the FPU before any `fadd.s`; otherwise it traps
+  and hangs. Soft `ilp32` skips this.
+- Result via SiFive test finisher at `0x100000`: `0x5555`→qemu poweroff exit 0;
+  `0x3333|(code<<16)`→qemu exit `code`.
+- Run: `qemu-system-riscv32 -machine virt -bios none -kernel prog.elf -nographic -no-reboot`
+  (wrap in `timeout`). Verified that clang emits the expected `fadd.s` + inline 64-bit `add`/`sltu`
+  + `fcvt.w.s` for ilp32f, and `llvm-readelf` shows ELF32 / "single-float ABI" / RVC flags.
+
+This is the kit smoke: `kit cc -target riscv32-none-elf ... -c app.c`, assemble the startup stub,
+`kit ld` to an ELF, run under qemu-system, assert exit 0. Unlike rv64 (qemu-user/podman), rv32
+uses qemu-system + a bare-metal startup + finisher device. `regen-rv32.sh` uses
+`clang --target=riscv32 + llvm-objdump` for asm/disasm goldens.
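The finisher protocol in the recipe above is small enough to sketch in C. The function names here are hypothetical and not part of kit or the smoke test; the device pointer is passed in as a parameter so the encoding can be checked on a host, whereas on the qemu `virt` machine it would be `(volatile uint32_t *)0x100000`.

```c
#include <stdint.h>

/* Values from the recipe above: 0x5555 powers off with exit 0,
 * 0x3333 | (code << 16) makes qemu exit with `code`. */
enum { FINISHER_PASS = 0x5555, FINISHER_FAIL = 0x3333 };

/* Hypothetical helper: the 32-bit word to store to the finisher device. */
static uint32_t finisher_word(uint32_t code) {
  if (code == 0) {
    return FINISHER_PASS;
  }
  /* The exit code lives in the upper 16 bits of the word. */
  return FINISHER_FAIL | ((code & 0xffffu) << 16);
}

/* Hypothetical helper: report a result through the device at `dev`.
 * On target, `dev` would be the finisher's MMIO address. */
static void finisher_exit(volatile uint32_t *dev, uint32_t code) {
  *dev = finisher_word(code);
}
```

A freestanding test `main` would end with something like `finisher_exit((volatile uint32_t *)0x100000, result_ok ? 0 : 7)`, matching the "wrong answer→exit 7" convention of the recipe.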
+
+### Milestones
+
+kit has no in-process rv32 execution path (emulator out of scope), so behavioral correctness
+comes from the **clang+qemu-system oracle above**; structural correctness comes from
+**self-consistency** (decode↔format, ELF write↔read). Milestone order (each green before the
+next), preferring targeted runs and redirecting output to a file (per CLAUDE.md):
+
+1. **Build/register**: `make lib 2>&1 | tee /tmp/build.log`; target recognized.
+2. **Decode/encode self-roundtrip**: new `test/arch/rv32_decode_test.c` (mirror
+   `rv64_decode_test.c`): no W-forms, `lw/sw` (no `ld/sd`), 5-bit shamt, `c.jal`,
+   `c.lw/c.sw`, `c.flw/c.fsw`; decode↔format agreement is the oracle.
+   `make test-isa 2>&1 | tee /tmp/isa.log`.
+3. **Assembler/disasm corpus**: `test/asm/` rv32 lane + `regen-rv32.sh` (clang
+   `--target=riscv32-unknown-elf -march=rv32imafc -mabi=ilp32f` + `llvm-objdump` as reference,
+   maintainer-only, soft-skip if absent; committed goldens replayed by CI).
+   `make test-asm-rv32 2>&1 | tee /tmp/asm32.log`.
+4. **ELF32 round-trip**: `test/elf/unit/rv32_class32.c` (first ELFCLASS32 consumer):
+   write→read-back, assert `EI_CLASS==ELFCLASS32`, `Elf32_Sym`/`Elf32_Rela` survive.
+   `make test-elf 2>&1 | tee /tmp/elf.log`.
+5. **Compile + inspect** (no execution):
+   `./build/kit cc -target riscv32-none-elf -march=rv32imafc_zicsr_zifencei -mabi=ilp32f -c
+   smoke.c -o /tmp/rv32.o` then `./build/kit disas /tmp/rv32.o` (optional cross-check
+   `llvm-objdump -d --triple=riscv32 /tmp/rv32.o`).
+6. **Link + JIT image**: new `test/link/rv32_jit_test.c` (mirror `rv64_jit_test.c`, exit 77
+   on non-rv32 host; include a PC-relative reloc to exercise HI20/LO12 pairing). `kit ld` to a
+   static ELF executable succeeds.
+7. **qemu-system smoke**: `test/smoke/rv32.sh` using the verified oracle above
+   (`qemu-system-riscv32 -machine virt`, FPU-enabling startup for ilp32f, SiFive finisher exit
+   codes). Compiles `app.c` with `kit cc -target riscv32-none-elf`, links with the startup stub,
+   runs under qemu, asserts exit 0. This is the **only behavioral oracle** (soft-double and
+   64-bit-int correctness are otherwise untestable), so make it a required CI gate where
+   `qemu-system-riscv32` is present; skip-if-absent elsewhere. Add a doctor
+   (`test/lib/check_rv32_env.sh`) like rv64's.
+
+New make targets next to their rv64 peers in `test/test.mk`: `RV32_DECODE_TEST_BIN` (into
+`test-isa`), `test-asm-rv32`, `test-rv32-jit`, `test-smoke-rv32`, and `rv32` added to the
+runtime test arch list.
+
+**RV64 regression gate** (run after WS1 and again at the end):
+`make test-isa test-asm-rv64 test-smoke-rv64 test-link` + `rv64_jit_test`.
+
+## Risks
+
+1. **64-bit-int + soft-double on rv32 (WS6) is the deepest, execution-only risk.** Carry/borrow
+   chains and soft-float rounding can't be checked by byte-goldens; only execution catches
+   valid-but-wrong codegen. The behavioral oracle (qemu-system, verified) closes this, but
+   depends on `qemu-system-riscv32` being present and a correct FPU-enabling startup stub for
+   ilp32f (a missing `mstatus.FS` set silently hangs instead of failing cleanly). Mitigate with
+   qemu-gated differential tests (kit result vs host double/int64) and loud backend panics on any
+   wide/`.d` value reaching the native path.
+2. **ELFCLASS32 (WS5) is the dominant effort** (~130 Elf64-hardcoded sites across emit/read/link).
+   The write-then-read self-oracle catches internal inconsistency but not spec divergence; keep
+   one clang-oracle `cases/` rv32 ELF test for an independent cross-check. Disambiguating
+   EM_RISCV by `EI_CLASS` is a cross-cutting correctness point.
+3. **Sharing risk to RV64 (WS1/WS2)**: repurposing `rv_is_64` semantics, the `RV_FRAME_SAVE_SIZE`
+   constant→`2*ptr_bytes`, and the compressed-quadrant/shamt branches all touch the working
+   rv64 path. Land WS1/WS2 with rv64-byte-identical output and prove zero diff before enabling
+   rv32.
+4. **`-mabi` boundary**: parsed in `driver/`, validated in `src/api/core.c` where feature words
+   exist. Every spec-construction site that bypasses `kit_target_new` must default
+   `float_abi=DEFAULT` safely; the catch-all `-m` consumer must not pre-eat `-mabi`.
+5. **ilp32 vs ilp32f confusion**: `ilp32` is the *integer* ABI (type widths); the `f` is float
+   arg-passing only. The runtime `ABI=ilp32` include set is correct for both; the existing
+   `-march=rv32imafd` (D) is wrong and must become `rv32imafc`/`rv32imac`.
+6. **RVC dbg gap**: rv32imafc emits compressed insns pervasively; v1 step-over fallback degrades
+   `kit dbg` single-step for rv32. The shim unit test must assert the fallback path is taken.
+
+## Critical files
+
+- `src/arch/riscv/` (renamed from `rv64/`): `variant.h` (new), `native.c`, `isa.c/.h`,
+  `disasm.c`, `asm.c`, `link.c`, `dbg.c`, `arch.c` (two ArchImpls, `-march`, macros).
+- `src/abi/abi_rv64.c` → shared RISC-V classifier + `rv32_vtable`; `src/abi/registry.c`.
+- `src/cg/arith.c`: wide-int + soft-double legalization (WS6, the riskiest, currently absent).
+- `src/obj/elf/{elf.h,emit.c,read.c,link.c}` + new `reloc_riscv32.c`; `src/obj/registry.c`.
+- `include/kit/core.h` (`KitFloatAbi`, `KitTargetSpec.float_abi`, `KitTargetOptions.abi`),
+  `include/kit/config.h` (`KIT_ARCH_RV32_ENABLED`).
+- `driver/lib/target.c`, `driver/cmd/cc.c`, `driver/cmd/run.c` (`-mabi`, `medlow/medany`);
+  `src/api/core.c` (resolve/validate).
+- `src/arch/registry.c`, `mk/rt.mk`, `mk/lib_srcs.mk`, `test/test.mk` + new test files.
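The `-mabi` resolve/validate step listed for `src/api/core.c` (WS4) can be pictured with the sketch below. It is an illustration under stated assumptions, not kit's implementation: the enum and function names are hypothetical, the ISA features are reduced to two booleans, and rejecting an `lp64*` name on rv32 (or `ilp32*` on rv64) is an assumption the plan does not spell out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the planned KitFloatAbi enum. */
typedef enum { FABI_DEFAULT, FABI_SOFT, FABI_SINGLE, FABI_DOUBLE } FloatAbi;

/* Resolve the float ABI from an optional -mabi string (NULL if omitted) and
 * the F/D features already derived from -march. Returns false on rejection. */
static bool resolve_float_abi(const char *mabi, int xlen, bool has_f,
                              bool has_d, FloatAbi *out) {
  if (mabi == NULL) {
    /* Omitted: derive from -march (D -> DOUBLE, F without D -> SINGLE). */
    *out = has_d ? FABI_DOUBLE : has_f ? FABI_SINGLE : FABI_SOFT;
    return true;
  }
  /* Assumption: the integer part of the name must match XLEN. */
  const char *base = (xlen == 32) ? "ilp32" : "lp64";
  size_t n = strlen(base);
  if (strncmp(mabi, base, n) != 0) {
    return false;
  }
  const char *suffix = mabi + n;
  if (suffix[0] == '\0') {
    *out = FABI_SOFT;
    return true;
  }
  if (strcmp(suffix, "f") == 0 && has_f) {
    *out = FABI_SINGLE;
    return true;
  }
  if (strcmp(suffix, "d") == 0 && has_d) {
    *out = FABI_DOUBLE;
    return true;
  }
  /* Unknown suffix, or *f without F, or *d without D. */
  return false;
}
```

With `rv32imafc` (F present, D absent) an omitted `-mabi` resolves to the single-float ABI and `ilp32d` is rejected, which matches the behavior the plan states for WS4.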