commit b9d9f14016146421df4aad8dc36cb35279db6258
parent 972d2c69b2cde627f555c57732534d1452a281f0
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Wed, 3 Jun 2026 11:46:44 -0700
plan: RV32
Diffstat:
A doc/plan/RV32.md | 385 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 385 insertions(+), 0 deletions(-)
diff --git a/doc/plan/RV32.md b/doc/plan/RV32.md
@@ -0,0 +1,385 @@
+# Plan: RISC-V 32-bit (`riscv32-none-elf`) support
+
+## Context
+
+`kit` today targets `riscv64` (LP64D) via a single backend in `src/arch/rv64/`. We want a
+new cross target:
+
+```
+--target=riscv32-none-elf
+-march=rv32imafc_zicsr_zifencei
+-mabi=ilp32f (and also -mabi=ilp32, soft-float)
+-mcmodel=medlow
+```
+
+This is a freestanding 32-bit RISC-V toolchain target: F (single-precision hardware float)
+but **no D**, and XLEN is 32, so neither `double` nor `long long` is native; both must be lowered. The enum
+`KIT_ARCH_RV32`, the `riscv32` triple parse (`driver/lib/target.c:275`, `ptr_size=4`), ELF
+auto-detection (`src/api/object_detect.c`), and the runtime source files
+(`rt/lib/riscv/rv32.S`, `rt/lib/coro/riscv32.c`) already exist but are unwired/incomplete.
+
+The intended outcome: `kit cc/as/ld/objdump/disas` produce and consume correct
+`riscv32-none-elf` ELFCLASS32 objects and static executables for both `ilp32f` and `ilp32`,
+with `libkit_rt.a` builtins available and the JIT `run`/`dbg` plumbing wired (native
+execution host-gated, as for rv64).
+
+## Confirmed scope decisions
+
+- **Shared backend**: refactor `src/arch/rv64/` into **one XLEN-parameterized RISC-V backend**
+ serving both rv32 and rv64 from a single tree. RV64 must not regress and is re-validated.
+- **Subsystems in scope**: compile + assemble + link + disasm; runtime lib; JIT `run`/`dbg`.
+ **Emulator is out of scope** (`src/emu`, `src/os`, `src/obj/elf/emu_load.c` stay rv64-only).
+- **ABIs**: `ilp32f` (single hard-float: `float` in `fa0-fa7`, `double`/`i64` via integer
+ regs + soft-float) **and** `ilp32` (pure soft-float). `double` is always soft-float.
+- **Code model**: accept and validate `-mcmodel=medlow`/`medany`, but keep the existing
+ PC-relative (`auipc` + `R_RV_PCREL_HI20/LO12`, GOT for externs) addressing for v1. No new
+ absolute-addressing path.
+
+## XLEN-parameterization mechanism
+
+Add a `const RiscvVariant*` descriptor (immutable, two static instances selected by
+`KitArchKind`) carried on the per-function codegen context and threaded into the otherwise
+stateless decode/asm/disasm/link/dbg paths. This honors "no global state — everything hangs
+off a context struct" (the variant is a const table reached through a context, never ambient).
+
+New `src/arch/riscv/variant.h`:
+
+```c
+typedef struct RiscvVariant {
+ KitArchKind kind; /* KIT_ARCH_RV32 / KIT_ARCH_RV64 */
+ const char* name; /* "rv32" / "rv64" */
+ const char* isa_prefix; /* "rv32" / "rv64" — for -march parsing */
+ u8 xlen; /* 32 / 64 */
+ u8 ptr_bytes; /* 4 / 8 — pointer & native register width */
+ u8 gp_slot_bytes; /* 4 / 8 — varargs save & callee-save stride */
+ u8 has_w_forms; /* 0 rv32 / 1 rv64 — ADDW/ADDIW/SLLIW/... */
+ u8 shamt_bits; /* 5 rv32 / 6 rv64 — SLLI/SRLI/SRAI immediate */
+ u32 frame_save_size; /* 2 * ptr_bytes (8 rv32 / 16 rv64) */
+} RiscvVariant;
+const RiscvVariant* riscv_variant_for_kind(KitArchKind);
+```
+
+Reached via: `RvNativeTarget.variant` (codegen), `riscv_variant_for_kind(c->target.arch)` in
+the decoder/assembler/disassembler/dbg (they already hold a `Compiler*`), and two
+`LinkArchDesc` literals for the linker. Distinguish **three different "8"s** carefully —
+`ptr_bytes` (pointer/reg width), `gp_slot_bytes` (ABI save stride), and `frame_save_size`
+(saved ra+s0 pair) — conflating them passes rv64 (all 8) and breaks rv32.
+
+The **float ABI** (soft vs single-hard) is a separate axis from XLEN, carried on
+`KitTargetSpec.float_abi` (see WS4), consumed by the ABI classifier and predefined macros.
+
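A minimal sketch of the two static instances and the lookup described in this section. The field values come from the struct comments in the plan; the `KitArchKind` stand-in enum, the `u8`/`u32` typedefs, the instance names, and the `NULL` return for non-RISC-V kinds are assumptions made only to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint32_t u32;

/* Stand-in for the real enum (assumption): only the two RISC-V kinds matter here. */
typedef enum KitArchKind { KIT_ARCH_NONE, KIT_ARCH_RV32, KIT_ARCH_RV64 } KitArchKind;

typedef struct RiscvVariant {
  KitArchKind kind;
  const char* name;
  const char* isa_prefix;
  u8 xlen;
  u8 ptr_bytes;
  u8 gp_slot_bytes;
  u8 has_w_forms;
  u8 shamt_bits;
  u32 frame_save_size;
} RiscvVariant;

/* Immutable tables: nothing here is mutated after load, so sharing them
 * through a context pointer does not introduce global state. */
static const RiscvVariant riscv_variant_rv32 = {
  .kind = KIT_ARCH_RV32, .name = "rv32", .isa_prefix = "rv32",
  .xlen = 32, .ptr_bytes = 4, .gp_slot_bytes = 4,
  .has_w_forms = 0, .shamt_bits = 5, .frame_save_size = 8,
};

static const RiscvVariant riscv_variant_rv64 = {
  .kind = KIT_ARCH_RV64, .name = "rv64", .isa_prefix = "rv64",
  .xlen = 64, .ptr_bytes = 8, .gp_slot_bytes = 8,
  .has_w_forms = 1, .shamt_bits = 6, .frame_save_size = 16,
};

/* Returns NULL for non-RISC-V kinds (assumed behavior; the real lookup may panic). */
const RiscvVariant* riscv_variant_for_kind(KitArchKind kind) {
  switch (kind) {
    case KIT_ARCH_RV32: return &riscv_variant_rv32;
    case KIT_ARCH_RV64: return &riscv_variant_rv64;
    default: return NULL;
  }
}
```

On rv64 `ptr_bytes`, `gp_slot_bytes`, and half of `frame_save_size` all equal 8, which is why a site that reads the wrong field still passes there and only fails on rv32.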
+## Workstreams (ordered; each leaves a green targeted check)
+
+### WS0 — Config + variant scaffold (no behavior change)
+- `include/kit/config.h`: add `#define KIT_ARCH_RV32_ENABLED 1` (`mk/config.mk` auto-parses
+ it into a make var — no `config.mk` edit needed).
+- Add `src/arch/riscv/variant.h` with the struct + two `const` instances + lookup.
+- **Gate**: `make lib` compiles.
+
+### WS1 — Directory rename + thread variant through codegen (rv64 still identical)
+- `git mv src/arch/rv64 src/arch/riscv`; fix include guards/paths. Update `mk/lib_srcs.mk:55,189`
+ (`LIB_SRCS_ARCH_RV64` → `LIB_SRCS_ARCH_RISCV`, gated by `RV32 || RV64`). The only external
+ referent is the symbol `arch_impl_rv64` in `src/arch/registry.c` (path-independent).
+- Keep file names and internal `rv64_`/`rv_` symbol prefixes for v1 (cosmetic rename is a
+ separate follow-up; renaming 2000+ sites is pure regression risk).
+- `src/arch/riscv/native.c`: add `const RiscvVariant* variant` to `RvNativeTarget` (set from
+ `c->target.arch` in the one constructor); replace hardcoded 8/16/`RV_FRAME_SAVE_SIZE`/
+ `addiw`/`ld`/`sd`/float-fmt sites with variant reads. **With the rv64 variant the emitted
+ bytes are byte-for-byte identical** — this isolates the "sharing" regression from rv32
+ correctness. Key sites (file `src/arch/riscv/native.c`): `rv_emit_li32` (LUI+ADDIW→ADDI when
+ `!has_w_forms`), `enc_int_load/store` (sw/lw vs sd/ld by `ptr_bytes`), `RV_FRAME_SAVE_SIZE`,
+ varargs save area, callee-save stride, `rv_type_size`/`align` defaults, `rv_convert` sext/zext
+ (`xlen - src_bits` shift; `addiw` fast-path only when `has_w_forms`).
+- **Gate**: `make test-smoke-rv64`, `test/arch/rv64_decode_test.c`, `test/asm/regen-rv64.sh`,
+ `test/link/rv64_jit_test.c` all byte-identical green.
+
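The `rv_emit_li32` change listed above can be sketched as follows. This is an illustration only: `li32_encode` and the two encoder helpers are hypothetical names, not the functions in `native.c`; the opcode values are the standard base-ISA ones (LUI `0x37`, ADDI under OP-IMM `0x13`, ADDIW under OP-IMM-32 `0x1b`).

```c
#include <stdint.h>

enum { OP_LUI = 0x37, OP_IMM = 0x13, OP_IMM32 = 0x1b };

/* U-type: imm[31:12] | rd | opcode */
static uint32_t enc_u(uint32_t opcode, uint32_t rd, uint32_t imm20) {
  return (imm20 << 12) | (rd << 7) | opcode;
}

/* I-type with funct3 = 0 (ADDI / ADDIW): imm[11:0] | rs1 | 000 | rd | opcode */
static uint32_t enc_i(uint32_t opcode, uint32_t rd, uint32_t rs1, uint32_t imm12) {
  return ((imm12 & 0xfffu) << 20) | (rs1 << 15) | (rd << 7) | opcode;
}

/* Materialize a 32-bit constant as LUI + ADDI(W). Hypothetical helper.
 * The low 12 bits are sign-extended by the add, so the upper 20 bits are
 * rounded up by 0x800 to compensate. On rv64 the second instruction must
 * be ADDIW so the result wraps to 32 bits and is sign-extended; rv32 has
 * no W-forms and a plain ADDI already wraps at XLEN. */
int li32_encode(int has_w_forms, uint32_t rd, int32_t value, uint32_t out[2]) {
  uint32_t v = (uint32_t)value;
  uint32_t hi20 = ((v + 0x800u) >> 12) & 0xfffffu;
  uint32_t lo12 = v & 0xfffu;
  out[0] = enc_u(OP_LUI, rd, hi20);
  out[1] = enc_i(has_w_forms ? OP_IMM32 : OP_IMM, rd, rd, lo12);
  return 2;
}
```

With `has_w_forms = 1` the output is the LUI+ADDIW pair the plan names for rv64, which is what the byte-identical gate relies on.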
+### WS2 — ISA / asm / disasm / link / dbg XLEN parameterization (still rv64-only at runtime)
+- `src/arch/riscv/isa.c`/`isa.h`: add a one-byte **availability mask** column to `Rv64InsnDesc`
+ (`RV_AV_RV32 | RV_AV_RV64`) rather than a second table. Mark RV64-only: W-forms
+ (`addw/subw/sllw/srlw/sraw`, `addiw/slliw/srliw/sraiw`, `mulw/divw/divuw/remw/remuw`),
+ 64-bit mem (`ld/sd/lwu`), 64-bit FP int conv (`fcvt.*.l/lu`, `fmv.x.d/d.x`), compressed
+ `c.addiw/c.addw`, and the RV64 meaning of `c.ld/c.sd/c.ldsp/c.sdsp/c.fld/c.fsd/...`. Enable RV32-only:
+ `c.jal` (same encoding as rv64 `c.addiw`) and `c.flw/c.fsw` (same as rv64 `c.ld/c.sd`); `c.lw/c.sw` are on both.
+- `src/arch/riscv/disasm.c` + the compressed decoder `rv64_disasm_find_c`: pass the variant in;
+ branch the ambiguous compressed quadrant encodings and the **5-bit vs 6-bit shamt** decode
+ (`& 0x1f` on rv32, reject bit 25 set). `rv64_disasm_find`/`rv64_asm_find` skip rows by mask.
+- `src/arch/riscv/link.c`: split into `link_arch_rv64` and a new `link_arch_rv32`; rv32 PLT/IPLT stubs
+ use `rv_lw` instead of `rv_ld` (re-check stub sizes/offsets for 4-byte slots).
+- `src/arch/riscv/dbg.c`: parameterize the displaced-step shim by `ptr_bytes`; set
+ `min_insn_len=2, max_insn_len=4` for rv32 (C ext on); RVC control-flow falls back to
+ step-over (`KIT_UNSUPPORTED`), 4-byte fixups reuse the rv64 builder.
+- **Gate**: `make test-isa`, `regen-rv64.sh`, `rv64_jit_test` still green.
+
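The 5-bit vs 6-bit shamt rule from the disassembler bullet, as a small sketch. `rv_decode_shamt` is a hypothetical helper name; the only inputs it assumes are the raw instruction word and `shamt_bits` from the variant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the shift amount of SLLI/SRLI/SRAI.
 * The field occupies bits 25:20. On rv64 all six bits are the shamt.
 * On rv32 only bits 24:20 are, and an encoding with bit 25 set is
 * reserved, so the decoder rejects it rather than masking it away. */
bool rv_decode_shamt(uint32_t insn, unsigned shamt_bits, unsigned* shamt_out) {
  uint32_t field = (insn >> 20) & 0x3fu;
  if (shamt_bits == 5 && (field & 0x20u)) return false;
  *shamt_out = field & ((1u << shamt_bits) - 1u);
  return true;
}
```

The same word therefore decodes as `slli x5, x5, 33` under the rv64 variant and as an invalid instruction under rv32.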
+### WS3 — rv32 ArchImpl + registry + `-march` + predefined macros
+- `src/arch/riscv/arch.c`: define **both** `arch_impl_rv32` and `arch_impl_rv64` (share
+ `cgtarget_new/asm_new/disasm_new/decode/dwarf/dbg/asm_ops`/register file; differ in `.kind`,
+ `.name`, `.link`, `.predefined_macros`, `.target_feature_*`, and `cfi_data_align_factor`
+ -4 vs -8).
+- Generalize `rv64_target_feature_apply_isa` (currently hard-requires the `"rv64"` prefix,
+ `arch.c:204`) to compare against `variant->isa_prefix`. rv32 default profile =
+ `rv32imafc_zicsr_zifencei` (I/M/A/F/C/Zicsr/Zifencei, **D cleared**).
+- Predefined macros for rv32 (float-abi-dependent, see WS4): `__riscv_xlen=32`,
+ `__ILP32__`/`_ILP32` (drop `__LP64__`/`_LP64`), `__riscv_float_abi_single` (ilp32f) **or**
+ `__riscv_float_abi_soft` (ilp32) instead of `_double`, `__riscv_flen=32` when F present.
+- `src/arch/registry.c`: register `arch_impl_rv32` under `#if KIT_ARCH_RV32_ENABLED`
+ (`:24,50,57`); `arch_kind_name` already returns "riscv32".
+- **Gate**: `kit cc -target riscv32-none-elf -march=rv32imafc_zicsr_zifencei -E -dM` shows the
+ right macros; `kit mc`/`disas -target riscv32-none-elf` round-trips a hand-written rv32 insn.
+
+### WS4 — ABI vtable refactor + `-mabi` plumbing
+- **New spec field**: in `include/kit/core.h` add `enum KitFloatAbi {DEFAULT, SOFT, SINGLE,
+ DOUBLE}` and `uint8_t float_abi` on `KitTargetSpec`; add `KitSlice abi` to `KitTargetOptions`.
+- **Driver `-mabi`**: in `driver/lib/target.c`, intercept `-mabi=`/`-mabi` in
+ `driver_target_features_try_consume` **before** the catch-all `-m<x>` fallback (which would
+ otherwise mis-eat it), mirroring `-march` at `:154-165`; carry through `driver_target_options`.
+ Add `medlow`→`KIT_CM_SMALL`, `medany`→`KIT_CM_MEDIUM` aliases in `cc_record_mcmodel`
+ (`driver/cmd/cc.c:751`) and `run_record_mcmodel` (`driver/cmd/run.c:379`).
+- **Resolve + validate** in `kit_target_new` (`src/api/core.c`), after `-march` features are
+ known: parse `ilp32|ilp32f|ilp32d|lp64|lp64f|lp64d`; if omitted, derive from `-march`
+ (D→DOUBLE, F-no-D→SINGLE, else SOFT); **reject** `*f` without F and `*d` without D. So
+ `-march=rv32imafc` defaults to `ilp32f`, and an explicit `-mabi=ilp32d` with it is rejected.
+- **Shared ABI classifier**: generalize `src/abi/abi_rv64.c` into a RISC-V classifier
+ parameterized by a descriptor `{xlen_bytes (=ptr_size), gpr_bytes, aggregate_gpr_bytes=2*gpr,
+ flen (0/4/8), float_abi}` read from `a->c->target`. Replace the `RV64_ABI_*_BYTES=8/16` enum.
+ - FP-eligibility predicate `fp_eligible(desc, size)`: SOFT never; SINGLE iff `size==4`
+ (float; `double` 8>flen4 → INT pair); DOUBLE iff `size<=8` (preserves rv64 LP64D).
+ - `classify_scalar`: i8/16/32/ptr → 1 INT part; `i64`/soft-`double` → **2 INT parts of 4 in a
+ GPR pair** (the psABI even-aligns the pair only for variadic args); `float` (ilp32f) → 1 FP
+ part (fa0-fa7). Replace the hardcoded `size==16 → 2×8` with `nparts = size/gpr_bytes`.
+ - `classify_aggregate`: register threshold `2*gpr_bytes` (8 on rv32), chunk by `gpr_bytes`;
+ HFA refinement gated by `fp_eligible`.
+ - va_list: `ABI_VA_LIST_POINTER`, `gp_reg_count=8`, `gp_slot_size=4`, `fp_reg_count=0`
+ (**FP varargs always go via INT regs even under ilp32f**). Two thin static vtables
+ (`rv32_vtable`, `rv64_vtable`) sharing the classifier, differing only in the va_list literal.
+- `src/abi/registry.c`: add `KIT_ABI_RV32_ENABLED` and an `{KIT_ARCH_RV32, KIT_OBJ_ELF,
+ &rv32_vtable}` entry (one entry serves both ilp32/ilp32f; the float axis is read from the spec).
+- **Gate**: ABI classification golden tests (`test/api/abi_classify_test.c` style) for rv32
+ ilp32f and ilp32.
+
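A sketch of the resolve/validate rule and the FP-eligibility predicate from this workstream. The plan only names the enumerators `DEFAULT, SOFT, SINGLE, DOUBLE`; the `KIT_FLOAT_ABI_` prefix, the `resolve_float_abi` name, and passing the F/D feature bits as booleans are assumptions. `fp_eligible` takes the ABI directly here rather than the full descriptor.

```c
#include <stdbool.h>

typedef enum KitFloatAbi {
  KIT_FLOAT_ABI_DEFAULT,
  KIT_FLOAT_ABI_SOFT,
  KIT_FLOAT_ABI_SINGLE,
  KIT_FLOAT_ABI_DOUBLE
} KitFloatAbi;

/* If -mabi was omitted, derive the float ABI from the -march features;
 * otherwise validate the explicit request against them.
 * *ok is cleared for "*f without F" and "*d without D". */
KitFloatAbi resolve_float_abi(KitFloatAbi requested, bool has_f, bool has_d, bool* ok) {
  *ok = true;
  if (requested == KIT_FLOAT_ABI_DEFAULT) {
    if (has_d) return KIT_FLOAT_ABI_DOUBLE;
    if (has_f) return KIT_FLOAT_ABI_SINGLE;
    return KIT_FLOAT_ABI_SOFT;
  }
  if (requested == KIT_FLOAT_ABI_SINGLE && !has_f) *ok = false;
  if (requested == KIT_FLOAT_ABI_DOUBLE && !has_d) *ok = false;
  return requested;
}

/* May a floating-point scalar of `size` bytes travel in an FP register? */
bool fp_eligible(KitFloatAbi abi, unsigned size) {
  switch (abi) {
    case KIT_FLOAT_ABI_SINGLE: return size == 4;  /* float only */
    case KIT_FLOAT_ABI_DOUBLE: return size <= 8;  /* float and double */
    default: return false;                        /* soft: never */
  }
}
```

Under `ilp32f` a `double` (size 8) is not FP-eligible, so it falls through to the integer path as two 4-byte parts.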
+### WS5 — ELFCLASS32 object emission + reading (largest item)
+Introduce one `is32`/`ElfEnc` flag (from `c->target.ptr_size`) threaded through, **not**
+copy-paste duplication.
+- `src/obj/elf/elf.h`: add `ELFCLASS32`, `ELF32_{EHDR,PHDR,SHDR}_SIZE` (52/32/40),
+ `ELF32_SYM_SIZE`(16)/`ELF32_RELA_SIZE`(12), `ELF32_R_INFO(s,t)=((s)<<8)|((t)&0xff)`,
+ `ELF32_R_SYM/TYPE`.
+- `src/obj/elf/emit.c`: replace the `ptr_size != 8` panic (`:271`); branch sym record (16B,
+ different field widths) and rela record (12B, `ELF32_R_INFO`) writers; `EI_CLASS`
+ (`:664`); Ehdr/Shdr address fields via `elf_wr_u32` and ELF32 sizes; e_flags from
+ `float_abi` (`EF_RISCV_FLOAT_ABI_SINGLE`/`_SOFT` | `EF_RISCV_RVC`).
+- `src/obj/elf/read.c`: accept `ELFCLASS32` (`:446,814`); add `parse_shdr32`/`parse_sym32`/
+ rela32 with the correct offsets/strides and `ELF32_R_SYM/TYPE`. Scope v1 to ET_REL +
+ ET_EXEC reads; give ELF32 ET_DYN a clear "unsupported" rather than mis-parse.
+- `src/obj/elf/link.c`: ELF32 ET_EXEC writer (parallel parameterization to emit.c) — needed
+ for `ld`/`run`/`dbg`. `link_dyn.c` and `emu_load.c` stay rv64/ELF64-only: gate rv32 to
+ static linking (freestanding `-none-elf` defaults to `KIT_PIC_NONE`), panic-with-diagnostic
+ for rv32 dynamic.
+- New `src/obj/elf/reloc_riscv32.c`: clone `reloc_riscv64.c`; map `R_ABS32`→`ELF_R_RISCV_32`,
+ and `R_ABS64`/`R_RV_ADD64`/`R_RV_SUB64`→unsupported; reuse all XLEN-neutral kinds.
+- `src/obj/registry.c`: add the rv32 `obj_elf_arch_ops` entry. **EM_RISCV is shared by rv32
+ and rv64** — disambiguate reloc-table selection by `EI_CLASS`, not e_machine alone.
+- **Gate**: new `test/elf/unit/rv32_class32.c` write-then-read round-trip; `kit objdump`/`nm`
+ on a hand-built rv32 `.o`.
+
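For reference, the ELF32 relocation record differs from ELF64 in both stride (12 vs 24 bytes) and `r_info` packing (8-bit type, 24-bit symbol index). A minimal little-endian writer, assuming a hypothetical `elf32_write_rela` and a local `wr_u32le` in place of the real `elf_wr_u32`:

```c
#include <stdint.h>

#define ELF32_RELA_SIZE 12
#define ELF32_R_INFO(s, t) (((uint32_t)(s) << 8) | ((uint32_t)(t) & 0xffu))
#define ELF32_R_SYM(i) ((uint32_t)(i) >> 8)
#define ELF32_R_TYPE(i) ((uint32_t)(i) & 0xffu)

/* Store a 32-bit value little-endian regardless of host byte order. */
static void wr_u32le(uint8_t* p, uint32_t v) {
  p[0] = (uint8_t)(v);
  p[1] = (uint8_t)(v >> 8);
  p[2] = (uint8_t)(v >> 16);
  p[3] = (uint8_t)(v >> 24);
}

/* Elf32_Rela layout: r_offset (4) | r_info (4) | r_addend (4, signed). */
void elf32_write_rela(uint8_t out[ELF32_RELA_SIZE], uint32_t offset,
                      uint32_t sym, uint32_t type, int32_t addend) {
  wr_u32le(out + 0, offset);
  wr_u32le(out + 4, ELF32_R_INFO(sym, type));
  wr_u32le(out + 8, (uint32_t)addend);
}
```

A reader applies `ELF32_R_SYM`/`ELF32_R_TYPE` to the second word; reusing the ELF64 accessors (32-bit shift) on this record would silently yield symbol index 0.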
+### WS6 — 64-bit-int + soft-float-double legalization (hardest part)
+The cg layer (`src/cg/arith.c`) only routes wide ops to libcalls for the `__int128` builtin
+(`api_i128_stack_top`), **never by width** — so `long long` on rv32 currently reaches the
+backend as a raw 8-byte value, and `double` arithmetic would emit illegal `.d` ops.
+- **64-bit integers on rv32**: generalize the i128 libcall mechanism in `src/cg/arith.c` to a
+ "wider than target word" predicate (`type_size > c->target.ptr_size`). Recommended v1:
+ route `mul/div/udiv/mod/shifts` to runtime libcalls (`__muldi3`, `__divdi3`, `__udivdi3`,
+ `__moddi3`, `__ashldi3`, `__lshrdi3`, `__ashrdi3`); do `add/sub/and/or/xor/load/store/move`
+ inline as register pairs in the backend (these are unavoidable for memory/arg traffic).
+ Add a **loud panic** in `rv_binop`/`rv_convert` if a wide value reaches the native-width
+ path, so any missed case fails fast.
+- **Soft-float `double` on ilp32f/ilp32**: route `double` arithmetic and `double`↔int/float
+ conversions to libcalls (`__adddf3`, `__subdf3`, `__muldf3`, `__divdf3`, `__extendsfdf2`,
+ `__truncdfsf2`, `__fixdfsi`, `__floatsidf`, df compares) — mirror the existing f128 path so
+ the backend only ever sees `float` (S) FP ops. Backend panics on any `RV_FMT_D` selection
+ when `xlen==32`.
+- Confirm `long double == double` (8B) and `__int128` absent on rv32 (runtime sets
+ `INT128=0`, no `LDBL128`), so the 16-byte scalar classify path is effectively dead there.
+- **Gate**: red-green targeted tests — `long long` add/mul/div and `double` add/mul/convert
+ compile to plausible sequences (verified via decode/disasm; behavior via qemu if available).
+
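The inline register-pair `add`/`sub` mentioned above reduces to a carry (or borrow) computed with an unsigned compare, which is what `sltu` provides. A host-side model of that sequence, useful as a reference when reading the emitted code; `Pair32` and both function names are hypothetical:

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit registers, low word first. */
typedef struct Pair32 {
  uint32_t lo;
  uint32_t hi;
} Pair32;

/* add lo; sltu carry, lo, a.lo; add hi; add hi, hi, carry */
Pair32 add64_pair(Pair32 a, Pair32 b) {
  Pair32 r;
  r.lo = a.lo + b.lo;
  uint32_t carry = r.lo < a.lo;  /* unsigned wraparound means a carry out */
  r.hi = a.hi + b.hi + carry;
  return r;
}

/* sltu borrow, a.lo, b.lo; sub lo; sub hi; sub hi, hi, borrow */
Pair32 sub64_pair(Pair32 a, Pair32 b) {
  Pair32 r;
  uint32_t borrow = a.lo < b.lo;
  r.lo = a.lo - b.lo;
  r.hi = a.hi - b.hi - borrow;
  return r;
}
```

Byte-goldens cannot tell a swapped compare operand from a correct one here, which is why the gate for this workstream leans on execution.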
+### WS7 — Runtime build wiring (`mk/rt.mk`)
+- The `riscv32-elf` / `riscv32-elf-save-restore` variants exist but are **wrong**:
+ `-mabi=ilp32 -march=rv32imafd` (D present). Fix to the confirmed profile and add the
+ hard-float variant:
+ - `riscv32-elf` (ilp32, soft): `-mabi=ilp32 -march=rv32imac`.
+ - `riscv32-elf-hardfloat` (ilp32f): `-mabi=ilp32f -march=rv32imafc`.
+ - Both keep `ABI=ilp32` (the *integer* layout → `rt/lib/include/ilp32_le`; `f` only affects
+ FP arg passing), `INT128=0`, `CORO=riscv32`.
+- Mandatory builtins are already selected: `RT_ABI_SRCS_ilp32 = rt/lib/int32/int32.c` (64-bit
+ int helpers) and `rt/lib/fp/fp.c` (soft `double`). Verify the df soft-float ops compile for
+ the rv32 target.
+- `mk/lib_srcs.mk`: widen the ABI/reloc source guards to include `KIT_ARCH_RV32_ENABLED`; add
+ `reloc_riscv32.c` to the ELF source group.
+- **Gate**: `kit cc -target riscv32-none-elf -c rt/lib/.../smoke` builds; `make rt` produces
+ the rv32 runtime variants.
+
+### WS8 — JIT `run` / `dbg`
+`kit run`/`dbg` execute JIT bytes **natively in-process** (`run.c` `entry_fn(...)`); there is
+no cross-arch execution path (emulator is out of scope). So on a non-rv32 host, rv32 code
+cannot be executed — same situation as rv64's existing JIT test, which builds the image and
+**skips the call** (exit 77).
+- `src/link/link_jit.c`: audit only — it is XLEN-neutral and patches via shared `R_RV_*`
+ reloc kinds; the only u64/TLV slots are Mach-O-guarded (ELF never reaches them). No change
+ expected, provided WS2/WS6 emit the same reloc kinds.
+- `rv32_dbg_ops` from WS2 (RVC-aware lengths, step-over fallback).
+- **v1 deliverable**: JIT image build + relocation + symbol lookup wired and unit-tested
+ without execution; native execution host-gated to rv32 hosts.
+
+### WS9 — Tests & verification (see Verification below)
+
+## Parallel workstream map
+
+Much of this is separable. Lock a small set of **shared interfaces first** (Phase A), then five
+tracks proceed in parallel (Phase B), converging at integration (Phase C). The critical path is
+Phase A → Track 1 (the backend chain WS1→WS2→WS3); ELF32 (Track 2) is the largest *effort* but is
+parallel, so starting it immediately keeps it off the wall-clock.
+
+**Phase A — shared contracts (serial, small, land first; unblocks everyone):**
+- **WS0** `RiscvVariant` + `riscv_variant_for_kind` + `KIT_ARCH_RV32_ENABLED`.
+- **WS4a** the float-ABI interface only: `KitFloatAbi` enum, `KitTargetSpec.float_abi`,
+ `KitTargetOptions.abi`, and the `-mabi`/`-mcmodel` parse → resolve → validate plumbing
+ (`driver/lib/target.c`, `driver/cmd/cc.c`, `src/api/core.c`). No classifier change yet.
+
+The five contracts everyone codes against (freeze these in Phase A):
+1. **`RiscvVariant`** fields (XLEN/ptr_bytes/gp_slot_bytes/has_w_forms/shamt_bits/frame_save_size)
+ — consumed by Track 1.
+2. **`float_abi`** on the spec — consumed by Track 2 (e_flags), Track 3 (FP-eligibility),
+ Track 5 (soft-double), and WS3 (predefined macros).
+3. **Reloc-kind list**: the exact `R_RV_*` kinds rv32 codegen emits = the set rv32 ELF maps and
+ `link_jit` expects (= existing rv64 set minus `R_*64`/`ADD64`/`SUB64`). Track 1 ↔ Track 2.
+4. **Runtime libcall names** (`__adddf3`, `__muldf3`, `__fixdfsi`, `__floatsidf`, `__extendsfdf2`,
+ `__truncdfsf2`, `__muldi3`, `__divdi3`, `__udivdi3`, `__moddi3`, `__ashldi3`, `__lshrdi3`,
+ `__ashrdi3`) emitted by WS6 = provided by WS7. Track 5 ↔ Track 4.
+5. **ABI part-layout**: i64/soft-`double` → GPR pair (even-aligned for variadic args only);
+ `gp_slot_size=4`; callee-save stride. Track 3 publishes it via the vtable; Track 1's native-frame code consumes it.
+
+**Phase B — parallel tracks (each independently testable):**
+- **Track 1 — Backend (critical path, serial within):** WS1 (rename + thread variant, rv64
+ byte-identical) → WS2 (ISA/asm/disasm/link/dbg XLEN param) → WS3 (rv32 ArchImpl + `-march` +
+ macros). Gate per step against rv64 regression, then rv32 mc/disas round-trip.
+- **Track 2 — ELF32 (WS5):** fully independent of codegen — develop and test the ELFCLASS32
+ writer/reader via a hand-built `ObjBuilder` for `KIT_ARCH_RV32` (`test/elf/unit/rv32_class32.c`
+ write→read roundtrip). Only consumes `float_abi` (e_flags) + the reloc list. Largest effort;
+ start day one.
+- **Track 3 — ABI classifier (WS4b):** the shared RISC-V classifier + `rv32_vtable`, parameterized
+ by the descriptor. Independent of codegen — test via `test/api/abi_classify_test.c` for ilp32f
+ and ilp32. Consumes `RiscvVariant`/`float_abi`.
+- **Track 4 — Runtime (WS7):** `mk/rt.mk` fixes (correct `-march`/`-mabi`, add hardfloat variant)
+ + `mk/lib_srcs.mk` guards. The edits are independent and land early; the `make rt` *validation*
+ gates on Track 1 codegen.
+- **Track 5 — cg legalization (WS6):** wide-int + soft-`double` → libcall routing in
+ `src/cg/arith.c`, keyed on `ptr_size`/`float_abi`. Logic is independent; end-to-end validation
+ needs Track 1 + Track 4. Highest correctness risk — design early against the libcall contract.
+
+**Phase C — integration (after tracks converge):**
+- Register `arch_impl_rv32` (Track 1 + Track 3). Wire object registry (Track 2).
+- **WS8** JIT `run`/`dbg` audit + `rv32_dbg_ops` (Track 1 + Track 2).
+- **WS9** end-to-end: decode/asm goldens, `kit cc → ld → qemu` smoke (all tracks + WS6 + WS7).
+
+## Verification
+
+### Verified execution oracle (clang + qemu-system, confirmed working on this host)
+
+clang 22 has the `riscv32` target and `llvm-objdump`/`llvm-mc`/`ld.lld` are installed.
+**qemu user-mode is not built on macOS** — only `qemu-system-riscv32` — which suits a
+freestanding `-none-elf` target. A confirmed working recipe (PASS→exit 0, wrong answer→exit 7,
+hang→exit 124), to be mirrored by `test/smoke/rv32.sh`:
+- Build: `clang --target=riscv32-unknown-elf -march=rv32imafc -mabi=ilp32f -nostdlib -ffreestanding`
+ (and an `ilp32`/`rv32imac` soft variant); link `ld.lld -Ttext=0x80000000 -e _start`.
+- Startup stub (`_start`): set `sp` (RAM at `0x80000000`); **for ilp32f set `mstatus.FS`**
+ (`li t0,0x2000; csrs mstatus,t0`) to enable the FPU before any `fadd.s` — otherwise it traps
+ and hangs. Soft `ilp32` skips this.
+- Result via SiFive test finisher at `0x100000`: `0x5555`→qemu poweroff exit 0;
+ `0x3333|(code<<16)`→qemu exit `code`.
+- Run: `qemu-system-riscv32 -machine virt -bios none -kernel prog.elf -nographic -no-reboot`
+ (wrap in `timeout`). Verified that clang emits the expected `fadd.s` + inline 64-bit `add`/`sltu`
+ + `fcvt.w.s` for ilp32f, and `llvm-readelf` shows ELF32 / "single-float ABI" / RVC flags.
+
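The finisher protocol in the recipe above is a single 32-bit store. A sketch of the value encoding plus the bare-metal exit routine a smoke test's `app.c` might call; `finisher_word` and `finish` are hypothetical names, and `finish` is only meaningful when running on the qemu `virt` machine (on a host it would fault, so only the encoder is exercised there).

```c
#include <stdint.h>

#define SIFIVE_TEST_ADDR 0x100000u
#define SIFIVE_TEST_PASS 0x5555u
#define SIFIVE_TEST_FAIL 0x3333u

/* Encode an exit status for the test finisher:
 * 0 maps to PASS (qemu exits 0), anything else to FAIL with the code
 * carried in the upper 16 bits (qemu exits with that code). */
uint32_t finisher_word(uint32_t exit_code) {
  if (exit_code == 0) return SIFIVE_TEST_PASS;
  return SIFIVE_TEST_FAIL | (exit_code << 16);
}

/* Bare-metal only: store the word to the finisher device and spin
 * until qemu powers off. Never returns. */
void finish(uint32_t exit_code) {
  volatile uint32_t* dev = (volatile uint32_t*)(uintptr_t)SIFIVE_TEST_ADDR;
  *dev = finisher_word(exit_code);
  for (;;) {
  }
}
```

A test program would end with `finish(result == expected ? 0 : 7)`, matching the PASS→0 / wrong answer→7 convention stated in the recipe; a hang is caught by the `timeout` wrapper.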
+This is the kit smoke: `kit cc -target riscv32-none-elf ... -c app.c`, assemble the startup stub,
+`kit ld` to an ELF, run under qemu-system, assert exit 0. Unlike rv64 (qemu-user/podman), rv32
+uses qemu-system + a bare-metal startup + finisher device. `regen-rv32.sh` uses
+`clang --target=riscv32` + `llvm-objdump` for asm/disasm goldens.
+
+### Milestones
+
+kit has no in-process rv32 execution path (emulator out of scope), so behavioral correctness
+comes from the **clang+qemu-system oracle above**; structural correctness comes from
+**self-consistency** (decode↔format, ELF write↔read). Milestone order (each green before the
+next), preferring targeted runs and redirecting output to a file (per CLAUDE.md):
+
+1. **Build/register**: `make lib 2>&1 | tee /tmp/build.log`; target recognized.
+2. **Decode/encode self-roundtrip** — new `test/arch/rv32_decode_test.c` (mirror
+ `rv64_decode_test.c`): no W-forms, `lw/sw` (no `ld/sd`), 5-bit shamt, `c.jal`,
+ `c.lw/c.sw`, `c.flw/c.fsw`; decode↔format agreement is the oracle.
+ `make test-isa 2>&1 | tee /tmp/isa.log`.
+3. **Assembler/disasm corpus** — `test/asm/` rv32 lane + `regen-rv32.sh` (clang
+ `--target=riscv32-unknown-elf -march=rv32imafc -mabi=ilp32f` + `llvm-objdump` as reference,
+ maintainer-only, soft-skip if absent; committed goldens replayed by CI).
+ `make test-asm-rv32 2>&1 | tee /tmp/asm32.log`.
+4. **ELF32 round-trip** — `test/elf/unit/rv32_class32.c` (first ELFCLASS32 consumer):
+ write→read-back, assert `EI_CLASS==ELFCLASS32`, `Elf32_Sym`/`Elf32_Rela` survive.
+ `make test-elf 2>&1 | tee /tmp/elf.log`.
+5. **Compile + inspect** (no execution):
+ `./build/kit cc -target riscv32-none-elf -march=rv32imafc_zicsr_zifencei -mabi=ilp32f -c
+ smoke.c -o /tmp/rv32.o` then `./build/kit disas /tmp/rv32.o` (optional cross-check
+ `llvm-objdump -d --triple=riscv32 /tmp/rv32.o`).
+6. **Link + JIT image** — new `test/link/rv32_jit_test.c` (mirror `rv64_jit_test.c`, exit 77
+ on non-rv32 host; include a PC-relative reloc to exercise HI20/LO12 pairing). `kit ld` to a
+ static ELF executable succeeds.
+7. **qemu-system smoke** — `test/smoke/rv32.sh` using the verified oracle above
+ (`qemu-system-riscv32 -machine virt`, FPU-enabling startup for ilp32f, SiFive finisher exit
+ codes). Compiles `app.c` with `kit cc -target riscv32-none-elf`, links with the startup stub,
+ runs under qemu, asserts exit 0. This is the **only behavioral oracle** (soft-double and
+ 64-bit-int correctness are otherwise untestable) — make it a required CI gate where
+ `qemu-system-riscv32` is present; skip-if-absent elsewhere. Add a doctor
+ (`test/lib/check_rv32_env.sh`) like rv64's.
+
+New make targets next to their rv64 peers in `test/test.mk`: `RV32_DECODE_TEST_BIN` (into
+`test-isa`), `test-asm-rv32`, `test-rv32-jit`, `test-smoke-rv32`, and `rv32` added to the
+runtime test arch list.
+
+**RV64 regression gate** (run after WS1 and again at the end):
+`make test-isa test-asm-rv64 test-smoke-rv64 test-link` + `rv64_jit_test`.
+
+## Risks
+
+1. **64-bit-int + soft-double on rv32 (WS6) is the deepest, execution-only risk.** Carry/borrow
+ chains and soft-float rounding can't be checked by byte-goldens — only execution catches
+ valid-but-wrong codegen. The behavioral oracle (qemu-system, verified) closes this, but
+ depends on `qemu-system-riscv32` being present and a correct FPU-enabling startup stub for
+ ilp32f (a missing `mstatus.FS` set silently hangs instead of failing cleanly). Mitigate with
+ qemu-gated differential tests (kit result vs host double/int64) and loud backend panics on any
+ wide/`.d` value reaching the native path.
+2. **ELFCLASS32 (WS5) is the dominant effort** (~130 Elf64-hardcoded sites across emit/read/link).
+ The write-then-read self-oracle catches internal inconsistency but not spec divergence; keep
+ one clang-oracle `cases/` rv32 ELF test for an independent cross-check. Disambiguating
+ EM_RISCV by `EI_CLASS` is a cross-cutting correctness point.
+3. **Sharing risk to RV64 (WS1/WS2)**: repurposing `rv_is_64` semantics, the `RV_FRAME_SAVE_SIZE`
+ constant→`2*ptr_bytes`, and the compressed-quadrant/shamt branches all touch the working
+ rv64 path. Land WS1/WS2 with rv64-byte-identical output and prove zero diff before enabling
+ rv32.
+4. **`-mabi` boundary**: parsed in `driver/`, validated in `src/api/core.c` where feature words
+ exist. Every spec-construction site that bypasses `kit_target_new` must default
+ `float_abi=DEFAULT` safely; the catch-all `-m` consumer must not pre-eat `-mabi`.
+5. **ilp32 vs ilp32f confusion**: `ilp32` is the *integer* ABI (type widths); the `f` is float
+ arg-passing only. The runtime `ABI=ilp32` include set is correct for both; the existing
+ `-march=rv32imafd` (D) is wrong and must become `rv32imafc`/`rv32imac`.
+6. **RVC dbg gap**: rv32imafc emits compressed insns pervasively; v1 step-over fallback degrades
+ `kit dbg` single-step for rv32. The shim unit test must assert the fallback path is taken.
+
+## Critical files
+
+- `src/arch/riscv/` (renamed from `rv64/`): `variant.h` (new), `native.c`, `isa.c/.h`,
+ `disasm.c`, `asm.c`, `link.c`, `dbg.c`, `arch.c` (two ArchImpls, `-march`, macros).
+- `src/abi/abi_rv64.c` → shared RISC-V classifier + `rv32_vtable`; `src/abi/registry.c`.
+- `src/cg/arith.c` — wide-int + soft-double legalization (WS6, the riskiest, currently absent).
+- `src/obj/elf/{elf.h,emit.c,read.c,link.c}` + new `reloc_riscv32.c`; `src/obj/registry.c`.
+- `include/kit/core.h` (`KitFloatAbi`, `KitTargetSpec.float_abi`, `KitTargetOptions.abi`),
+ `include/kit/config.h` (`KIT_ARCH_RV32_ENABLED`).
+- `driver/lib/target.c`, `driver/cmd/cc.c`, `driver/cmd/run.c` (`-mabi`, `medlow/medany`);
+ `src/api/core.c` (resolve/validate).
+- `src/arch/registry.c`, `mk/rt.mk`, `mk/lib_srcs.mk`, `test/test.mk` + new test files.