commit 0ae44208d15bba91c4d3a9188848cb3871bf02bd
parent be9be587a0b6e6599ed17c6764e1da2d93f042ec
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 8 Jun 2026 11:32:25 -0700
rv32: close out riscv32-none-elf; remove completed plan doc
The riscv32-none-elf backend is complete: all cross lanes pass at -O0 and -O1
(test-asm-rv32 16/0, test-toy-rv32 421/0, test-parse-rv32 906/0, smoke
hardfloat green under qemu-system-riscv32). The forward-looking plan in
doc/plan/RV32.md is fully realized, so remove it and its index row (per
convention, completed plans are removed, not checked off). Move the two
durable contracts it tracked into user-facing docs and refresh the rv32
test-lane comments:
- doc/RUNTIME.md: the freestanding TLS thread-pointer contract
([TCB(16)|.tdata|.tbss], tp->TCB base, Local-Exec only; linker bias via
ObjElfArchOps.tls_tp_bias) and the i64-atomics-as-libcall note (8-byte
_Atomic -> spinlock __atomic_*_8, correct but not lock-free).
- mk/test.mk: refresh the rv32 lane comments (corpus green; opt-in like
the rv64 cross lanes, kept out of DEFAULT_TEST_TARGETS because they
need the qemu-system-riscv32 toolchain).
Docs and comments only; no code change.
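
Illustration only, not part of this change: the freestanding TLS contract
now recorded in doc/RUNTIME.md amounts to roughly the startup step below.
This is a sketch, not kit's startup code. kit ships no crt0, and the
reference implementation is test/link/harness/start.c. `TlsImage`,
`tls_setup` and `tls_addr` are hypothetical names. A real startup would read
the image bounds from linker symbols and honor the PT_TLS alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* TCB size kit's linker assumes ahead of .tdata for freestanding RISC-V and
 * AArch64 (TLS variant I). Hosted RISC-V uses a bias of 0 instead. */
#define TLS_TCB_SIZE 16u

/* Hypothetical view of the PT_TLS image; real startup code would take these
 * values from linker-provided symbols. */
typedef struct TlsImage {
    const unsigned char *tdata; /* .tdata initialization image */
    size_t tdata_size;          /* bytes in .tdata */
    size_t tbss_size;           /* bytes in .tbss (zero-filled) */
} TlsImage;

/* Bytes the startup must reserve for one thread block. */
static size_t tls_block_size(const TlsImage *img) {
    return TLS_TCB_SIZE + img->tdata_size + img->tbss_size;
}

/* Lay out [ TCB | .tdata | .tbss ] in `block` (caller supplies
 * tls_block_size() bytes, suitably aligned) and return the value tp must
 * hold: the block base. Only a freestanding RISC-V build writes tp itself. */
static uintptr_t tls_setup(unsigned char *block, const TlsImage *img) {
    memset(block, 0, TLS_TCB_SIZE);                            /* TCB: zeroed */
    memcpy(block + TLS_TCB_SIZE, img->tdata, img->tdata_size); /* copy .tdata */
    memset(block + TLS_TCB_SIZE + img->tdata_size, 0, img->tbss_size); /* zero .tbss */
    uintptr_t tp = (uintptr_t)block;
#if defined(__riscv) && !__STDC_HOSTED__
    __asm__ volatile("mv tp, %0" : : "r"(tp) : "memory");
#endif
    return tp;
}

/* Local-Exec address of the TLS variable at image offset `off`. */
static void *tls_addr(uintptr_t tp, size_t off) {
    return (void *)(tp + TLS_TCB_SIZE + off);
}
```

Under ilp32f the same startup must also set mstatus.FS before any FP op.
That CSR write is omitted here.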
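
Illustration only, not part of this change: the 8-byte atomic fallback
recorded in doc/RUNTIME.md wraps a plain 64-bit access in a lock chosen by
hashing the object's address. This is a minimal sketch. It assumes a
hypothetical 64-entry table of word-sized locks and uses hypothetical
`sketch_*` stand-ins for `__atomic_load_8`, `__atomic_fetch_add_8` and
`__atomic_compare_exchange_8`. kit's `atomic_common.inc` may differ in
table size, hash and memory-order handling.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock table. Static atomics are zero-initialized to a valid
 * state; 0 means unlocked. Word-sized, so it maps to rv32's 32-bit AMOs. */
#define SKETCH_NLOCKS 64
static atomic_uint sketch_locks[SKETCH_NLOCKS];

/* Pick a lock by hashing the object's address (8-byte granules). */
static atomic_uint *sketch_lock_for(const volatile void *p) {
    return &sketch_locks[((uintptr_t)p >> 3) % SKETCH_NLOCKS];
}

static void sketch_lock(atomic_uint *l) {
    while (atomic_exchange_explicit(l, 1u, memory_order_acquire) != 0u) {
        /* spin: correct, but a stalled holder blocks others (not lock-free) */
    }
}

static void sketch_unlock(atomic_uint *l) {
    atomic_store_explicit(l, 0u, memory_order_release);
}

static uint64_t sketch_atomic_load_8(const volatile uint64_t *p) {
    atomic_uint *l = sketch_lock_for(p);
    sketch_lock(l);
    uint64_t v = *p;
    sketch_unlock(l);
    return v;
}

static uint64_t sketch_atomic_fetch_add_8(volatile uint64_t *p, uint64_t v) {
    atomic_uint *l = sketch_lock_for(p);
    sketch_lock(l);
    uint64_t old = *p;
    *p = old + v;
    sketch_unlock(l);
    return old;
}

/* Returns 1 and stores `desired` on match; otherwise writes the current
 * value back to *expected and returns 0. */
static int sketch_atomic_compare_exchange_8(volatile uint64_t *p,
                                            uint64_t *expected,
                                            uint64_t desired) {
    atomic_uint *l = sketch_lock_for(p);
    sketch_lock(l);
    uint64_t cur = *p;
    int ok = (cur == *expected);
    if (ok)
        *p = desired;
    else
        *expected = cur;
    sketch_unlock(l);
    return ok;
}
```

This is only correct if every access to a given 8-byte object goes through
the same lock. That requirement is why the front end reports
`__atomic_always_lock_free(8, …)` as false on rv32.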
Diffstat:
4 files changed, 44 insertions(+), 533 deletions(-)
diff --git a/doc/RUNTIME.md b/doc/RUNTIME.md
@@ -119,7 +119,12 @@ native instruction.
(`atomic_common.inc`) provides the lock, hashed by address — no OS dependency.
Implemented over the GCC-style `__atomic_*` builtin family that kit itself
documents (`doc/builtins.md`), with upstream's Clang-only `__c11_atomic_*`
- calls translated. 16-byte cases are keyed off `HAS_INT128`.
+ calls translated. 16-byte cases are keyed off `HAS_INT128`. On 32-bit targets
+ (rv32 `ilp32`/`ilp32f`) the ISA has no 64-bit atomic (`lr.d`/`sc.d`/`amo*.d`
+ are rv64-only), so 8-byte `_Atomic` / `__atomic_*` lower to the `__atomic_*_8`
+ entries here — spinlock-backed, correct but **not** lock-free; the front end's
+ `__atomic_always_lock_free(8, …)` reports false to match. This is the same
+ contract libatomic provides; kit ships no native 64-bit atomic on rv32.
- **Misc** (`rt/lib/cache/clear_cache.c`): a weak `__clear_cache` (target for
`__builtin___clear_cache`) plus weak bare-metal cache stubs. ARM and RISC-V
variants add the AEABI / save-restore assembly described below.
@@ -224,6 +229,36 @@ thread gets an independent resume chain; kit's contract defines
feature, and bare-metal images with no TLS runtime collapse to single-thread
semantics.
+## Thread-local storage (freestanding contract)
+
+kit emits the **Local-Exec** TLS model only — there is no dynamic TLS
+(`__tls_get_addr`, GD/LD) and no TLS allocator. `_Thread_local` objects live in
+the executable's `PT_TLS` image and are reached `tp`-relative, with the per-arch
+offset baked in by the linker (`ObjElfArchOps.tls_tp_bias`, applied in
+`src/obj/elf/link.c`'s `tls_tcb_bias`).
+
+The runtime ships no `crt0`, so a freestanding image's own startup must establish
+the thread block and `tp`. The layout that kit's codegen and linker assume for
+RISC-V and AArch64 (TLS variant I) is a 16-byte TCB *ahead* of `.tdata`:
+
+    [ TCB (16 bytes) | .tdata (init image) | .tbss (zeroed) ]
+    ^tp
+
+so a TLS variable at image offset `off` is accessed at `tp + 16 + off`. Startup
+must therefore reserve `16 + tdata_size + tbss_size` bytes, copy the `.tdata` init
+image to `block + 16`, zero the `.tbss` span, and set `tp = block`. The
+reference implementation is `test/link/harness/start.c` (the rv32 bare-metal
+stub lives in `test/lib/exec_rv32_bare.sh`); under `ilp32f` that startup must
+also set `mstatus.FS` before any FP op. x86_64 uses TLS variant II instead
+(`tp`/`%fs` points *past* the image, `TPOFF` offsets are negative), so it carries
+no TCB bias.
+
+On a **hosted** RISC-V target the psABI points `tp` at the image start (bias 0,
+matching Linux/FreeBSD `_init_tls`); kit's linker selects that 0 bias for
+non-freestanding RISC-V automatically. With no thread block set up at all,
+`_Thread_local` still resolves against a single static image, so a bare-metal
+program that never sets `tp` collapses to single-thread semantics.
+
## Shipped headers (`rt/include/`)
kit ships its own header set so freestanding compilation needs no system
diff --git a/doc/plan/README.md b/doc/plan/README.md
@@ -24,7 +24,6 @@ shrinks to whatever remains open.
| [BACKTRACE.md](BACKTRACE.md) | Stack-trace support: GCC-compatible `__builtin_return_address`/`__builtin_frame_address` primitives, a freestanding `__kit_backtrace` capture helper, and symbolized backtrace printing. L1–L3a/L3c shipped; L3b (in-process self-symbolization) deferred. | [../FRONTENDS.md](../FRONTENDS.md), [../RUNTIME.md](../RUNTIME.md), [../DWARF.md](../DWARF.md) |
| [LTO.md](LTO.md) | Whole-program optimization: `symresolve` extraction, cross-TU inlining, internalization. Phase 0 (whole-TU opt) and Phase 1 (all-sources-up-front LTO) shipped; Phase 2 (serialized `.kit.ir` objects) open. | [../OPT.md](../OPT.md) |
| [CODEGEN.md](CODEGEN.md) | CG API interface cleanup: PLACE/VALUE centerpiece, op/intrinsic taxonomy, atomic/order/AsmDir unification, multi-result API, i128/f128-as-VALUE. Tracks 1/3/4/5/6/7 landed; Track 2 (binop/cmp split) and Track 1c open. | [../CODEGEN.md](../CODEGEN.md) |
-| [RV32.md](RV32.md) | riscv32-none-elf backend: all workstreams (WS0–WS9) complete including 64-bit-value legalization at ilp32f/ilp32. Known gaps (`__int128`, i64 atomics, i64 varargs, TLS) are intentionally left red. | [../ARCH.md](../ARCH.md) |
| [DIST_LIBRARY.md](DIST_LIBRARY.md) | Migrating the CAS/package distribution subsystem into libkit as a gated public API (`kit/cas.h`, `kit/package.h`). Main migration shipped; Stage 3 v2 dead-code deletion deferred. | [../DISTRIBUTE.md](../DISTRIBUTE.md) |
| [FREEBSD.md](FREEBSD.md) | FreeBSD target support: VM harness, triple parsing, runtime variants, COMDAT/`STB_GNU_UNIQUE` fixes. Static link blocked on archive weak-alias cycle (needs `--start-group` semantics); dynamic link and full VM validation remaining. | — |
| [TODO.md](TODO.md) | Open deferred fixes and code smells only. Completed items are removed instead of checked off. Not a roadmap; a current backlog. | — |
diff --git a/doc/plan/RV32.md b/doc/plan/RV32.md
@@ -1,528 +0,0 @@
-# Plan: RISC-V 32-bit (`riscv32-none-elf`) support
-
-## Status — 2026-06-03 (branch `rv32`) — core complete; cross-test gaps tracked
-
-`riscv32-none-elf` (`rv32imafc_zicsr_zifencei`, both `ilp32f` and `ilp32`) is a
-working cross target. WS6 — the flagged "hardest part", 64-bit-value legalization
-— is **done and behaviorally verified under `qemu-system-riscv32`** at -O0 and -O1
-for both ABIs. The full kit toolchain (`kit cc → kit ld → qemu-system`) builds and
-runs a correct bare-metal rv32 image with **no special flags** (freestanding
-defaults to no-PIE). As of 2026-06-03 the rv32 runtime is **no longer
-special-cased**: `kit cc`/`kit ld` auto-build and auto-link `libkit_rt.a` for
-`riscv32-none-elf` exactly like every other target — the driver carries two
-rv32 runtime variants (`riscv32-elf` soft ilp32, `riscv32-elf-hardfloat`
-ilp32f), selected by the float ABI recovered from the objects' ELF e_flags, so
-no explicit archive or `-nostdlib` is needed. **RV64 / x64 / aa64 fully
-non-regressed**: asm goldens byte-identical, isa (rv64 21 + rv32 31)/0,
-abi-classify 367/0, elf 41/0, link 122/0 + x64 79/0, cg-api 544/0, smoke-rv64 3/0,
-dwarf/driver/interp green.
-
-Both corpora now run on qemu-system-riscv32 as a cross arch: **Toy `240 pass / 15
-red`** (`test/toy/run.sh`, path X) and **C `439 pass / 36 red`** (`test/parse/run.sh`,
-path E). **The reds are deliberately left red** (no skip sidecars) — they are the
-real remaining rv32 gaps, enumerated in the checklist below.
-
-### Done & verified ✅
-- [x] **WS0–WS5, WS7** — variant scaffold, XLEN-parameterized backend, `arch_impl_rv32`
- + `-march`/`-mabi`/macros, shared ABI classifier + `rv32_vtable`, ELFCLASS32
- emit/read/link + `reloc_riscv32.c`, `mk/rt.mk` variants. (See git history.)
-- [x] **WS6 — 64-bit-value pair-legalization (THE blocker) — DONE.** rv32 8-byte scalars
- (`long long`/i64 AND soft `double`) are **memory-resident** (`api_is_wide8_scalar_type`
- forces `CG_LOCAL_MEMORY_REQUIRED`; `cg_ir_lower`/`pass_native_emit` size>word checks made
- `> ptr_size`), mirroring the proven i128/wide16 model. The allocator binds one register per
- value, so memory residence + the multi-part ABI path (`ABIArgPart.src_offset`,
- `rv_load_part`/`rv_store_part`) is the only correct representation. (`src/cg/arith.c`,
- `src/cg/wide.c`):
- - add/sub/and/or/xor/neg/bnot — **inline 2-word lane ops** (carry/borrow via `sltu`); no
- compiler-rt 64-bit add helper exists, so these *must* be inline.
- - i64 compares — inline lane eq/lt (signed-hi/unsigned-lo); `if(i64)` = `(lo|hi)!=0`.
- - i64 mul/div/rem/shift → `__*di3`; soft `double` → `__*df*`; i64↔float → `__floatdisf`/
- `__fixsfdi`/…; **soft single** f32 under `ilp32` → `__*sf*`; i64 clz/ctz/popcount/bswap →
- `__*di2`; 64-bit consts → two lanes.
- - `nd_*` guards (`native_direct_target.c`) **panic** on any 8-byte value reaching a
- single-register binop/unop/cmp/convert/load_imm/load_const — loud, never truncation.
-- [x] **Runtime (`make rt`) — DONE.** Both `riscv32-elf` (ilp32) and `riscv32-elf-hardfloat`
- (ilp32f) build with kit's own cc. Fixed `mk/rt.mk`: `RT_CFLAGS`/`RT_ASFLAGS` now include
- `RT_<v>_ARCH_FLAGS` (the `-mabi`/`-march` were silently dropped — every variant built ilp32f).
-- [x] **ELF e_flags float-ABI** — `emit.c`/`link.c` derive the RISC-V float-ABI bits from
- `target.float_abi` (the static descriptor hardcoded SINGLE, mislabelling `ilp32` soft);
- rv64/x64/aa64 byte-identical.
-- [x] **Freestanding policy (host-irrelevant, target-derived):**
- - kit stamps **`EI_OSABI=ELFOSABI_STANDALONE`** on `*-none-elf` objects (`emit.c`) so they
- round-trip as `KIT_OS_FREESTANDING` instead of decoding back to Linux (the "none → Linux"
- bug). `kit ld` derives the PIC default from the *target* via `driver_default_pic` (hosted
- → PIE, freestanding → no-PIE) and scans all inputs for a freestanding object — the host's
- default never leaks onto a cross target. So `kit ld` for rv32 needs **no `-no-pie`**.
- - `kit ld`/`kit cc` auto-link a runtime for any target that has a variant
- (`driver_runtime_has_variant`) — **now including `riscv32-none-elf`**. The driver
- (`driver/lib/runtime.c`) carries two rv32 runtime variants distinguished by a new
- `float_abi` axis on `RuntimeVariant` (`riscv32-elf` soft `ilp32`/`rv32imac`,
- `riscv32-elf-hardfloat` `ilp32f`/`rv32imafc`); each is built on demand with its own
- `-march`/`-mabi` via `topts.isa`/`topts.abi`. The float ABI is recovered from the RISC-V
- ELF `e_flags` in `src/api/object_detect.c` and reconciled across all link inputs in
- `driver/cmd/ld.c` (a foreign startup stub that lacks the flag never mis-selects the soft
- runtime). So a freestanding rv32 link needs **no explicit `libkit_rt.a` and no
- `-nostdlib`**. New `-Ttext ADDR` and `-nostdlib`/`--no-default-libs` flags remain
- available for images that supply their own runtime.
- - `.eh_frame` suppressed for `KIT_OS_FREESTANDING` (`src/arch/mc.c`); hosted byte-identical.
- - `layout_dyn` emits a clean diagnostic for an ELF32 dynamic/PIE link (was an ELF64 SEGV).
- - jump-table / label-address slots are width-aware (`R_ABS32` on rv32, `R_ABS64` on 64-bit)
- in `nd_local_static_data_label_addr` — fixes switch jump tables on rv32.
-- [x] **WS9 tests + CI wiring:** `test/arch/rv32_decode_test.c` (→ `test-isa`, 31 checks),
- `test/link/rv32_jit_test.c` (→ `test-rv32-jit`, exit-77 host gate),
- `test/elf/unit/rv32_class32.c` (ELFCLASS32 round-trip, → `test-elf`),
- `test/smoke/rv32.sh` (→ `test-smoke-rv32`): 7 lanes — ilp32f + ilp32 × {-O0,-O1} covering i64
- + soft-double + soft-single, two `kit ld` end-to-end lanes that **auto-link the runtime**
- (no explicit `libkit_rt.a`), a negative control. Wired in
- `mk/test.mk`/`mk/test_unit.mk` (`test-rv32-jit`, `test-smoke-rv32`).
-- [x] **Toy + C cross lanes (rv32 as an arch).** Shared bare-metal runner
- `test/lib/exec_rv32_bare.sh` (clang startup → `kit cc`/parse-runner → `kit ld` → qemu-system,
- SiFive-finisher exit oracle; entry symbol configurable — `main` for Toy, `test_main` for C).
- Toy: `test/toy/run.sh` `cross_one_rv32` (rv32 in default `TOY_CROSS_ARCHS`, path X) — **240/15**.
- C: `test/parse/run.sh` `kit_lane_E` rv32 branch + `kit_test_target.h` rv32 arm (path E,
- `KIT_TEST_ARCH=rv32`) — **439/36**. Both opt-in; reds left red.
-
-### Remaining ⚠️ — clear checklist
-
-**A. rv32 codegen gaps surfaced by the cross lanes (the reds — left red on purpose, no skips).**
-Toy `240/15`, C `439/36`; the 51 reds cluster into:
-- [ ] **`__int128`** (C: `i128_02`…`i128_13+`, ~15 cases — the largest C bucket). rv32 has no
- `__int128` (runtime `INT128=0`; the 16-byte-scalar path is dead on rv32). Decide: reject
- `__int128` on rv32 at the front end with a clear diagnostic (cleanest), or legalize it (a
- 4-word version of the wide8 work — large). Until then these are compile-fail/wrong-result.
-- [ ] **i64 atomics** (`@atomic_*<i64>` / `__atomic_*_8`; Toy 17/22/59/73/74/75/77, C
- `builtin_*_atomic_long`). rv32 `A` has no 64-bit AMO/`lr.d`/`sc.d`; needs `__atomic_*_8`
- libcalls (libatomic / a lock), absent freestanding. Provide 8-byte `__atomic_*` in `rt/`, or
- document as a hard rv32 limitation.
-- [ ] **64-bit `*_overflow` intrinsics** (Toy 58_overflow_record, C `builtin_26_sadd_overflow`).
- Legalize i64 sadd/uadd/ssub/usub/smul/umul-overflow on rv32 (the 64-bit operand reaches the
- backend un-split today → trap), à la the clz/ctz wide8 routing in `arith.c`. 32-bit works.
-- [ ] **i64 varargs** (Toy 133_varargs_mixed_types — wrong result, not a hang). Audit the rv32
- `va_arg` path for an 8-byte value (even-pair fetch from the vararg save area).
-- [ ] **thread-local storage** (Toy 141, C `6_7_1_03_thread_local_basic`, `gnu_thread_storage_01`).
- TLS needs a thread pointer the bare-metal image never sets up — likely a genuine freestanding
- limitation (the Linux lanes get it from the OS); document, or provide a static-TLS model.
-- [ ] **toy soft-float compare lowering** (Toy 153_fp_cmp_negation_b — `kit cc` "addr operand is
- not an lvalue", rv32-only, not reproducible in C). An eager soft-fp compare feeding an
- empty-then/else block hits an lvalue path the rv64 delayed-`SV_CMP` form avoids. Narrow.
-- [ ] **123_spec_demo** (Toy, hangs) — triage which of the above it exercises.
-- Test-environment mismatches (NOT rv32 codegen bugs; an `.rv32.skip` sidecar exists for them but
- none is committed): Toy 145_baremetal_privileged_aa64 (aa64 intrinsics), 20_cg_api_inline_asm_full
- + C `asm_01_grammar` (inline-asm constraints/grammar), 47_target_arch_switch (selects its expected
- exit code by target arch).
-
-**B. Pre-existing follow-ups (orthogonal to the cross tests).**
-- [ ] Optional `make` targets `test-toy-rv32` / `test-parse-rv32` (opt-in; not in
- `DEFAULT_TEST_TARGETS` while reds exist).
-- [ ] **`test/asm/` rv32 byte-golden lane + `regen-rv32.sh`** (rv32 arm in `test/asm/run.sh` /
- `kit_unit.h` + committed clang/llvm-objdump goldens; `kit_test_target.h` already has rv32).
-- [ ] **CSR pseudo-ops in the assembler** (`csrs`/`csrw`/`csrr`/… + CSR names) — a general
- RISC-V-assembler feature (missing on rv64 too; new `RV64_FMT_CSR_{R,W,WI}` + CSR-name table +
- disasm print cases). Until then the smoke/cross startup stub is clang-assembled.
-
-**Out of scope (decided):** `kit ld` ELF32 dynamic/PIE — rv32 is static-only; `layout_dyn`
-clean-panics on an ELF32 dynamic/PIE link and that is the intended behavior.
-
-### Where to look
-- WS6 legalization: `src/cg/wide.c`, `src/cg/arith.c` (binop/unop/cmp/convert + soft-fp + clz/ctz),
- `src/cg/{value,local,memory,call,control}.c`, `src/opt/{cg_ir_lower.c,pass_native_emit.c}`,
- `src/cg/native_direct_target.c` (`nd_*` panics + `nd_local_static_data_label_addr`).
-- Backend: `src/arch/riscv/{variant.{h,c},native.c,isa.{c,h},disasm.c,asm.c,link.c,dbg.c,arch.c}`.
-- ABI: `src/abi/abi_rv64.c` + `src/abi/registry.c`.
-- ELF / kit ld / freestanding policy: `src/obj/elf/{elf.h,emit.c,read.c,link.c,link_dyn.c}` +
- `reloc_riscv32.c`; `driver/cmd/ld.c` (`-Ttext`/`-nostdlib`/PIC-from-target), `driver/lib/target.c`
- (`driver_default_pic`), `driver/lib/runtime.{c,h}` (`driver_runtime_has_variant`, the two rv32
- `RuntimeVariant` entries + `float_abi`/`isa`/`abi` axis, `rt_build_archive`),
- `src/api/object_detect.c` (EI_OSABI → os; RISC-V `e_flags` → `float_abi`),
- `src/link/{link.c,link_layout.c}`, `src/api/link.c`.
-- Runtime/intrinsics: `mk/rt.mk` (ARCH_FLAGS), `src/cg/type.c` (rv32 ≡ rv64 for intrinsics).
-- Tests: `test/smoke/rv32.sh`, `test/lib/{check_rv32_env.sh,exec_rv32_bare.sh,kit_test_target.h}`,
- `test/toy/run.sh` (`cross_one_rv32`), `test/parse/run.sh` (`kit_lane_E` rv32 branch),
- `test/arch/rv32_decode_test.c`, `test/link/rv32_jit_test.c`, `test/elf/unit/rv32_class32.c`,
- `mk/test.mk`, `mk/test_unit.mk`.
-
----
-
-## Context
-
-`kit` today targets `riscv64` (LP64D) via a single backend in `src/arch/rv64/`. We want a
-new cross target:
-
-```
---target=riscv32-none-elf
--march=rv32imafc_zicsr_zifencei
--mabi=ilp32f (and also -mabi=ilp32, soft-float)
--mcmodel=medlow
-```
-
-This is a freestanding 32-bit RISC-V toolchain target: F (single-precision hardware float)
-but **no D**, so `double` and `long long` are not native and must be lowered. The enum
-`KIT_ARCH_RV32`, the `riscv32` triple parse (`driver/lib/target.c:275`, `ptr_size=4`), ELF
-auto-detection (`src/api/object_detect.c`), and the runtime source files
-(`rt/lib/riscv/rv32.S`, `rt/lib/coro/riscv32.c`) already exist but are unwired/incomplete.
-
-The intended outcome: `kit cc/as/ld/objdump/disas` produce and consume correct
-`riscv32-none-elf` ELFCLASS32 objects and static executables for both `ilp32f` and `ilp32`,
-with `libkit_rt.a` builtins available and the JIT `run`/`dbg` plumbing wired (native
-execution host-gated, as for rv64).
-
-## Confirmed scope decisions
-
-- **Shared backend**: refactor `src/arch/rv64/` into **one XLEN-parameterized RISC-V backend**
- serving both rv32 and rv64 from a single tree. RV64 must not regress and is re-validated.
-- **Subsystems in scope**: compile + assemble + link + disasm; runtime lib; JIT `run`/`dbg`.
- **Emulator is out of scope** (`src/emu`, `src/os`, `src/obj/elf/emu_load.c` stay rv64-only).
-- **ABIs**: `ilp32f` (single hard-float: `float` in `fa0-fa7`, `double`/`i64` via integer
- regs + soft-float) **and** `ilp32` (pure soft-float). `double` is always soft-float.
-- **Code model**: accept and validate `-mcmodel=medlow`/`medany`, but keep the existing
- PC-relative (`auipc` + `R_RV_PCREL_HI20/LO12`, GOT for externs) addressing for v1. No new
- absolute-addressing path.
-
-## XLEN-parameterization mechanism
-
-Add a `const RiscvVariant*` descriptor (immutable, two static instances selected by
-`KitArchKind`) carried on the per-function codegen context and threaded into the otherwise
-stateless decode/asm/disasm/link/dbg paths. This honors "no global state — everything hangs
-off a context struct" (the variant is a const table reached through a context, never ambient).
-
-New `src/arch/riscv/variant.h`:
-
-```c
-typedef struct RiscvVariant {
- KitArchKind kind; /* KIT_ARCH_RV32 / KIT_ARCH_RV64 */
- const char* name; /* "rv32" / "rv64" */
- const char* isa_prefix; /* "rv32" / "rv64" — for -march parsing */
- u8 xlen; /* 32 / 64 */
- u8 ptr_bytes; /* 4 / 8 — pointer & native register width */
- u8 gp_slot_bytes; /* 4 / 8 — varargs save & callee-save stride */
- u8 has_w_forms; /* 0 rv32 / 1 rv64 — ADDW/ADDIW/SLLIW/... */
- u8 shamt_bits; /* 5 rv32 / 6 rv64 — SLLI/SRLI/SRAI immediate */
- u32 frame_save_size; /* 2 * ptr_bytes (8 rv32 / 16 rv64) */
-} RiscvVariant;
-const RiscvVariant* riscv_variant_for_kind(KitArchKind);
-```
-
-Reached via: `RvNativeTarget.variant` (codegen), `riscv_variant_for_kind(c->target.arch)` in
-the decoder/assembler/disassembler/dbg (they already hold a `Compiler*`), and two
-`LinkArchDesc` literals for the linker. Distinguish **three different "8"s** carefully —
-`ptr_bytes` (pointer/reg width), `gp_slot_bytes` (ABI save stride), and `frame_save_size`
-(saved ra+s0 pair) — conflating them passes rv64 (all 8) and breaks rv32.
-
-The **float ABI** (soft vs single-hard) is a separate axis from XLEN, carried on
-`KitTargetSpec.float_abi` (see WS4), consumed by the ABI classifier and predefined macros.
-
-## Workstreams (ordered; each leaves a green targeted check)
-
-### WS0 — Config + variant scaffold (no behavior change)
-- `include/kit/config.h`: add `#define KIT_ARCH_RV32_ENABLED 1` (`mk/config.mk` auto-parses
- it into a make var — no `config.mk` edit needed).
-- Add `src/arch/riscv/variant.h` with the struct + two `const` instances + lookup.
-- **Gate**: `make lib` compiles.
-
-### WS1 — Directory rename + thread variant through codegen (rv64 still identical)
-- `git mv src/arch/rv64 src/arch/riscv`; fix include guards/paths. Update `mk/lib_srcs.mk:55,189`
- (`LIB_SRCS_ARCH_RV64` → `LIB_SRCS_ARCH_RISCV`, gated by `RV32 || RV64`). The only external
- referent is the symbol `arch_impl_rv64` in `src/arch/registry.c` (path-independent).
-- Keep file names and internal `rv64_`/`rv_` symbol prefixes for v1 (cosmetic rename is a
- separate follow-up; renaming 2000+ sites is pure regression risk).
-- `src/arch/riscv/native.c`: add `const RiscvVariant* variant` to `RvNativeTarget` (set from
- `c->target.arch` in the one constructor); replace hardcoded 8/16/`RV_FRAME_SAVE_SIZE`/
- `addiw`/`ld`/`sd`/float-fmt sites with variant reads. **With the rv64 variant the emitted
- bytes are byte-for-byte identical** — this isolates the "sharing" regression from rv32
- correctness. Key sites (file `src/arch/riscv/native.c`): `rv_emit_li32` (LUI+ADDIW→ADDI when
- `!has_w_forms`), `enc_int_load/store` (sw/lw vs sd/ld by `ptr_bytes`), `RV_FRAME_SAVE_SIZE`,
- varargs save area, callee-save stride, `rv_type_size`/`align` defaults, `rv_convert` sext/zext
- (`xlen - src_bits` shift; `addiw` fast-path only when `has_w_forms`).
-- **Gate**: `make test-smoke-rv64`, `test/arch/rv64_decode_test.c`, `test/asm/regen-rv64.sh`,
- `test/link/rv64_jit_test.c` all byte-identical green.
-
-### WS2 — ISA / asm / disasm / link / dbg XLEN parameterization (still rv64-only at runtime)
-- `src/arch/riscv/isa.c`/`isa.h`: add a one-byte **availability mask** column to `Rv64InsnDesc`
- (`RV_AV_RV32 | RV_AV_RV64`) rather than a second table. Mark RV64-only: W-forms
- (`addw/subw/sllw/srlw/sraw`, `addiw/slliw/srliw/sraiw`, `mulw/divw/divuw/remw/remuw`),
- 64-bit mem (`ld/sd/lwu`), 64-bit FP int conv (`fcvt.*.l/lu`, `fmv.x.d/d.x`), compressed
- `c.addiw/c.addw`, and the RV64 meaning of `c.ld/c.sd/c.ldsp/c.sdsp/c.fld/c.fsd/...`. Enable
- RV32-only: `c.jal` (shares the encoding that is `c.addiw` on rv64), `c.lw/c.sw`, `c.flw/c.fsw`.
-- `src/arch/riscv/disasm.c` + the compressed decoder `rv64_disasm_find_c`: pass the variant in;
- branch the ambiguous compressed quadrant encodings and the **5-bit vs 6-bit shamt** decode
- (`& 0x1f` on rv32, reject bit 25 set). `rv64_disasm_find`/`rv64_asm_find` skip rows by mask.
-- `src/arch/riscv/link.c`: split `link_arch_rv64` and a new `link_arch_rv32`; PLT/IPLT stubs use
- `rv_lw` instead of `rv_ld` (re-check stub sizes/offsets for 4-byte slots).
-- `src/arch/riscv/dbg.c`: parameterize the displaced-step shim by `ptr_bytes`; set
- `min_insn_len=2, max_insn_len=4` for rv32 (C ext on); RVC control-flow falls back to
- step-over (`KIT_UNSUPPORTED`), 4-byte fixups reuse the rv64 builder.
-- **Gate**: `make test-isa`, `regen-rv64.sh`, `rv64_jit_test` still green.
-
-### WS3 — rv32 ArchImpl + registry + `-march` + predefined macros
-- `src/arch/riscv/arch.c`: define **both** `arch_impl_rv32` and `arch_impl_rv64` (share
- `cgtarget_new/asm_new/disasm_new/decode/dwarf/dbg/asm_ops`/register file; differ in `.kind`,
- `.name`, `.link`, `.predefined_macros`, `.target_feature_*`, and `cfi_data_align_factor`
- -4 vs -8).
-- Generalize `rv64_target_feature_apply_isa` (currently hard-requires the `"rv64"` prefix,
- `arch.c:204`) to compare against `variant->isa_prefix`. rv32 default profile =
- `rv32imafc_zicsr_zifencei` (I/M/A/F/C/Zicsr/Zifencei, **D cleared**).
-- Predefined macros for rv32 (float-abi-dependent, see WS4): `__riscv_xlen=32`,
- `__ILP32__`/`_ILP32` (drop `__LP64__`/`_LP64`), `__riscv_float_abi_single` (ilp32f) **or**
- `__riscv_float_abi_soft` (ilp32) instead of `_double`, `__riscv_flen=32` when F present.
-- `src/arch/registry.c`: register `arch_impl_rv32` under `#if KIT_ARCH_RV32_ENABLED`
- (`:24,50,57`); `arch_kind_name` already returns "riscv32".
-- **Gate**: `kit cc -target riscv32-none-elf -march=rv32imafc_zicsr_zifencei -E -dM` shows the
- right macros; `kit mc`/`disas -target riscv32-none-elf` round-trips a hand-written rv32 insn.
-
-### WS4 — ABI vtable refactor + `-mabi` plumbing
-- **New spec field**: in `include/kit/core.h` add `enum KitFloatAbi {DEFAULT, SOFT, SINGLE,
- DOUBLE}` and `uint8_t float_abi` on `KitTargetSpec`; add `KitSlice abi` to `KitTargetOptions`.
-- **Driver `-mabi`**: in `driver/lib/target.c`, intercept `-mabi=`/`-mabi` in
- `driver_target_features_try_consume` **before** the catch-all `-m<x>` fallback (which would
- otherwise mis-eat it), mirroring `-march` at `:154-165`; carry through `driver_target_options`.
- Add `medlow`→`KIT_CM_SMALL`, `medany`→`KIT_CM_MEDIUM` aliases in `cc_record_mcmodel`
- (`driver/cmd/cc.c:751`) and `run_record_mcmodel` (`driver/cmd/run.c:379`).
-- **Resolve + validate** in `kit_target_new` (`src/api/core.c`), after `-march` features are
- known: parse `ilp32|ilp32f|ilp32d|lp64|lp64f|lp64d`; if omitted, derive from `-march`
- (D→DOUBLE, F-no-D→SINGLE, else SOFT); **reject** `*f` without F and `*d` without D. So
- `rv32imafc` defaults to `ilp32f`, and `ilp32d` is rejected.
-- **Shared ABI classifier**: generalize `src/abi/abi_rv64.c` into a RISC-V classifier
- parameterized by a descriptor `{xlen_bytes (=ptr_size), gpr_bytes, aggregate_gpr_bytes=2*gpr,
- flen (0/4/8), float_abi}` read from `a->c->target`. Replace the `RV64_ABI_*_BYTES=8/16` enum.
- - FP-eligibility predicate `fp_eligible(desc, size)`: SOFT never; SINGLE iff `size==4`
- (float; `double` 8>flen4 → INT pair); DOUBLE iff `size<=8` (preserves rv64 LP64D).
- - `classify_scalar`: i8/16/32/ptr → 1 INT part; `i64`/soft-`double` → **2 INT parts of 4 in
- an even-aligned GPR pair**; `float` (ilp32f) → 1 FP part (fa0-fa7). Replace the hardcoded
- `size==16 → 2×8` with `nparts = size/gpr_bytes`.
- - `classify_aggregate`: register threshold `2*gpr_bytes` (8 on rv32), chunk by `gpr_bytes`;
- HFA refinement gated by `fp_eligible`.
- - va_list: `ABI_VA_LIST_POINTER`, `gp_reg_count=8`, `gp_slot_size=4`, `fp_reg_count=0`
- (**FP varargs always go via INT regs even under ilp32f**). Two thin static vtables
- (`rv32_vtable`, `rv64_vtable`) sharing the classifier, differing only in the va_list literal.
-- `src/abi/registry.c`: add `KIT_ABI_RV32_ENABLED` and an `{KIT_ARCH_RV32, KIT_OBJ_ELF,
- &rv32_vtable}` entry (one entry serves both ilp32/ilp32f; the float axis is read from the spec).
-- **Gate**: ABI classification golden tests (`test/api/abi_classify_test.c` style) for rv32
- ilp32f and ilp32.
-
-### WS5 — ELFCLASS32 object emission + reading (largest item)
-Introduce one `is32`/`ElfEnc` flag (from `c->target.ptr_size`) threaded through, **not**
-copy-paste duplication.
-- `src/obj/elf/elf.h`: add `ELFCLASS32`, `ELF32_{EHDR,PHDR,SHDR}_SIZE` (52/32/40),
- `ELF32_SYM_SIZE`(16)/`ELF32_RELA_SIZE`(12), `ELF32_R_INFO(s,t)=((s)<<8)|((t)&0xff)`,
- `ELF32_R_SYM/TYPE`.
-- `src/obj/elf/emit.c`: replace the `ptr_size != 8` panic (`:271`); branch sym record (16B,
- different field widths) and rela record (12B, `ELF32_R_INFO`) writers; `EI_CLASS`
- (`:664`); Ehdr/Shdr address fields via `elf_wr_u32` and ELF32 sizes; e_flags from
- `float_abi` (`EF_RISCV_FLOAT_ABI_SINGLE`/`_SOFT` | `EF_RISCV_RVC`).
-- `src/obj/elf/read.c`: accept `ELFCLASS32` (`:446,814`); add `parse_shdr32`/`parse_sym32`/
- rela32 with the correct offsets/strides and `ELF32_R_SYM/TYPE`. Scope v1 to ET_REL +
- ET_EXEC reads; give ELF32 ET_DYN a clear "unsupported" rather than mis-parse.
-- `src/obj/elf/link.c`: ELF32 ET_EXEC writer (parallel parameterization to emit.c) — needed
- for `ld`/`run`/`dbg`. `link_dyn.c` and `emu_load.c` stay rv64/ELF64-only: gate rv32 to
- static linking (freestanding `-none-elf` defaults to `KIT_PIC_NONE`), panic-with-diagnostic
- for rv32 dynamic.
-- New `src/obj/elf/reloc_riscv32.c`: clone `reloc_riscv64.c`; map `R_ABS32`→`ELF_R_RISCV_32`,
- and `R_ABS64`/`R_RV_ADD64`/`R_RV_SUB64`→unsupported; reuse all XLEN-neutral kinds.
-- `src/obj/registry.c`: add the rv32 `obj_elf_arch_ops` entry. **EM_RISCV is shared by rv32
- and rv64** — disambiguate reloc-table selection by `EI_CLASS`, not e_machine alone.
-- **Gate**: new `test/elf/unit/rv32_class32.c` write-then-read round-trip; `kit objdump`/`nm`
- on a hand-built rv32 `.o`.
-
-### WS6 — 64-bit-int + soft-float-double legalization (hardest part)
-The cg layer (`src/cg/arith.c`) only routes wide ops to libcalls for the `__int128` builtin
-(`api_i128_stack_top`), **never by width** — so `long long` on rv32 currently reaches the
-backend as a raw 8-byte value, and `double` arithmetic would emit illegal `.d` ops.
-- **64-bit integers on rv32**: generalize the i128 libcall mechanism in `src/cg/arith.c` to a
- "wider than target word" predicate (`type_size > c->target.ptr_size`). Recommended v1:
- route `mul/div/udiv/mod/shifts` to runtime libcalls (`__muldi3`, `__divdi3`, `__udivdi3`,
- `__moddi3`, `__ashldi3`, `__lshrdi3`, `__ashrdi3`); do `add/sub/and/or/xor/load/store/move`
- inline as register pairs in the backend (these are unavoidable for memory/arg traffic).
- Add a **loud panic** in `rv_binop`/`rv_convert` if a wide value reaches the native-width
- path, so any missed case fails fast.
-- **Soft-float `double` on ilp32f/ilp32**: route `double` arithmetic and `double`↔int/float
- conversions to libcalls (`__adddf3`, `__subdf3`, `__muldf3`, `__divdf3`, `__extendsfdf2`,
- `__truncdfsf2`, `__fixdfsi`, `__floatsidf`, df compares) — mirror the existing f128 path so
- the backend only ever sees `float` (S) FP ops. Backend panics on any `RV_FMT_D` selection
- when `xlen==32`.
-- Confirm `long double == double` (8B) and `__int128` absent on rv32 (runtime sets
- `INT128=0`, no `LDBL128`), so the 16-byte scalar classify path is effectively dead there.
-- **Gate**: red-green targeted tests — `long long` add/mul/div and `double` add/mul/convert
- compile to plausible sequences (verified via decode/disasm; behavior via qemu if available).
-
-### WS7 — Runtime build wiring (`mk/rt.mk`)
-- The `riscv32-elf` / `riscv32-elf-save-restore` variants exist but are **wrong**:
- `-mabi=ilp32 -march=rv32imafd` (D present). Fix to the confirmed profile and add the
- hard-float variant:
- - `riscv32-elf` (ilp32, soft): `-mabi=ilp32 -march=rv32imac`.
- - `riscv32-elf-hardfloat` (ilp32f): `-mabi=ilp32f -march=rv32imafc`.
- - Both keep `ABI=ilp32` (the *integer* layout → `rt/lib/include/ilp32_le`; `f` only affects
- FP arg passing), `INT128=0`, `CORO=riscv32`.
-- Mandatory builtins are already selected: `RT_ABI_SRCS_ilp32 = rt/lib/int32/int32.c` (64-bit
- int helpers) and `rt/lib/fp/fp.c` (soft `double`). Verify the df soft-float ops compile for
- the rv32 target.
-- `mk/lib_srcs.mk`: widen the ABI/reloc source guards to include `KIT_ARCH_RV32_ENABLED`; add
- `reloc_riscv32.c` to the ELF source group.
-- **Gate**: `kit cc -target riscv32-none-elf -c rt/lib/.../smoke` builds; `make rt` produces
- the rv32 runtime variants.
-
-### WS8 — JIT `run` / `dbg`
-`kit run`/`dbg` execute JIT bytes **natively in-process** (`run.c` `entry_fn(...)`); there is
-no cross-arch execution path (emulator is out of scope). So on a non-rv32 host, rv32 code
-cannot be executed — same situation as rv64's existing JIT test, which builds the image and
-**skips the call** (exit 77).
-- `src/link/link_jit.c`: audit only — it is XLEN-neutral and patches via shared `R_RV_*`
- reloc kinds; the only u64/TLV slots are Mach-O-guarded (ELF never reaches them). No change
- expected, provided WS2/WS6 emit the same reloc kinds.
-- `rv32_dbg_ops` from WS2 (RVC-aware lengths, step-over fallback).
-- **v1 deliverable**: JIT image build + relocation + symbol lookup wired and unit-tested
- without execution; native execution host-gated to rv32 hosts.
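The "RVC-aware lengths" item reduces to one rule from the RISC-V base encoding. A 16-bit parcel whose low two bits are not `0b11` is a compressed instruction. Otherwise, for the encodings rv32imafc uses, the instruction is 32 bits. Below is a minimal sketch of how a step-over or disassembler walk might apply that rule. `rv_insn_len` and `rv_count_insns` are hypothetical names, and encodings longer than 32 bits are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of the instruction whose first 16-bit parcel is `lo`.
 * Only 16- and 32-bit encodings are handled (enough for rv32imafc). */
static size_t rv_insn_len(uint16_t lo) {
  return (lo & 0x3u) != 0x3u ? 2 : 4;
}

/* Walk a little-endian code buffer and count instructions, checking that
 * no 32-bit instruction straddles the end (what a step-over relies on). */
static size_t rv_count_insns(const uint8_t *code, size_t n) {
  size_t off = 0, count = 0;
  while (off + 2 <= n) {
    uint16_t lo = (uint16_t)(code[off] | (code[off + 1] << 8));
    size_t len = rv_insn_len(lo);
    assert(off + len <= n);
    off += len;
    count++;
  }
  return count;
}
```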
-
-### WS9 — Tests & verification (see Verification below)
-
-## Parallel workstream map
-
-Much of this is separable. Lock a small set of **shared interfaces first** (Phase A), then five
-tracks proceed in parallel (Phase B), converging at integration (Phase C). The critical path is
-Phase A → Track 1 (the backend chain WS1→WS2→WS3); ELF32 (Track 2) is the largest *effort* but is
-parallel, so starting it immediately keeps it off the wall-clock.
-
-**Phase A — shared contracts (serial, small, land first; unblocks everyone):**
-- **WS0** `RiscvVariant` + `riscv_variant_for_kind` + `KIT_ARCH_RV32_ENABLED`.
-- **WS4a** the float-ABI interface only: `KitFloatAbi` enum, `KitTargetSpec.float_abi`,
- `KitTargetOptions.abi`, and the `-mabi`/`-mcmodel` parse → resolve → validate plumbing
- (`driver/lib/target.c`, `driver/cmd/cc.c`, `src/api/core.c`). No classifier change yet.
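The validate step has to reject an `-mabi` whose floating-point argument registers the chosen `-march` does not provide: `ilp32f` needs F, and `ilp32d` needs D. GCC rejects the same mismatch. The sketch below makes several simplifying assumptions:

- The enum is a stand-in for `KitFloatAbi`, and its member names are hypothetical.
- The `-march` scan only looks at single-letter extensions before the first `_`, with `g` expanding to `imafd`.
- Version numbers and multi-letter extensions are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for KitFloatAbi; member names are hypothetical. */
typedef enum { FABI_INVALID, FABI_SOFT, FABI_SINGLE, FABI_DOUBLE } FloatAbi;

static FloatAbi parse_mabi_rv32(const char *mabi) {
  if (strcmp(mabi, "ilp32") == 0) return FABI_SOFT;
  if (strcmp(mabi, "ilp32f") == 0) return FABI_SINGLE;
  if (strcmp(mabi, "ilp32d") == 0) return FABI_DOUBLE;
  return FABI_INVALID;
}

/* True if `march` (e.g. "rv32imafc_zicsr") names single-letter extension `ext`. */
static bool march_has_ext(const char *march, char ext) {
  assert(ext != '\0');  /* strchr would match the terminator */
  if (strncmp(march, "rv32", 4) != 0) return false;
  for (const char *p = march + 4; *p != '\0' && *p != '_'; p++) {
    if (*p == ext) return true;
    if (*p == 'g' && strchr("imafd", ext) != NULL) return true;
  }
  return false;
}

/* Resolve + validate: the float ABI, or FABI_INVALID on a mismatch. */
static FloatAbi resolve_float_abi(const char *mabi, const char *march) {
  FloatAbi abi = parse_mabi_rv32(mabi);
  if (strncmp(march, "rv32", 4) != 0) return FABI_INVALID;
  if (abi == FABI_SINGLE && !march_has_ext(march, 'f')) return FABI_INVALID;
  if (abi == FABI_DOUBLE && !march_has_ext(march, 'd')) return FABI_INVALID;
  return abi;
}
```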
-
-The five contracts everyone codes against (freeze these in Phase A):
-1. **`RiscvVariant`** fields (XLEN/ptr_bytes/gp_slot_bytes/has_w_forms/shamt_bits/frame_save_size)
- — consumed by Track 1.
-2. **`float_abi`** on the spec — consumed by Track 2 (e_flags), Track 3 (FP-eligibility),
- Track 5 (soft-double), and WS3 (predefined macros).
-3. **Reloc-kind list**: the exact `R_RV_*` kinds rv32 codegen emits = the set rv32 ELF maps and
- `link_jit` expects (= existing rv64 set minus `R_*64`/`ADD64`/`SUB64`). Track 1 ↔ Track 2.
-4. **Runtime libcall names** (`__adddf3`, `__muldf3`, `__fixdfsi`, `__floatsidf`, `__extendsfdf2`,
- `__truncdfsf2`, `__muldi3`, `__divdi3`, `__udivdi3`, `__moddi3`, `__ashldi3`, `__lshrdi3`,
- `__ashrdi3`) emitted by WS6 = provided by WS7. Track 5 ↔ Track 4.
-5. **ABI part-layout**: i64/soft-`double` → GPR pair (even-aligned only for variadic args, per the
- RISC-V psABI); `gp_slot_size=4`; callee-save stride. Track 3 publishes it via the vtable; Track 1's
- native-frame code consumes it.
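Contract 4 pins the names only. A runtime `__muldi3` must return the low 64 bits of the product. On rv32 this can be assembled from 32×32→64 partial products: `mul`/`mulhu` for the low limbs, and plain `mul` for the two cross terms, whose high halves fall off the top.

Below is a minimal sketch. It is deliberately named `mul64_from_halves` rather than `__muldi3`, so it cannot collide with the host's compiler runtime. The check compares against the host's native multiply.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of a*b using only 32-bit limbs. The bits are identical for
 * signed operands in two's complement, so one helper serves long long too. */
static uint64_t mul64_from_halves(uint64_t a, uint64_t b) {
  uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
  uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);
  uint64_t low = (uint64_t)al * bl;               /* mul + mulhu */
  uint32_t cross = (uint32_t)((uint64_t)al * bh + /* only the low 32 bits */
                              (uint64_t)ah * bl); /* reach the result    */
  return low + ((uint64_t)cross << 32);
}

static void check_mul64(uint64_t a, uint64_t b) {
  assert(mul64_from_halves(a, b) == a * b);
}
```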
-
-**Phase B — parallel tracks (each independently testable):**
-- **Track 1 — Backend (critical path, serial within):** WS1 (rename + thread variant, rv64
- byte-identical) → WS2 (ISA/asm/disasm/link/dbg XLEN param) → WS3 (rv32 ArchImpl + `-march` +
- macros). Gate per step against rv64 regression, then rv32 mc/disas round-trip.
-- **Track 2 — ELF32 (WS5):** fully independent of codegen — develop and test the ELFCLASS32
- writer/reader via a hand-built `ObjBuilder` for `KIT_ARCH_RV32` (`test/elf/unit/rv32_class32.c`
- write→read roundtrip). Only consumes `float_abi` (e_flags) + the reloc list. Largest effort;
- start day one.
-- **Track 3 — ABI classifier (WS4b):** the shared RISC-V classifier + `rv32_vtable`, parameterized
- by the descriptor. Independent of codegen — test via `test/api/abi_classify_test.c` for ilp32f
- and ilp32. Consumes `RiscvVariant`/`float_abi`.
-- **Track 4 — Runtime (WS7):** `mk/rt.mk` fixes (correct `-march`/`-mabi`, add hardfloat variant)
- + `mk/lib_srcs.mk` guards. The edits are independent and land early; the `make rt` *validation*
- gates on Track 1 codegen.
-- **Track 5 — cg legalization (WS6):** wide-int + soft-`double` → libcall routing in
- `src/cg/arith.c`, keyed on `ptr_size`/`float_abi`. Logic is independent; end-to-end validation
- needs Track 1 + Track 4. Highest correctness risk — design early against the libcall contract.
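Contract 1 lists the `RiscvVariant` fields but not their values. One plausible reading is consistent with the rest of this plan: no W-forms and 5-bit shift amounts on rv32, and `RV_FRAME_SAVE_SIZE` becoming `2*ptr_bytes`. It is sketched below.

The struct layout, the constant tables and `variant_for_xlen` (a stand-in for `riscv_variant_for_kind`) are assumptions, not kit's definitions. The self-check encodes the invariants Track 1 would rely on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the RiscvVariant fields named in contract 1. */
typedef struct {
  unsigned xlen;            /* GPR width in bits: 32 or 64 */
  unsigned ptr_bytes;       /* sizeof(void *) */
  unsigned gp_slot_bytes;   /* one GPR-sized stack slot */
  bool has_w_forms;         /* addiw/addw/sllw/... exist only on rv64 */
  unsigned shamt_bits;      /* width of the shift-immediate field */
  unsigned frame_save_size; /* 2 * ptr_bytes, per the plan */
} RiscvVariant;

static const RiscvVariant RV32_VARIANT = { 32, 4, 4, false, 5, 8 };
static const RiscvVariant RV64_VARIANT = { 64, 8, 8, true, 6, 16 };

static const RiscvVariant *variant_for_xlen(unsigned xlen) {
  switch (xlen) {
  case 32: return &RV32_VARIANT;
  case 64: return &RV64_VARIANT;
  default: return NULL;
  }
}

static void variant_check(const RiscvVariant *v) {
  assert(v != NULL);
  assert(v->ptr_bytes * 8 == v->xlen);
  assert(v->gp_slot_bytes == v->ptr_bytes);
  assert(v->frame_save_size == 2 * v->ptr_bytes);
  assert((1u << v->shamt_bits) == v->xlen);  /* shifts cover 0..xlen-1 */
  assert(v->has_w_forms == (v->xlen == 64));
}
```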
-
-**Phase C — integration (after tracks converge):**
-- Register `arch_impl_rv32` (Track 1 + Track 3). Wire object registry (Track 2).
-- **WS8** JIT `run`/`dbg` audit + `rv32_dbg_ops` (Track 1 + Track 2).
-- **WS9** end-to-end: decode/asm goldens, `kit cc → ld → qemu` smoke (all tracks + WS6 + WS7).
-
-## Verification
-
-### Verified execution oracle (clang + qemu-system, confirmed working on this host)
-
-clang 22 has the `riscv32` target and `llvm-objdump`/`llvm-mc`/`ld.lld` are installed.
-**qemu user-mode is not built on macOS** — only `qemu-system-riscv32` — which suits a
-freestanding `-none-elf` target. A confirmed working recipe (PASS→exit 0, wrong answer→exit 7,
-hang→exit 124), to be mirrored by `test/smoke/rv32.sh`:
-- Build: `clang --target=riscv32-unknown-elf -march=rv32imafc -mabi=ilp32f -nostdlib -ffreestanding`
- (and an `ilp32`/`rv32imac` soft variant); link `ld.lld -Ttext=0x80000000 -e _start`.
-- Startup stub (`_start`): set `sp` (RAM at `0x80000000`); **for ilp32f set `mstatus.FS`**
- (`li t0,0x2000; csrs mstatus,t0`) to enable the FPU before any `fadd.s` — otherwise it traps
- and hangs. Soft `ilp32` skips this.
-- Result via SiFive test finisher at `0x100000`: `0x5555`→qemu poweroff exit 0;
- `0x3333|(code<<16)`→qemu exit `code`.
-- Run: `qemu-system-riscv32 -machine virt -bios none -kernel prog.elf -nographic -no-reboot`
- (wrap in `timeout`). Verified that clang emits the expected `fadd.s` + inline 64-bit `add`/`sltu`
- + `fcvt.w.s` for ilp32f, and `llvm-readelf` shows ELF32 / "single-float ABI" / RVC flags.
-
-This is the kit smoke: `kit cc -target riscv32-none-elf ... -c app.c`, assemble the startup stub,
-`kit ld` to an ELF, run under qemu-system, assert exit 0. Unlike rv64 (qemu-user/podman), rv32
-uses qemu-system + a bare-metal startup + finisher device. `regen-rv32.sh` uses
-`clang --target=riscv32 + llvm-objdump` for asm/disasm goldens.
-
-### Milestones
-
-kit has no in-process rv32 execution path (emulator out of scope), so behavioral correctness
-comes from the **clang+qemu-system oracle above**; structural correctness comes from
-**self-consistency** (decode↔format, ELF write↔read). Milestone order (each green before the
-next), preferring targeted runs and redirecting output to a file (per CLAUDE.md):
-
-1. **Build/register**: `make lib 2>&1 | tee /tmp/build.log`; target recognized.
-2. **Decode/encode self-roundtrip** — new `test/arch/rv32_decode_test.c` (mirror
- `rv64_decode_test.c`): no W-forms, `lw/sw` (no `ld/sd`), 5-bit shamt, `c.jal`,
- `c.lw/c.sw`, `c.flw/c.fsw`; decode↔format agreement is the oracle.
- `make test-isa 2>&1 | tee /tmp/isa.log`.
-3. **Assembler/disasm corpus** — `test/asm/` rv32 lane + `regen-rv32.sh` (clang
- `--target=riscv32-unknown-elf -march=rv32imafc -mabi=ilp32f` + `llvm-objdump` as reference,
- maintainer-only, soft-skip if absent; committed goldens replayed by CI).
- `make test-asm-rv32 2>&1 | tee /tmp/asm32.log`.
-4. **ELF32 round-trip** — `test/elf/unit/rv32_class32.c` (first ELFCLASS32 consumer):
- write→read-back, assert `EI_CLASS==ELFCLASS32`, `Elf32_Sym`/`Elf32_Rela` survive.
- `make test-elf 2>&1 | tee /tmp/elf.log`.
-5. **Compile + inspect** (no execution):
- `./build/kit cc -target riscv32-none-elf -march=rv32imafc_zicsr_zifencei -mabi=ilp32f -c
- smoke.c -o /tmp/rv32.o` then `./build/kit disas /tmp/rv32.o` (optional cross-check
- `llvm-objdump -d --triple=riscv32 /tmp/rv32.o`).
-6. **Link + JIT image** — new `test/link/rv32_jit_test.c` (mirror `rv64_jit_test.c`, exit 77
- on non-rv32 host; include a PC-relative reloc to exercise HI20/LO12 pairing). `kit ld` to a
- static ELF executable succeeds.
-7. **qemu-system smoke** — `test/smoke/rv32.sh` using the verified oracle above
- (`qemu-system-riscv32 -machine virt`, FPU-enabling startup for ilp32f, SiFive finisher exit
- codes). Compiles `app.c` with `kit cc -target riscv32-none-elf`, links with the startup stub,
- runs under qemu, asserts exit 0. This is the **only behavioral oracle** (soft-double and
- 64-bit-int correctness are otherwise untestable) — make it a required CI gate where
- `qemu-system-riscv32` is present; skip-if-absent elsewhere. Add a doctor
- (`test/lib/check_rv32_env.sh`) like rv64's.
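Milestone 2's "5-bit shamt" item is one of the decode differences that `rv64_decode_test.c` would not already cover. On rv64, `slli`/`srli`/`srai` carry a 6-bit shift amount in bits 25:20. On rv32 the field is bits 24:20, and an encoding with bit 25 set is not a valid rv32 instruction. Below is a minimal sketch of an XLEN-parameterized `slli` decode that rejects the rv64-only form on rv32. `decode_slli_shamt` is a hypothetical name, not kit's decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Shift amount of an `slli` for the given XLEN, or -1 if `insn` is not a
 * valid `slli` there. */
static int decode_slli_shamt(uint32_t insn, unsigned xlen) {
  assert(xlen == 32 || xlen == 64);
  if ((insn & 0x7fu) != 0x13u || ((insn >> 12) & 0x7u) != 0x1u)
    return -1;                              /* not OP-IMM with funct3=001 */
  unsigned shamt_bits = xlen == 32 ? 5 : 6;
  uint32_t shamt = (insn >> 20) & ((1u << shamt_bits) - 1);
  if ((insn >> (20 + shamt_bits)) != 0)    /* funct7 (rv32) / funct6 (rv64) */
    return -1;                              /* on rv32 this catches bit 25 */
  return (int)shamt;
}
```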
-
-New make targets next to their rv64 peers in `test/test.mk`: `RV32_DECODE_TEST_BIN` (into
-`test-isa`), `test-asm-rv32`, `test-rv32-jit`, `test-smoke-rv32`, and `rv32` added to the
-runtime test arch list.
-
-**RV64 regression gate** (run after WS1 and again at the end):
-`make test-isa test-asm-rv64 test-smoke-rv64 test-link` + `rv64_jit_test`.
-
-## Risks
-
-1. **64-bit-int + soft-double on rv32 (WS6) is the deepest, execution-only risk.** Carry/borrow
- chains and soft-float rounding can't be checked by byte-goldens — only execution catches
- valid-but-wrong codegen. The behavioral oracle (qemu-system, verified) closes this, but
- depends on `qemu-system-riscv32` being present and a correct FPU-enabling startup stub for
- ilp32f (a missing `mstatus.FS` set silently hangs instead of failing cleanly). Mitigate with
- qemu-gated differential tests (kit result vs host double/int64) and loud backend panics on any
- wide/`.d` value reaching the native path.
-2. **ELFCLASS32 (WS5) is the dominant effort** (~130 Elf64-hardcoded sites across emit/read/link).
- The write-then-read self-oracle catches internal inconsistency but not spec divergence; keep
- one clang-oracle `cases/` rv32 ELF test for an independent cross-check. Disambiguating
- EM_RISCV by `EI_CLASS` is a cross-cutting correctness point.
-3. **Sharing risk to RV64 (WS1/WS2)**: repurposing `rv_is_64` semantics, the `RV_FRAME_SAVE_SIZE`
- constant→`2*ptr_bytes`, and the compressed-quadrant/shamt branches all touch the working
- rv64 path. Land WS1/WS2 with rv64-byte-identical output and prove zero diff before enabling
- rv32.
-4. **`-mabi` boundary**: parsed in `driver/`, validated in `src/api/core.c` where feature words
- exist. Every spec-construction site that bypasses `kit_target_new` must default
- `float_abi=DEFAULT` safely; the catch-all `-m` consumer must not pre-eat `-mabi`.
-5. **ilp32 vs ilp32f confusion**: `ilp32` is the *integer* ABI (type widths); the `f` is float
- arg-passing only. The runtime `ABI=ilp32` include set is correct for both; the existing
- `-march=rv32imafd` (D) is wrong and must become `rv32imafc`/`rv32imac`.
-6. **RVC dbg gap**: rv32imafc emits compressed insns pervasively; v1 step-over fallback degrades
- `kit dbg` single-step for rv32. The shim unit test must assert the fallback path is taken.
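Risk 2's point about disambiguating `EM_RISCV` by `EI_CLASS` matters because rv32 and rv64 objects share `e_machine` (243). Only `e_ident[EI_CLASS]` tells a reader which header layout applies, and therefore which `e_flags` offset to read.

Below is a minimal sketch of that check over raw header bytes, assuming little-endian input. The constant values come from the ELF gABI and the RISC-V psABI: `EF_RISCV_RVC = 0x1`, and the float-ABI field sits in bits 2:1 with single-float = `0x2`. `riscv_elf_identify` and `RiscvElfInfo` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { EI_CLASS = 4, EI_DATA = 5, ELFCLASS32 = 1, ELFCLASS64 = 2,
       ELFDATA2LSB = 1, EM_RISCV = 243 };
enum { EF_RISCV_RVC = 0x1, EF_RISCV_FLOAT_ABI_MASK = 0x6,
       EF_RISCV_FLOAT_ABI_SINGLE = 0x2, EF_RISCV_FLOAT_ABI_DOUBLE = 0x4 };

typedef struct { unsigned bits; uint32_t flags; } RiscvElfInfo;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* e_flags sits at offset 36 in Elf32_Ehdr (52 bytes), 48 in Elf64_Ehdr (64). */
static bool riscv_elf_identify(const uint8_t *h, size_t n, RiscvElfInfo *out) {
  if (n < 52 || h[0] != 0x7f || h[1] != 'E' || h[2] != 'L' || h[3] != 'F')
    return false;
  if (h[EI_DATA] != ELFDATA2LSB || rd16(h + 18) != EM_RISCV) return false;
  if (h[EI_CLASS] == ELFCLASS32) { out->bits = 32; out->flags = rd32(h + 36); return true; }
  if (h[EI_CLASS] == ELFCLASS64 && n >= 64) { out->bits = 64; out->flags = rd32(h + 48); return true; }
  return false;
}

static void expect_ilp32f_rvc(const uint8_t *h, size_t n) {
  RiscvElfInfo info;
  bool ok = riscv_elf_identify(h, n, &info);
  assert(ok);
  (void)ok;
  assert(info.bits == 32);
  assert((info.flags & EF_RISCV_FLOAT_ABI_MASK) == EF_RISCV_FLOAT_ABI_SINGLE);
  assert(info.flags & EF_RISCV_RVC);
}
```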
-
-## Critical files
-
-- `src/arch/riscv/` (renamed from `rv64/`): `variant.h` (new), `native.c`, `isa.c/.h`,
- `disasm.c`, `asm.c`, `link.c`, `dbg.c`, `arch.c` (two ArchImpls, `-march`, macros).
-- `src/abi/abi_rv64.c` → shared RISC-V classifier + `rv32_vtable`; `src/abi/registry.c`.
-- `src/cg/arith.c` — wide-int + soft-double legalization (WS6, the riskiest, currently absent).
-- `src/obj/elf/{elf.h,emit.c,read.c,link.c}` + new `reloc_riscv32.c`; `src/obj/registry.c`.
-- `include/kit/core.h` (`KitFloatAbi`, `KitTargetSpec.float_abi`, `KitTargetOptions.abi`),
- `include/kit/config.h` (`KIT_ARCH_RV32_ENABLED`).
-- `driver/lib/target.c`, `driver/cmd/cc.c`, `driver/cmd/run.c` (`-mabi`, `medlow/medany`);
- `src/api/core.c` (resolve/validate).
-- `src/arch/registry.c`, `mk/rt.mk`, `mk/lib_srcs.mk`, `test/test.mk` + new test files.
diff --git a/mk/test.mk b/mk/test.mk
@@ -876,8 +876,11 @@ test-asm-rv32: lib $(ASM_RUNNER)
# `kit cc -target riscv32-none-elf` and runs the freestanding ELF bare-metal
# under qemu-system-riscv32 via test/lib/exec_rv32_bare.sh (the qemu exit code
# is the exit-code oracle). Self-skips per case when the rv32 toolchain
-# (clang riscv32 + qemu-system-riscv32) is absent. Opt-in (not in
-# DEFAULT_TEST_TARGETS): real rv32 codegen gaps are still left RED on purpose.
+# (clang riscv32 + qemu-system-riscv32) is absent. The corpus is green; this
+# lane is opt-in (not in DEFAULT_TEST_TARGETS) because it needs that toolchain,
+# matching the rv64 cross lanes. The only non-passing cases are intentionally
+# unsupported on rv32 (__int128, binary128 long double, LP64-data-model
+# assumptions, aa64-only intrinsics) and carry committed .rv32.skip sidecars.
test-toy-rv32: bin rt-riscv32-elf-hardfloat
@KIT=$(abspath $(BIN)) KIT_TOY_CROSS_ARCHS=rv32 KIT_TEST_PATHS=X \
test/toy/run.sh
@@ -886,7 +889,9 @@ test-toy-rv32: bin rt-riscv32-elf-hardfloat
# only. parse-runner --emit -> kit ld + start crt -> qemu-system-riscv32
# (test/parse/run.sh's rv32 freestanding E path via exec_rv32_bare.sh). Models
# test-parse-rv64-wide; opt-in (needs the rv32 toolchain/qemu), so excluded
-# from DEFAULT_TEST_TARGETS while rv32 reds still exist.
+# from DEFAULT_TEST_TARGETS, matching test-parse-rv64-wide. The corpus is green;
+# the only skips are intentionally-unsupported cases (__int128, binary128 long
+# double, LP64-data-model assumptions) with committed .rv32.skip sidecars.
test-parse-rv32: lib rt-riscv32-elf-hardfloat $(PARSE_RUNNER) $(ROUNDTRIP_BIN) \
$(LINK_EXE_RUNNER)
@KIT_TEST_ARCH=rv32 KIT_TEST_PATHS=E bash test/parse/run.sh