kit

git clone https://git.ryansepassi.com/git/kit.git

commit 0ae44208d15bba91c4d3a9188848cb3871bf02bd
parent be9be587a0b6e6599ed17c6764e1da2d93f042ec
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon,  8 Jun 2026 11:32:25 -0700

rv32: close out riscv32-none-elf; remove completed plan doc

The riscv32-none-elf backend is complete — all cross lanes pass at O0+O1
(test-asm-rv32 16/0, test-toy-rv32 421/0, test-parse-rv32 906/0, smoke
hardfloat green under qemu-system-riscv32). The forward-looking plan in
doc/plan/RV32.md is fully realized, so remove it and its index row (per
convention, completed plans are removed, not checked off), and move the
two durable contracts it tracked into user-facing docs:

- doc/RUNTIME.md: the freestanding TLS thread-pointer contract
  ([TCB(16)|.tdata|.tbss], tp->TCB base, Local-Exec only; linker bias via
  ObjElfArchOps.tls_tp_bias) and the i64-atomics-as-libcall note (8-byte
  _Atomic -> spinlock __atomic_*_8, correct but not lock-free).
- mk/test.mk: refresh the rv32 lane comments (corpus green; opt-in like
  the rv64 cross lanes, kept out of DEFAULT_TEST_TARGETS because they
  need the qemu-system-riscv32 toolchain).

Docs and comments only; no code change.
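
As an illustration of the freestanding TLS contract summarized above
([TCB(16)|.tdata|.tbss], tp->TCB base), here is a minimal hosted sketch of the
block setup and the `tp + 16 + off` address computation. The function and
constant names are hypothetical and not taken from kit's sources. Loading the
returned pointer into the `tp` register is target-specific and not shown.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names; only the layout comes from the contract:
 *   [ TCB (16 bytes) | .tdata (init image) | .tbss (zeroed) ],  tp -> TCB base */
enum { TLS_TCB_BYTES = 16 };

/* Bytes a startup stub must reserve for one thread block. */
static size_t tls_block_size(size_t tdata_size, size_t tbss_size) {
    return TLS_TCB_BYTES + tdata_size + tbss_size;
}

/* Fill a caller-provided block and return the value to load into tp.
 * The TCB bytes themselves are left untouched in this sketch. */
static void *tls_block_init(void *block, const void *tdata_image,
                            size_t tdata_size, size_t tbss_size) {
    unsigned char *p = block;
    memcpy(p + TLS_TCB_BYTES, tdata_image, tdata_size);    /* .tdata copy */
    memset(p + TLS_TCB_BYTES + tdata_size, 0, tbss_size);  /* .tbss zero  */
    return block;                                          /* tp = block  */
}

/* Address of a TLS variable at image offset `off`: tp + 16 + off. */
static void *tls_var_addr(void *tp, size_t off) {
    return (unsigned char *)tp + TLS_TCB_BYTES + off;
}
```

A real startup stub would take the `.tdata` image and both sizes from
linker-provided symbols or the `PT_TLS` program header; those are passed in as
plain arguments here to keep the sketch self-contained.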

Diffstat:
M doc/RUNTIME.md     |  37 ++++++++++++++++++++++++++++++++++++-
M doc/plan/README.md |   1 -
D doc/plan/RV32.md   | 528 -------------------------------------------------------------------------------
M mk/test.mk         |  11 ++++++++---
4 files changed, 44 insertions(+), 533 deletions(-)

diff --git a/doc/RUNTIME.md b/doc/RUNTIME.md
@@ -119,7 +119,12 @@ native instruction.
   (`atomic_common.inc`) provides the lock, hashed by address — no OS dependency.
   Implemented over the GCC-style `__atomic_*` builtin family that kit itself
   documents (`doc/builtins.md`), with upstream's Clang-only `__c11_atomic_*`
-  calls translated. 16-byte cases are keyed off `HAS_INT128`.
+  calls translated. 16-byte cases are keyed off `HAS_INT128`. On 32-bit targets
+  (rv32 `ilp32`/`ilp32f`) the ISA has no 64-bit atomic (`lr.d`/`sc.d`/`amo*.d`
+  are rv64-only), so 8-byte `_Atomic` / `__atomic_*` lower to the `__atomic_*_8`
+  entries here — spinlock-backed, correct but **not** lock-free; the front end's
+  `__atomic_always_lock_free(8, …)` reports false to match. This is the same
+  contract libatomic provides; kit ships no native 64-bit atomic on rv32.
 - **Misc** (`rt/lib/cache/clear_cache.c`): a weak `__clear_cache` (target for
   `__builtin___clear_cache`) plus weak bare-metal cache stubs. ARM and RISC-V
   variants add the AEABI / save-restore assembly described below.
@@ -224,6 +229,36 @@ thread gets an independent resume chain; kit's contract defines
 feature, and bare-metal images with no TLS runtime collapse to single-thread
 semantics.

+## Thread-local storage (freestanding contract)
+
+kit emits the **Local-Exec** TLS model only — there is no dynamic TLS
+(`__tls_get_addr`, GD/LD) and no TLS allocator. `_Thread_local` objects live in
+the executable's `PT_TLS` image and are reached `tp`-relative, with the per-arch
+offset baked in by the linker (`ObjElfArchOps.tls_tp_bias`, applied in
+`src/obj/elf/link.c`'s `tls_tcb_bias`).
+
+The runtime ships no `crt0`, so a freestanding image's own startup establishes
+the thread block and `tp`. The layout kit's codegen + linker assume for RISC-V
+and AArch64 (TLS variant I) is a 16-byte TCB *ahead* of `.tdata`:
+
+    [ TCB (16 bytes) | .tdata (init image) | .tbss (zeroed) ]
+    ^tp
+
+so a TLS variable at image offset `off` is accessed at `tp + 16 + off`. Startup
+must therefore reserve `16 + tdata_size + tbss_size`, copy the `.tdata` init
+image to `block + 16`, zero the `.tbss` span, and set `tp = block`. The
+reference implementation is `test/link/harness/start.c` (the rv32 bare-metal
+stub lives in `test/lib/exec_rv32_bare.sh`); under `ilp32f` that startup must
+also set `mstatus.FS` before any FP op. x86_64 uses TLS variant II instead
+(`tp`/`%fs` points *past* the image, `TPOFF` offsets are negative), so it carries
+no TCB bias.
+
+On a **hosted** RISC-V target the psABI points `tp` at the image start (bias 0,
+matching Linux/FreeBSD `_init_tls`); kit's linker selects that 0 bias for
+non-freestanding RISC-V automatically. With no thread block set up at all,
+`_Thread_local` still resolves against a single static image, so a bare-metal
+program that never sets `tp` collapses to single-thread semantics.
+
 ## Shipped headers (`rt/include/`)

 kit ships its own header set so freestanding compilation needs no system
diff --git a/doc/plan/README.md b/doc/plan/README.md
@@ -24,7 +24,6 @@ shrinks to whatever remains open.
 | [BACKTRACE.md](BACKTRACE.md) | Stack-trace support: GCC-compatible `__builtin_return_address`/`__builtin_frame_address` primitives, a freestanding `__kit_backtrace` capture helper, and symbolized backtrace printing. L1–L3a/L3c shipped; L3b (in-process self-symbolization) deferred. | [../FRONTENDS.md](../FRONTENDS.md), [../RUNTIME.md](../RUNTIME.md), [../DWARF.md](../DWARF.md) |
 | [LTO.md](LTO.md) | Whole-program optimization: `symresolve` extraction, cross-TU inlining, internalization. Phase 0 (whole-TU opt) and Phase 1 (all-sources-up-front LTO) shipped; Phase 2 (serialized `.kit.ir` objects) open. 
| [../OPT.md](../OPT.md) | | [CODEGEN.md](CODEGEN.md) | CG API interface cleanup: PLACE/VALUE centerpiece, op/intrinsic taxonomy, atomic/order/AsmDir unification, multi-result API, i128/f128-as-VALUE. Tracks 1/3/4/5/6/7 landed; Track 2 (binop/cmp split) and Track 1c open. | [../CODEGEN.md](../CODEGEN.md) | -| [RV32.md](RV32.md) | riscv32-none-elf backend: all workstreams (WS0–WS9) complete including 64-bit-value legalization at ilp32f/ilp32. Known gaps (`__int128`, i64 atomics, i64 varargs, TLS) are intentionally left red. | [../ARCH.md](../ARCH.md) | | [DIST_LIBRARY.md](DIST_LIBRARY.md) | Migrating the CAS/package distribution subsystem into libkit as a gated public API (`kit/cas.h`, `kit/package.h`). Main migration shipped; Stage 3 v2 dead-code deletion deferred. | [../DISTRIBUTE.md](../DISTRIBUTE.md) | | [FREEBSD.md](FREEBSD.md) | FreeBSD target support: VM harness, triple parsing, runtime variants, COMDAT/`STB_GNU_UNIQUE` fixes. Static link blocked on archive weak-alias cycle (needs `--start-group` semantics); dynamic link and full VM validation remaining. | — | | [TODO.md](TODO.md) | Open deferred fixes and code smells only. Completed items are removed instead of checked off. Not a roadmap; a current backlog. | — | diff --git a/doc/plan/RV32.md b/doc/plan/RV32.md @@ -1,528 +0,0 @@ -# Plan: RISC-V 32-bit (`riscv32-none-elf`) support - -## Status — 2026-06-03 (branch `rv32`) — core complete; cross-test gaps tracked - -`riscv32-none-elf` (`rv32imafc_zicsr_zifencei`, both `ilp32f` and `ilp32`) is a -working cross target. WS6 — the flagged "hardest part", 64-bit-value legalization -— is **done and behaviorally verified under `qemu-system-riscv32`** at -O0 and -O1 -for both ABIs. The full kit toolchain (`kit cc → kit ld → qemu-system`) builds and -runs a correct bare-metal rv32 image with **no special flags** (freestanding -defaults to no-PIE). 
As of 2026-06-03 the rv32 runtime is **no longer -special-cased**: `kit cc`/`kit ld` auto-build and auto-link `libkit_rt.a` for -`riscv32-none-elf` exactly like every other target — the driver carries two -rv32 runtime variants (`riscv32-elf` soft ilp32, `riscv32-elf-hardfloat` -ilp32f), selected by the float ABI recovered from the objects' ELF e_flags, so -no explicit archive or `-nostdlib` is needed. **RV64 / x64 / aa64 fully -non-regressed**: asm goldens byte-identical, isa (rv64 21 + rv32 31)/0, -abi-classify 367/0, elf 41/0, link 122/0 + x64 79/0, cg-api 544/0, smoke-rv64 3/0, -dwarf/driver/interp green. - -Both corpora now run on qemu-system-riscv32 as a cross arch: **Toy `240 pass / 15 -red`** (`test/toy/run.sh`, path X) and **C `439 pass / 36 red`** (`test/parse/run.sh`, -path E). **The reds are deliberately left red** (no skip sidecars) — they are the -real remaining rv32 gaps, enumerated in the checklist below. - -### Done & verified ✅ -- [x] **WS0–WS5, WS7** — variant scaffold, XLEN-parameterized backend, `arch_impl_rv32` - + `-march`/`-mabi`/macros, shared ABI classifier + `rv32_vtable`, ELFCLASS32 - emit/read/link + `reloc_riscv32.c`, `mk/rt.mk` variants. (See git history.) -- [x] **WS6 — 64-bit-value pair-legalization (THE blocker) — DONE.** rv32 8-byte scalars - (`long long`/i64 AND soft `double`) are **memory-resident** (`api_is_wide8_scalar_type` - forces `CG_LOCAL_MEMORY_REQUIRED`; `cg_ir_lower`/`pass_native_emit` size>word checks made - `> ptr_size`), mirroring the proven i128/wide16 model. The allocator binds one register per - value, so memory residence + the multi-part ABI path (`ABIArgPart.src_offset`, - `rv_load_part`/`rv_store_part`) is the only correct representation. (`src/cg/arith.c`, - `src/cg/wide.c`): - - add/sub/and/or/xor/neg/bnot — **inline 2-word lane ops** (carry/borrow via `sltu`); no - compiler-rt 64-bit add helper exists, so these *must* be inline. 
- - i64 compares — inline lane eq/lt (signed-hi/unsigned-lo); `if(i64)` = `(lo|hi)!=0`. - - i64 mul/div/rem/shift → `__*di3`; soft `double` → `__*df*`; i64↔float → `__floatdisf`/ - `__fixsfdi`/…; **soft single** f32 under `ilp32` → `__*sf*`; i64 clz/ctz/popcount/bswap → - `__*di2`; 64-bit consts → two lanes. - - `nd_*` guards (`native_direct_target.c`) **panic** on any 8-byte value reaching a - single-register binop/unop/cmp/convert/load_imm/load_const — loud, never truncation. -- [x] **Runtime (`make rt`) — DONE.** Both `riscv32-elf` (ilp32) and `riscv32-elf-hardfloat` - (ilp32f) build with kit's own cc. Fixed `mk/rt.mk`: `RT_CFLAGS`/`RT_ASFLAGS` now include - `RT_<v>_ARCH_FLAGS` (the `-mabi`/`-march` were silently dropped — every variant built ilp32f). -- [x] **ELF e_flags float-ABI** — `emit.c`/`link.c` derive the RISC-V float-ABI bits from - `target.float_abi` (the static descriptor hardcoded SINGLE, mislabelling `ilp32` soft); - rv64/x64/aa64 byte-identical. -- [x] **Freestanding policy (host-irrelevant, target-derived):** - - kit stamps **`EI_OSABI=ELFOSABI_STANDALONE`** on `*-none-elf` objects (`emit.c`) so they - round-trip as `KIT_OS_FREESTANDING` instead of decoding back to Linux (the "none → Linux" - bug). `kit ld` derives the PIC default from the *target* via `driver_default_pic` (hosted - → PIE, freestanding → no-PIE) and scans all inputs for a freestanding object — the host's - default never leaks onto a cross target. So `kit ld` for rv32 needs **no `-no-pie`**. - - `kit ld`/`kit cc` auto-link a runtime for any target that has a variant - (`driver_runtime_has_variant`) — **now including `riscv32-none-elf`**. The driver - (`driver/lib/runtime.c`) carries two rv32 runtime variants distinguished by a new - `float_abi` axis on `RuntimeVariant` (`riscv32-elf` soft `ilp32`/`rv32imac`, - `riscv32-elf-hardfloat` `ilp32f`/`rv32imafc`); each is built on demand with its own - `-march`/`-mabi` via `topts.isa`/`topts.abi`. 
The float ABI is recovered from the RISC-V - ELF `e_flags` in `src/api/object_detect.c` and reconciled across all link inputs in - `driver/cmd/ld.c` (a foreign startup stub that lacks the flag never mis-selects the soft - runtime). So a freestanding rv32 link needs **no explicit `libkit_rt.a` and no - `-nostdlib`**. New `-Ttext ADDR` and `-nostdlib`/`--no-default-libs` flags remain - available for images that supply their own runtime. - - `.eh_frame` suppressed for `KIT_OS_FREESTANDING` (`src/arch/mc.c`); hosted byte-identical. - - `layout_dyn` emits a clean diagnostic for an ELF32 dynamic/PIE link (was an ELF64 SEGV). - - jump-table / label-address slots are width-aware (`R_ABS32` on rv32, `R_ABS64` on 64-bit) - in `nd_local_static_data_label_addr` — fixes switch jump tables on rv32. -- [x] **WS9 tests + CI wiring:** `test/arch/rv32_decode_test.c` (→ `test-isa`, 31 checks), - `test/link/rv32_jit_test.c` (→ `test-rv32-jit`, exit-77 host gate), - `test/elf/unit/rv32_class32.c` (ELFCLASS32 round-trip, → `test-elf`), - `test/smoke/rv32.sh` (→ `test-smoke-rv32`): 7 lanes — ilp32f + ilp32 × {-O0,-O1} covering i64 - + soft-double + soft-single, two `kit ld` end-to-end lanes that **auto-link the runtime** - (no explicit `libkit_rt.a`), a negative control. Wired in - `mk/test.mk`/`mk/test_unit.mk` (`test-rv32-jit`, `test-smoke-rv32`). -- [x] **Toy + C cross lanes (rv32 as an arch).** Shared bare-metal runner - `test/lib/exec_rv32_bare.sh` (clang startup → `kit cc`/parse-runner → `kit ld` → qemu-system, - SiFive-finisher exit oracle; entry symbol configurable — `main` for Toy, `test_main` for C). - Toy: `test/toy/run.sh` `cross_one_rv32` (rv32 in default `TOY_CROSS_ARCHS`, path X) — **240/15**. - C: `test/parse/run.sh` `kit_lane_E` rv32 branch + `kit_test_target.h` rv32 arm (path E, - `KIT_TEST_ARCH=rv32`) — **439/36**. Both opt-in; reds left red. - -### Remaining ⚠️ — clear checklist - -**A. 
rv32 codegen gaps surfaced by the cross lanes (the reds — left red on purpose, no skips).** -Toy `240/15`, C `439/36`; the 51 reds cluster into: -- [ ] **`__int128`** (C: `i128_02`…`i128_13+`, ~15 cases — the largest C bucket). rv32 has no - `__int128` (runtime `INT128=0`; the 16-byte-scalar path is dead on rv32). Decide: reject - `__int128` on rv32 at the front end with a clear diagnostic (cleanest), or legalize it (a - 4-word version of the wide8 work — large). Until then these are compile-fail/wrong-result. -- [ ] **i64 atomics** (`@atomic_*<i64>` / `__atomic_*_8`; Toy 17/22/59/73/74/75/77, C - `builtin_*_atomic_long`). rv32 `A` has no 64-bit AMO/`lr.d`/`sc.d`; needs `__atomic_*_8` - libcalls (libatomic / a lock), absent freestanding. Provide 8-byte `__atomic_*` in `rt/`, or - document as a hard rv32 limitation. -- [ ] **64-bit `*_overflow` intrinsics** (Toy 58_overflow_record, C `builtin_26_sadd_overflow`). - Legalize i64 sadd/uadd/ssub/usub/smul/umul-overflow on rv32 (the 64-bit operand reaches the - backend un-split today → trap), à la the clz/ctz wide8 routing in `arith.c`. 32-bit works. -- [ ] **i64 varargs** (Toy 133_varargs_mixed_types — wrong result, not a hang). Audit the rv32 - `va_arg` path for an 8-byte value (even-pair fetch from the vararg save area). -- [ ] **thread-local storage** (Toy 141, C `6_7_1_03_thread_local_basic`, `gnu_thread_storage_01`). - TLS needs a thread pointer the bare-metal image never sets up — likely a genuine freestanding - limitation (the Linux lanes get it from the OS); document, or provide a static-TLS model. -- [ ] **toy soft-float compare lowering** (Toy 153_fp_cmp_negation_b — `kit cc` "addr operand is - not an lvalue", rv32-only, not reproducible in C). An eager soft-fp compare feeding an - empty-then/else block hits an lvalue path the rv64 delayed-`SV_CMP` form avoids. Narrow. -- [ ] **123_spec_demo** (Toy, hangs) — triage which of the above it exercises. 
-- Test-environment mismatches (NOT rv32 codegen bugs; an `.rv32.skip` sidecar exists for them but - none is committed): Toy 145_baremetal_privileged_aa64 (aa64 intrinsics), 20_cg_api_inline_asm_full - + C `asm_01_grammar` (inline-asm constraints/grammar), 47_target_arch_switch (selects its expected - exit code by target arch). - -**B. Pre-existing follow-ups (orthogonal to the cross tests).** -- [ ] Optional `make` targets `test-toy-rv32` / `test-parse-rv32` (opt-in; not in - `DEFAULT_TEST_TARGETS` while reds exist). -- [ ] **`test/asm/` rv32 byte-golden lane + `regen-rv32.sh`** (rv32 arm in `test/asm/run.sh` / - `kit_unit.h` + committed clang/llvm-objdump goldens; `kit_test_target.h` already has rv32). -- [ ] **CSR pseudo-ops in the assembler** (`csrs`/`csrw`/`csrr`/… + CSR names) — a general - RISC-V-assembler feature (missing on rv64 too; new `RV64_FMT_CSR_{R,W,WI}` + CSR-name table + - disasm print cases). Until then the smoke/cross startup stub is clang-assembled. - -**Out of scope (decided):** `kit ld` ELF32 dynamic/PIE — rv32 is static-only; `layout_dyn` -clean-panics on an ELF32 dynamic/PIE link and that is the intended behavior. - -### Where to look -- WS6 legalization: `src/cg/wide.c`, `src/cg/arith.c` (binop/unop/cmp/convert + soft-fp + clz/ctz), - `src/cg/{value,local,memory,call,control}.c`, `src/opt/{cg_ir_lower.c,pass_native_emit.c}`, - `src/cg/native_direct_target.c` (`nd_*` panics + `nd_local_static_data_label_addr`). -- Backend: `src/arch/riscv/{variant.{h,c},native.c,isa.{c,h},disasm.c,asm.c,link.c,dbg.c,arch.c}`. -- ABI: `src/abi/abi_rv64.c` + `src/abi/registry.c`. 
-- ELF / kit ld / freestanding policy: `src/obj/elf/{elf.h,emit.c,read.c,link.c,link_dyn.c}` + - `reloc_riscv32.c`; `driver/cmd/ld.c` (`-Ttext`/`-nostdlib`/PIC-from-target), `driver/lib/target.c` - (`driver_default_pic`), `driver/lib/runtime.{c,h}` (`driver_runtime_has_variant`, the two rv32 - `RuntimeVariant` entries + `float_abi`/`isa`/`abi` axis, `rt_build_archive`), - `src/api/object_detect.c` (EI_OSABI → os; RISC-V `e_flags` → `float_abi`), - `src/link/{link.c,link_layout.c}`, `src/api/link.c`. -- Runtime/intrinsics: `mk/rt.mk` (ARCH_FLAGS), `src/cg/type.c` (rv32 ≡ rv64 for intrinsics). -- Tests: `test/smoke/rv32.sh`, `test/lib/{check_rv32_env.sh,exec_rv32_bare.sh,kit_test_target.h}`, - `test/toy/run.sh` (`cross_one_rv32`), `test/parse/run.sh` (`kit_lane_E` rv32 branch), - `test/arch/rv32_decode_test.c`, `test/link/rv32_jit_test.c`, `test/elf/unit/rv32_class32.c`, - `mk/test.mk`, `mk/test_unit.mk`. - ---- - -## Context - -`kit` today targets `riscv64` (LP64D) via a single backend in `src/arch/rv64/`. We want a -new cross target: - -``` ---target=riscv32-none-elf --march=rv32imafc_zicsr_zifencei --mabi=ilp32f (and also -mabi=ilp32, soft-float) --mcmodel=medlow -``` - -This is a freestanding 32-bit RISC-V toolchain target: F (single-precision hardware float) -but **no D**, so `double` and `long long` are not native and must be lowered. The enum -`KIT_ARCH_RV32`, the `riscv32` triple parse (`driver/lib/target.c:275`, `ptr_size=4`), ELF -auto-detection (`src/api/object_detect.c`), and the runtime source files -(`rt/lib/riscv/rv32.S`, `rt/lib/coro/riscv32.c`) already exist but are unwired/incomplete. - -The intended outcome: `kit cc/as/ld/objdump/disas` produce and consume correct -`riscv32-none-elf` ELFCLASS32 objects and static executables for both `ilp32f` and `ilp32`, -with `libkit_rt.a` builtins available and the JIT `run`/`dbg` plumbing wired (native -execution host-gated, as for rv64). 
- -## Confirmed scope decisions - -- **Shared backend**: refactor `src/arch/rv64/` into **one XLEN-parameterized RISC-V backend** - serving both rv32 and rv64 from a single tree. RV64 must not regress and is re-validated. -- **Subsystems in scope**: compile + assemble + link + disasm; runtime lib; JIT `run`/`dbg`. - **Emulator is out of scope** (`src/emu`, `src/os`, `src/obj/elf/emu_load.c` stay rv64-only). -- **ABIs**: `ilp32f` (single hard-float: `float` in `fa0-fa7`, `double`/`i64` via integer - regs + soft-float) **and** `ilp32` (pure soft-float). `double` is always soft-float. -- **Code model**: accept and validate `-mcmodel=medlow`/`medany`, but keep the existing - PC-relative (`auipc` + `R_RV_PCREL_HI20/LO12`, GOT for externs) addressing for v1. No new - absolute-addressing path. - -## XLEN-parameterization mechanism - -Add a `const RiscvVariant*` descriptor (immutable, two static instances selected by -`KitArchKind`) carried on the per-function codegen context and threaded into the otherwise -stateless decode/asm/disasm/link/dbg paths. This honors "no global state — everything hangs -off a context struct" (the variant is a const table reached through a context, never ambient). - -New `src/arch/riscv/variant.h`: - -```c -typedef struct RiscvVariant { - KitArchKind kind; /* KIT_ARCH_RV32 / KIT_ARCH_RV64 */ - const char* name; /* "rv32" / "rv64" */ - const char* isa_prefix; /* "rv32" / "rv64" — for -march parsing */ - u8 xlen; /* 32 / 64 */ - u8 ptr_bytes; /* 4 / 8 — pointer & native register width */ - u8 gp_slot_bytes; /* 4 / 8 — varargs save & callee-save stride */ - u8 has_w_forms; /* 0 rv32 / 1 rv64 — ADDW/ADDIW/SLLIW/... 
*/ - u8 shamt_bits; /* 5 rv32 / 6 rv64 — SLLI/SRLI/SRAI immediate */ - u32 frame_save_size; /* 2 * ptr_bytes (8 rv32 / 16 rv64) */ -} RiscvVariant; -const RiscvVariant* riscv_variant_for_kind(KitArchKind); -``` - -Reached via: `RvNativeTarget.variant` (codegen), `riscv_variant_for_kind(c->target.arch)` in -the decoder/assembler/disassembler/dbg (they already hold a `Compiler*`), and two -`LinkArchDesc` literals for the linker. Distinguish **three different "8"s** carefully — -`ptr_bytes` (pointer/reg width), `gp_slot_bytes` (ABI save stride), and `frame_save_size` -(saved ra+s0 pair) — conflating them passes rv64 (all 8) and breaks rv32. - -The **float ABI** (soft vs single-hard) is a separate axis from XLEN, carried on -`KitTargetSpec.float_abi` (see WS4), consumed by the ABI classifier and predefined macros. - -## Workstreams (ordered; each leaves a green targeted check) - -### WS0 — Config + variant scaffold (no behavior change) -- `include/kit/config.h`: add `#define KIT_ARCH_RV32_ENABLED 1` (`mk/config.mk` auto-parses - it into a make var — no `config.mk` edit needed). -- Add `src/arch/riscv/variant.h` with the struct + two `const` instances + lookup. -- **Gate**: `make lib` compiles. - -### WS1 — Directory rename + thread variant through codegen (rv64 still identical) -- `git mv src/arch/rv64 src/arch/riscv`; fix include guards/paths. Update `mk/lib_srcs.mk:55,189` - (`LIB_SRCS_ARCH_RV64` → `LIB_SRCS_ARCH_RISCV`, gated by `RV32 || RV64`). The only external - referent is the symbol `arch_impl_rv64` in `src/arch/registry.c` (path-independent). -- Keep file names and internal `rv64_`/`rv_` symbol prefixes for v1 (cosmetic rename is a - separate follow-up; renaming 2000+ sites is pure regression risk). -- `src/arch/riscv/native.c`: add `const RiscvVariant* variant` to `RvNativeTarget` (set from - `c->target.arch` in the one constructor); replace hardcoded 8/16/`RV_FRAME_SAVE_SIZE`/ - `addiw`/`ld`/`sd`/float-fmt sites with variant reads. 
**With the rv64 variant the emitted - bytes are byte-for-byte identical** — this isolates the "sharing" regression from rv32 - correctness. Key sites (file `src/arch/riscv/native.c`): `rv_emit_li32` (LUI+ADDIW→ADDI when - `!has_w_forms`), `enc_int_load/store` (sw/lw vs sd/ld by `ptr_bytes`), `RV_FRAME_SAVE_SIZE`, - varargs save area, callee-save stride, `rv_type_size`/`align` defaults, `rv_convert` sext/zext - (`xlen - src_bits` shift; `addiw` fast-path only when `has_w_forms`). -- **Gate**: `make test-smoke-rv64`, `test/arch/rv64_decode_test.c`, `test/asm/regen-rv64.sh`, - `test/link/rv64_jit_test.c` all byte-identical green. - -### WS2 — ISA / asm / disasm / link / dbg XLEN parameterization (still rv64-only at runtime) -- `src/arch/riscv/isa.c`/`isa.h`: add a one-byte **availability mask** column to `Rv64InsnDesc` - (`RV_AV_RV32 | RV_AV_RV64`) rather than a second table. Mark RV64-only: W-forms - (`addw/subw/sllw/srlw/sraw`, `addiw/slliw/srliw/sraiw`, `mulw/divw/divuw/remw/remuw`), - 64-bit mem (`ld/sd/lwu`), 64-bit FP int conv (`fcvt.*.l/lu`, `fmv.x.d/d.x`), compressed - `c.addiw/c.addw`, and the RV64 meaning of `c.ld/c.sd/c.ldsp/c.sdsp/c.fld/c.fsd/...`. Enable - RV32-only: `c.jal` (shares the encoding that is `c.addiw` on rv64), `c.lw/c.sw`, `c.flw/c.fsw`. -- `src/arch/riscv/disasm.c` + the compressed decoder `rv64_disasm_find_c`: pass the variant in; - branch the ambiguous compressed quadrant encodings and the **5-bit vs 6-bit shamt** decode - (`& 0x1f` on rv32, reject bit 25 set). `rv64_disasm_find`/`rv64_asm_find` skip rows by mask. -- `src/arch/riscv/link.c`: split `link_arch_rv64` and a new `link_arch_rv32`; PLT/IPLT stubs use - `rv_lw` instead of `rv_ld` (re-check stub sizes/offsets for 4-byte slots). -- `src/arch/riscv/dbg.c`: parameterize the displaced-step shim by `ptr_bytes`; set - `min_insn_len=2, max_insn_len=4` for rv32 (C ext on); RVC control-flow falls back to - step-over (`KIT_UNSUPPORTED`), 4-byte fixups reuse the rv64 builder. 
-- **Gate**: `make test-isa`, `regen-rv64.sh`, `rv64_jit_test` still green. - -### WS3 — rv32 ArchImpl + registry + `-march` + predefined macros -- `src/arch/riscv/arch.c`: define **both** `arch_impl_rv32` and `arch_impl_rv64` (share - `cgtarget_new/asm_new/disasm_new/decode/dwarf/dbg/asm_ops`/register file; differ in `.kind`, - `.name`, `.link`, `.predefined_macros`, `.target_feature_*`, and `cfi_data_align_factor` - -4 vs -8). -- Generalize `rv64_target_feature_apply_isa` (currently hard-requires the `"rv64"` prefix, - `arch.c:204`) to compare against `variant->isa_prefix`. rv32 default profile = - `rv32imafc_zicsr_zifencei` (I/M/A/F/C/Zicsr/Zifencei, **D cleared**). -- Predefined macros for rv32 (float-abi-dependent, see WS4): `__riscv_xlen=32`, - `__ILP32__`/`_ILP32` (drop `__LP64__`/`_LP64`), `__riscv_float_abi_single` (ilp32f) **or** - `__riscv_float_abi_soft` (ilp32) instead of `_double`, `__riscv_flen=32` when F present. -- `src/arch/registry.c`: register `arch_impl_rv32` under `#if KIT_ARCH_RV32_ENABLED` - (`:24,50,57`); `arch_kind_name` already returns "riscv32". -- **Gate**: `kit cc -target riscv32-none-elf -march=rv32imafc_zicsr_zifencei -E -dM` shows the - right macros; `kit mc`/`disas -target riscv32-none-elf` round-trips a hand-written rv32 insn. - -### WS4 — ABI vtable refactor + `-mabi` plumbing -- **New spec field**: in `include/kit/core.h` add `enum KitFloatAbi {DEFAULT, SOFT, SINGLE, - DOUBLE}` and `uint8_t float_abi` on `KitTargetSpec`; add `KitSlice abi` to `KitTargetOptions`. -- **Driver `-mabi`**: in `driver/lib/target.c`, intercept `-mabi=`/`-mabi` in - `driver_target_features_try_consume` **before** the catch-all `-m<x>` fallback (which would - otherwise mis-eat it), mirroring `-march` at `:154-165`; carry through `driver_target_options`. - Add `medlow`→`KIT_CM_SMALL`, `medany`→`KIT_CM_MEDIUM` aliases in `cc_record_mcmodel` - (`driver/cmd/cc.c:751`) and `run_record_mcmodel` (`driver/cmd/run.c:379`). 
-- **Resolve + validate** in `kit_target_new` (`src/api/core.c`), after `-march` features are - known: parse `ilp32|ilp32f|ilp32d|lp64|lp64f|lp64d`; if omitted, derive from `-march` - (D→DOUBLE, F-no-D→SINGLE, else SOFT); **reject** `*f` without F and `*d` without D. So - `rv32imafc` defaults to `ilp32f`, and `ilp32d` is rejected. -- **Shared ABI classifier**: generalize `src/abi/abi_rv64.c` into a RISC-V classifier - parameterized by a descriptor `{xlen_bytes (=ptr_size), gpr_bytes, aggregate_gpr_bytes=2*gpr, - flen (0/4/8), float_abi}` read from `a->c->target`. Replace the `RV64_ABI_*_BYTES=8/16` enum. - - FP-eligibility predicate `fp_eligible(desc, size)`: SOFT never; SINGLE iff `size==4` - (float; `double` 8>flen4 → INT pair); DOUBLE iff `size<=8` (preserves rv64 LP64D). - - `classify_scalar`: i8/16/32/ptr → 1 INT part; `i64`/soft-`double` → **2 INT parts of 4 in - an even-aligned GPR pair**; `float` (ilp32f) → 1 FP part (fa0-fa7). Replace the hardcoded - `size==16 → 2×8` with `nparts = size/gpr_bytes`. - - `classify_aggregate`: register threshold `2*gpr_bytes` (8 on rv32), chunk by `gpr_bytes`; - HFA refinement gated by `fp_eligible`. - - va_list: `ABI_VA_LIST_POINTER`, `gp_reg_count=8`, `gp_slot_size=4`, `fp_reg_count=0` - (**FP varargs always go via INT regs even under ilp32f**). Two thin static vtables - (`rv32_vtable`, `rv64_vtable`) sharing the classifier, differing only in the va_list literal. -- `src/abi/registry.c`: add `KIT_ABI_RV32_ENABLED` and an `{KIT_ARCH_RV32, KIT_OBJ_ELF, - &rv32_vtable}` entry (one entry serves both ilp32/ilp32f; the float axis is read from the spec). -- **Gate**: ABI classification golden tests (`test/api/abi_classify_test.c` style) for rv32 - ilp32f and ilp32. - -### WS5 — ELFCLASS32 object emission + reading (largest item) -Introduce one `is32`/`ElfEnc` flag (from `c->target.ptr_size`) threaded through, **not** -copy-paste duplication. 
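
A minimal sketch of what one class-parameterized relocation writer could look
like, as opposed to a duplicated ELF32 copy. The helper names (`wr_u32`,
`wr_u64`, `write_rela`) are hypothetical and not kit's; the record shapes are
the standard `Elf32_Rela` (12 bytes) and `Elf64_Rela` (24 bytes) layouts with
their standard `r_info` packings, written little-endian as RISC-V ELF objects
are.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian stores; hypothetical stand-ins for an emitter's byte writers. */
static void wr_u32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
}
static void wr_u64(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
}

/* One writer, two record shapes, selected by `is32`.
 * Returns the number of bytes written: 12 for ELF32, 24 for ELF64. */
static size_t write_rela(uint8_t *out, int is32, uint64_t offset,
                         uint32_t sym, uint32_t type, int64_t addend) {
    if (is32) {
        wr_u32(out + 0, (uint32_t)offset);
        wr_u32(out + 4, (sym << 8) | (type & 0xff));      /* ELF32_R_INFO */
        wr_u32(out + 8, (uint32_t)addend);                /* low 32 bits  */
        return 12;
    }
    wr_u64(out + 0, offset);
    wr_u64(out + 8, ((uint64_t)sym << 32) | type);        /* ELF64_R_INFO */
    wr_u64(out + 16, (uint64_t)addend);
    return 24;
}
```

Keeping the branch inside the record writer means callers stay identical for
both classes, which is the point of threading a single flag through.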
-- `src/obj/elf/elf.h`: add `ELFCLASS32`, `ELF32_{EHDR,PHDR,SHDR}_SIZE` (52/32/40), - `ELF32_SYM_SIZE`(16)/`ELF32_RELA_SIZE`(12), `ELF32_R_INFO(s,t)=((s)<<8)|((t)&0xff)`, - `ELF32_R_SYM/TYPE`. -- `src/obj/elf/emit.c`: replace the `ptr_size != 8` panic (`:271`); branch sym record (16B, - different field widths) and rela record (12B, `ELF32_R_INFO`) writers; `EI_CLASS` - (`:664`); Ehdr/Shdr address fields via `elf_wr_u32` and ELF32 sizes; e_flags from - `float_abi` (`EF_RISCV_FLOAT_ABI_SINGLE`/`_SOFT` | `EF_RISCV_RVC`). -- `src/obj/elf/read.c`: accept `ELFCLASS32` (`:446,814`); add `parse_shdr32`/`parse_sym32`/ - rela32 with the correct offsets/strides and `ELF32_R_SYM/TYPE`. Scope v1 to ET_REL + - ET_EXEC reads; give ELF32 ET_DYN a clear "unsupported" rather than mis-parse. -- `src/obj/elf/link.c`: ELF32 ET_EXEC writer (parallel parameterization to emit.c) — needed - for `ld`/`run`/`dbg`. `link_dyn.c` and `emu_load.c` stay rv64/ELF64-only: gate rv32 to - static linking (freestanding `-none-elf` defaults to `KIT_PIC_NONE`), panic-with-diagnostic - for rv32 dynamic. -- New `src/obj/elf/reloc_riscv32.c`: clone `reloc_riscv64.c`; map `R_ABS32`→`ELF_R_RISCV_32`, - and `R_ABS64`/`R_RV_ADD64`/`R_RV_SUB64`→unsupported; reuse all XLEN-neutral kinds. -- `src/obj/registry.c`: add the rv32 `obj_elf_arch_ops` entry. **EM_RISCV is shared by rv32 - and rv64** — disambiguate reloc-table selection by `EI_CLASS`, not e_machine alone. -- **Gate**: new `test/elf/unit/rv32_class32.c` write-then-read round-trip; `kit objdump`/`nm` - on a hand-built rv32 `.o`. - -### WS6 — 64-bit-int + soft-float-double legalization (hardest part) -The cg layer (`src/cg/arith.c`) only routes wide ops to libcalls for the `__int128` builtin -(`api_i128_stack_top`), **never by width** — so `long long` on rv32 currently reaches the -backend as a raw 8-byte value, and `double` arithmetic would emit illegal `.d` ops. 
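
To make the inline half of this legalization concrete, here is a hedged C model
of 64-bit add, subtract, signed compare, and truth test built from two 32-bit
lanes, with the carry and borrow computed by an unsigned compare (what `sltu`
does on rv32). The `Pair64` type and `pair_*` names are hypothetical; this
models the arithmetic only, not kit's code generator.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit lanes (hypothetical model type). */
typedef struct { uint32_t lo, hi; } Pair64;

static Pair64 pair_from_u64(uint64_t v) {
    Pair64 p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}
static uint64_t pair_to_u64(Pair64 p) {
    return ((uint64_t)p.hi << 32) | p.lo;
}

static Pair64 pair_add(Pair64 a, Pair64 b) {
    Pair64 r;
    r.lo = a.lo + b.lo;
    uint32_t carry = r.lo < a.lo;     /* unsigned compare, as sltu would give */
    r.hi = a.hi + b.hi + carry;
    return r;
}

static Pair64 pair_sub(Pair64 a, Pair64 b) {
    Pair64 r;
    uint32_t borrow = a.lo < b.lo;    /* unsigned compare, as sltu would give */
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - borrow;
    return r;
}

/* Signed less-than: signed compare on the high lanes, unsigned on the low.
 * The uint32_t -> int32_t cast assumes two's complement conversion. */
static int pair_slt(Pair64 a, Pair64 b) {
    if (a.hi != b.hi) return (int32_t)a.hi < (int32_t)b.hi;
    return a.lo < b.lo;
}

/* Truth test for a 64-bit condition: (lo | hi) != 0. */
static int pair_nonzero(Pair64 a) {
    return (a.lo | a.hi) != 0;
}
```

Multiply, divide, remainder, and shifts are deliberately absent: the plan
routes those to runtime libcalls instead of expanding them inline.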
-- **64-bit integers on rv32**: generalize the i128 libcall mechanism in `src/cg/arith.c` to a - "wider than target word" predicate (`type_size > c->target.ptr_size`). Recommended v1: - route `mul/div/udiv/mod/shifts` to runtime libcalls (`__muldi3`, `__divdi3`, `__udivdi3`, - `__moddi3`, `__ashldi3`, `__lshrdi3`, `__ashrdi3`); do `add/sub/and/or/xor/load/store/move` - inline as register pairs in the backend (these are unavoidable for memory/arg traffic). - Add a **loud panic** in `rv_binop`/`rv_convert` if a wide value reaches the native-width - path, so any missed case fails fast. -- **Soft-float `double` on ilp32f/ilp32**: route `double` arithmetic and `double`↔int/float - conversions to libcalls (`__adddf3`, `__subdf3`, `__muldf3`, `__divdf3`, `__extendsfdf2`, - `__truncdfsf2`, `__fixdfsi`, `__floatsidf`, df compares) — mirror the existing f128 path so - the backend only ever sees `float` (S) FP ops. Backend panics on any `RV_FMT_D` selection - when `xlen==32`. -- Confirm `long double == double` (8B) and `__int128` absent on rv32 (runtime sets - `INT128=0`, no `LDBL128`), so the 16-byte scalar classify path is effectively dead there. -- **Gate**: red-green targeted tests — `long long` add/mul/div and `double` add/mul/convert - compile to plausible sequences (verified via decode/disasm; behavior via qemu if available). - -### WS7 — Runtime build wiring (`mk/rt.mk`) -- The `riscv32-elf` / `riscv32-elf-save-restore` variants exist but are **wrong**: - `-mabi=ilp32 -march=rv32imafd` (D present). Fix to the confirmed profile and add the - hard-float variant: - - `riscv32-elf` (ilp32, soft): `-mabi=ilp32 -march=rv32imac`. - - `riscv32-elf-hardfloat` (ilp32f): `-mabi=ilp32f -march=rv32imafc`. - - Both keep `ABI=ilp32` (the *integer* layout → `rt/lib/include/ilp32_le`; `f` only affects - FP arg passing), `INT128=0`, `CORO=riscv32`. 
-- Mandatory builtins are already selected: `RT_ABI_SRCS_ilp32 = rt/lib/int32/int32.c` (64-bit - int helpers) and `rt/lib/fp/fp.c` (soft `double`). Verify the df soft-float ops compile for - the rv32 target. -- `mk/lib_srcs.mk`: widen the ABI/reloc source guards to include `KIT_ARCH_RV32_ENABLED`; add - `reloc_riscv32.c` to the ELF source group. -- **Gate**: `kit cc -target riscv32-none-elf -c rt/lib/.../smoke` builds; `make rt` produces - the rv32 runtime variants. - -### WS8 — JIT `run` / `dbg` -`kit run`/`dbg` execute JIT bytes **natively in-process** (`run.c` `entry_fn(...)`); there is -no cross-arch execution path (emulator is out of scope). So on a non-rv32 host, rv32 code -cannot be executed — same situation as rv64's existing JIT test, which builds the image and -**skips the call** (exit 77). -- `src/link/link_jit.c`: audit only — it is XLEN-neutral and patches via shared `R_RV_*` - reloc kinds; the only u64/TLV slots are Mach-O-guarded (ELF never reaches them). No change - expected, provided WS2/WS6 emit the same reloc kinds. -- `rv32_dbg_ops` from WS2 (RVC-aware lengths, step-over fallback). -- **v1 deliverable**: JIT image build + relocation + symbol lookup wired and unit-tested - without execution; native execution host-gated to rv32 hosts. - -### WS9 — Tests & verification (see Verification below) - -## Parallel workstream map - -Much of this is separable. Lock a small set of **shared interfaces first** (Phase A), then five -tracks proceed in parallel (Phase B), converging at integration (Phase C). The critical path is -Phase A → Track 1 (the backend chain WS1→WS2→WS3); ELF32 (Track 2) is the largest *effort* but is -parallel, so starting it immediately keeps it off the wall-clock. - -**Phase A — shared contracts (serial, small, land first; unblocks everyone):** -- **WS0** `RiscvVariant` + `riscv_variant_for_kind` + `KIT_ARCH_RV32_ENABLED`. 
-- **WS4a** the float-ABI interface only: `KitFloatAbi` enum, `KitTargetSpec.float_abi`,
-  `KitTargetOptions.abi`, and the `-mabi`/`-mcmodel` parse → resolve → validate plumbing
-  (`driver/lib/target.c`, `driver/cmd/cc.c`, `src/api/core.c`). No classifier change yet.
-
-The five contracts everyone codes against (freeze these in Phase A):
-1. **`RiscvVariant`** fields (XLEN/ptr_bytes/gp_slot_bytes/has_w_forms/shamt_bits/frame_save_size)
-   — consumed by Track 1.
-2. **`float_abi`** on the spec — consumed by Track 2 (e_flags), Track 3 (FP-eligibility),
-   Track 5 (soft-double), and WS3 (predefined macros).
-3. **Reloc-kind list**: the exact `R_RV_*` kinds rv32 codegen emits = the set rv32 ELF maps and
-   `link_jit` expects (= existing rv64 set minus `R_*64`/`ADD64`/`SUB64`). Track 1 ↔ Track 2.
-4. **Runtime libcall names** (`__adddf3`, `__muldf3`, `__fixdfsi`, `__floatsidf`, `__extendsfdf2`,
-   `__truncdfsf2`, `__muldi3`, `__divdi3`, `__udivdi3`, `__moddi3`, `__ashldi3`, `__lshrdi3`,
-   `__ashrdi3`) emitted by WS6 = provided by WS7. Track 5 ↔ Track 4.
-5. **ABI part-layout**: i64/soft-`double` → even-aligned GPR pair; `gp_slot_size=4`; callee-save
-   stride. Track 3 publishes it via the vtable; Track 1's native-frame code consumes it.
-
-**Phase B — parallel tracks (each independently testable):**
-- **Track 1 — Backend (critical path, serial within):** WS1 (rename + thread variant, rv64
-  byte-identical) → WS2 (ISA/asm/disasm/link/dbg XLEN param) → WS3 (rv32 ArchImpl + `-march` +
-  macros). Gate per step against rv64 regression, then rv32 mc/disas round-trip.
-- **Track 2 — ELF32 (WS5):** fully independent of codegen — develop and test the ELFCLASS32
-  writer/reader via a hand-built `ObjBuilder` for `KIT_ARCH_RV32` (`test/elf/unit/rv32_class32.c`
-  write→read roundtrip). Only consumes `float_abi` (e_flags) + the reloc list. Largest effort;
-  start day one.
-- **Track 3 — ABI classifier (WS4b):** the shared RISC-V classifier + `rv32_vtable`, parameterized
-  by the descriptor. Independent of codegen — test via `test/api/abi_classify_test.c` for ilp32f
-  and ilp32. Consumes `RiscvVariant`/`float_abi`.
-- **Track 4 — Runtime (WS7):** `mk/rt.mk` fixes (correct `-march`/`-mabi`, add hardfloat variant)
-  + `mk/lib_srcs.mk` guards. The edits are independent and land early; the `make rt` *validation*
-  gates on Track 1 codegen.
-- **Track 5 — cg legalization (WS6):** wide-int + soft-`double` → libcall routing in
-  `src/cg/arith.c`, keyed on `ptr_size`/`float_abi`. Logic is independent; end-to-end validation
-  needs Track 1 + Track 4. Highest correctness risk — design early against the libcall contract.
-
-**Phase C — integration (after tracks converge):**
-- Register `arch_impl_rv32` (Track 1 + Track 3). Wire object registry (Track 2).
-- **WS8** JIT `run`/`dbg` audit + `rv32_dbg_ops` (Track 1 + Track 2).
-- **WS9** end-to-end: decode/asm goldens, `kit cc → ld → qemu` smoke (all tracks + WS6 + WS7).
-
-## Verification
-
-### Verified execution oracle (clang + qemu-system, confirmed working on this host)
-
-clang 22 has the `riscv32` target and `llvm-objdump`/`llvm-mc`/`ld.lld` are installed.
-**qemu user-mode is not built on macOS** — only `qemu-system-riscv32` — which suits a
-freestanding `-none-elf` target. A confirmed working recipe (PASS→exit 0, wrong answer→exit 7,
-hang→exit 124), to be mirrored by `test/smoke/rv32.sh`:
-- Build: `clang --target=riscv32-unknown-elf -march=rv32imafc -mabi=ilp32f -nostdlib -ffreestanding`
-  (and an `ilp32`/`rv32imac` soft variant); link `ld.lld -Ttext=0x80000000 -e _start`.
-- Startup stub (`_start`): set `sp` (RAM at `0x80000000`); **for ilp32f set `mstatus.FS`**
-  (`li t0,0x2000; csrs mstatus,t0`) to enable the FPU before any `fadd.s` — otherwise it traps
-  and hangs. Soft `ilp32` skips this.
-- Result via SiFive test finisher at `0x100000`: `0x5555`→qemu poweroff exit 0;
-  `0x3333|(code<<16)`→qemu exit `code`.
-- Run: `qemu-system-riscv32 -machine virt -bios none -kernel prog.elf -nographic -no-reboot`
-  (wrap in `timeout`). Verified that clang emits the expected `fadd.s` + inline 64-bit `add`/`sltu`
-  + `fcvt.w.s` for ilp32f, and `llvm-readelf` shows ELF32 / "single-float ABI" / RVC flags.
-
-This is the kit smoke: `kit cc -target riscv32-none-elf ... -c app.c`, assemble the startup stub,
-`kit ld` to an ELF, run under qemu-system, assert exit 0. Unlike rv64 (qemu-user/podman), rv32
-uses qemu-system + a bare-metal startup + finisher device. `regen-rv32.sh` uses
-`clang --target=riscv32 + llvm-objdump` for asm/disasm goldens.
-
-### Milestones
-
-kit has no in-process rv32 execution path (emulator out of scope), so behavioral correctness
-comes from the **clang+qemu-system oracle above**; structural correctness comes from
-**self-consistency** (decode↔format, ELF write↔read). Milestone order (each green before the
-next), preferring targeted runs and redirecting output to a file (per CLAUDE.md):
-
-1. **Build/register**: `make lib 2>&1 | tee /tmp/build.log`; target recognized.
-2. **Decode/encode self-roundtrip** — new `test/arch/rv32_decode_test.c` (mirror
-   `rv64_decode_test.c`): no W-forms, `lw/sw` (no `ld/sd`), 5-bit shamt, `c.jal`,
-   `c.lw/c.sw`, `c.flw/c.fsw`; decode↔format agreement is the oracle.
-   `make test-isa 2>&1 | tee /tmp/isa.log`.
-3. **Assembler/disasm corpus** — `test/asm/` rv32 lane + `regen-rv32.sh` (clang
-   `--target=riscv32-unknown-elf -march=rv32imafc -mabi=ilp32f` + `llvm-objdump` as reference,
-   maintainer-only, soft-skip if absent; committed goldens replayed by CI).
-   `make test-asm-rv32 2>&1 | tee /tmp/asm32.log`.
-4. **ELF32 round-trip** — `test/elf/unit/rv32_class32.c` (first ELFCLASS32 consumer):
-   write→read-back, assert `EI_CLASS==ELFCLASS32`, `Elf32_Sym`/`Elf32_Rela` survive.
-   `make test-elf 2>&1 | tee /tmp/elf.log`.
-5. **Compile + inspect** (no execution):
-   `./build/kit cc -target riscv32-none-elf -march=rv32imafc_zicsr_zifencei -mabi=ilp32f -c
-   smoke.c -o /tmp/rv32.o` then `./build/kit disas /tmp/rv32.o` (optional cross-check
-   `llvm-objdump -d --triple=riscv32 /tmp/rv32.o`).
-6. **Link + JIT image** — new `test/link/rv32_jit_test.c` (mirror `rv64_jit_test.c`, exit 77
-   on non-rv32 host; include a PC-relative reloc to exercise HI20/LO12 pairing). `kit ld` to a
-   static ELF executable succeeds.
-7. **qemu-system smoke** — `test/smoke/rv32.sh` using the verified oracle above
-   (`qemu-system-riscv32 -machine virt`, FPU-enabling startup for ilp32f, SiFive finisher exit
-   codes). Compiles `app.c` with `kit cc -target riscv32-none-elf`, links with the startup stub,
-   runs under qemu, asserts exit 0. This is the **only behavioral oracle** (soft-double and
-   64-bit-int correctness are otherwise untestable) — make it a required CI gate where
-   `qemu-system-riscv32` is present; skip-if-absent elsewhere. Add a doctor
-   (`test/lib/check_rv32_env.sh`) like rv64's.
-
-New make targets next to their rv64 peers in `test/test.mk`: `RV32_DECODE_TEST_BIN` (into
-`test-isa`), `test-asm-rv32`, `test-rv32-jit`, `test-smoke-rv32`, and `rv32` added to the
-runtime test arch list.
-
-**RV64 regression gate** (run after WS1 and again at the end):
-`make test-isa test-asm-rv64 test-smoke-rv64 test-link` + `rv64_jit_test`.
-
-## Risks
-
-1. **64-bit-int + soft-double on rv32 (WS6) is the deepest, execution-only risk.** Carry/borrow
-   chains and soft-float rounding can't be checked by byte-goldens — only execution catches
-   valid-but-wrong codegen. The behavioral oracle (qemu-system, verified) closes this, but
-   depends on `qemu-system-riscv32` being present and a correct FPU-enabling startup stub for
-   ilp32f (a missing `mstatus.FS` set silently hangs instead of failing cleanly). Mitigate with
-   qemu-gated differential tests (kit result vs host double/int64) and loud backend panics on any
-   wide/`.d` value reaching the native path.
-2. **ELFCLASS32 (WS5) is the dominant effort** (~130 Elf64-hardcoded sites across emit/read/link).
-   The write-then-read self-oracle catches internal inconsistency but not spec divergence; keep
-   one clang-oracle `cases/` rv32 ELF test for an independent cross-check. Disambiguating
-   EM_RISCV by `EI_CLASS` is a cross-cutting correctness point.
-3. **Sharing risk to RV64 (WS1/WS2)**: repurposing `rv_is_64` semantics, the `RV_FRAME_SAVE_SIZE`
-   constant→`2*ptr_bytes`, and the compressed-quadrant/shamt branches all touch the working
-   rv64 path. Land WS1/WS2 with rv64-byte-identical output and prove zero diff before enabling
-   rv32.
-4. **`-mabi` boundary**: parsed in `driver/`, validated in `src/api/core.c` where feature words
-   exist. Every spec-construction site that bypasses `kit_target_new` must default
-   `float_abi=DEFAULT` safely; the catch-all `-m` consumer must not pre-eat `-mabi`.
-5. **ilp32 vs ilp32f confusion**: `ilp32` is the *integer* ABI (type widths); the `f` is float
-   arg-passing only. The runtime `ABI=ilp32` include set is correct for both; the existing
-   `-march=rv32imafd` (D) is wrong and must become `rv32imafc`/`rv32imac`.
-6. **RVC dbg gap**: rv32imafc emits compressed insns pervasively; v1 step-over fallback degrades
-   `kit dbg` single-step for rv32. The shim unit test must assert the fallback path is taken.
-
-## Critical files
-
-- `src/arch/riscv/` (renamed from `rv64/`): `variant.h` (new), `native.c`, `isa.c/.h`,
-  `disasm.c`, `asm.c`, `link.c`, `dbg.c`, `arch.c` (two ArchImpls, `-march`, macros).
-- `src/abi/abi_rv64.c` → shared RISC-V classifier + `rv32_vtable`; `src/abi/registry.c`.
-- `src/cg/arith.c` — wide-int + soft-double legalization (WS6, the riskiest, currently absent).
-- `src/obj/elf/{elf.h,emit.c,read.c,link.c}` + new `reloc_riscv32.c`; `src/obj/registry.c`.
-- `include/kit/core.h` (`KitFloatAbi`, `KitTargetSpec.float_abi`, `KitTargetOptions.abi`),
-  `include/kit/config.h` (`KIT_ARCH_RV32_ENABLED`).
-- `driver/lib/target.c`, `driver/cmd/cc.c`, `driver/cmd/run.c` (`-mabi`, `medlow/medany`);
-  `src/api/core.c` (resolve/validate).
-- `src/arch/registry.c`, `mk/rt.mk`, `mk/lib_srcs.mk`, `test/test.mk` + new test files.
diff --git a/mk/test.mk b/mk/test.mk
@@ -876,8 +876,11 @@ test-asm-rv32: lib $(ASM_RUNNER)
 # `kit cc -target riscv32-none-elf` and runs the freestanding ELF bare-metal
 # under qemu-system-riscv32 via test/lib/exec_rv32_bare.sh (the qemu exit code
 # is the exit-code oracle). Self-skips per case when the rv32 toolchain
-# (clang riscv32 + qemu-system-riscv32) is absent. Opt-in (not in
-# DEFAULT_TEST_TARGETS): real rv32 codegen gaps are still left RED on purpose.
+# (clang riscv32 + qemu-system-riscv32) is absent. The corpus is green; this
+# lane is opt-in (not in DEFAULT_TEST_TARGETS) because it needs that toolchain,
+# matching the rv64 cross lanes. The only non-passing cases are intentionally
+# unsupported on rv32 (__int128, binary128 long double, LP64-data-model
+# assumptions, aa64-only intrinsics) and carry committed .rv32.skip sidecars.
 test-toy-rv32: bin rt-riscv32-elf-hardfloat
 	@KIT=$(abspath $(BIN)) KIT_TOY_CROSS_ARCHS=rv32 KIT_TEST_PATHS=X \
 		test/toy/run.sh
@@ -886,7 +889,9 @@ test-toy-rv32: bin rt-riscv32-elf-hardfloat
 # only. parse-runner --emit -> kit ld + start crt -> qemu-system-riscv32
 # (test/parse/run.sh's rv32 freestanding E path via exec_rv32_bare.sh). Models
 # test-parse-rv64-wide; opt-in (needs the rv32 toolchain/qemu), so excluded
-# from DEFAULT_TEST_TARGETS while rv32 reds still exist.
+# from DEFAULT_TEST_TARGETS, matching test-parse-rv64-wide. The corpus is green;
+# the only skips are intentionally-unsupported cases (__int128, binary128 long
+# double, LP64-data-model assumptions) with committed .rv32.skip sidecars.
 test-parse-rv32: lib rt-riscv32-elf-hardfloat $(PARSE_RUNNER) $(ROUNDTRIP_BIN) \
 		$(LINK_EXE_RUNNER)
 	@KIT_TEST_ARCH=rv32 KIT_TEST_PATHS=E bash test/parse/run.sh
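
Both rv32 lanes above treat the qemu exit code as the oracle, and the removed plan gives the encoding for the SiFive test finisher at `0x100000`: `0x5555` powers off with exit 0, and `0x3333|(code<<16)` exits with `code`. A minimal sketch of that encoding in C follows. The names `FINISHER_ADDR`, `finisher_word`, and `finisher_exit` are hypothetical and are not taken from kit's startup stub or from `test/lib/exec_rv32_bare.sh`; the MMIO store is compiled only when targeting RISC-V.

```c
#include <stdint.h>

/* Finisher address on qemu's `virt` machine, as quoted in the removed plan. */
#define FINISHER_ADDR 0x100000u

/* Hypothetical helper: build the word written to the finisher.
 * 0x5555 requests poweroff with exit 0; 0x3333 | (code << 16) requests
 * an exit with status `code`. */
static inline uint32_t finisher_word(uint32_t code) {
    return code == 0 ? 0x5555u : (0x3333u | (code << 16));
}

#if defined(__riscv)
/* Hypothetical bare-metal exit: store the word, then spin until qemu stops. */
static inline void finisher_exit(uint32_t code) {
    *(volatile uint32_t *)(uintptr_t)FINISHER_ADDR = finisher_word(code);
    for (;;) { }
}
#endif
```

Under this sketch a failing case that calls `finisher_exit(7)` would write `0x00073333`, which matches the "wrong answer→exit 7" convention in the plan's recipe.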