Plan: RISC-V 32-bit (riscv32-none-elf) support
Status — 2026-06-03 (branch rv32) — core complete; cross-test gaps tracked
riscv32-none-elf (rv32imafc_zicsr_zifencei, both ilp32f and ilp32) is a
working cross target. WS6 — the flagged "hardest part", 64-bit-value legalization
— is done and behaviorally verified under qemu-system-riscv32 at -O0 and -O1
for both ABIs. The full kit toolchain (kit cc → kit ld → qemu-system) builds and
runs a correct bare-metal rv32 image with no special flags (freestanding
defaults to no-PIE). As of 2026-06-03 the rv32 runtime is no longer
special-cased: kit cc/kit ld auto-build and auto-link libkit_rt.a for
riscv32-none-elf exactly like every other target — the driver carries two
rv32 runtime variants (riscv32-elf soft ilp32, riscv32-elf-hardfloat
ilp32f), selected by the float ABI recovered from the objects' ELF e_flags, so
no explicit archive or -nostdlib is needed. RV64 / x64 / aa64 fully
non-regressed: asm goldens byte-identical, isa (rv64 21 + rv32 31)/0,
abi-classify 367/0, elf 41/0, link 122/0 + x64 79/0, cg-api 544/0, smoke-rv64 3/0,
dwarf/driver/interp green.
Both corpora now run on qemu-system-riscv32 as a cross arch: Toy 240 pass / 15 red (test/toy/run.sh, path X) and C 439 pass / 36 red (test/parse/run.sh,
path E). The reds are deliberately left red (no skip sidecars) — they are the
real remaining rv32 gaps, enumerated in the checklist below.
Done & verified ✅
- WS0–WS5, WS7 — variant scaffold, XLEN-parameterized backend, arch_impl_rv32 + -march/-mabi/macros, shared ABI classifier + rv32_vtable, ELFCLASS32 emit/read/link + reloc_riscv32.c, mk/rt.mk variants. (See git history.)
- WS6 — 64-bit-value pair-legalization (THE blocker) — DONE. rv32 8-byte scalars (long long/i64 AND soft double) are memory-resident (api_is_wide8_scalar_type forces CG_LOCAL_MEMORY_REQUIRED; cg_ir_lower/pass_native_emit size>word checks made > ptr_size), mirroring the proven i128/wide16 model. The allocator binds one register per value, so memory residence + the multi-part ABI path (ABIArgPart.src_offset, rv_load_part/rv_store_part) is the only correct representation. (src/cg/arith.c, src/cg/wide.c):
  - add/sub/and/or/xor/neg/bnot — inline 2-word lane ops (carry/borrow via sltu); no compiler-rt 64-bit add helper exists, so these must be inline.
  - i64 compares — inline lane eq/lt (signed-hi/unsigned-lo); if(i64) = (lo|hi)!=0.
  - i64 mul/div/rem/shift → __*di3; soft double → __*df*; i64↔float → __floatdisf/__fixsfdi/…; soft single f32 under ilp32 → __*sf*; i64 clz/ctz/popcount/bswap → __*di2; 64-bit consts → two lanes.
  - nd_* guards (native_direct_target.c) panic on any 8-byte value reaching a single-register binop/unop/cmp/convert/load_imm/load_const — loud, never truncation.
- Runtime (make rt) — DONE. Both riscv32-elf (ilp32) and riscv32-elf-hardfloat (ilp32f) build with kit's own cc. Fixed mk/rt.mk: RT_CFLAGS/RT_ASFLAGS now include RT_<v>_ARCH_FLAGS (the -mabi/-march were silently dropped — every variant built ilp32f).
- ELF e_flags float-ABI — emit.c/link.c derive the RISC-V float-ABI bits from target.float_abi (the static descriptor hardcoded SINGLE, mislabelling ilp32 soft); rv64/x64/aa64 byte-identical.
- Freestanding policy (host-irrelevant, target-derived):
  - kit stamps EI_OSABI=ELFOSABI_STANDALONE on *-none-elf objects (emit.c) so they round-trip as KIT_OS_FREESTANDING instead of decoding back to Linux (the "none → Linux" bug). kit ld derives the PIC default from the target via driver_default_pic (hosted → PIE, freestanding → no-PIE) and scans all inputs for a freestanding object — the host's default never leaks onto a cross target. So kit ld for rv32 needs no -no-pie.
  - kit ld/kit cc auto-link a runtime for any target that has a variant (driver_runtime_has_variant) — now including riscv32-none-elf. The driver (driver/lib/runtime.c) carries two rv32 runtime variants distinguished by a new float_abi axis on RuntimeVariant (riscv32-elf soft ilp32/rv32imac, riscv32-elf-hardfloat ilp32f/rv32imafc); each is built on demand with its own -march/-mabi via topts.isa/topts.abi. The float ABI is recovered from the RISC-V ELF e_flags in src/api/object_detect.c and reconciled across all link inputs in driver/cmd/ld.c (a foreign startup stub that lacks the flag never mis-selects the soft runtime). So a freestanding rv32 link needs no explicit libkit_rt.a and no -nostdlib. New -Ttext ADDR and -nostdlib/--no-default-libs flags remain available for images that supply their own runtime.
  - .eh_frame suppressed for KIT_OS_FREESTANDING (src/arch/mc.c); hosted byte-identical.
  - layout_dyn emits a clean diagnostic for an ELF32 dynamic/PIE link (was an ELF64 SEGV).
- jump-table / label-address slots are width-aware (R_ABS32 on rv32, R_ABS64 on 64-bit) in nd_local_static_data_label_addr — fixes switch jump tables on rv32.
- WS9 tests + CI wiring: test/arch/rv32_decode_test.c (→ test-isa, 31 checks), test/link/rv32_jit_test.c (→ test-rv32-jit, exit-77 host gate), test/elf/unit/rv32_class32.c (ELFCLASS32 round-trip, → test-elf), test/smoke/rv32.sh (→ test-smoke-rv32): 7 lanes — ilp32f + ilp32 × {-O0,-O1} covering i64 + soft-double + soft-single, two kit ld end-to-end lanes that auto-link the runtime (no explicit libkit_rt.a), a negative control. Wired in mk/test.mk / mk/test_unit.mk (test-rv32-jit, test-smoke-rv32).
- Toy + C cross lanes (rv32 as an arch). Shared bare-metal runner test/lib/exec_rv32_bare.sh (clang startup → kit cc/parse-runner → kit ld → qemu-system, SiFive-finisher exit oracle; entry symbol configurable — main for Toy, test_main for C). Toy: test/toy/run.sh cross_one_rv32 (rv32 in default TOY_CROSS_ARCHS, path X) — 240/15. C: test/parse/run.sh kit_lane_E rv32 branch + kit_test_target.h rv32 arm (path E, KIT_TEST_ARCH=rv32) — 439/36. Both opt-in; reds left red.
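The inline lane ops listed under WS6 can be illustrated in plain C. The sketch below is not kit's code: the Wide8 type and the wide8_* helper names are hypothetical, and it only models the arithmetic that a two-word add, sub, signed compare, and truthiness test must perform. The unsigned comparisons that produce the carry and borrow are what an sltu instruction computes.

```c
#include <stdint.h>

/* Hypothetical two-lane view of a 64-bit value: lo = bits 0..31, hi = bits 32..63. */
typedef struct { uint32_t lo, hi; } Wide8;

Wide8 wide8_from_u64(uint64_t v) {
    Wide8 w = { (uint32_t)v, (uint32_t)(v >> 32) };
    return w;
}

uint64_t wide8_to_u64(Wide8 w) {
    return ((uint64_t)w.hi << 32) | w.lo;
}

/* add: the low lane wrapped iff the sum is below an operand (sltu). */
Wide8 wide8_add(Wide8 a, Wide8 b) {
    Wide8 r;
    r.lo = a.lo + b.lo;
    uint32_t carry = r.lo < a.lo;
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* sub: a borrow is needed iff a.lo < b.lo (sltu on the original operands). */
Wide8 wide8_sub(Wide8 a, Wide8 b) {
    Wide8 r;
    uint32_t borrow = a.lo < b.lo;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - borrow;
    return r;
}

/* signed less-than: signed compare on the high lane, unsigned on the low lane. */
int wide8_slt(Wide8 a, Wide8 b) {
    if (a.hi != b.hi) return (int32_t)a.hi < (int32_t)b.hi;
    return a.lo < b.lo;
}

/* if (i64) is true exactly when either lane is nonzero. */
int wide8_is_nonzero(Wide8 a) {
    return (a.lo | a.hi) != 0;
}
```

Comparing each helper against the host's native 64-bit arithmetic gives a cheap differential check of the lane logic before looking at generated code.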
Remaining ⚠️ — clear checklist
A. rv32 codegen gaps surfaced by the cross lanes (the reds — left red on purpose, no skips).
Toy 240/15, C 439/36; the 51 reds cluster into:
- __int128 (C: i128_02…i128_13+, ~15 cases — the largest C bucket). rv32 has no __int128 (runtime INT128=0; the 16-byte-scalar path is dead on rv32). Decide: reject __int128 on rv32 at the front end with a clear diagnostic (cleanest), or legalize it (a 4-word version of the wide8 work — large). Until then these are compile-fail/wrong-result.
- i64 atomics (@atomic_*<i64> / __atomic_*_8; Toy 17/22/59/73/74/75/77, C builtin_*_atomic_long). rv32 A has no 64-bit AMO/lr.d/sc.d; needs __atomic_*_8 libcalls (libatomic / a lock), absent freestanding. Provide 8-byte __atomic_* in rt/, or document as a hard rv32 limitation.
- 64-bit *_overflow intrinsics (Toy 58_overflow_record, C builtin_26_sadd_overflow). Legalize i64 sadd/uadd/ssub/usub/smul/umul-overflow on rv32 (the 64-bit operand reaches the backend un-split today → trap), à la the clz/ctz wide8 routing in arith.c. 32-bit works.
- i64 varargs (Toy 133_varargs_mixed_types — wrong result, not a hang). Audit the rv32 va_arg path for an 8-byte value (even-pair fetch from the vararg save area).
- thread-local storage (Toy 141, C 6_7_1_03_thread_local_basic, gnu_thread_storage_01). TLS needs a thread pointer the bare-metal image never sets up — likely a genuine freestanding limitation (the Linux lanes get it from the OS); document, or provide a static-TLS model.
- toy soft-float compare lowering (Toy 153_fp_cmp_negation_b — kit cc "addr operand is not an lvalue", rv32-only, not reproducible in C). An eager soft-fp compare feeding an empty-then/else block hits an lvalue path the rv64 delayed-SV_CMP form avoids. Narrow.
- 123_spec_demo (Toy, hangs) — triage which of the above it exercises.
- Test-environment mismatches (NOT rv32 codegen bugs; an .rv32.skip sidecar exists for them but none is committed):
  - Toy 145_baremetal_privileged_aa64 (aa64 intrinsics), 20_cg_api_inline_asm_full
  - C asm_01_grammar (inline-asm constraints/grammar), 47_target_arch_switch (selects its expected exit code by target arch).
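For the i64 varargs audit, the even-pair fetch can be modeled on a save area of 4-byte slots. This is a sketch under assumptions, not kit's va_arg lowering: va_align_pair and va_fetch_u64 are hypothetical names, the save area is treated as an array indexed from a0, and even slot indices are assumed to be 8-byte aligned.

```c
#include <stdint.h>

/* Round a 4-byte slot index up to the next even index (start of an aligned pair). */
unsigned va_align_pair(unsigned slot) {
    return (slot + 1u) & ~1u;
}

/* Fetch an 8-byte variadic value from 4-byte slots (little-endian: low word first).
 * *slot is advanced past the pair that was consumed. */
uint64_t va_fetch_u64(const uint32_t* slots, unsigned* slot) {
    unsigned i = va_align_pair(*slot);
    uint64_t v = (uint64_t)slots[i] | ((uint64_t)slots[i + 1] << 32);
    *slot = i + 2;
    return v;
}
```

A fetch that skips the alignment step reads a pair starting at an odd slot, which yields a wrong value rather than a crash. That matches the "wrong result, not a hang" symptom, although the real cause still has to be confirmed by the audit.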
B. Pre-existing follow-ups (orthogonal to the cross tests).
- Optional make targets test-toy-rv32/test-parse-rv32 (opt-in; not in DEFAULT_TEST_TARGETS while reds exist).
- test/asm/ rv32 byte-golden lane + regen-rv32.sh (rv32 arm in test/asm/run.sh / kit_unit.h + committed clang/llvm-objdump goldens; kit_test_target.h already has rv32).
- CSR pseudo-ops in the assembler (csrs/csrw/csrr/… + CSR names) — a general RISC-V-assembler feature (missing on rv64 too; new RV64_FMT_CSR_{R,W,WI} + CSR-name table + disasm print cases). Until then the smoke/cross startup stub is clang-assembled.
Out of scope (decided): kit ld ELF32 dynamic/PIE — rv32 is static-only; layout_dyn
clean-panics on an ELF32 dynamic/PIE link and that is the intended behavior.
Where to look
- WS6 legalization: src/cg/wide.c, src/cg/arith.c (binop/unop/cmp/convert + soft-fp + clz/ctz), src/cg/{value,local,memory,call,control}.c, src/opt/{cg_ir_lower.c,pass_native_emit.c}, src/cg/native_direct_target.c (nd_* panics + nd_local_static_data_label_addr).
- Backend: src/arch/riscv/{variant.{h,c},native.c,isa.{c,h},disasm.c,asm.c,link.c,dbg.c,arch.c}.
- ABI: src/abi/abi_rv64.c + src/abi/registry.c.
- ELF / kit ld / freestanding policy: src/obj/elf/{elf.h,emit.c,read.c,link.c,link_dyn.c} + reloc_riscv32.c; driver/cmd/ld.c (-Ttext/-nostdlib/PIC-from-target), driver/lib/target.c (driver_default_pic), driver/lib/runtime.{c,h} (driver_runtime_has_variant, the two rv32 RuntimeVariant entries + float_abi/isa/abi axis, rt_build_archive), src/api/object_detect.c (EI_OSABI → os; RISC-V e_flags → float_abi), src/link/{link.c,link_layout.c}, src/api/link.c.
- Runtime/intrinsics: mk/rt.mk (ARCH_FLAGS), src/cg/type.c (rv32 ≡ rv64 for intrinsics).
- Tests: test/smoke/rv32.sh, test/lib/{check_rv32_env.sh,exec_rv32_bare.sh,kit_test_target.h}, test/toy/run.sh (cross_one_rv32), test/parse/run.sh (kit_lane_E rv32 branch), test/arch/rv32_decode_test.c, test/link/rv32_jit_test.c, test/elf/unit/rv32_class32.c, mk/test.mk, mk/test_unit.mk.
Context
kit today targets riscv64 (LP64D) via a single backend in src/arch/rv64/. We want a
new cross target:
--target=riscv32-none-elf
-march=rv32imafc_zicsr_zifencei
-mabi=ilp32f (and also -mabi=ilp32, soft-float)
-mcmodel=medlow
This is a freestanding 32-bit RISC-V toolchain target: F (single-precision hardware float)
but no D, so double and long long are not native and must be lowered. The enum
KIT_ARCH_RV32, the riscv32 triple parse (driver/lib/target.c:275, ptr_size=4), ELF
auto-detection (src/api/object_detect.c), and the runtime source files
(rt/lib/riscv/rv32.S, rt/lib/coro/riscv32.c) already exist but are unwired/incomplete.
The intended outcome: kit cc/as/ld/objdump/disas produce and consume correct
riscv32-none-elf ELFCLASS32 objects and static executables for both ilp32f and ilp32,
with libkit_rt.a builtins available and the JIT run/dbg plumbing wired (native
execution host-gated, as for rv64).
Confirmed scope decisions
- Shared backend: refactor src/arch/rv64/ into one XLEN-parameterized RISC-V backend serving both rv32 and rv64 from a single tree. RV64 must not regress and is re-validated.
- Subsystems in scope: compile + assemble + link + disasm; runtime lib; JIT run/dbg. Emulator is out of scope (src/emu, src/os, src/obj/elf/emu_load.c stay rv64-only).
- ABIs: ilp32f (single hard-float: float in fa0-fa7, double/i64 via integer regs + soft-float) and ilp32 (pure soft-float). double is always soft-float.
- Code model: accept and validate -mcmodel=medlow/medany, but keep the existing PC-relative (auipc + R_RV_PCREL_HI20/LO12, GOT for externs) addressing for v1. No new absolute-addressing path.
XLEN-parameterization mechanism
Add a const RiscvVariant* descriptor (immutable, two static instances selected by
KitArchKind) carried on the per-function codegen context and threaded into the otherwise
stateless decode/asm/disasm/link/dbg paths. This honors "no global state — everything hangs
off a context struct" (the variant is a const table reached through a context, never ambient).
New src/arch/riscv/variant.h:
typedef struct RiscvVariant {
    KitArchKind kind;        /* KIT_ARCH_RV32 / KIT_ARCH_RV64 */
    const char* name;        /* "rv32" / "rv64" */
    const char* isa_prefix;  /* "rv32" / "rv64" — for -march parsing */
    u8 xlen;                 /* 32 / 64 */
    u8 ptr_bytes;            /* 4 / 8 — pointer & native register width */
    u8 gp_slot_bytes;        /* 4 / 8 — varargs save & callee-save stride */
    u8 has_w_forms;          /* 0 rv32 / 1 rv64 — ADDW/ADDIW/SLLIW/... */
    u8 shamt_bits;           /* 5 rv32 / 6 rv64 — SLLI/SRLI/SRAI immediate */
    u32 frame_save_size;     /* 2 * ptr_bytes (8 rv32 / 16 rv64) */
} RiscvVariant;
const RiscvVariant* riscv_variant_for_kind(KitArchKind);
Reached via: RvNativeTarget.variant (codegen), riscv_variant_for_kind(c->target.arch) in
the decoder/assembler/disassembler/dbg (they already hold a Compiler*), and two
LinkArchDesc literals for the linker. Distinguish three different "8"s carefully —
ptr_bytes (pointer/reg width), gp_slot_bytes (ABI save stride), and frame_save_size
(saved ra+s0 pair) — conflating them passes rv64 (all 8) and breaks rv32.
The float ABI (soft vs single-hard) is a separate axis from XLEN, carried on
KitTargetSpec.float_abi (see WS4), consumed by the ABI classifier and predefined macros.
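A minimal sketch of the two static instances and the lookup described above. The field values are taken from the comments in the struct; the KitArchKind enum here is a stand-in for kit's real one (including the invented KIT_ARCH_OTHER), and returning NULL for a non-RISC-V kind is an assumption of this sketch, not a documented contract.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint32_t u32;

/* Stand-in for kit's KitArchKind; only the two RISC-V kinds matter here. */
typedef enum { KIT_ARCH_RV32, KIT_ARCH_RV64, KIT_ARCH_OTHER } KitArchKind;

typedef struct RiscvVariant {
    KitArchKind kind;
    const char* name;
    const char* isa_prefix;
    u8 xlen;
    u8 ptr_bytes;
    u8 gp_slot_bytes;
    u8 has_w_forms;
    u8 shamt_bits;
    u32 frame_save_size;
} RiscvVariant;

/* Immutable tables: reached through a context pointer, never mutated. */
static const RiscvVariant k_riscv_rv32 = {
    .kind = KIT_ARCH_RV32, .name = "rv32", .isa_prefix = "rv32",
    .xlen = 32, .ptr_bytes = 4, .gp_slot_bytes = 4,
    .has_w_forms = 0, .shamt_bits = 5, .frame_save_size = 8,
};

static const RiscvVariant k_riscv_rv64 = {
    .kind = KIT_ARCH_RV64, .name = "rv64", .isa_prefix = "rv64",
    .xlen = 64, .ptr_bytes = 8, .gp_slot_bytes = 8,
    .has_w_forms = 1, .shamt_bits = 6, .frame_save_size = 16,
};

/* Returns NULL for non-RISC-V kinds (an assumption of this sketch). */
const RiscvVariant* riscv_variant_for_kind(KitArchKind kind) {
    switch (kind) {
    case KIT_ARCH_RV32: return &k_riscv_rv32;
    case KIT_ARCH_RV64: return &k_riscv_rv64;
    default:            return NULL;
    }
}
```

Keeping ptr_bytes, gp_slot_bytes, and frame_save_size as separate fields means each call site has to name the quantity it actually wants, which is the point of the three-different-8s warning.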
Workstreams (ordered; each leaves a green targeted check)
WS0 — Config + variant scaffold (no behavior change)
- include/kit/config.h: add #define KIT_ARCH_RV32_ENABLED 1 (mk/config.mk auto-parses it into a make var — no config.mk edit needed).
- Add src/arch/riscv/variant.h with the struct + two const instances + lookup.
- Gate: make lib compiles.
WS1 — Directory rename + thread variant through codegen (rv64 still identical)
- git mv src/arch/rv64 src/arch/riscv; fix include guards/paths. Update mk/lib_srcs.mk:55,189 (LIB_SRCS_ARCH_RV64 → LIB_SRCS_ARCH_RISCV, gated by RV32 || RV64). The only external referent is the symbol arch_impl_rv64 in src/arch/registry.c (path-independent).
- Keep file names and internal rv64_/rv_ symbol prefixes for v1 (cosmetic rename is a separate follow-up; renaming 2000+ sites is pure regression risk).
- src/arch/riscv/native.c: add const RiscvVariant* variant to RvNativeTarget (set from c->target.arch in the one constructor); replace hardcoded 8/16/RV_FRAME_SAVE_SIZE/addiw/ld/sd/float-fmt sites with variant reads. With the rv64 variant the emitted bytes are byte-for-byte identical — this isolates the "sharing" regression from rv32 correctness. Key sites (file src/arch/riscv/native.c): rv_emit_li32 (LUI+ADDIW → ADDI when !has_w_forms), enc_int_load/store (sw/lw vs sd/ld by ptr_bytes), RV_FRAME_SAVE_SIZE, varargs save area, callee-save stride, rv_type_size/align defaults, rv_convert sext/zext (xlen - src_bits shift; addiw fast-path only when has_w_forms).
- Gate: make test-smoke-rv64, test/arch/rv64_decode_test.c, test/asm/regen-rv64.sh, test/link/rv64_jit_test.c all byte-identical green.
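The rv_emit_li32 site depends on the usual split of a 32-bit constant into a LUI part and a sign-extended 12-bit add. The sketch below shows that split and the 32-bit recombination; RvImmSplit, rv_split_imm32, and rv_join_imm32 are hypothetical names used for illustration, not functions from native.c.

```c
#include <stdint.h>

/* hi20 goes into LUI; lo12 is the sign-extended ADDI/ADDIW immediate. */
typedef struct { uint32_t hi20; int32_t lo12; } RvImmSplit;

RvImmSplit rv_split_imm32(int32_t imm) {
    RvImmSplit s;
    uint32_t u = (uint32_t)imm;
    /* Adding 0x800 pre-compensates for the add subtracting when lo12 is negative. */
    s.hi20 = ((u + 0x800u) >> 12) & 0xFFFFFu;
    s.lo12 = (int32_t)(u & 0xFFFu);
    if (s.lo12 >= 0x800) s.lo12 -= 0x1000;
    return s;
}

/* Models LUI followed by a 32-bit wrapping add of the low immediate. */
int32_t rv_join_imm32(RvImmSplit s) {
    return (int32_t)((s.hi20 << 12) + (uint32_t)s.lo12);
}
```

On rv32 the add wraps at 32 bits, so plain ADDI reproduces the constant. On rv64 a value such as 0x7FFFFFFF splits into hi20 = 0x80000 and lo12 = -1, and the sign-extended LUI result only folds back to the intended value with the 32-bit ADDIW, which is why the instruction choice is keyed on has_w_forms.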
WS2 — ISA / asm / disasm / link / dbg XLEN parameterization (still rv64-only at runtime)
- src/arch/riscv/isa.c / isa.h: add a one-byte availability mask column to Rv64InsnDesc (RV_AV_RV32 | RV_AV_RV64) rather than a second table. Mark RV64-only: W-forms (addw/subw/sllw/srlw/sraw, addiw/slliw/srliw/sraiw, mulw/divw/divuw/remw/remuw), 64-bit mem (ld/sd/lwu), 64-bit FP int conv (fcvt.*.l/lu, fmv.x.d/d.x), compressed c.addiw/c.addw, and the RV64 meaning of c.ld/c.sd/c.ldsp/c.sdsp/c.fld/c.fsd/.... Enable RV32-only: c.jal (shares the encoding that is c.addiw on rv64) and c.flw/c.fsw (their encodings are c.ld/c.sd on rv64); c.lw/c.sw are valid on both XLENs and stay enabled for rv32.
- src/arch/riscv/disasm.c + the compressed decoder rv64_disasm_find_c: pass the variant in; branch the ambiguous compressed quadrant encodings and the 5-bit vs 6-bit shamt decode (& 0x1f on rv32, reject bit 25 set). rv64_disasm_find/rv64_asm_find skip rows by mask.
- src/arch/riscv/link.c: split link_arch_rv64 and a new link_arch_rv32; PLT/IPLT stubs use rv_lw instead of rv_ld (re-check stub sizes/offsets for 4-byte slots).
- src/arch/riscv/dbg.c: parameterize the displaced-step shim by ptr_bytes; set min_insn_len=2, max_insn_len=4 for rv32 (C ext on); RVC control-flow falls back to step-over (KIT_UNSUPPORTED), 4-byte fixups reuse the rv64 builder.
- Gate: make test-isa, regen-rv64.sh, rv64_jit_test still green.
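Two of the WS2 decisions are small enough to sketch directly: filtering table rows by an availability mask, and decoding the shift amount of SLLI/SRLI/SRAI per XLEN. The bit values chosen for RV_AV_RV32 and RV_AV_RV64 and both function names are assumptions for illustration; the plan names the flags but not their encoding.

```c
#include <stdint.h>

/* Assumed bit assignment; the plan only names the two flags. */
enum { RV_AV_RV32 = 1 << 0, RV_AV_RV64 = 1 << 1 };

/* A table row is usable when its mask contains the bit for the active XLEN. */
int rv_insn_available(uint8_t avail_mask, unsigned xlen) {
    return (avail_mask & (xlen == 32 ? RV_AV_RV32 : RV_AV_RV64)) != 0;
}

/* Shift amount of an SLLI/SRLI/SRAI word, or -1 if the encoding is not
 * valid for this XLEN. The field lives in bits 25:20 of the instruction. */
int rv_decode_shamt(uint32_t insn, unsigned xlen) {
    unsigned shamt6 = (insn >> 20) & 0x3Fu;
    if (xlen == 32) {
        if (shamt6 & 0x20u) return -1;   /* bit 25 set: reject on rv32 */
        return (int)(shamt6 & 0x1Fu);
    }
    return (int)shamt6;
}
```

For example, 0x02109093 encodes slli x1, x1, 33: it decodes to 33 with xlen 64 and is rejected with xlen 32, while 0x00109093 (slli x1, x1, 1) decodes to 1 on both.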
WS3 — rv32 ArchImpl + registry + -march + predefined macros
- src/arch/riscv/arch.c: define both arch_impl_rv32 and arch_impl_rv64 (share cgtarget_new/asm_new/disasm_new/decode/dwarf/dbg/asm_ops/register file; differ in .kind, .name, .link, .predefined_macros, .target_feature_*, and cfi_data_align_factor -4 vs -8).
- Generalize rv64_target_feature_apply_isa (currently hard-requires the "rv64" prefix, arch.c:204) to compare against variant->isa_prefix. rv32 default profile = rv32imafc_zicsr_zifencei (I/M/A/F/C/Zicsr/Zifencei, D cleared).
- Predefined macros for rv32 (float-abi-dependent, see WS4): __riscv_xlen=32, __ILP32__/_ILP32 (drop __LP64__/_LP64), __riscv_float_abi_single (ilp32f) or __riscv_float_abi_soft (ilp32) instead of _double, __riscv_flen=32 when F present.
- src/arch/registry.c: register arch_impl_rv32 under #if KIT_ARCH_RV32_ENABLED (:24,50,57); arch_kind_name already returns "riscv32".
- Gate: kit cc -target riscv32-none-elf -march=rv32imafc_zicsr_zifencei -E -dM shows the right macros; kit mc/disas -target riscv32-none-elf round-trips a hand-written rv32 insn.
WS4 — ABI vtable refactor + -mabi plumbing
- New spec field: in include/kit/core.h add enum KitFloatAbi {DEFAULT, SOFT, SINGLE, DOUBLE} and uint8_t float_abi on KitTargetSpec; add KitSlice abi to KitTargetOptions.
- Driver -mabi: in driver/lib/target.c, intercept -mabi=/-mabi in driver_target_features_try_consume before the catch-all -m<x> fallback (which would otherwise mis-eat it), mirroring -march at :154-165; carry through driver_target_options. Add medlow→KIT_CM_SMALL, medany→KIT_CM_MEDIUM aliases in cc_record_mcmodel (driver/cmd/cc.c:751) and run_record_mcmodel (driver/cmd/run.c:379).
- Resolve + validate in kit_target_new (src/api/core.c), after -march features are known: parse ilp32|ilp32f|ilp32d|lp64|lp64f|lp64d; if omitted, derive from -march (D→DOUBLE, F-no-D→SINGLE, else SOFT); reject *f without F and *d without D. So rv32imafc defaults to ilp32f, and ilp32d is rejected.
- Shared ABI classifier: generalize src/abi/abi_rv64.c into a RISC-V classifier parameterized by a descriptor {xlen_bytes (=ptr_size), gpr_bytes, aggregate_gpr_bytes=2*gpr, flen (0/4/8), float_abi} read from a->c->target. Replace the RV64_ABI_*_BYTES=8/16 enum.
  - FP-eligibility predicate fp_eligible(desc, size): SOFT never; SINGLE iff size==4 (float; double 8>flen 4 → INT pair); DOUBLE iff size<=8 (preserves rv64 LP64D).
  - classify_scalar: i8/16/32/ptr → 1 INT part; i64/soft-double → 2 INT parts of 4 in an even-aligned GPR pair; float (ilp32f) → 1 FP part (fa0-fa7). Replace the hardcoded size==16 → 2×8 with nparts = size/gpr_bytes.
  - classify_aggregate: register threshold 2*gpr_bytes (8 on rv32), chunk by gpr_bytes; HFA refinement gated by fp_eligible.
  - va_list: ABI_VA_LIST_POINTER, gp_reg_count=8, gp_slot_size=4, fp_reg_count=0 (FP varargs always go via INT regs even under ilp32f). Two thin static vtables (rv32_vtable, rv64_vtable) sharing the classifier, differing only in the va_list literal.
- src/abi/registry.c: add KIT_ABI_RV32_ENABLED and an {KIT_ARCH_RV32, KIT_OBJ_ELF, &rv32_vtable} entry (one entry serves both ilp32/ilp32f; the float axis is read from the spec).
- Gate: ABI classification golden tests (test/api/abi_classify_test.c style) for rv32 ilp32f and ilp32.
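The -mabi resolution rules and the FP-eligibility predicate from WS4 can be written down compactly. The sketch below is illustrative only: the FloatAbi enum mirrors the planned KitFloatAbi values plus an invented FLOAT_ABI_INVALID error value, resolve_float_abi takes the F/D feature bits as plain ints instead of kit's feature words, and fp_eligible takes the float ABI directly rather than the full descriptor. It does not check that the ABI name matches the XLEN.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    FLOAT_ABI_DEFAULT, FLOAT_ABI_SOFT, FLOAT_ABI_SINGLE, FLOAT_ABI_DOUBLE,
    FLOAT_ABI_INVALID /* sketch-only error value, not in the planned enum */
} FloatAbi;

/* mabi may be NULL when -mabi was omitted; has_f/has_d come from -march. */
FloatAbi resolve_float_abi(const char* mabi, int has_f, int has_d) {
    static const struct { const char* name; FloatAbi abi; } known[] = {
        {"ilp32", FLOAT_ABI_SOFT}, {"ilp32f", FLOAT_ABI_SINGLE}, {"ilp32d", FLOAT_ABI_DOUBLE},
        {"lp64",  FLOAT_ABI_SOFT}, {"lp64f",  FLOAT_ABI_SINGLE}, {"lp64d",  FLOAT_ABI_DOUBLE},
    };
    if (mabi == NULL) {
        /* Derive from -march: D -> DOUBLE, F without D -> SINGLE, else SOFT. */
        if (has_d) return FLOAT_ABI_DOUBLE;
        return has_f ? FLOAT_ABI_SINGLE : FLOAT_ABI_SOFT;
    }
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(mabi, known[i].name) != 0) continue;
        /* Reject *f without F and *d without D. */
        if (known[i].abi == FLOAT_ABI_SINGLE && !has_f) return FLOAT_ABI_INVALID;
        if (known[i].abi == FLOAT_ABI_DOUBLE && !has_d) return FLOAT_ABI_INVALID;
        return known[i].abi;
    }
    return FLOAT_ABI_INVALID; /* unknown ABI name */
}

/* SOFT never; SINGLE iff size == 4; DOUBLE iff size <= 8. */
int fp_eligible(FloatAbi abi, unsigned size) {
    switch (abi) {
    case FLOAT_ABI_SINGLE: return size == 4;
    case FLOAT_ABI_DOUBLE: return size <= 8;
    default:               return 0;
    }
}
```

With F set and D clear (the rv32imafc case), an omitted -mabi resolves to SINGLE, ilp32 resolves to SOFT, and ilp32d is rejected, matching the outcomes stated in the list above.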
WS5 — ELFCLASS32 object emission + reading (largest item)
Introduce one is32/ElfEnc flag (from c->target.ptr_size) threaded through, not
copy-paste duplication.
- src/obj/elf/elf.h: add ELFCLASS32, ELF32_{EHDR,PHDR,SHDR}_SIZE (52/32/40), ELF32_SYM_SIZE (16) / ELF32_RELA_SIZE (12), ELF32_R_INFO(s,t)=((s)<<8)|((t)&0xff), ELF32_R_SYM/TYPE.
- src/obj/elf/emit.c: replace the ptr_size != 8 panic (:271); branch sym record (16B, different field widths) and rela record (12B, ELF32_R_INFO) writers; EI_CLASS (:664); Ehdr/Shdr address fields via elf_wr_u32 and ELF32 sizes; e_flags from float_abi (EF_RISCV_FLOAT_ABI_SINGLE/_SOFT | EF_RISCV_RVC).
- src/obj/elf/read.c: accept ELFCLASS32 (:446,814); add parse_shdr32/parse_sym32/rela32 with the correct offsets/strides and ELF32_R_SYM/TYPE. Scope v1 to ET_REL + ET_EXEC reads; give ELF32 ET_DYN a clear "unsupported" rather than mis-parse.
- src/obj/elf/link.c: ELF32 ET_EXEC writer (parallel parameterization to emit.c) — needed for ld/run/dbg. link_dyn.c and emu_load.c stay rv64/ELF64-only: gate rv32 to static linking (freestanding -none-elf defaults to KIT_PIC_NONE), panic-with-diagnostic for rv32 dynamic.
- New src/obj/elf/reloc_riscv32.c: clone reloc_riscv64.c; map R_ABS32→ELF_R_RISCV_32, and R_ABS64/R_RV_ADD64/R_RV_SUB64→unsupported; reuse all XLEN-neutral kinds.
- src/obj/registry.c: add the rv32 obj_elf_arch_ops entry. EM_RISCV is shared by rv32 and rv64 — disambiguate reloc-table selection by EI_CLASS, not e_machine alone.
- Gate: new test/elf/unit/rv32_class32.c write-then-read round-trip; kit objdump/nm on a hand-built rv32 .o.
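The ELF32-specific encodings named in WS5 are few. The sketch below shows the 12-byte rela record, the r_info packing, and the RISC-V e_flags float-ABI bits. The macro values follow the ELF and RISC-V psABI definitions; the Elf32Rela struct name, the ElfFloatAbi enum, and the two rv_elf_* helpers are hypothetical and stand in for whatever emit.c and object_detect.c actually use.

```c
#include <stdint.h>

/* ELF32 r_info packs the symbol index in the upper 24 bits, the type in the low 8. */
#define ELF32_R_INFO(s, t) ((((uint32_t)(s)) << 8) | ((uint32_t)(t) & 0xffu))
#define ELF32_R_SYM(i)     (((uint32_t)(i)) >> 8)
#define ELF32_R_TYPE(i)    (((uint32_t)(i)) & 0xffu)

typedef struct { uint32_t r_offset; uint32_t r_info; int32_t r_addend; } Elf32Rela;
_Static_assert(sizeof(Elf32Rela) == 12, "ELF32 rela record is 12 bytes");

/* RISC-V e_flags bits. */
#define EF_RISCV_RVC              0x0001u
#define EF_RISCV_FLOAT_ABI_SOFT   0x0000u
#define EF_RISCV_FLOAT_ABI_SINGLE 0x0002u
#define EF_RISCV_FLOAT_ABI_DOUBLE 0x0004u
#define EF_RISCV_FLOAT_ABI        0x0006u /* mask covering the float-ABI field */

/* Hypothetical stand-in for the float-ABI axis carried on the target spec. */
typedef enum { FA_SOFT, FA_SINGLE, FA_DOUBLE, FA_UNKNOWN } ElfFloatAbi;

/* Writer side: derive e_flags from the float ABI and the presence of C. */
uint32_t rv_elf_eflags(ElfFloatAbi abi, int has_c) {
    uint32_t flags = has_c ? EF_RISCV_RVC : 0u;
    if (abi == FA_SINGLE) flags |= EF_RISCV_FLOAT_ABI_SINGLE;
    if (abi == FA_DOUBLE) flags |= EF_RISCV_FLOAT_ABI_DOUBLE;
    return flags;
}

/* Reader side: recover the float ABI from e_flags. */
ElfFloatAbi rv_elf_float_abi(uint32_t e_flags) {
    switch (e_flags & EF_RISCV_FLOAT_ABI) {
    case EF_RISCV_FLOAT_ABI_SOFT:   return FA_SOFT;
    case EF_RISCV_FLOAT_ABI_SINGLE: return FA_SINGLE;
    case EF_RISCV_FLOAT_ABI_DOUBLE: return FA_DOUBLE;
    default:                        return FA_UNKNOWN; /* quad: not handled here */
    }
}
```

An ilp32f object built with the C extension therefore carries e_flags 0x3 and an ilp32 one carries 0x1, which is the distinction the write-then-read round-trip can assert on.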
WS6 — 64-bit-int + soft-float-double legalization (hardest part)
The cg layer (src/cg/arith.c) only routes wide ops to libcalls for the __int128 builtin
(api_i128_stack_top), never by width — so long long on rv32 currently reaches the
backend as a raw 8-byte value, and double arithmetic would emit illegal .d ops.
- 64-bit integers on rv32: generalize the i128 libcall mechanism in src/cg/arith.c to a "wider than target word" predicate (type_size > c->target.ptr_size). Recommended v1: route mul/div/udiv/mod/shifts to runtime libcalls (__muldi3, __divdi3, __udivdi3, __moddi3, __ashldi3, __lshrdi3, __ashrdi3); do add/sub/and/or/xor/load/store/move inline as register pairs in the backend (these are unavoidable for memory/arg traffic). Add a loud panic in rv_binop/rv_convert if a wide value reaches the native-width path, so any missed case fails fast.
- Soft-float double on ilp32f/ilp32: route double arithmetic and double↔int/float conversions to libcalls (__adddf3, __subdf3, __muldf3, __divdf3, __extendsfdf2, __truncdfsf2, __fixdfsi, __floatsidf, df compares) — mirror the existing f128 path so the backend only ever sees float (S) FP ops. Backend panics on any RV_FMT_D selection when xlen==32.
- Confirm long double == double (8B) and __int128 absent on rv32 (runtime sets INT128=0, no LDBL128), so the 16-byte scalar classify path is effectively dead there.
- Gate: red-green targeted tests — long long add/mul/div and double add/mul/convert compile to plausible sequences (verified via decode/disasm; behavior via qemu if available).
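The routing decision for 64-bit integers reduces to a width predicate plus an op-to-libcall table. The sketch below uses the libcall names listed above; the WideOp enum and both function names are hypothetical, and returning NULL to mean "lower inline as a lane pair" is a convention of this sketch only.

```c
#include <stddef.h>

/* Hypothetical op tags for the sketch; kit's real IR ops are not shown here. */
typedef enum {
    W_ADD, W_SUB, W_AND, W_OR, W_XOR,
    W_MUL, W_SDIV, W_UDIV, W_SMOD, W_SHL, W_LSHR, W_ASHR
} WideOp;

/* "Wider than target word": an 8-byte value on a 4-byte-pointer target. */
int is_wider_than_word(unsigned type_size, unsigned ptr_size) {
    return type_size > ptr_size;
}

/* Libcall name for a 64-bit integer op, or NULL when the op is done inline. */
const char* wide_i64_libcall(WideOp op) {
    switch (op) {
    case W_MUL:  return "__muldi3";
    case W_SDIV: return "__divdi3";
    case W_UDIV: return "__udivdi3";
    case W_SMOD: return "__moddi3";
    case W_SHL:  return "__ashldi3";
    case W_LSHR: return "__lshrdi3";
    case W_ASHR: return "__ashrdi3";
    default:     return NULL; /* add/sub/and/or/xor: inline two-lane ops */
    }
}
```

Keeping the names in one table also makes the libcall contract between WS6 and WS7 easy to diff against the symbols the runtime archive actually exports.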
WS7 — Runtime build wiring (mk/rt.mk)
- The riscv32-elf/riscv32-elf-save-restore variants exist but are wrong: -mabi=ilp32 -march=rv32imafd (D present). Fix to the confirmed profile and add the hard-float variant:
  - riscv32-elf (ilp32, soft): -mabi=ilp32 -march=rv32imac.
  - riscv32-elf-hardfloat (ilp32f): -mabi=ilp32f -march=rv32imafc.
  - Both keep ABI=ilp32 (the integer layout → rt/lib/include/ilp32_le; f only affects FP arg passing), INT128=0, CORO=riscv32.
- Mandatory builtins are already selected: RT_ABI_SRCS_ilp32 = rt/lib/int32/int32.c (64-bit int helpers) and rt/lib/fp/fp.c (soft double). Verify the df soft-float ops compile for the rv32 target.
- mk/lib_srcs.mk: widen the ABI/reloc source guards to include KIT_ARCH_RV32_ENABLED; add reloc_riscv32.c to the ELF source group.
- Gate: kit cc -target riscv32-none-elf -c rt/lib/.../smoke builds; make rt produces the rv32 runtime variants.
WS8 — JIT run / dbg
kit run/dbg execute JIT bytes natively in-process (run.c entry_fn(...)); there is
no cross-arch execution path (emulator is out of scope). So on a non-rv32 host, rv32 code
cannot be executed — same situation as rv64's existing JIT test, which builds the image and
skips the call (exit 77).
- src/link/link_jit.c: audit only — it is XLEN-neutral and patches via shared R_RV_* reloc kinds; the only u64/TLV slots are Mach-O-guarded (ELF never reaches them). No change expected, provided WS2/WS6 emit the same reloc kinds.
- rv32_dbg_ops from WS2 (RVC-aware lengths, step-over fallback).
- v1 deliverable: JIT image build + relocation + symbol lookup wired and unit-tested without execution; native execution host-gated to rv32 hosts.
WS9 — Tests & verification (see Verification below)
Parallel workstream map
Much of this is separable. Lock a small set of shared interfaces first (Phase A), then five tracks proceed in parallel (Phase B), converging at integration (Phase C). The critical path is Phase A → Track 1 (the backend chain WS1→WS2→WS3); ELF32 (Track 2) is the largest effort but is parallel, so starting it immediately keeps it off the wall-clock.
Phase A — shared contracts (serial, small, land first; unblocks everyone):
- WS0 RiscvVariant + riscv_variant_for_kind + KIT_ARCH_RV32_ENABLED.
- WS4a the float-ABI interface only: KitFloatAbi enum, KitTargetSpec.float_abi, KitTargetOptions.abi, and the -mabi/-mcmodel parse → resolve → validate plumbing (driver/lib/target.c, driver/cmd/cc.c, src/api/core.c). No classifier change yet.
The five contracts everyone codes against (freeze these in Phase A):
- RiscvVariant fields (XLEN/ptr_bytes/gp_slot_bytes/has_w_forms/shamt_bits/frame_save_size) — consumed by Track 1.
- float_abi on the spec — consumed by Track 2 (e_flags), Track 3 (FP-eligibility), Track 5 (soft-double), and WS3 (predefined macros).
- Reloc-kind list: the exact R_RV_* kinds rv32 codegen emits = the set rv32 ELF maps and link_jit expects (= existing rv64 set minus R_*64/ADD64/SUB64). Track 1 ↔ Track 2.
- Runtime libcall names (__adddf3, __muldf3, __fixdfsi, __floatsidf, __extendsfdf2, __truncdfsf2, __muldi3, __divdi3, __udivdi3, __moddi3, __ashldi3, __lshrdi3, __ashrdi3) emitted by WS6 = provided by WS7. Track 5 ↔ Track 4.
- ABI part-layout: i64/soft-double → even-aligned GPR pair; gp_slot_size=4; callee-save stride. Track 3 publishes it via the vtable; Track 1's native-frame code consumes it.
Phase B — parallel tracks (each independently testable):
- Track 1 — Backend (critical path, serial within): WS1 (rename + thread variant, rv64 byte-identical) → WS2 (ISA/asm/disasm/link/dbg XLEN param) → WS3 (rv32 ArchImpl + -march + macros). Gate per step against rv64 regression, then rv32 mc/disas round-trip.
- Track 2 — ELF32 (WS5): fully independent of codegen — develop and test the ELFCLASS32 writer/reader via a hand-built ObjBuilder for KIT_ARCH_RV32 (test/elf/unit/rv32_class32.c write→read roundtrip). Only consumes float_abi (e_flags) + the reloc list. Largest effort; start day one.
- Track 3 — ABI classifier (WS4b): the shared RISC-V classifier + rv32_vtable, parameterized by the descriptor. Independent of codegen — test via test/api/abi_classify_test.c for ilp32f and ilp32. Consumes RiscvVariant/float_abi.
- Track 4 — Runtime (WS7): mk/rt.mk fixes (correct -march/-mabi, add hardfloat variant) + mk/lib_srcs.mk guards. The edits are independent and land early; the make rt validation gates on Track 1 codegen.
- Track 5 — cg legalization (WS6): wide-int + soft-double → libcall routing in src/cg/arith.c, keyed on ptr_size/float_abi. Logic is independent; end-to-end validation needs Track 1 + Track 4. Highest correctness risk — design early against the libcall contract.
Phase C — integration (after tracks converge):
- Register arch_impl_rv32 (Track 1 + Track 3). Wire object registry (Track 2).
- WS8 JIT run/dbg audit + rv32_dbg_ops (Track 1 + Track 2).
- WS9 end-to-end: decode/asm goldens, kit cc → ld → qemu smoke (all tracks + WS6 + WS7).
Verification
Verified execution oracle (clang + qemu-system, confirmed working on this host)
clang 22 has the riscv32 target and llvm-objdump/llvm-mc/ld.lld are installed.
qemu user-mode is not built on macOS — only qemu-system-riscv32 — which suits a
freestanding -none-elf target. A confirmed working recipe (PASS→exit 0, wrong answer→exit 7,
hang→exit 124), to be mirrored by test/smoke/rv32.sh:
- Build: clang --target=riscv32-unknown-elf -march=rv32imafc -mabi=ilp32f -nostdlib -ffreestanding (and an ilp32/rv32imac soft variant); link ld.lld -Ttext=0x80000000 -e _start.
- Startup stub (_start): set sp (RAM at 0x80000000); for ilp32f set mstatus.FS (li t0,0x2000; csrs mstatus,t0) to enable the FPU before any fadd.s — otherwise it traps and hangs. Soft ilp32 skips this.
- Result via SiFive test finisher at 0x100000: 0x5555 → qemu poweroff exit 0; 0x3333|(code<<16) → qemu exit code.
- Run: qemu-system-riscv32 -machine virt -bios none -kernel prog.elf -nographic -no-reboot (wrap in timeout).
Verified that clang emits the expected fadd.s + inline 64-bit add/sltu + fcvt.w.s for ilp32f, and llvm-readelf shows ELF32 / "single-float ABI" / RVC flags.
This is the kit smoke: kit cc -target riscv32-none-elf ... -c app.c, assemble the startup stub,
kit ld to an ELF, run under qemu-system, assert exit 0. Unlike rv64 (qemu-user/podman), rv32
uses qemu-system + a bare-metal startup + finisher device. regen-rv32.sh uses
clang --target=riscv32 + llvm-objdump for asm/disasm goldens.
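The exit oracle relies on the word written to the SiFive test finisher. A minimal sketch of that encoding follows; finisher_word and finisher_exit are hypothetical helper names, and the device pointer is passed in so the same code can be exercised on a host against an ordinary variable.

```c
#include <stdint.h>

#define FINISHER_PASS 0x5555u /* qemu powers off with exit status 0 */
#define FINISHER_FAIL 0x3333u /* qemu exits with the code in bits 31:16 */

/* Encode an exit code as the 32-bit word the finisher device expects. */
uint32_t finisher_word(int exit_code) {
    if (exit_code == 0) return FINISHER_PASS;
    return FINISHER_FAIL | ((uint32_t)(exit_code & 0xFFFF) << 16);
}

/* Report the result by storing the word to the device register. */
void finisher_exit(volatile uint32_t* dev, int exit_code) {
    *dev = finisher_word(exit_code);
}
```

On the bare-metal image the call would be finisher_exit((volatile uint32_t*)0x100000, code), with the address taken from the recipe above. A wrong answer reported as code 7 is written as 0x00073333.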
Milestones
kit has no in-process rv32 execution path (emulator out of scope), so behavioral correctness comes from the clang+qemu-system oracle above; structural correctness comes from self-consistency (decode↔format, ELF write↔read). Milestone order (each green before the next), preferring targeted runs and redirecting output to a file (per CLAUDE.md):
- Build/register: make lib 2>&1 | tee /tmp/build.log; target recognized.
- Decode/encode self-roundtrip — new test/arch/rv32_decode_test.c (mirror rv64_decode_test.c): no W-forms, lw/sw (no ld/sd), 5-bit shamt, c.jal, c.lw/c.sw, c.flw/c.fsw; decode↔format agreement is the oracle. make test-isa 2>&1 | tee /tmp/isa.log.
- Assembler/disasm corpus — test/asm/ rv32 lane + regen-rv32.sh (clang --target=riscv32-unknown-elf -march=rv32imafc -mabi=ilp32f + llvm-objdump as reference, maintainer-only, soft-skip if absent; committed goldens replayed by CI). make test-asm-rv32 2>&1 | tee /tmp/asm32.log.
- ELF32 round-trip — test/elf/unit/rv32_class32.c (first ELFCLASS32 consumer): write→read-back, assert EI_CLASS==ELFCLASS32, Elf32_Sym/Elf32_Rela survive. make test-elf 2>&1 | tee /tmp/elf.log.
- Compile + inspect (no execution): ./build/kit cc -target riscv32-none-elf -march=rv32imafc_zicsr_zifencei -mabi=ilp32f -c smoke.c -o /tmp/rv32.o then ./build/kit disas /tmp/rv32.o (optional cross-check llvm-objdump -d --triple=riscv32 /tmp/rv32.o).
- Link + JIT image — new test/link/rv32_jit_test.c (mirror rv64_jit_test.c, exit 77 on non-rv32 host; include a PC-relative reloc to exercise HI20/LO12 pairing). kit ld to a static ELF executable succeeds.
- qemu-system smoke — test/smoke/rv32.sh using the verified oracle above (qemu-system-riscv32 -machine virt, FPU-enabling startup for ilp32f, SiFive finisher exit codes). Compiles app.c with kit cc -target riscv32-none-elf, links with the startup stub, runs under qemu, asserts exit 0. This is the only behavioral oracle (soft-double and 64-bit-int correctness are otherwise untestable) — make it a required CI gate where qemu-system-riscv32 is present; skip-if-absent elsewhere. Add a doctor (test/lib/check_rv32_env.sh) like rv64's.
New make targets next to their rv64 peers in test/test.mk: RV32_DECODE_TEST_BIN (into
test-isa), test-asm-rv32, test-rv32-jit, test-smoke-rv32, and rv32 added to the
runtime test arch list.
RV64 regression gate (run after WS1 and again at the end):
make test-isa test-asm-rv64 test-smoke-rv64 test-link + rv64_jit_test.
Risks
- 64-bit-int + soft-double on rv32 (WS6) is the deepest, execution-only risk. Carry/borrow chains and soft-float rounding can't be checked by byte-goldens — only execution catches valid-but-wrong codegen. The behavioral oracle (qemu-system, verified) closes this, but depends on qemu-system-riscv32 being present and a correct FPU-enabling startup stub for ilp32f (a missing mstatus.FS set silently hangs instead of failing cleanly). Mitigate with qemu-gated differential tests (kit result vs host double/int64) and loud backend panics on any wide/.d value reaching the native path.
- ELFCLASS32 (WS5) is the dominant effort (~130 Elf64-hardcoded sites across emit/read/link). The write-then-read self-oracle catches internal inconsistency but not spec divergence; keep one clang-oracle cases/ rv32 ELF test for an independent cross-check. Disambiguating EM_RISCV by EI_CLASS is a cross-cutting correctness point.
- Sharing risk to RV64 (WS1/WS2): repurposing rv_is_64 semantics, the RV_FRAME_SAVE_SIZE constant → 2*ptr_bytes, and the compressed-quadrant/shamt branches all touch the working rv64 path. Land WS1/WS2 with rv64-byte-identical output and prove zero diff before enabling rv32.
- -mabi boundary: parsed in driver/, validated in src/api/core.c where feature words exist. Every spec-construction site that bypasses kit_target_new must default float_abi=DEFAULT safely; the catch-all -m consumer must not pre-eat -mabi.
- ilp32 vs ilp32f confusion: ilp32 is the integer ABI (type widths); the f is float arg-passing only. The runtime ABI=ilp32 include set is correct for both; the existing -march=rv32imafd (D) is wrong and must become rv32imafc/rv32imac.
- RVC dbg gap: rv32imafc emits compressed insns pervasively; v1 step-over fallback degrades kit dbg single-step for rv32. The shim unit test must assert the fallback path is taken.
Critical files
- src/arch/riscv/ (renamed from rv64/): variant.h (new), native.c, isa.c/.h, disasm.c, asm.c, link.c, dbg.c, arch.c (two ArchImpls, -march, macros).
- src/abi/abi_rv64.c → shared RISC-V classifier + rv32_vtable; src/abi/registry.c.
- src/cg/arith.c — wide-int + soft-double legalization (WS6, the riskiest, currently absent).
- src/obj/elf/{elf.h,emit.c,read.c,link.c} + new reloc_riscv32.c; src/obj/registry.c.
- include/kit/core.h (KitFloatAbi, KitTargetSpec.float_abi, KitTargetOptions.abi), include/kit/config.h (KIT_ARCH_RV32_ENABLED).
- driver/lib/target.c, driver/cmd/cc.c, driver/cmd/run.c (-mabi, medlow/medany); src/api/core.c (resolve/validate).
- src/arch/registry.c, mk/rt.mk, mk/lib_srcs.mk, test/test.mk + new test files.