kit

git clone https://git.ryansepassi.com/git/kit.git

Plan: RISC-V 32-bit (riscv32-none-elf) support

Status — 2026-06-03 (branch rv32) — core complete; cross-test gaps tracked

riscv32-none-elf (rv32imafc_zicsr_zifencei, both ilp32f and ilp32) is a working cross target. WS6 — the flagged "hardest part", 64-bit-value legalization — is done and behaviorally verified under qemu-system-riscv32 at -O0 and -O1 for both ABIs. The full kit toolchain (kit cc → kit ld → qemu-system) builds and runs a correct bare-metal rv32 image with no special flags (freestanding defaults to no-PIE). As of 2026-06-03 the rv32 runtime is no longer special-cased: kit cc/kit ld auto-build and auto-link libkit_rt.a for riscv32-none-elf exactly like every other target — the driver carries two rv32 runtime variants (riscv32-elf soft ilp32, riscv32-elf-hardfloat ilp32f), selected by the float ABI recovered from the objects' ELF e_flags, so no explicit archive or -nostdlib is needed. RV64 / x64 / aa64 fully non-regressed: asm goldens byte-identical, isa (rv64 21 + rv32 31)/0, abi-classify 367/0, elf 41/0, link 122/0 + x64 79/0, cg-api 544/0, smoke-rv64 3/0, dwarf/driver/interp green.

Both corpora now run on qemu-system-riscv32 as a cross arch: Toy 240 pass / 15 red (test/toy/run.sh, path X) and C 439 pass / 36 red (test/parse/run.sh, path E). The reds are deliberately left red (no skip sidecars) — they are the real remaining rv32 gaps, enumerated in the checklist below.

Done & verified ✅

Remaining ⚠️ — clear checklist

A. rv32 codegen gaps surfaced by the cross lanes (the reds — left red on purpose, no skips). Toy 240/15, C 439/36; the 51 reds cluster into:

B. Pre-existing follow-ups (orthogonal to the cross tests).

Out of scope (decided): kit ld ELF32 dynamic/PIE — rv32 is static-only; layout_dyn clean-panics on an ELF32 dynamic/PIE link and that is the intended behavior.

Where to look


Context

kit today targets riscv64 (LP64D) via a single backend in src/arch/rv64/. We want a new cross target:

--target=riscv32-none-elf
-march=rv32imafc_zicsr_zifencei
-mabi=ilp32f          (and also -mabi=ilp32, soft-float)
-mcmodel=medlow

This is a freestanding 32-bit RISC-V toolchain target. It has F (single-precision hardware float) but no D, so double is not native; with XLEN=32, long long is not native either. Both must be lowered. The enum KIT_ARCH_RV32, the riscv32 triple parse (driver/lib/target.c:275, ptr_size=4), ELF auto-detection (src/api/object_detect.c), and the runtime source files (rt/lib/riscv/rv32.S, rt/lib/coro/riscv32.c) already exist but are unwired/incomplete.

The intended outcome: kit cc/as/ld/objdump/disas produce and consume correct riscv32-none-elf ELFCLASS32 objects and static executables for both ilp32f and ilp32, with libkit_rt.a builtins available and the JIT run/dbg plumbing wired (native execution host-gated, as for rv64).

Confirmed scope decisions

XLEN-parameterization mechanism

Add a const RiscvVariant* descriptor (immutable, two static instances selected by KitArchKind) carried on the per-function codegen context and threaded into the otherwise stateless decode/asm/disasm/link/dbg paths. This honors "no global state — everything hangs off a context struct" (the variant is a const table reached through a context, never ambient).

New src/arch/riscv/variant.h:

typedef struct RiscvVariant {
  KitArchKind kind;        /* KIT_ARCH_RV32 / KIT_ARCH_RV64 */
  const char* name;        /* "rv32" / "rv64" */
  const char* isa_prefix;  /* "rv32" / "rv64" — for -march parsing */
  u8  xlen;                /* 32 / 64 */
  u8  ptr_bytes;           /* 4 / 8 — pointer & native register width */
  u8  gp_slot_bytes;       /* 4 / 8 — varargs save & callee-save stride */
  u8  has_w_forms;         /* 0 rv32 / 1 rv64 — ADDW/ADDIW/SLLIW/... */
  u8  shamt_bits;          /* 5 rv32 / 6 rv64 — SLLI/SRLI/SRAI immediate */
  u32 frame_save_size;     /* 2 * ptr_bytes (8 rv32 / 16 rv64) */
} RiscvVariant;
const RiscvVariant* riscv_variant_for_kind(KitArchKind);

Reached via: RvNativeTarget.variant (codegen), riscv_variant_for_kind(c->target.arch) in the decoder/assembler/disassembler/dbg (they already hold a Compiler*), and two LinkArchDesc literals for the linker. Keep three quantities apart that rv64 code can express with the same literal 8: ptr_bytes (pointer/register width), gp_slot_bytes (ABI save stride), and the per-register slot behind frame_save_size (the saved ra+s0 pair, 2 * ptr_bytes). Conflating them passes on rv64, where each slot is 8 bytes, and breaks on rv32.
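A minimal sketch of the two static instances and the lookup, assuming a two-value KitArchKind enum and local u8/u32 typedefs as stand-ins for kit's real definitions. The two offset helpers are hypothetical and exist only to show which field each kind of computation should read.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  u8;
typedef uint32_t u32;

/* Hypothetical stand-in for kit's real arch enum. */
typedef enum { KIT_ARCH_RV32, KIT_ARCH_RV64 } KitArchKind;

typedef struct RiscvVariant {
  KitArchKind kind;
  const char* name;
  const char* isa_prefix;
  u8  xlen;
  u8  ptr_bytes;
  u8  gp_slot_bytes;
  u8  has_w_forms;
  u8  shamt_bits;
  u32 frame_save_size;
} RiscvVariant;

/* Two immutable tables; callers reach them only through a context. */
static const RiscvVariant k_rv32 = {
  KIT_ARCH_RV32, "rv32", "rv32", 32, 4, 4, 0, 5, 8
};
static const RiscvVariant k_rv64 = {
  KIT_ARCH_RV64, "rv64", "rv64", 64, 8, 8, 1, 6, 16
};

const RiscvVariant* riscv_variant_for_kind(KitArchKind kind) {
  switch (kind) {
    case KIT_ARCH_RV32: return &k_rv32;
    case KIT_ARCH_RV64: return &k_rv64;
  }
  return NULL; /* not a RISC-V arch */
}

/* Hypothetical helper: the saved-ra slot is one pointer below the frame top. */
u32 saved_ra_offset(const RiscvVariant* v, u32 frame_size) {
  return frame_size - v->ptr_bytes;
}

/* Hypothetical helper: varargs save slots stride by the ABI slot size. */
u32 vararg_slot_offset(const RiscvVariant* v, u32 base, u32 n) {
  return base + n * v->gp_slot_bytes;
}
```

With a 32-byte frame, saved_ra_offset gives 28 on rv32 and 24 on rv64; a hardcoded 8 would give the rv64 answer for both.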

The float ABI (soft vs single-hard) is a separate axis from XLEN, carried on KitTargetSpec.float_abi (see WS4), consumed by the ABI classifier and predefined macros.
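Since the status note says the runtime variant is selected from the float ABI recovered from the objects' e_flags, the recovery step can be sketched as below. The EF_RISCV_* values are the ones defined by the RISC-V ELF psABI; the FloatAbi enum and both function names are hypothetical, and only the two runtime variant names come from this document.

```c
#include <stdint.h>

/* e_flags bits as defined by the RISC-V ELF psABI. */
#define EF_RISCV_RVC              0x0001u
#define EF_RISCV_FLOAT_ABI_MASK   0x0006u
#define EF_RISCV_FLOAT_ABI_SOFT   0x0000u
#define EF_RISCV_FLOAT_ABI_SINGLE 0x0002u
#define EF_RISCV_FLOAT_ABI_DOUBLE 0x0004u

/* Hypothetical enum; kit's real float_abi type may look different. */
typedef enum {
  FLOAT_ABI_SOFT,
  FLOAT_ABI_SINGLE,
  FLOAT_ABI_UNSUPPORTED
} FloatAbi;

/* Extract the float ABI; the RVC bit and other flags are ignored. */
FloatAbi float_abi_from_e_flags(uint32_t e_flags) {
  switch (e_flags & EF_RISCV_FLOAT_ABI_MASK) {
    case EF_RISCV_FLOAT_ABI_SOFT:   return FLOAT_ABI_SOFT;
    case EF_RISCV_FLOAT_ABI_SINGLE: return FLOAT_ABI_SINGLE;
    default:                        return FLOAT_ABI_UNSUPPORTED; /* double or quad */
  }
}

/* Map the ABI to the rv32 runtime variant names used in this plan. */
const char* rv32_rt_variant(FloatAbi abi) {
  switch (abi) {
    case FLOAT_ABI_SOFT:   return "riscv32-elf";
    case FLOAT_ABI_SINGLE: return "riscv32-elf-hardfloat";
    default:               return 0;
  }
}
```

An object built for ilp32f with compressed instructions would carry e_flags 0x3 under these definitions, which maps to the hardfloat variant.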

Workstreams (ordered; each leaves a green targeted check)

WS0 — Config + variant scaffold (no behavior change)

WS1 — Directory rename + thread variant through codegen (rv64 still identical)

WS2 — ISA / asm / disasm / link / dbg XLEN parameterization (still rv64-only at runtime)

WS3 — rv32 ArchImpl + registry + -march + predefined macros

WS4 — ABI vtable refactor + -mabi plumbing

WS5 — ELFCLASS32 object emission + reading (largest item)

Introduce one is32/ElfEnc flag (from c->target.ptr_size) threaded through, not copy-paste duplication.
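One way the single-flag approach could look, as a sketch only: the ElfEnc and SymRec shapes and all function names here are assumptions, not kit's actual types. The field orders and the r_info packing follow the standard Elf32/Elf64 layouts, which differ in more than width, so the flag has to select layout as well as size.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct ElfEnc { int is32; } ElfEnc; /* hypothetical shape */

/* Class-independent symbol record held in memory. */
typedef struct SymRec {
  uint32_t name;
  uint8_t  info, other;
  uint16_t shndx;
  uint64_t value, size;
} SymRec;

/* Write the low n bytes of v in little-endian order. */
size_t put_le(uint8_t* p, uint64_t v, size_t n) {
  for (size_t i = 0; i < n; i++) p[i] = (uint8_t)(v >> (8 * i));
  return n;
}

/* Returns bytes written: 16 for Elf32_Sym, 24 for Elf64_Sym. */
size_t elf_put_sym(const ElfEnc* e, uint8_t* out, const SymRec* s) {
  size_t o = 0;
  o += put_le(out + o, s->name, 4);
  if (e->is32) {
    /* Elf32_Sym order: name, value, size, info, other, shndx */
    o += put_le(out + o, s->value, 4);
    o += put_le(out + o, s->size, 4);
    o += put_le(out + o, s->info, 1);
    o += put_le(out + o, s->other, 1);
    o += put_le(out + o, s->shndx, 2);
  } else {
    /* Elf64_Sym order: name, info, other, shndx, value, size */
    o += put_le(out + o, s->info, 1);
    o += put_le(out + o, s->other, 1);
    o += put_le(out + o, s->shndx, 2);
    o += put_le(out + o, s->value, 8);
    o += put_le(out + o, s->size, 8);
  }
  return o;
}

/* r_info packs (symbol, type) as sym<<8|type8 in ELF32, sym<<32|type32 in ELF64. */
uint64_t elf_r_info(const ElfEnc* e, uint32_t sym, uint32_t type) {
  return e->is32 ? (((uint64_t)sym << 8) | (type & 0xffu))
                 : (((uint64_t)sym << 32) | type);
}
```

Keeping the record class-independent and branching only at the byte-writing edge is what avoids duplicating every emit/read site.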

WS6 — 64-bit-int + soft-float-double legalization (hardest part)

The cg layer (src/cg/arith.c) only routes wide ops to libcalls for the __int128 builtin (api_i128_stack_top), never by width — so long long on rv32 currently reaches the backend as a raw 8-byte value, and double arithmetic would emit illegal .d ops.
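To make the legalization target concrete, here is a small C model of 64-bit add and subtract carried out on two 32-bit halves, which is the usual shape of the inline lowering on a 32-bit RISC-V core (add/sltu/add for the carry chain). The Pair32 type and function names are hypothetical; this models the arithmetic only and says nothing about how kit's backend represents the pair.

```c
#include <stdint.h>

/* Hypothetical model of an i64 held in two 32-bit registers. */
typedef struct { uint32_t lo, hi; } Pair32;

Pair32 pair_split(uint64_t v) {
  Pair32 p = { (uint32_t)v, (uint32_t)(v >> 32) };
  return p;
}

uint64_t pair_join(Pair32 p) {
  return ((uint64_t)p.hi << 32) | p.lo;
}

Pair32 add64_pair(Pair32 a, Pair32 b) {
  Pair32 r;
  r.lo = a.lo + b.lo;             /* add  lo           */
  uint32_t carry = r.lo < a.lo;   /* sltu carry        */
  r.hi = a.hi + b.hi + carry;     /* add  hi, then add */
  return r;
}

Pair32 sub64_pair(Pair32 a, Pair32 b) {
  Pair32 r;
  uint32_t borrow = a.lo < b.lo;  /* sltu borrow       */
  r.lo = a.lo - b.lo;             /* sub  lo           */
  r.hi = a.hi - b.hi - borrow;    /* sub  hi, then sub */
  return r;
}
```

A differential check against the host's native uint64_t arithmetic (pair_join(add64_pair(...)) versus a + b) is the kind of test that catches a dropped carry, which byte-goldens cannot.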

WS7 — Runtime build wiring (mk/rt.mk)

WS8 — JIT run / dbg

kit run/dbg execute JIT bytes natively in-process (run.c entry_fn(...)); there is no cross-arch execution path (emulator is out of scope). So on a non-rv32 host, rv32 code cannot be executed — same situation as rv64's existing JIT test, which builds the image and skips the call (exit 77).
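The host gate described above might be sketched as follows. The function names and the entry_fn_t signature are hypothetical; __riscv and __riscv_xlen are the standard RISC-V compiler predefines, and 77 is the skip code named in the text.

```c
#define EXIT_SKIP 77 /* skip convention used by the existing rv64 JIT test */

/* Hypothetical entry signature for a JIT-built image. */
typedef int (*entry_fn_t)(void);

/* True only when this test binary itself was compiled for rv32. */
int host_can_run_rv32(void) {
#if defined(__riscv) && defined(__riscv_xlen) && (__riscv_xlen == 32)
  return 1;
#else
  return 0;
#endif
}

/* Build-only on foreign hosts: never touch the entry pointer, just skip. */
int run_or_skip(entry_fn_t entry) {
  if (!host_can_run_rv32()) return EXIT_SKIP;
  return entry();
}
```

The image is still built and linked before this point, so the test exercises everything except the final call on a non-rv32 host.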

WS9 — Tests & verification (see Verification below)

Parallel workstream map

Much of this is separable. Lock a small set of shared interfaces first (Phase A), then five tracks proceed in parallel (Phase B), converging at integration (Phase C). The critical path is Phase A → Track 1 (the backend chain WS1→WS2→WS3); ELF32 (Track 2) is the largest effort but runs in parallel, so starting it immediately keeps it off the critical path.

Phase A — shared contracts (serial, small, land first; unblocks everyone):

The five contracts everyone codes against (freeze these in Phase A):

  1. RiscvVariant fields (XLEN/ptr_bytes/gp_slot_bytes/has_w_forms/shamt_bits/frame_save_size) — consumed by Track 1.
  2. float_abi on the spec — consumed by Track 2 (e_flags), Track 3 (FP-eligibility), Track 5 (soft-double), and WS3 (predefined macros).
  3. Reloc-kind list: the exact R_RV_* kinds rv32 codegen emits = the set rv32 ELF maps and link_jit expects (= existing rv64 set minus R_*64/ADD64/SUB64). Track 1 ↔ Track 2.
  4. Runtime libcall names (__adddf3, __muldf3, __fixdfsi, __floatsidf, __extendsfdf2, __truncdfsf2, __muldi3, __divdi3, __udivdi3, __moddi3, __ashldi3, __lshrdi3, __ashrdi3) emitted by WS6 = provided by WS7. Track 5 ↔ Track 4.
  5. ABI part-layout: i64/soft-double → even-aligned GPR pair; gp_slot_bytes=4; callee-save stride. Track 3 publishes it via the vtable; Track 1's native-frame code consumes it.
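For contract 4, freezing the libcall names could be as simple as one shared table that both the emitter (WS6) and the runtime build (WS7) are checked against. The sketch below uses a hypothetical WideOp enum and lookup function; only the symbol names come from the list above, paired with their conventional libgcc/compiler-rt meanings.

```c
#include <stddef.h>

/* Hypothetical op enum; kit's cg layer will have its own opcodes. */
typedef enum {
  WOP_ADD_F64, WOP_MUL_F64,
  WOP_F64_TO_I32, WOP_I32_TO_F64,
  WOP_F32_TO_F64, WOP_F64_TO_F32,
  WOP_MUL_I64, WOP_SDIV_I64, WOP_UDIV_I64, WOP_SREM_I64,
  WOP_SHL_I64, WOP_LSHR_I64, WOP_ASHR_I64,
  WOP_COUNT
} WideOp;

/* One row per frozen name; a missing row would show up as NULL. */
static const char* const k_wide_libcall[WOP_COUNT] = {
  [WOP_ADD_F64]    = "__adddf3",
  [WOP_MUL_F64]    = "__muldf3",
  [WOP_F64_TO_I32] = "__fixdfsi",
  [WOP_I32_TO_F64] = "__floatsidf",
  [WOP_F32_TO_F64] = "__extendsfdf2",
  [WOP_F64_TO_F32] = "__truncdfsf2",
  [WOP_MUL_I64]    = "__muldi3",
  [WOP_SDIV_I64]   = "__divdi3",
  [WOP_UDIV_I64]   = "__udivdi3",
  [WOP_SREM_I64]   = "__moddi3",
  [WOP_SHL_I64]    = "__ashldi3",
  [WOP_LSHR_I64]   = "__lshrdi3",
  [WOP_ASHR_I64]   = "__ashrdi3",
};

/* Returns the runtime symbol for a wide op, or NULL if out of range. */
const char* wide_op_libcall(WideOp op) {
  return ((unsigned)op < (unsigned)WOP_COUNT) ? k_wide_libcall[op] : NULL;
}
```

A unit test that walks the enum and asserts every entry is non-NULL gives a cheap guard against the emitter and the runtime drifting apart.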

Phase B — parallel tracks (each independently testable):

Phase C — integration (after tracks converge):

Verification

Verified execution oracle (clang + qemu-system, confirmed working on this host)

clang 22 has the riscv32 target and llvm-objdump/llvm-mc/ld.lld are installed. qemu user-mode is not built on macOS — only qemu-system-riscv32 — which suits a freestanding -none-elf target. A confirmed working recipe (PASS→exit 0, wrong answer→exit 7, hang→exit 124), to be mirrored by test/smoke/rv32.sh:

This is the kit smoke: kit cc -target riscv32-none-elf ... -c app.c, assemble the startup stub, kit ld to an ELF, run under qemu-system, assert exit 0. Unlike rv64 (qemu-user/podman), rv32 uses qemu-system + a bare-metal startup + finisher device. regen-rv32.sh uses clang --target=riscv32 + llvm-objdump for asm/disasm goldens.
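For orientation, the exit-code side of the finisher device can be sketched as below. The address and magic values are assumptions taken from QEMU's sifive_test device as mapped on the virt machine (0x100000, PASS 0x5555, FAIL 0x3333 with the exit code in the upper half-word); they are not quoted from this repository, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed: QEMU virt machine's sifive_test device. */
#define FINISHER_ADDR 0x00100000u
#define FINISHER_FAIL 0x3333u
#define FINISHER_PASS 0x5555u

/* Word to store to the device so that qemu exits with `code`. */
uint32_t finisher_word(int code) {
  if (code == 0) return FINISHER_PASS;
  return ((uint32_t)code << 16) | FINISHER_FAIL;
}

/* Bare-metal only: performs the MMIO store, then spins until qemu exits. */
void finisher_exit(int code) {
  volatile uint32_t* dev = (volatile uint32_t*)(uintptr_t)FINISHER_ADDR;
  *dev = finisher_word(code);
  for (;;) { }
}
```

Under these assumptions a wrong answer reported as finisher_exit(7) makes qemu exit with status 7; a hang never reaches the store, so it only surfaces through the harness timeout.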

Milestones

kit has no in-process rv32 execution path (emulator out of scope), so behavioral correctness comes from the clang+qemu-system oracle above; structural correctness comes from self-consistency (decode↔format, ELF write↔read). Milestone order (each green before the next), preferring targeted runs and redirecting output to a file (per CLAUDE.md):

  1. Build/register: make lib 2>&1 | tee /tmp/build.log; target recognized.
  2. Decode/encode self-roundtrip — new test/arch/rv32_decode_test.c (mirror rv64_decode_test.c): no W-forms, lw/sw (no ld/sd), 5-bit shamt, c.jal, c.lw/c.sw, c.flw/c.fsw; decode↔format agreement is the oracle. make test-isa 2>&1 | tee /tmp/isa.log.
  3. Assembler/disasm corpus: test/asm/ rv32 lane + regen-rv32.sh (clang --target=riscv32-unknown-elf -march=rv32imafc -mabi=ilp32f + llvm-objdump as reference, maintainer-only, soft-skip if absent; committed goldens replayed by CI). make test-asm-rv32 2>&1 | tee /tmp/asm32.log.
  4. ELF32 round-trip: test/elf/unit/rv32_class32.c (first ELFCLASS32 consumer): write→read-back, assert EI_CLASS==ELFCLASS32, Elf32_Sym/Elf32_Rela survive. make test-elf 2>&1 | tee /tmp/elf.log.
  5. Compile + inspect (no execution): ./build/kit cc -target riscv32-none-elf -march=rv32imafc_zicsr_zifencei -mabi=ilp32f -c smoke.c -o /tmp/rv32.o then ./build/kit disas /tmp/rv32.o (optional cross-check llvm-objdump -d --triple=riscv32 /tmp/rv32.o).
  6. Link + JIT image — new test/link/rv32_jit_test.c (mirror rv64_jit_test.c, exit 77 on non-rv32 host; include a PC-relative reloc to exercise HI20/LO12 pairing). kit ld to a static ELF executable succeeds.
  7. qemu-system smoke: test/smoke/rv32.sh using the verified oracle above (qemu-system-riscv32 -machine virt, FPU-enabling startup for ilp32f, SiFive finisher exit codes). Compiles app.c with kit cc -target riscv32-none-elf, links with the startup stub, runs under qemu, asserts exit 0. This is the only behavioral oracle (soft-double and 64-bit-int correctness are otherwise untestable), so make it a required CI gate where qemu-system-riscv32 is present; skip-if-absent elsewhere. Add a doctor (test/lib/check_rv32_env.sh) like rv64's.
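The 5-bit shamt rule from milestone 2 is a good example of what the decode self-roundtrip has to pin down. The sketch below encodes SLLI and decodes it under a given shamt width; the function names are hypothetical, while the bit layout is the standard OP-IMM encoding (opcode 0x13, funct3 001, shamt starting at bit 20).

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode SLLI rd, rs1, shamt; accepts up to 6 shamt bits (rv64 width). */
uint32_t rv_encode_slli(unsigned rd, unsigned rs1, unsigned shamt) {
  return ((uint32_t)(shamt & 0x3fu) << 20) |
         ((uint32_t)(rs1 & 31u) << 15) |
         (1u << 12) |
         ((uint32_t)(rd & 31u) << 7) |
         0x13u;
}

/* Decode SLLI for a variant whose shamt field is shamt_bits wide (5 or 6). */
bool rv_decode_slli(uint32_t insn, unsigned shamt_bits, unsigned* shamt) {
  if ((insn & 0x7fu) != 0x13u) return false;     /* not OP-IMM */
  if (((insn >> 12) & 7u) != 1u) return false;   /* not funct3 001 */
  /* Every bit above the shamt field must be zero for SLLI. */
  if ((insn >> (20 + shamt_bits)) != 0) return false;
  *shamt = (insn >> 20) & ((1u << shamt_bits) - 1u);
  return true;
}
```

An SLLI with shamt 33 sets bit 25, so it decodes with shamt_bits = 6 and is rejected with shamt_bits = 5; a decoder that hardcodes the rv64 width would accept it silently on rv32.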

New make targets next to their rv64 peers in test/test.mk: RV32_DECODE_TEST_BIN (into test-isa), test-asm-rv32, test-rv32-jit, test-smoke-rv32, and rv32 added to the runtime test arch list.

RV64 regression gate (run after WS1 and again at the end): make test-isa test-asm-rv64 test-smoke-rv64 test-link + rv64_jit_test.

Risks

  1. 64-bit-int + soft-double on rv32 (WS6) is the deepest, execution-only risk. Carry/borrow chains and soft-float rounding can't be checked by byte-goldens; only execution catches valid-but-wrong codegen. The behavioral oracle (qemu-system, verified) closes this, but depends on qemu-system-riscv32 being present and on a correct FPU-enabling startup stub for ilp32f (if the stub does not set mstatus.FS, the image silently hangs instead of failing cleanly). Mitigate with qemu-gated differential tests (kit result vs host double/int64) and loud backend panics on any wide/.d value reaching the native path.
  2. ELFCLASS32 (WS5) is the dominant effort (~130 Elf64-hardcoded sites across emit/read/link). The write-then-read self-oracle catches internal inconsistency but not spec divergence; keep one clang-oracle cases/ rv32 ELF test for an independent cross-check. Disambiguating EM_RISCV by EI_CLASS is a cross-cutting correctness point.
  3. Sharing risk to RV64 (WS1/WS2): repurposing rv_is_64 semantics, the RV_FRAME_SAVE_SIZE constant→2*ptr_bytes, and the compressed-quadrant/shamt branches all touch the working rv64 path. Land WS1/WS2 with rv64-byte-identical output and prove zero diff before enabling rv32.
  4. -mabi boundary: parsed in driver/, validated in src/api/core.c where feature words exist. Every spec-construction site that bypasses kit_target_new must default float_abi=DEFAULT safely; the catch-all -m consumer must not pre-eat -mabi.
  5. ilp32 vs ilp32f confusion: ilp32 is the integer ABI (type widths); the f is float arg-passing only. The runtime ABI=ilp32 include set is correct for both; the existing -march=rv32imafd (D) is wrong and must become rv32imafc/rv32imac.
  6. RVC dbg gap: rv32imafc emits compressed insns pervasively; v1 step-over fallback degrades kit dbg single-step for rv32. The shim unit test must assert the fallback path is taken.
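On risk 2, the EI_CLASS disambiguation is small enough to sketch. The constants are the standard ELF values (EI_CLASS at e_ident[4], e_machine at byte offset 18 in both classes, EM_RISCV = 243); the DetectedArch enum and function name are hypothetical and do not reflect the actual contents of src/api/object_detect.c.

```c
#include <stddef.h>
#include <stdint.h>

#define EI_CLASS    4
#define EI_DATA     5
#define ELFCLASS32  1
#define ELFCLASS64  2
#define ELFDATA2LSB 1
#define EM_RISCV    243

typedef enum { DET_NONE, DET_RV32, DET_RV64 } DetectedArch; /* hypothetical */

/* Classify a buffer as rv32 or rv64 from the first 20 header bytes. */
DetectedArch detect_riscv_class(const uint8_t* p, size_t n) {
  if (n < 20) return DET_NONE;
  if (p[0] != 0x7f || p[1] != 'E' || p[2] != 'L' || p[3] != 'F') return DET_NONE;
  if (p[EI_DATA] != ELFDATA2LSB) return DET_NONE; /* little-endian only */

  /* e_machine follows the 16-byte e_ident and the 2-byte e_type. */
  unsigned machine = (unsigned)p[18] | ((unsigned)p[19] << 8);
  if (machine != EM_RISCV) return DET_NONE;

  /* Same e_machine for both XLENs, so the class byte decides. */
  switch (p[EI_CLASS]) {
    case ELFCLASS32: return DET_RV32;
    case ELFCLASS64: return DET_RV64;
    default:         return DET_NONE;
  }
}
```

Because e_machine alone cannot tell the two apart, any code path that switches on EM_RISCV without also reading EI_CLASS is a candidate site for this bug.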

Critical files