kit

git clone https://git.ryansepassi.com/git/kit.git

Arch-Backend Completeness (planned work)

This roadmap consolidates the remaining native-backend work across the three machine-code targets (aa64, x64, rv64). The bulk of the NativeTarget port -- the single-pass (-O0) path and the known-frame (-O1) path for all three arches, plus the asm/disasm/link-reloc/dwarf matrix -- is already in tree and is treated here as the baseline, not as planned work. What follows is the genuinely-open follow-up: per-arch hooks where x64/rv64 still trail the aa64 reference, prologue/epilogue and tail-call cost-model parity, and a small set of niche asm/disasm and debugger gaps. The backend abstraction and ABI layer this work sits behind are documented in the design set: ../CODEGEN.md, ../OPT.md, ../ASM.md, ../DWARF.md.

Baseline (done -- context, not planned work)

The items below are what is not yet at aa64 parity.

1. Tail-call realization on x64 and rv64 (blocker-check removal)

aa64 realizes sibling (tail) calls whenever the outgoing stack-argument area fits the caller's incoming parameter window -- aa_no_tail returns a blocker only on that size check. x64 and rv64 apply the same size check (x64 also accounts for the shadow-space prefix), and the restore-before-jump machinery already exists on both: x64_emit_tail_site / rv_emit_tail_site emit the callee-save restore and frame teardown ahead of the tail jump exactly the way aa64 does. What remains is conservatism in the realizability gate -- x64_no_tail and rv_no_tail still bail out with "callee-saved registers in use" whenever the function saves any callee-saved register (frame.ncallee_saves != 0), so those functions fall back to a normal call + return even though the tail site could handle them.
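The difference between the two gates can be sketched in C. This is a minimal illustration, not the in-tree code: the TailSiteInfo struct, its field names, and both function names are hypothetical, and only the blocker string "callee-saved registers in use" and the ncallee_saves != 0 condition come from the text above. The order of the two checks is also an assumption.

```c
#include <stddef.h>

/* Hypothetical, simplified view of what a realizability gate inspects. */
typedef struct {
    unsigned outgoing_stack_bytes;  /* stack-argument area the tail callee needs */
    unsigned incoming_window_bytes; /* caller's own incoming parameter window */
    unsigned ncallee_saves;         /* callee-saved registers the frame spills */
} TailSiteInfo;

/* Size check only, the shape described for aa64.
 * Returns NULL when the tail call is realizable, else a blocker string. */
static const char *no_tail_size_only(const TailSiteInfo *t) {
    if (t->outgoing_stack_bytes > t->incoming_window_bytes)
        return "outgoing stack args exceed incoming window";
    return NULL;
}

/* Size check plus the conservative callee-save bail-out described for
 * x64/rv64. Dropping the first test is the change item 1 asks for. */
static const char *no_tail_conservative(const TailSiteInfo *t) {
    if (t->ncallee_saves != 0)
        return "callee-saved registers in use";
    return no_tail_size_only(t);
}
```

For a site whose arguments fit but whose function spills two callee-saved registers, the size-only gate returns NULL while the conservative gate returns a blocker. That is the class of function that currently falls back to call + return.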

This is the single largest aa64-vs-rest divergence and matters most for the recursion-heavy / interpreter-dispatch workloads that the O(1)-tail-call work targets (see the interpreter and toy musttail tracks).

2. Prologue / epilogue cost-model parity (per-call overhead)

The fixed per-call overhead -- prologue + epilogue + arg setup, independent of the body -- is the dominant cost on call-heavy code. aa64 picks one of four frame shapes per function to minimize it. x64 and rv64 now select a cheaper known-frame shape too (see Done below). The design rationale lives in ../ARCH.md; the aa64 measurements and the remaining body-level warts are tracked alongside ../OPT.md and OPTIMIZER.md.

aa64 tiers (baseline, for reference):

| tier | when | fixed insns |
|---|---|---|
| slim_prologue (Tier A) | no callee-saves, no alloca, no body slots, no outgoing stack | 3 (optimal) |
| fp_at_bottom | >=1 callee-save/body slot, no outgoing stack args, frame <= 504 | 5 (optimal) |
| slim_small_frame | as above but with outgoing stack args | 7 |
| fat | large frame / alloca / big saved-pair offset | 7+ |
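Read as a decision procedure, the tier table above might look like the following C sketch. The FrameFacts struct, its fields, and pick_frame_shape are hypothetical names for illustration. The sketch assumes the frame <= 504 bound also applies to slim_small_frame ("as above"), and it omits the "big saved-pair offset" condition, which the table does not quantify.

```c
#include <stdbool.h>

typedef enum { SLIM_PROLOGUE, FP_AT_BOTTOM, SLIM_SMALL_FRAME, FAT } FrameShape;

/* Hypothetical summary of the per-function facts the table keys on. */
typedef struct {
    unsigned ncallee_saves;        /* callee-saved registers to spill */
    unsigned nbody_slots;          /* spill/local slots used by the body */
    unsigned outgoing_stack_bytes; /* stack-passed outgoing arguments */
    unsigned frame_bytes;          /* final frame size */
    bool     has_alloca;           /* dynamic stack allocation present */
} FrameFacts;

static FrameShape pick_frame_shape(const FrameFacts *f) {
    bool needs_slots  = f->ncallee_saves != 0 || f->nbody_slots != 0;
    bool has_outgoing = f->outgoing_stack_bytes != 0;

    /* Large frames and alloca always take the general (fat) shape. */
    if (f->has_alloca || f->frame_bytes > 504)
        return FAT;
    /* Nothing to save and nothing to pass on the stack: 3 fixed insns. */
    if (!needs_slots && !has_outgoing)
        return SLIM_PROLOGUE;
    /* Saves or body slots but no outgoing stack args: 5 fixed insns. */
    if (!has_outgoing)
        return FP_AT_BOTTOM;
    /* Outgoing stack args within the small-frame bound: 7 fixed insns. */
    return SLIM_SMALL_FRAME;
}
```

The checks run from most to least restrictive, so each function lands in the cheapest shape whose preconditions it meets.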

The known-frame asymmetry (bottom-record only on the -O1 path) is intentional: the frame-size-dependent offsets require the frame to be final before the body, which only the optimizer's frame planner guarantees.

Leaf-ness is surfaced to the backends through NativeKnownFrameDesc.is_leaf (set in plan_frame, pass_native_emit.c, as "no IR_CALL of any kind -- regular or sibling/tail"). A leaf never clobbers the return-address register or the stack below sp, which is what unlocks the no-frame / red-zone shapes below.
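The leaf rule quoted above amounts to a single scan over the function's operations. The sketch below uses a hypothetical OpKind enum and compute_is_leaf helper; the real IR opcodes and the actual logic in plan_frame are not shown in this document, so treat the names as stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode set; only the two call kinds matter here. */
typedef enum { OP_ADD, OP_LOAD, OP_CALL, OP_TAIL_CALL, OP_RET } OpKind;

/* A function is a leaf when it contains no call of any kind,
 * regular or sibling/tail. */
static bool compute_is_leaf(const OpKind *ops, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (ops[i] == OP_CALL || ops[i] == OP_TAIL_CALL)
            return false;
    }
    return true;
}
```

Counting tail calls as non-leaf matches the definition above, even though a tail call does not return to the caller.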

Done:

Still open:

Body-level per-call warts from the aa64 study that are arch-shared and still open:

3. x64 debugger step-out / unwind

kit_dwarf_unwind_step has no memory provider, and x64 (unlike aa64/rv64, which have a link register) has no link-register fallback, so step-out can't recover the return address from the stack. Compounding this, the JIT debugger doesn't populate .eh_frame for in-process images.
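To make the gap concrete, here is a C sketch of what a memory-provider hook could look like. None of these names (ReadMemFn, StepInput, BufMem, recover_return_address) exist in kit as far as this document shows, and this is not the signature of kit_dwarf_unwind_step. The one architectural fact relied on is that under the x86-64 System V convention the return address sits at CFA - 8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memory-provider callback: copy len bytes at addr into out. */
typedef bool (*ReadMemFn)(void *ctx, uint64_t addr, void *out, size_t len);

typedef struct {
    uint64_t cfa;          /* canonical frame address of the current frame */
    bool     has_link_reg; /* true on link-register targets (aa64/rv64) */
    uint64_t link_reg;     /* link register value, when has_link_reg */
} StepInput;

/* Buffer-backed provider, useful for tests: maps [base, base+len). */
typedef struct { uint64_t base; const uint8_t *bytes; size_t len; } BufMem;

static bool buf_read(void *ctx, uint64_t addr, void *out, size_t len) {
    const BufMem *m = ctx;
    if (addr < m->base || len > m->len || addr - m->base > m->len - len)
        return false;
    memcpy(out, m->bytes + (addr - m->base), len);
    return true;
}

/* Link-register fallback where available; otherwise read the saved
 * return address from the stack through the provider. */
static bool recover_return_address(const StepInput *in, ReadMemFn read_mem,
                                   void *ctx, uint64_t *ra_out) {
    if (in->has_link_reg) {
        *ra_out = in->link_reg;
        return true;
    }
    if (read_mem == NULL)
        return false; /* the x64 failure mode described above */
    return read_mem(ctx, in->cfa - 8, ra_out, sizeof *ra_out);
}
```

With no provider and no link register the function fails, which is the x64 step-out case described above. The link-register branch is only a fallback: it is valid while the register still holds the caller's address, not after the callee has overwritten it.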

This is a debugging-UX robustness item with test-infra dependencies; see ../DBG.md and ../DWARF.md. Sibling debugger roadmap: DEBUG.md.

4. Niche assembler / disassembler gaps

These are in the standalone as / inline-asm() encode-decode paths only. The compiler's codegen emits machine code directly and never routes through the text assembler, and the shipped runtime .s/.S files don't use these forms, so none of this blocks any build. They are GNU-as / llvm-mc parity gaps for hand-written assembly. Design context: ../ASM.md.

5. Cross-cutting hygiene