kit

git clone https://git.ryansepassi.com/git/kit.git

commit 19bfa484a2cc61e639afe8823cd50dac1c46d04c
parent 7ea57a999f6e38192ca781416198d64c778cc276
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Wed, 27 May 2026 05:50:43 -0700

doc: update O1 recovery progress (aggregates mostly landed; tail-sret + asm remain)

Diffstat:
M doc/OPT_O1_PASSES.md | 38 ++++++++++++++++++++++++++++++++++----
1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/doc/OPT_O1_PASSES.md b/doc/OPT_O1_PASSES.md
@@ -515,10 +515,16 @@ Completeness — route all ops through the optimizer:
   in regs across the asm). Refactor aa64 asm clobber-mask / callee-save /
   restore helpers off `NativeDirectTarget` (same wrapper pattern as va).
   Toy cases: 102,104,105,108,110,19,20.
-- [ ] **Aggregates / sret / byval** — ABI lowering gaps in the optimizer path:
-  124 (slices, wrong value), 130 (record sret, wrong codegen), 36 ("scalar
-  too large" panic), 37 (tail sret). Covers aggregate params/results,
-  sret returns, and aggregate/by-value call arguments.
+- [~] **Aggregates / sret / byval** — mostly landed. DONE: aggregate locals
+  forced to frame; per-part ABI typing in plan_call/plan_ret; aggregate
+  results via copy_bytes; aggregate-typed IR_COPY/IR_LOAD/IR_STORE via
+  copy_bytes. 130 (record sret) and 124 (slices) now pass. REMAINING:
+  **tail call + sret** (36 musttail, 37 tail). Bug: in the tail+sret arg
+  shuffle, the first argument is loaded into x8, then x8 is overwritten with
+  the forwarded sret pointer before being moved to x0 — so x0 gets the sret
+  pointer instead of arg0 (see `aa_plan_call` tail/sret path + the tail-call
+  argument staging). Order the sret-x8 setup after the argument moves, or
+  stage args through temps that don't alias x8.
 - [ ] **BREAK_TO / CONTINUE_TO + SCOPE cond** — currently unused by frontends
   (toy/c lower break/continue to `BR`+labels), but unwired in emit. Either
   lower them to CFG edges in cg_ir_lower or wire emit, for true
@@ -549,3 +555,27 @@ Performance (priority 3, after completeness + correctness):
 - Varargs landed end-to-end on the optimizer path; `IR_ADDR_OF` writeback
   fixed. Bypass-disabled R-path failures: 14 → 11. Default R-path (O0+O1):
   408/408.
+  (commit "opt: route varargs through optimizer path; fix ADDR_OF spill writeback")
+- Aggregate ABI, partial (commit "opt: aggregate ABI lowering ... (partial)"):
+  - Force aggregate / >8-byte locals to frame in `cg_ir_lower` `lower_locals`
+    (a 16-byte struct result local was being allocated to a single PReg).
+  - Type each ABI part by its own width in `aa_plan_call`/`aa_plan_ret` direct
+    paths via `aa_part_scalar_type` (was using the aggregate type → truncating
+    `mov w0,w9` for i64 fields).
+  - `emit_call`/`emit_ret`: aggregate/oversized results use `copy_bytes` / hand
+    `plan_ret` the value's memory location directly (no scalar temp copy).
+  - Result: no more "scalar too large" panics; sret no longer truncates. But
+    values still wrong (130→0, 124→40, 36/37→160); still 11 bypass-off failures.
+
+- Aggregate COPY/LOAD/STORE via `copy_bytes` (commit "opt: handle
+  aggregate-typed COPY/LOAD/STORE via byte copy"): the root cause of the
+  truncated/mis-offset aggregate moves was that `IR_COPY`/`IR_LOAD`/`IR_STORE`
+  on aggregate-typed operands were emitted as scalar moves. 130 and 124 now
+  pass. Bypass-off R-path failures: 11 → 9 (7 asm + 2 tail-sret). Default
+  R-path: 408/408.
+
+Debugging aids: `CFREE_NO_DIRECT_REPLAY=1 cfree cc -O1 -c <case>.toy` +
+`cfree objdump -d`; `CFREE_DUMP=1` / `CFREE_DUMPCG=1` dump optimizer/CG IR
+(they `compiler_panic` on the first recorded function — temporarily swap the
+panic in `opt_dbg_dump_cg`/`opt_dbg_dump` for `cfree_debug_printf` to dump all
+functions). Note the CG-IR dumper does not print INDIRECT `ofs`.
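
The remaining tail-call + sret bug described in the diff is an ordering hazard on x8, and a toy model may make it easier to see. The sketch below is not the kit code generator. It models registers and frame slots as dictionary entries and replays three move sequences: the buggy order, the "sret-x8 setup last" order, and the "stage through a temp" order. The slot names `arg0_slot` and `sret_slot`, the helper `run`, and the choice of x9 as the non-aliasing temp are assumptions made for illustration. The only ABI facts relied on are that AAPCS64 passes the first integer argument in x0 and the indirect-result (sret) pointer in x8.

```python
# Toy model only: locations (registers and hypothetical frame slots) are
# dictionary keys, and a "move" copies the value of src into dst.
START = {
    "arg0_slot": "ARG0",      # hypothetical frame slot holding the first argument
    "sret_slot": "SRET_PTR",  # hypothetical slot holding the forwarded sret pointer
    "x0": None,               # AAPCS64: first integer argument register
    "x8": None,               # AAPCS64: indirect-result (sret) pointer register
    "x9": None,               # assumed scratch register that does not alias x8
}

# Buggy order from the commit notes: arg0 is staged in x8, x8 is then
# overwritten with the sret pointer, and only afterwards copied to x0.
BUGGY = [("x8", "arg0_slot"), ("x8", "sret_slot"), ("x0", "x8")]

# Fix 1: finish the argument moves first, set up x8 for sret last.
SRET_LAST = [("x8", "arg0_slot"), ("x0", "x8"), ("x8", "sret_slot")]

# Fix 2: stage the argument through a temp that does not alias x8.
VIA_TEMP = [("x9", "arg0_slot"), ("x8", "sret_slot"), ("x0", "x9")]


def run(moves, state):
    """Apply (dst, src) moves in order and return the resulting state."""
    state = dict(state)  # leave the caller's state untouched
    for dst, src in moves:
        state[dst] = state[src]
    return state


if __name__ == "__main__":
    for name, seq in [("buggy", BUGGY), ("sret last", SRET_LAST), ("via temp", VIA_TEMP)]:
        final = run(seq, START)
        print(f"{name}: x0={final['x0']} x8={final['x8']}")
```

In this model the buggy sequence leaves the sret pointer in both x0 and x8, which matches the symptom in the notes, while both alternative orders leave arg0 in x0 and the sret pointer in x8. With more than one argument, a real fix would also have to treat the remaining argument moves as a parallel move problem, which this sketch does not attempt.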