kit

git clone https://git.ryansepassi.com/git/kit.git

commit 22b8a80d7271c883acfddcfc5948b058c2c8e716
parent 89ec3480b48b5f7c95293b8df207700f11b3e7f0
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Fri, 29 May 2026 20:23:50 -0700

doc+build: round-trip covers the full core op set; wire L0+L1 into default test

Mark P2 branch relaxation, .L locals, data-section symbolization, FP/SIMD
load/store, and exclusive-atomic decode as done; record the full-core-op-set
coverage (852 lane-checks, 1 skip) and re-scope the remaining follow-ups
(assembler .bss NOBITS, FP reg-offset/q decode, section-symbol/TLS
symbolization, other arches, llvm differential).

Add test-asm-roundtrip (L0+L1, host-independent) to DEFAULT_TEST_TARGETS so
the round-trip runs in the default suite; test-asm-roundtrip-exec (L2) stays
opt-in (native arch).

Diffstat:
M doc/ASM_ROUNDTRIP_TESTING.md | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------
M test/test.mk                 | 1 +
2 files changed, 64 insertions(+), 20 deletions(-)

diff --git a/doc/ASM_ROUNDTRIP_TESTING.md b/doc/ASM_ROUNDTRIP_TESTING.md
@@ -6,13 +6,47 @@ round-tripping the **compiler's own output** rather than only a
 hand-written corpus. The corpus (`test/asm/`) only tests instructions we thought
 to write down; codegen output tests every instruction codegen actually emits.
 
-Status: plan + aa64 vertical slice landed (2026-05-29). Prereqs from the
-native-arch asm work are in (see `doc/NATIVE_ARCH_COMPLETENESS.md`): the
-assembler now parses the full relocation-operator syntax on all three arches,
-which is exactly what the `-S` symbolizer (Phase 2) must *emit*.
+Status: aa64 slice now covers the **full core op set** (2026-05-29). The corpus
+exercises every CG operation family — int/fp arith, bitwise, shifts, compares,
+unary, conversions (incl. bitcast), loads/stores of every width, control flow,
+switch (compare-chain + jump-table), indirect/recursive/stack-arg calls,
+aggregates (struct by-val/by-ref, bitfields, unions), globals/static-locals, and
+atomics (RMW / compare-exchange via the exclusive-monitor sequence) — and the
+round-trip passes all three lanes at `-O0` and `-O1`: **852 lane-checks pass,
+1 skip** (the lone skip, `glob_bss_write`, is the assembler `.bss`-NOBITS
+follow-up below). L0+L1 are wired into the default `make test` via
+`test-asm-roundtrip`; L2 stays opt-in (`test-asm-roundtrip-exec`, native arch).
 
 ### Implemented so far (aa64)
 
+- **P2 — same-section branch relaxation (DONE).** At `asm_parse` finalize the
+  assembler resolves branch relocations (JUMP26/CONDBR19/TSTBR14, never CALL26)
+  whose target is a defined local non-function symbol in the same section —
+  patching the displacement via `link_reloc_apply` with section-relative S/P and
+  dropping the reloc, matching codegen/GNU-as. L1 now covers control-flow-bearing
+  code (was auto-skipped). `src/asm/asm.c:relax_local_branches`.
+- **`.L` local symbols + data-section symbolization (DONE).** The assembler lexer
+  accepts `.L`-prefixed locals (incl. embedded dots, `.Lcfree_ro.0`) and the
+  `name.N` discriminator mangling (`acc.1`) as identifiers; the `-S` symbolizer
+  emits `.L` operands instead of numeric fallback. `emit_data_range` renders
+  relocated data as `.quad/.word sym+addend` (the inverse of the assembler's
+  `.quad`), so switch jump tables (R_ABS64 against the function) and global
+  pointer tables round-trip. L1 compares relocs across `.text/.rodata/.data`.
+- **FP/SIMD scalar load/store + unscaled ld/st family (DONE).** `p_ldst_core` /
+  `p_ldur_stur` now encode FP transfer registers (Bt..Qt, V=1) and the full
+  unscaled family (`sturb`/`ldurb`/`sturh`/`ldurh`/`ldursb`/`ldursh`/`ldursw`);
+  the disassembler decodes the signed unscaled loads (keying Wt/Xt on opc). This
+  unblocked every FP spill and conversion case.
+- **Exclusive / acquire-release atomic decode (DONE).** The assembler already
+  encoded `ldxr`/`ldaxr`/`stxr`/`stlxr`/`ldar`/`stlr` (+ b/h), but the
+  disassembler rendered them `.inst`, so the atomic RMW sequence codegen emits
+  for `_Atomic` was dropped by `cc -S`. Added `AA64_FMT_LDST_EXCL` +
+  `print_ldst_excl` and the matching decode rows. Found by an adversarial sweep
+  (atomics were the one core-op family the corpus fan-out missed); now
+  `roundtrip/atomic_{rmw,cas,ops}`.
+
+### Earlier vertical-slice notes (aa64)
+
 - **L0 decode-completeness** — `cc -S` already emits the distinct, re-assemblable
   marker `.inst 0x<word>` for an undecodable word (only `aa64_write_unknown`
   produces it), so the gate is "no `.inst` inside .text". No emitter change was
@@ -38,25 +72,34 @@ which is exactly what the `-S` symbolizer (Phase 2) must *emit*.
 
 ### Remaining (tracked here)
 
-- **P2 — assembler same-section branch relaxation (gates L1 for branchy code).**
-  Codegen resolves intra-function branches locally (no reloc); the assembler
-  emits a JUMP26/CONDBR19 reloc against the (local) label instead. So L1's
-  reloc-table comparison diverges for any function with control flow, and the
-  L1 lane auto-skips cases whose `-S` contains an `Lcf_` label. Fix: at
-  assembler finalize, for a branch reloc whose target symbol is defined in the
-  same section, compute the displacement, patch the instruction field (reuse
-  `link_reloc_apply`), and drop the reloc — matching GNU as / llvm-mc. Then L1
-  covers control flow too. (L0 and L2 already do.)
+- **Assembler `.bss` is PROGBITS, not NOBITS (the one corpus skip).** `cc -S`
+  renders a zero-init global as `.section .bss` + `.zero N`; `as` writes real
+  zero bytes and tracks position by byte count, so the symbol lands at offset 0
+  and the section emits `SHT_PROGBITS`. The round-tripped `.bss` then loads
+  read-only in the JIT image and a store faults (L0/L1 pass, L2 aborts —
+  `roundtrip/glob_bss_write`). Fix: NOBITS position-tracking in the assembler —
+  a `SEC_BSS`/`SSEM_NOBITS` section's symbol offsets and `.zero`/`.skip`/`.align`
+  must advance `bss_size` instead of `obj_write`ing bytes (the obj layer already
+  treats `SEC_BSS` specially in `obj_align_to`; `obj_pos`/`m_emit_fill`/
+  `process_label` need the matching NOBITS path). `glob_rw` covers the
+  global-write path via a `.data` global meanwhile.
+- **FP register-offset + 128-bit `q` decode.** The assembler now *encodes* FP
+  register-offset (`str d0,[x,x,lsl#3]`) and `q` ldr/str, but the disassembler
+  decodes neither (renders `.inst`). Codegen emits neither for scalar C (FP
+  array indexing computes the address in a GPR first), so the round-trip never
+  hits them; add the decode rows if a NEON/vector path later emits them.
 - **`.inst` is dropped by `as`** — `cfree as` accepts the `.inst` directive but
   emits no bytes for it, so an undecoded word would not round-trip at L1 (L0
   still flags it). `as` should emit the word (or error).
-- **Section-relative + TLS reloc symbolization** — `build_symref` skips
-  `.`-prefixed (section/local) symbol names; string-literal/static-local data
-  refs and TLS kinds fall back to numeric. Extend once `as` accepts those.
-- **Other arches** — the symbolizer switches on aa64 reloc kinds; x64/rv64 keep
-  the numeric `-S` output. Broaden per the RelocKind→syntax tables below.
-- **Default suite + differential** — wire L0/L1 into the default `make test`
-  once the corpus is broad; add the llvm-mc / llvm-objdump differential lanes.
+- **Section-relative + TLS reloc symbolization** — `build_symref` accepts `.L`
+  locals but still skips bare section symbols (`.text`) and TLS kinds, which
+  fall back to numeric. Extend once `as` accepts those operands.
+- **Other arches** — the symbolizer switches on aa64 reloc kinds, and the
+  branch-relaxation predicate lists only the aa64 branch kinds; x64/rv64 keep
+  the numeric `-S` output and current `as` behavior. Broaden per the
+  RelocKind→syntax tables below.
+- **Differential** — add the llvm-mc / llvm-objdump differential lanes over the
+  same `-S` output as a second-oracle cross-check.
 
 ## Background — what cfree can do today (verified)
 
diff --git a/test/test.mk b/test/test.mk
@@ -110,6 +110,7 @@ DEFAULT_TEST_TARGETS = \
 	test-asm \
 	test-asm-x64 \
 	test-asm-rv64 \
+	test-asm-roundtrip \
 	test-isa \
 	test-aa64-inline \
 	test-rv64-inline \