kit

git clone https://git.ryansepassi.com/git/kit.git

commit 7dba70b7a1cba47915a1a2c9129f2ce7b1fbdcd1
parent d12baa0e9445f2d0c31f0bdc7a422319296e037f
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Fri, 29 May 2026 18:24:52 -0700

asm: implement aa64 codegen round-trip testing (L0/L1/L2) + symbolize -S

Round-trips the compiler's own output through `cc -S | as` to measure
assembler/disassembler/reloc completeness (doc/ASM_ROUNDTRIP_TESTING.md),
rather than only a hand-written corpus.

Harness (test/asm/roundtrip.sh, C corpus in test/asm/roundtrip/):
- L0 decode-complete: `cc -S` emits no `.inst` (undecoded word) inside .text.
- L1 byte round-trip: `cc -c` vs `cc -S | as`; diff .text bytes + .text relocs.
- L2 exec equivalence: run direct vs round-tripped object (jit-runner), compare
  exit codes.
Targets test-disasm-complete / test-asm-roundtrip / test-asm-roundtrip-exec
(registered, not yet in the default suite). 28 pass / 8 skip on aa64.
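
A rough sketch of the L0 gate described above, assuming a made-up listing; `undecoded_in_text` and `sample_listing` are hypothetical names used only for illustration (the real check is `decode_failures` in test/asm/roundtrip.sh). It reports `.inst` lines only while `.text` is the current section:

```shell
# Hypothetical helper: print `.inst` lines seen while .text is current.
undecoded_in_text() {
  awk '
    /^[ \t]*\.text[ \t]*$/ { intext = 1; next }   # entering .text
    /^[ \t]*\.section/     { intext = 0; next }   # any other section
    intext && /^[ \t]*\.inst[ \t]/ { print }      # undecoded word in .text
  '
}

# Made-up listing: one undecoded word in .text, one `.inst` outside it.
sample_listing() {
  printf '%s\n' \
    '    .text' \
    'test_main:' \
    '    mov w0, #42' \
    '    .inst 0xdeadbeef' \
    '    ret' \
    '    .section .rodata' \
    '    .inst 0x00000000'
}

# Only the .text occurrence is reported.
sample_listing | undecoded_in_text
```

Tracking the current section is what keeps data-section words from counting as decode failures.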

-S symbolization (src/api/asm_emit.c), making `cc -S` re-assemblable:
- reloc-kind-keyed operand surgery: CALL26/JUMP26 -> `bl/b sym`,
  ADR_PREL_PG_HI21 -> `adrp Rd, sym`, ADR_GOT_PAGE -> `:got:`,
  ADD_ABS_LO12_NC -> `:lo12:`, LDST*_ABS_LO12_NC -> `[Rn, :lo12:sym]`,
  LD64_GOT_LO12_NC -> `:got_lo12:`.
- intra-function branch-label synthesis: `b/b.cc/cbz/cbnz` to in-section
  targets get a synthesized `Lcf_<sec>_<off>` label.
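
To make the two rewrites above concrete, here is a minimal shell sketch of the mapping only. `reloc_operand` and `synth_label` are hypothetical helpers invented for this note, not functions in the tree; the actual implementation is the C code in src/api/asm_emit.c.

```shell
# Hypothetical: reloc kind + symbol name -> symbolized operand text.
reloc_operand() {
  case "$1" in
    CALL26|JUMP26|ADR_PREL_PG_HI21) printf '%s\n' "$2" ;;            # bare symbol
    ADR_GOT_PAGE)                   printf ':got:%s\n' "$2" ;;
    ADD_ABS_LO12_NC)                printf ':lo12:%s\n' "$2" ;;
    LDST*_ABS_LO12_NC)              printf ':lo12:%s\n' "$2" ;;      # goes inside [Rn, ...]
    LD64_GOT_LO12_NC)               printf ':got_lo12:%s\n' "$2" ;;
    *) return 1 ;;                  # not symbolized: operand stays numeric
  esac
}

# Hypothetical: synthesized branch label, decimal section index and
# lowercase hex byte offset, i.e. Lcf_<sec>_<off>.
synth_label() { printf 'Lcf_%d_%x\n' "$1" "$2"; }

reloc_operand LDST32_ABS_LO12_NC g   # prints :lo12:g
synth_label 1 96                     # prints Lcf_1_60
```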

Assembler bug found+fixed by the harness: p_ldp_stp ignored the post-index
`[Rn], #imm` form, encoding it as the offset form; regression case
test/asm/encode/aa64_ldp_stp_index.
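
As an illustration of what the regression case distinguishes: in the A64 load/store-pair encoding, bits [25:23] select the addressing form (1 = post-index, 2 = signed offset, 3 = pre-index). The sketch below classifies three words taken from the first three entries of the `.expected.hex` file, byte-swapped from little-endian; `ldstp_form` is a hypothetical helper written for this note.

```shell
# Hypothetical helper: classify an A64 ldp/stp word by bits [25:23].
ldstp_form() {
  case $(( ($1 >> 23) & 7 )) in
    1) echo post-index ;;
    2) echo offset ;;
    3) echo pre-index ;;
    *) echo other ;;
  esac
}

ldstp_form 0xa9bf7bfd   # stp x29, x30, [sp, #-16]!  -> pre-index
ldstp_form 0xa8c17bfd   # ldp x29, x30, [sp], #16    -> post-index
ldstp_form 0xa90153f3   # stp x19, x20, [sp, #16]    -> offset
```

Before the fix, the second form would have been emitted with the offset encoding, which this kind of check would flag.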

L1 auto-skips functions with intra-function branches: the assembler relocates
same-section branch targets that codegen resolves locally, so the reloc tables
differ (P2 in the doc). L0/L2 cover those cases; x64/rv64 keep numeric -S.
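
The auto-skip decision can be sketched as follows, assuming the listing arrives on stdin; `l1_decision` is a hypothetical name, and the harness itself does the equivalent `grep` on the `-S` output file.

```shell
# Hypothetical helper: decide whether the L1 lane runs for a listing.
# A synthesized `Lcf_` label marks an intra-function branch, so L1 is skipped.
l1_decision() {
  if grep -q 'Lcf_'; then
    echo skip
  else
    echo run
  fi
}

printf '%s\n' 'test_main:' '    b.lt Lcf_1_60' 'Lcf_1_60:' '    ret' | l1_decision
printf '%s\n' 'test_main:' '    bl helper' '    ret' | l1_decision
```

Keying the skip on the label keeps branch-free cases (such as plain calls) in the L1 lane.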

Diffstat:
M doc/ASM_ROUNDTRIP_TESTING.md | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++----
M src/api/asm_emit.c | 425 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
M src/arch/aa64/asm.c | 2 ++
A test/asm/encode/aa64_ldp_stp_index.expected.hex | 1 +
A test/asm/encode/aa64_ldp_stp_index.s | 7 +++++++
A test/asm/encode/aa64_ldp_stp_index.targets | 1 +
A test/asm/roundtrip.sh | 250 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A test/asm/roundtrip/arith.c | 8 ++++++++
A test/asm/roundtrip/arith.expected | 1 +
A test/asm/roundtrip/call.c | 8 ++++++++
A test/asm/roundtrip/call.expected | 1 +
A test/asm/roundtrip/cond.c | 7 +++++++
A test/asm/roundtrip/cond.expected | 1 +
A test/asm/roundtrip/global.c | 6 ++++++
A test/asm/roundtrip/global.expected | 1 +
A test/asm/roundtrip/loop.c | 12 ++++++++++++
A test/asm/roundtrip/loop.expected | 1 +
M test/test.mk | 26 ++++++++++++++++++++++++++
20 files changed, 815 insertions(+), 10 deletions(-)

diff --git a/doc/ASM_ROUNDTRIP_TESTING.md b/doc/ASM_ROUNDTRIP_TESTING.md
@@ -6,10 +6,57 @@ round-tripping the **compiler's own output** rather than only a hand-written
 corpus. The corpus (`test/asm/`) only tests instructions we thought to write
 down; codegen output tests every instruction codegen actually emits.

-Status: plan only (2026-05-29). Prereqs from the native-arch asm work are in
-(see `doc/NATIVE_ARCH_COMPLETENESS.md`): the assembler now parses the full
-relocation-operator syntax on all three arches, which is exactly what the
-`-S` symbolizer (Phase 2) must *emit*.
+Status: plan + aa64 vertical slice landed (2026-05-29). Prereqs from the
+native-arch asm work are in (see `doc/NATIVE_ARCH_COMPLETENESS.md`): the
+assembler now parses the full relocation-operator syntax on all three arches,
+which is exactly what the `-S` symbolizer (Phase 2) must *emit*.
+
+### Implemented so far (aa64)
+
+- **L0 decode-completeness** — `cc -S` already emits the distinct, re-assemblable
+  marker `.inst 0x<word>` for an undecodable word (only `aa64_write_unknown`
+  produces it), so the gate is "no `.inst` inside .text". No emitter change was
+  needed for aa64. `test-disasm-complete` runs it at -O0 and -O1.
+- **Phase 2 reloc symbolization** — `src/api/asm_emit.c` now consults the
+  section reloc table and rewrites the covered operand into reloc-operator
+  syntax (CALL26/JUMP26 → `bl/b sym`; ADR_PREL_PG_HI21 → `adrp Rd, sym`;
+  ADR_GOT_PAGE → `:got:`; ADD_ABS_LO12_NC → `:lo12:`; LDST*_ABS_LO12_NC →
+  `[Rn, :lo12:sym]`; LD64_GOT_LO12_NC → `:got_lo12:`). Text-surgery keyed by
+  reloc kind, so register names the disassembler produced are preserved.
+- **Phase 2 branch-label synthesis** — intra-section branches (`b`/`b.cc`/`cbz`/
+  `cbnz`, no reloc) get a synthesized local label `Lcf_<sec>_<off>` and the
+  operand is rewritten to reference it. `-S` is therefore re-assemblable for
+  control flow. (Non-dot label spelling sidesteps the assembler not yet
+  accepting `.L`-prefixed identifiers as operands.)
+- **Harness** — `test/asm/roundtrip.sh` over a C corpus in `test/asm/roundtrip/`;
+  targets `test-disasm-complete` (L0), `test-asm-roundtrip` (L0+L1),
+  `test-asm-roundtrip-exec` (L0+L1+L2 via jit-runner). 28 pass / 8 skip on aa64.
+- **Bug found + fixed** — the round-trip immediately caught the aa64 assembler
+  encoding post-index `ldp/stp [Rn], #imm` as the offset form (`p_ldp_stp`
+  ignored `post_index`); fixed, with regression case
+  `test/asm/encode/aa64_ldp_stp_index`.
+
+### Remaining (tracked here)
+
+- **P2 — assembler same-section branch relaxation (gates L1 for branchy code).**
+  Codegen resolves intra-function branches locally (no reloc); the assembler
+  emits a JUMP26/CONDBR19 reloc against the (local) label instead. So L1's
+  reloc-table comparison diverges for any function with control flow, and the
+  L1 lane auto-skips cases whose `-S` contains an `Lcf_` label. Fix: at
+  assembler finalize, for a branch reloc whose target symbol is defined in the
+  same section, compute the displacement, patch the instruction field (reuse
+  `link_reloc_apply`), and drop the reloc — matching GNU as / llvm-mc. Then L1
+  covers control flow too. (L0 and L2 already do.)
+- **`.inst` is dropped by `as`** — `cfree as` accepts the `.inst` directive but
+  emits no bytes for it, so an undecoded word would not round-trip at L1 (L0
+  still flags it). `as` should emit the word (or error).
+- **Section-relative + TLS reloc symbolization** — `build_symref` skips
+  `.`-prefixed (section/local) symbol names; string-literal/static-local data
+  refs and TLS kinds fall back to numeric. Extend once `as` accepts those.
+- **Other arches** — the symbolizer switches on aa64 reloc kinds; x64/rv64 keep
+  the numeric `-S` output. Broaden per the RelocKind→syntax tables below.
+- **Default suite + differential** — wire L0/L1 into the default `make test`
+  once the corpus is broad; add the llvm-mc / llvm-objdump differential lanes.

 ## Background — what cfree can do today (verified)

diff --git a/src/api/asm_emit.c b/src/api/asm_emit.c
@@ -212,8 +212,390 @@ static CfreeStatus emit_zero_range(Writer* w, u32 size) {
   return w_newline(w);
 }

-static CfreeStatus emit_disasm_range(Writer* w, ArchDisasm* dasm,
-                                     const u8* data, u32 start, u32 end) {
+/* ---- Phase 2 symbolization: reloc-driven operand substitution ----------
+ *
+ * `cc -S` must be re-assemblable. The disassembler renders relocated operands
+ * numerically (e.g. `bl 0x10`, `adrp x16, 0x0`, `ldr w8, [x16]`), which would
+ * branch to the wrong place or load from address 0 on re-assembly. Here we
+ * consult the section's relocation table and rewrite the covered operand into
+ * the relocation-operator syntax the assembler parses (the inverse of
+ * src/arch/aa64/asm.c's parse_reloc_mod). See doc/ASM_ROUNDTRIP_TESTING.md.
+ *
+ * Operand text is rewritten in place rather than re-rendered from decoded
+ * fields, so the register names the disassembler produced are preserved and
+ * this layer stays free of per-arch register-naming knowledge. The reloc
+ * kind alone selects the modifier and the operand shape to patch. */
+
+typedef struct {
+  u32 offset;
+  u16 kind;
+  Sym sym;
+  i64 addend;
+} SecReloc;
+
+static int cmp_secreloc(const void* va, const void* vb) {
+  const SecReloc* a = (const SecReloc*)va;
+  const SecReloc* b = (const SecReloc*)vb;
+  if (a->offset < b->offset) return -1;
+  if (a->offset > b->offset) return 1;
+  return 0;
+}
+
+static SecReloc* collect_relocs(Compiler* c, ObjBuilder* ob, ObjSecId sec_id,
+                                u32* n_out) {
+  u32 total = obj_reloc_total(ob);
+  u32 n = 0, cap = 0, i;
+  SecReloc* arr = NULL;
+
+  *n_out = 0;
+  for (i = 0; i < total; ++i) {
+    const Reloc* r = obj_reloc_at(ob, i);
+    const ObjSym* s;
+    if (!r || r->removed) continue;
+    if (r->section_id != sec_id) continue;
+    if (n == cap) {
+      u32 ncap = cap ? cap * 2 : 8;
+      SecReloc* na = arena_array(c->tu, SecReloc, ncap);
+      if (!na) break;
+      if (arr) memcpy(na, arr, cap * sizeof(SecReloc));
+      arr = na;
+      cap = ncap;
+    }
+    s = obj_symbol_get(ob, r->sym);
+    arr[n].offset = r->offset;
+    arr[n].kind = r->kind;
+    arr[n].sym = s ? s->name : (Sym)0;
+    arr[n].addend = r->addend;
+    ++n;
+  }
+  if (n > 1) qsort(arr, n, sizeof(SecReloc), cmp_secreloc);
+  *n_out = n;
+  return arr;
+}
+
+/* First relocation whose offset lies within instruction [off, off+len). */
+static const SecReloc* reloc_in_range(const SecReloc* r, u32 n, u32 off,
+                                      u32 len) {
+  u32 i;
+  for (i = 0; i < n; ++i)
+    if (r[i].offset >= off && r[i].offset < off + len) return &r[i];
+  return NULL;
+}
+
+/* How a reloc kind is rendered into operand text. */
+typedef enum { SURG_NONE, SURG_TAIL, SURG_MEM } SurgKind;
+
+/* Map an aarch64 reloc kind to (operand modifier, surgery shape).
+ * Returns NULL for kinds this layer does not symbolize (caller keeps the
+ * numeric operand — honest, and the round-trip lane flags the gap). */
+static const char* reloc_modifier(u16 kind, SurgKind* surg) {
+  switch (kind) {
+    case R_AARCH64_CALL26:
+    case R_AARCH64_JUMP26:
+    case R_AARCH64_CONDBR19:
+    case R_AARCH64_ADR_PREL_PG_HI21:
+    case R_AARCH64_ADR_PREL_LO21:
+      *surg = SURG_TAIL;
+      return "";
+    case R_AARCH64_ADR_GOT_PAGE:
+      *surg = SURG_TAIL;
+      return ":got:";
+    case R_AARCH64_ADD_ABS_LO12_NC:
+      *surg = SURG_TAIL;
+      return ":lo12:";
+    case R_AARCH64_LDST8_ABS_LO12_NC:
+    case R_AARCH64_LDST16_ABS_LO12_NC:
+    case R_AARCH64_LDST32_ABS_LO12_NC:
+    case R_AARCH64_LDST64_ABS_LO12_NC:
+      *surg = SURG_MEM;
+      return ":lo12:";
+    case R_AARCH64_LD64_GOT_LO12_NC:
+      *surg = SURG_MEM;
+      return ":got_lo12:";
+    default:
+      *surg = SURG_NONE;
+      return NULL;
+  }
+}
+
+/* Build "<mod><sym>[+/-addend]" into buf. Returns length, or -1 if the symbol
+ * has no usable name (anonymous, or a `.`-prefixed section/local symbol the
+ * assembler's expression parser does not accept). */
+static int build_symref(char* buf, u32 cap, Compiler* c, const char* mod,
+                        Sym name, i64 addend) {
+  Slice s;
+  u32 p = 0, i;
+  if (!name) return -1;
+  s = pool_slice(c->global, name);
+  if (s.len == 0 || s.s[0] == '.') return -1;
+  for (i = 0; mod[i] && p + 1 < cap; ++i) buf[p++] = mod[i];
+  for (i = 0; i < s.len && p + 1 < cap; ++i) buf[p++] = s.s[i];
+  if (addend != 0) {
+    char num[24];
+    u32 nl = 0;
+    u64 mag = addend < 0 ? (u64)(-(addend)) : (u64)addend;
+    if (p + 1 < cap) buf[p++] = addend < 0 ? '-' : '+';
+    do {
+      num[nl++] = (char)('0' + (u32)(mag % 10));
+      mag /= 10;
+    } while (mag && nl < sizeof(num));
+    while (nl && p + 1 < cap) buf[p++] = num[--nl];
+  }
+  buf[p] = '\0';
+  return (int)p;
+}
+
+/* Write `ops` with the relocated operand rewritten to `symref`. `surg`
+ * selects the shape: TAIL replaces the last comma-separated component (or the
+ * whole operand if there is no comma); MEM rewrites the offset inside [...]. */
+static CfreeStatus w_symbolized(Writer* w, const char* ops, u32 olen,
+                                const char* symref, SurgKind surg) {
+  if (surg == SURG_TAIL) {
+    i32 last_comma = -1;
+    u32 i;
+    for (i = 0; i < olen; ++i)
+      if (ops[i] == ',') last_comma = (i32)i;
+    if (last_comma < 0) return w_str(w, symref);
+    {
+      CfreeStatus st = cfree_writer_write(w, ops, (u32)last_comma);
+      if (st != CFREE_OK) return st;
+      st = w_str(w, ", ");
+      if (st != CFREE_OK) return st;
+      return w_str(w, symref);
+    }
+  }
+  /* SURG_MEM: keep the base register, set the offset to symref. */
+  {
+    i32 lb = -1, rb = -1, base_end;
+    u32 i;
+    CfreeStatus st;
+    for (i = 0; i < olen; ++i) {
+      if (ops[i] == '[') lb = (i32)i;
+      else if (ops[i] == ']') rb = (i32)i;
+    }
+    if (lb < 0 || rb < 0 || rb < lb) /* unexpected shape; emit verbatim */
+      return cfree_writer_write(w, ops, olen);
+    base_end = lb + 1;
+    while (base_end < rb && ops[base_end] != ',') ++base_end;
+    st = cfree_writer_write(w, ops, (u32)(base_end)); /* "...[Rn" */
+    if (st != CFREE_OK) return st;
+    st = w_str(w, ", ");
+    if (st != CFREE_OK) return st;
+    st = w_str(w, symref);
+    if (st != CFREE_OK) return st;
+    st = w_str(w, "]");
+    if (st != CFREE_OK) return st;
+    /* trailing text after the close bracket (e.g. nothing for reloc'd ldst) */
+    if ((u32)(rb + 1) < olen)
+      return cfree_writer_write(w, ops + rb + 1, olen - (u32)(rb + 1));
+    return CFREE_OK;
+  }
+}
+
+/* ---- Phase 2 symbolization: intra-function branch labels ---------------
+ *
+ * Branches that stay within the section carry no relocation — the
+ * disassembler renders the resolved target numerically (`b 0x60`). The
+ * assembler rejects a numeric branch target, so re-assembly needs a label.
+ * We pre-scan the section for such branch targets, synthesize a local label
+ * at each, and rewrite the branch operand to reference it.
+ *
+ * Synthesized names are `Lcf_<secidx>_<hexoff>` — deliberately not `.L`
+ * prefixed, since the assembler's expression parser does not currently accept
+ * `.`-led identifiers as operands. The names are unique within the file. */
+
+static int cmp_u32(const void* va, const void* vb) {
+  u32 a = *(const u32*)va, b = *(const u32*)vb;
+  if (a < b) return -1;
+  if (a > b) return 1;
+  return 0;
+}
+
+/* PC-relative branch with an immediate label target: b, b.<cc>, cbz, cbnz.
+ * Excludes bl (a call — always relocated) and register-form branches. */
+static int is_local_branch_mnem(CfreeSlice m) {
+  if (m.len == 1 && m.s[0] == 'b') return 1;
+  if (m.len >= 2 && m.s[0] == 'b' && m.s[1] == '.') return 1;
+  if (m.len == 3 && memcmp(m.s, "cbz", 3) == 0) return 1;
+  if (m.len == 4 && memcmp(m.s, "cbnz", 4) == 0) return 1;
+  return 0;
+}
+
+/* Parse the trailing `0x<hex>` branch-target operand (the last comma-separated
+ * component). Returns 1 and the value on success. */
+static int parse_hex_tail(CfreeSlice ops, u64* out) {
+  i32 start = 0, p;
+  u64 v = 0;
+  u32 i;
+  int any = 0;
+  for (i = 0; i < ops.len; ++i)
+    if (ops.s[i] == ',') start = (i32)i + 1;
+  while (start < (i32)ops.len && (ops.s[start] == ' ' || ops.s[start] == '\t'))
+    ++start;
+  if (start + 2 > (i32)ops.len || ops.s[start] != '0' ||
+      (ops.s[start + 1] | 32) != 'x')
+    return 0;
+  for (p = start + 2; p < (i32)ops.len; ++p) {
+    char c = ops.s[p];
+    u32 d;
+    if (c >= '0' && c <= '9') d = (u32)(c - '0');
+    else if ((c | 32) >= 'a' && (c | 32) <= 'f') d = (u32)((c | 32) - 'a' + 10);
+    else break;
+    v = v * 16 + d;
+    any = 1;
+  }
+  while (p < (i32)ops.len && ops.s[p] == ' ') ++p;
+  if (!any || p != (i32)ops.len) return 0;
+  *out = v;
+  return 1;
+}
+
+typedef struct {
+  Compiler* c;
+  u32 secidx;
+  const SecReloc* relocs;
+  u32 nrelocs;
+  const SymLabel* labels;
+  u32 nlabels;
+  const u32* btargets;
+  u32 nbt;
+} EmitCtx;
+
+static u32 fmt_u64(char* buf, u32 p, u32 cap, u64 v, u32 base) {
+  char tmp[24];
+  u32 n = 0;
+  do {
+    u32 d = (u32)(v % base);
+    tmp[n++] = (char)(d < 10 ? '0' + d : 'a' + d - 10);
+    v /= base;
+  } while (v && n < sizeof tmp);
+  while (n && p + 1 < cap) buf[p++] = tmp[--n];
+  return p;
+}
+
+/* Synthesized label spelling, shared by definition and reference sites. */
+static u32 fmt_synth_label(char* buf, u32 cap, u32 secidx, u32 off) {
+  u32 p = 0;
+  const char* pre = "Lcf_";
+  u32 i;
+  for (i = 0; pre[i] && p + 1 < cap; ++i) buf[p++] = pre[i];
+  p = fmt_u64(buf, p, cap, secidx, 10);
+  if (p + 1 < cap) buf[p++] = '_';
+  p = fmt_u64(buf, p, cap, off, 16);
+  buf[p] = '\0';
+  return p;
+}
+
+/* Non-dot symbol name defined at `off`, or NULL. Such a symbol is used as the
+ * branch label directly (no synthesized label needed). */
+static Sym symbol_at(const EmitCtx* x, u32 off) {
+  u32 i;
+  for (i = 0; i < x->nlabels; ++i) {
+    if (x->labels[i].offset == off && x->labels[i].name) {
+      Slice s = pool_slice(x->c->global, x->labels[i].name);
+      if (s.len && s.s[0] != '.') return x->labels[i].name;
+    }
+  }
+  return (Sym)0;
+}
+
+/* Label name for a branch target offset: an existing symbol if one is defined
+ * there, else the synthesized `Lcf_...` name. */
+static u32 build_label_name(char* buf, u32 cap, const EmitCtx* x, u32 off) {
+  Sym sym = symbol_at(x, off);
+  if (sym) {
+    Slice s = pool_slice(x->c->global, sym);
+    u32 p = 0, i;
+    for (i = 0; i < s.len && p + 1 < cap; ++i) buf[p++] = s.s[i];
+    buf[p] = '\0';
+    return p;
+  }
+  return fmt_synth_label(buf, cap, x->secidx, off);
+}
+
+static int is_btarget(const EmitCtx* x, u32 off) {
+  u32 i;
+  for (i = 0; i < x->nbt; ++i)
+    if (x->btargets[i] == off) return 1;
+  return 0;
+}
+
+/* Pre-scan: collect in-section branch targets of un-relocated local branches. */
+static u32* collect_branch_targets(Compiler* c, ArchDisasm* dasm,
+                                   const SecReloc* relocs, u32 nrelocs,
+                                   const u8* data, u32 total, u32* n_out) {
+  u32* arr = NULL;
+  u32 n = 0, cap = 0, off = 0;
+
+  *n_out = 0;
+  while (off < total) {
+    CfreeInsn insn;
+    u32 nb = arch_disasm_decode(dasm, data + off, total - off, (u64)off, &insn);
+    u64 tgt;
+    if (nb == 0) {
+      off += 1;
+      continue;
+    }
+    if (!reloc_in_range(relocs, nrelocs, off, nb) &&
+        is_local_branch_mnem(insn.mnemonic) &&
+        parse_hex_tail(insn.operands, &tgt) && tgt < total) {
+      u32 j;
+      int found = 0;
+      for (j = 0; j < n; ++j)
+        if (arr[j] == (u32)tgt) {
+          found = 1;
+          break;
+        }
+      if (!found) {
+        if (n == cap) {
+          u32 nc = cap ? cap * 2 : 8;
+          u32* na = arena_array(c->tu, u32, nc);
+          if (!na) break;
+          if (arr) memcpy(na, arr, cap * sizeof(u32));
+          arr = na;
+          cap = nc;
+        }
+        arr[n++] = (u32)tgt;
+      }
+    }
+    off += nb;
+  }
+  if (n > 1) qsort(arr, n, sizeof(u32), cmp_u32);
+  *n_out = n;
+  return arr;
+}
+
+/* Emit an instruction's operands, symbolizing a covering relocation or an
+ * intra-section branch target when present. */
+static CfreeStatus emit_operands(Writer* w, const EmitCtx* x,
+                                 const CfreeInsn* insn, u32 off) {
+  const SecReloc* r;
+  if (!insn->operands.len) return CFREE_OK;
+  r = reloc_in_range(x->relocs, x->nrelocs, off, insn->nbytes);
+  if (r) {
+    SurgKind surg;
+    const char* mod = reloc_modifier(r->kind, &surg);
+    if (mod) {
+      char symref[256];
+      if (build_symref(symref, sizeof symref, x->c, mod, r->sym, r->addend) >= 0)
+        return w_symbolized(w, insn->operands.s, insn->operands.len, symref,
+                            surg);
+    }
+  } else if (is_local_branch_mnem(insn->mnemonic)) {
+    u64 tgt;
+    if (parse_hex_tail(insn->operands, &tgt) && is_btarget(x, (u32)tgt)) {
+      char name[256];
+      build_label_name(name, sizeof name, x, (u32)tgt);
+      return w_symbolized(w, insn->operands.s, insn->operands.len, name,
+                          SURG_TAIL);
+    }
+  }
+  return cfree_writer_write(w, insn->operands.s, insn->operands.len);
+}
+
+static CfreeStatus emit_disasm_range(Writer* w, const EmitCtx* x,
+                                     ArchDisasm* dasm, const u8* data, u32 start,
+                                     u32 end) {
   u32 off = start;
   CfreeStatus st;

@@ -221,7 +603,6 @@ static CfreeStatus emit_disasm_range(Writer* w, ArchDisasm* dasm,
     CfreeInsn insn;
     u64 vaddr = (u64)off;
     u32 n = arch_disasm_decode(dasm, data + off, end - off, vaddr, &insn);
-    u32 b;

     if (n == 0) {
       st = w_str(w, " .byte 0x");
@@ -241,13 +622,12 @@ static CfreeStatus emit_disasm_range(Writer* w, ArchDisasm* dasm,
     if (insn.operands.len) {
       st = w_str(w, "\t");
       if (st != CFREE_OK) return st;
-      st = cfree_writer_write(w, insn.operands.s, insn.operands.len);
+      st = emit_operands(w, x, &insn, off);
       if (st != CFREE_OK) return st;
     }
     st = w_newline(w);
     if (st != CFREE_OK) return st;

-    (void)b;
     off += n;
   }
   return CFREE_OK;
@@ -274,6 +654,11 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder,
     ArchDisasm* dasm;
     const u8* flat_data;
     u8* heap_data;
+    SecReloc* relocs;
+    u32 nrelocs;
+    u32* btargets;
+    u32 nbt, bi;
+    EmitCtx ctx;

     if (!sec || sec->removed) continue;
     dir = sec_directive(sec);
@@ -300,15 +685,24 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder,
     dasm = NULL;
     flat_data = NULL;
     heap_data = NULL;
+    relocs = NULL;
+    nrelocs = 0;
+    btargets = NULL;
+    nbt = 0;
+    bi = 0;

     if (total > 0 && (sec->flags & SF_EXEC)) {
       Heap* heap;
       dasm = arch_disasm_new(c);
+      relocs = collect_relocs(c, ob, (ObjSecId)i, &nrelocs);
       heap = c->ctx->heap;
       heap_data = (u8*)heap->alloc(heap, total, 1);
       if (heap_data) {
         buf_flatten(&sec->bytes, heap_data);
         flat_data = heap_data;
+        if (dasm)
+          btargets = collect_branch_targets(c, dasm, relocs, nrelocs, flat_data,
+                                            total, &nbt);
       }
     } else if (total > 0 && sec->kind != SEC_BSS) {
       Heap* heap = c->ctx->heap;
@@ -319,6 +713,15 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder,
       }
     }

+    ctx.c = c;
+    ctx.secidx = i;
+    ctx.relocs = relocs;
+    ctx.nrelocs = nrelocs;
+    ctx.labels = labels;
+    ctx.nlabels = nlabels;
+    ctx.btargets = btargets;
+    ctx.nbt = nbt;
+
     off = 0;
     li = 0;

@@ -327,6 +730,14 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder,
         emit_label(w, c, &labels[li]);
         ++li;
       }
+      /* Synthesized branch-target label, unless a real symbol sits here. */
+      if (nbt && is_btarget(&ctx, off) && !symbol_at(&ctx, off)) {
+        char name[256];
+        fmt_synth_label(name, sizeof name, i, off);
+        w_str(w, name);
+        w_str(w, ":");
+        w_newline(w);
+      }

       if (off >= total) break;

@@ -335,11 +746,13 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder,
       if (li < nlabels && labels[li].offset > off &&
           labels[li].offset < total)
         next = labels[li].offset;
+      while (bi < nbt && btargets[bi] <= off) ++bi;
+      if (bi < nbt && btargets[bi] < next) next = btargets[bi];

       if (sec->kind == SEC_BSS) {
         emit_zero_range(w, next - off);
       } else if ((sec->flags & SF_EXEC) && dasm && flat_data) {
-        emit_disasm_range(w, dasm, flat_data, off, next);
+        emit_disasm_range(w, &ctx, dasm, flat_data, off, next);
       } else if (flat_data) {
         emit_data_range(w, flat_data, off, next);
       }
diff --git a/src/arch/aa64/asm.c b/src/arch/aa64/asm.c
@@ -1277,6 +1277,8 @@ static void p_ldp_stp(AsmDriver* d, int is_load) {
                    .Rt = rt.num};
   if (m.pre_index)
     emit32(d, aa64_ldstp_pre_pack(f));
+  else if (m.post_index)
+    emit32(d, aa64_ldstp_post_pack(f));
   else
     emit32(d, aa64_ldstp_soff_pack(f));
 }
diff --git a/test/asm/encode/aa64_ldp_stp_index.expected.hex b/test/asm/encode/aa64_ldp_stp_index.expected.hex
@@ -0,0 +1 @@
+fd7bbfa9fd7bc1a8f35301a9f35341a9f353bf29f353c128
diff --git a/test/asm/encode/aa64_ldp_stp_index.s b/test/asm/encode/aa64_ldp_stp_index.s
@@ -0,0 +1,7 @@
+ .text
+ stp x29, x30, [sp, #-16]!
+ ldp x29, x30, [sp], #16
+ stp x19, x20, [sp, #16]
+ ldp x19, x20, [sp, #16]
+ stp w19, w20, [sp, #-8]!
+ ldp w19, w20, [sp], #8
diff --git a/test/asm/encode/aa64_ldp_stp_index.targets b/test/asm/encode/aa64_ldp_stp_index.targets
@@ -0,0 +1 @@
+aa64
diff --git a/test/asm/roundtrip.sh b/test/asm/roundtrip.sh
@@ -0,0 +1,250 @@
+#!/usr/bin/env bash
+# test/asm/roundtrip.sh — codegen round-trip completeness harness.
+#
+# Measures completeness of the per-arch assembler, disassembler, and the link
+# relocation path by round-tripping the *compiler's own output* rather than a
+# hand-written corpus. See doc/ASM_ROUNDTRIP_TESTING.md.
+#
+# Corpus: test/asm/roundtrip/*.c — each defines `int test_main(...)` so the L2
+# lane can execute it through the shared exec harness (jit-runner).
+#
+# Three lanes (default "012"):
+#
+#   0  L0 decode-complete — `cfree cc -S` and assert no in-function decode
+#      failure marker (aarch64 `.inst`) inside .text. Catches an instruction
+#      codegen emits that the disassembler cannot decode. Host-independent,
+#      no exec, pinpoints the undecoded word.
+#   1  L1 byte round-trip — `cfree cc -c` (direct.o) vs `cfree cc -S | cfree as`
+#      (rt.o); diff the .text bytes AND the .text relocation table. Catches
+#      assembler/disassembler disagreements (round-trip violations). Exact,
+#      host-independent. Gated on L0 passing first.
+#   2  L2 exec equivalence — run direct.o and rt.o and compare exit codes (and,
+#      when present, against <name>.expected). Tolerant of benign encoding
+#      differences; the end-to-end "it runs the same" signal. Executes via
+#      jit-runner; native target only (skips when host arch != cross-target).
+#
+# Opt levels: CFREE_TEST_OPTS (default "O1"). Each case is run at every level.
+#
+# Filtering:
+#   ./roundtrip.sh [name_filter] [lanes]
+#     name_filter  substring match against case basename
+#     lanes        subset of "012" (default "012")
+# Equivalent env vars: CFREE_TEST_FILTER, CFREE_TEST_PATHS, CFREE_TEST_OPTS.
+
+set -u
+
+ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
+TEST_DIR="$ROOT/test/asm"
+CORPUS_DIR="$TEST_DIR/roundtrip"
+BUILD_DIR="$ROOT/build/test"
+WORK_ROOT="$BUILD_DIR/asm/roundtrip"
+CFREE="$ROOT/build/cfree"
+JIT_RUNNER="$BUILD_DIR/jit-runner"
+
+# CFREE_TEST_ARCH selects the cross-target. Mirrors test/asm/run.sh.
+CFREE_TEST_ARCH="${CFREE_TEST_ARCH:-aa64}"
+case "$CFREE_TEST_ARCH" in
+  aa64|aarch64|arm64) TEST_ARCH=aa64; TRIPLE=aarch64-linux-gnu ;;
+  x64|x86_64|amd64) TEST_ARCH=x64; TRIPLE=x86_64-linux-gnu ;;
+  rv64|riscv64) TEST_ARCH=rv64; TRIPLE=riscv64-linux-gnu ;;
+  *) printf 'unknown CFREE_TEST_ARCH=%s\n' "$CFREE_TEST_ARCH" >&2; exit 2 ;;
+esac
+export CFREE_TEST_ARCH
+
+OPTS="${CFREE_TEST_OPTS:-O1}"
+
+FILTER="${1:-${CFREE_TEST_FILTER:-}}"
+PATHS="${2:-${CFREE_TEST_PATHS:-012}}"
+case "$PATHS" in *0*) RUN_L0=1;; *) RUN_L0=0;; esac
+case "$PATHS" in *1*) RUN_L1=1;; *) RUN_L1=0;; esac
+case "$PATHS" in *2*) RUN_L2=1;; *) RUN_L2=0;; esac
+
+PASS=0; FAIL=0; SKIP=0
+FAIL_NAMES=()
+
+color_red() { printf '\033[31m%s\033[0m' "$1"; }
+color_grn() { printf '\033[32m%s\033[0m' "$1"; }
+color_yel() { printf '\033[33m%s\033[0m' "$1"; }
+
+note_pass() { PASS=$((PASS+1)); printf ' %s %s\n' "$(color_grn PASS)" "$1"; }
+note_fail() { FAIL=$((FAIL+1)); FAIL_NAMES+=("$1"); printf ' %s %s\n' "$(color_red FAIL)" "$1"; }
+note_skip() { SKIP=$((SKIP+1)); printf ' %s %s — %s\n' "$(color_yel SKIP)" "$1" "$2"; }
+note_na() { printf ' %s %s — not applicable to %s\n' "$(color_yel SKIP-NA)" "$1" "$TEST_ARCH"; }
+
+# is_native_target=1 when the cross-target arch matches the host arch (needed
+# for in-process JIT exec in the L2 lane).
+arch_raw="$(uname -m 2>/dev/null || true)"
+is_native_target=0
+case "$TEST_ARCH" in
+  aa64) { [ "$arch_raw" = "aarch64" ] || [ "$arch_raw" = "arm64" ]; } && is_native_target=1 ;;
+  x64) { [ "$arch_raw" = "x86_64" ] || [ "$arch_raw" = "amd64" ]; } && is_native_target=1 ;;
+  rv64) [ "$arch_raw" = "riscv64" ] && is_native_target=1 ;;
+esac
+
+have_jit_runner=0
+[ -x "$JIT_RUNNER" ] && have_jit_runner=1
+
+# ---- per-case applicability (mirrors test/asm/run.sh) ----------------------
+
+case_applies() {
+  local name="$1" targets tuple
+  targets="$CORPUS_DIR/$name.targets"
+  [ -f "$targets" ] || return 0 # no .targets => applies to all arches
+  for tuple in $(cat "$targets"); do
+    case "$tuple:$TEST_ARCH" in
+      aa64:aa64|aarch64:aa64|arm64:aa64) return 0 ;;
+      x64:x64|x86_64:x64|amd64:x64) return 0 ;;
+      rv64:rv64|riscv64:rv64) return 0 ;;
+    esac
+  done
+  return 1
+}
+
+# ---- extraction helpers ----------------------------------------------------
+
+# .text bytes as objdump hex-dump lines (filename header stripped).
+text_bytes() { "$CFREE" objdump -s -j .text "$1" 2>/dev/null | awk '/^ *[0-9a-f]+ /'; }
+
+# .text relocation records only (kind/offset/target), excluding other sections
+# (e.g. .eh_frame) that `cc -S` does not reproduce.
+text_relocs() {
+  "$CFREE" objdump -r "$1" 2>/dev/null | awk '
+    /RELOCATION RECORDS FOR \[\.text\]/ { f=1; next }
+    /RELOCATION RECORDS FOR/ { f=0 }
+    f && /^[0-9a-f]/ { print }'
+}
+
+# Emit the in-function decode-failure markers found in a `cc -S` listing.
+# Tracks the current section so only .text `.inst` lines count (data/rodata
+# `.byte` and inter-function padding are not decode failures). Prints each
+# offending line; exit 0 if any were found.
+decode_failures() {
+  awk '
+    /^[[:space:]]*\.text[[:space:]]*$/ { intext=1; next }
+    /^[[:space:]]*\.section/ { intext=0; next }
+    intext && /[[:space:]]\.inst[[:space:]]/ { print; found=1 }
+    END { exit(found ? 0 : 1) }
+  ' "$1"
+}
+
+# ---- run -------------------------------------------------------------------
+
+printf 'roundtrip: arch=%s triple=%s opts="%s" lanes=%s native=%d\n' \
+  "$TEST_ARCH" "$TRIPLE" "$OPTS" "$PATHS" "$is_native_target"
+
+if [ ! -x "$CFREE" ]; then
+  printf ' %s cfree binary missing — run "make bin"\n' "$(color_red FATAL)" >&2
+  exit 1
+fi
+if [ $RUN_L2 -eq 1 ] && [ $is_native_target -eq 1 ] && [ $have_jit_runner -eq 0 ]; then
+  printf ' %s jit-runner missing; L2 lane will skip\n' "$(color_yel warn)"
+fi
+
+mkdir -p "$WORK_ROOT"
+
+shopt -s nullglob
+for src in "$CORPUS_DIR"/*.c; do
+  name="$(basename "$src" .c)"
+  [ -n "$FILTER" ] && [[ "$name" != *"$FILTER"* ]] && continue
+  if ! case_applies "$name"; then
+    note_na "$name"
+    continue
+  fi
+  if [ -e "$CORPUS_DIR/$name.skip" ]; then
+    note_skip "$name" "$(head -n1 "$CORPUS_DIR/$name.skip")"
+    continue
+  fi
+
+  for opt in $OPTS; do
+    tag="$name[-$opt]"
+    work="$WORK_ROOT/$name/$opt"
+    mkdir -p "$work"
+    asm="$work/out.s"
+    direct="$work/direct.o"
+    rt="$work/rt.o"
+
+    # Shared compile: assembly listing (L0/L1) + direct object (L1/L2).
+    if ! "$CFREE" cc -S "-$opt" -target "$TRIPLE" "$src" -o "$asm" \
+        >"$work/cc_s.log" 2>&1; then
+      note_fail "$tag/L0 (cc -S failed; see $work/cc_s.log)"
+      continue
+    fi
+
+    # ---- L0: decode completeness -------------------------------------
+    if [ $RUN_L0 -eq 1 ]; then
+      if decode_failures "$asm" >"$work/decode_fail"; then
+        note_fail "$tag/L0 (undecoded insn in .text; see $work/decode_fail)"
+      else
+        note_pass "$tag/L0"
+      fi
+    fi
+
+    # ---- L1: byte + reloc round-trip ---------------------------------
+    # Intra-function branches are re-assemblable (L0/L2 cover them) but the
+    # assembler relocates same-section branch targets that codegen resolves
+    # locally, so the .text reloc tables differ. Skip L1 for such cases
+    # until the assembler grows same-section branch relaxation (P2,
+    # doc/ASM_ROUNDTRIP_TESTING.md). The synthesized `Lcf_` label in the
+    # listing is the marker.
+    if [ $RUN_L1 -eq 1 ] && grep -q 'Lcf_' "$asm"; then
+      note_skip "$tag/L1" "intra-function branch; needs assembler local-reloc relaxation (P2)"
+    elif [ $RUN_L1 -eq 1 ]; then
+      l1_ok=1; l1_why=""
+      if ! "$CFREE" cc -c "-$opt" -target "$TRIPLE" "$src" -o "$direct" \
+          >"$work/cc_c.log" 2>&1; then
+        l1_ok=0; l1_why="cc -c failed; see $work/cc_c.log"
+      elif ! "$CFREE" as -target "$TRIPLE" "$asm" -o "$rt" \
+          >"$work/as.log" 2>&1; then
+        l1_ok=0; l1_why="as failed; see $work/as.log"
+      else
+        text_bytes "$direct" >"$work/direct.text"
+        text_bytes "$rt" >"$work/rt.text"
+        text_relocs "$direct" >"$work/direct.rel"
+        text_relocs "$rt" >"$work/rt.rel"
+        if ! diff -u "$work/direct.text" "$work/rt.text" >"$work/text.diff"; then
+          l1_ok=0; l1_why=".text bytes differ; see $work/text.diff"
+        elif ! diff -u "$work/direct.rel" "$work/rt.rel" >"$work/rel.diff"; then
+          l1_ok=0; l1_why=".text relocs differ; see $work/rel.diff"
+        fi
+      fi
+      if [ $l1_ok -eq 1 ]; then note_pass "$tag/L1"; else note_fail "$tag/L1 ($l1_why)"; fi
+    fi
+
+    # ---- L2: exec equivalence ----------------------------------------
+    if [ $RUN_L2 -eq 1 ]; then
+      if [ $is_native_target -eq 0 ]; then
+        note_skip "$tag/L2" "non-native target ($arch_raw); cross-exec lane TODO"
+      elif [ $have_jit_runner -eq 0 ]; then
+        note_skip "$tag/L2" "jit-runner unavailable"
+      else
+        # Reuse direct.o/rt.o from L1 if present; otherwise build them.
+        [ -f "$direct" ] || "$CFREE" cc -c "-$opt" -target "$TRIPLE" "$src" -o "$direct" >"$work/cc_c.log" 2>&1
+        [ -f "$rt" ] || "$CFREE" cc -S "-$opt" -target "$TRIPLE" "$src" -o "$asm" >"$work/cc_s.log" 2>&1 && \
+          "$CFREE" as -target "$TRIPLE" "$asm" -o "$rt" >"$work/as.log" 2>&1
+        "$JIT_RUNNER" "$direct" >"$work/direct.out" 2>"$work/direct.err"; rc_direct=$?
+        "$JIT_RUNNER" "$rt" >"$work/rt.out" 2>"$work/rt.err"; rc_rt=$?
+ l2_ok=1; l2_why="" + if [ "$rc_direct" != "$rc_rt" ]; then + l2_ok=0; l2_why="exit codes differ: direct=$rc_direct rt=$rc_rt" + elif ! diff -q "$work/direct.out" "$work/rt.out" >/dev/null; then + l2_ok=0; l2_why="stdout differs" + elif [ -f "$CORPUS_DIR/$name.expected" ]; then + exp="$(head -n1 "$CORPUS_DIR/$name.expected")" + if [ "$rc_direct" != "$exp" ]; then + l2_ok=0; l2_why="exit $rc_direct != expected $exp" + fi + fi + if [ $l2_ok -eq 1 ]; then note_pass "$tag/L2"; else note_fail "$tag/L2 ($l2_why)"; fi + fi + fi + done +done +shopt -u nullglob + +printf '\n' +if [ "${#FAIL_NAMES[@]}" -gt 0 ]; then + printf 'Failed:\n' + for n in "${FAIL_NAMES[@]}"; do printf ' %s\n' "$n"; done +fi +printf 'Results: %d pass, %d fail, %d skip\n' "$PASS" "$FAIL" "$SKIP" +[ "$FAIL" -eq 0 ] diff --git a/test/asm/roundtrip/arith.c b/test/asm/roundtrip/arith.c @@ -0,0 +1,8 @@ +/* Branch-free integer arithmetic with memory traffic (volatile forces real + * loads/stores + multiply/subtract rather than constant folding). No relocs, + * no branches: exercises the plain-operand disasm/encode round-trip beyond the + * ret42 leaf. Exit code 100*7 - 658 = 42. */ +int test_main(void) { + volatile int a = 100, b = 7; + return a * b - 658; +} diff --git a/test/asm/roundtrip/arith.expected b/test/asm/roundtrip/arith.expected @@ -0,0 +1 @@ +42 diff --git a/test/asm/roundtrip/call.c b/test/asm/roundtrip/call.c @@ -0,0 +1,8 @@ +/* A real intra-module call. `noinline` keeps `helper` a distinct function so + * codegen emits a CALL26 relocation against it (rather than inlining), which + * the symbolizer must render as `bl helper`. Both function bodies are + * branch-free, so the only round-trip surface added here is CALL26. Same-file + * calls relocate against the function symbol on both sides (codegen and `as`), + * so the .text reloc tables match. Exit code helper(21) = 42. 
*/ +__attribute__((noinline)) static int helper(int x) { return x + x; } +int test_main(void) { return helper(21); } diff --git a/test/asm/roundtrip/call.expected b/test/asm/roundtrip/call.expected @@ -0,0 +1 @@ +42 diff --git a/test/asm/roundtrip/cond.c b/test/asm/roundtrip/cond.c @@ -0,0 +1,7 @@ +/* A conditional (ternary) compiled to an intra-function conditional branch. + * `noinline` forces a real call (CALL26) plus the in-body branch, so this + * exercises both reloc symbolization (the call, checked by L1) and branch-label + * synthesis (the conditional, covered by L0/L2; L1 auto-skipped pending P2). + * Exit code: |-42| = 42. */ +__attribute__((noinline)) static int sel(int x) { return x < 0 ? -x : x; } +int test_main(void) { return sel(-42); } diff --git a/test/asm/roundtrip/cond.expected b/test/asm/roundtrip/cond.expected @@ -0,0 +1 @@ +42 diff --git a/test/asm/roundtrip/global.c b/test/asm/roundtrip/global.c @@ -0,0 +1,6 @@ +/* Read of a module-global. Codegen emits the adrp/ldr pair with an + * ADR_PREL_PG_HI21 + LDST32_ABS_LO12_NC relocation pair against `g`, which the + * symbolizer must render as `adrp x, g` + `ldr w, [x, :lo12:g]`. Branch-free. + * Exit code g + 1 = 42. */ +int g = 41; +int test_main(void) { return g + 1; } diff --git a/test/asm/roundtrip/global.expected b/test/asm/roundtrip/global.expected @@ -0,0 +1 @@ +42 diff --git a/test/asm/roundtrip/loop.c b/test/asm/roundtrip/loop.c @@ -0,0 +1,12 @@ +/* Control flow: a counted loop (volatile bound defeats constant folding so the + * loop body and back-edge survive). Codegen resolves the loop's branches + * within the function (no relocs); the symbolizer synthesizes `Lcf_` labels so + * `-S` is re-assemblable. L0 (decode) and L2 (re-assemble + execute) cover + * this; L1 is auto-skipped pending assembler same-section branch relaxation + * (P2, doc/ASM_ROUNDTRIP_TESTING.md). Exit code: (0+..+8) + 6 = 42. 
*/ +int test_main(void) { + volatile int n = 9; + int sum = 0; + for (int i = 0; i < n; ++i) sum += i; + return sum + 6; +} diff --git a/test/asm/roundtrip/loop.expected b/test/asm/roundtrip/loop.expected @@ -0,0 +1 @@ +42 diff --git a/test/asm/roundtrip/ret42.c b/test/asm/roundtrip/ret42.c @@ -0,0 +1,11 @@ +/* Minimal round-trip vertical slice: a leaf function. + * + * At -O1 this compiles to a branch-free, relocation-free .text body + * (prologue stp / epilogue ldp + a movz + ret), so it round-trips through + * `cc -S | as` without needing any operand symbolization (Phase 2). It is + * the smallest program that exercises every lane (L0/L1/L2) end to end. + * + * Entry is `test_main` to match the shared exec harness (jit-runner / + * link-exe-runner + start.c). The exit code is the L2 oracle. + */ +int test_main(void) { return 42; } diff --git a/test/asm/roundtrip/ret42.expected b/test/asm/roundtrip/ret42.expected @@ -0,0 +1 @@ +42 diff --git a/test/test.mk b/test/test.mk @@ -37,6 +37,9 @@ TEST_TARGETS = \ test-asm \ test-asm-x64 \ test-asm-rv64 \ + test-disasm-complete \ + test-asm-roundtrip \ + test-asm-roundtrip-exec \ test-bounce \ test-cbackend \ test-cg-api \ @@ -619,6 +622,29 @@ test-asm-x64: lib $(ASM_RUNNER) test-asm-rv64: lib $(ASM_RUNNER) @CFREE_TEST_ARCH=rv64 CFREE_TEST_PATHS=HT bash test/asm/run.sh +# Codegen round-trip completeness (doc/ASM_ROUNDTRIP_TESTING.md). These drive +# the `cfree` binary itself (cc -S / as / objdump) over a C corpus rather than +# a hand-written asm corpus, so coverage tracks codegen automatically. +# +# test-disasm-complete L0: cc -S must decode every in-function word +# (no `.inst` markers). Host-independent, no exec. +# test-asm-roundtrip L0+L1: also assert cc -c bytes/relocs == cc -S | as. +# test-asm-roundtrip-exec L0+L1+L2: also run direct vs round-tripped object +# and compare exit codes (native arch; opt-in). +# +# Vertical slice: aa64 only for now; L1/L2 run at -O1 (branch-free), L0 at both +# opt levels. 
Broadening to -O0, other arches, and the default suite is tracked +# in doc/ASM_ROUNDTRIP_TESTING.md once -S symbolization (Phase 2) lands. +test-disasm-complete: bin + @CFREE_TEST_ARCH=aa64 CFREE_TEST_OPTS="O0 O1" CFREE_TEST_PATHS=0 \ + bash test/asm/roundtrip.sh +test-asm-roundtrip: bin + @CFREE_TEST_ARCH=aa64 CFREE_TEST_OPTS="O0 O1" CFREE_TEST_PATHS=01 \ + bash test/asm/roundtrip.sh +test-asm-roundtrip-exec: bin $(JIT_RUNNER) + @CFREE_TEST_ARCH=aa64 CFREE_TEST_OPTS="O0 O1" CFREE_TEST_PATHS=012 \ + bash test/asm/roundtrip.sh + test-wasm: test-wasm-front test-wasm-target test-wasm-toy test-wasm-front: bin $(WASM_TOOL) $(LINK_EXE_RUNNER) $(JIT_RUNNER)