kit

git clone https://git.ryansepassi.com/git/kit.git

commit 3d661011371f95c34738c1b22bda05e0309a96f5
parent f26028bd7df956414c3f1fc90638cd69ec15e7c8
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sat, 30 May 2026 23:00:23 -0700

asm+test: full Toy cc -S → cfree/clang as exec parity across aa64/x64/rv64

The cross-compile + cross-exec lane (test-hostas-cross) now passes BOTH
assemblers by EXECUTION (matching exit codes), not bytes, for all three ELF
targets under podman/qemu: 936/936 each (312 cases × {O0,O1} × 3 arches),
ENFORCE_CLANG on. All three arches are now the gating default.

Three independent problems were in the way:

1. Harness wedge (the "rv64 doesn't work under podman" report). The batched
   single-container runner had no per-case timeout, so ONE hanging binary (a
   clang-assembled jump-table case under qemu-user) blocked all 312 cases,
   leaving the rest unscored (read back as rc 127 → 227 false fails). Add a
   per-case `timeout -s KILL` ($EXEC_CASE_TIMEOUT, default 20s) inside the
   in-container loop so a hang fails exactly one case and the loop continues.
   This made podman exec reliable for all three arches.

2. rv64 clang lane. cfree computes some `&&label` and jump-table targets as
   fixed byte offsets that assume its own uncompressed, un-relaxed layout;
   clang's C extension compresses instructions and shifts them. Emit
   `.option norvc`/`.option norelax` (new ArchAsmOps.file_prologue) to pin the
   layout through any assembler; cfree-as accepts `.option` (it never
   compresses/relaxes anyway).

3. x64 clang lane. x86 has no layout-pinning directive (clang picks movabs vs
   mov-imm32, jmp rel32 vs rel8), so reference code locations SYMBOLICALLY
   instead: `&&label` address-takes become `leaq Lcf_*(%rip)` (un-relocated
   PC-relative computes detected via new ArchAsmOps.pcrel_code_target; the
   target gets a synthesized label), and switch jump-table entries become
   `.quad Lcf_*` rather than `.quad fn+off` (absolute data pointers into an
   executable section are re-pointed at a synthesized code label by the
   extended collect_code_anchors + code_target_label). Gated to arches that
   need it (x86_64); aarch64 is fixed-width and rv64 uses .option norvc, both
   unchanged.
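
The address arithmetic behind item 3 can be sketched in isolation. This is a
minimal illustration, not the code in src/api/asm_emit.c: the helper names
(`resolve_pcrel_target`, `format_rip_operand`) are hypothetical, and the label
spelling (decimal section id, hex offset) is an assumption; the real spelling is
whatever `fmt_synth_label` produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Resolve a RIP-relative displacement to a section offset.
 * x86-64 measures the displacement from the END of the instruction, so the
 * target is insn_off + insn_len + disp. Returns 1 and stores the target when
 * it falls inside [0, sec_size), else returns 0 and leaves *target_out alone.
 * (Hypothetical helper, named for illustration only.) */
static int resolve_pcrel_target(uint32_t insn_off, uint32_t insn_len,
                                int64_t disp, uint32_t sec_size,
                                uint32_t *target_out) {
    int64_t t = (int64_t)insn_off + (int64_t)insn_len + disp;
    assert(target_out != NULL);
    if (t < 0 || (uint64_t)t >= sec_size) return 0;
    *target_out = (uint32_t)t;
    return 1;
}

/* Render the symbolic operand that would replace "<disp>(%rip)".
 * ASSUMPTION: labels are spelled Lcf_<sec>_<off> with a decimal section id
 * and a hex offset; the actual format may differ. Returns the number of
 * characters snprintf would write (excluding the terminating NUL). */
static int format_rip_operand(char *buf, size_t cap, uint32_t sec,
                              uint32_t off) {
    return snprintf(buf, cap, "Lcf_%u_%x(%%rip)", (unsigned)sec,
                    (unsigned)off);
}
```

Because the operand names a label instead of a byte count, an assembler that
picks shorter or longer encodings recomputes the displacement itself, which is
the property the x64 clang lane depends on.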

Verified green with no regressions: test-hostas-toy (312/0 both lanes),
test-toy (1338/0), test-asm (27/0), test-asm-x64 (13/0), test-asm-roundtrip
(572/0), test-asm-roundtrip-toy (624/0), test-asm-symmetry (no new asymmetry),
test-diff-llvm (agrees), test-link (122/0), test-elf (40/0), test-driver-ar.

Diffstat:
M doc/ASM_ROUNDTRIP_TESTING.md | 57 +++++++++++++++++++++++++++------------------------------
M src/api/asm_emit.c | 183 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------
M src/arch/arch.h | 39 +++++++++++++++++++++++++++++++++++++++
M src/arch/registry.c | 18 ++++++++++++++++++
M src/arch/rv64/asm.c | 16 ++++++++++++++++
M src/arch/x64/asm.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/asm/asm.c | 8 +++++++-
M test/asm/hostas_cross.sh | 46 +++++++++++++++++++++++-----------------------
M test/lib/exec_target.sh | 15 +++++++++++++--
9 files changed, 363 insertions(+), 85 deletions(-)

diff --git a/doc/ASM_ROUNDTRIP_TESTING.md b/doc/ASM_ROUNDTRIP_TESTING.md @@ -246,36 +246,33 @@ raw `exit_group` syscall. Each target **self-skips** (never fails) unless the host has (1) a clang cross target, (2) a runner (podman/qemu), (3) a working `cc -S | cfree as` round-trip for that arch, and (4) a passing **bounded** exec smoke (so a wedged emulator -downgrades to SKIP instead of hanging). Status: - -- **aarch64-linux**: green end-to-end (cfree-as 312/0, clang-as 312/0) — podman - runs arm64 natively in its VM, so it's fast and the primary verified target. -- **x86_64-linux**: the x64 `cc -S` symbolizer is complete — the aarch64 - symbolizer was arch-generalized (`ArchAsmOps.is_local_branch` for `jmp`/`jcc`, - an x64 `reloc_operand` table for `sym(%rip)`/bare-`@PLT`/`@GOTPCREL` with a +4 - rel32 addend bias, operand-driven RIP surgery) and the `emit_data_range` data - path now handles `R_PC32`/`R_PC64` (jump tables, global/array/fp/static-string - data). `cc -S | cfree as` re-assembles AND **cross-EXECS the whole corpus - correctly: cfree-as 312/312.** Byte-faithful 300/312 — the 12 are alloca/abi - cases where the re-assembled encoding is execution-equivalent (e.g. - `leaq (%rsp)` vs `leaq 0(%rsp)`). The clang lane is 301/11 (cfree emits AT&T - text clang rejects). Opt-in (the global clang gate would fail on that residue). -- **riscv64-linux**: the rv64 `cc -S` symbolizer landed — a new `ArchAsmOps` with - `is_local_branch` (j/beq/...), a `reloc_operand` covering `%pcrel_hi`/ - `%pcrel_lo`/`%hi`/`%lo`, the `%pcrel_lo` AUIPC-anchor pairing (synthesized - `.Lpcrel` labels via a new `ARCH_RELOC_SURG_RV_LO12` + `emit_anchor`/ - `ref_anchor`), and an `R_RISCV_CALL` AUIPC+JALR call-pair fusion to `call`/ - `tail`. `cc -S | cfree as` round-trips AND **cross-EXECS correctly: cfree-as - 312/312** — the earlier self-call hang is gone. 
Byte-faithful 282/312 — the 30 - are tail-call cases where cfree codegen uses `t0` but the standard (and - RAS-friendly) `tail` pseudo the assembler emits uses `t1`; execution-identical. - The clang lane is 254/58 (rv64 data-symbolization syntax + bare-`fcvt` - rounding-mode that clang encodes differently). Opt-in. -- **Remaining (both arches): the third-party `clang` lane.** cfree's `cc -S` is - faithfully re-assemblable and executable by cfree's own `as`, but not yet - fully clang-standard for x64 (a few AT&T spellings) or rv64 (data - `%`-operator syntax; bare-`fcvt` needs an explicit rounding-mode suffix). A - standard-conformance follow-up; does not block the cfree `-S` path. +downgrades to SKIP instead of hanging). All three ELF targets are in the gating +default and pass **both** lanes — **936/936** = 312 cases × {O0,O1} × 3 arches, +cfree-as **and** clang-as, judged purely by execution (matching exit code): + +- **aarch64-linux**: green end-to-end — podman runs arm64 natively in its VM, so + it's fast. Fixed-width encodings preserve cfree's instruction layout, so code + references need no special spelling. +- **x86_64-linux**: cc -S references code locations **symbolically** so clang's + encoding choices (movabs vs mov-imm32, `jmp` rel32 vs rel8) can't shift a fixed + byte offset onto the wrong instruction: a `&&label` address-take is + `leaq Lcf_*(%rip)` (un-relocated PC-relative computes are detected via + `ArchAsmOps.pcrel_code_target`, the target gets a synthesized label), and + switch jump-table entries are `.quad Lcf_*` rather than `.quad fn+off` + (absolute data pointers into an executable section are re-pointed at a + synthesized code label by `collect_code_anchors` + `code_target_label`). 
+- **riscv64-linux**: cc -S emits `.option norvc`/`.option norelax` + (`ArchAsmOps.file_prologue`) to pin cfree's fixed layout against clang's + C-extension compression — cfree computes some `&&label`/jump-table targets as + fixed offsets, which compression would otherwise shift — plus the + `%pcrel_hi`/`%pcrel_lo` AUIPC-anchor pairing and `R_RISCV_CALL` AUIPC+JALR + fusion to `call`/`tail`. + +Both lanes are judged by **execution**, never by bytes: cfree and clang emit +different (execution-equivalent) code, so a byte/text match would be meaningless. +The batched container runner caps each case at `EXEC_CASE_TIMEOUT` seconds +(default 20) so a single hanging binary can't wedge the whole single-container +run, leaving every later case unscored. Override the matrix with `CFREE_HOSTAS_CROSS_TARGETS="tag:triple ..."`, the exec-smoke cap with `CFREE_HOSTAS_EXEC_TIMEOUT=<secs>`, and per-arch images with diff --git a/src/api/asm_emit.c b/src/api/asm_emit.c @@ -529,6 +529,8 @@ typedef struct { u16 kind; Sym sym; i64 addend; + ObjSecId target_sec; /* section the reloc's symbol is defined in (or NONE) */ + u64 target_val; /* the symbol's value (offset within target_sec) */ } SecReloc; static int cmp_secreloc(const void* va, const void* vb) { @@ -564,6 +566,8 @@ static SecReloc* collect_relocs(Compiler* c, ObjBuilder* ob, ObjSecId sec_id, arr[n].kind = r->kind; arr[n].sym = s ? s->name : (Sym)0; arr[n].addend = r->addend; + arr[n].target_sec = s ? s->section_id : OBJ_SEC_NONE; + arr[n].target_val = s ? s->value : 0; ++n; } if (n > 1) qsort(arr, n, sizeof(SecReloc), cmp_secreloc); @@ -909,12 +913,40 @@ static int is_btarget(const EmitCtx* x, u32 off) { return 0; } -/* Pre-scan: collect in-section branch targets of un-relocated local branches. */ -static u32* collect_branch_targets(Compiler* c, ArchDisasm* dasm, - const SecReloc* relocs, u32 nrelocs, - const u8* data, u32 total, u32* n_out) { +/* Append `off` to a dynamic, deduplicated anchor array (arena-grown). 
*/ +static void anchor_add(Compiler* c, u32** arr, u32* n, u32* cap, u32 off) { + u32 j; + for (j = 0; j < *n; ++j) + if ((*arr)[j] == off) return; + if (*n == *cap) { + u32 nc = *cap ? *cap * 2 : 8; + u32* na = arena_array(c->tu, u32, nc); + if (!na) return; + if (*arr) memcpy(na, *arr, *cap * sizeof(u32)); + *arr = na; + *cap = nc; + } + (*arr)[(*n)++] = off; +} + +/* Pre-scan: offsets in section `sec_id` that need a synthesized Lcf_ label so a + * layout-dependent reference resolves symbolically through any assembler: + * 1. targets of un-relocated intra-section local branches (b/jmp/jcc); + * 2. targets of un-relocated PC-relative code-address-takes (x86-64 `leaq + * disp(%rip)` for `&&label`); + * 3. offsets targeted by an absolute data-pointer relocation living in a + * NON-executable section (switch jump-table `.quad fn+off` entries). + * (2)/(3) are exactly the references that break when the assembler picks + * different instruction lengths than cfree did, so they are collected only for + * arches that need symbolic code refs (x86-64); fixed-width (aarch64) or + * layout-pinned (RISC-V .option norvc) arches keep the compact offset forms. */ +static u32* collect_code_anchors(Compiler* c, ObjBuilder* ob, ObjSecId sec_id, + ArchDisasm* dasm, const SecReloc* relocs, + u32 nrelocs, const u8* data, u32 total, + u32* n_out) { u32* arr = NULL; u32 n = 0, cap = 0, off = 0; + int want_sym = arch_needs_symbolic_code_refs(c); *n_out = 0; while (off < total) { @@ -925,30 +957,44 @@ static u32* collect_branch_targets(Compiler* c, ArchDisasm* dasm, off += 1; continue; } - if (!reloc_in_range(relocs, nrelocs, off, nb) && - arch_is_local_branch(c, insn.mnemonic) && - parse_hex_tail(insn.operands, &tgt) && tgt < total) { - u32 j; - int found = 0; - for (j = 0; j < n; ++j) - if (arr[j] == (u32)tgt) { - found = 1; - break; - } - if (!found) { - if (n == cap) { - u32 nc = cap ? 
cap * 2 : 8; - u32* na = arena_array(c->tu, u32, nc); - if (!na) break; - if (arr) memcpy(na, arr, cap * sizeof(u32)); - arr = na; - cap = nc; + if (!reloc_in_range(relocs, nrelocs, off, nb)) { + if (arch_is_local_branch(c, insn.mnemonic) && + parse_hex_tail(insn.operands, &tgt) && tgt < total) { + anchor_add(c, &arr, &n, &cap, (u32)tgt); + } else if (want_sym) { + i64 disp; + if (arch_pcrel_code_target(c, insn.mnemonic, insn.operands, &disp)) { + i64 t = (i64)off + (i64)nb + disp; + if (t >= 0 && (u64)t < total) + anchor_add(c, &arr, &n, &cap, (u32)t); } - arr[n++] = (u32)tgt; } } off += nb; } + + if (want_sym) { + u32 nr = obj_reloc_total(ob), i; + for (i = 0; i < nr; ++i) { + const Reloc* r = obj_reloc_at(ob, i); + const Section* host; + const ObjSym* s; + const char* dir; + u32 width; + int pcrel; + i64 t; + if (!r || r->removed) continue; + host = obj_section_get(ob, r->section_id); + if (!host || (host->flags & SF_EXEC)) continue; /* code reloc: skip */ + if (!data_reloc_directive(r->kind, &dir, &width, &pcrel) || pcrel) + continue; /* only absolute data pointers (jump-table entries) */ + s = obj_symbol_get(ob, r->sym); + if (!s || s->section_id != sec_id) continue; + t = (i64)s->value + r->addend; + if (t >= 0 && (u64)t < total) anchor_add(c, &arr, &n, &cap, (u32)t); + } + } + if (n > 1) qsort(arr, n, sizeof(u32), cmp_u32); *n_out = n; return arr; @@ -998,10 +1044,55 @@ static CfreeStatus emit_operands(Writer* w, const EmitCtx* x, return w_symbolized(w, insn->operands.s, insn->operands.len, name, ARCH_RELOC_SURG_TAIL); } + } else { + /* Un-relocated PC-relative code-address-take (x86-64 `leaq disp(%rip)` for + * `&&label`): rewrite the fixed displacement to the synthesized target + * label so an encoding-divergent assembler recomputes it. 
*/ + i64 disp; + if (arch_pcrel_code_target(x->c, insn->mnemonic, insn->operands, &disp)) { + i64 t = (i64)off + (i64)insn->nbytes + disp; + if (t >= 0 && is_btarget(x, (u32)t)) { + char name[256]; + build_label_name(name, sizeof name, x, (u32)t); + return w_symbolized(w, insn->operands.s, insn->operands.len, name, + ARCH_RELOC_SURG_RIP); + } + } } return cfree_writer_write(w, insn->operands.s, insn->operands.len); } +/* Symbolic name for a code location (target_sec:target_off) referenced from a + * data directive: an assemblable label defined exactly there if one exists, + * else the synthesized `Lcf_<sec>_<off>` that collect_code_anchors guarantees + * is emitted in the target section. Mirrors the synth-vs-real choice the label + * emitter makes (symbol_at / build_label_name), so both ends agree. */ +static u32 code_target_label(char* buf, u32 cap, Compiler* c, ObjBuilder* ob, + ObjSecId target_sec, u32 target_off) { + ObjSymIter* it = obj_symiter_new(ob); + if (it) { + ObjSymEntry e; + while (obj_symiter_next(it, &e)) { + const ObjSym* s = e.sym; + Slice nm; + if (!s || s->removed || !s->name) continue; + if (s->section_id != target_sec || (u32)s->value != target_off) continue; + if (s->kind == SK_SECTION || s->kind == SK_FILE) continue; + nm = pool_slice(c->global, s->name); + if (slice_eq_cstr(nm, ".LpcrelHi")) continue; + if (sym_is_assemblable(nm)) { + u32 p = 0, j; + for (j = 0; j < nm.len && p + 1 < cap; ++j) buf[p++] = nm.s[j]; + buf[p] = '\0'; + obj_symiter_free(it); + return p; + } + } + obj_symiter_free(it); + } + return fmt_synth_label(buf, cap, (u32)target_sec, target_off); +} + /* Emit a data range, rendering any covered relocation as a symbolic integer * directive (`.quad sym+addend`) so cc -S | as reproduces the data relocation * table — switch jump tables (R_ABS64 against the function) and any other @@ -1009,9 +1100,9 @@ static CfreeStatus emit_operands(Writer* w, const EmitCtx* x, * target the assembler can't spell, falls back to raw `.byte`; 
the dropped * reloc then surfaces in the round-trip's reloc comparison. `relocs` is the * section's relocation list, sorted by offset. */ -static CfreeStatus emit_data_range(Writer* w, Compiler* c, const u8* data, - u32 start, u32 end, const SecReloc* relocs, - u32 nrelocs) { +static CfreeStatus emit_data_range(Writer* w, Compiler* c, ObjBuilder* ob, + const u8* data, u32 start, u32 end, + const SecReloc* relocs, u32 nrelocs) { u32 off = start; while (off < end) { const SecReloc* r = NULL; @@ -1037,6 +1128,33 @@ static CfreeStatus emit_data_range(Writer* w, Compiler* c, const u8* data, * re-derives R_PC{32,64} instead of an absolute reloc. */ ArchRelocOperand bare = {ARCH_RELOC_SURG_NONE, "", "", 0, 0, 0}; if (data_reloc_directive(r->kind, &dir, &width, &pcrel) && + off + width <= end) { + const Section* tsec = (r->target_sec != OBJ_SEC_NONE) + ? obj_section_get(ob, r->target_sec) + : NULL; + /* An absolute pointer into executable code (switch jump-table entry): + * spell it as a label that moves with the code rather than `fn+off`. + * After an encoding-divergent assembler re-lays-out the function, a + * fixed offset would point into the wrong instruction; a label is + * recomputed to the correct address. Only for arches that need it. 
*/ + if (!pcrel && tsec && (tsec->flags & SF_EXEC) && + arch_needs_symbolic_code_refs(c)) { + char label[256]; + u64 toff = r->target_val + (u64)r->addend; + CfreeStatus st; + code_target_label(label, sizeof label, c, ob, r->target_sec, + (u32)toff); + st = w_str(w, dir); + if (st != CFREE_OK) return st; + st = w_str(w, label); + if (st != CFREE_OK) return st; + st = w_newline(w); + if (st != CFREE_OK) return st; + off += width; + continue; + } + } + if (data_reloc_directive(r->kind, &dir, &width, &pcrel) && off + width <= end && build_symref(symref, sizeof symref, c, &bare, r->sym, r->addend) >= 0) { @@ -1198,6 +1316,13 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder, sx.c = c; nsec = obj_section_count(ob); + /* Arch-specific leading directives (e.g. RISC-V `.option norvc` to pin + * cfree's fixed instruction layout against a compressing assembler). */ + { + const char* prologue = arch_asm_file_prologue(c); + if (prologue) w_str(w, prologue); + } + for (i = 1; i < nsec; ++i) { const Section* sec = obj_section_get(ob, (ObjSecId)i); SymLabel* labels; @@ -1243,8 +1368,8 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder, buf_flatten(&sec->bytes, heap_data); flat_data = heap_data; if (dasm) - btargets = collect_branch_targets(c, dasm, relocs, nrelocs, flat_data, - total, &nbt); + btargets = collect_code_anchors(c, ob, (ObjSecId)i, dasm, relocs, + nrelocs, flat_data, total, &nbt); } } else if (total > 0 && sec->kind != SEC_BSS) { Heap* heap = c->ctx->heap; @@ -1297,7 +1422,7 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder, } else if ((sec->flags & SF_EXEC) && dasm && flat_data) { emit_disasm_range(w, &ctx, dasm, flat_data, off, next); } else if (flat_data) { - emit_data_range(w, c, flat_data, off, next, relocs, nrelocs); + emit_data_range(w, c, ob, flat_data, off, next, relocs, nrelocs); } off = next; } diff --git a/src/arch/arch.h b/src/arch/arch.h @@ -244,6 +244,28 @@ typedef struct ArchAsmOps { * pair fusion for 
the arch. */ int (*reloc_call_pair)(u16 reloc_kind, CfreeSlice pair_mnemonic, CfreeSlice pair_ops, const char** mnemonic_out); + /* Arch-specific leading directives emitted at the very top of a cc -S file, + * before any section, returned as a NUL-terminated string the printer writes + * verbatim (NULL = none). RISC-V returns "\t.option norvc\n.option norelax\n": + * cfree's codegen computes some PC-relative label / jump-table targets as + * fixed byte offsets that assume its own uncompressed, un-relaxed instruction + * stream, so a third-party assembler (clang) must be told not to compress or + * relax, or those offsets shift and the targets break. aarch64/x86-64 have + * fixed-width encodings and no such layout dependence -> NULL. */ + const char* (*file_prologue)(void); + /* 1 if (mnemonic, operands) is an un-relocated PC-relative reference to a + * code address computed as a fixed displacement — x86-64 `leaq disp(%rip), + * reg` emitted for a `&&label` address-take. Sets *disp_out to the signed + * byte displacement from the END of the instruction to the target. The + * symbolizer then synthesizes a label at (insn_end + disp) and rewrites the + * displacement to that label so a re-encoding assembler recomputes it. + * Providing this hook ALSO opts the arch into symbolic switch jump-table + * entries (.quad fn+off -> .quad <label>): both are needed precisely when the + * arch's assembler may pick different instruction lengths than cfree did + * (x86-64 movabs/mov-imm32, jmp rel32/rel8). Fixed-width arches (aarch64) and + * arches that pin layout another way (RISC-V .option norvc) leave it NULL. 
*/ + int (*pcrel_code_target)(CfreeSlice mnemonic, CfreeSlice operands, + i64* disp_out); } ArchAsmOps; typedef struct ArchImpl { @@ -312,6 +334,23 @@ int arch_reloc_call_pair(const Compiler* c, u16 reloc_kind, CfreeSlice pair_mnemonic, CfreeSlice pair_ops, const char** mnemonic_out); +/* Leading directive string for the top of a cc -S file for the compiler's + * target arch (e.g. RISC-V `.option norvc`), or NULL when the arch needs none. + * Thin dispatch over ArchAsmOps.file_prologue. */ +const char* arch_asm_file_prologue(const Compiler* c); + +/* 1 if `insn` is an un-relocated PC-relative code-address-take for the target + * arch, with *disp_out set to the signed displacement from the instruction end + * to the target. Thin dispatch over ArchAsmOps.pcrel_code_target. */ +int arch_pcrel_code_target(const Compiler* c, CfreeSlice mnemonic, + CfreeSlice operands, i64* disp_out); + +/* 1 if the target arch needs code locations referenced symbolically (by label) + * rather than as fixed byte offsets in cc -S — true exactly for arches that + * provide pcrel_code_target (x86-64). Drives both `&&label` address-take and + * switch jump-table symbolization. 
*/ +int arch_needs_symbolic_code_refs(const Compiler* c); + ArchDisasm* arch_disasm_new(Compiler*); u32 arch_disasm_decode(ArchDisasm*, const u8* bytes, size_t len, u64 vaddr, CfreeInsn* out); diff --git a/src/arch/registry.c b/src/arch/registry.c @@ -110,6 +110,24 @@ int arch_reloc_call_pair(const Compiler* c, u16 reloc_kind, mnemonic_out); } +const char* arch_asm_file_prologue(const Compiler* c) { + const ArchImpl* a = arch_for_compiler(c); + if (!a || !a->asm_ops || !a->asm_ops->file_prologue) return NULL; + return a->asm_ops->file_prologue(); +} + +int arch_pcrel_code_target(const Compiler* c, CfreeSlice mnemonic, + CfreeSlice operands, i64* disp_out) { + const ArchImpl* a = arch_for_compiler(c); + if (!a || !a->asm_ops || !a->asm_ops->pcrel_code_target) return 0; + return a->asm_ops->pcrel_code_target(mnemonic, operands, disp_out); +} + +int arch_needs_symbolic_code_refs(const Compiler* c) { + const ArchImpl* a = arch_for_compiler(c); + return a && a->asm_ops && a->asm_ops->pcrel_code_target != NULL; +} + const CGBackend* cg_backend_for_session(const Compiler* c, const CfreeCodeOptions* opts) { if (opts && opts->check_only) { diff --git a/src/arch/rv64/asm.c b/src/arch/rv64/asm.c @@ -1110,10 +1110,26 @@ static int rv64_reloc_call_pair(u16 kind, CfreeSlice pair_mnemonic, return 0; } +/* RISC-V cc -S file prologue. cfree computes a few PC-relative targets as + * fixed byte offsets baked into the instruction stream rather than as symbolic + * relocations: a `&&label` address-of (auipc+addi with a hardcoded immediate, + * no reloc) and switch jump-table entries (`.quad fn+offset`). Both assume + * cfree's own 4-byte-per-instruction, un-relaxed layout. A standards-conformant + * assembler such as clang defaults to the C extension and would compress + * instructions (e.g. `mv`->`c.mv`), shifting every later offset and sending + * those targets to the wrong place. 
`.option norvc`/`.option norelax` pin the + * layout so cfree's offsets stay valid through any assembler — cfree's own + * codegen never emits compressed/relaxed forms, so this only constrains a + * third party to match what cfree already does. */ +static const char* rv64_file_prologue(void) { + return "\t.option norvc\n\t.option norelax\n"; +} + const ArchAsmOps rv64_asm_ops = { .reloc_operand = rv64_reloc_operand, .is_local_branch = rv64_is_local_branch, .reloc_call_pair = rv64_reloc_call_pair, + .file_prologue = rv64_file_prologue, }; ArchAsm* rv64_arch_asm_new(Compiler* c) { diff --git a/src/arch/x64/asm.c b/src/arch/x64/asm.c @@ -1637,9 +1637,75 @@ static int x64_is_local_branch(CfreeSlice m) { return 0; } +/* Parse a leading signed integer (decimal or 0x-hex) from [s, s+len). Returns + * chars consumed and sets *out, or 0 if no integer starts here. */ +static u32 x64_parse_leading_int(const char* s, u32 len, i64* out) { + u32 i = 0, start; + int neg = 0; + i64 v = 0; + if (i < len && (s[i] == '+' || s[i] == '-')) { + neg = (s[i] == '-'); + ++i; + } + if (i + 1 < len && s[i] == '0' && (s[i + 1] == 'x' || s[i + 1] == 'X')) { + i += 2; + start = i; + for (; i < len; ++i) { + char c = s[i]; + if (c >= '0' && c <= '9') + v = v * 16 + (c - '0'); + else if (c >= 'a' && c <= 'f') + v = v * 16 + (c - 'a' + 10); + else if (c >= 'A' && c <= 'F') + v = v * 16 + (c - 'A' + 10); + else + break; + } + } else { + start = i; + for (; i < len; ++i) { + char c = s[i]; + if (c >= '0' && c <= '9') + v = v * 10 + (c - '0'); + else + break; + } + } + if (i == start) return 0; + *out = neg ? -v : v; + return i; +} + +/* x86-64 `&&label` address-take: an un-relocated `leaq <disp>(%rip), %reg`. The + * disassembler renders the resolved target as a fixed displacement from the + * next instruction (the %rip base); report it so the symbolizer can swap in a + * label that an encoding-divergent assembler will recompute correctly. 
*/ +static int x64_pcrel_code_target(CfreeSlice mnemonic, CfreeSlice operands, + i64* disp_out) { + const char* o = operands.s; + u32 ol = operands.len, i, n; + i64 disp = 0; + int has_rip = 0; + if (!(mnemonic.len == 4 && memcmp(mnemonic.s, "leaq", 4) == 0) && + !(mnemonic.len == 3 && memcmp(mnemonic.s, "lea", 3) == 0)) + return 0; + for (i = 0; i + 6 <= ol; ++i) + if (memcmp(o + i, "(%rip)", 6) == 0) { + has_rip = 1; + break; + } + if (!has_rip) return 0; + n = x64_parse_leading_int(o, ol, &disp); + /* The displacement must sit immediately before `(%rip)`. */ + if (n == 0 || !(n + 6 <= ol && memcmp(o + n, "(%rip)", 6) == 0)) return 0; + *disp_out = disp; + return 1; +} + const ArchAsmOps x64_asm_ops = { .reloc_operand = x64_reloc_operand, .is_local_branch = x64_is_local_branch, + .pcrel_code_target = x64_pcrel_code_target, }; ArchAsm* x64_arch_asm_new(Compiler* c) { return &x64_asm_open(c)->base; } diff --git a/src/asm/asm.c b/src/asm/asm.c @@ -1180,7 +1180,13 @@ static void do_directive(AsmDriver* d, Sym name) { sym_eq(d, name, "subsections_via_symbols") || sym_eq(d, name, "macro") || sym_eq(d, name, "endm") || sym_eq(d, name, "if") || sym_eq(d, name, "endif") || sym_eq(d, name, "else") || - sym_eq(d, name, "include")) { + sym_eq(d, name, "include") || + /* RISC-V `.option rvc/norvc/relax/norelax/push/pop/...`: cfree's own + * cc -S emits `.option norvc`/`.option norelax` to pin its fixed + * instruction layout (see rv64_file_prologue). cfree-as never compresses + * or relaxes, so it already honors these implicitly — accept and ignore + * rather than treat as an unknown directive. */ + sym_eq(d, name, "option")) { d_skip_to_eol(d); return; } diff --git a/test/asm/hostas_cross.sh b/test/asm/hostas_cross.sh @@ -27,23 +27,24 @@ # clang cross-compiler for it, (2) a runner (podman/qemu) per exec_target, (3) a # working `cfree cc -S | cfree as` round-trip for that arch, and (4) a bounded # exec smoke that returns the oracle. 
So the harness runs green on whatever the -# host supports and self-extends as gaps close. Status at time of writing: -# - aarch64-linux: works end-to-end (podman runs arm64 natively in its VM). -# This is the gating default (312/312 both lanes). -# - x86_64-linux: `cc -S | cfree as` round-trips and CROSS-EXECS the whole -# corpus correctly (cfree-as 312/312). The clang lane has a -# small residue (~11 efail: cfree emits AT&T text clang -# rejects). Opt-in: the global clang gate (ENFORCE_CLANG=1) -# would fail on that residue, so x64 isn't in the gating -# default yet — run with CFREE_HOSTAS_CROSS_TARGETS, optionally -# CFREE_HOSTAS_ENFORCE_CLANG=0. -# - riscv64-linux: `cc -S | cfree as` round-trips and CROSS-EXECS correctly -# (cfree-as 312/312) — the rv64 symbolizer (ArchAsmOps with -# %pcrel_hi/%pcrel_lo anchor pairing + AUIPC/JALR call fusion) -# landed; the earlier self-call hang is gone. The clang lane -# has a larger residue (~58 efail: rv64 data-symbolization -# syntax + bare-fcvt rounding-mode that clang encodes -# differently). Opt-in, same as x64. +# host supports and self-extends as gaps close. All three ELF targets now pass +# BOTH lanes end-to-end (936/936 = 312 cases x {O0,O1} x 3 arches, ENFORCE_CLANG): +# - aarch64-linux: podman runs arm64 natively in its VM; fixed-width encodings +# keep cfree's layout, so code references need no special form. +# - x86_64-linux: cc -S references code locations symbolically — `&&label` +# address-takes (`leaq Lcf_*(%rip)`) and switch jump-table +# entries (`.quad Lcf_*`) — so clang's encoding choices +# (movabs vs mov-imm32, jmp rel32 vs rel8) can't shift a fixed +# offset onto the wrong instruction. (ArchAsmOps.pcrel_code_target +# + collect_code_anchors; see src/api/asm_emit.c.) +# - riscv64-linux: cc -S emits `.option norvc`/`.option norelax` to pin cfree's +# fixed instruction layout against clang's C-extension +# compression, plus the %pcrel_hi/%pcrel_lo + AUIPC/JALR call +# symbolizer. 
+# Execution under qemu-user (x86_64/riscv64 in their podman containers) is the +# sole judge — cfree and clang emit different code, so a byte/text match would be +# meaningless. The batched runner caps each case (EXEC_CASE_TIMEOUT) so one +# hanging binary can't wedge the whole container. # # Override the matrix with CFREE_HOSTAS_CROSS_TARGETS="tag:triple ..." and the # clang-as gate with CFREE_HOSTAS_ENFORCE_CLANG=0 (demote lane B to XFAIL). @@ -60,12 +61,11 @@ FILTER="${1:-}" ENFORCE_CLANG="${CFREE_HOSTAS_ENFORCE_CLANG:-1}" EXEC_SMOKE_TIMEOUT="${CFREE_HOSTAS_EXEC_TIMEOUT:-45}" -# "tag:triple" — tag is exec_target.sh's <arch>-<os> spelling. The gating -# default is the fully-verified target (aarch64-linux). x86_64 and riscv64 are -# wired and opt-in (see the status notes above) — add them with -# CFREE_HOSTAS_CROSS_TARGETS once you want to exercise their in-progress lanes: -# CFREE_HOSTAS_CROSS_TARGETS="x64-linux:x86_64-linux-gnu rv64-linux:riscv64-linux-gnu" -TARGETS="${CFREE_HOSTAS_CROSS_TARGETS:-aarch64-linux:aarch64-linux-gnu}" +# "tag:triple" — tag is exec_target.sh's <arch>-<os> spelling. All three ELF +# targets are in the gating default (each SKIPs cleanly if its clang cross +# target or container runner is unavailable). Narrow the matrix with +# CFREE_HOSTAS_CROSS_TARGETS, e.g. CFREE_HOSTAS_CROSS_TARGETS="x64-linux:x86_64-linux-gnu". +TARGETS="${CFREE_HOSTAS_CROSS_TARGETS:-aarch64-linux:aarch64-linux-gnu x64-linux:x86_64-linux-gnu rv64-linux:riscv64-linux-gnu}" # Same TLS-symbolization skip as the sibling lanes. SKIP="141_threadlocal_mutate" diff --git a/test/lib/exec_target.sh b/test/lib/exec_target.sh @@ -287,7 +287,15 @@ _exec_target_flush_tag() { echo "exec_target_flush: EXEC_TARGET_MOUNT_ROOT must be set" >&2 return 2 fi - local platform image platform_flag=() + local platform image platform_flag=() case_to + # Per-case wall-clock cap inside the batched container. Without it a + # single hanging exe (e.g. 
a miscompiled loop, or qemu-user wedging on + # one binary) blocks the whole single-container run, leaving every + # later case with no .rc — which the caller reads back as 127 and + # reports as a mass failure. With it, a hang is killed (rc 137) and the + # loop moves on, so a real hang fails exactly one case. Override with + # EXEC_CASE_TIMEOUT (seconds); generous by default for slow TCG. + case_to="${EXEC_CASE_TIMEOUT:-20}" platform="$(_exec_target_platform "$tag")" image="$(_exec_target_image "$tag")" if ! _exec_target_podman_native "$tag"; then @@ -307,12 +315,15 @@ _exec_target_flush_tag() { "${EXEC_TARGET_RCS[$k]}" done } | podman run -i --rm --pull=never "${platform_flag[@]}" --net=none \ + -e EXEC_CASE_TIMEOUT="$case_to" \ -v "$EXEC_TARGET_MOUNT_ROOT":"$EXEC_TARGET_MOUNT_ROOT":Z \ "$image" \ /bin/sh -c ' set -u +_to="${EXEC_CASE_TIMEOUT:-20}" +if command -v timeout >/dev/null 2>&1; then _t="timeout -s KILL $_to"; else _t=""; fi while IFS=" " read -r exe out err rc; do - "$exe" >"$out" 2>"$err" + $_t "$exe" >"$out" 2>"$err" echo $? >"$rc" done '