commit 3d661011371f95c34738c1b22bda05e0309a96f5
parent f26028bd7df956414c3f1fc90638cd69ec15e7c8
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Sat, 30 May 2026 23:00:23 -0700
asm+test: full Toy cc -S → cfree/clang as exec parity across aa64/x64/rv64
The cross-compile + cross-exec lane (test-hostas-cross) now passes BOTH
assemblers by EXECUTION (matching exit codes), not bytes, for all three ELF
targets under podman/qemu: 936/936 each (312 cases × {O0,O1} × 3 arches),
ENFORCE_CLANG on. All three arches are now the gating default.
Three independent problems were in the way:
1. Harness wedge (the "rv64 doesn't work under podman" report). The batched
single-container runner had no per-case timeout, so ONE hanging binary (a
clang-assembled jump-table case under qemu-user) blocked all 312 cases,
leaving the rest unscored (read back as rc 127 → 227 false fails). Add a
per-case `timeout -s KILL` ($EXEC_CASE_TIMEOUT, default 20s) inside the
in-container loop so a hang fails exactly one case and the loop continues.
This made podman exec reliable for all three arches.
2. rv64 clang lane. cfree computes some `&&label` and jump-table targets as
fixed byte offsets that assume its own uncompressed, un-relaxed layout;
clang's C extension compresses instructions and shifts them. Emit
`.option norvc`/`.option norelax` (new ArchAsmOps.file_prologue) to pin the
layout through any assembler; cfree-as accepts `.option` (it never
compresses/relaxes anyway).
3. x64 clang lane. x86 has no layout-pinning directive (clang picks movabs vs
mov-imm32, jmp rel32 vs rel8), so reference code locations SYMBOLICALLY
instead: `&&label` address-takes become `leaq Lcf_*(%rip)` (un-relocated
PC-relative computes detected via new ArchAsmOps.pcrel_code_target; the
target gets a synthesized label), and switch jump-table entries become
`.quad Lcf_*` rather than `.quad fn+off` (absolute data pointers into an
executable section are re-pointed at a synthesized code label by the
extended collect_code_anchors + code_target_label). Gated to arches that
need it (x86_64); aarch64 is fixed-width and rv64 uses .option norvc, both
unchanged.
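
Why items 2 and 3 matter can be shown with a minimal sketch. It is not cfree
code: `pcrel_target`, `insn_start`, and `fixed_disp_drifts` are hypothetical
helpers, and an instruction stream is modeled only as an array of encoded
lengths. The sketch bakes a displacement from one layout, then checks whether
it still lands on the intended instruction after another assembler re-encodes
the stream with different lengths.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: resolve a PC-relative reference to a section offset.
 * The displacement is measured from the END of the referencing instruction,
 * as with x86-64 %rip-relative addressing. */
int64_t pcrel_target(uint32_t insn_off, uint32_t insn_len, int64_t disp) {
  return (int64_t)insn_off + (int64_t)insn_len + disp;
}

/* Hypothetical helper: offset at which instruction `idx` starts, given the
 * encoded length of every instruction before it. A label bound to that
 * instruction resolves to this offset in whatever layout is assembled. */
uint32_t insn_start(const uint8_t *lens, size_t idx) {
  uint32_t off = 0;
  for (size_t i = 0; i < idx; ++i) off += lens[i];
  return off;
}

/* Returns 1 if a displacement computed under `old_lens` no longer reaches
 * instruction `target_idx` once the stream is re-encoded as `new_lens`,
 * and 0 if the fixed displacement still agrees with the label. */
int fixed_disp_drifts(const uint8_t *old_lens, const uint8_t *new_lens,
                      size_t ref_idx, size_t target_idx) {
  int64_t old_end = (int64_t)insn_start(old_lens, ref_idx) + old_lens[ref_idx];
  int64_t disp = (int64_t)insn_start(old_lens, target_idx) - old_end;
  int64_t via_disp = pcrel_target(insn_start(new_lens, ref_idx),
                                  new_lens[ref_idx], disp);
  int64_t via_label = (int64_t)insn_start(new_lens, target_idx);
  return via_disp != via_label;
}
```

As an illustration, take lengths {7, 5, 3, 1} (a 7-byte `leaq disp32(%rip)`,
a 5-byte `jmp rel32`, two more instructions) and a re-encoding {7, 2, 3, 1}
where the jump became a 2-byte `jmp rel8`. With the reference at index 0 and
the target at index 3, the baked displacement still resolves to offset 15 while
the label has moved to offset 12, so `fixed_disp_drifts` returns 1. With
identical layouts it returns 0, which is the situation `.option norvc` and
`.option norelax` are meant to preserve on rv64.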
Verified green with no regressions: test-hostas-toy (312/0 both lanes),
test-toy (1338/0), test-asm (27/0), test-asm-x64 (13/0), test-asm-roundtrip
(572/0), test-asm-roundtrip-toy (624/0), test-asm-symmetry (no new asymmetry),
test-diff-llvm (agrees), test-link (122/0), test-elf (40/0), test-driver-ar.
Diffstat:
9 files changed, 363 insertions(+), 85 deletions(-)
diff --git a/doc/ASM_ROUNDTRIP_TESTING.md b/doc/ASM_ROUNDTRIP_TESTING.md
@@ -246,36 +246,33 @@ raw `exit_group` syscall.
Each target **self-skips** (never fails) unless the host has (1) a clang cross
target, (2) a runner (podman/qemu), (3) a working `cc -S | cfree as` round-trip
for that arch, and (4) a passing **bounded** exec smoke (so a wedged emulator
-downgrades to SKIP instead of hanging). Status:
-
-- **aarch64-linux**: green end-to-end (cfree-as 312/0, clang-as 312/0) — podman
- runs arm64 natively in its VM, so it's fast and the primary verified target.
-- **x86_64-linux**: the x64 `cc -S` symbolizer is complete — the aarch64
- symbolizer was arch-generalized (`ArchAsmOps.is_local_branch` for `jmp`/`jcc`,
- an x64 `reloc_operand` table for `sym(%rip)`/bare-`@PLT`/`@GOTPCREL` with a +4
- rel32 addend bias, operand-driven RIP surgery) and the `emit_data_range` data
- path now handles `R_PC32`/`R_PC64` (jump tables, global/array/fp/static-string
- data). `cc -S | cfree as` re-assembles AND **cross-EXECS the whole corpus
- correctly: cfree-as 312/312.** Byte-faithful 300/312 — the 12 are alloca/abi
- cases where the re-assembled encoding is execution-equivalent (e.g.
- `leaq (%rsp)` vs `leaq 0(%rsp)`). The clang lane is 301/11 (cfree emits AT&T
- text clang rejects). Opt-in (the global clang gate would fail on that residue).
-- **riscv64-linux**: the rv64 `cc -S` symbolizer landed — a new `ArchAsmOps` with
- `is_local_branch` (j/beq/...), a `reloc_operand` covering `%pcrel_hi`/
- `%pcrel_lo`/`%hi`/`%lo`, the `%pcrel_lo` AUIPC-anchor pairing (synthesized
- `.Lpcrel` labels via a new `ARCH_RELOC_SURG_RV_LO12` + `emit_anchor`/
- `ref_anchor`), and an `R_RISCV_CALL` AUIPC+JALR call-pair fusion to `call`/
- `tail`. `cc -S | cfree as` round-trips AND **cross-EXECS correctly: cfree-as
- 312/312** — the earlier self-call hang is gone. Byte-faithful 282/312 — the 30
- are tail-call cases where cfree codegen uses `t0` but the standard (and
- RAS-friendly) `tail` pseudo the assembler emits uses `t1`; execution-identical.
- The clang lane is 254/58 (rv64 data-symbolization syntax + bare-`fcvt`
- rounding-mode that clang encodes differently). Opt-in.
-- **Remaining (both arches): the third-party `clang` lane.** cfree's `cc -S` is
- faithfully re-assemblable and executable by cfree's own `as`, but not yet
- fully clang-standard for x64 (a few AT&T spellings) or rv64 (data
- `%`-operator syntax; bare-`fcvt` needs an explicit rounding-mode suffix). A
- standard-conformance follow-up; does not block the cfree `-S` path.
+downgrades to SKIP instead of hanging). All three ELF targets are in the gating
+default and pass **both** lanes — **936/936** = 312 cases × {O0,O1} × 3 arches,
+cfree-as **and** clang-as, judged purely by execution (matching exit code):
+
+- **aarch64-linux**: green end-to-end — podman runs arm64 natively in its VM, so
+ it's fast. Fixed-width encodings preserve cfree's instruction layout, so code
+ references need no special spelling.
+- **x86_64-linux**: cc -S references code locations **symbolically** so clang's
+ encoding choices (movabs vs mov-imm32, `jmp` rel32 vs rel8) can't shift a fixed
+ byte offset onto the wrong instruction: a `&&label` address-take is
+ `leaq Lcf_*(%rip)` (un-relocated PC-relative computes are detected via
+ `ArchAsmOps.pcrel_code_target`, the target gets a synthesized label), and
+ switch jump-table entries are `.quad Lcf_*` rather than `.quad fn+off`
+ (absolute data pointers into an executable section are re-pointed at a
+ synthesized code label by `collect_code_anchors` + `code_target_label`).
+- **riscv64-linux**: cc -S emits `.option norvc`/`.option norelax`
+ (`ArchAsmOps.file_prologue`) to pin cfree's fixed layout against clang's
+ C-extension compression — cfree computes some `&&label`/jump-table targets as
+ fixed offsets, which compression would otherwise shift — plus the
+ `%pcrel_hi`/`%pcrel_lo` AUIPC-anchor pairing and `R_RISCV_CALL` AUIPC+JALR
+ fusion to `call`/`tail`.
+
+Both lanes are judged by **execution**, never by bytes: cfree and clang emit
+different (execution-equivalent) code, so a byte/text match would be meaningless.
+The batched container runner caps each case at `EXEC_CASE_TIMEOUT` seconds
+(default 20), so a single hanging binary can't wedge the single-container run
+and leave every later case unscored.
Override the matrix with `CFREE_HOSTAS_CROSS_TARGETS="tag:triple ..."`, the
exec-smoke cap with `CFREE_HOSTAS_EXEC_TIMEOUT=<secs>`, and per-arch images with
diff --git a/src/api/asm_emit.c b/src/api/asm_emit.c
@@ -529,6 +529,8 @@ typedef struct {
u16 kind;
Sym sym;
i64 addend;
+ ObjSecId target_sec; /* section the reloc's symbol is defined in (or NONE) */
+ u64 target_val; /* the symbol's value (offset within target_sec) */
} SecReloc;
static int cmp_secreloc(const void* va, const void* vb) {
@@ -564,6 +566,8 @@ static SecReloc* collect_relocs(Compiler* c, ObjBuilder* ob, ObjSecId sec_id,
arr[n].kind = r->kind;
arr[n].sym = s ? s->name : (Sym)0;
arr[n].addend = r->addend;
+ arr[n].target_sec = s ? s->section_id : OBJ_SEC_NONE;
+ arr[n].target_val = s ? s->value : 0;
++n;
}
if (n > 1) qsort(arr, n, sizeof(SecReloc), cmp_secreloc);
@@ -909,12 +913,40 @@ static int is_btarget(const EmitCtx* x, u32 off) {
return 0;
}
-/* Pre-scan: collect in-section branch targets of un-relocated local branches. */
-static u32* collect_branch_targets(Compiler* c, ArchDisasm* dasm,
- const SecReloc* relocs, u32 nrelocs,
- const u8* data, u32 total, u32* n_out) {
+/* Append `off` to a dynamic, deduplicated anchor array (arena-grown). */
+static void anchor_add(Compiler* c, u32** arr, u32* n, u32* cap, u32 off) {
+ u32 j;
+ for (j = 0; j < *n; ++j)
+ if ((*arr)[j] == off) return;
+ if (*n == *cap) {
+ u32 nc = *cap ? *cap * 2 : 8;
+ u32* na = arena_array(c->tu, u32, nc);
+ if (!na) return;
+ if (*arr) memcpy(na, *arr, *cap * sizeof(u32));
+ *arr = na;
+ *cap = nc;
+ }
+ (*arr)[(*n)++] = off;
+}
+
+/* Pre-scan: offsets in section `sec_id` that need a synthesized Lcf_ label so a
+ * layout-dependent reference resolves symbolically through any assembler:
+ * 1. targets of un-relocated intra-section local branches (b/jmp/jcc);
+ * 2. targets of un-relocated PC-relative code-address-takes (x86-64 `leaq
+ * disp(%rip)` for `&&label`);
+ * 3. offsets targeted by an absolute data-pointer relocation living in a
+ * NON-executable section (switch jump-table `.quad fn+off` entries).
+ * (2)/(3) are exactly the references that break when the assembler picks
+ * different instruction lengths than cfree did, so they are collected only for
+ * arches that need symbolic code refs (x86-64); fixed-width (aarch64) or
+ * layout-pinned (RISC-V .option norvc) arches keep the compact offset forms. */
+static u32* collect_code_anchors(Compiler* c, ObjBuilder* ob, ObjSecId sec_id,
+ ArchDisasm* dasm, const SecReloc* relocs,
+ u32 nrelocs, const u8* data, u32 total,
+ u32* n_out) {
u32* arr = NULL;
u32 n = 0, cap = 0, off = 0;
+ int want_sym = arch_needs_symbolic_code_refs(c);
*n_out = 0;
while (off < total) {
@@ -925,30 +957,44 @@ static u32* collect_branch_targets(Compiler* c, ArchDisasm* dasm,
off += 1;
continue;
}
- if (!reloc_in_range(relocs, nrelocs, off, nb) &&
- arch_is_local_branch(c, insn.mnemonic) &&
- parse_hex_tail(insn.operands, &tgt) && tgt < total) {
- u32 j;
- int found = 0;
- for (j = 0; j < n; ++j)
- if (arr[j] == (u32)tgt) {
- found = 1;
- break;
- }
- if (!found) {
- if (n == cap) {
- u32 nc = cap ? cap * 2 : 8;
- u32* na = arena_array(c->tu, u32, nc);
- if (!na) break;
- if (arr) memcpy(na, arr, cap * sizeof(u32));
- arr = na;
- cap = nc;
+ if (!reloc_in_range(relocs, nrelocs, off, nb)) {
+ if (arch_is_local_branch(c, insn.mnemonic) &&
+ parse_hex_tail(insn.operands, &tgt) && tgt < total) {
+ anchor_add(c, &arr, &n, &cap, (u32)tgt);
+ } else if (want_sym) {
+ i64 disp;
+ if (arch_pcrel_code_target(c, insn.mnemonic, insn.operands, &disp)) {
+ i64 t = (i64)off + (i64)nb + disp;
+ if (t >= 0 && (u64)t < total)
+ anchor_add(c, &arr, &n, &cap, (u32)t);
}
- arr[n++] = (u32)tgt;
}
}
off += nb;
}
+
+ if (want_sym) {
+ u32 nr = obj_reloc_total(ob), i;
+ for (i = 0; i < nr; ++i) {
+ const Reloc* r = obj_reloc_at(ob, i);
+ const Section* host;
+ const ObjSym* s;
+ const char* dir;
+ u32 width;
+ int pcrel;
+ i64 t;
+ if (!r || r->removed) continue;
+ host = obj_section_get(ob, r->section_id);
+ if (!host || (host->flags & SF_EXEC)) continue; /* code reloc: skip */
+ if (!data_reloc_directive(r->kind, &dir, &width, &pcrel) || pcrel)
+ continue; /* only absolute data pointers (jump-table entries) */
+ s = obj_symbol_get(ob, r->sym);
+ if (!s || s->section_id != sec_id) continue;
+ t = (i64)s->value + r->addend;
+ if (t >= 0 && (u64)t < total) anchor_add(c, &arr, &n, &cap, (u32)t);
+ }
+ }
+
if (n > 1) qsort(arr, n, sizeof(u32), cmp_u32);
*n_out = n;
return arr;
@@ -998,10 +1044,55 @@ static CfreeStatus emit_operands(Writer* w, const EmitCtx* x,
return w_symbolized(w, insn->operands.s, insn->operands.len, name,
ARCH_RELOC_SURG_TAIL);
}
+ } else {
+ /* Un-relocated PC-relative code-address-take (x86-64 `leaq disp(%rip)` for
+ * `&&label`): rewrite the fixed displacement to the synthesized target
+ * label so an encoding-divergent assembler recomputes it. */
+ i64 disp;
+ if (arch_pcrel_code_target(x->c, insn->mnemonic, insn->operands, &disp)) {
+ i64 t = (i64)off + (i64)insn->nbytes + disp;
+ if (t >= 0 && is_btarget(x, (u32)t)) {
+ char name[256];
+ build_label_name(name, sizeof name, x, (u32)t);
+ return w_symbolized(w, insn->operands.s, insn->operands.len, name,
+ ARCH_RELOC_SURG_RIP);
+ }
+ }
}
return cfree_writer_write(w, insn->operands.s, insn->operands.len);
}
+/* Symbolic name for a code location (target_sec:target_off) referenced from a
+ * data directive: an assemblable label defined exactly there if one exists,
+ * else the synthesized `Lcf_<sec>_<off>` that collect_code_anchors guarantees
+ * is emitted in the target section. Mirrors the synth-vs-real choice the label
+ * emitter makes (symbol_at / build_label_name), so both ends agree. */
+static u32 code_target_label(char* buf, u32 cap, Compiler* c, ObjBuilder* ob,
+ ObjSecId target_sec, u32 target_off) {
+ ObjSymIter* it = obj_symiter_new(ob);
+ if (it) {
+ ObjSymEntry e;
+ while (obj_symiter_next(it, &e)) {
+ const ObjSym* s = e.sym;
+ Slice nm;
+ if (!s || s->removed || !s->name) continue;
+ if (s->section_id != target_sec || (u32)s->value != target_off) continue;
+ if (s->kind == SK_SECTION || s->kind == SK_FILE) continue;
+ nm = pool_slice(c->global, s->name);
+ if (slice_eq_cstr(nm, ".LpcrelHi")) continue;
+ if (sym_is_assemblable(nm)) {
+ u32 p = 0, j;
+ for (j = 0; j < nm.len && p + 1 < cap; ++j) buf[p++] = nm.s[j];
+ buf[p] = '\0';
+ obj_symiter_free(it);
+ return p;
+ }
+ }
+ obj_symiter_free(it);
+ }
+ return fmt_synth_label(buf, cap, (u32)target_sec, target_off);
+}
+
/* Emit a data range, rendering any covered relocation as a symbolic integer
* directive (`.quad sym+addend`) so cc -S | as reproduces the data relocation
* table — switch jump tables (R_ABS64 against the function) and any other
@@ -1009,9 +1100,9 @@ static CfreeStatus emit_operands(Writer* w, const EmitCtx* x,
* target the assembler can't spell, falls back to raw `.byte`; the dropped
* reloc then surfaces in the round-trip's reloc comparison. `relocs` is the
* section's relocation list, sorted by offset. */
-static CfreeStatus emit_data_range(Writer* w, Compiler* c, const u8* data,
- u32 start, u32 end, const SecReloc* relocs,
- u32 nrelocs) {
+static CfreeStatus emit_data_range(Writer* w, Compiler* c, ObjBuilder* ob,
+ const u8* data, u32 start, u32 end,
+ const SecReloc* relocs, u32 nrelocs) {
u32 off = start;
while (off < end) {
const SecReloc* r = NULL;
@@ -1037,6 +1128,33 @@ static CfreeStatus emit_data_range(Writer* w, Compiler* c, const u8* data,
* re-derives R_PC{32,64} instead of an absolute reloc. */
ArchRelocOperand bare = {ARCH_RELOC_SURG_NONE, "", "", 0, 0, 0};
if (data_reloc_directive(r->kind, &dir, &width, &pcrel) &&
+ off + width <= end) {
+ const Section* tsec = (r->target_sec != OBJ_SEC_NONE)
+ ? obj_section_get(ob, r->target_sec)
+ : NULL;
+ /* An absolute pointer into executable code (switch jump-table entry):
+ * spell it as a label that moves with the code rather than `fn+off`.
+ * After an encoding-divergent assembler re-lays-out the function, a
+ * fixed offset would point into the wrong instruction; a label is
+ * recomputed to the correct address. Only for arches that need it. */
+ if (!pcrel && tsec && (tsec->flags & SF_EXEC) &&
+ arch_needs_symbolic_code_refs(c)) {
+ char label[256];
+ u64 toff = r->target_val + (u64)r->addend;
+ CfreeStatus st;
+ code_target_label(label, sizeof label, c, ob, r->target_sec,
+ (u32)toff);
+ st = w_str(w, dir);
+ if (st != CFREE_OK) return st;
+ st = w_str(w, label);
+ if (st != CFREE_OK) return st;
+ st = w_newline(w);
+ if (st != CFREE_OK) return st;
+ off += width;
+ continue;
+ }
+ }
+ if (data_reloc_directive(r->kind, &dir, &width, &pcrel) &&
off + width <= end &&
build_symref(symref, sizeof symref, c, &bare, r->sym, r->addend) >=
0) {
@@ -1198,6 +1316,13 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder,
sx.c = c;
nsec = obj_section_count(ob);
+ /* Arch-specific leading directives (e.g. RISC-V `.option norvc` to pin
+ * cfree's fixed instruction layout against a compressing assembler). */
+ {
+ const char* prologue = arch_asm_file_prologue(c);
+ if (prologue) w_str(w, prologue);
+ }
+
for (i = 1; i < nsec; ++i) {
const Section* sec = obj_section_get(ob, (ObjSecId)i);
SymLabel* labels;
@@ -1243,8 +1368,8 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder,
buf_flatten(&sec->bytes, heap_data);
flat_data = heap_data;
if (dasm)
- btargets = collect_branch_targets(c, dasm, relocs, nrelocs, flat_data,
- total, &nbt);
+ btargets = collect_code_anchors(c, ob, (ObjSecId)i, dasm, relocs,
+ nrelocs, flat_data, total, &nbt);
}
} else if (total > 0 && sec->kind != SEC_BSS) {
Heap* heap = c->ctx->heap;
@@ -1297,7 +1422,7 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder,
} else if ((sec->flags & SF_EXEC) && dasm && flat_data) {
emit_disasm_range(w, &ctx, dasm, flat_data, off, next);
} else if (flat_data) {
- emit_data_range(w, c, flat_data, off, next, relocs, nrelocs);
+ emit_data_range(w, c, ob, flat_data, off, next, relocs, nrelocs);
}
off = next;
}
diff --git a/src/arch/arch.h b/src/arch/arch.h
@@ -244,6 +244,28 @@ typedef struct ArchAsmOps {
* pair fusion for the arch. */
int (*reloc_call_pair)(u16 reloc_kind, CfreeSlice pair_mnemonic,
CfreeSlice pair_ops, const char** mnemonic_out);
+ /* Arch-specific leading directives emitted at the very top of a cc -S file,
+ * before any section, returned as a NUL-terminated string the printer writes
+ * verbatim (or NULL). RISC-V returns "\t.option norvc\n\t.option norelax\n":
+ * cfree's codegen computes some PC-relative label / jump-table targets as
+ * fixed byte offsets that assume its own uncompressed, un-relaxed instruction
+ * stream, so a third-party assembler (clang) must be told not to compress or
+ * relax, or those offsets shift and the targets break. aarch64 is fixed-width
+ * and x86-64 uses symbolic code refs (pcrel_code_target) instead -> NULL. */
+ const char* (*file_prologue)(void);
+ /* 1 if (mnemonic, operands) is an un-relocated PC-relative reference to a
+ * code address computed as a fixed displacement — x86-64 `leaq disp(%rip),
+ * reg` emitted for a `&&label` address-take. Sets *disp_out to the signed
+ * byte displacement from the END of the instruction to the target. The
+ * symbolizer then synthesizes a label at (insn_end + disp) and rewrites the
+ * displacement to that label so a re-encoding assembler recomputes it.
+ * Providing this hook ALSO opts the arch into symbolic switch jump-table
+ * entries (.quad fn+off -> .quad <label>): both are needed precisely when the
+ * arch's assembler may pick different instruction lengths than cfree did
+ * (x86-64 movabs/mov-imm32, jmp rel32/rel8). Fixed-width arches (aarch64) and
+ * arches that pin layout another way (RISC-V .option norvc) leave it NULL. */
+ int (*pcrel_code_target)(CfreeSlice mnemonic, CfreeSlice operands,
+ i64* disp_out);
} ArchAsmOps;
typedef struct ArchImpl {
@@ -312,6 +334,23 @@ int arch_reloc_call_pair(const Compiler* c, u16 reloc_kind,
CfreeSlice pair_mnemonic, CfreeSlice pair_ops,
const char** mnemonic_out);
+/* Leading directive string for the top of a cc -S file for the compiler's
+ * target arch (e.g. RISC-V `.option norvc`), or NULL when the arch needs none.
+ * Thin dispatch over ArchAsmOps.file_prologue. */
+const char* arch_asm_file_prologue(const Compiler* c);
+
+/* 1 if `insn` is an un-relocated PC-relative code-address-take for the target
+ * arch, with *disp_out set to the signed displacement from the instruction end
+ * to the target. Thin dispatch over ArchAsmOps.pcrel_code_target. */
+int arch_pcrel_code_target(const Compiler* c, CfreeSlice mnemonic,
+ CfreeSlice operands, i64* disp_out);
+
+/* 1 if the target arch needs code locations referenced symbolically (by label)
+ * rather than as fixed byte offsets in cc -S — true exactly for arches that
+ * provide pcrel_code_target (x86-64). Drives both `&&label` address-take and
+ * switch jump-table symbolization. */
+int arch_needs_symbolic_code_refs(const Compiler* c);
+
ArchDisasm* arch_disasm_new(Compiler*);
u32 arch_disasm_decode(ArchDisasm*, const u8* bytes, size_t len, u64 vaddr,
CfreeInsn* out);
diff --git a/src/arch/registry.c b/src/arch/registry.c
@@ -110,6 +110,24 @@ int arch_reloc_call_pair(const Compiler* c, u16 reloc_kind,
mnemonic_out);
}
+const char* arch_asm_file_prologue(const Compiler* c) {
+ const ArchImpl* a = arch_for_compiler(c);
+ if (!a || !a->asm_ops || !a->asm_ops->file_prologue) return NULL;
+ return a->asm_ops->file_prologue();
+}
+
+int arch_pcrel_code_target(const Compiler* c, CfreeSlice mnemonic,
+ CfreeSlice operands, i64* disp_out) {
+ const ArchImpl* a = arch_for_compiler(c);
+ if (!a || !a->asm_ops || !a->asm_ops->pcrel_code_target) return 0;
+ return a->asm_ops->pcrel_code_target(mnemonic, operands, disp_out);
+}
+
+int arch_needs_symbolic_code_refs(const Compiler* c) {
+ const ArchImpl* a = arch_for_compiler(c);
+ return a && a->asm_ops && a->asm_ops->pcrel_code_target != NULL;
+}
+
const CGBackend* cg_backend_for_session(const Compiler* c,
const CfreeCodeOptions* opts) {
if (opts && opts->check_only) {
diff --git a/src/arch/rv64/asm.c b/src/arch/rv64/asm.c
@@ -1110,10 +1110,26 @@ static int rv64_reloc_call_pair(u16 kind, CfreeSlice pair_mnemonic,
return 0;
}
+/* RISC-V cc -S file prologue. cfree computes a few PC-relative targets as
+ * fixed byte offsets baked into the instruction stream rather than as symbolic
+ * relocations: a `&&label` address-of (auipc+addi with a hardcoded immediate,
+ * no reloc) and switch jump-table entries (`.quad fn+offset`). Both assume
+ * cfree's own 4-byte-per-instruction, un-relaxed layout. A standards-conformant
+ * assembler such as clang defaults to the C extension and would compress
+ * instructions (e.g. `mv`->`c.mv`), shifting every later offset and sending
+ * those targets to the wrong place. `.option norvc`/`.option norelax` pin the
+ * layout so cfree's offsets stay valid through any assembler — cfree's own
+ * codegen never emits compressed/relaxed forms, so this only constrains a
+ * third party to match what cfree already does. */
+static const char* rv64_file_prologue(void) {
+ return "\t.option norvc\n\t.option norelax\n";
+}
+
const ArchAsmOps rv64_asm_ops = {
.reloc_operand = rv64_reloc_operand,
.is_local_branch = rv64_is_local_branch,
.reloc_call_pair = rv64_reloc_call_pair,
+ .file_prologue = rv64_file_prologue,
};
ArchAsm* rv64_arch_asm_new(Compiler* c) {
diff --git a/src/arch/x64/asm.c b/src/arch/x64/asm.c
@@ -1637,9 +1637,75 @@ static int x64_is_local_branch(CfreeSlice m) {
return 0;
}
+/* Parse a leading signed integer (decimal or 0x-hex) from [s, s+len). Returns
+ * chars consumed and sets *out, or 0 if no integer starts here. */
+static u32 x64_parse_leading_int(const char* s, u32 len, i64* out) {
+ u32 i = 0, start;
+ int neg = 0;
+ i64 v = 0;
+ if (i < len && (s[i] == '+' || s[i] == '-')) {
+ neg = (s[i] == '-');
+ ++i;
+ }
+ if (i + 1 < len && s[i] == '0' && (s[i + 1] == 'x' || s[i + 1] == 'X')) {
+ i += 2;
+ start = i;
+ for (; i < len; ++i) {
+ char c = s[i];
+ if (c >= '0' && c <= '9')
+ v = v * 16 + (c - '0');
+ else if (c >= 'a' && c <= 'f')
+ v = v * 16 + (c - 'a' + 10);
+ else if (c >= 'A' && c <= 'F')
+ v = v * 16 + (c - 'A' + 10);
+ else
+ break;
+ }
+ } else {
+ start = i;
+ for (; i < len; ++i) {
+ char c = s[i];
+ if (c >= '0' && c <= '9')
+ v = v * 10 + (c - '0');
+ else
+ break;
+ }
+ }
+ if (i == start) return 0;
+ *out = neg ? -v : v;
+ return i;
+}
+
+/* x86-64 `&&label` address-take: an un-relocated `leaq <disp>(%rip), %reg`. The
+ * disassembler renders the resolved target as a fixed displacement from the
+ * next instruction (the %rip base); report it so the symbolizer can swap in a
+ * label that an encoding-divergent assembler will recompute correctly. */
+static int x64_pcrel_code_target(CfreeSlice mnemonic, CfreeSlice operands,
+ i64* disp_out) {
+ const char* o = operands.s;
+ u32 ol = operands.len, i, n;
+ i64 disp = 0;
+ int has_rip = 0;
+ if (!(mnemonic.len == 4 && memcmp(mnemonic.s, "leaq", 4) == 0) &&
+ !(mnemonic.len == 3 && memcmp(mnemonic.s, "lea", 3) == 0))
+ return 0;
+ for (i = 0; i + 6 <= ol; ++i)
+ if (memcmp(o + i, "(%rip)", 6) == 0) {
+ has_rip = 1;
+ break;
+ }
+ if (!has_rip) return 0;
+ n = x64_parse_leading_int(o, ol, &disp);
+ /* The displacement must sit immediately before `(%rip)`. */
+ if (n == 0 || !(n + 6 <= ol && memcmp(o + n, "(%rip)", 6) == 0)) return 0;
+ *disp_out = disp;
+ return 1;
+}
+
const ArchAsmOps x64_asm_ops = {
.reloc_operand = x64_reloc_operand,
.is_local_branch = x64_is_local_branch,
+ .pcrel_code_target = x64_pcrel_code_target,
};
ArchAsm* x64_arch_asm_new(Compiler* c) { return &x64_asm_open(c)->base; }
diff --git a/src/asm/asm.c b/src/asm/asm.c
@@ -1180,7 +1180,13 @@ static void do_directive(AsmDriver* d, Sym name) {
sym_eq(d, name, "subsections_via_symbols") || sym_eq(d, name, "macro") ||
sym_eq(d, name, "endm") || sym_eq(d, name, "if") ||
sym_eq(d, name, "endif") || sym_eq(d, name, "else") ||
- sym_eq(d, name, "include")) {
+ sym_eq(d, name, "include") ||
+ /* RISC-V `.option rvc/norvc/relax/norelax/push/pop/...`: cfree's own
+ * cc -S emits `.option norvc`/`.option norelax` to pin its fixed
+ * instruction layout (see rv64_file_prologue). cfree-as never compresses
+ * or relaxes, so it already honors these implicitly — accept and ignore
+ * rather than treat as an unknown directive. */
+ sym_eq(d, name, "option")) {
d_skip_to_eol(d);
return;
}
diff --git a/test/asm/hostas_cross.sh b/test/asm/hostas_cross.sh
@@ -27,23 +27,24 @@
# clang cross-compiler for it, (2) a runner (podman/qemu) per exec_target, (3) a
# working `cfree cc -S | cfree as` round-trip for that arch, and (4) a bounded
# exec smoke that returns the oracle. So the harness runs green on whatever the
-# host supports and self-extends as gaps close. Status at time of writing:
-# - aarch64-linux: works end-to-end (podman runs arm64 natively in its VM).
-# This is the gating default (312/312 both lanes).
-# - x86_64-linux: `cc -S | cfree as` round-trips and CROSS-EXECS the whole
-# corpus correctly (cfree-as 312/312). The clang lane has a
-# small residue (~11 efail: cfree emits AT&T text clang
-# rejects). Opt-in: the global clang gate (ENFORCE_CLANG=1)
-# would fail on that residue, so x64 isn't in the gating
-# default yet — run with CFREE_HOSTAS_CROSS_TARGETS, optionally
-# CFREE_HOSTAS_ENFORCE_CLANG=0.
-# - riscv64-linux: `cc -S | cfree as` round-trips and CROSS-EXECS correctly
-# (cfree-as 312/312) — the rv64 symbolizer (ArchAsmOps with
-# %pcrel_hi/%pcrel_lo anchor pairing + AUIPC/JALR call fusion)
-# landed; the earlier self-call hang is gone. The clang lane
-# has a larger residue (~58 efail: rv64 data-symbolization
-# syntax + bare-fcvt rounding-mode that clang encodes
-# differently). Opt-in, same as x64.
+# host supports and self-extends as gaps close. All three ELF targets now pass
+# BOTH lanes end-to-end (936/936 = 312 cases x {O0,O1} x 3 arches, ENFORCE_CLANG):
+# - aarch64-linux: podman runs arm64 natively in its VM; fixed-width encodings
+# keep cfree's layout, so code references need no special form.
+# - x86_64-linux: cc -S references code locations symbolically — `&&label`
+# address-takes (`leaq Lcf_*(%rip)`) and switch jump-table
+# entries (`.quad Lcf_*`) — so clang's encoding choices
+# (movabs vs mov-imm32, jmp rel32 vs rel8) can't shift a fixed
+# offset onto the wrong instruction. (ArchAsmOps.pcrel_code_target
+# + collect_code_anchors; see src/api/asm_emit.c.)
+# - riscv64-linux: cc -S emits `.option norvc`/`.option norelax` to pin cfree's
+# fixed instruction layout against clang's C-extension
+# compression, plus the %pcrel_hi/%pcrel_lo + AUIPC/JALR call
+# symbolizer.
+# Execution under qemu-user (x86_64/riscv64 in their podman containers) is the
+# sole judge — cfree and clang emit different code, so a byte/text match would be
+# meaningless. The batched runner caps each case (EXEC_CASE_TIMEOUT) so one
+# hanging binary can't wedge the whole container.
#
# Override the matrix with CFREE_HOSTAS_CROSS_TARGETS="tag:triple ..." and the
# clang-as gate with CFREE_HOSTAS_ENFORCE_CLANG=0 (demote lane B to XFAIL).
@@ -60,12 +61,11 @@ FILTER="${1:-}"
ENFORCE_CLANG="${CFREE_HOSTAS_ENFORCE_CLANG:-1}"
EXEC_SMOKE_TIMEOUT="${CFREE_HOSTAS_EXEC_TIMEOUT:-45}"
-# "tag:triple" — tag is exec_target.sh's <arch>-<os> spelling. The gating
-# default is the fully-verified target (aarch64-linux). x86_64 and riscv64 are
-# wired and opt-in (see the status notes above) — add them with
-# CFREE_HOSTAS_CROSS_TARGETS once you want to exercise their in-progress lanes:
-# CFREE_HOSTAS_CROSS_TARGETS="x64-linux:x86_64-linux-gnu rv64-linux:riscv64-linux-gnu"
-TARGETS="${CFREE_HOSTAS_CROSS_TARGETS:-aarch64-linux:aarch64-linux-gnu}"
+# "tag:triple" — tag is exec_target.sh's <arch>-<os> spelling. All three ELF
+# targets are in the gating default (each SKIPs cleanly if its clang cross
+# target or container runner is unavailable). Narrow the matrix with
+# CFREE_HOSTAS_CROSS_TARGETS, e.g. CFREE_HOSTAS_CROSS_TARGETS="x64-linux:x86_64-linux-gnu".
+TARGETS="${CFREE_HOSTAS_CROSS_TARGETS:-aarch64-linux:aarch64-linux-gnu x64-linux:x86_64-linux-gnu rv64-linux:riscv64-linux-gnu}"
# Same TLS-symbolization skip as the sibling lanes.
SKIP="141_threadlocal_mutate"
diff --git a/test/lib/exec_target.sh b/test/lib/exec_target.sh
@@ -287,7 +287,15 @@ _exec_target_flush_tag() {
echo "exec_target_flush: EXEC_TARGET_MOUNT_ROOT must be set" >&2
return 2
fi
- local platform image platform_flag=()
+ local platform image platform_flag=() case_to
+ # Per-case wall-clock cap inside the batched container. Without it a
+ # single hanging exe (e.g. a miscompiled loop, or qemu-user wedging on
+ # one binary) blocks the whole single-container run, leaving every
+ # later case with no .rc — which the caller reads back as 127 and
+ # reports as a mass failure. With it, a hang is killed (rc 137) and the
+ # loop moves on, so a real hang fails exactly one case. Override with
+ # EXEC_CASE_TIMEOUT (seconds); generous by default for slow TCG.
+ case_to="${EXEC_CASE_TIMEOUT:-20}"
platform="$(_exec_target_platform "$tag")"
image="$(_exec_target_image "$tag")"
if ! _exec_target_podman_native "$tag"; then
@@ -307,12 +315,15 @@ _exec_target_flush_tag() {
"${EXEC_TARGET_RCS[$k]}"
done
} | podman run -i --rm --pull=never "${platform_flag[@]}" --net=none \
+ -e EXEC_CASE_TIMEOUT="$case_to" \
-v "$EXEC_TARGET_MOUNT_ROOT":"$EXEC_TARGET_MOUNT_ROOT":Z \
"$image" \
/bin/sh -c '
set -u
+_to="${EXEC_CASE_TIMEOUT:-20}"
+if command -v timeout >/dev/null 2>&1; then _t="timeout -s KILL $_to"; else _t=""; fi
while IFS=" " read -r exe out err rc; do
- "$exe" >"$out" 2>"$err"
+ $_t "$exe" >"$out" 2>"$err"
echo $? >"$rc"
done
'