kit

git clone https://git.ryansepassi.com/git/kit.git

commit dad429ccbf9c02a28c8922d683832bff9ab12d2d
parent cf009085124f0bab052ea6f5201030ae727c7d5f
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 11 May 2026 07:56:21 -0700

asm: phase-1 test harness scaffolding (test/asm + cg path S)

Stand up the asm/disasm test runner before any compiler-side work, so
phases 2-4 can gate on real harness runs from their first commit.
Every smoke case carries a .skip sidecar today because parse_asm /
cfree_disasm_iter_* / cfree_obj_disasm are still stubs in
src/api/stubs.c; the harness wiring builds and runs on every CI pass.

- test/asm/harness/asm_runner.c: 5 modes (--encode, --decode,
  --listing, --emit, --jit) over the public cfree.h surface. The
  three text-output modes drive the goldens; --emit writes a .o so
  the J and E paths can reuse the test/link harness binaries, and
  --jit parses, JIT-links, and calls test_main in-process, so the
  D/J/E exec paths work the same way test/parse does.
- test/asm/run.sh: HTLDJE path matrix. H/T/L are text goldens
  (encode/decode/listing); D/J/E run the assembled code (direct JIT,
  jit-runner, qemu/podman exec) for encode/ cases that ship an
  .expected exit-code sidecar.
- test-asm target in test/test.mk, included in default 'test'.
- test/cg/run.sh recognizes S (asm roundtrip across cg-emitted bytes);
  S is opt-in this phase so the default DREJW matrix is unchanged.
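
The path matrix above can be illustrated with a small sketch. This is not code from test/asm/run.sh; the helper names `path_applies` and `applicable_paths` are hypothetical, and the rules are only those stated in this message: H/T/L each belong to one sub-corpus, and D/J/E apply to encode/ cases that ship an .expected sidecar.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not taken from test/asm/run.sh).
# path_applies CORPUS LETTER [HAS_EXPECTED]
#   CORPUS       encode | decode | listing
#   LETTER       one of H T L D J E
#   HAS_EXPECTED "yes" if the case ships a <name>.expected sidecar
path_applies() {
  local corpus="$1" letter="$2" has_expected="${3:-no}"
  case "$corpus:$letter" in
    encode:H | decode:T | listing:L) return 0 ;;
    encode:D | encode:J | encode:E) [ "$has_expected" = yes ] ;;
    *) return 1 ;;
  esac
}

# Print the letters of MATRIX that apply to one case.
applicable_paths() {
  local corpus="$1" matrix="$2" has_expected="${3:-no}" i letter out=""
  for ((i = 0; i < ${#matrix}; i++)); do
    letter="${matrix:i:1}"
    if path_applies "$corpus" "$letter" "$has_expected"; then
      out+="$letter"
    fi
  done
  printf '%s\n' "$out"
}

applicable_paths encode HTLDJE yes   # prints HDJE (exit_zero-style case)
applicable_paths decode HTLDJE       # prints T
applicable_paths listing HTLDJE      # prints L
```

Under these assumptions the three smoke cases in this commit give 4 + 1 + 1 = 6 case-path pairs, which appears consistent with the six skips reported.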

Per doc/ASM.md §5 phase 1; exit criteria met (make test-asm runs
end-to-end with 6 skips; cg path S reports SKIP cleanly).
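
A minimal sketch of the .skip sidecar convention, assuming the behavior described in the doc/ASM.md changes in this commit: `CFREE_TEST_ALLOW_SKIP` defaults to 1, and setting it to 0 surfaces skips as failures. The function name `check_skip` and the output format are hypothetical; the real logic lives in test/asm/run.sh and may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the actual test/asm/run.sh logic.
# check_skip BASE
#   BASE is the case path without extension, e.g. test/asm/encode/exit_zero.
#   Prints SKIP, FAIL, or RUN; returns nonzero only when a skip is disallowed.
check_skip() {
  local base="$1" reason
  if [ -f "$base.skip" ]; then
    # The sidecar holds a single-line reason.
    reason="$(head -n 1 "$base.skip")"
    if [ "${CFREE_TEST_ALLOW_SKIP:-1}" = 1 ]; then
      printf 'SKIP %s (%s)\n' "${base##*/}" "$reason"
      return 0
    fi
    printf 'FAIL %s (skip not allowed: %s)\n' "${base##*/}" "$reason"
    return 1
  fi
  printf 'RUN  %s\n' "${base##*/}"
}

# Demo against a throwaway directory rather than the real corpus.
demo_dir="$(mktemp -d)"
printf 'phase 1: parse_asm is still a stub\n' >"$demo_dir/exit_zero.skip"
check_skip "$demo_dir/exit_zero"
CFREE_TEST_ALLOW_SKIP=0 check_skip "$demo_dir/exit_zero" || true
check_skip "$demo_dir/no_sidecar"
rm -rf "$demo_dir"
```

Keeping the reason in the sidecar rather than in the runner means a case can be re-enabled by deleting one file, which matches the stated plan to drop the .skip files as each subsystem comes online.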

Diffstat:
M doc/ASM.md | 131 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------
A test/asm/CORPUS.md | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A test/asm/decode/nop_ret.expected.txt | 2 ++
A test/asm/decode/nop_ret.hex | 1 +
A test/asm/decode/nop_ret.skip | 1 +
A test/asm/encode/exit_zero.expected | 1 +
A test/asm/encode/exit_zero.expected.hex | 1 +
A test/asm/encode/exit_zero.s | 7 +++++++
A test/asm/encode/exit_zero.skip | 1 +
A test/asm/harness/asm_runner.c | 724 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A test/asm/listing/nop_ret.expected.lst | 5 +++++
A test/asm/listing/nop_ret.in.bin | 0
A test/asm/listing/nop_ret.skip | 1 +
A test/asm/regen.sh | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A test/asm/run.sh | 454 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M test/cg/run.sh | 27 ++++++++++++++++++++++++---
M test/test.mk | 16 ++++++++++++++--
17 files changed, 1507 insertions(+), 30 deletions(-)

diff --git a/doc/ASM.md b/doc/ASM.md @@ -297,34 +297,46 @@ encode/decode pairing as a mechanical refactor; phase 3 is the standalone assembler; phase 4 is inline asm + disasm overlay; phase 5 is the seam-rev for x64/rv64. -### Phase 1 — test harness +### Phase 1 — test harness (DONE) Stand up the runner before any compiler-side work. No `src/` changes. -1. New `test/asm/` peer of `test/parse/`. One `run.sh`; three - sub-corpora (§6). Skip-vs-fail follows the `CFREE_TEST_ALLOW_SKIP` - convention used elsewhere — every case skips cleanly today because - `parse_asm` and `cfree_disasm_iter_*` are stubs. -2. New `test-asm` target in `test/test.mk`; added to the default - `test` list. -3. Add `S` (asm-roundtrip) path letter to `test/cg/run.sh`. Plumbed - to walk every `.text` byte of each cg-emitted aarch64 binary - through `cfree_disasm_iter_*` → `cfree as` → byte-compare. Skips - today; turns green when phases 3+4 land. Path matrix becomes - `DREJWS`. -4. Smoke goldens checked in for one case per sub-corpus. Generated - from the host `as` / `objdump` once and committed; a `regen.sh` - documents how to refresh but is not run by default (same - convention as `test/elf/normalize.py`). -5. New runner C binary `asm-runner` under `test/asm/harness/` — - peer of `cg-runner`. Three sub-commands: `--encode`, `--decode`, - `--listing`. Dispatches to `cfree as` / `cfree_disasm_iter_*` / - `cfree_obj_disasm` per case. - -Exit criterion: `make test-asm` runs end-to-end; smoke cases report -SKIP under `CFREE_TEST_ALLOW_SKIP=1` (the default during phase 1) and -the harness wiring is exercised on every CI run. `test/cg/run.sh -S` -also reports SKIP cleanly. No green asm cases yet — that's phase 3. +- [x] New `test/asm/` peer of `test/parse/`. One `run.sh`; three + sub-corpora (`encode/`, `decode/`, `listing/`). Skip-vs-fail follows + the `CFREE_TEST_ALLOW_SKIP` convention used elsewhere — every case + skips cleanly today because `parse_asm` and `cfree_disasm_iter_*` + are stubs. 
`CFREE_TEST_ALLOW_SKIP` defaults to `1` in the asm + harness for the duration of phase 1; flip to `0` once the assembler + and disasm iterator are real. +- [x] New `test-asm` target in `test/test.mk`; added to the default + `test` list. +- [x] Add `S` (asm-roundtrip) path letter to `test/cg/run.sh`. Skips + today; turns green when phases 3+4 land. Recognized in the path + matrix; `S` is opt-in (`run.sh '' S` or + `CFREE_TEST_PATHS=DREJWS`) until phase 4 lands, so the default + `DREJW` continues to gate CI cleanly. Becomes part of the default + matrix in phase 4. +- [x] Smoke goldens checked in for one case per sub-corpus. A + `test/asm/regen.sh` documents how to refresh them from the host + `clang --target=aarch64-linux-gnu` / `llvm-objdump`; it is committed + as a maintainer aid and is not run by CI (same convention as + `test/elf/normalize.py`). +- [x] New runner C binary `asm-runner` under `test/asm/harness/` — + peer of `parse-runner`. Five sub-commands: `--encode`, `--decode`, + `--listing`, `--emit`, `--jit`. The first three dispatch to + `cfree_compile_obj_emit(CFREE_LANG_ASM)` / `cfree_disasm_iter_*` / + `cfree_obj_disasm`; `--emit` writes a `.o` to disk so the J and E + exec paths can reuse the `test/link` harness binaries; `--jit` + parses + JIT-links and calls `test_main`. +- [x] Path matrix for `test/asm/run.sh`: `HTLDJE`. `H` hex encode, + `T` text decode, `L` listing, `D` direct JIT, `J` jit-via-file, + `E` ELF exec under qemu/podman. D/J/E only run on `encode/` + cases with an `<name>.expected` exit-code sidecar. + +Exit criterion (met): `make test-asm` runs end-to-end; the three smoke +cases report SKIP for every path they apply to and the harness +wiring is exercised on every CI run. `bash test/cg/run.sh '' S` also +reports SKIP cleanly. No green asm cases yet — that's phase 3. ### Phase 2 — finish the ISA descriptor table @@ -485,3 +497,72 @@ multi-call dispatch is exercised end-to-end. C11-freestanding-writable. 
`parse_asm.c` and `aa64_asm.c` follow the same rule. No reliance on a host assembler at build time *for the compiler*; `rt/` still uses clang and is on its own bootstrap track. + +--- + +## 8. Running the tests + +``` +make test-asm # full asm harness: all paths, all sub-corpora +make test # includes test-asm in the default suite +``` + +The harness lives in `test/asm/`. See `test/asm/CORPUS.md` for the +sub-corpus layout and `test/asm/regen.sh` for golden refresh. + +### Filtering and path selection + +``` +bash test/asm/run.sh # default: every case, HTLDJE +bash test/asm/run.sh nop # name substring filter +bash test/asm/run.sh '' HT # only H (hex encode) + T (decode) +bash test/asm/run.sh exit_zero DJE # exec paths for one case +CFREE_TEST_FILTER=nop CFREE_TEST_PATHS=L bash test/asm/run.sh +``` + +Path letters: + +| letter | path | input | check | +|--------|------------------|--------------|--------------------------------| +| `H` | Hex encode | `encode/*.s` | `--encode` → diff `.expected.hex` | +| `T` | Text decode | `decode/*.hex` | `--decode` → diff `.expected.txt` | +| `L` | Listing | `listing/*.in.bin` | `--listing` → diff `.expected.lst` | +| `D` | Direct JIT | `encode/*.s` (with `.expected` exit) | `--jit` → exit code | +| `J` | JIT via file | `encode/*.s` (with `.expected` exit) | `--emit` + `jit-runner` | +| `E` | ELF exec | `encode/*.s` (with `.expected` exit) | `--emit` + `link-exe-runner` + qemu/podman | + +D and J need the host arch to match `CFREE_TEST_ARCH` (no cross-JIT); +E uses qemu/podman per `test/lib/exec_target.sh` and is cross-host +friendly. + +### Skips during phase 1 + +Every smoke case carries a `<name>.skip` sidecar because `parse_asm` / +`cfree_disasm_iter_*` / `cfree_obj_disasm` are still stubs. The +harness defaults `CFREE_TEST_ALLOW_SKIP=1` so the suite passes; set +`CFREE_TEST_ALLOW_SKIP=0` to surface the skips as failures +(`make test-asm CFREE_TEST_ALLOW_SKIP=0`). 
Drop the `.skip` files as +each subsystem comes online — the goldens are already in place. + +### Cross-target + +``` +CFREE_TEST_ARCH=aa64 bash test/asm/run.sh # default +CFREE_TEST_ARCH=x64 bash test/asm/run.sh # x64 lane (no green cases yet) +CFREE_TEST_ARCH=rv64 bash test/asm/run.sh # rv64 lane +``` + +### The `S` path on `test/cg/run.sh` + +`S` (asm roundtrip across every cg-emitted aarch64 binary) is +recognized but opt-in this phase — the default cg matrix stays +`DREJW`. Run it explicitly: + +``` +bash test/cg/run.sh '' DREJWS # full matrix incl. S +bash test/cg/run.sh '' S # just S +``` + +Today every `S` invocation reports SKIP with reason +"phase 1: cfree_disasm_iter_* / parse_asm are stubs". Becomes part of +the default cg matrix once phase 4 lands. diff --git a/test/asm/CORPUS.md b/test/asm/CORPUS.md @@ -0,0 +1,76 @@ +# test/asm — assembler / disassembler corpus + +File-driven test harness for `cfree`'s asm front end and matching +disassembler. Companion to `doc/ASM.md`; see phase 1 there for the +current state. + +## Layout + +Three sub-directories, one per static-output `asm-runner` mode. The +shell driver (`run.sh`) walks them by suffix. + +| sub-dir | input | goldens | drives | +|---------------|---------------------|------------------------|--------------| +| `encode/` | `<name>.s` | `<name>.expected.hex` (H) | H, D, J, E | +| | | `<name>.expected` (D/J/E exit) | | +| `decode/` | `<name>.hex` | `<name>.expected.txt` | T | +| `listing/` | `<name>.in.bin` | `<name>.expected.lst` | L | + +## Paths + +Default matrix `HTLDJE`: + +- `H` — Hex encode roundtrip. `asm-runner --encode IN.s` dumps `.text` + bytes as lowercase hex; diff vs `<name>.expected.hex`. Drives the + encoder against a known-good byte sequence. +- `T` — Text decode. `asm-runner --decode IN.hex` walks the bytes + through `cfree_disasm_iter_*`, one instruction per line; diff vs + `<name>.expected.txt`. +- `L` — Listing. 
`asm-runner --listing IN.in.bin` runs + `cfree_obj_disasm` against the ELF; diff vs `<name>.expected.lst`. +- `D` — Direct JIT. `asm-runner --jit IN.s` parses, JIT-links, and + calls `test_main`; exit code matches `<name>.expected`. No file I/O. + Host arch must match the cross-target. +- `J` — JIT via file. `asm-runner --emit` + `jit-runner` on the + resulting `.o`. Same expected exit code as D. Exercises elf_emit / + elf_read fidelity on asm-produced objects. +- `E` — ELF exec. `asm-runner --emit` + `link-exe-runner` + start.o, + run under qemu/podman per `test/lib/exec_target.sh`. Cross-host + friendly. + +Cases without a `<name>.expected` sidecar skip D/J/E silently — the +encode/decode/listing goldens stand alone and don't always make sense +to execute (e.g. `nop; ret` falls off the end of `_start`). + +The encode case's `.s` file must define a `test_main` global for D/J/E +to be meaningful — start.c calls `test_main` and uses its return value +as the exit code, mirroring the test/parse and test/cg conventions. + +## Sidecars + +- `<name>.expected` — integer exit code; consumed by D/J/E. Absent + → those paths skip. +- `<name>.expected.hex` — golden bytes for H. +- `<name>.expected.txt` — golden decode text for T. +- `<name>.expected.lst` — golden listing for L. +- `<name>.skip` — single-line reason. The case is reported as + SKIP for every path it applies to. Every phase-1 case carries one + because the underlying APIs (`parse_asm`, `cfree_disasm_iter_*`, + `cfree_obj_disasm`) are still stubs. They drop as the matching + subsystems land. + +## Goldens + +Goldens are checked in. `regen.sh` documents how to refresh them from +the host `as` / `objdump`; it is committed as a maintainer aid and not +run by CI. The phase-1 smoke set is intentionally tiny — one case per +sub-dir — because nothing exercises the goldens yet. Coverage expands +alongside phases 3 and 4 in `doc/ASM.md §5`. + +## Naming convention + +Cases are named `<concept>[_<variant>]` (e.g. 
`exit_zero`, `nop_ret`, +`mov_imm_w0`). When a case targets a specific encoding family, prefix +with the AArch64 spec section ID (`6_4_2_03_*`) so the corpus stays +grouped by spec topic, matching the pattern already used in +`test/parse/cases/`. diff --git a/test/asm/decode/nop_ret.expected.txt b/test/asm/decode/nop_ret.expected.txt @@ -0,0 +1,2 @@ +0: nop +4: ret diff --git a/test/asm/decode/nop_ret.hex b/test/asm/decode/nop_ret.hex @@ -0,0 +1 @@ +1f2003d5c0035fd6 diff --git a/test/asm/decode/nop_ret.skip b/test/asm/decode/nop_ret.skip @@ -0,0 +1 @@ +phase 1: cfree_disasm_iter_* is still a stub (src/api/stubs.c) diff --git a/test/asm/encode/exit_zero.expected b/test/asm/encode/exit_zero.expected @@ -0,0 +1 @@ +0 diff --git a/test/asm/encode/exit_zero.expected.hex b/test/asm/encode/exit_zero.expected.hex @@ -0,0 +1 @@ +00008052c0035fd6 diff --git a/test/asm/encode/exit_zero.s b/test/asm/encode/exit_zero.s @@ -0,0 +1,7 @@ +// Smoke case: defines test_main returning 0. Shared by path H (hex +// encode roundtrip) and paths D/J/E (exec). +.text +.globl test_main +test_main: + mov w0, #0 + ret diff --git a/test/asm/encode/exit_zero.skip b/test/asm/encode/exit_zero.skip @@ -0,0 +1 @@ +phase 1: parse_asm is still a stub (src/api/stubs.c) diff --git a/test/asm/harness/asm_runner.c b/test/asm/harness/asm_runner.c @@ -0,0 +1,724 @@ +/* asm-runner — file-driven assembler/disassembler test runner. + * + * asm-runner --encode IN.s OUT.hex # cfree as -> raw .text bytes (hex) + * asm-runner --decode IN.hex OUT.txt # cfree_disasm_iter_* over bytes + * asm-runner --listing IN.bin OUT.txt # cfree_obj_disasm over an ELF + * asm-runner --emit IN.s OUT.o # cfree_compile_obj_emit -> ELF .o + * asm-runner --jit IN.s # parse + link_jit -> test_main() + * + * Exclusively uses the public cfree.h surface (same path real driver + * consumers take). Built once; the shell runner walks the sub-corpora + * and invokes one mode per case-path pair. 
+ * + * Phase 1: parse_asm and the disasm iterator are still stubs in + * src/api/stubs.c. The runner returns nonzero when the underlying API + * fails; smoke cases each carry a .skip sidecar so the harness reports + * them cleanly until phases 3 and 4 land. + * + * The execmem (W^X) boilerplate mirrors test/parse/harness/parse_runner.c + * — strict dual-mapping on Apple/Linux, single mapping elsewhere. Only + * --jit exercises it. */ + +#include <cfree.h> +#include <ctype.h> +#include <fcntl.h> +#include <stdarg.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/mman.h> +#include <sys/stat.h> +#include <unistd.h> + +#include "lib/cfree_test_target.h" + +/* ---- env: heap, diag ---- */ + +static void* h_alloc(CfreeHeap* h, size_t n, size_t a) { + (void)h; + (void)a; + return n ? malloc(n) : NULL; +} +static void* h_realloc(CfreeHeap* h, void* p, size_t o, size_t n, size_t a) { + (void)h; + (void)o; + (void)a; + return realloc(p, n); +} +static void h_free(CfreeHeap* h, void* p, size_t n) { + (void)h; + (void)n; + free(p); +} +static CfreeHeap g_heap = {h_alloc, h_realloc, h_free, NULL}; + +static void diag_emit(CfreeDiagSink* s, CfreeDiagKind k, CfreeSrcLoc loc, + const char* fmt, va_list ap) { + static const char* names[] = {"note", "warning", "error", "fatal"}; + (void)s; + fprintf(stderr, "[%u]:%u:%u: %s: ", loc.file_id, loc.line, loc.col, names[k]); + vfprintf(stderr, fmt, ap); + fputc('\n', stderr); +} +static CfreeDiagSink g_diag = {diag_emit, NULL, 0, 0}; + +/* ---- env: execmem (W^X) — copied verbatim from parse_runner.c. Only the + * --jit mode actually exercises it; the other modes never touch execmem, + * but the env is shared. 
*/ + +#if defined(__APPLE__) +#include <mach/mach.h> +#include <mach/mach_vm.h> +#define XM_DUAL_APPLE 1 +#else +#define XM_DUAL_APPLE 0 +#endif +#if defined(__linux__) +#include <sys/syscall.h> +#define XM_DUAL_LINUX 1 +#else +#define XM_DUAL_LINUX 0 +#endif + +static int xm_to_posix(int p) { + int q = 0; + if (p & CFREE_PROT_READ) q |= PROT_READ; + if (p & CFREE_PROT_WRITE) q |= PROT_WRITE; + if (p & CFREE_PROT_EXEC) q |= PROT_EXEC; + return q; +} +typedef struct XmTok { + void* w; + void* r; + size_t n; +} XmTok; +static int xm_reserve_single(size_t n, CfreeExecMemRegion* out) { + void* p = + mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); + if (p == MAP_FAILED) return 1; + out->write = out->runtime = p; + out->size = n; + out->token = NULL; + return 0; +} +static int xm_reserve(void* u, size_t n, int p, CfreeExecMemRegion* out) { + (void)u; + if (!out || !n) return 1; + if (!(p & CFREE_PROT_EXEC)) return xm_reserve_single(n, out); +#if XM_DUAL_APPLE + { + void* w = + mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); + mach_vm_address_t r = 0; + vm_prot_t cur = 0, max = 0; + XmTok* tok; + if (w == MAP_FAILED) return 1; + if (mach_vm_remap(mach_task_self(), &r, (mach_vm_size_t)n, 0, + VM_FLAGS_ANYWHERE, mach_task_self(), + (mach_vm_address_t)(uintptr_t)w, FALSE, &cur, &max, + VM_INHERIT_NONE) != KERN_SUCCESS) { + munmap(w, n); + return 1; + } + if (mprotect((void*)(uintptr_t)r, n, PROT_READ) != 0) { + munmap((void*)(uintptr_t)r, n); + munmap(w, n); + return 1; + } + tok = (XmTok*)malloc(sizeof(*tok)); + if (!tok) { + munmap((void*)(uintptr_t)r, n); + munmap(w, n); + return 1; + } + tok->w = w; + tok->r = (void*)(uintptr_t)r; + tok->n = n; + out->write = w; + out->runtime = (void*)(uintptr_t)r; + out->size = n; + out->token = tok; + return 0; + } +#elif XM_DUAL_LINUX + { + int fd = (int)syscall(SYS_memfd_create, "cfree-asm-jit", 0u); + void *w, *r; + XmTok* tok; + if (fd < 0) return 1; + if (ftruncate(fd, (off_t)n) != 0) { 
+ close(fd); + return 1; + } + w = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (w == MAP_FAILED) { + close(fd); + return 1; + } + r = mmap(NULL, n, PROT_READ, MAP_SHARED, fd, 0); + close(fd); + if (r == MAP_FAILED) { + munmap(w, n); + return 1; + } + tok = (XmTok*)malloc(sizeof(*tok)); + if (!tok) { + munmap(r, n); + munmap(w, n); + return 1; + } + tok->w = w; + tok->r = r; + tok->n = n; + out->write = w; + out->runtime = r; + out->size = n; + out->token = tok; + return 0; + } +#else + return xm_reserve_single(n, out); +#endif +} +static int xm_protect(void* u, void* a, size_t n, int p) { + (void)u; + return mprotect(a, n, xm_to_posix(p)); +} +static void xm_release(void* u, CfreeExecMemRegion* region) { + (void)u; + if (!region || !region->size) return; + if (region->token) { + XmTok* tok = (XmTok*)region->token; + if (tok->r && tok->r != tok->w) munmap(tok->r, tok->n); + if (tok->w) munmap(tok->w, tok->n); + free(tok); + } else if (region->write) { + munmap(region->write, region->size); + } + region->write = region->runtime = NULL; + region->size = 0; + region->token = NULL; +} +static void xm_flush(void* u, void* a, size_t n) { + (void)u; +#if defined(__aarch64__) || defined(__arm__) + __builtin___clear_cache((char*)a, (char*)a + n); +#else + (void)a; + (void)n; +#endif +} +static CfreeExecMem g_execmem = { + 16 * 1024, xm_reserve, xm_protect, xm_release, xm_flush, NULL, +}; + +static void env_init(CfreeEnv* env) { + memset(env, 0, sizeof *env); + env->heap = &g_heap; + env->diag = &g_diag; + env->execmem = &g_execmem; + env->now = -1; +} + +static void target_from_env(CfreeTarget* t) { + if (cfree_test_target_init(t) != 0) { + fprintf(stderr, "asm-runner: cfree_test_target_init failed\n"); + exit(2); + } +} + +/* ---- file helpers ---- */ + +static int read_file(const char* path, uint8_t** out, size_t* out_len) { + FILE* f = fopen(path, "rb"); + long n; + uint8_t* buf; + size_t got; + if (!f) return 1; + if (fseek(f, 0, SEEK_END) != 0) { + 
fclose(f); + return 1; + } + n = ftell(f); + if (n < 0 || fseek(f, 0, SEEK_SET) != 0) { + fclose(f); + return 1; + } + buf = (uint8_t*)malloc((size_t)n + 1); + if (!buf) { + fclose(f); + return 1; + } + got = fread(buf, 1, (size_t)n, f); + fclose(f); + if (got != (size_t)n) { + free(buf); + return 1; + } + buf[n] = 0; + *out = buf; + *out_len = (size_t)n; + return 0; +} + +static int write_all(const char* path, const uint8_t* data, size_t len) { + int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644); + size_t off = 0; + if (fd < 0) { + perror(path); + return 1; + } + while (off < len) { + ssize_t k = write(fd, data + off, len - off); + if (k <= 0) { + perror("write"); + close(fd); + return 1; + } + off += (size_t)k; + } + close(fd); + return 0; +} + +/* Decode an ASCII hex stream — whitespace and a leading "0x" per token are + * tolerated. Used by --decode to read the .hex input fixture. */ +static int hex_decode(const uint8_t* in, size_t in_len, uint8_t** out, + size_t* out_len) { + uint8_t* buf = (uint8_t*)malloc(in_len / 2 + 1); + size_t n = 0; + int hi = -1; + size_t i; + if (!buf) return 1; + for (i = 0; i < in_len; ++i) { + int c = in[i]; + int v; + if (isspace(c)) continue; + if (c == '0' && i + 1 < in_len && (in[i + 1] == 'x' || in[i + 1] == 'X')) { + ++i; + continue; + } + if (c >= '0' && c <= '9') v = c - '0'; + else if (c >= 'a' && c <= 'f') v = 10 + c - 'a'; + else if (c >= 'A' && c <= 'F') v = 10 + c - 'A'; + else { + free(buf); + fprintf(stderr, "asm-runner: bad hex byte 0x%02x\n", c); + return 1; + } + if (hi < 0) { + hi = v; + } else { + buf[n++] = (uint8_t)((hi << 4) | v); + hi = -1; + } + } + if (hi >= 0) { + free(buf); + fprintf(stderr, "asm-runner: odd hex nibble count\n"); + return 1; + } + *out = buf; + *out_len = n; + return 0; +} + +/* ---- modes ---- */ + +/* --encode: compile a .s through CFREE_LANG_ASM, then walk the result via + * the public Obj reader to dump the raw bytes of every PROGBITS section + * marked executable. 
Output is lowercase hex, no separators. The .text- + * only choice keeps the golden simple for phase-1 smoke; multi-section + * cases will pivot to per-section dumps once the parser lands. */ +static int mode_encode(const char* src_path, const char* out_path) { + uint8_t* src = NULL; + size_t src_len = 0; + CfreeTarget tgt; + CfreeEnv env; + CfreeCompiler* c; + CfreeBytesInput in; + CfreeCompileOptions opts; + CfreeWriter* w; + const uint8_t* obj_bytes; + size_t obj_len = 0; + CfreeObjFile* of; + CfreeBytesInput obj_in; + uint32_t nsec, i; + uint8_t* hex = NULL; + size_t hex_len = 0; + int rc = 0; + + if (read_file(src_path, &src, &src_len)) { + fprintf(stderr, "asm-runner: cannot read %s\n", src_path); + return 2; + } + target_from_env(&tgt); + env_init(&env); + c = cfree_compiler_new(tgt, &env); + if (!c) { + free(src); + return 2; + } + + memset(&in, 0, sizeof in); + in.name = src_path; + in.data = src; + in.len = src_len; + in.lang = CFREE_LANG_ASM; + memset(&opts, 0, sizeof opts); + + w = cfree_writer_mem(&g_heap); + if (cfree_compile_obj_emit(c, &opts, &in, w) != 0) { + cfree_writer_close(w); + cfree_compiler_free(c); + free(src); + return 1; + } + obj_bytes = cfree_writer_mem_bytes(w, &obj_len); + + memset(&obj_in, 0, sizeof obj_in); + obj_in.name = src_path; + obj_in.data = obj_bytes; + obj_in.len = obj_len; + of = cfree_obj_open(&env, &obj_in); + if (!of) { + cfree_writer_close(w); + cfree_compiler_free(c); + free(src); + return 1; + } + + nsec = cfree_obj_nsections(of); + for (i = 0; i < nsec; ++i) { + CfreeObjSecInfo s = cfree_obj_section(of, i); + size_t n = 0; + const uint8_t* data; + size_t j; + char* p; + if (s.kind != CFREE_SEC_TEXT) continue; + data = cfree_obj_section_data(of, i, &n); + if (!data || !n) continue; + p = (char*)realloc(hex, hex_len + n * 2 + 1); + if (!p) { + rc = 1; + break; + } + hex = (uint8_t*)p; + for (j = 0; j < n; ++j) { + static const char H[] = "0123456789abcdef"; + hex[hex_len + 2 * j + 0] = (uint8_t)H[data[j] >> 4]; + 
hex[hex_len + 2 * j + 1] = (uint8_t)H[data[j] & 0xf]; + } + hex_len += n * 2; + } + if (rc == 0 && hex_len > 0) { + /* Trailing newline so the golden file has a final \n (diff-friendly). */ + char* p = (char*)realloc(hex, hex_len + 1); + if (!p) { + rc = 1; + } else { + hex = (uint8_t*)p; + hex[hex_len++] = '\n'; + rc = write_all(out_path, hex, hex_len); + } + } else if (rc == 0) { + /* No text — emit an empty file so the diff still has a target. */ + rc = write_all(out_path, (const uint8_t*)"", 0); + } + + free(hex); + cfree_obj_close(of); + cfree_writer_close(w); + cfree_compiler_free(c); + free(src); + return rc; +} + +/* --decode: read hex bytes and walk them through cfree_disasm_iter_*. + * Output is one instruction per line: "<vaddr-hex>:\t<mnemonic>\t<operands>" + * (annotation appended when non-empty, prefixed by " ; "). vaddr starts at + * 0; the caller can post-process if they want addresses relative to a + * specific section. */ +static int mode_decode(const char* in_path, const char* out_path) { + uint8_t* raw = NULL; + size_t raw_len = 0; + uint8_t* bytes = NULL; + size_t nbytes = 0; + CfreeTarget tgt; + CfreeEnv env; + CfreeCompiler* c; + CfreeDisasmIter* it; + FILE* out; + CfreeInsn ins; + int rc = 0; + + if (read_file(in_path, &raw, &raw_len)) { + fprintf(stderr, "asm-runner: cannot read %s\n", in_path); + return 2; + } + if (hex_decode(raw, raw_len, &bytes, &nbytes)) { + free(raw); + return 2; + } + free(raw); + + target_from_env(&tgt); + env_init(&env); + c = cfree_compiler_new(tgt, &env); + if (!c) { + free(bytes); + return 2; + } + + it = cfree_disasm_iter_new(c, bytes, nbytes, 0, NULL); + if (!it) { + cfree_compiler_free(c); + free(bytes); + return 1; + } + + out = fopen(out_path, "wb"); + if (!out) { + perror(out_path); + cfree_disasm_iter_free(it); + cfree_compiler_free(c); + free(bytes); + return 2; + } + + while (cfree_disasm_iter_next(it, &ins)) { + fprintf(out, "%llx:\t%s\t%s", + (unsigned long long)ins.vaddr, + ins.mnemonic ? 
ins.mnemonic : "", + ins.operands ? ins.operands : ""); + if (ins.annotation && ins.annotation[0]) { + fprintf(out, " ; %s", ins.annotation); + } + fputc('\n', out); + } + if (fclose(out) != 0) rc = 1; + + cfree_disasm_iter_free(it); + cfree_compiler_free(c); + free(bytes); + return rc; +} + +/* --listing: walk a relocatable ELF and run cfree_obj_disasm into a + * mem-Writer, then dump to the output path. Annotation overlay comes from + * the ELF's own sym/reloc tables; the harness just stores whatever the + * library emits. */ +static int mode_listing(const char* in_path, const char* out_path) { + uint8_t* bytes = NULL; + size_t nbytes = 0; + CfreeTarget tgt; + CfreeEnv env; + CfreeCompiler* c; + CfreeBytesInput in; + CfreeWriter* w; + const uint8_t* data; + size_t len = 0; + int rc; + + if (read_file(in_path, &bytes, &nbytes)) { + fprintf(stderr, "asm-runner: cannot read %s\n", in_path); + return 2; + } + target_from_env(&tgt); + env_init(&env); + c = cfree_compiler_new(tgt, &env); + if (!c) { + free(bytes); + return 2; + } + + memset(&in, 0, sizeof in); + in.name = in_path; + in.data = bytes; + in.len = nbytes; + in.lang = CFREE_LANG_C; /* unused for obj_disasm but keep the field set */ + + w = cfree_writer_mem(&g_heap); + if (cfree_obj_disasm(c, &in, w) != 0) { + cfree_writer_close(w); + cfree_compiler_free(c); + free(bytes); + return 1; + } + data = cfree_writer_mem_bytes(w, &len); + rc = write_all(out_path, data, len); + + cfree_writer_close(w); + cfree_compiler_free(c); + free(bytes); + return rc; +} + +/* --emit: assemble .s to a relocatable ELF .o on disk. Mirrors + * parse_runner's --emit so the path-J/E shell plumbing (link-exe-runner / + * jit-runner) can be reused verbatim. 
*/ +static int mode_emit(const char* src_path, const char* out_path) { + uint8_t* src = NULL; + size_t src_len = 0; + CfreeTarget tgt; + CfreeEnv env; + CfreeCompiler* c; + CfreeBytesInput in; + CfreeCompileOptions opts; + CfreeWriter* w; + int rc = 0; + size_t len = 0; + const uint8_t* data; + + if (read_file(src_path, &src, &src_len)) { + fprintf(stderr, "asm-runner: cannot read %s\n", src_path); + return 2; + } + target_from_env(&tgt); + env_init(&env); + c = cfree_compiler_new(tgt, &env); + if (!c) { + free(src); + return 2; + } + + memset(&in, 0, sizeof in); + in.name = src_path; + in.data = src; + in.len = src_len; + in.lang = CFREE_LANG_ASM; + memset(&opts, 0, sizeof opts); + + w = cfree_writer_mem(&g_heap); + if (cfree_compile_obj_emit(c, &opts, &in, w) != 0) { + cfree_writer_close(w); + cfree_compiler_free(c); + free(src); + return 1; + } + + data = cfree_writer_mem_bytes(w, &len); + rc = write_all(out_path, data, len); + + cfree_writer_close(w); + cfree_compiler_free(c); + free(src); + return rc; +} + +/* On AArch64 host, set up TLS Local-Exec image before invoking JITed + * code. Mirrors parse_runner / cg_runner: msr → call() must be back-to- + * back with no libc invocations in between. Cases that don't reference + * TLS see the lookups fail and the block stays zeroed — harmless. */ +#if defined(__aarch64__) || defined(__arm64__) +static char g_tls_block[8192] __attribute__((aligned(16))); +#endif + +/* --jit: parse + link_jit + call test_main(). Exit code is test_main's + * return mod 256, matching the parse/cg conventions. 
 */
+static int mode_jit(const char* src_path) {
+  uint8_t* src = NULL;
+  size_t src_len = 0;
+  CfreeTarget tgt;
+  CfreeEnv env;
+  CfreeCompiler* c;
+  CfreeBytesInput in;
+  CfreeCompileOptions opts;
+  CfreeObjBuilder* ob = NULL;
+  CfreeLinkOptions lopts;
+  CfreeObjBuilder* arr[1];
+  CfreeJit* jit = NULL;
+  int (*fn)(void);
+  int result;
+
+  if (read_file(src_path, &src, &src_len)) {
+    fprintf(stderr, "asm-runner: cannot read %s\n", src_path);
+    return 2;
+  }
+  target_from_env(&tgt);
+  env_init(&env);
+  c = cfree_compiler_new(tgt, &env);
+  if (!c) {
+    free(src);
+    return 2;
+  }
+
+  memset(&in, 0, sizeof in);
+  in.name = src_path;
+  in.data = src;
+  in.len = src_len;
+  in.lang = CFREE_LANG_ASM;
+  memset(&opts, 0, sizeof opts);
+
+  if (cfree_compile_obj(c, &opts, &in, &ob) != 0 || !ob) {
+    cfree_compiler_free(c);
+    free(src);
+    return 1;
+  }
+
+  memset(&lopts, 0, sizeof lopts);
+  arr[0] = ob;
+  lopts.inputs.objs = arr;
+  lopts.inputs.nobjs = 1;
+  lopts.inputs.entry = "test_main";
+
+  if (cfree_link_jit(c, &lopts, &jit) != 0 || !jit) {
+    cfree_compiler_free(c);
+    free(src);
+    return 1;
+  }
+
+  fn = (int (*)(void))cfree_jit_lookup(jit, "test_main");
+
+#if defined(__aarch64__) || defined(__arm64__)
+  {
+    char* td_start = (char*)cfree_jit_lookup(jit, "__tdata_start");
+    char* td_end = (char*)cfree_jit_lookup(jit, "__tdata_end");
+    unsigned long bs_n =
+        (unsigned long)(unsigned long long)cfree_jit_lookup(jit, "__tbss_size");
+    if (td_start && td_end) {
+      unsigned long td_n = (unsigned long)(td_end - td_start);
+      unsigned long i;
+      for (i = 0; i < td_n; ++i) g_tls_block[16 + i] = td_start[i];
+      for (i = 0; i < bs_n; ++i) g_tls_block[16 + td_n + i] = 0;
+    }
+  }
+#endif
+
+  if (fn) {
+#if defined(__aarch64__) || defined(__arm64__)
+    __asm__ volatile("msr tpidr_el0, %0" ::"r"(g_tls_block) : "memory");
+#endif
+    result = fn();
+  } else {
+    result = 1;
+  }
+
+  cfree_jit_free(jit);
+  cfree_compiler_free(c);
+  free(src);
+  return result;
+}
+
+static int usage(void) {
+  fprintf(stderr,
+          "usage: asm-runner --encode IN.s OUT.hex\n"
+          "       asm-runner --decode IN.hex OUT.txt\n"
+          "       asm-runner --listing IN.bin OUT.txt\n"
+          "       asm-runner --emit IN.s OUT.o\n"
+          "       asm-runner --jit IN.s\n");
+  return 2;
+}
+
+int main(int argc, char** argv) {
+  long ps = sysconf(_SC_PAGESIZE);
+  if (ps > 0) g_execmem.page_size = (size_t)ps;
+  if (argc < 2) return usage();
+  if (!strcmp(argv[1], "--encode") && argc == 4) return mode_encode(argv[2], argv[3]);
+  if (!strcmp(argv[1], "--decode") && argc == 4) return mode_decode(argv[2], argv[3]);
+  if (!strcmp(argv[1], "--listing") && argc == 4) return mode_listing(argv[2], argv[3]);
+  if (!strcmp(argv[1], "--emit") && argc == 4) return mode_emit(argv[2], argv[3]);
+  if (!strcmp(argv[1], "--jit") && argc == 3) return mode_jit(argv[2]);
+  return usage();
+}
diff --git a/test/asm/listing/nop_ret.expected.lst b/test/asm/listing/nop_ret.expected.lst
@@ -0,0 +1,5 @@
+Disassembly of section .text:
+
+0000000000000000 <_start>:
+ 0: d503201f nop
+ 4: d65f03c0 ret
diff --git a/test/asm/listing/nop_ret.in.bin b/test/asm/listing/nop_ret.in.bin
Binary files differ.
diff --git a/test/asm/listing/nop_ret.skip b/test/asm/listing/nop_ret.skip
@@ -0,0 +1 @@
+phase 1: cfree_obj_disasm is still a stub (src/api/stubs.c)
diff --git a/test/asm/regen.sh b/test/asm/regen.sh
@@ -0,0 +1,89 @@
+#!/usr/bin/env bash
+# test/asm/regen.sh — regenerate the smoke goldens from the host
+# `as` / `objdump` (via clang as the cross driver). Maintainer aid: NOT
+# run by CI. Commit the refreshed goldens alongside the case changes.
+#
+# Usage:
+#   ./regen.sh           regenerate every case
+#   ./regen.sh <name>    regenerate just one case (substring match)
+#
+# Requires:
+#   clang --target=aarch64-linux-gnu (the system clang on macOS is fine)
+#   llvm-objdump or aarch64-linux-gnu-objdump
+#   xxd (for hex dumps)
+
+set -eu
+
+ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
+TEST_DIR="$ROOT/test/asm"
+FILTER="${1:-}"
+
+CLANG_TARGET="--target=aarch64-linux-gnu"
+OBJDUMP="$(command -v llvm-objdump 2>/dev/null || command -v aarch64-linux-gnu-objdump 2>/dev/null || true)"
+if [ -z "$OBJDUMP" ]; then
+  printf 'regen.sh: no llvm-objdump / aarch64-linux-gnu-objdump on PATH\n' >&2
+  exit 1
+fi
+
+tmp="$(mktemp -d)"
+trap 'rm -rf "$tmp"' EXIT
+
+regen_encode() {
+  local src="$1" name out_obj out_hex
+  name="$(basename "$src" .s)"
+  [ -n "$FILTER" ] && [[ "$name" != *"$FILTER"* ]] && return 0
+  out_obj="$tmp/$name.o"
+  out_hex="$TEST_DIR/encode/$name.expected.hex"
+  clang $CLANG_TARGET -c "$src" -o "$out_obj"
+  # Dump every executable section (smoke only inspects .text; this is
+  # tight enough for now). For multi-section cases we'll pivot to a
+  # per-section dump alongside phase 3.
+  "$OBJDUMP" -h -j .text "$out_obj" >/dev/null 2>&1
+  "$OBJDUMP" --full-contents -j .text "$out_obj" \
+    | awk '/^Contents of section/ {next} /^$/ {next}
+           { for (i=2; i<=5; i++) if ($i ~ /^[0-9a-f]+$/) printf "%s", $i; printf "\n" }' \
+    | tr -d '\n' \
+    | { cat; printf '\n'; } >"$out_hex"
+  printf ' regen encode/%s\n' "$name"
+}
+
+regen_decode() {
+  local hexfile="$1" name out_txt
+  name="$(basename "$hexfile" .hex)"
+  [ -n "$FILTER" ] && [[ "$name" != *"$FILTER"* ]] && return 0
+  out_txt="$TEST_DIR/decode/$name.expected.txt"
+  # Mirror asm-runner --decode output exactly: vaddr:\tmnemonic\toperands.
+  # objdump's listing format differs (it interleaves addresses + raw hex);
+  # rebuild a minimal line per insn via awk so the goldens match the
+  # runner's exact-match expectation.
+  local raw="$tmp/$name.bin"
+  xxd -r -p "$hexfile" "$raw"
+  "$OBJDUMP" -b binary -m aarch64 -D "$raw" \
+    | awk '/^[ ]+[0-9a-f]+:/ {
+        sub(/:/, "", $1);
+        addr = $1;
+        # fields: addr raw-hex mnemonic operands...
+        mnem = $3;
+        ops = "";
+        for (i=4; i<=NF; i++) ops = (ops=="" ? $i : ops " " $i);
+        printf "%s:\t%s\t%s\n", addr, mnem, ops;
+      }' >"$out_txt"
+  printf ' regen decode/%s\n' "$name"
+}
+
+regen_listing() {
+  local bin="$1" name out_lst
+  name="$(basename "$bin" .in.bin)"
+  [ -n "$FILTER" ] && [[ "$name" != *"$FILTER"* ]] && return 0
+  out_lst="$TEST_DIR/listing/$name.expected.lst"
+  "$OBJDUMP" -d "$bin" \
+    | awk '/^Disassembly of section/ || /^[0-9a-f]+ </ || /^[ ]+[0-9a-f]+:/ || /^$/' \
+    >"$out_lst"
+  printf ' regen listing/%s\n' "$name"
+}
+
+printf 'Regenerating goldens...\n'
+for src in "$TEST_DIR"/encode/*.s; do [ -e "$src" ] && regen_encode "$src"; done
+for src in "$TEST_DIR"/decode/*.hex; do [ -e "$src" ] && regen_decode "$src"; done
+for src in "$TEST_DIR"/listing/*.in.bin; do [ -e "$src" ] && regen_listing "$src"; done
+printf 'Done.\n'
diff --git a/test/asm/run.sh b/test/asm/run.sh
@@ -0,0 +1,454 @@
+#!/usr/bin/env bash
+# test/asm/run.sh — file-driven assembler / disassembler test harness.
+#
+# Three sub-corpora under test/asm/, one per asm-runner static-output mode:
+#
+#   encode/   <name>.s      + <name>.expected.hex   golden hex bytes
+#   decode/   <name>.hex    + <name>.expected.txt   golden decoded text
+#   listing/  <name>.in.bin + <name>.expected.lst   golden objdump-style
+#
+# Path matrix (6 letters, default HTLDJE):
+#
+#   H  Hex encode    — encode/ only. asm-runner --encode IN.s → hex; diff
+#                      vs <name>.expected.hex.
+#   T  Text decode   — decode/ only. asm-runner --decode IN.hex → text; diff
+#                      vs <name>.expected.txt.
+#   L  Listing       — listing/ only. asm-runner --listing IN.in.bin → text;
+#                      diff vs <name>.expected.lst.
+#   D  Direct JIT    — encode/ only, when <name>.expected (integer exit) is
+#                      present. asm-runner --jit IN.s → exit code matches.
+#                      Host arch must match the cross-target.
+#   J  JIT via file  — encode/ only, when <name>.expected is present.
+#                      asm-runner --emit + jit-runner. Host arch must match.
+#   E  ELF exec      — encode/ only, when <name>.expected is present.
+#                      asm-runner --emit + start.o → link-exe-runner →
+#                      qemu/podman → exit code. Cross-host friendly.
+#
+# Reuses the test/link harness binaries (link-exe-runner, jit-runner) plus
+# test/link/harness/start.c verbatim — same convention as test/parse/run.sh.
+#
+# Phase 1 (doc/ASM.md §5): parse_asm and the disasm iterator are still
+# stubs in src/api/stubs.c. Every smoke case carries a .skip sidecar so
+# the harness reports SKIP cleanly; the wiring still runs on every CI
+# pass. CFREE_TEST_ALLOW_SKIP defaults to 1 here for the duration of
+# phase 1 — flip to 0 (matching the rest of the suite) once the
+# assembler / disasm iterator are real.
+#
+# Filtering:
+#   ./run.sh [name_filter] [paths]
+#     name_filter   substring match against case basename
+#     paths         subset of "HTLDJE" (default "HTLDJE")
+#   Equivalent env vars: CFREE_TEST_FILTER, CFREE_TEST_PATHS.
+
+set -u
+
+ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
+TEST_DIR="$ROOT/test/asm"
+LINK_TEST_DIR="$ROOT/test/link"
+BUILD_DIR="$ROOT/build/test"
+LIB_AR="$ROOT/build/libcfree.a"
+
+ASM_RUNNER="$BUILD_DIR/asm-runner"
+LINK_EXE_RUNNER="$BUILD_DIR/link-exe-runner"
+JIT_RUNNER="$BUILD_DIR/jit-runner"
+
+# CFREE_TEST_ARCH selects the cross-target. Default aa64 preserves the
+# pre-multiarch behavior. The asm-runner reads the same env via
+# test/lib/cfree_test_target.h.
+CFREE_TEST_ARCH="${CFREE_TEST_ARCH:-aa64}"
+case "$CFREE_TEST_ARCH" in
+  aa64|aarch64|arm64) TEST_ARCH=aa64; CLANG_TRIPLE=aarch64-linux-gnu; EXEC_ARCH=aarch64 ;;
+  x64|x86_64|amd64) TEST_ARCH=x64; CLANG_TRIPLE=x86_64-linux-gnu; EXEC_ARCH=x64 ;;
+  rv64|riscv64) TEST_ARCH=rv64; CLANG_TRIPLE=riscv64-linux-gnu; EXEC_ARCH=rv64 ;;
+  *) printf 'unknown CFREE_TEST_ARCH=%s\n' "$CFREE_TEST_ARCH" >&2; exit 2 ;;
+esac
+export CFREE_TEST_ARCH
+
+CLANG_TARGET="--target=$CLANG_TRIPLE"
+CC="${CC:-cc}"
+HARNESS_CFLAGS="-std=c11 -Wall -Wextra -I$ROOT/include -I$ROOT/test"
+# Phase 1: ALLOW_SKIP defaults to 1 (smoke cases skip cleanly because
+# parse_asm / cfree_disasm_iter_* are still stubs). Flip to 0 once the
+# assembler / disassembler land.
+ALLOW_SKIP="${CFREE_TEST_ALLOW_SKIP:-1}"
+
+FILTER="${1:-${CFREE_TEST_FILTER:-}}"
+PATHS="${2:-${CFREE_TEST_PATHS:-HTLDJE}}"
+case "$PATHS" in *H*) RUN_H=1;; *) RUN_H=0;; esac
+case "$PATHS" in *T*) RUN_T=1;; *) RUN_T=0;; esac
+case "$PATHS" in *L*) RUN_L=1;; *) RUN_L=0;; esac
+case "$PATHS" in *D*) RUN_D=1;; *) RUN_D=0;; esac
+case "$PATHS" in *J*) RUN_J=1;; *) RUN_J=0;; esac
+case "$PATHS" in *E*) RUN_E=1;; *) RUN_E=0;; esac
+T_H=0; T_T=0; T_L=0; T_D=0; T_J=0; T_E=0
+now_ms() { python3 -c 'import time;print(int(time.time()*1000))'; }
+
+mkdir -p "$BUILD_DIR" "$BUILD_DIR/asm"
+
+PASS=0; FAIL=0; SKIP=0
+FAIL_NAMES=(); SKIP_NAMES=()
+
+color_red() { printf '\033[31m%s\033[0m' "$1"; }
+color_grn() { printf '\033[32m%s\033[0m' "$1"; }
+color_yel() { printf '\033[33m%s\033[0m' "$1"; }
+
+note_pass() { PASS=$((PASS+1)); printf ' %s %s\n' "$(color_grn PASS)" "$1"; }
+note_fail() { FAIL=$((FAIL+1)); FAIL_NAMES+=("$1"); printf ' %s %s\n' "$(color_red FAIL)" "$1"; }
+note_skip() { SKIP=$((SKIP+1)); SKIP_NAMES+=("$1"); printf ' %s %s — %s\n' "$(color_yel SKIP)" "$1" "$2"; }
+
+# ---- tool detection (mirrors test/parse/run.sh) ----------------------------
+
+have_clang_cross=0
+have_exe_runner=0
+have_jit_runner=0
+is_aarch64=0
+
+if clang $CLANG_TARGET -c -x c - -o /dev/null < /dev/null 2>/dev/null; then
+  have_clang_cross=1
+fi
+
+arch_raw="$(uname -m 2>/dev/null || true)"
+{ [ "$arch_raw" = "aarch64" ] || [ "$arch_raw" = "arm64" ]; } && is_aarch64=1
+
+# is_native_target=1 when the cross-target arch matches the host arch.
+# Required for in-process JIT (path D) and the jit-runner (path J).
+is_native_target=0
+case "$TEST_ARCH" in
+  aa64) [ $is_aarch64 -eq 1 ] && is_native_target=1 ;;
+  x64) { [ "$arch_raw" = "x86_64" ] || [ "$arch_raw" = "amd64" ]; } && is_native_target=1 ;;
+  rv64) [ "$arch_raw" = "riscv64" ] && is_native_target=1 ;;
+esac
+
+# Shared per-arch exec helper — see test/lib/exec_target.sh.
+EXEC_TARGET_MOUNT_ROOT="$BUILD_DIR"
+# shellcheck source=../lib/exec_target.sh
+source "$ROOT/test/lib/exec_target.sh"
+
+# ---- build harness binaries ------------------------------------------------
+
+printf 'Building harness...\n'
+
+if [ ! -f "$LIB_AR" ]; then
+  printf ' FATAL: %s not found — run "make lib" first\n' "$LIB_AR" >&2
+  exit 1
+fi
+
+# asm-runner
+if $CC $HARNESS_CFLAGS \
+    "$TEST_DIR/harness/asm_runner.c" \
+    "$LIB_AR" -o "$ASM_RUNNER" 2>"$BUILD_DIR/asm-runner.err"; then
+  printf ' %s asm-runner\n' "$(color_grn built)"
+else
+  printf ' %s asm-runner (see %s)\n' \
+    "$(color_red FATAL)" "$BUILD_DIR/asm-runner.err" >&2
+  exit 1
+fi
+
+# link-exe-runner — for path E.
+if [ ! -x "$LINK_EXE_RUNNER" ]; then
+  if $CC -I"$ROOT/include" -I"$ROOT/test" "$LINK_TEST_DIR/harness/link_exe_runner.c" \
+      "$LIB_AR" -o "$LINK_EXE_RUNNER" 2>"$BUILD_DIR/link-exe-runner.err"; then
+    have_exe_runner=1
+    printf ' %s link-exe-runner\n' "$(color_grn built)"
+  else
+    printf ' %s link-exe-runner (see %s)\n' \
+      "$(color_yel warn)" "$BUILD_DIR/link-exe-runner.err" >&2
+  fi
+else
+  have_exe_runner=1
+fi
+
+# jit-runner — for path J. Only meaningful when host arch matches the cross-target.
+if [ $is_native_target -eq 1 ]; then
+  if [ ! -x "$JIT_RUNNER" ]; then
+    if $CC -I"$ROOT/include" -I"$ROOT/test" "$LINK_TEST_DIR/harness/jit_runner.c" \
+        "$LIB_AR" -o "$JIT_RUNNER" 2>"$BUILD_DIR/jit-runner.err"; then
+      have_jit_runner=1
+      printf ' %s jit-runner\n' "$(color_grn built)"
+    else
+      printf ' %s jit-runner (see %s)\n' \
+        "$(color_yel warn)" "$BUILD_DIR/jit-runner.err" >&2
+    fi
+  else
+    have_jit_runner=1
+  fi
+fi
+
+# Cached start.o — same trick as parse/cg harnesses; build once for the
+# whole run.
+START_OBJ="$BUILD_DIR/asm_start.o"
+have_start_obj=0
+if [ $have_clang_cross -eq 1 ]; then
+  if clang $CLANG_TARGET -O1 -ffreestanding -fno-stack-protector \
+      -fno-PIC -fno-pie \
+      -c "$LINK_TEST_DIR/harness/start.c" -o "$START_OBJ" 2>/dev/null; then
+    have_start_obj=1
+  fi
+fi
+
+printf 'Running cases...\n'
+
+# ---- helpers --------------------------------------------------------------
+
+# diff_case <name>/<P> <work> <expected> <actual> <link_dt_or_0>
+# Emits PASS or FAIL based on byte-exact match.
+diff_case() {
+  local label="$1" work="$2" expected="$3" actual="$4"
+  if diff -u "$expected" "$actual" >"$work/diff" 2>&1; then
+    note_pass "$label"
+  else
+    note_fail "$label (golden mismatch; see $work/diff)"
+  fi
+}
+
+# ---- decode and listing loops — single-path, golden-driven only -----------
+
+if [ $RUN_T -eq 1 ] && [ -d "$TEST_DIR/decode" ]; then
+  for in_path in "$TEST_DIR"/decode/*.hex; do
+    [ -e "$in_path" ] || continue
+    name="$(basename "$in_path" .hex)"
+    [ -n "$FILTER" ] && [[ "$name" != *"$FILTER"* ]] && continue
+    work="$BUILD_DIR/asm/decode/$name"
+    mkdir -p "$work"
+    if [ -e "$TEST_DIR/decode/$name.skip" ]; then
+      reason=$(head -n1 "$TEST_DIR/decode/$name.skip")
+      note_skip "$name/T" "$reason"
+      continue
+    fi
+    expected="$TEST_DIR/decode/$name.expected.txt"
+    if [ ! -e "$expected" ]; then
+      note_fail "$name/T (missing golden $(basename "$expected"))"
+      continue
+    fi
+    t0=$(now_ms)
+    out="$work/out.txt"
+    if ! "$ASM_RUNNER" --decode "$in_path" "$out" >"$work/stdout" 2>"$work/stderr"; then
+      dt=$(( $(now_ms) - t0 )); T_T=$(( T_T + dt ))
+      note_fail "$name/T (asm-runner --decode failed; see $work/stderr, ${dt}ms)"
+      continue
+    fi
+    dt=$(( $(now_ms) - t0 )); T_T=$(( T_T + dt ))
+    diff_case "$name/T (${dt}ms)" "$work" "$expected" "$out"
+  done
+fi
+
+if [ $RUN_L -eq 1 ] && [ -d "$TEST_DIR/listing" ]; then
+  for in_path in "$TEST_DIR"/listing/*.in.bin; do
+    [ -e "$in_path" ] || continue
+    name="$(basename "$in_path" .in.bin)"
+    [ -n "$FILTER" ] && [[ "$name" != *"$FILTER"* ]] && continue
+    work="$BUILD_DIR/asm/listing/$name"
+    mkdir -p "$work"
+    if [ -e "$TEST_DIR/listing/$name.skip" ]; then
+      reason=$(head -n1 "$TEST_DIR/listing/$name.skip")
+      note_skip "$name/L" "$reason"
+      continue
+    fi
+    expected="$TEST_DIR/listing/$name.expected.lst"
+    if [ ! -e "$expected" ]; then
+      note_fail "$name/L (missing golden $(basename "$expected"))"
+      continue
+    fi
+    t0=$(now_ms)
+    out="$work/out.lst"
+    if ! "$ASM_RUNNER" --listing "$in_path" "$out" >"$work/stdout" 2>"$work/stderr"; then
+      dt=$(( $(now_ms) - t0 )); T_L=$(( T_L + dt ))
+      note_fail "$name/L (asm-runner --listing failed; see $work/stderr, ${dt}ms)"
+      continue
+    fi
+    dt=$(( $(now_ms) - t0 )); T_L=$(( T_L + dt ))
+    diff_case "$name/L (${dt}ms)" "$work" "$expected" "$out"
+  done
+fi
+
+# ---- encode loop — drives H + D + J + E per .s case ----------------------
+
+# Path E result bookkeeping — same shape as test/parse.
+E_NAMES=()
+E_WORK=()
+E_LINK_MS=()
+E_EXPECTED=()
+
+if [ -d "$TEST_DIR/encode" ]; then
+  for src in "$TEST_DIR"/encode/*.s; do
+    [ -e "$src" ] || continue
+    name="$(basename "$src" .s)"
+    [ -n "$FILTER" ] && [[ "$name" != *"$FILTER"* ]] && continue
+    work="$BUILD_DIR/asm/encode/$name"
+    mkdir -p "$work"
+
+    # Per-case skip sidecar — applies to every path for this case.
+    if [ -e "$TEST_DIR/encode/$name.skip" ]; then
+      reason=$(head -n1 "$TEST_DIR/encode/$name.skip")
+      [ $RUN_H -eq 1 ] && note_skip "$name/H" "$reason"
+      [ $RUN_D -eq 1 ] && note_skip "$name/D" "$reason"
+      [ $RUN_J -eq 1 ] && note_skip "$name/J" "$reason"
+      [ $RUN_E -eq 1 ] && note_skip "$name/E" "$reason"
+      continue
+    fi
+
+    # Expected exit code (for D/J/E). Absent → those paths skip.
+    expected_exit_file="$TEST_DIR/encode/$name.expected"
+    has_exit=0
+    if [ -e "$expected_exit_file" ]; then
+      expected=$(head -n1 "$expected_exit_file")
+      expected_byte=$(( expected & 0xff ))
+      has_exit=1
+    fi
+
+    # ---- Path H: hex encode roundtrip ----------------------------------
+    if [ $RUN_H -eq 1 ]; then
+      expected_hex="$TEST_DIR/encode/$name.expected.hex"
+      if [ ! -e "$expected_hex" ]; then
+        note_skip "$name/H" "no .expected.hex golden"
+      else
+        t0=$(now_ms)
+        out="$work/out.hex"
+        if ! "$ASM_RUNNER" --encode "$src" "$out" >"$work/h.out" 2>"$work/h.err"; then
+          dt=$(( $(now_ms) - t0 )); T_H=$(( T_H + dt ))
+          note_fail "$name/H (asm-runner --encode failed; see $work/h.err, ${dt}ms)"
+        else
+          dt=$(( $(now_ms) - t0 )); T_H=$(( T_H + dt ))
+          diff_case "$name/H (${dt}ms)" "$work" "$expected_hex" "$out"
+        fi
+      fi
+    fi
+
+    # ---- emit (needed by D/J/E exec paths) -----------------------------
+    # D doesn't strictly need a .o on disk — asm-runner --jit does the
+    # full parse+jit in process. But J and E need the file emit.
+    obj="$work/$name.o"
+    need_emit=0
+    if [ $has_exit -eq 1 ]; then
+      if [ $RUN_J -eq 1 ] || [ $RUN_E -eq 1 ]; then need_emit=1; fi
+    fi
+    if [ $need_emit -eq 1 ]; then
+      if ! "$ASM_RUNNER" --emit "$src" "$obj" 2>"$work/emit.err"; then
+        # D may still run independently; the J/E paths below detect
+        # the missing .o and skip themselves.
+        printf ' %s %s/emit (asm-runner --emit failed; see %s)\n' \
+          "$(color_red FAIL)" "$name" "$work/emit.err"
+        FAIL=$((FAIL+1)); FAIL_NAMES+=("$name/emit")
+      fi
+    fi
+
+    # ---- Path D: in-process JIT ----------------------------------------
+    if [ $RUN_D -eq 1 ]; then
+      if [ $has_exit -eq 0 ]; then
+        note_skip "$name/D" "no .expected exit code"
+      elif [ $is_native_target -eq 0 ]; then
+        note_skip "$name/D" "host arch != $TEST_ARCH (no native JIT)"
+      else
+        t0=$(now_ms)
+        "$ASM_RUNNER" --jit "$src" >"$work/d.out" 2>"$work/d.err"
+        d_rc=$?
+        dt=$(( $(now_ms) - t0 )); T_D=$(( T_D + dt ))
+        if [ "$d_rc" -eq "$expected_byte" ]; then
+          note_pass "$name/D (${dt}ms)"
+        else
+          note_fail "$name/D (expected $expected_byte got $d_rc, ${dt}ms)"
+        fi
+      fi
+    fi
+
+    # ---- Path J: jit-via-file ------------------------------------------
+    if [ $RUN_J -eq 1 ]; then
+      if [ $has_exit -eq 0 ]; then
+        note_skip "$name/J" "no .expected exit code"
+      elif [ $have_jit_runner -eq 0 ]; then
+        note_skip "$name/J" "no jit-runner (host arch != $TEST_ARCH)"
+      elif [ ! -e "$obj" ]; then
+        note_skip "$name/J" "no .o (--emit failed)"
+      else
+        t0=$(now_ms)
+        "$JIT_RUNNER" "$obj" >"$work/jit.out" 2>"$work/jit.err"
+        j_rc=$?
+        dt=$(( $(now_ms) - t0 )); T_J=$(( T_J + dt ))
+        if [ "$j_rc" -eq "$expected_byte" ]; then
+          note_pass "$name/J (${dt}ms)"
+        else
+          note_fail "$name/J (expected $expected_byte got $j_rc, ${dt}ms)"
+        fi
+      fi
+    fi
+
+    # ---- Path E: link + (batched) qemu/podman --------------------------
+    if [ $RUN_E -eq 1 ]; then
+      if [ $has_exit -eq 0 ]; then
+        note_skip "$name/E" "no .expected exit code"
+      elif [ $have_exe_runner -eq 0 ] || [ $have_clang_cross -eq 0 ] \
+          || [ $have_start_obj -eq 0 ]; then
+        note_skip "$name/E" "no link-exe-runner, $TEST_ARCH clang, or start.o"
+      elif [ ! -e "$obj" ]; then
+        note_skip "$name/E" "no .o (--emit failed)"
+      else
+        t0=$(now_ms)
+        exe="$work/linked.exe"
+        if ! "$LINK_EXE_RUNNER" -o "$exe" "$obj" "$START_OBJ" \
+            >"$work/exec_link.out" 2>"$work/exec_link.err"; then
+          dt=$(( $(now_ms) - t0 )); T_E=$(( T_E + dt ))
+          note_fail "$name/E (link failed, ${dt}ms)"
+        elif exec_target_supported "$EXEC_ARCH"; then
+          link_dt=$(( $(now_ms) - t0 )); T_E=$(( T_E + link_dt ))
+          E_NAMES+=("$name")
+          E_WORK+=("$work")
+          E_LINK_MS+=("$link_dt")
+          E_EXPECTED+=("$expected_byte")
+          exec_target_queue "$EXEC_ARCH" "$name" "$exe" \
+            "$work/exec.out" "$work/exec.err" "$work/exec.rc"
+        else
+          note_skip "$name/E" "no runner for $EXEC_ARCH"
+        fi
+      fi
+    fi
+  done
+fi
+
+# ---- batched path-E flush + verification -----------------------------------
+
+T_E_BATCH=0
+if [ "$(exec_target_queue_size)" -gt 0 ]; then
+  printf 'Running path E (%d cases batched)...\n' "$(exec_target_queue_size)"
+  t0=$(now_ms)
+  exec_target_flush
+  T_E_BATCH=$(( $(now_ms) - t0 )); T_E=$(( T_E + T_E_BATCH ))
+
+  i=0
+  while [ $i -lt ${#E_NAMES[@]} ]; do
+    name="${E_NAMES[$i]}"
+    work="${E_WORK[$i]}"
+    link_dt="${E_LINK_MS[$i]}"
+    expected_byte="${E_EXPECTED[$i]}"
+    if [ ! -f "$work/exec.rc" ]; then
+      note_fail "$name/E (no rc; podman batch did not produce results)"
+    else
+      RUN_RC="$(cat "$work/exec.rc")"
+      if [ "$RUN_RC" -eq "$expected_byte" ]; then
+        note_pass "$name/E (link ${link_dt}ms)"
+      else
+        note_fail "$name/E (expected $expected_byte got $RUN_RC, link ${link_dt}ms)"
+      fi
+    fi
+    i=$((i+1))
+  done
+fi
+
+# ---- summary ---------------------------------------------------------------
+
+if [ ${#FAIL_NAMES[@]} -gt 0 ]; then
+  printf '\nFailed:\n'
+  for n in "${FAIL_NAMES[@]}"; do printf ' %s\n' "$n"; done
+fi
+
+if [ ${#SKIP_NAMES[@]} -gt 0 ] && [ "$ALLOW_SKIP" != "1" ]; then
+  printf '\nSkipped (treat as failure; set CFREE_TEST_ALLOW_SKIP=1 to allow):\n'
+  for n in "${SKIP_NAMES[@]}"; do printf ' %s\n' "$n"; done
+fi
+
+printf '\nResults: %s pass, %s fail, %s skip\n' "$PASS" "$FAIL" "$SKIP"
+printf 'Time: H=%dms T=%dms L=%dms D=%dms J=%dms E=%dms (batch %dms)\n' \
+  "$T_H" "$T_T" "$T_L" "$T_D" "$T_J" "$T_E" "$T_E_BATCH"
+
+if [ $FAIL -gt 0 ]; then exit 1; fi
+if [ $SKIP -gt 0 ] && [ "$ALLOW_SKIP" != "1" ]; then exit 1; fi
+exit 0
diff --git a/test/cg/run.sh b/test/cg/run.sh
@@ -15,6 +15,15 @@
 #                      register checks are silently skipped. Today every
 #                      check fails by design — debug_emit and the
 #                      cfree_dwarf_* consumers are stubs.
+#   S  asm roundtrip — for every cg-emitted aarch64 binary, walk .text
+#                      through cfree_disasm_iter_*, re-assemble the
+#                      resulting text via the asm-runner, byte-compare.
+#                      Phase 1 (per doc/ASM.md §5): always reports
+#                      SKIP — the disasm iterator and asm parser are
+#                      stubs in src/api/stubs.c. S is opt-in (not in
+#                      the default DREJW path matrix) until phase 4
+#                      lands; run with `./run.sh '' S` or
+#                      CFREE_TEST_PATHS=DREJWS.
 #
 # Reuses the existing test/link harness binaries (link-exe-runner,
 # jit-runner, cfree-roundtrip) verbatim.
@@ -102,7 +111,8 @@ case "$PATHS" in *R*) RUN_R=1;; *) RUN_R=0;; esac
 case "$PATHS" in *E*) RUN_E=1;; *) RUN_E=0;; esac
 case "$PATHS" in *J*) RUN_J=1;; *) RUN_J=0;; esac
 case "$PATHS" in *W*) RUN_W=1;; *) RUN_W=0;; esac
-T_D=0; T_R=0; T_E=0; T_J=0; T_W=0 # accumulated wall-clock seconds per path
+case "$PATHS" in *S*) RUN_S=1;; *) RUN_S=0;; esac
+T_D=0; T_R=0; T_E=0; T_J=0; T_W=0; T_S=0 # accumulated wall-clock seconds per path
 now_ms() { python3 -c 'import time;print(int(time.time()*1000))'; }
 
 mkdir -p "$BUILD_DIR" "$BUILD_DIR/cg"
@@ -450,6 +460,17 @@ for OPT_LEVEL in $OPT_LEVELS; do
       fi
     fi
   fi
+
+  # ---- Path S: asm roundtrip (phase-1 stub) -----------------------------
+  # Walks .text through cfree_disasm_iter_*, reassembles via
+  # asm-runner --encode, byte-compares against the emitted bytes.
+  # Phase 1 per doc/ASM.md §5: the iterator and parse_asm are still
+  # stubs, so we report SKIP unconditionally when S is requested.
+  # When phase 3+4 land, replace this block with the real
+  # disasm/reassemble pipeline.
+  if [ $RUN_S -eq 1 ]; then
+    note_skip "$name/S${TAG}" "phase 1: cfree_disasm_iter_* / parse_asm are stubs"
+  fi
 done
 
 # ---- batched path-E flush + verification (per level) -------------------
@@ -499,8 +520,8 @@ if [ ${#SKIP_NAMES[@]} -gt 0 ] && [ "$ALLOW_SKIP" != "1" ]; then
 fi
 
 printf '\nResults: %s pass, %s fail, %s skip\n' "$PASS" "$FAIL" "$SKIP"
-printf 'Time: D=%dms R=%dms E=%dms (batch %dms) J=%dms W=%dms\n' \
-  "$T_D" "$T_R" "$T_E" "$T_E_BATCH" "$T_J" "$T_W"
+printf 'Time: D=%dms R=%dms E=%dms (batch %dms) J=%dms W=%dms S=%dms\n' \
+  "$T_D" "$T_R" "$T_E" "$T_E_BATCH" "$T_J" "$T_W" "$T_S"
 
 if [ $FAIL -gt 0 ]; then exit 1; fi
 if [ $SKIP -gt 0 ] && [ "$ALLOW_SKIP" != "1" ]; then exit 1; fi
diff --git a/test/test.mk b/test/test.mk
@@ -23,10 +23,15 @@
 # source file rather than a hand-built ObjBuilder fixture. Built
 # against the public cfree.h surface; reuses cfree-roundtrip,
 # link-exe-runner, and jit-runner.
+# - test-asm: file-driven assembler/disassembler harness in test/asm/.
+# Three sub-corpora (encode/, decode/, listing/), one mode per
+# sub-dir. Phase 1: every smoke case carries a .skip sidecar because
+# parse_asm / cfree_disasm_iter_* are still stubs; the harness builds
+# and runs end-to-end so the wiring stays exercised. See doc/ASM.md.
 
-.PHONY: test test-lex test-pp test-pp-err test-elf test-ar test-ar-driver test-link test-cg test-dwarf test-debug test-parse test-parse-err test-libc test-musl test-glibc test-lib-deps test-smoke-x64 test-smoke-rv64
+.PHONY: test test-lex test-pp test-pp-err test-elf test-ar test-ar-driver test-link test-cg test-dwarf test-debug test-parse test-parse-err test-asm test-libc test-musl test-glibc test-lib-deps test-smoke-x64 test-smoke-rv64
 
-test: test-lex test-pp test-pp-err test-elf test-ar test-ar-driver test-link test-cg test-dwarf test-debug test-parse test-parse-err test-lib-deps
+test: test-lex test-pp test-pp-err test-elf test-ar test-ar-driver test-link test-cg test-dwarf test-debug test-parse test-parse-err test-asm test-lib-deps
 
 test-lex: bin
 	@CFREE=$(abspath $(BIN)) test/lex/run.sh
@@ -137,6 +142,13 @@ test-parse: lib $(PARSE_RUNNER) $(ROUNDTRIP_BIN) $(LINK_EXE_RUNNER) $(JIT_RUNNER
 test-parse-err: lib $(PARSE_RUNNER)
 	sh test/parse/run_errors.sh
 
+# test-asm: builds asm-runner inside the run.sh driver (the harness owns
+# its compile rule, mirroring test/lex/test/pp). Phase 1: every smoke
+# case carries a .skip sidecar; the harness wiring runs on every CI run
+# so regressions in the public asm/disasm surface are caught early.
+test-asm: lib
+	bash test/asm/run.sh
+
 # test-smoke-x64: phase-1 sanity check for the multi-arch bring-up. Builds a
 # tiny freestanding x86_64 ELF with clang --target=x86_64-linux-gnu and
 # runs it through test/lib/exec_target.sh's podman/qemu pipeline,
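
Two conventions in test/asm/run.sh can be tried in isolation: selecting paths by letter with a `case` pattern, and comparing exit codes after masking the expected value to its low byte (as the script does with `expected & 0xff` for paths D, J and E). The sketch below is illustrative only and is not part of the commit; `path_enabled` and `exit_matches` are hypothetical helper names, since the script performs both steps inline.

```shell
#!/usr/bin/env bash
# Sketch only: path_enabled and exit_matches are hypothetical helpers,
# not functions defined in test/asm/run.sh.

# path_enabled <paths> <letter>: succeed when <letter> occurs in <paths>.
# Same pattern as: case "$PATHS" in *H*) RUN_H=1;; *) RUN_H=0;; esac
path_enabled() {
  case "$1" in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# exit_matches <expected> <actual>: compare a process exit status with the
# expected value masked to its low 8 bits, because a process exit status
# only carries one byte.
exit_matches() {
  local expected_byte=$(( $1 & 0xff ))
  [ "$2" -eq "$expected_byte" ]
}

# Usage: honor CFREE_TEST_PATHS the way run.sh does, defaulting to HTLDJE.
if path_enabled "${CFREE_TEST_PATHS:-HTLDJE}" D; then
  echo "path D selected"
fi
```

Masking the expected value rather than the observed one keeps the `.expected` sidecar free to hold any integer while the comparison still matches what the host reports for the child process.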