commit cb33525798c6427becd139b2632ba02986406d1f
parent 0f61a9b9327b44012f4c9ef6059ea07381da2870
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Sun, 10 May 2026 11:46:05 -0700
ASM.md plan
Diffstat:
| A | doc/ASM.md | | | 487 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
1 file changed, 487 insertions(+), 0 deletions(-)
diff --git a/doc/ASM.md b/doc/ASM.md
@@ -0,0 +1,487 @@
+# ASM — assembler and disassembler plan
+
+Scope: bring up the asm frontend (standalone `.s` and inline `asm("...")`)
+and the matching disassembler, starting with aarch64. Companion to
+`DESIGN.md §10` and `MULTIARCH.md`.
+
+The asm and disasm sides are designed together so that one description of
+each instruction serves both: same field layout, same operand syntax, same
+mnemonic table. When an opcode bit moves, the encoder and the decoder
+update at one site and stay in sync by construction.
+
+---
+
+## 1. Current state
+
+- `src/arch/aa64_isa.{h,c}`: per-format `pack`/`unpack` round-trippers and
+ a `(mnemonic, match, mask, AA64Format)` descriptor table.
+ `aa64_disasm_find` already linear-scans the table by `(word & mask) ==
+ match`. The encoders are inline wrappers that call `pack`. **This is the
+ pairing seam.** Half of the work below is just finishing what's started
+ here.
+- `src/parse/parse.h:23` declares `parse_asm`; `src/api/stubs.c:45`
+ implements it as a panic. `src/api/pipeline.c:208` already routes
+ `CFREE_LANG_ASM` inputs to it, so the wiring is in place.
+- `CGTarget.asm_block` is a method on every backend; `aa_asm_block`
+ (`arch/aarch64.c:2969`), `xx_asm_block`, `rv_asm_block`, and the opt
+ recorder's `w_asm_block` all panic.
+- `cg_inline_asm` (`src/cg/cg.c:1337`) is the parser-side entry; panics.
+- `arch_disasm_new` / `arch_disasm_decode` (`src/arch/arch.h:647`) are
+ declared but no impl exists. Public surface
+ (`cfree_disasm_iter_*`, `cfree_obj_disasm`,
+ `cfree_arch_register_*`) is in `include/cfree.h` and stubbed out in
+ `src/api/stubs.c`.
+
+So: data model decided, descriptor table partly populated, all behavior
+still a panic. The work is one focused vertical.
+
+---
+
+## 2. Target slice for first milestone
+
+| axis        | value                                            |
+|-------------|--------------------------------------------------|
+| arch        | `CFREE_ARCH_ARM_64`                              |
+| syntax      | GNU `as` "unified" (per `DESIGN.md §10`)         |
+| objfmt      | ELF (the only one wired today)                   |
+| inline      | aarch64 GCC `%w0`/`%x0`/`%[name]` substitution   |
+| constraints | `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0` (matching) |
+
+Out of scope this pass:
+
+- x86 AT&T and RISC-V GNU syntax. Their per-arch parsers slot in as peers
+ once aarch64 proves the seams (see §5, Phase 5).
+- WAT for WASM. Different enough to merit its own document.
+- Macros / `.if` / `.macro` / `.altmacro`. The directive set is
+ intentionally small (§4.4); macros are a follow-up.
+- Full GCC constraint coverage (multi-alternative, `&` outside outputs,
+ most letter constraints). Tracked under `DESIGN.md §10` as deferred.
+
+---
+
+## 3. Encode/decode pairing — one description per instruction
+
+The discipline that makes asm and disasm cheap to keep in sync:
+
+```
+                        ┌──────────────────┐
+ asm text ─lex─►        │ per-format       │ ─pack(fields)─► u32 bytes
+                        │ parse_operands   │                     │
+ bytes ─────────────────┤                  │ ◄─unpack(word)──────┘
+                        │ per-format       │
+ disasm text ◄──────────┤ print_operands   │
+                        └──────────────────┘
+                                 ▲
+                                 │
+          AA64InsnDesc { mnemonic, match, mask, format }
+```
+
+Per format (already established in `aa64_isa.h`):
+
+- `AA64<Fmt>` field struct.
+- `aa64_<fmt>_pack(fields) -> u32`, `aa64_<fmt>_unpack(u32) -> fields`.
+- *(new)* `aa64_<fmt>_parse(Asm*, AA64InsnDesc*, fields*)` — parses the
+ operand grammar for this format and fills the field struct. Reads
+ per-instruction opcode bits from the descriptor, so one parser handles
+ all members of the family.
+- *(new)* `aa64_<fmt>_print(fields, sb)` — renders text for a decoded
+ word. The disasm text path and the round-trip check share this.
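+
+To make the per-format contract concrete, here is a minimal sketch of a
+pack/unpack pair for the add/sub (immediate) family. The names
+`AddSubImm`, `addsub_imm_pack`, and `addsub_imm_unpack` are hypothetical
+stand-ins, not the declarations in `aa64_isa.h`; only the bit positions
+come from the A64 encoding.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+
+/* Hypothetical field struct for the A64 add/sub (immediate) format. */
+typedef struct {
+  uint32_t sf;    /* bit 31: 1 = 64-bit (X) registers, 0 = 32-bit (W) */
+  uint32_t op;    /* bit 30: 0 = ADD, 1 = SUB */
+  uint32_t s;     /* bit 29: 1 = set flags (ADDS/SUBS) */
+  uint32_t sh;    /* bit 22: 1 = imm12 is shifted left by 12 */
+  uint32_t imm12; /* bits 21..10 */
+  uint32_t rn;    /* bits 9..5 */
+  uint32_t rd;    /* bits 4..0 */
+} AddSubImm;
+
+/* Bits 28..23 hold the fixed pattern 100010 for this format. */
+#define ADDSUB_IMM_FIXED (0x22u << 23)
+
+static uint32_t addsub_imm_pack(AddSubImm f) {
+  assert(f.imm12 < (1u << 12) && f.rn < 32 && f.rd < 32);
+  return (f.sf & 1u) << 31 | (f.op & 1u) << 30 | (f.s & 1u) << 29 |
+         ADDSUB_IMM_FIXED | (f.sh & 1u) << 22 |
+         f.imm12 << 10 | f.rn << 5 | f.rd;
+}
+
+static AddSubImm addsub_imm_unpack(uint32_t w) {
+  AddSubImm f;
+  f.sf = (w >> 31) & 1u;
+  f.op = (w >> 30) & 1u;
+  f.s = (w >> 29) & 1u;
+  f.sh = (w >> 22) & 1u;
+  f.imm12 = (w >> 10) & 0xFFFu;
+  f.rn = (w >> 5) & 31u;
+  f.rd = w & 31u;
+  return f;
+}
+```
+
+For example, `0x91001020` (`add x0, x1, #4`) unpacks to `sf=1, op=0,
+imm12=4, rn=1, rd=0`, and packing those fields yields the same word. A
+parser or printer for the format would only ever touch the field struct.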
+
+Per instruction (one row in `aa64_insn_table`):
+
+- *(unchanged)* `mnemonic`, `match`, `mask`, `format`.
+- *(new)* a small `AsmFlags` byte for things that vary across same-format
+ members: alias status, sf-required, special operand syntax (e.g.
+ `RET` with optional `Xn`).
+
+**Aliases** (`MOV` for `ORR Rd, ZR, Rm`; `MUL` for `MADD ..., ZR`; `NEG`
+for `SUB Rd, ZR, Rm`) are extra rows with tighter masks placed *before*
+the canonical row in `aa64_insn_table.c`. First-match-wins is already the
+documented invariant. The disasm prints the alias; the asm accepts both
+the alias and the canonical spelling.
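+
+A minimal sketch of why row order matters, assuming a simplified
+descriptor (`DescSketch` and `find_first` are hypothetical names; the real
+`AA64InsnDesc` also carries a format). The `mov` row pins shift, imm6, and
+`Rn = XZR` in its mask, so it has to be scanned before the broader `orr`
+row that would also match the same word.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+
+typedef struct {
+  const char* mnemonic;
+  uint32_t match;
+  uint32_t mask;
+} DescSketch;
+
+static const DescSketch table[] = {
+  /* MOV Xd, Xm: ORR with shift=0, imm6=0, Rn=XZR all fixed by the mask. */
+  {"mov", 0xAA0003E0u, 0xFFE0FFE0u},
+  /* ORR Xd, Xn, Xm{, shift #imm6}: canonical 64-bit shifted-register row. */
+  {"orr", 0xAA000000u, 0xFF200000u},
+};
+
+/* First-match-wins linear scan; returns NULL when no row matches. */
+static const DescSketch* find_first(uint32_t word) {
+  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
+    /* A well-formed row has no match bits outside its mask. */
+    assert((table[i].match & ~table[i].mask) == 0);
+    if ((word & table[i].mask) == table[i].match) return &table[i];
+  }
+  return NULL;
+}
+```
+
+With this ordering `0xAA0103E0` (`mov x0, x1`) resolves to the `mov` row,
+while `0xAA010040` (`orr x0, x2, x1`) falls through to `orr`. Swapping the
+two rows would make every `mov` print as `orr`.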
+
+**Source of truth.** Encoder, decoder, asm parser, and asm printer all go
+through `aa64_isa.h`. No second copy of the bit layout anywhere — not in
+`arch/aarch64.c` codegen helpers (those already call the inline encoders),
+not in test fixtures.
+
+**Round-trip property.** For every byte sequence `B` made of
+instructions the assembler accepts, disassembling and re-assembling
+reproduces the bytes: `assemble(disasm(B)) == B`. The testable form is
+idempotence of the text, `disasm(assemble(disasm(B))) == disasm(B)`
+(see §6). This catches missing format entries and operand-print/parse
+drift.
+
+---
+
+## 4. Module layout
+
+Reuse `aa64` prefix (`MULTIARCH.md §5`).
+
+```
+src/parse/parse_asm.c        shared driver: scan tokens, dispatch directives,
+                             call per-arch instruction parser. New.
+src/arch/aa64_asm.{h,c}      aa64 instruction parser + inline-asm template
+                             walker. New. Owns AsmCtx and constraint binding.
+src/arch/aa64_disasm.{h,c}   aa64 ArchDisasm impl. Wraps aa64_disasm_find
+                             with operand printing. New.
+src/arch/aa64_isa.{h,c}      already exists. Gains per-format
+                             parse_operands / print_operands and
+                             AsmFlags column on AA64InsnDesc.
+src/arch/disasm.c            arch_disasm_new dispatch by c->target.arch
+                             (peer of arch/cgtarget.c per MULTIARCH §2.1).
+                             New.
+src/api/disasm.c             cfree_disasm_iter_* / cfree_obj_disasm /
+                             cfree_arch_register_* over arch_disasm_*.
+                             Replaces stubs in src/api/stubs.c.
+```
+
+The four pieces fall on three seams: (a) `parse_asm` ↔ per-arch instruction
+parser, (b) `MCEmitter` is the byte sink for both asm and codegen, (c)
+`arch_disasm_new` ↔ per-arch decoder. `aa64_isa.h` is the shared truth
+crossing those seams.
+
+### 4.1 `parse_asm` driver — arch-agnostic
+
+```c
+void parse_asm(Compiler* c, Lexer* l, MCEmitter* mc) {
+  AsmDriver d = {.c = c, .lex = l, .mc = mc, .arch = aa64_asm_open(c)};
+  for (;;) {
+    Tok t = lex_next(l);
+    if (t.kind == TOK_EOF) break;
+    if (is_directive(t))   parse_directive(&d, t);      /* .text, .globl, ... */
+    else if (is_label(t))  parse_label(&d, t);          /* foo: */
+    else if (is_ident(t))  d.arch->insn(d.arch, &d, t); /* mnemonic line */
+    else                   parse_skip_to_newline(&d);
+  }
+  aa64_asm_close(d.arch);
+}
+```
+
+Directives, labels, expression evaluation, and the `MCEmitter` glue live
+in this file because every arch needs the same set. Per-arch code is one
+function pointer (`arch->insn`), one symbol (`aa64_asm_open`).
+
+### 4.2 `aa64_asm_open` — instruction parser
+
+```c
+typedef struct AsmCtx AsmCtx; /* tokens, scratch, label map */
+typedef struct AA64Asm {
+  Compiler* c;
+  void (*insn)(struct AA64Asm*, AsmDriver*, Tok mnemonic);
+  /* + register-name table, mnemonic→AA64InsnDesc lookup,
+   * inline-asm placeholder substitution state. */
+} AA64Asm;
+```
+
+`insn` looks the mnemonic up in `aa64_insn_table` (the same table
+`aa64_disasm_find` uses), dispatches on `format`, calls
+`aa64_<fmt>_parse` to fill the field struct, then calls
+`aa64_<fmt>_pack` and writes the `u32` through `mc->emit_bytes`. Branches
+also call `mc->emit_label_ref` for the relocatable bit slice.
+
+### 4.3 Inline asm — same parser, different operand source
+
+```c
+static void aa_asm_block(CGTarget* t, const char* tmpl,
+                         const AsmConstraint* outs, u32 nout, Operand* out_ops,
+                         const AsmConstraint* ins, u32 nin, const Operand* in_ops,
+                         const Sym* clobbers, u32 nclob) {
+  AA64Asm* a = aa64_asm_open(t->c);
+  aa64_inline_bind(a, outs, nout, out_ops, ins, nin, in_ops, clobbers, nclob);
+  aa64_asm_run_template(a, t->mc, tmpl);
+  aa64_asm_close(a);
+}
+```
+
+The template walker is the same `aa64_<fmt>_parse` set used by the
+standalone path. The only delta is the operand lexer: in inline mode,
+`%0`, `%w0`, `%x0`, `%[name]` resolve to the bound `Operand` for the
+corresponding constraint. `%w0` prints the W-form register name (forces
+`sf=0`); `%x0` the X-form. Memory operands `%a0` materialize as
+`[Xn, #ofs]`. Bit width is checked against the format's expectation
+(e.g. a 32-bit format with `%x0` is a diagnostic).
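+
+The placeholder step can be sketched as a plain text substitution. This
+is an illustration only: `subst_template` is a hypothetical helper, bound
+operands are reduced to register numbers, bare `%N` is assumed to print
+the X-form, and `%[name]`, `%a0`, and `%%` are left out.
+
+```c
+#include <ctype.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <string.h>
+
+/* Expands %N, %wN, %xN using regs[N] as the bound register number.
+ * Returns 0 on success, -1 on a bad placeholder or if out is too small. */
+static int subst_template(const char* tmpl, const unsigned* regs, size_t nregs,
+                          char* out, size_t cap) {
+  size_t o = 0;
+  const char* p = tmpl;
+  if (cap == 0) return -1;
+  while (*p) {
+    /* Copy the literal run up to the next '%' (or end of string). */
+    const char* q = strchr(p, '%');
+    size_t run = q ? (size_t)(q - p) : strlen(p);
+    if (run >= cap - o) return -1;
+    memcpy(out + o, p, run);
+    o += run;
+    p += run;
+    if (*p == '\0') break;
+    p++; /* skip '%' */
+    char width = 'x'; /* assumption: bare %N prints the X-form */
+    if (*p == 'w' || *p == 'x') width = *p++;
+    if (!isdigit((unsigned char)*p)) return -1;
+    size_t idx = 0;
+    while (isdigit((unsigned char)*p)) idx = idx * 10 + (size_t)(*p++ - '0');
+    if (idx >= nregs) return -1;
+    int n = snprintf(out + o, cap - o, "%c%u", width, regs[idx]);
+    if (n < 0 || (size_t)n >= cap - o) return -1;
+    o += (size_t)n;
+  }
+  out[o] = '\0';
+  return 0;
+}
+```
+
+With operands bound to registers 3 and 5, the template
+`add %w0, %w1, %w1` expands to `add w3, w5, w5`, which then goes through
+the same operand parser as standalone `.s` text.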
+
+Constraint binding (v1 set):
+
+| constraint | meaning |
+|------------|---------------------------------------------------------------|
+| `r` / `=r` | int reg; allocated via the codegen scratch pool of the active CGTarget |
+| `+r` | input + output, same register |
+| `=&r` | early-clobber output (allocated disjoint from any input) |
+| `i` | compile-time integer; must be `OPK_IMM` |
+| `m` | memory operand; bind a scratch base reg if the source isn't `OPK_INDIRECT` |
+| `0` (etc.) | matching constraint: input must use the same physical reg as output 0 |
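+
+A sketch of how the v1 constraint strings might be classified before
+binding. `ParsedConstraint` and `parse_constraint` are hypothetical names
+(the real `AsmConstraint` layout is not shown here), and accepting `=m`
+and `+&r` goes slightly beyond the table as an assumption of this sketch.
+
+```c
+#include <stdbool.h>
+
+typedef enum { CK_REG, CK_IMM, CK_MEM, CK_MATCH } ConstraintKind;
+
+typedef struct {
+  ConstraintKind kind;
+  bool is_output;     /* leading '=' or '+' */
+  bool is_inout;      /* leading '+' */
+  bool early_clobber; /* '&' after the output marker */
+  unsigned match;     /* output index for CK_MATCH */
+} ParsedConstraint;
+
+/* Returns false for anything outside the v1 set. */
+static bool parse_constraint(const char* s, ParsedConstraint* pc) {
+  ParsedConstraint r = {CK_REG, false, false, false, 0};
+  if (*s == '=') { r.is_output = true; s++; }
+  else if (*s == '+') { r.is_output = true; r.is_inout = true; s++; }
+  if (*s == '&') {
+    if (!r.is_output) return false; /* '&' is only accepted on outputs */
+    r.early_clobber = true;
+    s++;
+  }
+  if (s[0] == '\0' || s[1] != '\0') return false; /* exactly one letter/digit */
+  switch (*s) {
+    case 'r': r.kind = CK_REG; break;
+    case 'm': r.kind = CK_MEM; break;
+    case 'i':
+      if (r.is_output) return false; /* an immediate cannot be written */
+      r.kind = CK_IMM;
+      break;
+    default:
+      if (*s < '0' || *s > '9' || r.is_output) return false;
+      r.kind = CK_MATCH;
+      r.match = (unsigned)(*s - '0');
+      break;
+  }
+  *pc = r;
+  return true;
+}
+```
+
+Here `"=&r"` parses to a register output with `early_clobber` set, `"0"`
+to a matching constraint on output 0, and `"&r"` or `"=i"` are rejected.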
+
+`"memory"` clobber forces CG to flush all live stack values to memory
+before the block and reload after, per `DESIGN.md §10`. Register-name
+clobbers add to the "clobbered by call" set so RA does not reuse them
+across the block. `"cc"` is accepted and ignored on aarch64 (NZCV is
+reserved by the inline-asm contract anyway — no instruction outside the
+block reads it across the block).
+
+Under `opt_cgtarget` the call is recorded as `IR_ASM_BLOCK` (already an
+opaque-to-passes record per `DESIGN.md §9.5`); at lowering the wrapped
+target sees the same call with materialized operands.
+
+### 4.4 Directives — minimum viable set
+
+```
+.section NAME [, "FLAGS", @TYPE]
+.text .data .rodata .bss
+.globl SYM .local SYM .weak SYM .hidden SYM
+.type SYM, @function | @object
+.size SYM, EXPR
+.byte .hword .word .quad EXPR [, EXPR ...]
+.ascii "..." .asciz "..." .string "..."
+.zero N .skip N .fill N, SIZE, VALUE
+.align N .balign N .p2align N
+.set NAME, EXPR
+.equ NAME, EXPR (= .set)
+.file "name" (debug line filename)
+.loc FILE LINE [COL] (debug line row)
+```
+
+CFI directives (`.cfi_startproc`, `.cfi_def_cfa`, …) are accepted and
+forwarded to the corresponding `MCEmitter.cfi_*` calls (already exist
+per `arch/arch.h`). Unknown directives are a recovery diagnostic, not a
+panic — skip to newline.
+
+`.macro` / `.if` / `.include` are deferred. Inline asm gets there first
+because `cg_inline_asm` is the immediate consumer.
+
+### 4.5 Disassembler — `arch_disasm_new` for aarch64
+
+```c
+typedef struct AA64Disasm {
+  ArchDisasm base;
+  Compiler* c;
+  StrBuf mnemonic, operands, annotation; /* reused per decode */
+} AA64Disasm;
+
+u32 arch_disasm_decode(ArchDisasm* d_, const u8* b, size_t n,
+                       u64 vaddr, CfreeInsn* out) {
+  AA64Disasm* d = (AA64Disasm*)d_; /* valid: base is the first member */
+  if (n < 4) return 0;
+  u32 w = read_u32_le(b);
+  const AA64InsnDesc* ins = aa64_disasm_find(w);
+  if (!ins) { write_unknown(d, b, out); return 4; }
+  AA64Fields f = aa64_<ins->format>_unpack(w);
+  strbuf_set(&d->mnemonic, ins->mnemonic);
+  aa64_<ins->format>_print(&d->operands, ins, &f, vaddr);
+  out->vaddr = vaddr;
+  out->bytes = b; out->nbytes = 4;
+  out->mnemonic = d->mnemonic.p;
+  out->operands = d->operands.p;
+  out->annotation = ""; /* sym/reloc overlay added by cfree_obj_disasm */
+  return 4;
+}
+```
+
+Annotations (sym/reloc overlay) live one level up in `cfree_obj_disasm` /
+`cfree_disasm_iter_new(..., obj)`: the iterator walks `ObjBuilder` relocs
+keyed on the section + offset and writes the resolved `name+addend` into
+`annotation`. The arch-level decoder is reloc-unaware — it only reads
+bytes. This keeps `arch_disasm_decode` per-arch and the symbol/reloc
+overlay arch-agnostic.
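+
+The overlay lookup can be sketched as follows, assuming relocs are
+available as a flat array for one section. `RelocSketch` and `annotate`
+are hypothetical; the real iterator walks `ObjBuilder` relocs, whose
+layout is not shown here.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+#include <stdio.h>
+
+typedef struct {
+  uint64_t offset;  /* section-relative offset the reloc applies to */
+  const char* name; /* resolved symbol name */
+  int64_t addend;
+} RelocSketch;
+
+/* Writes "name" or "name+addend" for the first reloc that falls inside
+ * [off, off + nbytes). Returns the text length, 0 if no reloc applies
+ * (buf is then empty), or -1 if buf is too small. */
+static int annotate(const RelocSketch* relocs, size_t nrelocs, uint64_t off,
+                    uint32_t nbytes, char* buf, size_t cap) {
+  if (cap == 0) return -1;
+  buf[0] = '\0';
+  for (size_t i = 0; i < nrelocs; i++) {
+    const RelocSketch* r = &relocs[i];
+    if (r->offset < off || r->offset - off >= nbytes) continue;
+    int n = r->addend == 0
+                ? snprintf(buf, cap, "%s", r->name)
+                : snprintf(buf, cap, "%s%+lld", r->name, (long long)r->addend);
+    return (n < 0 || (size_t)n >= cap) ? -1 : n;
+  }
+  return 0;
+}
+```
+
+The decoder never sees this data: the iterator calls the arch-level
+decode first and then fills `annotation` from a lookup like this one.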
+
+`cfree_arch_register_name` / `_index` table lives in `aa64_asm.c`
+alongside the parser (one canonical name list — same source for parse
+and print).
+
+---
+
+## 5. Phasing
+
+Each phase ends mergeable. Phase 1 stands up the test harness so every
+later phase gates on real runs from its first commit (mirrors
+`MULTIARCH.md §4` Phase 1). Phase 2 lands the encode/decode pairing as
+a mechanical refactor; phase 3 is the standalone assembler; phase 4 is
+inline asm + disasm overlay; phase 5 is the seam-rev for x64/rv64.
+
+### Phase 1 — test harness
+
+Stand up the runner before any compiler-side work. No `src/` changes.
+
+1. New `test/asm/` peer of `test/parse/`. One `run.sh`; three
+ sub-corpora (§6). Skip-vs-fail follows the `CFREE_TEST_ALLOW_SKIP`
+ convention used elsewhere — every case skips cleanly today because
+ `parse_asm` and `cfree_disasm_iter_*` are stubs.
+2. New `test-asm` target in `test/test.mk`; added to the default
+ `test` list.
+3. Add `S` (asm-roundtrip) path letter to `test/cg/run.sh`. Plumbed
+ to walk every `.text` byte of each cg-emitted aarch64 binary
+ through `cfree_disasm_iter_*` → `cfree as` → byte-compare. Skips
+ today; turns green when phases 3+4 land. Path matrix becomes
+ `DREJWS`.
+4. Smoke goldens checked in for one case per sub-corpus. Generated
+ from the host `as` / `objdump` once and committed; a `regen.sh`
+ documents how to refresh but is not run by default (same
+ convention as `test/elf/normalize.py`).
+5. New runner C binary `asm-runner` under `test/asm/harness/` —
+ peer of `cg-runner`. Three sub-commands: `--encode`, `--decode`,
+ `--listing`. Dispatches to `cfree as` / `cfree_disasm_iter_*` /
+ `cfree_obj_disasm` per case.
+
+Exit criterion: `make test-asm` runs end-to-end; smoke cases report
+SKIP under `CFREE_TEST_ALLOW_SKIP=1` (the default during phase 1) and
+the harness wiring is exercised on every CI run. `test/cg/run.sh -S`
+also reports SKIP cleanly. No green asm cases yet — that's phase 3.
+
+### Phase 2 — finish the ISA descriptor table
+
+Pure refactor. No new behavior; existing codegen still calls inline
+encoders.
+
+1. Add `parse_operands` and `print_operands` per `AA64Format` in
+ `aa64_isa.{h,c}`. The first cut prints into a `StrBuf` and parses
+ from a tiny operand lexer (reg name, `#imm`, `[Xn, #ofs]`, label).
+2. Add `AsmFlags` column to `AA64InsnDesc`. Mark aliases (`MOV`, `MUL`,
+ `NEG`) and special operand syntax (`RET`).
+3. Reorder rows in `aa64_insn_table` so aliases precede their canonical
+ forms.
+4. Backfill formats not yet in the table that codegen emits today (load/
+ store immediate, branch immediate, conditional branch, NOP, BRK).
+ Each lands as one format-struct + pack/unpack + parse/print + table
+ rows.
+
+Exit criterion: `aa64_disasm_find` returns a desc for every byte
+sequence the codegen currently produces; no test changes.
+
+### Phase 3 — standalone `.s` assembler
+
+1. New `src/parse/parse_asm.c`. Replace the panic in `src/api/stubs.c`.
+ Driver loop, directive parser, label management, expression
+ evaluator (constant + `sym + const`, full arithmetic on constants
+ per §7).
+2. New `src/arch/aa64_asm.{h,c}` with `aa64_asm_open` and the
+ instruction parser. Mnemonic lookup goes through `aa64_insn_table`.
+3. CFI directives forwarded to `MCEmitter.cfi_*`.
+4. `.loc` calls `mc->set_loc` so debug line tables work for hand-written
+ `.s`.
+5. `cfree as` driver subcommand (multi-call dispatch).
+
+Exit criterion: every `test/asm/encode/` case is green; every row of
+`aa64_insn_table` is hit by at least one encode case. `rt/` stays on
+clang.
+
+### Phase 4 — inline asm + disasm overlay
+
+1. Implement `aa_asm_block` in `arch/aarch64.c` calling into
+ `aa64_asm_run_template`. Implement `cg_inline_asm` in `cg/cg.c`:
+ evaluate inputs to `Operand`s, materialize `&buf` for `m` constraints,
+ call `target->asm_block`, push `out_ops` back as `SValue`s.
+2. Constraint binding (§4.3): `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0`.
+3. Memory clobber: CG flushes value stack (`spill_reg` for every live
+ reg-resident SValue) before the call, marks them invalid after.
+ Register clobbers route through the existing `clobbers` mechanism.
+4. `IR_ASM_BLOCK` already opaque-to-passes; opt recorder
+ (`opt.c:692`) materializes operands and replays.
+5. `arch_disasm_new` for aarch64 (`aa64_disasm.c`); dispatch in new
+ `arch/disasm.c`.
+6. `cfree_obj_disasm` / `cfree_disasm_iter_*` over `arch_disasm_*`,
+ plus reloc/symbol annotation overlay. `cfree_arch_register_*` table.
+
+Exit criterion: every `test/asm/decode/` and `test/asm/listing/` case
+is green; the inline-asm cases under `test/cg/` (svc-style write-then-
+exit) build, run under qemu/podman, and report green on `DREJWS`. The
+`S` path turns green for the full cg corpus, proving encode/decode
+pairing across every `.text` byte cfree currently emits.
+
+### Phase 5 — multiarch seam
+
+Land before x64/rv64 codegen needs it.
+
+1. `arch/disasm.c::arch_disasm_new` switches on `c->target.arch`
+ (currently aarch64-only).
+2. `parse_asm` driver dispatches per-arch instruction parser by
+ `c->target.arch`. `aa64_asm_open` becomes one of N constructors.
+3. Reg-name table dispatched the same way (`cfree_arch_register_name`).
+4. `x64_isa.{h,c}` and `rv64_isa.{h,c}` skeletons (formats + tables,
+ not populated). x64 brings AT&T, rv64 brings GNU. Each pulls in its
+ own `<arch>_asm.{h,c}` and `<arch>_disasm.{h,c}`. Per `DESIGN.md
+ §10` the asm flavour is decided per-arch, single supported flavour.
+
+Exit criterion: builds for `CFREE_ARCH_X86_64` reach the x64 asm/disasm
+stubs and panic with a clean diagnostic; aarch64 path unchanged.
+
+---
+
+## 6. Testing
+
+The pairing buys a strong test shape: most tests run the round trip
+rather than spelling expected bytes by hand. Three buckets, all wired
+in phase 1:
+
+### 6.1 `test/asm/` — file-driven goldens (new)
+
+Peer of `test/parse/`. One `run.sh`, one `asm-runner` C binary, three
+sub-corpora keyed off filename suffix:
+
+| dir | input | expected | drives |
+|-------------------------|--------------------|--------------------|------------------------------|
+| `test/asm/encode/` | `<name>.s` | `<name>.expected.hex` | `cfree as` over the `.s`, hex-compare against expected |
+| `test/asm/decode/` | `<name>.hex` | `<name>.expected.txt` | `cfree_disasm_iter_*` over the bytes, text-compare |
+| `test/asm/listing/` | `<name>.in.bin` (ELF) | `<name>.expected.lst` | `cfree_obj_disasm` against the ELF, listing-compare |
+
+Goldens are checked in. A `test/asm/regen.sh` regenerates them from
+the host `as` / `objdump` (committed only as a maintainer aid; not
+run by CI). One smoke case per sub-corpus is enough for phase 1; the
+table fills up alongside phases 3 and 4.
+
+### 6.2 `test/cg/` `S` path — asm roundtrip (new path letter)
+
+Path letter added to `test/cg/run.sh`. For every cg-emitted aarch64
+binary already in the corpus: walk `.text`, decode each instruction,
+re-assemble the resulting text, byte-compare. No new corpus —
+piggybacks on every existing cg case for free coverage. Catches
+encode/decode drift the moment a format gains a member.
+
+Reports SKIP today, green after phase 4. Path matrix becomes
+`DREJWS`. Skip-vs-fail and filtering match the rest of the cg paths
+verbatim.
+
+### 6.3 `test/cg/` inline-asm cases — under existing harness
+
+Inline asm is behavioral C with exit-code assertions, which is exactly
+what `test/cg/cases_*.c` already does. Add a new `cases_asm.c` (or
+fold cases into the existing buckets) registered through `cg-runner`
+the same way every other case is. The path matrix (`DREJW`) and the
+qemu/podman runner from `test/lib/exec_aarch64.sh` cover execution
+unchanged.
+
+### Driver wiring
+
+A standalone `cfree as` subcommand is exposed by the multi-call driver
+in phase 3 (same dispatch as `cfree -c <file.s>` modulo lang
+inference). `test/asm/encode/` drives `cfree as` directly so the
+multi-call dispatch is exercised end-to-end.
+
+---
+
+## 7. Decisions
+
+- **Disasm immediate format: context-sensitive.** Signed decimal for
+ fields the ISA defines as signed (branch displacements, signed-imm12
+ add/sub, load/store offsets). `0x`-prefixed hex everywhere else
+ (logical bitmask immediates, MOVZ/MOVK halfword, addresses).
+ `aa64_<fmt>_print` carries a per-field signedness bit; the print
+ helpers branch on it. Goldens lock the chosen form per format.
+- **`.s` constant expressions: arithmetic with parens.** Operators
+ `+ - * / % << >> & | ^ ~`, parenthesized, over signed integer
+ constants and `sym + const` terms. Symbol-involving expressions are
+ restricted to `(sym ± const)`; any product, quotient, shift, or
+ bitwise op that has a symbol operand is a diagnostic. Reloc-modifier
+ syntax (`:lo12:sym`, `:got:sym`) and macro counters (`\@`) are
+ deferred — they belong with the macro/full-PIC follow-up.
+- **`__cfree_setjmp.s` is decoupled.** Phase 3 lands against the
+ synthetic suites in §6, not against the runtime. `rt/` is currently
+ built with clang (`rt/Makefile`) and continues to be through phase 3;
+ `__cfree_setjmp.s` migrates to `parse_asm` as a follow-up after the
+ assembler is proven on the test corpus. The same applies to any
+ other `.s` files `rt/` adds before then.
+- **Absolute relocs in `.s`** (e.g. `.quad some_sym + 8` in `.data`)
+ go through `MCEmitter.emit_reloc_at` against the existing
+ `RelocKind` set — no new mechanism needed.
+- **Self-hosting.** Per `DESIGN.md §12`, anything in `src/` must be
+ C11-freestanding-writable. `parse_asm.c` and `aa64_asm.c` follow the
+ same rule. No reliance on a host assembler at build time *for the
+ compiler*; `rt/` still uses clang and is on its own bootstrap track.
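+
+The `.s` constant-expression restriction above can be sketched with a
+value type that carries at most one symbol. `AsmVal` and the `val_*`
+helpers are hypothetical names; accepting `const + sym` as well as
+`sym + const`, and wrapping on overflow, are assumptions of this sketch.
+
+```c
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+typedef struct {
+  const char* sym; /* NULL for a pure constant */
+  int64_t k;       /* constant term */
+} AsmVal;
+
+/* Arithmetic is done in uint64_t so overflow wraps instead of being UB. */
+static bool val_add(AsmVal a, AsmVal b, AsmVal* out) {
+  if (a.sym && b.sym) return false; /* sym + sym is not (sym ± const) */
+  out->sym = a.sym ? a.sym : b.sym;
+  out->k = (int64_t)((uint64_t)a.k + (uint64_t)b.k);
+  return true;
+}
+
+static bool val_sub(AsmVal a, AsmVal b, AsmVal* out) {
+  if (b.sym) return false; /* const - sym and sym - sym are rejected */
+  out->sym = a.sym;
+  out->k = (int64_t)((uint64_t)a.k - (uint64_t)b.k);
+  return true;
+}
+
+static bool val_mul(AsmVal a, AsmVal b, AsmVal* out) {
+  if (a.sym || b.sym) return false; /* symbol operand: diagnostic */
+  out->sym = NULL;
+  out->k = (int64_t)((uint64_t)a.k * (uint64_t)b.k);
+  return true;
+}
+```
+
+A `false` return is where the evaluator would raise its diagnostic. A
+surviving value with a non-NULL `sym` maps directly onto one
+`name + addend` reloc, as in `.quad some_sym + 8`.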