commit 0463b6ab0c2e8e7f51c0aabca3a7730b2b3d2be7
parent cc72c49f15d87bd56dbed760c9a310e12de541a9
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 11 May 2026 10:38:07 -0700
ASM.md plan update
Diffstat:
| M | doc/ASM.md | | | 845 | ++++++++++++++++++++----------------------------------------------------------- |
| D | doc/INLINEASM.md | | | 325 | ------------------------------------------------------------------------------- |
2 files changed, 215 insertions(+), 955 deletions(-)
diff --git a/doc/ASM.md b/doc/ASM.md
@@ -1,95 +1,36 @@
-# ASM — assembler and disassembler plan
+# ASM — assembler, disassembler, inline asm
-Scope: bring up the asm frontend (standalone `.s` and inline `asm("...")`)
-and the matching disassembler, starting with aarch64. Companion to
-`DESIGN.md §10`.
+Scope: cfree's asm frontend — standalone `.s`, inline `asm("...")`, and the
+matching disassembler. aarch64 only today; x64 / rv64 are stubs that panic
+cleanly. Companion to `DESIGN.md §10`.
-The asm and disasm sides are designed together so that one description of
-each instruction serves both: same field layout, same operand syntax, same
-mnemonic table. When an opcode bit moves, the encoder and the decoder
-update at one site and stay in sync by construction.
+Asm and disasm are designed together: one description of each instruction
+serves both. When an opcode bit moves, encoder and decoder update at one site
+and stay in sync by construction.
---
-## 1. Current state
-
-- `src/arch/aa64_isa.{h,c}`: per-format `pack`/`unpack` round-trippers
- and a `(mnemonic, match, mask, AA64Format, AsmFlags)` descriptor
- table. `aa64_disasm_find` linear-scans the table by
- `(word & mask) == match`; first-match-wins, with alias rows placed
- before their canonical form. `aa64_print_operands` renders operand
- text via per-format helpers — shared between the disasm iterator and
- the object listing. **This is the pairing seam.**
-- `src/parse/parse_asm.c`: arch-agnostic .s driver — directives
- (`.text/.data/.rodata/.bss/.section/.globl/.local/.weak/.hidden/
- .protected/.internal/.type/.size/.byte/.hword/.word/.quad/.ascii/
- .asciz/.string/.zero/.skip/.fill/.align/.balign/.p2align/.set/.equ`
- plus accepted-but-ignored `.cfi_*`/`.file`/`.loc`/`.macro`/...),
- labels, full constant-expression evaluator (`+ - * / % << >> & | ^ ~`
- with parens), `sym ± const` symbolic terms, string-literal decoding
- with C-style escapes. Wired by `src/api/pipeline.c:208`; the panic
- stub in `src/api/stubs.c` is gone.
-- `src/arch/aa64_asm.{h,c}`: per-mnemonic dispatch over
- `aa64_insn_table` via the inline encoders in `aa64_isa.h`. Coverage
- spans `nop, ret/br/blr, mov(reg/imm)/mvn/movz/movn/movk,
- add(s)/sub(s)/cmp/cmn/neg(s), and/orr/eor/bic/orn/eon/ands/bics,
- madd/msub/mul/mneg, udiv/sdiv/lslv/lsrv/asrv/rorv, b/bl/b.<cc>/
- cbz/cbnz, svc/brk/hlt, ldr/str (scaled + simm9 fallback), ldur/stur,
- ldp/stp (signed-offset + pre-indexed), adr/adrp`. Branches emit
- `R_AARCH64_{CALL,JUMP}26` / `R_AARCH64_CONDBR19`; `adr/adrp` emit
- `R_AARCH64_ADR_PREL_{LO21,PG_HI21}`.
-- `src/arch/aa64_disasm.{h,c}` + `src/arch/disasm.c`: aarch64
- `ArchDisasm` impl wraps `aa64_disasm_find` + `aa64_print_operands`,
- synthesizing `b.<cond>` mnemonics from the BR_COND format.
- `arch_disasm_new` dispatches by `c->target.arch` (aarch64 only; x64
- / rv64 panic with a clean diagnostic).
-- `src/api/disasm.c`: public `cfree_disasm_iter_*` and
- `cfree_obj_disasm` over `arch_disasm_*`, plus the reloc/symbol
- annotation overlay (rendered into the iterator's annotation buffer
- per decoded word).
-- `src/arch/aa64_regs.{h,c}` + `src/api/arch_regs.c`: stateless
- `cfree_arch_register_name`/`_index` queries against the canonical
- aarch64 register table — same source list the parser and printer
- consume.
-- `driver/as.c`: `cfree as` multi-call subcommand wired to
- `cfree_compile_obj_emit(CFREE_LANG_ASM)`. Accepts `-target` for
- cross-assembly, `-g` for debug-info forwarding (no-op until CFI
- storage is wired).
-- `CGTarget.asm_block` (inline asm) is still a panic on every backend
- (`aa_asm_block`, `xx_asm_block`, `rv_asm_block`, the opt recorder's
- `w_asm_block`); `cg_inline_asm` (`src/cg/cg.c`) likewise. Inline-asm
- bring-up is the remaining phase-4 work.
-
-So: standalone `.s` end-to-end works (encode → ELF → disasm
-round-trip, plus JIT execute). Inline asm is the next vertical.
+## 1. Status
----
-
-## 2. Target slice for first milestone
-
-| axis | value |
-|-----------|--------------------------------|
-| arch | `CFREE_ARCH_ARM_64` |
-| syntax | GNU `as` "unified" (per `DESIGN.md §10`) |
-| objfmt | ELF (the only one wired today) |
-| inline | aarch64 GCC `%w0`/`%x0`/`%[name]` substitution |
-| constraints | `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0` (matching) |
-
-Out of scope this pass:
+| layer | state |
+|---|---|
+| standalone `.s` (parse → ELF, JIT, round-trip) | aarch64 ✓ |
+| disasm (`cfree_disasm_iter_*`, `cfree_obj_disasm`) | aarch64 ✓ |
+| inline `asm("...")` C statement | aarch64 ✓ |
+| `cfree as` multi-call driver subcommand | aarch64 ✓ |
+| `cfree_arch_register_name` / `_index` | aarch64 ✓ |
+| x64 / rv64 backends (asm, disasm, inline) | panic with clean diagnostic |
-- x86 AT&T and RISC-V GNU syntax. Their per-arch parsers slot in as peers
- once aarch64 proves the seams (see §6).
-- WAT for WASM. Different enough to merit its own document.
-- Macros / `.if` / `.macro` / `.altmacro`. The directive set is
- intentionally small (§4.4); macros are a follow-up.
-- Full GCC constraint coverage (multi-alternative, `&` outside outputs,
- most letter constraints). Tracked under `DESIGN.md §10` as deferred.
+Coverage of the `aa64_asm.c` per-mnemonic table: `nop, ret/br/blr,
+mov(reg/imm)/mvn/movz/movn/movk, add(s)/sub(s)/cmp/cmn/neg(s),
+and/orr/eor/bic/orn/eon/ands/bics, madd/msub/mul/mneg,
+udiv/sdiv/lslv/lsrv/asrv/rorv, b/bl/b.<cc>/cbz/cbnz, svc/brk/hlt, ldr/str
+(scaled + simm9 fallback), ldur/stur, ldp/stp (signed-offset + pre-indexed),
+adr/adrp`.
---
-## 3. Encode/decode pairing — one description per instruction
-
-The discipline that makes asm and disasm cheap to keep in sync:
+## 2. Encode/decode pairing — the design discipline
```
┌──────────────────┐
@@ -101,591 +42,235 @@ The discipline that makes asm and disasm cheap to keep in sync:
└──────────────────┘
▲
│
- AA64InsnDesc { mnemonic, match, mask, format }
+ AA64InsnDesc { mnemonic, match, mask, format, AsmFlags }
```
-Per format (already established in `aa64_isa.h`):
-
-- `AA64<Fmt>` field struct.
-- `aa64_<fmt>_pack(fields) -> u32`, `aa64_<fmt>_unpack(u32) -> fields`.
-- *(new)* `aa64_<fmt>_parse(Asm*, AA64InsnDesc*, fields*)` — parses the
- operand grammar for this format and fills the field struct. Reads
- per-instruction opcode bits from the descriptor, so one parser handles
- all members of the family.
-- *(new)* `aa64_<fmt>_print(fields, sb)` — renders text for a decoded
- word. The disasm text path and the round-trip check share this.
-
-Per instruction (one row in `aa64_insn_table`):
-
-- *(unchanged)* `mnemonic`, `match`, `mask`, `format`.
-- *(new)* a small `AsmFlags` byte for things that vary across same-format
- members: alias status, sf-required, special operand syntax (e.g.
- `RET` with optional `Xn`).
-
-**Aliases** (`MOV` for `ORR Rd, ZR, Rm`; `MUL` for `MADD ..., ZR`; `NEG`
-for `SUB Rd, ZR, Rm`) are extra rows with tighter masks placed *before*
-the canonical row in `aa64_insn_table.c`. First-match-wins is already the
-documented invariant. The disasm prints the alias; the asm accepts both
-the alias and the canonical spelling.
-
-**Source of truth.** Encoder, decoder, asm parser, and asm printer all go
-through `aa64_isa.h`. No second copy of the bit layout anywhere — not in
-`arch/aarch64.c` codegen helpers (those already call the inline encoders),
-not in test fixtures.
-
-**Round-trip property.** For every byte sequence `B` the disasm-then-asm
-round trip is idempotent: `assemble(disasm(B)) == B` for every
-instruction the assembler accepts. `disasm(assemble(disasm(B))) ==
-disasm(B)` is the testable form (see §7). This catches missing format
-entries and operand-print/parse drift.
+Per format (in `aa64_isa.h`): `AA64<Fmt>` struct + `pack` / `unpack` /
+`print`. Per instruction (one row in `aa64_insn_table`): mnemonic, match,
+mask, format, AsmFlags (alias / sf-required / etc.).
----
+**Source of truth.** Encoder, decoder, asm parser, asm printer all go through
+`aa64_isa.h`. No second copy of the bit layout anywhere. If the `S`
+(asm-roundtrip) path of `test/cg/run.sh` fails on a cg-emitted word, fix the
+format definition, never the parser site.
-## 4. Module layout
+**Aliases** (`MOV` for `ORR Rd, ZR, Rm`; `MUL` for `MADD ..., ZR`; `NEG` for
+`SUB Rd, ZR, Rm`) are extra rows with tighter masks placed *before* the
+canonical row. First-match-wins picks the alias spelling.
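The alias ordering above can be illustrated with a minimal sketch. It assumes a cut-down descriptor (`SketchDesc`) and lookup (`sketch_find`) that are hypothetical stand-ins, not cfree's actual `AA64InsnDesc` or `aa64_disasm_find`. The two rows use the standard A64 encodings for 64-bit `ORR` (shifted register) and its `MOV` alias.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for AA64InsnDesc: only the fields the scan needs. */
typedef struct {
    const char *mnemonic;
    uint32_t match;
    uint32_t mask;
} SketchDesc;

/* Alias row first. MOV Xd, Xm is ORR Xd, XZR, Xm with shift=0, imm6=0 and
 * Rn=31, so its mask pins those fields on top of the ORR opcode bits. */
static const SketchDesc sketch_table[] = {
    { "mov", 0xAA0003E0u, 0xFFE0FFE0u }, /* alias: tighter mask */
    { "orr", 0xAA000000u, 0xFF200000u }, /* canonical 64-bit ORR (shifted reg) */
};

/* Linear scan, first match wins; returns NULL for words no row covers. */
static const SketchDesc *sketch_find(uint32_t word) {
    for (size_t i = 0; i < sizeof sketch_table / sizeof sketch_table[0]; i++) {
        if ((word & sketch_table[i].mask) == sketch_table[i].match)
            return &sketch_table[i];
    }
    return NULL;
}
```

With this ordering, `sketch_find(0xAA0103E0)` (`mov x0, x1`) hits the alias row, while `sketch_find(0xAA010040)` (`orr x0, x2, x1`) falls through to the canonical row. Swapping the two rows would make the alias unreachable, since every `mov` word also satisfies the looser `orr` mask.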
-Reuse `aa64` prefix.
+---
+
+## 3. Module layout
```
-src/parse/parse_asm.c shared driver: scan tokens, dispatch directives,
- label management, expression evaluation,
- call per-arch instruction parser.
+src/parse/parse_asm.c arch-agnostic .s driver: directives, labels,
+ expression evaluator, string decoding.
+ asm_driver_open_inline constructor for inline
+ asm template parsing.
src/parse/parse_asm_helpers.h
- driver↔arch seam (asm_driver_peek/next/
- parse_const/parse_sym_expr/intern_sym/panic).
- AsmDriver itself stays internal to parse_asm.c.
+ driver↔arch seam (peek/next/eat_*/parse_const/
+ parse_sym_expr/intern_sym/panic).
+ AsmDriver stays opaque.
+src/parse/parse.c parse_asm_stmt: GNU asm("...") statement
+ grammar (volatile, goto, four colon-separated
+ lists, [name] symbolic operands).
+src/arch/aa64_isa.{h,c} per-format pack/unpack/print + AA64InsnDesc
+ table + alias flags. Shared between encoder,
+ decoder, and printer.
src/arch/aa64_asm.{h,c} aa64 instruction parser: per-mnemonic dispatch
- over aa64_insn_table → inline encoders in
- aa64_isa.h. Phase 4b will grow the inline-asm
- template walker on top of the same parsers.
-src/arch/aa64_disasm.{h,c} aa64 ArchDisasm impl. Wraps aa64_disasm_find
- with operand printing; synthesizes b.<cond>.
-src/arch/aa64_regs.{h,c} canonical aarch64 register name list — same
- source the parser and printer consume.
-src/arch/aa64_isa.{h,c} per-format pack/unpack + print_operands
- dispatcher + AsmFlags column on AA64InsnDesc.
- aa64_parse_operands declared but unused by
- phase 3 (we dispatch per-mnemonic instead;
- the table-driven parser belongs with the
- remaining cg-emitted formats — see §5).
-src/arch/disasm.c arch_disasm_new dispatch by c->target.arch
- (peer of arch/cgtarget.c per MULTIARCH §2.1).
-src/api/disasm.c cfree_disasm_iter_* / cfree_obj_disasm over
- arch_disasm_*, plus reloc/symbol overlay.
-src/api/arch_regs.c cfree_arch_register_name / _index dispatch.
-driver/as.c `cfree as` subcommand: cross-target flag,
- -g/-o, single positional input. Drives
+ over the table → inline encoders.
+ aa64_inline_bind + aa64_asm_run_template
+ implement the inline-asm template walker.
+src/arch/aa64_disasm.{h,c} aa64 ArchDisasm impl wrapping aa64_disasm_find +
+ aa64_print_operands; synthesizes b.<cond>.
+src/arch/aa64_regs.{h,c} canonical aarch64 register name list.
+src/arch/disasm.c arch_disasm_new dispatch on c->target.arch.
+src/arch/aarch64.c aa_asm_block: CGTarget vtable entry for inline
+ asm; opens AA64Asm, binds operands, runs
+ template, closes.
+src/cg/cg.c cg_inline_asm: constraint binder (pops inputs,
+ allocates output regs, handles "memory"
+ clobber, calls target->asm_block, pushes
+ outputs).
+src/opt/opt.c w_asm_block recorder + IR_ASM_BLOCK replay
+ (mirrors w_call / IR_CALL).
+src/api/disasm.c cfree_disasm_iter_* / cfree_obj_disasm + reloc/
+ symbol annotation overlay.
+src/api/arch_regs.c stateless cfree_arch_register_name / _index
+ dispatcher.
+driver/as.c cfree as subcommand wired to
cfree_compile_obj_emit(CFREE_LANG_ASM).
```
-The pieces fall on three seams: (a) `parse_asm` ↔ per-arch instruction
-parser via `parse_asm_helpers.h`, (b) `MCEmitter` is the byte sink for
-both asm and codegen, (c) `arch_disasm_new` ↔ per-arch decoder.
-`aa64_isa.h` is the shared truth crossing all three.
-
-### 4.1 `parse_asm` driver — arch-agnostic
-
-```c
-void parse_asm(Compiler* c, Lexer* l, MCEmitter* mc) {
- AsmDriver d = {.c = c, .lex = l, .mc = mc, .arch = aa64_asm_open(c)};
- for (;;) {
- Tok t = lex_next(l);
- if (t.kind == TOK_EOF) break;
- if (is_directive(t)) parse_directive(&d, t); /* .text, .globl, ... */
- else if (is_label(t)) parse_label(&d, t); /* foo: */
- else if (is_ident(t)) d.arch->insn(d.arch, &d, t); /* mnemonic line */
- else parse_skip_to_newline(&d);
- }
- aa64_asm_close(d.arch);
-}
-```
-
-Directives, labels, expression evaluation, and the `MCEmitter` glue live
-in this file because every arch needs the same set. Per-arch code is one
-function pointer (`arch->insn`), one symbol (`aa64_asm_open`).
-
-### 4.2 `aa64_asm_open` — instruction parser
-
-```c
-void aa64_asm_insn(AA64Asm*, AsmDriver*, Sym mnemonic);
-```
-
-The parser dispatches per-mnemonic against a small in-file table
-(`{name, parse_fn}`). Each `parse_fn` reads operands via the
-`asm_driver_*` helpers (register, immediate, memory addressing) and
-calls the inline encoder in `aa64_isa.h` for its format
-(`aa64_movz`, `aa64_add`, `aa64_ldr64_uimm12`, ...). One parser per
-operand grammar — register-vs-immediate variants of the same
-mnemonic (e.g. `add Rd, Rn, Rm` vs. `add Rd, Rn, #imm`) branch on
-the first non-Rd operand. Aliases (`mov`, `mvn`, `cmp`, `cmn`,
-`neg`, `mul`, `mneg`, no-operand `ret`) live as dedicated rows that
-emit the canonical encoding directly.
-
-Branches do not go through `mc->emit_label_ref`; the parser emits
-the instruction with `imm26=0`/`imm19=0` and records a reloc
-(`R_AARCH64_CALL26`/`JUMP26`/`CONDBR19`) against the operand's
-ObjSymId via `mc->emit_reloc_at`. The linker (and the in-process
-fixup machinery in `src/arch/mc.c`) applies the displacement at
-relocation time.
-
-The table-driven `aa64_parse_operands` declared in `aa64_isa.h`
-(phase-2 placeholder) remains stubbed — phase 3 chose per-mnemonic
-dispatch because it lets one parser handle alias / immediate /
-register-form branching at the right level. The format-driven
-parser slots in alongside this one for the remaining cg-emitted
-formats when `S` (cg round-trip) needs them.
-
-### 4.3 Inline asm — same parser, different operand source
-
-```c
-static void aa_asm_block(CGTarget* t, const char* tmpl,
- const AsmConstraint* outs, u32 nout, Operand* out_ops,
- const AsmConstraint* ins, u32 nin, const Operand* in_ops,
- const Sym* clobbers, u32 nclob) {
- AA64Asm* a = aa64_asm_open(t->c);
- aa64_inline_bind(a, outs, nout, out_ops, ins, nin, in_ops, clobbers, nclob);
- aa64_asm_run_template(a, t->mc, tmpl);
- aa64_asm_close(a);
-}
-```
-
-The template walker is the same `aa64_<fmt>_parse` set used by the
-standalone path. The only delta is the operand lexer: in inline mode,
-`%0`, `%w0`, `%x0`, `%[name]` resolve to the bound `Operand` for the
-corresponding constraint. `%w0` prints the W-form register name (forces
-`sf=0`); `%x0` the X-form. Memory operands `%a0` materialize as
-`[Xn, #ofs]`. Bit width is checked against the format's expectation
-(e.g. a 32-bit format with `%x0` is a diagnostic).
-
-Constraint binding (v1 set):
-
-| constraint | meaning |
-|------------|---------------------------------------------------------------|
-| `r` / `=r` | int reg; allocated via the codegen scratch pool of the active CGTarget |
-| `+r` | input + output, same register |
-| `=&r` | early-clobber output (allocated disjoint from any input) |
-| `i` | compile-time integer; must be `OPK_IMM` |
-| `m` | memory operand; bind a scratch base reg if the source isn't `OPK_INDIRECT` |
-| `0` (etc.) | matching constraint: input must use the same physical reg as output 0 |
-
-`"memory"` clobber forces CG to flush all live stack values to memory
-before the block and reload after, per `DESIGN.md §10`. Register-name
-clobbers add to the "clobbered by call" set so RA does not reuse them
-across the block. `"cc"` is accepted and ignored on aarch64 (NZCV is
-reserved by the inline-asm contract anyway — no instruction outside the
-block reads it across the block).
-
-Under `opt_cgtarget` the call is recorded as `IR_ASM_BLOCK` (already an
-opaque-to-passes record per `DESIGN.md §9.5`); at lowering the wrapped
-target sees the same call with materialized operands.
-
-### 4.4 Directives — minimum viable set
-
-```
-.section NAME [, "FLAGS", @TYPE]
-.text .data .rodata .bss
-.globl SYM .local SYM .weak SYM .hidden SYM
-.type SYM, @function | @object
-.size SYM, EXPR
-.byte .hword .word .quad EXPR [, EXPR ...]
-.ascii "..." .asciz "..." .string "..."
-.zero N .skip N .fill N, SIZE, VALUE
-.align N .balign N .p2align N
-.set NAME, EXPR
-.equ NAME, EXPR (= .set)
-.file "name" (debug line filename)
-.loc FILE LINE [COL] (debug line row)
-```
-
-CFI directives (`.cfi_startproc`, `.cfi_def_cfa`, …) are accepted and
-forwarded to the corresponding `MCEmitter.cfi_*` calls (already exist
-per `arch/arch.h`). Unknown directives are a recovery diagnostic, not a
-panic — skip to newline.
-
-`.macro` / `.if` / `.include` are deferred. Inline asm gets there first
-because `cg_inline_asm` is the immediate consumer.
-
-### 4.5 Disassembler — `arch_disasm_new` for aarch64
-
-```c
-typedef struct AA64Disasm {
- ArchDisasm base;
- Compiler* c;
- StrBuf mnemonic, operands, annotation; /* reused per decode */
-} AA64Disasm;
-
-u32 arch_disasm_decode(ArchDisasm* d_, const u8* b, size_t n,
- u64 vaddr, CfreeInsn* out) {
- if (n < 4) return 0;
- u32 w = read_u32_le(b);
- const AA64InsnDesc* ins = aa64_disasm_find(w);
- if (!ins) { write_unknown(d, b, out); return 4; }
- AA64Fields f = aa64_<ins->format>_unpack(w);
- strbuf_set(&d->mnemonic, ins->mnemonic);
- aa64_<ins->format>_print(&d->operands, ins, &f, vaddr);
- out->vaddr = vaddr;
- out->bytes = b; out->nbytes = 4;
- out->mnemonic = d->mnemonic.p;
- out->operands = d->operands.p;
- out->annotation = ""; /* sym/reloc overlay added by cfree_obj_disasm */
- return 4;
-}
-```
-
-Annotations (sym/reloc overlay) live one level up in `cfree_obj_disasm` /
-`cfree_disasm_iter_new(..., obj)`: the iterator walks `ObjBuilder` relocs
-keyed on the section + offset and writes the resolved `name+addend` into
-`annotation`. The arch-level decoder is reloc-unaware — it only reads
-bytes. This keeps `arch_disasm_decode` per-arch and the symbol/reloc
-overlay arch-agnostic.
-
-`cfree_arch_register_name` / `_index` live in `aa64_regs.{h,c}`
-alongside one canonical name list shared by the parser, the printer,
-and the public API. `src/api/arch_regs.c` is the stateless dispatcher
-(`switch (arch)` over per-arch tables); the iterator surface remains
-a NULL-returning stub pending an env/heap on its constructor (see the
-TODO at the top of `src/api/arch_regs.c`).
-
----
-
-## 5. Phasing
-
-Each phase ends mergeable. Phase 1 stands up the test harness so every
-later phase gates on real runs from its first commit. Phase 2 lands the
-encode/decode pairing as a mechanical refactor; phase 3 is the standalone
-assembler; phase 4 splits into 4a (disasm overlay) and 4b (inline asm);
-phase 5 is the seam-rev for x64/rv64.
-
-Phases 1, 2, 3, and 4a are DONE. Phase 4b (inline asm) and phase 5
-(multiarch) remain.
-
-### Phase 1 — test harness (DONE)
-
-Stand up the runner before any compiler-side work. No `src/` changes.
-
-- [x] New `test/asm/` peer of `test/parse/`. One `run.sh`; three
- sub-corpora (`encode/`, `decode/`, `listing/`). Skip-vs-fail follows
- the `CFREE_TEST_ALLOW_SKIP` convention used elsewhere — every case
- skips cleanly today because `parse_asm` and `cfree_disasm_iter_*`
- are stubs. `CFREE_TEST_ALLOW_SKIP` defaults to `1` in the asm
- harness for the duration of phase 1; flip to `0` once the assembler
- and disasm iterator are real.
-- [x] New `test-asm` target in `test/test.mk`; added to the default
- `test` list.
-- [x] Add `S` (asm-roundtrip) path letter to `test/cg/run.sh`. Skips
- today; turns green when phases 3+4 land. Recognized in the path
- matrix; `S` is opt-in (`run.sh '' S` or
- `CFREE_TEST_PATHS=DREJWS`) until phase 4 lands, so the default
- `DREJW` continues to gate CI cleanly. Becomes part of the default
- matrix in phase 4.
-- [x] Smoke goldens checked in for one case per sub-corpus. A
- `test/asm/regen.sh` documents how to refresh them from the host
- `clang --target=aarch64-linux-gnu` / `llvm-objdump`; it is committed
- as a maintainer aid and is not run by CI (same convention as
- `test/elf/normalize.py`).
-- [x] New runner C binary `asm-runner` under `test/asm/harness/` —
- peer of `parse-runner`. Five sub-commands: `--encode`, `--decode`,
- `--listing`, `--emit`, `--jit`. The first three dispatch to
- `cfree_compile_obj_emit(CFREE_LANG_ASM)` / `cfree_disasm_iter_*` /
- `cfree_obj_disasm`; `--emit` writes a `.o` to disk so the J and E
- exec paths can reuse the `test/link` harness binaries; `--jit`
- parses + JIT-links and calls `test_main`.
-- [x] Path matrix for `test/asm/run.sh`: `HTLDJE`. `H` hex encode,
- `T` text decode, `L` listing, `D` direct JIT, `J` jit-via-file,
- `E` ELF exec under qemu/podman. D/J/E only run on `encode/`
- cases with an `<name>.expected` exit-code sidecar.
-
-Exit criterion (met): `make test-asm` runs end-to-end; the three smoke
-cases report SKIP for every path they apply to and the harness
-wiring is exercised on every CI run. `bash test/cg/run.sh '' S` also
-reports SKIP cleanly. No green asm cases yet — that's phase 3.
-
-### Phase 2 — finish the ISA descriptor table (DONE)
-
-Pure refactor. No new behavior; existing codegen still calls inline
-encoders.
-
-- [x] Added `aa64_print_operands` dispatcher plus per-format
- `print_*` helpers in `aa64_isa.c`; renders into a new tiny `StrBuf`
- (`src/core/strbuf.{h,c}`). `aa64_parse_operands` is declared with
- the phase-3 signature and stubbed to return 0 — phase 3 fills the
- per-format grammar in once the asm token stream lands.
-- [x] Added an `AsmFlags` byte on `AA64InsnDesc`
- (`AA64_ASMFL_ALIAS / SF1 / NORN`). Aliases marked: `MOV` (ORR Rd,
- ZR, Rm), `MVN` (ORN), `NEG` / `NEGS` (SUB / SUBS Rd, ZR, Rm),
- `CMP` / `CMN` (SUBS / ADDS ZR, Rn, Rm), `MUL` / `MNEG` (MADD /
- MSUB with Ra=ZR), `RET`-no-operand (RET X30).
-- [x] Reordered `aa64_insn_table` so each alias precedes its
- canonical form. First-match-wins now picks the alias spelling.
-- [x] Backfilled formats codegen emits: `BR_IMM` (B / BL),
- `BR_COND` (B.cond), `CB` (CBZ / CBNZ), `EXCEPT` (BRK / SVC / HLT),
- `LDST_SIMM9` (LDUR / STUR, V=0 and V=1, every size),
- `LDSTP_SOFF` (STP / LDP signed-offset, X and D forms).
- `LDST_UIMM` and `LDSTP_PRE` rows expanded to cover every
- size × V combination codegen emits today. Each format lands as
- one struct + pack/unpack + print + table rows; phase-3
- parse-grammar bodies follow once the asm token stream exists.
-- [x] New `test/arch/aa64_isa_test.c` + `make test-isa` target.
- Exercises one representative word per format, asserts mnemonic
- and operand text, and pins the alias-precedence invariant
- (`ORR Rd, ZR, Rm` resolves to "mov", `ORR Rd, Rn, Rm` to "orr").
- Added to the default `test` list.
-
-Exit criterion (met): `aa64_disasm_find` returns a desc for the
-representative word of every format used in this phase, and the unit
-test pins that contract for future regressions. Full byte-by-byte
-coverage of every cg-emitted word becomes enforced when the `S`
-path on `test/cg/run.sh` turns green in phase 4 — the remaining
-codegen-only formats (bitfield, condsel, FP-DP1/2, FP↔int cvt,
-ldst-exclusive, dmb/clrex, mrs, dp1, SIMD basic) get table rows then.
-
-### Phase 3 — standalone `.s` assembler (DONE)
-
-- [x] New `src/parse/parse_asm.c`. Panic stub in `src/api/stubs.c`
- removed. Driver loop, directive parser, label management,
- expression evaluator (constants with `+ - * / % << >> & | ^ ~` and
- parens; `sym ± const` for symbolic terms), string-literal decoding
- with C-style escapes.
-- [x] New `src/parse/parse_asm_helpers.h`. Lightweight surface
- (`asm_driver_peek/next/eat_*/parse_const/parse_sym_expr/intern_sym/
- panic/...`) the per-arch parser consumes; the AsmDriver struct
- itself stays internal to `parse_asm.c`.
-- [x] New `src/arch/aa64_asm.{h,c}` with `aa64_asm_open` /
- `aa64_asm_insn`. Per-mnemonic dispatch over `aa64_insn_table`
- resolved through the inline encoders in `aa64_isa.h` (no second
- copy of the bit layout). Composite mnemonics (`b.eq`, `b.ne`, ...)
- are stitched in the driver before dispatch.
-- [x] Reloc-emitting operands: branches → `R_AARCH64_CALL26` /
- `JUMP26`; conditional branches and CBZ/CBNZ → `R_AARCH64_CONDBR19`;
- `adr`/`adrp` → `R_AARCH64_ADR_PREL_LO21` / `_PG_HI21`. Data
- directives (`.word`/`.quad`) with a symbolic operand emit
- `R_ABS32`/`R_ABS64` through `MCEmitter.emit_reloc_at`, no new
- mechanism needed (per §7).
-- [x] CFI directives accepted (parsed + skipped) — forward to
- `MCEmitter.cfi_*` once those hooks store records (today they are
- no-ops in `src/arch/mc.c`). `.loc` and `.file` likewise accepted-
- and-ignored; wiring them to `mc->set_loc` is a follow-up that
- drops in without touching the parser shape.
-- [x] `cfree as` driver subcommand (`driver/as.c`) — accepts
- `-target TRIPLE`, `-g`, `-o OUT.o INPUT.s`. Same composition point
- as `cfree -c <file.s>` modulo lang inference.
-- [x] Smoke-case skips dropped from `test/asm/encode/`,
- `test/asm/decode/`, `test/asm/listing/`. `test-asm` runs green on
- every path it can on the host (E skips on a non-aarch64 host when
- no exec runner is configured).
-
-Exit criterion (met for the smoke corpus): the phase-1 encode case
-runs through H (hex roundtrip), D (direct JIT execute), J (JIT via
-file). Coverage of every row in `aa64_insn_table` becomes enforced
-when the `S` path on `test/cg/run.sh` turns on by default (see
-§6.2); the remaining codegen-only formats (bitfield, condsel,
-FP-DP1/2, FP↔int cvt, ldst-exclusive, dmb/clrex, mrs, dp1, SIMD
-basic) gain table rows + parser coverage in lockstep with `S`.
-
-### Phase 4a — disasm overlay (DONE)
-
-- [x] `src/arch/aa64_disasm.{h,c}`: aarch64 `ArchDisasm` impl wraps
- `aa64_disasm_find` + `aa64_print_operands`. Owns the per-iterator
- StrBuf storage for mnemonic / operands / annotation. Mnemonic
- rewrite for `b.<cond>` happens here (the printer keeps the BR_COND
- format opcode-agnostic).
-- [x] `src/arch/disasm.c`: dispatcher peer of `src/arch/cgtarget.c`,
- switches `arch_disasm_new` on `c->target.arch`. aarch64 only;
- x86_64 / rv64 panic with a clean diagnostic.
-- [x] `src/api/disasm.c`: `cfree_disasm_iter_new/next/free` and
- `cfree_obj_disasm` over `arch_disasm_*`, plus the reloc/symbol
- annotation overlay (rendered per-decoded-word into the iterator
- buffer; the arch decoder stays reloc-unaware).
-- [x] `src/arch/aa64_regs.{h,c}` + `src/api/arch_regs.c`: stateless
- `cfree_arch_register_name` / `_index` against one canonical reg
- table — same source the parser and printer share.
-
-Exit criterion (met): every `test/asm/decode/` and
-`test/asm/listing/` case is green; `cfree objdump -d` over the
-output of `cfree as` round-trips the smoke corpus.
-
-### Phase 4b — inline asm
-
-See `doc/INLINEASM.md` for the detailed plan (scope, files, constraint
-binder, template walker, parallelization across three tracks, testing).
-The summary: stand up `parse_asm_stmt` in `parse.c`, implement
-`cg_inline_asm` (constraint binder + `"memory"` clobber spill), implement
-`aa_asm_block` + `aa64_asm_run_template` in `aa64_asm.c`, wire
-`w_asm_block` recorder + `IR_ASM_BLOCK` replay in `opt.c`. Three tracks
-(frontend / cg+opt / aa64) merge in any order behind the existing panic
-stubs.
-
-Exit criterion: the inline-asm cases under `test/cg/` (svc-style
-write-then-exit) build, run under qemu/podman, and report green on
-`DREJWS`. The `S` path turns green for the full cg corpus, proving
-encode/decode pairing across every `.text` byte cfree currently
-emits.
-
-### Phase 5 — multiarch seam
-
-Land before x64/rv64 codegen needs it.
-
-1. `arch/disasm.c::arch_disasm_new` switches on `c->target.arch`
- (currently aarch64-only).
-2. `parse_asm` driver dispatches per-arch instruction parser by
- `c->target.arch`. `aa64_asm_open` becomes one of N constructors.
-3. Reg-name table dispatched the same way (`cfree_arch_register_name`).
-4. `x64_isa.{h,c}` and `rv64_isa.{h,c}` skeletons (formats + tables,
- not populated). x64 brings AT&T, rv64 brings GNU. Each pulls in its
- own `<arch>_asm.{h,c}` and `<arch>_disasm.{h,c}`. Per `DESIGN.md
- §10` the asm flavour is decided per-arch, single supported flavour.
-
-Exit criterion: builds for `CFREE_ARCH_X86_64` reach the x64 asm/disasm
-stubs and panic with a clean diagnostic; aarch64 path unchanged.
+Three seams: (a) `parse_asm` ↔ per-arch instruction parser via
+`parse_asm_helpers.h`, (b) `MCEmitter` as the byte sink for both asm and
+codegen, (c) `arch_disasm_new` ↔ per-arch decoder. `aa64_isa.h` is the
+shared truth crossing all three.
---
-## 6. Testing
+## 4. Inline asm — constraint binder + template walker
-The pairing buys a strong test shape: most tests run the round trip
-rather than spelling expected bytes by hand. Three buckets, all wired
-in phase 1:
+**Constraints (v1)**: `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0` (matching by
+index). `AsmConstraint` carries `{str, name, type, dir}` — `name` is the
+optional `[name]` Sym, `type` is the bound expression's C type (drives
+`RegClass` + width). Hand-built test constraints with `NULL` type fall back
+to 64-bit int.
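As a rough illustration of how the v1 constraint strings split into direction, early-clobber, and operand kind, here is a minimal sketch. `SkConstraint`, `SkDir`, and `sk_parse_constraint` are hypothetical names invented for this example; cfree's real `AsmConstraint` carries `{str, name, type, dir}` and its binder may classify strings differently.

```c
/* Hypothetical classification of a v1 constraint string. */
typedef enum { SK_IN, SK_OUT, SK_INOUT } SkDir;

typedef struct {
    SkDir dir;          /* "r" -> in, "=r" -> out, "+r" -> in/out */
    int early_clobber;  /* set by '&' (only after '=' in this sketch) */
    char kind;          /* 'r', 'i', 'm', or '#' for a matching constraint */
    int match_index;    /* output index for "0".."9", else -1 */
} SkConstraint;

/* Returns 0 on success, -1 if the string is outside the v1 set. */
static int sk_parse_constraint(const char *s, SkConstraint *out) {
    out->dir = SK_IN;
    out->early_clobber = 0;
    out->kind = 0;
    out->match_index = -1;

    if (*s == '=') { out->dir = SK_OUT; s++; }
    else if (*s == '+') { out->dir = SK_INOUT; s++; }

    if (*s == '&') {
        if (out->dir != SK_OUT) return -1; /* v1 lists only "=&r" */
        out->early_clobber = 1;
        s++;
    }

    if (*s == 'r' || *s == 'i' || *s == 'm') {
        out->kind = *s++;
    } else if (*s >= '0' && *s <= '9') {
        if (out->dir != SK_IN) return -1;  /* matching applies to inputs */
        out->kind = '#';
        out->match_index = *s++ - '0';
    } else {
        return -1;
    }
    return *s == '\0' ? 0 : -1;            /* reject trailing characters */
}
```

For example, `"=&r"` yields an early-clobber register output and `"0"` yields an input tied to output 0. The sketch deliberately rejects multi-alternative strings, which the v1 set does not include.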
-### 6.1 `test/asm/` — file-driven goldens (new)
+**Clobbers**:
+- `"memory"` — spill all live RES_REG SValues via `target->spill_reg`;
+ subsequent reads reload through `target->reload_reg`. Same machinery cg
+ uses across function calls.
+- Register names (`"x0"`, …) — passed through to `target->asm_block`.
+- `"cc"` — silently ignored on aarch64 (NZCV reserved across the block).
-Peer of `test/parse/`. One `run.sh`, one `asm-runner` C binary, three
-sub-corpora keyed off filename suffix:
+**Placeholders**: `%N`, `%wN` (force W form), `%xN` (force X form),
+`%[name]` (resolved against `AsmConstraint.name`), `%aN` (memory addressing
+form), `%%`. The walker pre-substitutes them into asm source text and
+re-lexes through the standalone per-mnemonic parsers — no second operand
+grammar.
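The pre-substitution step can be sketched as plain text rewriting. The helper below (`sketch_subst`) is hypothetical and is not cfree's `aa64_asm_run_template`. It handles only `%N`, `%wN`, `%xN`, and `%%`, assumes every operand is already bound to an integer register number, and assumes bare `%N` prints the X form. `%[name]` and `%aN` are left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: expand placeholders in tmpl into out (capacity cap).
 * regs[i] is the register number bound to operand i.
 * Returns 0 on success, -1 on a bad placeholder or if out is too small. */
static int sketch_subst(const char *tmpl, const unsigned *regs, unsigned nregs,
                        char *out, size_t cap) {
    size_t o = 0;
    for (const char *p = tmpl; *p; ) {
        if (*p != '%') {                  /* ordinary character: copy through */
            if (o + 1 >= cap) return -1;
            out[o++] = *p++;
            continue;
        }
        p++;                              /* skip '%' */
        if (*p == '%') {                  /* "%%" is a literal percent sign */
            if (o + 1 >= cap) return -1;
            out[o++] = '%';
            p++;
            continue;
        }
        char form = 'x';                  /* assumed default: X form */
        if (*p == 'w' || *p == 'x') form = *p++;
        if (!isdigit((unsigned char)*p)) return -1;
        unsigned idx = 0;
        while (isdigit((unsigned char)*p))
            idx = idx * 10 + (unsigned)(*p++ - '0');
        if (idx >= nregs) return -1;      /* placeholder with no bound operand */
        int n = snprintf(out + o, cap - o, "%c%u", form, regs[idx]);
        if (n < 0 || (size_t)n >= cap - o) return -1;
        o += (size_t)n;
    }
    if (o >= cap) return -1;
    out[o] = '\0';
    return 0;
}
```

With operands bound to registers 9 and 10, the template `add %w0, %w0, %w1` expands to `add w9, w9, w10`, which can then be fed to the same per-mnemonic parser a standalone `.s` line would go through. That is the point of substituting at the text level: the inline path needs no operand grammar of its own.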
-| dir | input | expected | drives |
-|-------------------------|--------------------|--------------------|------------------------------|
-| `test/asm/encode/` | `<name>.s` | `<name>.expected.hex` | `cfree as` over the `.s`, hex-compare against expected |
-| `test/asm/decode/` | `<name>.hex` | `<name>.expected.txt` | `cfree_disasm_iter_*` over the bytes, text-compare |
-| `test/asm/listing/` | `<name>.in.bin` (ELF) | `<name>.expected.lst` | `cfree_obj_disasm` against the ELF, listing-compare |
+**IR**: `IR_ASM_BLOCK` with `IRAsmAux { tmpl, outs, ins, in_ops, out_ops,
+clobbers, nout, nin, nclob }`. The opt recorder arena-copies the payload;
+replay runs `xlat_op` over each Operand and forwards to the wrapped target.
-Goldens are checked in. A `test/asm/regen.sh` regenerates them from
-the host `as` / `objdump` (committed only as a maintainer aid; not
-run by CI). One smoke case per sub-corpus is enough for phase 1; the
-table fills up alongside phases 3 and 4.
-
-### 6.2 `test/cg/` `S` path — asm roundtrip (new path letter)
-
-Path letter added to `test/cg/run.sh`. For every cg-emitted aarch64
-binary already in the corpus: walk `.text`, decode each instruction,
-re-assemble the resulting text, byte-compare. No new corpus —
-piggybacks on every existing cg case for free coverage. Catches
-encode/decode drift the moment a format gains a member.
-
-Reports SKIP today, green after phase 4. Path matrix becomes
-`DREJWS`. Skip-vs-fail and filtering match the rest of the cg paths
-verbatim.
-
-### 6.3 `test/cg/` inline-asm cases — under existing harness
-
-Inline asm is behavioral C with exit-code assertions, which is exactly
-what `test/cg/cases_*.c` already does. Add a new `cases_asm.c` (or
-fold cases into the existing buckets) registered through `cg-runner`
-the same way every other case is. The path matrix (`DREJW`) and the
-qemu/podman runner from `test/lib/exec_aarch64.sh` cover execution
-unchanged.
-
-### Driver wiring
-
-A standalone `cfree as` subcommand is exposed by the multi-call driver
-in phase 3 (same dispatch as `cfree -c <file.s>` modulo lang
-inference). `test/asm/encode/` drives `cfree as` directly so the
-multi-call dispatch is exercised end-to-end.
+**`asm volatile`**: accepted but informational — `IR_ASM_BLOCK` is already
+opaque-to-passes, so volatile changes nothing at the IR level.
---
-## 7. Decisions
-
-- **Disasm immediate format: context-sensitive.** Signed decimal for
- fields the ISA defines as signed (branch displacements, signed-imm12
- add/sub, load/store offsets). `0x`-prefixed hex everywhere else
- (logical bitmask immediates, MOVZ/MOVK halfword, addresses).
- `aa64_<fmt>_print` carries a per-field signedness bit; the print
- helpers branch on it. Goldens lock the chosen form per format.
-- **`.s` constant expressions: arithmetic with parens.** Operators
- `+ - * / % << >> & | ^ ~`, parenthesized, over signed integer
- constants and `sym + const` terms. Symbol-involving expressions are
- restricted to `(sym ± const)`; any product, quotient, shift, or
- bitwise op that has a symbol operand is a diagnostic. Reloc-modifier
- syntax (`:lo12:sym`, `:got:sym`) and macro counters (`\@`) are
- deferred — they belong with the macro/full-PIC follow-up.
-- **`__cfree_setjmp.s` is decoupled.** Phase 2 lands against the
- synthetic suites in §6, not against the runtime. `rt/` is currently
- built with clang (`rt/Makefile`) and continues to be through phase 3;
- `__cfree_setjmp.s` migrates to `parse_asm` as a follow-up after the
- assembler is proven on the test corpus. The same applies to any
- other `.s` files `rt/` adds before then.
-- **Absolute relocs in `.s`** (e.g. `.quad some_sym + 8` in `.data`)
- go through `MCEmitter.emit_reloc_at` against the existing
- `RelocKind` set — no new mechanism needed.
-- **Self-hosting.** Per `DESIGN.md §12`, anything in `src/` must be
- C11-freestanding-writable. `parse_asm.c` and `aa64_asm.c` follow the
- same rule. No reliance on a host assembler at build time *for the
- compiler*; `rt/` still uses clang and is on its own bootstrap track.
-
----
-
-## 8. Running the tests
+## 5. Testing
```
-make test-asm # full asm harness: all paths, all sub-corpora
-make test # includes test-asm in the default suite
+make test-asm # standalone .s harness
+make test-isa # aa64 ISA descriptor table
+make test-aa64-inline # aa64 inline-asm walker (hand-built Operands)
+make test-cg-binder # cg_inline_asm constraint binder (mock CGTarget)
+make test # includes all of the above
```
-The harness lives in `test/asm/`. See `test/asm/CORPUS.md` for the
-sub-corpus layout and `test/asm/regen.sh` for golden refresh.
-
-### Filtering and path selection
+### `test/asm/` — standalone
-```
-bash test/asm/run.sh # default: every case, HTLDJE
-bash test/asm/run.sh nop # name substring filter
-bash test/asm/run.sh '' HT # only H (hex encode) + T (decode)
-bash test/asm/run.sh exit_zero DJE # exec paths for one case
-CFREE_TEST_FILTER=nop CFREE_TEST_PATHS=L bash test/asm/run.sh
-```
+Three sub-corpora keyed off filename suffix:
-Path letters:
+| dir | input | expected | drives |
+|---|---|---|---|
+| `test/asm/encode/` | `<name>.s` | `<name>.expected.hex` | `cfree as`, hex-compare |
+| `test/asm/decode/` | `<name>.hex` | `<name>.expected.txt` | `cfree_disasm_iter_*`, text-compare |
+| `test/asm/listing/` | `<name>.in.bin` (ELF) | `<name>.expected.lst` | `cfree_obj_disasm`, listing-compare |
-| letter | path | input | check |
-|--------|------------------|--------------|--------------------------------|
-| `H` | Hex encode | `encode/*.s` | `--encode` → diff `.expected.hex` |
-| `T` | Text decode | `decode/*.hex` | `--decode` → diff `.expected.txt` |
-| `L` | Listing | `listing/*.in.bin` | `--listing` → diff `.expected.lst` |
-| `D` | Direct JIT | `encode/*.s` (with `.expected` exit) | `--jit` → exit code |
-| `J` | JIT via file | `encode/*.s` (with `.expected` exit) | `--emit` + `jit-runner` |
-| `E` | ELF exec | `encode/*.s` (with `.expected` exit) | `--emit` + `link-exe-runner` + qemu/podman |
+`test/asm/regen.sh` regenerates the goldens from host `as` / `objdump`
+(maintainer aid; not run by CI).
-D and J need the host arch to match `CFREE_TEST_ARCH` (no cross-JIT);
-E uses qemu/podman per `test/lib/exec_target.sh` and is cross-host
-friendly.
+### Path letters — `test/asm/run.sh` and `test/cg/run.sh`
-### Skip sidecars
+| letter | path | check |
+|---|---|---|
+| `H` | hex encode | `--encode` → diff `.expected.hex` |
+| `T` | text decode | `--decode` → diff `.expected.txt` |
+| `L` | listing | `--listing` → diff `.expected.lst` |
+| `D` | direct JIT | `--jit` → exit code |
+| `J` | JIT via file | `--emit` + `jit-runner` |
+| `E` | ELF exec | `--emit` + `link-exe-runner` + qemu/podman |
+| `S` | asm round-trip (cg corpus) | decode every cg-emitted insn, re-assemble, byte-compare |
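
The `S` check can be pictured with a toy model: decode each 32-bit word to text, re-assemble that text, and count the words that do not come back byte-identical. Everything named `demo_*` below is hypothetical, and the two-row table only stands in for `aa64_insn_table`, using a `(word & mask) == match` lookup. The `nop` and `ret` encodings are the architectural ones.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor row: mnemonic plus match/mask pair. */
typedef struct { const char *mnemonic; uint32_t match, mask; } DemoRow;

static const DemoRow demo_rows[] = {
    { "nop", 0xd503201fu, 0xffffffffu },
    { "ret", 0xd65f0000u, 0xfffffc1fu },   /* Rn lives in bits [9:5] */
};

/* Decode one word to text; returns -1 if no row matches. */
static int demo_decode(uint32_t word, char *out, size_t cap)
{
    for (size_t i = 0; i < sizeof demo_rows / sizeof demo_rows[0]; i++) {
        if ((word & demo_rows[i].mask) != demo_rows[i].match) continue;
        if (demo_rows[i].mask == 0xffffffffu)
            snprintf(out, cap, "%s", demo_rows[i].mnemonic);
        else
            snprintf(out, cap, "%s x%u", demo_rows[i].mnemonic,
                     (unsigned)((word >> 5) & 31u));
        return 0;
    }
    return -1;
}

/* Re-assemble the text produced by demo_decode. */
static int demo_encode(const char *text, uint32_t *word)
{
    unsigned rn;
    if (strcmp(text, "nop") == 0) { *word = 0xd503201fu; return 0; }
    if (sscanf(text, "ret x%u", &rn) == 1 && rn <= 30) {
        *word = 0xd65f0000u | (uint32_t)(rn << 5);
        return 0;
    }
    return -1;
}

/* Returns how many words failed to survive decode -> encode. */
static size_t demo_roundtrip(const uint32_t *text, size_t n)
{
    size_t bad = 0;
    char buf[32];
    for (size_t i = 0; i < n; i++) {
        uint32_t again;
        if (demo_decode(text[i], buf, sizeof buf) != 0 ||
            demo_encode(buf, &again) != 0 || again != text[i])
            bad++;
    }
    return bad;
}
```

A word with no table row counts as a failure here, which mirrors why `S` stays opt-in while formats are still missing from the table.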
-The phase-1 smoke `.skip` sidecars are gone; the corresponding
-subsystems are real. New cases that hit an unimplemented mnemonic or
-directive can still drop a `<name>.skip` sidecar — single-line reason
-— and the harness will report SKIP. Run with
-`CFREE_TEST_ALLOW_SKIP=0` to surface skips as failures (the default
-in CI from phase 3 onward).
-
-### Cross-target
+`S` is opt-in on `test/cg/run.sh` (default matrix stays `DREJW`) until the
+remaining cg-emitted formats land in `aa64_insn_table`. Run explicitly:
```
-CFREE_TEST_ARCH=aa64 bash test/asm/run.sh # default
-CFREE_TEST_ARCH=x64 bash test/asm/run.sh # x64 lane (no green cases yet)
-CFREE_TEST_ARCH=rv64 bash test/asm/run.sh # rv64 lane
+bash test/cg/run.sh '' DREJWS # full matrix incl. S
+bash test/cg/run.sh '' S # just S
```
-### The `S` path on `test/cg/run.sh`
+---
-`S` (asm roundtrip across every cg-emitted aarch64 binary) is
-recognized but opt-in until the assembler covers every cg-emitted
-format. The default cg matrix stays `DREJW`. Run it explicitly:
+## 6. Remaining TODOs
+
+End-to-end inline asm:
+
+- [ ] `test/cg/cases_asm.c` smoke case (smallest: `__asm__ volatile("mov
+ w0,%w0; svc #0" :: "r"(rc) : "x0")`) wired into the cg corpus on the
+ `DREJWS` path matrix.
+- [ ] Lift `test/parse/cases/asm_01_grammar.skip` once `dmb sy` and
+ `asm goto` reach `aa64_insn_table` + parser.
+- [ ] `aa64_asm_run_template`: support multi-digit operand index (`%10+`);
+ respect `;` inside `[...]` / quoted strings rather than splitting on
+ every `;`.
+- [ ] `aa_asm_block`: consume `clobbers[]` to inform the aarch64 RA of
+ register-name clobbers (today v1 relies on Track B's `"memory"`
+ spill + parser pass-through; register-name clobbers don't yet evict
+ from the allocator).
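
For the statement-splitting item above, one possible shape is a single scan that tracks bracket depth and string state, and treats `;` or a newline as a separator only at top level. This is a sketch under assumptions, not the planned `aa64_asm_run_template` change: `DemoStmt` and `demo_split` are hypothetical names, and statements are returned as slices into the template without trimming whitespace.

```c
#include <stddef.h>

/* Hypothetical slice type: one statement inside the template string. */
typedef struct { const char *s; size_t len; } DemoStmt;

/* Split tmpl on ';' and '\n', ignoring separators inside [...] or "...".
 * Stores up to cap slices in out and returns the total number found.
 * Empty statements are skipped. */
static size_t demo_split(const char *tmpl, DemoStmt *out, size_t cap)
{
    size_t count = 0;
    int depth = 0;                 /* nesting level inside [...] */
    int in_str = 0;                /* nonzero while inside "..." */
    const char *start = tmpl;

    for (const char *p = tmpl; ; p++) {
        char c = *p;
        int at_sep = !in_str && depth == 0 && (c == ';' || c == '\n');
        if (c == '\0' || at_sep) {
            if (p > start) {
                if (count < cap) {
                    out[count].s = start;
                    out[count].len = (size_t)(p - start);
                }
                count++;
            }
            if (c == '\0') break;
            start = p + 1;
            continue;
        }
        if (in_str) {
            if (c == '\\' && p[1] != '\0') p++;   /* skip escaped char */
            else if (c == '"') in_str = 0;
        } else if (c == '"') {
            in_str = 1;
        } else if (c == '[') {
            depth++;
        } else if (c == ']' && depth > 0) {
            depth--;
        }
    }
    return count;
}
```

A template such as `.ascii "a;b"; nop` would yield two statements rather than three, since the `;` inside the quotes is not treated as a separator.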
+
+`aa64_insn_table` coverage (gates `S` on the full cg corpus):
+
+- [ ] bitfield (`ubfm`/`sbfm` families)
+- [ ] condsel (`csel`/`csinc`/`csinv`/`csneg`)
+- [ ] FP-DP1 / FP-DP2 (`fadd`/`fsub`/`fmul`/`fdiv`/`fneg`/`fabs`/`fsqrt`)
+- [ ] FP↔int cvt (`fcvtzs`/`scvtf` families)
+- [ ] ldst-exclusive (`ldxr`/`stxr`/`ldaxr`/`stlxr`)
+- [ ] memory barriers (`dmb`/`dsb`/`isb`/`clrex`)
+- [ ] system reg access (`mrs`/`msr`)
+- [ ] data-processing 1-source (`rbit`/`rev`/`clz`/`cls`)
+- [ ] SIMD basic (the cg-emitted subset)
+
+Once these land, flip `S` into the default cg matrix and drop
+`asm_01_grammar.skip`.
+
+Multiarch seam (Phase 5):
+
+- [ ] `arch/disasm.c::arch_disasm_new` switches by `c->target.arch`
+ (currently aarch64-only; x64 / rv64 panic).
+- [ ] `parse_asm` driver dispatches per-arch instruction parser by
+ `c->target.arch`. `aa64_asm_open` becomes one of N constructors.
+- [ ] `cfree_arch_register_name` dispatched the same way.
+- [ ] `x64_isa.{h,c}` + `rv64_isa.{h,c}` skeletons (formats + tables,
+ not populated). x64 brings AT&T syntax, rv64 brings GNU syntax.
+
+Driver / runtime:
+
+- [ ] CFI directives (`.cfi_*`) are parsed and accepted-but-ignored;
+ forward to `MCEmitter.cfi_*` once those hooks store records.
+- [ ] `.loc` / `.file` likewise accepted-and-ignored; wire to
+ `mc->set_loc` for inline DWARF line info.
+- [ ] `__cfree_setjmp.s` (and any other `.s` in `rt/`) migrate from clang
+ to `parse_asm` once the assembler proves on the cg corpus.
+
+Deferred / explicitly out of scope:
+
+- Macros, `.if`, `.macro`, `.altmacro` inside templates — same deferral
+ as standalone.
+- Multi-alternative constraints; most GCC letter constraints — tracked
+ under `DESIGN.md §10`.
+- WAT for WASM (separate document).
-```
-bash test/cg/run.sh '' DREJWS # full matrix incl. S
-bash test/cg/run.sh '' S # just S
-```
+---
+
+## 7. Decisions
-`S` becomes part of the default cg matrix once the remaining
-codegen-only formats (bitfield, condsel, FP-DP1/2, FP↔int cvt,
-ldst-exclusive, dmb/clrex, mrs, dp1, SIMD basic) gain
-`aa64_insn_table` rows and matching `aa64_asm` parsers.
+- **Disasm immediate format**: context-sensitive. Signed decimal for fields
+ the ISA defines as signed (branch displacements, signed-imm12 add/sub,
+ load/store offsets). `0x`-prefixed hex everywhere else. `aa64_<fmt>_print`
+ carries a per-field signedness bit; goldens lock the chosen form.
+- **`.s` constant expressions**: arithmetic with parens — `+ - * / % << >>
+ & | ^ ~` over signed integer constants and `sym + const` terms.
+ Symbol-involving expressions restricted to `(sym ± const)`. Reloc-modifier
+ syntax (`:lo12:sym`, `:got:sym`) and macro counters (`\@`) are deferred.
+- **Absolute relocs in `.s`** (e.g. `.quad some_sym + 8` in `.data`) go
+ through `MCEmitter.emit_reloc_at` against the existing `RelocKind` set —
+ no new mechanism.
+- **Operand transport for inline asm**: parser pushes only inputs onto the
+ CG stack; outputs come back as fresh SValues that the parser assigns to
+ the declared lvalues. Matches the `cg_inline_asm` docstring at
+ `src/cg/cg.h:178-181`.
+- **Template lexing**: pre-substitute placeholders to physical asm text and
+ re-lex via the standalone parser, instead of carrying `Operand`s in `Tok`
+ variants. One operand grammar, one lexer; cost is one extra StrBuf pass
+ per inline block.
+- **Memory clobber**: route through `target->spill_reg` /
+ `target->reload_reg` — same machinery cg already uses across function
+ calls. No new flush mechanism.
+- **`asm goto`**: parsed and rejected in `cg_inline_asm`. Keyword grammar
+ ships; label-ref machinery does not.
+- **Self-hosting**: per `DESIGN.md §12`, anything in `src/` must be
+ C11-freestanding-writable. `parse_asm.c`, `aa64_asm.c`, and `parse_asm_stmt`
+ in `parse.c` all follow the rule. `rt/` is on its own bootstrap track.
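
The disasm immediate decision at the top of this list could be sketched as a helper that branches on a per-field signedness bit. `demo_print_imm` is a hypothetical name, and the `#` prefix and lowercase hex digits are assumptions; the goldens define the real output form.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Render a `bits`-wide immediate field. Signed fields are sign-extended
 * and printed in decimal; everything else is printed as 0x-prefixed hex.
 * Returns the snprintf result, or -1 for an invalid field width. */
static int demo_print_imm(char *out, size_t cap, uint64_t raw,
                          unsigned bits, int is_signed)
{
    if (bits == 0 || bits > 64) return -1;
    if (bits < 64) raw &= (UINT64_C(1) << bits) - 1;   /* keep field bits */
    if (!is_signed)
        return snprintf(out, cap, "#0x%" PRIx64, raw);

    /* Sign-extend from `bits` to 64 using unsigned arithmetic. */
    uint64_t sign = UINT64_C(1) << (bits - 1);
    int64_t v = (int64_t)((raw ^ sign) - sign);
    return snprintf(out, cap, "#%" PRId64, v);
}
```

For a 9-bit signed offset field holding `0x1ff` this renders `#-1`, while a 16-bit unsigned halfword holding `0xffff` renders `#0xffff`.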
diff --git a/doc/INLINEASM.md b/doc/INLINEASM.md
@@ -1,325 +0,0 @@
-# INLINEASM — Phase 4b plan
-
-Scope: bring up GCC-style `asm("...")` inline assembly for aarch64. Companion
-to `doc/ASM.md` (which covers the standalone `.s` pipeline already shipped in
-phases 1–4a). This document is the per-phase detail; `doc/ASM.md §5 Phase 4b`
-is a one-paragraph pointer here.
-
-The discipline from `doc/ASM.md §3` carries over: one description of each
-instruction lives in `aa64_isa.h`, and inline asm reuses the standalone
-per-mnemonic parsers verbatim by pre-substituting placeholders into asm source
-text and re-lexing. No second copy of the operand grammar.
-
----
-
-## 1. Scope
-
-In:
-
-- `CFREE_ARCH_ARM_64` only.
-- GNU asm syntax (per `DESIGN.md §10`); `asm`, `__asm__`, `asm volatile`,
- `asm volatile goto` accepted at the keyword level.
-- Constraints: `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0` (matching by index).
-- Clobbers: register names route through call-clobber tracking; `"memory"`
- spills/reloads all live SValues; `"cc"` accepted-and-ignored on aarch64
- (NZCV reserved across inline-asm blocks per `doc/ASM.md §4.3`).
-- Placeholders: `%N`, `%wN`, `%xN`, `%[name]`, `%aN`, `%%`.
-
-Deferred:
-
-- Multi-alternative constraints, most letter constraints (per `doc/ASM.md §2`).
-- `asm goto` labels — accept syntactically, error in `cg_inline_asm`.
-- Macros, `.if`, `.macro` inside templates — same deferral as standalone.
-- x64 / rv64 backends — Phase 5 multiarch seam revs the dispatch.
-
----
-
-## 2. Current state
-
-Where the seams are today:
-
-- `src/parse/parse.c` keyword table (~line 115): no `KW_ASM` /
- `KW_BUILTIN_ASM`. `parse_stmt` (~line 5628) has no asm dispatch arm.
-- `src/cg/cg.h:182` declares `cg_inline_asm(...)`. Body at `src/cg/cg.c:1522`
- is a panic stub. The contract docstring at lines 178-181 already says
- inputs ride the CG stack and outputs are pushed back as fresh SValues.
-- `src/arch/aarch64.c:3099` (`aa_asm_block`) — panic stub. Same shape on
- `src/arch/x64.c:2783` (`x_asm_block`) and `src/arch/rv64.c:2529`
- (`rv_asm_block`).
-- `src/opt/opt.c:693` (`w_asm_block`) — panic stub. Replay at
- `src/opt/opt.c:815` is a `break;`.
-- `src/opt/ir.h:88` defines `IR_ASM_BLOCK`; `src/opt/ir.h:170-177` defines
- `IRAsmAux { tmpl, outs, ins, clobbers, out_ops, nout, nin, nclob }`.
-- `src/arch/arch.h:251-269` defines `Operand` (kind: `OPK_IMM`, `OPK_REG`,
- `OPK_LOCAL`, `OPK_GLOBAL`, `OPK_INDIRECT`). `src/arch/arch.h:370-374`
- defines `AsmConstraint { str, dir }`.
-- `src/parse/parse_asm_helpers.h` exposes `asm_driver_*` token plumbing,
- `asm_driver_parse_const`, `asm_driver_parse_sym_expr`, `asm_driver_intern_sym`,
- `asm_driver_panic`, etc. The `AsmDriver` struct stays internal to
- `src/parse/parse_asm.c:46-68`.
-- `src/arch/aa64_asm.h` exposes `aa64_asm_open(c)`, `aa64_asm_close(a)`,
- `aa64_asm_insn(a, d, mnemonic)`. Per-mnemonic table at
- `src/arch/aa64_asm.c:787` (`kTable[]`); `aa64_asm_insn` linear-scans by
- case-insensitive name.
-
-So: every panic site is in place, the IR carrier exists, and the standalone
-asm parser is reusable. Phase 4b wires them together.
-
----
-
-## 3. End-to-end flow
-
-```
-asm("...") in C source
- │
- ▼ parse.c::parse_asm_stmt
-push input exprs onto CG stack, build AsmConstraint[] + clobber Sym[]
- │
- ▼ cg.c::cg_inline_asm
-pop inputs → materialize Operand per constraint
-allocate regs for outputs (r/=r/+r/=&r honoring matching `0`)
-if "memory" clobber: spill all live SValues
-call g->target->asm_block(..., out_ops, in_ops, clobbers)
-push out SValues for parser to assign to lvalues
- │
- ├─► arch/aarch64.c::aa_asm_block
- │ aa64_asm_open → aa64_inline_bind(out_ops, in_ops, clobbers)
- │ aa64_asm_run_template(mc, tmpl)
- │ substitute %0/%w0/%x0/%[name]/%a0 → physical text
- │ for each line: lex via memory-backed Lexer, dispatch
- │ through aa64_asm_insn (existing per-mnemonic table)
- │ aa64_asm_close
- │
- └─► opt/opt.c::w_asm_block (recorder)
- arena-copy template/outs/ins/clobbers/in_ops into IRAsmAux
- rec(o, IR_ASM_BLOCK); replay xlat_op's operands and forwards
- to the wrapped target.
-```
-
----
-
-## 4. Files
-
-| File | Change |
-|------|--------|
-| `src/parse/parse.c` (~5705) | Add `KW_ASM` + `KW_BUILTIN_ASM` to `kw_names[]`; add `parse_asm_stmt` and dispatch from `parse_stmt`. Reuse `pool_intern_cstr(p->c->global, ...)` for constraint/clobber strings. |
-| `src/cg/cg.c:1522` (`cg_inline_asm`) | Replace panic body. Pop `nin` SValues via `pop(g)`; materialize per-constraint into `Operand`s; allocate output regs via `g->target->alloc_reg`; honor matching `0` constraints; on `"memory"` clobber spill live regs through `g->target->spill_reg`; call `g->target->asm_block`; push out SValues. |
-| `src/cg/cg.h:182` | Signature stays — outputs and inputs ride the CG stack, contract already documented at lines 178-181. |
-| `src/arch/aa64_asm.h` | Add `aa64_inline_bind(AA64Asm*, ...)`, `aa64_asm_run_template(AA64Asm*, MCEmitter*, const char* tmpl)` declarations. |
-| `src/arch/aa64_asm.c` | Implement template walker: `%N` / `%wN` / `%xN` / `%[name]` / `%aN` render, per-line memory-Lexer → minimal inline `AsmDriver` → existing `aa64_asm_insn`. Width-check `%wN` vs format's `sf` expectation. |
-| `src/arch/aarch64.c:3099` (`aa_asm_block`) | Replace panic body with `aa64_asm_open → aa64_inline_bind → aa64_asm_run_template → aa64_asm_close`. |
-| `src/parse/parse_asm.c` + `src/parse/parse_asm_helpers.h` | Expose `asm_driver_open_inline(c, mc, lexer)` constructor for an inline-mode `AsmDriver` that reads from a memory buffer + caller-supplied MCEmitter. `AsmDriver` stays opaque. |
-| `src/opt/opt.c:693` (`w_asm_block`) | Mirror `w_call` (lines 495-513): `arena_znew(IRAsmAux)`, copy `tmpl` + `outs` / `ins` / `clobbers` via `arena_array`, copy `in_ops`, allocate `out_ops` slots; `rec(o, IR_ASM_BLOCK)`. |
-| `src/opt/opt.c:815` (`IR_ASM_BLOCK` replay) | Fetch aux, `xlat_op` each `in_ops[i]` and `out_ops[i]`, call `w->asm_block(...)` with materialized arrays. |
-| `src/arch/x64.c:2783`, `src/arch/rv64.c:2529` | Leave panics. Phase 5 work. |
-
----
-
-## 5. Constraint binder (`cg_inline_asm` core)
-
-V1 set per `doc/ASM.md §4.3`:
-
-| constraint | behavior |
-|------------|----------|
-| `r` (in) | pop SValue; `force_reg` to ensure `OPK_REG`; bind to that physical reg |
-| `=r` (out) | `t->alloc_reg(t, RC_INT, type)`; out_ops[i].kind = OPK_REG |
-| `+r` (inout) | reuse the matching input's reg (popped first); out_ops shares it |
-| `=&r` (early-clobber) | `t->alloc_reg` from the set disjoint from already-bound input regs |
-| `i` | pop SValue; assert `op.kind == OPK_IMM`, else `compiler_panic` |
-| `m` | pop SValue; if not already `OPK_INDIRECT`, materialize via the lvalue address machinery (or allocate a scratch base reg + store) |
-| `0`/`1`/… | matching: bind input slot to the same physical reg as the referenced output slot |
-
-Clobbers:
-
-- `"memory"` — call `g->target->spill_reg` for every live reg-resident SValue
- on the stack before the block; mark them spilled so subsequent reads reload
- through `g->target->reload_reg`. Same machinery cg already uses across
- function calls.
-- Register-name clobbers (`"x0"`, …) — pass through to `target->asm_block`;
- the aarch64 backend routes them through the existing call-clobber tracking
- (`t->clobbers` exposes the same set).
-- `"cc"` — silently dropped on aarch64.
-
-After the block:
-
-- For each `=r` / `+r` / `=&r` output, push `SValue{op = out_ops[i],
- type = output_expr_type}` onto the value stack. The parser then assigns each
- output SValue to its lvalue expression via the standard cg mechanisms.
-
----
-
-## 6. Template walker (`aa64_asm_run_template`)
-
-1. Split `tmpl` on `\n` and `;` into asm lines.
-2. For each line, scan for `%` placeholders and emit substitutions into a
- per-call `StrBuf` (`src/core/strbuf.h`):
- - `%N` → register name in default form (use `%xN` width for int reg
- operands, `#imm` for `OPK_IMM`, `[xN, #ofs]` for `OPK_INDIRECT`).
- - `%wN` / `%xN` → forces 32 / 64 register form (diagnostic on mismatch
- with format's `sf` width).
- - `%[name]` → resolved via the optional `[name]` syntax on constraints
- (capture name during parse and store on AsmConstraint).
- - `%aN` → `[Xn, #ofs]` materialization for `m` constraint.
- - `%%` → literal `%`.
-3. Open a memory-backed Lexer over the rendered line, construct a minimal
- inline `AsmDriver` via `asm_driver_open_inline(c, mc, lexer)`, dispatch
- `aa64_asm_insn(a, d, mnemonic)` exactly like the standalone driver.
-4. Branches inside inline asm emit relocs against locally-interned symbol
- names — labels declared in the template (if any) are interned the same way
- `parse_asm.c` already handles them.
-
----
-
-## 7. Parallelization
-
-The work splits cleanly along three seams that each compile and unit-test in
-isolation. Three engineers (or three Claude sessions) can land them in
-parallel; the integration commit is small.
-
-```
- ┌───────────────────────┐
- │ Track A — frontend │ parse.c keyword + parse_asm_stmt;
- │ │ exercised by a temporary stub
- │ │ cg_inline_asm that just records args.
- └─────────┬─────────────┘
- │ shared: AsmConstraint[], clobber Sym[], template str
- ┌─────────┴─────────────┐
- │ Track B — cg / opt │ cg_inline_asm body + w_asm_block
- │ │ recorder + IR_ASM_BLOCK replay.
- │ │ Mocks target->asm_block to a logger
- │ │ for unit tests.
- └─────────┬─────────────┘
- │ shared: Operand[] in_ops/out_ops, Sym[] clobbers, tmpl
- ┌─────────┴─────────────┐
- │ Track C — aa64 backend│ aa64_inline_bind +
- │ │ aa64_asm_run_template + aa_asm_block.
- │ │ Driven by a tiny C-test that builds
- │ │ Operand arrays by hand.
- └───────────────────────┘
-```
-
-### 7.1 Tracks
-
-- **Track A — Frontend (`parse.c`)** — owns `KW_ASM` / `KW_BUILTIN_ASM`
- keyword registration, GNU asm statement grammar (`asm [volatile] [goto]
- (...)`, the four colon-separated lists), constraint string interning. Stops
- at the call to `cg_inline_asm`. Lands behind the existing panic stub on
- `cg_inline_asm`, so the parser merges first; new asm cases panic cleanly
- until B lands.
-- **Track B — CG + Opt (`cg.c`, `opt.c`)** — owns `cg_inline_asm` body
- (constraint binder, output reg allocation, `"memory"` clobber spill),
- `w_asm_block` recorder, `IR_ASM_BLOCK` replay. Lands against a
- `target->asm_block` mock that appends `(tmpl, in_ops, out_ops, clobbers)`
- to a log so unit tests can assert the binder's contract without depending
- on the aarch64 backend being ready.
-- **Track C — aarch64 backend (`aa64_asm.c`, `aarch64.c`)** — owns
- `aa64_inline_bind`, `aa64_asm_run_template`, the placeholder substitution
- pass, `asm_driver_open_inline`, and the `aa_asm_block` thunk. Lands with a
- tiny C unit test that constructs `Operand` arrays directly and invokes
- `aa_asm_block` against an in-process MCEmitter, so it can be merged before
- A or B is finished.
-
-### 7.2 Cut points and contracts
-
-The two seams are typed contracts each track can pin in a header without
-needing the others' implementations:
-
-1. **Parser → CG**: `cg_inline_asm(g, tmpl, outs, nout, ins, nin, clobbers,
- nclob)`. Already declared at `src/cg/cg.h:182`. Track A targets the
- declared signature; Track B implements it; Track A merges with the
- panic-stub still in place if B is not ready yet.
-2. **CG → Target**: `target->asm_block(t, tmpl, outs, nout, out_ops, ins,
- nin, in_ops, clobbers, nclob)`. Already declared at
- `src/arch/arch.h:605-608`. Track B can bind against
- `aa_panic("asm_block")` while Track C builds; cg-side tests use a
- dedicated mock `CGTarget` so they don't depend on landing order.
-
-### 7.3 Integration order
-
-A and C can land in either order. B blocks neither but unlocks end-to-end
-green:
-
-1. A merges (asm keyword + parser + stub call). cg corpus stays green
- because no test exercises asm yet; new asm cases panic at the
- `cg_inline_asm` stub.
-2. C merges (aa64 inline backend + unit test). cg corpus stays green; unit
- test gates the placeholder walker.
-3. B merges. The stub goes away. The new `test/cg/cases_asm.c` smoke flips
- to green on `DREJWS`. `S` path validates encode/decode pairing across the
- new inline-asm bytes.
-
-If only one engineer is on the work, do A → C → B (front-to-back) so each
-step lands an exercisable surface; if two, run A and C in parallel and
-merge B last.
-
----
-
-## 8. Testing
-
-Inline asm is behavioral C with exit-code assertions, which is exactly what
-`test/cg/cases_*.c` already does. Add `test/cg/cases_asm.c` (or fold into an
-existing bucket) registered through `cg-runner` the same way every other
-case is.
-
-Smallest smoke case:
-
-```c
-void test_main(void) {
- int rc = 42;
- __asm__ volatile("mov w0, %w0; svc #0" : : "r"(rc) : "x0");
-}
-```
-
-Path matrix:
-
-- `D` (direct JIT), `J` (JIT via file), `E` (ELF exec under qemu/podman) —
- exit-code assertions.
-- `R` (round-trip), `W` (opt-recorder) — exercise `w_asm_block` /
- `IR_ASM_BLOCK` replay.
-- `S` (asm round-trip) — encode/decode pairing across the new inline-asm
- bytes; if this fails, fix the format definition in `aa64_isa.h`, never the
- parser site.
-
-Run the full matrix:
-
-```
-bash test/cg/run.sh '' DREJWS
-```
-
-Coverage cases (one each so the binder can't silently regress):
-
-- `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0` (matching).
-- `"memory"` clobber.
-- Register-name clobber.
-
-Cross-target panic preserved:
-
-```
-CFREE_TEST_ARCH=x64 bash test/cg/run.sh ''
-```
-
-still panics cleanly via the existing `x_panic("asm_block")` on inline-asm
-cases — no silent miscompile.
-
-`make test-asm` is unaffected (no changes to the standalone-`.s` codepath).
-
----
-
-## 9. Decisions
-
-- **Operand transport**: parser pushes only inputs onto the CG stack;
- outputs come back as fresh SValues that the parser assigns to the
- declared lvalues. This matches the existing `cg_inline_asm` docstring at
- `src/cg/cg.h:178-181`.
-- **Template lexing**: pre-substitute placeholders to physical asm text and
- re-lex via the standalone parser, instead of a Tok variant carrying an
- `Operand`. Keeps one operand grammar and one lexer; the cost is one extra
- StrBuf pass per inline block.
-- **Memory clobber**: route through `target->spill_reg` /
- `target->reload_reg`, the same machinery cg uses across function calls.
- No new flush mechanism.
-- **`asm volatile`**: accepted but informational — `IR_ASM_BLOCK` is
- already opaque-to-passes per `doc/ASM.md §9.5`, so volatile changes
- nothing at the IR level today.
-- **`asm goto`**: parsed and rejected in `cg_inline_asm`. Phase 4b ships
- the keyword grammar but not the label-ref machinery.