kit

git clone https://git.ryansepassi.com/git/kit.git

commit 0463b6ab0c2e8e7f51c0aabca3a7730b2b3d2be7
parent cc72c49f15d87bd56dbed760c9a310e12de541a9
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 11 May 2026 10:38:07 -0700

ASM.md plan update

Diffstat:
M doc/ASM.md | 845 ++++++++++++++++++++-----------------------------------------------------------
D doc/INLINEASM.md | 325 -------------------------------------------------------------------------------
2 files changed, 215 insertions(+), 955 deletions(-)

diff --git a/doc/ASM.md b/doc/ASM.md
@@ -1,95 +1,36 @@
-# ASM — assembler and disassembler plan
+# ASM — assembler, disassembler, inline asm
 
-Scope: bring up the asm frontend (standalone `.s` and inline `asm("...")`)
-and the matching disassembler, starting with aarch64. Companion to
-`DESIGN.md §10`.
+Scope: cfree's asm frontend — standalone `.s`, inline `asm("...")`, and the
+matching disassembler. aarch64 only today; x64 / rv64 are stubs that panic
+cleanly. Companion to `DESIGN.md §10`.
 
-The asm and disasm sides are designed together so that one description of
-each instruction serves both: same field layout, same operand syntax, same
-mnemonic table. When an opcode bit moves, the encoder and the decoder
-update at one site and stay in sync by construction.
+Asm and disasm are designed together: one description of each instruction
+serves both. When an opcode bit moves, encoder and decoder update at one site
+and stay in sync by construction.
 
 ---
 
-## 1. Current state
-
-- `src/arch/aa64_isa.{h,c}`: per-format `pack`/`unpack` round-trippers
-  and a `(mnemonic, match, mask, AA64Format, AsmFlags)` descriptor
-  table. `aa64_disasm_find` linear-scans the table by
-  `(word & mask) == match`; first-match-wins, with alias rows placed
-  before their canonical form. `aa64_print_operands` renders operand
-  text via per-format helpers — shared between the disasm iterator and
-  the object listing. **This is the pairing seam.**
-- `src/parse/parse_asm.c`: arch-agnostic .s driver — directives
-  (`.text/.data/.rodata/.bss/.section/.globl/.local/.weak/.hidden/
-  .protected/.internal/.type/.size/.byte/.hword/.word/.quad/.ascii/
-  .asciz/.string/.zero/.skip/.fill/.align/.balign/.p2align/.set/.equ`
-  plus accepted-but-ignored `.cfi_*`/`.file`/`.loc`/`.macro`/...),
-  labels, full constant-expression evaluator (`+ - * / % << >> & | ^ ~`
-  with parens), `sym ± const` symbolic terms, string-literal decoding
-  with C-style escapes. Wired by `src/api/pipeline.c:208`; the panic
-  stub in `src/api/stubs.c` is gone.
-- `src/arch/aa64_asm.{h,c}`: per-mnemonic dispatch over
-  `aa64_insn_table` via the inline encoders in `aa64_isa.h`. Coverage
-  spans `nop, ret/br/blr, mov(reg/imm)/mvn/movz/movn/movk,
-  add(s)/sub(s)/cmp/cmn/neg(s), and/orr/eor/bic/orn/eon/ands/bics,
-  madd/msub/mul/mneg, udiv/sdiv/lslv/lsrv/asrv/rorv, b/bl/b.<cc>/
-  cbz/cbnz, svc/brk/hlt, ldr/str (scaled + simm9 fallback), ldur/stur,
-  ldp/stp (signed-offset + pre-indexed), adr/adrp`. Branches emit
-  `R_AARCH64_{CALL,JUMP}26` / `R_AARCH64_CONDBR19`; `adr/adrp` emit
-  `R_AARCH64_ADR_PREL_{LO21,PG_HI21}`.
-- `src/arch/aa64_disasm.{h,c}` + `src/arch/disasm.c`: aarch64
-  `ArchDisasm` impl wraps `aa64_disasm_find` + `aa64_print_operands`,
-  synthesizing `b.<cond>` mnemonics from the BR_COND format.
-  `arch_disasm_new` dispatches by `c->target.arch` (aarch64 only; x64
-  / rv64 panic with a clean diagnostic).
-- `src/api/disasm.c`: public `cfree_disasm_iter_*` and
-  `cfree_obj_disasm` over `arch_disasm_*`, plus the reloc/symbol
-  annotation overlay (rendered into the iterator's annotation buffer
-  per decoded word).
-- `src/arch/aa64_regs.{h,c}` + `src/api/arch_regs.c`: stateless
-  `cfree_arch_register_name`/`_index` queries against the canonical
-  aarch64 register table — same source list the parser and printer
-  consume.
-- `driver/as.c`: `cfree as` multi-call subcommand wired to
-  `cfree_compile_obj_emit(CFREE_LANG_ASM)`. Accepts `-target` for
-  cross-assembly, `-g` for debug-info forwarding (no-op until CFI
-  storage is wired).
-- `CGTarget.asm_block` (inline asm) is still a panic on every backend
-  (`aa_asm_block`, `xx_asm_block`, `rv_asm_block`, the opt recorder's
-  `w_asm_block`); `cg_inline_asm` (`src/cg/cg.c`) likewise. Inline-asm
-  bring-up is the remaining phase-4 work.
-
-So: standalone `.s` end-to-end works (encode → ELF → disasm
-round-trip, plus JIT execute). Inline asm is the next vertical.
+## 1. Status
 
----
-
-## 2. Target slice for first milestone
-
-| axis | value |
-|-----------|--------------------------------|
-| arch | `CFREE_ARCH_ARM_64` |
-| syntax | GNU `as` "unified" (per `DESIGN.md §10`) |
-| objfmt | ELF (the only one wired today) |
-| inline | aarch64 GCC `%w0`/`%x0`/`%[name]` substitution |
-| constraints | `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0` (matching) |
-
-Out of scope this pass:
+| layer | state |
+|---|---|
+| standalone `.s` (parse → ELF, JIT, round-trip) | aarch64 ✓ |
+| disasm (`cfree_disasm_iter_*`, `cfree_obj_disasm`) | aarch64 ✓ |
+| inline `asm("...")` C statement | aarch64 ✓ |
+| `cfree as` multi-call driver subcommand | aarch64 ✓ |
+| `cfree_arch_register_name` / `_index` | aarch64 ✓ |
+| x64 / rv64 backends (asm, disasm, inline) | panic with clean diagnostic |
 
-- x86 AT&T and RISC-V GNU syntax. Their per-arch parsers slot in as peers
-  once aarch64 proves the seams (see §6).
-- WAT for WASM. Different enough to merit its own document.
-- Macros / `.if` / `.macro` / `.altmacro`. The directive set is
-  intentionally small (§4.4); macros are a follow-up.
-- Full GCC constraint coverage (multi-alternative, `&` outside outputs,
-  most letter constraints). Tracked under `DESIGN.md §10` as deferred.
+Coverage of `aa64_asm.c` per-mnemonic table: `nop, ret/br/blr,
+mov(reg/imm)/mvn/movz/movn/movk, add(s)/sub(s)/cmp/cmn/neg(s),
+and/orr/eor/bic/orn/eon/ands/bics, madd/msub/mul/mneg,
+udiv/sdiv/lslv/lsrv/asrv/rorv, b/bl/b.<cc>/cbz/cbnz, svc/brk/hlt, ldr/str
+(scaled + simm9 fallback), ldur/stur, ldp/stp (signed-offset + pre-indexed),
+adr/adrp`.
 
 ---
 
-## 3. Encode/decode pairing — one description per instruction
-
-The discipline that makes asm and disasm cheap to keep in sync:
+## 2. Encode/decode pairing — the design discipline
 
 ```
 ┌──────────────────┐
@@ -101,591 +42,235 @@ The discipline that makes asm and disasm cheap to keep in sync:
 └──────────────────┘
 ▲
 │
- AA64InsnDesc { mnemonic, match, mask, format }
+ AA64InsnDesc { mnemonic, match, mask, format, AsmFlags }
 ```
 
-Per format (already established in `aa64_isa.h`):
-
-- `AA64<Fmt>` field struct.
-- `aa64_<fmt>_pack(fields) -> u32`, `aa64_<fmt>_unpack(u32) -> fields`.
-- *(new)* `aa64_<fmt>_parse(Asm*, AA64InsnDesc*, fields*)` — parses the
-  operand grammar for this format and fills the field struct. Reads
-  per-instruction opcode bits from the descriptor, so one parser handles
-  all members of the family.
-- *(new)* `aa64_<fmt>_print(fields, sb)` — renders text for a decoded
-  word. The disasm text path and the round-trip check share this.
-
-Per instruction (one row in `aa64_insn_table`):
-
-- *(unchanged)* `mnemonic`, `match`, `mask`, `format`.
-- *(new)* a small `AsmFlags` byte for things that vary across same-format
-  members: alias status, sf-required, special operand syntax (e.g.
-  `RET` with optional `Xn`).
-
-**Aliases** (`MOV` for `ORR Rd, ZR, Rm`; `MUL` for `MADD ..., ZR`; `NEG`
-for `SUB Rd, ZR, Rm`) are extra rows with tighter masks placed *before*
-the canonical row in `aa64_insn_table.c`. First-match-wins is already the
-documented invariant. The disasm prints the alias; the asm accepts both
-the alias and the canonical spelling.
-
-**Source of truth.** Encoder, decoder, asm parser, and asm printer all go
-through `aa64_isa.h`. No second copy of the bit layout anywhere — not in
-`arch/aarch64.c` codegen helpers (those already call the inline encoders),
-not in test fixtures.
-
-**Round-trip property.** For every byte sequence `B` the disasm-then-asm
-round trip is idempotent: `assemble(disasm(B)) == B` for every
-instruction the assembler accepts. `disasm(assemble(disasm(B))) ==
-disasm(B)` is the testable form (see §7). This catches missing format
-entries and operand-print/parse drift.
+Per format (in `aa64_isa.h`): `AA64<Fmt>` struct + `pack` / `unpack` /
+`print`. Per instruction (one row in `aa64_insn_table`): mnemonic, match,
+mask, format, AsmFlags (alias / sf-required / etc.).
 
----
+**Source of truth.** Encoder, decoder, asm parser, asm printer all go through
+`aa64_isa.h`. No second copy of the bit layout anywhere. If `S`
+(asm-roundtrip) fails on a cg-emitted word, fix the format definition; never
+the parser site.
 
-## 4. Module layout
+**Aliases** (`MOV` for `ORR Rd, ZR, Rm`; `MUL` for `MADD ..., ZR`; `NEG` for
+`SUB Rd, ZR, Rm`) are extra rows with tighter masks placed *before* the
+canonical row. First-match-wins picks the alias spelling.
 
-Reuse `aa64` prefix.
+---
+
+## 3. Module layout
 
 ```
-src/parse/parse_asm.c       shared driver: scan tokens, dispatch directives,
-                            label management, expression evaluation,
-                            call per-arch instruction parser.
+src/parse/parse_asm.c       arch-agnostic .s driver: directives, labels,
+                            expression evaluator, string decoding.
+                            asm_driver_open_inline constructor for inline
+                            asm template parsing.
 src/parse/parse_asm_helpers.h
-                            driver↔arch seam (asm_driver_peek/next/
-                            parse_const/parse_sym_expr/intern_sym/panic).
-                            AsmDriver itself stays internal to parse_asm.c.
+                            driver↔arch seam (peek/next/eat_*/parse_const/
+                            parse_sym_expr/intern_sym/panic).
+                            AsmDriver stays opaque.
+src/parse/parse.c           parse_asm_stmt: GNU asm("...") statement
+                            grammar (volatile, goto, four colon-separated
+                            lists, [name] symbolic operands).
+src/arch/aa64_isa.{h,c}     per-format pack/unpack/print + AA64InsnDesc
+                            table + alias flags. Shared between encoder,
+                            decoder, and printer.
 src/arch/aa64_asm.{h,c}     aa64 instruction parser: per-mnemonic dispatch
-                            over aa64_insn_table → inline encoders in
-                            aa64_isa.h. Phase 4b will grow the inline-asm
-                            template walker on top of the same parsers.
-src/arch/aa64_disasm.{h,c}  aa64 ArchDisasm impl. Wraps aa64_disasm_find
-                            with operand printing; synthesizes b.<cond>.
-src/arch/aa64_regs.{h,c}    canonical aarch64 register name list — same
-                            source the parser and printer consume.
-src/arch/aa64_isa.{h,c}     per-format pack/unpack + print_operands
-                            dispatcher + AsmFlags column on AA64InsnDesc.
-                            aa64_parse_operands declared but unused by
-                            phase 3 (we dispatch per-mnemonic instead;
-                            the table-driven parser belongs with the
-                            remaining cg-emitted formats — see §5).
-src/arch/disasm.c           arch_disasm_new dispatch by c->target.arch
-                            (peer of arch/cgtarget.c per MULTIARCH §2.1).
-src/api/disasm.c            cfree_disasm_iter_* / cfree_obj_disasm over
-                            arch_disasm_*, plus reloc/symbol overlay.
-src/api/arch_regs.c         cfree_arch_register_name / _index dispatch.
-driver/as.c                 `cfree as` subcommand: cross-target flag,
-                            -g/-o, single positional input. Drives
+                            over the table → inline encoders.
+                            aa64_inline_bind + aa64_asm_run_template
+                            implement the inline-asm template walker.
+src/arch/aa64_disasm.{h,c}  aa64 ArchDisasm impl wrapping aa64_disasm_find
+                            + aa64_print_operands; synthesizes b.<cond>.
+src/arch/aa64_regs.{h,c}    canonical aarch64 register name list.
+src/arch/disasm.c           arch_disasm_new dispatch on c->target.arch.
+src/arch/aarch64.c          aa_asm_block: CGTarget vtable entry for inline
+                            asm; opens AA64Asm, binds operands, runs
+                            template, closes.
+src/cg/cg.c                 cg_inline_asm: constraint binder (pops inputs,
+                            allocates output regs, handles "memory"
+                            clobber, calls target->asm_block, pushes
+                            outputs).
+src/opt/opt.c               w_asm_block recorder + IR_ASM_BLOCK replay
+                            (mirrors w_call / IR_CALL).
+src/api/disasm.c            cfree_disasm_iter_* / cfree_obj_disasm + reloc/
+                            symbol annotation overlay.
+src/api/arch_regs.c         stateless cfree_arch_register_name / _index
+                            dispatcher.
+driver/as.c                 cfree as subcommand wired to
                             cfree_compile_obj_emit(CFREE_LANG_ASM).
 ```
 
-The pieces fall on three seams: (a) `parse_asm` ↔ per-arch instruction
-parser via `parse_asm_helpers.h`, (b) `MCEmitter` is the byte sink for
-both asm and codegen, (c) `arch_disasm_new` ↔ per-arch decoder.
-`aa64_isa.h` is the shared truth crossing all three.
-
-### 4.1 `parse_asm` driver — arch-agnostic
-
-```c
-void parse_asm(Compiler* c, Lexer* l, MCEmitter* mc) {
-  AsmDriver d = {.c = c, .lex = l, .mc = mc, .arch = aa64_asm_open(c)};
-  for (;;) {
-    Tok t = lex_next(l);
-    if (t.kind == TOK_EOF) break;
-    if (is_directive(t)) parse_directive(&d, t); /* .text, .globl, ... */
-    else if (is_label(t)) parse_label(&d, t); /* foo: */
-    else if (is_ident(t)) d.arch->insn(d.arch, &d, t); /* mnemonic line */
-    else parse_skip_to_newline(&d);
-  }
-  aa64_asm_close(d.arch);
-}
-```
-
-Directives, labels, expression evaluation, and the `MCEmitter` glue live
-in this file because every arch needs the same set. Per-arch code is one
-function pointer (`arch->insn`), one symbol (`aa64_asm_open`).
-
-### 4.2 `aa64_asm_open` — instruction parser
-
-```c
-void aa64_asm_insn(AA64Asm*, AsmDriver*, Sym mnemonic);
-```
-
-The parser dispatches per-mnemonic against a small in-file table
-(`{name, parse_fn}`). Each `parse_fn` reads operands via the
-`asm_driver_*` helpers (register, immediate, memory addressing) and
-calls the inline encoder in `aa64_isa.h` for its format
-(`aa64_movz`, `aa64_add`, `aa64_ldr64_uimm12`, ...). One parser per
-operand grammar — register-vs-immediate variants of the same
-mnemonic (e.g. `add Rd, Rn, Rm` vs. `add Rd, Rn, #imm`) branch on
-the first non-Rd operand. Aliases (`mov`, `mvn`, `cmp`, `cmn`,
-`neg`, `mul`, `mneg`, no-operand `ret`) live as dedicated rows that
-emit the canonical encoding directly.
-
-Branches do not go through `mc->emit_label_ref`; the parser emits
-the instruction with `imm26=0`/`imm19=0` and records a reloc
-(`R_AARCH64_CALL26`/`JUMP26`/`CONDBR19`) against the operand's
-ObjSymId via `mc->emit_reloc_at`. The linker (and the in-process
-fixup machinery in `src/arch/mc.c`) applies the displacement at
-relocation time.
-
-The table-driven `aa64_parse_operands` declared in `aa64_isa.h`
-(phase-2 placeholder) remains stubbed — phase 3 chose per-mnemonic
-dispatch because it lets one parser handle alias / immediate /
-register-form branching at the right level. The format-driven
-parser slots in alongside this one for the remaining cg-emitted
-formats when `S` (cg round-trip) needs them.
-
-### 4.3 Inline asm — same parser, different operand source
-
-```c
-static void aa_asm_block(CGTarget* t, const char* tmpl,
-                         const AsmConstraint* outs, u32 nout, Operand* out_ops,
-                         const AsmConstraint* ins, u32 nin, const Operand* in_ops,
-                         const Sym* clobbers, u32 nclob) {
-  AA64Asm* a = aa64_asm_open(t->c);
-  aa64_inline_bind(a, outs, nout, out_ops, ins, nin, in_ops, clobbers, nclob);
-  aa64_asm_run_template(a, t->mc, tmpl);
-  aa64_asm_close(a);
-}
-```
-
-The template walker is the same `aa64_<fmt>_parse` set used by the
-standalone path. The only delta is the operand lexer: in inline mode,
-`%0`, `%w0`, `%x0`, `%[name]` resolve to the bound `Operand` for the
-corresponding constraint. `%w0` prints the W-form register name (forces
-`sf=0`); `%x0` the X-form. Memory operands `%a0` materialize as
-`[Xn, #ofs]`. Bit width is checked against the format's expectation
-(e.g. a 32-bit format with `%x0` is a diagnostic).
-
-Constraint binding (v1 set):
-
-| constraint | meaning |
-|------------|---------------------------------------------------------------|
-| `r` / `=r` | int reg; allocated via the codegen scratch pool of the active CGTarget |
-| `+r` | input + output, same register |
-| `=&r` | early-clobber output (allocated disjoint from any input) |
-| `i` | compile-time integer; must be `OPK_IMM` |
-| `m` | memory operand; bind a scratch base reg if the source isn't `OPK_INDIRECT` |
-| `0` (etc.) | matching constraint: input must use the same physical reg as output 0 |
-
-`"memory"` clobber forces CG to flush all live stack values to memory
-before the block and reload after, per `DESIGN.md §10`. Register-name
-clobbers add to the "clobbered by call" set so RA does not reuse them
-across the block. `"cc"` is accepted and ignored on aarch64 (NZCV is
-reserved by the inline-asm contract anyway — no instruction outside the
-block reads it across the block).
-
-Under `opt_cgtarget` the call is recorded as `IR_ASM_BLOCK` (already an
-opaque-to-passes record per `DESIGN.md §9.5`); at lowering the wrapped
-target sees the same call with materialized operands.
-
-### 4.4 Directives — minimum viable set
-
-```
-.section NAME [, "FLAGS", @TYPE]
-.text .data .rodata .bss
-.globl SYM .local SYM .weak SYM .hidden SYM
-.type SYM, @function | @object
-.size SYM, EXPR
-.byte .hword .word .quad EXPR [, EXPR ...]
-.ascii "..." .asciz "..." .string "..."
-.zero N .skip N .fill N, SIZE, VALUE
-.align N .balign N .p2align N
-.set NAME, EXPR
-.equ NAME, EXPR (= .set)
-.file "name" (debug line filename)
-.loc FILE LINE [COL] (debug line row)
-```
-
-CFI directives (`.cfi_startproc`, `.cfi_def_cfa`, …) are accepted and
-forwarded to the corresponding `MCEmitter.cfi_*` calls (already exist
-per `arch/arch.h`). Unknown directives are a recovery diagnostic, not a
-panic — skip to newline.
-
-`.macro` / `.if` / `.include` are deferred. Inline asm gets there first
-because `cg_inline_asm` is the immediate consumer.
-
-### 4.5 Disassembler — `arch_disasm_new` for aarch64
-
-```c
-typedef struct AA64Disasm {
-  ArchDisasm base;
-  Compiler* c;
-  StrBuf mnemonic, operands, annotation; /* reused per decode */
-} AA64Disasm;
-
-u32 arch_disasm_decode(ArchDisasm* d_, const u8* b, size_t n,
-                       u64 vaddr, CfreeInsn* out) {
-  if (n < 4) return 0;
-  u32 w = read_u32_le(b);
-  const AA64InsnDesc* ins = aa64_disasm_find(w);
-  if (!ins) { write_unknown(d, b, out); return 4; }
-  AA64Fields f = aa64_<ins->format>_unpack(w);
-  strbuf_set(&d->mnemonic, ins->mnemonic);
-  aa64_<ins->format>_print(&d->operands, ins, &f, vaddr);
-  out->vaddr = vaddr;
-  out->bytes = b; out->nbytes = 4;
-  out->mnemonic = d->mnemonic.p;
-  out->operands = d->operands.p;
-  out->annotation = ""; /* sym/reloc overlay added by cfree_obj_disasm */
-  return 4;
-}
-```
-
-Annotations (sym/reloc overlay) live one level up in `cfree_obj_disasm` /
-`cfree_disasm_iter_new(..., obj)`: the iterator walks `ObjBuilder` relocs
-keyed on the section + offset and writes the resolved `name+addend` into
-`annotation`. The arch-level decoder is reloc-unaware — it only reads
-bytes. This keeps `arch_disasm_decode` per-arch and the symbol/reloc
-overlay arch-agnostic.
-
-`cfree_arch_register_name` / `_index` live in `aa64_regs.{h,c}`
-alongside one canonical name list shared by the parser, the printer,
-and the public API. `src/api/arch_regs.c` is the stateless dispatcher
-(`switch (arch)` over per-arch tables); the iterator surface remains
-a NULL-returning stub pending an env/heap on its constructor (see the
-TODO at the top of `src/api/arch_regs.c`).
-
----
-
-## 5. Phasing
-
-Each phase ends mergeable. Phase 1 stands up the test harness so every
-later phase gates on real runs from its first commit. Phase 2 lands the
-encode/decode pairing as a mechanical refactor; phase 3 is the standalone
-assembler; phase 4 splits into 4a (disasm overlay) and 4b (inline asm);
-phase 5 is the seam-rev for x64/rv64.
-
-Phases 1, 2, 3, and 4a are DONE. Phase 4b (inline asm) and phase 5
-(multiarch) remain.
-
-### Phase 1 — test harness (DONE)
-
-Stand up the runner before any compiler-side work. No `src/` changes.
-
-- [x] New `test/asm/` peer of `test/parse/`. One `run.sh`; three
-  sub-corpora (`encode/`, `decode/`, `listing/`). Skip-vs-fail follows
-  the `CFREE_TEST_ALLOW_SKIP` convention used elsewhere — every case
-  skips cleanly today because `parse_asm` and `cfree_disasm_iter_*`
-  are stubs. `CFREE_TEST_ALLOW_SKIP` defaults to `1` in the asm
-  harness for the duration of phase 1; flip to `0` once the assembler
-  and disasm iterator are real.
-- [x] New `test-asm` target in `test/test.mk`; added to the default
-  `test` list.
-- [x] Add `S` (asm-roundtrip) path letter to `test/cg/run.sh`. Skips
-  today; turns green when phases 3+4 land. Recognized in the path
-  matrix; `S` is opt-in (`run.sh '' S` or
-  `CFREE_TEST_PATHS=DREJWS`) until phase 4 lands, so the default
-  `DREJW` continues to gate CI cleanly. Becomes part of the default
-  matrix in phase 4.
-- [x] Smoke goldens checked in for one case per sub-corpus. A
-  `test/asm/regen.sh` documents how to refresh them from the host
-  `clang --target=aarch64-linux-gnu` / `llvm-objdump`; it is committed
-  as a maintainer aid and is not run by CI (same convention as
-  `test/elf/normalize.py`).
-- [x] New runner C binary `asm-runner` under `test/asm/harness/` —
-  peer of `parse-runner`. Five sub-commands: `--encode`, `--decode`,
-  `--listing`, `--emit`, `--jit`. The first three dispatch to
-  `cfree_compile_obj_emit(CFREE_LANG_ASM)` / `cfree_disasm_iter_*` /
-  `cfree_obj_disasm`; `--emit` writes a `.o` to disk so the J and E
-  exec paths can reuse the `test/link` harness binaries; `--jit`
-  parses + JIT-links and calls `test_main`.
-- [x] Path matrix for `test/asm/run.sh`: `HTLDJE`. `H` hex encode,
-  `T` text decode, `L` listing, `D` direct JIT, `J` jit-via-file,
-  `E` ELF exec under qemu/podman. D/J/E only run on `encode/`
-  cases with an `<name>.expected` exit-code sidecar.
-
-Exit criterion (met): `make test-asm` runs end-to-end; the three smoke
-cases report SKIP for every path they apply to and the harness
-wiring is exercised on every CI run. `bash test/cg/run.sh '' S` also
-reports SKIP cleanly. No green asm cases yet — that's phase 3.
-
-### Phase 2 — finish the ISA descriptor table (DONE)
-
-Pure refactor. No new behavior; existing codegen still calls inline
-encoders.
-
-- [x] Added `aa64_print_operands` dispatcher plus per-format
-  `print_*` helpers in `aa64_isa.c`; renders into a new tiny `StrBuf`
-  (`src/core/strbuf.{h,c}`). `aa64_parse_operands` is declared with
-  the phase-3 signature and stubbed to return 0 — phase 3 fills the
-  per-format grammar in once the asm token stream lands.
-- [x] Added an `AsmFlags` byte on `AA64InsnDesc`
-  (`AA64_ASMFL_ALIAS / SF1 / NORN`). Aliases marked: `MOV` (ORR Rd,
-  ZR, Rm), `MVN` (ORN), `NEG` / `NEGS` (SUB / SUBS Rd, ZR, Rm),
-  `CMP` / `CMN` (SUBS / ADDS ZR, Rn, Rm), `MUL` / `MNEG` (MADD /
-  MSUB with Ra=ZR), `RET`-no-operand (RET X30).
-- [x] Reordered `aa64_insn_table` so each alias precedes its
-  canonical form. First-match-wins now picks the alias spelling.
-- [x] Backfilled formats codegen emits: `BR_IMM` (B / BL),
-  `BR_COND` (B.cond), `CB` (CBZ / CBNZ), `EXCEPT` (BRK / SVC / HLT),
-  `LDST_SIMM9` (LDUR / STUR, V=0 and V=1, every size),
-  `LDSTP_SOFF` (STP / LDP signed-offset, X and D forms).
-  `LDST_UIMM` and `LDSTP_PRE` rows expanded to cover every
-  size × V combination codegen emits today. Each format lands as
-  one struct + pack/unpack + print + table rows; phase-3
-  parse-grammar bodies follow once the asm token stream exists.
-- [x] New `test/arch/aa64_isa_test.c` + `make test-isa` target.
-  Exercises one representative word per format, asserts mnemonic
-  and operand text, and pins the alias-precedence invariant
-  (`ORR Rd, ZR, Rm` resolves to "mov", `ORR Rd, Rn, Rm` to "orr").
-  Added to the default `test` list.
-
-Exit criterion (met): `aa64_disasm_find` returns a desc for the
-representative word of every format used in this phase, and the unit
-test pins that contract for future regressions. Full byte-by-byte
-coverage of every cg-emitted word becomes enforced when the `S`
-path on `test/cg/run.sh` turns green in phase 4 — the remaining
-codegen-only formats (bitfield, condsel, FP-DP1/2, FP↔int cvt,
-ldst-exclusive, dmb/clrex, mrs, dp1, SIMD basic) get table rows then.
-
-### Phase 3 — standalone `.s` assembler (DONE)
-
-- [x] New `src/parse/parse_asm.c`. Panic stub in `src/api/stubs.c`
-  removed. Driver loop, directive parser, label management,
-  expression evaluator (constants with `+ - * / % << >> & | ^ ~` and
-  parens; `sym ± const` for symbolic terms), string-literal decoding
-  with C-style escapes.
-- [x] New `src/parse/parse_asm_helpers.h`. Lightweight surface
-  (`asm_driver_peek/next/eat_*/parse_const/parse_sym_expr/intern_sym/
-  panic/...`) the per-arch parser consumes; the AsmDriver struct
-  itself stays internal to `parse_asm.c`.
-- [x] New `src/arch/aa64_asm.{h,c}` with `aa64_asm_open` /
-  `aa64_asm_insn`. Per-mnemonic dispatch over `aa64_insn_table`
-  resolved through the inline encoders in `aa64_isa.h` (no second
-  copy of the bit layout). Composite mnemonics (`b.eq`, `b.ne`, ...)
-  are stitched in the driver before dispatch.
-- [x] Reloc-emitting operands: branches → `R_AARCH64_CALL26` /
-  `JUMP26`; conditional branches and CBZ/CBNZ → `R_AARCH64_CONDBR19`;
-  `adr`/`adrp` → `R_AARCH64_ADR_PREL_LO21` / `_PG_HI21`. Data
-  directives (`.word`/`.quad`) with a symbolic operand emit
-  `R_ABS32`/`R_ABS64` through `MCEmitter.emit_reloc_at`, no new
-  mechanism needed (per §7).
-- [x] CFI directives accepted (parsed + skipped) — forward to
-  `MCEmitter.cfi_*` once those hooks store records (today they are
-  no-ops in `src/arch/mc.c`). `.loc` and `.file` likewise accepted-
-  and-ignored; wiring them to `mc->set_loc` is a follow-up that
-  drops in without touching the parser shape.
-- [x] `cfree as` driver subcommand (`driver/as.c`) — accepts
-  `-target TRIPLE`, `-g`, `-o OUT.o INPUT.s`. Same composition point
-  as `cfree -c <file.s>` modulo lang inference.
-- [x] Smoke-case skips dropped from `test/asm/encode/`,
-  `test/asm/decode/`, `test/asm/listing/`. `test-asm` runs green on
-  every path it can on the host (E skips on a non-aarch64 host when
-  no exec runner is configured).
-
-Exit criterion (met for the smoke corpus): the phase-1 encode case
-runs through H (hex roundtrip), D (direct JIT execute), J (JIT via
-file). Coverage of every row in `aa64_insn_table` becomes enforced
-when the `S` path on `test/cg/run.sh` turns on by default (see
-§6.2); the remaining codegen-only formats (bitfield, condsel,
-FP-DP1/2, FP↔int cvt, ldst-exclusive, dmb/clrex, mrs, dp1, SIMD
-basic) gain table rows + parser coverage in lockstep with `S`.
-
-### Phase 4a — disasm overlay (DONE)
-
-- [x] `src/arch/aa64_disasm.{h,c}`: aarch64 `ArchDisasm` impl wraps
-  `aa64_disasm_find` + `aa64_print_operands`. Owns the per-iterator
-  StrBuf storage for mnemonic / operands / annotation. Mnemonic
-  rewrite for `b.<cond>` happens here (the printer keeps the BR_COND
-  format opcode-agnostic).
-- [x] `src/arch/disasm.c`: dispatcher peer of `src/arch/cgtarget.c`,
-  switches `arch_disasm_new` on `c->target.arch`. aarch64 only;
-  x86_64 / rv64 panic with a clean diagnostic.
-- [x] `src/api/disasm.c`: `cfree_disasm_iter_new/next/free` and
-  `cfree_obj_disasm` over `arch_disasm_*`, plus the reloc/symbol
-  annotation overlay (rendered per-decoded-word into the iterator
-  buffer; the arch decoder stays reloc-unaware).
-- [x] `src/arch/aa64_regs.{h,c}` + `src/api/arch_regs.c`: stateless
-  `cfree_arch_register_name` / `_index` against one canonical reg
-  table — same source the parser and printer share.
-
-Exit criterion (met): every `test/asm/decode/` and
-`test/asm/listing/` case is green; `cfree objdump -d` over the
-output of `cfree as` round-trips the smoke corpus.
-
-### Phase 4b — inline asm
-
-See `doc/INLINEASM.md` for the detailed plan (scope, files, constraint
-binder, template walker, parallelization across three tracks, testing).
-The summary: stand up `parse_asm_stmt` in `parse.c`, implement
-`cg_inline_asm` (constraint binder + `"memory"` clobber spill), implement
-`aa_asm_block` + `aa64_asm_run_template` in `aa64_asm.c`, wire
-`w_asm_block` recorder + `IR_ASM_BLOCK` replay in `opt.c`. Three tracks
-(frontend / cg+opt / aa64) merge in any order behind the existing panic
-stubs.
-
-Exit criterion: the inline-asm cases under `test/cg/` (svc-style
-write-then-exit) build, run under qemu/podman, and report green on
-`DREJWS`. The `S` path turns green for the full cg corpus, proving
-encode/decode pairing across every `.text` byte cfree currently
-emits.
-
-### Phase 5 — multiarch seam
-
-Land before x64/rv64 codegen needs it.
-
-1. `arch/disasm.c::arch_disasm_new` switches on `c->target.arch`
-   (currently aarch64-only).
-2. `parse_asm` driver dispatches per-arch instruction parser by
-   `c->target.arch`. `aa64_asm_open` becomes one of N constructors.
-3. Reg-name table dispatched the same way (`cfree_arch_register_name`).
-4. `x64_isa.{h,c}` and `rv64_isa.{h,c}` skeletons (formats + tables,
-   not populated). x64 brings AT&T, rv64 brings GNU. Each pulls in its
-   own `<arch>_asm.{h,c}` and `<arch>_disasm.{h,c}`. Per `DESIGN.md
-   §10` the asm flavour is decided per-arch, single supported flavour.
-
-Exit criterion: builds for `CFREE_ARCH_X86_64` reach the x64 asm/disasm
-stubs and panic with a clean diagnostic; aarch64 path unchanged.
+Three seams: (a) `parse_asm` ↔ per-arch instruction parser via
+`parse_asm_helpers.h`, (b) `MCEmitter` as the byte sink for both asm and
+codegen, (c) `arch_disasm_new` ↔ per-arch decoder. `aa64_isa.h` is the
+shared truth crossing all three.
 
 ---
 
-## 6. Testing
+## 4. Inline asm — constraint binder + template walker
 
-The pairing buys a strong test shape: most tests run the round trip
-rather than spelling expected bytes by hand. Three buckets, all wired
-in phase 1:
+**Constraints (v1)**: `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0` (matching by
+index). `AsmConstraint` carries `{str, name, type, dir}` — `name` is the
+optional `[name]` Sym, `type` is the bound expression's C type (drives
+`RegClass` + width). Hand-built test constraints with `NULL` type fall back
+to 64-bit int.
 
-### 6.1 `test/asm/` — file-driven goldens (new)
+**Clobbers**:
+- `"memory"` — spill all live RES_REG SValues via `target->spill_reg`;
+  subsequent reads reload through `target->reload_reg`. Same machinery cg
+  uses across function calls.
+- Register names (`"x0"`, …) — passed through to `target->asm_block`.
+- `"cc"` — silently ignored on aarch64 (NZCV reserved across the block).
 
-Peer of `test/parse/`. One `run.sh`, one `asm-runner` C binary, three
-sub-corpora keyed off filename suffix:
+**Placeholders**: `%N`, `%wN` (force W form), `%xN` (force X form),
+`%[name]` (resolved against `AsmConstraint.name`), `%aN` (memory addressing
+form), `%%`. The walker pre-substitutes them into asm source text and
+re-lexes through the standalone per-mnemonic parsers — no second operand
+grammar.
 
-| dir | input | expected | drives |
-|-------------------------|--------------------|--------------------|------------------------------|
-| `test/asm/encode/` | `<name>.s` | `<name>.expected.hex` | `cfree as` over the `.s`, hex-compare against expected |
-| `test/asm/decode/` | `<name>.hex` | `<name>.expected.txt` | `cfree_disasm_iter_*` over the bytes, text-compare |
-| `test/asm/listing/` | `<name>.in.bin` (ELF) | `<name>.expected.lst` | `cfree_obj_disasm` against the ELF, listing-compare |
+**IR**: `IR_ASM_BLOCK` with `IRAsmAux { tmpl, outs, ins, in_ops, out_ops,
+clobbers, nout, nin, nclob }`. The opt recorder arena-copies the payload;
+replay xlat_op's each Operand and forwards to the wrapped target.
 
-Goldens are checked in. A `test/asm/regen.sh` regenerates them from
-the host `as` / `objdump` (committed only as a maintainer aid; not
-run by CI). One smoke case per sub-corpus is enough for phase 1; the
-table fills up alongside phases 3 and 4.
-
-### 6.2 `test/cg/` `S` path — asm roundtrip (new path letter)
-
-Path letter added to `test/cg/run.sh`. For every cg-emitted aarch64
-binary already in the corpus: walk `.text`, decode each instruction,
-re-assemble the resulting text, byte-compare. No new corpus —
-piggybacks on every existing cg case for free coverage. Catches
-encode/decode drift the moment a format gains a member.
-
-Reports SKIP today, green after phase 4. Path matrix becomes
-`DREJWS`. Skip-vs-fail and filtering match the rest of the cg paths
-verbatim.
-
-### 6.3 `test/cg/` inline-asm cases — under existing harness
-
-Inline asm is behavioral C with exit-code assertions, which is exactly
-what `test/cg/cases_*.c` already does. Add a new `cases_asm.c` (or
-fold cases into the existing buckets) registered through `cg-runner`
-the same way every other case is. The path matrix (`DREJW`) and the
-qemu/podman runner from `test/lib/exec_aarch64.sh` cover execution
-unchanged.
-
-### Driver wiring
-
-A standalone `cfree as` subcommand is exposed by the multi-call driver
-in phase 3 (same dispatch as `cfree -c <file.s>` modulo lang
-inference). `test/asm/encode/` drives `cfree as` directly so the
-multi-call dispatch is exercised end-to-end.
+**`asm volatile`**: accepted but informational — `IR_ASM_BLOCK` is already
+opaque-to-passes, so volatile changes nothing at the IR level.
 
 ---
 
-## 7. Decisions
-
-- **Disasm immediate format: context-sensitive.** Signed decimal for
-  fields the ISA defines as signed (branch displacements, signed-imm12
-  add/sub, load/store offsets). `0x`-prefixed hex everywhere else
-  (logical bitmask immediates, MOVZ/MOVK halfword, addresses).
-  `aa64_<fmt>_print` carries a per-field signedness bit; the print
-  helpers branch on it. Goldens lock the chosen form per format.
-- **`.s` constant expressions: arithmetic with parens.** Operators
-  `+ - * / % << >> & | ^ ~`, parenthesized, over signed integer
-  constants and `sym + const` terms. Symbol-involving expressions are
-  restricted to `(sym ± const)`; any product, quotient, shift, or
-  bitwise op that has a symbol operand is a diagnostic. Reloc-modifier
-  syntax (`:lo12:sym`, `:got:sym`) and macro counters (`\@`) are
-  deferred — they belong with the macro/full-PIC follow-up.
-- **`__cfree_setjmp.s` is decoupled.** Phase 2 lands against the
-  synthetic suites in §6, not against the runtime. `rt/` is currently
-  built with clang (`rt/Makefile`) and continues to be through phase 3;
-  `__cfree_setjmp.s` migrates to `parse_asm` as a follow-up after the
-  assembler is proven on the test corpus. The same applies to any
-  other `.s` files `rt/` adds before then.
-- **Absolute relocs in `.s`** (e.g. `.quad some_sym + 8` in `.data`)
-  go through `MCEmitter.emit_reloc_at` against the existing
-  `RelocKind` set — no new mechanism needed.
-- **Self-hosting.** Per `DESIGN.md §12`, anything in `src/` must be
-  C11-freestanding-writable. `parse_asm.c` and `aa64_asm.c` follow the
-  same rule. No reliance on a host assembler at build time *for the
-  compiler*; `rt/` still uses clang and is on its own bootstrap track.
-
----
-
-## 8. Running the tests
+## 5. Testing
 
 ```
-make test-asm # full asm harness: all paths, all sub-corpora
-make test # includes test-asm in the default suite
+make test-asm # standalone .s harness
+make test-isa # aa64 ISA descriptor table
+make test-aa64-inline # aa64 inline-asm walker (hand-built Operands)
+make test-cg-binder # cg_inline_asm constraint binder (mock CGTarget)
+make test # includes all of the above
 ```
 
-The harness lives in `test/asm/`. See `test/asm/CORPUS.md` for the
-sub-corpus layout and `test/asm/regen.sh` for golden refresh.
-
-### Filtering and path selection
+### `test/asm/` — standalone
 
-```
-bash test/asm/run.sh # default: every case, HTLDJE
-bash test/asm/run.sh nop # name substring filter
-bash test/asm/run.sh '' HT # only H (hex encode) + T (decode)
-bash test/asm/run.sh exit_zero DJE # exec paths for one case
-CFREE_TEST_FILTER=nop CFREE_TEST_PATHS=L bash test/asm/run.sh
-```
+Three sub-corpora keyed off filename suffix:
 
-Path letters:
+| dir | input | expected | drives |
+|---|---|---|---|
+| `test/asm/encode/` | `<name>.s` | `<name>.expected.hex` | `cfree as`, hex-compare |
+| `test/asm/decode/` | `<name>.hex` | `<name>.expected.txt` | `cfree_disasm_iter_*`, text-compare |
+| `test/asm/listing/` | `<name>.in.bin` (ELF) | `<name>.expected.lst` | `cfree_obj_disasm`, listing-compare |
 
-| letter | path | input | check |
-|--------|------------------|--------------|--------------------------------|
-| `H` | Hex encode | `encode/*.s` | `--encode` → diff `.expected.hex` |
-| `T` | Text decode | `decode/*.hex` | `--decode` → diff `.expected.txt` |
-| `L` | Listing | `listing/*.in.bin` | `--listing` → diff `.expected.lst` |
-| `D` | Direct JIT | `encode/*.s` (with `.expected` exit) | `--jit` → exit code |
-| `J` | JIT via file | `encode/*.s` (with `.expected` exit) | `--emit` + `jit-runner` |
-| `E` | ELF exec | `encode/*.s` (with `.expected` exit) | `--emit` + `link-exe-runner` + qemu/podman |
+`test/asm/regen.sh` regenerates from host `as` / `objdump` (maintainer
+aid;
not run by CI). -D and J need the host arch to match `CFREE_TEST_ARCH` (no cross-JIT); -E uses qemu/podman per `test/lib/exec_target.sh` and is cross-host -friendly. +### Path letters — `test/asm/run.sh` and `test/cg/run.sh` -### Skip sidecars +| letter | path | check | +|---|---|---| +| `H` | hex encode | `--encode` → diff `.expected.hex` | +| `T` | text decode | `--decode` → diff `.expected.txt` | +| `L` | listing | `--listing` → diff `.expected.lst` | +| `D` | direct JIT | `--jit` → exit code | +| `J` | JIT via file | `--emit` + `jit-runner` | +| `E` | ELF exec | `--emit` + `link-exe-runner` + qemu/podman | +| `S` | asm round-trip (cg corpus) | decode every cg-emitted insn, re-assemble, byte-compare | -The phase-1 smoke `.skip` sidecars are gone; the corresponding -subsystems are real. New cases that hit an unimplemented mnemonic or -directive can still drop a `<name>.skip` sidecar — single-line reason -— and the harness will report SKIP. Run with -`CFREE_TEST_ALLOW_SKIP=0` to surface skips as failures (the default -in CI from phase 3 onward). - -### Cross-target +`S` is opt-in on `test/cg/run.sh` (default matrix stays `DREJW`) until the +remaining cg-emitted formats land in `aa64_insn_table`. Run explicitly: ``` -CFREE_TEST_ARCH=aa64 bash test/asm/run.sh # default -CFREE_TEST_ARCH=x64 bash test/asm/run.sh # x64 lane (no green cases yet) -CFREE_TEST_ARCH=rv64 bash test/asm/run.sh # rv64 lane +bash test/cg/run.sh '' DREJWS # full matrix incl. S +bash test/cg/run.sh '' S # just S ``` -### The `S` path on `test/cg/run.sh` +--- -`S` (asm roundtrip across every cg-emitted aarch64 binary) is -recognized but opt-in until the assembler covers every cg-emitted -format. The default cg matrix stays `DREJW`. Run it explicitly: +## 6. Remaining TODOs + +End-to-end inline asm: + +- [ ] `test/cg/cases_asm.c` smoke case (smallest: `__asm__ volatile("mov + w0,%w0; svc #0" :: "r"(rc) : "x0")`) wired into the cg corpus on the + `DREJWS` path matrix. 
+- [ ] Lift `test/parse/cases/asm_01_grammar.skip` once `dmb sy` and
+  `asm goto` reach `aa64_insn_table` + parser.
+- [ ] `aa64_asm_run_template`: support multi-digit operand index (`%10+`);
+  respect `;` inside `[...]` / quoted strings rather than splitting on
+  every `;`.
+- [ ] `aa_asm_block`: consume `clobbers[]` to inform the aarch64 RA of
+  register-name clobbers (today v1 relies on Track B's `"memory"`
+  spill + parser pass-through; register-name clobbers don't yet evict
+  from the allocator).
+
+`aa64_insn_table` coverage (gates `S` on the full cg corpus):
+
+- [ ] bitfield (`ubfm`/`sbfm` families)
+- [ ] condsel (`csel`/`csinc`/`csinv`/`csneg`)
+- [ ] FP-DP1 / FP-DP2 (`fadd`/`fsub`/`fmul`/`fdiv`/`fneg`/`fabs`/`fsqrt`)
+- [ ] FP↔int cvt (`fcvtzs`/`scvtf` families)
+- [ ] ldst-exclusive (`ldxr`/`stxr`/`ldaxr`/`stlxr`)
+- [ ] memory barriers (`dmb`/`dsb`/`isb`/`clrex`)
+- [ ] system reg access (`mrs`/`msr`)
+- [ ] data-processing 1-source (`rbit`/`rev`/`clz`/`cls`)
+- [ ] SIMD basic (the cg-emitted subset)
+
+Once these land, flip `S` into the default cg matrix and drop the
+`asm_01_grammar.skip`.
+
+Multiarch seam (Phase 5):
+
+- [ ] `arch/disasm.c::arch_disasm_new` switches by `c->target.arch`
+  (currently aarch64-only; x64 / rv64 panic).
+- [ ] `parse_asm` driver dispatches per-arch instruction parser by
+  `c->target.arch`. `aa64_asm_open` becomes one of N constructors.
+- [ ] `cfree_arch_register_name` dispatched the same way.
+- [ ] `x64_isa.{h,c}` + `rv64_isa.{h,c}` skeletons (formats + tables,
+  not populated). x64 brings AT&T, rv64 brings GNU.
+
+Driver / runtime:
+
+- [ ] CFI directives (`.cfi_*`) are parsed and accepted-but-ignored;
+  forward to `MCEmitter.cfi_*` once those hooks store records.
+- [ ] `.loc` / `.file` likewise accepted-and-ignored; wire to
+  `mc->set_loc` for inline DWARF line info.
+- [ ] `__cfree_setjmp.s` (and any other `.s` in `rt/`) migrate from clang
+  to `parse_asm` once the assembler proves on the cg corpus.
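Reviewer sketch, not part of the commit: the `aa64_asm_run_template` TODO above asks for two things, a multi-digit operand index and a statement splitter that does not break on a `;` inside `[...]` or a quoted string. The C below shows one way this could look. The names `stmt_len`, `operand_index`, and `print_stmts` are hypothetical and do not come from the cfree sources; the real walker works on `StrBuf` and a memory-backed Lexer, which this sketch leaves out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: length of the first statement in `s`. Scanning stops
 * at a newline, or at a ';' that is outside [...] and outside "...". */
size_t stmt_len(const char *s) {
    int depth = 0;   /* nesting level of '[' ... ']' */
    int in_str = 0;  /* nonzero while inside a double-quoted string */
    size_t i;
    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        if (in_str) {
            if (c == '\\' && s[i + 1] != '\0') i++;  /* skip escaped char */
            else if (c == '"') in_str = 0;
            continue;
        }
        if (c == '"') in_str = 1;
        else if (c == '[') depth++;
        else if (c == ']' && depth > 0) depth--;
        else if (c == '\n') break;
        else if (c == ';' && depth == 0) break;
    }
    return i;
}

/* Hypothetical helper: parse the decimal operand index that follows '%'
 * (or "%w" / "%x"). Returns the number of digits consumed, 0 if none. */
size_t operand_index(const char *s, unsigned *out) {
    size_t n = 0;
    unsigned v = 0;
    assert(s != NULL && out != NULL);
    while (isdigit((unsigned char)s[n])) {
        v = v * 10u + (unsigned)(s[n] - '0');
        n++;
    }
    if (n > 0) *out = v;
    return n;
}

/* Walk a template and print one statement per line. */
void print_stmts(const char *tmpl) {
    const char *p = tmpl;
    while (*p != '\0') {
        size_t n = stmt_len(p);
        printf("stmt: %.*s\n", (int)n, p);
        p += n;
        if (*p != '\0') p++;  /* step over the ';' or '\n' separator */
    }
}
```

Tracking bracket depth and string state in the splitter keeps the per-mnemonic parsers unchanged, which seems to match the "one operand grammar, one lexer" decision recorded in this diff.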
+
+Deferred / explicitly out of scope:
+
+- Macros, `.if`, `.macro`, `.altmacro` inside templates — same deferral
+  as standalone.
+- Multi-alternative constraints; most GCC letter constraints — tracked
+  under `DESIGN.md §10`.
+- WAT for WASM (separate document).

-```
-bash test/cg/run.sh '' DREJWS  # full matrix incl. S
-bash test/cg/run.sh '' S       # just S
-```
+---
+
+## 7. Decisions

-`S` becomes part of the default cg matrix once the remaining
-codegen-only formats (bitfield, condsel, FP-DP1/2, FP↔int cvt,
-ldst-exclusive, dmb/clrex, mrs, dp1, SIMD basic) gain
-`aa64_insn_table` rows and matching `aa64_asm` parsers.
+- **Disasm immediate format**: context-sensitive. Signed decimal for fields
+  the ISA defines as signed (branch displacements, signed-imm12 add/sub,
+  load/store offsets). `0x`-prefixed hex everywhere else. `aa64_<fmt>_print`
+  carries a per-field signedness bit; goldens lock the chosen form.
+- **`.s` constant expressions**: arithmetic with parens — `+ - * / % << >>
+  & | ^ ~` over signed integer constants and `sym + const` terms.
+  Symbol-involving expressions restricted to `(sym ± const)`. Reloc-modifier
+  syntax (`:lo12:sym`, `:got:sym`) and macro counters (`\@`) are deferred.
+- **Absolute relocs in `.s`** (e.g. `.quad some_sym + 8` in `.data`) go
+  through `MCEmitter.emit_reloc_at` against the existing `RelocKind` set —
+  no new mechanism.
+- **Operand transport for inline asm**: parser pushes only inputs onto the
+  CG stack; outputs come back as fresh SValues that the parser assigns to
+  the declared lvalues. Matches the `cg_inline_asm` docstring at
+  `src/cg/cg.h:178-181`.
+- **Template lexing**: pre-substitute placeholders to physical asm text and
+  re-lex via the standalone parser, instead of carrying `Operand`s in `Tok`
+  variants. One operand grammar, one lexer; cost is one extra StrBuf pass
+  per inline block.
+- **Memory clobber**: route through `target->spill_reg` /
+  `target->reload_reg` — same machinery cg already uses across function
+  calls. No new flush mechanism.
+- **`asm goto`**: parsed and rejected in `cg_inline_asm`. Keyword grammar
+  ships; label-ref machinery does not.
+- **Self-hosting**: per `DESIGN.md §12`, anything in `src/` must be
+  C11-freestanding-writable. `parse_asm.c`, `aa64_asm.c`, `parse_asm_stmt`
+  in `parse.c` all follow the rule. `rt/` is on its own bootstrap track.
diff --git a/doc/INLINEASM.md b/doc/INLINEASM.md
@@ -1,325 +0,0 @@
-# INLINEASM — Phase 4b plan
-
-Scope: bring up GCC-style `asm("...")` inline assembly for aarch64. Companion
-to `doc/ASM.md` (which covers the standalone `.s` pipeline already shipped in
-phases 1–4a). This document is the per-phase detail; `doc/ASM.md §5 Phase 4b`
-is a one-paragraph pointer here.
-
-The discipline from `doc/ASM.md §3` carries over: one description of each
-instruction lives in `aa64_isa.h`, and inline asm reuses the standalone
-per-mnemonic parsers verbatim by pre-substituting placeholders into asm source
-text and re-lexing. No second copy of the operand grammar.
-
----
-
-## 1. Scope
-
-In:
-
-- `CFREE_ARCH_ARM_64` only.
-- GNU asm syntax (per `DESIGN.md §10`); `asm`, `__asm__`, `asm volatile`,
-  `asm volatile goto` accepted at the keyword level.
-- Constraints: `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0` (matching by index).
-- Clobbers: register names route through call-clobber tracking; `"memory"`
-  spills/reloads all live SValues; `"cc"` accepted-and-ignored on aarch64
-  (NZCV reserved across inline-asm blocks per `doc/ASM.md §4.3`).
-- Placeholders: `%N`, `%wN`, `%xN`, `%[name]`, `%aN`, `%%`.
-
-Deferred:
-
-- Multi-alternative constraints, most letter constraints (per `doc/ASM.md §2`).
-- `asm goto` labels — accept syntactically, error in `cg_inline_asm`.
-- Macros, `.if`, `.macro` inside templates — same deferral as standalone.
-- x64 / rv64 backends — Phase 5 multiarch seam revs the dispatch.
-
----
-
-## 2. Current state
-
-Where the seams are today:
-
-- `src/parse/parse.c` keyword table (~line 115): no `KW_ASM` /
-  `KW_BUILTIN_ASM`. `parse_stmt` (~line 5628) has no asm dispatch arm.
-- `src/cg/cg.h:182` declares `cg_inline_asm(...)`. Body at `src/cg/cg.c:1522`
-  is a panic stub. The contract docstring at lines 178-181 already says
-  inputs ride the CG stack and outputs are pushed back as fresh SValues.
-- `src/arch/aarch64.c:3099` (`aa_asm_block`) — panic stub. Same shape on
-  `src/arch/x64.c:2783` (`x_asm_block`) and `src/arch/rv64.c:2529`
-  (`rv_asm_block`).
-- `src/opt/opt.c:693` (`w_asm_block`) — panic stub. Replay at
-  `src/opt/opt.c:815` is a `break;`.
-- `src/opt/ir.h:88` defines `IR_ASM_BLOCK`; `src/opt/ir.h:170-177` defines
-  `IRAsmAux { tmpl, outs, ins, clobbers, out_ops, nout, nin, nclob }`.
-- `src/arch/arch.h:251-269` defines `Operand` (kind: `OPK_IMM`, `OPK_REG`,
-  `OPK_LOCAL`, `OPK_GLOBAL`, `OPK_INDIRECT`). `src/arch/arch.h:370-374`
-  defines `AsmConstraint { str, dir }`.
-- `src/parse/parse_asm_helpers.h` exposes `asm_driver_*` token plumbing,
-  `asm_driver_parse_const`, `asm_driver_parse_sym_expr`, `asm_driver_intern_sym`,
-  `asm_driver_panic`, etc. The `AsmDriver` struct stays internal to
-  `src/parse/parse_asm.c:46-68`.
-- `src/arch/aa64_asm.h` exposes `aa64_asm_open(c)`, `aa64_asm_close(a)`,
-  `aa64_asm_insn(a, d, mnemonic)`. Per-mnemonic table at
-  `src/arch/aa64_asm.c:787` (`kTable[]`); `aa64_asm_insn` linear-scans by
-  case-insensitive name.
-
-So: every panic site is in place, the IR carrier exists, and the standalone
-asm parser is reusable. Phase 4b wires them together.
-
----
-
-## 3. End-to-end flow
-
-```
-asm("...") in C source
-        │
-        ▼  parse.c::parse_asm_stmt
-push input exprs onto CG stack, build AsmConstraint[] + clobber Sym[]
-        │
-        ▼  cg.c::cg_inline_asm
-pop inputs → materialize Operand per constraint
-allocate regs for outputs (r/=r/+r/=&r honoring matching `0`)
-if "memory" clobber: spill all live SValues
-call g->target->asm_block(..., out_ops, in_ops, clobbers)
-push out SValues for parser to assign to lvalues
-        │
-        ├─► arch/aarch64.c::aa_asm_block
-        │     aa64_asm_open → aa64_inline_bind(out_ops, in_ops, clobbers)
-        │     aa64_asm_run_template(mc, tmpl)
-        │       substitute %0/%w0/%x0/%[name]/%a0 → physical text
-        │       for each line: lex via memory-backed Lexer, dispatch
-        │         through aa64_asm_insn (existing per-mnemonic table)
-        │     aa64_asm_close
-        │
-        └─► opt/opt.c::w_asm_block (recorder)
-              arena-copy template/outs/ins/clobbers/in_ops into IRAsmAux
-              rec(o, IR_ASM_BLOCK); replay xlat_op's operands and forwards
-              to the wrapped target.
-```
-
----
-
-## 4. Files
-
-| File | Change |
-|------|--------|
-| `src/parse/parse.c` (~5705) | Add `KW_ASM` + `KW_BUILTIN_ASM` to `kw_names[]`; add `parse_asm_stmt` and dispatch from `parse_stmt`. Reuse `pool_intern_cstr(p->c->global, ...)` for constraint/clobber strings. |
-| `src/cg/cg.c:1522` (`cg_inline_asm`) | Replace panic body. Pop `nin` SValues via `pop(g)`; materialize per-constraint into `Operand`s; allocate output regs via `g->target->alloc_reg`; honor matching `0` constraints; on `"memory"` clobber spill live regs through `g->target->spill_reg`; call `g->target->asm_block`; push out SValues. |
-| `src/cg/cg.h:182` | Signature stays — outputs and inputs ride the CG stack, contract already documented at lines 178-181. |
-| `src/arch/aa64_asm.h` | Add `aa64_inline_bind(AA64Asm*, ...)`, `aa64_asm_run_template(AA64Asm*, MCEmitter*, const char* tmpl)` declarations. |
-| `src/arch/aa64_asm.c` | Implement template walker: `%N` / `%wN` / `%xN` / `%[name]` / `%aN` render, per-line memory-Lexer → minimal inline `AsmDriver` → existing `aa64_asm_insn`. Width-check `%wN` vs format's `sf` expectation. |
-| `src/arch/aarch64.c:3099` (`aa_asm_block`) | Replace panic body with `aa64_asm_open → aa64_inline_bind → aa64_asm_run_template → aa64_asm_close`. |
-| `src/parse/parse_asm.c` + `src/parse/parse_asm_helpers.h` | Expose `asm_driver_open_inline(c, mc, lexer)` constructor for an inline-mode `AsmDriver` that reads from a memory buffer + caller-supplied MCEmitter. `AsmDriver` stays opaque. |
-| `src/opt/opt.c:693` (`w_asm_block`) | Mirror `w_call` (lines 495-513): `arena_znew(IRAsmAux)`, copy `tmpl` + `outs` / `ins` / `clobbers` via `arena_array`, copy `in_ops`, allocate `out_ops` slots; `rec(o, IR_ASM_BLOCK)`. |
-| `src/opt/opt.c:815` (`IR_ASM_BLOCK` replay) | Fetch aux, `xlat_op` each `in_ops[i]` and `out_ops[i]`, call `w->asm_block(...)` with materialized arrays. |
-| `src/arch/x64.c:2783`, `src/arch/rv64.c:2529` | Leave panics. Phase 5 work. |
-
----
-
-## 5. Constraint binder (`cg_inline_asm` core)
-
-V1 set per `doc/ASM.md §4.3`:
-
-| constraint | behavior |
-|------------|----------|
-| `r` (in) | pop SValue; `force_reg` to ensure `OPK_REG`; bind to that physical reg |
-| `=r` (out) | `t->alloc_reg(t, RC_INT, type)`; out_ops[i].kind = OPK_REG |
-| `+r` (inout) | reuse the matching input's reg (popped first); out_ops shares it |
-| `=&r` (early-clobber) | `t->alloc_reg` from the set disjoint from already-bound input regs |
-| `i` | pop SValue; assert `op.kind == OPK_IMM`, else `compiler_panic` |
-| `m` | pop SValue; if not already `OPK_INDIRECT`, materialize via the lvalue address machinery (or allocate a scratch base reg + store) |
-| `0`/`1`/… | matching: bind input slot to the same physical reg as the referenced output slot |
-
-Clobbers:
-
-- `"memory"` — call `g->target->spill_reg` for every live reg-resident SValue
-  on the stack before the block; mark them spilled so subsequent reads reload
-  through `g->target->reload_reg`. Same machinery cg already uses across
-  function calls.
-- Register-name clobbers (`"x0"`, …) — pass through to `target->asm_block`;
-  the aarch64 backend routes them through the existing call-clobber tracking
-  (`t->clobbers` exposes the same set).
-- `"cc"` — silently dropped on aarch64.
-
-After the block:
-
-- For each `=r` / `+r` / `=&r` output, push `SValue{op = out_ops[i],
-  type = output_expr_type}` onto the value stack. The parser then assigns each
-  output SValue to its lvalue expression via the standard cg mechanisms.
-
----
-
-## 6. Template walker (`aa64_asm_run_template`)
-
-1. Split `tmpl` on `\n` and `;` into asm lines.
-2. For each line, scan for `%` placeholders and emit substitutions into a
-   per-call `StrBuf` (`src/core/strbuf.h`):
-   - `%N` → register name in default form (use `%xN` width for int reg
-     operands, `#imm` for `OPK_IMM`, `[xN, #ofs]` for `OPK_INDIRECT`).
-   - `%wN` / `%xN` → forces 32 / 64 register form (diagnostic on mismatch
-     with format's `sf` width).
-   - `%[name]` → resolved via the optional `[name]` syntax on constraints
-     (capture name during parse and store on AsmConstraint).
-   - `%aN` → `[Xn, #ofs]` materialization for `m` constraint.
-   - `%%` → literal `%`.
-3. Open a memory-backed Lexer over the rendered line, construct a minimal
-   inline `AsmDriver` via `asm_driver_open_inline(c, mc, lexer)`, dispatch
-   `aa64_asm_insn(a, d, mnemonic)` exactly like the standalone driver.
-4. Branches inside inline asm emit relocs against locally-interned symbol
-   names — labels declared in the template (if any) are interned the same way
-   `parse_asm.c` already handles them.
-
----
-
-## 7. Parallelization
-
-The work splits cleanly along three seams that each compile and unit-test in
-isolation. Three engineers (or three Claude sessions) can land them in
-parallel; the integration commit is small.
-
-```
-    ┌───────────────────────┐
-    │ Track A — frontend    │  parse.c keyword + parse_asm_stmt;
-    │                       │  exercised by a temporary stub
-    │                       │  cg_inline_asm that just records args.
-    └─────────┬─────────────┘
-              │ shared: AsmConstraint[], clobber Sym[], template str
-    ┌─────────┴─────────────┐
-    │ Track B — cg / opt    │  cg_inline_asm body + w_asm_block
-    │                       │  recorder + IR_ASM_BLOCK replay.
-    │                       │  Mocks target->asm_block to a logger
-    │                       │  for unit tests.
-    └─────────┬─────────────┘
-              │ shared: Operand[] in_ops/out_ops, Sym[] clobbers, tmpl
-    ┌─────────┴─────────────┐
-    │ Track C — aa64 backend│  aa64_inline_bind +
-    │                       │  aa64_asm_run_template + aa_asm_block.
-    │                       │  Driven by a tiny C-test that builds
-    │                       │  Operand arrays by hand.
-    └───────────────────────┘
-```
-
-### 7.1 Tracks
-
-- **Track A — Frontend (`parse.c`)** — owns `KW_ASM` / `KW_BUILTIN_ASM`
-  keyword registration, GNU asm statement grammar (`asm [volatile] [goto]
-  (...)`, the four colon-separated lists), constraint string interning. Stops
-  at the call to `cg_inline_asm`. Lands behind the existing panic stub on
-  `cg_inline_asm`, so the parser merges first; new asm cases panic cleanly
-  until B lands.
-- **Track B — CG + Opt (`cg.c`, `opt.c`)** — owns `cg_inline_asm` body
-  (constraint binder, output reg allocation, `"memory"` clobber spill),
-  `w_asm_block` recorder, `IR_ASM_BLOCK` replay. Lands against a
-  `target->asm_block` mock that appends `(tmpl, in_ops, out_ops, clobbers)`
-  to a log so unit tests can assert the binder's contract without depending
-  on the aarch64 backend being ready.
-- **Track C — aarch64 backend (`aa64_asm.c`, `aarch64.c`)** — owns
-  `aa64_inline_bind`, `aa64_asm_run_template`, the placeholder substitution
-  pass, `asm_driver_open_inline`, and the `aa_asm_block` thunk. Lands with a
-  tiny C unit test that constructs `Operand` arrays directly and invokes
-  `aa_asm_block` against an in-process MCEmitter, so it can be merged before
-  A or B is finished.
-
-### 7.2 Cut points and contracts
-
-The two seams are typed contracts each track can pin in a header without
-needing the others' implementations:
-
-1. **Parser → CG**: `cg_inline_asm(g, tmpl, outs, nout, ins, nin, clobbers,
-   nclob)`. Already declared at `src/cg/cg.h:182`. Track A targets the
-   declared signature; Track B implements it; Track A merges with the
-   panic-stub still in place if B is not ready yet.
-2. **CG → Target**: `target->asm_block(t, tmpl, outs, nout, out_ops, ins,
-   nin, in_ops, clobbers, nclob)`. Already declared at
-   `src/arch/arch.h:605-608`. Track B can bind against
-   `aa_panic("asm_block")` while Track C builds; cg-side tests use a
-   dedicated mock `CGTarget` so they don't depend on landing order.
-
-### 7.3 Integration order
-
-A and C can land in either order. B blocks neither but unlocks end-to-end
-green:
-
-1. A merges (asm keyword + parser + stub call). cg corpus stays green
-   because no test exercises asm yet; new asm cases panic at the
-   `cg_inline_asm` stub.
-2. C merges (aa64 inline backend + unit test). cg corpus stays green; unit
-   test gates the placeholder walker.
-3. B merges. The stub goes away. The new `test/cg/cases_asm.c` smoke flips
-   to green on `DREJWS`. `S` path validates encode/decode pairing across the
-   new inline-asm bytes.
-
-If only one engineer is on the work, do A → C → B (front-to-back) so each
-step lands an exercisable surface; if two, run A and C in parallel and
-merge B last.
-
----
-
-## 8. Testing
-
-Inline asm is behavioral C with exit-code assertions, which is exactly what
-`test/cg/cases_*.c` already does. Add `test/cg/cases_asm.c` (or fold into an
-existing bucket) registered through `cg-runner` the same way every other
-case is.
-
-Smallest smoke case:
-
-```c
-void test_main(void) {
-  int rc = 42;
-  __asm__ volatile("mov w0, %w0; svc #0" : : "r"(rc) : "x0");
-}
-```
-
-Path matrix:
-
-- `D` (direct JIT), `J` (JIT via file), `E` (ELF exec under qemu/podman) —
-  exit-code assertions.
-- `R` (round-trip), `W` (opt-recorder) — exercise `w_asm_block` /
-  `IR_ASM_BLOCK` replay.
-- `S` (asm round-trip) — encode/decode pairing across the new inline-asm
-  bytes; if this fails, fix the format definition in `aa64_isa.h`, never the
-  parser site.
-
-Run the full matrix:
-
-```
-bash test/cg/run.sh '' DREJWS
-```
-
-Coverage cases (one each so the binder can't silently regress):
-
-- `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0` (matching).
-- `"memory"` clobber.
-- Register-name clobber.
-
-Cross-target panic preserved:
-
-```
-CFREE_TEST_ARCH=x64 bash test/cg/run.sh ''
-```
-
-still panics cleanly via the existing `x_panic("asm_block")` on inline-asm
-cases — no silent miscompile.
-
-`make test-asm` is unaffected (no changes to the standalone-`.s` codepath).
-
----
-
-## 9. Decisions
-
-- **Operand transport**: parser pushes only inputs onto the CG stack;
-  outputs come back as fresh SValues that the parser assigns to the
-  declared lvalues. This matches the existing `cg_inline_asm` docstring at
-  `src/cg/cg.h:178-181`.
-- **Template lexing**: pre-substitute placeholders to physical asm text and
-  re-lex via the standalone parser, instead of a Tok variant carrying an
-  `Operand`. Keeps one operand grammar and one lexer; the cost is one extra
-  StrBuf pass per inline block.
-- **Memory clobber**: route through `target->spill_reg` /
-  `target->reload_reg`, the same machinery cg uses across function calls.
-  No new flush mechanism.
-- **`asm volatile`**: accepted but informational — `IR_ASM_BLOCK` is
-  already opaque-to-passes per `doc/ASM.md §9.5`, so volatile changes
-  nothing at the IR level today.
-- **`asm goto`**: parsed and rejected in `cg_inline_asm`. Phase 4b ships
-  the keyword grammar but not the label-ref machinery.
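
Reviewer sketch, not part of the commit: the "Template lexing" decision (pre-substitute placeholders into physical asm text, then re-lex) can be illustrated with a much-reduced substitution pass. The function `render_template` below is hypothetical. It assumes every operand is an integer register identified only by its number, so it covers `%N`, `%wN`, `%xN`, and `%%` and omits `%[name]`, `%aN`, immediates, and memory operands. Following the removed walker notes, plain `%N` renders the 64-bit name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the walker's substitution pass.
 * regs[i] is the physical register number bound to operand i.
 * Returns 0 on success, -1 on a bad placeholder or a full buffer. */
int render_template(const char *tmpl, const unsigned *regs, size_t nregs,
                    char *out, size_t cap) {
    size_t o = 0;
    assert(tmpl != NULL && out != NULL);
    if (cap == 0) return -1;
    for (size_t i = 0; tmpl[i] != '\0'; i++) {
        char form = 'x';       /* plain %N renders the 64-bit register name */
        size_t idx = 0;
        int n;

        if (tmpl[i] != '%') {  /* ordinary character: copy through */
            if (o + 1 >= cap) return -1;
            out[o++] = tmpl[i];
            continue;
        }
        i++;                   /* look at the character after '%' */
        if (tmpl[i] == '%') {  /* "%%" is a literal percent sign */
            if (o + 1 >= cap) return -1;
            out[o++] = '%';
            continue;
        }
        if (tmpl[i] == 'w' || tmpl[i] == 'x') form = tmpl[i++];
        if (!isdigit((unsigned char)tmpl[i])) return -1;
        while (isdigit((unsigned char)tmpl[i])) {  /* multi-digit index */
            idx = idx * 10 + (size_t)(tmpl[i] - '0');
            i++;
        }
        i--;                   /* the for loop steps past the last digit */
        if (idx >= nregs) return -1;
        n = snprintf(out + o, cap - o, "%c%u", form, regs[idx]);
        if (n < 0 || (size_t)n >= cap - o) return -1;
        o += (size_t)n;
    }
    out[o] = '\0';
    return 0;
}
```

With operand 0 bound to register 9, the smoke-case template `mov w0, %w0; svc #0` renders as `mov w0, w9; svc #0`, which is ordinary asm text the standalone per-mnemonic parsers can consume without a second operand grammar.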