commit bb761e84894752fec18b51b8b9a914e06ccc2798
parent 7229b6f010e4525747e734627c71671bdef4275b
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 11 May 2026 09:16:58 -0700
INLINEASM.md plan
Diffstat:
| M | doc/ASM.md | | | 19 | ++++++++----------- |
| A | doc/INLINEASM.md | | | 325 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
2 files changed, 333 insertions(+), 11 deletions(-)
diff --git a/doc/ASM.md b/doc/ASM.md
@@ -505,17 +505,14 @@ output of `cfree as` round-trips the smoke corpus.
### Phase 4b — inline asm
-1. Implement `aa_asm_block` in `arch/aarch64.c` calling into
- `aa64_asm_insn` against a template-driven token source. Implement
- `cg_inline_asm` in `cg/cg.c`: evaluate inputs to `Operand`s,
- materialize `&buf` for `m` constraints, call `target->asm_block`,
- push `out_ops` back as `SValue`s.
-2. Constraint binding (§4.3): `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0`.
-3. Memory clobber: CG flushes value stack (`spill_reg` for every live
- reg-resident SValue) before the call, marks them invalid after.
- Register clobbers route through the existing `clobbers` mechanism.
-4. `IR_ASM_BLOCK` already opaque-to-passes; opt recorder
- (`opt.c:692`) materializes operands and replays.
+See `doc/INLINEASM.md` for the detailed plan (scope, files, constraint
+binder, template walker, parallelization across three tracks, testing).
+The summary: stand up `parse_asm_stmt` in `parse.c`, implement
+`cg_inline_asm` (constraint binder + `"memory"` clobber spill), implement
+`aa_asm_block` + `aa64_asm_run_template` in `aa64_asm.c`, wire
+`w_asm_block` recorder + `IR_ASM_BLOCK` replay in `opt.c`. Three tracks
+(frontend / cg+opt / aa64) merge in any order behind the existing panic
+stubs.
Exit criterion: the inline-asm cases under `test/cg/` (svc-style
write-then-exit) build, run under qemu/podman, and report green on
diff --git a/doc/INLINEASM.md b/doc/INLINEASM.md
@@ -0,0 +1,325 @@
+# INLINEASM — Phase 4b plan
+
+Scope: bring up GCC-style `asm("...")` inline assembly for aarch64. Companion
+to `doc/ASM.md` (which covers the standalone `.s` pipeline already shipped in
+phases 1–4a). This document is the per-phase detail; `doc/ASM.md §5 Phase 4b`
+is a one-paragraph pointer here.
+
+The discipline from `doc/ASM.md §3` carries over: one description of each
+instruction lives in `aa64_isa.h`, and inline asm reuses the standalone
+per-mnemonic parsers verbatim by pre-substituting placeholders into asm source
+text and re-lexing. No second copy of the operand grammar.
+
+---
+
+## 1. Scope
+
+In:
+
+- `CFREE_ARCH_ARM_64` only.
+- GNU asm syntax (per `DESIGN.md §10`); `asm`, `__asm__`, `asm volatile`,
+ `asm volatile goto` accepted at the keyword level.
+- Constraints: `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0` (matching by index).
+- Clobbers: register names route through call-clobber tracking; `"memory"`
+ spills/reloads all live SValues; `"cc"` accepted-and-ignored on aarch64
+ (NZCV reserved across inline-asm blocks per `doc/ASM.md §4.3`).
+- Placeholders: `%N`, `%wN`, `%xN`, `%[name]`, `%aN`, `%%`.
+
+Deferred:
+
+- Multi-alternative constraints, most letter constraints (per `doc/ASM.md §2`).
+- `asm goto` labels — accept syntactically, error in `cg_inline_asm`.
+- Macros, `.if`, `.macro` inside templates — same deferral as standalone.
+- x64 / rv64 backends — Phase 5 multiarch seam revs the dispatch.
+
+---
+
+## 2. Current state
+
+Where the seams are today:
+
+- `src/parse/parse.c` keyword table (~line 115): no `KW_ASM` /
+ `KW_BUILTIN_ASM`. `parse_stmt` (~line 5628) has no asm dispatch arm.
+- `src/cg/cg.h:182` declares `cg_inline_asm(...)`. Body at `src/cg/cg.c:1522`
+ is a panic stub. The contract docstring at lines 178-181 already says
+ inputs ride the CG stack and outputs are pushed back as fresh SValues.
+- `src/arch/aarch64.c:3099` (`aa_asm_block`) — panic stub. Same shape on
+ `src/arch/x64.c:2783` (`x_asm_block`) and `src/arch/rv64.c:2529`
+ (`rv_asm_block`).
+- `src/opt/opt.c:693` (`w_asm_block`) — panic stub. Replay at
+ `src/opt/opt.c:815` is a `break;`.
+- `src/opt/ir.h:88` defines `IR_ASM_BLOCK`; `src/opt/ir.h:170-177` defines
+ `IRAsmAux { tmpl, outs, ins, clobbers, out_ops, nout, nin, nclob }`.
+- `src/arch/arch.h:251-269` defines `Operand` (kind: `OPK_IMM`, `OPK_REG`,
+ `OPK_LOCAL`, `OPK_GLOBAL`, `OPK_INDIRECT`). `src/arch/arch.h:370-374`
+ defines `AsmConstraint { str, dir }`.
+- `src/parse/parse_asm_helpers.h` exposes `asm_driver_*` token plumbing,
+ `asm_driver_parse_const`, `asm_driver_parse_sym_expr`, `asm_driver_intern_sym`,
+ `asm_driver_panic`, etc. The `AsmDriver` struct stays internal to
+ `src/parse/parse_asm.c:46-68`.
+- `src/arch/aa64_asm.h` exposes `aa64_asm_open(c)`, `aa64_asm_close(a)`,
+ `aa64_asm_insn(a, d, mnemonic)`. Per-mnemonic table at
+ `src/arch/aa64_asm.c:787` (`kTable[]`); `aa64_asm_insn` linear-scans by
+ case-insensitive name.
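
The lookup described in the last bullet can be pictured with a minimal sketch. Everything here is hypothetical: `DemoMnemonic`, `demo_table`, and `demo_lookup` are illustration-only names, and the real `kTable[]` rows presumably carry a per-mnemonic parser rather than an integer id.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for one kTable[] row; not the real layout. */
typedef struct {
    const char *name;
    int id;
} DemoMnemonic;

static const DemoMnemonic demo_table[] = {
    { "mov", 1 }, { "add", 2 }, { "svc", 3 },
};

/* ASCII case-insensitive equality, written out so the sketch does not
 * depend on the non-standard strcasecmp. */
static int demo_ieq(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Linear scan over the table; returns NULL when the mnemonic is unknown. */
static const DemoMnemonic *demo_lookup(const char *mnemonic) {
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++)
        if (demo_ieq(demo_table[i].name, mnemonic)) return &demo_table[i];
    return NULL;
}
```

Because inline templates are re-lexed and dispatched through the same table, a template line such as `MOV w0, w9` would resolve to the same row as its lowercase spelling.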
+
+So: every panic site is in place, the IR carrier exists, and the standalone
+asm parser is reusable. Phase 4b wires them together.
+
+---
+
+## 3. End-to-end flow
+
+```
+asm("...") in C source
+ │
+ ▼ parse.c::parse_asm_stmt
+push input exprs onto CG stack, build AsmConstraint[] + clobber Sym[]
+ │
+ ▼ cg.c::cg_inline_asm
+pop inputs → materialize Operand per constraint
+allocate regs for outputs (r/=r/+r/=&r honoring matching `0`)
+if "memory" clobber: spill all live SValues
+call g->target->asm_block(..., out_ops, in_ops, clobbers)
+push out SValues for parser to assign to lvalues
+ │
+ ├─► arch/aarch64.c::aa_asm_block
+ │ aa64_asm_open → aa64_inline_bind(out_ops, in_ops, clobbers)
+ │ aa64_asm_run_template(mc, tmpl)
+ │ substitute %0/%w0/%x0/%[name]/%a0 → physical text
+ │ for each line: lex via memory-backed Lexer, dispatch
+ │ through aa64_asm_insn (existing per-mnemonic table)
+ │ aa64_asm_close
+ │
+ └─► opt/opt.c::w_asm_block (recorder)
+ arena-copy template/outs/ins/clobbers/in_ops into IRAsmAux
+ rec(o, IR_ASM_BLOCK); on replay, xlat_op each operand and forward
+ to the wrapped target.
+```
+
+---
+
+## 4. Files
+
+| File | Change |
+|------|--------|
+| `src/parse/parse.c` (~5705) | Add `KW_ASM` + `KW_BUILTIN_ASM` to `kw_names[]`; add `parse_asm_stmt` and dispatch from `parse_stmt`. Reuse `pool_intern_cstr(p->c->global, ...)` for constraint/clobber strings. |
+| `src/cg/cg.c:1522` (`cg_inline_asm`) | Replace panic body. Pop `nin` SValues via `pop(g)`; materialize per-constraint into `Operand`s; allocate output regs via `g->target->alloc_reg`; honor matching `0` constraints; on `"memory"` clobber spill live regs through `g->target->spill_reg`; call `g->target->asm_block`; push out SValues. |
+| `src/cg/cg.h:182` | Signature stays — outputs and inputs ride the CG stack, contract already documented at lines 178-181. |
+| `src/arch/aa64_asm.h` | Add `aa64_inline_bind(AA64Asm*, ...)`, `aa64_asm_run_template(AA64Asm*, MCEmitter*, const char* tmpl)` declarations. |
+| `src/arch/aa64_asm.c` | Implement template walker: `%N` / `%wN` / `%xN` / `%[name]` / `%aN` render, per-line memory-Lexer → minimal inline `AsmDriver` → existing `aa64_asm_insn`. Width-check `%wN` vs format's `sf` expectation. |
+| `src/arch/aarch64.c:3099` (`aa_asm_block`) | Replace panic body with `aa64_asm_open → aa64_inline_bind → aa64_asm_run_template → aa64_asm_close`. |
+| `src/parse/parse_asm.c` + `src/parse/parse_asm_helpers.h` | Expose `asm_driver_open_inline(c, mc, lexer)` constructor for an inline-mode `AsmDriver` that reads from a memory buffer + caller-supplied MCEmitter. `AsmDriver` stays opaque. |
+| `src/opt/opt.c:693` (`w_asm_block`) | Mirror `w_call` (lines 495-513): `arena_znew(IRAsmAux)`, copy `tmpl` + `outs` / `ins` / `clobbers` via `arena_array`, copy `in_ops`, allocate `out_ops` slots; `rec(o, IR_ASM_BLOCK)`. |
+| `src/opt/opt.c:815` (`IR_ASM_BLOCK` replay) | Fetch aux, `xlat_op` each `in_ops[i]` and `out_ops[i]`, call `w->asm_block(...)` with materialized arrays. |
+| `src/arch/x64.c:2783`, `src/arch/rv64.c:2529` | Leave panics. Phase 5 work. |
+
+---
+
+## 5. Constraint binder (`cg_inline_asm` core)
+
+V1 set per `doc/ASM.md §4.3`:
+
+| constraint | behavior |
+|------------|----------|
+| `r` (in) | pop SValue; `force_reg` to ensure `OPK_REG`; bind to that physical reg |
+| `=r` (out) | `t->alloc_reg(t, RC_INT, type)`; out_ops[i].kind = OPK_REG |
+| `+r` (inout) | reuse the matching input's reg (popped first); out_ops shares it |
+| `=&r` (early-clobber) | `t->alloc_reg` from the set disjoint from already-bound input regs |
+| `i` | pop SValue; assert `op.kind == OPK_IMM`, else `compiler_panic` |
+| `m` | pop SValue; if not already `OPK_INDIRECT`, materialize via the lvalue address machinery (or allocate a scratch base reg + store) |
+| `0`/`1`/… | matching: bind input slot to the same physical reg as the referenced output slot |
+
+Clobbers:
+
+- `"memory"` — call `g->target->spill_reg` for every live reg-resident SValue
+ on the stack before the block; mark them spilled so subsequent reads reload
+ through `g->target->reload_reg`. Same machinery cg already uses across
+ function calls.
+- Register-name clobbers (`"x0"`, …) — pass through to `target->asm_block`;
+ the aarch64 backend routes them through the existing call-clobber tracking
+ (`t->clobbers` exposes the same set).
+- `"cc"` — silently dropped on aarch64.
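
A minimal sketch of the three-way clobber split listed above. The names (`DemoClobKind`, `demo_classify_clobber`) are hypothetical, and the sketch assumes only `xN` / `wN` register spellings in the range 0..30; the real binder may accept more names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
    DEMO_CLOB_MEMORY,  /* spill live reg-resident values before the block */
    DEMO_CLOB_CC,      /* accepted and dropped on aarch64 */
    DEMO_CLOB_REG,     /* forwarded to the target's clobber tracking */
    DEMO_CLOB_UNKNOWN
} DemoClobKind;

/* Classify one clobber string. For DEMO_CLOB_REG, *regno receives 0..30.
 * Assumption: only xN/wN names are recognized in this sketch. */
static DemoClobKind demo_classify_clobber(const char *s, int *regno) {
    assert(s != NULL && regno != NULL);
    if (strcmp(s, "memory") == 0) return DEMO_CLOB_MEMORY;
    if (strcmp(s, "cc") == 0) return DEMO_CLOB_CC;
    if ((s[0] == 'x' || s[0] == 'w') && s[1] >= '0' && s[1] <= '9') {
        char *end;
        long n = strtol(s + 1, &end, 10);
        if (*end == '\0' && n >= 0 && n <= 30) {
            *regno = (int)n;
            return DEMO_CLOB_REG;
        }
    }
    return DEMO_CLOB_UNKNOWN;
}
```

A caller would walk the clobber list once, set a "needs spill" flag on `DEMO_CLOB_MEMORY`, ignore `DEMO_CLOB_CC`, and collect register numbers for the target.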
+
+After the block:
+
+- For each `=r` / `+r` / `=&r` output, push `SValue{op = out_ops[i],
+ type = output_expr_type}` onto the value stack. The parser then assigns each
+ output SValue to its lvalue expression via the standard cg mechanisms.
+
+---
+
+## 6. Template walker (`aa64_asm_run_template`)
+
+1. Split `tmpl` on `\n` and `;` into asm lines.
+2. For each line, scan for `%` placeholders and emit substitutions into a
+ per-call `StrBuf` (`src/core/strbuf.h`):
+ - `%N` → operand in its default form (the 64-bit `%xN` register name for
+ int reg operands, `#imm` for `OPK_IMM`, `[xN, #ofs]` for `OPK_INDIRECT`).
+ - `%wN` / `%xN` → forces the 32-bit / 64-bit register form (diagnostic on
+ mismatch with the format's `sf` width).
+ - `%[name]` → resolved via the optional `[name]` syntax on constraints
+ (capture name during parse and store on AsmConstraint).
+ - `%aN` → `[xN, #ofs]` materialization for `m` constraint.
+ - `%%` → literal `%`.
+3. Open a memory-backed Lexer over the rendered line, construct a minimal
+ inline `AsmDriver` via `asm_driver_open_inline(c, mc, lexer)`, dispatch
+ `aa64_asm_insn(a, d, mnemonic)` exactly like the standalone driver.
+4. Branches inside inline asm emit relocs against locally-interned symbol
+ names — labels declared in the template (if any) are interned the same way
+ `parse_asm.c` already handles them.
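
The substitution in step 2 can be sketched as follows. This is an illustration under assumptions, not the planned walker: `DemoOperand` and `demo_render_line` are hypothetical names, a fixed caller-supplied buffer stands in for `StrBuf`, and `%[name]` and `%aN` are left out to keep it short.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { DEMO_OP_REG, DEMO_OP_IMM } DemoOpKind;

typedef struct {
    DemoOpKind kind;
    long value;  /* register number for DEMO_OP_REG, immediate otherwise */
} DemoOperand;

/* Expand %N, %wN, %xN and %% from `line` into `out` (capacity `cap`).
 * Returns false on an unknown placeholder, an out-of-range operand index,
 * a width modifier applied to an immediate, or a full buffer. */
static bool demo_render_line(const char *line, const DemoOperand *ops,
                             size_t nops, char *out, size_t cap) {
    size_t len = 0;
    assert(line != NULL && out != NULL && cap > 0);
    while (*line) {
        if (*line != '%') {
            if (len + 1 >= cap) return false;
            out[len++] = *line++;
            continue;
        }
        line++;  /* skip '%' */
        if (*line == '%') {  /* "%%" renders a literal '%' */
            if (len + 1 >= cap) return false;
            out[len++] = '%';
            line++;
            continue;
        }
        char width = 'x';  /* default register form for plain %N */
        bool forced = false;
        if (*line == 'w' || *line == 'x') { width = *line++; forced = true; }
        if (!isdigit((unsigned char)*line)) return false;
        size_t idx = 0;
        while (isdigit((unsigned char)*line))
            idx = idx * 10 + (size_t)(*line++ - '0');
        if (idx >= nops) return false;

        int n;
        if (ops[idx].kind == DEMO_OP_REG) {
            n = snprintf(out + len, cap - len, "%c%ld", width, ops[idx].value);
        } else {
            if (forced) return false;  /* %wN / %xN need a register operand */
            n = snprintf(out + len, cap - len, "#%ld", ops[idx].value);
        }
        if (n < 0 || (size_t)n >= cap - len) return false;
        len += (size_t)n;
    }
    out[len] = '\0';
    return true;
}
```

For instance, with operand 0 bound to register 9, the line `mov w0, %w0` renders as `mov w0, w9`, which is plain asm text the standalone lexer can consume without knowing about operands.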
+
+---
+
+## 7. Parallelization
+
+The work splits cleanly into three tracks that each compile and unit-test in
+isolation. Three engineers (or three Claude sessions) can land them in
+parallel; the integration commit is small.
+
+```
+ ┌───────────────────────┐
+ │ Track A — frontend │ parse.c keyword + parse_asm_stmt;
+ │ │ exercised by a temporary stub
+ │ │ cg_inline_asm that just records args.
+ └─────────┬─────────────┘
+ │ shared: AsmConstraint[], clobber Sym[], template str
+ ┌─────────┴─────────────┐
+ │ Track B — cg / opt │ cg_inline_asm body + w_asm_block
+ │ │ recorder + IR_ASM_BLOCK replay.
+ │ │ Mocks target->asm_block to a logger
+ │ │ for unit tests.
+ └─────────┬─────────────┘
+ │ shared: Operand[] in_ops/out_ops, Sym[] clobbers, tmpl
+ ┌─────────┴─────────────┐
+ │ Track C — aa64 backend│ aa64_inline_bind +
+ │ │ aa64_asm_run_template + aa_asm_block.
+ │ │ Driven by a tiny C-test that builds
+ │ │ Operand arrays by hand.
+ └───────────────────────┘
+```
+
+### 7.1 Tracks
+
+- **Track A — Frontend (`parse.c`)** — owns `KW_ASM` / `KW_BUILTIN_ASM`
+ keyword registration, GNU asm statement grammar (`asm [volatile] [goto]
+ (...)`, the four colon-separated lists), constraint string interning. Stops
+ at the call to `cg_inline_asm`. Lands behind the existing panic stub on
+ `cg_inline_asm`, so the parser merges first; new asm cases panic cleanly
+ until B lands.
+- **Track B — CG + Opt (`cg.c`, `opt.c`)** — owns `cg_inline_asm` body
+ (constraint binder, output reg allocation, `"memory"` clobber spill),
+ `w_asm_block` recorder, `IR_ASM_BLOCK` replay. Lands against a
+ `target->asm_block` mock that appends `(tmpl, in_ops, out_ops, clobbers)`
+ to a log so unit tests can assert the binder's contract without depending
+ on the aarch64 backend being ready.
+- **Track C — aarch64 backend (`aa64_asm.c`, `aarch64.c`)** — owns
+ `aa64_inline_bind`, `aa64_asm_run_template`, the placeholder substitution
+ pass, `asm_driver_open_inline`, and the `aa_asm_block` thunk. Lands with a
+ tiny C unit test that constructs `Operand` arrays directly and invokes
+ `aa_asm_block` against an in-process MCEmitter, so it can be merged before
+ A or B is finished.
+
+### 7.2 Cut points and contracts
+
+The two seams are typed contracts each track can pin in a header without
+needing the others' implementations:
+
+1. **Parser → CG**: `cg_inline_asm(g, tmpl, outs, nout, ins, nin, clobbers,
+ nclob)`. Already declared at `src/cg/cg.h:182`. Track A targets the
+ declared signature; Track B implements it; Track A merges with the
+ panic-stub still in place if B is not ready yet.
+2. **CG → Target**: `target->asm_block(t, tmpl, outs, nout, out_ops, ins,
+ nin, in_ops, clobbers, nclob)`. Already declared at
+ `src/arch/arch.h:605-608`. Track B can bind against
+ `aa_panic("asm_block")` while Track C builds; cg-side tests use a
+ dedicated mock `CGTarget` so they don't depend on landing order.
+
+### 7.3 Integration order
+
+A and C can land in either order. B blocks neither but unlocks end-to-end
+green:
+
+1. A merges (asm keyword + parser + stub call). cg corpus stays green
+ because no test exercises asm yet; new asm cases panic at the
+ `cg_inline_asm` stub.
+2. C merges (aa64 inline backend + unit test). cg corpus stays green; unit
+ test gates the placeholder walker.
+3. B merges. The stub goes away. The new `test/cg/cases_asm.c` smoke flips
+ to green on `DREJWS`. `S` path validates encode/decode pairing across the
+ new inline-asm bytes.
+
+If only one engineer is on the work, do A → C → B so each step lands an
+exercisable surface; if two, run A and C in parallel and merge B last.
+
+---
+
+## 8. Testing
+
+Inline-asm cases are behavioral C with exit-code assertions, which is exactly
+what `test/cg/cases_*.c` already holds. Add `test/cg/cases_asm.c` (or fold
+into an existing bucket) registered through `cg-runner` the same way every
+other case is.
+
+Smallest smoke case:
+
+```c
+void test_main(void) {
+ int rc = 42;
+ __asm__ volatile("mov w0, %w0; svc #0" : : "r"(rc) : "x0");
+}
+```
+
+Path matrix:
+
+- `D` (direct JIT), `J` (JIT via file), `E` (ELF exec under qemu/podman) —
+ exit-code assertions.
+- `R` (round-trip), `W` (opt-recorder) — exercise `w_asm_block` /
+ `IR_ASM_BLOCK` replay.
+- `S` (asm round-trip) — encode/decode pairing across the new inline-asm
+ bytes; if this fails, fix the format definition in `aa64_isa.h`, never the
+ parser site.
+
+Run the full matrix:
+
+```
+bash test/cg/run.sh '' DREJWS
+```
+
+Coverage cases (one each so the binder can't silently regress):
+
+- `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0` (matching).
+- `"memory"` clobber.
+- Register-name clobber.
+
+Cross-target panic preserved:
+
+```
+CFREE_TEST_ARCH=x64 bash test/cg/run.sh ''
+```
+
+still panics cleanly via the existing `x_panic("asm_block")` on inline-asm
+cases — no silent miscompile.
+
+`make test-asm` is unaffected (no changes to the standalone-`.s` codepath).
+
+---
+
+## 9. Decisions
+
+- **Operand transport**: parser pushes only inputs onto the CG stack;
+ outputs come back as fresh SValues that the parser assigns to the
+ declared lvalues. This matches the existing `cg_inline_asm` docstring at
+ `src/cg/cg.h:178-181`.
+- **Template lexing**: pre-substitute placeholders to physical asm text and
+ re-lex via the standalone parser, instead of a Tok variant carrying an
+ `Operand`. Keeps one operand grammar and one lexer; the cost is one extra
+ StrBuf pass per inline block.
+- **Memory clobber**: route through `target->spill_reg` /
+ `target->reload_reg`, the same machinery cg uses across function calls.
+ No new flush mechanism.
+- **`asm volatile`**: accepted but informational — `IR_ASM_BLOCK` is
+ already opaque-to-passes per `doc/ASM.md §9.5`, so volatile changes
+ nothing at the IR level today.
+- **`asm goto`**: parsed and rejected in `cg_inline_asm`. Phase 4b ships
+ the keyword grammar but not the label-ref machinery.