kit

git clone https://git.ryansepassi.com/git/kit.git

commit bb761e84894752fec18b51b8b9a914e06ccc2798
parent 7229b6f010e4525747e734627c71671bdef4275b
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 11 May 2026 09:16:58 -0700

INLINEASM.md plan

Diffstat:
M doc/ASM.md | 19 ++++++++-----------
A doc/INLINEASM.md | 325 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 333 insertions(+), 11 deletions(-)

diff --git a/doc/ASM.md b/doc/ASM.md
@@ -505,17 +505,14 @@ output of `cfree as` round-trips the smoke corpus.
 
 ### Phase 4b — inline asm
 
-1. Implement `aa_asm_block` in `arch/aarch64.c` calling into
-   `aa64_asm_insn` against a template-driven token source. Implement
-   `cg_inline_asm` in `cg/cg.c`: evaluate inputs to `Operand`s,
-   materialize `&buf` for `m` constraints, call `target->asm_block`,
-   push `out_ops` back as `SValue`s.
-2. Constraint binding (§4.3): `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0`.
-3. Memory clobber: CG flushes value stack (`spill_reg` for every live
-   reg-resident SValue) before the call, marks them invalid after.
-   Register clobbers route through the existing `clobbers` mechanism.
-4. `IR_ASM_BLOCK` already opaque-to-passes; opt recorder
-   (`opt.c:692`) materializes operands and replays.
+See `doc/INLINEASM.md` for the detailed plan (scope, files, constraint
+binder, template walker, parallelization across three tracks, testing).
+The summary: stand up `parse_asm_stmt` in `parse.c`, implement
+`cg_inline_asm` (constraint binder + `"memory"` clobber spill), implement
+`aa_asm_block` + `aa64_asm_run_template` in `aa64_asm.c`, wire
+`w_asm_block` recorder + `IR_ASM_BLOCK` replay in `opt.c`. Three tracks
+(frontend / cg+opt / aa64) merge in any order behind the existing panic
+stubs.
 
 Exit criterion: the inline-asm cases under `test/cg/` (svc-style
 write-then-exit) build, run under qemu/podman, and report green on
diff --git a/doc/INLINEASM.md b/doc/INLINEASM.md
@@ -0,0 +1,325 @@
+# INLINEASM — Phase 4b plan
+
+Scope: bring up GCC-style `asm("...")` inline assembly for aarch64. Companion
+to `doc/ASM.md` (which covers the standalone `.s` pipeline already shipped in
+phases 1–4a). This document is the per-phase detail; `doc/ASM.md §5 Phase 4b`
+is a one-paragraph pointer here.
+
+The discipline from `doc/ASM.md §3` carries over: one description of each
+instruction lives in `aa64_isa.h`, and inline asm reuses the standalone
+per-mnemonic parsers verbatim by pre-substituting placeholders into asm source
+text and re-lexing. No second copy of the operand grammar.
+
+---
+
+## 1. Scope
+
+In:
+
+- `CFREE_ARCH_ARM_64` only.
+- GNU asm syntax (per `DESIGN.md §10`); `asm`, `__asm__`, `asm volatile`,
+  `asm volatile goto` accepted at the keyword level.
+- Constraints: `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0` (matching by index).
+- Clobbers: register names route through call-clobber tracking; `"memory"`
+  spills/reloads all live SValues; `"cc"` accepted-and-ignored on aarch64
+  (NZCV reserved across inline-asm blocks per `doc/ASM.md §4.3`).
+- Placeholders: `%N`, `%wN`, `%xN`, `%[name]`, `%aN`, `%%`.
+
+Deferred:
+
+- Multi-alternative constraints, most letter constraints (per `doc/ASM.md §2`).
+- `asm goto` labels — accept syntactically, error in `cg_inline_asm`.
+- Macros, `.if`, `.macro` inside templates — same deferral as standalone.
+- x64 / rv64 backends — Phase 5 multiarch seam revs the dispatch.
+
+---
+
+## 2. Current state
+
+Where the seams are today:
+
+- `src/parse/parse.c` keyword table (~line 115): no `KW_ASM` /
+  `KW_BUILTIN_ASM`. `parse_stmt` (~line 5628) has no asm dispatch arm.
+- `src/cg/cg.h:182` declares `cg_inline_asm(...)`. Body at `src/cg/cg.c:1522`
+  is a panic stub. The contract docstring at lines 178-181 already says
+  inputs ride the CG stack and outputs are pushed back as fresh SValues.
+- `src/arch/aarch64.c:3099` (`aa_asm_block`) — panic stub. Same shape on
+  `src/arch/x64.c:2783` (`x_asm_block`) and `src/arch/rv64.c:2529`
+  (`rv_asm_block`).
+- `src/opt/opt.c:693` (`w_asm_block`) — panic stub. Replay at
+  `src/opt/opt.c:815` is a `break;`.
+- `src/opt/ir.h:88` defines `IR_ASM_BLOCK`; `src/opt/ir.h:170-177` defines
+  `IRAsmAux { tmpl, outs, ins, clobbers, out_ops, nout, nin, nclob }`.
+- `src/arch/arch.h:251-269` defines `Operand` (kind: `OPK_IMM`, `OPK_REG`,
+  `OPK_LOCAL`, `OPK_GLOBAL`, `OPK_INDIRECT`). `src/arch/arch.h:370-374`
+  defines `AsmConstraint { str, dir }`.
+- `src/parse/parse_asm_helpers.h` exposes `asm_driver_*` token plumbing,
+  `asm_driver_parse_const`, `asm_driver_parse_sym_expr`, `asm_driver_intern_sym`,
+  `asm_driver_panic`, etc. The `AsmDriver` struct stays internal to
+  `src/parse/parse_asm.c:46-68`.
+- `src/arch/aa64_asm.h` exposes `aa64_asm_open(c)`, `aa64_asm_close(a)`,
+  `aa64_asm_insn(a, d, mnemonic)`. Per-mnemonic table at
+  `src/arch/aa64_asm.c:787` (`kTable[]`); `aa64_asm_insn` linear-scans by
+  case-insensitive name.
+
+So: every panic site is in place, the IR carrier exists, and the standalone
+asm parser is reusable. Phase 4b wires them together.
+
+---
+
+## 3. End-to-end flow
+
+```
+asm("...") in C source
+        │
+        ▼  parse.c::parse_asm_stmt
+push input exprs onto CG stack, build AsmConstraint[] + clobber Sym[]
+        │
+        ▼  cg.c::cg_inline_asm
+pop inputs → materialize Operand per constraint
+allocate regs for outputs (r/=r/+r/=&r honoring matching `0`)
+if "memory" clobber: spill all live SValues
+call g->target->asm_block(..., out_ops, in_ops, clobbers)
+push out SValues for parser to assign to lvalues
+        │
+        ├─► arch/aarch64.c::aa_asm_block
+        │     aa64_asm_open → aa64_inline_bind(out_ops, in_ops, clobbers)
+        │     aa64_asm_run_template(mc, tmpl)
+        │       substitute %0/%w0/%x0/%[name]/%a0 → physical text
+        │       for each line: lex via memory-backed Lexer, dispatch
+        │         through aa64_asm_insn (existing per-mnemonic table)
+        │     aa64_asm_close
+        │
+        └─► opt/opt.c::w_asm_block (recorder)
+              arena-copy template/outs/ins/clobbers/in_ops into IRAsmAux
+              rec(o, IR_ASM_BLOCK); replay xlat_op's operands and forwards
+              to the wrapped target.
+```
+
+---
+
+## 4. Files
+
+| File | Change |
+|------|--------|
+| `src/parse/parse.c` (~5705) | Add `KW_ASM` + `KW_BUILTIN_ASM` to `kw_names[]`; add `parse_asm_stmt` and dispatch from `parse_stmt`. Reuse `pool_intern_cstr(p->c->global, ...)` for constraint/clobber strings. |
+| `src/cg/cg.c:1522` (`cg_inline_asm`) | Replace panic body. Pop `nin` SValues via `pop(g)`; materialize per-constraint into `Operand`s; allocate output regs via `g->target->alloc_reg`; honor matching `0` constraints; on `"memory"` clobber spill live regs through `g->target->spill_reg`; call `g->target->asm_block`; push out SValues. |
+| `src/cg/cg.h:182` | Signature stays — outputs and inputs ride the CG stack, contract already documented at lines 178-181. |
+| `src/arch/aa64_asm.h` | Add `aa64_inline_bind(AA64Asm*, ...)`, `aa64_asm_run_template(AA64Asm*, MCEmitter*, const char* tmpl)` declarations. |
+| `src/arch/aa64_asm.c` | Implement template walker: `%N` / `%wN` / `%xN` / `%[name]` / `%aN` render, per-line memory-Lexer → minimal inline `AsmDriver` → existing `aa64_asm_insn`. Width-check `%wN` vs format's `sf` expectation. |
+| `src/arch/aarch64.c:3099` (`aa_asm_block`) | Replace panic body with `aa64_asm_open → aa64_inline_bind → aa64_asm_run_template → aa64_asm_close`. |
+| `src/parse/parse_asm.c` + `src/parse/parse_asm_helpers.h` | Expose `asm_driver_open_inline(c, mc, lexer)` constructor for an inline-mode `AsmDriver` that reads from a memory buffer + caller-supplied MCEmitter. `AsmDriver` stays opaque. |
+| `src/opt/opt.c:693` (`w_asm_block`) | Mirror `w_call` (lines 495-513): `arena_znew(IRAsmAux)`, copy `tmpl` + `outs` / `ins` / `clobbers` via `arena_array`, copy `in_ops`, allocate `out_ops` slots; `rec(o, IR_ASM_BLOCK)`. |
+| `src/opt/opt.c:815` (`IR_ASM_BLOCK` replay) | Fetch aux, `xlat_op` each `in_ops[i]` and `out_ops[i]`, call `w->asm_block(...)` with materialized arrays. |
+| `src/arch/x64.c:2783`, `src/arch/rv64.c:2529` | Leave panics. Phase 5 work. |
+
+---
+
+## 5. Constraint binder (`cg_inline_asm` core)
+
+V1 set per `doc/ASM.md §4.3`:
+
+| constraint | behavior |
+|------------|----------|
+| `r` (in) | pop SValue; `force_reg` to ensure `OPK_REG`; bind to that physical reg |
+| `=r` (out) | `t->alloc_reg(t, RC_INT, type)`; out_ops[i].kind = OPK_REG |
+| `+r` (inout) | reuse the matching input's reg (popped first); out_ops shares it |
+| `=&r` (early-clobber) | `t->alloc_reg` from the set disjoint from already-bound input regs |
+| `i` | pop SValue; assert `op.kind == OPK_IMM`, else `compiler_panic` |
+| `m` | pop SValue; if not already `OPK_INDIRECT`, materialize via the lvalue address machinery (or allocate a scratch base reg + store) |
+| `0`/`1`/… | matching: bind input slot to the same physical reg as the referenced output slot |
+
+Clobbers:
+
+- `"memory"` — call `g->target->spill_reg` for every live reg-resident SValue
+  on the stack before the block; mark them spilled so subsequent reads reload
+  through `g->target->reload_reg`. Same machinery cg already uses across
+  function calls.
+- Register-name clobbers (`"x0"`, …) — pass through to `target->asm_block`;
+  the aarch64 backend routes them through the existing call-clobber tracking
+  (`t->clobbers` exposes the same set).
+- `"cc"` — silently dropped on aarch64.
+
+After the block:
+
+- For each `=r` / `+r` / `=&r` output, push `SValue{op = out_ops[i],
+  type = output_expr_type}` onto the value stack. The parser then assigns each
+  output SValue to its lvalue expression via the standard cg mechanisms.
+
+---
+
+## 6. Template walker (`aa64_asm_run_template`)
+
+1. Split `tmpl` on `\n` and `;` into asm lines.
+2. For each line, scan for `%` placeholders and emit substitutions into a
+   per-call `StrBuf` (`src/core/strbuf.h`):
+   - `%N` → register name in default form (use `%xN` width for int reg
+     operands, `#imm` for `OPK_IMM`, `[xN, #ofs]` for `OPK_INDIRECT`).
+   - `%wN` / `%xN` → forces 32 / 64 register form (diagnostic on mismatch
+     with format's `sf` width).
+   - `%[name]` → resolved via the optional `[name]` syntax on constraints
+     (capture name during parse and store on AsmConstraint).
+   - `%aN` → `[Xn, #ofs]` materialization for `m` constraint.
+   - `%%` → literal `%`.
+3. Open a memory-backed Lexer over the rendered line, construct a minimal
+   inline `AsmDriver` via `asm_driver_open_inline(c, mc, lexer)`, dispatch
+   `aa64_asm_insn(a, d, mnemonic)` exactly like the standalone driver.
+4. Branches inside inline asm emit relocs against locally-interned symbol
+   names — labels declared in the template (if any) are interned the same way
+   `parse_asm.c` already handles them.
+
+---
+
+## 7. Parallelization
+
+The work splits cleanly along three seams that each compile and unit-test in
+isolation. Three engineers (or three Claude sessions) can land them in
+parallel; the integration commit is small.
+
+```
+   ┌───────────────────────┐
+   │ Track A — frontend    │  parse.c keyword + parse_asm_stmt;
+   │                       │  exercised by a temporary stub
+   │                       │  cg_inline_asm that just records args.
+   └─────────┬─────────────┘
+             │ shared: AsmConstraint[], clobber Sym[], template str
+   ┌─────────┴─────────────┐
+   │ Track B — cg / opt    │  cg_inline_asm body + w_asm_block
+   │                       │  recorder + IR_ASM_BLOCK replay.
+   │                       │  Mocks target->asm_block to a logger
+   │                       │  for unit tests.
+   └─────────┬─────────────┘
+             │ shared: Operand[] in_ops/out_ops, Sym[] clobbers, tmpl
+   ┌─────────┴─────────────┐
+   │ Track C — aa64 backend│  aa64_inline_bind +
+   │                       │  aa64_asm_run_template + aa_asm_block.
+   │                       │  Driven by a tiny C-test that builds
+   │                       │  Operand arrays by hand.
+   └───────────────────────┘
+```
+
+### 7.1 Tracks
+
+- **Track A — Frontend (`parse.c`)** — owns `KW_ASM` / `KW_BUILTIN_ASM`
+  keyword registration, GNU asm statement grammar (`asm [volatile] [goto]
+  (...)`, the four colon-separated lists), constraint string interning. Stops
+  at the call to `cg_inline_asm`. Lands behind the existing panic stub on
+  `cg_inline_asm`, so the parser merges first; new asm cases panic cleanly
+  until B lands.
+- **Track B — CG + Opt (`cg.c`, `opt.c`)** — owns `cg_inline_asm` body
+  (constraint binder, output reg allocation, `"memory"` clobber spill),
+  `w_asm_block` recorder, `IR_ASM_BLOCK` replay. Lands against a
+  `target->asm_block` mock that appends `(tmpl, in_ops, out_ops, clobbers)`
+  to a log so unit tests can assert the binder's contract without depending
+  on the aarch64 backend being ready.
+- **Track C — aarch64 backend (`aa64_asm.c`, `aarch64.c`)** — owns
+  `aa64_inline_bind`, `aa64_asm_run_template`, the placeholder substitution
+  pass, `asm_driver_open_inline`, and the `aa_asm_block` thunk. Lands with a
+  tiny C unit test that constructs `Operand` arrays directly and invokes
+  `aa_asm_block` against an in-process MCEmitter, so it can be merged before
+  A or B is finished.
+
+### 7.2 Cut points and contracts
+
+The two seams are typed contracts each track can pin in a header without
+needing the others' implementations:
+
+1. **Parser → CG**: `cg_inline_asm(g, tmpl, outs, nout, ins, nin, clobbers,
+   nclob)`. Already declared at `src/cg/cg.h:182`. Track A targets the
+   declared signature; Track B implements it; Track A merges with the
+   panic-stub still in place if B is not ready yet.
+2. **CG → Target**: `target->asm_block(t, tmpl, outs, nout, out_ops, ins,
+   nin, in_ops, clobbers, nclob)`. Already declared at
+   `src/arch/arch.h:605-608`. Track B can bind against
+   `aa_panic("asm_block")` while Track C builds; cg-side tests use a
+   dedicated mock `CGTarget` so they don't depend on landing order.
+
+### 7.3 Integration order
+
+A and C can land in either order. B blocks neither but unlocks end-to-end
+green:
+
+1. A merges (asm keyword + parser + stub call). cg corpus stays green
+   because no test exercises asm yet; new asm cases panic at the
+   `cg_inline_asm` stub.
+2. C merges (aa64 inline backend + unit test). cg corpus stays green; unit
+   test gates the placeholder walker.
+3. B merges. The stub goes away. The new `test/cg/cases_asm.c` smoke flips
+   to green on `DREJWS`. `S` path validates encode/decode pairing across the
+   new inline-asm bytes.
+
+If only one engineer is on the work, do A → C → B (front-to-back) so each
+step lands an exercisable surface; if two, run A and C in parallel and
+merge B last.
+
+---
+
+## 8. Testing
+
+Inline asm is behavioral C with exit-code assertions, which is exactly what
+`test/cg/cases_*.c` already does. Add `test/cg/cases_asm.c` (or fold into an
+existing bucket) registered through `cg-runner` the same way every other
+case is.
+
+Smallest smoke case:
+
+```c
+void test_main(void) {
+  int rc = 42;
+  __asm__ volatile("mov w0, %w0; svc #0" : : "r"(rc) : "x0");
+}
+```
+
+Path matrix:
+
+- `D` (direct JIT), `J` (JIT via file), `E` (ELF exec under qemu/podman) —
+  exit-code assertions.
+- `R` (round-trip), `W` (opt-recorder) — exercise `w_asm_block` /
+  `IR_ASM_BLOCK` replay.
+- `S` (asm round-trip) — encode/decode pairing across the new inline-asm
+  bytes; if this fails, fix the format definition in `aa64_isa.h`, never the
+  parser site.
+
+Run the full matrix:
+
+```
+bash test/cg/run.sh '' DREJWS
+```
+
+Coverage cases (one each so the binder can't silently regress):
+
+- `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0` (matching).
+- `"memory"` clobber.
+- Register-name clobber.
+
+Cross-target panic preserved:
+
+```
+CFREE_TEST_ARCH=x64 bash test/cg/run.sh ''
+```
+
+still panics cleanly via the existing `x_panic("asm_block")` on inline-asm
+cases — no silent miscompile.
+
+`make test-asm` is unaffected (no changes to the standalone-`.s` codepath).
+
+---
+
+## 9. Decisions
+
+- **Operand transport**: parser pushes only inputs onto the CG stack;
+  outputs come back as fresh SValues that the parser assigns to the
+  declared lvalues. This matches the existing `cg_inline_asm` docstring at
+  `src/cg/cg.h:178-181`.
+- **Template lexing**: pre-substitute placeholders to physical asm text and
+  re-lex via the standalone parser, instead of a Tok variant carrying an
+  `Operand`. Keeps one operand grammar and one lexer; the cost is one extra
+  StrBuf pass per inline block.
+- **Memory clobber**: route through `target->spill_reg` /
+  `target->reload_reg`, the same machinery cg uses across function calls.
+  No new flush mechanism.
+- **`asm volatile`**: accepted but informational — `IR_ASM_BLOCK` is
+  already opaque-to-passes per `doc/ASM.md §9.5`, so volatile changes
+  nothing at the IR level today.
+- **`asm goto`**: parsed and rejected in `cg_inline_asm`. Phase 4b ships
+  the keyword grammar but not the label-ref machinery.
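
Illustrative note (not part of the commit): the placeholder substitution described in §6 of the added `doc/INLINEASM.md` could look roughly like the sketch below. It is a minimal sketch under stated assumptions. `render_placeholders`, its parameters, and the plain `char` output buffer are hypothetical and do not come from the kit sources, which plan to use `StrBuf` and `Operand`. The sketch covers only `%N`, `%wN`, `%xN`, and `%%`, and treats every operand as an integer register.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the kit sources: expands %N, %wN, %xN
 * and %% from `tmpl` into `out`, treating operand N as the integer
 * register numbered regs[N]. Returns 0 on success, or -1 on a malformed
 * placeholder, an out-of-range operand index, or a full buffer. */
static int render_placeholders(const char *tmpl, const int *regs, size_t nregs,
                               char *out, size_t cap) {
  size_t len = 0;
  assert(tmpl != NULL && out != NULL && cap > 0);
  for (const char *p = tmpl; *p != '\0'; p++) {
    if (*p != '%') {                 /* ordinary text: copy through */
      if (len + 1 >= cap) return -1;
      out[len++] = *p;
      continue;
    }
    p++;                             /* step past '%' */
    if (*p == '%') {                 /* %% renders a literal '%' */
      if (len + 1 >= cap) return -1;
      out[len++] = '%';
      continue;
    }
    char width = 'x';                /* bare %N defaults to the 64-bit form */
    if (*p == 'w' || *p == 'x') width = *p++;
    if (!isdigit((unsigned char)*p)) return -1;
    size_t idx = 0;
    while (isdigit((unsigned char)*p)) idx = idx * 10 + (size_t)(*p++ - '0');
    p--;                             /* the loop increment re-advances p */
    if (idx >= nregs) return -1;
    int n = snprintf(out + len, cap - len, "%c%d", width, regs[idx]);
    if (n < 0 || (size_t)n >= cap - len) return -1;
    len += (size_t)n;
  }
  out[len] = '\0';
  return 0;
}
```

With operand 0 bound to register 9, the smoke-case template `mov w0, %w0; svc #0` would render as `mov w0, w9; svc #0`, which is the text the plan then re-lexes through the standalone per-mnemonic parsers. `%[name]`, `%aN`, and immediate or memory operands are left out of this sketch.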