kit

git clone https://git.ryansepassi.com/git/kit.git

commit cb33525798c6427becd139b2632ba02986406d1f
parent 0f61a9b9327b44012f4c9ef6059ea07381da2870
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sun, 10 May 2026 11:46:05 -0700

ASM.md plan

Diffstat:
A doc/ASM.md | 487 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 487 insertions(+), 0 deletions(-)

diff --git a/doc/ASM.md b/doc/ASM.md
@@ -0,0 +1,487 @@
+# ASM — assembler and disassembler plan
+
+Scope: bring up the asm frontend (standalone `.s` and inline `asm("...")`)
+and the matching disassembler, starting with aarch64. Companion to
+`DESIGN.md §10` and `MULTIARCH.md`.
+
+The asm and disasm sides are designed together so that one description of
+each instruction serves both: same field layout, same operand syntax, same
+mnemonic table. When an opcode bit moves, the encoder and the decoder
+update at one site and stay in sync by construction.
+
+---
+
+## 1. Current state
+
+- `src/arch/aa64_isa.{h,c}`: per-format `pack`/`unpack` round-trippers and
+  a `(mnemonic, match, mask, AA64Format)` descriptor table.
+  `aa64_disasm_find` already linear-scans the table by `(word & mask) ==
+  match`. The encoders are inline wrappers that call `pack`. **This is the
+  pairing seam.** Half of the work below is just finishing what's started
+  here.
+- `src/parse/parse.h:23` declares `parse_asm`; `src/api/stubs.c:45`
+  implements it as a panic. `src/api/pipeline.c:208` already routes
+  `CFREE_LANG_ASM` inputs to it, so the wiring is in place.
+- `CGTarget.asm_block` is a method on every backend; `aa_asm_block`
+  (`arch/aarch64.c:2969`), `xx_asm_block`, `rv_asm_block`, and the opt
+  recorder's `w_asm_block` all panic.
+- `cg_inline_asm` (`src/cg/cg.c:1337`) is the parser-side entry; panics.
+- `arch_disasm_new` / `arch_disasm_decode` (`src/arch/arch.h:647`) are
+  declared but no impl exists. Public surface
+  (`cfree_disasm_iter_*`, `cfree_obj_disasm`,
+  `cfree_arch_register_*`) is in `include/cfree.h` and stubbed out in
+  `src/api/stubs.c`.
+
+So: data model decided, descriptor table partly populated, all behavior
+still a panic. The work is one focused vertical.
+
+---
+
+## 2. Target slice for first milestone
+
+| axis        | value                                              |
+|-------------|----------------------------------------------------|
+| arch        | `CFREE_ARCH_ARM_64`                                |
+| syntax      | GNU `as` "unified" (per `DESIGN.md §10`)           |
+| objfmt      | ELF (the only one wired today)                     |
+| inline      | aarch64 GCC `%w0`/`%x0`/`%[name]` substitution     |
+| constraints | `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0` (matching)   |
+
+Out of scope this pass:
+
+- x86 AT&T and RISC-V GNU syntax. Their per-arch parsers slot in as peers
+  once aarch64 proves the seams (see §5, Phase 5).
+- WAT for WASM. Different enough to merit its own document.
+- Macros / `.if` / `.macro` / `.altmacro`. The directive set is
+  intentionally small (§4.4); macros are a follow-up.
+- Full GCC constraint coverage (multi-alternative, `&` outside outputs,
+  most letter constraints). Tracked under `DESIGN.md §10` as deferred.
+
+---
+
+## 3. Encode/decode pairing — one description per instruction
+
+The discipline that makes asm and disasm cheap to keep in sync:
+
+```
+                      ┌──────────────────┐
+      asm text ─lex─► │ per-format       │ ─pack(fields)─► u32 bytes
+                      │ parse_operands   │                  │
+      bytes ──────────┤                  │ ◄─unpack(word)── │
+                      │ per-format       │
+ disasm text ◄────────┤ print_operands   │
+                      └──────────────────┘
+                               ▲
+                               │
+        AA64InsnDesc { mnemonic, match, mask, format }
+```
+
+Per format (already established in `aa64_isa.h`):
+
+- `AA64<Fmt>` field struct.
+- `aa64_<fmt>_pack(fields) -> u32`, `aa64_<fmt>_unpack(u32) -> fields`.
+- *(new)* `aa64_<fmt>_parse(Asm*, AA64InsnDesc*, fields*)` — parses the
+  operand grammar for this format and fills the field struct. Reads
+  per-instruction opcode bits from the descriptor, so one parser handles
+  all members of the family.
+- *(new)* `aa64_<fmt>_print(fields, sb)` — renders text for a decoded
+  word. The disasm text path and the round-trip check share this.
+
+Per instruction (one row in `aa64_insn_table`):
+
+- *(unchanged)* `mnemonic`, `match`, `mask`, `format`.
+- *(new)* a small `AsmFlags` byte for things that vary across same-format
+  members: alias status, sf-required, special operand syntax (e.g.
+  `RET` with optional `Xn`).
+
+**Aliases** (`MOV` for `ORR Rd, ZR, Rm`; `MUL` for `MADD ..., ZR`; `NEG`
+for `SUB Rd, ZR, Rm`) are extra rows with tighter masks placed *before*
+the canonical row in `aa64_insn_table.c`. First-match-wins is already the
+documented invariant. The disasm prints the alias; the asm accepts both
+the alias and the canonical spelling.
+
+**Source of truth.** Encoder, decoder, asm parser, and asm printer all go
+through `aa64_isa.h`. No second copy of the bit layout anywhere — not in
+`arch/aarch64.c` codegen helpers (those already call the inline encoders),
+not in test fixtures.
+
+**Round-trip property.** For every byte sequence `B` the disasm-then-asm
+round trip is idempotent: `assemble(disasm(B)) == B` for every
+instruction the assembler accepts. `disasm(assemble(disasm(B))) ==
+disasm(B)` is the testable form (see §6). This catches missing format
+entries and operand-print/parse drift.
+
+---
+
+## 4. Module layout
+
+Reuse `aa64` prefix (`MULTIARCH.md §5`).
+
+```
+src/parse/parse_asm.c        shared driver: scan tokens, dispatch directives,
+                             call per-arch instruction parser. New.
+src/arch/aa64_asm.{h,c}      aa64 instruction parser + inline-asm template
+                             walker. New. Owns AsmCtx and constraint binding.
+src/arch/aa64_disasm.{h,c}   aa64 ArchDisasm impl. Wraps aa64_disasm_find
+                             with operand printing. New.
+src/arch/aa64_isa.{h,c}      already exists. Gains per-format
+                             parse_operands / print_operands and
+                             AsmFlags column on AA64InsnDesc.
+src/arch/disasm.c            arch_disasm_new dispatch by c->target.arch
+                             (peer of arch/cgtarget.c per MULTIARCH §2.1).
+                             New.
+src/api/disasm.c             cfree_disasm_iter_* / cfree_obj_disasm /
+                             cfree_arch_register_* over arch_disasm_*.
+                             Replaces stubs in src/api/stubs.c.
+```
+
+The four pieces fall on three seams: (a) `parse_asm` ↔ per-arch instruction
+parser, (b) `MCEmitter` is the byte sink for both asm and codegen, (c)
+`arch_disasm_new` ↔ per-arch decoder. `aa64_isa.h` is the shared truth
+crossing those seams.
+
+### 4.1 `parse_asm` driver — arch-agnostic
+
+```c
+void parse_asm(Compiler* c, Lexer* l, MCEmitter* mc) {
+  AsmDriver d = {.c = c, .lex = l, .mc = mc, .arch = aa64_asm_open(c)};
+  for (;;) {
+    Tok t = lex_next(l);
+    if (t.kind == TOK_EOF) break;
+    if (is_directive(t)) parse_directive(&d, t); /* .text, .globl, ... */
+    else if (is_label(t)) parse_label(&d, t); /* foo: */
+    else if (is_ident(t)) d.arch->insn(d.arch, &d, t); /* mnemonic line */
+    else parse_skip_to_newline(&d);
+  }
+  aa64_asm_close(d.arch);
+}
+```
+
+Directives, labels, expression evaluation, and the `MCEmitter` glue live
+in this file because every arch needs the same set. Per-arch code is one
+function pointer (`arch->insn`), one symbol (`aa64_asm_open`).
+
+### 4.2 `aa64_asm_open` — instruction parser
+
+```c
+typedef struct AsmCtx AsmCtx; /* tokens, scratch, label map */
+typedef struct AA64Asm {
+  Compiler* c;
+  void (*insn)(struct AA64Asm*, AsmDriver*, Tok mnemonic);
+  /* + register-name table, mnemonic→AA64InsnDesc lookup,
+   * inline-asm placeholder substitution state. */
+} AA64Asm;
+```
+
+`insn` looks the mnemonic up in `aa64_insn_table` (the same table
+`aa64_disasm_find` uses), dispatches on `format`, calls
+`aa64_<fmt>_parse` to fill the field struct, then calls
+`aa64_<fmt>_pack` and writes the `u32` through `mc->emit_bytes`. Branches
+also call `mc->emit_label_ref` for the relocatable bit slice.
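The shared table lookup and the alias ordering from §3 can be sketched in a few lines. This is a minimal illustration, not the contents of `aa64_isa.h`: `InsnDesc`, `demo_table`, `demo_find`, and `demo_pack_rrr` are hypothetical names, the rows cover only the 64-bit `ORR (shifted register)` and `MADD` forms, and the match/mask constants are derived here from the A64 encodings rather than taken from the repository.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of a descriptor row; the real AA64InsnDesc
 * also carries a format tag and, per this plan, an AsmFlags byte. */
typedef struct {
  const char *mnemonic;
  uint32_t match;
  uint32_t mask;
} InsnDesc;

/* Alias rows have tighter masks and come first, so a first-match-wins
 * scan selects them before the canonical row. 64-bit forms only. */
static const InsnDesc demo_table[] = {
  {"mov",  0xAA0003E0u, 0xFFE0FFE0u}, /* ORR Xd, XZR, Xm (no shift) */
  {"orr",  0xAA000000u, 0xFF200000u}, /* ORR Xd, Xn, Xm{, shift #n} */
  {"mul",  0x9B007C00u, 0xFFE0FC00u}, /* MADD Xd, Xn, Xm, XZR       */
  {"madd", 0x9B000000u, 0xFFE08000u}, /* MADD Xd, Xn, Xm, Xa        */
};

/* Linear scan by (word & mask) == match; NULL when no row matches. */
static const InsnDesc *demo_find(uint32_t word) {
  for (size_t i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++) {
    if ((word & demo_table[i].mask) == demo_table[i].match)
      return &demo_table[i];
  }
  return NULL;
}

/* Encode side: OR the register fields into the row's fixed bits.
 * Rd is bits 4:0, Rn is bits 9:5, Rm is bits 20:16. */
static uint32_t demo_pack_rrr(const InsnDesc *d, unsigned rd, unsigned rn,
                              unsigned rm) {
  return d->match | ((uint32_t)(rm & 31u) << 16) |
         ((uint32_t)(rn & 31u) << 5) | (uint32_t)(rd & 31u);
}
```

Packing the canonical `orr x0, xzr, x1` through the `orr` row yields `0xAA0103E0`, and `demo_find` on that word returns the `mov` row: the assembler accepts either spelling while the disassembler prints the alias.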
+
+### 4.3 Inline asm — same parser, different operand source
+
+```c
+static void aa_asm_block(CGTarget* t, const char* tmpl,
+                         const AsmConstraint* outs, u32 nout, Operand* out_ops,
+                         const AsmConstraint* ins, u32 nin, const Operand* in_ops,
+                         const Sym* clobbers, u32 nclob) {
+  AA64Asm* a = aa64_asm_open(t->c);
+  aa64_inline_bind(a, outs, nout, out_ops, ins, nin, in_ops, clobbers, nclob);
+  aa64_asm_run_template(a, t->mc, tmpl);
+  aa64_asm_close(a);
+}
+```
+
+The template walker is the same `aa64_<fmt>_parse` set used by the
+standalone path. The only delta is the operand lexer: in inline mode,
+`%0`, `%w0`, `%x0`, `%[name]` resolve to the bound `Operand` for the
+corresponding constraint. `%w0` prints the W-form register name (forces
+`sf=0`); `%x0` the X-form. Memory operands `%a0` materialize as
+`[Xn, #ofs]`. Bit width is checked against the format's expectation
+(e.g. a 32-bit format with `%x0` is a diagnostic).
+
+Constraint binding (v1 set):
+
+| constraint | meaning                                                                    |
+|------------|----------------------------------------------------------------------------|
+| `r` / `=r` | int reg; allocated via the codegen scratch pool of the active CGTarget     |
+| `+r`       | input + output, same register                                              |
+| `=&r`      | early-clobber output (allocated disjoint from any input)                   |
+| `i`        | compile-time integer; must be `OPK_IMM`                                    |
+| `m`        | memory operand; bind a scratch base reg if the source isn't `OPK_INDIRECT` |
+| `0` (etc.) | matching constraint: input must use the same physical reg as output 0      |
+
+`"memory"` clobber forces CG to flush all live stack values to memory
+before the block and reload after, per `DESIGN.md §10`. Register-name
+clobbers add to the "clobbered by call" set so RA does not reuse them
+across the block. `"cc"` is accepted and ignored on aarch64 (NZCV is
+reserved by the inline-asm contract anyway — no instruction outside the
+block reads it across the block).
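The placeholder substitution described for inline mode can be sketched as a text expansion over already-bound registers. Everything here is hypothetical (`DemoBinding`, `demo_expand`); the plan resolves placeholders to `Operand`s inside the operand lexer rather than rewriting text, `%a0` memory operands are left out, and treating a bare `%0` as the X form is an assumption made only for this sketch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical binding: operand i lives in register `reg` (0..30);
 * `name` is the optional [name] from the constraint list, or NULL. */
typedef struct {
  const char *name;
  unsigned reg;
} DemoBinding;

/* Expands %N, %wN, %xN, %[name], %w[name] into register names.
 * "%%" emits a literal '%'. Returns 0 on success, -1 on a malformed or
 * unbound placeholder or when `out` is too small. */
static int demo_expand(const char *tmpl, const DemoBinding *b, size_t nb,
                       char *out, size_t cap) {
  size_t o = 0;
  while (*tmpl) {
    if (*tmpl != '%' || tmpl[1] == '%') {
      if (o + 1 >= cap) return -1;
      if (*tmpl == '%') tmpl++;          /* collapse "%%" to '%' */
      out[o++] = *tmpl++;
      continue;
    }
    tmpl++;                              /* skip '%' */
    char width = 'x';                    /* assumed default for bare %N */
    if (*tmpl == 'w' || *tmpl == 'x') width = *tmpl++;
    size_t idx = nb;                     /* nb means "not found" */
    if (*tmpl == '[') {
      const char *end = strchr(tmpl, ']');
      if (!end) return -1;
      size_t len = (size_t)(end - tmpl - 1);
      for (size_t i = 0; i < nb; i++) {
        if (b[i].name && strlen(b[i].name) == len &&
            memcmp(b[i].name, tmpl + 1, len) == 0) {
          idx = i;
          break;
        }
      }
      tmpl = end + 1;
    } else if (*tmpl >= '0' && *tmpl <= '9') {
      idx = 0;
      while (*tmpl >= '0' && *tmpl <= '9') {
        idx = idx * 10 + (size_t)(*tmpl++ - '0');
        if (idx >= nb) return -1;        /* operand index out of range */
      }
    } else {
      return -1;
    }
    if (idx >= nb) return -1;
    int n = snprintf(out + o, cap - o, "%c%u", width, b[idx].reg);
    if (n < 0 || (size_t)n >= cap - o) return -1;
    o += (size_t)n;
  }
  if (o >= cap) return -1;
  out[o] = '\0';
  return 0;
}
```

With operands bound to registers 9, 10, and 11 (the last named `rhs`), the template `add %w0, %w1, %w[rhs]` expands to `add w9, w10, w11`, which the standalone operand parser can then consume unchanged.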
+
+Under `opt_cgtarget` the call is recorded as `IR_ASM_BLOCK` (already an
+opaque-to-passes record per `DESIGN.md §9.5`); at lowering the wrapped
+target sees the same call with materialized operands.
+
+### 4.4 Directives — minimum viable set
+
+```
+.section NAME [, "FLAGS", @TYPE]
+.text  .data  .rodata  .bss
+.globl SYM   .local SYM   .weak SYM   .hidden SYM
+.type SYM, @function | @object
+.size SYM, EXPR
+.byte .hword .word .quad   EXPR [, EXPR ...]
+.ascii "..."  .asciz "..."  .string "..."
+.zero N   .skip N   .fill N, SIZE, VALUE
+.align N  .balign N  .p2align N
+.set NAME, EXPR
+.equ NAME, EXPR          (= .set)
+.file "name"             (debug line filename)
+.loc FILE LINE [COL]     (debug line row)
+```
+
+CFI directives (`.cfi_startproc`, `.cfi_def_cfa`, …) are accepted and
+forwarded to the corresponding `MCEmitter.cfi_*` calls (already exist
+per `arch/arch.h`). Unknown directives are a recovery diagnostic, not a
+panic — skip to newline.
+
+`.macro` / `.if` / `.include` are deferred. Inline asm gets there first
+because `cg_inline_asm` is the immediate consumer.
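For the three alignment directives in the list above, the padding arithmetic can be sketched as follows. The names (`DemoAlignKind`, `demo_pad_to`, `demo_align_padding`) are hypothetical, and two behaviors are assumptions of this sketch, not decisions recorded in the plan: `.align N` is treated as a power-of-two exponent like `.p2align` (the usual GNU `as` behavior on Arm targets), and a zero or non-power-of-two `.balign` argument is rejected.

```c
#include <stdint.h>

typedef enum { DEMO_BALIGN, DEMO_P2ALIGN, DEMO_ALIGN } DemoAlignKind;

/* Padding needed to advance `offset` to a multiple of `align`.
 * Returns UINT64_MAX when `align` is zero or not a power of two. */
static uint64_t demo_pad_to(uint64_t offset, uint64_t align) {
  if (align == 0 || (align & (align - 1)) != 0) return UINT64_MAX;
  return (align - (offset & (align - 1))) & (align - 1);
}

/* .balign takes a byte count; .p2align and (by assumption) .align take
 * a power-of-two exponent. */
static uint64_t demo_align_padding(DemoAlignKind kind, uint64_t arg,
                                   uint64_t offset) {
  uint64_t align;
  if (kind == DEMO_BALIGN) {
    align = arg;
  } else {
    if (arg > 63) return UINT64_MAX;   /* exponent would overflow */
    align = (uint64_t)1 << arg;
  }
  return demo_pad_to(offset, align);
}
```

At section offset 5, `.p2align 2` needs 3 bytes of padding; at offset 16, `.balign 16` needs none.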
+
+### 4.5 Disassembler — `arch_disasm_new` for aarch64
+
+```c
+typedef struct AA64Disasm {
+  ArchDisasm base;
+  Compiler* c;
+  StrBuf mnemonic, operands, annotation; /* reused per decode */
+} AA64Disasm;
+
+u32 arch_disasm_decode(ArchDisasm* d_, const u8* b, size_t n,
+                       u64 vaddr, CfreeInsn* out) {
+  AA64Disasm* d = (AA64Disasm*)d_; /* `base` is the first member */
+  if (n < 4) return 0;
+  u32 w = read_u32_le(b);
+  const AA64InsnDesc* ins = aa64_disasm_find(w);
+  if (!ins) { write_unknown(d, b, out); return 4; }
+  AA64Fields f = aa64_<ins->format>_unpack(w);
+  strbuf_set(&d->mnemonic, ins->mnemonic);
+  aa64_<ins->format>_print(&d->operands, ins, &f, vaddr);
+  out->vaddr = vaddr;
+  out->bytes = b; out->nbytes = 4;
+  out->mnemonic = d->mnemonic.p;
+  out->operands = d->operands.p;
+  out->annotation = ""; /* sym/reloc overlay added by cfree_obj_disasm */
+  return 4;
+}
+```
+
+Annotations (sym/reloc overlay) live one level up in `cfree_obj_disasm` /
+`cfree_disasm_iter_new(..., obj)`: the iterator walks `ObjBuilder` relocs
+keyed on the section + offset and writes the resolved `name+addend` into
+`annotation`. The arch-level decoder is reloc-unaware — it only reads
+bytes. This keeps `arch_disasm_decode` per-arch and the symbol/reloc
+overlay arch-agnostic.
+
+`cfree_arch_register_name` / `_index` table lives in `aa64_asm.c`
+alongside the parser (one canonical name list — same source for parse
+and print).
+
+---
+
+## 5. Phasing
+
+Each phase ends mergeable. Phase 1 stands up the test harness so every
+later phase gates on real runs from its first commit (mirrors
+`MULTIARCH.md §4` Phase 1). Phase 2 lands the encode/decode pairing as
+a mechanical refactor; phase 3 is the standalone assembler; phase 4 is
+inline asm + disasm overlay; phase 5 is the seam-rev for x64/rv64.
+
+### Phase 1 — test harness
+
+Stand up the runner before any compiler-side work. No `src/` changes.
+
+1. New `test/asm/` peer of `test/parse/`. One `run.sh`; three
+   sub-corpora (§6). Skip-vs-fail follows the `CFREE_TEST_ALLOW_SKIP`
+   convention used elsewhere — every case skips cleanly today because
+   `parse_asm` and `cfree_disasm_iter_*` are stubs.
+2. New `test-asm` target in `test/test.mk`; added to the default
+   `test` list.
+3. Add `S` (asm-roundtrip) path letter to `test/cg/run.sh`. Plumbed
+   to walk every `.text` byte of each cg-emitted aarch64 binary
+   through `cfree_disasm_iter_*` → `cfree as` → byte-compare. Skips
+   today; turns green when phases 3+4 land. Path matrix becomes
+   `DREJWS`.
+4. Smoke goldens checked in for one case per sub-corpus. Generated
+   from the host `as` / `objdump` once and committed; a `regen.sh`
+   documents how to refresh but is not run by default (same
+   convention as `test/elf/normalize.py`).
+5. New runner C binary `asm-runner` under `test/asm/harness/` —
+   peer of `cg-runner`. Three sub-commands: `--encode`, `--decode`,
+   `--listing`. Dispatches to `cfree as` / `cfree_disasm_iter_*` /
+   `cfree_obj_disasm` per case.
+
+Exit criterion: `make test-asm` runs end-to-end; smoke cases report
+SKIP under `CFREE_TEST_ALLOW_SKIP=1` (the default during phase 1) and
+the harness wiring is exercised on every CI run. `test/cg/run.sh -S`
+also reports SKIP cleanly. No green asm cases yet — that's phase 3.
+
+### Phase 2 — finish the ISA descriptor table
+
+Pure refactor. No new behavior; existing codegen still calls inline
+encoders.
+
+1. Add `parse_operands` and `print_operands` per `AA64Format` in
+   `aa64_isa.{h,c}`. The first cut prints into a `StrBuf` and parses
+   from a tiny operand lexer (reg name, `#imm`, `[Xn, #ofs]`, label).
+2. Add `AsmFlags` column to `AA64InsnDesc`. Mark aliases (`MOV`, `MUL`,
+   `NEG`, `RET`).
+3. Reorder rows in `aa64_insn_table` so aliases precede their canonical
+   forms.
+4. Backfill formats not yet in the table that codegen emits today (load/
+   store immediate, branch immediate, conditional branch, NOP, BRK).
+   Each lands as one format-struct + pack/unpack + parse/print + table
+   rows.
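The "tiny operand lexer" named in Phase 2 step 1 might look roughly like the sketch below for two of its operand kinds, register names and `[Xn, #ofs]`. The types and functions (`DemoReg`, `DemoMem`, `demo_parse_reg`, `demo_parse_mem`) are hypothetical, only lowercase spellings are accepted, and both the zero register and the stack pointer are reported as number 31 without distinguishing them, which a real parser would need to do per format.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct { unsigned num; int is64; } DemoReg; /* 31 = zr or sp */
typedef struct { DemoReg base; long ofs; } DemoMem;

/* Parses x0..x30, w0..w30, xzr, wzr, sp, wsp at *s.
 * Returns 1 and advances *s on success, 0 otherwise. */
static int demo_parse_reg(const char **s, DemoReg *r) {
  static const struct { const char *name; int is64; } special[] = {
    {"xzr", 1}, {"wzr", 0}, {"wsp", 0}, {"sp", 1},
  };
  const char *p = *s;
  for (size_t i = 0; i < sizeof special / sizeof special[0]; i++) {
    size_t n = strlen(special[i].name);
    if (strncmp(p, special[i].name, n) == 0 &&
        !isalnum((unsigned char)p[n])) {
      r->num = 31; r->is64 = special[i].is64; *s = p + n;
      return 1;
    }
  }
  if ((*p != 'x' && *p != 'w') || !isdigit((unsigned char)p[1])) return 0;
  char *end;
  unsigned long n = strtoul(p + 1, &end, 10);
  if (n > 30 || isalnum((unsigned char)*end)) return 0;
  r->num = (unsigned)n; r->is64 = (*p == 'x'); *s = end;
  return 1;
}

static void demo_skip_ws(const char **s) {
  while (**s == ' ' || **s == '\t') (*s)++;
}

/* Parses "[Xn]" or "[Xn, #ofs]"; the base must be a 64-bit register. */
static int demo_parse_mem(const char **s, DemoMem *m) {
  const char *p = *s;
  if (*p != '[') return 0;
  p++;
  demo_skip_ws(&p);
  if (!demo_parse_reg(&p, &m->base) || !m->base.is64) return 0;
  demo_skip_ws(&p);
  m->ofs = 0;
  if (*p == ',') {
    p++;
    demo_skip_ws(&p);
    if (*p != '#') return 0;
    p++;
    char *end;
    m->ofs = strtol(p, &end, 0);       /* base 0: decimal or 0x hex */
    if (end == p) return 0;
    p = end;
    demo_skip_ws(&p);
  }
  if (*p != ']') return 0;
  *s = p + 1;
  return 1;
}
```

Because the same grammar is meant to back `print_operands`, a printer that emits `[x29, #-16]` and a parser like this one that reads it back form the smallest unit of the round-trip check.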
+
+Exit criterion: `aa64_disasm_find` returns a desc for every byte
+sequence the codegen currently produces; no test changes.
+
+### Phase 3 — standalone `.s` assembler
+
+1. New `src/parse/parse_asm.c`. Replace the panic in `src/api/stubs.c`.
+   Driver loop, directive parser, label management, expression
+   evaluator (constant + `sym + const`, full arithmetic on constants
+   per §7).
+2. New `src/arch/aa64_asm.{h,c}` with `aa64_asm_open` and the
+   instruction parser. Mnemonic lookup goes through `aa64_insn_table`.
+3. CFI directives forwarded to `MCEmitter.cfi_*`.
+4. `.loc` calls `mc->set_loc` so debug line tables work for hand-written
+   `.s`.
+5. `cfree as` driver subcommand (multi-call dispatch).
+
+Exit criterion: every `test/asm/encode/` case is green; every row of
+`aa64_insn_table` is hit by at least one encode case. `rt/` stays on
+clang.
+
+### Phase 4 — inline asm + disasm overlay
+
+1. Implement `aa_asm_block` in `arch/aarch64.c` calling into
+   `aa64_asm_run_template`. Implement `cg_inline_asm` in `cg/cg.c`:
+   evaluate inputs to `Operand`s, materialize `&buf` for `m` constraints,
+   call `target->asm_block`, push `out_ops` back as `SValue`s.
+2. Constraint binding (§4.3): `r`, `=r`, `+r`, `=&r`, `i`, `m`, `0`.
+3. Memory clobber: CG flushes value stack (`spill_reg` for every live
+   reg-resident SValue) before the call, marks them invalid after.
+   Register clobbers route through the existing `clobbers` mechanism.
+4. `IR_ASM_BLOCK` already opaque-to-passes; opt recorder
+   (`opt.c:692`) materializes operands and replays.
+5. `arch_disasm_new` for aarch64 (`aa64_disasm.c`); dispatch in new
+   `arch/disasm.c`.
+6. `cfree_obj_disasm` / `cfree_disasm_iter_*` over `arch_disasm_*`,
+   plus reloc/symbol annotation overlay. `cfree_arch_register_*` table.
+
+Exit criterion: every `test/asm/decode/` and `test/asm/listing/` case
+is green; the inline-asm cases under `test/cg/` (svc-style write-then-
+exit) build, run under qemu/podman, and report green on `DREJWS`. The
+`S` path turns green for the full cg corpus, proving encode/decode
+pairing across every `.text` byte cfree currently emits.
+
+### Phase 5 — multiarch seam
+
+Land before x64/rv64 codegen needs it.
+
+1. `arch/disasm.c::arch_disasm_new` switches on `c->target.arch`
+   (currently aarch64-only).
+2. `parse_asm` driver dispatches per-arch instruction parser by
+   `c->target.arch`. `aa64_asm_open` becomes one of N constructors.
+3. Reg-name table dispatched the same way (`cfree_arch_register_name`).
+4. `x64_isa.{h,c}` and `rv64_isa.{h,c}` skeletons (formats + tables,
+   not populated). x64 brings AT&T, rv64 brings GNU. Each pulls in its
+   own `<arch>_asm.{h,c}` and `<arch>_disasm.{h,c}`. Per `DESIGN.md
+   §10` the asm flavour is decided per-arch, single supported flavour.
+
+Exit criterion: builds for `CFREE_ARCH_X86_64` reach the x64 asm/disasm
+stubs and panic with a clean diagnostic; aarch64 path unchanged.
+
+---
+
+## 6. Testing
+
+The pairing buys a strong test shape: most tests run the round trip
+rather than spelling expected bytes by hand. Three buckets, all wired
+in phase 1:
+
+### 6.1 `test/asm/` — file-driven goldens (new)
+
+Peer of `test/parse/`. One `run.sh`, one `asm-runner` C binary, three
+sub-corpora keyed off filename suffix:
+
+| dir                 | input                 | expected              | drives                                                 |
+|---------------------|-----------------------|-----------------------|--------------------------------------------------------|
+| `test/asm/encode/`  | `<name>.s`            | `<name>.expected.hex` | `cfree as` over the `.s`, hex-compare against expected |
+| `test/asm/decode/`  | `<name>.hex`          | `<name>.expected.txt` | `cfree_disasm_iter_*` over the bytes, text-compare     |
+| `test/asm/listing/` | `<name>.in.bin` (ELF) | `<name>.expected.lst` | `cfree_obj_disasm` against the ELF, listing-compare    |
+
+Goldens are checked in. A `test/asm/regen.sh` regenerates them from
+the host `as` / `objdump` (committed only as a maintainer aid; not
+run by CI). One smoke case per sub-corpus is enough for phase 1; the
+table fills up alongside phases 3 and 4.
+
+### 6.2 `test/cg/` `S` path — asm roundtrip (new path letter)
+
+Path letter added to `test/cg/run.sh`. For every cg-emitted aarch64
+binary already in the corpus: walk `.text`, decode each instruction,
+re-assemble the resulting text, byte-compare. No new corpus —
+piggybacks on every existing cg case for free coverage. Catches
+encode/decode drift the moment a format gains a member.
+
+Reports SKIP today, green after phase 4. Path matrix becomes
+`DREJWS`. Skip-vs-fail and filtering match the rest of the cg paths
+verbatim.
+
+### 6.3 `test/cg/` inline-asm cases — under existing harness
+
+Inline asm is behavioral C with exit-code assertions, which is exactly
+what `test/cg/cases_*.c` already does. Add a new `cases_asm.c` (or
+fold cases into the existing buckets) registered through `cg-runner`
+the same way every other case is. The path matrix (`DREJW`) and the
+qemu/podman runner from `test/lib/exec_aarch64.sh` cover execution
+unchanged.
+
+### Driver wiring
+
+A standalone `cfree as` subcommand is exposed by the multi-call driver
+in phase 3 (same dispatch as `cfree -c <file.s>` modulo lang
+inference). `test/asm/encode/` drives `cfree as` directly so the
+multi-call dispatch is exercised end-to-end.
+
+---
+
+## 7. Decisions
+
+- **Disasm immediate format: context-sensitive.** Signed decimal for
+  fields the ISA defines as signed (branch displacements, signed-imm12
+  add/sub, load/store offsets). `0x`-prefixed hex everywhere else
+  (logical bitmask immediates, MOVZ/MOVK halfword, addresses).
+  `aa64_<fmt>_print` carries a per-field signedness bit; the print
+  helpers branch on it. Goldens lock the chosen form per format.
+- **`.s` constant expressions: arithmetic with parens.** Operators
+  `+ - * / % << >> & | ^ ~`, parenthesized, over signed integer
+  constants and `sym + const` terms. Symbol-involving expressions are
+  restricted to `(sym ± const)`; any product, quotient, shift, or
+  bitwise op that has a symbol operand is a diagnostic. Reloc-modifier
+  syntax (`:lo12:sym`, `:got:sym`) and macro counters (`\@`) are
+  deferred — they belong with the macro/full-PIC follow-up.
+- **`__cfree_setjmp.s` is decoupled.** Phase 3 lands against the
+  synthetic suites in §6, not against the runtime. `rt/` is currently
+  built with clang (`rt/Makefile`) and continues to be through phase 3;
+  `__cfree_setjmp.s` migrates to `parse_asm` as a follow-up after the
+  assembler is proven on the test corpus. The same applies to any
+  other `.s` files `rt/` adds before then.
+- **Absolute relocs in `.s`** (e.g. `.quad some_sym + 8` in `.data`)
+  go through `MCEmitter.emit_reloc_at` against the existing
+  `RelocKind` set — no new mechanism needed.
+- **Self-hosting.** Per `DESIGN.md §12`, anything in `src/` must be
+  C11-freestanding-writable. `parse_asm.c` and `aa64_asm.c` follow the
+  same rule. No reliance on a host assembler at build time *for the
+  compiler*; `rt/` still uses clang and is on its own bootstrap track.
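The context-sensitive immediate decision can be illustrated with a print helper that branches on a per-field signedness flag. This is a sketch under assumptions: `demo_sext` and `demo_print_imm` are hypothetical names, the leading `#` on the printed immediate is assumed from the `#imm` operand syntax mentioned in Phase 2, and the field widths in the usage note are examples only.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Sign-extends the low `bits` bits of `raw`. Requires 1 <= bits <= 63. */
static int64_t demo_sext(uint64_t raw, unsigned bits) {
  uint64_t sign = (uint64_t)1 << (bits - 1);
  raw &= (sign << 1) - 1;                 /* keep only the field */
  if (raw & sign) return -(int64_t)((sign << 1) - raw);
  return (int64_t)raw;
}

/* Formats one immediate field: signed decimal when the field is signed,
 * 0x-prefixed hex otherwise. Returns snprintf's result. */
static int demo_print_imm(char *out, size_t cap, uint64_t raw, unsigned bits,
                          int is_signed) {
  if (is_signed)
    return snprintf(out, cap, "#%" PRId64, demo_sext(raw, bits));
  uint64_t mask = bits >= 64 ? UINT64_MAX : (((uint64_t)1 << bits) - 1);
  return snprintf(out, cap, "#0x%" PRIx64, raw & mask);
}
```

A 9-bit signed offset field holding `0x1F0` prints as `#-16`, while a 16-bit unsigned halfword field holding `0xBEEF` prints as `#0xbeef`; locking one form per field is what lets the goldens stay stable.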