ASM — the assembler(s)

kit contains three textual-assembly surfaces that all feed the same object path: a standalone GNU-as-compatible assembler (kit as, cc on a .s input), an inline-asm("...") statement plumbed through codegen, and a symbolizing disassembler-to-text printer (cc -S). They are deliberately co-designed: the per-arch encoders are the single source of truth for bit layout, the disassembler shares those encoders, and cc -S renders operands as the inverse of what the assembler parses — so cc -S | as round-trips cc -c (see TESTING.md). This document describes the layering and the invariants; ISA encoding tables live in ARCH.md, object emission in OBJ.md.

1. Layering

  .s text ──► AsmLexer ──► AsmDriver (arch-neutral)
                              │  directives, labels, section/symbol state,
                              │  expression evaluator, string decoding
                              ▼
                          ArchAsm vtable .insn(driver, mnemonic)
                              │  per-arch instruction parser
                              ▼ (per-arch encoders == ISA source of truth)
                          MCEmitter ──► ObjBuilder ──► ELF / Mach-O
                                          ▲
  inline asm("...") ──► CgTarget.asm_block ─┘ (same MCEmitter, same encoders)

  ObjBuilder ──► arch_disasm_decode ──► cc -S printer (asm_emit.c)
                       ▲                  symbolizes operands, re-spells dirs
                       └── shared with objdump / kit_disasm

Three seams keep the design factored:

driver ↔ per-arch parser: src/asm/asm_helpers.h. AsmDriver is opaque to per-arch code; the helper surface (peek/next, eat_punct, parse_const, parse_sym_expr, intern_sym, cur_section, panic) is the only contact.
assembler/codegen ↔ object bytes: MCEmitter (src/arch/mc.c) is the byte/reloc sink for both the standalone assembler and C codegen, so a hand-written .s and a compiled .c produce structurally identical objects.
printer ↔ per-arch operand syntax: ArchAsmOps (src/arch/arch.h), reached via arch_reloc_operand / arch_is_local_branch / arch_reloc_call_pair. This is the inverse of the per-arch reloc-operand parsers and keeps cc -S arch-agnostic but format-aware.

2. The lexer — `src/asm/asm_lex.c`

AsmLexer streams tokens from a borrowed source buffer. It intentionally keeps C-like number/string spelling rules, because .s sources arrive after C preprocessing and GNU as accepts those spellings in directives and expressions. It does line-splice handling (phase-2 \<newline>) and treats comments (//, /* */) as whitespace, but surfaces physical newlines as ASM_TOK_NEWLINE so the driver can stay line-oriented.

Two classification quirks are load-bearing:

.L-prefixed names lex as a single identifier, leading dot included. This is the universal GNU convention for assembler-local labels (.Lfoo, .LBB0_1, .Lcfblk.3). It is unambiguous against directives — no directive starts with .L — so .text / .section still tokenize as '.' + IDENT and reach the directive dispatcher.
name.N (dot-then-digit) continues an identifier so discriminator-mangled symbols (acc.1) survive, but .+letter does not glue — leaving mnemonic suffixes (b.eq) and the location-counter dot (. - foo) to be reassembled or evaluated by the driver.
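
The two classification rules can be sketched as a single identifier scanner. This is a minimal illustration, not the code in src/asm/asm_lex.c: the function name ident_len and the exact identifier character set are assumptions made for the example.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helpers: which characters may start or continue a name. */
static int is_ident_start(int c) { return isalpha(c) || c == '_'; }
static int is_ident_char(int c)  { return isalnum(c) || c == '_'; }

/* Returns the length of the identifier token at s, or 0 if s does not
 * start an identifier (for example a directive dot or the lone "."). */
size_t ident_len(const char *s)
{
    size_t n;

    if (s[0] == '.' && s[1] == 'L')
        n = 2;                      /* rule 1: ".L" glues the leading dot */
    else if (is_ident_start((unsigned char)s[0]))
        n = 1;
    else
        return 0;                   /* ".text" and ". - foo" start with '.' */

    for (;;) {
        while (is_ident_char((unsigned char)s[n]))
            n++;
        /* rule 2: dot-then-digit continues the name (acc.1), while
         * dot-then-letter ends it (b.eq is left for the driver). */
        if (s[n] == '.' && isdigit((unsigned char)s[n + 1])) {
            n++;
            continue;
        }
        break;
    }
    return n;
}
```

With this scanner, .Lcfblk.3 and acc.1 each come out as one token, b.eq stops after b, and .text yields length 0 so the caller can emit '.' followed by IDENT.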

# is a distinct token (ASM_TOK_HASH) because it is both the asm immediate marker and the cpp linemarker introducer; the driver disambiguates by position.

3. The arch-neutral driver — `src/asm/asm.c`

asm_parse runs the top loop: skip blank lines, skip a #-at-BOL cpp linemarker, dispatch .directive lines, treat IDENT : as a label definition, and otherwise hand IDENT [.suffix...] to the per-arch instruction parser. Composite mnemonics (b.eq, RISC-V fcvt.w.s, amoadd.d) arrive as IDENT '.' IDENT ... and are reassembled (maybe_compose_mnemonic) before dispatch; dotted directive/section names (.rodata.foo) are stitched the same way. The per-arch parser is created from the target's ArchImpl.asm_new; an arch without that hook is a clean panic.
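
The reassembly of a composite mnemonic can be pictured as joining an IDENT ('.' IDENT)* token run back into one string. The sketch below is illustrative only: the token struct, the TOK_* names, and compose_mnemonic are hypothetical, and it ignores whitespace adjacency, which the real maybe_compose_mnemonic may or may not check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token model for the example. */
enum tok_kind { TOK_IDENT, TOK_DOT, TOK_OTHER };
struct tok { enum tok_kind kind; const char *text; };

/* Joins IDENT ('.' IDENT)* starting at toks[0] into out.
 * Returns the number of tokens consumed, or 0 on mismatch or overflow. */
size_t compose_mnemonic(const struct tok *toks, size_t ntoks,
                        char *out, size_t cap)
{
    size_t used = 0, i = 0;

    if (ntoks == 0 || toks[0].kind != TOK_IDENT)
        return 0;
    for (;;) {
        size_t len = strlen(toks[i].text);
        if (used + len + 1 > cap)       /* keep room for the NUL */
            return 0;
        memcpy(out + used, toks[i].text, len);
        used += len;
        i++;
        /* Continue only on '.' that is followed by another IDENT. */
        if (i + 1 < ntoks && toks[i].kind == TOK_DOT &&
            toks[i + 1].kind == TOK_IDENT) {
            if (used + 2 > cap)
                return 0;
            out[used++] = '.';
            i++;
            continue;
        }
        break;
    }
    out[used] = '\0';
    return i;
}
```

Fed the tokens fcvt . w . s fa0, the function consumes five tokens and produces fcvt.w.s, leaving fa0 as the first operand token.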

State held by the driver. The driver tracks the current section plus three hashmaps: Sym→ObjSecId (sections), Sym→ObjSymId (symbols), and Sym→AsmEqu (.set/.equ constants). The symbol map ensures a forward reference (b foo before foo:) shares one ObjSymId with its later definition. New symbols are minted SB_LOCAL/SK_NOTYPE; after the parse, promote_undef_externs turns any still-undefined local into an undefined global. This matches GNU as, since a local UNDEF can't pull an archive member at link time.
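
A toy version of the symbol map shows both properties: a forward reference and its later definition share one id, and undefined locals are promoted after the parse. Everything here is a simplified assumption (a fixed linear table instead of a hashmap, a hypothetical SB_GLOBAL value and define_sym helper); only the names SB_LOCAL, intern_sym, and promote_undef_externs come from the text above, and their real signatures are not shown in this document.

```c
#include <stddef.h>
#include <string.h>

enum sym_bind { SB_LOCAL, SB_GLOBAL };          /* SB_GLOBAL is assumed */
struct sym { const char *name; enum sym_bind bind; int defined; };
struct symtab { struct sym syms[64]; size_t count; };

/* Returns the id for name, minting a new undefined local on first use.
 * The name pointer is borrowed, not copied. Returns -1 if the table is full. */
int intern_sym(struct symtab *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->syms[i].name, name) == 0)
            return (int)i;
    if (t->count == sizeof t->syms / sizeof t->syms[0])
        return -1;
    t->syms[t->count].name = name;
    t->syms[t->count].bind = SB_LOCAL;
    t->syms[t->count].defined = 0;
    return (int)t->count++;
}

/* Hypothetical helper: marks the symbol as defined (a label was seen). */
void define_sym(struct symtab *t, int id) { t->syms[id].defined = 1; }

/* Post-parse pass: every still-undefined local becomes an undefined global.
 * Returns how many symbols were promoted. */
size_t promote_undef_externs(struct symtab *t)
{
    size_t promoted = 0;
    for (size_t i = 0; i < t->count; i++) {
        if (!t->syms[i].defined && t->syms[i].bind == SB_LOCAL) {
            t->syms[i].bind = SB_GLOBAL;
            promoted++;
        }
    }
    return promoted;
}
```

A reference to foo followed by the label foo: resolves to the same id both times, so foo stays local; a symbol that is only ever referenced ends the parse as an undefined global.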

Directives. Section switches (.text/.data/.rodata/.bss/.section), symbol attributes (.globl/.local/.weak/.hidden/.type/.size/...), data emission (.byte/.short/.long/.quad, .ascii/.asciz/.string, .zero/.fill/.align/.p2align, .uleb128/.sleb128, .inst), .comm/.lcomm (SK_COMMON), .set/.equ. .section understands two dialects, the GNU form (,"flags",@type,entsize) and the Mach-O segname,sectname,type form, but kit as accepts only the dialect of its target object format, with no hybrid. CFI, .loc/.file, .option (RISC-V), and a handful of other directives are accepted and ignored so that a real .s from cc -S parses to completion; an unknown directive recovers by skipping to end-of-line.

Expression evaluator. A precedence-climbing evaluator over + - * / % << >> & | ^ ~ and parentheses. Pure-constant subexpressions fold; symbol-involving expressions are restricted to sym ± const. The lone . token is the location counter, valid only as sym - ., which the .long/.quad path turns into a PC-relative data relocation (R_PC32/R_PC64) rather than an absolute one. asm_driver_parse_const rejects any symbol; asm_driver_parse_sym_expr returns (sym, offset).
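
For readers unfamiliar with precedence climbing, here is a minimal constant-only evaluator over the same operator set. It is a sketch under stated assumptions, not the code in src/asm/asm.c: the names cexpr and eval_const are hypothetical, the precedence levels follow C (GNU as documents its own table, and this document does not say which one kit uses), >> is treated as a logical shift, and any symbol is rejected, in the spirit of asm_driver_parse_const.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

struct cexpr { const char *p; int err; };   /* cursor + sticky error flag */

static int64_t parse_binary(struct cexpr *e, int min_prec);

static void skip_ws(struct cexpr *e)
{
    while (*e->p == ' ' || *e->p == '\t') e->p++;
}

/* Unary operators, parentheses, and integer literals. */
static int64_t parse_unary(struct cexpr *e)
{
    skip_ws(e);
    char c = *e->p;
    if (c == '+') { e->p++; return parse_unary(e); }
    if (c == '-') { e->p++; return (int64_t)(0 - (uint64_t)parse_unary(e)); }
    if (c == '~') { e->p++; return ~parse_unary(e); }
    if (c == '(') {
        e->p++;
        int64_t v = parse_binary(e, 1);
        skip_ws(e);
        if (*e->p == ')') e->p++; else e->err = 1;
        return v;
    }
    if (isdigit((unsigned char)c)) {
        char *end;
        uint64_t v = strtoull(e->p, &end, 0);   /* base 0: 10, 0x10, 010 */
        e->p = end;
        return (int64_t)v;
    }
    e->err = 1;                     /* a symbol or garbage: not a constant */
    return 0;
}

/* Binary precedence (assumed C ordering); 0 means "not an operator". */
static int binop_prec(const char *p, int *len)
{
    *len = 1;
    switch (p[0]) {
    case '|': return 1;
    case '^': return 2;
    case '&': return 3;
    case '<': *len = 2; return p[1] == '<' ? 4 : 0;
    case '>': *len = 2; return p[1] == '>' ? 4 : 0;
    case '+': case '-': return 5;
    case '*': case '/': case '%': return 6;
    default:  return 0;
    }
}

static int64_t apply(struct cexpr *e, char op, int64_t l, int64_t r)
{
    uint64_t a = (uint64_t)l, b = (uint64_t)r;  /* wraps instead of UB */
    switch (op) {
    case '|': return (int64_t)(a | b);
    case '^': return (int64_t)(a ^ b);
    case '&': return (int64_t)(a & b);
    case '<': return b < 64 ? (int64_t)(a << b) : 0;
    case '>': return b < 64 ? (int64_t)(a >> b) : 0;
    case '+': return (int64_t)(a + b);
    case '-': return (int64_t)(a - b);
    case '*': return (int64_t)(a * b);
    default:                                    /* '/' and '%' */
        if (r == 0) { e->err = 1; return 0; }
        if (r == -1) return op == '/' ? (int64_t)(0 - a) : 0;
        return op == '/' ? l / r : l % r;
    }
}

/* Climb: fold operators whose precedence is at least min_prec. */
static int64_t parse_binary(struct cexpr *e, int min_prec)
{
    int64_t lhs = parse_unary(e);
    for (;;) {
        int len, prec;
        skip_ws(e);
        prec = binop_prec(e->p, &len);
        if (prec == 0 || prec < min_prec) return lhs;
        char op = *e->p;
        e->p += len;
        lhs = apply(e, op, lhs, parse_binary(e, prec + 1)); /* left-assoc */
    }
}

/* Returns 0 and stores the value, or -1 on a syntax error or symbol. */
int eval_const(const char *text, int64_t *out)
{
    struct cexpr e = { text, 0 };
    int64_t v = parse_binary(&e, 1);
    skip_ws(&e);
    if (e.err || *e.p != '\0') return -1;
    *out = v;
    return 0;
}
```

Passing prec + 1 on the recursive call is what makes the binary operators left-associative, so 10 - 4 - 3 folds as (10 - 4) - 3. A symbol-aware variant would return a (sym, offset) pair and reject anything other than sym ± const, as described above.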

Symbolic data relocations. A .quad sym+8 in .data goes through MCEmitter.emit_reloc_at against the existing RelocKind set — no new mechanism. The addend is pre-written into the data field (not zeroed): Mach-O REL relocs carry the addend implicitly in the relocated field, and ELF RELA overwrites it harmlessly, so both converge on sym + addend. kit codegen emits data relocs the same way.

Same-section branch relaxation. The per-arch parser emits a relocation for every symbolic branch target, because a forward b .Lfoo is only known to be local once .Lfoo: appears. After the parse, relax_local_branches resolves intra-section branches in place — compute the displacement, patch the instruction, drop the relocation — for PC-relative branch kinds whose target is a same-section, locally-bound, non-function symbol. The two guards match the two systems this must agree with: GNU as keeps the relocation on a global (preemptible) target, and kit codegen keeps the relocation on an intra-file call/tail-call to a function symbol while resolving branches to internal labels. This is what makes a control-flow-bearing cc -S | as reproduce cc -c's .text relocation table.
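
The relaxation pass can be sketched for one concrete case, the AArch64 unconditional B instruction, whose low 26 bits hold the word displacement. The structs, the SB_GLOBAL and SK_FUNC names, and the dropped flag are assumptions for illustration; the real relax_local_branches works on kit's own object and relocation types and covers more branch kinds.

```c
#include <stddef.h>
#include <stdint.h>

enum sym_bind { SB_LOCAL, SB_GLOBAL };      /* SB_GLOBAL is assumed */
enum sym_kind { SK_NOTYPE, SK_FUNC };       /* SK_FUNC is assumed */
struct sym { int section; uint64_t value; enum sym_bind bind;
             enum sym_kind kind; int defined; };
struct reloc { uint64_t offset; int sym; int pcrel_branch; int dropped; };

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void wr32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Same section, locally bound, not a function: the guards from the text. */
static int can_relax(const struct reloc *r, const struct sym *s, int section)
{
    return r->pcrel_branch && s->defined && s->section == section &&
           s->bind == SB_LOCAL && s->kind != SK_FUNC;
}

/* Patches each eligible B in place and marks its relocation dropped.
 * Returns the number of branches resolved. */
size_t relax_local_branches(uint8_t *code, int section,
                            struct reloc *relocs, size_t nrelocs,
                            const struct sym *syms)
{
    size_t relaxed = 0;
    for (size_t i = 0; i < nrelocs; i++) {
        struct reloc *r = &relocs[i];
        const struct sym *s = &syms[r->sym];
        if (r->dropped || !can_relax(r, s, section))
            continue;
        int64_t disp = (int64_t)(s->value - r->offset);
        if (disp % 4 != 0 || disp < -(1 << 27) || disp >= (1 << 27))
            continue;               /* out of B range: keep the reloc */
        uint32_t insn = rd32le(code + r->offset);
        insn = (insn & 0xFC000000u) | ((uint32_t)(disp / 4) & 0x03FFFFFFu);
        wr32le(code + r->offset, insn);
        r->dropped = 1;
        relaxed++;
    }
    return relaxed;
}
```

A branch to a same-section local label is patched and loses its relocation, while a branch to a function symbol in the same section keeps its relocation, mirroring what the text says kit codegen does for intra-file calls.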

4. Per-arch instruction parsers

src/arch/{aa64,x64,rv64}/asm.c. Each implements the ArchAsm vtable: a per-mnemonic dispatch that reads operand tokens through the driver helpers and emits the encoded bytes via the arch's ISA encoders — the same encoders the disassembler decodes through, so when an opcode bit moves the encoder and decoder update at one site and stay in sync by construction. Aliases (mov/neg/cmp/mul; AT&T size-suffix folding on x64; RISC-V pseudos like call/tail/la) are handled inside the parser, branching on operand shape where one mnemonic admits several forms.

Reloc-operator syntax (per-arch, per-object-format)

A symbolic operand can carry a relocation modifier. The spelling depends on both the architecture and the object format; each arch's parser accepts exactly the dialect of its target (asm_driver_compiler(d)->target.obj):

aarch64 — ELF spells modifiers as a :mod: prefix (:lo12:sym, :got:sym, :got_lo12:sym); Mach-O spells them as an @MOD suffix (sym@PAGE, sym@PAGEOFF, sym@GOTPAGE, sym@GOTPAGEOFF). Both map to the same internal AA64RelMod so downstream reloc emission is shared; the load/store :lo12: reloc is selected by access size. A bare adrp sym is the implicit page reloc on ELF, but Mach-O requires the explicit @PAGE.
x86-64 — RIP-relative memory operands use sym(%rip); the GOT form is sym@GOTPCREL(%rip); a call target may carry @PLT. A symbolic memory displacement that is not (%rip) is rejected.
rv64 — the %hi/%lo/%pcrel_hi/%pcrel_lo/%got_pcrel_hi operator syntax, identical on every object format. %pcrel_lo(label) names the AUIPC anchor label, not the target symbol, per the RISC-V ABI; the la, call, and tail pseudos expand to AUIPC+ADDI / AUIPC+JALR pairs with the appropriate paired relocations.
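
To make the "two spellings, one internal modifier" idea concrete for aarch64, the sketch below parses either dialect into a shared enum and rejects the dialect that does not belong to the target format. The enum values, table layout, and parse_reloc_operand are hypothetical stand-ins for AA64RelMod and the real parser; the pairing of :got: with @GOTPAGE and :got_lo12: with @GOTPAGEOFF reflects the usual meaning of those operators.

```c
#include <stddef.h>
#include <string.h>

enum obj_fmt { OBJ_ELF, OBJ_MACHO };
enum aa64_mod { MOD_NONE, MOD_PAGE, MOD_LO12, MOD_GOT_PAGE, MOD_GOT_LO12 };

struct mod_name { const char *name; enum aa64_mod mod; };

static const struct mod_name elf_mods[] = {
    { "lo12", MOD_LO12 }, { "got", MOD_GOT_PAGE }, { "got_lo12", MOD_GOT_LO12 },
};
static const struct mod_name macho_mods[] = {
    { "PAGE", MOD_PAGE }, { "PAGEOFF", MOD_LO12 },
    { "GOTPAGE", MOD_GOT_PAGE }, { "GOTPAGEOFF", MOD_GOT_LO12 },
};

static int lookup(const struct mod_name *tab, size_t n,
                  const char *s, size_t len, enum aa64_mod *out)
{
    for (size_t i = 0; i < n; i++) {
        if (strlen(tab[i].name) == len && memcmp(tab[i].name, s, len) == 0) {
            *out = tab[i].mod;
            return 0;
        }
    }
    return -1;
}

/* Splits text into (modifier, symbol) using only the target's dialect.
 * Returns 0 on success, -1 on an unknown or wrong-dialect modifier. */
int parse_reloc_operand(enum obj_fmt fmt, const char *text,
                        enum aa64_mod *mod, char *sym, size_t cap)
{
    const char *name = text;
    size_t name_len = strlen(text);

    *mod = MOD_NONE;    /* a bare symbol: the caller picks the implicit reloc */
    if (fmt == OBJ_ELF) {
        if (strchr(text, '@'))
            return -1;                      /* Mach-O suffix on ELF */
        if (text[0] == ':') {
            const char *end = strchr(text + 1, ':');
            if (!end || lookup(elf_mods, 3, text + 1,
                               (size_t)(end - (text + 1)), mod))
                return -1;
            name = end + 1;
            name_len = strlen(name);
        }
    } else {
        const char *at = strchr(text, '@');
        if (text[0] == ':')
            return -1;                      /* ELF prefix on Mach-O */
        if (at) {
            if (lookup(macho_mods, 4, at + 1, strlen(at + 1), mod))
                return -1;
            name_len = (size_t)(at - text);
        }
    }
    if (name_len == 0 || name_len >= cap)
        return -1;
    memcpy(sym, name, name_len);
    sym[name_len] = '\0';
    return 0;
}
```

Both :lo12:foo on ELF and foo@PAGEOFF on Mach-O come out as (MOD_LO12, "foo"), so everything downstream of the parser can be format-independent.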

5. Inline asm — `src/cg/asm.c` + per-arch template walkers

kit_cg_inline_asm (src/cg/asm.c) is the public-API constraint binder. It maps the GCC operand model onto the CG stack:

Constraints: r/=r/+r/=&r, i, m, and matching 0..9. +r is decomposed by the frontend into =r plus a synthesized matching input; the binder copies the lvalue's current value into the bound output register before the asm runs. Inputs are popped off the CG stack; =r outputs get fresh temp locals; early-clobber (=&r) outputs are allocated after inputs are bound and checked for collision. m operands resolve to an indirect (address-of) operand.
Clobbers: "memory" spills every live RES_LOCAL stack value (the same machinery cg uses across function calls); named register clobbers are resolved by the arch and force callee-save preservation. "cc" is benign.
The bound operands (combined outs then ins, GCC indexing) plus the template string are handed to CgTarget.asm_block. Results come back as fresh SValues pushed onto the stack.

The template walker lives per-arch (<arch>_asm_run_template, e.g. src/arch/aa64/asm.c). Rather than carry Operands inside tokens, it pre-substitutes placeholders into physical asm text and re-lexes that text through the same per-mnemonic parsers used by the standalone driver: one operand grammar, one lexer. It splits the template on \n and ; (honoring bracket depth and quote state), substitutes %N/%NN, width forms (%wN/%xN), the address form (%aN), and symbolic %[name]/%w[name] (resolved against the constraint's [name]), then drives each rendered line through asm_driver_open_inline, an AsmDriver built around a memory-backed lexer and the caller's MCEmitter that emits into cg's current section and does not allocate a default .text.
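
The pre-substitution step can be illustrated with a reduced aarch64-style renderer that handles only %N/%NN and the %wN/%xN width forms. It is a sketch under assumptions: render_template is a hypothetical name, operands are modeled as bare register numbers, %% as a literal percent follows GCC's extended-asm convention rather than anything stated here, and %aN, %[name], and line splitting are left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Renders tmpl into out, replacing %N, %wN and %xN with register names
 * built from regs[N]. Returns 0 on success, -1 on a bad placeholder,
 * an out-of-range operand index, or insufficient space. */
int render_template(const char *tmpl, const int *regs, size_t nregs,
                    char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    while (*tmpl) {
        char piece[16];
        int len;

        if (tmpl[0] != '%') {
            piece[0] = *tmpl++;             /* ordinary character */
            len = 1;
        } else if (tmpl[1] == '%') {
            piece[0] = '%';                 /* "%%" is a literal percent */
            len = 1;
            tmpl += 2;
        } else {
            char width = 'x';               /* default: 64-bit name */
            size_t idx;

            tmpl++;
            if (*tmpl == 'w' || *tmpl == 'x')
                width = *tmpl++;
            if (!isdigit((unsigned char)*tmpl))
                return -1;
            idx = (size_t)(*tmpl++ - '0');
            if (isdigit((unsigned char)*tmpl))      /* %NN */
                idx = idx * 10 + (size_t)(*tmpl++ - '0');
            if (idx >= nregs)
                return -1;
            len = snprintf(piece, sizeof piece, "%c%d", width, regs[idx]);
            if (len < 0 || (size_t)len >= sizeof piece)
                return -1;
        }
        if (used + (size_t)len + 1 > cap)   /* keep room for the NUL */
            return -1;
        memcpy(out + used, piece, (size_t)len);
        used += (size_t)len;
    }
    out[used] = '\0';
    return 0;
}
```

The rendered string is ordinary assembly text, which is why it can go straight back through the lexer and the per-mnemonic parsers with no inline-asm-specific operand grammar.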

CgTarget.asm_block is wired in each backend's native.c: the optimizer path binds the allocator's pre-assigned registers; a direct path self-allocates and saves/restores callee-clobbered registers around the block. CgTarget.file_scope_asm reuses the full standalone asm_parse over a memory lexer, so a top-level asm("...") is just a small .s translation unit. asm statements become an opaque IR_ASM_BLOCK instruction (IRAsmAux payload) that the optimizer records and replays like IR_CALL (see IR.md, OPT.md); asm volatile needs no special IR handling because the block is already opaque to passes.

6. `cc -S` — symbolized disassembly — `src/api/asm_emit.c`

cc -S does not have a separate textual back end. It disassembles the already-emitted object (arch_disasm_decode, the exact decoder used by objdump and kit_disasm; see OBJ.md) and re-spells the result as re-assemblable text. The work is in making operands and directives faithful enough that the output feeds back through the assembler to the same object.

Directive spelling — the AsmSyntax vtable. A tiny vtable selected by c->target.obj (not arch) supplies the format-specific directives: section_header, sym_type (ELF .type, Mach-O none), sym_size, align. Selecting by format is correct because an x64-ELF and aa64-ELF .s share .type/.size/.section; everything else (.globl, .comm, labels, data directives, instruction lines) is format-neutral and stays on the shared path.

Operand symbolization — the ArchRelocOperand inverse. For each relocation covering an instruction, arch_reloc_operand returns how that operand should be spelled: a prefix/suffix modifier (the inverse of the per-arch parser of §4), an addend_bias (x86-64 rel32 relocs store addend-4; the bias undoes it so the printed offset is the symbol offset), and a surgery kind telling the printer where in the operand text to splice the symref:

SURG_TAIL — replace the last comma component / whole operand (branch targets, aarch64 adrp).
SURG_MEM — rewrite the offset inside [...] (aarch64 load/store).
SURG_RIP — insert sym before disp(%rip) (x86-64); auto-selected whenever the operand text contains (%rip), so one reloc kind (R_PC32) serves both a branch and a RIP-relative memory operand.
SURG_RV_LO12 — RISC-V low-half: rewrite the displacement of a disp(base) load/store, or append %lo(...) as a new operand to a register-immediate form (the assembler folds it back into the ADDI).
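
As an illustration of SURG_RIP together with addend_bias, the sketch below rewrites the displacement in front of (%rip) into a symbol reference. The function name and the choice to replace the numeric displacement text outright are assumptions; the text above only says the symbol is spliced in at that position. The bias of 4 is the x86-64 rel32 case described for arch_reloc_operand.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Rewrites "... disp(%rip) ..." as "... sym[+off](%rip) ...", where
 * off = addend + addend_bias. Returns 0 on success, -1 if the operand
 * has no (%rip) or out is too small. */
int splice_rip_symref(const char *operand, const char *sym,
                      int64_t addend, int64_t addend_bias,
                      char *out, size_t cap)
{
    const char *rip = strstr(operand, "(%rip)");
    const char *disp;
    int64_t off = addend + addend_bias;
    int n;

    if (!rip)
        return -1;
    /* Walk back over the displacement text to the previous separator. */
    disp = rip;
    while (disp > operand && disp[-1] != ' ' && disp[-1] != '\t' &&
           disp[-1] != ',')
        disp--;

    if (off == 0)
        n = snprintf(out, cap, "%.*s%s%s",
                     (int)(disp - operand), operand, sym, rip);
    else
        n = snprintf(out, cap, "%.*s%s%+lld%s",
                     (int)(disp - operand), operand, sym,
                     (long long)off, rip);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

With a stored addend of -4 and a bias of 4, the printed operand is the bare symbol, which is the form the assembler's sym(%rip) parser accepts and re-encodes to the same relocation.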

Three printer-side reconstructions complete the round-trip:

Intra-section branch labels. A relaxed branch (§3) carries no relocation, so the disassembler prints a numeric target. arch_is_local_branch flags such mnemonics; the printer pre-scans for these targets, synthesizes a local label at each, and rewrites the operand to name it.
RISC-V hi/lo anchors. A %pcrel_hi (AUIPC) reloc sets emit_anchor, so the printer defines a unique .Lpcrel_hi_<sec>_<off>: label at it; the paired %pcrel_lo reloc sets ref_anchor and its operand names the nearest preceding anchor — not its own symbol. (Codegen's shared .LpcrelHi symbols are suppressed and replaced by these synthesized unique anchors.)
Call-pair fusion. arch_reloc_call_pair collapses a disassembled AUIPC+JALR pair (RISC-V R_RV_CALL) back into a single call/tail sym pseudo, reading the partner JALR's link register to choose call vs tail.

Data relocations are printed by emit_data_range as .quad sym+addend (bare symbol, no modifier), with a trailing - . for the PC-relative kinds so the assembler re-derives R_PC{32,64}. An undecodable instruction byte falls back to .byte 0x.., and an undecodable instruction word to .inst 0x... so nothing is silently dropped.

7. The encoding–relocation invariant

A code-location reference that an encoding-divergent assembler must be able to recompute never bakes a fixed function + byte-offset. Switch jump-table entries and &&label address-takes relocate against a per-block local symbol (mc_label_symbol, src/arch/mc.c) whose value is the label's offset — minted lazily as .Lcfblk.<id> and defined when the label is placed. A third-party assembler that re-encodes the function to different instruction lengths still resolves such an entry to the right address, where a fixed fn+offset would point into the wrong instruction. cc -S relies on exactly this: jump tables print as .quad .Lcfblk.* and re-assemble against the same local symbols. Only .L-prefixed (and ordinary) names are treated as re-assemblable operands; other dotted names (section symbols) keep the numeric form because the expression parser does not accept them.
