ASM — the assembler(s)
kit contains three textual-assembly surfaces that all feed the same object
path: a standalone GNU-as-compatible assembler (kit as, cc on a .s
input), an inline-asm("...") statement plumbed through codegen, and a
symbolizing disassembler-to-text printer (cc -S). They are deliberately
co-designed: the per-arch encoders are the single source of truth for bit
layout, the disassembler shares those encoders, and cc -S renders operands
as the inverse of what the assembler parses — so cc -S | as round-trips
cc -c (see TESTING.md). This document describes the layering
and the invariants; ISA encoding tables live in ARCH.md, object
emission in OBJ.md.
1. Layering
               .s text ──► AsmLexer ──► AsmDriver (arch-neutral)
                                          │  directives, labels, section/symbol state,
                                          │  expression evaluator, string decoding
                                          ▼
                                      ArchAsm vtable  .insn(driver, mnemonic)
                                          │  per-arch instruction parser
                                          ▼  (per-arch encoders == ISA source of truth)
                                      MCEmitter ──► ObjBuilder ──► ELF / Mach-O
                                          ▲
inline asm("...") ──► CgTarget.asm_block ─┘  (same MCEmitter, same encoders)

ObjBuilder ──► arch_disasm_decode ──► cc -S printer (asm_emit.c)
                      ▲               symbolizes operands, re-spells dirs
                      └── shared with objdump / kit_disasm
Three seams keep the design factored:
- driver ↔ per-arch parser: src/asm/asm_helpers.h. AsmDriver is opaque to per-arch code; the helper surface (peek/next, eat_punct, parse_const, parse_sym_expr, intern_sym, cur_section, panic) is the only contact.
- assembler/codegen ↔ object bytes: MCEmitter (src/arch/mc.c) is the byte/reloc sink for both the standalone assembler and C codegen, so a hand-written .s and a compiled .c produce structurally identical objects.
- printer ↔ per-arch operand syntax: ArchAsmOps (src/arch/arch.h), reached via arch_reloc_operand / arch_is_local_branch / arch_reloc_call_pair. This is the inverse of the per-arch reloc-operand parsers and keeps cc -S arch-agnostic but format-aware.
2. The lexer — src/asm/asm_lex.c
AsmLexer streams tokens from a borrowed source buffer. It intentionally keeps
C-like number/string spelling rules, because .s sources arrive after C
preprocessing and GNU as accepts those spellings in directives and
expressions. It does line-splice handling (phase-2 \<newline>) and treats
comments (//, /* */) as whitespace, but surfaces physical newlines as
ASM_TOK_NEWLINE so the driver can stay line-oriented.
Two classification quirks are load-bearing:
- .L-prefixed names lex as a single identifier, leading dot included. This is the universal GNU convention for assembler-local labels (.Lfoo, .LBB0_1, .Lcfblk.3). It is unambiguous against directives, since no directive starts with .L, so .text/.section still tokenize as '.' + IDENT and reach the directive dispatcher.
- name.N (dot-then-digit) continues an identifier so discriminator-mangled symbols (acc.1) survive, but . + letter does not glue, leaving mnemonic suffixes (b.eq) and the location-counter dot (. - foo) to be reassembled or evaluated by the driver.
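The two classification rules can be sketched as a single scanning function. This is a minimal illustration, not kit's lexer: scan_ident and its helpers are hypothetical names, and the identifier character set (letters, digits, underscore) is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helpers; the accepted character set is an assumption. */
static int is_ident_start(int c) { return isalpha(c) || c == '_'; }
static int is_ident_char(int c)  { return isalnum(c) || c == '_'; }

/* Returns the length of the identifier starting at s, or 0 if none starts there. */
size_t scan_ident(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    /* Rule 1: ".L" opens an assembler-local label; the dot is part of the name. */
    if (s[0] == '.' && s[1] == 'L')
        n = 2;
    else if (!is_ident_start((unsigned char)s[0]))
        return 0;

    for (;;) {
        while (is_ident_char((unsigned char)s[n]))
            n++;
        /* Rule 2: '.' followed by a digit continues the identifier (acc.1);
         * '.' followed by a letter ends it (b.eq stays IDENT '.' IDENT). */
        if (s[n] == '.' && isdigit((unsigned char)s[n + 1]))
            n++;
        else
            break;
    }
    return n;
}
```

Under these assumptions ".Lcfblk.3" scans as one nine-character identifier, "b.eq" yields only "b", and ".text" yields no identifier at all, so its dot is left for the directive path.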
# is a distinct token (ASM_TOK_HASH) because it is both the asm immediate
marker and the cpp linemarker introducer; the driver disambiguates by position.
3. The arch-neutral driver — src/asm/asm.c
asm_parse runs the top loop: skip blank lines, skip a #-at-BOL cpp
linemarker, dispatch .directive lines, treat IDENT : as a label
definition, and otherwise hand IDENT [.suffix...] to the per-arch instruction
parser. Composite mnemonics (b.eq, RISC-V fcvt.w.s, amoadd.d) arrive as
IDENT '.' IDENT ... and are reassembled (maybe_compose_mnemonic) before
dispatch; dotted directive/section names (.rodata.foo) are stitched the same
way. The per-arch parser is created from the target's ArchImpl.asm_new; an
arch without that hook fails with a clean panic.
State held by the driver. The current section plus three hashmaps: Sym→ObjSecId
(sections), Sym→ObjSymId (symbols), and Sym→AsmEqu (.set/.equ constants).
The symbol map ensures a forward reference (b foo before foo:) shares one
ObjSymId with its later definition. New symbols are minted SB_LOCAL/SK_NOTYPE
and, post-parse, promote_undef_externs turns any still-undefined local into an
undefined global — matching GNU as, since a local UNDEF can't pull an
archive member at link time.
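A minimal sketch of the post-parse promotion pass follows. The struct and enum are hypothetical simplifications made up for the example; kit's real symbol records live in ObjBuilder and are not shown in this document.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified symbol record (not kit's ObjBuilder types). */
enum sym_bind { BIND_LOCAL, BIND_GLOBAL, BIND_WEAK };

struct sym {
    const char   *name;
    enum sym_bind bind;
    int           defined;   /* nonzero once a label, .set, or .comm defined it */
};

/* Any symbol that is still local and undefined after the parse becomes an
 * undefined global so the linker can resolve it. Returns the number promoted. */
size_t promote_undefined_locals(struct sym *syms, size_t n)
{
    size_t promoted = 0;

    assert(syms != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!syms[i].defined && syms[i].bind == BIND_LOCAL) {
            syms[i].bind = BIND_GLOBAL;
            promoted++;
        }
    }
    return promoted;
}
```

Symbols that were explicitly marked (for example weak) or that gained a definition are left alone; only the default local-and-undefined state is rewritten.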
Directives. Section switches (.text/.data/.rodata/.bss/.section), symbol
attributes (.globl/.local/.weak/.hidden/.type/.size/...), data emission
(.byte/.short/.long/.quad, .ascii/.asciz/.string, .zero/.fill/.align/.p2align, .uleb128/.sleb128, .inst), .comm/.lcomm (SK_COMMON),
.set/.equ. .section parses both GNU (,"flags",@type,entsize) and — when
the target object format is Mach-O — the segname,sectname,type dialect; kit
as parses the dialect of its target only, no hybrid. CFI, .loc/.file,
.option (RISC-V), and a handful of other directives are accepted-and-ignored
so a real .s from cc -S parses to completion; an unknown directive recovers
by skipping to end-of-line.
Expression evaluator. A precedence-climbing evaluator over
+ - * / % << >> & | ^ ~ and parentheses. Pure-constant subexpressions fold;
symbol-involving expressions are restricted to sym ± const. The lone .
token is the location counter, valid only as sym - ., which the
.long/.quad path turns into a PC-relative data relocation (R_PC32/R_PC64)
rather than an absolute one. asm_driver_parse_const rejects any symbol;
asm_driver_parse_sym_expr returns (sym, offset).
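To make the constant-folding half concrete, here is a small precedence-climbing evaluator over the same operator set. It is a sketch under stated assumptions: the names are hypothetical, the precedence levels follow C (the document does not state kit's), only decimal literals are read, and symbols, the location counter, and error reporting are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant-only evaluator; precedence follows C by assumption.
 * Malformed input and division by zero are out of scope for the sketch. */
struct expr_cursor { const char *p; };

static void skip_ws(struct expr_cursor *c)
{
    while (*c->p == ' ' || *c->p == '\t')
        c->p++;
}

/* Binding power of the binary operator at p (0 if none); *len is its length. */
static int binop_prec(const char *p, int *len)
{
    *len = 1;
    switch (p[0]) {
    case '*': case '/': case '%': return 6;
    case '+': case '-':           return 5;
    case '<': if (p[1] == '<') { *len = 2; return 4; } return 0;
    case '>': if (p[1] == '>') { *len = 2; return 4; } return 0;
    case '&':                     return 3;
    case '^':                     return 2;
    case '|':                     return 1;
    default:                      return 0;
    }
}

static int64_t parse_expr(struct expr_cursor *c, int min_prec);

static int64_t parse_primary(struct expr_cursor *c)
{
    skip_ws(c);
    if (*c->p == '(') {
        c->p++;
        int64_t v = parse_expr(c, 1);
        skip_ws(c);
        if (*c->p == ')')
            c->p++;
        return v;
    }
    if (*c->p == '-') { c->p++; return -parse_primary(c); }
    if (*c->p == '~') { c->p++; return ~parse_primary(c); }
    int64_t v = 0;
    while (isdigit((unsigned char)*c->p))
        v = v * 10 + (*c->p++ - '0');
    return v;
}

static int64_t parse_expr(struct expr_cursor *c, int min_prec)
{
    int64_t lhs = parse_primary(c);
    for (;;) {
        int len, prec;
        skip_ws(c);
        prec = binop_prec(c->p, &len);
        if (prec == 0 || prec < min_prec)
            return lhs;
        char op = *c->p;
        c->p += len;
        /* Left-associative: the right operand absorbs only tighter operators. */
        int64_t rhs = parse_expr(c, prec + 1);
        switch (op) {
        case '*': lhs *= rhs;  break;
        case '/': lhs /= rhs;  break;
        case '%': lhs %= rhs;  break;
        case '+': lhs += rhs;  break;
        case '-': lhs -= rhs;  break;
        case '<': lhs <<= rhs; break;
        case '>': lhs >>= rhs; break;
        case '&': lhs &= rhs;  break;
        case '^': lhs ^= rhs;  break;
        case '|': lhs |= rhs;  break;
        }
    }
}

int64_t eval_const(const char *text)
{
    assert(text != NULL);
    struct expr_cursor c = { text };
    return parse_expr(&c, 1);
}
```

A symbol-aware version would return a (sym, offset) pair from the primary and reject any operator other than + or - once a symbol is involved, which is the sym ± const restriction described above.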
Symbolic data relocations. A .quad sym+8 in .data goes through
MCEmitter.emit_reloc_at against the existing RelocKind set — no new
mechanism. The addend is pre-written into the data field (not zeroed): Mach-O
REL relocs carry the addend implicitly in the relocated field, and ELF RELA
overwrites it harmlessly, so both converge on sym + addend. kit codegen
emits data relocs the same way.
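The convergence argument can be checked with a toy model of the two resolver styles. Everything here is hypothetical (function names, the 64-bit absolute relocation, host byte order); it only illustrates why pre-writing the addend is safe for both formats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical link-time resolvers for a 64-bit absolute data relocation.
 * field points at the 8-byte slot; sym_addr is the resolved symbol address. */

/* What the assembler does for `.quad sym+8`: pre-write the addend. */
void emit_quad_with_addend(uint8_t *field, int64_t addend)
{
    uint64_t v = (uint64_t)addend;
    assert(field != NULL);
    memcpy(field, &v, sizeof v);
}

/* REL style: the addend lives in the field, so the resolver adds the symbol
 * address to whatever is already stored there. */
void resolve_rel64(uint8_t *field, uint64_t sym_addr)
{
    uint64_t stored;
    memcpy(&stored, field, sizeof stored);
    stored += sym_addr;
    memcpy(field, &stored, sizeof stored);
}

/* RELA style: the addend travels in the relocation record and the previous
 * field contents are overwritten. */
void resolve_rela64(uint8_t *field, uint64_t sym_addr, int64_t addend)
{
    uint64_t value = sym_addr + (uint64_t)addend;
    memcpy(field, &value, sizeof value);
}
```

Running both resolvers on a field pre-written with the same addend leaves identical bytes, which is the sym + addend result the paragraph describes.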
Same-section branch relaxation. The per-arch parser emits a relocation for
every symbolic branch target, because a forward b .Lfoo is only known to be
local once .Lfoo: appears. After the parse, relax_local_branches resolves
intra-section branches in place — compute the displacement, patch the
instruction, drop the relocation — for PC-relative branch kinds whose target
is a same-section, locally-bound, non-function symbol. The two guards match
the two systems this must agree with: GNU as keeps the relocation on a global
(preemptible) target, and kit codegen keeps the relocation on an intra-file
call/tail-call to a function symbol while resolving branches to internal
labels. This is what makes a control-flow-bearing cc -S | as reproduce
cc -c's .text relocation table.
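The relaxation predicate and its two guards can be written down compactly. The record layouts and names below are hypothetical, and the displacement helper assumes an ISA that measures from the branch instruction itself (as aarch64 and RISC-V do); x86-64 measures from the end of the instruction and would need the instruction length added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified records; field names are illustrative only. */
struct rsym {
    int      section;
    bool     defined;
    bool     is_local;
    bool     is_function;
    uint64_t offset;
};

struct reloc {
    int                section;
    uint64_t           offset;              /* where the branch instruction sits */
    bool               pc_relative_branch;
    const struct rsym *target;
};

/* True when the branch can be resolved at assembly time and its reloc dropped. */
bool can_relax(const struct reloc *r)
{
    assert(r != NULL && r->target != NULL);
    const struct rsym *t = r->target;
    return r->pc_relative_branch
        && t->defined
        && t->section == r->section   /* same section: the distance is fixed */
        && t->is_local                /* guard 1: global targets stay preemptible */
        && !t->is_function;           /* guard 2: calls to functions keep their reloc */
}

/* Byte distance from the branch instruction to its target. */
int64_t branch_displacement(const struct reloc *r)
{
    assert(r != NULL && r->target != NULL);
    return (int64_t)r->target->offset - (int64_t)r->offset;
}
```

A pass built on this would patch the encoded displacement and delete the relocation only when can_relax holds, leaving every other relocation for the linker.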
4. Per-arch instruction parsers
src/arch/{aa64,x64,rv64}/asm.c. Each implements the ArchAsm vtable: a
per-mnemonic dispatch that reads operand tokens through the driver helpers and
emits the encoded bytes via the arch's ISA encoders — the same encoders the
disassembler decodes through, so when an opcode bit moves the encoder and
decoder update at one site and stay in sync by construction. Aliases
(mov/neg/cmp/mul; AT&T size-suffix folding on x64; RISC-V pseudos like
call/tail/la) are handled inside the parser, branching on operand shape
where one mnemonic admits several forms.
Reloc-operator syntax (per-arch, per-object-format)
A symbolic operand can carry a relocation modifier. The spelling depends on
both the architecture and the object format; each arch's parser accepts exactly
the dialect of its target (asm_driver_compiler(d)->target.obj):
- aarch64: ELF spells modifiers as a :mod: prefix (:lo12:sym, :got:sym, :got_lo12:sym); Mach-O spells them as an @MOD suffix (sym@PAGE, sym@PAGEOFF, sym@GOTPAGE, sym@GOTPAGEOFF). Both map to the same internal AA64RelMod so downstream reloc emission is shared; the load/store :lo12: reloc is selected by access size. A bare adrp sym is the implicit page reloc on ELF, but Mach-O requires the explicit @PAGE.
- x86-64: RIP-relative memory operands use sym(%rip); the GOT form is sym@GOTPCREL(%rip); a call target may carry @PLT. A symbolic memory displacement that is not (%rip) is rejected.
- rv64: the %hi/%lo/%pcrel_hi/%pcrel_lo/%got_pcrel_hi operator syntax, identical on every object format. %pcrel_lo(label) names the AUIPC anchor label, not the target symbol, per the RISC-V ABI; the la, call, and tail pseudos expand to AUIPC+ADDI / AUIPC+JALR pairs with the appropriate paired relocations.
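For the aarch64 case, the idea of two spellings mapping onto one internal modifier can be sketched as below. The enumerators are invented for the example (the document names AA64RelMod but not its members), and a real parser would reject the foreign dialect instead of returning MOD_NONE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enums; not kit's AA64RelMod. */
enum rel_mod { MOD_NONE, MOD_PAGE, MOD_PAGEOFF, MOD_GOT_PAGE, MOD_GOT_PAGEOFF };
enum obj_fmt { FMT_ELF, FMT_MACHO };

/* Classify the modifier on a symbolic operand such as ":lo12:sym" (ELF) or
 * "sym@PAGEOFF" (Mach-O). Only the target format's dialect is recognised. */
enum rel_mod classify_modifier(const char *operand, enum obj_fmt fmt)
{
    assert(operand != NULL);
    if (fmt == FMT_ELF) {
        if (strncmp(operand, ":lo12:", 6) == 0)      return MOD_PAGEOFF;
        if (strncmp(operand, ":got_lo12:", 10) == 0) return MOD_GOT_PAGEOFF;
        if (strncmp(operand, ":got:", 5) == 0)       return MOD_GOT_PAGE;
        return MOD_NONE;   /* bare symbol: the mnemonic implies the reloc */
    }
    const char *at = strchr(operand, '@');
    if (at == NULL)                     return MOD_NONE;
    if (strcmp(at, "@PAGE") == 0)       return MOD_PAGE;
    if (strcmp(at, "@PAGEOFF") == 0)    return MOD_PAGEOFF;
    if (strcmp(at, "@GOTPAGE") == 0)    return MOD_GOT_PAGE;
    if (strcmp(at, "@GOTPAGEOFF") == 0) return MOD_GOT_PAGEOFF;
    return MOD_NONE;
}
```

Because both dialects land on the same enum value, everything after this point (reloc selection, emission) does not need to know which spelling the source used.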
5. Inline asm — src/cg/asm.c + per-arch template walkers
kit_cg_inline_asm (src/cg/asm.c) is the public-API constraint binder. It
maps the GCC operand model onto the CG stack:
- Constraints: r/=r/+r/=&r, i, m, and matching 0..9. +r is decomposed by the frontend into =r plus a synthesized matching input; the binder copies the lvalue's current value into the bound output register before the asm runs. Inputs are popped off the CG stack; =r outputs get fresh temp locals; early-clobber (=&r) outputs are allocated after inputs are bound and checked for collision. m operands resolve to an indirect (address-of) operand.
- Clobbers: "memory" spills every live RES_LOCAL stack value (the same machinery cg uses across function calls); named register clobbers are resolved by the arch and force callee-save preservation. "cc" is benign.
- The bound operands (combined outs then ins, GCC indexing) plus the template string are handed to CgTarget.asm_block. Results come back as fresh SValues pushed onto the stack.
The template walker lives per-arch (<arch>_asm_run_template, e.g.
src/arch/aa64/asm.c). Rather than carry Operands inside tokens, it
pre-substitutes placeholders into physical asm text and re-lexes that text
through the same per-mnemonic parsers used by the standalone driver — one
operand grammar, one lexer. It splits the template on \n and ;, honoring bracket
depth and quote state, substitutes %N/%NN, width forms (%wN/%xN),
address form (%aN), and symbolic %[name]/%w[name] (resolved against the
constraint's [name]), then drives each rendered line through
asm_driver_open_inline — an AsmDriver built around a memory-backed lexer and
the caller's MCEmitter that emits into cg's current section and does not
allocate a default .text.
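The split-then-substitute step might look roughly like this. Both functions are hypothetical and cover only the simplest forms: statement boundaries at \n or ; outside brackets and quotes, and single-digit %N placeholders. Width forms, %aN, %[name], and %% are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers, not kit's template walker. */

/* End of the statement starting at s: the first '\n' or ';' that is outside
 * [...] and outside a "..." string, or the terminating NUL. */
const char *statement_end(const char *s)
{
    int depth = 0, in_quote = 0;

    assert(s != NULL);
    for (; *s; s++) {
        if (in_quote) {
            if (*s == '\\' && s[1]) s++;          /* skip the escaped character */
            else if (*s == '"') in_quote = 0;
        } else if (*s == '"') in_quote = 1;
        else if (*s == '[') depth++;
        else if (*s == ']' && depth > 0) depth--;
        else if ((*s == '\n' || *s == ';') && depth == 0) break;
    }
    return s;
}

/* Replace %0..%9 in one statement with the bound operand text. Writes at most
 * cap-1 characters plus a NUL; returns 0 on success, -1 on overflow or an
 * out-of-range operand index. */
int substitute_operands(const char *stmt, size_t len,
                        const char *const *operands, size_t n_operands,
                        char *out, size_t cap)
{
    size_t o = 0;

    assert(stmt != NULL && out != NULL);
    if (cap == 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        const char *piece = &stmt[i];
        size_t piece_len = 1;
        if (stmt[i] == '%' && i + 1 < len && isdigit((unsigned char)stmt[i + 1])) {
            size_t idx = (size_t)(stmt[i + 1] - '0');
            if (idx >= n_operands)
                return -1;
            piece = operands[idx];
            piece_len = strlen(piece);
            i++;                                   /* consume the digit */
        }
        if (o + piece_len >= cap)
            return -1;
        memcpy(out + o, piece, piece_len);
        o += piece_len;
    }
    out[o] = '\0';
    return 0;
}
```

The rendered line is ordinary assembly text, which is what lets it go back through the same lexer and per-mnemonic parsers as a standalone .s file.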
CgTarget.asm_block is wired in each backend's native.c: the optimizer path
binds the allocator's pre-assigned registers; a direct path self-allocates and
saves/restores callee-clobbered registers around the block.
CgTarget.file_scope_asm reuses the full standalone asm_parse over a memory
lexer, so a top-level asm("...") is just a small .s translation unit. asm
statements become an opaque IR_ASM_BLOCK instruction (IRAsmAux payload) that
the optimizer records and replays like IR_CALL (see IR.md,
OPT.md); asm volatile needs no special IR handling because the block
is already opaque to passes.
6. cc -S — symbolized disassembly — src/api/asm_emit.c
cc -S does not have a separate textual back end. It disassembles the
already-emitted object (arch_disasm_decode, the exact decoder used by
objdump and kit_disasm; see OBJ.md) and re-spells the result as
re-assemblable text. The work is in making operands and directives faithful
enough that the output feeds back through the assembler to the same object.
Directive spelling — the AsmSyntax vtable. A tiny vtable selected by
c->target.obj (not arch) supplies the format-specific directives:
section_header, sym_type (ELF .type, Mach-O none), sym_size, align.
Selecting by format is correct because an x64-ELF and aa64-ELF .s share
.type/.size/.section; everything else (.globl, .comm, labels, data
directives, instruction lines) is format-neutral and stays on the shared path.
Operand symbolization — the ArchRelocOperand inverse. For each
relocation covering an instruction, arch_reloc_operand returns how that
operand should be spelled: a prefix/suffix modifier (the inverse of the
per-arch parser of §4), an addend_bias (x86-64 rel32 relocs store addend-4;
the bias undoes it so the printed offset is the symbol offset), and a
surgery kind telling the printer where in the operand text to splice the
symref:
- SURG_TAIL: replace the last comma component / whole operand (branch targets, aarch64 adrp).
- SURG_MEM: rewrite the offset inside [...] (aarch64 load/store).
- SURG_RIP: insert sym before disp(%rip) (x86-64); auto-selected whenever the operand text contains (%rip), so one reloc kind (R_PC32) serves both a branch and a RIP-relative memory operand.
- SURG_RV_LO12: RISC-V low-half: rewrite the displacement of a disp(base) load/store, or append %lo(...) as a new operand to a register-immediate form (the assembler folds it back into the ADDI).
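As an illustration of the bias plus splice for the (%rip) case, the sketch below rewrites a single disassembled memory operand. The function name is hypothetical, and it assumes the numeric displacement is replaced outright by the symbol reference; the document does not spell out that detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical splice in the spirit of SURG_RIP. operand is one memory operand
 * as the disassembler printed it, e.g. "0x0(%rip)". stored_addend is the
 * relocation's addend; bias undoes the encoding adjustment (4 for a rel32
 * whose stored addend is the symbol offset minus 4). Returns 0 on success. */
int splice_rip_operand(const char *operand, const char *sym,
                       long stored_addend, long bias,
                       char *out, size_t cap)
{
    assert(operand != NULL && sym != NULL && out != NULL);
    const char *rip = strstr(operand, "(%rip)");
    if (rip == NULL)
        return -1;                       /* not a RIP-relative operand */

    long off = stored_addend + bias;     /* printed offset is the symbol offset */
    int n;
    if (off == 0)
        n = snprintf(out, cap, "%s%s", sym, rip);
    else
        n = snprintf(out, cap, "%s%+ld%s", sym, off, rip);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

With a stored addend of -4 the operand prints as a bare sym(%rip), which is the form the x86-64 parser of §4 accepts and re-encodes with the same addend.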
Three printer-side reconstructions complete the round-trip:
- Intra-section branch labels. A relaxed branch (§3) carries no relocation, so the disassembler prints a numeric target. arch_is_local_branch flags such mnemonics; the printer pre-scans for these targets, synthesizes a local label at each, and rewrites the operand to name it.
- RISC-V hi/lo anchors. A %pcrel_hi (AUIPC) reloc sets emit_anchor, so the printer defines a unique .Lpcrel_hi_<sec>_<off>: label at it; the paired %pcrel_lo reloc sets ref_anchor and its operand names the nearest preceding anchor, not its own symbol. (Codegen's shared .LpcrelHi symbols are suppressed and replaced by these synthesized unique anchors.)
- Call-pair fusion. arch_reloc_call_pair collapses a disassembled AUIPC+JALR pair (RISC-V R_RV_CALL) back into a single call/tail sym pseudo, reading the partner JALR's link register to choose call vs tail.
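The "nearest preceding anchor" lookup for a %pcrel_lo operand is a plain lower-bound search. The function below is a hypothetical sketch that assumes the printer keeps the anchor offsets of one section in ascending order; how kit actually stores them is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup: anchors[] holds the section offsets of AUIPC
 * instructions that carry a %pcrel_hi reloc, in ascending order. Returns the
 * index of the nearest anchor strictly before lo_offset, or -1 if none. */
long nearest_preceding_anchor(const uint64_t *anchors, size_t n, uint64_t lo_offset)
{
    size_t lo = 0, hi = n;

    assert(anchors != NULL || n == 0);
    /* Invariant: every anchor below index lo is < lo_offset,
     * every anchor at or above index hi is >= lo_offset. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (anchors[mid] < lo_offset)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (long)lo - 1;
}
```

The returned index selects which synthesized .Lpcrel_hi_<sec>_<off> label the %pcrel_lo operand should name.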
Data relocations are printed by emit_data_range as .quad sym+addend (bare
symbol, no modifier), with a trailing - . for the PC-relative kinds so the
assembler re-derives R_PC{32,64}. An undecodable instruction byte falls back
to .byte 0x.., and an undecodable instruction word to .inst 0x.., so nothing
is silently dropped.
7. The encoding–relocation invariant
A code-location reference that an encoding-divergent assembler must be able to
recompute is never baked in as a fixed function + byte offset. Switch jump-table
entries and &&label address-takes relocate against a per-block local
symbol (mc_label_symbol, src/arch/mc.c) whose value is the label's
offset — minted lazily as .Lcfblk.<id> and defined when the label is placed.
A third-party assembler that re-encodes the function to different instruction
lengths still resolves such an entry to the right address, where a fixed
fn+offset would point into the wrong instruction. cc -S relies on exactly
this: jump tables print as .quad .Lcfblk.* and re-assemble against the same
local symbols. Only .L-prefixed (and ordinary) names are treated as
re-assemblable operands; other dotted names (section symbols) keep the numeric
form because the expression parser does not accept them.
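A toy model shows why the label-relative form survives re-encoding. The functions are hypothetical; lengths[i] stands for the encoded size of instruction i under one assembler's choices, and the caller is assumed to pass an instruction index no larger than the array length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: lengths[i] is the size in bytes that one particular
 * assembler chose for instruction i of a function. */

/* Offset of a label placed in front of instruction insn. */
uint64_t label_offset(const uint8_t *lengths, size_t insn)
{
    uint64_t off = 0;

    assert(lengths != NULL);
    for (size_t i = 0; i < insn; i++)
        off += lengths[i];
    return off;
}

/* True if byte offset off is the start of an instruction (or the function end). */
bool is_insn_boundary(const uint8_t *lengths, size_t n, uint64_t off)
{
    uint64_t at = 0;

    assert(lengths != NULL);
    for (size_t i = 0; i < n && at < off; i++)
        at += lengths[i];
    return at == off;
}
```

If one assembler encodes the first instruction in 2 bytes and another in 6, a label before the second instruction moves from offset 2 to offset 6 along with the code, while a baked-in fn+2 would now point into the middle of the first instruction.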