kit

git clone https://git.ryansepassi.com/git/kit.git

ASM — the assembler(s)

kit contains three textual-assembly surfaces that all feed the same object path: a standalone GNU-as-compatible assembler (kit as, cc on a .s input), an inline-asm("...") statement plumbed through codegen, and a symbolizing disassembler-to-text printer (cc -S). They are deliberately co-designed: the per-arch encoders are the single source of truth for bit layout, the disassembler shares those encoders, and cc -S renders operands as the inverse of what the assembler parses — so cc -S | as round-trips cc -c (see TESTING.md). This document describes the layering and the invariants; ISA encoding tables live in ARCH.md, object emission in OBJ.md.


1. Layering

  .s text ──► AsmLexer ──► AsmDriver (arch-neutral)
                              │  directives, labels, section/symbol state,
                              │  expression evaluator, string decoding
                              ▼
                          ArchAsm vtable .insn(driver, mnemonic)
                              │  per-arch instruction parser
                              ▼ (per-arch encoders == ISA source of truth)
                          MCEmitter ──► ObjBuilder ──► ELF / Mach-O
                                          ▲
  inline asm("...") ──► CgTarget.asm_block ─┘ (same MCEmitter, same encoders)

  ObjBuilder ──► arch_disasm_decode ──► cc -S printer (asm_emit.c)
                       ▲                  symbolizes operands, re-spells dirs
                       └── shared with objdump / kit_disasm

Three seams keep the design factored:


2. The lexer — src/asm/asm_lex.c

AsmLexer streams tokens from a borrowed source buffer. It intentionally keeps C-like number/string spelling rules, because .s sources arrive after C preprocessing and GNU as accepts those spellings in directives and expressions. It does line-splice handling (phase-2 \<newline>) and treats comments (//, /* */) as whitespace, but surfaces physical newlines as ASM_TOK_NEWLINE so the driver can stay line-oriented.

Two classification quirks are load-bearing:

# is a distinct token (ASM_TOK_HASH) because it is both the asm immediate marker and the cpp linemarker introducer; the driver disambiguates by position.
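The positional rule can be sketched as follows. This is a minimal illustration, not kit's code: `hash_is_linemarker` is a hypothetical helper, and treating leading blanks as still "beginning of line" is an assumption the text above does not confirm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not kit's API: decides whether the '#' at
 * line[hash_idx] introduces a cpp linemarker (it is the first non-blank
 * character on the line) or marks an immediate (anything precedes it). */
static bool hash_is_linemarker(const char *line, size_t hash_idx)
{
    assert(line != NULL && line[hash_idx] == '#');
    for (size_t i = 0; i < hash_idx; i++) {
        if (line[i] != ' ' && line[i] != '\t')
            return false;   /* a token precedes it: immediate marker */
    }
    return true;            /* only blanks before it: linemarker */
}
```

With this rule, the `#` in `# 1 "foo.s"` is a linemarker, while the `#` in `mov x0, #1` is an immediate marker.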


3. The arch-neutral driver — src/asm/asm.c

asm_parse runs the top loop: skip blank lines, skip a #-at-BOL cpp linemarker, dispatch .directive lines, treat IDENT : as a label definition, and otherwise hand IDENT [.suffix...] to the per-arch instruction parser. Composite mnemonics (b.eq, RISC-V fcvt.w.s, amoadd.d) arrive as IDENT '.' IDENT ... and are reassembled (maybe_compose_mnemonic) before dispatch; dotted directive/section names (.rodata.foo) are stitched the same way. The per-arch parser is created from the target's ArchImpl.asm_new; an arch without that hook is a clean panic.
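A rough sketch of the mnemonic stitching described above. The token struct and `compose_mnemonic` are hypothetical stand-ins (the real `maybe_compose_mnemonic` and token type are not shown in this document); the sketch only illustrates joining IDENT '.' IDENT runs into one name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token shape for illustration; kit's real token type differs. */
enum tok_kind { TOK_IDENT, TOK_DOT, TOK_OTHER };
struct tok { enum tok_kind kind; const char *text; };

/* Joins IDENT ('.' IDENT)* starting at toks[0] into out, for example
 * {b} {.} {eq} becomes "b.eq". Returns the number of tokens consumed,
 * or 0 if toks[0] is not an identifier or out is too small. */
static size_t compose_mnemonic(const struct tok *toks, size_t n,
                               char *out, size_t cap)
{
    assert(toks != NULL && out != NULL);
    if (n == 0 || toks[0].kind != TOK_IDENT)
        return 0;
    size_t len = strlen(toks[0].text);
    if (len + 1 > cap)
        return 0;
    memcpy(out, toks[0].text, len + 1);
    size_t i = 1;
    /* Extend only while a '.' is immediately followed by an identifier. */
    while (i + 1 < n && toks[i].kind == TOK_DOT &&
           toks[i + 1].kind == TOK_IDENT) {
        size_t part = strlen(toks[i + 1].text);
        if (len + 1 + part + 1 > cap)
            return 0;
        out[len++] = '.';
        memcpy(out + len, toks[i + 1].text, part + 1);
        len += part;
        i += 2;
    }
    return i;
}
```

Given the tokens for `fcvt.w.s a0`, the sketch consumes five tokens and yields `fcvt.w.s`, leaving `a0` for the operand parser.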

State held by the driver. The current section plus three hashmaps: Sym→ObjSecId (sections), Sym→ObjSymId (symbols), and Sym→AsmEqu (.set/.equ constants). The symbol map ensures a forward reference (b foo before foo:) shares one ObjSymId with its later definition. New symbols are minted SB_LOCAL/SK_NOTYPE; post-parse, promote_undef_externs turns any still-undefined local into an undefined global. This matches GNU as, since a local UNDEF can't pull an archive member at link time.
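The forward-reference and promotion behaviour might look like this in miniature. Everything here is hypothetical: a fixed array stands in for the hashmap, and `sym_intern`, `sym_define`, and `promote_undefined` are illustrative names rather than kit functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of the driver's symbol map; a linear array
 * stands in for the real hashmap. */
enum bind { BIND_LOCAL, BIND_GLOBAL };
struct sym { const char *name; enum bind bind; bool defined; };
struct symtab { struct sym syms[16]; size_t count; };

/* Returns the id for name, minting an undefined local on first sight, so
 * a forward reference and the later definition share one id. */
static int sym_intern(struct symtab *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->syms[i].name, name) == 0)
            return (int)i;
    if (t->count == sizeof t->syms / sizeof t->syms[0])
        return -1;  /* table full */
    t->syms[t->count] = (struct sym){ name, BIND_LOCAL, false };
    return (int)t->count++;
}

/* Marks name as defined (a "name:" label was seen). */
static int sym_define(struct symtab *t, const char *name)
{
    int id = sym_intern(t, name);
    if (id >= 0)
        t->syms[id].defined = true;
    return id;
}

/* Post-parse pass: a local that never got a definition becomes an
 * undefined global so the linker can resolve it. */
static void promote_undefined(struct symtab *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (!t->syms[i].defined && t->syms[i].bind == BIND_LOCAL)
            t->syms[i].bind = BIND_GLOBAL;
}
```

A reference to `foo` followed by `foo:` resolves to the same id and stays local; a reference to `puts` with no definition ends up as an undefined global.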

Directives. Section switches (.text/.data/.rodata/.bss/.section), symbol attributes (.globl/.local/.weak/.hidden/.type/.size/...), data emission (.byte/.short/.long/.quad, .ascii/.asciz/.string, .zero/.fill/.align/.p2align, .uleb128/.sleb128, .inst), .comm/.lcomm (SK_COMMON), and .set/.equ. .section parses the GNU form (,"flags",@type,entsize) or, when the target object format is Mach-O, the segname,sectname,type dialect; kit as parses only the dialect of its target and never a hybrid of the two. CFI, .loc/.file, .option (RISC-V), and a handful of other directives are accepted and ignored so that a real .s from cc -S parses to completion; an unknown directive recovers by skipping to end-of-line.

Expression evaluator. A precedence-climbing evaluator over + - * / % << >> & | ^ ~ and parentheses. Pure-constant subexpressions fold; symbol-involving expressions are restricted to sym ± const. The lone . token is the location counter, valid only as sym - ., which the .long/.quad path turns into a PC-relative data relocation (R_PC32/R_PC64) rather than an absolute one. asm_driver_parse_const rejects any symbol; asm_driver_parse_sym_expr returns (sym, offset).

Symbolic data relocations. A .quad sym+8 in .data goes through MCEmitter.emit_reloc_at against the existing RelocKind set — no new mechanism. The addend is pre-written into the data field (not zeroed): Mach-O REL relocs carry the addend implicitly in the relocated field, and ELF RELA overwrites it harmlessly, so both converge on sym + addend. kit codegen emits data relocs the same way.
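The convergence argument can be checked with a toy model of the two relocation styles. The functions below are hypothetical and model only the linker-side arithmetic for a 64-bit absolute field; they are not kit's or any linker's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical linker-side model. The assembler is assumed to have already
 * written the addend into the 8-byte field. */

/* REL style (implicit addend): result = symbol value + field contents. */
static void apply_rel64(uint8_t *field, uint64_t sym_value)
{
    assert(field != NULL);
    uint64_t implicit;
    memcpy(&implicit, field, sizeof implicit);
    uint64_t v = sym_value + implicit;
    memcpy(field, &v, sizeof v);
}

/* RELA style (explicit addend): the field's prior contents are ignored
 * and overwritten. */
static void apply_rela64(uint8_t *field, uint64_t sym_value, int64_t addend)
{
    assert(field != NULL);
    uint64_t v = sym_value + (uint64_t)addend;
    memcpy(field, &v, sizeof v);
}
```

For `.quad sym+8` with `sym` at 0x1000, both paths leave 0x1008 in the field, which is why pre-writing the addend is harmless for RELA and required for REL.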

Same-section branch relaxation. The per-arch parser emits a relocation for every symbolic branch target, because a forward b .Lfoo is only known to be local once .Lfoo: appears. After the parse, relax_local_branches resolves intra-section branches in place — compute the displacement, patch the instruction, drop the relocation — for PC-relative branch kinds whose target is a same-section, locally-bound, non-function symbol. The two guards match the two systems this must agree with: GNU as keeps the relocation on a global (preemptible) target, and kit codegen keeps the relocation on an intra-file call/tail-call to a function symbol while resolving branches to internal labels. This is what makes a control-flow-bearing cc -S | as reproduce cc -c's .text relocation table.


4. Per-arch instruction parsers

src/arch/{aa64,x64,rv64}/asm.c. Each implements the ArchAsm vtable: a per-mnemonic dispatch that reads operand tokens through the driver helpers and emits the encoded bytes via the arch's ISA encoders — the same encoders the disassembler decodes through, so when an opcode bit moves the encoder and decoder update at one site and stay in sync by construction. Aliases (mov/neg/cmp/mul; AT&T size-suffix folding on x64; RISC-V pseudos like call/tail/la) are handled inside the parser, branching on operand shape where one mnemonic admits several forms.

Reloc-operator syntax (per-arch, per-object-format)

A symbolic operand can carry a relocation modifier. The spelling depends on both the architecture and the object format; each arch's parser accepts exactly the dialect of its target (asm_driver_compiler(d)->target.obj):


5. Inline asm — src/cg/asm.c + per-arch template walkers

kit_cg_inline_asm (src/cg/asm.c) is the public-API constraint binder. It maps the GCC operand model onto the CG stack:

The template walker lives per-arch (<arch>_asm_run_template, e.g. src/arch/aa64/asm.c). Rather than carry Operands inside tokens, it pre-substitutes placeholders into physical asm text and re-lexes that text through the same per-mnemonic parsers used by the standalone driver — one operand grammar, one lexer. It splits the template on \n/; honoring bracket depth and quote state, substitutes %N/%NN, width forms (%wN/%xN), address form (%aN), and symbolic %[name]/%w[name] (resolved against the constraint's [name]), then drives each rendered line through asm_driver_open_inline — an AsmDriver built around a memory-backed lexer and the caller's MCEmitter that emits into cg's current section and does not allocate a default .text.
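The pre-substitution idea can be sketched for the numeric and width forms only. `render_template` is a hypothetical function with an AArch64 flavour: it assumes each operand is already bound to a general register number and that a bare %N spells the x-register name, neither of which the text above specifies. Address forms, named operands, and line splitting are omitted.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: regs[i] is the register number bound to operand i.
 * Handles %N, %NN, %wN and %xN. Returns the rendered length, or -1 on a
 * bad placeholder or when out is too small. */
static int render_template(const char *tmpl, const unsigned *regs,
                           size_t nregs, char *out, size_t cap)
{
    assert(tmpl != NULL && out != NULL);
    size_t len = 0;
    for (const char *p = tmpl; *p != '\0'; ) {
        if (*p != '%') {                      /* ordinary text: copy through */
            if (len + 1 >= cap)
                return -1;
            out[len++] = *p++;
            continue;
        }
        p++;                                  /* skip '%' */
        char width = 'x';                     /* assumed default register view */
        if (*p == 'w' || *p == 'x')
            width = *p++;
        if (!isdigit((unsigned char)*p))
            return -1;
        size_t idx = 0;
        while (isdigit((unsigned char)*p)) {
            idx = idx * 10 + (size_t)(*p++ - '0');
            if (idx >= nregs)
                return -1;                    /* operand index out of range */
        }
        int n = snprintf(out + len, cap - len, "%c%u", width, regs[idx]);
        if (n < 0 || (size_t)n >= cap - len)
            return -1;
        len += (size_t)n;
    }
    if (len >= cap)
        return -1;
    out[len] = '\0';
    return (int)len;
}
```

With operands bound to registers 3 and 7, `add %0, %0, %1` renders as `add x3, x3, x7`, which is then ordinary assembly text for the shared parser.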

CgTarget.asm_block is wired in each backend's native.c: the optimizer path binds the allocator's pre-assigned registers; a direct path self-allocates and saves/restores callee-clobbered registers around the block. CgTarget.file_scope_asm reuses the full standalone asm_parse over a memory lexer, so a top-level asm("...") is just a small .s translation unit. asm statements become an opaque IR_ASM_BLOCK instruction (IRAsmAux payload) that the optimizer records and replays like IR_CALL (see IR.md, OPT.md); asm volatile needs no special IR handling because the block is already opaque to passes.


6. cc -S — symbolized disassembly — src/api/asm_emit.c

cc -S does not have a separate textual back end. It disassembles the already-emitted object (arch_disasm_decode, the exact decoder used by objdump and kit_disasm; see OBJ.md) and re-spells the result as re-assemblable text. The work is in making operands and directives faithful enough that the output feeds back through the assembler to the same object.

Directive spelling — the AsmSyntax vtable. A tiny vtable selected by c->target.obj (not arch) supplies the format-specific directives: section_header, sym_type (ELF .type, Mach-O none), sym_size, align. Selecting by format is correct because an x64-ELF and aa64-ELF .s share .type/.size/.section; everything else (.globl, .comm, labels, data directives, instruction lines) is format-neutral and stays on the shared path.
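A cut-down sketch of selecting directive spelling by object format. Only the `sym_type` hook is shown; the struct, enum, and function names are hypothetical, and the `%function` spelling of the ELF type operand is an assumption rather than something this document states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical one-hook cut of the vtable; the real one also carries
 * section_header, sym_size and align. */
enum obj_format { OBJ_ELF, OBJ_MACHO };

struct asm_syntax {
    /* Writes the directive into buf; returns its length, 0 for none. */
    int (*sym_type)(char *buf, size_t cap, const char *sym);
};

static int elf_sym_type(char *buf, size_t cap, const char *sym)
{
    assert(buf != NULL && sym != NULL);
    return snprintf(buf, cap, "\t.type\t%s, %%function\n", sym);
}

static int macho_sym_type(char *buf, size_t cap, const char *sym)
{
    assert(buf != NULL);
    (void)sym;                 /* Mach-O has no symbol-type directive */
    if (cap > 0)
        buf[0] = '\0';
    return 0;
}

static const struct asm_syntax ELF_SYNTAX = { elf_sym_type };
static const struct asm_syntax MACHO_SYNTAX = { macho_sym_type };

/* Selection keys on the object format only, never on the architecture. */
static const struct asm_syntax *asm_syntax_for(enum obj_format f)
{
    return f == OBJ_MACHO ? &MACHO_SYNTAX : &ELF_SYNTAX;
}
```

The printer would call `asm_syntax_for(fmt)->sym_type(...)` once per function symbol and emit whatever comes back, so the shared path never tests the format itself.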

Operand symbolization — the ArchRelocOperand inverse. For each relocation covering an instruction, arch_reloc_operand returns how that operand should be spelled: a prefix/suffix modifier (the inverse of the per-arch parser of §4), an addend_bias (x86-64 rel32 relocs store addend-4; the bias undoes it so the printed offset is the symbol offset), and a surgery kind telling the printer where in the operand text to splice the symref:

Three printer-side reconstructions complete the round-trip:

Data relocations are printed by emit_data_range as .quad sym+addend (bare symbol, no modifier), with a trailing - . for the PC-relative kinds so the assembler re-derives R_PC{32,64}. An undecodable instruction byte falls back to .byte 0x.., and an undecodable instruction word to .inst 0x..., so nothing is silently dropped.
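The data-relocation spelling can be sketched as a small formatter. `format_data_reloc` is hypothetical and is not `emit_data_range`; the tab layout and the choice to omit a zero addend are assumptions made for the illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter mirroring the spelling described above:
 * ".quad sym+addend", with " - ." appended for PC-relative kinds.
 * Returns the snprintf result. */
static int format_data_reloc(char *buf, size_t cap, unsigned width_bytes,
                             const char *sym, int64_t addend,
                             bool pc_relative)
{
    assert(buf != NULL && sym != NULL);
    assert(width_bytes == 4 || width_bytes == 8);
    const char *dir = width_bytes == 8 ? ".quad" : ".long";
    const char *tail = pc_relative ? " - ." : "";
    if (addend == 0)           /* assumption: a zero addend is not printed */
        return snprintf(buf, cap, "\t%s\t%s%s\n", dir, sym, tail);
    /* %+ forces an explicit sign, giving sym+8 or sym-4. */
    return snprintf(buf, cap, "\t%s\t%s%+" PRId64 "%s\n",
                    dir, sym, addend, tail);
}
```

An absolute 64-bit reference to `foo` at addend 8 prints as `.quad foo+8`; a 32-bit PC-relative one prints as `.long foo+8 - .`, which the expression evaluator of §3 accepts as sym - . and maps back to a PC-relative relocation.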


7. The encoding–relocation invariant

A code-location reference that an encoding-divergent assembler must be able to recompute never bakes a fixed function + byte-offset. Switch jump-table entries and &&label address-takes relocate against a per-block local symbol (mc_label_symbol, src/arch/mc.c) whose value is the label's offset — minted lazily as .Lcfblk.<id> and defined when the label is placed. A third-party assembler that re-encodes the function to different instruction lengths still resolves such an entry to the right address, where a fixed fn+offset would point into the wrong instruction. cc -S relies on exactly this: jump tables print as .quad .Lcfblk.* and re-assemble against the same local symbols. Only .L-prefixed (and ordinary) names are treated as re-assemblable operands; other dotted names (section symbols) keep the numeric form because the expression parser does not accept them.
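A toy model shows why a baked offset goes stale. Here a function is a run of blocks whose encoded sizes depend on the assembler, and a label's value is recomputed from those sizes. `label_offset` and `label_name` are hypothetical helpers, not `mc_label_symbol`; only the `.Lcfblk.<id>` spelling comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model: block_sizes[i] is the encoded size of block i under
 * some assembler; a label's value is the start offset of its block. */
static uint64_t label_offset(const uint32_t *block_sizes, size_t nblocks,
                             size_t block_id)
{
    assert(block_sizes != NULL && block_id < nblocks);
    uint64_t off = 0;
    for (size_t i = 0; i < block_id; i++)
        off += block_sizes[i];
    return off;
}

/* Spells the per-block local symbol name; returns the snprintf result. */
static int label_name(char *buf, size_t cap, unsigned id)
{
    assert(buf != NULL);
    return snprintf(buf, cap, ".Lcfblk.%u", id);
}
```

If one assembler encodes three blocks as 12, 8, and 20 bytes, block 2 starts at offset 20; if another encodes the first two as 10 and 6 bytes, it starts at 16. A jump-table entry written as `fn+20` would then land mid-instruction, while `.quad .Lcfblk.2` resolves to 16 because the symbol is defined where the label is placed.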