boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

tcc arm64 assembler — design

Working doc. Adds an arm64-asm.c to vendored tcc 0.9.26 so the ARM64-target build accepts .S inputs and __asm__("…") blocks. Lands in phases: a narrow first cut covering exactly what the repo's .S files need today, then incremental extension to reach parity with riscv64-asm.c and (modulo x86 quirks) i386-asm.c.

Goal of this doc: lock the internal shape so phase 1 is a genuine subset of the final assembler — not a stub we throw away.

Why this exists

vendor/upstream/tcc-0.9.26.tar.gz ships per-target asm:

arch            file                      notes
x86_64 / i386   i386-asm.c (1720 LoC)     shared by both targets
arm             arm-asm.c (94 LoC)        stub: directives only, every opcode → tcc_error
riscv64         riscv64-asm.c (856 LoC)   real assembler
arm64           (none)                    absent; CONFIG_TCC_ASM undefined for TCC_TARGET_ARM64

Today the boot2 Makefile compensates for the arm64 gap by cross-asm'ing tcc-cc/aarch64/start.S and tcc-libc/aarch64/{start,sys_stubs}.S through host clang -target aarch64-linux-gnu (Makefile:386–410). This doc describes the assembler that lets us delete that.

Phase 1 — narrow scope

Cover exactly the mnemonics the in-tree .S fixtures use, plus the directive surface tccasm.c already handles.

Mnemonics required by tcc-cc/aarch64/start.S and tcc-libc/aarch64/{start,sys_stubs}.S:

mnemonic   forms used                                  encoding family
mov        mov xN, #imm (incl. negative); mov xN, xM   movz/movn/movk + ORR(reg) alias
add        add xN, sp, #imm                            add (immediate, 64-bit)
ldr        ldr xN, [xM], ldr xN, [sp]                  LDR (immediate, unsigned offset, 64-bit)
bl         bl <symbol>                                 BL → R_AARCH64_CALL26
b          b . (self-loop)                             B → R_AARCH64_JUMP26 (or in-section fixup)
ret        ret (uses x30)                              RET (Xn=30 default)
svc        svc #0, svc #imm16                          SVC
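The "incl. negative" case in the mov row can be sketched as a choice between MOVZ and MOVN. This is a minimal illustration, not the assembler's code: encode_mov_imm64 is a hypothetical name, it handles only 64-bit immediates that fit a single instruction, and multi-instruction movk sequences are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode "mov xd, #imm" as one MOVZ or one MOVN.
 * Returns 0 on success, -1 if the value needs more than one instruction. */
static int encode_mov_imm64(uint32_t *out, unsigned rd, uint64_t imm)
{
    assert(out != 0 && rd < 32);

    /* MOVZ: exactly one 16-bit chunk is allowed to be non-zero. */
    for (unsigned hw = 0; hw < 4; hw++) {
        uint64_t mask = (uint64_t)0xffff << (16 * hw);
        if ((imm & ~mask) == 0) {
            uint32_t imm16 = (uint32_t)(imm >> (16 * hw));
            *out = 0xD2800000u | (hw << 21) | (imm16 << 5) | rd;
            return 0;
        }
    }

    /* MOVN: same test on the bitwise inverse; covers small negatives. */
    uint64_t inv = ~imm;
    for (unsigned hw = 0; hw < 4; hw++) {
        uint64_t mask = (uint64_t)0xffff << (16 * hw);
        if ((inv & ~mask) == 0) {
            uint32_t imm16 = (uint32_t)(inv >> (16 * hw));
            *out = 0x92800000u | (hw << 21) | (imm16 << 5) | rd;
            return 0;
        }
    }
    return -1;
}
```

Under these assumptions mov x8, #93 comes out as a MOVZ and mov x0, #-1 as MOVN x0, #0.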

Registers: x0-x30, w0-w30, sp, xzr, wzr. (fp=x29, lr=x30, ip0=x16, ip1=x17 are aliases — defer to phase 2.)
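As a rough sketch of the phase-1 register set, the mapping from name to register number could look like the following. It works on strings for the sake of a self-contained example; the real assembler would match TOK_ASM_* tokens instead, and both RegInfo and parse_reg are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type; the real code would fill an Operand. */
typedef struct RegInfo {
    uint8_t reg;    /* 0..31 */
    uint8_t is_w;   /* 1 for wN / wzr */
    uint8_t is_sp;  /* 1 only when the text was "sp" */
} RegInfo;

/* Returns 0 on success, -1 if name is not a phase-1 register. */
static int parse_reg(const char *name, RegInfo *out)
{
    assert(name != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    if (strcmp(name, "sp") == 0)  { out->reg = 31; out->is_sp = 1; return 0; }
    if (strcmp(name, "xzr") == 0) { out->reg = 31; return 0; }
    if (strcmp(name, "wzr") == 0) { out->reg = 31; out->is_w = 1; return 0; }

    if (name[0] != 'x' && name[0] != 'w')
        return -1;              /* fp, lr, ip0, ip1 are deferred to phase 2 */

    size_t digits = strlen(name + 1);
    if (digits < 1 || digits > 2)
        return -1;
    if (digits == 2 && name[1] == '0')
        return -1;              /* reject forms such as x01 */

    unsigned n = 0;
    for (size_t i = 1; name[i] != '\0'; i++) {
        if (name[i] < '0' || name[i] > '9')
            return -1;
        n = n * 10 + (unsigned)(name[i] - '0');
    }
    if (n > 30)
        return -1;              /* 31 is only reachable as sp/xzr/wzr */

    out->reg = (uint8_t)n;
    out->is_w = (uint8_t)(name[0] == 'w');
    return 0;
}
```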

Directives: anything tccasm.c already drives — .globl, .text, .data, .byte, .word, .quad, .ascii, .asciz, .align, .skip, .section, .previous, labels, .set. Phase 1 does not need to add code here; pulling in tccasm.c is automatic once arm64-asm.c defines CONFIG_TCC_ASM.

Inline __asm__ constraint plumbing (subst_asm_operand, asm_compute_constraints, asm_gen_code) follows riscv64-asm.c's "defined but no-op" pattern in phase 1: .S files work, full constraint-based inline asm doesn't yet. Same posture upstream shipped riscv64 with.

Phase 1 acceptance: the existing tcc-libc and tcc-cc suites pass on ARCH=aarch64 with the host cross-asm path removed from the Makefile (start/sys_stubs assembled by tcc-boot2 itself).

File layout

arm64-asm.c     new — opcode table, parser, encoders   (~600 LoC at parity)
arm64-tok.h     new — DEF_ASM(...) for regs + mnemonics (~150 LoC at parity)
tcctok.h        +3 lines: include arm64-tok.h under TCC_TARGET_ARM64
tcc.h           +1 line:  include arm64-asm.c in the per-target block
libtcc.c        +1 line:  same, in the ONE_SOURCE block

Patches go in scripts/simple-patches/tcc-0.9.26/ and apply via stage1-flatten.sh's apply_our_patch mechanism — same shape as the existing arm64 patches (arm64-stdarg-array, arm64-va-pointer-operand, arm64-va-arg-pointer). New arm64-asm.c and arm64-tok.h ship as straight files in scripts/simple-patches/tcc-0.9.26/files/ and are copied into $SRC by the flatten script before preprocessing.

Internal shape

The narrow set is small enough to write linearly, but it's worth ten minutes more to put the real ARM64 ISA encoding model in place on day one so phase 2 is "add table rows," not "rewrite."

Operand model

enum {
    OPT_REG,        /* X/W register, 0..31 (sp/zr distinguished by use) */
    OPT_SHIFT_REG,  /* Xn[, lsl/lsr/asr/ror #imm]  */
    OPT_EXT_REG,    /* Xn[, uxtw/sxtw/sxtx #imm]   */
    OPT_IMM,        /* unparsed signed/unsigned immediate */
    OPT_IMM12,      /* add/sub immediate: 12-bit + optional lsl#12 */
    OPT_LOG_IMM,    /* and/orr/eor logical immediate (N:imms:immr) */
    OPT_MOV_IMM,    /* movz/movk/movn 16-bit + hw shift */
    OPT_MEM,        /* [Xn], [Xn,#imm], [Xn,Xm{,ext}], pre/post indexed */
    OPT_LABEL,      /* symbol+addend; resolves to PC-rel reloc          */
    OPT_COND,       /* eq/ne/lt/... (4-bit cond code)                   */
    OPT_SYS,        /* sysreg encoding for mrs/msr — phase 3            */
};

typedef struct Operand {
    uint32_t kind;          /* OPT_* */
    uint8_t  is_w;          /* 0=64-bit, 1=32-bit (X vs W) */
    uint8_t  is_sp;         /* 1 if textual form was sp (vs xzr) */
    union {
        struct { uint8_t reg; uint8_t shift_kind; uint8_t shift_amt; } r;
        struct { uint8_t base, idx, ext_kind, ext_amt;
                 int32_t disp; uint8_t mode; /* off|preidx|postidx */ } m;
        ExprValue e;        /* immediates and label refs */
        uint8_t  cond;
    };
} Operand;

The kind enum is the type signature instruction encoders match against. Phase 1 uses only OPT_REG, OPT_IMM, OPT_MEM (simple base+offset variant), and OPT_LABEL. Adding the rest is "new enum value + new parse path"; encoders not yet handling them just expect("supported operand") until they do.

is_w, is_sp — AArch64-specific. Wn and Xn share register numbers; the encoder needs to know the size to set the sf bit. sp and xzr both encode as register 31 but are not interchangeable per-instruction; track which token the user wrote.
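To illustrate why both flags matter, here is a minimal sketch of an ADD (immediate) encoder. The function name and its flat parameter list are assumptions made for the example; the real helper would take Operand values. is_w selects the sf bit, and register 31 is accepted only when the user wrote sp, since this instruction form never reads or writes the zero register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of ADD (immediate): sf 0 0 100010 sh imm12 Rn Rd.
 * Returns 0 on success, -1 for an operand this form cannot encode. */
static int encode_add_imm(uint32_t *out, int is_w,
                          unsigned rd, int rd_is_sp,
                          unsigned rn, int rn_is_sp,
                          uint32_t imm)
{
    assert(out != 0 && rd < 32 && rn < 32);

    /* Register 31 means SP in both slots; xzr/wzr is not encodable here. */
    if ((rd == 31 && !rd_is_sp) || (rn == 31 && !rn_is_sp))
        return -1;

    /* imm12, optionally shifted left by 12. */
    uint32_t sh = 0;
    if (imm > 0xfff) {
        if ((imm & 0xfff) != 0 || (imm >> 12) > 0xfff)
            return -1;
        imm >>= 12;
        sh = 1;
    }

    uint32_t sf = is_w ? 0u : 1u;   /* X form sets bit 31 */
    *out = (sf << 31) | 0x11000000u | (sh << 22) | (imm << 10) | (rn << 5) | rd;
    return 0;
}
```

With these inputs, add x0, sp, #16 encodes to 0x910043E0, while the same call with is_sp cleared on the base register is rejected.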

Encoder organization

One static helper per ARM64 instruction format, mirroring riscv64-asm.c's asm_emit_i / asm_emit_r / asm_emit_u / asm_emit_s. Helpers are grouped by encoding family, following the Arm ARM (section C4):

helper                 covers
emit_dp_imm_addsub     add/sub/cmp/cmn (immediate)
emit_dp_imm_log        and/orr/eor/tst (immediate)
emit_dp_imm_mov        movz/movk/movn (incl. mov aliases)
emit_dp_imm_bitfield   sbfm/ubfm/bfm + sxtw/uxtb/lsl-imm aliases
emit_dp_reg_addsub     add/sub/cmp shifted-reg + extended-reg
emit_dp_reg_log        and/orr/eor/bic shifted-reg
emit_dp_reg_shift      lslv/lsrv/asrv/rorv + lsl-reg aliases
emit_dp_reg_mul        madd/msub/mul/mneg/smull/umull
emit_dp_reg_csel       csel/csinc/csinv/csneg + cset/cinc aliases
emit_ldst_imm          ldr/str (immediate, unsigned + pre/post)
emit_ldst_reg          ldr/str (register offset, with extend)
emit_ldst_pair         ldp/stp (incl. pre/post indexed)
emit_ldst_pseudo_eq    ldr Xn, =imm64 / =sym (inline lowering, see below)
emit_branch_imm        b/bl + label reloc
emit_branch_cond       b.cond + label reloc
emit_branch_cmp        cbz/cbnz/tbz/tbnz
emit_branch_reg        br/blr/ret
emit_system            svc/hvc/smc/brk/hint (nop/yield/wfe/wfi)

Phase 1 implements only emit_dp_imm_addsub, emit_dp_imm_mov, emit_ldst_imm, emit_branch_imm, emit_branch_reg (just ret), and emit_system (just svc). Each helper has the full ISA-format encoding from day one — phase 1 just feeds it the narrow operand shapes.
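For a feel of what the phase-1 load, return and system helpers have to produce, the three encodings can be sketched as below. The function names and signatures are illustrative assumptions, not the in-tree helpers, and only the unsigned-offset LDR form is shown.

```c
#include <assert.h>
#include <stdint.h>

/* LDR (immediate, unsigned offset). The byte offset must be a multiple of
 * the access size (8 for Xt, 4 for Wt) and is stored scaled in imm12. */
static int encode_ldr_uimm(uint32_t *out, int is_w,
                           unsigned rt, unsigned rn, uint32_t byte_off)
{
    assert(out != 0 && rt < 32 && rn < 32);
    unsigned scale = is_w ? 2u : 3u;
    if (byte_off & ((1u << scale) - 1u))
        return -1;                          /* misaligned offset */
    uint32_t imm12 = byte_off >> scale;
    if (imm12 > 0xfff)
        return -1;                          /* out of range */
    *out = (is_w ? 0xB9400000u : 0xF9400000u) | (imm12 << 10) | (rn << 5) | rt;
    return 0;
}

/* RET Xn; a bare "ret" passes rn = 30. */
static uint32_t encode_ret(unsigned rn)
{
    assert(rn < 32);
    return 0xD65F0000u | (rn << 5);
}

/* SVC #imm16. */
static int encode_svc(uint32_t *out, uint32_t imm16)
{
    assert(out != 0);
    if (imm16 > 0xffff)
        return -1;
    *out = 0xD4000001u | (imm16 << 5);
    return 0;
}
```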

asm_opcode dispatch

Same shape as riscv64-asm.c's tail switch: outer switch on the TOK groups dispatching to a per-family parser, which parses operands, validates kinds, and calls the matching emit_* helper. Adding mnemonics in phase 2 is a new case TOK_ASM_xxx: plus, if needed, a new shared parser routine.
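The dispatch shape can be sketched in miniature as a switch that maps a mnemonic token to its encoding family. The token values and the Family enum below are stand-ins invented for this example; in the real file the tokens come from DEF_ASM in arm64-tok.h and each case would call a per-family parser rather than return a tag.

```c
#include <assert.h>

/* Hypothetical stand-ins for the generated TOK_ASM_* values. */
enum {
    TOK_ASM_add = 1, TOK_ASM_b, TOK_ASM_bl, TOK_ASM_ldr,
    TOK_ASM_mov, TOK_ASM_ret, TOK_ASM_svc
};

/* Hypothetical family tags, one per emit_* helper used in phase 1. */
typedef enum Family {
    FAM_NONE,
    FAM_DP_IMM_ADDSUB,
    FAM_DP_IMM_MOV,
    FAM_LDST_IMM,
    FAM_BRANCH_IMM,
    FAM_BRANCH_REG,
    FAM_SYSTEM
} Family;

/* Outer switch: group tokens by family; unknown tokens fall through. */
static Family family_for_token(int tok)
{
    assert(tok >= 0);
    switch (tok) {
    case TOK_ASM_add:
        return FAM_DP_IMM_ADDSUB;
    case TOK_ASM_mov:
        return FAM_DP_IMM_MOV;
    case TOK_ASM_ldr:
        return FAM_LDST_IMM;
    case TOK_ASM_b:
    case TOK_ASM_bl:
        return FAM_BRANCH_IMM;      /* b and bl share one parser */
    case TOK_ASM_ret:
        return FAM_BRANCH_REG;
    case TOK_ASM_svc:
        return FAM_SYSTEM;
    default:
        return FAM_NONE;            /* caller reports an unknown opcode */
    }
}
```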

Label & relocation interface

bl <sym> / b <sym> emit zero into the instruction word and call greloca(cur_text_section, sym, ind, R_AARCH64_CALL26 /* or JUMP26 */, 0) — both reloc types are already handled by arm64-link.c:30-46. Local backward references (b ., numbered labels) resolve through the symbol table the same way tccasm.c already wires for other arches: asm_new_label defines, asm_expr resolves, the relocation collapses at link time.
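For the in-section case, or to see what the linker does when it resolves CALL26/JUMP26, the imm26 field computation can be sketched as follows. encode_branch_imm is a hypothetical name; the sketch assumes the byte distance from the branch to its target is already known.

```c
#include <assert.h>
#include <stdint.h>

/* B / BL: the target is PC-relative, word-aligned, within +/-128 MiB.
 * byte_delta is (target address - address of this instruction).
 * Returns 0 on success, -1 if the target is misaligned or out of range. */
static int encode_branch_imm(uint32_t *out, int is_bl, int64_t byte_delta)
{
    assert(out != 0);
    if (byte_delta & 3)
        return -1;
    if (byte_delta < -(INT64_C(1) << 27) || byte_delta >= (INT64_C(1) << 27))
        return -1;

    /* Two's-complement word offset, truncated to 26 bits. */
    uint32_t imm26 = (uint32_t)((uint64_t)byte_delta >> 2) & 0x03ffffffu;
    *out = (is_bl ? 0x94000000u : 0x14000000u) | imm26;
    return 0;
}
```

A self-loop (b .) has a delta of zero, so the word is just the bare B opcode 0x14000000.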

Symbol address loads (the ldr Xn, =sym pseudo, and any phase-2 movz/movk-via-:abs_g0:/:abs_g1:/etc. modifiers) emit the same 4-instruction movz/movk chain that arm64-gen.c:431-440 uses for compiler-emitted loads: R_AARCH64_MOVW_UABS_G0_NC + G1_NC + G2_NC + G3. adrp+add is deliberately not used — arm64-gen.c:425-430 documents that the ±4GB ADR_PREL_PG range fails on tcc's static layout and the MOVW chain is the working idiom. The relocs are exercised by every existing tcc-built ARM64 binary, so this is well-trodden ground.
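The shape of the 4-instruction chain can be sketched with a known constant in place of a symbol. In the assembler proper the four imm16 fields would be emitted as zero and filled by the MOVW_UABS_G0_NC..G3 relocations at link time; emit_abs64_chain is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Build movz xd, #g0 ; movk xd, #g1, lsl #16 ; movk xd, #g2, lsl #32 ;
 * movk xd, #g3, lsl #48 for a 64-bit value. */
static void emit_abs64_chain(uint32_t insn[4], unsigned rd, uint64_t value)
{
    assert(insn != 0 && rd < 32);
    for (unsigned hw = 0; hw < 4; hw++) {
        uint32_t imm16 = (uint32_t)((value >> (16 * hw)) & 0xffff);
        /* First word is MOVZ (clears the register), the rest are MOVK. */
        uint32_t opcode = (hw == 0) ? 0xD2800000u : 0xF2800000u;
        insn[hw] = opcode | (hw << 21) | (imm16 << 5) | rd;
    }
}
```

Each relocation in the chain patches the same bits 5..20 of its instruction with the corresponding 16-bit slice of the symbol address.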

Inline-asm constraint plumbing

Phase 1 stubs:

ST_FUNC void subst_asm_operand(CString *s, SValue *sv, int mod) {
    tcc_error("ARM64 inline asm operands not implemented yet");
}
ST_FUNC void asm_compute_constraints(...) { /* no-op */ }
ST_FUNC void asm_gen_code(...) { /* no-op */ }
ST_FUNC void asm_clobber(uint8_t *cr, const char *str) {
    /* parse register name, mark cr[reg]=1 — copy from riscv64-asm.c */
}
ST_FUNC int asm_parse_regvar(int t) { /* x0..xzr, w0..wzr → 0..31 */ }

This is enough for .S files, top-level __asm__("…") strings, and the __asm__("name") symbol-rename form. Constraint-driven register allocation (__asm__("…" : "=r"(out) : "r"(in))) lights up in phase 3 once subst_asm_operand + asm_compute_constraints are real — straight port from i386-asm.c's template logic adapted to ARM64 register names; no surprises.

Phase plan

Phase 1 (this design): mov/add/ldr/bl/b/ret/svc, with operand kinds restricted to OPT_REG/OPT_IMM/OPT_MEM (base+disp)/OPT_LABEL. Acceptance: .S files in tcc-cc/tcc-libc assemble through tcc-boot2; the Makefile drops the TCC_ASM dance for ARCH=aarch64.

Phase 2 (implemented) — broadens to riscv64-parity coverage. Surface added:

arm64_encode_bimm64 and arm64_movimm from arm64-gen.c are called directly: under ONE_SOURCE (the bootstrap pipeline) both .c files inhabit the same TU sequentially, so the static helpers are visible to arm64-asm.c without ST_FUNC promotion.

Pre-existing limitation: mes-libc's strtoull truncates via strtol, so 64-bit hex literals (e.g. #0xff00ff00ff00ff00) get clamped at parse-time. Computed expressions (#1<<32) work around it. Out of scope to fix in this phase — surfaces in any asm path that takes wide immediates and is unrelated to the encoder logic.

tests2/73_arm64.c does not exist upstream — the doc was speculative. Validation instead is the in-tree .S round-trip plus a hand-checked build/phase2-test/test.S fixture covering each new family.

Phase 3 — full inline-asm constraint surface (subst_asm_operand + asm_compute_constraints). Ports the i386-asm.c template walk; ARM64-specific bits are operand modifier letters (%w0 for W-form, %x0 for X-form) and clobber semantics.
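The modifier-letter part of that work can be sketched as a small formatting routine. Everything here is an assumption for illustration: the name format_reg_operand, the rule that an operand without a modifier prints in X form, and the restriction to registers 0..30.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch: render an allocated register for %w0 / %x0 / %0.
 * modifier is 'w', 'x', or 0 when the template had none.
 * Returns 0 on success, -1 on an unknown modifier or a short buffer. */
static int format_reg_operand(char *buf, size_t size,
                              unsigned reg, int modifier)
{
    assert(buf != NULL && reg <= 30);

    char prefix;
    if (modifier == 'w')
        prefix = 'w';
    else if (modifier == 'x' || modifier == 0)
        prefix = 'x';               /* assumed default: 64-bit name */
    else
        return -1;

    int n = snprintf(buf, size, "%c%u", prefix, reg);
    return (n > 0 && (size_t)n < size) ? 0 : -1;
}
```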

Validation

Unit-level: a new tests/tcc-asm/ suite with one .S (or __asm__() C wrapper) fixture per mnemonic+operand-shape combo, diffing the encoded bytes against a known-good (host clang or upstream gas) reference. Same shape as the existing P1 suite — fixture in, expected hex out, byte diff.
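The byte-diff step could be as small as the sketch below: serialize the emitted instruction words little-endian, then report the first offset where they differ from the reference. Both function names are hypothetical, and how the reference bytes are obtained from clang or gas is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store instruction words as little-endian bytes, as they sit in .text. */
static void store_le32(uint8_t *dst, const uint32_t *words, size_t nwords)
{
    assert(dst != NULL && words != NULL);
    for (size_t i = 0; i < nwords; i++) {
        dst[4 * i + 0] = (uint8_t)(words[i]);
        dst[4 * i + 1] = (uint8_t)(words[i] >> 8);
        dst[4 * i + 2] = (uint8_t)(words[i] >> 16);
        dst[4 * i + 3] = (uint8_t)(words[i] >> 24);
    }
}

/* Returns -1 when the buffers are identical, otherwise the index of the
 * first differing byte (or the shorter length when one is a prefix). */
static long first_byte_diff(const uint8_t *got, size_t got_len,
                            const uint8_t *want, size_t want_len)
{
    assert(got != NULL && want != NULL);
    size_t n = got_len < want_len ? got_len : want_len;
    for (size_t i = 0; i < n; i++)
        if (got[i] != want[i])
            return (long)i;
    return got_len == want_len ? -1 : (long)n;
}
```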

Integration-level: drop the host cross-asm out of Makefile lines 386–410 for ARCH=aarch64 and let tcc-boot2 build start.S / sys_stubs.S directly. The existing tcc-cc and tcc-libc suites then exercise the new assembler end-to-end, including the stage-2/stage-3 fixed-point check.

Self-host check (phase 2+): compile the patched tcc.flat.c itself with tcc-tcc and confirm the arm64-asm.c it just compiled is byte-identical to the one cc.scm produced.

Resolved decisions