tcc arm64 assembler — design
Working doc. Adds an arm64-asm.c to vendored tcc 0.9.26 so the
ARM64-target build accepts .S inputs and __asm__("…") blocks.
Lands in two phases: a narrow first cut covering exactly what the
repo's .S files need today, then incremental extension to reach
parity with riscv64-asm.c and (modulo x86 quirks) i386-asm.c.
Goal of this doc: lock the internal shape so phase 1 is a genuine subset of the final assembler — not a stub we throw away.
Why this exists
vendor/upstream/tcc-0.9.26.tar.gz ships per-target asm:
| arch | file | notes |
|---|---|---|
| x86_64 / i386 | i386-asm.c (1720 LoC) | shared by both targets |
| arm | arm-asm.c (94 LoC) | stub: directives only, every opcode → tcc_error |
| riscv64 | riscv64-asm.c (856 LoC) | real assembler |
| arm64 | — | absent; CONFIG_TCC_ASM undefined for TCC_TARGET_ARM64 |
Today the boot2 Makefile compensates for the arm64 gap by cross-asm'ing
tcc-cc/aarch64/start.S and tcc-libc/aarch64/{start,sys_stubs}.S
through host clang -target aarch64-linux-gnu (Makefile:386–410).
This doc describes the assembler that lets us delete that.
Phase 1 — narrow scope
Cover exactly the mnemonics the in-tree .S fixtures use, plus the
directive surface tccasm.c already handles.
Mnemonics required by tcc-cc/aarch64/start.S and
tcc-libc/aarch64/{start,sys_stubs}.S:
| mnemonic | forms used | encoding family |
|---|---|---|
| mov | mov xN, #imm (incl. negative); mov xN, xM | movz/movn/movk + ORR(reg) alias |
| add | add xN, sp, #imm | add (immediate, 64-bit) |
| ldr | ldr xN, [xM], ldr xN, [sp] | LDR (immediate, unsigned offset, 64-bit) |
| bl | bl <symbol> | BL → R_AARCH64_CALL26 |
| b | b . (self-loop) | B → R_AARCH64_JUMP26 (or in-section fixup) |
| ret | ret (uses x30) | RET (Xn=30 default) |
| svc | svc #0, svc #imm16 | SVC |
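As a rough illustration of the mov row, here is a minimal sketch of how the alias could pick between MOVZ, MOVN and the ORR(reg) form. The helper names `enc_mov_imm` and `enc_mov_reg` are hypothetical and are not taken from tcc; the base opcodes are the standard 64-bit A64 encodings. Values that need more than one instruction (a movz/movk chain) are deliberately left out and reported as 0.

```c
#include <assert.h>
#include <stdint.h>

/* Standard A64 base opcodes, 64-bit (sf=1) variants. */
#define A64_MOVZ_X   0xd2800000u  /* movz xd, #imm16, lsl #(16*hw) */
#define A64_MOVN_X   0x92800000u  /* movn xd, #imm16, lsl #(16*hw) */
#define A64_ORR_X_ZR 0xaa0003e0u  /* orr  xd, xzr, xm              */

/* mov xd, xm is an alias of orr xd, xzr, xm. (Hypothetical helper.) */
static uint32_t enc_mov_reg(unsigned rd, unsigned rm)
{
    assert(rd < 32 && rm < 32);
    return A64_ORR_X_ZR | (rm << 16) | rd;
}

/* mov xd, #imm for values that fit a single MOVZ or MOVN.
 * Returns 0 when the value needs a multi-instruction chain. */
static uint32_t enc_mov_imm(unsigned rd, int64_t imm)
{
    uint64_t v = (uint64_t)imm;
    unsigned hw;

    assert(rd < 32);
    /* MOVZ: at most one 16-bit chunk of v is non-zero. */
    for (hw = 0; hw < 4; hw++) {
        if ((v & ~(0xffffull << (16 * hw))) == 0)
            return A64_MOVZ_X | (hw << 21)
                 | ((uint32_t)((v >> (16 * hw)) & 0xffff) << 5) | rd;
    }
    /* MOVN: same test on the bitwise inverse (covers small negatives). */
    for (hw = 0; hw < 4; hw++) {
        if ((~v & ~(0xffffull << (16 * hw))) == 0)
            return A64_MOVN_X | (hw << 21)
                 | ((uint32_t)((~v >> (16 * hw)) & 0xffff) << 5) | rd;
    }
    return 0;
}
```

Under these assumptions `mov x8, #93` becomes a MOVZ, `mov x0, #-1` becomes `movn x0, #0`, and `mov x0, x1` becomes the ORR alias.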
Registers: x0-x30, w0-w30, sp, xzr, wzr. (fp=x29,
lr=x30, ip0=x16, ip1=x17 are aliases — defer to phase 2.)
Directives: anything tccasm.c already drives — .globl, .text,
.data, .byte, .word, .quad, .ascii, .asciz, .align,
.skip, .section, .previous, labels, .set. Phase 1 does
not need to add code here; pulling in tccasm.c is automatic
once arm64-asm.c defines CONFIG_TCC_ASM.
Inline __asm__ constraint plumbing (subst_asm_operand,
asm_compute_constraints, asm_gen_code) follows riscv64-asm.c's
"defined but no-op" pattern in phase 1: .S files work, full
constraint-based inline asm doesn't yet. Same posture upstream
shipped riscv64 with.
Phase 1 acceptance: the existing tcc-libc and tcc-cc suites
pass on ARCH=aarch64 with the host cross-asm path removed from
the Makefile (start/sys_stubs assembled by tcc-boot2 itself).
File layout
arm64-asm.c new — opcode table, parser, encoders (~600 LoC at parity)
arm64-tok.h new — DEF_ASM(...) for regs + mnemonics (~150 LoC at parity)
tcctok.h +3 lines: include arm64-tok.h under TCC_TARGET_ARM64
tcc.h +1 line: include arm64-asm.c in the per-target block
libtcc.c +1 line: same, in the ONE_SOURCE block
Patches go in scripts/simple-patches/tcc-0.9.26/ and apply via
stage1-flatten.sh's apply_our_patch mechanism — same shape as
the existing arm64 patches (arm64-stdarg-array,
arm64-va-pointer-operand, arm64-va-arg-pointer). New
arm64-asm.c and arm64-tok.h ship as straight files in
scripts/simple-patches/tcc-0.9.26/files/ and are copied into
$SRC by the flatten script before preprocessing.
Internal shape
The narrow set is small enough to write linearly, but it's worth ten minutes more to put the real ARM64 ISA encoding model in place on day one so phase 2 is "add table rows," not "rewrite."
Operand model
enum {
    OPT_REG,        /* X/W register, 0..31 (sp/zr distinguished by use) */
    OPT_SHIFT_REG,  /* Xn[, lsl/lsr/asr/ror #imm] */
    OPT_EXT_REG,    /* Xn[, uxtw/sxtw/sxtx #imm] */
    OPT_IMM,        /* unparsed signed/unsigned immediate */
    OPT_IMM12,      /* add/sub immediate: 12-bit + optional lsl#12 */
    OPT_LOG_IMM,    /* and/orr/eor logical immediate (N:imms:immr) */
    OPT_MOV_IMM,    /* movz/movk/movn 16-bit + hw shift */
    OPT_MEM,        /* [Xn], [Xn,#imm], [Xn,Xm{,ext}], pre/post indexed */
    OPT_LABEL,      /* symbol+addend; resolves to PC-rel reloc */
    OPT_COND,       /* eq/ne/lt/... (4-bit cond code) */
    OPT_SYS,        /* sysreg encoding for mrs/msr (phase 3) */
};
typedef struct Operand {
    uint32_t kind;   /* OPT_* */
    uint8_t is_w;    /* 0=64-bit, 1=32-bit (X vs W) */
    uint8_t is_sp;   /* 1 if textual form was sp (vs xzr) */
    union {
        struct { uint8_t reg; uint8_t shift_kind; uint8_t shift_amt; } r;
        struct { uint8_t base, idx, ext_kind, ext_amt;
                 int32_t disp; uint8_t mode; /* off|preidx|postidx */ } m;
        ExprValue e; /* immediates and label refs */
        uint8_t cond;
    };
} Operand;
The kind enum is the type signature instruction encoders match
against. Phase 1 uses only OPT_REG, OPT_IMM, OPT_MEM
(simple base+offset variant), and OPT_LABEL. Adding the rest is
"new enum value + new parse path"; encoders not yet handling them
just expect("supported operand") until they do.
is_w, is_sp — AArch64-specific. Wn and Xn share register
numbers; the encoder needs to know the size to set the sf bit.
sp and xzr both encode as register 31 but are not
interchangeable per-instruction; track which token the user wrote.
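To make the is_w / is_sp distinction concrete, here is a minimal sketch of a register-name parser for the phase 1 register set. `RegOperand`, `parse_reg` and `sf_bit` are hypothetical names used only for this illustration; the real parser would work on tcc tokens rather than strings.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the register-related fields of Operand. */
typedef struct RegOperand {
    uint8_t reg;    /* 0..31 */
    uint8_t is_w;   /* 1 if written as wN / wzr */
    uint8_t is_sp;  /* 1 if written as sp (31 means sp, not xzr) */
} RegOperand;

/* Parse x0..x30, w0..w30, sp, xzr, wzr. Returns 0 on success, -1 otherwise. */
static int parse_reg(const char *s, RegOperand *out)
{
    char *end;
    long n;

    assert(s != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    if (strcmp(s, "sp") == 0)  { out->reg = 31; out->is_sp = 1; return 0; }
    if (strcmp(s, "xzr") == 0) { out->reg = 31; return 0; }
    if (strcmp(s, "wzr") == 0) { out->reg = 31; out->is_w = 1; return 0; }

    if (s[0] != 'x' && s[0] != 'w')
        return -1;
    if (s[1] < '0' || s[1] > '9')           /* rejects "x", "x-1", "x 3" */
        return -1;
    if (s[1] == '0' && s[2] != '\0')        /* rejects "x01" */
        return -1;
    n = strtol(s + 1, &end, 10);
    if (*end != '\0' || n > 30)             /* 31 is only reachable via sp/zr */
        return -1;
    out->reg = (uint8_t)n;
    out->is_w = (uint8_t)(s[0] == 'w');
    return 0;
}

/* The sf bit (bit 31) is set for 64-bit (X) operations. */
static uint32_t sf_bit(const RegOperand *r)
{
    return r->is_w ? 0u : (1u << 31);
}
```

Both sp and xzr come back as register 31; only is_sp tells an encoder which one the source text named.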
Encoder organization
One static helper per ARM64 instruction format, mirroring
riscv64-asm.c's asm_emit_i / asm_emit_r / asm_emit_u / asm_emit_s.
Group by encoding family as in the Arm ARM (section C4):
| helper | covers |
|---|---|
| emit_dp_imm_addsub | add/sub/cmp/cmn (immediate) |
| emit_dp_imm_log | and/orr/eor/tst (immediate) |
| emit_dp_imm_mov | movz/movk/movn (incl. mov aliases) |
| emit_dp_imm_bitfield | sbfm/ubfm/bfm + sxtw/uxtb/lsl-imm aliases |
| emit_dp_reg_addsub | add/sub/cmp shifted-reg + extended-reg |
| emit_dp_reg_log | and/orr/eor/bic shifted-reg |
| emit_dp_reg_shift | lslv/lsrv/asrv/rorv + lsl-reg aliases |
| emit_dp_reg_mul | madd/msub/mul/mneg/smull/umull |
| emit_dp_reg_csel | csel/csinc/csinv/csneg + cset/cinc aliases |
| emit_ldst_imm | ldr/str (immediate, unsigned + pre/post) |
| emit_ldst_reg | ldr/str (register offset, with extend) |
| emit_ldst_pair | ldp/stp (incl. pre/post indexed) |
| emit_ldst_pseudo_eq | ldr Xn, =imm64 / =sym (inline lowering, see below) |
| emit_branch_imm | b/bl + label reloc |
| emit_branch_cond | b.cond + label reloc |
| emit_branch_cmp | cbz/cbnz/tbz/tbnz |
| emit_branch_reg | br/blr/ret |
| emit_system | svc/hvc/smc/brk/hint (nop/yield/wfe/wfi) |
Phase 1 implements only emit_dp_imm_addsub, emit_dp_imm_mov,
emit_ldst_imm, emit_branch_imm, emit_branch_reg (just ret),
and emit_system (just svc). Each helper has the full ISA-format
encoding from day one — phase 1 just feeds it the narrow operand
shapes.
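A minimal sketch of what "full ISA-format encoding, narrow inputs" could look like for two of the phase 1 helpers. The function names and signatures here are hypothetical (the real helpers would take Operand values and emit into the text section); the bit layouts are the standard A64 add/sub (immediate) and load/store (unsigned offset) formats.

```c
#include <assert.h>
#include <stdint.h>

/* ADD/SUB (immediate): sf | op | S | 100010 | sh | imm12 | Rn | Rd
 * op=0 add, op=1 sub; S=1 sets flags; sh=1 means imm12 is shifted left 12. */
static uint32_t enc_addsub_imm(int sf, int op, int S, int sh,
                               uint32_t imm12, unsigned rn, unsigned rd)
{
    assert(imm12 < 4096 && rn < 32 && rd < 32);
    return ((uint32_t)(sf & 1) << 31) | ((uint32_t)(op & 1) << 30)
         | ((uint32_t)(S & 1) << 29) | (0x22u << 23)
         | ((uint32_t)(sh & 1) << 22) | (imm12 << 10) | (rn << 5) | rd;
}

/* LDR/STR (unsigned offset): size | 111001 | opc | imm12 | Rn | Rt
 * size=3 is 64-bit; opc=0 store, opc=1 load. The byte offset must be
 * a multiple of the access size and is stored scaled. */
static uint32_t enc_ldst_uimm(unsigned size, unsigned opc, uint32_t byte_off,
                              unsigned rn, unsigned rt)
{
    uint32_t scaled = byte_off >> size;

    assert(size < 4 && opc < 4 && rn < 32 && rt < 32);
    assert((byte_off & ((1u << size) - 1)) == 0 && scaled < 4096);
    return (size << 30) | (0x39u << 24) | (opc << 22)
         | (scaled << 10) | (rn << 5) | rt;
}
```

Phase 1 would only ever call these with sf=1, S=0, sh=0 and size=3, but the remaining fields are already parameters, so phase 2 forms such as subs or ldrb need no change to the encoder itself.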
asm_opcode dispatch
Same shape as riscv64-asm.c's tail switch: outer switch on the
TOK groups dispatching to a per-family parser, which parses
operands, validates kinds, and calls the matching emit_* helper.
Adding mnemonics in phase 2 is a new case TOK_ASM_xxx: plus, if
needed, a new shared parser routine.
Label & relocation interface
bl <sym> / b <sym> emit zero into the instruction word and call
greloca(cur_text_section, sym, ind, R_AARCH64_CALL26 /* or JUMP26 */, 0) — both reloc types are already handled by
arm64-link.c:30-46. Local backward references (b ., numbered
labels) resolve through the symbol table the same way tccasm.c
already wires for other arches: asm_new_label defines, asm_expr
resolves, the relocation collapses at link time.
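For the in-section case (for example b .), the word that CALL26/JUMP26 resolution ends up producing can be sketched as below. `enc_branch_imm` is a hypothetical helper, not a tcc function; it only shows the imm26 arithmetic and the range check that the B/BL format implies.

```c
#include <assert.h>
#include <stdint.h>

/* B / BL (immediate): op | 00101 | imm26, where imm26 = (target - pc) / 4.
 * Returns 0 and writes the instruction word on success, -1 if the target
 * is misaligned or outside the +/-128 MiB range. (Hypothetical helper.) */
static int enc_branch_imm(int is_bl, int64_t pc, int64_t target, uint32_t *out)
{
    int64_t delta = target - pc;

    assert(out != 0);
    if (delta & 3)
        return -1;
    if (delta < -((int64_t)1 << 27) || delta >= ((int64_t)1 << 27))
        return -1;
    /* Shift as unsigned, then keep the low 26 bits (two's complement). */
    *out = (is_bl ? 0x94000000u : 0x14000000u)
         | ((uint32_t)((uint64_t)delta >> 2) & 0x03ffffffu);
    return 0;
}
```

With pc equal to target this yields 0x14000000, which is the self-loop `b .` used in the start files.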
Symbol address loads (the ldr Xn, =sym pseudo, and any phase-2
movz/movk-via-:abs_g0:/:abs_g1:/etc. modifiers) emit the same
4-instruction movz/movk chain that arm64-gen.c:431-440 uses for
compiler-emitted loads: R_AARCH64_MOVW_UABS_G0_NC +
G1_NC + G2_NC + G3. adrp+add is deliberately not
used — arm64-gen.c:425-430 documents that the ±4GB ADR_PREL_PG
range fails on tcc's static layout and the MOVW chain is the
working idiom. The relocs are exercised by every existing
tcc-built ARM64 binary, so this is well-trodden ground.
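The shape of the 4-instruction chain can be sketched as follows. This is an illustration of what the four words look like once the MOVW_UABS relocations have been applied, assuming a movz for the low chunk followed by three movk; `enc_movw_chain` is a hypothetical name, and in the assembler itself the imm16 fields would be emitted as zero and filled in by the relocations.

```c
#include <assert.h>
#include <stdint.h>

/* Build movz xd,#g0 ; movk xd,#g1,lsl 16 ; movk xd,#g2,lsl 32 ;
 * movk xd,#g3,lsl 48 for a 64-bit absolute address.
 * Passing addr = 0 gives the unrelocated words. (Hypothetical helper.) */
static void enc_movw_chain(unsigned rd, uint64_t addr, uint32_t insn[4])
{
    unsigned i;

    assert(rd < 31 && insn != 0);
    for (i = 0; i < 4; i++) {
        uint32_t base  = (i == 0) ? 0xd2800000u   /* movz, 64-bit */
                                  : 0xf2800000u;  /* movk, 64-bit */
        uint32_t imm16 = (uint32_t)(addr >> (16 * i)) & 0xffffu;

        /* hw (bits 22:21) selects the 16-bit chunk; imm16 sits at bits 20:5. */
        insn[i] = base | (i << 21) | (imm16 << 5) | rd;
    }
}
```

Each word corresponds to one relocation in the G0_NC, G1_NC, G2_NC, G3 sequence, so the assembler needs no range checks beyond what the linker already does.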
Inline-asm constraint plumbing
Phase 1 stubs:
ST_FUNC void subst_asm_operand(CString *s, SValue *sv, int mod) {
tcc_error("ARM64 inline asm operands not implemented yet");
}
ST_FUNC void asm_compute_constraints(...) { /* no-op */ }
ST_FUNC void asm_gen_code(...) { /* no-op */ }
ST_FUNC void asm_clobber(uint8_t *cr, const char *str) {
/* parse register name, mark cr[reg]=1 — copy from riscv64-asm.c */
}
ST_FUNC int asm_parse_regvar(int t) { /* x0..xzr, w0..wzr → 0..31 */ }
This is enough for .S files, top-level __asm__("…") strings,
and the __asm__("name") symbol-rename form. Constraint-driven
register allocation (__asm__("…" : "=r"(out) : "r"(in))) lights
up in phase 3 once subst_asm_operand + asm_compute_constraints
are real — straight port from i386-asm.c's template logic
adapted to ARM64 register names; no surprises.
Phase plan
Phase 1 (this design): mov/add/ldr/bl/b/ret/svc, with operand
kinds restricted to OPT_REG/OPT_IMM/OPT_MEM (base+disp)/OPT_LABEL.
Acceptance: .S files in tcc-cc/tcc-libc assemble through
tcc-boot2; the Makefile drops the TCC_ASM dance for
ARCH=aarch64.
Phase 2 (implemented): broadens to riscv64-parity coverage. Surface added:
- DP-imm: add/sub/adds/subs/cmp/cmn/neg/negs, and/orr/eor/ands/tst (logical-imm), movz/movn/movk, sbfm/ubfm/bfm + lsl/lsr/asr/sxtb/sxth/sxtw/uxtb/uxth immediate aliases.
- DP-reg: shifted-reg add/sub (and set-flags variants), extended-reg form when one operand is sp, logical-reg and/orr/eor/bic/orn/eon/bics/mvn, variable shifts lslv/lsrv/asrv/rorv with lsl/lsr/asr/ror reg aliases, mul/mneg/madd/msub/smull/umull/smaddl/umaddl/smsubl/umsubl/smulh/umulh/udiv/sdiv, csel/csinc/csinv/csneg + cset/csetm/cinc/cinv/cneg aliases.
- Mem: ldr/str register-offset (with optional lsl/extend shift) and pre/post-indexed forms; ldrb/ldrh/ldrsb/ldrsh/ldrsw/strb/strh; ldp/stp X- and W-forms with all index modes.
- Branches: b.cond and cbz/cbnz/tbz/tbnz (in-section targets only: there are no R_AARCH64_CONDBR19/TSTBR14 reloc handlers in arm64-link.c, so extern targets error out), br/blr, full ret.
- Pseudo: ldr Xn, =imm64 lowers via arm64_movimm; ldr Xn, =sym lowers to the 4-insn MOVW_UABS_G{0..3} reloc chain.
- System: svc/hvc/smc/brk/hlt, nop/yield/wfe/wfi/sev/sevl/hint, dsb/dmb/isb.
arm64_encode_bimm64 and arm64_movimm from arm64-gen.c are
called directly: under ONE_SOURCE (the bootstrap pipeline)
both .c files inhabit the same TU sequentially, so the static
helpers are visible to arm64-asm.c without ST_FUNC promotion.
Pre-existing limitation: mes-libc's strtoull truncates via
strtol, so 64-bit hex literals (e.g. #0xff00ff00ff00ff00) get
clamped at parse-time. Computed expressions (#1<<32) work
around it. Out of scope to fix in this phase — surfaces in any
asm path that takes wide immediates and is unrelated to the
encoder logic.
tests2/73_arm64.c does not exist upstream — the doc was
speculative. Validation instead is the in-tree .S round-trip
plus a hand-checked build/phase2-test/test.S fixture covering
each new family.
Phase 3 — full inline-asm constraint surface
(subst_asm_operand + asm_compute_constraints). Ports the
i386-asm.c template walk; ARM64-specific bits are operand modifier
letters (%w0 for W-form, %x0 for X-form) and clobber semantics.
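The modifier handling amounts to choosing the register-name prefix when an operand is substituted into the template. A minimal sketch, assuming the allocated register number is already known; `fmt_reg_operand` is a hypothetical helper, and treating the no-modifier case as the X form is an assumption made for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Render a general-purpose register operand for template substitution.
 * modifier is 'w', 'x', or 0 for none (assumed here to mean the X form).
 * Returns the number of characters written, or -1 on a bad argument. */
static int fmt_reg_operand(char *buf, size_t n, unsigned reg, int modifier)
{
    char prefix;

    assert(buf != NULL);
    if (reg > 30)
        return -1;              /* sp/zr need separate handling */
    switch (modifier) {
    case 0:
    case 'x': prefix = 'x'; break;
    case 'w': prefix = 'w'; break;
    default:  return -1;        /* unknown modifier letter */
    }
    return snprintf(buf, n, "%c%u", prefix, reg);
}
```

So a template containing %w0 with operand 0 allocated to register 3 would be rewritten to w3 before being handed to the assembler.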
Validation
Unit-level: a new tests/tcc-asm/ suite with one .S (or
__asm__() C wrapper) fixture per mnemonic+operand-shape combo,
diffing the encoded bytes against a known-good (host clang or
upstream gas) reference. Same shape as the existing P1 suite —
fixture in, expected hex out, byte diff.
Integration-level: drop the host cross-asm out of Makefile lines
386–410 for ARCH=aarch64 and let tcc-boot2 build start.S /
sys_stubs.S directly. The existing tcc-cc and tcc-libc suites
then exercise the new assembler end-to-end, including the
stage-2/stage-3 fixed-point check.
Self-host check (phase 2+): compile the patched tcc.flat.c itself
with tcc-tcc and confirm the arm64-asm.c it just compiled is
byte-identical to the one cc.scm produced.
Resolved decisions
No literal pool. Phase 2 lowers ldr Xn, =imm64 to an inline movz/movk chain (call into the same arm64_movi/arm64_movimm logic arm64-gen.c:155-221 already uses) and ldr Xn, =sym to the 4-instruction MOVW_UABS chain. tcc's own codegen never emits a pool, so adding pool infrastructure would be net-new for one gas pseudo nothing in-tree uses; inline lowering matches what compiler-emitted code already does. Cost: Xn is clobbered with the constant rather than loaded from .rodata, and ldr =0x… ; .word foo won't share. Neither matters for any in-tree fixture.

Logical-immediate encoder: lift arm64_encode_bimm64. arm64-gen.c:106-153 already implements the full N:imms:immr encoder as a static function, used by gen.c itself for orr-immediate-as-movi (line 187) and direct logical-imm codegen (line 1395). Promote it to an ST_FUNC declared in the arm64 block of tcc.h and call it from arm64-asm.c. Zero new code, no port, no licensing question.

R_AARCH64_MOVW_UABS_G* is the primary path. arm64-gen.c:429 hardcodes avoid_adrp = 1, meaning every symbol address load in every existing tcc-built ARM64 binary already goes through the MOVW_UABS_G{0..3}_NC chain. relocate() (arm64-link.c:174-189) implements all four. Use them; don't use adrp/adr at all.
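For the logical-immediate decision, a test fixture could cross-check whatever the lifted encoder returns against a brute-force reference. The sketch below is not arm64_encode_bimm64 and makes no claim about how that function works; both names are hypothetical. It decodes every N:immr:imms candidate following the architecture's bitmask-immediate rules and returns the first one that reproduces the value, packed as N:immr:imms in the low 13 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 64-bit logical immediate. Returns 1 and stores the value,
 * or 0 if the field combination is reserved. */
static int bimm64_decode(unsigned n, unsigned immr, unsigned imms, uint64_t *out)
{
    unsigned combined = ((n & 1) << 6) | (~imms & 0x3f);
    unsigned len, esize, s, r, i;
    uint64_t elem, mask, v = 0;

    assert(out != 0);
    if (combined < 2)                     /* element size would be < 2 */
        return 0;
    for (len = 6; !(combined >> len); len--)
        ;                                 /* highest set bit of N:NOT(imms) */
    esize = 1u << len;                    /* 2, 4, 8, 16, 32 or 64 */
    s = imms & (esize - 1);
    r = immr & (esize - 1);
    if (s == esize - 1)                   /* all-ones element is reserved */
        return 0;
    elem = (1ull << (s + 1)) - 1;         /* run of s+1 ones */
    mask = (esize == 64) ? ~0ull : (1ull << esize) - 1;
    if (r)                                /* rotate right within the element */
        elem = ((elem >> r) | (elem << (esize - r))) & mask;
    for (i = 0; i < 64; i += esize)       /* replicate across 64 bits */
        v |= elem << i;
    *out = v;
    return 1;
}

/* Returns (N << 12) | (immr << 6) | imms, or -1 if not encodable. */
static int bimm64_encode_bruteforce(uint64_t value)
{
    unsigned n, immr, imms;
    uint64_t v;

    for (n = 0; n < 2; n++)
        for (imms = 0; imms < 64; imms++)
            for (immr = 0; immr < 64; immr++)   /* ascending: smallest immr wins */
                if (bimm64_decode(n, immr, imms, &v) && v == value)
                    return (int)((n << 12) | (immr << 6) | imms);
    return -1;
}
```

The search space is only 8192 candidates, so this is cheap enough to run per fixture; 0 and all-ones come back as -1, matching the fact that neither is a valid logical immediate.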