p1/aarch64: lower %li(rd, imm) as movz+3xmovk - boot2

commit e586fa17898c1d3b99304f4cf22af60b00041837
parent 3317ca30d0f1a9fd607b55ce155db5c6232a6037
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sat,  2 May 2026 15:50:33 -0700

p1/aarch64: lower %li(rd, imm) as movz+3xmovk

Replaces the LDR-literal-pool form (LDR Xn,[PC,#8]; B PC+12; <8-byte literal>)
with the standard aarch64 4-instruction movz+movk chain. The lowering is still
16 bytes and behaves the same at the call sites, but the 8-byte literal is no
longer interleaved into the .text instruction stream.
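
As a rough illustration of the movz+movk chain described above, the sketch below splits a 64-bit immediate into four 16-bit chunks and builds the four instruction words. It is a Python model, not the M1pp macros from this commit; the helper names `encode_li` and `decode_li` are hypothetical, and the base opcodes are the 64-bit MOVZ/MOVK encodings as I understand them from the A64 instruction set.

```python
# Hypothetical Python model of a movz + 3x movk lowering (not the M1pp source).
MOVZ_X = 0xD2800000  # MOVZ Xd, #imm16, LSL #0 (64-bit form)
MOVK_X = 0xF2800000  # MOVK Xd, #imm16, LSL #0 (64-bit form)


def encode_li(rd, imm):
    """Return four 32-bit instruction words that load imm into register rd."""
    imm &= (1 << 64) - 1  # treat the immediate as an unsigned 64-bit value
    words = []
    for hw in range(4):  # hw selects the shift: 0, 16, 32, 48
        chunk = (imm >> (16 * hw)) & 0xFFFF
        base = MOVZ_X if hw == 0 else MOVK_X
        # hw sits in bits 22:21, imm16 in bits 20:5, Rd in bits 4:0
        words.append(base | (hw << 21) | (chunk << 5) | (rd & 0x1F))
    return words


def decode_li(words):
    """Simulate the chain to recover the value left in the register."""
    value = 0
    for w in words:
        hw = (w >> 21) & 0x3
        chunk = (w >> 5) & 0xFFFF
        shift = 16 * hw
        if (w >> 23) == (MOVZ_X >> 23):
            value = chunk << shift  # MOVZ zeroes every other bit
        else:
            value = (value & ~(0xFFFF << shift)) | (chunk << shift)  # MOVK keeps them
    return value


if __name__ == "__main__":
    for word in encode_li(0, 0x1122334455667788):
        print(f"{word:08X}")
```

With an all-zero immediate and register 0, the last three words reduce to the bare movk base opcodes that appear in the diff (0xF2A00000, 0xF2C00000, 0xF2E00000).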

This was investigated as a possible alignment-related fix for the assert-fail-0
bug in the tcc-cc suite (some 64-bit LDR literals landed at addresses that were
4-byte-aligned but not 8-byte-aligned). Replacing the lowering did not fix the
bug, but the movz/movk form is a defensible default and is kept.
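
The alignment observation above can be checked with simple arithmetic. In the old form the literal sits 8 bytes after the start of the sequence (after the LDR and the B), so it is 8-byte-aligned only when the sequence itself starts on an 8-byte boundary. The sketch below is a minimal Python model of that; the function names and the example addresses are hypothetical and not taken from the repository.

```python
# Hypothetical model of where the old LDR-literal form placed its 8-byte literal.
def literal_address(seq_start):
    """Address of the literal for a sequence LDR; B; <8-byte literal>."""
    if seq_start % 4 != 0:
        raise ValueError("aarch64 instructions are 4-byte aligned")
    return seq_start + 8  # LDR (4 bytes) + B (4 bytes) precede the literal


def literal_is_8_aligned(seq_start):
    """True when the literal lands on an 8-byte boundary."""
    return literal_address(seq_start) % 8 == 0


if __name__ == "__main__":
    for start in (0x1000, 0x1004):  # example addresses, chosen for illustration
        print(hex(start), literal_is_8_aligned(start))
```

A sequence starting at an address that is 4 modulo 8 gives a literal that is 4-byte-aligned but not 8-byte-aligned, which is the situation the commit message describes.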

Diffstat:
M P1/P1-aarch64.M1pp  | 19 +++++++++++++++++--

1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/P1/P1-aarch64.M1pp b/P1/P1-aarch64.M1pp
@@ -382,9 +382,24 @@
 
 # ---- P1 operation lowering -----------------------------------------------
 
+# MOVZ + 3 MOVK chain for 64-bit immediate. 4 instructions, 16 bytes — same
+# size as the prior LDR-literal-pool lowering. Pure instructions, no inline
+# data, which is the standard aarch64 codegen for materializing constants.
+%macro aa64_movk_lsl16(rd, imm16)
+%((| 0xF2A00000 (<< (& imm16 0xFFFF) 5) %aa64_reg(rd)))
+%endm
+%macro aa64_movk_lsl32(rd, imm16)
+%((| 0xF2C00000 (<< (& imm16 0xFFFF) 5) %aa64_reg(rd)))
+%endm
+%macro aa64_movk_lsl48(rd, imm16)
+%((| 0xF2E00000 (<< (& imm16 0xFFFF) 5) %aa64_reg(rd)))
+%endm
+
 %macro p1_li(rd, imm)
-%aa64_lit64_prefix(rd)
-$(imm)
+%aa64_movz(rd, (& imm 0xFFFF))
+%aa64_movk_lsl16(rd, (& (>> imm 16) 0xFFFF))
+%aa64_movk_lsl32(rd, (& (>> imm 32) 0xFFFF))
+%aa64_movk_lsl48(rd, (& (>> imm 48) 0xFFFF))
 %endm
 
 %macro p1_la(rd)

	boot2 Playing with the bootstrap
	git clone https://git.ryansepassi.com/git/boot2.git