commit add01712f59b79acb21ef42924bdb260357ef6b8
parent 932b2491b2cd4b0d21da8e12eb8796f6deb74da5
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Thu, 23 Apr 2026 11:42:13 -0700
drop i386, expand register set, add BLTU
Diffstat:
5 files changed, 110 insertions(+), 51 deletions(-)
diff --git a/docs/P1v2.md b/docs/P1v2.md
@@ -59,8 +59,8 @@ as `LA rd, %label` and `LA_BR %label`.
P1 v2 exposes the following source-level registers:
- `a0`–`a3` — argument registers. Also caller-saved general registers.
-- `t0` — caller-saved temporary.
-- `s0`–`s1` — callee-saved general registers.
+- `t0`–`t2` — caller-saved temporaries.
+- `s0`–`s3` — callee-saved general registers.
- `sp` — stack pointer.
### Hidden registers
@@ -68,24 +68,34 @@ P1 v2 exposes the following source-level registers:
The backend may reserve additional native registers that are never visible in
P1 source:
-- `br` — branch / call target mechanism
+- `br` — branch / call target mechanism, implemented as a dedicated hidden
+ native register on every target
- backend-local scratch used entirely within one instruction expansion
No hidden register may carry a live P1 value across an instruction boundary.
-`br` need not be a native register on every target; it may be implemented by
-another backend-private mechanism.
## Calling Convention
### Arguments and return values
-- In the direct-result convention, explicit argument words 0-3 live in
- `a0-a3`.
+P1 v2 defines three result conventions: one-word direct, two-word direct, and
+indirect.
+
+In the one-word direct-result convention:
+
+- Explicit argument words 0-3 live in `a0-a3`.
- Additional explicit argument words live in the incoming stack-argument area
and are read with `LDARG`.
-- In the direct-result convention, a one-word return value lives in `a0`.
+- On return, a one-word result lives in `a0`.
+
+In the two-word direct-result convention:
+
+- Explicit argument words 0-3 live in `a0-a3` on entry.
+- Additional explicit argument words still live in the incoming
+ stack-argument area.
+- On return, `a0` holds result word 0 and `a1` holds result word 1.
-P1 v2 also defines an indirect-result convention for wider returns:
+In the indirect-result convention:
- The caller passes a writable result buffer pointer in `a0`.
- Explicit argument words 0-2 then live in `a1-a3`.
@@ -93,23 +103,25 @@ P1 v2 also defines an indirect-result convention for wider returns:
stack-argument area.
- On return, `a0` holds the same result buffer pointer value.
-In the direct-result convention, incoming stack-argument slot `0` therefore
-corresponds to explicit argument word `4`. In the indirect-result convention,
-incoming stack-argument slot `0` corresponds to explicit argument word `3`.
+In both direct-result conventions, incoming stack-argument slot `0` corresponds
+to explicit argument word `4`. In the indirect-result convention, incoming
+stack-argument slot `0` corresponds to explicit argument word `3`.
-The indirect-result convention is the portable way to return aggregates or any
-result wider than one word.
+The two-word direct-result convention covers common cases such as 64-bit
+integer results on 32-bit targets, two-word aggregates, and divmod-style
+returns. The indirect-result convention is the portable way to return any
+result wider than two words.
### Register preservation
Caller-saved:
- `a0`–`a3`
-- `t0`
+- `t0`–`t2`
Callee-saved:
-- `s0`–`s1`
+- `s0`–`s3`
- `sp`
### Call semantics
@@ -118,8 +130,9 @@ A call is valid from any function, including a leaf. Call / return correctness
does not depend on establishing a frame first.
If a function needs any incoming argument after making a call, it must save it
-before the call. This matters in particular for argument 0, since `a0` is also
-the return-value register.
+before the call. This matters in particular for `a0`, which every convention
+overwrites on return (with a result word or the result buffer pointer), and for
+`a1` when the callee uses the two-word direct-result convention.
A call that passes any stack argument words requires the caller to have an
active standard frame with enough frame-local storage to stage those outgoing
@@ -220,7 +233,7 @@ Leaf functions that need no frame-local storage may omit the frame entirely.
| Immediate arithmetic | `ADDI`, `ANDI`, `ORI`, `SHLI`, `SHRI`, `SARI` |
| Memory | `LD`, `ST`, `LB`, `SB` |
| ABI access | `LDARG` |
-| Branching | `B`, `BR`, `BEQ`, `BNE`, `BLT`, `BEQZ`, `BNEZ`, `BLTZ` |
+| Branching | `B`, `BR`, `BEQ`, `BNE`, `BLT`, `BLTU`, `BEQZ`, `BNEZ`, `BLTZ` |
| Calls / returns | `CALL`, `CALLR`, `RET`, `TAIL`, `TAILR` |
| Frame management | `ENTER`, `LEAVE` |
| System | `SYSCALL` |
@@ -344,11 +357,14 @@ The portable branch families are:
- `B` — unconditional branch to the target in `br`
- `BR rs` — unconditional branch to the code pointer in `rs`
-- `BEQ`, `BNE`, `BLT` — conditional branch to the target in `br`
+- `BEQ`, `BNE`, `BLT`, `BLTU` — conditional branch to the target in `br`
- `BEQZ`, `BNEZ`, `BLTZ` — conditional branch to the target in `br` using zero
as the second operand
-`BLT` and `BLTZ` perform signed comparisons on one-word values.
+`BLT` and `BLTZ` perform signed comparisons on one-word values. `BLTU`
+performs an unsigned comparison on one-word values; there is no unsigned
+zero-operand variant because `x < 0` is always false under an unsigned
+interpretation.
If a branch condition is true, control transfers to the target currently held in
`br`. If the condition is false, execution falls through to the next
@@ -455,22 +471,20 @@ runtime interface document.
## Target notes
-- `a0` is both argument 0 and the return-value register in the portable
- calling convention in the direct-result case, and the indirect-result buffer
- pointer in the indirect-result case.
+- `a0` is argument 0 and the one-word return-value register in the direct-result
+ case, result word 0 of the two-word direct return pair, and the result buffer
+ pointer in the indirect-result case.
- On aarch64, riscv64, arm32, and rv32, that matches the native integer/pointer
ABI directly.
- On amd64, the backend must translate between portable `a0` and native
- return register `rax` at call and return boundaries.
-- On i386, the backend must translate between portable argument registers and
- the native stack-argument ABI at call boundaries.
-- On amd64 and i386, `LDARG` must account for the return address pushed by the
- native `call` instruction. On aarch64, riscv64, arm32, and rv32, it maps more
+ return register `rax` at call and return boundaries. For the two-word direct
+ return, the backend must also translate between portable `a1` and native `rdx`.
+- On amd64, `LDARG` must account for the return address pushed by the native
+ `call` instruction. On aarch64, riscv64, arm32, and rv32, it maps more
directly to the entry `sp` plus the backend's standard frame/header policy.
-- On amd64, aarch64, riscv64, arm32, and rv32, `br` may be implemented as a
- dedicated hidden native register.
-- On i386, `br` is expected to be a backend-private stack convention, not a
- dedicated hidden register.
+- `br` is implemented as a dedicated hidden native register on every target.
+- On arm32, `t1` and `t2` map to natively callee-saved registers; the backend
+ is responsible for preserving them across function boundaries in accordance
+ with the native ABI, even though P1 treats them as caller-saved.
- Frame-pointer use is backend policy, not part of the P1 v2 architectural
register set.
@@ -485,19 +499,27 @@ runtime interface document.
| `a2` | `rdx` | `x2` | `a2` |
| `a3` | `rcx` | `x3` | `a3` |
| `t0` | `r10` | `x9` | `t0` |
+| `t1` | `r11` | `x10` | `t1` |
+| `t2` | `r8` | `x11` | `t2` |
| `s0` | `rbx` | `x19` | `s1` |
| `s1` | `r12` | `x20` | `s2` |
+| `s2` | `r13` | `x21` | `s3` |
+| `s3` | `r14` | `x22` | `s4` |
| `sp` | `rsp` | `sp` | `sp` |
#### 32-bit targets
-| P1 | arm32 | i386 | rv32 |
-|------|-------|-------|-------|
-| `a0` | `r0` | `eax` | `a0` |
-| `a1` | `r1` | `ecx` | `a1` |
-| `a2` | `r2` | `edx` | `a2` |
-| `a3` | `r3` | `ebx` | `a3` |
-| `t0` | `r12` | `esi` | `t0` |
-| `s0` | `r4` | `edi` | `s1` |
-| `s1` | `r5` | `ebp` | `s2` |
-| `sp` | `sp` | `esp` | `sp` |
+| P1 | arm32 | rv32 |
+|------|-------|-------|
+| `a0` | `r0` | `a0` |
+| `a1` | `r1` | `a1` |
+| `a2` | `r2` | `a2` |
+| `a3` | `r3` | `a3` |
+| `t0` | `r12` | `t0` |
+| `t1` | `r6` | `t1` |
+| `t2` | `r7` | `t2` |
+| `s0` | `r4` | `s1` |
+| `s1` | `r5` | `s2` |
+| `s2` | `r8` | `s3` |
+| `s3` | `r9` | `s4` |
+| `sp` | `sp` | `sp` |
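
Reviewer note: the three result conventions described in the docs/P1v2.md changes above can be summarised as a small lookup. This is only a sketch of the rules as written in the spec text. The function names `arg_location` and `result_registers` and the convention labels `direct1`, `direct2`, and `indirect` are hypothetical and do not exist in the P1 toolchain.

```python
# Sketch only: helper names and convention labels are illustrative assumptions,
# not part of P1 v2 or its generator scripts.
ARG_REGS = ("a0", "a1", "a2", "a3")
DIRECT = ("direct1", "direct2")  # one-word and two-word direct-result conventions


def arg_location(convention, word):
    """Return ('reg', name) or ('stack', slot) for explicit argument word `word`."""
    if word < 0:
        raise ValueError("argument word index must be non-negative")
    if convention in DIRECT:
        first_reg = 0  # a0 carries explicit argument word 0
    elif convention == "indirect":
        first_reg = 1  # a0 carries the result buffer pointer instead
    else:
        raise ValueError(f"unknown convention: {convention}")
    in_regs = len(ARG_REGS) - first_reg
    if word < in_regs:
        return ("reg", ARG_REGS[first_reg + word])
    # Remaining words live in the incoming stack-argument area (read via LDARG).
    return ("stack", word - in_regs)


def result_registers(convention):
    """Registers that hold a defined value on return under each convention."""
    if convention == "direct1":
        return ("a0",)        # one-word result
    if convention == "direct2":
        return ("a0", "a1")   # result word 0, result word 1
    if convention == "indirect":
        return ("a0",)        # same result buffer pointer that was passed in
    raise ValueError(f"unknown convention: {convention}")


if __name__ == "__main__":
    for conv in ("direct1", "direct2", "indirect"):
        print(conv, [arg_location(conv, w) for w in range(6)], result_registers(conv))
```

Under this reading, stack slot `0` is argument word `4` in both direct conventions and argument word `3` in the indirect one, which matches the slot mapping stated in the spec text.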
diff --git a/p1/P1-aarch64.M1pp b/p1/P1-aarch64.M1pp
@@ -26,12 +26,24 @@
%macro aa64_reg_t0()
9
%endm
+%macro aa64_reg_t1()
+10
+%endm
+%macro aa64_reg_t2()
+11
+%endm
%macro aa64_reg_s0()
19
%endm
%macro aa64_reg_s1()
20
%endm
+%macro aa64_reg_s2()
+21
+%endm
+%macro aa64_reg_s3()
+22
+%endm
%macro aa64_reg_sp()
31
%endm
@@ -51,13 +63,13 @@
8
%endm
%macro aa64_reg_save0()
-21
+23
%endm
%macro aa64_reg_save1()
-22
+24
%endm
%macro aa64_reg_save2()
-23
+25
%endm
%macro aa64_reg(r)
@@ -85,12 +97,24 @@
%macro aa64_is_sp_t0()
0
%endm
+%macro aa64_is_sp_t1()
+0
+%endm
+%macro aa64_is_sp_t2()
+0
+%endm
%macro aa64_is_sp_s0()
0
%endm
%macro aa64_is_sp_s1()
0
%endm
+%macro aa64_is_sp_s2()
+0
+%endm
+%macro aa64_is_sp_s3()
+0
+%endm
%macro aa64_is_sp_sp()
1
%endm
@@ -429,6 +453,10 @@
%aa64_cmp_skip(10, ra, rb)
%aa64_br(br)
%endm
+%macro p1_condb_BLTU(ra, rb)
+%aa64_cmp_skip(2, ra, rb)
+%aa64_br(br)
+%endm
%macro p1_condb(op, ra, rb)
%p1_condb_##op(ra, rb)
%endm
diff --git a/p1/P1.M1pp b/p1/P1.M1pp
@@ -143,6 +143,10 @@
%p1_condb(BLT, ra, rb)
%endm
+%macro bltu(ra, rb)
+%p1_condb(BLTU, ra, rb)
+%endm
+
%macro beqz(ra)
%p1_condbz(BEQZ, ra)
%endm
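
Reviewer note: the difference between `blt` and the new `bltu` only shows up when the top bit of an operand is set. The sketch below models both comparisons on raw one-word bit patterns. `WORD_BITS`, `to_signed`, `blt_taken`, and `bltu_taken` are illustrative names, and the 64-bit width is an assumption that fits the 64-bit targets (32 would apply on arm32 and rv32).

```python
# Sketch only: models the branch conditions, not the generated machine code.
WORD_BITS = 64                  # assumed one-word width; use 32 for 32-bit targets
MASK = (1 << WORD_BITS) - 1


def to_signed(x):
    """Interpret a one-word bit pattern as a two's-complement signed integer."""
    x &= MASK
    return x - (1 << WORD_BITS) if x >> (WORD_BITS - 1) else x


def blt_taken(ra, rb):
    """BLT: branch taken when ra < rb under signed interpretation."""
    return to_signed(ra) < to_signed(rb)


def bltu_taken(ra, rb):
    """BLTU: branch taken when ra < rb under unsigned interpretation."""
    return (ra & MASK) < (rb & MASK)


minus_one = MASK                        # all bits set: -1 signed, maximum unsigned
print(blt_taken(minus_one, 1))          # True:  -1 < 1
print(bltu_taken(minus_one, 1))         # False: the maximum word is not below 1
print(any(bltu_taken(x, 0) for x in (0, 1, minus_one)))  # False: nothing is below 0
```

The last line illustrates why the spec omits an unsigned zero-operand branch: no word value compares below zero when read as unsigned.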
diff --git a/p1/aarch64.py b/p1/aarch64.py
@@ -29,17 +29,21 @@ NAT = {
'x4': 4,
'x5': 5,
't0': 9,
+ 't1': 10,
+ 't2': 11,
's0': 19,
's1': 20,
+ 's2': 21,
+ 's3': 22,
'sp': 31,
'xzr': 31,
'lr': 30,
'br': 17,
'scratch': 16,
'x8': 8,
- 'save0': 21,
- 'save1': 22,
- 'save2': 23,
+ 'save0': 23,
+ 'save1': 24,
+ 'save2': 25,
}
@@ -164,6 +168,7 @@ def aa_cmp_skip(op, ra, rb):
'BEQ': 1,
'BNE': 0,
'BLT': 10,
+ 'BLTU': 2,
}[op]
return cmp_hex + le32(0x54000040 | skip_cond)
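
Reviewer note: the values in the `aa_cmp_skip` table are the inverse of each P1 condition, because the emitted `B.cond` jumps over the following branch through `br` when the P1 condition is false. The sketch below decodes the word built by `0x54000040 | skip_cond` to make that visible. It assumes the branch through `br` is a single 4-byte instruction. `decode_bcond` and `le32_bytes` are hypothetical helpers written for this note and are not the `le32` used in `aarch64.py`.

```python
import struct

# AArch64 condition-code numbers from the architectural encoding.
COND_NAMES = {0: "EQ", 1: "NE", 2: "HS", 3: "LO", 10: "GE", 11: "LT"}

# Skip conditions copied from the table in aa_cmp_skip.
SKIP_COND = {"BEQ": 1, "BNE": 0, "BLT": 10, "BLTU": 2}


def bcond_word(skip_cond):
    """Build the B.cond word the same way the backend expression does."""
    return 0x54000040 | skip_cond


def decode_bcond(word):
    """Return (condition name, byte offset) for an AArch64 B.cond encoding."""
    if word & 0xFF000010 != 0x54000000:
        raise ValueError("not a B.cond encoding")
    imm19 = (word >> 5) & 0x7FFFF
    if imm19 & (1 << 18):          # sign-extend the 19-bit immediate
        imm19 -= 1 << 19
    return COND_NAMES[word & 0xF], imm19 * 4


def le32_bytes(word):
    """Hypothetical helper: little-endian bytes of a 32-bit instruction word."""
    return struct.pack("<I", word)


if __name__ == "__main__":
    for op, cond in SKIP_COND.items():
        name, offset = decode_bcond(bcond_word(cond))
        print(f"{op}: skip on {name}, offset {offset} bytes")
```

For `BLTU` the skip condition decodes to `HS` (unsigned higher or same), the complement of unsigned lower-than, and the offset of 8 bytes from the `B.cond` lands just past one following instruction.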
diff --git a/p1/p1_gen.py b/p1/p1_gen.py
@@ -43,14 +43,14 @@ from common import (
import aarch64 # noqa: F401 - imported for arch registration side effects
-P1_GPRS = ('a0', 'a1', 'a2', 'a3', 't0', 's0', 's1')
+P1_GPRS = ('a0', 'a1', 'a2', 'a3', 't0', 't1', 't2', 's0', 's1', 's2', 's3')
P1_BASES = P1_GPRS + ('sp',)
RRR_OPS = ('ADD', 'SUB', 'AND', 'OR', 'XOR', 'SHL', 'SHR', 'SAR', 'MUL', 'DIV', 'REM')
LOGI_OPS = ('ANDI', 'ORI')
SHIFT_OPS = ('SHLI', 'SHRI', 'SARI')
MEM_OPS = ('LD', 'ST', 'LB', 'SB')
-CONDB_OPS = ('BEQ', 'BNE', 'BLT')
+CONDB_OPS = ('BEQ', 'BNE', 'BLT', 'BLTU')
CONDBZ_OPS = ('BEQZ', 'BNEZ', 'BLTZ')
ADDI_IMMS = (