boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

commit add01712f59b79acb21ef42924bdb260357ef6b8
parent 932b2491b2cd4b0d21da8e12eb8796f6deb74da5
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Thu, 23 Apr 2026 11:42:13 -0700

drop i386, expand register set, add BLTU

Diffstat:
M docs/P1v2.md        | 108 +++++++++++++++++++++++++++++++++++++++++++++++--------------------------------
M p1/P1-aarch64.M1pp  |  34 +++++++++++++++++++++++++++++++---
M p1/P1.M1pp          |   4 ++++
M p1/aarch64.py       |  11 ++++++++---
M p1/p1_gen.py        |   4 ++--
5 files changed, 110 insertions(+), 51 deletions(-)

diff --git a/docs/P1v2.md b/docs/P1v2.md
@@ -59,8 +59,8 @@ as `LA rd, %label` and `LA_BR %label`.
 P1 v2 exposes the following source-level registers:

 - `a0`–`a3` — argument registers. Also caller-saved general registers.
-- `t0` — caller-saved temporary.
-- `s0`–`s1` — callee-saved general registers.
+- `t0`–`t2` — caller-saved temporaries.
+- `s0`–`s3` — callee-saved general registers.
 - `sp` — stack pointer.

 ### Hidden registers
@@ -68,24 +68,34 @@ P1 v2 exposes the following source-level registers:
 The backend may reserve additional native registers that are never visible in
 P1 source:

-- `br` — branch / call target mechanism
+- `br` — branch / call target mechanism, implemented as a dedicated hidden
+  native register on every target
 - backend-local scratch used entirely within one instruction expansion

 No hidden register may carry a live P1 value across an instruction boundary.
-`br` need not be a native register on every target; it may be implemented by
-another backend-private mechanism.

 ## Calling Convention

 ### Arguments and return values

-- In the direct-result convention, explicit argument words 0-3 live in
-  `a0-a3`.
+P1 v2 defines three result conventions: one-word direct, two-word direct, and
+indirect.
+
+In the one-word direct-result convention:
+
+- Explicit argument words 0-3 live in `a0-a3`.
 - Additional explicit argument words live in the incoming stack-argument area
   and are read with `LDARG`.
-- In the direct-result convention, a one-word return value lives in `a0`.
+- On return, a one-word result lives in `a0`.
+
+In the two-word direct-result convention:
+
+- Explicit argument words 0-3 live in `a0-a3` on entry.
+- Additional explicit argument words still live in the incoming
+  stack-argument area.
+- On return, `a0` holds result word 0 and `a1` holds result word 1.

-P1 v2 also defines an indirect-result convention for wider returns:
+In the indirect-result convention:

 - The caller passes a writable result buffer pointer in `a0`.
 - Explicit argument words 0-2 then live in `a1-a3`.
@@ -93,23 +103,25 @@ P1 v2 also defines an indirect-result convention for wider returns:
   stack-argument area.
 - On return, `a0` holds the same result buffer pointer value.

-In the direct-result convention, incoming stack-argument slot `0` therefore
-corresponds to explicit argument word `4`. In the indirect-result convention,
-incoming stack-argument slot `0` corresponds to explicit argument word `3`.
+In both direct-result conventions, incoming stack-argument slot `0` corresponds
+to explicit argument word `4`. In the indirect-result convention, incoming
+stack-argument slot `0` corresponds to explicit argument word `3`.

-The indirect-result convention is the portable way to return aggregates or any
-result wider than one word.
+The two-word direct-result convention covers common cases such as 64-bit
+integer results on 32-bit targets, two-word aggregates, and divmod-style
+returns. The indirect-result convention is the portable way to return any
+result wider than two words.

 ### Register preservation

 Caller-saved:

 - `a0`–`a3`
-- `t0`
+- `t0`–`t2`

 Callee-saved:

-- `s0`–`s1`
+- `s0`–`s3`
 - `sp`

 ### Call semantics
@@ -118,8 +130,9 @@ A call is valid from any function, including a leaf. Call / return correctness
 does not depend on establishing a frame first.

 If a function needs any incoming argument after making a call, it must save it
-before the call. This matters in particular for argument 0, since `a0` is also
-the return-value register.
+before the call. This matters in particular for `a0`, which is overwritten by
+every convention's return value, and for `a1` when the callee uses the two-word
+direct-result convention.

 A call that passes any stack argument words requires the caller to have an
 active standard frame with enough frame-local storage to stage those outgoing
@@ -220,7 +233,7 @@ Leaf functions that need no frame-local storage may omit the frame entirely.
 | Immediate arithmetic | `ADDI`, `ANDI`, `ORI`, `SHLI`, `SHRI`, `SARI` |
 | Memory | `LD`, `ST`, `LB`, `SB` |
 | ABI access | `LDARG` |
-| Branching | `B`, `BR`, `BEQ`, `BNE`, `BLT`, `BEQZ`, `BNEZ`, `BLTZ` |
+| Branching | `B`, `BR`, `BEQ`, `BNE`, `BLT`, `BLTU`, `BEQZ`, `BNEZ`, `BLTZ` |
 | Calls / returns | `CALL`, `CALLR`, `RET`, `TAIL`, `TAILR` |
 | Frame management | `ENTER`, `LEAVE` |
 | System | `SYSCALL` |
@@ -344,11 +357,14 @@ The portable branch families are:

 - `B` — unconditional branch to the target in `br`
 - `BR rs` — unconditional branch to the code pointer in `rs`
-- `BEQ`, `BNE`, `BLT` — conditional branch to the target in `br`
+- `BEQ`, `BNE`, `BLT`, `BLTU` — conditional branch to the target in `br`
 - `BEQZ`, `BNEZ`, `BLTZ` — conditional branch to the target in `br` using zero
   as the second operand

-`BLT` and `BLTZ` perform signed comparisons on one-word values.
+`BLT` and `BLTZ` perform signed comparisons on one-word values. `BLTU`
+performs an unsigned comparison on one-word values; there is no unsigned
+zero-operand variant because `x < 0` is always false under unsigned
+interpretation.

 If a branch condition is true, control transfers to the target currently held
 in `br`. If the condition is false, execution falls through to the next
@@ -455,22 +471,20 @@ runtime interface document.

 ## Target notes

-- `a0` is both argument 0 and the return-value register in the portable
-  calling convention in the direct-result case, and the indirect-result buffer
-  pointer in the indirect-result case.
+- `a0` is argument 0, the one-word direct return-value register, the low word
+  of the two-word direct return pair, and the indirect-result buffer pointer.
 - On aarch64, riscv64, arm32, and rv32, that matches the native integer/pointer
   ABI directly.
 - On amd64, the backend must translate between portable `a0` and native
-  return register `rax` at call and return boundaries.
-- On i386, the backend must translate between portable argument registers and
-  the native stack-argument ABI at call boundaries.
-- On amd64 and i386, `LDARG` must account for the return address pushed by the
-  native `call` instruction. On aarch64, riscv64, arm32, and rv32, it maps more
+  return register `rax` at call and return boundaries. For the two-word direct
+  return, the backend must also translate `a1` against native `rdx`.
+- On amd64, `LDARG` must account for the return address pushed by the native
+  `call` instruction. On aarch64, riscv64, arm32, and rv32, it maps more
   directly to the entry `sp` plus the backend's standard frame/header policy.
-- On amd64, aarch64, riscv64, arm32, and rv32, `br` may be implemented as a
-  dedicated hidden native register.
-- On i386, `br` is expected to be a backend-private stack convention, not a
-  dedicated hidden register.
+- `br` is implemented as a dedicated hidden native register on every target.
+- On arm32, `t1` and `t2` map to natively callee-saved registers; the backend
+  is responsible for preserving them across function boundaries in accordance
+  with the native ABI, even though P1 treats them as caller-saved.
 - Frame-pointer use is backend policy, not part of the P1 v2 architectural
   register set.

@@ -485,19 +499,27 @@ runtime interface document.
 | `a2` | `rdx` | `x2` | `a2` |
 | `a3` | `rcx` | `x3` | `a3` |
 | `t0` | `r10` | `x9` | `t0` |
+| `t1` | `r11` | `x10` | `t1` |
+| `t2` | `r8` | `x11` | `t2` |
 | `s0` | `rbx` | `x19` | `s1` |
 | `s1` | `r12` | `x20` | `s2` |
+| `s2` | `r13` | `x21` | `s3` |
+| `s3` | `r14` | `x22` | `s4` |
 | `sp` | `rsp` | `sp` | `sp` |

 #### 32-bit targets

-| P1 | arm32 | i386 | rv32 |
-|------|-------|-------|-------|
-| `a0` | `r0` | `eax` | `a0` |
-| `a1` | `r1` | `ecx` | `a1` |
-| `a2` | `r2` | `edx` | `a2` |
-| `a3` | `r3` | `ebx` | `a3` |
-| `t0` | `r12` | `esi` | `t0` |
-| `s0` | `r4` | `edi` | `s1` |
-| `s1` | `r5` | `ebp` | `s2` |
-| `sp` | `sp` | `esp` | `sp` |
+| P1 | arm32 | rv32 |
+|------|-------|-------|
+| `a0` | `r0` | `a0` |
+| `a1` | `r1` | `a1` |
+| `a2` | `r2` | `a2` |
+| `a3` | `r3` | `a3` |
+| `t0` | `r12` | `t0` |
+| `t1` | `r6` | `t1` |
+| `t2` | `r7` | `t2` |
+| `s0` | `r4` | `s1` |
+| `s1` | `r5` | `s2` |
+| `s2` | `r8` | `s3` |
+| `s3` | `r9` | `s4` |
+| `sp` | `sp` | `sp` |
diff --git a/p1/P1-aarch64.M1pp b/p1/P1-aarch64.M1pp
@@ -26,12 +26,24 @@
 %macro aa64_reg_t0()
 9
 %endm
+%macro aa64_reg_t1()
+10
+%endm
+%macro aa64_reg_t2()
+11
+%endm
 %macro aa64_reg_s0()
 19
 %endm
 %macro aa64_reg_s1()
 20
 %endm
+%macro aa64_reg_s2()
+21
+%endm
+%macro aa64_reg_s3()
+22
+%endm
 %macro aa64_reg_sp()
 31
 %endm
@@ -51,13 +63,13 @@
 8
 %endm
 %macro aa64_reg_save0()
-21
+23
 %endm
 %macro aa64_reg_save1()
-22
+24
 %endm
 %macro aa64_reg_save2()
-23
+25
 %endm

 %macro aa64_reg(r)
@@ -85,12 +97,24 @@
 %macro aa64_is_sp_t0()
 0
 %endm
+%macro aa64_is_sp_t1()
+0
+%endm
+%macro aa64_is_sp_t2()
+0
+%endm
 %macro aa64_is_sp_s0()
 0
 %endm
 %macro aa64_is_sp_s1()
 0
 %endm
+%macro aa64_is_sp_s2()
+0
+%endm
+%macro aa64_is_sp_s3()
+0
+%endm
 %macro aa64_is_sp_sp()
 1
 %endm
@@ -429,6 +453,10 @@
 %aa64_cmp_skip(10, ra, rb)
 %aa64_br(br)
 %endm
+%macro p1_condb_BLTU(ra, rb)
+%aa64_cmp_skip(2, ra, rb)
+%aa64_br(br)
+%endm
 %macro p1_condb(op, ra, rb)
 %p1_condb_##op(ra, rb)
 %endm
diff --git a/p1/P1.M1pp b/p1/P1.M1pp
@@ -143,6 +143,10 @@
 %p1_condb(BLT, ra, rb)
 %endm

+%macro bltu(ra, rb)
+%p1_condb(BLTU, ra, rb)
+%endm
+
 %macro beqz(ra)
 %p1_condbz(BEQZ, ra)
 %endm
diff --git a/p1/aarch64.py b/p1/aarch64.py
@@ -29,17 +29,21 @@ NAT = {
     'x4': 4,
     'x5': 5,
     't0': 9,
+    't1': 10,
+    't2': 11,
     's0': 19,
     's1': 20,
+    's2': 21,
+    's3': 22,
     'sp': 31,
     'xzr': 31,
     'lr': 30,
     'br': 17,
     'scratch': 16,
     'x8': 8,
-    'save0': 21,
-    'save1': 22,
-    'save2': 23,
+    'save0': 23,
+    'save1': 24,
+    'save2': 25,
 }


@@ -164,6 +168,7 @@ def aa_cmp_skip(op, ra, rb):
         'BEQ': 1,
         'BNE': 0,
         'BLT': 10,
+        'BLTU': 2,
     }[op]
     return cmp_hex + le32(0x54000040 | skip_cond)

diff --git a/p1/p1_gen.py b/p1/p1_gen.py
@@ -43,14 +43,14 @@ from common import (
 import aarch64  # noqa: F401 - imported for arch registration side effects


-P1_GPRS = ('a0', 'a1', 'a2', 'a3', 't0', 's0', 's1')
+P1_GPRS = ('a0', 'a1', 'a2', 'a3', 't0', 't1', 't2', 's0', 's1', 's2', 's3')
 P1_BASES = P1_GPRS + ('sp',)

 RRR_OPS = ('ADD', 'SUB', 'AND', 'OR', 'XOR', 'SHL', 'SHR', 'SAR', 'MUL', 'DIV', 'REM')
 LOGI_OPS = ('ANDI', 'ORI')
 SHIFT_OPS = ('SHLI', 'SHRI', 'SARI')
 MEM_OPS = ('LD', 'ST', 'LB', 'SB')
-CONDB_OPS = ('BEQ', 'BNE', 'BLT')
+CONDB_OPS = ('BEQ', 'BNE', 'BLT', 'BLTU')
 CONDBZ_OPS = ('BEQZ', 'BNEZ', 'BLTZ')

 ADDI_IMMS = (
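
The skip-condition values in the aarch64 hunks (`BEQ` 1, `BNE` 0, `BLT` 10, `BLTU` 2) look like the inverse of each branch's taken condition, so that a short `B.cond` hops over the following branch through `br` when the P1 condition is false. The sketch below is a minimal illustration of that reading together with the signed versus unsigned comparison the spec text describes. It is not code from the repository: the names `COND`, `TAKEN`, `skip_cond`, `encode_skip`, `blt`, and `bltu` are hypothetical, and the 64-bit default word size is an assumption.

```python
import struct

# AArch64 condition-code numbers (architectural values).
COND = {'EQ': 0, 'NE': 1, 'HS': 2, 'LO': 3, 'GE': 10, 'LT': 11}

# Condition under which each P1 branch is taken.
# Hypothetical table for illustration; not taken from the repository.
TAKEN = {'BEQ': 'EQ', 'BNE': 'NE', 'BLT': 'LT', 'BLTU': 'LO'}


def skip_cond(op):
    """Return the inverted condition that skips the branch through br."""
    # AArch64 pairs each of these conditions with its inverse by flipping bit 0.
    return COND[TAKEN[op]] ^ 1


def encode_skip(op):
    """Encode `B.<inverted cond> .+8` as four little-endian bytes."""
    # 0x54000000 is the B.cond opcode; imm19 = 2 (bits 5..23) is a +8 byte offset,
    # which jumps over exactly one following instruction.
    return struct.pack('<I', 0x54000040 | skip_cond(op))


def to_signed(x, bits=64):
    """Reinterpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def blt(a, b, bits=64):
    """Signed one-word comparison, as described for BLT."""
    return to_signed(a, bits) < to_signed(b, bits)


def bltu(a, b, bits=64):
    """Unsigned one-word comparison, as described for BLTU."""
    mask = (1 << bits) - 1
    return (a & mask) < (b & mask)


if __name__ == '__main__':
    for op in ('BEQ', 'BNE', 'BLT', 'BLTU'):
        print(op, skip_cond(op), encode_skip(op).hex())
    # The all-ones word is -1 when signed and the largest value when unsigned.
    print(blt(-1, 1), bltu(-1, 1))
```

Under this reading, `BLTU` is taken on `LO` (unsigned lower), so the skip uses `HS` (2), which matches the value added in the diff. The same all-ones word compares below 1 with `blt` but not with `bltu`, which is the case an unsigned branch exists to handle.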