boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

P1

Scope

P1 is a portable pseudo-ISA for standalone executables.

P1 has two width variants, P1-32 and P1-64. A word is 4 bytes in P1-32 and 8 bytes in P1-64.

Portable source may use any number of word arguments. The first four argument registers are explicit, and additional argument words are passed through a portable incoming stack-argument area.

Portable source may directly return 0..2 words. Wider results use the portable indirect-result convention described below.

Toolchain envelope

P1 must be assemblable through the existing M0 + hex2 path, with catm as the only composition primitive between source or generated fragments. The spec therefore assumes only the following toolchain features:

Source notation

This document describes instructions using ordinary assembly notation such as ADD rd, ra, rb, LD rd, [ra + off], or CALL.

Because of the toolchain constraints above, portable source does not encode most operands as textual instruction arguments. Instead, register choices, inline immediate values, and small fixed parameters are fused into opcode names, following the generated-table style used by p1/gen/p1_gen.py.

So the notation in this document is descriptive rather than literal:

Labels still appear in source where the toolchain supports them directly, such as LA rd, %label and LA_BR %label.

Register Model

Exposed registers

P1 exposes the following source-level registers:

Hidden registers

The backend may reserve additional native registers that are never visible in P1 source:

No hidden register may carry a live P1 value across an instruction boundary.

Calling Convention

Arguments and return values

P1 defines three result conventions: one-word direct, two-word direct, and indirect.

In the one-word direct-result convention:

In the two-word direct-result convention:

In the indirect-result convention:

In both direct-result conventions, incoming stack-argument slot 0 corresponds to explicit argument word 4. In the indirect-result convention, incoming stack-argument slot 0 corresponds to explicit argument word 3.
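The slot mapping above can be sketched as a small Python helper. This is only an illustration: locate_arg is a hypothetical name, and the "register" result gives the ordinal position among register-passed words rather than a register name, since the sketch makes no claim about which exposed register carries each word.

```python
# Hypothetical helper modeling the mapping from explicit argument word
# index to either a register-passed position or an incoming stack slot.
DIRECT_REG_WORDS = 4    # direct conventions: stack slot 0 is explicit word 4
INDIRECT_REG_WORDS = 3  # indirect convention: stack slot 0 is explicit word 3


def locate_arg(index, indirect=False):
    """Return ("register", n) or ("stack", slot) for explicit argument word `index`."""
    if index < 0:
        raise ValueError("argument word index must be non-negative")
    in_regs = INDIRECT_REG_WORDS if indirect else DIRECT_REG_WORDS
    if index < in_regs:
        return ("register", index)
    # Stack slots are word-indexed, matching the idx operand of LDARG.
    return ("stack", index - in_regs)
```

For example, locate_arg(5) yields ("stack", 1), which is the slot index a callee would pass to LDARG under either direct-result convention.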

The two-word direct-result convention covers common cases such as 64-bit integer results on 32-bit targets, two-word aggregates, and divmod-style returns. The indirect-result convention is the portable way to return any result wider than two words.

Register preservation

Caller-saved:

Callee-saved:

Call semantics

A function that issues any CALL, CALLR, TAIL, or TAILR must establish a standard frame with ENTER before its first such op. A leaf that issues none of those may omit the frame entirely.

If a function needs any incoming argument after making a call, it must save it before the call. This matters in particular for a0, which is overwritten by every convention's return value, and for a1 when the callee uses the two-word direct-result convention.

A call that passes any stack argument words requires the caller to have an active standard frame with enough frame-local storage to stage those outgoing words.

The return address is hidden machine state. Portable source must not assume that it lives in any exposed register.

Stack Convention

Call-boundary rule

At every call boundary, the backend must satisfy the native C ABI stack alignment rule for the target architecture.

Portable source must therefore treat raw function-entry sp as opaque. It may not assume that the low bits of sp have the same meaning on all targets before a frame is established.

Incoming stack-argument area

P1 defines an abstract incoming stack-argument area for explicit argument words that do not fit in registers.

LDARG is valid only when the current function has an active standard frame. Therefore, a function that needs any incoming stack argument must establish a standard frame before its first LDARG.

Portable source must not assume any direct relationship between incoming argument slots and raw function-entry sp. In particular, source must not try to reconstruct stack arguments by manually indexing from sp; backend entry layouts differ across targets.

For a call with m stack-passed explicit argument words, the caller stages those words in the first m words of its frame-local storage immediately before the call:

[sp + 0*WORD] = outgoing arg word 0
[sp + 1*WORD] = outgoing arg word 1
...

At callee entry, those staged words become incoming argument slots 0..m-1. The backend is responsible for mapping between the caller's frame layout and the callee's abstract incoming argument slots.

Portable code that needs both ordinary locals and stack-passed outgoing arguments must reserve enough total frame-local storage and keep the low-addressed prefix available for outgoing argument staging across the call.
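One way to budget such a frame is sketched below in Python. The function names are hypothetical, and placing locals directly after the staging prefix is just one layout that satisfies the rule; the spec only requires that the low-addressed prefix stay free for staging.

```python
def plan_frame(word_size, outgoing_words, local_bytes):
    """Return (enter_size, first_local_offset) for a frame that stages
    `outgoing_words` stack arguments and also holds `local_bytes` of locals.

    Assumption: locals are placed immediately above the staging prefix.
    """
    if word_size not in (4, 8):
        raise ValueError("word_size must be 4 (P1-32) or 8 (P1-64)")
    if outgoing_words < 0 or local_bytes < 0:
        raise ValueError("sizes must be non-negative")
    staging_bytes = outgoing_words * word_size
    return staging_bytes + local_bytes, staging_bytes


def outgoing_slot_offset(word_size, slot):
    """Byte offset from sp at which outgoing argument word `slot` is staged."""
    return slot * word_size
```

With word_size 8, two outgoing words, and 24 bytes of locals, this gives an ENTER size of 40 with locals starting at sp + 16.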

Standard frame layout

Functions that need local stack storage establish a standard frame with ENTER size. After frame establishment, the portable-visible frame-local storage occupies the first size bytes above sp:

[sp + 0 ... sp + size - 1] = frame-local storage

Frame-local storage is byte-addressed. Portable code may use it for ordinary locals, spilled callee-saved registers, and the caller-staged outgoing stack-argument words described above.

Each frame also carries backend-private per-frame state — typically the saved return continuation, saved caller sp, and any padding needed to satisfy STACK_ALIGN. That state is not addressable by portable source, and the backend chooses its layout and total allocation size.

Word sizes: WORD is 4 bytes in P1-32 and 8 bytes in P1-64.

STACK_ALIGN is target-defined and must satisfy the native call ABI at every call boundary.

Leaf functions that need no frame-local storage may omit the frame entirely.

Frame invariants

Op Set Summary

Category              Operations
Materialization       LI rd, imm; LA rd, %label; LA_BR %label
Moves                 MOV rd, rs; MOV rd, sp
Arithmetic            ADD, SUB, AND, OR, XOR, SHL, SHR, SAR, MUL, DIV, REM
Immediate arithmetic  ADDI, ANDI, ORI, SHLI, SHRI, SARI
Memory                LD, ST, LB, SB
ABI access            LDARG
Branching             B, BR, BEQ, BNE, BLT, BLTU, BEQZ, BNEZ, BLTZ
Calls / returns       CALL, CALLR, RET, ERET, TAIL, TAILR
Frame management      ENTER
System                SYSCALL

Immediates

Immediate operands appear only in instructions that explicitly admit them. Portable source has three immediate classes:

P1 also uses two structured assembly-time operands:

LI rd, imm loads the one-word integer value imm.

LA rd, %label loads the address of %label as a one-word pointer value.

The backend may realize LI and LA using native immediates, literal pools, multi-instruction sequences, or other backend-private mechanisms.

Backends may assume labels fit in 32 bits when realizing LA and LA_BR. This reflects the stage0 image layout (hex2-0 base 0x00600000, programs well under 4 GB), not a portable-ISA-level guarantee. Backends that target images loaded above the 4 GB boundary must adjust their LA / LA_BR lowering. LI makes no such assumption — it materializes any one-word value.

Control Flow

Call / Return / Tail Call

Control-flow targets are materialized with LA_BR %label, which loads %label into the hidden branch-target mechanism br. The immediately following control-flow op consumes that target.

CALL transfers control to the target most recently loaded by LA_BR and establishes a return continuation such that a subsequent RET returns to the instruction after the CALL. CALL requires an active standard frame.

CALLR rs is the register-indirect form of CALL. It transfers control to the code pointer value held in rs and establishes the same return continuation semantics as CALL.

RET returns from a leaf function through the hidden return continuation captured at call time. RET is valid only when the current function has no active standard frame.

ERET returns from a function that has an active standard frame. It performs the standard epilogue — restoring sp and the hidden return continuation — and then returns to the caller. ERET is valid only when the current function has an active standard frame.

TAIL is a tail call to the target most recently loaded by LA_BR. It is valid only when the current function has an active standard frame. TAIL performs the standard epilogue for the current frame and then transfers control to the loaded target without creating a new return continuation. The callee therefore returns directly to the current function's caller.

TAILR rs is the register-indirect form of TAIL. It is valid only when the current function has an active standard frame.

Because stack-passed outgoing argument words are staged in the caller's own frame-local storage, TAIL and TAILR are portable only when the tail-called callee requires no stack-passed argument words. Portable compilers must lower other tail-call cases to an ordinary CALL / ERET sequence.

Portable source must treat the return continuation as hidden machine state. It must not assume that the return address lives in any exposed register or stack location except as defined by the standard frame layout after frame establishment.

Prologue / Epilogue

P1 has a single frame-establishment op, ENTER size. Frame teardown is not a standalone op; it is embedded in ERET, TAIL, and TAILR.

ENTER size establishes a standard frame with size bytes of frame-local storage. After it executes:

[sp + 0 ... sp + size - 1] = frame-local storage

Any backend-private per-frame state (saved return continuation, saved caller sp, alignment padding) lives outside the portable-visible size bytes. Portable source may not address it.

ERET, TAIL, and TAILR each perform the standard epilogue — restoring sp and the hidden return continuation — and then transfer control: ERET to the caller, TAIL to the target in br, and TAILR to the target in rs. Portable source must use one of these ops (not RET) to exit a function that has established a frame.

A function may omit ENTER entirely if it is a leaf and needs no frame. Such a function exits with RET.

ENTER does not implicitly save or restore s0-s3. A function that modifies any callee-saved register must preserve it explicitly, typically by storing it in frame-local storage within its standard frame.
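The frame rules above lend themselves to a simple source-order check. The Python sketch below is a hypothetical lint, not part of the P1 toolchain: it takes the mnemonics of one function in source order and assumes that ENTER, if present, appears textually before any op that depends on the frame.

```python
# Ops that the spec says require an active standard frame.
NEEDS_FRAME = frozenset({"CALL", "CALLR", "TAIL", "TAILR", "LDARG", "ERET"})


def check_frame_rules(ops):
    """Return a list of (index, message) problems for one function's ops."""
    problems = []
    framed = False
    for i, op in enumerate(ops):
        if op == "ENTER":
            framed = True
        elif op in NEEDS_FRAME and not framed:
            problems.append((i, op + " requires an active standard frame"))
        elif op == "RET" and framed:
            problems.append((i, "RET after ENTER; exit with ERET, TAIL, or TAILR"))
    return problems
```

A frameless leaf such as ["ADD", "RET"] passes, while ["ENTER", "RET"] is reported because a framed function must not exit with RET.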

Branching

P1 branch targets are carried through the hidden branch-target mechanism br. Portable source may load br only through:

No branch, call, or tail opcode takes a label operand directly. Portable source must treat br as owned by the control-flow machinery. No live value may be carried in br. Each LA_BR must be consumed by the immediately following branch, call, or tail op, and portable source must not rely on br surviving across any other instruction.
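The pairing rule for LA_BR can be checked mechanically over a list of mnemonics. The Python sketch below is a hypothetical lint. Its default consumer set is an assumption: it lists B, the conditional branches, CALL, and TAIL, and leaves out BR, CALLR, and TAILR on the assumption that those take their target from a register. Pass a different set if that assumption does not hold.

```python
# Assumed set of ops that consume the hidden branch target br.
BR_CONSUMERS = frozenset({
    "B", "BEQ", "BNE", "BLT", "BLTU", "BEQZ", "BNEZ", "BLTZ", "CALL", "TAIL",
})


def check_br_discipline(ops, consumers=BR_CONSUMERS):
    """Return (index, message) pairs for violations of the LA_BR pairing rule."""
    problems = []
    for i, op in enumerate(ops):
        prev_op = ops[i - 1] if i > 0 else None
        next_op = ops[i + 1] if i + 1 < len(ops) else None
        if op == "LA_BR" and next_op not in consumers:
            problems.append((i, "LA_BR is not consumed by the next op"))
        if op in consumers and prev_op != "LA_BR":
            problems.append((i, op + " relies on br surviving another op"))
    return problems
```

For ["LA_BR", "ADD", "B"] the check reports both the unconsumed LA_BR at index 0 and the B at index 2 that would read a stale br.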

The portable branch families are:

BLT and BLTZ perform signed comparisons on one-word values. BLTU performs an unsigned comparison on one-word values; there is no unsigned zero-operand variant because x < 0 is always false under unsigned interpretation.
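The difference between the signed and unsigned comparisons shows up when the top bit of a word is set. The following Python sketch models the branch conditions on raw bit-patterns; the function names are illustrative only.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        return value - (1 << bits)
    return value


def blt_taken(a, b, bits):
    """Condition for BLT: signed a < b on one-word values."""
    return to_signed(a, bits) < to_signed(b, bits)


def bltu_taken(a, b, bits):
    """Condition for BLTU: unsigned a < b on one-word values."""
    word_mask = (1 << bits) - 1
    return (a & word_mask) < (b & word_mask)


def bltz_taken(a, bits):
    """Condition for BLTZ: signed a < 0."""
    return to_signed(a, bits) < 0
```

In P1-32 the pattern 0xFFFFFFFF is -1 when signed, so blt_taken(0xFFFFFFFF, 1, 32) holds while bltu_taken(0xFFFFFFFF, 1, 32) does not.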

If a branch condition is true, control transfers to the target currently held in br. If the condition is false, execution falls through to the next instruction.

Data Ops

Arithmetic

P1 defines the following arithmetic and bitwise operations on one-word values:

For ADD, SUB, MUL, AND, OR, and XOR, computation is modulo the active word size.

SHL shifts left and discards high bits. SHR is a logical right shift and zero-fills. SAR is an arithmetic right shift and sign-fills.

For register-count shifts, only the low 5 bits of the shift count are observed in P1-32, and only the low 6 bits are observed in P1-64.

Immediate-form shifts use inline immediates in the range 0..31 in P1-32 and 0..63 in P1-64.
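The wrapping and shift-count rules can be modeled in a few lines of Python. This is a reference sketch of the stated semantics, with bits set to 32 or 64 for the two variants; the function names are illustrative.

```python
def word_mask(bits):
    return (1 << bits) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= word_mask(bits)
    if value >> (bits - 1):
        return value - (1 << bits)
    return value


def add(a, b, bits):
    """ADD: the result wraps modulo 2**bits."""
    return (a + b) & word_mask(bits)


def shl(a, count, bits):
    """SHL: only the low 5 (P1-32) or 6 (P1-64) bits of the count are observed."""
    return (a << (count & (bits - 1))) & word_mask(bits)


def shr(a, count, bits):
    """SHR: logical right shift, zero-filling."""
    return (a & word_mask(bits)) >> (count & (bits - 1))


def sar(a, count, bits):
    """SAR: arithmetic right shift, sign-filling."""
    return (to_signed(a, bits) >> (count & (bits - 1))) & word_mask(bits)
```

A register count of 33 therefore behaves as a shift by 1 in P1-32 but as a shift by 33 in P1-64.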

DIV is signed division on one-word two's-complement values and truncates toward zero. REM is the corresponding signed remainder.

Division by zero is outside the portable contract. The overflow case MIN_INT / -1 is also outside the portable contract, as is the corresponding remainder case.
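Truncating division differs from Python's floor division when the operands have opposite signs, so a model has to build the quotient from magnitudes. The sketch below reads "corresponding remainder" as the value r satisfying a = q*b + r, which is an interpretation rather than something the text states outright. The two excluded cases raise exceptions here purely to mark them as outside the portable contract.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        return value - (1 << bits)
    return value


def div_rem(a, b, bits):
    """Return (DIV, REM) results as one-word bit-patterns."""
    sa, sb = to_signed(a, bits), to_signed(b, bits)
    if sb == 0:
        raise ZeroDivisionError("division by zero is outside the portable contract")
    if sa == -(1 << (bits - 1)) and sb == -1:
        raise OverflowError("MIN_INT / -1 is outside the portable contract")
    quotient = abs(sa) // abs(sb)          # magnitude, truncated toward zero
    if (sa < 0) != (sb < 0):
        quotient = -quotient
    remainder = sa - quotient * sb         # takes the sign of the dividend
    word_mask = (1 << bits) - 1
    return quotient & word_mask, remainder & word_mask
```

For -7 divided by 2 in P1-32 this gives a quotient of -3 and a remainder of -1, whereas floor division would give -4 and 1.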

Moves

P1 defines the following move and materialization operations:

MOV may copy from any exposed general register to any exposed general register.

Portable source may also read the current stack pointer through MOV rd, sp.

Portable source may not write sp through MOV. Stack-pointer updates are only performed by ENTER, ERET, TAIL, TAILR, and backend-private call/return machinery.

LI materializes an integer bit-pattern. LA materializes the address of a label. LA_BR is a separate control-flow-target materialization form and is not part of the general move family.

Memory

P1 defines the following memory-access operations:

LD and ST access one full word: 4 bytes in P1-32 and 8 bytes in P1-64.

LB loads one byte and zero-extends it to a full word. SB stores the low 8 bits of the source value.

Memory offsets use signed 12-bit inline immediates.

The base address for a memory access may be any exposed general register or sp.
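The byte-access and offset rules can be modeled against a Python bytearray standing in for memory. This is a sketch under that assumption; the helper names are hypothetical. Word accesses are left out on purpose so the model makes no claim about byte order.

```python
OFFSET_MIN, OFFSET_MAX = -2048, 2047  # signed 12-bit inline immediate


def check_offset(off):
    """Validate a memory offset against the signed 12-bit range."""
    if not OFFSET_MIN <= off <= OFFSET_MAX:
        raise ValueError("offset does not fit a signed 12-bit immediate")
    return off


def effective_address(base, off):
    addr = base + check_offset(off)
    if addr < 0:
        raise IndexError("negative address in this model")
    return addr


def lb(mem, base, off):
    """LB: load one byte; the 0..255 result is already zero-extended."""
    return mem[effective_address(base, off)]


def sb(mem, base, off, value):
    """SB: store only the low 8 bits of `value`."""
    mem[effective_address(base, off)] = value & 0xFF
```

Storing 0x1FF with sb writes 0xFF, and a later lb of the same address returns 0xFF as a non-negative word.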

LDARG rd, idx loads incoming stack-argument slot idx, where slot 0 is the first stack-passed explicit argument word. idx is word-indexed, not byte-indexed. LDARG is an ABI access, not a general memory operation; it does not expose or imply any raw sp-relative layout at function entry.

LDARG is valid only when the current function has an active standard frame.

Portable source must not assume that labels are aligned beyond what is explicitly established by the program itself. Portable code should use naturally aligned addresses for LD and ST. Unaligned word accesses are outside the portable contract. Byte accesses have no additional alignment requirement.

Program Entry

P1 defines a portable program-entry model so that portable source does not need to know how the native loader delivers argc / argv.

The target backend is responsible for emitting a per-arch _start stub that:

  1. Captures argc and a pointer to the first argv word from the native entry stack or registers.
  2. Calls the portable label p1_main under the one-word direct-result convention with:
    • a0 = argc
    • a1 = argv, a pointer to the first argv word. Subsequent slots live at offsets 1*WORD, 2*WORD, ..., and the list is terminated by a NULL word.
  3. On return from p1_main, performs sys_exit using the returned value in a0 as the exit status.

Portable P1 source defines p1_main as an ordinary P1 function and reads argc / argv through the standard calling convention.
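The argv layout described in the stub contract can be illustrated with a small Python model. Here load_word is a hypothetical accessor standing in for an LD through the argv pointer, and the NULL terminator is taken to be the all-zero word.

```python
def read_argv(load_word, argv, word_size):
    """Collect argv entries starting at address `argv` until the NULL word."""
    entries = []
    index = 0
    while True:
        word = load_word(argv + index * word_size)
        if word == 0:  # assumed encoding of the NULL terminator
            return entries
        entries.append(word)
        index += 1
```

Given a P1-64 memory model {0x1000: 0x2000, 0x1008: 0x2010, 0x1010: 0}, calling read_argv(memory.__getitem__, 0x1000, 8) collects the two string pointers, matching an argc of 2.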

At entry to p1_main, the native entry-stack layout has already been consumed by the backend stub. Portable source may not assume anything about the sp value inherited from _start except that it satisfies the call-boundary alignment rule and that the standard frame protocol (ENTER / ERET) works correctly from it.

p1_main may return normally, or it may call sys_exit directly at any point.

The portable entry model does not expose envp or auxiliary vectors. Targets that need them must provide target-specific extensions; those are outside the portable P1 surface.

System

SYSCALL is part of the portable ISA surface.

At the portable level, the syscall convention is:

At the portable level, SYSCALL clobbers only a0. All other exposed registers are preserved across the syscall.

The mapping from symbolic syscall names to numeric syscall identifiers is target-defined. The set of syscalls available to a given program is likewise specified outside the core P1 ISA, for example by a target profile or runtime interface document.

Target notes