P1
Scope
P1 is a portable pseudo-ISA for standalone executables.
P1 has two width variants:
- P1-64 — one word is one 64-bit integer or pointer value
- P1-32 — one word is one 32-bit integer or pointer value
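Portable source may pass any number of word arguments. Up to four argument words travel in explicit argument registers (three under the indirect-result convention, which uses a0 for the result buffer pointer), and additional argument words are passed through a portable incoming stack-argument area.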
Portable source may directly return zero, one, or two words. Wider results use the
portable indirect-result convention described below.
Toolchain envelope
P1 must be assemblable through the existing M0 + hex2 path, with
catm as the only composition primitive between source or generated fragments.
The spec therefore assumes only the following toolchain features:
- M0-level DEFINE name hex_bytes substitution
- raw byte emission
- labels and label references supported by hex2
- file concatenation via catm
Source notation
This document describes instructions using ordinary assembly notation such as
ADD rd, ra, rb, LD rd, [ra + off], or CALL.
Because of the toolchain constraints above, portable source does not encode
most operands as textual instruction arguments. Instead, register choices,
inline immediate values, and small fixed parameters are fused into opcode
names, following the generated-table style used by p1/gen/p1_gen.py.
So the notation in this document is descriptive rather than literal:
- ADD rd, ra, rb means a family of fused register-specific opcodes
- ADDI rd, ra, imm means a family of fused register-and-immediate-specific opcodes
- ENTER size means a family of fused byte-count-specific opcodes
- LDARG rd, idx means a family of fused register-and-argument-slot-specific opcodes
- BR rs, CALLR rs, and TAILR rs mean register-specific control-flow opcodes
- ERET, CALL, RET, TAIL, B, and SYSCALL remain operand-free
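A short Python sketch can make the fused-name scheme concrete by expanding two families into individual names. The underscore-joined spellings (ADD_A0_A1_A2, ADDI_A0_A1_N8) and the helper functions are hypothetical stand-ins: the real names and byte encodings are whatever p1/gen/p1_gen.py generates. The sketch also assumes ADD and ADDI operands range over the eleven general registers only.

```python
from __future__ import annotations

from itertools import product

# The 11 exposed general registers (sp excluded by assumption).
GENERAL_REGS = ["a0", "a1", "a2", "a3", "t0", "t1", "t2", "s0", "s1", "s2", "s3"]


def fused_rrr_names(op: str) -> list[str]:
    """Expand a three-register family such as ADD rd, ra, rb into one
    fused name per (rd, ra, rb) combination (hypothetical spelling)."""
    return [
        "_".join([op, rd.upper(), ra.upper(), rb.upper()])
        for rd, ra, rb in product(GENERAL_REGS, repeat=3)
    ]


def fused_rri_names(op: str, imms: range) -> list[str]:
    """Expand a register-and-immediate family such as ADDI rd, ra, imm.
    Negative immediates get an 'N' prefix (hypothetical spelling)."""
    names = []
    for rd, ra in product(GENERAL_REGS, repeat=2):
        for imm in imms:
            imm_txt = f"N{-imm}" if imm < 0 else str(imm)
            names.append(f"{op}_{rd.upper()}_{ra.upper()}_{imm_txt}")
    return names


add_names = fused_rrr_names("ADD")
print(len(add_names))   # 1331 (11 ** 3)
print(add_names[0])     # ADD_A0_A0_A0
addi_small = fused_rri_names("ADDI", range(-8, 9))
print(len(addi_small))  # 2057 (11 * 11 * 17)
```

The counts show why the opcode tables are generated rather than written by hand: covering the full -2048..2047 immediate range for every ADDI register pair would take 121 * 4096 = 495,616 names, so a generator may reasonably emit only a chosen subset.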
Labels still appear in source where the toolchain supports them directly, such
as LA rd, %label and LA_BR %label.
Register Model
Exposed registers
P1 exposes the following source-level registers:
- a0–a3: argument registers. Also caller-saved general registers.
- t0–t2: caller-saved temporaries.
- s0–s3: callee-saved general registers.
- sp: stack pointer.
Hidden registers
The backend may reserve additional native registers that are never visible in P1 source:
- br: branch / call target mechanism, implemented as a dedicated hidden native register on every target
- backend-local scratch used entirely within one instruction expansion
No hidden register may carry a live P1 value across an instruction boundary.
Calling Convention
Arguments and return values
P1 defines three result conventions: one-word direct, two-word direct, and indirect.
In the one-word direct-result convention:
- Explicit argument words 0-3 live in a0-a3.
- Additional explicit argument words live in the incoming stack-argument area
and are read with LDARG.
- On return, a one-word result lives in a0.
In the two-word direct-result convention:
- Explicit argument words 0-3 live in a0-a3 on entry.
- Additional explicit argument words still live in the incoming stack-argument area.
- On return, a0 holds result word 0 and a1 holds result word 1.
In the indirect-result convention:
- The caller passes a writable result buffer pointer in a0.
- Explicit argument words 0-2 then live in a1-a3.
- Additional explicit argument words still live in the incoming stack-argument area.
- On return, a0 holds the same result buffer pointer value.
In both direct-result conventions, incoming stack-argument slot 0 corresponds
to explicit argument word 4. In the indirect-result convention, incoming
stack-argument slot 0 corresponds to explicit argument word 3.
The two-word direct-result convention covers common cases such as 64-bit integer results on 32-bit targets, two-word aggregates, and divmod-style returns. The indirect-result convention is the portable way to return any result wider than two words.
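The argument placement rules of the three conventions reduce to a small function. This Python sketch (the name arg_location and the convention labels "direct1", "direct2", and "indirect" are hypothetical) returns either the register or the LDARG slot that holds a given explicit argument word at callee entry.

```python
from __future__ import annotations

ARG_REGS = ["a0", "a1", "a2", "a3"]
CONVENTIONS = {"direct1", "direct2", "indirect"}


def arg_location(conv: str, word: int) -> tuple[str, str | int]:
    """Locate explicit argument word `word` at callee entry.

    Returns ("reg", name) for register-passed words, or ("slot", n) for
    words read with LDARG rd, n from the incoming stack-argument area.
    """
    if conv not in CONVENTIONS:
        raise ValueError(f"unknown convention {conv!r}")
    if word < 0:
        raise ValueError("argument word index must be non-negative")
    # The indirect convention spends a0 on the result buffer pointer, so
    # explicit words start at a1 and only three of them fit in registers.
    first = 1 if conv == "indirect" else 0
    in_regs = len(ARG_REGS) - first
    if word < in_regs:
        return ("reg", ARG_REGS[first + word])
    return ("slot", word - in_regs)


print(arg_location("direct1", 4))   # ('slot', 0)
print(arg_location("indirect", 0))  # ('reg', 'a1')
print(arg_location("indirect", 3))  # ('slot', 0)
```

Both direct-result conventions share the same argument placement; they differ only in how many result registers the caller must treat as overwritten on return.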
Register preservation
Caller-saved:
- a0–a3
- t0–t2
Callee-saved:
- s0–s3
- sp
Call semantics
A function that issues any CALL, CALLR, TAIL, or TAILR must establish a
standard frame with ENTER before its first such op. A leaf that issues none
of those may omit the frame entirely.
If a function needs any incoming argument after making a call, it must save it
before the call. This matters in particular for a0, which is overwritten by
every convention's return value, and for a1 when the callee uses the two-word
direct-result convention.
A call that passes any stack argument words requires the caller to have an active standard frame with enough frame-local storage to stage those outgoing words.
The return address is hidden machine state. Portable source must not assume that it lives in any exposed register.
Stack Convention
Call-boundary rule
At every call boundary, the backend must satisfy the native C ABI stack alignment rule for the target architecture.
Portable source must therefore treat raw function-entry sp as opaque. It may
not assume that the low bits of sp have the same meaning on all targets
before a frame is established.
Incoming stack-argument area
P1 defines an abstract incoming stack-argument area for explicit argument words that do not fit in registers.
- Slot 0 is the first stack-passed explicit argument word.
- Slots are word-indexed, not byte-indexed.
- Portable source may access this area only through LDARG.
LDARG is valid only when the current function has an active standard frame.
Therefore, a function that needs any incoming stack argument must establish a
standard frame before its first LDARG.
Portable source must not assume any direct relationship between incoming
argument slots and raw function-entry sp. In particular, source must not try
to reconstruct stack arguments by manually indexing from sp; backend entry
layouts differ across targets.
For a call with m stack-passed explicit argument words, the caller stages
those words in the first m words of its frame-local storage immediately
before the call:
[sp + 0*WORD] = outgoing arg word 0
[sp + 1*WORD] = outgoing arg word 1
...
At callee entry, those staged words become incoming argument slots 0..m-1.
The backend is responsible for mapping between the caller's frame layout and
the callee's abstract incoming argument slots.
Portable code that needs both ordinary locals and stack-passed outgoing arguments must reserve enough total frame-local storage and keep the low-addressed prefix of that storage (the first m words) free for outgoing argument staging across the call.
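As a worked example of the staging rule, this Python sketch computes an ENTER size and offsets for a caller that stages m outgoing words and also keeps ordinary locals. The function name plan_frame is hypothetical, and placing locals directly above the m-word prefix is one possible layout choice, not something P1 mandates.

```python
from __future__ import annotations


def plan_frame(word: int, outgoing_words: int, local_bytes: int) -> dict[str, object]:
    """Pick an ENTER size and offsets for a caller that stages
    `outgoing_words` stack arguments and keeps `local_bytes` of locals.

    Outgoing words occupy the low-addressed prefix [sp + 0, sp + m*WORD),
    as the staging rule requires; locals start right above that prefix so
    staging never overwrites a live local.
    """
    if word not in (4, 8):
        raise ValueError("WORD is 4 in P1-32 and 8 in P1-64")
    if outgoing_words < 0 or local_bytes < 0:
        raise ValueError("counts must be non-negative")
    prefix = outgoing_words * word
    return {
        "enter_size": prefix + local_bytes,
        "outgoing_offsets": [i * word for i in range(outgoing_words)],
        "locals_offset": prefix,
    }


# A P1-64 caller passing 6 explicit words under a direct-result convention
# stages words 4 and 5 and also needs 24 bytes of its own locals.
print(plan_frame(8, 2, 24))
# {'enter_size': 40, 'outgoing_offsets': [0, 8], 'locals_offset': 16}
```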
Standard frame layout
Functions that need local stack storage establish a standard frame with
ENTER size. After frame establishment, the portable-visible frame-local
storage occupies the first size bytes above sp:
[sp + 0 ... sp + size - 1] = frame-local storage
Frame-local storage is byte-addressed. Portable code may use it for ordinary locals, spilled callee-saved registers, and the caller-staged outgoing stack-argument words described above.
Each frame also carries backend-private per-frame state — typically the
saved return continuation, saved caller sp, and any padding needed to
satisfy STACK_ALIGN. That state is not addressable by portable source,
and the backend chooses its layout and total allocation size.
Word sizes:
- WORD = 8 in P1-64
- WORD = 4 in P1-32
STACK_ALIGN is target-defined and must satisfy the native call ABI at
every call boundary.
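Because the backend picks the total allocation, it has to pad the portable size plus its private state so that sp meets STACK_ALIGN at the next call. The Python sketch below shows one way to compute that; the helper names, the private_bytes amount, and the entry_misalign parameter (how far raw entry sp sits below an aligned boundary, for example 8 on a target whose call pushed an 8-byte return address) are all assumptions for illustration.

```python
def align_up(n: int, align: int) -> int:
    """Round n up to a multiple of align, which must be a power of two."""
    if align <= 0 or align & (align - 1):
        raise ValueError("alignment must be a positive power of two")
    return (n + align - 1) & ~(align - 1)


def backend_frame_bytes(size: int, private_bytes: int, stack_align: int,
                        entry_misalign: int = 0) -> int:
    """Bytes a hypothetical backend subtracts from sp in ENTER.

    size           -- portable-visible frame-local bytes from ENTER size
    private_bytes  -- backend-private per-frame state (assumed amount)
    entry_misalign -- distance of entry sp below a STACK_ALIGN boundary
    Returns the smallest total >= size + private_bytes that leaves sp
    STACK_ALIGN-aligned after the subtraction.
    """
    need = size + private_bytes
    return align_up(entry_misalign + need, stack_align) - entry_misalign


print(backend_frame_bytes(24, 16, 16))                    # 48
print(backend_frame_bytes(24, 8, 16, entry_misalign=8))   # 40
```

This is also why portable source treats raw entry sp as opaque: the same ENTER size can lead to different native allocations on different targets.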
Leaf functions that need no frame-local storage may omit the frame entirely.
Frame invariants
- A function that allocates a frame must restore sp before returning.
- Callee-saved registers modified by the function must be restored before returning.
- The standard frame layout is the only frame shape recognized by P1.
Op Set Summary
| Category | Operations |
|---|---|
| Materialization | LI rd, imm, LA rd, %label, LA_BR %label |
| Moves | MOV rd, rs, MOV rd, sp |
| Arithmetic | ADD, SUB, AND, OR, XOR, SHL, SHR, SAR, MUL, DIV, REM |
| Immediate arithmetic | ADDI, ANDI, ORI, SHLI, SHRI, SARI |
| Memory | LD, ST, LB, SB |
| ABI access | LDARG |
| Branching | B, BR, BEQ, BNE, BLT, BLTU, BEQZ, BNEZ, BLTZ |
| Calls / returns | CALL, CALLR, RET, ERET, TAIL, TAILR |
| Frame management | ENTER |
| System | SYSCALL |
Immediates
Immediate operands appear only in instructions that explicitly admit them. Portable source has three immediate classes:
- Inline integer immediate: a signed 12-bit assembly-time constant in the
range -2048..2047
- Materialized word value: a full one-word assembly-time constant loaded
with LI
- Materialized address: the address of a label loaded with LA
P1 also uses two structured assembly-time operands:
- Frame-local byte count: a non-negative byte count used by ENTER
- Argument-slot index: a non-negative word-slot index used by LDARG
LI rd, imm loads the one-word integer value imm.
LA rd, %label loads the address of %label as a one-word pointer value.
The backend may realize LI and LA using native immediates, literal pools,
multi-instruction sequences, or other backend-private mechanisms.
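The split between inline immediates and LI matters whenever a compiler emits a constant operand. This Python sketch (all names hypothetical) decides whether rd = ra + value fits a single ADDI or needs LI followed by a register ADD. It emits the descriptive notation used in this document rather than fused opcode names, and it assumes t0 holds no live value so it can serve as the scratch register.

```python
from __future__ import annotations

IMM12_MIN, IMM12_MAX = -2048, 2047


def fits_imm12(value: int) -> bool:
    """True if value is usable as a P1 inline integer immediate."""
    return IMM12_MIN <= value <= IMM12_MAX


def fits_word(value: int, word_bits: int) -> bool:
    """True if value is a one-word constant read as signed or unsigned,
    i.e. something LI can materialize."""
    return -(1 << (word_bits - 1)) <= value < (1 << word_bits)


def lower_add_const(rd: str, ra: str, value: int, word_bits: int) -> list[str]:
    """Lower rd = ra + value in descriptive P1 notation.

    Borrowing t0 is this sketch's assumption: portable source has no hidden
    scratch register, so t0 must hold no live value at this point.
    """
    if fits_imm12(value):
        return [f"ADDI {rd}, {ra}, {value}"]
    if not fits_word(value, word_bits):
        raise ValueError(f"{value} does not fit in a {word_bits}-bit word")
    if ra == "t0":
        raise ValueError("t0 is the scratch register here; ra must differ")
    return [f"LI t0, {value}", f"ADD {rd}, {ra}, t0"]


print(lower_add_const("a0", "a1", 2047, 64))  # ['ADDI a0, a1, 2047']
print(lower_add_const("a0", "a1", 4096, 32))  # ['LI t0, 4096', 'ADD a0, a1, t0']
```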
Backends may assume labels fit in 32 bits when realizing LA and LA_BR.
This reflects the stage0 image layout (hex2-0 base 0x00600000, programs
well under 4 GB), not a portable-ISA-level guarantee. Backends that target
images loaded above the 4 GB boundary must adjust their LA / LA_BR
lowering. LI makes no such assumption — it materializes any one-word value.
Control Flow
Call / Return / Tail Call
Control-flow targets are materialized with LA_BR %label, which loads
%label into the hidden branch-target mechanism br. The immediately
following control-flow op consumes that target.
CALL transfers control to the target most recently loaded by LA_BR and
establishes a return continuation such that a subsequent RET returns to the
instruction after the CALL. CALL requires an active standard frame.
CALLR rs is the register-indirect form of CALL. It transfers control to
the code pointer value held in rs and establishes the same return
continuation semantics as CALL.
RET returns from a leaf function through the hidden return continuation
captured at call time. RET is valid only when the current function has no
active standard frame.
ERET returns from a function that has an active standard frame. It
performs the standard epilogue — restoring sp and the hidden return
continuation — and then returns to the caller. ERET is valid only when
the current function has an active standard frame.
TAIL is a tail call to the target most recently loaded by LA_BR. It is
valid only when the current function has an active standard frame. TAIL
performs the standard epilogue for the current frame and then transfers control
to the loaded target without creating a new return continuation. The callee
therefore returns directly to the current function's caller.
TAILR rs is the register-indirect form of TAIL. It is valid only when the
current function has an active standard frame.
Because stack-passed outgoing argument words are staged in the caller's own
frame-local storage, TAIL and TAILR are portable only when the tail-called
callee requires no stack-passed argument words. Portable compilers must lower
other tail-call cases to an ordinary CALL / ERET sequence.
Portable source must treat the return continuation as hidden machine state. It must not assume that the return address lives in any exposed register or in any stack location; after frame establishment, the saved return continuation is part of the backend-private per-frame state, which portable source cannot address.
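The validity rules for CALL, RET, ERET, TAIL, LDARG, and the LA_BR pairing can be checked mechanically. The following Python sketch checks one linear sequence of bare mnemonics (operands stripped, branch targets not followed). The name check_path and its messages are hypothetical, and it covers only rules stated in this specification, not backend-specific constraints.

```python
from __future__ import annotations

BR_CONSUMERS = {"B", "BEQ", "BNE", "BLT", "BLTU", "BEQZ", "BNEZ", "BLTZ", "CALL", "TAIL"}
NEEDS_FRAME = {"CALL", "CALLR", "TAIL", "TAILR", "LDARG", "ERET"}
EXITS = {"RET", "ERET", "TAIL", "TAILR"}


def check_path(ops: list[str]) -> list[str]:
    """Return rule violations for a straight-line op sequence; an empty
    list means no checked rule was broken."""
    errors = []
    frame = False
    for i, op in enumerate(ops):
        prev = ops[i - 1] if i > 0 else None
        if op == "ENTER":
            frame = True
        if op in NEEDS_FRAME and not frame:
            errors.append(f"{i}: {op} requires an active standard frame")
        if op == "RET" and frame:
            errors.append(f"{i}: RET used with an active frame (use ERET)")
        if op in BR_CONSUMERS and prev != "LA_BR":
            errors.append(f"{i}: {op} not immediately preceded by LA_BR")
        if prev == "LA_BR" and op not in BR_CONSUMERS:
            errors.append(f"{i - 1}: LA_BR not consumed by the next op")
        if op in EXITS and i != len(ops) - 1:
            errors.append(f"{i}: {op} leaves the function but ops follow it")
            break
    if ops and ops[-1] == "LA_BR":
        errors.append(f"{len(ops) - 1}: LA_BR not consumed by the next op")
    return errors


print(check_path(["ENTER", "LA_BR", "CALL", "ERET"]))  # []
print(check_path(["LA_BR", "CALL", "RET"]))  # ['1: CALL requires an active standard frame']
```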
Prologue / Epilogue
P1 has a single frame-establishment op, ENTER size. Frame teardown is
not a standalone op; it is embedded in ERET, TAIL, and TAILR.
ENTER size establishes a standard frame with size bytes of frame-local
storage. After it executes:
[sp + 0 ... sp + size - 1] = frame-local storage
Any backend-private per-frame state (saved return continuation, saved
caller sp, alignment padding) lives outside the portable-visible size
bytes. Portable source may not address it.
ERET, TAIL, and TAILR each perform the standard epilogue — restoring
sp and the hidden return continuation — and then transfer control: ERET
to the caller, TAIL to the target in br, and TAILR to the target in
rs. Portable source must use one of these ops (not RET) to exit a
function that has established a frame.
A function may omit ENTER entirely if it is a leaf and needs no frame.
Such a function exits with RET.
ENTER does not implicitly save or restore s0-s3. A function that
modifies any callee-saved register must preserve it explicitly, typically
by storing it in frame-local storage within its standard frame.
Branching
P1 branch targets are carried through the hidden branch-target mechanism
br. Portable source may load br only through:
- LA_BR %label: materialize the address of %label as the next branch, call, or tail-call target
No branch, call, or tail opcode takes a label operand directly. Portable source
must treat br as owned by the control-flow machinery. No live value may be
carried in br. Each LA_BR must be consumed by the immediately following
branch, call, or tail op, and portable source must not rely on br surviving
across any other instruction.
The portable branch families are:
- B: unconditional branch to the target in br
- BR rs: unconditional branch to the code pointer in rs
- BEQ, BNE, BLT, BLTU: conditional branch to the target in br
- BEQZ, BNEZ, BLTZ: conditional branch to the target in br, using zero as the second operand
BLT and BLTZ perform signed comparisons on one-word values. BLTU
performs an unsigned comparison on one-word values; there is no unsigned
zero-operand variant because x < 0 is always false under unsigned
interpretation.
If a branch condition is true, control transfers to the target currently held in
br. If the condition is false, execution falls through to the next
instruction.
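The signed and unsigned comparison rules can be pinned down with a small model that treats register contents as raw word bit patterns. In this Python sketch (branch_taken and to_signed are hypothetical helper names), the same all-ones P1-32 word is below zero for BLT but not for BLTU.

```python
def to_signed(x: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def branch_taken(op: str, a: int, b: int = 0, bits: int = 64) -> bool:
    """Decide whether a P1 conditional branch transfers control to br.

    a and b are one-word values; the zero-operand forms ignore b."""
    mask = (1 << bits) - 1
    ua, ub = a & mask, b & mask                        # unsigned views
    sa, sb = to_signed(a, bits), to_signed(b, bits)    # signed views
    table = {
        "BEQ": ua == ub,
        "BNE": ua != ub,
        "BLT": sa < sb,
        "BLTU": ua < ub,
        "BEQZ": ua == 0,
        "BNEZ": ua != 0,
        "BLTZ": sa < 0,
    }
    return table[op]


print(branch_taken("BLT", 0xFFFFFFFF, 0, bits=32))   # True  (-1 < 0)
print(branch_taken("BLTU", 0xFFFFFFFF, 0, bits=32))  # False (max > 0)
```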
Data Ops
Arithmetic
P1 defines the following arithmetic and bitwise operations on one-word values:
- register-register: ADD, SUB, AND, OR, XOR, SHL, SHR, SAR, MUL, DIV, REM
- immediate: ADDI, ANDI, ORI, SHLI, SHRI, SARI
For ADD, SUB, MUL, AND, OR, and XOR, computation is modulo the
active word size.
SHL shifts left and discards high bits. SHR is a logical right shift and
zero-fills. SAR is an arithmetic right shift and sign-fills.
For register-count shifts, only the low 5 bits of the shift count are
observed in P1-32, and only the low 6 bits are observed in P1-64.
Immediate-form shifts use inline immediates in the range 0..31 in P1-32
and 0..63 in P1-64.
DIV is signed division on one-word two's-complement values and truncates
toward zero. REM is the corresponding signed remainder, so
a = (a DIV b) * b + (a REM b), and a nonzero remainder has the sign of the dividend.
Division by zero is outside the portable contract. The overflow case
MIN_INT / -1 is also outside the portable contract, as is the corresponding
remainder case.
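The shift-count masking and truncating division rules are easy to get subtly wrong, so here is a reference model in Python. It is a sketch for checking backends or emulators, with hypothetical helper names; because Python's // floors toward negative infinity, div divides magnitudes and then restores the sign to get truncation toward zero.

```python
def wrap(x: int, bits: int) -> int:
    """Reduce x modulo 2**bits, giving the word's unsigned bit pattern."""
    return x & ((1 << bits) - 1)


def signed(x: int, bits: int) -> int:
    """Reinterpret a word bit pattern as two's complement."""
    x = wrap(x, bits)
    return x - (1 << bits) if x >> (bits - 1) else x


def shl(a: int, count: int, bits: int) -> int:
    # Only the low 5 (P1-32) or 6 (P1-64) bits of the count are observed.
    return wrap(a << (count & (bits - 1)), bits)


def shr(a: int, count: int, bits: int) -> int:
    return wrap(a, bits) >> (count & (bits - 1))  # logical: zero-fill


def sar(a: int, count: int, bits: int) -> int:
    return wrap(signed(a, bits) >> (count & (bits - 1)), bits)  # sign-fill


def div(a: int, b: int, bits: int) -> int:
    sa, sb = signed(a, bits), signed(b, bits)
    if sb == 0:
        raise ValueError("division by zero is outside the portable contract")
    if sa == -(1 << (bits - 1)) and sb == -1:
        raise ValueError("MIN_INT / -1 is outside the portable contract")
    q = abs(sa) // abs(sb)  # magnitude quotient, i.e. truncated toward zero
    return wrap(-q if (sa < 0) != (sb < 0) else q, bits)


def rem(a: int, b: int, bits: int) -> int:
    # a == (a DIV b) * b + (a REM b); the remainder follows the dividend's sign.
    sa, sb = signed(a, bits), signed(b, bits)
    return wrap(sa - signed(div(a, b, bits), bits) * sb, bits)


print(signed(div(-7, 2, 32), 32), signed(rem(-7, 2, 32), 32))  # -3 -1
```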
Moves
P1 defines the following move and materialization operations:
- MOV: register-to-register copy
- LI: load one-word integer constant
- LA: load label address
MOV may copy from any exposed general register to any exposed general
register.
Portable source may also read the current stack pointer through MOV rd, sp.
Portable source may not write sp through MOV. Stack-pointer updates are only
performed by ENTER, ERET, TAIL, TAILR, and backend-private call/return
machinery.
LI materializes an integer bit-pattern. LA materializes the address of a
label. LA_BR is a separate control-flow-target materialization form and is not
part of the general move family.
Memory
P1 defines the following memory-access operations:
- LD, ST: one-word load and store
- LB, SB: byte load and store
- LDARG: one-word load from the incoming stack-argument area
LD and ST access one full word: 4 bytes in P1-32 and 8 bytes in
P1-64.
LB loads one byte and zero-extends it to a full word. SB stores the low
8 bits of the source value.
Memory offsets use signed 12-bit inline immediates.
The base address for a memory access may be any exposed general register or
sp.
LDARG rd, idx loads incoming stack-argument slot idx, where slot 0 is the
first stack-passed explicit argument word. idx is word-indexed, not
byte-indexed. LDARG is an ABI access, not a general memory operation; it does
not expose or imply any raw sp-relative layout at function entry.
LDARG is valid only when the current function has an active standard frame.
Portable source must not assume that labels are aligned beyond what is
explicitly established by the program itself. Portable code should use
naturally aligned addresses for LD and ST. Unaligned word accesses are
outside the portable contract. Byte accesses have no additional alignment
requirement.
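The byte-access and addressing rules above can be modeled in a few lines of Python. This sketch (hypothetical names) uses a bytearray as memory, shows that LB zero-extends and SB keeps only the low 8 bits, and rejects out-of-range offsets and unaligned word accesses. It deliberately does not model LD/ST byte order, which this section does not specify.

```python
IMM12 = range(-2048, 2048)


def sb(mem: bytearray, addr: int, value: int) -> None:
    """SB: store only the low 8 bits of the source value."""
    mem[addr] = value & 0xFF


def lb(mem: bytearray, addr: int) -> int:
    """LB: load one byte and zero-extend it (never sign-extends)."""
    return mem[addr]  # bytearray items are already in 0..255


def effective_address(base: int, off: int, word: int, size: int) -> int:
    """Form base + off for an access of `size` bytes (1 for LB/SB,
    WORD for LD/ST), enforcing the portable offset and alignment rules."""
    if off not in IMM12:
        raise ValueError("memory offsets are signed 12-bit immediates")
    addr = base + off
    if size == word and addr % word:
        raise ValueError(f"unaligned word access at {addr:#x}")
    return addr


mem = bytearray(32)
sb(mem, 3, 0x1FF)
print(lb(mem, 3))                        # 255
print(effective_address(16, -8, 8, 8))   # 8
```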
Program Entry
P1 defines a portable program-entry model so that portable source
does not need to know how the native loader delivers argc / argv.
The target backend is responsible for emitting a per-arch _start
stub that:
- Captures argc and a pointer to the first argv word from the native entry stack or registers.
- Calls the portable label p1_main under the one-word direct-result convention with:
  - a0 = argc
  - a1 = argv, a pointer to the first argv word. Subsequent slots live at offsets 1*WORD, 2*WORD, ..., and the list is terminated by a NULL word.
- On return from p1_main, performs sys_exit using the returned value in a0 as the exit status.
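The argv contract (word-sized slots, NULL-terminated) can be illustrated with a sparse memory model. In this Python sketch, build_argv and walk_argv are hypothetical helpers and the string addresses are made up; walk_argv mirrors what p1_main could do with repeated LD loads off a1, and the number of slots it finds matches argc.

```python
from __future__ import annotations


def build_argv(base: int, word: int, strings: list[int]) -> dict[int, int]:
    """Lay out argv as the entry stub hands it to p1_main: one pointer per
    word starting at `base`, then a NULL word. `strings` holds hypothetical
    string addresses; memory is modeled as a sparse {address: word} map."""
    mem = {}
    for i, ptr in enumerate(strings + [0]):
        mem[base + i * word] = ptr
    return mem


def walk_argv(mem: dict[int, int], argv: int, word: int) -> list[int]:
    """Read argv slots at argv + i*WORD until the NULL terminator."""
    out = []
    i = 0
    while (ptr := mem[argv + i * word]) != 0:
        out.append(ptr)
        i += 1
    return out


mem = build_argv(0x1000, 4, [0x2000, 0x2008])  # P1-32 layout
print(walk_argv(mem, 0x1000, 4))  # [8192, 8200]
```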
Portable P1 source defines p1_main as an ordinary P1 function and
reads argc / argv through the standard calling convention.
At entry to p1_main, the native entry-stack layout has already been
consumed by the backend stub. Portable source may not assume anything
about the sp value inherited from _start except that it satisfies
the call-boundary alignment rule and that the standard frame protocol
(ENTER / ERET) works correctly from it.
p1_main may return normally, or it may call sys_exit directly at
any point.
The portable entry model does not expose envp or auxiliary vectors.
Targets that need them must provide target-specific extensions; those
are outside the portable P1 surface.
System
SYSCALL is part of the portable ISA surface.
At the portable level, the syscall convention is:
- a0 = syscall number on entry, return value on exit
- a1, a2, a3, t0, s0, s1 = syscall arguments 0 through 5
At the portable level, SYSCALL clobbers only a0. All other exposed
registers are preserved across the syscall.
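The portable syscall register contract can be summarized as a mapping from the call's number and arguments to register values. This Python sketch (hypothetical names; the syscall number is a placeholder because real numbers are target-defined) builds the pre-SYSCALL register state and shows that, at the portable level, only a0 differs afterward.

```python
from __future__ import annotations

SYSCALL_ARG_REGS = ["a1", "a2", "a3", "t0", "s0", "s1"]


def syscall_regs(number: int, args: list[int]) -> dict[str, int]:
    """Register values P1 source must establish before SYSCALL."""
    if len(args) > len(SYSCALL_ARG_REGS):
        raise ValueError("portable SYSCALL passes at most 6 arguments")
    regs = {"a0": number}
    regs.update(zip(SYSCALL_ARG_REGS, args))
    return regs


def after_syscall(regs: dict[str, int], result: int) -> dict[str, int]:
    """Portable view of the registers after SYSCALL: only a0 changes."""
    out = dict(regs)
    out["a0"] = result
    return out


SYS_PLACEHOLDER = 1  # hypothetical; real syscall numbers are target-defined
before = syscall_regs(SYS_PLACEHOLDER, [1, 0x4000, 12])
print(before)                           # {'a0': 1, 'a1': 1, 'a2': 16384, 'a3': 12}
print(after_syscall(before, 12)["a1"])  # 1, argument registers survive
```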
The mapping from symbolic syscall names to numeric syscall identifiers is target-defined. The set of syscalls available to a given program is likewise specified outside the core P1 ISA, for example by a target profile or runtime interface document.
Target notes
- a0 is argument 0, the one-word direct return-value register, result word 0 of the two-word direct return pair, and the indirect-result buffer pointer.
- On some targets, the native calling convention returns results in registers other than the native registers that hold a0 or a1; such targets must translate between portable and native return registers at call and return boundaries.
- On targets whose native call instruction pushes a return address, LDARG must account for that slot. On other targets, LDARG maps more directly to entry sp plus the backend's standard frame/header policy.
- br is implemented as a dedicated hidden native register on every target.
- Each backend chooses the native register that holds each P1 register. Those choices are backend-private and may differ from native ABI conventions; as a matter of backend policy, a backend may preserve P1 caller-saved registers that happen to land in natively callee-saved registers.
- Frame-pointer use is backend policy, not part of the P1 architectural register set.