libp1pp
Scope
libp1pp is a small portable utility library for P1pp programs, written in P1pp itself. It provides:
- M1PP control-flow macros that wrap P1 branches into structured forms
- Common byte and string primitives
- Integer parsing and formatting
- Character predicates
- Thin syscall wrappers and higher-level IO helpers
- A single-arena bump allocator
- Panic and assertion helpers
libp1pp is a single source file, p1pp.P1pp, composed via catm after the
target backend header and before user source:
catm P1-aarch64.M1pp p1pp.P1pp usersrc.P1pp > program.M1
Because p1pp.P1pp passes through M1PP, it freely mixes P1 code and M1PP macro
definitions.
Target and conventions
Width
libp1pp v1 targets P1-64 only. Word size is 8 bytes. Pointer values, integer results, and syscall arguments are all one word.
Porting libp1pp to P1-32 is out of scope for this document.
Syscall numbers
libp1pp does not hard-code syscall numbers. It relies on the backend header
to provide the p1_sys_<name> data-word macros already defined by existing
backends (e.g. %p1_sys_read(), %p1_sys_write()). User code should not
issue syscalls through raw numbers; it calls the libp1pp wrappers instead.
Error style
IO functions follow kernel conventions: a negative return indicates an error
(typically -errno), a non-negative return is a success value.
Functions that return a pointer use 0 (NULL) to indicate failure.
Parsers return two words under the two-word direct-result convention:
(value, consumed). consumed == 0 means the input did not begin with a
syntactically valid token; the value word is then unspecified.
Functions whose return type is "nothing meaningful" return 0 in a0.
String representation
libp1pp uses two string conventions, distinguished by the function name:
- (buf, len) pair: an explicit pointer/length pair. The default.
- cstr: a NUL-terminated byte pointer. Only functions whose name includes cstr expect or emit NUL termination.
libp1pp never mutates a caller-provided buffer it was not passed as an output parameter.
Allocation
libp1pp functions do not allocate. Anything that produces bytes writes into a caller-provided buffer whose capacity is the caller's responsibility. The single exception is the bump allocator, which allocates only from a region the caller explicitly installed.
Internal label namespace
libp1pp reserves the label prefix libp1pp__ for all internal state and
helper labels — bump allocator cursor/base/cap words, internal scratch
buffers used by print_int / print_hex, private helper routines.
User code must not define labels beginning with libp1pp__, and must
not reference them directly; everything libp1pp exposes is reachable
through its documented functions and macros.
Public entry points (the functions and macros listed in this document,
such as memcpy, bump_alloc, %if_eq) are unprefixed. A user who
sees an undefined-label error for a libp1pp__ name at link time has
almost certainly forgotten to catm p1pp.P1pp into the build.
Initialization
libp1pp requires no global init step at program entry. Subsystems are either self-initializing or require an explicit per-subsystem init call, documented with that subsystem.
In v1 the only subsystem that requires explicit init is the bump
allocator: bump_alloc called before bump_init returns 0 (the
"arena exhausted" sentinel) because no arena is installed yet. Every
other libp1pp function is callable from the first instruction of
p1_main.
p1_main itself inherits the portable entry contract from P1:
a0 = argc, a1 = argv. libp1pp does not wrap or interpose on
p1_main.
Control-flow macros
All control-flow macros take braced blocks as arguments. The braces are
M1PP argument delimiters; they are stripped on substitution. Inside a
block, :@name and &@name local labels resolve within the macro
expansion's own namespace, so nested control flow is safe.
Condition suffixes
Each conditional family is expanded once per condition. Suffixes:
- Two-operand: eq, ne, lt, ltu
- Zero-operand (implicit compare against zero): eqz, nez, ltz
lt and ltz are signed comparisons. ltu is unsigned. These mirror the
P1 branch opcodes BEQ, BNE, BLT, BLTU, BEQZ, BNEZ, BLTZ.
%if_<cc> / %ifelse_<cc>
%if_eq(ra, rb, { body })
%if_ne(ra, rb, { body })
%if_lt(ra, rb, { body })
%if_ltu(ra, rb, { body })
%if_eqz(ra, { body })
%if_nez(ra, { body })
%if_ltz(ra, { body })
%ifelse_eq(ra, rb, { tblk }, { fblk })
%ifelse_ne(ra, rb, { tblk }, { fblk })
%ifelse_lt(ra, rb, { tblk }, { fblk })
%ifelse_ltu(ra, rb, { tblk }, { fblk })
%ifelse_eqz(ra, { tblk }, { fblk })
%ifelse_nez(ra, { tblk }, { fblk })
%ifelse_ltz(ra, { tblk }, { fblk })
%if_<cc> executes the block when the condition is true and falls through
otherwise. %ifelse_<cc> executes tblk on true and fblk on false, then
falls through to the code after the macro.
Neither form establishes a new frame or changes sp. A block that issues
a CALL must sit inside a function that has already established a frame
with ENTER.
%while_<cc> / %do_while_<cc>
%while_eq(ra, rb, { body })
%while_ne(ra, rb, { body })
%while_lt(ra, rb, { body })
%while_ltu(ra, rb, { body })
%while_eqz(ra, { body })
%while_nez(ra, { body })
%while_ltz(ra, { body })
%do_while_eq(ra, rb, { body })
%do_while_ne(ra, rb, { body })
%do_while_lt(ra, rb, { body })
%do_while_ltu(ra, rb, { body })
%do_while_eqz(ra, { body })
%do_while_nez(ra, { body })
%do_while_ltz(ra, { body })
%while_<cc> tests the condition before the body; %do_while_<cc> after.
In both, the condition is in the positive sense ("continue while ra == rb").
The operand registers are re-read on every iteration, so body may update
them.
All %while_<cc> macros share a single lowering pattern so they work
uniformly across conditions, including lt, ltu, and ltz which have no
inverted P1 branches.
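The difference between the top-tested and bottom-tested forms can be modeled in C with goto. This is only a sketch of one possible lowering that uses positive-sense branches alone; the function names, label names, and the exact branch order are assumptions for illustration and are not taken from the libp1pp source.

```c
#include <stdint.h>

/* Model of %while_lt(i, n, { body }): the test runs before the body.
 * Only the positive-sense branch "if (i < n) goto body" is used,
 * mirroring BLT, so no inverted branch is required. */
int64_t count_while_lt(int64_t i, int64_t n)
{
    int64_t runs = 0;
top:
    if (i < n) goto body;   /* BLT i, n, &body */
    goto end;               /* B &end          */
body:
    runs++;                 /* the body; it may update i and n */
    i++;
    goto top;               /* B &top          */
end:
    return runs;
}

/* Model of %do_while_lt(i, n, { body }): the body runs once before
 * the first test, so it always executes at least one time. */
int64_t count_do_while_lt(int64_t i, int64_t n)
{
    int64_t runs = 0;
top:
    runs++;                 /* the body */
    i++;
    if (i < n) goto top;    /* BLT i, n, &top */
    return runs;
}
```

With i = 5 and n = 3 the top-tested model runs its body zero times, while the bottom-tested model runs it once.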
%for_lt
%for_lt(i_reg, n_reg, { body })
Counts i_reg from 0 up to but not including n_reg, with step +1,
under signed comparison. On entry, i_reg is set to 0; after each body
iteration, i_reg is incremented by 1; the loop exits once
i_reg < n_reg is false.
n_reg is re-read each iteration, so body may update the bound. Body may
read i_reg but must not otherwise modify it. If body issues a CALL,
the caller is responsible for keeping i_reg live across the call — in
practice, this means i_reg should be a callee-saved register (s0–s3)
or explicitly spilled.
libp1pp does not provide an unsigned variant, an immediate-bound variant, a
step-by-k variant, or a count-down variant. Pointer iteration and
other shapes are better expressed as %while_<cc> plus explicit
increments.
%loop
%loop({ body })
An unconditional loop with no built-in exit. The body runs forever unless
it transfers control out by another mechanism. The untagged %loop has no
%break or %continue; a loop that needs mid-body exit should use
explicit labels:
:scan_loop
...
BEQZ a0, &scan_end
...
B &scan_loop
:scan_end
Tagged loops: %loop_tag, %while_tag_<cc>, %for_lt_tag
Tagged loops predate M1PP's %scope feature. They still work, but new code should prefer the scoped equivalents (%loop_scoped, %while_scoped_<cc>, %for_lt_scoped) paired with the generic %break / %continue, which need no tag argument.
M1PP's @ local-label mechanism is scoped to the defining macro's body: an
&@name token passed to a macro through an argument is not stamped and
does not share a namespace with the receiving macro. Consequently, a
generic %break / %continue that uses @ cannot be written.
libp1pp provides a tagged variant family for loops that need mid-body exit
or explicit continue. The tag becomes a label-name prefix via ## paste,
so references cross every macro boundary cleanly.
%loop_tag(tag, { body })
%while_tag_eq(tag, ra, rb, { body })
%while_tag_ne(tag, ra, rb, { body })
%while_tag_lt(tag, ra, rb, { body })
%while_tag_ltu(tag, ra, rb, { body })
%while_tag_eqz(tag, ra, { body })
%while_tag_nez(tag, ra, { body })
%while_tag_ltz(tag, ra, { body })
%for_lt_tag(tag, i_reg, n_reg, { body })
Each tagged loop emits two ordinary labels: :tag_top at the point where
a %continue(tag) should land, and :tag_end immediately after the
loop. For top-tested %while_tag_<cc>, tag_top names the condition
test; for %for_lt_tag, tag_top names the increment-and-test block;
for %loop_tag, tag_top names the head of the body.
%break(tag)
Emits B &tag_end. Transfers control out of the enclosing tagged loop.
%continue(tag)
Emits B &tag_top. Transfers control to the enclosing tagged loop's
re-test / increment point.
Both %break and %continue work from arbitrary depth inside a tagged
loop, including inside %if_<cc>, %ifelse_<cc>, or another nested
loop's body. They resolve purely by label name, not by macro-expansion
namespace.
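The landing points of %break(tag) and %continue(tag) in a %for_lt_tag loop can be modeled in C with goto. In this sketch the tag is scan, so scan_top and scan_end stand in for the two emitted labels; the function name, the extra scan_test entry label, and the byte tests in the body are hypothetical and only illustrate the control flow described above.

```c
#include <stdint.h>

/* Model of:
 *   %for_lt_tag(scan, i, n, {
 *       if buf[i] is a space -> %continue(scan)
 *       if buf[i] is '#'     -> %break(scan)
 *       count++
 *   })
 */
int64_t count_until_hash(const char *buf, int64_t n)
{
    int64_t count = 0;
    int64_t i = 0;                       /* i_reg is set to 0 on entry */
    goto scan_test;                      /* first pass skips the increment */
scan_top:                                /* %continue(scan) lands here */
    i++;
scan_test:
    if (i < n) goto scan_body;
    goto scan_end;
scan_body:
    if (buf[i] == ' ') goto scan_top;    /* %continue(scan) */
    if (buf[i] == '#') goto scan_end;    /* %break(scan)    */
    count++;
    goto scan_top;
scan_end:                                /* %break(scan) lands here */
    return count;
}
```

Because scan_top names the increment-and-test block, a continue still advances i; it skips only the remainder of the body.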
Tags are not scoped: tag_top and tag_end are ordinary hex2 labels
visible across the whole program. Tags must therefore be unique within a
function, and conventionally across functions as well. The recommended
style is <function>_<role> (e.g., parse_outer, scan_inner). hex2
reports a duplicate-label error if two loops share a tag; libp1pp does not
detect this at macro-expansion time.
Untagged forms (%while_<cc>, %for_lt, %loop) are preferred when the
body does not need %break or %continue. They nest without the user
picking names, and their local labels cannot collide.
Frame locals
libp1pp does not introduce a new local-variable macro. Use M1PP's %struct
directly: its 8-byte stride matches WORD on P1-64, and it already
synthesizes %name.SIZE for ENTER.
%struct parse_f { state cursor endp tmp }
:parse_one
ENTER %parse_f.SIZE
ST a0, [sp + %parse_f.state]
ST a1, [sp + %parse_f.cursor]
...
ERET
If the function stages stack-passed outgoing arguments for calls with more than four word arguments, reserve the low-addressed fields for that staging:
%struct parse_f { _o0 _o1 state cursor endp tmp }
The caller places outgoing argument word k at [sp + k * 8] immediately
before the CALL, then reads locals from higher offsets. libp1pp does not
otherwise enforce this convention.
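The resulting frame layout can be checked with a C stand-in for the %struct declaration. This is a model only: the C struct and the PARSE_F_* constant names are hypothetical, and the offsets follow from the 8-byte stride stated above.

```c
#include <stddef.h>
#include <stdint.h>

/* C stand-in for: %struct parse_f { _o0 _o1 state cursor endp tmp }
 * Every field is one 8-byte word, so field k sits at offset k * 8. */
struct parse_f {
    uint64_t _o0;     /* outgoing stack argument word 0 */
    uint64_t _o1;     /* outgoing stack argument word 1 */
    uint64_t state;
    uint64_t cursor;
    uint64_t endp;
    uint64_t tmp;
};

enum {
    PARSE_F_O0     = offsetof(struct parse_f, _o0),     /* 0  */
    PARSE_F_O1     = offsetof(struct parse_f, _o1),     /* 8  */
    PARSE_F_STATE  = offsetof(struct parse_f, state),   /* 16 */
    PARSE_F_CURSOR = offsetof(struct parse_f, cursor),  /* 24 */
    PARSE_F_SIZE   = sizeof(struct parse_f)             /* 48 */
};
```

The staging fields occupy offsets 0 and 8, so the first ordinary local (state) starts at offset 16.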
Function definition
%fn(name, size, { body })
Defines a non-leaf function named name with size bytes of
frame-local storage. Expands to:
- a global label :name at the function entry,
- a %scope name push, so labels inside body are short (::start, ::done) and mangle to name__start, name__done,
- an %enter(size) prologue,
- the body,
- an %eret() epilogue,
- a matching %endscope.
%fn is a scope-introducing-with-block macro: it pushes the scope
name around body. Any %break / %continue directly in body
would target name__end / name__top — which %fn itself does not
define, so those should only appear inside a nested scope-introducing
loop.
Example:
%struct parse_f { state cursor }
%fn(parse_number, %parse_f.SIZE, {
ST a0, [sp + %parse_f.state]
ST a1, [sp + %parse_f.cursor]
...
BEQZ t0, &::done
...
::done
LD a0, [sp + %parse_f.state]
})
size may be a literal byte count, a %struct SIZE reference, or any
M1PP-time integer expression that the backend %enter macro accepts.
Leaf functions that need no frame do not use %fn: they write the
entry label, body, and %ret() directly, and may optionally wrap the
body in %scope name / %endscope if they want scoped labels.
Memory and strings
Byte-buffer primitives
memcpy(dst, src, n) -> dst
memset(dst, byte, n) -> dst
memcmp(a, b, n) -> sign # -1 / 0 / 1
memcpy does not support overlapping ranges where dst > src && dst < src + n.
memset stores only the low 8 bits of byte.
memcmp performs an unsigned byte-wise three-way compare and returns
-1, 0, or 1. It stops at the first differing byte.
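The documented memcmp result can be expressed as a short C reference model. The name model_memcmp is hypothetical; the sketch describes the contract above and is not the libp1pp implementation.

```c
#include <stddef.h>

/* Reference model of memcmp(a, b, n) -> -1 / 0 / 1 as documented:
 * bytes are compared as unsigned values and the scan stops at the
 * first differing byte. */
int model_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    for (size_t i = 0; i < n; i++) {
        if (pa[i] != pb[i])
            return pa[i] < pb[i] ? -1 : 1;
    }
    return 0;   /* all n bytes equal, including the n == 0 case */
}
```

Because the compare is unsigned, byte 0x80 orders above byte 0x7f.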
NUL-terminated strings
strlen(cstr) -> n
streq(a_cstr, b_cstr) -> 0 or 1
strcmp(a_cstr, b_cstr) -> sign # -1 / 0 / 1
strlen returns the byte count up to but not including the terminating NUL.
streq returns 1 iff the two strings are byte-equal including length.
strcmp compares byte-wise until either a differing byte is found or one
side's NUL is reached, and returns the sign of the first difference (the
shorter string compares less when it is a prefix of the other).
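A C reference model makes the prefix rule concrete. The names model_strcmp and model_streq are hypothetical, and the sketch assumes bytes are compared as unsigned values (as memcmp does); the text above does not state the signedness explicitly.

```c
/* Reference model of strcmp(a_cstr, b_cstr) -> -1 / 0 / 1.
 * Assumption: bytes compare as unsigned values. */
int model_strcmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    /* Advance while both bytes match and neither string has ended. */
    while (*pa != 0 && *pa == *pb) {
        pa++;
        pb++;
    }
    if (*pa == *pb)
        return 0;               /* both reached NUL together */
    return *pa < *pb ? -1 : 1;  /* a NUL (0) sorts below any other byte */
}

/* Reference model of streq(a_cstr, b_cstr) -> 0 or 1. */
int model_streq(const char *a, const char *b)
{
    return model_strcmp(a, b) == 0 ? 1 : 0;
}
```

Comparing "ab" with "abc" stops at the NUL of the shorter string, which is why the prefix compares less.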
Integer parsing and formatting
Parsers
parse_dec(buf, len) -> (value, consumed)
parse_hex(buf, len) -> (value, consumed)
Both use the two-word direct-result convention: a0 holds the parsed
integer value and a1 holds the number of bytes consumed. consumed == 0
means the input did not start with a valid literal.
parse_dec accepts an optional leading - followed by one or more decimal
digits. On overflow, the result wraps modulo 2^64;
detection of overflow is not part of the portable contract.
parse_hex accepts one or more hex digits (0-9, a-f, A-F). It does
not consume a 0x prefix; callers handle any prefix themselves. The
result is the unsigned value of the parsed digits, truncated to 64 bits.
Parsers do not skip leading whitespace.
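The parser contract can be sketched as a C reference model. The struct parse_result and the model_* names are hypothetical stand-ins for the (a0, a1) pair; the value returned when consumed == 0 is unspecified by the contract, and this model happens to return 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Two-word result: value corresponds to a0, consumed to a1. */
struct parse_result {
    uint64_t value;
    uint64_t consumed;
};

struct parse_result model_parse_dec(const char *buf, size_t len)
{
    struct parse_result r = { 0, 0 };
    size_t i = 0;
    int negative = 0;

    if (i < len && buf[i] == '-') {      /* optional leading minus */
        negative = 1;
        i++;
    }
    size_t first_digit = i;
    uint64_t acc = 0;
    while (i < len && buf[i] >= '0' && buf[i] <= '9') {
        acc = acc * 10 + (uint64_t)(buf[i] - '0');  /* wraps modulo 2^64 */
        i++;
    }
    if (i == first_digit)
        return r;                        /* no digits: consumed == 0 */
    r.value = negative ? (uint64_t)0 - acc : acc;
    r.consumed = i;
    return r;
}

struct parse_result model_parse_hex(const char *buf, size_t len)
{
    struct parse_result r = { 0, 0 };
    size_t i = 0;
    uint64_t acc = 0;
    while (i < len) {
        char c = buf[i];
        unsigned digit;
        if (c >= '0' && c <= '9')      digit = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f') digit = (unsigned)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F') digit = (unsigned)(c - 'A') + 10;
        else break;                    /* 'x' of a "0x" prefix stops here */
        acc = (acc << 4) | digit;      /* high bits drop off past 16 digits */
        i++;
    }
    r.value = acc;
    r.consumed = i;
    return r;
}
```

In this model a lone "-" and an input with leading whitespace both yield consumed == 0, and parsing "0x10" as hex consumes only the leading 0.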
Formatters
fmt_dec(buf, value) -> n_bytes
fmt_hex(buf, value) -> n_bytes
Both write a human-readable representation into buf, starting at offset
0, and return the number of bytes written. Neither writes a terminating
NUL.
fmt_dec emits a signed decimal representation: a leading - for
negative values, then one or more decimal digits. At most 20 bytes are
written.
fmt_hex emits an unsigned lowercase hex representation with no prefix
and no leading zeros (except that 0 is rendered as 0). At most 16
bytes are written.
Callers provide a buffer at least as large as the documented maximum.
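The documented maxima follow from the value range: a 64-bit word has at most 16 hex digits, and the most negative signed value needs a sign plus 19 decimal digits. A C reference model, with hypothetical model_* names, shows both formatters; it is a sketch of the contract and not the libp1pp code.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes lowercase hex with no prefix and no leading zeros.
 * Returns the byte count (1..16). No NUL is written. */
size_t model_fmt_hex(char *buf, uint64_t value)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[16];
    size_t n = 0;
    do {                                /* emits "0" for value == 0 */
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];        /* reverse into the output */
    return n;
}

/* Writes signed decimal. Returns the byte count (1..20). No NUL. */
size_t model_fmt_dec(char *buf, uint64_t value)
{
    char tmp[19];                       /* 2^63 has 19 decimal digits */
    size_t n = 0, out = 0;
    uint64_t mag = value;
    if ((value >> 63) != 0) {           /* sign bit set: negative */
        buf[out++] = '-';
        mag = (uint64_t)0 - value;      /* magnitude; correct for -2^63 */
    }
    do {
        tmp[n++] = (char)('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);
    while (n > 0)
        buf[out++] = tmp[--n];
    return out;
}
```

A 20-byte buffer is therefore sufficient for either formatter.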
Character predicates
All predicates take a single one-byte value (passed as a word; the high
bits are ignored) and return 1 or 0.
is_digit(c) -> 0 or 1 # '0'..'9'
is_hex_digit(c) -> 0 or 1 # 0-9, a-f, A-F
is_space(c) -> 0 or 1 # ' ', '\t', '\n', '\r', '\v', '\f'
is_alpha(c) -> 0 or 1 # a-z, A-Z
is_alnum(c) -> 0 or 1 # is_alpha OR is_digit
Predicates are functions in v1 and may become macros later.
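The predicate contract, including the rule that high bits of the argument word are ignored, can be sketched in C. The model_* names are hypothetical and the sketch assumes an ASCII character set.

```c
#include <stdint.h>

/* Each model masks the argument to its low 8 bits first, matching the
 * statement that the high bits of the word are ignored. */
int model_is_digit(uint64_t c)
{
    c &= 0xff;
    return c >= '0' && c <= '9';
}

int model_is_hex_digit(uint64_t c)
{
    c &= 0xff;
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

int model_is_space(uint64_t c)
{
    c &= 0xff;
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\r' || c == '\v' || c == '\f';
}

int model_is_alpha(uint64_t c)
{
    c &= 0xff;
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

int model_is_alnum(uint64_t c)
{
    return model_is_alpha(c) || model_is_digit(c);
}
```

Under this model, the word 0x137 is classified as the digit '7' because only its low byte 0x37 is examined.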
IO
Raw syscall wrappers
sys_read(fd, buf, len) -> n # bytes read; 0 at EOF; <0 error
sys_write(fd, buf, len) -> n # bytes written; <0 error
sys_open(path_cstr, flags, mode)
-> fd # fd >= 0 on success; <0 error
sys_close(fd) -> r # 0 on success; <0 error
sys_exit(code) -> ! # does not return
These are thin wrappers over the P1 SYSCALL op. They set the syscall
number themselves using the backend's %p1_sys_<name> data-word macros,
marshal arguments into the syscall-argument registers, and return the raw
kernel return value unchanged.
sys_open is a logical open: the backend may implement it via open or
openat(AT_FDCWD, ...) as appropriate for the target.
sys_exit terminates the process with the low 8 bits of code as the
exit status. It never returns.
No wrapper interprets the negative return as a specific errno. Callers
that need such detail inspect a0 directly.
Print helpers
print(buf, len) -> r # 0 on success; <0 error
println(buf, len) -> r # writes buf then "\n"
print_cstr(cstr) -> r # writes strlen(cstr) bytes
print_int(value) -> r # decimal
print_hex(value) -> r # hex, no prefix
eprint(buf, len) -> r
eprintln(buf, len) -> r
eprint_cstr(cstr) -> r
print* helpers write to fd 1; eprint* to fd 2. All return 0 on a
successful write of all bytes, or a negative value if the underlying
sys_write reported an error. A partial write is retried until complete
or the kernel returns an error.
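The retry rule can be sketched as a C loop over a writer callback. Everything here is hypothetical scaffolding: write_fn stands in for sys_write with a fixed fd, and short_writer and failing_writer exist only to exercise the loop. The source does not say how a writer that returns 0 for a non-empty request is handled; this model would simply call it again.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for sys_write on a fixed fd: returns the number of bytes
 * written (>= 0) or a negative error value. */
typedef int64_t (*write_fn)(const char *buf, size_t len);

/* Keep writing the unwritten tail until every byte is out or the
 * writer reports an error. Returns 0 on success. */
int64_t model_write_all(write_fn w, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        int64_t n = w(buf + done, len - done);
        if (n < 0)
            return n;           /* propagate the negative value */
        done += (size_t)n;
    }
    return 0;
}

/* Demo writer: accepts at most 2 bytes per call and appends them to a
 * capture buffer, which forces partial writes. */
static char captured[64];
static size_t captured_len;

int64_t short_writer(const char *buf, size_t len)
{
    size_t n = len < 2 ? len : 2;
    for (size_t i = 0; i < n && captured_len < sizeof captured; i++)
        captured[captured_len++] = buf[i];
    return (int64_t)n;
}

/* Demo writer: always fails with a negative value. */
int64_t failing_writer(const char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return -5;
}
```

Calling model_write_all(short_writer, "hello", 5) takes three writer calls (2 + 2 + 1 bytes) and returns 0.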
print_int and print_hex render into a small internal stack buffer,
then write. They allocate no heap memory.
File helpers
read_file(path_cstr, buf, cap)
-> n # bytes read, or -1
Opens path_cstr read-only, reads up to cap bytes into buf, and
closes the fd. Returns the number of bytes read on success, or -1 if
the file could not be opened, a read failed, or the file exceeds cap
(in which case buf may have been partially written).
write_file(path_cstr, buf, len)
-> r # 0 on success; -1 on error
Creates or truncates path_cstr, writes len bytes from buf, and
closes the fd. Returns 0 on success or -1 if any step failed. The
created file's mode is implementation-defined but intended to be a
reasonable default (typically 0644).
Bump allocator
libp1pp provides a single global bump allocator. Memory is carved from a caller-supplied region; libp1pp does not own or reserve storage itself.
bump_init(base, cap) -> 0
Installs [base, base + cap) as the live arena and sets the cursor to
base. Discards any prior state. base should be word-aligned; cap
should be a multiple of 8. libp1pp does not validate these.
bump_alloc(n) -> ptr # 0 on exhaustion
Advances the cursor by n bytes rounded up to the next multiple of 8 and
returns the pre-advance cursor. Returns 0 and leaves the cursor
unchanged if the rounded-up request would exceed the arena.
Returned memory is not zeroed. Callers that need zeroed memory must call memset themselves.
bump_mark() -> saved
Returns the current cursor value as an opaque word.
bump_release(saved) -> 0
Rewinds the cursor to a value previously returned by bump_mark. Any
pointers handed out since that mark become invalid. Passing a value that
was not produced by bump_mark against the currently installed arena is
undefined behavior.
bump_reset() -> 0
Rewinds the cursor to the arena's base.
v1 provides one arena. Multi-arena variants are deferred.
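The allocator contract is small enough to model in a few lines of C. In this sketch addresses are plain 64-bit integers rather than real pointers, and the model_* function names and arena_* state variables are hypothetical; libp1pp keeps its own state under libp1pp__ labels.

```c
#include <stdint.h>

/* Model state: all zero until model_bump_init installs an arena. */
static uint64_t arena_base, arena_cap, arena_cursor;

uint64_t model_bump_init(uint64_t base, uint64_t cap)
{
    arena_base = base;
    arena_cap = cap;
    arena_cursor = base;        /* discards any prior state */
    return 0;
}

uint64_t model_bump_alloc(uint64_t n)
{
    uint64_t rounded = (n + 7) & ~(uint64_t)7;   /* next multiple of 8 */
    uint64_t used = arena_cursor - arena_base;
    /* rounded < n catches wraparound for n close to 2^64. */
    if (rounded < n || rounded > arena_cap - used)
        return 0;               /* exhausted; cursor unchanged */
    uint64_t ptr = arena_cursor;
    arena_cursor += rounded;
    return ptr;                 /* the pre-advance cursor */
}

uint64_t model_bump_mark(void)
{
    return arena_cursor;
}

uint64_t model_bump_release(uint64_t saved)
{
    arena_cursor = saved;       /* caller must pass a value from mark */
    return 0;
}

uint64_t model_bump_reset(void)
{
    arena_cursor = arena_base;
    return 0;
}
```

Before model_bump_init is called the capacity is 0, so any non-zero request returns 0, which matches the behavior described for bump_alloc before bump_init.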
Panic and assertions
panic
panic(msg_cstr) -> !
Writes msg_cstr followed by "\n" to fd 2, then calls sys_exit(1).
Does not return.
panic is used from libp1pp internally only for unrecoverable programmer
errors (none in v1). User code is encouraged to use it for its own
invariant violations.
%assert_<cc> macros
%assert_eq(ra, rb, msg_label)
%assert_ne(ra, rb, msg_label)
%assert_lt(ra, rb, msg_label)
%assert_ltu(ra, rb, msg_label)
%assert_eqz(ra, msg_label)
%assert_nez(ra, msg_label)
%assert_ltz(ra, msg_label)
Each macro asserts that the named condition holds and calls panic with
msg_label (a NUL-terminated string label in the program image) if it
does not. They lower to a B<cc> past an LA a0, msg_label /
CALL &panic sequence, so the non-failure path adds no runtime cost
beyond the original branch.
Because the failure path issues a CALL, %assert_* may be used only in
functions that have established a frame with ENTER.
Not in v1
The following were considered and deferred:
- Field-access helpers such as %ld_field: LD rd, [base + %S.f] is short enough.
- printf-style formatted output: replaced by dedicated print_* and fmt_* primitives.
- Multiple bump arenas: one global arena covers bootstrap needs.
- strcpy / strcat: length-aware callers should use memcpy with an explicit byte count.
- P1-32 support.
- envp / auxv / command-line-aware helpers beyond what p1_main already receives.