commit 87e3956d3b23b7ef1e537e75fc304782bd3df6bb
parent ff0fd1204fcb984c90ec5ba308bef63a3d04d76a
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Fri, 24 Apr 2026 11:11:17 -0700
LIBP1PP.md
Diffstat:
| A | docs/LIBP1PP.md | | | 571 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
1 file changed, 571 insertions(+), 0 deletions(-)
diff --git a/docs/LIBP1PP.md b/docs/LIBP1PP.md
@@ -0,0 +1,571 @@
+# libp1pp
+
+## Scope
+
+libp1pp is a small portable utility library for P1pp programs, written in
+P1pp itself. It provides:
+
+- M1PP control-flow macros that wrap P1 branches into structured forms
+- Common byte and string primitives
+- Integer parsing and formatting
+- Character predicates
+- Thin syscall wrappers and higher-level IO helpers
+- A single-arena bump allocator
+- Panic and assertion helpers
+
+libp1pp is a single source file, `p1pp.P1pp`, composed via `catm` after the
+target backend header and before user source:
+
+ catm P1-aarch64.M1pp p1pp.P1pp usersrc.P1pp > program.M1
+
+Because `p1pp.P1pp` passes through M1PP, it freely mixes P1 code and M1PP macro
+definitions.
+
+## Target and conventions
+
+### Width
+
+libp1pp v1 targets **P1v2-64 only**. Word size is 8 bytes. Pointer values,
+integer results, and syscall arguments are all one word.
+
+Porting libp1pp to P1v2-32 is out of scope for this document.
+
+### Syscall numbers
+
+libp1pp does not hard-code syscall numbers. It relies on the backend header
+to provide the `%p1_sys_<name>` data-word macros already defined by existing
+backends (e.g. `%p1_sys_read()`, `%p1_sys_write()`). User code should not
+issue syscalls through raw numbers; it calls the libp1pp wrappers instead.
+
+### Error style
+
+IO functions follow kernel conventions: a negative return indicates an error
+(typically `-errno`), a non-negative return is a success value.
+
+Functions that return a pointer use `0` (`NULL`) to indicate failure.
+
+Parsers return two words under the two-word direct-result convention:
+`(value, consumed)`. `consumed == 0` means the input did not begin with a
+syntactically valid token; the `value` word is then unspecified.
+
+Functions whose return type is "nothing meaningful" return `0` in `a0`.
+
+### String representation
+
+libp1pp uses two string conventions, distinguished by the function name:
+
+- **`(buf, len)` pair** — an explicit pointer/length pair. The default.
+- **`cstr`** — a NUL-terminated byte pointer. Only functions whose name
+ includes `cstr` expect or emit NUL termination.
+
+libp1pp never mutates a caller-provided buffer unless that buffer was
+passed as an output parameter.
+
+### Allocation
+
+libp1pp functions do not allocate. Anything that produces bytes writes into
+a caller-provided buffer whose capacity is the caller's responsibility.
+The single exception is the bump allocator, which allocates only from a
+region the caller explicitly installed.
+
+### Internal label namespace
+
+libp1pp reserves the label prefix `libp1pp__` for all internal state and
+helper labels — bump allocator cursor/base/cap words, internal scratch
+buffers used by `print_int` / `print_hex`, private helper routines.
+User code must not define labels beginning with `libp1pp__`, and must
+not reference them directly; everything libp1pp exposes is reachable
+through its documented functions and macros.
+
+Public entry points (the functions and macros listed in this document,
+such as `memcpy`, `bump_alloc`, `%if_eq`) are unprefixed. A user who
+sees an undefined-label error for a `libp1pp__` name at link time has
+almost certainly forgotten to `catm` `p1pp.P1pp` into the build.
+
+### Initialization
+
+libp1pp requires no global init step at program entry. Subsystems are
+either self-initializing or require an explicit per-subsystem init
+call, documented with that subsystem.
+
+In v1 the only subsystem that requires explicit init is the bump
+allocator: `bump_alloc` called before `bump_init` returns `0` (the
+"arena exhausted" sentinel) because no arena is installed yet. Every
+other libp1pp function is callable from the first instruction of
+`p1_main`.
+
+`p1_main` itself inherits the portable entry contract from P1 v2:
+`a0` = `argc`, `a1` = `argv`. libp1pp does not wrap or interpose on
+`p1_main`.
+
+## Control-flow macros
+
+All control-flow macros take braced blocks as arguments. The braces are
+M1PP argument delimiters; they are stripped on substitution. Inside a
+block, `:@name` and `&@name` local labels resolve within the macro
+expansion's own namespace, so nested control flow is safe.
+
+### Condition suffixes
+
+Each conditional family is expanded once per condition. Suffixes:
+
+- Two-operand: `eq`, `ne`, `lt`, `ltu`
+- One-operand (implicit compare against zero): `eqz`, `nez`, `ltz`
+
+`lt` and `ltz` are signed comparisons. `ltu` is unsigned. These mirror the
+P1 branch opcodes `BEQ`, `BNE`, `BLT`, `BLTU`, `BEQZ`, `BNEZ`, `BLTZ`.
+
+### `%if_<cc>` / `%ifelse_<cc>`
+
+ %if_eq(ra, rb, { body })
+ %if_ne(ra, rb, { body })
+ %if_lt(ra, rb, { body })
+ %if_ltu(ra, rb, { body })
+ %if_eqz(ra, { body })
+ %if_nez(ra, { body })
+ %if_ltz(ra, { body })
+
+ %ifelse_eq(ra, rb, { tblk }, { fblk })
+ %ifelse_ne(ra, rb, { tblk }, { fblk })
+ %ifelse_lt(ra, rb, { tblk }, { fblk })
+ %ifelse_ltu(ra, rb, { tblk }, { fblk })
+ %ifelse_eqz(ra, { tblk }, { fblk })
+ %ifelse_nez(ra, { tblk }, { fblk })
+ %ifelse_ltz(ra, { tblk }, { fblk })
+
+`%if_<cc>` executes the block when the condition is true and falls through
+otherwise. `%ifelse_<cc>` executes `tblk` on true and `fblk` on false, then
+falls through to the code after the macro.
+
+Neither form establishes a new frame or changes `sp`. A block that issues
+a `CALL` must sit inside a function that has already established a frame
+with `ENTER`.
+
+### `%while_<cc>` / `%do_while_<cc>`
+
+ %while_eq(ra, rb, { body })
+ %while_ne(ra, rb, { body })
+ %while_lt(ra, rb, { body })
+ %while_ltu(ra, rb, { body })
+ %while_eqz(ra, { body })
+ %while_nez(ra, { body })
+ %while_ltz(ra, { body })
+
+ %do_while_eq(ra, rb, { body })
+ %do_while_ne(ra, rb, { body })
+ %do_while_lt(ra, rb, { body })
+ %do_while_ltu(ra, rb, { body })
+ %do_while_eqz(ra, { body })
+ %do_while_nez(ra, { body })
+ %do_while_ltz(ra, { body })
+
+`%while_<cc>` tests the condition before the body; `%do_while_<cc>` tests it
+after. In both, the condition is stated in the positive sense ("continue
+while `ra == rb`"). The operand registers are re-read on every iteration,
+so the body may update them.
+
+All `%while_<cc>` macros share a single lowering pattern so they work
+uniformly across conditions, including `lt`, `ltu`, and `ltz`, which have
+no inverted P1 branches.
+
+### `%for_lt`
+
+ %for_lt(i_reg, n_reg, { body })
+
+Counts `i_reg` from `0` up to but not including `n_reg`, with step `+1`,
+under signed comparison. On entry, `i_reg` is set to `0`; after each body
+iteration, `i_reg` is incremented by `1`; the loop exits once
+`i_reg < n_reg` is false.
+
+`n_reg` is re-read each iteration, so the body may update the bound. The
+body may read `i_reg` but must not modify it. If the body issues a `CALL`,
+the enclosing code is responsible for keeping `i_reg` live across the
+call. In practice, this means `i_reg` should be a callee-saved register
+(`s0`–`s3`) or explicitly spilled.
+
+libp1pp does not provide an unsigned variant, an immediate-bound variant, a
+step-by-`k` variant, or a count-down variant. Pointer iteration and
+other shapes are better expressed as `%while_<cc>` plus explicit
+increments.
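+
+As a rough C analogy (not the macro's actual expansion), `%for_lt` behaves
+like a `for` loop over signed 64-bit values whose bound is re-read before
+every iteration. This sketch assumes the test also runs before the first
+iteration; the function `count_iterations` and its `shrink_at` parameter
+are hypothetical and exist only to show a body that updates the bound.
+
```c
#include <stdint.h>

/* C analogy for %for_lt(i, n, { body }): i starts at 0, steps by +1,
 * and is compared against n with a signed comparison. n is re-read on
 * every test, so the body is allowed to change it. */
int64_t count_iterations(int64_t n, int64_t shrink_at)
{
    int64_t iterations = 0;
    for (int64_t i = 0; i < n; i += 1) {
        if (i == shrink_at) {
            n = i + 1;      /* body lowers the bound: this pass is the last */
        }
        iterations += 1;
    }
    return iterations;
}
```
+
+Because the comparison is signed, a negative bound yields zero iterations
+in this model rather than a very long unsigned loop.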
+
+### `%loop`
+
+ %loop({ body })
+
+An unconditional loop with no built-in exit. The body runs forever unless
+it transfers control out by another mechanism. libp1pp does not provide
+`%break` or `%continue`; a loop that needs mid-body exit should use
+explicit labels:
+
+ :scan_loop
+ ...
+ LA_BR &scan_end
+ BEQZ a0
+ ...
+ LA_BR &scan_loop
+ B
+ :scan_end
+
+### Tagged loops: `%loop_tag`, `%while_tag_<cc>`, `%for_lt_tag`
+
+> **Planned migration.** When the M1PP scope feature
+> (`docs/M1PP-SCOPE.md`) lands, the tagged-loop family is retired in
+> favor of scoped equivalents (`%loop_scoped`, `%while_scoped_<cc>`,
+> `%for_lt_scoped`) paired with a generic `%break()` / `%continue()`.
+> The tagged forms remain in libp1pp v1 until scopes ship, so existing
+> callers keep working; new code should prefer the scoped forms once
+> they are available.
+
+M1PP's `@` local-label mechanism is scoped to the defining macro's body: an
+`&@name` token passed to a macro through an argument is not stamped and
+does not share a namespace with the receiving macro. Consequently, a
+generic `%break` / `%continue` that uses `@` cannot be written.
+
+libp1pp provides a tagged variant family for loops that need mid-body exit
+or explicit continue. The tag becomes a label-name prefix via `##` paste,
+so references cross every macro boundary cleanly.
+
+ %loop_tag(tag, { body })
+
+ %while_tag_eq(tag, ra, rb, { body })
+ %while_tag_ne(tag, ra, rb, { body })
+ %while_tag_lt(tag, ra, rb, { body })
+ %while_tag_ltu(tag, ra, rb, { body })
+ %while_tag_eqz(tag, ra, { body })
+ %while_tag_nez(tag, ra, { body })
+ %while_tag_ltz(tag, ra, { body })
+
+ %for_lt_tag(tag, i_reg, n_reg, { body })
+
+Each tagged loop emits two ordinary labels: `:tag_top` at the point where
+a `%continue(tag)` should land, and `:tag_end` immediately after the
+loop. For top-tested `%while_tag_<cc>`, `tag_top` names the condition
+test; for `%for_lt_tag`, `tag_top` names the increment-and-test block;
+for `%loop_tag`, `tag_top` names the head of the body.
+
+ %break(tag)
+
+Emits `LA_BR &tag_end; B`. Transfers control out of the enclosing
+tagged loop.
+
+ %continue(tag)
+
+Emits `LA_BR &tag_top; B`. Transfers control to the enclosing tagged
+loop's re-test / increment point.
+
+Both `%break` and `%continue` work from arbitrary depth inside a tagged
+loop, including inside `%if_<cc>`, `%ifelse_<cc>`, or another nested
+loop's body. They resolve purely by label name, not by macro-expansion
+namespace.
+
+Tags are not scoped: `tag_top` and `tag_end` are ordinary hex2 labels
+visible across the whole program. Tags must therefore be unique across
+the whole program, not just within one function. The recommended
+style is `<function>_<role>` (e.g., `parse_outer`, `scan_inner`). hex2
+reports a duplicate-label error if two loops share a tag; libp1pp does not
+detect this at macro-expansion time.
+
+Untagged forms (`%while_<cc>`, `%for_lt`, `%loop`) are preferred when the
+body does not need `%break` or `%continue`. They nest without the user
+picking names, and their local labels cannot collide.
+
+## Frame locals
+
+libp1pp does not introduce a new local-variable macro. Use M1PP's `%struct`
+directly: its 8-byte stride matches `WORD` on P1v2-64, and it already
+synthesizes `%name.SIZE` for `ENTER`.
+
+ %struct parse_f { state cursor endp tmp }
+
+ :parse_one
+ ENTER %parse_f.SIZE
+ ST a0, [sp + %parse_f.state]
+ ST a1, [sp + %parse_f.cursor]
+ ...
+ ERET
+
+If the function stages stack-passed outgoing arguments for calls with more
+than four word arguments, reserve the low-addressed fields for that
+staging:
+
+ %struct parse_f { _o0 _o1 state cursor endp tmp }
+
+The caller places outgoing argument word `k` at `[sp + k * 8]` immediately
+before the `CALL`, then reads locals from higher offsets. libp1pp does not
+otherwise enforce this convention.
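+
+The resulting frame layout can be mirrored in C to check offsets by hand.
+This is only an analogy: it assumes `%struct` assigns fields in declaration
+order starting at offset `0`, and that `SIZE` is 8 bytes per field. The
+helper `outgoing_arg_offset` is hypothetical.
+
```c
#include <stddef.h>
#include <stdint.h>

/* C mirror of: %struct parse_f { _o0 _o1 state cursor endp tmp }
 * Each field is one 8-byte word, matching the P1v2-64 stride. */
struct parse_f {
    uint64_t _o0;     /* outgoing stack argument word 0, at [sp + 0] */
    uint64_t _o1;     /* outgoing stack argument word 1, at [sp + 8] */
    uint64_t state;   /* locals start above the staging words */
    uint64_t cursor;
    uint64_t endp;
    uint64_t tmp;
};

/* Offset of outgoing argument word k, i.e. the k * 8 in [sp + k * 8]. */
size_t outgoing_arg_offset(size_t k)
{
    return k * 8;
}
```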
+
+## Function definition
+
+ %fn(name, size, { body })
+
+Defines a non-leaf function named `name` with `size` bytes of
+frame-local storage. Expands to:
+
+- a global label `:name` at the function entry,
+- a `%scope name` push, so labels inside `body` are short
+ (`::start`, `::done`) and mangle to `name__start`, `name__done`,
+- an `%enter(size)` prologue,
+- the body,
+- an `%eret()` epilogue,
+- a matching `%endscope`.
+
+`%fn` is a scope-introducing-with-block macro in the sense defined by
+`docs/M1PP-SCOPE.md`. It pushes the scope `name`. Any `%break()` /
+`%continue()` directly in `body` would target `name__end` / `name__top`
+— which `%fn` itself does not define, so those should only appear inside
+a nested scope-introducing loop.
+
+Example:
+
+ %struct parse_f { state cursor }
+
+ %fn(parse_number, %parse_f.SIZE, {
+ ST a0, [sp + %parse_f.state]
+ ST a1, [sp + %parse_f.cursor]
+ ...
+ LA_BR &::done
+ BEQZ t0
+ ...
+ ::done
+ LD a0, [sp + %parse_f.state]
+ })
+
+`size` may be a literal byte count, a `%struct` `SIZE` reference, or any
+M1PP-time integer expression that the backend `%enter` macro accepts.
+
+Leaf functions that need no frame do not use `%fn`: they write the
+entry label, body, and `%ret()` directly, and may optionally wrap the
+body in `%scope name` / `%endscope` if they want scoped labels.
+
+## Memory and strings
+
+### Byte-buffer primitives
+
+ memcpy(dst, src, n) -> dst
+ memset(dst, byte, n) -> dst
+ memcmp(a, b, n) -> sign # -1 / 0 / 1
+
+`memcpy` does not support overlapping ranges where `dst > src && dst < src + n`.
+
+`memset` stores only the low 8 bits of `byte`.
+
+`memcmp` performs an unsigned byte-wise three-way compare and returns
+`-1`, `0`, or `1`. It stops at the first differing byte.
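+
+The `memcmp` contract above can be restated as a small C reference model.
+This is a sketch of the documented behavior, not the P1pp source; the name
+`model_memcmp` is hypothetical.
+
```c
#include <stddef.h>

/* Unsigned byte-wise three-way compare. The result is normalized to
 * -1 / 0 / 1 and the scan stops at the first differing byte. */
int model_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    for (size_t i = 0; i < n; i += 1) {
        if (pa[i] != pb[i]) {
            return pa[i] < pb[i] ? -1 : 1;
        }
    }
    return 0;   /* all n bytes equal (including n == 0) */
}
```
+
+Note that byte `0x80` compares greater than `0x7f` here, which is the
+practical consequence of comparing bytes as unsigned values.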
+
+### NUL-terminated strings
+
+ strlen(cstr) -> n
+ streq(a_cstr, b_cstr) -> 0 or 1
+ strcmp(a_cstr, b_cstr) -> sign # -1 / 0 / 1
+
+`strlen` returns the byte count up to but not including the terminating NUL.
+
+`streq` returns `1` if the two strings have the same length and the same
+bytes, and `0` otherwise.
+
+`strcmp` compares byte-wise until either a differing byte is found or one
+side's NUL is reached, and returns the sign of the first difference (the
+shorter string compares less when it is a prefix of the other).
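+
+A C reference model of the three string functions, as a sketch only. It
+assumes `strcmp` compares bytes as unsigned values, as `memcmp` does; the
+text above does not state this explicitly. The `model_` names are
+hypothetical.
+
```c
#include <stddef.h>

/* Byte count up to, but not including, the terminating NUL. */
size_t model_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n += 1;
    }
    return n;
}

/* Sign of the first difference, normalized to -1 / 0 / 1.
 * Assumption: bytes are compared as unsigned values. */
int model_strcmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    while (*pa != 0 && *pa == *pb) {
        pa += 1;
        pb += 1;
    }
    if (*pa == *pb) {
        return 0;               /* both reached NUL together */
    }
    return *pa < *pb ? -1 : 1;  /* a NUL sorts below any other byte */
}

/* 1 if the strings are byte-equal including length, else 0. */
int model_streq(const char *a, const char *b)
{
    return model_strcmp(a, b) == 0;
}
```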
+
+## Integer parsing and formatting
+
+### Parsers
+
+ parse_dec(buf, len) -> (value, consumed)
+ parse_hex(buf, len) -> (value, consumed)
+
+Both use the two-word direct-result convention: `a0` holds the parsed
+integer value and `a1` holds the number of bytes consumed. `consumed == 0`
+means the input did not start with a valid literal.
+
+`parse_dec` accepts an optional leading `-` followed by one or more decimal
+digits. On overflow, the result wraps modulo 2^64; overflow detection is
+not part of the portable contract.
+
+`parse_hex` accepts one or more hex digits (`0-9`, `a-f`, `A-F`). It does
+not consume a `0x` prefix; callers handle any prefix themselves. The
+result is the unsigned value of the parsed digits, truncated to 64 bits.
+
+Parsers do not skip leading whitespace.
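+
+The parser contract can be modeled in C as follows. This is a sketch under
+stated assumptions, not the library source: `struct parse_result` stands in
+for the `a0` / `a1` register pair, `consumed` is assumed to count a leading
+`-`, and a bare `-` with no digits is assumed to yield `consumed == 0`.
+The model returns `0` as the value in that case, although the contract
+leaves it unspecified. All `model_` names are hypothetical.
+
```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two-word result: value in a0, consumed in a1. */
struct parse_result {
    uint64_t value;
    size_t consumed;
};

struct parse_result model_parse_dec(const char *buf, size_t len)
{
    struct parse_result r = {0, 0};
    size_t i = 0;
    size_t first_digit;
    int negative = 0;
    uint64_t acc = 0;

    if (i < len && buf[i] == '-') {
        negative = 1;
        i += 1;
    }
    first_digit = i;
    while (i < len && buf[i] >= '0' && buf[i] <= '9') {
        /* Unsigned arithmetic wraps modulo 2^64 on overflow. */
        acc = acc * 10u + (uint64_t)(buf[i] - '0');
        i += 1;
    }
    if (i == first_digit) {
        return r;               /* no digits: consumed stays 0 */
    }
    r.value = negative ? (uint64_t)0 - acc : acc;
    r.consumed = i;
    return r;
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

struct parse_result model_parse_hex(const char *buf, size_t len)
{
    struct parse_result r = {0, 0};
    size_t i = 0;
    uint64_t acc = 0;

    while (i < len) {
        int d = hex_digit_value(buf[i]);
        if (d < 0) {
            break;
        }
        acc = (acc << 4) | (uint64_t)d;  /* high bits fall off past 64 bits */
        i += 1;
    }
    r.value = acc;
    r.consumed = i;             /* 0 when the input has no leading hex digit */
    return r;
}
```
+
+Under this model, `parse_hex` applied to `0x10` consumes only the leading
+`0`, which is why callers strip any prefix themselves.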
+
+### Formatters
+
+ fmt_dec(buf, value) -> n_bytes
+ fmt_hex(buf, value) -> n_bytes
+
+Both write a human-readable representation into `buf`, starting at offset
+`0`, and return the number of bytes written. Neither writes a terminating
+NUL.
+
+`fmt_dec` emits a signed decimal representation: a leading `-` for
+negative values, then one or more decimal digits. At most 20 bytes are
+written.
+
+`fmt_hex` emits an unsigned lowercase hex representation with no prefix
+and no leading zeros (except that `0` is rendered as `0`). At most 16
+bytes are written.
+
+Callers provide a buffer at least as large as the documented maximum.
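+
+A C reference model of the two formatters, given as a sketch rather than
+the library's implementation. It shows where the 20-byte and 16-byte
+maxima come from. The `model_` names are hypothetical.
+
```c
#include <stddef.h>
#include <stdint.h>

/* Signed decimal, no terminating NUL. Worst case is INT64_MIN:
 * one '-' plus 19 digits = 20 bytes. */
size_t model_fmt_dec(char *buf, int64_t value)
{
    char tmp[20];
    size_t n = 0;
    size_t out = 0;
    /* Negate in unsigned arithmetic so INT64_MIN does not overflow. */
    uint64_t mag = value < 0 ? (uint64_t)0 - (uint64_t)value
                             : (uint64_t)value;

    do {                        /* digits come out least significant first */
        tmp[n++] = (char)('0' + (int)(mag % 10));
        mag /= 10;
    } while (mag != 0);

    if (value < 0) {
        buf[out++] = '-';
    }
    while (n > 0) {
        buf[out++] = tmp[--n];
    }
    return out;
}

/* Unsigned lowercase hex, no prefix, no leading zeros, no NUL.
 * Worst case is 16 digits for a 64-bit value. */
size_t model_fmt_hex(char *buf, uint64_t value)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[16];
    size_t n = 0;
    size_t out = 0;

    do {
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);

    while (n > 0) {
        buf[out++] = tmp[--n];
    }
    return out;
}
```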
+
+## Character predicates
+
+All predicates take a single one-byte value (passed as a word; the high
+bits are ignored) and return `1` or `0`.
+
+ is_digit(c) -> 0 or 1 # '0'..'9'
+ is_hex_digit(c) -> 0 or 1 # 0-9, a-f, A-F
+ is_space(c) -> 0 or 1 # ' ', '\t', '\n', '\r', '\v', '\f'
+ is_alpha(c) -> 0 or 1 # a-z, A-Z
+ is_alnum(c) -> 0 or 1 # is_alpha OR is_digit
+
+Predicates are functions in v1 and may become macros later.
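+
+The predicates can be modeled in C as below. The sketch reads "the high
+bits are ignored" as masking the argument to its low 8 bits before
+testing; the `model_` names are hypothetical.
+
```c
#include <stdint.h>

/* Each predicate receives a full word but examines only its low byte. */
int model_is_digit(uint64_t c)
{
    c &= 0xff;
    return c >= '0' && c <= '9';
}

int model_is_hex_digit(uint64_t c)
{
    c &= 0xff;
    return (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

int model_is_space(uint64_t c)
{
    c &= 0xff;
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\r' || c == '\v' || c == '\f';
}

int model_is_alpha(uint64_t c)
{
    c &= 0xff;
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

int model_is_alnum(uint64_t c)
{
    return model_is_alpha(c) || model_is_digit(c);
}
```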
+
+## IO
+
+### Raw syscall wrappers
+
+ sys_read(fd, buf, len) -> n # bytes read; 0 at EOF; <0 error
+ sys_write(fd, buf, len) -> n # bytes written; <0 error
+ sys_open(path_cstr, flags, mode)
+ -> fd # fd >= 0 on success; <0 error
+ sys_close(fd) -> r # 0 on success; <0 error
+ sys_exit(code) -> ! # does not return
+
+These are thin wrappers over the P1 `SYSCALL` op. They set the syscall
+number themselves using the backend's `%p1_sys_<name>` data-word macros,
+marshal arguments into the syscall-argument registers, and return the raw
+kernel return value unchanged.
+
+`sys_open` is a logical open: the backend may implement it via `open` or
+`openat(AT_FDCWD, ...)` as appropriate for the target.
+
+`sys_exit` terminates the process with the low 8 bits of `code` as the
+exit status. It never returns.
+
+No wrapper interprets the negative return as a specific errno. Callers
+that need such detail inspect `a0` directly.
+
+### Print helpers
+
+ print(buf, len) -> r # 0 on success; <0 error
+ println(buf, len) -> r # writes buf then "\n"
+ print_cstr(cstr) -> r # writes strlen(cstr) bytes
+ print_int(value) -> r # decimal
+ print_hex(value) -> r # hex, no prefix
+ eprint(buf, len) -> r
+ eprintln(buf, len) -> r
+ eprint_cstr(cstr) -> r
+
+`print*` helpers write to fd `1`; `eprint*` to fd `2`. All return `0` on a
+successful write of all bytes, or a negative value if the underlying
+`sys_write` reported an error. A partial write is retried until complete
+or the kernel returns an error.
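+
+The retry-on-partial-write behavior can be sketched in C on top of the
+POSIX `write` call. This is an illustration, not the library source:
+`write_all` and `model_println` are hypothetical names, and because POSIX
+`write` reports failure as `-1` plus `errno`, the sketch returns `-errno`
+to imitate the kernel-style negative return described above. Whether the
+real `println` issues one write or two is not specified here.
+
```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Keep writing until all len bytes are out or write() fails.
 * Returns 0 on success, a negative errno value on failure. */
int write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            return -errno;      /* imitate the raw kernel return */
        }
        done += (size_t)n;      /* partial write: retry with the rest */
    }
    return 0;
}

/* Writes buf, then a newline, stopping at the first error. */
int model_println(int fd, const char *buf, size_t len)
{
    int r = write_all(fd, buf, len);
    if (r < 0) {
        return r;
    }
    return write_all(fd, "\n", 1);
}
```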
+
+`print_int` and `print_hex` render into a small internal stack buffer,
+then write. They allocate no heap memory.
+
+### File helpers
+
+ read_file(path_cstr, buf, cap)
+ -> n # bytes read, or -1
+
+Opens `path_cstr` read-only, reads up to `cap` bytes into `buf`, and
+closes the fd. Returns the number of bytes read on success, or `-1` if
+the file could not be opened, a read failed, or the file exceeds `cap`
+(in which case `buf` may have been partially written).
+
+ write_file(path_cstr, buf, len)
+ -> r # 0 on success; -1 on error
+
+Creates or truncates `path_cstr`, writes `len` bytes from `buf`, and
+closes the fd. Returns `0` on success or `-1` if any step failed. The
+created file's mode is implementation-defined but intended to be a
+reasonable default (typically `0644`).
+
+## Bump allocator
+
+libp1pp provides a single global bump allocator. Memory is carved from a
+caller-supplied region; libp1pp does not own or reserve storage itself.
+
+ bump_init(base, cap) -> 0
+
+Installs `[base, base + cap)` as the live arena and sets the cursor to
+`base`. Discards any prior state. `base` should be word-aligned; `cap`
+should be a multiple of 8. libp1pp does not validate these.
+
+ bump_alloc(n) -> ptr # 0 on exhaustion
+
+Advances the cursor by `n` bytes rounded up to the next multiple of 8 and
+returns the pre-advance cursor. Returns `0` and leaves the cursor
+unchanged if the rounded-up request would exceed the arena.
+
+Returned memory is not zeroed. Callers that need zero-initialized memory
+call `memset` themselves.
+
+ bump_mark() -> saved
+
+Returns the current cursor value as an opaque word.
+
+ bump_release(saved) -> 0
+
+Rewinds the cursor to a value previously returned by `bump_mark`. Any
+pointers handed out since that mark become invalid. Passing a value that
+was not produced by `bump_mark` against the currently installed arena is
+undefined behavior.
+
+ bump_reset() -> 0
+
+Rewinds the cursor to the arena's `base`.
+
+v1 provides one arena. Multi-arena variants are deferred.
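+
+The allocator contract is small enough to model in C. The following is a
+sketch of the documented behavior, not the P1pp implementation; the
+`model_` function names and the three static variables are hypothetical
+stand-ins for libp1pp's internal cursor/base/cap words.
+
```c
#include <stddef.h>
#include <stdint.h>

/* All zero until model_bump_init runs, so early allocations fail. */
static uintptr_t bump_base;
static uintptr_t bump_cursor;
static uintptr_t bump_end;

int model_bump_init(void *base, size_t cap)
{
    bump_base = (uintptr_t)base;
    bump_cursor = bump_base;    /* discards any prior state */
    bump_end = bump_base + cap;
    return 0;
}

void *model_bump_alloc(size_t n)
{
    size_t rounded = (n + 7u) & ~(size_t)7u;    /* next multiple of 8 */
    if (rounded < n || rounded > bump_end - bump_cursor) {
        return NULL;            /* exhausted: cursor is left unchanged */
    }
    uintptr_t p = bump_cursor;
    bump_cursor += rounded;
    return (void *)p;           /* pre-advance cursor; memory not zeroed */
}

uintptr_t model_bump_mark(void)
{
    return bump_cursor;
}

int model_bump_release(uintptr_t saved)
{
    bump_cursor = saved;        /* caller must pass a value from mark */
    return 0;
}

int model_bump_reset(void)
{
    bump_cursor = bump_base;
    return 0;
}
```
+
+A typical pattern is to take a mark before a batch of temporary
+allocations and release it afterwards, which frees the whole batch at
+once.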
+
+## Panic and assertions
+
+### `panic`
+
+ panic(msg_cstr) -> !
+
+Writes `msg_cstr` followed by `"\n"` to fd `2`, then calls `sys_exit(1)`.
+Does not return.
+
+`panic` is used from libp1pp internally only for unrecoverable programmer
+errors (none in v1). User code is encouraged to use it for its own
+invariant violations.
+
+### `%assert_<cc>` macros
+
+ %assert_eq(ra, rb, msg_label)
+ %assert_ne(ra, rb, msg_label)
+ %assert_lt(ra, rb, msg_label)
+ %assert_ltu(ra, rb, msg_label)
+ %assert_eqz(ra, msg_label)
+ %assert_nez(ra, msg_label)
+ %assert_ltz(ra, msg_label)
+
+Each macro asserts that the named condition holds and calls `panic` with
+`msg_label` (a NUL-terminated string label in the program image) if it
+does not. They lower to one `%if_<cc_inverse>` containing a
+`LA a0, msg_label` / `LA_BR &panic` / `CALL` sequence, so the non-failure
+path adds no runtime cost beyond the original branch.
+
+Because the failure path issues a `CALL`, `%assert_*` may be used only in
+functions that have established a frame with `ENTER`.
+
+## Not in v1
+
+The following were considered and deferred:
+
+- Untagged `%break()` / `%continue()` — these become possible once the
+ M1PP scope feature (see `docs/M1PP-SCOPE.md`) lands; until then,
+ callers use the `%break(tag)` / `%continue(tag)` tagged forms. `%fn`
+ is already specified against the scope feature and remains a standing
+ TODO until that feature is implemented.
+- Field-access helpers such as `%ld_field` — `LD rd, [base + %S.f]` is
+ short enough.
+- `printf`-style formatted output — replaced by dedicated `print_*` and
+ `fmt_*` primitives.
+- Multiple bump arenas — one global arena covers bootstrap needs.
+- `strcpy` / `strcat` — length-aware callers should use `memcpy` with an
+ explicit byte count.
+- P1v2-32 support.
+- `envp` / auxv / command-line-aware helpers beyond what `p1_main`
+ already receives.