boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

commit 87e3956d3b23b7ef1e537e75fc304782bd3df6bb
parent ff0fd1204fcb984c90ec5ba308bef63a3d04d76a
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Fri, 24 Apr 2026 11:11:17 -0700

LIBP1PP.md

Diffstat:
A docs/LIBP1PP.md | 571 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 571 insertions(+), 0 deletions(-)

diff --git a/docs/LIBP1PP.md b/docs/LIBP1PP.md
@@ -0,0 +1,571 @@
+# libp1pp
+
+## Scope
+
+libp1pp is a small portable utility library for P1pp programs, written in
+P1pp itself. It provides:
+
+- M1PP control-flow macros that wrap P1 branches into structured forms
+- Common byte and string primitives
+- Integer parsing and formatting
+- Character predicates
+- Thin syscall wrappers and higher-level IO helpers
+- A single-arena bump allocator
+- Panic and assertion helpers
+
+libp1pp is a single source file, `p1pp.P1pp`, composed via `catm` after the
+target backend header and before user source:
+
+    catm P1-aarch64.M1pp p1pp.P1pp usersrc.P1pp > program.M1
+
+Because `p1pp.P1pp` passes through M1PP, it freely mixes P1 code and M1PP macro
+definitions.
+
+## Target and conventions
+
+### Width
+
+libp1pp v1 targets **P1v2-64 only**. Word size is 8 bytes. Pointer values,
+integer results, and syscall arguments are all one word.
+
+Porting libp1pp to P1v2-32 is out of scope for this document.
+
+### Syscall numbers
+
+libp1pp does not hard-code syscall numbers. It relies on the backend header
+to provide the `p1_sys_<name>` data-word macros already defined by existing
+backends (e.g. `%p1_sys_read()`, `%p1_sys_write()`). User code should not
+issue syscalls through raw numbers; it calls the libp1pp wrappers instead.
+
+### Error style
+
+IO functions follow kernel conventions: a negative return indicates an error
+(typically `-errno`), a non-negative return is a success value.
+
+Functions that return a pointer use `0` (`NULL`) to indicate failure.
+
+Parsers return two words under the two-word direct-result convention:
+`(value, consumed)`. `consumed == 0` means the input did not begin with a
+syntactically valid token; the `value` word is then unspecified.
+
+Functions whose return type is "nothing meaningful" return `0` in `a0`.
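The `(value, consumed)` convention described under "Error style" can be illustrated with a small Python reference model. This is only a sketch of the documented contract, not the library's P1pp implementation: it follows the `parse_dec` rules given later in the same file (optional leading `-`, one or more decimal digits, no whitespace skipping, wrap modulo 2^64). The names `parse_dec_model` and `as_signed` are hypothetical, and returning `0` for the unspecified `value` word is this model's own choice.

```python
MASK64 = (1 << 64) - 1  # one P1v2-64 word


def parse_dec_model(buf):
    """Model of the documented contract: return (value, consumed).

    consumed == 0 means the input did not begin with a valid literal.
    """
    i = 0
    negative = False
    if i < len(buf) and buf[i] == ord("-"):
        negative = True
        i += 1
    digits_start = i
    value = 0
    while i < len(buf) and ord("0") <= buf[i] <= ord("9"):
        # Wrap modulo 2^64 at every step; overflow is not reported.
        value = (value * 10 + (buf[i] - ord("0"))) & MASK64
        i += 1
    if i == digits_start:
        # No digits: the spec leaves `value` unspecified; 0 is an assumption.
        return 0, 0
    if negative:
        value = (-value) & MASK64  # two's-complement word
    return value, i


def as_signed(word):
    """Reinterpret a 64-bit word as a signed integer (helper for this sketch)."""
    return word - (1 << 64) if word >> 63 else word
```

A caller would typically check `consumed` first and only then look at `value`, for example `value, used = parse_dec_model(b"42abc")` followed by `if used == 0: ...`.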
+
+### String representation
+
+libp1pp uses two string conventions, distinguished by the function name:
+
+- **`(buf, len)` pair** — an explicit pointer/length pair. The default.
+- **`cstr`** — a NUL-terminated byte pointer. Only functions whose name
+  includes `cstr` expect or emit NUL termination.
+
+libp1pp never mutates a caller-provided buffer it was not passed as an
+output parameter.
+
+### Allocation
+
+libp1pp functions do not allocate. Anything that produces bytes writes into
+a caller-provided buffer whose capacity is the caller's responsibility.
+The single exception is the bump allocator, which allocates only from a
+region the caller explicitly installed.
+
+### Internal label namespace
+
+libp1pp reserves the label prefix `libp1pp__` for all internal state and
+helper labels — bump allocator cursor/base/cap words, internal scratch
+buffers used by `print_int` / `print_hex`, private helper routines.
+User code must not define labels beginning with `libp1pp__`, and must
+not reference them directly; everything libp1pp exposes is reachable
+through its documented functions and macros.
+
+Public entry points (the functions and macros listed in this document,
+such as `memcpy`, `bump_alloc`, `%if_eq`) are unprefixed. A user who
+sees an undefined-label error for a `libp1pp__` name at link time has
+almost certainly forgotten to `catm` `p1pp.P1pp` into the build.
+
+### Initialization
+
+libp1pp requires no global init step at program entry. Subsystems are
+either self-initializing or require an explicit per-subsystem init
+call, documented with that subsystem.
+
+In v1 the only subsystem that requires explicit init is the bump
+allocator: `bump_alloc` called before `bump_init` returns `0` (the
+"arena exhausted" sentinel) because no arena is installed yet. Every
+other libp1pp function is callable from the first instruction of
+`p1_main`.
+
+`p1_main` itself inherits the portable entry contract from P1 v2:
+`a0` = `argc`, `a1` = `argv`.
+libp1pp does not wrap or interpose on
+`p1_main`.
+
+## Control-flow macros
+
+All control-flow macros take braced blocks as arguments. The braces are
+M1PP argument delimiters; they are stripped on substitution. Inside a
+block, `:@name` and `&@name` local labels resolve within the macro
+expansion's own namespace, so nested control flow is safe.
+
+### Condition suffixes
+
+Each conditional family is expanded once per condition. Suffixes:
+
+- Two-operand: `eq`, `ne`, `lt`, `ltu`
+- Zero-operand (implicit compare against zero): `eqz`, `nez`, `ltz`
+
+`lt` and `ltz` are signed comparisons. `ltu` is unsigned. These mirror the
+P1 branch opcodes `BEQ`, `BNE`, `BLT`, `BLTU`, `BEQZ`, `BNEZ`, `BLTZ`.
+
+### `%if_<cc>` / `%ifelse_<cc>`
+
+    %if_eq(ra, rb, { body })
+    %if_ne(ra, rb, { body })
+    %if_lt(ra, rb, { body })
+    %if_ltu(ra, rb, { body })
+    %if_eqz(ra, { body })
+    %if_nez(ra, { body })
+    %if_ltz(ra, { body })
+
+    %ifelse_eq(ra, rb, { tblk }, { fblk })
+    %ifelse_ne(ra, rb, { tblk }, { fblk })
+    %ifelse_lt(ra, rb, { tblk }, { fblk })
+    %ifelse_ltu(ra, rb, { tblk }, { fblk })
+    %ifelse_eqz(ra, { tblk }, { fblk })
+    %ifelse_nez(ra, { tblk }, { fblk })
+    %ifelse_ltz(ra, { tblk }, { fblk })
+
+`%if_<cc>` executes the block when the condition is true and falls through
+otherwise. `%ifelse_<cc>` executes `tblk` on true and `fblk` on false, then
+falls through to the code after the macro.
+
+Neither form establishes a new frame or changes `sp`. A block that issues
+a `CALL` must sit inside a function that has already established a frame
+with `ENTER`.
+
+### `%while_<cc>` / `%do_while_<cc>`
+
+    %while_eq(ra, rb, { body })
+    %while_ne(ra, rb, { body })
+    %while_lt(ra, rb, { body })
+    %while_ltu(ra, rb, { body })
+    %while_eqz(ra, { body })
+    %while_nez(ra, { body })
+    %while_ltz(ra, { body })
+
+    %do_while_eq(ra, rb, { body })
+    %do_while_ne(ra, rb, { body })
+    %do_while_lt(ra, rb, { body })
+    %do_while_ltu(ra, rb, { body })
+    %do_while_eqz(ra, { body })
+    %do_while_nez(ra, { body })
+    %do_while_ltz(ra, { body })
+
+`%while_<cc>` tests the condition before the body; `%do_while_<cc>` after.
+In both, the condition is a positive sense ("continue while `ra == rb`").
+The operand registers are re-read on every iteration, so body may update
+them.
+
+All `%while_<cc>` macros share a single lowering pattern so they work
+uniformly across conditions, including `lt`, `ltu`, and `ltz` which have no
+inverted P1 branches.
+
+### `%for_lt`
+
+    %for_lt(i_reg, n_reg, { body })
+
+Counts `i_reg` from `0` up to but not including `n_reg`, with step `+1`,
+under signed comparison. On entry, `i_reg` is set to `0`; after each body
+iteration, `i_reg` is incremented by `1`; the loop exits once
+`i_reg < n_reg` is false.
+
+`n_reg` is re-read each iteration, so body may update the bound. Body may
+read `i_reg` but must not otherwise modify it. If body issues a `CALL`,
+the caller is responsible for keeping `i_reg` live across the call — in
+practice, this means `i_reg` should be a callee-saved register (`s0`–`s3`)
+or explicitly spilled.
+
+libp1pp does not provide an unsigned variant, an immediate-bound variant, a
+step-by-`k` variant, or a count-down variant. Pointer iteration and
+other shapes are better expressed as `%while_<cc>` plus explicit
+increments.
+
+### `%loop`
+
+    %loop({ body })
+
+An unconditional loop with no built-in exit. The body runs forever unless
+it transfers control out by another mechanism.
+libp1pp does not provide
+`%break` or `%continue`; a loop that needs mid-body exit should use
+explicit labels:
+
+    :scan_loop
+    ...
+    LA_BR &scan_end
+    BEQZ a0
+    ...
+    LA_BR &scan_loop
+    B
+    :scan_end
+
+### Tagged loops: `%loop_tag`, `%while_tag_<cc>`, `%for_lt_tag`
+
+> **Planned migration.** When the M1PP scope feature
+> (`docs/M1PP-SCOPE.md`) lands, the tagged-loop family is retired in
+> favor of scoped equivalents (`%loop_scoped`, `%while_scoped_<cc>`,
+> `%for_lt_scoped`) paired with a generic `%break()` / `%continue()`.
+> The tagged forms remain in libp1pp v1 until scopes ship, so existing
+> callers keep working; new code should prefer the scoped forms once
+> they are available.
+
+M1PP's `@` local-label mechanism is scoped to the defining macro's body: an
+`&@name` token passed to a macro through an argument is not stamped and
+does not share a namespace with the receiving macro. Consequently, a
+generic `%break` / `%continue` that uses `@` cannot be written.
+
+libp1pp provides a tagged variant family for loops that need mid-body exit
+or explicit continue. The tag becomes a label-name prefix via `##` paste,
+so references cross every macro boundary cleanly.
+
+    %loop_tag(tag, { body })
+
+    %while_tag_eq(tag, ra, rb, { body })
+    %while_tag_ne(tag, ra, rb, { body })
+    %while_tag_lt(tag, ra, rb, { body })
+    %while_tag_ltu(tag, ra, rb, { body })
+    %while_tag_eqz(tag, ra, { body })
+    %while_tag_nez(tag, ra, { body })
+    %while_tag_ltz(tag, ra, { body })
+
+    %for_lt_tag(tag, i_reg, n_reg, { body })
+
+Each tagged loop emits two ordinary labels: `:tag_top` at the point where
+a `%continue(tag)` should land, and `:tag_end` immediately after the
+loop. For top-tested `%while_tag_<cc>`, `tag_top` names the condition
+test; for `%for_lt_tag`, `tag_top` names the increment-and-test block;
+for `%loop_tag`, `tag_top` names the head of the body.
+
+    %break(tag)
+
+Emits `LA_BR &tag_end; B`. Transfers control out of the enclosing
+tagged loop.
+
+    %continue(tag)
+
+Emits `LA_BR &tag_top; B`. Transfers control to the enclosing tagged
+loop's re-test / increment point.
+
+Both `%break` and `%continue` work from arbitrary depth inside a tagged
+loop, including inside `%if_<cc>`, `%ifelse_<cc>`, or another nested
+loop's body. They resolve purely by label name, not by macro-expansion
+namespace.
+
+Tags are not scoped: `tag_top` and `tag_end` are ordinary hex2 labels
+visible across the whole program. Tags must therefore be unique within a
+function, and conventionally across functions as well. The recommended
+style is `<function>_<role>` (e.g., `parse_outer`, `scan_inner`). hex2
+reports a duplicate-label error if two loops share a tag; libp1pp does not
+detect this at macro-expansion time.
+
+Untagged forms (`%while_<cc>`, `%for_lt`, `%loop`) are preferred when the
+body does not need `%break` or `%continue`. They nest without the user
+picking names, and their local labels cannot collide.
+
+## Frame locals
+
+libp1pp does not introduce a new local-variable macro. Use M1PP's `%struct`
+directly: its 8-byte stride matches `WORD` on P1v2-64, and it already
+synthesizes `%name.SIZE` for `ENTER`.
+
+    %struct parse_f { state cursor endp tmp }
+
+    :parse_one
+    ENTER %parse_f.SIZE
+    ST a0, [sp + %parse_f.state]
+    ST a1, [sp + %parse_f.cursor]
+    ...
+    ERET
+
+If the function stages stack-passed outgoing arguments for calls with more
+than four word arguments, reserve the low-addressed fields for that
+staging:
+
+    %struct parse_f { _o0 _o1 state cursor endp tmp }
+
+The caller places outgoing argument word `k` at `[sp + k * 8]` immediately
+before the `CALL`, then reads locals from higher offsets. libp1pp does not
+otherwise enforce this convention.
+
+## Function definition
+
+    %fn(name, size, { body })
+
+Defines a non-leaf function named `name` with `size` bytes of
+frame-local storage.
+Expands to:
+
+- a global label `:name` at the function entry,
+- a `%scope name` push, so labels inside `body` are short
+  (`::start`, `::done`) and mangle to `name__start`, `name__done`,
+- an `%enter(size)` prologue,
+- the body,
+- an `%eret()` epilogue,
+- a matching `%endscope`.
+
+`%fn` is a scope-introducing-with-block macro in the sense defined by
+M1PP-SCOPE.md. It pushes the scope `name`. Any `%break()` /
+`%continue()` directly in `body` would target `name__end` / `name__top`
+— which `%fn` itself does not define, so those should only appear inside
+a nested scope-introducing loop.
+
+Example:
+
+    %struct parse_f { state cursor }
+
+    %fn(parse_number, %parse_f.SIZE, {
+        ST a0, [sp + %parse_f.state]
+        ST a1, [sp + %parse_f.cursor]
+        ...
+        LA_BR &::done
+        BEQZ t0
+        ...
+        ::done
+        LD a0, [sp + %parse_f.state]
+    })
+
+`size` may be a literal byte count, a `%struct` `SIZE` reference, or any
+M1PP-time integer expression that the backend `%enter` macro accepts.
+
+Leaf functions that need no frame do not use `%fn`: they write the
+entry label, body, and `%ret()` directly, and may optionally wrap the
+body in `%scope name` / `%endscope` if they want scoped labels.
+
+## Memory and strings
+
+### Byte-buffer primitives
+
+    memcpy(dst, src, n)   -> dst
+    memset(dst, byte, n)  -> dst
+    memcmp(a, b, n)       -> sign   # -1 / 0 / 1
+
+`memcpy` does not support overlapping ranges where `dst > src && dst < src + n`.
+
+`memset` stores only the low 8 bits of `byte`.
+
+`memcmp` performs an unsigned byte-wise three-way compare and returns
+`-1`, `0`, or `1`. It stops at the first differing byte.
+
+### NUL-terminated strings
+
+    strlen(cstr)            -> n
+    streq(a_cstr, b_cstr)   -> 0 or 1
+    strcmp(a_cstr, b_cstr)  -> sign   # -1 / 0 / 1
+
+`strlen` returns the byte count up to but not including the terminating NUL.
+
+`streq` returns `1` iff the two strings are byte-equal including length.
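The return conventions documented so far for `memcmp`, `strlen`, and `streq` can be pinned down with a short Python reference model. This is a sketch of the stated behaviour only; the `*_model` names are hypothetical, and Python `bytes` stand in for raw memory (indexing `bytes` yields unsigned values 0..255, which matches the documented unsigned compare).

```python
def memcmp_model(a, b, n):
    """Unsigned byte-wise three-way compare over the first n bytes."""
    for i in range(n):
        if a[i] != b[i]:
            # Stops at the first differing byte, as documented.
            return -1 if a[i] < b[i] else 1
    return 0


def strlen_model(cstr):
    """Count bytes up to, but not including, the terminating NUL."""
    n = 0
    while cstr[n] != 0:
        n += 1
    return n


def streq_model(a_cstr, b_cstr):
    """Return 1 iff both NUL-terminated strings match in length and bytes."""
    la = strlen_model(a_cstr)
    lb = strlen_model(b_cstr)
    return 1 if la == lb and memcmp_model(a_cstr, b_cstr, la) == 0 else 0
```

Note that `memcmp_model(b"\xff", b"\x01", 1)` is positive: byte `0xff` compares as 255, not as -1, which is the practical meaning of "unsigned" here.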
+
+`strcmp` compares byte-wise until either a differing byte is found or one
+side's NUL is reached, and returns the sign of the first difference (the
+shorter string compares less when it is a prefix of the other).
+
+## Integer parsing and formatting
+
+### Parsers
+
+    parse_dec(buf, len) -> (value, consumed)
+    parse_hex(buf, len) -> (value, consumed)
+
+Both use the two-word direct-result convention: `a0` holds the parsed
+integer value and `a1` holds the number of bytes consumed. `consumed == 0`
+means the input did not start with a valid literal.
+
+`parse_dec` accepts an optional leading `-` followed by one or more decimal
+digits. On overflow, the result is truncated to 64 bits modulo 2^64;
+detection of overflow is not part of the portable contract.
+
+`parse_hex` accepts one or more hex digits (`0-9`, `a-f`, `A-F`). It does
+not consume a `0x` prefix; callers handle any prefix themselves. The
+result is the unsigned value of the parsed digits, truncated to 64 bits.
+
+Parsers do not skip leading whitespace.
+
+### Formatters
+
+    fmt_dec(buf, value) -> n_bytes
+    fmt_hex(buf, value) -> n_bytes
+
+Both write a human-readable representation into `buf`, starting at offset
+`0`, and return the number of bytes written. Neither writes a terminating
+NUL.
+
+`fmt_dec` emits a signed decimal representation: a leading `-` for
+negative values, then one or more decimal digits. At most 20 bytes are
+written.
+
+`fmt_hex` emits an unsigned lowercase hex representation with no prefix
+and no leading zeros (except that `0` is rendered as `0`). At most 16
+bytes are written.
+
+Callers provide a buffer at least as large as the documented maximum.
+
+## Character predicates
+
+All predicates take a single one-byte value (passed as a word; the high
+bits are ignored) and return `1` or `0`.
+
+    is_digit(c)      -> 0 or 1   # '0'..'9'
+    is_hex_digit(c)  -> 0 or 1   # 0-9, a-f, A-F
+    is_space(c)      -> 0 or 1   # ' ', '\t', '\n', '\r', '\v', '\f'
+    is_alpha(c)      -> 0 or 1   # a-z, A-Z
+    is_alnum(c)      -> 0 or 1   # is_alpha OR is_digit
+
+Predicates are functions in v1 and may become macros later.
+
+## IO
+
+### Raw syscall wrappers
+
+    sys_read(fd, buf, len)   -> n    # bytes read; 0 at EOF; <0 error
+    sys_write(fd, buf, len)  -> n    # bytes written; <0 error
+    sys_open(path_cstr, flags, mode)
+                             -> fd   # fd >= 0 on success; <0 error
+    sys_close(fd)            -> r    # 0 on success; <0 error
+    sys_exit(code)           -> !    # does not return
+
+These are thin wrappers over the P1 `SYSCALL` op. They set the syscall
+number themselves using the backend's `%p1_sys_<name>` data-word macros,
+marshal arguments into the syscall-argument registers, and return the raw
+kernel return value unchanged.
+
+`sys_open` is a logical open: the backend may implement it via `open` or
+`openat(AT_FDCWD, ...)` as appropriate for the target.
+
+`sys_exit` terminates the process with the low 8 bits of `code` as the
+exit status. It never returns.
+
+No wrapper interprets the negative return as a specific errno. Callers
+that need such detail inspect `a0` directly.
+
+### Print helpers
+
+    print(buf, len)     -> r   # 0 on success; <0 error
+    println(buf, len)   -> r   # writes buf then "\n"
+    print_cstr(cstr)    -> r   # writes strlen(cstr) bytes
+    print_int(value)    -> r   # decimal
+    print_hex(value)    -> r   # hex, no prefix
+    eprint(buf, len)    -> r
+    eprintln(buf, len)  -> r
+    eprint_cstr(cstr)   -> r
+
+`print*` helpers write to fd `1`; `eprint*` to fd `2`. All return `0` on a
+successful write of all bytes, or a negative value if the underlying
+`sys_write` reported an error. A partial write is retried until complete
+or the kernel returns an error.
+
+`print_int` and `print_hex` render into a small internal stack buffer,
+then write. They allocate no heap memory.
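The retry-on-partial-write rule for the print helpers is easy to get subtly wrong, so here is a hedged Python sketch of the loop it implies. It uses `os.write` as a stand-in for `sys_write`; because Python reports failures by raising `OSError`, the sketch converts that into the negative-return style the document describes. `write_all` and `println_model` are hypothetical names, and issuing the newline as a second write is an assumption of this model, not something the document specifies.

```python
import os


def write_all(fd, data):
    """Write every byte of data to fd; return 0, or a negative errno."""
    view = memoryview(data)
    while len(view) > 0:
        try:
            n = os.write(fd, view)  # may write fewer bytes than requested
        except OSError as e:
            return -e.errno         # mirror the kernel-style negative return
        view = view[n:]             # retry with the unwritten remainder
    return 0


def println_model(fd, data):
    """Model of println: write the buffer, then a single newline byte."""
    r = write_all(fd, data)
    if r < 0:
        return r
    return write_all(fd, b"\n")
```

For example, `println_model(1, b"hello")` would print `hello` followed by a newline on standard output and return `0` when every byte was accepted.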
+
+### File helpers
+
+    read_file(path_cstr, buf, cap)
+                        -> n   # bytes read, or -1
+
+Opens `path_cstr` read-only, reads up to `cap` bytes into `buf`, and
+closes the fd. Returns the number of bytes read on success, or `-1` if
+the file could not be opened, a read failed, or the file exceeds `cap`
+(in which case `buf` may have been partially written).
+
+    write_file(path_cstr, buf, len)
+                        -> r   # 0 on success; -1 on error
+
+Creates or truncates `path_cstr`, writes `len` bytes from `buf`, and
+closes the fd. Returns `0` on success or `-1` if any step failed. The
+created file's mode is implementation-defined but intended to be a
+reasonable default (typically `0644`).
+
+## Bump allocator
+
+libp1pp provides a single global bump allocator. Memory is carved from a
+caller-supplied region; libp1pp does not own or reserve storage itself.
+
+    bump_init(base, cap) -> 0
+
+Installs `[base, base + cap)` as the live arena and sets the cursor to
+`base`. Discards any prior state. `base` should be word-aligned; `cap`
+should be a multiple of 8. libp1pp does not validate these.
+
+    bump_alloc(n) -> ptr   # 0 on exhaustion
+
+Advances the cursor by `n` bytes rounded up to the next multiple of 8 and
+returns the pre-advance cursor. Returns `0` and leaves the cursor
+unchanged if the rounded-up request would exceed the arena.
+
+Returned memory is not zeroed. Callers that need zero-init memset
+themselves.
+
+    bump_mark() -> saved
+
+Returns the current cursor value as an opaque word.
+
+    bump_release(saved) -> 0
+
+Rewinds the cursor to a value previously returned by `bump_mark`. Any
+pointers handed out since that mark become invalid. Passing a value that
+was not produced by `bump_mark` against the currently installed arena is
+undefined behavior.
+
+    bump_reset() -> 0
+
+Rewinds the cursor to the arena's `base`.
+
+v1 provides one arena. Multi-arena variants are deferred.
+
+## Panic and assertions
+
+### `panic`
+
+    panic(msg_cstr) -> !
+
+Writes `msg_cstr` followed by `"\n"` to fd `2`, then calls `sys_exit(1)`.
+Does not return.
+
+`panic` is used from libp1pp internally only for unrecoverable programmer
+errors (none in v1). User code is encouraged to use it for its own
+invariant violations.
+
+### `%assert_<cc>` macros
+
+    %assert_eq(ra, rb, msg_label)
+    %assert_ne(ra, rb, msg_label)
+    %assert_lt(ra, rb, msg_label)
+    %assert_ltu(ra, rb, msg_label)
+    %assert_eqz(ra, msg_label)
+    %assert_nez(ra, msg_label)
+    %assert_ltz(ra, msg_label)
+
+Each macro asserts that the named condition holds and calls `panic` with
+`msg_label` (a NUL-terminated string label in the program image) if it
+does not. They lower to one `%if_<cc_inverse>` containing a
+`LA a0, msg_label` / `LA_BR &panic` / `CALL` sequence, so the non-failure
+path adds no runtime cost beyond the original branch.
+
+Because the failure path issues a `CALL`, `%assert_*` may be used only in
+functions that have established a frame with `ENTER`.
+
+## Not in v1
+
+The following were considered and deferred:
+
+- Untagged `%break()` / `%continue()` — these become possible once the
+  M1PP scope feature (see `docs/M1PP-SCOPE.md`) lands; until then,
+  callers use the `%break(tag)` / `%continue(tag)` tagged forms. `%fn`
+  already specs against the scope feature and is a standing TODO until
+  that feature is implemented.
+- Field-access helpers such as `%ld_field` — `LD rd, [base + %S.f]` is
+  short enough.
+- `printf`-style formatted output — replaced by dedicated `print_*` and
+  `fmt_*` primitives.
+- Multiple bump arenas — one global arena covers bootstrap needs.
+- `strcpy` / `strcat` — length-aware callers should use `memcpy` with an
+  explicit byte count.
+- P1v2-32 support.
+- `envp` / auxv / command-line-aware helpers beyond what `p1_main`
+  already receives.
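As a companion to the "Bump allocator" section of the added file, the following Python sketch models the documented cursor arithmetic: round each request up to a multiple of 8, return the pre-advance cursor, return `0` when the rounded request does not fit, and rewind via mark/release/reset. It is a reference model under stated assumptions, not the P1pp implementation: addresses are plain integers, the class name `BumpModel` is hypothetical, and the "no arena installed" state is modelled as an empty arena at address 0.

```python
class BumpModel:
    """Integer-address model of the single global bump arena."""

    def __init__(self):
        # Assumption: before bump_init, the arena is empty, so every
        # non-zero request fails and returns 0.
        self.base = 0
        self.end = 0
        self.cursor = 0

    def bump_init(self, base, cap):
        """Install [base, base + cap) and discard any prior state."""
        self.base = base
        self.end = base + cap
        self.cursor = base
        return 0

    def bump_alloc(self, n):
        """Return the pre-advance cursor, or 0 if the request does not fit."""
        rounded = (n + 7) & ~7          # round up to a multiple of 8
        if self.cursor + rounded > self.end:
            return 0                    # cursor is left unchanged
        ptr = self.cursor
        self.cursor += rounded
        return ptr

    def bump_mark(self):
        """Return the current cursor as an opaque word."""
        return self.cursor

    def bump_release(self, saved):
        """Rewind to a value previously returned by bump_mark."""
        self.cursor = saved
        return 0

    def bump_reset(self):
        """Rewind the cursor to the arena base."""
        self.cursor = self.base
        return 0
```

A typical pattern is scoped scratch allocation: take `m = arena.bump_mark()`, allocate temporaries, then call `arena.bump_release(m)` so that later allocations reuse the same bytes.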