boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git
libp1pp

Scope

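libp1pp is a small portable utility library for P1pp programs, written in P1pp itself. It provides control-flow macros, a function-definition macro, memory and string routines, integer parsing and formatting, character predicates, IO wrappers and helpers, a bump allocator, and panic and assertion support.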

libp1pp is a single source file, p1pp.P1pp, composed via catm after the target backend header and before user source:

catm P1-aarch64.M1pp p1pp.P1pp usersrc.P1pp > program.M1

Because p1pp.P1pp passes through M1PP, it freely mixes P1 code and M1PP macro definitions.

Target and conventions

Width

libp1pp v1 targets P1-64 only. Word size is 8 bytes. Pointer values, integer results, and syscall arguments are all one word.

Porting libp1pp to P1-32 is out of scope for this document.

Syscall numbers

libp1pp does not hard-code syscall numbers. It relies on the backend header to provide the %p1_sys_<name> data-word macros already defined by existing backends (e.g. %p1_sys_read(), %p1_sys_write()). User code should not issue syscalls through raw numbers; it calls the libp1pp wrappers instead.

Error style

IO functions follow kernel conventions: a negative return indicates an error (typically -errno); a non-negative return is a success value.

Functions that return a pointer use 0 (NULL) to indicate failure.

Parsers return two words under the two-word direct-result convention: (value, consumed). consumed == 0 means the input did not begin with a syntactically valid token; the value word is then unspecified.

Functions whose return type is "nothing meaningful" return 0 in a0.

String representation

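libp1pp uses two string conventions, distinguished by the function name: NUL-terminated strings, taken by the str* functions and by helpers with a _cstr suffix (strlen, print_cstr), and explicit (buf, len) byte ranges, taken by functions such as print, memcpy, and parse_dec.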

libp1pp never mutates a caller-provided buffer it was not passed as an output parameter.

Allocation

libp1pp functions do not allocate. Anything that produces bytes writes into a caller-provided buffer whose capacity is the caller's responsibility. The single exception is the bump allocator, which allocates only from a region the caller explicitly installed.

Internal label namespace

libp1pp reserves the label prefix libp1pp__ for all internal state and helper labels — bump allocator cursor/base/cap words, internal scratch buffers used by print_int / print_hex, private helper routines. User code must not define labels beginning with libp1pp__, and must not reference them directly; everything libp1pp exposes is reachable through its documented functions and macros.

Public entry points (the functions and macros listed in this document, such as memcpy, bump_alloc, %if_eq) are unprefixed. A user who sees an undefined-label error for a libp1pp__ name at link time has almost certainly forgotten to catm p1pp.P1pp into the build.

Initialization

libp1pp requires no global init step at program entry. Subsystems are either self-initializing or require an explicit per-subsystem init call, documented with that subsystem.

In v1 the only subsystem that requires explicit init is the bump allocator: bump_alloc called before bump_init returns 0 (the "arena exhausted" sentinel) because no arena is installed yet. Every other libp1pp function is callable from the first instruction of p1_main.

p1_main itself inherits the portable entry contract from P1: a0 = argc, a1 = argv. libp1pp does not wrap or interpose on p1_main.

Control-flow macros

All control-flow macros take braced blocks as arguments. The braces are M1PP argument delimiters; they are stripped on substitution. Inside a block, :@name and &@name local labels resolve within the macro expansion's own namespace, so nested control flow is safe.

Condition suffixes

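Each conditional family is expanded once per condition. Suffixes: eq, ne, lt, and ltu compare two registers (ra, rb); eqz, nez, and ltz test a single register against zero.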

lt and ltz are signed comparisons. ltu is unsigned. These mirror the P1 branch opcodes BEQ, BNE, BLT, BLTU, BEQZ, BNEZ, BLTZ.

%if_<cc> / %ifelse_<cc>

%if_eq(ra, rb, { body })
%if_ne(ra, rb, { body })
%if_lt(ra, rb, { body })
%if_ltu(ra, rb, { body })
%if_eqz(ra, { body })
%if_nez(ra, { body })
%if_ltz(ra, { body })

%ifelse_eq(ra, rb, { tblk }, { fblk })
%ifelse_ne(ra, rb, { tblk }, { fblk })
%ifelse_lt(ra, rb, { tblk }, { fblk })
%ifelse_ltu(ra, rb, { tblk }, { fblk })
%ifelse_eqz(ra, { tblk }, { fblk })
%ifelse_nez(ra, { tblk }, { fblk })
%ifelse_ltz(ra, { tblk }, { fblk })

%if_<cc> executes the block when the condition is true and falls through otherwise. %ifelse_<cc> executes tblk on true and fblk on false, then falls through to the code after the macro.

Neither form establishes a new frame or changes sp. A block that issues a CALL must sit inside a function that has already established a frame with ENTER.

%while_<cc> / %do_while_<cc>

%while_eq(ra, rb, { body })
%while_ne(ra, rb, { body })
%while_lt(ra, rb, { body })
%while_ltu(ra, rb, { body })
%while_eqz(ra, { body })
%while_nez(ra, { body })
%while_ltz(ra, { body })

%do_while_eq(ra, rb, { body })
%do_while_ne(ra, rb, { body })
%do_while_lt(ra, rb, { body })
%do_while_ltu(ra, rb, { body })
%do_while_eqz(ra, { body })
%do_while_nez(ra, { body })
%do_while_ltz(ra, { body })

%while_<cc> tests the condition before the body; %do_while_<cc> tests it after. In both, the condition has positive sense ("continue while ra == rb"). The operand registers are re-read on every iteration, so body may update them.

All %while_<cc> macros share a single lowering pattern so they work uniformly across conditions, including lt, ltu, and ltz which have no inverted P1 branches.

%for_lt

%for_lt(i_reg, n_reg, { body })

Counts i_reg from 0 up to but not including n_reg, with step +1, under signed comparison. On entry, i_reg is set to 0; after each body iteration, i_reg is incremented by 1; the loop exits once i_reg < n_reg is false.

n_reg is re-read each iteration, so body may update the bound. Body may read i_reg but must not otherwise modify it. If body issues a CALL, the caller is responsible for keeping i_reg live across the call; in practice, i_reg should be a callee-saved register (s0-s3) or be explicitly spilled.
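
The counting rules above can be modeled in C. This is only a sketch of the documented semantics, not the macro's lowering; for_lt_model, for_lt_body, and sum_and_clamp are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Hypothetical C model of %for_lt(i_reg, n_reg, { body }).
 * body gets the counter by value and the bound by pointer: it may
 * update the bound (re-read every iteration) but not the counter. */
typedef void (*for_lt_body)(int64_t i, int64_t *n, void *ctx);

int64_t for_lt_model(int64_t *n, for_lt_body body, void *ctx) {
    int64_t i = 0;              /* on entry, i_reg is set to 0 */
    while (i < *n) {            /* signed test against the current bound */
        body(i, n, ctx);
        i += 1;                 /* step is always +1 */
    }
    return i;                   /* counter value once i < n is false */
}

/* Example body: adds i to a running sum and lowers the bound to 3
 * when it sees i == 1. */
void sum_and_clamp(int64_t i, int64_t *n, void *ctx) {
    *(int64_t *)ctx += i;
    if (i == 1) *n = 3;
}
```

Because the comparison is signed, a negative bound means the body never runs.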

libp1pp does not provide an unsigned variant, an immediate-bound variant, a step-by-k variant, or a count-down variant. Pointer iteration and other shapes are better expressed as %while_<cc> plus explicit increments.

%loop

%loop({ body })

An unconditional loop with no built-in exit. The body runs forever unless it transfers control out by another mechanism. libp1pp does not provide %break or %continue; a loop that needs mid-body exit should use explicit labels:

:scan_loop
  ...
  BEQZ a0, &scan_end
  ...
  B &scan_loop
:scan_end

Tagged loops: %loop_tag, %while_tag_<cc>, %for_lt_tag

Tagged loops predate M1PP's %scope feature. They still work, but new code should prefer the scoped equivalents (%loop_scoped, %while_scoped_<cc>, %for_lt_scoped) paired with the generic %break / %continue — no tag argument required.

M1PP's @ local-label mechanism is scoped to the defining macro's body: an &@name token passed to a macro through an argument is not stamped and does not share a namespace with the receiving macro. Consequently, a generic %break / %continue that uses @ cannot be written.

libp1pp provides a tagged variant family for loops that need mid-body exit or explicit continue. The tag becomes a label-name prefix via ## paste, so references cross every macro boundary cleanly.

%loop_tag(tag, { body })

%while_tag_eq(tag, ra, rb, { body })
%while_tag_ne(tag, ra, rb, { body })
%while_tag_lt(tag, ra, rb, { body })
%while_tag_ltu(tag, ra, rb, { body })
%while_tag_eqz(tag, ra, { body })
%while_tag_nez(tag, ra, { body })
%while_tag_ltz(tag, ra, { body })

%for_lt_tag(tag, i_reg, n_reg, { body })

Each tagged loop emits two ordinary labels: :tag_top at the point where a %continue(tag) should land, and :tag_end immediately after the loop. For top-tested %while_tag_<cc>, tag_top names the condition test; for %for_lt_tag, tag_top names the increment-and-test block; for %loop_tag, tag_top names the head of the body.

%break(tag)

Emits B &tag_end. Transfers control out of the enclosing tagged loop.

%continue(tag)

Emits B &tag_top. Transfers control to the enclosing tagged loop's re-test / increment point.

Both %break and %continue work from arbitrary depth inside a tagged loop, including inside %if_<cc>, %ifelse_<cc>, or another nested loop's body. They resolve purely by label name, not by macro-expansion namespace.

Tags are not scoped: tag_top and tag_end are ordinary hex2 labels visible across the whole program. Tags must therefore be unique within a function, and conventionally across functions as well. The recommended style is <function>_<role> (e.g., parse_outer, scan_inner). hex2 reports a duplicate-label error if two loops share a tag; libp1pp does not detect this at macro-expansion time.

Untagged forms (%while_<cc>, %for_lt, %loop) are preferred when the body does not need %break or %continue. They nest without the user picking names, and their local labels cannot collide.
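
The tag-paste idea has a close analogue in the C preprocessor, which may help readers who know C better than M1PP. The sketch below is an analogy only: LOOP_TAG, BREAK, and CONTINUE are hypothetical C macros, and C labels are function-scoped, whereas the hex2 labels described above are visible program-wide.

```c
/* C-preprocessor analogy for tagged loops (illustrative names only).
 * The tag is pasted into two label names, so BREAK/CONTINUE resolve
 * purely by label name from any nesting depth. */
#define LOOP_TAG(tag, body) \
    tag##_top: { body } goto tag##_top; \
    tag##_end: ;
#define BREAK(tag)    goto tag##_end
#define CONTINUE(tag) goto tag##_top

/* Counts the non-negative entries that precede the first zero. */
int count_until_zero(const int *v) {
    int i = 0;
    int count = 0;
    LOOP_TAG(scan_outer,
        int x = v[i++];
        if (x == 0) BREAK(scan_outer);
        if (x < 0)  CONTINUE(scan_outer);
        count++;
    )
    return count;
}
```

Reusing the tag scan_outer twice in one C function is a duplicate-label compile error, which mirrors the hex2 duplicate-label error described above.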

Frame locals

libp1pp does not introduce a new local-variable macro. Use M1PP's %struct directly: its 8-byte stride matches WORD on P1-64, and it already synthesizes %name.SIZE for ENTER.

%struct parse_f { state cursor endp tmp }

:parse_one
ENTER %parse_f.SIZE
  ST   a0, [sp + %parse_f.state]
  ST   a1, [sp + %parse_f.cursor]
  ...
ERET

If the function stages stack-passed outgoing arguments for calls with more than four word arguments, reserve the low-addressed fields for that staging:

%struct parse_f { _o0 _o1 state cursor endp tmp }

The caller places outgoing argument word k at [sp + k * 8] immediately before the CALL, then reads locals from higher offsets. libp1pp does not otherwise enforce this convention.
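
As a cross-check of the layout, the frame can be mirrored as a C struct of 8-byte words. This is a sketch under the assumption that %struct assigns offsets in declaration order starting at 0 and that SIZE is 8 times the field count; outgoing_arg_offset is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of:
 *   %struct parse_f { _o0 _o1 state cursor endp tmp }
 * Every field is one 8-byte word, matching the P1-64 stride. */
struct parse_f {
    uint64_t _o0, _o1;                   /* outgoing-argument staging */
    uint64_t state, cursor, endp, tmp;   /* locals at higher offsets */
};

/* Byte offset of outgoing argument word k, per the [sp + k * 8] rule. */
size_t outgoing_arg_offset(size_t k) {
    return k * 8;
}
```

Under these assumptions the staging words occupy offsets 0 and 8, the first local sits at offset 16, and the frame size is 48 bytes.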

Function definition

%fn(name, size, { body })

Defines a non-leaf function named name with size bytes of frame-local storage. The expansion emits the entry label, establishes the frame, and places body inside a scope named name.

%fn is a scope-introducing-with-block macro: it pushes the scope name around body. Any %break / %continue directly in body would target name__end / name__top — which %fn itself does not define, so those should only appear inside a nested scope-introducing loop.

Example:

%struct parse_f { state cursor }

%fn(parse_number, %parse_f.SIZE, {
  ST a0, [sp + %parse_f.state]
  ST a1, [sp + %parse_f.cursor]
  ...
  BEQZ t0, &::done
  ...
  ::done
  LD a0, [sp + %parse_f.state]
})

size may be a literal byte count, a %struct SIZE reference, or any M1PP-time integer expression that the backend %enter macro accepts.

Leaf functions that need no frame do not use %fn: they write the entry label, body, and %ret() directly, and may optionally wrap the body in %scope name / %endscope if they want scoped labels.

Memory and strings

Byte-buffer primitives

memcpy(dst, src, n)         -> dst
memset(dst, byte, n)        -> dst
memcmp(a, b, n)             -> sign        # -1 / 0 / 1

memcpy does not support overlapping ranges where dst > src && dst < src + n.

memset stores only the low 8 bits of byte.

memcmp performs an unsigned byte-wise three-way compare and returns -1, 0, or 1. It stops at the first differing byte.
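
A C reference model of two of the rules above may be useful when writing tests against libp1pp. It is a sketch of the documented behavior, not the library's source; memcmp_model and memcpy_overlap_unsupported are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Three-way compare over unsigned bytes; the first differing byte
 * decides the result, which is always exactly -1, 0, or 1. */
int64_t memcmp_model(const void *a, const void *b, size_t n) {
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    for (size_t i = 0; i < n; i++) {
        if (pa[i] != pb[i])
            return pa[i] < pb[i] ? -1 : 1;
    }
    return 0;
}

/* Nonzero when memcpy(dst, src, n) would hit the unsupported case
 * dst > src && dst < src + n. Addresses are passed as integers. */
int memcpy_overlap_unsupported(uintptr_t dst, uintptr_t src, size_t n) {
    return dst > src && dst < src + n;
}
```

Note that byte 0x80 compares greater than 0x7f, because the comparison is unsigned.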

NUL-terminated strings

strlen(cstr)                -> n
streq(a_cstr, b_cstr)       -> 0 or 1
strcmp(a_cstr, b_cstr)      -> sign        # -1 / 0 / 1

strlen returns the byte count up to but not including the terminating NUL.

streq returns 1 iff the two strings are byte-equal including length.

strcmp compares byte-wise until either a differing byte is found or one side's NUL is reached, and returns the sign of the first difference (the shorter string compares less when it is a prefix of the other).
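
The comparison rules can be sketched in C as follows. The text does not say whether strcmp compares bytes as signed or unsigned; this model assumes unsigned, as documented for memcmp. strcmp_model and streq_model are hypothetical names.

```c
#include <stdint.h>

/* Walks both strings until a byte differs or a NUL is reached, then
 * returns the sign of the first difference. Bytes are compared as
 * unsigned values (an assumption borrowed from memcmp). */
int64_t strcmp_model(const char *a, const char *b) {
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    while (*pa != 0 && *pa == *pb) {
        pa++;
        pb++;
    }
    if (*pa == *pb)
        return 0;
    return *pa < *pb ? -1 : 1;
}

/* 1 iff the strings are byte-equal including length. */
int64_t streq_model(const char *a, const char *b) {
    return strcmp_model(a, b) == 0;
}
```

A proper prefix compares less because its terminating NUL (0) is smaller than any byte that continues the longer string.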

Integer parsing and formatting

Parsers

parse_dec(buf, len)         -> (value, consumed)
parse_hex(buf, len)         -> (value, consumed)

Both use the two-word direct-result convention: a0 holds the parsed integer value and a1 holds the number of bytes consumed. consumed == 0 means the input did not start with a valid literal.

parse_dec accepts an optional leading - followed by one or more decimal digits. On overflow, the result wraps modulo 2^64; detection of overflow is not part of the portable contract.

parse_hex accepts one or more hex digits (0-9, a-f, A-F). It does not consume a 0x prefix; callers handle any prefix themselves. The result is the unsigned value of the parsed digits, truncated to 64 bits.

Parsers do not skip leading whitespace.
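
The parser contract can be modeled in C with a two-field struct standing in for the (a0, a1) register pair. This is a sketch of the documented behavior only; the names are hypothetical, and the model returns value 0 when consumed == 0 even though the contract leaves that word unspecified.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two-word direct result (a0 = value, a1 = consumed). */
struct parse_result { uint64_t value; size_t consumed; };

struct parse_result parse_dec_model(const char *buf, size_t len) {
    struct parse_result r = {0, 0};
    size_t i = 0;
    int neg = 0;
    if (i < len && buf[i] == '-') { neg = 1; i++; }
    size_t first_digit = i;
    uint64_t v = 0;
    while (i < len && buf[i] >= '0' && buf[i] <= '9') {
        v = v * 10 + (uint64_t)(buf[i] - '0');   /* wraps modulo 2^64 */
        i++;
    }
    if (i == first_digit)
        return r;                 /* no digits (a lone '-' included) */
    r.value = neg ? (uint64_t)0 - v : v;
    r.consumed = i;
    return r;
}

struct parse_result parse_hex_model(const char *buf, size_t len) {
    struct parse_result r = {0, 0};
    size_t i = 0;
    while (i < len) {
        char c = buf[i];
        uint64_t d;
        if (c >= '0' && c <= '9')      d = (uint64_t)(c - '0');
        else if (c >= 'a' && c <= 'f') d = (uint64_t)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F') d = (uint64_t)(c - 'A') + 10;
        else break;                    /* 'x' of a 0x prefix stops here */
        r.value = (r.value << 4) | d;  /* high bits fall off past 64 bits */
        i++;
    }
    r.consumed = i;
    return r;
}
```

Given the input 0x1F, the hex model consumes only the leading 0, which is why callers are expected to strip any prefix first.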

Formatters

fmt_dec(buf, value)         -> n_bytes
fmt_hex(buf, value)         -> n_bytes

Both write a human-readable representation into buf, starting at offset 0, and return the number of bytes written. Neither writes a terminating NUL.

fmt_dec emits a signed decimal representation: a leading - for negative values, then one or more decimal digits. At most 20 bytes are written.

fmt_hex emits an unsigned lowercase hex representation with no prefix and no leading zeros (except that 0 is rendered as 0). At most 16 bytes are written.

Callers provide a buffer at least as large as the documented maximum.
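
The size bounds follow from the widest 64-bit values, which a C model makes easy to check. This is a sketch of the documented output format, not the library's implementation; fmt_dec_model and fmt_hex_model are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Lowercase hex, no prefix, no leading zeros; 0 renders as "0".
 * Writes at most 16 bytes and no terminating NUL. */
size_t fmt_hex_model(char *buf, uint64_t value) {
    char tmp[16];
    size_t n = 0;
    do {
        tmp[n++] = "0123456789abcdef"[value & 0xf];
        value >>= 4;
    } while (value != 0);
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];       /* digits were produced in reverse */
    return n;
}

/* Signed decimal with a leading '-' for negatives.
 * Writes at most 20 bytes ('-' plus 19 digits) and no terminating NUL. */
size_t fmt_dec_model(char *buf, int64_t value) {
    char tmp[20];
    /* Negate in unsigned arithmetic so the most negative value is safe. */
    uint64_t mag = value < 0 ? (uint64_t)0 - (uint64_t)value
                             : (uint64_t)value;
    size_t n = 0;
    size_t out = 0;
    do {
        tmp[n++] = (char)('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);
    if (value < 0)
        buf[out++] = '-';
    while (n > 0)
        buf[out++] = tmp[--n];
    return out;
}
```

The 20-byte maximum is reached only by the most negative 64-bit value; the 16-byte maximum is reached by any value with a nonzero top nibble.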

Character predicates

All predicates take a single one-byte value (passed as a word; the high bits are ignored) and return 1 or 0.

is_digit(c)                 -> 0 or 1     # '0'..'9'
is_hex_digit(c)             -> 0 or 1     # 0-9, a-f, A-F
is_space(c)                 -> 0 or 1     # ' ', '\t', '\n', '\r', '\v', '\f'
is_alpha(c)                 -> 0 or 1     # a-z, A-Z
is_alnum(c)                 -> 0 or 1     # is_alpha OR is_digit

Predicates are functions in v1 and may become macros later.
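
A C model of the predicate table, including the rule that bits above the low byte are ignored, might look like the following. It is a sketch for cross-checking only; the _model names are hypothetical.

```c
#include <stdint.h>

/* Each predicate masks its argument to one byte, then returns 1 or 0. */
uint64_t is_digit_model(uint64_t c) {
    c &= 0xff;
    return c >= '0' && c <= '9';
}

uint64_t is_alpha_model(uint64_t c) {
    c &= 0xff;
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

uint64_t is_hex_digit_model(uint64_t c) {
    c &= 0xff;
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

/* '\t', '\n', '\v', '\f', '\r' are the contiguous ASCII codes 9..13. */
uint64_t is_space_model(uint64_t c) {
    c &= 0xff;
    return c == ' ' || (c >= '\t' && c <= '\r');
}

uint64_t is_alnum_model(uint64_t c) {
    return is_alpha_model(c) || is_digit_model(c);
}
```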

IO

Raw syscall wrappers

sys_read(fd, buf, len)      -> n          # bytes read; 0 at EOF; <0 error
sys_write(fd, buf, len)     -> n          # bytes written; <0 error
sys_open(path_cstr, flags, mode)
                            -> fd         # fd >= 0 on success; <0 error
sys_close(fd)               -> r          # 0 on success; <0 error
sys_exit(code)              -> !          # does not return

These are thin wrappers over the P1 SYSCALL op. They set the syscall number themselves using the backend's %p1_sys_<name> data-word macros, marshal arguments into the syscall-argument registers, and return the raw kernel return value unchanged.

sys_open is a logical open: the backend may implement it via open or openat(AT_FDCWD, ...) as appropriate for the target.

sys_exit terminates the process with the low 8 bits of code as the exit status. It never returns.

No wrapper interprets the negative return as a specific errno. Callers that need such detail inspect a0 directly.

Print helpers

print(buf, len)             -> r          # 0 on success; <0 error
println(buf, len)           -> r          # writes buf then "\n"
print_cstr(cstr)            -> r          # writes strlen(cstr) bytes
print_int(value)            -> r          # decimal
print_hex(value)            -> r          # hex, no prefix
eprint(buf, len)            -> r
eprintln(buf, len)          -> r
eprint_cstr(cstr)           -> r

print* helpers write to fd 1; eprint* to fd 2. All return 0 on a successful write of all bytes, or a negative value if the underlying sys_write reported an error. A partial write is retried until complete or the kernel returns an error.
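
The retry rule for partial writes can be sketched in C. The writer is passed in as a function pointer so the loop can be exercised without a real file descriptor; write_all, write_fn, short_write, and fail_write are hypothetical names, and the behavior on a zero-byte write is not specified by the text above.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as sys_write: bytes written, or a negative error. */
typedef int64_t (*write_fn)(int fd, const void *buf, size_t len);

/* Returns 0 once every byte is written, or the first negative result.
 * Assumes the writer makes progress; a writer that keeps returning 0
 * would loop forever in this model. */
int64_t write_all(write_fn w, int fd, const void *buf, size_t len) {
    const unsigned char *p = buf;
    while (len > 0) {
        int64_t n = w(fd, p, len);
        if (n < 0)
            return n;              /* propagate the error unchanged */
        p += (size_t)n;            /* skip what was accepted */
        len -= (size_t)n;
    }
    return 0;
}

/* Test double: accepts at most 3 bytes per call and counts calls. */
int short_write_calls = 0;
int64_t short_write(int fd, const void *buf, size_t len) {
    (void)fd;
    (void)buf;
    short_write_calls++;
    return (int64_t)(len < 3 ? len : 3);
}

/* Test double: always fails with -5. */
int64_t fail_write(int fd, const void *buf, size_t len) {
    (void)fd;
    (void)buf;
    (void)len;
    return -5;
}
```

With short_write, a 7-byte buffer takes three calls (3 + 3 + 1 bytes) and the caller still sees a single 0 result.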

print_int and print_hex render into a small internal stack buffer, then write. They allocate no heap memory.

File helpers

read_file(path_cstr, buf, cap)
                            -> n          # bytes read, or -1

Opens path_cstr read-only, reads up to cap bytes into buf, and closes the fd. Returns the number of bytes read on success, or -1 if the file could not be opened, a read failed, or the file exceeds cap (in which case buf may have been partially written).

write_file(path_cstr, buf, len)
                            -> r          # 0 on success; -1 on error

Creates or truncates path_cstr, writes len bytes from buf, and closes the fd. Returns 0 on success or -1 if any step failed. The created file's mode is implementation-defined but intended to be a reasonable default (typically 0644).

Bump allocator

libp1pp provides a single global bump allocator. Memory is carved from a caller-supplied region; libp1pp does not own or reserve storage itself.

bump_init(base, cap)        -> 0

Installs [base, base + cap) as the live arena and sets the cursor to base. Discards any prior state. base should be word-aligned; cap should be a multiple of 8. libp1pp does not validate these.

bump_alloc(n)               -> ptr        # 0 on exhaustion

Advances the cursor by n bytes rounded up to the next multiple of 8 and returns the pre-advance cursor. Returns 0 and leaves the cursor unchanged if the rounded-up request would exceed the arena.

Returned memory is not zeroed. Callers that need zeroed memory call memset themselves.

bump_mark()                 -> saved

Returns the current cursor value as an opaque word.

bump_release(saved)         -> 0

Rewinds the cursor to a value previously returned by bump_mark. Any pointers handed out since that mark become invalid. Passing a value that was not produced by bump_mark against the currently installed arena is undefined behavior.

bump_reset()                -> 0

Rewinds the cursor to the arena's base.

v1 provides one arena. Multi-arena variants are deferred.
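
The allocator's contract is small enough to model in a few lines of C. The sketch below tracks addresses as plain integers and is not the library's implementation; the _model names and the three state variables are hypothetical stand-ins for the internal cursor/base/cap words.

```c
#include <stdint.h>

/* All zero until bump_init_model runs, so early allocations fail. */
static uint64_t bump_base, bump_cur, bump_end;

uint64_t bump_init_model(uint64_t base, uint64_t cap) {
    bump_base = base;
    bump_cur = base;               /* discards any prior state */
    bump_end = base + cap;
    return 0;
}

uint64_t bump_alloc_model(uint64_t n) {
    uint64_t rounded = (n + 7) & ~(uint64_t)7;   /* next multiple of 8 */
    if (rounded < n || rounded > bump_end - bump_cur)
        return 0;                  /* exhausted; cursor unchanged */
    uint64_t p = bump_cur;
    bump_cur += rounded;
    return p;                      /* pre-advance cursor */
}

uint64_t bump_mark_model(void) {
    return bump_cur;
}

uint64_t bump_release_model(uint64_t saved) {
    bump_cur = saved;              /* caller guarantees saved is valid */
    return 0;
}

uint64_t bump_reset_model(void) {
    bump_cur = bump_base;
    return 0;
}
```

A typical pattern is to take a mark before a batch of temporary allocations and release it afterwards, which frees the whole batch in one step while keeping earlier allocations intact.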

Panic and assertions

panic

panic(msg_cstr)             -> !

Writes msg_cstr followed by "\n" to fd 2, then calls sys_exit(1). Does not return.

panic is used from libp1pp internally only for unrecoverable programmer errors (none in v1). User code is encouraged to use it for its own invariant violations.

%assert_<cc> macros

%assert_eq(ra, rb, msg_label)
%assert_ne(ra, rb, msg_label)
%assert_lt(ra, rb, msg_label)
%assert_ltu(ra, rb, msg_label)
%assert_eqz(ra, msg_label)
%assert_nez(ra, msg_label)
%assert_ltz(ra, msg_label)

Each macro asserts that the named condition holds and calls panic with msg_label (a NUL-terminated string label in the program image) if it does not. They lower to a B<cc> past an LA a0, msg_label / CALL &panic sequence, so the non-failure path adds no runtime cost beyond the original branch.

Because the failure path issues a CALL, %assert_* may be used only in functions that have established a frame with ENTER.

Not in v1

The following were considered and deferred: