Scheme in P1

Goal

A small tree-walking interpreter for the Scheme subset specified in LISP.md, written in P1 via the M1PP + M1 + M0 + hex2 toolchain. Intent: simplicity and correctness first; performance is a later concern.

The self-hosted C compiler (see PLAN.md) will run under this interpreter, as will shell-style scripts built on shell.scm and prelude.scm.

Toolchain

scheme.P1                 (hand-written M1PP source using P1.M1pp)
   | prepend arch backend + P1.M1pp
   | m1pp
   v
scheme.M1                 (target-specific M1 ops)
   | M0
   v
scheme.hex2
   | hex2
   v
scheme                    (native ELF)

C-shaped snippets in this document are descriptive of the target shape, not the source. The source is M1PP-preprocessed P1 assembly.

Word size

P1-64 is the initial target. Every heap word is 8 bytes; every object is 8-byte aligned. P1-32 is out of scope for the first pass; porting would require a parallel layout story because M1PP's %struct stride is fixed at 8.

Settled decisions

Load-bearing; the rest of the document assumes them.

Tree-walking eval/apply over raw s-expressions. No AST preprocessing, no bytecode, no compilation step. The reader's output is the interpreter's input.
Static buffers, no GC. Fixed-size labeled byte reservations for heap, symbol table, reader buffer, writer buffer. Heap exhaustion aborts with an error.
Bump allocation, 8-byte aligned. Single heap_next pointer (pinned in s0); every object is 8-byte aligned by construction.
Tagged values, low-3-bit tag. See §Value representation.
Symbols are interned indices. No heap object per symbol; the value is an index into the symbol table, which stores the name bytevector.
Environment is a flat association list. ((sym . val) ...), extended by consing onto the front. Lookup walks the list; falls through to the global slot in the symbol table.
Closures capture the env pointer directly. No free-variable analysis, no flat closures, no boxes. A closure is four words: [header, params, body, env].
No set! on captured variables. Language-level rule. The implementation does not enforce it; behavior is unspecified if violated. Use a pair for shared mutable state.
Inner define is sequential-only. Visible to later forms in the same body, not to earlier forms. Use letrec for mutual recursion between local helpers.
pmatch is a built-in special form. Saves writing a macro expander for one feature.
Tail calls via P1 TAIL / TAILR. Every tail position in eval and apply compiles to TAIL (label) or TAILR (register) rather than CALL + RET. Scheme-level tail-call correctness falls out.
Primitive failure is undefined behavior. No runtime checks on (car '()), (quotient n 0), out-of-bounds bytevector access, integer overflow, or mutation of literals. Follows the spec's "Primitive failure" policy.

Memory layout

Labeled byte reservations in the final image. Starting sizes:

Region	Size	Purpose
`heap`	64 MB	bump-allocated Scheme objects
`symtab`	16384 ×	open-addressing intern table
`readbuf`	16 MB	slurped source file
`writebuf`	64 KB	staged output for display/write
`argbuf`	64 × 8 B	primitive dispatch scratch

All sized via M1PP constants; easy to retune. Overflow of any buffer → error(...) → sys_exit(1).

Symbol table entry

struct sym_entry {
    val_t    name_bv;      // interned name, heap bytevector
    val_t    global_val;   // top-level binding (or UNBOUND)
    uint32_t name_hash;
    uint32_t chain_next;
};

global_val gives O(1) top-level lookup: every symbol's global binding is reachable without hash probing once the symbol is resolved.

M1PP layout (widened to word-per-field for 8-byte stride):

%struct SYMENT { name_bv global_val name_hash chain_next }
; .name_bv = 0, .global_val = 8, .name_hash = 16, .chain_next = 24,
; .SIZE = 32

The hash and next fields are logically 32-bit but occupy a full word each to stay on the %struct stride; simpler than packing.

The symbol table and heap are the only global roots.

Value representation

All values are 64-bit tagged words. Low 3 bits are the tag.

Tag	Kind	Notes
`000`	fixnum	61-bit signed; `>> 3` to recover
`001`	pair pointer	16 B aligned; `[p-1]` car, `[p+7]` cdr
`010`	symbol	`(idx << 3) \| 2`; idx into `symtab`
`011`	headered heap object	`[p-3]` is the header word
`100`	immediate singleton	`#f`, `#t`, `()`, unspec, unbound
`101`–`111`	reserved

Tag and immediate constants

Encoded as M1PP %enums. Concrete values derive via expression:

%enum TAG { FIXNUM PAIR SYM HEAP IMM }          ; raw indices 0..4
%enum IMM { FALSE TRUE NIL UNSPEC UNBOUND }     ; raw indices 0..4

; a tag 100 value in low bits is literally %TAG.IMM == 4
%macro MKIMM(idx)
$((| (<< idx 3) %TAG.IMM))
%endm

; usage: %MKIMM(%IMM.FALSE), %MKIMM(%IMM.NIL), etc.

Truthiness: every value except %MKIMM(%IMM.FALSE) is truthy. Test is a single compare.

Object types

This is the complete runtime type set.

Type	Representation
fixnum	immediate (tag 000)
boolean	immediate (`FALSE`, `TRUE`)
empty list	immediate (`NIL`)
unspec / unbound	immediates
pair	`[car][cdr]`, 16 B, tag 001, no header
bytevector	`[hdr = BV \| len<<8][bytes...]`, tag 011
closure	`[hdr = CLOSURE][params][body][env]`, tag 011
primitive	`[hdr = PRIM \| entry_label]`, tag 011
record-td	`[hdr = TD][name_sym][nfields]`, tag 011
record	`[hdr = REC][td][field_0]...[field_{n-1}]`, tag 011

Headered object header word (little-endian bytes):

[ type:8 | aux:56 ]

where aux carries type-specific data (length for bytevectors, entry-label for primitives, nfields for records, etc.). One load tells you the type.

Layouts

%struct PAIR    { car cdr }                      ; .SIZE = 16
%struct CLOSURE { hdr params body env }          ; .SIZE = 32
%struct BV      { hdr }                          ; bytes follow inline
%struct PRIM    { hdr }                          ; header encodes entry
%struct TD      { hdr name_sym nfields }         ; .SIZE = 24
; record is variable-width; td+nfields known from TD
%struct REC_HDR { hdr td }                       ; field_i at offset 16 + 8*i
%enum HDR { BV CLOSURE PRIM TD REC }             ; type-byte values

Strings and bytevectors are the same type. string? ≡ bytevector?. No character type: #\a in source is sugar for fixnum 97.

Primitive encoding

An M1 label is 4 bytes. A primitive fits in a single 8-byte header:

bytes [0..4): &entry_label       ; 4-byte hex2 label reference
bytes [4..6): arity              ; @(arity)   little-endian 2 bytes
byte  [6]  : type = %HDR.PRIM    ; !(%HDR.PRIM)
byte  [7]  : 0                   ; !(0)

This makes apply_prim a header-load plus callr: no dispatch table, no switch. §Apply sketches the sequence.

Tag idioms

A handful of short sequences do all of the runtime's bit-twiddling. They are M1PP macros so the rest of the code reads at the Scheme level. These are illustrative; exact op selection is finalized during implementation.

%macro tagof(rd, rs)            ; rd = rs & 7
%andi(rd, rs, 7)
%endm

%macro make_fixnum(rd, rs)      ; rd = rs << 3
%shli(rd, rs, 3)
%endm

%macro untag_fixnum(rd, rs)     ; rd = rs >> 3, arithmetic
%sari(rd, rs, 3)
%endm

%macro car(rd, rs)              ; pair tag is 001; raw = rs - 1
%ld(rd, rs, -1)                 ; [raw + 0] = car
%endm

%macro cdr(rd, rs)
%ld(rd, rs, 7)                  ; [raw + 8] = cdr
%endm

%macro set_car(rs, rv)
%st(rv, rs, -1)
%endm

%macro set_cdr(rs, rv)
%st(rv, rs, 7)
%endm

%macro hdr_load(rd, rs)         ; heap tag is 011; raw = rs - 3
%ld(rd, rs, -3)
%endm

%macro hdr_type(rd, rs)         ; type byte is low byte of header
%lb(rd, rs, -3)
%endm

%macro hdr_field(rd, rs, idx)   ; word field idx, 0 counts the header word
%ld(rd, rs, (- (* idx 8) 3))
%endm

cons is a bump + two stores + tag:

%macro cons(rd, rcar, rcdr, rtmp)
; assumes s0 = heap_next, s1 = heap_end
%addi(rtmp, s0, 16)
%la_br(heap_oom)
%bltu(s1, rtmp)                 ; if heap_end < next+16, OOM
%st(rcar, s0, 0)
%st(rcdr, s0, 8)
%addi(rd, s0, 1)                ; tag 001
%mov(s0, rtmp)
%endm

alloc_hdr(rd, type_byte, bytes, rtmp) is the corresponding idiom for headered objects; it bumps, writes the header word, and returns the tagged pointer. The header word is composed on the fly from the type byte and any aux field.

Dispatch tables

P1 has no indexed branch op, but it has BR rs / CALLR rs / TAILR rs and one-word loads. That's enough to build a jump table for any switch wider than ~3 cases, which we use for the special-form cascade in eval.

Each entry is 8 bytes: a 4-byte M1 label reference plus 4 bytes of pad. P1-64 has no 4-byte load, only full-word LD, so padding to a word is the cheap path.

%macro table_entry(lref)
lref %(0)                       ; label ref + 4 zero bytes = 8 bytes
%endm

Usage (per special form, in index order):

special_dispatch_table:
%table_entry(&eval_quote)
%table_entry(&eval_if)
%table_entry(&eval_lambda)
%table_entry(&eval_define)
%table_entry(&eval_set)
%table_entry(&eval_begin)
...

Dispatch is load + indirect branch:

; t0 = sym_idx, already range-checked (< N_SPECIAL)
%la(t1, special_dispatch_table)
%shli(t2, t0, 3)
%add(t1, t1, t2)
%ld(t1, t1, 0)
%tailr(t1)

Five ops and one memory load, regardless of how many special forms.

Reservation convention. Special-form symbols are interned first at startup, so their sym indices are 0..N_SPECIAL-1. The dispatcher checks idx < N_SPECIAL before using the table; out-of-range means "ordinary application".

Allocation

C sketch (target shape):

static val_t alloc(size_t bytes) {
    char *p = heap_next;
    heap_next += (bytes + 7) & ~7;
    if (heap_next > heap_end) error("heap exhausted");
    return (val_t) p;
}

P1 translation: s0 and s1 are pinned to heap_next and heap_end for the lifetime of the program. Allocation is addi + bltu + mov. In practice almost every allocation site calls one of %cons, %alloc_hdr, or an inline bytevector emitter, not a generic alloc.

Register assignment

Four callee-saved registers carry the hot state:

Reg	Role
`s0`	`heap_next`
`s1`	`heap_end`
`s2`	current env pointer
`s3`	scratch held across one inner loop at a time

eval(expr, env) is a0 = expr, a1 = env, returns in a0. Handlers copy a1 into s2 at entry. a0..a3 and t0..t2 are free across calls. Leaf code inside a handler may use s3 without saving, provided it restores before the handler returns.

Reader, writer, and primitive bodies follow the same discipline. Any primitive that needs deep recursion (e.g. equal?) saves s2 in its frame even if it does not touch env, since its callees might.

Closures

[ hdr = CLOSURE ][ params ][ body ][ env ]

params: parameter list s-expression: (a b c), (a b . rest), or a bare symbol rest for a fully variadic lambda.
body: the list of body forms, evaluated sequentially; last in tail position.
env: the environment chain captured at lambda evaluation.

Four words total. No free-var analysis at closure creation — the whole env chain is retained and walked at lookup time.

Environment

env  ::=  NIL
       |  ( (sym . val) . env )

Lookup. Walk the list, pointer-compare the car of each pair with the target symbol, return the cdr on match. If the walk ends at NIL, fall through to symtab[idx].global_val; if that is UNBOUND, error out with "unbound variable".
Extension (scope entry). Cons new (sym . val) pairs onto the front.
set!. Walk the env chain; on match, set-cdr! the pair. If the chain is exhausted, write to symtab[idx].global_val.
Top-level define. Write to symtab[idx].global_val.
Inner define. Handled inline by eval_body (see below). Each inner define prepends (name . val) to the local env as it is encountered.

The global env is not a chain — it's the symbol table. Closures captured before a top-level define nonetheless see the new binding, because the fallback is to symtab[idx].global_val at lookup time, not at capture time. This is how mutual recursion between top-level functions works without any pre-pass.

Eval

C sketch (target shape):

val_t eval(val_t expr, val_t env) {
    switch (tag(expr)) {
        case TAG_FIXNUM: return expr;
        case TAG_IMM:    return expr;
        case TAG_SYM:    return env_lookup(expr, env);
        case TAG_HEAP:   return expr;   // bv, closure, record — self-eval
        case TAG_PAIR:   break;         // form; dispatch below
    }

    val_t head = car(expr);
    val_t rest = cdr(expr);

    if (head == SYM_QUOTE)      return car(rest);
    if (head == SYM_IF) {
        val_t t = eval(car(rest), env);
        val_t next = (t != FALSE) ? cadr(rest) : caddr(rest);
        MUSTTAIL return eval(next, env);
    }
    if (head == SYM_LAMBDA)     return make_closure(car(rest), cdr(rest), env);
    if (head == SYM_DEFINE)     return eval_define(rest, env);
    if (head == SYM_SET)        return eval_set(rest, env);
    if (head == SYM_BEGIN)      MUSTTAIL return eval_begin(rest, env);
    if (head == SYM_LET)        MUSTTAIL return eval_let(rest, env);
    if (head == SYM_LETSTAR)    MUSTTAIL return eval_letstar(rest, env);
    if (head == SYM_LETREC)     MUSTTAIL return eval_letrec(rest, env);
    if (head == SYM_COND)       MUSTTAIL return eval_cond(rest, env);
    if (head == SYM_AND)        MUSTTAIL return eval_and(rest, env);
    if (head == SYM_OR)         MUSTTAIL return eval_or(rest, env);
    if (head == SYM_PMATCH)     MUSTTAIL return eval_pmatch(rest, env);
    if (head == SYM_DEFINE_RT)  return eval_define_record_type(rest, env);

    // application
    val_t fn = eval(head, env);
    val_t args = eval_list(rest, env);
    MUSTTAIL return apply(fn, args);
}

The special-form symbols (SYM_QUOTE, SYM_IF, etc.) are interned once at startup and stored in labeled word slots. Dispatch is a cascade of integer compares.

P1 translation. The outer tag switch is a short cascade of beqs on the tag value (fixnum / imm / sym / heap return immediately; pair falls through to form dispatch). Form dispatch uses the jump table from §Dispatch tables:

; eval_pair path, a0 = expr, a1 = env already normalized
%car(t1, a0)                    ; t1 = head
%cdr(a1, a0)                    ; a1 = rest, set up for handlers

; fall through to ordinary application if head is not a symbol
%andi(t2, t1, 7)
%la_br(eval_apply)
%addi(t3, t2, (- %TAG.SYM))
%bnez(t3)                       ; tag != SYM -> apply

; extract idx and bounds-check
%sari(t0, t1, 3)                ; t0 = sym_idx
%la(t2, N_SPECIAL_const)
%ld(t2, t2, 0)
%la_br(eval_apply)
%bltu(t0, t2)                   ; invert: take table path iff idx < N
...

; table dispatch
%la(t1, special_dispatch_table)
%shli(t2, t0, 3)
%add(t1, t1, t2)
%ld(t1, t1, 0)
%tailr(t1)

(The bounds-check polarity and the branch for "take table path" are sketched loosely; the final shape is whatever combination of BLT/BLTU/BEQ produces the fewest ops.)

Each handler is its own P1 function. Tail positions inside handlers end with %la_br(target); %tail(). An ordinary application is eval(head), eval_list(rest), then %la_br(apply); %tail().

Apply

C sketch:

val_t apply(val_t fn, val_t args) {
    switch (header_type(fn)) {
        case HDR_PRIM:
            return call_prim(fn, args);
        case HDR_CLOSURE: {
            val_t env = bind_params(clo_params(fn), args, clo_env(fn));
            MUSTTAIL return eval_body(clo_body(fn), env);
        }
        default:
            error("not a procedure", fn);
    }
}

bind_params walks params and args in lockstep. Variadic .-tail: when params becomes a non-pair symbol, that symbol is bound to the remaining args list. When params becomes NIL with args remaining (or vice versa), error out with an arity message.

P1 translation. hdr_type reads one byte; dispatch is two compares:

; apply(fn=a0, args=a1)
%hdr_type(t0, a0)
%la(t1, %HDR.PRIM)
%la_br(apply_prim)
%beq(t0, t1)
%la(t1, %HDR.CLOSURE)
%la_br(apply_closure)
%beq(t0, t1)
%la_br(err_not_procedure)
%b()

apply_prim extracts the entry label from the primitive's header (low 4 bytes) and callrs it with the arg list in a0:

apply_prim:
%hdr_load(t0, a0)               ; t0 = header word
%andi(t0, t0, 0xFFFFFFFF)       ; low 32 bits = &entry
%mov(a0, a1)                    ; arg list in a0 for the primitive
%callr(t0)
%ret()

apply_closure runs bind_params, extends env, then %la_brs eval_body and %tail()s.

Eval_body

val_t eval_body(val_t body, val_t env) {
    while (is_pair(cdr(body))) {
        val_t form = car(body);
        if (is_pair(form) && car(form) == SYM_DEFINE) {
            val_t name = define_name(form);
            val_t val  = eval(define_rhs(form), env);
            env = cons(cons(name, val), env);
        } else {
            eval(form, env);
        }
        body = cdr(body);
    }
    val_t last = car(body);
    if (is_pair(last) && car(last) == SYM_DEFINE) {
        val_t name = define_name(last);
        val_t val  = eval(define_rhs(last), env);
        env = cons(cons(name, val), env);
        return UNSPEC;
    }
    MUSTTAIL return eval(last, env);
}

Inner define is sequential-only: visible to forms that textually follow it in the same body, not to earlier forms. For mutual recursion between local helpers, use letrec or lift to top level. The SYM_DEFINE handler in eval itself handles only top-level define; inner defines are absorbed here and never reach that path.

Special forms

Implemented directly, each ~30-60 P1 ops:

quote — return the datum.
if — eval test; tail-eval the chosen branch.
lambda — build closure (one %alloc_hdr).
define — top-level: write symtab[idx].global_val. Inner: handled by eval_body.
set! — walk env; first match mutates. Miss: write global slot.
begin — eval each form in sequence; tail-eval the last.
let — eval all inits in the current env; extend env with bindings; tail-eval body via eval_body.
let* — extend env one binding at a time, each init evaluated in the progressively extended env.
letrec — pre-bind each name to UNSPEC in a new env; eval each init in that env; set! each binding; tail-eval body.
let (named) — desugar inline to letrec + call, or implement directly.
cond — loop clauses; first truthy test wins; tail-eval its body; else always true.
and / or — short-circuit fold; return the deciding value, not a coerced boolean.
pmatch — eval_pmatch clause loop + pmatch_match recursive walker. See §pmatch below.
define-record-type — see below.

pmatch

Surface spec lives in LISP-PMATCH.md. This section covers integration into the scheme1 spine that's already in scheme1/scheme1.P1pp.

Why built-in (not a library)

Pattern bindings extend the interpreter's env via cons. The runtime exposes no (eval expr env) primitive and no macro system, so a user-level pmatch could not introduce bindings visible to its clause body. Building it in is the smallest path; the alternative is adding either macros or first-class envs, both bigger commitments than one special form.

Reader integration — no refactor

pmatch operates on already-parsed s-expressions. By the time eval reaches (pmatch e (pat body...) ...), every pattern is a heap-consed datum produced by the existing parse_one / parse_list recursion. The matcher walks those datums; it never re-invokes the reader. So nothing in the reader needs restructuring.

The one additive change is a new branch in parse_one's leading-byte dispatch: , (ASCII 44) → consume the byte, recurse parse_one for the inner datum, build (unquote <datum>). This is a near-copy of the existing ' branch (::quote in parse_one) — same shape, different head symbol:

::comma                       ;; mirrors ::quote
%la(t2, &readbuf_pos)
%ld(t0, t2, 0)
%addi(t0, t0, 1)
%st(t0, t2, 0)
%call(&parse_one)
%li(a1, %imm_val(%IMM.NIL))
%call(&cons)
%la(t0, &sym_unquote)
%ld(t0, t0, 0)
%mov(a1, a0)
%mov(a0, t0)
%tail(&cons)

,@ is not a separate token: @ is already in is_ident_byte's set, so ,@x parses as (unquote @x) and the matcher then treats @x as a (legal but odd) binder identifier. An explicit msg_bad_unquote_splice is optional polish, not load-bearing.

There is no quasiquote evaluator, so ,x written outside a pmatch pattern evaluates as (unquote x) — an ordinary application of an unbound unquote — and dies through the existing unbound variable path. No new failure mode.

New cached symbol slots

Four labeled slots, interned in intern_special_forms alongside the existing ones (sym_quote, sym_if, …):

Slot	Purpose
`sym_pmatch`	eval's pair-branch dispatch
`sym_unquote`	pattern-side: detect `(unquote x)` — i.e., a `,x` binder
`sym_guard`	clause shape: `((guard g…) …)` vs. plain body
`sym_underscore`	wildcard test (the literal symbol `_` after `unquote`)

sym_else is already cached for cond; pmatch reuses it for the else clause. Caching sym_underscore (vs. byte-comparing the name) costs one slot and turns the wildcard test into a single beq.

Eval-side dispatch

Two new lines in eval's pair branch — same shape as every other special form already wired up there:

%dispatch_form(&sym_pmatch, &::do_pmatch)   ;; in the cascade
...
::do_pmatch                                  ;; among the tail handlers
%tail_to_handler(&eval_pmatch)

The cascade is order-independent (each row is a pointer-compare on a distinct cached symbol), so position in the list doesn't matter.

eval_pmatch(rest=a0, env=a1) → value (a0)

rest is (subject-expr . clauses). Subject is evaluated once, then clauses walk top-to-bottom restarting from the outer env each time.

Frame: 32 bytes
  +0  subject       ; eval(car(rest), env) once, then read-only
  +8  env_outer     ; restart point per clause; passed to pmatch_match
  +16 clauses       ; cdr(rest); advances per non-match
  +24 spill         ; clause body / extended env across guard eval

Per-clause shape (same control flow as eval_cond's clause loop, just with a richer match step):

loop:
    if clauses == NIL: die msg_pmatch_no_match
    clause = car(clauses)
    pat    = car(clause)
    tail   = cdr(clause)            ; (body...) or ((guard ...) body...)

    if pat eq? sym_else:
        a0 = tail; a1 = env_outer; tail-call eval_body
        ;; we don't validate that an `else` clause has no (guard ...) head;
        ;; per LISP-PMATCH.md grammar that's malformed source, but any
        ;; (guard ...) here will just be evaluated as a body form, which
        ;; will die through the unbound `guard` path.

    call pmatch_match(pat, subject, env_outer)   ; -> (env_ext=a0, ok=a1)
    if ok == 0: advance clauses; loop

    ; guard clause? — tail begins with (guard g...)
    if pair? tail and pair? car(tail) and car(car(tail)) eq? sym_guard:
        guards = cdr(car(tail))
        body   = cdr(tail)
        for each g in guards:                ; AND-fold, short-circuit
            if eval(g, env_ext) == FALSE: advance clauses; loop
    else:
        body = tail

    a0 = body; a1 = env_ext; tail-call eval_body

eval_body already preserves tail position for the body's last form, so the matched clause's last expression is in tail position of the surrounding pmatch — same as cond, let, begin.

pmatch_match(pat=a0, subj=a1, env=a2) → (env'=a0, ok=a1)

Pure leaf-ish recursive helper. Dispatches on the pattern's tag. Output convention mirrors parse_dec: a0 carries the extended env on success, a1 is the 1/0 ok flag; on failure a0 is undefined and the caller must not use it.

Pattern shape	Match condition
`IMM.NIL`	`subj == NIL`
`TAG.FIXNUM`	raw word compare `subj == pat` (covers integers and chars)
`TAG.IMM` (`#t`/`#f`)	raw word compare
`TAG.SYM`	raw word compare (symbols are interned, so this is `eq?`)
`TAG.HEAP` (bv)	byte-for-byte equality via `bv_equal_raw` — see below
`TAG.PAIR`	binder / wildcard if `car(pat) eq? sym_unquote`; else structural recurse

Frame

Frame: 24 bytes
  +0  pat       ; preserved across cons (binder) and across the car recursion
  +8  subj      ; same — needed in cons in binder case, and as cdr(subj) after recurse
  +16 env       ; saved across the cons calls in the binder case

All three are stored on entry; nothing gets unsaved on the way out because the function returns through %eret after computing a0/a1.

Binder / wildcard

phead = car(pat)
if phead eq? sym_unquote:
    ;; validate shape: pat must be (unquote <sym>), exactly two elements
    if tag(cdr(pat))         != PAIR: die msg_bad_unquote_pattern
    if tag(cdr(cdr(pat)))    != IMM.NIL: die msg_bad_unquote_pattern
    pident = car(cdr(pat))
    if tag(pident)           != SYM:  die msg_bad_unquote_pattern

    if pident eq? sym_underscore:
        return (env, ok=1)             ;; ,_  — no binding
    return (cons(cons(pident, subj), env), ok=1)

Validation is per-call and cheap — three tag compares before any cons. A defined runtime error here (rather than UB) is the one carve-out from the "primitive failure is UB" policy: pattern shape is a syntax error in the user's source, not a hot-path concern.

Structural pair

;; phead is anything else: a literal symbol, a nested pattern, NIL, ...
if tag(subj) != PAIR: return (_, ok=0)
(env1, ok) = pmatch_match(car(pat), car(subj), env)
if !ok: return (_, ok=0)
return       pmatch_match(cdr(pat), cdr(subj), env1)    ;; tail position

Two LISP-PMATCH.md guarantees fall out of this shape:

Improper-tail patterns — the cdr-recursion accepts any pattern as ptail. ,rest there binds the list tail (proper or improper); () there demands a proper list.
Nested patterns — descending in lockstep handles arbitrary depth; no special case in the matcher.

Partial-match rollback — none needed

env is passed by register, never mutated. A failed sub-match returns ok=0 without having shadowed the caller's env pointer. The next clause restarts from env_outer stashed in the eval_pmatch frame, so any sub-bindings built during the failed match are simply unreachable and will be reclaimed when the heap is reset.

Allocation cost

Successful binder: one cons for (ident . subj) plus one cons to push it onto env — same as a let binding.
Wildcard: zero allocation.
Failed match: zero allocation past the failure point (the recursion bails on the first mismatch).
No pattern compilation, no decision-tree caching. Patterns are re-walked per call. Revisit if profiling demands it; until then this matches the rest of scheme1 (tree-walk eval, no preprocessing).

Bytevector equality — factor out of `equal?`

The existing equal? primitive contains the byte-for-byte loop already; pmatch_match needs the same logic but on raw val_ts, not on an arg list. Factor a leaf helper:

;; bv_equal_raw(a=a0, b=a1) -> a0 = 1 if equal else 0
;; assumes both inputs are HEAP-tagged bytevectors

Reshape equal?'s bv branch to mov/mov its two unpacked operands into a0/a1 and tail into bv_equal_raw. pmatch_match calls the same helper directly. Both keep their existing public behavior; the duplicated inline loop disappears.

Guards

Evaluated in the extended env, short-circuit AND on FALSE. The inner loop reuses the same compare against %imm_val(%IMM.FALSE) that eval_and already uses; no new helper. A rejected guard advances the clause cursor — it does not backtrack against the subject.

No-match die path

msg_pmatch_no_match routed through runtime_error, same path as every other die in the interpreter. LISP-PMATCH.md classes this as UB; in practice we always crash with a clear message rather than wander.

File placement

Both routines live in the §Special forms section of scheme1/scheme1.P1pp, slotting in alongside eval_cond / eval_let / etc.:

eval_pmatch — the special-form entry point, called via the ::do_pmatch tail handler in eval's pair branch.
pmatch_match — leaf-ish recursive helper, defined immediately below eval_pmatch (its only caller).
bv_equal_raw — leaf helper. Lives near the existing equal? primitive in §Primitives, since both call into it.

intern_form rows for the four new symbol slots go into intern_special_forms, in any position. parse_one's , branch goes alongside the existing ' branch in the reader section.

define-record-type

Desugars at eval time into a sequence of primitive operations.

(define-record-type point
  (make-point x y)
  point?
  (x point-x)
  (y point-y set-point-y!))

becomes

(define <point-td>   (%make-record-td 'point 2))
(define make-point   (lambda (x y) (%make-record <point-td> x y)))
(define point?       (lambda (v) (%record-is-a? v <point-td>)))
(define point-x      (lambda (p) (%record-ref p 0)))
(define point-y      (lambda (p) (%record-ref p 1)))
(define set-point-y! (lambda (p v) (%record-set! p 1 v)))

The % primitives are internal, not part of the user-facing primitive list.

Primitives

55 user-facing primitives plus 6 internal record ops. Each is its own P1 function, reached via callr on the label packed into the primitive object's header.

Predicates / equality (13): eq?, equal?, not, boolean?, pair?, null?, symbol?, integer?, string? (≡ bytevector?), procedure?, record?, record-is-a?, record-type?.

Pairs (5): cons, car, cdr, set-car!, set-cdr!.

Integers (24): +, -, *, quotient, remainder, modulo, =, <, <=, >, >=, zero?, positive?, negative?, abs, min, max, bit-and, bit-or, bit-xor, bit-not, arithmetic-shift, number->string, string->number.

Bytevectors / strings / symbols (10): make-bytevector, bytevector-length, bytevector-u8-ref, bytevector-u8-set!, bytevector-copy, bytevector-copy!, bytevector-append, bytevector=?, string->symbol, symbol->string.

Control / program / error (3): apply, argv, error.

Internal record ops (6): %make-record-td, %make-record, %record-ref, %record-set!, %record-is-a?, %record-td.

Calling convention

apply_prim receives the arg list in a0 and jumps via callr to the primitive's entry label (extracted from the primitive header). Each primitive's body:

Walks the arg list, copying values into argbuf[] and counting argc.
Checks arity (fixed-arity primitives; variadic ones do their own).
Computes result into a0.
rets.

There is no code-id switch and no dispatch table — the table of primitives is the set of labels referenced from register_primitives at startup, and the interpreter keeps zero pointers to it afterwards.

Prelude

Written in Scheme, embedded as a bytevector literal in scheme.P1, parsed and evaluated at startup like any other source. Contents:

(define (list . xs) xs)
(define (length xs) ...)
(define (reverse xs) ...)
(define (append . lists) ...)
(define (list-ref xs i) ...)
(define (map f . lists) ...)
(define (for-each f . lists) ...)

Runs before the user file, sharing the same symbol table and heap. ~80 Scheme LOC.

Reader

Single-pass recursive descent over readbuf with an integer cursor. No token array. Conses directly onto the heap.

Handles the lexical syntax specified in LISP.md:

lists with . dotted tail
'x → (quote x)
,x → (unquote x) — datum sugar only, consumed by pmatch; no unquote evaluator. No quasiquote, no ,@ / unquote-splicing.
fixnums: decimal and #x-prefixed hex, both with optional sign
#t, #f
"..." strings with \n \t \r \\ \" escapes
#\a..#\~, #\newline, #\space, #\tab, #\return, #\null, #\xNN → read as fixnums
symbols: identifier grammar, case-sensitive
; line comments (no block comments, no #;)

Source locations

As the reader conses each pair, it records pair_ptr → (line . col) in a side table (fixed-size open-addressing hash). error consults this when its first argument is a pair and prefixes the message with "at file:line:col: ". Without a GC, entries for collected pairs never arise; a full table simply overwrites old entries via linear probing.

Writer

Three entry points:

display — no quoting on strings; bare character output.
write — readable form; strings quoted.
format fmt args... — supports ~a (display), ~s (write), ~d (decimal fixnum), ~% (newline).

Output buffered in writebuf; flushed via sys_write(1, ...) on buffer-full or on newline for display/write. error flushes before writing to fd 2.

Error path

(error msg arg ...):

Flush writer buffer.
Write "error: " + message + formatted args to fd 2 via sys_write.
sys_exit(1).

No unwinding, no handlers. Primitive failure follows the same policy as the spec: undefined behavior, likely crash, no runtime check.

Startup

P1 defines a portable program-entry model (see P1.md §Program Entry): the arch backend emits _start, captures argc/argv off the native entry stack, and calls the portable label p1_main with a0 = argc, a1 = argv. On return, the backend does sys_exit(a0).

The Scheme interpreter therefore contains no per-target startup code. It just defines p1_main:

p1_main:
    ; a0 = argc, a1 = argv
    init_heap                 ; s0 = &heap, s1 = &heap_end
    init_symtab
    intern_special_forms      ; fill labeled slots sym_quote, sym_if, ...
                              ; (order reserves sym_idx 0..N_SPECIAL-1)
    register_primitives       ; emit %HDR.PRIM words with &entry labels
    parse_and_eval_prelude    ; prelude is an embedded bytevector literal
    ; slurp argv[1] into readbuf
    ; loop: read_one_top_level; eval(form, NIL); advance cursor; until EOF
    li  a0, 0
    ret                       ; backend _start does sys_exit(0)

All state lives in labeled reservations; nothing is heap-allocated before p1_main.

File layout

boot2/
  src/
    scheme.P1             # M1PP source, consumes P1.M1pp macro library
  lisp/
    prelude.scm           # embedded into scheme.P1 as a bytevector literal
  tests/lisp/
    00-arith.scm
    01-list.scm
    02-reader.scm
    03-printer.scm
    04-eval.scm
    05-tailcall.scm
    06-pmatch.scm
    07-records.scm
    ...

Build: prepend the arch-specific backend M1M to p1/P1.M1pp, prepend that bundle to src/scheme.P1, run m1pp → M0 → hex2 → ELF. Tests are .scm files the interpreter runs; pass = exit 0 and expected stdout.

Milestones

Status legend: [x] done · [~] in progress · [ ] not started.

Bootstrapping shim. Per-target _start, argv capture, labeled reservations for heap / symtab / readbuf / writebuf, sys_write + sys_exit error path. Exits with a placeholder status.
Tag macros. %tagof, %make_fixnum, %untag_fixnum, %car, %cdr, %hdr_*. Verify constants at M1PP time with %select.
Bump allocator and cons. %cons macro; OOM branch. Hand-build (cons 42 nil) and exit with the decoded car.
Symbols + interning. 16k-slot open-addressing table; intern(name, len) returns a tagged symbol val_t. Two interns of the same name compare equal as integers.
Bytevectors. Headered heap object; make-bytevector, bytevector-u8-ref, etc.
Reader. End-to-end source → s-expression for the full lexical syntax, with source-location side table.
Writer. display, write, format.
eval + apply core. Self-eval, symbol lookup, if, lambda, top-level define, application. Every tail position uses %tail.
Remaining special forms. set!, begin, let/let*/letrec, named let, cond, and/or, inner define in eval_body.
Primitives. All 55 user-facing + 6 internal. Batches: arith/bitwise, list-core, bv/string, I/O+apply+error, record ops.
Prelude. Embed as bytevector literal; parse and eval at startup.
define-record-type. Desugaring + record primitives.
pmatch. Pattern matcher + clause driver.
End-to-end. Run the self-hosted compiler under it.

Rough total: ~4–5k P1 LOC + ~80 Scheme LOC.

	boot2 Playing with the boostrap
	git clone https://git.ryansepassi.com/git/boot2.git
	Log \| Files \| Refs \| README

boot2