boot2

Playing with the boostrap
git clone https://git.ryansepassi.com/git/boot2.git
Log | Files | Refs | README

Scheme in P1

Goal

A small tree-walking interpreter for the Scheme subset specified in LISP.md, written in P1 via the M1PP + M1 + M0 + hex2 toolchain. Intent: simplicity and correctness first; performance is a later concern.

The self-hosted C compiler (see PLAN.md) will run under this interpreter, as will shell-style scripts built on shell.scm and prelude.scm.

Toolchain

scheme.P1                 (hand-written M1PP source using P1.M1pp)
   | prepend arch backend + P1.M1pp
   | m1pp
   v
scheme.M1                 (target-specific M1 ops)
   | M0
   v
scheme.hex2
   | hex2
   v
scheme                    (native ELF)

C-shaped snippets in this document are descriptive of the target shape, not the source. The source is M1PP-preprocessed P1 assembly.

Word size

P1-64 is the initial target. Every heap word is 8 bytes; every object is 8-byte aligned. P1-32 is out of scope for the first pass; porting would require a parallel layout story because M1PP's %struct stride is fixed at 8.

Settled decisions

Load-bearing; the rest of the document assumes them.

  1. Tree-walking eval/apply over raw s-expressions. No AST preprocessing, no bytecode, no compilation step. The reader's output is the interpreter's input.
  2. Static buffers, no GC. Fixed-size labeled byte reservations for heap, symbol table, reader buffer, writer buffer. Heap exhaustion aborts with an error.
  3. Bump allocation, 8-byte aligned. Single heap_next pointer (pinned in s0); every object is 8-byte aligned by construction.
  4. Tagged values, low-3-bit tag. See §Value representation.
  5. Symbols are interned indices. No heap object per symbol; the value is an index into the symbol table, which stores the name bytevector.
  6. Environment is a flat association list. ((sym . val) ...), extended by consing onto the front. Lookup walks the list; falls through to the global slot in the symbol table.
  7. Closures capture the env pointer directly. No free-variable analysis, no flat closures, no boxes. A closure is four words: [header, params, body, env].
  8. No set! on captured variables. Language-level rule. The implementation does not enforce it; behavior is unspecified if violated. Use a pair for shared mutable state.
  9. Inner define is sequential-only. Visible to later forms in the same body, not to earlier forms. Use letrec for mutual recursion between local helpers.
  10. pmatch is a built-in special form. Saves writing a macro expander for one feature.
  11. Tail calls via P1 TAIL / TAILR. Every tail position in eval and apply compiles to TAIL (label) or TAILR (register) rather than CALL + RET. Scheme-level tail-call correctness falls out.
  12. Primitive failure is undefined behavior. No runtime checks on (car '()), (quotient n 0), out-of-bounds bytevector access, integer overflow, or mutation of literals. Follows the spec's "Primitive failure" policy.

Memory layout

Labeled byte reservations in the final image. Starting sizes:

Region Size Purpose
heap 64 MB bump-allocated Scheme objects
symtab 16384 × open-addressing intern table
readbuf 16 MB slurped source file
writebuf 64 KB staged output for display/write
argbuf 64 × 8 B primitive dispatch scratch

All sized via M1PP constants; easy to retune. Overflow of any buffer → error(...)sys_exit(1).

Symbol table entry

struct sym_entry {
    val_t    name_bv;      // interned name, heap bytevector
    val_t    global_val;   // top-level binding (or UNBOUND)
    uint32_t name_hash;
    uint32_t chain_next;
};

global_val gives O(1) top-level lookup: every symbol's global binding is reachable without hash probing once the symbol is resolved.

M1PP layout (widened to word-per-field for 8-byte stride):

%struct SYMENT { name_bv global_val name_hash chain_next }
; .name_bv = 0, .global_val = 8, .name_hash = 16, .chain_next = 24,
; .SIZE = 32

The hash and next fields are logically 32-bit but occupy a full word each to stay on the %struct stride; simpler than packing.

The symbol table and heap are the only global roots.

Value representation

All values are 64-bit tagged words. Low 3 bits are the tag.

Tag Kind Notes
000 fixnum 61-bit signed; >> 3 to recover
001 pair pointer 16 B aligned; [p-1] car, [p+7] cdr
010 symbol (idx << 3) | 2; idx into symtab
011 headered heap object [p-3] is the header word
100 immediate singleton #f, #t, (), unspec, unbound
101111 reserved

Tag and immediate constants

Encoded as M1PP %enums. Concrete values derive via expression:

%enum TAG { FIXNUM PAIR SYM HEAP IMM }          ; raw indices 0..4
%enum IMM { FALSE TRUE NIL UNSPEC UNBOUND }     ; raw indices 0..4

; a tag 100 value in low bits is literally %TAG.IMM == 4
%macro MKIMM(idx)
$((| (<< idx 3) %TAG.IMM))
%endm

; usage: %MKIMM(%IMM.FALSE), %MKIMM(%IMM.NIL), etc.

Truthiness: every value except %MKIMM(%IMM.FALSE) is truthy. Test is a single compare.

Object types

This is the complete runtime type set.

Type Representation
fixnum immediate (tag 000)
boolean immediate (FALSE, TRUE)
empty list immediate (NIL)
unspec / unbound immediates
pair [car][cdr], 16 B, tag 001, no header
bytevector [hdr = BV | len<<8][bytes...], tag 011
closure [hdr = CLOSURE][params][body][env], tag 011
primitive [hdr = PRIM | entry_label], tag 011
record-td [hdr = TD][name_sym][nfields], tag 011
record [hdr = REC][td][field_0]...[field_{n-1}], tag 011

Headered object header word (little-endian bytes):

[ type:8 | aux:56 ]

where aux carries type-specific data (length for bytevectors, entry-label for primitives, nfields for records, etc.). One load tells you the type.

Layouts

%struct PAIR    { car cdr }                      ; .SIZE = 16
%struct CLOSURE { hdr params body env }          ; .SIZE = 32
%struct BV      { hdr }                          ; bytes follow inline
%struct PRIM    { hdr }                          ; header encodes entry
%struct TD      { hdr name_sym nfields }         ; .SIZE = 24
; record is variable-width; td+nfields known from TD
%struct REC_HDR { hdr td }                       ; field_i at offset 16 + 8*i
%enum HDR { BV CLOSURE PRIM TD REC }             ; type-byte values

Strings and bytevectors are the same type. string?bytevector?. No character type: #\a in source is sugar for fixnum 97.

Primitive encoding

An M1 label is 4 bytes. A primitive fits in a single 8-byte header:

bytes [0..4): &entry_label       ; 4-byte hex2 label reference
bytes [4..6): arity              ; @(arity)   little-endian 2 bytes
byte  [6]  : type = %HDR.PRIM    ; !(%HDR.PRIM)
byte  [7]  : 0                   ; !(0)

This makes apply_prim a header-load plus callr: no dispatch table, no switch. §Apply sketches the sequence.

Tag idioms

A handful of short sequences do all of the runtime's bit-twiddling. They are M1PP macros so the rest of the code reads at the Scheme level. These are illustrative; exact op selection is finalized during implementation.

%macro tagof(rd, rs)            ; rd = rs & 7
%andi(rd, rs, 7)
%endm

%macro make_fixnum(rd, rs)      ; rd = rs << 3
%shli(rd, rs, 3)
%endm

%macro untag_fixnum(rd, rs)     ; rd = rs >> 3, arithmetic
%sari(rd, rs, 3)
%endm

%macro car(rd, rs)              ; pair tag is 001; raw = rs - 1
%ld(rd, rs, -1)                 ; [raw + 0] = car
%endm

%macro cdr(rd, rs)
%ld(rd, rs, 7)                  ; [raw + 8] = cdr
%endm

%macro set_car(rs, rv)
%st(rv, rs, -1)
%endm

%macro set_cdr(rs, rv)
%st(rv, rs, 7)
%endm

%macro hdr_load(rd, rs)         ; heap tag is 011; raw = rs - 3
%ld(rd, rs, -3)
%endm

%macro hdr_type(rd, rs)         ; type byte is low byte of header
%lb(rd, rs, -3)
%endm

%macro hdr_field(rd, rs, idx)   ; word field idx, 0 counts the header word
%ld(rd, rs, (- (* idx 8) 3))
%endm

cons is a bump + two stores + tag:

%macro cons(rd, rcar, rcdr, rtmp)
; assumes s0 = heap_next, s1 = heap_end
%addi(rtmp, s0, 16)
%la_br(heap_oom)
%bltu(s1, rtmp)                 ; if heap_end < next+16, OOM
%st(rcar, s0, 0)
%st(rcdr, s0, 8)
%addi(rd, s0, 1)                ; tag 001
%mov(s0, rtmp)
%endm

alloc_hdr(rd, type_byte, bytes, rtmp) is the corresponding idiom for headered objects; it bumps, writes the header word, and returns the tagged pointer. The header word is composed on the fly from the type byte and any aux field.

Dispatch tables

P1 has no indexed branch op, but it has BR rs / CALLR rs / TAILR rs and one-word loads. That's enough to build a jump table for any switch wider than ~3 cases, which we use for the special-form cascade in eval.

Each entry is 8 bytes: a 4-byte M1 label reference plus 4 bytes of pad. P1-64 has no 4-byte load, only full-word LD, so padding to a word is the cheap path.

%macro table_entry(lref)
lref %(0)                       ; label ref + 4 zero bytes = 8 bytes
%endm

Usage (per special form, in index order):

special_dispatch_table:
%table_entry(&eval_quote)
%table_entry(&eval_if)
%table_entry(&eval_lambda)
%table_entry(&eval_define)
%table_entry(&eval_set)
%table_entry(&eval_begin)
...

Dispatch is load + indirect branch:

; t0 = sym_idx, already range-checked (< N_SPECIAL)
%la(t1, special_dispatch_table)
%shli(t2, t0, 3)
%add(t1, t1, t2)
%ld(t1, t1, 0)
%tailr(t1)

Five ops and one memory load, regardless of how many special forms.

Reservation convention. Special-form symbols are interned first at startup, so their sym indices are 0..N_SPECIAL-1. The dispatcher checks idx < N_SPECIAL before using the table; out-of-range means "ordinary application".

Allocation

C sketch (target shape):

static val_t alloc(size_t bytes) {
    char *p = heap_next;
    heap_next += (bytes + 7) & ~7;
    if (heap_next > heap_end) error("heap exhausted");
    return (val_t) p;
}

P1 translation: s0 and s1 are pinned to heap_next and heap_end for the lifetime of the program. Allocation is addi + bltu + mov. In practice almost every allocation site calls one of %cons, %alloc_hdr, or an inline bytevector emitter, not a generic alloc.

Register assignment

Four callee-saved registers carry the hot state:

Reg Role
s0 heap_next
s1 heap_end
s2 current env pointer
s3 scratch held across one inner loop at a time

eval(expr, env) is a0 = expr, a1 = env, returns in a0. Handlers copy a1 into s2 at entry. a0..a3 and t0..t2 are free across calls. Leaf code inside a handler may use s3 without saving, provided it restores before the handler returns.

Reader, writer, and primitive bodies follow the same discipline. Any primitive that needs deep recursion (e.g. equal?) saves s2 in its frame even if it does not touch env, since its callees might.

Closures

[ hdr = CLOSURE ][ params ][ body ][ env ]

Four words total. No free-var analysis at closure creation — the whole env chain is retained and walked at lookup time.

Environment

env  ::=  NIL
       |  ( (sym . val) . env )

The global env is not a chain — it's the symbol table. Closures captured before a top-level define nonetheless see the new binding, because the fallback is to symtab[idx].global_val at lookup time, not at capture time. This is how mutual recursion between top-level functions works without any pre-pass.

Eval

C sketch (target shape):

val_t eval(val_t expr, val_t env) {
    switch (tag(expr)) {
        case TAG_FIXNUM: return expr;
        case TAG_IMM:    return expr;
        case TAG_SYM:    return env_lookup(expr, env);
        case TAG_HEAP:   return expr;   // bv, closure, record — self-eval
        case TAG_PAIR:   break;         // form; dispatch below
    }

    val_t head = car(expr);
    val_t rest = cdr(expr);

    if (head == SYM_QUOTE)      return car(rest);
    if (head == SYM_IF) {
        val_t t = eval(car(rest), env);
        val_t next = (t != FALSE) ? cadr(rest) : caddr(rest);
        MUSTTAIL return eval(next, env);
    }
    if (head == SYM_LAMBDA)     return make_closure(car(rest), cdr(rest), env);
    if (head == SYM_DEFINE)     return eval_define(rest, env);
    if (head == SYM_SET)        return eval_set(rest, env);
    if (head == SYM_BEGIN)      MUSTTAIL return eval_begin(rest, env);
    if (head == SYM_LET)        MUSTTAIL return eval_let(rest, env);
    if (head == SYM_LETSTAR)    MUSTTAIL return eval_letstar(rest, env);
    if (head == SYM_LETREC)     MUSTTAIL return eval_letrec(rest, env);
    if (head == SYM_COND)       MUSTTAIL return eval_cond(rest, env);
    if (head == SYM_AND)        MUSTTAIL return eval_and(rest, env);
    if (head == SYM_OR)         MUSTTAIL return eval_or(rest, env);
    if (head == SYM_PMATCH)     MUSTTAIL return eval_pmatch(rest, env);
    if (head == SYM_DEFINE_RT)  return eval_define_record_type(rest, env);

    // application
    val_t fn = eval(head, env);
    val_t args = eval_list(rest, env);
    MUSTTAIL return apply(fn, args);
}

The special-form symbols (SYM_QUOTE, SYM_IF, etc.) are interned once at startup and stored in labeled word slots. Dispatch is a cascade of integer compares.

P1 translation. The outer tag switch is a short cascade of beqs on the tag value (fixnum / imm / sym / heap return immediately; pair falls through to form dispatch). Form dispatch uses the jump table from §Dispatch tables:

; eval_pair path, a0 = expr, a1 = env already normalized
%car(t1, a0)                    ; t1 = head
%cdr(a1, a0)                    ; a1 = rest, set up for handlers

; fall through to ordinary application if head is not a symbol
%andi(t2, t1, 7)
%la_br(eval_apply)
%addi(t3, t2, (- %TAG.SYM))
%bnez(t3)                       ; tag != SYM -> apply

; extract idx and bounds-check
%sari(t0, t1, 3)                ; t0 = sym_idx
%la(t2, N_SPECIAL_const)
%ld(t2, t2, 0)
%la_br(eval_apply)
%bltu(t0, t2)                   ; invert: take table path iff idx < N
...

; table dispatch
%la(t1, special_dispatch_table)
%shli(t2, t0, 3)
%add(t1, t1, t2)
%ld(t1, t1, 0)
%tailr(t1)

(The bounds-check polarity and the branch for "take table path" are sketched loosely; the final shape is whatever combination of BLT/BLTU/BEQ produces the fewest ops.)

Each handler is its own P1 function. Tail positions inside handlers end with %la_br(target); %tail(). An ordinary application is eval(head), eval_list(rest), then %la_br(apply); %tail().

Apply

C sketch:

val_t apply(val_t fn, val_t args) {
    switch (header_type(fn)) {
        case HDR_PRIM:
            return call_prim(fn, args);
        case HDR_CLOSURE: {
            val_t env = bind_params(clo_params(fn), args, clo_env(fn));
            MUSTTAIL return eval_body(clo_body(fn), env);
        }
        default:
            error("not a procedure", fn);
    }
}

bind_params walks params and args in lockstep. Variadic .-tail: when params becomes a non-pair symbol, that symbol is bound to the remaining args list. When params becomes NIL with args remaining (or vice versa), error out with an arity message.

P1 translation. hdr_type reads one byte; dispatch is two compares:

; apply(fn=a0, args=a1)
%hdr_type(t0, a0)
%la(t1, %HDR.PRIM)
%la_br(apply_prim)
%beq(t0, t1)
%la(t1, %HDR.CLOSURE)
%la_br(apply_closure)
%beq(t0, t1)
%la_br(err_not_procedure)
%b()

apply_prim extracts the entry label from the primitive's header (low 4 bytes) and callrs it with the arg list in a0:

apply_prim:
%hdr_load(t0, a0)               ; t0 = header word
%andi(t0, t0, 0xFFFFFFFF)       ; low 32 bits = &entry
%mov(a0, a1)                    ; arg list in a0 for the primitive
%callr(t0)
%ret()

apply_closure runs bind_params, extends env, then %la_brs eval_body and %tail()s.

Eval_body

val_t eval_body(val_t body, val_t env) {
    while (is_pair(cdr(body))) {
        val_t form = car(body);
        if (is_pair(form) && car(form) == SYM_DEFINE) {
            val_t name = define_name(form);
            val_t val  = eval(define_rhs(form), env);
            env = cons(cons(name, val), env);
        } else {
            eval(form, env);
        }
        body = cdr(body);
    }
    val_t last = car(body);
    if (is_pair(last) && car(last) == SYM_DEFINE) {
        val_t name = define_name(last);
        val_t val  = eval(define_rhs(last), env);
        env = cons(cons(name, val), env);
        return UNSPEC;
    }
    MUSTTAIL return eval(last, env);
}

Inner define is sequential-only: visible to forms that textually follow it in the same body, not to earlier forms. For mutual recursion between local helpers, use letrec or lift to top level. The SYM_DEFINE handler in eval itself handles only top-level define; inner defines are absorbed here and never reach that path.

Special forms

Implemented directly, each ~30-60 P1 ops:

pmatch

The matcher walks pattern and subject recursively; on success it returns an env extended with the bindings; on failure it returns a sentinel. The driver tries clauses top-to-bottom, skipping clauses whose guard evaluates to #f. No match and no else is undefined behavior per spec.

Patterns:

Pattern Matches
() the empty list
<literal> integer / string / char / bool by equal?
<symbol> that exact symbol (not a binder)
,<ident> anything; binds to <ident>
,_ anything; no binding
(p1 p2 ...) proper list of exactly that length
(p1 ... . ptail) improper list; ptail binds the rest

No compilation or caching — rewalk the patterns each call.

define-record-type

Desugars at eval time into a sequence of primitive operations.

(define-record-type point
  (make-point x y)
  point?
  (x point-x)
  (y point-y set-point-y!))

becomes

(define <point-td>   (%make-record-td 'point 2))
(define make-point   (lambda (x y) (%make-record <point-td> x y)))
(define point?       (lambda (v) (%record-is-a? v <point-td>)))
(define point-x      (lambda (p) (%record-ref p 0)))
(define point-y      (lambda (p) (%record-ref p 1)))
(define set-point-y! (lambda (p v) (%record-set! p 1 v)))

The % primitives are internal, not part of the user-facing primitive list.

Primitives

55 user-facing primitives plus 6 internal record ops. Each is its own P1 function, reached via callr on the label packed into the primitive object's header.

Predicates / equality (13): eq?, equal?, not, boolean?, pair?, null?, symbol?, integer?, string? (≡ bytevector?), procedure?, record?, record-is-a?, record-type?.

Pairs (5): cons, car, cdr, set-car!, set-cdr!.

Integers (24): +, -, *, quotient, remainder, modulo, =, <, <=, >, >=, zero?, positive?, negative?, abs, min, max, bit-and, bit-or, bit-xor, bit-not, arithmetic-shift, number->string, string->number.

Bytevectors / strings / symbols (10): make-bytevector, bytevector-length, bytevector-u8-ref, bytevector-u8-set!, bytevector-copy, bytevector-copy!, bytevector-append, bytevector=?, string->symbol, symbol->string.

Control / program / error (3): apply, argv, error.

Internal record ops (6): %make-record-td, %make-record, %record-ref, %record-set!, %record-is-a?, %record-td.

Calling convention

apply_prim receives the arg list in a0 and jumps via callr to the primitive's entry label (extracted from the primitive header). Each primitive's body:

  1. Walks the arg list, copying values into argbuf[] and counting argc.
  2. Checks arity (fixed-arity primitives; variadic ones do their own).
  3. Computes result into a0.
  4. rets.

There is no code-id switch and no dispatch table — the table of primitives is the set of labels referenced from register_primitives at startup, and the interpreter keeps zero pointers to it afterwards.

Prelude

Written in Scheme, embedded as a bytevector literal in scheme.P1, parsed and evaluated at startup like any other source. Contents:

(define (list . xs) xs)
(define (length xs) ...)
(define (reverse xs) ...)
(define (append . lists) ...)
(define (list-ref xs i) ...)
(define (map f . lists) ...)
(define (for-each f . lists) ...)

Runs before the user file, sharing the same symbol table and heap. ~80 Scheme LOC.

Reader

Single-pass recursive descent over readbuf with an integer cursor. No token array. Conses directly onto the heap.

Handles the lexical syntax specified in LISP.md:

Source locations

As the reader conses each pair, it records pair_ptr → (line . col) in a side table (fixed-size open-addressing hash). error consults this when its first argument is a pair and prefixes the message with "at file:line:col: ". Without a GC, entries for collected pairs never arise; a full table simply overwrites old entries via linear probing.

Writer

Three entry points:

Output buffered in writebuf; flushed via sys_write(1, ...) on buffer-full or on newline for display/write. error flushes before writing to fd 2.

Error path

(error msg arg ...):

  1. Flush writer buffer.
  2. Write "error: " + message + formatted args to fd 2 via sys_write.
  3. sys_exit(1).

No unwinding, no handlers. Primitive failure follows the same policy as the spec: undefined behavior, likely crash, no runtime check.

Startup

P1 defines a portable program-entry model (see P1.md §Program Entry): the arch backend emits _start, captures argc/argv off the native entry stack, and calls the portable label p1_main with a0 = argc, a1 = argv. On return, the backend does sys_exit(a0).

The Scheme interpreter therefore contains no per-target startup code. It just defines p1_main:

p1_main:
    ; a0 = argc, a1 = argv
    init_heap                 ; s0 = &heap, s1 = &heap_end
    init_symtab
    intern_special_forms      ; fill labeled slots sym_quote, sym_if, ...
                              ; (order reserves sym_idx 0..N_SPECIAL-1)
    register_primitives       ; emit %HDR.PRIM words with &entry labels
    parse_and_eval_prelude    ; prelude is an embedded bytevector literal
    ; slurp argv[1] into readbuf
    ; loop: read_one_top_level; eval(form, NIL); advance cursor; until EOF
    li  a0, 0
    ret                       ; backend _start does sys_exit(0)

All state lives in labeled reservations; nothing is heap-allocated before p1_main.

File layout

boot2/
  src/
    scheme.P1             # M1PP source, consumes P1.M1pp macro library
  lisp/
    prelude.scm           # embedded into scheme.P1 as a bytevector literal
  tests/lisp/
    00-arith.scm
    01-list.scm
    02-reader.scm
    03-printer.scm
    04-eval.scm
    05-tailcall.scm
    06-pmatch.scm
    07-records.scm
    ...

Build: prepend the arch-specific backend M1M to p1/P1.M1pp, prepend that bundle to src/scheme.P1, run m1ppM0hex2 → ELF. Tests are .scm files the interpreter runs; pass = exit 0 and expected stdout.

Milestones

Status legend: [x] done · [~] in progress · [ ] not started.

  1. Bootstrapping shim. Per-target _start, argv capture, labeled reservations for heap / symtab / readbuf / writebuf, sys_write + sys_exit error path. Exits with a placeholder status.
  2. Tag macros. %tagof, %make_fixnum, %untag_fixnum, %car, %cdr, %hdr_*. Verify constants at M1PP time with %select.
  3. Bump allocator and cons. %cons macro; OOM branch. Hand-build (cons 42 nil) and exit with the decoded car.
  4. Symbols + interning. 16k-slot open-addressing table; intern(name, len) returns a tagged symbol val_t. Two interns of the same name compare equal as integers.
  5. Bytevectors. Headered heap object; make-bytevector, bytevector-u8-ref, etc.
  6. Reader. End-to-end source → s-expression for the full lexical syntax, with source-location side table.
  7. Writer. display, write, format.
  8. eval + apply core. Self-eval, symbol lookup, if, lambda, top-level define, application. Every tail position uses %tail.
  9. Remaining special forms. set!, begin, let/let*/letrec, named let, cond, and/or, inner define in eval_body.
  10. Primitives. All 55 user-facing + 6 internal. Batches: arith/bitwise, list-core, bv/string, I/O+apply+error, record ops.
  11. Prelude. Embed as bytevector literal; parse and eval at startup.
  12. define-record-type. Desugaring + record primitives.
  13. pmatch. Pattern matcher + clause driver.
  14. End-to-end. Run the self-hosted compiler under it.

Rough total: ~4–5k P1 LOC + ~80 Scheme LOC.