Scheme in P1
Goal
A small tree-walking interpreter for the Scheme subset specified in LISP.md, written in P1 via the M1PP + M1 + M0 + hex2 toolchain. Intent: simplicity and correctness first; performance is a later concern.
The self-hosted C compiler (see PLAN.md) will run under this interpreter, as will shell-style scripts built on shell.scm and prelude.scm.
Toolchain
scheme.P1 (hand-written M1PP source using P1.M1pp)
| prepend arch backend + P1.M1pp
| m1pp
v
scheme.M1 (target-specific M1 ops)
| M0
v
scheme.hex2
| hex2
v
scheme (native ELF)
C-shaped snippets in this document are descriptive of the target shape, not the source. The source is M1PP-preprocessed P1 assembly.
Word size
P1-64 is the initial target. Every heap word is 8 bytes; every object
is 8-byte aligned. P1-32 is out of scope for the first pass; porting
would require a parallel layout story because M1PP's %struct stride
is fixed at 8.
Settled decisions
Load-bearing; the rest of the document assumes them.
- Tree-walking eval/apply over raw s-expressions. No AST preprocessing, no bytecode, no compilation step. The reader's output is the interpreter's input.
- Static buffers, no GC. Fixed-size labeled byte reservations for heap, symbol table, reader buffer, writer buffer. Heap exhaustion aborts with an error.
- Bump allocation, 8-byte aligned. Single
heap_nextpointer (pinned ins0); every object is 8-byte aligned by construction. - Tagged values, low-3-bit tag. See §Value representation.
- Symbols are interned indices. No heap object per symbol; the value is an index into the symbol table, which stores the name bytevector.
- Environment is a flat association list.
((sym . val) ...), extended by consing onto the front. Lookup walks the list; falls through to the global slot in the symbol table. - Closures capture the env pointer directly. No free-variable
analysis, no flat closures, no boxes. A closure is four words:
[header, params, body, env]. - No
set!on captured variables. Language-level rule. The implementation does not enforce it; behavior is unspecified if violated. Use a pair for shared mutable state. - Inner
defineis sequential-only. Visible to later forms in the same body, not to earlier forms. Useletrecfor mutual recursion between local helpers. pmatchis a built-in special form. Saves writing a macro expander for one feature.- Tail calls via P1
TAIL/TAILR. Every tail position inevalandapplycompiles toTAIL(label) orTAILR(register) rather thanCALL+RET. Scheme-level tail-call correctness falls out. - Primitive failure is undefined behavior. No runtime checks on
(car '()),(quotient n 0), out-of-bounds bytevector access, integer overflow, or mutation of literals. Follows the spec's "Primitive failure" policy.
Memory layout
Labeled byte reservations in the final image. Starting sizes:
| Region | Size | Purpose |
|---|---|---|
heap |
64 MB | bump-allocated Scheme objects |
symtab |
16384 × | open-addressing intern table |
readbuf |
16 MB | slurped source file |
writebuf |
64 KB | staged output for display/write |
argbuf |
64 × 8 B | primitive dispatch scratch |
All sized via M1PP constants; easy to retune. Overflow of any buffer →
error(...) → sys_exit(1).
Symbol table entry
struct sym_entry {
val_t name_bv; // interned name, heap bytevector
val_t global_val; // top-level binding (or UNBOUND)
uint32_t name_hash;
uint32_t chain_next;
};
global_val gives O(1) top-level lookup: every symbol's global
binding is reachable without hash probing once the symbol is resolved.
M1PP layout (widened to word-per-field for 8-byte stride):
%struct SYMENT { name_bv global_val name_hash chain_next }
; .name_bv = 0, .global_val = 8, .name_hash = 16, .chain_next = 24,
; .SIZE = 32
The hash and next fields are logically 32-bit but occupy a full
word each to stay on the %struct stride; simpler than packing.
The symbol table and heap are the only global roots.
Value representation
All values are 64-bit tagged words. Low 3 bits are the tag.
| Tag | Kind | Notes |
|---|---|---|
000 |
fixnum | 61-bit signed; >> 3 to recover |
001 |
pair pointer | 16 B aligned; [p-1] car, [p+7] cdr |
010 |
symbol | (idx << 3) | 2; idx into symtab |
011 |
headered heap object | [p-3] is the header word |
100 |
immediate singleton | #f, #t, (), unspec, unbound |
101–111 |
reserved |
Tag and immediate constants
Encoded as M1PP %enums. Concrete values derive via expression:
%enum TAG { FIXNUM PAIR SYM HEAP IMM } ; raw indices 0..4
%enum IMM { FALSE TRUE NIL UNSPEC UNBOUND } ; raw indices 0..4
; a tag 100 value in low bits is literally %TAG.IMM == 4
%macro MKIMM(idx)
$((| (<< idx 3) %TAG.IMM))
%endm
; usage: %MKIMM(%IMM.FALSE), %MKIMM(%IMM.NIL), etc.
Truthiness: every value except %MKIMM(%IMM.FALSE) is truthy. Test is
a single compare.
Object types
This is the complete runtime type set.
| Type | Representation |
|---|---|
| fixnum | immediate (tag 000) |
| boolean | immediate (FALSE, TRUE) |
| empty list | immediate (NIL) |
| unspec / unbound | immediates |
| pair | [car][cdr], 16 B, tag 001, no header |
| bytevector | [hdr = BV | len<<8][bytes...], tag 011 |
| closure | [hdr = CLOSURE][params][body][env], tag 011 |
| primitive | [hdr = PRIM | entry_label], tag 011 |
| record-td | [hdr = TD][name_sym][nfields], tag 011 |
| record | [hdr = REC][td][field_0]...[field_{n-1}], tag 011 |
Headered object header word (little-endian bytes):
[ type:8 | aux:56 ]
where aux carries type-specific data (length for bytevectors,
entry-label for primitives, nfields for records, etc.). One load
tells you the type.
Layouts
%struct PAIR { car cdr } ; .SIZE = 16
%struct CLOSURE { hdr params body env } ; .SIZE = 32
%struct BV { hdr } ; bytes follow inline
%struct PRIM { hdr } ; header encodes entry
%struct TD { hdr name_sym nfields } ; .SIZE = 24
; record is variable-width; td+nfields known from TD
%struct REC_HDR { hdr td } ; field_i at offset 16 + 8*i
%enum HDR { BV CLOSURE PRIM TD REC } ; type-byte values
Strings and bytevectors are the same type. string? ≡ bytevector?.
No character type: #\a in source is sugar for fixnum 97.
Primitive encoding
An M1 label is 4 bytes. A primitive fits in a single 8-byte header:
bytes [0..4): &entry_label ; 4-byte hex2 label reference
bytes [4..6): arity ; @(arity) little-endian 2 bytes
byte [6] : type = %HDR.PRIM ; !(%HDR.PRIM)
byte [7] : 0 ; !(0)
This makes apply_prim a header-load plus callr: no dispatch table,
no switch. §Apply sketches the sequence.
Tag idioms
A handful of short sequences do all of the runtime's bit-twiddling. They are M1PP macros so the rest of the code reads at the Scheme level. These are illustrative; exact op selection is finalized during implementation.
%macro tagof(rd, rs) ; rd = rs & 7
%andi(rd, rs, 7)
%endm
%macro make_fixnum(rd, rs) ; rd = rs << 3
%shli(rd, rs, 3)
%endm
%macro untag_fixnum(rd, rs) ; rd = rs >> 3, arithmetic
%sari(rd, rs, 3)
%endm
%macro car(rd, rs) ; pair tag is 001; raw = rs - 1
%ld(rd, rs, -1) ; [raw + 0] = car
%endm
%macro cdr(rd, rs)
%ld(rd, rs, 7) ; [raw + 8] = cdr
%endm
%macro set_car(rs, rv)
%st(rv, rs, -1)
%endm
%macro set_cdr(rs, rv)
%st(rv, rs, 7)
%endm
%macro hdr_load(rd, rs) ; heap tag is 011; raw = rs - 3
%ld(rd, rs, -3)
%endm
%macro hdr_type(rd, rs) ; type byte is low byte of header
%lb(rd, rs, -3)
%endm
%macro hdr_field(rd, rs, idx) ; word field idx, 0 counts the header word
%ld(rd, rs, (- (* idx 8) 3))
%endm
cons is a bump + two stores + tag:
%macro cons(rd, rcar, rcdr, rtmp)
; assumes s0 = heap_next, s1 = heap_end
%addi(rtmp, s0, 16)
%la_br(heap_oom)
%bltu(s1, rtmp) ; if heap_end < next+16, OOM
%st(rcar, s0, 0)
%st(rcdr, s0, 8)
%addi(rd, s0, 1) ; tag 001
%mov(s0, rtmp)
%endm
alloc_hdr(rd, type_byte, bytes, rtmp) is the corresponding idiom for
headered objects; it bumps, writes the header word, and returns the
tagged pointer. The header word is composed on the fly from the type
byte and any aux field.
Dispatch tables
P1 has no indexed branch op, but it has BR rs / CALLR rs / TAILR rs and one-word loads. That's enough to build a jump table for any
switch wider than ~3 cases, which we use for the special-form
cascade in eval.
Each entry is 8 bytes: a 4-byte M1 label reference plus 4 bytes of
pad. P1-64 has no 4-byte load, only full-word LD, so padding to
a word is the cheap path.
%macro table_entry(lref)
lref %(0) ; label ref + 4 zero bytes = 8 bytes
%endm
Usage (per special form, in index order):
special_dispatch_table:
%table_entry(&eval_quote)
%table_entry(&eval_if)
%table_entry(&eval_lambda)
%table_entry(&eval_define)
%table_entry(&eval_set)
%table_entry(&eval_begin)
...
Dispatch is load + indirect branch:
; t0 = sym_idx, already range-checked (< N_SPECIAL)
%la(t1, special_dispatch_table)
%shli(t2, t0, 3)
%add(t1, t1, t2)
%ld(t1, t1, 0)
%tailr(t1)
Five ops and one memory load, regardless of how many special forms.
Reservation convention. Special-form symbols are interned first at
startup, so their sym indices are 0..N_SPECIAL-1. The dispatcher
checks idx < N_SPECIAL before using the table; out-of-range means
"ordinary application".
Allocation
C sketch (target shape):
static val_t alloc(size_t bytes) {
char *p = heap_next;
heap_next += (bytes + 7) & ~7;
if (heap_next > heap_end) error("heap exhausted");
return (val_t) p;
}
P1 translation: s0 and s1 are pinned to heap_next and heap_end
for the lifetime of the program. Allocation is addi + bltu +
mov. In practice almost every allocation site calls one of %cons,
%alloc_hdr, or an inline bytevector emitter, not a generic alloc.
Register assignment
Four callee-saved registers carry the hot state:
| Reg | Role |
|---|---|
s0 |
heap_next |
s1 |
heap_end |
s2 |
current env pointer |
s3 |
scratch held across one inner loop at a time |
eval(expr, env) is a0 = expr, a1 = env, returns in a0.
Handlers copy a1 into s2 at entry. a0..a3 and t0..t2 are free
across calls. Leaf code inside a handler may use s3 without saving,
provided it restores before the handler returns.
Reader, writer, and primitive bodies follow the same discipline. Any
primitive that needs deep recursion (e.g. equal?) saves s2 in its
frame even if it does not touch env, since its callees might.
Closures
[ hdr = CLOSURE ][ params ][ body ][ env ]
params: parameter list s-expression:(a b c),(a b . rest), or a bare symbolrestfor a fully variadic lambda.body: the list of body forms, evaluated sequentially; last in tail position.env: the environment chain captured atlambdaevaluation.
Four words total. No free-var analysis at closure creation — the whole env chain is retained and walked at lookup time.
Environment
env ::= NIL
| ( (sym . val) . env )
- Lookup. Walk the list, pointer-compare the car of each pair with
the target symbol, return the cdr on match. If the walk ends at
NIL, fall through tosymtab[idx].global_val; if that isUNBOUND, error out with"unbound variable". - Extension (scope entry). Cons new
(sym . val)pairs onto the front. set!. Walk the env chain; on match,set-cdr!the pair. If the chain is exhausted, write tosymtab[idx].global_val.- Top-level
define. Write tosymtab[idx].global_val. - Inner
define. Handled inline byeval_body(see below). Each inner define prepends(name . val)to the local env as it is encountered.
The global env is not a chain — it's the symbol table. Closures
captured before a top-level define nonetheless see the new binding,
because the fallback is to symtab[idx].global_val at lookup time,
not at capture time. This is how mutual recursion between top-level
functions works without any pre-pass.
Eval
C sketch (target shape):
val_t eval(val_t expr, val_t env) {
switch (tag(expr)) {
case TAG_FIXNUM: return expr;
case TAG_IMM: return expr;
case TAG_SYM: return env_lookup(expr, env);
case TAG_HEAP: return expr; // bv, closure, record — self-eval
case TAG_PAIR: break; // form; dispatch below
}
val_t head = car(expr);
val_t rest = cdr(expr);
if (head == SYM_QUOTE) return car(rest);
if (head == SYM_IF) {
val_t t = eval(car(rest), env);
val_t next = (t != FALSE) ? cadr(rest) : caddr(rest);
MUSTTAIL return eval(next, env);
}
if (head == SYM_LAMBDA) return make_closure(car(rest), cdr(rest), env);
if (head == SYM_DEFINE) return eval_define(rest, env);
if (head == SYM_SET) return eval_set(rest, env);
if (head == SYM_BEGIN) MUSTTAIL return eval_begin(rest, env);
if (head == SYM_LET) MUSTTAIL return eval_let(rest, env);
if (head == SYM_LETSTAR) MUSTTAIL return eval_letstar(rest, env);
if (head == SYM_LETREC) MUSTTAIL return eval_letrec(rest, env);
if (head == SYM_COND) MUSTTAIL return eval_cond(rest, env);
if (head == SYM_AND) MUSTTAIL return eval_and(rest, env);
if (head == SYM_OR) MUSTTAIL return eval_or(rest, env);
if (head == SYM_PMATCH) MUSTTAIL return eval_pmatch(rest, env);
if (head == SYM_DEFINE_RT) return eval_define_record_type(rest, env);
// application
val_t fn = eval(head, env);
val_t args = eval_list(rest, env);
MUSTTAIL return apply(fn, args);
}
The special-form symbols (SYM_QUOTE, SYM_IF, etc.) are interned
once at startup and stored in labeled word slots. Dispatch is a
cascade of integer compares.
P1 translation. The outer tag switch is a short cascade of beqs on
the tag value (fixnum / imm / sym / heap return immediately; pair
falls through to form dispatch). Form dispatch uses the jump table
from §Dispatch tables:
; eval_pair path, a0 = expr, a1 = env already normalized
%car(t1, a0) ; t1 = head
%cdr(a1, a0) ; a1 = rest, set up for handlers
; fall through to ordinary application if head is not a symbol
%andi(t2, t1, 7)
%la_br(eval_apply)
%addi(t3, t2, (- %TAG.SYM))
%bnez(t3) ; tag != SYM -> apply
; extract idx and bounds-check
%sari(t0, t1, 3) ; t0 = sym_idx
%la(t2, N_SPECIAL_const)
%ld(t2, t2, 0)
%la_br(eval_apply)
%bltu(t0, t2) ; invert: take table path iff idx < N
...
; table dispatch
%la(t1, special_dispatch_table)
%shli(t2, t0, 3)
%add(t1, t1, t2)
%ld(t1, t1, 0)
%tailr(t1)
(The bounds-check polarity and the branch for "take table path" are
sketched loosely; the final shape is whatever combination of
BLT/BLTU/BEQ produces the fewest ops.)
Each handler is its own P1 function. Tail positions inside handlers
end with %la_br(target); %tail(). An ordinary application is
eval(head), eval_list(rest), then %la_br(apply); %tail().
Apply
C sketch:
val_t apply(val_t fn, val_t args) {
switch (header_type(fn)) {
case HDR_PRIM:
return call_prim(fn, args);
case HDR_CLOSURE: {
val_t env = bind_params(clo_params(fn), args, clo_env(fn));
MUSTTAIL return eval_body(clo_body(fn), env);
}
default:
error("not a procedure", fn);
}
}
bind_params walks params and args in lockstep. Variadic .-tail:
when params becomes a non-pair symbol, that symbol is bound to the
remaining args list. When params becomes NIL with args remaining
(or vice versa), error out with an arity message.
P1 translation. hdr_type reads one byte; dispatch is two compares:
; apply(fn=a0, args=a1)
%hdr_type(t0, a0)
%la(t1, %HDR.PRIM)
%la_br(apply_prim)
%beq(t0, t1)
%la(t1, %HDR.CLOSURE)
%la_br(apply_closure)
%beq(t0, t1)
%la_br(err_not_procedure)
%b()
apply_prim extracts the entry label from the primitive's header
(low 4 bytes) and callrs it with the arg list in a0:
apply_prim:
%hdr_load(t0, a0) ; t0 = header word
%andi(t0, t0, 0xFFFFFFFF) ; low 32 bits = &entry
%mov(a0, a1) ; arg list in a0 for the primitive
%callr(t0)
%ret()
apply_closure runs bind_params, extends env, then %la_brs
eval_body and %tail()s.
Eval_body
val_t eval_body(val_t body, val_t env) {
while (is_pair(cdr(body))) {
val_t form = car(body);
if (is_pair(form) && car(form) == SYM_DEFINE) {
val_t name = define_name(form);
val_t val = eval(define_rhs(form), env);
env = cons(cons(name, val), env);
} else {
eval(form, env);
}
body = cdr(body);
}
val_t last = car(body);
if (is_pair(last) && car(last) == SYM_DEFINE) {
val_t name = define_name(last);
val_t val = eval(define_rhs(last), env);
env = cons(cons(name, val), env);
return UNSPEC;
}
MUSTTAIL return eval(last, env);
}
Inner define is sequential-only: visible to forms that textually
follow it in the same body, not to earlier forms. For mutual
recursion between local helpers, use letrec or lift to top level.
The SYM_DEFINE handler in eval itself handles only top-level
define; inner defines are absorbed here and never reach that path.
Special forms
Implemented directly, each ~30-60 P1 ops:
quote— return the datum.if— eval test; tail-eval the chosen branch.lambda— build closure (one%alloc_hdr).define— top-level: writesymtab[idx].global_val. Inner: handled byeval_body.set!— walk env; first match mutates. Miss: write global slot.begin— eval each form in sequence; tail-eval the last.let— eval all inits in the current env; extend env with bindings; tail-eval body viaeval_body.let*— extend env one binding at a time, each init evaluated in the progressively extended env.letrec— pre-bind each name toUNSPECin a new env; eval each init in that env;set!each binding; tail-eval body.let(named) — desugar inline toletrec+ call, or implement directly.cond— loop clauses; first truthy test wins; tail-eval its body;elsealways true.and/or— short-circuit fold; return the deciding value, not a coerced boolean.pmatch— see below.define-record-type— see below.
pmatch
The matcher walks pattern and subject recursively; on success it
returns an env extended with the bindings; on failure it returns a
sentinel. The driver tries clauses top-to-bottom, skipping clauses
whose guard evaluates to #f. No match and no else is undefined
behavior per spec.
Patterns:
| Pattern | Matches |
|---|---|
() |
the empty list |
<literal> |
integer / string / char / bool by equal? |
<symbol> |
that exact symbol (not a binder) |
,<ident> |
anything; binds to <ident> |
,_ |
anything; no binding |
(p1 p2 ...) |
proper list of exactly that length |
(p1 ... . ptail) |
improper list; ptail binds the rest |
No compilation or caching — rewalk the patterns each call.
define-record-type
Desugars at eval time into a sequence of primitive operations.
(define-record-type point
(make-point x y)
point?
(x point-x)
(y point-y set-point-y!))
becomes
(define <point-td> (%make-record-td 'point 2))
(define make-point (lambda (x y) (%make-record <point-td> x y)))
(define point? (lambda (v) (%record-is-a? v <point-td>)))
(define point-x (lambda (p) (%record-ref p 0)))
(define point-y (lambda (p) (%record-ref p 1)))
(define set-point-y! (lambda (p v) (%record-set! p 1 v)))
The % primitives are internal, not part of the user-facing
primitive list.
Primitives
55 user-facing primitives plus 6 internal record ops. Each is its own
P1 function, reached via callr on the label packed into the
primitive object's header.
Predicates / equality (13): eq?, equal?, not, boolean?,
pair?, null?, symbol?, integer?, string? (≡ bytevector?),
procedure?, record?, record-is-a?, record-type?.
Pairs (5): cons, car, cdr, set-car!, set-cdr!.
Integers (24): +, -, *, quotient, remainder, modulo,
=, <, <=, >, >=, zero?, positive?, negative?, abs,
min, max, bit-and, bit-or, bit-xor, bit-not,
arithmetic-shift, number->string, string->number.
Bytevectors / strings / symbols (10): make-bytevector,
bytevector-length, bytevector-u8-ref, bytevector-u8-set!,
bytevector-copy, bytevector-copy!, bytevector-append,
bytevector=?, string->symbol, symbol->string.
Control / program / error (3): apply, argv, error.
Internal record ops (6): %make-record-td, %make-record,
%record-ref, %record-set!, %record-is-a?, %record-td.
Calling convention
apply_prim receives the arg list in a0 and jumps via callr to
the primitive's entry label (extracted from the primitive header).
Each primitive's body:
- Walks the arg list, copying values into
argbuf[]and countingargc. - Checks arity (fixed-arity primitives; variadic ones do their own).
- Computes result into
a0. rets.
There is no code-id switch and no dispatch table — the table of
primitives is the set of labels referenced from register_primitives
at startup, and the interpreter keeps zero pointers to it afterwards.
Prelude
Written in Scheme, embedded as a bytevector literal in scheme.P1,
parsed and evaluated at startup like any other source. Contents:
(define (list . xs) xs)
(define (length xs) ...)
(define (reverse xs) ...)
(define (append . lists) ...)
(define (list-ref xs i) ...)
(define (map f . lists) ...)
(define (for-each f . lists) ...)
Runs before the user file, sharing the same symbol table and heap. ~80 Scheme LOC.
Reader
Single-pass recursive descent over readbuf with an integer cursor.
No token array. Conses directly onto the heap.
Handles the lexical syntax specified in LISP.md:
- lists with
.dotted tail 'x→(quote x)(no quasiquote, unquote, unquote-splicing)- fixnums: decimal and
#x-prefixed hex, both with optional sign #t,#f"..."strings with\n \t \r \\ \"escapes#\a..#\~,#\newline,#\space,#\tab,#\return,#\null,#\xNN→ read as fixnums- symbols: identifier grammar, case-sensitive
;line comments (no block comments, no#;)
Source locations
As the reader conses each pair, it records pair_ptr → (line . col)
in a side table (fixed-size open-addressing hash). error consults
this when its first argument is a pair and prefixes the message with
"at file:line:col: ". Without a GC, entries for collected pairs
never arise; a full table simply overwrites old entries via linear
probing.
Writer
Three entry points:
display— no quoting on strings; bare character output.write— readable form; strings quoted.format fmt args...— supports~a(display),~s(write),~d(decimal fixnum),~%(newline).
Output buffered in writebuf; flushed via sys_write(1, ...) on
buffer-full or on newline for display/write. error flushes
before writing to fd 2.
Error path
(error msg arg ...):
- Flush writer buffer.
- Write
"error: "+ message + formatted args to fd 2 viasys_write. sys_exit(1).
No unwinding, no handlers. Primitive failure follows the same policy as the spec: undefined behavior, likely crash, no runtime check.
Startup
P1 defines a portable program-entry model (see P1.md §Program Entry):
the arch backend emits _start, captures argc/argv off the native
entry stack, and calls the portable label p1_main with a0 = argc,
a1 = argv. On return, the backend does sys_exit(a0).
The Scheme interpreter therefore contains no per-target startup code.
It just defines p1_main:
p1_main:
; a0 = argc, a1 = argv
init_heap ; s0 = &heap, s1 = &heap_end
init_symtab
intern_special_forms ; fill labeled slots sym_quote, sym_if, ...
; (order reserves sym_idx 0..N_SPECIAL-1)
register_primitives ; emit %HDR.PRIM words with &entry labels
parse_and_eval_prelude ; prelude is an embedded bytevector literal
; slurp argv[1] into readbuf
; loop: read_one_top_level; eval(form, NIL); advance cursor; until EOF
li a0, 0
ret ; backend _start does sys_exit(0)
All state lives in labeled reservations; nothing is heap-allocated
before p1_main.
File layout
boot2/
src/
scheme.P1 # M1PP source, consumes P1.M1pp macro library
lisp/
prelude.scm # embedded into scheme.P1 as a bytevector literal
tests/lisp/
00-arith.scm
01-list.scm
02-reader.scm
03-printer.scm
04-eval.scm
05-tailcall.scm
06-pmatch.scm
07-records.scm
...
Build: prepend the arch-specific backend M1M to p1/P1.M1pp, prepend
that bundle to src/scheme.P1, run m1pp → M0 → hex2 → ELF.
Tests are .scm files the interpreter runs; pass = exit 0 and
expected stdout.
Milestones
Status legend: [x] done · [~] in progress · [ ] not started.
- Bootstrapping shim. Per-target
_start, argv capture, labeled reservations for heap / symtab / readbuf / writebuf,sys_write+sys_exiterror path. Exits with a placeholder status. - Tag macros.
%tagof,%make_fixnum,%untag_fixnum,%car,%cdr,%hdr_*. Verify constants at M1PP time with%select. - Bump allocator and cons.
%consmacro; OOM branch. Hand-build(cons 42 nil)and exit with the decoded car. - Symbols + interning. 16k-slot open-addressing table;
intern(name, len)returns a tagged symbol val_t. Two interns of the same name compare equal as integers. - Bytevectors. Headered heap object;
make-bytevector,bytevector-u8-ref, etc. - Reader. End-to-end source → s-expression for the full lexical syntax, with source-location side table.
- Writer.
display,write,format. eval+applycore. Self-eval, symbol lookup,if,lambda, top-leveldefine, application. Every tail position uses%tail.- Remaining special forms.
set!,begin,let/let*/letrec, named let,cond,and/or, innerdefineineval_body. - Primitives. All 55 user-facing + 6 internal. Batches: arith/bitwise, list-core, bv/string, I/O+apply+error, record ops.
- Prelude. Embed as bytevector literal; parse and eval at startup.
define-record-type. Desugaring + record primitives.pmatch. Pattern matcher + clause driver.- End-to-end. Run the self-hosted compiler under it.
Rough total: ~4–5k P1 LOC + ~80 Scheme LOC.