boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

Minimal Scheme subset (boot2)

Working doc. Baseline is R7RS-small (2013); everything here is a delta against it. Intent: smallest surface that supports writing a self-hosted compiler and shell-style scripts comfortably. Things cut can always come back; things admitted are load-bearing.

Lexical syntax

Types

The runtime knows about exactly these:

| Type | Notes |
|------|-------|
| boolean | #t, #f |
| integer | word-size; 32-bit or 64-bit per target |
| symbol | globally interned; eq?-comparable |
| string / bv | same type; contiguous u8[]; literals immutable |
| pair | cons cell; literals immutable |
| empty list | '(), disjoint from pair |
| procedure | |
| record | via define-record-type |
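As a small illustration of the record type, here is a sketch that assumes boot2 accepts the R7RS form of define-record-type unchanged, including field modifiers. The names point, make-point, point-x, point-y, and set-point-x! are hypothetical and chosen only for this example.

```scheme
;; Hypothetical record with two fields; only x gets a modifier.
(define-record-type point
  (make-point x y)          ; constructor
  point?                    ; type predicate
  (x point-x set-point-x!)  ; accessor and modifier
  (y point-y))              ; accessor only

(define p (make-point 3 4))
(point-x p)            ; => 3
(set-point-x! p 10)
(point-x p)            ; => 10
(point-y p)            ; => 4
(point? p)             ; => #t
(point? (cons 3 4))    ; => #f
```

With vectors cut, a record like this is the natural stand-in for a fixed-shape struct.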

Characters are not a separate type — #\a is lexical sugar for the u8 value 97.

Not present: vectors, ports (as a language type), promises, parameters (dynamic-scoped values, distinct from function args), continuations, multiple values, exceptions.

Special forms

Core procedures

Flat namespace — no library carving for v1.

Predicates / equality: eq?, equal?, not, boolean?, pair?, null?, symbol?, integer?, string? (≡ bytevector?), procedure?.

Pairs & lists: cons, car, cdr, set-car!, set-cdr!, list, length, reverse, append, list-ref, map, for-each.
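A short sketch of the list procedures, using only names from the list above. Since pair literals are immutable in this subset, the example builds its list with list before mutating it; the variable name xs is arbitrary.

```scheme
;; Build a fresh (mutable) list rather than mutating a quoted literal.
(define xs (list 1 2 3))
(set-car! xs 10)                         ; xs is now (10 2 3)

(length xs)                              ; => 3
(list-ref xs 1)                          ; => 2
(reverse xs)                             ; => (3 2 10)
(append xs (list 4))                     ; => (10 2 3 4)
(map (lambda (x) (* x x)) (list 1 2 3))  ; => (1 4 9)
```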

Integers (word-size; overflow undefined): + - *, quotient, remainder, modulo, = < <= > >=, zero?, positive?, negative?, abs, min, max, bit-and, bit-or, bit-xor, bit-not, arithmetic-shift, number->string (optional radix; 10 and 16 required), string->number (optional radix; returns #f on parse failure, because this is a pure op, not a syscall, so the normal (ok . val) convention doesn't apply).
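Because string->number signals failure with #f rather than an (ok . val) pair, callers test the result directly. The sketch below shows that pattern; parse-or-default is a hypothetical helper, not part of the subset.

```scheme
(string->number "255")      ; => 255
(string->number "ff" 16)    ; => 255
(string->number "12abc")    ; => #f  (parse failure)

;; Converting to text and back in the same radix returns the same integer.
(string->number (number->string 255 16) 16)   ; => 255

;; Hypothetical helper: fall back to a default when parsing fails.
(define (parse-or-default s default)
  (let ((n (string->number s)))
    (if n n default)))

(parse-or-default "42" 0)   ; => 42
(parse-or-default "x42" 0)  ; => 0
```

Note that 0 parses successfully and is not #f, so the `(if n n default)` test does not confuse a parsed zero with a failure.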

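Division rounding (R7RS semantics), worth stating since target ISAs differ at the instruction level: quotient truncates toward zero, remainder takes the sign of the dividend, and modulo takes the sign of the divisor.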
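The R7RS rounding rules are easiest to see on operands of mixed sign. The values below follow from the R7RS definitions of quotient, remainder, and modulo.

```scheme
;; quotient truncates toward zero.
(quotient   7  2)   ; =>  3
(quotient  -7  2)   ; => -3
(quotient   7 -2)   ; => -3

;; remainder takes the sign of the dividend (first argument).
(remainder -7  2)   ; => -1
(remainder  7 -2)   ; =>  1

;; modulo takes the sign of the divisor (second argument).
(modulo    -7  2)   ; =>  1
(modulo     7 -2)   ; => -1

;; Invariant tying the truncating pair together:
;;   n = (+ (* d (quotient n d)) (remainder n d))
(+ (* 2 (quotient -7 2)) (remainder -7 2))   ; => -7
```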

Strings / bytevectors: make-bytevector, bytevector-length, bytevector-u8-ref, bytevector-u8-set!, bytevector-copy (optional start/end → fresh bv), bytevector-copy! (R7RS arg order: dst dst-start src [src-start src-end]), bytevector-append (variadic; fresh bv), bytevector=? (binary; byte-for-byte equality), string->symbol, symbol->string.
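The argument order of bytevector-copy! is easy to get backwards, so here is a sketch with destination first, as stated above. It assumes make-bytevector takes the optional fill byte as in R7RS; src, dst, tail, and both are arbitrary names. The buffers are built with make-bytevector rather than literals because literals are immutable.

```scheme
(define src (make-bytevector 4 0))
(bytevector-u8-set! src 0 10)
(bytevector-u8-set! src 1 20)
(bytevector-u8-set! src 2 30)
(bytevector-u8-set! src 3 40)        ; src holds 10 20 30 40

(define dst (make-bytevector 6 0))   ; dst holds 0 0 0 0 0 0

;; Copy src[1..3) into dst starting at index 2: dst dst-start src src-start src-end.
(bytevector-copy! dst 2 src 1 3)     ; dst holds 0 0 20 30 0 0
(bytevector-u8-ref dst 2)            ; => 20
(bytevector-u8-ref dst 3)            ; => 30

(define tail (bytevector-copy src 2))        ; fresh bv holding 30 40
(define both (bytevector-append tail src))   ; fresh bv holding 30 40 10 20 30 40
(bytevector-length both)                     ; => 6
(bytevector-u8-ref both 2)                   ; => 10
```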

Control: apply. Tail calls are guaranteed proper on all targets; named let and every looping idiom depend on it.
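To make the dependence on proper tail calls concrete, here is a minimal sketch of the named-let looping idiom. sum-list and count-to are hypothetical names; each recursive call to loop is in tail position, so the loops run in constant stack space however long they iterate.

```scheme
;; Accumulator-style loop: the call to loop is the last thing the body does.
(define (sum-list xs)
  (let loop ((rest xs) (acc 0))
    (if (null? rest)
        acc
        (loop (cdr rest) (+ acc (car rest))))))

(sum-list (list 1 2 3 4))   ; => 10

;; A counting loop; without proper tail calls this would grow the stack
;; by one frame per iteration.
(define (count-to n)
  (let loop ((i 0))
    (if (< i n)
        (loop (+ i 1))
        i)))

(count-to 1000000)          ; => 1000000

;; apply spreads a list into the arguments of a procedure.
(apply + (list 1 2 3))      ; => 6
```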

Program: (argv) returns the process's command-line arguments as a list of bytevectors. The first element is the program name as passed by the kernel (argv[0]); user arguments start at the second element.

Errors: error aborts with a message and optional irritants. No raise / guard / handlers.

Primitive failure: undefined behavior, in practice usually a crash. (car '()), (quotient 1 0), out-of-bounds bytevector-u8-ref, integer overflow, and mutating a literal are all left to the implementation; expect a crash. Callers should not rely on any particular outcome.
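Since primitives do not check their arguments, the check has to happen in the caller. The sketch below shows two ways to do that: returning a pair in the (ok . val) style, or aborting through error. It assumes the convention means (#t . result) on success and (#f . reason) on failure, which this document does not spell out; safe-quotient and checked-ref are hypothetical names.

```scheme
;; Assumed shape of the (ok . val) convention:
;;   (#t . result) on success, (#f . reason) on failure.
(define (safe-quotient n d)
  (if (zero? d)
      (cons #f 'divide-by-zero)
      (cons #t (quotient n d))))

(define r (safe-quotient 7 2))
(car r)                      ; => #t
(cdr r)                      ; => 3
(car (safe-quotient 1 0))    ; => #f
(cdr (safe-quotient 1 0))    ; => divide-by-zero

;; Alternative: validate the index, then abort with a message and an irritant.
(define (checked-ref bv i)
  (if (and (>= i 0) (< i (bytevector-length bv)))
      (bytevector-u8-ref bv i)
      (error "checked-ref: index out of range" i)))

(checked-ref (make-bytevector 3 7) 2)   ; => 7
```

The pair form suits failures the caller can recover from; error suits programming mistakes, since there are no handlers to catch it.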

Cut from R7RS-small

Kept explicit so additions are deliberate.

| Feature | Why cut |
|---------|---------|
| syntax-rules, macros, define-syntax | library is written without them for v1 |
| call/cc, dynamic-wind | big runtime cost, not needed here |
| values, call-with-values | replaced by (ok . val) pair convention |
| Numeric tower (rational/real/complex/float) | integers only |
| Bignums | word-size integers only (32- or 64-bit) |
| Character type (char? char->integer etc) | #\a syntax kept as u8 sugar; no char type |
| Vectors (#(…), make-vector, etc.) | lists for sequences, records for structs |
| First-class ports | shell.scm defines its own |
| quasiquote / ` , ,@ | sugar; re-add with macros |
| case, when, unless, do | sugar over cond/if |
| delay, force, promises | |
| make-parameter, parameterize | |
| guard, raise, with-exception-handler | v1 uses (ok . val) |
| define-library, import, export | files are files for v1 |
| include, cond-expand | |
| case-lambda | no arity overloading |
| #o… #b… #; #\| \|# | keeps the lexer small (#\… is kept) |
| eqv? | eq? + equal? are enough |
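For the rows marked as sugar, the rewrites are mechanical. The sketch below shows one plausible desugaring for when, case, and quasiquote, using only if, cond, and list; first-or-false, apply-op, and tag-value are hypothetical names.

```scheme
;; Instead of (when (pair? xs) (car xs)): an explicit if.
;; Unlike when, this version returns #f when the test fails.
(define (first-or-false xs)
  (if (pair? xs) (car xs) #f))

;; Instead of (case op ((add) ...) ((sub) ...) (else ...)): cond over eq?.
(define (apply-op op a b)
  (cond ((eq? op 'add) (+ a b))
        ((eq? op 'sub) (- a b))
        (else (error "apply-op: unknown op" op))))

;; Instead of `(value ,x): build the list by hand.
(define (tag-value x)
  (list 'value x))

(first-or-false (list 1 2))   ; => 1
(first-or-false '())          ; => #f
(apply-op 'add 2 3)           ; => 5
(apply-op 'sub 2 3)           ; => -1
(tag-value 7)                 ; => (value 7)
```

Symbols are interned, so eq? is enough for the cond dispatch; that is also why eqv? can be dropped.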

Dependencies of shell.scm on this subset

Sanity check that the subset actually supports the library we've written:

Anything flagged here that we ultimately cut forces a rewrite of shell.scm.