Minimal Scheme subset (boot2)
Working doc. Baseline is R7RS-small (2013); everything here is a delta against it. Intent: smallest surface that supports writing a self-hosted compiler and shell-style scripts comfortably. Things cut can always come back; things admitted are load-bearing.
Lexical syntax
- Identifiers: standard Scheme grammar, case-sensitive. Includes
+ - * / ! ? < > = _ & : ~ ^and->in names. No|…|vertical-bar quoted identifiers. A lone.is not a symbol — it's reserved for dotted-pair syntax. - Booleans:
#t,#f. - Integers: decimal (
42,-7) and hex (#xff,#x-1a). Word-size only — 32-bit on 32-bit targets, 64-bit on 64-bit targets. No#o(octal), no#b(binary), no floats, rationals, complex, or bignums. Overflow is undefined (see "Primitive failure" below). - Strings:
"…"with escapes\n \t \r \\ \". A string is a bytevector; indexing is by u8. String literals are immutable —bytevector-u8-set!on a literal is undefined. - Characters:
#\athrough#\~for printable ASCII, plus named forms#\newline(10),#\space(32),#\tab(9),#\return(13),#\null(0), and#\xNNfor any byte. A character literal is a u8 integer — no distinct character type.(= #\a 97)is#t. - Symbols: bare (
foo,list->bytes). Globally interned — two symbols that print the same areeq?. - Pairs / lists:
(a b c),(a . b),'()for the empty list. Literal pairs are immutable. - Quote:
'x≡(quote x). No quasiquote / unquote / unquote-splicing. - Comments:
;to end of line. No#| … |#block, no#;datum.
Types
The runtime knows about exactly these:
| Type | Notes |
|---|---|
| boolean | #t, #f |
| integer | word-size; 32-bit or 64-bit per target |
| symbol | globally interned; eq?-comparable |
| string / bv | same type; contiguous u8[]; literals immutable |
| pair | cons cell; literals immutable |
| empty list | '(), disjoint from pair |
| procedure | |
| record | via define-record-type |
Characters are not a separate type — #\a is lexical sugar for the
u8 value 97.
Not present: vectors, ports (as a language type), promises, parameters (dynamic-scoped values, distinct from function args), continuations, multiple values, exceptions.
Special forms
(define name value)and(define (name arg ...) body ...), including variadic tails:(define (f . rest) …)and(define (f a b . rest) …). Top-leveldefines may forward-reference each other — mutual recursion at module scope is supported. Inside alambda/letbody,defines behave likeletrec*(sequential, visible to later forms).(lambda (arg ...) body ...)with the same.-tail syntax. Alambdaevaluates to a closure: a first-class procedure value that captures its enclosing lexical environment by reference.set!on a captured variable is visible to every closure that captured it, and closures that escape their creating frame outlive it (general case: heap-allocated; non-escape optimization is an implementation detail).(if test then else)—elserequired.(cond (test body ...) ... (else body ...)).(and e ...)/(or e ...)— short-circuiting; return the deciding value, not a coerced boolean.(let ((x v) ...) body ...),(let* …),(letrec …).(let name ((x v) ...) body ...)— named let.(begin e ...).(set! name value).(quote datum)/'datum.(define-record-type name (constructor field ...) predicate (field accessor [mutator]) ...)— R7RS form. Creates a disjoint type with the given constructor, predicate, and field accessors/mutators. Implementable as a single heap-allocated object with a type-descriptor pointer; does not require exposing vectors.(pmatch expr clause ...)— pattern matcher. Not R7RS; added because the self-hosted compiler destructures s-expressions constantly and a macroless language makes hand-rolledcar/cdrchains painful. Clause forms:(pattern body ...) (pattern (guard g-expr ...) body ...) (else body ...)Patterns:
Pattern Matches ()the empty list <literal>integers, strings, chars, booleans by equal?<symbol>that exact symbol (not a binder) ,<ident>anything; binds to <ident>,_anything; no binding (wildcard) (p1 p2 ...)proper list of exactly that length (p1 ... . ptail)improper list; ptailbinds the restClauses are tried top-to-bottom; first matching pattern with truthy guards wins. A guard that evaluates to
#fis treated as no-match and falls through to the next clause. No match with noelseis undefined behavior (follows the primitive-failure policy); put an explicit(else (error ...))when you care.Example:
(pmatch e ((quote ,x) (emit-literal x)) ((if ,t ,a ,b) (emit-if t a b)) ((lambda ,ps . ,body) (emit-lambda ps body)) (,x (guard (symbol? x)) (emit-var x)) (else (error "unknown form" e)))
Core procedures
Flat namespace — no library carving for v1.
Predicates / equality
eq?, equal?, not, boolean?, pair?, null?, symbol?,
integer?, string? (≡ bytevector?), procedure?.
Pairs & lists
cons, car, cdr, set-car!, set-cdr!, list, length,
reverse, append, list-ref, map, for-each.
Integers (word-size; overflow undefined)
+ - *, quotient, remainder, modulo, = < <= > >=, zero?,
positive?, negative?, abs, min, max,
bit-and, bit-or, bit-xor, bit-not, arithmetic-shift,
number->string (optional radix; 10 and 16 required),
string->number (optional radix; returns #f on parse failure — this
is a pure op, not a syscall, so the normal (ok . val) convention
doesn't apply).
Division rounding (R7RS semantics) — worth stating since target ISAs differ at the instruction level:
quotienttruncates toward zero:(quotient -7 2) ⇒ -3remainderhas the sign of the dividend:(remainder -7 2) ⇒ -1modulohas the sign of the divisor:(modulo -7 2) ⇒ 1- Identity:
(= n (+ (* (quotient n d) d) (remainder n d)))
Strings / bytevectors
make-bytevector, bytevector-length, bytevector-u8-ref,
bytevector-u8-set!, bytevector-copy (optional start/end → fresh bv),
bytevector-copy! (R7RS arg order: dst dst-start src [src-start src-end]),
bytevector-append (variadic; fresh bv),
bytevector=? (binary; byte-for-byte equality),
string->symbol, symbol->string.
Control
apply. Tail calls are guaranteed proper on all targets — named let and
every looping idiom depend on it.
Program
(argv) returns the process's command-line arguments as a list of
bytevectors. The first element is the program name as passed by the
kernel (argv[0]); user arguments start at the second element.
Errors
error aborts with a message and optional irritants. No raise /
guard / handlers.
Primitive failure is undefined behavior / crash. (car '()),
(quotient 1 0), out-of-bounds bytevector-u8-ref, integer overflow,
mutating a literal — all implementation-defined, expect a crash. Callers
should not rely on any particular outcome.
Cut from R7RS-small
Kept explicit so additions are deliberate.
| Feature | Why cut |
|---|---|
syntax-rules, macros, define-syntax |
library is written without them for v1 |
call/cc, dynamic-wind |
big runtime cost, not needed here |
values, call-with-values |
replaced by (ok . val) pair convention |
| Numeric tower (rational/real/complex/float) | integers only |
| Bignums | word-size integers only (32- or 64-bit) |
| Character type (char? char->integer etc) | #\a syntax kept as u8 sugar; no char type |
Vectors (#(…), make-vector, etc.) |
lists for sequences, records for structs |
| First-class ports | shell.scm defines its own |
quasiquote / ` , ,@ |
sugar; re-add with macros |
case, when, unless, do |
sugar over cond/if |
delay, force, promises |
— |
make-parameter, parameterize |
— |
guard, raise, with-exception-handler |
v1 uses (ok . val) |
define-library, import, export |
files are files for v1 |
include, cond-expand |
— |
case-lambda |
no arity overloading |
#o… #b… #; #\| \|# |
keeps the lexer small (#\… is kept) |
eqv? |
eq? + equal? are enough |
Dependencies of shell.scm on this subset
Sanity check that the subset actually supports the library we've written:
- variadic
.inlambda/define→spawn,run - named let →
refill!-driven read loops,bv-concat-reverse define-record-type→portrecordapply→runcallingspawnbit-or,bit-and,arithmetic-shift→ openat flags, wait-status decodemake-bytevectorwith fill byte →NL-BVbytevector-copy(3-arg) andbytevector-copy!→ read/write pathscons,car,cdr,null?,zero?,=,<,-,+→ everywhere
Anything flagged here that we ultimately cut forces a rewrite of shell.scm.