boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

scheme1

Minimal Scheme subset implemented by scheme1/scheme1.P1pp. A loose subset of R7RS-small. The interpreter reads s-expressions from the file named by argv[1], evaluates them top-to-bottom in a single global env, and exits.

tests/boot-run-scheme1.sh invokes scheme1 with prelude.scm concatenated in front of the user file. The prelude (scheme1/prelude.scm) defines the part of the R7RS surface that is expressible over the runtime primitives: equivalence aliases, list/char/string helpers, and the shell.scm process / file-I/O layer.

Lexical syntax

Types

The runtime knows exactly these types:

| Type | Notes |
|---|---|
| boolean | #t, #f |
| integer | word-size; 32- or 64-bit per target |
| symbol | globally interned; eq?-comparable |
| string / bv | same type (HDR.BV); contiguous u8 buffer |
| pair | cons cell |
| empty list | '(), disjoint from pair |
| procedure | closure or primitive |
| record | via define-record-type |
| eof-object | singleton; bound at top level as eof; also returned on EOF reads |
| unspecified | singleton; result of set!, define, (if #f x), etc. |

Multiple-values packs flow through values / call-with-values / let-values / let*-values; they are not intended to be observed directly.

Special forms

Top-level binding:

Procedures and binding:

Conditionals and sequencing:

Quote, records, matching:

Primitives

The runtime built-ins are registered at startup from prim_table in scheme1.P1pp. The prelude builds the wider R7RS surface on top of these.

Equality / predicates: eq?, equal?, not, null?, pair?, boolean?, integer?, symbol?, string? (≡ bytevector?), procedure?, zero?, eof?.

Pairs: cons, car, cdr, set-car!, set-cdr!, length, list-ref, assq, assoc, reverse. assq compares alist keys by eq?; assoc compares keys by equal?; both return the matching alist pair or #f. reverse returns a fresh reversed list.
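A small sketch of the assq / assoc distinction described above. It assumes that define and quoted list literals behave as in R7RS (the special-form list is not shown in this section) and that `;` starts a comment; the variable names are illustrative only.

```scheme
;; Symbol keys are interned, so eq?-based lookup with assq is enough.
(define by-symbol '((a . 1) (b . 2)))
(define hit  (assq 'b by-symbol))    ; => (b . 2), the matching pair itself
(define miss (assq 'z by-symbol))    ; => #f

;; String keys need content comparison, so use assoc (equal?).
(define by-string (cons (cons "a" 1) (cons (cons "b" 2) '())))
(define found (assoc "b" by-string)) ; => ("b" . 2)

;; reverse returns a fresh list; the argument is left untouched.
(define rev (reverse '(1 2 3)))      ; => (3 2 1)
```

Because both lookups return the pair rather than the value, take the cdr of the result after checking that it is not #f.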

Integers (word-size; overflow and divide-by-zero are UB): +, -, *, quotient, remainder, =, <, >, bit-and, bit-or, bit-xor, bit-not, arithmetic-shift. Arities: +, *, bit-and, bit-or, bit-xor accept 0+ args (identities 0, 1, -1, 0, 0 respectively); - accepts 1+ ((- x) is unary negate); =, <, > accept 2+ and chain pairwise. quotient / remainder / arithmetic-shift are binary; bit-not is unary. quotient truncates toward zero; remainder has the sign of the dividend.
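The rounding and arity rules above can be checked with a few expressions. This is a sketch that assumes define is available as in R7RS; negative operands are built with unary - so that nothing depends on how the reader handles signed literals.

```scheme
;; quotient truncates toward zero; remainder takes the dividend's sign.
(define q  (quotient  (- 7) 2))   ; => -3 (not -4)
(define r  (remainder (- 7) 2))   ; => -1
(define r2 (remainder 7 (- 2)))   ; => 1

;; Zero-argument calls return the identity of the operation.
(define sum0  (+))                ; => 0
(define prod0 (*))                ; => 1
(define and0  (bit-and))          ; => -1 (all bits set)

;; Comparisons chain pairwise across all arguments.
(define ordered   (< 1 2 3))      ; => #t (1 < 2 and 2 < 3)
(define unordered (< 1 3 2))      ; => #f (3 < 2 fails)
```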

Bytevectors / strings: make-bytevector, bytevector-length, bytevector-u8-ref, bytevector-u8-set!, bytevector-copy (3-arg src start end → fresh bv), bytevector-copy! (dst dst-start src src-start src-end), bytevector-append (variadic), bytevector=?, string-length (strlen of the data buffer up to the first NUL).
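A sketch of the argument orders listed above, using string literals as byte buffers since strings and bytevectors are the same type. It assumes that start/end ranges are half-open as in R7RS, that string literals are read as ASCII bytes, and that define is available; none of that is stated in this section.

```scheme
;; Copy a literal to get a fresh buffer that is safe to mutate.
(define dst (bytevector-copy "-----" 0 5))

;; dst dst-start src src-start src-end: write "abc" into dst at index 1.
(bytevector-copy! dst 1 "abc" 0 3)          ; dst now holds the bytes of "-abc-"

(define b0 (bytevector-u8-ref dst 0))       ; => 45 (ASCII '-')
(define b1 (bytevector-u8-ref dst 1))       ; => 97 (ASCII 'a')
(define b3 (bytevector-u8-ref dst 3))       ; => 99 (ASCII 'c')

;; string-length stops at the first NUL byte, wherever it is.
(bytevector-u8-set! dst 0 0)
(define n (string-length dst))              ; => 0
```

The last two lines show why string-length and bytevector-length can disagree on the same object: one scans for a NUL, the other reports the buffer size.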

Symbols / numbers as text: string->symbol, symbol->string, number->string (decimal by default; lowercase hex when the optional radix arg is 16, with a leading - for negatives; any other radix value falls back to decimal), string->number (decimal by default; hex when radix is 16, accepting upper- or lowercase digits and an optional leading +/-; returns #f on parse failure).
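The radix rules above differ from R7RS in one notable way (radixes other than 16 fall back to decimal), so a short illustration may help. This sketch assumes define is available; the expected values in the comments follow directly from the description above.

```scheme
(define dec   (number->string 255))         ; => "255"
(define hex   (number->string 255 16))      ; => "ff" (lowercase)
(define neg   (number->string (- 255) 16))  ; => "-ff"
(define other (number->string 255 2))       ; => "255" (radix 2 falls back to decimal)

(define n1 (string->number "FF" 16))        ; => 255 (uppercase digits accepted)
(define n2 (string->number "-ff" 16))       ; => -255
(define n3 (string->number "zz"))           ; => #f (parse failure)

;; Symbols are interned, so the result is eq? to the literal symbol.
(define same (eq? (string->symbol "foo") 'foo))  ; => #t
```

Since string->number signals failure with #f rather than an error, callers should test the result before doing arithmetic on it.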

I/O and error: display, write, format, error. format understands ~a (display), ~s (write), ~d (decimal fixnum), ~x (lowercase hex fixnum, signed: leading - for negatives), ~% (newline), ~~ (literal tilde); unknown directives pass through verbatim. error writes scheme1: error: <msg> <irritants…> to stderr and exits with status 1.

EOF: eof (the singleton, bound at startup), eof?.

Multiple values: values, call-with-values. (values x) is identical to x in single-value context; 0 or 2+ args produce an MV-pack consumable by call-with-values / let-values / let*-values.
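A minimal sketch of producing and consuming an MV-pack. It assumes lambda, define, and let-values follow their R7RS syntax, which this section does not spell out.

```scheme
;; The producer thunk returns two values; the consumer receives them as arguments.
(define qr
  (call-with-values
    (lambda () (values (quotient 17 5) (remainder 17 5)))
    (lambda (q r) (cons q r))))            ; => (3 . 2)

;; let-values destructures the same kind of pack into local bindings.
(define total
  (let-values (((q r) (values 3 2)))
    (+ q r)))                              ; => 5

;; A single value is just that value, so it can be used in ordinary expressions.
(define single (+ 1 (values 41)))          ; => 42
```

Keeping the pack inside call-with-values or let-values, rather than storing it in a variable, matches the note that packs are not meant to be observed directly.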

Apply: apply. Tail calls are guaranteed proper.

Syscalls (Linux). Each returns (#t . val) on success or (#f . errno) on failure. sys-read fd buf offset count, sys-write fd buf offset count, sys-close fd, sys-openat dirfd path-bv flags mode, sys-clone (fork-style, no args), sys-execve path-bv argv-list, sys-waitid idtype id infop options, sys-argv (no args; returns the process's argv as a list of bvs), sys-exit code (does not return).
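The result convention can be handled with one small helper. This is a sketch, not part of the prelude: unwrap is a hypothetical name, fd 1 is assumed to be standard output, and the success value of sys-write is assumed to be whatever the underlying syscall returned. lambda and define are assumed to be available.

```scheme
;; Hypothetical helper: return val on success, exit through error on failure.
(define unwrap
  (lambda (what result)
    (if (car result)
        (cdr result)                 ; (#t . val): hand back val
        (error what (cdr result))))) ; (#f . errno): report errno as an irritant

;; Write the bytes of msg to fd 1, starting at offset 0.
(define msg "hello")
(define written (unwrap "sys-write failed" (sys-write 1 msg 0 (string-length msg))))

;; sys-argv uses the same convention; on success the value is a list of bvs.
(define args (unwrap "sys-argv failed" (sys-argv)))
```

Checking the car before touching the cdr matters because both fields are plain integers for most calls, so a failure is otherwise easy to mistake for a result.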

Heap control (used by the cc compiler for arena-style allocation): heap-usage, heap-mark, heap-rewind!, use-scratch-heap!, use-main-heap!, reset-scratch-heap!, heap-in-main?. heap-mark / heap-rewind! discard everything allocated after the mark on whichever heap is current; the scratch heap can be reset wholesale. UNSAFE: the runtime does not track liveness, so any surviving reference into a freed region becomes dangling. Most callers should reach for the prelude wrappers call-with-heap-rewind, call-with-scratch-deep-copy, and call-with-scratch-cycle rather than driving these primitives directly.

Error semantics

error is the only structured error path. Everything else — (car '()), out-of-range bytevector-u8-ref, (quotient 1 0), mutating immutable state, integer overflow, unknown-form pmatch fallthrough — is primitive failure: the runtime aborts with a short message on stderr. Callers should not rely on any particular outcome.

There is no raise, guard, or handler mechanism, no call/cc, and no exceptions. When failure needs to be observable, wrap and return through (ok . val) pairs, following the syscall convention.
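As an illustration of the wrap-and-return style, the sketch below guards a division instead of letting it reach primitive failure. safe-quotient is a hypothetical name, and the symbol used as the failure payload is an arbitrary choice; lambda and define are assumed to be available.

```scheme
;; Hypothetical wrapper: check the precondition, then return an (ok . val) pair.
(define safe-quotient
  (lambda (a b)
    (if (zero? b)
        (cons #f 'divide-by-zero)       ; failure: payload describes the problem
        (cons #t (quotient a b)))))     ; success: payload is the result

(define good (safe-quotient 7 2))       ; => (#t . 3)
(define bad  (safe-quotient 7 0))       ; => (#f . divide-by-zero)

;; The caller decides what a failure means; here it falls back to 0.
(define value-or-zero
  (if (car bad) (cdr bad) 0))           ; => 0
```

The check has to happen before the primitive is called, since nothing can intercept the failure afterwards.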

Prelude surface

scheme1/prelude.scm is bundled in front of every user program by tests/boot-run-scheme1.sh. It adds: