scheme1 → shell.scm TODO

Checklist for getting lisp/shell.scm running under scheme1.

Workflow: every item is red-green TDD. Add a failing tests/scheme1/NN-*.scm (with .expected-exit and/or .expected) first, run the suite to confirm it fails for the expected reason, then implement until green. The multi-arch suite (make test SUITE=scheme1) must be clean before moving on to the next item.

Audit: deviations and known issues

Everything below is a real bug, hack, or spec gap that must be addressed before calling scheme1 shippable.

Open bugs

Prelude spawn fails when reached through run, reporting "unbound variable" in the parent. (run prog) from user code fails even though (spawn prog) inline at user level with the identical body works. Root cause not yet identified; the suspects are apply_build_args walking the variadic list, closure env capture, and env extension with the dotted-tail param args. See tests/scheme1/45-shell-spawn.scm, which works around the bug by redefining spawn at user level. Until this is understood, the prelude's spawn and run are effectively unverified.
No heap-exhaustion check (addressed). cons, alloc_hdr, and alloc_bytes now compare heap_next + bytes against :heap_end (initialized to heap_buf_ptr + HEAP_CAP_BYTES at startup) and abort via runtime_error on overflow. load_source and eval_prelude reject sources that would overrun READBUF_CAP_BYTES.
No symtab-name copy bound (addressed). Name copies still go through alloc_bytes, but that path now errors cleanly when the heap arena is exhausted instead of silently scribbling into the symtab. intern's 1024-slot count check remains and routes through the same runtime_error.
bytevector-u8-set! / -ref / -copy / -copy! have no bounds check (addressed). All four now check 0 <= idx < length (or 0 <= start <= end <= length, plus the dst-side range for bytevector-copy!) and abort via runtime_error. make-bytevector and bytevector-grow! reject negative arguments through the same path.
car / cdr of a non-pair, quotient / remainder by zero, etc. are still silent UB with no abort path.
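
The two bytevector range rules quoted above (0 <= idx < length, and 0 <= start <= end <= length) can be modelled in a few lines of Scheme. This is only an illustrative sketch: index-ok? and range-ok? are hypothetical names, and the real checks live in the runtime, not in Scheme code. It uses nothing beyond if and the 2-arg <.

```scheme
;; 0 <= idx < len: the rule for bytevector-u8-ref / bytevector-u8-set!.
(define index-ok?
  (lambda (idx len)
    (if (< idx 0)
        #f
        (< idx len))))

;; 0 <= start <= end <= len: the rule for the copy primitives.
(define range-ok?
  (lambda (start end len)
    (if (< start 0)
        #f
        (if (< end start)
            #f
            (if (< len end) #f #t)))))
```

Note the asymmetry at the boundary: (index-ok? 3 3) is #f, while (range-ok? 3 3 3) is #t because an empty range ending at length is valid.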

Spec features still missing

Per LISP.md and LISP-C.md, but not implemented:

Special forms missing: set!, pmatch, cond's => arrow form. pmatch is called out by LISP-C.md as a built-in special form needed by the self-hosted compiler.
Primitives missing (LISP.md lists them as required):
- Equality: eqv?, equal? (we only have eq?)
- Predicates: boolean?, integer?, string?, procedure?, record?, record-type?
- Numeric: quotient, remainder, modulo, <=, >=, >, positive?, negative?, abs, min, max, bit-xor, bit-not, number->string, string->number
- Pair / list: set-car!, set-cdr!, length, list-ref, map, for-each as primitives (we provide them via the prelude only)
- Bytevector: bytevector-append, bytevector=?, string->symbol, symbol->string
+ - * = < are 2-arg only. R7RS allows any arity.
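
One possible stopgap is to layer a variadic wrapper over the 2-arg primitive. The sketch below assumes that a bare-symbol parameter list, (lambda args ...), is accepted and that null? is available; add and add-loop are hypothetical names, not existing prelude entries.

```scheme
;; Fold the 2-arg + over an argument list.
(define add-loop
  (lambda (acc xs)
    (if (null? xs)
        acc
        (add-loop (+ acc (car xs)) (cdr xs)))))   ; tail call; + always gets exactly 2 args

;; args collects every argument into a list (assumed to be supported).
(define add
  (lambda args
    (add-loop 0 args)))
```

With these definitions (add) evaluates to 0 and (add 1 2 3 4) to 10. Whether the real fix belongs in the prelude or in the primitives themselves is left open here.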
apply is variadic on the trailing list but otherwise unverified for arity edge cases.
Type names are not bound by define-record-type. The TD is reachable only via the parameterized prims that close over it; no record-type-of, no way to inspect a TD from user code. Spec is ambiguous on this; LISP-C.md's example uses a generated <point-td> binding.
shell.scm's port record-type, stdin/stdout/stderr ports, open-input / open-output / read-line / read-bytes / read-all / bv-concat-reverse / write-bytes / write-line are NOT in the prelude. Only the process-management half of shell.scm is ported.
scheme1/prelude.scm is a strict subset of lisp/prelude.scm. Active set: <=, >=, negative?, abs, caar/cadr/cdar/cddr/caddr, list?, assoc, member, filter, fold, plus the inherited list/shell helpers. Commented-out placeholders for positive? (needs >), vector->list / list->vector (need make-vector / vector-ref / vector-set! / vector-length), and equal? (needs string? / vector? plus their ref/length) wait on the corresponding primitives.
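
As an illustration of how one placeholder could be unblocked before the primitive lands, > can be derived from the existing 2-arg < by swapping the arguments. This is a sketch under the assumption that a prelude-level > is acceptable as a stopgap; LISP.md lists > as a required primitive, and the positive? body shown here is a guess at the shape of the commented-out definition.

```scheme
;; Stopgap: a > b is the same test as b < a.
(define >
  (lambda (a b) (< b a)))

(define positive?
  (lambda (n) (> n 0)))
```

With this in place (positive? 3) is #t and (positive? 0) is #f.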

Hacks and fragile invariants

These work today but are easy to break.

Bytevector NUL-termination via headroom. bv_capacity_for returns the smallest power of two strictly greater than n. The byte at index length is the zero-init NUL terminator and we hand the raw data_ptr directly to syscalls expecting C strings (sys-openat, sys-execve, the per-arg pointers in build_execve_argv). If user code calls bytevector-u8-set! past length, that NUL is gone and the next syscall reads garbage. Capacity is never reset by bytevector-copy! or any other op, so the invariant only protects fresh / never-overwritten bytevectors.
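
The capacity rule is easy to model at the Scheme level. The sketch below is not the runtime's code: capacity-for is a hypothetical name standing in for bv_capacity_for, and it only reproduces the stated rule "smallest power of two strictly greater than n".

```scheme
(define capacity-for
  (lambda (n)
    (let loop ((cap 1))
      (if (< n cap)          ; strictly greater than n
          cap
          (loop (* cap 2))))))
```

For example, (capacity-for 3) is 4 and (capacity-for 4) is 8. Because the comparison is strict, a length that is already a power of two still gets at least one spare byte, which is where the NUL at index length lives.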
bytevector-grow! is a public primitive (bv_grow) that's effectively only there to make the doubling path testable (test 34). Not in R7RS, not in LISP.md. Either expose it as part of a documented mutable-bytevector API, or delete the public binding and demote bv_grow to internal.
%record-* primitives are exposed publicly in prim_table alongside the parameterized record entries. LISP-C.md says "internal, not part of the user-facing primitive list".
PRIM size grew from 16 to 24 bytes uniformly to fit the parameterized data slot used by record ctors / preds / accessors / mutators. Plain primitives (sys-exit, cons, +, …) waste those 8 bytes per instance.
apply modification: prim ptr is now passed in a1 alongside args in a0. All existing primitives ignore a1, but any future primitive that uses a1 for anything else will silently break.
Symbol-table linear scan. intern walks the table from idx 0 on every call. LISP-C.md describes a 16384-slot open-addressing hash; we have a 1024-slot linear scan that exits with code 5 on overflow.

Test suite caveats

Issues in the test files themselves that need fixing or revisiting before the suite can be considered authoritative.

tests/scheme1/15-dot-symbol.scm — defines .foo (a leading . identifier). LISP.md says "a lone . is not a symbol — it's reserved for dotted-pair syntax", but the spec is silent on whether .foo is admissible. Behavior depends on whether the byte after . is whitespace/paren (handled by parse_list's peek). Useful as a regression test for the dotted-tail detector but not necessarily desired surface syntax.
tests/scheme1/19-letstar.scm — comment claims "outer x; let*'s x must shadow inside the body" but the test only checks the inner shadow path. Nothing exercises that the outer x is not affected after the let* body returns.
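
A sketch of a test body that would cover the missing half, using only define, let* and the 2-arg +. The name result is hypothetical, and how the file reports it is an assumption.

```scheme
(define x 1)

(define inner
  (let* ((x 10)          ; shadows the outer x inside the let*
         (y (+ x 1)))    ; sees the inner x, so y is 11
    y))

;; If the outer x is untouched after the let* returns, this is 11 + 1 = 12.
(define result (+ inner x))
```

Under the runner the file would presumably finish with something like (sys-exit result) paired with an .expected-exit of 12.
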
tests/scheme1/20-letrec.scm — uses (if n n (f #t)) to test letrec self-reference. Recurses once (n=44 → truthy → returns 44) so it doesn't actually trigger the recursive case. The comment acknowledges the workaround ("Without numeric primitives we terminate by passing #t at the recursive call"). Needs a real recursion test now that the let family + arith primitives are available; 21-letrec-recursion.scm partially fills this.
tests/scheme1/22-named-let.scm — recursion is bounded by a flag (first) flipping from #t to #f. Deep iteration not exercised.
tests/scheme1/27-apply.scm — only tests 2-arg (apply f arglist) and 3-arg (apply f x arglist). (apply f) is unspecified; (apply f a b … last) for N>3 is unverified.
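
A possible shape for the N>3 case, written with 2-arg arithmetic only. weigh and result are hypothetical names; each parameter gets a distinct weight so that a dropped or reordered argument changes the value.

```scheme
;; a + 2b + 4c + 8d + 16e, nested so every + and * takes exactly 2 args.
(define weigh
  (lambda (a b c d e)
    (+ a (+ (* 2 b) (+ (* 4 c) (+ (* 8 d) (* 16 e)))))))

;; Three leading arguments plus a two-element trailing list.
(define result (apply weigh 1 1 0 '(0 1)))
```

Here result is 1 + 2 + 0 + 0 + 16 = 19. Passing the same arguments in reverse order would give 25, so the value also pins argument order.
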
tests/scheme1/40-sys-argv.scm — hard-codes expected-exit = 2, the count of argv entries the runner happens to pass (./binary tests/scheme1/40-sys-argv.scm). Any change to scripts/run-tests.sh's invocation or a wrapper that injects extra args breaks this test silently.
tests/scheme1/41-fileio.scm — opens itself by reading (car (cdr (sys-argv))) and passing the bytevector as a path. Relies on the bv_capacity_for headroom invariant for NUL termination (no explicit chars->bv). Doesn't exercise the (#f . errno) branch of sys-openat (e.g., a non-existent path). Hard-codes O_RDONLY = 0 and mode = 0 instead of using named constants.
tests/scheme1/42-clone-wait.scm — bypasses sys-wait / decode-wait-status entirely; reads siginfo_t.si_status (offset 24) directly from the buffer with bytevector-u8-ref. Encodes Linux-x86_64-and-aarch64 siginfo layout; non-portable to other Linux ABIs and to any non-Linux target.
tests/scheme1/43-prelude.scm — verifies for-each only by running (for-each (lambda (x) x) ys) and checking it doesn't error; doesn't check that for-each actually invokes the lambda for each element (no side-effect verification).
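
One way to observe the calls without set! is to accumulate into a one-byte bytevector. This sketch assumes make-bytevector accepts a fill argument as in R7RS; acc and result are hypothetical names.

```scheme
;; A one-byte bytevector stands in for a mutable counter.
(define acc (make-bytevector 1 0))

(for-each
  (lambda (x)
    ;; acc <- acc * 2 + x, so the final value depends on call order too.
    (bytevector-u8-set! acc 0 (+ (* 2 (bytevector-u8-ref acc 0)) x)))
  '(1 2 3))

(define result (bytevector-u8-ref acc 0))
```

With left-to-right traversal the accumulator goes 1, 4, 11, so result is 11. A for-each that skipped an element or walked the list backwards would produce a different value.
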
tests/scheme1/44-shell-run.scm — name is misleading. It tests sys-wait + decode-wait-status against a sys-clone child but never calls run. run is what test 45 was supposed to cover, and it doesn't because of the spawn-via-run bug.
tests/scheme1/45-shell-spawn.scm — works around the prelude spawn bug by redefining spawn at user level. The prelude's spawn / run are therefore covered by zero passing tests.
tests/scheme1/38-record-internal-prims.scm — the %record-* primitives are tested via the Scheme surface; the only way to invoke them is through the public binding, which conflicts with LISP-C.md's "internal" classification.
No test verifies (set-car! …) / (set-cdr! …) — the primitives don't exist; spec requires them.
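
A minimal red test for this gap might look like the following. It builds the pair with cons rather than a quoted literal so the mutation is well defined; p and result are hypothetical names.

```scheme
(define p (cons 1 2))
(set-car! p 10)
(set-cdr! p 20)

;; 10 + 20 = 30 once both mutators exist and work.
(define result (+ (car p) (cdr p)))
```

This is expected to fail today because neither primitive is bound, which is the starting point the workflow asks for.
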
No test verifies that mutating a literal pair ('(1 2 3)) is UB — undefined behavior is policy, but the policy isn't pinned down by a test.
No test verifies tail-call correctness on deep recursion — named let, letrec, and the eval/apply tail positions all rely on %tail/%tailr, but nothing recurses thousands of times to confirm no host-stack growth.
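
A sketch of a deep-iteration test for the named-let path. The iteration count of 100000 is an arbitrary choice assumed to be large enough to overflow a host stack if tail calls were not eliminated; result is a hypothetical name.

```scheme
(define result
  (let loop ((n 100000))
    (if (= n 0)
        42
        (loop (- n 1)))))   ; tail position: should not grow the host stack
```

The loop returns a fixed small value (42) rather than the counter so that it can be reported through an exit code. Equivalent tests for letrec and for the eval/apply tail positions would follow the same pattern.
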
No (define x …) followed by (set! x …) test because set! doesn't exist.
No quoted-pair test ('(1 . 2)) — only quoted lists are tested. The reader handles dotted pairs but no test pins this.
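
A small test body that would pin the reader's dotted-pair handling, covering both a plain pair and an improper list. The names p, q and result are hypothetical.

```scheme
(define p '(1 . 2))      ; a single pair: car 1, cdr 2
(define q '(1 2 . 3))    ; improper list: the cdr of the second pair is 3

;; (1 + 2) + 3 = 6
(define result (+ (+ (car p) (cdr p)) (cdr (cdr q))))
```

The expected value is 6. The literals are only read, never mutated, so this stays clear of the literal-mutation question.
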
tests/scheme1/16-cond.scm — verifies short-circuit in the positive direction (later truthy clauses don't fire). Doesn't verify that a (cond) with no matching clause and no else returns UNSPEC (or whatever the policy is — currently it does, but unspecified by spec).

Suggested next steps before shipping

In rough priority order:

Track down and fix the prelude spawn-via-run bug; remove the workaround in test 45.
Fill in the spec-required primitives (equal?, eqv?, set-car!, set-cdr!, the comparison family, the bytevector family, the number/string converters).
Implement the missing special forms set! and pmatch.
Port shell.scm's port record + I/O wrappers.
Replace the 1024-slot linear-scan symtab with an open-addressing hash per LISP-C.md.
