boot2

Playing with the boostrap
git clone https://git.ryansepassi.com/git/boot2.git
Log | Files | Refs | README

pmatch

Spec for the pmatch special form. Companion to LISP.md §Special forms; implementation sketch in LISP-C.md §pmatch.

Why it's in the language

The Scheme subset has no macros, no case-lambda, no define-syntax. The self-hosted compiler destructures s-expressions on every form it walks ((if t a b), (lambda ps . body), (quote x), …); doing this by hand through car/cdr chains is tedious and error-prone. pmatch is the single concession: one built-in pattern matcher so the compiler reads at the form level. It is not a general macro system — admitting it specifically rules out admitting syntax-rules.

Surface syntax

(pmatch <expr> <clause> ...)

<clause> ::= ( <pattern> <body> ... )
           | ( <pattern> (guard <g-expr> ...) <body> ... )
           | ( else <body> ... )

Patterns

Pattern Matches
() the empty list (null? of subject)
<integer> that integer (equal? — same as = for fixnums)
<string> byte-for-byte equal bytevector (equal?)
#\c the integer that #\c denotes (equal?)
#t / #f that boolean (eq?)
<symbol> that exact symbol (eq?) — not a binder
,<ident> anything; binds <ident> to the subject
,_ anything; introduces no binding (wildcard)
(p1 p2 ... pN) proper list of length exactly N; each pi matches
(p1 ... pN . ptail) improper list; prefix matches p1..pN, tail matches ptail

Patterns nest: (if ,t (op ,a ,b) ,c) is well-formed. Sub-patterns are matched recursively against the corresponding sub-objects.

Malformed ,-patterns

(unquote), (unquote x y ...), and (unquote <non-symbol>) (e.g. ,42, ,(foo)) are all runtime errors, not undefined behavior. This is the only carve-out from the Primitive failure policy in LISP.md — pattern shape is a syntax error in the user's source, worth catching with a clear message. Implementation cost is three tag compares per binder match; see LISP-C.md §pmatch for the check.

Reader interaction

pmatch reuses the standard reader. To make ,<ident> writeable, the reader recognizes ,X as datum sugar for (unquote X) — the same shape R7RS gives it, but only as a datum form. There is no unquote evaluator and no quasiquote; the form exists purely so that pmatch patterns can be parsed. Writing ,x outside a pmatch pattern produces a list (unquote x), which eval will reject as an application of an unbound variable.

This is the only reader concession. ,@ (unquote-splicing) is not recognized.

Symbols vs. binders

This is the most common stumble. In a pattern:

So (if ,t ,a ,b) matches the head symbol if literally and binds t, a, b to the three subforms. (,head ,t ,a ,b) matches any 4-element list and binds head as well.

Wildcards

,_ matches anything and introduces no binding. Multiple ,_ in one pattern are fine. The identifier _ is otherwise unreserved — (define _ 0) is legal Scheme; _ outside a , is just a literal symbol pattern.

Improper-tail patterns

(p1 ... pN . ptail) first matches the prefix exactly like a proper-list pattern, then matches ptail against whatever cdr remains (which may be a proper list, an improper tail, or ()).

ptail is a full pattern, not just a binder. Common cases:

Example: (lambda ,ps . ,body) matches any list whose head is the symbol lambda and at least one more element; ps gets the second element, body gets the cdr from there.

Semantics

Clause selection

Clauses are tried top to bottom. The first clause whose pattern matches the subject and whose guards (if any) all evaluate to a true value is selected. Its body is evaluated in the extended environment; the value of the last body expression is the value of the pmatch.

else is a clause that always matches with no bindings; it must be the last clause if present.

Guards

( <pattern> (guard <g-expr> ...) <body> ... )

Zero <g-expr> is permitted but pointless: (p (guard) body ...) is the same as (p body ...).

Bindings

No-match

If no clause's pattern matches and no else is present, behavior is undefined (per the Primitive failure policy in LISP.md; expect a crash, not a defined error). Write an explicit (else (error "pmatch: no match" subject)) whenever the matcher is not provably exhaustive.

Evaluation order

Equality used for literals

Integer / string / character / boolean patterns match by equal?. Concretely: integers and characters compare via = on the underlying fixnum (characters are u8 fixnums per LISP.md); strings compare byte-for-byte; booleans compare by eq?. Symbol patterns match by eq? (symbols are interned).

Examples

Compiler form dispatch:

(pmatch e
  ((quote ,x)             (emit-literal x))
  ((if ,t ,a ,b)          (emit-if t a b))
  ((lambda ,ps . ,body)   (emit-lambda ps body))
  ((set! ,name ,v)        (emit-set! name v))
  ((,f . ,args)           (emit-call f args))
  (,x (guard (symbol? x)) (emit-var x))
  (,x (guard (integer? x))(emit-int x))
  (else                   (error "unknown form" e)))

Three things to notice:

  1. quote, if, lambda, set! are literal symbol patterns — they match those head symbols only.
  2. The (,f . ,args) clause must come after the special-form clauses because it would otherwise swallow them.
  3. The trailing ,x (guard ...) clauses fire on atoms (subject didn't match any list pattern); guards then split symbol vs. integer.

Destructuring a let clause:

(pmatch clause
  ((,name ,init)  (cons name init))
  (else           (error "bad let clause" clause)))

Walking a proper list with a tail binder:

(define (sum xs)
  (pmatch xs
    (()              0)
    ((,h . ,t)       (+ h (sum t)))))

What pmatch is not

Kept explicit so additions are deliberate.

Feature Status
,@<ident> segment / splice patterns not supported
... / Kleene-star repetition not supported
or / disjunction patterns not supported
and / conjunction patterns not supported
Predicate patterns ((? pred)) use (guard (pred ,x))
As-patterns ((<pat> :as ,name)) not supported
Non-linear patterns (repeated names) undefined behavior
User-extensible patterns not supported
Compilation / decision-tree caching not done; rewalk every call

If a use case demands one of these, the workaround is a guard plus auxiliary ,<ident> bindings, or splitting the match into nested pmatches. Anything beyond that is out of scope for v1.

Implementation contract (for LISP-C.md)

This section pins what the interpreter must guarantee so callers can rely on it; the actual P1 implementation lives in LISP-C.md.