pmatch
Spec for the pmatch special form. Companion to LISP.md §Special
forms; implementation sketch in LISP-C.md §pmatch.
Why it's in the language
The Scheme subset has no macros, no case-lambda, no define-syntax. The
self-hosted compiler destructures s-expressions on every form it walks
((if t a b), (lambda ps . body), (quote x), …); doing this by hand
through car/cdr chains is tedious and error-prone. pmatch is the
single concession: one built-in pattern matcher so the compiler reads at
the form level. It is not a general macro system — admitting it
specifically rules out admitting syntax-rules.
Surface syntax
(pmatch <expr> <clause> ...)
<clause> ::= ( <pattern> <body> ... )
| ( <pattern> (guard <g-expr> ...) <body> ... )
| ( else <body> ... )
<expr>is evaluated once; its value is the subject.<pattern>is a literal datum read by the standard reader; it is not evaluated.<body>is one or more expressions; the last is in tail position of the enclosingpmatch.<g-expr>are evaluated with the pattern's bindings in scope; see Guards below.
Patterns
| Pattern | Matches |
|---|---|
() |
the empty list (null? of subject) |
<integer> |
that integer (equal? — same as = for fixnums) |
<string> |
byte-for-byte equal bytevector (equal?) |
#\c |
the integer that #\c denotes (equal?) |
#t / #f |
that boolean (eq?) |
<symbol> |
that exact symbol (eq?) — not a binder |
,<ident> |
anything; binds <ident> to the subject |
,_ |
anything; introduces no binding (wildcard) |
(p1 p2 ... pN) |
proper list of length exactly N; each pi matches |
(p1 ... pN . ptail) |
improper list; prefix matches p1..pN, tail matches ptail |
Patterns nest: (if ,t (op ,a ,b) ,c) is well-formed. Sub-patterns are
matched recursively against the corresponding sub-objects.
Malformed ,-patterns
(unquote), (unquote x y ...), and (unquote <non-symbol>) (e.g.
,42, ,(foo)) are all runtime errors, not undefined behavior.
This is the only carve-out from the Primitive failure policy in
LISP.md — pattern shape is a syntax error in the user's source, worth
catching with a clear message. Implementation cost is three tag
compares per binder match; see LISP-C.md §pmatch for the check.
Reader interaction
pmatch reuses the standard reader. To make ,<ident> writeable, the
reader recognizes ,X as datum sugar for (unquote X) — the same shape
R7RS gives it, but only as a datum form. There is no unquote
evaluator and no quasiquote; the form exists purely so that pmatch
patterns can be parsed. Writing ,x outside a pmatch pattern produces a
list (unquote x), which eval will reject as an application of an
unbound variable.
This is the only reader concession. ,@ (unquote-splicing) is not
recognized.
Symbols vs. binders
This is the most common stumble. In a pattern:
foomatches the literal symbolfoo(byeq?). It does not bind.,foomatches anything and bindsfooto it.
So (if ,t ,a ,b) matches the head symbol if literally and binds t,
a, b to the three subforms. (,head ,t ,a ,b) matches any 4-element
list and binds head as well.
Wildcards
,_ matches anything and introduces no binding. Multiple ,_ in one
pattern are fine. The identifier _ is otherwise unreserved — (define _ 0) is legal Scheme; _ outside a , is just a literal symbol pattern.
Improper-tail patterns
(p1 ... pN . ptail) first matches the prefix exactly like a proper-list
pattern, then matches ptail against whatever cdr remains (which may be a
proper list, an improper tail, or ()).
ptail is a full pattern, not just a binder. Common cases:
ptail = ,rest—restbinds the remaining list (the most common form; works for both proper and improper tails).ptail = ()— equivalent to writing the proper-list pattern(p1 ... pN).ptail = (q1 ... qM . qtail)— keep destructuring further into the tail.
Example: (lambda ,ps . ,body) matches any list whose head is the symbol
lambda and at least one more element; ps gets the second element,
body gets the cdr from there.
Semantics
Clause selection
Clauses are tried top to bottom. The first clause whose pattern matches
the subject and whose guards (if any) all evaluate to a true value is
selected. Its body is evaluated in the extended environment; the value of
the last body expression is the value of the pmatch.
else is a clause that always matches with no bindings; it must be the
last clause if present.
Guards
( <pattern> (guard <g-expr> ...) <body> ... )
- Guards are evaluated only if the pattern structurally matched.
- Each
<g-expr>is evaluated in left-to-right order with the pattern's bindings visible. Evaluation short-circuits: the first<g-expr>that yields#faborts the guard. - All
<g-expr>truthy ⇒ the clause is selected. Any<g-expr>#f⇒ the clause is rejected, bindings are discarded, and matching continues with the next clause. - A guard rejection does not re-match the pattern against later positions in a list — the subject is fixed. (Same shape as guards in Erlang/ML: structural match, then filter.)
Zero <g-expr> is permitted but pointless: (p (guard) body ...) is the
same as (p body ...).
Bindings
- Each
,<ident>in the pattern introduces a fresh binding. Bindings are visible inside that clause's guards and body, and nowhere else. - Two
,<ident>with the same identifier in one pattern is undefined — the language does not promise to compare for equality (no Prolog-style unification) nor to reject the pattern. Don't write it. - Bindings shadow same-named names in the surrounding lexical scope, just
like
let.
No-match
If no clause's pattern matches and no else is present, behavior is
undefined (per the Primitive failure policy in LISP.md; expect a
crash, not a defined error). Write an explicit
(else (error "pmatch: no match" subject)) whenever the matcher is not
provably exhaustive.
Evaluation order
<expr>evaluated once before any pattern is tried.- Patterns themselves are not evaluated; they are walked structurally against the subject.
- For matching nested patterns, walk order is unspecified — patterns are pure datum tests against an already-evaluated subject, so order is not observable.
- The selected body is in tail position of the
pmatch. Tail calls in the body are proper tail calls of the surrounding context.
Equality used for literals
Integer / string / character / boolean patterns match by equal?.
Concretely: integers and characters compare via = on the underlying
fixnum (characters are u8 fixnums per LISP.md); strings compare
byte-for-byte; booleans compare by eq?. Symbol patterns match by eq?
(symbols are interned).
Examples
Compiler form dispatch:
(pmatch e
((quote ,x) (emit-literal x))
((if ,t ,a ,b) (emit-if t a b))
((lambda ,ps . ,body) (emit-lambda ps body))
((set! ,name ,v) (emit-set! name v))
((,f . ,args) (emit-call f args))
(,x (guard (symbol? x)) (emit-var x))
(,x (guard (integer? x))(emit-int x))
(else (error "unknown form" e)))
Three things to notice:
quote,if,lambda,set!are literal symbol patterns — they match those head symbols only.- The
(,f . ,args)clause must come after the special-form clauses because it would otherwise swallow them. - The trailing
,x (guard ...)clauses fire on atoms (subject didn't match any list pattern); guards then split symbol vs. integer.
Destructuring a let clause:
(pmatch clause
((,name ,init) (cons name init))
(else (error "bad let clause" clause)))
Walking a proper list with a tail binder:
(define (sum xs)
(pmatch xs
(() 0)
((,h . ,t) (+ h (sum t)))))
What pmatch is not
Kept explicit so additions are deliberate.
| Feature | Status |
|---|---|
,@<ident> segment / splice patterns |
not supported |
... / Kleene-star repetition |
not supported |
or / disjunction patterns |
not supported |
and / conjunction patterns |
not supported |
Predicate patterns ((? pred)) |
use (guard (pred ,x)) |
As-patterns ((<pat> :as ,name)) |
not supported |
| Non-linear patterns (repeated names) | undefined behavior |
| User-extensible patterns | not supported |
| Compilation / decision-tree caching | not done; rewalk every call |
If a use case demands one of these, the workaround is a guard plus
auxiliary ,<ident> bindings, or splitting the match into nested
pmatches. Anything beyond that is out of scope for v1.
Implementation contract (for LISP-C.md)
This section pins what the interpreter must guarantee so callers can rely on it; the actual P1 implementation lives in LISP-C.md.
pmatchis a built-in special form, not a primitive procedure. It does not exist as a value.- It receives unevaluated clauses and walks them itself.
- The matcher walks pattern and subject recursively. On match, it returns an environment extended with the new bindings (or fails). The driver tries clauses top-to-bottom, evaluates guards in the extended env, and tail-evaluates the body of the first selected clause.
- No pattern compilation; no decision-tree memoization. Patterns are re-walked per call. (See Primitive failure policy: re-add caching later if profiling shows it's worth it.)
- Bindings are stored in the same flat alist environment used by
let— no separate match-env type. - The body's last expression is reached via the same tail-call path as
begin/letbody, so tail recursion throughpmatchworks.