commit b8f536ed61f21edb813cdb1fbc20e6ff7c435cfa
parent 1233daa9e483569e46de1aff05ef023cbe92dc14
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 1 Jun 2026 17:33:42 -0700
plan: sketch RQL RSN
Diffstat:
| A | doc/plan/RQL.md | | | 927 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | doc/plan/RSN.md | | | 627 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
2 files changed, 1554 insertions(+), 0 deletions(-)
diff --git a/doc/plan/RQL.md b/doc/plan/RQL.md
@@ -0,0 +1,927 @@
+# RQL — a typed relational query language (planned work)
+
+A replacement for SQL: declarative data **query and manipulation**, but *typed,
+consistent, and close to the relational algebra*. `RQL` is a working name (TBD).
+
+SQL conflates three layers that should be separate — a surface syntax, a logical
+algebra, and a physical plan — and its surface does not map cleanly onto its own
+algebra. That mismatch is the source of most of SQL's irregularity. RQL keeps the
+layers apart: a small, total, well-typed **core algebra** (the IR) and a
+**pipeline surface** that maps onto it transparently. Same discipline as a
+compiler: surface → IR → plan, where the IR is where the laws live.
+
+This doc is the living design and the ledger of decisions. It records *what* we
+decided and *why*; open questions sit at the end and graduate up into the ledger
+as we settle them.
+
+## What we're correcting
+
+The specific SQL faults RQL is designed to remove:
+
+- **Clause order ≠ logical order.** `SELECT` is written first but evaluated
+ nearly last, which breaks composition, top-to-bottom reading, and completion.
+- **Three-valued logic.** `NULL` poisons predicates; `NOT IN` with a null is a
+ silent footgun; aggregates skip nulls inconsistently.
+- **`GROUP BY` conflates partitioning with aggregation**, forcing the "every
+ select item must be grouped or aggregated" rule and a separate `HAVING`.
+- **Weak typing & implicit coercion**; no sum types, no row polymorphism, errors
+ surface at runtime.
+- **Bags pretending to be sets**, with `DISTINCT` bolted on, ordered/duplicate
+ column names, and `SELECT *`.
+- **Universal quantification is miserable** — Codd's division ("supplies *all*
+ parts") has no clean SQL spelling.
+
+## Locked decisions
+
+Each entry is a decision plus a one-line rationale. Numbers are stable IDs in
+decision order; the grouping is thematic, so IDs need not run contiguously within
+a group.
+
+### Semantics
+
+1. **A relation is a set of distinct records.** A record (tuple) is an unordered
+ set of named, typed fields; a relation is a set of records over one heading.
+ Boolean-semiring sets, not bags. — Clean algebraic laws ⇒ sound optimizer
+ rewrites (predicate/projection pushdown, join reordering are real equalities);
+ `count` is meaningful (distinct), not "however many dupes were present."
+2. **Base tables must declare a key.** Distinctness is a checked property, not a
+ hope. — Makes "set" honest, and prevents aggregating over a lossy projection.
+3. **Multiplicity is data, not a second semantics.** When duplicates must be
+ counted, carry an explicit `count: Int` (or `weight`) field and weight
+ aggregates explicitly (`sum(group.amount * group.count)`). — Buys the bag / K-relation
+ (provenance-semiring) expressiveness *as a value*, without two equalities, a
+ truncating `minus`, or weight-aware aggregate magic. Accepted cost: a synthetic
+ id at ingest for genuinely keyless multiset data. Revisit if heavy event/log
+ ingestion makes that friction too high.
+4. **Ordering is not part of a relation.** `sort` / `take` / `window` produce a
+ **`Seq`**, a distinct type. — Relations are unordered; ordering is a
+ presentation/boundary concern and shouldn't pollute relational laws.
+
+### Missing values
+
+5. **No `NULL`. Optionality lives in the type** (`T?` = `Option T`), introduced
+ *only* by outer joins. Predicates are total, 2-valued booleans; defaults via
+ `??`. — Removes three-valued logic and the `NOT IN` / aggregate-skipping
+ footguns; "missing" becomes explicit and local.
+26. **Option semantics: total equality, `some`-widening, `where`-narrowing.**
+ `==` / `!=` on `T?` are total Bool (structural: `none == some x` is `false`); a
+ non-option widens to an option implicitly (`T <: T?`, the only implicit
+ conversion); `where x is some` (or `x != none`) narrows `x` to `T` downstream.
+ Ordering, arithmetic, and value-use on `T?` need explicit resolution. — Keeps
+ option handling total and ergonomic without reintroducing 3-valued logic
+ (refines #5).
+
+### Surface
+
+6. **Pipeline / algebra surface:** `R |> op |> op …`, one stage per core
+ operator, written in evaluation order. A pipeline is just a relation
+ expression (there is no `from` keyword), so it nests anywhere a relation is
+ expected. — Composition, top-to-bottom reading, and a 1:1 correspondence with
+ the IR. A comprehension (calculus) surface over the same IR is deferred, not
+ rejected.
+7. **Grouping only partitions; aggregation is ordinary expressions over the
+ relation-valued group field.** No `HAVING` (it is `where` after `group`); no
+ "must appear in GROUP BY" rule (referencing a non-key scalar after grouping is
+ a type error). — One uniform mechanism; the awkward SQL special cases vanish.
+
+### Scope
+
+8. **Query + DML + constraints, unified by relational assignment.** A database is
+ a set of relation variables; insert / update / delete are all `R := <relational
+ expression>`; keys, foreign keys, and arbitrary predicates are declared
+ constraints checked on write. — One mechanism (Third Manifesto / Tutorial D
+ style); set semantics makes assignment idempotent and total.
+
+### Types
+
+*(First draft — see "Type system" below. The shape is settled; specific
+inference rules and the keys-in-type ambition dial are open.)*
+
+9. **Row-polymorphic structural records.** Headings are structural record types;
+ `extend` / `project` / `rename` / `join` are polymorphic over "the other
+ fields" via a row variable, with a field-absence ("lacks") predicate.
+ Duplicate labels are forbidden (a heading can't name a field twice).
+10. **Named scalar domains, no implicit cross-domain coercion.** `Money` is not
+ `Float`; conversions are explicit. — Supports "typed and consistent."
+11. **Keys/FDs live in a metadata + inference layer, not the type proper.** Keys
+ are declared on base tables and *inferred* through operators alongside the
+ type; they power the fan-out-must-aggregate check, optimizer hints, and
+ warnings, but do not participate in type equality. — Pragmatic sweet spot;
+ leaves a clean path to full type-level keys if signatures ever need them.
+
+### Joins
+
+12. **Explicit-predicate joins + key-inference are the foundation; the
+ relationship/FK layer is deferred, additive sugar.** `join R on <predicate>`
+ is the primitive — it covers equi, theta, range, and self joins. When the
+ condition equates a key of one side, key-inference (decision #11) bounds the
+ join to ≤1 match (no fan-out) and feeds the must-aggregate warning; a many-to-
+ many join is flagged. A relationship layer — declared references, reverse
+ navigation, path expressions (`order.customer.region.name`), and
+ cardinality-typed "many" sides that *force* aggregation — rides on top later.
+ It is purely additive and breaks nothing. — Ships a typed join with a
+ cardinality story over arbitrary / un-annotated data first, and defers the
+ schema-declaration machinery (named relationships, reverse naming, path
+ resolution) until the core is proven.
+25. **Join result disambiguation: source-qualified names.** Every field carries
+ the source relvar/alias it came from; a bare name is sugar for unambiguous
+ access. A `join` forms the *qualified union* of both sides (no auto-merge):
+ same bare names coexist as distinct qualified fields and must be referenced
+ `source.field`. `join R as a …` supplies a qualifier (needed for self-joins);
+ `select` / `extend` emit fresh unqualified fields, so qualification is
+ join-local. Natural join (merge common fields, equal types required) is opt-in
+ (`join R natural`). — Refines #9: labels are unique *qualified* names, bare
+ access must be unambiguous.
+
+### Aggregation
+
+13. **Grouping is partition + ordinary expressions; there is no "must-group"
+ rule.** The block form `group by K { … }` binds the per-group relation to the
+ keyword `group` and desugars to `group K into g |> extend … |> remove g`.
+ `group` is a relation; aggregates — written as function calls, `count(group)`,
+ `mean(group.salary)` — are the only relation→scalar bridge; `group.f` where a
+ scalar is expected is a plain type error. — Recovers SQL's grouping rule from
+ the type system; `having` is just `where` after the block.
+14. **Aggregates are commutative monoids; column extraction is a bag.**
+ `group.f` yields one value *per row* (a bag), not a dedup'd set, so sets +
+ keys make aggregation correct with no multiplicity bookkeeping. Order-
+ dependent reductions (first, in-order concat, lag/lead) are *not* aggregates
+ — they live on the `Seq` layer. — A clean dividing line that also explains
+ why windows are separate.
+15. **Empty / option / fan-out aggregation is explicit, never silent.** Empty
+ input → option (`mean`/`min`/`max` → `None`) or identity (`sum`/`count` →
+ `0`); aggregating an option column is a type error (resolve first);
+ duplicate-sensitive aggregates over a fanned-out field warn. — Replaces SQL's
+ silent NULLs, null-skipping, and double-counting with types plus one warning.
+
+### Manipulation (DML)
+
+16. **A database is a set of relvars; the only mutation primitive is relational
+ assignment `R := <expr>`.** Insert / delete / update are sugar over it,
+ reusing the query operators and expressions; assignment requires a *static
+ heading match* (no positional columns). Set semantics ⇒ insert/delete
+ idempotent, duplicate-key insert rejected (no silent upsert). Adds the `with`
+ operator (override an existing field) as the typed sibling of `extend` (add a
+ new field). — One mechanism; query and mutation share a language.
+17. **Constraints are boolean relational expressions that must hold after every
+ transaction.** Keys and FKs are named, incrementally-checked special cases;
+ tuple-level *and* database-level (multi-relvar) checks are unified and
+ first-class. Referential actions (restrict default / cascade / set-null) are
+ declared repairs applied within the transaction. — Cross-relation invariants
+ SQL cannot express become ordinary, checkable booleans.
+18. **The transaction is the checking and atomicity boundary.** A batch of
+ assignments is applied together with all constraints checked once at commit;
+ any violation rolls the batch back atomically; a bare assignment is a
+ singleton transaction. — Makes FKs and cascades coherent (parent + child
+ together) and gives one consistency boundary.
+
+### Recursion
+
+19. **Recursion is a restricted least fixpoint: monotone, range-restricted, with
+ stratified negation.** `recursive R = base |> union (… R …)`, terminating by
+ construction (semi-naive evaluation); the result is an ordinary relation;
+ `closure(...)` is sugar. — First-class reachability / ancestry / BOM
+ explosion without SQL's recursive-CTE pain or unrestricted-recursion
+ non-termination. Recursion *with aggregation* (shortest paths) is deferred.
+
+### Ordered data & windows
+
+20. **The `Seq` layer carries all order-dependent computation.** `sort by`
+ produces `Seq R`; scans (`Seq → Seq`: running/moving/`lag`/`lead`/rank) and
+ ordered folds (`Seq → scalar`: `first`/`last`/`concat`) live there, never in
+ relational aggregates. The `Rel`/`Seq` type boundary *enforces* decision #14.
+ — Order is a type, not a convention.
+21. **Windows reuse partitioning + sorting + `extend`, not a bespoke `OVER`.**
+ `partition by P order by S extend {…}` desugars to `group P into g |> with g =
+ (g |> sort by S |> extend <window-exprs>) |> ungroup g`; frames are relative
+ ranges (`rows[..current]`, `rows[-3..0]`); `lag`/`lead`/`rank` are `Seq`
+ built-ins; output is a `Seq`. — Windows fall out of pieces we already have.
+
+### Concrete syntax
+
+22. **`=` binds values, `:` ascribes types; a bare field name is punning; `--`
+ starts a comment.** Records/tuples are `{ f = e }`, headings and field decls
+ are `{ f: T }`, and `x` in a record / select / group block means `x = x`. —
+ Removes the value/type brace ambiguity in the early sketches.
+23. **`.` is type-directed.** `e.f` is field access on a record (→ scalar), column
+ extraction on a relation (→ `Col T`, a per-row bag), or option-propagating on
+ `T?` (→ `T'?`). A `Col` is legal only as an aggregate / per-row argument; used
+ as a scalar it is a type error — which *is* the no-must-group rule. — One
+ operator, three meanings, all from the operand's type.
+24. **Pipelines are bare relation expressions; aggregates are function calls; the
+ group block binds `group`.** No `from` keyword (a pipeline nests anywhere a
+ relation is expected; `from` survives only in `delete from` / `insert … from`).
+ Aggregates use call syntax (`count(group)`); the sugar `group by K {…}` binds
+ the keyword `group` (`group K into name` for nesting). `select {…}` is sugar
+ (extend + project), `project` / `remove` are pure π, and `let` names a relation
+ expression (replacing CTEs). — Maximal composability, one call syntax.
+
+## Type system (draft)
+
+### Type vocabulary
+
+- **Scalar domains:** `Int`, `Float`, `Bool`, `Text`, `Time`, … and user-defined
+ domains over them. No implicit coercion across domains.
+- **Option:** `T?` = `Option T`. Only outer joins introduce it.
+- **Record (heading):** `{ a: T1, b: T2, … }` — unordered, uniquely-named,
+ typed fields. Field order is irrelevant.
+- **Relation:** `Rel R`, a set of records over heading `R`, e.g.
+ `Rel {sensor: Id, at: Time, value: Float}`.
+- **Sequence:** `Seq R`, an *ordered* list of records (may repeat). Produced by
+ `sort`; the home of order-dependent operations (windows, top-n).
+- **Nested relation:** a field may itself be `Rel R'` (relation-valued attribute).
+ This is how `group` works; it appears only as an intermediate.
+
+### Core operators (the IR)
+
+Each is total and (mostly) `Rel → Rel`. Row variable `ρ` stands for "the other
+fields"; `f ∉ ρ` is the lacks predicate.
+
+```
+restrict (p: {ρ} -> Bool) : Rel {ρ} -> Rel {ρ}
+extend (f = e: T) : Rel {ρ} -> Rel {ρ, f: T} [f ∉ ρ]
+with (f = e: T') : Rel {f: T | ρ} -> Rel {f: T' | ρ} [f ∈ ρ]
+project F : Rel {F | ρ} -> Rel {F}
+remove F : Rel {F | ρ} -> Rel {ρ}
+rename a -> b : Rel {a: T | ρ} -> Rel {b: T | ρ} [b ∉ ρ]
+join : Rel R1 -> Rel R2 -> Rel (R1 ⋈ R2)
+union/minus/intersect : Rel R -> Rel R -> Rel R
+group K into g : Rel {K, A} -> Rel {K, g: Rel {A}}
+ungroup g : Rel {K, g: Rel A} -> Rel {K, A}
+sort by e : Rel R -> Seq R (→ Seq layer)
+```
+
+Aggregates are ordinary function calls `Rel A -> scalar` (`count`, `sum`, `mean`,
+…), used inside `extend` after `group`; `count(group)` is set cardinality.
+
+### Join headings & disambiguation
+
+The default `join` (explicit predicate, decision #12) forms the **qualified
+union** of the two headings: every field keeps the source relvar/alias it came
+from, and same-named fields from the two sides coexist as *distinct* qualified
+fields rather than being merged. A **bare name is sugar for unambiguous access**;
+when a name appears on both sides you must qualify it `source.field`:
+
+```
+employees |> join departments on dept == departments.id
+-- both sides have `name`, so it stays qualified:
+|> select { who = employees.name, team = departments.name }
+```
+
+- **Aliases:** `join R as a on …` supplies the qualifier `a` — required for
+ self-joins (`join employees as mgr on manager_id == mgr.id`).
+- **Equi-join keys** are kept as the two distinct fields they are (`dept` and
+ `departments.id` above); a `select` or natural join collapses them if wanted.
+- **`select` / `extend` clean up:** the fields they emit are fresh and
+ unqualified, so qualification is a transient, join-local concern.
+- **Natural join is opt-in:** `join R natural` merges the fields common to both
+ (which must then have equal types) into single bare fields and joins on their
+ equality — the Tutorial-D behavior, now explicit rather than the default.
+
+This refines decision #9: a heading's labels are unique *qualified* names; bare
+access must resolve to exactly one of them, or it is a type error.
+
+### Keys and functional dependencies
+
+Keys are tracked beyond base-table declarations and **inferred through
+operators**, so the compiler can reason about cardinality:
+
+- `restrict`: keys preserved.
+- `project F`: a declared key survives iff it ⊆ F; the full projected heading is
+ always a trivial superkey (sets auto-dedup).
+- `group K into g`: `K` becomes the key of the result by construction.
+- `join` on fields `J`: if `J` is a key of one side, that side contributes at
+ most one match (no fan-out) and the other side's key is preserved; if `J` keys
+ neither side, the join may fan out and the result key is the union of both.
+
+**Payoff:** the compiler knows when a join fans out, so accidental row
+multiplication feeding a `sum`/`count` becomes *detectable* rather than silently
+wrong. This is the safety net behind decisions #1–#2.
+
+Resolved (decision #11): keys/FDs live in a **metadata + inference layer**, not
+the type proper — propagated for cardinality reasoning, the fan-out check, and
+optimizer hints, but not part of type equality.
+
+### Inference
+
+Pipeline stages are fully inferred (Hindley–Milner over rows); base tables and
+domains are annotated. Query result types are inferred and checkable. Basis:
+Rémy/Wand-style records with absence + row unification (closer to that than to
+Leijen's scoped labels, since relations forbid duplicate labels).
+
+## Aggregation & grouping
+
+### Grouping = partition + ordinary expressions
+
+`group by K { … }` is sugar. The primitive is `group K into g` (nest): it turns
+`Rel {K, A}` into `Rel {K, g: Rel {A}}` — one row per distinct `K`, with the
+matching rows collected into the relation-valued field `g`. The block form adds
+fields computed from `g` and drops `g`:
+
+```
+R |> group by k1, k2 { a = count(group), b = mean(group.salary) }
+
+-- desugars to (the block binds the keyword `group`):
+R |> group (k1, k2) into g
+ |> extend a = count(g), b = mean(g.salary)
+ |> remove g
+```
+
+`group` is in scope in the block; *output* it to keep the nested relation
+(`{ emps = group, n = count(group) }` is a department with its list of
+employees), or omit it and it's dropped.
+
+### Why there's no "must appear in GROUP BY" rule
+
+After grouping, `k1, k2` are scalar key fields and `group` is a *relation*. No
+special rule is needed — it falls out of types:
+
+- `group.salary` is a **column**: the `salary` values across the group's rows (a
+ bag — see below), not a scalar.
+- An **aggregate** is the only thing that turns a column/relation into a scalar
+ (`count(group) : Int`, `mean(group.salary) : Float`).
+- `group.salary` where a scalar is expected is a type error. That *is* SQL's
+ "grouped or aggregated" rule, recovered for free.
+
+### Columns vs. projection — why aggregation is correct
+
+`group.f` (column extraction) yields one value **per row** — a bag — while
+`project {f}` yields a **set** (dedup'd). They differ exactly on duplicates, and
+that is what makes aggregation correct without multiplicity bookkeeping: base
+tables have keys (#2), so a group's rows are distinct *real* rows, so
+`group.salary` has one entry per real row and `sum(group.salary)` is right.
+Decisions #1 + #2 buy footgun-free aggregation; the explicit-`count` hatch (#3) is
+only needed once you deliberately collapse to a keyless form.
+
+An aggregate's argument is a **per-row expression** over the group: `group.f`
+binds the current row's field, so `sum(group.qty * group.price)` reduces
+`qty*price` row by row. Equivalent comprehension form:
+`sum(r.qty * r.price for r in group)`.
+
+### The aggregate functions
+
+Relational aggregates must be **commutative monoids** (sets are unordered ⇒ the
+reduction must be order-independent): `count`, `sum`, `min`, `max`, `mean`,
+`and`/`or`, `count(unique c)`. Order-dependent reductions are *not* relational
+aggregates — they belong to the `Seq` layer. Empty-input behavior uses options,
+not a silent NULL:
+
+| aggregate | empty input | type |
+|-----------------|-------------|---------|
+| `count` | `0` | `Int` |
+| `sum` | `0` | `Num` |
+| `mean/min/max` | `None` | `T?` |
+
+`group by`-produced groups are non-empty by construction, so the metadata layer
+narrows `mean`/`min`/`max` back to `T` there. Aggregating an **option column**
+(`Col T?`) is a **type error** — resolve first (`group.salary ?? 0`, or filter) —
+which removes SQL's inconsistent null-skipping.
+
+### Whole-relation aggregation
+
+`aggregate { … }` is `group by ()` — one group, one row out:
+
+```
+sales |> aggregate { total = sum(group.amount), n = count(group) }
+```
+
+### Nested / multi-level aggregation
+
+`g` is a relation, so any relational expression — including another `group by` —
+applies to it. Aggregating subtables composes:
+
+```
+orders
+|> group by region {
+ revenue = sum(group.amount),
+ by_month = (group |> group by month { m_rev = sum(group.amount) }), -- inner `group` shadows outer
+ }
+```
+
+### Fan-out detection (the safety net)
+
+A field `f` from base table `T` (key `KT`) is determined by `KT`. After a many-to-
+many join the relation's key grows to `KT ∪ KS`, so `f`'s values repeat across the
+extra dimension. Rule: a **duplicate-sensitive** aggregate (`sum`, `mean`,
+`count`) over a field whose determinant key is *coarser* than the relation's
+current key is flagged — its values are being double-counted; **duplicate-
+insensitive** aggregates (`min`, `max`) are exempt. This is the concrete cash-out
+of key-inference (#11): the double-count SQL does silently becomes a warning.
+
+## DML & constraints
+
+### Everything is relational assignment
+
+A database is a set of **relvars** (relation variables); `table T { … }` declares
+one. A relvar holds a relation *value*; the sole mutation primitive is
+**relational assignment**, `T := <relational expression>`, where `expr`'s heading
+must match `T`'s exactly (statically checked — no positional column matching). The
+three DML verbs are **sugar** over assignment, reusing the *same* operators and
+expressions as queries:
+
+```
+insert into R from <expr> --> R := R union <expr>
+insert into R { id = 7, … } --> R := R union (rel{ …one tuple… })
+delete from R where p --> R := R |> where not p
+update R where p set { f = e } --> R := (R |> where not p)
+ union (R |> where p |> with { f = e })
+```
+
+`with { f = e }` overrides an existing field (the override sibling of `extend`,
+which only adds); using the wrong one is a typed error, catching the `set`-typos
+SQL silently accepts. Set semantics makes these well-behaved: re-inserting an
+identical tuple is a **no-op** (`R ∪ {t}`, `t ∈ R`); inserting a *different* tuple
+with an existing key violates the key constraint and is **rejected** (no silent
+upsert); deleting absent rows is a no-op. Insert/delete are idempotent.
+
+### Constraints are booleans that must hold
+
+A constraint is a **boolean relational expression over the database that must be
+true after every transaction.** Keys and FKs are named, incrementally-checked
+special cases of that one idea:
+
+- **Key** (#2): `key (id)`, `key (a, b)` — the projection onto the key is
+ injective; multiple candidate keys allowed.
+- **Foreign key:** `dept: Id references departments(id)` — an inclusion
+ constraint, `R{dept} ⊆ departments{id}`; a nullable FK (`dept: Id?`) admits
+ `None`. Parent-side referential actions — **restrict** (default), **cascade**,
+ **set-null** — are declared repairs applied within the transaction before the
+ check.
+- **Check:** an arbitrary predicate. Tuple-level (`check salary > 0`) is the
+ common case; **database-level** constraints spanning relvars are first-class —
+ exactly what SQL cannot express:
+
+ ```
+ check count(employees |> where role == "CEO") <= 1
+ check (employees{dept} ⊆ departments{id}) -- an FK, spelled out
+ ```
+
+Any assignment whose result would falsify a constraint is rejected; the relvar is
+left unchanged.
+
+### Transactions are the checking boundary
+
+Because constraints span relvars, the unit of checking is the **transaction**: a
+batch of assignments applied together, all constraints checked once at commit, any
+violation rolling the whole batch back atomically. A bare assignment is a
+singleton transaction. This is what lets a parent and child be inserted together
+past an FK, and what makes cascades coherent.
+
+```
+transaction {
+ insert into departments { id = 3, name = "Eng" }
+ insert into employees { id = 7, name = "Ann", dept = 3 }
+} -- both visible together; constraints checked once
+```
+
+### Views
+
+A **view** is a named relation *expression*, not stored: `view active =
+employees |> where active`. Read-only to start; making views updatable (mapping a
+view-assignment back onto base relvars — the view-update problem) is deferred.
+
+### What this buys
+
+Mutation reuses the entire query language — no separate DML dialect (contrast
+SQL's bespoke `UPDATE … SET`). Three verbs collapse to one primitive, so they
+compose and can't interact surprisingly. Constraints unify to "invariants that
+hold after every transaction," with multi-relvar invariants first-class. Static
+heading-match kills positional-column bugs. The transaction is simultaneously the
+atomicity and the constraint boundary.
+
+## Recursion
+
+Restricted to a **least fixpoint of a monotone, range-restricted relational
+expression** — enough for reachability, ancestry, and BOM explosion, and
+guaranteed to terminate. A recursive relation is defined with `recursive`:
+
+```
+-- edges: Rel {src: Id, dst: Id, key (src, dst)}
+recursive reach =
+ edges
+ |> union (reach |> join edges on dst == edges.src
+ |> select { src, dst = edges.dst })
+```
+
+What keeps it total:
+
+- **Monotone:** the recursive name may not appear under `minus`/`not` — negation
+ is **stratified** (it may only refer to a lower stratum), so each iteration only
+ *adds* tuples.
+- **Range-restricted:** every output value comes from the base/input, none are
+ invented (no unbounded arithmetic in the recursive arm). Monotone +
+ range-restricted + finite input ⇒ the least fixpoint is reached in finitely many
+ steps.
+
+The result is an ordinary relation, so it composes with everything else (filter /
+join / group the closure). The common case has sugar — `closure(edges: src ->
+dst)` expands to the fixpoint above. Recursion *with aggregation* (shortest paths,
+PageRank-style iteration) needs well-founded aggregate semantics and is deferred.
+
+## Windows & the `Seq` layer
+
+### Principle
+
+Relations are unordered (#4); **ordering produces a `Seq`**, and *all*
+order-dependent computation lives there — never in relational aggregates, which
+must be commutative monoids (#14). `sort by e : Rel R -> Seq R` is the gateway,
+and the `Rel`/`Seq` type boundary is what enforces #14: you cannot write a running
+total over a `Rel`, nor a duplicate-collapsing set op over a `Seq`, without
+crossing the boundary explicitly. A `Seq` supports two kinds of computation:
+
+- **Scans** (`Seq R -> Seq R'`): give every row a value from its position,
+ neighbors, or a frame — running totals, moving averages, `lag`/`lead`, ranks.
+ Row count preserved.
+- **Ordered folds** (`Seq R -> scalar`): reduce in order — `first`, `last`,
+ `concat(sep)`. (Order-*independent* reductions remain relational aggregates;
+ these are the ones that genuinely need order.)
+
+Plus slicers `take n`, `drop n`, `take while p`, which subsume top-N and
+pagination (`sort by x desc |> take 10`).
+
+### Window expressions (scans)
+
+A scan adds a column via `extend`, where the expression sees ordered context:
+
+- **Frame aggregate:** `agg(e) over <frame>`, an aggregate over a range relative
+ to the current row:
+ - `rows[..current]` — running (unbounded preceding → current)
+ - `rows[current..]` — current → end
+ - `rows[-3..0]` — 3 preceding → current (a width-4 moving window)
+ - `rows[-1..1]` — immediate neighbors
+- **Offsets:** `lag(e, k, default)`, `lead(e, k, default)` — value k rows away.
+- **Ranking:** `row_number()`, `rank()`, `dense_rank()`, `ntile(n)` — read the
+ sort order and ties.
+
+### Partitioning reuses `group`
+
+`PARTITION BY` is just "window within each group independently," and we already
+have grouping. So:
+
+```
+trades
+|> partition by symbol order by ts
+ extend running = sum(amount) over rows[..current],
+ prev = lag(price, 1),
+ r = rank()
+```
+
+is sugar for group / sort-within / position-aware `extend` / ungroup:
+
+```
+trades
+|> group symbol into g
+|> with g = (g |> sort by ts
+ |> extend running = sum(amount) over rows[..current],
+ prev = lag(price, 1),
+ r = rank())
+|> ungroup g
+```
+
+The result is a **`Seq`** carrying within-partition order; cross-partition order
+is unspecified until a final `sort by`. Rows stay distinct, so it can be taken
+back to a `Rel` with `as rel` if you want a set.
+
+### Idioms
+
+```
+-- first row per partition
+events
+|> partition by user order by ts extend rn = row_number()
+|> where rn == 1
+
+-- 7-day moving average
+… |> partition by sku order by day extend ma = mean(qty) over rows[-6..0]
+
+-- in-order audit trail per case (ordered fold)
+steps |> group by case { trail = (group |> sort by ts |> concat(name, " -> ")) }
+```
+
+### Why this beats SQL windows
+
+- No bespoke `OVER (PARTITION BY … ORDER BY … ROWS BETWEEN …)` sub-language:
+ partitioning is the `group` you already know, ordering is `sort by`, the window
+ value is an `extend`.
+- The `Rel`/`Seq` boundary makes order-dependence *visible in the type* and
+ enforces the aggregate/scan split (#14) instead of leaving it to convention.
+- Frames are terse, and ordered folds (`first`/`concat`) are just `Seq`
+ functions, not special cases.
+
+## Concrete syntax (grammar)
+
+A first concrete grammar — enough to parse the examples above and to pin the
+ambiguities. Notation: `(…)` grouping, `*` zero-or-more, `?` optional, `|` choice;
+UPPERCASE = tokens; `--` starts a line comment.
+
+### Module & declarations
+
+```
+module ::= item*
+item ::= domainDecl | tableDecl | viewDecl | letDecl | recDecl
+ | constraintDecl | stmt
+
+domainDecl ::= "domain" IDENT "=" type ("check" "(" expr ")")?
+tableDecl ::= "table" IDENT "{" field ("," field)* ("," tableCon)* ","? "}"
+field ::= IDENT ":" type ("references" IDENT "(" IDENT ")" refAction*)?
+refAction ::= "on" ("delete" | "update") ("restrict" | "cascade" | "set" "null")
+tableCon ::= "key" "(" IDENT ("," IDENT)* ")" | "check" "(" expr ")"
+viewDecl ::= "view" IDENT "=" relExpr
+letDecl ::= "let" IDENT "=" relExpr -- names a relation (replaces CTEs)
+recDecl ::= "recursive" IDENT "=" relExpr -- monotone, range-restricted
+constraintDecl ::= "constraint" IDENT "check" "(" expr ")"
+```
+
+### Statements
+
+```
+stmt ::= IDENT ":=" relExpr -- relational assignment
+ | "insert" "into" IDENT (record | "from" relExpr)
+ | "delete" "from" IDENT "where" expr
+ | "update" IDENT "where" expr "set" record
+ | "transaction" "{" stmt* "}"
+ | relExpr -- a query to evaluate
+```
+
+### Relation expressions (pipelines)
+
+```
+relExpr ::= relPrim ("|>" stage)*
+relPrim ::= IDENT | "(" relExpr ")" | relLit
+ | "closure" "(" IDENT ":" IDENT "->" IDENT ")"
+relLit ::= "rel" "{" record ("," record)* ","? "}"
+
+stage ::= "where" expr
+ | "extend" bindings | "with" bindings
+ | "select" block | "project" nameBlock | "remove" nameBlock
+ | "rename" IDENT "->" IDENT
+ | "join" relPrim ("as" IDENT)? ("on" expr | "natural")
+ | ("union" | "minus" | "intersect") relPrim
+ | "group" "by" nameList block? -- sugar: binds `group`
+ | "group" nameList "into" IDENT -- primitive nest
+ | "ungroup" IDENT
+ | "aggregate" block -- = group by ()
+ | "sort" "by" sortKey ("," sortKey)*
+ | "take" ("while" expr | expr) | "drop" expr
+ | "partition" "by" nameList "order" "by" sortKey ("," sortKey)*
+ "extend" bindings -- window; yields Seq
+
+sortKey ::= expr ("asc" | "desc")?
+bindings ::= binding | block
+binding ::= IDENT "=" expr
+block ::= "{" entry ("," entry)* ","? "}" ; entry ::= binding | IDENT -- punning
+nameBlock::= "{" nameList ","? "}" ; nameList ::= IDENT ("," IDENT)*
+record ::= "{" entry ("," entry)* ","? "}" -- value tuple
+```
+
+### Types
+
+```
+type ::= (IDENT | "Rel" heading | "Seq" heading) "?"?
+heading ::= "{" IDENT ":" type ("," IDENT ":" type)* ","? "}"
+```
+
+### Scalar expressions (low → high precedence)
+
+```
+expr ::= or ("??" or)? -- option default
+or ::= and ("or" and)*
+and ::= cmp ("and" cmp)*
+cmp ::= add (cmpOp add | "is" ("some" | "none"))? -- non-associative
+cmpOp ::= "==" | "!=" | "<" | "<=" | ">" | ">=" | "in" | "subset"
+add ::= mul (("+" | "-") mul)*
+mul ::= un (("*" | "/" | "%") un)*
+un ::= ("-" | "not") un | cast
+cast ::= post ("as" type)?
+post ::= prim trailer*
+trailer ::= "." IDENT | "(" callArgs? ")" | "over" frame
+prim ::= LITERAL | IDENT | "(" expr ")" | quant
+quant ::= ("all" | "any") "(" expr "for" IDENT "in" relExpr ")"
+frame ::= "rows" "[" bound? ".." bound? "]" ; bound ::= "current" | INT
+callArgs ::= expr ("," expr)* -- function / aggregate args
+ | expr "for" IDENT "in" relExpr -- aggregate comprehension form
+LITERAL ::= INT | FLOAT | TEXT | "true" | "false" | "none" | "some" "(" expr ")"
+```
+
+Numeric literals allow `_` digit separators (`50_000`); the `50k`-style shorthand
+in earlier prose was informal.
+
+### Semantic clarifications
+
+These resolve the ambiguities the prose left open; they are the canonical reading
+and supersede any looser earlier example.
+
+1. **`.` is type-directed (#23).** `e.f` is field access on a record (→ scalar),
+ **column extraction** on a relation (→ `Col T`, a per-row bag), or option-
+ propagating access on `T?` (→ `T'?`). A `Col` is only ever an aggregate
+ argument; used as a scalar it is a type error — that *is* the no-must-group
+ rule.
+2. **`=` vs `:` (#22).** `=` binds a value (`{ id = 7 }`); `:` ascribes a type
+ (`{ id: Id }`). A bare `x` in a block is punning for `x = x`.
+3. **Pipelines are expressions; no `from` (#24).** A query is a relation
+ expression, so `R |> …` nests anywhere a relation is expected (join operands,
+ nested groups, `let` / `view` bodies). `from` appears only in `delete from` /
+ `insert … from`.
+4. **`group` keyword; `into` for naming.** `group by K { … }` binds the per-group
+ relation to `group`; nest with the `group K into name` primitive to avoid
+ shadowing.
+5. **Aggregates are calls.** `count(group)`, `mean(group.salary)`,
+ `sum(group.qty * group.price)`; the arguments are per-row expressions over the
+ group that the aggregate reduces. The comprehension form `sum(e for r in R)` is
+ equivalent.
+6. **`select` vs `project`.** `project {a, b}` is pure π (existing names only);
+ `select { a, x = e }` builds an exact output record and desugars to
+ `extend` (computed) + `project` (to the listed fields).
+7. **Frames.** In `over rows[a..b]`: `current` ≡ 0, negative = preceding, positive
+ = following, an empty side = unbounded — `rows[..current]` running,
+ `rows[-3..0]` width-4 trailing, `rows[-1..1]` neighbors.
+8. **Option discipline.** `==` / `!=` on options are **total** Bool (structural:
+ `none == some x` is `false`), and a non-option widens to an option implicitly
+ (`T <: T?` — the only implicit conversion). Ordering, arithmetic, and value-use
+ on a `T?` need resolution first (`??`, or narrowing via `where x is some`);
+ there is no silent null propagation.
+9. **Qualified names (#25).** A field carries its source relvar/alias. A bare name
+ must resolve unambiguously; after a `join`, a name present on both sides is
+ written `source.field`, and `join R as a` aliases the source (needed for
+ self-joins). In row context, a leading identifier that names a source qualifier
+ is qualified field access; otherwise `.` is the type-directed access of
+ clarification 1.
+
+## Showcase
+
+One schema exercised end-to-end in canonical syntax: schema + constraints, then a
+battery of queries, windows, recursion, and DML. Everything here is intended to
+parse under the grammar above and type-check under the rules above.
+
+### Schema
+
+```
+domain Id = Int
+domain Money = Int check (value >= 0) -- minor units; `value` is the domain value
+
+-- FK direction is declarative; declaration order and cycles (departments.head ⇄
+-- employees.dept) are fine — the schema is checked as a whole.
+
+table departments {
+ id: Id,
+ name: Text,
+ budget: Money,
+ head: Id? references employees(id), -- nullable FK: a dept may have no head
+ key (id),
+ key (name), -- a second candidate key
+}
+
+table employees {
+ id: Id,
+ name: Text,
+ email: Text,
+ dept: Id references departments(id) on delete restrict,
+ manager: Id? references employees(id), -- self-ref, nullable (the CEO has none)
+ salary: Money,
+ hired: Time,
+ key (id),
+ key (email),
+ check (salary > 0),
+}
+
+table projects {
+ id: Id,
+ name: Text,
+ dept: Id references departments(id),
+ key (id),
+}
+
+table assignments { -- employees ⋈ projects (many-to-many)
+ emp: Id references employees(id) on delete cascade,
+ project: Id references projects(id) on delete cascade,
+ hours: Int check (hours >= 0),
+ key (emp, project),
+}
+
+-- a database-level invariant SQL cannot state directly:
+constraint one_ceo check (count(employees |> where manager == none) <= 1)
+
+let over_budget =
+ employees
+ |> group by dept { payroll = sum(group.salary) }
+ |> join departments on dept == departments.id
+ |> where payroll > budget
+constraint payroll_within_budget check (count(over_budget) == 0)
+```
+
+### Queries
+
+```
+-- Q1 average salary per dept, only well-paid teams, sorted
+employees
+|> where salary > 50_000
+|> group by dept { headcount = count(group), avg_pay = mean(group.salary) }
+|> where avg_pay > 70_000 -- no HAVING; just `where` after group
+|> join departments on dept == departments.id
+|> select { team = departments.name, headcount, avg_pay }
+|> sort by avg_pay desc -- result is a Seq
+
+-- Q2 each employee with their manager (self-join; a None manager matches no row)
+employees
+|> join employees as mgr on employees.manager == mgr.id
+|> select { who = employees.name, boss = mgr.name }
+-- use `left join` instead to keep the CEO, with boss = none
+
+-- Q3 total hours and head-count per project
+assignments
+|> group by project { total_hours = sum(group.hours), people = count(group) }
+|> join projects on project == projects.id
+|> select { project = projects.name, total_hours, people }
+|> sort by total_hours desc
+
+-- Q4 fan-out: the compiler catches a double-count at compile time
+employees
+|> join assignments on id == assignments.emp -- 1:many ⇒ each emp recurs
+|> group by dept { payroll = sum(group.salary) } -- ⚠ salary double-counted per assignment
+-- correct (employee grain, no fan-out):
+employees |> group by dept { payroll = sum(group.salary) }
+
+-- Q5 top-3 earners per department (window ⇒ Seq)
+employees
+|> partition by dept order by salary desc extend rank = rank()
+|> where rank <= 3
+|> sort by dept, rank
+
+-- Q6 cumulative payroll as people are hired (window scan)
+employees
+|> partition by dept order by hired
+ extend cumulative_payroll = sum(salary) over rows[..current]
+|> select { dept, name, hired, cumulative_payroll }
+
+-- Q7 all transitive reports under employee #1 (recursion + qualified disambiguation)
+let reports = employees |> where manager is some -- narrows `manager` to Id
+ |> select { mgr = manager, emp = id }
+recursive chain =
+ reports
+ |> union (chain |> join reports on chain.emp == reports.mgr
+ |> select { mgr = chain.mgr, emp = reports.emp })
+chain
+|> where mgr == 1
+|> join employees on emp == employees.id
+|> select { report = employees.name }
+```
+
+### Mutation
+
+```
+-- 10% raise for one department
+update employees where dept == 3 set { salary = salary * 11 / 10 }
+
+-- hire + first assignment atomically (the assignment's FK to the new employee
+-- holds because both land in one transaction, checked once at commit)
+transaction {
+ insert into employees {
+ id = 42, name = "Bo", email = "bo@co", dept = 3,
+ manager = 7, salary = 9_000_000, hired = date("2026-06-01")
+ }
+ insert into assignments { emp = 42, project = 5, hours = 0 }
+}
+
+-- retire a project; its assignments cascade away
+delete from projects where id == 5
+```
+
+### Callouts
+
+- **Q2** `employees.manager == mgr.id`: `manager: Id?` against `mgr.id: Id` widens
+ by implicit `some` (#26); equality is total, so a None manager equals no row and
+ the CEO drops out of the inner join. `manager` must be qualified (both sides are
+ `employees`).
+- **Q4** the `⚠` is a *compile-time* warning from key-inference (#11, #15), not a
+ runtime surprise: `salary` is determined by `employees.id` but the joined rows
+ are keyed by `(emp, project)`.
+- **Q5 / Q6** output is a `Seq` (#20); `rank()` and `sum(…) over` are `Seq`-layer
+ operations, unavailable on a `Rel`.
+- **Q7** `where manager is some` narrows `manager` from `Id?` to `Id` (#26); the
+ recursive arm needs qualifiers (`chain.emp`, `reports.mgr`) because both inputs
+ carry `mgr` and `emp` (#25); `union` type-checks because both arms have heading
+ `{mgr, emp}`.
+
+## Open questions
+
+- **Name.** `RQL` is a placeholder.
+- **Relationship/FK layer (deferred).** The additive sugar from decision #12:
+ declared references, reverse navigation, path expressions, cardinality-typed
+ many-sides. When to build it; named-relationship & ambiguity rules; path
+ resolution.
+- **Comprehension surface.** Add a calculus surface over the same IR later, or
+ never?
+- **Value-based & peer frames.** SQL `RANGE` / `GROUPS` (frames by order-key
+ *value* or tie-peers) deferred; positional `rows[…]` frames first.
+- **Recursion with aggregation** (shortest paths, fixpoint + `min`/sum). Deferred
+ — needs well-founded aggregate-in-recursion semantics.
+- **Updatable views.** The view-update problem — when can a view assignment map
+ back to base relvars? Deferred.
+- **Defaults & generators.** Column defaults (constants / pure exprs) vs. impure
+ generators (identity/autoincrement, `now()`), which must arrive as explicit
+ host effects. Design TBD.
+- **Upsert / merge sugar.** A declared-intent form over assignment.
+- **Concurrency / isolation.** Likely out of scope (RQL is a language, not an
+ engine), but the transaction boundary needs an isolation story if it ever runs
+ concurrently.
+- **Implementation substrate.** Could ride cfree's pipeline as a CG-API frontend
+ (cf. RSN) or stand alone. Out of scope until the language is nailed.
diff --git a/doc/plan/RSN.md b/doc/plan/RSN.md
@@ -0,0 +1,627 @@
+# RSN — Ryan's Scripting Notation (planned work)
+
+A small, **statically-typed** scripting language that lets cfree replace its
+remaining external build/test dependencies — **make, POSIX shell, and Python** —
+with code it owns and ships, and that runs on cfree's *own* compiler pipeline.
+RSN is the language; the build system that grows on top of it is a later layer
+(sketched at the end). This doc is the living design.
+
+## The architecture, in one line
+
+**RSN is another frontend on cfree's pipeline** (alongside C, Wasm, toy): RSN
+source → public CG API → IR → execution. To make a garbage-collected language
+fit, **cfree's IR gains a first-class managed-reference type**, so RSN runs on the
+existing IR interpreter (arena-allocated first, then *precisely* GC'd) and can
+compile to *fast native code* through the existing backends later. Static typing
+is what makes
+this pay off: typed operations lower to real IR ops (not opaque runtime calls),
+so the optimizer/backends optimize RSN, and the managed-ref type makes every GC
+root statically known.
+
+Because this turns the IR into a GC-aware IR, it is **staged**, and Stage 1 is
+itself split to keep the highest-blast-radius change (touching the shared CG type
+system) out of the critical path until the language has proven itself:
+
+- **Stage 1a (now):** typed RSN frontend + interp, **arena-allocated, no
+ collector** — managed values live in a run-scoped arena freed at exit. The
+ interp carries a per-function ref-bitmap as local metadata; the C compiler's CG
+ type system is untouched. Cooperative fibers land here. This is where the
+ language gets exercised on real build/test scripts.
+- **Stage 1b (when reclamation is actually needed):** the managed-ref IR type +
+ GC op dialect + the interp's root scan and mark-sweep collector, replacing the
+ arena. Precise by construction. Native/Wasm/C backends reject managed-ref
+ functions.
+- **Stage 2 (later, optional):** native/JIT codegen for managed code — safepoints
+ + GC stack maps in the optimizer and the aa64/x64/rv64 backends — so a build
+ tool written in RSN compiles to a native binary.
+
+## Why
+
+Three external tools carry three jobs today: **make** (~1.5k lines: the
+dependency DAG, host/config branching, `CFREE_*_ENABLED` gating, the cross-arch
+matrix), **shell** (~12k LOC under `test/`: the corpus harness engine, process
+orchestration, golden-diffing), and **Python** (parsing tool output and wrangling
+text/JSON/CSV). RSN's goal is one owned, typed language that does all three, so
+cfree's only external dependency is a seed C toolchain. 3 → ~0.
+
+## Locked decisions
+
+**Architecture & runtime**
+
+1. **Typed RSN as a CG-API frontend** (`lang/rsn/`, with a type checker); no
+ bespoke bytecode — the IR / interp `InterpInsn` *is* RSN's bytecode.
+2. **GC-aware IR**: a managed-reference type + a small GC op dialect in the CG
+ type system + IR, firewalled so the C path pays nothing.
+3. **Precise, non-moving mark-sweep GC**, precise by construction.
+4. **Cooperative single-threaded fibers from day one**, on the interp's explicit
+ swappable stack and its reserved `CFREE_INTERP_BLOCKED` yield state. Cores are
+ saturated by spawned child processes, not threads.
+5. **Staged delivery**: interp first — arena-allocated (1a), then precise GC (1b);
+ native managed codegen later (Stage 2).
+
+**Language**
+
+6. **Syntax**: brace / C-ish (Go/Wren family), strictly **LL(1)**, **significant
+ newline** termination.
+7. **Bindings**: `let` (immutable) + `var` (mutable). Object *contents* stay
+ mutable regardless — this governs only rebinding the name.
+8. **Algebraic data types**: `struct` (named product) + `enum` (sum / variants) +
+ `(T, U, …)` tuple (anonymous product), all destructured by the one `pattern`
+ grammar (`match`/`if let`/`let`/`for`).
+9. **Nullability**: refs are non-null; absence is `T?` (`Option<T>`), unwrapped
+ with `??` (coalesce) and `if let`. Distinct from `Result` (absence vs error).
+10. **Errors**: `Result<T, E>` + Zig-style `try` (propagate) / `catch` (recover),
+ not exceptions. `Result`/`Option` are sum types; `match` also destructures
+ them.
+11. **`any` + `match`**: `any` is the dynamic escape hatch (JSON, parsed text);
+ `match` narrows it (`string s`, `int n`, …). `match` is the single multi-way
+ construct and general destructurer — literals/strings, ADTs, tuples, `any`
+ type-narrows, and or-patterns (`a | b -> …`); no fallthrough, no `switch`.
+12. **Generics — Stage 1 ships only the builtin containers** (`list<T>`,
+ `map<K, V>`) as hand-written specializations (packed/unboxed `list<int>` /
+ `list<float>`, a shared boxed instantiation otherwise). The general
+ instantiation strategy (.NET-style hybrid vs. plain monomorphization, plus
+ constraints for user-defined generic fns) is **deferred** until the feature
+ exists — not locked ahead of need.
+13. **Expression `if`/`match`** (value-producing); blocks appear only in
+ control-flow slots, so a bare `{}` stays a map literal and `?` is reserved for
+ optionals (no ternary). **`match` is the *single* multi-way construct** — there
+ is no separate `switch`; literal/string arms plus or-patterns (`a | b -> …`)
+ cover value-dispatch, arms never fall through.
+14. **Modules**: file = module; `pub` exports (private by default); `import
+ "x.rsn" as x`, qualified access.
+15. **Patterns**: Lua-style string patterns, not a full regex engine.
+16. **Methods via UFCS**: `a.f(b)` ≡ `f(a, b)`. `a.name` is a field load when
+ `name` is a field of `a`'s type, otherwise `a.name(...)` resolves `name` as a
+ free function with `a` as its first argument. One rule covers `objs.push`,
+ `src.replace`, `xs.map` — no method-declaration form, no receiver concept.
+ Tie-break: a field shadows a free function of the same name.
+17. **String interpolation**: `"...${expr}..."` (a lexer feature; the interpolated
+ `expr` is a full expression, `\$` escapes a literal `$`). Argv is still built
+ with lists (`["cfree", "cc", src]`), never by interpolating into one command
+ string — that keeps shell-quoting bugs out by construction.
+18. **First-class tuples + one pattern grammar**: `(T, U, …)` (arity ≥ 2) is a
+ storable, nestable anonymous product; `()` is unit and `(e)` is grouping.
+ Destructure-only (no `.0`/`.1`). A single `pattern` concept serves
+ `let`/`var`/`for`/`match`/`if let` (tuples always written `(a, b)`);
+ `map` yields `(K, V)` so `for (k, v) in m` is just a tuple pattern. `(a, b) =
+ (b, a)` is parallel assignment (RHS fully evaluated first).
+
+Superseded by this design: the earlier **NaN-boxed dynamic value model** and
+**dynamic typing** — typing removes the need for a universal tagged value.
+
+## What we reuse
+
+- **The frontend→CG→opt→interp pipeline.** RSN attaches as the compiler's
+ **interp sink** (`cfree_interp_program_attach`) so each compiled RSN function is
+ lowered to `InterpInsn`, exactly as `cfree run --no-jit` does for C.
+- **The CG type system** (`CfreeCgTypeId`) and recording IR (`src/cg`) — extended
+ with the managed-ref type, not replaced.
+- **The IR interpreter** (`src/interp`): a register VM with direct-threaded
+ dispatch, an **explicit swappable stack** documented as a "swap-ready substrate
+ for fibers/virtual threads," and a **reserved `BLOCKED` yield status**. The
+ fiber runtime and a yielding `run()` complete a half-built design.
+- **`src/core/`** (arena/Heap/Slice/StrBuf/VEC/hashmap/diag) — the frontend's and
+ runtime's substrate.
+- **`rt/lib/`** — the home for RSN's runtime helpers (container growth, string
+ ops, the collector, `any`-boxing, the iterator protocol).
+- **The optimizer and native backends** (Stage 2) — typed RSN lowers to real IR.
+- **CAS** (`include/cfree/cas.h`) — the build layer's content-addressed cache.
+- **The host-vtable convention** (`CfreeCasHost`, `CfreeInterpHost`) — RSN's host
+ vtable supplies all I/O and the scheduler's event source; which fields are bound
+ is the capability gate.
+
+## The IR extension: managed references + GC
+
+The heart of the design, specified with its firewall discipline.
+
+**The type.** A new CG type kind — a **managed ref**: an opaque, GC-owned pointer
+to a heap object of a given *shape*. Distinct from a raw C pointer. RSN's
+`string`/`list`/`map`/`fn`/`fiber`/`chan`/record/`enum`/tuple/`any` values are
+managed refs; `int`/`float`/`bool` are unmanaged scalars.
+
+**The ops** (a small dialect): `managed_alloc(shape) -> ref`, `ref_field_load/
+store`, `ref_elem_load/store`, `ref_eq`, `ref_null`. Safepoints are implicit at
+allocations, calls, and loop back-edges (no explicit op in the interp; once the
+1b collector is on it can collect at any allocation because all roots are in
+registers).
+
+**Object shapes for precise tracing.** Every managed object's header points at a
+**shape descriptor** listing which fields are managed refs (or a trace function
+for containers). **Sum types use a *per-variant* shape**: the collector reads the
+active tag, then traces that variant's ref-map. Records use a single field-map.
+
+**Root finding in the interp.** Each `InterpFunc` carries a static **ref-bitmap**
+marking which PRegs hold managed refs. Roots = every live ref-PReg across all
+live fibers' frames + globals + the string-intern table + a small **C-roots**
+scratch for native functions mid-construction. Precise; no stack maps at the
+interp level.
+
+**Two cost-containing invariants (locked):**
+- **Non-moving** → no write/read barriers, no pointer rewriting in codegen; the
+ backend only ever *identifies* roots, never *updates* them.
+- **No interior pointers** → element/field access re-derives from the base ref, so
+ a stack map only ever needs base refs.
+
+**Firewall discipline.** The managed dialect is an *optional IR feature*: a
+function with no managed refs needs no GC support and pays nothing. The C frontend
+never emits managed refs, so C → IR → {native, Wasm, C} is untouched and
+zero-cost. A backend without managed support asserts out cleanly on a managed
+function. The managed-IR capability is its own flag (`CFREE_MANAGED_IR`), distinct
+from the `CFREE_RSN_ENABLED` frontend that consumes it. The standing test: a
+C-only build compiles and links with the managed dialect compiled out.
+
+**Stage 2 (native).** Precise native GC needs safepoints + **stack maps**: the
+optimizer's liveness + register allocator record which managed refs are live at
+each safepoint/call, per arch. This is the real cross-cutting work and the reason
+native is staged separately. The two invariants above keep it to *root
+identification* rather than the full moving-GC codegen problem.
+
+## Type system
+
+Static, with local inference so scripts stay terse.
+
+- **Primitives:** `int` (i64), `float` (f64), `bool`, `string`.
+- **Containers (builtin generics):** `list<T>`, `map<K, V>`.
+- **Functions:** `fn(A, B) -> R`; closures capture bindings by reference.
+- **Records (product):** `struct Name { field: T, … }` — named-field managed
+ objects with a single shape.
+- **Sums (variant):** `enum Name { Variant, Variant(T, …), … }` — tagged unions
+ with optional payloads; per-variant shape. Constructors are values
+ (`Leaf(5)`, `Empty`); destructured by `match`.
+- **Tuples (anonymous product):** `(T, U, …)`, arity ≥ 2 — a managed ref to an
+ anonymous positional record (reuses the record shape, trivially traced).
+ Storable and nestable (`list<(string, string)>`, `(int, (int, int))`).
+ `()` is unit and `(e)` is grouping, so 1-tuples don't exist. **Destructure-only**
+ — no `.0`/`.1` (consistent with variant payloads, and it dodges the `pair.0.1`
+ float-literal lexer hazard); pull one element with `let (_, v) = p`. Packed
+ all-scalar tuples (e.g. unboxed `(int, int)`) are a deferred niche opt, same
+ bucket as packed `list<int>`.
+- **Builtin sums:** `Option<T> = { Some(T), None }` (with `T?` sugar and `??` /
+ `if let`), `Result<T, E> = { Ok(T), Err(E) }` (with `try` / `catch`).
+- **Builtin error type:** `struct Error { code: int, message: string }` — the
+ default `E` for host/process operations (`run`, `file_io`); user code may use
+ any `E`.
+- **Methods are UFCS:** there is no method-declaration form. `a.f(args)` lowers to
+ `f(a, args)` when `f` is a free function in scope; `a.name` is a plain field
+ load when `name` is a field. A field shadows a free function of the same name.
+- **`any`:** a managed ref to a `{ tag, payload }` box for genuinely dynamic data;
+ narrowed by `match` type-patterns that name the real (lowercase) types
+ (`string s`, `int n`, `list xs`, …). Boxing is confined to `any` (and
+ `Option<scalar>`), not universal.
+- **Inference:** locals infer from initializers (`let n = 5` ⇒ `int`); function
+ parameters and return types are annotated. Explicit type arguments are *not*
+ written at call sites (inferred), so `<` is unambiguous in expressions — no
+ turbofish.
+- **Generics — Stage 1 is builtin containers only.** `list<T>` and `map<K, V>`
+ ship as hand-written specializations: `list<int>` / `list<float>` are
+ packed/unboxed, ref element types share one boxed instantiation. The general
+ scheme for user-defined generics (.NET-style value/ref hybrid vs. plain
+ monomorphization, and constraints) is deferred to its own phase — when it lands,
+ the value/ref split and ref-vs-scalar-parameterized shape descriptors are the
+ intended direction, but nothing here is locked yet.
+- **Modules:** each `.rsn` file is a module; `pub` exports (private by default);
+ `import "util.rsn" as util` then `util.name`. Qualified, clash-free. Selective
+ `from … import { … }` is a possible later sugar.
+
+## Value model
+
+Typed, so values are their **natural machine representations** — no universal
+tagged word:
+
+| RSN type | Representation |
+|-----------------------------------|-----------------------------------------|
+| `int` / `float` / `bool` | i64 / f64 / i8 (unmanaged scalars) |
+| `string`, `list`, `map`, `fn`, `fiber`, `chan`, record, `enum` | managed ref |
+| `(T, U, …)` tuple | managed ref (anonymous positional record); all-scalar packing deferred |
+| `Option<T>` / `T?` | managed ref (a sum); `Option<scalar>` boxes |
+| `any` | managed ref to a `{tag, payload}` box |
+
+`int` is a true 64-bit integer (the reason typing was attractive — no NaN-box, no
+bigint spill, no pointer masking). Mixed `int`/`float` arithmetic requires an
+explicit `float(x)`. Strings are immutable, so they are always safe to share
+across fibers; **interning is selective, not universal** — short strings and any
+string used as a map key are interned (cheap equality, CAS keys), while large
+strings (file bodies, captured subprocess output) are held by reference without
+hashing them into the intern table. `Option<scalar>` boxing is later
+niche-optimizable; builtin `list<int>` stays packed/unboxed.
+
+## Errors and results
+
+Values, typed, with Zig's `try` ergonomics — no exceptions, no unwind tables.
+
+- **`Result<T, E>`** (a sum): `Ok(v)` / `Err(e)`. `try expr` unwraps `T` on `Ok`,
+ or returns the `Err` from the enclosing function on `Err`; it is type-checked
+ (the enclosing return must be a `Result` with a compatible `E`). At the top
+ level, a `try` that hits `Err` halts the run nonzero. `expr catch handler`
+ recovers, with an optional `|e|` binder.
+- **Process primitives:** `run(cmd) -> Result<Proc, Error>` (`Ok` on exit 0, fail
+ fast with `try`); `exec(cmd) -> Proc` (raw `{code, out, err}`, for harnesses
+ that expect nonzero exits).
+- `match` destructures `Result`/`Option`/any sum generally; `try`/`catch`/`??`/
+ `if let` are the ergonomic shortcuts.
+
+```
+fn link(objs: list<string>) -> Result<string, Error> {
+ let a = try run(["cfree", "ar", "rcs", "lib.a"] + objs) # propagate on failure
+ return Ok(a.out)
+}
+```
+
+## Surface syntax
+
+Brace-delimited, keyword-led, semicolon-free, mandatory braces on block bodies.
+Type annotations only in type positions. Comments are `#` to end-of-line (line or
+trailing); there are no block comments. Strings interpolate with `"${expr}"`.
+Method calls are UFCS (`xs.map(f)` ≡ `map(xs, f)`).
+
+```
+struct Target { name: string, inputs: list<string>, optimize: bool }
+
+enum Json {
+ Null
+ Bool(bool)
+ Num(float)
+ Str(string)
+ Arr(list<Json>)
+ Obj(map<string, Json>)
+}
+
+fn render(j: Json) -> string {
+ match j { # match is an expression
+ Null -> "null"
+ Str(s) -> quote(s)
+ Num(n) -> str(n)
+ Arr(xs) -> "[" + join(map(xs, render), ",") + "]"
+ _ -> "..."
+ }
+}
+
+fn build(t: Builder) {
+ let objs: list<string> = [] # annotation: empty list element type
+ for src in glob("src/**/*.c") {
+ let obj = src.replace(".c", ".o")
+ t.target(obj, [src], fn() { try run(["cfree", "cc", "-c", src, "-o", obj]) })
+ objs.push(obj) # mutating contents (let binding is fine)
+ }
+ let mode = if t.release { "opt" } else { "dbg" } # value-if; no ternary
+ t.target("libcfree.a", objs, fn() { try run(["cfree", "ar", "rcs", "libcfree.a"] + objs) })
+}
+```
+
+## Grammar (LL(1))
+
+LL(1) by construction; type syntax confined to type positions. The lexer
+suppresses the statement-terminating newline after an open bracket, a trailing
+binary operator, **or when the next non-blank token is a leading `.`** (so
+method/field chains may wrap across lines); the EBNF omits the newline token.
+Comments (`#` to end-of-line; no block form) and string interpolation
+(`"${expr}"`, `\$` escapes a literal `$`) are lexer-level; an interpolated `${…}`
+lexes as a parenthesized sub-expression.
+
+```ebnf
+program = { item } ;
+item = importDecl | [ "pub" ] decl | stmt ;
+importDecl = "import" STRING "as" IDENT ;
+decl = fnDecl | structDecl | enumDecl ;
+
+fnDecl = "fn" IDENT [ generics ] "(" [ params ] ")" [ "->" type ] block ;
+structDecl = "struct" IDENT [ generics ] "{" { IDENT ":" type [ "," ] } "}" ;
+enumDecl = "enum" IDENT [ generics ] "{" { variant [ "," ] } "}" ;
+variant = IDENT [ "(" type { "," type } ")" ] ;
+generics = "<" IDENT { "," IDENT } ">" ;
+params = IDENT ":" type { "," IDENT ":" type } [ "," ] ;
+
+stmt = letDecl | varDecl | whileStmt | forStmt | returnStmt
+ | "break" | "continue" | exprStmt | block ;
+letDecl = "let" pattern [ ":" type ] "=" expr ; (* pattern irrefutable *)
+varDecl = "var" pattern [ ":" type ] "=" expr ; (* pattern irrefutable *)
+whileStmt = "while" cond block ;
+forStmt = "for" pattern "in" cond block ; (* pattern irrefutable *)
+returnStmt = "return" [ expr ] ;
+exprStmt = expr [ assignOp expr ] ; (* tuple-of-lvalues LHS = parallel assign *)
+assignOp = "=" | "+=" | "-=" | "*=" | "/=" ;
+block = "{" { stmt } "}" ; (* value = trailing expr-stmt, else unit *)
+
+type = baseType { "?" } ;
+baseType = IDENT [ "<" type { "," type } ">" ]
+ | "fn" "(" [ type { "," type } ] ")" "->" type
+ | "(" [ type "," type { "," type } ] ")" ;
+ (* left-factored: "()" = unit; "(" T "," U … ")" = tuple (arity >= 2).
+ No "(" T ")" form — types have no paren-grouping, so a lone "(int)"
+ is a syntax error; unit-vs-tuple is decided after "(" on ")" vs a type. *)
+
+expr = coalesce [ "catch" [ "|" IDENT "|" ] expr ] ;
+coalesce = orExpr { "??" orExpr } ;
+orExpr = andExpr { "||" andExpr } ;
+andExpr = eqExpr { "&&" eqExpr } ;
+eqExpr = relExpr { ("==" | "!=") relExpr } ;
+relExpr = addExpr { ("<" | "<=" | ">" | ">=") addExpr } ;
+addExpr = mulExpr { ("+" | "-") mulExpr } ;
+mulExpr = unary { ("*" | "/" | "%") unary } ;
+unary = "try" unary | ("!" | "-") unary | postfix ;
+postfix = primary { callOp | indexOp | fieldOp } ;
+callOp = "(" [ args ] ")" ;
+indexOp = "[" expr "]" ;
+fieldOp = "." IDENT ;
+args = expr { "," expr } [ "," ] ;
+
+primary = INT | FLOAT | STRING | "true" | "false"
+ | ifExpr | matchExpr | spawnExpr
+ | IDENT [ structLit ] (* structLit suppressed in cond; see below *)
+ | parenExpr
+ | listLit | mapLit | fnLit ;
+parenExpr = "(" expr [ "," expr { "," expr } ] ")" ;
+ (* left-factored: no comma => grouping (value is the inner expr);
+ >= 1 comma => tuple (arity >= 2). The grouping-vs-tuple choice is
+ made on the token after the first expr: ")" vs ",". No 1-tuples,
+ so "(a,)" is rejected. *)
+spawnExpr = "spawn" block ; (* yields a fiber handle *)
+ifExpr = "if" ( "let" pattern "=" cond | cond ) block
+ [ "else" ( ifExpr | block ) ] ;
+cond = expr ; (* a bare `IDENT { … }` struct literal is NOT taken here; a
+ struct literal in condition position must be
+ parenthesized: `if (Foo{…}).ok { … }` *)
+matchExpr = "match" expr "{" { matchArm } "}" ;
+matchArm = pattern "->" ( block | expr ) [ "," ] ;
+pattern = patternAtom { "|" patternAtom } ; (* or-pattern; all alternatives
+ must bind the same names+types.
+ `|` here is pattern position,
+ distinct from catch's `|e|`. *)
+patternAtom = "_" | literal
+ | IDENT IDENT (* type-narrow: type binder *)
+ | IDENT [ "(" pattern { "," pattern } ")" ] (* Ctor / binding *)
+ | "(" pattern "," pattern { "," pattern } ")" ; (* tuple, arity >= 2 *)
+structLit = "{" [ field { "," field } [ "," ] ] "}" ;
+field = IDENT ":" expr ;
+listLit = "[" [ expr { "," expr } [ "," ] ] "]" ;
+mapLit = "{" [ entry { "," entry } [ "," ] ] "}" ;
+entry = ( STRING | "[" expr "]" ) ":" expr ; (* string or computed key;
+ bare `IDENT:` is a struct field, never a map key, so
+ `{name: x}` is unambiguous and `map<int,V>` literals are
+ written `{[k]: v}` *)
+fnLit = "fn" "(" [ params ] ")" [ "->" type ] block ;
+literal = INT | FLOAT | STRING | "true" | "false" ;
+```
+
+LL(1) tactics that constrain the language:
+
+- **`let`/`var` lead bindings; `pub` leads exports; `import` leads imports.**
+- **Assignment is statement-level** (`=` vs `==` unambiguous; no `if (a = b)`).
+- **The `{` rule.** In a statement or control-flow slot (`block`, after `if`/
+ `else`/`while`/`for`/`fn`/`spawn`, and a match arm's `->`), `{` is a block. In
+ expression position, `{` is a map literal and `Name{...}` is a struct literal.
+ To use a map where a block is expected, parenthesize: `-> ({...})`. This is what
+ lets value-`if`/`match` coexist with `{k: v}` maps under LL(1).
+- **Struct literals are suppressed in condition position.** Inside the `cond` of
+ `if`/`while`/`for`, a bare `Name { … }` is *not* a struct literal — otherwise
+ the `{` is eaten as a struct body and starves the required block (the classic
+ Rust/Go conflict). A struct literal needed in a condition must be parenthesized:
+ `if (Foo{…}).ok { … }`. This is what actually makes the grammar LL(1); without
+ it `IDENT [structLit]` and the control-flow `{`-block production collide.
+- **The `(` rule.** In expression position `(` is left-factored: parse one `expr`,
+ then a following `)` means it was grouping and `,` means it's a tuple — so
+ grouping and tuples never conflict at the open paren. In type position there is
+ no grouping, so `()` is unit and `(T, …)` is a tuple, decided right after `(`.
+ In pattern position a leading `(` is always a tuple pattern. (`spawn`/`fn`
+ before `(` are keywords, handled before this rule.)
+- **`fn` at item/stmt start is a named declaration**; anonymous `fn` only in
+ expression position.
+- **`try` is a unary prefix; `catch` a low-precedence tail** with an optional
+ `|e|` binder — LL(1) because `|` is not in FIRST(expr).
+- **`<` is a type-only token in type position** (generics, declared sites). No
+ explicit type args at call sites, so `<` is always less-than in expressions.
+- **Patterns** decide by one lookahead after the leading token: `_`/literal are
+ immediate; a leading `(` is a **tuple pattern**; an `IDENT` followed by another
+ `IDENT` is a type-narrow, by `(` is a constructor, else a binding
+ (constructor-vs-binding for a bare identifier is resolved in the checker against
+ known variants). **One `pattern` grammar serves `let`/`var`/`for`/`match`/`if
+ let`**; `let`/`var`/`for` additionally require it to be *irrefutable* (binders,
+ `_`, nested tuples thereof — never a literal/constructor/type-narrow that could
+ fail to match).
+- **Or-patterns join alternatives with `|`** (`"+" | "-" -> …`): after a
+ `patternAtom`, lookahead `|` continues the or-pattern, anything else ends it.
+ This pattern-position `|` does **not** clash with `catch`'s `|e|` binder (expr
+ position), so the "`|` ∉ FIRST(expr)" tactic still holds. Or-patterns are
+ refutable ⇒ they appear only in `match`/`if let`, and they are how `match`
+ subsumes C `switch` multi-`case`/fallthrough — **arms never fall through**.
+- **`if let`** vs `if`: peek `let` after `if`.
+- **Parallel assignment.** `(a, b) = (b, a)` is an assignment whose LHS is a
+ tuple of lvalues; the RHS tuple is fully evaluated before any target is written
+ (so swap works). Only `=` takes a tuple LHS — not `+=` and friends.
+
+## Semantics defaults
+
+Decided, revisitable — these were settled as defaults rather than formal forks:
+
+- **Integer overflow traps** (halts the run) — correctness over speed for a tool
+ language, where a silently wrapped size/offset is the worse failure. Explicit
+ `+%` / `-%` / `*%` perform two's-complement wrapping where it is actually
+ wanted.
+- **`/`** truncates toward zero; **`%`** sign follows the dividend (C). Mixed
+ `int`/`float` needs `float(x)`.
+- **`==`** compares scalars and strings by value, and lists/maps/records/sums/
+ tuples structurally by contents; identity is a separate `is`.
+- **Iteration:** `for x in list`, `for (k, v) in map`, `for i in range(n)`; one
+ iterator protocol (`next() -> T?`) so user types opt in — `map` yields `(K, V)`
+ tuples, so multi-binding is just a tuple pattern, not a special case.
+- **Unit return:** a function with no `-> T` returns unit (nothing), distinct from
+ `None`.
+- **Strings:** immutable, UTF-8; `.len` is bytes; codepoint/grapheme helpers and
+ patterns operate on bytes.
+- **Closures** capture bindings by reference (mutating a captured `var` is
+ visible). **Loop bindings are fresh per iteration** — each `for`/`while` turn
+ gets its own binding, so a closure created in the loop captures *that*
+ iteration's value (Go 1.22 / JS-`let` semantics). The build example's per-`src`
+ recipe closures depend on this.
+
+## Execution: RSN on the interp
+
+- **Pipeline:** source → lexer → LL(1) parser → AST → type check → CG-API
+ lowering → IR → (interp sink) `InterpInsn`. Typed ops are real IR ops; only
+ `any` and a few container/iterator primitives are `rt/lib/` helper calls.
+- **Register VM, direct-threaded dispatch** — the existing interp.
+- **Fibers** are the interp's explicit `InterpStack`s. A yielding op (`run`,
+ `read`, `sleep`, channel ops) returns `CFREE_INTERP_BLOCKED`; a **scheduler**
+ (living with the host, since readiness depends on host events) resumes the stack
+ when ready. Cooperative: no preemption.
+- **The one limit (like Lua):** a fiber may yield only at RSN/IR instruction
+ boundaries and blessed scheduler-aware host calls — never from inside a native C
+ frame reached via FFI.
+- **Surface:** `spawn { … }` is an *expression* yielding a fiber handle (`let h =
+ spawn { … }`); `parallel(list, limit, fn)` → bounded fan-out, results in input
+ order; `chan()` / `.send` / `.recv`.
+- **Determinism guard:** fibers are execution-phase; build-graph *definition* runs
+ single-fiber, and the build-file capability gate nulls `spawn` and writes
+ (read-only `glob`/`stat` stay live so the graph can discover its inputs).
+
+## Memory and GC
+
+- **Non-moving mark-sweep**, precise via the managed-ref type. Allocation funnels
+ through `rsn_alloc(shape, size)`; collection runs at an allocation threshold.
+- **Stop-the-world is trivial** (single-threaded, cooperative): collect only at
+ allocation safepoints, where all roots are in registers.
+- **Roots:** live ref-PRegs across all fibers (via each function's ref-bitmap),
+ globals, the string-intern table, the C-roots scratch. Tracing uses each
+ object's shape (per-variant for sums).
+- **Finalizers:** `handle`/`chan`/`fiber` and host-resource objects get an
+ optional `finalize` slot in their shape, run on collection and at teardown.
+- **Arena stage (1a):** managed values bump-allocate into a run-scoped arena and
+ the collector is deferred (free at run end); object headers and shapes are
+ designed in from the start, so turning the collector on in 1b is localized.
+
+## Embedding API and capability gating
+
+```c
+typedef struct CfreeRsnHost {
+ const CfreeFileIO* file_io; /* build-def: read/glob/stat live, writes gated */
+ int (*spawn)(void* user, const CfreeRsnSpawn*, CfreeRsnProc** out);
+ int (*proc_reap)(void* user, CfreeRsnProc*, int* exit_code); /* scheduler */
+ int64_t (*clock_ns)(void* user); /* gated */
+ int (*mkdir_p)(void* user, const char* path);
+ void* user;
+} CfreeRsnHost;
+
+CFREE_API CfreeRsn* cfree_rsn_new(CfreeContext*, const CfreeRsnHost*);
+CFREE_API CfreeStatus cfree_rsn_eval(CfreeRsn*, CfreeSlice src, const char* name);
+CFREE_API void cfree_rsn_register(CfreeRsn*, const char* name,
+ CfreeRsnNativeFn, void* ud);
+CFREE_API void cfree_rsn_free(CfreeRsn*);
+```
+
+- **Script mode** (`cfree rsn foo.rsn`): full vtable — replaces the shell glue.
+- **Build-definition mode**: `spawn`/`clock_ns`/write-`file_io` NULL, but
+ **read-only filesystem queries stay live** (`glob`, `stat`, read) — a
+ `build.rsn` must discover its inputs (`glob("src/**/*.c")`) to declare targets,
+ yet cannot `run()` processes, write files, or read the clock, so the graph is
+ hermetic and ad-hoc effects are a load-time error. The capability gate is
+ therefore per-operation (read vs. write vs. spawn), not a single wholesale
+ `file_io` switch. One frontend, two privilege levels, chosen by the caller.
+
+## The build layer (later)
+
+A library *in RSN* plus a thin engine: `target(name, inputs, outputs, recipe)`;
+staleness by **content hash via the CAS** (not mtimes); parallel execution via
+`parallel`/fibers bounded to `ncpu()`; `cfree build` loads `build.rsn` in
+build-definition mode for the graph, then executes it in script mode. Out of scope
+for the milestones below — it is what the language is *for*.
+
+## Implementation phases
+
+**Stage 1a — typed RSN, interpreted, arena-allocated (no collector yet):**
+
+1. `lang/rsn/` lexer + LL(1) parser → AST (full grammar above). Parse-corpus tests
+ under `test/rsn/`.
+2. Type checker: local inference, ADTs (`struct`/`enum`/tuple), builtin-container
+ generics, UFCS resolution, the uniform `pattern` grammar (irrefutability check
+ for `let`/`var`/`for`, exhaustiveness for `match`), `Result`/`Option`/`try`,
+ modules.
+3. RSN → CG-API lowering (typed ops → IR ops; `any`/containers/iterators via
+ `rt/lib/`). Managed values live in a **run-scoped arena, freed at run end** —
+ the interp carries a per-function ref-bitmap as interp-local metadata, but no
+ new CG type kind is added yet, so the C compiler's type system is untouched.
+4. Fibers: complete the `BLOCKED`/yield path + scheduler over `InterpStack`s +
+ host vtable (`spawn`/`run`/`read`/`clock`); `spawn`/`parallel`/`chan`.
+5. Stdlib: strings, list/map, iterators/`range`, Lua-style patterns, JSON (→
+ `any`), `run`/`exec`.
+6. CLI `cfree rsn` + per-operation capability gating.
+
+Goal of 1a: prove the language on real `build.rsn` / test scripts *before* the
+invasive IR surgery. Short-lived runs make arena-free-at-exit genuinely adequate.
+
+**Stage 1b — precise GC (added once runs are long-lived enough to need it):**
+
+7. **CG/IR managed-ref type + GC op dialect** (`src/cg`), gated `CFREE_MANAGED_IR`;
+ object headers + (per-variant) shape descriptors. Firewall test: C-only build
+ green with the dialect compiled out.
+8. Lowering emits managed ops for refs; the interp's root scan + mark-sweep
+ collector + finalizers replace the arena. (The 1a ref-bitmap becomes the
+ precise root map.)
+
+**Stage 2 — native managed codegen (optional, demand-driven):**
+
+9. Safepoints + GC stack maps in the optimizer (liveness/regalloc).
+10. Native emit for managed functions across aa64/x64/rv64.
+11. JIT/AOT RSN: `cfree rsn --jit`, and `cfree build` emitting native build tools.
+12. Wasm/C-backend managed support (furthest out).
+
+**Then:** the build layer.
+
+## Resolved decisions
+
+All of: the architecture (typed CG-API frontend; GC-aware IR; precise non-moving
+mark-sweep; cooperative fibers; staged interp → 1a arena → 1b GC → native);
+`let`/`var`; non-null + `T?`; `Result`/`try`/`catch`; ADTs (`struct`/`enum`) with
+`match`; `any` narrowing via lowercase type-patterns in `match`; **UFCS methods**;
+**string interpolation**; **`#` line comments**; expression `if`/`match`
+with the `{}`-block rule and the struct-literal-in-condition restriction;
+qualified module imports; Lua-style patterns; separate `int`/`float` with explicit
+conversion; **integer overflow traps**; **selective string interning**;
+**per-operation capability gating**; **first-class tuples with one uniform
+`pattern` grammar (Rust-style) + parallel assignment**; **`match` as the single
+multi-way construct (no `switch`) with or-patterns and no fallthrough**; Stage-1
+generics = builtin containers only; the semantics defaults above; NaN-box value
+model dropped.
+
+## Open decisions
+
+None block Stage 1a. Deferred until their phase: **selective imports** (`from …
+import { … }` sugar), **range syntax** (`a..b` sugar vs the `range()` builtin) —
+which would also unlock **range patterns** in `match` (`1..10 -> …`),
+**record-style enum payloads** (`Variant { … }`), and **user-defined generic
+functions** (instantiation strategy + constraints — see Type system). The earlier
+tuple gap is now resolved (first-class `(T, U)`, see Type system / Value model /
+Grammar); the remaining tuple sub-question deferred to its phase is **packed
+all-scalar tuple representation** (e.g. unboxed `(int, int)`), niche-optimized
+alongside packed `list<int>`. **Match guards** (`pattern if cond -> …`) are
+deliberately deferred — `match` stays structural + or-pattern dispatch, and
+conditional logic lives in `if`/`else if`; guards can be added later
+non-breakingly (`matchArm = pattern [ "if" expr ] "->" …`).
+
+## Naming
+
+Language: **RSN — Ryan's Scripting Notation.** Source extension `.rsn`. Frontend
+`lang/rsn/`, CLI `cfree rsn`, frontend flag `CFREE_RSN_ENABLED`; the GC-aware IR
+it depends on is gated separately (`CFREE_MANAGED_IR`). When RSN ships, the
+durable design moves up to `doc/RSN.md` and this roadmap shrinks to what remains
+open.