kit

kit
git clone https://git.ryansepassi.com/git/kit.git
Log | Files | Refs | README

commit b8f536ed61f21edb813cdb1fbc20e6ff7c435cfa
parent 1233daa9e483569e46de1aff05ef023cbe92dc14
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon,  1 Jun 2026 17:33:42 -0700

plan: sketch RQL RSN

Diffstat:
Adoc/plan/RQL.md | 927+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adoc/plan/RSN.md | 627+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 1554 insertions(+), 0 deletions(-)

diff --git a/doc/plan/RQL.md b/doc/plan/RQL.md @@ -0,0 +1,927 @@ +# RQL — a typed relational query language (planned work) + +A replacement for SQL: declarative data **query and manipulation**, but *typed, +consistent, and close to the relational algebra*. `RQL` is a working name (TBD). + +SQL conflates three layers that should be separate — a surface syntax, a logical +algebra, and a physical plan — and its surface does not map cleanly onto its own +algebra. That mismatch is the source of most of SQL's irregularity. RQL keeps the +layers apart: a small, total, well-typed **core algebra** (the IR) and a +**pipeline surface** that maps onto it transparently. Same discipline as a +compiler: surface → IR → plan, where the IR is where the laws live. + +This doc is the living design and the ledger of decisions. It records *what* we +decided and *why*; open questions sit at the end and graduate up into the ledger +as we settle them. + +## What we're correcting + +The specific SQL faults RQL is designed to remove: + +- **Clause order ≠ logical order.** `SELECT` is written first but evaluated + nearly last, which breaks composition, top-to-bottom reading, and completion. +- **Three-valued logic.** `NULL` poisons predicates; `NOT IN` with a null is a + silent footgun; aggregates skip nulls inconsistently. +- **`GROUP BY` conflates partitioning with aggregation**, forcing the "every + select item must be grouped or aggregated" rule and a separate `HAVING`. +- **Weak typing & implicit coercion**; no sum types, no row polymorphism, errors + surface at runtime. +- **Bags pretending to be sets**, with `DISTINCT` bolted on, ordered/duplicate + column names, and `SELECT *`. +- **Universal quantification is miserable** — Codd's division ("supplies *all* + parts") has no clean SQL spelling. + +## Locked decisions + +Each entry is a decision plus a one-line rationale. Numbers are stable IDs in +decision order; the grouping is thematic, so IDs need not run contiguously within +a group. + +### Semantics + +1. **A relation is a set of distinct records.** A record (tuple) is an unordered + set of named, typed fields; a relation is a set of records over one heading. + Boolean-semiring sets, not bags. — Clean algebraic laws ⇒ sound optimizer + rewrites (predicate/projection pushdown, join reordering are real equalities); + `count` is meaningful (distinct), not "however many dupes were present." +2. **Base tables must declare a key.** Distinctness is a checked property, not a + hope. — Makes "set" honest, and prevents aggregating over a lossy projection. +3. **Multiplicity is data, not a second semantics.** When duplicates must be + counted, carry an explicit `count: Int` (or `weight`) field and weight + aggregates explicitly (`sum(group.amount * group.count)`). — Buys the bag / K-relation + (provenance-semiring) expressiveness *as a value*, without two equalities, a + truncating `minus`, or weight-aware aggregate magic. Accepted cost: a synthetic + id at ingest for genuinely keyless multiset data. Revisit if heavy event/log + ingestion makes that friction too high. +4. **Ordering is not part of a relation.** `sort` / `take` / `window` produce a + **`Seq`**, a distinct type. — Relations are unordered; ordering is a + presentation/boundary concern and shouldn't pollute relational laws. + +### Missing values + +5. **No `NULL`. Optionality lives in the type** (`T?` = `Option T`), introduced + *only* by outer joins. Predicates are total, 2-valued booleans; defaults via + `??`. — Removes three-valued logic and the `NOT IN` / aggregate-skipping + footguns; "missing" becomes explicit and local. +26. **Option semantics: total equality, `some`-widening, `where`-narrowing.** + `==` / `!=` on `T?` are total Bool (structural: `none == some x` is `false`); a + non-option widens to an option implicitly (`T <: T?`, the only implicit + conversion); `where x is some` (or `x != none`) narrows `x` to `T` downstream. + Ordering, arithmetic, and value-use on `T?` need explicit resolution. — Keeps + option handling total and ergonomic without reintroducing 3-valued logic + (refines #5). + +### Surface + +6. **Pipeline / algebra surface:** `R |> op |> op …`, one stage per core + operator, written in evaluation order. A pipeline is just a relation + expression (there is no `from` keyword), so it nests anywhere a relation is + expected. — Composition, top-to-bottom reading, and a 1:1 correspondence with + the IR. A comprehension (calculus) surface over the same IR is deferred, not + rejected. +7. **Grouping only partitions; aggregation is ordinary expressions over the + relation-valued group field.** No `HAVING` (it is `where` after `group`); no + "must appear in GROUP BY" rule (referencing a non-key scalar after grouping is + a type error). — One uniform mechanism; the awkward SQL special cases vanish. + +### Scope + +8. **Query + DML + constraints, unified by relational assignment.** A database is + a set of relation variables; insert / update / delete are all `R := <relational + expression>`; keys, foreign keys, and arbitrary predicates are declared + constraints checked on write. — One mechanism (Third Manifesto / Tutorial D + style); set semantics makes assignment idempotent and total. + +### Types + +*(First draft — see "Type system" below. The shape is settled; specific +inference rules and the keys-in-type ambition dial are open.)* + +9. **Row-polymorphic structural records.** Headings are structural record types; + `extend` / `project` / `rename` / `join` are polymorphic over "the other + fields" via a row variable, with a field-absence ("lacks") predicate. + Duplicate labels are forbidden (a heading can't name a field twice). +10. **Named scalar domains, no implicit cross-domain coercion.** `Money` is not + `Float`; conversions are explicit. — Supports "typed and consistent." +11. **Keys/FDs live in a metadata + inference layer, not the type proper.** Keys + are declared on base tables and *inferred* through operators alongside the + type; they power the fan-out-must-aggregate check, optimizer hints, and + warnings, but do not participate in type equality. — Pragmatic sweet spot; + leaves a clean path to full type-level keys if signatures ever need them. + +### Joins + +12. **Explicit-predicate joins + key-inference are the foundation; the + relationship/FK layer is deferred, additive sugar.** `join R on <predicate>` + is the primitive — it covers equi, theta, range, and self joins. When the + condition equates a key of one side, key-inference (decision #11) bounds the + join to ≤1 match (no fan-out) and feeds the must-aggregate warning; a many-to- + many join is flagged. A relationship layer — declared references, reverse + navigation, path expressions (`order.customer.region.name`), and + cardinality-typed "many" sides that *force* aggregation — rides on top later. + It is purely additive and breaks nothing. — Ships a typed join with a + cardinality story over arbitrary / un-annotated data first, and defers the + schema-declaration machinery (named relationships, reverse naming, path + resolution) until the core is proven. +25. **Join result disambiguation: source-qualified names.** Every field carries + the source relvar/alias it came from; a bare name is sugar for unambiguous + access. A `join` forms the *qualified union* of both sides (no auto-merge): + same bare names coexist as distinct qualified fields and must be referenced + `source.field`. `join R as a …` supplies a qualifier (needed for self-joins); + `select` / `extend` emit fresh unqualified fields, so qualification is + join-local. Natural join (merge common fields, equal types required) is opt-in + (`join R natural`). — Refines #9: labels are unique *qualified* names, bare + access must be unambiguous. + +### Aggregation + +13. **Grouping is partition + ordinary expressions; there is no "must-group" + rule.** The block form `group by K { … }` binds the per-group relation to the + keyword `group` and desugars to `group K into g |> extend … |> remove g`. + `group` is a relation; aggregates — written as function calls, `count(group)`, + `mean(group.salary)` — are the only relation→scalar bridge; `group.f` where a + scalar is expected is a plain type error. — Recovers SQL's grouping rule from + the type system; `having` is just `where` after the block. +14. **Aggregates are commutative monoids; column extraction is a bag.** + `group.f` yields one value *per row* (a bag), not a dedup'd set, so sets + + keys make aggregation correct with no multiplicity bookkeeping. Order- + dependent reductions (first, in-order concat, lag/lead) are *not* aggregates + — they live on the `Seq` layer. — A clean dividing line that also explains + why windows are separate. +15. **Empty / option / fan-out aggregation is explicit, never silent.** Empty + input → option (`mean`/`min`/`max` → `None`) or identity (`sum`/`count` → + `0`); aggregating an option column is a type error (resolve first); + duplicate-sensitive aggregates over a fanned-out field warn. — Replaces SQL's + silent NULLs, null-skipping, and double-counting with types plus one warning. + +### Manipulation (DML) + +16. **A database is a set of relvars; the only mutation primitive is relational + assignment `R := <expr>`.** Insert / delete / update are sugar over it, + reusing the query operators and expressions; assignment requires a *static + heading match* (no positional columns). Set semantics ⇒ insert/delete + idempotent, duplicate-key insert rejected (no silent upsert). Adds the `with` + operator (override an existing field) as the typed sibling of `extend` (add a + new field). — One mechanism; query and mutation share a language. +17. **Constraints are boolean relational expressions that must hold after every + transaction.** Keys and FKs are named, incrementally-checked special cases; + tuple-level *and* database-level (multi-relvar) checks are unified and + first-class. Referential actions (restrict default / cascade / set-null) are + declared repairs applied within the transaction. — Cross-relation invariants + SQL cannot express become ordinary, checkable booleans. +18. **The transaction is the checking and atomicity boundary.** A batch of + assignments is applied together with all constraints checked once at commit; + any violation rolls the batch back atomically; a bare assignment is a + singleton transaction. — Makes FKs and cascades coherent (parent + child + together) and gives one consistency boundary. + +### Recursion + +19. **Recursion is a restricted least fixpoint: monotone, range-restricted, with + stratified negation.** `recursive R = base |> union (… R …)`, terminating by + construction (semi-naive evaluation); the result is an ordinary relation; + `closure(...)` is sugar. — First-class reachability / ancestry / BOM + explosion without SQL's recursive-CTE pain or unrestricted-recursion + non-termination. Recursion *with aggregation* (shortest paths) is deferred. + +### Ordered data & windows + +20. **The `Seq` layer carries all order-dependent computation.** `sort by` + produces `Seq R`; scans (`Seq → Seq`: running/moving/`lag`/`lead`/rank) and + ordered folds (`Seq → scalar`: `first`/`last`/`concat`) live there, never in + relational aggregates. The `Rel`/`Seq` type boundary *enforces* decision #14. + — Order is a type, not a convention. +21. **Windows reuse partitioning + sorting + `extend`, not a bespoke `OVER`.** + `partition by P order by S extend {…}` desugars to `group P into g |> with g = + (g |> sort by S |> extend <window-exprs>) |> ungroup g`; frames are relative + ranges (`rows[..current]`, `rows[-3..0]`); `lag`/`lead`/`rank` are `Seq` + built-ins; output is a `Seq`. — Windows fall out of pieces we already have. + +### Concrete syntax + +22. **`=` binds values, `:` ascribes types; a bare field name is punning; `--` + starts a comment.** Records/tuples are `{ f = e }`, headings and field decls + are `{ f: T }`, and `x` in a record / select / group block means `x = x`. — + Removes the value/type brace ambiguity in the early sketches. +23. **`.` is type-directed.** `e.f` is field access on a record (→ scalar), column + extraction on a relation (→ `Col T`, a per-row bag), or option-propagating on + `T?` (→ `T'?`). A `Col` is legal only as an aggregate / per-row argument; used + as a scalar it is a type error — which *is* the no-must-group rule. — One + operator, three meanings, all from the operand's type. +24. **Pipelines are bare relation expressions; aggregates are function calls; the + group block binds `group`.** No `from` keyword (a pipeline nests anywhere a + relation is expected; `from` survives only in `delete from` / `insert … from`). + Aggregates use call syntax (`count(group)`); the sugar `group by K {…}` binds + the keyword `group` (`group K into name` for nesting). `select {…}` is sugar + (extend + project), `project` / `remove` are pure π, and `let` names a relation + expression (replacing CTEs). — Maximal composability, one call syntax. + +## Type system (draft) + +### Type vocabulary + +- **Scalar domains:** `Int`, `Float`, `Bool`, `Text`, `Time`, … and user-defined + domains over them. No implicit coercion across domains. +- **Option:** `T?` = `Option T`. Only outer joins introduce it. +- **Record (heading):** `{ a: T1, b: T2, … }` — unordered, uniquely-named, + typed fields. Field order is irrelevant. +- **Relation:** `Rel R`, a set of records over heading `R`, e.g. + `Rel {sensor: Id, at: Time, value: Float}`. +- **Sequence:** `Seq R`, an *ordered* list of records (may repeat). Produced by + `sort`; the home of order-dependent operations (windows, top-n). +- **Nested relation:** a field may itself be `Rel R'` (relation-valued attribute). + This is how `group` works; it appears only as an intermediate. + +### Core operators (the IR) + +Each is total and (mostly) `Rel → Rel`. Row variable `ρ` stands for "the other +fields"; `f ∉ ρ` is the lacks predicate. + +``` +restrict (p: {ρ} -> Bool) : Rel {ρ} -> Rel {ρ} +extend (f = e: T) : Rel {ρ} -> Rel {ρ, f: T} [f ∉ ρ] +with (f = e: T') : Rel {f: T | ρ} -> Rel {f: T' | ρ} [f ∈ ρ] +project F : Rel {F | ρ} -> Rel {F} +remove F : Rel {F | ρ} -> Rel {ρ} +rename a -> b : Rel {a: T | ρ} -> Rel {b: T | ρ} [b ∉ ρ] +join : Rel R1 -> Rel R2 -> Rel (R1 ⋈ R2) +union/minus/intersect : Rel R -> Rel R -> Rel R +group K into g : Rel {K, A} -> Rel {K, g: Rel {A}} +ungroup g : Rel {K, g: Rel A} -> Rel {K, A} +sort by e : Rel R -> Seq R (→ Seq layer) +``` + +Aggregates are ordinary function calls `Rel A -> scalar` (`count`, `sum`, `mean`, +…), used inside `extend` after `group`; `count(group)` is set cardinality. + +### Join headings & disambiguation + +The default `join` (explicit predicate, decision #12) forms the **qualified +union** of the two headings: every field keeps the source relvar/alias it came +from, and same-named fields from the two sides coexist as *distinct* qualified +fields rather than being merged. A **bare name is sugar for unambiguous access**; +when a name appears on both sides you must qualify it `source.field`: + +``` +employees |> join departments on dept == departments.id +-- both sides have `name`, so it stays qualified: +|> select { who = employees.name, team = departments.name } +``` + +- **Aliases:** `join R as a on …` supplies the qualifier `a` — required for + self-joins (`join employees as mgr on manager_id == mgr.id`). +- **Equi-join keys** are kept as the two distinct fields they are (`dept` and + `departments.id` above); a `select` or natural join collapses them if wanted. +- **`select` / `extend` clean up:** the fields they emit are fresh and + unqualified, so qualification is a transient, join-local concern. +- **Natural join is opt-in:** `join R natural` merges the fields common to both + (which must then have equal types) into single bare fields and joins on their + equality — the Tutorial-D behavior, now explicit rather than the default. + +This refines decision #9: a heading's labels are unique *qualified* names; bare +access must resolve to exactly one of them, or it is a type error. + +### Keys and functional dependencies + +Keys are tracked beyond base-table declarations and **inferred through +operators**, so the compiler can reason about cardinality: + +- `restrict`: keys preserved. +- `project F`: a declared key survives iff it ⊆ F; the full projected heading is + always a trivial superkey (sets auto-dedup). +- `group K into g`: `K` becomes the key of the result by construction. +- `join` on fields `J`: if `J` is a key of one side, that side contributes at + most one match (no fan-out) and the other side's key is preserved; if `J` keys + neither side, the join may fan out and the result key is the union of both. + +**Payoff:** the compiler knows when a join fans out, so accidental row +multiplication feeding a `sum`/`count` becomes *detectable* rather than silently +wrong. This is the safety net behind decisions #1–#2. + +Resolved (decision #11): keys/FDs live in a **metadata + inference layer**, not +the type proper — propagated for cardinality reasoning, the fan-out check, and +optimizer hints, but not part of type equality. + +### Inference + +Pipeline stages are fully inferred (Hindley–Milner over rows); base tables and +domains are annotated. Query result types are inferred and checkable. Basis: +Rémy/Wand-style records with absence + row unification (closer to that than to +Leijen's scoped labels, since relations forbid duplicate labels). + +## Aggregation & grouping + +### Grouping = partition + ordinary expressions + +`group by K { … }` is sugar. The primitive is `group K into g` (nest): it turns +`Rel {K, A}` into `Rel {K, g: Rel {A}}` — one row per distinct `K`, with the +matching rows collected into the relation-valued field `g`. The block form adds +fields computed from `g` and drops `g`: + +``` +R |> group by k1, k2 { a = count(group), b = mean(group.salary) } + +-- desugars to (the block binds the keyword `group`): +R |> group (k1, k2) into g + |> extend a = count(g), b = mean(g.salary) + |> remove g +``` + +`group` is in scope in the block; *output* it to keep the nested relation +(`{ emps = group, n = count(group) }` is a department with its list of +employees), or omit it and it's dropped. + +### Why there's no "must appear in GROUP BY" rule + +After grouping, `k1, k2` are scalar key fields and `group` is a *relation*. No +special rule is needed — it falls out of types: + +- `group.salary` is a **column**: the `salary` values across the group's rows (a + bag — see below), not a scalar. +- An **aggregate** is the only thing that turns a column/relation into a scalar + (`count(group) : Int`, `mean(group.salary) : Float`). +- `group.salary` where a scalar is expected is a type error. That *is* SQL's + "grouped or aggregated" rule, recovered for free. + +### Columns vs. projection — why aggregation is correct + +`group.f` (column extraction) yields one value **per row** — a bag — while +`project {f}` yields a **set** (dedup'd). They differ exactly on duplicates, and +that is what makes aggregation correct without multiplicity bookkeeping: base +tables have keys (#2), so a group's rows are distinct *real* rows, so +`group.salary` has one entry per real row and `sum(group.salary)` is right. +Decisions #1 + #2 buy footgun-free aggregation; the explicit-`count` hatch (#3) is +only needed once you deliberately collapse to a keyless form. + +An aggregate's argument is a **per-row expression** over the group: `group.f` +binds the current row's field, so `sum(group.qty * group.price)` reduces +`qty*price` row by row. Equivalent comprehension form: +`sum(r.qty * r.price for r in group)`. + +### The aggregate functions + +Relational aggregates must be **commutative monoids** (sets are unordered ⇒ the +reduction must be order-independent): `count`, `sum`, `min`, `max`, `mean`, +`and`/`or`, `count(unique c)`. Order-dependent reductions are *not* relational +aggregates — they belong to the `Seq` layer. Empty-input behavior uses options, +not a silent NULL: + +| aggregate | empty input | type | +|-----------------|-------------|---------| +| `count` | `0` | `Int` | +| `sum` | `0` | `Num` | +| `mean/min/max` | `None` | `T?` | + +`group by`-produced groups are non-empty by construction, so the metadata layer +narrows `mean`/`min`/`max` back to `T` there. Aggregating an **option column** +(`Col T?`) is a **type error** — resolve first (`group.salary ?? 0`, or filter) — +which removes SQL's inconsistent null-skipping. + +### Whole-relation aggregation + +`aggregate { … }` is `group by ()` — one group, one row out: + +``` +sales |> aggregate { total = sum(group.amount), n = count(group) } +``` + +### Nested / multi-level aggregation + +`g` is a relation, so any relational expression — including another `group by` — +applies to it. Aggregating subtables composes: + +``` +orders +|> group by region { + revenue = sum(group.amount), + by_month = (group |> group by month { m_rev = sum(group.amount) }), -- inner `group` shadows outer + } +``` + +### Fan-out detection (the safety net) + +A field `f` from base table `T` (key `KT`) is determined by `KT`. After a many-to- +many join the relation's key grows to `KT ∪ KS`, so `f`'s values repeat across the +extra dimension. Rule: a **duplicate-sensitive** aggregate (`sum`, `mean`, +`count`) over a field whose determinant key is *coarser* than the relation's +current key is flagged — its values are being double-counted; **duplicate- +insensitive** aggregates (`min`, `max`) are exempt. This is the concrete cash-out +of key-inference (#11): the double-count SQL does silently becomes a warning. + +## DML & constraints + +### Everything is relational assignment + +A database is a set of **relvars** (relation variables); `table T { … }` declares +one. A relvar holds a relation *value*; the sole mutation primitive is +**relational assignment**, `T := <relational expression>`, where `expr`'s heading +must match `T`'s exactly (statically checked — no positional column matching). The +three DML verbs are **sugar** over assignment, reusing the *same* operators and +expressions as queries: + +``` +insert into R from <expr> --> R := R union <expr> +insert into R { id = 7, … } --> R := R union (rel{ …one tuple… }) +delete from R where p --> R := R |> where not p +update R where p set { f = e } --> R := (R |> where not p) + union (R |> where p |> with { f = e }) +``` + +`with { f = e }` overrides an existing field (the override sibling of `extend`, +which only adds); using the wrong one is a typed error, catching the `set`-typos +SQL silently accepts. Set semantics makes these well-behaved: re-inserting an +identical tuple is a **no-op** (`R ∪ {t}`, `t ∈ R`); inserting a *different* tuple +with an existing key violates the key constraint and is **rejected** (no silent +upsert); deleting absent rows is a no-op. Insert/delete are idempotent. + +### Constraints are booleans that must hold + +A constraint is a **boolean relational expression over the database that must be +true after every transaction.** Keys and FKs are named, incrementally-checked +special cases of that one idea: + +- **Key** (#2): `key (id)`, `key (a, b)` — the projection onto the key is + injective; multiple candidate keys allowed. +- **Foreign key:** `dept: Id references departments(id)` — an inclusion + constraint, `R{dept} ⊆ departments{id}`; a nullable FK (`dept: Id?`) admits + `None`. Parent-side referential actions — **restrict** (default), **cascade**, + **set-null** — are declared repairs applied within the transaction before the + check. +- **Check:** an arbitrary predicate. Tuple-level (`check salary > 0`) is the + common case; **database-level** constraints spanning relvars are first-class — + exactly what SQL cannot express: + + ``` + check count(employees |> where role == "CEO") <= 1 + check (employees{dept} ⊆ departments{id}) -- an FK, spelled out + ``` + +Any assignment whose result would falsify a constraint is rejected; the relvar is +left unchanged. + +### Transactions are the checking boundary + +Because constraints span relvars, the unit of checking is the **transaction**: a +batch of assignments applied together, all constraints checked once at commit, any +violation rolling the whole batch back atomically. A bare assignment is a +singleton transaction. This is what lets a parent and child be inserted together +past an FK, and what makes cascades coherent. + +``` +transaction { + insert into departments { id = 3, name = "Eng" } + insert into employees { id = 7, name = "Ann", dept = 3 } +} -- both visible together; constraints checked once +``` + +### Views + +A **view** is a named relation *expression*, not stored: `view active = +employees |> where active`. Read-only to start; making views updatable (mapping a +view-assignment back onto base relvars — the view-update problem) is deferred. + +### What this buys + +Mutation reuses the entire query language — no separate DML dialect (contrast +SQL's bespoke `UPDATE … SET`). Three verbs collapse to one primitive, so they +compose and can't interact surprisingly. Constraints unify to "invariants that +hold after every transaction," with multi-relvar invariants first-class. Static +heading-match kills positional-column bugs. The transaction is simultaneously the +atomicity and the constraint boundary. + +## Recursion + +Restricted to a **least fixpoint of a monotone, range-restricted relational +expression** — enough for reachability, ancestry, and BOM explosion, and +guaranteed to terminate. A recursive relation is defined with `recursive`: + +``` +-- edges: Rel {src: Id, dst: Id, key (src, dst)} +recursive reach = + edges + |> union (reach |> join edges on dst == edges.src + |> select { src, dst = edges.dst }) +``` + +What keeps it total: + +- **Monotone:** the recursive name may not appear under `minus`/`not` — negation + is **stratified** (it may only refer to a lower stratum), so each iteration only + *adds* tuples. +- **Range-restricted:** every output value comes from the base/input, none are + invented (no unbounded arithmetic in the recursive arm). Monotone + + range-restricted + finite input ⇒ the least fixpoint is reached in finitely many + steps. + +The result is an ordinary relation, so it composes with everything else (filter / +join / group the closure). The common case has sugar — `closure(edges: src -> +dst)` expands to the fixpoint above. Recursion *with aggregation* (shortest paths, +PageRank-style iteration) needs well-founded aggregate semantics and is deferred. + +## Windows & the `Seq` layer + +### Principle + +Relations are unordered (#4); **ordering produces a `Seq`**, and *all* +order-dependent computation lives there — never in relational aggregates, which +must be commutative monoids (#14). `sort by e : Rel R -> Seq R` is the gateway, +and the `Rel`/`Seq` type boundary is what enforces #14: you cannot write a running +total over a `Rel`, nor a duplicate-collapsing set op over a `Seq`, without +crossing the boundary explicitly. A `Seq` supports two kinds of computation: + +- **Scans** (`Seq R -> Seq R'`): give every row a value from its position, + neighbors, or a frame — running totals, moving averages, `lag`/`lead`, ranks. + Row count preserved. +- **Ordered folds** (`Seq R -> scalar`): reduce in order — `first`, `last`, + `concat(sep)`. (Order-*independent* reductions remain relational aggregates; + these are the ones that genuinely need order.) + +Plus slicers `take n`, `drop n`, `take while p`, which subsume top-N and +pagination (`sort by x desc |> take 10`). + +### Window expressions (scans) + +A scan adds a column via `extend`, where the expression sees ordered context: + +- **Frame aggregate:** `agg(e) over <frame>`, an aggregate over a range relative + to the current row: + - `rows[..current]` — running (unbounded preceding → current) + - `rows[current..]` — current → end + - `rows[-3..0]` — 3 preceding → current (a width-4 moving window) + - `rows[-1..1]` — immediate neighbors +- **Offsets:** `lag(e, k, default)`, `lead(e, k, default)` — value k rows away. +- **Ranking:** `row_number()`, `rank()`, `dense_rank()`, `ntile(n)` — read the + sort order and ties. + +### Partitioning reuses `group` + +`PARTITION BY` is just "window within each group independently," and we already +have grouping. So: + +``` +trades +|> partition by symbol order by ts + extend running = sum(amount) over rows[..current], + prev = lag(price, 1), + r = rank() +``` + +is sugar for group / sort-within / position-aware `extend` / ungroup: + +``` +trades +|> group symbol into g +|> with g = (g |> sort by ts + |> extend running = sum(amount) over rows[..current], + prev = lag(price, 1), + r = rank()) +|> ungroup g +``` + +The result is a **`Seq`** carrying within-partition order; cross-partition order +is unspecified until a final `sort by`. Rows stay distinct, so it can be taken +back to a `Rel` with `as rel` if you want a set. + +### Idioms + +``` +-- first row per partition +events +|> partition by user order by ts extend rn = row_number() +|> where rn == 1 + +-- 7-day moving average +… |> partition by sku order by day extend ma = mean(qty) over rows[-6..0] + +-- in-order audit trail per case (ordered fold) +steps |> group by case { trail = (group |> sort by ts |> concat(name, " -> ")) } +``` + +### Why this beats SQL windows + +- No bespoke `OVER (PARTITION BY … ORDER BY … ROWS BETWEEN …)` sub-language: + partitioning is the `group` you already know, ordering is `sort by`, the window + value is an `extend`. +- The `Rel`/`Seq` boundary makes order-dependence *visible in the type* and + enforces the aggregate/scan split (#14) instead of leaving it to convention. +- Frames are terse, and ordered folds (`first`/`concat`) are just `Seq` + functions, not special cases. + +## Concrete syntax (grammar) + +A first concrete grammar — enough to parse the examples above and to pin the +ambiguities. Notation: `(…)` grouping, `*` zero-or-more, `?` optional, `|` choice; +UPPERCASE = tokens; `--` starts a line comment. + +### Module & declarations + +``` +module ::= item* +item ::= domainDecl | tableDecl | viewDecl | letDecl | recDecl + | constraintDecl | stmt + +domainDecl ::= "domain" IDENT "=" type ("check" "(" expr ")")? +tableDecl ::= "table" IDENT "{" field ("," field)* ("," tableCon)* ","? "}" +field ::= IDENT ":" type ("references" IDENT "(" IDENT ")" refAction*)? +refAction ::= "on" ("delete" | "update") ("restrict" | "cascade" | "set" "null") +tableCon ::= "key" "(" IDENT ("," IDENT)* ")" | "check" "(" expr ")" +viewDecl ::= "view" IDENT "=" relExpr +letDecl ::= "let" IDENT "=" relExpr -- names a relation (replaces CTEs) +recDecl ::= "recursive" IDENT "=" relExpr -- monotone, range-restricted +constraintDecl ::= "constraint" IDENT "check" "(" expr ")" +``` + +### Statements + +``` +stmt ::= IDENT ":=" relExpr -- relational assignment + | "insert" "into" IDENT (record | "from" relExpr) + | "delete" "from" IDENT "where" expr + | "update" IDENT "where" expr "set" record + | "transaction" "{" stmt* "}" + | relExpr -- a query to evaluate +``` + +### Relation expressions (pipelines) + +``` +relExpr ::= relPrim ("|>" stage)* +relPrim ::= IDENT | "(" relExpr ")" | relLit + | "closure" "(" IDENT ":" IDENT "->" IDENT ")" +relLit ::= "rel" "{" record ("," record)* ","? "}" + +stage ::= "where" expr + | "extend" bindings | "with" bindings + | "select" block | "project" nameBlock | "remove" nameBlock + | "rename" IDENT "->" IDENT + | "join" relPrim ("as" IDENT)? ("on" expr | "natural") + | ("union" | "minus" | "intersect") relPrim + | "group" "by" nameList block? -- sugar: binds `group` + | "group" nameList "into" IDENT -- primitive nest + | "ungroup" IDENT + | "aggregate" block -- = group by () + | "sort" "by" sortKey ("," sortKey)* + | "take" ("while" expr | expr) | "drop" expr + | "partition" "by" nameList "order" "by" sortKey ("," sortKey)* + "extend" bindings -- window; yields Seq + +sortKey ::= expr ("asc" | "desc")? +bindings ::= binding | block +binding ::= IDENT "=" expr +block ::= "{" entry ("," entry)* ","? "}" ; entry ::= binding | IDENT -- punning +nameBlock::= "{" nameList ","? "}" ; nameList ::= IDENT ("," IDENT)* +record ::= "{" entry ("," entry)* ","? "}" -- value tuple +``` + +### Types + +``` +type ::= (IDENT | "Rel" heading | "Seq" heading) "?"? +heading ::= "{" IDENT ":" type ("," IDENT ":" type)* ","? "}" +``` + +### Scalar expressions (low → high precedence) + +``` +expr ::= or ("??" or)? -- option default +or ::= and ("or" and)* +and ::= cmp ("and" cmp)* +cmp ::= add (cmpOp add | "is" ("some" | "none"))? -- non-associative +cmpOp ::= "==" | "!=" | "<" | "<=" | ">" | ">=" | "in" | "subset" +add ::= mul (("+" | "-") mul)* +mul ::= un (("*" | "/" | "%") un)* +un ::= ("-" | "not") un | cast +cast ::= post ("as" type)? +post ::= prim trailer* +trailer ::= "." IDENT | "(" callArgs? ")" | "over" frame +prim ::= LITERAL | IDENT | "(" expr ")" | quant +quant ::= ("all" | "any") "(" expr "for" IDENT "in" relExpr ")" +frame ::= "rows" "[" bound? ".." bound? "]" ; bound ::= "current" | INT +callArgs ::= expr ("," expr)* -- function / aggregate args + | expr "for" IDENT "in" relExpr -- aggregate comprehension form +LITERAL ::= INT | FLOAT | TEXT | "true" | "false" | "none" | "some" "(" expr ")" +``` + +Numeric literals allow `_` digit separators (`50_000`); the `50k`-style shorthand +in earlier prose was informal. + +### Semantic clarifications + +These resolve the ambiguities the prose left open; they are the canonical reading +and supersede any looser earlier example. + +1. **`.` is type-directed (#23).** `e.f` is field access on a record (→ scalar), + **column extraction** on a relation (→ `Col T`, a per-row bag), or option- + propagating access on `T?` (→ `T'?`). A `Col` is only ever an aggregate + argument; used as a scalar it is a type error — that *is* the no-must-group + rule. +2. **`=` vs `:` (#22).** `=` binds a value (`{ id = 7 }`); `:` ascribes a type + (`{ id: Id }`). A bare `x` in a block is punning for `x = x`. +3. **Pipelines are expressions; no `from` (#24).** A query is a relation + expression, so `R |> …` nests anywhere a relation is expected (join operands, + nested groups, `let` / `view` bodies). `from` appears only in `delete from` / + `insert … from`. +4. **`group` keyword; `into` for naming.** `group by K { … }` binds the per-group + relation to `group`; nest with the `group K into name` primitive to avoid + shadowing. +5. **Aggregates are calls.** `count(group)`, `mean(group.salary)`, + `sum(group.qty * group.price)`; the arguments are per-row expressions over the + group that the aggregate reduces. The comprehension form `sum(e for r in R)` is + equivalent. +6. **`select` vs `project`.** `project {a, b}` is pure π (existing names only); + `select { a, x = e }` builds an exact output record and desugars to + `extend` (computed) + `project` (to the listed fields). +7. **Frames.** In `over rows[a..b]`: `current` ≡ 0, negative = preceding, positive + = following, an empty side = unbounded — `rows[..current]` running, + `rows[-3..0]` width-4 trailing, `rows[-1..1]` neighbors. +8. **Option discipline.** `==` / `!=` on options are **total** Bool (structural: + `none == some x` is `false`), and a non-option widens to an option implicitly + (`T <: T?` — the only implicit conversion). Ordering, arithmetic, and value-use + on a `T?` need resolution first (`??`, or narrowing via `where x is some`); + there is no silent null propagation. +9. **Qualified names (#25).** A field carries its source relvar/alias. A bare name + must resolve unambiguously; after a `join`, a name present on both sides is + written `source.field`, and `join R as a` aliases the source (needed for + self-joins). In row context, a leading identifier that names a source qualifier + is qualified field access; otherwise `.` is the type-directed access of + clarification 1. + +## Showcase + +One schema exercised end-to-end in canonical syntax: schema + constraints, then a +battery of queries, windows, recursion, and DML. Everything here is intended to +parse under the grammar above and type-check under the rules above. + +### Schema + +``` +domain Id = Int +domain Money = Int check (value >= 0) -- minor units; `value` is the domain value + +-- FK direction is declarative; declaration order and cycles (departments.head ⇄ +-- employees.dept) are fine — the schema is checked as a whole. + +table departments { + id: Id, + name: Text, + budget: Money, + head: Id? references employees(id), -- nullable FK: a dept may have no head + key (id), + key (name), -- a second candidate key +} + +table employees { + id: Id, + name: Text, + email: Text, + dept: Id references departments(id) on delete restrict, + manager: Id? references employees(id), -- self-ref, nullable (the CEO has none) + salary: Money, + hired: Time, + key (id), + key (email), + check (salary > 0), +} + +table projects { + id: Id, + name: Text, + dept: Id references departments(id), + key (id), +} + +table assignments { -- employees ⋈ projects (many-to-many) + emp: Id references employees(id) on delete cascade, + project: Id references projects(id) on delete cascade, + hours: Int check (hours >= 0), + key (emp, project), +} + +-- a database-level invariant SQL cannot state directly: +constraint one_ceo check (count(employees |> where manager == none) <= 1) + +let over_budget = + employees + |> group by dept { payroll = sum(group.salary) } + |> join departments on dept == departments.id + |> where payroll > budget +constraint payroll_within_budget check (count(over_budget) == 0) +``` + +### Queries + +``` +-- Q1 average salary per dept, only well-paid teams, sorted +employees +|> where salary > 50_000 +|> group by dept { headcount = count(group), avg_pay = mean(group.salary) } +|> where avg_pay > 70_000 -- no HAVING; just `where` after group +|> join departments on dept == departments.id +|> select { team = departments.name, headcount, avg_pay } +|> sort by avg_pay desc -- result is a Seq + +-- Q2 each employee with their manager (self-join; a None manager matches no row) +employees +|> join employees as mgr on employees.manager == mgr.id +|> select { who = employees.name, boss = mgr.name } +-- use `left join` instead to keep the CEO, with boss = none + +-- Q3 total hours and head-count per project +assignments +|> group by project { total_hours = sum(group.hours), people = count(group) } +|> join projects on project == projects.id +|> select { project = projects.name, total_hours, people } +|> sort by total_hours desc + +-- Q4 fan-out: the compiler catches a double-count at compile time +employees +|> join assignments on id == assignments.emp -- 1:many ⇒ each emp recurs +|> group by dept { payroll = sum(group.salary) } -- ⚠ salary double-counted per assignment +-- correct (employee grain, no fan-out): +employees |> group by dept { payroll = sum(group.salary) } + +-- Q5 top-3 earners per department (window ⇒ Seq) +employees +|> partition by dept order by salary desc extend rank = rank() +|> where rank <= 3 +|> sort by dept, rank + +-- Q6 cumulative payroll as people are hired (window scan) +employees +|> partition by dept order by hired + extend cumulative_payroll = sum(salary) over rows[..current] +|> select { dept, name, hired, cumulative_payroll } + +-- Q7 all transitive reports under employee #1 (recursion + qualified disambiguation) +let reports = employees |> where manager is some -- narrows `manager` to Id + |> select { mgr = manager, emp = id } +recursive chain = + reports + |> union (chain |> join reports on chain.emp == reports.mgr + |> select { mgr = chain.mgr, emp = reports.emp }) +chain +|> where mgr == 1 +|> join employees on emp == employees.id +|> select { report = employees.name } +``` + +### Mutation + +``` +-- 10% raise for one department +update employees where dept == 3 set { salary = salary * 11 / 10 } + +-- hire + first assignment atomically (the assignment's FK to the new employee +-- holds because both land in one transaction, checked once at commit) +transaction { + insert into employees { + id = 42, name = "Bo", email = "bo@co", dept = 3, + manager = 7, salary = 9_000_000, hired = date("2026-06-01") + } + insert into assignments { emp = 42, project = 5, hours = 0 } +} + +-- retire a project; its assignments cascade away +delete from projects where id == 5 +``` + +### Callouts + +- **Q2** `employees.manager == mgr.id`: `manager: Id?` against `mgr.id: Id` widens + by implicit `some` (#26); equality is total, so a None manager equals no row and + the CEO drops out of the inner join. `manager` must be qualified (both sides are + `employees`). +- **Q4** the `⚠` is a *compile-time* warning from key-inference (#11, #15), not a + runtime surprise: `salary` is determined by `employees.id` but the joined rows + are keyed by `(emp, project)`. +- **Q5 / Q6** output is a `Seq` (#20); `rank()` and `sum(…) over` are `Seq`-layer + operations, unavailable on a `Rel`. +- **Q7** `where manager is some` narrows `manager` from `Id?` to `Id` (#26); the + recursive arm needs qualifiers (`chain.emp`, `reports.mgr`) because both inputs + carry `mgr` and `emp` (#25); `union` type-checks because both arms have heading + `{mgr, emp}`. + +## Open questions + +- **Name.** `RQL` is a placeholder. +- **Relationship/FK layer (deferred).** The additive sugar from decision #12: + declared references, reverse navigation, path expressions, cardinality-typed + many-sides. When to build it; named-relationship & ambiguity rules; path + resolution. +- **Comprehension surface.** Add a calculus surface over the same IR later, or + never? +- **Value-based & peer frames.** SQL `RANGE` / `GROUPS` (frames by order-key + *value* or tie-peers) deferred; positional `rows[…]` frames first. +- **Recursion with aggregation** (shortest paths, fixpoint + `min`/sum). Deferred + — needs well-founded aggregate-in-recursion semantics. +- **Updatable views.** The view-update problem — when can a view assignment map + back to base relvars? Deferred. +- **Defaults & generators.** Column defaults (constants / pure exprs) vs. impure + generators (identity/autoincrement, `now()`), which must arrive as explicit + host effects. Design TBD. +- **Upsert / merge sugar.** A declared-intent form over assignment. +- **Concurrency / isolation.** Likely out of scope (RQL is a language, not an + engine), but the transaction boundary needs an isolation story if it ever runs + concurrently. +- **Implementation substrate.** Could ride cfree's pipeline as a CG-API frontend + (cf. RSN) or stand alone. Out of scope until the language is nailed. diff --git a/doc/plan/RSN.md b/doc/plan/RSN.md @@ -0,0 +1,627 @@ +# RSN — Ryan's Scripting Notation (planned work) + +A small, **statically-typed** scripting language that lets cfree replace its +remaining external build/test dependencies — **make, POSIX shell, and Python** — +with code it owns and ships, and that runs on cfree's *own* compiler pipeline. +RSN is the language; the build system that grows on top of it is a later layer +(sketched at the end). This doc is the living design. + +## The architecture, in one line + +**RSN is another frontend on cfree's pipeline** (alongside C, Wasm, toy): RSN +source → public CG API → IR → execution. To make a garbage-collected language +fit, **cfree's IR gains a first-class managed-reference type**, so RSN runs on the +existing IR interpreter (arena-allocated first, then *precisely* GC'd) and can +compile to *fast native code* through the existing backends later. Static typing +is what makes +this pay off: typed operations lower to real IR ops (not opaque runtime calls), +so the optimizer/backends optimize RSN, and the managed-ref type makes every GC +root statically known. + +Because this turns the IR into a GC-aware IR, it is **staged**, and Stage 1 is +itself split to keep the highest-blast-radius change (touching the shared CG type +system) out of the critical path until the language has proven itself: + +- **Stage 1a (now):** typed RSN frontend + interp, **arena-allocated, no + collector** — managed values live in a run-scoped arena freed at exit. The + interp carries a per-function ref-bitmap as local metadata; the C compiler's CG + type system is untouched. Cooperative fibers land here. This is where the + language gets exercised on real build/test scripts. +- **Stage 1b (when reclamation is actually needed):** the managed-ref IR type + + GC op dialect + the interp's root scan and mark-sweep collector, replacing the + arena. Precise by construction. Native/Wasm/C backends reject managed-ref + functions. +- **Stage 2 (later, optional):** native/JIT codegen for managed code — safepoints + + GC stack maps in the optimizer and the aa64/x64/rv64 backends — so a build + tool written in RSN compiles to a native binary. + +## Why + +Three external tools carry three jobs today: **make** (~1.5k lines: the +dependency DAG, host/config branching, `CFREE_*_ENABLED` gating, the cross-arch +matrix), **shell** (~12k LOC under `test/`: the corpus harness engine, process +orchestration, golden-diffing), and **Python** (parsing tool output and wrangling +text/JSON/CSV). RSN's goal is one owned, typed language that does all three, so +cfree's only external dependency is a seed C toolchain. 3 → ~0. + +## Locked decisions + +**Architecture & runtime** + +1. **Typed RSN as a CG-API frontend** (`lang/rsn/`, with a type checker); no + bespoke bytecode — the IR / interp `InterpInsn` *is* RSN's bytecode. +2. **GC-aware IR**: a managed-reference type + a small GC op dialect in the CG + type system + IR, firewalled so the C path pays nothing. +3. **Precise, non-moving mark-sweep GC**, precise by construction. +4. **Cooperative single-threaded fibers from day one**, on the interp's explicit + swappable stack and its reserved `CFREE_INTERP_BLOCKED` yield state. Cores are + saturated by spawned child processes, not threads. +5. **Staged delivery**: interp first — arena-allocated (1a), then precise GC (1b); + native managed codegen later (Stage 2). + +**Language** + +6. **Syntax**: brace / C-ish (Go/Wren family), strictly **LL(1)**, **significant + newline** termination. +7. **Bindings**: `let` (immutable) + `var` (mutable). Object *contents* stay + mutable regardless — this governs only rebinding the name. +8. **Algebraic data types**: `struct` (named product) + `enum` (sum / variants) + + `(T, U, …)` tuple (anonymous product), all destructured by the one `pattern` + grammar (`match`/`if let`/`let`/`for`). +9. **Nullability**: refs are non-null; absence is `T?` (`Option<T>`), unwrapped + with `??` (coalesce) and `if let`. Distinct from `Result` (absence vs error). +10. **Errors**: `Result<T, E>` + Zig-style `try` (propagate) / `catch` (recover), + not exceptions. `Result`/`Option` are sum types; `match` also destructures + them. +11. **`any` + `match`**: `any` is the dynamic escape hatch (JSON, parsed text); + `match` narrows it (`string s`, `int n`, …). `match` is the single multi-way + construct and general destructurer — literals/strings, ADTs, tuples, `any` + type-narrows, and or-patterns (`a | b -> …`); no fallthrough, no `switch`. +12. **Generics — Stage 1 ships only the builtin containers** (`list<T>`, + `map<K, V>`) as hand-written specializations (packed/unboxed `list<int>` / + `list<float>`, a shared boxed instantiation otherwise). The general + instantiation strategy (.NET-style hybrid vs. plain monomorphization, plus + constraints for user-defined generic fns) is **deferred** until the feature + exists — not locked ahead of need. +13. **Expression `if`/`match`** (value-producing); blocks appear only in + control-flow slots, so a bare `{}` stays a map literal and `?` is reserved for + optionals (no ternary). **`match` is the *single* multi-way construct** — there + is no separate `switch`; literal/string arms plus or-patterns (`a | b -> …`) + cover value-dispatch, arms never fall through. +14. **Modules**: file = module; `pub` exports (private by default); `import + "x.rsn" as x`, qualified access. +15. **Patterns**: Lua-style string patterns, not a full regex engine. +16. **Methods via UFCS**: `a.f(b)` ≡ `f(a, b)`. `a.name` is a field load when + `name` is a field of `a`'s type, otherwise `a.name(...)` resolves `name` as a + free function with `a` as its first argument. One rule covers `objs.push`, + `src.replace`, `xs.map` — no method-declaration form, no receiver concept. + Tie-break: a field shadows a free function of the same name. +17. **String interpolation**: `"...${expr}..."` (a lexer feature; the interpolated + `expr` is a full expression, `\$` escapes a literal `$`). Argv is still built + with lists (`["cfree", "cc", src]`), never by interpolating into one command + string — that keeps shell-quoting bugs out by construction. +18. **First-class tuples + one pattern grammar**: `(T, U, …)` (arity ≥ 2) is a + storable, nestable anonymous product; `()` is unit and `(e)` is grouping. + Destructure-only (no `.0`/`.1`). A single `pattern` concept serves + `let`/`var`/`for`/`match`/`if let` (tuples always written `(a, b)`); + `map` yields `(K, V)` so `for (k, v) in m` is just a tuple pattern. `(a, b) = + (b, a)` is parallel assignment (RHS fully evaluated first). + +Superseded by this design: the earlier **NaN-boxed dynamic value model** and +**dynamic typing** — typing removes the need for a universal tagged value. + +## What we reuse + +- **The frontend→CG→opt→interp pipeline.** RSN attaches as the compiler's + **interp sink** (`cfree_interp_program_attach`) so each compiled RSN function is + lowered to `InterpInsn`, exactly as `cfree run --no-jit` does for C. +- **The CG type system** (`CfreeCgTypeId`) and recording IR (`src/cg`) — extended + with the managed-ref type, not replaced. +- **The IR interpreter** (`src/interp`): a register VM with direct-threaded + dispatch, an **explicit swappable stack** documented as a "swap-ready substrate + for fibers/virtual threads," and a **reserved `BLOCKED` yield status**. The + fiber runtime and a yielding `run()` complete a half-built design. +- **`src/core/`** (arena/Heap/Slice/StrBuf/VEC/hashmap/diag) — the frontend's and + runtime's substrate. +- **`rt/lib/`** — the home for RSN's runtime helpers (container growth, string + ops, the collector, `any`-boxing, the iterator protocol). +- **The optimizer and native backends** (Stage 2) — typed RSN lowers to real IR. +- **CAS** (`include/cfree/cas.h`) — the build layer's content-addressed cache. +- **The host-vtable convention** (`CfreeCasHost`, `CfreeInterpHost`) — RSN's host + vtable supplies all I/O and the scheduler's event source; which fields are bound + is the capability gate. + +## The IR extension: managed references + GC + +The heart of the design, specified with its firewall discipline. + +**The type.** A new CG type kind — a **managed ref**: an opaque, GC-owned pointer +to a heap object of a given *shape*. Distinct from a raw C pointer. RSN's +`string`/`list`/`map`/`fn`/`fiber`/`chan`/record/`enum`/tuple/`any` values are +managed refs; `int`/`float`/`bool` are unmanaged scalars. + +**The ops** (a small dialect): `managed_alloc(shape) -> ref`, `ref_field_load/ +store`, `ref_elem_load/store`, `ref_eq`, `ref_null`. Safepoints are implicit at +allocations, calls, and loop back-edges (no explicit op in the interp; once the +1b collector is on it can collect at any allocation because all roots are in +registers). + +**Object shapes for precise tracing.** Every managed object's header points at a +**shape descriptor** listing which fields are managed refs (or a trace function +for containers). **Sum types use a *per-variant* shape**: the collector reads the +active tag, then traces that variant's ref-map. Records use a single field-map. + +**Root finding in the interp.** Each `InterpFunc` carries a static **ref-bitmap** +marking which PRegs hold managed refs. Roots = every live ref-PReg across all +live fibers' frames + globals + the string-intern table + a small **C-roots** +scratch for native functions mid-construction. Precise; no stack maps at the +interp level. + +**Two cost-containing invariants (locked):** +- **Non-moving** → no write/read barriers, no pointer rewriting in codegen; the + backend only ever *identifies* roots, never *updates* them. +- **No interior pointers** → element/field access re-derives from the base ref, so + a stack map only ever needs base refs. + +**Firewall discipline.** The managed dialect is an *optional IR feature*: a +function with no managed refs needs no GC support and pays nothing. The C frontend +never emits managed refs, so C → IR → {native, Wasm, C} is untouched and +zero-cost. A backend without managed support asserts out cleanly on a managed +function. The managed-IR capability is its own flag (`CFREE_MANAGED_IR`), distinct +from the `CFREE_RSN_ENABLED` frontend that consumes it. The standing test: a +C-only build compiles and links with the managed dialect compiled out. + +**Stage 2 (native).** Precise native GC needs safepoints + **stack maps**: the +optimizer's liveness + register allocator record which managed refs are live at +each safepoint/call, per arch. This is the real cross-cutting work and the reason +native is staged separately. The two invariants above keep it to *root +identification* rather than the full moving-GC codegen problem. + +## Type system + +Static, with local inference so scripts stay terse. + +- **Primitives:** `int` (i64), `float` (f64), `bool`, `string`. +- **Containers (builtin generics):** `list<T>`, `map<K, V>`. +- **Functions:** `fn(A, B) -> R`; closures capture bindings by reference. +- **Records (product):** `struct Name { field: T, … }` — named-field managed + objects with a single shape. +- **Sums (variant):** `enum Name { Variant, Variant(T, …), … }` — tagged unions + with optional payloads; per-variant shape. Constructors are values + (`Leaf(5)`, `Empty`); destructured by `match`. +- **Tuples (anonymous product):** `(T, U, …)`, arity ≥ 2 — a managed ref to an + anonymous positional record (reuses the record shape, trivially traced). + Storable and nestable (`list<(string, string)>`, `(int, (int, int))`). + `()` is unit and `(e)` is grouping, so 1-tuples don't exist. **Destructure-only** + — no `.0`/`.1` (consistent with variant payloads, and it dodges the `pair.0.1` + float-literal lexer hazard); pull one element with `let (_, v) = p`. Packed + all-scalar tuples (e.g. unboxed `(int, int)`) are a deferred niche opt, same + bucket as packed `list<int>`. +- **Builtin sums:** `Option<T> = { Some(T), None }` (with `T?` sugar and `??` / + `if let`), `Result<T, E> = { Ok(T), Err(E) }` (with `try` / `catch`). +- **Builtin error type:** `struct Error { code: int, message: string }` — the + default `E` for host/process operations (`run`, `file_io`); user code may use + any `E`. +- **Methods are UFCS:** there is no method-declaration form. `a.f(args)` lowers to + `f(a, args)` when `f` is a free function in scope; `a.name` is a plain field + load when `name` is a field. A field shadows a free function of the same name. +- **`any`:** a managed ref to a `{ tag, payload }` box for genuinely dynamic data; + narrowed by `match` type-patterns that name the real (lowercase) types + (`string s`, `int n`, `list xs`, …). Boxing is confined to `any` (and + `Option<scalar>`), not universal. +- **Inference:** locals infer from initializers (`let n = 5` ⇒ `int`); function + parameters and return types are annotated. Explicit type arguments are *not* + written at call sites (inferred), so `<` is unambiguous in expressions — no + turbofish. +- **Generics — Stage 1 is builtin containers only.** `list<T>` and `map<K, V>` + ship as hand-written specializations: `list<int>` / `list<float>` are + packed/unboxed, ref element types share one boxed instantiation. The general + scheme for user-defined generics (.NET-style value/ref hybrid vs. plain + monomorphization, and constraints) is deferred to its own phase — when it lands, + the value/ref split and ref-vs-scalar-parameterized shape descriptors are the + intended direction, but nothing here is locked yet. +- **Modules:** each `.rsn` file is a module; `pub` exports (private by default); + `import "util.rsn" as util` then `util.name`. Qualified, clash-free. Selective + `from … import { … }` is a possible later sugar. + +## Value model + +Typed, so values are their **natural machine representations** — no universal +tagged word: + +| RSN type | Representation | +|-----------------------------------|-----------------------------------------| +| `int` / `float` / `bool` | i64 / f64 / i8 (unmanaged scalars) | +| `string`, `list`, `map`, `fn`, `fiber`, `chan`, record, `enum` | managed ref | +| `(T, U, …)` tuple | managed ref (anonymous positional record); all-scalar packing deferred | +| `Option<T>` / `T?` | managed ref (a sum); `Option<scalar>` boxes | +| `any` | managed ref to a `{tag, payload}` box | + +`int` is a true 64-bit integer (the reason typing was attractive — no NaN-box, no +bigint spill, no pointer masking). Mixed `int`/`float` arithmetic requires an +explicit `float(x)`. Strings are immutable, so they are always safe to share +across fibers; **interning is selective, not universal** — short strings and any +string used as a map key are interned (cheap equality, CAS keys), while large +strings (file bodies, captured subprocess output) are held by reference without +hashing them into the intern table. `Option<scalar>` boxing is later +niche-optimizable; builtin `list<int>` stays packed/unboxed. + +## Errors and results + +Values, typed, with Zig's `try` ergonomics — no exceptions, no unwind tables. + +- **`Result<T, E>`** (a sum): `Ok(v)` / `Err(e)`. `try expr` unwraps `T` on `Ok`, + or returns the `Err` from the enclosing function on `Err`; it is type-checked + (the enclosing return must be a `Result` with a compatible `E`). At the top + level, a `try` that hits `Err` halts the run nonzero. `expr catch handler` + recovers, with an optional `|e|` binder. +- **Process primitives:** `run(cmd) -> Result<Proc, Error>` (`Ok` on exit 0, fail + fast with `try`); `exec(cmd) -> Proc` (raw `{code, out, err}`, for harnesses + that expect nonzero exits). +- `match` destructures `Result`/`Option`/any sum generally; `try`/`catch`/`??`/ + `if let` are the ergonomic shortcuts. + +``` +fn link(objs: list<string>) -> Result<string, Error> { + let a = try run(["cfree", "ar", "rcs", "lib.a"] + objs) # propagate on failure + return Ok(a.out) +} +``` + +## Surface syntax + +Brace-delimited, keyword-led, semicolon-free, mandatory braces on block bodies. +Type annotations only in type positions. Comments are `#` to end-of-line (line or +trailing); there are no block comments. Strings interpolate with `"${expr}"`. +Method calls are UFCS (`xs.map(f)` ≡ `map(xs, f)`). + +``` +struct Target { name: string, inputs: list<string>, optimize: bool } + +enum Json { + Null + Bool(bool) + Num(float) + Str(string) + Arr(list<Json>) + Obj(map<string, Json>) +} + +fn render(j: Json) -> string { + match j { # match is an expression + Null -> "null" + Str(s) -> quote(s) + Num(n) -> str(n) + Arr(xs) -> "[" + join(map(xs, render), ",") + "]" + _ -> "..." + } +} + +fn build(t: Builder) { + let objs: list<string> = [] # annotation: empty list element type + for src in glob("src/**/*.c") { + let obj = src.replace(".c", ".o") + t.target(obj, [src], fn() { try run(["cfree", "cc", "-c", src, "-o", obj]) }) + objs.push(obj) # mutating contents (let binding is fine) + } + let mode = if t.release { "opt" } else { "dbg" } # value-if; no ternary + t.target("libcfree.a", objs, fn() { try run(["cfree", "ar", "rcs", "libcfree.a"] + objs) }) +} +``` + +## Grammar (LL(1)) + +LL(1) by construction; type syntax confined to type positions. The lexer +suppresses the statement-terminating newline after an open bracket, a trailing +binary operator, **or when the next non-blank token is a leading `.`** (so +method/field chains may wrap across lines); the EBNF omits the newline token. +Comments (`#` to end-of-line; no block form) and string interpolation +(`"${expr}"`, `\$` escapes a literal `$`) are lexer-level; an interpolated `${…}` +lexes as a parenthesized sub-expression. + +```ebnf +program = { item } ; +item = importDecl | [ "pub" ] decl | stmt ; +importDecl = "import" STRING "as" IDENT ; +decl = fnDecl | structDecl | enumDecl ; + +fnDecl = "fn" IDENT [ generics ] "(" [ params ] ")" [ "->" type ] block ; +structDecl = "struct" IDENT [ generics ] "{" { IDENT ":" type [ "," ] } "}" ; +enumDecl = "enum" IDENT [ generics ] "{" { variant [ "," ] } "}" ; +variant = IDENT [ "(" type { "," type } ")" ] ; +generics = "<" IDENT { "," IDENT } ">" ; +params = IDENT ":" type { "," IDENT ":" type } [ "," ] ; + +stmt = letDecl | varDecl | whileStmt | forStmt | returnStmt + | "break" | "continue" | exprStmt | block ; +letDecl = "let" pattern [ ":" type ] "=" expr ; (* pattern irrefutable *) +varDecl = "var" pattern [ ":" type ] "=" expr ; (* pattern irrefutable *) +whileStmt = "while" cond block ; +forStmt = "for" pattern "in" cond block ; (* pattern irrefutable *) +returnStmt = "return" [ expr ] ; +exprStmt = expr [ assignOp expr ] ; (* tuple-of-lvalues LHS = parallel assign *) +assignOp = "=" | "+=" | "-=" | "*=" | "/=" ; +block = "{" { stmt } "}" ; (* value = trailing expr-stmt, else unit *) + +type = baseType { "?" } ; +baseType = IDENT [ "<" type { "," type } ">" ] + | "fn" "(" [ type { "," type } ] ")" "->" type + | "(" [ type "," type { "," type } ] ")" ; + (* left-factored: "()" = unit; "(" T "," U … ")" = tuple (arity >= 2). + No "(" T ")" form — types have no paren-grouping, so a lone "(int)" + is a syntax error; unit-vs-tuple is decided after "(" on ")" vs a type. *) + +expr = coalesce [ "catch" [ "|" IDENT "|" ] expr ] ; +coalesce = orExpr { "??" orExpr } ; +orExpr = andExpr { "||" andExpr } ; +andExpr = eqExpr { "&&" eqExpr } ; +eqExpr = relExpr { ("==" | "!=") relExpr } ; +relExpr = addExpr { ("<" | "<=" | ">" | ">=") addExpr } ; +addExpr = mulExpr { ("+" | "-") mulExpr } ; +mulExpr = unary { ("*" | "/" | "%") unary } ; +unary = "try" unary | ("!" | "-") unary | postfix ; +postfix = primary { callOp | indexOp | fieldOp } ; +callOp = "(" [ args ] ")" ; +indexOp = "[" expr "]" ; +fieldOp = "." IDENT ; +args = expr { "," expr } [ "," ] ; + +primary = INT | FLOAT | STRING | "true" | "false" + | ifExpr | matchExpr | spawnExpr + | IDENT [ structLit ] (* structLit suppressed in cond; see below *) + | parenExpr + | listLit | mapLit | fnLit ; +parenExpr = "(" expr [ "," expr { "," expr } ] ")" ; + (* left-factored: no comma => grouping (value is the inner expr); + >= 1 comma => tuple (arity >= 2). The grouping-vs-tuple choice is + made on the token after the first expr: ")" vs ",". No 1-tuples, + so "(a,)" is rejected. *) +spawnExpr = "spawn" block ; (* yields a fiber handle *) +ifExpr = "if" ( "let" pattern "=" cond | cond ) block + [ "else" ( ifExpr | block ) ] ; +cond = expr ; (* a bare `IDENT { … }` struct literal is NOT taken here; a + struct literal in condition position must be + parenthesized: `if (Foo{…}).ok { … }` *) +matchExpr = "match" expr "{" { matchArm } "}" ; +matchArm = pattern "->" ( block | expr ) [ "," ] ; +pattern = patternAtom { "|" patternAtom } ; (* or-pattern; all alternatives + must bind the same names+types. + `|` here is pattern position, + distinct from catch's `|e|`. *) +patternAtom = "_" | literal + | IDENT IDENT (* type-narrow: type binder *) + | IDENT [ "(" pattern { "," pattern } ")" ] (* Ctor / binding *) + | "(" pattern "," pattern { "," pattern } ")" ; (* tuple, arity >= 2 *) +structLit = "{" [ field { "," field } [ "," ] ] "}" ; +field = IDENT ":" expr ; +listLit = "[" [ expr { "," expr } [ "," ] ] "]" ; +mapLit = "{" [ entry { "," entry } [ "," ] ] "}" ; +entry = ( STRING | "[" expr "]" ) ":" expr ; (* string or computed key; + bare `IDENT:` is a struct field, never a map key, so + `{name: x}` is unambiguous and `map<int,V>` literals are + written `{[k]: v}` *) +fnLit = "fn" "(" [ params ] ")" [ "->" type ] block ; +literal = INT | FLOAT | STRING | "true" | "false" ; +``` + +LL(1) tactics that constrain the language: + +- **`let`/`var` lead bindings; `pub` leads exports; `import` leads imports.** +- **Assignment is statement-level** (`=` vs `==` unambiguous; no `if (a = b)`). +- **The `{` rule.** In a statement or control-flow slot (`block`, after `if`/ + `else`/`while`/`for`/`fn`/`spawn`, and a match arm's `->`), `{` is a block. In + expression position, `{` is a map literal and `Name{...}` is a struct literal. + To use a map where a block is expected, parenthesize: `-> ({...})`. This is what + lets value-`if`/`match` coexist with `{k: v}` maps under LL(1). +- **Struct literals are suppressed in condition position.** Inside the `cond` of + `if`/`while`/`for`, a bare `Name { … }` is *not* a struct literal — otherwise + the `{` is eaten as a struct body and starves the required block (the classic + Rust/Go conflict). A struct literal needed in a condition must be parenthesized: + `if (Foo{…}).ok { … }`. This is what actually makes the grammar LL(1); without + it `IDENT [structLit]` and the control-flow `{`-block production collide. +- **The `(` rule.** In expression position `(` is left-factored: parse one `expr`, + then a following `)` means it was grouping and `,` means it's a tuple — so + grouping and tuples never conflict at the open paren. In type position there is + no grouping, so `()` is unit and `(T, …)` is a tuple, decided right after `(`. + In pattern position a leading `(` is always a tuple pattern. (`spawn`/`fn` + before `(` are keywords, handled before this rule.) +- **`fn` at item/stmt start is a named declaration**; anonymous `fn` only in + expression position. +- **`try` is a unary prefix; `catch` a low-precedence tail** with an optional + `|e|` binder — LL(1) because `|` is not in FIRST(expr). +- **`<` is a type-only token in type position** (generics, declared sites). No + explicit type args at call sites, so `<` is always less-than in expressions. +- **Patterns** decide by one lookahead after the leading token: `_`/literal are + immediate; a leading `(` is a **tuple pattern**; an `IDENT` followed by another + `IDENT` is a type-narrow, by `(` is a constructor, else a binding + (constructor-vs-binding for a bare identifier is resolved in the checker against + known variants). **One `pattern` grammar serves `let`/`var`/`for`/`match`/`if + let`**; `let`/`var`/`for` additionally require it to be *irrefutable* (binders, + `_`, nested tuples thereof — never a literal/constructor/type-narrow that could + fail to match). +- **Or-patterns join alternatives with `|`** (`"+" | "-" -> …`): after a + `patternAtom`, lookahead `|` continues the or-pattern, anything else ends it. + This pattern-position `|` does **not** clash with `catch`'s `|e|` binder (expr + position), so the "`|` ∉ FIRST(expr)" tactic still holds. Or-patterns are + refutable ⇒ they appear only in `match`/`if let`, and they are how `match` + subsumes C `switch` multi-`case`/fallthrough — **arms never fall through**. +- **`if let`** vs `if`: peek `let` after `if`. +- **Parallel assignment.** `(a, b) = (b, a)` is an assignment whose LHS is a + tuple of lvalues; the RHS tuple is fully evaluated before any target is written + (so swap works). Only `=` takes a tuple LHS — not `+=` and friends. + +## Semantics defaults + +Decided, revisitable — these were settled as defaults rather than formal forks: + +- **Integer overflow traps** (halts the run) — correctness over speed for a tool + language, where a silently wrapped size/offset is the worse failure. Explicit + `+%` / `-%` / `*%` perform two's-complement wrapping where it is actually + wanted. +- **`/`** truncates toward zero; **`%`** sign follows the dividend (C). Mixed + `int`/`float` needs `float(x)`. +- **`==`** compares scalars and strings by value, and lists/maps/records/sums/ + tuples structurally by contents; identity is a separate `is`. +- **Iteration:** `for x in list`, `for (k, v) in map`, `for i in range(n)`; one + iterator protocol (`next() -> T?`) so user types opt in — `map` yields `(K, V)` + tuples, so multi-binding is just a tuple pattern, not a special case. +- **Unit return:** a function with no `-> T` returns unit (nothing), distinct from + `None`. +- **Strings:** immutable, UTF-8; `.len` is bytes; codepoint/grapheme helpers and + patterns operate on bytes. +- **Closures** capture bindings by reference (mutating a captured `var` is + visible). **Loop bindings are fresh per iteration** — each `for`/`while` turn + gets its own binding, so a closure created in the loop captures *that* + iteration's value (Go 1.22 / JS-`let` semantics). The build example's per-`src` + recipe closures depend on this. + +## Execution: RSN on the interp + +- **Pipeline:** source → lexer → LL(1) parser → AST → type check → CG-API + lowering → IR → (interp sink) `InterpInsn`. Typed ops are real IR ops; only + `any` and a few container/iterator primitives are `rt/lib/` helper calls. +- **Register VM, direct-threaded dispatch** — the existing interp. +- **Fibers** are the interp's explicit `InterpStack`s. A yielding op (`run`, + `read`, `sleep`, channel ops) returns `CFREE_INTERP_BLOCKED`; a **scheduler** + (living with the host, since readiness depends on host events) resumes the stack + when ready. Cooperative: no preemption. +- **The one limit (like Lua):** a fiber may yield only at RSN/IR instruction + boundaries and blessed scheduler-aware host calls — never from inside a native C + frame reached via FFI. +- **Surface:** `spawn { … }` is an *expression* yielding a fiber handle (`let h = + spawn { … }`); `parallel(list, limit, fn)` → bounded fan-out, results in input + order; `chan()` / `.send` / `.recv`. +- **Determinism guard:** fibers are execution-phase; build-graph *definition* runs + single-fiber, and the build-file capability gate nulls `spawn` and writes + (read-only `glob`/`stat` stay live so the graph can discover its inputs). + +## Memory and GC + +- **Non-moving mark-sweep**, precise via the managed-ref type. Allocation funnels + through `rsn_alloc(shape, size)`; collection runs at an allocation threshold. +- **Stop-the-world is trivial** (single-threaded, cooperative): collect only at + allocation safepoints, where all roots are in registers. +- **Roots:** live ref-PRegs across all fibers (via each function's ref-bitmap), + globals, the string-intern table, the C-roots scratch. Tracing uses each + object's shape (per-variant for sums). +- **Finalizers:** `handle`/`chan`/`fiber` and host-resource objects get an + optional `finalize` slot in their shape, run on collection and at teardown. +- **Arena stage (1a):** managed values bump-allocate into a run-scoped arena and + the collector is deferred (free at run end); object headers and shapes are + designed in from the start, so turning the collector on in 1b is localized. + +## Embedding API and capability gating + +```c +typedef struct CfreeRsnHost { + const CfreeFileIO* file_io; /* build-def: read/glob/stat live, writes gated */ + int (*spawn)(void* user, const CfreeRsnSpawn*, CfreeRsnProc** out); + int (*proc_reap)(void* user, CfreeRsnProc*, int* exit_code); /* scheduler */ + int64_t (*clock_ns)(void* user); /* gated */ + int (*mkdir_p)(void* user, const char* path); + void* user; +} CfreeRsnHost; + +CFREE_API CfreeRsn* cfree_rsn_new(CfreeContext*, const CfreeRsnHost*); +CFREE_API CfreeStatus cfree_rsn_eval(CfreeRsn*, CfreeSlice src, const char* name); +CFREE_API void cfree_rsn_register(CfreeRsn*, const char* name, + CfreeRsnNativeFn, void* ud); +CFREE_API void cfree_rsn_free(CfreeRsn*); +``` + +- **Script mode** (`cfree rsn foo.rsn`): full vtable — replaces the shell glue. +- **Build-definition mode**: `spawn`/`clock_ns`/write-`file_io` NULL, but + **read-only filesystem queries stay live** (`glob`, `stat`, read) — a + `build.rsn` must discover its inputs (`glob("src/**/*.c")`) to declare targets, + yet cannot `run()` processes, write files, or read the clock, so the graph is + hermetic and ad-hoc effects are a load-time error. The capability gate is + therefore per-operation (read vs. write vs. spawn), not a single wholesale + `file_io` switch. One frontend, two privilege levels, chosen by the caller. + +## The build layer (later) + +A library *in RSN* plus a thin engine: `target(name, inputs, outputs, recipe)`; +staleness by **content hash via the CAS** (not mtimes); parallel execution via +`parallel`/fibers bounded to `ncpu()`; `cfree build` loads `build.rsn` in +build-definition mode for the graph, then executes it in script mode. Out of scope +for the milestones below — it is what the language is *for*. + +## Implementation phases + +**Stage 1a — typed RSN, interpreted, arena-allocated (no collector yet):** + +1. `lang/rsn/` lexer + LL(1) parser → AST (full grammar above). Parse-corpus tests + under `test/rsn/`. +2. Type checker: local inference, ADTs (`struct`/`enum`/tuple), builtin-container + generics, UFCS resolution, the uniform `pattern` grammar (irrefutability check + for `let`/`var`/`for`, exhaustiveness for `match`), `Result`/`Option`/`try`, + modules. +3. RSN → CG-API lowering (typed ops → IR ops; `any`/containers/iterators via + `rt/lib/`). Managed values live in a **run-scoped arena, freed at run end** — + the interp carries a per-function ref-bitmap as interp-local metadata, but no + new CG type kind is added yet, so the C compiler's type system is untouched. +4. Fibers: complete the `BLOCKED`/yield path + scheduler over `InterpStack`s + + host vtable (`spawn`/`run`/`read`/`clock`); `spawn`/`parallel`/`chan`. +5. Stdlib: strings, list/map, iterators/`range`, Lua-style patterns, JSON (→ + `any`), `run`/`exec`. +6. CLI `cfree rsn` + per-operation capability gating. + +Goal of 1a: prove the language on real `build.rsn` / test scripts *before* the +invasive IR surgery. Short-lived runs make arena-free-at-exit genuinely adequate. + +**Stage 1b — precise GC (added once runs are long-lived enough to need it):** + +7. **CG/IR managed-ref type + GC op dialect** (`src/cg`), gated `CFREE_MANAGED_IR`; + object headers + (per-variant) shape descriptors. Firewall test: C-only build + green with the dialect compiled out. +8. Lowering emits managed ops for refs; the interp's root scan + mark-sweep + collector + finalizers replace the arena. (The 1a ref-bitmap becomes the + precise root map.) + +**Stage 2 — native managed codegen (optional, demand-driven):** + +9. Safepoints + GC stack maps in the optimizer (liveness/regalloc). +10. Native emit for managed functions across aa64/x64/rv64. +11. JIT/AOT RSN: `cfree rsn --jit`, and `cfree build` emitting native build tools. +12. Wasm/C-backend managed support (furthest out). + +**Then:** the build layer. + +## Resolved decisions + +All of: the architecture (typed CG-API frontend; GC-aware IR; precise non-moving +mark-sweep; cooperative fibers; staged interp → 1a arena → 1b GC → native); +`let`/`var`; non-null + `T?`; `Result`/`try`/`catch`; ADTs (`struct`/`enum`) with +`match`; `any` narrowing via lowercase type-patterns in `match`; **UFCS methods**; +**string interpolation**; **`#` line comments**; expression `if`/`match` +with the `{}`-block rule and the struct-literal-in-condition restriction; +qualified module imports; Lua-style patterns; separate `int`/`float` with explicit +conversion; **integer overflow traps**; **selective string interning**; +**per-operation capability gating**; **first-class tuples with one uniform +`pattern` grammar (Rust-style) + parallel assignment**; **`match` as the single +multi-way construct (no `switch`) with or-patterns and no fallthrough**; Stage-1 +generics = builtin containers only; the semantics defaults above; NaN-box value +model dropped. + +## Open decisions + +None block Stage 1a. Deferred until their phase: **selective imports** (`from … +import { … }` sugar), **range syntax** (`a..b` sugar vs the `range()` builtin) — +which would also unlock **range patterns** in `match` (`1..10 -> …`), +**record-style enum payloads** (`Variant { … }`), and **user-defined generic +functions** (instantiation strategy + constraints — see Type system). The earlier +tuple gap is now resolved (first-class `(T, U)`, see Type system / Value model / +Grammar); the remaining tuple sub-question deferred to its phase is **packed +all-scalar tuple representation** (e.g. unboxed `(int, int)`), niche-optimized +alongside packed `list<int>`. **Match guards** (`pattern if cond -> …`) are +deliberately deferred — `match` stays structural + or-pattern dispatch, and +conditional logic lives in `if`/`else if`; guards can be added later +non-breakingly (`matchArm = pattern [ "if" expr ] "->" …`). + +## Naming + +Language: **RSN — Ryan's Scripting Notation.** Source extension `.rsn`. Frontend +`lang/rsn/`, CLI `cfree rsn`, frontend flag `CFREE_RSN_ENABLED`; the GC-aware IR +it depends on is gated separately (`CFREE_MANAGED_IR`). When RSN ships, the +durable design moves up to `doc/RSN.md` and this roadmap shrinks to what remains +open.