RQL — a typed relational query language (planned work)

A replacement for SQL: declarative data query and manipulation, but typed, consistent, and close to the relational algebra. RQL is a working name (TBD).

SQL conflates three layers that should be separate — a surface syntax, a logical algebra, and a physical plan — and its surface does not map cleanly onto its own algebra. That mismatch is the source of most of SQL's irregularity. RQL keeps the layers apart: a small, total, well-typed core algebra (the IR) and a pipeline surface that maps onto it transparently. Same discipline as a compiler: surface → IR → plan, where the IR is where the laws live.

This doc is the living design and the ledger of decisions. It records what we decided and why; open questions sit at the end and graduate up into the ledger as we settle them.

What we're correcting

The specific SQL faults RQL is designed to remove:

Clause order ≠ logical order. SELECT is written first but evaluated nearly last, which breaks composition, top-to-bottom reading, and completion.
Three-valued logic. NULL poisons predicates; NOT IN with a null is a silent footgun; aggregates skip nulls inconsistently.
GROUP BY conflates partitioning with aggregation, forcing the "every select item must be grouped or aggregated" rule and a separate HAVING.
Weak typing & implicit coercion; no sum types, no row polymorphism, errors surface at runtime.
Bags pretending to be sets, with DISTINCT bolted on, ordered/duplicate column names, and SELECT *.
Universal quantification is miserable — Codd's division ("supplies all parts") has no clean SQL spelling.

Locked decisions

Each entry is a decision plus a one-line rationale. Numbers are stable IDs in decision order; the grouping is thematic, so IDs need not run contiguously within a group.

Semantics

A relation is a set of distinct records. A record (tuple) is an unordered set of named, typed fields; a relation is a set of records over one heading. Boolean-semiring sets, not bags. — Clean algebraic laws ⇒ sound optimizer rewrites (predicate/projection pushdown, join reordering are real equalities); count is meaningful (distinct), not "however many dupes were present."
Base tables must declare a key. Distinctness is a checked property, not a hope. — Makes "set" honest, and prevents aggregating over a lossy projection.
Multiplicity is data, not a second semantics. When duplicates must be counted, carry an explicit count: Int (or weight) field and weight aggregates explicitly (sum(group.amount * group.count)). — Buys the bag / K-relation (provenance-semiring) expressiveness as a value, without two equalities, a truncating minus, or weight-aware aggregate magic. Accepted cost: a synthetic id at ingest for genuinely keyless multiset data. Revisit if heavy event/log ingestion makes that friction too high.
Ordering is not part of a relation. sort / take / window produce a Seq, a distinct type. — Relations are unordered; ordering is a presentation/boundary concern and shouldn't pollute relational laws.

Missing values

No NULL. Optionality lives in the type (T? = Option T), introduced only by outer joins. Predicates are total, 2-valued booleans; defaults via ??. — Removes three-valued logic and the NOT IN / aggregate-skipping footguns; "missing" becomes explicit and local.
Option semantics: total equality, some-widening, where-narrowing. == / != on T? are total Bool (structural: none == some x is false); a non-option widens to an option implicitly (T <: T?, the only implicit conversion); where x is some (or x != none) narrows x to T downstream. Ordering, arithmetic, and value-use on T? need explicit resolution. — Keeps option handling total and ergonomic without reintroducing 3-valued logic (refines #5).

Surface

Pipeline / algebra surface: R |> op |> op …, one stage per core operator, written in evaluation order. A pipeline is just a relation expression (there is no from keyword), so it nests anywhere a relation is expected. — Composition, top-to-bottom reading, and a 1:1 correspondence with the IR. A comprehension (calculus) surface over the same IR is deferred, not rejected.
Grouping only partitions; aggregation is ordinary expressions over the relation-valued group field. No HAVING (it is where after group); no "must appear in GROUP BY" rule (referencing a non-key scalar after grouping is a type error). — One uniform mechanism; the awkward SQL special cases vanish.

Scope

Query + DML + constraints, unified by relational assignment. A database is a set of relation variables; insert / update / delete are all R := <relational expression>; keys, foreign keys, and arbitrary predicates are declared constraints checked on write. — One mechanism (Third Manifesto / Tutorial D style); set semantics makes assignment idempotent and total.

Types

(First draft — see "Type system" below. The shape is settled; specific inference rules and the keys-in-type ambition dial are open.)

Row-polymorphic structural records. Headings are structural record types; extend / project / rename / join are polymorphic over "the other fields" via a row variable, with a field-absence ("lacks") predicate. Duplicate labels are forbidden (a heading can't name a field twice).
Named scalar domains, no implicit cross-domain coercion. Money is not Float; conversions are explicit. — Supports "typed and consistent."
Keys/FDs live in a metadata + inference layer, not the type proper. Keys are declared on base tables and inferred through operators alongside the type; they power the fan-out-must-aggregate check, optimizer hints, and warnings, but do not participate in type equality. — Pragmatic sweet spot; leaves a clean path to full type-level keys if signatures ever need them.

Joins

Explicit-predicate joins + key-inference are the foundation; the relationship/FK layer is deferred, additive sugar. join R on <predicate> is the primitive — it covers equi, theta, range, and self joins. When the condition equates a key of one side, key-inference (decision #11) bounds the join to ≤1 match (no fan-out) and feeds the must-aggregate warning; a many-to- many join is flagged. A relationship layer — declared references, reverse navigation, path expressions (order.customer.region.name), and cardinality-typed "many" sides that force aggregation — rides on top later. It is purely additive and breaks nothing. — Ships a typed join with a cardinality story over arbitrary / un-annotated data first, and defers the schema-declaration machinery (named relationships, reverse naming, path resolution) until the core is proven.
Join result disambiguation: source-qualified names. Every field carries the source relvar/alias it came from; a bare name is sugar for unambiguous access. A join forms the qualified union of both sides (no auto-merge): same bare names coexist as distinct qualified fields and must be referenced source.field. join R as a … supplies a qualifier (needed for self-joins); select / extend emit fresh unqualified fields, so qualification is join-local. Natural join (merge common fields, equal types required) is opt-in (join R natural). — Refines #9: labels are unique qualified names, bare access must be unambiguous.

Aggregation

Grouping is partition + ordinary expressions; there is no "must-group" rule. The block form group by K { … } binds the per-group relation to the keyword group and desugars to group K into g |> extend … |> remove g. group is a relation; aggregates — written as function calls, count(group), mean(group.salary) — are the only relation→scalar bridge; group.f where a scalar is expected is a plain type error. — Recovers SQL's grouping rule from the type system; having is just where after the block.
Aggregates are commutative monoids; column extraction is a bag. group.f yields one value per row (a bag), not a dedup'd set, so sets + keys make aggregation correct with no multiplicity bookkeeping. Order- dependent reductions (first, in-order concat, lag/lead) are not aggregates — they live on the Seq layer. — A clean dividing line that also explains why windows are separate.
Empty / option / fan-out aggregation is explicit, never silent. Empty input → option (mean/min/max → None) or identity (sum/count → 0); aggregating an option column is a type error (resolve first); duplicate-sensitive aggregates over a fanned-out field warn. — Replaces SQL's silent NULLs, null-skipping, and double-counting with types plus one warning.

Manipulation (DML)

A database is a set of relvars; the only mutation primitive is relational assignment R := <expr>. Insert / delete / update are sugar over it, reusing the query operators and expressions; assignment requires a static heading match (no positional columns). Set semantics ⇒ insert/delete idempotent, duplicate-key insert rejected (no silent upsert). Adds the with operator (override an existing field) as the typed sibling of extend (add a new field). — One mechanism; query and mutation share a language.
Constraints are boolean relational expressions that must hold after every transaction. Keys and FKs are named, incrementally-checked special cases; tuple-level and database-level (multi-relvar) checks are unified and first-class. Referential actions (restrict default / cascade / set-null) are declared repairs applied within the transaction. — Cross-relation invariants SQL cannot express become ordinary, checkable booleans.
The transaction is the checking and atomicity boundary. A batch of assignments is applied together with all constraints checked once at commit; any violation rolls the batch back atomically; a bare assignment is a singleton transaction. — Makes FKs and cascades coherent (parent + child together) and gives one consistency boundary.

Recursion

Recursion is a restricted least fixpoint: monotone, range-restricted, with stratified negation. recursive R = base |> union (… R …), terminating by construction (semi-naive evaluation); the result is an ordinary relation; closure(...) is sugar. — First-class reachability / ancestry / BOM explosion without SQL's recursive-CTE pain or unrestricted-recursion non-termination. Recursion with aggregation (shortest paths) is deferred.

Ordered data & windows

The Seq layer carries all order-dependent computation. sort by produces Seq R; scans (Seq → Seq: running/moving/lag/lead/rank) and ordered folds (Seq → scalar: first/last/concat) live there, never in relational aggregates. The Rel/Seq type boundary enforces decision #14. — Order is a type, not a convention.
Windows reuse partitioning + sorting + extend, not a bespoke OVER. partition by P order by S extend {…} desugars to group P into g |> with g = (g |> sort by S |> extend <window-exprs>) |> ungroup g; frames are relative ranges (rows[..current], rows[-3..0]); lag/lead/rank are Seq built-ins; output is a Seq. — Windows fall out of pieces we already have.

Concrete syntax

= binds values, : ascribes types; a bare field name is punning; -- starts a comment. Records/tuples are { f = e }, headings and field decls are { f: T }, and x in a record / select / group block means x = x. — Removes the value/type brace ambiguity in the early sketches.
. is type-directed. e.f is field access on a record (→ scalar), column extraction on a relation (→ Col T, a per-row bag), or option-propagating on T? (→ T'?). A Col is legal only as an aggregate / per-row argument; used as a scalar it is a type error — which is the no-must-group rule. — One operator, three meanings, all from the operand's type.
Pipelines are bare relation expressions; aggregates are function calls; the group block binds group. No from keyword (a pipeline nests anywhere a relation is expected; from survives only in delete from / insert … from). Aggregates use call syntax (count(group)); the sugar group by K {…} binds the keyword group (group K into name for nesting). select {…} is sugar (extend + project), project / remove are pure π, and let names a relation expression (replacing CTEs). — Maximal composability, one call syntax.

Type system (draft)

Type vocabulary

Scalar domains: Int, Float, Bool, Text, Time, … and user-defined domains over them. No implicit coercion across domains.
Option: T? = Option T. Only outer joins introduce it.
Record (heading): { a: T1, b: T2, … } — unordered, uniquely-named, typed fields. Field order is irrelevant.
Relation: Rel R, a set of records over heading R, e.g. Rel {sensor: Id, at: Time, value: Float}.
Sequence: Seq R, an ordered list of records (may repeat). Produced by sort; the home of order-dependent operations (windows, top-n).
Nested relation: a field may itself be Rel R' (relation-valued attribute). This is how group works; it appears only as an intermediate.

Core operators (the IR)

Each is total and (mostly) Rel → Rel. Row variable ρ stands for "the other fields"; f ∉ ρ is the lacks predicate.

restrict  (p: {ρ} -> Bool)        : Rel {ρ}            -> Rel {ρ}
extend    (f = e: T)              : Rel {ρ}            -> Rel {ρ, f: T}      [f ∉ ρ]
with      (f = e: T')            : Rel {f: T | ρ}     -> Rel {f: T' | ρ}    [f ∈ ρ]
project   F                        : Rel {F | ρ}        -> Rel {F}
remove    F                        : Rel {F | ρ}        -> Rel {ρ}
rename    a -> b                   : Rel {a: T | ρ}     -> Rel {b: T | ρ}    [b ∉ ρ]
join                               : Rel R1 -> Rel R2   -> Rel (R1 ⋈ R2)
union/minus/intersect              : Rel R  -> Rel R    -> Rel R
group     K into g                 : Rel {K, A}         -> Rel {K, g: Rel {A}}
ungroup   g                        : Rel {K, g: Rel A}  -> Rel {K, A}
sort by e                          : Rel R              -> Seq R   (→ Seq layer)

Aggregates are ordinary function calls Rel A -> scalar (count, sum, mean, …), used inside extend after group; count(group) is set cardinality.

Join headings & disambiguation

The default join (explicit predicate, decision #12) forms the qualified union of the two headings: every field keeps the source relvar/alias it came from, and same-named fields from the two sides coexist as distinct qualified fields rather than being merged. A bare name is sugar for unambiguous access; when a name appears on both sides you must qualify it source.field:

employees |> join departments on dept == departments.id
-- both sides have `name`, so it stays qualified:
|> select { who = employees.name, team = departments.name }

Aliases: join R as a on … supplies the qualifier a — required for self-joins (join employees as mgr on manager_id == mgr.id).
Equi-join keys are kept as the two distinct fields they are (dept and departments.id above); a select or natural join collapses them if wanted.
select / extend clean up: the fields they emit are fresh and unqualified, so qualification is a transient, join-local concern.
Natural join is opt-in: join R natural merges the fields common to both (which must then have equal types) into single bare fields and joins on their equality — the Tutorial-D behavior, now explicit rather than the default.

This refines decision #9: a heading's labels are unique qualified names; bare access must resolve to exactly one of them, or it is a type error.

Keys and functional dependencies

Keys are tracked beyond base-table declarations and inferred through operators, so the compiler can reason about cardinality:

restrict: keys preserved.
project F: a declared key survives iff it ⊆ F; the full projected heading is always a trivial superkey (sets auto-dedup).
group K into g: K becomes the key of the result by construction.
join on fields J: if J is a key of one side, that side contributes at most one match (no fan-out) and the other side's key is preserved; if J keys neither side, the join may fan out and the result key is the union of both.

Payoff: the compiler knows when a join fans out, so accidental row multiplication feeding a sum/count becomes detectable rather than silently wrong. This is the safety net behind decisions #1–#2.

Resolved (decision #11): keys/FDs live in a metadata + inference layer, not the type proper — propagated for cardinality reasoning, the fan-out check, and optimizer hints, but not part of type equality.

Inference

Pipeline stages are fully inferred (Hindley–Milner over rows); base tables and domains are annotated. Query result types are inferred and checkable. Basis: Rémy/Wand-style records with absence + row unification (closer to that than to Leijen's scoped labels, since relations forbid duplicate labels).

Aggregation & grouping

Grouping = partition + ordinary expressions

group by K { … } is sugar. The primitive is group K into g (nest): it turns Rel {K, A} into Rel {K, g: Rel {A}} — one row per distinct K, with the matching rows collected into the relation-valued field g. The block form adds fields computed from g and drops g:

R |> group by k1, k2 { a = count(group), b = mean(group.salary) }

-- desugars to (the block binds the keyword `group`):
R |> group (k1, k2) into g
  |> extend a = count(g), b = mean(g.salary)
  |> remove g

group is in scope in the block; output it to keep the nested relation ({ emps = group, n = count(group) } is a department with its list of employees), or omit it and it's dropped.

Why there's no "must appear in GROUP BY" rule

After grouping, k1, k2 are scalar key fields and group is a relation. No special rule is needed — it falls out of types:

group.salary is a column: the salary values across the group's rows (a bag — see below), not a scalar.
An aggregate is the only thing that turns a column/relation into a scalar (count(group) : Int, mean(group.salary) : Float).
group.salary where a scalar is expected is a type error. That is SQL's "grouped or aggregated" rule, recovered for free.

Columns vs. projection — why aggregation is correct

group.f (column extraction) yields one value per row — a bag — while project {f} yields a set (dedup'd). They differ exactly on duplicates, and that is what makes aggregation correct without multiplicity bookkeeping: base tables have keys (#2), so a group's rows are distinct real rows, so group.salary has one entry per real row and sum(group.salary) is right. Decisions #1 + #2 buy footgun-free aggregation; the explicit-count hatch (#3) is only needed once you deliberately collapse to a keyless form.

An aggregate's argument is a per-row expression over the group: group.f binds the current row's field, so sum(group.qty * group.price) reduces qty*price row by row. Equivalent comprehension form: sum(r.qty * r.price for r in group).

The aggregate functions

Relational aggregates must be commutative monoids (sets are unordered ⇒ the reduction must be order-independent): count, sum, min, max, mean, and/or, count(unique c). Order-dependent reductions are not relational aggregates — they belong to the Seq layer. Empty-input behavior uses options, not a silent NULL:

aggregate	empty input	type
`count`	`0`	`Int`
`sum`	`0`	`Num`
`mean/min/max`	`None`	`T?`

group by-produced groups are non-empty by construction, so the metadata layer narrows mean/min/max back to T there. Aggregating an option column (Col T?) is a type error — resolve first (group.salary ?? 0, or filter) — which removes SQL's inconsistent null-skipping.

Whole-relation aggregation

aggregate { … } is group by () — one group, one row out:

sales |> aggregate { total = sum(group.amount), n = count(group) }

Nested / multi-level aggregation

g is a relation, so any relational expression — including another group by — applies to it. Aggregating subtables composes:

orders
|> group by region {
     revenue  = sum(group.amount),
     by_month = (group |> group by month { m_rev = sum(group.amount) }),  -- inner `group` shadows outer
   }

Fan-out detection (the safety net)

A field f from base table T (key KT) is determined by KT. After a many-to- many join the relation's key grows to KT ∪ KS, so f's values repeat across the extra dimension. Rule: a duplicate-sensitive aggregate (sum, mean, count) over a field whose determinant key is coarser than the relation's current key is flagged — its values are being double-counted; duplicate- insensitive aggregates (min, max) are exempt. This is the concrete cash-out of key-inference (#11): the double-count SQL does silently becomes a warning.

DML & constraints

Everything is relational assignment

A database is a set of relvars (relation variables); table T { … } declares one. A relvar holds a relation value; the sole mutation primitive is relational assignment, T := <relational expression>, where expr's heading must match T's exactly (statically checked — no positional column matching). The three DML verbs are sugar over assignment, reusing the same operators and expressions as queries:

insert into R from <expr>       -->  R := R union <expr>
insert into R { id = 7, … }     -->  R := R union (rel{ …one tuple… })
delete from R where p           -->  R := R |> where not p
update R where p set { f = e }  -->  R := (R |> where not p)
                                          union (R |> where p |> with { f = e })

with { f = e } overrides an existing field (the override sibling of extend, which only adds); using the wrong one is a typed error, catching the set-typos SQL silently accepts. Set semantics makes these well-behaved: re-inserting an identical tuple is a no-op (R ∪ {t}, t ∈ R); inserting a different tuple with an existing key violates the key constraint and is rejected (no silent upsert); deleting absent rows is a no-op. Insert/delete are idempotent.

Constraints are booleans that must hold

A constraint is a boolean relational expression over the database that must be true after every transaction. Keys and FKs are named, incrementally-checked special cases of that one idea:

Key (#2): key (id), key (a, b) — the projection onto the key is injective; multiple candidate keys allowed.
Foreign key: dept: Id references departments(id) — an inclusion constraint, R{dept} ⊆ departments{id}; a nullable FK (dept: Id?) admits None. Parent-side referential actions — restrict (default), cascade, set-null — are declared repairs applied within the transaction before the check.
Check: an arbitrary predicate. Tuple-level (check salary > 0) is the common case; database-level constraints spanning relvars are first-class — exactly what SQL cannot express:
```
check count(employees |> where role == "CEO") <= 1
check (employees{dept} ⊆ departments{id})        -- an FK, spelled out
```

Any assignment whose result would falsify a constraint is rejected; the relvar is left unchanged.

Transactions are the checking boundary

Because constraints span relvars, the unit of checking is the transaction: a batch of assignments applied together, all constraints checked once at commit, any violation rolling the whole batch back atomically. A bare assignment is a singleton transaction. This is what lets a parent and child be inserted together past an FK, and what makes cascades coherent.

transaction {
  insert into departments { id = 3, name = "Eng" }
  insert into employees   { id = 7, name = "Ann", dept = 3 }
}        -- both visible together; constraints checked once

Views

A view is a named relation expression, not stored: view active = employees |> where active. Read-only to start; making views updatable (mapping a view-assignment back onto base relvars — the view-update problem) is deferred.

What this buys

Mutation reuses the entire query language — no separate DML dialect (contrast SQL's bespoke UPDATE … SET). Three verbs collapse to one primitive, so they compose and can't interact surprisingly. Constraints unify to "invariants that hold after every transaction," with multi-relvar invariants first-class. Static heading-match kills positional-column bugs. The transaction is simultaneously the atomicity and the constraint boundary.

Recursion

Restricted to a least fixpoint of a monotone, range-restricted relational expression — enough for reachability, ancestry, and BOM explosion, and guaranteed to terminate. A recursive relation is defined with recursive:

-- edges: Rel {src: Id, dst: Id, key (src, dst)}
recursive reach =
  edges
  |> union (reach |> join edges on dst == edges.src
                  |> select { src, dst = edges.dst })

What keeps it total:

Monotone: the recursive name may not appear under minus/not — negation is stratified (it may only refer to a lower stratum), so each iteration only adds tuples.
Range-restricted: every output value comes from the base/input, none are invented (no unbounded arithmetic in the recursive arm). Monotone + range-restricted + finite input ⇒ the least fixpoint is reached in finitely many steps.

The result is an ordinary relation, so it composes with everything else (filter / join / group the closure). The common case has sugar — closure(edges: src -> dst) expands to the fixpoint above. Recursion with aggregation (shortest paths, PageRank-style iteration) needs well-founded aggregate semantics and is deferred.

Windows & the `Seq` layer

Principle

Relations are unordered (#4); ordering produces a Seq, and all order-dependent computation lives there — never in relational aggregates, which must be commutative monoids (#14). sort by e : Rel R -> Seq R is the gateway, and the Rel/Seq type boundary is what enforces #14: you cannot write a running total over a Rel, nor a duplicate-collapsing set op over a Seq, without crossing the boundary explicitly. A Seq supports two kinds of computation:

Scans (Seq R -> Seq R'): give every row a value from its position, neighbors, or a frame — running totals, moving averages, lag/lead, ranks. Row count preserved.
Ordered folds (Seq R -> scalar): reduce in order — first, last, concat(sep). (Order-independent reductions remain relational aggregates; these are the ones that genuinely need order.)

Plus slicers take n, drop n, take while p, which subsume top-N and pagination (sort by x desc |> take 10).

Window expressions (scans)

A scan adds a column via extend, where the expression sees ordered context:

Frame aggregate: agg(e) over <frame>, an aggregate over a range relative to the current row:
- rows[..current] — running (unbounded preceding → current)
- rows[current..] — current → end
- rows[-3..0] — 3 preceding → current (a width-4 moving window)
- rows[-1..1] — immediate neighbors
Offsets: lag(e, k, default), lead(e, k, default) — value k rows away.
Ranking: row_number(), rank(), dense_rank(), ntile(n) — read the sort order and ties.

Partitioning reuses `group`

PARTITION BY is just "window within each group independently," and we already have grouping. So:

trades
|> partition by symbol order by ts
   extend running = sum(amount) over rows[..current],
          prev    = lag(price, 1),
          r       = rank()

is sugar for group / sort-within / position-aware extend / ungroup:

trades
|> group symbol into g
|> with g = (g |> sort by ts
               |> extend running = sum(amount) over rows[..current],
                         prev    = lag(price, 1),
                         r       = rank())
|> ungroup g

The result is a Seq carrying within-partition order; cross-partition order is unspecified until a final sort by. Rows stay distinct, so it can be taken back to a Rel with as rel if you want a set.

Idioms

-- first row per partition
events
|> partition by user order by ts extend rn = row_number()
|> where rn == 1

-- 7-day moving average
… |> partition by sku order by day extend ma = mean(qty) over rows[-6..0]

-- in-order audit trail per case (ordered fold)
steps |> group by case { trail = (group |> sort by ts |> concat(name, " -> ")) }

Why this beats SQL windows

No bespoke OVER (PARTITION BY … ORDER BY … ROWS BETWEEN …) sub-language: partitioning is the group you already know, ordering is sort by, the window value is an extend.
The Rel/Seq boundary makes order-dependence visible in the type and enforces the aggregate/scan split (#14) instead of leaving it to convention.
Frames are terse, and ordered folds (first/concat) are just Seq functions, not special cases.

Concrete syntax (grammar)

A first concrete grammar — enough to parse the examples above and to pin the ambiguities. Notation: (…) grouping, * zero-or-more, ? optional, | choice; UPPERCASE = tokens; -- starts a line comment.

Module & declarations

module     ::= item*
item       ::= domainDecl | tableDecl | viewDecl | letDecl | recDecl
             | constraintDecl | stmt

domainDecl ::= "domain" IDENT "=" type ("check" "(" expr ")")?
tableDecl  ::= "table" IDENT "{" field ("," field)* ("," tableCon)* ","? "}"
field      ::= IDENT ":" type ("references" IDENT "(" IDENT ")" refAction*)?
refAction  ::= "on" ("delete" | "update") ("restrict" | "cascade" | "set" "null")
tableCon   ::= "key" "(" IDENT ("," IDENT)* ")" | "check" "(" expr ")"
viewDecl   ::= "view"      IDENT "=" relExpr
letDecl    ::= "let"       IDENT "=" relExpr     -- names a relation (replaces CTEs)
recDecl    ::= "recursive" IDENT "=" relExpr     -- monotone, range-restricted
constraintDecl ::= "constraint" IDENT "check" "(" expr ")"

Statements

stmt ::= IDENT ":=" relExpr                              -- relational assignment
       | "insert" "into" IDENT (record | "from" relExpr)
       | "delete" "from" IDENT "where" expr
       | "update" IDENT "where" expr "set" record
       | "transaction" "{" stmt* "}"
       | relExpr                                          -- a query to evaluate

Relation expressions (pipelines)

relExpr ::= relPrim ("|>" stage)*
relPrim ::= IDENT | "(" relExpr ")" | relLit
          | "closure" "(" IDENT ":" IDENT "->" IDENT ")"
relLit  ::= "rel" "{" record ("," record)* ","? "}"

stage ::= "where"   expr
        | "extend"  bindings   | "with" bindings
        | "select"  block      | "project" nameBlock | "remove" nameBlock
        | "rename"  IDENT "->" IDENT
        | "join"    relPrim ("as" IDENT)? ("on" expr | "natural")
        | ("union" | "minus" | "intersect") relPrim
        | "group" "by" nameList block?                   -- sugar: binds `group`
        | "group" nameList "into" IDENT                  -- primitive nest
        | "ungroup" IDENT
        | "aggregate" block                              -- = group by ()
        | "sort" "by" sortKey ("," sortKey)*
        | "take" ("while" expr | expr) | "drop" expr
        | "partition" "by" nameList "order" "by" sortKey ("," sortKey)*
          "extend" bindings                              -- window; yields Seq

sortKey  ::= expr ("asc" | "desc")?
bindings ::= binding | block
binding  ::= IDENT "=" expr
block    ::= "{" entry ("," entry)* ","? "}"   ;  entry ::= binding | IDENT   -- punning
nameBlock::= "{" nameList ","? "}"             ;  nameList ::= IDENT ("," IDENT)*
record   ::= "{" entry ("," entry)* ","? "}"                  -- value tuple

Types

type    ::= (IDENT | "Rel" heading | "Seq" heading) "?"?
heading ::= "{" IDENT ":" type ("," IDENT ":" type)* ","? "}"

Scalar expressions (low → high precedence)

expr  ::= or ("??" or)?                        -- option default
or    ::= and ("or"  and)*
and   ::= cmp ("and" cmp)*
cmp   ::= add (cmpOp add | "is" ("some" | "none"))?   -- non-associative
cmpOp ::= "==" | "!=" | "<" | "<=" | ">" | ">=" | "in" | "subset"
add   ::= mul (("+" | "-") mul)*
mul   ::= un  (("*" | "/" | "%") un)*
un    ::= ("-" | "not") un | cast
cast  ::= post ("as" type)?
post  ::= prim trailer*
trailer ::= "." IDENT | "(" callArgs? ")" | "over" frame
prim  ::= LITERAL | IDENT | "(" expr ")" | quant
quant ::= ("all" | "any") "(" expr "for" IDENT "in" relExpr ")"
frame ::= "rows" "[" bound? ".." bound? "]"    ;  bound ::= "current" | INT
callArgs ::= expr ("," expr)*                  -- function / aggregate args
           | expr "for" IDENT "in" relExpr     -- aggregate comprehension form
LITERAL ::= INT | FLOAT | TEXT | "true" | "false" | "none" | "some" "(" expr ")"

Numeric literals allow _ digit separators (50_000); the 50k-style shorthand in earlier prose was informal.

Semantic clarifications

These resolve the ambiguities the prose left open; they are the canonical reading and supersede any looser earlier example.

. is type-directed (#23). e.f is field access on a record (→ scalar), column extraction on a relation (→ Col T, a per-row bag), or option- propagating access on T? (→ T'?). A Col is only ever an aggregate argument; used as a scalar it is a type error — that is the no-must-group rule.
= vs : (#22). = binds a value ({ id = 7 }); : ascribes a type ({ id: Id }). A bare x in a block is punning for x = x.
Pipelines are expressions; no from (#24). A query is a relation expression, so R |> … nests anywhere a relation is expected (join operands, nested groups, let / view bodies). from appears only in delete from / insert … from.
group keyword; into for naming. group by K { … } binds the per-group relation to group; nest with the group K into name primitive to avoid shadowing.
Aggregates are calls. count(group), mean(group.salary), sum(group.qty * group.price); the arguments are per-row expressions over the group that the aggregate reduces. The comprehension form sum(e for r in R) is equivalent.
select vs project. project {a, b} is pure π (existing names only); select { a, x = e } builds an exact output record and desugars to extend (computed) + project (to the listed fields).
Frames. In over rows[a..b]: current ≡ 0, negative = preceding, positive = following, an empty side = unbounded — rows[..current] running, rows[-3..0] width-4 trailing, rows[-1..1] neighbors.
Option discipline. == / != on options are total Bool (structural: none == some x is false), and a non-option widens to an option implicitly (T <: T? — the only implicit conversion). Ordering, arithmetic, and value-use on a T? need resolution first (??, or narrowing via where x is some); there is no silent null propagation.
Qualified names (#25). A field carries its source relvar/alias. A bare name must resolve unambiguously; after a join, a name present on both sides is written source.field, and join R as a aliases the source (needed for self-joins). In row context, a leading identifier that names a source qualifier is qualified field access; otherwise . is the type-directed access of clarification 1.

Showcase

One schema exercised end-to-end in canonical syntax: schema + constraints, then a battery of queries, windows, recursion, and DML. Everything here is intended to parse under the grammar above and type-check under the rules above.

Schema

domain Id    = Int
domain Money = Int  check (value >= 0)        -- minor units; `value` is the domain value

-- FK direction is declarative; declaration order and cycles (departments.head ⇄
-- employees.dept) are fine — the schema is checked as a whole.

table departments {
  id:     Id,
  name:   Text,
  budget: Money,
  head:   Id? references employees(id),       -- nullable FK: a dept may have no head
  key (id),
  key (name),                                  -- a second candidate key
}

table employees {
  id:      Id,
  name:    Text,
  email:   Text,
  dept:    Id  references departments(id) on delete restrict,
  manager: Id? references employees(id),       -- self-ref, nullable (the CEO has none)
  salary:  Money,
  hired:   Time,
  key (id),
  key (email),
  check (salary > 0),
}

table projects {
  id:   Id,
  name: Text,
  dept: Id references departments(id),
  key (id),
}

table assignments {                            -- employees ⋈ projects (many-to-many)
  emp:     Id references employees(id) on delete cascade,
  project: Id references projects(id) on delete cascade,
  hours:   Int  check (hours >= 0),
  key (emp, project),
}

-- a database-level invariant SQL cannot state directly:
constraint one_ceo check (count(employees |> where manager == none) <= 1)

let over_budget =
  employees
  |> group by dept { payroll = sum(group.salary) }
  |> join departments on dept == departments.id
  |> where payroll > budget
constraint payroll_within_budget check (count(over_budget) == 0)

Queries

-- Q1  average salary per dept, only well-paid teams, sorted
employees
|> where salary > 50_000
|> group by dept { headcount = count(group), avg_pay = mean(group.salary) }
|> where avg_pay > 70_000                       -- no HAVING; just `where` after group
|> join departments on dept == departments.id
|> select { team = departments.name, headcount, avg_pay }
|> sort by avg_pay desc                          -- result is a Seq

-- Q2  each employee with their manager (self-join; a None manager matches no row)
employees
|> join employees as mgr on employees.manager == mgr.id
|> select { who = employees.name, boss = mgr.name }
-- use `left join` instead to keep the CEO, with boss = none

-- Q3  total hours and head-count per project
assignments
|> group by project { total_hours = sum(group.hours), people = count(group) }
|> join projects on project == projects.id
|> select { project = projects.name, total_hours, people }
|> sort by total_hours desc

-- Q4  fan-out: the compiler catches a double-count at compile time
employees
|> join assignments on id == assignments.emp     -- 1:many ⇒ each emp recurs
|> group by dept { payroll = sum(group.salary) }  -- ⚠ salary double-counted per assignment
-- correct (employee grain, no fan-out):
employees |> group by dept { payroll = sum(group.salary) }

-- Q5  top-3 earners per department (window ⇒ Seq)
employees
|> partition by dept order by salary desc extend rank = rank()
|> where rank <= 3
|> sort by dept, rank

-- Q6  cumulative payroll as people are hired (window scan)
employees
|> partition by dept order by hired
   extend cumulative_payroll = sum(salary) over rows[..current]
|> select { dept, name, hired, cumulative_payroll }

-- Q7  all transitive reports under employee #1 (recursion + qualified disambiguation)
let reports = employees |> where manager is some       -- narrows `manager` to Id
                        |> select { mgr = manager, emp = id }
recursive chain =
  reports
  |> union (chain |> join reports on chain.emp == reports.mgr
                  |> select { mgr = chain.mgr, emp = reports.emp })
chain
|> where mgr == 1
|> join employees on emp == employees.id
|> select { report = employees.name }

Mutation

-- 10% raise for one department
update employees where dept == 3 set { salary = salary * 11 / 10 }

-- hire + first assignment atomically (the assignment's FK to the new employee
-- holds because both land in one transaction, checked once at commit)
transaction {
  insert into employees {
    id = 42, name = "Bo", email = "bo@co", dept = 3,
    manager = 7, salary = 9_000_000, hired = date("2026-06-01")
  }
  insert into assignments { emp = 42, project = 5, hours = 0 }
}

-- retire a project; its assignments cascade away
delete from projects where id == 5

Callouts

Q2 employees.manager == mgr.id: manager: Id? against mgr.id: Id widens by implicit some (#26); equality is total, so a None manager equals no row and the CEO drops out of the inner join. manager must be qualified (both sides are employees).
Q4 the ⚠ is a compile-time warning from key-inference (#11, #15), not a runtime surprise: salary is determined by employees.id but the joined rows are keyed by (emp, project).
Q5 / Q6 output is a Seq (#20); rank() and sum(…) over are Seq-layer operations, unavailable on a Rel.
Q7 where manager is some narrows manager from Id? to Id (#26); the recursive arm needs qualifiers (chain.emp, reports.mgr) because both inputs carry mgr and emp (#25); union type-checks because both arms have heading {mgr, emp}.

Open questions

Name. RQL is a placeholder.
Relationship/FK layer (deferred). The additive sugar from decision #12: declared references, reverse navigation, path expressions, cardinality-typed many-sides. When to build it; named-relationship & ambiguity rules; path resolution.
Comprehension surface. Add a calculus surface over the same IR later, or never?
Value-based & peer frames. SQL RANGE / GROUPS (frames by order-key value or tie-peers) deferred; positional rows[…] frames first.
Recursion with aggregation (shortest paths, fixpoint + min/sum). Deferred — needs well-founded aggregate-in-recursion semantics.
Updatable views. The view-update problem — when can a view assignment map back to base relvars? Deferred.
Defaults & generators. Column defaults (constants / pure exprs) vs. impure generators (identity/autoincrement, now()), which must arrive as explicit host effects. Design TBD.
Upsert / merge sugar. A declared-intent form over assignment.
Concurrency / isolation. Likely out of scope (RQL is a language, not an engine), but the transaction boundary needs an isolation story if it ever runs concurrently.
Implementation substrate. Could ride kit's pipeline as a CG-API frontend (cf. RSN) or stand alone. Out of scope until the language is nailed.

	kit kit
	git clone https://git.ryansepassi.com/git/kit.git
	Log \| Files \| Refs \| README

kit

RQL — a typed relational query language (planned work)

What we're correcting

Locked decisions

Semantics

Missing values

Surface

Scope

Types

Joins

Aggregation

Manipulation (DML)

Recursion

Ordered data & windows

Concrete syntax

Type system (draft)

Type vocabulary

Core operators (the IR)

Join headings & disambiguation

Keys and functional dependencies

Inference

Aggregation & grouping

Grouping = partition + ordinary expressions

Why there's no "must appear in GROUP BY" rule

Columns vs. projection — why aggregation is correct

The aggregate functions

Whole-relation aggregation

Nested / multi-level aggregation

Fan-out detection (the safety net)

DML & constraints

Everything is relational assignment

Constraints are booleans that must hold

Transactions are the checking boundary

Views

What this buys

Recursion

Windows & the Seq layer

Principle

Window expressions (scans)

Partitioning reuses group

Idioms

Why this beats SQL windows

Concrete syntax (grammar)

Module & declarations

Statements

Relation expressions (pipelines)

Types

Scalar expressions (low → high precedence)

Semantic clarifications

Showcase

Schema

Queries

Mutation

Callouts

Open questions

Windows & the `Seq` layer

Partitioning reuses `group`