RQL — a typed relational query language (planned work)
A replacement for SQL: declarative data query and manipulation, but typed,
consistent, and close to the relational algebra. RQL is a working name (TBD).
SQL conflates three layers that should be separate — a surface syntax, a logical algebra, and a physical plan — and its surface does not map cleanly onto its own algebra. That mismatch is the source of most of SQL's irregularity. RQL keeps the layers apart: a small, total, well-typed core algebra (the IR) and a pipeline surface that maps onto it transparently. Same discipline as a compiler: surface → IR → plan, where the IR is where the laws live.
This doc is the living design and the ledger of decisions. It records what we decided and why; open questions sit at the end and graduate up into the ledger as we settle them.
What we're correcting
The specific SQL faults RQL is designed to remove:
- Clause order ≠ logical order.
SELECTis written first but evaluated nearly last, which breaks composition, top-to-bottom reading, and completion. - Three-valued logic.
NULLpoisons predicates;NOT INwith a null is a silent footgun; aggregates skip nulls inconsistently. GROUP BYconflates partitioning with aggregation, forcing the "every select item must be grouped or aggregated" rule and a separateHAVING.- Weak typing & implicit coercion; no sum types, no row polymorphism, errors surface at runtime.
- Bags pretending to be sets, with
DISTINCTbolted on, ordered/duplicate column names, andSELECT *. - Universal quantification is miserable — Codd's division ("supplies all parts") has no clean SQL spelling.
Locked decisions
Each entry is a decision plus a one-line rationale. Numbers are stable IDs in decision order; the grouping is thematic, so IDs need not run contiguously within a group.
Semantics
- A relation is a set of distinct records. A record (tuple) is an unordered
set of named, typed fields; a relation is a set of records over one heading.
Boolean-semiring sets, not bags. — Clean algebraic laws ⇒ sound optimizer
rewrites (predicate/projection pushdown, join reordering are real equalities);
countis meaningful (distinct), not "however many dupes were present." - Base tables must declare a key. Distinctness is a checked property, not a hope. — Makes "set" honest, and prevents aggregating over a lossy projection.
- Multiplicity is data, not a second semantics. When duplicates must be
counted, carry an explicit
count: Int(orweight) field and weight aggregates explicitly (sum(group.amount * group.count)). — Buys the bag / K-relation (provenance-semiring) expressiveness as a value, without two equalities, a truncatingminus, or weight-aware aggregate magic. Accepted cost: a synthetic id at ingest for genuinely keyless multiset data. Revisit if heavy event/log ingestion makes that friction too high. - Ordering is not part of a relation.
sort/take/windowproduce aSeq, a distinct type. — Relations are unordered; ordering is a presentation/boundary concern and shouldn't pollute relational laws.
Missing values
- No
NULL. Optionality lives in the type (T?=Option T), introduced only by outer joins. Predicates are total, 2-valued booleans; defaults via??. — Removes three-valued logic and theNOT IN/ aggregate-skipping footguns; "missing" becomes explicit and local. - Option semantics: total equality,
some-widening,where-narrowing.==/!=onT?are total Bool (structural:none == some xisfalse); a non-option widens to an option implicitly (T <: T?, the only implicit conversion);where x is some(orx != none) narrowsxtoTdownstream. Ordering, arithmetic, and value-use onT?need explicit resolution. — Keeps option handling total and ergonomic without reintroducing 3-valued logic (refines #5).
Surface
- Pipeline / algebra surface:
R |> op |> op …, one stage per core operator, written in evaluation order. A pipeline is just a relation expression (there is nofromkeyword), so it nests anywhere a relation is expected. — Composition, top-to-bottom reading, and a 1:1 correspondence with the IR. A comprehension (calculus) surface over the same IR is deferred, not rejected. - Grouping only partitions; aggregation is ordinary expressions over the
relation-valued group field. No
HAVING(it iswhereaftergroup); no "must appear in GROUP BY" rule (referencing a non-key scalar after grouping is a type error). — One uniform mechanism; the awkward SQL special cases vanish.
Scope
- Query + DML + constraints, unified by relational assignment. A database is
a set of relation variables; insert / update / delete are all
R := <relational expression>; keys, foreign keys, and arbitrary predicates are declared constraints checked on write. — One mechanism (Third Manifesto / Tutorial D style); set semantics makes assignment idempotent and total.
Types
(First draft — see "Type system" below. The shape is settled; specific inference rules and the keys-in-type ambition dial are open.)
- Row-polymorphic structural records. Headings are structural record types;
extend/project/rename/joinare polymorphic over "the other fields" via a row variable, with a field-absence ("lacks") predicate. Duplicate labels are forbidden (a heading can't name a field twice). - Named scalar domains, no implicit cross-domain coercion.
Moneyis notFloat; conversions are explicit. — Supports "typed and consistent." - Keys/FDs live in a metadata + inference layer, not the type proper. Keys are declared on base tables and inferred through operators alongside the type; they power the fan-out-must-aggregate check, optimizer hints, and warnings, but do not participate in type equality. — Pragmatic sweet spot; leaves a clean path to full type-level keys if signatures ever need them.
Joins
- Explicit-predicate joins + key-inference are the foundation; the
relationship/FK layer is deferred, additive sugar.
join R on <predicate>is the primitive — it covers equi, theta, range, and self joins. When the condition equates a key of one side, key-inference (decision #11) bounds the join to ≤1 match (no fan-out) and feeds the must-aggregate warning; a many-to- many join is flagged. A relationship layer — declared references, reverse navigation, path expressions (order.customer.region.name), and cardinality-typed "many" sides that force aggregation — rides on top later. It is purely additive and breaks nothing. — Ships a typed join with a cardinality story over arbitrary / un-annotated data first, and defers the schema-declaration machinery (named relationships, reverse naming, path resolution) until the core is proven. - Join result disambiguation: source-qualified names. Every field carries
the source relvar/alias it came from; a bare name is sugar for unambiguous
access. A
joinforms the qualified union of both sides (no auto-merge): same bare names coexist as distinct qualified fields and must be referencedsource.field.join R as a …supplies a qualifier (needed for self-joins);select/extendemit fresh unqualified fields, so qualification is join-local. Natural join (merge common fields, equal types required) is opt-in (join R natural). — Refines #9: labels are unique qualified names, bare access must be unambiguous.
Aggregation
- Grouping is partition + ordinary expressions; there is no "must-group"
rule. The block form
group by K { … }binds the per-group relation to the keywordgroupand desugars togroup K into g |> extend … |> remove g.groupis a relation; aggregates — written as function calls,count(group),mean(group.salary)— are the only relation→scalar bridge;group.fwhere a scalar is expected is a plain type error. — Recovers SQL's grouping rule from the type system;havingis justwhereafter the block. - Aggregates are commutative monoids; column extraction is a bag.
group.fyields one value per row (a bag), not a dedup'd set, so sets + keys make aggregation correct with no multiplicity bookkeeping. Order- dependent reductions (first, in-order concat, lag/lead) are not aggregates — they live on theSeqlayer. — A clean dividing line that also explains why windows are separate. - Empty / option / fan-out aggregation is explicit, never silent. Empty
input → option (
mean/min/max→None) or identity (sum/count→0); aggregating an option column is a type error (resolve first); duplicate-sensitive aggregates over a fanned-out field warn. — Replaces SQL's silent NULLs, null-skipping, and double-counting with types plus one warning.
Manipulation (DML)
- A database is a set of relvars; the only mutation primitive is relational
assignment
R := <expr>. Insert / delete / update are sugar over it, reusing the query operators and expressions; assignment requires a static heading match (no positional columns). Set semantics ⇒ insert/delete idempotent, duplicate-key insert rejected (no silent upsert). Adds thewithoperator (override an existing field) as the typed sibling ofextend(add a new field). — One mechanism; query and mutation share a language. - Constraints are boolean relational expressions that must hold after every transaction. Keys and FKs are named, incrementally-checked special cases; tuple-level and database-level (multi-relvar) checks are unified and first-class. Referential actions (restrict default / cascade / set-null) are declared repairs applied within the transaction. — Cross-relation invariants SQL cannot express become ordinary, checkable booleans.
- The transaction is the checking and atomicity boundary. A batch of assignments is applied together with all constraints checked once at commit; any violation rolls the batch back atomically; a bare assignment is a singleton transaction. — Makes FKs and cascades coherent (parent + child together) and gives one consistency boundary.
Recursion
- Recursion is a restricted least fixpoint: monotone, range-restricted, with
stratified negation.
recursive R = base |> union (… R …), terminating by construction (semi-naive evaluation); the result is an ordinary relation;closure(...)is sugar. — First-class reachability / ancestry / BOM explosion without SQL's recursive-CTE pain or unrestricted-recursion non-termination. Recursion with aggregation (shortest paths) is deferred.
Ordered data & windows
- The
Seqlayer carries all order-dependent computation.sort byproducesSeq R; scans (Seq → Seq: running/moving/lag/lead/rank) and ordered folds (Seq → scalar:first/last/concat) live there, never in relational aggregates. TheRel/Seqtype boundary enforces decision #14. — Order is a type, not a convention. - Windows reuse partitioning + sorting +
extend, not a bespokeOVER.partition by P order by S extend {…}desugars togroup P into g |> with g = (g |> sort by S |> extend <window-exprs>) |> ungroup g; frames are relative ranges (rows[..current],rows[-3..0]);lag/lead/rankareSeqbuilt-ins; output is aSeq. — Windows fall out of pieces we already have.
Concrete syntax
=binds values,:ascribes types; a bare field name is punning;--starts a comment. Records/tuples are{ f = e }, headings and field decls are{ f: T }, andxin a record / select / group block meansx = x. — Removes the value/type brace ambiguity in the early sketches..is type-directed.e.fis field access on a record (→ scalar), column extraction on a relation (→Col T, a per-row bag), or option-propagating onT?(→T'?). AColis legal only as an aggregate / per-row argument; used as a scalar it is a type error — which is the no-must-group rule. — One operator, three meanings, all from the operand's type.- Pipelines are bare relation expressions; aggregates are function calls; the
group block binds
group. Nofromkeyword (a pipeline nests anywhere a relation is expected;fromsurvives only indelete from/insert … from). Aggregates use call syntax (count(group)); the sugargroup by K {…}binds the keywordgroup(group K into namefor nesting).select {…}is sugar (extend + project),project/removeare pure π, andletnames a relation expression (replacing CTEs). — Maximal composability, one call syntax.
Type system (draft)
Type vocabulary
- Scalar domains:
Int,Float,Bool,Text,Time, … and user-defined domains over them. No implicit coercion across domains. - Option:
T?=Option T. Only outer joins introduce it. - Record (heading):
{ a: T1, b: T2, … }— unordered, uniquely-named, typed fields. Field order is irrelevant. - Relation:
Rel R, a set of records over headingR, e.g.Rel {sensor: Id, at: Time, value: Float}. - Sequence:
Seq R, an ordered list of records (may repeat). Produced bysort; the home of order-dependent operations (windows, top-n). - Nested relation: a field may itself be
Rel R'(relation-valued attribute). This is howgroupworks; it appears only as an intermediate.
Core operators (the IR)
Each is total and (mostly) Rel → Rel. Row variable ρ stands for "the other
fields"; f ∉ ρ is the lacks predicate.
restrict (p: {ρ} -> Bool) : Rel {ρ} -> Rel {ρ}
extend (f = e: T) : Rel {ρ} -> Rel {ρ, f: T} [f ∉ ρ]
with (f = e: T') : Rel {f: T | ρ} -> Rel {f: T' | ρ} [f ∈ ρ]
project F : Rel {F | ρ} -> Rel {F}
remove F : Rel {F | ρ} -> Rel {ρ}
rename a -> b : Rel {a: T | ρ} -> Rel {b: T | ρ} [b ∉ ρ]
join : Rel R1 -> Rel R2 -> Rel (R1 ⋈ R2)
union/minus/intersect : Rel R -> Rel R -> Rel R
group K into g : Rel {K, A} -> Rel {K, g: Rel {A}}
ungroup g : Rel {K, g: Rel A} -> Rel {K, A}
sort by e : Rel R -> Seq R (→ Seq layer)
Aggregates are ordinary function calls Rel A -> scalar (count, sum, mean,
…), used inside extend after group; count(group) is set cardinality.
Join headings & disambiguation
The default join (explicit predicate, decision #12) forms the qualified
union of the two headings: every field keeps the source relvar/alias it came
from, and same-named fields from the two sides coexist as distinct qualified
fields rather than being merged. A bare name is sugar for unambiguous access;
when a name appears on both sides you must qualify it source.field:
employees |> join departments on dept == departments.id
-- both sides have `name`, so it stays qualified:
|> select { who = employees.name, team = departments.name }
- Aliases:
join R as a on …supplies the qualifiera— required for self-joins (join employees as mgr on manager_id == mgr.id). - Equi-join keys are kept as the two distinct fields they are (
deptanddepartments.idabove); aselector natural join collapses them if wanted. select/extendclean up: the fields they emit are fresh and unqualified, so qualification is a transient, join-local concern.- Natural join is opt-in:
join R naturalmerges the fields common to both (which must then have equal types) into single bare fields and joins on their equality — the Tutorial-D behavior, now explicit rather than the default.
This refines decision #9: a heading's labels are unique qualified names; bare access must resolve to exactly one of them, or it is a type error.
Keys and functional dependencies
Keys are tracked beyond base-table declarations and inferred through operators, so the compiler can reason about cardinality:
restrict: keys preserved.project F: a declared key survives iff it ⊆ F; the full projected heading is always a trivial superkey (sets auto-dedup).group K into g:Kbecomes the key of the result by construction.joinon fieldsJ: ifJis a key of one side, that side contributes at most one match (no fan-out) and the other side's key is preserved; ifJkeys neither side, the join may fan out and the result key is the union of both.
Payoff: the compiler knows when a join fans out, so accidental row
multiplication feeding a sum/count becomes detectable rather than silently
wrong. This is the safety net behind decisions #1–#2.
Resolved (decision #11): keys/FDs live in a metadata + inference layer, not the type proper — propagated for cardinality reasoning, the fan-out check, and optimizer hints, but not part of type equality.
Inference
Pipeline stages are fully inferred (Hindley–Milner over rows); base tables and domains are annotated. Query result types are inferred and checkable. Basis: Rémy/Wand-style records with absence + row unification (closer to that than to Leijen's scoped labels, since relations forbid duplicate labels).
Aggregation & grouping
Grouping = partition + ordinary expressions
group by K { … } is sugar. The primitive is group K into g (nest): it turns
Rel {K, A} into Rel {K, g: Rel {A}} — one row per distinct K, with the
matching rows collected into the relation-valued field g. The block form adds
fields computed from g and drops g:
R |> group by k1, k2 { a = count(group), b = mean(group.salary) }
-- desugars to (the block binds the keyword `group`):
R |> group (k1, k2) into g
|> extend a = count(g), b = mean(g.salary)
|> remove g
group is in scope in the block; output it to keep the nested relation
({ emps = group, n = count(group) } is a department with its list of
employees), or omit it and it's dropped.
Why there's no "must appear in GROUP BY" rule
After grouping, k1, k2 are scalar key fields and group is a relation. No
special rule is needed — it falls out of types:
group.salaryis a column: thesalaryvalues across the group's rows (a bag — see below), not a scalar.- An aggregate is the only thing that turns a column/relation into a scalar
(
count(group) : Int,mean(group.salary) : Float). group.salarywhere a scalar is expected is a type error. That is SQL's "grouped or aggregated" rule, recovered for free.
Columns vs. projection — why aggregation is correct
group.f (column extraction) yields one value per row — a bag — while
project {f} yields a set (dedup'd). They differ exactly on duplicates, and
that is what makes aggregation correct without multiplicity bookkeeping: base
tables have keys (#2), so a group's rows are distinct real rows, so
group.salary has one entry per real row and sum(group.salary) is right.
Decisions #1 + #2 buy footgun-free aggregation; the explicit-count hatch (#3) is
only needed once you deliberately collapse to a keyless form.
An aggregate's argument is a per-row expression over the group: group.f
binds the current row's field, so sum(group.qty * group.price) reduces
qty*price row by row. Equivalent comprehension form:
sum(r.qty * r.price for r in group).
The aggregate functions
Relational aggregates must be commutative monoids (sets are unordered ⇒ the
reduction must be order-independent): count, sum, min, max, mean,
and/or, count(unique c). Order-dependent reductions are not relational
aggregates — they belong to the Seq layer. Empty-input behavior uses options,
not a silent NULL:
| aggregate | empty input | type |
|---|---|---|
count |
0 |
Int |
sum |
0 |
Num |
mean/min/max |
None |
T? |
group by-produced groups are non-empty by construction, so the metadata layer
narrows mean/min/max back to T there. Aggregating an option column
(Col T?) is a type error — resolve first (group.salary ?? 0, or filter) —
which removes SQL's inconsistent null-skipping.
Whole-relation aggregation
aggregate { … } is group by () — one group, one row out:
sales |> aggregate { total = sum(group.amount), n = count(group) }
Nested / multi-level aggregation
g is a relation, so any relational expression — including another group by —
applies to it. Aggregating subtables composes:
orders
|> group by region {
revenue = sum(group.amount),
by_month = (group |> group by month { m_rev = sum(group.amount) }), -- inner `group` shadows outer
}
Fan-out detection (the safety net)
A field f from base table T (key KT) is determined by KT. After a many-to-
many join the relation's key grows to KT ∪ KS, so f's values repeat across the
extra dimension. Rule: a duplicate-sensitive aggregate (sum, mean,
count) over a field whose determinant key is coarser than the relation's
current key is flagged — its values are being double-counted; duplicate-
insensitive aggregates (min, max) are exempt. This is the concrete cash-out
of key-inference (#11): the double-count SQL does silently becomes a warning.
DML & constraints
Everything is relational assignment
A database is a set of relvars (relation variables); table T { … } declares
one. A relvar holds a relation value; the sole mutation primitive is
relational assignment, T := <relational expression>, where expr's heading
must match T's exactly (statically checked — no positional column matching). The
three DML verbs are sugar over assignment, reusing the same operators and
expressions as queries:
insert into R from <expr> --> R := R union <expr>
insert into R { id = 7, … } --> R := R union (rel{ …one tuple… })
delete from R where p --> R := R |> where not p
update R where p set { f = e } --> R := (R |> where not p)
union (R |> where p |> with { f = e })
with { f = e } overrides an existing field (the override sibling of extend,
which only adds); using the wrong one is a typed error, catching the set-typos
SQL silently accepts. Set semantics makes these well-behaved: re-inserting an
identical tuple is a no-op (R ∪ {t}, t ∈ R); inserting a different tuple
with an existing key violates the key constraint and is rejected (no silent
upsert); deleting absent rows is a no-op. Insert/delete are idempotent.
Constraints are booleans that must hold
A constraint is a boolean relational expression over the database that must be true after every transaction. Keys and FKs are named, incrementally-checked special cases of that one idea:
Key (#2):
key (id),key (a, b)— the projection onto the key is injective; multiple candidate keys allowed.Foreign key:
dept: Id references departments(id)— an inclusion constraint,R{dept} ⊆ departments{id}; a nullable FK (dept: Id?) admitsNone. Parent-side referential actions — restrict (default), cascade, set-null — are declared repairs applied within the transaction before the check.Check: an arbitrary predicate. Tuple-level (
check salary > 0) is the common case; database-level constraints spanning relvars are first-class — exactly what SQL cannot express:check count(employees |> where role == "CEO") <= 1 check (employees{dept} ⊆ departments{id}) -- an FK, spelled out
Any assignment whose result would falsify a constraint is rejected; the relvar is left unchanged.
Transactions are the checking boundary
Because constraints span relvars, the unit of checking is the transaction: a batch of assignments applied together, all constraints checked once at commit, any violation rolling the whole batch back atomically. A bare assignment is a singleton transaction. This is what lets a parent and child be inserted together past an FK, and what makes cascades coherent.
transaction {
insert into departments { id = 3, name = "Eng" }
insert into employees { id = 7, name = "Ann", dept = 3 }
} -- both visible together; constraints checked once
Views
A view is a named relation expression, not stored: view active = employees |> where active. Read-only to start; making views updatable (mapping a
view-assignment back onto base relvars — the view-update problem) is deferred.
What this buys
Mutation reuses the entire query language — no separate DML dialect (contrast
SQL's bespoke UPDATE … SET). Three verbs collapse to one primitive, so they
compose and can't interact surprisingly. Constraints unify to "invariants that
hold after every transaction," with multi-relvar invariants first-class. Static
heading-match kills positional-column bugs. The transaction is simultaneously the
atomicity and the constraint boundary.
Recursion
Restricted to a least fixpoint of a monotone, range-restricted relational
expression — enough for reachability, ancestry, and BOM explosion, and
guaranteed to terminate. A recursive relation is defined with recursive:
-- edges: Rel {src: Id, dst: Id, key (src, dst)}
recursive reach =
edges
|> union (reach |> join edges on dst == edges.src
|> select { src, dst = edges.dst })
What keeps it total:
- Monotone: the recursive name may not appear under
minus/not— negation is stratified (it may only refer to a lower stratum), so each iteration only adds tuples. - Range-restricted: every output value comes from the base/input, none are invented (no unbounded arithmetic in the recursive arm). Monotone + range-restricted + finite input ⇒ the least fixpoint is reached in finitely many steps.
The result is an ordinary relation, so it composes with everything else (filter /
join / group the closure). The common case has sugar — closure(edges: src -> dst) expands to the fixpoint above. Recursion with aggregation (shortest paths,
PageRank-style iteration) needs well-founded aggregate semantics and is deferred.
Windows & the Seq layer
Principle
Relations are unordered (#4); ordering produces a Seq, and all
order-dependent computation lives there — never in relational aggregates, which
must be commutative monoids (#14). sort by e : Rel R -> Seq R is the gateway,
and the Rel/Seq type boundary is what enforces #14: you cannot write a running
total over a Rel, nor a duplicate-collapsing set op over a Seq, without
crossing the boundary explicitly. A Seq supports two kinds of computation:
- Scans (
Seq R -> Seq R'): give every row a value from its position, neighbors, or a frame — running totals, moving averages,lag/lead, ranks. Row count preserved. - Ordered folds (
Seq R -> scalar): reduce in order —first,last,concat(sep). (Order-independent reductions remain relational aggregates; these are the ones that genuinely need order.)
Plus slicers take n, drop n, take while p, which subsume top-N and
pagination (sort by x desc |> take 10).
Window expressions (scans)
A scan adds a column via extend, where the expression sees ordered context:
- Frame aggregate:
agg(e) over <frame>, an aggregate over a range relative to the current row:rows[..current]— running (unbounded preceding → current)rows[current..]— current → endrows[-3..0]— 3 preceding → current (a width-4 moving window)rows[-1..1]— immediate neighbors
- Offsets:
lag(e, k, default),lead(e, k, default)— value k rows away. - Ranking:
row_number(),rank(),dense_rank(),ntile(n)— read the sort order and ties.
Partitioning reuses group
PARTITION BY is just "window within each group independently," and we already
have grouping. So:
trades
|> partition by symbol order by ts
extend running = sum(amount) over rows[..current],
prev = lag(price, 1),
r = rank()
is sugar for group / sort-within / position-aware extend / ungroup:
trades
|> group symbol into g
|> with g = (g |> sort by ts
|> extend running = sum(amount) over rows[..current],
prev = lag(price, 1),
r = rank())
|> ungroup g
The result is a Seq carrying within-partition order; cross-partition order
is unspecified until a final sort by. Rows stay distinct, so it can be taken
back to a Rel with as rel if you want a set.
Idioms
-- first row per partition
events
|> partition by user order by ts extend rn = row_number()
|> where rn == 1
-- 7-day moving average
… |> partition by sku order by day extend ma = mean(qty) over rows[-6..0]
-- in-order audit trail per case (ordered fold)
steps |> group by case { trail = (group |> sort by ts |> concat(name, " -> ")) }
Why this beats SQL windows
- No bespoke
OVER (PARTITION BY … ORDER BY … ROWS BETWEEN …)sub-language: partitioning is thegroupyou already know, ordering issort by, the window value is anextend. - The
Rel/Seqboundary makes order-dependence visible in the type and enforces the aggregate/scan split (#14) instead of leaving it to convention. - Frames are terse, and ordered folds (
first/concat) are justSeqfunctions, not special cases.
Concrete syntax (grammar)
A first concrete grammar — enough to parse the examples above and to pin the
ambiguities. Notation: (…) grouping, * zero-or-more, ? optional, | choice;
UPPERCASE = tokens; -- starts a line comment.
Module & declarations
module ::= item*
item ::= domainDecl | tableDecl | viewDecl | letDecl | recDecl
| constraintDecl | stmt
domainDecl ::= "domain" IDENT "=" type ("check" "(" expr ")")?
tableDecl ::= "table" IDENT "{" field ("," field)* ("," tableCon)* ","? "}"
field ::= IDENT ":" type ("references" IDENT "(" IDENT ")" refAction*)?
refAction ::= "on" ("delete" | "update") ("restrict" | "cascade" | "set" "null")
tableCon ::= "key" "(" IDENT ("," IDENT)* ")" | "check" "(" expr ")"
viewDecl ::= "view" IDENT "=" relExpr
letDecl ::= "let" IDENT "=" relExpr -- names a relation (replaces CTEs)
recDecl ::= "recursive" IDENT "=" relExpr -- monotone, range-restricted
constraintDecl ::= "constraint" IDENT "check" "(" expr ")"
Statements
stmt ::= IDENT ":=" relExpr -- relational assignment
| "insert" "into" IDENT (record | "from" relExpr)
| "delete" "from" IDENT "where" expr
| "update" IDENT "where" expr "set" record
| "transaction" "{" stmt* "}"
| relExpr -- a query to evaluate
Relation expressions (pipelines)
relExpr ::= relPrim ("|>" stage)*
relPrim ::= IDENT | "(" relExpr ")" | relLit
| "closure" "(" IDENT ":" IDENT "->" IDENT ")"
relLit ::= "rel" "{" record ("," record)* ","? "}"
stage ::= "where" expr
| "extend" bindings | "with" bindings
| "select" block | "project" nameBlock | "remove" nameBlock
| "rename" IDENT "->" IDENT
| "join" relPrim ("as" IDENT)? ("on" expr | "natural")
| ("union" | "minus" | "intersect") relPrim
| "group" "by" nameList block? -- sugar: binds `group`
| "group" nameList "into" IDENT -- primitive nest
| "ungroup" IDENT
| "aggregate" block -- = group by ()
| "sort" "by" sortKey ("," sortKey)*
| "take" ("while" expr | expr) | "drop" expr
| "partition" "by" nameList "order" "by" sortKey ("," sortKey)*
"extend" bindings -- window; yields Seq
sortKey ::= expr ("asc" | "desc")?
bindings ::= binding | block
binding ::= IDENT "=" expr
block ::= "{" entry ("," entry)* ","? "}" ; entry ::= binding | IDENT -- punning
nameBlock::= "{" nameList ","? "}" ; nameList ::= IDENT ("," IDENT)*
record ::= "{" entry ("," entry)* ","? "}" -- value tuple
Types
type ::= (IDENT | "Rel" heading | "Seq" heading) "?"?
heading ::= "{" IDENT ":" type ("," IDENT ":" type)* ","? "}"
Scalar expressions (low → high precedence)
expr ::= or ("??" or)? -- option default
or ::= and ("or" and)*
and ::= cmp ("and" cmp)*
cmp ::= add (cmpOp add | "is" ("some" | "none"))? -- non-associative
cmpOp ::= "==" | "!=" | "<" | "<=" | ">" | ">=" | "in" | "subset"
add ::= mul (("+" | "-") mul)*
mul ::= un (("*" | "/" | "%") un)*
un ::= ("-" | "not") un | cast
cast ::= post ("as" type)?
post ::= prim trailer*
trailer ::= "." IDENT | "(" callArgs? ")" | "over" frame
prim ::= LITERAL | IDENT | "(" expr ")" | quant
quant ::= ("all" | "any") "(" expr "for" IDENT "in" relExpr ")"
frame ::= "rows" "[" bound? ".." bound? "]" ; bound ::= "current" | INT
callArgs ::= expr ("," expr)* -- function / aggregate args
| expr "for" IDENT "in" relExpr -- aggregate comprehension form
LITERAL ::= INT | FLOAT | TEXT | "true" | "false" | "none" | "some" "(" expr ")"
Numeric literals allow _ digit separators (50_000); the 50k-style shorthand
in earlier prose was informal.
Semantic clarifications
These resolve the ambiguities the prose left open; they are the canonical reading and supersede any looser earlier example.
.is type-directed (#23).e.fis field access on a record (→ scalar), column extraction on a relation (→Col T, a per-row bag), or option- propagating access onT?(→T'?). AColis only ever an aggregate argument; used as a scalar it is a type error — that is the no-must-group rule.=vs:(#22).=binds a value ({ id = 7 });:ascribes a type ({ id: Id }). A barexin a block is punning forx = x.- Pipelines are expressions; no
from(#24). A query is a relation expression, soR |> …nests anywhere a relation is expected (join operands, nested groups,let/viewbodies).fromappears only indelete from/insert … from. groupkeyword;intofor naming.group by K { … }binds the per-group relation togroup; nest with thegroup K into nameprimitive to avoid shadowing.- Aggregates are calls.
count(group),mean(group.salary),sum(group.qty * group.price); the arguments are per-row expressions over the group that the aggregate reduces. The comprehension formsum(e for r in R)is equivalent. selectvsproject.project {a, b}is pure π (existing names only);select { a, x = e }builds an exact output record and desugars toextend(computed) +project(to the listed fields).- Frames. In
over rows[a..b]:current≡ 0, negative = preceding, positive = following, an empty side = unbounded —rows[..current]running,rows[-3..0]width-4 trailing,rows[-1..1]neighbors. - Option discipline.
==/!=on options are total Bool (structural:none == some xisfalse), and a non-option widens to an option implicitly (T <: T?— the only implicit conversion). Ordering, arithmetic, and value-use on aT?need resolution first (??, or narrowing viawhere x is some); there is no silent null propagation. - Qualified names (#25). A field carries its source relvar/alias. A bare name
must resolve unambiguously; after a
join, a name present on both sides is writtensource.field, andjoin R as aaliases the source (needed for self-joins). In row context, a leading identifier that names a source qualifier is qualified field access; otherwise.is the type-directed access of clarification 1.
Showcase
One schema exercised end-to-end in canonical syntax: schema + constraints, then a battery of queries, windows, recursion, and DML. Everything here is intended to parse under the grammar above and type-check under the rules above.
Schema
domain Id = Int
domain Money = Int check (value >= 0) -- minor units; `value` is the domain value
-- FK direction is declarative; declaration order and cycles (departments.head ⇄
-- employees.dept) are fine — the schema is checked as a whole.
table departments {
id: Id,
name: Text,
budget: Money,
head: Id? references employees(id), -- nullable FK: a dept may have no head
key (id),
key (name), -- a second candidate key
}
table employees {
id: Id,
name: Text,
email: Text,
dept: Id references departments(id) on delete restrict,
manager: Id? references employees(id), -- self-ref, nullable (the CEO has none)
salary: Money,
hired: Time,
key (id),
key (email),
check (salary > 0),
}
table projects {
id: Id,
name: Text,
dept: Id references departments(id),
key (id),
}
table assignments { -- employees ⋈ projects (many-to-many)
emp: Id references employees(id) on delete cascade,
project: Id references projects(id) on delete cascade,
hours: Int check (hours >= 0),
key (emp, project),
}
-- a database-level invariant SQL cannot state directly:
constraint one_ceo check (count(employees |> where manager == none) <= 1)
let over_budget =
employees
|> group by dept { payroll = sum(group.salary) }
|> join departments on dept == departments.id
|> where payroll > budget
constraint payroll_within_budget check (count(over_budget) == 0)
Queries
-- Q1 average salary per dept, only well-paid teams, sorted
employees
|> where salary > 50_000
|> group by dept { headcount = count(group), avg_pay = mean(group.salary) }
|> where avg_pay > 70_000 -- no HAVING; just `where` after group
|> join departments on dept == departments.id
|> select { team = departments.name, headcount, avg_pay }
|> sort by avg_pay desc -- result is a Seq
-- Q2 each employee with their manager (self-join; a None manager matches no row)
employees
|> join employees as mgr on employees.manager == mgr.id
|> select { who = employees.name, boss = mgr.name }
-- use `left join` instead to keep the CEO, with boss = none
-- Q3 total hours and head-count per project
assignments
|> group by project { total_hours = sum(group.hours), people = count(group) }
|> join projects on project == projects.id
|> select { project = projects.name, total_hours, people }
|> sort by total_hours desc
-- Q4 fan-out: the compiler catches a double-count at compile time
employees
|> join assignments on id == assignments.emp -- 1:many ⇒ each emp recurs
|> group by dept { payroll = sum(group.salary) } -- ⚠ salary double-counted per assignment
-- correct (employee grain, no fan-out):
employees |> group by dept { payroll = sum(group.salary) }
-- Q5 top-3 earners per department (window ⇒ Seq)
employees
|> partition by dept order by salary desc extend rank = rank()
|> where rank <= 3
|> sort by dept, rank
-- Q6 cumulative payroll as people are hired (window scan)
employees
|> partition by dept order by hired
extend cumulative_payroll = sum(salary) over rows[..current]
|> select { dept, name, hired, cumulative_payroll }
-- Q7 all transitive reports under employee #1 (recursion + qualified disambiguation)
let reports = employees |> where manager is some -- narrows `manager` to Id
|> select { mgr = manager, emp = id }
recursive chain =
reports
|> union (chain |> join reports on chain.emp == reports.mgr
|> select { mgr = chain.mgr, emp = reports.emp })
chain
|> where mgr == 1
|> join employees on emp == employees.id
|> select { report = employees.name }
Mutation
-- 10% raise for one department
update employees where dept == 3 set { salary = salary * 11 / 10 }
-- hire + first assignment atomically (the assignment's FK to the new employee
-- holds because both land in one transaction, checked once at commit)
transaction {
insert into employees {
id = 42, name = "Bo", email = "bo@co", dept = 3,
manager = 7, salary = 9_000_000, hired = date("2026-06-01")
}
insert into assignments { emp = 42, project = 5, hours = 0 }
}
-- retire a project; its assignments cascade away
delete from projects where id == 5
Callouts
- Q2
employees.manager == mgr.id:manager: Id?againstmgr.id: Idwidens by implicitsome(#26); equality is total, so a None manager equals no row and the CEO drops out of the inner join.managermust be qualified (both sides areemployees). - Q4 the
⚠is a compile-time warning from key-inference (#11, #15), not a runtime surprise:salaryis determined byemployees.idbut the joined rows are keyed by(emp, project). - Q5 / Q6 output is a
Seq(#20);rank()andsum(…) overareSeq-layer operations, unavailable on aRel. - Q7
where manager is somenarrowsmanagerfromId?toId(#26); the recursive arm needs qualifiers (chain.emp,reports.mgr) because both inputs carrymgrandemp(#25);uniontype-checks because both arms have heading{mgr, emp}.
Open questions
- Name.
RQLis a placeholder. - Relationship/FK layer (deferred). The additive sugar from decision #12: declared references, reverse navigation, path expressions, cardinality-typed many-sides. When to build it; named-relationship & ambiguity rules; path resolution.
- Comprehension surface. Add a calculus surface over the same IR later, or never?
- Value-based & peer frames. SQL
RANGE/GROUPS(frames by order-key value or tie-peers) deferred; positionalrows[…]frames first. - Recursion with aggregation (shortest paths, fixpoint +
min/sum). Deferred — needs well-founded aggregate-in-recursion semantics. - Updatable views. The view-update problem — when can a view assignment map back to base relvars? Deferred.
- Defaults & generators. Column defaults (constants / pure exprs) vs. impure
generators (identity/autoincrement,
now()), which must arrive as explicit host effects. Design TBD. - Upsert / merge sugar. A declared-intent form over assignment.
- Concurrency / isolation. Likely out of scope (RQL is a language, not an engine), but the transaction boundary needs an isolation story if it ever runs concurrently.
- Implementation substrate. Could ride kit's pipeline as a CG-API frontend (cf. RSN) or stand alone. Out of scope until the language is nailed.