RSN — Ryan's Scripting Notation (planned work)
A small, statically-typed scripting language that lets kit replace its remaining external build/test dependencies — make, POSIX shell, and Python — with code it owns and ships, and that runs on kit's own compiler pipeline. RSN is the language; the build system that grows on top of it is a later layer (sketched at the end). This doc is the living design.
The architecture, in one line
RSN is another frontend on kit's pipeline (alongside C, Wasm, toy): RSN source → public CG API → IR → execution. To make a garbage-collected language fit, kit's IR gains a first-class managed-reference type, so RSN runs on the existing IR interpreter (arena-allocated first, then precisely GC'd) and can compile to fast native code through the existing backends later. Static typing is what makes this pay off: typed operations lower to real IR ops (not opaque runtime calls), so the optimizer/backends optimize RSN, and the managed-ref type makes every GC root statically known.
Because this turns the IR into a GC-aware IR, it is staged, and Stage 1 is itself split to keep the highest-blast-radius change (touching the shared CG type system) out of the critical path until the language has proven itself:
- Stage 1a (now): typed RSN frontend + interp, arena-allocated, no collector — managed values live in a run-scoped arena freed at exit. The interp carries a per-function ref-bitmap as local metadata; the C compiler's CG type system is untouched. Cooperative fibers land here. This is where the language gets exercised on real build/test scripts.
- Stage 1b (when reclamation is actually needed): the managed-ref IR type + GC op dialect + the interp's root scan and mark-sweep collector, replacing the arena. Precise by construction. Native/Wasm/C backends reject managed-ref functions.
- Stage 2 (later, optional): native/JIT codegen for managed code (safepoints + GC stack maps in the optimizer and the aa64/x64/rv64 backends), so a build tool written in RSN compiles to a native binary.
Why
Three external tools carry three jobs today: make (1.5k lines: the
dependency DAG, host/config branching, 12k LOC under `KIT_*_ENABLED` gating, the
cross-arch matrix), shell (`test/`: the corpus harness engine, process
orchestration, golden-diffing), and Python (parsing tool output and wrangling
text/JSON/CSV). RSN's goal is one owned, typed language that does all three, so
kit's only external dependency is a seed C toolchain. 3 → ~0.
Locked decisions
Architecture & runtime
- Typed RSN as a CG-API frontend (`lang/rsn/`, with a type checker); no bespoke bytecode: the IR / interp `InterpInsn` is RSN's bytecode.
- GC-aware IR: a managed-reference type + a small GC op dialect in the CG type system + IR, firewalled so the C path pays nothing.
- Non-moving mark-sweep GC, precise by construction.
- Cooperative single-threaded fibers from day one, on the interp's explicit swappable stack and its reserved `KIT_INTERP_BLOCKED` yield state. Cores are saturated by spawned child processes, not threads.
- Staged delivery: interp first (arena-allocated in 1a, then precise GC in 1b); native managed codegen later (Stage 2).
Language
- Syntax: brace / C-ish (Go/Wren family), strictly LL(1), significant newline termination.
- Bindings: `let` (immutable) + `var` (mutable). Object contents stay mutable regardless; this governs only rebinding the name.
- Algebraic data types: `struct` (named product) + `enum` (sum / variants) + `(T, U, …)` tuple (anonymous product), all destructured by the one `pattern` grammar (`match` / `if let` / `let` / `for`).
- Nullability: refs are non-null; absence is `T?` (`Option<T>`), unwrapped with `??` (coalesce) and `if let`. Distinct from `Result` (absence vs. error).
- Errors: `Result<T, E>` + Zig-style `try` (propagate) / `catch` (recover), not exceptions. `Result`/`Option` are sum types; `match` also destructures them.
- `any` + `match`: `any` is the dynamic escape hatch (JSON, parsed text); `match` narrows it (`string s`, `int n`, …). `match` is the single multi-way construct and general destructurer: literals/strings, ADTs, tuples, `any` type-narrows, and or-patterns (`a | b -> …`); arms never fall through, and there is no `switch`.
- Generics: Stage 1 ships only the builtin containers (`list<T>`, `map<K, V>`) as hand-written specializations (packed/unboxed `list<int>`/`list<float>`, a shared boxed instantiation otherwise). The general instantiation strategy (.NET-style hybrid vs. plain monomorphization, plus constraints for user-defined generic fns) is deferred until the feature exists, not locked ahead of need.
- Expression `if`/`match` (value-producing); blocks appear only in control-flow slots, so a bare `{}` stays a map literal, and `?` is reserved for optionals (no ternary).
- Modules: file = module; `pub` exports (private by default); `import "x.rsn" as x`, qualified access.
- String patterns: Lua-style, not a full regex engine.
- Methods via UFCS: `a.f(b)` ≡ `f(a, b)`. `a.name` is a field load when `name` is a field of `a`'s type; otherwise `a.name(...)` resolves `name` as a free function with `a` as its first argument. One rule covers `objs.push`, `src.replace`, `xs.map`: no method-declaration form, no receiver concept. Tie-break: a field shadows a free function of the same name.
- String interpolation: `"...${expr}..."` (a lexer feature; the interpolated `expr` is a full expression, and `\$` escapes a literal `$`). Argv is still built with lists (`["kit", "cc", src]`), never by interpolating into one command string, which keeps shell-quoting bugs out by construction.
- First-class tuples + one pattern grammar: `(T, U, …)` (arity ≥ 2) is a storable, nestable anonymous product; `()` is unit and `(e)` is grouping. Destructure-only (no `.0`/`.1`). A single `pattern` concept serves `let`/`var`/`for`/`match`/`if let` (tuples always written `(a, b)`); `map` yields `(K, V)`, so `for (k, v) in m` is just a tuple pattern. `(a, b) = (b, a)` is parallel assignment (RHS fully evaluated first).
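The UFCS rule reduces to one lookup order at each `a.name`. A minimal C sketch of that decision, assuming the checker can list the field names of `a`'s static type and the free functions in scope; `TypeInfo`, `Scope`, and `resolve_dot` are hypothetical names, not kit's checker API:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical checker-side views: field names of a's static type, and the
   free functions visible at the call site. */
typedef struct { const char* const* fields; size_t nfields; } TypeInfo;
typedef struct { const char* const* fns; size_t nfns; } Scope;

typedef enum { RES_FIELD_LOAD, RES_FREE_FN_CALL, RES_UNRESOLVED } DotResolution;

static int contains(const char* const* names, size_t n, const char* name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0) return 1;
    return 0;
}

/* Resolve `a.name`: a field of a's type wins (a field shadows a free
   function of the same name); otherwise `a.name(args)` is rewritten to
   `name(a, args)` when `name` is a free function in scope. */
DotResolution resolve_dot(const TypeInfo* t, const Scope* s, const char* name) {
    if (contains(t->fields, t->nfields, name)) return RES_FIELD_LOAD;
    if (contains(s->fns, s->nfns, name))       return RES_FREE_FN_CALL;
    return RES_UNRESOLVED;
}
```

A real checker would then type-check the rewritten call `name(a, args)` against the function's first parameter; the sketch only decides which form applies.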
Superseded by this design: the earlier NaN-boxed dynamic value model and dynamic typing — typing removes the need for a universal tagged value.
What we reuse
- The frontend→CG→opt→interp pipeline. RSN attaches as the compiler's interp sink (`kit_interp_program_attach`) so each compiled RSN function is lowered to `InterpInsn`, exactly as `kit run --no-jit` does for C.
- The CG type system (`KitCgTypeId`) and recording IR (`src/cg`), extended with the managed-ref type, not replaced.
- The IR interpreter (`src/interp`): a register VM with direct-threaded dispatch, an explicit swappable stack documented as a "swap-ready substrate for fibers/virtual threads," and a reserved `BLOCKED` yield status. The fiber runtime and a yielding `run()` complete a half-built design.
- `src/core/` (arena/Heap/Slice/StrBuf/VEC/hashmap/diag): the frontend's and runtime's substrate.
- `rt/lib/`: the home for RSN's runtime helpers (container growth, string ops, the collector, `any`-boxing, the iterator protocol).
- The optimizer and native backends (Stage 2): typed RSN lowers to real IR.
- CAS (`include/kit/cas.h`): the build layer's content-addressed cache.
- The host-vtable convention (`KitCasHost`, `KitInterpHost`): RSN's host vtable supplies all I/O and the scheduler's event source; which fields are bound is the capability gate.
The IR extension: managed references + GC
The heart of the design, specified with its firewall discipline.
The type. A new CG type kind — a managed ref: an opaque, GC-owned pointer
to a heap object of a given shape. Distinct from a raw C pointer. RSN's
string/list/map/fn/fiber/chan/record/enum/tuple/any values are
managed refs; int/float/bool are unmanaged scalars.
The ops (a small dialect): `managed_alloc(shape) -> ref`, `ref_field_load/store`,
`ref_elem_load/store`, `ref_eq`, `ref_null`. Safepoints are implicit at
allocations, calls, and loop back-edges (no explicit op in the interp; once the
1b collector is on, it can collect at any allocation because all roots are in
registers).
Object shapes for precise tracing. Every managed object's header points at a shape descriptor listing which fields are managed refs (or a trace function for containers). Sum types use a per-variant shape: the collector reads the active tag, then traces that variant's ref-map. Records use a single field-map.
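To make per-variant tracing concrete, here is a minimal C sketch of a header-plus-shape layout and a mark routine. It assumes (hypothetically) that every object is a header followed by pointer-sized slots, that a shape records its ref slots as a 64-bit mask (so at most 64 slots), and that a sum keeps its active tag in slot 0; `RsnShape`, `RsnObj`, `rsn_obj_new`, and `rsn_mark` are illustrative names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical shape descriptor. For a record or tuple, ref_mask says which
   slots hold managed refs. For a sum, `variants` is non-NULL and holds one
   shape per tag; each variant shape counts slot 0 (the tag) as a scalar. */
typedef struct RsnShape {
    uint32_t nslots;
    uint64_t ref_mask;                       /* bit i set => slot i is a ref */
    const struct RsnShape* const* variants;  /* NULL unless this is a sum */
} RsnShape;

/* Hypothetical object: header, then pointer-sized payload slots. */
typedef struct RsnObj {
    const RsnShape* shape;
    bool marked;
    uintptr_t slots[];
} RsnObj;

RsnObj* rsn_obj_new(const RsnShape* shape, uint32_t nslots) {
    RsnObj* o = calloc(1, sizeof(RsnObj) + nslots * sizeof(uintptr_t));
    if (o != NULL) o->shape = shape;
    return o;
}

/* Mark obj and everything reachable from it. Recursive for brevity; a real
   collector would use an explicit mark stack. Sums read the active tag
   first, then trace only that variant's ref map. */
void rsn_mark(RsnObj* obj) {
    if (obj == NULL || obj->marked) return;
    obj->marked = true;
    const RsnShape* s = obj->shape;
    if (s->variants != NULL) s = s->variants[obj->slots[0]];
    for (uint32_t i = 0; i < s->nslots; i++)
        if (s->ref_mask & (UINT64_C(1) << i))
            rsn_mark((RsnObj*)obj->slots[i]);
}
```

Scalar slots are never dereferenced, which is what "precise" buys: an `int` that happens to look like an address cannot keep garbage alive.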
Root finding in the interp. Each InterpFunc carries a static ref-bitmap
marking which PRegs hold managed refs. Roots = every live ref-PReg across all
live fibers' frames + globals + the string-intern table + a small C-roots
scratch for native functions mid-construction. Precise; no stack maps at the
interp level.
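Root scanning with a static ref-bitmap can be sketched in C as a walk over a fiber's frames that tests one bit per register. `FuncMeta`, `Frame`, and `scan_fiber_roots` are hypothetical; the sketch assumes a ref register that is dead or not yet written holds 0 and skips it, as a stand-in for real liveness information:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-function metadata: bit r set => PReg r holds a managed ref. */
typedef struct {
    uint32_t nregs;
    const uint64_t* ref_bitmap;   /* (nregs + 63) / 64 words */
} FuncMeta;

/* Hypothetical frame on a fiber's explicit stack. */
typedef struct Frame {
    const FuncMeta* fn;
    const uintptr_t* regs;        /* this frame's PReg file */
    const struct Frame* caller;
} Frame;

typedef void (*RootVisitor)(uintptr_t ref, void* ctx);

/* Visit every non-null ref register in every frame of one fiber and return
   how many roots were found. Scalar registers are never visited. */
size_t scan_fiber_roots(const Frame* top, RootVisitor visit, void* ctx) {
    size_t found = 0;
    for (const Frame* f = top; f != NULL; f = f->caller) {
        for (uint32_t r = 0; r < f->fn->nregs; r++) {
            uint64_t word = f->fn->ref_bitmap[r / 64];
            if (((word >> (r % 64)) & 1u) && f->regs[r] != 0) {
                if (visit != NULL) visit(f->regs[r], ctx);
                found++;
            }
        }
    }
    return found;
}
```

The full root set is then this scan repeated for every live fiber, plus globals, the intern table, and the C-roots scratch.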
Two cost-containing invariants (locked):
- Non-moving → no write/read barriers, no pointer rewriting in codegen; the backend only ever identifies roots, never updates them.
- No interior pointers → element/field access re-derives from the base ref, so a stack map only ever needs base refs.
Firewall discipline. The managed dialect is an optional IR feature: a
function with no managed refs needs no GC support and pays nothing. The C frontend
never emits managed refs, so C → IR → {native, Wasm, C} is untouched and
zero-cost. A backend without managed support asserts out cleanly on a managed
function. The managed-IR capability is its own flag (KIT_MANAGED_IR), distinct
from the KIT_RSN_ENABLED frontend that consumes it. The standing test: a
C-only build compiles and links with the managed dialect compiled out.
Stage 2 (native). Precise native GC needs safepoints + stack maps: the optimizer's liveness + register allocator record which managed refs are live at each safepoint/call, per arch. This is the real cross-cutting work and the reason native is staged separately. The two invariants above keep it to root identification rather than the full moving-GC codegen problem.
Type system
Static, with local inference so scripts stay terse.
- Primitives: `int` (i64), `float` (f64), `bool`, `string`.
- Containers (builtin generics): `list<T>`, `map<K, V>`.
- Functions: `fn(A, B) -> R`; closures capture bindings by reference.
- Records (product): `struct Name { field: T, … }`: named-field managed objects with a single shape.
- Sums (variant): `enum Name { Variant, Variant(T, …), … }`: tagged unions with optional payloads; per-variant shape. Constructors are values (`Leaf(5)`, `Empty`); destructured by `match`.
- Tuples (anonymous product): `(T, U, …)`, arity ≥ 2: a managed ref to an anonymous positional record (reuses the record shape, trivially traced). Storable and nestable (`list<(string, string)>`, `(int, (int, int))`). `()` is unit and `(e)` is grouping, so 1-tuples don't exist. Destructure-only: no `.0`/`.1` (consistent with variant payloads, and it dodges the `pair.0.1` float-literal lexer hazard); pull one element with `let (_, v) = p`. Packed all-scalar tuples (e.g. unboxed `(int, int)`) are a deferred niche optimization, in the same bucket as packed `list<int>`.
- Builtin sums: `Option<T> = { Some(T), None }` (with `T?` sugar and `??`/`if let`), `Result<T, E> = { Ok(T), Err(E) }` (with `try`/`catch`).
- Builtin error type: `struct Error { code: int, message: string }`: the default `E` for host/process operations (`run`, `file_io`); user code may use any `E`.
- Methods are UFCS: there is no method-declaration form. `a.f(args)` lowers to `f(a, args)` when `f` is a free function in scope; `a.name` is a plain field load when `name` is a field. A field shadows a free function of the same name.
- `any`: a managed ref to a `{ tag, payload }` box for genuinely dynamic data; narrowed by `match` type-patterns that name the real (lowercase) types (`string s`, `int n`, `list xs`, …). Boxing is confined to `any` (and `Option<scalar>`), not universal.
- Inference: locals infer from initializers (`let n = 5` ⇒ `int`); function parameters and return types are annotated. Explicit type arguments are not written at call sites (they are inferred), so `<` is unambiguous in expressions: no turbofish.
- Generics (Stage 1 is builtin containers only): `list<T>` and `map<K, V>` ship as hand-written specializations; `list<int>`/`list<float>` are packed/unboxed, and ref element types share one boxed instantiation. The general scheme for user-defined generics (.NET-style value/ref hybrid vs. plain monomorphization, and constraints) is deferred to its own phase. When it lands, the value/ref split and ref-vs-scalar-parameterized shape descriptors are the intended direction, but nothing here is locked yet.
- Modules: each `.rsn` file is a module; `pub` exports (private by default); `import "util.rsn" as util` then `util.name`. Qualified, clash-free. Selective `from … import { … }` is a possible later sugar.
Value model
Typed, so values are their natural machine representations — no universal tagged word:
| RSN type | Representation |
|---|---|
| `int` / `float` / `bool` | i64 / f64 / i8 (unmanaged scalars) |
| `string`, `list`, `map`, `fn`, `fiber`, `chan`, record, enum | managed ref |
| `(T, U, …)` tuple | managed ref (anonymous positional record); all-scalar packing deferred |
| `Option<T>` / `T?` | managed ref (a sum); `Option<scalar>` boxes |
| `any` | managed ref to a `{tag, payload}` box |
int is a true 64-bit integer (the reason typing was attractive — no NaN-box, no
bigint spill, no pointer masking). Mixed int/float arithmetic requires an
explicit float(x). Strings are immutable, so they are always safe to share
across fibers; interning is selective, not universal — short strings and any
string used as a map key are interned (cheap equality, CAS keys), while large
strings (file bodies, captured subprocess output) are held by reference without
hashing them into the intern table. Option<scalar> boxing is later
niche-optimizable; builtin list<int> stays packed/unboxed.
Errors and results
Values, typed, with Zig's try ergonomics — no exceptions, no unwind tables.
- `Result<T, E>` (a sum): `Ok(v)` / `Err(e)`.
- `try expr` unwraps `T` on `Ok`, or returns the `Err` from the enclosing function on `Err`; it is type-checked (the enclosing return must be a `Result` with a compatible `E`). At the top level, a `try` that hits `Err` halts the run nonzero.
- `expr catch handler` recovers, with an optional `|e|` binder.
- Process primitives: `run(cmd) -> Result<Proc, Error>` (`Ok` on exit 0, fail fast with `try`); `exec(cmd) -> Proc` (raw `{code, out, err}`, for harnesses that expect nonzero exits).
- `match` destructures `Result`, `Option`, `any`, and other sums generally; `try`/`catch`/`??`/`if let` are the ergonomic shortcuts.
```rsn
fn link(objs: list<string>) -> Result<string, Error> {
  let a = try run(["kit", "ar", "rcs", "lib.a"] + objs)   # propagate on failure
  return Ok(a.out)
}
```
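What `try` does at the IR level can be approximated in C: a `Result` lowers to a tag plus payloads, and `try` is a tag test with an early return of the same `Err`. This is a hypothetical lowering for `Result<int, Error>` only (`RsnResultInt`, `RSN_TRY`, and `catch_int` are illustrative names). The early `return` compiles only because the enclosing function returns the same result type, which mirrors the checker's compatible-`E` rule:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lowered Result<int, Error>: a tag plus both payloads. */
typedef struct { int64_t code; const char* message; } RsnError;
typedef struct { int is_ok; int64_t ok; RsnError err; } RsnResultInt;

static RsnResultInt ok_int(int64_t v) { RsnResultInt r = {1, v, {0, NULL}}; return r; }
static RsnResultInt err_int(int64_t code, const char* msg) { RsnResultInt r = {0, 0, {code, msg}}; return r; }

/* `let dst = try expr`: unwrap Ok, or return the Err unchanged. */
#define RSN_TRY(dst, expr)                      \
    do {                                        \
        RsnResultInt rsn_tmp_ = (expr);         \
        if (!rsn_tmp_.is_ok) return rsn_tmp_;   \
        (dst) = rsn_tmp_.ok;                    \
    } while (0)

static RsnResultInt parse_digit(char c) {
    if (c < '0' || c > '9') return err_int(1, "not a digit");
    return ok_int(c - '0');
}

/* Rough C analogue of an RSN fn returning Result<int, Error> that uses
   `try` twice and then returns Ok(x * 10 + y). */
static RsnResultInt two_digits(char a, char b) {
    int64_t x, y;
    RSN_TRY(x, parse_digit(a));
    RSN_TRY(y, parse_digit(b));
    return ok_int(x * 10 + y);
}

/* `expr catch fallback` without a binder: recover with a default value. */
static int64_t catch_int(RsnResultInt r, int64_t fallback) {
    return r.is_ok ? r.ok : fallback;
}
```

No unwind tables are involved: propagation is an ordinary branch and return, which is also why it lowers to plain IR ops.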
Surface syntax
Brace-delimited, keyword-led, semicolon-free, mandatory braces on block bodies.
Type annotations only in type positions. Comments are # to end-of-line (line or
trailing); there are no block comments. Strings interpolate with "${expr}".
Method calls are UFCS (xs.map(f) ≡ map(xs, f)).
```rsn
struct Target { name: string, inputs: list<string>, optimize: bool }

enum Json {
  Null
  Bool(bool)
  Num(float)
  Str(string)
  Arr(list<Json>)
  Obj(map<string, Json>)
}

fn render(j: Json) -> string {
  match j {                          # match is an expression
    Null -> "null"
    Str(s) -> quote(s)
    Num(n) -> str(n)
    Arr(xs) -> "[" + join(map(xs, render), ",") + "]"
    _ -> "..."
  }
}

fn build(t: Builder) {
  let objs: list<string> = []        # annotation: empty list element type
  for src in glob("src/**/*.c") {
    let obj = src.replace(".c", ".o")
    # recipes return run()'s Result directly (`try` needs a Result-returning fn)
    t.target(obj, [src], fn() -> Result<Proc, Error> { run(["kit", "cc", "-c", src, "-o", obj]) })
    objs.push(obj)                   # mutating contents (let binding is fine)
  }
  let mode = if t.release { "opt" } else { "dbg" }   # value-if; no ternary
  t.target("libkit.a", objs, fn() -> Result<Proc, Error> { run(["kit", "ar", "rcs", "libkit.a"] + objs) })
}
```
Grammar (LL(1))
LL(1) by construction; type syntax confined to type positions. The lexer
suppresses the statement-terminating newline after an open bracket, a trailing
binary operator, or when the next non-blank token is a leading `.` (so
method/field chains may wrap across lines); the EBNF omits the newline token.
Comments (`#` to end-of-line; no block form) and string interpolation
(`"${expr}"`, with `\$` escaping a literal `$`) are lexer-level; an interpolated `${…}`
lexes as a parenthesized sub-expression.
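The three newline-suppression cases amount to a small predicate the lexer can consult before emitting a statement terminator. A C sketch with a hypothetical, trimmed token set; it reads "after an open bracket" as "the previous token is `(`, `[`, or `{`", which is one interpretation rather than a confirmed detail (a lexer could instead track bracket depth):

```c
#include <stdbool.h>

/* Hypothetical token kinds, just enough for the newline rule. */
typedef enum {
    TK_IDENT, TK_INT, TK_STRING,
    TK_LPAREN, TK_LBRACKET, TK_LBRACE, TK_RPAREN, TK_RBRACKET, TK_RBRACE,
    TK_PLUS, TK_MINUS, TK_STAR, TK_SLASH, TK_ANDAND, TK_OROR, TK_EQEQ,
    TK_COALESCE, TK_DOT
} TokKind;

static bool is_open_bracket(TokKind k) {
    return k == TK_LPAREN || k == TK_LBRACKET || k == TK_LBRACE;
}

static bool is_binary_op(TokKind k) {
    switch (k) {
    case TK_PLUS: case TK_MINUS: case TK_STAR: case TK_SLASH:
    case TK_ANDAND: case TK_OROR: case TK_EQEQ: case TK_COALESCE:
        return true;
    default:
        return false;
    }
}

/* Does a newline between `prev` and the next non-blank token end the statement? */
bool newline_terminates(TokKind prev, TokKind next) {
    if (is_open_bracket(prev)) return false;  /* f(  NL  args */
    if (is_binary_op(prev))    return false;  /* a +  NL  b */
    if (next == TK_DOT)        return false;  /* xs  NL  .map(f) */
    return true;
}
```

A trailing `-` is always the binary operator here, since a unary minus can never be the last token on a line.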
```ebnf
program     = { item } ;
item        = importDecl | [ "pub" ] decl | stmt ;
importDecl  = "import" STRING "as" IDENT ;
decl        = fnDecl | structDecl | enumDecl ;
fnDecl      = "fn" IDENT [ generics ] "(" [ params ] ")" [ "->" type ] block ;
structDecl  = "struct" IDENT [ generics ] "{" { IDENT ":" type [ "," ] } "}" ;
enumDecl    = "enum" IDENT [ generics ] "{" { variant [ "," ] } "}" ;
variant     = IDENT [ "(" type { "," type } ")" ] ;
generics    = "<" IDENT { "," IDENT } ">" ;
params      = IDENT ":" type { "," IDENT ":" type } [ "," ] ;

stmt        = letDecl | varDecl | whileStmt | forStmt | returnStmt
            | "break" | "continue" | exprStmt | block ;
letDecl     = "let" pattern [ ":" type ] "=" expr ;   (* pattern irrefutable *)
varDecl     = "var" pattern [ ":" type ] "=" expr ;   (* pattern irrefutable *)
whileStmt   = "while" cond block ;
forStmt     = "for" pattern "in" cond block ;         (* pattern irrefutable *)
returnStmt  = "return" [ expr ] ;
exprStmt    = expr [ assignOp expr ] ;   (* tuple-of-lvalues LHS = parallel assign *)
assignOp    = "=" | "+=" | "-=" | "*=" | "/=" ;
block       = "{" { stmt } "}" ;         (* value = trailing expr-stmt, else unit *)

type        = baseType { "?" } ;
baseType    = IDENT [ "<" type { "," type } ">" ]
            | "fn" "(" [ type { "," type } ] ")" "->" type
            | "(" [ type "," type { "," type } ] ")" ;
            (* left-factored: "()" = unit; "(" T "," U … ")" = tuple (arity >= 2).
               No "(" T ")" form: types have no paren-grouping, so a lone "(int)"
               is a syntax error; unit-vs-tuple is decided after "(" on ")" vs a type. *)

expr        = coalesce [ "catch" [ "|" IDENT "|" ] expr ] ;
coalesce    = orExpr { "??" orExpr } ;
orExpr      = andExpr { "||" andExpr } ;
andExpr     = eqExpr { "&&" eqExpr } ;
eqExpr      = relExpr { ("==" | "!=" | "is") relExpr } ;
relExpr     = addExpr { ("<" | "<=" | ">" | ">=") addExpr } ;
addExpr     = mulExpr { ("+" | "-" | "+%" | "-%") mulExpr } ;
mulExpr     = unary { ("*" | "*%" | "/" | "%") unary } ;
unary       = "try" unary | ("!" | "-") unary | postfix ;
postfix     = primary { callOp | indexOp | fieldOp } ;
callOp      = "(" [ args ] ")" ;
indexOp     = "[" expr "]" ;
fieldOp     = "." IDENT ;
args        = expr { "," expr } [ "," ] ;

primary     = INT | FLOAT | STRING | "true" | "false"
            | ifExpr | matchExpr | spawnExpr
            | IDENT [ structLit ]        (* structLit suppressed in cond; see below *)
            | parenExpr
            | listLit | mapLit | fnLit ;
parenExpr   = "(" expr [ "," expr { "," expr } ] ")" ;
            (* left-factored: no comma => grouping (value is the inner expr);
               >= 1 comma => tuple (arity >= 2). The grouping-vs-tuple choice is
               made on the token after the first expr: ")" vs ",". No 1-tuples,
               so "(a,)" is rejected. *)
spawnExpr   = "spawn" block ;            (* yields a fiber handle *)
ifExpr      = "if" ( "let" pattern "=" cond | cond ) block
              [ "else" ( ifExpr | block ) ] ;
cond        = expr ;   (* a bare `IDENT { … }` struct literal is NOT taken here; a
                          struct literal in condition position must be
                          parenthesized: `if (Foo{…}).ok { … }` *)
matchExpr   = "match" cond "{" { matchArm } "}" ;
                       (* cond, not expr: in `match x { … }` the `{` opens the
                          arms and must not be read as a struct body *)
matchArm    = pattern "->" ( block | expr ) [ "," ] ;

pattern     = patternAtom { "|" patternAtom } ;   (* or-pattern; all alternatives
                                                     must bind the same names+types.
                                                     `|` here is pattern position,
                                                     distinct from catch's `|e|`. *)
patternAtom = "_" | literal
            | IDENT [ IDENT                                (* type-narrow: type binder *)
                    | "(" pattern { "," pattern } ")" ]    (* Ctor(…); bare IDENT = binding or nullary Ctor *)
            | "(" pattern "," pattern { "," pattern } ")" ;   (* tuple, arity >= 2 *)

structLit   = "{" [ field { "," field } [ "," ] ] "}" ;
field       = IDENT ":" expr ;
listLit     = "[" [ expr { "," expr } [ "," ] ] "]" ;
mapLit      = "{" [ entry { "," entry } [ "," ] ] "}" ;
entry       = ( STRING | "[" expr "]" ) ":" expr ;   (* string or computed key;
                 bare `IDENT:` is a struct field, never a map key, so
                 `{name: x}` is unambiguous and `map<int,V>` literals are
                 written `{[k]: v}` *)
fnLit       = "fn" "(" [ params ] ")" [ "->" type ] block ;
literal     = INT | FLOAT | STRING | "true" | "false" ;
```
LL(1) tactics that constrain the language:

- `let`/`var` lead bindings; `pub` leads exports; `import` leads imports.
- Assignment is statement-level (`=` vs `==` unambiguous; no `if (a = b)`).
- The `{` rule. In a statement or control-flow slot (block, after `if`/`else`/`while`/`for`/`fn`/`spawn`, and a match arm's `->`), `{` is a block. In expression position, `{` is a map literal and `Name{...}` is a struct literal. To use a map where a block is expected, parenthesize: `-> ({...})`. This is what lets value-`if`/`match` coexist with `{k: v}` maps under LL(1).
- Struct literals are suppressed in condition position. Inside the `cond` of `if`/`while`/`for` (and the `match` scrutinee), a bare `Name { … }` is not a struct literal; otherwise the `{` is eaten as a struct body and starves the required block (the classic Rust/Go conflict). A struct literal needed in a condition must be parenthesized: `if (Foo{…}).ok { … }`. This is what actually makes the grammar LL(1); without it, `IDENT [structLit]` and the control-flow `{`-block production collide.
- The `(` rule. In expression position `(` is left-factored: parse one `expr`; a following `)` means it was grouping and `,` means it's a tuple, so grouping and tuples never conflict at the open paren. In type position there is no grouping, so `()` is unit and `(T, …)` is a tuple, decided right after `(`. In pattern position a leading `(` is always a tuple pattern. (`spawn`/`fn` before `(` are keywords, handled before this rule.)
- `fn` at item/stmt start is a named declaration; anonymous `fn` appears only in expression position.
- `try` is a unary prefix; `catch` is a low-precedence tail with an optional `|e|` binder, LL(1) because `|` is not in FIRST(expr).
- `<` opens type arguments only in type position (generics, declaration sites). There are no explicit type args at call sites, so in expressions `<` is always less-than.
- Patterns decide by one lookahead after the leading token: `_`/literal are immediate; a leading `(` is a tuple pattern; an `IDENT` followed by another `IDENT` is a type-narrow, followed by `(` is a constructor, and otherwise a binding (constructor-vs-binding for a bare identifier is resolved in the checker against known variants). One `pattern` grammar serves `let`/`var`/`for`/`match`/`if let`; `let`/`var`/`for` additionally require it to be irrefutable (binders, `_`, and nested tuples thereof; never a literal, constructor, or type-narrow that could fail to match).
- Or-patterns join alternatives with `|` (`"+" | "-" -> …`): after a `patternAtom`, lookahead `|` continues the or-pattern, and anything else ends it. This pattern-position `|` does not clash with `catch`'s `|e|` binder (expression position), so the "`|` ∉ FIRST(expr)" tactic still holds. Or-patterns are refutable, so they appear only in `match`/`if let`, and they are how `match` subsumes C `switch` multi-case/fallthrough; arms never fall through.
- `if let` vs `if`: peek `let` after `if`.
- Parallel assignment. `(a, b) = (b, a)` is an assignment whose LHS is a tuple of lvalues; the RHS tuple is fully evaluated before any target is written (so swap works). Only `=` takes a tuple LHS, not `+=` and friends.
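The pattern-start decision described above is a function of the leading token plus one lookahead token. A C sketch with hypothetical token and pattern-kind enums; as the text says, a bare identifier is reported as a binding here and left for the checker to reclassify as a nullary constructor:

```c
/* Hypothetical token kinds at the start of a pattern. */
typedef enum {
    T_UNDERSCORE, T_INT, T_FLOAT, T_STRING, T_TRUE, T_FALSE,
    T_IDENT, T_LPAREN, T_ARROW, T_PIPE, T_OTHER
} Tok;

typedef enum {
    PAT_WILDCARD, PAT_LITERAL, PAT_TUPLE, PAT_TYPE_NARROW,
    PAT_CTOR, PAT_BINDING, PAT_ERROR
} PatKind;

/* Pick the patternAtom production from `first` and, only when `first` is an
   IDENT, the single lookahead token `second`. */
PatKind classify_pattern(Tok first, Tok second) {
    switch (first) {
    case T_UNDERSCORE:
        return PAT_WILDCARD;
    case T_INT: case T_FLOAT: case T_STRING: case T_TRUE: case T_FALSE:
        return PAT_LITERAL;
    case T_LPAREN:
        return PAT_TUPLE;                               /* (a, b) */
    case T_IDENT:
        if (second == T_IDENT)  return PAT_TYPE_NARROW; /* string s */
        if (second == T_LPAREN) return PAT_CTOR;        /* Some(x) */
        return PAT_BINDING;                             /* x, or nullary None */
    default:
        return PAT_ERROR;
    }
}
```

The irrefutability check for `let`/`var`/`for` then only needs to accept `PAT_WILDCARD`, `PAT_BINDING` (once the checker confirms it is not a variant), and tuples whose elements are themselves irrefutable.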
Semantics defaults
Decided, revisitable — these were settled as defaults rather than formal forks:
- Integer overflow traps (halts the run): correctness over speed for a tool language, where a silently wrapped size/offset is the worse failure. Explicit `+%`/`-%`/`*%` perform two's-complement wrapping where it is actually wanted. `/` truncates toward zero; `%` sign follows the dividend (C). Mixed `int`/`float` needs `float(x)`.
- `==` compares scalars and strings by value, and lists/maps/records/sums/tuples structurally by contents; identity is a separate `is`.
- Iteration: `for x in list`, `for (k, v) in map`, `for i in range(n)`; one iterator protocol (`next() -> T?`) so user types opt in. `map` yields `(K, V)` tuples, so multi-binding is just a tuple pattern, not a special case.
- Unit return: a function with no `-> T` returns unit (nothing), distinct from `None`.
- Strings: immutable, UTF-8; `.len` counts bytes; codepoints/graphemes are reached through helpers, and patterns operate on bytes.
- Closures capture bindings by reference (mutating a captured `var` is visible). Loop bindings are fresh per iteration: each `for`/`while` turn gets its own binding, so a closure created in the loop captures that iteration's value (Go 1.22 / JS-`let` semantics). The build example's per-`src` recipe closures depend on this.
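The integer defaults map closely onto C. A sketch, assuming a GCC or Clang toolchain (for `__builtin_add_overflow`) and assuming, beyond what is stated above, that division by zero also traps; trapping is modeled as `abort()`, and the function names are hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>

/* `a + b`: trap on overflow. The builtin reports overflow instead of
   invoking C's undefined behavior. */
int64_t rsn_add(int64_t a, int64_t b) {
    int64_t r;
    if (__builtin_add_overflow(a, b, &r)) abort();
    return r;
}

/* `a +% b`: explicit two's-complement wrap, computed in unsigned arithmetic.
   Converting the result back to int64_t is modulo 2^64 on GCC and Clang. */
int64_t rsn_add_wrap(int64_t a, int64_t b) {
    return (int64_t)((uint64_t)a + (uint64_t)b);
}

/* `a / b`: C99 truncation toward zero; the one overflowing quotient traps. */
int64_t rsn_div(int64_t a, int64_t b) {
    if (b == 0) abort();                     /* assumption: x / 0 traps */
    if (a == INT64_MIN && b == -1) abort();  /* result does not fit in i64 */
    return a / b;
}

/* `a % b`: sign follows the dividend, as in C99. */
int64_t rsn_rem(int64_t a, int64_t b) {
    if (b == 0) abort();                     /* assumption: x % 0 traps */
    if (b == -1) return 0;                   /* sidesteps UB for INT64_MIN % -1 */
    return a % b;
}
```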
Execution: RSN on the interp
- Pipeline: source → lexer → LL(1) parser → AST → type check → CG-API lowering → IR → (interp sink) `InterpInsn`. Typed ops are real IR ops; only `any` and a few container/iterator primitives are `rt/lib/` helper calls.
- Register VM, direct-threaded dispatch: the existing interp.
- Fibers are the interp's explicit `InterpStack`s. A yielding op (`run`, `read`, `sleep`, channel ops) returns `KIT_INTERP_BLOCKED`; a scheduler (living with the host, since readiness depends on host events) resumes the stack when ready. Cooperative: no preemption.
- The one limit (like Lua): a fiber may yield only at RSN/IR instruction boundaries and in blessed scheduler-aware host calls, never from inside a native C frame reached via FFI.
- Surface: `spawn { … }` is an expression yielding a fiber handle (`let h = spawn { … }`); `parallel(list, limit, fn)` → bounded fan-out, results in input order; `chan()` / `.send` / `.recv`.
- Determinism guard: fibers are execution-phase; build-graph definition runs single-fiber, and the build-file capability gate nulls `spawn` and writes (read-only `glob`/`stat` stay live so the graph can discover its inputs).
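The yield/resume contract can be modeled without real stacks: a fiber is a resumable step function that either finishes or records which host event it is blocked on, and the scheduler asks the host for completed events. This C sketch is a simplification (the real design resumes an `InterpStack` that returned `KIT_INTERP_BLOCKED`); `Fiber`, `run_scheduler`, `start_then_wait`, and the queue-backed `queue_poll` are hypothetical:

```c
#include <stddef.h>

typedef enum { FIB_DONE, FIB_BLOCKED } FibStatus;

/* A fiber modeled as a resumable step function. `pc` stands in for the saved
   InterpStack; `waiting_on` is the host event it is blocked on. */
typedef struct Fiber {
    FibStatus (*resume)(struct Fiber* self);
    int pc;
    int waiting_on;   /* -1 when runnable */
    int finished;
    int job;          /* demo payload: the event id this fiber waits for */
} Fiber;

/* Example body: start a job, yield until the host reports it done, finish. */
static FibStatus start_then_wait(Fiber* f) {
    if (f->pc == 0) {
        f->pc = 1;
        f->waiting_on = f->job;
        return FIB_BLOCKED;
    }
    return FIB_DONE;
}

/* Cooperative loop: run each runnable fiber until it blocks or finishes (no
   preemption), then take one completed event from the host and wake the
   fibers waiting on it. Returns -1 if fibers stay blocked with no events. */
int run_scheduler(Fiber* fs, size_t n, int (*poll)(void* user), void* user) {
    size_t live = n;
    while (live > 0) {
        for (size_t i = 0; i < n; i++) {
            Fiber* f = &fs[i];
            if (f->finished || f->waiting_on != -1) continue;
            if (f->resume(f) == FIB_DONE) { f->finished = 1; live--; }
        }
        if (live == 0) break;
        int ev = poll(user);
        if (ev < 0) return -1;
        for (size_t i = 0; i < n; i++)
            if (!fs[i].finished && fs[i].waiting_on == ev) fs[i].waiting_on = -1;
    }
    return 0;
}

/* Demo event source: replays a fixed list of completed event ids. */
typedef struct { const int* events; size_t count, next; } EventQueue;

static int queue_poll(void* user) {
    EventQueue* q = user;
    return q->next < q->count ? q->events[q->next++] : -1;
}
```

Keeping `poll` on the host side matches the text above: the scheduler owns no I/O, it only reacts to readiness the host reports (for example a reaped child process).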
Memory and GC
- Non-moving mark-sweep, precise via the managed-ref type. Allocation funnels through `rsn_alloc(shape, size)`; collection runs at an allocation threshold.
- Stop-the-world is trivial (single-threaded, cooperative): collect only at allocation safepoints, where all roots are in registers.
- Roots: live ref-PRegs across all fibers (via each function's ref-bitmap), globals, the string-intern table, the C-roots scratch. Tracing uses each object's shape (per-variant for sums).
- Finalizers: `handle`/`chan`/`fiber` and host-resource objects get an optional `finalize` slot in their shape, run on collection and at teardown.
- Arena stage (1a): managed values bump-allocate into a run-scoped arena and the collector is deferred (everything is freed at run end); object headers and shapes are designed in from the start, so turning the collector on in 1b is localized.
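A Stage 1a arena needs only bump allocation and one teardown. A minimal C sketch, assuming 16-byte alignment suffices for every managed object and that a request too large for the current chunk gets a fresh one; `RunArena`, `arena_alloc`, and `arena_free_all` are hypothetical names, not the existing `src/core/` arena API:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_ALIGN ((uintptr_t)16)

/* Hypothetical run-scoped arena: a list of chunks, bump-allocated. */
typedef struct ArenaChunk {
    struct ArenaChunk* next;
    size_t used, cap;
    unsigned char data[];
} ArenaChunk;

typedef struct { ArenaChunk* head; size_t chunk_size; } RunArena;

/* Offset of the next ARENA_ALIGN-aligned byte at or after c->used. */
static size_t aligned_offset(const ArenaChunk* c) {
    uintptr_t p = (uintptr_t)(c->data + c->used);
    uintptr_t aligned = (p + ARENA_ALIGN - 1) & ~(ARENA_ALIGN - 1);
    return c->used + (size_t)(aligned - p);
}

/* Bump-allocate `size` bytes; objects are never freed individually. */
void* arena_alloc(RunArena* a, size_t size) {
    ArenaChunk* c = a->head;
    if (c == NULL || aligned_offset(c) + size > c->cap) {
        size_t need = size + (size_t)ARENA_ALIGN;  /* room to align in a fresh chunk */
        size_t cap = need > a->chunk_size ? need : a->chunk_size;
        c = malloc(sizeof(ArenaChunk) + cap);
        if (c == NULL) return NULL;
        c->next = a->head;
        c->used = 0;
        c->cap = cap;
        a->head = c;
    }
    size_t off = aligned_offset(c);
    c->used = off + size;
    return c->data + off;
}

/* Run end: release every chunk at once. */
void arena_free_all(RunArena* a) {
    ArenaChunk* c = a->head;
    while (c != NULL) {
        ArenaChunk* next = c->next;
        free(c);
        c = next;
    }
    a->head = NULL;
}
```

The unused tail of a chunk is simply abandoned when a new chunk is started; for short-lived runs that waste is the price of never tracking individual objects.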
Embedding API and capability gating
```c
typedef struct KitRsnHost {
    const KitFileIO* file_io;  /* build-def: read/glob/stat live, writes gated */
    int (*spawn)(void* user, const KitRsnSpawn*, KitRsnProc** out);
    int (*proc_reap)(void* user, KitRsnProc*, int* exit_code);  /* scheduler */
    int64_t (*clock_ns)(void* user);  /* gated */
    int (*mkdir_p)(void* user, const char* path);
    void* user;
} KitRsnHost;

KIT_API KitRsn* kit_rsn_new(KitContext*, const KitRsnHost*);
KIT_API KitStatus kit_rsn_eval(KitRsn*, KitSlice src, const char* name);
KIT_API void kit_rsn_register(KitRsn*, const char* name,
                              KitRsnNativeFn, void* ud);
KIT_API void kit_rsn_free(KitRsn*);
```
- Script mode (`kit rsn foo.rsn`): full vtable; replaces the shell glue.
- Build-definition mode: `spawn`, `clock_ns`, and the write side of `file_io` are NULL, but read-only filesystem queries stay live (`glob`, `stat`, read). A `build.rsn` must discover its inputs (`glob("src/**/*.c")`) to declare targets, yet cannot `run()` processes, write files, or read the clock, so the graph is hermetic and ad-hoc effects are a load-time error. The capability gate is therefore per-operation (read vs. write vs. spawn), not a single wholesale `file_io` switch.

One frontend, two privilege levels, chosen by the caller.
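Per-operation gating falls out naturally when each capability is its own function pointer and NULL means "not granted". The C sketch below uses a reduced, hypothetical `DemoHost` rather than the real `KitRsnHost` (whose `KitFileIO` and `KitRsnSpawn` types are not specified here), with stub callbacks so it runs standalone:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced host vtable: one pointer per capability.
   NULL means the capability is not granted in this mode. */
typedef struct {
    int (*read_file)(void* user, const char* path);
    int (*write_file)(void* user, const char* path);
    int (*spawn)(void* user, const char* const* argv);
    int64_t (*clock_ns)(void* user);
    void* user;
} DemoHost;

typedef enum { RSN_OK, RSN_IO_ERROR, RSN_NOT_PERMITTED } RsnGateStatus;

/* Each builtin checks only its own slot, so reads stay live while writes
   and process spawning are denied. */
static RsnGateStatus gate_read(const DemoHost* h, const char* path) {
    if (h->read_file == NULL) return RSN_NOT_PERMITTED;
    return h->read_file(h->user, path) == 0 ? RSN_OK : RSN_IO_ERROR;
}

static RsnGateStatus gate_write(const DemoHost* h, const char* path) {
    if (h->write_file == NULL) return RSN_NOT_PERMITTED;
    return h->write_file(h->user, path) == 0 ? RSN_OK : RSN_IO_ERROR;
}

static RsnGateStatus gate_spawn(const DemoHost* h, const char* const* argv) {
    if (h->spawn == NULL) return RSN_NOT_PERMITTED;
    return h->spawn(h->user, argv) == 0 ? RSN_OK : RSN_IO_ERROR;
}

/* Build-definition mode: the same host with every effectful slot cleared. */
static DemoHost build_definition_host(DemoHost full) {
    full.write_file = NULL;
    full.spawn = NULL;
    full.clock_ns = NULL;
    return full;
}

/* Stub callbacks that always succeed, so the sketch runs standalone. */
static int stub_read(void* user, const char* path) { (void)user; (void)path; return 0; }
static int stub_write(void* user, const char* path) { (void)user; (void)path; return 0; }
static int stub_spawn(void* user, const char* const* argv) { (void)user; (void)argv; return 0; }
```

Distinguishing `RSN_NOT_PERMITTED` from `RSN_IO_ERROR` is what lets the evaluator turn a gated call into the load-time error described above instead of an ordinary runtime `Err`.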
The build layer (later)
A library in RSN plus a thin engine: target(name, inputs, outputs, recipe);
staleness by content hash via the CAS (not mtimes); parallel execution via
parallel/fibers bounded to ncpu(); kit build loads build.rsn in
build-definition mode for the graph, then executes it in script mode. Out of scope
for the milestones below — it is what the language is for.
Implementation phases
Stage 1a — typed RSN, interpreted, arena-allocated (no collector yet):
- `lang/rsn/` lexer + LL(1) parser → AST (full grammar above). Parse-corpus tests under `test/rsn/`.
- Type checker: local inference, ADTs (`struct`/`enum`/tuple), builtin-container generics, UFCS resolution, the uniform `pattern` grammar (irrefutability check for `let`/`var`/`for`, exhaustiveness for `match`), `Result`/`Option`/`try`, modules.
- RSN → CG-API lowering (typed ops → IR ops; `any`/containers/iterators via `rt/lib/`). Managed values live in a run-scoped arena, freed at run end; the interp carries a per-function ref-bitmap as interp-local metadata, but no new CG type kind is added yet, so the C compiler's type system is untouched.
- Fibers: complete the `BLOCKED`/yield path + scheduler over `InterpStack`s + host vtable (spawn/run/read/clock); `spawn`/`parallel`/`chan`.
- Stdlib: strings, list/map, iterators/`range`, Lua-style patterns, JSON (→ `any`), `run`/`exec`.
- CLI `kit rsn` + per-operation capability gating.
Goal of 1a: prove the language on real build.rsn / test scripts before the
invasive IR surgery. Short-lived runs make arena-free-at-exit genuinely adequate.
Stage 1b — precise GC (added once runs are long-lived enough to need it):
- CG/IR managed-ref type + GC op dialect (`src/cg`), gated by `KIT_MANAGED_IR`; object headers + (per-variant) shape descriptors. Firewall test: C-only build green with the dialect compiled out.
- Lowering emits managed ops for refs; the interp's root scan + mark-sweep collector + finalizers replace the arena. (The 1a ref-bitmap becomes the precise root map.)
Stage 2 — native managed codegen (optional, demand-driven):
- Safepoints + GC stack maps in the optimizer (liveness/regalloc).
- Native emit for managed functions across aa64/x64/rv64.
- JIT/AOT RSN: `kit rsn --jit`, and `kit build` emitting native build tools.
- Wasm/C-backend managed support (furthest out).
Then: the build layer.
Resolved decisions
All of: the architecture (typed CG-API frontend; GC-aware IR; precise non-moving
mark-sweep; cooperative fibers; staged interp → 1a arena → 1b GC → native);
let/var; non-null + T?; Result/try/catch; ADTs (struct/enum) with
match; any narrowing via lowercase type-patterns in match; UFCS methods;
string interpolation; # line comments; expression if/match
with the {}-block rule and the struct-literal-in-condition restriction;
qualified module imports; Lua-style patterns; separate int/float with explicit
conversion; integer overflow traps; selective string interning;
per-operation capability gating; first-class tuples with one uniform
pattern grammar (Rust-style) + parallel assignment; match as the single
multi-way construct (no switch) with or-patterns and no fallthrough; Stage-1
generics = builtin containers only; the semantics defaults above; NaN-box value
model dropped.
Open decisions
None block Stage 1a. Deferred until their phase: selective imports (from … import { … } sugar), range syntax (a..b sugar vs the range() builtin) —
which would also unlock range patterns in match (1..10 -> …),
record-style enum payloads (Variant { … }), and user-defined generic
functions (instantiation strategy + constraints — see Type system). The earlier
tuple gap is now resolved (first-class (T, U), see Type system / Value model /
Grammar); the remaining tuple sub-question deferred to its phase is packed
all-scalar tuple representation (e.g. unboxed (int, int)), niche-optimized
alongside packed list<int>. Match guards (pattern if cond -> …) are
deliberately deferred — match stays structural + or-pattern dispatch, and
conditional logic lives in if/else if; guards can be added later
non-breakingly (matchArm = pattern [ "if" expr ] "->" …).
Naming
Language: RSN — Ryan's Scripting Notation. Source extension .rsn. Frontend
lang/rsn/, CLI kit rsn, frontend flag KIT_RSN_ENABLED; the GC-aware IR
it depends on is gated separately (KIT_MANAGED_IR). When RSN ships, the
durable design moves up to doc/RSN.md and this roadmap shrinks to what remains
open.