commit d7882f0ce3836f3d71de3ba9bca652a7fc28a161
parent cc3abed6052eb7fdcc58fc6ccba94bbd8fc8a5ff
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Sun, 10 May 2026 05:26:26 -0700
parse: Phase 1 — calls, params, multi-fn TUs, &&/||/?:, sizeof/cast

Land the Phase 1 expression-grammar surface so helper functions and
calls work end-to-end. Param lists with pointer declarators, multiple
function definitions per TU, forward prototypes, function calls in
postfix, short-circuiting `&&`/`||` and ternary `?:`, unary `&`/`*`,
char-literal decoding, string literals into .rodata, sizeof(type-name)
and sizeof(IDENT), _Alignof(type-name), cast expressions, and a
type-name production shared by all three.

Adds backend-cooperative scratch-register reset at statement boundaries
(cg_reset_scratch → aa_reset_scratch) so deep recursive bodies like
factorial don't exhaust the aarch64 backend's fixed scratch window.

Flips 36 corpus rows from `·` to `★` (Phase 0's spine + Phase 1's
unlocks). Phase 1 boxes ticked in doc/parser-status.md.
Diffstat:
| A | doc/parser-status.md | | | 239 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| M | src/arch/aarch64.c | | | 14 | ++++++++++++++ |
| M | src/cg/cg.c | | | 57 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| M | src/parse/parse.c | | | 1051 | ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------- |
| M | test/parse/CORPUS.md | | | 144 | ++++++++++++++++++++++++++++++++++++++++---------------------------------------- |
5 files changed, 1293 insertions(+), 212 deletions(-)
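
The scratch-register reset described in the commit message can be illustrated with a small, self-contained sketch. This is not the code from `src/arch/aarch64.c`: `ScratchPool`, `scratch_alloc`, `scratch_reset`, `run_statements`, and the pool size of 8 are all hypothetical names and values chosen for illustration. It assumes a bump-cursor allocator whose per-register free is a no-op, so the pool only empties when the cursor is rewound.

```c
#include <assert.h>

/* Hypothetical stand-in for a backend scratch pool: a fixed window of
 * registers handed out by a bump cursor. The size is illustrative only. */
enum { SCRATCH_REGS = 8 };

typedef struct ScratchPool {
  int used; /* index of the next free register */
} ScratchPool;

/* Returns a register index, or -1 when the window is exhausted. */
static int scratch_alloc(ScratchPool* sp) {
  assert(sp);
  if (sp->used == SCRATCH_REGS) return -1;
  return sp->used++;
}

/* Rewind the cursor. Only valid when no live value still holds a
 * register, so the caller passes its value-stack depth as a guard. */
static void scratch_reset(ScratchPool* sp, int value_stack_depth) {
  assert(sp);
  if (value_stack_depth != 0) return;
  sp->used = 0;
}

/* Simulate `nstmts` statements that each allocate `per_stmt` registers.
 * Returns the number of allocations that failed. */
static int run_statements(int nstmts, int per_stmt, int reset_between) {
  ScratchPool sp = {0};
  int failures = 0;
  for (int s = 0; s < nstmts; s++) {
    for (int i = 0; i < per_stmt; i++) {
      if (scratch_alloc(&sp) < 0) failures++;
    }
    /* The value stack is empty at a statement boundary (depth 0). */
    if (reset_between) scratch_reset(&sp, 0);
  }
  return failures;
}
```

With five statements of three allocations each, `run_statements(5, 3, 0)` runs out of registers partway through, while `run_statements(5, 3, 1)` never does, because each statement starts from an empty window.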
diff --git a/doc/parser-status.md b/doc/parser-status.md
@@ -0,0 +1,239 @@
+# C-parser status
+
+Living checklist for the C front-end (`src/parse/`) build-out. Behavioral
+oracle is `test/parse/CORPUS.md` — every checkbox here corresponds to
+corpus rows that flip from `·` to `★` when the box gets ticked. Update
+this doc in the same commit as the parser change that lands it.
+
+Phase status:
+
+- ✅ landed
+- 🚧 in progress
+- ⬜ planned
+
+Each phase is one agent's worth of work. Phases are ordered by
+dependency, not priority — Phase N+1 generally needs Phase N's surface
+in place.
+
+---
+
+## Phase 0 — Spine ✅
+
+§6.5 binary operators, scalar `int` locals, `if/else`, `while`, `for`,
+`break`, `continue`, `return`, comma, simple/compound assignment,
+unary `+ - ! ~ ++ --`, `(expr)`, integer literals. Single function
+`int test_main(void) { ... }` per TU.
+
+---
+
+## Phase 1 — Calls & §6.5 completion ✅
+
+Finish the expression grammar and let helper functions exist. The
+largest single unlock — most later phases depend on multi-function TUs.
+
+- [x] Parameter lists in function definitions (`int f(int x, int *p)`)
+- [x] Multiple function definitions per TU
+- [x] Function calls in postfix (`f(x, y)`) — drives `cg_call` / ABI
+- [x] Forward prototypes (`int f(int);` then later body)
+- [x] Logical `&&`, `||`, ternary `?:` (short-circuit, label-driven)
+- [x] Address-of `&` and dereference `*` in unary
+- [x] String literals in primary (rodata-resident, decay to `char*`)
+- [x] Char literal decoded value (full §6.4.4.4 escape set)
+- [x] `sizeof(type-name)` and `sizeof(IDENT)` for declared objects;
+ arbitrary `sizeof expr` (no parens, side-effecting operand)
+ is deferred to Phase 2 once subscript/`->`/etc. land
+- [x] `_Alignof(type-name)`
+- [x] Cast expression `(type-name) expr`
+- [x] Type-name production (shared by sizeof / _Alignof / cast); abstract
+ declarators are pointer-prefix only — array/function suffixes wait
+ on Phase 2
+
+Phase 1 also added a backend-cooperative scratch-register reset at
+statement boundaries (cg's value stack is empty between stmts, so
+`cg_reset_scratch` lets the next statement reuse the entire scratch
+window). Without it the aarch64 backend's fixed pool gets exhausted
+inside any function with multiple sequential reg-allocating
+operations (factorial blew up at depth 5).
+
+Unlocks (status as landed): `6_5_12–14` ★, `6_5_20–21` ★, `6_5_22` ·
+(needs array decl, Phase 2), `6_5_23–25` ★, `6_5_30` · (`_Generic`
+requires type-only walking; deferred to Phase 3 alongside its natural
+neighbors), `6_5_32` · (needs subscript, Phase 2),
+`6_2_4_01` · (proper static-local persistence — Phase 4), `6_2_5_01` ★,
+`6_3_1_1_01` ★, `6_3_2_2_01` ★, `6_3_2_3_01–02` ★, `6_7_4_01` ★,
+`6_9_01–02` ★, `6_9_03` · (file-scope tentative def — Phase 4),
+`6_9_04–05` ★, `6_9_06` · (variadic + `__builtin_va_*` — Phase 9).
+
+---
+
+## Phase 2 — Pointers & arrays ⬜
+
+Pointer/array declarator layers and the address operators. Builds on
+Phase 1's type-name production.
+
+- [ ] Pointer declarator (`int *p`, `int **pp`)
+- [ ] Array declarator (`int a[N]`, `int a[]`)
+- [ ] Subscript `a[i]` (and the commutative `i[a]`)
+- [ ] Pointer arithmetic in `+`/`-` (scaled by element size)
+- [ ] Array-to-pointer decay
+- [ ] Function-to-pointer decay + indirect call (`(*fp)(x)`)
+- [ ] `int *const p` / `const int *p` qualifier placement
+- [ ] `[static N]` parameter form
+- [ ] VLA local (`int a[n]`)
+
+Unlocks: `6_3_2_1_*`, `6_5_28–32`, `6_7_3_03–04`, `6_7_6_01–07`.
+
+---
+
+## Phase 3 — Aggregates (struct / union / enum) ⬜
+
+Tag namespace, member access, anonymous and forward-declared
+aggregates, bitfields, `_Generic`.
+
+- [ ] `struct` / `union` definition and tag-scope lookup
+- [ ] Member access `.` and `->` in postfix
+- [ ] `enum` with constants bound into ordinary scope
+- [ ] Forward-declared tag (`struct S; ... struct S { ... };`)
+- [ ] Self-referential pointers (`struct N { struct N *next; };`)
+- [ ] Anonymous struct/union members (C11 §6.7.2.1)
+- [ ] Bitfield members (`unsigned a:5`)
+- [ ] `_Generic` selection (type-keyed)
+
+Unlocks: `6_2_3_*`, `6_5_28–30`, `6_6_01`, `6_7_06–08`, `6_7_2_1_01–05`.
+
+---
+
+## Phase 4 — Globals, storage, linkage ⬜
+
+File-scope objects with their full initializer / linkage matrix.
+
+- [ ] File-scope object declarations
+- [ ] `static` global (internal linkage, `.data` / `.bss` placement)
+- [ ] `extern` declaration and resolution
+- [ ] Tentative definitions
+- [ ] `const` global in `.rodata`
+- [ ] Global struct / array data emission
+- [ ] `static` local with non-zero init
+
+Unlocks: `6_7_02–04`, `6_9_03`, `6_9_07–09`.
+
+---
+
+## Phase 5 — Statement completeness ⬜
+
+Switch family, goto/labels, do-while, void return, label namespace.
+
+- [ ] `switch` / `case` / `default` (incl. fall-through, default-only)
+- [ ] `goto` forward and backward
+- [ ] User labels (separate namespace from ordinary identifiers)
+- [ ] `do { } while ()`
+- [ ] `return;` from void function
+
+Unlocks: `6_2_3_02`, `6_8_04`, `6_8_07–11`, `6_8_14`.
+
+---
+
+## Phase 6 — Initializers ⬜
+
+Full §6.7.9 surface. Requires aggregates (Phase 3) and globals
+(Phase 4) to be fully useful.
+
+- [ ] Brace initializer for arrays
+- [ ] Brace initializer for structs
+- [ ] Designated initializers (`[i] = ...`, `.field = ...`)
+- [ ] Nested designators (`[i][j] = ...`)
+- [ ] Partial init with zero-fill
+- [ ] String literal init for `char[]`
+- [ ] Compound literals (`(int[]){1, 2}`)
+
+Unlocks: `6_5_29`, `6_7_9_02–10`, `6_9_08–09`.
+
+---
+
+## Phase 7 — Type breadth & conversions ⬜
+
+Every primitive integer + float type round-tripped, plus the §6.3
+conversion matrix.
+
+- [ ] `char`, `signed char`, `unsigned char`
+- [ ] `short`, `unsigned short`
+- [ ] `long`, `long long`, `unsigned long`, `unsigned long long`
+- [ ] `_Bool` with normalize-to-0/1 semantics
+- [ ] `float`, `double`, `long double`
+- [ ] Integer literal suffixes (`U`, `L`, `LL`)
+- [ ] Float literals (decimal + hex)
+- [ ] Usual arithmetic conversions
+- [ ] Integer ↔ float conversions
+
+Unlocks: `6_3_*`, `6_7_2_01–12`.
+
+---
+
+## Phase 8 — Qualifiers, alignment, typedefs ⬜
+
+Remaining declaration-side features.
+
+- [ ] `_Atomic` qualifier (parse + plumb to cg)
+- [ ] `_Alignas(T)` and `_Alignas(N)` on objects
+- [ ] `inline` (header-only definitions)
+- [ ] `typedef` (already partially landed; promote)
+- [ ] Compound typedef targets (struct, function pointer, array)
+- [ ] `_Static_assert` at file and block scope
+
+Unlocks: `6_7_3_05`, `6_7_5_*`, `6_7_8_*`, `6_7_10_*`.
+
+---
+
+## Phase 9 — Builtins ⬜
+
+Routes named `__builtin_*` calls to cg's intrinsic / asm machinery
+rather than ordinary call lowering. Contract: `doc/builtins.md`.
+
+- [ ] `__builtin_alloca`
+- [ ] `__builtin_expect`
+- [ ] `__builtin_va_start` / `va_arg` / `va_end` / `va_copy`
+- [ ] `__builtin_offsetof`
+- [ ] `__atomic_load_n`, `__atomic_fetch_add` (and friends)
+
+Unlocks: `builtin_01–07`, `6_9_06`.
+
+---
+
+## Phase 10 — Diagnostics polish ⬜
+
+Negative cases assert nonzero exit + optional `errpat` substring. Wire
+the explicit diagnostic for each `cases_err/*` row.
+
+- [ ] Identifier resolution / lvalue / type mismatch
+- [ ] Redefinition (object, function, struct tag)
+- [ ] Member / call / arrow / sizeof on incomplete or wrong shape
+- [ ] Storage-class combinations
+- [ ] Bitfield width
+- [ ] `const` violation
+- [ ] `_Static_assert` failure
+- [ ] switch / case / default scope rules
+- [ ] goto / label scope rules
+- [ ] `void` parameter rules
+
+Unlocks: every row under `test/parse/cases_err/`.
+
+---
+
+## Cross-cutting ⬜
+
+Nudged forward as relevant cases exercise them; not their own phase.
+
+- [ ] DWARF Class-1 fanout (`debug_func_begin` / `param` / `local` /
+ `scope_*`) when `Debug` is non-NULL — see `DWARF.md` §3.1.
+- [ ] Multi-TU `cases/<name>/{a.c,b.c}` harness wiring (see CORPUS.md
+ Multi-TU §).
+
+---
+
+## Maintenance
+
+- Tick boxes in the same commit as the parser change that lands them.
+- When a phase finishes, flip its heading marker to ✅ and update the
+ matching `·` rows in `CORPUS.md` to `★`.
+- New corpus rows go in `CORPUS.md`; cross-link here only when they
+ introduce a feature axis the phases don't already cover.
diff --git a/src/arch/aarch64.c b/src/arch/aarch64.c
@@ -777,6 +777,20 @@ static void aa_free_reg(CGTarget* t, Reg r) {
(void)r;
}
+/* Reset the scratch-register cursors. The parser calls this (via
+ * cg_reset_scratch) between statements, when its value stack is known
+ * empty, letting the next statement reuse the entire scratch pool. Safe
+ * only when no register-resident SValue is still live; cg_reset_scratch
+ * guards that precondition by checking cg's sp before forwarding.
+ * Without this, any function whose body contains more than ~10
+ * sequential register-allocating operations exhausts the scratch pool. */
+void aa_reset_scratch(CGTarget* t);
+void aa_reset_scratch(CGTarget* t) {
+ AAImpl* a = impl_of(t);
+ a->used_int = 0;
+ a->used_fp = 0;
+}
+
static FrameSlot aa_frame_slot(CGTarget* t, const FrameSlotDesc* d) {
AAImpl* a = impl_of(t);
if (a->nslots == a->slots_cap) {
diff --git a/src/cg/cg.c b/src/cg/cg.c
@@ -346,6 +346,63 @@ void cg_push_local_typed(CG* g, FrameSlot s, const Type* ty) {
push(g, sv);
}
+/* Pop a pointer rvalue and push an OPK_INDIRECT lvalue for the pointee.
+ * The parser uses this to implement unary `*`. The pointer is materialized
+ * into a register; the resulting lvalue's MemAccess alias root is unknown
+ * (not LOCAL/GLOBAL), which is the right conservative answer for *ptr. */
+static Operand force_reg(CG* g, SValue v, const Type* ty);
+void cg_deref(CG* g, const Type* pointee_ty);
+void cg_deref(CG* g, const Type* pointee_ty) {
+ SValue v = pop(g);
+ const Type* pty = v.type ? v.type : v.op.type;
+ Operand src = force_reg(g, v, pty);
+ Operand ind;
+ SValue sv;
+ memset(&ind, 0, sizeof ind);
+ ind.kind = OPK_INDIRECT;
+ ind.cls = RC_INT;
+ ind.type = pointee_ty;
+ ind.v.ind.base = src.v.reg;
+ ind.v.ind.ofs = 0;
+ sv.op = ind;
+ sv.type = pointee_ty;
+ push(g, sv);
+}
+
+/* Read the type of the value currently on top of the stack without popping.
+ * The parser uses this for type-driven dispatch (e.g. function-call lowering
+ * needs the callee's TY_FUNC) without re-deriving from its own state. */
+const Type* cg_top_type(CG* g);
+const Type* cg_top_type(CG* g) {
+ if (g->sp == 0) return NULL;
+ return g->stack[g->sp - 1].type;
+}
+
+/* Replace the type tag on the top SValue without emitting code. Used by
+ * the parser for casts that are no-ops at the value level (e.g. pointer-
+ * to-pointer of the same width); the underlying register/operand stays
+ * the same, only the C type the parser/backend will read changes. */
+void cg_retag_top(CG* g, const Type* ty);
+void cg_retag_top(CG* g, const Type* ty) {
+ if (g->sp == 0) return;
+ g->stack[g->sp - 1].type = ty;
+ g->stack[g->sp - 1].op.type = ty;
+}
+
+/* Recycle the backend's scratch-register pool. Safe only when nothing on
+ * the value stack holds a register operand (sp == 0 in particular, but
+ * an all-IMM stack is also fine in principle). The parser calls this at
+ * statement boundaries so a function body with many sequential reg-
+ * allocating operations doesn't exhaust the fixed scratch window. */
+extern void aa_reset_scratch(CGTarget*);
+void cg_reset_scratch(CG* g);
+void cg_reset_scratch(CG* g) {
+ if (g->sp != 0) return;
+ /* For now we only know about the aarch64 backend; once a second arch
+ * lands we promote this to a CGTarget vtable entry. */
+ aa_reset_scratch(g->target);
+}
+
void cg_push_global(CG* g, ObjSymId sym, const Type* ty) {
SValue sv;
sv.op = op_global(sym, 0, ty);
diff --git a/src/parse/parse.c b/src/parse/parse.c
@@ -40,6 +40,17 @@
/* Type-aware push for locals — exposed by cg.c, not in cg.h. */
extern void cg_push_local_typed(CG*, FrameSlot, const Type*);
+/* Pop pointer rvalue, push INDIRECT lvalue of given pointee. */
+extern void cg_deref(CG*, const Type* pointee);
+/* Read SValue.type at top of stack without popping. */
+extern const Type* cg_top_type(CG*);
+/* Replace the type tag on the top SValue without emitting code (used for
+ * pointer-to-pointer casts which are no-ops at the value level). */
+extern void cg_retag_top(CG*, const Type*);
+/* Recycle the backend's scratch-register pool when no value-stack entry
+ * holds a live register. Called at statement boundaries to avoid
+ * exhausting the fixed scratch window over the course of a function. */
+extern void cg_reset_scratch(CG*);
/* ============================================================
* Keywords
@@ -158,7 +169,9 @@ typedef struct Parser {
TargetABI* abi;
Pool* pool;
- Tok cur; /* one token of lookahead */
+ Tok cur; /* one token of lookahead */
+ Tok next; /* second slot, populated lazily by peek1() */
+ int has_next;
Sym kw_sym[KW_COUNT];
@@ -188,7 +201,23 @@ static _Noreturn void perr(Parser* p, const char* fmt, ...) {
* Token helpers
* ============================================================ */
-static void advance(Parser* p) { p->cur = pp_next(p->pp); }
+static void advance(Parser* p) {
+ if (p->has_next) {
+ p->cur = p->next;
+ p->has_next = 0;
+ } else {
+ p->cur = pp_next(p->pp);
+ }
+}
+
+/* One-token lookahead beyond p->cur. Lazily populated. */
+static Tok peek1(Parser* p) {
+ if (!p->has_next) {
+ p->next = pp_next(p->pp);
+ p->has_next = 1;
+ }
+ return p->next;
+}
static int is_punct(const Tok* t, u32 punct) {
return t->kind == TOK_PUNCT && t->v.punct == punct;
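
The lazily filled second lookahead slot added in this hunk can be exercised in isolation. The sketch below is a toy under stated assumptions: `CharLexer`, `MiniParser`, and the `mp_*` helpers are hypothetical, and single characters stand in for the preprocessor tokens that `pp_next()` would return. It only shows the invariant that peeking never changes the current token and that the buffered token is consumed by the next advance.

```c
#include <assert.h>

/* Hypothetical token source: each character is one "token". */
typedef struct CharLexer {
  const char* s;
} CharLexer;

static int lex_next(CharLexer* lx) {
  return *lx->s ? (unsigned char)*lx->s++ : 0; /* 0 plays the role of EOF */
}

typedef struct MiniParser {
  CharLexer lx;
  int cur;      /* current token */
  int next;     /* buffered second token, valid only when has_next is set */
  int has_next;
} MiniParser;

static void mp_init(MiniParser* p, const char* text) {
  assert(text);
  p->lx.s = text;
  p->next = 0;
  p->has_next = 0;
  p->cur = lex_next(&p->lx);
}

/* Consume the buffered token first so no input is skipped after a peek. */
static void mp_advance(MiniParser* p) {
  if (p->has_next) {
    p->cur = p->next;
    p->has_next = 0;
  } else {
    p->cur = lex_next(&p->lx);
  }
}

/* Look one token past cur without consuming anything. */
static int mp_peek1(MiniParser* p) {
  if (!p->has_next) {
    p->next = lex_next(&p->lx);
    p->has_next = 1;
  }
  return p->next;
}
```

After `mp_init(&p, "(i)x")`, repeated calls to `mp_peek1(&p)` keep returning `'i'` while `p.cur` stays `'('`, which is the property the cast-versus-parenthesized-expression check relies on.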
@@ -284,6 +313,207 @@ static SymEntry* scope_lookup(Parser* p, Sym name) {
* ============================================================ */
static const Type* ty_int(Parser* p) { return type_prim(p->pool, TY_INT); }
+static const Type* ty_size_t(Parser* p) {
+ return abi_size_type(p->abi, p->pool);
+}
+
+/* DeclSpecs and its parser historically lived in the declaration
+ * section; they are hoisted ahead of expression parsing because
+ * sizeof / _Alignof / cast need to consume a type-name from inside
+ * parse_unary. */
+typedef struct DeclSpecs {
+ const Type* type;
+ DeclStorage storage;
+ u32 flags; /* DeclFlag */
+} DeclSpecs;
+
+/* Resolve the type implied by a multiset of type-specifier tokens
+ * (unsigned, signed, short, long, char, int, ...). C allows most orders
+ * (`unsigned long int` ≡ `int unsigned long`), so we collect everything
+ * first and pick the canonical TY_* tag at the end. Phase 1 covers the
+ * combinations the corpus needs; the float family (`long double`) is
+ * Phase 7's job and falls through to a "conflicting" diagnostic if
+ * combined with the integer keywords here. */
+typedef struct TypeSpecAccum {
+ u8 saw_void;
+ u8 saw_char;
+ u8 saw_int;
+ u8 saw_short;
+ u8 long_count; /* 0/1/2 */
+ u8 saw_signed;
+ u8 saw_unsigned;
+ u8 saw_bool;
+ u8 saw_float;
+ u8 saw_double;
+ u8 saw_explicit_type; /* any of the above? */
+} TypeSpecAccum;
+
+static const Type* resolve_type_specs(Parser* p, const TypeSpecAccum* a,
+ SrcLoc loc) {
+ if (!a->saw_explicit_type) return NULL;
+ if (a->saw_void) {
+ if (a->saw_char || a->saw_int || a->saw_short || a->long_count ||
+ a->saw_signed || a->saw_unsigned || a->saw_bool || a->saw_float ||
+ a->saw_double) {
+ compiler_panic(p->c, loc, "conflicting type specifiers (void mixed)");
+ }
+ return type_void(p->pool);
+ }
+ if (a->saw_bool) {
+ return type_prim(p->pool, TY_BOOL);
+ }
+ if (a->saw_char) {
+ if (a->saw_unsigned) return type_prim(p->pool, TY_UCHAR);
+ if (a->saw_signed) return type_prim(p->pool, TY_SCHAR);
+ return type_prim(p->pool, TY_CHAR);
+ }
+ if (a->saw_float) return type_prim(p->pool, TY_FLOAT);
+ if (a->saw_double) {
+ return type_prim(p->pool, a->long_count ? TY_LDOUBLE : TY_DOUBLE);
+ }
+ if (a->saw_short) {
+ return type_prim(p->pool, a->saw_unsigned ? TY_USHORT : TY_SHORT);
+ }
+ if (a->long_count == 2) {
+ return type_prim(p->pool, a->saw_unsigned ? TY_ULLONG : TY_LLONG);
+ }
+ if (a->long_count == 1) {
+ return type_prim(p->pool, a->saw_unsigned ? TY_ULONG : TY_LONG);
+ }
+ if (a->saw_unsigned) return type_prim(p->pool, TY_UINT);
+ if (a->saw_signed || a->saw_int) return type_prim(p->pool, TY_INT);
+ return type_prim(p->pool, TY_INT);
+}
+
+static int parse_decl_specs(Parser* p, DeclSpecs* out) {
+ /* Tracks integer/void/char type specifiers in any order, plus the
+ * storage-class and qualifier keywords. Returns 0 if no specifier was
+ * consumed (caller treats that as "not a declaration"). */
+ TypeSpecAccum acc;
+ SrcLoc loc;
+ int seen = 0;
+ memset(&acc, 0, sizeof acc);
+ out->type = NULL;
+ out->storage = DS_AUTO;
+ out->flags = DF_NONE;
+ loc = tok_loc(&p->cur);
+ for (;;) {
+ Tok t = p->cur;
+ if (is_kw(p, &t, KW_VOID)) {
+ acc.saw_void = 1; acc.saw_explicit_type = 1; advance(p); seen = 1;
+ } else if (is_kw(p, &t, KW_CHAR)) {
+ acc.saw_char = 1; acc.saw_explicit_type = 1; advance(p); seen = 1;
+ } else if (is_kw(p, &t, KW_INT)) {
+ acc.saw_int = 1; acc.saw_explicit_type = 1; advance(p); seen = 1;
+ } else if (is_kw(p, &t, KW_SHORT)) {
+ acc.saw_short = 1; acc.saw_explicit_type = 1; advance(p); seen = 1;
+ } else if (is_kw(p, &t, KW_LONG)) {
+ acc.long_count++; acc.saw_explicit_type = 1; advance(p); seen = 1;
+ } else if (is_kw(p, &t, KW_SIGNED)) {
+ acc.saw_signed = 1; acc.saw_explicit_type = 1; advance(p); seen = 1;
+ } else if (is_kw(p, &t, KW_UNSIGNED)) {
+ acc.saw_unsigned = 1; acc.saw_explicit_type = 1; advance(p); seen = 1;
+ } else if (is_kw(p, &t, KW_BOOL)) {
+ acc.saw_bool = 1; acc.saw_explicit_type = 1; advance(p); seen = 1;
+ } else if (is_kw(p, &t, KW_FLOAT)) {
+ acc.saw_float = 1; acc.saw_explicit_type = 1; advance(p); seen = 1;
+ } else if (is_kw(p, &t, KW_DOUBLE)) {
+ acc.saw_double = 1; acc.saw_explicit_type = 1; advance(p); seen = 1;
+ } else if (is_kw(p, &t, KW_STATIC)) {
+ out->storage = DS_STATIC; advance(p); seen = 1;
+ } else if (is_kw(p, &t, KW_EXTERN)) {
+ out->storage = DS_EXTERN; advance(p); seen = 1;
+ } else if (is_kw(p, &t, KW_CONST) || is_kw(p, &t, KW_VOLATILE) ||
+ is_kw(p, &t, KW_RESTRICT) || is_kw(p, &t, KW_INLINE) ||
+ is_kw(p, &t, KW_NORETURN) || is_kw(p, &t, KW_REGISTER) ||
+ is_kw(p, &t, KW_AUTO) || is_kw(p, &t, KW_ATOMIC)) {
+ /* Recognized but currently no-op at this slice. */
+ advance(p); seen = 1;
+ } else {
+ break;
+ }
+ }
+ if (seen) {
+ out->type = resolve_type_specs(p, &acc, loc);
+ if (!out->type) {
+ /* Storage class without a type — default to int per pre-C99. */
+ out->type = ty_int(p);
+ }
+ }
+ return seen;
+}
+
+/* True when the current token starts a declaration-specifier sequence: a
+ * type keyword, a storage-class keyword, a qualifier, or a function
+ * specifier. Used at lookahead points (cast vs. paren expr; sizeof's
+ * inner form; for-init declarator vs. expression). The list mirrors
+ * parse_decl_specs's accepted set so the two stay in sync.
+ *
+ * Typedef-names are not yet implemented; when they land, they become
+ * the second branch here and dispatch on scope_lookup().kind ==
+ * SEK_TYPEDEF, just like any other type-name token. */
+static int starts_type_name(const Parser* p, const Tok* t) {
+ if (t->kind != TOK_IDENT) return 0;
+ CKw k = ident_kw(p, t->v.ident);
+ switch (k) {
+ case KW_VOID:
+ case KW_CHAR:
+ case KW_SHORT:
+ case KW_INT:
+ case KW_LONG:
+ case KW_FLOAT:
+ case KW_DOUBLE:
+ case KW_SIGNED:
+ case KW_UNSIGNED:
+ case KW_BOOL:
+ case KW_STRUCT:
+ case KW_UNION:
+ case KW_ENUM:
+ case KW_CONST:
+ case KW_VOLATILE:
+ case KW_RESTRICT:
+ case KW_ATOMIC:
+ case KW_STATIC:
+ case KW_EXTERN:
+ case KW_INLINE:
+ case KW_NORETURN:
+ case KW_REGISTER:
+ case KW_AUTO:
+ case KW_TYPEDEF:
+ return 1;
+ default:
+ return 0;
+ }
+}
+
+/* Walk a `*` chain at the front of a declarator (and optional qualifiers
+ * after each `*`), wrapping `base` in successive pointer types. Returns
+ * the fully wrapped type that the IDENT/declarator-tail then refers to. */
+static const Type* parse_pointer_layer(Parser* p, const Type* base) {
+ while (accept_punct(p, '*')) {
+ base = type_ptr(p->pool, base);
+ /* Optional qualifiers after `*`; recognized and ignored at this slice. */
+ for (;;) {
+ if (accept_kw(p, KW_CONST) || accept_kw(p, KW_VOLATILE) ||
+ accept_kw(p, KW_RESTRICT) || accept_kw(p, KW_ATOMIC)) {
+ continue;
+ }
+ break;
+ }
+ }
+ return base;
+}
+
+/* Type-name (§6.7.7): specifier-qualifier-list (abstract-declarator)?
+ * The abstract declarator at this slice is just a `*` chain — array and
+ * function suffixes land in Phase 2. Used by sizeof / _Alignof / cast. */
+static const Type* parse_type_name(Parser* p) {
+ DeclSpecs specs;
+ if (!parse_decl_specs(p, &specs)) {
+ perr(p, "expected type-name");
+ }
+ return parse_pointer_layer(p, specs.type);
+}
/* ============================================================
* Literal parsing
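
The order-independent specifier resolution in this hunk (collect a multiset first, pick the canonical type at the end) can be sketched on plain strings. Everything here is hypothetical: the `Ty` enum, `resolve_words`, and the space-separated input format are illustrative, only the integer keywords are covered, and the extra conflict checks (duplicate `int`, `signed` with `unsigned`, more than two `long`) are assumptions of this sketch rather than behaviour shown in the diff.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical canonical tags for the integer family only. */
typedef enum Ty {
  T_NONE, T_INT, T_UINT, T_LONG, T_ULONG, T_LLONG, T_ULLONG, T_CONFLICT
} Ty;

/* Resolve a space-separated list of specifier keywords, in any order. */
static Ty resolve_words(const char* text) {
  int saw_int = 0, saw_signed = 0, saw_unsigned = 0, long_count = 0;
  int any = 0;
  const char* p = text;
  assert(text);
  for (;;) {
    size_t n;
    while (*p == ' ') p++;
    n = strcspn(p, " "); /* length of the next word */
    if (n == 0) break;
    if (n == 3 && memcmp(p, "int", 3) == 0) saw_int++;
    else if (n == 4 && memcmp(p, "long", 4) == 0) long_count++;
    else if (n == 6 && memcmp(p, "signed", 6) == 0) saw_signed++;
    else if (n == 8 && memcmp(p, "unsigned", 8) == 0) saw_unsigned++;
    else return T_CONFLICT; /* keyword outside this sketch's subset */
    any = 1;
    p += n;
  }
  if (!any) return T_NONE;
  /* Conflict rules below are assumptions of this sketch. */
  if (saw_int > 1 || long_count > 2 || (saw_signed && saw_unsigned)) {
    return T_CONFLICT;
  }
  /* Pick the canonical tag only after every keyword has been counted. */
  if (long_count == 2) return saw_unsigned ? T_ULLONG : T_LLONG;
  if (long_count == 1) return saw_unsigned ? T_ULONG : T_LONG;
  return saw_unsigned ? T_UINT : T_INT;
}
```

Because the decision is made after counting, `resolve_words("unsigned long int")` and `resolve_words("int unsigned long")` yield the same tag, which is the point of accumulating before resolving.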
@@ -369,6 +599,163 @@ static void to_rvalue(Parser* p) {
(void)p;
}
+/* Decode one character (the first encoded code unit) from the token's
+ * spelling at offset `*pi`, advancing `*pi` past the consumed bytes.
+ * Handles the §6.4.4.4 escape sequences a freestanding compiler is
+ * required to recognize. */
+static i64 decode_one_char(Parser* p, const char* s, size_t len, size_t* pi,
+ SrcLoc loc) {
+ size_t i = *pi;
+ i64 v;
+ int c;
+ if (i >= len) compiler_panic(p->c, loc, "truncated character literal");
+ if (s[i] != '\\') {
+ v = (unsigned char)s[i++];
+ *pi = i;
+ return v;
+ }
+ /* Escape sequence. */
+ i++;
+ if (i >= len) compiler_panic(p->c, loc, "trailing '\\' in literal");
+ c = (unsigned char)s[i++];
+ switch (c) {
+ case 'n': v = '\n'; break;
+ case 't': v = '\t'; break;
+ case 'r': v = '\r'; break;
+ case 'b': v = '\b'; break;
+ case 'f': v = '\f'; break;
+ case 'v': v = '\v'; break;
+ case 'a': v = '\a'; break;
+ case '\\': v = '\\'; break;
+ case '\'': v = '\''; break;
+ case '"': v = '"'; break;
+ case '?': v = '?'; break;
+ case 'x': {
+ i64 hex = 0;
+ int any = 0;
+ while (i < len) {
+ int d = (unsigned char)s[i];
+ int dv;
+ if (d >= '0' && d <= '9') dv = d - '0';
+ else if (d >= 'a' && d <= 'f') dv = d - 'a' + 10;
+ else if (d >= 'A' && d <= 'F') dv = d - 'A' + 10;
+ else break;
+ hex = hex * 16 + dv;
+ any = 1;
+ i++;
+ }
+ if (!any) compiler_panic(p->c, loc, "\\x with no hex digits");
+ v = hex & 0xff;
+ break;
+ }
+ default:
+ if (c >= '0' && c <= '7') {
+ i64 oct = c - '0';
+ int n = 1;
+ while (n < 3 && i < len && s[i] >= '0' && s[i] <= '7') {
+ oct = oct * 8 + (s[i] - '0');
+ i++;
+ n++;
+ }
+ v = oct & 0xff;
+ } else {
+ /* Unknown escape: implementation-defined; keep the literal byte. */
+ v = c;
+ }
+ break;
+ }
+ *pi = i;
+ return v;
+}
+
+static i64 decode_char_literal(Parser* p, const Tok* t) {
+ size_t len = 0;
+ const char* s = pool_str(p->pool, t->spelling, &len);
+ size_t i = 0;
+ i64 v;
+ if (!s) perr(p, "bad char literal");
+ /* Skip optional encoding prefix (`L`, `u`, `U`, `u8`). The flag bits
+ * tell us which one without re-parsing. */
+ if (t->flags & TF_STR_U8) i = 2;
+ else if (t->flags & (TF_STR_WIDE | TF_STR_U16 | TF_STR_U32)) i = 1;
+ if (i >= len || s[i] != '\'') perr(p, "malformed character literal");
+ i++; /* opening quote */
+ if (i >= len || s[i] == '\'') perr(p, "empty character literal");
+ v = decode_one_char(p, s, len, &i, t->loc);
+ /* Multi-character constants are valid C with an implementation-defined
+ * value; the spine corpus only uses single-char constants. Conservatively
+ * reject extra source bytes before the closing quote. */
+ if (i >= len || s[i] != '\'') {
+ perr(p, "multi-character constants are not supported");
+ }
+ return v;
+}
+
+/* Decode the content of a string-literal token (without the surrounding
+ * quotes / encoding prefix) into raw bytes. Returns a heap-allocated
+ * buffer of length `*nlen_out`; caller frees through the same heap. */
+static u8* decode_string_literal(Parser* p, const Tok* t, size_t* nlen_out) {
+ size_t len = 0;
+ const char* s = pool_str(p->pool, t->spelling, &len);
+ size_t i = 0;
+ Heap* h = p->c->env->heap;
+ u8* buf;
+ size_t k = 0;
+ if (!s) perr(p, "bad string literal");
+ if (t->flags & TF_STR_U8) i = 2;
+ else if (t->flags & (TF_STR_WIDE | TF_STR_U16 | TF_STR_U32)) i = 1;
+ if (i >= len || s[i] != '"') perr(p, "malformed string literal");
+ i++;
+ /* Conservative buffer: at most one byte per source byte, plus NUL. */
+ buf = (u8*)h->alloc(h, len + 1, 1);
+ if (!buf) perr(p, "out of memory in string literal");
+ while (i < len && s[i] != '"') {
+ i64 ch = decode_one_char(p, s, len, &i, t->loc);
+ buf[k++] = (u8)ch;
+ }
+ buf[k++] = 0; /* NUL terminator */
+ *nlen_out = k;
+ return buf;
+}
+
+/* Place decoded string bytes in .rodata and return an ObjSymId pointing at
+ * them. Used by string literals in primary. */
+static ObjSymId emit_string_to_rodata(Parser* p, const u8* bytes, size_t n) {
+ ObjBuilder* ob = decl_obj(p->decls);
+ Sym secname = pool_intern_cstr(p->pool, ".rodata");
+ ObjSecId sec = obj_section(ob, secname, SEC_RODATA, SF_ALLOC, 1u);
+ u32 base = obj_pos(ob, sec);
+ Sym lname;
+ ObjSymId sym;
+ char namebuf[32];
+ static u32 counter;
+ /* Anonymous local symbol; the name is just for readability in objdump. */
+ int wlen = 0;
+ u32 id = ++counter;
+ /* Tiny formatter — avoids stdio dependencies in the parser. */
+ namebuf[wlen++] = '.';
+ namebuf[wlen++] = 'L';
+ namebuf[wlen++] = 'C';
+ {
+ char digits[12];
+ int dn = 0;
+ if (id == 0) digits[dn++] = '0';
+ while (id) {
+ digits[dn++] = (char)('0' + (id % 10));
+ id /= 10;
+ }
+ while (dn) namebuf[wlen++] = digits[--dn];
+ }
+ namebuf[wlen] = 0;
+ lname = pool_intern(p->pool, namebuf, (size_t)wlen);
+ sym = obj_symbol(ob, lname, SB_LOCAL, SK_OBJ, sec, base, n);
+ {
+ u8* dst = obj_reserve(ob, sec, n);
+ if (dst) memcpy(dst, bytes, n);
+ }
+ return sym;
+}
+
static void parse_primary(Parser* p) {
Tok t = p->cur;
if (t.kind == TOK_NUM) {
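
The escape handling added in this hunk can be tried outside the parser with a reduced sketch. The names `hex_val`, `simple_escape`, and `decode_escapes` are hypothetical, the input is the text between the quotes rather than a token spelling, and two behaviours are deliberately softened as assumptions of the sketch: a trailing backslash is copied through and `\x` with no digits yields 0, where the code in this diff reports an error instead.

```c
#include <assert.h>
#include <stddef.h>

static int hex_val(int c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Single-letter escapes; anything else maps to itself, which covers
 * backslash, quotes, '?', and unknown escapes. */
static int simple_escape(int c) {
  switch (c) {
    case 'n': return '\n';
    case 't': return '\t';
    case 'r': return '\r';
    case 'b': return '\b';
    case 'f': return '\f';
    case 'v': return '\v';
    case 'a': return '\a';
    default: return c;
  }
}

/* Decode src[0..len) into out and return the byte count. The output is
 * never longer than the input, so `out` needs room for `len` bytes. */
static size_t decode_escapes(const char* src, size_t len,
                             unsigned char* out) {
  size_t i = 0, k = 0;
  assert(src && out);
  while (i < len) {
    int c = (unsigned char)src[i++];
    if (c != '\\' || i == len) { /* plain byte, or a trailing backslash */
      out[k++] = (unsigned char)c;
      continue;
    }
    c = (unsigned char)src[i++];
    if (c == 'x') { /* hex: greedy, keeps the low 8 bits */
      unsigned v = 0;
      while (i < len && hex_val((unsigned char)src[i]) >= 0) {
        v = (v * 16 + (unsigned)hex_val((unsigned char)src[i])) & 0xffu;
        i++;
      }
      out[k++] = (unsigned char)v;
    } else if (c >= '0' && c <= '7') { /* octal: at most three digits */
      unsigned v = (unsigned)(c - '0');
      int n = 1;
      while (n < 3 && i < len && src[i] >= '0' && src[i] <= '7') {
        v = v * 8 + (unsigned)(src[i] - '0');
        i++;
        n++;
      }
      out[k++] = (unsigned char)(v & 0xffu);
    } else {
      out[k++] = (unsigned char)simple_escape(c);
    }
  }
  return k;
}
```

Note that the hex form is greedy, so a literal such as `\x41B` decodes as one value truncated to a byte rather than as `A` followed by `B`; the octal form stops after three digits.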
@@ -414,15 +801,37 @@ static void parse_primary(Parser* p) {
}
}
if (t.kind == TOK_CHR) {
- /* Minimal char-literal: take the first decoded byte from the lit table.
- * Spine doesn't use char literals, so this is best-effort. */
- const LitInfo* li = pp_lit(p->pp, t.lit);
- i64 v = 0;
- (void)li;
+ i64 v = decode_char_literal(p, &t);
advance(p);
cg_push_int(p->cg, v, ty_int(p));
return;
}
+ if (t.kind == TOK_STR) {
+ /* Decoded bytes go into a fresh anonymous .rodata symbol; the value
+ * of the expression is a pointer to char[] decayed to char*. */
+ size_t n = 0;
+ u8* bytes = decode_string_literal(p, &t, &n);
+ ObjSymId sym = emit_string_to_rodata(p, bytes, n);
+ p->c->env->heap->free(p->c->env->heap, bytes, 0);
+ advance(p);
+ {
+ const Type* char_ty = type_prim(p->pool, TY_CHAR);
+ const Type* arr_ty = type_array(p->pool, char_ty, (u32)n, 0);
+ const Type* ptr_ty = type_ptr(p->pool, char_ty);
+ /* Array-to-pointer decay would normally happen at use; here
+ * cg_push_global is given a pointer-typed lvalue so subsequent
+ * operations treat it as a `char*` rvalue once loaded. */
+ (void)arr_ty;
+ cg_push_global(p->cg, sym, ptr_ty);
+ /* cg_push_global pushes a GLOBAL lvalue, but the value of a string
+ * literal is the address of its bytes, i.e. an rvalue of `char*`.
+ * cg_addr converts the lvalue into that address. */
+ cg_addr(p->cg);
+ }
+ return;
+ }
perr(p, "expected expression");
}
@@ -440,16 +849,97 @@ static void parse_postfix(Parser* p) {
cg_inc_dec(p->cg, BO_ISUB, /*post=*/1);
continue;
}
- if (is_punct(&t, '(') || is_punct(&t, '[') || is_punct(&t, '.') ||
- is_punct(&t, P_ARROW)) {
- perr(p, "call/subscript/member access not supported in v1 slice");
+ if (is_punct(&t, '(')) {
+ /* Function call. The callee was pushed by parse_primary as an
+ * lvalue (OPK_GLOBAL for SEK_FUNC); cg_call accepts that directly
+ * for direct calls. */
+ const Type* fn_type = cg_top_type(p->cg);
+ if (!fn_type || fn_type->kind != TY_FUNC) {
+ perr(p, "called object is not a function");
+ }
+ advance(p); /* '(' */
+ u32 nargs = 0;
+ if (!is_punct(&p->cur, ')')) {
+ for (;;) {
+ parse_assign_expr(p);
+ to_rvalue(p);
+ ++nargs;
+ if (!accept_punct(p, ',')) break;
+ }
+ }
+ expect_punct(p, ')', "')' after argument list");
+ if (fn_type->fn.nparams != nargs && !fn_type->fn.variadic) {
+ perr(p, "wrong number of arguments");
+ }
+ if (fn_type->fn.variadic && nargs < fn_type->fn.nparams) {
+ perr(p, "too few arguments to variadic function");
+ }
+ cg_call(p->cg, nargs, fn_type);
+ /* cg_call leaves nothing on the stack for void-returning functions.
+ * Higher-level expression machinery (drop in stmt context, dispatch
+ * inside ternary, etc.) expects a top SValue, so push a sentinel
+ * int 0. Using the value of a void-returning call is invalid C; the
+ * sentinel just keeps stack discipline so the parser doesn't
+ * underflow on `f();` style statements. */
+ if (fn_type->fn.ret && fn_type->fn.ret->kind == TY_VOID) {
+ cg_push_int(p->cg, 0, ty_int(p));
+ }
+ continue;
+ }
+ if (is_punct(&t, '[') || is_punct(&t, '.') || is_punct(&t, P_ARROW)) {
+ perr(p, "subscript/member access not supported in v1 slice");
}
break;
}
}
+/* sizeof / _Alignof and cast all parse a type-name from inside parentheses;
+ * detection at `(` requires looking past the opening paren. The work is the
+ * same: dispatch on what comes next. */
static void parse_unary(Parser* p) {
Tok t = p->cur;
+ /* Cast expression `(type-name) cast`. Disambiguated against `(expr)`
+ * by checking the token immediately after `(`. */
+ if (is_punct(&t, '(')) {
+ Tok n = peek1(p);
+ if (starts_type_name(p, &n)) {
+ const Type* dst;
+ const Type* src;
+ advance(p); /* '(' */
+ dst = parse_type_name(p);
+ expect_punct(p, ')', "')' after type-name");
+ parse_unary(p); /* cast-expression */
+ to_rvalue(p);
+ /* `(void) expr` is the C idiom for "discard the value": there is no
+ * value to materialize, so drop the rvalue instead of converting it.
+ * The corpus relies on this for `(void)42;` style stmts. */
+ if (dst && dst->kind == TY_VOID) {
+ cg_drop(p->cg);
+ /* Callers (parse_stmt's expression-stmt path, to_rvalue at a higher
+ * level) still expect a top SValue, so synthesize a sentinel int 0
+ * to keep value-stack discipline. The expression `(void)e` cannot
+ * appear where a value is required, so this is dead-but-harmless. */
+ cg_push_int(p->cg, 0, ty_int(p));
+ return;
+ }
+ src = cg_top_type(p->cg);
+ /* Pointer-to-pointer cast is a no-op at the value level once the
+ * pointer is already in a register. Skip cg_convert (which would
+ * dispatch to the backend's same-class bitcast, not implemented for
+ * register-resident pointers). Update the SValue's type so a later
+ * dereference picks the right pointee; cg_retag_top swaps the type
+ * on the top SValue instead of re-pushing it. */
+ if (src && src->kind == TY_PTR && dst->kind == TY_PTR) {
+ cg_retag_top(p->cg, dst);
+ return;
+ }
+ cg_convert(p->cg, dst);
+ return;
+ }
+ /* fall through to parse_postfix → parse_primary which handles `(expr)`. */
+ }
if (is_punct(&t, '+')) {
advance(p);
parse_unary(p);
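Review note on the cast hunk above: the value-level cases it separates (discard through `(void)`, a pointer-to-pointer retag, and a converting cast) can be seen from the C side in a short sketch. The function names are hypothetical and exist only for illustration; the bodies follow corpus rows `6_3_2_2_01`, `6_3_2_3_02`, and `6_5_23`. The last function shows the other side of the disambiguation: a parenthesized identifier that names an object, not a type.

```c
#include <assert.h>

/* `(void)e` discards the value; the statement has no result to use. */
int void_cast_discard(void) {
    (void)42;
    return 42;
}

/* int* -> void* -> int*: the pointer value is unchanged, only the static
 * type (and so the pointee used by `*p`) changes. */
int void_ptr_roundtrip(void) {
    int x = 42;
    void *v = &x;
    int *p = (int *)v;
    return *p;
}

/* A converting cast: -1 converted to unsigned char wraps to UCHAR_MAX,
 * which is 255 on platforms with 8-bit char. */
int narrowing_cast(void) {
    return (int)(unsigned char)(-1);
}

/* `(x)` is a parenthesized expression, not a cast, because `x` names an
 * object rather than a type. */
int paren_expr(void) {
    int x = 41;
    return (x) + 1;
}
```

Each function returns 42 except `narrowing_cast`, which returns 255 under the 8-bit char assumption stated in its comment.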
@@ -479,6 +969,32 @@ static void parse_unary(Parser* p) {
cg_unop(p->cg, UO_BNOT);
return;
}
+ if (is_punct(&t, '&')) {
+ advance(p);
+ parse_unary(p);
+ /* The operand is required to be an lvalue; cg_addr panics otherwise. */
+ cg_addr(p->cg);
+ return;
+ }
+ if (is_punct(&t, '*')) {
+ /* Dereference: parse the operand, force to a pointer rvalue, then
+ * derive the INDIRECT lvalue. The pointee type drives the next access. */
+ const Type* pty;
+ const Type* pointee;
+ advance(p);
+ parse_unary(p);
+ to_rvalue(p);
+ pty = cg_top_type(p->cg);
+ if (!pty || pty->kind != TY_PTR) {
+ perr(p, "indirection requires pointer operand");
+ }
+ pointee = pty->ptr.pointee;
+ if (pointee && pointee->kind == TY_VOID) {
+ perr(p, "dereferencing pointer to incomplete type");
+ }
+ cg_deref(p->cg, pointee);
+ return;
+ }
if (is_punct(&t, P_INC) || is_punct(&t, P_DEC)) {
BinOp bop = is_punct(&t, P_INC) ? BO_IADD : BO_ISUB;
advance(p);
@@ -486,6 +1002,51 @@ static void parse_unary(Parser* p) {
cg_inc_dec(p->cg, bop, /*post=*/0);
return;
}
+ if (is_kw(p, &t, KW_SIZEOF)) {
+ /* sizeof has two forms: `sizeof ( type-name )` and `sizeof unary`.
+ * The expression form must NOT evaluate its operand (per §6.5.3.4),
+ * which is awkward in single-pass codegen. The Phase 1 corpus only
+ * needs `sizeof(type-name)` and `sizeof(IDENT)` where IDENT is a
+ * declared object — both reducible to a type lookup with no
+ * emission. Other expression forms are diagnosed. */
+ const Type* ty = NULL;
+ advance(p);
+ if (is_punct(&p->cur, '(')) {
+ Tok n = peek1(p);
+ if (starts_type_name(p, &n)) {
+ advance(p);
+ ty = parse_type_name(p);
+ expect_punct(p, ')', "')'");
+ } else if (n.kind == TOK_IDENT && ident_kw(p, n.v.ident) == KW_NONE) {
+ /* `sizeof(IDENT)` where IDENT is an object — look up its type. */
+ SymEntry* e;
+ advance(p); /* '(' */
+ e = scope_lookup(p, p->cur.v.ident);
+ if (!e) {
+ compiler_panic(p->c, p->cur.loc, "undeclared identifier");
+ }
+ ty = e->type;
+ advance(p); /* IDENT */
+ expect_punct(p, ')', "')'");
+ } else {
+ perr(p, "sizeof of expression not supported in v1 slice");
+ }
+ } else {
+ perr(p, "sizeof expr (without parens) not supported in v1 slice");
+ }
+ cg_push_int(p->cg, (i64)abi_sizeof(p->abi, ty), ty_size_t(p));
+ return;
+ }
+ if (is_kw(p, &t, KW_ALIGNOF)) {
+ /* _Alignof takes a parenthesized type-name only (per the §6.5.3 grammar). */
+ const Type* ty;
+ advance(p);
+ expect_punct(p, '(', "'('");
+ ty = parse_type_name(p);
+ expect_punct(p, ')', "')'");
+ cg_push_int(p->cg, (i64)abi_alignof(p->abi, ty), ty_size_t(p));
+ return;
+ }
parse_postfix(p);
/* postfix may have left an lvalue or rvalue. Higher-level callers
* issue to_rvalue when they need the value. */
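Review note on the `sizeof` / `_Alignof` hunk above: the rule that `sizeof` must not evaluate a non-VLA operand is easy to check from the C side. The sketch below is illustrative only; `bump`, `bump_calls`, and the two wrapper functions are hypothetical names, not part of the parser.

```c
#include <assert.h>
#include <stddef.h>

int bump_calls;  /* counts how often bump() actually runs */

int bump(int *p) {
    ++bump_calls;
    return ++*p;
}

/* sizeof on a non-VLA operand is a pure type query: bump() is never
 * called, so `i` keeps its initial value. */
int sizeof_skips_operand(void) {
    int i = 41;
    size_t n = sizeof(bump(&i));
    return (n == sizeof(int)) ? i : -1;
}

/* _Alignof takes a parenthesized type-name and yields a size_t; every
 * valid alignment is a power of two. */
int alignof_is_power_of_two(void) {
    size_t a = _Alignof(double);
    return a != 0 && (a & (a - 1)) == 0;
}
```

This is why reducing `sizeof(type-name)` and `sizeof(IDENT)` to a type lookup with no emission is sound for the forms the Phase 1 corpus needs.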
@@ -634,12 +1195,116 @@ static void parse_bor(Parser* p) {
}
}
-/* Logical && / || / ?: are short-circuiting and need labels. The spine
- * doesn't need them yet (the relevant corpus rows are the §6.5_1[2,3,4]
- * group); they slot in here when those rows graduate. */
+/* Logical && / || are short-circuiting: the right operand is evaluated
+ * only when the left does not already determine the result. We lower
+ * each as a label-driven branch sequence that materializes a 0/1 i32
+ * result. Both produce an int rvalue regardless of operand types
+ * (per §6.5.13/14).
+ *
+ * a && b lowers to: a || b lowers to:
+ * <a>; jz Lfalse <a>; jnz Ltrue
+ * <b>; jz Lfalse <b>; jnz Ltrue
+ * push 1; jmp Lend push 0; jmp Lend
+ * Lfalse: push 0 Ltrue: push 1
+ * Lend: Lend:
+ */
+static void parse_land(Parser* p) {
+ parse_bor(p);
+ while (is_punct(&p->cur, P_AND)) {
+ CGLabel L_false = cg_label_new(p->cg);
+ CGLabel L_end = cg_label_new(p->cg);
+ advance(p);
+ to_rvalue(p);
+ cg_branch_false(p->cg, L_false);
+ parse_bor(p);
+ to_rvalue(p);
+ cg_branch_false(p->cg, L_false);
+ cg_push_int(p->cg, 1, ty_int(p));
+ cg_jump(p->cg, L_end);
+ cg_label_place(p->cg, L_false);
+ cg_push_int(p->cg, 0, ty_int(p));
+ cg_label_place(p->cg, L_end);
+ }
+}
+
+static void parse_lor(Parser* p) {
+ parse_land(p);
+ while (is_punct(&p->cur, P_OR)) {
+ CGLabel L_true = cg_label_new(p->cg);
+ CGLabel L_end = cg_label_new(p->cg);
+ advance(p);
+ to_rvalue(p);
+ cg_branch_true(p->cg, L_true);
+ parse_land(p);
+ to_rvalue(p);
+ cg_branch_true(p->cg, L_true);
+ cg_push_int(p->cg, 0, ty_int(p));
+ cg_jump(p->cg, L_end);
+ cg_label_place(p->cg, L_true);
+ cg_push_int(p->cg, 1, ty_int(p));
+ cg_label_place(p->cg, L_end);
+ }
+}
+
+/* Ternary `c ? t : f`. The cg value stack is linear-flow only, so a naive
+ * "push from each arm" leaves the stack in an inconsistent state at the
+ * merge point. We materialize the result through a fresh local: each arm
+ * stores into the same slot, the merge label reloads. v1 picks the slot's
+ * type from the then-arm and assumes the else-arm is the same type
+ * (matches the §6.5.15 corpus rows; full usual-conversions rules slot in
+ * with Phase 7).
+ *
+ * By contrast, `&&` / `||` produce a 0/1 int and we lower them with
+ * explicit push/jump per branch; since the result is a fresh constant in
+ * each arm, no temp slot is needed. The ternary case is special because
+ * the two arms can be arbitrary expressions whose computed values must
+ * appear on the same physical register/slot at the merge. */
+static void parse_ternary(Parser* p) {
+ parse_lor(p);
+ if (!is_punct(&p->cur, '?')) return;
+ CGLabel L_else = cg_label_new(p->cg);
+ CGLabel L_end = cg_label_new(p->cg);
+ const Type* result_ty = ty_int(p);
+ FrameSlot tmp;
+ FrameSlotDesc fsd;
+ /* Pop the cond, branch on it. */
+ advance(p); /* '?' */
+ to_rvalue(p);
+ cg_branch_false(p->cg, L_else);
+ parse_assign_expr(p);
+ to_rvalue(p);
+ /* Update result_ty from the then-arm (a closer approximation than int). */
+ result_ty = cg_top_type(p->cg);
+ if (!result_ty) result_ty = ty_int(p);
+ memset(&fsd, 0, sizeof fsd);
+ fsd.type = result_ty;
+ fsd.size = abi_sizeof(p->abi, result_ty);
+ fsd.align = abi_alignof(p->abi, result_ty);
+ fsd.kind = FS_LOCAL;
+ fsd.flags = FSF_NONE;
+ tmp = cg_local(p->cg, &fsd);
+ /* Store then-arm value into tmp. cg_store needs [lv, rv]; the rvalue
+ * is already on top, so push the lvalue and swap. */
+ cg_push_local_typed(p->cg, tmp, result_ty);
+ cg_swap(p->cg);
+ cg_store(p->cg);
+ cg_drop(p->cg); /* cg_store leaves the rvalue; drop in stmt-style usage */
+ cg_jump(p->cg, L_end);
+ cg_label_place(p->cg, L_else);
+ expect_punct(p, ':', "':' in ternary");
+ parse_assign_expr(p);
+ to_rvalue(p);
+ cg_push_local_typed(p->cg, tmp, result_ty);
+ cg_swap(p->cg);
+ cg_store(p->cg);
+ cg_drop(p->cg);
+ cg_label_place(p->cg, L_end);
+ /* At the merge, push the slot lvalue; callers can to_rvalue if needed. */
+ cg_push_local_typed(p->cg, tmp, result_ty);
+}
static void parse_assign_expr(Parser* p) {
- parse_bor(p);
+ parse_ternary(p);
/* The LHS is now on the CG stack. If it's an lvalue we may consume it
* for assignment; otherwise we keep the rvalue as the final result. */
Tok t = p->cur;
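Review note on the short-circuit and ternary hunk above: the branch sequences in the lowering comment can be written out directly in C with `goto`, which makes the skipped evaluation observable. This is a sketch of the control-flow shape only, not of the cg API; `arm`, `arm_calls`, and the `*_lowered` names are hypothetical.

```c
#include <assert.h>

int arm_calls;  /* counts evaluations of a guarded operand or arm */

int arm(int v) {
    ++arm_calls;
    return v;
}

/* a && arm(b), as: <a>; jz Lfalse; <b>; jz Lfalse; push 1; jmp Lend */
int land_lowered(int a, int b) {
    int result;
    if (!a) goto L_false;
    if (!arm(b)) goto L_false;
    result = 1;
    goto L_end;
L_false:
    result = 0;
L_end:
    return result;
}

/* a || arm(b), as: <a>; jnz Ltrue; <b>; jnz Ltrue; push 0; jmp Lend */
int lor_lowered(int a, int b) {
    int result;
    if (a) goto L_true;
    if (arm(b)) goto L_true;
    result = 0;
    goto L_end;
L_true:
    result = 1;
L_end:
    return result;
}

/* c ? arm(t) : arm(f): each arm stores into the same temp, and the
 * merge point reads it back. Only one arm runs. */
int ternary_lowered(int c, int t, int f) {
    int tmp;
    if (!c) goto L_else;
    tmp = arm(t);
    goto L_end;
L_else:
    tmp = arm(f);
L_end:
    return tmp;
}
```

The `&&` / `||` versions assign a constant on each path, while the ternary needs `tmp` because its arms compute arbitrary values; that is the same distinction the comment draws between pushing a fresh constant and going through a frame slot.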
@@ -710,61 +1375,11 @@ static void parse_expr(Parser* p) {
}
/* ============================================================
- * Declarations (slice: `int` / `void` only, no struct/union/enum/typedef)
- * ============================================================ */
-
-typedef struct DeclSpecs {
- const Type* type;
- DeclStorage storage;
- u32 flags; /* DeclFlag */
-} DeclSpecs;
-
-static int parse_decl_specs(Parser* p, DeclSpecs* out) {
- /* v1: tracks `int`, `void`, `static`, `extern`, plus a couple of common
- * qualifiers that are ignored at this slice. Returns 0 if no specifier
- * was consumed (caller treats that as "not a declaration"). */
- int seen = 0;
- out->type = NULL;
- out->storage = DS_AUTO;
- out->flags = DF_NONE;
- for (;;) {
- Tok t = p->cur;
- if (is_kw(p, &t, KW_INT)) {
- if (out->type) perr(p, "conflicting type specifiers");
- out->type = type_prim(p->pool, TY_INT);
- advance(p);
- seen = 1;
- } else if (is_kw(p, &t, KW_VOID)) {
- if (out->type) perr(p, "conflicting type specifiers");
- out->type = type_void(p->pool);
- advance(p);
- seen = 1;
- } else if (is_kw(p, &t, KW_STATIC)) {
- out->storage = DS_STATIC;
- advance(p);
- seen = 1;
- } else if (is_kw(p, &t, KW_EXTERN)) {
- out->storage = DS_EXTERN;
- advance(p);
- seen = 1;
- } else if (is_kw(p, &t, KW_CONST) || is_kw(p, &t, KW_VOLATILE) ||
- is_kw(p, &t, KW_RESTRICT) || is_kw(p, &t, KW_INLINE) ||
- is_kw(p, &t, KW_NORETURN) || is_kw(p, &t, KW_REGISTER) ||
- is_kw(p, &t, KW_AUTO)) {
- /* Recognized but currently no-op at this slice. */
- advance(p);
- seen = 1;
- } else {
- break;
- }
- }
- if (seen && !out->type) {
- /* `static x;` without type — default to int per pre-C99, but this is
- * a hard error in C99/C11. Still tolerate at the scaffold level. */
- out->type = ty_int(p);
- }
- return seen;
-}
+ * Declarations (slice: `int` / `void` / `char` only)
+ * ============================================================
+ * DeclSpecs and parse_decl_specs are defined above (hoisted before the
+ * expression parsing section). What follows here is the declarator-and-
+ * initializer machinery built on top of them. */
/* Forward decl for parse_compound_stmt (mutually recursive with statement
* dispatch). */
@@ -791,27 +1406,35 @@ static FrameSlot make_local(Parser* p, Sym name, const Type* type, SrcLoc loc) {
return s;
}
-/* Parse a single init-declarator after the decl-specs have been consumed.
- * Spine grammar: declarator = IDENT ; init = `=` assign_expr.
- * Pointer/array/function declarators are TODO — those slot in here as
- * additional layers around the IDENT. */
-static void parse_init_declarator(Parser* p, const DeclSpecs* specs) {
- SrcLoc loc;
- Tok name_tok;
- Sym name;
+/* Parse a non-abstract declarator: optional `*` pointer prefix followed
+ * by an IDENT. v1 doesn't yet implement function or array declarators,
+ * which slot in around the IDENT in subsequent phases. Returns the
+ * declared type (with pointer layers wrapping `base`) and writes the
+ * IDENT to *name_out / *loc_out. */
+static const Type* parse_declarator(Parser* p, const Type* base, Sym* name_out,
+ SrcLoc* loc_out) {
+ base = parse_pointer_layer(p, base);
if (p->cur.kind != TOK_IDENT || ident_kw(p, p->cur.v.ident) != KW_NONE) {
perr(p, "expected declarator name");
}
- name_tok = p->cur;
- loc = tok_loc(&name_tok);
- name = name_tok.v.ident;
+ *name_out = p->cur.v.ident;
+ *loc_out = tok_loc(&p->cur);
advance(p);
+ return base;
+}
+
+/* Parse a single init-declarator after the decl-specs have been consumed.
+ * v1 grammar: declarator = `*`* IDENT ; init = `=` assign_expr. */
+static void parse_init_declarator(Parser* p, const DeclSpecs* specs) {
+ SrcLoc loc;
+ Sym name;
+ const Type* var_ty = parse_declarator(p, specs->type, &name, &loc);
/* Local declaration only at this slice. */
{
- FrameSlot s = make_local(p, name, specs->type, loc);
+ FrameSlot s = make_local(p, name, var_ty, loc);
if (accept_punct(p, '=')) {
cg_set_loc(p->cg, loc);
- cg_push_local_typed(p->cg, s, specs->type);
+ cg_push_local_typed(p->cg, s, var_ty);
parse_assign_expr(p);
to_rvalue(p);
cg_store(p->cg);
@@ -984,6 +1607,10 @@ static void parse_compound_stmt(Parser* p) {
}
static void parse_stmt(Parser* p) {
+ /* Each statement starts from an empty value stack; recycle scratch
+ * registers so a function body with many sequential reg-allocating
+ * operations isn't bounded by the backend's fixed scratch window. */
+ cg_reset_scratch(p->cg);
cg_set_loc(p->cg, tok_loc(&p->cur));
if (is_punct(&p->cur, '{')) {
parse_compound_stmt(p);
@@ -1033,54 +1660,125 @@ static void parse_stmt(Parser* p) {
* External (top-level) declarations
* ============================================================ */
-/* For the spine, the only function shape is `int test_main(void) { ... }`.
- * We accept `<type> IDENT (` `void` `)` `{` ... `}` and reject anything
- * fancier. The full §6.7.6 declarator surface (parameters, varargs,
- * pointer/array returns) lands as the corresponding corpus rows do. */
-static void parse_function_definition(Parser* p, const DeclSpecs* specs,
- Sym fname, SrcLoc fname_loc) {
- const Type** ptypes = NULL;
- u16 nparams = 0;
- const Type* fn_ty;
- const ABIFuncInfo* abi;
- Decl decl_in;
- DeclId did;
- ObjSymId fsym;
- CGFuncDesc fd;
+/* Helper: holds one parsed parameter's name + type (for binding into the
+ * function-body scope after cg_func_begin / cg_param). */
+typedef struct ParamInfo {
+ Sym name;
+ const Type* type;
+ SrcLoc loc;
+} ParamInfo;
- /* Param list: `void` or empty (and `)`); full list is TODO. */
- expect_punct(p, '(', "'('");
- if (accept_kw(p, KW_VOID)) {
- /* `(void)`: zero params, not variadic. */
- } else if (!is_punct(&p->cur, ')')) {
- perr(p, "only `(void)` parameter list is supported in v1 slice");
+/* Parse a parameter-type-list. Returns the ParamInfo array and its count
+ * via out-pointers; `*variadic_out` is set if the list ends in `, ...`.
+ *
+ * Forms accepted:
+ * `(void)` — zero named params
+ * `()` — old-style "unspecified args"; treated as zero
+ * `(T1, T2, ...)` — named or abstract params, possibly trailing ellipsis
+ *
+ * For each named param we record name+type so the function-body parser can
+ * later bind them into the param scope. Abstract (no-name) params are
+ * allowed for prototype-only declarations. */
+static void parse_param_list(Parser* p, ParamInfo** infos_out, u16* nparams_out,
+ u8* variadic_out) {
+ ParamInfo* infos;
+ u32 cap = 4;
+ u32 n = 0;
+ *variadic_out = 0;
+ *infos_out = NULL;
+ *nparams_out = 0;
+
+ if (is_punct(&p->cur, ')')) {
+ return; /* `()` — no params recorded */
}
- expect_punct(p, ')', "')'");
+ if (is_kw(p, &p->cur, KW_VOID)) {
+ Tok next = peek1(p); /* not `n`: that would shadow the param count */
+ if (is_punct(&next, ')')) {
+ advance(p); /* `void` */
+ return; /* `(void)` */
+ }
+ }
+
+ infos = (ParamInfo*)arena_array(p->c->tu, ParamInfo, cap);
+ for (;;) {
+ DeclSpecs specs;
+ Sym pname = 0;
+ SrcLoc ploc = {0, 0, 0};
+ const Type* pty;
+ if (accept_punct(p, P_ELLIPSIS)) {
+ *variadic_out = 1;
+ break;
+ }
+ if (!parse_decl_specs(p, &specs)) {
+ perr(p, "expected parameter type");
+ }
+ /* Allow either named (`int x`) or abstract (`int`) declarators. We
+ * consume the pointer prefix; if an IDENT follows, the param is named. */
+ pty = parse_pointer_layer(p, specs.type);
+ if (p->cur.kind == TOK_IDENT && ident_kw(p, p->cur.v.ident) == KW_NONE) {
+ pname = p->cur.v.ident;
+ ploc = tok_loc(&p->cur);
+ advance(p);
+ }
+ if (n == cap) {
+ cap *= 2;
+ ParamInfo* nbuf = (ParamInfo*)arena_array(p->c->tu, ParamInfo, cap);
+ memcpy(nbuf, infos, sizeof(ParamInfo) * n);
+ infos = nbuf;
+ }
+ infos[n].name = pname;
+ infos[n].type = pty;
+ infos[n].loc = ploc;
+ ++n;
+ if (!accept_punct(p, ',')) break;
+ }
+ *infos_out = infos;
+ *nparams_out = (u16)n;
+}
- fn_ty = type_func(p->pool, specs->type, ptypes, nparams, 0);
- abi = abi_func_info(p->abi, fn_ty);
-
- memset(&decl_in, 0, sizeof decl_in);
- decl_in.name = fname;
- decl_in.type = fn_ty;
- decl_in.loc = fname_loc;
- decl_in.storage = (specs->storage == DS_STATIC) ? DS_STATIC : DS_EXTERN;
- decl_in.linkage =
- (specs->storage == DS_STATIC) ? DL_INTERNAL : DL_EXTERNAL;
- decl_in.visibility = SV_DEFAULT;
- did = decl_declare(p->decls, &decl_in);
- fsym = decl_obj_sym(p->decls, did);
- /* Promote the symbol's binding for non-static functions. decl_declare
- * minted it with the right binding; assert here for clarity. */
-
- /* Bind the function name into file scope so calls resolve. */
+/* Resolve or mint the ObjSymId for a function declaration. If the same
+ * function name was seen before in file scope (forward prototype, prior
+ * definition), reuse its symbol so the linker sees one definition. */
+static SymEntry* declare_function(Parser* p, Sym fname, const Type* fn_ty,
+ const DeclSpecs* specs, SrcLoc fname_loc) {
+ SymEntry* existing = scope_lookup(p, fname);
+ if (existing && existing->kind == SEK_FUNC) {
+ /* Compatible-types check is Phase 10 territory; for v1 we trust the
+ * declarations agree. Returning the existing entry lets a later
+ * definition reuse the prior obj_sym. */
+ return existing;
+ }
{
- SymEntry* e = scope_define(p, fname, SEK_FUNC, fn_ty);
+ Decl decl_in;
+ DeclId did;
+ ObjSymId fsym;
+ SymEntry* e;
+ memset(&decl_in, 0, sizeof decl_in);
+ decl_in.name = fname;
+ decl_in.type = fn_ty;
+ decl_in.loc = fname_loc;
+ decl_in.storage = (specs->storage == DS_STATIC) ? DS_STATIC : DS_EXTERN;
+ decl_in.linkage =
+ (specs->storage == DS_STATIC) ? DL_INTERNAL : DL_EXTERNAL;
+ decl_in.visibility = SV_DEFAULT;
+ did = decl_declare(p->decls, &decl_in);
+ fsym = decl_obj_sym(p->decls, did);
+ e = scope_define(p, fname, SEK_FUNC, fn_ty);
e->v.sym = fsym;
+ return e;
}
+}
+
+/* Drive cg through a full function definition: build CGFuncDesc with the
+ * already-resolved symbol and ABI info, open a parameter scope, allocate
+ * FS_PARAM slots for each named param, dispatch cg_param, then parse the
+ * compound body. The `infos` array is the parser's per-param state. */
+static void parse_function_body(Parser* p, ObjSymId fsym, const Type* fn_ty,
+ const ABIFuncInfo* abi, const ParamInfo* infos,
+ u16 nparams, SrcLoc fname_loc) {
+ CGFuncDesc fd;
+ CGParamDesc* pds = NULL;
- /* Function body: open a parameter scope, then descend into body. The
- * spine has no params, so we just open an empty scope. */
memset(&fd, 0, sizeof fd);
fd.sym = fsym;
fd.text_section_id = p->text_sec;
@@ -1088,20 +1786,61 @@ static void parse_function_definition(Parser* p, const DeclSpecs* specs,
fd.fn_type = fn_ty;
fd.abi = abi;
fd.params = NULL;
- fd.nparams = 0;
+ fd.nparams = nparams;
fd.loc = fname_loc;
- scope_push(p);
+ if (nparams) {
+ pds = (CGParamDesc*)arena_array(p->c->tu, CGParamDesc, nparams);
+ memset(pds, 0, sizeof(CGParamDesc) * nparams);
+ for (u16 i = 0; i < nparams; ++i) {
+ pds[i].index = i;
+ pds[i].name = infos[i].name;
+ pds[i].type = infos[i].type;
+ pds[i].slot = FRAME_SLOT_NONE; /* filled below */
+ pds[i].abi = &abi->params[i];
+ /* The aarch64 backend reads parts from `pds[i].abi->parts` directly;
+ * `incoming` is the materialized CGABIPart slot used by ABIs that
+ * pre-stage values. Leave NULL until a backend wires it up. */
+ pds[i].incoming = NULL;
+ pds[i].nincoming = 0;
+ pds[i].loc = infos[i].loc;
+ }
+ fd.params = pds;
+ }
+
+ scope_push(p); /* parameter scope */
cg_set_loc(p->cg, fname_loc);
cg_func_begin(p->cg, &fd);
+
+ /* Allocate FS_PARAM slots and dispatch cg_param in declaration order. */
+ for (u16 i = 0; i < nparams; ++i) {
+ FrameSlotDesc fsd;
+ FrameSlot s;
+ SymEntry* e;
+ memset(&fsd, 0, sizeof fsd);
+ fsd.type = infos[i].type;
+ fsd.name = infos[i].name;
+ fsd.loc = infos[i].loc;
+ fsd.size = abi_sizeof(p->abi, infos[i].type);
+ fsd.align = abi_alignof(p->abi, infos[i].type);
+ fsd.kind = FS_PARAM;
+ fsd.flags = FSF_NONE;
+ s = cg_local(p->cg, &fsd);
+ pds[i].slot = s;
+ cg_param(p->cg, &pds[i]);
+ if (infos[i].name) {
+ e = scope_define(p, infos[i].name, SEK_LOCAL, infos[i].type);
+ e->v.slot = s;
+ }
+ }
+
parse_compound_stmt(p);
- /* Implicit fall-through return for `int main` — emit a return-0 if the
- * function reaches the closing brace without an explicit return. The
- * codegen always emits a real epilogue at func_end, so this is just a
- * safety belt against undefined behavior on trailing fall-through.
- * Spine cases all `return ...;` explicitly, so this is dead code there. */
- if (specs->type && specs->type->kind != TY_VOID) {
- cg_push_int(p->cg, 0, specs->type);
+ /* Implicit fall-through return: emit a return so the function's epilogue
+ * always has a tail to chain into. For non-void functions this returns
+ * zero; reaching the closing brace there is undefined behavior only if
+ * the caller uses the result, so the zero is just a safety belt. */
+ if (fn_ty->fn.ret && fn_ty->fn.ret->kind != TY_VOID) {
+ cg_push_int(p->cg, 0, fn_ty->fn.ret);
cg_ret(p->cg, 1);
} else {
cg_ret(p->cg, 0);
@@ -1110,29 +1849,61 @@ static void parse_function_definition(Parser* p, const DeclSpecs* specs,
scope_pop(p);
}
+/* Parse one external declaration: function definition, function prototype,
+ * or (deferred) global object declaration. The pointer prefix and IDENT are
+ * consumed inline before we know whether a body or `;` follows. */
static void parse_external_decl(Parser* p) {
DeclSpecs specs;
- Tok name_tok;
Sym name;
SrcLoc loc;
+ const Type* base_ty;
if (!parse_decl_specs(p, &specs)) {
perr(p, "expected declaration");
}
- /* Parse the declarator. v1 slice: just IDENT — pointer/array layers
- * are TODO. */
+ /* Parse the declarator's pointer prefix and IDENT. Function and array
+ * declarator suffixes are recognized inline below. */
+ base_ty = parse_pointer_layer(p, specs.type);
if (p->cur.kind != TOK_IDENT || ident_kw(p, p->cur.v.ident) != KW_NONE) {
perr(p, "expected declarator");
}
- name_tok = p->cur;
- loc = tok_loc(&name_tok);
- name = name_tok.v.ident;
+ name = p->cur.v.ident;
+ loc = tok_loc(&p->cur);
advance(p);
if (is_punct(&p->cur, '(')) {
- parse_function_definition(p, &specs, name, loc);
- return;
+ /* Function declaration or definition: build the type from the param
+ * list, then dispatch on `{` (definition) vs `;` (prototype). */
+ ParamInfo* infos = NULL;
+ u16 nparams = 0;
+ u8 variadic = 0;
+ const Type** ptypes = NULL;
+ const Type* fn_ty;
+ const ABIFuncInfo* abi;
+ SymEntry* fent;
+
+ advance(p); /* '(' */
+ parse_param_list(p, &infos, &nparams, &variadic);
+ expect_punct(p, ')', "')' after parameter list");
+
+ if (nparams) {
+ ptypes = (const Type**)arena_array(p->c->tu, const Type*, nparams);
+ for (u16 i = 0; i < nparams; ++i) ptypes[i] = infos[i].type;
+ }
+ fn_ty = type_func(p->pool, base_ty, ptypes, nparams, (int)variadic);
+ abi = abi_func_info(p->abi, fn_ty);
+
+ fent = declare_function(p, name, fn_ty, &specs, loc);
+
+ if (is_punct(&p->cur, '{')) {
+ parse_function_body(p, fent->v.sym, fn_ty, abi, infos, nparams, loc);
+ return;
+ }
+ if (accept_punct(p, ';')) {
+ return; /* prototype only */
+ }
+ perr(p, "expected '{' or ';' after function declarator");
}
/* Global object declaration: `int g;` / `int g = 7;` / `int g = ..., h;` */
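Review note on the external-declaration hunk above: the source shapes it is meant to accept (a forward prototype, a call that precedes the definition, and a `void` helper with a pointer parameter) look like the translation unit below. The names are hypothetical; the `set42` case follows corpus row `6_2_5_01`. The sketch shows the C-level behavior that `declare_function` has to preserve when it reuses an existing entry, namely that the prototype and the definition are the same function.

```c
#include <assert.h>

int id(int x);                 /* forward prototype: declares `id` */

/* Calls `id` before its body has been seen; only the prototype is known. */
int twice(int x) { return id(x) + id(x); }

int id(int x) { return x; }    /* definition: same function as the prototype */

/* Pointer parameter plus a void return. */
void set42(int *p) { *p = 42; }

int call_void_helper(void) {
    int x;
    set42(&x);
    return x;
}
```

If the prototype and the definition were given separate symbols, the call inside `twice` would bind to a symbol that never receives a definition, which is the failure the symbol reuse is guarding against.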
diff --git a/test/parse/CORPUS.md b/test/parse/CORPUS.md
@@ -56,12 +56,12 @@ function definition itself.
| Case | Status | Body | Expected |
|---|---|---|---|
-| `6_5_01_return_const` | · | `return 42;` | 42 |
-| `6_5_02_add` | · | `return 1 + 2;` | 3 |
-| `6_5_03_sub_mul` | · | `return 7 * 3 - 4;` | 17 |
-| `6_8_01_if_else` | · | `int x; if (1) x = 7; else x = 99; return x;` | 7 |
-| `6_8_02_while_sum` | · | `int s=0,i=0; while (i<10) { s+=i; i++; } return s;` | 45 |
-| `6_8_03_for_sum` | · | `int s=0; for (int i=1; i<=10; i++) s+=i; return s;` | 55 |
+| `6_5_01_return_const` | ★ | `return 42;` | 42 |
+| `6_5_02_add` | ★ | `return 1 + 2;` | 3 |
+| `6_5_03_sub_mul` | ★ | `return 7 * 3 - 4;` | 17 |
+| `6_8_01_if_else` | ★ | `int x; if (1) x = 7; else x = 99; return x;` | 7 |
+| `6_8_02_while_sum` | ★ | `int s=0,i=0; while (i<10) { s+=i; i++; } return s;` | 45 |
+| `6_8_03_for_sum` | ★ | `int s=0; for (int i=1; i<=10; i++) s+=i; return s;` | 55 |
Negative spine (one case per spec area; expands as the parser learns
to diagnose more):
@@ -82,7 +82,7 @@ natural home elsewhere.
| `6_2_3_01_tag_ord_namespace` | · | `struct s { int v; }; int s = 42; struct s t = {0}; return s + t.v;` | 42 |
| `6_2_3_02_label_namespace` | · | `int s = 0; goto s; s = 99; s: return 42;` | 42 |
| `6_2_4_01_static_keeps_value` | · | helper `int next(){static int n=40; return ++n;}`; `next(); return next();` | 42 |
-| `6_2_5_01_void_func_no_value` | · | helper `void f(int *p){*p=42;} int x; f(&x); return x;` | 42 |
+| `6_2_5_01_void_func_no_value` | ★ | helper `void f(int *p){*p=42;} int x; f(&x); return x;` | 42 |
## §6.3 Conversions
@@ -91,17 +91,17 @@ explicit cast; rows here fill in the rest of the conversion matrix.
| Case | Status | Body | Expected |
|---|---|---|---|
-| `6_3_1_1_01_char_promotion` | · | `char c = 'A'; return c - '@' + 41;` | 42 |
+| `6_3_1_1_01_char_promotion` | ★ | `char c = 'A'; return c - '@' + 41;` | 42 |
| `6_3_1_3_01_signed_to_unsigned` | · | `int n = -1; unsigned u = (unsigned)n; return (int)(u & 0xff);` | 255 |
| `6_3_1_3_02_unsigned_narrow` | · | `unsigned u = 0x100002aU; int n = (int)u; return n;` | 42 |
| `6_3_1_4_01_float_to_int` | · | `double d = 42.9; return (int)d;` | 42 |
| `6_3_1_4_02_int_to_float` | · | `int n = 42; double d = n; return (int)d;` | 42 |
-| `6_3_1_8_01_usual_arith_mixed` | · | `int s = -1; unsigned u = 1; return (s + u) ? 0 : 42;` | 42 |
+| `6_3_1_8_01_usual_arith_mixed` | ★ | `int s = -1; unsigned u = 1; return (s + u) ? 0 : 42;` | 42 |
| `6_3_2_1_01_array_to_ptr` | · | `int a[3] = {0,0,42}; int *p = a; return p[2];` | 42 |
| `6_3_2_1_02_func_to_ptr` | · | helper `id`; `int (*fp)(int) = id; return fp(42);` | 42 |
-| `6_3_2_2_01_void_cast_discard` | · | `(void)42; return 42;` | 42 |
-| `6_3_2_3_01_null_ptr_cmp` | · | `int *p = 0; return p ? 99 : 42;` | 42 |
-| `6_3_2_3_02_void_ptr_roundtrip` | · | `int x=42; void *v=&x; int *p=(int*)v; return *p;` | 42 |
+| `6_3_2_2_01_void_cast_discard` | ★ | `(void)42; return 42;` | 42 |
+| `6_3_2_3_01_null_ptr_cmp` | ★ | `int *p = 0; return p ? 99 : 42;` | 42 |
+| `6_3_2_3_02_void_ptr_roundtrip` | ★ | `int x=42; void *v=&x; int *p=(int*)v; return *p;` | 42 |
## §6.5 Expressions
@@ -111,33 +111,33 @@ here for completeness once they're real cases.
| Case | Status | Body | Expected |
|---|---|---|---|
-| `6_5_01_return_const` | · | `return 42;` | 42 |
-| `6_5_02_add` | · | `return 1 + 2;` | 3 |
-| `6_5_03_sub_mul` | · | `return 7 * 3 - 4;` | 17 |
-| `6_5_04_div_mod` | · | `return 23 / 4 + 23 % 4;` | 8 |
-| `6_5_05_bitwise_and` | · | `return (~3) & 0xff;` | 252 |
-| `6_5_06_bitwise_or_xor` | · | `return (0xa5 ^ 0x5a) & 0xff;` | 255 |
-| `6_5_07_shift` | · | `return (1<<5) \| (16>>1);` | 40 |
-| `6_5_08_unary_neg` | · | `return -7;` | 249 |
-| `6_5_09_logical_not` | · | `return !0 + !!5;` | 2 |
-| `6_5_10_cmp_eq` | · | `return (5 == 5) + (5 == 6);` | 1 |
-| `6_5_11_cmp_lt` | · | `return (-1 < 1);` | 1 |
-| `6_5_12_logical_and_skip` | · | `int s=0; (0) && (s=99); return s;` | 0 |
-| `6_5_13_logical_or_skip` | · | `int s=0; (1) \|\| (s=99); return s;` | 0 |
-| `6_5_14_ternary` | · | `return (5>3) ? 42 : 7;` | 42 |
-| `6_5_15_comma` | · | `int x; return (x=1, x=42, x);` | 42 |
-| `6_5_16_assign` | · | `int x; x = 42; return x;` | 42 |
-| `6_5_17_compound_assign` | · | `int x = 40; x += 2; return x;` | 42 |
-| `6_5_18_pre_inc` | · | `int x = 41; return ++x;` | 42 |
-| `6_5_19_post_inc` | · | `int x = 42; x++; return x;` | 43; reads as 43 |
-| `6_5_20_addr_deref` | · | `int x = 42; int *p = &x; return *p;` | 42 |
-| `6_5_21_sizeof_int` | · | `return (int)sizeof(int);` | 4 |
+| `6_5_01_return_const` | ★ | `return 42;` | 42 |
+| `6_5_02_add` | ★ | `return 1 + 2;` | 3 |
+| `6_5_03_sub_mul` | ★ | `return 7 * 3 - 4;` | 17 |
+| `6_5_04_div_mod` | ★ | `return 23 / 4 + 23 % 4;` | 8 |
+| `6_5_05_bitwise_and` | ★ | `return (~3) & 0xff;` | 252 |
+| `6_5_06_bitwise_or_xor` | ★ | `return (0xa5 ^ 0x5a) & 0xff;` | 255 |
+| `6_5_07_shift` | ★ | `return (1<<5) \| (16>>1);` | 40 |
+| `6_5_08_unary_neg` | ★ | `return -7;` | 249 |
+| `6_5_09_logical_not` | ★ | `return !0 + !!5;` | 2 |
+| `6_5_10_cmp_eq` | ★ | `return (5 == 5) + (5 == 6);` | 1 |
+| `6_5_11_cmp_lt` | ★ | `return (-1 < 1);` | 1 |
+| `6_5_12_logical_and_skip` | ★ | `int s=0; (0) && (s=99); return s;` | 0 |
+| `6_5_13_logical_or_skip` | ★ | `int s=0; (1) \|\| (s=99); return s;` | 0 |
+| `6_5_14_ternary` | ★ | `return (5>3) ? 42 : 7;` | 42 |
+| `6_5_15_comma` | ★ | `int x; return (x=1, x=42, x);` | 42 |
+| `6_5_16_assign` | ★ | `int x; x = 42; return x;` | 42 |
+| `6_5_17_compound_assign` | ★ | `int x = 40; x += 2; return x;` | 42 |
+| `6_5_18_pre_inc` | ★ | `int x = 41; return ++x;` | 42 |
+| `6_5_19_post_inc` | ★ | `int x = 42; x++; return x;` | 43; reads as 43 |
+| `6_5_20_addr_deref` | ★ | `int x = 42; int *p = &x; return *p;` | 42 |
+| `6_5_21_sizeof_int` | ★ | `return (int)sizeof(int);` | 4 |
| `6_5_22_sizeof_expr` | · | `int a[7]; return (int)(sizeof(a)/sizeof(int));` | 7 |
-| `6_5_23_cast` | · | `return (int)(unsigned char)(-1);` | 255 |
-| `6_5_24_func_call` | · | helper `int id(int x){return x;}` + `return id(42);` | 42 |
-| `6_5_25_unary_plus` | · | `return +42;` | 42 |
-| `6_5_26_pre_dec` | · | `int x = 43; return --x;` | 42 |
-| `6_5_27_post_dec` | · | `int x = 43; x--; return x;` | 42 |
+| `6_5_23_cast` | ★ | `return (int)(unsigned char)(-1);` | 255 |
+| `6_5_24_func_call` | ★ | helper `int id(int x){return x;}` + `return id(42);` | 42 |
+| `6_5_25_unary_plus` | ★ | `return +42;` | 42 |
+| `6_5_26_pre_dec` | ★ | `int x = 43; return --x;` | 42 |
+| `6_5_27_post_dec` | ★ | `int x = 43; x--; return x;` | 42 |
| `6_5_28_arrow` | · | `struct S{int v;} s={42}; struct S *p=&s; return p->v;` | 42 |
| `6_5_29_compound_literal` | · | `int *p = (int[]){10, 32}; return p[0]+p[1];` | 42 |
| `6_5_30_generic_selection`| · | `int x=42; return _Generic((x), int: x, default: 0);` | 42 |
@@ -149,7 +149,7 @@ here for completeness once they're real cases.
| Case | Status | Body | Expected |
|---|---|---|---|
| `6_6_01_enum_const` | · | `enum { K = 42 }; return K;` | 42 |
-| `6_6_02_const_expr_init` | · | `int x = 1+2*3; return x;` | 7 |
+| `6_6_02_const_expr_init` | ★ | `int x = 1+2*3; return x;` | 7 |
| `6_6_03_array_size_const` | · | `int a[3+4] = {0}; return (int)sizeof a / (int)sizeof a[0];` | 7 |
## §6.7 Declarations
@@ -157,14 +157,14 @@ here for completeness once they're real cases.
| Case | Status | Body | Expected |
|---|---|---|---|
| `6_7_01_typedef` | · | `typedef int I; I x = 42; return x;` | 42 |
-| `6_7_02_static_local` | · | `static int s = 42; return s;` | 42 |
+| `6_7_02_static_local` | ★ | `static int s = 42; return s;` | 42 |
| `6_7_03_static_global` | · | `static int g = 42; int test_main(void){return g;}` | 42 |
| `6_7_04_extern_resolved` | · | `extern int g; int g = 42; return g;` | 42 |
-| `6_7_05_const_qualifier` | · | `const int c = 42; return c;` | 42 |
+| `6_7_05_const_qualifier` | ★ | `const int c = 42; return c;` | 42 |
| `6_7_06_struct_basic` | · | `struct S { int a, b; } s = {10, 32}; return s.a + s.b;` | 42 |
| `6_7_07_union_basic` | · | `union U { int i; char c[4]; } u; u.i = 42; return u.i;` | 42 |
| `6_7_08_enum_basic` | · | `enum E { A = 40, B }; return B + 1;` | 42 |
-| `6_7_09_alignof` | · | `return (int)_Alignof(double);` | 8 |
+| `6_7_09_alignof` | ★ | `return (int)_Alignof(double);` | 8 |
## §6.7.2 Type specifiers
@@ -174,15 +174,15 @@ that the type round-trips through a declaration and back to `int`.
| Case | Status | Body | Expected |
|---|---|---|---|
-| `6_7_2_01_short` | · | `short x = 42; return x;` | 42 |
-| `6_7_2_02_long` | · | `long x = 42L; return (int)x;` | 42 |
-| `6_7_2_03_long_long` | · | `long long x = 42LL; return (int)x;` | 42 |
+| `6_7_2_01_short` | ★ | `short x = 42; return x;` | 42 |
+| `6_7_2_02_long` | ★ | `long x = 42L; return (int)x;` | 42 |
+| `6_7_2_03_long_long` | ★ | `long long x = 42LL; return (int)x;` | 42 |
| `6_7_2_04_unsigned` | · | `unsigned x = 42U; return (int)x;` | 42 |
-| `6_7_2_05_signed_char` | · | `signed char c = 42; return c;` | 42 |
-| `6_7_2_06_unsigned_char` | · | `unsigned char c = 200; return c;` | 200 |
-| `6_7_2_07_unsigned_short` | · | `unsigned short s = 42; return s;` | 42 |
-| `6_7_2_08_unsigned_long` | · | `unsigned long x = 42UL; return (int)x;` | 42 |
-| `6_7_2_09_bool` | · | `_Bool b = 5; return b ? 42 : 0;` | 42 |
+| `6_7_2_05_signed_char` | ★ | `signed char c = 42; return c;` | 42 |
+| `6_7_2_06_unsigned_char` | ★ | `unsigned char c = 200; return c;` | 200 |
+| `6_7_2_07_unsigned_short` | ★ | `unsigned short s = 42; return s;` | 42 |
+| `6_7_2_08_unsigned_long` | ★ | `unsigned long x = 42UL; return (int)x;` | 42 |
+| `6_7_2_09_bool` | ★ | `_Bool b = 5; return b ? 42 : 0;` | 42 |
| `6_7_2_10_float` | · | `float f = 42.0f; return (int)f;` | 42 |
| `6_7_2_11_double` | · | `double d = 42.5; return (int)d;` | 42 |
| `6_7_2_12_long_double` | · | `long double d = 42.0L; return (int)d;` | 42 |
@@ -209,18 +209,18 @@ remaining qualifier forms and pointer-qualifier interactions.
| Case | Status | Body | Expected |
|---|---|---|---|
-| `6_7_3_01_volatile` | · | `volatile int x = 42; return x;` | 42 |
-| `6_7_3_02_restrict_param` | · | helper `int rd(int *restrict p){return *p;}` + caller | 42 |
-| `6_7_3_03_const_pointer` | · | `int x=42; int *const p=&x; return *p;` | 42 |
-| `6_7_3_04_ptr_to_const` | · | `const int x=42; const int *p=&x; return *p;` | 42 |
-| `6_7_3_05_atomic` | · | `_Atomic int x = 42; return x;` | 42 |
+| `6_7_3_01_volatile` | ★ | `volatile int x = 42; return x;` | 42 |
+| `6_7_3_02_restrict_param` | ★ | helper `int rd(int *restrict p){return *p;}` + caller | 42 |
+| `6_7_3_03_const_pointer` | ★ | `int x=42; int *const p=&x; return *p;` | 42 |
+| `6_7_3_04_ptr_to_const` | ★ | `const int x=42; const int *p=&x; return *p;` | 42 |
+| `6_7_3_05_atomic` | ★ | `_Atomic int x = 42; return x;` | 42 |
## §6.7.4 Function specifiers
| Case | Status | Body | Expected |
|---|---|---|---|
-| `6_7_4_01_inline` | · | `static inline int id(int x){return x;}` + `return id(42);` | 42 |
-| `6_7_4_02_noreturn` | · | full TU: `_Noreturn void die(void){for(;;);} int test_main(void){return 42;}` (declared, not called) | 42 |
+| `6_7_4_01_inline` | ★ | `static inline int id(int x){return x;}` + `return id(42);` | 42 |
+| `6_7_4_02_noreturn` | ★ | full TU: `_Noreturn void die(void){for(;;);} int test_main(void){return 42;}` (declared, not called) | 42 |
## §6.7.5 Alignment specifier
@@ -236,7 +236,7 @@ already exercised in §6.5 and §6.7.
| Case | Status | Body | Expected |
|---|---|---|---|
-| `6_7_6_01_ptr_to_ptr` | · | `int x=42; int *p=&x; int **pp=&p; return **pp;` | 42 |
+| `6_7_6_01_ptr_to_ptr` | ★ | `int x=42; int *p=&x; int **pp=&p; return **pp;` | 42 |
| `6_7_6_02_array_2d` | · | `int a[2][3]={{0,0,0},{0,0,42}}; return a[1][2];` | 42 |
| `6_7_6_03_array_of_ptr` | · | `int x=42; int *a[2]={0,&x}; return *a[1];` | 42 |
| `6_7_6_04_funcptr_decl` | · | `int id(int x){return x;} int (*fp)(int)=id; return fp(42);` | 42 |
@@ -261,7 +261,7 @@ cover compound typedef targets.
| Case | Status | Body | Expected |
|---|---|---|---|
-| `6_7_9_01_scalar_init` | · | `int x = 42; return x;` | 42 |
+| `6_7_9_01_scalar_init` | ★ | `int x = 42; return x;` | 42 |
| `6_7_9_02_array_brace` | · | `int a[3] = {10, 20, 12}; return a[0]+a[1]+a[2];` | 42 |
| `6_7_9_03_partial_zero` | · | `int a[5] = {42}; return a[0] + a[4];` | 42 |
| `6_7_9_04_designated` | · | `int a[5] = {[2] = 42}; return a[2];` | 42 |
@@ -283,31 +283,31 @@ cover compound typedef targets.
| Case | Status | Body | Expected |
|---|---|---|---|
-| `6_8_01_if_else` | · | `int x; if (1) x=7; else x=99; return x;` | 7 |
-| `6_8_02_while_sum` | · | sum 0..9 with `while` | 45 |
-| `6_8_03_for_sum` | · | sum 1..10 with `for` | 55 |
+| `6_8_01_if_else` | ★ | `int x; if (1) x=7; else x=99; return x;` | 7 |
+| `6_8_02_while_sum` | ★ | sum 0..9 with `while` | 45 |
+| `6_8_03_for_sum` | ★ | sum 1..10 with `for` | 55 |
| `6_8_04_do_while` | · | `int i=0; do { i=42; } while (0); return i;` | 42 |
-| `6_8_05_break` | · | `for (i=0;;i++) if (i==42) break; return i;` | 42 |
-| `6_8_06_continue` | · | sum of evens in `[0,20)` via `continue` | 90 |
+| `6_8_05_break` | ★ | `for (i=0;;i++) if (i==42) break; return i;` | 42 |
+| `6_8_06_continue` | ★ | sum of evens in `[0,20)` via `continue` | 90 |
| `6_8_07_switch_case` | · | three-arm switch returns 42 on case 2 | 42 |
| `6_8_08_switch_fallthrough` | · | `case 1: r+=10; case 2: r+=20;` on input 1 | 30 |
| `6_8_09_switch_default` | · | unmatched switch hits `default` | 7 |
| `6_8_10_goto_forward` | · | `goto L; r=99; L: return 42;` | 42 |
| `6_8_11_goto_backward` | · | counter loop built with `goto` | 10 |
-| `6_8_12_block_scope` | · | inner `{ int x=42; }` shadows outer | 42 |
-| `6_8_13_compound_decl_mix` | · | declarations interleaved with statements (C99) | 42 |
-| `6_8_14_return_void` | · | `void f(void){return;}; f(); return 42;` | 42 |
-| `6_8_15_null_statement` | · | `for (int i=0;i<42;i++) ; return i;` | 42 |
+| `6_8_12_block_scope` | ★ | inner `{ int x=42; }` shadows outer | 42 |
+| `6_8_13_compound_decl_mix` | ★ | declarations interleaved with statements (C99) | 42 |
+| `6_8_14_return_void` | ★ | `void f(void){return;}; f(); return 42;` | 42 |
+| `6_8_15_null_statement` | ★ | `for (int i=0;i<42;i++) ; return i;` | 42 |
## §6.9 External definitions
| Case | Status | Body | Expected |
|---|---|---|---|
-| `6_9_01_two_functions` | · | helper + caller in one TU | 42 |
-| `6_9_02_recursive_function` | · | `factorial(5)` | 120 |
+| `6_9_01_two_functions` | ★ | helper + caller in one TU | 42 |
+| `6_9_02_recursive_function` | ★ | `factorial(5)` | 120 |
| `6_9_03_tentative_def` | · | file-scope `int g;` (tentative) + use | 0 |
-| `6_9_04_static_func` | · | `static int helper(...)` + caller | 42 |
-| `6_9_05_proto_then_def` | · | forward declaration before body | 42 |
+| `6_9_04_static_func` | ★ | `static int helper(...)` + caller | 42 |
+| `6_9_05_proto_then_def` | ★ | forward declaration before body | 42 |
| `6_9_06_variadic_func` | · | `sum(int n, ...)` over `va_arg`; `sum(2,20,22)` (paired with builtin_03) | 42 |
| `6_9_07_global_const` | · | full TU: `const int g = 42; int test_main(void){return g;}` | 42 |
| `6_9_08_global_struct_init` | · | full TU: `struct S{int v;} g={42}; int test_main(void){return g.v;}` | 42 |