kit

git clone https://git.ryansepassi.com/git/kit.git

commit d7882f0ce3836f3d71de3ba9bca652a7fc28a161
parent cc3abed6052eb7fdcc58fc6ccba94bbd8fc8a5ff
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sun, 10 May 2026 05:26:26 -0700

parse: Phase 1 — calls, params, multi-fn TUs, &&/||/?:, sizeof/cast

Land the Phase 1 expression-grammar surface so helper functions and
calls work end-to-end. Param lists with pointer declarators, multiple
function definitions per TU, forward prototypes, function calls in
postfix, short-circuiting `&&`/`||` and ternary `?:`, unary `&`/`*`,
char-literal decoding, string literals into .rodata, sizeof(type-name)
and sizeof(IDENT), _Alignof(type-name), cast expressions, and a
type-name production shared by all three.
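
A minimal sketch of the kind of translation unit this surface is meant to accept, written as plain C11. The function names and the "return 0 on success" convention are illustrative assumptions, not rows from kit's corpus, and the file has not been run through kit itself. It avoids subscripts, file-scope objects, and local pointer declarators, which the status doc assigns to later phases.

```c
/* Illustrative only: names and the 0-on-success convention are assumed. */

int fact(int n);                 /* forward prototype; the body comes later */

int bump(int *counter) {         /* pointer declarator in a parameter list */
  *counter = *counter + 1;       /* unary * used as an lvalue and an rvalue */
  return 1;
}

int fact(int n) {                /* another definition in the same TU */
  return n <= 1 ? 1 : n * fact(n - 1);   /* ternary plus a recursive call */
}

int test_main(void) {
  int hits = 0;
  int bad = 0;

  /* The right operand must be skipped when the left one decides. */
  if (0 && bump(&hits)) bad++;
  if (!(1 || bump(&hits))) bad++;
  if (hits != 0) bad++;

  /* The right operand runs when the left one does not decide. */
  if (!(1 && bump(&hits))) bad++;
  if (hits != 1) bad++;

  if (fact(5) != 120) bad++;

  /* Escapes and the string literal assume an ASCII execution charset. */
  if ('\x41' != 'A') bad++;
  if ('\101' != 'A') bad++;
  if (*"A" != 'A') bad++;        /* string literal decays to char* */

  if (sizeof(char) != 1) bad++;
  if (sizeof(hits) != sizeof(int)) bad++;   /* sizeof(IDENT) form */
  if (_Alignof(int *) < 1) bad++;
  if ((int)(unsigned char)321 != 65) bad++; /* 321 mod 256 == 65 */

  return bad;                    /* 0 means every check passed */
}
```

Each check increments `bad` on failure, so a single integer return value is enough to tell whether the whole surface behaved as standard C requires.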

Adds a backend-cooperative scratch-register reset at statement
boundaries (cg_reset_scratch → aa_reset_scratch) so function bodies
with many sequential register-allocating operations (factorial was the
trigger) don't exhaust the aarch64 backend's fixed scratch window.
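
The reset can be pictured with a toy model. This is a sketch under stated assumptions, not kit's code: `ScratchPool`, `ValueStack`, `simulate`, and the pool size of 8 are hypothetical stand-ins. The real backend tracks separate integer and floating-point cursors, and the real guard lives in `cg_reset_scratch`.

```c
/* Toy model of a bump-allocated scratch pool. All names are hypothetical. */

enum { SCRATCH_REGS = 8 };            /* assumed pool size, for illustration */

typedef struct ScratchPool { int used; } ScratchPool;
typedef struct ValueStack { int sp; } ValueStack;

/* Hand out the next free scratch register, or -1 when the pool is spent. */
int scratch_alloc(ScratchPool* pool) {
  if (pool->used == SCRATCH_REGS) return -1;
  return pool->used++;
}

/* Rewind the cursor only when no value-stack entry can still be holding
 * a register; otherwise leave the pool untouched. */
void scratch_reset_if_idle(ScratchPool* pool, const ValueStack* vs) {
  if (vs->sp != 0) return;
  pool->used = 0;
}

/* Run `nstmts` statements that each allocate `per_stmt` registers and
 * leave the value stack empty. Returns the number of failed allocations. */
int simulate(int nstmts, int per_stmt, int reset_between) {
  ScratchPool pool = {0};
  ValueStack vs = {0};
  int failures = 0;
  for (int s = 0; s < nstmts; s++) {
    for (int r = 0; r < per_stmt; r++) {
      if (scratch_alloc(&pool) < 0) failures++;
    }
    /* Statement boundary: the value stack is empty at this point. */
    if (reset_between) scratch_reset_if_idle(&pool, &vs);
  }
  return failures;
}
```

With five statements of three allocations each, the model runs out of registers when it never resets and never runs out when it resets at each boundary, which is the behavior the commit message describes.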

Flips 36 corpus rows from `·` to `★` (Phase 0's spine + Phase 1's
unlocks). Phase 1 boxes ticked in doc/parser-status.md.

Diffstat:
A doc/parser-status.md | 239 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/arch/aarch64.c | 14 ++++++++++++++
M src/cg/cg.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/parse/parse.c | 1051 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------
M test/parse/CORPUS.md | 144 ++++++++++++++++++++++++++++++++++++++++----------------------------------------
5 files changed, 1293 insertions(+), 212 deletions(-)

diff --git a/doc/parser-status.md b/doc/parser-status.md @@ -0,0 +1,239 @@ +# C-parser status + +Living checklist for the C front-end (`src/parse/`) build-out. Behavioral +oracle is `test/parse/CORPUS.md` — every checkbox here corresponds to +corpus rows that flip from `·` to `★` when the box gets ticked. Update +this doc in the same commit as the parser change that lands it. + +Phase status: + +- ✅ landed +- 🚧 in progress +- ⬜ planned + +Each phase is one agent's worth of work. Phases are ordered by +dependency, not priority — Phase N+1 generally needs Phase N's surface +in place. + +--- + +## Phase 0 — Spine ✅ + +§6.5 binary operators, scalar `int` locals, `if/else`, `while`, `for`, +`break`, `continue`, `return`, comma, simple/compound assignment, +unary `+ - ! ~ ++ --`, `(expr)`, integer literals. Single function +`int test_main(void) { ... }` per TU. + +--- + +## Phase 1 — Calls & §6.5 completion ✅ + +Finish the expression grammar and let helper functions exist. The +largest single unlock — most later phases depend on multi-function TUs. + +- [x] Parameter lists in function definitions (`int f(int x, int *p)`) +- [x] Multiple function definitions per TU +- [x] Function calls in postfix (`f(x, y)`) — drives `cg_call` / ABI +- [x] Forward prototypes (`int f(int);` then later body) +- [x] Logical `&&`, `||`, ternary `?:` (short-circuit, label-driven) +- [x] Address-of `&` and dereference `*` in unary +- [x] String literals in primary (rodata-resident, decay to `char*`) +- [x] Char literal decoded value (full §6.4.4.4 escape set) +- [x] `sizeof(type-name)` and `sizeof(IDENT)` for declared objects; + arbitrary `sizeof expr` (no parens, side-effecting operand) + defers to Phase 2 once subscript/`->`/etc. 
land +- [x] `_Alignof(type-name)` +- [x] Cast expression `(type-name) expr` +- [x] Type-name production (shared by sizeof / _Alignof / cast); abstract + declarators are pointer-prefix only — array/function suffixes wait + on Phase 2 + +Phase 1 also added a backend-cooperative scratch-register reset at +statement boundaries (cg's value stack is empty between stmts, so +`cg_reset_scratch` lets the next statement reuse the entire scratch +window). Without it the aarch64 backend's fixed pool gets exhausted +inside any function with multiple sequential reg-allocating +operations (factorial blew up at depth 5). + +Unlocks (status as landed): `6_5_12–14` ★, `6_5_20–21` ★, `6_5_22` · +(needs array decl, Phase 2), `6_5_23–25` ★, `6_5_30` · (deferred — +`_Generic` requires type-only walking, defer to Phase 3 alongside +`_Generic`'s natural neighbors), `6_5_32` · (needs subscript, Phase 2), +`6_2_4_01` · (proper static-local persistence — Phase 4), `6_2_5_01` ★, +`6_3_1_1_01` ★, `6_3_2_2_01` ★, `6_3_2_3_01–02` ★, `6_7_4_01` ★, +`6_9_01–02` ★, `6_9_03` · (file-scope tentative def — Phase 4), +`6_9_04–05` ★, `6_9_06` · (variadic + `__builtin_va_*` — Phase 9). + +--- + +## Phase 2 — Pointers & arrays ⬜ + +Pointer/array declarator layers and the address operators. Builds on +Phase 1's type-name production. + +- [ ] Pointer declarator (`int *p`, `int **pp`) +- [ ] Array declarator (`int a[N]`, `int a[]`) +- [ ] Subscript `a[i]` (and the commutative `i[a]`) +- [ ] Pointer arithmetic in `+`/`-` (scaled by element size) +- [ ] Array-to-pointer decay +- [ ] Function-to-pointer decay + indirect call (`(*fp)(x)`) +- [ ] `int *const p` / `const int *p` qualifier placement +- [ ] `[static N]` parameter form +- [ ] VLA local (`int a[n]`) + +Unlocks: `6_3_2_1_*`, `6_5_28–32`, `6_7_3_03–04`, `6_7_6_01–07`. + +--- + +## Phase 3 — Aggregates (struct / union / enum) ⬜ + +Tag namespace, member access, anonymous and forward-declared +aggregates, bitfields, `_Generic`. 
+ +- [ ] `struct` / `union` definition and tag-scope lookup +- [ ] Member access `.` and `->` in postfix +- [ ] `enum` with constants bound into ordinary scope +- [ ] Forward-declared tag (`struct S; ... struct S { ... };`) +- [ ] Self-referential pointers (`struct N { struct N *next; };`) +- [ ] Anonymous struct/union members (C11 §6.7.2.1) +- [ ] Bitfield members (`unsigned a:5`) +- [ ] `_Generic` selection (type-keyed) + +Unlocks: `6_2_3_*`, `6_5_28–30`, `6_6_01`, `6_7_06–08`, `6_7_2_1_01–05`. + +--- + +## Phase 4 — Globals, storage, linkage ⬜ + +File-scope objects with their full initializer / linkage matrix. + +- [ ] File-scope object declarations +- [ ] `static` global (internal linkage, `.data` / `.bss` placement) +- [ ] `extern` declaration and resolution +- [ ] Tentative definitions +- [ ] `const` global in `.rodata` +- [ ] Global struct / array data emission +- [ ] `static` local with non-zero init + +Unlocks: `6_7_02–04`, `6_9_03`, `6_9_07–09`. + +--- + +## Phase 5 — Statement completeness ⬜ + +Switch family, goto/labels, do-while, void return, label namespace. + +- [ ] `switch` / `case` / `default` (incl. fall-through, default-only) +- [ ] `goto` forward and backward +- [ ] User labels (separate namespace from ordinary identifiers) +- [ ] `do { } while ()` +- [ ] `return;` from void function + +Unlocks: `6_2_3_02`, `6_8_04`, `6_8_07–11`, `6_8_14`. + +--- + +## Phase 6 — Initializers ⬜ + +Full §6.7.9 surface. Requires aggregates (Phase 3) and globals +(Phase 4) to be fully useful. + +- [ ] Brace initializer for arrays +- [ ] Brace initializer for structs +- [ ] Designated initializers (`[i] = ...`, `.field = ...`) +- [ ] Nested designators (`[i][j] = ...`) +- [ ] Partial init with zero-fill +- [ ] String literal init for `char[]` +- [ ] Compound literals (`(int[]){1, 2}`) + +Unlocks: `6_5_29`, `6_7_9_02–10`, `6_9_08–09`. 
+ +--- + +## Phase 7 — Type breadth & conversions ⬜ + +Every primitive integer + float type round-tripped, plus the §6.3 +conversion matrix. + +- [ ] `char`, `signed char`, `unsigned char` +- [ ] `short`, `unsigned short` +- [ ] `long`, `long long`, `unsigned long`, `unsigned long long` +- [ ] `_Bool` with normalize-to-0/1 semantics +- [ ] `float`, `double`, `long double` +- [ ] Integer literal suffixes (`U`, `L`, `LL`) +- [ ] Float literals (decimal + hex) +- [ ] Usual arithmetic conversions +- [ ] Integer ↔ float conversions + +Unlocks: `6_3_*`, `6_7_2_01–12`. + +--- + +## Phase 8 — Qualifiers, alignment, typedefs ⬜ + +Remaining declaration-side features. + +- [ ] `_Atomic` qualifier (parse + plumb to cg) +- [ ] `_Alignas(T)` and `_Alignas(N)` on objects +- [ ] `inline` (header-only definitions) +- [ ] `typedef` (already partially landed; promote) +- [ ] Compound typedef targets (struct, function pointer, array) +- [ ] `_Static_assert` at file and block scope + +Unlocks: `6_7_3_05`, `6_7_5_*`, `6_7_8_*`, `6_7_10_*`. + +--- + +## Phase 9 — Builtins ⬜ + +Routes named `__builtin_*` calls to cg's intrinsic / asm machinery +rather than ordinary call lowering. Contract: `doc/builtins.md`. + +- [ ] `__builtin_alloca` +- [ ] `__builtin_expect` +- [ ] `__builtin_va_start` / `va_arg` / `va_end` / `va_copy` +- [ ] `__builtin_offsetof` +- [ ] `__atomic_load_n`, `__atomic_fetch_add` (and friends) + +Unlocks: `builtin_01–07`, `6_9_06`. + +--- + +## Phase 10 — Diagnostics polish ⬜ + +Negative cases assert nonzero exit + optional `errpat` substring. Wire +the explicit diagnostic for each `cases_err/*` row. 
+ +- [ ] Identifier resolution / lvalue / type mismatch +- [ ] Redefinition (object, function, struct tag) +- [ ] Member / call / arrow / sizeof on incomplete or wrong shape +- [ ] Storage-class combinations +- [ ] Bitfield width +- [ ] `const` violation +- [ ] `_Static_assert` failure +- [ ] switch / case / default scope rules +- [ ] goto / label scope rules +- [ ] `void` parameter rules + +Unlocks: every row under `test/parse/cases_err/`. + +--- + +## Cross-cutting ⬜ + +Nudged forward as relevant cases exercise them; not their own phase. + +- [ ] DWARF Class-1 fanout (`debug_func_begin` / `param` / `local` / + `scope_*`) when `Debug` is non-NULL — see `DWARF.md` §3.1. +- [ ] Multi-TU `cases/<name>/{a.c,b.c}` harness wiring (see CORPUS.md + Multi-TU §). + +--- + +## Maintenance + +- Tick boxes in the same commit as the parser change that lands them. +- When a phase finishes, flip its heading marker to ✅ and update the + matching `·` rows in `CORPUS.md` to `★`. +- New corpus rows go in `CORPUS.md`; cross-link here only when they + introduce a feature axis the phases don't already cover. diff --git a/src/arch/aarch64.c b/src/arch/aarch64.c @@ -777,6 +777,20 @@ static void aa_free_reg(CGTarget* t, Reg r) { (void)r; } +/* Reset the scratch-register cursors. The parser calls this between + * statements (when its value stack is known empty), letting the next + * statement reuse the entire scratch pool. Safe only when no live + * register-resident SValue is still expected — the parser asserts + * that precondition by checking cg's sp before forwarding. Without + * this, any function whose body contains more than ~10 sequential + * register-allocating operations exhausts the scratch pool. 
*/ +void aa_reset_scratch(CGTarget* t); +void aa_reset_scratch(CGTarget* t) { + AAImpl* a = impl_of(t); + a->used_int = 0; + a->used_fp = 0; +} + static FrameSlot aa_frame_slot(CGTarget* t, const FrameSlotDesc* d) { AAImpl* a = impl_of(t); if (a->nslots == a->slots_cap) { diff --git a/src/cg/cg.c b/src/cg/cg.c @@ -346,6 +346,63 @@ void cg_push_local_typed(CG* g, FrameSlot s, const Type* ty) { push(g, sv); } +/* Pop a pointer rvalue and push an OPK_INDIRECT lvalue for the pointee. + * The parser uses this to implement unary `*`. The pointer is materialized + * into a register; the resulting lvalue's MemAccess alias root is unknown + * (not LOCAL/GLOBAL), which is the right conservative answer for *ptr. */ +static Operand force_reg(CG* g, SValue v, const Type* ty); +void cg_deref(CG* g, const Type* pointee_ty); +void cg_deref(CG* g, const Type* pointee_ty) { + SValue v = pop(g); + const Type* pty = v.type ? v.type : v.op.type; + Operand src = force_reg(g, v, pty); + Operand ind; + SValue sv; + memset(&ind, 0, sizeof ind); + ind.kind = OPK_INDIRECT; + ind.cls = RC_INT; + ind.type = pointee_ty; + ind.v.ind.base = src.v.reg; + ind.v.ind.ofs = 0; + sv.op = ind; + sv.type = pointee_ty; + push(g, sv); +} + +/* Read the type of the value currently on top of the stack without popping. + * The parser uses this for type-driven dispatch (e.g. function-call lowering + * needs the callee's TY_FUNC) without re-deriving from its own state. */ +const Type* cg_top_type(CG* g); +const Type* cg_top_type(CG* g) { + if (g->sp == 0) return NULL; + return g->stack[g->sp - 1].type; +} + +/* Replace the type tag on the top SValue without emitting code. Used by + * the parser for casts that are no-ops at the value level (e.g. pointer- + * to-pointer of the same width); the underlying register/operand stays + * the same, only the C type the parser/backend will read changes. 
*/ +void cg_retag_top(CG* g, const Type* ty); +void cg_retag_top(CG* g, const Type* ty) { + if (g->sp == 0) return; + g->stack[g->sp - 1].type = ty; + g->stack[g->sp - 1].op.type = ty; +} + +/* Recycle the backend's scratch-register pool. Safe only when nothing on + * the value stack holds a register operand (sp == 0 in particular, but + * an all-IMM stack is also fine in principle). The parser calls this at + * statement boundaries so a function body with many sequential reg- + * allocating operations doesn't exhaust the fixed scratch window. */ +extern void aa_reset_scratch(CGTarget*); +void cg_reset_scratch(CG* g); +void cg_reset_scratch(CG* g) { + if (g->sp != 0) return; + /* For now we only know about the aarch64 backend; once a second arch + * lands we promote this to a CGTarget vtable entry. */ + aa_reset_scratch(g->target); +} + void cg_push_global(CG* g, ObjSymId sym, const Type* ty) { SValue sv; sv.op = op_global(sym, 0, ty); diff --git a/src/parse/parse.c b/src/parse/parse.c @@ -40,6 +40,17 @@ /* Type-aware push for locals — exposed by cg.c, not in cg.h. */ extern void cg_push_local_typed(CG*, FrameSlot, const Type*); +/* Pop pointer rvalue, push INDIRECT lvalue of given pointee. */ +extern void cg_deref(CG*, const Type* pointee); +/* Read SValue.type at top of stack without popping. */ +extern const Type* cg_top_type(CG*); +/* Replace the type tag on the top SValue without emitting code (used for + * pointer-to-pointer casts which are no-ops at the value level). */ +extern void cg_retag_top(CG*, const Type*); +/* Recycle the backend's scratch-register pool when no value-stack entry + * holds a live register. Called at statement boundaries to avoid + * exhausting the fixed scratch window over the course of a function. 
*/ +extern void cg_reset_scratch(CG*); /* ============================================================ * Keywords @@ -158,7 +169,9 @@ typedef struct Parser { TargetABI* abi; Pool* pool; - Tok cur; /* one token of lookahead */ + Tok cur; /* one token of lookahead */ + Tok next; /* second slot, populated lazily by peek1() */ + int has_next; Sym kw_sym[KW_COUNT]; @@ -188,7 +201,23 @@ static _Noreturn void perr(Parser* p, const char* fmt, ...) { * Token helpers * ============================================================ */ -static void advance(Parser* p) { p->cur = pp_next(p->pp); } +static void advance(Parser* p) { + if (p->has_next) { + p->cur = p->next; + p->has_next = 0; + } else { + p->cur = pp_next(p->pp); + } +} + +/* One-token lookahead beyond p->cur. Lazily populated. */ +static Tok peek1(Parser* p) { + if (!p->has_next) { + p->next = pp_next(p->pp); + p->has_next = 1; + } + return p->next; +} static int is_punct(const Tok* t, u32 punct) { return t->kind == TOK_PUNCT && t->v.punct == punct; @@ -284,6 +313,207 @@ static SymEntry* scope_lookup(Parser* p, Sym name) { * ============================================================ */ static const Type* ty_int(Parser* p) { return type_prim(p->pool, TY_INT); } +static const Type* ty_size_t(Parser* p) { + return abi_size_type(p->abi, p->pool); +} + +/* DeclSpecs and the matching parser landed up in the declaration section + * historically; we hoist it before expression parsing because + * sizeof / _Alignof / cast need to consume a type-name from inside + * parse_unary. */ +typedef struct DeclSpecs { + const Type* type; + DeclStorage storage; + u32 flags; /* DeclFlag */ +} DeclSpecs; + +/* Resolve the type implied by a multiset of type-specifier tokens + * (unsigned, signed, short, long, char, int, ...). C allows most orders + * (`unsigned long int` ≡ `int unsigned long`), so we collect everything + * first and pick the canonical TY_* tag at the end. 
Phase 1 covers the + * combinations the corpus needs; the float family (`long double`) is + * Phase 7's job and falls through to a "conflicting" diagnostic if + * combined with the integer keywords here. */ +typedef struct TypeSpecAccum { + u8 saw_void; + u8 saw_char; + u8 saw_int; + u8 saw_short; + u8 long_count; /* 0/1/2 */ + u8 saw_signed; + u8 saw_unsigned; + u8 saw_bool; + u8 saw_float; + u8 saw_double; + u8 saw_explicit_type; /* any of the above? */ +} TypeSpecAccum; + +static const Type* resolve_type_specs(Parser* p, const TypeSpecAccum* a, + SrcLoc loc) { + if (!a->saw_explicit_type) return NULL; + if (a->saw_void) { + if (a->saw_char || a->saw_int || a->saw_short || a->long_count || + a->saw_signed || a->saw_unsigned || a->saw_bool || a->saw_float || + a->saw_double) { + compiler_panic(p->c, loc, "conflicting type specifiers (void mixed)"); + } + return type_void(p->pool); + } + if (a->saw_bool) { + return type_prim(p->pool, TY_BOOL); + } + if (a->saw_char) { + if (a->saw_unsigned) return type_prim(p->pool, TY_UCHAR); + if (a->saw_signed) return type_prim(p->pool, TY_SCHAR); + return type_prim(p->pool, TY_CHAR); + } + if (a->saw_float) return type_prim(p->pool, TY_FLOAT); + if (a->saw_double) { + return type_prim(p->pool, a->long_count ? TY_LDOUBLE : TY_DOUBLE); + } + if (a->saw_short) { + return type_prim(p->pool, a->saw_unsigned ? TY_USHORT : TY_SHORT); + } + if (a->long_count == 2) { + return type_prim(p->pool, a->saw_unsigned ? TY_ULLONG : TY_LLONG); + } + if (a->long_count == 1) { + return type_prim(p->pool, a->saw_unsigned ? TY_ULONG : TY_LONG); + } + if (a->saw_unsigned) return type_prim(p->pool, TY_UINT); + if (a->saw_signed || a->saw_int) return type_prim(p->pool, TY_INT); + return type_prim(p->pool, TY_INT); +} + +static int parse_decl_specs(Parser* p, DeclSpecs* out) { + /* Tracks integer/void/char type specifiers in any order, plus the + * storage-class and qualifier keywords. 
Returns 0 if no specifier was + * consumed (caller treats that as "not a declaration"). */ + TypeSpecAccum acc; + SrcLoc loc; + int seen = 0; + memset(&acc, 0, sizeof acc); + out->type = NULL; + out->storage = DS_AUTO; + out->flags = DF_NONE; + loc = tok_loc(&p->cur); + for (;;) { + Tok t = p->cur; + if (is_kw(p, &t, KW_VOID)) { + acc.saw_void = 1; acc.saw_explicit_type = 1; advance(p); seen = 1; + } else if (is_kw(p, &t, KW_CHAR)) { + acc.saw_char = 1; acc.saw_explicit_type = 1; advance(p); seen = 1; + } else if (is_kw(p, &t, KW_INT)) { + acc.saw_int = 1; acc.saw_explicit_type = 1; advance(p); seen = 1; + } else if (is_kw(p, &t, KW_SHORT)) { + acc.saw_short = 1; acc.saw_explicit_type = 1; advance(p); seen = 1; + } else if (is_kw(p, &t, KW_LONG)) { + acc.long_count++; acc.saw_explicit_type = 1; advance(p); seen = 1; + } else if (is_kw(p, &t, KW_SIGNED)) { + acc.saw_signed = 1; acc.saw_explicit_type = 1; advance(p); seen = 1; + } else if (is_kw(p, &t, KW_UNSIGNED)) { + acc.saw_unsigned = 1; acc.saw_explicit_type = 1; advance(p); seen = 1; + } else if (is_kw(p, &t, KW_BOOL)) { + acc.saw_bool = 1; acc.saw_explicit_type = 1; advance(p); seen = 1; + } else if (is_kw(p, &t, KW_FLOAT)) { + acc.saw_float = 1; acc.saw_explicit_type = 1; advance(p); seen = 1; + } else if (is_kw(p, &t, KW_DOUBLE)) { + acc.saw_double = 1; acc.saw_explicit_type = 1; advance(p); seen = 1; + } else if (is_kw(p, &t, KW_STATIC)) { + out->storage = DS_STATIC; advance(p); seen = 1; + } else if (is_kw(p, &t, KW_EXTERN)) { + out->storage = DS_EXTERN; advance(p); seen = 1; + } else if (is_kw(p, &t, KW_CONST) || is_kw(p, &t, KW_VOLATILE) || + is_kw(p, &t, KW_RESTRICT) || is_kw(p, &t, KW_INLINE) || + is_kw(p, &t, KW_NORETURN) || is_kw(p, &t, KW_REGISTER) || + is_kw(p, &t, KW_AUTO) || is_kw(p, &t, KW_ATOMIC)) { + /* Recognized but currently no-op at this slice. 
*/ + advance(p); seen = 1; + } else { + break; + } + } + if (seen) { + out->type = resolve_type_specs(p, &acc, loc); + if (!out->type) { + /* Storage class without a type — default to int per pre-C99. */ + out->type = ty_int(p); + } + } + return seen; +} + +/* True when the current token starts a declaration-specifier sequence: a + * type keyword, a storage-class keyword, a qualifier, or a function + * specifier. Used at lookahead points (cast vs. paren expr; sizeof's + * inner form; for-init declarator vs. expression). The list mirrors + * parse_decl_specs's accepted set so the two stay in sync. + * + * Typedef-names are not yet implemented; when they land, they become + * the second branch here and dispatch on scope_lookup().kind == + * SEK_TYPEDEF, just like any other type-name token. */ +static int starts_type_name(const Parser* p, const Tok* t) { + if (t->kind != TOK_IDENT) return 0; + CKw k = ident_kw(p, t->v.ident); + switch (k) { + case KW_VOID: + case KW_CHAR: + case KW_SHORT: + case KW_INT: + case KW_LONG: + case KW_FLOAT: + case KW_DOUBLE: + case KW_SIGNED: + case KW_UNSIGNED: + case KW_BOOL: + case KW_STRUCT: + case KW_UNION: + case KW_ENUM: + case KW_CONST: + case KW_VOLATILE: + case KW_RESTRICT: + case KW_ATOMIC: + case KW_STATIC: + case KW_EXTERN: + case KW_INLINE: + case KW_NORETURN: + case KW_REGISTER: + case KW_AUTO: + case KW_TYPEDEF: + return 1; + default: + return 0; + } +} + +/* Walk a `*` chain at the front of a declarator (and optional qualifiers + * after each `*`), wrapping `base` in successive pointer types. Returns + * the innermost type the IDENT/declarator-tail refers to. */ +static const Type* parse_pointer_layer(Parser* p, const Type* base) { + while (accept_punct(p, '*')) { + base = type_ptr(p->pool, base); + /* Optional qualifiers after `*`; recognized and ignored at this slice. 
*/ + for (;;) { + if (accept_kw(p, KW_CONST) || accept_kw(p, KW_VOLATILE) || + accept_kw(p, KW_RESTRICT) || accept_kw(p, KW_ATOMIC)) { + continue; + } + break; + } + } + return base; +} + +/* Type-name (§6.7.7): specifier-qualifier-list (abstract-declarator)? + * The abstract declarator at this slice is just a `*` chain — array and + * function suffixes land in Phase 2. Used by sizeof / _Alignof / cast. */ +static const Type* parse_type_name(Parser* p) { + DeclSpecs specs; + if (!parse_decl_specs(p, &specs)) { + perr(p, "expected type-name"); + } + return parse_pointer_layer(p, specs.type); +} /* ============================================================ * Literal parsing @@ -369,6 +599,163 @@ static void to_rvalue(Parser* p) { (void)p; } +/* Decode one character (the first encoded code unit) from the token's + * spelling at offset `i`, advancing `*pi` past the consumed bytes. + * Handles the §6.4.4.4 escape sequences a freestanding compiler is + * required to recognize. */ +static i64 decode_one_char(Parser* p, const char* s, size_t len, size_t* pi, + SrcLoc loc) { + size_t i = *pi; + i64 v; + int c; + if (i >= len) compiler_panic(p->c, loc, "truncated character literal"); + if (s[i] != '\\') { + v = (unsigned char)s[i++]; + *pi = i; + return v; + } + /* Escape sequence. 
*/ + i++; + if (i >= len) compiler_panic(p->c, loc, "trailing '\\' in literal"); + c = (unsigned char)s[i++]; + switch (c) { + case 'n': v = '\n'; break; + case 't': v = '\t'; break; + case 'r': v = '\r'; break; + case 'b': v = '\b'; break; + case 'f': v = '\f'; break; + case 'v': v = '\v'; break; + case 'a': v = '\a'; break; + case '\\': v = '\\'; break; + case '\'': v = '\''; break; + case '"': v = '"'; break; + case '?': v = '?'; break; + case 'x': { + i64 hex = 0; + int any = 0; + while (i < len) { + int d = (unsigned char)s[i]; + int dv; + if (d >= '0' && d <= '9') dv = d - '0'; + else if (d >= 'a' && d <= 'f') dv = d - 'a' + 10; + else if (d >= 'A' && d <= 'F') dv = d - 'A' + 10; + else break; + hex = hex * 16 + dv; + any = 1; + i++; + } + if (!any) compiler_panic(p->c, loc, "\\x with no hex digits"); + v = hex & 0xff; + break; + } + default: + if (c >= '0' && c <= '7') { + i64 oct = c - '0'; + int n = 1; + while (n < 3 && i < len && s[i] >= '0' && s[i] <= '7') { + oct = oct * 8 + (s[i] - '0'); + i++; + n++; + } + v = oct & 0xff; + } else { + /* Unknown escape: implementation-defined; keep the literal byte. */ + v = c; + } + break; + } + *pi = i; + return v; +} + +static i64 decode_char_literal(Parser* p, const Tok* t) { + size_t len = 0; + const char* s = pool_str(p->pool, t->spelling, &len); + size_t i = 0; + i64 v; + if (!s) perr(p, "bad char literal"); + /* Skip optional encoding prefix (`L`, `u`, `U`, `u8`). The flag bits + * tell us which one without re-parsing. */ + if (t->flags & TF_STR_U8) i = 2; + else if (t->flags & (TF_STR_WIDE | TF_STR_U16 | TF_STR_U32)) i = 1; + if (i >= len || s[i] != '\'') perr(p, "malformed character literal"); + i++; /* opening quote */ + if (i >= len || s[i] == '\'') perr(p, "empty character literal"); + v = decode_one_char(p, s, len, &i, t->loc); + /* Multi-character constants are valid C but undefined-implementation; + * the spine corpus only uses single-char constants. 
Diagnose extra + * source bytes before the closing quote conservatively. */ + if (i >= len || s[i] != '\'') { + perr(p, "multi-character constants are not supported"); + } + return v; +} + +/* Decode the content of a string-literal token (without the surrounding + * quotes / encoding prefix) into raw bytes. Returns a heap-allocated + * buffer of length `*nlen_out`; caller frees through the same heap. */ +static u8* decode_string_literal(Parser* p, const Tok* t, size_t* nlen_out) { + size_t len = 0; + const char* s = pool_str(p->pool, t->spelling, &len); + size_t i = 0; + Heap* h = p->c->env->heap; + u8* buf; + size_t k = 0; + if (!s) perr(p, "bad string literal"); + if (t->flags & TF_STR_U8) i = 2; + else if (t->flags & (TF_STR_WIDE | TF_STR_U16 | TF_STR_U32)) i = 1; + if (i >= len || s[i] != '"') perr(p, "malformed string literal"); + i++; + /* Conservative buffer: at most one byte per source byte, plus NUL. */ + buf = (u8*)h->alloc(h, len + 1, 1); + if (!buf) perr(p, "out of memory in string literal"); + while (i < len && s[i] != '"') { + i64 ch = decode_one_char(p, s, len, &i, t->loc); + buf[k++] = (u8)ch; + } + buf[k++] = 0; /* NUL terminator */ + *nlen_out = k; + return buf; +} + +/* Place decoded string bytes in .rodata and return an ObjSymId pointing at + * them. Used by string literals in primary. */ +static ObjSymId emit_string_to_rodata(Parser* p, const u8* bytes, size_t n) { + ObjBuilder* ob = decl_obj(p->decls); + Sym secname = pool_intern_cstr(p->pool, ".rodata"); + ObjSecId sec = obj_section(ob, secname, SEC_RODATA, SF_ALLOC, 1u); + u32 base = obj_pos(ob, sec); + Sym lname; + ObjSymId sym; + char namebuf[32]; + static u32 counter; + /* Anonymous local symbol; the name is just for readability in objdump. */ + int wlen = 0; + u32 id = ++counter; + /* Tiny formatter — avoids stdio dependencies in the parser. 
*/ + namebuf[wlen++] = '.'; + namebuf[wlen++] = 'L'; + namebuf[wlen++] = 'C'; + { + char digits[12]; + int dn = 0; + if (id == 0) digits[dn++] = '0'; + while (id) { + digits[dn++] = (char)('0' + (id % 10)); + id /= 10; + } + while (dn) namebuf[wlen++] = digits[--dn]; + } + namebuf[wlen] = 0; + lname = pool_intern(p->pool, namebuf, (size_t)wlen); + sym = obj_symbol(ob, lname, SB_LOCAL, SK_OBJ, sec, base, n); + { + u8* dst = obj_reserve(ob, sec, n); + if (dst) memcpy(dst, bytes, n); + } + return sym; +} + static void parse_primary(Parser* p) { Tok t = p->cur; if (t.kind == TOK_NUM) { @@ -414,15 +801,37 @@ static void parse_primary(Parser* p) { } } if (t.kind == TOK_CHR) { - /* Minimal char-literal: take the first decoded byte from the lit table. - * Spine doesn't use char literals, so this is best-effort. */ - const LitInfo* li = pp_lit(p->pp, t.lit); - i64 v = 0; - (void)li; + i64 v = decode_char_literal(p, &t); advance(p); cg_push_int(p->cg, v, ty_int(p)); return; } + if (t.kind == TOK_STR) { + /* Decoded bytes go into a fresh anonymous .rodata symbol; the value + * of the expression is a pointer to char[] decayed to char*. */ + size_t n = 0; + u8* bytes = decode_string_literal(p, &t, &n); + ObjSymId sym = emit_string_to_rodata(p, bytes, n); + p->c->env->heap->free(p->c->env->heap, bytes, 0); + advance(p); + { + const Type* char_ty = type_prim(p->pool, TY_CHAR); + const Type* arr_ty = type_array(p->pool, char_ty, (u32)n, 0); + const Type* ptr_ty = type_ptr(p->pool, char_ty); + /* Array-to-pointer decay would normally happen at use; cg_push_global + * is given a pointer-typed lvalue so subsequent operations treat it + * as `char*` rvalue once loaded. */ + (void)arr_ty; + cg_push_global(p->cg, sym, ptr_ty); + /* String address is already the pointer rvalue we want — promote + * away from "lvalue of pointer-to-char[N]" to just "rvalue of + * char*" by tagging it as an rvalue at the cg layer. 
cg_push_global + * pushes a GLOBAL lvalue; for strings we want the address itself, + * i.e. an rvalue. cg_addr converts. */ + cg_addr(p->cg); + } + return; + } perr(p, "expected expression"); } @@ -440,16 +849,97 @@ static void parse_postfix(Parser* p) { cg_inc_dec(p->cg, BO_ISUB, /*post=*/1); continue; } - if (is_punct(&t, '(') || is_punct(&t, '[') || is_punct(&t, '.') || - is_punct(&t, P_ARROW)) { - perr(p, "call/subscript/member access not supported in v1 slice"); + if (is_punct(&t, '(')) { + /* Function call. The callee was pushed by parse_primary as an + * lvalue (OPK_GLOBAL for SEK_FUNC); cg_call accepts that directly + * for direct calls. */ + const Type* fn_type = cg_top_type(p->cg); + if (!fn_type || fn_type->kind != TY_FUNC) { + perr(p, "called object is not a function"); + } + advance(p); /* '(' */ + u32 nargs = 0; + if (!is_punct(&p->cur, ')')) { + for (;;) { + parse_assign_expr(p); + to_rvalue(p); + ++nargs; + if (!accept_punct(p, ',')) break; + } + } + expect_punct(p, ')', "')' after argument list"); + if (fn_type->fn.nparams != nargs && !fn_type->fn.variadic) { + perr(p, "wrong number of arguments"); + } + if (fn_type->fn.variadic && nargs < fn_type->fn.nparams) { + perr(p, "too few arguments to variadic function"); + } + cg_call(p->cg, nargs, fn_type); + /* cg_call leaves nothing on the stack for void-returning functions. + * Higher-level expression machinery (drop in stmt context, dispatch + * inside ternary, etc.) expects a top SValue, so push a sentinel + * int 0. Using the value of a void-returning call is invalid C; the + * sentinel just keeps stack discipline so the parser doesn't + * underflow on `f();` style statements. 
*/ + if (fn_type->fn.ret && fn_type->fn.ret->kind == TY_VOID) { + cg_push_int(p->cg, 0, ty_int(p)); + } + continue; + } + if (is_punct(&t, '[') || is_punct(&t, '.') || is_punct(&t, P_ARROW)) { + perr(p, "subscript/member access not supported in v1 slice"); } break; } } +/* sizeof / _Alignof and cast all parse a type-name from inside parentheses; + * detection at `(` requires looking past the opening paren. The work is the + * same: dispatch on what comes next. */ static void parse_unary(Parser* p) { Tok t = p->cur; + /* Cast expression `(type-name) cast`. Disambiguated against `(expr)` + * by checking the token immediately after `(`. */ + if (is_punct(&t, '(')) { + Tok n = peek1(p); + if (starts_type_name(p, &n)) { + const Type* dst; + const Type* src; + advance(p); /* '(' */ + dst = parse_type_name(p); + expect_punct(p, ')', "')' after type-name"); + parse_unary(p); /* cast-expression */ + to_rvalue(p); + /* `(void) expr` is the C idiom for "discard the value"; we must not + * convert (no value to materialize) — drop the rvalue and push + * nothing. The corpus relies on this for `(void)42;` style stmts. */ + if (dst && dst->kind == TY_VOID) { + cg_drop(p->cg); + /* Leave nothing on stack. parse_stmt's expression-stmt path drops + * the result; our caller is parse_unary, so leave the stack + * exactly empty and synthesize a sentinel int 0 to keep value- + * stack discipline (so to_rvalue from a higher level still has + * a top). The expression `(void)e` cannot appear where a value + * is required, so this is dead-but-harmless. */ + cg_push_int(p->cg, 0, ty_int(p)); + return; + } + src = cg_top_type(p->cg); + /* Pointer-to-pointer cast is a no-op at the value level once the + * pointer is already in a register. Skip cg_convert (which would + * dispatch to the backend's same-class bitcast, not implemented for + * register-resident pointers). Update the SValue's type so later + * dereference picks the right pointee — easiest done by re-pushing + * with the new type. 
*/ + if (src && src->kind == TY_PTR && dst->kind == TY_PTR) { + cg_retag_top(p->cg, dst); + return; + } + cg_convert(p->cg, dst); + return; + } + /* fall through to parse_postfix → parse_primary which handles `(expr)`. */ + } if (is_punct(&t, '+')) { advance(p); parse_unary(p); @@ -479,6 +969,32 @@ static void parse_unary(Parser* p) { cg_unop(p->cg, UO_BNOT); return; } + if (is_punct(&t, '&')) { + advance(p); + parse_unary(p); + /* The operand is required to be an lvalue; cg_addr panics otherwise. */ + cg_addr(p->cg); + return; + } + if (is_punct(&t, '*')) { + /* Dereference: parse the operand, force to a pointer rvalue, then + * derive the INDIRECT lvalue. The pointee type drives the next access. */ + const Type* pty; + const Type* pointee; + advance(p); + parse_unary(p); + to_rvalue(p); + pty = cg_top_type(p->cg); + if (!pty || pty->kind != TY_PTR) { + perr(p, "indirection requires pointer operand"); + } + pointee = pty->ptr.pointee; + if (pointee && pointee->kind == TY_VOID) { + perr(p, "dereferencing pointer to incomplete type"); + } + cg_deref(p->cg, pointee); + return; + } if (is_punct(&t, P_INC) || is_punct(&t, P_DEC)) { BinOp bop = is_punct(&t, P_INC) ? BO_IADD : BO_ISUB; advance(p); @@ -486,6 +1002,51 @@ static void parse_unary(Parser* p) { cg_inc_dec(p->cg, bop, /*post=*/0); return; } + if (is_kw(p, &t, KW_SIZEOF)) { + /* sizeof has two forms: `sizeof ( type-name )` and `sizeof unary`. + * The expression form must NOT evaluate its operand (per §6.5.3.4), + * which is awkward in single-pass codegen. The Phase 1 corpus only + * needs `sizeof(type-name)` and `sizeof(IDENT)` where IDENT is a + * declared object — both reducible to a type lookup with no + * emission. Other expression forms are diagnosed. 
*/ + const Type* ty = NULL; + advance(p); + if (is_punct(&p->cur, '(')) { + Tok n = peek1(p); + if (starts_type_name(p, &n)) { + advance(p); + ty = parse_type_name(p); + expect_punct(p, ')', "')'"); + } else if (n.kind == TOK_IDENT && ident_kw(p, n.v.ident) == KW_NONE) { + /* `sizeof(IDENT)` where IDENT is an object — look up its type. */ + SymEntry* e; + advance(p); /* '(' */ + e = scope_lookup(p, p->cur.v.ident); + if (!e) { + compiler_panic(p->c, p->cur.loc, "undeclared identifier"); + } + ty = e->type; + advance(p); /* IDENT */ + expect_punct(p, ')', "')'"); + } else { + perr(p, "sizeof of expression not supported in v1 slice"); + } + } else { + perr(p, "sizeof expr (without parens) not supported in v1 slice"); + } + cg_push_int(p->cg, (i64)abi_sizeof(p->abi, ty), ty_size_t(p)); + return; + } + if (is_kw(p, &t, KW_ALIGNOF)) { + /* _Alignof is type-name only (per §6.5.3.4 ¶1). */ + const Type* ty; + advance(p); + expect_punct(p, '(', "'('"); + ty = parse_type_name(p); + expect_punct(p, ')', "')'"); + cg_push_int(p->cg, (i64)abi_alignof(p->abi, ty), ty_size_t(p)); + return; + } parse_postfix(p); /* postfix may have left an lvalue or rvalue. Higher-level callers * issue to_rvalue when they need the value. */ @@ -634,12 +1195,116 @@ static void parse_bor(Parser* p) { } } -/* Logical && / || / ?: are short-circuiting and need labels. The spine - * doesn't need them yet (the relevant corpus rows are the §6.5_1[2,3,4] - * group); they slot in here when those rows graduate. */ +/* Logical && / || are short-circuiting: the right operand is evaluated + * only when the left does not already determine the result. We lower + * each as a label-driven branch sequence that materializes a 0/1 i32 + * result. Both produce an int rvalue regardless of operand types + * (per §6.5.13/14). 
+ * + * a && b lowers to: a || b lowers to: + * <a>; jz Lfalse <a>; jnz Ltrue + * <b>; jz Lfalse <b>; jnz Ltrue + * push 1; jmp Lend push 0; jmp Lend + * Lfalse: push 0 Ltrue: push 1 + * Lend: Lend: + */ +static void parse_land(Parser* p) { + parse_bor(p); + while (is_punct(&p->cur, P_AND)) { + CGLabel L_false = cg_label_new(p->cg); + CGLabel L_end = cg_label_new(p->cg); + advance(p); + to_rvalue(p); + cg_branch_false(p->cg, L_false); + parse_bor(p); + to_rvalue(p); + cg_branch_false(p->cg, L_false); + cg_push_int(p->cg, 1, ty_int(p)); + cg_jump(p->cg, L_end); + cg_label_place(p->cg, L_false); + cg_push_int(p->cg, 0, ty_int(p)); + cg_label_place(p->cg, L_end); + } +} + +static void parse_lor(Parser* p) { + parse_land(p); + while (is_punct(&p->cur, P_OR)) { + CGLabel L_true = cg_label_new(p->cg); + CGLabel L_end = cg_label_new(p->cg); + advance(p); + to_rvalue(p); + cg_branch_true(p->cg, L_true); + parse_land(p); + to_rvalue(p); + cg_branch_true(p->cg, L_true); + cg_push_int(p->cg, 0, ty_int(p)); + cg_jump(p->cg, L_end); + cg_label_place(p->cg, L_true); + cg_push_int(p->cg, 1, ty_int(p)); + cg_label_place(p->cg, L_end); + } +} + +/* Ternary `c ? t : f`. The cg value stack is linear-flow only, so a naive + * "push from each arm" leaves the stack in an inconsistent state at the + * merge point. We materialize the result through a fresh local: each arm + * stores into the same slot, the merge label reloads. v1 picks the slot's + * type from the then-arm and assumes the else-arm is the same type + * (matches the §6.5.15 corpus rows; full usual-conversions rules slot in + * with Phase 7). + * + * Likewise `&&` / `||` produce a 0/1 int and we lower them with explicit + * push/jump per branch, but since the result is a fresh constant in each + * arm, no temp slot is needed. The ternary case is special because the + * two arms can be arbitrary expressions whose computed values must + * appear on the same physical register/slot at the merge. 
*/ +static void parse_ternary(Parser* p) { + parse_lor(p); + if (!is_punct(&p->cur, '?')) return; + CGLabel L_else = cg_label_new(p->cg); + CGLabel L_end = cg_label_new(p->cg); + const Type* result_ty = ty_int(p); + FrameSlot tmp; + FrameSlotDesc fsd; + /* Pop the cond, branch on it. */ + advance(p); /* '?' */ + to_rvalue(p); + cg_branch_false(p->cg, L_else); + parse_assign_expr(p); + to_rvalue(p); + /* Update result_ty from the then-arm (a closer approximation than int). */ + result_ty = cg_top_type(p->cg); + if (!result_ty) result_ty = ty_int(p); + memset(&fsd, 0, sizeof fsd); + fsd.type = result_ty; + fsd.size = abi_sizeof(p->abi, result_ty); + fsd.align = abi_alignof(p->abi, result_ty); + fsd.kind = FS_LOCAL; + fsd.flags = FSF_NONE; + tmp = cg_local(p->cg, &fsd); + /* Store then-arm value into tmp. cg_store needs [lv, rv]; the rvalue + * is already on top, so push the lvalue and swap. */ + cg_push_local_typed(p->cg, tmp, result_ty); + cg_swap(p->cg); + cg_store(p->cg); + cg_drop(p->cg); /* cg_store leaves the rvalue; drop in stmt-style usage */ + cg_jump(p->cg, L_end); + cg_label_place(p->cg, L_else); + expect_punct(p, ':', "':' in ternary"); + parse_assign_expr(p); + to_rvalue(p); + cg_push_local_typed(p->cg, tmp, result_ty); + cg_swap(p->cg); + cg_store(p->cg); + cg_drop(p->cg); + cg_label_place(p->cg, L_end); + /* At the merge, push the slot lvalue; callers can to_rvalue if needed. */ + cg_push_local_typed(p->cg, tmp, result_ty); +} static void parse_assign_expr(Parser* p) { - parse_bor(p); + parse_ternary(p); /* The LHS is now on the CG stack. If it's an lvalue we may consume it * for assignment; otherwise we keep the rvalue as the final result. 
*/ Tok t = p->cur; @@ -710,61 +1375,11 @@ static void parse_expr(Parser* p) { } /* ============================================================ - * Declarations (slice: `int` / `void` only, no struct/union/enum/typedef) - * ============================================================ */ - -typedef struct DeclSpecs { - const Type* type; - DeclStorage storage; - u32 flags; /* DeclFlag */ -} DeclSpecs; - -static int parse_decl_specs(Parser* p, DeclSpecs* out) { - /* v1: tracks `int`, `void`, `static`, `extern`, plus a couple of common - * qualifiers that are ignored at this slice. Returns 0 if no specifier - * was consumed (caller treats that as "not a declaration"). */ - int seen = 0; - out->type = NULL; - out->storage = DS_AUTO; - out->flags = DF_NONE; - for (;;) { - Tok t = p->cur; - if (is_kw(p, &t, KW_INT)) { - if (out->type) perr(p, "conflicting type specifiers"); - out->type = type_prim(p->pool, TY_INT); - advance(p); - seen = 1; - } else if (is_kw(p, &t, KW_VOID)) { - if (out->type) perr(p, "conflicting type specifiers"); - out->type = type_void(p->pool); - advance(p); - seen = 1; - } else if (is_kw(p, &t, KW_STATIC)) { - out->storage = DS_STATIC; - advance(p); - seen = 1; - } else if (is_kw(p, &t, KW_EXTERN)) { - out->storage = DS_EXTERN; - advance(p); - seen = 1; - } else if (is_kw(p, &t, KW_CONST) || is_kw(p, &t, KW_VOLATILE) || - is_kw(p, &t, KW_RESTRICT) || is_kw(p, &t, KW_INLINE) || - is_kw(p, &t, KW_NORETURN) || is_kw(p, &t, KW_REGISTER) || - is_kw(p, &t, KW_AUTO)) { - /* Recognized but currently no-op at this slice. */ - advance(p); - seen = 1; - } else { - break; - } - } - if (seen && !out->type) { - /* `static x;` without type — default to int per pre-C99, but this is - * a hard error in C99/C11. Still tolerate at the scaffold level. 
*/ - out->type = ty_int(p); - } - return seen; -} + * Declarations (slice: `int` / `void` / `char` only) + * ============================================================ + * DeclSpecs and parse_decl_specs are defined above (hoisted before the + * expression parsing section). What follows here is the declarator-and- + * initializer machinery built on top of them. */ /* Forward decl for parse_compound_stmt (mutually recursive with statement * dispatch). */ @@ -791,27 +1406,35 @@ static FrameSlot make_local(Parser* p, Sym name, const Type* type, SrcLoc loc) { return s; } -/* Parse a single init-declarator after the decl-specs have been consumed. - * Spine grammar: declarator = IDENT ; init = `=` assign_expr. - * Pointer/array/function declarators are TODO — those slot in here as - * additional layers around the IDENT. */ -static void parse_init_declarator(Parser* p, const DeclSpecs* specs) { - SrcLoc loc; - Tok name_tok; - Sym name; +/* Parse a non-abstract declarator: optional `*` pointer prefix followed + * by an IDENT. v1 doesn't yet implement function or array declarators, + * which slot in around the IDENT in subsequent phases. Returns the + * declared type (with pointer layers wrapping `base`) and writes the + * IDENT to *name_out / *loc_out. */ +static const Type* parse_declarator(Parser* p, const Type* base, Sym* name_out, + SrcLoc* loc_out) { + base = parse_pointer_layer(p, base); if (p->cur.kind != TOK_IDENT || ident_kw(p, p->cur.v.ident) != KW_NONE) { perr(p, "expected declarator name"); } - name_tok = p->cur; - loc = tok_loc(&name_tok); - name = name_tok.v.ident; + *name_out = p->cur.v.ident; + *loc_out = tok_loc(&p->cur); advance(p); + return base; +} + +/* Parse a single init-declarator after the decl-specs have been consumed. + * v1 grammar: declarator = `*`* IDENT ; init = `=` assign_expr. 
*/ +static void parse_init_declarator(Parser* p, const DeclSpecs* specs) { + SrcLoc loc; + Sym name; + const Type* var_ty = parse_declarator(p, specs->type, &name, &loc); /* Local declaration only at this slice. */ { - FrameSlot s = make_local(p, name, specs->type, loc); + FrameSlot s = make_local(p, name, var_ty, loc); if (accept_punct(p, '=')) { cg_set_loc(p->cg, loc); - cg_push_local_typed(p->cg, s, specs->type); + cg_push_local_typed(p->cg, s, var_ty); parse_assign_expr(p); to_rvalue(p); cg_store(p->cg); @@ -984,6 +1607,10 @@ static void parse_compound_stmt(Parser* p) { } static void parse_stmt(Parser* p) { + /* Each statement starts from an empty value stack; recycle scratch + * registers so a function body with many sequential reg-allocating + * operations isn't bounded by the backend's fixed scratch window. */ + cg_reset_scratch(p->cg); cg_set_loc(p->cg, tok_loc(&p->cur)); if (is_punct(&p->cur, '{')) { parse_compound_stmt(p); @@ -1033,54 +1660,125 @@ static void parse_stmt(Parser* p) { * External (top-level) declarations * ============================================================ */ -/* For the spine, the only function shape is `int test_main(void) { ... }`. - * We accept `<type> IDENT (` `void` `)` `{` ... `}` and reject anything - * fancier. The full §6.7.6 declarator surface (parameters, varargs, - * pointer/array returns) lands as the corresponding corpus rows do. */ -static void parse_function_definition(Parser* p, const DeclSpecs* specs, - Sym fname, SrcLoc fname_loc) { - const Type** ptypes = NULL; - u16 nparams = 0; - const Type* fn_ty; - const ABIFuncInfo* abi; - Decl decl_in; - DeclId did; - ObjSymId fsym; - CGFuncDesc fd; +/* Helper: holds one parsed parameter's name + type (for binding into the + * function-body scope after cg_func_begin / cg_param). */ +typedef struct ParamInfo { + Sym name; + const Type* type; + SrcLoc loc; +} ParamInfo; - /* Param list: `void` or empty (and `)`); full list is TODO. 
*/ - expect_punct(p, '(', "'('"); - if (accept_kw(p, KW_VOID)) { - /* `(void)`: zero params, not variadic. */ - } else if (!is_punct(&p->cur, ')')) { - perr(p, "only `(void)` parameter list is supported in v1 slice"); +/* Parse a parameter-type-list. Returns the parameter type array and counts + * via out-pointers; `*variadic_out` is set if the list ends in `, ...`. + * + * Forms accepted: + * `(void)` — zero named params + * `()` — old-style "unspecified args"; treated as zero + * `(T1, T2, ...)` — named or abstract params, possibly trailing ellipsis + * + * For each named param we record name+type so the function-body parser can + * later bind them into the param scope. Abstract (no-name) params are + * allowed for prototype-only declarations. */ +static void parse_param_list(Parser* p, ParamInfo** infos_out, u16* nparams_out, + u8* variadic_out) { + ParamInfo* infos; + u32 cap = 4; + u32 n = 0; + *variadic_out = 0; + *infos_out = NULL; + *nparams_out = 0; + + if (is_punct(&p->cur, ')')) { + return; /* `()` — no params recorded */ } - expect_punct(p, ')', "')'"); + if (is_kw(p, &p->cur, KW_VOID)) { + Tok n = peek1(p); + if (is_punct(&n, ')')) { + advance(p); /* `void` */ + return; /* `(void)` */ + } + } + + infos = (ParamInfo*)arena_array(p->c->tu, ParamInfo, cap); + for (;;) { + DeclSpecs specs; + Sym pname = 0; + SrcLoc ploc = {0, 0, 0}; + const Type* pty; + if (accept_punct(p, P_ELLIPSIS)) { + *variadic_out = 1; + break; + } + if (!parse_decl_specs(p, &specs)) { + perr(p, "expected parameter type"); + } + /* Allow either named (`int x`) or abstract (`int`) declarators. We + * peek the pointer prefix, then if an IDENT follows it's named. 
*/ + pty = parse_pointer_layer(p, specs.type); + if (p->cur.kind == TOK_IDENT && ident_kw(p, p->cur.v.ident) == KW_NONE) { + pname = p->cur.v.ident; + ploc = tok_loc(&p->cur); + advance(p); + } + if (n == cap) { + cap *= 2; + ParamInfo* nbuf = (ParamInfo*)arena_array(p->c->tu, ParamInfo, cap); + memcpy(nbuf, infos, sizeof(ParamInfo) * n); + infos = nbuf; + } + infos[n].name = pname; + infos[n].type = pty; + infos[n].loc = ploc; + ++n; + if (!accept_punct(p, ',')) break; + } + *infos_out = infos; + *nparams_out = (u16)n; +} - fn_ty = type_func(p->pool, specs->type, ptypes, nparams, 0); - abi = abi_func_info(p->abi, fn_ty); - - memset(&decl_in, 0, sizeof decl_in); - decl_in.name = fname; - decl_in.type = fn_ty; - decl_in.loc = fname_loc; - decl_in.storage = (specs->storage == DS_STATIC) ? DS_STATIC : DS_EXTERN; - decl_in.linkage = - (specs->storage == DS_STATIC) ? DL_INTERNAL : DL_EXTERNAL; - decl_in.visibility = SV_DEFAULT; - did = decl_declare(p->decls, &decl_in); - fsym = decl_obj_sym(p->decls, did); - /* Promote the symbol's binding for non-static functions. decl_declare - * minted it with the right binding; assert here for clarity. */ - - /* Bind the function name into file scope so calls resolve. */ +/* Resolve or mint the ObjSymId for a function declaration. If the same + * function name was seen before in file scope (forward prototype, prior + * definition), reuse its symbol so the linker sees one definition. */ +static SymEntry* declare_function(Parser* p, Sym fname, const Type* fn_ty, + const DeclSpecs* specs, SrcLoc fname_loc) { + SymEntry* existing = scope_lookup(p, fname); + if (existing && existing->kind == SEK_FUNC) { + /* Compatible-types check is Phase 10 territory; for v1 we trust the + * declarations agree. Returning the existing entry lets the body + * defs reuse the prior obj_sym. 
*/ + return existing; + } { - SymEntry* e = scope_define(p, fname, SEK_FUNC, fn_ty); + Decl decl_in; + DeclId did; + ObjSymId fsym; + SymEntry* e; + memset(&decl_in, 0, sizeof decl_in); + decl_in.name = fname; + decl_in.type = fn_ty; + decl_in.loc = fname_loc; + decl_in.storage = (specs->storage == DS_STATIC) ? DS_STATIC : DS_EXTERN; + decl_in.linkage = + (specs->storage == DS_STATIC) ? DL_INTERNAL : DL_EXTERNAL; + decl_in.visibility = SV_DEFAULT; + did = decl_declare(p->decls, &decl_in); + fsym = decl_obj_sym(p->decls, did); + e = scope_define(p, fname, SEK_FUNC, fn_ty); e->v.sym = fsym; + return e; } +} + +/* Drive cg through a full function definition: build CGFuncDesc with the + * already-resolved symbol and ABI info, open a parameter scope, allocate + * FS_PARAM slots for each named param, dispatch cg_param, then parse the + * compound body. The `infos` array is the parser's per-param state. */ +static void parse_function_body(Parser* p, ObjSymId fsym, const Type* fn_ty, + const ABIFuncInfo* abi, const ParamInfo* infos, + u16 nparams, SrcLoc fname_loc) { + CGFuncDesc fd; + CGParamDesc* pds = NULL; - /* Function body: open a parameter scope, then descend into body. The - * spine has no params, so we just open an empty scope. 
*/ memset(&fd, 0, sizeof fd); fd.sym = fsym; fd.text_section_id = p->text_sec; @@ -1088,20 +1786,61 @@ static void parse_function_definition(Parser* p, const DeclSpecs* specs, fd.fn_type = fn_ty; fd.abi = abi; fd.params = NULL; - fd.nparams = 0; + fd.nparams = nparams; fd.loc = fname_loc; - scope_push(p); + if (nparams) { + pds = (CGParamDesc*)arena_array(p->c->tu, CGParamDesc, nparams); + memset(pds, 0, sizeof(CGParamDesc) * nparams); + for (u16 i = 0; i < nparams; ++i) { + pds[i].index = i; + pds[i].name = infos[i].name; + pds[i].type = infos[i].type; + pds[i].slot = FRAME_SLOT_NONE; /* filled below */ + pds[i].abi = &abi->params[i]; + /* The aarch64 backend reads parts from `pds[i].abi->parts` directly; + * `incoming` is the materialized CGABIPart slot used by ABIs that + * pre-stage values. Leave NULL until a backend wires it up. */ + pds[i].incoming = NULL; + pds[i].nincoming = 0; + pds[i].loc = infos[i].loc; + } + fd.params = pds; + } + + scope_push(p); /* parameter scope */ cg_set_loc(p->cg, fname_loc); cg_func_begin(p->cg, &fd); + + /* Allocate FS_PARAM slots and dispatch cg_param in declaration order. */ + for (u16 i = 0; i < nparams; ++i) { + FrameSlotDesc fsd; + FrameSlot s; + SymEntry* e; + memset(&fsd, 0, sizeof fsd); + fsd.type = infos[i].type; + fsd.name = infos[i].name; + fsd.loc = infos[i].loc; + fsd.size = abi_sizeof(p->abi, infos[i].type); + fsd.align = abi_alignof(p->abi, infos[i].type); + fsd.kind = FS_PARAM; + fsd.flags = FSF_NONE; + s = cg_local(p->cg, &fsd); + pds[i].slot = s; + cg_param(p->cg, &pds[i]); + if (infos[i].name) { + e = scope_define(p, infos[i].name, SEK_LOCAL, infos[i].type); + e->v.slot = s; + } + } + parse_compound_stmt(p); - /* Implicit fall-through return for `int main` — emit a return-0 if the - * function reaches the closing brace without an explicit return. The - * codegen always emits a real epilogue at func_end, so this is just a - * safety belt against undefined behavior on trailing fall-through. 
- * Spine cases all `return ...;` explicitly, so this is dead code there. */ - if (specs->type && specs->type->kind != TY_VOID) { - cg_push_int(p->cg, 0, specs->type); + /* Implicit fall-through return: emit a return so the function's epilogue + * always has a tail to chain into. For non-void functions this returns + * a zero value. Falling off the end of a non-void function is undefined + * behavior at the language level only when the caller uses the result, + * so the zero is just a safety belt against trailing fall-through. */ + if (fn_ty->fn.ret && fn_ty->fn.ret->kind != TY_VOID) { + cg_push_int(p->cg, 0, fn_ty->fn.ret); cg_ret(p->cg, 1); } else { cg_ret(p->cg, 0); @@ -1110,29 +1849,61 @@ static void parse_function_definition(Parser* p, const DeclSpecs* specs, scope_pop(p); } +/* Parse one external declaration: function definition, function prototype, + * or (deferred) global object declaration. The declarator's pointer prefix + * and name are consumed before we know whether a body or `;` follows. */ static void parse_external_decl(Parser* p) { DeclSpecs specs; - Tok name_tok; Sym name; SrcLoc loc; + const Type* base_ty; if (!parse_decl_specs(p, &specs)) { perr(p, "expected declaration"); } - /* Parse the declarator. v1 slice: just IDENT — pointer/array layers - * are TODO. */ + /* Parse the declarator's pointer prefix and IDENT. Function and array + * declarator suffixes are recognized inline below. */ + base_ty = parse_pointer_layer(p, specs.type); if (p->cur.kind != TOK_IDENT || ident_kw(p, p->cur.v.ident) != KW_NONE) { perr(p, "expected declarator"); } - name_tok = p->cur; - loc = tok_loc(&name_tok); - name = name_tok.v.ident; + name = p->cur.v.ident; + loc = tok_loc(&p->cur); advance(p); if (is_punct(&p->cur, '(')) { - parse_function_definition(p, &specs, name, loc); - return; + /* Function declaration or definition: build the type from the param + * list, then dispatch on `{` (definition) vs `;` (prototype).
*/ + ParamInfo* infos = NULL; + u16 nparams = 0; + u8 variadic = 0; + const Type** ptypes = NULL; + const Type* fn_ty; + const ABIFuncInfo* abi; + SymEntry* fent; + + advance(p); /* '(' */ + parse_param_list(p, &infos, &nparams, &variadic); + expect_punct(p, ')', "')' after parameter list"); + + if (nparams) { + ptypes = (const Type**)arena_array(p->c->tu, const Type*, nparams); + for (u16 i = 0; i < nparams; ++i) ptypes[i] = infos[i].type; + } + fn_ty = type_func(p->pool, base_ty, ptypes, nparams, (int)variadic); + abi = abi_func_info(p->abi, fn_ty); + + fent = declare_function(p, name, fn_ty, &specs, loc); + + if (is_punct(&p->cur, '{')) { + parse_function_body(p, fent->v.sym, fn_ty, abi, infos, nparams, loc); + return; + } + if (accept_punct(p, ';')) { + return; /* prototype only */ + } + perr(p, "expected '{' or ';' after function declarator"); } /* Global object declaration: `int g;` / `int g = 7;` / `int g = ..., h;` */ diff --git a/test/parse/CORPUS.md b/test/parse/CORPUS.md @@ -56,12 +56,12 @@ function definition itself. 
| Case | Status | Body | Expected | |---|---|---|---| -| `6_5_01_return_const` | · | `return 42;` | 42 | -| `6_5_02_add` | · | `return 1 + 2;` | 3 | -| `6_5_03_sub_mul` | · | `return 7 * 3 - 4;` | 17 | -| `6_8_01_if_else` | · | `int x; if (1) x = 7; else x = 99; return x;` | 7 | -| `6_8_02_while_sum` | · | `int s=0,i=0; while (i<10) { s+=i; i++; } return s;` | 45 | -| `6_8_03_for_sum` | · | `int s=0; for (int i=1; i<=10; i++) s+=i; return s;` | 55 | +| `6_5_01_return_const` | ★ | `return 42;` | 42 | +| `6_5_02_add` | ★ | `return 1 + 2;` | 3 | +| `6_5_03_sub_mul` | ★ | `return 7 * 3 - 4;` | 17 | +| `6_8_01_if_else` | ★ | `int x; if (1) x = 7; else x = 99; return x;` | 7 | +| `6_8_02_while_sum` | ★ | `int s=0,i=0; while (i<10) { s+=i; i++; } return s;` | 45 | +| `6_8_03_for_sum` | ★ | `int s=0; for (int i=1; i<=10; i++) s+=i; return s;` | 55 | Negative spine (one case per spec area; expands as the parser learns to diagnose more): @@ -82,7 +82,7 @@ natural home elsewhere. | `6_2_3_01_tag_ord_namespace` | · | `struct s { int v; }; int s = 42; struct s t = {0}; return s + t.v;` | 42 | | `6_2_3_02_label_namespace` | · | `int s = 0; goto s; s = 99; s: return 42;` | 42 | | `6_2_4_01_static_keeps_value` | · | helper `int next(){static int n=40; return ++n;}`; `next(); return next();` | 42 | -| `6_2_5_01_void_func_no_value` | · | helper `void f(int *p){*p=42;} int x; f(&x); return x;` | 42 | +| `6_2_5_01_void_func_no_value` | ★ | helper `void f(int *p){*p=42;} int x; f(&x); return x;` | 42 | ## §6.3 Conversions @@ -91,17 +91,17 @@ explicit cast; rows here fill in the rest of the conversion matrix. 
| Case | Status | Body | Expected | |---|---|---|---| -| `6_3_1_1_01_char_promotion` | · | `char c = 'A'; return c - '@' + 41;` | 42 | +| `6_3_1_1_01_char_promotion` | ★ | `char c = 'A'; return c - '@' + 41;` | 42 | | `6_3_1_3_01_signed_to_unsigned` | · | `int n = -1; unsigned u = (unsigned)n; return (int)(u & 0xff);` | 255 | | `6_3_1_3_02_unsigned_narrow` | · | `unsigned u = 0x100002aU; int n = (int)u; return n;` | 42 | | `6_3_1_4_01_float_to_int` | · | `double d = 42.9; return (int)d;` | 42 | | `6_3_1_4_02_int_to_float` | · | `int n = 42; double d = n; return (int)d;` | 42 | -| `6_3_1_8_01_usual_arith_mixed` | · | `int s = -1; unsigned u = 1; return (s + u) ? 0 : 42;` | 42 | +| `6_3_1_8_01_usual_arith_mixed` | ★ | `int s = -1; unsigned u = 1; return (s + u) ? 0 : 42;` | 42 | | `6_3_2_1_01_array_to_ptr` | · | `int a[3] = {0,0,42}; int *p = a; return p[2];` | 42 | | `6_3_2_1_02_func_to_ptr` | · | helper `id`; `int (*fp)(int) = id; return fp(42);` | 42 | -| `6_3_2_2_01_void_cast_discard` | · | `(void)42; return 42;` | 42 | -| `6_3_2_3_01_null_ptr_cmp` | · | `int *p = 0; return p ? 99 : 42;` | 42 | -| `6_3_2_3_02_void_ptr_roundtrip` | · | `int x=42; void *v=&x; int *p=(int*)v; return *p;` | 42 | +| `6_3_2_2_01_void_cast_discard` | ★ | `(void)42; return 42;` | 42 | +| `6_3_2_3_01_null_ptr_cmp` | ★ | `int *p = 0; return p ? 99 : 42;` | 42 | +| `6_3_2_3_02_void_ptr_roundtrip` | ★ | `int x=42; void *v=&x; int *p=(int*)v; return *p;` | 42 | ## §6.5 Expressions @@ -111,33 +111,33 @@ here for completeness once they're real cases. 
| Case | Status | Body | Expected | |---|---|---|---| -| `6_5_01_return_const` | · | `return 42;` | 42 | -| `6_5_02_add` | · | `return 1 + 2;` | 3 | -| `6_5_03_sub_mul` | · | `return 7 * 3 - 4;` | 17 | -| `6_5_04_div_mod` | · | `return 23 / 4 + 23 % 4;` | 8 | -| `6_5_05_bitwise_and` | · | `return (~3) & 0xff;` | 252 | -| `6_5_06_bitwise_or_xor` | · | `return (0xa5 ^ 0x5a) & 0xff;` | 255 | -| `6_5_07_shift` | · | `return (1<<5) \| (16>>1);` | 40 | -| `6_5_08_unary_neg` | · | `return -7;` | 249 | -| `6_5_09_logical_not` | · | `return !0 + !!5;` | 2 | -| `6_5_10_cmp_eq` | · | `return (5 == 5) + (5 == 6);` | 1 | -| `6_5_11_cmp_lt` | · | `return (-1 < 1);` | 1 | -| `6_5_12_logical_and_skip` | · | `int s=0; (0) && (s=99); return s;` | 0 | -| `6_5_13_logical_or_skip` | · | `int s=0; (1) \|\| (s=99); return s;` | 0 | -| `6_5_14_ternary` | · | `return (5>3) ? 42 : 7;` | 42 | -| `6_5_15_comma` | · | `int x; return (x=1, x=42, x);` | 42 | -| `6_5_16_assign` | · | `int x; x = 42; return x;` | 42 | -| `6_5_17_compound_assign` | · | `int x = 40; x += 2; return x;` | 42 | -| `6_5_18_pre_inc` | · | `int x = 41; return ++x;` | 42 | -| `6_5_19_post_inc` | · | `int x = 42; x++; return x;` | 43; reads as 43 | -| `6_5_20_addr_deref` | · | `int x = 42; int *p = &x; return *p;` | 42 | -| `6_5_21_sizeof_int` | · | `return (int)sizeof(int);` | 4 | +| `6_5_01_return_const` | ★ | `return 42;` | 42 | +| `6_5_02_add` | ★ | `return 1 + 2;` | 3 | +| `6_5_03_sub_mul` | ★ | `return 7 * 3 - 4;` | 17 | +| `6_5_04_div_mod` | ★ | `return 23 / 4 + 23 % 4;` | 8 | +| `6_5_05_bitwise_and` | ★ | `return (~3) & 0xff;` | 252 | +| `6_5_06_bitwise_or_xor` | ★ | `return (0xa5 ^ 0x5a) & 0xff;` | 255 | +| `6_5_07_shift` | ★ | `return (1<<5) \| (16>>1);` | 40 | +| `6_5_08_unary_neg` | ★ | `return -7;` | 249 | +| `6_5_09_logical_not` | ★ | `return !0 + !!5;` | 2 | +| `6_5_10_cmp_eq` | ★ | `return (5 == 5) + (5 == 6);` | 1 | +| `6_5_11_cmp_lt` | ★ | `return (-1 < 1);` | 1 | +| `6_5_12_logical_and_skip` | ★ | `int 
s=0; (0) && (s=99); return s;` | 0 | +| `6_5_13_logical_or_skip` | ★ | `int s=0; (1) \|\| (s=99); return s;` | 0 | +| `6_5_14_ternary` | ★ | `return (5>3) ? 42 : 7;` | 42 | +| `6_5_15_comma` | ★ | `int x; return (x=1, x=42, x);` | 42 | +| `6_5_16_assign` | ★ | `int x; x = 42; return x;` | 42 | +| `6_5_17_compound_assign` | ★ | `int x = 40; x += 2; return x;` | 42 | +| `6_5_18_pre_inc` | ★ | `int x = 41; return ++x;` | 42 | +| `6_5_19_post_inc` | ★ | `int x = 42; x++; return x;` | 43; reads as 43 | +| `6_5_20_addr_deref` | ★ | `int x = 42; int *p = &x; return *p;` | 42 | +| `6_5_21_sizeof_int` | ★ | `return (int)sizeof(int);` | 4 | | `6_5_22_sizeof_expr` | · | `int a[7]; return (int)(sizeof(a)/sizeof(int));` | 7 | -| `6_5_23_cast` | · | `return (int)(unsigned char)(-1);` | 255 | -| `6_5_24_func_call` | · | helper `int id(int x){return x;}` + `return id(42);` | 42 | -| `6_5_25_unary_plus` | · | `return +42;` | 42 | -| `6_5_26_pre_dec` | · | `int x = 43; return --x;` | 42 | -| `6_5_27_post_dec` | · | `int x = 43; x--; return x;` | 42 | +| `6_5_23_cast` | ★ | `return (int)(unsigned char)(-1);` | 255 | +| `6_5_24_func_call` | ★ | helper `int id(int x){return x;}` + `return id(42);` | 42 | +| `6_5_25_unary_plus` | ★ | `return +42;` | 42 | +| `6_5_26_pre_dec` | ★ | `int x = 43; return --x;` | 42 | +| `6_5_27_post_dec` | ★ | `int x = 43; x--; return x;` | 42 | | `6_5_28_arrow` | · | `struct S{int v;} s={42}; struct S *p=&s; return p->v;` | 42 | | `6_5_29_compound_literal` | · | `int *p = (int[]){10, 32}; return p[0]+p[1];` | 42 | | `6_5_30_generic_selection`| · | `int x=42; return _Generic((x), int: x, default: 0);` | 42 | @@ -149,7 +149,7 @@ here for completeness once they're real cases. 
| Case | Status | Body | Expected | |---|---|---|---| | `6_6_01_enum_const` | · | `enum { K = 42 }; return K;` | 42 | -| `6_6_02_const_expr_init` | · | `int x = 1+2*3; return x;` | 7 | +| `6_6_02_const_expr_init` | ★ | `int x = 1+2*3; return x;` | 7 | | `6_6_03_array_size_const` | · | `int a[3+4] = {0}; return (int)sizeof a / (int)sizeof a[0];` | 7 | ## §6.7 Declarations @@ -157,14 +157,14 @@ here for completeness once they're real cases. | Case | Status | Body | Expected | |---|---|---|---| | `6_7_01_typedef` | · | `typedef int I; I x = 42; return x;` | 42 | -| `6_7_02_static_local` | · | `static int s = 42; return s;` | 42 | +| `6_7_02_static_local` | ★ | `static int s = 42; return s;` | 42 | | `6_7_03_static_global` | · | `static int g = 42; int test_main(void){return g;}` | 42 | | `6_7_04_extern_resolved` | · | `extern int g; int g = 42; return g;` | 42 | -| `6_7_05_const_qualifier` | · | `const int c = 42; return c;` | 42 | +| `6_7_05_const_qualifier` | ★ | `const int c = 42; return c;` | 42 | | `6_7_06_struct_basic` | · | `struct S { int a, b; } s = {10, 32}; return s.a + s.b;` | 42 | | `6_7_07_union_basic` | · | `union U { int i; char c[4]; } u; u.i = 42; return u.i;` | 42 | | `6_7_08_enum_basic` | · | `enum E { A = 40, B }; return B + 1;` | 42 | -| `6_7_09_alignof` | · | `return (int)_Alignof(double);` | 8 | +| `6_7_09_alignof` | ★ | `return (int)_Alignof(double);` | 8 | ## §6.7.2 Type specifiers @@ -174,15 +174,15 @@ that the type round-trips through a declaration and back to `int`. 
| Case | Status | Body | Expected | |---|---|---|---| -| `6_7_2_01_short` | · | `short x = 42; return x;` | 42 | -| `6_7_2_02_long` | · | `long x = 42L; return (int)x;` | 42 | -| `6_7_2_03_long_long` | · | `long long x = 42LL; return (int)x;` | 42 | +| `6_7_2_01_short` | ★ | `short x = 42; return x;` | 42 | +| `6_7_2_02_long` | ★ | `long x = 42L; return (int)x;` | 42 | +| `6_7_2_03_long_long` | ★ | `long long x = 42LL; return (int)x;` | 42 | | `6_7_2_04_unsigned` | · | `unsigned x = 42U; return (int)x;` | 42 | -| `6_7_2_05_signed_char` | · | `signed char c = 42; return c;` | 42 | -| `6_7_2_06_unsigned_char` | · | `unsigned char c = 200; return c;` | 200 | -| `6_7_2_07_unsigned_short` | · | `unsigned short s = 42; return s;` | 42 | -| `6_7_2_08_unsigned_long` | · | `unsigned long x = 42UL; return (int)x;` | 42 | -| `6_7_2_09_bool` | · | `_Bool b = 5; return b ? 42 : 0;` | 42 | +| `6_7_2_05_signed_char` | ★ | `signed char c = 42; return c;` | 42 | +| `6_7_2_06_unsigned_char` | ★ | `unsigned char c = 200; return c;` | 200 | +| `6_7_2_07_unsigned_short` | ★ | `unsigned short s = 42; return s;` | 42 | +| `6_7_2_08_unsigned_long` | ★ | `unsigned long x = 42UL; return (int)x;` | 42 | +| `6_7_2_09_bool` | ★ | `_Bool b = 5; return b ? 42 : 0;` | 42 | | `6_7_2_10_float` | · | `float f = 42.0f; return (int)f;` | 42 | | `6_7_2_11_double` | · | `double d = 42.5; return (int)d;` | 42 | | `6_7_2_12_long_double` | · | `long double d = 42.0L; return (int)d;` | 42 | @@ -209,18 +209,18 @@ remaining qualifier forms and pointer-qualifier interactions. 
 | Case | Status | Body | Expected |
 |---|---|---|---|
-| `6_7_3_01_volatile` | · | `volatile int x = 42; return x;` | 42 |
-| `6_7_3_02_restrict_param` | · | helper `int rd(int *restrict p){return *p;}` + caller | 42 |
-| `6_7_3_03_const_pointer` | · | `int x=42; int *const p=&x; return *p;` | 42 |
-| `6_7_3_04_ptr_to_const` | · | `const int x=42; const int *p=&x; return *p;` | 42 |
-| `6_7_3_05_atomic` | · | `_Atomic int x = 42; return x;` | 42 |
+| `6_7_3_01_volatile` | ★ | `volatile int x = 42; return x;` | 42 |
+| `6_7_3_02_restrict_param` | ★ | helper `int rd(int *restrict p){return *p;}` + caller | 42 |
+| `6_7_3_03_const_pointer` | ★ | `int x=42; int *const p=&x; return *p;` | 42 |
+| `6_7_3_04_ptr_to_const` | ★ | `const int x=42; const int *p=&x; return *p;` | 42 |
+| `6_7_3_05_atomic` | ★ | `_Atomic int x = 42; return x;` | 42 |

 ## §6.7.4 Function specifiers

 | Case | Status | Body | Expected |
 |---|---|---|---|
-| `6_7_4_01_inline` | · | `static inline int id(int x){return x;}` + `return id(42);` | 42 |
-| `6_7_4_02_noreturn` | · | full TU: `_Noreturn void die(void){for(;;);} int test_main(void){return 42;}` (declared, not called) | 42 |
+| `6_7_4_01_inline` | ★ | `static inline int id(int x){return x;}` + `return id(42);` | 42 |
+| `6_7_4_02_noreturn` | ★ | full TU: `_Noreturn void die(void){for(;;);} int test_main(void){return 42;}` (declared, not called) | 42 |

 ## §6.7.5 Alignment specifier

@@ -236,7 +236,7 @@ already exercised in §6.5 and §6.7.
 | Case | Status | Body | Expected |
 |---|---|---|---|
-| `6_7_6_01_ptr_to_ptr` | · | `int x=42; int *p=&x; int **pp=&p; return **pp;` | 42 |
+| `6_7_6_01_ptr_to_ptr` | ★ | `int x=42; int *p=&x; int **pp=&p; return **pp;` | 42 |
 | `6_7_6_02_array_2d` | · | `int a[2][3]={{0,0,0},{0,0,42}}; return a[1][2];` | 42 |
 | `6_7_6_03_array_of_ptr` | · | `int x=42; int *a[2]={0,&x}; return *a[1];` | 42 |
 | `6_7_6_04_funcptr_decl` | · | `int id(int x){return x;} int (*fp)(int)=id; return fp(42);` | 42 |
@@ -261,7 +261,7 @@ cover compound typedef targets.
 | Case | Status | Body | Expected |
 |---|---|---|---|
-| `6_7_9_01_scalar_init` | · | `int x = 42; return x;` | 42 |
+| `6_7_9_01_scalar_init` | ★ | `int x = 42; return x;` | 42 |
 | `6_7_9_02_array_brace` | · | `int a[3] = {10, 20, 12}; return a[0]+a[1]+a[2];` | 42 |
 | `6_7_9_03_partial_zero` | · | `int a[5] = {42}; return a[0] + a[4];` | 42 |
 | `6_7_9_04_designated` | · | `int a[5] = {[2] = 42}; return a[2];` | 42 |
@@ -283,31 +283,31 @@ cover compound typedef targets.
 | Case | Status | Body | Expected |
 |---|---|---|---|
-| `6_8_01_if_else` | · | `int x; if (1) x=7; else x=99; return x;` | 7 |
-| `6_8_02_while_sum` | · | sum 0..9 with `while` | 45 |
-| `6_8_03_for_sum` | · | sum 1..10 with `for` | 55 |
+| `6_8_01_if_else` | ★ | `int x; if (1) x=7; else x=99; return x;` | 7 |
+| `6_8_02_while_sum` | ★ | sum 0..9 with `while` | 45 |
+| `6_8_03_for_sum` | ★ | sum 1..10 with `for` | 55 |
 | `6_8_04_do_while` | · | `int i=0; do { i=42; } while (0); return i;` | 42 |
-| `6_8_05_break` | · | `for (i=0;;i++) if (i==42) break; return i;` | 42 |
-| `6_8_06_continue` | · | sum of evens in `[0,20)` via `continue` | 90 |
+| `6_8_05_break` | ★ | `for (i=0;;i++) if (i==42) break; return i;` | 42 |
+| `6_8_06_continue` | ★ | sum of evens in `[0,20)` via `continue` | 90 |
 | `6_8_07_switch_case` | · | three-arm switch returns 42 on case 2 | 42 |
 | `6_8_08_switch_fallthrough` | · | `case 1: r+=10; case 2: r+=20;` on input 1 | 30 |
 | `6_8_09_switch_default` | · | unmatched switch hits `default` | 7 |
 | `6_8_10_goto_forward` | · | `goto L; r=99; L: return 42;` | 42 |
 | `6_8_11_goto_backward` | · | counter loop built with `goto` | 10 |
-| `6_8_12_block_scope` | · | inner `{ int x=42; }` shadows outer | 42 |
-| `6_8_13_compound_decl_mix` | · | declarations interleaved with statements (C99) | 42 |
-| `6_8_14_return_void` | · | `void f(void){return;}; f(); return 42;` | 42 |
-| `6_8_15_null_statement` | · | `for (int i=0;i<42;i++) ; return i;` | 42 |
+| `6_8_12_block_scope` | ★ | inner `{ int x=42; }` shadows outer | 42 |
+| `6_8_13_compound_decl_mix` | ★ | declarations interleaved with statements (C99) | 42 |
+| `6_8_14_return_void` | ★ | `void f(void){return;}; f(); return 42;` | 42 |
+| `6_8_15_null_statement` | ★ | `for (int i=0;i<42;i++) ; return i;` | 42 |

 ## §6.9 External definitions

 | Case | Status | Body | Expected |
 |---|---|---|---|
-| `6_9_01_two_functions` | · | helper + caller in one TU | 42 |
-| `6_9_02_recursive_function` | · | `factorial(5)` | 120 |
+| `6_9_01_two_functions` | ★ | helper + caller in one TU | 42 |
+| `6_9_02_recursive_function` | ★ | `factorial(5)` | 120 |
 | `6_9_03_tentative_def` | · | file-scope `int g;` (tentative) + use | 0 |
-| `6_9_04_static_func` | · | `static int helper(...)` + caller | 42 |
-| `6_9_05_proto_then_def` | · | forward declaration before body | 42 |
+| `6_9_04_static_func` | ★ | `static int helper(...)` + caller | 42 |
+| `6_9_05_proto_then_def` | ★ | forward declaration before body | 42 |
 | `6_9_06_variadic_func` | · | `sum(int n, ...)` over `va_arg`; `sum(2,20,22)` (paired with builtin_03) | 42 |
 | `6_9_07_global_const` | · | full TU: `const int g = 42; int test_main(void){return g;}` | 42 |
 | `6_9_08_global_struct_init` | · | full TU: `struct S{int v;} g={42}; int test_main(void){return g.v;}` | 42 |