commit 87ad4aceb8900a6d217748e4fea96df2c9170938
parent ed1a327c76414859d1bfaaf2dc037e11f7c7977e
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Sun, 26 Apr 2026 21:12:57 -0700
cc: add CC-PUNCHLIST.md — TDD checklist of red codegen capabilities
Engineer-facing checklist organized for the CC-INTERNALS §Feature
workflow (cc-cg fixture → cg impl → cc-parse fixture → parse impl).
Items grouped by area: width-correct integer codegen, lvalue mechanics,
sizeof, aggregates, initializers, control flow, variadics, conditionals
as values, storage classes, driver/envelope, expressions and
conversions, aggregates round 2. Linked from cc/README.md as the
fourth item in the reading order.
Diffstat:
2 files changed, 398 insertions(+), 0 deletions(-)
diff --git a/cc/README.md b/cc/README.md
@@ -11,6 +11,8 @@ Read in this order:
2. [docs/CC-INTERNALS.md](../docs/CC-INTERNALS.md) — module interfaces.
3. [docs/CC-CONTRACTS.md](../docs/CC-CONTRACTS.md) — frozen alphabets, ABI,
test formats, mangling, phase-1 milestone.
+4. [docs/CC-PUNCHLIST.md](../docs/CC-PUNCHLIST.md) — TDD checklist of
+ codegen capabilities still red.
## Files
diff --git a/docs/CC-PUNCHLIST.md b/docs/CC-PUNCHLIST.md
@@ -0,0 +1,396 @@
+# CC codegen punch list
+
+C99-subset codegen capabilities, ordered for red→green TDD per
+[CC-INTERNALS.md §Feature workflow](CC-INTERNALS.md#feature-workflow).
+The accepted language surface is defined in [CC.md](CC.md); this doc
+is the implementation checklist against that surface.
+
+## Conventions
+
+- Every item has up to three runtime-validated fixtures:
+ - **cg**: `tests/cc-cg/<n>-name.scm` — drives the cg API directly.
+ - **parse**: `tests/cc-parse/<n>-name.c` — exercises the same shape
+ via real C.
+ - **e2e**: `tests/cc-e2e/<n>-name.c` — only for capabilities that
+ stress the full envelope (driver, multi-fn, libc).
+- Acceptance: `make test SUITE=cc-cg` (then cc-parse, then cc-e2e)
+ green on all three arches. The runner asserts `.expected-exit`
+ (default `0`) and `.expected` stdout (default empty).
+- Land cg work + cg fixture in one PR; parse work + parse fixture in
+ the next. Don't block on parse to start cg.
+- Pick the next free `<n>` per suite. cc-cg + cc-parse currently end
+ at 14; cc-e2e at 00.
+- Status legend: `[ ]` red · `[~]` partial · `[x]` green.
+
+## Already green
+
+cc-cg 00–14 + cc-parse 00–14 + cc-e2e 00 cover: empty fn, `return`
+with const/param, two-param fn, i64 binops, locals + assign, `if` /
+`if-else`, `while` with `break` / `continue`, direct calls (0..5
+args, with stack staging), string literal interning, file-scope
+zero-init globals, `&x` on a param, typedef plumbing through to a
+return.
+
+## Punch list
+
+### A. Width-correct integer codegen
+
+The 64-bit-everything load/store path is the largest correctness gap,
+and nearly every later section depends on fixing it. Land this first.
+
+- [ ] **`char` (8-bit) load/store via lval**
+ - cg: `cc-cg/NN-char-roundtrip.scm` — store `0xAA` into a 1-byte slot,
+ load it, exit with the low 8 bits → exit 170.
+ - parse: `cc-parse/NN-char-arith.c` — `unsigned char a = 0xAA; return a;`
+ → exit 170.
+ - Needs: `%cg-emit-ld` / `%cg-emit-st` dispatch on `ctype-size` to
+ `%ldb` / `%stb` (and matching libp1pp helpers if absent).
+
+- [ ] **`short` (16-bit) load/store via lval**
+ - cg: `cc-cg/NN-short-roundtrip.scm`
+ - parse: `cc-parse/NN-short-arith.c`
+ - Needs: `%ldh` / `%sth` paths.
+
+- [ ] **`int` (32-bit) load/store via lval**
+ - cg: `cc-cg/NN-int-roundtrip.scm` — distinct from cc-cg/04 because it
+ forces a 4-byte slot, not an 8-byte spill.
+ - parse: `cc-parse/NN-int-arith.c`
+ - Needs: `%ldw` / `%stw`. Existing fixtures pass because cg always
+ spills i32 results into 8-byte slots.
+
+- [ ] **Signed narrowing keeps sign on re-widen**
+ - cg: `cc-cg/NN-sext-narrow.scm` — `(unsigned)(int)(char)-3` → exit 253.
+ - parse: `cc-parse/NN-sext-narrow.c`
+ - Needs: `cg-cast` emits sign-extend on the narrow path (or signed
+ `%lds*` loads); `cg-promote` emits sext when source rank < int.
+ - Caveat: the exit status carries only the low 8 bits, and those are
+ 253 under both sign- and zero-extension, so the fixture must also
+ observe a bit above bit 7 (e.g. shift the widened value right by 8
+ before returning) to catch a missing sext.
+
+- [ ] **Unsigned narrowing zero-extends**
+ - cg: `cc-cg/NN-zext-narrow.scm` — `(unsigned)(unsigned char)-3` → 253.
+ - parse: `cc-parse/NN-zext-narrow.c`
+ - Needs: `cg-cast` zero-fill on unsigned target.
+
+- [ ] **Integer promotion preserves sign across operations**
+ - cg: `cc-cg/NN-promote-sign.scm` — operate on a `signed char` slot
+ holding `-1`; promote, add 1, return 0.
+ - parse: `cc-parse/NN-promote-sign.c`
+ - Needs: `cg-promote` is currently relabel-only; emit sext for
+ `i8`/`i16` sources.
+ - Caveat: a zero-extended `-1` gives `255 + 1 = 256`, which also exits
+ 0 after truncation to 8 bits; return `sum != 0` (or similar) so a
+ missing sext fails the fixture.
+
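The two narrowing behaviours in §A can be modelled in plain C. This is an illustrative sketch, not the cg implementation: `narrow_sext8`, `narrow_zext8`, and `exit_status` are hypothetical names, and `exit_status` assumes the runner sees only the low 8 bits of the value returned from `main`, as the 170 and 253 expectations suggest.

```c
#include <stdint.h>

/* Hypothetical model: take the low byte of a 64-bit register and
 * sign-extend it back to 64 bits (what a signed 8-bit load should do). */
static int64_t narrow_sext8(uint64_t reg) {
    uint8_t low = (uint8_t)reg;
    /* Computed arithmetically to avoid relying on an
     * implementation-defined unsigned-to-signed conversion. */
    return (low & 0x80u) ? (int64_t)low - 256 : (int64_t)low;
}

/* Hypothetical model: take the low byte and zero-extend it. */
static int64_t narrow_zext8(uint64_t reg) {
    return (int64_t)(reg & 0xFFu);
}

/* Assumption: the test runner observes only the low 8 bits. */
static unsigned exit_status(int64_t ret) {
    return (unsigned)((uint64_t)ret & 0xFFu);
}
```

For the byte `0xFD`, `narrow_sext8` yields -3 and `narrow_zext8` yields 253, while `exit_status` of either is 253.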
+### B. Lvalue mechanics
+
+`cg-take-addr` does not preserve the original lval, so any operation
+that needs to *use* an lvalue twice (compound assign, inc/dec) is
+broken. Pick one fix and document it in
+[CC-CONTRACTS §4.1](CC-CONTRACTS.md#41-parsers-responsibilities) row
+"`lhs += rhs`":
+
+- (a) `cg-take-addr` leaves `[orig-lval, ptr-rval]`; or
+- (b) introduce `cg-dup` (duplicate top vstack entry).
+
+- [ ] **Pre-`++` / pre-`--`**
+ - cg: `cc-cg/NN-preinc.scm` — `int x = 5; ++x; return x;` → exit 6.
+ - parse: `cc-parse/NN-preinc.c`
+ - Needs: lhs preservation per above.
+
+- [ ] **Post-`++` / post-`--` returns old value**
+ - cg: `cc-cg/NN-postinc.scm` — `int x=5; int y=x++; return x*10+y;`
+ → exit 65.
+ - parse: `cc-parse/NN-postinc.c`
+ - Needs: `cg-postinc` / `cg-postdec`, or parser uses `cg-dup` to
+ keep the old rval before the store.
+
+- [ ] **Compound assignment on simple lval (`+= -= *= /= %= <<= >>= &= ^= |=`)**
+ - cg: `cc-cg/NN-cmpd-simple.scm` — `int x=7; x+=3; return x;` → exit 10.
+ - parse: `cc-parse/NN-cmpd-simple.c` — one fixture per op family is
+ fine; the cg primitives are shared.
+ - Needs: same lhs preservation; existing parser sequence (take-addr,
+ push-deref, load, rhs, arith-conv, binop, assign) works once
+ preservation is in.
+
+- [ ] **Compound assignment through pointer**
+ - cg: `cc-cg/NN-cmpd-ptr.scm` — `int x=7; int *p=&x; *p+=3; return x;`
+ - parse: `cc-parse/NN-cmpd-ptr.c`
+ - Needs: validates the indirect-slot path in `cg-assign`.
+
+- [ ] **`*p++` walking an array**
+ - cg: `cc-cg/NN-deref-postinc.scm` — sums a 3-element array.
+ - parse: `cc-parse/NN-deref-postinc.c`
+ - Needs: composes post-`++` (above) with pointer-arithmetic scaling.
+
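Option (b) from §B can be pictured with a toy value stack in C. This is a sketch under assumptions: `VStack`, `Opnd`, and the `vs_*` helpers are hypothetical stand-ins for the cg vstack, and `vs_dup` plays the role of the proposed `cg-dup`. It only models `int` slots and the `+=` case.

```c
#include <assert.h>

enum { VSTACK_MAX = 8 };

/* One value-stack entry: either an lvalue (address) or an rvalue. */
typedef struct {
    int is_lval;
    int *addr;   /* valid when is_lval */
    int val;     /* valid when !is_lval */
} Opnd;

typedef struct {
    Opnd items[VSTACK_MAX];
    int depth;
} VStack;

static void vs_push(VStack *vs, Opnd o) {
    assert(vs->depth < VSTACK_MAX);
    vs->items[vs->depth++] = o;
}

static Opnd vs_pop(VStack *vs) {
    assert(vs->depth > 0);
    return vs->items[--vs->depth];
}

static void vs_push_lval(VStack *vs, int *addr) {
    Opnd o = { 1, addr, 0 };
    vs_push(vs, o);
}

static void vs_push_imm(VStack *vs, int v) {
    Opnd o = { 0, 0, v };
    vs_push(vs, o);
}

/* Duplicate the top entry so the same lvalue can be used twice. */
static void vs_dup(VStack *vs) {
    assert(vs->depth > 0);
    vs_push(vs, vs->items[vs->depth - 1]);
}

/* Replace the lvalue on top with the value it currently holds. */
static void vs_load(VStack *vs) {
    Opnd o = vs_pop(vs);
    assert(o.is_lval);
    vs_push_imm(vs, *o.addr);
}

static void vs_add(VStack *vs) {
    Opnd rhs = vs_pop(vs);
    Opnd lhs = vs_pop(vs);
    assert(!lhs.is_lval && !rhs.is_lval);
    vs_push_imm(vs, lhs.val + rhs.val);
}

/* Pop [lval, rval], store, and leave the stored value as the result. */
static void vs_assign(VStack *vs) {
    Opnd val = vs_pop(vs);
    Opnd dst = vs_pop(vs);
    assert(dst.is_lval && !val.is_lval);
    *dst.addr = val.val;
    vs_push_imm(vs, val.val);
}

/* x += rhs as: push x, dup, load, push rhs, add, assign. */
static int compound_add(int *x, int rhs) {
    VStack vs = { .depth = 0 };
    vs_push_lval(&vs, x);
    vs_dup(&vs);
    vs_load(&vs);
    vs_push_imm(&vs, rhs);
    vs_add(&vs);
    vs_assign(&vs);
    return vs_pop(&vs).val;
}
```

Without the `vs_dup` step the load would consume the only copy of the lvalue, leaving `vs_assign` with nothing to store into. That is the failure the items in this section describe.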
+### C. `sizeof`
+
+- [ ] **`sizeof e` returns the type's actual size**
+ - parse: `cc-parse/NN-sizeof-expr.c` — `int x; return sizeof x;` → 4.
+ - Needs: parser peeks `(opnd-type (cg-top …))`, computes size, pops,
+ pushes `imm u64 size`. Today it always returns 8
+ (`parse.scm` line ~836).
+
+- [ ] **`sizeof` over struct, array, pointer, char**
+ - parse: `cc-parse/NN-sizeof-types.c` — sum of representative sizes
+ against a known integer.
+
+### D. Aggregates
+
+- [ ] **Struct member load**
+ - cg: `cc-cg/NN-struct-load.scm` — pushes a struct frame lval at
+ offset, loads field-typed value.
+ - parse: `cc-parse/NN-struct-load.c` — `struct S {int a; int b;}; struct S s;
+ s.a=1; s.b=2; return s.a + s.b*10;` → exit 21.
+ - Needs: `cg-push-field cg fname` — pop struct/union lval, look up
+ `fname` in `ctype-ext`'s `(tag complete? fields)`, push frame
+ lval at the right offset with the field's ctype. Replaces the
+ parser stub at `parse.scm` lines 947–960 that ignores the field
+ name and uses offset 0.
+
+- [ ] **Struct member store**
+ - cg: `cc-cg/NN-struct-store.scm`
+ - parse: `cc-parse/NN-struct-store.c`
+ - Needs: same primitive plus width-correct stores from §A.
+
+- [ ] **Pointer-to-struct (`p->x`)**
+ - cg: `cc-cg/NN-arrow.scm`
+ - parse: `cc-parse/NN-arrow.c`
+ - Needs: parser does ptr → deref → field via `cg-push-field`.
+
+- [ ] **Nested struct access (`s.inner.x`, `s->inner.x`)**
+ - parse: `cc-parse/NN-struct-nested.c`
+
+- [ ] **Array element access at non-zero index**
+ - cg: `cc-cg/NN-array-index.scm` — `int a[3]; a[0]=1; a[1]=2; a[2]=4;
+ return a[0]+a[1]+a[2];` → exit 7.
+ - parse: `cc-parse/NN-array-index.c`
+ - Needs: array lval decays to ptr-rval (in `cg-push-sym` or via a
+ new `cg-decay-array`); verify scaling for `arr` types in
+ `cg-binop add`.
+
+- [ ] **Multi-dim arrays**
+ - parse: `cc-parse/NN-array-2d.c`
+ - Needs: derived `arr (arr T N) M`; verify size/align/decay.
+
+- [ ] **Struct passed by pointer to a function**
+ - parse: `cc-parse/NN-struct-fn-arg.c` — passes `&s`.
+ - Needs: nothing new; smoke-tests §D primitives.
+
+ *Pass-by-value of structs is outside CC.md's accepted set; tcc.c
+ doesn't use it.*
+
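The effect of looking a field up by name, versus the offset-0 stub described above, can be sketched in C. The names `Field`, `field_offset`, and `run_fixture` are hypothetical; the real primitive would consult the struct's `ctype-ext` field list rather than a static table.

```c
#include <stddef.h>
#include <string.h>

struct S { int a; int b; };

/* Hypothetical field table for struct S. */
typedef struct {
    const char *name;
    size_t offset;
} Field;

static const Field s_fields[] = {
    { "a", offsetof(struct S, a) },
    { "b", offsetof(struct S, b) },
};

/* Look a field up by name; returns its byte offset, or (size_t)-1. */
static size_t field_offset(const char *name) {
    for (size_t i = 0; i < sizeof s_fields / sizeof s_fields[0]; i++)
        if (strcmp(s_fields[i].name, name) == 0)
            return s_fields[i].offset;
    return (size_t)-1;
}

static void store_int(unsigned char *base, size_t off, int v) {
    memcpy(base + off, &v, sizeof v);
}

static int load_int(const unsigned char *base, size_t off) {
    int v;
    memcpy(&v, base + off, sizeof v);
    return v;
}

/* Models: s.a = 1; s.b = 2; return s.a + s.b * 10;
 * use_lookup == 0 mimics a stub that always uses offset 0. */
static int run_fixture(int use_lookup) {
    unsigned char frame[sizeof(struct S)] = {0};
    size_t off_a = use_lookup ? field_offset("a") : 0;
    size_t off_b = use_lookup ? field_offset("b") : 0;
    store_int(frame, off_a, 1);
    store_int(frame, off_b, 2);
    return load_int(frame, off_a) + load_int(frame, off_b) * 10;
}
```

With the lookup the model returns 21. With every field at offset 0 the two stores alias and it returns 22, so the parse fixture's expected exit of 21 does catch the stub.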
+### E. Initializers
+
+`parse-init-list` (`parse.scm` lines 398–413) currently balances
+braces and returns `#f`, dropping all initializer data. `cg-emit-global`
+accepts an init bv but is never given one.
+
+- [ ] **Scalar global with constant initializer**
+ - cg: `cc-cg/NN-init-scalar-global.scm` — emit `int g = 42` via cg
+ API; in `main`, return g.
+ - parse: `cc-parse/NN-init-scalar-global.c`
+ - Needs: parser builds an N-byte LE bv from the const expression and
+ passes to `cg-emit-global`.
+
+- [ ] **Scalar global with address initializer (`int *p = &x;`)**
+ - cg: `cc-cg/NN-init-addr.scm`
+ - parse: `cc-parse/NN-init-addr.c`
+ - Needs: `cg-emit-global` accepts a structured init (bytes +
+ label-references) and emits `&label` form to `cg-data`.
+
+- [ ] **Array global from element list**
+ - cg: `cc-cg/NN-init-array-list.scm` — `int a[3] = {1,2,4};`
+ - parse: `cc-parse/NN-init-array-list.c`
+
+- [ ] **Array global from string literal**
+ - parse: `cc-parse/NN-init-array-str.c` — `char s[]="abc"; return s[1];`
+ → exit 98.
+
+- [ ] **Struct global, positional init**
+ - parse: `cc-parse/NN-init-struct-pos.c`
+
+- [ ] **Struct global, designated init (`.field = …`)**
+ - parse: `cc-parse/NN-init-struct-desig.c`
+ - Needs: required by tcc.c per CC.md §Variable initializers.
+
+- [ ] **Local array initializer**
+ - parse: `cc-parse/NN-init-local-array.c`
+ - Needs: parser emits per-element store sequence into the frame slot.
+
+- [ ] **Local struct initializer**
+ - parse: `cc-parse/NN-init-local-struct.c`
+
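The "N-byte LE bv" that a scalar initializer needs is a small computation. A minimal sketch in C, with hypothetical helper names `encode_le` and `decode_le` (the real parser would build a Scheme bytevector and hand it to `cg-emit-global`):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the low n bytes of v into out, least-significant byte first. */
static void encode_le(uint64_t v, size_t n, uint8_t *out) {
    assert(n <= 8);
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint8_t)(v & 0xFFu);
        v >>= 8;
    }
}

/* Inverse of encode_le, used here only to check the round trip. */
static uint64_t decode_le(const uint8_t *in, size_t n) {
    assert(n <= 8);
    uint64_t v = 0;
    for (size_t i = n; i > 0; i--)
        v = (v << 8) | in[i - 1];
    return v;
}
```

For `int g = 42` this yields the four bytes `2A 00 00 00`; a negative constant such as `-3` is truncated to the slot width, giving `FD FF FF FF`.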
+### F. Control flow extensions
+
+- [ ] **`do { } while (e);`**
+ - cg: `cc-cg/NN-do-while.scm`
+ - parse: `cc-parse/NN-do-while.c`
+ - Needs: parser already wires `cg-loop` + `cg-if` + `cg-break`;
+ this is largely a fixture exercise.
+
+- [ ] **`for (init; cond; step)` with declaration in `init`**
+ - parse: `cc-parse/NN-for-decl.c`
+ - Needs: existing `parse-for-stmt` exercised end-to-end.
+
+- [ ] **`switch / case / default` with fall-through**
+ - cg: `cc-cg/NN-switch.scm` — three cases falling through to default.
+ - parse: `cc-parse/NN-switch.c`
+ - Needs: validates the existing `swctx` machinery in cg.
+
+- [ ] **`goto` / labelled statement (forward and backward)**
+ - cg: `cc-cg/NN-goto.scm`
+ - parse: `cc-parse/NN-goto.c`
+ - Needs: replace the `cg-break` hack in `parse-goto-stmt`. Add
+ `cg-emit-label cg name-bv` (drops `::user_<name>`) and
+ `cg-goto cg name-bv` (emits `%b(&::user_<name>)`).
+ `parse-labelled-stmt` calls `cg-emit-label` before the inner stmt.
+
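As a reference for what the `switch` and `goto` fixtures have to exercise, here is a hedged sketch of the C shapes involved. The function names and the specific constants are illustrative assumptions, not the contents of the planned fixture files.

```c
/* Every case falls through to default, so the entry point decides
 * how many of the additions run. */
static int fallthrough(int x) {
    int r = 0;
    switch (x) {
    case 0:
        r += 1;
        /* falls through */
    case 1:
        r += 2;
        /* falls through */
    case 2:
        r += 4;
        /* falls through */
    default:
        r += 8;
    }
    return r;
}

/* One backward jump (to top) and one forward jump (to done). */
static int count_with_goto(int n) {
    int i = 0;
top:
    if (i >= n)
        goto done;
    i++;
    goto top;
done:
    return i;
}
```

Entering at `case 0` accumulates 15, at `case 1` 14, at `case 2` 12, and any other value reaches only `default` and gives 8.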
+### G. Variadics
+
+- [ ] **Variadic call: per-arg default-promote**
+ - cg: `cc-cg/NN-vararg-call.scm`
+ - parse: `cc-parse/NN-vararg-call.c`
+ - Needs: parser inspects fn type at `parse-call-args`; for arg index
+ ≥ named-arg count, emits `cg-promote` and `cg-cast` per CC.md
+ §Implicit conversions.
+
+- [ ] **Variadic receive: `__builtin_va_start/arg/end`**
+ - cg: `cc-cg/NN-vararg-recv.scm` — sums N int-typed variadic args.
+ - parse: `cc-parse/NN-vararg-recv.c`
+ - Needs: `cg-va-start cg ap-lval`, `cg-va-arg cg ap-lval ctype`,
+ `cg-va-end cg ap-lval`. Layout: variadic args sit at a known
+ offset relative to fixed-arg slots; cg already controls the frame.
+ - Also needs: a bundled `stdarg.h` (CC.md §Standard library
+ expectations — "supplied by us").
+
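The behaviour both variadic fixtures aim at can be stated with the host compiler's standard `<stdarg.h>`. This sketch uses the standard `va_start` / `va_arg` / `va_end` macros rather than the `__builtin_*` forms or the bundled header the item calls for; `sum_ints` and `sum_promoted` are hypothetical names.

```c
#include <stdarg.h>

/* Receive side: sum `count` int-typed variadic arguments. */
static int sum_ints(int count, ...) {
    va_list ap;
    int total = 0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Call side: char and short arguments in the variadic position are
 * default-promoted to int before the call, so reading them back with
 * va_arg(ap, int) is well defined. */
static int sum_promoted(void) {
    char a = 1;
    short b = 2;
    return sum_ints(3, a, b, 4);
}
```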
+### H. Conditionals as values
+
+`cg-ifelse` is correct for `if`-statements (thunks push nothing) but
+leaks two opnds when both thunks push (ternary, `&&`, `||`). The fix
+is a result-merging primitive: caller pre-allocates the result slot,
+both branches store into it, vstack ends with one frame opnd.
+
+- [ ] **Ternary `?:` leaves exactly one rval**
+ - cg: `cc-cg/NN-ternary.scm` — `int x = c ? 1 : 2; return x;` → exit 1.
+ - parse: `cc-parse/NN-ternary.c`
+ - Needs: result-merging primitive (`cg-ifelse-merge` or similar);
+ parser passes the result type, cg allocates the slot.
+
+- [ ] **`&&` short-circuit leaves exactly one i32 rval**
+ - cg: `cc-cg/NN-land.scm`
+ - parse: `cc-parse/NN-land.c`
+ - Needs: same merging primitive; result type is `%t-i32`
+ irrespective of operands.
+
+- [ ] **`||` short-circuit leaves exactly one i32 rval**
+ - cg: `cc-cg/NN-lor.scm`
+ - parse: `cc-parse/NN-lor.c`
+
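The result-merging shape described at the top of this section can be written out in C: one slot allocated before the branch, one store per arm, one value afterwards. This is a sketch of the lowering's intent with hypothetical names (`ternary_merged`, `land_merged`, `lor_merged`, and the `rhs_nonzero` probe); it says nothing about how `cg-ifelse-merge` would actually be implemented.

```c
/* Probe that counts evaluations, to make short-circuiting visible. */
static int rhs_calls = 0;

static int rhs_nonzero(void) {
    rhs_calls++;
    return 7;
}

/* c ? a : b with both arms storing into one pre-allocated slot. */
static int ternary_merged(int c, int a, int b) {
    int result;
    if (c)
        result = a;
    else
        result = b;
    return result;
}

/* lhs && rhs(): rhs runs only when lhs is non-zero; result is 0 or 1. */
static int land_merged(int lhs, int (*rhs)(void)) {
    int result;
    if (lhs)
        result = (rhs() != 0);
    else
        result = 0;
    return result;
}

/* lhs || rhs(): rhs runs only when lhs is zero; result is 0 or 1. */
static int lor_merged(int lhs, int (*rhs)(void)) {
    int result;
    if (lhs)
        result = 1;
    else
        result = (rhs() != 0);
    return result;
}
```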
+### I. Storage classes
+
+- [ ] **Block-scope `static` lives in bss/data, not on the stack**
+ - cg: `cc-cg/NN-block-static.scm` — counter that survives across calls.
+ - parse: `cc-parse/NN-block-static.c`
+ - Needs: `parse.scm` `handle-decl` checks `sto = 'static'` *before*
+ branching on `(ps-fn-ctx ps)` and routes static block-scope to
+ `cg-emit-global`. Mangling adds the function name to avoid
+ cross-function collisions (e.g. `cc__<fn>__<var>`).
+
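A short C sketch of the two things this item asks for: a block-scope `static` whose value survives across calls, and a mangled name that includes the enclosing function. `next_id` and `mangle_block_static` are hypothetical names, and the `cc__<fn>__<var>` pattern is taken from the example above rather than from a settled contract.

```c
#include <stdio.h>

/* The counter lives in static storage, so it persists across calls. */
static int next_id(void) {
    static int counter = 0;
    return ++counter;
}

/* Build "cc__<fn>__<var>" into out.
 * Returns 0 on success, -1 if the buffer is too small. */
static int mangle_block_static(char *out, size_t cap,
                               const char *fn, const char *var) {
    int n = snprintf(out, cap, "cc__%s__%s", fn, var);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Two functions that each declare `static int counter` then map to distinct labels such as `cc__f__counter` and `cc__g__counter`.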
+### J. Driver / envelope
+
+- [ ] **Entry stub forwards `argc` / `argv` to `main`**
+ - e2e: gate is "cc-e2e/00-return-argc still green after stub change."
+ - Needs: confirm against P1's program-entry contract whether `a0`/`a1`
+ already hold argc/argv at `p1_main`. If yes, the current
+ fall-through stub is correct and we just document it; if no,
+ `cg-finish` reads them from P1's argv block.
+
+- [ ] **`int main()` falling off the end returns 0**
+ - parse: `cc-parse/NN-main-noret.c` — `int main(){}` → exit 0.
+ - Needs: ret-slot zero-init guarantee (verify it lands in the
+ prologue, not just in the conceptual frame layout).
+
+- [ ] **Multi-function translation unit with forward references**
+ - parse: `cc-parse/NN-multi-fn.c`
+
+### K. Expressions and conversions
+
+- [ ] **Comma operator (`a, b` as expression)**
+ - parse: `cc-parse/NN-comma.c` — `int a; int b; (a=1, b=2); return a + b*10;`
+ → exit 21.
+ - Needs: add `comma` to `%binop-bp` at lowest precedence, left-assoc.
+ Handler discards lhs (`cg-pop`) before evaluating rhs. tcc.c uses
+ this in `for` headers.
+
+- [ ] **Function-pointer call**
+ - cg: `cc-cg/NN-fnptr-call.scm` — push a fn-typed sym, spill to a
+ frame slot, reload, call.
+ - parse: `cc-parse/NN-fnptr-call.c` — `int (*fp)(int) = f; return fp(41);`
+ → exit 42.
+ - Needs: exercises `cg-call`'s `%callr(t0)` branch; verify
+ return-type extraction walks `ptr → fn → ret` correctly.
+
+- [ ] **Enum constant in expressions**
+ - parse: `cc-parse/NN-enum-const.c` — `enum E { A=1, B=10 }; return A+B;`
+ → exit 11.
+ - Needs: existing `cg-push-sym` `'enum-const` branch; just a fixture.
+
+- [ ] **`void *` ↔ `T *` implicit conversion (no cast required)**
+ - parse: `cc-parse/NN-voidptr-impl.c` — `void *p; int x=42; p=&x;
+ int *q=p; return *q;` → exit 42.
+ - Needs: parser accepts both directions at assignment, return, and
+ call sites without an explicit cast. cg's relabel-only path
+ between pointer types already supports it.
+
+- [ ] **Implicit narrowing of fixed-arg call arguments to declared
+ param type**
+ - parse: `cc-parse/NN-call-narrow.c` — `int f(unsigned char x){return x;}
+ int main(){ return f(258); }` → exit 2.
+ - Needs: `parse-call-args` emits `cg-cast` per fixed arg to the
+ declared param type (variadic args are §G.1).
+
+- [ ] **Pointer comparison is unsigned**
+ - cg: `cc-cg/NN-ptr-cmp.scm` — verify two frame-slot pointers compare
+ via `ltu`.
+ - parse: `cc-parse/NN-ptr-cmp.c` — `int a[2]; return &a[1] > &a[0];`
+ → exit 1.
+ - Needs: confirms `cg-binop`'s `lt/le/gt/ge` dispatch picks the
+ unsigned variant when either operand is `ptr` or `arr`. Likely
+ already correct; locks it in.
+
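Why the pointer-comparison item insists on the unsigned variant can be shown with raw 64-bit addresses. In this sketch `gt_unsigned` and `gt_signed` are hypothetical helpers; `gt_signed` reproduces two's-complement signed ordering by flipping the sign bit, which avoids any implementation-defined conversion in the example itself.

```c
#include <stdint.h>

#define SIGN_BIT ((uint64_t)1 << 63)

/* Address comparison as it should be done: unsigned. */
static int gt_unsigned(uint64_t a, uint64_t b) {
    return a > b;
}

/* What a signed 64-bit compare would report on the same bits.
 * Flipping the sign bit maps two's-complement order onto unsigned
 * order, so this is equivalent to comparing as int64_t. */
static int gt_signed(uint64_t a, uint64_t b) {
    return (a ^ SIGN_BIT) > (b ^ SIGN_BIT);
}
```

The two agree for addresses in the lower half of the address space and disagree once one operand has its top bit set, which a fixture using only small frame addresses would not reveal.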
+### L. Aggregates round 2
+
+- [ ] **Flexible array member as last struct field**
+ - parse: `cc-parse/NN-flex-array.c` — `struct s { int n; int data[]; };`
+ indexed via a global instance plus malloc-extra padding.
+ tcc.c's `Sym` / `TokenSym` rely on this.
+ - Needs: parser accepts `T name[]` only as last field; `complete-agg!`
+ sets `ctype-size` to the offset of the flex member (excludes its
+ extent); `cg-push-field` for the flex member returns an `arr`-
+ typed lval that decays to `ptr` on use.
+
+- [ ] **`T[]` in parameter position decays to `T *`**
+ - parse: `cc-parse/NN-array-param-decay.c` — `int sum(int a[], int n)
+ { int s=0; for(int i=0;i<n;i++) s+=a[i]; return s; }` → known sum.
+ - Needs: parser detects `arr` ctype in fn-param position and
+ rewrites to `ptr` before slot allocation. cg sees a pointer and
+ needs no special handling.
+
+- [ ] **Array of function pointers initialized with named functions**
+ - parse: `cc-parse/NN-fnptr-tab.c` — `int f1(){return 1;}
+ int f2(){return 2;} int (*tab[])() = {f1, f2};
+ return tab[0]() + tab[1]()*10;` → exit 21.
+ - Needs: composes §E.3 (array list init) with §E.2 (address init);
+ parser admits a fn name as an initializer expression that
+ evaluates to a label reference.
+
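For reference, the flexible-array-member and array-parameter-decay shapes look like this in standard C99. It is an illustrative sketch with hypothetical names (`struct vec`, `vec_new`, `sum`), not the planned fixture source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Flexible array member: data[] contributes nothing to sizeof. */
struct vec {
    int n;
    int data[];
};

/* Allocate the header plus room for n trailing ints, filled 1..n.
 * Returns NULL on allocation failure; the caller frees the result. */
static struct vec *vec_new(int n) {
    assert(n >= 0);
    struct vec *v = malloc(sizeof *v + (size_t)n * sizeof v->data[0]);
    if (!v)
        return NULL;
    v->n = n;
    for (int i = 0; i < n; i++)
        v->data[i] = i + 1;
    return v;
}

/* `int a[]` in parameter position is adjusted to `int *a`. */
static int sum(int a[], int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

Passing `v->data` to `sum` exercises both items at once: the flex member is an array-typed lvalue that decays to a pointer at the call.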
+## Phase milestones (CC.md §Validation)
+
+The CC.md milestones gate on contiguous blocks above. Each lights up
+once its dependencies are green:
+
+- [ ] **Self-test sweep** (cc-e2e mirroring tests/scheme1) — depends on §A,
+ §B, §C, §F, §H.
+- [ ] **Hand-written hello-world ELF** — depends on §G, §I, §J + a
+ string-formatting libc surface.
+- [ ] **Compile mes libc `unified-libc.c`** — depends on §D, §E.
+- [ ] **Compile tcc.c (tcc-mes defines)** — depends on everything above.
+- [ ] **tcc-lispcc builds tcc-boot0**; checksum matches live-bootstrap.
+
+The last is the bootstrap milestone — at that point lispcc has fully
+replaced MesCC in the chain.