boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

commit 87ad4aceb8900a6d217748e4fea96df2c9170938
parent ed1a327c76414859d1bfaaf2dc037e11f7c7977e
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sun, 26 Apr 2026 21:12:57 -0700

cc: add CC-PUNCHLIST.md — TDD checklist of red codegen capabilities

Engineer-facing checklist organized for the CC-INTERNALS §Feature
workflow (cc-cg fixture → cg impl → cc-parse fixture → parse impl).
Items grouped by area: width-correct integer codegen, lvalue mechanics,
sizeof, aggregates, initializers, control flow, variadics, conditionals
as values, storage classes, driver/envelope, expressions and
conversions, aggregates round 2. Linked from cc/README.md as the
fourth item in the reading order.

Diffstat:
M cc/README.md         |   2 ++
A docs/CC-PUNCHLIST.md | 396 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 398 insertions(+), 0 deletions(-)

diff --git a/cc/README.md b/cc/README.md
@@ -11,6 +11,8 @@ Read in this order:
 2. [docs/CC-INTERNALS.md](../docs/CC-INTERNALS.md) — module interfaces.
 3. [docs/CC-CONTRACTS.md](../docs/CC-CONTRACTS.md) — frozen alphabets,
    ABI, test formats, mangling, phase-1 milestone.
+4. [docs/CC-PUNCHLIST.md](../docs/CC-PUNCHLIST.md) — TDD checklist of
+   codegen capabilities still red.
 
 ## Files
 
diff --git a/docs/CC-PUNCHLIST.md b/docs/CC-PUNCHLIST.md
@@ -0,0 +1,396 @@
+# CC codegen punch list
+
+C99-subset codegen capabilities, ordered for red→green TDD per
+[CC-INTERNALS.md §Feature workflow](CC-INTERNALS.md#feature-workflow).
+The accepted language surface is defined in [CC.md](CC.md); this doc
+is the implementation checklist against that surface.
+
+## Conventions
+
+- Every item has up to three runtime-validated fixtures:
+  - **cg**: `tests/cc-cg/<n>-name.scm` — drives the cg API directly.
+  - **parse**: `tests/cc-parse/<n>-name.c` — exercises the same shape
+    via real C.
+  - **e2e**: `tests/cc-e2e/<n>-name.c` — only for capabilities that
+    stress the full envelope (driver, multi-fn, libc).
+- Acceptance: `make test SUITE=cc-cg` (then cc-parse, then cc-e2e)
+  green on all three arches. The runner asserts `.expected-exit`
+  (default `0`) and `.expected` stdout (default empty).
+- Land cg work + cg fixture in one PR; parse work + parse fixture in
+  the next. Don't block on parse to start cg.
+- Pick the next free `<n>` per suite. cc-cg + cc-parse currently end
+  at 14; cc-e2e at 00.
+- Status legend: `[ ]` red · `[~]` partial · `[x]` green.
+
+## Already green
+
+cc-cg 00–14 + cc-parse 00–14 + cc-e2e 00 cover: empty fn, `return`
+with const/param, two-param fn, i64 binops, locals + assign, `if` /
+`if-else`, `while` with `break` / `continue`, direct calls (0..5
+args, with stack staging), string literal interning, file-scope
+zero-init globals, `&x` on a param, typedef plumbing through to a
+return.
+
+## Punch list
+
+### A. Width-correct integer codegen
+
+The 64-bit-everything load/store path is the largest correctness gap
+upstream of nearly everything else. Land this first.
+
+- [ ] **`char` (8-bit) load/store via lval**
+  - cg: `cc-cg/NN-char-roundtrip.scm` — store `0xAA` into a 1-byte slot,
+    load it, exit with the low 8 bits → exit 170.
+  - parse: `cc-parse/NN-char-arith.c` — `unsigned char a = 0xAA; return a;`
+    → exit 170.
+  - Needs: `%cg-emit-ld` / `%cg-emit-st` dispatch on `ctype-size` to
+    `%ldb` / `%stb` (and matching libp1pp helpers if absent).
+
+- [ ] **`short` (16-bit) load/store via lval**
+  - cg: `cc-cg/NN-short-roundtrip.scm`
+  - parse: `cc-parse/NN-short-arith.c`
+  - Needs: `%ldh` / `%sth` paths.
+
+- [ ] **`int` (32-bit) load/store via lval**
+  - cg: `cc-cg/NN-int-roundtrip.scm` — distinct from cc-cg/04 because it
+    forces a 4-byte slot, not an 8-byte spill.
+  - parse: `cc-parse/NN-int-arith.c`
+  - Needs: `%ldw` / `%stw`. Existing fixtures pass because cg always
+    spills i32 results into 8-byte slots.
+
+- [ ] **Signed narrowing keeps sign on re-widen**
+  - cg: `cc-cg/NN-sext-narrow.scm` — `(unsigned)(int)(char)-3` → exit 253.
+  - parse: `cc-parse/NN-sext-narrow.c`
+  - Needs: `cg-cast` emits sign-extend on the narrow path (or signed
+    `%lds*` loads); `cg-promote` emits sext when source rank < int.
+
+- [ ] **Unsigned narrowing zero-extends**
+  - cg: `cc-cg/NN-zext-narrow.scm` — `(unsigned)(unsigned char)-3` → 253.
+  - parse: `cc-parse/NN-zext-narrow.c`
+  - Needs: `cg-cast` zero-fill on unsigned target.
+
+- [ ] **Integer promotion preserves sign across operations**
+  - cg: `cc-cg/NN-promote-sign.scm` — operate on a `signed char` slot
+    holding `-1`; promote, add 1, return 0.
+  - parse: `cc-parse/NN-promote-sign.c`
+  - Needs: `cg-promote` is currently relabel-only; emit sext for
+    `i8`/`i16` sources.
+
+### B. Lvalue mechanics
+
+`cg-take-addr` does not preserve the original lval, so any operation
+that needs to *use* an lvalue twice (compound assign, inc/dec) is
+broken. Pick one fix and document it in
+[CC-CONTRACTS §4.1](CC-CONTRACTS.md#41-parsers-responsibilities) row
+"`lhs += rhs`":
+
+- (a) `cg-take-addr` leaves `[orig-lval, ptr-rval]`; or
+- (b) introduce `cg-dup` (duplicate top vstack entry).
+
+- [ ] **Pre-`++` / pre-`--`**
+  - cg: `cc-cg/NN-preinc.scm` — `int x = 5; ++x; return x;` → exit 6.
+  - parse: `cc-parse/NN-preinc.c`
+  - Needs: lhs preservation per above.
+
+- [ ] **Post-`++` / post-`--` returns old value**
+  - cg: `cc-cg/NN-postinc.scm` — `int x=5; int y=x++; return x*10+y;`
+    → exit 65.
+  - parse: `cc-parse/NN-postinc.c`
+  - Needs: `cg-postinc` / `cg-postdec`, or parser uses `cg-dup` to
+    keep the old rval before the store.
+
+- [ ] **Compound assignment on simple lval (`+= -= *= /= %= <<= >>= &= ^= |=`)**
+  - cg: `cc-cg/NN-cmpd-simple.scm` — `int x=7; x+=3; return x;` → exit 10.
+  - parse: `cc-parse/NN-cmpd-simple.c` — one fixture per op family is
+    fine; the cg primitives are shared.
+  - Needs: same lhs preservation; existing parser sequence (take-addr,
+    push-deref, load, rhs, arith-conv, binop, assign) works once
+    preservation is in.
+
+- [ ] **Compound assignment through pointer**
+  - cg: `cc-cg/NN-cmpd-ptr.scm` — `int x=7; int *p=&x; *p+=3; return x;`
+  - parse: `cc-parse/NN-cmpd-ptr.c`
+  - Needs: validates the indirect-slot path in `cg-assign`.
+
+- [ ] **`*p++` walking an array**
+  - cg: `cc-cg/NN-deref-postinc.scm` — sums a 3-element array.
+  - parse: `cc-parse/NN-deref-postinc.c`
+  - Needs: composes B above with pointer arithmetic scaling.
+
+### C. `sizeof`
+
+- [ ] **`sizeof e` returns the type's actual size**
+  - parse: `cc-parse/NN-sizeof-expr.c` — `int x; return sizeof x;` → 4.
+  - Needs: parser peeks `(opnd-type (cg-top …))`, computes size, pops,
+    pushes `imm u64 size`. Today returns 8 always
+    (`parse.scm` line ~836).
+
+- [ ] **`sizeof` over struct, array, pointer, char**
+  - parse: `cc-parse/NN-sizeof-types.c` — sum of representative sizes
+    against a known integer.
+
+### D. Aggregates
+
+- [ ] **Struct member load**
+  - cg: `cc-cg/NN-struct-load.scm` — pushes a struct frame lval at
+    offset, loads field-typed value.
+  - parse: `cc-parse/NN-struct-load.c` — `struct S {int a; int b;}; struct S s;
+    s.a=1; s.b=2; return s.a + s.b*10;` → exit 21.
+  - Needs: `cg-push-field cg fname` — pop struct/union lval, look up
+    `fname` in `ctype-ext`'s `(tag complete? fields)`, push frame
+    lval at the right offset with the field's ctype. Replaces the
+    parser stub at `parse.scm` lines 947–960 that ignores the field
+    name and uses offset 0.
+
+- [ ] **Struct member store**
+  - cg: `cc-cg/NN-struct-store.scm`
+  - parse: `cc-parse/NN-struct-store.c`
+  - Needs: same primitive plus width-correct stores from §A.
+
+- [ ] **Pointer-to-struct (`p->x`)**
+  - cg: `cc-cg/NN-arrow.scm`
+  - parse: `cc-parse/NN-arrow.c`
+  - Needs: parser does ptr → deref → field via `cg-push-field`.
+
+- [ ] **Nested struct access (`s.inner.x`, `s->inner.x`)**
+  - parse: `cc-parse/NN-struct-nested.c`
+
+- [ ] **Array element access at non-zero index**
+  - cg: `cc-cg/NN-array-index.scm` — `int a[3]; a[0]=1; a[1]=2; a[2]=4;
+    return a[0]+a[1]+a[2];` → exit 7.
+  - parse: `cc-parse/NN-array-index.c`
+  - Needs: array lval decays to ptr-rval (in `cg-push-sym` or via a
+    new `cg-decay-array`); verify scaling for `arr` types in
+    `cg-binop add`.
+
+- [ ] **Multi-dim arrays**
+  - parse: `cc-parse/NN-array-2d.c`
+  - Needs: derived `arr (arr T N) M`; verify size/align/decay.
+
+- [ ] **Struct passed by pointer to a function**
+  - parse: `cc-parse/NN-struct-fn-arg.c` — passes `&s`.
+  - Needs: nothing new; smoke-tests §D primitives.
+
+  *Pass-by-value of structs is outside CC.md's accepted set; tcc.c
+  doesn't use it.*
+
+### E. Initializers
+
+`parse-init-list` (`parse.scm` lines 398–413) currently balances
+braces and returns `#f`, dropping all initializer data. `cg-emit-global`
+accepts an init bv but is never given one.
+
+- [ ] **Scalar global with constant initializer**
+  - cg: `cc-cg/NN-init-scalar-global.scm` — emit `int g = 42` via cg
+    API; in `main`, return g.
+  - parse: `cc-parse/NN-init-scalar-global.c`
+  - Needs: parser builds an N-byte LE bv from the const expression and
+    passes to `cg-emit-global`.
+
+- [ ] **Scalar global with address initializer (`int *p = &x;`)**
+  - cg: `cc-cg/NN-init-addr.scm`
+  - parse: `cc-parse/NN-init-addr.c`
+  - Needs: `cg-emit-global` accepts a structured init (bytes +
+    label-references) and emits `&label` form to `cg-data`.
+
+- [ ] **Array global from element list**
+  - cg: `cc-cg/NN-init-array-list.scm` — `int a[3] = {1,2,4};`
+  - parse: `cc-parse/NN-init-array-list.c`
+
+- [ ] **Array global from string literal**
+  - parse: `cc-parse/NN-init-array-str.c` — `char s[]="abc"; return s[1];`
+    → exit 98.
+
+- [ ] **Struct global, positional init**
+  - parse: `cc-parse/NN-init-struct-pos.c`
+
+- [ ] **Struct global, designated init (`.field = …`)**
+  - parse: `cc-parse/NN-init-struct-desig.c`
+  - Needs: required by tcc.c per CC.md §Variable initializers.
+
+- [ ] **Local array initializer**
+  - parse: `cc-parse/NN-init-local-array.c`
+  - Needs: parser emits per-element store sequence into the frame slot.
+
+- [ ] **Local struct initializer**
+  - parse: `cc-parse/NN-init-local-struct.c`
+
+### F. Control flow extensions
+
+- [ ] **`do { } while (e);`**
+  - cg: `cc-cg/NN-do-while.scm`
+  - parse: `cc-parse/NN-do-while.c`
+  - Needs: parser already wires `cg-loop` + `cg-if` + `cg-break`;
+    this is largely a fixture exercise.
+
+- [ ] **`for (init; cond; step)` with declaration in `init`**
+  - parse: `cc-parse/NN-for-decl.c`
+  - Needs: existing `parse-for-stmt` exercised end-to-end.
+
+- [ ] **`switch / case / default` with fall-through**
+  - cg: `cc-cg/NN-switch.scm` — three cases falling through to default.
+  - parse: `cc-parse/NN-switch.c`
+  - Needs: validates the existing `swctx` machinery in cg.
+
+- [ ] **`goto` / labelled statement (forward and backward)**
+  - cg: `cc-cg/NN-goto.scm`
+  - parse: `cc-parse/NN-goto.c`
+  - Needs: replace the `cg-break` hack in `parse-goto-stmt`. Add
+    `cg-emit-label cg name-bv` (drops `::user_<name>`) and
+    `cg-goto cg name-bv` (emits `%b(&::user_<name>)`).
+    `parse-labelled-stmt` calls `cg-emit-label` before the inner stmt.
+
+### G. Variadics
+
+- [ ] **Variadic call: per-arg default-promote**
+  - cg: `cc-cg/NN-vararg-call.scm`
+  - parse: `cc-parse/NN-vararg-call.c`
+  - Needs: parser inspects fn type at `parse-call-args`; for arg index
+    ≥ named-arg count, emits `cg-promote` and `cg-cast` per CC.md
+    §Implicit conversions.
+
+- [ ] **Variadic receive: `__builtin_va_start/arg/end`**
+  - cg: `cc-cg/NN-vararg-recv.scm` — sums N int-typed variadic args.
+  - parse: `cc-parse/NN-vararg-recv.c`
+  - Needs: `cg-va-start cg ap-lval`, `cg-va-arg cg ap-lval ctype`,
+    `cg-va-end cg ap-lval`. Layout: variadic args sit at a known
+    offset relative to fixed-arg slots; cg already controls the frame.
+  - Also needs: a bundled `stdarg.h` (CC.md §Standard library
+    expectations — "supplied by us").
+
+### H. Conditionals as values
+
+`cg-ifelse` is correct for `if`-statements (thunks push nothing) but
+leaks two opnds when both thunks push (ternary, `&&`, `||`). The fix
+is a result-merging primitive: caller pre-allocates the result slot,
+both branches store into it, vstack ends with one frame opnd.
+
+- [ ] **Ternary `?:` leaves exactly one rval**
+  - cg: `cc-cg/NN-ternary.scm` — `int x = c ? 1 : 2; return x;` → exit 1.
+  - parse: `cc-parse/NN-ternary.c`
+  - Needs: result-merging primitive (`cg-ifelse-merge` or similar);
+    parser passes the result type, cg allocates the slot.
+
+- [ ] **`&&` short-circuit leaves exactly one i32 rval**
+  - cg: `cc-cg/NN-land.scm`
+  - parse: `cc-parse/NN-land.c`
+  - Needs: same merging primitive; result type is `%t-i32`
+    irrespective of operands.
+
+- [ ] **`||` short-circuit leaves exactly one i32 rval**
+  - cg: `cc-cg/NN-lor.scm`
+  - parse: `cc-parse/NN-lor.c`
+
+### I. Storage classes
+
+- [ ] **Block-scope `static` lives in bss/data, not on the stack**
+  - cg: `cc-cg/NN-block-static.scm` — counter that survives across calls.
+  - parse: `cc-parse/NN-block-static.c`
+  - Needs: `parse.scm` `handle-decl` checks `sto = 'static'` *before*
+    branching on `(ps-fn-ctx ps)` and routes static block-scope to
+    `cg-emit-global`. Mangling adds the function name to avoid
+    cross-function collisions (e.g. `cc__<fn>__<var>`).
+
+### J. Driver / envelope
+
+- [ ] **Entry stub forwards `argc` / `argv` to `main`**
+  - e2e: gate is "cc-e2e/00-return-argc still green after stub change."
+  - Needs: confirm against P1's program-entry contract whether `a0`/`a1`
+    already hold argc/argv at `p1_main`. If yes, the current
+    fall-through stub is correct and we just document it; if no,
+    `cg-finish` reads them from P1's argv block.
+
+- [ ] **`int main()` falling off the end returns 0**
+  - parse: `cc-parse/NN-main-noret.c` — `int main(){}` → exit 0.
+  - Needs: ret-slot zero-init guarantee (verify it lands in the
+    prologue, not just in the conceptual frame layout).
+
+- [ ] **Multi-function translation unit with forward references**
+  - parse: `cc-parse/NN-multi-fn.c`
+
+### K. Expressions and conversions
+
+- [ ] **Comma operator (`a, b` as expression)**
+  - parse: `cc-parse/NN-comma.c` — `int a; int b; (a=1, b=2); return a + b*10;`
+    → exit 21.
+  - Needs: add `comma` to `%binop-bp` at lowest precedence, left-assoc.
+    Handler discards lhs (`cg-pop`) before evaluating rhs. tcc.c uses
+    this in `for` headers.
+
+- [ ] **Function-pointer call**
+  - cg: `cc-cg/NN-fnptr-call.scm` — push a fn-typed sym, spill to a
+    frame slot, reload, call.
+  - parse: `cc-parse/NN-fnptr-call.c` — `int (*fp)(int) = f; return fp(41);`
+    → exit 42.
+  - Needs: exercises `cg-call`'s `%callr(t0)` branch; verify
+    return-type extraction walks `ptr → fn → ret` correctly.
+
+- [ ] **Enum constant in expressions**
+  - parse: `cc-parse/NN-enum-const.c` — `enum E { A=1, B=10 }; return A+B;`
+    → exit 11.
+  - Needs: existing `cg-push-sym` `'enum-const` branch; just a fixture.
+
+- [ ] **`void *` ↔ `T *` implicit conversion (no cast required)**
+  - parse: `cc-parse/NN-voidptr-impl.c` — `void *p; int x=42; p=&x;
+    int *q=p; return *q;` → exit 42.
+  - Needs: parser accepts both directions at assignment, return, and
+    call sites without an explicit cast. cg's relabel-only path
+    between pointer types already supports it.
+
+- [ ] **Implicit narrowing of fixed-arg call arguments to declared
+  param type**
+  - parse: `cc-parse/NN-call-narrow.c` — `int f(unsigned char x){return x;}
+    int main(){ return f(258); }` → exit 2.
+  - Needs: `parse-call-args` emits `cg-cast` per fixed arg to the
+    declared param type (variadic args are §G.1).
+
+- [ ] **Pointer comparison is unsigned**
+  - cg: `cc-cg/NN-ptr-cmp.scm` — verify two frame-slot pointers compare
+    via `ltu`.
+  - parse: `cc-parse/NN-ptr-cmp.c` — `int a[2]; return &a[1] > &a[0];`
+    → exit 1.
+  - Needs: confirms `cg-binop`'s `lt/le/gt/ge` dispatch picks the
+    unsigned variant when either operand is `ptr` or `arr`. Likely
+    already correct; locks it in.
+
+### L. Aggregates round 2
+
+- [ ] **Flexible array member as last struct field**
+  - parse: `cc-parse/NN-flex-array.c` — `struct s { int n; int data[]; };`
+    indexed via a global instance plus malloc-extra padding.
+    tcc.c's `Sym` / `TokenSym` rely on this.
+  - Needs: parser accepts `T name[]` only as last field; `complete-agg!`
+    sets `ctype-size` to the offset of the flex member (excludes its
+    extent); `cg-push-field` for the flex member returns an `arr`-
+    typed lval that decays to `ptr` on use.
+
+- [ ] **`T[]` in parameter position decays to `T *`**
+  - parse: `cc-parse/NN-array-param-decay.c` — `int sum(int a[], int n)
+    { int s=0; for(int i=0;i<n;i++) s+=a[i]; return s; }` → known sum.
+  - Needs: parser detects `arr` ctype in fn-param position and
+    rewrites to `ptr` before slot allocation. cg sees a pointer and
+    needs no special handling.
+
+- [ ] **Array of function pointers initialized with named functions**
+  - parse: `cc-parse/NN-fnptr-tab.c` — `int f1(){return 1;}
+    int f2(){return 2;} int (*tab[])() = {f1, f2};
+    return tab[0]() + tab[1]()*10;` → exit 21.
+  - Needs: composes §E.4 (array list init) with §E.2 (address init);
+    parser admits a fn name as an initializer expression that
+    evaluates to a label reference.
+
+## Phase milestones (CC.md §Validation)
+
+The CC.md milestones gate on contiguous blocks above. Each lights up
+once its dependencies are green:
+
+- [ ] **Self-test sweep** (cc-e2e mirroring tests/scheme1) — depends on §A,
+  §B, §C, §F, §H.
+- [ ] **Hand-written hello-world ELF** — depends on §G, §I, §J + a
+  string-formatting libc surface.
+- [ ] **Compile mes libc `unified-libc.c`** — depends on §D, §E.
+- [ ] **Compile tcc.c (tcc-mes defines)** — depends on everything above.
+- [ ] **tcc-lispcc builds tcc-boot0**; checksum matches live-bootstrap.
+
+The last is the bootstrap milestone — at that point lispcc has fully
+replaced MesCC in the chain.
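Several punch-list items state the exit status a fixture should produce. The sketch below is an illustrative cross-check and is not part of this commit. It restates some of those C bodies as ordinary functions so that a host compiler can confirm the quoted values independently of lispcc. It assumes a host with 32-bit two's-complement `int`. Every function name in it (`char_arith`, `narrow_param`, `tab_f1`, `punchlist_mismatches`, and so on) is hypothetical. `punchlist_mismatches` masks each result to its low 8 bits, because a POSIX exit status keeps only those bits.

That masking has a consequence. As quoted, the §A sign-extension fixture `(unsigned)(int)(char)-3` and the zero-extension fixture `(unsigned)(unsigned char)-3` both reduce to exit 253. The exit code alone therefore cannot tell a missing sign extension from a correct one. `sext_narrow_full` and `zext_narrow_full` keep the full 32-bit values, which do differ. The sketch writes `signed char` explicitly because the signedness of plain `char` is implementation-defined.

```c
/* Hypothetical host-side cross-check of exit codes quoted in
 * CC-PUNCHLIST.md; assumes 32-bit two's-complement int. */

int char_arith(void) {                 /* §A NN-char-arith */
    unsigned char a = 0xAA;
    return a;                          /* 170 */
}

unsigned sext_narrow_full(void) {      /* §A NN-sext-narrow */
    return (unsigned)(int)(signed char)-3;   /* 0xFFFFFFFD */
}

unsigned zext_narrow_full(void) {      /* §A NN-zext-narrow */
    return (unsigned)(unsigned char)-3;      /* 0x000000FD */
}

int promote_sign(void) {               /* §A NN-promote-sign */
    signed char c = -1;
    return c + 1;                      /* c promotes to int -1, so 0 */
}

int postinc(void) {                    /* §B NN-postinc */
    int x = 5;
    int y = x++;
    return x * 10 + y;                 /* 6 * 10 + 5 = 65 */
}

struct S { int a; int b; };

int struct_load(void) {                /* §D NN-struct-load */
    struct S s;
    s.a = 1;
    s.b = 2;
    return s.a + s.b * 10;             /* 21 */
}

int init_array_str(void) {             /* §E NN-init-array-str */
    char s[] = "abc";
    return s[1];                       /* 'b' == 98 */
}

int comma(void) {                      /* §K NN-comma */
    int a, b;
    (a = 1, b = 2);
    return a + b * 10;                 /* 21 */
}

int narrow_param(unsigned char x) { return x; }

int call_narrow(void) {                /* §K NN-call-narrow */
    return narrow_param(258);          /* 258 converts to 258 % 256 == 2 */
}

int ptr_cmp(void) {                    /* §K NN-ptr-cmp */
    int a[2];
    return &a[1] > &a[0];              /* 1 */
}

int tab_f1(void) { return 1; }
int tab_f2(void) { return 2; }

int fnptr_tab(void) {                  /* §L NN-fnptr-tab */
    int (*tab[])(void) = {tab_f1, tab_f2};
    return tab[0]() + tab[1]() * 10;   /* 21 */
}

/* Count fixtures whose low 8 bits (what a POSIX exit status keeps)
 * differ from the exit code quoted in the punch list. */
int punchlist_mismatches(void) {
    struct { int got; int want; } cases[] = {
        {char_arith(), 170},
        {(int)(sext_narrow_full() & 0xFF), 253},
        {(int)(zext_narrow_full() & 0xFF), 253},
        {promote_sign(), 0},
        {postinc(), 65},
        {struct_load(), 21},
        {init_array_str(), 98},
        {comma(), 21},
        {call_narrow(), 2},
        {ptr_cmp(), 1},
        {fnptr_tab(), 21},
    };
    int bad = 0;
    for (unsigned i = 0; i < sizeof cases / sizeof cases[0]; i++)
        if ((cases[i].got & 0xFF) != cases[i].want)
            bad++;
    return bad;
}
```

A harness would call `punchlist_mismatches()` from `main` and treat a nonzero result as a disagreement between the punch list and the host compiler.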
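§A asks for loads and stores that respect the slot width, followed by sign or zero extension. §E asks the parser to encode a constant initializer as an N-byte little-endian bytevector. A small host-side reference model of those two behaviors can help derive expected values for new cc-cg fixtures. The sketch below is an illustration built on assumptions, not the cg or libp1pp implementation. `store_le`, `load_le`, and `slot_roundtrip` are hypothetical names, and slots are assumed to be little-endian, as the "LE bv" in §E suggests.

```c
#include <stdint.h>

/* Hypothetical reference model, not the cg API: what a width-correct
 * store followed by a load should observe for a 1/2/4/8-byte slot. */

/* Write the low `size` bytes of v in little-endian order. */
void store_le(uint64_t v, int size, unsigned char *out) {
    for (int i = 0; i < size; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Read `size` little-endian bytes and widen to 64 bits: zero-extend for
 * unsigned ctypes, sign-extend for signed ones. */
uint64_t load_le(const unsigned char *in, int size, int is_signed) {
    uint64_t v = 0;
    for (int i = 0; i < size; i++)
        v |= (uint64_t)in[i] << (8 * i);
    if (is_signed && size < 8) {
        uint64_t sign = (uint64_t)1 << (8 * size - 1);
        v = (v ^ sign) - sign;   /* copies bit 8*size-1 into all higher bits */
    }
    return v;
}

/* Store v into a fresh slot of `size` bytes, then load it back. */
uint64_t slot_roundtrip(uint64_t v, int size, int is_signed) {
    unsigned char slot[8] = {0};
    store_le(v, size, slot);
    return load_le(slot, size, is_signed);
}
```

For example, `slot_roundtrip(0xAA, 1, 0)` yields 170, the value the §A `char` round-trip fixture expects. The `(v ^ sign) - sign` expression is one branch-free way to sign-extend. Setting the high bits with a mask whenever the sign bit is set would give the same result.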