CC codegen punch list
C99-subset codegen capabilities, ordered for red→green TDD per CC-INTERNALS.md §Feature workflow. The accepted language surface is defined in CC.md; this doc is the implementation checklist against that surface.
Conventions
- Every item has up to two runtime-validated fixtures:
  - cg: tests/cc-cg/<n>-name.scm drives the cg API directly.
  - cc: tests/cc/<n>-name.c exercises the same shape via real C through the full compiler driver. Use the same bucket for full-envelope scenarios (driver, multi-fn, libc) once those land.
- Acceptance: make test SUITE=cc-cg (then cc) green on all three arches. The runner asserts .expected-exit (default 0) and .expected stdout (default empty).
- Land cg work + cg fixture in one PR; cc fixture + parse work in the next. Don't block on parse to start cg.
- Pick the next free <n> per suite.
- Status legend: [ ] red · [~] partial · [x] green.
Already green
cc-cg 00–14 + cc 00–14 cover: empty fn, return
with const/param, two-param fn, i64 binops, locals + assign, if /
if-else, while with break / continue, direct calls (0..5
args, with stack staging), string literal interning, file-scope
zero-init globals, &x on a param, typedef plumbing through to a
return.
Punch list
A. Width-correct integer codegen
The 64-bit-everything load/store path is the largest correctness gap upstream of nearly everything else. Land this first.
1. char (8-bit) load/store via lval
   - cg: cc-cg/15-char-roundtrip.scm: two adjacent 1-byte slots; stores must not bleed across slots → exit 1 on equality check.
   - cc: cc/15-char-arith.c: same shape from C.
   - Done: added %cg-emit-{ld,st}{,-slot}-typed helpers that dispatch on ctype-size = 1 to %lb/%sb; %cg-load-opnd-into, cg-load, cg-assign thread the lval ctype. i8 loads also sign-extend via shli/sari 56. 16/32-bit fall through to the 8-byte path until §A.2/§A.3 land.
2. short (16-bit) load/store via lval
   - cg: cc-cg/16-short-roundtrip.scm
   - cc: cc/16-short-arith.c
   - Done: byte-decomposed dispatch in the typed helpers: store low byte, then %shri 8 + store high byte; load two bytes + %shli 8 %or. i16 sign-extends via shli/sari 48. Helpers may clobber t1; callers never pass reg=t1.
3. int (32-bit) load/store via lval
   - cg: cc-cg/17-int-roundtrip.scm
   - cc: cc/17-int-arith.c
   - Done: byte-decomposed dispatch generalised to N bytes via %cg-emit-{ld,st}N-bytes; size-4 routes through 4× %lb/%sb with shift+OR / shri-shift. i32 sign-extends via shli/sari 32. The address-staging in %cg-load-opnd-into and cg-load's indirect path now uses t2 so multi-byte gathers don't alias dest with base.
4. Signed narrowing keeps sign on re-widen
   - cg: cc-cg/18-sext-narrow.scm: (int)(char)-3 == -3 → exit 1.
   - cc: cc/18-sext-narrow.c
   - Done: cg-cast's narrowing branch now shli/sari's for signed narrow targets (i8/i16/i32) instead of masking, so the slot holds the canonical sign-extended 64-bit form. The widening cast back (relabel-only) preserves it.
5. Unsigned narrowing zero-extends
   - cg: cc-cg/19-zext-narrow.scm: (unsigned)(unsigned char)-3 == 253.
   - cc: cc/19-zext-narrow.c
   - Done: pre-existing mask-on-unsigned-narrow path was already correct; fixture locks the contrast with §A.4 in (same source, same chain shape, divergent result via target signedness).
6. Integer promotion preserves sign across operations
   - cg: cc-cg/20-promote-sign.scm: signed char x=-1; ((int)x)+2 == 1 via 64-bit comparison so a non-canonical 0x101 result fails.
   - cc: cc/20-promote-sign.c
   - Done: load-side sign-extension from §A.1 already canonicalises the slot, so cg-promote's relabel-only path is correct. Fixture locks the invariant in.
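The width rules in this section can be modelled in plain C. This is a sketch, not the cg implementation: store_le, load_le, and sext are hypothetical names, and sext assumes a two's-complement target where >> on a negative int64_t is an arithmetic shift (true of mainstream compilers, but implementation-defined in C99).

```c
#include <stddef.h>
#include <stdint.h>

/* Store the low n bytes of v at p, least-significant byte first.
   Bytes beyond p[n-1] are never touched, so adjacent slots cannot bleed. */
void store_le(unsigned char *p, uint64_t v, size_t n) {
    for (size_t i = 0; i < n; i++) {
        p[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Gather n bytes from p into a zero-extended 64-bit value (shift + OR). */
uint64_t load_le(const unsigned char *p, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Sign-extend the low `bits` bits (8, 16 or 32): shift left, then
   arithmetic shift right by the same amount (56, 48 or 32). */
int64_t sext(uint64_t v, unsigned bits) {
    unsigned sh = 64 - bits;
    return (int64_t)(v << sh) >> sh;
}
```

With these helpers, sext(0xfd, 8) gives the canonical signed form of the byte, while masking the same byte with 0xff gives the unsigned reading; that is the contrast the §A.4 and §A.5 fixtures lock in.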
B. Lvalue mechanics
Picked (b) cg-dup — duplicate the top vstack entry, used for
compound assign and pre-inc/dec to keep the lhs lval available across
its own load. Post-inc/dec use a dedicated cg-postinc / cg-postdec
primitive to capture the old rval before the store. See
CC-CONTRACTS §4.1.
1. Pre-++ / pre---
   - cg: cc-cg/21-preinc.scm: int x = 5; ++x; return x; → exit 6.
   - cc: cc/21-preinc.c
   - Done: parser dups lhs lval, loads, +1, assigns; pops result.
2. Post-++ / post--- returns old value
   - cg: cc-cg/22-postinc.scm: int x=5; int y=x++; return x*10+y; → exit 65.
   - cc: cc/22-postinc.c
   - Done: cg-postinc / cg-postdec primitives composed of two dup+load passes: one to capture the old rval (which lives in a never-reused spill slot), one to compute the +1/-1 store.
3. Compound assignment on simple lval (+= -= *= /= %= <<= >>= &= ^= |=)
   - cg: cc-cg/23-cmpd-simple.scm: int x=7; x+=3; return x; → exit 10.
   - cc: cc/23-cmpd-simple.c: one of every op family.
   - Done: parser uses cg-dup + cg-load + rhs + arith-conv + binop + assign; the cg-take-addr / cg-push-deref indirection is gone.
4. Compound assignment through pointer
   - cg: cc-cg/24-cmpd-ptr.scm: int x=7; int *p=&x; *p+=3; return x;
   - cc: cc/24-cmpd-ptr.c
   - Done: same parser sequence; cg-push-deref's indirect-slot lval composes correctly with cg-dup + cg-assign.
5. *p++ walking an array
   - cg: cc-cg/25-deref-postinc.scm: walks a 3-element span via *p++.
   - cc: cc/25-deref-postinc.c
   - Done: composes §B.2 (post-inc on a ptr lval; pointer scaling falls out of cg-binop add's ptr branch) with *p deref. Also fixed cg-arith-conv to skip the relabel when one operand is ptr/arr so cg-binop still sees a ptr/int pair (previously it saw both sides relabelled to ptr and skipped the scaling).
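The lvalue sequences above can be read as the following C helpers, where a pointer stands in for the lval kept alive by cg-dup. The helper names are hypothetical and only illustrate the ordering of load and store; they are not the cg primitives.

```c
/* Pre-increment: load, add 1, store, and yield the new value. */
int pre_inc(int *lv) {
    *lv = *lv + 1;
    return *lv;
}

/* Post-increment: capture the old value before the store, yield it. */
int post_inc(int *lv) {
    int old = *lv;
    *lv = old + 1;
    return old;
}

/* Compound assignment: the same lvalue is read once and written once. */
int add_assign(int *lv, int rhs) {
    *lv = *lv + rhs;
    return *lv;
}

/* Walk n elements with *p++; the pointer steps by sizeof(int) each time. */
int sum_span(const int *p, int n) {
    int s = 0;
    while (n-- > 0)
        s += *p++;
    return s;
}
```

Calling post_inc on x = 5 and combining the results as x*10 + y reproduces the 65 expected by the §B.2 fixture.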
C. sizeof
1. sizeof e returns the type's actual size
   - cc: cc/26-sizeof-expr.c: int x; return sizeof x; → 4.
   - Done: parser peeks (opnd-type (cg-top …)), takes its ctype-size, pops, pushes imm u64 size. Both forms (sizeof e and sizeof(e)) updated.
2. sizeof over struct, array, pointer, char
   - cc: cc/27-sizeof-types.c: sum of char, short, int, long, int*, int[5], struct S{int a; int b;} → 51.
   - Done: the type form already returned ctype-size ty; this fixture just locks the answer in.
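As a cross-check of the 51 above, the sum decomposes as 1 + 2 + 4 + 8 + 8 + 20 + 8. The sketch below recomputes it with a host compiler; it assumes an LP64 target (8-byte long and pointers, 4-byte int), and the function names are hypothetical rather than taken from the fixtures.

```c
#include <stddef.h>

struct S { int a; int b; };

/* sizeof applied to an expression: the operand is not evaluated. */
size_t sizeof_expr(void) {
    int x = 0;
    (void)x;
    return sizeof x;
}

/* sizeof applied to type names, summed as in the fixture description. */
size_t sizeof_sum(void) {
    return sizeof(char) + sizeof(short) + sizeof(int) + sizeof(long)
         + sizeof(int *) + sizeof(int[5]) + sizeof(struct S);
}
```

On a target where long is 4 bytes the sum would differ, so the expected value is tied to the data model rather than to C99 itself.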
D. Aggregates
1. Struct member load
   - cg: cc-cg/36-struct-load.scm: two-int struct, fields at 0 and 4.
   - cc: cc/36-struct-load.c
   - Done: added cg-push-field cg fname (cg.scm). Pops struct/union lval, looks up fname in ctype-ext's (tag complete? fields). Three input cases: direct frame lval shifts the slot offset; indirect frame lval loads addr+fo into a new indirect slot; global lval la's the label, adds fo, stashes via indirect slot. Parser dot arm replaced.
2. Struct member store
   - cg: cc-cg/37-struct-store.scm: three u8 fields, distinct multipliers in the readback to isolate per-field width.
   - cc: cc/37-struct-store.c
   - Done: cg-push-field from §D.1 plus the width-aware store path from §A.1. No new primitive.
3. Pointer-to-struct (p->x)
   - cg: cc-cg/38-arrow.scm
   - cc: cc/38-arrow.c
   - Done: arrow arm in parse-postfix-rest calls rval! (loads ptr), cg-push-deref (struct lval through ptr), then cg-push-field. Indirect-frame branch of cg-push-field (added in §D.1) handles the deref-result struct lval correctly.
4. Nested struct access (s.inner.x, s->inner.x)
   - cc: cc/39-struct-nested.c
   - Done: cg-push-field pushes a new lval whose ctype is the field's type; if that's a struct, a subsequent .x chains naturally. The fixture exercises both s.inner.x (direct frame) and p->inner.x (indirect frame, via the §D.1 indirect path).
5. Array element access at non-zero index
   - cg: cc-cg/40-array-index.scm
   - cc: cc/40-array-index.c
   - Done: cg-load on an arr-typed lval delegates to cg-decay-array, pushing a ptr-rval to the first element. Existing cg-binop add pointer-arithmetic path scales by the pointee size, so a + i yields &a[i], and cg-push-deref turns that into the element lval. cg-take-addr on an arr lval was also adjusted to yield T* (not (T[N])*) so &a[0] stays consistent.
6. Multi-dim arrays
   - cc: cc/41-array-2d.c
   - Done: fixed parse-decl-suf-cont to apply suffixes right-to-left (innermost first) so int a[2][3] produces arr (arr int 3) 2, not arr (arr int 2) 3. Same fix in the fn-suffix arm so T (...)(...) chains compose correctly. Decay + ptr arithmetic from §D.5 then handles the rest.
7. Struct passed by pointer to a function
   - cc: cc/42-struct-fn-arg.c: passes &s to sum2, callee returns p->x + p->y.
   - Done: composes §D.1/§D.3 (cg-push-field, arrow access) and the pre-existing param/call/return wiring. No new primitive. Pass-by-value of structs is outside CC.md's accepted set; tcc.c doesn't use it.
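The layout facts this section relies on (field offsets, chained member access, row-major nesting of int a[2][3]) can be checked from C itself. This is an illustrative sketch with hypothetical struct and function names, assuming a 4-byte int; it does not reproduce the fixtures.

```c
#include <stddef.h>

struct pair  { int x; int y; };               /* fields at offsets 0 and 4 */
struct outer { int tag; struct pair inner; };

/* Struct passed by pointer; the callee reads through p->. */
int sum2(const struct pair *p) {
    return p->x + p->y;
}

/* Chained access through a direct struct and through a pointer. */
int nested(void) {
    struct outer s = { 1, { 2, 3 } };
    struct outer *p = &s;
    return s.inner.x * 10 + p->inner.y;
}

/* int a[2][3] is 2 rows of 3 ints, so a[1][2] sits 1*3 + 2 ints
   past the first element. Returns that distance in ints. */
int flat_index_of_1_2(void) {
    int a[2][3] = { {0, 1, 2}, {3, 4, 5} };
    ptrdiff_t bytes = (char *)&a[1][2] - (char *)&a[0][0];
    return (int)(bytes / (ptrdiff_t)sizeof(int));
}
```

If the suffixes were applied in the wrong order (2 ints per row, 3 rows), a[1][2] would land at flat index 4 instead of 5, which is the kind of error the §D.6 fix removes.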
E. Initializers
parse-init-list (parse.scm lines 398–413) currently balances
braces and returns #f, dropping all initializer data. cg-emit-global
accepts an init bv but is never given one.
1. Scalar global with constant initializer
   - cg: cc-cg/49-init-scalar-global.scm
   - cc: cc/49-init-scalar-global.c
   - Done: cg-emit-global now consumes a list of pieces (bytevectors or (label-ref . label-bv) pairs); parser's parse-init-global builds N-byte LE bv via %int->le-bv from a const expression.
2. Scalar global with address initializer (int *p = &x;)
   - cg: cc-cg/50-init-addr.scm
   - cc: cc/50-init-addr.c
   - Done: %const-init-piece recognises &IDENT and bare-IDENT for fn / static / extern symbols, emitting (label-ref . cc__name).
3. Array global from element list
   - cg: cc-cg/51-init-array-list.scm
   - cc: cc/51-init-array-list.c
   - Done: %parse-init-array-list walks brace lists; element types drive bv width; array-name → label-ref decay so int *p = a; works as init too.
4. Array global from string literal
   - cg: cc-cg/52-init-array-str.scm
   - cc: cc/52-init-array-str.c
   - Done: parse-init-global recognises STR for char[] target; inferred-length arrays (T a[]) get patched with the literal length.
5. Struct global, positional init
   - cg: cc-cg/53-init-struct-pos.scm
   - cc: cc/53-init-struct-pos.c
   - Done: %parse-init-struct-list walks fields positionally; trailing fields zero-padded.
6. Struct global, designated init (.field = …)
   - cg: cc-cg/54-init-struct-desig.scm
   - cc: cc/54-init-struct-desig.c
   - Done: same %parse-init-struct-list handles .name = … form. Also: cg-arith-conv now leaves pointer-typed operands alone so *(p + N) scales correctly when one side is ptr.
7. Local array initializer
   - cc: cc/55-init-local-array.c
   - Done: parse-init-local-aggregate emits per-element store ops at slot+(i*esize); zero-pads trailing slots when declared length exceeds initializer count.
8. Local struct initializer
   - cc: cc/56-init-local-struct.c
   - Done: same per-field store sequence; designated form supported.
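The initializer shapes covered in this section look like the following in C. The int_to_le helper is a hypothetical stand-in for the little-endian byte encoding step, written only to show the byte order; the variable names are illustrative and not taken from the fixtures.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode the low n bytes of v, least-significant byte first. */
void int_to_le(uint64_t v, size_t n, unsigned char *out) {
    for (size_t i = 0; i < n; i++) {
        out[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

struct pt { int x; int y; int z; };

int       g_scalar = 42;           /* constant initializer            */
int      *g_addr   = &g_scalar;    /* address initializer             */
int       g_arr[4] = { 1, 2 };     /* trailing elements are zero      */
char      g_str[]  = "hi";         /* length inferred: 'h' 'i' '\0'   */
struct pt g_pos    = { 1, 2 };     /* positional; z is zero           */
struct pt g_des    = { .z = 9 };   /* designated; x and y are zero    */

/* Local aggregate: elements past the initializer count are zeroed. */
int local_tail(void) {
    int a[4] = { 1, 2 };
    return a[0] + a[1] + a[2] + a[3];
}
```

The zero-padding visible in g_arr, g_pos, and local_tail is standard C behaviour for partially initialized aggregates, which is why the local path has to emit explicit zero stores.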
F. Control flow extensions
1. do { } while (e);
   - cg: cc-cg/63-do-while.scm
   - cc: cc/63-do-while.c
   - Done: composes existing cg-loop + cg-if + cg-break; fixture-only.
2. for (init; cond; step) with declaration in init
   - cc: cc/64-for-decl.c
   - Done: existing parse-for-stmt exercised end-to-end.
3. switch / case / default with fall-through
   - cg: cc-cg/64-switch.scm: three cases falling through to default.
   - cc: cc/65-switch.c
   - Done: validated the existing swctx machinery in cg.
4. goto / labelled statement (forward and backward)
   - cg: cc-cg/65-goto.scm
   - cc: cc/66-goto.c
   - Done: replaced the cg-break hack in parse-goto-stmt. Added cg-emit-label cg name-bv (drops ::user_<name>) and cg-goto cg name-bv (emits %b(&::user_<name>)). parse-labelled-stmt now calls cg-emit-label before the inner stmt.
   - Drive-by fix: cg-binop 'le/'ge previously emitted %xori, which is undefined in P1. Replaced with %li(t1,1) %xor(t0,t0,t1).
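For reference, the control-flow shapes in this section behave as follows in C. The functions are hypothetical illustrations, not the fixture sources. The last one reflects one reading of the drive-by fix: a <= b computed as the strict comparison b < a with its 0/1 result flipped by xor with 1.

```c
/* Three cases falling through to default: entering at case k runs
   every increment from k downwards. */
int fall_through(int n) {
    int hits = 0;
    switch (n) {
    case 0: hits++;  /* falls through */
    case 1: hits++;  /* falls through */
    case 2: hits++;  /* falls through */
    default: hits++;
    }
    return hits;
}

/* A counting loop built from one forward and one backward goto. */
int count_to(int n) {
    int i = 0;
    goto check;              /* forward jump */
again:
    i++;
check:
    if (i < n)
        goto again;          /* backward jump */
    return i;
}

/* The body of do/while runs once even when the condition is false. */
int do_once(void) {
    int n = 0;
    do { n++; } while (0);
    return n;
}

/* a <= b as the inverted strict comparison. */
int le_via_xor(long a, long b) {
    return (b < a) ^ 1;
}
```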
G. Variadics
1. Variadic call: per-arg default-promote
   - cg: cc-cg/66-vararg-call.scm
   - cc: cc/67-vararg-call.c
   - Done: parser inspects fn type at parse-call-args; for arg index ≥ named-arg count, emits cg-promote per CC.md §Implicit conversions. Fixed-arg index emits cg-cast to declared param type (also covers §K.5).
2. Variadic receive: __builtin_va_start/arg/end
   - cg: cc-cg/69-vararg-recv.scm: sums N int-typed variadic args.
   - cc: cc/76-vararg-recv.c
   - Done: added cg-va-start cg, cg-va-arg cg ctype, cg-va-end cg (each pops ap-lval from vstack); cg-fn-begin/v reserves a 16-slot incoming-arg window: indices 0..3 are saved from a-registers, indices 4..15 from LDARG slots 0..11. va_arg walks the window linearly from the slot at index = named-arg count. Parser recognizes __builtin_va_start/arg/end at parse-primary; parse-fn-body threads the fn ctype's variadic? flag. Bundled cc/headers/stdarg.h aliases va_list/va_start/va_arg/va_end to the builtins. Lock-in fixture cc/79-vararg-deep.c exercises 1 named + 6 variadic args. Limit: 15 variadic args after the named ones; bump VARARG_WINDOW (currently 16) in cg-fn-begin/v to extend.
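A variadic receiver of the kind described above, written against the standard <stdarg.h> macros, might look like this. It is a minimal sketch with a hypothetical function name; the bundled header is said to alias these macros to the builtins, but the code below relies only on the standard interface.

```c
#include <stdarg.h>

/* Sum n int-typed variadic arguments that follow the one named parameter. */
int sum_ints(int n, ...) {
    va_list ap;
    int s = 0;
    va_start(ap, n);
    for (int i = 0; i < n; i++)
        s += va_arg(ap, int);   /* char/short arguments arrive promoted to int */
    va_end(ap);
    return s;
}
```

A call such as sum_ints(6, 1, 2, 3, 4, 5, 6) has the same 1 named + 6 variadic shape as the lock-in fixture. Passing a char in a variadic position still requires va_arg(ap, int) on the receiving side because of the default argument promotions.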
H. Conditionals as values
Added cg-ifelse-merge: caller pre-allocates the result slot, each
thunk pushes one rval that is then loaded and stored into the slot,
and the slot's frame rval is left on the vstack. The merged result
type is taken from the first thunk's pushed type — parser is
responsible for arranging compatible types in the two branches.
1. Ternary ?: leaves exactly one rval
   - cg: cc-cg/28-ternary.scm: c ? 7 : 9 with c=1 → exit 7.
   - cc: cc/28-ternary.c
   - Done: parser swaps cg-ifelse for cg-ifelse-merge in the qmark arm of parse-binary-rhs; both branches push their parsed rval directly.
2. && short-circuit leaves exactly one i32 rval
   - cg: cc-cg/29-land.scm
   - cc: cc/29-land.c
   - Done: parser injects cg-cast %t-bool then cg-cast %t-i32 on the rhs side so the merged result is i32 ∈ {0,1}; the else-arm pushes %t-i32 0.
3. || short-circuit leaves exactly one i32 rval
   - cg: cc-cg/30-lor.scm
   - cc: cc/30-lor.c
   - Done: mirrors §H.2 with the bool-cast on the else-arm and a constant %t-i32 1 in the then-arm.
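The observable behaviour these three items pin down is standard C: the result is a single int, normalised to 0 or 1 for && and ||, and the right operand is skipped when the left one decides the answer. The sketch below uses a hypothetical evaluation counter to make the skipping visible.

```c
int evals;                       /* counts how often the rhs was evaluated */

int rhs(int v) {
    evals++;
    return v;
}

/* Ternary: exactly one of the two arms produces the value. */
int pick(int c) {
    return c ? 7 : 9;
}

/* a && b: rhs runs only when a is non-zero; result is 0 or 1. */
int land(int a, int b) {
    return a && rhs(b);
}

/* a || b: rhs runs only when a is zero; result is 0 or 1. */
int lor(int a, int b) {
    return a || rhs(b);
}
```

Note that land(3, 5) yields 1, not 5: the bool-then-i32 cast described for the rhs side is what collapses any non-zero operand to 1.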
I. Storage classes
- Block-scope static lives in bss/data, not on the stack
  - cg: cc-cg/57-block-static.scm
  - cc: cc/57-block-static.c
  - Done: handle-decl gates on sto = 'static' before the file-vs-block branch and routes block-scope statics to cg-emit-global with a cc__<fn>__<var> mangled label.
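A block-scope static is easiest to recognise by its lifetime: it is initialised once and keeps its value across calls, which is only possible if it lives outside the stack frame. A minimal illustration, with a hypothetical function name:

```c
/* Returns 1, 2, 3, ... on successive calls. */
int next_id(void) {
    static int counter;   /* zero-initialised once; persists between calls */
    return ++counter;
}
```

If counter were allocated in the frame, every call would start from an indeterminate value instead of continuing the sequence.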
J. Driver / envelope
1. Entry stub forwards argc/argv to main
   - cc: cc/00-return-argc (already green; locked in).
   - Done: P1's program-entry contract delivers a0=argc, a1=argv at p1_main (P1.md §Program Entry). %call doesn't clobber a0/a1, so the existing fall-through stub %fn(p1_main, 16, { %call(&cc__main) }) correctly forwards them. Documented in cg-finish.
2. int main() falling off the end returns 0
   - cc: cc/68-main-noret.c: int main(){} → exit 0.
   - Done: cg-fn-begin now zero-inits the ret slot in the prologue when the return type isn't void, so falling through to ::ret reads back a defined 0 instead of relying on kernel zero-fill.
3. Multi-function translation unit with forward references
   - cc: cc/69-multi-fn.c
   - Done: forward declaration int helper(int x); binds an extern fn sym up-front so parse-primary finds it before the definition appears.
K. Expressions and conversions
1. Comma operator (a, b as expression)
   - cc: cc/31-comma.c: int a; int b; (a=1, b=2); return a + b*10; → exit 21.
   - Done: added (comma . (1 . 2)) to %binop-bp (left-assoc, below assign's 4/3 so parse-call-args ps 4 still won't slurp it as a call separator); handler cg-pops the lhs and evaluates the rhs.
2. Function-pointer call
   - cg: cc-cg/67-fnptr-call.scm: push a fn-typed sym, spill to a frame slot, reload, call.
   - cc: cc/71-fnptr-call.c: int (*fp)(int) = f; return fp(7); → exit 21.
   - Done: exercises cg-call's %callr(t0) branch; verified return-type extraction walks ptr → fn → ret correctly.
3. Enum constant in expressions
   - cc: cc/72-enum-const.c: enum E { A=1, B=10, C }; return A+B+C; → exit 22.
   - Done: locked in the existing parse-primary 'enum-const branch.
4. void * ↔ T * implicit conversion (no cast required)
   - cc: cc/73-voidptr-impl.c: void *p; int x=42; p=&x; int *q=p; return *q; → exit 42.
   - Done: cg-cast's to-kind = 'ptr clause is relabel-only between any pointer types; cg-assign drives the cast each direction.
5. Implicit narrowing of fixed-arg call arguments to declared param type
   - cc: cc/74-call-narrow.c: int f(unsigned char x){return x;} int main(){ return f(258); } → exit 2.
   - Done: parse-call-args now emits cg-cast per fixed arg to the declared param type (variadic args promoted via §G.1).
6. Pointer comparison is unsigned
   - cg: cc-cg/68-ptr-cmp.scm: verifies %ifelse_ltu dispatch.
   - cc: cc/75-ptr-cmp.c: int a[2]; return &a[1] > &a[0]; → exit 1.
   - Done: cg-binop's lt/le/gt/ge dispatch already picks the unsigned variant for ptr/arr operands. Fixtures lock it in.
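The expected values quoted in this section can be reproduced with small C functions. These are sketches modelled on the one-line fixture descriptions above, with hypothetical function names. The triple helper is an assumption chosen so that fp(7) gives 21, since the source does not show the body of f.

```c
enum E { A = 1, B = 10, C };     /* C takes the next value, 11 */

int triple(int x) { return 3 * x; }           /* assumed body for f */
int narrow(unsigned char x) { return x; }

int k_comma(void) {
    int a; int b;
    (a = 1, b = 2);              /* lhs evaluated and discarded, then rhs */
    return a + b * 10;
}

int k_fnptr(void) {
    int (*fp)(int) = triple;     /* function name decays to a pointer */
    return fp(7);
}

int k_enum(void) { return A + B + C; }

int k_voidptr(void) {
    void *p; int x = 42;
    p = &x;                      /* int * to void *, no cast */
    int *q = p;                  /* void * to int *, no cast */
    return *q;
}

int k_narrow(void) {
    int v = 258;
    return narrow(v);            /* converted to unsigned char: 258 mod 256 */
}

int k_ptrcmp(void) {
    int a[2];
    return &a[1] > &a[0];        /* only addresses are compared */
}
```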
L. Aggregates round 2
1. Flexible array member as last struct field
   - cc: cc/77-flex-array.c: backs struct s { int n; int data[]; }; with an int[] buffer, casts to struct s *, indexes through p->data[i]. tcc.c's Sym/TokenSym rely on this.
   - Done: parser already accepted T name[] (no size) via the existing parse-decl-suf-cont [] arm; complete-agg! already excluded flex extent because (max sz 0) collapses -1 to 0; cg-push-field + the indirect-frame path + cg-decay-array already composed correctly. The actual fix was a latent cast-conversion bug in parse-cast-or-unary: it skipped rval! when the target type was a pointer, so (struct s *)buf relabeled the array's lval rather than decaying it to a ptr-rval. Now rval! runs for every cast (matches C lvalue-conversion semantics: arrays decay, lvals become rvals before the bit-cast).
2. T[] in parameter position decays to T *
   - cc: cc/43-array-param-decay.c: int sum(int a[], int n) { ... s+=a[i]; ... } sum(xs,4) over {1,2,3,4} → exit 10.
   - Done: parse-fn-params rewrites arr→ptr (and fn→ptr) before slot allocation, so cg sees an 8-byte ptr slot and the callee's a[i] decays the param's loaded ptr-rval and scales the index correctly.
3. Array of function pointers initialized with named functions
   - cg: cc-cg/58-fnptr-tab.scm
   - cc: cc/58-fnptr-tab.c
   - Done: composes §E.3's array list init with §E.2's label-ref; %const-init-piece's bare-IDENT branch already covers fn syms. Parse fixture uses an explicit typedef for the array element because int (*tab[])() declarators are owned elsewhere.
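The three shapes in this section can be sketched in C as follows. Unlike the fixture, which is described as backing the struct with an int[] buffer and casting, this sketch allocates the flexible-array struct with malloc; that substitution is mine, made to keep the example free of aliasing questions. The helper names inc, dbl, unary_fn, and flex_total are hypothetical.

```c
#include <stdlib.h>

struct s { int n; int data[]; };     /* sizeof(struct s) excludes data */

/* Fill and sum a 3-element flexible array; returns -1 if allocation fails. */
int flex_total(void) {
    struct s *p = malloc(sizeof *p + 3 * sizeof p->data[0]);
    int total = 0;
    if (!p)
        return -1;
    p->n = 3;
    for (int i = 0; i < p->n; i++)
        p->data[i] = i + 1;
    for (int i = 0; i < p->n; i++)
        total += p->data[i];
    free(p);
    return total;
}

/* int a[] in parameter position is int *a; a[i] scales by sizeof(int). */
int sum(int a[], int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

int inc(int x) { return x + 1; }
int dbl(int x) { return x * 2; }

/* Explicit typedef for the element type, as the parse fixture is said to use. */
typedef int (*unary_fn)(int);
unary_fn tab[] = { inc, dbl };
```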
Phase milestones (CC.md §Validation)
The CC.md milestones gate on contiguous blocks above. Each lights up once its dependencies are green:
- Self-test sweep (cc fixtures mirroring tests/scheme1): depends on §A, §B, §C, §F, §H.
- Hand-written hello-world ELF: depends on §G, §I, §J + a string-formatting libc surface.
- Compile mes libc unified-libc.c: depends on §D, §E.
- Compile tcc.c (tcc-mes defines): depends on everything above.
- tcc-lispcc builds tcc-boot0; checksum matches live-bootstrap.
The last is the bootstrap milestone — at that point lispcc has fully replaced MesCC in the chain.