tcc-boot2 Current TODO
Current tracker for the scheme1-hosted cc.scm path that builds
tcc.flat.c into tcc-boot2.
Companion docs:
- TCC.md describes the surrounding tcc pipeline.
- CC.md describes the C subset and validation milestones.
- LIBC.md describes the libc side used to link
tcc-boot2.
Current State
cc.scm compiles the flattened tcc translation unit, the P1pp output
assembles and links via the M1pp + hex2++ chain, and the resulting
tcc-boot2 is at full parity with the gcc-built control on the
tcc-cc acceptance suite (see Latest Result below).
Useful smoke checks:
make tcc-boot2 ARCH=aarch64
build/aarch64/tcc-boot2/tcc-boot2 -v
build/aarch64/tcc-boot2/tcc-boot2 -E smoke.c
build/aarch64/tcc-boot2/tcc-boot2 -c smoke.c -o smoke.o
For native generated-program testing, use the ARM64-targeted build via
the tcc-cc suite:
make test SUITE=tcc-cc
tcc-cc Suite
tcc-cc runs the plain tests/cc fixtures through tcc-tcc
(second-stage tcc) instead of through cc.scm directly. The Makefile
chain is cc.scm → tcc-boot2 → tcc-tcc; the runner then does:
build/aarch64/tcc-tcc/tcc-tcc \
-nostdlib build/aarch64/tcc-cc/start.o build/aarch64/tcc-cc/mem.o \
tests/cc/NAME.c -o build/aarch64/tests/tcc-cc/NAME
./build/aarch64/tests/tcc-cc/NAME
Routing fixtures through tcc-tcc (rather than tcc-boot2 directly)
turns every fixture into a self-host check: a regression in
cc.scm's emitted code surfaces first when tcc-boot2 builds
tcc-tcc, then again when tcc-tcc runs the fixtures.
mem.o is the compiler-builtin mem* runtime — memcpy/memmove/memset
that tcc emits direct calls to for struct copies and bulk init, plus
memcmp for fixtures that reach it via bare extern int memcmp(...).
The tcc-gcc sibling supplies the equivalent four symbols by compiling
mes-libc's string/{memcpy,memmove,memset,memcmp}.c into its runtime
archive.
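As a rough illustration of what such a freestanding mem* runtime has to provide, here is a minimal sketch of an overlap-safe move and a byte compare. The `sketch_` names are hypothetical (the real runtime exports the plain `memmove` / `memcmp` symbols), and this is not the content of tcc-cc/mem.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; a real runtime would export this as memmove. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (d == s || n == 0)
        return dst;
    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination starts first: a forward copy never clobbers unread source. */
        while (n--)
            *d++ = *s++;
    } else {
        /* Destination starts later: copy from the tail backwards. */
        while (n--)
            d[n] = s[n];
    }
    return dst;
}

/* Hypothetical name; a real runtime would export this as memcmp. */
int sketch_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a;
    const unsigned char *q = b;
    size_t i;

    for (i = 0; i < n; i++)
        if (p[i] != q[i])
            return p[i] < q[i] ? -1 : 1;
    return 0;
}

/* Overlapping move where the destination sits inside the source range. */
void demo_overlap(void)
{
    char buf[8] = "abcdef";

    sketch_memmove(buf + 2, buf, 4);
    assert(sketch_memcmp(buf, "ababcd", 6) == 0);
}
```

The backward branch is the part a plain byte-by-byte memcpy loop gets wrong when the regions overlap.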
The result is compared against the same .expected and
.expected-exit files used by the regular cc suite. The suite is
aarch64-only today because it needs generated binaries to run natively
inside the aarch64 container.
Run a subset with NAMES:
NAMES='002-arith 007-call-with-args' make test SUITE=tcc-cc
Latest Result
make test SUITE=tcc-cc tcc-tcc on tests/cc: 181 passed, 0 failed
scripts/run-gcc-libc-flat-tcc.sh tcc-gcc baseline: 181 passed, 0 failed
Exact parity, suite fully green on both paths.
The path from earlier results to here:
| Result | Delta |
|---|---|
| 148/30 | baseline before mem-runtime |
| 163/15 | added tcc-cc/mem.c runtime; cleared the mem* undefined-symbol cluster |
| 175/3 | cc.scm migration to M1pp + hex2++ pipeline (dotted local labels, .scope/.endscope, .align directives, bare-hex string emission) cleared the entire assert fail: 0@12051 cluster (12 fixtures) plus a hex2pp.P1 BSS-overlap fix that unblocked the tcc-boot2 link itself for inputs >1 MiB |
| 176/2 | ternary-arms common-type fix in cg-ifelse-merge cleared 220-const-promote (was: arm 1's type leaked through as the result type, truncating wider arm 2 to 32-bit; tcc's gen_opic sign-extension idiom hit this) |
| 178/1 | reframed mem* as compiler builtins supplied by the build process: renamed libp1pp's libp1pp__memcpy / _memcmp / _memset to plain memcpy / memcmp / memset and added memmove; dropped mes-libc's string/memcpy.c / memmove.c / memset.c / memcmp.c from unified-libc.c so the symbols are not duplicated; added memcmp to tcc-cc/mem.c and linked it into the gcc-built tcc-gcc binary; updated and renamed the regression fixture (129-extern-libp1pp → 129-extern-mem-builtins) to extern the plain names. Cleared the fixture on every path (cc, cc-libc, tcc-cc, tcc-gcc). |
| 183/1 | added tcc-tcc (second-stage tcc) and routed tcc-cc / tcc-libc through it. cc.scm's cg-load was 8-byte-spilling struct lvalues — anything sizeof > 8 got truncated when used in expression context (e.g. as a ternary arm). Fixed cg-load to leave aggregates as lvalues and updated cg-ifelse-merge to memcpy aggregate arms into a struct-sized merge slot; without this, tcc-boot2 (cc.scm-built) self-corrupted whenever it had to compile type = bt1 == 6 ? type1 : type2;. Regression locked by tests/cc/336-struct-assign-ternary. |
| 181/0 | fixed cc.scm's struct-by-value parameter ABI — both cg-call and cg-fn-begin/v now split 9..16-byte aggregates across two consecutive ABI slots. Locked by tests/cc/337-struct-by-value-arg. |
| 181/0 | added simple-patches/tcc-0.9.26/lex-char-unsigned so tcc reads single-byte character constants through uint8_t, not int8_t; clears 200-lex-char-type on both tcc-cc (tcc-tcc-driven) and tcc-gcc (gcc-built control). C99 §6.4.4.4¶10 leaves char signedness implementation-defined and aarch64 AAPCS picks unsigned, so '\xFF' must be 255, not -1. |
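The ternary and struct-by-value rows above come down to a few C rules that are easy to check in isolation. The sketch below is a hypothetical reduction, not the actual 220, 336 or 337 fixtures; the struct layout and function names are made up for illustration, and the 16-byte size assumes an LP64 target.

```c
#include <assert.h>

/* Arms of type int and long long: the result has their common type,
   long long, whichever arm is selected. Truncating the wide arm to
   32 bits would be a miscompile. */
long long pick_scalar(int cond, int narrow, long long wide)
{
    return cond ? narrow : wide;
}

/* 16 bytes on LP64: larger than one 8-byte slot, within the 9..16-byte
   range that is split across two consecutive argument slots. */
struct type_desc {
    long tag;
    void *ref;
};

/* Aggregate arms of a ternary, with both aggregates passed by value. */
struct type_desc pick_struct(int basic, struct type_desc a, struct type_desc b)
{
    struct type_desc chosen = basic == 6 ? a : b;
    return chosen;
}

void demo_ternary(void)
{
    struct type_desc t1 = { 1, 0 };
    struct type_desc t2 = { 2, &t1 };

    assert(pick_scalar(0, -1, 0x100000000LL) == 0x100000000LL);
    assert(pick_scalar(1, -1, 0x100000000LL) == -1LL);
    assert(pick_struct(6, t1, t2).tag == 1);
    assert(pick_struct(5, t1, t2).ref == &t1);
}
```

The last assertion reads the second word of the selected struct, which is the half that an 8-byte spill would lose.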
Host Baseline
The tests/cc fixtures are coherent under a host compiler. The
temporary host harness compiled, ran, and compared every fixture with
plain host cc:
build/aarch64/.work/tests/tcc-cc/run-host-cc.sh
Recorded baseline:
HOST_CC=cc
HOST_CFLAGS=-std=gnu11 -w
153 passed, 0 failed
The gcc-built flattened-tcc control runs in the Alpine gcc image:
make tcc-gcc TCC_TARGET=ARM64
podman run --rm --pull=never --platform linux/arm64 \
-v "$PWD":/work -w /work boot2-alpine-gcc:aarch64 \
sh scripts/run-gcc-libc-flat-tcc.sh
This is the canonical sanity reference for "tcc-built-from-our-source"
fixture coverage; cc.scm-built tcc-boot2 is now at exact parity with
it.
Patches
scripts/simple-patches/tcc-0.9.26/ carries fixes applied during
stage1-flatten so any tcc rebuilt from this tree picks them up:
- aarch64-stdarg-array.{before,after}: swaps the bundled va_list for __va_list_struct[1] (matches glibc/musl/x86_64 ABI).
- arm64-va-{pointer-operand,arg-pointer}.{before,after}: teaches gen_va_start / gen_va_arg to skip gaddrof() when the operand is already a pointer (the array-decayed/pointer-parameter case). Without this, va_list forwarding into a non-variadic helper (the vfprintf shape, e.g. 131-vararg-mixed) hit assert fail: 0 in arm64-gen.c.
- const-divzero-shortcircuit-int.{before,after}: gates gen_opic's "division by zero in constant" error on !nocode_wanted so the unevaluated arm of && / || / ?: in constant expressions (C11 §6.6¶3) does not abort.
- lex-char-unsigned.{before,after}: reads single-byte character constants through uint8_t instead of int8_t so '\xFF' produces 255, not -1, matching aarch64 AAPCS's plain-char signedness (C99 §6.4.4.4¶10 leaves it implementation-defined).
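For reference, the "vfprintf shape" mentioned for the arm64-va patches looks roughly like this: a variadic entry point that hands its va_list to a non-variadic helper. The function names are hypothetical and this is not the 131-vararg-mixed fixture.

```c
#include <stdarg.h>

/* Non-variadic helper: receives the va_list as an ordinary parameter,
   so inside this function it is already a pointer-like operand. */
static int sum_va(int count, va_list ap)
{
    int total = 0;
    int i;

    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    return total;
}

/* Variadic front end that forwards its argument list to the helper. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total;

    va_start(ap, count);
    total = sum_va(count, ap);
    va_end(ap);
    return total;
}
```

A call such as `sum_ints(3, 1, 2, 3)` exercises both va_start in the variadic function and va_arg on a forwarded list.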
Fixture cleanups
Two small fixtures were rewritten to drop assumptions the regular cc
suite shouldn't depend on:
- tests/cc/125-anon-union.c explicitly initializes its local struct before probing anonymous-union aliasing. Tests must not depend on implicit zeroing of automatic locals.
- tests/cc/132-tentative-bss-sizing.c returns distinct numeric exit codes instead of calling sys_write / strlen. Plain tests/cc fixtures must not need stdio/libc helpers.
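A fixture that follows both rules might look like the sketch below: every field is written before it is read, and each failed probe maps to its own return value. This is an illustrative shape only, not the text of either fixture; in a real fixture the body would be `main` and the return value would be the process exit code.

```c
#include <limits.h>

struct value {
    int tag;
    union {             /* anonymous union (C11) */
        int i;
        unsigned u;
    };
};

/* Returns 0 on success, or a distinct nonzero code per failed probe. */
int fixture_body(void)
{
    struct value v;

    /* Explicit initialization: no reliance on zeroed stack memory. */
    v.tag = 1;
    v.u = 0;

    if (v.i != 0)
        return 1;       /* members do not alias */
    v.i = -1;
    if (v.u != UINT_MAX)
        return 2;       /* write through i not visible through u */
    if (v.tag != 1)
        return 3;       /* union write clobbered the neighbouring field */
    return 0;
}
```

Distinct codes keep the failure diagnosable from the .expected-exit comparison alone, with no output helpers linked in.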
Known limitations
riscv64: u32 narrowing leaves dirty upper bits
tests/cc/335-ternary-merge-arith-conv fails on riscv64 in both
tcc-cc[stage2] and tcc-cc[stage3] (identical behavior — the
fixed-point property holds, the bug is in tcc's RISC-V codegen, not
in cc.scm or the P1 pipeline). aarch64 and amd64 are green.
The proximate trigger is in riscv64-gen.c::load():
func3 = size == 1 ? 0 : size == 2 ? 1 : size == 4 ? 2 : 3;
if (size < 4 && !is_float(sv->type.t) && (sv->type.t & VT_UNSIGNED))
func3 |= 4; // promotes lb→lbu, lh→lhu, but skips lw→lwu
The func3 |= 4 promotion to LWU is gated on size < 4, so a 4-byte
unsigned load uses LW (sign-extending) instead of LWU (zero-extending).
gen_cast to VT_INT|VT_UNSIGNED from a wider source emits no
narrowing — it relies on the use-time load to truncate, but with LW
the high u32 bits of the source leak through. (u32)x where x is
u64 with bit 31 set then evaluates to 0xFFFFFFFFFFFFFFFF. This
same bug is present in upstream tcc mob.
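The difference between the two load flavours can be modelled in portable C on a 64-bit register image. The helper names below are hypothetical; they model only the architectural behaviour of LW and LWU, not any tcc code.

```c
#include <assert.h>
#include <stdint.h>

/* LW: take the low 32 bits, then copy bit 31 into bits 32..63. */
uint64_t model_lw(uint64_t slot)
{
    uint64_t low = slot & UINT64_C(0xFFFFFFFF);

    if (low & UINT64_C(0x80000000))
        return low | UINT64_C(0xFFFFFFFF00000000);
    return low;
}

/* LWU: take the low 32 bits, clear bits 32..63. */
uint64_t model_lwu(uint64_t slot)
{
    return slot & UINT64_C(0xFFFFFFFF);
}

void demo_narrowing(void)
{
    uint64_t x = UINT64_C(0x80000000);   /* bit 31 set */

    /* What C requires of (u32)x once widened back to 64 bits. */
    assert((uint64_t)(uint32_t)x == UINT64_C(0x80000000));
    /* LWU matches it; LW leaves the upper half dirty. */
    assert(model_lwu(x) == UINT64_C(0x80000000));
    assert(model_lw(x) == UINT64_C(0xFFFFFFFF80000000));
}
```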
Why the one-line patch isn't enough. Widening the gate to
size <= 4 (so 4-byte unsigned loads use LWU) regresses
017-int-arith and 128-cast-signedness. They were passing because
two compensating bugs canceled out: stock tcc on riscv64 also
sign-extends unsigned 32-bit immediate constants (LUI/ADDI with a
bit-31-set value), so a comparison between an unsigned int
variable (loaded with sign-extending LW) and an unsigned int
constant (loaded with sign-extending LUI/ADDI) had matching dirty
upper bits and BEQ saw them as equal. Fixing only the load breaks
that join, because the compare path also lies — BEQ is a 64-bit
instruction but C semantics require 32-bit width for unsigned int == unsigned int.
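The cancellation can be reproduced with a small register-level model. Everything here is a hypothetical sketch of the behaviour described above (sign-extending load, sign-extending immediate, 64-bit BEQ), not code from riscv64-gen.c.

```c
#include <assert.h>
#include <stdint.h>

/* Register image after a sign-extending producer (LW, LUI/ADDI). */
static uint64_t sext32(uint32_t v)
{
    if (v & UINT32_C(0x80000000))
        return (uint64_t)v | UINT64_C(0xFFFFFFFF00000000);
    return v;
}

/* Register image after a zero-extending producer (LWU). */
static uint64_t zext32(uint32_t v)
{
    return v;
}

/* BEQ compares the full 64-bit registers. */
static int beq64(uint64_t a, uint64_t b)
{
    return a == b;
}

/* Stock behaviour: both sides dirty in the same way, so they match. */
int compare_stock(uint32_t var, uint32_t imm)
{
    return beq64(sext32(var), sext32(imm));
}

/* Load fixed, immediate not: the upper halves now disagree. */
int compare_load_only_fix(uint32_t var, uint32_t imm)
{
    return beq64(zext32(var), sext32(imm));
}

/* Load and immediate both zero-extended: consistent again. */
int compare_both_fixed(uint32_t var, uint32_t imm)
{
    return beq64(zext32(var), zext32(imm));
}
```

With `var == imm == 0x80000000` the stock and fully fixed paths report equality while the load-only fix does not, which is the regression seen in 017-int-arith and 128-cast-signedness.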
Full fix shape. Three coupled pieces: (1) load — emit LWU for
unsigned 4-byte loads; (2) immediate — clear bits 32–63 when
materializing an unsigned 32-bit constant with bit 31 set; (3)
compare — eagerly canonicalize 32-bit-typed values into zero-extended
or sign-extended form (per VT_UNSIGNED) after every op that can
leave the upper half dirty. Pieces 1 and 3 overlap: if values are
canonicalized at every produce site, the load fix becomes one of many
sites that need to do it. This is what gcc/clang's RISC-V backends
do, and it's beyond the scope of the literal-block simple-patches
mechanism — file upstream or write a real canonicalization pass.
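A minimal sketch of what "canonicalize at every produce site" means, assuming the convention described above (zero-extended for unsigned, sign-extended otherwise). The `is_unsigned` flag stands in for a VT_UNSIGNED check, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Force a 64-bit register image into canonical form for a 32-bit type. */
uint64_t canon32(uint64_t reg, int is_unsigned)
{
    uint64_t low = reg & UINT64_C(0xFFFFFFFF);

    if (is_unsigned || !(low & UINT64_C(0x80000000)))
        return low;
    return low | UINT64_C(0xFFFFFFFF00000000);
}

/* A 64-bit ADD of two canonical u32 values can carry into bit 32,
   so the result is canonicalized before anything else consumes it. */
uint64_t add_u32(uint64_t a, uint64_t b)
{
    return canon32(a + b, 1);
}
```

Under this scheme the LWU fix is just `canon32(loaded, 1)` applied at the load site, which is why a load-only patch cannot stand alone.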
For now: known limitation, document, move on. The scalar codegen elsewhere on riscv64 is fine — only u32 narrowing of a wider source trips it.
tcc0 → tcc1 is not a fixed point on riscv64 (cc.scm behavioral bug)
boot3.sh + boot4.sh produce four staged compilers:
- tcc0 = tcc-source compiled by cc.scm (boot3 output)
- tcc1 = tcc-source compiled by tcc0 (boot4)
- tcc2 = tcc-source compiled by tcc1 (boot4)
- tcc3 = tcc-source compiled by tcc2 (boot4)
The fixed-point check is tcc2 == tcc3 (asserted at the end of
boot4.sh, verified on aarch64, amd64, riscv64). On riscv64 the
stronger tcc1 == tcc2 does not hold: tcc0(tcc.flat.c) produces
a 616100-byte .o while tcc1(tcc.flat.c) and tcc2(tcc.flat.c)
produce a byte-identical 615892-byte .o. The tcc0 output is 208 bytes
larger (200 in .text + 8 ripple in symtab/reloc offsets). amd64 and
aarch64 satisfy tcc1 == tcc2; only riscv64 diverges.
This is a bug to investigate, not just a "fatter code" observation. cc.scm should be a faithful (semantics-preserving) compiler — slower or larger output is acceptable, but tcc0 and tcc1 must produce byte-identical output when run on the same source. That they don't on riscv64 means cc.scm's translation of tcc.flat.c into tcc0 changed what tcc0 does at runtime, not just how it's encoded. We don't care about peephole optimizations being missed; we do care that tcc0 makes different codegen decisions than tcc1 makes.
What's known
The visible symptom: tcc0 emits 4 RISCV codegen patterns differently than tcc1 does:
| Source pattern | tcc0 emits | tcc1 emits | Δ |
|---|---|---|---|
| x = x - imm (i32) | addiw t,zero,imm; addw rd,rs,t | addiw rd,rs,imm | +4 B |
| x = x & imm | addiw t,zero,imm; and rd,rs,t | andi rd,rs,imm | +4 B |
| zero-ext after sext.w | sext.w r,r; slli r,r,0x20; srli r,r,0x20 | sext.w r,r | +8 B |
| x == 0xFFFFFFFF (i32) | addiw t,zero,-1; slli/srli; beq x,t,L | addi x,x,1; beqz x,L | +8 B |
These are decision points in riscv64-gen.c (immediate-folding,
zero-ext elision). Same source code, same input C, but the running
tcc0 takes the slow branch where the running tcc1 takes the fast
one — even though both are compiled from the same tcc.flat.c.
Hypothesis to test
cc.scm likely miscompiles an integer comparison or bit-test inside
the immediate-fits-in-instruction guard in riscv64-gen.c. Most of
the missed patterns share the shape if (small_int_fits) { fold } else { materialize }. If cc.scm gets the predicate wrong (e.g. signed vs.
unsigned compare, or wrong branch on a particular bit pattern), tcc0
falls into the materialize path on inputs where tcc1 takes the fold
path.
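To make the hypothesis concrete, here is one way a signedness slip in such a guard would produce exactly the "x = x - 1" symptom. This is purely illustrative: the actual guard in riscv64-gen.c is not quoted here, and the function names and the particular miscompile are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Intended guard: does v fit a signed 12-bit I-type immediate? */
int fits_simm12(int64_t v)
{
    return v >= -2048 && v <= 2047;
}

/* Hypothetical miscompile: the upper-bound test done as an unsigned
   compare. Negative values look huge and fail the check. */
int fits_simm12_unsigned_bug(int64_t v)
{
    return v >= -2048 && (uint64_t)v <= UINT64_C(2047);
}

/* Instructions emitted for "x = x - imm": one when -imm folds into
   the immediate field, two when it has to be materialized first. */
int sub_imm_insn_count(int (*fits)(int64_t), int64_t imm)
{
    return fits(-imm) ? 1 : 2;
}
```

With `imm == 1` the correct guard folds (one instruction) and the buggy one materializes (two instructions, +4 B), while small positive folded immediates behave identically under both. A guard miscompiled this way would be silent on most inputs, which fits a divergence of only about 200 bytes.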
Repro / starting point
# In the riscv64 container with boot3+boot4 outputs present:
$TCC0 -nostdlib -c -o /tmp/flat-tcc0.o tcc.flat.c
$TCC1 -nostdlib -c -o /tmp/flat-tcc1.o tcc.flat.c
# wc -c /tmp/flat-tcc0.o /tmp/flat-tcc1.o → 616100 vs 615892
# objdump -d both, normalize addresses, diff to find divergent functions
The first divergent function in disassembly is tal_free_impl — a
small refcount-decrement that hits the "x = x - 1" pattern. Good
starting point because the function is short and the source path is
narrow.
Until this is fixed, tcc1 is the "shake-out" stage and tcc2 is the canonical compiler.
Standalone bootN.sh: remaining host deps
scripts/{boot0,boot1,boot2}.sh are pure scratch + busybox — no host
compiler, no alpine-gcc image, just podman + the pinned busybox:musl
digest. boot3.sh is also pure scratch + busybox (it's just
scheme1 + M1pp + hex2pp on .flat.c inputs flattened by host cc -E).
boot4.sh previously had one host-tooling dep on aarch64 only:
cross-asm of tcc-libc/aarch64/{start,sys_stubs}.S to .o via
$HOST_CC -target aarch64-linux-gnu. tcc 0.9.26's aarch64 backend has
no assembler (no arm64-asm.c) and no inline-asm support, so .S inputs
historically needed pre-compilation host-side; the patched arm64-asm.c
now removes that requirement (see docs/TCC-ARM64-ASM.md).
amd64 and riscv64 backends both ship CONFIG_TCC_ASM and assemble .S
in-container via tcc-boot2 itself (stages C+D in boot4.sh). The
riscv64 .S files are macroed behind #ifdef __TINYC__ because tcc's
riscv64 asm parser uses 3-operand load/store syntax (ld rd, base, off,
sd base, src, off — base first for stores) instead of GAS's
ld rd, off(base) / sd src, off(base); the GAS path stays usable
for the Makefile's alpine-gcc fallback. The boot2-alpine-gcc:riscv64
image is no longer used by boot3.sh / boot4.sh.
Replacing the aarch64 .S pair with .P1pp (or any in-container-buildable)
equivalents drops the host-cc dep entirely. After that, every
bootN.sh is podman + scratch + busybox only.
Out of scope for this TODO (already accepted as host-side):
stage1-flatten.sh and libc-flatten.sh use the host cc -E
preprocessor to produce tcc.flat.c and libc.flat.c. The unpacked
tcc-0.9.26/lib/{lib-arm64.c, va_list.c} helpers compile cleanly under
tcc-boot2 inside the container — no host cc on those, just source
deps.
Next steps
The cc.scm path is at full parity with the gcc-built control across
the test suites: every fixture in tcc-cc and tcc-libc
passes on both, modulo the riscv64 limitation noted above. Further
bug-hunting work is open-ended: surface a misbehavior, write a
tests/cc fixture that locks it, fix.