commit 994d0d3666007b7fe0e8e68732631f994f496c20
parent a73a78dfa060ba0a69f92bf8e1fd941ac8dc61dd
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 4 May 2026 00:06:01 -0700
tcc 0.9.26: read single-byte char constants through uint8_t
C99 §6.4.4.4¶10 defines the value of a single-character constant as a
plain `char` converted to `int`, and plain-`char` signedness is
implementation-defined (§6.2.5¶15); aarch64 AAPCS picks unsigned. tcc
0.9.26 unconditionally sign-extended single-byte character constants
via `int8_t`, so '\xFF' produced -1 instead of 255. That was the
long-running 200-lex-char-type fixture failure on both tcc-tcc
(cc.scm chain) and tcc-gcc (gcc-built control).
Add a simple-patches/ entry that swaps the `int8_t` cast in tccpp.c's
parse_string for `uint8_t`. With this, both suites go fully green:
tcc-cc 181/0 (was 180/1)
tcc-gcc 181/0 (was 180/1)
Diffstat:
4 files changed, 27 insertions(+), 12 deletions(-)
diff --git a/docs/TCC-TODO.md b/docs/TCC-TODO.md
@@ -73,16 +73,11 @@ NAMES='002-arith 007-call-with-args' make test SUITE=tcc-cc
## Latest Result
```text
-make test SUITE=tcc-cc cc.scm-built tcc-boot2: 178 passed, 1 failed
-scripts/run-gcc-libc-flat-tcc.sh gcc-built tcc-gcc: 178 passed, 1 failed
+make test SUITE=tcc-cc tcc-tcc on tests/cc: 181 passed, 0 failed
+scripts/run-gcc-libc-flat-tcc.sh tcc-gcc baseline: 181 passed, 0 failed
```
-Exact parity. The single remaining failure is the same on both paths
-and is not a cc.scm bug:
-
-- **`200-lex-char-type`** — upstream tcc 0.9.26 bug (also fails under
- the gcc-built control). Fixing it requires a `simple-patches/` patch
- against tcc itself.
+Exact parity, suite fully green on both paths.
The path from earlier results to here:
@@ -94,6 +89,8 @@ The path from earlier results to here:
| 176/2 | ternary-arms common-type fix in `cg-ifelse-merge` cleared `220-const-promote` (was: arm 1's type leaked through as the result type, truncating wider arm 2 to 32-bit; tcc's `gen_opic` sign-extension idiom hit this) |
| 178/1 | reframed mem* as compiler builtins supplied by the build process: renamed libp1pp's `libp1pp__memcpy` / `_memcmp` / `_memset` to plain `memcpy` / `memcmp` / `memset` and added `memmove`; dropped mes-libc's `string/memcpy.c` / `memmove.c` / `memset.c` / `memcmp.c` from `unified-libc.c` so the symbols are not duplicated; added `memcmp` to `tcc-cc/mem.c` and linked it into the gcc-built tcc-gcc binary; updated and renamed the regression fixture (`129-extern-libp1pp` → `129-extern-mem-builtins`) to extern the plain names. Cleared the fixture on every path (cc, cc-libc, tcc-cc, tcc-gcc). |
| 183/1 | added `tcc-tcc` (second-stage tcc) and routed `tcc-cc` / `tcc-libc` through it. cc.scm's `cg-load` was 8-byte-spilling struct lvalues — anything `sizeof > 8` got truncated when used in expression context (e.g. as a ternary arm). Fixed `cg-load` to leave aggregates as lvalues and updated `cg-ifelse-merge` to memcpy aggregate arms into a struct-sized merge slot; without this, tcc-boot2 (cc.scm-built) self-corrupted whenever it had to compile `type = bt1 == 6 ? type1 : type2;`. Regression locked by `tests/cc/336-struct-assign-ternary`. |
+| 181/0 | fixed cc.scm's struct-by-value parameter ABI — both `cg-call` and `cg-fn-begin/v` now split 9..16-byte aggregates across two consecutive ABI slots. Locked by `tests/cc/337-struct-by-value-arg`. |
+| 181/0 | added `simple-patches/tcc-0.9.26/lex-char-unsigned` so tcc reads single-byte character constants through `uint8_t`, not `int8_t`; clears `200-lex-char-type` on both `tcc-cc` (tcc-tcc-driven) and `tcc-gcc` (gcc-built control). C99 §6.4.4.4¶10 leaves `char` signedness implementation-defined and aarch64 AAPCS picks unsigned, so `'\xFF'` must be 255, not -1. |
## Host Baseline
@@ -143,6 +140,10 @@ it.
"division by zero in constant" error on `!nocode_wanted` so the
unevaluated arm of `&&`/`||`/`?:` in constant expressions
(C11 §6.6¶3) does not abort.
+- `lex-char-unsigned.{before,after}` — reads single-byte character
+ constants through `uint8_t` instead of `int8_t` so `'\xFF'`
+ produces 255, not -1, matching aarch64 AAPCS's plain-`char`
+ signedness (C99 §6.4.4.4¶10 leaves it implementation-defined).
## Fixture cleanups
@@ -158,7 +159,7 @@ suite shouldn't depend on:
## Next steps
-The cc.scm path matches the gcc baseline; further `tcc-cc` progress is
-gated on upstream tcc bugs, not on our compiler.
-
-- Backport tcc's `200-lex-char-type` fix as a `simple-patches/` entry.
+The cc.scm path is at full parity with the gcc-built control: every
+fixture in `tcc-cc` and `tcc-libc` passes on both. Further bug-hunting
+work is open-ended — surface a misbehavior, write a `tests/cc` fixture
+that locks it, fix.
diff --git a/scripts/simple-patches/tcc-0.9.26/lex-char-unsigned.after b/scripts/simple-patches/tcc-0.9.26/lex-char-unsigned.after
@@ -0,0 +1,10 @@
+ if (!is_long) {
+ /* C99 §6.4.4.4¶10: an integer character constant has type
+ * `int`. The value is what you get from converting an
+ * object of type `char` (whose value is the byte) to int.
+ * Plain `char` signedness is implementation-defined; on
+ * aarch64 AAPCS (and most modern ABIs) it is unsigned, so
+ * '\xFF' must yield 255, not -1. tcc 0.9.26 unconditionally
+ * sign-extended via int8_t — read as uint8_t instead. */
+ tokc.i = *(uint8_t *)tokcstr.data;
+ tok = TOK_CCHAR;
diff --git a/scripts/simple-patches/tcc-0.9.26/lex-char-unsigned.before b/scripts/simple-patches/tcc-0.9.26/lex-char-unsigned.before
@@ -0,0 +1,3 @@
+ if (!is_long) {
+ tokc.i = *(int8_t *)tokcstr.data;
+ tok = TOK_CCHAR;
diff --git a/scripts/stage1-flatten.sh b/scripts/stage1-flatten.sh
@@ -139,6 +139,7 @@ apply_our_patch getclock-ms-stub "$SRC/tcc.c"
apply_our_patch getcwd-stub "$SRC/tccgen.c"
apply_our_patch ldexp-stub "$SRC/tccpp.c"
apply_our_patch date-time-stub "$SRC/tccpp.c"
+apply_our_patch lex-char-unsigned "$SRC/tccpp.c"
apply_our_patch elfinterp-stub "$SRC/tccelf.c"
# Const-expr short-circuit: gen_opic/gen_opif must respect nocode_wanted