boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

commit 994d0d3666007b7fe0e8e68732631f994f496c20
parent a73a78dfa060ba0a69f92bf8e1fd941ac8dc61dd
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon,  4 May 2026 00:06:01 -0700

tcc 0.9.26: read single-byte char constants through uint8_t

C99 §6.4.4.4¶10 leaves plain-`char` signedness implementation-defined,
and aarch64 AAPCS picks unsigned. tcc 0.9.26 unconditionally
sign-extended single-byte character constants via `int8_t`, so '\xFF'
produced -1 instead of 255 — surfacing as the long-running
200-lex-char-type fixture failure on both tcc-tcc (cc.scm chain) and
tcc-gcc (gcc-built control).

Add a simple-patches/ entry that swaps the int8_t cast in tccpp.c's
parse_string for uint8_t. With this both suites go fully green:

  tcc-cc   181/0  (was 180/1)
  tcc-gcc  181/0  (was 180/1)
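
The difference between the two reads can be illustrated outside tcc. The following is a minimal sketch, not tcc code: `char_const_signed` and `char_const_unsigned` are hypothetical helpers that widen the first byte of a lexed character constant to `int`, the first mirroring the pre-patch `int8_t` cast and the second the post-patch `uint8_t` cast.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; they are not part of tcc. */

/* Mirrors the pre-patch read: the byte is reinterpreted as int8_t,
 * so any value with the top bit set is sign-extended to a negative int. */
static int char_const_signed(const void *data)
{
    return *(const int8_t *)data;
}

/* Mirrors the post-patch read: the byte is read as uint8_t, so the
 * widened result always lies in 0..255. */
static int char_const_unsigned(const void *data)
{
    return *(const uint8_t *)data;
}
```

For the byte 0xFF the first helper returns -1 and the second returns 255; for bytes below 0x80 (plain ASCII) both return the same value, which is presumably why only the one fixture exercised the difference.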

Diffstat:
M docs/TCC-TODO.md | 25 +++++++++++++------------
A scripts/simple-patches/tcc-0.9.26/lex-char-unsigned.after | 10 ++++++++++
A scripts/simple-patches/tcc-0.9.26/lex-char-unsigned.before | 3 +++
M scripts/stage1-flatten.sh | 1 +
4 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/docs/TCC-TODO.md b/docs/TCC-TODO.md
@@ -73,16 +73,11 @@ NAMES='002-arith 007-call-with-args' make test SUITE=tcc-cc
 ## Latest Result
 
 ```text
-make test SUITE=tcc-cc cc.scm-built tcc-boot2: 178 passed, 1 failed
-scripts/run-gcc-libc-flat-tcc.sh gcc-built tcc-gcc: 178 passed, 1 failed
+make test SUITE=tcc-cc tcc-tcc on tests/cc: 181 passed, 0 failed
+scripts/run-gcc-libc-flat-tcc.sh tcc-gcc baseline: 181 passed, 0 failed
 ```
 
-Exact parity. The single remaining failure is the same on both paths
-and is not a cc.scm bug:
-
-- **`200-lex-char-type`** — upstream tcc 0.9.26 bug (also fails under
-  the gcc-built control). Fixing it requires a `simple-patches/` patch
-  against tcc itself.
+Exact parity, suite fully green on both paths.
 
 The path from earlier results to here:
 
@@ -94,6 +89,8 @@ The path from earlier results to here:
 | 176/2 | ternary-arms common-type fix in `cg-ifelse-merge` cleared `220-const-promote` (was: arm 1's type leaked through as the result type, truncating wider arm 2 to 32-bit; tcc's `gen_opic` sign-extension idiom hit this) |
 | 178/1 | reframed mem* as compiler builtins supplied by the build process: renamed libp1pp's `libp1pp__memcpy` / `_memcmp` / `_memset` to plain `memcpy` / `memcmp` / `memset` and added `memmove`; dropped mes-libc's `string/memcpy.c` / `memmove.c` / `memset.c` / `memcmp.c` from `unified-libc.c` so the symbols are not duplicated; added `memcmp` to `tcc-cc/mem.c` and linked it into the gcc-built tcc-gcc binary; updated and renamed the regression fixture (`129-extern-libp1pp` → `129-extern-mem-builtins`) to extern the plain names. Cleared the fixture on every path (cc, cc-libc, tcc-cc, tcc-gcc). |
 | 183/1 | added `tcc-tcc` (second-stage tcc) and routed `tcc-cc` / `tcc-libc` through it. cc.scm's `cg-load` was 8-byte-spilling struct lvalues — anything `sizeof > 8` got truncated when used in expression context (e.g. as a ternary arm). Fixed `cg-load` to leave aggregates as lvalues and updated `cg-ifelse-merge` to memcpy aggregate arms into a struct-sized merge slot; without this, tcc-boot2 (cc.scm-built) self-corrupted whenever it had to compile `type = bt1 == 6 ? type1 : type2;`. Regression locked by `tests/cc/336-struct-assign-ternary`. |
+| 181/0 | fixed cc.scm's struct-by-value parameter ABI — both `cg-call` and `cg-fn-begin/v` now split 9..16-byte aggregates across two consecutive ABI slots. Locked by `tests/cc/337-struct-by-value-arg`. |
+| 181/0 | added `simple-patches/tcc-0.9.26/lex-char-unsigned` so tcc reads single-byte character constants through `uint8_t`, not `int8_t`; clears `200-lex-char-type` on both `tcc-cc` (tcc-tcc-driven) and `tcc-gcc` (gcc-built control). C99 §6.4.4.4¶10 leaves `char` signedness implementation-defined and aarch64 AAPCS picks unsigned, so `'\xFF'` must be 255, not -1. |
 
 ## Host Baseline
 
@@ -143,6 +140,10 @@ it.
   "division by zero in constant" error on `!nocode_wanted` so the
   unevaluated arm of `&&`/`||`/`?:` in constant expressions (C11
   §6.6¶3) does not abort.
+- `lex-char-unsigned.{before,after}` — reads single-byte character
+  constants through `uint8_t` instead of `int8_t` so `'\xFF'`
+  produces 255, not -1, matching aarch64 AAPCS's plain-`char`
+  signedness (C99 §6.4.4.4¶10 leaves it implementation-defined).
 
 ## Fixture cleanups
 
@@ -158,7 +159,7 @@ suite shouldn't depend on:
 
 ## Next steps
 
-The cc.scm path matches the gcc baseline; further `tcc-cc` progress is
-gated on upstream tcc bugs, not on our compiler.
-
-- Backport tcc's `200-lex-char-type` fix as a `simple-patches/` entry.
+The cc.scm path is at full parity with the gcc-built control: every
+fixture in `tcc-cc` and `tcc-libc` passes on both. Further bug-hunting
+work is open-ended — surface a misbehavior, write a `tests/cc` fixture
+that locks it, fix.
diff --git a/scripts/simple-patches/tcc-0.9.26/lex-char-unsigned.after b/scripts/simple-patches/tcc-0.9.26/lex-char-unsigned.after
@@ -0,0 +1,10 @@
+    if (!is_long) {
+        /* C99 §6.4.4.4¶10: an integer character constant has type
+         * `int`. The value is what you get from converting an
+         * object of type `char` (whose value is the byte) to int.
+         * Plain `char` signedness is implementation-defined; on
+         * aarch64 AAPCS (and most modern ABIs) it is unsigned, so
+         * '\xFF' must yield 255, not -1. tcc 0.9.26 unconditionally
+         * sign-extended via int8_t — read as uint8_t instead. */
+        tokc.i = *(uint8_t *)tokcstr.data;
+        tok = TOK_CCHAR;
diff --git a/scripts/simple-patches/tcc-0.9.26/lex-char-unsigned.before b/scripts/simple-patches/tcc-0.9.26/lex-char-unsigned.before
@@ -0,0 +1,3 @@
+    if (!is_long) {
+        tokc.i = *(int8_t *)tokcstr.data;
+        tok = TOK_CCHAR;
diff --git a/scripts/stage1-flatten.sh b/scripts/stage1-flatten.sh
@@ -139,6 +139,7 @@ apply_our_patch getclock-ms-stub "$SRC/tcc.c"
 apply_our_patch getcwd-stub "$SRC/tccgen.c"
 apply_our_patch ldexp-stub "$SRC/tccpp.c"
 apply_our_patch date-time-stub "$SRC/tccpp.c"
+apply_our_patch lex-char-unsigned "$SRC/tccpp.c"
 apply_our_patch elfinterp-stub "$SRC/tccelf.c"
 
 # Const-expr short-circuit: gen_opic/gen_opif must respect nocode_wanted
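
The commit does not show how `apply_our_patch` consumes a `.before`/`.after` pair. The file names suggest a textual substitution, and the sketch below works under that assumption only: `replace_once` is a hypothetical C helper, not the repository's shell function, that swaps exactly one occurrence of the `.before` text for the `.after` text and refuses to act when the match is missing or ambiguous.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not the repository's apply_our_patch.
 * Replaces the single occurrence of `before` in `src` with `after`.
 * Returns a malloc'd string the caller must free, or NULL when
 * `before` is empty, absent, or present more than once, or when
 * allocation fails. */
static char *replace_once(const char *src, const char *before, const char *after)
{
    size_t blen = strlen(before);
    if (blen == 0)
        return NULL;

    const char *hit = strstr(src, before);
    if (hit == NULL)
        return NULL;                      /* nothing to patch */
    if (strstr(hit + 1, before) != NULL)
        return NULL;                      /* ambiguous: more than one match */

    size_t head = (size_t)(hit - src);    /* bytes before the match */
    size_t alen = strlen(after);
    size_t tail = strlen(hit + blen);     /* bytes after the match */

    char *out = malloc(head + alen + tail + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, src, head);
    memcpy(out + head, after, alen);
    memcpy(out + head + alen, hit + blen, tail + 1);  /* copies the NUL too */
    return out;
}
```

Called with the source text of `tccpp.c` and the contents of the two patch files, such a helper would return the patched text. Failing on zero or multiple matches is a deliberate choice in this sketch: a patch that silently does nothing, or lands in the wrong place, is harder to notice than one that stops the build.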