commit 76182e3a5a1b436ec5df54f8405da50f3ee1e16f
parent 0dc9d9b9d6fd9100c28f165cfa7e71e3033a16cf
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Thu, 28 May 2026 14:32:42 -0700
doc: refresh binary-trees -O1 numbers after known-frame prologue
Re-ran the binary-trees sweep on a clean release build (RELEASE=1,
COMPILE_REPEATS=3, RUN_REPEATS=3). cfree -O1 runtime 3146 -> 2973 ms,
gcc -O0 runtime 2639 -> 2647 ms; vs gcc -O0 0.84x -> 0.89x. Other rows
unchanged (not re-run).
Diffstat:
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/doc/OPT_O1_PERF_TODO.md b/doc/OPT_O1_PERF_TODO.md
@@ -27,7 +27,7 @@ the cached baseline in `scripts/opt_bench_baseline.csv`; regenerate with
| bench | cfree -O1 | gcc -O0 | vs gcc-O0 | mir -O1 | vs mir-O1 | behind |
| --- | ---: | ---: | ---: | ---: | ---: | --- |
-| binary-trees | 3146 | 2639 | **0.84×** (slower) | n/a¹ | — | gcc |
+| binary-trees | 2973 | 2647 | **0.89×** (slower) | n/a¹ | — | gcc |
| lists | 4843 | 8868 | 1.83× ✓ | 4997 | 1.03× | mir |
| hash2 | 4988 | 7481 | 1.50× ✓ | 3863 | **0.77×** | mir |
| sieve | 5148 | 5077 | 0.99× (~tied) | 4028 | **0.78×** | gcc (~tied), mir |
@@ -39,7 +39,8 @@ the cached baseline in `scripts/opt_bench_baseline.csv`; regenerate with
## Per-benchmark notes
### binary-trees — slower than unoptimized gcc (highest priority)
-The only case where cfree `-O1` is *slower than gcc -O0* (0.84×). Workload is
+The only case where cfree `-O1` is *slower than gcc -O0* (0.89×, up from 0.84×
+after the known-frame prologue landed — item 1 below). Workload is
recursive tree build/walk: four tiny functions (`NewTreeNode`, `ItemCheck`,
`BottomUpTree`, `DeleteTree`) called ~7.6M times at depth=19, plus a
`malloc`/`free` per node. The **body** of each function is fine — cfree -O1