stage2: R1–R7 rt-ingest fixes (5/8 rt sources compile under cfree) - kit

commit d00d65a937eb4ea15f294fdd58d8de057cf632f5
parent 23338340287823ed9e87979bca0dd7a261a48b4b
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 11 May 2026 16:51:40 -0700

stage2: R1–R7 rt-ingest fixes (5/8 rt sources compile under cfree)

Lands the R-series from doc/STAGE2.md so cfree can parse most of the
freestanding compiler-rt sources in rt/lib/. After this change the
aarch64-apple-darwin probe goes from 2/8 to 5/8 clean.

R1 source-only: replace __inline with inline across rt/lib/.

R2 pp: pp_next drops forwarded #pragma lines so the C parser doesn't
see them (cpp mode still re-emits via pp_next_raw). atomic_common.inc:
drop #pragma redefine_extname and rename _c-suffixed functions to their
final library names directly — cfree has no clang-builtin collision.

R3 parse_expr: __builtin_offsetof folds inside cexpr_unary.
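
For illustration, a minimal sketch of the kind of input R3 is meant to
accept. The struct `coro_frame` and the helper `regs_offset` are
hypothetical names, not taken from rt/lib/; the only assumption is that
`offsetof` from `<stddef.h>` expands to `__builtin_offsetof`, as it does
with GCC and clang headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; field names are illustrative only. */
struct coro_frame {
  uint64_t sp;
  uint64_t lr;
  uint64_t regs[10];
};

/* Both spellings must fold to integer constant expressions for
 * _Static_assert to accept them. */
_Static_assert(offsetof(struct coro_frame, lr) == 8, "lr offset");
_Static_assert(__builtin_offsetof(struct coro_frame, regs) == 16,
               "regs offset");

/* Runtime view of the same constant. */
static size_t regs_offset(void) {
  size_t off = offsetof(struct coro_frame, regs);
  assert(off == 16); /* cross-check of the folded value */
  return off;
}
```

A compiler that cannot fold `__builtin_offsetof` would reject the two
`_Static_assert` lines rather than `regs_offset` itself.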

R4 parse_type/abi: member-level _Alignas raises the field's
align_override, which abi_record_layout already propagates into the
containing aggregate's alignment.
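
A small sketch of the behaviour R4 targets, under the assumption of a
conforming C11 compiler. `struct aligned_member` is a hypothetical type;
the real `coro_t` from rt/lib/ is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct: only one member carries _Alignas. */
struct aligned_member {
  char tag;
  _Alignas(16) char payload[8];
};

/* The member's alignment must propagate to the aggregate. */
_Static_assert(_Alignof(struct aligned_member) == 16, "aggregate align");
_Static_assert(offsetof(struct aligned_member, payload) == 16,
               "payload offset");
_Static_assert(sizeof(struct aligned_member) == 32, "padded size");

static size_t aligned_member_align(void) {
  assert(offsetof(struct aligned_member, payload) % 16 == 0);
  return _Alignof(struct aligned_member);
}
```

If the member-level `_Alignas` were ignored, the first assertion would
see an alignment of 1 and fail at compile time.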

R5 type/abi/cg: __int128, __int128_t, __uint128_t recognized as type
specifiers (TY_INT128 / TY_UINT128, size 16, align 16). Typedef-only
use parses; cg_load/store/binop/unop/convert on int128 panic with a
clear "__int128 codegen not implemented" diagnostic.
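
As a sketch of what "typedef-only use" could look like: the aliases
below are hypothetical names, and the size and alignment checks assume a
64-bit target where GCC or clang provide `__int128` (it is not
available on most 32-bit targets). Nothing here loads, stores, or
computes on a 128-bit value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical aliases; no 128-bit value is ever materialized. */
typedef __int128 i128_alias;
typedef unsigned __int128 u128_alias;

_Static_assert(sizeof(i128_alias) == 16, "int128 size");
_Static_assert(_Alignof(u128_alias) == 16, "uint128 align");

/* sizeof is evaluated at compile time, so no int128 codegen is needed. */
static size_t i128_size(void) {
  assert(sizeof(i128_alias) == sizeof(u128_alias));
  return sizeof(i128_alias);
}
```

By contrast, something like `i128_alias x = 1; x += 1;` would need the
load/store/binop paths that this change deliberately leaves as a panic.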

R6 parse_expr/cg: __builtin_trap, __builtin_unreachable (new
cg_intrinsic_void), __builtin_clz{,l,ll} (cg_intrinsic_unary_to_int +
INTRIN_CLZ), and __builtin_mem{cpy,move,cmp,set} (rewritten to plain
libc calls in try_parse_builtin_call so runtime n works).
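
A hedged sketch of source that exercises the R6 builtins, written
against the GCC/clang builtin semantics. The helper names (`clz_u32`,
`clz_u64`, `copy_roundtrip_ok`) are hypothetical, and the expected
counts assume a 32-bit `unsigned int` and a 64-bit `unsigned long long`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h> /* declares memcpy/memcmp, the targets of the rewrite */

/* __builtin_clz is undefined for 0, so the precondition is asserted. */
static int clz_u32(unsigned int x) {
  assert(x != 0);
  return __builtin_clz(x);
}

static int clz_u64(unsigned long long x) {
  assert(x != 0);
  return __builtin_clzll(x);
}

/* Copy n bytes, where n is only known at run time, then compare.
 * Returns 1 when the copied prefix matches the source. */
static int copy_roundtrip_ok(size_t n) {
  const char src[8] = "rt-copy";
  char dst[8] = {0};
  if (n > sizeof src) n = sizeof src;
  __builtin_memcpy(dst, src, n);
  return __builtin_memcmp(dst, src, n) == 0;
}
```

Including `<string.h>` matters for the cfree path described above,
since the rewritten call needs the libc prototype in scope.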

R7 parse_expr/parse: __func__, __FUNCTION__, __PRETTY_FUNCTION__
synthesized lazily as NUL-terminated char[N+1] in .rodata, using a
new Parser.cur_func_name set around parse_function_body.
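
To make the `char[N+1]` shape concrete, a minimal sketch using only
standard C99 `__func__`. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* sizeof(__func__) counts the name plus its NUL terminator. */
static size_t func_name_size(void) {
  assert(__func__[sizeof(__func__) - 1] == '\0'); /* NUL-terminated */
  return sizeof(__func__);
}

/* The array decays to a pointer to the enclosing function's name. */
static const char *func_name(void) {
  return __func__;
}
```

Here `func_name_size()` should equal `strlen("func_name_size") + 1`,
which is the N+1 sizing described above.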

doc/STAGE2.md updated with R6/R7 status and the three concrete
remaining rt blockers (R8 ctz variants, R9 atomic-lock-free fold,
R10 file-scope __asm__).

Diffstat:
M doc/STAGE2.md  | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------
M rt/lib/atomic/atomic_common.inc  | 27 +++++++++++----------------
M rt/lib/impl/fp_add_impl.inc  | 2 +-
M rt/lib/impl/fp_compare_impl.inc  | 2 +-
M rt/lib/impl/fp_div_impl.inc  | 2 +-
M rt/lib/impl/fp_extend_impl.inc  | 2 +-
M rt/lib/impl/fp_fixint_impl.inc  | 2 +-
M rt/lib/impl/fp_fixuint_impl.inc  | 2 +-
M rt/lib/impl/fp_mul_impl.inc  | 2 +-
M rt/lib/impl/fp_trunc_impl.inc  | 2 +-
M rt/lib/impl/int_div_impl.inc  | 10 +++++-----
M rt/lib/impl/int_to_fp_impl.inc  | 12 ++++++------
M rt/lib/include/common/fp_lib.h  | 46 +++++++++++++++++++++++-----------------------
M rt/lib/include/llp64_le/int_lib.h  | 4 ++--
M rt/lib/include/lp64_le/int_lib.h  | 4 ++--
M src/abi/abi.c  | 10 ++++++++++
M src/cg/cg.c  | 25 +++++++++++++++++++++++++
M src/cg/cg.h  | 7 +++++++
M src/debug/c_debug.c  | 4 ++++
M src/parse/parse.c  | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/parse/parse_expr.c  | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
M src/parse/parse_priv.h  | 18 ++++++++++++++++++
M src/parse/parse_type.c  | 15 +++++++++++++++
M src/pp/pp.c  | 29 +++++++++++++++++++++++++++--
M src/type/type.c  | 2 ++
M src/type/type.h  | 2 ++

26 files changed, 397 insertions(+), 93 deletions(-)
diff --git a/doc/STAGE2.md b/doc/STAGE2.md
@@ -156,35 +156,77 @@ not been switched back on.
 
 Separate from stage-2 self-host: can cfree compile `libcfree_rt.a`?
 Probed on the `aarch64-apple-darwin` variant — 8 sources, freestanding,
-no system headers. Result: **2 / 8 clean** today. Flags must drop
+no system headers. Result: **5 / 8 clean** today (`fp/fp.c`, `mem/mem.c`,
+`cfree/ifunc_init.c`, `coro/coro.c`, `int/int.c`). Flags must drop
 `-std=c11 -Wpedantic -Wall -Wextra -Werror -ffreestanding -fno-builtin` —
 cfree rejects all of these. (`-fno-builtin` is the only one not already
 on the stage-2 drop list.)
 
-- [ ] **R1.** Accept `__inline` and `__inline__` as keyword aliases for
-  `inline`. One-line lexer/keyword-table change. Blocks `int/int.c`,
-  `fp/fp.c`, `int64/int64.c` via `rt/lib/include/lp64_le/int_lib.h:83`
-  (`static __inline …`).
-- [ ] **R2.** Unknown `#pragma` accepted as no-op. Same root cause as
-  A2-S3 — one fix, two payoffs. Blocks
-  `atomic/atomic_freestanding.c` (`#pragma clang diagnostic …`).
-- [ ] **R3.** Fold `__builtin_offsetof(T, m)` as a constant expression.
-  cfree already computes struct layout; this is plumbing for the
-  constant-evaluator. Blocks `coro/aarch64.c` (`offsetof` inside
-  `_Static_assert`).
-- [ ] **R4.** `_Alignas(N)` on a **struct member** must raise the
-  containing aggregate's alignment (C11 §6.7.5). cfree honors
-  `__attribute__((aligned(N)))` on the struct itself, but member-level
-  `_Alignas` doesn't propagate. Blocks `coro/coro.c` whose
-  `_Alignof(coro_t)` assertion silently evaluates to less than 16.
-- [ ] **R5.** `__int128` keyword. Latent — `int_lib.h` declares the
-  type via `__attribute__((mode(TI)))` which parses, but full codegen
-  correctness on `__int128` operations hasn't been exercised. Will
-  matter for ABI variants that hit the int128 paths.
-
-After R1+R2+R3+R4, all 8 sources of the `aarch64-apple-darwin` variant
-should compile. The same fixes apply to the linux variants (same
-`int_lib.h`, same `coro.c`, same Apache-2.0 atomic shim).
+- [x] **R1.** Replaced `__inline` with `inline` in rt sources (no
+  compiler change; cfree already accepts `inline`).
+- [x] **R2.** Unknown `#pragma` is now silently skipped at the parser
+  boundary: `pp_next` drops forwarded pragma lines so the C parser never
+  sees them, while cpp mode still re-emits them via `pp_next_raw`. The
+  `#pragma redefine_extname` renames in `atomic_common.inc` were dropped;
+  the `_c`-suffixed functions now carry their final library names
+  directly (no clang-builtin collision on the cfree side).
+- [x] **R3.** `__builtin_offsetof(T, m)` now folds inside `cexpr_unary`
+  using the existing `offsetof_designator` helper. Unblocks
+  `_Static_assert(offsetof(...))`.
+- [x] **R4.** Member-level `_Alignas(N)` now raises the field's
+  `align_override`, which the ABI layout already propagates into the
+  containing aggregate's alignment (`src/abi/abi.c:195,213,223`).
+- [x] **R5.** `__int128`, `__int128_t`, `__uint128_t` recognized as
+  type specifiers (`TY_INT128`/`TY_UINT128`, size 16, align 16).
+  Typedef-only use parses; any `cg_load`/`cg_store`/`cg_binop`/
+  `cg_unop`/`cg_convert` on int128 panics with a clear
+  "`__int128` codegen not implemented" diagnostic. Codegen support is
+  out of scope for this milestone.
+- [x] **R6.** Missing rt builtins wired up in the parser.
+  - `__builtin_trap`, `__builtin_unreachable` → new `cg_intrinsic_void`,
+    `INTRIN_TRAP` / `INTRIN_UNREACHABLE` (already implemented in all
+    three backends).
+  - `__builtin_clz`, `__builtin_clzl`, `__builtin_clzll` →
+    `cg_intrinsic_unary_to_int(INTRIN_CLZ)`; operand type drives width.
+  - `__builtin_memcpy`, `__builtin_memmove`, `__builtin_memcmp`,
+    `__builtin_memset` → rewritten at `try_parse_builtin_call` to plain
+    calls to the libc functions of the same name, so runtime-`n` works.
+    Caller must declare the libc prototype (rt's `<string.h>` does).
+- [x] **R7.** `__func__`, `__FUNCTION__`, `__PRETTY_FUNCTION__`
+  predefined identifiers (C99 §6.4.2.2). Synthesized lazily in
+  `parse_primary` as a NUL-terminated `char[N+1]` literal in `.rodata`,
+  using a new `Parser.cur_func_name` field set around
+  `parse_function_body`. Use outside a function body gets a clean diagnostic.
+
+After R1–R7, three blockers remain for the 8-source `aarch64-apple-darwin`
+rt probe:
+
+- [ ] **R8.** `__builtin_ctzl` / `__builtin_ctzll` not wired (only
+  `__builtin_ctz` is). Same shape as R6's `clz` wiring; just needs the
+  two missing symbols added to the gate in `try_parse_builtin_call` and
+  routed through `INTRIN_CTZ`. Blocks `int64/int64.c:217`.
+- [ ] **R9.** `__atomic_always_lock_free(size, ptr)` and
+  `__atomic_is_lock_free(size, ptr)` must fold at compile time when
+  `size` is a constant — `atomic_common.inc`'s `IS_LOCK_FREE_n` macros
+  expand to these inside `case 1: ... case 16:` arms and rely on the
+  fold to elide unreachable branches. Plain runtime calls would still
+  link but the macros wrap the result in a switch over `size`, so
+  without folding cfree would emit per-size dispatch that the rt
+  layout expects to be dead-code-eliminated. Blocks
+  `atomic/atomic_freestanding.c:77`.
+- [ ] **R10.** File-scope `__asm__("...")` declarations
+  (a GCC extension, also accepted by clang). `coro/aarch64.c` uses
+  this form to emit raw setjmp/longjmp assembly without going through
+  a function-body inline-asm path. Needs a new top-level parser case
+  in `parse_translation_unit` that recognizes `__asm__` / `asm` at
+  TU scope, parses the string-literal argument, and feeds it to
+  `parse_asm` against the current `__TEXT,__text` section. Blocks
+  `coro/aarch64.c:120`.
+
+Additionally listed in the larger SDK ingest plan but not yet seen in
+the 8-source rt probe: `__builtin_*_overflow` (for `int/int.c`'s
+`__addvsi3` family — currently the source uses manual overflow checks,
+not the builtins).
 
 ## How to re-run the audits
 
diff --git a/rt/lib/atomic/atomic_common.inc b/rt/lib/atomic/atomic_common.inc
@@ -13,19 +13,14 @@
 // Uses GCC-style __atomic_* builtins (the family cfree provides) rather than
 // Clang-specific __c11_atomic_*.
 //
-// The five generic entry points (load/store/exchange/compare_exchange/
-// is_lock_free) collide with clang builtins of the same name. We define them
-// under _c-suffixed symbols and rename them at link time via the standard
-// `#pragma redefine_extname` trick.
+// cfree does not treat the unsuffixed generic entry points
+// (__atomic_load / __atomic_store / __atomic_exchange /
+// __atomic_compare_exchange / __atomic_is_lock_free) as compiler builtins,
+// so we define them under their final library names directly — no
+// `#pragma redefine_extname` rename is required.
 //===----------------------------------------------------------------------===//
 
-#pragma redefine_extname __atomic_load_c              __atomic_load
-#pragma redefine_extname __atomic_store_c             __atomic_store
-#pragma redefine_extname __atomic_exchange_c          __atomic_exchange
-#pragma redefine_extname __atomic_compare_exchange_c  __atomic_compare_exchange
-#pragma redefine_extname __atomic_is_lock_free_c      __atomic_is_lock_free
-
-static __inline Lock *lock_for_pointer(void *ptr) {
+static inline Lock *lock_for_pointer(void *ptr) {
   intptr_t hash = (intptr_t)ptr;
   hash >>= 4;
   intptr_t low = hash & SPINLOCK_MASK;
@@ -77,7 +72,7 @@ static __inline Lock *lock_for_pointer(void *ptr) {
     }                                                                          \
   } while (0)
 
-bool __atomic_is_lock_free_c(size_t size, void *ptr) {
+bool __atomic_is_lock_free(size_t size, void *ptr) {
 #define LOCK_FREE_ACTION(type) return true;
   LOCK_FREE_CASES(ptr);
 #undef LOCK_FREE_ACTION
@@ -87,7 +82,7 @@ bool __atomic_is_lock_free_c(size_t size, void *ptr) {
 #pragma clang diagnostic push
 #pragma clang diagnostic ignored "-Watomic-alignment"
 
-void __atomic_load_c(int size, void *src, void *dest, int model) {
+void __atomic_load(int size, void *src, void *dest, int model) {
 #define LOCK_FREE_ACTION(type)                                                 \
   *((type *)dest) = __atomic_load_n((type *)src, model);                       \
   return;
@@ -99,7 +94,7 @@ void __atomic_load_c(int size, void *src, void *dest, int model) {
   unlock(l);
 }
 
-void __atomic_store_c(int size, void *dest, void *src, int model) {
+void __atomic_store(int size, void *dest, void *src, int model) {
 #define LOCK_FREE_ACTION(type)                                                 \
   __atomic_store_n((type *)dest, *(type *)src, model);                         \
   return;
@@ -111,7 +106,7 @@ void __atomic_store_c(int size, void *dest, void *src, int model) {
   unlock(l);
 }
 
-int __atomic_compare_exchange_c(int size, void *ptr, void *expected,
+int __atomic_compare_exchange(int size, void *ptr, void *expected,
                                 void *desired, int success, int failure) {
 #define LOCK_FREE_ACTION(type)                                                 \
   return __atomic_compare_exchange_n((type *)ptr, (type *)expected,            \
@@ -130,7 +125,7 @@ int __atomic_compare_exchange_c(int size, void *ptr, void *expected,
   return 0;
 }
 
-void __atomic_exchange_c(int size, void *ptr, void *val, void *old, int model) {
+void __atomic_exchange(int size, void *ptr, void *val, void *old, int model) {
 #define LOCK_FREE_ACTION(type)                                                 \
   *(type *)old = __atomic_exchange_n((type *)ptr, *(type *)val, model);        \
   return;
diff --git a/rt/lib/impl/fp_add_impl.inc b/rt/lib/impl/fp_add_impl.inc
@@ -30,7 +30,7 @@
 #ifdef _FP_ADD_EMIT
 #undef _FP_ADD_EMIT
 
-static __inline fp_t __addXf3__(fp_t a, fp_t b) {
+static inline fp_t __addXf3__(fp_t a, fp_t b) {
   rep_t aRep = toRep(a);
   rep_t bRep = toRep(b);
   const rep_t aAbs = aRep & absMask;
diff --git a/rt/lib/impl/fp_compare_impl.inc b/rt/lib/impl/fp_compare_impl.inc
@@ -9,7 +9,7 @@
 #include "fp_lib.h"
 
 // CMP_RESULT and the LE_*/GE_* sentinels are precision-independent; emit
-// them once per TU. The static __inline comparators (__leXf2__ etc.) are
+// them once per TU. The static inline comparators (__leXf2__ etc.) are
 // per-precision and gated below.
 #ifndef FP_COMPARE_COMMON_EMITTED
 #define FP_COMPARE_COMMON_EMITTED
diff --git a/rt/lib/impl/fp_div_impl.inc b/rt/lib/impl/fp_div_impl.inc
@@ -48,7 +48,7 @@
 #error At least one full iteration is required
 #endif
 
-static __inline fp_t __divXf3__(fp_t a, fp_t b) {
+static inline fp_t __divXf3__(fp_t a, fp_t b) {
 
   const unsigned int aExponent = toRep(a) >> significandBits & maxExponent;
   const unsigned int bExponent = toRep(b) >> significandBits & maxExponent;
diff --git a/rt/lib/impl/fp_extend_impl.inc b/rt/lib/impl/fp_extend_impl.inc
@@ -314,7 +314,7 @@ static inline dst_t dstFromRep(dst_rep_t x) {
 // format. In particular, for the source type srcSigFracBits may be not equal to
 // srcSigBits. The destination type is assumed to be one of IEEE-754 standard
 // types.
-static __inline dst_t __extendXfYf2__(src_t a) {
+static inline dst_t __extendXfYf2__(src_t a) {
   // Various constants whose values follow from the type parameters.
   // Any reasonable optimizer will fold and propagate all of these.
   const int srcInfExp = (1 << srcExpBits) - 1;
diff --git a/rt/lib/impl/fp_fixint_impl.inc b/rt/lib/impl/fp_fixint_impl.inc
@@ -27,7 +27,7 @@
 #endif
 #define __fixint FP_FIX_IMPL_PASTE(__fixint, FP_FIX_SUFFIX)
 
-static __inline fixint_t __fixint(fp_t a) {
+static inline fixint_t __fixint(fp_t a) {
   const fixint_t fixint_max = (fixint_t)((~(fixuint_t)0) / 2);
   const fixint_t fixint_min = -fixint_max - 1;
   // Break a into sign, exponent, significand parts.
diff --git a/rt/lib/impl/fp_fixuint_impl.inc b/rt/lib/impl/fp_fixuint_impl.inc
@@ -27,7 +27,7 @@
 #endif
 #define __fixuint FP_FIX_IMPL_PASTE(__fixuint, FP_FIX_SUFFIX)
 
-static __inline fixuint_t __fixuint(fp_t a) {
+static inline fixuint_t __fixuint(fp_t a) {
   // Break a into sign, exponent, significand parts.
   const rep_t aRep = toRep(a);
   const rep_t aAbs = aRep & absMask;
diff --git a/rt/lib/impl/fp_mul_impl.inc b/rt/lib/impl/fp_mul_impl.inc
@@ -29,7 +29,7 @@
 #ifdef _FP_MUL_EMIT
 #undef _FP_MUL_EMIT
 
-static __inline fp_t __mulXf3__(fp_t a, fp_t b) {
+static inline fp_t __mulXf3__(fp_t a, fp_t b) {
   const unsigned int aExponent = toRep(a) >> significandBits & maxExponent;
   const unsigned int bExponent = toRep(b) >> significandBits & maxExponent;
   const rep_t productSign = (toRep(a) ^ toRep(b)) & signBit;
diff --git a/rt/lib/impl/fp_trunc_impl.inc b/rt/lib/impl/fp_trunc_impl.inc
@@ -299,7 +299,7 @@ static inline dst_t dstFromRep(dst_rep_t x) {
 // 80-bit format. In particular, for the destination type dstSigFracBits may be
 // not equal to dstSigBits. The source type is assumed to be one of IEEE-754
 // standard types.
-static __inline dst_t __truncXfYf2__(src_t a) {
+static inline dst_t __truncXfYf2__(src_t a) {
   // Various constants whose values follow from the type parameters.
   // Any reasonable optimizer will fold and propagate all of these.
   const int srcInfExp = (1 << srcExpBits) - 1;
diff --git a/rt/lib/impl/int_div_impl.inc b/rt/lib/impl/int_div_impl.inc
@@ -18,7 +18,7 @@
 //   COMPUTE_UDIV(a, b)    -- expression yielding unsigned quotient
 //   ASSIGN_UMOD(res, a, b)-- statement assigning unsigned remainder to res
 //
-// Outputs (always emitted as `static __inline`):
+// Outputs (always emitted as `static inline`):
 //   __udivXi3_<suffix>, __umodXi3_<suffix>
 // Plus, conditionally:
 //   __divXi3_<suffix>     iff COMPUTE_UDIV is defined
@@ -43,7 +43,7 @@
 #endif
 
 // Adapted from Figure 3-40 of The PowerPC Compiler Writer's Guide
-static __inline fixuint_t
+static inline fixuint_t
 INT_DIV_IMPL_CAT(__udivXi3_, INT_DIV_SUFFIX)(fixuint_t n, fixuint_t d) {
   const unsigned N = sizeof(fixuint_t) * CHAR_BIT;
   // d == 0 cases are unspecified.
@@ -73,7 +73,7 @@ INT_DIV_IMPL_CAT(__udivXi3_, INT_DIV_SUFFIX)(fixuint_t n, fixuint_t d) {
 }
 
 // Mostly identical to __udivXi3 but the return values are different.
-static __inline fixuint_t
+static inline fixuint_t
 INT_DIV_IMPL_CAT(__umodXi3_, INT_DIV_SUFFIX)(fixuint_t n, fixuint_t d) {
   const unsigned N = sizeof(fixuint_t) * CHAR_BIT;
   // d == 0 cases are unspecified.
@@ -102,7 +102,7 @@ INT_DIV_IMPL_CAT(__umodXi3_, INT_DIV_SUFFIX)(fixuint_t n, fixuint_t d) {
 }
 
 #ifdef COMPUTE_UDIV
-static __inline fixint_t
+static inline fixint_t
 INT_DIV_IMPL_CAT(__divXi3_, INT_DIV_SUFFIX)(fixint_t a, fixint_t b) {
   const int N = (int)(sizeof(fixint_t) * CHAR_BIT) - 1;
   fixint_t s_a = a >> N;                            // s_a = a < 0 ? -1 : 0
@@ -115,7 +115,7 @@ INT_DIV_IMPL_CAT(__divXi3_, INT_DIV_SUFFIX)(fixint_t a, fixint_t b) {
 #endif // COMPUTE_UDIV
 
 #ifdef ASSIGN_UMOD
-static __inline fixint_t
+static inline fixint_t
 INT_DIV_IMPL_CAT(__modXi3_, INT_DIV_SUFFIX)(fixint_t a, fixint_t b) {
   const int N = (int)(sizeof(fixint_t) * CHAR_BIT) - 1;
   fixint_t s = b >> N;                              // s = b < 0 ? -1 : 0
diff --git a/rt/lib/impl/int_to_fp_impl.inc b/rt/lib/impl/int_to_fp_impl.inc
@@ -115,22 +115,22 @@
 #if defined SRC_I64
 typedef int64_t src_t;
 typedef uint64_t usrc_t;
-static __inline int clzSrcT(usrc_t x) { return __builtin_clzll(x); }
+static inline int clzSrcT(usrc_t x) { return __builtin_clzll(x); }
 
 #elif defined SRC_U64
 typedef uint64_t src_t;
 typedef uint64_t usrc_t;
-static __inline int clzSrcT(usrc_t x) { return __builtin_clzll(x); }
+static inline int clzSrcT(usrc_t x) { return __builtin_clzll(x); }
 
 #elif defined SRC_I128
 typedef __int128_t src_t;
 typedef __uint128_t usrc_t;
-static __inline int clzSrcT(usrc_t x) { return __clzti2(x); }
+static inline int clzSrcT(usrc_t x) { return __clzti2(x); }
 
 #elif defined SRC_U128
 typedef __uint128_t src_t;
 typedef __uint128_t usrc_t;
-static __inline int clzSrcT(usrc_t x) { return __clzti2(x); }
+static inline int clzSrcT(usrc_t x) { return __clzti2(x); }
 
 #endif
 
@@ -160,7 +160,7 @@ enum {
 
 #endif
 
-static __inline dst_t dstFromRep(dst_rep_t x) {
+static inline dst_t dstFromRep(dst_rep_t x) {
   const union {
     dst_t f;
     dst_rep_t i;
@@ -207,7 +207,7 @@ static __inline dst_t dstFromRep(dst_rep_t x) {
 #ifdef _INT_TO_FP_IMPL_EMIT
 #undef _INT_TO_FP_IMPL_EMIT
 
-static __inline dst_t __floatXiYf__(src_t a) {
+static inline dst_t __floatXiYf__(src_t a) {
   if (a == 0)
     return 0.0;
 
diff --git a/rt/lib/include/common/fp_lib.h b/rt/lib/include/common/fp_lib.h
@@ -120,9 +120,9 @@ typedef uint64_t twice_rep_t;
 typedef int32_t srep_t;
 typedef float fp_t;
 
-static __inline int rep_clz(rep_t a) { return clzsi(a); }
+static inline int rep_clz(rep_t a) { return clzsi(a); }
 
-static __inline void wideMultiply(rep_t a, rep_t b, rep_t* hi, rep_t* lo) {
+static inline void wideMultiply(rep_t a, rep_t b, rep_t* hi, rep_t* lo) {
   const uint64_t product = (uint64_t)a * b;
   *hi = product >> 32;
   *lo = product;
@@ -137,12 +137,12 @@ typedef uint64_t rep_t;
 typedef int64_t srep_t;
 typedef double fp_t;
 
-static __inline int rep_clz(rep_t a) { return __builtin_clzll(a); }
+static inline int rep_clz(rep_t a) { return __builtin_clzll(a); }
 
 #define loWord(a) (a & 0xffffffffU)
 #define hiWord(a) (a >> 32)
 
-static __inline void wideMultiply(rep_t a, rep_t b, rep_t* hi, rep_t* lo) {
+static inline void wideMultiply(rep_t a, rep_t b, rep_t* hi, rep_t* lo) {
   const uint64_t plolo = loWord(a) * loWord(b);
   const uint64_t plohi = loWord(a) * hiWord(b);
   const uint64_t philo = hiWord(a) * loWord(b);
@@ -166,7 +166,7 @@ typedef __uint128_t rep_t;
 typedef __int128_t srep_t;
 typedef tf_float fp_t;
 
-static __inline int rep_clz(rep_t a) {
+static inline int rep_clz(rep_t a) {
   const union {
     __uint128_t ll;
     struct {
@@ -192,7 +192,7 @@ static __inline int rep_clz(rep_t a) {
 #define Word_3(a) (uint64_t)((a >> 32) & Word_LoMask)
 #define Word_4(a) (uint64_t)(a & Word_LoMask)
 
-static __inline void wideMultiply(rep_t a, rep_t b, rep_t* hi, rep_t* lo) {
+static inline void wideMultiply(rep_t a, rep_t b, rep_t* hi, rep_t* lo) {
   const uint64_t product11 = Word_1(a) * Word_1(b);
   const uint64_t product12 = Word_1(a) * Word_2(b);
   const uint64_t product13 = Word_1(a) * Word_3(b);
@@ -256,7 +256,7 @@ static __inline void wideMultiply(rep_t a, rep_t b, rep_t* hi, rep_t* lo) {
 #ifdef _FP_LIB_EMIT_COMMON
 #undef _FP_LIB_EMIT_COMMON
 
-static __inline rep_t toRep(fp_t x) {
+static inline rep_t toRep(fp_t x) {
   const union {
     fp_t f;
     rep_t i;
@@ -264,7 +264,7 @@ static __inline rep_t toRep(fp_t x) {
   return rep.i;
 }
 
-static __inline fp_t fromRep(rep_t x) {
+static inline fp_t fromRep(rep_t x) {
   const union {
     fp_t f;
     rep_t i;
@@ -272,18 +272,18 @@ static __inline fp_t fromRep(rep_t x) {
   return rep.f;
 }
 
-static __inline int normalize(rep_t* significand) {
+static inline int normalize(rep_t* significand) {
   const int shift = rep_clz(*significand) - rep_clz(implicitBit);
   *significand <<= shift;
   return 1 - shift;
 }
 
-static __inline void wideLeftShift(rep_t* hi, rep_t* lo, int count) {
+static inline void wideLeftShift(rep_t* hi, rep_t* lo, int count) {
   *hi = *hi << count | *lo >> (typeWidth - count);
   *lo = *lo << count;
 }
 
-static __inline void wideRightShiftWithSticky(rep_t* hi, rep_t* lo,
+static inline void wideRightShiftWithSticky(rep_t* hi, rep_t* lo,
                                               unsigned int count) {
   if (count < typeWidth) {
     const bool sticky = (*lo << (typeWidth - count)) != 0;
@@ -300,7 +300,7 @@ static __inline void wideRightShiftWithSticky(rep_t* hi, rep_t* lo,
   }
 }
 
-static __inline fp_t __compiler_rt_logbX(fp_t x) {
+static inline fp_t __compiler_rt_logbX(fp_t x) {
   rep_t rep = toRep(x);
   int exp = (rep & exponentMask) >> significandBits;
 
@@ -323,7 +323,7 @@ static __inline fp_t __compiler_rt_logbX(fp_t x) {
   }
 }
 
-static __inline fp_t __compiler_rt_scalbnX(fp_t x, int y) {
+static inline fp_t __compiler_rt_scalbnX(fp_t x, int y) {
   const rep_t rep = toRep(x);
   int exp = (rep & exponentMask) >> significandBits;
 
@@ -352,38 +352,38 @@ static __inline fp_t __compiler_rt_scalbnX(fp_t x, int y) {
     return fromRep(sign | ((rep_t)exp << significandBits) | sig);
 }
 
-static __inline fp_t __compiler_rt_fmaxX(fp_t x, fp_t y) {
+static inline fp_t __compiler_rt_fmaxX(fp_t x, fp_t y) {
   return (crt_isnan(x) || x < y) ? y : x;
 }
 
 #if defined(SINGLE_PRECISION)
-static __inline fp_t __compiler_rt_logbf(fp_t x) {
+static inline fp_t __compiler_rt_logbf(fp_t x) {
   return __compiler_rt_logbX(x);
 }
-static __inline fp_t __compiler_rt_scalbnf(fp_t x, int y) {
+static inline fp_t __compiler_rt_scalbnf(fp_t x, int y) {
   return __compiler_rt_scalbnX(x, y);
 }
-static __inline fp_t __compiler_rt_fmaxf(fp_t x, fp_t y) {
+static inline fp_t __compiler_rt_fmaxf(fp_t x, fp_t y) {
   return __compiler_rt_fmaxX(x, y);
 }
 #elif defined(DOUBLE_PRECISION)
-static __inline fp_t __compiler_rt_logb(fp_t x) {
+static inline fp_t __compiler_rt_logb(fp_t x) {
   return __compiler_rt_logbX(x);
 }
-static __inline fp_t __compiler_rt_scalbn(fp_t x, int y) {
+static inline fp_t __compiler_rt_scalbn(fp_t x, int y) {
   return __compiler_rt_scalbnX(x, y);
 }
-static __inline fp_t __compiler_rt_fmax(fp_t x, fp_t y) {
+static inline fp_t __compiler_rt_fmax(fp_t x, fp_t y) {
   return __compiler_rt_fmaxX(x, y);
 }
 #elif defined(QUAD_PRECISION)
-static __inline tf_float __compiler_rt_logbtf(tf_float x) {
+static inline tf_float __compiler_rt_logbtf(tf_float x) {
   return __compiler_rt_logbX(x);
 }
-static __inline tf_float __compiler_rt_scalbntf(tf_float x, int y) {
+static inline tf_float __compiler_rt_scalbntf(tf_float x, int y) {
   return __compiler_rt_scalbnX(x, y);
 }
-static __inline tf_float __compiler_rt_fmaxtf(tf_float x, tf_float y) {
+static inline tf_float __compiler_rt_fmaxtf(tf_float x, tf_float y) {
   return __compiler_rt_fmaxX(x, y);
 }
 #endif
diff --git a/rt/lib/include/llp64_le/int_lib.h b/rt/lib/include/llp64_le/int_lib.h
@@ -78,13 +78,13 @@ typedef union {
   } s;
 } utwords;
 
-static __inline ti_int make_ti(di_int h, di_int l) {
+static inline ti_int make_ti(di_int h, di_int l) {
   twords r;
   r.s.high = h;
   r.s.low = l;
   return r.all;
 }
-static __inline tu_int make_tu(du_int h, du_int l) {
+static inline tu_int make_tu(du_int h, du_int l) {
   utwords r;
   r.s.high = h;
   r.s.low = l;
diff --git a/rt/lib/include/lp64_le/int_lib.h b/rt/lib/include/lp64_le/int_lib.h
@@ -80,13 +80,13 @@ typedef union {
   } s;
 } utwords;
 
-static __inline ti_int make_ti(di_int h, di_int l) {
+static inline ti_int make_ti(di_int h, di_int l) {
   twords r;
   r.s.high = h;
   r.s.low = l;
   return r.all;
 }
-static __inline tu_int make_tu(du_int h, du_int l) {
+static inline tu_int make_tu(du_int h, du_int l) {
   utwords r;
   r.s.high = h;
   r.s.low = l;
diff --git a/src/abi/abi.c b/src/abi/abi.c
@@ -103,6 +103,16 @@ static ABITypeInfo prim_info(TargetABI* a, TypeKind k) {
       r.align = 8;
       r.signed_ = 0;
       return r;
+    case TY_INT128:
+      r.size = 16;
+      r.align = 16;
+      r.signed_ = 1;
+      return r;
+    case TY_UINT128:
+      r.size = 16;
+      r.align = 16;
+      r.signed_ = 0;
+      return r;
     case TY_FLOAT:
       r.size = 4;
       r.align = 4;
diff --git a/src/cg/cg.c b/src/cg/cg.c
@@ -143,6 +143,16 @@ static void push(CG* g, SValue v) {
   g->stack[g->sp++] = v;
 }
 
+/* __int128 / unsigned __int128 are parsed and laid out, but no backend
+ * implements arithmetic, conversion, or load/store on them yet. Trip a
+ * clear panic at the first codegen op that would touch the value. */
+static void reject_int128(CG* g, const Type* ty, const char* where) {
+  if (ty && (ty->kind == TY_INT128 || ty->kind == TY_UINT128)) {
+    compiler_panic(g->c, g->cur_loc,
+                   "%s: __int128 codegen not implemented", where);
+  }
+}
+
 static SValue pop(CG* g) {
   if (g->sp == 0) {
     compiler_panic(g->c, g->cur_loc, "cg: stack underflow");
@@ -553,6 +563,10 @@ void cg_free(CG* g) {
   h->free(h, g, sizeof *g);
 }
 
+CGTarget* cg_target(CG* g) {
+  return g ? g->target : NULL;
+}
+
 /* ============================================================
  * Function lifecycle
  * ============================================================ */
@@ -833,6 +847,7 @@ void cg_load(CG* g) {
    * fresh value reg, T->load through the lvalue's MemAccess, free the
    * old INDIRECT base if any, retag v as RES_REG. */
   const Type* ty = sv_type(&v);
+  reject_int128(g, ty, "cg_load");
   Operand dst = force_reg(g, &v, ty);
   push(g, make_sv(dst, ty));
 }
@@ -868,6 +883,7 @@ void cg_store(CG* g) {
     compiler_panic(g->c, g->cur_loc, "cg_store: destination is not an lvalue");
   }
   const Type* ty = sv_type(&lv);
+  reject_int128(g, ty, "cg_store");
   /* IMM is a legal source for store; otherwise force the rvalue into a
    * register. force_reg handles the lvalue → REG transition cleanly. */
   Operand src;
@@ -955,6 +971,7 @@ void cg_binop(CG* g, BinOp op) {
   CGTarget* T = g->target;
   /* Result type is `a`'s type at this slice (parser already coerced). */
   const Type* ty = a.type ? a.type : b.type;
+  reject_int128(g, ty, "cg_binop");
 
   /* Tier 1+2: constant-fold or apply algebraic identities via the
    * pure fold helper. KEEP_A/KEEP_B re-push the non-constant operand
@@ -998,6 +1015,7 @@ void cg_unop(CG* g, UnOp op) {
   SValue a = pop(g);
   CGTarget* T = g->target;
   const Type* ty = a.type ? a.type : a.op.type;
+  reject_int128(g, ty, "cg_unop");
 
   {
     Operand folded;
@@ -1080,6 +1098,8 @@ void cg_convert(CG* g, const Type* dst_ty) {
   SValue v = pop(g);
   CGTarget* T = g->target;
   const Type* sty = v.type ? v.type : v.op.type;
+  reject_int128(g, sty, "cg_convert");
+  reject_int128(g, dst_ty, "cg_convert");
   ConvKind ck;
   Operand src;
   Reg rr;
@@ -1519,6 +1539,11 @@ void cg_intrinsic_unary_to_int(CG* g, IntrinKind kind) {
   push(g, make_sv(dst, int_ty));
 }
 
+void cg_intrinsic_void(CG* g, IntrinKind kind) {
+  CGTarget* T = g->target;
+  T->intrinsic(T, kind, NULL, 0u, NULL, 0u);
+}
+
 /* ============================================================
  * Control flow — flat labels
  * ============================================================ */
diff --git a/src/cg/cg.h b/src/cg/cg.h
@@ -11,6 +11,7 @@ typedef struct Debug Debug;
 /* Debug is optional; pass NULL when -g is off. */
 CG* cg_new(Compiler*, CGTarget*, Debug*);
 void cg_free(CG*);
+CGTarget* cg_target(CG*);
 
 /* ----- functions ----- */
 void cg_func_begin(CG*, const CGFuncDesc*);
@@ -137,6 +138,12 @@ void cg_fence(CG*, MemOrder);
  * kind, pushes the result as `int`. */
 void cg_intrinsic_unary_to_int(CG*, IntrinKind);
 
+/* Zero-operand, zero-result intrinsic (e.g. __builtin_trap,
+ * __builtin_unreachable). Lowers via CGTarget.intrinsic and pushes no
+ * value — caller is responsible for pushing a dummy `int 0` if the
+ * builtin appears in an expression context. */
+void cg_intrinsic_void(CG*, IntrinKind);
+
 /* ----- control flow (CG-level labels) -----
  * cg_branch_true fuses with a preceding cg_cmp into a single
  * CGTarget.cmp_branch when the i1 on top of stack is the unconsumed result of
diff --git a/src/debug/c_debug.c b/src/debug/c_debug.c
@@ -125,6 +125,10 @@ static DebugTypeId walk_unqual(Debug* d, TargetABI* abi, const Type* t,
       return base_id(d, abi, t, "long long", DEBUG_BE_SIGNED);
     case TY_ULLONG:
       return base_id(d, abi, t, "unsigned long long", DEBUG_BE_UNSIGNED);
+    case TY_INT128:
+      return base_id(d, abi, t, "__int128", DEBUG_BE_SIGNED);
+    case TY_UINT128:
+      return base_id(d, abi, t, "unsigned __int128", DEBUG_BE_UNSIGNED);
     case TY_FLOAT:
       return base_id(d, abi, t, "float", DEBUG_BE_FLOAT);
     case TY_DOUBLE:
diff --git a/src/parse/parse.c b/src/parse/parse.c
@@ -863,8 +863,11 @@ static void parse_external_decl(Parser* p) {
     attr_list_append(&fent->attrs, dattrs);
 
     if (is_punct(&p->cur, '{')) {
+      Sym saved_func_name = p->cur_func_name;
+      p->cur_func_name = name;
       parse_function_body(p, fent->v.sym, fn_ty, abi, infos, nparams, loc,
                           fn_section_id, fn_decl_flags);
+      p->cur_func_name = saved_func_name;
       return;
     }
     if (accept_punct(p, ';')) {
@@ -979,6 +982,50 @@ static void parse_external_decl(Parser* p) {
   expect_punct(p, ';', "';' after global declaration");
 }
 
+static void parse_file_scope_asm(Parser* p) {
+  SrcLoc loc = tok_loc(&p->cur);
+  u8* bytes;
+  size_t nlen = 0;
+  Lexer* lex;
+  CGTarget* target;
+
+  advance(p); /* asm / __asm__ */
+  for (;;) {
+    if (is_kw(p, &p->cur, KW_VOLATILE)) {
+      advance(p);
+      continue;
+    }
+    if (p->cur.kind == TOK_IDENT && p->cur.v.ident == p->sym_volatile_alias) {
+      advance(p);
+      continue;
+    }
+    break;
+  }
+  expect_punct(p, '(', "'(' after file-scope asm");
+  if (p->cur.kind != TOK_STR) {
+    perr(p, "expected string literal in file-scope asm");
+  }
+  {
+    Tok t = p->cur;
+    advance(p);
+    bytes = decode_string_literal(p, &t, &nlen);
+  }
+  if (nlen > 0) nlen -= 1; /* drop C string terminator */
+  expect_punct(p, ')', "')' after file-scope asm");
+  expect_punct(p, ';', "';' after file-scope asm");
+
+  target = cg_target(p->cg);
+  if (!target || !target->mc) {
+    perr(p, "file-scope asm requires an object-code target");
+  }
+  cg_set_loc(p->cg, loc);
+  if (target->mc->set_loc) target->mc->set_loc(target->mc, loc);
+  lex = lex_open_mem(p->c, "<file-scope-asm>", (const char*)bytes, nlen);
+  parse_asm(p->c, lex, target->mc);
+  lex_close(lex);
+  p->c->env->heap->free(p->c->env->heap, bytes, 0);
+}
+
 static void parse_translation_unit(Parser* p) {
   while (p->cur.kind != TOK_EOF) {
     if (p->cur.kind == TOK_NEWLINE || is_pp_hash(&p->cur)) {
@@ -989,6 +1036,10 @@ static void parse_translation_unit(Parser* p) {
       parse_static_assert(p);
       continue;
     }
+    if (is_kw(p, &p->cur, KW_ASM) || is_kw(p, &p->cur, KW_BUILTIN_ASM)) {
+      parse_file_scope_asm(p);
+      continue;
+    }
     parse_external_decl(p);
   }
 }
@@ -1017,6 +1068,18 @@ void parse_c(Compiler* c, Pp* pp, DeclTable* decls, CG* cg, Debug* debug) {
 
   p.sym_b_alloca     = pool_intern_cstr(p.pool, "__builtin_alloca");
   p.sym_b_ctz        = pool_intern_cstr(p.pool, "__builtin_ctz");
+  p.sym_b_clz        = pool_intern_cstr(p.pool, "__builtin_clz");
+  p.sym_b_clzl       = pool_intern_cstr(p.pool, "__builtin_clzl");
+  p.sym_b_clzll      = pool_intern_cstr(p.pool, "__builtin_clzll");
+  p.sym_b_trap       = pool_intern_cstr(p.pool, "__builtin_trap");
+  p.sym_b_unreachable = pool_intern_cstr(p.pool, "__builtin_unreachable");
+  p.sym_b_memcpy     = pool_intern_cstr(p.pool, "__builtin_memcpy");
+  p.sym_b_memmove    = pool_intern_cstr(p.pool, "__builtin_memmove");
+  p.sym_b_memcmp     = pool_intern_cstr(p.pool, "__builtin_memcmp");
+  p.sym_b_memset     = pool_intern_cstr(p.pool, "__builtin_memset");
+  p.sym_func         = pool_intern_cstr(p.pool, "__func__");
+  p.sym_func_gcc     = pool_intern_cstr(p.pool, "__FUNCTION__");
+  p.sym_pretty_func_gcc = pool_intern_cstr(p.pool, "__PRETTY_FUNCTION__");
   p.sym_b_expect     = pool_intern_cstr(p.pool, "__builtin_expect");
   p.sym_b_offsetof   = pool_intern_cstr(p.pool, "__builtin_offsetof");
   p.sym_b_va_list    = pool_intern_cstr(p.pool, "__builtin_va_list");
@@ -1027,6 +1090,9 @@ void parse_c(Compiler* c, Pp* pp, DeclTable* decls, CG* cg, Debug* debug) {
   p.sym_attribute    = pool_intern_cstr(p.pool, "__attribute__");
   p.sym_volatile_alias = pool_intern_cstr(p.pool, "__volatile__");
   p.sym_alignof_alias  = pool_intern_cstr(p.pool, "__alignof__");
+  p.sym_int128         = pool_intern_cstr(p.pool, "__int128");
+  p.sym_int128_t       = pool_intern_cstr(p.pool, "__int128_t");
+  p.sym_uint128_t      = pool_intern_cstr(p.pool, "__uint128_t");
   p.sym_a_load_n     = pool_intern_cstr(p.pool, "__atomic_load_n");
   p.sym_a_store_n    = pool_intern_cstr(p.pool, "__atomic_store_n");
   p.sym_a_exchange_n = pool_intern_cstr(p.pool, "__atomic_exchange_n");
diff --git a/src/parse/parse_expr.c b/src/parse/parse_expr.c
@@ -300,6 +300,7 @@ ObjSymId emit_string_to_rodata(Parser* p, const u8* bytes, size_t n) {
  * ============================================================ */
 
 static i64 cexpr_unary(Parser* p, SrcLoc loc);
+static const Type* offsetof_designator(Parser* p, const Type* base, u32* off);
 
 static i64 cexpr_mul(Parser* p, SrcLoc loc) {
   i64 v = cexpr_unary(p, loc);
@@ -462,10 +463,24 @@ static i64 cexpr_unary(Parser* p, SrcLoc loc) {
     return v;
   }
   if (p->cur.kind == TOK_IDENT) {
-    SymEntry* e = scope_lookup(p, p->cur.v.ident);
-    if (e && e->kind == SEK_ENUM_CST) {
-      advance(p);
-      return e->v.enum_value;
+    Sym name = p->cur.v.ident;
+    if (name == p->sym_b_offsetof) {
+      u32 off = 0;
+      const Type* root;
+      advance(p); /* IDENT */
+      expect_punct(p, '(', "'(' after __builtin_offsetof");
+      root = parse_type_name(p);
+      expect_punct(p, ',', "',' in __builtin_offsetof");
+      (void)offsetof_designator(p, root, &off);
+      expect_punct(p, ')', "')' after __builtin_offsetof");
+      return (i64)off;
+    }
+    {
+      SymEntry* e = scope_lookup(p, name);
+      if (e && e->kind == SEK_ENUM_CST) {
+        advance(p);
+        return e->v.enum_value;
+      }
     }
     compiler_panic(p->c, loc, "non-constant identifier in constant expression");
   }
@@ -575,7 +590,27 @@ static int try_parse_builtin_call(Parser* p) {
   Sym name = p->cur.v.ident;
   SrcLoc loc = p->cur.loc;
 
+  /* `__builtin_mem{cpy,move,cmp,set}` are GCC/Clang's compiler-inlinable
+   * aliases for the libc functions. cfree's INTRIN_MEMCPY/MEMMOVE
+   * backend paths only handle constant byte counts, but the rt code
+   * calls them with runtime sizes. Rewrite each builtin into a plain
+   * call and let the normal function-call path handle it. The caller
+   * (parse_primary) reports a clean "undeclared identifier" if the TU
+   * forgot to declare the underlying libc function. */
+  if (name == p->sym_b_memcpy || name == p->sym_b_memmove ||
+      name == p->sym_b_memcmp || name == p->sym_b_memset) {
+    const char* libname = (name == p->sym_b_memcpy)  ? "memcpy"
+                        : (name == p->sym_b_memmove) ? "memmove"
+                        : (name == p->sym_b_memcmp)  ? "memcmp"
+                                                     : "memset";
+    p->cur.v.ident = pool_intern_cstr(p->pool, libname);
+    return 0;
+  }
+
   if (name != p->sym_b_alloca && name != p->sym_b_ctz &&
+      name != p->sym_b_clz && name != p->sym_b_clzl &&
+      name != p->sym_b_clzll && name != p->sym_b_trap &&
+      name != p->sym_b_unreachable &&
       name != p->sym_b_expect &&
       name != p->sym_b_offsetof && name != p->sym_b_va_start &&
       name != p->sym_b_va_arg && name != p->sym_b_va_end &&
@@ -628,6 +663,32 @@ static int try_parse_builtin_call(Parser* p) {
     return 1;
   }
 
+  if (name == p->sym_b_clz || name == p->sym_b_clzl ||
+      name == p->sym_b_clzll) {
+    parse_assign_expr(p);
+    to_rvalue(p);
+    expect_punct(p, ')', "')' after __builtin_clz");
+    cg_set_loc(p->cg, loc);
+    /* The operand carries its own type, which drives the sf bit on
+     * aarch64 / REX.W on x64 / sf on rv64. Whether the caller used the
+     * `l` / `ll` suffix only changes the C-level type the user wrote;
+     * cfree picks the instruction width from the value type. */
+    cg_intrinsic_unary_to_int(p->cg, INTRIN_CLZ);
+    return 1;
+  }
+
+  if (name == p->sym_b_trap || name == p->sym_b_unreachable) {
+    expect_punct(p, ')', "')' after __builtin_trap/unreachable");
+    cg_set_loc(p->cg, loc);
+    cg_intrinsic_void(p->cg,
+                      name == p->sym_b_trap ? INTRIN_TRAP : INTRIN_UNREACHABLE);
+    /* Both are noreturn at the C level. Push a dummy `int 0` so callers
+     * that consume an expression value (e.g. ternary, comma) don't see
+     * an empty stack — the dead value will be folded out. */
+    cg_push_int(p->cg, 0, ty_int(p));
+    return 1;
+  }
+
   if (name == p->sym_b_va_start) {
     parse_assign_expr(p);
     cg_addr(p->cg);
@@ -837,6 +898,38 @@ static void parse_primary(Parser* p) {
       Tok n = peek1(p);
       if (is_punct(&n, '(') && try_parse_builtin_call(p)) return;
     }
+    /* try_parse_builtin_call may rewrite the current ident in-place
+     * (e.g. __builtin_memcpy → memcpy) and return 0, asking us to
+     * resume normal lookup with the rewritten name. */
+    t = p->cur;
+    /* C99 §6.4.2.2: `__func__` inside a function-body acts as
+     *   static const char __func__[] = "<function-name>";
+     * GCC also exposes `__FUNCTION__` and `__PRETTY_FUNCTION__` with
+     * the same value. We synthesize the string lazily — the symbol
+     * lives in .rodata and the resulting type is `char[N+1]` (with the
+     * trailing NUL). */
+    if (t.v.ident == p->sym_func || t.v.ident == p->sym_func_gcc ||
+        t.v.ident == p->sym_pretty_func_gcc) {
+      if (p->cur_func_name == 0) {
+        compiler_panic(p->c, t.loc, "'%s' used outside a function",
+                       t.v.ident == p->sym_func ? "__func__"
+                       : t.v.ident == p->sym_func_gcc ? "__FUNCTION__"
+                       : "__PRETTY_FUNCTION__");
+      }
+      size_t nlen = 0;
+      const char* fn_name = pool_str(p->pool, p->cur_func_name, &nlen);
+      Heap* h = p->c->env->heap;
+      u8* bytes = (u8*)h->alloc(h, nlen + 1u, 1u);
+      for (size_t i = 0; i < nlen; ++i) bytes[i] = (u8)fn_name[i];
+      bytes[nlen] = 0;
+      ObjSymId sym = emit_string_to_rodata(p, bytes, nlen + 1u);
+      h->free(h, bytes, 0);
+      advance(p);
+      const Type* char_ty = type_prim(p->pool, TY_CHAR);
+      const Type* arr_ty = type_array(p->pool, char_ty, (u32)(nlen + 1u), 0);
+      cg_push_global(p->cg, sym, arr_ty);
+      return;
+    }
     e = scope_lookup(p, t.v.ident);
     if (!e) {
       size_t nlen = 0;
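The source-level constructs that the parse_expr.c hunks above accept can be exercised with a small translation unit like the following. It is a sketch written against GCC/Clang semantics rather than code from rt/lib/; struct Rec, REC_B_OFF, leading_zeros, copy_bytes and who_am_i are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record used only for this illustration. */
struct Rec {
  char a;
  int b;
};

/* R3: __builtin_offsetof has to fold inside an integer constant
 * expression, for example an enumerator value. */
enum { REC_B_OFF = __builtin_offsetof(struct Rec, b) };

/* R6: __builtin_clz counts leading zero bits. The result is undefined
 * for 0, so the caller must pass a nonzero value. */
static int leading_zeros(unsigned x) {
  assert(x != 0);
  return __builtin_clz(x);
}

/* R6: __builtin_memcpy with a runtime length. After the rewrite in
 * try_parse_builtin_call this is an ordinary call to memcpy, so the
 * translation unit has to declare it (here via <string.h>). */
static void copy_bytes(void *dst, const void *src, size_t n) {
  __builtin_memcpy(dst, src, n);
}

/* R7: __func__ behaves like a static const char array holding the
 * enclosing function's name plus a trailing NUL. */
static const char *who_am_i(void) { return __func__; }
```

Each function isolates one builtin, so a failure under cfree points at a single R-item instead of the whole file.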
diff --git a/src/parse/parse_priv.h b/src/parse/parse_priv.h
@@ -192,6 +192,20 @@ typedef struct Parser {
 
   Sym sym_b_alloca;
   Sym sym_b_ctz;
+  Sym sym_b_clz;
+  Sym sym_b_clzl;
+  Sym sym_b_clzll;
+  Sym sym_b_trap;
+  Sym sym_b_unreachable;
+  Sym sym_b_memcpy;
+  Sym sym_b_memmove;
+  Sym sym_b_memcmp;
+  Sym sym_b_memset;
+  Sym sym_func;             /* __func__ */
+  Sym sym_func_gcc;         /* __FUNCTION__ */
+  Sym sym_pretty_func_gcc;  /* __PRETTY_FUNCTION__ */
+  Sym cur_func_name;        /* name of the function whose body we're in,
+                             * 0 at file scope */
   Sym sym_b_expect;
   Sym sym_b_offsetof;
   Sym sym_b_va_list;
@@ -202,6 +216,9 @@ typedef struct Parser {
   Sym sym_attribute;
   Sym sym_volatile_alias;
   Sym sym_alignof_alias;
+  Sym sym_int128;     /* __int128 */
+  Sym sym_int128_t;   /* __int128_t */
+  Sym sym_uint128_t;  /* __uint128_t */
   Sym sym_a_load_n;
   Sym sym_a_store_n;
   Sym sym_a_exchange_n;
@@ -277,6 +294,7 @@ typedef struct TypeSpecAccum {
   u8 saw_bool;
   u8 saw_float;
   u8 saw_double;
+  u8 saw_int128;  /* __int128 / __int128_t / __uint128_t */
   u8 saw_explicit_type;
 } TypeSpecAccum;
 
diff --git a/src/parse/parse_type.c b/src/parse/parse_type.c
@@ -353,6 +353,9 @@ const Type* resolve_type_specs(Parser* p, const TypeSpecAccum* a, SrcLoc loc) {
   if (a->saw_short) {
     return type_prim(p->pool, a->saw_unsigned ? TY_USHORT : TY_SHORT);
   }
+  if (a->saw_int128) {
+    return type_prim(p->pool, a->saw_unsigned ? TY_UINT128 : TY_INT128);
+  }
   if (a->long_count == 2) {
     return type_prim(p->pool, a->saw_unsigned ? TY_ULLONG : TY_LLONG);
   }
@@ -440,6 +443,14 @@ int parse_decl_specs(Parser* p, DeclSpecs* out) {
       acc.saw_float = 1; acc.saw_explicit_type = 1; advance(p); seen = 1;
     } else if (is_kw(p, &t, KW_DOUBLE)) {
       acc.saw_double = 1; acc.saw_explicit_type = 1; advance(p); seen = 1;
+    } else if (t.kind == TOK_IDENT && t.v.ident == p->sym_int128) {
+      acc.saw_int128 = 1; acc.saw_explicit_type = 1; advance(p); seen = 1;
+    } else if (t.kind == TOK_IDENT && t.v.ident == p->sym_int128_t) {
+      acc.saw_int128 = 1; acc.saw_signed = 1; acc.saw_explicit_type = 1;
+      advance(p); seen = 1;
+    } else if (t.kind == TOK_IDENT && t.v.ident == p->sym_uint128_t) {
+      acc.saw_int128 = 1; acc.saw_unsigned = 1; acc.saw_explicit_type = 1;
+      advance(p); seen = 1;
     } else if (is_kw(p, &t, KW_STATIC)) {
       out->storage = DS_STATIC; advance(p); seen = 1;
     } else if (is_kw(p, &t, KW_EXTERN)) {
@@ -598,6 +609,7 @@ static void parse_member_decls(Parser* p, TypeRecordBuilder* b) {
         f.flags = FIELD_BITFIELD;
         if (w == 0) f.flags |= FIELD_ZERO_WIDTH;
         attrs_to_field(specs.attrs, &f);
+        if (specs.align > f.align_override) f.align_override = (u16)specs.align;
         type_record_field(b, f);
         if (!accept_punct(p, ',')) break;
         continue;
@@ -624,6 +636,7 @@ static void parse_member_decls(Parser* p, TypeRecordBuilder* b) {
         parse_attrs_into(p, &trailing);
         attrs_to_field(trailing, &f);
       }
+      if (specs.align > f.align_override) f.align_override = (u16)specs.align;
       type_record_field(b, f);
       if (!accept_punct(p, ',')) break;
     }
@@ -823,6 +836,8 @@ int starts_type_name(const Parser* p, const Tok* t) {
       return 1;
     case KW_NONE: {
       if (t->v.ident == p->sym_b_va_list) return 1;
+      if (t->v.ident == p->sym_int128 || t->v.ident == p->sym_int128_t ||
+          t->v.ident == p->sym_uint128_t) return 1;
       SymEntry* e = scope_lookup((Parser*)p, t->v.ident);
       return e && e->kind == SEK_TYPEDEF;
     }
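To make the R4 and R5 behaviour concrete, here is a minimal C11 sketch of the layouts involved. struct Padded, check_padded_layout and the i128/u128 typedefs are hypothetical names; the sizes in the comments follow from standard C layout rules with GCC/Clang on a 64-bit target and have not been checked against cfree's output.

```c
#include <assert.h>
#include <stddef.h>

/* Only one member carries _Alignas, yet the requirement propagates to
 * the containing aggregate. */
struct Padded {
  char tag;
  _Alignas(16) char payload[4];
};

static size_t padded_align(void) { return _Alignof(struct Padded); }
static size_t payload_offset(void) { return offsetof(struct Padded, payload); }
static size_t padded_size(void) { return sizeof(struct Padded); }

/* payload starts at the first 16-byte boundary after tag, and the
 * struct size is rounded up to a multiple of its alignment. */
static void check_padded_layout(void) {
  assert(payload_offset() % padded_align() == 0);
  assert(padded_size() % padded_align() == 0);
}

#ifdef __SIZEOF_INT128__
/* Typedef-only use of the 128-bit specifiers: no loads, stores or
 * arithmetic, which the commit message says still panic in codegen. */
typedef __int128 i128;
typedef unsigned __int128 u128;
_Static_assert(sizeof(i128) == 16, "__int128 is 16 bytes");
_Static_assert(sizeof(u128) == 16, "unsigned __int128 is 16 bytes");
#endif
```

The 128-bit part is guarded by __SIZEOF_INT128__ because GCC and Clang only provide the type on targets that define that macro.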
diff --git a/src/pp/pp.c b/src/pp/pp.c
@@ -121,10 +121,35 @@ void push_buf(Pp* pp, Tok* toks, HidesetId* hs, u32 n) {
 
 Tok pp_next(Pp* pp) {
   /* Public: filter newlines so consumers like the C parser don't need
-   * to handle them. pp_emit_text uses pp_next_raw via its own loop. */
+   * to handle them. pp_emit_text uses pp_next_raw via its own loop.
+   *
+   * Also drop forwarded `#pragma` lines: do_pragma pushes the directive
+   * back onto the source stack so pp_emit_text can re-emit it verbatim
+   * in cpp mode, but the C parser (cc mode) would see the trailing
+   * tokens as stray identifiers. When we see TOK_PP_HASH followed by
+   * `pragma`, swallow tokens through the next NEWLINE. */
   for (;;) {
     Tok t = pp_next_raw(pp);
-    if (t.kind != TOK_NEWLINE) return t;
+    if (t.kind == TOK_NEWLINE) continue;
+    if (t.kind == TOK_PP_HASH) {
+      Tok t2 = pp_next_raw(pp);
+      if (t2.kind == TOK_IDENT && t2.v.ident == pp->sym_pragma) {
+        for (;;) {
+          Tok tt = pp_next_raw(pp);
+          if (tt.kind == TOK_NEWLINE || tt.kind == TOK_EOF) break;
+        }
+        continue;
+      }
+      /* Not a pragma — push the peeked token back as a 1-element buffer
+       * so the next pp_next_raw returns it, and surface the hash now. */
+      Tok* keep = arena_array(&pp->arena, Tok, 1);
+      HidesetId* hs = arena_array(&pp->arena, HidesetId, 1);
+      keep[0] = t2;
+      hs[0] = HS_EMPTY;
+      push_buf(pp, keep, hs, 1);
+      return t;
+    }
+    return t;
   }
 }
 
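The filtering loop in pp_next can be modelled on a plain token array. The sketch below is a toy under stated assumptions: Kind, Tok, Stream, raw_next, filtered_next and count_visible are hypothetical, the one-slot pushback stands in for push_buf, and pragma detection uses strcmp where the real code compares interned symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy token model; none of these names exist in cfree. */
typedef enum { T_EOF, T_NEWLINE, T_HASH, T_IDENT } Kind;
typedef struct { Kind kind; const char *text; } Tok;

typedef struct {
  const Tok *src;  /* token array terminated by T_EOF */
  size_t pos;
  Tok pushed;      /* one-slot pushback, standing in for push_buf */
  int has_pushed;
} Stream;

static Tok raw_next(Stream *s) {
  if (s->has_pushed) {
    s->has_pushed = 0;
    return s->pushed;
  }
  Tok t = s->src[s->pos];
  if (t.kind != T_EOF) s->pos++;  /* T_EOF is sticky */
  return t;
}

static Tok filtered_next(Stream *s) {
  for (;;) {
    Tok t = raw_next(s);
    if (t.kind == T_NEWLINE) continue;
    if (t.kind != T_HASH) return t;
    Tok t2 = raw_next(s);
    if (t2.kind == T_IDENT && strcmp(t2.text, "pragma") == 0) {
      /* Swallow the rest of the directive line. */
      for (;;) {
        Tok tt = raw_next(s);
        if (tt.kind == T_NEWLINE || tt.kind == T_EOF) break;
      }
      continue;
    }
    /* Not a pragma: return the hash now, replay the peeked token next. */
    s->pushed = t2;
    s->has_pushed = 1;
    return t;
  }
}

/* Counts the tokens a parser would see before T_EOF. */
static size_t count_visible(const Tok *src) {
  assert(src != NULL);
  Stream s = { src, 0, { T_EOF, NULL }, 0 };
  size_t n = 0;
  while (filtered_next(&s).kind != T_EOF) n++;
  return n;
}
```

For a stream holding `#pragma once`, then `int x`, then `#line`, only `int`, `x`, `#` and `line` reach the consumer: the pragma line disappears while other hash directives pass through intact.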
diff --git a/src/type/type.c b/src/type/type.c
@@ -330,6 +330,8 @@ int type_is_int(const Type* t) {
     case TY_ULONG:
     case TY_LLONG:
     case TY_ULLONG:
+    case TY_INT128:
+    case TY_UINT128:
     case TY_ENUM:
       return 1;
     default:
diff --git a/src/type/type.h b/src/type/type.h
@@ -18,6 +18,8 @@ typedef enum TypeKind {
   TY_ULONG,
   TY_LLONG,
   TY_ULLONG,
+  TY_INT128,
+  TY_UINT128,
   TY_FLOAT,
   TY_DOUBLE,
   TY_LDOUBLE,
