cg/parse: C atomics — full aarch64 lowering + CAS/fence parser surface - kit

commit bded6d326e383ec53eabd205d8521456899ed8a5
parent b12f8b4dcdf5b08b279691628be1593b32ae4787
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sun, 10 May 2026 13:12:49 -0700

cg/parse: C atomics — full aarch64 lowering + CAS/fence parser surface

Wire cg_atomic_load/store/rmw/cas/fence through to the existing aarch64
LL/SC backend (the panic-stub bodies were the v1-slice gap). Add parser
support for __atomic_compare_exchange_n, __atomic_thread_fence, and
__atomic_signal_fence; CAS stashes &expected in a frame slot so the
failure branch can write the prior value back.

Drops the .skip sidecars on builtin_06/07 and adds builtin_08..19
covering store_n, exchange_n, fetch_{sub,and,or,xor}, 8-byte (long)
atomics, atomic-pointer load, thread_fence, and CAS (success / failure
/ retry-loop). All 60 atomic-related parse tests pass on D/R/E/J.
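
The CAS lowering described above can be sketched as ordinary C. This is a minimal single-threaded model, not the project's code: `model_cas`, `cas_desugared`, and `model_matches_builtin` are hypothetical names introduced for illustration, and `model_cas` only stands in for `cg_atomic_cas` handing back [prior, ok] (it uses plain loads and stores, so it shows the data flow, not atomicity). It assumes a GCC- or Clang-compatible compiler that provides `__atomic_compare_exchange_n`.

```c
#include <assert.h>

/* Hypothetical stand-in for cg_atomic_cas handing back [prior, ok].
 * Plain loads/stores only: this models the data flow, not atomicity. */
static void model_cas(int *ptr, int expected_val, int desired,
                      int *prior, int *ok) {
  *prior = *ptr;                   /* value observed at *ptr */
  *ok = (*prior == expected_val);  /* 1 on match, 0 on mismatch */
  if (*ok) *ptr = desired;         /* store desired only on match */
}

/* The lowering shape from the commit message, written as ordinary C. */
static int cas_desugared(int *ptr, int *expected, int desired) {
  int *eslot = expected;           /* stash &expected in a "frame slot" */
  int expected_val = *eslot;       /* load *expected before the CAS */
  int prior, ok;
  model_cas(ptr, expected_val, desired, &prior, &ok);
  if (!ok) *eslot = prior;         /* failure branch: write prior back */
  return ok;                       /* result is the ok flag as int */
}

/* Runs the model and the real builtin side by side on the same input and
 * returns 1 when the ok flag, *ptr, and *expected all agree. */
static int model_matches_builtin(int start, int expected, int desired) {
  int x = start, ex = expected;    /* state seen by the model */
  int y = start, ey = expected;    /* state seen by the builtin */
  int ok_model = cas_desugared(&x, &ex, desired);
  int ok_real = __atomic_compare_exchange_n(&y, &ey, desired, 0,
                                            __ATOMIC_SEQ_CST, __ATOMIC_RELAXED);
  assert(ok_model == 0 || ok_model == 1);
  return ok_model == ok_real && x == y && ex == ey;
}
```

Calling `model_matches_builtin(10, 10, 42)` follows the success shape of builtin_17, and `model_matches_builtin(7, 5, 99)` follows the failure shape of builtin_18, where `expected` is overwritten with the prior value.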

Diffstat:
M doc/parser-status.md  | 33 ++++++++++++++++++++++++++++-----
M src/cg/cg.c  | 113 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------
M src/parse/parse.c  | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
M test/parse/CORPUS.md  | 18 +++++++++++++++---
D test/parse/cases/builtin_06_atomic_load.skip  | 1 -
D test/parse/cases/builtin_07_atomic_fetch_add.skip  | 1 -
A test/parse/cases/builtin_08_atomic_store_n.c  | 5 +++++
A test/parse/cases/builtin_08_atomic_store_n.expected  | 1 +
A test/parse/cases/builtin_09_atomic_exchange_n.c  | 6 ++++++
A test/parse/cases/builtin_09_atomic_exchange_n.expected  | 1 +
A test/parse/cases/builtin_10_atomic_fetch_sub.c  | 6 ++++++
A test/parse/cases/builtin_10_atomic_fetch_sub.expected  | 1 +
A test/parse/cases/builtin_11_atomic_fetch_and.c  | 6 ++++++
A test/parse/cases/builtin_11_atomic_fetch_and.expected  | 1 +
A test/parse/cases/builtin_12_atomic_fetch_or.c  | 7 +++++++
A test/parse/cases/builtin_12_atomic_fetch_or.expected  | 1 +
A test/parse/cases/builtin_13_atomic_fetch_xor.c  | 7 +++++++
A test/parse/cases/builtin_13_atomic_fetch_xor.expected  | 1 +
A test/parse/cases/builtin_14_atomic_long.c  | 6 ++++++
A test/parse/cases/builtin_14_atomic_long.expected  | 1 +
A test/parse/cases/builtin_15_atomic_pointer.c  | 11 +++++++++++
A test/parse/cases/builtin_15_atomic_pointer.expected  | 1 +
A test/parse/cases/builtin_16_atomic_thread_fence.c  | 6 ++++++
A test/parse/cases/builtin_16_atomic_thread_fence.expected  | 1 +
A test/parse/cases/builtin_17_atomic_cas_success.c  | 9 +++++++++
A test/parse/cases/builtin_17_atomic_cas_success.expected  | 1 +
A test/parse/cases/builtin_18_atomic_cas_failure.c  | 9 +++++++++
A test/parse/cases/builtin_18_atomic_cas_failure.expected  | 1 +
A test/parse/cases/builtin_19_atomic_cas_loop.c  | 18 ++++++++++++++++++
A test/parse/cases/builtin_19_atomic_cas_loop.expected  | 1 +

30 files changed, 357 insertions(+), 28 deletions(-)
diff --git a/doc/parser-status.md b/doc/parser-status.md
@@ -418,6 +418,17 @@ machinery rather than ordinary call lowering. Contract:
       memory-order argument must be a constant-folded integer; pp
       predefines `__ATOMIC_RELAXED`..`__ATOMIC_SEQ_CST` to match the
       `MemOrder` enum so a header-less corpus row resolves the names.
+- [x] `__atomic_compare_exchange_n` — parser stashes `&expected` in a
+      frame slot, lowers `cg_atomic_cas` (which pushes [prior, ok]),
+      and conditionally stores `prior` back to `*expected` on the
+      failure branch. The `weak` flag is parsed and ignored (strong-CAS
+      is a safe over-approximation; AArch64 LL/SC has no weak form
+      anyway). The result is the `ok` flag as `int`.
+- [x] `__atomic_thread_fence` / `__atomic_signal_fence` — parser
+      routes through `cg_fence`. On AArch64 the backend emits `DMB ISH`
+      for any non-RELAXED order; signal_fence collapses to the same
+      lowering since cfree's threading model is single-threaded
+      observers and a compiler barrier suffices in either case.
 
 Phase 9 also added:
   - `try_parse_builtin_call` invoked from `parse_primary` ahead of
@@ -436,8 +447,17 @@ Phase 9 also added:
     (`abi_aapcs64_va_list_type` is a 32-byte struct; Apple ARM64 is
     `char*`).
 
-Lowering gap: the `cg_atomic_*` primitives still panic at the cg layer
-("not in v1 slice"). The `cg_va_*` primitives now wire through to the
+Atomics are now fully wired end-to-end on AArch64. `cg_atomic_load` /
+`cg_atomic_store` / `cg_atomic_rmw` / `cg_atomic_cas` / `cg_fence` pop
+a pointer (and value/expected/desired as appropriate) from the value
+stack, derive a `MemAccess` from the pointee type with `MF_ATOMIC` set,
+and dispatch to the AArch64 backend's LL/SC implementation
+(`LDXR`/`LDAXR` + `STXR`/`STLXR`, plain `LDUR`/`STUR` for relaxed
+load/store, `LDAR`/`STLR` for acquire/release scalar load/store, and
+`DMB ISH` for fences). The CAS retry loop and writeback-on-failure
+shape live in the parser; the cg layer just hands ([prior, ok]) back.
+
+The `cg_va_*` primitives now wire through to the
 backend's `va_start_` / `va_arg_` / `va_end_` / `va_copy_` hooks (the
 aarch64 backend already implemented all four against the AAPCS64 §6
 five-field `__va_list`); the only follow-up gap on the variadic side
@@ -459,9 +479,12 @@ offset exceeds STUR/LDUR's ±256 window — variadic functions hit this
 trivially because the GP+FP save areas reserve 192 bytes before any
 declared local.
 
-Atomics still carry `.skip` sidecars (`builtin_06`, `builtin_07`),
-treated as failure unless `CFREE_TEST_ALLOW_SKIP=1` — same convention
-as `6_7_2_12_long_double`.
+Atomics no longer carry `.skip` sidecars; `builtin_06_atomic_load` and
+`builtin_07_atomic_fetch_add` round-trip on aarch64 (paths D / R / E /
+J), and the suite is rounded out with `builtin_08`–`builtin_19`
+covering store_n, exchange_n, fetch_{sub,and,or,xor}, 8-byte (`long`)
+atomics, atomic-pointer load, thread_fence, and the three
+compare_exchange shapes (success / failure / retry-loop).
 
 Unlocks (status as landed): `builtin_01_alloca` ★, `builtin_02_expect`
 ★, `builtin_03_va_list` ★, `builtin_04_offsetof` ★, `builtin_05_va_copy`
diff --git a/src/cg/cg.c b/src/cg/cg.c
@@ -1255,28 +1255,107 @@ void cg_setjmp(CG* g) {
 void cg_longjmp(CG* g) {
   compiler_panic(g->c, g->cur_loc, "cg_longjmp: not in v1 slice");
 }
+/* Atomics. The parser pushes the address as a pointer rvalue (typed `T*`)
+ * and any value operands as plain rvalues; cg pops them, materializes
+ * registers, derives a MemAccess from the pointee type, and dispatches to
+ * the backend. MF_ATOMIC is set on the MemAccess so opt sees the access
+ * as atomic regardless of any qualifier on the pointee. */
+static const Type* atomic_pointee(CG* g, const Type* pty, const char* who) {
+  if (!pty || pty->kind != TY_PTR) {
+    compiler_panic(g->c, g->cur_loc, "%s: operand is not a pointer", who);
+  }
+  return pty->ptr.pointee;
+}
+
+static MemAccess mem_for_atomic(CG* g, const Type* val_ty) {
+  MemAccess ma = derive_mem(g, val_ty, ALIAS_UNKNOWN, 0);
+  ma.flags |= MF_ATOMIC;
+  return ma;
+}
+
 void cg_atomic_load(CG* g, MemOrder o) {
-  (void)o;
-  compiler_panic(g->c, g->cur_loc, "cg_atomic_load: not in v1 slice");
+  CGTarget* T = g->target;
+  SValue ptr = pop(g);
+  ensure_reg(g, &ptr);
+  const Type* pty = sv_type(&ptr);
+  const Type* val_ty = atomic_pointee(g, pty, "cg_atomic_load");
+  Operand addr = force_reg(g, &ptr, pty);
+  Reg dst_r = alloc_reg_or_spill(g, type_class(val_ty), val_ty);
+  Operand dst = op_reg(dst_r, val_ty);
+  T->atomic_load(T, dst, addr, mem_for_atomic(g, val_ty), o);
+  release(g, &ptr);
+  push(g, make_sv(dst, val_ty));
 }
+
 void cg_atomic_store(CG* g, MemOrder o) {
-  (void)o;
-  compiler_panic(g->c, g->cur_loc, "cg_atomic_store: not in v1 slice");
+  CGTarget* T = g->target;
+  SValue val = pop(g);
+  SValue ptr = pop(g);
+  ensure_reg(g, &val);
+  ensure_reg(g, &ptr);
+  const Type* pty = sv_type(&ptr);
+  const Type* val_ty = atomic_pointee(g, pty, "cg_atomic_store");
+  Operand addr = force_reg(g, &ptr, pty);
+  Operand src = (val.op.kind == OPK_IMM || val.op.kind == OPK_REG)
+                    ? val.op
+                    : force_reg(g, &val, val_ty);
+  T->atomic_store(T, addr, src, mem_for_atomic(g, val_ty), o);
+  release(g, &val);
+  release(g, &ptr);
 }
+
 void cg_atomic_rmw(CG* g, AtomicOp a, MemOrder o) {
-  (void)a;
-  (void)o;
-  compiler_panic(g->c, g->cur_loc, "cg_atomic_rmw: not in v1 slice");
-}
-void cg_atomic_cas(CG* g, MemOrder s, MemOrder f) {
-  (void)s;
-  (void)f;
-  compiler_panic(g->c, g->cur_loc, "cg_atomic_cas: not in v1 slice");
-}
-void cg_fence(CG* g, MemOrder o) {
-  (void)o;
-  compiler_panic(g->c, g->cur_loc, "cg_fence: not in v1 slice");
-}
+  CGTarget* T = g->target;
+  SValue val = pop(g);
+  SValue ptr = pop(g);
+  ensure_reg(g, &val);
+  ensure_reg(g, &ptr);
+  const Type* pty = sv_type(&ptr);
+  const Type* val_ty = atomic_pointee(g, pty, "cg_atomic_rmw");
+  Operand addr = force_reg(g, &ptr, pty);
+  Operand vop = (val.op.kind == OPK_IMM || val.op.kind == OPK_REG)
+                    ? val.op
+                    : force_reg(g, &val, val_ty);
+  Reg dst_r = alloc_reg_or_spill(g, type_class(val_ty), val_ty);
+  Operand dst = op_reg(dst_r, val_ty);
+  T->atomic_rmw(T, a, dst, addr, vop, mem_for_atomic(g, val_ty), o);
+  release(g, &val);
+  release(g, &ptr);
+  push(g, make_sv(dst, val_ty));
+}
+
+void cg_atomic_cas(CG* g, MemOrder succ, MemOrder fail) {
+  CGTarget* T = g->target;
+  SValue desired = pop(g);
+  SValue expected = pop(g);
+  SValue ptr = pop(g);
+  ensure_reg(g, &desired);
+  ensure_reg(g, &expected);
+  ensure_reg(g, &ptr);
+  const Type* pty = sv_type(&ptr);
+  const Type* val_ty = atomic_pointee(g, pty, "cg_atomic_cas");
+  Operand addr = force_reg(g, &ptr, pty);
+  Operand exp_op = (expected.op.kind == OPK_IMM || expected.op.kind == OPK_REG)
+                       ? expected.op
+                       : force_reg(g, &expected, val_ty);
+  Operand des_op = (desired.op.kind == OPK_IMM || desired.op.kind == OPK_REG)
+                       ? desired.op
+                       : force_reg(g, &desired, val_ty);
+  Reg prior_r = alloc_reg_or_spill(g, type_class(val_ty), val_ty);
+  const Type* i32 = type_prim(g->pool, TY_INT);
+  Reg ok_r = alloc_reg_or_spill(g, RC_INT, i32);
+  Operand prior = op_reg(prior_r, val_ty);
+  Operand ok = op_reg(ok_r, i32);
+  T->atomic_cas(T, prior, ok, addr, exp_op, des_op, mem_for_atomic(g, val_ty),
+                succ, fail);
+  release(g, &desired);
+  release(g, &expected);
+  release(g, &ptr);
+  push(g, make_sv(prior, val_ty));
+  push(g, make_sv(ok, i32));
+}
+
+void cg_fence(CG* g, MemOrder o) { g->target->fence(g->target, o); }
 
 /* ============================================================
  * Control flow — flat labels
diff --git a/src/parse/parse.c b/src/parse/parse.c
@@ -245,6 +245,9 @@ typedef struct Parser {
   Sym sym_a_fetch_and;
   Sym sym_a_fetch_or;
   Sym sym_a_fetch_xor;
+  Sym sym_a_cas_n;
+  Sym sym_a_thread_fence;
+  Sym sym_a_signal_fence;
 
   Scope* scope; /* top of stack; file scope is the root */
 
@@ -1801,7 +1804,8 @@ static int try_parse_builtin_call(Parser* p) {
       name != p->sym_a_store_n && name != p->sym_a_exchange_n &&
       name != p->sym_a_fetch_add && name != p->sym_a_fetch_sub &&
       name != p->sym_a_fetch_and && name != p->sym_a_fetch_or &&
-      name != p->sym_a_fetch_xor) {
+      name != p->sym_a_fetch_xor && name != p->sym_a_cas_n &&
+      name != p->sym_a_thread_fence && name != p->sym_a_signal_fence) {
     return 0;
   }
   advance(p); /* IDENT */
@@ -1920,6 +1924,108 @@ static int try_parse_builtin_call(Parser* p) {
     return 1;
   }
 
+  if (name == p->sym_a_thread_fence || name == p->sym_a_signal_fence) {
+    /* `__atomic_thread_fence(order)` / `__atomic_signal_fence(order)`.
+     * Both consume an order constant. signal_fence is a compiler barrier
+     * only; on real arches we conservatively lower it the same as
+     * thread_fence (the backend's fence emits DMB ISH). */
+    i64 ord = eval_const_int(p, p->cur.loc);
+    expect_punct(p, ')', "')' after atomic fence");
+    cg_set_loc(p->cg, loc);
+    cg_fence(p->cg, (MemOrder)ord);
+    cg_push_int(p->cg, 0, ty_int(p));
+    return 1;
+  }
+
+  if (name == p->sym_a_cas_n) {
+    /* `__atomic_compare_exchange_n(ptr, &expected, desired, weak,
+     *                              success_order, failure_order)`.
+     * On match: stores `desired` at *ptr; returns 1.
+     * On mismatch: stores *ptr (the prior value) at *expected; returns 0.
+     *
+     * Strategy: pin &expected to a local, lower the CAS to [prior, ok]
+     * via cg, save both to locals, conditionally store prior to *expected
+     * on the failure branch, then push ok as the i32 result. Routing
+     * through frame slots keeps the value stack balanced across the
+     * conditional. */
+    parse_assign_expr(p); to_rvalue(p);  /* ptr */
+    expect_punct(p, ',', "',' in __atomic_compare_exchange_n");
+
+    parse_assign_expr(p); to_rvalue(p);  /* &expected */
+    const Type* eptr_ty = cg_top_type(p->cg);
+    if (!eptr_ty || eptr_ty->kind != TY_PTR) {
+      perr(p, "__atomic_compare_exchange_n: arg 2 must be a pointer");
+    }
+    const Type* val_ty = eptr_ty->ptr.pointee;
+
+    /* Stash &expected. */
+    FrameSlotDesc fsd; memset(&fsd, 0, sizeof fsd);
+    fsd.type = eptr_ty; fsd.size = 8; fsd.align = 8; fsd.kind = FS_LOCAL;
+    FrameSlot eslot = cg_local(p->cg, &fsd);
+    cg_push_local_typed(p->cg, eslot, eptr_ty);
+    cg_swap(p->cg);                            /* [ptr, eslot_lv, &expected] */
+    cg_store(p->cg); cg_drop(p->cg);           /* [ptr] */
+
+    /* Load expected_val = *expected. */
+    cg_push_local_typed(p->cg, eslot, eptr_ty);
+    cg_load(p->cg);
+    cg_deref(p->cg, val_ty);
+    cg_load(p->cg);                            /* [ptr, expected_val] */
+
+    expect_punct(p, ',', "',' in __atomic_compare_exchange_n");
+    parse_assign_expr(p); to_rvalue(p);        /* desired */
+    expect_punct(p, ',', "',' in __atomic_compare_exchange_n");
+
+    /* Stack: [ptr, expected_val, desired]. */
+    (void)eval_const_int(p, p->cur.loc);       /* weak (ignored — strong CAS) */
+    expect_punct(p, ',', "',' in __atomic_compare_exchange_n");
+    i64 succ = eval_const_int(p, p->cur.loc);
+    expect_punct(p, ',', "',' in __atomic_compare_exchange_n");
+    i64 fail = eval_const_int(p, p->cur.loc);
+    expect_punct(p, ')', "')' after __atomic_compare_exchange_n");
+
+    cg_set_loc(p->cg, loc);
+    cg_atomic_cas(p->cg, (MemOrder)succ, (MemOrder)fail);
+    /* Stack: [prior, ok]. */
+
+    /* Stash ok. */
+    const Type* ok_ty = cg_top_type(p->cg);
+    FrameSlotDesc okd; memset(&okd, 0, sizeof okd);
+    okd.type = ok_ty; okd.size = 4; okd.align = 4; okd.kind = FS_LOCAL;
+    FrameSlot okslot = cg_local(p->cg, &okd);
+    cg_push_local_typed(p->cg, okslot, ok_ty);
+    cg_swap(p->cg); cg_store(p->cg); cg_drop(p->cg);  /* [prior] */
+
+    /* Stash prior. */
+    FrameSlotDesc pd; memset(&pd, 0, sizeof pd);
+    pd.type = val_ty;
+    pd.size = abi_sizeof(p->abi, val_ty);
+    pd.align = abi_alignof(p->abi, val_ty);
+    pd.kind = FS_LOCAL;
+    FrameSlot pslot = cg_local(p->cg, &pd);
+    cg_push_local_typed(p->cg, pslot, val_ty);
+    cg_swap(p->cg); cg_store(p->cg); cg_drop(p->cg);  /* [] */
+
+    /* if (!ok) *expected = prior; */
+    cg_push_local_typed(p->cg, okslot, ok_ty);
+    cg_load(p->cg);
+    CGLabel L_done = cg_label_new(p->cg);
+    cg_branch_true(p->cg, L_done);
+    /* writeback */
+    cg_push_local_typed(p->cg, eslot, eptr_ty);
+    cg_load(p->cg);
+    cg_deref(p->cg, val_ty);
+    cg_push_local_typed(p->cg, pslot, val_ty);
+    cg_load(p->cg);
+    cg_store(p->cg); cg_drop(p->cg);
+    cg_label_place(p->cg, L_done);
+
+    /* Push ok as the i32 result. */
+    cg_push_local_typed(p->cg, okslot, ok_ty);
+    cg_load(p->cg);
+    return 1;
+  }
+
   /* The rmw family — exchange / fetch_{add,sub,and,or,xor} share the same
    * (ptr, val, order) shape; map name → AtomicOp. */
   AtomicOp op;
@@ -5003,6 +5109,9 @@ void parse_c(Compiler* c, Pp* pp, DeclTable* decls, CG* cg, Debug* debug) {
   p.sym_a_fetch_and  = pool_intern_cstr(p.pool, "__atomic_fetch_and");
   p.sym_a_fetch_or   = pool_intern_cstr(p.pool, "__atomic_fetch_or");
   p.sym_a_fetch_xor  = pool_intern_cstr(p.pool, "__atomic_fetch_xor");
+  p.sym_a_cas_n         = pool_intern_cstr(p.pool, "__atomic_compare_exchange_n");
+  p.sym_a_thread_fence  = pool_intern_cstr(p.pool, "__atomic_thread_fence");
+  p.sym_a_signal_fence  = pool_intern_cstr(p.pool, "__atomic_signal_fence");
 
   /* File scope. */
   p.scope = scope_new(&p, NULL);
diff --git a/test/parse/CORPUS.md b/test/parse/CORPUS.md
@@ -333,9 +333,21 @@ ordinary calls.
 | `builtin_03_va_list`          | ★ | uses `__builtin_va_start`/`__builtin_va_arg`/`__builtin_va_end` summing three ints | 42 |
 | `builtin_04_offsetof`         | ★ | `struct S {int a, b;}; return (int)__builtin_offsetof(struct S, b) * 10 + 2;` | 42 |
 | `builtin_05_va_copy`          | ★ | walks varargs twice via `__builtin_va_copy` | 42 |
-| `builtin_06_atomic_load`      | · | `int x = 42; return __atomic_load_n(&x, __ATOMIC_RELAXED);` (`.skip` pending cg `cg_atomic_load`) | 42 |
-| `builtin_07_atomic_fetch_add` | · | `int x = 40; __atomic_fetch_add(&x, 2, __ATOMIC_RELAXED); return x;` (`.skip` pending cg `cg_atomic_rmw`) | 42 |
-| `builtin_08_syscall0`         | (deferred) | `__cfree_syscall0` requires linking against the syscall stub; covered in `test/libc` | — |
+| `builtin_06_atomic_load`      | ★ | `__atomic_load_n(&x, __ATOMIC_RELAXED)` — RELAXED-ordered load via LDUR | 42 |
+| `builtin_07_atomic_fetch_add` | ★ | `__atomic_fetch_add(&x, 2, __ATOMIC_RELAXED)` — LL/SC RMW loop | 42 |
+| `builtin_08_atomic_store_n`   | ★ | `__atomic_store_n(&x, 42, __ATOMIC_RELEASE)` — RELEASE store via STLR | 42 |
+| `builtin_09_atomic_exchange_n`| ★ | `__atomic_exchange_n(&x, 99, __ATOMIC_SEQ_CST)` — RMW XCHG, returns prior | 42 |
+| `builtin_10_atomic_fetch_sub` | ★ | `__atomic_fetch_sub` with ACQ_REL — LDAXR/STLXR loop | 42 |
+| `builtin_11_atomic_fetch_and` | ★ | `__atomic_fetch_and` with RELAXED — RMW AND combine | 42 |
+| `builtin_12_atomic_fetch_or`  | ★ | `__atomic_fetch_or` with ACQUIRE — LDAXR/STXR loop | 42 |
+| `builtin_13_atomic_fetch_xor` | ★ | `__atomic_fetch_xor` with SEQ_CST — RMW EOR combine | 42 |
+| `builtin_14_atomic_long`      | ★ | 8-byte atomic on `long` — `sf=1` path through ldxr/stxr | 42 |
+| `builtin_15_atomic_pointer`   | ★ | atomic load of a pointer-typed variable; result deref'd through array | 42 |
+| `builtin_16_atomic_thread_fence` | ★ | `__atomic_thread_fence(__ATOMIC_SEQ_CST)` — emits DMB ISH | 42 |
+| `builtin_17_atomic_cas_success` | ★ | `__atomic_compare_exchange_n` matching path: stores desired, returns 1 | 42 |
+| `builtin_18_atomic_cas_failure` | ★ | CAS mismatch: writes prior to *expected, returns 0 | 42 |
+| `builtin_19_atomic_cas_loop`  | ★ | lock-free increment via CAS retry loop (ACQ_REL/ACQUIRE pair) | 42 |
+| `builtin_99_syscall0`         | (deferred) | `__cfree_syscall0` requires linking against the syscall stub; covered in `test/libc` | — |
 
 ## Variadic coverage
 
diff --git a/test/parse/cases/builtin_06_atomic_load.skip b/test/parse/cases/builtin_06_atomic_load.skip
@@ -1 +0,0 @@
-cg_atomic_load is a stub; parser routing for __atomic_load_n landed in Phase 9, lowering needs cg backend wiring
diff --git a/test/parse/cases/builtin_07_atomic_fetch_add.skip b/test/parse/cases/builtin_07_atomic_fetch_add.skip
@@ -1 +0,0 @@
-cg_atomic_rmw is a stub; parser routing for __atomic_fetch_add landed in Phase 9, lowering needs cg backend wiring
diff --git a/test/parse/cases/builtin_08_atomic_store_n.c b/test/parse/cases/builtin_08_atomic_store_n.c
@@ -0,0 +1,5 @@
+int test_main(void) {
+  int x = 0;
+  __atomic_store_n(&x, 42, __ATOMIC_RELEASE);
+  return x;
+}
diff --git a/test/parse/cases/builtin_08_atomic_store_n.expected b/test/parse/cases/builtin_08_atomic_store_n.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/builtin_09_atomic_exchange_n.c b/test/parse/cases/builtin_09_atomic_exchange_n.c
@@ -0,0 +1,6 @@
+int test_main(void) {
+  int x = 30;
+  int prior = __atomic_exchange_n(&x, 99, __ATOMIC_SEQ_CST);
+  /* prior was 30, x is now 99; result is prior + (x - 99) + 12 = 42 */
+  return prior + (x - 99) + 12;
+}
diff --git a/test/parse/cases/builtin_09_atomic_exchange_n.expected b/test/parse/cases/builtin_09_atomic_exchange_n.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/builtin_10_atomic_fetch_sub.c b/test/parse/cases/builtin_10_atomic_fetch_sub.c
@@ -0,0 +1,6 @@
+int test_main(void) {
+  int x = 50;
+  int prior = __atomic_fetch_sub(&x, 8, __ATOMIC_ACQ_REL);
+  /* prior=50, x=42 -> result 42 */
+  return x + (prior - 50);
+}
diff --git a/test/parse/cases/builtin_10_atomic_fetch_sub.expected b/test/parse/cases/builtin_10_atomic_fetch_sub.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/builtin_11_atomic_fetch_and.c b/test/parse/cases/builtin_11_atomic_fetch_and.c
@@ -0,0 +1,6 @@
+int test_main(void) {
+  int x = 0xFE;       /* 1111 1110 */
+  int prior = __atomic_fetch_and(&x, 0x6F, __ATOMIC_RELAXED);
+  /* prior=0xFE, x=0x6E. 0x6E=110, return 110 - 68 = 42 */
+  return x + (prior - 0xFE) - 68;
+}
diff --git a/test/parse/cases/builtin_11_atomic_fetch_and.expected b/test/parse/cases/builtin_11_atomic_fetch_and.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/builtin_12_atomic_fetch_or.c b/test/parse/cases/builtin_12_atomic_fetch_or.c
@@ -0,0 +1,7 @@
+int test_main(void) {
+  int x = 0x20;       /* 0010 0000 */
+  int prior = __atomic_fetch_or(&x, 0x0A, __ATOMIC_ACQUIRE);
+  /* prior=0x20=32, x=0x2A=42 */
+  (void)prior;
+  return x;
+}
diff --git a/test/parse/cases/builtin_12_atomic_fetch_or.expected b/test/parse/cases/builtin_12_atomic_fetch_or.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/builtin_13_atomic_fetch_xor.c b/test/parse/cases/builtin_13_atomic_fetch_xor.c
@@ -0,0 +1,7 @@
+int test_main(void) {
+  int x = 0x55;       /* 0101 0101 = 85 */
+  int prior = __atomic_fetch_xor(&x, 0x7F, __ATOMIC_SEQ_CST);
+  /* x = 0x55 ^ 0x7F = 0x2A = 42 */
+  (void)prior;
+  return x;
+}
diff --git a/test/parse/cases/builtin_13_atomic_fetch_xor.expected b/test/parse/cases/builtin_13_atomic_fetch_xor.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/builtin_14_atomic_long.c b/test/parse/cases/builtin_14_atomic_long.c
@@ -0,0 +1,6 @@
+int test_main(void) {
+  long x = 1000000000L;
+  long y = __atomic_fetch_add(&x, 5L, __ATOMIC_SEQ_CST);
+  /* y was 1e9; x is 1e9+5. Subtract to fit i32 result. */
+  return (int)(x - y) + 37;
+}
diff --git a/test/parse/cases/builtin_14_atomic_long.expected b/test/parse/cases/builtin_14_atomic_long.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/builtin_15_atomic_pointer.c b/test/parse/cases/builtin_15_atomic_pointer.c
@@ -0,0 +1,11 @@
+int test_main(void) {
+  int arr[3];
+  arr[0] = 10;
+  arr[1] = 20;
+  arr[2] = 12;
+
+  int *p = &arr[0];
+  /* Atomically load p, then read through the loaded pointer. */
+  int *q = __atomic_load_n(&p, __ATOMIC_ACQUIRE);
+  return q[0] + q[1] + q[2];
+}
diff --git a/test/parse/cases/builtin_15_atomic_pointer.expected b/test/parse/cases/builtin_15_atomic_pointer.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/builtin_16_atomic_thread_fence.c b/test/parse/cases/builtin_16_atomic_thread_fence.c
@@ -0,0 +1,6 @@
+int test_main(void) {
+  int x = 0;
+  __atomic_store_n(&x, 21, __ATOMIC_RELAXED);
+  __atomic_thread_fence(__ATOMIC_SEQ_CST);
+  return __atomic_load_n(&x, __ATOMIC_RELAXED) * 2;
+}
diff --git a/test/parse/cases/builtin_16_atomic_thread_fence.expected b/test/parse/cases/builtin_16_atomic_thread_fence.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/builtin_17_atomic_cas_success.c b/test/parse/cases/builtin_17_atomic_cas_success.c
@@ -0,0 +1,9 @@
+int test_main(void) {
+  int x = 10;
+  int expected = 10;
+  /* CAS: x==expected -> store 42, return 1. x becomes 42. */
+  int ok = __atomic_compare_exchange_n(&x, &expected, 42, 0,
+                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
+  /* ok==1, x==42, expected==10 (unchanged on success). */
+  return x + ok - 1;
+}
diff --git a/test/parse/cases/builtin_17_atomic_cas_success.expected b/test/parse/cases/builtin_17_atomic_cas_success.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/builtin_18_atomic_cas_failure.c b/test/parse/cases/builtin_18_atomic_cas_failure.c
@@ -0,0 +1,9 @@
+int test_main(void) {
+  int x = 7;
+  int expected = 5;       /* mismatch: x != expected */
+  int ok = __atomic_compare_exchange_n(&x, &expected, 99, 0,
+                                       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED);
+  /* ok==0, x stays at 7, expected is updated to 7 (the prior value). */
+  /* expected*6=42, ok==0, x==7. Return: expected*6 + ok + (x-7) = 42 */
+  return expected * 6 + ok + (x - 7);
+}
diff --git a/test/parse/cases/builtin_18_atomic_cas_failure.expected b/test/parse/cases/builtin_18_atomic_cas_failure.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/builtin_19_atomic_cas_loop.c b/test/parse/cases/builtin_19_atomic_cas_loop.c
@@ -0,0 +1,18 @@
+/* Lock-free increment via CAS retry loop. Single-threaded test, so the
+ * first CAS attempt always succeeds, but the loop shape exercises the
+ * compare-exchange surface end-to-end. */
+int atomic_inc(int *p) {
+  int cur = __atomic_load_n(p, __ATOMIC_ACQUIRE);
+  for (;;) {
+    int next = cur + 1;
+    if (__atomic_compare_exchange_n(p, &cur, next, 0,
+                                    __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE)) {
+      return next;
+    }
+  }
+}
+
+int test_main(void) {
+  int x = 41;
+  return atomic_inc(&x);
+}
diff --git a/test/parse/cases/builtin_19_atomic_cas_loop.expected b/test/parse/cases/builtin_19_atomic_cas_loop.expected
@@ -0,0 +1 @@
+42

	git clone https://git.ryansepassi.com/git/kit.git