lisp.M1 step 10a-10c: FFI harness, registration, 27 primitives - boot2

commit 7a1408b717a8edf57455a7733b9113603afe3bbf
parent b9b5b0447f9729f86deac54ff643d8097c82636b
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Tue, 21 Apr 2026 08:41:09 -0700

lisp.M1 step 10a-10c: FFI harness, registration, 27 primitives

- 10a: make_primitive (type 5 fixed / 6 variadic), apply forks on
  header type byte, marshal_argv populates prim_argv BSS buffer
  (32 slots), EPILOGUE_N4 + B to prim_dispatch cascade so primitive
  bodies RET straight to apply's caller.
- 10b: static prim_table (40-byte records: name-ptr, len, code, type,
  arity; zero-ptr sentinel) walked by _start to intern names and bind
  tagged primitive objects in global_env.
- 10c: 27 arithmetic/comparison/bitwise/predicate primitives with a
  minimal #t/#f reader and printer slice — predicates land useable.
  tests/lisp/10..14 exercise each primitive on all three arches.

Diffstat:
M docs/LISP.md  | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------
M src/lisp.M1  | 1489 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
M src/p1_gen.py  | 5 +++--
A tests/lisp/10-arith.expected  | 1 +
A tests/lisp/10-arith.scm  | 17 +++++++++++++++++
A tests/lisp/11-compare.expected  | 1 +
A tests/lisp/11-compare.scm  | 16 ++++++++++++++++
A tests/lisp/12-numpred.expected  | 1 +
A tests/lisp/12-numpred.scm  | 20 ++++++++++++++++++++
A tests/lisp/13-bitwise.expected  | 1 +
A tests/lisp/13-bitwise.scm  | 19 +++++++++++++++++++
A tests/lisp/14-tagpred.expected  | 1 +
A tests/lisp/14-tagpred.scm  | 18 ++++++++++++++++++

13 files changed, 1547 insertions(+), 143 deletions(-)
diff --git a/docs/LISP.md b/docs/LISP.md
@@ -4,9 +4,10 @@
 
 A small Scheme-flavored Lisp interpreter written once in the P1 pseudo-ISA
 (see [P1.md](P1.md)) and assembled to three host arches (amd64, aarch64,
-riscv64) via the existing `M1` + `hex2` chain. Hosts the C1 compiler (see
-[SEED.md](SEED.md)) and the C compiler (see [PLAN.md](PLAN.md)) as Lisp
-programs. Target size: 4,000–6,000 lines of P1.
+riscv64) via the existing `M1` + `hex2` chain. Hosts the C compiler (see
+[PLAN.md](PLAN.md)) as a Lisp program; that compiler is used for both the
+seed userland ([SEED.md](SEED.md)) and tcc-boot. Target size: 4,000–6,000
+lines of P1.
 
 This document covers only the interpreter. The Lisp-language surface it
 exposes is specified by PLAN.md's "Feature floor" — this doc does not
@@ -15,13 +16,12 @@ restate it.
 ## Position in the chain
 
 ```
-M1 asm  →  P1 pseudo-ISA  →  Lisp interpreter (this doc)  →  C1 compiler
-                                                         →  C compiler  →  tcc-boot
+M1 asm  →  P1 pseudo-ISA  →  Lisp interpreter (this doc)  →  C compiler  →  seed + tcc-boot
 ```
 
 The interpreter binary is the last artifact built from P1 source in the
-bootstrap. Everything above it is either Lisp source text or a C1/C binary
-produced by a Lisp program.
+bootstrap. Everything above it is either Lisp source text or a C binary
+produced by the Lisp-hosted C compiler.
 
 ## Settled decisions
 
@@ -108,9 +108,18 @@ before their payload:
 - **Symbol**: header + name bytes + a 1-word intern-chain link. Two
   symbols are `eq?` iff their pointers match; the interner guarantees
   one object per name.
-- **Closure**: header + `params` list pointer + `body` list pointer +
-  `env` pointer. Arity in header.
-- **Primitive**: header + P1 function pointer. Arity in header.
+- **Closure**: header (type 4) + `params` list pointer + `body` list
+  pointer + `env` pointer. Arity in header.
+- **Primitive (fixed arity)**: header (type 5) + primitive code id.
+  Arity in header low 48 bits.
+- **Primitive (variadic)**: header (type 6) + primitive code id.
+  Arity field unused; body does its own min-arg check.
+
+The "code id" is an integer 1..N assigned at generation time. `apply`
+resolves it by cascade — a chain of `BEQ code, N, &prim_N_body` — so
+the seed doesn't need an indirect-call op in P1. A future refactor can
+swap the cascade for an indirect call once a `CALLR` op lands; the
+primitive object layout doesn't change.
 
 ## Heap layout
 
@@ -348,9 +357,17 @@ Each primitive is an ordinary P1 function:
   back-to-back).
 - **Out:** `r0` = result tagged word.
 
-A static dispatch table maps `(symbol-pointer → primitive-pointer)`. At
-startup the interpreter populates `global_env` with one binding per
-primitive, each wrapped as a primitive-type heap object.
+At startup, the interpreter walks a static `(name, len, code, arity,
+type)` table and, for each row, interns the name, allocates a
+primitive heap object (type 5 fixed or type 6 variadic, carrying the
+code id), and binds it in `global_env`. Primitives are first-class
+values.
+
+`apply` dispatches by cascade on the code id: a straight-line chain of
+`BEQ r_code, N, &prim_N_body` — one entry per primitive. The cost is
+O(n) per call; acceptable for seed workloads. Can be swapped for an
+indirect call later (add `CALLR` to p1_gen.py, store `&prim_N_body`
+directly in the object at +8) without touching any primitive body.
 
 Primitives allocate through the same `alloc`/GC path as user code and
 must honor the per-frame-slot root discipline when they make more than
@@ -359,7 +376,7 @@ one allocation.
 Target set: ~40 primitives per PLAN.md. Roughly:
 
 - **List/pair**: `cons car cdr pair? null? list? list length append
-  reverse map filter fold assoc member`
+  reverse assoc member`
 - **Arithmetic**: `+ - * / % = < > <= >= zero? negative? positive?
   abs min max`
 - **Bitwise**: `bit-and bit-or bit-xor bit-not arithmetic-shift`
@@ -372,6 +389,11 @@ Target set: ~40 primitives per PLAN.md. Roughly:
 - **I/O**: `read-file write-file display write newline format error`
 - **Control**: `apply`
 
+`map`, `filter`, and `fold` are higher-order and would force
+primitives to re-enter `eval`. They live in a short Scheme prelude
+instead (evaluated by `_start` after the primitive registration loop);
+not P1 code.
+
 ## `pmatch`
 
 Built-in special form. Pattern language:
@@ -404,8 +426,8 @@ expansion so a `pmatch` site does not recompile per execution.
 2. Write `"error: "` + the message + formatted args to fd 2.
 3. `exit(1)`.
 
-No unwinding, no continuations, no handlers. The C1/C compilers either
-succeed or exit with diagnostics.
+No unwinding, no continuations, no handlers. The C compiler either
+succeeds or exits with diagnostics.
 
 ## Startup
 
@@ -492,11 +514,44 @@ coverage before the next begins.
    test-lisp` (single arch) and `make test-lisp-all` (tri-arch diff);
    pass = exit 0 and expected stdout. Locks in regression coverage
    before the feature surface grows.
-10. **Primitives (~40).** Mechanical once the FFI harness exists.
-    ~500 LOC.
-11. **Reader extensions.** `'`/`` ` ``/`,`/`,@`, strings, `#t`/`#f`,
-    `#\char`, `#( … )`, improper `.` tail, hex and negative fixnums.
-    Source-location side table explicitly deferred to step 16.
+10. **Primitives (~40).** Broken into sub-steps; ~1200 P1 LOC total
+    (~200 harness + ~1000 primitives). Earlier ~500 estimate was
+    optimistic.
+    - **10a. FFI harness.** `make_primitive` constructor (type 5
+      fixed, type 6 variadic). Fork in `apply` on header type:
+      closure → existing path; prim-fixed → count, arity-check,
+      marshal argv, cascade-dispatch on code id; prim-variadic →
+      marshal + dispatch, no arity check. `prim_argv[]` BSS buffer
+      (64 slots × 8 B).
+    - **10b. Registration.** Static `(name, len, code, arity, type)`
+      table; `_start` walks it, interning each name and binding a
+      fresh primitive object in `global_env`.
+    - **10c. Arithmetic + bitwise + tag predicates.** 27 primitives:
+      `+ - * / % = < > <= >= zero? negative? positive? abs min max
+      bit-and bit-or bit-xor bit-not arithmetic-shift number?
+      symbol? string? vector? procedure? eq?`. Predicates return
+      `#t`/`#f` — so a minimal slice of step 11 (reader + printer
+      support for the `#t`/`#f` singletons) lands here too.
+    - **10d. List-core primitives.** `cons car cdr pair? null? list
+      length list? append reverse assoc member`.
+    - **10e. String primitives.** `string? string-length string-ref
+      substring string-append string->symbol symbol->string`.
+    - **10f. Vector primitives.** `make-vector vector-ref
+      vector-set! vector-length vector->list list->vector`. Adds
+      `make_vector` runtime helper (absent before 10f).
+    - **10g. I/O + `equal?` + `apply`.** `display write newline
+      format error read-file write-file` wrap existing labels;
+      `format` dispatches `~a ~s ~d ~%`. `equal?` recurses on
+      pairs/strings/vectors; non-allocating. `apply` primitive
+      re-enters the internal `apply` label.
+    - **10h. Lisp prelude.** ~20 lines of Scheme defining `map`,
+      `filter`, `fold` in terms of the primitives above. Embedded
+      in BSS as a string; `_start` parses and evaluates it right
+      after 10b finishes.
+11. **Reader extensions.** `'`/`` ` ``/`,`/`,@`, strings, `#\char`,
+    `#( … )`, improper `.` tail, hex and negative fixnums. (`#t`/`#f`
+    land early in 10c.) Source-location side table explicitly
+    deferred to step 16.
 12. **Eval extensions.** `set!`, `let`/`let*`/`letrec`, `cond`,
     `quasiquote`; inner `define` → `letrec`-shape rewrite.
 13. **Mark-sweep GC.** Mark bitmap, root discipline, sweep, size-classed
@@ -510,5 +565,5 @@ coverage before the next begins.
 17. **End-to-end.** Run a 200-LOC Scheme test program exercising all
     features; diff tri-arch output.
 
-Rough rollup: ~4,600 P1 LOC (includes source-location side table),
-mid-range in PLAN.md's 4–6k budget.
+Rough rollup: ~5,300 P1 LOC (step 10 bumped from ~500 to ~1200 based
+on 10a–10c drafting), still mid-range in PLAN.md's 4–6k budget.
diff --git a/src/lisp.M1 b/src/lisp.M1
@@ -189,6 +189,44 @@
     &sym_define
     P1_ST_R0_R2_0
 
+    ## Register primitives (LISP.md step 10b). Walk prim_table with
+    ## r4 = cursor, r5 = saved sym across make_primitive. Entry is
+    ## 40 bytes; zero name pointer ends the loop.
+    P1_LI_R4
+    &prim_table
+:_start_reg_prim_loop
+    P1_LD_R1_R4_0                       ## r1 = name ptr
+    P1_LI_R0
+    '00000000'
+    P1_LI_BR
+    &_start_reg_prim_done
+    P1_BEQ_R1_R0
+
+    P1_LD_R2_R4_8                       ## r2 = length
+    P1_LI_BR
+    &intern
+    P1_CALL                             ## r0 = tagged sym
+    P1_MOV_R5_R0
+
+    P1_LD_R1_R4_16                      ## r1 = code id
+    P1_LD_R2_R4_32                      ## r2 = arity
+    P1_LD_R3_R4_24                      ## r3 = type
+    P1_LI_BR
+    &make_primitive
+    P1_CALL                             ## r0 = tagged prim
+
+    P1_MOV_R2_R0
+    P1_MOV_R1_R5
+    P1_LI_BR
+    &gset
+    P1_CALL
+
+    P1_ADDI_R4_R4_40
+    P1_LI_BR
+    &_start_reg_prim_loop
+    P1_B
+:_start_reg_prim_done
+
     ## r4 = last_result. Initial unspec (0x27) so an empty source exits 0.
     P1_LI_R4
     '27000000'
@@ -1279,6 +1317,13 @@
     &read_expr_list
     P1_BEQ_R0_R1
 
+    ## '#' → hash-prefix literal (#t, #f for now).
+    P1_LI_R1
+    '23000000'
+    P1_LI_BR
+    &read_expr_hash
+    P1_BEQ_R0_R1
+
     ## digit → number.
     P1_MOV_R1_R0
     P1_LI_BR
@@ -1314,6 +1359,49 @@
     P1_EPILOGUE
     P1_RET
 
+## Hash-prefix literals: `#t` → 0x0F, `#f` → 0x17. Inherits read_expr's
+## PROLOGUE (no new frame). Extended syntax (#\char, #(…)) is step 11.
+:read_expr_hash
+    P1_LI_BR
+    &advance_char
+    P1_CALL                             ## eat '#'
+    P1_LI_BR
+    &peek_char
+    P1_CALL                             ## r0 = next char
+
+    P1_LI_R1
+    '74000000'                          ## 't'
+    P1_LI_BR
+    &read_expr_hash_t
+    P1_BEQ_R0_R1
+    P1_LI_R1
+    '66000000'                          ## 'f'
+    P1_LI_BR
+    &read_expr_hash_f
+    P1_BEQ_R0_R1
+
+    P1_LI_BR
+    &err_reader_bad
+    P1_B
+
+:read_expr_hash_t
+    P1_LI_BR
+    &advance_char
+    P1_CALL                             ## eat 't'
+    P1_LI_R0
+    '0F000000'                          ## #t singleton
+    P1_EPILOGUE
+    P1_RET
+
+:read_expr_hash_f
+    P1_LI_BR
+    &advance_char
+    P1_CALL                             ## eat 'f'
+    P1_LI_R0
+    '17000000'                          ## #f singleton
+    P1_EPILOGUE
+    P1_RET
+
 
 ## ---- Display --------------------------------------------------------
 
@@ -1440,6 +1528,20 @@
     &display_nil
     P1_BEQ_R1_R2
 
+    ## #t = 0x0F
+    P1_LI_R2
+    '0F000000'
+    P1_LI_BR
+    &display_true
+    P1_BEQ_R1_R2
+
+    ## #f = 0x17
+    P1_LI_R2
+    '17000000'
+    P1_LI_BR
+    &display_false
+    P1_BEQ_R1_R2
+
     ## Low-3-bit tag.
     P1_MOV_R2_R1
     P1_ANDI_R1_R1_7                     ## r1 = tag
@@ -1487,6 +1589,36 @@
     P1_EPILOGUE_N2
     P1_RET
 
+## #t and #f share display's PROLOGUE_N2 frame (reached via BEQ from
+## display or write — display_fixnum/symbol pattern).
+:display_true
+    P1_LI_R1
+    '23000000'                          ## '#'
+    P1_LI_BR
+    &putc
+    P1_CALL
+    P1_LI_R1
+    '74000000'                          ## 't'
+    P1_LI_BR
+    &putc
+    P1_CALL
+    P1_EPILOGUE_N2
+    P1_RET
+
+:display_false
+    P1_LI_R1
+    '23000000'                          ## '#'
+    P1_LI_BR
+    &putc
+    P1_CALL
+    P1_LI_R1
+    '66000000'                          ## 'f'
+    P1_LI_BR
+    &putc
+    P1_CALL
+    P1_EPILOGUE_N2
+    P1_RET
+
 :display_fixnum
     P1_MOV_R1_R2
     P1_SARI_R1_R1_3                     ## signed shift; value (assume non-negative here)
@@ -1645,6 +1777,18 @@
     &display_nil
     P1_BEQ_R1_R2                        ## nil
 
+    P1_LI_R2
+    '0F000000'
+    P1_LI_BR
+    &display_true
+    P1_BEQ_R1_R2                        ## #t
+
+    P1_LI_R2
+    '17000000'
+    P1_LI_BR
+    &display_false
+    P1_BEQ_R1_R2                        ## #f
+
     P1_MOV_R2_R1
     P1_ANDI_R1_R1_7                     ## r1 = tag
     P1_LI_R3
@@ -2129,6 +2273,40 @@
     P1_RET
 
 
+## ---- make_primitive(r1=code_id, r2=arity, r3=type) -> r0 = tagged --
+## 16-byte heap object: [header | code_id]. header byte 7 = type
+## (5=fixed, 6=variadic); byte 0 = arity. Shares tag 0b110 with
+## closures so apply's tag check still works; the type byte forks
+## there. Arity is ignored for variadic (type 6) — callers pass 0.
+:make_primitive
+    P1_PROLOGUE_N3
+    P1_MOV_R0_R3                        ## save type → r0 (frees r3)
+    P1_MOV_R3_SP
+    P1_ST_R1_R3_8                       ## slot 1 = code_id
+    P1_ST_R2_R3_16                      ## slot 2 = arity
+    P1_ST_R0_R3_24                      ## slot 3 = type
+
+    P1_LI_R1
+    '10000000'                          ## 16 bytes
+    P1_LI_BR
+    &alloc
+    P1_CALL                             ## r0 = raw ptr
+
+    P1_MOV_R3_SP
+    P1_LD_R1_R3_24
+    P1_SB_R1_R0_7                       ## byte 7 = type
+    P1_LD_R1_R3_16
+    P1_SB_R1_R0_0                       ## byte 0 = arity
+    P1_LD_R1_R3_8
+    P1_ST_R1_R0_8                       ## +8 = code id
+
+    P1_ORI_R0_R0_4
+    P1_ORI_R0_R0_2                      ## tag = 0b110 = 6
+
+    P1_EPILOGUE_N3
+    P1_RET
+
+
 ## ---- eval_args(r1=args, r2=env) -> r0 = evaluated args list --------
 ## Recursive map: eval each, cons the results left-to-right.
 :eval_args
@@ -2184,10 +2362,11 @@
 
 
 ## ---- apply(r1=callee, r2=args) -> r0 = result ----------------------
-## Only closures for now. Frame has 4 slots: callee-then-params,
-## args, body, closure-env. env_extend consumes the three params/
-## args/closure-env triple, then eval runs body under the extended
-## env.
+## Callee must have tag 0b110 (closure/prim band). Header byte 7
+## discriminates: 4=closure (existing path), 5=prim-fixed,
+## 6=prim-variadic. Frame has 4 slots: callee-then-params, args,
+## body, closure-env (prim paths repurpose slots 3/4 for code_id /
+## expected_arity).
 :apply
     P1_PROLOGUE_N4
     P1_MOV_R3_SP
@@ -2205,6 +2384,20 @@
     P1_LD_R1_R3_8
     P1_ADDI_R1_R1_NEG6                  ## r1 = raw ptr
 
+    ## Fork on header type byte.
+    P1_LB_R0_R1_7                       ## r0 = type
+    P1_LI_R2
+    '05000000'
+    P1_LI_BR
+    &apply_prim_fixed
+    P1_BEQ_R0_R2
+    P1_LI_R2
+    '06000000'
+    P1_LI_BR
+    &apply_prim_variadic
+    P1_BEQ_R0_R2
+    ## Fall through: type == 4 (closure).
+
     P1_LD_R0_R1_8                       ## params
     P1_ST_R0_R3_8                       ## slot 1 = params (overwrite)
     P1_LD_R0_R1_16                      ## body
@@ -2228,149 +2421,910 @@
     P1_TAIL_N4                          ## Scheme tail call: closure body
 
 
-## ---- eval(r1=expr, r2=env) -> r0 = value ---------------------------
-## Self-evaluating: nil/fixnum/string/closure. Symbols → lookup. Pairs
-## → eval_pair (special-form or application dispatch).
-:eval
-    P1_PROLOGUE_N3
-    P1_MOV_R3_SP
-    P1_ST_R1_R3_8                       ## slot 1 = expr
-    P1_ST_R2_R3_16                      ## slot 2 = env
-
-    P1_LI_R2
-    '07000000'
-    P1_LI_BR
-    &eval_self_slot1
-    P1_BEQ_R1_R2
-
-    P1_ANDI_R1_R1_7                     ## r1 = tag
-
-    P1_LI_R2
-    '01000000'
-    P1_LI_BR
-    &eval_self_slot1
-    P1_BEQ_R1_R2                        ## fixnum
+## ---- apply_prim_fixed (r1 = raw prim ptr, inside apply's frame) ----
+## Loads arity + code_id, marshals args, arity-checks, then unwinds
+## apply's frame and B's to prim_dispatch. The cascade there tail-
+## enters the body with r1=argc, r2=argv — body RETs directly to
+## apply's caller since EPILOGUE_N4 already restored lr.
+:apply_prim_fixed
+    ## r3 is still sp from the MOV above; slot 3/4 are free.
+    P1_LB_R0_R1_0                       ## r0 = expected arity
+    P1_ST_R0_R3_32                      ## slot 4 = expected arity
+    P1_LD_R0_R1_8                       ## r0 = code id
+    P1_ST_R0_R3_24                      ## slot 3 = code id
 
-    P1_LI_R2
-    '05000000'
+    P1_LD_R1_R3_16                      ## r1 = args
     P1_LI_BR
-    &eval_sym
-    P1_BEQ_R1_R2
+    &marshal_argv
+    P1_CALL                             ## r0 = argc, r2 = &prim_argv
 
-    P1_LI_R2
-    '02000000'
+    P1_MOV_R3_SP
+    P1_LD_R1_R3_32
     P1_LI_BR
-    &eval_pair
-    P1_BEQ_R1_R2
+    &err_arity
+    P1_BNE_R0_R1
 
-    ## Other tags (string, closure) self-evaluate.
+    P1_MOV_R1_R0
+    P1_LD_R3_R3_24                      ## r3 = code id
+    P1_EPILOGUE_N4
     P1_LI_BR
-    &eval_self_slot1
+    &prim_dispatch
     P1_B
 
-:eval_self_slot1
-    P1_MOV_R3_SP
-    P1_LD_R0_R3_8
-    P1_EPILOGUE_N3
-    P1_RET
 
-:eval_sym
-    P1_MOV_R3_SP
-    P1_LD_R1_R3_8
-    P1_LD_R2_R3_16
+## ---- apply_prim_variadic (r1 = raw prim ptr) ----------------------
+:apply_prim_variadic
+    P1_LD_R0_R1_8                       ## r0 = code id
+    P1_ST_R0_R3_24                      ## slot 3 = code id
+
+    P1_LD_R1_R3_16                      ## r1 = args
     P1_LI_BR
-    &lookup
-    P1_TAIL_N3
+    &marshal_argv
+    P1_CALL                             ## r0 = argc, r2 = &prim_argv
 
-:eval_pair
-    ## Compound expression. Dispatch on car against cached sym_*
-    ## pointers; otherwise treat as function application.
+    P1_MOV_R1_R0
     P1_MOV_R3_SP
-    P1_LD_R1_R3_8
+    P1_LD_R3_R3_24                      ## r3 = code id
+    P1_EPILOGUE_N4
     P1_LI_BR
-    &car
-    P1_CALL                             ## r0 = callee-expr
+    &prim_dispatch
+    P1_B
 
-    P1_LI_R1
-    &sym_quote
-    P1_LD_R1_R1_0
-    P1_LI_BR
-    &eval_quote
-    P1_BEQ_R0_R1
 
-    P1_LI_R1
-    &sym_if
-    P1_LD_R1_R1_0
-    P1_LI_BR
-    &eval_if
-    P1_BEQ_R0_R1
+## ---- marshal_argv(r1 = args list) -> r0 = argc, r2 = &prim_argv ---
+## Walks the tagged-pair list, storing each car into prim_argv[i*8].
+## Overflow beyond PRIM_ARGV_SLOTS is an error (err_too_many_args).
+:marshal_argv
+    P1_PROLOGUE
+    P1_LI_R2
+    &prim_argv                          ## r2 = cursor
+    P1_LI_R3
+    '00000000'                          ## r3 = count
 
-    P1_LI_R1
-    &sym_begin
-    P1_LD_R1_R1_0
+:marshal_argv_loop
+    P1_LI_R0
+    '07000000'
     P1_LI_BR
-    &eval_begin
-    P1_BEQ_R0_R1
+    &marshal_argv_done
+    P1_BEQ_R1_R0                        ## list == nil → done
 
-    P1_LI_R1
-    &sym_lambda
-    P1_LD_R1_R1_0
+    P1_LI_R0
+    '20000000'                          ## PRIM_ARGV_SLOTS = 32
     P1_LI_BR
-    &eval_lambda
-    P1_BEQ_R0_R1
+    &err_too_many_args
+    P1_BEQ_R3_R0
 
-    P1_LI_R1
-    &sym_define
-    P1_LD_R1_R1_0
+    P1_ADDI_R0_R1_NEG2                  ## r0 = raw pair ptr
+    P1_LD_R1_R0_0                       ## r1 = car (we'll load cdr next)
+    P1_ST_R1_R2_0                       ## *cursor = car
+    P1_LD_R1_R0_8                       ## r1 = cdr → next list
+    P1_ADDI_R2_R2_8                     ## cursor += 8
+    P1_ADDI_R3_R3_1                     ## count++
     P1_LI_BR
-    &eval_define
-    P1_BEQ_R0_R1
+    &marshal_argv_loop
+    P1_B
 
-    ## Application: callee = eval(callee-expr, env)
-    P1_MOV_R1_R0
-    P1_MOV_R3_SP
-    P1_LD_R2_R3_16
-    P1_LI_BR
-    &eval
-    P1_CALL                             ## r0 = callee value
+:marshal_argv_done
+    P1_MOV_R0_R3                        ## r0 = count
+    P1_LI_R2
+    &prim_argv                          ## r2 = buffer base
+    P1_EPILOGUE
+    P1_RET
 
-    P1_MOV_R3_SP
-    P1_ST_R0_R3_24                      ## slot 3 = callee
 
-    ## args = eval_args(cdr(expr), env)
-    P1_LD_R1_R3_8
+## ---- prim_dispatch(r1=argc, r2=argv, r3=code_id) -------------------
+## Cascade dispatch — r3 is matched against each code id, and on
+## match we B to the primitive body. Bodies are leaf: r1/r2 are
+## already set up, and RET returns to apply's caller (since apply
+## already ran EPILOGUE_N4 before branching here).
+:prim_dispatch
+    P1_LI_R0
+    '00000000'
     P1_LI_BR
-    &cdr
-    P1_CALL
-    P1_MOV_R1_R0
-    P1_MOV_R3_SP
-    P1_LD_R2_R3_16
+    &prim_add
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '01000000'
     P1_LI_BR
-    &eval_args
-    P1_CALL                             ## r0 = args list
-
-    P1_MOV_R2_R0
-    P1_MOV_R3_SP
-    P1_LD_R1_R3_24
+    &prim_sub
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '02000000'
     P1_LI_BR
-    &apply
-    P1_TAIL_N3                          ## Scheme tail call: application
-
-
-## ---- eval_quote / eval_if / eval_begin -----------------------------
-## All run inside eval's PROLOGUE_N3 frame: slot 1 = expr, slot 2 =
-## env, slot 3 = per-form scratch.
-:eval_quote
-    P1_MOV_R3_SP
-    P1_LD_R1_R3_8
+    &prim_mul
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '03000000'
     P1_LI_BR
-    &cdr
-    P1_CALL                             ## r0 = (x)
-    P1_MOV_R1_R0
+    &prim_div
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '04000000'
     P1_LI_BR
-    &car                                ## tail: r0 = x
-    P1_TAIL_N3
-
+    &prim_mod
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '05000000'
+    P1_LI_BR
+    &prim_numeq
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '06000000'
+    P1_LI_BR
+    &prim_lt
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '07000000'
+    P1_LI_BR
+    &prim_gt
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '08000000'
+    P1_LI_BR
+    &prim_le
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '09000000'
+    P1_LI_BR
+    &prim_ge
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '0A000000'
+    P1_LI_BR
+    &prim_zerop
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '0B000000'
+    P1_LI_BR
+    &prim_negativep
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '0C000000'
+    P1_LI_BR
+    &prim_positivep
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '0D000000'
+    P1_LI_BR
+    &prim_abs
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '0E000000'
+    P1_LI_BR
+    &prim_min
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '0F000000'
+    P1_LI_BR
+    &prim_max
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '10000000'
+    P1_LI_BR
+    &prim_bitand
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '11000000'
+    P1_LI_BR
+    &prim_bitor
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '12000000'
+    P1_LI_BR
+    &prim_bitxor
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '13000000'
+    P1_LI_BR
+    &prim_bitnot
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '14000000'
+    P1_LI_BR
+    &prim_ashift
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '15000000'
+    P1_LI_BR
+    &prim_numberp
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '16000000'
+    P1_LI_BR
+    &prim_symbolp
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '17000000'
+    P1_LI_BR
+    &prim_stringp
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '18000000'
+    P1_LI_BR
+    &prim_vectorp
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '19000000'
+    P1_LI_BR
+    &prim_procp
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '1A000000'
+    P1_LI_BR
+    &prim_eqp
+    P1_BEQ_R3_R0
+
+    P1_LI_BR
+    &err_bad_prim
+    P1_B
+
+
+## ---- Primitive bodies ----------------------------------------------
+## Convention: entry with r1=argc (raw int), r2=argv (ptr into the
+## prim_argv buffer of tagged values). Leaf code: may use r0-r3
+## freely; must not touch r4-r7 (still hold apply's caller's values
+## since EPILOGUE_N4 ran before this branch). End with RET — that
+## returns to apply's caller, because lr was restored by EPILOGUE_N4.
+
+## (+ ...) — variadic sum. Tagged fixnum (v<<3)|1: untag each, add
+## to decoded accumulator, retag at end.
+:prim_add
+    P1_LI_R3
+    '00000000'                          ## acc = 0
+    P1_SHLI_R1_R1_3
+    P1_ADD_R1_R1_R2                     ## r1 = argv_end
+:prim_add_loop
+    P1_LI_BR
+    &prim_add_done
+    P1_BEQ_R2_R1
+    P1_LD_R0_R2_0
+    P1_SARI_R0_R0_3
+    P1_ADD_R3_R3_R0
+    P1_ADDI_R2_R2_8
+    P1_LI_BR
+    &prim_add_loop
+    P1_B
+:prim_add_done
+    P1_SHLI_R0_R3_3
+    P1_ORI_R0_R0_1
+    P1_RET
+
+
+## (- x) negates; (- x y ...) folds subtraction from the left.
+:prim_sub
+    P1_LI_R0
+    '01000000'
+    P1_LI_BR
+    &prim_sub_negate
+    P1_BEQ_R1_R0                        ## argc == 1 → unary negate
+
+    P1_LD_R3_R2_0
+    P1_SARI_R3_R3_3                     ## r3 = decoded first
+    P1_ADDI_R2_R2_8
+    P1_ADDI_R1_R1_NEG1
+    P1_SHLI_R1_R1_3
+    P1_ADD_R1_R1_R2                     ## r1 = argv_end
+:prim_sub_loop
+    P1_LI_BR
+    &prim_sub_done
+    P1_BEQ_R2_R1
+    P1_LD_R0_R2_0
+    P1_SARI_R0_R0_3
+    P1_SUB_R3_R3_R0
+    P1_ADDI_R2_R2_8
+    P1_LI_BR
+    &prim_sub_loop
+    P1_B
+:prim_sub_done
+    P1_SHLI_R0_R3_3
+    P1_ORI_R0_R0_1
+    P1_RET
+:prim_sub_negate
+    P1_LD_R3_R2_0
+    P1_SARI_R3_R3_3
+    P1_LI_R0
+    '00000000'
+    P1_SUB_R3_R0_R3
+    P1_SHLI_R0_R3_3
+    P1_ORI_R0_R0_1
+    P1_RET
+
+
+## (* ...) — variadic product. Identity 1.
+:prim_mul
+    P1_LI_R3
+    '01000000'                          ## acc = 1
+    P1_SHLI_R1_R1_3
+    P1_ADD_R1_R1_R2
+:prim_mul_loop
+    P1_LI_BR
+    &prim_mul_done
+    P1_BEQ_R2_R1
+    P1_LD_R0_R2_0
+    P1_SARI_R0_R0_3
+    P1_MUL_R3_R3_R0
+    P1_ADDI_R2_R2_8
+    P1_LI_BR
+    &prim_mul_loop
+    P1_B
+:prim_mul_done
+    P1_SHLI_R0_R3_3
+    P1_ORI_R0_R0_1
+    P1_RET
+
+
+## (/ x y) — binary integer division (signed).
+:prim_div
+    P1_LD_R3_R2_0
+    P1_SARI_R3_R3_3
+    P1_LD_R0_R2_8
+    P1_SARI_R0_R0_3
+    P1_DIV_R3_R3_R0
+    P1_SHLI_R0_R3_3
+    P1_ORI_R0_R0_1
+    P1_RET
+
+
+## (% x y) — binary signed remainder. Scheme calls this `modulo`
+## / `remainder`; we bind the primitive as `%` to stay fixnum-only
+## and avoid the sign-convention split.
+:prim_mod
+    P1_LD_R3_R2_0
+    P1_SARI_R3_R3_3
+    P1_LD_R0_R2_8
+    P1_SARI_R0_R0_3
+    P1_REM_R3_R3_R0
+    P1_SHLI_R0_R3_3
+    P1_ORI_R0_R0_1
+    P1_RET
+
+
+## (= x y) — binary fixnum equality. Because tagged fixnums share
+## the same low-3-bit tag (001) and the payload is bit-identical
+## for equal values, we compare the tagged words directly.
+:prim_numeq
+    P1_LD_R3_R2_0
+    P1_LD_R0_R2_8
+    P1_LI_BR
+    &prim_true
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '17000000'                          ## #f
+    P1_RET
+
+## Shared "return #t" tail used by comparison/predicate primitives.
+:prim_true
+    P1_LI_R0
+    '0F000000'
+    P1_RET
+
+
+## (< x y). Tagged-fixnum order preserves decoded order (shift-by-3
+## is monotone for 61-bit signed values).
+:prim_lt
+    P1_LD_R3_R2_0
+    P1_LD_R0_R2_8
+    P1_LI_BR
+    &prim_true
+    P1_BLT_R3_R0
+    P1_LI_R0
+    '17000000'
+    P1_RET
+
+
+## (> x y) ≡ (< y x).
+:prim_gt
+    P1_LD_R3_R2_0
+    P1_LD_R0_R2_8
+    P1_LI_BR
+    &prim_true
+    P1_BLT_R0_R3
+    P1_LI_R0
+    '17000000'
+    P1_RET
+
+
+## (<= x y) ≡ !(y < x). Invert the BLT result.
+:prim_le
+    P1_LD_R3_R2_0
+    P1_LD_R0_R2_8
+    P1_LI_BR
+    &prim_le_false
+    P1_BLT_R0_R3
+    P1_LI_R0
+    '0F000000'                          ## #t
+    P1_RET
+:prim_le_false
+    P1_LI_R0
+    '17000000'
+    P1_RET
+
+
+## (>= x y) ≡ !(x < y).
+:prim_ge
+    P1_LD_R3_R2_0
+    P1_LD_R0_R2_8
+    P1_LI_BR
+    &prim_ge_false
+    P1_BLT_R3_R0
+    P1_LI_R0
+    '0F000000'
+    P1_RET
+:prim_ge_false
+    P1_LI_R0
+    '17000000'
+    P1_RET
+
+
+## (zero? x). Tagged fixnum 0 = 0x01.
+:prim_zerop
+    P1_LD_R3_R2_0
+    P1_LI_R0
+    '01000000'
+    P1_LI_BR
+    &prim_true
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '17000000'
+    P1_RET
+
+
+## (negative? x) — tagged-compare against 0x01.
+:prim_negativep
+    P1_LD_R3_R2_0
+    P1_LI_R0
+    '01000000'
+    P1_LI_BR
+    &prim_true
+    P1_BLT_R3_R0
+    P1_LI_R0
+    '17000000'
+    P1_RET
+
+
+## (positive? x).
+:prim_positivep
+    P1_LD_R3_R2_0
+    P1_LI_R0
+    '01000000'
+    P1_LI_BR
+    &prim_true
+    P1_BLT_R0_R3
+    P1_LI_R0
+    '17000000'
+    P1_RET
+
+
+## (abs x) — branch-on-sign; negate by 0 - v.
+:prim_abs
+    P1_LD_R3_R2_0
+    P1_SARI_R3_R3_3                     ## decode
+    P1_LI_R0
+    '00000000'
+    P1_LI_BR
+    &prim_abs_done
+    P1_BLT_R0_R3                        ## 0 < v → already positive
+    P1_SUB_R3_R0_R3                     ## v = -v
+:prim_abs_done
+    P1_SHLI_R0_R3_3
+    P1_ORI_R0_R0_1
+    P1_RET
+
+
+## (min ...) — variadic; first arg is seed, each later arg replaces
+## acc if strictly smaller. Comparing tagged fixnums directly keeps
+## the body untagging-free.
+:prim_min
+    P1_LD_R3_R2_0                       ## r3 = acc (tagged)
+    P1_ADDI_R2_R2_8
+    P1_ADDI_R1_R1_NEG1
+    P1_SHLI_R1_R1_3
+    P1_ADD_R1_R1_R2
+:prim_min_loop
+    P1_LI_BR
+    &prim_min_done
+    P1_BEQ_R2_R1
+    P1_LD_R0_R2_0
+    P1_LI_BR
+    &prim_min_skip
+    P1_BLT_R3_R0                        ## acc < x → keep acc
+    P1_MOV_R3_R0
+:prim_min_skip
+    P1_ADDI_R2_R2_8
+    P1_LI_BR
+    &prim_min_loop
+    P1_B
+:prim_min_done
+    P1_MOV_R0_R3
+    P1_RET
+
+
+## (max ...) — acc replaces on strictly greater.
+:prim_max
+    P1_LD_R3_R2_0
+    P1_ADDI_R2_R2_8
+    P1_ADDI_R1_R1_NEG1
+    P1_SHLI_R1_R1_3
+    P1_ADD_R1_R1_R2
+:prim_max_loop
+    P1_LI_BR
+    &prim_max_done
+    P1_BEQ_R2_R1
+    P1_LD_R0_R2_0
+    P1_LI_BR
+    &prim_max_skip
+    P1_BLT_R0_R3                        ## x < acc → keep acc
+    P1_MOV_R3_R0
+:prim_max_skip
+    P1_ADDI_R2_R2_8
+    P1_LI_BR
+    &prim_max_loop
+    P1_B
+:prim_max_done
+    P1_MOV_R0_R3
+    P1_RET
+
+
+## (bit-and ...) — variadic fold. Identity is all-ones; we seed the
+## decoded accumulator with -1 which ANDs as identity. Tag bits are
+## preserved through AND on tagged fixnums, but we still untag/retag
+## to match the shape of the other variadic bit ops.
+:prim_bitand
+    P1_LI_R3
+    '00000000'
+    P1_ADDI_R3_R3_NEG1                  ## r3 = -1
+    P1_SHLI_R1_R1_3
+    P1_ADD_R1_R1_R2
+:prim_bitand_loop
+    P1_LI_BR
+    &prim_bitand_done
+    P1_BEQ_R2_R1
+    P1_LD_R0_R2_0
+    P1_SARI_R0_R0_3
+    P1_AND_R3_R3_R0
+    P1_ADDI_R2_R2_8
+    P1_LI_BR
+    &prim_bitand_loop
+    P1_B
+:prim_bitand_done
+    P1_SHLI_R0_R3_3
+    P1_ORI_R0_R0_1
+    P1_RET
+
+
+## (bit-or ...) — identity 0.
+:prim_bitor
+    P1_LI_R3
+    '00000000'
+    P1_SHLI_R1_R1_3
+    P1_ADD_R1_R1_R2
+:prim_bitor_loop
+    P1_LI_BR
+    &prim_bitor_done
+    P1_BEQ_R2_R1
+    P1_LD_R0_R2_0
+    P1_SARI_R0_R0_3
+    P1_OR_R3_R3_R0
+    P1_ADDI_R2_R2_8
+    P1_LI_BR
+    &prim_bitor_loop
+    P1_B
+:prim_bitor_done
+    P1_SHLI_R0_R3_3
+    P1_ORI_R0_R0_1
+    P1_RET
+
+
+## (bit-xor ...) — identity 0.
+:prim_bitxor
+    P1_LI_R3
+    '00000000'
+    P1_SHLI_R1_R1_3
+    P1_ADD_R1_R1_R2
+:prim_bitxor_loop
+    P1_LI_BR
+    &prim_bitxor_done
+    P1_BEQ_R2_R1
+    P1_LD_R0_R2_0
+    P1_SARI_R0_R0_3
+    P1_XOR_R3_R3_R0
+    P1_ADDI_R2_R2_8
+    P1_LI_BR
+    &prim_bitxor_loop
+    P1_B
+:prim_bitxor_done
+    P1_SHLI_R0_R3_3
+    P1_ORI_R0_R0_1
+    P1_RET
+
+
+## (bit-not x) — ~x = -1 - x.
+:prim_bitnot
+    P1_LD_R3_R2_0
+    P1_SARI_R3_R3_3
+    P1_LI_R0
+    '00000000'
+    P1_ADDI_R0_R0_NEG1                  ## r0 = -1
+    P1_SUB_R3_R0_R3                     ## r3 = -1 - r3
+    P1_SHLI_R0_R3_3
+    P1_ORI_R0_R0_1
+    P1_RET
+
+
+## (arithmetic-shift n k) — k>=0 left shift, k<0 arithmetic right.
+:prim_ashift
+    P1_LD_R3_R2_0
+    P1_SARI_R3_R3_3                     ## r3 = n (decoded)
+    P1_LD_R0_R2_8
+    P1_SARI_R0_R0_3                     ## r0 = k (decoded, signed)
+    P1_LI_R1
+    '00000000'
+    P1_LI_BR
+    &prim_ashift_neg
+    P1_BLT_R0_R1                        ## k < 0 → right shift by -k
+    P1_SHL_R3_R3_R0
+    P1_LI_BR
+    &prim_ashift_done
+    P1_B
+:prim_ashift_neg
+    P1_SUB_R0_R1_R0                     ## r0 = -k
+    P1_SAR_R3_R3_R0
+:prim_ashift_done
+    P1_SHLI_R0_R3_3
+    P1_ORI_R0_R0_1
+    P1_RET
+
+
+## (number? x) — true iff tag is 1 (fixnum).
+:prim_numberp
+    P1_LD_R3_R2_0
+    P1_ANDI_R3_R3_7
+    P1_LI_R0
+    '01000000'
+    P1_LI_BR
+    &prim_true
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '17000000'
+    P1_RET
+
+
+## (symbol? x) — tag is 5.
+:prim_symbolp
+    P1_LD_R3_R2_0
+    P1_ANDI_R3_R3_7
+    P1_LI_R0
+    '05000000'
+    P1_LI_BR
+    &prim_true
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '17000000'
+    P1_RET
+
+
+## (string? x) — tag is 4.
+:prim_stringp
+    P1_LD_R3_R2_0
+    P1_ANDI_R3_R3_7
+    P1_LI_R0
+    '04000000'
+    P1_LI_BR
+    &prim_true
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '17000000'
+    P1_RET
+
+
+## (vector? x) — tag is 3.
+:prim_vectorp
+    P1_LD_R3_R2_0
+    P1_ANDI_R3_R3_7
+    P1_LI_R0
+    '03000000'
+    P1_LI_BR
+    &prim_true
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '17000000'
+    P1_RET
+
+
+## (procedure? x) — tag 6 AND header type ∈ {4,5,6}. Any of the three
+## heads (closure, prim-fixed, prim-variadic) count; the only other
+## tag-6 value shape (if we add one later) would need explicit
+## enumeration here.
+:prim_procp
+    P1_LD_R3_R2_0
+    P1_MOV_R0_R3
+    P1_ANDI_R0_R0_7
+    P1_LI_R1
+    '06000000'
+    P1_LI_BR
+    &prim_procp_false
+    P1_BNE_R0_R1                        ## tag != 6 → #f
+
+    P1_ADDI_R3_R3_NEG6                  ## strip tag
+    P1_LB_R0_R3_7                       ## r0 = type byte
+    P1_LI_R1
+    '04000000'
+    P1_LI_BR
+    &prim_true
+    P1_BEQ_R0_R1
+    P1_LI_R1
+    '05000000'
+    P1_LI_BR
+    &prim_true
+    P1_BEQ_R0_R1
+    P1_LI_R1
+    '06000000'
+    P1_LI_BR
+    &prim_true
+    P1_BEQ_R0_R1
+:prim_procp_false
+    P1_LI_R0
+    '17000000'
+    P1_RET
+
+
+## (eq? x y) — pointer/bit identity. Fixnums, interned symbols, and
+## singletons all compare correctly this way; pairs/strings/vectors/
+## closures compare by heap address (distinct allocations are #f).
+:prim_eqp
+    P1_LD_R3_R2_0
+    P1_LD_R0_R2_8
+    P1_LI_BR
+    &prim_true
+    P1_BEQ_R3_R0
+    P1_LI_R0
+    '17000000'
+    P1_RET
+
+
+## ---- eval(r1=expr, r2=env) -> r0 = value ---------------------------
+## Self-evaluating: nil/fixnum/string/closure. Symbols → lookup. Pairs
+## → eval_pair (special-form or application dispatch).
+:eval
+    P1_PROLOGUE_N3
+    P1_MOV_R3_SP
+    P1_ST_R1_R3_8                       ## slot 1 = expr
+    P1_ST_R2_R3_16                      ## slot 2 = env
+
+    P1_LI_R2
+    '07000000'
+    P1_LI_BR
+    &eval_self_slot1
+    P1_BEQ_R1_R2
+
+    P1_ANDI_R1_R1_7                     ## r1 = tag
+
+    P1_LI_R2
+    '01000000'
+    P1_LI_BR
+    &eval_self_slot1
+    P1_BEQ_R1_R2                        ## fixnum
+
+    P1_LI_R2
+    '05000000'
+    P1_LI_BR
+    &eval_sym
+    P1_BEQ_R1_R2
+
+    P1_LI_R2
+    '02000000'
+    P1_LI_BR
+    &eval_pair
+    P1_BEQ_R1_R2
+
+    ## Other tags (string, closure) self-evaluate.
+    P1_LI_BR
+    &eval_self_slot1
+    P1_B
+
+:eval_self_slot1
+    P1_MOV_R3_SP
+    P1_LD_R0_R3_8
+    P1_EPILOGUE_N3
+    P1_RET
+
+:eval_sym
+    P1_MOV_R3_SP
+    P1_LD_R1_R3_8
+    P1_LD_R2_R3_16
+    P1_LI_BR
+    &lookup
+    P1_TAIL_N3
+
+:eval_pair
+    ## Compound expression. Dispatch on car against cached sym_*
+    ## pointers; otherwise treat as function application.
+    P1_MOV_R3_SP
+    P1_LD_R1_R3_8
+    P1_LI_BR
+    &car
+    P1_CALL                             ## r0 = callee-expr
+
+    P1_LI_R1
+    &sym_quote
+    P1_LD_R1_R1_0
+    P1_LI_BR
+    &eval_quote
+    P1_BEQ_R0_R1
+
+    P1_LI_R1
+    &sym_if
+    P1_LD_R1_R1_0
+    P1_LI_BR
+    &eval_if
+    P1_BEQ_R0_R1
+
+    P1_LI_R1
+    &sym_begin
+    P1_LD_R1_R1_0
+    P1_LI_BR
+    &eval_begin
+    P1_BEQ_R0_R1
+
+    P1_LI_R1
+    &sym_lambda
+    P1_LD_R1_R1_0
+    P1_LI_BR
+    &eval_lambda
+    P1_BEQ_R0_R1
+
+    P1_LI_R1
+    &sym_define
+    P1_LD_R1_R1_0
+    P1_LI_BR
+    &eval_define
+    P1_BEQ_R0_R1
+
+    ## Application: callee = eval(callee-expr, env)
+    P1_MOV_R1_R0
+    P1_MOV_R3_SP
+    P1_LD_R2_R3_16
+    P1_LI_BR
+    &eval
+    P1_CALL                             ## r0 = callee value
+
+    P1_MOV_R3_SP
+    P1_ST_R0_R3_24                      ## slot 3 = callee
+
+    ## args = eval_args(cdr(expr), env)
+    P1_LD_R1_R3_8
+    P1_LI_BR
+    &cdr
+    P1_CALL
+    P1_MOV_R1_R0
+    P1_MOV_R3_SP
+    P1_LD_R2_R3_16
+    P1_LI_BR
+    &eval_args
+    P1_CALL                             ## r0 = args list
+
+    P1_MOV_R2_R0
+    P1_MOV_R3_SP
+    P1_LD_R1_R3_24
+    P1_LI_BR
+    &apply
+    P1_TAIL_N3                          ## Scheme tail call: application
+
+
+## ---- eval_quote / eval_if / eval_begin -----------------------------
+## All run inside eval's PROLOGUE_N3 frame: slot 1 = expr, slot 2 =
+## env, slot 3 = per-form scratch.
+:eval_quote
+    P1_MOV_R3_SP
+    P1_LD_R1_R3_8
+    P1_LI_BR
+    &cdr
+    P1_CALL                             ## r0 = (x)
+    P1_MOV_R1_R0
+    P1_LI_BR
+    &car                                ## tail: r0 = x
+    P1_TAIL_N3
+
 :eval_if
     ## (if cond then else). Save (then else) tail into slot 3, eval
     ## cond, branch to the correct arm.
@@ -2623,6 +3577,24 @@
     &error
     P1_B
 
+:err_too_many_args
+    P1_LI_R1
+    &msg_too_many_args
+    P1_LI_R2
+    '15000000'                          ## strlen("primitive argc > 32") == 21
+    P1_LI_BR
+    &error
+    P1_B
+
+:err_bad_prim
+    P1_LI_R1
+    &msg_bad_prim
+    P1_LI_R2
+    '14000000'                          ## strlen("unknown primitive id") == 20
+    P1_LI_BR
+    &error
+    P1_B
+
 
 ## ---- Static strings -------------------------------------------------
 :msg_error_prefix
@@ -2654,6 +3626,10 @@
 "failed to open source file"
 :msg_src_too_big
 "source file too large"
+:msg_too_many_args
+"primitive argc > 32"
+:msg_bad_prim
+"unknown primitive id"
 
 ## Interned name samples used by the self-test.
 :str_foo
@@ -2675,6 +3651,267 @@
 "define"
 
 
+## Primitive name strings (step 10b). The registration table below
+## holds (ptr, len, code_id, type, arity) for each. _start walks the
+## table, interns each name, and binds a freshly-built primitive into
+## global_env.
+:str_prim_plus
+"+"
+:str_prim_minus
+"-"
+:str_prim_mul
+"*"
+:str_prim_div
+"/"
+:str_prim_mod
+"%"
+:str_prim_numeq
+"="
+:str_prim_lt
+"<"
+:str_prim_gt
+">"
+:str_prim_le
+"<="
+:str_prim_ge
+">="
+:str_prim_zerop
+"zero?"
+:str_prim_negativep
+"negative?"
+:str_prim_positivep
+"positive?"
+:str_prim_abs
+"abs"
+:str_prim_min
+"min"
+:str_prim_max
+"max"
+:str_prim_bitand
+"bit-and"
+:str_prim_bitor
+"bit-or"
+:str_prim_bitxor
+"bit-xor"
+:str_prim_bitnot
+"bit-not"
+:str_prim_ashift
+"arithmetic-shift"
+:str_prim_numberp
+"number?"
+:str_prim_symbolp
+"symbol?"
+:str_prim_stringp
+"string?"
+:str_prim_vectorp
+"vector?"
+:str_prim_procp
+"procedure?"
+:str_prim_eqp
+"eq?"
+
+
+## Registration table. 40-byte records: ptr(8) + len(8) + code_id(8) +
+## type(8) + arity(8). End-sentinel = zero name pointer. _start iterates
+## with ADDI +40.
+:prim_table
+## (+ ...) variadic
+&str_prim_plus
+'00000000'
+'0100000000000000'
+'0000000000000000'
+'0600000000000000'
+'0000000000000000'
+## (- x ...) variadic (fixes unary-negate branch in body)
+&str_prim_minus
+'00000000'
+'0100000000000000'
+'0100000000000000'
+'0600000000000000'
+'0000000000000000'
+## (* ...) variadic
+&str_prim_mul
+'00000000'
+'0100000000000000'
+'0200000000000000'
+'0600000000000000'
+'0000000000000000'
+## (/ x y) fixed 2
+&str_prim_div
+'00000000'
+'0100000000000000'
+'0300000000000000'
+'0500000000000000'
+'0200000000000000'
+## (% x y) fixed 2
+&str_prim_mod
+'00000000'
+'0100000000000000'
+'0400000000000000'
+'0500000000000000'
+'0200000000000000'
+## (= x y) fixed 2
+&str_prim_numeq
+'00000000'
+'0100000000000000'
+'0500000000000000'
+'0500000000000000'
+'0200000000000000'
+## (< x y) fixed 2
+&str_prim_lt
+'00000000'
+'0100000000000000'
+'0600000000000000'
+'0500000000000000'
+'0200000000000000'
+## (> x y) fixed 2
+&str_prim_gt
+'00000000'
+'0100000000000000'
+'0700000000000000'
+'0500000000000000'
+'0200000000000000'
+## (<= x y) fixed 2
+&str_prim_le
+'00000000'
+'0200000000000000'
+'0800000000000000'
+'0500000000000000'
+'0200000000000000'
+## (>= x y) fixed 2
+&str_prim_ge
+'00000000'
+'0200000000000000'
+'0900000000000000'
+'0500000000000000'
+'0200000000000000'
+## (zero? x) fixed 1
+&str_prim_zerop
+'00000000'
+'0500000000000000'
+'0A00000000000000'
+'0500000000000000'
+'0100000000000000'
+## (negative? x) fixed 1
+&str_prim_negativep
+'00000000'
+'0900000000000000'
+'0B00000000000000'
+'0500000000000000'
+'0100000000000000'
+## (positive? x) fixed 1
+&str_prim_positivep
+'00000000'
+'0900000000000000'
+'0C00000000000000'
+'0500000000000000'
+'0100000000000000'
+## (abs x) fixed 1
+&str_prim_abs
+'00000000'
+'0300000000000000'
+'0D00000000000000'
+'0500000000000000'
+'0100000000000000'
+## (min ...) variadic — arity 0 ignored, but ≥1 enforced by body load
+&str_prim_min
+'00000000'
+'0300000000000000'
+'0E00000000000000'
+'0600000000000000'
+'0000000000000000'
+## (max ...) variadic
+&str_prim_max
+'00000000'
+'0300000000000000'
+'0F00000000000000'
+'0600000000000000'
+'0000000000000000'
+## (bit-and ...) variadic
+&str_prim_bitand
+'00000000'
+'0700000000000000'
+'1000000000000000'
+'0600000000000000'
+'0000000000000000'
+## (bit-or ...) variadic
+&str_prim_bitor
+'00000000'
+'0600000000000000'
+'1100000000000000'
+'0600000000000000'
+'0000000000000000'
+## (bit-xor ...) variadic
+&str_prim_bitxor
+'00000000'
+'0700000000000000'
+'1200000000000000'
+'0600000000000000'
+'0000000000000000'
+## (bit-not x) fixed 1
+&str_prim_bitnot
+'00000000'
+'0700000000000000'
+'1300000000000000'
+'0500000000000000'
+'0100000000000000'
+## (arithmetic-shift n k) fixed 2
+&str_prim_ashift
+'00000000'
+'1000000000000000'
+'1400000000000000'
+'0500000000000000'
+'0200000000000000'
+## (number? x) fixed 1
+&str_prim_numberp
+'00000000'
+'0700000000000000'
+'1500000000000000'
+'0500000000000000'
+'0100000000000000'
+## (symbol? x) fixed 1
+&str_prim_symbolp
+'00000000'
+'0700000000000000'
+'1600000000000000'
+'0500000000000000'
+'0100000000000000'
+## (string? x) fixed 1
+&str_prim_stringp
+'00000000'
+'0700000000000000'
+'1700000000000000'
+'0500000000000000'
+'0100000000000000'
+## (vector? x) fixed 1
+&str_prim_vectorp
+'00000000'
+'0700000000000000'
+'1800000000000000'
+'0500000000000000'
+'0100000000000000'
+## (procedure? x) fixed 1
+&str_prim_procp
+'00000000'
+'0A00000000000000'
+'1900000000000000'
+'0500000000000000'
+'0100000000000000'
+## (eq? x y) fixed 2
+&str_prim_eqp
+'00000000'
+'0300000000000000'
+'1A00000000000000'
+'0500000000000000'
+'0200000000000000'
+## End sentinel: zero name pointer.
+'0000000000000000'
+'0000000000000000'
+'0000000000000000'
+'0000000000000000'
+'0000000000000000'
+
+
 ## ---- Special-form symbol slots --------------------------------------
 ## Zero-initialized; _start populates each slot with the interned
 ## tagged-symbol pointer so eval_pair can dispatch by pointer identity.
@@ -2703,6 +3940,22 @@
 '00000000'
 
 
+## ---- Primitive argv scratch buffer (32 slots × 8B = 256B) -----------
+## apply fills this with tagged values before cascading to a primitive
+## body. 32 slots is generous for the step-10 surface; extend if later
+## primitives need more. Zeroed so a stray read sees the 0 tagged
+## sentinel (not a valid value — harmless).
+:prim_argv
+'0000000000000000000000000000000000000000000000000000000000000000'
+'0000000000000000000000000000000000000000000000000000000000000000'
+'0000000000000000000000000000000000000000000000000000000000000000'
+'0000000000000000000000000000000000000000000000000000000000000000'
+'0000000000000000000000000000000000000000000000000000000000000000'
+'0000000000000000000000000000000000000000000000000000000000000000'
+'0000000000000000000000000000000000000000000000000000000000000000'
+'0000000000000000000000000000000000000000000000000000000000000000'
+
+
 ## ---- Symbol table (4096 slots × 8 bytes = 32 KiB) -------------------
 ## Open-addressing hash table. Empty slot = 0 (no valid tagged value
 ## is 0). LISP.md §GC §Roots makes this a named BSS root.
diff --git a/src/p1_gen.py b/src/p1_gen.py
@@ -852,9 +852,10 @@ SYS_NUM = {
 ## plus a little headroom. Extend when a new value appears in P1 src.
 
 ## ADDI imms. NEG48/48 handle the ASCII '0' bias; the rest cover tag
-## stripping and loop counters. Full reg product × this set = 8²×N.
+## stripping, loop counters, and — at 40 — the primitive-registration
+## table's per-record stride. Full reg product × this set = 8²×N.
 ADDI_IMMS = (-48, -8, -7, -6, -5, -4, -3, -2, -1,
-             1, 2, 3, 4, 5, 6, 7, 8, 48)
+             1, 2, 3, 4, 5, 6, 7, 8, 40, 48)
 
 ## Shift amounts (for SHLI/SHRI/SARI). 32/52 implement low-N-bit masks
 ## (length field extraction; 4096-slot symbol-table index); the small
diff --git a/tests/lisp/10-arith.expected b/tests/lisp/10-arith.expected
@@ -0,0 +1 @@
+42
diff --git a/tests/lisp/10-arith.scm b/tests/lisp/10-arith.scm
@@ -0,0 +1,17 @@
+;; Step-10c arithmetic primitives: + - * / %.
+;;
+;; `+` and `*` are variadic (identity on 0 args); `-` folds left; `/`
+;; and `%` are strictly binary (signed truncating divide and remainder,
+;; matching SARI + DIV/REM semantics). The nested `if` ladder exits
+;; with the witness fixnum 42 only when every check holds; a single
+;; wrong primitive short-circuits to 0 and the diff fails loudly.
+(if (= (+) 0)
+  (if (= (+ 7) 7)
+    (if (= (+ 1 2 3 4) 10)
+      (if (= (*) 1)
+        (if (= (* 6) 6)
+          (if (= (* 2 3 4) 24)
+            (if (= (- 10 3 2) 5)
+              (if (= (/ 100 5) 20)
+                (if (= (% 17 5) 2)
+                  42 0) 0) 0) 0) 0) 0) 0) 0) 0)
diff --git a/tests/lisp/11-compare.expected b/tests/lisp/11-compare.expected
@@ -0,0 +1 @@
+42
diff --git a/tests/lisp/11-compare.scm b/tests/lisp/11-compare.scm
@@ -0,0 +1,16 @@
+;; Step-10c numeric comparison primitives: = < > <= >=.
+;;
+;; All binary; result is #t or #f (step-10c singletons). The ladder
+;; nests #t-guards and witnesses the #f-side by inverting with a `0`
+;; tail, so a primitive that returns the wrong singleton short-circuits.
+(if (= 5 5)
+  (if (< 3 5)
+    (if (> 5 3)
+      (if (<= 5 5)
+        (if (<= 4 5)
+          (if (>= 5 5)
+            (if (>= 6 5)
+              (if (if (= 5 6) 0 1)
+                (if (if (< 5 3) 0 1)
+                  (if (if (>= 3 5) 0 1)
+                    42 0) 0) 0) 0) 0) 0) 0) 0) 0) 0)
diff --git a/tests/lisp/12-numpred.expected b/tests/lisp/12-numpred.expected
@@ -0,0 +1 @@
+42
diff --git a/tests/lisp/12-numpred.scm b/tests/lisp/12-numpred.scm
@@ -0,0 +1,20 @@
+;; Step-10c numeric predicates + numeric functions.
+;;
+;; zero? / positive? / negative? / number? return singletons;
+;; abs / min / max return fixnums. Step 11 brings negative literals
+;; and negative-fixnum printing, so we synthesize negatives here via
+;; `(- 0 n)` / `(bit-not 0)` and witness them through predicates
+;; rather than printing the raw value.
+(if (zero? 0)
+  (if (if (zero? 5) 0 1)
+    (if (positive? 5)
+      (if (if (positive? 0) 0 1)
+        (if (negative? (- 0 3))
+          (if (if (negative? 5) 0 1)
+            (if (number? 42)
+              (if (if (number? (quote x)) 0 1)
+                (if (= (abs 7) 7)
+                  (if (= (abs (- 0 9)) 9)
+                    (if (= (min 5 3 9 2 8) 2)
+                      (if (= (max 1 7 4 3) 7)
+                        42 0) 0) 0) 0) 0) 0) 0) 0) 0) 0) 0) 0)
diff --git a/tests/lisp/13-bitwise.expected b/tests/lisp/13-bitwise.expected
@@ -0,0 +1 @@
+42
diff --git a/tests/lisp/13-bitwise.scm b/tests/lisp/13-bitwise.scm
@@ -0,0 +1,19 @@
+;; Step-10c bitwise primitives: bit-and bit-or bit-xor bit-not
+;; arithmetic-shift.
+;;
+;; bit-and/or/xor are variadic (identity on 0 args: -1, 0, 0);
+;; bit-not is unary; arithmetic-shift is binary (k<0 is right shift).
+;; Step-11 reader/printer deferrals mean we can't literalize -1 or
+;; a negative shift; we synthesize both via `(bit-not 0)` and witness
+;; through non-negative results.
+(if (= (bit-and) (bit-not 0))
+  (if (= (bit-and 15) 15)
+    (if (= (bit-and 15 6) 6)
+      (if (= (bit-or) 0)
+        (if (= (bit-or 1 2 4) 7)
+          (if (= (bit-xor 15 6) 9)
+            (if (= (bit-xor 15 15) 0)
+              (if (= (bit-not (bit-not 42)) 42)
+                (if (= (arithmetic-shift 1 3) 8)
+                  (if (= (arithmetic-shift 16 (bit-not 1)) 4)
+                    42 0) 0) 0) 0) 0) 0) 0) 0) 0) 0)
diff --git a/tests/lisp/14-tagpred.expected b/tests/lisp/14-tagpred.expected
@@ -0,0 +1 @@
+42
diff --git a/tests/lisp/14-tagpred.scm b/tests/lisp/14-tagpred.scm
@@ -0,0 +1,18 @@
+;; Step-10c tag predicates + eq?.
+;;
+;; symbol?/string?/vector?/procedure? inspect the tag (and for vector
+;; the header type byte). Strings and vectors don't exist as literals
+;; yet (steps 10e/10f), so their #f case is tested against a fixnum.
+;; `procedure?` accepts both closures and primitives (same tag band).
+;; `eq?` is pointer/value identity — interning makes symbol eq? work.
+(if (symbol? (quote foo))
+  (if (if (symbol? 42) 0 1)
+    (if (if (string? 42) 0 1)
+      (if (if (vector? 42) 0 1)
+        (if (procedure? (lambda (x) x))
+          (if (procedure? +)
+            (if (if (procedure? 42) 0 1)
+              (if (eq? (quote a) (quote a))
+                (if (if (eq? (quote a) (quote b)) 0 1)
+                  (if (eq? 5 5)
+                    42 0) 0) 0) 0) 0) 0) 0) 0) 0) 0)

	boot2 Playing with the boostrap
	git clone https://git.ryansepassi.com/git/boot2.git
	Log \| Files \| Refs

M	docs/LISP.md	\|	101	+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------
M	src/lisp.M1	\|	1489	++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
M	src/p1_gen.py	\|	5	+++--
A	tests/lisp/10-arith.expected	\|	1	+
A	tests/lisp/10-arith.scm	\|	17	+++++++++++++++++
A	tests/lisp/11-compare.expected	\|	1	+
A	tests/lisp/11-compare.scm	\|	16	++++++++++++++++
A	tests/lisp/12-numpred.expected	\|	1	+
A	tests/lisp/12-numpred.scm	\|	20	++++++++++++++++++++
A	tests/lisp/13-bitwise.expected	\|	1	+
A	tests/lisp/13-bitwise.scm	\|	19	+++++++++++++++++++
A	tests/lisp/14-tagpred.expected	\|	1	+
A	tests/lisp/14-tagpred.scm	\|	18	++++++++++++++++++