commit 30273099f630eb2e8f787c413fc863ad2e44edfa
parent d19a402ecc3df0f912aecf4ec2a8d400f26c5f05
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Fri, 24 Apr 2026 09:03:03 -0700
Hide P1 frame header and merge LEAVE+RET into ERET
Portable sp after ENTER points to the frame-local base; the saved
retaddr and saved caller sp become backend-private. LEAVE is dropped
as a standalone op: ERET atomically tears down the frame and returns,
mirroring TAIL/TAILR, which already bundle the epilogue. Leaf functions
still return with a bare RET.
Diffstat:
12 files changed, 505 insertions(+), 303 deletions(-)
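The m1pp.M1 hunks below are two mechanical rewrites applied together: every `leave` / `ret` pair collapses into a single `eret`, and every sp-relative frame-slot offset shrinks by the old two-word frame header (16 bytes on P1v2-64), since portable `sp` now points at the frame-local base. A sketch of that rewrite (hypothetical helper names; the actual commit was edited by hand):

```python
# Sketch of the two mechanical rewrites this commit applies to P1 source.
# `rewrite` and `shift_sp_offset` are illustrative, not toolchain code.

HEADER = 16  # 2 * WORD on P1v2-64; the formerly visible frame header

def rewrite(lines):
    out = []
    i = 0
    while i < len(lines):
        # leave ; ret  ->  eret
        if i + 1 < len(lines) and lines[i].strip() == "leave" \
                and lines[i + 1].strip() == "ret":
            out.append("eret")
            i += 2
            continue
        out.append(lines[i])
        i += 1
    return out

def shift_sp_offset(off):
    # Portable sp now points at the frame-local base, so every
    # sp-relative frame-slot offset drops by the old 16-byte header.
    return off - HEADER

print(rewrite(["li_a0 EXPR_ADD", "leave", "ret"]))
print(shift_sp_offset(16), shift_sp_offset(24))
```

This is why, for example, `st_a0,sp,16` / `st_a1,sp,24` spill pairs uniformly become `st_a0,sp,0` / `st_a1,sp,8` throughout the diff.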
diff --git a/docs/P1.md b/docs/P1.md
@@ -47,7 +47,7 @@ So the notation in this document is descriptive rather than literal:
opcodes
- `BR rs`, `CALLR rs`, and `TAILR rs` mean register-specific control-flow
opcodes
-- `LEAVE`, `CALL`, `RET`, `TAIL`, `B`, and `SYSCALL` remain operand-free
+- `ERET`, `CALL`, `RET`, `TAIL`, `B`, and `SYSCALL` remain operand-free
Labels still appear in source where the toolchain supports them directly, such
as `LA rd, %label` and `LA_BR %label`.
@@ -176,8 +176,8 @@ those words in the first `m` words of its frame-local storage immediately
before the call:
```
-[sp + 2*WORD + 0*WORD] = outgoing arg word 0
-[sp + 2*WORD + 1*WORD] = outgoing arg word 1
+[sp + 0*WORD] = outgoing arg word 0
+[sp + 1*WORD] = outgoing arg word 1
...
```
@@ -191,29 +191,30 @@ addressed prefix available for outgoing argument staging across the call.
### Standard frame layout
-Functions that need local stack storage use a standard frame layout. After
-frame establishment:
+Functions that need local stack storage establish a standard frame with
+`ENTER size`. After frame establishment, the portable-visible frame-local
+storage occupies the first `size` bytes above `sp`:
```
-[sp + 0*WORD] = saved return address
-[sp + 1*WORD] = saved caller stack pointer
-[sp + 2*WORD ... sp + 2*WORD + local_bytes - 1] = frame-local storage
-...
+[sp + 0 ... sp + size - 1] = frame-local storage
```
Frame-local storage is byte-addressed. Portable code may use it for ordinary
locals, spilled callee-saved registers, and the caller-staged outgoing
stack-argument words described above.
-Total frame size is:
+Each frame also carries backend-private per-frame state — typically the
+saved return continuation, saved caller `sp`, and any padding needed to
+satisfy `STACK_ALIGN`. That state is not addressable by portable source,
+and the backend chooses its layout and total allocation size.
-`round_up(STACK_ALIGN, 2*WORD_SIZE + local_bytes)`
+Word sizes:
-Where:
+- `WORD = 8` in P1v2-64
+- `WORD = 4` in P1v2-32
-- `WORD_SIZE = 8` in P1v2-64
-- `WORD_SIZE = 4` in P1v2-32
-- `STACK_ALIGN` is target-defined and must satisfy the native call ABI
+`STACK_ALIGN` is target-defined and must satisfy the native call ABI at
+every call boundary.
Leaf functions that need no frame-local storage may omit the frame entirely.
@@ -235,8 +236,8 @@ Leaf functions that need no frame-local storage may omit the frame entirely.
| Memory | `LD`, `ST`, `LB`, `SB` |
| ABI access | `LDARG` |
| Branching | `B`, `BR`, `BEQ`, `BNE`, `BLT`, `BLTU`, `BEQZ`, `BNEZ`, `BLTZ` |
-| Calls / returns | `CALL`, `CALLR`, `RET`, `TAIL`, `TAILR` |
-| Frame management | `ENTER`, `LEAVE` |
+| Calls / returns | `CALL`, `CALLR`, `RET`, `ERET`, `TAIL`, `TAILR` |
+| Frame management | `ENTER` |
| System | `SYSCALL` |
## Immediates
@@ -284,9 +285,14 @@ instruction after the `CALL`. `CALL` requires an active standard frame.
the code pointer value held in `rs` and establishes the same return
continuation semantics as `CALL`.
-`RET` returns through the current return continuation. `RET` is valid whether
-or not the current function has established a standard frame, provided any
-frame established by the function has already been torn down.
+`RET` returns from a leaf function through the hidden return continuation
+captured at call time. `RET` is valid only when the current function has no
+active standard frame.
+
+`ERET` returns from a function that has an active standard frame. It
+performs the standard epilogue — restoring `sp` and the hidden return
+continuation — and then returns to the caller. `ERET` is valid only when
+the current function has an active standard frame.
`TAIL` is a tail call to the target most recently loaded by `LA_BR`. It is
valid only when the current function has an active standard frame. `TAIL`
@@ -300,7 +306,7 @@ current function has an active standard frame.
Because stack-passed outgoing argument words are staged in the caller's own
frame-local storage, `TAIL` and `TAILR` are portable only when the tail-called
callee requires no stack-passed argument words. Portable compilers must lower
-other tail-call cases to an ordinary `CALL` / `RET` sequence.
+other tail-call cases to an ordinary `CALL` / `ERET` sequence.
Portable source must treat the return continuation as hidden machine state. It
must not assume that the return address lives in any exposed register or stack
@@ -309,40 +315,32 @@ establishment.
### Prologue / Epilogue
-P1 v2 defines the following frame-establishment and frame-teardown operations:
-
-- `ENTER size`
-- `LEAVE`
+P1 v2 has a single frame-establishment op, `ENTER size`. Frame teardown is
+not a standalone op; it is embedded in `ERET`, `TAIL`, and `TAILR`.
-`ENTER size` establishes the standard frame layout with `size` bytes of
-frame-local storage:
+`ENTER size` establishes a standard frame with `size` bytes of frame-local
+storage. After it executes:
```
-[sp + 0*WORD] = saved return address
-[sp + 1*WORD] = saved caller stack pointer
-[sp + 2*WORD ... sp + 2*WORD + size - 1] = frame-local storage
+[sp + 0 ... sp + size - 1] = frame-local storage
```
-The total allocation size is:
-
-`round_up(STACK_ALIGN, 2*WORD_SIZE + size)`
-
-The named frame-local bytes are the usable local storage. Any additional bytes
-introduced by alignment rounding are padding, not extra local bytes.
-
-`LEAVE` tears down the current standard frame and restores the hidden return
-continuation so that a subsequent `RET` returns correctly.
+Any backend-private per-frame state (saved return continuation, saved
+caller `sp`, alignment padding) lives outside the portable-visible `size`
+bytes. Portable source may not address it.
-Because every standard frame stores the saved caller stack pointer at
-`[sp + 1*WORD]`, `LEAVE` does not need to know the frame-local byte count used
-by the corresponding `ENTER`.
+`ERET`, `TAIL`, and `TAILR` each perform the standard epilogue — restoring
+`sp` and the hidden return continuation — and then transfer control: `ERET`
+to the caller, `TAIL` to the target in `br`, and `TAILR` to the target in
+`rs`. Portable source must use one of these ops (not `RET`) to exit a
+function that has established a frame.
-A function may omit `ENTER` / `LEAVE` entirely if it is a leaf and needs no
-standard frame.
+A function may omit `ENTER` entirely if it is a leaf and needs no frame.
+Such a function exits with `RET`.
-`ENTER` and `LEAVE` do not implicitly save or restore `s0` or `s1`. A
-function that modifies `s0` or `s1` must preserve them explicitly, typically by
-storing them in frame-local storage within its standard frame.
+`ENTER` does not implicitly save or restore `s0`-`s3`. A function that
+modifies any callee-saved register must preserve it explicitly, typically
+by storing it in frame-local storage within its standard frame.
### Branching
@@ -419,7 +417,8 @@ register.
Portable source may also read the current stack pointer through `MOV rd, sp`.
Portable source may not write `sp` through `MOV`. Stack-pointer updates are only
-performed by `ENTER`, `LEAVE`, and backend-private call/return machinery.
+performed by `ENTER`, `ERET`, `TAIL`, `TAILR`, and backend-private call/return
+machinery.
`LI` materializes an integer bit-pattern. `LA` materializes the address of a
label. `LA_BR` is a separate control-flow-target materialization form and is not
@@ -483,7 +482,7 @@ At entry to `p1_main`, the native entry-stack layout has already been
consumed by the backend stub. Portable source may not assume anything
about the `sp` value inherited from `_start` except that it satisfies
the call-boundary alignment rule and that the standard frame protocol
-(`ENTER` / `LEAVE`) works correctly from it.
+(`ENTER` / `ERET`) works correctly from it.
`p1_main` may return normally, or it may call `sys_exit` directly at
any point.
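The spec change above deliberately stops prescribing a total allocation formula; the backend owns the header and padding. For intuition only, a backend that keeps a two-word header and pads to `STACK_ALIGN` (the layout both backends in this commit use) would compute something like the following. These numbers are illustrative, not normative:

```python
# One plausible backend frame-size computation after this change.
# The spec leaves the total allocation backend-private; WORD and
# STACK_ALIGN here are the common P1v2-64 values, assumed for the sketch.

WORD = 8
STACK_ALIGN = 16

def frame_alloc(local_bytes):
    # backend-private header: return continuation + saved caller sp
    header = 2 * WORD
    total = header + local_bytes
    # pad up to the call-boundary alignment
    return (total + STACK_ALIGN - 1) // STACK_ALIGN * STACK_ALIGN

print(frame_alloc(0))   # header only
print(frame_alloc(40))  # header + 40 local bytes, padded
```

Portable code sees only the `local_bytes` span; the header and any padding are invisible to it.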
diff --git a/m1pp/m1pp.M1 b/m1pp/m1pp.M1
@@ -20,7 +20,7 @@
## without emitting output.
##
## P1v2 ABI: a0..a3 arg/return, t0..t2 caller-saved temps, s0..s3 callee-saved
-## (unused here). Non-leaf functions use enter_0 / leave. _start has no frame;
+## (unused here). Non-leaf functions use enter_0 / eret. _start has no frame;
## the kernel-supplied SP carries argv/argc directly.
## --- Constants & sizing ------------------------------------------------------
@@ -114,13 +114,13 @@ DEFINE EXPR_INVALID 1200000000000000
:_start
# if (argc < 3) usage
- ld_a0,sp,0
+ ld_a0,sp,neg16
li_a1 %3 %0
la_br &err_usage
blt_a0,a1
# output_path = argv[2]
- ld_t0,sp,24
+ ld_t0,sp,8
la_a0 &output_path
st_t0,a0,0
@@ -140,7 +140,7 @@ DEFINE EXPR_INVALID 1200000000000000
# input_fd = openat(AT_FDCWD, argv[1], O_RDONLY, 0)
li_a0 sys_openat
li_a1 AT_FDCWD
- ld_a2,sp,16
+ ld_a2,sp,0
li_a3 %0 %0
li_t0 %0 %0
syscall
@@ -719,8 +719,7 @@ DEFINE EXPR_INVALID 1200000000000000
b
:lex_done
- leave
- ret
+ eret
## --- Output: normalized token stream to output_buf ---------------------------
## emit_newline writes '\n' and clears output_need_space.
@@ -855,7 +854,7 @@ DEFINE EXPR_INVALID 1200000000000000
call
la_br &proc_done
beqz_a0
- st_a0,sp,16
+ st_a0,sp,0
# if (s->pos == s->end) pop and continue
ld_t0,a0,16
@@ -864,7 +863,7 @@ DEFINE EXPR_INVALID 1200000000000000
beq_t0,t1
# tok = s->pos
- st_t0,sp,24
+ st_t0,sp,8
# ---- line_start && tok->kind == TOK_WORD && tok eq "%macro" ----
ld_a1,a0,24
@@ -888,7 +887,7 @@ DEFINE EXPR_INVALID 1200000000000000
# holds in practice (line_start in expansion streams is cleared
# before any %macro could matter). After it returns we copy
# proc_pos back into s->pos and set s->line_start = 1.
- ld_t0,sp,24
+ ld_t0,sp,8
la_a0 &proc_pos
st_t0,a0,0
la_a0 &proc_line_start
@@ -896,7 +895,7 @@ DEFINE EXPR_INVALID 1200000000000000
st_a1,a0,0
la_br &define_macro
call
- ld_a0,sp,16
+ ld_a0,sp,0
la_a1 &proc_pos
ld_t0,a1,0
st_t0,a0,16
@@ -909,7 +908,7 @@ DEFINE EXPR_INVALID 1200000000000000
## The %macro guard above already proved line_start && kind == TOK_WORD; if
## we reach here via a %macro non-match, those gates still hold.
:proc_check_struct
- ld_t0,sp,24
+ ld_t0,sp,8
mov_a0,t0
la_a1 &const_struct
li_a2 %7 %0
@@ -919,7 +918,7 @@ DEFINE EXPR_INVALID 1200000000000000
beqz_a0
# %struct matched: shim into define_fielded(stride=8, total="SIZE", len=4)
- ld_t0,sp,24
+ ld_t0,sp,8
la_a0 &proc_pos
st_t0,a0,0
la_a0 &proc_line_start
@@ -930,7 +929,7 @@ DEFINE EXPR_INVALID 1200000000000000
li_a2 %4 %0
la_br &define_fielded
call
- ld_a0,sp,16
+ ld_a0,sp,0
la_a1 &proc_pos
ld_t0,a1,0
st_t0,a0,16
@@ -941,7 +940,7 @@ DEFINE EXPR_INVALID 1200000000000000
## ---- line_start && tok eq "%enum" ----
:proc_check_enum
- ld_t0,sp,24
+ ld_t0,sp,8
mov_a0,t0
la_a1 &const_enum
li_a2 %5 %0
@@ -951,7 +950,7 @@ DEFINE EXPR_INVALID 1200000000000000
beqz_a0
# %enum matched: shim into define_fielded(stride=1, total="COUNT", len=5)
- ld_t0,sp,24
+ ld_t0,sp,8
la_a0 &proc_pos
st_t0,a0,0
la_a0 &proc_line_start
@@ -962,7 +961,7 @@ DEFINE EXPR_INVALID 1200000000000000
li_a2 %5 %0
la_br &define_fielded
call
- ld_a0,sp,16
+ ld_a0,sp,0
la_a1 &proc_pos
ld_t0,a1,0
st_t0,a0,16
@@ -973,8 +972,8 @@ DEFINE EXPR_INVALID 1200000000000000
:proc_check_newline
# reload s, tok
- ld_a0,sp,16
- ld_t0,sp,24
+ ld_a0,sp,0
+ ld_t0,sp,8
ld_a1,t0,0
li_a2 TOK_NEWLINE
la_br &proc_check_builtin
@@ -992,8 +991,8 @@ DEFINE EXPR_INVALID 1200000000000000
:proc_check_builtin
# tok->kind == TOK_WORD && tok+1 < s->end && (tok+1)->kind == TOK_LPAREN ?
- ld_a0,sp,16
- ld_t0,sp,24
+ ld_a0,sp,0
+ ld_t0,sp,8
ld_a1,t0,0
li_a2 TOK_WORD
la_br &proc_check_macro
@@ -1018,35 +1017,35 @@ DEFINE EXPR_INVALID 1200000000000000
call
la_br &proc_do_builtin
bnez_a0
- ld_a0,sp,24
+ ld_a0,sp,8
la_a1 &const_at
li_a2 %1 %0
la_br &tok_eq_const
call
la_br &proc_do_builtin
bnez_a0
- ld_a0,sp,24
+ ld_a0,sp,8
la_a1 &const_pct
li_a2 %1 %0
la_br &tok_eq_const
call
la_br &proc_do_builtin
bnez_a0
- ld_a0,sp,24
+ ld_a0,sp,8
la_a1 &const_dlr
li_a2 %1 %0
la_br &tok_eq_const
call
la_br &proc_do_builtin
bnez_a0
- ld_a0,sp,24
+ ld_a0,sp,8
la_a1 &const_select
li_a2 %7 %0
la_br &tok_eq_const
call
la_br &proc_do_builtin
bnez_a0
- ld_a0,sp,24
+ ld_a0,sp,8
la_a1 &const_str
li_a2 %4 %0
la_br &tok_eq_const
@@ -1058,8 +1057,8 @@ DEFINE EXPR_INVALID 1200000000000000
:proc_do_builtin
# expand_builtin_call(s, tok)
- ld_a0,sp,16
- ld_a1,sp,24
+ ld_a0,sp,0
+ ld_a1,sp,8
la_br &expand_builtin_call
call
la_br &proc_loop
@@ -1069,14 +1068,14 @@ DEFINE EXPR_INVALID 1200000000000000
# macro = find_macro(tok); if non-zero AND
# ((tok+1 < s->end AND (tok+1)->kind == TOK_LPAREN) OR macro->param_count == 0)
# then expand_call. (§4 paren-less 0-arg calls.)
- ld_a0,sp,24
+ ld_a0,sp,8
la_br &find_macro
call
la_br &proc_emit
beqz_a0
mov_t2,a0
- ld_a0,sp,16
- ld_t0,sp,24
+ ld_a0,sp,0
+ ld_t0,sp,8
addi_t1,t0,24
ld_a1,a0,8
la_br &proc_macro_has_next
@@ -1088,7 +1087,7 @@ DEFINE EXPR_INVALID 1200000000000000
li_a2 TOK_LPAREN
la_br &proc_macro_zero_arg
bne_a1,a2
- ld_a0,sp,16
+ ld_a0,sp,0
mov_a1,t2
la_br &expand_call
call
@@ -1099,7 +1098,7 @@ DEFINE EXPR_INVALID 1200000000000000
ld_t0,t2,16
la_br &proc_emit
bnez_t0
- ld_a0,sp,16
+ ld_a0,sp,0
mov_a1,t2
la_br &expand_call
call
@@ -1108,10 +1107,10 @@ DEFINE EXPR_INVALID 1200000000000000
:proc_emit
# emit_token(tok); s->pos += 24; s->line_start = 0
- ld_a0,sp,24
+ ld_a0,sp,8
la_br &emit_token
call
- ld_a0,sp,16
+ ld_a0,sp,0
ld_t0,a0,16
addi_t0,t0,24
st_t0,a0,16
@@ -1127,13 +1126,12 @@ DEFINE EXPR_INVALID 1200000000000000
b
:proc_done
- leave
- ret
+ eret
## --- %macro storage: parse header + body into macros[] / macro_body_tokens --
## Called at proc_pos == line-start `%macro`. Leaves proc_pos past the %endm
## line with proc_line_start = 1. Uses BSS scratch (def_m_ptr, def_param_ptr,
-## def_body_line_start) since P1v2 enter/leave does not save s* registers.
+## def_body_line_start) since P1v2 enter/eret does not save s* registers.
##
## Macro record layout (296 bytes, see M1PP_MACRO_RECORD_SIZE):
## +0 name.ptr (8)
@@ -1447,8 +1445,7 @@ DEFINE EXPR_INVALID 1200000000000000
la_a0 &proc_line_start
li_a1 %1 %0
st_a1,a0,0
- leave
- ret
+ eret
## --- %struct / %enum directive ----------------------------------------------
## define_fielded(a0=stride, a1=total_name_ptr, a2=total_name_len).
@@ -1653,8 +1650,7 @@ DEFINE EXPR_INVALID 1200000000000000
la_a0 &proc_line_start
li_a1 %1 %0
st_a1,a0,0
- leave
- ret
+ eret
## df_emit_field(): read df_base_*, df_suffix_*, df_value from BSS; synthesize
## one macro record + one body token. Builds the "NAME.field" identifier in
@@ -1798,8 +1794,7 @@ DEFINE EXPR_INVALID 1200000000000000
la_a1 &macros_end
st_t2,a1,0
- leave
- ret
+ eret
## df_render_decimal(): reads df_value; writes a reverse-filled decimal
## rendering into df_digit_scratch[cursor..end) and stores df_digit_count +
@@ -1992,8 +1987,7 @@ DEFINE EXPR_INVALID 1200000000000000
la_br &push_stream_span
call
:ppsfm_done
- leave
- ret
+ eret
## ============================================================================
## --- Argument parsing -------------------------------------------------------
@@ -2480,15 +2474,15 @@ DEFINE EXPR_INVALID 1200000000000000
la_br &err_bad_macro_header
beq_a0,a1
# spill a0/a1 so arg_is_braced can clobber regs
- st_a0,sp,16
- st_a1,sp,24
+ st_a0,sp,0
+ st_a1,sp,8
la_br &arg_is_braced
call
la_br &catp_plain
beqz_a0
# braced: strip outer braces (start+24, end-24)
- ld_a0,sp,16
- ld_a1,sp,24
+ ld_a0,sp,0
+ ld_a1,sp,8
addi_a0,a0,24
addi_a1,a1,neg24
la_br &catp_done
@@ -2498,13 +2492,12 @@ DEFINE EXPR_INVALID 1200000000000000
la_br &catp_done
b
:catp_plain
- ld_a0,sp,16
- ld_a1,sp,24
+ ld_a0,sp,0
+ ld_a1,sp,8
la_br &copy_span_to_pool
call
:catp_done
- leave
- ret
+ eret
## copy_paste_arg_to_pool(a0=arg_start, a1=arg_end) -> void (fatal unless len 1)
## Enforces the single-token-argument rule for params adjacent to ##.
@@ -2512,14 +2505,14 @@ DEFINE EXPR_INVALID 1200000000000000
:copy_paste_arg_to_pool
enter_16
# spill a0/a1 for the arg_is_braced call
- st_a0,sp,16
- st_a1,sp,24
+ st_a0,sp,0
+ st_a1,sp,8
la_br &arg_is_braced
call
la_br &err_bad_macro_header
bnez_a0
- ld_a0,sp,16
- ld_a1,sp,24
+ ld_a0,sp,0
+ ld_a1,sp,8
# if ((arg_end - arg_start) != 24) fatal
sub_a2,a1,a0
li_a3 M1PP_TOK_SIZE
@@ -2527,8 +2520,7 @@ DEFINE EXPR_INVALID 1200000000000000
bne_a2,a3
la_br &copy_span_to_pool
call
- leave
- ret
+ eret
## expand_macro_tokens(a0=call_tok, a1=limit, a2=macro_ptr) -> void (fatal on bad)
## Requires call_tok+1 is TOK_LPAREN. Runs parse_args(call_tok+1, limit),
@@ -3027,8 +3019,7 @@ DEFINE EXPR_INVALID 1200000000000000
la_br &paste_pool_range
call
- leave
- ret
+ eret
## expand_call(a0=stream_ptr, a1=macro_ptr) -> void (fatal on bad call)
## Calls expand_macro_tokens for the call at stream->pos, sets
@@ -3039,7 +3030,7 @@ DEFINE EXPR_INVALID 1200000000000000
- # spill stream_ptr to local frame slot (sp+16 is the first local; sp+0/+8
- # hold the saved return address and saved caller sp).
- st_a0,sp,16
+ # spill stream_ptr to the first frame-local slot (sp+0); the frame header
+ # is now backend-private and not addressable from portable code.
+ st_a0,sp,0
# expand_macro_tokens(stream->pos, stream->end, macro)
# stream->pos at +16, stream->end at +8
@@ -3052,7 +3043,7 @@ DEFINE EXPR_INVALID 1200000000000000
call
# stream->pos = emt_after_pos
- ld_a0,sp,16
+ ld_a0,sp,0
la_a1 &emt_after_pos
ld_t0,a1,0
st_t0,a0,16
@@ -3067,8 +3058,7 @@ DEFINE EXPR_INVALID 1200000000000000
la_br &push_pool_stream_from_mark
call
- leave
- ret
+ eret
## ============================================================================
## --- ## token paste compaction ----------------------------------------------
@@ -3171,8 +3161,7 @@ DEFINE EXPR_INVALID 1200000000000000
ld_a1,a1,0
st_a1,t0,16
- leave
- ret
+ eret
## paste_pool_range(a0=mark) -> void (fatal on bad paste)
## In-place compactor over expand_pool[mark..pool_used). For each TOK_PASTE,
@@ -3311,8 +3300,7 @@ DEFINE EXPR_INVALID 1200000000000000
sub_t0,t0,a1
la_a1 &pool_used
st_t0,a1,0
- leave
- ret
+ eret
## ============================================================================
## --- Integer atoms + S-expression evaluator ---------------------------------
@@ -3678,80 +3666,61 @@ DEFINE EXPR_INVALID 1200000000000000
:eoc_invalid
li_a0 EXPR_INVALID
- leave
- ret
+ eret
:eoc_add
li_a0 EXPR_ADD
- leave
- ret
+ eret
:eoc_sub
li_a0 EXPR_SUB
- leave
- ret
+ eret
:eoc_mul
li_a0 EXPR_MUL
- leave
- ret
+ eret
:eoc_div
li_a0 EXPR_DIV
- leave
- ret
+ eret
:eoc_mod
li_a0 EXPR_MOD
- leave
- ret
+ eret
:eoc_shl
li_a0 EXPR_SHL
- leave
- ret
+ eret
:eoc_shr
li_a0 EXPR_SHR
- leave
- ret
+ eret
:eoc_and
li_a0 EXPR_AND
- leave
- ret
+ eret
:eoc_or
li_a0 EXPR_OR
- leave
- ret
+ eret
:eoc_xor
li_a0 EXPR_XOR
- leave
- ret
+ eret
:eoc_not
li_a0 EXPR_NOT
- leave
- ret
+ eret
:eoc_eq
li_a0 EXPR_EQ
- leave
- ret
+ eret
:eoc_ne
li_a0 EXPR_NE
- leave
- ret
+ eret
:eoc_lt
li_a0 EXPR_LT
- leave
- ret
+ eret
:eoc_le
li_a0 EXPR_LE
- leave
- ret
+ eret
:eoc_gt
li_a0 EXPR_GT
- leave
- ret
+ eret
:eoc_ge
li_a0 EXPR_GE
- leave
- ret
+ eret
:eoc_strlen
li_a0 EXPR_STRLEN
- leave
- ret
+ eret
## apply_expr_op(a0=op_code, a1=args_ptr, a2=argc) -> a0 = i64 result
## Reduce args[0..argc) per op:
@@ -4209,8 +4178,7 @@ DEFINE EXPR_INVALID 1200000000000000
:aeo_finish
la_a0 &aeo_acc
ld_a0,a0,0
- leave
- ret
+ eret
## helper: validate argc >= 1; fatal otherwise. (Returns to caller.)
:aeo_require_argc_ge1
@@ -4286,13 +4254,13 @@ DEFINE EXPR_INVALID 1200000000000000
- ## sp+48 saved emt_mark
+ ## sp+32 saved emt_mark
:eval_expr_atom
enter_40
- st_a0,sp,16
- st_a1,sp,24
+ st_a0,sp,0
+ st_a1,sp,8
# macro_ptr = find_macro(tok)
la_br &find_macro
call
- st_a0,sp,32
+ st_a0,sp,16
# if (macro_ptr == 0) -> integer atom branch
la_br &eea_int_atom
@@ -4301,9 +4269,9 @@ DEFINE EXPR_INVALID 1200000000000000
# §4 paren-less 0-arg atom:
# Take the macro-call branch if (tok+1 < limit AND (tok+1)->kind == TOK_LPAREN)
# OR macro->param_count == 0. Otherwise fall through to int atom (unchanged).
- ld_t0,sp,16
+ ld_t0,sp,0
addi_t0,t0,24
- ld_t1,sp,24
+ ld_t1,sp,8
la_br &eea_check_zero_arg
blt_t1,t0
la_br &eea_check_zero_arg
@@ -4317,7 +4285,7 @@ DEFINE EXPR_INVALID 1200000000000000
:eea_check_zero_arg
# No trailing LPAREN. Take the macro branch only if param_count == 0.
- ld_t0,sp,32
+ ld_t0,sp,16
ld_t1,t0,16
la_br &eea_int_atom
bnez_t1
@@ -4325,30 +4293,30 @@ DEFINE EXPR_INVALID 1200000000000000
:eea_do_macro
# Macro call branch:
# expand_macro_tokens(tok, limit, macro_ptr)
- ld_a0,sp,16
- ld_a1,sp,24
- ld_a2,sp,32
+ ld_a0,sp,0
+ ld_a1,sp,8
+ ld_a2,sp,16
la_br &expand_macro_tokens
call
# Snapshot emt outputs immediately.
la_a0 &emt_after_pos
ld_t0,a0,0
- st_t0,sp,40
+ st_t0,sp,24
la_a0 &emt_mark
ld_t0,a0,0
- st_t0,sp,48
+ st_t0,sp,32
# If pool was not extended (pool_used == mark) -> bad expression.
la_a0 &pool_used
ld_t0,a0,0
- ld_t1,sp,48
+ ld_t1,sp,32
la_br &err_bad_macro_header
beq_t0,t1
# eval_expr_range(expand_pool + mark, expand_pool + pool_used)
la_a0 &expand_pool
- ld_t1,sp,48
+ ld_t1,sp,32
add_a0,a0,t1
la_a1 &expand_pool
la_a2 &pool_used
@@ -4363,33 +4331,31 @@ DEFINE EXPR_INVALID 1200000000000000
# restore pool_used = mark
la_a0 &pool_used
- ld_t0,sp,48
+ ld_t0,sp,32
st_t0,a0,0
# eval_after_pos = saved emt_after_pos
la_a0 &eval_after_pos
- ld_t0,sp,40
+ ld_t0,sp,24
st_t0,a0,0
- leave
- ret
+ eret
:eea_int_atom
# parse_int_token(tok) -> i64
- ld_a0,sp,16
+ ld_a0,sp,0
la_br &parse_int_token
call
la_a1 &eval_value
st_a0,a1,0
# eval_after_pos = tok + 24
- ld_t0,sp,16
+ ld_t0,sp,0
addi_t0,t0,24
la_a0 &eval_after_pos
st_t0,a0,0
- leave
- ret
+ eret
## eval_expr_range(a0=start_tok, a1=end_tok) -> a0 = i64 result (fatal on bad)
## Main S-expression evaluator loop, driven by the explicit ExprFrame stack
@@ -4412,28 +4378,28 @@ DEFINE EXPR_INVALID 1200000000000000
## used as the local base for stack checks)
:eval_expr_range
enter_56
- st_a0,sp,16
- st_a1,sp,24
+ st_a0,sp,0
+ st_a1,sp,8
li_t0 %0 %0
+ st_t0,sp,16
+ st_t0,sp,24
st_t0,sp,32
st_t0,sp,40
- st_t0,sp,48
- st_t0,sp,56
# entry_frame_top = expr_frame_top
la_a0 &expr_frame_top
ld_t0,a0,0
- st_t0,sp,64
+ st_t0,sp,48
:eer_loop
# If have_value, deliver it.
- ld_t0,sp,48
+ ld_t0,sp,32
la_br &eer_no_have_value
beqz_t0
# have_value: feed into top frame, or set result.
la_a0 &expr_frame_top
ld_t0,a0,0
- ld_t1,sp,64
+ ld_t1,sp,48
la_br &eer_set_result
beq_t0,t1
# frame = &expr_frames[frame_top - 1]
@@ -4456,42 +4422,42 @@ DEFINE EXPR_INVALID 1200000000000000
add_a3,a0,a2
shli_a2,t1,3
add_a3,a3,a2
- ld_t2,sp,32
+ ld_t2,sp,16
st_t2,a3,0
# frame->argc++
addi_t1,t1,1
st_t1,a1,0
# have_value = 0
li_t0 %0 %0
- st_t0,sp,48
+ st_t0,sp,32
la_br &eer_loop
b
:eer_set_result
# No frame open; this value is the top-level result.
- ld_t0,sp,56
+ ld_t0,sp,40
la_br &err_bad_macro_header
bnez_t0
- ld_t0,sp,32
- st_t0,sp,40
+ ld_t0,sp,16
+ st_t0,sp,24
li_t0 %1 %0
- st_t0,sp,56
+ st_t0,sp,40
li_t0 %0 %0
- st_t0,sp,48
+ st_t0,sp,32
la_br &eer_loop
b
:eer_no_have_value
# skip_expr_newlines(pos, end)
- ld_a0,sp,16
- ld_a1,sp,24
+ ld_a0,sp,0
+ ld_a1,sp,8
la_br &skip_expr_newlines
call
- st_a0,sp,16
+ st_a0,sp,0
# if (pos >= end) break
- ld_t0,sp,16
- ld_t1,sp,24
+ ld_t0,sp,0
+ ld_t1,sp,8
la_br &eer_loop_done
beq_t0,t1
@@ -4505,38 +4471,38 @@ DEFINE EXPR_INVALID 1200000000000000
beq_t2,a3
# atom: eval_expr_atom(pos, end); value = eval_value; pos = eval_after_pos
- ld_a0,sp,16
- ld_a1,sp,24
+ ld_a0,sp,0
+ ld_a1,sp,8
la_br &eval_expr_atom
call
la_a0 &eval_value
ld_t0,a0,0
- st_t0,sp,32
+ st_t0,sp,16
la_a0 &eval_after_pos
ld_t0,a0,0
- st_t0,sp,16
+ st_t0,sp,0
li_t0 %1 %0
- st_t0,sp,48
+ st_t0,sp,32
la_br &eer_loop
b
:eer_lparen
# pos++
addi_t0,t0,24
- st_t0,sp,16
+ st_t0,sp,0
# skip_expr_newlines
- ld_a0,sp,16
- ld_a1,sp,24
+ ld_a0,sp,0
+ ld_a1,sp,8
la_br &skip_expr_newlines
call
- st_a0,sp,16
+ st_a0,sp,0
# if (pos >= end) fatal
- ld_t0,sp,16
- ld_t1,sp,24
+ ld_t0,sp,0
+ ld_t1,sp,8
la_br &err_bad_macro_header
beq_t0,t1
# op = expr_op_code(pos)
- ld_a0,sp,16
+ ld_a0,sp,0
la_br &expr_op_code
call
# if (op == EXPR_INVALID) fatal
@@ -4572,9 +4538,9 @@ DEFINE EXPR_INVALID 1200000000000000
addi_t0,t0,1
st_t0,a1,0
# pos++ (skip operator token)
- ld_t0,sp,16
+ ld_t0,sp,0
addi_t0,t0,24
- st_t0,sp,16
+ st_t0,sp,0
la_br &eer_loop
b
@@ -4582,7 +4548,7 @@ DEFINE EXPR_INVALID 1200000000000000
# if (frame_top <= entry_frame_top) fatal
la_a0 &expr_frame_top
ld_t0,a0,0
- ld_t1,sp,64
+ ld_t1,sp,48
la_br &err_bad_macro_header
beq_t0,t1
la_br &err_bad_macro_header
@@ -4603,16 +4569,16 @@ DEFINE EXPR_INVALID 1200000000000000
la_br &apply_expr_op
call
# value = result; frame_top--; pos++; have_value = 1
- st_a0,sp,32
+ st_a0,sp,16
la_a1 &expr_frame_top
ld_t0,a1,0
addi_t0,t0,neg1
st_t0,a1,0
- ld_t0,sp,16
+ ld_t0,sp,0
addi_t0,t0,24
- st_t0,sp,16
+ st_t0,sp,0
li_t0 %1 %0
- st_t0,sp,48
+ st_t0,sp,32
la_br &eer_loop
b
@@ -4620,18 +4586,18 @@ DEFINE EXPR_INVALID 1200000000000000
# (strlen "literal") — degenerate unary op whose argument is a
# TOK_STRING atom, not a recursive expression.
# pos++ past the "strlen" operator word.
- ld_t0,sp,16
+ ld_t0,sp,0
addi_t0,t0,24
- st_t0,sp,16
+ st_t0,sp,0
# skip_expr_newlines(pos, end)
- ld_a0,sp,16
- ld_a1,sp,24
+ ld_a0,sp,0
+ ld_a1,sp,8
la_br &skip_expr_newlines
call
- st_a0,sp,16
+ st_a0,sp,0
# if (pos >= end) fatal
- ld_t0,sp,16
- ld_t1,sp,24
+ ld_t0,sp,0
+ ld_t1,sp,8
la_br &err_bad_macro_header
beq_t0,t1
# if (pos->kind != TOK_STRING) fatal
@@ -4652,19 +4618,19 @@ DEFINE EXPR_INVALID 1200000000000000
bne_a3,a0
# value = pos->text.len - 2
addi_a1,a1,neg2
- st_a1,sp,32
+ st_a1,sp,16
# pos++
addi_t0,t0,24
- st_t0,sp,16
+ st_t0,sp,0
# skip_expr_newlines(pos, end)
- ld_a0,sp,16
- ld_a1,sp,24
+ ld_a0,sp,0
+ ld_a1,sp,8
la_br &skip_expr_newlines
call
- st_a0,sp,16
+ st_a0,sp,0
# if (pos >= end) fatal
- ld_t0,sp,16
- ld_t1,sp,24
+ ld_t0,sp,0
+ ld_t1,sp,8
la_br &err_bad_macro_header
beq_t0,t1
# if (pos->kind != TOK_RPAREN) fatal
@@ -4674,10 +4640,10 @@ DEFINE EXPR_INVALID 1200000000000000
bne_t2,a3
# pos++
addi_t0,t0,24
- st_t0,sp,16
+ st_t0,sp,0
# have_value = 1
li_t0 %1 %0
- st_t0,sp,48
+ st_t0,sp,32
la_br &eer_loop
b
@@ -4685,22 +4651,21 @@ DEFINE EXPR_INVALID 1200000000000000
# frame_top must equal entry_frame_top
la_a0 &expr_frame_top
ld_t0,a0,0
- ld_t1,sp,64
+ ld_t1,sp,48
la_br &err_bad_macro_header
bne_t0,t1
# have_result must be 1
- ld_t0,sp,56
+ ld_t0,sp,40
la_br &err_bad_macro_header
beqz_t0
# pos must equal end
- ld_t0,sp,16
- ld_t1,sp,24
+ ld_t0,sp,0
+ ld_t1,sp,8
la_br &err_bad_macro_header
bne_t0,t1
# return result
- ld_a0,sp,40
- leave
- ret
+ ld_a0,sp,24
+ eret
## ============================================================================
## --- Hex emit for !@%$ ------------------------------------------------------
@@ -4820,8 +4785,7 @@ DEFINE EXPR_INVALID 1200000000000000
la_br &emit_token
call
- leave
- ret
+ eret
## ============================================================================
## --- Builtin dispatcher ( ! @ % $ %select ) ---------------------------------
@@ -5030,8 +4994,7 @@ DEFINE EXPR_INVALID 1200000000000000
la_br &emit_hex_value
call
- leave
- ret
+ eret
:ebc_select
# require arg_count == 3
@@ -5142,8 +5105,7 @@ DEFINE EXPR_INVALID 1200000000000000
call
:ebc_select_done
- leave
- ret
+ eret
## %str(IDENT): stringify a single WORD argument into a TOK_STRING literal.
## Validation: arg_count == 1, arg span length == 1 token, and that token's
@@ -5261,8 +5223,7 @@ DEFINE EXPR_INVALID 1200000000000000
la_br &emit_token
call
- leave
- ret
+ eret
## --- Error paths -------------------------------------------------------------
## Each err_* loads a (msg, len) pair for fatal; fatal writes "m1pp: <msg>\n"
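One subtlety in the m1pp hunks above: the frameless `_start` offsets change too (`ld_a0,sp,0` becomes `ld_a0,sp,neg16`). On a backend that applies a uniform +16 bias to every sp-relative access, as the amd64 macros in this commit do, a function with no frame must use negative portable offsets to reach the native kernel entry words. A sketch, assuming that fixed-bias translation:

```python
# Why the frameless `_start` offsets change. Assumption: the backend adds
# a fixed 16-byte bias to every sp-relative access (the amd64 p1_mem_*_sp
# macros in this commit); other backends may translate differently.

BIAS = 16  # backend-private frame-header size on this backend

def native_off(portable_off):
    return portable_off + BIAS

# Kernel entry stack: argc at native +0, argv[i] at native +8*(i+1).
print(native_off(-16))  # argc    -> native +0
print(native_off(0))    # argv[1] -> native +16
print(native_off(8))    # argv[2] -> native +24
```

That accounts for the `sp,0 -> sp,neg16`, `sp,16 -> sp,0`, and `sp,24 -> sp,8` edits in the `_start` block.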
diff --git a/p1/P1-aarch64.M1pp b/p1/P1-aarch64.M1pp
@@ -165,7 +165,7 @@
%select((= %aa64_is_sp(dst) 1),
%aa64_add_imm(sp, src, 0),
%select((= %aa64_is_sp(src) 1),
- %aa64_add_imm(dst, sp, 0),
+ %aa64_add_imm(dst, sp, 16),
%((| 0xAA000000 (<< %aa64_reg(src) 16) (<< 31 5) %aa64_reg(dst)))))
%endm
@@ -408,7 +408,9 @@
%endm
%macro p1_mem(op, rt, rn, off)
-%aa64_mem(op, rt, rn, off)
+%select((= %aa64_is_sp(rn) 1),
+ %aa64_mem(op, rt, rn, (+ off 16)),
+ %aa64_mem(op, rt, rn, off))
%endm
%macro p1_ldarg(rd, slot)
@@ -436,19 +438,24 @@
%aa64_ret()
%endm
-%macro p1_leave()
+%macro p1_eret()
%aa64_mem(LD, lr, sp, 0)
%aa64_mem(LD, x8, sp, 8)
%aa64_mov_rr(sp, x8)
+%aa64_ret()
%endm
%macro p1_tail()
-%p1_leave()
+%aa64_mem(LD, lr, sp, 0)
+%aa64_mem(LD, x8, sp, 8)
+%aa64_mov_rr(sp, x8)
%aa64_br(br)
%endm
%macro p1_tailr(rs)
-%p1_leave()
+%aa64_mem(LD, lr, sp, 0)
+%aa64_mem(LD, x8, sp, 8)
+%aa64_mov_rr(sp, x8)
%aa64_br(rs)
%endm
diff --git a/p1/P1-amd64.M1pp b/p1/P1-amd64.M1pp
@@ -562,7 +562,50 @@
%endm
%macro p1_mov(rd, rs)
-%amd_mov_rr(rd, rs)
+%p1_mov_##rs(rd)
+%endm
+
+# All non-sp sources: plain register copy.
+%macro p1_mov_a0(rd)
+%amd_mov_rr(rd, a0)
+%endm
+%macro p1_mov_a1(rd)
+%amd_mov_rr(rd, a1)
+%endm
+%macro p1_mov_a2(rd)
+%amd_mov_rr(rd, a2)
+%endm
+%macro p1_mov_a3(rd)
+%amd_mov_rr(rd, a3)
+%endm
+%macro p1_mov_t0(rd)
+%amd_mov_rr(rd, t0)
+%endm
+%macro p1_mov_t1(rd)
+%amd_mov_rr(rd, t1)
+%endm
+%macro p1_mov_t2(rd)
+%amd_mov_rr(rd, t2)
+%endm
+%macro p1_mov_s0(rd)
+%amd_mov_rr(rd, s0)
+%endm
+%macro p1_mov_s1(rd)
+%amd_mov_rr(rd, s1)
+%endm
+%macro p1_mov_s2(rd)
+%amd_mov_rr(rd, s2)
+%endm
+%macro p1_mov_s3(rd)
+%amd_mov_rr(rd, s3)
+%endm
+
+# sp-source: portable sp is the frame-local base, which is native rsp + 16
+# (the 16-byte backend-private frame header sits at [rsp+0..rsp+15]).
+# Emit `mov rd, rsp ; add rd, 16`.
+%macro p1_mov_sp(rd)
+%amd_mov_rr(rd, sp)
+%amd_alu_ri8(0, rd, 16)
%endm
%macro p1_rrr(op, rd, ra, rb)
@@ -618,18 +661,176 @@
%p1_shifti_##op(rd, ra, imm)
%endm
+# p1_mem dispatches on (op, base). When the base is sp, portable sp is the
+# frame-local base — 16 bytes above native rsp — so the physical access needs
+# the supplied portable offset plus 16. For any other base, the portable and
+# native offset coincide. Internal backend callers that need raw native-rsp
+# access (p1_enter, p1_eret, _start stub, p1_ldarg, p1_syscall) use
+# amd_mem_LD/amd_mem_ST directly and bypass this translation.
+
+%macro p1_mem_LD_sp(rt, off)
+%amd_mem_LD(rt, sp, (+ off 16))
+%endm
+%macro p1_mem_ST_sp(rt, off)
+%amd_mem_ST(rt, sp, (+ off 16))
+%endm
+%macro p1_mem_LB_sp(rt, off)
+%amd_mem_LB(rt, sp, (+ off 16))
+%endm
+%macro p1_mem_SB_sp(rt, off)
+%amd_mem_SB(rt, sp, (+ off 16))
+%endm
+
%macro p1_mem_LD(rt, rn, off)
-%amd_mem_LD(rt, rn, off)
+%p1_mem_LD_##rn(rt, off)
%endm
%macro p1_mem_ST(rt, rn, off)
-%amd_mem_ST(rt, rn, off)
+%p1_mem_ST_##rn(rt, off)
%endm
%macro p1_mem_LB(rt, rn, off)
-%amd_mem_LB(rt, rn, off)
+%p1_mem_LB_##rn(rt, off)
%endm
%macro p1_mem_SB(rt, rn, off)
-%amd_mem_SB(rt, rn, off)
+%p1_mem_SB_##rn(rt, off)
+%endm
+
+# Non-sp bases for each op -- plain native load/store with portable offset.
+%macro p1_mem_LD_a0(rt, off)
+%amd_mem_LD(rt, a0, off)
+%endm
+%macro p1_mem_LD_a1(rt, off)
+%amd_mem_LD(rt, a1, off)
+%endm
+%macro p1_mem_LD_a2(rt, off)
+%amd_mem_LD(rt, a2, off)
+%endm
+%macro p1_mem_LD_a3(rt, off)
+%amd_mem_LD(rt, a3, off)
+%endm
+%macro p1_mem_LD_t0(rt, off)
+%amd_mem_LD(rt, t0, off)
+%endm
+%macro p1_mem_LD_t1(rt, off)
+%amd_mem_LD(rt, t1, off)
+%endm
+%macro p1_mem_LD_t2(rt, off)
+%amd_mem_LD(rt, t2, off)
+%endm
+%macro p1_mem_LD_s0(rt, off)
+%amd_mem_LD(rt, s0, off)
+%endm
+%macro p1_mem_LD_s1(rt, off)
+%amd_mem_LD(rt, s1, off)
+%endm
+%macro p1_mem_LD_s2(rt, off)
+%amd_mem_LD(rt, s2, off)
+%endm
+%macro p1_mem_LD_s3(rt, off)
+%amd_mem_LD(rt, s3, off)
+%endm
+
+%macro p1_mem_ST_a0(rt, off)
+%amd_mem_ST(rt, a0, off)
+%endm
+%macro p1_mem_ST_a1(rt, off)
+%amd_mem_ST(rt, a1, off)
+%endm
+%macro p1_mem_ST_a2(rt, off)
+%amd_mem_ST(rt, a2, off)
+%endm
+%macro p1_mem_ST_a3(rt, off)
+%amd_mem_ST(rt, a3, off)
+%endm
+%macro p1_mem_ST_t0(rt, off)
+%amd_mem_ST(rt, t0, off)
+%endm
+%macro p1_mem_ST_t1(rt, off)
+%amd_mem_ST(rt, t1, off)
%endm
+%macro p1_mem_ST_t2(rt, off)
+%amd_mem_ST(rt, t2, off)
+%endm
+%macro p1_mem_ST_s0(rt, off)
+%amd_mem_ST(rt, s0, off)
+%endm
+%macro p1_mem_ST_s1(rt, off)
+%amd_mem_ST(rt, s1, off)
+%endm
+%macro p1_mem_ST_s2(rt, off)
+%amd_mem_ST(rt, s2, off)
+%endm
+%macro p1_mem_ST_s3(rt, off)
+%amd_mem_ST(rt, s3, off)
+%endm
+
+%macro p1_mem_LB_a0(rt, off)
+%amd_mem_LB(rt, a0, off)
+%endm
+%macro p1_mem_LB_a1(rt, off)
+%amd_mem_LB(rt, a1, off)
+%endm
+%macro p1_mem_LB_a2(rt, off)
+%amd_mem_LB(rt, a2, off)
+%endm
+%macro p1_mem_LB_a3(rt, off)
+%amd_mem_LB(rt, a3, off)
+%endm
+%macro p1_mem_LB_t0(rt, off)
+%amd_mem_LB(rt, t0, off)
+%endm
+%macro p1_mem_LB_t1(rt, off)
+%amd_mem_LB(rt, t1, off)
+%endm
+%macro p1_mem_LB_t2(rt, off)
+%amd_mem_LB(rt, t2, off)
+%endm
+%macro p1_mem_LB_s0(rt, off)
+%amd_mem_LB(rt, s0, off)
+%endm
+%macro p1_mem_LB_s1(rt, off)
+%amd_mem_LB(rt, s1, off)
+%endm
+%macro p1_mem_LB_s2(rt, off)
+%amd_mem_LB(rt, s2, off)
+%endm
+%macro p1_mem_LB_s3(rt, off)
+%amd_mem_LB(rt, s3, off)
+%endm
+
+%macro p1_mem_SB_a0(rt, off)
+%amd_mem_SB(rt, a0, off)
+%endm
+%macro p1_mem_SB_a1(rt, off)
+%amd_mem_SB(rt, a1, off)
+%endm
+%macro p1_mem_SB_a2(rt, off)
+%amd_mem_SB(rt, a2, off)
+%endm
+%macro p1_mem_SB_a3(rt, off)
+%amd_mem_SB(rt, a3, off)
+%endm
+%macro p1_mem_SB_t0(rt, off)
+%amd_mem_SB(rt, t0, off)
+%endm
+%macro p1_mem_SB_t1(rt, off)
+%amd_mem_SB(rt, t1, off)
+%endm
+%macro p1_mem_SB_t2(rt, off)
+%amd_mem_SB(rt, t2, off)
+%endm
+%macro p1_mem_SB_s0(rt, off)
+%amd_mem_SB(rt, s0, off)
+%endm
+%macro p1_mem_SB_s1(rt, off)
+%amd_mem_SB(rt, s1, off)
+%endm
+%macro p1_mem_SB_s2(rt, off)
+%amd_mem_SB(rt, s2, off)
+%endm
+%macro p1_mem_SB_s3(rt, off)
+%amd_mem_SB(rt, s3, off)
+%endm
+
%macro p1_mem(op, rt, rn, off)
%p1_mem_##op(rt, rn, off)
%endm
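A minimal Python sketch of the translation rule the `p1_mem_*` dispatch above encodes (the helper name and constants are illustrative, not backend code): sp-relative portable offsets are shifted past the 2-word hidden header (saved retaddr plus saved caller sp) that sits at native sp, while every other base register passes its offset through unchanged.

```python
WORD = 8
HEADER_BYTES = 2 * WORD  # backend-private: retaddr word + caller-sp word

def native_offset(base: str, portable_off: int) -> int:
    # Portable sp is the frame-local base, HEADER_BYTES above native sp,
    # so sp-relative accesses must skip the hidden header.
    if base == 'sp':
        return portable_off + HEADER_BYTES
    # Any other base: portable and native offsets coincide.
    return portable_off
```

The per-register `p1_mem_*_a0` .. `p1_mem_*_s3` macros below are the macro-expansion equivalent of the `else` branch, selected by token-pasting the base name.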
@@ -659,25 +860,38 @@
%amd_ret()
%endm
-# LEAVE
-# r9 = [sp + 0] -- retaddr into scratch
-# rax = [sp + 8] -- saved caller sp into rax (an unused native reg)
-# sp = rax -- unwind to caller sp
-# push r9 -- reinstall retaddr so RET returns correctly
-%macro p1_leave()
+# ERET -- atomic frame epilogue + return from a framed function.
+# r9 = [rsp + 0] -- retaddr into scratch (native rsp; backend-private)
+# rax = [rsp + 8] -- saved caller sp into rax (an unused native reg)
+# rsp = rax -- unwind to caller sp
+# push r9 -- reinstall retaddr so the trailing ret returns
+# correctly
+# ret -- pop reinstated retaddr into rip
+%macro p1_eret()
%amd_mem_LD(scratch, sp, 0)
%amd_mem_LD(rax, sp, 8)
%amd_mov_rr(sp, rax)
%amd_push(scratch)
+%amd_ret()
%endm
+# TAIL / TAILR -- frame epilogue followed by an unconditional jump to the
+# target. The epilogue is the same sequence as the first four steps of
+# p1_eret (we omit the trailing ret because we jmp to a fresh target
+# instead).
%macro p1_tail()
-%p1_leave()
+%amd_mem_LD(scratch, sp, 0)
+%amd_mem_LD(rax, sp, 8)
+%amd_mov_rr(sp, rax)
+%amd_push(scratch)
%amd_jmp_r(br)
%endm
%macro p1_tailr(rs)
-%p1_leave()
+%amd_mem_LD(scratch, sp, 0)
+%amd_mem_LD(rax, sp, 8)
+%amd_mov_rr(sp, rax)
+%amd_push(scratch)
%amd_jmp_r(rs)
%endm
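A toy model (not backend code) of the frame discipline ERET tears down, viewed from the native side: memory is a dict of word-addressed slots, ENTER deposits the 2-word hidden header below the frame-locals, and ERET reloads the return address and unwinds to the caller's sp. The real amd64 sequence re-pushes the retaddr and executes a native `ret` instead of returning a value; this sketch only checks the stack bookkeeping.

```python
WORD = 8

def enter(mem, sp, retaddr, local_words):
    caller_sp = sp
    sp -= (2 + local_words) * WORD        # hidden header + frame-locals
    mem[sp + 0 * WORD] = retaddr          # hidden: saved retaddr
    mem[sp + 1 * WORD] = caller_sp        # hidden: saved caller sp
    return sp

def eret(mem, sp):
    retaddr = mem[sp + 0 * WORD]          # reload return address
    sp = mem[sp + 1 * WORD]               # unwind to caller sp
    return sp, retaddr                    # "ret" = jump to retaddr

mem = {}
sp0 = 0x8000
sp = enter(mem, sp0, retaddr=0x401000, local_words=4)
sp, ra = eret(mem, sp)
assert sp == sp0 and ra == 0x401000
```

TAIL/TAILR share the same teardown but replace the final return with a jump, exactly as the macros above spell out.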
diff --git a/p1/P1-riscv64.M1pp b/p1/P1-riscv64.M1pp
@@ -9,7 +9,7 @@
# save0 = t4 (x29) -- transient across SYSCALL only
# save1 = t3 (x28)
# save2 = a6 (x16)
-# saved_fp = fp (x8) -- used by ENTER/LEAVE to capture caller sp
+# saved_fp = fp (x8) -- used by ENTER/ERET to capture caller sp
# a7 = x17 -- Linux riscv64 syscall-number slot
# a4 = x14 -- syscall arg4 slot
# a5 = x15 -- syscall arg5 slot
@@ -331,7 +331,9 @@
%endm
%macro p1_mov(rd, rs)
-%rv_mov_rr(rd, rs)
+%select((= %rv_is_sp(rs) 1),
+ %rv_addi(rd, sp, 16),
+ %rv_mov_rr(rd, rs))
%endm
%macro p1_rrr(op, rd, ra, rb)
@@ -378,7 +380,9 @@
%rv_sb(rt, rn, off)
%endm
%macro p1_mem(op, rt, rn, off)
-%p1_mem_##op(rt, rn, off)
+%select((= %rv_is_sp(rn) 1),
+ %p1_mem_##op(rt, rn, (+ off 16)),
+ %p1_mem_##op(rt, rn, off))
%endm
%macro p1_ldarg(rd, slot)
@@ -406,19 +410,24 @@
%rv_jalr(zero, ra, 0)
%endm
-%macro p1_leave()
+%macro p1_eret()
%rv_ld(ra, sp, 0)
%rv_ld(fp, sp, 8)
%rv_mov_rr(sp, fp)
+%rv_jalr(zero, ra, 0)
%endm
%macro p1_tail()
-%p1_leave()
+%rv_ld(ra, sp, 0)
+%rv_ld(fp, sp, 8)
+%rv_mov_rr(sp, fp)
%rv_jalr(zero, br, 0)
%endm
%macro p1_tailr(rs)
-%p1_leave()
+%rv_ld(ra, sp, 0)
+%rv_ld(fp, sp, 8)
+%rv_mov_rr(sp, fp)
%rv_jalr(zero, rs, 0)
%endm
diff --git a/p1/P1.M1pp b/p1/P1.M1pp
@@ -4,7 +4,7 @@
# The backend must provide the target hooks used below:
# %p1_li, %p1_la, %p1_labr, %p1_mov, %p1_rrr, %p1_addi, %p1_logi,
# %p1_shifti, %p1_mem, %p1_ldarg, %p1_b, %p1_br, %p1_call, %p1_callr,
-# %p1_ret, %p1_leave, %p1_tail, %p1_tailr, %p1_condb, %p1_condbz,
+# %p1_ret, %p1_eret, %p1_tail, %p1_tailr, %p1_condb, %p1_condbz,
# %p1_enter, %p1_syscall, and %p1_sys_*.
# ---- Materialization ------------------------------------------------------
@@ -185,8 +185,8 @@
%p1_enter(size)
%endm
-%macro leave()
-%p1_leave()
+%macro eret()
+%p1_eret()
%endm
# ---- System ---------------------------------------------------------------
diff --git a/p1/aarch64.py b/p1/aarch64.py
@@ -185,6 +185,18 @@ def aa_ret():
return le32(0xD65F03C0)
+def aa_epilogue():
+ # Frame teardown, shared by ERET, TAIL, TAILR. Loads lr and the
+ # saved caller sp from the hidden header at native_sp+0/+8, then
+ # unwinds sp. Does NOT transfer control; the caller appends an
+ # aa_ret / aa_br as appropriate.
+ return (
+ aa_mem('LD', 'lr', 'sp', 0)
+ + aa_mem('LD', 'x8', 'sp', 8)
+ + aa_mov_rr('sp', 'x8')
+ )
+
+
def aa_lit64_prefix(rd):
## 64-bit literal-pool prefix for LI: ldr xN, [pc,#8]; b PC+12.
## The 8 bytes that follow in source become the literal; b skips them.
@@ -219,6 +231,12 @@ def encode_labr(_arch, _row):
def encode_mov(_arch, row):
+ # Portable `sp` is the frame-local base, which is 16 bytes above
+ # native sp (the backend's 2-word hidden header sits at the low end
+ # of each frame allocation). So reading sp into a register yields
+ # native_sp + 16, not native_sp itself.
+ if row.rs == 'sp':
+ return aa_add_imm(row.rd, 'sp', 16, sub=False)
return aa_mov_rr(row.rd, row.rs)
@@ -263,7 +281,11 @@ def encode_shifti(_arch, row):
def encode_mem(_arch, row):
- return aa_mem(row.op, row.rt, row.rn, row.off)
+ # Portable sp points to the frame-local base; the 2-word hidden
+ # header sits at native_sp+0/+8 and is not portable-addressable.
+ # Shift sp-relative offsets past the header.
+ off = row.off + 16 if row.rn == 'sp' else row.off
+ return aa_mem(row.op, row.rt, row.rn, off)
def encode_ldarg(_arch, row):
@@ -276,8 +298,7 @@ def encode_branch_reg(_arch, row):
if row.kind == 'CALLR':
return aa_blr(row.rs)
if row.kind == 'TAILR':
- leave = encode_nullary(_arch, Nullary('LEAVE', 'LEAVE'))
- return leave + aa_br(row.rs)
+ return aa_epilogue() + aa_br(row.rs)
raise ValueError(f'unknown branch-reg kind: {row.kind}')
@@ -314,15 +335,10 @@ def encode_nullary(_arch, row):
return aa_blr('br')
if row.kind == 'RET':
return aa_ret()
- if row.kind == 'LEAVE':
- return (
- aa_mem('LD', 'lr', 'sp', 0)
- + aa_mem('LD', 'x8', 'sp', 8)
- + aa_mov_rr('sp', 'x8')
- )
+ if row.kind == 'ERET':
+ return aa_epilogue() + aa_ret()
if row.kind == 'TAIL':
- leave = encode_nullary(_arch, Nullary('LEAVE', 'LEAVE'))
- return leave + aa_br('br')
+ return aa_epilogue() + aa_br('br')
if row.kind == 'SYSCALL':
return ''.join([
aa_mov_rr('x8', 'a0'),
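The sharing that `aa_epilogue` buys the three framed exits can be sketched like so (placeholder byte strings stand in for real AArch64 encodings; this is illustrative, not the encoder): one common teardown, then a per-op control transfer.

```python
EPILOGUE = b'epi'   # stands in for aa_epilogue(): ld lr, ld x8, mov sp
RET      = b'ret'   # stands in for aa_ret()

def BR(reg):        # stands in for aa_br(reg)
    return b'br:' + reg.encode()

def encode(kind, rs='br'):
    # ERET, TAIL, and TAILR all begin with the shared epilogue and
    # differ only in how control leaves the function.
    if kind == 'ERET':
        return EPILOGUE + RET
    if kind in ('TAIL', 'TAILR'):
        return EPILOGUE + BR(rs if kind == 'TAILR' else 'br')
    raise ValueError(kind)
```

Factoring the teardown out this way is what lets the diff delete the old trick of re-encoding a synthetic `LEAVE` row inside `encode_branch_reg` and `encode_nullary`.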
diff --git a/p1/p1_gen.py b/p1/p1_gen.py
@@ -139,6 +139,7 @@ def rows(arch):
out.append(Banner('Calls And Returns'))
out.append(Nullary(name='CALL', kind='CALL'))
out.append(Nullary(name='RET', kind='RET'))
+ out.append(Nullary(name='ERET', kind='ERET'))
out.append(Nullary(name='TAIL', kind='TAIL'))
for rs in P1_GPRS:
out.append(BranchReg(name=f'CALLR_{rs.upper()}', kind='CALLR', rs=rs))
@@ -148,7 +149,6 @@ def rows(arch):
out.append(Banner('Frame Management'))
for size in ENTER_SIZES:
out.append(Enter(name=f'ENTER_{size}', size=size))
- out.append(Nullary(name='LEAVE', kind='LEAVE'))
out.append(Banner('System'))
out.append(Nullary(name='SYSCALL', kind='SYSCALL'))
diff --git a/post.md b/post.md
@@ -239,8 +239,8 @@ Ops:
- Branching: `B`, `BR`, `BEQ`, `BNE`, `BLT`, `BLTU`, `BEQZ`, `BNEZ`, `BLTZ`.
Signed and unsigned less-than; `>=`, `>`, `<=` are synthesized by swapping
operands or inverting branch sense.
-- Calls / returns: `CALL`, `CALLR`, `RET`, `TAIL`, `TAILR`.
-- Frame management: `ENTER`, `LEAVE`.
+- Calls / returns: `CALL`, `CALLR`, `RET`, `ERET`, `TAIL`, `TAILR`.
+- Frame management: `ENTER`.
- ABI arg access: `LDARG` — reads stack-passed incoming args without
hard-coding the frame layout.
- System: `SYSCALL`.
@@ -252,7 +252,8 @@ Calling convention:
- `a0` is the one-word return register. Two-word returns use `a0`/`a1`.
- `a0`-`a3` and `t0`-`t2` are caller-saved; `s0`-`s3` and `sp` are
callee-saved.
-- `ENTER` builds the standard frame; `LEAVE` tears it down.
+- `ENTER` builds the standard frame; `ERET` tears it down and returns
+ (`TAIL`/`TAILR` likewise combine teardown with a jump).
- Stack-passed outgoing args are staged in a dedicated frame-local area
before `CALL`, so the callee finds them at a known offset from `sp`.
- Wider-than-two-word returns use the usual hidden-pointer trick: caller
@@ -323,8 +324,7 @@ A function call, with a helper that doubles its argument:
%enter(0)
%la_br() &double
%call()
- %leave()
- %ret()
+ %eret()
:ELF_end
```
@@ -333,7 +333,7 @@ A function call, with a helper that doubles its argument:
op consumes it. `double` is a leaf and needs no frame. `p1_main` is not
— it calls `double`, so it opens a standard frame with `%enter(0)` to
preserve the hidden return-address state across the call, and closes it
-with `%leave()` before returning. Run with `./double a b c` and the exit
+with `%eret()`, which tears down the frame and returns in one step. Run
+with `./double a b c` and the exit
status is `8` (argc=4, doubled).
## What it cost
diff --git a/tests/p1/double.P1 b/tests/p1/double.P1
@@ -2,7 +2,7 @@
#
# `:double` is a leaf function that shifts its one-word argument left by
# one and returns. `:p1_main` is not a leaf (it calls `double`), so it
-# establishes a standard frame with %enter/%leave to preserve the hidden
+# establishes a standard frame with %enter/%eret to preserve the hidden
# return-address state across the call. argc arrives in a0, is handed to
# double unchanged, and the doubled result comes back in a0.
@@ -14,7 +14,6 @@
%enter(0)
%la_br() &double
%call()
- %leave()
- %ret()
+ %eret()
:ELF_end
diff --git a/tests/p1/p1-aliasing.P1 b/tests/p1/p1-aliasing.P1
@@ -60,8 +60,7 @@
%syscall()
%li(a0) $(0)
- %leave()
- %ret()
+ %eret()
# Two-byte output scratch: [0] = computed byte, [1] = newline. The space
# placeholder gets overwritten by SB before the write syscall.
diff --git a/tests/p1/p1-call.P1 b/tests/p1/p1-call.P1
@@ -1,4 +1,4 @@
-# tests/p1/p1-call.P1 -- exercise ENTER, LEAVE, CALL, RET, MOV, ADDI
+# tests/p1/p1-call.P1 -- exercise ENTER, ERET, CALL, RET, MOV, ADDI
# across a nontrivial P1 program. Calls a `write_msg` subroutine twice
# and returns argc + 1 as the exit status so we also verify the argv-
# aware _start stub (argc is always >= 1).
@@ -22,8 +22,7 @@
# exit status = argc + 1 (so it's always >= 2).
%addi(a0, s0, 1)
- %leave()
- %ret()
+ %eret()
# write_msg(buf=a0, len=a1) -> void
:write_msg
@@ -34,8 +33,7 @@
%li(a0) %sys_write()
%li(a1) $(1)
%syscall()
- %leave()
- %ret()
+ %eret()
:msg_a
"A