pp test corpus
Layered, incremental tests for the C preprocessor (translation phase 4)
derived from C11 §6.10 in doc/std/CHAPTER-6.txt and the translation-phase
ordering in doc/std/CHAPTER-5.txt §5.1.1.2. C23 #embed is also covered.
Two harnesses:
- cases/: golden-output tests; the output of kit cc -E -I . <src> -o <actual>
  is diffed against <src>.expected. Runner: run.sh.
- cases_err/: must-fail tests; kit cc -E ... must exit nonzero. These are
  constraint violations from C11 §6.10. Runner: run_errors.sh.
Number prefixes group tests by spec section; lexical ordering matches the
order in which an implementation would plausibly grow features. Files are
minimal on purpose: each isolates one rule.
Determinism
Both runners pin the values of the otherwise-environmental predefined macros:
- SOURCE_DATE_EPOCH=0 is exported, fixing __DATE__ to "Jan  1 1970"
  (note the two spaces; C11 §6.10.8.1 requires the dd field to be space-padded
  when less than 10) and __TIME__ to "00:00:00". kit must honour
  SOURCE_DATE_EPOCH per the reproducible-builds convention.
- The runner cds into the cases directory and passes the source as a
  basename, so __FILE__ is the file name without a checkout-path prefix.
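The pinned values can be reproduced with the standard library. The sketch below is not kit's code: pp_build_time and pp_format_date_time are hypothetical names, and formatting in UTC is an assumption that matches the "00:00:00" value above.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: the timestamp a preprocessor would report.
 * Falls back to the wall clock when SOURCE_DATE_EPOCH is unset or empty. */
time_t pp_build_time(void) {
    const char *epoch = getenv("SOURCE_DATE_EPOCH");
    if (epoch != NULL && *epoch != '\0') {
        return (time_t)strtoll(epoch, NULL, 10);
    }
    return time(NULL);
}

/* Hypothetical helper: renders the __DATE__ and __TIME__ spellings
 * (without the surrounding quotes) for a timestamp, assuming UTC.
 * Returns 0 on success, -1 if the timestamp does not fit the fields. */
int pp_format_date_time(time_t when, char date[12], char clock_text[9]) {
    struct tm *utc = gmtime(&when);
    if (utc == NULL) {
        return -1;
    }
    /* %e is the day of month padded with a space, matching the dd rule. */
    if (strftime(date, 12, "%b %e %Y", utc) != 11) {
        return -1;
    }
    if (strftime(clock_text, 9, "%H:%M:%S", utc) != 8) {
        return -1;
    }
    return 0;
}
```

Calling pp_format_date_time with a timestamp of 0 fills the two buffers with the strings that c4_date_value and c5_time_value expect.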
cases/ — success tests
00–09 Translation phases / passthrough (§5.1.1.2)
| case | spec | tests |
|---|---|---|
| 00_text_passthrough | 5.1.1.2 | text-line emitted unchanged |
| 01_multiline_text | 5.1.1.2 | multiple text-lines preserve newlines |
| 02_comment_to_space | phase 3 | each comment replaced by one space |
| 03_line_splice | phase 2 | \<newline> deleted, lines spliced |
| 04_line_splice_in_directive | phase 2 | splice happens before directive recognition |
| 05_directive_indented | 6.10 ¶2 | # may be preceded by whitespace at the start of a line |
10–19 Null directive (§6.10.7)
| case | spec | tests |
|---|---|---|
| 10_null_directive | 6.10.7 | # alone has no effect |
20–29 Object-like macros and #undef (§6.10.3, §6.10.3.5)
| case | spec | tests |
|---|---|---|
| 20_define_object_basic | 6.10.3 ¶9 | trivial replacement |
| 21_define_object_multitoken | 6.10.3 ¶9 | replacement-list is a token sequence |
| 22_define_object_empty | 6.10.3 ¶9 | empty replacement-list is allowed |
| 23_define_object_self_ref | 6.10.3.4 ¶2 | name-being-replaced is not re-replaced |
| 24_define_object_indirect | 6.10.3.4 ¶2 | mutual recursion stops via blue-painting |
| 25_define_object_chain | 6.10.3.4 ¶1 | rescan reaches second-level macro |
| 26_undef_basic | 6.10.3.5 ¶2 | #undef removes the binding |
| 27_undef_unknown | 6.10.3.5 ¶2 | #undef of unknown identifier is ignored |
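As an illustration of the rules in this group, the sketch below checks them with an ordinary compiler at run time instead of diffing kit cc -E output. All macro and function names are made up for the example; XSTR expands its argument before stringizing so the final token sequence can be inspected as text.

```c
#define STR(x) #x
#define XSTR(x) STR(x)   /* expands its argument, then stringizes it */

#define LIMIT 10         /* basic replacement */
#define BOUND LIMIT      /* chain: BOUND -> LIMIT -> 10 */
#define self self + 1    /* self-reference: the inner self is left alone */
#define ping pong        /* mutual recursion: ping -> pong -> ping, then stop */
#define pong ping

#define TEMP 1
#undef TEMP              /* removes the binding */
#undef NEVER_DEFINED     /* no such macro: the directive is ignored */

const char *chain_text(void)  { return XSTR(BOUND); }
const char *self_text(void)   { return XSTR(self); }
const char *mutual_text(void) { return XSTR(ping); }
const char *undef_text(void)  { return XSTR(TEMP); }
```

Here chain_text yields "10", self_text yields "self + 1", mutual_text yields "ping", and undef_text yields "TEMP" because TEMP is no longer a macro when it is used.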
30–39 Function-like macros and argument substitution (§6.10.3, §6.10.3.1)
| case | spec | tests |
|---|---|---|
| 30_define_func_basic | 6.10.3 ¶10 | one-arg invocation |
| 31_define_func_no_params | 6.10.3 ¶10 | empty parameter list |
| 32_define_func_no_invoke | 6.10.3 ¶10 | bare name without ( does not expand |
| 33_define_func_two_args | 6.10.3 ¶11 | comma separates arguments |
| 34_define_func_arg_inner_paren | 6.10.3 ¶11 | comma inside matched () does not separate |
| 35_define_func_empty_arg | 6.10.3 ¶4 | empty argument allowed (still arity 1) |
| 36_define_func_arg_full_expand | 6.10.3.1 ¶1 | argument fully expanded before substitution |
| 37_define_func_arg_no_preexpand_hash | 6.10.3.1 ¶1 | #-stringized argument is not pre-expanded |
| 38_define_func_newline_in_invoke | 6.10.3 ¶10 | newline is whitespace inside an invocation |
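A small sketch of several rules from this group, again checked at run time by an ordinary compiler and not taken from the case files. The macro and function names are illustrative only.

```c
#define STR(x) #x
#define XSTR(x) STR(x)      /* argument is expanded before STR sees it */

#define VALUE 42
#define ID(x) x
#define FIRST(a, b) a
#define SEVEN() 7

/* SEVEN without a following ( is an ordinary identifier. */
int bare_name_plus_call(void) {
    int SEVEN = 5;
    return SEVEN + SEVEN();  /* variable 5 plus macro result 7 */
}

/* A newline between the parentheses is just whitespace. */
int newline_in_invocation(void) {
    return ID(
        VALUE
    );
}

const char *raw_text(void)         { return STR(VALUE); }   /* operand of # is not pre-expanded */
const char *expanded_text(void)    { return XSTR(VALUE); }  /* argument fully expanded first */
const char *inner_paren_text(void) { return XSTR(FIRST((1, 2), 3)); }
const char *empty_arg_text(void)   { return XSTR(ID()); }   /* one empty argument */
```

raw_text gives "VALUE" while expanded_text gives "42", which is the difference between cases 36 and 37. inner_paren_text gives "(1, 2)" because the comma inside the inner parentheses does not split the argument.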
40–49 # operator (stringize) (§6.10.3.2)
| case | spec | tests |
|---|---|---|
| 40_stringize_basic | 6.10.3.2 ¶2 | #x produces a string literal |
| 41_stringize_multi_token | 6.10.3.2 ¶2 | inter-token whitespace becomes single space |
| 42_stringize_whitespace_collapse | 6.10.3.2 ¶2 | leading/trailing stripped, internal sequences collapse |
| 43_stringize_special_chars | 6.10.3.2 ¶2 | \ inserted before " and \ in string-literal arguments |
| 44_stringize_empty | 6.10.3.2 ¶2 | empty argument stringizes to "" |
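The stringize rules are easy to observe directly. The following sketch uses a hypothetical STR macro and helper functions; it is an illustration of §6.10.3.2 ¶2, not a copy of the case sources.

```c
#define STR(x) #x

/* Leading and trailing whitespace is dropped; inner runs become one space. */
const char *collapsed_text(void) { return STR(   a   +   b   ); }

/* Inside a string-literal argument, " and \ each gain a leading backslash,
 * so the result spells the original literal: quote, x, backslash, n, quote. */
const char *escaped_text(void) { return STR("x\n"); }

/* An empty argument produces the empty string literal. */
const char *empty_text(void) { return STR(); }
```

collapsed_text returns "a + b", escaped_text returns the five characters "x\n" including both quotes, and empty_text returns "".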
50–59 ## operator (token paste) (§6.10.3.3)
| case | spec | tests |
|---|---|---|
| 50_paste_basic | 6.10.3.3 ¶3 | concatenation of two identifier tokens |
| 51_paste_to_number | 6.10.3.3 ¶3 | concatenation produces a pp-number |
| 52_paste_with_space_around | 6.10.3.5 ¶6 | space around ## in definition is optional |
| 53_paste_empty_left | 6.10.3.3 ¶2 | placemarker on left collapses to right token |
| 54_paste_empty_right | 6.10.3.3 ¶2 | placemarker on right collapses to left token |
| 55_paste_object_macro | 6.10.3.3 ¶3 | ## is allowed in object-like macros, not just funcs |
| 56_paste_chain | 6.10.3.5 ¶7 | x##y##z (spec example 5) |
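The pasting rules can be sketched the same way. CAT, CAT_TIGHT, T and ROW_COUNT are names invented for this example; T follows the shape of the spec's example 5.

```c
#define CAT(a, b) a ## b
#define CAT_TIGHT(a, b) a##b          /* same macro, no spaces around ## */
#define T(x, y, z) x ## y ## z
#define ROW_COUNT row ## _count       /* ## inside an object-like macro */

int paste_identifiers(void) {
    int left_right = 11;
    return CAT(left_, right);         /* names the variable left_right */
}

int paste_number(void)      { return CAT(12, 34); }   /* pp-number 1234 */
int paste_spacing(void)     { return CAT(4, 2) == CAT_TIGHT(4, 2); }
int paste_empty_left(void)  { return CAT(, 7); }      /* placemarker ## 7 */
int paste_empty_right(void) { return CAT(7, ); }      /* 7 ## placemarker */
int paste_chain(void)       { return T(1, 2, 3) + T(6, , 7); }  /* 123 + 67 */

int paste_object(void) {
    int row_count = 3;
    return ROW_COUNT;                 /* becomes row_count */
}
```

Each function returns the value implied by its comment, so paste_chain is 190 and both empty-argument forms are 7.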
60–69 Rescanning and self-reference (§6.10.3.4)
| case | spec | tests |
|---|---|---|
| 60_rescan_chain | 6.10.3.4 ¶1 | replacement is rescanned for further macros |
| 61_rescan_object_invokes_func | 6.10.3.4 ¶1 | rescan picks up a function-like invocation |
| 62_rescan_paste_to_macro | 6.10.3.3 ¶3 | post-paste token is available for further replacement |
| 63_rescan_not_directive | 6.10.3.4 ¶3 | tokens produced by expansion are not a directive |
| 64_rescan_self_in_func | 6.10.3.4 ¶2 | function-like macro does not re-invoke itself in rescan |
| 65_rescan_arg_hideset_prescan | 6.10.3.4 ¶2 | argument prescan keeps no-reexpand state from caller |
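Three of these rescanning rules can be shown in a few lines. This is a sketch with invented names, checked by compiling and running rather than by diffing preprocessor output.

```c
#define ADD_ONE(x) ((x) + 1)
#define ALIAS ADD_ONE      /* object-like macro that names a function-like one */
#define CAT(a, b) a ## b
#define AB 99

/* A real function, defined before the macro of the same name. */
int square(int x) { return x * x; }
#define square(x) (1 + square(x))   /* inner square is not re-expanded */

/* ALIAS is replaced by ADD_ONE; the rescan then sees ADD_ONE followed by (2). */
int rescan_invokes_func(void) { return ALIAS(2); }

/* A ## B forms AB, which is itself a macro and is replaced on rescan. */
int rescan_after_paste(void) { return CAT(A, B); }

/* The macro expands once; the inner square(3) calls the function. */
int self_in_func(void) { return square(3); }
```

The results are 3, 99 and 10 respectively. Without the self-reference rule the last macro would expand forever.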
70–79 Variadic macros (§6.10.3 ¶12, §6.10.3.1 ¶2)
| case | spec | tests |
|---|---|---|
| 70_variadic_basic | 6.10.3 ¶12 | ... collects trailing args including commas |
| 71_variadic_named_param | 6.10.3 ¶12 | named param + variadic tail |
| 72_variadic_stringize | 6.10.3.1 ¶2 | #__VA_ARGS__ stringizes the merged sequence |
| 73_variadic_paste | 6.10.3.3 ¶2 | paste with __VA_ARGS__ handles separator commas |
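A short sketch of the variadic rules, using made-up macro names. It shows how the trailing arguments and their commas travel together as __VA_ARGS__.

```c
#define ARGS_TEXT(...) #__VA_ARGS__
#define HEAD(first, ...) first
#define TAIL_TEXT(first, ...) #__VA_ARGS__
#define GLUE_FIRST(prefix, ...) prefix ## __VA_ARGS__

int add2(int a, int b) { return a + b; }

/* All arguments, commas included, are stringized as one sequence. */
const char *all_args_text(void) { return ARGS_TEXT(a, b, c); }

/* The named parameter takes the first argument only. */
int head_value(void) { return HEAD(1, 2, 3); }

/* The variadic tail excludes the named parameter. */
const char *tail_text(void) { return TAIL_TEXT(1, 2, 3); }

/* Only the first token of the tail is pasted: a ## b, c gives ab, c. */
int pasted_sum(void) {
    int ab = 1;
    int c = 2;
    return add2(GLUE_FIRST(a, b, c));
}
```

all_args_text is "a, b, c", tail_text is "2, 3", head_value is 1, and pasted_sum is 3 because the call becomes add2(ab, c).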
80–8f Conditional inclusion (§6.10.1)
| case | spec | tests |
|---|---|---|
| 80_if_arith_true | 6.10.1 ¶3 | arithmetic constant expression in #if |
| 81_if_arith_false | 6.10.1 ¶3 | false branch is skipped |
| 82_if_undefined_id_zero | 6.10.1 ¶4 | remaining identifiers replaced with 0 |
| 83_if_defined_word | 6.10.1 ¶1 | defined identifier form |
| 84_if_defined_paren | 6.10.1 ¶1 | defined ( identifier ) form |
| 85_if_defined_no_replace | 6.10.1 ¶4 | operand of defined is not macro-expanded |
| 86_if_macro_replaced | 6.10.1 ¶4 | macros expanded before evaluation |
| 87_ifdef | 6.10.1 ¶5 | #ifdef shorthand |
| 88_ifndef | 6.10.1 ¶5 | #ifndef shorthand |
| 89_elif_chain | 6.10.1 ¶6 | only first true branch processed |
| 8a_else_only | 6.10.1 ¶6 | #else taken when no preceding branch is true |
| 8b_nested_if | 6.10.1 ¶6 | nested conditionals |
| 8c_skipped_relaxed_syntax | 6.10 ¶4 | unknown directives in skipped groups don't error |
| 8d_skipped_keeps_nesting | 6.10.1 ¶6 | #if/#endif still tracked inside skipped groups |
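The conditional-inclusion rules can be sketched in one translation unit. The macro and function names below are invented for the example; each conditional records its outcome in a macro that a function then returns.

```c
#define ENABLED 1
#define LEVEL 3

#if NEVER_DEFINED_NAME == 0        /* leftover identifier is replaced by 0 */
#define UNDEFINED_IS_ZERO 1
#else
#define UNDEFINED_IS_ZERO 0
#endif

#if defined ENABLED && defined(LEVEL) && !defined(MISSING)
#define DEFINED_FORMS_OK 1         /* both spellings of defined work */
#else
#define DEFINED_FORMS_OK 0
#endif

#if LEVEL == 1
#define BRANCH 1
#elif LEVEL == 3
#define BRANCH 3                   /* first true branch wins */
#elif LEVEL >= 3
#define BRANCH 99                  /* also true, but never processed */
#else
#define BRANCH 0
#endif

#if 0
#bogus_directive ignored because this group is skipped
#if 1
#endif                             /* closes the nested #if only */
#define SKIPPED_LEAKED 1
#endif

int undefined_is_zero(void) { return UNDEFINED_IS_ZERO; }
int defined_forms_ok(void)  { return DEFINED_FORMS_OK; }
int chosen_branch(void)     { return BRANCH; }

int skipped_group_leaked(void) {
#ifdef SKIPPED_LEAKED
    return 1;
#else
    return 0;
#endif
}
```

skipped_group_leaked returns 0 only if nesting is tracked inside the skipped group; a preprocessor that treated the inner #endif as closing the outer #if 0 would define SKIPPED_LEAKED.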
90–99 Source file inclusion (§6.10.2)
| case | spec | tests |
|---|---|---|
| 90_include_local | 6.10.2 ¶3 | quoted-form include of a sibling header |
| 91_include_macros_propagate | 6.10.2 ¶3 | macros defined in header visible to includer |
| 92_include_macro_replaced_path | 6.10.2 ¶4 | header path is macro-replaced before file lookup |
| 93_include_nested | 6.10.2 ¶6 | a header may itself #include |
| 94_include_system_form | 6.10.2 ¶2 | <...> form resolves through -I search paths |
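The macro-replaced form (§6.10.2 ¶4) is the only one that can be shown without a sibling header on disk. The sketch below uses a standard header as a stand-in; the real cases include fixture headers from the cases directory, and the names here are illustrative.

```c
/* The tokens after #include are macro-replaced first, then matched
 * against the <...> or "..." header-name forms. */
#define STRING_HEADER <string.h>
#include STRING_HEADER

/* Compiles only if the header above was actually included. */
unsigned long header_was_included(void) {
    return (unsigned long)strlen("pp");
}
```

A quoted variant works the same way, with the macro expanding to a string-literal path instead of an angle-bracket one.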
a0–af Line control (§6.10.4)
| case | spec | tests |
|---|---|---|
| a0_line_number | 6.10.4 ¶3 | #line N resets __LINE__ |
| a1_line_with_file | 6.10.4 ¶4 | #line N "name" resets __FILE__ |
| a2_line_macro_replaced | 6.10.4 ¶5 | tokens after line are macro-replaced |
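All three forms of #line fit in a short sketch. The line that follows a #line N directive is numbered N; the file name "renamed.c" and the macro RESUME_AT are made up for the example.

```c
#define RESUME_AT 300

#line 100
int line_after_number(void) { return __LINE__; }          /* this line is 100 */

#line 200 "renamed.c"
const char *file_after_rename(void) { return __FILE__; }  /* presumed name */
int line_after_rename(void) { return __LINE__; }          /* this line is 201 */

#line RESUME_AT
int line_after_macro(void) { return __LINE__; }           /* this line is 300 */
```

The directive also changes the file and line shown in any later compiler diagnostics, so it is usually confined to generated code.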
b0–bf Pragma directive and _Pragma operator (§6.10.6, §6.10.9)
| case | spec | tests |
|---|---|---|
| b0_pragma_unknown_ignored | 6.10.6 ¶1 | unrecognized pragma is ignored |
| b1_pragma_operator | 6.10.9 ¶1 | _Pragma("...") tokens are removed; pragma applied |
| b2_pragma_stdc | 6.10.6 ¶2 | #pragma STDC FP_CONTRACT ON consumed without error |
c0–cf Predefined macros (§6.10.8)
| case | spec | tests |
|---|---|---|
| c0_line_predefined | 6.10.8.1 | bare __LINE__ |
| c1_stdc_macro | 6.10.8.1 | __STDC__ is 1 for a conforming impl |
| c2_stdc_hosted | 6.10.8.1 | __STDC_HOSTED__ is 0 (kit is freestanding) |
| c3_predefined_macros_defined | 6.10.8.1 | __STDC_VERSION__, __DATE__, __TIME__, __FILE__ exist |
| c4_date_value | 6.10.8.1 | __DATE__ is "Jan  1 1970" under SOURCE_DATE_EPOCH=0 |
| c5_time_value | 6.10.8.1 | __TIME__ is "00:00:00" under SOURCE_DATE_EPOCH=0 |
| c6_file_value | 6.10.8.1 | __FILE__ is the basename passed to kit |
d0–df #embed (C23, supplementary)
These exercise the C23 #embed directive (not in C11). Assumed output format:
each byte becomes a pp-number, and the pp-numbers are separated by a comma (,)
with no surrounding spaces.
| case | tests |
|---|---|
| d0_embed_basic | one-byte file → one pp-number |
| d1_embed_two_bytes | two-byte file → comma-separated pp-numbers |
| d2_embed_in_array | embed inside an array initializer |
| d3_embed_limit | limit(N) parameter trims the byte sequence |
| d4_embed_if_empty | if_empty(tokens) substitutes when the file has zero bytes |
Binary fixtures (regenerate with printf to control bytes exactly):
printf 'A' > test/pp/cases/d0_data.bin # 1 byte
printf 'Hi' > test/pp/cases/d1_data.bin # 2 bytes
: > test/pp/cases/d4_empty.bin # 0 bytes
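To make the presumed format concrete, the sketch below models the text an #embed directive would expand to under that assumption. It is not kit's implementation: embed_expected and its parameters are hypothetical, and applying if_empty after limit has trimmed the data is an assumption of this model.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of the presumed #embed output.
 * Writes the expected token text for `data` into `out`.
 * limit < 0 means no limit(N) parameter was given.
 * Returns 0 on success, -1 if `out` is too small. */
int embed_expected(const unsigned char *data, size_t size, long limit,
                   const char *if_empty, char *out, size_t cap) {
    size_t used = 0;
    if (limit >= 0 && (size_t)limit < size) {
        size = (size_t)limit;          /* limit(N) keeps the first N bytes */
    }
    if (size == 0) {                   /* if_empty(tokens) stands in for the data */
        int n = snprintf(out, cap, "%s", if_empty != NULL ? if_empty : "");
        return (n < 0 || (size_t)n >= cap) ? -1 : 0;
    }
    for (size_t i = 0; i < size; i++) {
        /* Decimal byte value, comma-separated, no surrounding spaces. */
        int n = snprintf(out + used, cap - used, "%s%u",
                         i != 0 ? "," : "", (unsigned)data[i]);
        if (n < 0 || (size_t)n >= cap - used) {
            return -1;
        }
        used += (size_t)n;
    }
    return 0;
}
```

With the fixtures above, the one-byte file 'A' maps to 65 and the two-byte file 'Hi' maps to 72,105; a limit of 1 on the latter leaves 72.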
cases_err/ — must-fail tests
C11 §6.10 constraint violations must produce a diagnostic; the runner
just checks for nonzero exit.
| case | spec | violation |
|---|---|---|
| 01_redef_object_diff_tokens | 6.10.3 ¶2 | object-like redefined with different replacement-list |
| 02_redef_object_diff_whitespace | 6.10.3 ¶2 | identical tokens but different whitespace separation |
| 03_redef_func_diff_param_name | 6.10.3 ¶2 | function-like redefined with different parameter spelling |
| 04_redef_func_diff_param_count | 6.10.3 ¶2 | function-like redefined with different parameter count |
| 05_paste_at_start | 6.10.3.3 ¶1 | ## at start of replacement list |
| 06_paste_at_end | 6.10.3.3 ¶1 | ## at end of replacement list |
| 07_hash_not_followed_by_param | 6.10.3.2 ¶1 | # not followed by a parameter |
| 08_va_args_in_non_variadic | 6.10.3 ¶5 | __VA_ARGS__ outside a variadic macro |
| 09_func_too_few_args | 6.10.3 ¶4 | invocation supplies fewer arguments than parameters |
| 0a_func_too_many_args | 6.10.3 ¶4 | invocation supplies more arguments than parameters (non-variadic) |
| 0b_unterminated_func_invoke | 6.10.3 ¶4 | no ) terminates a function-like macro invocation |
| 0c_define_predefined | 6.10.8 ¶2 | #define of a mandatory predefined macro name |
| 0d_undef_predefined | 6.10.8 ¶2 | #undef of a mandatory predefined macro name |
| 0e_define_defined | 6.10.8 ¶2 | defined cannot be the subject of #define |
| 0f_error_directive | 6.10.5 ¶1 | #error shall produce a diagnostic |
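The redefinition cases are easiest to read next to their legal counterparts. The sketch below, modelled on the spec's redefinition example, contains only the permitted forms so that it compiles; the forms that cases 01 to 04 reject are shown in comments. Function names are illustrative.

```c
#define OBJ_LIKE (1-1)
#define OBJ_LIKE /* white space */ (1-1) /* other */   /* same tokens: allowed */

#define FUNC_LIKE(a) ( a )
#define FUNC_LIKE( a )( /* white space */ a /* more */ )   /* allowed */

/* Rejected redefinitions (each would be a constraint violation):
 *   #define OBJ_LIKE (0)            different token sequence
 *   #define OBJ_LIKE (1 - 1)        different whitespace separation
 *   #define FUNC_LIKE(b) ( b )      different parameter spelling
 *   #define FUNC_LIKE(a, b) ( a )   different parameter count
 */

int obj_like_value(void)  { return OBJ_LIKE; }
int func_like_value(void) { return FUNC_LIKE(5); }
```

Comments count as whitespace here, which is why the second definition of each macro matches the first.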
Out of scope
These items in C11 §6.10 are deliberately not covered, with reasons:
- Exact value of __STDC_VERSION__ (§6.10.8.1). The value has the shape
  yyyymmL (201112L in C11, 201710L in C17, 202311L in C23), and which
  revision kit claims is an implementation choice; asserting a specific
  value would freeze that choice in the test suite. Existence is checked
  in c3_predefined_macros_defined.
- Conditional feature macros __STDC_NO_THREADS__, __STDC_NO_VLA__,
  __STDC_NO_ATOMICS__, __STDC_IEC_559__, etc. (§6.10.8.3). Spec
  defines these as conditionally defined per impl capabilities. There is
  no spec-mandated value to diff against.
- Trigraph processing (§5.1.1.2 phase 1). Trigraphs were removed in
  C23; kit targets C23-era source (see #embed support). Testing
  trigraph translation would assert behavior kit intentionally does
  not provide.
- #embed prefix() and suffix() parameters. The spec leaves the
  whitespace inserted between the parameter tokens and the byte
  sequence implementation-defined, so a byte-exact diff would freeze a
  particular formatting choice. limit() and if_empty() do not depend on
  that whitespace and are covered.
- #include system-path resolution outside -I. The runner only
  controls -I cases_dir; testing the implementation-defined fallback
  search (e.g. /usr/include) would couple the suite to the host.
  94_include_system_form covers <...> resolution through the
  controlled -I path.