Gram Import Plan
This is the import plan for folding /Users/ryan/code/ll1 into libkit and
exposing it through the kit driver. The goal is not to make ll1 a language
frontend. It should land as a reusable parser/lexer-generator subsystem: EBNF in,
parse and lexer tables out, with allocation-free push runtimes for generated
parsers and lexers.
The standalone project already has the right broad shape for libkit: explicit allocation, no hidden CLI dependency in the generator API, immutable generated tables, and caller-owned runtime state. The import work is mostly namespace, host-boundary, build-gating, and file-layout discipline.
Decision Summary
- Driver command: gram.
- Public generator API header: <kit/gram.h>.
- Public parser runtime header: <kit/gram_parse.h>.
- Public lexer runtime header: <kit/gram_lex.h>.
- Generated-code support headers: <kit/support/gram_parse_tables.h> and <kit/support/gram_lex_tables.h>.
- Library subsystem gate: KIT_GRAM_ENABLED.
- Driver tool gate: KIT_TOOL_GRAM_ENABLED.
- Implementation directory: src/gram/.
- Tests: test/gram/ plus a driver smoke lane.
ll1 remains a source repository/project name only. It should not appear in the
libkit API, installed command name, or user-facing help.
Boundaries
gram is a library subsystem, not a lang/ frontend. It does not register a
KitFrontendVTable, does not emit KitCg, and does not participate in
source-to-object compilation unless a future frontend chooses to use it
internally. Its public surface is consumed by embedders, the driver command, and
generated C files.
The driver remains the only hosted layer. File reads, directory creation, stdout,
stderr, and CLI allocation policy live in driver/cmd/gram.c and the hosted
driver environment. The library accepts input as KitSlice, writes output to
KitWriter, allocates through KitContext.heap, and reports diagnostics through
KitContext.diag.
Generated grammar tables are immutable static data and are fine as globals. Mutable parser, lexer, and generator state must hang off caller-owned runtime objects or explicit libkit handles.
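The split between immutable tables and caller-owned state can be illustrated with a minimal sketch. Every name here (DemoTable, DemoRunner, and the demo_ functions) is hypothetical and stands in for whatever the imported runtime actually defines; the toy automaton simply accepts when the last bit pushed was 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical immutable table: safe as a global because nothing writes to it. */
typedef struct DemoTable {
    const unsigned char *next; /* next[state * 2 + bit] */
    size_t state_count;
    unsigned accept_state;
} DemoTable;

static const unsigned char demo_next[] = {
    /* state 0 */ 0, 1,
    /* state 1 */ 0, 1,
};
static const DemoTable demo_table = { demo_next, 2, 1 };

/* Hypothetical caller-owned runtime object: all mutable state lives here. */
typedef struct DemoRunner {
    const DemoTable *table;
    unsigned state;
} DemoRunner;

static void demo_runner_init(DemoRunner *r, const DemoTable *t) {
    assert(r != NULL && t != NULL);
    r->table = t;
    r->state = 0;
}

static void demo_runner_push(DemoRunner *r, int bit) {
    r->state = r->table->next[r->state * 2u + (bit ? 1u : 0u)];
}

static int demo_runner_accepts(const DemoRunner *r) {
    return r->state == r->table->accept_state;
}
```

Two DemoRunner objects can share demo_table at the same time without interfering, which is the property the boundary rule is meant to preserve.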
File Moves
Import the C implementation, generated meta tables, runtime, and tests. Do not make the Python generator part of the normal libkit build.
| Current file | Kit destination | Notes |
|---|---|---|
| include/llgen.h | include/kit/gram.h | Public generator API, renamed to Kit types and functions. |
| include/llparse.h | include/kit/gram_parse.h | Public parser runtime API. |
| include/lllex.h | include/kit/gram_lex.h | Public lexer runtime API. |
| include/llparse_tables.h | include/kit/support/gram_parse_tables.h | Public generated-code support, not ordinary embedder API. |
| include/lllex_tables.h | include/kit/support/gram_lex_tables.h | Public generated-code support, not ordinary embedder API. |
| include/llunicode.h | src/gram/unicode.h | Private helper; expose later as <kit/unicode.h> only if there is a broader API need. |
| include/llunicode_props.h | src/gram/unicode_props.h | Private generated Unicode property resolver. |
| runtime/llparse.c | src/gram/parse_runtime.c | Implements <kit/gram_parse.h>. |
| runtime/lllex.c | src/gram/lex_runtime.c | Implements <kit/gram_lex.h>. |
| runtime/llunicode.c | src/gram/unicode.c | Private Unicode helpers for generator and lexer runtime. |
| runtime/llunicode_props.c | src/gram/unicode_props.c | Checked-in generated property tables. |
| gen/llgen.c | src/gram/generator.c | Public API implementation and C emitter after Kit context rewrite. |
| gen/llgen_ll1.c | src/gram/ll1.c | LL(1), Pratt, FIRST/FOLLOW, and validation. |
| gen/llgen_lex_byte.c | src/gram/lex_byte.c | Byte-mode lexer compiler. |
| gen/llgen_lex_unicode.c | src/gram/lex_unicode.c | Unicode-mode lexer compiler. |
| gen/llgen_internal.h | src/gram/internal.h | Private to src/gram/*.c. |
| gen/meta.ebnf | src/gram/meta.ebnf | Source grammar for the generator's own parser; not consumed by normal builds. |
| generated meta .c/.h | src/gram/meta_tables.c, src/gram/meta_tables.h | Check in generated tables so libkit does not need Python or a previous gram to build. |
| gen/llgen_cli.c | driver/cmd/gram.c | Rewrite as a Kit driver command using public <kit/gram.h>. |
| tools/gen_unicode_props.py | scripts/gen_gram_unicode_props.py | Regeneration helper only; not in the build. |
| data/ucd/17.0.0/* | data/ucd/17.0.0/* or test/gram/ucd/17.0.0/* | Keep if we want reproducible Unicode-table regeneration in-tree. |
| test/*.c, test/*.ebnf | test/gram/* | Library tests and fixtures. |
| test/errors/* | test/gram/errors/* | Error fixtures. |
gen/llgen.py should not be imported into libkit. If parity against the Python
reference is still useful during the transition, keep it temporarily as
scripts/gram_ref.py and exclude it from release/build dependencies. Delete it
once the imported C generator is trusted.
Public Renames
The standalone ll_* and llgen_* names are too short for libkit and would
violate the public symbol discipline. Public definitions for the imported
subsystem must use KitGram, kit_gram_, or KIT_GRAM_.
Generator API
| Standalone name | Kit name |
|---|---|
| llgen_options | KitGramOptions |
| llgen_compiled | KitGramCompiled |
| llgen_codegen | Remove or narrow; prefer KitWriter outputs. |
| llgen_compile_text | kit_gram_compile_text |
| llgen_compiled_free | kit_gram_free |
| llgen_dump_sexpr_text | kit_gram_dump_sexpr |
| llgen_generate_c | kit_gram_emit_c |
| llgen_parser_grammar | kit_gram_parser_grammar |
| llgen_lexer_grammar | kit_gram_lexer_grammar |
| llgen_token_count | kit_gram_token_count |
| llgen_token_name | kit_gram_token_name |
| llgen_token_display | kit_gram_token_display |
| llgen_find_token | kit_gram_find_token |
| llgen_rule_count | kit_gram_rule_count |
| llgen_rule_name | kit_gram_rule_name |
| llgen_find_rule | kit_gram_find_rule |
The generator API should take a const KitContext* and KitSlice inputs. It
should not expose llgen_allocator; the allocator becomes ctx->heap. C output
should stream to caller-provided KitWriters. Callers that want owned in-memory
text can use kit_writer_mem.
Proposed core shape:
    typedef struct KitGramCompiled KitGramCompiled;

    typedef struct KitGramOptions {
        KitSlice name; /* optional grammar name override */
    } KitGramOptions;

    typedef struct KitGramEmitOptions {
        KitSlice header_path;
        KitSlice source_path;
        KitSlice prefix;
    } KitGramEmitOptions;

    KIT_API KitStatus kit_gram_compile_text(const KitContext* ctx,
                                            KitSlice text, KitSlice path,
                                            const KitGramOptions* opts,
                                            KitGramCompiled** out);
    KIT_API void kit_gram_free(KitGramCompiled*);
    KIT_API KitStatus kit_gram_emit_c(const KitGramCompiled*,
                                      const KitGramEmitOptions* opts,
                                      KitWriter* header, KitWriter* source);
    KIT_API KitStatus kit_gram_dump_sexpr(const KitContext* ctx, KitSlice text,
                                          KitSlice path,
                                          const KitGramOptions* opts,
                                          KitWriter* out);
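To make the "stream to caller-provided writers" idea concrete, here is a small self-contained sketch. It does not use the real KitWriter or kit_writer_mem, whose definitions are not shown in this plan; DemoWriter, DemoBuf, and demo_emit_c are hypothetical stand-ins that only mirror the shape of an emitter that writes header and source text to two separate sinks without owning either.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a caller-provided writer; NOT the real KitWriter. */
typedef struct DemoWriter {
    int (*write)(void *user, const char *data, size_t len); /* 0 on success */
    void *user;
} DemoWriter;

/* Fixed-capacity in-memory sink, playing the role the plan gives kit_writer_mem. */
typedef struct DemoBuf {
    char data[128];
    size_t len;
} DemoBuf;

static int demo_buf_write(void *user, const char *data, size_t len) {
    DemoBuf *b = (DemoBuf *)user;
    if (len > sizeof b->data - 1 - b->len) return -1; /* keep room for the NUL */
    memcpy(b->data + b->len, data, len);
    b->len += len;
    b->data[b->len] = '\0';
    return 0;
}

static DemoWriter demo_buf_writer(DemoBuf *b) {
    DemoWriter w;
    assert(b != NULL);
    b->len = 0;
    b->data[0] = '\0';
    w.write = demo_buf_write;
    w.user = b;
    return w;
}

static int demo_put(DemoWriter *w, const char *s) {
    return w->write(w->user, s, strlen(s));
}

/* Streams a one-line header and source; the emitter never owns the text. */
static int demo_emit_c(const char *prefix, DemoWriter *header, DemoWriter *source) {
    if (demo_put(header, "extern const int ") || demo_put(header, prefix) ||
        demo_put(header, "rule_count;\n")) return -1;
    if (demo_put(source, "const int ") || demo_put(source, prefix) ||
        demo_put(source, "rule_count = 0;\n")) return -1;
    return 0;
}
```

A caller that wants owned text points both writers at its own buffers; a driver would instead point them at files. The emitter is the same in both cases.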
Parser Runtime API
| Standalone name | Kit name |
|---|---|
| ll_tok_kind | KitGramTokenKind |
| ll_rule_id | KitGramRuleId |
| ll_sem | KitGramSem |
| ll_token | KitGramToken |
| ll_error | KitGramParseError |
| ll_err_action | KitGramErrorAction |
| LL_ABORT | KIT_GRAM_ERROR_ABORT |
| LL_SKIP | KIT_GRAM_ERROR_SKIP |
| LL_RESYNC | KIT_GRAM_ERROR_RESYNC |
| ll_actions | KitGramActions |
| ll_slot | KitGramSlot |
| LL_SLOT_SIZE | KIT_GRAM_SLOT_SIZE |
| ll_config | KitGramParserConfig |
| ll_grammar | KitGramGrammar |
| ll_parser | KitGramParser |
| LL_PARSER_SIZE | KIT_GRAM_PARSER_SIZE |
| ll_status | KitGramParseStatus |
| LL_NEED_MORE | KIT_GRAM_PARSE_NEED_MORE |
| LL_PARSE_ACCEPT | KIT_GRAM_PARSE_ACCEPT |
| LL_PARSE_ERROR | KIT_GRAM_PARSE_ERROR |
| ll_parser_init | kit_gram_parser_init |
| ll_parser_push | kit_gram_parser_push |
| ll_parser_finish | kit_gram_parser_finish |
| ll_parser_result | kit_gram_parser_result |
| ll_stack_bounds | kit_gram_stack_bounds |
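The init/push/finish naming and the NEED_MORE/ACCEPT/ERROR statuses suggest a push-style parser that the caller feeds one token at a time. The sketch below is a toy bracket matcher, not the imported runtime: all Demo names are hypothetical, and the real status semantics (for example, whether push itself can report acceptance) may differ. It shows only how such a parser can run without allocating by taking its stack storage from the caller.

```c
#include <assert.h>
#include <stddef.h>

typedef enum DemoParseStatus {
    DEMO_PARSE_NEED_MORE,
    DEMO_PARSE_ACCEPT,
    DEMO_PARSE_ERROR
} DemoParseStatus;

typedef struct DemoParser {
    char *stack;  /* caller-provided storage; the parser never allocates */
    size_t cap;
    size_t depth;
    int failed;
} DemoParser;

static void demo_parser_init(DemoParser *p, char *stack, size_t cap) {
    assert(p != NULL && (stack != NULL || cap == 0));
    p->stack = stack;
    p->cap = cap;
    p->depth = 0;
    p->failed = 0;
}

/* Feed one token; open brackets are stacked, close brackets must match. */
static DemoParseStatus demo_parser_push(DemoParser *p, char tok) {
    if (p->failed) return DEMO_PARSE_ERROR;
    if (tok == '(' || tok == '[') {
        if (p->depth == p->cap) { p->failed = 1; return DEMO_PARSE_ERROR; }
        p->stack[p->depth++] = tok;
        return DEMO_PARSE_NEED_MORE;
    }
    if (tok == ')' || tok == ']') {
        char want = (tok == ')') ? '(' : '[';
        if (p->depth == 0 || p->stack[p->depth - 1] != want) {
            p->failed = 1;
            return DEMO_PARSE_ERROR;
        }
        p->depth--;
        return DEMO_PARSE_NEED_MORE;
    }
    p->failed = 1; /* unknown token */
    return DEMO_PARSE_ERROR;
}

/* End of input: accept only if no error occurred and nothing is left open. */
static DemoParseStatus demo_parser_finish(const DemoParser *p) {
    return (!p->failed && p->depth == 0) ? DEMO_PARSE_ACCEPT : DEMO_PARSE_ERROR;
}
```

Because the stack depth is bounded by the caller's buffer, running out of space surfaces as an ordinary error status rather than an allocation failure.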
Lexer Runtime API
| Standalone name | Kit name |
|---|---|
| ll_lex_grammar | KitGramLexGrammar |
| ll_lexer | KitGramLexer |
| LL_LEXER_SIZE | KIT_GRAM_LEXER_SIZE |
| ll_lex_config | KitGramLexConfig |
| ll_lex_status | KitGramLexStatus |
| LL_LEX_TOKEN | KIT_GRAM_LEX_TOKEN |
| LL_LEX_NEED_MORE | KIT_GRAM_LEX_NEED_MORE |
| LL_LEX_EOF | KIT_GRAM_LEX_EOF |
| LL_LEX_ERROR | KIT_GRAM_LEX_ERROR |
| ll_lex_error | KitGramLexError |
| ll_lexer_init | kit_gram_lexer_init |
| ll_lexer_push | kit_gram_lexer_push |
| ll_lexer_finish | kit_gram_lexer_finish |
| ll_lexer_next | kit_gram_lexer_next |
| ll_lexer_error | kit_gram_lexer_error |
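The four lexer statuses imply a loop in which the caller pushes input chunks and pulls tokens until the lexer asks for more. The following sketch is a toy digit-run lexer under that assumption; the Demo names are hypothetical and the real runtime's signatures are not specified in this plan. Tokens are reported as absolute stream offsets so that a token spanning a chunk boundary needs no buffering or allocation.

```c
#include <assert.h>
#include <stddef.h>

typedef enum DemoLexStatus {
    DEMO_LEX_TOKEN,
    DEMO_LEX_NEED_MORE,
    DEMO_LEX_EOF,
    DEMO_LEX_ERROR
} DemoLexStatus;

typedef struct DemoToken {
    size_t start; /* absolute offset in the whole input stream */
    size_t len;
} DemoToken;

typedef struct DemoLexer {
    const char *chunk; /* current chunk, owned by the caller */
    size_t len;
    size_t pos;
    size_t offset;     /* absolute offset of chunk[pos] */
    size_t tok_start;
    int in_token;
    int finished;
} DemoLexer;

static void demo_lexer_init(DemoLexer *lx) {
    assert(lx != NULL);
    lx->chunk = NULL;
    lx->len = lx->pos = lx->offset = lx->tok_start = 0;
    lx->in_token = lx->finished = 0;
}

static void demo_lexer_push(DemoLexer *lx, const char *data, size_t len) {
    assert(lx->pos == lx->len); /* previous chunk must be fully consumed */
    lx->chunk = data;
    lx->len = len;
    lx->pos = 0;
}

static void demo_lexer_finish(DemoLexer *lx) { lx->finished = 1; }

/* Tokens are runs of digits separated by spaces; anything else is an error. */
static DemoLexStatus demo_lexer_next(DemoLexer *lx, DemoToken *tok) {
    for (;;) {
        char c;
        if (lx->pos == lx->len) {
            if (!lx->finished) return DEMO_LEX_NEED_MORE;
            if (!lx->in_token) return DEMO_LEX_EOF;
            break; /* end of input flushes the token in progress */
        }
        c = lx->chunk[lx->pos];
        if (c >= '0' && c <= '9') {
            if (!lx->in_token) { lx->in_token = 1; lx->tok_start = lx->offset; }
            lx->pos++;
            lx->offset++;
        } else if (c == ' ') {
            if (lx->in_token) break; /* the space is consumed on the next call */
            lx->pos++;
            lx->offset++;
        } else {
            return DEMO_LEX_ERROR;
        }
    }
    lx->in_token = 0;
    tok->start = lx->tok_start;
    tok->len = lx->offset - lx->tok_start;
    return DEMO_LEX_TOKEN;
}
```

Feeding "12" and then "3 4" yields a first token covering offsets 0 through 2 even though it straddles the two chunks; the second token is only reported once the caller signals end of input.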
Generated-Code Support API
The generated-code support headers should use Kit names too:
| Standalone name | Kit name |
|---|---|
| ll_sym_kind | KitGramSymKind |
| LL_S_TERM | KIT_GRAM_SYM_TERM |
| LL_S_RULE | KIT_GRAM_SYM_RULE |
| LL_S_REP | KIT_GRAM_SYM_REP |
| LL_S_OPT | KIT_GRAM_SYM_OPT |
| ll_sym | KitGramSym |
| ll_prod | KitGramProd |
| ll_pratt_op | KitGramPrattOp |
| ll_pratt | KitGramPratt |
| ll_rule | KitGramRule |
| LL_TERM | KIT_GRAM_TERM |
| LL_RULE | KIT_GRAM_RULE |
| ll_lex_accept | KitGramLexAccept |
| LL_LEX_DEAD | KIT_GRAM_LEX_DEAD |
| LL_LEX_ACCEPT_NONE | KIT_GRAM_LEX_ACCEPT_NONE |
Generated grammar-specific token/rule enums (TOK_*, R_*, and
<prefix>_*) are user artifacts. They may keep their current shape because they
are controlled by the generated prefix and are not libkit exports.
Include Rewrites
The generator should emit installed-style includes:
    /* generated header */
    #include <kit/gram_parse.h>
    #include <kit/gram_lex.h> /* only when a generated lexer exists */

    /* generated source */
    #include "generated_name.h"
    #include <kit/support/gram_parse_tables.h>
    #include <kit/support/gram_lex_tables.h> /* only when needed */
Private src/gram/*.c files include internal.h and the public headers they
implement. They must not include driver headers.
Driver Command
The user-facing command is:
kit gram [--dump-sexpr] [--prefix PREFIX] [-o OUT.c] [--header OUT.h] grammar.ebnf
The first import should preserve existing behavior:
- Default .c output path: replace the input suffix with .c.
- Default .h output path: replace the input suffix with .h.
- Default prefix: derive from the input basename and append _.
- --dump-sexpr: write the meta-grammar dump to stdout.
- Exit code 0: success.
- Exit code 1: compile, diagnostic, or I/O failure.
- Exit code 2: bad command-line usage.
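The default-path and default-prefix rules above can be sketched as follows. This is an illustration only: the demo_ helpers are hypothetical, they use snprintf for brevity (the real driver command is expected to go through DriverEnv rather than hosted libc), and the handling of inputs without a suffix or with non-identifier characters in the basename is an assumption that should be checked against the existing ll1 CLI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Index of the first character after the last '/', or 0 if there is none. */
static size_t demo_basename_start(const char *path) {
    const char *slash = strrchr(path, '/');
    return slash ? (size_t)(slash - path) + 1 : 0;
}

/* Length of the path with its suffix removed. A basename with no '.', or
 * with only a leading '.', is treated as having no suffix (assumption). */
static size_t demo_stem_end(const char *path) {
    size_t base = demo_basename_start(path);
    const char *dot = strrchr(path + base, '.');
    if (dot == NULL || dot == path + base) return strlen(path);
    return (size_t)(dot - path);
}

/* Writes path with its suffix replaced by `suffix`; returns 0 on success. */
static int demo_replace_suffix(const char *path, const char *suffix,
                               char *out, size_t cap) {
    int n;
    assert(path != NULL && suffix != NULL && out != NULL);
    n = snprintf(out, cap, "%.*s%s", (int)demo_stem_end(path), path, suffix);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}

/* Writes "<basename without suffix>_"; returns 0 on success. */
static int demo_default_prefix(const char *path, char *out, size_t cap) {
    size_t base, stem;
    int n;
    assert(path != NULL && out != NULL);
    base = demo_basename_start(path);
    stem = demo_stem_end(path);
    n = snprintf(out, cap, "%.*s_", (int)(stem - base), path + base);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

For the input grammars/calc.ebnf these helpers produce grammars/calc.c, grammars/calc.h, and the prefix calc_.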
The driver implementation should use DriverEnv for KitContext, file reads,
writer opening, diagnostics, and memory. The command should not call malloc,
free, fopen, fprintf, or exit directly.
Driver integration points:
- Add KIT_TOOL_GRAM_ENABLED to include/kit/config.h.
- Add driver/cmd/gram.c.
- Add driver_gram and driver_help_gram to driver/driver.h.
- Add the gram row to driver/main.c, gated by KIT_TOOL_GRAM_ENABLED.
- Add $(call tool-cmd,GRAM,gram) to mk/driver_srcs.mk.
- Keep it in DRIVER_GROUP_OTHER for now. It is a developer tool, not a default drop-in binutils/toolchain symlink.
Build Integration
Library integration points:
- Add KIT_GRAM_ENABLED to include/kit/config.h as an optional library subsystem.
- Add LIB_SRCS_GRAM := $(shell find src/gram -name '*.c' ...) to mk/lib_srcs.mk, and include it only when KIT_GRAM_ENABLED is 1.
- Add weak public stubs in src/api/config_stubs.c for gated-out generator API entry points. Runtime stubs may be omitted if the runtime is considered part of the generated-code ABI and the whole subsystem is always enabled in the default build; if the runtime is gated, its public symbols need stubs too.
- Keep src/gram/meta_tables.c checked in. Normal make lib must not require Python, network access, UCD regeneration, or a bootstrap gram binary.
- Add regeneration-only maintenance targets later, for example make regen-gram-meta and make regen-gram-unicode-props.
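As a rough picture of what a weak stub for a gated-out entry point might look like, consider the sketch below. It is not the contents of src/api/config_stubs.c: DEMO_GRAM_ENABLED stands in for KIT_GRAM_ENABLED, DemoStatus for KitStatus, and the function signature is simplified. The weak attribute is a GCC/Clang extension, which is an assumption about the toolchain.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the config.h gate; defaults to "subsystem compiled out". */
#ifndef DEMO_GRAM_ENABLED
#define DEMO_GRAM_ENABLED 0
#endif

typedef enum DemoStatus { DEMO_OK = 0, DEMO_ERR_UNSUPPORTED = 1 } DemoStatus;

#if defined(__GNUC__) || defined(__clang__)
#define DEMO_WEAK __attribute__((weak))
#else
#define DEMO_WEAK /* no weak linkage available; the stub becomes a strong symbol */
#endif

/* The public prototype stays visible regardless of the gate. */
DemoStatus demo_gram_compile_text(const char *text, size_t len);

#if !DEMO_GRAM_ENABLED
/* Stub used when the subsystem is gated out: callers still link, and get a
 * clear status instead of an undefined-symbol error. Being weak, it would
 * yield to a real definition if one were linked in. */
DEMO_WEAK DemoStatus demo_gram_compile_text(const char *text, size_t len) {
    assert(text != NULL || len == 0);
    (void)text;
    (void)len;
    return DEMO_ERR_UNSUPPORTED;
}
#endif

/* Callers can probe availability through the ordinary status path. */
static int demo_gram_available(void) {
    return demo_gram_compile_text("", 0) != DEMO_ERR_UNSUPPORTED;
}
```

The point of the pattern is that embedders never need their own #if around calls into the subsystem; a disabled build answers through the same status channel as any other failure.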
The first import can gate runtime and generator together under
KIT_GRAM_ENABLED. If size-sensitive embeddings need generated parser runtime
without the generator compiler, split later into:
- KIT_GRAM_RUNTIME_ENABLED: parser/lexer runtime plus support headers.
- KIT_GRAM_ENABLED: generator compiler, depending on runtime.
Do not introduce that split until there is a real embedding that benefits from it; it adds config and stub surface.
Implementation Phases
Phase 0: Freeze Standalone Behavior
- Run the current ll1 test suite and save the passing command set in the import notes.
- Generate and check in meta_tables.c/meta_tables.h from meta.ebnf.
- Confirm generated output for representative grammars is stable.
Phase 1: Mechanical Import, Private Names Still Allowed Internally
- Move files into the destinations above.
- Keep behavior unchanged while fixing include paths.
- Add test/gram fixtures and a make test-gram target, initially allowed to fail until the namespace rewrite lands.
Phase 2: Public Namespace Rewrite
- Rename every public ll_*, llgen_*, and LL_* symbol to KitGram, kit_gram_, or KIT_GRAM_ spelling.
- Update emitted C and generated meta tables to use the new names.
- Run make test-lib-deps to catch leaked public symbols outside Kit, kit_, or KIT.
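The symbol-discipline rule reduces to a prefix check on each exported name. How make test-lib-deps actually implements this is not described here; the sketch below only illustrates the rule itself with a hypothetical demo_symbol_allowed helper, using the three prefixes named above.

```c
#include <assert.h>
#include <string.h>

/* True if `name` begins with `prefix` (case-sensitive). */
static int demo_has_prefix(const char *name, const char *prefix) {
    return strncmp(name, prefix, strlen(prefix)) == 0;
}

/* A public symbol passes if it starts with Kit, kit_, or KIT. */
static int demo_symbol_allowed(const char *name) {
    assert(name != NULL);
    return demo_has_prefix(name, "Kit") ||
           demo_has_prefix(name, "kit_") ||
           demo_has_prefix(name, "KIT");
}
```

Under this check kit_gram_free, KitGramParser, and KIT_GRAM_SLOT_SIZE pass, while any leftover llgen_compile_text, ll_parser_push, or LL_ABORT would be flagged.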
Phase 3: Kit Context Rewrite
- Replace llgen_allocator with KitContext.heap.
- Replace generated text return structs with KitWriter outputs.
- Replace direct diagnostics with KitContext.diag.
- Remove hosted libc calls from imported library code.
- Keep setjmp/longjmp only if the existing frontend panic pattern accepts it for this subsystem; otherwise convert OOM and validation aborts to explicit KitStatus unwinding.
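If the setjmp/longjmp path is dropped, "explicit status unwinding" could look roughly like the sketch below: every failure becomes a return value, and a single cleanup label releases whatever was acquired. DemoStatus and DemoHeap are hypothetical stand-ins for KitStatus and KitContext.heap; the demo heap wraps malloc only so the example is self-contained, and it fails after a fixed number of allocations to simulate OOM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum DemoStatus { DEMO_OK, DEMO_ERR_NOMEM, DEMO_ERR_INVALID } DemoStatus;

/* Stand-in heap: permits `allocs_left` allocations, tracks live blocks. */
typedef struct DemoHeap {
    size_t allocs_left;
    size_t live;
} DemoHeap;

static void *demo_alloc(DemoHeap *h, size_t n) {
    void *p;
    if (h->allocs_left == 0) return NULL; /* simulated out-of-memory */
    p = malloc(n);
    if (p != NULL) { h->allocs_left--; h->live++; }
    return p;
}

static void demo_free(DemoHeap *h, void *p) {
    if (p == NULL) return;
    assert(h->live > 0);
    free(p);
    h->live--;
}

/* Validation and OOM both unwind through `done`; nothing jumps past cleanup. */
static DemoStatus demo_build_tables(DemoHeap *h, size_t rules, size_t tokens) {
    DemoStatus st = DEMO_OK;
    int *rule_tab = NULL;
    int *tok_tab = NULL;

    if (rules == 0 || tokens == 0) return DEMO_ERR_INVALID;

    rule_tab = demo_alloc(h, rules * sizeof *rule_tab);
    if (rule_tab == NULL) { st = DEMO_ERR_NOMEM; goto done; }
    tok_tab = demo_alloc(h, tokens * sizeof *tok_tab);
    if (tok_tab == NULL) { st = DEMO_ERR_NOMEM; goto done; }

    /* ... table construction would go here ... */

done:
    demo_free(h, tok_tab);
    demo_free(h, rule_tab);
    return st;
}
```

The cost relative to longjmp is a status check after every fallible call; the benefit is that partially built state is always released on the same path, which the live counter makes easy to test.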
Phase 4: Driver Command
- Port gen/llgen_cli.c to driver/cmd/gram.c.
- Use the public generator API only.
- Add help text consistent with other driver commands.
- Add a focused driver test: invoke kit gram on a small grammar, compile the generated C against libkit, and run the parser.
Phase 5: Cleanup and Documentation
- Document the stable runtime API in the public headers.
- Add a durable design doc under doc/ only after the subsystem ships.
- Add gram to README.md and doc/DESIGN.md capability lists after the command works.
- Remove temporary compatibility shims and any retained Python parity path.
Tests
Targeted tests should land with the import:
- make test-gram: direct API compile, table introspection, generated parser, generated lexer, Pratt grammar, UTF-8 lexer, and error fixtures.
- make test-driver-gram: CLI generation and generated-code compile/run smoke.
- make test-lib-deps: symbol discipline and no accidental hosted dependencies.
Map standalone tests as follows:
| Standalone test | Kit test |
|---|---|
| test/test_llgen_api.c | test/gram/api_test.c |
| test/test_calc.c | test/gram/calc_test.c |
| test/test_features.c | test/gram/features_test.c |
| test/test_lexer.c | test/gram/lexer_test.c |
| test/test_pratt_calc.c | test/gram/pratt_test.c |
| test/test_unicode_support.c | test/gram/unicode_test.c |
| test/test_unicode_lexer.c | test/gram/unicode_lexer_test.c |
| test/test_utf8_runtime.c | test/gram/utf8_runtime_test.c |
| test/errors/*.ebnf | test/gram/errors/*.ebnf |
Prefer red-green import steps:
- Add the test target and fixtures first.
- Import the runtime until hand-written/generated tables parse again.
- Import the generator until direct API tests pass.
- Add the driver command and smoke test last.
Open Questions
- Should generated table-layout headers be documented as stable ABI, or merely stable enough for C emitted by the same libkit version? The first import should promise only same-version compatibility.
- Should Unicode UCD source data live in-tree permanently, or should only the generated property tables be checked in? Keeping the data improves reproducible regeneration but increases repository size.
- Should the runtime/generator gate be split immediately? The plan says no until an embedding needs runtime-only size savings.
- Should gram eventually support non-C output modes? The API should not bake in more than emit_c today, but the command can grow --emit= later.