kit

git clone https://git.ryansepassi.com/git/kit.git

Gram Import Plan

This is the import plan for folding /Users/ryan/code/ll1 into libkit and exposing it through the kit driver. The goal is not to make ll1 a language frontend. It should land as a reusable parser/lexer-generator subsystem: EBNF in, parse and lexer tables out, with allocation-free push runtimes for generated parsers and lexers.

The standalone project already has the right broad shape for libkit: explicit allocation, no hidden CLI dependency in the generator API, immutable generated tables, and caller-owned runtime state. The import work is mostly namespace, host-boundary, build-gating, and file-layout discipline.

Decision Summary

ll1 remains a source repository/project name only. It should not appear in the libkit API, installed command name, or user-facing help.

Boundaries

gram is a library subsystem, not a lang/ frontend. It does not register a KitFrontendVTable, does not emit KitCg, and does not participate in source-to-object compilation unless a future frontend chooses to use it internally. Its public surface is consumed by embedders, the driver command, and generated C files.

The driver remains the only hosted layer. File reads, directory creation, stdout, stderr, and CLI allocation policy live in driver/cmd/gram.c and the hosted driver environment. The library accepts input as KitSlice, writes output to KitWriter, allocates through KitContext.heap, and reports diagnostics through KitContext.diag.
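As a rough illustration of that host boundary, the sketch below keeps all I/O on the caller's side: the library-side function only sees a slice, a writer callback, and a diagnostic callback. Every `Demo*` type and `demo_*` function here is a hypothetical stand-in, not the real KitSlice, KitWriter, or KitContext definitions, whose layouts are not given in this plan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real Kit types may look quite different. */
typedef struct DemoSlice { const char* ptr; size_t len; } DemoSlice;

typedef struct DemoWriter {
  int (*write)(void* user, const char* data, size_t len); /* 0 on success */
  void* user;
} DemoWriter;

typedef struct DemoContext {
  void (*diag)(void* user, const char* message);
  void* diag_user;
} DemoContext;

/* Library side: no stdio, no file access, no global state. */
int demo_copy_grammar(const DemoContext* ctx, DemoSlice in, DemoWriter* out) {
  assert(ctx && out);
  if (in.len == 0) {
    ctx->diag(ctx->diag_user, "empty grammar input");
    return -1;
  }
  return out->write(out->user, in.ptr, in.len);
}

/* Host side (what a driver would own): a fixed buffer and a diag counter. */
typedef struct DemoBuf { char data[64]; size_t len; } DemoBuf;

int demo_buf_write(void* user, const char* data, size_t len) {
  DemoBuf* b = (DemoBuf*)user;
  if (len > sizeof b->data - b->len) return -1; /* buffer full */
  memcpy(b->data + b->len, data, len);
  b->len += len;
  return 0;
}

void demo_count_diag(void* user, const char* message) {
  (void)message;
  ++*(int*)user; /* the host decides what a diagnostic means */
}
```

The library function can be exercised with an in-memory buffer in tests and with file-backed writers in the driver, without changing the library code.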

Generated grammar tables are immutable static data and are fine as globals. Mutable parser, lexer, and generator state must hang off caller-owned runtime objects or explicit libkit handles.
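A minimal sketch of that split, assuming a toy two-state table rather than anything the real generator emits: the transition table is `static const` data, and the only mutable state lives in a struct the caller owns. The `demo_*` names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DEMO_DEAD = 0xFF };

/* Immutable "generated" table: fine as a global, shareable by all users. */
static const uint8_t demo_next[2][2] = {
  /* state 0 (start):     digit -> 1, other -> dead */
  { 1, DEMO_DEAD },
  /* state 1 (in number): digit -> 1, other -> dead */
  { 1, DEMO_DEAD },
};

/* Mutable state hangs off a caller-owned object, never a global. */
typedef struct DemoMatcher { uint8_t state; } DemoMatcher;

void demo_matcher_init(DemoMatcher* m) {
  assert(m);
  m->state = 0;
}

/* Feed bytes incrementally; stops advancing once the dead state is hit. */
void demo_matcher_push(DemoMatcher* m, const char* bytes, size_t len) {
  for (size_t i = 0; i < len && m->state != DEMO_DEAD; ++i) {
    int cls = (bytes[i] >= '0' && bytes[i] <= '9') ? 0 : 1;
    m->state = demo_next[m->state][cls];
  }
}

int demo_matcher_accepts(const DemoMatcher* m) { return m->state == 1; }
```

Two matchers can run side by side against the same table because neither writes to it.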

File Moves

Import the C implementation, generated meta tables, runtime, and tests. Do not make the Python generator part of the normal libkit build.

| Current file | Kit destination | Notes |
| --- | --- | --- |
| include/llgen.h | include/kit/gram.h | Public generator API, renamed to Kit types and functions. |
| include/llparse.h | include/kit/gram_parse.h | Public parser runtime API. |
| include/lllex.h | include/kit/gram_lex.h | Public lexer runtime API. |
| include/llparse_tables.h | include/kit/support/gram_parse_tables.h | Public generated-code support, not ordinary embedder API. |
| include/lllex_tables.h | include/kit/support/gram_lex_tables.h | Public generated-code support, not ordinary embedder API. |
| include/llunicode.h | src/gram/unicode.h | Private helper; expose later as <kit/unicode.h> only if there is a broader API need. |
| include/llunicode_props.h | src/gram/unicode_props.h | Private generated Unicode property resolver. |
| runtime/llparse.c | src/gram/parse_runtime.c | Implements <kit/gram_parse.h>. |
| runtime/lllex.c | src/gram/lex_runtime.c | Implements <kit/gram_lex.h>. |
| runtime/llunicode.c | src/gram/unicode.c | Private Unicode helpers for generator and lexer runtime. |
| runtime/llunicode_props.c | src/gram/unicode_props.c | Checked-in generated property tables. |
| gen/llgen.c | src/gram/generator.c | Public API implementation and C emitter after Kit context rewrite. |
| gen/llgen_ll1.c | src/gram/ll1.c | LL(1), Pratt, FIRST/FOLLOW, and validation. |
| gen/llgen_lex_byte.c | src/gram/lex_byte.c | Byte-mode lexer compiler. |
| gen/llgen_lex_unicode.c | src/gram/lex_unicode.c | Unicode-mode lexer compiler. |
| gen/llgen_internal.h | src/gram/internal.h | Private to src/gram/*.c. |
| gen/meta.ebnf | src/gram/meta.ebnf | Source grammar for the generator's own parser; not consumed by normal builds. |
| generated meta .c/.h | src/gram/meta_tables.c, src/gram/meta_tables.h | Check in generated tables so libkit does not need Python or a previous gram to build. |
| gen/llgen_cli.c | driver/cmd/gram.c | Rewrite as a Kit driver command using public <kit/gram.h>. |
| tools/gen_unicode_props.py | scripts/gen_gram_unicode_props.py | Regeneration helper only; not in the build. |
| data/ucd/17.0.0/* | data/ucd/17.0.0/* or test/gram/ucd/17.0.0/* | Keep if we want reproducible Unicode-table regeneration in-tree. |
| test/*.c, test/*.ebnf | test/gram/* | Library tests and fixtures. |
| test/errors/* | test/gram/errors/* | Error fixtures. |

gen/llgen.py should not be imported into libkit. If parity against the Python reference is still useful during the transition, keep it temporarily as scripts/gram_ref.py and exclude it from release/build dependencies. Delete it once the imported C generator is trusted.

Public Renames

The standalone ll_* and llgen_* prefixes are too short for libkit and would violate its public symbol discipline. Public definitions for the imported subsystem must use KitGram, kit_gram_, or KIT_GRAM_.
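The prefix rule is mechanical enough to check in a test or lint step. The helper below is a hypothetical sketch, not part of libkit; it only tests the three prefixes named above and does not know which symbols are actually public.

```c
#include <assert.h>
#include <string.h>

/* Returns nonzero when s begins with prefix. */
int demo_has_prefix(const char* s, const char* prefix) {
  return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Hypothetical lint helper: does a public symbol follow the gram naming rule? */
int demo_is_public_gram_name(const char* name) {
  assert(name);
  return demo_has_prefix(name, "KitGram") ||   /* types */
         demo_has_prefix(name, "kit_gram_") || /* functions */
         demo_has_prefix(name, "KIT_GRAM_");   /* macros and enumerators */
}
```

A check like this could run over an exported-symbol list to catch any ll_* or llgen_* name that survives the rename.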

Generator API

| Standalone name | Kit name |
| --- | --- |
| llgen_options | KitGramOptions |
| llgen_compiled | KitGramCompiled |
| llgen_codegen | Remove or narrow; prefer KitWriter outputs. |
| llgen_compile_text | kit_gram_compile_text |
| llgen_compiled_free | kit_gram_free |
| llgen_dump_sexpr_text | kit_gram_dump_sexpr |
| llgen_generate_c | kit_gram_emit_c |
| llgen_parser_grammar | kit_gram_parser_grammar |
| llgen_lexer_grammar | kit_gram_lexer_grammar |
| llgen_token_count | kit_gram_token_count |
| llgen_token_name | kit_gram_token_name |
| llgen_token_display | kit_gram_token_display |
| llgen_find_token | kit_gram_find_token |
| llgen_rule_count | kit_gram_rule_count |
| llgen_rule_name | kit_gram_rule_name |
| llgen_find_rule | kit_gram_find_rule |

The generator API should take a const KitContext* and KitSlice inputs. It should not expose llgen_allocator; the allocator becomes ctx->heap. C output should stream to caller-provided KitWriters. Callers that want owned in-memory text can use kit_writer_mem.

Proposed core shape:

typedef struct KitGramCompiled KitGramCompiled;

typedef struct KitGramOptions {
  KitSlice name; /* optional grammar name override */
} KitGramOptions;

typedef struct KitGramEmitOptions {
  KitSlice header_path;
  KitSlice source_path;
  KitSlice prefix;
} KitGramEmitOptions;

KIT_API KitStatus kit_gram_compile_text(const KitContext* ctx,
                                        KitSlice text, KitSlice path,
                                        const KitGramOptions* opts,
                                        KitGramCompiled** out);
KIT_API void kit_gram_free(KitGramCompiled*);

KIT_API KitStatus kit_gram_emit_c(const KitGramCompiled*,
                                  const KitGramEmitOptions* opts,
                                  KitWriter* header, KitWriter* source);
KIT_API KitStatus kit_gram_dump_sexpr(const KitContext* ctx, KitSlice text,
                                      KitSlice path,
                                      const KitGramOptions* opts,
                                      KitWriter* out);
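The ownership pattern in these declarations (status return, opaque handle through an out-parameter, allocator supplied by the caller's context, a single free function) can be sketched with stand-ins. Everything below is hypothetical: the `Demo*` types, the status values, and the toy "count the semicolons" compile step are placeholders, and the real KitContext.heap interface is not specified in this plan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum DemoStatus { DEMO_OK, DEMO_ERR_INPUT, DEMO_ERR_NOMEM } DemoStatus;

/* Stand-in for a context-supplied heap. */
typedef struct DemoHeap {
  void* (*alloc)(void* user, size_t n);
  void (*release)(void* user, void* p);
  void* user;
} DemoHeap;

/* Opaque to callers in a real header; defined here so the sketch compiles. */
typedef struct DemoCompiled {
  const DemoHeap* heap; /* remembered so free needs no extra argument */
  size_t rule_count;
} DemoCompiled;

DemoStatus demo_compile_text(const DemoHeap* heap, const char* text,
                             size_t len, DemoCompiled** out) {
  assert(heap && out);
  *out = NULL; /* out is well-defined on every failure path */
  if (len == 0) return DEMO_ERR_INPUT;
  DemoCompiled* c = heap->alloc(heap->user, sizeof *c);
  if (!c) return DEMO_ERR_NOMEM;
  c->heap = heap;
  c->rule_count = 0;
  for (size_t i = 0; i < len; ++i) /* toy "compile": one rule per ';' */
    if (text[i] == ';') ++c->rule_count;
  *out = c;
  return DEMO_OK;
}

size_t demo_rule_count(const DemoCompiled* c) { return c->rule_count; }

void demo_free(DemoCompiled* c) {
  if (c) c->heap->release(c->heap->user, c);
}

/* Host side: a malloc-backed heap that counts live allocations. */
typedef struct DemoCounts { int live; } DemoCounts;

void* demo_host_alloc(void* user, size_t n) {
  ++((DemoCounts*)user)->live;
  return malloc(n);
}

void demo_host_release(void* user, void* p) {
  --((DemoCounts*)user)->live;
  free(p);
}
```

Because the handle records which heap created it, the free function takes only the handle, which matches the one-argument kit_gram_free declaration above.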

Parser Runtime API

| Standalone name | Kit name |
| --- | --- |
| ll_tok_kind | KitGramTokenKind |
| ll_rule_id | KitGramRuleId |
| ll_sem | KitGramSem |
| ll_token | KitGramToken |
| ll_error | KitGramParseError |
| ll_err_action | KitGramErrorAction |
| LL_ABORT | KIT_GRAM_ERROR_ABORT |
| LL_SKIP | KIT_GRAM_ERROR_SKIP |
| LL_RESYNC | KIT_GRAM_ERROR_RESYNC |
| ll_actions | KitGramActions |
| ll_slot | KitGramSlot |
| LL_SLOT_SIZE | KIT_GRAM_SLOT_SIZE |
| ll_config | KitGramParserConfig |
| ll_grammar | KitGramGrammar |
| ll_parser | KitGramParser |
| LL_PARSER_SIZE | KIT_GRAM_PARSER_SIZE |
| ll_status | KitGramParseStatus |
| LL_NEED_MORE | KIT_GRAM_PARSE_NEED_MORE |
| LL_PARSE_ACCEPT | KIT_GRAM_PARSE_ACCEPT |
| LL_PARSE_ERROR | KIT_GRAM_PARSE_ERROR |
| ll_parser_init | kit_gram_parser_init |
| ll_parser_push | kit_gram_parser_push |
| ll_parser_finish | kit_gram_parser_finish |
| ll_parser_result | kit_gram_parser_result |
| ll_stack_bounds | kit_gram_stack_bounds |
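The init/push/finish names and the NEED_MORE/ACCEPT/ERROR statuses describe a push-style protocol: the caller feeds tokens one at a time and the parser never pulls input or allocates. The sketch below shows only that control flow on a toy balanced-parentheses recognizer with a depth counter; it is not an LL(1) table interpreter, and the `demo_*` signatures are assumptions, since the real kit_gram_parser_* signatures are not given here.

```c
#include <assert.h>
#include <stddef.h>

typedef enum DemoParseStatus {
  DEMO_PARSE_NEED_MORE,
  DEMO_PARSE_ACCEPT,
  DEMO_PARSE_ERROR
} DemoParseStatus;

typedef enum DemoTok { DEMO_TOK_OPEN, DEMO_TOK_CLOSE } DemoTok;

/* Caller-owned state with a fixed bound, so pushing never allocates. */
typedef struct DemoParser {
  size_t depth;
  size_t max_depth;
  int failed;
} DemoParser;

void demo_parser_init(DemoParser* p, size_t max_depth) {
  assert(p);
  p->depth = 0;
  p->max_depth = max_depth;
  p->failed = 0;
}

/* Consume one token; errors are sticky until the parser is re-initialized. */
DemoParseStatus demo_parser_push(DemoParser* p, DemoTok tok) {
  if (p->failed) return DEMO_PARSE_ERROR;
  if (tok == DEMO_TOK_OPEN) {
    if (p->depth == p->max_depth) { /* would exceed the fixed bound */
      p->failed = 1;
      return DEMO_PARSE_ERROR;
    }
    ++p->depth;
  } else {
    if (p->depth == 0) { /* close without a matching open */
      p->failed = 1;
      return DEMO_PARSE_ERROR;
    }
    --p->depth;
  }
  return DEMO_PARSE_NEED_MORE;
}

/* Signal end of input; only now can the parser accept. */
DemoParseStatus demo_parser_finish(DemoParser* p) {
  if (p->failed || p->depth != 0) return DEMO_PARSE_ERROR;
  return DEMO_PARSE_ACCEPT;
}
```

In this sketch acceptance is only reported by the finish call, which keeps every push result unambiguous for streaming callers.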

Lexer Runtime API

| Standalone name | Kit name |
| --- | --- |
| ll_lex_grammar | KitGramLexGrammar |
| ll_lexer | KitGramLexer |
| LL_LEXER_SIZE | KIT_GRAM_LEXER_SIZE |
| ll_lex_config | KitGramLexConfig |
| ll_lex_status | KitGramLexStatus |
| LL_LEX_TOKEN | KIT_GRAM_LEX_TOKEN |
| LL_LEX_NEED_MORE | KIT_GRAM_LEX_NEED_MORE |
| LL_LEX_EOF | KIT_GRAM_LEX_EOF |
| LL_LEX_ERROR | KIT_GRAM_LEX_ERROR |
| ll_lex_error | KitGramLexError |
| ll_lexer_init | kit_gram_lexer_init |
| ll_lexer_push | kit_gram_lexer_push |
| ll_lexer_finish | kit_gram_lexer_finish |
| ll_lexer_next | kit_gram_lexer_next |
| ll_lexer_error | kit_gram_lexer_error |
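The push/next/finish split suggests a chunked protocol in which a token may straddle two input chunks. The following toy lexer, which only splits space-separated words and reports their lengths, is a sketch of that flow under assumed signatures; the real kit_gram_lexer_* functions and KitGramLexStatus semantics may differ.

```c
#include <assert.h>
#include <stddef.h>

typedef enum DemoLexStatus {
  DEMO_LEX_TOKEN,
  DEMO_LEX_NEED_MORE,
  DEMO_LEX_EOF
} DemoLexStatus;

/* Caller-owned state; the current chunk is borrowed, never copied. */
typedef struct DemoLexer {
  const char* chunk;
  size_t len;
  size_t pos;
  size_t tok_len; /* bytes of the word in progress, may span chunks */
  int finished;
} DemoLexer;

void demo_lexer_init(DemoLexer* lx) {
  assert(lx);
  lx->chunk = NULL;
  lx->len = lx->pos = lx->tok_len = 0;
  lx->finished = 0;
}

/* Hand the lexer the next chunk; call after next() returned NEED_MORE. */
void demo_lexer_push(DemoLexer* lx, const char* chunk, size_t len) {
  lx->chunk = chunk;
  lx->len = len;
  lx->pos = 0;
}

/* Signal that no more chunks will arrive. */
void demo_lexer_finish(DemoLexer* lx) { lx->finished = 1; }

DemoLexStatus demo_lexer_next(DemoLexer* lx, size_t* out_len) {
  while (lx->pos < lx->len) {
    char c = lx->chunk[lx->pos++];
    if (c != ' ') {
      ++lx->tok_len;
      continue;
    }
    if (lx->tok_len > 0) { /* a space ends the word in progress */
      *out_len = lx->tok_len;
      lx->tok_len = 0;
      return DEMO_LEX_TOKEN;
    }
  }
  if (!lx->finished) return DEMO_LEX_NEED_MORE; /* word may continue */
  if (lx->tok_len > 0) { /* flush the final word at end of input */
    *out_len = lx->tok_len;
    lx->tok_len = 0;
    return DEMO_LEX_TOKEN;
  }
  return DEMO_LEX_EOF;
}
```

The lexer cannot emit a trailing word until either a delimiter or the finish call arrives, which is why NEED_MORE and EOF are distinct statuses.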

Generated-Code Support API

The generated-code support headers should use Kit names too:

| Standalone name | Kit name |
| --- | --- |
| ll_sym_kind | KitGramSymKind |
| LL_S_TERM | KIT_GRAM_SYM_TERM |
| LL_S_RULE | KIT_GRAM_SYM_RULE |
| LL_S_REP | KIT_GRAM_SYM_REP |
| LL_S_OPT | KIT_GRAM_SYM_OPT |
| ll_sym | KitGramSym |
| ll_prod | KitGramProd |
| ll_pratt_op | KitGramPrattOp |
| ll_pratt | KitGramPratt |
| ll_rule | KitGramRule |
| LL_TERM | KIT_GRAM_TERM |
| LL_RULE | KIT_GRAM_RULE |
| ll_lex_accept | KitGramLexAccept |
| LL_LEX_DEAD | KIT_GRAM_LEX_DEAD |
| LL_LEX_ACCEPT_NONE | KIT_GRAM_LEX_ACCEPT_NONE |

Generated grammar-specific token/rule enums (TOK_*, R_*, and <prefix>_*) are user artifacts. They may keep their current shape because they are controlled by the generated prefix and are not libkit exports.

Include Rewrites

The generator should emit installed-style includes:

/* generated header */
#include <kit/gram_parse.h>
#include <kit/gram_lex.h>       /* only when a generated lexer exists */

/* generated source */
#include "generated_name.h"
#include <kit/support/gram_parse_tables.h>
#include <kit/support/gram_lex_tables.h> /* only when needed */

Private src/gram/*.c files include internal.h and the public headers they implement. They must not include driver headers.
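The conditional lexer include in the generated source can be sketched as a small emitter. This version formats into a caller-provided buffer with `snprintf` purely to keep the example self-contained; the plan calls for streaming through KitWriter instead, and `demo_emit_source_includes` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the generated-source preamble into buf.
 * Returns what snprintf returns: the length the full text needs. */
int demo_emit_source_includes(char* buf, size_t cap, const char* header_name,
                              int has_lexer) {
  assert(buf && header_name);
  return snprintf(buf, cap,
                  "#include \"%s\"\n"
                  "#include <kit/support/gram_parse_tables.h>\n"
                  "%s",
                  header_name,
                  has_lexer ? "#include <kit/support/gram_lex_tables.h>\n"
                            : "");
}
```

Grammars without a generated lexer then never pull in the lexer table support header.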

Driver Command

The user-facing command is:

kit gram [--dump-sexpr] [--prefix PREFIX] [-o OUT.c] [--header OUT.h] grammar.ebnf

The first import should preserve existing behavior:

The driver implementation should use DriverEnv for KitContext, file reads, writer opening, diagnostics, and memory. The command should not call malloc, free, fopen, fprintf, or exit directly.
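Argument handling for the command line shown above needs no allocation, since every option value can simply point into argv. The parser below is a sketch under assumptions: `DemoGramArgs` and `demo_parse_gram_args` are hypothetical, argv is assumed to start after `kit gram`, and forms such as `--prefix=PREFIX` are not handled because the plan does not say whether they are accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct DemoGramArgs {
  int dump_sexpr;
  const char* prefix;       /* --prefix PREFIX */
  const char* source_out;   /* -o OUT.c */
  const char* header_out;   /* --header OUT.h */
  const char* grammar_path; /* positional grammar.ebnf */
} DemoGramArgs;

/* Returns 0 on success, -1 on a usage error. Borrows pointers from argv. */
int demo_parse_gram_args(int argc, const char* const* argv,
                         DemoGramArgs* out) {
  assert(out);
  *out = (DemoGramArgs){0};
  for (int i = 0; i < argc; ++i) {
    const char* a = argv[i];
    const char** value_slot = NULL;
    if (strcmp(a, "--dump-sexpr") == 0) {
      out->dump_sexpr = 1;
      continue;
    }
    if (strcmp(a, "--prefix") == 0) value_slot = &out->prefix;
    else if (strcmp(a, "-o") == 0) value_slot = &out->source_out;
    else if (strcmp(a, "--header") == 0) value_slot = &out->header_out;
    if (value_slot) {
      if (++i >= argc) return -1; /* option is missing its value */
      *value_slot = argv[i];
      continue;
    }
    /* Reject unknown flags and a second positional argument. */
    if (a[0] == '-' || out->grammar_path) return -1;
    out->grammar_path = a;
  }
  return out->grammar_path ? 0 : -1;
}
```

Reporting the usage error itself would go through the driver's diagnostics rather than a direct fprintf, in line with the restrictions above.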

Driver integration points:

Build Integration

Library integration points:

The first import can gate runtime and generator together under KIT_GRAM_ENABLED. If size-sensitive embeddings need generated parser runtime without the generator compiler, split later into:

Do not introduce that split until there is a real embedding that benefits from it; it adds config and stub surface.
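For reference, a single-macro gate might look like the sketch below. Only the KIT_GRAM_ENABLED name comes from this plan; the default value, the `DemoGateStatus` enum, and the idea that a disabled build exposes a stub returning an "unsupported" status are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical default; a real build would set this from its config. */
#ifndef KIT_GRAM_ENABLED
#define KIT_GRAM_ENABLED 1
#endif

typedef enum DemoGateStatus {
  DEMO_GATE_OK = 0,
  DEMO_GATE_UNSUPPORTED = 1
} DemoGateStatus;

#if KIT_GRAM_ENABLED
/* Enabled build: the real entry point would live here. */
DemoGateStatus demo_gram_compile(const char* text, size_t len) {
  assert(text || len == 0);
  (void)text;
  (void)len;
  return DEMO_GATE_OK;
}
#else
/* Disabled build: same symbol, so callers still link, but it always fails. */
DemoGateStatus demo_gram_compile(const char* text, size_t len) {
  (void)text;
  (void)len;
  return DEMO_GATE_UNSUPPORTED;
}
#endif
```

Each additional gate doubles the configurations that need a stub like this and a test, which is the cost the paragraph above is weighing.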

Implementation Phases

Phase 0: Freeze Standalone Behavior

Phase 1: Mechanical Import, Private Names Still Allowed Internally

Phase 2: Public Namespace Rewrite

Phase 3: Kit Context Rewrite

Phase 4: Driver Command

Phase 5: Cleanup and Documentation

Tests

Targeted tests should land with the import:

Map standalone tests as follows:

| Standalone test | Kit test |
| --- | --- |
| test/test_llgen_api.c | test/gram/api_test.c |
| test/test_calc.c | test/gram/calc_test.c |
| test/test_features.c | test/gram/features_test.c |
| test/test_lexer.c | test/gram/lexer_test.c |
| test/test_pratt_calc.c | test/gram/pratt_test.c |
| test/test_unicode_support.c | test/gram/unicode_test.c |
| test/test_unicode_lexer.c | test/gram/unicode_lexer_test.c |
| test/test_utf8_runtime.c | test/gram/utf8_runtime_test.c |
| test/errors/*.ebnf | test/gram/errors/*.ebnf |

Prefer red-green import steps:

  1. Add the test target and fixtures first.
  2. Import the runtime until hand-written/generated tables parse again.
  3. Import the generator until direct API tests pass.
  4. Add the driver command and smoke test last.

Open Questions