Main differences from yacc
~~~~~~~~~~~~~~~~~~~~~~~~~~

1. reentrant push API

The classic yacc API is pull: it runs a blocking loop that calls the lexer
to get tokens. byaccic implements a totally different, push API: the caller
reads the input, tokenizes it and calls the parser with each new token.
While this forces the caller to implement the parser loop, it also makes the
API much simpler and much more flexible:

 - the caller is free to pause parsing, or to time multiplex among multiple
   parallel streams, without threads or processes
 - the caller can decide to abort parsing at any time; the parser has no
   chance to get stuck in a long-running loop
 - there is no direct API between the parser and the lexer, which makes the
   APIs much simpler

2. no yy

Most grammar files will set a prefix, using %prefix% (preferred) or -p.
Unlike in yacc, the prefix really renames all symbols: there is no yy macro
that maps to the prefixed name. In practice this means that with
%prefix foo_% the grammar file needs to write foo_lval, foo_errok, etc.
instead of yylval, yyerrok, etc. On the one hand this ties the code to a
specific prefix; on the other hand it guarantees a clean namespace (no yy*
symbol can leak accidentally into any generated .h or .c file).

3. locations

There's no lloc. Locations can be recorded by the lexer and saved in the
token using %struct (see point 6). The parser does not need to know anything
about locations. Error reporting is done using a context struct and the
guilty token, which is passed to the error call so that it can print the
location stored in the token.

4. context pointers

There's a user-defined context pointer of type %prefix%_ctx_t. It is passed
along with every call. It is opaque to the auto-generated code (as if it
were void *), but the grammar can use it typed, typically as a struct.

All parser state is stored in a caller-allocated %prefix%_yyctx_t, which
must be passed to the parser calls.

5. file naming

All file names are explicit and specified on the command line. There are no
file name templates, prefixes or calculated file names.

6. %struct around %union

The classic %union can optionally be used to assign multiple different value
types to a token. With the %struct extension it is also easy to attach
arbitrary auxiliary fields to tokens (e.g. to keep track of token location).

 - When neither %union nor %struct is defined, the token is an int.
 - When only %union is defined, the token is the union.
 - When only %struct is defined, the token is the struct; the first field of
   the struct is "int un" (automatically injected by byaccic) and it holds
   the token value.
 - When both %struct and %union are defined, the first (automatically
   created) field of the struct is un, typed as the union.

In the end, the prefixed STYPE is the struct (if specified), or the union (if
specified), or int.

$$ and $n are generated automatically according to the existence of %struct:
when %struct is specified, a .un tag is inserted. However, the lexer call,
which gets STYPE as an argument to fill in, needs to know whether %struct is
in effect and address lval accordingly. For example, to assign 42 to the
lval, it needs one of the following statements:

 - struct-only:        lval->un = 42;
 - struct+union:       lval->un.numeric = 42;
 - union-only:         lval->numeric = 42;
 - no union or struct: lval = 42;

A typical use of %struct is to add location information to each token:

  %struct {
    int line, col;
  }

Then the lexer can manipulate lval->line and lval->col. The error() call
also gets the last token seen before the error happened.

Note: this does not introduce any change as long as %struct is not used, but
%struct has to be used to have locations.
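
The token layout described in point 6 can be illustrated with a small C
sketch for the case where both %struct and %union are in effect. The types
below are written by hand as stand-ins, they are not byaccic output: the
names foo_union_t, foo_STYPE, lex_number and format_error, and the union
member text, are assumptions made only for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hand-written stand-ins, NOT byaccic output: every name in this sketch
   (foo_union_t, foo_STYPE, lex_number, format_error) is an assumption. */

/* Roughly what "%union { int numeric; const char *text; }" describes */
typedef union {
    int numeric;
    const char *text;
} foo_union_t;

/* %struct + %union: the union is the first field, named un; line and col
   correspond to "%struct { int line, col; }" */
typedef struct {
    foo_union_t un;
    int line, col;
} foo_STYPE;

/* Lexer side: fill in the value and the location of a numeric token */
void lex_number(foo_STYPE *lval, int value, int line, int col)
{
    assert(lval != NULL);
    lval->un.numeric = value; /* addressing used when both are defined */
    lval->line = line;
    lval->col = col;
}

/* Error side: the location is read from the token itself, the parser
   is not involved. Returns the snprintf() result. */
int format_error(char *dst, size_t len, const foo_STYPE *tok, const char *msg)
{
    return snprintf(dst, len, "%d:%d: %s", tok->line, tok->col, msg);
}
```

With these stand-ins, calling lex_number(&tok, 42, 3, 7) and then
format_error(buf, sizeof buf, &tok, "syntax error") leaves the string
"3:7: syntax error" in buf. In the struct-only case the first assignment
would read lval->un = value instead.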