1. Language basics

Tokens:
	- flow control constructions (controls in short)
	- instructions
	- whitespace
	- data:
		- immediate: string in {}; escape sequences are applied
		- block: immediate string in [$ $], where $ is an arbitrary
		  character; no escape sequences applied
		- indirect: address (the data is fetched from the database at
		  that address)
	- statement separator (a sequence of ";" or newline characters)
	- comments

There is a database provided by the environment through a C API (each
instruction is implemented as a function that is called by tmpasm while
executing the script; resolving instruction names to function pointers is
also done by the environment). All data is stored in that database. There
are few assumptions about the database address or data formats: the parser
handles both data and addresses as strings, and an address is made of a
restricted set of characters.

The database is intended (but not strictly limited) to be a key=value
database. Whether it is a flat hash, a tree, an array indexed by integer
addresses or anything else is up to the environment (through hooks like
set() and get()).

1.1. immediate data

For the parser, immediate data is a string enclosed in {}. The following
escape sequences are substituted:
	\\   a single \
	\n   a newline
	\r   a carriage return
	\t   a tab
	\o   brace open ("{")
	\c   brace close ("}")

If any other character follows the \, it is appended as a character, without
further processing (thus "\}" is an alias for \c and "\j" is "j").

1.2. indirect address

An address is a string that addresses a node in the database. An address is
written without {}; its format as a regexp is:
	[A-Za-z0-9_./-]+

1.3. block

A block is written as [$ $], where $ is an arbitrary separator character
which the string may not contain. The block is closed by the separator and
a ]. The separator is also used to embrace inlined node values.

The block is a different form of a string; it keeps data more intact but
restricts it somewhat, since there must be a separator character that is
not present in the string.
In return it is very easy to mix verbatim text with data printed from the
database, which, as tmpasm is a templating language, is one of the main
features. This is done by inlining variables in blocks using the chosen
separator character. For example in block:
	[@text text @myvar@ text text@]
the @myvar@ part will be substituted with the value of variable myvar.

1.4. comment

A comment may start anywhere a keyword, instruction or argument is
expected. The comment starts with a hashmark ("#") and ends in a newline
character. The newline character is taken as a statement separator. A
semicolon does not end the comment (comments may contain semicolons).

2. environment

Instructions operate on database nodes (using the get and set hooks). The
database is provided by the environment. The environment provides the
following hooks:
	- set(address, str)
		both address and str are immediate strings; sets the data of
		the addressed database node to str (overwrites an existing
		node, creates a missing one)
	- get(address)
		returns the data of the addressed node as a string
	- is_true(str)
		decides if data is true or false for flow control; what values
		are interpreted as true is up to the env, but most libraries
		will use "1".
	- match(str1, str2)
		returns true if str1 data matches str2 pattern (however, data
		and pattern formats are defined by the env)
	- first(str, &st, lstr)
		take str as a list and iterate on it; return the first item
		and set up the state (st). lstr is the list string.
	- next(str, &st)
		return the next item and update the state; return NULL at end

3. flow control and instructions

The script consists of instructions and flow control constructions.
Instructions and controls are separated by a sequence of statement
separators (\n or \r or semicolon). A control may have arguments including
keywords, separated by whitespace (the exact syntax for each control is
specified below). An instruction may have a whitespace separated list of
arguments.

Keywords are: end, if, then, else, foreach, in, switch, case, default.
Keywords can be used as data, but using them as variable names is not
recommended.

3.1. instruction

	instr;
	instr arg1 arg2 arg3;

The instruction is executed by the environment. The maximum length of an
instruction name is configured by the environment (in TMPASM_INSTR_MAXLEN,
at compile time) and is typically 32 bytes. Arguments may be of any data
type. The argument list may be empty.

3.2. if-then-else

	if data then code1; code1; code1; else code2; code2; code2; end;
	if data then code1; code1; code1; end;

Invokes is_true() on data and runs code1 (when true) or code2 (when false).
The else branch may be omitted. A special application of the syntax is to
have no code in "then", which in effect inverts the condition (code1 runs
if data is false):

	if data then else code1; code1; code1; end;

3.3. foreach

	foreach address in data code; code; code; end;

Invokes first()/next() of the env to iterate over data. In each iteration
the node at address is set to the current element of the list and code is
executed.

3.4. switch

	switch data
		case data1 code1; code1; code1; code1; end;
		case data2 code2; code2; code2; code2; end;
		default code3; code3; code3; code3; end;
	end;

4. scconfig lib

The following instructions are defined when tmpasm is compiled in scconfig.

4.1. core (mandatory)

4.1.1. put

	put address data

Copy data into address using the set() hook.

4.1.2. print

	print data

Output data to the current output fd.

4.1.3. exit (TODO)

	exit data
	exit

Exits immediately, with status code 0 or data converted to integer.

4.1.4. include (TODO)

	include data

Include script from a file (starts a new parsing context recursively).

4.1.5. eval (TODO)

	eval data

Execute a script from a node (starts a new parsing context recursively).

4.2. regex (TODO)

4.2.1. regex substitution

	sub address data-pattern data-subst
	gsub address data-pattern data-subst

Replace the first (sub) or all (gsub) matches of data-pattern with
data-subst in the node at address.

4.2.2. regex match

	match address data data-pattern

Writes "1" or "0" into the node addressed depending on whether data matches
data-pattern.

4.3. string (TODO)

4.3.1. uniq (TODO)

4.3.2. isempty (TODO)

4.3.3. invert (TODO)

4.4. file I/O (TODO)

4.4.1. redir

	redir data

Set output redirection to the file named in data. If data is empty, output
redirection is set back to the default (stdout normally). If data is "&2",
redirection is set to stderr.

4.4.2. error

	error data

Print a message to stderr.

5. examples

5.1. generate Makefile with features; database is a hash (scconfig)

put /local/cflags {-std=c99 -Wall}
put /local/ldflags {-lm}
put /local/objs {main.o foo.o bar.o}

if /local/debug then
	append /local/cflags {-g}
else
	append /local/cflags {-O2}
end

isempty /local/r /local/somelib
invert /local/r
if /local/r then
	append /local/cflags { -I/usr/include/somelib}
	append /local/ldflags { -lsomelib}
end

print [@
# Makefile generated by scconfig - please edit Makefile.in
CFLAGS=@/local/cflags@
LDFLAGS=@/local/ldflags@
OBJS=@/local/objs@

all: main

main: $(OBJS)
	$(CC) $(LDFLAGS)
@]

foreach /local/o in /local/objs
	put /local/c /local/o
	sub /local/c {.o$} {.c}
	print [@
@/local/o@: @/local/c@
	$(CC) -c $(CFLAGS) @/local/c@ -o @/local/o@
@]
end
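
As a rough illustration of sections 1.1, 1.3, 2 and 3.3, the Python sketch
below models immediate-data unescaping, block inlining and a flat-hash
environment. It is not the scconfig C implementation: HashEnv, unescape(),
expand_block() and foreach() are hypothetical names, the Python signatures
of first() and next() only approximate the C hooks, and the choices of
whitespace-separated lists, regex matching and an empty string for missing
nodes are assumptions made for the example.

```python
import re

# Address syntax from section 1.2.
ADDRESS = re.compile(r"[A-Za-z0-9_./-]+")

# Escape sequences from section 1.1, keyed by the character after the backslash.
ESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t", "o": "{", "c": "}"}


def unescape(body):
    """Decode the text found between { and } of an immediate string."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Unknown escapes yield the character itself: "\j" becomes "j".
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            # Assumption: a trailing lone backslash is kept as-is.
            out.append(body[i])
            i += 1
    return "".join(out)


def expand_block(block, get):
    """Expand a [$...$] block; text between separator pairs is an address."""
    if len(block) < 4 or block[0] != "[" or block[-1] != "]" or block[1] != block[-2]:
        raise ValueError("not a block: %r" % block)
    parts = block[2:-2].split(block[1])
    if len(parts) % 2 == 0:
        raise ValueError("unterminated inlined address in %r" % block)
    out = []
    for idx, part in enumerate(parts):
        if idx % 2 == 0:
            out.append(part)        # verbatim text, no escapes applied
        elif ADDRESS.fullmatch(part):
            out.append(get(part))   # inlined node value
        else:
            raise ValueError("invalid address: %r" % part)
    return "".join(out)


class HashEnv:
    """Hypothetical flat-hash environment mirroring the hooks of section 2."""

    def __init__(self):
        self.db = {}

    def set(self, address, data):
        self.db[address] = data             # overwrites or creates the node

    def get(self, address):
        return self.db.get(address, "")     # assumption: missing node is ""

    def is_true(self, data):
        return data == "1"                  # "1" is the common convention

    def match(self, data, pattern):
        return re.search(pattern, data) is not None  # assumption: regex patterns

    def first(self, lstr):
        st = {"items": lstr.split(), "pos": 0}  # assumption: whitespace list
        return self.next(st), st

    def next(self, st):
        if st["pos"] >= len(st["items"]):
            return None                     # stands in for NULL at end of list
        item = st["items"][st["pos"]]
        st["pos"] += 1
        return item


def foreach(env, address, lstr, body):
    """Model of 'foreach address in data': set the node, then run the body."""
    item, st = env.first(lstr)
    while item is not None:
        env.set(address, item)
        body()
        item = env.next(st)


env = HashEnv()
env.set("/local/objs", unescape("main.o foo.o bar.o"))
lines = []
foreach(env, "/local/o", env.get("/local/objs"),
        lambda: lines.append(expand_block("[@obj=@/local/o@@]", env.get)))
print(lines)  # ['obj=main.o', 'obj=foo.o', 'obj=bar.o']
```

Keeping the iteration state in a separate object passed between first() and
next() follows the hook description in section 2; a real environment is free
to store the list and position differently.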