gentrex - the language

terminology

There is a caller (e.g. a host application) that calls the lib (the library that implements gentrex) for running a query. Queries are scripts that are executed in a Virtual Machine (VM) by the lib. Queries operate on trees. The tree is maintained by the caller, the lib uses the API provided by the caller to access the tree. Thus the lib is pretty flexible on how the tree is actually implemented, where/how it is stored. The only assumptions for trees are:

There is a query state the caller needs to initialize first. The query state includes:

The map is used for

The language: grammar

	expr:
		[ tag_expr ]
		[]                /* always matches the current tag */
		node_ref
		! expr            /* invert the logic value of an expr; if it was NULL, the new value is TRUE */
		expr : expr       /* apply the right expr on the result of the left expr; e.g. "r : [name==A]" means the right tag match is applied on backreferenced node "r" */
		expr / expr       /* right expr matched from any first-level children of the left node; evaluates to the right expr */
		expr / loop expr  /* same as above, but the / operator looped on multiple levels of the tree, considering all children on all levels */
		expr . expr       /* right expr matched from the first child node of the node named by left node; evaluates to the right expr */
		expr . loop expr  /* same as above, but the . operator looped on multiple levels of the tree, always considering the first child only */
		expr ^ expr       /* right expr matched against the direct parent node of the node selected by the left expr; evaluates to the right expr */
		expr ^ loop expr  /* same as above, but the ^ operator looped: any of the ascendents to the depth loop specifies */
		expr > expr       /* right expr matched against the next (sibling) node of the node named by the left expr; evaluates to the right expr */
		expr > loop expr  /* same as above, but the > operator looped: any of the right siblings, spanning up to what loop specifies */

		expr && expr      /* evaluate left expr, if it matches evaluate the right expr; true if both matches (evaluates to left) */
		expr || expr      /* evaluate left expr, if it doesn't match evaluate the right expr; true if either matches (evaluates to left or if left is FALSE, to the right) */
		name = expr       /* overwrite the value of name in the query state map */
		(expr)

	loop:
		*                 /* previous operator repeated zero or more times */
		+                 /* previous operator repeated one or more times */
		{ int }           /* previous operator repeated exactly int times */
		{ int1 - int2 }   /* previous operator repeated any times between int1 and int2, inclusive */

	node_ref:
		name              /* matches a node only if it is the node named; evaluates to the node if matches (else to NULL); name is alnum */

	tag_expr:
		! tag_expr                  /* invert the logic value of a tag_expr */
		tag == value                /* tag is the node's tag; case sensitive string match */
		tag ~ sep r sep             /* matches a regex specified between two separator characters */
		tag c_op value              /* compare tag to value using a custom operator */
		tag_expr || tag_expr
		tag_expr && tag_expr
		(tag_expr)

	value:
		string                  /* string consists of alnum */
		" string "              /* quoted string consists of any char (standard \ quoting) */

The language: expressions

Each query is an expression. Expressions are built from expressions and operators. Leaf expressions are are either tag_expr or map references.

The value of an expression is a node or FALSE. FALSE can be interpreted as a node reference to a non-existing node (sort of a NULL). Anything non-FALSE is true and is a reference to any existing node. If a true value is negated, its value becomes FALSE. If a true value is negated twice, it retrains its original value.

The value of a tag_expr is false or the node it was matched against. A tag_expr is used to decide whether a given node matches specific criterias evaluating the tags of the node.

When evaluating an expression, there is a Node-Under-Test, or NUT for short; tag_expr is always matched against NUT.The NUT is the root node of the tree or sub-tree when the evaluation of an expression starts. Tree operators change the NUT.

tag_expr

The purpose of a tag_expr is to match the tags of the NUT against an an expression. A tag_expr is always in square brakcets. They can not be nested.

The simples tag expression is the empty expression:

[]
This will always match the current NUT. Tags of a tree can be matched using two builtin syntaxes and a custom one:

Custom operators are strings consists of the following characters:

'<', '>', '=', '!', '*', '/', '^', '.'
Custom operators can not be longer than 20 characters.

Furthermore the following operators are accepted within a tag_expr:

Examples:
[color==green] matches if the "color" tag of the NUT is "green"
[color==green || color=red] matches if the "color" tag of the NUT is "green" or "red"
[ !(color==green || color=red) && (type ~ foo) ] matches if the "color" tag of the NUT is not "green" or "red" and type matches foo.
If a tag_expr matches the NUT, the result is the NUT (true). Else the result is FALSE.

non-looped tree operators

A tree operator takes the result of its left expression, which is either a node in the tree or false. If it's false, it evaluates to false. If it's a node, it hops to a relative of the node. The hop means the selected node becomes the NUT on which the right side of the operator is evaluated.

For example the parent operator (^) takes the parent of the NUT. The expression "[] ^ [color==red]" is executed as:

  1. there's a NUT by the time this expression is running
  2. execute the left side on this NUT
  3. if the result is false, the expression is false, do not execute the next 3 steps
  4. else take the parent of the result and make it the new NUT
  5. execute tag_expr "[color==red]" on the new NUT
  6. the result of the expression is the result of that tag_expr
Or in other words the above example says "return NUT's parent if NUT's parent matches to [color==red]"; since the left side always matches. A slightly more complex example is "[type=foo] ^ [color==red]" which means: "if NUT's type is foo and NUT's parent's color is red, evaluate to NUT's parent."

NOTE: sides of a tree operator are expressions, they don't have to be tag_expr but can be more complex expressions.

Tree operators:

Examples:
[] . [] . [] 1/1 granchild: take a first child of NUT, then take the first child of the child; evaluate a reference to the rightmost node.
[] / [] . [color == red] take a child of NUT whose first child has tag color==red; evaluate a reference to the rightmost node.
[] . [] ^ [] Evaluates to the NUT itself, if it had a child. If the NUT doesn't have a child, the second [] will evaluate to false as it will not match non-existing nodes.

looped tree operators

Any of the above tree operator works in looped mode, that means instead of selecting a direct relative of NUT they loop to find all relatives of NUT in a range. Range is written directly after the tree operator and is exactly like in regex:
* previous operator repeated zero or more times
+ previous operator repeated one or more times
{ int } previous operator repeated exactly int times
{ int1 - int2 } previous operator repeated any times between int1 and int2, inclusive
This means a query can yield multiple results: the loop is evaluated to all nodes within the specified range and each evaluation can result in a true. Examples:
[] >+ [] iterate over all siblings of the NUT
[] >* [] iterate over all siblings of the NUT, including the NUT itself (the important case of "zero times")
[] ^{2-3} [color=red] check the grand-parent and the parent of the grand-parent of NUT; return such an ancestor if it is red
[] . [] >* [] take all the children of NUT (NOTE: the second and third [] will be the same for the first child, * allows that; same as "[] / []")
Such looped operators are actual loops in the lib code, running in O(n) time, and loops can easily be nested:
[] >+ [] . [color=green] >* [] Iterate over all siblings of the NUT, taking the first child of the given sibling. If that child is green, return it and all its siblings Two loops nested: O(n^2)
For the ^, . and > operators the meaning of the range and direction of search is obvious. The / operator does a BSF from NUT and the range applies on the depth it went into children levels. TODO: examples, drawing

logic operators

Expressions can be used to form more complex expressions using the operators &&, ||, ! and (). Their meaning is the same as in tag_expr, but on the level of the expression they operate on nodes matched by tag_exprs. Evaluation of && and || is lazy: if after evaluation of the left side the evaluation of the right side would not be able to change the result, the right side is skipped. This is important if the right side has side effects (see later). The result of && is FALSE or the left expression; the result of || is the expression that was first true, from left to right (or FALSE if both were false).

Examples:
([] > [color==blue]) && ([] . [color==red]) matches if the right sibling of NUT is blue and the first child of NUT is red; the value of the expression is the right sibling if both criterias are met or else FALSE.
([] > [color==blue]) || ([] . [color==red]) matches if either the right sibling of NUT is blue or the first child of NUT is red; the value of the expression is the right sibling if it is blue, else the first child of NUT if it is red, else FALSE.
!([] . [color==red]) matches if the first child of NUT is not red. Evaluates to the child or FALSE. Same as
[] . [!color==red]
!!([] . [color==red]) matches if the first child of NUT is red. Evaluates to the child or FALSE. Same as
!([] . [!color==red])

map assignments

At any point map variables can be assigned. The syntax is name=expr. For example the following expression will BFS all nodes from the root, matching node paths of at least 3 nodes in depth:
 [] /+ m=[] /+ t=[]
It will effectively find all combinations so that: On a reasonable sized tree there would be multiple results. For each succesful match the result of the expression is the same as the value of t, while m is set to a node between the root and t.

This feature allows the caller to extract detailed search results beyond the one node that is returned by the evaluation as result.

map references

Nodes stored in the map can be referenced by their name. This can be used to backreference a node found by an iteration earlier in the expression:
 ([] /+ m=[] /+ t=[]) && (m ^ [color==red])
The left side of the expression is the same as in the previous example; the right side filters the result and accepts only the those matches for which "m's parent is red" is true.

The caller may set map entries before evaluating the expression, and the evaluation is able to access those map values (can read or overwrite them). This feature allows the caller to "pass on parameters" to the evaluation through map variables. For example

[] /+ []
is a BFS from the root of the tree, while
e /+ []
is a BFS from whichever node map variable 'e' points to.

Note: it is the responsibility of the caller to reset the map between iterations (repeated evaluation of the same expression on the same tree). When some variables of the map are not reset, they can be used to communicate nodes between iterations.

strings and string matches

A tag_expr may compare a string to the value of a tag. If the string contains non-alnum characters, it shall be specified in double quotes. Within double quotes the usual backslash escaping are interpreted for \n, \r, \t and \xII (where II is a two-digit hexadecimal integer). \" means a literal double quote. For example
[text=="foo\"bar"]
matches if NUT's text is foo"bar.

Beside string equality check, there is a so called "string matching" operation. What "matching" exactly means is up to the caller (most often it is a variant of regex). The syntax for string matching is slightly different:

tag_name ~SEP pattern SEP
where ~ is the tilde character (string match operator) and SEP is an arbitrary caharacter that does not appear in the pattern. The spaces between the SEP characters and pattern are not part of the syntax, if they are present, they are already part of the pattern. There must be no extra space between the ~ operator and SEP, but SEP might be whitespace. For example the following six tag_exprs all perform the same string pattern matching to "pattern":
[tag_name ~/pattern/]
[tag_name ~!pattern!]
[tag_name ~"pattern"]
[tag_name ~ypatterny]
[tag_name ~ pattern ]
[tag_name ~
pattern
]
SEP may be any character except \0.