Lihata C API

1. Introduction

Lihata C API is a set of libraries layered on top of each other in a way the user can choose how many their application code wants to use. The layers are: Size, code complexity and API complexity of the layers are increasing from the core toward the higher levels.

Each level of the stack is designed to be reasonably reentrant:

This is achieved by using document handles on all levels. These handles store all internal data and parser states of the document and the library does not depend on global variables. For many read operations (most notably tree print functions) any call-local storage is allocated on the stack so multiple concurrent calls on the same non-changing tree shall work.

No layer of the library depends on threading or thread libraries. For non-threaded applications the above concurrency rules restrict operations that can be executed from callback functions. In threaded applications the write operations shall put exclusive read-write lock on the lihata document to make sure no read sessions are confused by the write. For example a recursive tree print of a lihata document in one thread may produce undefined behavior if the tree is changed during the descend.

2. The core

The core library is not useful alone, but is required by all above layers. It is dealing with basic types and constants derived from the specification and possible parse errors.

3. The event parser

The event parser consumes the document character by character in a non-blocking manner. The caller is responsible for feeding the parser until it returns an error or PE_STOP. Normally the caller shall pass on the document without any filtering (even passing the EOF). If the parser detects an error or a valid EOF (root node closed), it stops parsing and returns PE_STOP upon subsequent calls. This means the event parser ignores anything beyond the root node.

Parsing a stream with a single root node behaves exactly as parsing a file document. When parsing a streams without a root node, the caller should reinitialize the event parser after the root node is closed.

Detecting when the root node is closed can be implemented in different ways:

The event parser will generate open/close events in pair for nodes that may have children. The close event is anonymous - if the event callback needs to pair up close events with node events, or remember the node name or type at close, it should maintain its own internal database of open nodes. Node types that can not have children will trigger only a single event that will hold all properties of the node (i.e. text or symlink nodes will not have open/close events but a single textdata event).

The event parser is not required to remember more than the maximum two textual tokens and the type of the current node. This allows the caller to set an upper limit of memory usage of the event parser. However, this also means the event parser can not do all checks and validation of the document the lihata specification requires, and it is the responsibility of the caller to implement those:

Note: the DOM parser does implement all these checks.

The event parser is recommended when the application is not required to have random access on the data but can process it sequentially. This often means the document consists of lists/tables and the application expects a specific order of data; or the document is built of hashes and the application sets the value of variables by name. The simplest example expects a single list or hash node as the root with setting nodes of known order and/or name.

NOTE: This should not mean the order of nodes matters under a hash.

4. The DOM parser

4.1. generic concepts: documents, trees, nodes

The DOM parser uses the output of the event parser to build an in-memory representation of a lihata document. To distiguish between them the latter will be called lihata stream in this document, and will mean a lihata document read from or written to a file, a network connection, a pipe, etc; document will always mean the in-memory repersentation.

The sequence for loading a stream into a document (lht_doc_t *):

There are accessors for many non-trivial functionality, but there are no accessors for the trivial ones. Instead, the relevant fields of the public struct (lht_doc_t) should be read (or in rare cases written) directly. The most important fields are:

The same rule applies on nodes; the node struct (lht_node_t) has the following public fields:

After loading a lihata stream the document is integral which means all nodes are part of the tree and are accessible from the root. There are four ways a node or subtree can be removed from the tree:

Note: the above calls take a node pointer to be unlinked/detached/deleted/moved. Getting such node pointers is usually done by resolving paths using the path API or manually descending the tree using lower level calls and/or struct fields.

It is possible to create floating nodes with the low level node creating calls. A floating node is not part of any document and does not have location information. All administration regarding to floating nodes is the duty of the caller. A floating subtree can be destroyed by deleting it using lht_tree_del().

4.2. location

A document is most often loaded from a stream and not created from scratch by the caller. During loading, the DOM parser attaches location information to each node (stream name and line number). After merging trees the location references of a single document may point to different stream names.

The following three read-only fields are used to store the location in the node_t struct:

To save memory and speed up location handling, each document has a hash table of all stream names ever referenced by any location of any node that ever have been part of the document. file_name may be NULL (e.g. for nodes created after the load operation).

4.3. data

The ->data field of node_t holds the value of the node in the following format: NOTE: anything else than text and symlink value should be accessed using high level accessors as described in section 5.

4.4. doc uninit

When a document is no longer needed, lht_dom_uninit() should be called on it to free up all resources, including the unlinked list.

4.5. iterator

Two functions and a mostly opaque struct are provided for iterating over the children of a node (regardless of the type of the node). The loop should call lht_dom_first() before the first iteration and lht_dom_next() for subsequent calls. The state pointer (lht_dom_iterator_t *) usually points to a local variable which doesn't need to be initialized before the first iteration or cleaned up after the last iteration.

The iterator considers only the direct (first-level) children of the node.

The caller is free to break the loop. Inserting new children or deleting existing children under the node may cause undefined behaviour.

4.6. debug/diganostic

Function lht_dom_pnode() is provided for printing a node non-recursively and function lht_dom_ptree() for printing a subtree recursively. Both calls get a node argument, an output FILE * and a prefix which is inserted before each line printed. Output is a line based textual format intended for diagnostics, often including internal states/pointers.

4.7. building a tree

Allocating a new floating node is possible using the lht_dom_node_alloc(). A floating node can become part of a tree by:

4.8. per-type calls

The following sections describe special, type-specific accessors. They are useful for low level operations, but very often can be avoided by using higher level API described later. For example instead of using the node->next field in a list or addressing from row;col loops in a table, the iterator API can visit each child of a node regardless of the node type. When seeking a node or descending the tree, the path API is the type-independent generic way.

4.8.1. list

Calls lht_err_t lht_dom_list_append() and lht_dom_list_insert() are provided for growing lists. Counting the number of elements on a list is done by int lht_dom_list_len(). There is no type-independent API for these.

Looking up children nodes can be done using the higher level path API, or the list-specific calls lht_tree_list_nth(), lht_tree_list_nthname().

Deleting/detaching/unlinking/replacing children can be done by the generic API or by lht_tree_list_del_nth, lht_err_t lht_tree_list_del_child, lht_tree_list_detach_nth, lht_err_t lht_tree_list_detach_child, lht_tree_list_replace_child.

4.8.2. hash

Inserting a node in a hash is done using lht_dom_hash_put(). Children nodes can be counted by lht_tree_hash_len(). There is no high level equivalent for these calls.

Looking up a node in a hash, by name is possible using lht_dom_hash_get() or by using the path API.

Deleting/detaching/replacing children can be done by the generic API or by lht_tree_hash_del_child(), lht_tree_hash_detach_child(), lht_tree_hash_replace_child().

4.8.3. table

There is no direct way of inserting nodes in a table. Instead, a row or column of anonymous empty strings can be inserted using lht_tree_table_ins_row() and lht_tree_table_ins_col() then any empty (or non-empty) cell can be replaced using the generic lht_tree_replace() (after looking up the node pointer of the cell).

It is possible to delete a whole row or column using lht_tree_table_del_row() and lht_tree_table_del_col(). When a single cell is deleted using lht_tree_del() it is replaced by the anonymous empty string.

Looking up the node pointer of a cell can be done by the path API or by lht_dom_table_cell() and lht_tree_table_named_cell().

4.9. node replace

Any node of a tree can be replaced with a detached node using lht_tree_replace(). The replace operation detaches the original node from the tree (and its document) and links the new node in place. The detached old node (and subtree) becomes floating and shall be destroyed (or inserted in another document) by the caller.