Lihata C API
1. Introduction
The Lihata C API is a set of libraries layered on top of each other, so that
the application can use only as many layers as it needs. The layers
are:
- core constants and types and associated utility functions (lihata.h)
- event parser (parser.h) - depends on core
- DOM parser that builds a tree (dom.h) - depends on the event parser
- tree utils (tree.h) - depends on the DOM parser
Size, code complexity and API complexity increase from
the core toward the higher layers.
Each level of the stack is designed to be reasonably reentrant:
- the application may maintain multiple lihata documents
- threaded apps: concurrent read/write operations on different documents will work
- threaded apps: concurrent read operations are guaranteed to work on a single document
- threaded apps: reading a document while another thread writes it will not work: the application must not change a document while other threads are reading it
- threaded apps: concurrent write operations will not work on a single document; the application is responsible for implementing a locking mechanism
This is achieved by using document handles on all levels. These handles
store all internal data and parser states of the document and the library
does not depend on global variables. For many read operations (most notably
tree print functions) any call-local storage is allocated on the stack so
multiple concurrent calls on the same non-changing tree shall work.
No layer of the library depends on threading or thread libraries. For
non-threaded applications the above concurrency rules restrict the operations
that can be executed from callback functions. In threaded applications
a writer shall hold an exclusive (write) lock on the lihata
document to make sure no read session is confused by the write. For example,
a recursive tree print of a lihata document in one thread may produce undefined
behavior if the tree is changed during the descent.
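The locking rule above can be sketched with POSIX read-write locks. This is
only a minimal sketch and not part of the lihata API: locked_doc_t,
locked_doc_read() and locked_doc_write() are hypothetical application-side
helpers, and the document handle is kept as an opaque void pointer because
the real handle type depends on the layer in use. The demo_* operations
exist only to exercise the wrapper on a stand-in "document" that is a plain int.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical wrapper: an opaque document handle plus its lock */
typedef struct {
    void *doc;                /* stand-in for a lihata document handle */
    pthread_rwlock_t lock;
} locked_doc_t;

/* An operation on the document; returns 0 on success */
typedef int (*doc_op_t)(void *doc, void *user_data);

int locked_doc_init(locked_doc_t *ld, void *doc)
{
    assert(ld != NULL);
    ld->doc = doc;
    return pthread_rwlock_init(&ld->lock, NULL);
}

void locked_doc_uninit(locked_doc_t *ld)
{
    pthread_rwlock_destroy(&ld->lock);
}

/* Read-only access: any number of readers may hold the lock together */
int locked_doc_read(locked_doc_t *ld, doc_op_t op, void *user_data)
{
    int res;
    if (pthread_rwlock_rdlock(&ld->lock) != 0)
        return -1;
    res = op(ld->doc, user_data);
    pthread_rwlock_unlock(&ld->lock);
    return res;
}

/* Modifying access: the writer is alone, readers wait until it is done */
int locked_doc_write(locked_doc_t *ld, doc_op_t op, void *user_data)
{
    int res;
    if (pthread_rwlock_wrlock(&ld->lock) != 0)
        return -1;
    res = op(ld->doc, user_data);
    pthread_rwlock_unlock(&ld->lock);
    return res;
}

/* Demo operations on a stand-in document (an int) */
int demo_read_op(void *doc, void *user_data)
{
    *(int *)user_data = *(int *)doc;
    return 0;
}

int demo_write_op(void *doc, void *user_data)
{
    *(int *)doc = *(int *)user_data;
    return 0;
}
```

The scheme protects the document only if every thread goes through the same
wrapper. An operation running under locked_doc_read() must not call
locked_doc_write() on the same document: requesting the write lock while
holding the read lock may deadlock.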
2. The core
The core library is not useful alone, but is required by all the layers
above it. It deals with the basic types and constants derived from
the specification, and with the possible parse errors.
3. The event parser
The event parser consumes the document character by character in
a non-blocking manner. The caller is responsible for feeding the parser
until it returns an error or PE_STOP. Normally the caller shall
pass on the document without any filtering (even passing the EOF). If
the parser detects an error or a valid EOF (root node closed), it stops
parsing and returns PE_STOP upon subsequent calls. This means
the event parser ignores anything beyond the root node.
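The feeding loop described above can be sketched as follows. To keep the
example self-contained it does not call the lihata parser: demo_parser_t,
demo_parser_char() and the DEMO_PE_* codes are hypothetical stand-ins that
only imitate the documented return value behavior (a toy that treats '{' as
open and '}' as close). Only demo_feed() illustrates the caller's side.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result codes, modeled on the behavior described above */
typedef enum { DEMO_PE_SUCCESS, DEMO_PE_STOP, DEMO_PE_ERROR } demo_pe_t;

/* Toy stand-in parser state: nesting depth and a "finished" flag */
typedef struct {
    int depth;
    int stopped;
} demo_parser_t;

void demo_parser_init(demo_parser_t *p)
{
    p->depth = 0;
    p->stopped = 0;
}

/* Toy stand-in: consumes one character; once the root is closed or an
   error was found, every further call returns DEMO_PE_STOP */
demo_pe_t demo_parser_char(demo_parser_t *p, int c)
{
    if (p->stopped)
        return DEMO_PE_STOP;
    if (c == '{') {
        p->depth++;
    }
    else if (c == '}') {
        if (p->depth == 0) {
            p->stopped = 1;
            return DEMO_PE_ERROR;
        }
        if (--p->depth == 0)
            p->stopped = 1;      /* root closed */
    }
    else if (c == EOF) {
        p->stopped = 1;
        if (p->depth > 0)
            return DEMO_PE_ERROR; /* EOF inside an open node */
    }
    return DEMO_PE_SUCCESS;
}

/* Caller's side: pass every character unfiltered, then EOF, until the
   parser returns anything other than success. *accepted receives the
   number of characters the parser took before it stopped. */
demo_pe_t demo_feed(demo_parser_t *p, const char *s, size_t *accepted)
{
    demo_pe_t res;
    size_t n = 0;

    assert(p != NULL && s != NULL && accepted != NULL);
    for (;;) {
        int c = (s[n] != '\0') ? (unsigned char)s[n] : EOF;
        res = demo_parser_char(p, c);
        if (res != DEMO_PE_SUCCESS)
            break;
        if (c != EOF)
            n++;
    }
    *accepted = n;
    return res;
}
```

Feeding "{a{b}}xyz" stops with DEMO_PE_STOP after six accepted characters:
the trailing "xyz" is beyond the root node and is never parsed.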
Parsing a stream with a single root node behaves exactly as parsing a
file document. When parsing a stream without a single root node, the caller
should reinitialize the event parser after each root node is closed.
Detecting when the root node is closed can be implemented in different
ways:
- check whether two consecutive calls return PE_SUCCESS and then PE_STOP
- track the depth of nesting in the event handler; when the outermost node is closed, the root is closed
The event parser will generate open/close events in pairs for nodes that may
have children. The close event is anonymous: if the event callback needs to
pair up close events with the corresponding open events, or remember the node
name or type at close, it should maintain its own internal database of open nodes.
Node types that cannot have children will trigger only a single event
that holds all properties of the node (i.e. text or symlink nodes
will not have open/close events but a single textdata event).
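One possible form of such a database of open nodes is a fixed-size stack:
push on every open event, pop on every close event. All names below
(open_stack_t, open_node_t and the limits) are hypothetical, and the node
type is kept as a plain int rather than a real lihata type. The name is
copied because this sketch assumes the text passed to the callback is not
valid after the callback returns.

```c
#include <assert.h>
#include <string.h>

#define OPEN_STACK_MAX 64   /* deepest nesting this sketch accepts */
#define OPEN_NAME_MAX  32   /* longer names are truncated */

typedef struct {
    int type;                    /* application-defined node type code */
    char name[OPEN_NAME_MAX];    /* private copy of the node name */
} open_node_t;

typedef struct {
    open_node_t nodes[OPEN_STACK_MAX];
    int used;
} open_stack_t;

void open_stack_init(open_stack_t *s)
{
    s->used = 0;
}

/* Call on an open event; returns 0 on success, -1 if nesting is too deep */
int open_stack_push(open_stack_t *s, int type, const char *name)
{
    open_node_t *n;

    assert(s != NULL && name != NULL);
    if (s->used >= OPEN_STACK_MAX)
        return -1;
    n = &s->nodes[s->used++];
    n->type = type;
    strncpy(n->name, name, OPEN_NAME_MAX - 1);
    n->name[OPEN_NAME_MAX - 1] = '\0';
    return 0;
}

/* Call on a close event; returns the node being closed, or NULL if no
   node is open. The pointer stays valid until the next push. */
const open_node_t *open_stack_pop(open_stack_t *s)
{
    assert(s != NULL);
    if (s->used == 0)
        return NULL;
    return &s->nodes[--s->used];
}
```

The fixed limits keep the memory usage bounded, in line with the bounded
memory usage of the event parser itself.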
The event parser is not required to remember more than
two textual tokens at most and the type of the current node. This allows the
caller to set an upper limit on the memory usage of the event parser. However,
this also means the event parser cannot do all the checks and validation
of the document that the lihata specification requires, and it is the
responsibility of the caller to implement those:
- keys in a hash must be unique
- children of a table must be lists
- rows of a table should have the same length
Note: the DOM parser does implement all these checks.
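Two of these checks can be sketched on the caller's side as follows. This
is only an illustration under assumptions: key_set_t and row_check_t are
hypothetical helpers, the limits are arbitrary, and the caller is expected
to invoke them from its own event handler (one key set per open hash, one
row checker per open table).

```c
#include <assert.h>
#include <string.h>

#define KEY_SET_MAX 128  /* most keys tracked per hash in this sketch */
#define KEY_LEN_MAX 64   /* longest key, including the terminator */

/* Keys seen so far under one hash node; keys are stored as copies */
typedef struct {
    char keys[KEY_SET_MAX][KEY_LEN_MAX];
    int used;
} key_set_t;

void key_set_init(key_set_t *ks)
{
    ks->used = 0;
}

/* Returns 0 if the key is new, 1 if it is a duplicate, -1 if the key
   is too long or the set is full */
int key_set_add(key_set_t *ks, const char *key)
{
    int n;

    assert(ks != NULL && key != NULL);
    if (strlen(key) >= KEY_LEN_MAX)
        return -1;
    for (n = 0; n < ks->used; n++)
        if (strcmp(ks->keys[n], key) == 0)
            return 1;
    if (ks->used >= KEY_SET_MAX)
        return -1;
    strcpy(ks->keys[ks->used++], key);
    return 0;
}

/* Row length check for one table */
typedef struct {
    int expected;  /* length of the first row, -1 until it is known */
    int current;   /* cells counted in the row being read */
} row_check_t;

void row_check_init(row_check_t *rc)
{
    rc->expected = -1;
    rc->current = 0;
}

/* Call for every cell of the current row */
void row_check_cell(row_check_t *rc)
{
    rc->current++;
}

/* Call when the row is closed; returns 0 if the row has the same length
   as the first row, -1 otherwise */
int row_check_end_row(row_check_t *rc)
{
    int len = rc->current;

    rc->current = 0;
    if (rc->expected < 0) {
        rc->expected = len;
        return 0;
    }
    return (len == rc->expected) ? 0 : -1;
}
```

The key set is the expensive part: unlike the event parser, it has to
remember every key of the hash, so the caller trades the fixed memory
bound for the uniqueness check.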
The event parser is recommended when the application does not
need random access to the data but can process it
sequentially. This often means the document consists of lists/tables
and the application expects a specific order of data; or
the document is built of hashes and the application sets
the value of variables by name. The simplest example expects
a single list or hash node as the root, with setting nodes of
known order and/or name.
NOTE: This does not mean that the order of nodes matters under
a hash.
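The "set variables by name" case can be sketched as a function the event
handler would call for each text node found under the root hash. The
settings_t structure, its field names and settings_apply() are hypothetical;
how the name and the value reach the handler depends on the real event
callback and is not shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical application settings filled in from a root hash */
typedef struct {
    long width;
    long height;
    char title[32];
} settings_t;

/* Store one name/value pair; returns 0 if the name is known, -1 if not.
   Lookup is by name only, so the order of the nodes does not matter. */
int settings_apply(settings_t *s, const char *name, const char *value)
{
    assert(s != NULL && name != NULL && value != NULL);
    if (strcmp(name, "width") == 0) {
        s->width = strtol(value, NULL, 10);
        return 0;
    }
    if (strcmp(name, "height") == 0) {
        s->height = strtol(value, NULL, 10);
        return 0;
    }
    if (strcmp(name, "title") == 0) {
        strncpy(s->title, value, sizeof(s->title) - 1);
        s->title[sizeof(s->title) - 1] = '\0';
        return 0;
    }
    return -1;  /* unknown setting: the caller decides whether to ignore it */
}
```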
4. The DOM parser
5. Tree utils