pcb-rnd knowledge pool

 

Lihata persistency, second approach

persistent2 by Tibor 'Igor2' Palinkas on 2018-05-11

Tags: insight, lihata, persistent, persistency, style, indentation

node source

 

 

Abstract: Current state of persistent save, difficulties, challanges, future trends

  Persistence means that the board file formatting is persistent: with a load-modify-save cycle using pcb-rnd, only the data fields change, but the indentation, formatting, including numerical formats, are preserved.

This mechanism works fine in the current code, has been proven to be useful and safe all through the first half of 2018, 2017 and second half of 2016. However, there are a few real exotic corner cases where it may break.

The first approach, being in use in pcb-rnd in version 1.x.x and in 2.0.x, uses a "parse back" approach. On save, the original file being overwritten is read back, parsed, and compared to the in-memory representation. As long as values match, bytes are copied one-to-one from the original files and new data is written only where it changes old data or where new data is not a modification but an addition.

The obvious advantage of this method is that we do not need to store the indentation/comments in memory after loading a file.

Challenges with the "parse back" method

But this comes with a major drawback: there is no look-ahead. At any given point while reading the document, the code has all the past information available, but no future information. In some situations this makes it hard to make decisions.

For example the validity of a table can be decided only after reading the whole table (number of cols in each row is known only after that). In some situations interpretation of a given indentation character (e.g. whether a whitespace is an after-the-previous node character or a before-the-next-node character) depends on future characters (e.g. reading a \n after the space). A few internal states are maintained in the persistence lib for these, but these internal states are all special case wizardry that are hard to understand and follow.

Another interesting case is when a new file (or subtree) needs to be produced. The trivial approach would be to use the canonical form, e.g. for a line object


ha:line.23 {
  x2=5.0mil
  y1=10.0mil
  clearance=15mil
  y2=10.0mil
  ha:flags {
    clearline=1
    selected=1
  }
  thickness=8.0mil
  x1=0.0
}

The fields are in "random" order, because the hash object does not guarantee any ordering. The other problem is that this description makes the file look longer than necessary. A more compact variant is preferred:


ha:line.23 {
  x1=0.0; x2=5.0mil; y1=10.0mil; y2=10.0mil; thickness=8.0mil; clearance=15mil;
  ha:flags { clearline=1; selected=1 }
}

However, to produce such output, pcb-rnd needs some sort of style description stored in the code that describes the order of fields and style/indentation. With the parse-back setup and the current code, this description is stored in the same structure as the one the parse-back code generates while reading a file. This looked like a good idea, but is not necessarily easy to hand-edit or maintain when the tree is large.

Next generation

I will implement an alternative persistence lib later on, after pcb-rnd 2.0.0. The new lib will not use the "parse back" method directly, but indirectly. On save (or maybe even on the original load), it will parse the original file in pass 1, but will not try to perform the tree merge or any write in the same pass. Instead, there will be an "the tree looked like this in the file" document built in memory, called the verbatim version.

The verbatim tree contains the logical tree, plus style/indentation information, plus node-order information (in case of hash). The code will work on the live version, not on the verbatim version, because we need that as a reference for comparison. The live version doesn't contain any style or ordering information.

On save, we start to traverse the verbatim tree, considering the original order of nodes, and compare nodes between the verbatim and the live trees. This comparison is a bit simpler than with the "read back" approach as we can traverse and search both trees freely, e.g.:

For writing new subtrees with predefined style, the same algorithm can be used with an example tree stored as data. We would have a style tree, a lihata document that serves as a template. It would have a full board description with an example of all possible subtrees (using pattern matching on subtree names). The actual data fields (values of text nodes) in the example tree could be ignored, but the node ordering and indentation can be used. When traversing the verbatim tree, if a new subtree is present in the live tree but not in the verbatim tree, the code can first look at sibling subtrees of the same type or refer to the stored style tree.

The result of the process is a new tree in the memory, called the output tree. As described above, the output is merged from three source:

Just like the verbatim tree, the output tree also understands and stored node-order information and indentation and comments.

If the tree merging process finished, the output tree is dumped to disk.

Why use a verbatim tree?

It may look like an unnecessary extra step to use a verbatim instead of just storing the styling information in the live tree. The reasons for doing this is simple: we do not keep the live tree during the editing session; pcb-rnd keeps an in-memory representation of the board, which is not lihata. Thus any extra info (not directly useful for core) in the live tree would be lost right after the load.

Of course it would be possible to just preserve the live in memory after load, but that wouldn't be much cheaper than parsing it again on save. This can be regarded as an optimization: many sessions do not end in a save - keeping an extra lihata document in memory would be a waste. In other words, save gets a little bit more expensive but in return everything else gets cheaper.