pcb-rnd knowledge pool
Why I dislike KiCad pcbnew's s-expr format
kicad_fmt_fun by Tibor 'Igor2' Palinkas on 2019-04-06
Tags: insight, kicad, pcbnew, s-expr, s-expression, board, file, format
positional vs. field names
A major characteristic of text file formats is whether data fields (arguments, parameters) are encoded in a positional manner or using field names.
In the positional setup, parameters are in a fixed order: the parser knows exactly how many parameters there will be and what each one means (by its position in the parameter list). Typical examples are the spice netlist format and geda/pcb's board format; KiCad's old board format was positional too. For example, in the old KiCad board format a silk line in a footprint looks like this:
DS -3.9497 2.25044 3.9497 2.25044 0.127 21
Parameters of DS are: x1, y1, x2, y2, thickness, layer
Pros: compact format, easy to parse using any language (a common minimum in any programming language for reading text files is reading the file line by line and splitting the lines into words).
Cons: after about the 10th parameter it becomes a nightmare to maintain the parser, especially if backward compatibility is required on an expanding file format. Even after the 5th parameter it may get inconvenient for the human eye, especially if the arguments are of the same type. Implementing "vararg" is very limited: the variable part needs to come at the end. Once any flexibility (e.g. vararg) is introduced, it becomes very hard or outright impossible to insert more static fields (this is how QUCS can't add a footprint field).
All in all, I recommend using this format only with very careful design and only for explicitly limited file formats. For example, tEDAx uses this format, but it was designed knowing these limitations and building on them.
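To illustrate how little code the positional DS line above needs, here is a minimal sketch in Python. The names SilkLine and parse_ds are made up for this example, and treating the layer as an integer is an assumption based on the sample line; this is not code from KiCad or pcb-rnd.

```python
from typing import NamedTuple


class SilkLine(NamedTuple):
    # Field order mirrors the DS parameter list: x1, y1, x2, y2, thickness, layer
    x1: float
    y1: float
    x2: float
    y2: float
    thickness: float
    layer: int  # assumption: layer is an integer id, as in the sample line


def parse_ds(line: str) -> SilkLine:
    """Split a DS line into words; the meaning of each word is its position."""
    words = line.split()
    if len(words) != 7 or words[0] != "DS":
        raise ValueError(f"not a DS line: {line!r}")
    x1, y1, x2, y2, thickness = (float(w) for w in words[1:6])
    return SilkLine(x1, y1, x2, y2, thickness, int(words[6]))


seg = parse_ds("DS -3.9497 2.25044 3.9497 2.25044 0.127 21")
print(seg.thickness, seg.layer)
```

The flip side is visible too: nothing in the file says which number is which, so every new parameter means touching both the length check and the index arithmetic.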
The other format, with named fields, simply assigns a name to each data field, so the order of parameters doesn't matter. Typical examples are xml, json, lihata or s-expression. An example of KiCad's s-expression for a footprint silk line:
(fp_line (start -1.6 -1.2) (end -1.6 1.2) (layer F.SilkS) (width 0.2))
Pros: much easier to read by the human eye, sort of self-describing. Very easy to extend without breaking compatibility. Very easy to use flexible/variable fields.
Cons: more verbose (longer files) and requires a more advanced parser. Doesn't play well with the traditional UNIX shell tools (sed, grep, awk) unless invalid assumptions are made about how other tools will place newlines.
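What "a more advanced parser" means in practice can be sketched in a few dozen lines of Python. This is a generic s-expression reader written for this example, not KiCad's or pcb-rnd's parser; parse_sexpr and fields_by_name are hypothetical names, and all atoms are kept as plain strings.

```python
import re

# One token is: "(", ")", a double-quoted string, or a run of other characters
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()]+')


def parse_sexpr(text):
    """Parse one s-expression into nested Python lists of strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = []
            while pos < len(tokens) and tokens[pos] != ")":
                node.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # skip the ")"
            return node
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the expression")
    return tree


def fields_by_name(node):
    """Map each named child to its arguments; the order of children is ignored."""
    return {c[0]: c[1:] for c in node[1:] if isinstance(c, list) and c}


a = fields_by_name(parse_sexpr(
    "(fp_line (start -1.6 -1.2) (end -1.6 1.2) (layer F.SilkS) (width 0.2))"))
b = fields_by_name(parse_sexpr(
    "(fp_line (end -1.6 1.2) (start -1.6 -1.2) (layer F.SilkS) (width 0.2))"))
print(a == b)
```

Because the fields are looked up by name after the whole tree is built, the two orderings of start and end come out identical in this sketch. That is the property a named format is expected to have.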
KiCad's strange mixed approach
So KiCad switched from positional to named. Fine: for describing a complex board and maintaining an evolving format over decades, that is a good choice. We did that too with pcb-rnd when switching from geda/PCB's positional format to lihata.
However, while working on mapping KiCad's file format I found something interesting: the code doesn't seem to have a proper s-expression parser. Instead, it's really just reading the tokens (like parentheses!) mostly sequentially. This is a design decision, not necessarily a bad one; it is not to my liking, but that is a minor thing.
However, what I find really bad: some of the named fields are also positional. I mean in the above example, the order of start and end does matter for KiCad:
Works fine: (fp_line (start -1.6 -1.2) (end -1.6 1.2) (layer F.SilkS) (width 0.2))
Parser error: (fp_line (end -1.6 1.2) (start -1.6 -1.2) (layer F.SilkS) (width 0.2))
Some fields can be specified in any order, but other fields, like start/end for fp_line or center/end for fp_circle, need to be at exact positions in the parameter list.
This seems to be a really poor combination: a verbose file format with randomly enforced positional arguments.
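How a reader that consumes tokens sequentially ends up order-sensitive can be shown with a small Python sketch. This is emphatically not KiCad's code: SequentialReader and read_fp_line are invented for this example, and which fields are read at fixed positions versus by name is an assumption modeled only on the behavior described above.

```python
import re


class SequentialReader:
    """Hands out tokens one by one; there is no tree, only a cursor."""

    def __init__(self, text):
        self.tokens = re.findall(r'\(|\)|[^\s()]+', text)
        self.pos = 0

    def next(self):
        if self.pos >= len(self.tokens):
            raise ValueError("unexpected end of input")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expect(self, wanted):
        tok = self.next()
        if tok != wanted:
            raise ValueError(f"expected {wanted!r}, got {tok!r}")

    def read_xy(self, name):
        # Reads "(name x y)"; the name is checked, not looked up
        self.expect("(")
        self.expect(name)
        x, y = float(self.next()), float(self.next())
        self.expect(")")
        return x, y


def read_fp_line(text):
    r = SequentialReader(text)
    r.expect("(")
    r.expect("fp_line")
    # Fixed positions: start must come first, then end
    line = {"start": r.read_xy("start"), "end": r.read_xy("end")}
    # The remaining single-value fields are accepted in any order
    while True:
        tok = r.next()
        if tok == ")":
            return line
        if tok != "(":
            raise ValueError(f"expected '(' or ')', got {tok!r}")
        name = r.next()
        line[name] = r.next()
        r.expect(")")


good = read_fp_line(
    "(fp_line (start -1.6 -1.2) (end -1.6 1.2) (layer F.SilkS) (width 0.2))")
try:
    read_fp_line(
        "(fp_line (end -1.6 1.2) (start -1.6 -1.2) (layer F.SilkS) (width 0.2))")
    swapped_error = None
except ValueError as err:
    swapped_error = str(err)
print(good["start"], swapped_error)
```

In this sketch the field names are only checked against an expectation, so they add verbosity without buying any freedom of ordering.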
Even worse, I couldn't find a reference to this in their (otherwise very thin) file format documentation.
This property of the file format design definitely makes it harder to write code that produces valid KiCad files.
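One defensive approach for a writer is to ignore whatever order the in-memory data has and always emit fields in the order of a known-good example. A minimal Python sketch, assuming the order of the working fp_line shown above is safe; FP_LINE_ORDER and emit_fp_line are hypothetical names, and whether layer and width are order-sensitive at all is not something this example claims to know.

```python
# Assumption: the field order of the fp_line example that is known to load
FP_LINE_ORDER = ("start", "end", "layer", "width")


def emit_fp_line(fields):
    """Serialize fp_line fields in a fixed order, whatever order they arrive in."""
    parts = []
    for name in FP_LINE_ORDER:
        value = fields[name]
        # Coordinates are pairs, the other fields are single values
        args = value if isinstance(value, (list, tuple)) else (value,)
        parts.append("({} {})".format(name, " ".join(str(a) for a in args)))
    return "(fp_line " + " ".join(parts) + ")"


text = emit_fp_line({
    "end": (-1.6, 1.2),
    "width": 0.2,
    "layer": "F.SilkS",
    "start": (-1.6, -1.2),
})
print(text)
```

The cost is that the writer needs one such order table per object type, and without documentation each table can only be derived from files KiCad itself has written or from reading its source.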