pcb-rnd knowledge pool

Feature complexity considerations vs. data model and hardwired code vs. user data

feature_cmpl by Tibor 'Igor2' Palinkas on 2018-05-06	Tags: insight, feature, complexity, future, orthogonal, clean, data, model

Abstract: How to add new features; what sort of features are accepted and what features are refused. How to keep the code maintainable long term.

1. History

Remember the old pcb data model? A bunch of arbitrary limitations (e.g. no silk polygon in elements, no copper arc in elements, no thermal on pad, etc.) and tons of special cases (e.g. derive mask and paste shape from pad dimensions, a specific flag for "no paste" for pads, but the same thing didn't do anything for vias, not sure what it did for pins, etc.).

No text in footprint - except the 3 magic text objects enforced by the design that you must have whether you like it or not, and you must have them at the same coordinates, or else the code just breaks.

And the broken layer model... No clear concept of physical layers vs. logical layers. You could draw almost anything on copper, silk and "outline" layers (which was really just a copper layer with a special name), but you couldn't draw on paste or mask layers. And vias had to have the same shape on all layers. Special hacks to get the mask layer rendered inverse.

I really think the original PCB data model is largely broken. To be fair, geda/pcb is not alone with that. With Erich we have worked on a lot of load/save code for various other EDA formats. The most popular ones, especially kicad's and eagle's, are at least as broken as pcb's, they just break in different areas. There are a few EDA tools with better (more generic, more capable) models out there, though.

2. How did that even happen?

The reason for all these ugly hacks, one may say, is obviously decades of development accumulating - I mean how would you know back in 1994 that you are going to need smd pads and paste?

On one hand, this is true: you can't accurately predict the future on a 10-year scale, and you can't prepare for everything. But in this case I'd say it's only 1/3 that and 2/3 wrong approach.

The wrong approach is that if we have a problem, we just add a special case to solve that one problem specifically. This sounds like a trivial, good idea at first:

We have vias and now we have named pins? Just copy the via code and rename things to pin. Results delivered fast... and then thousands of developer hours wasted for the next 15 years having to maintain two copies of essentially the same code in every corner of the code base.
We have smd pads already, and someone comes up with the idea of having to export paste; good, let's just calculate the paste layer from the pad objects! Quick result then, but an uneditable paste layer, and none of the complex paste patterns modern parts need (ask John, he has real cool paste patterns with QFN-like parts!). And more importantly: later on, we have to introduce yet another kludge to work around this kludge: the "no paste" flag.
We can draw lines and vias, but we have no elements; no problem, let's invent elements and duplicate the line as element-line, via as pin (element-via really), and later arc as element-arc, what could go wrong? Except again the whole code base has to deal with tons of special cases for the new, element-only objects, and in return users are constantly bumping into the limitations like "why can't I have an arc in copper in my element?".
We have vias and pins already, but sometimes we need just a hole; instead of saying the pin/via object is constructed of some orthogonal parts, like copper shapes and hole and plating, the old model introduced a "hole" flag, which single-handedly removes the copper ring and the plating. Now if you wanted a hole that had copper rings but no plating, or one that had plating but no copper rings, you just couldn't do it easily, and solving this from code would mean introducing more flags for more special cases.

The common pattern of the wrong approach is that instead of implementing orthogonal, generic solutions that elegantly solve a wide range of problems (by the way including the original problem that triggered the implementation), it implements special case hacks. Special case hacks that usually control multiple unrelated things and have tons of unforeseen side effects - because for that one special feature we wanted to have, this selection of effects was relevant, and who cares if we need another selection tomorrow.

Special case hacks that are usually real quick to add, and look good first, but then start to interfere with other special case hacks. Long term this is potentially an O(n^2) problem: the more hacks you have, the more interference you have between pairs of hacks and then the less likely you can add a new hack because of the impact: you need to interface with 200 already existing hacks!
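To put a number on the O(n^2) remark: if any two hacks may interfere, the count of potential interactions is the number of distinct pairs, n*(n-1)/2. The following is only a back-of-the-envelope sketch; the function name is made up for illustration and nothing here comes from the pcb-rnd code base.

```python
from itertools import combinations


def pairwise_interactions(n_hacks: int) -> int:
    """Number of distinct pairs among n_hacks special cases: n*(n-1)/2."""
    return n_hacks * (n_hacks - 1) // 2


# Cross-check the closed form against explicit enumeration for a small n.
assert pairwise_interactions(5) == len(list(combinations(range(5), 2)))

for n in (10, 50, 200):
    print(n, "hacks ->", pairwise_interactions(n), "potential pairwise interactions")
```

With 200 existing hacks there are 19900 pairs that could interfere, and hack number 201 adds 200 more.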

A trivial example is import code. If you want to be able to import from whatever file format, you need to understand pcb's data model, and support each object type. If you have separate objects for "a line on the board" and "a silk line in an element", that makes it harder: you simply need to code your importer to handle more special cases. Later on, if you decide to add polygon on silk in elements, you need to revisit all existing importers and upgrade them.
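The importer burden can be sketched as follows. This is an illustration only: the tables and function names are hypothetical and do not mirror the real pcb or pcb-rnd API. In the old style every (context, primitive) pair needs its own object type, and missing pairs must be handled by each importer; in the generic style the importer deals with one type per primitive, whatever the parent is.

```python
# Hypothetical old-style model: a separate object type per context.
OLD_OBJECT_TYPES = {
    ("board", "line"): "Line",
    ("board", "arc"): "Arc",
    ("board", "polygon"): "Polygon",
    ("element", "line"): "ElementLine",
    ("element", "arc"): "ElementArc",
    # ("element", "polygon") is missing: every importer has to cope with that
}

# Hypothetical generic model: primitives do not care about their parent.
GENERIC_PRIMITIVES = {"line", "arc", "polygon"}


def import_old(context: str, kind: str) -> str:
    """Pick the context-specific object type, or fail if the model lacks it."""
    try:
        return OLD_OBJECT_TYPES[(context, kind)]
    except KeyError:
        raise ValueError(f"{kind} is not supported inside {context}") from None


def import_generic(context: str, kind: str) -> tuple:
    """Same primitive everywhere; the context is just where it gets stored."""
    if kind not in GENERIC_PRIMITIVES:
        raise ValueError(f"unknown primitive {kind}")
    return (context, kind)


print(import_generic("element", "polygon"))
```

Adding the missing pair to the old-style table means touching every importer that special-cased its absence; in the generic sketch there is nothing to revisit.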

Now look back at geda/pcb - when I last dug into it, I found requests for "lifting element restrictions" dating back to 2004. PCB had a really productive period between 2005 and 2008, then another around 2011. Still nobody addressed the element restrictions. Why? Probably because, knowing the code, they too realized this: they would either have to add more hacks and make the code even more unmaintainable, or allocate a huge amount of time to rewrite the whole element model, which they didn't dare to do.

(Data: in pcb-rnd, it took me net 323 hours to implement subcircuits, after spending more than 200 hours on the layer rewrite first, without which it would not have been possible. Generalizing the terminal support so that we didn't have to stick with special pins and pads took 107 hours. Design and implementation of padstacks took another 306 hours. Then 252 hours to switch over the IO plugins to the new data model. Altogether this is in the range of 1250 hours, and it does not include at least 500 more hours of pure bugfixing of things that were directly related to the new data model. So it is realistic to say that the total cost of the data model rewrite was close to 2000 working hours. I think anyone who was hacking PCB deep enough in its active periods back then realized that cleaning the data model up would take something in this range, but I believe they couldn't afford this many hours on the project.)

3. So how can it be done better?

By removing old hacks and implementing new code that tries to provide orthogonal, generic solutions that do not interfere. I've been working on this a few thousand hours since pcb-rnd started, so it's easiest to demonstrate this by a few examples.

3.1. subcircuits

Unlike with pcb elements, you can have anything in a pcb-rnd subcircuit that you can have on a board. Any object on any layer. There are really no limitations. This is not achieved by adding a lot of code for handling the special case of "what happens if this line is part of a subcircuit". It is done by removing a lot of special code for elements, and saying a subcircuit has the same data struct as a board.
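A minimal sketch of the "same data struct" idea, with hypothetical class and field names (the real pcb-rnd structures are C structs and look different): both the board and the subcircuit own the same container type, so any code written for the container works at every nesting level without knowing whether it is inside a subcircuit.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Data:
    """Hypothetical shared container: objects per layer plus nested subcircuits."""
    layers: Dict[str, List[str]] = field(default_factory=dict)
    subcircuits: List["Subcircuit"] = field(default_factory=list)


@dataclass
class Subcircuit:
    name: str
    data: Data = field(default_factory=Data)  # same container as the board


@dataclass
class Board:
    data: Data = field(default_factory=Data)


def count_objects(data: Data) -> int:
    """Generic code: no special case for 'this line is part of a subcircuit'."""
    own = sum(len(objs) for objs in data.layers.values())
    return own + sum(count_objects(sc.data) for sc in data.subcircuits)


board = Board()
board.data.layers["top copper"] = ["line", "arc"]
u1 = Subcircuit("U1")
u1.data.layers["top copper"] = ["arc"]             # copper arc inside a subcircuit
u1.data.layers["top silk"] = ["polygon", "text"]   # silk polygon and free text too
board.data.subcircuits.append(u1)
print(count_objects(board.data))  # 5
```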

3.2. padstacks

We removed the special case for via, another for pin (which shared about 90% of its code with via), and a third for pad (which shared most of its code with the line object). We introduced a generic object, the padstack, which is defined as a vertical construct that can have a different shape per layer type (supporting any board layer, not only copper), and an optional hole. This one object can handle all cases the old code could handle for vias, pins and pads, made it easy to introduce blind/buried vias, and can be used for a host of other cases like align markers, fiducials, unplated holes, little mask cutouts to expose some copper for an anonymous test point, etc.
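The padstack description above can be sketched like this. All names, the string shape descriptions and the dimensions are made up for illustration; the real padstack object is richer. The point is that shapes per layer type and the hole (with its plating) are independent parts, so the old via, pad, bare hole and fiducial cases are just different combinations of the same data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Hole:
    diameter_mm: float
    plated: bool = True


@dataclass
class Padstack:
    """Hypothetical model: one shape per layer type, plus an optional hole."""
    shapes: Dict[str, str] = field(default_factory=dict)  # layer type -> shape
    hole: Optional[Hole] = None


def has_copper(ps: Padstack) -> bool:
    """True if the padstack puts a shape on any copper layer type."""
    return any("copper" in layer_type for layer_type in ps.shapes)


via = Padstack(
    shapes={"top copper": "circle 0.8", "intern copper": "circle 0.8",
            "bottom copper": "circle 0.8"},
    hole=Hole(0.4),
)
smd_pad = Padstack(
    shapes={"top copper": "rect 1.0x0.6", "top mask": "rect 1.1x0.7",
            "top paste": "rect 0.9x0.5"},
)
unplated_hole = Padstack(hole=Hole(3.2, plated=False))
fiducial = Padstack(shapes={"top copper": "circle 1.0", "top mask": "circle 2.0"})

for name, ps in [("via", via), ("smd_pad", smd_pad),
                 ("unplated_hole", unplated_hole), ("fiducial", fiducial)]:
    print(name, "copper:", has_copper(ps), "hole:", ps.hole is not None)
```

A hole with copper rings but no plating, which the old "hole" flag could not express, is simply `Padstack(shapes={...}, hole=Hole(1.0, plated=False))` in this sketch.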

3.3. layers and side effects

Very early versions of pcb dealt with copper only. Then came elements, which could draw on silk, but the user was not allowed to draw explicit objects on silk. If you started to use PCB in the mid 2000s, like me, I hear you saying "What? Can't draw on silk? ????" - but in reality that's not the special case, as you couldn't draw on paste or mask layers either. The special case was really that you could draw on copper - all other layers were just generated from elements (and pins/pads/vias later on).

Our predecessors decided to add the hacks to make silk editable, and we got used to that. Later on paste and mask layers got introduced. But why didn't they make them editable right away, or at least make them editable later on? The answer is the O(n^2) nature of the problem, as described in point 2. Silk happened first, the number of special cases and hacks was lower so it was realistic to add more hacks. By the time paste and mask happened a lot more hacks had accumulated already and getting everything to work to an acceptable level would have cost more than it cost with the silk some years earlier.

How does this work with pcb-rnd? We treat all these layers the same: mask, paste and silk are not special, they are just layers, with material set to mask, paste or silk. You can draw on them just like drawing on any layer. Subcircuit and padstack side effects can be rendered on them, but even that is controlled by the user, using the "AUTO" flag of the layer.
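As a rough sketch of that layer model (class, field and function names are assumptions made for this example, only the "AUTO" flag and the material idea come from the text): every layer is the same kind of object, drawing uses one code path, and generated side effects show up only where the user asked for them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Layer:
    name: str
    material: str            # "copper", "silk", "mask", "paste", ...
    auto: bool = False       # stands in for the layer's "AUTO" flag
    objects: List[str] = field(default_factory=list)


def draw(layer: Layer, obj: str) -> None:
    """One drawing path for every layer, whatever its material is."""
    layer.objects.append(obj)


def rendered(layer: Layer, side_effects: List[str]) -> List[str]:
    """User-drawn objects, plus padstack/subcircuit side effects only if AUTO is set."""
    return layer.objects + (side_effects if layer.auto else [])


auto_paste = Layer("top paste", "paste", auto=True)
manual_paste = Layer("top paste (manual)", "paste", auto=False)
draw(auto_paste, "polygon")
draw(manual_paste, "polygon")

generated = ["padstack paste shape"]
print(rendered(auto_paste, generated))
print(rendered(manual_paste, generated))
```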

(In fact our special layer is copper now: I had to disable negative draw on copper, because it would make it terribly inefficient to calculate galvanic connections that way. This is not something we can easily solve: we either do connection finding, or not; and we either draw negatively or not. If we have to do connection finding together with negative drawing, that means the whole layer, including all lines, arcs, texts and padstacks, has to be converted into polygons and the connection finding has to be done on the polygons - which is very expensive in CPU time.)

4. Hardwired code vs. data

Some aspects of the old data model were hardwired code. For example the mask cutout shape and size and the paste shape were all derived automatically from the pad shape.

On the one hand this seemingly makes a novice user's life easier: no need to care about these, just go with default values and it will be right, auto-magically. On the other hand, for intermediate or advanced users this starts to be a limiting factor.

This has an even worse aspect, which can be demonstrated better with clearance and thermal calculations. Clearances are calculated automatically, using a single clearance value, around any object that sits in a polygon. A thermal is also computed graphics, derived from the object's clearance size and the object's size.

Now what if you do not like how the thermal looks? For example it would be so nice to have the fingers thicker.

The problem is that the resulting copper shape depends on two independent things:

the hardwired code (with hardwired constants and weights!)
user provided data

We have many users and many many boards out there. The user provided data remains what it is, as it is stored in each of those many many boards. But the hardwired code is shared. If we ever change the hardwired code, it will fix the problem for one user, and for future boards, but will also break those many many existing boards.

This means once such a calculation is in the code, it is likely to stay there and very very unlikely to change. Not even the configuration constants in that code can change, because that would change existing copper too!
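The effect can be shown with a toy model. The formula, the constant and the board names below are invented for the example and are not the real thermal code; only the mechanism matters: the resulting copper is a function of shared hardwired code and per-board user data, so touching the shared part changes every board.

```python
# Hypothetical stand-in for a constant compiled into the program.
FINGER_WEIGHT = 0.5


def thermal_finger_width(clearance: float, weight: float = FINGER_WEIGHT) -> float:
    """Made-up formula: finger width as a fixed fraction of the clearance."""
    return weight * clearance


# Clearance values as stored in three existing boards (user data, in mm).
existing_boards = {"amp.pcb": 0.25, "psu.pcb": 0.5, "mcu.pcb": 1.0}

before = {name: thermal_finger_width(c) for name, c in existing_boards.items()}

# One user wants thicker fingers, so the shared constant gets "fixed".
after = {name: thermal_finger_width(c, weight=0.75)
         for name, c in existing_boards.items()}

changed = [name for name in existing_boards if before[name] != after[name]]
print(changed)  # every existing board ends up with different copper
```

If the finger width were stored in each board as plain data instead, a new default would only affect objects created later and the three boards above would render exactly as before.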

Long term the solution is to be very careful about what we hardwire, because these things very quickly end up as burdens. Whenever we can, we should rely on user data instead of hardwired computations.

Padstack is a good example of how to clean up the situation: instead of the hardwired code for mask and paste shape, padstacks let the user specify the shapes and sizes on all layers.

To make the bridge between the old world and the new world, we have a compatibility layer that executes the old algorithm to generate the missing data and create a padstack out of a pad or pin. On the GUI, we have a padstack editor that has nice buttons that can generate the shapes for you, deriving them from the corresponding copper shape, using the same old algorithms. This way the simple case is not too hard to produce using the GUI.
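The bridge can be pictured as running the old derivation exactly once and storing its output as ordinary data. The sketch below assumes a made-up legacy rule (mask is the copper shape grown by a margin, paste equals the copper shape); the real algorithm and all names here are not taken from pcb-rnd.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Rect = Tuple[float, float]  # (width, height) in mm


@dataclass
class OldPad:
    width: float
    height: float


def legacy_mask(pad: OldPad, margin: float = 0.125) -> Rect:
    """Hypothetical old rule: mask cutout is the copper shape grown by a margin."""
    return (pad.width + 2 * margin, pad.height + 2 * margin)


def legacy_paste(pad: OldPad) -> Rect:
    """Hypothetical old rule: paste is the same as the copper shape."""
    return (pad.width, pad.height)


def padstack_from_pad(pad: OldPad) -> Dict[str, Rect]:
    """Run the old derivation once; the result is plain, editable data."""
    return {
        "copper": (pad.width, pad.height),
        "mask": legacy_mask(pad),
        "paste": legacy_paste(pad),
    }


shapes = padstack_from_pad(OldPad(1.0, 0.5))
shapes["paste"] = (0.75, 0.25)  # the user can now shrink the paste by hand
print(shapes)
```

After the conversion the derived values are no longer tied to the code that produced them, so the legacy rule could change later without altering this padstack.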

5. What features are accepted or refused?

It's a complex "scoring mechanism". If a new feature:

is a special case hack
for solving one exotic problem,
is limited to solve only that one problem
at a cost that it introduces an aspect that a lot of other code has to deal with

it is most likely refused, and we will need to look for an alternative solution that:

introduces an orthogonal feature that solves a set of related problems (including the original problem)
preferably does not overlap with existing other features
does not introduce an aspect that then has to be cared about in a lot of other parts of the code.

As mentioned above, I've invested a lot of hours in cleaning up the original mess. By selecting which feature we accept and which feature we refuse, we are trying to avoid reproducing the same issue during the next decade.

There are always multiple solutions to a problem. Don't stick with the first idea, even if it seems easy to implement today. It is much more important what impact it has tomorrow.