When we start writing production code, we face the usual dilemmas of programmers:
This article collects considerations that may help in finding a sustainable balance in these questions. The conclusion presents a practical solution to some of the problems discussed.
This suggests depending more on libraries. The programmer who wrote the general-purpose library for the subproblem perhaps spent much time on it, probably evaluating different algorithms and APIs. The library has probably already received a lot of testing and would work much better out of the box than new code.
However, these are assumptions, and should not be trusted blindly. Sometimes the library is not implemented along the same requirements/preferences as the project that tries to use it. Sometimes the library is simply not implemented properly. A well designed API does not automatically mean a proper implementation or thoroughly tested code. Popularity doesn't imply quality either. The user should download the source of the library and evaluate code quality.
If the library is old, with a lot of users on a lot of systems, it doesn't only mean that it's well tested; it often means it's also large: some users had special needs, and when the library does 99% of what the user needs, it's hard to resist adding the missing 1%. This may encumber the library with code (and API) for rarely used special cases that may later interfere with each other or with the normal-case code.
Large code is hard to review. It's often more expensive to read the code of a library than to implement an alternative from scratch (for the relevant part). Unfortunately there is a third option: use the lib without review, blindly trusting the "brand" (e.g. "it's large, it's popular, it can't be that bad").
This in turn may lead to another problem: what if the library doesn't do exactly what is needed in the given use? The match between what the library offers and what the application needs is obviously never 100%. The application code may need to do some extra work (e.g. converting its own idea of the world to the API of the lib); it needs to accommodate. This works well, up to a point, but sometimes the difference is so large that it'd be easier to change the library and save the extra cruft. If the library is small and easy to understand, this is a real option, while it rarely happens with libs that span hundreds of thousands of source lines.
This sometimes leads to more application code spent on interfacing than what a local implementation of the functionality would cost.
The argument against reimplementation is exactly the hidden knowledge accumulated in the library: all the experience in handling corner cases, all the testing on different systems, etc.
From another perspective, this hidden knowledge is probably better to be collected and maintained at one place, in a library, instead of many places (local implementations in different applications).
Large libraries have more parts that can change, and because of the extra interdependencies, it is more likely that one change triggers another. Because of their sheer size and complexity, mega libraries are less likely to stabilize, i.e. reach a state where developers say "done, finished, the library already supports everything we ever wanted it to". This leads to a constant stream of changes and version-number bumps, which makes it harder to determine API compatibility precisely.
After some time, systems stopped supporting gtk1.2 and developers had to spend time forward porting the gtk1.2 GUI module to gtk2.0. Some widgets were removed, new widgets were introduced, and the support infrastructure and API changed. All in all, developers had to spend their time forward porting their application not because they were necessarily fond of the new features, nor because there was strong user demand for the actual new features of gtk2, but often only because when major OS distributions stop supporting gtk1.2, an application that depends on gtk1.2 is also left behind.
Gtk3.0 followed gtk2.0 nine years later. The same cycle repeats: as of now (2016) we can happily use gtk2.0 on most systems, but there are signs that applications should switch to gtk3.0.
So an application that depends on gtk needs to spare some developer time for forward porting the GUI. Users of said application need to spare some cycles learning a new GUI over and over. Not because they want to have a new GUI every 6..10 years, but because they have to follow the trends or the application bit rots and becomes unusable. There is no way around this: even gtk1.2 is too big to be fully shipped with a random application, and forward porting gtk1.2 itself to new libc and X or Windows versions would be even more expensive.
If an application depends on multiple large libraries and tools (gtk, glib, guile, boost, autotools, etc.), each such dependency will have its own force-forward-port cycle. As such large libraries often have more developers than small projects or applications, this may result in a situation where a big chunk of the application developer's time is spent on following dependencies.
Fortunately there is an alternative. When picking dependencies for a new project, estimate how much developer time will be available in the long run and, looking at the history and trends of the dependency, estimate how much time it will take just to keep up with it. It may turn out that going for the given dependency is more expensive in the long run than going for another solution that has higher first-time costs but pays off later. To avoid burning excess developer time on forward porting, consider:
If the release depends on libraries, the programmer will follow similar rules: more-or-less blindly depend on the quality of libraries above a certain size. However, there's an extra level of risk involved: even if both the application alone and the library alone are safe, the combination may be unsafe. A complex, opaque API and the sheer volume of the library source code may lead the application programmer to use the API without fully understanding what happens in the library code, which in turn may lead to unforeseen vulnerabilities.
Building software using libraries is often a surprisingly similar process. Some libraries are designed as a set of small, weakly dependent parts that can be freely combined. The user employs the additive process: he selects the "raw materials" his program needs and adds them the way the program needs them. There's usually very little excess material added in this process, and most often such burr is tolerable.
Other libraries are designed to provide a large set of interdependent infrastructure; when the program starts using one part, it can hardly avoid using the others. This is the subtractive method: by default all features are used, and the user needs to manually remove (turn off, or even work around) excess features that are not needed or are sometimes even harmful.
Whether a library ends up used in subtractive or additive ways is more or less hardwired in the design of the library API. The subtractive method is hard to avoid when using megalibs like glib or gtk. On the other hand, the additive method is easier to achieve with minilibs, as they often focus on a single problem.
The outcome of the evaluation may indeed be to go with a large lib. In other cases it turns out using a small, specialized library is better. For reference, a collection of such mini libraries follows in the appendix.
size: lines of source code as counted by sloccount, excluding regression test code and auxiliary utilities in the case of libs.
standard: all projects are implemented in C; this column shows the oldest C standard the code aims to comply with.
has cross platform parts: most of these libs are 100% portable plain C code with no dependencies. Some of them need to depend on calls that differ from system to system. These libs are marked as having "cross platform parts", which means some of the code is implemented multiple times, one instance per supported system.
type generic: applicable to data types only; the code can be configured or instantiated for different user-supplied types. The code is type safe (different instances have different types that don't mix, e.g. a "string-to-int" hash lookup is different from an "int-to-pointer" lookup).
alternative for: the project is a good alternative for the well-known public project(s) listed in this column. Since some of those projects have a huge code base, a minilib is often an alternative for only parts of them. Project size is calculated with sloccount, not counting regression tests and auxiliary utilities wherever possible.