Porting methods
Porting software, when translated to the language of the end user, means that
the same source will compile and run on different systems. "Different" may
range from "different versions of the same operating system" (e.g. Linux
distributions) on the same hardware, through "different operating systems
on different desktop and server machines", to "anything that has a file
system and a C compiler, from an 8-core gamer PC to an ARM-based
embedded system". The decision is up to the user (and the developer).
Normally my interpretation is: if it's manually hacked to work on
two or a few systems, it's cross-platform. If provisions have been
made for "compile and run on any system that has [list of requirements]" and
the infrastructure more or less works, I tend to use the term portable.
However, in this document I chose a very liberal interpretation and
define porting as "getting it to compile on at least one system". This will help
in evaluating the different approaches, which are illustrated in the graph
below.
First, the developer needs to decide how much he needs to care about porting.
1. Portable by Nature
The most convenient approach is when the code does not implement anything
that implies portability issues. For example, a single file ANSI C program
that uses nothing beyond stdio and small integers, and has absolutely
no assumptions about the execution environment beyond these, falls under
this category. The single file requirement is strong: for a single file
the developer can simply say "compile and run it", without providing any sort
of build system. This allows almost everyone, from GNU/Linux users to
whatever-proprietary-graphical-IDE users on Windows or Mac, to compile the
software, as well as real geeks running a System V derived UNIX released
in the early 90s.
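For illustration, a minimal sketch of a program in this category - C89,
stdio only, no other assumptions about the environment:

    #include <stdio.h>

    /* count lines on stdin and print the result; uses nothing beyond
       C89 stdio, so any system with a C compiler should build and run it */
    int main(void)
    {
        int c, lines = 0;

        while((c = getchar()) != EOF)
            if (c == '\n')
                lines++;
        printf("%d\n", lines);
        return 0;
    }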
But the price is high: it is extremely easy to accidentally incorporate
assumptions about the execution environment. For example, if the software
opens files on the file system, it tends to assume various properties of
the file system - the path separator is a common example. Or even plain
integer arithmetic: how wide is an "int", when C89 doesn't
provide stdint.h?
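Both traps fit in a few innocent looking lines; a sketch (the file name is
made up for the example):

    #include <stdio.h>

    int main(void)
    {
        FILE *f;
        long n;

        /* trap 1: hardcoded '\\' path separator - only valid on
           DOS/Windows style file systems */
        f = fopen("data\\config.txt", "r");
        if (f != NULL)
            fclose(f);

        /* trap 2: C89 only guarantees that int is at least 16 bits wide,
           so an int holding 70000 may silently overflow; long (at least
           32 bits) is the safe choice here */
        n = 70000L;
        printf("%ld\n", n);
        return 0;
    }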
A way to escape this trap is to use high level languages, such as shell,
awk, perl or python. However, this method has its own drawbacks. First of all,
the real common minimum for old languages such as shell and awk is very
low. For example, awk has arbitrary limits on string length, number of
fields, etc., set very low in many old implementations - and there
are a lot of implementations out there! Choosing a modern language
may promise to skip the hassle of multiple implementations - most modern
languages have only one implementation available (the reasons for this are
another, very interesting story). But this also means the subject software is
only as portable as the one implementation of that language the developer
chose, and the developer can do very little about the portability of
the interpreter.
Nevertheless, this is a popular and working model in some domains. Typical
examples are demonstrations of theoretical algorithms from research
papers (where stdio communication is all the I/O the software ever does) and
extremely small and simple tools, likewise with little or no I/O.
Growing code size doesn't help this method either. Beyond a certain size,
programmers tend to restructure and split the code into multiple source files.
Manually compiling such a multi-file project is not what users prefer.
Pros: sometimes very simple; no overhead
Cons: may be a PITA to keep up for a larger project, or whenever the software needs to do something more
2. Manual Makefile
When Portable by Nature doesn't work, another method is chosen. If
the developer tries to ignore the problem, most often the Manual Makefile
branch is automatically selected. On the other hand, this can also be a
deliberate choice of the developer. In the latter case the developer
refuses to choose another branch, most often for one or more of the
following reasons:
- the software doesn't need to be portable at all: special application,
limited user base or marketing strategy (e.g. some smartphone apps
that are developed solely for one smartphone family, expensive
CAD systems with the "the end user uses Windows anyway" assumption, or
software whose sole purpose is to fix or extend a specific
operating system or even a specific installation)
- the software can not be portable due to other limitations: it drives
special hardware that is not available on other systems, or it is firmware
or kiosk software that will run on one specific platform only; when
the device is replaced, the software is replaced as well
- overhead of the other branches: the software could benefit
from portability (it doesn't fall into any of the above categories), but
the developer decided against the other branches because of the overhead
of the build system, or even the overhead of programming in a portable
manner
The two major problems with this branch are:
- A. old, widespread build systems like make have a very low common minimum;
it's hard or impossible to write a generic build script that works
everywhere; and it's hard to maintain a verbose, dumb-but-works-everywhere
build script manually.
- B. system specific settings/properties/configuration for the software and
the build system need manual configuration; this doesn't scale well as the
number of such settings increases
2.1. Single Manual Makefile
Probably the most common case is when the developer maintains a single
Makefile (or an equivalent build script written in whatever language, including
the automatic build recipes of GUI IDEs). In this case the code can be compiled
by the developer, or by users who are willing to install the same (or a very
similar) system. Because the user base will use the same system, these projects
tend to implement code that includes assumptions specific to that given system
on all levels. Thus these projects are extremely hard to port.
Pros: simple, lazy method
Cons: won't work anywhere else
2.2. Edit your Makefile
When this starts to become an issue (sometimes even due to different
versions of the same operating system or IDE being used within the user
base), a natural extension is to structure the build scripts to
make the system specific parts easy to edit/change. This often means the
code is grouped so that these settings are separated, and documentation
is provided.
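In practice this often takes the shape of a hand-maintained configuration
header (or a similar settings block at the top of the Makefile) that the
user edits before compiling; a sketch, with made-up settings:

    /* config.h - edit these to match your system before compiling
       (hypothetical example settings) */

    /* path separator of your file system: '/' on UNIX, '\\' on Windows */
    #define PATH_SEP '/'

    /* set to 1 if your libc provides snprintf(), 0 otherwise */
    #define HAVE_SNPRINTF 1

    /* where the program will look for its data files */
    #define DATA_DIR "/usr/local/share/myprog"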
This usually makes the project able to compile and run on a few systems,
mostly those that are close to the developer's. It already requires ongoing
attention from the developer: any new feature that may break
on other systems must be made configurable, grouped accordingly and documented;
this carries noticeable overhead compared to the previous method.
This method is extremely annoying for the end user: it assumes the user
knows a lot of little details about their system and is willing to spend
the time (ranging from minutes to hours) investigating and configuring just
to compile.
Pros: trivial upgrade from the single manual makefile, relatively low overhead
Cons: hassle to the user
2.3. Conditioned/Dual Makefile
A trivial upgrade is to collect the configurations for known systems, name
the systems and offer a single system choice instead of manual configuration
of each individual property. This is commonly done with conditional parts
in the build scripts. An alternative is to have a different set of build
scripts for each setup and let the user choose one of them (e.g.
Makefile.linux, Makefile.win32).
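The same idea sketched in C preprocessor form (system names and settings
made up for the example) - the user picks one named system and the
conditional blocks fill in the individual properties:

    /* config.h - uncomment exactly one system; each block fills in all
       the individual settings (hypothetical example) */
    #define SYS_LINUX 1
    /* #define SYS_WIN32 1 */

    #ifdef SYS_LINUX
    #define PATH_SEP '/'
    #define HAVE_SNPRINTF 1
    #endif

    #ifdef SYS_WIN32
    #define PATH_SEP '\\'
    #define HAVE_SNPRINTF 0
    #endif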
As long as the user is using a known system (i.e. one for which the project
is already manually configured), he doesn't need to change anything -
or at least as long as enough of the library versions, dependencies and
system configuration match. The fallback is manual configuration, the
previous method.
This method scales very poorly, as each newly supported system adds
extra time to the maintenance of the build system.
Pros: trivial upgrade from Edit your Makefile; less hassle for the majority
of the users, as long as the user base tends to use a small number of systems
Cons: medium to large overhead; maintenance effort doesn't scale well; users
with a slightly different system have to fall back to manual configuration
2.4. Multiple Manual Makefiles
Essentially the previous method, running wild. Such projects
have at least half a dozen Makefiles, of which some used to work
with older versions of the source but have bitrotted along the road, while
others still work with the source but have failed to keep up with the changes
in their target systems.
Pros: "at least build scripts are not generated" and "works for most of my users out of the box"
Cons: doesn't work for minority of the users; maintenance cost is high; confusing for the user
3. Replace Make
Replacing make, or any other (relatively) simple build tool, with a more
complex one may solve problem 2/A, which already makes maintenance much easier.
However, it does not solve 2/B - so these solutions tend to end up as advanced
versions of branch 2 with reduced maintenance cost.
On the other hand, this brings in a new dependency, which may reduce the
overall portability of the software; make is still more widely available than
its replacements, especially on exotic systems.
3.1. regmake - case study
Regmake is an abandoned make replacement project. It aimed to be small,
simple and portable and tried to fix the problems of recursive make
and tricky pattern matching (the latter by using regexp, thus the name).
After the prototype started to work, I realized regmake could not solve 2/B.
This means using regmake alone doesn't work and another system is needed
for configuring the software and potentially the build system, which certainly
leads back to the 2/B portability problem, which seems to have three solutions:
- single file Portable by Nature project (1.) - clearly not the case if make didn't work
- manual configuration (2.) - in this case using regmake over make offers only marginal gain once there are already more than 2 or 3 sets of configurations
- automatic configuration (4.) - results in generated files; once there are generated files and an extra step anyway, the benefit of keeping the build script static is not that important
Pros: easier to write "clever" (complex), maintainable build scripts
Cons: does not solve the manual vs. generated configuration problem (2/B)
3.2. cons
Another long abandoned project is cons. It's a make replacement with
a lot of extra features, written in perl. While it suffers from the same
problems as regmake, it also introduces a new dependency: perl. Depending
on another generic programming language has its pros and cons. While
it makes the system more flexible for those who already master the language,
it may be a burden for developers who do not.
4. Generated Makefiles, Automatic Configuration
A common method for solving both 2/A and 2/B is to generate the build scripts
(most often Makefiles), combined with a tool that automatically detects system
dependent properties for the configuration. When it's not buggy, it frees
the user from manual configuration, and the output can use the oldest, most
common make syntax: generated Makefiles don't need to be clever or small, only
the input files do.
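The detected properties typically materialize as a generated header; a
sketch of what such a config.h may look like (example names and values only):

    /* config.h - generated by the configure step, do not edit
       (example content; actual values depend on the detected system) */
    #define HAVE_STDINT_H 1
    #define HAVE_SNPRINTF 1
    /* #undef HAVE_STRLCPY */
    #define PATH_SEP '/'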
This comes at a cost, though:
- the overall overhead of the system is large compared to the simplest
methods in 2.; however, it scales better, and in the case of software ported
to many systems it may be cheaper than manual configuration
- there is an extra layer: whenever the developer needs to modify the build
scripts, the configure tool needs to be run again
- with some implementations the clever high level build-script-input language
considerably limits the developer to one or at most a few patterns the
tool supports
For this purpose there are widespread, usually large systems, as well as
lesser known or even abandoned, usually smaller systems.
4.1. Large: autoconf, automake
The autotools system is a heavyweight implementation. It is capable of
generating configuration detection scripts and makefiles. It heavily
builds on an old (but widely available) scripting language, m4.
When it works, it helps in building relatively portable software;
but it's huge and its maintenance cost is high. Whenever it breaks down, it
most often takes more time to fix it than to do the manual configuration.
M4 is a real burden, as fewer and fewer developers know the language, not
to mention users (who are typically the ones who face breakdowns when
trying to configure the system).
My personal experience is that the average autotools project is portable
only among a few modern systems, and most often breaks on anything marginally
exotic. This is usually blamed on the software being configured, not on
autotools itself - but such common misuse of the tool suggests problems
in the tool.
Pros: widespread, widely known and accepted by developers and users; has the potential to efficiently help in making software portable
Cons: extremely large maintenance costs; when it breaks, the user is in a worse position than with 2.2.
4.2. Large: qmake, cmake
These systems offer a much more abstract language for describing the build.
This makes it easier to express the complicated structure of a project but
also limits what can be expressed. After a while, for a complex project
with special rules that would be easy to write in a traditional Makefile,
a lot of extra effort is spent on trying to explain to the build system
what to do.
TODO
4.3. Small: scconfig
Scconfig does not depend on m4 and does not offer a custom high level
abstraction above the build. It's implemented in plain ANSI C (C89),
and is a framework for auto-detecting system properties. Once the
detection phase is done, all properties are collected in a hierarchical
database.
The next step is to generate the necessary build scripts and source
files (e.g. config.h). For this the developer has five choices:
- A. write ANSI C code that simply prints the files (sketched after this list)
- B. write ANSI C code that reads templates and generates the files
- C. use generator, the builtin dumb templating language
- D. use tmpasm, the builtin clever templating language
- E. a combination of A, B and C or A, B and D.
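As an illustration of choice A, here is a minimal sketch of such a
generator; the property names and the db_get() helper are made up for the
example and do not reflect the actual scconfig API:

    #include <stdio.h>
    #include <string.h>

    /* stand-in for a lookup in the hierarchical detection database;
       hypothetical, for the sketch only */
    static const char *db_get(const char *key)
    {
        if (strcmp(key, "libs/snprintf") == 0) return "1";
        if (strcmp(key, "sys/path_sep") == 0) return "/";
        return NULL;
    }

    /* choice A: plain ANSI C that simply prints config.h to stdout */
    int main(void)
    {
        const char *sep = db_get("sys/path_sep");

        printf("/* config.h - generated, do not edit */\n");
        if (db_get("libs/snprintf") != NULL)
            printf("#define HAVE_SNPRINTF 1\n");
        if (sep != NULL)
            printf("#define PATH_SEP '%s'\n", sep);
        return 0;
    }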
Pros: literally no dependencies; the only language the developer must know is C; small, simple, modular; does not restrict how build scripts are generated (templating instead of abstract build language); small enough that the full scconfig source can be embedded in the source tree of the target software
Cons: does not offer blind porting, requires the developer to understand many aspects of porting
4.4. Large: Scons
TODO
4.5. Small: autosetup
TODO
Can be deployed directly in the project source tree. Output generation
is not general, but tailored to Makefile and config.h. Depends on tcl.
4.6. Large: Waf
Heavyweight application that controls the whole process of configuration
and build, implemented in python.
TODO