plumb 5 2012-08-12


plumb, the pipeshell - language syntax and grammar


Plumb's script language is mainly declarative, with a few imperative transaction control commands. The declarative part is used to create variables, processes and pipes between the processes. While parsing the script, plumb builds a binary in-memory representation called a transaction. Transaction control commands are somewhat out-of-band: they instruct plumb to execute or discard the transaction. The end-of-file of a script given with -f on the command line, or the end of an in-line script given on the command line, is handled as an implicit transaction control command that executes the script.

Plumb has a single global namespace for the variables and process names of the running script. Each transaction has its own local scope for variables and processes while it is parsed; if the transaction is executed successfully, its local scope is merged into the global one.

1. Tokens

The language consists of comments and statements. Comments are lines starting with a hashmark (#), and are ignored by the parser. Statements are separated by semicolons and/or newlines. Empty statements are accepted. A statement may be one of:
variable assignment
process creation
process control
query and diagnostics
FD (file descriptor) reference
transaction control
A typical plumb script will assign variables, create processes then build pipelines referencing FDs of already created processes and/or creating new processes as part of the pipelines.
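
As a minimal sketch of the statement forms listed above (the file names and the process name src are made up for illustration), a short script could look like this:

# comment lines like this one are ignored by the parser
# two assignments on one line, separated by a semicolon
input=data.txt; output=result.txt
# the extra semicolons only produce empty statements, which are accepted
;;
# process creation, then a pipeline built from an FD reference
src={cat ${input}}
src:1 | {sort} | {tee ${output}}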

Scripts issued at runtime through a [cmd] process will often execute queries and process controls, but occasionally may create new processes and/or pipelines as well; they must use explicit transaction control commands to execute their transactions.

Diagnostic commands may be used to create snapshots of internal states at specific stages of executing a script, or even at runtime.

2. string literals

A plumb string is a regular string or a generic string.

A regular string is a word that consists of one or more alphanumeric characters, underscores, dots, dashes or slashes. Regular strings don't need protection: they can be written in the script without quotes or escapes.

A generic string is zero or more characters and does not contain \0. Generic strings must be protected by single or double quotes. Double quotes in a single-quoted string, or single quotes in a double-quoted string, do not have any special meaning. The following escape sequences are interpreted in both quotings:
\\ a single backslash
\" or \' a double or a single quote character (in a string protected by the same quote)
\n newline character (ASCII decimal 10)
\r CR character (ASCII decimal 13)
\t (horizontal) tab
\xXX a single byte given in hexadecimal as XX; both lowercase and uppercase digits work
A backslash followed by anything else is ignored and removed.

Two or more strings can be concatenated using the + operator; the result is a single string.

String examples (each example is a single string):
foo-bar1 a regular string
'foo-bar1' a generic string
"foo-bar1" the same generic string
"foo-" + bar + '1' the same string
"" empty string
"\"q\"" "q"
'"q"' "q"
'hello\nworld' hello world, in two lines
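
A few more strings using the escape sequences and concatenation described above; the variable names a1, a2 and hdr are only illustrative:

# \x41 is the byte 0x41 ("A"), so a1 and a2 both hold "ABC"
a1='\x41BC'
a2="\x41" + BC
# a tab separated header built by concatenation: name<TAB>value
hdr=name + "\t" + value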

3. Variables

Variables are evaluated while the script is parsed. In this regard a plumb variable is more similar to a C macro than to a real variable. A variable may change its value during parsing, and at each variable substitution plumb always uses the latest value at that point. A variable is looked up first in the transaction's own scope and, if it is not found there, in the global scope.
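
The following sketch illustrates that each substitution uses the value the variable has at that point of parsing (n, a and b are arbitrary names):

n=one
# ${n} is substituted right here, so a gets "one"
a=${n}
n=two
# the latest value is used at each substitution, so b gets "two"
b=${n}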

Substituting a variable that has not been initialized causes an error.

A variable can be initialized from the command line with -v. In this case the value is copied verbatim, without variable substitution or any other transformation.

Another way to initialize a variable is assigning it from a plumb script: name = value. Both name and value are case sensitive strings (variable substitution and string concatenation can be used in them).

For variable substitution there are three different syntaxes:
${name} substitute as a string literal without further parsing of the content; useful for passing arguments to processes and strings, as no backslash protection is required
$(name) substitute as a series of tokens; the value of the variable should start and end with a valid token but may contain any kind of tokens; behaves similarly to include, but more in-line; useful as a shorthand for commonly used constructions
$name same as $(name); works as long as name is a valid ID
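
A sketch contrasting the two main forms; the variable names, the process names and the grep pattern are assumptions made for illustration:

# ${} passes the value on verbatim, e.g. as a process argument
pattern="^ERROR"
errors={grep ${pattern}}
# $() re-parses the value as tokens, so this creates a process named lg
mk="lg={cat}"
$(mk)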

# simple variable assignment
foo=bar

# using foo for declaring baz to "bart"
baz=$foo + t;

# redefine foo

# initialize a new variable BAZ with the same value as foo

# the following substitution lets the parser evaluate the content of
# foo; the effect is declaring v1 to be "track".

# it is possible to create and use variables with tricky names using
# -v; to use them later in the script, the $() form is required. The
# following substitution assumes command line arguments -v foo:bar val

4. Includes

An include command instructs the parser to copy-paste a file in place of the command. There are two types of include:
include FILENAME copy-paste FILENAME in place
lib-include FILENAME copy-paste FILENAME in place, encapsulating anything found in the file in a new sub-transaction when executing the current transaction; this is useful when internal states are dumped or drawn using plumbviz(1). Lib-included files will have their own subtransactions with separate local variable scope, not shared with the parent transaction; if this is not desired, the parent transaction should flush before lib-include, to push its transaction-local variables into the global scope that is visible in the subtransaction
Since an include command is a statement, the file included must start and end with a valid token.


lib-include stdio
include part1.pb
include part2.pb
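
A sketch of the flush-before-lib-include pattern described above; logger.pb and the logdir variable are hypothetical:

# logdir has to be visible in the subtransaction of logger.pb (a
# hypothetical file), so the parent flushes first to move it to the
# global scope
logdir=/var/log/myapp
flush
lib-include logger.pb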

5. process creation

Synopsis: name=[cmd arg1 arg2 ... argN],prop1=val1,prop2=val2

A process is created by listing its command name and arguments in square brackets for a virtual process, or in curly brackets for a real process. The command name is a string, and each further string up to the closing bracket is a new argument.

Optionally, the process may be named by prefixing the process creation with "name=", where name is a string. If no name is specified, a random name is generated. This makes later FD references impossible, but is useful for creating "anonymous" stdin->stdout processes as part of a long pipeline, or for creating background processes that will not be piped anywhere.
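
A sketch of a named and an anonymous process; input.txt and the process name src are hypothetical:

# named real process: its FDs can be referenced later, e.g. src:1
src={cat input.txt}
# anonymous processes: grep gets two arguments, -v and ^#; neither
# process can be referenced later, but both work as stdin->stdout
# filters inside the pipeline
src:1 | {grep -v "^#"} | {sort}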

After the closing bracket, a comma-separated list of key=value pairs may follow, describing the process properties.


# run awk with the script ai.awk as a process named ai
ai={awk -f ai.awk}

# on some systems : needs to be used in the path; since : is part of
# the syntax, such strings must be protected
visualization={"d:/animator" -k}

# Starting a virtual process
spl=[split "\t "]

# Starting a sticky [hub] using process property:

For a list of available virtual processes and process properties, please refer to the plumb user manual (plumb(7)).

6. process control

Process control commands can change the state of an already created named process. The following commands are available:
stop NAME stop process called NAME softly with a signal (SIGTERM on UNIX)
kill NAME stop process called NAME the hard way with a signal (SIGKILL on UNIX)
pause NAME pause process called NAME if it is running
resume NAME resume process called NAME if it is running and is paused
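
Process control is mostly useful at runtime. A minimal sketch of a script fed into a [cmd], assuming a process named worker (a hypothetical name) was created earlier; each flush executes the preceding commands as a separate transaction:

# assumes an existing process called worker
pause worker
flush
# later, in another transaction
resume worker
flush
# ask worker to terminate softly (SIGTERM on UNIX)
stop worker
flush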

7. query and diagnostics

Query commands can be used to extract information about process state. The output of a query is written to the standard output of the [cmd] that executed it; if it was executed from the main script, the output is discarded. The syntax is: query PROP NAME COOKIE; where NAME is the name of an existing process and PROP is one of the following:
proc.state prints status of the named process ("stopped", "running" or "paused")
COOKIE is an optional cookie that will be included in the answer, which is either "query-ans [COOKIE] ANSWER" or "query-err [COOKIE] ERROR_MSG".

A special diagnostic command is dump_internals: when a transaction executes this command, it prints a detailed map of all internal states of plumb to plumb's stderr. dump_internals takes an optional argument that is used as the label of the dump (useful if there are multiple dumps).
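
A minimal runtime sketch, assuming a process named worker exists (worker, c1 and the dump label are arbitrary). If worker is running, the answer written to the [cmd]'s standard output would be expected to look like query-ans c1 running:

# ask for the state of worker, tagging the answer with cookie c1
query proc.state worker c1
# dump internal states to plumb's stderr, labelled after-query
dump_internals after-query
flush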

8. FD (file descriptor) reference

An FD reference is NAME:INT, where NAME is the name of an existing process and INT is an integer file descriptor. For normal UNIX processes 0 is stdin, 1 is stdout and 2 is stderr. There are further conventions for plumb virtual processes (3 for stdctrl, 4 for stdevent; the manual page of each virtual process has the details).

Instead of the INT, an asterisk (*) can be used to make plumb allocate the next unbound fd of the given process. This is useful with virtual processes that accept a variable number of input and/or output file descriptors; the most common use is with [hub].


p1={awk -f script.awk}
h=[hub]
# after the above two lines:
#  reference to stdout of p1 is: p1:1
#  reference to stdin of p1 is: p1:0
#  referencing the next unused FD of h: h:*
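
A sketch of fd allocation with the asterisk; mix, a.txt and b.txt are hypothetical, and the last line assumes the p1 process of the previous example. The env process is created by the standard script (see section 11):

mix=[hub]
# each mix:* binds a different, previously unbound fd of mix
{cat a.txt} | mix:*
{cat b.txt} | mix:*
# forward stderr of p1 to plumb's own stderr
p1:2 | env:2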

9. pipeline

A pipeline is a convenient way of describing a data flow. A single pipe ("|" in plumb syntax) always has exactly two ends, two file descriptors. The FD on the left acts as a source (writes data), the one on the right is a sink (reads data). An FD reference can serve as a source or a sink. When a process creation statement is used in a pipeline, depending on the context the stdin (0) or stdout (1) of the newly created process is used as the FD. This allows the user to build a long pipeline, as long as the intermediate objects are always processes: an FD reference can be either a sink or a source, but never both.
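
A sketch of the rule that an FD reference may be a source or a sink but not both; src, dst and h are assumed to be existing processes:

# valid: every element between the two FD references is a new process
src:1 | {sort} | {uniq} | dst:0
# invalid: h:0 would have to act as both a sink and a source
src:1 | h:0 | dst:0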

A long pipeline can be split up into a set of single pipes; both variants work the same way. The only reasons to build a longer pipeline are to save typing and to make the data flow easier to follow.

Examples (different forms of the same script):

# form 1
ai={awk -f script.awk}
h=[hub]
anim={animator}
ai:1 | h:0
h:1 | anim:0
anim:1 | ai:0

# form 2: pipeline
ai={awk -f script.awk} | h=[hub] | anim={animator}
anim:1 | ai:0

# form 3: by the end of the pipeline ai is already an existing process:
ai={awk -f script.awk} | h=[hub] | anim={animator} | ai:0

# form 4: if there's no external reference to processes created in a
# pipeline, naming the processes is not necessary (unless for debugging)
# Assume h is accessed from other pipelines (it wouldn't make sense to
# have it in our pipeline otherwise); looping back to ai requires ai to
# be named:
ai={awk -f script.awk} | h=[hub] | {animator} | ai:0

10. transaction control

Currently the only supported transaction control command is flush. It executes the already parsed, but not yet executed, part of the script from the execution buffer and clears the buffer. Execution is atomic: either it succeeds and the newly created objects (processes, variables) become part of the global scope, or the transaction fails and has no effect on the global scope at all.
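
A minimal sketch of how a flush moves new objects into the global scope, so that a later transaction can reference them; logfile, lg and run.log are hypothetical names:

# transaction 1: a variable and a process, local until flushed
logfile=run.log
lg={tee ${logfile}}
flush
# transaction 2: lg is now in the global scope and can be referenced
{date} | lg:0
flush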


# transaction 1: create main loop
ai=[hub] | aiscript={awk -f ai.awk} | anim=[hub] | {animator -k} | ai:0
aiscript:2 | env:2
flush

# transaction 2: start logging and a timer
anim:* | {"./"}
[timer period=0.5 repeat=0] | ai:*
flush
(The above example works with -f, but flush is more useful when such a script is fed into a [cmd].)

11. standard script

Before the first user specified script is loaded, plumb will start the standard script, to prepare the environment. The following objects are created by the standard script:
event event dispatcher [hub]; user scripts may use sources of this hub for subscribing to the central event stream; see event(3plumb)
env a virtual process exposing the file descriptors of plumb; see also: env(3plumb)
_procwatch a virtual process collecting signals of exiting real processes, delivering events
plumb_core a virtual process that generates internal plumb events (write errors, fd closes) to the event dispatcher
The appropriate piping between the above processes is also created.
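
A minimal sketch of subscribing to the event stream through a source of the event hub; events.awk is a hypothetical script:

# take the next free source fd of the event hub and feed it to awk
event:* | {awk -f events.awk}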

