plumb 7 2012-08-12

NAME

plumb, the pipeshell - user manual

DESCRIPTION

0. Terminology

There are processes, file descriptors (fds) and pipes. Plumb can run multiple processes, and each process may have multiple fds. A connection between two fds is established by creating a pipe. Pipes are always unidirectional. A command language creates and maintains processes and pipes. Records of different sizes are sent through the pipes; a common configuration is to split text streams into one record per line. A line-based event language can deliver events to processes upon subscription (subscription is done by binding pipes).

1. processes

There are real and virtual processes. A real process is a program that plumb starts as a background process, using bgproc of libporty (effectively a system() call). A virtual process is a dummy process emulated by plumb, connected to a bidirectional socket, pipe, device file, etc., or to a code section in plumb. Records are passed between processes; it is guaranteed that among virtual processes records are not split or merged unless the process explicitly requests it.

The single most useful virtual process is hub, which can connect multiple pipes. Refer to the see also section for a list of supported virtual processes.

Properties of a process:
unique ID (anonymous processes get an auto-generated ID)
cmdline command line (for non-virtual processes)
fdlist number and role of fds
state see later
sticky (*) plumb will not quit as long as at least one sticky process is running (bool; default: yes for real processes, no for most virtual processes - refer to VP manual)
kill_on_eof (*) kill the process if all sinks got eof; it is useful for processes that do not handle eof properly (bool; default: no - expect processes to handle eof)
delayed_startup do not start the process until any of the sinks gets a write request
_new_fds whether process can get new fds on the fly (bool; default no; yes for a few selected VPs, user can not change this flag)

Process states:
stopped before started or after exited/killed
running normal operation
paused all sinks are blocking, plumb is not writing sinks of the process and is not reading sources of the process

A process ends if:
die the process exits or gets killed by another external program
kill-on-eof all sink fds of the process got EOF and kill_on_eof is true for the process
scripted kill the process gets killed by plumb (a kill command through a [cmd])

Plumb exits cleanly if all sticky processes are in the stopped state (even if there is data in the buffers of non-sticky processes, which is then discarded). If plumb is forced to exit by an error, a signal or upon request through a [cmd], all non-stopped processes are killed.
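
The clean-exit rule above can be modelled in a few lines. This is an illustrative Python sketch, not plumb's code: the Proc class and the should_exit function are hypothetical names, and the model ignores the startup phase, when processes have not been started yet.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    # Hypothetical model of a plumb process; not plumb's real data structure.
    name: str
    sticky: bool
    state: str = "stopped"  # one of "stopped", "running", "paused"


def should_exit(procs):
    """Return True when every sticky process is in the stopped state.

    Non-sticky processes are ignored, so they may still be running
    (and may still hold buffered data) when this returns True.
    """
    return all(p.state == "stopped" for p in procs if p.sticky)


procs = [
    Proc("producer", sticky=True, state="running"),
    Proc("hub", sticky=False, state="running"),
]
before = should_exit(procs)   # the sticky producer is still running
procs[0].state = "stopped"
after = should_exit(procs)    # only the non-sticky hub is left running
```

In this model the non-sticky hub never delays the exit, which matches the remark that data buffered in non-sticky processes is discarded.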

2. fds

An fd is always attached to a specific process and is either an input or an output of the process.

Input fds are called sinks, output fds are called sources. A source can emit data and EOF, a sink can receive data and EOF. Pipes pass on EOF; if a sink should not receive an EOF, the virtual process "[hub eof=...]" should be inserted.

3. pipes

Pipes are simple, unbuffered, unidirectional, anonymous communication channels.

Pipes do not do any filtering or alteration of the stream - those features are all implemented in virtual processes.

Unbuffered means that whatever is read on the input, be it a single byte, a line or a random-sized binary blob, is copied to the output immediately, using the same record size, and the output is flushed. To speed up transfer, buffering should happen on the input and output of the processes (which it does for most of the standard tools).

NOTE: the buffer for read() has a specific size; if a process writes a record larger than this buffer, the user should not expect it to be passed on without being fragmented.
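
The fragmentation described in the note can be reproduced with an ordinary OS pipe. The following Python sketch is only an illustration: BUF_SIZE is an assumed value (this manual does not state plumb's actual read buffer size) and relay_chunks is a hypothetical helper, not part of plumb.

```python
import os

# Assumed read buffer size, for illustration only.
BUF_SIZE = 4096


def relay_chunks(record, buf_size=BUF_SIZE):
    """Write one record into an OS pipe and read it back through a
    fixed-size buffer, returning the chunks as the reader saw them.

    The record must fit into the kernel pipe buffer, otherwise the
    single blocking write below would stall.
    """
    r, w = os.pipe()
    try:
        os.write(w, record)
        os.close(w)          # signal EOF to the reading side
        w = -1
        chunks = []
        while True:
            chunk = os.read(r, buf_size)
            if not chunk:    # empty read means EOF
                break
            chunks.append(chunk)
        return chunks
    finally:
        os.close(r)
        if w != -1:
            os.close(w)


# A 10000 byte record cannot arrive in one piece through a 4096 byte buffer.
chunks = relay_chunks(b"a" * 10000)
```

The reader gets the same bytes in the same order, but the original record boundary is lost unless a downstream process reassembles it.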

A pipe is a point-to-point device, connecting a single source to a single sink. For connecting multiple pipes to a single source, or connecting multiple pipes to a single sink, virtual process 'hub' should be used.

4. Flow control and buffering

Internal communication (pipes between virtual processes and sinks of virtual processes) is handled by passing on pointers - no real pipes or buffering are involved. Real processes, on the other hand, have real sockets for both sinks and sources; sinks may block when plumb tries to write to them.

If a sink of a real process is blocking, the portion of the write that couldn't go through is saved in a buffer attached to the fd and the fd is marked "blocking". Blocking sinks are watched for POLLOUT from the main loop. The block is also back-promoted through the pipe connected to it. When this back-promotion reaches the source of the process feeding the pipe:

- for a virtual process: by default the process gets blocked and stays blocked as long as any of its sources are being blocked; some virtual processes may handle the situation differently: refer to the documentation of the given virtual process.

- for a real process: plumb stops reading that one source, and it is up to the process to decide when/how to block.

In practice this means that blocking starts at a sink of a real process and virtual processes are _usually_ transparent to blocking, immediately relaying it back to the sources of real processes, causing them to pause writing. This behaviour is similar to what a shell does on pipes, except that in plumb we have a tree instead of a straight pipeline, which means a blocking sink may infect a large tree and block a number of sources on the other end.
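
The handling of a blocking sink (save the unwritten portion, mark the fd, retry on POLLOUT) can be sketched as follows. This is a simplified Python model for POSIX systems, not plumb's implementation: the Sink class and the demo function are hypothetical, an OS pipe stands in for the socket of a real process, and back-promotion to the feeding source is left out.

```python
import os
import select


class Sink:
    """Hypothetical model of a sink fd of a real process."""

    def __init__(self, fd):
        os.set_blocking(fd, False)
        self.fd = fd
        self.pending = b""      # portion of a write that could not go through
        self.blocking = False   # the "blocking" mark on the fd

    def _try_write(self, data):
        try:
            return os.write(self.fd, data)
        except BlockingIOError:  # nothing could be written at all
            return 0

    def write(self, data):
        if self.blocking:
            # Assumption of this sketch: queue behind the saved data.
            # With back-promotion the feeding source would not be read.
            self.pending += data
            return
        n = self._try_write(data)
        if n < len(data):
            self.pending = data[n:]
            self.blocking = True

    def flush(self):
        """Retry the saved portion; call when poll() reports POLLOUT."""
        n = self._try_write(self.pending)
        self.pending = self.pending[n:]
        self.blocking = bool(self.pending)


def demo(total=1_000_000):
    """Push more data than a pipe can hold and drain it from the far end."""
    r, w = os.pipe()
    sink = Sink(w)
    sink.write(b"x" * total)
    was_blocking = sink.blocking
    poller = select.poll()
    poller.register(w, select.POLLOUT)
    received = 0
    while sink.blocking:
        received += len(os.read(r, 65536))   # the consumer drains some data
        if poller.poll(0):                   # sink writable again?
            sink.flush()
    os.close(w)
    while True:                              # read what is left until EOF
        chunk = os.read(r, 65536)
        if not chunk:
            break
        received += len(chunk)
    os.close(r)
    return was_blocking, received
```

Calling demo() returns whether the sink was marked blocking after the first write and how many bytes the reading side received in total.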

Unlike with a shell, it is possible to create deadlocks with plumb: for example, in a '69' setup two real processes may both be blocking on write and thus be unable to read or process input. More generally, any closed loop of processes that contains at least one real process that can block on write may get deadlocked. If the loop is connected to other loops via 'hub', all those other loops may get blocked too. Plumb cannot predict or detect such deadlocks. The script may fork the stream somewhere in each loop with [hub] and drive the data to a user-implemented watchdog process through a timed [count] process to detect whether the data flow has stopped. With extra information on the behavior of the processes, this script may decide when to pause, stop or restart processes. Library stats(3plumb) has predefined [hub] | [count] routines.
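
The decision logic of such a user-implemented watchdog might look like the sketch below. It is an assumption-laden Python illustration: the FlowWatchdog class is hypothetical, the output format of [count] is not described in this manual, and the sketch simply assumes that periodic reports carry a cumulative record count. It is unrelated to the routines in stats(3plumb).

```python
class FlowWatchdog:
    """Hypothetical watchdog: reports a stall when the cumulative
    record count has not changed for at least `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_count = None
        self.last_change = None

    def report(self, now, count):
        """Feed one periodic report (time in seconds, cumulative count).

        Returns True if the flow is considered stalled.
        """
        if self.last_count is None or count != self.last_count:
            # First report, or data moved since the previous report.
            self.last_count = count
            self.last_change = now
            return False
        return now - self.last_change >= self.timeout


wd = FlowWatchdog(timeout=3)
reports = [(0, 10), (1, 25), (2, 25), (3, 25), (4, 25)]
results = [wd.report(t, c) for t, c in reports]
```

Timestamps are passed in explicitly so that the logic can be tested without waiting; a real watchdog would read the clock itself and then decide whether to pause, stop or restart processes.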

5. configuration variables

Plumb is configured through variables. The advantage of this is that variables can be set from the command line with -v or from script files, using existing syntax. Below is a list of configuration variables.

plumb_kill_timeout When plumb exits, it needs to make sure all real processes are killed; it tries to terminate processes gracefully, then waits at most plumb_kill_timeout (integer) seconds before hard-killing the remaining processes. If not set, a hardwired default is used.
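
The terminate, wait, hard-kill sequence can be illustrated with Python's subprocess module. This is a sketch of the general technique, not plumb's code: stop_all is a hypothetical function, and it assumes that the timeout is one shared deadline for all processes, which this manual does not specify.

```python
import subprocess
import sys
import time


def stop_all(procs, kill_timeout):
    """Ask every process to terminate, wait at most kill_timeout
    seconds in total, then hard-kill whatever is still alive."""
    for p in procs:
        if p.poll() is None:
            p.terminate()            # graceful request (SIGTERM on POSIX)
    deadline = time.monotonic() + kill_timeout
    for p in procs:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            p.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            p.kill()                 # hard kill (SIGKILL on POSIX)
            p.wait()                 # reap the process


def spawn_sleeper():
    """Start a child process that just sleeps; for demonstration only."""
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
```

A call such as stop_all([spawn_sleeper(), spawn_sleeper()], kill_timeout=2) returns once every child has exited, either by itself or after the hard kill.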

SEE ALSO
