regex 3plumb 2012-09-05

NAME

regex - filtering and string substitution based on regex

SYNOPSIS

[regex pattern=REGX]

[regex pattern=REGX pass=mismatch]

[regex pattern=REGX subst=STR]

[regex pattern=REGX subst=STR global=yes pass=match]

               +-------+
0 (stdin) ---->| regex |----> (stdout) 1
               +-------+

DESCRIPTION

Regex takes each record and matches it against a regex pattern. There are two main modes of operation: filtering, and substitution (when subst is specified). In filtering mode the records are not altered. In subst mode a backreference-capable substitution of the subst string is done on the matched records, either once (if global is no) or multiple times (if global is yes), just like the "g" switch for sed's "s" command.

With or without substitution, whether a record is passed on depends on the pass setting:
	all     pass all records regardless of whether the pattern matched; can be used only when subst is used; default for subst mode
	match   pass only the records where the pattern matched; default for filtering mode
	invert  pass only the records where the pattern did not match; can be used with filtering mode
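
The interaction of subst, global and pass described above can be modelled outside of plumb. The following is a minimal sketch in Python, not the plumb implementation: the function name regex_node and its parameters are hypothetical, and Python's re module stands in for genregex, whose syntax may differ. Records are bytes because regex operates on binary records.

```python
import re

def regex_node(records, pattern, subst=None, global_=False, pass_=None):
    """Model of the documented filtering/substitution behaviour (illustrative only)."""
    rx = re.compile(pattern)
    # Documented defaults: "all" in subst mode, "match" in filtering mode.
    if pass_ is None:
        pass_ = "all" if subst is not None else "match"
    # pass=all is documented as usable only together with subst.
    if pass_ == "all" and subst is None:
        raise ValueError("pass=all requires subst")
    out = []
    for rec in records:
        matched = rx.search(rec) is not None
        if matched and subst is not None:
            # count=0 replaces every match (global=yes), count=1 only the first.
            rec = rx.sub(subst, rec, count=0 if global_ else 1)
        if (pass_ == "all"
                or (pass_ == "match" and matched)
                or (pass_ == "invert" and not matched)):
            out.append(rec)
    return out

records = [b"distr one distr", b"other"]
filtered = regex_node(records, b"distr")                      # filtering mode
inverted = regex_node(records, b"distr", pass_="invert")      # inverted filter
first_only = regex_node(records, b"distr", subst=b"DISTR")    # subst, pass=all
all_matches = regex_node(records, b"distr", subst=b"DISTR",
                         global_=True, pass_="match")
```

In this model a record that does not match is never altered, and whether it is dropped is decided by pass alone.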

Regex syntax is documented in project genregex (TODO: external reference).

NOTE: regex takes binary records; the common use case, where each record is a single line ($LSP), is just a special case. The following regex constructs have unusual meaning because of binary operation:
	^   begin-of-record; after an $LSP, this is the beginning of a line as well
	$   end-of-line; replaced with [\r\n]+$; after an $LSP it matches the end of the line as expected
	$$  end-of-record; replaced with a single $; always matches the end of the current binary record

This means ^ can be used for both binary and text streams. For text streams, regex works properly only after an $LSP, as regex does not attempt to do any line splitting; in that case $ works as expected. For binary streams, $$ works as end-of-record without side effects.
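
The anchor rewriting can be illustrated with a short sketch. This is an assumption-laden model in Python, not the genregex code: the helper name rewrite_anchors is hypothetical, character classes containing $ are not handled, and Python's \Z is used where the text above says $, because Python's own $ also matches just before a trailing newline.

```python
import re

def rewrite_anchors(pattern: bytes) -> bytes:
    """Rewrite $ and $$ the way the NOTE above describes (illustrative only)."""
    out = bytearray()
    i = 0
    while i < len(pattern):
        if pattern[i:i+1] == b"\\":
            # Escaped character (for example \$): copy both bytes unchanged.
            out += pattern[i:i+2]
            i += 2
        elif pattern[i:i+2] == b"$$":
            # $$ is end-of-record: strict end of the binary record.
            out += b"\\Z"
            i += 2
        elif pattern[i:i+1] == b"$":
            # $ is end-of-line: one or more CR/LF bytes, then end of record.
            out += b"[\\r\\n]+\\Z"
            i += 1
        else:
            out += pattern[i:i+1]
            i += 1
    return bytes(out)

line_record = b"abc\n"   # a record as produced by a line splitter
raw_record = b"abc"      # a binary record without line terminator

eol_on_line = re.search(rewrite_anchors(b"abc$"), line_record) is not None
eol_on_raw = re.search(rewrite_anchors(b"abc$"), raw_record) is not None
eor_on_raw = re.search(rewrite_anchors(b"abc$$"), raw_record) is not None
eor_on_line = re.search(rewrite_anchors(b"abc$$"), line_record) is not None
```

Under these assumptions $ needs the line terminator to be part of the record, while $$ matches only where the record itself ends.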

eof handling

Default.

blocking/flow control

Default.

buffering

None.

EXAMPLE

The following script reads each line on stdin and replaces distr with DISTR then prints all lines to stdout (sed: "s/distr/DISTR/g"):

	env:0 | $LSP | [regex pattern=distr subst=DISTR global=true],sticky=1 | env:1

Same script, replacing only the first distr in each line (sed: "s/distr/DISTR/"):

	env:0 | $LSP | [regex pattern=distr subst=DISTR],sticky=1 | env:1

Next script reads each line on stdin, puts runs of digits in double quotes, then prints only the matched lines to stdout:

	env:0 | $LSP | [regex pattern="[0-9]*" subst='"\1"' global=true pass=match],sticky=1 | env:1

SEE ALSO
