Fawk ("Function AWK") is a dialect of the AWK programming language, built on libfawk, a custom Virtual Machine (VM). Libfawk comes with different scripting language frontends, including fawk, fbas ("Function BASIC") and fpas ("Function Pascal"). This tutorial demonstrates every feature of the libfawk VM, using the fawk language for the examples. Some of what is learned here carries over to other AWK implementations.
A libfawk script always runs embedded in a host application: a bigger program, typically written in a C-like language. Since libfawk has almost no I/O, there are two interfaces to the outside world:
Normally the job is split between the host application and libfawk so that the host application implements the functions that execute the low-level tasks (the "how do we do it" part), while the script is responsible for the high-level logic (the "why do we do it" part).
A libfawk script consists of code (in the form of functions) and data memory (in the form of variables, called cells). Just like in C, there is only one special function: main. This is the entry point, the first code the host application should call. Depending on the application:
Besides main, the API (function names and parameters) between the host application and the script is always specific to the given host application. This is true in both directions: for script-implemented functions that the host application calls, and for application-provided functions that the script can call. The API should be documented in the host application's documentation.
For this tutorial, we will use fawk(1), shipped with libfawk. This is a tiny host application, intended for diagnostics. It implements the bare minimum: it calls the main of the script and does not provide any application function for the script. However, that is enough for this tutorial, since the focus is the scripting language and the VM, not any specific host application.
A libfawk function has a name that is unique within the script context, zero or more parameters and a return value. The types of the parameters and of the return value are flexible and decided at run time. More on this in chapter 2.
The syntax for defining a function is:
function funcname(argname1, argname2, argnameN) { statements }
In this tutorial, text written in italic in the examples marks the parts that the user needs to fill in; the rest, written in normal font, are keywords.
If there are no parameters, an empty () needs to follow the function name. Libfawk is not white-space sensitive: there can be spaces between function, funcname and (), and the opening brace can be on the same line, with or without a preceding space.
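The layout rules above can be illustrated with a small sketch. The function names hello and world are arbitrary choices for this illustration; only fawk_print() and the syntax described so far are used. Lines starting with # are comments.

```fawk
# No parameters: the empty () is still required.
function hello() { fawk_print("hello,"); }

# Extra spaces before the name and before (), and the opening
# brace on its own line, are accepted as well.
function   world ()
{
    fawk_print("world");
}

function main(ARGV)
{
    hello();
    world();
}
```

Both layouts define ordinary functions; which one to use is a matter of taste.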
The API for main() is similar to the main() of C:
function main(ARGV) { }
where ARGV is an array containing the arguments. What arguments mean is specific to the host application. In chapter 1, we will ignore arguments.
The part between the braces is called the body of the function.
The classic hello world program in fawk:
example program ch1_ex1

    function main(ARGV)
    {
        fawk_print("hello, world");
    }
Function fawk_print() is a libfawk builtin, implemented in C, which prints all arguments supplied, then a newline, to the standard output. In our case the single argument is an immediate string literal, which is written using double quotes in fawk. Alternatively, fawk_print_cell() can be used; it prints verbose information about the arguments passed, including type, length and reference counters.
Running the script with fawk from a shell, assuming the script is saved as hello.fawk:
fawk hello.fawk
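As a further sketch of the two builtins described above, the following script passes more than one argument to each. The exact output format (how fawk_print() separates its arguments, and what fawk_print_cell() prints for each cell) is not specified in this chapter, so it is best checked by running the script.

```fawk
function main(ARGV)
{
    # Several arguments: all of them are printed, followed by one newline.
    fawk_print("hello,", "world");

    # The same arguments, but with verbose details about each of them.
    fawk_print_cell("hello,", "world");
}
```

Assuming the script is saved under a name such as print.fawk (a hypothetical file name), it can be run the same way: fawk print.fawk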
The fawk_print line is a function call. A function call is an expression (because it has a return value). We did not use the return value in this example. Whether the return value is used or not, the whole line becomes a statement: an atomic part of a code sequence (e.g. a function body). Statements are executed in the order in which they appear in the source code.
A function can have any number of statements. For example, this script prints the two words on two lines:
example program ch1_ex2

    function main(ARGV)
    {
        fawk_print("hello,");
        fawk_print("world");
    }
In fawk, just like in C, a semicolon is mandatory at the end of a statement, but not after a {} block. This is unlike traditional AWK, where a newline can substitute for a semicolon.
Newlines can be omitted:
function main(ARGV) { fawk_print("hello,");fawk_print("world"); }
Excess semicolons between statements or at the beginning or end of blocks are silently ignored (they represent empty statements).
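A minimal sketch of the rule above: every stray semicolon below is an empty statement, so the script is expected to behave exactly like the two-line hello world. The trailing # comments only annotate the lines.

```fawk
function main(ARGV)
{
    ;                       # empty statement at the beginning of the block
    fawk_print("hello,");;  # the second semicolon is an empty statement
    fawk_print("world");
    ;                       # empty statement at the end of the block
}
```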
A fawk source file is a sequence of functions. The following example demonstrates how to declare multiple functions:
example program ch1_ex3

    function hello()
    {
        fawk_print("hello,");
    }

    function world()
    {
        fawk_print("world");
    }

    function main(ARGV)
    {
        hello();
        world();
    }
In this example main first calls hello() with no parameters and then calls world() with no parameters. The order of functions is arbitrary; the following script is equivalent to the previous one:
example program ch1_ex4

    function main(ARGV)
    {
        hello();
        world();
    }

    function world()
    {
        fawk_print("world");
    }

    function hello()
    {
        fawk_print("hello,");
    }
Everything from the first # character to the end of the line is ignored (unless the # character appears in a string literal within double quotes):
    function main(ARGV)
    {
        # This is a comment
        fawk_print("#this is not a comment and will be printed");
    }
Comments are filtered out while loading a script.
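A slightly longer sketch of the comment rule, assuming that a comment may follow a statement on the same line and may also stand outside of any function (both follow from the "first # to end of line" rule, but are not stated explicitly in this chapter). Only the first and third calls are expected to produce output.

```fawk
# A comment can occupy a whole line.
function main(ARGV)
{
    fawk_print("first");    # a comment after a statement on the same line
    # fawk_print("second"); this call is commented out and never executed
    fawk_print("# third");  # the # inside the string literal is printed
}
```

Commenting out a line this way is a quick method for temporarily disabling a statement while experimenting.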
A block is a sequence of statements enclosed in braces. A function body is always a block, but wherever a single statement can be written, a block can be written too. This will be useful with flow control in chapter 3:
example program ch1_ex5

    function main(ARGV)
    {
        if (1)
            fawk_print("hello world");
        if (1) {
            fawk_print("hello");
            fawk_print("world");
        }
    }