Chapter 5: Arrays

5.1. Arrays are hash tables

An array is a collection of members. A member is like a variable: it has a type and a value. Array members are accessed by naming the array and using an index (which is a string or a number). Each array has its own namespace for indices: the same index in two different arrays will mean two different members.

In libfawk, arrays are hash tables: they can be indexed by strings, not only by numbers. When iterating over all members of an array, the order of iteration is random.

5.2. Syntax

The array syntax is:

varname[index]
where varname is the variable name of the array and index is an expression (it must evaluate to a scalar value).

For example, the following code creates an array foo and prints its members:
example program ch5_ex1
foo[1] = 42
foo["bar"] = 3.14
fawk_print(foo[1])
fawk_print(foo["bar"])

As a convention, we normally choose array variable names to be all uppercase. This is only for code clarity. The rest of this document will stick to this convention.

5.3. The in operator

The rule for array members is the same as for variables: they are created with the NIL value the first time they are referenced. This means that checking whether a member exists by reading it has a side effect:

if ARR[42] <> "" then fawk_print("does exist")

While this code works, it also creates member 42 in ARR with the NIL value as a side effect, even though the code only read it.

To overcome this problem, libfawk has the in operator to check if a member exists:

if 42 in ARR then fawk_print("does exist")

The in operator never creates the array member; it returns non-zero (true) if the member exists and zero (false) if it does not.
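The difference between a plain read and the in operator can be sketched with a small Python model. This is an illustration under assumptions, not libfawk's C implementation: the class FawkArray, its read() and contains() methods, and the use of Python's None for NIL are all hypothetical stand-ins. Also note that Python dicts iterate in insertion order, unlike libfawk arrays.

```python
NIL = None  # stand-in for libfawk's NIL value (assumption of this model)


class FawkArray:
    """Toy model of a libfawk array: a hash table whose plain reads create members."""

    def __init__(self):
        self.members = {}

    def read(self, index):
        # Like ARR[index] in an expression: a missing member is created with NIL.
        if index not in self.members:
            self.members[index] = NIL
        return self.members[index]

    def contains(self, index):
        # Like the in operator: reports existence and never creates the member.
        return index in self.members


ARR = FawkArray()

print(ARR.contains(42))   # False
ARR.read(42)              # like testing ARR[42] <> "" in an if statement
print(ARR.contains(42))   # True: the read created member 42
print(ARR.contains(43))   # False
print(ARR.contains(43))   # still False: checking did not create member 43
```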

5.4. Array iteration: the for-in syntax

It is possible to iterate over all members of an array (in random order):
example program ch5_ex2
FOO[1] = 42
FOO["bar"] = 3.14
if FOO["baz"] <> "" then fawk_print("hey!")

fawk_print("members A:")
for i in FOO
	fawk_print(i, FOO[i])
next i

fawk_print("members B:")
for i in FOO
	fawk_print(i, FOO[i])
next i

Both loops iterate in the same order, because the array was not modified between them. Both also list index "baz", because reading FOO["baz"] in the if statement created it with the NIL value.

5.5. Fake multi-dimensional arrays

Syntax:

varname[x, y]

where x and y are both expressions. Any number of dimensions can be attached using more commas.

However, this is only a fake multi-dimensional array: in libfawk all arrays are one-dimensional. The comma operator concatenates x, SUBSEP and y into a single string, which is then used as the index. SUBSEP is a builtin global variable; by default it is a single-character string, octal \034 (ASCII 28, "file separator"). This character normally does not appear in user strings, which makes it a good dimension separator.

The for-in loop also treats the array as one-dimensional: it lists every index as a single string with SUBSEP in it. It is not possible to iterate over one dimension at a time.
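The index construction can be modeled in Python as below. The helper fake_index() is hypothetical; it assumes each index value is converted with Python's str(), so libfawk's own formatting of non-integer numbers may differ. Splitting the key on SUBSEP is done in Python only to show what the joined index contains; this section does not describe a libfawk split function.

```python
SUBSEP = "\034"  # default SUBSEP: one character, octal 034 (ASCII 28, file separator)


def fake_index(*parts):
    """Model of varname[x, y, ...]: join the index values with SUBSEP.

    Assumption: each part is converted with str(); libfawk's number
    formatting may differ for non-integer values.
    """
    return SUBSEP.join(str(p) for p in parts)


GRID = {}                          # still a one-dimensional hash table
GRID[fake_index(1, 2)] = "a"       # like GRID[1, 2] = "a"
GRID[fake_index(3, 4, 5)] = "b"    # like GRID[3, 4, 5] = "b": any number of dimensions

for key in GRID:
    # Iteration only sees the joined strings; the parts are recovered
    # here by splitting on SUBSEP in Python.
    print(key.split(SUBSEP), GRID[key])
```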

5.6. Array-in-array

An array member can be a full array:
example program ch5_ex3
FOO[1] = 3
FOO[2] = 14
BAR["a"] = "A"
BAR["b"] = FOO

rem prints 14
fawk_print(BAR["b"][2])

This is different from fake multi-dimensional arrays: nested arrays build a real tree in memory.

It is important to note that FOO is not copied into BAR["b"]; only a reference to it is stored. FOO is linked into BAR["b"], so later changes to FOO are reflected in BAR["b"]:
example program ch5_ex4
FOO[2] = 14
BAR["b"] = FOO
FOO[2] = 55

rem prints 55
fawk_print(BAR["b"][2])
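Python dictionaries have the same reference behavior, so ch5_ex4 can be mirrored directly. This is only an analogy: a dict stands in for a libfawk array, and Python's identity test "is" stands in for linking.

```python
FOO = {2: 14}
BAR = {}
BAR["b"] = FOO          # stores a reference to FOO's table, not a copy

FOO[2] = 55             # modify through the original name

print(BAR["b"][2])      # 55: both names lead to the same table
print(BAR["b"] is FOO)  # True
```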

If a copy is needed instead, a for-in loop can be used to build a new array:
example program ch5_ex5
FOO[1] = 3
FOO[2] = 14
for i in FOO
	BAR["b"][i] = FOO[i]
next i
FOO[2] = 55

rem prints 14
fawk_print(BAR["b"][2])
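The same copy loop is sketched below in Python, with a dict standing in for each array. Because assigning an array stores a reference rather than a copy (see 5.8), it follows that this loop copies only one level: a member that is itself an array ends up shared between the original and the copy. Member 3 is a nested array added here to show this; it is not part of ch5_ex5. The model also creates BAR["b"] up front, while ch5_ex5 lets the first assignment create it.

```python
FOO = {1: 3, 2: 14, 3: {"x": 1}}   # member 3 is itself a (nested) table
BAR = {"b": {}}

# Copy member by member, like the for-in loop in ch5_ex5.
for i in FOO:
    BAR["b"][i] = FOO[i]

FOO[2] = 55                 # scalar member: the copy keeps the old value
FOO[3]["x"] = 99            # nested table: only its reference was copied

print(BAR["b"][2])          # 14
print(BAR["b"][3]["x"])     # 99: the copy is shallow
```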

5.7. Array as function parameter, explicit references to arrays

When an array is passed as a function parameter, only a reference is passed, not a deep copy of the array. Unlike with scalar function parameters, if an array parameter is modified, the modification is visible at the caller as well:
example program ch5_ex6
function load(ARR)
	ARR[1] = "one"
	ARR[2] = "two"
	ARR[3] = "three"
end function

load(FOO[])
for n = 1 to 3
	fawk_print(n, FOO[n])
next n

When calling load(), the syntax FOO[] is used instead of plain FOO. This is because this is the first appearance of FOO in the code, so libfawk needs to create the global variable FOO at this point. Whether it is created as a scalar or as an array depends on whether its first appearance is indexed. The empty index construct does not address any specific member; it merely states that FOO is an array. Once a variable is created, it is either an array or a scalar: it cannot switch between the two, and any access that treats it differently from how it was created results in a runtime error.

Note on internals: within load(), ARR is known to be an array because its first reference uses the [1] index. If FOO exists on the caller's side and is an array, the two arrays are properly linked. If FOO exists on the caller's side as a scalar, linking fails with a runtime error. However, if FOO does not exist on the caller's side yet, it is created as a scalar with the value NIL, which is passed to load() as a copy; load() is then free to create a local array for ARR, but it cannot link that array to FOO once the call is already in progress. Thus FOO does not receive the members created by load(). That is why referencing it with [] is essential: it creates FOO as an array on the caller's side before the call, so load() can link to it.

There are two alternatives. The first is to initialize a member of the array:

	FOO[0] = 0
	delete(&FOO[0])
	load(FOO)

This creates the array before the call to load(), but as an unwanted side effect it also creates a member that has to be deleted (otherwise it would show up in the listing).

A much simpler solution is to reference FOO as an array before the call:

	FOO[]
	load(FOO)

Note: the [] syntax has a limitation: it cannot address an array-in-array directly. Thus FOO[] is valid, but FOO[1][] (which would create an array and store it in FOO[1]) is not.
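The create-once rule of this section can be modeled with a small variable table in Python. VarTable, use_scalar() and use_array() are hypothetical names for this sketch, not libfawk API, and None stands in for NIL. The model covers only the scalar-or-array decision on first appearance and the linking of an existing array to a callee; it does not model the scalar NIL copy described in the note on internals.

```python
class VarTable:
    """Toy model of a variable table (an illustration, not libfawk's code).

    A variable's kind (scalar or array) is fixed when it is first created;
    any later access of the other kind is a runtime error.
    """

    def __init__(self):
        self.vars = {}   # name -> ("scalar", value) or ("array", dict)

    def use_scalar(self, name):
        # Plain, unindexed appearance: creates a NIL scalar the first time.
        kind, value = self.vars.setdefault(name, ("scalar", None))
        if kind != "scalar":
            raise RuntimeError(f"{name} is an array, used as a scalar")
        return value

    def use_array(self, name):
        # FOO[] or FOO[index]: creates an empty array the first time.
        kind, value = self.vars.setdefault(name, ("array", {}))
        if kind != "array":
            raise RuntimeError(f"{name} is a scalar, used as an array")
        return value


def load(arr):
    # Callee side: arr refers to the caller's table, so writes are shared.
    arr[1] = "one"
    arr[2] = "two"
    arr[3] = "three"


g = VarTable()
load(g.use_array("FOO"))          # like load(FOO[]): FOO exists as an array first
print(g.use_array("FOO")[2])      # two

g.use_scalar("BAZ")               # like an unindexed first appearance of BAZ
try:
    g.use_array("BAZ")            # BAZ can no longer become an array
except RuntimeError as err:
    print("runtime error:", err)
```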

5.8. Array referencing and array UIDs

Each array gets an integer identifier (UID) on creation, unique within the script context. When an array is printed using fawk_print(), its UID is printed.

This UID is used for implementing the = operator (the comparison, not the assignment). This means two arrays compare equal only if they are two references to the very same array. Two distinct arrays with the same content do not compare equal.

Arrays are stored as array references. Libfawk never makes a real copy of an array: ARR1 = ARR2 merely creates one more reference to the array behind ARR2. As a result, a later comparison ARR1 = ARR2 is true, and any modification to ARR2 also affects ARR1 and vice versa. If a real copy is needed, a simple loop can be used:

	for tmp in ARR1
		ARR2[tmp] = ARR1[tmp]
	next tmp

This creates ARR2 (assuming it did not exist before) and copies each member of ARR1 into it. ARR2 will have a different UID than ARR1.
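UID-based comparison can be sketched in Python as follows. The FawkArray class, the arrays_equal() helper and the counter starting at 1 are assumptions made for this model; the actual UID values are chosen by libfawk. ALIAS and COPY are hypothetical names for a second reference and a real copy.

```python
import itertools

_next_uid = itertools.count(1)  # hypothetical UID source for this model


class FawkArray:
    """Toy model: an array that receives a UID when it is created."""

    def __init__(self):
        self.uid = next(_next_uid)
        self.members = {}


def arrays_equal(a, b):
    # Model of the comparing = operator on arrays: same UID means same array.
    return a.uid == b.uid


ARR1 = FawkArray()
ARR1.members["x"] = 1

ALIAS = ARR1                  # one more reference: same object, same UID
COPY = FawkArray()            # a new array gets a new UID...
for tmp in ARR1.members:      # ...and the for-in loop copies the members
    COPY.members[tmp] = ARR1.members[tmp]

print(arrays_equal(ALIAS, ARR1))   # True
print(arrays_equal(COPY, ARR1))    # False: same content, different UID
```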