libgpcogl - theory of operation

GPU rendering

A modern GPU uses user-provided fragment shader programs to calculate pixel colors while rendering polygon fragments. The input of rendering is a set of (typically two dimensional) textures and the output is the pixels of a frame buffer. A pixel value, both in a texture and in the output frame buffer, is normally a tuple of floating point or integer values (e.g. 2 values for RG or 4 values for RGBA). Compiled shader programs, frame buffers and textures are stored in GPU memory.

It is possible to craft a setup where the output frame buffer is a two dimensional rectangle on which a same-sized three dimensional quad is rendered in a 1:1 "top view" transformation. In this case rendering the output will run the shader program once on each pixel of the frame buffer. The shader program has access to:

the x and y coordinates of the output pixel
a number of input textures configured (and "bound") before running the shader

The output frame buffer does not need to be tied to a screen; it can be mapped onto a texture instead. This means the shader program reads input textures from GPU memory and writes its output into a texture in GPU memory. The output texture can be reused as an input texture for a subsequent shader program. Textures can be transferred between CPU memory and GPU memory at any time.
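The model described above can be imitated on the CPU. The following is only a sketch in plain C, not gpcogl code: the names shader_fn, run_pass and run_two_passes are made up for illustration, a "texture" is a flat float buffer, and the loop visits pixels one after the other where a GPU would process them in parallel. It shows a shader being invoked once per output pixel with its x and y coordinates, and the output of one pass being reused as the input of the next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU stand-in for a fragment shader: receives the output
   pixel coordinates and one bound input "texture", returns one value. */
typedef float (*shader_fn)(int x, int y, const float *in, int w, int h);

/* Run the shader once for every pixel of the output buffer.
   A GPU would do this in parallel and in an unspecified order. */
void run_pass(shader_fn shader, const float *in, float *out, int w, int h)
{
    assert(in != out); /* input and output are distinct buffers */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            out[(size_t)y * w + x] = shader(x, y, in, w, h);
}

/* Example shader: double the input pixel at the same position. */
float shader_double(int x, int y, const float *in, int w, int h)
{
    (void)h;
    return 2.0f * in[(size_t)y * w + x];
}

/* Example shader: add the x coordinate to the input pixel. */
float shader_add_x(int x, int y, const float *in, int w, int h)
{
    (void)h;
    return in[(size_t)y * w + x] + (float)x;
}

/* Two chained passes: the output of the first pass (tmp) becomes the
   input of the second pass without leaving "GPU memory". */
void run_two_passes(const float *in, float *tmp, float *out, int w, int h)
{
    run_pass(shader_double, in, tmp, w, h);
    run_pass(shader_add_x, tmp, out, w, h);
}
```

With a 3x2 input holding the values 1 to 6, the first cell of the second output row becomes 2*4 + 0 = 8.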

Array abstraction

For gpcogl this is all abstracted one step further:

instead of textures, our terminology is arrays; an array is a two dimensional, row-major list of N*M cells
a cell holds one, two, three or four components (numbers) of the same type
a component is float32, uint32 or int32
a shader program takes zero or more input arrays and exactly one output array (dimensions do not need to match)
the compute operation runs a given shader program once for each cell of the output array
only one shader program is executed at a time; no new shader program is started before every cell of the previous one has been computed and written out (into the output array in GPU memory)
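The array terminology can be made concrete with a small descriptor. This is a hedged sketch, not the library's data structure: array_desc_t, comp_type_t and the helper functions are hypothetical names, and the assumption that the components of a cell are stored next to each other (interleaved, as in an RGBA texel) is mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical component types, mirroring float32 / uint32 / int32. */
typedef enum { COMP_FLOAT32, COMP_UINT32, COMP_INT32 } comp_type_t;

/* Hypothetical array descriptor: width * height cells, row-major. */
typedef struct {
    int width, height; /* cells per row, number of rows */
    int ncomp;         /* components per cell: 1..4 */
    comp_type_t type;  /* all components share this type */
} array_desc_t;

/* Size of one component in bytes. */
size_t comp_size(comp_type_t t)
{
    switch (t) {
    case COMP_FLOAT32: return sizeof(float);
    case COMP_UINT32:  return sizeof(uint32_t);
    case COMP_INT32:   return sizeof(int32_t);
    }
    assert(!"unknown component type");
    return 0;
}

/* Total storage needed for the array, in bytes. */
size_t array_bytes(const array_desc_t *a)
{
    assert(a->ncomp >= 1 && a->ncomp <= 4);
    return (size_t)a->width * a->height * a->ncomp * comp_size(a->type);
}

/* Index (counted in components) of component c of cell (x, y),
   assuming row-major cells with interleaved components. */
size_t comp_index(const array_desc_t *a, int x, int y, int c)
{
    assert(x >= 0 && x < a->width);
    assert(y >= 0 && y < a->height);
    assert(c >= 0 && c < a->ncomp);
    return ((size_t)y * a->width + x) * a->ncomp + c;
}
```

For a 4x3 array of two-component float32 cells this gives 96 bytes of storage, and the second component of cell (1, 0) sits at index 3.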

Massive parallelism

The GPU has many shaders: tiny processor cores with high-speed access to GPU memory, capable of running shader programs. Shaders work in parallel. The number of shaders is specific to the GPU model and typically ranges from a few dozen to several hundred.

When a gpcogl compute operation is started, the GPU splits up the output array among all available shaders and runs them in parallel. How each shader instance executes the shader program (how a single output cell is computed) is therefore well defined, but the order in which output cells are calculated, and how many of them are being worked on in parallel at any given time, is unknown.

This has consequences for how shader programs should be designed. A shader program should:

calculate only a single output cell (the one the GPU tells it to)
use input arrays different from the output array
if the output array is also used as an input array, the only cell that is safe to read from it is the one at the current output position; for any other cell it is impossible to tell whether another shader instance has already overwritten it or it still holds its old value
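The hazard in the last rule can be reproduced on the CPU. The sketch below is illustrative only: kernel_sum_left and run_cells are hypothetical names, the array is one dimensional for brevity, and the unknown GPU scheduling is imitated by running the same loop either forward or backward. The kernel reads the left neighbour of the current cell, which is exactly the kind of access that is unsafe when input and output are the same array.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel: output cell i = left neighbour + own value (from the input). */
int kernel_sum_left(const int *in, size_t i)
{
    return (i == 0 ? 0 : in[i - 1]) + in[i];
}

/* Compute all n output cells. The 'backward' flag flips the visiting
   order, standing in for the unspecified order used by a GPU. */
void run_cells(const int *in, int *out, size_t n, int backward)
{
    assert(in != NULL && out != NULL);
    for (size_t k = 0; k < n; k++) {
        size_t i = backward ? n - 1 - k : k;
        out[i] = kernel_sum_left(in, i);
    }
}
```

With separate input and output arrays and an input of {1, 1, 1, 1}, both orders produce {1, 2, 2, 2}. When the same array is passed as input and output, the forward order yields {1, 2, 3, 4} and the backward order yields {1, 2, 2, 2}: the result depends on scheduling, which a shader program cannot control.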

Global constants

It is possible to use global constants to communicate parameters from the C program to shader programs. In the shader these look like read-only variables. They are set once in C, passed in with the compute() call, and cannot be changed during shader execution. They are ideal for communicating discrete configuration parameters (scalars or vectors of length 2..4) to shaders.
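The role of global constants can be sketched in plain C as a read-only parameter block that every shader invocation sees unchanged. Nothing here is gpcogl API: constants_t, kernel_affine and compute_with_constants are hypothetical names for a CPU stand-in, and the particular constants (one scalar and one vector of length 2) are chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant block: set once by the caller, read-only
   for the duration of the computation. */
typedef struct {
    float scale;     /* scalar constant */
    float offset[2]; /* vector constant of length 2 */
} constants_t;

/* Stand-in shader: the const pointer means it can read the constants
   but has no way of modifying them. */
float kernel_affine(int x, int y, const float *in, int w,
                    const constants_t *k)
{
    return in[(size_t)y * w + x] * k->scale
         + k->offset[0] * (float)x
         + k->offset[1] * (float)y;
}

/* Stand-in compute call: the same constant block is handed to every
   invocation of the shader. */
void compute_with_constants(const float *in, float *out, int w, int h,
                            const constants_t *k)
{
    assert(in != out);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            out[(size_t)y * w + x] = kernel_affine(x, y, in, w, k);
}
```

Running the same shader again with a different constant block changes the result without recompiling anything, which is what makes constants convenient for configuration parameters.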

gpcogl context

A gpcogl context (C type: gpcogl_t, which is a struct) holds all state of a computational context:

shader programs compiled and stored in GPU memory
arrays stored in GPU memory
configuration and information on the GPU and driver

The user can create multiple, independent contexts in parallel. Depending on the GLI used, these contexts can use the same or different GPU hardware (if the host computer has multiple GPUs installed). Contexts can be created and discarded at any time during execution.

Sequence of calls

The typical structure of a computation carried out using libgpcogl is:

initialize the GLI
create a gpcogl context
compile one or more shader programs from source (within the gpcogl context)
upload input arrays from CPU to GPU (within the gpcogl context)
create output and auxiliary arrays in the GPU (within the gpcogl context)
perform one or more computations using the shader programs compiled above (within the gpcogl context), using the arrays already in GPU memory
download results from GPU memory arrays to CPU memory (within the gpcogl context)
destroy the gpcogl context
uninitialize the GLI
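The shape of this sequence can be illustrated with a CPU-only mock. The sketch deliberately does not use libgpcogl: every sim_ name below is invented, "GPU memory" is ordinary heap memory, the shader is a hard-coded squaring function, and the GLI and shader compilation steps are left out entirely. It only shows the ordering of create, upload, compute, download and destroy around a context object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in context; heap memory plays the role of GPU memory. */
typedef struct {
    float *in;  /* input array, filled by an "upload" */
    float *out; /* output array, created inside the context */
    size_t n;   /* number of cells in each array */
} sim_ctx_t;

/* Create the context and its two arrays; returns 0 on success. */
int sim_create(sim_ctx_t *ctx, size_t n)
{
    assert(n > 0);
    ctx->n = n;
    ctx->in = malloc(n * sizeof *ctx->in);
    ctx->out = calloc(n, sizeof *ctx->out);
    if (ctx->in == NULL || ctx->out == NULL) {
        free(ctx->in);
        free(ctx->out);
        ctx->in = ctx->out = NULL;
        return -1;
    }
    return 0;
}

/* "Upload": copy caller data into the context's input array. */
void sim_upload(sim_ctx_t *ctx, const float *src)
{
    memcpy(ctx->in, src, ctx->n * sizeof *src);
}

/* "Compute": stand-in shader that squares each input cell. */
void sim_compute(sim_ctx_t *ctx)
{
    for (size_t i = 0; i < ctx->n; i++)
        ctx->out[i] = ctx->in[i] * ctx->in[i];
}

/* "Download": copy the output array back to caller memory. */
void sim_download(const sim_ctx_t *ctx, float *dst)
{
    memcpy(dst, ctx->out, ctx->n * sizeof *dst);
}

/* Destroy the context and release everything it owns. */
void sim_destroy(sim_ctx_t *ctx)
{
    free(ctx->in);
    free(ctx->out);
    ctx->in = ctx->out = NULL;
    ctx->n = 0;
}
```

A caller would invoke these in the order create, upload, compute, download, destroy; several compute steps can run between upload and download while the data stays inside the context.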