libgpcogl - theory of operation

GPU rendering

A modern GPU uses user-provided fragment shader programs to calculate pixel colors while rendering polygon fragments. The input of rendering is a set of (typically two-dimensional) textures and the output is the pixels of a frame buffer. A pixel value, both in a texture and in the output frame buffer, is normally a tuple of floating point or integer values (e.g. 2 values for RG or 4 values for RGBA). Compiled shader programs, frame buffers and textures are stored in GPU memory.

It is possible to craft a setup where the output frame buffer is a two-dimensional rectangle onto which a same-sized three-dimensional quad is rendered in a 1:1 "top view" transformation. In this case rendering the output runs the shader program once for each pixel of the frame buffer. The shader program has access to:

The output frame buffer does not need to be tied to a screen; it can be mapped onto a texture. This means the shader program reads input textures from GPU memory and writes its output into a texture in GPU memory. The output texture can be reused as an input texture for a subsequent shader program. Textures can be transferred between CPU memory and GPU memory at any time.
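The render-to-texture scheme above can be emulated on the CPU to make the data flow concrete. The following is a minimal sketch in plain C, not gpcogl or GPU code: grid_t, cell_fn, run_pass and the two example "shaders" are hypothetical names invented for illustration. It shows a per-cell function being run once for every output cell, and the output of one pass being fed in as the input of the next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a texture: a width x height grid of floats. */
typedef struct {
    size_t width, height;
    float *cells;               /* row-major, width * height entries */
} grid_t;

/* Hypothetical "shader": computes one output cell from the input grid. */
typedef float (*cell_fn)(const grid_t *in, size_t x, size_t y);

/* Emulates one render pass: the shader runs exactly once per output cell. */
void run_pass(const grid_t *in, grid_t *out, cell_fn shader)
{
    assert(in->width == out->width && in->height == out->height);
    for (size_t y = 0; y < out->height; y++)
        for (size_t x = 0; x < out->width; x++)
            out->cells[y * out->width + x] = shader(in, x, y);
}

/* Example shader: doubles the input cell at the same position. */
float shader_double(const grid_t *in, size_t x, size_t y)
{
    return 2.0f * in->cells[y * in->width + x];
}

/* Example shader: adds one to the input cell at the same position. */
float shader_add_one(const grid_t *in, size_t x, size_t y)
{
    return in->cells[y * in->width + x] + 1.0f;
}
```

Calling run_pass() with shader_double and then calling it again with shader_add_one, using the first output grid as the second input, mirrors how an output texture is reused as an input texture for a subsequent shader program. On a real GPU the two loops are replaced by parallel hardware and no cell order is guaranteed.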

Array abstraction

For gpcogl this is all abstracted one step further:

Massive parallelism

The GPU has many shaders: tiny processor cores with high-speed access to GPU memory, capable of running shader programs. Shaders work in parallel. The number of shaders is specific to the GPU model, but typically varies between a few dozen and several hundred.

When a gpcogl compute operation is started, the GPU splits up the output array among all available shaders and runs them in parallel. This means that how each shader instance executes the shader program (how each single output cell is computed) is well defined, but the order in which the output cells are calculated, and how many of them are being worked on in parallel at any given time, is unknown.
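One way to see what an unknown evaluation order implies is to emulate it on the CPU. The sketch below is plain C, not gpcogl code, and every name in it is hypothetical. The per-cell function reads only from the input array and produces only its own output cell, so visiting the cells in ascending or descending order yields the same output array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cell program: reads only the input array and
   returns the value of output cell i (sum of cell i and its successor,
   wrapping around at the end). */
int cell_sum_with_next(const int *in, size_t n, size_t i)
{
    return in[i] + in[(i + 1) % n];
}

/* Visits the output cells in ascending order. */
void compute_forward(const int *in, int *out, size_t n)
{
    assert(in != out);          /* input and output must be separate arrays */
    for (size_t i = 0; i < n; i++)
        out[i] = cell_sum_with_next(in, n, i);
}

/* Visits the output cells in descending order. */
void compute_backward(const int *in, int *out, size_t n)
{
    assert(in != out);
    for (size_t i = n; i-- > 0; )
        out[i] = cell_sum_with_next(in, n, i);
}
```

For the input {1, 2, 3, 4} both functions produce {3, 5, 7, 5}. If the per-cell function instead read cells of the output array that other instances write, the result would depend on the visiting order, which is exactly what cannot be relied on when the GPU schedules the work.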

This has consequences for how shader programs should be designed. A shader program should:

Global constants

It is possible to use global constants to communicate parameters from the C program to shader programs. In the shader these look like read-only variables. They are set once in C, passed in with the compute() call, and cannot be changed during shader execution. They are ideal for communicating discrete configuration parameters (scalars or vectors of length 2..4) to shaders.
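The read-only nature of global constants can be illustrated with a small CPU-side analogy. This is a sketch in plain C and does not use the gpcogl API: params_t, cell_scale and compute_scaled are hypothetical names. The parameter block is filled in once by the caller and handed to every per-cell invocation through a const pointer, so no invocation can modify it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter block standing in for shader global constants. */
typedef struct {
    float gain;                 /* a scalar parameter */
    float offset[2];            /* a vector parameter of length 2 */
} params_t;

/* Per-cell program: the const pointer models the fact that the
   parameters are read-only while the computation runs. */
float cell_scale(const params_t *p, const float *in, size_t i)
{
    return p->gain * in[i] + p->offset[i % 2];
}

/* Runs the per-cell program over the whole array with one fixed
   parameter block, as a stand-in for a single compute call. */
void compute_scaled(const params_t *p, const float *in, float *out, size_t n)
{
    assert(p != NULL && in != out);
    for (size_t i = 0; i < n; i++)
        out[i] = cell_scale(p, in, i);
}
```

With gain 2.0 and offset {0.5, 1.0}, the input {1, 2, 3} yields {2.5, 5.0, 6.5}. Changing a parameter means filling in a new params_t and starting another computation, which matches the "set once, constant during execution" behavior described above.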

gpcogl context

A gpcogl context (C type: gpcogl_t, which is a struct) holds all state for a computational context:

The user can create multiple independent contexts that exist in parallel. Depending on the GLI used, these contexts can use the same or different GPU hardware (if the host computer has multiple GPUs installed). Contexts can be created and discarded at any time during execution.

Sequence of calls

The typical structure of a computation carried out using libgpcogl is:

  1. initialize the GLI
  2. create a gpcogl context
  3. compile one or more shader programs from source (within the gpcogl context)
  4. upload input arrays from CPU to GPU (within the gpcogl context)
  5. create output and auxiliary arrays in the GPU (within the gpcogl context)
  6. perform one or more computations using the shader programs compiled above (within the gpcogl context), using the arrays already in GPU memory
  7. download results from GPU memory arrays to CPU memory (within the gpcogl context)
  8. destroy the gpcogl context
  9. uninitialize the GLI
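
The ordering of these steps can be sketched with a CPU-only mock. Nothing below is the libgpcogl API: mock_ctx_t and all mock_* functions are hypothetical stand-ins that keep their "GPU arrays" in ordinary heap memory. GLI initialization, shader compilation and GLI teardown (steps 1, 3 and 9) have no counterpart in the mock and are omitted; a plain C function pointer plays the role of the compiled shader program.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical context: owns every array that belongs to one computation. */
typedef struct {
    float *input;               /* stands in for an uploaded input array */
    float *output;              /* stands in for an output array on the GPU */
    size_t len;
} mock_ctx_t;

/* Step 2: create an empty context. Returns 0 on success. */
int mock_ctx_init(mock_ctx_t *ctx)
{
    memset(ctx, 0, sizeof *ctx);
    return 0;
}

/* Steps 4 and 5: copy the input in and allocate the output array. */
int mock_upload(mock_ctx_t *ctx, const float *src, size_t len)
{
    assert(len > 0);
    ctx->input = malloc(len * sizeof *src);
    ctx->output = calloc(len, sizeof *src);
    if (ctx->input == NULL || ctx->output == NULL)
        return -1;              /* caller is expected to destroy the context */
    memcpy(ctx->input, src, len * sizeof *src);
    ctx->len = len;
    return 0;
}

/* Step 6: run the per-cell function once for every output cell. */
void mock_compute(mock_ctx_t *ctx, float (*cell)(float))
{
    for (size_t i = 0; i < ctx->len; i++)
        ctx->output[i] = cell(ctx->input[i]);
}

/* Step 7: copy the results back to caller-owned memory. */
void mock_download(const mock_ctx_t *ctx, float *dst)
{
    memcpy(dst, ctx->output, ctx->len * sizeof *dst);
}

/* Step 8: release everything the context owns. */
void mock_ctx_destroy(mock_ctx_t *ctx)
{
    free(ctx->input);
    free(ctx->output);
    memset(ctx, 0, sizeof *ctx);
}

/* Example per-cell function. */
float square(float v)
{
    return v * v;
}

/* Runs steps 2, 4, 5, 6, 7 and 8 in order. Returns 0 on success. */
int run_square_pipeline(const float *src, float *dst, size_t len)
{
    mock_ctx_t ctx;
    if (mock_ctx_init(&ctx) != 0)
        return -1;
    if (mock_upload(&ctx, src, len) != 0) {
        mock_ctx_destroy(&ctx);
        return -1;
    }
    mock_compute(&ctx, square);
    mock_download(&ctx, dst);
    mock_ctx_destroy(&ctx);
    return 0;
}
```

The point of the sketch is the lifetime discipline: every array lives inside a context, results are copied out before the context is destroyed, and destroying the context releases everything created within it.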