ExPRESS - Benchmarks - Matrix Multiplication

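      /* Column-major 4x4 storage: element (row,col) is stored at index col*4 + row. */
      #define A(row,col)  a[(col<<2)+row]
      #define B(row,col)  b[(col<<2)+row]
      #define P(row,col)  product[(col<<2)+row]

      /* Loop body, run for each row i = 0..3: computes row i of product = a * b.
         Row i of a is loaded once and reused by all four independent outputs. */
      GLfloat ai0=A(i,0),  ai1=A(i,1),  ai2=A(i,2),  ai3=A(i,3);
      P(i,0) = ai0 * B(0,0) + ai1 * B(1,0) + ai2 * B(2,0) + ai3 * B(3,0);
      P(i,1) = ai0 * B(0,1) + ai1 * B(1,1) + ai2 * B(2,1) + ai3 * B(3,1);
      P(i,2) = ai0 * B(0,2) + ai1 * B(1,2) + ai2 * B(2,2) + ai3 * B(3,2);
      P(i,3) = ai0 * B(0,3) + ai1 * B(1,3) + ai2 * B(2,3) + ai3 * B(3,3);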

Number of nodes                      109
Number of edges                      116
Avg. edges per node                  1.06
Critical path length                 9
Parallelism (nodes/critical path)    12.11
Source application: MESA
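The statistics above can be reproduced from a data-flow graph's edge list. Below is a minimal sketch in C, assuming nodes are numbered in topological order (every edge runs from a lower to a higher number) and that the critical path length counts the nodes on the longest path; the page does not say whether it counts nodes or edges. The helper names `critical_path_length`, `parallelism`, and `avg_edges_per_node` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 256

/* A directed edge src -> dst in a data-flow graph. */
typedef struct { int src, dst; } Edge;

/* Number of nodes on the longest path through a DAG whose nodes are
   numbered 0..n_nodes-1 in topological order (every edge has src < dst).
   depth[v] holds the length of the longest path ending at node v. */
static int critical_path_length(int n_nodes, const Edge *edges, size_t n_edges)
{
    int depth[MAX_NODES];
    int longest = 0;
    assert(n_nodes > 0 && n_nodes <= MAX_NODES);
    for (int v = 0; v < n_nodes; v++) {
        depth[v] = 1;                          /* a node alone is a path of length 1 */
        for (size_t e = 0; e < n_edges; e++) {
            if (edges[e].dst != v)
                continue;
            int u = edges[e].src;
            assert(u >= 0 && u < v);           /* topological numbering required */
            if (depth[u] + 1 > depth[v])
                depth[v] = depth[u] + 1;       /* extend the longest path into v */
        }
        if (depth[v] > longest)
            longest = depth[v];
    }
    return longest;
}

/* Parallelism as defined in the table: nodes / critical path length. */
static double parallelism(int n_nodes, int critical_path)
{
    return (double)n_nodes / critical_path;
}

/* Average number of edges per node. */
static double avg_edges_per_node(int n_nodes, size_t n_edges)
{
    return (double)n_edges / n_nodes;
}
```

For example, a hypothetical 16-node graph for a single output element P(i,0) (eight loads, four multiplications, three chained additions, one store) has a critical path of 6 nodes (load, multiply, three additions, store) and a parallelism of 16/6, about 2.67. Applied to the benchmark's totals, parallelism(109, 9) gives 12.11 and avg_edges_per_node(109, 116) gives 1.06, matching the table.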

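The matrix multiplication function was selected as a benchmark because matrix operations are abundant in DSP applications. Taken from the MESA application, the function multiplies two 4x4 matrices (a and b) and stores the result in the product matrix; the code above is the body of its loop over rows and computes row i of the product. The 16 output elements are independent of one another, so the only data dependencies lie inside each element's dot product (four multiplications feeding a chain of additions). This makes the function a strong candidate for parallel execution: its data-flow graph has 109 nodes but a critical path of only 9.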
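For reference, here is a minimal, self-contained sketch of the complete function, assuming the snippet above is the body of a loop over rows i = 0..3. The name `matmul4`, its parameter order, and the `typedef float GLfloat` stand-in for the OpenGL headers are assumptions, since the full C function is not reproduced here.

```c
#include <assert.h>

/* Assumed stand-in for the OpenGL type (GLfloat is a 32-bit float). */
typedef float GLfloat;

/* Column-major element access, as in the benchmark snippet:
   element (row,col) lives at index col*4 + row. */
#define A(row,col)  a[(col<<2)+row]
#define B(row,col)  b[(col<<2)+row]
#define P(row,col)  product[(col<<2)+row]

/* Hypothetical wrapper: product = a * b for 4x4 column-major matrices.
   product may alias a, but must not alias b. */
static void matmul4(GLfloat *product, const GLfloat *a, const GLfloat *b)
{
   assert(product != b);
   for (int i = 0; i < 4; i++) {
      /* Cache row i of a before row i of product is written. */
      const GLfloat ai0=A(i,0),  ai1=A(i,1),  ai2=A(i,2),  ai3=A(i,3);
      /* The four outputs of row i are independent of one another. */
      P(i,0) = ai0 * B(0,0) + ai1 * B(1,0) + ai2 * B(2,0) + ai3 * B(3,0);
      P(i,1) = ai0 * B(0,1) + ai1 * B(1,1) + ai2 * B(2,1) + ai3 * B(3,1);
      P(i,2) = ai0 * B(0,2) + ai1 * B(1,2) + ai2 * B(2,2) + ai3 * B(3,2);
      P(i,3) = ai0 * B(0,3) + ai1 * B(1,3) + ai2 * B(2,3) + ai3 * B(3,3);
   }
}

#undef A
#undef B
#undef P
```

Because ai0..ai3 cache row i of a before row i of product is written, calling matmul4(m, m, b) for an in-place product is safe in this sketch. Passing the same buffer as product and b is not, since later rows would read elements of b that have already been overwritten.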
