#define A(row,col) a[(col<<2)+row] #define B(row,col) b[(col<<2)+row] #define P(row,col) product[(col<<2)+row] GLfloat ai0=A(i,0), ai1=A(i,1), ai2=A(i,2), ai3=A(i,3); P(i,0) = ai0 * B(0,0) + ai1 * B(1,0) + ai2 * B(2,0) + ai3 * B(3,0); P(i,1) = ai0 * B(0,1) + ai1 * B(1,1) + ai2 * B(2,1) + ai3 * B(3,1); P(i,2) = ai0 * B(0,2) + ai1 * B(1,2) + ai2 * B(2,2) + ai3 * B(3,2); P(i,3) = ai0 * B(0,3) + ai1 * B(1,3) + ai2 * B(2,3) + ai3 * B(3,3); | ||||||||||||||||
|
The matrix multiplication function was selected as a benchmark because of the abundance of matrix operations in DSP applications. The function multiplies two 4x4 matricies (a and b) and stores the result in a product matrix. Since there is very little data dependency, this function is the perfect candidate for parallelism.