In this second lecture on the solution of dense linear systems, we first discuss block algorithms, which are designed to reduce the number of memory accesses. Built on the Level 3 BLAS, these algorithms form the basis of efficient linear system solvers such as those implemented in LAPACK. We then discuss the distributed-memory implementation of such a solver, in particular how to lay out matrices across processors. Next, we look at the LINPACK benchmark and its role in performance evaluation. Finally, we discuss the use of faster algorithms, such as Strassen's algorithm, and conclude with a look at the Tflops performance result.
Jim Demmel's Lecture Notes: Lecture 02, 2/29/96: Design and Implementation of LAPACK and ScaLAPACK.
Numerical Linear Algebra chapter of the Computational Science Education Project.
DOE's "Ultra" Computer Reaches 1 Trillion Operations Per Second Milestone