CS267 / Eng233 / IDS267 Applications of Parallel Computers

Lecture 10: Large Dense Linear Systems - Distributed Implementation


Abstract

 

In this second lecture on the solution of dense linear systems we will first discuss block algorithms, which have been designed to reduce memory traffic. Built on the Level 3 BLAS, they form the basis for efficient linear system solvers as implemented in LAPACK. We will then discuss the distributed memory implementation of such a solver, in particular how to lay out matrices across processors. Next we will take a look at the LINPACK benchmark and its role in performance evaluation. Finally, we will discuss the use of faster algorithms, such as Strassen's algorithm, and conclude with a look at the Tflops performance result.
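As a small illustration of the matrix layout question, here is a minimal sketch of a two-dimensional block-cyclic mapping of the kind used by ScaLAPACK-style solvers. It is not taken from the lecture notes: the function name block_cyclic_owner and its parameters (block sizes mb and nb, a process grid of prow by pcol, zero-based indices, distribution starting at process (0, 0)) are assumptions made for this example.

```python
def block_cyclic_owner(i, j, mb, nb, prow, pcol):
    """Map global entry (i, j) to (process row, process col, local row, local col).

    Hypothetical helper: assumes zero-based indices, mb x nb blocks, and a
    prow x pcol process grid with the first block on process (0, 0).
    """
    bi, bj = i // mb, j // nb            # global block row / block column
    p, q = bi % prow, bj % pcol          # blocks are dealt out cyclically
    li = (bi // prow) * mb + i % mb      # row index inside the local array
    lj = (bj // pcol) * nb + j % nb      # column index inside the local array
    return p, q, li, lj


if __name__ == "__main__":
    # Show which process owns each entry of an 8 x 8 matrix
    # with 2 x 2 blocks on a 2 x 3 process grid.
    for i in range(8):
        print(" ".join("%d%d" % block_cyclic_owner(i, j, 2, 2, 2, 3)[:2]
                       for j in range(8)))
```

Because consecutive blocks go to different processes, every process keeps a share of the trailing submatrix as the elimination proceeds, which is the usual argument for preferring this layout over a plain block layout.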

 

Reading

Jim Demmel's Lecture Notes: Lecture 02, 2/29/96: Design and Implementation of LAPACK and ScaLAPACK.

Lecture 14 (part 1), 1/18/96: Designing fast linear algebra kernels in the presence of memory hierarchies.

Numerical Linear Algebra chapter of the Computational Science Education Project.

 

Postscript

PDF

 

Useful links

LINPACK Benchmark
LAPACK -- Linear Algebra PACKage
ScaLAPACK

DOE's "Ultra" Computer Reaches 1 Trillion Operations Per Second Milestone

MP Linpack Teraflop Info