We highlight the trends leading to the increased appeal of using hybrid
multicore+GPU systems for high performance computing. We present a set of
techniques that can be used to develop efficient dense linear algebra
algorithms for these systems. We illustrate the main ideas with the
development of a hybrid LU factorization algorithm where we split the
computation over a multicore and a graphics processor, and use particular
techniques to reduce the amount of pivoting and communication between the
hybrid components.
We also show how mixed precision algorithms can be used to accelerate
performance.
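
To make the mixed precision idea concrete, the following is a minimal sketch of one common mixed precision scheme, iterative refinement: the LU factorization and triangular solves are carried out in single precision, while the residual and the solution update are kept in double precision. That this is the particular scheme intended here is an assumption, as are all function names (`f32`, `lu_factor_f32`, `lu_solve_f32`, `solve_mixed`), which are hypothetical. The sketch runs on the CPU in pure Python and only emulates single precision by rounding through `struct`; it does not model the multicore+GPU split or the reduced-pivoting techniques.

```python
import struct


def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a):
    """LU factorization with partial pivoting; every operation is rounded to float32."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))  # perm[k] = original index of the row now at position k
    for k in range(n):
        # Choose the largest entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])  # multiplier, stored in place of L
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_f32(lu, perm, b):
    """Solve A x = b from the float32 factors (forward, then backward substitution)."""
    n = len(lu)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):  # unit lower-triangular solve
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):  # upper-triangular solve
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y


def solve_mixed(a, b, max_iter=10, tol=1e-14):
    """Iterative refinement: float32 factorization, double-precision residuals."""
    lu, perm = lu_factor_f32(a)  # the expensive O(n^3) step, in low precision
    x = lu_solve_f32(lu, perm, b)
    b_norm = max(abs(v) for v in b)
    for _ in range(max_iter):
        # Residual r = b - A x, computed in double precision.
        r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]
        if max(abs(v) for v in r) <= tol * b_norm:
            break
        d = lu_solve_f32(lu, perm, r)  # cheap O(n^2) correction, low precision
        x = [xi + di for xi, di in zip(x, d)]  # update accumulated in double
    return x


if __name__ == "__main__":
    a = [[4.0, -2.0, 1.0], [3.0, 6.0, -4.0], [2.0, 1.0, 8.0]]
    b = [3.0, 3.0, 28.0]  # chosen so that the exact solution is [1, 2, 3]
    print(solve_mixed(a, b))
```

The design point is that the cubic-cost factorization is done once in the faster, lower precision, and each refinement step costs only a matrix-vector product and two triangular solves. The refinement can be expected to converge only when the matrix is reasonably well conditioned relative to single precision.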