Batched operators

#batching #ginkgo

Summary

Batching is the abstract operation of performing many independent operations in parallel. Batched operators occur in many applications, and a carefully tuned batched implementation can provide significant performance benefits.

The design

The objectives of the batched operators are to:

  1. Solve many (a few thousand to tens of thousands of) sparse linear systems that share the same sparsity pattern.
  2. Maximize throughput by exploiting the complete independence of these systems, solving as many of them concurrently as possible.
  3. Allow each system to converge independently, enabling better scheduling of the batch entries on the multiprocessors.
  4. Allow easy integration of preconditioners and of various matrix formats.

We accomplish this in the Ginkgo numerical linear algebra library. Examples of how to use this functionality are available:

  1. Using existing data to solve batch systems

  2. Reading batch data from files to solve batch systems

Performance and analysis

The performance depends on our ability to minimize memory movement, as the algorithm is mainly memory bound. We would like to:

  1. Keep data that is read only once (the matrix data and the right-hand side) in the L1 cache.
  2. Keep data that is both read and written (the auxiliary vectors used in the solver and the solution vector) in shared memory.

You can refer to our paper for more details and analyses of the batch solvers.