summaryrefslogtreecommitdiff
path: root/math/cosma/pkg-descr
blob: bb85be81af81545b14a3bdf0ae9fd6c15f843f6e (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
COSMA is a parallel, high-performance, GPU-accelerated, matrix-matrix
multiplication algorithm that is communication-optimal for all
combinations of matrix dimensions, number of processors and memory
sizes, without the need for any parameter tuning. The key idea behind
COSMA is to first derive a tight optimal sequential schedule and only
then parallelize it, preserving I/O optimality between processes. This
stands in contrast with the 2D and 3D algorithms, which fix process
domain decomposition upfront and then map it to the matrix dimensions,
which may result in asymptotically more communication. The final
design of COSMA facilitates the overlap of computation and
communication, ensuring speedups and applicability of modern
mechanisms such as RDMA. COSMA allows to not utilize some processors
in order to optimize the processor grid, which reduces the
communication volume even further and increases the computation volume
per processor.