This package is a thin Julia wrapper for `cusolverRF`. It extends CUSOLVER.jl and is compatible with the rest of the Julia CUDA ecosystem.
`cusolverRF` is a library to perform fast sparse refactorization on CUDA GPUs. The LU factorization of a sparse matrix usually happens in two steps.

- During the symbolic factorization, the matrix is reordered to minimize the fill-in, and the sparsity patterns of the `L` and `U` factors are computed.
- During the numerical factorization, the coefficients of the factors `L` and `U` are computed.
The first step is challenging to compute on the GPU, as most sparse matrices have unstructured sparsity patterns. cusolver implements a default LU factorization, but under the hood it transfers the sparse matrix to the host to perform the symbolic factorization. This impairs performance when the same matrix has to be factorized multiple times.
`cusolverRF` uses a different approach: it computes the symbolic factorization on the CPU and then transfers it to the device. If the coefficients of the sparse matrix are updated without affecting its sparsity pattern, there is no need to recompute the symbolic factorization, and the numerical factorization can be computed entirely on the device, without any data transfer between the host and the device.
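The same refactorization pattern exists in CPU sparse LU libraries. As a point of comparison, here is a minimal sketch using the UMFPACK wrapper shipped with Julia's SparseArrays, where `lu!` reuses a previous symbolic analysis (this example is for illustration only and does not involve the GPU or cusolverRF):

```julia
using SparseArrays, LinearAlgebra

n = 100
A = sprand(n, n, 0.05) + I

# First factorization: symbolic analysis + numerical factorization.
F = lu(A)

# Update the nonzero values, keeping the sparsity pattern fixed.
A.nzval .*= 2.0

# Refactorize in place: the symbolic analysis stored in F is reused,
# only the numerical factorization is recomputed.
lu!(F, A)

b = rand(n)
x = F \ b
@assert norm(A * x - b) < 1e-8
```

`cusolverRF` applies the same idea, except that the numerical refactorization runs on the device.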
Hence, `cusolverRF` is very efficient when the same sparse matrix has to be factorized multiple times. The package is also efficient for solving a sparse linear system with multiple right-hand sides entirely on the GPU.
Suppose we have a sparse matrix `dA` instantiated on the GPU:

```julia
using SparseArrays, CUDA, LinearAlgebra
using CUDA.CUSPARSE
# Generate a random example
n = 10
A = sprand(n, n, 0.2) + I
# Move A to the GPU
dA = CuSparseMatrixCSR(A)
```
Computing the LU factorization of the sparse matrix `dA` with `cusolverRF` simply amounts to:

```julia
using CUSOLVERRF
rf = CUSOLVERRF.RFLU(dA; symbolic=:RF)
```
In this step, the matrix is moved back to the host to compute the symbolic factorization. The factors `L` and `U` are then transferred to the device, and can be accessed as a matrix `M = L + U`:

```julia
rf.M
```
By default (`symbolic=:RF`), the symbolic factorization is computed using cusolver's internal routines, which can be inefficient. As an alternative, CUSOLVERRF.jl allows computing the symbolic factorization with KLU (`symbolic=:KLU`). However, this second option is still experimental.
Then, the solution of the linear system is computed as:

```julia
b = rand(n)      # random RHS
db = CuVector(b) # move RHS to the device
ldiv!(rf, db)    # compute the solution in place
```
Suppose now we change the coefficients of the matrix `dA`, without modifying its sparsity pattern. Then, the new system can be solved entirely on the device as:

```julia
# update coefficients (use whichever update you want here)
dA.nzVal .*= 2.0
# update the factorization on the device
lu!(rf, dA)
# re-solve Ax = b
copyto!(db, b)
ldiv!(rf, db)
```
Any `RFLU` instance stores various buffers to avoid unnecessary allocations when calling `lu!` and `ldiv!`. To solve a system with `k` right-hand sides, one should instantiate `RFLU` with the proper buffers, as:

```julia
k = 64
rf_batched = CUSOLVERRF.RFLU(dA; nrhs=k)
```
Then, the system is solved as:

```julia
B = rand(n, k)
dX = CuMatrix(B)
# Compute the solution in place
ldiv!(rf_batched, dX)
```
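For comparison, the analogous multi-right-hand-side solve on the CPU needs no special setup: UMFPACK factorizations accept a matrix right-hand side directly, reusing a single factorization for all columns (again for illustration only, no GPU involved):

```julia
using SparseArrays, LinearAlgebra

n, k = 100, 8
A = sprand(n, n, 0.05) + I
F = lu(A)      # single factorization
B = rand(n, k)
X = F \ B      # solves the k systems A * X[:, j] = B[:, j]
@assert norm(A * X - B) < 1e-8
```

The `nrhs` keyword of `RFLU` plays the role of pre-sizing the buffers that such a batched triangular solve requires on the device.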
Sometimes, one prefers to have full control over the factorization, without any boilerplate code. CUSOLVERRF provides a direct interface to `cusolverRF` for this purpose. In that case, a `cusolverRF` instance is instantiated as:

```julia
rf_lowlevel = CUSOLVERRF.RFLowLevel(
    dA; # matrix to factorize
    fast_mode=true, # fast mode is recommended
    ordering=:AMD, # :AMD, :MDQ, :METIS or :RCM
    check=true,
    factorization_algo=CUSOLVERRF.CUSOLVERRF_FACTORIZATION_ALG0,
    triangular_algo=CUSOLVERRF.CUSOLVERRF_TRIANGULAR_SOLVE_ALG1,
)
```
Once the matrix is factorized, the linear system is solved as:

```julia
b = rand(n)
dx = CuVector(b)
CUSOLVERRF.rf_solve!(rf_lowlevel, dx)
```
The matrix can be refactorized in place on the device as:

```julia
dA.nzVal .*= 2.0
CUSOLVERRF.rf_refactor!(rf_lowlevel, dA)
```
This package was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative.