Hacker News new | ask | show | jobs
by Bimos 41 days ago
> I'm not sure about whether this is a bottlenecking step in applications

In barrier method of solving linear programs, numeric factorization normally takes about 1/3 of the time of main loop (like after all symbolic/presolve steps).

> That is, is there a (sparse) matrix representation which is used in gpu's?

We (COPT) did manage to write a gpu version of Cholesky factorization, and it's faster than our (pretty fast) cpu implementation. Although one can argue that the CPUs for benchmarking is not the fastest, or it is not as expensive as the GPUs.

Currently our barrier implementation is 67% faster on H200 than on i7-11700K [1]. Not all of the improvement contributes to Cholesky factorization.

Another GPU Cholesky factorization implementation we are aware of is CUDSS by nvidia.

> And does it make sense to carry through the dag/tree construction as a sort of "prep" step (on cpu or gpu)?

For both CPU and GPU version, a symbolic phase is necessary, as we usually expect many factorizations on matrices with the same sparse pattern. CUDSS has (at least some part of) their symbolic phase on GPU.

[1] https://plato.asu.edu/ftp/lpfeas.html