Hacker News new | ask | show | jobs
by bee_rider 781 days ago
If you are going to include vs MKL benchmarks in your repo, full pivoting LU might be one to consider. I think most people are happy with partial pivoting, so I sorta suspect Intel hasn’t heavily tuned their implementation, might be room to beat up on the king of the hill, haha.
1 comments

funny you mention that, the full pivoting. it's one of the few benchmarks where faer wins by a huge margin

n faer mkl openblas

1024 27.06 ms 186.33 ms 793.26 ms

1536 73.57 ms 605.71 ms 2.65 s

2048 280.74 ms 1.53 s 8.99 s

2560 867.15 ms 3.31 s 17.06 s

3072 1.87 s 6.13 s 55.21 s

3584 3.42 s 10.18 s 71.56 s

4096 6.11 s 15.70 s 168.88 s

it might be interesting to add butterfly lu https://arxiv.org/abs/1901.11371. it's a way of doing a numerically stable lu like factorization without any pivoting, which allows it to parallelize better.
It looks like they are describing a preconditioner there.
the key point is that the preconditioner allows you to skip pivoting which is really nice because the pivoting introduces a lot of data dependence.
looks interesting! thanks for sharing