|
|
|
|
|
by marmaduke
989 days ago
|
|
Because they use a funny format (BCOO). I'm not mocking, it must be a solid choice for some reasons, like sparsification or other fancy stuff. But for large and even with batches (ie multiply with tall dense matrix), it doesn't match an equivalent scatter (x.at[idx].add(vals)). Which itself is several times slow than equivalent opencl (on an A40) |
|