|
|
|
|
|
by bee_rider
205 days ago
|
|
Do low rank/block diagonal matrices come up in LLMs often? What about banded or block tridiagonal? Intuitively banded matrices seem like they ought to be good at encoding things about the world… everything is connected but not randomly so. |
|
I haven't seen banded matrices as much, though (with weight sharing) they're just convolutions. One nice feature of block diagonality is that you can express it as batched matrix multiplication, reusing all the existing matmul kernels.