Hacker News new | ask | show | jobs
by financltravsty 938 days ago
Parallelization.

Each "unit of work" in matrix multiplication is not dependent on any other unit of work. Stuff as many cores as you can into a chip, and then simply feed in all your vectors at the same time.

I.e. basically a beefed up GPU or an "AI" chip.

1 comments

A million element square matrix is a lot of data. To process that in a second is much more bandwidth than a single socket can support, so you'll need many sockets too.