| HN Mirror


by electricships 981 days ago

here is my good deed for the day:

modern AI is just vector multiplication. any AI chip is just tens of thousands of very simple cores which can do vector float operations and little else. this also entails clever trade-offs of shared cache and internal bandwidth.

(as a thought experiment, consider a naive million by million matrix multiplication. this will take a single cpu about 1 year! how do we reduce this to 1s?)
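A rough check of the thought experiment's numbers, as a sketch only. The sustained CPU rate (`cpu_rate`) is an assumption picked for illustration and does not come from the post.

```python
# Back-of-envelope cost of a naive n x n matrix multiply.
n = 1_000_000

# The naive triple loop does n multiply-adds for each of the n*n outputs.
multiply_adds = n ** 3            # 10**18
flops = 2 * multiply_adds         # counting the multiply and the add separately

# ASSUMPTION: one CPU sustaining about 3e10 multiply-adds per second.
cpu_rate = 3e10
seconds_per_year = 365 * 24 * 3600

years_on_cpu = multiply_adds / cpu_rate / seconds_per_year
rate_needed_for_1s = flops / 1.0  # FLOP/s required to finish in one second

print(f"multiply-adds: {multiply_adds:.1e}")
print(f"single CPU:    {years_on_cpu:.2f} years")
print(f"needed for 1s: {rate_needed_for_1s:.1e} FLOP/s")
```

Under that assumed rate the one-year figure is about right, and finishing in one second would take on the order of 10^18 floating point operations per second.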

the end

4 comments

Symmetry 981 days ago

Nowadays AI chips are specialized in not just vector multiplication but matrix multiplication. Just as moving from scalar math to vectors brings savings in control and routing logic, moving from vector to matrix does the same. Taking a result from a floating point unit, moving it to a big, multi-ported register file, and then reading it out again to feed into another floating point unit is often a much bigger draw of power than the multiplication or addition itself. To the extent you can minimize that by feeding the results of one operation directly into the processing structures for the next, you've got a big win.
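The register-file point can be sketched with a toy counting model. The access counts below are a simplification assumed for illustration, not measurements of any real chip, and the function names are hypothetical.

```python
# Toy model: register-file accesses for one n x n tile product C = A @ B.

def scalar_fma_traffic(n: int) -> int:
    # Each of the n**3 multiply-adds reads a, b and the accumulator from the
    # register file (3 reads) and writes the accumulator back (1 write).
    return n ** 3 * (3 + 1)

def matrix_unit_traffic(n: int) -> int:
    # A matrix unit reads both input tiles once and writes the result once;
    # partial sums pass between multiply-add cells, not through registers.
    return 2 * n * n + n * n

for n in (4, 16, 64):
    ratio = scalar_fma_traffic(n) / matrix_unit_traffic(n)
    print(f"n={n:3d}  scalar={scalar_fma_traffic(n):8d}  "
          f"matrix={matrix_unit_traffic(n):6d}  ratio={ratio:.1f}x")
```

In this model the saving grows linearly with the tile size, which is one way to read the "vector to matrix" step described above.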


automatic6131 981 days ago

>vector float operations and little else

I thought they were generally int8 or int16 vector multiply-adds, with float16 occasionally added in.
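For readers unfamiliar with integer multiply-adds, here is a minimal sketch of an int8 dot product with a wider accumulator. The single-scale quantization scheme, the 32-bit accumulator width, and the helper names are assumptions made for illustration.

```python
def quantize(xs, scale):
    # Map floats to the int8 range [-128, 127] using one scale factor.
    return [max(-128, min(127, round(x / scale))) for x in xs]

def int8_dot(a_q, b_q):
    # The product of two int8 values fits in 16 bits; the running sum is kept
    # in a wider accumulator (assumed 32-bit here) so it does not overflow.
    acc = 0
    for a, b in zip(a_q, b_q):
        acc += a * b
    assert -2**31 <= acc < 2**31, "would overflow a 32-bit accumulator"
    return acc

a = [0.5, -1.25, 2.0, 0.75]
b = [1.0, 0.25, -0.5, 2.0]
scale_a = scale_b = 2.0 / 127   # hypothetical scales covering [-2, 2]

a_q, b_q = quantize(a, scale_a), quantize(b, scale_b)
approx = int8_dot(a_q, b_q) * scale_a * scale_b   # rescale back to floats
exact = sum(x * y for x, y in zip(a, b))
print(f"exact={exact:.4f}  int8 approx={approx:.4f}")
```

The integer result is only an approximation of the float dot product; the appeal is that the multiply-add itself needs far simpler hardware.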


exikyut 981 days ago

As someone with a lot of interest in but no fluency with chip design, or the dividing and conquering of math within silicon, for that matter, how would you multiply a 1m² matrix?


financltravsty 981 days ago

Parallelization.

Each "unit of work" in matrix multiplication is not dependent on any other unit of work. Stuff as many cores as you can into a chip, and then simply feed in all your vectors at the same time.

I.e. basically a beefed up GPU or an "AI" chip.
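A small sketch of that independence, splitting the output rows across workers. Threads are used only to show that the units of work share no results; in CPython this will not run faster than a plain loop. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul_rows(A, B, row_start, row_end):
    # Compute rows [row_start, row_end) of C = A @ B. This unit of work reads
    # A and B but never reads any other part of C.
    inner, cols = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(row_start, row_end)]

def parallel_matmul(A, B, workers=4):
    n = len(A)
    step = max(1, -(-n // workers))  # ceiling division
    ranges = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of the row ranges in its results.
        blocks = list(pool.map(lambda r: matmul_rows(A, B, *r), ranges))
    return [row for block in blocks for row in block]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(parallel_matmul(A, B))
```

Because no worker waits on another, the same split works across cores, chips, or machines; the hard part is getting the inputs to each of them.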


Symmetry 981 days ago

A million-by-million matrix is a lot of data. To process that in a second is much more bandwidth than a single socket can support, so you'll need many sockets too.
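To put a rough number on the bandwidth, a sketch under two assumptions not stated in the thread: float32 elements, and a per-socket memory bandwidth of about 300 GB/s chosen purely for illustration.

```python
n = 1_000_000
bytes_per_element = 4                      # ASSUMPTION: float32

matrix_bytes = n * n * bytes_per_element   # one dense n x n matrix
# Lower bound on traffic: read A and B once and write C once, in one second.
min_traffic_bytes_per_s = 3 * matrix_bytes

# ASSUMPTION: ~300 GB/s of memory bandwidth per socket (illustrative only).
socket_bandwidth = 300e9
sockets_needed = min_traffic_bytes_per_s / socket_bandwidth

print(f"minimum traffic: {min_traffic_bytes_per_s / 1e12:.0f} TB/s")
print(f"sockets needed:  {sockets_needed:.0f} (at the assumed bandwidth)")
```

This is a floor: it assumes every element is moved exactly once, which a real blocked multiplication will not achieve.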


bigbillheck 981 days ago

> naive million by million matrix multiplication ... how do we reduce this to 1s

A matrix of that size in single precision is 4 TB (32 terabits); a better question is how do you store it?
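The storage arithmetic can be checked directly. This sketch assumes a dense matrix and decimal units (1 TB = 10^12 bytes); `dense_bytes` is a hypothetical helper.

```python
def dense_bytes(n: int, bytes_per_element: int) -> int:
    # A dense n x n matrix stores every one of its n*n entries.
    return n * n * bytes_per_element

n = 1_000_000
for name, width in [("float32", 4), ("float16", 2), ("int8", 1)]:
    total = dense_bytes(n, width)
    print(f"{name:8s} {total / 1e12:3.0f} TB  ({total * 8 / 1e12:3.0f} Tbit)")
```

Single precision comes to 4 TB per matrix, which is 32 terabits, and a multiplication needs room for two inputs and one output.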


ForkMeOnTinder 981 days ago

https://en.wikipedia.org/wiki/Sparse_matrix#Storage


bigbillheck 981 days ago

The original ask specified "naive million by million matrix multiplication"; I don't consider sparse matrices to be "naive".
