Hacker News new | ask | show | jobs
by electricships 934 days ago
here is my good deed for the day:

modern AI is just vector multipication. any AI chip is just 10,000s of very simple cores which can do vector float operations and little else. this also entails clever trade offs of shared cache and internal bandwidth.

(as a thought experiment, consider a naive million by million matrix multipication. this will take a single cpu about 1 year! how do we reduce this to 1s ?)

the end

4 comments

Nowadays AI chips are specialized in not just vector multiplication but matrix multiplication. Just as moving from scalar math to vectors brings savings in control and routing logic, moving from vector to matrix does the same. Taking a result from a floating point unit and moving it to a big, multi-ported register file and then reading it out again to feed into another floating point unit is often a much bigger draw of power than the multiplication or addition itself and to the extent you can minimize that by feeding the results of one operation direction into the processing structures for the next you've got a big win.
>vector float operations and little else

I thought they were generally int8 or int16 vector multiply adds and occasionally float16 added in.

As someone with a lot of interest in but no fluency with chip design, or the dividing and conquering of math within silicon, for that matter, how would you multiply a 1m² matrix?
Parallelization.

Each "unit of work" in matrix multiplication is not dependent on any other unit of work. Stuff as many cores as you can into a chip, and then simply feed in all your vectors at the same time.

I.e. basically a beefed up GPU or an "AI" chip.

A million element square matrix is a lot of data. To process that in a second is much more bandwidth than a single socket can support, so you'll need many sockets too.
> naive million by million matrix multipication....how do we reduce this to 1s

A matrix of that size in single precision is 32TB, a better question is how do you store it?

The original ask specified "naive million by million matrix multipication", I don't consider sparse matrices to be "naive".