Hacker News new | ask | show | jobs
by barfbagginus 819 days ago
This readme does a really poor job of explaining what the improvement is or how they drop half the multiplications. What is the Big O run time on this? Is this shifting the known best bounds?

And the diagrams are chaotic and don't really explain anything about why this approach is fast or good. The result is that I'm reluctant to even click-through to the PDF.

If you want to improve the project credibility please consider being honest and open about what is actually going on and giving some clear explanations and illustrations, rather than things that may as well be designed to hype people too busy to tell you that you are cranks. It's hard to tell if this is incredibly groundbreaking or just but nothingburger. Sadly I almost feel like that must be an intentional decision motivated by poor merits of work and a desire to exploit AI height. The alternative - which I prefer to believe is the case - is that the author simply needs to revise and better contextualize.

4 comments

> What is the Big O run time on this?

The claim is they’re dropping half the multiplications, so it isn’t doing anything for Big O.

> If you want to improve the project credibility please consider being honest and open about what is actually going on and giving some clear explanations and illustrations,

The math explaining how to halve the number of multiplications in the paper (https://arxiv.org/abs/2311.12224) isn’t hard to understand.

You only have to read formulas 2 (traditional matrix multiplication) and 3 to 6.

I think it’s clear it does do what’s being advertised, halving the number of multiplications at the cost of lots of extra additions/subtractions.

They then go on to better vectorize that algorithm. That, as is usual for that, gets looking messy soon.

My main concern would be numerical stability.

Thanks, good summary. Regarding numerical stability, the application is for fixed-point arithmetic, and therefore numerical stability is not an issue (the result is identical compared to using the conventional inner-product)
The readme doesn't explain much, but the introduction to the paper itself is quite approachable.

As for whether it's groundbreaking or not ... it's a neat readily-accessible constant-factor improvement for area-constrained fixed-point accelerators. That doesn't change everything overnight, but neither is it nothing. It's nice work.

> This readme does a really poor job of explaining what the improvement is or how they drop half the multiplications. What is the Big O run time on this? Is this shifting the known best bounds?

Without wishing to sound elitist I, I don't understand the point of this comment at all. If you don't understand Big O notation enough to know that "half the multiplications" doesn't change it then why are you even asking about it?

I made two really bad misunderstandings which I completely own. I can't seem to edit it, so please let this be my refutation and explanation of my own mistake.

First misunderstanding: I assumed this was a new large matrix multiplication algorithm building on the hype from last week or so where we saw this paper: https://arxiv.org/abs/2210.10173

It is not an algorithm, but a hardware design - a systolic array using roughly half of the silicon area of a baseline design.

2. Assuming that we were talking about an algorithm, I then further assumed that it reduce an algorithm's multiplications by half for some important n. I assumed that it did this by accelerating some critical sub procedure in the baseline algorithm into a more efficient big O class without really changing the multiplicative factor. This is a common way to reduce the number of operations of an algorithm for some fixed n. As a consequence I thought that the author must be being sloppy by not telling us the full big o details of the improvement, and just picking some n where it just so happened that half of the multiplications vanish. That also seemed unlikely to be a consequence of a improvement in the bound of matrix multiplication, given how incredibly slow the progress on matrix multiplication bound has been. So I thought that the author might even be a crank.

But it turned out the sloppy thinking was on me. I was being the crank.

Reading the paper's introduction made it very very clear that we were dealing with a systolic array that reduces silicon area per compute.

Even worse that information is there in the first sentence of the readme as well.

Would a clearer sentence have helped me? Something like:

"We introduce a VHDL hardware design for a systolic array that nearly halves the silicon area of a baseline array, by replacing half of the multiplication-accumulate units (MACs) with simple adder units, exploiting Winograd's 1967 fast inner product formula (FIP)."

I'm honestly not sure, given how bad my mistake was to begin with. Not even the diagrams tipped me off - in hind site they are very obviously hardware block diagrams, but I thought that they were just needlessly complicated algorithmic diagrams! How silly!

I still believe that the readme could be simplified for a general audience of goobers like me. But first and foremost I have to admit that I was being a goober!

Does that help you understand my mistake here? I do understand Big O and why cutting operations by half is typically a constant factor improvement. But apparently I don't understand it well enough to prevent me from retconning a narrative with some very stinky assumptions and then projecting them on to the poor innocent hardware designer. Not very proud about that.

It´s actually fairly clear
Not to everyone. If it's clear to you, you could helpfully explain it.
This is an analogy. a^2 - b^2 = aa - bb. This can be factored to (a+b)(a-b). In the first expression there are two multiplies, in the factored version there is only one.

However, from a numerical analysis / accuracy standpoint, evaluating the factored expression can result in loss of precision in the result when a is close to b. This is especially true if you repeatedly and sequentially do a lot of these operations. Loss of precision can be a problem in numeral modeling (like climate simulation) -- long term predictions diverge.

Given that there is a drive to use greatly reduced precision in ML engines, loss of precision might have an effect on how a model performs. Then again, it might not. I haven't read a lot of papers on ML, but I don't recall seeing ones that try to quantify how sensitive a model is to error propagation. (I am making a distinction between tests where the precision is reduced to see where it breaks down v.s. calculating / understanding what the error level actually is in a model)

With LLMs they start showing signs of brain damage once the errors get too high. In my experience it reduces their ability to reason, they stop counting correctly, and they start homogenizing categories, like calling a lemur a monkey. Compare this with quantizing weights, which instead of brain damage leads to ignorance, forcing them to hallucinate more.