by bernaferrari 936 days ago
Even if that's true, the A-series Neural Engine is much worse than the M-series one. I may have the details wrong, but it can only do 32-bit inference (or something like that) where the M series can do 64-bit, so the A series can run LLMs but has a set of limitations that the M series doesn't.

Practically speaking, most models today run inference at 8-bit or 16-bit precision (rarely 32-bit). You don't see an empirical lift from more bits of precision. The size of the memory is far more important.
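
To put rough numbers on the memory point, here is a back-of-the-envelope sketch. The helper name `weight_memory_gib` and the 7-billion-parameter model size are assumptions for illustration, and the estimate counts weights only (no activations or KV cache).

```python
# Hypothetical helper: memory needed just to hold the weights.
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30  # bytes -> GiB

PARAMS_7B = 7_000_000_000  # assumed model size, for illustration only

for bits in (32, 16, 8):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(PARAMS_7B, bits):.1f} GiB")
```

Halving the bits per weight halves the footprint, which is usually what decides whether a model fits on a given device at all.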
If we're talking about the results, is there any reason to think it should make a difference at all?
Sometimes gradients are small but meaningful. If you constrain them to too few bits / degrees of freedom, they can't be backpropagated successfully. This can hamper training and therefore the quality of the results.
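
A minimal sketch of that failure mode, assuming NumPy and using made-up values (`1e-8` for the gradient, `1e-4` for the update) chosen only to show the effect:

```python
import numpy as np

tiny_grad = 1e-8  # assumed small-but-nonzero gradient value

# The same value stored at two precisions.
as_fp16 = np.float16(tiny_grad)  # below the smallest fp16 subnormal, so it flushes to zero
as_fp32 = np.float32(tiny_grad)  # still representable
print("gradient in fp16:", as_fp16, "| in fp32:", as_fp32)

# A small update applied to a weight near 1.0.
weight16 = np.float16(1.0)
update16 = np.float16(1e-4)      # representable on its own in fp16
print("fp16 weight after update:", weight16 + update16)  # update is rounded away
print("fp32 weight after update:", np.float32(1.0) + np.float32(1e-4))
```

In fp16 the gradient becomes exactly zero and the update is lost to rounding, so that weight never moves; in fp32 both survive.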

You can also think about it as compounding errors: at any single weight the low-order bits might not mean much, but cascaded over a lot of tensor multiplications they do.
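
A rough way to see the compounding, assuming NumPy: push the same vector through a chain of random matrices at fp16 and at fp64, and track how far the two drift apart. The helper `drift_per_layer`, the dimensions, the depth, and the seed are all arbitrary choices for this sketch, not anything from a real model.

```python
import numpy as np

def drift_per_layer(depth: int = 40, dim: int = 64, seed: int = 0) -> list[float]:
    """Relative error of an fp16 matmul chain against an fp64 reference, per layer."""
    rng = np.random.default_rng(seed)
    x64 = rng.standard_normal(dim)
    x16 = x64.astype(np.float16)
    errors = []
    for _ in range(depth):
        # Scale by 1/sqrt(dim) so activations stay around the same magnitude.
        w64 = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        x64 = w64 @ x64                      # reference path, fp64 throughout
        x16 = w64.astype(np.float16) @ x16   # low-precision path, fp16 throughout
        diff = np.linalg.norm(x16.astype(np.float64) - x64)
        errors.append(float(diff / np.linalg.norm(x64)))
    return errors

errors = drift_per_layer()
print(f"relative error after layer 1:  {errors[0]:.2e}")
print(f"relative error after layer {len(errors)}: {errors[-1]:.2e}")
```

Each layer adds a little fresh rounding error on top of what it inherited, so the gap after many layers is noticeably larger than after one.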

Oh, I thought we were talking about the same calculations on different hardware.