by redox99 3 days ago
And I expect Blackwells to hold their value even more (they're already very LLM-optimized, and semiconductor process improvements will slow down).
1 comment

Yeah, most of the performance gains have come from architectural improvements like reduced-precision tensor cores. AFAIK FP4 is basically the limit for floating-point matmuls; to cut bits any further you need to switch to integer addition, and I don’t think we’ve figured out 1-bit LLMs just yet.
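
To make the FP4 point concrete, here's a rough sketch that decodes every 4-bit pattern. It assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1) from the OCP Microscaling formats; the comment above doesn't say which FP4 variant it means, and the function name `decode_e2m1` is just mine.

```python
def decode_e2m1(bits: int) -> float:
    """Decode a 4-bit E2M1 pattern (assumed layout: sign | exp[2] | mantissa[1])."""
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exp = (bits >> 1) & 0b11
    man = bits & 0b1
    if exp == 0:
        # Subnormal range: no implicit leading 1, so the value is 0 or 0.5.
        return sign * man * 0.5
    # Normal range: implicit leading 1, exponent bias of 1.
    return sign * (1.0 + 0.5 * man) * 2.0 ** (exp - 1)


# All 16 bit patterns, then the distinct non-negative magnitudes.
all_values = [decode_e2m1(b) for b in range(16)]
magnitudes = sorted({abs(v) for v in all_values})

if __name__ == "__main__":
    print(magnitudes)
```

Under that assumption there are only eight magnitudes to work with (0, 0.5, 1, 1.5, 2, 3, 4, 6), so dropping one more bit leaves almost nothing to split between exponent and mantissa.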