by summerlight 615 days ago
The major benefit would be its significant decrease in memory consumption, rather than the compute savings. The main bottleneck in current LLM infra is typically memory bandwidth, which is why chip makers are going crazy over HBM. Compute optimization surely helps, but this is useful even without any hardware changes.
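To put rough numbers on the bandwidth argument, here is a back-of-envelope sketch. It assumes single-stream decoding where every weight is read from memory once per generated token, and it ignores KV cache traffic, activations, and compute time. The 7B parameter count and the 1 TB/s bandwidth are hypothetical values picked for illustration, not figures from any particular model or chip.

```python
def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Memory needed to hold the weights, in bytes."""
    return n_params * bits_per_weight / 8


def max_decode_tokens_per_sec(
    n_params: int, bits_per_weight: float, bandwidth_bytes_per_sec: float
) -> float:
    """Upper bound on tokens/s when decoding is limited by memory bandwidth.

    Assumes all weights are streamed from memory once per token.
    """
    return bandwidth_bytes_per_sec / weight_bytes(n_params, bits_per_weight)


# Hypothetical values, for illustration only.
N_PARAMS = 7_000_000_000        # 7B parameters
BANDWIDTH = 1_000_000_000_000   # 1 TB/s memory bandwidth

for bits in (16, 8, 4, 2):
    gb = weight_bytes(N_PARAMS, bits) / 1e9
    tps = max_decode_tokens_per_sec(N_PARAMS, bits, BANDWIDTH)
    print(f"{bits:>2} bits/weight: {gb:5.2f} GB of weights, at most {tps:6.1f} tokens/s")
```

Under these assumptions the bound scales inversely with bits per weight, so shrinking the weights raises the ceiling on the same hardware.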
1 comment

Inference speeds go brrrr as well.