by summerlight
615 days ago
The major benefit would be the significant decrease in memory consumption, rather than in the compute itself. The main bottleneck of current LLM infrastructure is typically memory bandwidth, which is why the chip industry is going crazy over HBM. Compute optimization surely helps, but this is useful even without any hardware changes.
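
A rough back-of-the-envelope sketch of the bandwidth argument, under stated assumptions: single-stream decoding, every weight read from memory once per generated token, and KV cache and activation traffic ignored. The function name and all numbers below (parameter count, bandwidth) are hypothetical and chosen only for illustration.

```python
def decode_tokens_per_second(num_params, bits_per_weight, bandwidth_bytes_per_s):
    """Upper bound on decode speed when weight reads dominate memory traffic.

    Assumes each weight is streamed from memory exactly once per token
    (batch size 1) and ignores KV cache and activation traffic.
    """
    weight_bytes = num_params * bits_per_weight / 8
    return bandwidth_bytes_per_s / weight_bytes


if __name__ == "__main__":
    # Hypothetical model and hardware: 7e9 parameters, 1e12 bytes/s of bandwidth.
    num_params = 7e9
    bandwidth = 1e12
    for bits in (16, 8, 4, 2):
        gigabytes = num_params * bits / 8 / 1e9
        bound = decode_tokens_per_second(num_params, bits, bandwidth)
        print(f"{bits:>2} bits/weight: {gigabytes:5.2f} GB of weights, "
              f"at most {bound:6.1f} tokens/s")
```

Under these assumptions the bound scales inversely with bits per weight, so shrinking the weights raises the ceiling on the same hardware, independent of any compute speedup.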