by pk-protect-ai
854 days ago
> GPT-2 is maybe the last LLM
This is not true. There are tons of models that are even better than GPT-3.5 and really close in performance to GPT-4, and you can still train them on a single GPU with 24GB of video memory. There are also hints of even better models, published last year, that you can train on a single GPU and that are comparable in performance to LLaMA2 34B. The horizontal scaling you appeal to here may account for a 10^6 performance increase, but in general I expect a single node to be at least 1000 times faster than now. It is entirely plausible that you can't scale at 0.99 vertically, and certainly not horizontally, but I honestly expect the scaling per GPU to get better than 0.75 in the next 5 years.
It depends on the goal. For pure science (or for fun), I could train a GPT-4 class model on a C64, but that approach won't work in a competitive market, where you need to test hypotheses quickly and deliver tuned models fast.
- A competitive market is very sensitive to speed. For example, if MS presents something on December 10, then after New Year Google has to present something not merely equal but significantly better, just to appear equal to customers.
So horizontal scaling is a must, not just my wish, even when the speed increase is far from linear.
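To put rough numbers on "far from linear", here is a minimal sketch. It assumes the simplest possible model, where every GPU in a cluster delivers a constant fraction (`efficiency`) of its standalone throughput; real clusters usually lose more efficiency as they grow. The function names and the 750-day figure are made up for illustration.

```python
def speedup(num_gpus: int, efficiency: float) -> float:
    """Speedup over a single GPU.

    Assumption: each GPU contributes a constant fraction `efficiency`
    of its standalone throughput once it is part of a cluster.
    """
    if num_gpus <= 1:
        return 1.0  # a single GPU has no communication overhead
    return num_gpus * efficiency


def wall_clock_days(single_gpu_days: float, num_gpus: int, efficiency: float) -> float:
    """Wall-clock training time when the same job is spread over the cluster."""
    return single_gpu_days / speedup(num_gpus, efficiency)


# Hypothetical job that would take 750 days on one GPU.
for eff in (0.99, 0.75, 0.50):
    days = wall_clock_days(750.0, 1000, eff)
    print(f"efficiency={eff:.2f}: {speedup(1000, eff):.0f}x speedup, {days:.2f} days")
```

Under this assumption, even at 0.50 efficiency a 1000-GPU cluster finishes in days what a single GPU would need years for, which is the point about time to market.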
> I honestly expect the scaling per GPU to get better than 0.75 in the next 5 years
Could you give an explanation, or even some speculation, of how this is possible when we have already hit silicon limits (about 5GHz per core, 1nm, etc.)?