by simne 853 days ago
> you still can train them on a single GPU with 24GB video memory

It depends on the goal. For pure science (or for fun), I could train a GPT-4-class model on a C64, but that approach does not fit a competitive market, where you need to test hypotheses quickly and deliver tuned models quickly.

- A competitive market is very sensitive to speed. For example, if MS presents something on December 10, then Google, presenting after New Year, has to show something not merely equal but significantly better, just to appear equal to customers.

So horizontal scaling is a must, not just my wish, even when the speedup is far from linear.
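
To put rough numbers on that, here is a minimal sketch. It assumes "scaling per GPU" means a constant parallel efficiency (speedup divided by GPU count), which is only one possible reading of the 0.75 figure. The function names and the 240-day workload are hypothetical, chosen purely for illustration.

```python
def speedup(n_gpus: int, efficiency: float = 0.75) -> float:
    """Speedup over a single GPU, ASSUMING a constant per-GPU efficiency.

    Real scaling efficiency usually drops as the GPU count grows; this is
    a simplification for illustration only.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    if n_gpus == 1:
        return 1.0  # a single GPU has no communication overhead to pay
    return n_gpus * efficiency


def wall_clock_days(single_gpu_days: float, n_gpus: int,
                    efficiency: float = 0.75) -> float:
    """Time to finish a job that would take `single_gpu_days` on one GPU."""
    return single_gpu_days / speedup(n_gpus, efficiency)


if __name__ == "__main__":
    # Hypothetical workload: 240 days on one 24GB GPU.
    for n in (1, 8, 64):
        print(n, "GPUs:", wall_clock_days(240, n), "days")
```

Under these assumptions, 8 GPUs waste a quarter of their capacity and still cut the hypothetical job from 240 days to 40, which is the whole point when time to market matters.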

> I honestly expect the scaling per GPU get better than 0.75 in next 5 years

Could you give an explanation, or even speculation, of how this is possible when we have already hit silicon limits (about 5 GHz per core, 1 nm, etc.)?

1 comments

> Could you give explanation, or even speculations, how this is possible

Nope. But I'm so desperate to give you a hint right now that it is almost impossible to hold myself back... Stop looking into horizontal scalability. The vertical one is not exhausted yet. Btw, that was not the hint.

> Stop looking into horizontal scalability.

Sure. A B-747 officially needs about 700 man-years to assemble. Let's build them with small but highly motivated teams, following the classic 3-pizza rule. The world will wait :)