by pk-protect-ai
857 days ago
We are indeed talking about a 10^6 factor here ... It's not just 10x or 100x, or even 1000x ... If NVIDIA strips away everything not required from their chips and adds more SDRAM and HBM, it won't improve performance by 100x; maybe they'll get 10x-15x out of it. But they claim they are going to achieve a 10^6x improvement in performance. Even if they end up delivering an ARM-compatible CPU with built-in Tensor cores, built-in HBM, and vast SDRAM, without DDR RAM at all, how fast can it be? This promise of a 10^6x performance improvement is a paradigm shift. Either they know something that we don't, or they are just bluffing.
The real bottlenecks of NN hardware are neither number crunching nor memory.
The real bottleneck is that GPT-2 may be the last LLM that could be trained on one machine (even on one card).
For GPT-3, people usually talked about 32-GPU installations (which can still fit in one machine); for GPT-4 scale, they talked about clouds.
And modern clouds are NUMA beasts. I could say that modern cloud networking is slow, but those are not the right words: it is slow as hell.
What all this means: NNs are a good target for parallel processing in clouds, but not good enough. Real benchmarks say the 32-card machine mentioned above is only about 10 times faster than 1 card with the same amount of memory, and when things are scaled up to GPT-4, the benchmarks become much worse. So just improving the network, to move the bottleneck to something else, would give an additional 50-100x improvement.
And with a good team of AI scientists, it is more realistic to build a special hardware network for NN processing, or to tune the algorithms, than with a team specialized in GPU video processing.
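
To put rough numbers on the 32-card claim above, here is a minimal Python sketch of the usual parallel-efficiency arithmetic. The function names are made up for illustration, and the only inputs are the figures quoted in the comment (32 cards, about 10x speedup); nothing here is a measured benchmark.

```python
def parallel_efficiency(speedup: float, n_devices: int) -> float:
    """Fraction of ideal linear scaling that is actually achieved."""
    if n_devices <= 0:
        raise ValueError("n_devices must be positive")
    return speedup / n_devices


def linear_scaling_headroom(speedup: float, n_devices: int) -> float:
    """Extra speedup still available if scaling were perfectly linear."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return n_devices / speedup


# Figures taken from the comment: 32 cards, roughly 10x faster than 1 card.
eff = parallel_efficiency(10.0, 32)
gap = linear_scaling_headroom(10.0, 32)
print(f"efficiency: {eff:.1%}, headroom to linear scaling: {gap:.1f}x")
```

Under this simple model, a single 32-card machine has only about 3x of headroom left, so a 50-100x gain from better networking would have to come from much larger installations where the measured speedup falls much further below the device count.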