Hacker News
by gok 2420 days ago
The MegatronLM example is a weird one. Neural network language models are replacing n-gram language models, which grow to several terabytes to get SotA results; 8 billion parameters is tiny by comparison.
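For a rough sense of scale, here is a back-of-envelope sketch of that comparison. The bytes-per-parameter widths and the 3 TB figure standing in for "several terabytes" are my assumptions, not numbers from any specific model or paper, and `model_bytes` is just a hypothetical helper.

```python
# Back-of-envelope size comparison; all figures are illustrative assumptions.
PARAMS = 8_000_000_000      # 8 billion parameters, as mentioned in the comment
NGRAM_BYTES = 3 * 10**12    # assumed stand-in for "several terabytes"


def model_bytes(params: int, bytes_per_param: int) -> int:
    """Raw weight storage only, ignoring optimizer state and file overhead."""
    return params * bytes_per_param


if __name__ == "__main__":
    # Two common storage widths; which one applies is an assumption.
    for name, width in [("fp16", 2), ("fp32", 4)]:
        size = model_bytes(PARAMS, width)
        ratio = NGRAM_BYTES / size
        print(f"{name}: {size / 10**9:.0f} GB of weights; "
              f"the assumed n-gram model is roughly {ratio:.0f}x larger")
```

Under these assumptions the weights come to 16 GB at 2 bytes per parameter or 32 GB at 4 bytes, so a multi-terabyte n-gram model is about two orders of magnitude bigger on disk.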