Y
Hacker News
new
|
ask
|
show
|
jobs
by
gok
2420 days ago
The MegatronLM example is a weird one. Neural network language models are replacing n-gram language models that grow to several terabytes for SotA results; 8 billion parameters is tiny by comparison.