Hacker News
by HaZeust 698 days ago
Probably because the benchmark gains from larger models are, at this point, negligible. Scaling up transformers and iterating on attention might be a dead end for building more capable models beyond 2T parameters. But I'm not sure.