Hacker News
by datastack, 1076 days ago
I guess going with a parameter count that matches existing models makes it easier to compare benchmarks. Perhaps there is another specific reason, such as memory requirements, but momentum is probably also a significant factor.