Hacker News
by
wongarsu
792 days ago
That seems to be the general experience. Maybe 8B parameters are just too few to achieve higher-level reasoning.
brrrrrm
792 days ago
Maybe depth rather than parameter count.