by wongarsu 792 days ago
That seems to be the general experience. Maybe 8B parameters are just too few to achieve higher-level reasoning.

Maybe depth rather than parameter count.
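To see why depth and parameter count can come apart, here is a minimal sketch. It assumes the common rough estimate for a vanilla transformer block: about 4·d² weights for attention plus 8·d² for an MLP with hidden size 4·d, ignoring biases, norms, and embeddings. The two configurations are hypothetical and are not the architecture of any particular 8B model.

```python
def approx_block_params(n_layers: int, d_model: int) -> int:
    """Rough non-embedding parameter count for a vanilla transformer.

    Assumption: 4*d_model^2 for attention (Q, K, V, output projections)
    plus 8*d_model^2 for an MLP with hidden size 4*d_model. Biases,
    norms, and embeddings are ignored.
    """
    per_layer = 4 * d_model**2 + 8 * d_model**2
    return n_layers * per_layer


# Hypothetical configs: quadrupling depth while halving width
# keeps the estimated parameter budget the same.
shallow_wide = approx_block_params(n_layers=32, d_model=4096)
deep_narrow = approx_block_params(n_layers=128, d_model=2048)
print(shallow_wide, deep_narrow)  # 6442450944 6442450944
```

Under this approximation, the parameter count scales as L·d², so a fixed budget still leaves the number of layers L as a free choice.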