Hacker News
by machiaweliczny 126 days ago
We need that for this Chinese 3B model that thinks for 45s on hello world but also solves math.
1 comment

Nanbeige. Yeah, this seems ideal for models that scale test-time compute.