by gdiamos 240 days ago
We used the best models available at each point, moving from the Pythia/GPT-2 generation to the DeepSeek generation.

One annoying part was switching to new and better models that came out literally every week.

I don’t think it substantially changes anything. If anything, I think the release of more advanced models like qwen-next makes things like fp4, MoE, and reasoning tokens an even higher barrier to entry.