by gdiamos 240 days ago

We used the best models available at each point, going from the Pythia/GPT-2 generation to the DeepSeek generation.

One annoying part was switching to new and better models that came out literally every week.

I don’t think it substantially changes anything. If anything, I think the release of more advanced models like qwen-next makes things like FP4, MoE, and reasoning tokens an even higher barrier to entry.