Hacker News
by echelon 240 days ago
What models did you try to fine-tune? Were the models at the time even good enough to fine-tune? Did they suffer from catastrophic forgetting?

We have many more capable open-source models now. And my guess is that if you designed models specifically for being fine-tuned, they could escape many of the last generation's pitfalls.

Companies would love to own their own models instead of renting from a company that seeks to replace them.

1 comment

We used the best models available at each point, going from the Pythia/GPT-2 generation to the DeepSeek generation.

One annoying part was switching to new and better models that came out literally every week.

I don’t think it substantially changes anything. If anything, I think the release of more advanced models like qwen-next makes things like FP4, MoE, and reasoning tokens an even higher barrier to entry.