by gdiamos
240 days ago
We used the best models available, moving from the Pythia/GPT-2 generation to the DeepSeek generation. One annoying part was having to switch to new and better models that came out literally every week. I don’t think it substantially changes anything. If anything, I think the release of more advanced models like qwen-next makes things like FP4, MoE, and reasoning tokens an even higher barrier to entry.