by Scene_Cast2
310 days ago
I find it interesting that the architectures of modern open-weight LLMs are so similar, and that most innovation seems to be happening on the training (data, RL) front. This is contrary to what I've seen in a large ML shop, where architectural tuning was king.