by WhitneyLand
353 days ago
Mostly SOTA performance at the 3B level. A notable addition to the small but truly open club of models that provide full disclosure, code, and recipes to reproduce their work. Looks like roughly a million dollars of GPU time if you want to train one yourself (4,000 GPUs for 24 days). Very nice write-up that's generous in sharing what they learned. This is a solid and positive contribution.
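For anyone who wants to sanity-check the ballpark, here is a rough sketch of the arithmetic in Python. The GPU count, duration, and the million-dollar total come from the comment above; the function names are made up for illustration, and the per-GPU-hour rate is simply what those figures imply, not a quoted price from any provider.

```python
# Back-of-the-envelope GPU cost check (illustrative only).
# Inputs are the figures from the comment: 4,000 GPUs, 24 days, ~$1M.
# The helper names below are hypothetical.

def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours for a run that uses every GPU around the clock."""
    return num_gpus * days * 24


def implied_rate(total_cost_usd: float, num_gpus: int, days: float) -> float:
    """Dollars per GPU-hour implied by a total cost and run size."""
    return total_cost_usd / gpu_hours(num_gpus, days)


hours = gpu_hours(4000, 24)
rate = implied_rate(1_000_000, 4000, 24)

print(f"GPU-hours: {hours:,.0f}")
print(f"Implied rate: ${rate:.2f} per GPU-hour")
```

Swap in whatever hourly rate you can actually get and multiply by the GPU-hours to see how far the total moves from the ballpark.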