by marci
765 days ago
Maybe those aren't the right metrics to compare. True, the model is bigger, but it required fewer tokens than Llama 3 to train. The issue is that when there are no open datasets, it's hard to really compare and replicate. Is it because of the model's architecture? Dataset quality? Model size? A mixture of those? Something else?
That…doesn’t matter to users. Users care what it can do and what it requires of them to use it, not what it took for you to make it.
Sure, if it has better performance relative to training set size, that’s interesting from a scientific perspective and for learning how to train models, maybe, provided it scales the same way other models do in that regard. But ultimately, for use, until you get to a model that does better in absolute terms, or does better relative to models with the same resource demands, you aren’t offering an advantage.