by logicchains
1144 days ago
It's not clear from the GitHub repo: are there any plans to eventually train the 30 or 65 billion parameter LLaMA models? The 65B model seems comparable to GPT3.5 for many things, and can run fine on a beefy desktop on CPU alone (CPU RAM is much cheaper than GPU RAM). It'd be amazing to have an open source version.
…but, although it is true that, for a fixed compute budget, these small models can achieve impressive results with good training data, it is also true that smaller models (7B) appear to have an upper performance bound that larger, well-trained models beat easily.
It’s just way more expensive to train larger models.
They specifically note that they plan to train a smaller 3B model in the future.
So… it seems reasonable to assume that this is a proof of concept, and that no, the Berkeley AI lab will not be covering the cost of training a larger model.
This is probably more about exploring the “can we make a cheap good-enough model?” than “here is your GPT4 replacement”.