by facu17y
1064 days ago
If we have the budget to pre-train an LLM, the architecture itself is a commodity, so what does Llama2 add here? What we look to bigCo for is the pre-training, which can cost millions of dollars for the biggest models. Llama2 has too small a context window for this long a wait, which suggests that the http://Meta.AI team doesn't really have much of a budget, as a larger context would be much more costly. The whole point of a base LLM is the money spent pre-training it. But if it performs badly out of the gate on coding, which is what I'm hearing, then maybe fine-tuning with process/curriculum supervision would help, but that's about it. Better? Yes. Revolutionary? Nope.