by facu17y 1064 days ago
If we have the budget to pre-train an LLM, the architecture itself is a commodity, so what does Llama2 add here?

It's the pre-training that we look to bigCo to do, and that can cost millions of dollars for the biggest models.

Llama2's context window is too small for this long a wait, which suggests that the http://Meta.AI team doesn't really have much of a budget, since a larger context would be much more costly.

The whole point of a base LLM is the money spent pre-training it.

But if it performs badly on coding out of the gate, which is what I'm hearing, then maybe fine-tuning with process/curriculum supervision would help, but that's about it.

Better? yes. Revolutionary? Nope.