by lolinder
615 days ago
> Search for 'LLM Leaderboard' and you can see for yourself. The 8b models do not even rank.
This is not true. On benchmarks, maybe, but I find the LLM Arena more accurately accounts for the subjective experience of using these things, and Llama 3.1 8B ranks relatively high, outperforming GPT-3.5 and certain iterations of 4.
Where the 8Bs do struggle is that they don't have as deep a repository of knowledge, so using them without some form of RAG won't get you as good results as using a plain larger model. But frankly I'm not convinced that RAG-free chat is the future anyway, and 8B models are extremely fast and cheap to run. Combined with good RAG they can do very well.
|
> 8B models are extremely fast and cheap to run
Yes.
> Combined with good RAG they can do very well.
This is simply not true. They perform at a level which is useful for simple, trivial tasks.
If you consider that 'doing well', then sure.
However, if, like the parent post, you want to be writing scripts, which is specifically what they asked about... then heck, what 8B are you using? Because Llama 3.1 is shit at it out of the box.
¯\_(ツ)_/¯
A working unit test can take 6 or 7 iterations with a good prompt. Forget writing logic. Creating classes? Using RAG to execute functions from a spec? Forget it.
That's not the level that I need for an assistant.