
Hacker News

by littlestymaar 813 days ago

It's even worse than that now: they need to demonstrate how much value they bring compared to Llama in terms of worker productivity.

While I've no doubt GPT-4 is a more capable model than Llama 3, I don't get any benefit from using it over Llama 3 70B, judging from the real-world benchmark I ran for a personal project last week. Both give solid responses most of the time, and both make stupid mistakes often enough that I can't trust them blindly, with no obvious difference in accuracy between the two.

And if I want to use a hosted service, Groq runs Llama 3 70B much faster than GPT-4, so there's less frustration waiting for the answer. I don't think that matters much for productivity, since the waiting time is pretty negligible in practice, but it does noticeably improve the UX.