Hacker News
by yankoff 1112 days ago
Is this opinion based on some benchmarking you (or someone else) did?

Nothing that you can self-host seems to come close to GPT-3.5, let alone GPT-4. r/LocalLLaMA is a good subreddit to lurk in to get a pulse on local LLMs. The current leader seems to be Guanaco-65B.
I believe there are benchmarks, but I can informally second that opinion. I'm building a writing app (chiseleditor.com) and there is nothing as good as the ChatGPT models right now.
Since you have your hands in the mess, let me ask you this. I ask because I think this is what people mean when they ask "what's an alternative to X?" How can an industry-specific or company-specific AI be created? Meaning you take the LLM engine and ingest company data, or, if you want to be bold, industry datasets. ChatGPT is marketed as being a doctor/architect/lawyer/professor/etc. But what if all you want to do is build an Ask Jeeves type of AI lawyer?
I would distrust the currently available benchmarks, as recent research (gah, can't remember the paper title) indicates that for many benchmarks at least some of the data splits have leaked into model training data. There's also some experience with open source models that match an OpenAI model on benchmark scores but subjectively feel much worse than that model on random questions.
I’m telling you from looking at this closely that there is substantial evidence, solely from new, never-seen-before prompts, that GPT-4 is by far the best and ChatGPT/Claude is second, with the other Anthropic models, Vicuña, etc. bringing up the rear.
Have you tried Anthropic, specifically Claude? I have no doubt GPT-4 is still king, I'm just curious how much of a lead it has.
I've played around a lot with Claude, and find it much better than GPT-4 and even Claude+ at creative writing.

I also generally prefer Claude or Claude+ over GPT-4 or ChatGPT (GPT-3.5) for explanations too; their answers tend to be more thorough without any special prompting.

No, I would love to try it out but unfortunately I don't have early access yet.
It seems consistent with what many people who use OpenAI and also self-host are finding.

There’s a gap, it’s closing, likely faster than anticipated.

Huggingface awaits :)