


	by yankoff 1112 days ago
	Is this opinion based on some benchmarking you (or someone else) did?


nickthegreek 1112 days ago

Nothing that you can self-host seems to come close to GPT-3.5, let alone GPT-4. r/LocalLLaMA is a good subreddit to lurk in to get a pulse on local LLMs. The current leader seems to be Guanaco-65B.


egonschiele 1112 days ago

I believe there are benchmarks, but I can informally second that opinion. I'm building a writing app (chiseleditor.com) and there is nothing as good as the ChatGPT models right now.


bzmrgonz 1111 days ago

Since you have your hands in the mess, let me ask you this, because I think it is what people mean when they ask for an alternative to one of these models: how can an industry-specific or company-specific AI be created? That is, you take the LLM engine and ingest company data or, if you want to be bold, industry datasets. ChatGPT is marketed as being a doctor/architect/lawyer/professor/etc. But what if all you want to do is build an Ask Jeeves-type AI lawyer?


PeterisP 1112 days ago

I would distrust the currently available benchmarks, as recent research (gah, can't remember the paper title) indicates that for many benchmarks at least some of the data splits have leaked into model training data. There is also some experience with open source models that match an OpenAI model on benchmark scores but subjectively feel much worse than that model on random questions.


whimsicalism 1112 days ago

I’m telling you, from looking at this closely, that there is substantial evidence, based solely on new, never-before-seen prompts, that GPT-4 is by far the best and ChatGPT/Claude is second, with the other Anthropic models, Vicuna, etc. bringing up the rear.


joenot443 1112 days ago

Have you tried Anthropic, specifically Claude? I have no doubt GPT-4 is still king, I'm just curious how much of a lead it has.


pmoriarty 1112 days ago

I've played around a lot with Claude, and find it much better than GPT-4 and even Claude+ at creative writing.

I also generally prefer Claude or Claude+ over GPT-4 or ChatGPT (GPT-3.5) for explanations; their answers tend to be more thorough without any special prompting.


egonschiele 1112 days ago

No, I would love to try it out but unfortunately I don't have early access yet.


j45 1112 days ago

It seems consistent with what many people who use OpenAI and also self-host are finding.

There’s a gap, it’s closing, likely faster than anticipated.

Huggingface awaits :)
