| HN Mirror


	by koboll 1132 days ago
	The fundamental problem seems to be that it's still slightly sub-GPT-3.5-quality, and even a long context window can't fix that. It will remember things from many, many tokens ago, but it still doesn't reliably produce passable work. The combination of a GPT-4-quality model and a long context window will unlock a lot of applications that now rely on somewhat lossy window-prying hacks (e.g. summarizing chunks). But any model quality below that won't move the needle much in terms of what useful work is possible, with the exception of fairly simple summarization and text analysis tasks.
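For readers unfamiliar with the "summarizing chunks" workaround mentioned above, here is a minimal sketch of the idea, not anyone's production code. The function names are hypothetical, chunk size is counted in words rather than tokens for simplicity, and `first_words_summarizer` is a stand-in for a real model call.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(text: str, summarize: Callable[[str], str], max_words: int) -> str:
    """Summarize chunk by chunk until the text fits in one window, then summarize once more."""
    current = text
    while len(current.split()) > max_words:
        chunks = split_into_chunks(current, max_words)
        reduced = " ".join(summarize(chunk) for chunk in chunks)
        if len(reduced.split()) >= len(current.split()):
            break  # the summarizer did not shrink the text; stop rather than loop forever
        current = reduced
    return summarize(current)


def first_words_summarizer(chunk: str, keep: int = 5) -> str:
    """Hypothetical stand-in for a model call: keeps only the first few words of a chunk."""
    return " ".join(chunk.split()[:keep])


if __name__ == "__main__":
    document = " ".join(f"w{i}" for i in range(100))
    print(summarize_long_text(document, first_words_summarizer, max_words=20))
```

Each pass throws away whatever the per-chunk summaries leave out, which is the lossiness the comment is pointing at; a window large enough to hold the whole input skips the loop entirely.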

4 comments

phillipcarter 1132 days ago

Maybe! I certainly look forward to that. Although in my testing GPT-4 also hallucinates a bit (though less than GPT-3.5), and the latency is so poor that it's unworkable for our product.

koboll 1132 days ago

Agreed. My heuristic is that GPT-4 is good for compile-time tasks but bad for runtime tasks, for both cost and speed reasons.

pmoriarty 1132 days ago

> The fundamental problem seems to be that it's still slightly sub-GPT-3.5-quality

It really depends on what you use it for.

I've found Claude better than GPT4 and even Claude+ at creative writing.

It also tends to give more comprehensive explanations without additional prompting. So I prefer to have it, rather than GPT3.5 or 4, explain things to me.

It's also free, which is another big win over GPT4.

dr_dshiv 1132 days ago

I find Claude significantly better than 3.5. I’d love to be able to make the case for that with data…

sanxiyn 1132 days ago

Since Chatbot Arena Leaderboard https://lmsys.org/blog/2023-05-10-leaderboard/ agrees with you, it's not just you.

famouswaffles 1132 days ago

There are two main Claude models. I'm guessing it's claude-v1.3, aka Claude Plus, that you find much better than 3.5? That tracks if so.

phillipcarter 1132 days ago

I've found for my use case that both claude-instant-* and claude-* are roughly on par with each other and gpt-3.5. claude-* seems to be the least inaccurate, but we also haven't put it into production like gpt-3.5, so it's hard to say for sure.

In either case, the Claude models are very good. I think they'd do fine in a real product. But there are definitely issues that they all have (or that my prompt engineering has).

ssd532 1131 days ago

I am very impressed with the quality of GPT-4, even with the 8k model. However, I have started reaching the limit of what the 8k model can do. I am eagerly awaiting the release of the 32k model.

The Claude 100k model is nowhere near it in terms of quality, in my experience.
