Hacker News
by moonmagick 617 days ago
Yawn. I don't use Claude because the interface is good. I use it because Opus 3 is the best model anyone has ever created for long context coding, writing and retrieval. Give me a model that doesn't have a polluted dataset to game MMLU scores, something that tangibly gives good results, and maybe I'll care again.

For now I only keep ChatGPT because it's a better Google.

2 comments

I've found Sonnet 3.5 significantly better than Opus 3 at coding but I've not done much long context coding with it. In your experience did you find Opus 3 to degrade less or is it that you consider Sonnet 3.5 part of the "gamed" group?
Have you used Gemini? With the built-in RAG I actually find it way better than both Google Search and OpenAI for search. I think Claude still wins for overall chat quality but Gemini is amazing for Search, especially when you're not exactly sure what you're looking for.

Disclaimer: I work at Google Cloud, but I've had hands-on dev experience with all the major models.

Initially it had some real problems: a large context window, but you could only paste 4k tokens into the UI, for example. It never seemed like anyone at Google was using it. NotebookLM is a great interface, though, with some nice bells and whistles, and finally shows what Gemini is capable of. However, Opus still has the best long context retrieval with the least hallucination from what I've tried.

3.5 Sonnet is fast, and that is very meaningful to iteration speed, but I find that for the level of complexity I throw at it, it strings together really bad solutions compared to the more holistic solutions I can work through with Opus. I use Sonnet for general knowledge and small questions because it seems to do very well with shorter problems and is more up-to-date on libraries.

I don't know that I've ever seen someone recommend Gemini Advanced for "search". In my experience the model doesn't always tell you whether it's using search or its internal training. In fact, I'm not sure it is even "searching" the internet rather than accessing some internal Google database.

In comparing its performance to the pure model on Google AI Studio, I realized Gemini was presenting some sort of RAG results as the "answer" without disclosing where it got that information.

Perplexity, which is hardly perfect, will at least tell you it is searching the web and cite a source web page.

I'm basically saying Gemini fails at even the simplest thing you would want from a search tool: disclosing where the results came from.