by evrydayhustling 1297 days ago
These examples are terrific, but the framing is ridiculous.

- GPT-3 answers can be incorrect, and don't carry enough context with them for the reader to engage critically.

- Text is often an inefficient presentation of an answer, and Google's knowledge card results can do more and more (while taking on the risk above).

- LLMs are a long way from scaling, at this quality, to even a fraction of the throughput of Google queries.

- Search increasingly benefits from user-specific context, which is even harder to integrate at a reasonable expense into queries at massive throughput.

- Google is also regularly putting forward LLM breakthroughs, which will of course impact productized search.

As an NLP practitioner who depends on LLMs, I'm as excited as anyone about this progress. But I think some folks are jumping to the conclusion that generative AIs will be standalone products; I think they'll be much more powerful when integrated into structured product flows.


I'm curious why everyone keeps getting confused about this model being GPT-3 and using their past experiences with GPT-3 to justify their position. The model is not GPT-3, and at this point GPT-3 is far behind the state of the art. OpenAI calls this model "GPT-3.5".

It is also capable of far more than relaying information; as such, it also serves the purpose of Q/A sites like Stack Overflow. You can put wrong code into it and ask for bug fixes, and it will often return exactly the correct fix.

Framed as a search engine, it obviously fails on some measures; framed as a research assistant, it exceeds Google (which suffers greatly from adversarial SEO gumming up its results) by leaps and bounds.

I don't agree that people are confused (I wasn't) or that they are depending on prior experiences (many of these points aren't rooted in direct experience at all!). OpenAI is choosing to brand this as a fine-tuning of a model that is a minor version of GPT 3.X, so it's a pretty natural shorthand.

Agree with you directionally on the research assistant point, although I think it would be interesting to define that task with more detail to see the comparisons. I'd expect that most research workflows starting with ChatGPT still need to end in search to confirm and contextualize the important parts.

Between the release of GPT-3 and GPT-3.5 there was Gopher, which raised the bar on TruthfulQA from essentially random (22.6%) in GPT-3's case to 45% for Gopher. GopherCite then brought the performance up to 80-90%. One has to assume that OpenAI is using state-of-the-art techniques in their new model releases. That LLMs went from choosing answers randomly to producing accurate results on a great many questions (they still suck at math) is missed by anyone who is unaware of the historical context, and shorthanding 3.5 to 3 hides that context.