Hacker News new | ask | show | jobs
by pamelafox 320 days ago
I am testing out gpt-5-mini for a RAG scenario, and I'm impressed so far.

I used gpt-5-mini with reasoning_effort="minimal", and that model finally resisted a hallucination that every other model generated.

Screenshot in post here: https://bsky.app/profile/pamelafox.bsky.social/post/3lvtdyvb...

I'll run formal evaluations next.

3 comments

Q: What does a product manager do?

GPT4: Collaborating with engineering, sales, marketing, finance, external partners, suppliers and customers to ensure …… etc

GPT5: I don't know.

Upon speaking these words, AI was enlightened.

That is genuinely nice to see. What are you using for the embeddings?
We use text-embedding-3-large, with both quantization and MRL reduction, plus oversampling on the search to compensate for the compression.
This is huge news if we finally have a model that is able to say "I don't know".
If a model doesn't "know" what a PM is then I worry about any of its other outputs. That should be dictionary lookup.
Why? It's honest as it doesn't understand it without more context. Lookup could lead to wrong results
Seriously. I have never seen this, even once. I had been wondering if it was impossible. If a model can really say “I don’t know” when it doesn’t know, that could change everything. How many pointless, dumb rabbit holes could be avoided?
My comment peers are really whooshing hard on this. Clearly they have worked with a different sort of PM than I ever have.

The correct answer is: “professional managerial class grift”

This feels like honestly the biggest gain/difference. I work on things that do a lot of tool calling, and the model hallucinating fake tools is a huge problem. Worse, sometimes the model will hallucinate a response directly without ever generating the tool call.

The new training rewards that suppress hallucinations and tool-skipping hopefully push us in the right direction.

I get the "good" result with phi-4 and gemma-3n in RAG scenario - i.e. it only used context provided to answer and couldn't answer questions if context lacked the answer without hallucination.