| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by pamelafox 320 days ago

I am testing out gpt-5-mini for a RAG scenario, and I'm impressed so far.

I used gpt-5-mini with reasoning_effort="minimal", and that model finally resisted a hallucination that every other model generated.

Screenshot in post here: https://bsky.app/profile/pamelafox.bsky.social/post/3lvtdyvb...

I'll run formal evaluations next.

3 comments

ralfd 320 days ago

Q: What does a product manager do?

GPT4: Collaborating with engineering, sales, marketing, finance, external partners, suppliers and customers to ensure …… etc

GPT5: I don't know.

Upon speaking these words, AI was enlightened.

link

ComputerGuru 320 days ago

That is genuinely nice to see. What are you using for the embeddings?

link

pamelafox 320 days ago

We use text-embedding-3-large, with both quantization and MRL reduction, plus oversampling on the search to compensate for the compression.

link

siva7 320 days ago

This is huge news if we finally have a model that is able to say "I don't know".

link

jofzar 319 days ago

If a model doesn't "know" what a PM is then I worry about any of its other outputs. That should be dictionary lookup.

link

siva7 319 days ago

Why? It's honest as it doesn't understand it without more context. Lookup could lead to wrong results

link

dimal 319 days ago

Seriously. I have never seen this, even once. I had been wondering if it was impossible. If a model can really say “I don’t know” when it doesn’t know, that could change everything. How many pointless, dumb rabbit holes could be avoided?

link

jondwillis 319 days ago

My comment peers are really whooshing hard on this. Clearly they have worked with a different sort of PM than I ever have.

The correct answer is: “professional managerial class grift”

link

potatolicious 320 days ago

This feels like honestly the biggest gain/difference. I work on things that do a lot of tool calling, and the model hallucinating fake tools is a huge problem. Worse, sometimes the model will hallucinate a response directly without ever generating the tool call.

The new training rewards that suppress hallucinations and tool-skipping hopefully push us in the right direction.

link

0x457 320 days ago

I get the "good" result with phi-4 and gemma-3n in RAG scenario - i.e. it only used context provided to answer and couldn't answer questions if context lacked the answer without hallucination.

link