by data-ottawa 28 days ago
Anyone using this yet?

I’m finding it very bad at instruction following vs 3.1. It calls tools it is told not to, and it loves calling tools. There’s a pretty strong bias towards its training vs system prompt instructions.

Google’s release notes say to reduce unnecessary tool calls by reducing thinking, but that feels like it should be orthogonal to me.

It definitely has improved a few logic things, like in data visualizations it’s better at labelling data, but it’s much worse at preparing data out of the box.

Same. Feels very goal oriented. It takes multiple attempts to steer it off its chosen course or the means it picked to get there.

On tool use: I gave it an interactive design assignment on Antigravity 2. It failed miserably until I asked it to use Playwright for testing. And boy did it run with it. It tested the hell out of the visuals and nailed the solution.

On following instructions: I asked Gemini Flash 3.5 to summarize a YouTube video (the Google I/O developer keynote), a task that would previously have been trivial (I use it often), but it kept hallucinating points and referencing I/O dev keynote blog posts from several years ago. Multiple attempts, same result, even on repeat requests. It was almost insistent on the validity of the information it provided, and ignored questions about whether it even had that capability.

What thinking level were you using?

In my testing, the minimal thinking mode hallucinated 2 out of 3 times, which is pretty scary. The other modes weren’t as bad. I don’t have comprehensive data though.