by wwizo 32 days ago
Same. It feels very goal-oriented. It takes multiple attempts to divert it from its course and from the means it has chosen to achieve the goal.

On tool use: I gave it an interactive design assignment in Antigravity 2. It failed miserably until I asked it to use Playwright for testing. And boy, did it run with it. It tested the hell out of the visuals and nailed the solution.

On instruction following: I asked Gemini Flash 3.5 to summarize a YouTube video (the Google I/O developer keynote), a task that would previously have been trivial (I use it for this often). Instead, it kept hallucinating points and referencing I/O developer keynote blog posts from several years ago. I got the same result across multiple attempts, even with repeated requests. It was almost insistent that the information it provided was valid, and it ignored my questions about whether it even had that capability.


What thinking level were you using?

In my testing, the minimal thinking mode hallucinated in 2 out of 3 attempts, which is pretty scary. The other modes weren't as bad. I don't have comprehensive data, though.