Hacker News
by sfink 461 days ago
When I saw the title, I knew what this was going to be. It made me want to immediately write a corresponding "Human Blindspots" blog post to counteract it, because I knew it was going to be the usual drivel about how LLMs understand <X> but sometimes don't quite manage to get the reasoning right. Not to worry, though, because you can nudge them, and their logical brains will then figure it out and do the right thing. They'll stop hallucinating and start functioning properly, and if they don't, just wait for the next generation and everything will be fine.

I was wrong. This is great! I really appreciate how you not only describe the problems, but also describe why they happen using terminology that shows you understand how these things work (rather than the usual crap that is based on how people imagine them to work or want them to work). Also, the examples are excellent.

It would be a bunch of work, but the organization I would like to see (alongside the current one, not replacing it, because the one-page list already works for me) would require sketching out some kind of taxonomy of topics: categories of ways that Sonnet gets things wrong, and perhaps categories of things that humans would like it to do (e.g., types of tasks, skill/sophistication levels of users, starting vs. fixing vs. summarizing/reviewing vs. teaching, or whatever). But I haven't read through all of the posts yet, so I don't have a good sense of how applicable these categorizations might be.

I personally don't have nearly enough experience using LLMs to be able to write it up myself. So far, I haven't found LLMs very useful for the type of code I write (except when I'm playing with learning Rust; they're pretty good for that). I know I need to try them out more to really get a feel for their capabilities, but your writeups are the first I've found that I feel I can learn from without having to experience it all for myself first.

(Sorry if this sounds like spam. Too gushing with the praise? Are you bracing yourself for some sketchy URL to a gambling site?)