
This kind of second-order (and higher-order) usage of LLMs is where things actually start to get much more interesting. The other thing you can do is just train a better model.

I use GPT-4 for debugging a lot now, because it's excellent at taking nothing more than an error message from the console and telling me what's wrong and how to fix it. It's not perfect, but it's good enough that I reach for it by default. I don't have API access to GPT-4 yet, so I compared how well GPT-3.5 performs at the same task. For the example I tried, it didn't get close enough to be truly useful, so unlike GPT-4 I wouldn't rely on it in my daily workflow.

But... what I'm actually quite interested in, and what I'm seeing a lot of, is exactly how far you can push a less capable model through prompt engineering. I think it's further than you might initially expect.
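To make "prompt engineering" concrete for the debugging case: instead of pasting a bare error message, you can wrap it in a prompt that tells the model what steps to take. The sketch below is only an illustration. The function name `build_debug_prompt` and the template wording are assumptions, not something taken from any particular tool, and it only builds the prompt string, with no model call.

```python
def build_debug_prompt(error_message: str, code_snippet: str = "") -> str:
    """Wrap a raw console error in a structured prompt.

    Hypothetical template: the wording and step list are illustrative
    assumptions, not a known-good recipe.
    """
    parts = [
        "You are helping debug a program.",
        "Work through the problem step by step:",
        "1. Restate what the error message means.",
        "2. Name the most likely cause.",
        "3. Propose a concrete fix.",
        "",
        "Error message:",
        error_message.strip(),
    ]
    # Only include the code section when a snippet was actually supplied.
    if code_snippet.strip():
        parts += ["", "Relevant code:", code_snippet.strip()]
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_debug_prompt("TypeError: 'NoneType' object is not subscriptable"))
```

The resulting string would be sent as the user message to whichever model you have access to. Whether this kind of scaffolding closes the gap between a weaker and a stronger model is exactly the open question.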