
I needed to normalise a big list of dates recently. I thought maybe GPT could help. It spat out a list of normalised dates which, after a bit of careful reading, were about 95% right.

How can you trust a tool that's right 95% of the time? In the end I wrote a script that handled the edge cases explicitly. That took a little longer, but the output is deterministic, and it still took less time than manually cross-referencing the output against the input would have.
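For illustration, a minimal sketch of what that kind of script could look like. This is not the actual script: the list of accepted formats and the names `normalise_date` and `normalise_all` are assumptions made up for the example. The idea is simply that every accepted format is listed explicitly, and anything that matches none of them is rejected rather than guessed at.

```python
from datetime import datetime

# Hypothetical set of input formats. Order matters: the first format that
# parses wins, so ambiguous inputs are resolved the same way on every run.
# Note that %B (full month name) depends on the process locale.
FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y")


def normalise_date(raw: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), or raise ValueError."""
    text = raw.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"unrecognised date format: {raw!r}")


def normalise_all(rows):
    """Split inputs into normalised dates and raw values needing manual review."""
    good, bad = [], []
    for raw in rows:
        try:
            good.append(normalise_date(raw))
        except ValueError:
            bad.append(raw)
    return good, bad
```

The point of the second function is that the failures are collected explicitly, so only the rejected rows need a human look, instead of every row needing to be checked against its input.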

I tried asking GPT to write the conversion script instead, but the script it generated just didn't deal with the edge cases. After a few rounds of increasingly specific directions that didn't seem to help, I gave up.

I've been using Copilot for development work. It has some magic moments, and it can be great for boilerplate. But then it introduces subtle bugs that are really hard to catch in review, or suggests completely incorrect function signatures, and I wonder if it's adding very much at all.

The biggest problem with these tools is that they turn a fun problem-solving exercise into an incredibly tedious reviewing exercise. I'd much rather do it myself and understand it fully than have to review the unreliable output of an LLM. I find it much simpler to write something correct myself than to find the flaws in other people's work.

Am I missing something?