With LLMs I am in this strange place where I read about all of these amazing things that they can supposedly do, and I have no doubt that they probably can. And then I ask ChatGPT a simple question such as:
Here is a logic puzzle that I need some help solving: Samantha is a girl and has two brothers and four sisters. Alex is a man and also one of Samantha's brothers. How many brothers and sisters does Alex have? Assume that Samantha and Alex share all siblings.
And I get back a very well-written, multi-step response that leaves no doubt in anyone's mind:
To solve this logic puzzle:
Samantha has 2 brothers and 4 sisters.
This means there are 7 children in total (Samantha, her 2 brothers, and her 4 sisters).
Alex is one of Samantha's brothers. Since Samantha and Alex share all siblings, Alex has:
1 brother (the other brother besides himself).
4 sisters.
Final Answer:
Alex has 1 brother and 4 sisters.
The answer is wrong, of course: Samantha is herself one of Alex's sisters, so he has 1 brother and 5 sisters. Maybe it's like with Apple and I am using it wrong.
To get back to the "intern" comparison: I could usually tell when an intern was struggling; there were human telltale signs. When AI is wrong, it still presents its results with the confidence of someone who is extremely deep in the Dunning-Kruger hole but can still write like a long-time expert on the topic.
What I've learned is that it's good primarily for tasks like the following:
- Tasks which take time to do, but are then easy to verify.
- Tasks which effectively boil down to translating something from one format to another, e.g. "read this technical document and implement it in code; for style, look at these sample code files as a reference".
- Tasks which are about exploring unknown unknowns. E.g. I write down a design and then ask the AI to roast it. The point is not that every point it makes is good or that I need to please the AI; it's that out of the 20 points it lists, 2-3 might both make sense and be things I hadn't thought of myself.
Finally, AI requires good writing skills and asking questions in an unbiased way; otherwise the AI will gladly hallucinate to reinforce your bias.
Logic exercises which are easy to verify are a moderately good fit for "reasoning models", which go through many iterations of an LLM and basically write out the whole reasoning process. In practice, though, getting good results this way can be very expensive.
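To illustrate the "easy to verify" part with the sibling puzzle from the parent comment, here is a minimal Python sketch that just counts instead of reasoning. The helper names (`build_family`, `count_siblings`) and the placeholder names like `brother_1` are made up for illustration; the sketch treats `brother_1` as Alex.

```python
def build_family(name, gender, brothers, sisters):
    """Build the full set of children from one child's point of view."""
    family = {name: gender}
    for i in range(brothers):
        family[f"brother_{i + 1}"] = "M"  # hypothetical placeholder names
    for i in range(sisters):
        family[f"sister_{i + 1}"] = "F"
    return family


def count_siblings(family, person):
    """Count brothers and sisters of `person`, excluding the person."""
    others = [g for name, g in family.items() if name != person]
    return {"brothers": others.count("M"), "sisters": others.count("F")}


# Samantha is a girl with two brothers and four sisters: 7 children in total.
family = build_family("Samantha", "F", brothers=2, sisters=4)

# Assumption: Alex is brother_1. Samantha counts as one of his sisters.
print(count_siblings(family, "brother_1"))  # {'brothers': 1, 'sisters': 5}
```

Checking a proposed answer against a count like this takes seconds, which is what makes this kind of task reasonable to hand to a model even when the first answer is wrong.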