by PaulHoule
708 days ago
I easily broke Copilot by asking it to make lists of radioactive isotopes in order of half-life. It can put the U.S. states in alphabetical or reverse alphabetical order, but for any other order I would bet against it. If I ask it what the probability is that it can correctly complete a sorting task, however, it insists that it is almost certain to get it right. I had a good conversation with it about the theory of partial orderings; it even corrected my mistakes. I asked it to make a textbook problem about determining whether a graph was cyclic or not, and it made a straight and beautiful example where the partial ordering was realized with a total ordering and everything was written out in a straight order that was easy to follow. If I wrote a script that made up a bunch of well-randomized "is this graph cyclic?" problems, I am sure there is some size where it just falls down the same way it falls down with sorting. The obvious answer is that the LLM should pick an algorithm or write some code to do the things that ordinary algorithms can do, such as arithmetic, sorting, SAT solving, etc. There's the deeper issue that it doesn't know what it doesn't know. It can't sort a list of radioactive isotopes any more than it can help you make an atom bomb. In the second case it will say that it won't help you; in the first case it will try to help you anyway when it really should be saying "I can't do that, Dave" because it just can't.
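For what it's worth, the kind of script described above might look something like this. It is only a rough sketch: the function names (`random_digraph`, `has_cycle`, `make_problem`) and the choice of Kahn's algorithm for the ground-truth answer are assumptions made for illustration, not anything that was actually run against Copilot.

```python
import random
from collections import deque


def random_digraph(n_nodes, n_edges, rng):
    """Return a set of distinct directed edges (u, v) with u != v."""
    max_edges = n_nodes * (n_nodes - 1)
    n_edges = min(n_edges, max_edges)
    edges = set()
    while len(edges) < n_edges:
        u, v = rng.sample(range(n_nodes), 2)  # two different nodes
        edges.add((u, v))
    return edges


def has_cycle(n_nodes, edges):
    """Kahn's algorithm: repeatedly remove nodes with in-degree zero.

    If some nodes can never be removed, they lie on or behind a cycle.
    """
    indegree = [0] * n_nodes
    successors = [[] for _ in range(n_nodes)]
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(i for i in range(n_nodes) if indegree[i] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return removed < n_nodes


def make_problem(n_nodes, n_edges, seed):
    """Build one randomized problem: (shuffled edge list, correct answer)."""
    rng = random.Random(seed)
    edges = random_digraph(n_nodes, n_edges, rng)
    labels = list(range(n_nodes))
    rng.shuffle(labels)  # relabel so node numbers leak no ordering
    shuffled = [(labels[u], labels[v]) for u, v in edges]
    rng.shuffle(shuffled)  # edge order leaks nothing either
    return shuffled, has_cycle(n_nodes, shuffled)


if __name__ == "__main__":
    # Sweep problem sizes; paste each edge list into the chat and compare.
    for n in (5, 10, 20, 40):
        edges, answer = make_problem(n, 2 * n, seed=n)
        print(n, "nodes, cyclic:", answer, edges)
```

Shuffling both the node labels and the edge order is the point: it takes away the "everything written out in a straight order" crutch, so you can grow `n_nodes` until the answers stop matching `has_cycle`.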
|
ChatGPT-4o does just fine with that. Basing your opinion of a whole technology on a poor implementation of it instead of the best one doesn't seem like the best analysis.