|
|
|
|
|
by llm_nerd
553 days ago
|
|
4o and o1 get this right. LLMs should never do math. They shouldn't count letters or sort lists or play chess or checkers. Basically all of the easy gotcha stuff that people use to point out errors are things that they shouldn't do. And you pointed out something they do now which is creating and run a python script. That really is a pretty solid, sustainable heuristic and is actually a pretty great approach. They need to apply that on their backend too so it works across all modes, but the solution was never just an LLM. Similarly, if you ask an LLM a chess question -- e.g. the best move -- I'd expect it to consult a chess engine like Stockfish. |
|
But these aren't "gotcha questions", these are just some of the basic interactions that people will want to have with intelligent assistants. Literally just two days ago I was doing some things with the compound interest formula - I asked Claude to solve for a particular variable of the formula, then plug in some numbers to calculate the results (it was able to do it). Could I have used Mathematica or something like that? Yes of course. But supposedly the whole purpose of a general purpose AI is that I can use it to do just about anything that I need to do. Likewise there have been multiple occasions where I've needed ChatGPT or Claude to work with tables or lists of data where I needed the results to be sorted.