Hacker News
by Workaccount2 814 days ago
LLMs (internally) don't have a pen and paper equivalent. Their output is the output of their neurons. It's as if I were a head on a table with a screen on my forehead that printed out my thoughts as they appeared in my head. Ask (prompt) me my favorite color and "green" would show up on the screen.

This is why prompting LLMs to show their steps works so well: it makes them work through the problem "in their head" more efficiently, rather than just spit out an answer.

However, you can give LLMs external access to tools. Ask GPT4 a particularly challenging math problem, and it will write a Python script and run it to get a solution. That is an LLM's "pen and paper".

2 comments

> That is an LLM's "pen and paper".

No, that is an LLM's calculator or programming language; it doesn't actually do the steps itself when it does that. When I use pen and paper to solve a problem I do all the steps on my own; when I use a calculator or a programming language the tool does a lot of the work.

That difference is massive: using a calculator doesn't help me learn numbers, how they interact and how algorithms work, while doing the steps myself does. So getting an LLM that can reliably execute algorithms like we humans can is probably a critical step towards making them as reliable and smart as humans.

I do agree though that if LLMs could keep a hidden voice that they used to reason before writing, they could do better. But showing that voice to the end user shouldn't make the model dumber; you would just see more spam.

You are splitting hairs on technicalities here. You need to do a lot of "steps" to write a program that solves your question. Debatably even more steps and more complexity than using pen and paper.

Maybe we should be giving the LLMs MS Paint instead of Python to work out problems? There is nothing unique or "human" about running through a long division problem, it is ultimately just an algorithm that is followed to arrive at a solution.
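
To make "just an algorithm" concrete, here is a minimal Python sketch of schoolbook long division that records each step as a line of text. The function name and the wording of the steps are my own choices for illustration, and it only handles a non-negative dividend and a positive divisor.

```python
def long_division(dividend, divisor):
    """Schoolbook long division, one dividend digit at a time.

    Hypothetical helper for illustration. Returns (quotient, remainder, steps).
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("sketch only covers dividend >= 0 and divisor > 0")
    steps = []
    quotient_digits = []
    remainder = 0
    for digit in str(dividend):
        # Bring down the next digit next to the running remainder.
        current = remainder * 10 + int(digit)
        q = current // divisor
        remainder = current - q * divisor
        quotient_digits.append(str(q))
        steps.append(
            f"bring down {digit}: {current} / {divisor} = {q} remainder {remainder}"
        )
    quotient = int("".join(quotient_digits))
    return quotient, remainder, steps


quotient, remainder, steps = long_division(1234, 7)
for line in steps:
    print(line)
print(quotient, remainder)  # 176 2
```

Every line in `steps` is the kind of intermediate result a person would write down on paper.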

> There is nothing unique or "human" about running through a long division problem, it is ultimately just an algorithm that is followed to arrive at a solution.

Yes, which is why we should try to make LLMs do them, and that way open them up to learning a much deeper understanding of algorithms and instructions that humans have yet to build a tool for.

> You need to do a lot of "steps" to write a program that solves your question. Debatably even more steps and more complexity than using pen and paper.

What does this have to do with anything? I am highlighting a core deficiency in how LLMs are able to reason. Saying that what they currently do is harder doesn't change the fact that they are bad at this sort of reasoning.

And no, making such a program doesn't require more steps or understanding. You Google for a solution and then paste in your values; that is much easier to teach a kid than teaching them math. I am sure I can teach almost any 7 year old kid to add two numbers by changing values in a Python program in about an hour, much faster than they could learn math the normal way. Working with such templates is the easiest task for an LLM; what we want is to try to get the LLM to do things that are harder for it.

Here is a prompt you can plug into GPT4:

"I have a problem for you to solve. Muffins sell for $3/each. rick bakes 30 muffins a day. Tom bakes 2 muffins monday, 4 tuesday, 6 wednsdays, up to 14 on sunday. On days which tom and jerry combined bake more than 41 muffins, the price of the muffins drops to $2.50. How much total revenue do rick and tom take in during a full week, combined."

Please tell me how ChatGPT4 writing a script to solve that is not logical reasoning, while a human pulling out pen and paper to do it is...
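
For reference, a script for that prompt might look roughly like the sketch below. This is not GPT4's actual output. It assumes Tom's pattern is two more muffins each day (2, 4, ... 14), reads "tom and jerry" as Tom and Rick, and assumes every muffin sold on a discount day goes for $2.50.

```python
# Assumptions (my reading of the prompt, not stated outright in it):
# - Tom bakes 2, 4, 6, ... 14 muffins, Monday through Sunday.
# - "tom and jerry" means Tom and Rick.
# - On a discount day, all muffins baked that day sell at the lower price.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
RICK_PER_DAY = 30
FULL_PRICE = 3.00
DISCOUNT_PRICE = 2.50
THRESHOLD = 41  # combined muffins above this trigger the discount


def weekly_revenue():
    total = 0.0
    for i, _day in enumerate(DAYS):
        tom = 2 * (i + 1)  # 2 on Monday up to 14 on Sunday
        combined = RICK_PER_DAY + tom
        price = DISCOUNT_PRICE if combined > THRESHOLD else FULL_PRICE
        total += combined * price
    return total


print(weekly_revenue())  # 755.0
```

Under those assumptions only Saturday (42) and Sunday (44) cross the threshold, which gives 180 muffins at $3 plus 86 at $2.50.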

> Please tell me how ChatGPT4 writing a script to solve that is not logical reasoning, while a human pulling out pen and paper to do it is...

I changed the prompt a bit (made all the numbers 3-4 digits) and gpt-4 made up numbers for the days I didn't give numbers for, so it failed before it even got to the arithmetic. What I said about Tom was "Tom bakes 2911 muffins monday, 491 tuesday, 699 wednsdays, up to 149 on sunday." It just assumed Sunday's number applied to all the other weekdays not given, which a human wouldn't do, and it missed the "up to" statement. Maybe the large numbers I gave threw it off, but if that is enough to throw it off, that just shows that it can't really reason.

So thanks for that, more evidence these models are bad at reasoning.

Here is the first part of what it responded with, it is wrong already here:

   First, let's calculate the number of muffins baked by Tom during the week:

   Monday: 2911
   Tuesday: 491
   Wednesday: 699
   Thursday: 149
   Friday: 149
   Saturday: 149
   Sunday: 149
Edit: It also made an arithmetic error just below: 4062 is not greater than 4199. So that is two critical errors. I taught math at college for years and you wouldn't find many students making mistakes like this:

   Let's determine the days when Tom and Rick combined bake more than 4199 muffins:

   Monday: 2911 (Tom) + 3571 (Rick) = 6482
   Tuesday: 491 (Tom) + 3571 (Rick) = 4062
   Wednesday: 699 (Tom) + 3571 (Rick) = 4270

   On Monday, Tuesday, and Wednesday, they bake more than 4199 muffins combined, so the price of the muffins drops to $2851.50 on those days.
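
The sums and the comparison in that output are easy to check. Here is a small Python sketch using only the numbers quoted above (the variable names are mine):

```python
# Numbers copied from the quoted model output.
THRESHOLD = 4199
RICK = 3571
tom = {"Monday": 2911, "Tuesday": 491, "Wednesday": 699}

# Combined output per day, then the days that actually exceed the threshold.
totals = {day: n + RICK for day, n in tom.items()}
over = [day for day, total in totals.items() if total > THRESHOLD]

print(totals)  # {'Monday': 6482, 'Tuesday': 4062, 'Wednesday': 4270}
print(over)    # ['Monday', 'Wednesday']
```

The additions are right, but Tuesday's 4062 does not exceed 4199, so the claim about "Monday, Tuesday, and Wednesday" is wrong for Tuesday.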
Just so we have this straight, you completely changed the nature of the problem (by turning a perfect information problem into an imperfect information problem) and then are looking at me with a straight face to make your point? Please...

Unless of course you didn't realize that Tom has a pattern to his baking, at which point the irony becomes palpable.

And on top of that, I am willing to bet if you give me your prompt, I would be able to restructure it in such a way that GPT4 would be able to answer it correctly. More often than not, people are just really bad at properly asking it questions.

> That is an LLM's "pen and paper".

No, that's an LLM's Python playground.

An LLM's "pen and paper" is "think step by step" where it gets to see its own output to keep track of what it is doing.

I'd expect that with appropriate prompting one could get a good model to one/few-shot learn how to do addition this way.
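
As a rough idea of what such a worked example could look like, here is a Python sketch that writes out column-by-column addition with carries as plain text. The trace format is purely hypothetical, not something any model is known to have been trained or prompted with, and it only covers non-negative integers.

```python
def addition_scratchpad(a, b):
    """Return the written-out working for a + b, one line per column.

    Hypothetical trace format for illustration; a and b must be >= 0.
    """
    xs, ys = str(a)[::-1], str(b)[::-1]  # least significant digit first
    lines = []
    digits = []
    carry = 0
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        s = x + y + carry
        lines.append(
            f"column {i}: {x} + {y} + carry {carry} = {s}, "
            f"write {s % 10}, carry {s // 10}"
        )
        digits.append(str(s % 10))
        carry = s // 10
    if carry:
        digits.append(str(carry))  # leftover carry becomes the top digit
    result = int("".join(reversed(digits)))
    lines.append(f"answer: {result}")
    return lines


for line in addition_scratchpad(2911, 3571):
    print(line)
```

Each line depends only on the previous line's carry, which is the kind of state a model could track by reading back its own output.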