	by easeout 535 days ago
	My guess at the upshot: Some domains, like math, are general but have outsized effective vocabularies like all possible numbers, which makes them more expensive to train by the same method that works for domains of regular-sized vocabularies. If you train for reasoning steps in such a problem domain, you can reinforce the comparatively few general terms of the vocabulary like "add", "inverse", "solve". And that leaves the arithmetic of number combinations separate from particular problems because you're not emphasizing one-shot answers. You can train N reasoning cases + M arithmetic cases instead of N*M whole math problems. So you have to use more inference power but you can get better answers for less training. Theory aside, I would think a good application-side method is to use this general reasoning process to structure a final expression and then pass that through a traditional evaluator. Then the reasoning and training thereof need only go as far as symbol manipulation. This is something like Wolfram Alpha, if its NLP handed off to the evaluator much later in the process.
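A minimal sketch of the "hand the final expression to a traditional evaluator" idea, assuming the model's last step is a plain arithmetic string. The names `evaluate` and `_eval_node` and the operator whitelist are illustrative assumptions, not part of any existing system:

```python
import ast
import operator

# Hypothetical whitelist: only these operators are accepted.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _eval_node(node):
    """Recursively evaluate a parsed arithmetic expression."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval_node(node.operand))
    # Names, calls, attribute access, etc. are rejected outright.
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def evaluate(expression: str):
    """Evaluate the expression string the model produced, deterministically."""
    return _eval_node(ast.parse(expression, mode="eval").body)


# The model only has to get the symbolic structure right;
# the number crunching is exact integer arithmetic.
print(evaluate("(1234 + 5678) * 9"))
```

The model is responsible only for producing the string; anything outside the whitelist raises `ValueError` instead of being executed.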

sega_sai 535 days ago

A connected question: has there been an LLM that is a perfect calculator? I.e., you give it an expression involving standard operations (+/-) and, say, integer numbers, and it should always return a correct result. I don't remember seeing any papers on this (but I'm not an expert).

jkhdigital 535 days ago

Why would you ever want an LLM that is a perfect calculator? Humans invented calculators for a reason. A good LLM should respond to arithmetic questions by executing a cheap and efficient calculator program instead of wasting cycles on it.

ciphix 534 days ago

While your engineering perspective emphasizes efficiency, it's worth noting that, akin to the human brain, we aim to develop powerful LLMs capable of performing complex cognitive tasks. Although they may operate more slowly, these models can, for instance, reason through intricate problems without external tools, much like Einstein conceptualized relativity through thought experiments or Andrew Wiles proved Fermat's Last Theorem through deep mathematical insight.

esafak 534 days ago

Solving FLT is not like using a calculator. You don't use the same skills. It is not mechanical.

levn11 533 days ago

i mean https://www.techrxiv.org/users/717330/articles/702287-on-fer......

sega_sai 534 days ago

It is a question of capabilities. People use LLMs to prove theorems. It is therefore a relevant question whether LLMs can work as generic calculators. And if they can't it shows IMO something is missing.

daxfohl 534 days ago

It depends what you mean by LLM, perfect, etc. You can train up a neural net pretty quickly to do basic addition perfectly. It just needs two inputs for the digits, plus one bit for carryover, and an output 0-19 (if base 10). Your code would do the iteration on digits. So once your NN is trained to map inputs to sums exactly, you've got your algorithm, and it's provably correct.
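A rough sketch of that scheme, with a lookup table standing in for the trained net (an assumption made to keep the example self-contained; `DIGIT_SUM` and `add_decimal` are made-up names). There are only 10 * 10 * 2 = 200 possible inputs, so the mapping can be checked exhaustively, which is where "provably correct" comes from:

```python
from itertools import product

# Stand-in for the trained network: (digit, digit, carry_in) -> sum in 0..19.
DIGIT_SUM = {
    (a, b, c): a + b + c
    for a, b, c in product(range(10), range(10), range(2))
}

# Exhaustive check over all 200 inputs: every output lies in 0..19.
assert len(DIGIT_SUM) == 200
assert all(0 <= total <= 19 for total in DIGIT_SUM.values())


def add_decimal(x: str, y: str) -> str:
    """Add two non-negative decimal strings digit by digit."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)
    carry = 0
    digits = []
    # The hand-written loop: least significant digit first.
    for dx, dy in zip(reversed(x), reversed(y)):
        total = DIGIT_SUM[(int(dx), int(dy), carry)]
        digits.append(str(total % 10))  # output digit
        carry = total // 10             # carry bit for the next position
    if carry:
        digits.append("1")
    return "".join(reversed(digits))


print(add_decimal("999", "1"))
```

Swapping the table for a small network trained on the same 200 cases would leave the loop unchanged.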

"That's cheating. You have custom code in the loop." But that's what an LLM does too: custom code feeds the input tokens through the model and feeds each output token back in, one by one. So.

Now, as far as a realistic LLM goes, no, there's no way to prove that it will always get even 1+1=2 correct. There's always a chance that something in the context will throw it off. Generally LLMs are better at interpreting questions, finding some code that maps to the answer, executing that code, and spitting out the answer. As a case in point, try asking one to solve a sudoku. It will grab some code off GitHub, run it, and give you the answer. Now ask it to solve it by pure reasoning step-by-step. It'll get hopelessly lost, tell you numbers are in the wrong places, tell you that eliminating 7 from {2,7} leaves only {3,8}, etc. (And then finally give you the correct answer, now _that's_ cheating!)

So, if not LLMs, and not handwritten loops, the only other option is single-shot. Can a NN be trained to do math in a single run? And the answer is not really. At least, not efficiently. If you think about it, a single run through a NN only has a limited number of steps. So it's going to be limited in what it can do. If your computation requires more steps than that, all your NN can do is guess.

So no, there's really no perfect "pure" AI for math. AI tools for math are generally a combination of NNs that make guesses and hand-written code that checks or uses those guesses to generate some feedback and ask for next steps. Which isn't too different from how humans do it either. Make a guess, try it out, look up references, look for tools, create a tool or modify an existing one, and so on until you get it right.

emporas 534 days ago

Then you need a Large Arithmetic Model (LAM). We have that, it's called a calculator.

The LLM could invoke several command-line programs, including calculators or anything else for which a deterministic answer is desirable. Take structured outputs, for example: people usually mean JSON output, but any schema like XML or HTML could be enforced by command-line tools, and when validation fails, the LLM should double-check its own output and hopefully fix it.
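A sketch of that validate-and-retry loop for the JSON case. Here `generate` is a hypothetical callable standing in for whatever model API is in use, and `required_keys` is a deliberately simple stand-in for a real schema check:

```python
import json
from typing import Callable


def generate_valid_json(
    generate: Callable[[str], str],  # hypothetical: prompt in, raw text out
    prompt: str,
    required_keys: set,
    max_attempts: int = 3,
) -> dict:
    """Ask for JSON, validate it deterministically, and retry with the error."""
    message = prompt
    for _ in range(max_attempts):
        raw = generate(message)
        try:
            data = json.loads(raw)
            if not isinstance(data, dict):
                raise ValueError("top-level value must be a JSON object")
            missing = required_keys - set(data)
            if missing:
                raise ValueError(f"missing keys: {sorted(missing)}")
            return data
        except ValueError as err:  # json.JSONDecodeError is a ValueError
            # Feed the validator's complaint back so the model can fix its output.
            message = (
                f"{prompt}\n\nYour previous output was rejected: {err}\n"
                f"Previous output: {raw}"
            )
    raise RuntimeError(f"no valid output after {max_attempts} attempts")
```

The validator is ordinary deterministic code; only the repair step goes back through the model.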

sebzim4500 534 days ago

>And if they can't it shows IMO something is missing

I don't think this follows, since they are trying to replace humans, who are also not perfect at arithmetic.

Scene_Cast2 535 days ago

Standard neural nets (created through regular training methods) have no guarantees about their output. So no, there hasn't been anything like that.

I do recall someone handcrafting the weights for a transformer and getting some sort of useful algorithm or computation going, so there's that.

scotty79 534 days ago

Conversely, is there an LLM that is given a calculator and taught how to use it, so it doesn't need to waste neurons on doing simple arithmetic that neurons actually suck at?

Or even better, a simple programmable calculator and/or symbolic calculator.

regularfry 534 days ago

Anything that's got access to a Python interpreter would qualify.
