HN Mirror



	by emp17344 85 days ago
	I think this is still useful research that calls into question how “smart” these models are. If the model needs a separate tool to solve a problem, has the model really solved the problem, or just outsourced it to a harness that it’s been trained - via reinforcement learning - to call upon?

2 comments

dghlsakjg 85 days ago

Does it matter if the LLM can solve the problem or if it knows to use a resource?

There’s plenty of math that I couldn’t even begin to solve without a calculator or other tool. Doesn’t mean I’m not solving math problems.

In woodworking, the advice is to let the tool do the work. Does someone using a power saw have less claim to having built something than a handsaw user? Does a CNC user not count as a woodworker because the machine is doing the part that would be hard or impossible for a human?


grey-area 84 days ago

It does matter because the LLM doesn’t always know when to use tools (e.g. ask it for sales projections which are similar to something in its weights) and is unable to reason about the boundaries of its knowledge.


hooverd 81 days ago

Is your issue with math in this example the tediousness of the operations or a conceptual lack of understanding of how to solve them?


azakai 85 days ago

It has "outsourced" it to another component, sure, but does that matter?

What the user sees is the total behavior of the entire system, not whether the system has internal divisions and separations.


emp17344 85 days ago

It matters if you’re curious about whether AGI is possible. Have we really built “thinking machines”, or are these systems just elaborate harnesses that leverage the non-deterministic nature of LLMs?


azakai 85 days ago

An "elaborate harness" that can break down a problem into sub-tasks, write Python scripts for the ones it can't solve itself, and then combine the results, seems able to solve a wide range of cognitive tasks?

At least in theory.


TeMPOraL 85 days ago

What's the difference? If the "elaborate harness" consists of a mix of "classical" code and ML model invocations, at what point is it disqualified from consideration as a "thinking machine"? As best we can tell, even our brains have parts that are "dumb", interfacing with the parts that we consider "where the magic happens".
