| HN Mirror



	by soarerz 766 days ago
	The model's first attempt is impressive (not sure why it's labeled a choke). Unfortunately, GPT-4o cannot discover calculus on its own.

7 comments

munk-a 765 days ago

I think this is the biggest flaw in LLMs and what is likely going to sour a lot of businesses on their usage (at least in their current state). It is preferable to give the right answer to a query, and it is acceptable to be unable to answer one; we run into real issues, though, when a query is confidently answered incorrectly. This recently caused a major headache for Air Canada. Businesses should be held to the statements they make, even if those statements were made by an AI or call center employee.

astrange 765 days ago

The Air Canada incident happened before ChatGPT was released so I haven't seen a reason to believe AI was involved.

munk-a 764 days ago

I can't tell if you're being sarcastic or not - but AI predates ChatGPT.

astrange 764 days ago

Chatbot-style AI didn't, and certainly not one major airlines would be using for customer service.

Chinjut 766 days ago

It's a choke because it failed to get the answer. Saying other true things but not getting the answer is not a success.

bombadilo 765 days ago

I mean, in this context I agree. But most people doing math in high school or university are graded on their working of a problem, with the final result usually equating to a small proportion of the total marks received.

giaour 765 days ago

This depends on the grader and the context. Outside of an academic setting, sometimes being close to the right answer is better than nothing, and sometimes it is much worse. You can expect a human to understand which contexts require absolute precision and which do not, but that seems like a stretch for an LLM.

phatfish 765 days ago

LLMs being confidently incorrect until they are challenged is a bad trait. At least they have a system prompt to tell them to be polite about it.

Most people learn to avoid the person who is wrong, has bad judgment, and is arrogant about it.

ifwinterco 765 days ago

I think current LLMs suffer from something similar to the Dunning-Kruger effect when it comes to reasoning - in order to judge correctly that you don't understand something, you first need to understand it at least a bit.

Not only do LLMs not know some things, they don't know that they don't know, because of a lack of true reasoning ability, so they inevitably end up like Peter Zeihan, confidently spouting nonsense.

perfobotto 765 days ago

This is supposed to be a product, not a research artifact.

chongli 765 days ago

> But most people doing math in high school or university are graded on their working of a problem, with the final result usually equating to a small proportion of the total marks received.

That heavily depends on the individual grader/instructor. A good grader will take into account the amount of progress toward the solution. Restating trivial facts of the problem (in slightly different ways) or pursuing an invalid solution to a dead end should not be awarded any marks.

slushy-chivalry 765 days ago

it choked because it didn't solve for `t` at the end

impressive attempt though, it used number of wraps which I found quite clever

photochemsyn 765 days ago

I don't know... here's a prompt query for a standard problem in introductory integral calculus, and it seems to go pretty smoothly from a discrete arithmetical series into the continuous integral:

"Consider the following word problem: 'A 100 meter long chain is hanging off the end of a cliff. It weighs one metric ton. How much physical work is required to pull the chain to the top of the cliff if we discretize the problem such that one meter is pulled up at a time?' Note that the remaining chain gets lighter after each lifting step. Find the equation that describes this discrete problem and from that, generate the continuous expression and provide the LaTeX code for it."
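
For reference, here is a minimal sketch of what that prompt is asking for. It is not the model's output. It assumes g = 9.81 m/s^2 and that each lift raises the full length still hanging at the start of the step (a left Riemann sum); the function names are made up for illustration.

```python
G = 9.81  # m/s^2, assumed value for gravitational acceleration


def discrete_work(length_m=100.0, mass_kg=1000.0, steps=100):
    """Work in joules when the chain is pulled up in equal discrete lifts.

    Assumption: during each lift, the whole length that was hanging at the
    start of that lift is raised by one step.
    """
    density = mass_kg / length_m      # kg per meter of chain
    step = length_m / steps           # meters raised per lift
    total = 0.0
    for k in range(steps):
        hanging = length_m - k * step  # meters still hanging before lift k
        total += density * hanging * G * step
    return total


def continuous_work(length_m=100.0, mass_kg=1000.0):
    """Continuous limit: W = integral_0^L (m/L) * g * (L - y) dy = m*g*L/2."""
    return mass_kg * G * length_m / 2.0


if __name__ == "__main__":
    print(discrete_work())        # one-meter lifts
    print(discrete_work(steps=100_000))  # approaches the integral
    print(continuous_work())
```

Under these assumptions the one-meter version comes out about 1% above the integral (roughly 495 kJ versus 490.5 kJ), and the gap shrinks as the step size goes to zero.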

usaar333 765 days ago

Or.. use calculus?

It has gotten quite impressive at handling calculus word problems. GPT-4 (original) failed miserably on this problem (it attempted to set it up using constant acceleration equations); GPT-4o finally gets it correct:

> I am driving a car at 65 miles per hour and release the gas pedal. The only force my car is now experiencing is air resistance, which in this problem can be assumed to be linearly proportional to my velocity.

> When my car has decelerated to 55 miles per hour, I have traveled 300 feet since I released the gas pedal.

> How much further will I travel until my car is moving at only 30 miles per hour?
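
A sketch of why the calculus setup makes this short, using only the problem's stated assumption that drag is proportional to velocity: m dv/dt = -k v, and since dv/dt = v dv/dx, we get dv/dx = -k/m, a constant. Speed therefore falls linearly with distance, and the units cancel. The helper name below is hypothetical.

```python
def further_distance(v_start, v_mid, dist_start_to_mid, v_end):
    """Extra distance to slow from v_mid to v_end under linear drag.

    With drag proportional to velocity, dv/dx is constant, so the distance
    covered is proportional to the speed lost. Speeds can be in any unit as
    long as all three use the same one; the result is in the unit of
    dist_start_to_mid.
    """
    return dist_start_to_mid * (v_mid - v_end) / (v_start - v_mid)


if __name__ == "__main__":
    # 65 -> 55 mph takes 300 ft, so each mph lost costs 30 ft.
    print(further_distance(65, 55, 300, 30))  # 750.0 (feet)
```

A constant-acceleration setup would instead make the speed lost proportional to time, which is why it gives the wrong distance here.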

xienze 765 days ago

Does it get the answer right every single time you ask the question the same way? If not, who cares how it’s coming to an answer? It’s not consistently correct and therefore not dependable. That’s what the article was exploring.
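
One rough way to put a number on that, as a sketch only: ask the same prompt repeatedly and record how often the answer matches. Here `ask` is a hypothetical stand-in for whatever calls the model, not a real API, and exact string matching is an assumption that would need loosening for free-form answers.

```python
from itertools import cycle


def consistency_rate(ask, prompt, expected, trials=20):
    """Fraction of repeated runs whose answer equals `expected`.

    `ask` is any callable taking the prompt and returning an answer string
    (a placeholder for a real model call).
    """
    hits = sum(1 for _ in range(trials) if ask(prompt) == expected)
    return hits / trials


if __name__ == "__main__":
    # Fake model that alternates between a right and a wrong answer.
    answers = cycle(["750 ft", "825 ft"])
    rate = consistency_rate(lambda p: next(answers), "car problem", "750 ft", trials=10)
    print(rate)  # 0.5
```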

sabrina_ramonov 765 days ago

I labeled it choke because it just stopped.

HDThoreaun 766 days ago

Right, it's the only answer that accounts for the wasted space there might be between wraps.

fmbb 765 days ago

Can it be taught calculus?