Hacker News
by Push_to_master 1105 days ago
YMMV, but I just asked both the same question. GPT-4 calculated 9.64 laps, mentioned that you cannot complete a fraction of a lap, rounded down, and then calculated 24.5L.

Bard mentioned something similar but oddly rounded up to 10.5 laps and added a 10% safety margin for 30.8L.

In this case Bard would finish the race and GPT-4 would hit fuel exhaustion. That's kind of the big issue with LLMs in general: inconsistency.

I think GPT-4 is better overall, but this shows that both make mistakes and both can be right.

1 comments

The answer cannot be consistent because the question is underspecified. Ask humans and you will not get the same answer.

(Though in this case it sounds like Bard just did crazy maths.)

If the person doing the calculation knows how timed races work, the math is very straightforward. In this one GPT-4 did not seem to understand how racing works in that context, whereas Bard understood and also applied a safety margin.

Although "understand" is an odd word to use for an LLM.
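
For anyone who wants to see why the rounding direction matters, here is a rough sketch of the timed-race arithmetic discussed above. The race length, lap time, and fuel burn are made-up inputs picked so the raw lap count lands near 9.64; they are not the numbers from the original prompt, and `fuel_for_timed_race` is a hypothetical helper. It also assumes the common format where you have to finish the lap you are on when the clock runs out, which is why rounding up is the safe choice.

```python
import math

def fuel_for_timed_race(race_minutes, lap_seconds, litres_per_lap,
                        margin=0.0, round_up=True):
    """Return (laps, litres) to plan for in a timed race.

    All inputs are hypothetical illustration values, not the ones
    from the original question.
    """
    # Raw (fractional) number of laps that fit in the race duration.
    raw_laps = race_minutes * 60 / lap_seconds
    # Assumption: the lap in progress when time expires must be completed,
    # so the safe plan rounds up. Rounding down leaves you a lap short.
    laps = math.ceil(raw_laps) if round_up else math.floor(raw_laps)
    litres = laps * litres_per_lap * (1 + margin)
    return laps, litres

# Hypothetical inputs: 40 minute race, 249 s laps, 2.8 L per lap.
short_laps, short_fuel = fuel_for_timed_race(40, 249, 2.8, round_up=False)
safe_laps, safe_fuel = fuel_for_timed_race(40, 249, 2.8, margin=0.10)

print(f"rounded down: {short_laps} laps, {short_fuel:.1f} L")
print(f"rounded up + 10% margin: {safe_laps} laps, {safe_fuel:.1f} L")
```

With these inputs the rounded-down plan covers one lap fewer than the race will actually need, which is the fuel-exhaustion case described in the post.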