by Jerrrry 783 days ago
>complex analysis, longer tasks with multiple steps, and higher-order math and coding tasks.

>counting

Pick one.

This is very similar to the "precision" misconception regarding floating point numbers.

The answer isn't wrong, it's just imprecise.
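To illustrate the floating point comparison, here is a minimal Python sketch. It relies only on standard IEEE 754 double behaviour and the standard library; the variable name `total` is just for illustration.

```python
import math

# 0.1 and 0.2 have no exact binary representation, so their sum is
# very close to 0.3 but not bit-for-bit equal to it.
total = 0.1 + 0.2

print(total == 0.3)              # False: exact equality fails
print(math.isclose(total, 0.3))  # True: the answer is imprecise, not wrong
```

The result is off by a tiny rounding error, which is the sense of "imprecise" meant above.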

Hallucinations are a misnomer.

You are trying to get exact integer<>word accuracy from an architecture that is innately probabilistic, and the two clash at the atomic level. Words get tokenized, so arithmetic is difficult at a microscale - the carry bit likely won't make it into the (needed transformer) context, since most numbers don't overflow when summed.

It can, however, output a small program - with high confidence - that it can self-evaluate for functional proximity, then use that to help arrive at an answer.
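As a rough idea of what such a small program could look like, here is a sketch in Python. The function name `count_word` and the sample sentence are hypothetical, made up for this example; this is not anything a particular model is known to emit. The point is only that the count comes from deterministic code rather than from inference over tokens.

```python
import re

def count_word(text: str, word: str) -> int:
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    # \b anchors keep "cat" from matching inside "cats" or "scattered".
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

sample = "The cat sat. The other cat ran; cats scattered."
print(count_word(sample, "cat"))  # 2
```

Running the program gives an exact integer, so the model only has to get the program right, not the tally.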

This is a proto-Mixture of Experts model, achieved with another hypervisor or guard-dog LLM.


Why should I? If a person told you that they can multiply, divide, add and subtract, would you not also assume that they can at least count?

The point here is: the justifications from AI engineers for why counting and math aren't the same task, while valid, are irrelevant because marketing never brings up the limitation in the first place. So any logical person who doesn't know a lot about AI will arrive at a logical, albeit practically incorrect, conclusion.

>If a person told you that they can multiply, divide, add and subtract, would you not also assume that they can at least count?

But that's not what they said, to be fair. They said it can do complex math - not simple math repeated many times in a single inference.

The architecture just clashes against the intent too much to arrive at a useful/acceptable answer.

Had you crafted a larger prompt that recursively divides the context into n separate buckets, then sums their counts pairwise (like an inverted binary tree), you'd likely have better luck with the carry bits tallying correctly.
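The bucket-and-sum idea can be sketched in plain Python. Everything here is an assumption made for illustration: the names `tree_sum` and `bucketed_count` are hypothetical, and `bucket.count(target)` stands in for what would be one small model call per bucket in the prompt-based version.

```python
def tree_sum(counts: list[int]) -> int:
    """Add per-bucket counts pairwise, level by level, like a binary tree folded from the leaves up."""
    if not counts:
        return 0
    level = list(counts)
    while len(level) > 1:
        # Pair up neighbours; each addition involves only two small numbers.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd one out is carried up unchanged
        level = nxt
    return level[0]

def bucketed_count(items: list[str], target: str, n_buckets: int) -> int:
    """Split `items` into at most `n_buckets` chunks, count per chunk, then tree-sum."""
    size = max(1, -(-len(items) // n_buckets))  # ceiling division
    buckets = [items[i:i + size] for i in range(0, len(items), size)]
    partial = [bucket.count(target) for bucket in buckets]  # stand-in for per-bucket inference
    return tree_sum(partial)

words = "a b a c a b a".split()
print(bucketed_count(words, "a", 3))  # 4
```

Each step only ever counts a short chunk or adds two numbers, which is the kind of small task the architecture is more likely to get right.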

Fair, valid point. I do admit that this is far from a perfect analysis. I do hope, though, that it helps people at least classify their problems into categories where they need to design around the flaw rather than just assuming that the thing “just works”. I appreciate the discussion though!