Hacker News
by hmage 1069 days ago
What you're seeing is Kagi making the model dumber to make it respond quicker (hence the _Fast_ in the name).

I don't know exactly how these models work, but they managed to make Claude instant dumber than it is by default (see below).

Here's your question asked verbatim to all the models I have access to:

Claude instant:

  Step by step working is as below:
  * 20 + 20 = 40
  * 40 + 20 = 60
  * 60 + 21 = 81

  Hence, 20 + 20 + 20 + 21 = 81
Claude v1:

  20 + 20 + 20 + 21 = 81
Claude v2:

  Okay, let's solve this step-by-step: 20 + 20 + 20 + 21 = 20 + 20 + 20 + 21 = 60 + 21= 81. Therefore, 20 + 20 + 20 + 21 = 81.
mpt-30b-chat:

  The sum of 20 + 20 + 20 + 21 is 81.
gpt-3.5-turbo:

  20 plus 20 plus 20 plus 21 equals 81.
gpt-4:

  81

I think I've figured out how to make Claude instant dumber. I've modified your prompt so that it answers only with a number (and avoids doing the summation in writing).

I've set the temperature to its lowest value to avoid random variation in the answers.
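For anyone wondering why the lowest temperature removes the randomness, here is a minimal sketch of temperature sampling. The function name, the candidate tokens, and the scores are all made up for illustration; real APIs apply this to the model's full vocabulary and may handle a temperature of zero differently.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Pick an index from `logits` after scaling the scores by `temperature`."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: always take the top score.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical candidate tokens and scores (made-up numbers, not from any model).
candidates = ["a", "b", "c"]
logits = [2.0, 1.5, 0.5]

rng = random.Random(0)
greedy = {candidates[sample_with_temperature(logits, 0.0, rng)] for _ in range(20)}
warm = [candidates[sample_with_temperature(logits, 1.0, rng)] for _ in range(20)]
print(greedy)  # a single candidate: the top-scoring one
print(warm)    # candidates drawn in proportion to their softmax weights
```

Note that a low temperature only makes the output repeatable. It picks whatever the model scores highest, right or wrong.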

Prompt:

  How much is 20 plus 20 plus 20 plus 21? Answer only with a number.
Claude instant:

  102
Claude v1:

  81
Claude v2:

  81
mpt-30b-chat:

  61
gpt-3.5-turbo:

  81
gpt-4:

  81
Here we see that smaller models like mpt (30b) and Claude instant (52b) can't do math "inside their head" and need the aid of doing the calculations step by step.

I guess that's why all the models default to doing step-by-step when they see a math problem.
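To make the comparison easier to scan, here is a small tally of the number-only answers reported above against the correct sum. The dictionary keys are just labels for the models listed in this comment; the answers are copied from the outputs above.

```python
CORRECT = 20 + 20 + 20 + 21  # 81

# Answers reported above for the "Answer only with a number" prompt.
number_only = {
    "claude-instant": 102,
    "claude-v1": 81,
    "claude-v2": 81,
    "mpt-30b-chat": 61,
    "gpt-3.5-turbo": 81,
    "gpt-4": 81,
}

# Keep only the models whose answer differs from the correct sum.
wrong = {name: ans for name, ans in number_only.items() if ans != CORRECT}
for name, ans in wrong.items():
    print(f"{name}: answered {ans}, off by {ans - CORRECT:+d}")
```

The two misses are off by +21 and -20, each the size of one operand, though a single run per model is too little to read much into.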

So, a crazy thought I had: as far as I understand, these models can only do a fixed amount of work per token of output. So asking a model to show its work has two benefits: it lets the model reference results it has already worked out, and it plainly gives it more computation to spend.

So I'm curious what would happen if you prompted it to stall for time a bit with an answer like "hmm.... err... let's see.. what about 81?"
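The "fixed amount of work per token" idea can be put in rough numbers. This is a back-of-the-envelope sketch using the common approximation of about 2 FLOPs per parameter per generated token, which ignores attention cost. The token counts are assumptions picked for illustration, and `forward_flops` is a hypothetical helper, not part of any library.

```python
def forward_flops(n_params, n_output_tokens):
    """Rough forward-pass cost: about 2 FLOPs per parameter per generated token.

    This ignores the attention cost, which grows with context length.
    """
    return 2 * n_params * n_output_tokens

N_PARAMS = 30e9           # mpt-30b-chat, going by its name
ANSWER_ONLY_TOKENS = 2    # assumption: a bare "81" is a couple of tokens
STEP_BY_STEP_TOKENS = 40  # assumption: a short worked solution

short = forward_flops(N_PARAMS, ANSWER_ONLY_TOKENS)
long = forward_flops(N_PARAMS, STEP_BY_STEP_TOKENS)
print(f"answer only:  {short:.1e} FLOPs")
print(f"step by step: {long:.1e} FLOPs ({long / short:.0f}x)")
```

This only covers the "more computation" half of the argument. It says nothing about whether the extra tokens carry anything useful.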

As far as I understand, their attention mechanism is tuned to relevance, so theoretically "hmm.... err... let's see.. what about" will amount to nothing.

Lemme check...

Prompt:

  How much is 20 plus 20 plus 20 plus 21? Answer only with a number prepended with `hmm.... err... let's see.. what about`
claude-instant:

  hmm.... err... let's see.. what about 101
mpt-30b-chat:

  Hmm.... err... let's see.. what about 70?
Other models gave correct answers as before.

So yeah, the filler didn't help, which fits the idea that the attention mechanism ignores the musing tokens. The model needs more task-relevant tokens (doing the math) to improve the result.

Doing the math step by step fills the context with task-relevant tokens, which makes it more likely that the attention mechanism picks them up and that the next token is predicted correctly.

The inference cycle generates each token separately. Once the model has written "20+20=", it's easier to predict that the next token is 40. On the next iteration of the cycle, after writing 40, the attention mechanism sees "step by step", infers that the task isn't done yet, and generates "40+20=", and so on.
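Here is a toy sketch of that loop. It is not a language model: `next_step` is a made-up stand-in that is only allowed one addition per call and sees nothing but the text written so far, which is the point being illustrated.

```python
import re

def next_step(lines):
    """Toy decoding step: read the text so far, do ONE addition, return one line.

    Returns None when every number in the question has been used.
    """
    numbers = [int(n) for n in re.findall(r"\d+", lines[0])]
    steps = lines[1:]
    if len(steps) + 1 >= len(numbers):
        return None  # nothing left to add, so stop generating
    # The running total is read back from the previous written step.
    running = int(steps[-1].split("=")[1]) if steps else numbers[0]
    operand = numbers[len(steps) + 1]
    return f"{running} + {operand} = {running + operand}"

def generate(question):
    """Append one step at a time, feeding the growing text back in."""
    lines = [question]
    while (step := next_step(lines)) is not None:
        lines.append(step)
    return lines

for line in generate("20 + 20 + 20 + 21"):
    print(line)
```

Each call has an easy job because the previous partial sum is already sitting in the context. Without the written steps, the same single-addition budget couldn't reach the total.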

In much larger models, the attention mechanism sees the question and presumably finds a solved answer to that question in the model's latent space, producing a memorized result.