by nl 1222 days ago
>No, it's paraphrasing its training data that likely contains these tasks in one form or another.

Have you read "Emergent Abilities of Large Language Models"[1], or at least the related blog post[2]?

It provides strong evidence that this isn't as simple as reproducing something the model has seen in its training data. Instead, as the parameter count increases, the model learns to generalize from that data, for example by learning chain-of-thought reasoning.

Specifically, this explanation for multi-step reasoning goes well beyond "it is just parroting training data":

> For instance, if a multi-step reasoning task requires l steps of sequential computation, this might require a model with a depth of at least O(l) layers.

[1] https://openreview.net/forum?id=yzkSU5zdwD

[2] https://ai.googleblog.com/2022/11/characterizing-emergent-ph...