| HN Mirror

by abiro 2127 days ago

OpenAI would naturally optimize for the tests published by Marcus as a critique of GPT-2, yet GPT-3 still fails physical reasoning spectacularly (the one test needing causal reasoning the most).

There are two broader points here:

1. The lack of independently verifiable evaluation metrics for these types of models should make everyone very skeptical. (Who can afford to retrain GPT-3 from scratch?)

2. I find it difficult to believe that smart people still insist that a model incapable of representing causal relationships can produce intelligent answers.

2 comments

moyix 2127 days ago

> OpenAI would naturally optimize for the tests published by Marcus as a critique of GPT-2

It would be difficult for them to do so, since Marcus's GPT-2 critique came out after they collected the dataset for GPT-3.

Marcus's article: Jan 2020

GPT-3 dataset: "Table 2.2 shows the final mixture of datasets that we used in training. The CommonCrawl data was downloaded from 41 shards of monthly CommonCrawl covering 2016 to 2019"

SpicyLemonZest 2127 days ago

(1) I certainly agree with. But Marcus doesn't claim skepticism about GPT-3's intelligence; he claims that his evaluation metrics definitively show it doesn't understand the text it outputs or know anything about the world.

(2) is, I think, a misunderstanding. People who believe GPT-3 is producing intelligent answers generally believe it can represent causal relationships.

abiro 2127 days ago

Fair points. For the record, re 2:

The GPT family of models (and all neural networks for that matter) can estimate P(X | Y), but have no way of computing whether X -> Y or X <- Y.
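A toy sketch of the point about P(X | Y) versus causal direction, with made-up numbers (the probabilities and helper names below are hypothetical and have nothing to do with GPT-3 itself): two models with opposite arrows can produce exactly the same joint distribution, so conditional probabilities alone cannot tell them apart, yet they disagree once you intervene on X.

```python
from fractions import Fraction as F

def bern(p, value):
    """Probability that a Bernoulli(p) variable equals `value` (0 or 1)."""
    return p if value == 1 else 1 - p

def joint_from_chain(p_first, p_second_given_first):
    """Joint over (first, second): `first` is sampled, then `second` depends on it."""
    return {
        (a, b): bern(p_first, a) * bern(p_second_given_first[a], b)
        for a in (0, 1) for b in (0, 1)
    }

# Hypothetical model A: X -> Y (assumed numbers, for illustration only).
P_X = F(3, 10)
P_Y_GIVEN_X = {0: F(1, 5), 1: F(9, 10)}
joint_a = joint_from_chain(P_X, P_Y_GIVEN_X)  # keys are (x, y)

# Hypothetical model B: Y -> X, with parameters derived from A via Bayes' rule.
P_Y = joint_a[(0, 1)] + joint_a[(1, 1)]
P_X_GIVEN_Y = {
    y: joint_a[(1, y)] / (joint_a[(0, y)] + joint_a[(1, y)]) for y in (0, 1)
}
joint_b_yx = joint_from_chain(P_Y, P_X_GIVEN_Y)            # keys are (y, x)
joint_b = {(x, y): p for (y, x), p in joint_b_yx.items()}  # re-key as (x, y)

# Same observational distribution, so every conditional probability agrees.
print(joint_a == joint_b)

# Under the intervention do(X = 1) the two models disagree about Y.
p_y_do_x_a = P_Y_GIVEN_X[1]  # A: forcing X changes Y
p_y_do_x_b = P_Y             # B: Y is upstream of X, so it keeps its marginal
print(p_y_do_x_a, p_y_do_x_b)
```

Here the joints match exactly, while P(Y = 1 | do(X = 1)) is 9/10 under A and 41/100 under B. This only shows that observational statistics underdetermine the arrow; it says nothing by itself about what a large model can or cannot represent internally.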

Veedrac 2126 days ago

A computation can represent causality without being made of causality-neurons.
