Hacker News
by abiro 2127 days ago
OpenAI would naturally optimize for the tests published by Marcus as a critique of GPT-2, yet GPT-3 still fails physical reasoning spectacularly (the one test needing causal reasoning the most).

There are two broader points here:

1. The lack of independently verifiable evaluation metrics for these types of models should make everyone very skeptical. (Who can afford to retrain GPT-3 from scratch?)

2. I find it difficult to believe that smart people still insist that a model incapable of representing causal relationships can produce intelligent answers.

2 comments

> OpenAI would naturally optimize for the tests published by Marcus as a critique of GPT-2

It would be difficult for them to do so, since Marcus's GPT-2 critique came out after they collected the dataset for GPT-3.

Marcus's article: Jan 2020

GPT-3 dataset: "Table 2.2 shows the final mixture of datasets that we used in training. The CommonCrawl data was downloaded from 41 shards of monthly CommonCrawl covering 2016 to 2019"

(1) I certainly agree with. But Marcus doesn't claim skepticism about GPT-3's intelligence; he claims that his evaluation metrics definitively show it doesn't understand the text it outputs or know anything about the world.

(2) is, I think, a misunderstanding. People who believe GPT-3 is producing intelligent answers generally believe it can represent causal relationships.

Fair points. For the record, re 2:

The GPT family of models (and all neural networks for that matter) can estimate P(X | Y), but have no way of computing whether X -> Y or X <- Y.
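
To make the P(X | Y) point concrete, here is a minimal sketch for two binary variables. It is not anything from the GPT-3 paper, and the probabilities and function names are made up for illustration. It builds one model where X causes Y and another where Y causes X, both with exactly the same joint distribution. Observational data alone cannot tell them apart, though an intervention on X can. It says nothing either way about what a large model trained on text ends up representing.

```python
from fractions import Fraction
from itertools import product

def bern(p, value):
    """Probability that a Bernoulli(p) variable takes `value` (0 or 1)."""
    return p if value == 1 else 1 - p

def joint_x_causes_y(p_x, p_y_given_x):
    """Joint P(X, Y) when X is drawn first and Y is drawn conditional on X."""
    return {(x, y): bern(p_x, x) * bern(p_y_given_x[x], y)
            for x, y in product((0, 1), repeat=2)}

def joint_y_causes_x(p_y, p_x_given_y):
    """Joint P(X, Y) when Y is drawn first and X is drawn conditional on Y."""
    return {(x, y): bern(p_y, y) * bern(p_x_given_y[y], x)
            for x, y in product((0, 1), repeat=2)}

def p_x1_given_y(joint, y):
    """P(X = 1 | Y = y), read off a joint distribution."""
    return joint[(1, y)] / (joint[(0, y)] + joint[(1, y)])

# Model A (X -> Y). The numbers are arbitrary illustrative choices.
p_x = Fraction(1, 2)
p_y_given_x = {0: Fraction(1, 5), 1: Fraction(4, 5)}
joint_a = joint_x_causes_y(p_x, p_y_given_x)

# Model B (Y -> X), with parameters derived from model A via Bayes' rule.
p_y = joint_a[(0, 1)] + joint_a[(1, 1)]
p_x_given_y = {y: p_x1_given_y(joint_a, y) for y in (0, 1)}
joint_b = joint_y_causes_x(p_y, p_x_given_y)

# Both models assign identical probabilities to every observable outcome,
# so every conditional such as P(X | Y) is identical too.
same_observations = joint_a == joint_b

# They differ only under intervention. Forcing X = 1:
p_y1_do_x1_model_a = p_y_given_x[1]  # in A, Y responds to X
p_y1_do_x1_model_b = p_y             # in B, Y ignores the forced X

print(same_observations, p_y1_do_x1_model_a, p_y1_do_x1_model_b)
```

The two models agree on everything you can estimate from samples of (X, Y) and disagree on what happens when you set X by hand, which is the distinction between X -> Y and X <- Y.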

A computation can represent causality without being made of causality-neurons.