Hacker News
by Sunhold 1053 days ago
Sam Altman has never denied that GPT-4 is a mixture of experts model. He denied an early rumor that it was a 100 trillion parameter model.[1] The mixture of experts rumor states that GPT-4 is eight 220B models. That's far more plausible than a single 100 trillion model, and the sources (geohotz and Soumith Chintala[2]) have some credibility. But yeah, it's still only a rumor.

[1] https://www.theverge.com/23560328/openai-gpt-4-rumor-release...

[2] https://twitter.com/soumithchintala/status/16712671501017210...


Read this as if I'm smiling and shaking my head. I'm not upset; I call it a quixotic quest because there's little chance of correcting it given how far it has diffused, how few people understand the nuts and bolts, and, by far the biggest factor IMHO, confirmation bias.

You cited geohot as an expert on OpenAI.[1] To cast doubt on the claim that Altman denied it, you fixated on the # of parameters and cited a Verge link to a chart from a random tweet about 100 trillion parameters. That link doesn't show Sam Altman, and nobody in it asks Altman about 100 trillion parameters specifically. And even if it did, what does that have to do with mixture of experts?

I flipped from 3 to -2 within 30 minutes of you posting this.

"A lie gets halfway around the world before the truth has a chance to get its pants on." - Churchill

[1] He never worked at OpenAI, has no notable domain expertise, and was a Twitter intern in 2022.

Here is the timeline again:

2022/11/11: A viral tweet claims GPT-4 will have "100 trillion parameters."[1] At this point, there were no rumors about mixture of experts.

2023/01/16: In an interview, Sam Altman mentions he saw the tweet and it was "complete bullshit."[2]

2023/06/20: geohotz and the lead of PyTorch, two people who would be expected to have relevant connections, claim that GPT-4 is an 8 x 220B mixture of experts model.[3]

These are two separate, unconnected rumors. One was denied by Sam Altman and was never plausible in the first place. The other was never denied and is highly plausible. You are conflating them by claiming, without any source, that there was "a clear denial from OpenAI's CEO" that "GPT4 is a trillion parameter mixture of experts model."

[1] https://twitter.com/andrewsteinwold/status/15948895625260277...

[2] https://youtu.be/ebjkD1Om4uw?t=313

[3] https://twitter.com/soumithchintala/status/16712671501017210...

1. You did find a tweet that claimed 100 trillion parameters, as the GP post did.

2. The video mentions he saw _a_ tweet about GPT... and actually we don't even know what the tweet said; the moderator never finished their question.

3. I'm not sure what sort of claim "connected" is, other than unfalsifiable, like all of the confirmation-bias-motivated arguing on this topic. People do know Geohot's name, and PyTorch is an open source ML framework; neither of those facts makes them likely to know a closely kept trade secret of OpenAI's. (And as the rest of this post shows, they were parroting claims made months earlier: I'm showing you claims through March '23, and Geohot didn't get around to repeating it until June!)

Recentering: it's not a mixture of experts model, regardless of whether people claimed 1 trillion, 100 trillion, or both. (btw, easy proof of how extensive the 1 trillion claims were; they're innumerable, all in 2022: https://twitter.com/search?q=until%3A2022-12-31%20since%3A20...)

Now: let's say a reader just can't let go of the fact some people also made 100 trillion claims, but I said most people made 1 trillion claims. I'm not sure what to say, because I never claimed no one made 100 trillion claims as well, so I'm not sure how to give those people peace so we can talk mixture of experts. I guess apologize? I'm sorry.

Now we can definitely focus on mixture of experts.

Here are innumerable claims between Jan 1st 2023 and March 31st 2023 that GPT4 was a 1 trillion parameter mixture of experts model, as I claimed: https://www.google.com/search?q=mixture+of+experts+trillion+... [/r/MachineLearning](https://www.reddit.com/r/MachineLearning/comments/121q6nk/n_...) [the-decoder](https://the-decoder.com/gpt-4-has-a-trillion-parameters/) [rando boards](https://www.futuretimeline.net/forum/viewtopic.php?p=31145)