HN Mirror

	by __jl__ 702 days ago
	Meta just claimed the opposite in their Llama 3.1 paper. Look at the conclusion. They say that their experience indicates significant gains for the next iteration of models. The current crop of benchmarks might not reflect these gains, by the way.

5 comments

splwjs 702 days ago

I sell widgets. I promise the incalculable power of widgets has yet to be unleashed on the world, but it is tremendous and awesome and we should all be very afraid of widgets taking over the world because I can't see how they won't.

Anyway, here's the sales page. The widget subscription is so premium you won't even miss the subscription fee.

_uhtu 702 days ago

This. It's really weird the way we suddenly live in a world where it's the norm to take whatever a tech company says about future products at face value. This is the same world where Tesla promised "zero intervention LA to NYC self driving" by the end of the year in 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, and 2024. The same world where we know for a fact that multiple GenAI demos by multiple companies were just completely faked.

It's weird. In the late 2010s it seems like people were wising up to the idea that you can't implicitly trust big tech companies, even if they have nap pods in the office and make their first-day employees wear funny hats. Then ChatGPT lands and everyone is back to fully trusting these companies when they say they are mere months from turning the world upside down with their AI, which they have said every month for the last 12-24 months.

cle 702 days ago

I'm not sure anyone is asking you to take it at face value or implicitly trust them? There's a 92-page paper with details: https://ai.meta.com/research/publications/the-llama-3-herd-o...

hnfong 702 days ago

> In the late 2010s it seems like people were wising up to the idea that you can't implicitly trust big tech companies

In the 2000s we only had Microsoft, and none of us were confused as to whether to trust Bill Gates or not...

mikae1 701 days ago

Nobody tells it like Zitron:

https://www.wheresyoured.at/pop-culture/

> What makes this interview – and really, this paper — so remarkable is how thoroughly and aggressively it attacks every bit of marketing collateral the AI movement has. Acemoglu specifically questions the belief that AI models will simply get more powerful as we throw more data and GPU capacity at them, and specifically ask a question: what does it mean to "double AI's capabilities"? How does that actually make something like, say, a customer service rep better? And this is a specific problem with the AI fantasists' spiel. They heavily rely on the idea that not only will these large language models (LLMs) get more powerful, but that getting more powerful will somehow grant it the power to do...something. As Acemoglu says, "what does it mean to double AI's capabilities?"

MrScruff 701 days ago

I don't think claiming that pure scaling of LLMs isn't going to lead to AGI is a particularly hot take. Or that current LLMs don't provide a whole lot of economic value. Obviously, if you were running a research lab you'd be trying a bunch of different things, including pure scaling. It would be weird not to. I don't know if we're going to hit actual AGI in the next decade, but given the progress of the last less-than-decade I don't see why anyone would rule it out. That in itself seems pretty remarkable, and it's not hard to see where the hype is coming from.

RhodesianHunter 701 days ago

Meta just keeps releasing their models as open-source, so that whole line of thinking breaks down quickly.

threecheese 701 days ago

That line of thinking would not have reached the conclusion that you imply, which is that open source == pure altruism. Having the benefit of hindsight, it’s very difficult for me to believe that. Who knows though!

I’m about Zuck’s age, and have been following his career/impact since college; it’s been roughly a cosine graph of doing good or evil over time :) I think we’re at 2π by now, and if you are correct maybe it hockey-sticks up and to the right. I hope so.

RhodesianHunter 700 days ago

I don't think this is a matter of good or evil, simply a matter of business strategy.

If LLMs end up being the platform of the future, Zuck doesn't want OpenAI/Microsoft to be able to monopolize it.

ctoth 702 days ago

Wouldn't the equivalent for Meta actually be something like:

> Other companies sell widgets. We have a bunch of widget-making machines and so we released a whole bunch of free widgets. We noticed that the widgets got better the more we made and expect widgets to become even better in future. Anyway here's the free download.

Given that Meta isn't actually selling their models?

Your response might make sense if it were to something OpenAI or Anthropic said, but as is I can't say I follow the analogy.

mattnewton 702 days ago

That would make sense if it were from OpenAI, but Meta doesn't actually sell these widgets? They release the widget machines for free in the hopes that other people will build a widget ecosystem around them to rival the closed widget ecosystem that threatens to lock them out of a potential "next platform" powered by widgets.

camel_Snake 702 days ago

Meta doesn't sell widgets in this scenario - they give them away for free. Their competition sells widgets, so Meta would be perfectly happy if the widget market totally collapsed.

sqeaky 702 days ago

That is a strong (and fun) point, but this is peer reviewable and has more open-collaboration elements than purely selling widgets.

We should still be skeptical, because companies often want to claim to be better or to have answers they haven't earned, but I don't think the motive to lie is quite as strong as a salesman's.

troupo 702 days ago

> this is peer reviewable

It's not peer-reviewable in any shape or form.

sqeaky 701 days ago

Others can build models that try to achieve decent performance with fewer parameters. If they match what is in the paper, that is the crudest form of review, but Mistral is releasing some models (this one?), so this can get more nuanced if needed.

That said, doing that is slow and people will need to make decisions before that is done.

troupo 701 days ago

So, the best you can do is "the crudest form of review"?

hnfong 702 days ago

It is kind of "peer-reviewable" in the "Elon Musk vs Yann LeCun" form, but I doubt that the original commenter meant this.

littlestymaar 702 days ago

Except: Meta doesn't sell AI at all. Zuck is just doing this for two reasons:

- flex

- deal a blow to Altman

HDThoreaun 702 days ago

Meta uses AI in all the recommendation algorithms. They absolutely hope to turn their chat assistants into a product on WhatsApp too, and GenAI is crucial to creating the metaverse. This isn’t just a charity case.

littlestymaar 701 days ago

AI isn't a single thing: of course Meta didn't buy thousands of GPUs for fun.

But it has nothing to do with LLMs (and interestingly enough they aren't opening their recommendation tech).

PodgieTar 701 days ago

There are literal ads for Meta AI on television. The idea that they’re not selling something is absurd.

ThrowawayTestr 702 days ago

If OpenAI were saying this you'd have a point, but I wouldn't call Facebook a widget seller in this case when they're giving their widgets away for free.

X6S1x6Okd1st 701 days ago

But Meta isn't selling it

nathanasmith 702 days ago

They also said in the paper that 405B was only trained to "compute-optimal", unlike the smaller models, which were trained well past that point. That indicates the larger model still had some runway: had they continued training, it would have kept getting stronger.

moffkalast 702 days ago

Makes sense, right? Why else make a model so large that nobody can conceivably run it, if not to optimize for performance on a limited dataset/compute budget? It was always a distillation source model, not a production one.

imtringued 702 days ago

LLMs are reaching saturation on even some of the latest benchmarks and yet I am still a little disappointed by how they perform in practice.

They are by no means bad, but I am now mostly interested in long-context competency. We need benchmarks that force the LLM to complete multiple tasks simultaneously in one super-long session.

xeromal 702 days ago

I don't know anything about AI, but there's one thing I want it to do for me: program a long-term, full-body exercise routine based on the parameters I give it, such as available equipment, past workout context, and goals. I haven't had good success with ChatGPT, but I assume what you're talking about is relevant to my goals.

ThrowawayTestr 702 days ago

Aren't there apps that already do this like Fitbod?

xeromal 702 days ago

Fitbod might do the trick. Thanks! The availability of equipment was a difficult thing for me to incorporate into a fitness program.

Bjorkbat 702 days ago

Yeah, but what does that actually mean? That if they had simply doubled the parameters on Llama 405b it would score way better on benchmarks and become the new state-of-the-art by a long mile?

I mean, going by their own model evals on various benchmarks (https://llama.meta.com/), Llama 405b scores anywhere from a few points to almost 10 points more than Llama 70b, even though the former has ~5.8x more params. As far as scale is concerned, the relationship isn't even linear.

Which in most cases makes sense, you obviously can't get a 200% on these benchmarks, so if the smaller model is already at ~95% or whatever then there isn't much room for improvement. There is, however, the GPQA benchmark. Whereas Llama 70b scores ~47%, Llama 405b only scores ~51%. That's not a huge improvement despite the significant difference in size.
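A quick back-of-the-envelope check of the GPQA comparison, using only the approximate figures quoted in this comment (405B vs. 70B parameters, ~51% vs. ~47% on GPQA), taken as given rather than re-verified. Expressing the gain "per doubling of parameter count" is an illustrative framing chosen here, not something from the comment or from Meta's paper.

```python
import math

# Approximate figures as quoted in the comment (assumed, not re-verified).
params_small_b = 70    # Llama 70b, billions of parameters
params_large_b = 405   # Llama 405b, billions of parameters
gpqa_small = 47.0      # ~47% on GPQA
gpqa_large = 51.0      # ~51% on GPQA

ratio = params_large_b / params_small_b   # how many times larger 405b is
doublings = math.log2(ratio)              # the same ratio counted in doublings
gain = gpqa_large - gpqa_small            # benchmark points gained

print(f"parameter ratio: {ratio:.2f}x")
print(f"doublings of size: {doublings:.2f}")
print(f"GPQA gain: {gain:.1f} points ({gain / doublings:.2f} points per doubling)")
```

This prints a ratio of about 5.79x, roughly 2.53 doublings, and about 1.58 GPQA points per doubling. Counting in doublings is used because scaling results are commonly plotted against the logarithm of model size; the arithmetic alone does not settle whether further scale would pay off.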

Most likely, we're going to see improvements in small-model performance by way of better data. Otherwise, though, I fail to see how we're supposed to get significantly better model performance by way of scale when the relationship between model size and benchmark scores is nowhere near linear. I really wish someone on team "scale is all you need" could help me see what I'm missing.

And of course we might find some breakthrough that enables actual reasoning in models or whatever, but I find that purely speculative at this point, anything but inevitable.

dev1ycan 702 days ago

Or maybe they just want to avoid getting sued by shareholders for dumping so much money into unproven technology that ended up being the same as or worse than the competitor's.