Hacker News new | ask | show | jobs
by happyhardcore 702 days ago
I suspect this is why OpenAI is going more in the direction of optimising for price / latency / whatever with 4o-mini and whatnot. Presumably they found out long before the rest of us did that models can't really get all that much better than what we're approaching now, and once you're there the only thing you can compete on is how many parameters it takes and how cheaply you can serve that to users.
3 comments

Meta just claimed the opposite in their Llama 3.1 paper. Look at the conclusion. They say that their experience indicates significant gains for the next iteration of models.

The current crop of benchmarks might not reflect these gains, by the way.

I sell widgets. I promise the incalculable power of widgets has yet to be unleashed on the world, but it is tremendous and awesome and we should all be very afraid of widgets taking over the world because I can't see how they won't.

Anyway here's the sales page. the widget subscription is so premium you won't even miss the subscription fee.

This. It's really weird the way we suddenly live in a world where it's the norm to take whatever a tech company says about future products at face value. This is the same world where Tesla promised "zero intervention LA to NYC self driving" by the end of the year in 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, and 2024. The same world where we know for a fact that multiple GenAI demos by multiple companies were just completely faked.

It's weird. In the late 2010s it seems like people were wising up to the idea that you can't implicitly trust big tech companies, even if they have nap pods in the office and have their first day employees wear funny hats. Then ChatGPT lands and everyone is back to fully trusting these companies when they say they are mere months from turning the world upside down with their AI, which they say every month for the last 12-24 months.

I'm not sure anyone is asking you to take it at face value or implicitly trust them? There's a 92-page paper with details: https://ai.meta.com/research/publications/the-llama-3-herd-o...
> In the late 2010s it seems like people were wising up to the idea that you can't implicitly trust big tech companies

In the 2000s we only had Microsoft, and none of us were confused as to whether to trust Bill Gates or not...

Nobody tells it like Zitron:

https://www.wheresyoured.at/pop-culture/

> What makes this interview – and really, this paper — so remarkable is how thoroughly and aggressively it attacks every bit of marketing collateral the AI movement has. Acemoglu specifically questions the belief that AI models will simply get more powerful as we throw more data and GPU capacity at them, and specifically ask a question: what does it mean to "double AI's capabilities"? How does that actually make something like, say, a customer service rep better? And this is a specific problem with the AI fantasists' spiel. They heavily rely on the idea that not only will these large language models (LLMs) get more powerful, but that getting more powerful will somehow grant it the power to do...something. As Acemoglu says, "what does it mean to double AI's capabilities?"

I don't think claiming that pure scaling of LLMs isn't going to lead to AGI is a particularly hot take. Or that current LLMs don't provide a whole lot of economic value. Obviously, if you were running a research lab you'd be trying a bunch of different things, including pure scaling. It would be weird not to. I don't know if we're going to hit actual AGI in the next decade, but given the progress of the last less-than-decade I don't see why anyone would rule it out. That in itself seems pretty remarkable, and it's not hard to see where the hype is coming from.
Meta just keeps releasing their models as open-source, so that whole line of thinking breaks down quickly.
That line of thinking would not have reached the conclusion that you imply, which is that open source == pure altruism. Having the benefit of hindsight, it’s very difficult for me to believe that. Who knows though!

I’m about Zucks age, and have been following his career/impact since college; it’s been roughly a cosine graph of doing good or evil over time :) I think we’re at 2pi by now, and if you are correct maybe it hockey-sticks up and to the right. I hope so.

I don't think this is a matter of good or evil, simply a matter of business strategy.

If LLMs end up being the platform of the future, Zuck doesn't want OpenAI/Microsoft to be able to monopolize it.

Wouldn't the equivalent for Meta actually be something like:

> Other companies sell widgets. We have a bunch of widget-making machines and so we released a whole bunch of free widgets. We noticed that the widgets got better the more we made and expect widgets to become even better in future. Anyway here's the free download.

Given that Meta isn't actually selling their models?

Your response might make sense if it were to something OpenAI or Anthropic said, but as is I can't say I follow the analogy.

that would make sense if it was from Openai, but Meta doesn't actually sell these widgets? They release the widget machines for free in the hopes that other people will build a widget ecosystem around them to rival the closed widget ecosystem that threatens to lock them out of a potential "next platform" powered by widgets.
Meta doesn't sell widgets in this scenario - they give them away for free. Their competition sells widgets, so Meta would be perfectly happy if the widget market totally collapsed.
That is strong (and fun) point, but this is peer reviewable and has more open collaboration elements than purely selling widgets.

We should still be skeptical because often want to claim to be better or have unearned answers, but I don't think the motive to lie is quite as strong as a salesman's.

> this is peer reviewable

It's not peer-reviewable in any shape or form.

Others can build models that try to have decent performance with a lower number of parameters. If they match what is in the paper that is the crudest form of review, but Mistral is releasing some models (this one?) so this can get more nuanced if needs.

That said, doing that is slow and people will need to make decisions before that is done.

So, the best you can do is "the crudest form of review"?
It is kind of "peer-reviewable" in the "Elon Musk vs Yann LeCun" form, but I doubt that the original commenter meant this.
Except: Meta doesn't sell AI at all. Zuck is just doing this for two reasons:

- flex

- deal a blow to Altmann

Meta uses ai in all the recommendation algorithms. They absolutely hope to turn their chat assistants into a product on WhatsApp too, and GenAI is crucial to creating the metaverse. This isn’t just a charity case.
AI isn't a single thing: of course meta didn't buy thousands of GPUs for fun.

But it has nothing to do with LLMs (and interestingly enough they aren't opening their recommendation tech).

There are literal ads for Meta Ai on television. The idea they’re not selling something is absurd.
If OpenAI was saying this you'd have a point but I wouldn't call Facebook a widget seller in this case when they're giving their widgets away for free.
But Meta isn't selling it
They also said in the paper that 405B was only trained to "compute-optimal" unlike the smaller models that were trained well past that point indicating the larger model still had some runway so had they continued it would have kept getting stronger.
Makes sense right? Otherwise why make a model so large that nobody can conceivably run it if not to optimize for performance on a limited dataset/compute? It was always a distillation source model, not a production one.
LLMs are reaching saturation on even some of the latest benchmarks and yet I am still a little disappointed by how they perform in practice.

They are by no means bad, but I am now mostly interested in long context competency. We need benchmarks that force the LLM to complete multiple tasks simultaneously in one super long session.

I don't know anything about AI but there's one thing I want it to do for me. Program a full body exercise program long term based on the parameters I give it such as available equipment and past workout context goals. I haven't had good success with chatgpt but I assume what you're talking about is relevant to my goals.
Aren't there apps that already do this like Fitbod?
Fitbod might do the trick. Thanks! The availability of equipment was a difficult thing for me to incorporate into a fitness program.
Yeah, but what does that actually mean? That if they had simply doubled the parameters on Llama 405b it would score way better on benchmarks and become the new state-of-the-art by a long mile?

I mean, going by their own model evals on various benchmarks (https://llama.meta.com/), Llama 405b scores anywhere from a few points to almost 10 points more than than Llama 70b even though the former has ~5.5x more params. As far as scale in concerned, the relationship isn't even linear.

Which in most cases makes sense, you obviously can't get a 200% on these benchmarks, so if the smaller model is already at ~95% or whatever then there isn't much room for improvement. There is, however, the GPQA benchmark. Whereas Llama 70b scores ~47%, Llama 405b only scores ~51%. That's not a huge improvement despite the significant difference in size.

Most likely, we're going to see improvements in small model performance by way of better data. Otherwise though, I fail to see how we're supposed to get significantly better model performance by way of scale when the relationship between model size and benchmark scores is nowhere near linear. I really wish someone who's team "scale is all you need" could help me see what I'm missing.

And of course we might find some breakthrough that enables actual reasoning in models or whatever, but I find that purely speculative at this point, anything but inevitable.

Or maybe they just want to avoid getting sued by shareholders for dumping so much money into unproven technology that ended up being the same or worse than the competitor
> the only thing you can compete on is how many parameters it takes and how cheaply you can serve that to users.

The problem with this strategy is that it's really tough to compete with open models in this space over the long run.

If you look at OpenAI's homepage right now they're trying to promote "ChatGPT on your desktop", so it's clear even they realize that most people are looking for a local product. But once again this is a problem for them because open models run locally are always going to offer more in terms of privacy and features.

In order for proprietary models served through an API to compete long term they need to offer significant performance improvements over open/local offerings, but that gap has been perpetually shrinking.

On an M3 macbook pro you can run open models easily for free that perform close enough to OpenAI that I can use them as my primary LLM for effectively free with complete privacy and lots of room for improvement if I want to dive into the details. Ollama today is pretty much easier to install than just logging into ChatGPT and the performance feels a bit more responsive for most tasks. If I'm doing a serious LLM project I most certainly won't use proprietary models because the control I have over the model is too limited.

At this point I have completely stopped using proprietary LLMs despite working with LLMs everyday. Honestly can't understand any serious software engineer who wouldn't use open models (again the control and tooling provided is just so much better), and for less technical users it's getting easier and easier to just run open models locally.

In the long run maybe but it's going to take probably 5 years or more before laptops such as Macbook M3 with 64 GB RAM will be mainstream. Also it's going going to take a while before such models with 70B params will be bundled in Windows and Mac with system update. Even more time before you will have such models inside your smartphone.

OpenAI did a good move with making GPTo mini so dirty cheap that it's faster and cheaper to run than LLama 3.1 70B. Most consumers will interact with LLM via some apps using LLM API, Web Panel on desktop or native mobile app for the same reason most people use GMail etc. instead of native email client. Setting up IMAP, POP etc is for most people out of reach the same like installing Ollama + Docker + OpenWebUI

App developers are not gonna bet on local LLM only as long they are not mainstream and preinstalled on 50%+ devices.

I think their desktop app still runs the actual LLM queries remotely.
This. It's a mac port of the iOS app. Using the API.
Totally. I wrote about this when they announced their dev-day stuff.

In my opinion, they've found that intelligence with current architecture is actually an S-curve and not an exponential, so trying to make progress in other directions: UX and EQ.

https://nicholascharriere.com/blog/thoughts-openai-spring-re...