Hacker News new | ask | show | jobs
by bbig 787 days ago
They've got a console for it as well, https://www.meta.ai/

And announcing a lot of integration across the Meta product suite, https://about.fb.com/news/2024/04/meta-ai-assistant-built-wi...

Neglected to include comparisons against GPT-4-Turbo or Claude Opus, so I guess it's far from being a frontier model. We'll see how it fares in the LLM Arena.

13 comments

They didn't compare against the best models because they were trying to do "in class" comparisons, and the 70B model is in the same class as Sonnet (which they do compare against) and GPT3.5 (which is much worse than sonnet). If they're beating sonnet that means they're going to be within stabbing distance of opus and gpt4 for most tasks, with the only major difference probably arising in extremely difficult reasoning benchmarks.

Since llama is open source, we're going to see fine tunes and LoRAs though, unlike opus.

Llama is open weight, not open source. They don’t release all the things you need to reproduce their weights.
Not really that either, if we assume that “open weight” means something similar to the standard meaning of “open source”—section 2 of the license discriminates against some users, and the entirety of the AUP against some uses, in contravention of FSD #0 (“The freedom to run the program as you wish, for any purpose”) as well as DFSG #5&6 = OSD #5&6 (“No Discrimination Against Persons or Groups” and “... Fields of Endeavor”, the text under those titles is identical in both cases). Section 7 of the license is a choice of jurisdiction, which (in addition to being void in many places) I believe was considered to be against or at least skirting the DFSG in other licenses. At best it’s weight-available and redistributable.
Those are all great points and these companies need to really be called out for open washing
It's a good balance IMHO. I appreciate what they have released.
I appreciate it too, and they're of course going to call it "open weights", but I reckon we (the technically informed public) should call it "weights-available".
Has anyone tested how close you need to be to the weights for copyright purposes?
It's not even clear if weights are copyrightable in the first place, so no.
Is it really useful to make an LLM open source when it takes millions of $ to train it?

At that scale, open weights with permissive license is much more useful than open source.

Which large model projects are open source in that sense? That its full source code including training material is published.
Olmo from AI2. They released the model weights plus training data and training code.

link: https://allenai.org/olmo

even if they released them, wouldn't it be prohibitively expensive to reproduce the weights?
It's impossible. Meta itself cannot reproduce the model. Because training is randomized and that info is lost. First samples a coming at random. Second there are often drop-out layers, they generate random pattern which exists only on GPU during training for the duration of a single sample. Nobody saves them, it would take much more than training data. If someone tries to re-train the patterns will be different, which results in different weight and divergence from the beginning. Model will converge to something completely different. With close behavior if training was stable. LLMs are stable.

So, no way to reproduce the model. This requirement for 'open source' is absurd. It cannot be reliably done even for small models due to GPU internal randomness. Only the smallest trained on CPU in single thread. Only academia will be interested.

1.3 million GPU hrs for the 8b model. Take you around 130 years to train on a desktop lol.
Interesting. LLAMA is trained using 16K GPUs so it would have taken around a quarter for them. An hour of GPU use costs $2-$3 so training a custom solution using LLAMA should be atleast $15K to $1M. I am trying to get started with this thing. A few guys suggested 2 GPUs were a good start but I think that would only be good for 10K training samples.
On the topic of LoRAs and finetuning, have a Colab for LoRA finetuning Llama-3 8B :) https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe...
"within stabbing distance"

dunno if english is your mother tongue, but this sounds really good (although a tad aggressive :-) )) !

As Mike Judge's historical documents show, this enhanced aggression will seem normal in a few years or even months.
ML Twitter was saying that they're working on a 400B parameter version?
Meta themselves are saying that: https://ai.meta.com/blog/meta-llama-3/
Losers & Winners from Llama-3-400B Matching 'Claude 3 Opus' etc..

Losers:

- Nvidia Stock : lid on GPU growth in the coming year or two as "Nation states" use Llama-3/Llama-4 instead spending $$$ on GPU for own models, same goes with big corporations.

- OpenAI & Sam: hard to raise speculated $100 Billion, Given GPT-4/GPT-5 advances are visible now.

- Google : diminished AI superiority posture

Winners:

- AMD, intel: these companies can focus on Chips for AI Inference instead of falling behind Nvidia Training Superior GPUs

- Universities & rest of the world : can work on top of Llama-3

I also disagree on Google...

Google's business is largely not predicated on AI the way everyone else is. Sure they hope it's a driver of growth, but if the entire LLM industry disappeared, they'd be fine. Google doesn't need AI "Superiority", they need "good enough" to prevent the masses from product switching.

If the entire world is saturated in AI, then it no longer becomes a differentiator to drive switching. And maybe the arms race will die down, and they can save on costs trying to out-gun everyone else.

AI is taking marketshare from search slowly. More and more people will go to the AI to find things and not a search bar. It will be a crisis for Google in 5-10 years.
I think I agree with you. I signed up for Perplexity Pro ($20/month) many months ago thinking I would experiment with it a month and cancel. Even though I only make about a dozen interactions a week, I can’t imagine not having it available.

That said, Google’s Gemini integration with Google Workplace apps is useful right now, and seems to be getting better. For some strange reason Google does not have Gemini integration with Google Calendar and asking the GMail integration what is on my schedule is only accurate if information is in emails.

I don’t intend to dump on Google, I liked working there and I use their paid for products like GCP, YouTube Plus, etc., but I don’t use their search all that often. I am paying for their $20/month LLM+Google One bundle, and I hope that evolves into a paid for high quality, no ad service.

Only if it does nothing. In fact Google is one of the major players in LLM field. The winner is hard to predict, chip makers likely ;) Everybody jumped on bandwagon, Amazon is jumping...
Source?
Anecdotally speaking I use google search much less frequently and instead opt for GPT4. This is also what a number of my colleagues are doing as well.
I often use ChatGPT4 for technical info. It's easier then scrolling through pages whet it works. But.. the accuracy is inconsistent, to put it mildly. Sometimes it gets stuck on wrong idea.

Interesting how far LLMs can get? Looks like we are close to scale-up limit. It's technically difficult to get bigger models. The way to go probably is to add assisting sub-modules. Examples would be web search, have it already. Database of facts, similar to search. Compilers, image analyzers, etc. With this approach LLM is only responsible for generic decisions and doesn't need to be that big. No need to memorize all data. Even logic can be partially outsourced to sub-module.

my own analysis
Google’s play is not really in AI imo, it’s in the the fact that their custom silicon allows them to run models cheaply.

Models are pretty much fungible at this point if you’re not trying to do any LoRAs or fine tunes.

There's still no other model on par with GPT-4. Not even close.
Many disagree. “Not even close” is a strong position to take on this.
It takes less than an hour of conversation with either, giving them a few tasks requiring logical reasoning, to arrive at that conclusion. If that is a strong position, it's only because so many people seem to be buying the common scoreboards wholesale.
Disagree on Nvidia, most folks fine-tune model. Proof: there are about 20k models in huggingface derived from llama 2, all of them trained on Nvidia GPUs.
Fine tuning can take a fraction of the resources required for training, so I think the original point stands.
Maybe in isolation when only considering a single fine tune. But if you look at it in aggregate I am not so sure.
The memory chip companies were done for, once Bill Gates figured out no one would ever need more than 64K of memory
Misattributed to Bill Gates, he never said it.
Right. We all need 192 or 256GB to locally run these ~70B models, and 1TB to run a 400B.
If anything a capable open source model is good for Nvidia, not commenting on their share price but business of course.

Better open models lower the barrier to build products and drive the price down, more options at cheaper prices which means bigger demand for GPUs and Cloud. More of what the end customers pay for goes to inference and not IP/training of proprietary models

Pretty sure meta still uses NVIDIA for training.
>AMD, intel: these companies can focus on Chips for AI Inference

No real evidence either can pull that off in any meaningful timeline, look how badly they neglected this type of computing the past 15 years.

AMD is already competitive on inference
Their problem is that the ecosystem is still very CUDA-centric as a whole.
And they even allow you to use it without logging in. Didnt expect that from Meta.
1. Free rlhf 2. They cookie the hell out of you to breadcrumb your journey around the web.

They don't need you to login to get what they need, much like Google

Do they really need “free RLHF”? As I understand it, RLHF needs relatively little data to work and its quality matters - I would expect paid and trained labellers to do a much better job than Joey Keyboard clicking past a “which helped you more” prompt whilst trying to generate an email.
Variety matters a lot. If you pay 1000 trained labellers, you get 1000 POVs for a good amount of money, and likely can't even think of 1000 good questions to have them ask. If you let 1000000 people give you feedback on random topics for free, and then pay 100 trained people to go through all of that and only retain the most useful 1%, you get much ten times more variety for a tenth of the cost.

Of course numbers are pretty random, but it's just to give an idea of how these things scale. This is my experience from my company's own internal -deep learning but not LLM- models to train which we had to buy data instead of collecting it. If you can't tap into data "from the wild" -in our case, for legal reason- you can still get enough data (if measured in GB), but it's depressingly more repetitive, and that's not quite the same thing when you want to generalize.

Absolutely.

Modern captchas are self driving object labelers; you just need a few to "agree" to know what the right answer is.

We should agree on a different answer for crosswalk and traffic light and mess it up for them.
I had the same reaction, but when I saw the thumbs up and down icon, I realized this was a smart way to crowd source validation data.
I do see on the bottom left:

Log in to save your conversation history, sync with Messenger, generate images and more.

Think they meant it can be used without login.
Not in the EU though
or the UK
Doesn't work for me, I'm in EU.
Probably bc they're violating gdpr
I imagine that is to compete with ChatGPT, which began doing the same.
Which indicates that they get enough value out of logged ~in~ out users. Potentially they can identify you without logging in, no need to. But also ofc they get a lot of value by giving them data via interacting with the model.
But not from Japan, and I assume most other non-English speaking countries.
Yeah, but not for image generation unfortunately

I've never had a FaceBook account, and really don't trust them regarding privacy

had to upvote this
They also stated that they are still training larger variants that will be more competitive:

> Our largest models are over 400B parameters and, while these models are still training, our team is excited about how they’re trending. Over the coming months, we’ll release multiple models with new capabilities including multimodality, the ability to converse in multiple languages, a much longer context window, and stronger overall capabilities.

Anyone have any informed guesstimations as to where we might expect a 400b parameter model for llama 3 to land benchmark wise and performance wise, relative to this current llama 3 and relative to GPT-4?

I understand that parameters mean different things for different models, and llama two had 70 b parameters, so I'm wondering if anyone can contribute some guesstimation as to what might be expected with the larger model that they are teasing?

They are aiming to beat the current GPT4 and stand a fair chance, they are unlikly to hold the crown for long.
Right because the very little I've heard out of Sam Altman this year hinting at future updates suggests that there's something coming before we turn our calendars to 2025. So equaling or mildly exceeding GPT-4 will certainly be welcome, but could amount to a temporary stint as king of the mountain.
This is always the case.

But the fact that open models are beating state of the art from 6 months ago is really telling just how little moat there is around AI.

FB are over $10B into AI. The English Channel was a wide moat just not uncrossable.
>This is always the case.

I mean anyone can throw out self evident general truisms about how there will always be new models and always new top dogs. It's a good generic assumption but I feel like I can make generic assumptions and general truisms just as well as the next person.

I'm more interested in divining in specific terms who we consider to be at the top currently, tomorrow and the day after tomorrow based on the specific things that have been reported thus far. And interestingly, thus far, the process hasn't been one of a regular rotation of temporary top dogs. It's been one top dog, Open AI's GPT, I would say that it currently is still, and when looking at what the future holds, it appears that it may have a temporary interruption before it once again is the top dog, so to speak.

That's not to say it'll always be the case but it seems like that's what our near future timeline has in store based on reporting, and it's piecing that near future together that I'm most interested in.

Google: "We Have No Moat, And Neither Does OpenAI"
Unless you are NVidia.
The benchmark for the latest checkpoint is pretty good: https://x.com/teknium1/status/1780991928726905050?s=46
Mark said in a podcast they are currently at MMLU 85, but it's still improving.
> Meta AI isn't available yet in your country

Where is it available? I got this in Norway.

Just use the Replicate demo instead, you can even alter the inference parameters

https://llama3.replicate.dev/

Or run a jupyter notebook from Unsloth on Colab

https://huggingface.co/unsloth/llama-3-8b-bnb-4bit

This version doesn't have web search and the image creation though.
The image creation isn't Llama 3, it's not multimodal yet. And the web search is Google and Bing API calls so just use Copilot or Perplexity.
>We’re rolling out Meta AI in English in more than a dozen countries outside of the US. Now, people will have access to Meta AI in Australia, Canada, Ghana, Jamaica, Malawi, New Zealand, Nigeria, Pakistan, Singapore, South Africa, Uganda, Zambia and Zimbabwe — and we’re just getting started.

https://about.fb.com/news/2024/04/meta-ai-assistant-built-wi...

That's a strange list of nations, isn't it? I wonder what their logic is.
No EU initially - I think this is the same with Gemini 1.5 Pro too. I believe it’s to do with the various legal restrictions around AI which iirc take a few weeks.
yes, china is too
All anglophone. I'm guessing privacy laws or something like that disqualifies the UK and Ireland.
GPU server locations, maybe?
LLM chat is so compute heavy and not bandwidth heavy that anywhere with reliable fiber and cheap electricity is suitable. Ping is lower than average keystroke delay for most who haven't undergone explicit speed typing training (we're talking 60~120 WPM for between intercontinental to pathological (other end of the world) servers). Bandwidth matters a bit more for multimodal interaction, but it's still rather minor.
The EU does not want you to have the AI.
Same message in Guatemala.
norway isn't in the EU
Got the same in the Netherlands.
Probably the EU laws are getting too draconian. I'm starting to see it a lot.
EU actually has the opposite of draconian privacy laws. It's more that meta doesn't have a business model if they don't intrude on your privacy
They just said laws, not privacy - the EU has introduced the "world's first comprehensive AI law". Even if it doesn't stop release of these models, it might be enough that the lawyers need extra time to review and sign off that it can be used without Meta getting one of those "7% of worldwide revenue" type fines the EU is fond of.

[0] https://www.europarl.europa.eu/topics/en/article/20230601STO...

Am I reading that right? It sounds like they’re outlawing advertising (“Cognitive behavioural manipulation of people”), credit scores (“classifying people based on behaviour, socio-economic status or personal characteristics”) and fingerprint/facial recognition for phone unlocking etc. (“Biometric identification and categorisation of people”)

Maybe they mean specific uses of these things in a centralised manner but the way it’s written makes it sound incredibly broad.

Well, exactly, and that's why IMO they'll end up pulling out the EU. There's barely any money in non-targeted ads.
If by "barely any money", you mean "all the businesses in the EU will still give you all their money as long as you've got eyeballs", then yes.
Facebook has shown me ads for both dick pills and breast surgery, for hyper-local events in town in a country I don't live in, and for a lawyer who specialises in renouncing a citizenship I don't have.

At this point, I think paying Facebook to advertise is a waste of money — the actual spam in my junk email folder is better targeted.

> IMO they'll end up pulling out the EU.

If only we’d be so lucky. I don’t thing they will, but fingers crossed.

If it's more money than it costs to operate, I doubt it. There's plenty of businesses in the EU buying ads and page promotion still.
Claude has the same restriction [0], the whole of Europe (except Albania) is excluded. Somehow I don't think it is a retaliation against Europe for fining Meta and Google. I could be wrong, but a business decision seems more likely, like keeping usage down to a manageable level in an initial phase. Still, curious to understand why, should anyone here know more.

[0] https://www.anthropic.com/claude-ai-locations

It's because of regulations!

The same reason that Threads was launched with a delay in EU. It simply takes a lot of work to comply with EU regulations, and by no surprise will we see these launches happen outside of EU first.

Yet for some reason it doesn't work in non-EU European countries like Serbia and Switzerland, either.
It's trivial to comply with EU privacy regulation if you're not depending on selling customer data.

But if you say "It's because of regulations!" I hope you have a source to back that up.

Same message in Guatemala. Not known for regulations.
Meta (and other privacy exploiting companies) have to actually... care? Even if it's just a bit more. Nothing draconian about it.
> the EU laws are getting too draconian

You also said that when Meta delayed the Threads release by a few weeks in the EU. I recommend reading the princess on a pea fairytale since you seem to be quite sheltered, using the term draconian as liberally.

>a few weeks

July to December is not "a few weeks"

Got the same in Denmark
Anakin AI has Llama 3 models available right now: https://app.anakin.ai/
Everyone saying it's an EU problem. Same message in Guatemala.
This is so frustrating. Why don't they just make it available everywhere?
This says "high-risk AI system", which is defined here: https://digital-strategy.ec.europa.eu/en/policies/regulatory.... I don't see why it would be applicable.
The text of the law says that the actual criteria can change to be whatever they think is scary:

    As regards stand-alone AI systems, namely high-risk AI systems other than those that are
   safety components of products, or that are themselves products, it is appropriate to classify
   them as high-risk if, in light of their intended purpose, they pose a high risk of harm to the
   health and safety or the fundamental rights of persons, taking into account both the severity
   of the possible harm and its probability of occurrence and they are used in a number of
   specifically pre-defined areas specified in this Regulation. The identification of those
   systems is based on the same methodology and criteria envisaged also for any future
   amendments of the list of high-risk AI systems that the Commission should be
   empowered to adopt, via delegated acts, to take into account the rapid pace of
   technological development, as well as the potential changes in the use of AI systems.
And there's also a section about systemic risks, which llama definitely falls into, and which mandates that they go through basically the same process, with offices and panels that do not yet exist:

https://ec.europa.eu/commission/presscorner/detail/en/qanda_....

I'm always glad at these rare moments when EU or American people can get a glimpse of a life outside the first world countries.
I'd call that the "anywhere but US" phenomena. Pretty much 100% of the times I see any "deals"/promotions or whatnot on my google feed, it's US based. Unfortunately I live nowhere near to the continent.
Also added Llama 3 70B to our coding copilot https://www.double.bot if anyone wants to try it for coding within their IDE and not just chat in the console
Can we stop referring to VS Code as "their IDE"?

Do you support any other editors? If the list is small, just name them. Not everyone uses or likes VS Code.

Done. Anything else?
No, actually. Thank you for that.

Your "Double vs. Github Copilot" page is great.

I've signed up for the Jetbrains waitlist.

Double seems more like a feature than a product. I feel like Copilot could easily implement those value-adds and obsolete this product.

I also don't understand why I can't bring my own API tokens. I have API keys for OpenAI, Anthropic, and even local LLMs. I guess the "secret" is in the prompting that is being done on the user's behalf.

I appreciate the work that went into this, I just think it's not for me.

That was fast! I've really been enjoying Double, thanks for your work.
Cool thanks! Will try
Tried a few queries and was surprised how fast it responded vs how slow chatgpt can be. Responses seemed just as good too.
Inference speed is not a great metric given the horizontal scalability of LLMs.
Because no one is using it
> Neglected to include comparisons against GPT-4-Turbo or Claude Opus, so I guess it's far from being a frontier model

Yeah, almost like comparing a 70b model with a 1.8 trillion parameter model doesn't make any sense when you have a 400b model pending release.

(You can't compare parameter count with a mixture of experts model, which is what the 1.8T rumor says that GPT-4 is.)
You absolutely can since it has a size advantage either way. MoE means the expert model performs better BECAUSE of the overall model size.
Fair enough, although it means we don't know whether a 1.8T MoE GPT-4 will have a "size advantage" over Llama 3 400B.
Why does Meta embed a 3.5MB animated GIF (https://about.fb.com/wp-content/uploads/2024/04/Meta-AI-Expa...) on their announcement post instead of much smaller animated WebP/APNG/MP4 file? They should care about users with low bandwidth and limited data plan.
I'm based on LLaMA 2, which is a type of transformer language model developed by Meta AI. LLaMA 2 is a more advanced version of the original LLaMA model, with improved performance and capabilities. I'm a specific instance of LLaMA 2, trained on a massive dataset of text from the internet, books, and other sources, and fine-tuned for conversational AI applications. My knowledge cutoff is December 2022, and I'm constantly learning and improving with new updates and fine-tuning.
Strange. The Llama 3 model card mentions that the knowledge cutoff dates are March 2023 for the 8B version and December 2023 for the 70B version (https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md)
Maybe a typo?
I suppose it could be hallucinations about itself.

I suppose it's perfectly fair for large language models not necessarily to know these things, but as far as manual fine tuning, I think it would be reasonable to build models that are capable of answering questions about which model they are, their training date, their number of training parameters, and how they are different from other models, etc. Seems like it would be helpful for it to know and not have to try to do its best guess and potentially hallucinate. Although in my experience Llama 3 seemed to know what it was, but generally speaking it seems like this is not necessarily always the case.

Are you trying to say you are a bot?
That's the response they got when asking the https://www.meta.ai/ web console what version of LLaMA it is.
That realtime `/imagine` prompt seems pretty great.
> And announcing a lot of integration across the Meta product suite, ...

That's ominous...

Spending millions/billions to train these models is for a reason and it's not just for funsies.
Are there an stats on if llama 3 beats out chatgpt 3.5 (the free one you can use)?