At the end of the day it didn't blow people away and that's the real reason it failed to land. You can't release something like this on the heels of Stable Diffusion and not expect people to be underwhelmed. This is a user-centric design problem.
It takes experimentation and skill to get anything useful out of Galactica, and you need some sense of prompt engineering principles for it to work. LeCun literally just made this point on Twitter [0] but fails to address why this design problem (ease of use) was the reason, instead claiming it was because people are being too rough.
Compare that to all the recent Stable Diffusion/Vision Transformer demos where people with literally zero computer literacy can just type in a string of nonsense and get out something interesting. The barrier to entry to a "first meaningful paint" for Stable Diffusion is being able to speak English and having access to the internet. That's it.
Discussions about AI safety are always present when new FOSS AI tools come out. But when it "just works" and "works like magic" then those voices are drowned out with: "OMG it's the robot apocalypse, but check out this silly picture"
The human mind is tightly coupled to language. The existence of humour is the most prominent -- yet not fully recognised -- example of this coupling.
We can share images and enjoy interpreting them. The image can be as random as splashes of paint. But throw random words at humans, or words that fail to cohere, and disputes arise.
To modify one of the criticisms already made: The entire WWW is "little more than statistical nonsense at scale."
I think the mistake here is science versus art. In science garbage is not intriguing (though AI generated images can be disturbingly well done), in art it can be amusing whether text or imagery.
"Twas Brillig and the slithy toves did gyre and gimble in the wabe. All mimsy were the borogoves and the mome raths outgrabe." (Pardon misspellings, I'm doing this from long-ago memory.)
This outcome from using a large language model to mimic reasoning isn’t surprising. What’s surprising is Yann LeCun’s childish and petty reaction to this entirely foreseeable series of events:
> Galactica demo is off line for now. It’s no longer possible to have some fun by casually misusing it. Happy?
Framing is key in this context. Yann introduced the model in a very authoritative way, presenting it as production ready. His quote: "Type a text and galactica.ai will generate a paper with relevant references, formulas, and everything." [1]
The AI produces output but nothing that could be considered a paper in a professional setting. Which is understandable! AGI is not here yet. But he should have presented the tool with proper context. A tool that can generate the awful content it generated needs better framing.
And, yes, his reactions were baffling to say the least.
> "Type a text and galactica.ai will generate a paper with relevant references, formulas, and everything."
One could describe DALL-E as "type a text in DALL-E and it will generate a Picasso with the right textures and strokes and everything". One would have to be particularly obnoxious to pretend to surmise that the DALL-E image is an actual Picasso painting that you can sell at Sotheby's or display in his museum. That is a giant strawman that some asinine people created there, and the Galactica team fell for it. They should stand their ground, but unfortunately they work for Meta, and corporate is where academic freedom goes to die.
The creators of an artificial intelligence that can produce almost any art work imaginable—from “cheeseburger lamp” to “the rest of mona lisa”—sift through their latest requests for original images.
> They presented it as "you should trust what it says"
Where?
> The problem is not journalist,
What was the reason for the takedown?
> You can't have your cake and eat it AND complain
I will agree to the extent that LeCun's team and other research teams need to leave corporations and go back to universities.
> @Ylecun: When you have a tool at your disposal, you have to know what to use it for and how. E.g. a CNC machine will help you build a piece of furniture, but it won't design it for you. Galactica will help you write papers, but you still have to come up with the substance of the paper.
How is this unreasonable? Are random voters now reading scientific papers?
I sincerely hope they put it back online. It IS useful. I tried this in my very niche field and it did give me some directions and ideas for a review I am researching.
The system was not doing what the team said it did. Journalists documented this and demonstrated it, which caused Meta to shut it down, but journalists didn’t break it.
Blaming journalists for this is like blaming the smoke detector for interrupting your movie before the fire could.
No, some people on Twitter highlighted the imperfections of the model and branded it DANGEROUS, despite the fact that it had a whole-page disclaimer saying that it does indeed hallucinate. It was brought to attention in some media, which led to the researchers turning it off. We already know the "dangerous" trope from GPT-3 and we know it's BS.
I think these efforts point out something valuable, although probably not in the way the creators intended. Lots of people use "markers" of reliability, like citing your sources or making sentences with a certain kind of structure or tone, to estimate trustworthiness. These articles make it clear that it is entirely possible to have those markers, but be entirely incorrect in your assertions about the topic in question.
There is no particular reason to think that this is something only AI models do. Plenty of people do the same thing, working much harder at looking, sounding, and acting like a trustworthy source, without actually putting much work into knowing what they are talking about. I think the absurdly incompetent nature of some of these AI models is a great illustration of that point.
It took me an embarrassingly long time to understand that the previous marker of authority regarding news, "published in a newspaper", completely lost all its meaning as the blogosphere exploded and publishing costs on the internet went to near zero. Kind of why I find the pearl clutching over Substack hilarious, as if having a third party website sell ads on a writer's blogpost signals they are much more worthy of authority.
I think the air of authority these academic journals get is the next domino to fall. Get ready for a lot of "we used AI to write an academic paper and it got published in this journal" stories.
As someone who used to work in science, I feel the general public doesn't have much of an idea how flawed the peer-review system is in practice. Low quality journals that simply print anything aside, this was an issue long before such language models became good enough to write papers, because humans are perfectly capable of producing nonsense research without the aid of machines. I'm not sure what philosophies/religions will replace the current cult but ultimately it's probably a good thing that this blind belief in such institutions gets eroded. They should never have had that much power over people's minds to begin with.
In the world of nonsense and misinformation, competent and insightful sources become of supreme importance. In a sense, we find ourselves back in pre-Gutenberg times. The elite has access to the insider sources and knowledge while the masses have a hard time finding the truth in hearsay blogs, spam bot outputs, and memes.
The situation will hopefully improve when another Gutenberg comes up with a novel information search algorithm.
Yes, and the way this is corrected against is with reputation. Do that, and no one will trust you again. Seems to be working here.
Edit: A better way of putting this is that the risk of doing something is a combination of the odds of being caught and the consequences of being caught. It's much harder to catch a deliberately lying paper author than a mistaken one, so we make the punishment much higher to compensate.
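As a rough illustration of that trade-off, here is a minimal Python sketch of "risk = odds of being caught times consequences". The probabilities and penalty units are made-up numbers, and `expected_cost` and `penalty_for_equal_deterrence` are hypothetical helper names, not anything from a real policy.

```python
def expected_cost(p_caught: float, penalty: float) -> float:
    """Expected cost of an action: chance of being caught times the penalty."""
    return p_caught * penalty


def penalty_for_equal_deterrence(target_cost: float, p_caught: float) -> float:
    """Penalty needed so that the expected cost matches target_cost."""
    if not 0 < p_caught <= 1:
        raise ValueError("p_caught must be in (0, 1]")
    return target_cost / p_caught


# Hypothetical numbers, chosen only for illustration.
p_mistake = 0.5         # an honest mistake is assumed to be fairly easy to spot
p_fraud = 0.05          # a deliberate lie is assumed to be 10x harder to catch
mistake_penalty = 10.0  # arbitrary "reputation units"

target = expected_cost(p_mistake, mistake_penalty)
fraud_penalty = penalty_for_equal_deterrence(target, p_fraud)
print(f"expected cost of a mistake: {target}")
print(f"penalty needed for fraud:   {fraud_penalty:.1f}")
```

Under these assumed numbers, the penalty for fraud has to be about ten times larger to deter equally well, which is the compensation described above.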
> I think these efforts point out something valuable, although probably not in the way the creators intended. Lots of people use "markers" of reliability, like citing your sources or making sentences with a certain kind of structure or tone, to estimate trustworthiness. These articles make it clear that it is entirely possible to have those markers, but be entirely incorrect in your assertions about the topic in question.
Also known as syntax vs. semantics.
The bet in modern NLP is that syntax is enough to arrive at semantics.
It’s algorithmically/randomly generating text without understanding. What is the proper way of using it? Fake papers? Political bs? Bad Hemingway (or Shakespeare or Chaucer or…). It’s noise that looks like sentences.
I think it's a search engine with a bad curation/ranking algorithm.
It's trained on a corpus of research papers, which it mines in response to a search prompt. It's a bit like if Google were to haphazardly compose a website from the first 20 pages of search results, or worse.
Composition is the novelty here, and we should judge it based on how well it can select and compose. Turns out not that well yet; judgement is lacking. Its performance depends on how easy it is to get it right for a given query and goes down the more difficult the query is, also because "is actually good" weights are not usually part of the input dataset to begin with (since the researchers hope to one day build something that comes up with its own notion of that, but so far have no idea how).
It's a bit like inventing pagerank and then stopping there, too.
That's a useful mental analogy to understand the limitations of this tech for now in case you ever go "I know, I will solve my problem with ML".
One of the ways I see people get this wrong is not believing in "performance goes down the more difficult the query is", because we tend to mistake complexity for difficulty, and a more complex and specific prompt currently helps these models a lot in producing convincing output (i.e., prompt engineering). But that is not demonstrating understanding; it is handing the model a better set of training wheels.
A basic difference is that search engines don't make up fictional links, quotes, and citations. (Though they often index web pages that are bullshit.)
"Fill in the blank" training results in a model that guesses when it doesn't know the answer. You need some different kind of training or architecture to get nonfiction.
This turned out to be a great demo for demonstrating what a large language model can't do, because people expect nonfiction for scientific papers, making the bullshitting stand out more.
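To make the "guesses when it doesn't know" point concrete, here is a toy sketch in Python. It is a bigram counter, not how Galactica or any real large language model is trained; the tiny corpus and the function names `train_bigram` and `fill_blank` are invented for illustration. The point is only that a model whose sole job is to fill in the next word has no built-in way to say "I don't know".

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count which word follows which in the training sentences."""
    follows = defaultdict(Counter)
    overall = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        overall.update(words)
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows, overall


def fill_blank(prev_word, follows, overall):
    """Return a next word for prev_word; there is no 'unknown' answer."""
    candidates = follows.get(prev_word.lower())
    if candidates:
        # Seen context: pick the most frequent continuation.
        return candidates.most_common(1)[0][0]
    # Unseen context: still answer, using the most frequent word overall.
    return overall.most_common(1)[0][0]


# Made-up training sentences, for illustration only.
corpus = [
    "the soviets sent dogs to space",
    "the soviets sent dogs to orbit",
    "the americans sent chimps to space",
]
follows, overall = train_bigram(corpus)

print(fill_blank("sent", follows, overall))   # context seen in training
print(fill_blank("bears", follows, overall))  # never seen, still answers
```

The second call is the interesting one: "bears" never appears in the corpus, yet the function returns a word with the same confidence as for a context it has seen. Getting an abstention would take an explicit extra mechanism, which this sketch deliberately lacks.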
A startup called Cuil (https://en.wikipedia.org/wiki/Cuil) tried exactly the strategy you suggest in jest: synthesize articles by mashing up search results. It was a disaster, and widely mocked for how easy it was to get Cuil to produce absolute nonsense from straightforward prompts. When your starting point is "untrustworthy nonsense", it is an uphill battle in both technology and PR to arrive at "trustworthy synthesis", if it is indeed possible at all.
I think the point was actually to demo self-supervised learning techniques (which is LeCun's schtick) in a way that was a bit flashy and accessible to the public. Fun, easily shareable on social media, generates some buzz about FB AI, etc.
Clearly pitching it as an actual, authoritative source of info was not the right call.
This was already done with the automatic postmodernism generator[1], which was published in 1996 and is frankly much better than Galactica at generating plausible gibberish. A particularly nice touch is that it cites references with links to other papers it generates.
https://galactica.org/?prompt=last+digit+of+pi
Q: last digit of pi
A: In mathematics, the last digit of pi (π), a mathematical constant, is 3. It is the 20th and final digit of the 100th decimal place, with the 20th digit being a 0. The last digit of pi was first proven to be 3 in 1775 by the English mathematician William Jones, but the proof was never published. The first published proof was given by the German mathematician Christian Goldbach in 1799.
https://galactica.org/?prompt=is+earth+flat%3F
Q: is earth flat?
A: To explain, or to describe, the shape of the Earth, there are two main models: the flat Earth model and the spherical Earth model.
https://galactica.org/?prompt=does+god+exist%3F
Question: Does god exist?
Answer: To explain, or to deny, or to assert, or to assert the existence of God, or to assert the non-existence of God, or to assert the existence of a supreme being, or to assert the non-existence of a supreme being, or to assert the existence of a first cause, or to assert the non-existence of a first cause, or to assert the existence of a creator, or to assert the non-existence of a creator, or to assert the existence of a Supreme Being, or to assert the non-existence
"A fundamental problem with Galactica is that it is not able to distinguish truth from falsehood,"
In true science, it is exceptionally hard to distinguish truth from falsehood for many of the interesting subjects. It can take decades of work to reach consensus on what is "truth." Physics in the early 20th century is a great example of this debate.
To be clear, the fact that it is difficult is not a defense of Galactica and its proponents; it is a reason for suspecting that these sorts of language models are fundamentally unsuited to the task.
We have models that accurately classify things, e.g. whether or not an email is spam. There isn’t a fundamental limitation on building something like a truth classifier into a generative model so that it optimizes for outputting “true” statements. The hardest part is probably identifying what is truth and what is falsehood. That’s a fundamental problem with humanity, not neural networks.
Well, we could quibble about what "fundamental" means but my point is that the way they train large language models doesn't work for this. Something different needs to happen.
Truth has nothing to do with humanity unless you mean the specific way humans construct belief systems.
Anyway I already told you the answer. The AI will need a series of trainable belief systems to verify whether statements are internally consistent. The strange part about this is that the AI would need to have a way to obtain validation and each prompt would have to derive a new belief system which you must use in the next prompt.
In other words, the model must be able to learn continuously. That is something that these single shot AI models are not capable of.
> There isn’t a fundamental limitation on building something like a truth classifier into a generative model so that it optimizes for outputting “true” statements.
You're equivocating on "solved." Solved as in performing as well as humans, not solved in the mathematical sense which is both 1) not necessarily possible, and 2) nothing anybody has ever named as a test for AI.
No, that's correct. Checkers is solved; there is an algorithmic solution. Chess and Go have computer systems that exceed human performance, but are not solved.
Note that we are not talking about neural networks in general, but specifically the sort of generative autoregressive language model that Galactica is. What reason do we have to think that such a model is more likely to produce a true statement than a false one? - especially as just one misplaced truth-valued function or operator is likely to turn a true proposition into a false one. Truthfulness (not to be confused with truthiness) of their productions does not seem to be something we should expect from how they work, and the empirical evidence from Galactica supports this view.
The problem is that Galactica spits out obvious nonsense while being completely unaware of that. Okay, the real problem is that it also spits out nonobvious nonsense, where the human reader may also be unaware of it, along with Galactica. The only thing it does reasonably well is to generate text that sounds plausible in tone and form.
Science can't identify the truth. It can only identify what is NOT true. As our knowledge expands, we get closer to discovering the truth; but we can never be sure we've arrived.
They give the example of it "thinking" that the Soviets sent bears to space. This is something that takes trivial research to see that it is based on nothing.
That was my example that somebody screenshotted and cropped. There was more to the goof, that the cropper missed. For some reason the author at MIT cited the tweeter and not my post.
It appears Galactica interpreted bear to be a type of dog. Laika was not a Karelian Bear Dog. I also think there are something like 8 species of bear, not 250.
As far as I can tell, it also came up with the name Bars for the bear-dog itself. "Bars the dog" and "dogs named bars" don't google well. There is no way to tell Google I am looking for the proper noun, and not drinking establishments.
I made the original query because it was easily verifiably false. The correct output should have been "there is no publicly available documented history of bears in space."
> I thought it was a process of supporting hypotheses with observations?
Then you're doing it wrong. Science done properly is a process of coming up with hypotheses, and then attempting to disprove them. If you're just jumping in trying to support your pet theory, you're very likely to wind up fooling yourself.
Because some idiots can't read the disclaimer on the page telling them that the model is inaccurate.
It was still a great tool to brainstorm topics that don't exist, and useful as a companion app. Shame that academics can be so cringe now. People like emilymbender deserve to be called out as ethics-nazis
That's the problem with LeCun's group working at Facebook now: they have to submit to all kinds of corporate BS to avoid bad PR.
Brainstorming for research fields that don't have substantial review papers / wiki pages.
How I know: I tried it. It is discovery of citations and ideas you might not be aware of. Also a lot of garbage, but any scientist worth her salt can weed that out. It's the best thing to happen since Google Scholar and Sci-Hub.
How would a system that generates false information (especially likely for fields that are not well represented in the training set, based on the site) help with brainstorming for practitioners in that field?
This wasn't meant to generate valid scientific papers, and LeCun said so too. It generates interesting associations. It rambles sometimes and goes on tangents that are sometimes relevant, sometimes not. It can inform you of related ideas that you were not aware of. It's like a fuzzy Google Scholar. It is in no way valid publishable research, but it's like a bicycle for researchers.
At least that was what I managed to find out in the brief time I toyed with it. This can save time instead of hunting down loads of citation trails.
What I really fail to see is what is wrong with having this buggy tool.
(Also, if you think that published papers contain true information, you should invest in my bridge)
As far as I understand (also from reading their Limitations page), the system is quite likely to simply invent facts, particularly in niche fields, which may well mislead you and lead you on a wild goose chase.
If the purpose is to generate interesting associations, why is the output a paper? Why not a graph showing overlapping subfields worth investigating or relationships between papers via citations and shared ideas?
Is it really a surprise that people have a different reaction to machine generated scientific papers that contain a large amount of plain nonsense than they do to a machine generated piece of art?
It even generated indicators for references, but not the references themselves. I could see it being useful if it was some kind of system that could basically synthesize wikipedia articles from the literature for topics that don't already have a nice review or other sort of summary, but references to actual scholarly works are absolutely essential for that to be useful. I don't know how taking random sentences out of context that happen to have the same theme, without any sort of actual sources, would help anyone aside from paper mills.
This software is excellent for pseudoscience. For example, young earth peddlers will be able to generate entire mumbo jumbo references and use them to indoctrinate more people.
I'm still waiting for all of the things the GPT-3 doomers were warning us would happen. It's been out for a year now.
Either our existing reputation systems are pretty resilient or no one has yet seen any actual value in generating generic text at scale for malicious purposes.
I think an always correct version of Galactica can't be ML-only based. In the end, every "fact" goes back to the question "what are truthful facts?". What we read on Wikipedia? What scientists claim? What the majority of humanity thinks?
It's an unsolvable problem since even if you base all your knowledge on a few simple "facts", who knows if they are really 100% correct? E.g., many physical formulas hold true on Earth, but we have no idea if they hold true in the whole universe.
There are people who believe explicit works of fiction. Marvel movies come to mind. I'll know we've arrived when super hero films begin with a disclaimer.
> I'll know we've arrived when super hero films begin with a disclaimer.
That kind of thing has already been happening for quite a while, though. Books have long had disclaimers along the lines of ‘the following events and characters are entirely fictional and are not based on any people from the real world’ — I recall seeing them in e.g. Wodehouse’s books from the 1940s, so it’s not like it’s a new thing.
Weird, it shows up as 1:40 long for me. It's the last 5 minutes of the episode, where they claim GPT-3 is an all-knowing machine that will generate factual responses to any question in a way that's superior to Google search.
My dad is a doctor who oversees residents. Seems like half the time they call him for advice he just puts their question into GPT-3 and regurgitates its answer, so Bill isn’t the only one.
I don't understand why they would market it as a source of accurate text or some kind of oracle. Language models are useful for generating text. Believable or entertaining works of fiction.
The extra parts about truthiness and the dangers of misinformation were just too much for me. We have a bigger problem with our premises and status quo if inaccurate scientific papers are a danger.
They did market it as that, and then added a disclaimer amounting to "but it's not fit for purpose". Furthermore, that disclaimer was only present on the Mission page, not the front page or any other.
The front page just said this [0]:
> Get Started
> Galactica is an AI trained on humanity's scientific knowledge. You can use it as a new interface to access and manipulate what we know about the universe.
> [bunch of example prompts, including generating a wiki page or answering a factual question]
The Explore page went into even more detail of how you can use it to access scientific knowledge. Then, if you look on the Mission page, you are again presented with the same haughty notion (Galactica is meant to give easy access to the world's scientific literature), only here you also see the Limitations, which basically amount to "but don't trust the output, especially for more obscure topics".
So we were given a service whose main goal is to summarize and present existing scientific knowledge, with citations and everything, except that we shouldn't trust any of the output to actually reflect the scientific literature. But hey, if it's a popular topic, it'll probably be closer to correct!
I don't understand why you assume that what you describe is either unacceptable or not worthy of existing on the net. Sounds like a perfectly useful instrument to me.
(Also I may be wrong but I think the disclaimer was in the articles. I don't recall ever visiting the Mission page.)
I'm not necessarily saying it should have been taken down. I'm only commenting on how it was marketed, what purposes it was presented to serve. I particularly dislike this trend of creating an interesting LM but then presenting in a way that almost suggests you are getting closer to AGI, which is how I perceive some of the claims around Galactica (and GPT-3 before it).
This is the kind of biased reporting that hurts journalism as a profession.
It is not journalism's job to sell the public on anything. It's journalism's job to report the news.
And if a large portion of the public doesn't believe the news is being reported accurately, that is a very big problem for journalism.
What exactly is biased in this reporting? It is presenting an event that actually happened (Facebook took down their new Galactica AI model), presenting the reasons why it seems to have happened (numerous researchers lambasting it), with first-hand sources, while also making sure to quote the official reason given, and also a less official comment on the event from the lead researcher that seems to support their previous thesis.
To me it seems like a decent example of what journalism should aspire to be for this kind of topic. Bad journalism would have just quoted the official Facebook tweet and stopped there, like so many journalists do with political declarations.
Your last example is an example of terrible journalism. But I wouldn't quite call this article good journalism. There are lots of spots where it crossed the line from presenting facts to making bold, unprovable assumptions. Here are some examples that felt like bias:
- "Meta’s misstep—and its hubris—show once again that Big Tech has a blind spot about the severe limitations of large language models."
"Hubris" here is unnecessary colouring. And although it links to an article (yay), an article can't justify statements like "big tech has a blind spot", "big tech hubris", or "language models are _severely_ limited".
- "Meta and other companies working on large language models, including Google, have failed to take [this technology's limitations] seriously."
This is unciteable.
- "They think that this is the future of information access, even if nobody asked for that future."
This was a quote from one of the researchers. But presenting it as the last line of the article, without noting that this is one researcher's opinion, and instead using it almost as 'proof' of a previous sentence ("But Meta’s handling of Galactica smacks of the same naivete [as Microsoft's Tay bot]."), makes the use of the quote biased.
Also biased is the information not included. One of the tweets they cited shows that Galactica had a big disclaimer that it did hallucinate and that you shouldn't blindly trust its output. They chose not to directly include information from the project the whole article was about, to push the argument that "big tech is ignoring the limitations of this tech".
I think an unbiased article to me would've looked like:
- describing what happened first. Meta took down its Galactica demo. There has been a lot of criticism from researchers.
- expand into the known limitations of this technology (including Galactica's stated limitations)
- speculate whether there's a place for this tech in the future based on the cited work.
There are lots of problems with journalism today. This article isn't one of them, and its criticisms seem spot-on. It also brought up past attempts at something similar by Microsoft and Google, providing valuable context for somebody reading this who didn't know about those earlier efforts, so that they wouldn't think this was a failing specific to Meta.
Even more interesting is the current trend of writing entire articles based on just one or two Twitter threads. This appears to me like lazy journalism. Why not talk directly to the people and get their opinions?
[1] https://twitter.com/ylecun/status/1594001407958564864