| HN Mirror

	by frisco 1142 days ago
	Is there anything actually released here? Just a paper? No weights, not even code?! No interactive product of any kind? The degree of Google’s inability to actually ship anything even now is totally mindblowing

5 comments

skybrian 1142 days ago

I wish people would stop with these shallow, boring dismissals. It's a research paper, not a product. While it's disappointing that they don't release more, it's good that they share their ideas.

fpgaminer 1142 days ago

A research paper by itself isn't worth nothing, sure, but without the ability to reproduce the paper or even check their results it's ... not worth much.

reaperman 1142 days ago

The architecture is described well enough to re-implement it and train it on known datasets/benchmarks such as VQA2. One man, Phil Wang ('lucidrains')[0], who has a medical degree, is able to reproduce most of these papers by himself. He has 246 GitHub repos[1], most of which appear to be reproductions of models that were only described in papers with no associated code or models released, such as [2]. He often appears to release code within 2 weeks of a paper's publication on arXiv.

0: https://lucidrains.github.io

1: https://github.com/lucidrains

2: https://paperswithcode.com/paper/coca-contrastive-captioners...

anentropic 1142 days ago

Given that fact, why don't the paper authors just release the artefacts then?

If it's supposed to stay secret, what's the point of "here's instructions for how to reproduce our big secret"?

Presumably the societal purpose of papers is to share knowledge, and the individual purpose is to take credit and win prestige.

It seems like the first purpose would be better served by also publishing code etc, and the second purpose wouldn't be harmed by it?

mirker 1142 days ago

Because the authors don’t get a large reward for open sourcing the work and they stand to lose future value by lowering the gate to competition. You may want the code, but Google will not care (or it might dislike it).

Look at GPT-3+: OpenAI gets fame and fortune while people struggle to reproduce their last-gen models.

xiphias2 1141 days ago

Probably because the research code is not as nice as one rewritten from scratch anyways, and it's using internal data sets / APIs.

They just want to get onto the next research instead of taking time to publish a clean open-source implementation, which can (and will) be done by somebody else anyways.

AdoHaha 1141 days ago

He did indeed re-implement MaMMUT: https://github.com/lucidrains/MaMMUT-pytorch

alsodumb 1142 days ago

Researchers always have a lot to take away from Google papers that don’t release code or datasets. People understand that folks at companies sometimes cannot release them. That doesn’t make the key contributions any less meaningful; if it did, all conferences would have banned papers that don’t release code and datasets by now.

fpgaminer 1142 days ago

Conferences _should_ ban papers that don't release code or other means of reliable reproduction. The only reason they don't is because "research" in ML has more or less been a joke compared to any other established scientific field. And I'm not going to give Google the benefit of the doubt. At the very least I'll treat them like any random stranger publishing a paper. But in reality I treat their papers with a heavy critical eye these days because more often than not their research has turned out to be bunk and unreproducible.

alsodumb 1142 days ago

Funny you compare ML with other fields. In my experience ML is the most open and reproducible area of scientific research, by far. Talk to researchers in other areas: many write “dataset and code available by request” but never share it, build custom CFD solvers and write papers with them but never release the code, and run experiments but leave out so many details in the paper that they are impossible to reproduce.

You are free to treat a paper from Google like a paper from any random stranger, sure. But it doesn’t change the fact that many ML researchers I know at my R1 university (and many other top universities) always mention how much insight they get from these papers, even when the code/models couldn’t always be released.

sitkack 1142 days ago

I think a large part of the innovation in ML research is precisely because the code is released. The prevalence of a github.io page with the code, the paper, slides and a video presentation is amazing. I would love to see this practice extended to every other paper in every domain.

ukuina 1142 days ago

I'm not against them publishing papers without code, but if they tout the paper on their blog, the onus is on them to publish things of consequence.

rvcdbn 1142 days ago

I think these are both important perspectives. There’s a real societal cost to having some of the best AI researchers under a golden gag, able to share only limited details about their work while the bulk of it remains behind closed doors. If we don’t impose social costs on this behavior, it will become even more common.

dragonwriter 1142 days ago

> Just a paper?

Yes, it's describing an architecture.

> No interactive product of any kind? The degree of Google’s inability to actually ship anything even now is totally mindblowing.

Yes, Google is comically bad at shipping AI products (mostly, from their description, for “safety” reasons).

OTOH, they are very good at putting out papers that other people turn into products, so this kind of thing isn’t without value.

nl 1142 days ago

This is absolutely wrong.

Google (and other large industry labs) have the budget and resources to experiment.

They should be encouraged and thanked for publishing what works.

It's not super hard to reimplement a published paper. But running multiple experiments is hard for individuals or smaller companies.

SanderNL 1142 days ago

> thanked for publishing what they and they alone claim works.

Fixed.

nl 1142 days ago

I'm not aware of any cases where this isn't the case and I personally have implemented a number of their papers.

Are you aware of other cases?

throwaway29303 1142 days ago

You should read this https://news.ycombinator.com/item?id=35824408

light_hue_1 1142 days ago

That's Google.

I don't bother to read most Google papers unless someone tells me that they're doing something astounding. Just because I know I don't have access to their models, their code or their data. So what's the point?

As a community we need to stop accepting and stop citing papers like these.

There is no science without replicability, and it is literally impossible to replicate this work. It's not worth the paper it's printed on.

It's fine if Google wants to play with its toys at home. But we should stop pretending this is research of any value.

gs17 1142 days ago

I don't even bother with the "astounding" things unless there's code. MusicLM was cool, but without code/weights it might as well be a hoax.

gremlinsinc 1142 days ago

Yeah, they were supposed to have the next big image thing ("OMG, it can do text"), and Stable Diffusion beat them to releasing something that can actually render text on images. So yeah, there are a lot of young companies iterating way faster than the old guard can muster up its troops. Honestly, I don't think anyone at Google feels much inspired these days; it's hard to build passionate endeavors when your soul is being sucked right out of you.

fpgaminer 1142 days ago

Yeah, these days I'm "over" Google's AI research. All their papers sound cool, and they've got nice pictures/audio/etc. But nothing meaningful has ever materialized from Google.

OpenAI is killing it with ChatGPT, a publicly accessible product with research papers that have been reproduced. Facebook released huge LLMs for free. Stability released successful image models for free. etc.

Meanwhile Google has ... TensorFlow? Dying. TPUs? Only used by Google themselves or when they're free. Bard? A joke compared to ChatGPT. Imagen? Never released.

Remember when Google said they were going to have an AI call your hair salon and make appointments for you? Yeah...

But hey, at least they've got that golden mountain of PII they harvested from everyone that's been oh so valuable in building new, market defining products... It's not like small companies are running circles around them using publicly available hardware and publicly available data...

And they've still got search, a product that just keeps getting better and more useful by the day...

esafak 1142 days ago

> Remember when Google said they were going to have an AI call your hair salon and make appointments for you?

I think that ran into social problems, not technical ones.

https://www.theverge.com/2019/5/9/18538194/google-duplex-ai-...

mirker 1142 days ago

There is a ton of value. OpenAI having proprietary LLMs single-handedly pivoted the entire field to LLMs. A random GitHub repository doesn’t come close in impact.
