Hacker News
by osipov 1857 days ago
There is nothing here but a promise. Back in the day we called this "vaporware".
4 comments

I don't think it's vaporware, but a blog post making big claims like "1000 times more powerful than BERT" (based on our arbitrary, cherry-picked metric) makes one cringe.

Here's my guess: some team under web search trained a large Transformer-based model, with a few adjustments here and there, on a massive dataset of crawled web pages using tons of TPUs. It made an incremental improvement to the search quality metrics and was shipped to production.

We sort of already know that these models scale in such a way that a model with 1000 times the parameters is, indeed, 1000 times more powerful. We haven't found a ceiling effect yet, so the onus is on the skeptics. These things scale.
According to the scaling laws it scales on a log scale.
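To make the "log scale" point concrete, here is a minimal sketch, assuming loss follows a simple power law in parameter count. The functional form and the constants `n_c` and `alpha` are illustrative assumptions, not numbers from the blog post or from anyone in this thread.

```python
def loss(n_params, n_c=8.8e13, alpha=0.076):
    """Hypothetical power-law loss curve: L(N) = (n_c / N) ** alpha.

    n_c and alpha are assumed values chosen for illustration only.
    """
    return (n_c / n_params) ** alpha


small = loss(1e8)    # a 100M-parameter model
large = loss(1e11)   # 1000x the parameters
ratio = large / small

print(f"loss ratio after 1000x more parameters: {ratio:.2f}")
```

Under these assumptions, 1000x the parameters cuts the loss to roughly 0.59 of its previous value. That is a steady improvement per order of magnitude, not a model that is 1000 times better.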
It's Schrodinger's vaporware. We'll find out some years from now. In Perl 6's case, what, 12 years after the announcement?
Except this is Google, not some startup.
Vaporware also happens with established companies.
Like IBM's Watson after Jeopardy.
What vaporware has come out of Google Brain? In fact, they've been publishing ground-breaking research after ground-breaking research that's completely changed the entire field in recent years.
After seeing Alpha* solve Go, Chess, and protein folding in the past ~3 years, I think it would be pretty silly for your prior to be discounting any Google AI project as vaporware.

Their models accomplish ridiculously powerful things. Tbh I think it's far _more_ likely the answer is "this is crazy powerful, but the engineers didn't feel like writing a blog post about it, and the marketing team hasn't figured out how to monetize it yet".

If there's anything SoTA AI researchers love and have experience doing, it's writing blog posts and papers explaining how.

The lack of details makes me think they're either hiding a new technique they'd rather keep secret because it provides a competitive advantage, or that it's really only a marginal improvement over existing NLP models (or an ensemble of them with nearly no improvement on any given metric) and the 1000x improvement is on a metric that no actual ML scientist would respect.

I don't have the slightest bit of information about Google's AI team to know if those are the only two options and if so which is more likely.

It's not a secret at all. Transformer models scale. Big models are powerful. Everyone knows this. Google can afford to train very big models. It's not a new technique. I think the issue here is that people are uncomfortable with the idea of AI models displaying scale relativity.
A big model also means lots of data, including lots of unfiltered garbage in the training set. Nobody can manually review that much data; at this scale all they can do is automated filtering. So the model has a large attack surface, and it is going to do something bad and embarrass itself once critics determined to find those gaps get their hands on it.

We have seen in the last few months attacks on Google Translate, GPT-3 and other language models from the PC crowd, including the famous AI Ethics firings. It's just tricky to show it in this climate.

The PC crowd doesn't believe that language is fair or that concepts are neutral; they say both are an expression of systems of power. So language models are a natural target for them, because the models could amplify biases against their identity groups.

I find this critique hasty, especially because big language models are a nascent technology. We shouldn't throw the baby out with the bathwater!

The PC crowd is right. Language encodes our cultural beliefs, and many of them are pretty rotten. But how do you update a culture's shared set of beliefs? Banning words is a symbolic exercise. What we tend to do instead is that we tell stories and share perspectives. We learn to empathize.

Figuring out how to feed language models with diverse sources of information is a tough challenge, but not impossible. I share Gebru's concern about "stochastic parrots".

I'll take logical reasoning over "stories" any day.

And calling language models "parrots" is flippant. Many people worked for decades to reach that accomplishment, and here come the critics to shit all over it.

> But how do you update a culture's shared set of beliefs?

It's not the place of AI models to do activism, and it's a slippery slope leading to an AI-based inquisition. Take a look at how China uses AI to oppress its own people.

I think showing the model would immediately trigger the critics to nitpick it, as in the famous "He is a doctor. She is a nurse." case, so they just won't show it until they figure out a way to avoid that. Moreover, language models are easy to trick into politically incorrect conversations and porn. AI Dungeon's GPT-3 was writing lots of porn, for example.