by gerash 1862 days ago
I don't think it's vaporware, but the blog post, with all these big claims like being 1000 times more powerful than BERT (based on our arbitrary, cherry-picked metric), makes one cringe.

Here's my guess: some team under web search trained a large Transformer-based model, with some adjustments here and there, on a massive dataset of crawled web pages using tons of TPUs. It made an incremental improvement to the search quality metrics and was shipped to production.

1 comments

We sort of already know that these models scale in such a way that a model with 1000 times the parameters is, indeed, 1000 times more powerful. We haven't found a ceiling effect yet, so the onus is on the skeptics. These things scale.
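According to the scaling laws, performance scales on a log scale with parameter count, not linearly: each 10x in parameters buys roughly the same fixed improvement.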
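
To put rough numbers on that, here is a minimal sketch assuming loss follows a pure power law in parameter count, L(N) proportional to N^(-alpha). The exponent 0.076 is an assumption for illustration (it is in the range reported for language models in the scaling-law literature), and `loss_ratio` is a made-up helper name, not anything from the blog post.

```python
import math


def loss_ratio(param_scale: float, alpha: float = 0.076) -> float:
    """Ratio of new loss to old loss after multiplying parameters by param_scale.

    Assumes L(N) = C * N**(-alpha); alpha=0.076 is an illustrative assumption.
    """
    return param_scale ** (-alpha)


if __name__ == "__main__":
    for scale in (10, 100, 1000):
        ratio = loss_ratio(scale)
        # Each extra 10x multiplies the loss by the same constant factor,
        # so the improvement is linear in log10(scale), not in scale.
        print(f"{scale:>5}x params -> loss multiplied by {ratio:.3f} "
              f"(log10 scale = {math.log10(scale):.0f})")
```

Under that assumption, 1000x the parameters multiplies the loss by about 0.59, which is a real improvement but nowhere near "1000 times more powerful" on that metric.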