| HN Mirror



	by gerash 1862 days ago
	I don't think it's vaporware, but a blog post full of big claims like "1000 times more powerful than BERT" (based on our arbitrary, cherry-picked metric) makes one cringe. Here's my guess: some team under web search trained a large Transformer-based model with a few adjustments, this time on a massive dataset of crawled web pages using tons of TPUs. It made an incremental improvement to the search quality metrics and was shipped to production.

1 comments

Lyapunov_Lover 1862 days ago

We sort of already know that these models scale in such a way that a model with 1000 times the parameters is, indeed, 1000 times more powerful. We haven't found a ceiling effect yet, so the onus is on the skeptics. These things scale.


ma2rten 1853 days ago

According to the scaling laws it scales on a log scale.
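
To put a rough number on that: a minimal sketch, assuming the power-law form loss ∝ N^(-alpha) in parameter count N, with alpha ≈ 0.076 taken as an illustrative exponent of the kind reported in the language-model scaling-law literature. Both the functional form and the value of alpha are assumptions here, not something stated in this thread, and `loss_ratio` is a made-up helper name.

```python
# Sketch: how much does loss shrink when parameter count grows by a factor k,
# if loss follows a power law L(N) proportional to N ** (-alpha)?
# ALPHA_N is an assumed, illustrative exponent, not a measured value for any
# specific model discussed in this thread.
ALPHA_N = 0.076


def loss_ratio(param_multiplier: float, alpha: float = ALPHA_N) -> float:
    """Return L(k * N) / L(N) under the assumed power law."""
    return param_multiplier ** (-alpha)


if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(f"{k:>5}x parameters -> loss multiplied by {loss_ratio(k):.3f}")
```

Under these assumptions, 1000 times the parameters cuts the loss by well under half rather than by a factor of 1000, and every further 10x buys the same fixed multiplicative step, which is what a straight line on a log-log plot looks like.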
