
by philmcp 1747 days ago

This is fantastic progress, great to see

16x smaller is still 41.5GB, though

More research needs to be undertaken in model compression imo


On the SuperGLUE benchmark, the much smaller DeBERTa outperforms vanilla T5: https://super.gluebenchmark.com/leaderboard

I am curious why the authors preferred T5.

T5 has a notion of prompting. None of the *BERT models have a notion of prompting.