by philmcp 1700 days ago
This is fantastic progress, great to see

16x smaller = 41.5GB though

More research needs to be undertaken in model compression imo
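For scale, a quick back-of-the-envelope check of the 16x figure above. Only the two numbers come from the comment; the function and variable names are made up for illustration, and the "implied original size" is just the product of those two numbers, not a figure quoted from the paper.

```python
# Rough arithmetic on the "16x smaller = 41.5GB" remark.
# Both constants are taken from the comment; everything else is illustrative.
COMPRESSED_GB = 41.5   # size after the 16x reduction
SHRINK_FACTOR = 16     # reduction factor


def implied_original_gb(compressed_gb: float, factor: float) -> float:
    """Size the model would have had before shrinking by `factor`."""
    return compressed_gb * factor


original_gb = implied_original_gb(COMPRESSED_GB, SHRINK_FACTOR)
print(f"implied original size: {original_gb:g} GB")
print(f"still {COMPRESSED_GB:g} GB after a {SHRINK_FACTOR}x reduction")
```

So even after the reduction the checkpoint is tens of gigabytes, which is the point about compression.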


On the SuperGLUE benchmark, the much smaller DeBERTa outperforms vanilla T5: https://super.gluebenchmark.com/leaderboard

I am curious why the authors preferred T5.

T5 has a notion of prompting. None of the *BERT models has one.
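A minimal sketch of what that could look like, assuming "prompting" here means T5's text-to-text setup, where the task is named in a text prefix on the input. The prefix strings follow the style described for T5; the dictionary and function names are hypothetical, and no model is loaded.

```python
# Illustration only: builds T5-style text-to-text inputs as plain strings.
# TASK_PREFIXES and to_t5_input are hypothetical names, not a library API.
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
}


def to_t5_input(task: str, text: str) -> str:
    """Prepend the task prefix so a single model can be told which task to do."""
    return TASK_PREFIXES[task] + text


example = to_t5_input("summarize", "Large models are expensive to serve.")
print(example)
```

A *BERT-style encoder, by contrast, usually gets a separate classification head per task rather than a textual instruction in the input.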