Hacker News
by riku_iki 1700 days ago
On the SuperGLUE benchmark, the much smaller DeBERTa outperforms vanilla T5: https://super.gluebenchmark.com/leaderboard

I am curious why the authors preferred T5.

1 comment

T5 has a notion of prompting. None of the *BERT models have one.
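
A rough sketch of what "prompting" means here, assuming the task-prefix convention T5 uses for its text-to-text inputs: the task is named inside the input string and the answer comes back as generated text, whereas a BERT-style encoder would typically get a separate classification head per task. The helper name `build_t5_input` and the example sentences are made up for illustration; no model is loaded or run.

```python
# Illustrative only: builds T5-style input strings, it does not call any model.
# build_t5_input is a hypothetical helper, not part of any library.

def build_t5_input(task_prefix: str, **fields: str) -> str:
    """Join a task prefix and named text fields into a single input string."""
    parts = [f"{name}: {value}" for name, value in fields.items()]
    return " ".join([task_prefix, *parts]).strip()

# Translation: the prefix alone tells the model what to do with the text.
translation = "translate English to German: That is good."

# Classification-style tasks are also phrased as text in, text out.
acceptability = build_t5_input("cola", sentence="The course is jumping well.")
entailment = build_t5_input("rte", sentence1="It is raining.", sentence2="The ground is wet.")

print(translation)
print(acceptability)
print(entailment)
```

The same model and the same weights serve every task; only the prefix in the input string changes.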