by anon373839 839 days ago
Oh, right. If the high-level task is to generate a translation or summary, I think that’s been swallowed up by the Bitter Lesson (though isn’t it an open question whether decoder-only models are the best fit? I’d like to see a T5 with the scale and pretraining that newer models have had).

On the other hand, people seem to be using GPT-4 for simple text classification and entity extraction tasks that even a small BERT could do well at a fraction of the cost.