Hacker News
by chaxor 1055 days ago
This tradeoff is ridiculous, even if it is "better" by 0.01% F-score. I would much rather have a dataset created in 1 day by BERT at 98% F-score than in 1000 years at 98.01% F-score by a 540B-parameter model, or even a 33B-parameter model. Million-parameter models still perform excellently on NER and run at usable speeds. Running things through OpenAI is also useless, as it would cost a few million dollars.
1 comment

It's more like 100% accuracy vs 95% accuracy, and the super-large models can now also extract non-trivial derived information from regular human speech. While it's not cost-efficient right now, this will change over time (you skate to where the puck will be, not where it is now), making the current fine-tuning approach obsolete. Academically I am not thrilled, as I built my research on fine-tuning, but as a producer of a product this solves so many issues at once that I'm pretty happy.