by PaulHoule
161 days ago
The BERT models are easy to calibrate for probability too! BERT + pooling + SVM works pretty well for some problems and is maybe 20x faster to train than a fine-tuned BERT.

My take, as an academic-adjacent [1] developer of boring and reliable applications, is that I don't like the training recipes people use for fine-tuned BERT [2]. I think BERT + biLSTM + probability calibration should equal or exceed those fine-tuned BERTs, particularly because I think I can add early stopping and do model selection with a parameter scan.

[1] reads arXiv papers where run-of-the-mill researchers solve run-of-the-mill problems

[2] particularly when the number of samples is >> 500, which is easy to get in many cases; e.g. for most tasks you can make 1-2k judgements a day, though with visual tasks, when I've done 5k-a-day sprints for a few days, I start to hallucinate and compulsively classify scenes in front of me
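For readers unfamiliar with probability calibration, here is a minimal sketch of one common approach, Platt scaling, which fits a sigmoid to a classifier's raw scores on held-out data. The comment does not say which calibration method is meant, so treat this as an assumption. The scores below are made-up stand-ins for SVM decision values computed on pooled BERT embeddings, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    # Map a raw classifier score to a calibrated probability of class 1.
    return sigmoid(a * score + b)


# Hypothetical held-out decision values and their true labels (assumed data).
# The classes overlap slightly, so the fitted sigmoid has a finite slope.
held_out_scores = [-2.1, -1.4, -0.9, -0.3, 0.2, 0.4, 1.1, 1.8, 2.5, -0.1]
held_out_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

a, b = fit_platt(held_out_scores, held_out_labels)
probabilities = [calibrate(s, a, b) for s in held_out_scores]
```

The calibration set should be separate from the data used to train the classifier; otherwise the fitted sigmoid tends to be overconfident.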