by napsternxg
506 days ago
We often ignore the importance of using good baseline systems and jump to the latest shiny thing. I had a similar experience a few years back when participating in ML competitions [1,2] for detecting and typing phrases in text. I submitted an approach based on Named Entity Recognition using a Conditional Random Field (CRF), an approach that has been quite robust and well known in the community, and my solution beat most of the tuned deep learning solutions by quite a large margin [1]. I think a lot of folks underestimate the complexity of using some of these models (DL, LLMs) and just throw them at the problem, or don't compare them properly against well-established baselines.

[1] https://scholar.google.com/citations?view_op=view_citation&h...
[2] https://scholar.google.com/citations?view_op=view_citation&h...
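For context on how a CRF tagger labels phrases: a linear-chain CRF scores every possible tag sequence as a sum of per-token (emission) and tag-to-tag (transition) weights, and Viterbi decoding picks the highest-scoring sequence. This is an illustrative sketch only, not the competition system from [1]. The BIO tag set, the single capitalization feature, and all weights are hypothetical hand-set values, and training (learning the weights with forward-backward and gradient-based optimization) is left out.

```python
from typing import Dict, List, Tuple

TAGS = ["O", "B-ENT", "I-ENT"]

# Hypothetical hand-set transition weights; any pair not listed scores 0.0.
# The large negative values forbid BIO-invalid moves into I-ENT.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    ("START", "I-ENT"): -1e9,
    ("O", "I-ENT"): -1e9,
    ("B-ENT", "I-ENT"): 1.5,
    ("I-ENT", "I-ENT"): 1.0,
}


def emission(token: str, tag: str) -> float:
    """Hypothetical one-feature emission score: capitalization suggests an entity."""
    if token[:1].isupper():
        return {"O": -1.0, "B-ENT": 2.0, "I-ENT": 1.0}[tag]
    return {"O": 1.0, "B-ENT": -1.0, "I-ENT": -1.0}[tag]


def transition(prev: str, tag: str) -> float:
    return TRANSITIONS.get((prev, tag), 0.0)


def viterbi(tokens: List[str]) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence and its total score."""
    if not tokens:
        return [], 0.0
    # best[i][tag]: score of the best path over tokens[:i + 1] that ends in `tag`
    best: List[Dict[str, float]] = []
    back: List[Dict[str, str]] = []  # back[i][tag]: previous tag on that best path
    for i, token in enumerate(tokens):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for tag in TAGS:
            if i == 0:
                prev, path_score = "START", transition("START", tag)
            else:
                prev = max(TAGS, key=lambda p: best[i - 1][p] + transition(p, tag))
                path_score = best[i - 1][prev] + transition(prev, tag)
            scores[tag] = path_score + emission(token, tag)
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Pick the best final tag, then follow the back-pointers to the start.
    last = max(TAGS, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[-1][last]


print(viterbi(["we", "flew", "to", "New", "York"]))
# -> (['O', 'O', 'O', 'B-ENT', 'I-ENT'], 7.5)
```

The large negative O -> I-ENT weight is what stops the decoder from emitting an invalid BIO sequence; in a real CRF that preference would be learned from data rather than hand-set.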
|
I have a BERT + SVM + logistic regression (for calibration) pipeline that trains 20 models for automatic model selection and calibration in about 3 minutes. I feel like I understand its behavior really well.
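A minimal sketch of that kind of pipeline, under several assumptions. The BERT embeddings are assumed to be precomputed, and small hand-made 2-D vectors stand in for them. The linear SVM is trained with Pegasos-style subgradient descent. Calibration is simplified Platt scaling (a logistic regression on the SVM scores, fit on a separate calibration split). "Model selection" is taken to mean picking the regularization strength with the lowest validation log loss. The commenter's actual 20-model grid and libraries aren't stated, so every name here (`train_linear_svm`, `fit_platt`, `select_model`, `mirrored`) is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Dataset = Tuple[List[List[float]], List[int]]  # (feature vectors, labels in {+1, -1})


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_svm(X: List[List[float]], y: List[int], lam: float, epochs: int = 50) -> List[float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.
    No bias term, so the features are assumed to be roughly centered."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):  # fixed order (not random sampling) keeps this deterministic
            t += 1
            eta = 1.0 / (lam * t)
            in_margin = yi * dot(w, xi) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 term
            if in_margin:  # hinge-loss subgradient is nonzero only inside the margin
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def fit_platt(scores: List[float], y: List[int], lr: float = 0.05, steps: int = 500) -> Tuple[float, float]:
    """Simplified Platt scaling: fit p = sigmoid(a * score + b) by gradient descent
    on log loss (Platt's original target smoothing is left out)."""
    a = b = 0.0
    n = len(scores)
    for _ in range(steps):
        ga = gb = 0.0
        for s, yi in zip(scores, y):
            err = sigmoid(a * s + b) - (1.0 if yi == 1 else 0.0)
            ga += err * s
            gb += err
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b


def log_loss(probs: List[float], y: List[int]) -> float:
    eps = 1e-12  # clip so log(0) can't happen
    total = 0.0
    for p, yi in zip(probs, y):
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if yi == 1 else math.log(1.0 - p)
    return total / len(probs)


def select_model(train: Dataset, calib: Dataset, val: Dataset, lambdas: List[float]):
    """Train one calibrated SVM per regularization value and keep the one with the
    lowest log loss on a validation split the calibrator never saw."""
    best = None
    for lam in lambdas:
        w = train_linear_svm(train[0], train[1], lam)
        a, b = fit_platt([dot(w, x) for x in calib[0]], calib[1])
        probs = [sigmoid(a * dot(w, x) + b) for x in val[0]]
        loss = log_loss(probs, val[1])
        if best is None or loss < best[0]:
            best = (loss, lam, w, a, b)
    return best


def mirrored(points: List[Tuple[float, float]]) -> Dataset:
    """Hypothetical toy data standing in for BERT embeddings: each positive
    example is reflected through the origin to make a negative one."""
    X = [list(p) for p in points] + [[-v for v in p] for p in points]
    y = [1] * len(points) + [-1] * len(points)
    return X, y


train = mirrored([(2.0, 1.0), (1.0, 2.0), (3.0, 2.0), (2.0, 3.0)])
calib = mirrored([(1.5, 1.5), (0.5, 1.0)])
val = mirrored([(2.5, 2.0), (1.0, 0.5)])
loss, lam, w, a, b = select_model(train, calib, val, [0.001, 0.01, 0.1, 1.0])
print(f"lambda={lam} val_log_loss={loss:.3f} p(+|[2, 2])={sigmoid(a * dot(w, [2.0, 2.0]) + b):.3f}")
```

With real embeddings you would more likely use a library: scikit-learn's `LinearSVC` wrapped in `CalibratedClassifierCV(method="sigmoid")` covers the SVM-plus-sigmoid-calibration part. The pure-Python version is only meant to make the three steps (train, calibrate, select) visible.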
I've tried fine-tuning BERT for the same task: the shortest model builds take 30 minutes, and the training curves make no sense (back in the day, I used to be able to train networks with early stopping and get a good one every time). If I look at arXiv papers, it is rare for anyone to have a model selection process with any discipline at all; mostly people use a recipe that sorta-kinda seemed to work in some other paper. People scoff at you if you ask the engineering-oriented question, "What training procedure can I use to get a good model consistently?"
Because of that, I like classical ML.
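The comment mentions reliably getting a good network with early stopping. Here is a minimal sketch of the common patience-based version: train one epoch at a time, keep a copy of the weights with the best validation loss, and stop once the loss has failed to improve for `patience` consecutive epochs. The commenter doesn't say which variant they used, so the `patience` and `min_delta` parameters, the `train_epoch` / `val_loss` / `get_weights` callables, and the scripted loss curve standing in for a real model are all assumptions.

```python
import math
from typing import Callable, Dict, List, Tuple


def train_with_early_stopping(
    train_epoch: Callable[[], None],
    val_loss: Callable[[], float],
    get_weights: Callable[[], List[float]],
    max_epochs: int = 100,
    patience: int = 5,
    min_delta: float = 0.0,
) -> Tuple[int, float, List[float]]:
    """Return (best epoch, best validation loss, copy of the weights at that epoch)."""
    best_loss, best_epoch = math.inf, -1
    best_weights: List[float] = []
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_epoch()
        loss = val_loss()
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            best_weights = list(get_weights())  # copy, so later epochs can't overwrite it
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss has stalled: stop training
    return best_epoch, best_loss, best_weights


def make_scripted_run(curve: List[float]):
    """Hypothetical stand-in for a real model: replays a fixed validation-loss curve."""
    state: Dict[str, int] = {"epoch": 0}

    def train_epoch() -> None:
        state["epoch"] += 1

    def val_loss() -> float:
        return curve[state["epoch"] - 1]

    def get_weights() -> List[float]:
        return [float(state["epoch"])]

    return train_epoch, val_loss, get_weights


curve = [1.0, 0.8, 0.7, 0.72, 0.75, 0.74, 0.9]
print(train_with_early_stopping(*make_scripted_run(curve), patience=3))
# -> (2, 0.7, [3.0])
```

Restoring the best snapshot, rather than keeping the last weights, is what makes the result insensitive to how many extra epochs ran before the patience counter tripped.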