by PaulHoule 544 days ago
I have a content-based recommender built on SBERT + SVM. It starts to learn with around 500 examples, and I don't think it benefits from having more than about 10,000.
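A minimal sketch of what an SBERT + SVM recommender can look like, assuming scikit-learn and the sentence-transformers package. The model name `all-MiniLM-L6-v2`, the linear kernel, `C=1.0`, and the function names are all illustrative assumptions, not the setup described in the comment.

```python
from sklearn.svm import SVC


def sbert_embed(texts, model_name="all-MiniLM-L6-v2"):
    """Embed texts with SBERT. The model name is an assumed default."""
    # Imported lazily because the dependency is heavy and downloads a model.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_name).encode(list(texts))


def train_recommender(texts, labels, embed=sbert_embed):
    """Fit an SVM on document embeddings; labels are 1 (liked) or 0 (not liked)."""
    clf = SVC(kernel="linear", C=1.0)  # placeholder hyperparameters, not tuned
    clf.fit(embed(texts), labels)
    return clf


def score(clf, texts, embed=sbert_embed):
    """Signed distance from the decision boundary; higher means more likely liked."""
    return clf.decision_function(embed(texts))
```

Passing `embed` as a parameter keeps the classifier independent of the embedding model, so the embeddings can be cached or swapped without touching the SVM code.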

I have also tried fine-tuning BERT models to do the same task. It takes at least 30 minutes to train one model (without any of the model selection I do with the scikit-learn-based models), and I never developed a training protocol that reliably did better than my SVM-based model. My impression was that the small BERT models don't have a lot of learning capacity and don't really benefit from 5,000+ documents. Really high accuracy isn't possible with my problem anyway: I'm predicting my own fickle judgments, and I feel like I'm doing great with an AUC-ROC of 0.78 or so.
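For context on the AUC-ROC figure: it can be read as the probability that a randomly chosen liked document gets a higher score than a randomly chosen disliked one. A small pure-Python sketch of that pairwise definition follows; the function name and the 1/0 label encoding are assumptions for illustration, and in practice a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.

```python
def auc_roc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive score against every negative score.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under this reading, a score of 0.5 is random ranking and 1.0 is perfect ranking, so 0.78 means roughly 78% of liked/disliked pairs are ordered correctly.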

1 comment

Do you think SBERT + SVM is a good fit for handling ambiguous or less common phrases, or do you still end up needing some post-processing rules for edge cases?
I haven't tried classifying anything as small as a phrase (assuming you've already extracted it) with SBERT + SVM, so I really don't know.

Another thing to consider is a T5 model. A T5 model maps strings to strings, so it can be trained to take an input like

"Extract the skills from this resume: ..."

with the output like

"Excel, Pandas, Python, Cold Fusion, C#, ..."

and it will try to do the same on new inputs. You'll probably still find it makes some mistake that drives you up the wall and needs some pre- or post-processing to fix.
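A rough sketch of the pre- and post-processing around such a string-to-string model. The `generate` argument is a hypothetical stand-in for a call to a fine-tuned T5 model, and the cleanup rules (whitespace collapsing, dropping empty entries, case-insensitive de-duplication) are assumed examples of the kind of fixes that might be needed, not a known list.

```python
PROMPT_PREFIX = "Extract the skills from this resume: "


def build_prompt(resume_text):
    """Pre-processing: collapse whitespace and prepend the task instruction."""
    return PROMPT_PREFIX + " ".join(resume_text.split())


def parse_skills(output):
    """Post-processing: split on commas, drop empties, de-duplicate ignoring case."""
    seen = set()
    skills = []
    for part in output.split(","):
        skill = part.strip()
        key = skill.casefold()
        if skill and key not in seen:
            seen.add(key)
            skills.append(skill)
    return skills


def extract_skills(resume_text, generate):
    """Run the string-to-string model (any callable str -> str) and clean its output."""
    return parse_skills(generate(build_prompt(resume_text)))
```

Keeping the model behind a plain callable makes it easy to unit test the cleanup rules with canned outputs before wiring in a real model.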