Hacker News
by turnersr 1556 days ago
Awesome work! Curious how you handle paragraphs and niche language like federal regulations.

What are your favorite ways to do sentence and paragraph embeddings, and is there a framework you like that lets you tune on custom data? Do you find fine-tuning your embedding model helpful?

1 comment

Thanks! The post doesn’t cover fine-tuning the model, which would be absolutely necessary for a niche domain like that (but was out of scope for the post). Nils Reimers (the author of SBERT) has been on a speaking circuit covering Generative Pseudo Labeling (GPL), which handles the vocabulary gap in new domains that a pretrained SBERT model hasn’t seen yet.

https://youtu.be/qzQPbIcQu9Q
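
For a rough idea of what GPL trains on: as I understand it, it generates synthetic queries for your unlabeled domain passages, mines hard negatives, has a cross-encoder score each (query, passage) pair, and then trains the bi-encoder so that its score margin between the positive and the negative matches the cross-encoder's margin (a MarginMSE loss). Below is a minimal pure-Python sketch of just that loss. The function names and all vectors and teacher margins are made-up toy values for illustration, not real model outputs or any library's API.

```python
# Toy sketch of a MarginMSE objective, the kind of loss used in GPL-style training.
# Assumption: dot-product similarity and hand-written toy embeddings.

def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def margin_mse(query, positive, negative, teacher_margin):
    """Squared gap between the bi-encoder (student) margin and the teacher margin."""
    student_margin = dot(query, positive) - dot(query, negative)
    return (student_margin - teacher_margin) ** 2


def batch_margin_mse(examples):
    """Mean MarginMSE over (query, positive, negative, teacher_margin) tuples."""
    return sum(margin_mse(*ex) for ex in examples) / len(examples)


# Each tuple: query vector, positive passage vector, negative passage vector,
# and the (hypothetical) cross-encoder margin score(q, pos) - score(q, neg).
examples = [
    ([1.0, 0.0], [2.0, 0.0], [0.0, 1.0], 2.0),  # student margin already matches
    ([0.0, 1.0], [0.0, 1.0], [1.0, 0.0], 3.0),  # student margin too small
]

loss = batch_margin_mse(examples)
print(loss)
```

In a real setup the vectors would come from the SBERT model being fine-tuned and the loss would be minimized by gradient descent; the sketch only shows what quantity is being pushed toward zero.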