Hacker News
by jonathan-adly 1286 days ago
I did the same for the FDA drug label database and 100% believe that this is the future for search: a semantic search layer to pull context, then a large language model layer to produce human-readable answers.

Tip - you don’t actually need GPT-3-level embeddings for a decent semantic search. The sentence-transformers library paired with one of its pretrained models is good enough.

I like this: https://huggingface.co/sentence-transformers/multi-qa-MiniLM... - since it’s very light.

Also, perhaps I am an idiot, but I just used a Postgres array field to store my embeddings, to keep things simple and free.
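
For anyone wanting to try the array-field approach, here is a minimal sketch, not the exact code I ran. It assumes the embeddings come back from a Postgres float array column as plain Python lists (psycopg2 converts arrays that way) and that ranking happens in application code. The rows, the 3-dimensional vectors, and the names `cosine_similarity` and `top_k` are made up for illustration; real vectors would come from whichever embedding model you use.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    rows: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Score every (text, embedding) row against the query and return the k best."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical rows, shaped like the result of something such as
#   SELECT section_text, embedding FROM label_sections;
# where `embedding` is a float8[] column (table and column names are assumptions).
rows = [
    ("dosage and administration", [0.9, 0.1, 0.0]),
    ("adverse reactions", [0.1, 0.9, 0.0]),
    ("storage and handling", [0.0, 0.1, 0.9]),
]

# Toy query vector; in practice this is the embedding of the user's question.
best = top_k([1.0, 0.2, 0.0], rows, k=2)
for score, text in best:
    print(f"{score:.3f}  {text}")
```

A full scan like this is fine for a few thousand rows; beyond that you would probably want a proper vector index.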

2 comments

It was less than $2 to embed all 100+ episodes with the new OpenAI embeddings and was as easy as just making a bunch of API calls. Pretty hard to beat that experience.
Did you use the sentence-transformers model as-is or did you need to fine-tune it for medical data?
As is. All it’s doing is pulling relevant context for the question. GPT-3 is doing all the heavy natural language lifting.
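
To make the split concrete, here is a rough sketch of the hand-off, assuming the semantic search step has already returned the most relevant passages. The prompt wording, the `build_prompt` name, and the `max_chars` budget are illustrative assumptions, not the prompt actually used; the returned string is what would be sent to the language model.

```python
from typing import List, Sequence


def build_prompt(question: str, passages: Sequence[str], max_chars: int = 2000) -> str:
    """Pack retrieved passages, best first, into a prompt for the language model."""
    selected: List[str] = []
    used = 0
    for passage in passages:
        # Stop once the next passage would exceed the context budget.
        if used + len(passage) > max_chars:
            break
        selected.append(passage)
        used += len(passage)

    context = "\n---\n".join(selected)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical passages, as returned by the retrieval step.
prompt = build_prompt(
    "How should this drug be stored?",
    ["Store at room temperature.", "Keep away from moisture."],
)
print(prompt)
```

The embedding model only decides which passages go in; everything about phrasing the answer is left to the language model.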