by simonw
974 days ago
My previous implementation used TF-IDF: I basically took all the words in the post, turned them into a giant "word OR word OR word OR word" search query, and piped that through SQLite full-text search. https://til.simonwillison.net/sqlite/related-content I jumped straight from that to OpenAI embeddings. The results were good enough that I didn't spend time investigating other approaches.
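
For readers who want to see the shape of that approach, here is a minimal sketch, not the code from the linked TIL. It assumes an SQLite build with the FTS5 extension, and the table names (`posts`, `posts_fts`), the `related_posts` helper, and the sample documents are all hypothetical. Ordering uses FTS5's built-in `rank`, which is BM25 (a TF-IDF-style score) and may differ from the scoring the original used.

```python
import re
import sqlite3


def related_posts(conn, post_id, limit=5):
    """Return ids of other posts, best FTS5 match first (hypothetical helper)."""
    (body,) = conn.execute(
        "SELECT body FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    # Every distinct word in the post becomes one term of a big OR query.
    words = sorted(set(re.findall(r"[a-z0-9]+", body.lower())))
    # Quote each term so FTS5 reads it as a string, not as query syntax.
    query = " OR ".join(f'"{word}"' for word in words)
    rows = conn.execute(
        "SELECT rowid FROM posts_fts "
        "WHERE posts_fts MATCH ? AND rowid != ? "
        "ORDER BY rank LIMIT ?",
        (query, post_id, limit),
    ).fetchall()
    return [row[0] for row in rows]


# Tiny in-memory demo with made-up posts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE VIRTUAL TABLE posts_fts USING fts5(body)")
sample_posts = {
    1: "sqlite full text search with fts5",
    2: "ranking search results in sqlite fts5",
    3: "baking sourdough bread at home",
}
for post_id, body in sample_posts.items():
    conn.execute("INSERT INTO posts (id, body) VALUES (?, ?)", (post_id, body))
    conn.execute("INSERT INTO posts_fts (rowid, body) VALUES (?, ?)", (post_id, body))

print(related_posts(conn, 1))
```

With OR semantics, any document that shares at least one term with the post counts as a match; the ranking only decides the order in which matches come back.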
Does that mean you'd return other docs if they share just one word?
The idea of TF-IDF is that it gives you a vector (maybe combined with PCA or a random dimensionality reduction) that you can use just like an Ada embedding. But you still need vector search.
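
To make the vector view concrete, here is a rough sketch of TF-IDF vectors compared by cosine similarity, with a brute-force scan standing in for vector search. It is an illustration under assumptions: whitespace tokenization, a smoothed IDF formula, no PCA or other dimensionality reduction, and function names (`tfidf_vectors`, `cosine`, `most_similar`) plus sample documents that are made up for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one dense TF-IDF vector per document over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    # Smoothed IDF (an assumed variant): rarer terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(vectors, index):
    """Brute-force 'vector search': rank all other documents by similarity."""
    scores = [
        (cosine(vectors[index], vec), i)
        for i, vec in enumerate(vectors)
        if i != index
    ]
    return [i for _, i in sorted(scores, reverse=True)]


docs = [
    "sqlite full text search",
    "sqlite search ranking",
    "sourdough bread baking",
]
vectors = tfidf_vectors(docs)
print(most_similar(vectors, 0))
```

A linear scan like this is fine for a few thousand documents; a larger collection is where a dedicated vector index would come in.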