by leobg
763 days ago
You can use ONNX versions of embedding models. Those run faster on CPU. Also, don’t discount plain old BM25 and fastText. For many queries, keyword or bag-of-words search works just as well as fancy 1536-dim vectors. You can also do things like tokenize your text with the tokenizer that GPT-4 uses (via tiktoken, for instance) and then index those tokens instead of words in BM25.
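To make the BM25-over-tokens idea concrete, here is a rough, dependency-free sketch of Okapi BM25 with a pluggable tokenizer. The `BM25` class, `word_tokenize`, the sample documents, and the `k1`/`b` defaults are all my own illustrative choices, not taken from any particular library. The demo uses a plain word tokenizer so it runs anywhere; the comment at the bottom shows how a tiktoken encoder could be swapped in, assuming tiktoken is installed and can fetch its encoding files.

```python
import math
import re
from collections import Counter


def word_tokenize(text):
    """Baseline tokenizer: lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Minimal Okapi BM25 index (illustrative, not tuned)."""

    def __init__(self, docs, tokenize=word_tokenize, k1=1.5, b=0.75):
        self.tokenize = tokenize
        self.k1 = k1
        self.b = b
        # Per-document term frequencies and lengths.
        self.doc_freqs = [Counter(tokenize(d)) for d in docs]
        self.doc_lens = [sum(f.values()) for f in self.doc_freqs]
        self.avgdl = (sum(self.doc_lens) / len(docs)) if docs else 0.0
        self.avgdl = self.avgdl or 1.0  # avoid division by zero
        # Document frequency: number of docs containing each token.
        df = Counter()
        for freqs in self.doc_freqs:
            df.update(freqs.keys())
        n = len(docs)
        # Smoothed IDF that stays non-negative for very common tokens.
        self.idf = {
            tok: math.log(1 + (n - c + 0.5) / (c + 0.5)) for tok, c in df.items()
        }

    def scores(self, query):
        """Return one BM25 score per indexed document."""
        q_tokens = self.tokenize(query)
        result = []
        for freqs, dl in zip(self.doc_freqs, self.doc_lens):
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            score = 0.0
            for tok in q_tokens:
                f = freqs.get(tok, 0)
                if f:
                    score += self.idf[tok] * f * (self.k1 + 1) / (f + norm)
            result.append(score)
        return result


docs = [
    "ONNX models run faster on CPU",
    "BM25 is a keyword ranking function",
    "fastText builds bag-of-words embeddings",
]
index = BM25(docs)
scores = index.scores("keyword ranking")
best = max(range(len(docs)), key=scores.__getitem__)
print(best, docs[best])

# To index GPT-4 tokens instead of words (assumes tiktoken is installed):
#   import tiktoken
#   enc = tiktoken.get_encoding("cl100k_base")
#   index = BM25(docs, tokenize=enc.encode)  # token ids are hashable ints
```

The same tokenizer has to be used for both documents and queries, which is why it lives on the index object. Whether subword tokens actually beat plain words will depend on your corpus, so it is worth measuring on your own queries.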