
These ultra-fast embeddings are really cool, because you can just spam them at everything and it's pretty much instant.

I was able to use them to answer very simple questions without any vector database or pre-indexing. I just expand the search query with synonyms, run a normal full-text search, use embeddings to match article titles to the query, and add a few "personality documents" that are always in every result set no matter what.
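A rough sketch of how that kind of index-free retrieval could look, not my actual code. The synonym table, the article dict layout, and `toy_embed` (a bag-of-words stand-in for a real static embedding model) are all made up for illustration, and ranking full-text hits by title similarity is just one way to combine the two steps.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table; a real one could come from a thesaurus or an LLM.
SYNONYMS = {"car": ["automobile", "vehicle"], "fix": ["repair"]}


def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_embed(text):
    """Stand-in for a static embedding model: a sparse word-count vector."""
    return Counter(tokens(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_query(query):
    """Return the query words followed by any known synonyms."""
    words = tokens(query)
    expanded = list(words)
    for word in words:
        expanded.extend(SYNONYMS.get(word, []))
    return expanded


def retrieve(query, articles, personality_docs, top_k=2, embed=toy_embed):
    expanded = expand_query(query)
    terms = set(expanded)
    # Full-text step: keep articles whose body shares at least one expanded term.
    hits = [a for a in articles if terms & set(tokens(a["body"]))]
    # Embedding step: rank the hits by title similarity to the expanded query.
    q_vec = embed(" ".join(expanded))
    hits.sort(key=lambda a: cosine(q_vec, embed(a["title"])), reverse=True)
    # Personality documents are appended unconditionally.
    return hits[:top_k] + list(personality_docs)
```

To try it with a real model, pass any function that maps a string to a vector as `embed` and swap `cosine` for one that matches that vector type.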

Then I do chunking on the fly based on similarity to the query.
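Something like the following is what I mean by chunking on the fly, as a minimal sketch under some assumptions: chunks are blank-line-separated paragraphs, `toy_embed` is a word-count stand-in for the real embedding model, and the byte budget is a hypothetical parameter, not a tuned value.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for a static embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_on_the_fly(query, document, budget_bytes=1500, embed=toy_embed):
    """Pick the paragraphs most similar to the query, within a byte budget."""
    # Candidate chunks: paragraphs separated by blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    q_vec = embed(query)
    scored = [(cosine(q_vec, embed(p)), i) for i, p in enumerate(paragraphs)]
    # Highest similarity first; ties keep document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)

    chosen, used = [], 0
    for score, i in scored:
        if score <= 0:
            break  # nothing left that overlaps with the query
        size = len(paragraphs[i].encode("utf-8"))
        if used + size > budget_bytes:
            continue  # too big for the remaining budget, try smaller ones
        chosen.append(i)
        used += size
    # Restore document order so the context reads naturally.
    return [paragraphs[i] for i in sorted(chosen)]
```

Nothing is precomputed here: the document is split and scored at query time, which is only practical because the embeddings are so cheap.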

Retrieval takes about 1 second on a CPU, but the actual LLM call then takes 10 to 40 seconds, because you need about 1500 bytes of context to consistently get something that contains the answer. Not exactly useful at the moment on cheap consumer hardware, but still very interesting.

https://huggingface.co/blog/static-embeddings