| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Longwelwind 1142 days ago

AI trend due to ChatGPT.

For NLP use-cases, you can use a vector database to index the embeddings of your texts.

For exemple, if you implement a document retrieval (a search engine like Google), you train a transformer model that takes a text as input (the content of your document) and produces a vector of number as output (the embedding). You then index your documents by transorming them to their embeddings and storing them inside your vector database.

When you want to perform a query using keywords, you transform your keywords into a vector, and then ask your vector database to send you the most similar documents, using a similary function such as the cosine function.

1 comments

politician 1141 days ago

It seems like it would be better to index a document multiple times by generating embeddings for every paragraph rather than once per document. What am I missing?

link