| HN Mirror

by bigzyg33k 1166 days ago

Here's a (kinda) ELI5: you use a language model to create "embeddings" of the text, which you can think of as a list of numbers (a vector) representing the "meaning" of a piece of text.

These numbers can be plotted as points in a space, and embeddings of things with similar meanings are plotted close to each other. So things like "exam preparation" would have embeddings close to things like "top study tips".

Say you have created embeddings once for a large corpus of text (in this case all YouTube captions). If you then create an embedding for a user query, you can search for the stored embeddings closest to it, and the text behind them will be "semantically" similar to the query.

The advantage is that, unlike traditional full-text search, the query doesn't have to contain any of the words that actually appear in the text.
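
If it helps, here's a rough sketch of just the "search for embeddings close to it" step. The three-number vectors are made up by hand for illustration; a real setup would get them from a language model and they'd have hundreds of dimensions. `CORPUS`, `cosine_similarity` and `search` are hypothetical names, and cosine similarity is just one common way to measure "close".

```python
import math

# Hand-made 3-dimensional vectors standing in for real embeddings (assumption:
# a real system would compute these once with a language model and store them).
CORPUS = {
    "top study tips": [0.9, 0.1, 0.0],
    "how to bake sourdough": [0.0, 0.9, 0.2],
    "fixing a flat bike tire": [0.1, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, corpus, k=2):
    """Return the k corpus texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]


# Pretend this is the embedding of the query "exam preparation" (made up too).
query = [0.8, 0.2, 0.1]
print(search(query, CORPUS, k=1))
```

Note that the query shares no words with "top study tips"; the match comes purely from the vectors pointing in a similar direction. This brute-force loop compares against every stored vector, which is fine for a toy but is the part a vector database would replace with an index at scale.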

cced 1166 days ago

Do you have any resources that might guide one on doing something like this from scratch?

fzliu 1165 days ago

Here's one that may be helpful for you: https://zilliz.com/blog/using-vector-database-search-white-h...

djhn 1166 days ago

Here's a 6-minute speed run of something like that on Weaviate: https://youtu.be/mBcBoGhFndY

password4321 1166 days ago

https://news.ycombinator.com/item?id=34684593#34689275

rahimnathwani 1166 days ago

https://news.ycombinator.com/item?id=36012299
