by gfourfour
752 days ago
Nothing too crazy, just downloading a dump, splitting it into manageable batches, and using a lightweight embedding model to vectorize each article.
On the best GPU available on Colab it takes maybe 8 hours, if I remember correctly. The vectors can be saved as NPY files and loaded into something like FAISS for fast querying.
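
Roughly, the batch / embed / save / query loop could look like the sketch below. This is not the exact code that was run. `embed_batch` is a hypothetical stand-in (a hashed bag of words) so the example runs without downloading a model, and in practice you would swap in a real embedding model there. `DIM`, the file naming, and the helper names are all made up for illustration. FAISS is used if it is installed, otherwise the search falls back to brute-force NumPy.

```python
import hashlib
import os
import tempfile

import numpy as np

try:
    import faiss  # optional; without it we fall back to brute-force NumPy search
except ImportError:
    faiss = None

DIM = 256  # toy dimension (assumption); a real embedding model fixes its own


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_batch(texts):
    """Hypothetical stand-in for a real embedding model.

    Hashes each word into one of DIM buckets and L2-normalizes the counts,
    returning a float32 array of shape (len(texts), DIM).
    """
    out = np.zeros((len(texts), DIM), dtype=np.float32)
    for row, text in enumerate(texts):
        for word in text.lower().split():
            digest = hashlib.md5(word.encode("utf-8")).digest()
            out[row, int.from_bytes(digest[:4], "little") % DIM] += 1.0
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-12)


def embed_to_npy(articles, out_dir, batch_size=2):
    """Embed articles batch by batch, saving one .npy file per batch."""
    paths = []
    for i, batch in enumerate(batches(articles, batch_size)):
        path = os.path.join(out_dir, f"vectors_{i:05d}.npy")
        np.save(path, embed_batch(batch))
        paths.append(path)
    return paths


def search(paths, query, k=1):
    """Load the saved vectors and return row indices of the k best matches."""
    vectors = np.ascontiguousarray(np.concatenate([np.load(p) for p in paths]))
    q = embed_batch([query])
    if faiss is not None:
        index = faiss.IndexFlatIP(vectors.shape[1])  # exact inner-product search
        index.add(vectors)
        _, ids = index.search(q, k)
        return ids[0].tolist()
    # Fallback: same ranking via a plain dot product.
    scores = vectors @ q[0]
    return np.argsort(-scores)[:k].tolist()


if __name__ == "__main__":
    articles = [
        "the cat sat on the mat",
        "stock markets fell sharply today",
        "python is a programming language",
    ]
    with tempfile.TemporaryDirectory() as out_dir:
        saved = embed_to_npy(articles, out_dir, batch_size=2)
        print(search(saved, "python programming language"))
```

Saving one file per batch means an interrupted run (easy to hit on Colab) only loses the batch in progress. A flat index is exact but keeps everything in memory, so for a full dump one of FAISS's compressed or approximate index types is probably the better fit.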