by egorr 1047 days ago
heya, supabase engineer here. i coauthored the part about benchmarking different models with pgvector to compare QPS when using models with fewer dimensions.

btw, you can find the dataset with embeddings generated by all 3 models mentioned (text-embedding-ada-002, all-MiniLM-L6-v2, and GTE-small) on huggingface[0]

and big thanks to Stephan Sturges for his dataset[1]. we just extended his texts and OpenAI embeddings with embeddings from the oss models

[0] https://huggingface.co/datasets/Supabase/wikipedia-en-embedd...

[1] https://www.kaggle.com/datasets/stephanst/wikipedia-simple-o...