by softwaredoug 878 days ago
I agree that pandas, or whatever dataframe library you like, is better for prototyping and exploring than setting up a bunch of infrastructure in a dev environment, especially if you have labels and are evaluating against a ground truth.
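
To make that concrete, here is a rough sketch of scoring a ranked result set against labels with nothing but pandas. The toy data, the column names (`query_id`, `doc_id`, `rank`, `relevant`) and the `precision_at_k` helper are all made up for illustration. They are not from the article or from any particular library.

```python
import pandas as pd

# Hypothetical ranked results from some retrieval run (toy data).
results = pd.DataFrame({
    "query_id": [1, 1, 1, 2, 2, 2],
    "doc_id":   ["a", "b", "c", "d", "e", "f"],
    "rank":     [1, 2, 3, 1, 2, 3],
})

# Hypothetical ground truth: one row per (query, relevant doc).
labels = pd.DataFrame({
    "query_id": [1, 1, 2],
    "doc_id":   ["a", "c", "e"],
    "relevant": [1, 1, 1],
})


def precision_at_k(results: pd.DataFrame, labels: pd.DataFrame, k: int) -> pd.Series:
    """Per-query precision@k; unlabeled docs are treated as not relevant."""
    top_k = results[results["rank"] <= k]
    # Left join keeps every retrieved doc; unmatched ones get NaN, filled with 0.
    judged = top_k.merge(labels, on=["query_id", "doc_id"], how="left")
    judged["relevant"] = judged["relevant"].fillna(0)
    return judged.groupby("query_id")["relevant"].sum() / k


p_at_2 = precision_at_k(results, labels, k=2)
p_at_3 = precision_at_k(results, labels, k=3)
print(p_at_2)
print(p_at_3)
```

Swapping in a different retrieval run only means rebuilding `results`, which is most of the appeal over standing up real infrastructure first.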

You might be interested in SearchArray, which emulates the classic search index side of things in a pandas dataframe column:

https://github.com/softwaredoug/searcharray

1 comment

Thanks for the article. I definitely agree you are better off starting simple, with something like a Parquet file and FAISS, and then testing out options against your own data. I say that mainly so you can test chunking strategies, because chunking has a big effect on everything downstream, whatever vector DB or BERT path you take. It has a much bigger impact than most people acknowledge.
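
For anyone who wants to try this, here is a minimal sketch of two chunking strategies you could compare before committing to anything downstream. The function names, default sizes and sample text are my own placeholders, and the sentence splitter is a naive regex, not a real tokenizer.

```python
import re


def chunk_fixed(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Fixed-size word windows, with `overlap` words shared between neighbors."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window reaches the end, so no redundant tail chunk is emitted.
        if start + size >= len(words):
            break
    return chunks


def chunk_sentences(text: str, per_chunk: int = 2) -> list[str]:
    """Group consecutive sentences; splits naively after '.', '!' or '?'."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + per_chunk])
        for i in range(0, len(sentences), per_chunk)
    ]


text = (
    "Parquet stores the vectors. FAISS searches them. "
    "Chunking decides what gets embedded. "
    "Evaluate each strategy on your own data."
)

fixed = chunk_fixed(text, size=8, overlap=2)
by_sentence = chunk_sentences(text, per_chunk=2)
for name, chunks in [("fixed", fixed), ("sentences", by_sentence)]:
    print(name, len(chunks), chunks)
```

The same text comes out as different units depending on the strategy, and those units are what get embedded and retrieved, so it is worth running each variant through the same evaluation before picking one.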