Hacker News
by tylersuard 457 days ago
First of all, great question.

Second, we use a search service where vectors are supplementary to the text search, so chunking doesn't matter as much. We usually take an entire PDF page and embed it, regardless of how the data on that page is structured. We do keep track of the document name and the page number. For SQL records, we just turn each record into a text string and embed that.
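
To make the approach above concrete, here is a rough sketch of what page-level chunking with metadata might look like. This is not our actual code: the names `Chunk`, `chunks_from_pages`, `record_to_text`, and `chunk_from_record` are made up for illustration, the "column: value" format for SQL rows is just one plausible way to flatten a record, and skipping blank pages is an assumption. The embedding call itself is left out, since it depends on whichever model or search service you use.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Chunk:
    text: str            # the string that would be sent to the embedding model
    source: str          # document name (or table name for SQL records)
    page: Optional[int]  # 1-based page number; None for SQL records


def chunks_from_pages(doc_name: str, pages: Sequence[str]) -> list[Chunk]:
    """One chunk per PDF page, regardless of how the page is structured."""
    return [
        Chunk(text=page_text.strip(), source=doc_name, page=number)
        for number, page_text in enumerate(pages, start=1)
        if page_text.strip()  # assumption: blank pages are not worth embedding
    ]


def record_to_text(record: Mapping[str, Any]) -> str:
    """Flatten one SQL row into a single 'column: value' string (hypothetical format)."""
    return "; ".join(
        f"{col}: {val}" for col, val in record.items() if val is not None
    )


def chunk_from_record(table: str, record: Mapping[str, Any]) -> Chunk:
    """Wrap a flattened SQL row in the same Chunk type used for PDF pages."""
    return Chunk(text=record_to_text(record), source=table, page=None)
```

With something like this, `chunks_from_pages("report.pdf", page_texts)` gives one chunk per non-blank page with its original page number attached, and each `chunk.text` is what you would hand to the embedder while `source` and `page` go into the search index as metadata.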

1 comment

Thanks for your feedback! Could you share a bit about your team? I’m curious how many people are involved and what kinds of skills or roles are needed to make this happen.