Hacker News
by Royce-CMR 200 days ago
Super noob in vector embeddings here: I never considered that tables would complicate things (beyond defining them in a parseable format for ingestion).

Do vector databases do better with long grouped text vs table formats?

1 comment

The issue is the ingestion (extracting the right data in the right format). This is mainly a problem with PDFs, and sometimes with Docx files too when tables are added as images. You need a mix of text extraction and OCR to get the data out correctly before you start chunking and adding embeddings.
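
To make that concrete, here is a rough sketch of the flow in plain Python. Everything in it is an assumption for illustration: the function names (`extract_page_text`, `table_to_rows`, `chunk`), the `min_chars` threshold, and the `run_ocr` callable, which stands in for whatever OCR engine you actually use. It only shows the shape of the idea: try the embedded text layer first, fall back to OCR when the page is really an image, flatten table rows so each keeps its column names, then chunk.

```python
from typing import Callable, List, Optional


def extract_page_text(
    native_text: Optional[str],
    run_ocr: Callable[[], str],
    min_chars: int = 20,  # hypothetical threshold for "this page has a usable text layer"
) -> str:
    """Prefer the embedded text layer; fall back to OCR when it is missing or nearly empty."""
    text = (native_text or "").strip()
    if len(text) >= min_chars:
        return text
    # run_ocr is a placeholder for a real OCR call on the page image.
    return run_ocr().strip()


def table_to_rows(header: List[str], rows: List[List[str]]) -> List[str]:
    """Serialize each row as 'column: value' pairs so a chunk keeps its column context."""
    return ["; ".join(f"{h}: {v}" for h, v in zip(header, row)) for row in rows]


def chunk(lines: List[str], max_chars: int = 200) -> List[str]:
    """Greedily pack whole lines into chunks of at most max_chars (a single long line is kept whole)."""
    chunks: List[str] = []
    current = ""
    for line in lines:
        candidate = f"{current}\n{line}" if current else line
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Packing whole rows means a table row is never split mid-cell, so each chunk you embed still reads as "column: value" text rather than a stray list of numbers.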