by mind-blight
799 days ago
I suspect the biggest difference is the input data. Embeddings are great over datasets that look like FAQs and QA docs, or data that conceptually fits into very small chunks (tweets, some product reviews, etc.). They do very badly over diverse business docs, especially with naive chunking. B2B use cases usually involve old PDFs and Word docs that need to be searched, and users are often looking for specific keywords (e.g. a person's name, a product, an ID). Vectors tend to do badly in those kinds of searches, and just returning chunks misses a lot of important details.
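
To make the naive chunking point concrete, here's a rough sketch (not anyone's production pipeline). It assumes fixed-size character windows as the "naive" chunker; the document text, the name, and the ID are all made up, and `naive_chunks` / `chunks_containing` are hypothetical helpers. The ID is present in the document, but no single chunk contains it, so an exact keyword lookup over chunks finds nothing.

```python
def naive_chunks(text: str, size: int) -> list[str]:
    """Split text into fixed-size character windows, ignoring word boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunks_containing(chunks: list[str], keyword: str) -> list[int]:
    """Return indices of chunks that contain the keyword as an exact substring."""
    return [i for i, chunk in enumerate(chunks) if keyword in chunk]


# Hypothetical document text; the person and the order ID are invented.
doc = "Invoice approved by Dana Whitfield for order ID-48213 on the 3rd."
chunks = naive_chunks(doc, size=24)

found_in_doc = "ID-48213" in doc                       # True
found_in_chunks = chunks_containing(chunks, "ID-48213")  # [] (ID straddles a chunk boundary)

for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
print(found_in_doc, found_in_chunks)
```

Overlapping windows or boundary-aware splitting would avoid this particular failure, but the broader issue stands: once the document is cut up, each chunk only carries part of the context.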
Especially if they aren’t in the token vocab