by yuhongsun 479 days ago
We have a dataset that we use internally to evaluate our search quality. It's more representative of our use case since it contains Slack messages, call transcripts, very technical design docs, and company policies, all of which are pretty different from what embedding models are typically trained on.

We checked recall at 4K tokens (a pretty typical token limit for the previous generation of LLMs) and were at over 94% recall on our 10K-document set. We also added a lot of noise to it (Slack messages from public Slack workspaces) to grow it to hundreds of thousands of documents, and recall still remained over 90%.
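
For anyone wondering what "recall at 4K tokens" could mean in practice, here is a rough sketch of one plausible reading: walk down the ranked results for each query, keep documents until the token budget is used up, and count how many of the known-relevant documents made it in. This is not our actual eval code; the function name, the input dictionaries, and the choice to stop at the first document that doesn't fit are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def recall_at_token_budget(
    rankings: Dict[str, List[str]],   # query -> doc ids, best first (hypothetical input)
    relevant: Dict[str, Set[str]],    # query -> ids of docs judged relevant
    doc_tokens: Dict[str, int],       # doc id -> token count
    budget: int = 4000,               # assumed stand-in for "4K tokens"
) -> float:
    """Fraction of relevant docs that fit in the token budget, over all queries."""
    hits = 0
    total = 0
    for query, ranked_ids in rankings.items():
        used = 0
        kept: Set[str] = set()
        for doc_id in ranked_ids:
            cost = doc_tokens[doc_id]
            # Assumption: stop at the first doc that would overflow the budget.
            if used + cost > budget:
                break
            used += cost
            kept.add(doc_id)
        gold = relevant.get(query, set())
        total += len(gold)
        hits += len(gold & kept)
    return hits / total if total else 0.0
```

Adding noise documents to the corpus only changes `rankings`: if the noise pushes a relevant document further down the list, it falls outside the budget and recall drops.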