by binarymax 900 days ago
Interesting, but this aspect makes me double-take: "We demonstrate that Mistral-7B, when fine-tuned solely on synthetic data, attains competitive performance on the BEIR [40] and MTEB [27] benchmarks".

E5/BGE large are an order of magnitude smaller than Mistral-7B. So is this just "bigger model wins" in disguise?

I need to read the whole paper carefully, but this jumped out at me.

1 comments

Agree, this is a nice example of generating synthetic data, and I believe the synthetic data helps produce useful embeddings for RAG. But not including an ablation with a fine-tuned E5 or another commonly used embedding model (to control for the 'bigger model wins' effect) is a glaring omission. This paper shares many authors with the E5 paper, so why did they not compare on a fair basis?
I thought the main point was that this is a very fast way (in terms of wall-clock time) to beat the state of the art, not a fair comparison by size. If one made E5 bigger, then E5 would be even slower to train.