| HN Mirror



	by huac 898 days ago
	Agreed, this is a nice example of generating synthetic data, and I believe the synthetic data helps produce useful embeddings for RAG. But leaving out an ablation with a fine-tuned E5 or another commonly used embedding model (to control for the 'bigger model wins' effect) is a glaring omission. This paper shares many authors with the E5 paper, so why did they not compare on a fair basis?

1 comments

pama 897 days ago

I thought the main point was that this is a very fast way (in terms of wall-clock time) to beat the state of the art, not a fair comparison by size; if one made E5 bigger, then E5 would be even slower to train.
