| HN Mirror

	by eden-u4 582 days ago
	This project only uses the Kaggle metadata and abstracts from arXiv. Moreover, it is "focused" on only 5-6 arXiv categories, so the costs are marginal. Plus you could use a mixed system: first index the abstracts and retrieve the 50 most relevant papers, then embed the full text of those 50 to assess which are truly relevant and/or meaningful.
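
A rough sketch of that mixed system, to make the two stages concrete. Everything here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for whatever embedding model you would actually use, and `two_stage_search`, `fetch_full_text`, and the `{"id", "abstract"}` paper shape are hypothetical names, not anything from the project.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def two_stage_search(query, papers, fetch_full_text, k=50, final=5):
    """Rank by abstract first, then re-rank only the top k by full text.

    papers: list of dicts with "id" and "abstract" keys (hypothetical shape).
    fetch_full_text: callable mapping a paper id to its full text (hypothetical).
    """
    q = embed(query)

    # Stage 1: cheap pass over every abstract, keep the k best matches.
    by_abstract = sorted(
        papers, key=lambda p: cosine(q, embed(p["abstract"])), reverse=True
    )
    shortlist = by_abstract[:k]

    # Stage 2: expensive pass, full text is fetched and embedded
    # only for the shortlisted papers.
    rescored = sorted(
        shortlist,
        key=lambda p: cosine(q, embed(fetch_full_text(p["id"]))),
        reverse=True,
    )
    return [p["id"] for p in rescored[:final]]
```

The point of the split is that the full-text embedding cost scales with `k` (50 in the comment above) rather than with the size of the whole index, so only the abstract pass touches every paper.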