by eden-u4 534 days ago
This project only uses the Kaggle metadata and abstracts from arXiv. Moreover, it is "focused" on only 5-6 arXiv categories. Therefore, the costs are marginal.

Plus, you could use a mixed system: first you index the abstracts and pull the 50 most relevant papers, then embed the full text of those 50 in order to assess which are truly relevant and/or meaningful.
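
A rough sketch of what I mean by a mixed system, not how the project does it. The `embed` function here is a toy bag-of-words stand-in for a real embedding model, and the `papers` layout (dicts with `id`, `abstract`, `full_text`) and the name `two_stage_search` are my own assumptions for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def two_stage_search(query, papers, shortlist_size=50, top_k=5):
    """Shortlist by abstract, then rerank the shortlist by full text.

    `papers` is assumed to be a list of dicts with hypothetical keys
    "id", "abstract" and "full_text".
    """
    q = embed(query)

    # Stage 1: cheap pass over every abstract, keep the best `shortlist_size`.
    by_abstract = sorted(
        papers, key=lambda p: cosine(q, embed(p["abstract"])), reverse=True
    )
    shortlist = by_abstract[:shortlist_size]

    # Stage 2: expensive pass, embedding full text only for the shortlist.
    reranked = sorted(
        shortlist, key=lambda p: cosine(q, embed(p["full_text"])), reverse=True
    )
    return [p["id"] for p in reranked[:top_k]]
```

The point is that the expensive step only ever touches `shortlist_size` documents, whatever the size of the corpus. The trade-off is that a paper whose abstract doesn't match the query never reaches the second stage, even if its full text would.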