Hacker News new | ask | show | jobs
by carlossouza 907 days ago
I did that (used other features). This is how new papers are ranked here:

https://trendingpapers.com

1 comments

Great site, thanks for sharing. Can you explain how you're determining how many times a paper is cited? Obviously papers include a list of references, but extracting them accurately from the PDF is difficult in my experience (two column formats, ugh) - though the new HTML versions help. And even if you have a list, many authors just mention arXiv paper titles, not their ids, making identifying specific references tricky.
Difficult, yes… but not impossible :)

I just extract the titles and look for their respective ids.

The real challenge was how to do that at scale. Only in CS there are well over half a million papers